AU2004231233A1 - Render Time Estimation - Google Patents

Render Time Estimation

Info

Publication number
AU2004231233A1
Authority
AU
Australia
Prior art keywords
territory
edge
module
dependent
render
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2004231233A
Inventor
Son Thai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2003907201A external-priority patent/AU2003907201A0/en
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2004231233A priority Critical patent/AU2004231233A1/en
Publication of AU2004231233A1 publication Critical patent/AU2004231233A1/en
Abandoned legal-status Critical Current


Description

S&F Ref: 700343
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT Name and Address of Applicant: Actual Inventor(s): Address for Service: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3chome, Ohta-ku, Tokyo, 146, Japan Son Thai Spruson Ferguson St Martins Tower Level 31 Market Street Sydney NSW 2000 (CCN 3710000177) Invention Title: Render Time Estimation ASSOCIATED PROVISIONAL APPLICATION DETAILS [33] Country [31] Applic. No(s) AU 2003907201 [32] Application Date 23 Dec 2003 The following statement is a full description of this invention, including the best method of performing it known to me/us:- 5815c -1- C RENDER TIME ESTIMATION 0 z C Field of the Invention The present invention relates to renderers and in particular to a method for 5 predicting whether a renderer can render a page fast enough for a print engine.
Background

Print engines have real-time constraints on when pixel data needs to be available.
If pixel data is not supplied on time, incorrect output will be produced. The traditional approach to addressing these constraints is to place all the required pixel data in memory prior to commencing the print operation.
When a computer application provides data to a device for printing and/or display, an intermediate description of the page is often given to the device driver software in a page description language, such as PostScript or PCL, which provides descriptions of the objects to be rendered onto the page, rather than a raster image to be printed.
Equivalently, a set of descriptions of graphics objects may be provided in function calls to a graphics interface, such as the Microsoft Windows GDI, or Unix's X-11. The page is typically rendered for printing and/or display by an object-based graphics system (or Raster Image Processor).
Most of these object based graphics systems utilise a large area of memory, known as a frame store or a page buffer, to hold a pixel-based image of the page or screen for subsequent printing and/or display. Typically, the outlines of the graphic objects are calculated, filled and written into the frame store. For two-dimensional graphics, objects that appear in front of other objects are simply written into the frame store after the background objects, thereby replacing the background on a pixel by pixel basis. This is commonly known as the "Painter's algorithm". Objects are considered in priority order, from the rearmost object to the foremost object. Typically each object is rasterised in scanline order and pixels are written to the framestore in sequential runs along each scanline.

Some graphics interfaces allow a logical or arithmetic operation to be specified, to be performed between one or more graphics objects and the pixels already rendered in the frame buffer. In these cases the principle remains the same: objects (or groups of objects) are rasterised in scanline order, and the result of the specified operation is calculated and written to the framestore in sequential runs along each scanline.
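The Painter's algorithm described above can be sketched as follows. The flat framestore layout and the rectangle-only object model are illustrative assumptions for the sketch, not the data structures of any particular graphics system.

```python
# Painter's algorithm sketch: objects are written to the framestore in
# priority order (rearmost first), each overwriting the pixels beneath it.

def paint(framestore, width, objects):
    """objects: iterable of (x, y, w, h, colour) rectangles, rearmost first."""
    for x0, y0, w, h, colour in objects:
        for y in range(y0, y0 + h):           # rasterise in scanline order
            row = y * width
            for x in range(x0, x0 + w):       # sequential run along the scanline
                framestore[row + x] = colour  # foreground replaces background

fs = [0] * (8 * 4)                            # 8x4 page, background colour 0
paint(fs, 8, [(0, 0, 8, 4, 1),                # rearmost: full-page rectangle
              (2, 1, 3, 2, 2)])               # foremost: small overlapping rectangle
```

Note the over-painting: every pixel of the rear rectangle is written even where the front rectangle later replaces it, which is the inefficiency pixel-sequential rendering avoids.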
As printers become higher in resolution and higher in throughput, the amount of required memory for storing such pixel data has become prohibitive, typically in excess of 100 Mbytes. Such large amounts of memory are costly and difficult to run at high speed.
One method for overcoming the large frame-store problem is the use of "banding".
When banding is used, only part of the framestore exists in memory at any one time. All of the objects to be drawn are retained in a "display list", which is an internal representation of the information required to draw the objects on the page. The display list is considered in object order as above, and only those pixel operations which fall within the fraction of the page which is held in the band are actually performed. After all objects in the display list have been drawn, the band is sent to the printer (or to intermediate storage) and the process is repeated for the next band of the page.
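The banding scheme above can be sketched as follows; the display list of pre-flattened horizontal runs is an illustrative simplification of the internal representation.

```python
# Banding sketch: only one band of the framestore exists at a time. The
# whole display list is considered for every band, and only the pixel
# operations falling inside the current band are performed.

def render_banded(width, height, band_height, display_list):
    """display_list: (y, x0, x1, colour) runs; yields rendered bands in order."""
    for band_top in range(0, height, band_height):
        band = [[0] * width for _ in range(band_height)]
        for y, x0, x1, colour in display_list:
            if band_top <= y < band_top + band_height:  # run falls in this band
                row = band[y - band_top]
                for x in range(x0, x1):
                    row[x] = colour
        yield band                                      # band goes to the printer

bands = list(render_banded(4, 4, 2, [(0, 0, 4, 1), (3, 1, 3, 2)]))
```

Memory drops from a full page to one band, at the cost of traversing the display list once per band.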
It is desirable for a renderer to supply pixel data in real-time as the page is being printed. For simple pages, the renderer should be able to supply pixel data in time to keep the print engine busy. For complex pages, the renderer may not be able to render fast enough to keep up with the print engine. Most drum printers are unable to stop the drum mid-page, so halting the print engine is not a good solution.
A method of overcoming this difficulty is to predict whether the renderer can render a page fast enough to keep up with the print engine before submitting a print job to the renderer. If the renderer is not capable of keeping up with the print engine, an alternative course of action can be taken to ensure that the page is correctly printed. An example of such action is pre-rendering the page into supplementary memory and printing the page by reading pixel data from supplementary memory.

Therefore in low-cost printers, where the frame store memory is only large enough to store a band, there is a need for a render time estimation to maximise a printer's efficiency.
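The decision this prediction drives can be sketched as below. The per-band comparison, the function names and the millisecond figures are invented for illustration; the document does not specify how the deadline is expressed.

```python
# Sketch of the real-time check motivating render time estimation: each
# band's estimated render time is compared with the period at which the
# print engine consumes bands; a single slow band forces pre-rendering,
# since a drum engine cannot be halted mid-page.

def can_render_in_real_time(band_estimates_ms, engine_band_period_ms):
    """True if every band can be rendered before the engine needs it."""
    return all(t <= engine_band_period_ms for t in band_estimates_ms)

def choose_path(band_estimates_ms, engine_band_period_ms):
    # Real-time path feeds the engine directly; otherwise the page is
    # pre-rendered to supplementary memory and replayed at engine speed.
    if can_render_in_real_time(band_estimates_ms, engine_band_period_ms):
        return "render-direct"
    return "pre-render"
```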
The publication U.S. patent number 5,216,754 by Sathi et al., issued on 1 June 1993, discloses a system for predicting the render time based on the complexity of a page.
Fig. 25 shows the data flow in the system described in the aforementioned US patent 5,216,754. In this system a decomposer 2502 converts page commands from image input terminal 2500 in page description language into image data of a low-level format. The page is divided into bands each consisting of a number of scanlines. The decomposer 2502 generates a list of objects for each band of the page and estimates the time taken to render a band by looking at complexity factors of objects in that band. The complexity factors depend on the number of objects, the geometry and size of objects. Another factor is the compression ratio of a compressed image that is to be decompressed. If a band is too complex, the real-time capability of the renderer is exceeded, and the page image data is simplified by an image manipulation sub-system 2504 before being rendered by the image generation sub-system 2506 and printed by the image output terminal 2508. The simplification can involve pre-rendering the page image. On the other hand, if a band is not complex, then the page image data is passed directly to the image generation subsystem 2506 for rendering to the image output terminal 2508.
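A complexity-factor estimate of the kind described can be sketched as follows. The weights, units and the two-factor model are invented for illustration and are not taken from the Sathi patent.

```python
# Sketch of per-band complexity estimation: render time is modelled as a
# weighted sum of complexity factors (object count, object area, and
# decompression work). All weights below are illustrative placeholders.

W_OBJECT = 5.0       # fixed cost per object in the band
W_PIXEL = 0.01       # cost per pixel filled
W_DECOMPRESS = 0.02  # extra cost per pixel of compressed image

def estimate_band_time(objects):
    """objects: (area_in_pixels, is_compressed_image) pairs for one band."""
    t = 0.0
    for area, is_compressed in objects:
        t += W_OBJECT + W_PIXEL * area
        if is_compressed:
            t += W_DECOMPRESS * area
    return t

def too_complex(objects, band_deadline):
    # If the estimate exceeds the engine's per-band deadline, the band is
    # simplified (e.g. pre-rendered) before printing.
    return estimate_band_time(objects) > band_deadline
```

Note that this per-object model charges every object its full area, which is precisely why it over-estimates for a renderer that skips obscured pixels, as discussed below for pixel-sequential rendering.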
The system described in the Sathi patent is typical of the prior art in that it converts the page description language into objects on the page. This intermediate format is typically rendered by applying the Painter's algorithm, where one object is rendered into memory, then another object is rendered into the same memory, possibly overwriting some pixel data from the first object if the two objects overlap. Such a renderer is called an object-based renderer.
Other graphic systems utilise pixel-sequential rendering to overcome both the large framestore problem and the over-painting problem. In these systems, each pixel is generated in raster order. Again, all objects to be drawn are retained in a display list. On each scanline, the edges of objects which intersect that scanline are held in increasing order of their intersection with the scanline. These points of intersection, or edge crossings, are considered in turn, and used to toggle an array of fields that indicate the activity of the objects in the display list. There is one activity field for each object painting operation that is of interest on the scanline. There is also a field to indicate operations that do not require previously generated data. Between each pair of edges considered, the colour data for each pixel which lies between the first edge and the second edge is generated by using a priority encoder on the activity flags to determine which operations are required to generate the colour, and performing only those operations for the span of pixels between the two edges. In preparation for the next scanline, the coordinate of intersection of each edge is updated in accordance with the nature of each edge, and the edges are sorted into increasing order of intersection with that scanline. Any new edges are also merged into the list of edges.
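The edge-crossing mechanism for a single scanline can be sketched as follows. Opaque-only objects, a plain topmost-wins rule in place of a priority encoder, and toggling on every crossing are illustrative simplifications of the scheme described above.

```python
# Pixel-sequential sketch for one scanline: edge crossings, taken in
# increasing X order, toggle per-object activity; between consecutive
# crossings the highest-priority active object is emitted for the whole
# span, so obscured objects cost nothing.

def render_scanline(width, crossings, priority_colour, background=0):
    """crossings: (x, object_id) pairs; each crossing toggles that object."""
    active = set()
    out, prev_x = [], 0
    for x, obj in sorted(crossings) + [(width, None)]:
        top = max(active) if active else None       # topmost active object
        colour = priority_colour[top] if top is not None else background
        out.extend([colour] * (x - prev_x))         # fill the span between edges
        prev_x = x
        if obj is not None:
            active.symmetric_difference_update({obj})  # toggle activity
    return out
```

For example, two overlapping objects 0 and 1 (1 on top) spanning x∈[1,5) and x∈[2,4) produce the span colours background, a, b, b, a, background on an 8-pixel scanline.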
Graphic systems which use pixel-sequential rendering have significant advantages in that there is no frame store or line store, no unnecessary over-painting, and the object priorities are dealt with in constant order time by the priority encoder, rather than in order N time, where N is the number of priorities.

If the render-time estimation system described by U.S. patent number 5,216,754 is applied to a pixel-sequential rendering system, then overlapping opaque objects cause the rendering time to be over-estimated. This leads to some unnecessary pre-rendering, thereby reducing the system's efficiency.

Thus the render time estimation system described by U.S. patent number 5,216,754 is not suitable for use in a system which uses a different intermediate format of a page image, and/or a different type of renderer, such as for example a pixel-sequential renderer.
The discussion in the "Background" section relates to documents or devices which form public knowledge through their respective publication and/or use. Such should not be interpreted as a representation that such documents or devices in any way form part of the common general knowledge in the art.
Summary

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the present disclosure, there is provided a method of estimating a time taken to render a collection of consecutive scanlines from a render job, the method comprising the steps of: receiving a description of said render job comprising a plurality of levels; receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels; estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.

According to a second aspect of the present disclosure, there is provided an apparatus for estimating a time taken to render a collection of consecutive scanlines from a render job, the apparatus comprising: means for receiving a description of said render job comprising a plurality of levels; means for receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels; means for estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and means for estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.
According to a third aspect of the invention, there is provided a computer program comprising machine-readable program code for controlling the operation of a data processing apparatus on which the program code executes to perform a method of estimating a time taken to render a collection of consecutive scanlines from a render job, the method comprising the steps of receiving a description of said render job comprising a plurality of levels; receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels; estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.
According to a further aspect of the invention there is provided a computer program product comprising machine-readable program code recorded on a machine-readable recording medium, for controlling the operation of a data processing apparatus on which the program code executes to perform a method of estimating a time taken to render a collection of consecutive scanlines from a render job, the method comprising the steps of: receiving a description of said render job comprising a plurality of levels; receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels; estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.
According to a further aspect of the invention there is provided a system comprising a storage unit for storing a render job comprising a plurality of levels; and a processor for estimating a time taken by a pixel-based renderer to render a collection of consecutive scanlines from said render job, wherein said processor estimates territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and estimates the time taken for said renderer to render said scanlines, wherein said time estimate is based on said description and said territories.
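The "territory" idea running through these aspects can be sketched as follows. The interval representation, the higher-number-is-on-top convention, and the subtraction of opaque cover are illustrative assumptions about what a territory computation might look like, not the claimed method.

```python
# Territory sketch: for a given level, estimate the pixels on a scanline
# that receive an unobscured contribution, by subtracting the spans
# covered by opaque levels above it. Cell-set arithmetic keeps the
# sketch simple; a real implementation would use interval lists.

def unobscured_territory(level_spans, opacity, level):
    """level_spans: {level: (x0, x1)} on one scanline; higher level = on top."""
    x0, x1 = level_spans[level]
    cells = set(range(x0, x1))
    for upper, (u0, u1) in level_spans.items():
        if upper > level and opacity[upper]:   # opaque level above
            cells -= set(range(u0, u1))        # obscures this level
    return len(cells)                          # pixels actually rendered

spans = {0: (0, 10), 1: (2, 6)}                # level 1 sits on top of level 0
opaque = {0: True, 1: True}
```

Feeding these territory sizes, rather than raw object areas, into a renderer model is what prevents overlapping opaque objects from inflating the estimate.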
Brief Description of the Drawings

One or more embodiments of the present invention will now be described with reference to the drawings, in which:

Fig. 1 is a schematic block diagram representation of a computer system incorporating the preferred arrangement;
Fig. 2 is a block diagram showing the functional data flow of the preferred arrangement;
Fig. 3 is a schematic block diagram representation of the pixel sequential rendering apparatus and associated display list and temporary stores;
Fig. 4 is a schematic functional representation of the edge processing module of Fig. 3;
Fig. 5 is a schematic functional representation of the priority determination module of Fig. 3;
Fig. 6 is a schematic functional representation of the fill colour determination module of Fig. 3;
Figs. 7A to 7C illustrate pixel combinations between source and destination;
Fig. 8A illustrates a two-object image used as an example for explaining the operation of the preferred arrangement;
Fig. 8B shows a table of a number of edge records of the two-object image shown in Fig. 8A;
Figs. 9A and 9B illustrate the vector edges of the objects of Fig. 8A;
Fig. 10 illustrates the rendering of a number of scan lines of the image of Fig. 8A;
Fig. 11 depicts the arrangement of an edge record for the image of Fig. 8A;
Fig. 12A depicts the format of an active edge record created by the edge processing module 400 of Fig. 4;
Fig. 12B depicts the arrangement of the edge records used in the edge processing module 400 of Fig. 4;
Figs. 12B to 12J illustrate the edge update routine implemented by the arrangement of Fig. 4 for the example of Fig. 8A;
Figs. 13A and 13B illustrate the odd-even and non-zero winding fill rules;
Figs. 14A to 14E illustrate how large changes in X coordinates contribute to spill conditions and how they are handled;
Figs. 15A to 15E illustrate the priority filling routine implemented by the arrangement of Fig. 5;
Figs. 16A to 16D provide a comparison between two prior art edge description formats and that used in the preferred apparatus;
Figs. 17A and 17B show a simple compositing expression illustrated as an expression tree and a corresponding depiction;
Fig. 17C shows an example of an expression tree;
Fig. 18 depicts the priority properties and status table of the priority determination module of Fig. 3;
Fig. 19 shows a table of a number of raster operations;
Figs. 20A and 20B show a table of the principal compositing operations and their corresponding raster operations and opacity flags;
Fig. 21 depicts the result of a number of compositing operations;
Fig. 22A shows a series of fill priority messages generated by the priority determination module 500;
Fig. 22B shows a series of colour composite messages generated by the fill colour determination module 600;
Fig. 23 is a schematic functional representation of the pixel compositing module of Fig. 3 in accordance with one arrangement;
Figs. 24A, B, C and D show the operations performed on the stack for each of the various stack operation commands in the pixel compositing module 700 of Fig. 3;
Fig. 25 is a block diagram showing the data flow in the prior art system described by U.S. patent number 5,216,754 (Sathi et al.);
Fig. 26 is a block diagram showing the output buffer of the rendering system of Fig. 2;
Figs. 27A to 27P are examples of different possible cases for overlapping objects;
Fig. 28 is a flowchart of the Render Time Estimation control flow and processes;
Fig. 29 is a flowchart of the render-time estimation step in the method of Fig. 28;
Fig. 30 is a flowchart of the calculation of page territories in the estimation step of Fig. 29; and
Fig. 31 is a schematic block diagram of a general-purpose computer on which the render time estimation method may be implemented.
Detailed Description including Best Mode

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have, for the purposes of this description, the same function(s) or operation(s), unless the contrary intention appears.
For a better understanding of the pixel sequential rendering system 1, a brief overview of the system is first undertaken in Section 1.0. Then follows a brief discussion in Section 2.0 of the driver software for interfacing between a third party software application and the pixel sequential rendering apparatus 20 of the system. A brief overview of the pixel sequential rendering apparatus 20 is then given. As will become apparent, the pixel sequential rendering apparatus 20 comprises an instruction execution module 300; an edge tracking module 400; a priority determination module 500; an optimisation module 550; a fill colour determination module 600; a pixel compositing module 700; and a pixel output module 800. A brief overview of these modules is given in Sections 3.1 to 3.7.
PIXEL SEQUENTIAL RENDERING SYSTEM

Fig. 1 illustrates schematically a computer system 1 configured for rendering and presentation of computer graphic object images. The system includes a host processor 2 associated with system random access memory (RAM) 3, which may include a non-volatile hard disk drive or similar device 5 and volatile, semiconductor RAM 4. The system 1 also includes a system read-only memory (ROM) 6 typically founded upon semiconductor ROM 7 and which in many cases may be supplemented by compact disk devices (CD ROM) 8. The system 1 may also incorporate some means 10 for displaying images, such as a video display unit (VDU) or a printer, or both, which operate in raster fashion.
The above-described components of the system 1 are interconnected via a bus system 9 and are operable in a normal operating mode of computer systems well known in the art, such as IBM PC/AT type personal computers and arrangements evolved therefrom, Sun Sparcstations and the like.
Also seen in Fig. 3, a pixel sequential rendering apparatus 20 connects to the bus 9, and in the preferred arrangement is configured for the sequential rendering of pixel-based images derived from graphic object-based descriptions supplied with instructions and data from the system 1 via the bus 9. The apparatus 20 may utilise the system RAM 3 for the rendering of object descriptions, although preferably the rendering apparatus may have associated therewith a dedicated rendering store arrangement 30, typically formed of semiconductor RAM.
The pixel sequential rendering system operates, generally speaking, in the following manner. A render job to be rendered is given to driver software running on processor 2 by third party software for supply to the pixel sequential renderer 20. The render job is typically in a page description language or in a sequence of function calls to a standard graphics API, which defines an image comprising objects placed on a page from a rearmost object to a foremost object to be composited in a manner defined by the render job. The driver software converts the render job to an intermediate render job and predicts whether the intermediate render job can be rendered fast enough by the renderer to supply data in real time to the print engine of the printer.

The renderer 20 takes in the intermediate render job and the prediction of whether the intermediate render job can be rendered in real time. If the intermediate render job can be rendered in real time, the renderer will render the intermediate render job and write the rendered data to an output buffer, from which data is supplied to the print engine of printer 10. If the intermediate render job cannot be rendered in real time, the intermediate render job is rendered by renderer 3001. The renderer 3001 may be a software module running on the host processor 2. Alternatively, the renderer 3001 can be the pixel sequential renderer 20. Once the renderer 3001 has rendered the intermediate render job, the resulting pre-rendered data is sent to a compressor 3000, which compresses the pre-rendered data and stores the compressed data for later use.
A decompressor 3002 takes in the stored compressed data and decompresses the data into rendered data. The decompressor 3002 supplies the decompressed rendered data for printing by the printer 10. The decompressed rendered data may pass through the same output buffer used by the renderer 20. The compressor 3000 and decompressor 3002 are preferably software modules running on the host processor 2. Alternatively, the compressor 3000 and decompressor 3002 may run on distributed processors (not shown) connected to bus 9. As a further alternative, the compressor 3000 and decompressor 3002 may be implemented as dedicated hardware connected to the bus 9.
When an intermediate render job is fed to the pixel sequential renderer 20, the pixel sequential renderer generates the colour and opacity for the pixels one at a time in raster scan order. At any pixel currently being scanned and processed, the pixel sequential renderer 20 composites only those exposed objects that are active at the currently scanned pixel. The pixel sequential renderer determines that an object is active at a currently scanned pixel if that pixel lies within the boundary of the object. The pixel sequential renderer achieves this by reference to a fill counter associated with that object. The fill counter keeps a running fill count that indicates whether the pixel lies within the boundary of the object. When the pixel sequential renderer encounters an edge associated with the object it increments or decrements the fill count depending upon the direction of the edge. The renderer is then able to determine whether the current pixel is within the boundary of the object depending upon the fill count and a predetermined winding count rule. The pixel sequential renderer determines whether an active object is exposed with reference to a flag associated with that object. This flag indicates whether or not the object obscures lower order objects. That is, this flag indicates whether the object is partially transparent, in which case the lower order active objects will thus make a contribution to the colour and opacity of the current pixel.
Otherwise, this flag indicates that the object is opaque in which case active lower order objects will not make any contribution to the colour and opacity of the currently scanned pixel. The pixel sequential renderer determines that an object is exposed if it is the uppermost active object, or if all the active objects above the object have their corresponding flags set to transparent. The pixel sequential renderer then composites these exposed active objects to determine and output the colour and opacity for the currently scanned pixel.
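The activity and exposure tests just described can be sketched as follows. The winding rules and the transparency flag are as in the text; the function names and the data layout are illustrative assumptions.

```python
# Sketch of the fill-count activity test and the exposure test: an object
# is active while its running fill count satisfies the winding rule, and
# the objects contributing to a pixel are those from the topmost active
# object down to and including the topmost opaque one.

def is_active(fill_count, rule="nonzero"):
    # Non-zero winding: inside while the count is non-zero;
    # odd-even: inside while the count is odd.
    return fill_count != 0 if rule == "nonzero" else fill_count % 2 == 1

def exposed_objects(active_priorities, is_transparent):
    """active_priorities: priorities of active objects, in any order.
    Returns the contributing objects, topmost first."""
    contributing = []
    for pri in sorted(active_priorities, reverse=True):
        contributing.append(pri)
        if not is_transparent[pri]:   # opaque: everything below is obscured
            break
    return contributing
```

An opaque object therefore terminates the compositing list, which is exactly why obscured objects contribute no render time in a pixel-sequential system.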
The driver software, in response to the page, also extracts edge information defining the edges of the objects for feeding to the edge tracking module. The driver software also generates a linearised table (hereinafter called the priority properties and status table, or the level activation table) of the expression tree of the objects and their compositing operations, which is fed to the priority determination module. The priority properties and status table contains one record for each object on the page. In addition, each record contains a field for storing a pointer to an address for the fill of the corresponding object in a fill table. This fill table is also generated by the driver software, contains the fill for the corresponding objects, and is fed to the fill determination module. The priority properties and status table together with the fill table are devoid of any edge information and effectively represent the objects as infinitely extending. The edge information is fed to the edge tracking module, which determines, for each pixel in raster scan order, the edges of any objects that intersect the currently scanned pixel. The edge tracking module passes this information on to the priority determination module. Each record of the priority properties and status table contains a counter, which maintains a fill count associated with the corresponding object of the record. The priority determination module processes each pixel in raster scan order. Initially, the fill counts associated with all the objects are zero, and so all objects are inactive. The priority determination module continues processing each pixel until it encounters an edge intersecting that pixel. The priority determination module updates the fill count associated with the object of that edge, and so that object becomes active. The priority determination continues in this fashion, updating the fill counts of the objects and so activating and de-activating the objects. The priority determination module also determines whether these active objects are exposed or not, and consequently whether they make a contribution to the currently scanned pixel. In the event that they do, the priority determination module generates a series of messages which ultimately instruct the pixel compositing module to composite the colour and opacity for these exposed active objects in accordance with the compositing operations specified for these objects in the priority properties and status table, so as to generate the resultant colour and opacity for the currently scanned pixel. This series of messages does not at that time actually contain the colour and opacity for that object, but rather an address to the fill table, which the fill determination module uses to determine the colour and opacity of the object.
For ease of explanation, the location (viz. level) of the object in the order of the objects from the rearmost object to the foremost is herein referred to as the object's priority. Preferably, a number of non-overlapping objects that have the same fill and compositing operation, and that form a contiguous sequence in the order of the objects, may be designated as having the same priority. Most often, only one priority (viz. level) is required per object; however, some objects may require several instructions, and thus the object may require several priorities (viz. levels). For example, a character with a colour fill may be represented by a bounding box on a first level having the colour fill, a one-bit bitmap which provides the shape of the character on a second level, and the same bounding box on a third level having the colour fill, where the levels are composited together as ((B xor Page) and S) xor B, B being the bounding-box fill and S the character bitmap, to produce the colour character.
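The three-level xor/and/xor composition relies on the standard bitwise mask-select identity, demonstrated below; this demonstration is illustrative of the identity itself and is not tied to the patent's level semantics or to a particular polarity of the character bitmap.

```python
# Mask-select identity used by xor/and/xor level composition:
#   ((A ^ B) & S) ^ B  ==  (A & S) | (B & ~S)
# i.e. take A's bits where the mask S is 1, and B's bits where S is 0.

def mask_select(a, b, s):
    return ((a ^ b) & s) ^ b

A, B, S = 0b10101010, 0b01010101, 0b11110000
assert mask_select(A, B, S) == (A & S) | (B & ~S & 0xFF)
```

The appeal of this form in a raster-operation pipeline is that it needs only xor and and, which map directly onto per-level raster operations.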
The pixel sequential renderer also utilises clip objects to modify the shape of another object. The pixel sequential renderer maintains an associated clip count for the clip in a somewhat similar fashion to the fill count to determine whether the current pixel is within the clip region.
As will become apparent, there exist runs of pixels having constant colour and opacity between adjacent edges. The pixel sequential renderer can composite the colour and opacity for the first pixel in the run and, for subsequent pixels in the run, reproduce the previous composited colour and opacity without any further compositing, thus reducing the overall number of compositing operations. In the circumstances where a run of pixels comprises varying colour and opacity at one or more priority levels, this technique cannot be used. However, in the latter case the preferred arrangements are still able to minimise the number of compositing operations, as will be described below in more detail.
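This per-run reuse can be sketched as follows. The run representation and the stub compositor are illustrative; the counter simply makes the saving visible.

```python
# Sketch of per-run compositing reuse: when colour and opacity are
# constant across a run between two edges, composite the first pixel
# only and copy the result across the run; a varying run (e.g. a
# gradient or image level) must composite every pixel.

def render_runs(runs, composite):
    """runs: (length, varies) pairs; composite() is the expensive step."""
    out, composites = [], 0
    for length, varies in runs:
        if varies:
            for _ in range(length):
                out.append(composite())
                composites += 1
        else:                          # constant run: composite once, copy
            value = composite()
            composites += 1
            out.extend([value] * length)
    return out, composites

out, n = render_runs([(4, False), (3, True), (5, False)], lambda: 7)
```

Here 12 pixels are produced with only 5 compositing operations instead of 12.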
SOFTWARE DRIVER

A software program (hereafter referred to as the driver) is loaded and executed on the host processor 2 for generating instructions and data for the pixel-sequential graphics rendering apparatus 20, from data provided to the driver by a third-party application. The third-party application may provide data in the form of a standard language description of the objects to be drawn on the page, such as PostScript or PCL, or in the form of function calls to the driver through a standard software interface, such as the Windows GDI or X-11. The software driver also predicts whether the intermediate render job can be rendered fast enough by the renderer to supply data in real time to the print engine.
Preferably, the software comprises two modules, one for compiling the rendering job (called the display list generator, or DLG), and the other for estimating the rendering time. For ease of explanation, the software module for estimating the rendering time is hereinafter called the RTE module. Alternatively, the DLG and the RTE can be integrated into the same software driver module.
The driver software separates the data associated with an object (supplied by the third-party application) into data about the edges of the object, any operation or operations associated with painting the object onto the page, and the colour and opacity with which to fill pixels which fall inside the edges of the object.
The driver software partitions the edges of each object into edges which are monotonic increasing in the Y-direction, and then divides each partitioned edge of the object into segments of a form suitable for the edge module described below. Partitioned edges are sorted by the X-value of their starting positions and then by Y. Groups of edges starting at the same Y-value remain sorted by X-value, and may be concatenated together to form a new edge list, suitable for reading in by the edge module when rendering reaches that Y-value.
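The sorting step described above can be sketched as follows. This is a minimal illustration only; the dictionary field names (`start_x`, `start_y`) are assumptions, not the driver's actual data layout.

```python
# Illustrative sketch of the driver's edge-sorting step: edges are ordered
# by starting Y so that groups of edges beginning on the same scanline are
# contiguous, and within each group remain sorted by X.

def sort_new_edges(edges):
    """Sort edges by start Y; edges starting on the same scanline stay
    ordered by X, as required by the edge module."""
    return sorted(edges, key=lambda e: (e["start_y"], e["start_x"]))

# The four edges of the Fig. 8 example (triangle edges start at y=20,
# rectangle edges at y=35):
edges = [
    {"start_x": 160, "start_y": 35},
    {"start_x": 100, "start_y": 20},
    {"start_x": 40, "start_y": 35},
]
sorted_edges = sort_new_edges(edges)
```

With this ordering, all edges for a given Y-value can be concatenated into one X-sorted list and handed to the edge module when rendering reaches that scanline.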
The driver software sorts the operations associated with painting objects into priority order, and generates instructions to load the data structure associated with the priority determination module (described below). This structure includes a field for the fill rule, which describes the topology of how each object is activated by edges, a field for the type of fill which is associated with the object being painted, and a field to identify whether data on levels below the current object is required by the operation. There is also a field, herein called clip count, that identifies an object as a clipping object, that is, as an object which is not, itself, filled, but which enables or disables filling of other objects on the page.
The driver software also prepares a data structure (the fill table) describing how to fill objects; this fill table is indexed by the data structure in the priority determination module. This allows several levels in the priority determination module to refer to the same fill data structure.
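The indirection described above can be sketched as follows. The record contents are assumptions for illustration; only the indexing relationship (several priority levels sharing one fill entry) is taken from the text.

```python
# Illustrative sketch of the fill-table indirection: priority levels hold
# an index into a shared fill table rather than their own fill data.

fill_table = [
    {"type": "flat", "colour": (255, 0, 0)},  # index 0: opaque red
    {"type": "flat", "colour": (0, 0, 255)},  # index 1: blue
]

# Two of the three levels refer to the same fill entry:
priority_levels = [
    {"level": 0, "fill_index": 0},
    {"level": 1, "fill_index": 0},
    {"level": 2, "fill_index": 1},
]

def fill_for_level(level_record):
    """Resolve a priority level's fill data through the fill table."""
    return fill_table[level_record["fill_index"]]
```

Sharing one fill record between levels avoids duplicating fill data when several objects are painted identically.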
The driver software assembles the aforementioned data into a job containing instructions for loading the data and rendering pixels, in a form that can be read by the rendering system 20, and transfers the assembled job to the rendering system 20. This may be performed using one of several methods known to the art, depending on the configuration of the rendering system and its memory.
The render time estimation module (RTE) examines the aforementioned data and the instructions in the aforementioned rendering job to make a prediction about the possibility of real-time rendering. The result of the prediction is appended to the aforementioned rendering job. The RTE module is described in more detail below.

PIXEL SEQUENTIAL RENDERING APPARATUS

Referring now to Fig. 2, a functional data flow diagram of the preferred arrangement is shown. The functional flow diagram of Fig. 2 commences with an object graphic description 11 which is used to describe those parameters of graphic objects in a fashion appropriate to be generated by the host processor 2 and/or, where appropriate, stored within the system RAM 3 or derived from the system ROM 6, and which may be interpreted by the pixel sequential rendering apparatus 20 to render therefrom pixel-based images. For example, the object graphic description 11 may incorporate objects with edges in a number of formats including straight edges (simple vectors) that traverse from one point on the display to another, or an orthogonal edge format where a two-dimensional object is defined by a plurality of edges including orthogonal lines. Further formats, where objects are defined by continuous curves, are also appropriate and these can include quadratic polynomial fragments where a single curve may be described by a number of parameters which enable a quadratic based curve to be rendered in a single output space without the need to perform multiplications. Further data formats such as cubic splines and the like may also be used. An object may contain a mixture of many different edge types. Typically, common to all formats are identifiers for the start and end of each line (whether straight or curved) and typically these are identified by a scan line number, thus defining a specific output space in which the curve may be rendered.
For example, Fig. 16A shows a prior art edge description of an edge 600 that is required to be divided into two segments 601 and 602 in order for the segments to be adequately described and rendered. This arises because the prior art edge description, whilst being simply calculated through a quadratic expression, could not accommodate an inflexion point 604. Thus the edge 600 was dealt with as two separate edges having end points 603 and 604, and 604 and 605 respectively. Fig. 16B shows a cubic spline 610 that is described by endpoints 611 and 612, and control points 613 and 614. This format requires calculation of a cubic polynomial for render purposes and is thus expensive of computational time.
Figs. 16C and 16D show examples of edges applicable to the preferred arrangement. In the preferred arrangement, an edge is considered as a single entity and, if necessary, is partitioned to delineate sections of the edge that may be described in different formats, a specific goal of which is to ensure a minimum level of complexity for the description of each section.

In Fig. 16C, a single edge 620 is illustrated spanning between scanlines A and M.
An edge is described by a number of parameters including start_x, start_y, one or more segment descriptions that include an address that points to the next segment in the edge, and a finish segment used to terminate the edge. According to the preferred arrangement, the edge 620 may be described as having three step segments, a vector segment, and a quadratic segment. A step segment is simply defined as having an x-step value and a y-step value. For the three step segments illustrated, the segment descriptions are as shown. Note that the x-step value is signed, thereby indicating the direction of the step, whilst the y-step value is unsigned as such is always in a raster scan direction of increasing scanline value. The next segment is a vector segment, which typically requires the parameters start_x, start_y, num_of_scanlines (NY) and slope (DX). In this example, because the vector segment is an intermediate segment of the edge 620, the start_x and start_y may be omitted because such arise from the preceding segment(s). The parameter num_of_scanlines (NY) indicates the number of scanlines the vector segment lasts. The slope value (DX) is signed and is added to the x-value of a preceding scanline to give the x-value of the current scanline, and in the illustrated case, DX = +1. The next segment is a quadratic segment, which has a structure corresponding to that of the vector segment, but also a second order value (DDX), which is also signed and is added to DX to alter the slope of the segment.
Fig. 16D shows an example of a cubic curve according to the preferred arrangement which includes a description corresponding to the quadratic segment save for the addition of a signed third-order value (DDDX), which is added to DDX to vary the rate of change of slope of the segment. Many other orders may also be implemented.
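The incremental scheme described for vector, quadratic and cubic segments can be sketched as follows: on each scanline, DX is added to the x-value, DDX to DX, and DDDX to DDX, so no multiplications are needed. This is an illustrative sketch of the arithmetic, not the apparatus's implementation.

```python
# Sketch of tracking an edge segment's x-value across scanlines using the
# forward differences described above: DX (slope), DDX (second order) and
# DDDX (third order).

def trace_segment(start_x, ny, dx=0, ddx=0, dddx=0):
    """Return the x-value of the segment on each of its NY scanlines."""
    xs = []
    x = start_x
    for _ in range(ny):
        xs.append(x)
        x += dx      # slope added to the preceding scanline's x-value
        dx += ddx    # second-order value alters the slope
        ddx += dddx  # third-order value alters the rate of change of slope
    return xs

vector_xs = trace_segment(start_x=40, ny=4, dx=1)      # straight edge, DX = +1
quad_xs = trace_segment(start_x=0, ny=4, dx=0, ddx=2)  # quadratic edge
```

Higher orders (DDDDX and so on) extend the same pattern with one extra addition per scanline each.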
It will be apparent from the above that the ability to handle plural data formats describing edge segments allows for simplification of edge descriptions and evaluation, without reliance on complex and computationally expensive mathematical operations. In contrast, in the prior art system of Fig. 16A, all edges, whether orthogonal, vector or quadratic, were required to be described by the quadratic form.
The operation of the preferred arrangement will be described with reference to the simple example of rendering an image 78 shown in Fig. 8, which is seen to include two graphical objects, in particular, a partly transparent blue-coloured triangle 80 rendered on top of and thereby partly obscuring an opaque red-coloured rectangle 90. As seen, the rectangle 90 includes side edges 92, 94, 96 and 98 defined between various pixel positions and scan line positions. Because the edges 96 and 98 are formed upon the scan lines (and thus parallel therewith), the actual object description of the rectangle 90 can be based solely upon the side edges 92 and 94, such as seen in Fig. 9A.
In this connection, edge 92 commences at pixel location (40,35) and extends in a raster direction down the screen to terminate at pixel position (40,105). Similarly, the edge 94 extends from pixel position (160,35) to position (160,105). The horizontal portions of the rectangular graphic object 90 may be obtained merely by scanning from the edge 92 to the edge 94 in a rasterised fashion.
The blue triangular object 80 however is defined by three object edges 82, 84 and 86, each seen as vectors that define the vertices of the triangle. Edges 82 and 84 are seen to commence at pixel location (100,20) and extend respectively to pixel locations (170,90) and (30,90). Edge 86 extends between those two pixel locations in a traditional rasterised direction of left to right. In this specific example, because the edge 86 is horizontal like the edges 96 and 98 mentioned above, it is not essential that the edge 86 be defined. In addition to the starting and ending pixel locations used to describe the edges 82 and 84, each of these edges will have associated therewith a slope value, in this case +1 and -1 respectively.
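The triangle edges of Fig. 8 give a concrete check of the DX-stepping scheme: both edges start at x = 100 on scanline 20, and with slopes +1 and -1 they reach x = 170 and x = 30 at scanline 90. A sketch, with assumed field names:

```python
# Edges 82 and 84 of Fig. 8 as slope-stepped records. Starting point
# (100, 20) and slopes +1 / -1 are taken from the text.

def x_at_scanline(start_x, start_y, dx, y):
    """x-value of a vector edge at scanline y (y >= start_y)."""
    return start_x + dx * (y - start_y)

edge_82 = {"start_x": 100, "start_y": 20, "dx": +1}
edge_84 = {"start_x": 100, "start_y": 20, "dx": -1}
```

At the triangle's base (scanline 90, i.e. 70 scanlines down), the two edges have diverged to the end points (170,90) and (30,90) given in the text.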
Returning to Fig. 2, having identified the data necessary to describe the graphic objects to the renderer, the graphics system 1 then generates a display list and estimates the render time required for the objects defined in the display list.
The display list generation and render time estimation 12 is preferably implemented as a software driver executing on the host processor 2 with attached ROM 6 and RAM 3. The display list generation 12 converts an object graphics description, expressed in any one or more of the well known graphic description languages, graphic library calls, or any other application specific format, into a display list. The display list is typically written into a display list store 13, generally formed within the rendering stores 30 but which may alternatively be formed within the RAM 4. As seen in Fig. 3, the display list store 13 can include a number of components, one being an instruction stream 14, another being edge information 15 and where appropriate, raster image pixel data 16.
The instruction stream 14 includes code interpretable as instructions to be read by the pixel sequential rendering apparatus 20 to render the specific graphic objects desired in any specific image. For the example of the image shown in Fig. 8, the instruction stream 14 could be of the form of:

render (nothing) to scan line 20;
at scan line 20, add two blue edges 82 and 84;
render to scan line 35;
at scan line 35, add two red edges 92 and 94; and
render to completion.
Similarly, the edge information 15 for the example of Fig. 8 may include the following:
(i) edge 84 commences at pixel position 100, edge 82 commences at pixel position 100;
(ii) edge 92 commences at pixel position 40, edge 94 commences at pixel position 160;
(iii) edge 84 runs for 70 scan lines, edge 82 runs for 70 scanlines;
(iv) edge 84 has slope -1, edge 82 has slope +1;
(v) edge 92 has slope 0, edge 94 has slope 0;
(vi) edges 92 and 94 each run for 70 scanlines.
It will be appreciated from the above example of the instruction stream 14 and edge information 15, and the manner in which each is expressed, that in the image 78 of Fig. 8 the pixel position and the scanline value define a single output space in which the image 78 is rendered. Other output space configurations, however, can be realised using the principles of the present disclosure.
Fig. 8 includes no raster image pixel data and hence none need be stored in the store portion 16 of the display list 13, although this feature will be described later.
The contents of the display list store 13 are read by a pixel sequential rendering apparatus 20, which is typically implemented as an integrated circuit. The pixel sequential rendering apparatus 20 converts the display list into a stream of raster pixels which are placed in a pixel output buffer 19. Pixels may be forwarded from the output buffer 19 to another device, for example, a printer, a display, or a memory store.

Although the preferred arrangement describes the pixel sequential rendering apparatus 20 as an integrated circuit, it may be implemented as an equivalent software module executing on a general purpose processing unit, such as the host processor 2.
If the estimated render time indicates that the rendering cannot be achieved in real time, then the intermediate render job is rendered by the renderer 3001, which sends the rendered pixels to the compressor 3000. The compressor 3000 compresses and stores the rendered data.

The decompressor 3002 reads the data stored by the compressor 3000 and decompresses the data to obtain the rendered pixels, which may be forwarded to the print engine of the printer. In one implementation, the renderer 3001, the compressor 3000 and the decompressor 3002 are software modules running on the host processor 2.
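The decision this section describes can be sketched as follows. The function name and return values are illustrative assumptions; only the branching rule (real-time render versus pre-render, compress and decompress) comes from the text.

```python
# Sketch of the render-time-estimation decision: a job whose estimated
# render time exceeds the print engine's real-time budget is routed
# through renderer 3001 -> compressor 3000 -> decompressor 3002 instead
# of being rendered in real time.

def plan_job(estimated_render_time, real_time_budget):
    """Decide how a render job should be supplied to the print engine."""
    if estimated_render_time <= real_time_budget:
        return "render in real time"
    # Too slow for the engine: render ahead of time and compress, then
    # decompress at engine speed when the page is printed.
    return "pre-render and compress"
```

The estimate itself is produced by the RTE module from the display list and instruction data, as described above.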
Fig. 3 shows the configuration of the pixel sequential rendering apparatus 20, the display list store 13 and the temporary rendering stores 30. The processing stages 22 of the pixel-sequential render apparatus 20 include an instruction executor 300, an edge processing module 400, a priority determination module 500, an optimisation module 550, a fill colour determination module 600, a pixel compositing module 700, and a pixel output module 800. The processing operations use the temporary stores 30 which, as noted above, may share the same device (eg. magnetic disk or semiconductor RAM) as the display list store 13, or may be implemented as individual stores for reasons of speed optimisation. The edge processing module 400 uses an edge record store 32 to hold edge information which is carried forward from scan-line to scan-line. The priority determination module 500 uses a priority properties and status table 34 to hold information about each priority, and the current state of each priority with respect to edge crossings while a scan-line is being rendered. The fill colour determination module 600 uses a fill data table 36 to hold information required to determine the fill colour of a particular priority at a particular position. The pixel compositing module 700 uses a pixel compositing stack 38 to hold intermediate results during the determination of an output pixel that requires the colours from multiple priorities to determine its value.

The display list store 13 and the other stores 32-38 detailed above may be implemented in RAM or any other data storage technology.
The processing steps shown in the arrangement of Fig. 3 take the form of a processing pipeline 22. In this case, the modules of the pipeline may execute simultaneously on different portions of image data in parallel, with messages passed between them as described below. In another arrangement, each message described below may take the form of a synchronous transfer of control to a downstream module, with upstream processing suspended until the downstream module completes the processing of the message.
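The synchronous message-passing variant described above can be sketched as follows: each message is a transfer of control to the downstream module, and the upstream module resumes only when downstream handling completes. The class and message contents are illustrative assumptions.

```python
# Sketch of the synchronous pipeline variant: handle() passes each message
# to the downstream module before returning, so upstream processing is
# effectively suspended until downstream processing of the message ends.

class Module:
    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.received = []

    def handle(self, message):
        self.received.append(message)
        if self.downstream is not None:
            self.downstream.handle(message)  # synchronous hand-off

# Three stages of the pipeline 22, wired back-to-front:
compositor = Module("pixel compositing 700")
priority = Module("priority determination 500", downstream=compositor)
edges = Module("edge processing 400", downstream=priority)

edges.handle({"type": "edge_crossing", "x": 40})
```

In the parallel variant the modules would instead run concurrently on different portions of the image data, with the same messages queued between them.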
3.1 Instruction Executor

The instruction executor 300 reads and processes instructions from the instruction stream 14 and formats the instructions into messages that are transferred via an output 398 to the other modules 400, 500, 550, 600 and 700 within the pipeline 22. In the preferred arrangement, the instruction stream 14 may include the following instructions:

LOAD_PRIORITY_PROPERTIES: This instruction is associated with data to be loaded into the priority properties and status table 34, and an address in that table to which the data is to be loaded. When this instruction is encountered by the instruction executor 300, the instruction executor 300 issues a message for the storage of the data in the specified location of the priority properties and status table 34. This may be accomplished by formatting a message containing this data and passing it down the processing pipeline 22 to the priority determination module 500, which performs the store operation.
LOAD NEW EDGES AND RENDER: This instruction is associated with an address in the display list store 13 of new edges 15 which are to be introduced into the rendering process when a next scanline is rendered. When this instruction is encountered by the instruction executor, the instruction executor 300 formats a message containing this data and passes it to the edge processing module 400. The edge processing module 400 store the address of the new edges in the edge record store 32. The edges at the specified address are sorted on their initial scanline intersection coordinate before the next scanline is rendered. In one arrangement, they are sorted by the display list generation process 12. In another arrangement, they are sorted by the pixel-sequential rendering apparatus SET SCANLINE LENGTH: This instruction is associated with a number of pixels which are to be produced in each rendered scanline. When this instruction is encountered by the instruction executor 300, the instruction executor 300 passes the value to the edge processing module 400 and the pixel compositing module 700.
SET_OPACITY_MODE: This instruction is associated with a flag which indicates whether pixel compositing operations will use an opacity channel (also known in the art as an alpha channel). When this instruction is encountered by the instruction executor 300, the instruction executor 300 passes the flag value to the pixel compositing module 700.

SET_BUF: This instruction sets the address of external memory buffers used by the pixel sequential rendering apparatus 20. Preferably, at least the input, output and spill buffers of the edge processing module 400 are stored in external memory.
The instruction executor 300 is typically formed by a microcode state machine that maps instructions and decodes them into pipeline operations for passing to the various modules. A corresponding software process may alternatively be used.
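The decode-and-route behaviour of the instruction executor can be sketched as a dispatch table. The routing table below is an assumption assembled from the instruction descriptions above; the real microcode state machine is not implied.

```python
# Sketch of the instruction executor as a decoder that maps each
# instruction to a message for the downstream module that handles it.

ROUTING = {
    "LOAD_PRIORITY_PROPERTIES": "priority determination module 500",
    "LOAD_FILL_DATA": "fill colour determination module 600",
    "LOAD_NEW_EDGES_AND_RENDER": "edge processing module 400",
}

def decode(instruction, payload):
    """Format an instruction and its data into a pipeline message."""
    target = ROUTING.get(instruction)
    if target is None:
        raise ValueError("unknown instruction: " + instruction)
    return {"target": target, "payload": payload}
```

Instructions such as SET_SCANLINE_LENGTH, which address more than one module, would map to a list of targets in the same fashion.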
3.2 Edge Processing Module

The operation of the edge processing module 400 during a scanline render operation will now be described with reference to Fig. 4. The initial condition for the rendering of a scanline is the availability of three lists of edge records. Any or all of these lists may be empty. These lists are a new edge list 402, obtained from the edge information 15 and which contains new edges as set by the LOAD_NEW_EDGES_AND_RENDER instruction, a main edge list 404 which contains edge records carried forward from the previous scanline, and a spill edge list 406 which also contains edge records carried forward from the previous scanline.
Turning now to Fig. 12A, there is shown the format of such an edge record, which may include:
(i) a current scanline intersection coordinate (referred to here as the X coordinate),
(ii) a count (referred to herein as NY) of how many scanlines a current segment of this edge will last for (in some arrangements this may be represented as a Y limit),
(iii) a value to be added to the X coordinate of this edge record after each scanline (referred to here as the DX),
(iv) a priority number or an index to a list of priority numbers,
(v) an address (addr) of a next edge segment in the list; and
(vi) a number of flags, marked p, o, u, c and d.
The flag d determines whether the edge affects the clipping counter or the fill counter. The flag u determines whether the fill counter is incremented or decremented by the edge. The remaining flags are not essential to the invention and will not be further described.
Such a format may accommodate vectors and orthogonally arranged edges. The format may also include a further parameter, herein called DDX, which is a value to be added to the DX of this edge record after each scanline. The latter enables the rendering of edges comprising quadratic curves. The addition of further parameters, DDDX for example, may allow such an arrangement to accommodate cubic curves. In some applications, such as cubic Bezier splines, a 6-order polynomial (ie: up to DDDDDDX) may be required. The u flag indicates whether a winding count is to be incremented or decremented by an edge. The winding count is stored in a fill counter and is used to determine whether a currently scanned pixel is inside or outside the object in question.
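The edge-record fields listed above can be sketched as a small structure. Field names follow the text (X, NY, DX, DDX, addr, and the u/d flags); the defaults and the `-1` "no next segment" sentinel are assumptions for illustration, and the exact layout in the apparatus is not implied.

```python
# Sketch of the edge record of Fig. 12A and of how the u flag drives the
# winding count held in a fill counter.

from dataclasses import dataclass

@dataclass
class EdgeRecord:
    x: float            # current scanline intersection coordinate
    ny: int             # scanlines the current segment will last for
    dx: float           # added to x after each scanline
    ddx: float = 0.0    # added to dx after each scanline (quadratic edges)
    priority: int = 0   # priority number or index to a list of priorities
    addr: int = -1      # address of the next segment; -1 for none (assumed)
    u: bool = True      # True: increment the winding count, False: decrement
    d: bool = False     # True: edge affects the clip counter, not fill

def apply_crossing(winding_count, edge):
    """Update a fill counter's winding count at an edge crossing."""
    return winding_count + (1 if edge.u else -1)
```

A pixel is inside the object while the winding count, combined with the object's fill rule, says so; the d flag diverts the same update to the clip counter instead.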
In the example of the edges 84 and 94 of Fig. 8A, the corresponding edge records at scanline 20 could read as shown in the Table of Fig. 8B.
In this description, coordinates which step from pixel to pixel along a scanline being generated by the rendering process will be referred to as X coordinates, and coordinates which step from scanline to scanline will be referred to as Y coordinates.
Preferably, each edge list contains zero or more records placed contiguously in memory. Other storage arrangements, including the use of pointer chains, are also possible. The records in each of the three lists 402, 404 and 406 are arranged in order of scanline intersection coordinate. This is typically obtained by a sorting process, initially managed by an edge input module 408 which receives messages, including edge information, from the instruction executor 300. It is possible to relax the sort to only regard the integral portion of each scanline intersection coordinate as significant. It is also possible to relax the sort further by only regarding each scanline intersection coordinate, clamped to the minimum and maximum X coordinates which are being produced by the current rendering process. Where appropriate, the edge input module 408 relays messages to modules 500, 600 and 700 downstream in the pipeline 22 via an output 498.
The edge input module 408 maintains references into and receives edge data from each of the three lists 402, 404, and 406. Each of these references is initialised to refer to the first edge in each list at the start of processing of a scanline. Thereafter, the edge input module 408 selects an edge record from one of the three referenced edge records such that the record selected is the one with the least X coordinate out of the three referenced records. If two or more of the X-records are equal, each is processed in any order and the corresponding edge crossings output in the following fashion. The reference, which was used to select that record, is then advanced to the next record in that list. The edge just selected is formatted into a message and sent to an edge update module 410. Also, certain fields of the edge, in particular the current X, the priority numbers, and the direction flag, are formatted into a message which is forwarded to the priority determination module 500 as an output 498 of the edge processing module 400.
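The selection rule described above amounts to a three-way merge of X-sorted lists: repeatedly take the least-X record from the heads of the new, main and spill lists. A sketch, with assumed record and list representations:

```python
# Sketch of the edge input module's merge: references into three X-sorted
# lists, always advancing the list whose head record has the least X.

def merge_edge_lists(new, main, spill):
    """Return edge records in non-decreasing X order from three sorted lists."""
    refs = {"new": 0, "main": 0, "spill": 0}
    lists = {"new": new, "main": main, "spill": spill}
    out = []
    while True:
        candidates = [
            (lists[k][refs[k]]["x"], k)
            for k in ("new", "main", "spill")
            if refs[k] < len(lists[k])
        ]
        if not candidates:
            return out
        _, k = min(candidates)          # least X; ties broken arbitrarily
        out.append(lists[k][refs[k]])
        refs[k] += 1

merged = merge_edge_lists(
    new=[{"x": 100}], main=[{"x": 40}, {"x": 160}], spill=[]
)
```

Equal X values may be taken in any order, matching the text's remark that tied records are processed in any order.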
Arrangements that use more or fewer lists than those described here are also possible.
583213 specification -29o Upon receipt of an edge, the edge update module 410 decrements the count of how many scanlines for which a current segment will last. If that count has reached zero,
O
Z a new segment is read from the address indicated by the next segment address. A N' segment preferably specifies: C 5 a value to add to the current X coordinate immediately the segment is read, C€3 (ii) a new DX value for the edge, (Ni (iii) a new DDX value for the edge, and (iv) a new count of how many scanlines for which the new segment will last.
(,i If there is no next segment available at the indicated address, no further processing is performed on that edge. Otherwise, the edge update module 410 calculates the X coordinate for the next scanline for the edge. This typically would involve taking the current X coordinate and adding to it the DX value. The DX may have the DDX value added to it, as appropriate for the type of edge being handled. The edge is then written into any available free slot in an edge pool 412, which is an array of two or more edge records. If there is no free slot, the edge update module 410 waits for a slot to become available. Once the edge record is written into the edge pool 412, the edge update module 410 signals via a line 416 to an edge output module 414 that a new edge has been added to the edge pool 412.
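The per-scanline update just described can be sketched as follows. The dictionary representation and the `segments` lookup table are assumptions; the four segment fields (i)-(iv) follow the list above, simplified to the vector/quadratic cases.

```python
# Sketch of the edge update step: decrement the segment's scanline count,
# load the next segment when it expires, otherwise advance X by DX (and
# DX by DDX for quadratic edges).

def update_edge(edge, segments):
    """Advance an edge by one scanline; return False when the edge ends."""
    edge["ny"] -= 1
    if edge["ny"] <= 0:
        nxt = segments.get(edge["addr"])
        if nxt is None:
            return False                 # no next segment: edge terminates
        edge["x"] += nxt["x_add"]        # (i) value added when segment is read
        edge["dx"] = nxt["dx"]           # (ii) new DX
        edge["ddx"] = nxt.get("ddx", 0)  # (iii) new DDX
        edge["ny"] = nxt["ny"]           # (iv) new scanline count
        edge["addr"] = nxt.get("addr")
        return True
    edge["x"] += edge["dx"]              # X for the next scanline
    edge["dx"] += edge["ddx"]            # slope update for quadratic edges
    return True
```

A terminated edge is simply dropped; a surviving edge would then be written into a free slot of the edge pool 412.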
As an initial condition for the rendering of a scanline, the edge output module 414 has references to each of a next main edge list 404' and a next spill edge list 406'. Each of these references is initialised to the location where the, initially empty, lists 404' and 406' may be built up. Upon receipt of the signal 416 indicating that an edge has been added to the edge pool 412, the edge output module 414 determines whether or not the edge just added has a lesser X coordinate than the edge last written to the next main edge list 404' (if any). If this is true, a "spill" is said to have occurred because the edge cannot be appended to the main edge list 404 without violating its ordering criteria. When a spill occurs, the edge is inserted into the next spill edge list 406', preferably in a manner that maintains a sorted next spill edge list 406'. For example, this may be achieved using an insertion sorting routine. In some arrangements the spills may be triggered by other conditions, such as excessively large X coordinates.
If the edge added to the edge pool 412 has an X coordinate greater than or equal to the edge last written to the next main edge list 404' (if any), and there are no free slots available in the edge pool 412, the edge output module 414 selects the edge from the edge pool 412 which has the least X coordinate, and appends that edge to the next main edge list 404', extending it in the process. The slot in the edge pool 412 that was occupied by that edge is then marked as free.
Once the edge input module 408 has read and forwarded all edges from all three of its input lists 402, 404 and 406, it formats a message which indicates that the end of scanline has been reached and sends the message to both the priority determination module 500 and the edge update module 410. Upon receipt of that message, the edge update module 410 waits for any processing it is currently performing to complete, then forwards the message to the edge output module 414. Upon receipt of the message, the edge output module 414 writes all remaining edge records from the edge pool 412 to the next main edge list 404' in X order. Then, the reference to the next main edge list 404' and the main edge list 404 are exchanged between the edge input module 408 and the edge output module 414, and a similar exchange is performed for the next spill edge list 406' and the spill edge list 406. In this way the initial conditions for the following scanline are established.
Rather than sorting the next spill edge list 406' upon insertion of edge records thereto, such edge records may be merely appended to the list 406', and the list 406' sorted at the end of the scanline and before the exchange to the current spill list 406 becomes active in edge rasterisation of the next scanline.
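The spill rule above can be sketched with edges reduced to their X coordinates: an edge that would break the next main list's ordering is diverted to a spill list kept sorted by insertion. An illustrative sketch only.

```python
# Sketch of the edge output module's spill handling: append to the next
# main list when order allows, otherwise insert into the next spill list,
# keeping the spill list sorted (the insertion-sort variant in the text).

import bisect

def emit_edge(x, next_main, next_spill):
    """Route an outgoing edge (by X) to the next main or spill list."""
    if next_main and x < next_main[-1]:
        bisect.insort(next_spill, x)  # spill: would violate main-list order
    else:
        next_main.append(x)

next_main, next_spill = [], []
for x in (40, 100, 60, 160):
    emit_edge(x, next_main, next_spill)
```

The alternative mentioned in the text appends spills unsorted and sorts the whole spill list once at the end of the scanline, before the list exchange.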
It can be deduced from the above that edge crossing messages are sent to the priority determination module 500 in scanline and pixel order (that is, they are ordered firstly on Y and then on X) and that each edge crossing message is labelled with the priority to which it applies.
Fig. 12A depicts a specific structure of an active edge record 418 that may be created by the edge processing module 400 when a segment of an edge is received. If the first segment of the edge is a step (orthogonal) segment, the X-value of the edge is added to a variable called "X-step" for the first segment to obtain the X position of the activated edge. Otherwise, the X-value of the edge is used. The X-step value is obtained from the segment data of the edge and is added once to the X-edge value of the next segment to obtain the X position of the edge record for that next segment. This means that the edges in the new edge record will be sorted by X-edge + X-step. The X-step of the first segment should, therefore, be zero, in order to simplify sorting the edges. The Y-value of the first segment is loaded into the NY field of the active edge record 418. The DX field of the active edge record is copied from the DX field identifier of vector or quadratic segments, and is set to zero for a step segment. A u-flag as seen in Fig. 12A is set if the segment is upwards heading (see the description relating to Fig. 13A). A d-flag is set when the edge is used as a direct clipping object, without an associated clipping level, and is applicable to closed curves. The actual priority level of the segment, or a level address, is copied from the corresponding field of the new edge record into a level field in the active edge record 418. The address of the next segment in the segment list is copied from the corresponding field of the new edge record into a segment address field (segment addr) of the active edge record 418. The segment address may also be used to indicate the termination of an edge record.
It will be appreciated from Fig. 12A that other data structures are also possible, and necessary for example where polynomial implementations are used. In one alternative data structure, the 'segment addr' field is either the address of the next segment in the segment list or copied from the segment's DDX value, if the segment is quadratic. In the latter case, the data structure has a q-flag which is set if the segment is a quadratic segment, and cleared otherwise. In a further variation, the segment address and the DDX field may be separated into different fields, and additional flags provided to meet alternate implementations.
Fig. 12B depicts the arrangement of the edge records described above in the preferred arrangement and used in the edge processing module 400. A new active edge record 428, a current active edge record 430 and a spill active edge record 432 supplement the edge pool 412. As seen in Fig. 12B, the records 402, 404, 406, 404' and 406' are dynamically variable in size depending upon the number of edges being rendered at any one time. Each record includes a limit value which, for the case of the new edge list 402, is determined by a SIZE value incorporated with the LOAD_NEW_EDGES_AND_RENDER instruction. When such an instruction is encountered, SIZE is checked and if non-zero, the address of the new edge record is loaded and a limit value is calculated which determines a limiting size for each of the lists 402, 404, 406, 404' and 406'.
Although the preferred arrangements utilize arrays and associated pointers for the handling of edge records, other implementations, such as linked lists for example, may be used. These other implementations may be hardware or software-based, or combinations thereof.
The specific rendering of the image 78 shown in Fig. 8A will now be described with reference to scanlines 34, 35 and 36 shown in Fig. 10. In this example, the calculation of the new X coordinate for the next scanline is omitted for the purposes of clarity, with Figs. 12C to 12I illustrating the output edge crossing being derived from one of the registers 428, 430 and 432 of the edge pool 412.
Fig. 12C illustrates the state of the lists noted above at the end of rendering scanline 34 (the top portion of the semi-transparent blue triangle 80). Note that in scanline 34 there are no new edges and hence the list 402 is empty. Each of the main edge list 404 and next main edge list 404' includes only the edges 82 and 84. Each of the lists includes a corresponding pointer 434, 436, and 440 which, on completion of scanline 34, points to the next vacant record in the corresponding list. Each list also includes a limit pointer 450, denoted by an asterisk, which is required to point to the end of the corresponding list. If linked lists were used, such pointers would not be required as linked lists include null pointer terminators that perform a corresponding function.
As noted above, at the commencement of each scanline, the next main edge list 404' and the main edge list 404 are swapped and new edges are received into the new edge list 402. The remaining lists are cleared and each of the pointers set to the first member of each list. For the commencement of scanline 35, the arrangement then appears as seen in Fig. 12D. As is apparent from Fig. 12D, the records include four active edges which, from Fig. 10, are seen to correspond to the edges 92, 94, 84 and 82.
Referring now to Fig. 12E, when rendering starts, the first segment of the new edge record 402 is loaded into an active edge record 428 and the first active edge records of the main edge list 404 and spill edge list 406 are copied to records 430 and 432 respectively. In this example, the spill edge list 406 is empty and hence no loading takes place. The X-positions of the edges within the records 428, 430 and 432 are then compared and an edge crossing is emitted for the edge with the smallest X-position. In this case, the emitted edge is that corresponding to the edge 92, which is output together with its priority value. The pointers 434, 436 and 438 are then updated to point to the next record in the list.

The edge for which the edge crossing was emitted is then updated (in this case by adding DX = 0 to its position), and buffered to the edge pool 412 which, in this example, is sized to retain three edge records. The next entry in the list from which the emitted edge arose (in this case list 402) is loaded into the corresponding record (in this case record 428). This is seen in Fig. 12F.
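The emit-and-update cycle described above may be sketched in software terms as follows. This is an illustrative model only: the pool size of three is taken from the example of Figs. 12E to 12I, the edge representation (X-position, priority, DX increment) is a simplification, and all function and variable names are assumed rather than taken from the specification.

```python
import heapq

POOL_SIZE = 3  # sized to retain three edge records, as in the example

def render_scanline(new_edges, main_edges, spill_edges):
    """Emit edge crossings in X order and build the next scanline's main list.

    Each edge is a tuple (x, priority, dx). The heads of the new, main and
    spill lists are compared; the edge with the smallest X is emitted, updated
    for the next scanline (x + dx) and buffered in a small edge pool. When the
    pool overflows, its smallest X is flushed to the next main edge list.
    """
    crossings = []   # (x, priority) edge crossing messages for this scanline
    pool = []        # min-heap on updated X, bounded by POOL_SIZE
    next_main = []   # output list for the next scanline

    lists = [iter(sorted(new_edges)), iter(main_edges), iter(spill_edges)]
    heads = []
    for i, it in enumerate(lists):
        e = next(it, None)
        if e is not None:
            heads.append((e, i))
    while heads:
        heads.sort(key=lambda t: t[0][0])          # smallest current X first
        (x, prio, dx), src = heads.pop(0)
        crossings.append((x, prio))                # emit the edge crossing
        heapq.heappush(pool, (x + dx, prio, dx))   # update edge, buffer in pool
        if len(pool) > POOL_SIZE:                  # pool full: flush smallest X
            next_main.append(heapq.heappop(pool))
        e = next(lists[src], None)                 # advance the source list
        if e is not None:
            heads.append((e, src))
    while pool:                                    # end of scanline: flush pool
        next_main.append(heapq.heappop(pool))
    return crossings, next_main
```

Because every update is buffered through the pool before reaching the next main list, the next scanline's edges emerge already ordered by X whenever the reordering fits within the pool.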
Further, as is apparent from Fig. 12F, a comparison between the registers 428, 430 and 432 again selects the edge with the least X-value, which is output as the appropriate next edge crossing (X=85). Again, the selected output edge is updated and added to the edge pool 412 and all the appropriate pointers incremented. In this case, the updated value is given by X = X + DX, which is evaluated as 84 = 85 - 1. Also, as seen, the new edge pointer 434 is moved, in this case, to the end of the new edge list 402.
In Fig. 12G, the next edge identified with the lowest current X-value is again that obtained from the register 430, which is output as an edge crossing (X=115, P=2).
Updating of the edge again occurs, with the updated value being added to the edge pool 412 as shown.
At this time, it is seen that the edge pool 412 is now full, and from it the edge with the smallest X-value is selected and emitted to the output list 404', and the corresponding limit pointer moved accordingly.
As seen in Fig. 12H, the next lowest edge crossing is that from the register 428, which is output (X=160). The edge pool 412 is again updated and the next smallest X-value emitted to the output list 404'.
At the end of scanline 35, and as seen in Fig. 12I, the contents of the edge pool 412 are flushed to the output list 404' in order of smallest X-value. As seen in Fig. 12J, the next main edge list 404' and the main edge list 404 are swapped by exchanging their pointers in anticipation of rendering the next scanline 36. After the swapping, it is seen from Fig. 12J that the contents of the main edge list 404 include all edges current on scanline 36 arranged in order of X-position, thereby permitting their convenient access, which facilitates fast rendering.
Ordinarily, new edges are received by the edge processing module 400 in order of increasing X-position. When a new edge arrives, its position is updated (calculated for the next scanline to be rendered) and this determines further action as follows: if the updated position is less than the last X-position output on the line 498, the new edge is insertion sorted into the main spill list 406 and the corresponding limit register updated; otherwise, if there is space, it is retained in the edge pool 412.
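The placement rule just described can be summarised with a short sketch. All names here are assumed for illustration; the real arrangement operates on full edge records rather than bare X-values.

```python
import bisect

def place_updated_edge(edge_x, last_output_x, pool, spill, next_main,
                       pool_size=3):
    """Route an edge whose X has just been updated for the next scanline.

    If the updated X has fallen behind the last X already output, ordering has
    been violated and the edge is insertion sorted into the spill list.
    Otherwise it is retained in the (bounded) edge pool; on overflow the
    smallest X is moved on to the next scanline's main list.
    """
    if edge_x < last_output_x:
        bisect.insort(spill, edge_x)      # insertion sorted into the spill list
        return "spill"
    bisect.insort(pool, edge_x)           # retained in the edge pool
    if len(pool) > pool_size:
        next_main.append(pool.pop(0))     # flush smallest X to the output list
    return "pool"
```

The spill list thus absorbs exactly those edges that a bounded pool alone could not keep in order.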
As is apparent from the foregoing, the edge pool 412 aids in the updating of the lists in an ordered manner in anticipation of rendering the next scanline in the rasterised image. Further, the size of the edge pool 412 may be varied to accommodate larger numbers of non-ordered edges. However, it will be appreciated that in practice the edge pool 412 will have a practical limit, generally dependent upon processing speed and available memory within the graphic processing system. In a limiting sense, the edge pool 412 may be omitted, which would ordinarily require the updated edges to be insertion sorted into the next output edge list 404'. However, in the preferred arrangement this situation is avoided as a normal occurrence through the use of the spill lists mentioned above. The provision of the spill lists allows the preferred arrangement to be implemented with an edge pool of practical size and yet handle relatively complex edge intersections without having to resort to software-intensive sorting procedures. In that small number of cases where the edge pool and spill list are together insufficient to accommodate the edge intersection complexity, sorting methods may be used.
An example of where the spill list procedure is utilised is seen in Fig. 14A, where three arbitrary edges 60, 61 and 63 intersect an arbitrary edge 62 at a relative position between scanlines A and B. Further, the actual displayed pixel locations 64 for each of scanlines A and B are shown, which span pixel locations C to J. In the above described example, where the edge pool 412 is sized to retain three edge records, it will be apparent that such an arrangement alone will not be sufficient to accommodate three edge intersections occurring between adjacent scanlines as illustrated in Fig. 14A.
Fig. 14B shows the state of the edge records after rendering the edges 60, 61 and 63 on scanline A. The edge crossing H is that most recently emitted, and the edge pool 412 is full with the updated X-values E, G and I for the edges 60, 61 and 63 respectively for the next scanline, scanline B. The edge 62 is loaded into the current active edge record 430 and, because the edge pool 412 is full, the lowest X-value, corresponding to the edge 60, is output to the output edge list 404'.
In Fig. 14C, the next edge crossing is emitted (X = J for edge 62) and the corresponding updated value determined, in this case X = C for scanline B. Because the new updated value X = C is less than the most recent value X = E copied to the output list 404', the current edge record and its corresponding new updated value are transferred directly to the output spill list 406'.
Fig. 14D shows the state of the edge records at the start of scanline B where it is seen that the main and output lists, and their corresponding spill components have been swapped. To determine the first emitted edge, the edge 60 is loaded into the current active edge register 430 and the edge 62 is loaded into the spill active edge register 432.
The X-values are compared and the edge 62 with the least X-value (X = C) is emitted, updated and loaded to the edge pool 412.
Edge emission and updating continues for the remaining edges in the main edge list 404 and, at the end of the scanline, the edge pool 412 is flushed to reveal the situation shown in Fig. 14E, where it is seen that each of the edges 60 to 63 is appropriately ordered for rendering on the next scanline, having been correctly emitted and rendered on scanline B.
As will be apparent from the foregoing, the spill lists provide for maintaining edge rasterisation order in the presence of complex edge crossing situations. Further, by virtue of the lists being dynamically variable in size, large changes in edge intersection numbers and complexity may be handled without the need to resort to sorting procedures in all but exceptionally complex edge intersections.
In the preferred arrangement the edge pool 412 is sized to retain eight edge records and the lists 404, 404' together with their associated spill lists 406, 406' have a base (minimum) size of 512 bytes which is dynamically variable thereby providing sufficient scope for handling large images with complex edge crossing requirements.
3.3 Priority Determination Module

The operation of the priority determination module 500 will now be described with reference to Fig. 5. The primary function of the priority determination module 500 is to determine those objects that make a contribution to a pixel currently being scanned, order those contributing objects in accordance with their priority levels, and generate colour composite messages for instructing the pixel compositing module 700 to composite the ordered objects to generate the required colour and opacity for the current pixel.
The priority determination module 500 receives incoming messages 498 from the edge processing module 400. These incoming messages may include load priority data messages, load fill data messages, edge crossing messages, and end of scanline messages. These messages first pass through a first-in first-out (FIFO) buffer 518 before being read by a priority update module 506. The FIFO 518 acts to de-couple the operation of the edge processing module 400 and the priority determination module 500. Preferably the FIFO 518 is sized to enable the receipt from the edge processing module 400, and transfer in a single action, of a full scanline of edge crossings. Such sizing permits the priority determination module 500 to correctly handle multiple edge crossings at the same pixel (X) location.
The priority determination module 500 is also adapted to access a priority state table 502, and a priority data table 504. These tables are used to hold information about each priority. Preferably, the priority state and priority data tables 502, 504 are combined into one table 34 as shown in Fig. 3. Alternatively these tables 502, 504 can be kept separate.
Preferably, the priority properties and status table 34 includes at least the following fields, as shown in Fig. 18, for each priority level:
(i) a fill-rule flag (FILL_RULE_IS_ODD_EVEN) which indicates whether this priority is to have its inside versus outside state determined by the application of the odd-even fill rule or the non-zero winding fill rule;
(ii) a fill counter (FILL_COUNT) for storing a current fill count which is modified in a manner indicated by the fill rule each time an edge affecting this priority is crossed;
(iii) a clipper flag (CLIPPER) which indicates whether this priority is to be used for clipping or filling;
(iv) a clip type flag (CLIP_OUT) which, for edges which have the clipper flag set, records whether the clipping type is a "clip-in" or a "clip-out";
(v) a clip counter (CLIP_COUNT) for storing a current clip count which is decremented and incremented when a clip-in type clip region affecting this priority is entered and exited respectively, and incremented and decremented when a clip-out type clip region affecting this priority is entered and exited respectively;
(vi) a flag (NEED_BELOW) which records whether this priority requires levels beneath it to be calculated first, referred to as the "need-below" flag;
(vii) a fill table address (FILL_INDEX), which points to an address where the fill of the priority is stored;
(viii) a fill type (FILL_TYPE);
(ix) a raster operation code (COLOUR_OP);
(x) an alpha channel operation code (ALPHA_OP) consisting of three flags (LAO_USE_D_OUT_S, LAO_USE_S_OUT_D and LAO_USE_S_ROP_D);
(xi) a stack operation code (STACK_OP);
(xii) a flag (X_INDEPENDENT) which records whether the colour of this priority is constant for a given Y, referred to here as the "x-independent" flag; and
(xiii) other information (ATTRIBUTES) of the priority.
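One record of the combined priority properties and status table 34 can be pictured as a simple structure over the fields listed above. The Python field names and default values below are illustrative assumptions only; the specification defines the fields, not this layout.

```python
from dataclasses import dataclass, field

@dataclass
class PriorityLevel:
    """Sketch of one record of the combined table 34 (names assumed)."""
    fill_rule_is_odd_even: bool = False  # (i) odd-even vs non-zero winding
    fill_count: int = 0                  # (ii) updated on each edge crossing
    clipper: bool = False                # (iii) clipping or filling level
    clip_out: bool = False               # (iv) "clip-out" vs "clip-in"
    clip_count: int = 0                  # (v) current clip count
    need_below: bool = True              # (vi) levels beneath must be composited
    fill_index: int = 0                  # (vii) address of the level's fill data
    fill_type: int = 0                   # (viii) fill type
    colour_op: int = 0                   # (ix) raster operation code
    alpha_op: int = 0                    # (x) alpha channel operation flags
    stack_op: int = 0                    # (xi) stack operation code
    x_independent: bool = False          # (xii) colour constant for a given Y
    attributes: dict = field(default_factory=dict)  # (xiii) other information
```

Consistent with the text below, the fill and clip counters start at zero and are changed only in response to edge crossing messages, while the remaining fields are loaded directly by driver-software instructions.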
Clipping objects are known in the art and act not to display a particular new object, but rather to modify the shape of another object in the image. Clipping objects can also be turned on and turned off to achieve a variety of visual effects. For example, the object 80 of Fig. 8A could be configured as a clipping object acting upon the object 90 to remove that portion of the object 90 that lies beneath the clipping object 80. This may have the effect of revealing any object or image beneath the object 90 and within the clipping boundaries that would otherwise be obscured by the opacity of the object 90. The CLIPPER flag is used to identify whether the priority is a clipping object. Also, the CLIP_OUT flag is used to determine whether the priority is a clip-in or a clip-out, and the CLIP_COUNT is used in a similar fashion to FILL_COUNT to determine whether the current pixel is within the clip region.
Figs. 13A and 13B demonstrate the application of the odd-even and non-zero winding rules for activating objects. The relevant rule to be used is determined by means of the fill-rule flag FILL_RULE_IS_ODD_EVEN.
For the purposes of the non-zero winding rule, Fig. 13A illustrates how the edges 71 and 72 of an object 70 are allocated a notional direction, according to whether the edges are downwards-heading or upwards-heading respectively. In order to form a closed boundary, edges link nose-to-tail around the boundary. The direction given to an edge for the purposes of the fill-rule (applied and described later) is independent of the order in which the segments are defined. Edge segments are defined in the order in which they are tracked, corresponding to the rendering direction.
Fig. 13B shows a single object (a pentagram) having two downwards-heading edges 73 and 76, and three upwards-heading edges 74, 75 and 77. The odd-even rule operates by simply toggling a Boolean value in the FILL_COUNT as each edge is crossed by the scanline in question, thus effectively turning on (activating) or turning off (de-activating) an object's colour. The non-zero winding rule increments and decrements a value stored in the fill counter FILL_COUNT dependent upon the direction of an edge being crossed. In Fig. 13B, the first two edges 73 and 76 encountered at the scanline are downwards-heading and thus traversal of those edges increments the fill counter, to +1 and +2 respectively. The next two edges 74 and 77 encountered by the scanline are upwards-heading and accordingly decrement the fill counter FILL_COUNT, to +1 and 0 respectively. The non-zero winding rule operates by turning on (activating) an object's colour when the fill counter FILL_COUNT is non-zero, and turning off (de-activating) the object's colour when the fill counter FILL_COUNT is zero.
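The two rules can be contrasted with a minimal sketch, using the crossing sequence of the pentagram scanline above (+1 for a downwards-heading edge, -1 for an upwards-heading edge). The helper function and its name are assumptions for illustration.

```python
def active_spans(directions, odd_even):
    """Return the object's active (filled) state after each edge crossing."""
    fill_count = 0
    states = []
    for d in directions:
        if odd_even:
            fill_count = 0 if fill_count else 1  # toggle a Boolean value
        else:
            fill_count += d                      # non-zero winding rule
        states.append(fill_count != 0)
    return states

# Crossing order for the pentagram scanline of Fig. 13B:
# edges 73 and 76 (downwards) then 74 and 77 (upwards).
crossings = [+1, +1, -1, -1]
```

Between the second and third crossings, the non-zero winding rule leaves the object active (fill count +2) while the odd-even rule turns it off, which is precisely the region where the two rules render the pentagram's centre differently.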
The NEED_BELOW flag for a priority is established by the driver software and is used to inform the pixel generating system that any active priorities beneath the priority in question do not contribute to the pixel value being rendered, unless the flag is set. The flag is cleared where appropriate to prevent extra compositing operations that would otherwise contribute nothing to the final pixel value.
The raster operation code (COLOUR_OP), alpha channel operation (ALPHA_OP) and stack operation (STACK_OP) together form the pixel operation (PIXEL_OP) that is to be performed by the pixel compositing module 700 on each pixel where the priority is active and exposed.
Preferably, most of the information contained in the combined table 34 is directly loaded by instructions from the driver software. In particular, the fill-rule flag, the clipper flag, the clip type flag, the need-below flag, fill table address, fill type, raster operation code, alpha channel operation code, stack operation code, x-independent flag, and other attributes may be handled in this manner. On the other hand, the fill counter and clip counter are initially zero and are changed by the priority determination module 500 in response to edge crossing messages.
The priority determination module 500 determines that a priority is active at a pixel if the pixel is inside the boundary edges which apply to the priority, according to the fill-rule for that priority, and the clip count for the priority. A priority is exposed if it is the uppermost active priority, or if all the active priorities above it have their corresponding need-below flags set. In this fashion, pixel values may be generated using only the fill data of the exposed priorities. It is important to note that an object's priority designates the location (viz level) of the object in the order of the objects from the rearmost object to the foremost object. Preferably, a number of non-overlapping objects that have the same fill and compositing operation, and that form a contiguous sequence, may be designated as having the same priority. This effectively saves memory space in the fill table. Furthermore, the corresponding edge records of objects need only reference the corresponding priority in order to reference the corresponding fill and compositing operation.
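The activity and exposure tests just described can be sketched as follows. This is an assumed software model: a level is active when its fill count is non-zero and its clip count is zero, and exposure is found by scanning from the topmost level down until an active level whose need-below flag is clear (i.e. an effectively opaque level) is reached. Names and the dictionary representation are illustrative.

```python
def exposed_levels(levels):
    """levels: list of dicts ordered from rearmost (bottom) to foremost (top).

    Returns the priorities whose fill data is needed for the pixel, in
    compositing order (bottom-most exposed level first).
    """
    exposed = []
    blocked = False  # set once a higher active level does not need levels below
    for lvl in reversed(levels):  # scan from the topmost level down
        active = lvl["fill_count"] != 0 and lvl["clip_count"] == 0
        if active and not blocked:
            exposed.append(lvl["priority"])
            if not lvl["need_below"]:  # opaque level: hides everything beneath
                blocked = True
    return list(reversed(exposed))
```

This mirrors the statement above that pixel values may be generated using only the fill data of the exposed priorities.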
Returning now to Fig. 5, the priority update module 506 maintains a counter 524 which records the scanline intersection coordinate up to which it has completed processing. This will be referred to as the current X of the priority update module 506.
The initial value at the start of a scanline is zero.
Upon examining an edge crossing message received at the head of the FIFO 518, the priority update module 506 compares the X intersection value in the edge crossing message with its current X. If the X intersection value in the edge crossing message is less than or equal to the current X, the priority update module 506 processes the edge crossing message. Edge crossing message processing comes in two forms: "normal edge processing" (described below) is used when the record in the priority state table 502 of the combined table 34 indicated by the priority in the edge crossing message has a clipper flag which indicates that this is not a clip priority; otherwise "clip edge processing" (described below) is performed.
"Normal edge processing" includes, for each priority in the edge crossing message and with reference to fields of the record of combined table 34 indicated by that priority, the steps of: noting the current fill count of the current priority; (ii) either: if the fill rule of the current priority is odd-even, setting the fill count to zero if it is currently non-zero, else setting it to any non-zero value, or 583213specification -43if the fill rule of the current priority is non-zero winding, incrementing or decrementing (depending on the edge direction flag) the fill count; and
O
Z iii) comparing the new fill count with the noted fill count and if one is zero and the other is non-zero performing an "active flag update" (described below) operation on the 5 current priority.
Some arrangements may use a separate edge crossing message for each priority rather than placing a plurality of priorities in each edge crossing message.
An active flag update operation includes first establishing a new active flag for the current priority. The active flag is non-zero if the fill count for the priority in the priority state table 502 is non-zero and the clip count for the priority is zero; else the active flag is zero. The second step in the active flag update operation is to store the determined active flag in an active flags array 508 at the position indicated by the current priority, then, if the need-below flag in the priority state table for the current priority is zero, also storing the active flag in an opaque active flags array 510 at the position indicated by the current priority.
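The active flag update operation can be sketched directly from the two steps above; the function and array names are assumed, not taken from the specification.

```python
def update_active_flag(priority, fill_count, clip_count, need_below,
                       active_flags, opaque_active_flags):
    """Recompute and store the active flag for one priority level.

    The flag is set when the fill count is non-zero and the clip count is
    zero. It is mirrored into the opaque active flags array only when the
    level's need-below flag is zero (i.e. levels beneath are not needed).
    """
    active = fill_count != 0 and clip_count == 0
    active_flags[priority] = active
    if not need_below:
        opaque_active_flags[priority] = active
```

Keeping the opaque flags in a separate array is what later lets a single priority-encoder pass find the highest opaque active level per pixel.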
"Clip edge processing" includes, with reference to fields of the priority state table record indicated by the first priority in the edge crossing message, the steps of: noting the current fill count of the current priority; (ii) either: if the fill rule of the current priority is odd-even, setting the fill count to zero if it is currently non-zero else setting it to any non-zero value, or if the fill rule of the current priority is non-zero winding, incrementing or decrementing (depending on the edge direction flag) the fill count; and (iii) comparing the new fill count with the noted fill count and determining a clip delta value of: 583213 specification -44- S(a) zero, if both the new fill count is zero and the noted fill count is zero, or both the new fill count is non-zero and the noted fill count is non-zero, 0 Z plus one, if the clip type flag of the current priority is clip-out and (,i the noted fill count is zero and the new fill count is non-zero, or the clip type flag of the c 5 current priority is clip-in and the noted fill count is non-zero and the new fill count is zero, or otherwise, minus one; and (iv) for every subsequent priority after the first in the edge crossing message, (,i add the determined clip delta value to the clip count in the record in the priority state stable indicated by that subsequent priority, and if the clip count either moved from nonzero to zero, or from zero to non-zero in that process, performing an active flag update operation as described above on that subsequent priority. It should be noted that the initial value of each clip count is set by the LOADPRIORITY PROPERTIES instruction described previously. The clip count is typically initialised to the number of clip-in priorities, which affect each priority.
Some arrangements do not associate a priority with a clip, but instead directly increment and decrement the clip count of all priorities given in the edge crossing message. This technique can be used, for example, when clip shapes are simple and do not require the application of a complex fill rule. In this specific application, the clip count of the level controlled by an edge is incremented for an upwards-heading edge or decremented for a downwards-heading edge. A simple closed curve, described anticlockwise, acts as a clip-in, whereas a simple closed curve, described clockwise, acts as a clip-out.
When the X intersection value in the edge crossing message is greater than the current X of the priority update module 506, the priority update module 506 forms a count of how many pixels to generate, being the difference between the X intersection value in the edge crossing message and the current X. This count is formatted into a priority generation message, which is sent via a connection 520 to a priority generation module 516. The priority update module 506 then waits for a signal 522 from the priority generation module 516 indicating that processing for the given number of pixels has completed. Upon receipt of the signal 522, the priority update module 506 sets its current X to the X intersection value in the edge crossing message and continues processing as described above.
Upon receipt of a priority generation message 520, the priority generation module 516 performs a "pixel priority generation operation" (described below) a number of times indicated by the count it has been supplied, whereupon it signals 522 the priority update module 506 that it has completed the operation.
Each pixel priority generation operation includes firstly using a priority encoder 514 (eg. a 4096 to 12 bit priority encoder) on the opaque active flags array 510 to determine the priority number of the highest opaque active flag. This priority (if any) is used to index the priority data table 504 and the contents of the record so referenced is formed into a fill priority message output 598 from the priority generation module 516 and sent to the fill colour determination module 600 via the optimisation module 550. Further, if a priority was determined by the previous step (ie. there was at least one opaque active flag set), the determined priority is held, and is referred to as the "current priority". If no priority was determined the current priority is set to zero. The priority generation module 516 then repeatedly uses a modified priority encoder 512 on the active flag array 508 to determine the lowest active flag which is greater than the current priority.
The priority so determined (if any) is used to index the priority determination table 34 and the contents of the record so referenced is formed into a fill priority message. This fill priority message is then sent 598 to the fill colour determination module 600 via the optimisation module 550, and the determined priority is used to update the current priority. This step is used repeatedly until there is no priority determined (that is, there is no priority flagged in the active flags which is greater than the current priority). Then the priority generation module 516 forms an end of pixel message and sends it to the fill colour determination module 600. The priority determination module 500 then proceeds to the next pixel to generate another series of fill priority messages in similar fashion.
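One pixel priority generation operation can be sketched as follows. The first loop plays the role of the priority encoder 514 (highest set opaque active flag); the second plays the role of the modified priority encoder 512 (lowest active flag greater than the current priority, applied repeatedly). All names are assumed.

```python
def fill_priorities_for_pixel(active, opaque_active):
    """active / opaque_active: boolean arrays indexed by priority level.

    Returns the priorities for which fill priority messages are generated,
    in the order they are emitted (bottom-most exposed level first).
    """
    messages = []
    current = 0
    # Priority encoder 514: highest opaque active flag, if any.
    for p in range(len(opaque_active) - 1, -1, -1):
        if opaque_active[p]:
            current = p
            messages.append(p)
            break
    # Modified priority encoder 512: repeatedly find the lowest active flag
    # greater than the current priority, until none remains.
    while True:
        nxt = next((p for p in range(current + 1, len(active)) if active[p]),
                   None)
        if nxt is None:
            break
        messages.append(nxt)
        current = nxt
    return messages
```

Everything below the highest opaque active level is skipped, which is how the opaque flags array cuts down the number of fill priority messages per pixel.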
Turning now to Fig. 22A, there is shown an example of such a series of fill priority messages 2200 generated by the priority determination module 500 for a single current pixel. As described above, these fill priority messages 2202 are first preceded by a START_OF_PIXEL command. The fill priority messages 2202 are then sent in priority order commencing with the lowest exposed active priority level. When there are no more fill priority messages 2202 for the current pixel, the priority determination module 500 then sends an END_OF_PIXEL message 2206.
Each one of these fill priority messages 2202 preferably includes at least the following fields:
(i) an identifier code FILL_PRTY 2204 for identifying the message as a fill priority message. This code also includes an index LEVEL_INDX to the corresponding record in the combined table 34, and also a code FIRST_PIXEL indicating whether or not this fill priority message belongs to a first pixel in a run of pixels having the same fill priority messages. The priority determination module 500 asserts the FIRST_PIXEL code for all those fill priority messages of a currently scanned pixel that is intersected by an edge as indicated by the edge crossing messages. The FIRST_PIXEL code is de-asserted for all fill priority messages of a currently scanned pixel if there are no edges intersecting that pixel as indicated by the edge crossing messages;
(ii) a fill table address FILL_INDEX;
(iii) a fill type FILL_TYPE;
(iv) a raster operation code COLOUR_OP;
(v) an alpha channel operation code ALPHA_OP;
(vi) a stack operation code STACK_OP; and
(vii) a flag X_IND which records whether the colour of this priority is constant for a given Y, referred to here as the "x-independent" flag. This flag is asserted when the colour for this priority is constant.
The values of fields (ii) to (vii) for the fill priority message are retrieved from the corresponding record in the combined table 34.
Preferably, the priority generation module 516 notes the value of the x-independent flag of each fill priority message that it forwards to the fill colour determination module 600 while it processes the first pixel of a sequence. If all the forwarded messages have the x-independent flag specified, all subsequent messages in the span of pixels between adjacent edge intersections can be replaced by a single repeat specification of count minus one. This is done by producing a repeat message and sending it to the fill colour determination module 600 in place of all further processing in this sequence. As will be recognised, if all the fill priority messages of a first pixel in a span of pixels between adjacent edges have their x-independent flag asserted, then the colour and opacity of the pixels in the span of pixels will be constant. Thus in these cases, the pixel compositing module 700 need only composite the first pixel in the span of pixels to generate the required constant colour and opacity and pass this onto the pixel output module 800. The generated repeat command then is passed to the pixel output module 800, which reproduces the constant colour and opacity for the subsequent pixels in the span of pixels from the colour and opacity of the first pixel. In this fashion, the number of compositing operations performed by the pixel compositing module 700 is reduced.
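The repeat optimisation can be summarised with a short sketch, assuming a simple message representation: when every fill priority message for the first pixel of a span is x-independent, the remaining pixels collapse to one repeat message of count minus one. Names and message tuples are illustrative.

```python
def messages_for_span(span_len, first_pixel_msgs):
    """first_pixel_msgs: list of (priority, x_independent) for the first pixel.

    Returns the stream sent downstream for a span of span_len pixels between
    adjacent edge intersections.
    """
    out = [("PIXEL", first_pixel_msgs)]            # composite the first pixel
    if all(x_ind for _, x_ind in first_pixel_msgs):
        if span_len > 1:
            out.append(("REPEAT", span_len - 1))   # repeat count minus one
    else:
        # x-dependent fills: every pixel must be composited individually.
        out.extend(("PIXEL", first_pixel_msgs) for _ in range(span_len - 1))
    return out
```

For a wide span of constant colour this reduces the compositing work from one operation per pixel to a single operation plus a repeat.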
As another preferred feature to the basic operation described above, the priority generation module 516 sends the highest opaque priority via the connection 522 to the priority update module 506 after each edge crossing message. The priority update module 506 holds this in a store 526. The priority update module 506 then, instead of a simple test that the X intersection in the message is greater than the current X, performs a test that the X intersection in the message is greater than the current X and that at least one of the levels in the message is greater than or equal to the highest opaque priority, before producing a fill priority message. By doing this, fewer pixel priority determination operations may be done and longer repeat sequences may be generated.
Using the example of the graphic objects shown in Figs. 8A, 9A and 9B, the priority update process described above can be illustrated, for scanline 35, using the edge crossings seen from Figs. 12C to 12J, as seen in Figs. 15A to 15E. Figs. 15A to 15E illustrate operation of the priority tables 502 and 504, which in the preferred arrangement are merged into a single combined table 34 (see Fig. 18), called a priority determination table 34, together with arrays 508, 510 and encoders 512 and 514.
As seen in Fig. 15A, edge crossing messages are received in order for a scanline from the edge processing module 400 and are loaded into the table 34, which is arranged in priority order. The edge crossing messages include, in this example, an incrementing direction according to the non-zero winding rule of the edge traversal. It is possible for no entries in the priority table 34 to be set.
The priority determination table 34 as illustrated includes column entries for fill count, which are determined from the edge according to the non-zero winding rule or, where appropriate, the odd-even rule. The need-below flag is a property of a priority and is set as part of the LOAD_PRIORITIES_PROPERTIES instruction. The need-below flag is set for all priority levels when the table 34 is loaded. Other columns such as "clip count" and "fill index table" may be used, but for this example are omitted for simplicity of explanation. Where no level is active the corresponding entries are set to zero. Further, the values of the arrays 510 and 508 are updated from the table 34 after receiving a subsequent edge crossing.
The first edge crossing for scanline 35 (Fig. 12E) is seen in Fig. 15A where for P=I, the fill count is updated to the value of the edge according to the non-zero winding rule. The "need-below" flag for this level has been set to zero by the driver software as the object in question is opaque.
Because a previous state of the table 34 was not set, the arrays 510 and 508 remain not set and the priority encoder 514 is disabled from outputting a priority. This is interpreted by the priority generation module 516, which outputs a count n=40 (pixels) for a "no object" priority, being the first, blank, portion of the scanline. Fig. 15B shows the arrangement when the edge crossing of Fig. 12F is received.
The fill count is updated. The arrays 510 and 508 are then set with the previous highest level from the table 34. At this time, the module 516 outputs a count n=45, P=1, representing the edge 96 of the opaque red object 90 before intersection with the semi-transparent triangle. Fig. 15C shows the arrangement when the edge crossing of Fig. 12G is received.
Note that the fill count has been adjusted downwardly because of the non-zero winding rule. Because the object that is valid prior to receiving the current edge crossing is not opaque, the modified priority encoder 512 is used to select the priority P=2 as the highest active level, which is output and is current for n=(115-85)=30 pixels.
Fig. 15D shows the arrangement when the edge crossing of Fig. 12H is received.
Note that the previously changed "need-below" for P=2 has been transferred to the active array 508, thus permitting the priority encoder to output a value P=1, current for n=(160-115)=45 pixels.
Fig. 15E shows the result when the edge crossing of Fig. 12I is received, providing for an output of P=0 for n=(180-160)=20 pixels.
As such, the priority module 500 outputs counts of pixels and corresponding priority display values for all pixels of a scanline.
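The run generation described above can be sketched in Python. This is a hypothetical model, not the hardware arrangement: the table 34, the arrays 508 and 510 and the encoders are collapsed into a simple dictionary of fill counts, opacity and need-below handling are omitted, and -1 stands in for the "no object" priority.

```python
def priority_runs(edge_crossings, width):
    """Convert a scanline's edge crossings into (pixel_count, priority) runs.

    edge_crossings: sorted list of (x, level, delta), delta being +1 or -1
    according to the non-zero winding rule.  A level is active while its
    fill count is non-zero; the highest active level is output for the run.
    """
    fill_count = {}          # level -> winding fill count
    runs, last_x = [], 0

    def top():
        active = [lvl for lvl, c in fill_count.items() if c != 0]
        return max(active) if active else -1   # -1 == "no object"

    for x, level, delta in edge_crossings:
        if x > last_x:
            runs.append((x - last_x, top()))
            last_x = x
        fill_count[level] = fill_count.get(level, 0) + delta
    if width > last_x:
        runs.append((width - last_x, top()))
    return runs
```

With edge crossings matching the scanline 35 example (edges at x=40, 85, 115 and 160 on a 180-pixel line), this yields the runs of 40, 45, 30, 45 and 20 pixels discussed above.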
3.4 Optimisation Module

The next module in the pipeline is the optimisation module 550. This module 550 looks for groups of instructions (viz fill priority messages) that can be combined into a single colour and instruction, which can be calculated once and stored into a register at the pixel compositing module 700 on the first pixel in a run of pixels. On subsequent pixels, the colour and instruction can be restored from the register, rather than being calculated each time. For example, in the situation where a resultant colour for a pixel is x-independent over a run of pixels, the optimisation circuit can send a REPEAT PIXEL message to the compositing module 700, which can restore the resultant colour for subsequent pixels. In this way, the optimisation module reduces the number of compositing operations being performed by the compositing module. In the circumstances where the first pixel in a run comprises an x-dependent pixel value at a particular level, the REPEAT PIXEL message cannot be used. However, the optimisation module is still able to minimise the number of compositing operations, as will be described below.
There are many other cases where one or more x-independent flags of the forwarded fill priority messages in a first pixel in a run of pixels between adjacent edges are not asserted. For example, one of the objects associated with a fill priority message may be a bitmap, and thus the colour and opacity vary over the run of pixels. In these cases the optimisation module 550 identifies groups of fill priority messages in the first pixel of the run of pixels that have the x-independent flags asserted and passes this information to the pixel compositing module, which then calculates their combined colour and opacity and stores it in a register. On subsequent pixels, the colour and opacity can be restored from the register rather than being calculated each time, thus leading to a reduction of compositing operations.
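The decision the optimisation module makes for each run can be sketched as follows. This is a simplified model: the real module also accounts for need-below and compositing order, which are ignored here, and the assumption that the cacheable group is a contiguous prefix of the message sequence (bottom compositing level first) is mine.

```python
def optimise_run(x_independent_flags):
    """Decide what can be cached for a run of pixels, given the x-independent
    flags of the fill priority messages for the run's first pixel."""
    n = 0
    for flag in x_independent_flags:
        if not flag:
            break
        n += 1
    if n and n == len(x_independent_flags):
        return ("REPEAT_PIXEL", n)    # whole resultant colour is x-independent
    return ("STORE_RESTORE", n)       # only the leading n levels can be cached
```

When every level is x-independent the whole resultant colour can be repeated; otherwise only the combined colour of the leading x-independent group is stored and restored.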
3.5 Fill Colour Determination Module

The operation of the fill colour determination module 600 will now be described with reference to Fig. 6. Incoming messages 598 from the priority determination module 500, which include set fill data messages, repeat messages, fill priority messages, end of pixel messages, and end of scanline messages, first pass to a fill lookup and control module 604. The fill lookup and control module 604 maintains a current X position counter 614 and a current Y position counter 616 for use by various components of the fill colour determination module 600.
Upon receipt of an end of scanline message, the fill lookup and control module 604 resets the current X counter 614 to zero and increments the current Y counter 616. The end of scanline message is then passed to the pixel compositing module 700.
Upon receipt of a set fill data message, the fill lookup and control module 604 stores the data in the specified location 602 of the fill data table 36.
Upon receipt of a repeat message, the fill lookup and control module 604 increments the current X counter 614 by the count from the repeat message. The repeat message is then passed to the pixel compositing module 700.
Upon receipt of an end of pixel message 2202, the fill lookup and control module 604 again increments the current X counter 614, and the end of pixel message is then passed to the pixel compositing module 700.
Upon receipt of a fill priority message, the fill lookup and control module 604 performs operations which include:
(i) the fill type from the fill priority message is used to select a record size in the table 36;
(ii) the fill table address from the fill priority message, and the record size as determined above, are used to select a record from the fill data table 36;
(iii) the fill type from the fill priority message is used to determine and select a sub-module to perform generation of the fill colour. The sub-modules may include a raster image module 606, a flat colour module 608, a linearly ramped colour module 610, and an opacity tile module 612;
(iv) the determined record is supplied to the selected sub-module 606-612;
(v) the selected sub-module 606-612 uses the supplied data to determine a colour and opacity value;
(vi) the determined colour and opacity is combined with remaining information from the fill colour message, namely the raster operation code, the alpha channel operation code, and the stack operation code, to form a colour composite message 2208, which is sent to the pixel compositing module 700 via the connection 698.
Thus, as shown in Figs. 22A and 22B, a message sequence 2200 starting with a start of pixel message 2201, then fill priority messages 2202 followed by an end of pixel message 2206, is transformed into a message sequence 2212 comprising a start of pixel message 2201, colour composite messages 2208 followed by an end of pixel message 2206. These colour composite messages 2208 preferably include the same fields as the fill priority messages 2202, with the following exceptions:
(i) a code CLR_CMP 2210 for identifying the message as a colour composite message. This CLR_CMP code also includes the index to the corresponding record in the combined table 34;
(ii) a colour and opacity field for containing the colour and opacity value of the priority. The latter replaces the fill index and fill type fields of the fill priority messages; and
(iii) STORE and RESTORE bits. These bits are added by the optimisation module 550 and will be discussed in some detail below.
In the preferred arrangement the determined colour and opacity is a red, green, blue and opacity quadruple with 8-bit precision in the usual manner giving 32 bits per pixel. However, a cyan, magenta, yellow and black quadruple with an implied opacity, or one of many other known colour representations may alternatively be used. The red, green, blue and opacity case is used in the description below, but the description may also be applied to other cases.
The operation of the raster image module 606, the flat colour module 608, the linearly ramped colour module 610, and the opacity tile module 612 will now be described.

The flat colour module 608 interprets the supplied record as a fixed format record containing three 8-bit colour components (typically interpreted as red, green and blue components) and an 8-bit opacity value (typically interpreted as a measure of the fraction of a pixel which is covered by the specified colour, where 0 means no coverage, that is complete transparency, and 255 means complete coverage, that is, completely opaque). This colour and opacity value is output directly via the connection 698 and forms the determined colour and opacity without further processing.
The linearly ramped colour module 610 interprets the supplied record as a fixed format record containing four sets of three constants, cx, cy, and d, being associated with the three colour and one opacity components. For each of these four sets, a result value r is computed by combining the three constants with the current X count, x, and the current Y count, y, using the formula:

    r = clamp(cx * x + cy * y + d)

where the function clamp is defined as:

    clamp(x) = 255         if 255 < x
             = floor(x)    if 0 <= x <= 255
             = 0           if x < 0

The four results so produced are formed into a colour and opacity value. This colour and opacity value is output directly via the connection 698 and forms the determined colour and opacity without further processing.
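The ramp evaluation can be expressed directly in Python as a sketch. The (cx, cy, d) constants follow the text; representing the record as a tuple of four triples is an assumption made for the example.

```python
def clamp(v):
    """clamp(v): 255 above the byte range, 0 below it, floor(v) within it."""
    if v > 255:
        return 255
    if v < 0:
        return 0
    return int(v)  # int() equals floor for non-negative v

def linear_ramp(record, x, y):
    """record: four (cx, cy, d) triples, one per colour component plus opacity.
    Returns the clamped (R, G, B, opacity) value at pixel position (x, y)."""
    return tuple(clamp(cx * x + cy * y + d) for cx, cy, d in record)
```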
The opacity tile module 612 interprets the supplied record as a fixed format record containing three 8-bit colour components, an 8-bit opacity value, an integer X phase (px), a Y phase (py), an X scale (sx), a Y scale (sy), and a 64 bit mask. These values originate in the display list generation and are typically contained in the original page description. A bit address, a, in the bit mask, is determined by the formula:

    a = ((x / 2^sx + px) mod 8) + ((y / 2^sy + py) mod 8) * 8

The bit at the address a in the bit mask is examined. If the examined bit is one, the colour and opacity from the record is copied directly to the output of the module 612 and forms the determined colour and opacity. If the examined bit is zero, a colour having three zero component values and a zero opacity value is formed and output as the determined colour and opacity.
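The bit-address calculation maps each pixel into an 8x8 tile mask. A sketch in Python, assuming integer coordinates, power-of-two scales applied as floor divisions (right shifts), and the 64-bit mask held as a Python integer; the dict layout of the record is hypothetical.

```python
def tile_bit_address(x, y, px, py, sx, sy):
    """a = ((x / 2**sx + px) mod 8) + ((y / 2**sy + py) mod 8) * 8"""
    return ((x >> sx) + px) % 8 + (((y >> sy) + py) % 8) * 8

def tile_colour(record, x, y):
    """Return the record's colour and opacity if the addressed mask bit is one,
    otherwise a fully transparent zero colour."""
    a = tile_bit_address(x, y, record["px"], record["py"],
                         record["sx"], record["sy"])
    if (record["mask"] >> a) & 1:
        return record["colour"] + (record["opacity"],)
    return (0, 0, 0, 0)
```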
The raster image module 606 interprets the supplied record as a fixed format record containing six constants, a, b, c, d, tx, and ty; an integer count of the number of bits (bpl) in each raster line of the raster image pixel data 16 to be sampled; and a pixel type. The pixel type indicates whether the pixel data 16 in the raster image pixel data is to be interpreted as one of:
(i) one bit per pixel black and white opaque pixels;
(ii) one bit per pixel opaque black or transparent pixels;
(iii) 8 bits per pixel grey scale opaque pixels;
(iv) 8 bits per pixel black opacity scale pixels;
(v) 24 bits per pixel opaque three colour component pixels; or
(vi) 32 bits per pixel three colour component plus opacity pixels.
Many other formats are possible.
The raster image module 606 uses the pixel type indicator to determine a pixel size (bpp) in bits. Then a bit address, a, in the raster image pixel data 16 is calculated having the formula:

    a = bpp * floor(a * x + c * y + tx) + bpl * floor(b * x + d * y + ty)

A pixel interpreted according to the pixel type from the record 602 is fetched from the calculated address in the raster image pixel data 16. The pixel is expanded as necessary to have three eight bit colour components and an eight bit opacity component.
(Ni By "expanded", it is meant for example, that a pixel from an eight bit per pixel grey scale opaque raster image would have the sampled eight bit value applied to each of the red, green and blue component, and the opacity component set to fully opaque. This then forms the determined colour and opacity output 698 to the pixel compositing module 700.
As a consequence, the raster pixel data valid within a displayable object is obtained through the determination of a mapping to the pixel image data within the memory 16. This effectively implements an affine transform of the raster pixel data into the object-based image and is more efficient than prior art methods which transfer pixel data from an image source to a frame store where compositing with graphic objects may occur.
As a preferred feature to the above, interpolation between pixels in the raster image pixel data 16 may optionally be performed by first calculating intermediate results p and q according to the formulae:

    p = a * x + c * y + tx
    q = b * x + d * y + ty

Next the bit addresses, a00, a01, a10, and a11, of four pixels in the raster image pixel data 16 are determined according to the formulae:

    a00 = bpp * floor(p) + bpl * floor(q)
    a01 = a00 + bpp
    a10 = a00 + bpl
    a11 = a00 + bpl + bpp

Next, a result pixel component value, r, is determined for each colour and opacity component according to the formula:

    r = interp(interp(get(a00), get(a01), p), interp(get(a10), get(a11), p), q)

where the function interp is defined as:

    interp(a, b, c) = a + (b - a) * (c - floor(c))

In the above equations, the representation floor(value) denotes a floor operation, which involves discarding the fractional part of the value.
The get function returns the value of the current pixel component sampled from the raster image pixel data 16 at the given bit address. Note that for some components of some image types this can be an implied value.
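The optional bilinear interpolation can be sketched as follows, with get supplied as a function from bit address to component value; passing get as a parameter is an assumption made so the sketch is self-contained.

```python
import math

def interp(a, b, c):
    """interp(a, b, c) = a + (b - a) * (c - floor(c))"""
    return a + (b - a) * (c - math.floor(c))

def sample_bilinear(get, bpp, bpl, p, q):
    """Interpolate one component between the four pixels surrounding the
    fractional position (p, q), addressed via bpp and bpl."""
    a00 = bpp * math.floor(p) + bpl * math.floor(q)
    a01, a10, a11 = a00 + bpp, a00 + bpl, a00 + bpl + bpp
    return interp(interp(get(a00), get(a01), p),
                  interp(get(a10), get(a11), p), q)
```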
As a preferred feature to the above, image tiling may optionally be performed by using x and y values in the above equations which are derived from the current X and Y counters 614, 616 by a modulus operation with a tile size read from the supplied record.
Many more such fill colour generation sub-modules are possible.
3.6 Pixel Compositing Module

The operation of the pixel compositing module 700 will now be described. The compositing module 700 accepts colour composite messages passed to it from the priority determination module 500, via the fill colour determination module 600, and performs the colour and opacity operation specified in the colour op and alpha fields of the message, and the stack operation specified in the stack operation field of the messages. The primary function of the pixel compositing module is to composite the colour and opacity of all those exposed object priorities that make an active contribution to the pixel currently being scanned.
Preferably, the pixel compositing module 700 implements a modified form of the compositing approach as described in "Compositing Digital Images", Porter, T; Duff, T; Computer Graphics, Vol 18 No 3 (1984), pp 253-259. Examples of Porter and Duff compositing operations are shown in Fig. 21. However, such an approach is deficient in that it only permits handling a source and destination colour in the intersection region formed by the composite, and as a consequence is unable to accommodate the influence of transparency outside the intersecting region. The preferred arrangement overcomes this by effectively padding the objects with completely transparent pixels. Thus the entire area becomes in effect the intersecting region, and reliable Porter and Duff compositing operations can be performed. This padding is achieved at the driver software level where additional transparent object priorities are added to the combined table. These Porter and Duff compositing operations are implemented utilising appropriate colour operations as will be described below in more detail with reference to Figs. 20A, 20B, and 19.
Preferably, the images to be composited are based on expression trees.
Expression trees are often used to describe the compositing operations required to form an image, and typically comprise a plurality of nodes including leaf nodes, unary nodes and binary nodes. A leaf node is the outermost node of an expression tree, has no descendent nodes and represents a primitive constituent of an image. Unary nodes represent an operation which modifies the pixel data coming out of the part of the tree below the unary operator. A binary node typically branches to left and right subtrees, wherein each subtree is itself an expression tree comprising at least one leaf node. An example of an expression tree is shown in Fig. 17C. The expression tree shown in Fig. 17C comprises four leaf nodes representing three objects A, B, and C, and the page. The expression tree of Fig. 17C also comprises binary nodes representing the Porter and Duff OVER operation. Thus the expression tree represents an image where the object A is composited OVER the object B, the result of which is then composited OVER object C, and the result of which is then composited OVER the page.
Turning now to Figs. 17A and 17B, there is shown a typical binary compositing operation in an expression tree. This binary operator operates on a source object (src) and a destination object (dest), where the source object src resides on the left branch and the destination object (dest) resides on the right branch of the expression tree. The binary operation is typically a Porter and Duff compositing operation. The area src ∩ dest represents the area on the page where the src and dest objects intersect (ie both are active), the area src ∩ ¬dest the area where only the src object is active, and the area ¬src ∩ dest the area where only the dest object is active.
The compositing operations of the expression tree are implemented by means of the pixel compositing stack 38, wherein the structure of the expression tree is implemented by means of appropriate stack operations on the pixel compositing stack 38.
Turning now to Fig. 23A, there is shown the pixel compositing module 700 in accordance with one arrangement in more detail. The pixel compositing module 700 receives incoming messages from the fill colour determination module 600. These incoming messages include repeat messages, series of colour composite messages (see Fig. 22B), end of pixel messages, and end of scanline messages, and are processed in sequence.
The pixel compositing module 700 comprises a decoder 2302 for decoding these incoming messages, and a compositor 2304 for compositing the colours and opacities contained in the incoming colour composite messages. The pixel compositing module 700 also comprises a stack controller 2306 for placing the resultant colours and opacities on a stack 38, and an output FIFO 702 for storing the resultant colour and opacity.
During the operation of the pixel compositing module 700, the decoder 2302, upon the receipt of a colour composite message, extracts the raster operation COLOUR_OP and alpha channel operation codes ALPHA_OP and passes them to the compositor 2304.
The decoder 2302 also extracts the stack operation STACK_OP and the colour and opacity values COLOUR, ALPHA of the colour composite message and passes them to the stack controller 2306. Typically, the pixel compositing module 700 combines the colour and opacity from the colour composite message with a colour and opacity popped from the pixel compositing stack 38, according to the raster operation and alpha channel operation from the colour composite message. It then pushes the result back onto the pixel compositing stack 38. More generally, the stack controller 2306 forms a source (src) and destination (dest) colour and opacity, according to the stack operation specified. If at this time, or during any pop of the pixel compositing stack, the pixel compositing stack 38 is found to be empty, an opaque white colour value is used without any error indication.
These source and destination colours and opacities are then made available to the compositor 2304, which then performs the compositing operation in accordance with the COLOUR_OP and ALPHA_OP codes. The resultant (result) colour and opacity is then made available to the stack controller 2306, which stores the result on the stack 38 in accordance with the STACK_OP code. These stack operations are described below in more detail.
During the operation of the pixel compositing module 700, if the decoder 2302 receives an end of pixel message, it then instructs the stack controller 2306 to pop a colour and opacity from the pixel compositing stack 38. If the stack 38 is empty, an opaque white value is used. The resultant colour and opacity is then formed into a pixel output message which is forwarded to the pixel output FIFO 702. If the decoder 2302 receives a repeat message or an end of scanline message, the decoder 2302 by-passes (not shown) the compositor 2304 and stack controller 2306 and forwards the messages to the pixel output FIFO 702 without further processing.
Figs. 24A, B, C, and D show the operations performed on the pixel compositing stack 38 for each of the various stack operation commands STACK_OP in the colour composite messages.
Fig. 24A shows the standard operation STD_OP 2350 on the pixel compositing stack 38. The source colour and opacity (src) is taken from the value in the current colour composite message for the current operation, and the destination colour and opacity (dest) is popped from the top of the pixel compositing stack 38. The result of the COLOUR_OP operation performed by the compositor 2304 is pushed back onto the stack 38.
Fig. 24B shows the NO_POP_DEST stack operation 2370 on the pixel compositing stack 38. The source colour and opacity (src) is taken from the value in the current colour composite message for the current operation, and the destination colour and opacity (dest) is read from the top of the stack 38. The result of the COLOUR_OP operation performed by the compositor 2304 is pushed onto the top of the stack 38.
Fig. 24C shows the POP_SRC stack operation, where the source colour and opacity are popped from the top of the stack, and the destination colour and opacity is popped from the next level down the stack. The result of the COLOUR_OP operation performed by the compositor 2304 is pushed onto the top of the stack.
Fig. 24D shows the KEEP_SRC stack operation, where the source colour and opacity are popped from the top of the stack, and the destination colour and opacity is popped from the next level down the stack. In this case, however, the source value is pushed back onto the stack before the result of the COLOUR_OP operation performed by the compositor 2304 is pushed onto the top of the stack.
Other stack operations can be used, without departing from the spirit of the invention.
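The four stack operations can be summarised in one sketch. The opaque-white default for an empty stack is as described above; KEEP_SRC pushing the source back beneath the result follows the STACK_KEEP_SRC behaviour described later in this section. Here colour_op(src, dest) stands for the combined COLOUR_OP/ALPHA_OP operation.

```python
OPAQUE_WHITE = (255, 255, 255, 255)

def apply_stack_op(stack, stack_op, src, colour_op):
    """Sketch of the STACK_OP variants on the pixel compositing stack.
    The top of the stack is the end of the list; src comes from the
    colour composite message except for POP_SRC/KEEP_SRC."""
    pop = lambda: stack.pop() if stack else OPAQUE_WHITE
    if stack_op == "STD_OP":            # dest popped from the top
        dest = pop()
    elif stack_op == "NO_POP_DEST":     # dest read but left on the stack
        dest = stack[-1] if stack else OPAQUE_WHITE
    else:                               # POP_SRC or KEEP_SRC
        src = pop()                     # src popped from the top
        dest = pop()                    # dest from the next level down
    if stack_op == "KEEP_SRC":
        stack.append(src)               # source retained beneath the result
    stack.append(colour_op(src, dest))
    return stack
```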
The manner in which the compositor 2304 combines the source (src) colour and opacity with the destination (dest) colour and opacity will now be described with reference to Figs. 7A to 7C. For the purposes of this description, colour and opacity values are considered to range from 0 to 1 (ie: normalised), although they are typically stored as 8-bit values in the range 0 to 255. For the purposes of compositing together two pixels, each pixel is regarded as being divided into two regions, one region being fully opaque and the other fully transparent, with the opacity value being an indication of the proportion of these two regions. Fig. 7A shows a source pixel 702 which has some three component colour value, not shown in the figure, and an opacity value, so. The shaded region of the source pixel 702 represents the fully opaque portion 704 of the pixel 702.
Similarly, the non-shaded region in Fig. 7A represents that proportion 706 of the source pixel 702 considered to be fully transparent. Fig. 7B shows a destination pixel 710 with some opacity value, do. The shaded region of the destination pixel 710 represents the fully opaque portion 712 of the pixel 710. Similarly, the pixel 710 has a fully transparent portion 714. The opaque regions of the source pixel 702 and destination pixel 710 are, for the purposes of the combination, considered to be orthogonal to each other. The overlay 716 of these two pixels is shown in Fig. 7C. Three regions of interest exist, which include a source outside destination 718 which has an area of so*(1-do), a source intersect destination 720 which has an area of so*do, and a destination outside source 722 which has an area of (1-so)*do. The colour value of each of these three regions is calculated conceptually independently. The source outside destination region 718 takes its colour directly from the source colour. The destination outside source region 722 takes its colour directly from the destination colour. The source intersect destination region 720 takes its colour from a combination of the source and destination colour.
The process of combining the source and destination colours, as distinct from the other operations discussed above, is termed a raster operation and is one of a set of functions as specified by the raster operation code from the pixel composite message.
Some of the raster operations included in the preferred arrangement are shown in Fig. 19.
Each function is applied to each pair of colour components of the source and destination colours to obtain a like component in the resultant colour. Many other functions are possible.
The alpha channel operation from the composite pixel message is also considered during the combination of the source and destination colour. The alpha channel operation is performed using three flags, LAO_USE_D_OUT_S, LAO_USE_S_OUT_D, and LAO_USE_S_ROP_D, which respectively identify the regions of interest (1-so)*do, so*(1-do), and so*do in the overlay 716 of the source pixel 702 and the destination pixel 710. For each of the regions, a region opacity value is formed, which is zero if the corresponding flag in the alpha channel operation is not set, else it is the area of the region.
The resultant opacity is formed from the sum of the region opacities. Each component of the result colour is then formed by the sum of the products of each pair of region colour and region opacity, divided by the resultant opacity.
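The region-based combination can be sketched for a single colour component as follows. Values are normalised to 0..1, colour_op stands for the raster operation applied in the intersection region, and the (colour, opacity) pair representation is an assumption made for the example.

```python
def alpha_composite(src, dest, colour_op,
                    use_d_out_s, use_s_out_d, use_s_rop_d):
    """src and dest are (colour, opacity) pairs; colour is one component for
    brevity.  Each region contributes its area as a region opacity only when
    its flag is set; the result colour is the opacity-weighted sum of the
    region colours, divided by the resultant opacity."""
    sc, so = src
    dc, do = dest
    regions = [
        (dc,                (1 - so) * do if use_d_out_s else 0.0),  # D out S
        (sc,                so * (1 - do) if use_s_out_d else 0.0),  # S out D
        (colour_op(sc, dc), so * do       if use_s_rop_d else 0.0),  # S rop D
    ]
    result_opacity = sum(area for _, area in regions)
    if result_opacity == 0.0:
        return (0.0, 0.0)
    colour = sum(c * area for c, area in regions) / result_opacity
    return (colour, result_opacity)
```

With all three flags set and colour_op returning the source component, this reproduces the Porter and Duff OVER result.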
As shown in Figs. 20A and 20B, the Porter and Duff operations may be formed by suitable ALPHA_OP flag combinations and raster operators COLOUR_OP, provided that both operands can be guaranteed to be active together. Because of the way the table is read, if one of the operands is not active, then the operator will either not be performed, or will be performed with the wrong operand. Thus objects that are to be combined using Porter and Duff operations must be padded out with transparent pixels to an area that covers both objects in the operation. Other transparency operations may be formed in the same way as the Porter and Duff operations, using different binary operators as the COLOUR_OP operation.
The resultant colour and opacity is passed to the stack controller circuit and pushed onto the pixel compositing stack 38. However, if the stack operation is STACK_KEEP_SRC, the source value is pushed onto the stack before the result of the colour composite message is pushed.
When an end of pixel message is encountered, the colour and opacity value on top of the stack is formed into a pixel output message, and sent to the Pixel Output module.
Repeat pixel messages are passed through the Pixel Compositing module to the Pixel Output module.
Preferably, the compositing module 700 includes the registers used for storing the intermediate results of optimised sequences, and accepts messages to store into, and restore from, these registers, according to messages issued by the optimisation circuit 550.
The optimisation module 550 is used in conjunction with the pixel compositing module 700 as shown in Fig. 23A. In this case, a colour composite message with the STORE bit set will cause the compositing module 700 (Fig. 23A) to store the value on top of the stack 38 into the register 2310, prior to performing the colour, opacity and stack operations specified in the colour composite message. Conversely, when a colour composite message with the RESTORE bit set is encountered, the value in this register 2310 is copied onto the top of the stack 38, prior to performing the colour, opacity and stack operations specified in the colour composite message.
3.7 Pixel Output Module

The operation of the pixel output module 800 will now be described. Incoming messages, which include pixel output messages, repeat messages, and end of scanline messages, are read from the pixel output FIFO and processed in sequence.
Upon receipt of a pixel output message the pixel output module 800 stores the pixel and also forwards the pixel to its output. Upon receipt of a repeat message the last stored pixel is forwarded to the output 898 as many times as specified by the count from the repeat message. Upon receipt of an end of scanline message the pixel output module 800 passes the message to its output.
The output 898 may connect as required to any device that utilizes pixel image data. Such devices include output devices such as video display units or printers, or memory storage devices such as hard disk, semiconductor RAM including line, band or frame stores, or a computer network. However, as will be apparent from the foregoing, a method and apparatus are described that provide for the rendering of graphic objects with full functionality demanded by sophisticated graphic description languages without a need for intermediate storage of pixel image data during the rendering process.
4. RENDER TIME ESTIMATION (RTE) SOFTWARE MODULE

As described above, the RTE module forms part of the driver software running on the host processor 2. The RTE module estimates the time required to render the display list that is generated by the driver software. In order to make an estimate of render time for a render job, the RTE software module uses a simplified model of the operation of the pixel sequential renderer 20. The following sections describe a model used by the RTE software module to represent how the renderer 20 and the print engine of the printer operate on the output buffer 19. In the described arrangement, the RTE software module models the output buffer 19 as a First In First Out (FIFO) buffer.
4.1 Output Buffer Model

Turning now to Fig. 26, there is shown a block diagram of the output buffer 19 of the rendering system of Fig. 2. The output buffer is preferably a FIFO output buffer 2600. The renderer 20 writes rendered data to the buffer 2600 and the print engine reads rendered data from an end of the buffer 2600.
In the described FIFO model, the granularity for writing is called a chunk. The granularity for reading can be the same as or smaller than the granularity for writing. The RTE software module models the FIFO such that a chunk of data must be filled up completely before it is available for reading.
As seen in Fig. 26, the contents of the buffer 2600 are divided into three parts. At the top of the buffer there is available data 2602, from which the print engine reads and pops data.
Below the available data 2602 there is a chunk 2604 that receives data from the renderer (or decompressor 3002). The remainder of the buffer 2600 not occupied by the available data 2602 or the chunk 2604 is the free buffer 2606.
The print engine reads pixel data from the top of the FIFO 2600 at a constant rate.
Data that has been read is popped off the FIFO 2600 and data beneath moves upward to replace the read data. The print engine can only read data that has been written to the FIFO 2600 and is available for reading, i.e. the available data 2602.
Before the print engine starts reading data from the buffer 2600, some head-start data is accumulated in the available data 2602. Once the print engine starts reading, the collection of available data 2602 (and its size) changes as data is popped off the buffer 2600 and new chunks of data are added to the available data 2602.
The renderer 20 or decompressor 3002 writes rendered pixel data into the chunk 2604 below the available data 2602 at a variable rate. In estimating the possibility of real time rendering, the RTE software module considers one chunk of data per iteration.
The RTE software module estimates the time for a particular chunk 2604 to be filled up. During this time, if the print engine reads no more data than is present in the available data 2602, then the chunk 2604 can be rendered and written to the output buffer 19 in time for printing. Otherwise, if the print engine reads more data during the estimated chunk render time than is present in the available data 2602, then the chunk 2604 cannot be rendered fast enough, and so the page cannot be printed as it is being rendered.
Once a chunk 2604 is filled up with data, the current iteration ends. The chunk is merged with the existing available data 2602 and becomes part of the available data for the next iteration. A part of the free buffer 2606 is reserved for the next chunk of data 2604. The size of the free buffer 2606 for the next iteration is also updated.
As the available data 2602 grows, so the free buffer 2606 shrinks. At the end of an iteration, if there is not enough free buffer 2606 to reserve a chunk 2604 for the next iteration, the modelled renderer stalls for one iteration and waits for more free buffer 2606 to become available. The next iteration starts when there is enough free buffer 2606 to store the next chunk of data 2604.
The process of modelled writing and reading data continues until the RTE software module reaches the end of the current page.
4.2 Printer Synchronisation

Preferably, the rendering system 1 controls synchronisation by telling the renderer to start rendering a page and then, after a head start time period, telling the print engine of the printer 10 to start reading and printing.
When modelling the rendering of a display list description, the RTE module calculates the amount of initial head-start data that is placed in the buffer 2600 during the head start time period. In the model used by the RTE, the renderer 20 has a head-start time to generate the initial head-start data added to the available data 2602. This head-start time is mainly dependent on:

• The residual data of the previous render job in the output buffer 19 (in the case where the printer 10 has been busy);
• The printer inter-page time gap between reading data from two consecutive pages (after reading data for one page, the printer does not immediately start reading data for the next page); and
• The printer cycle-up time (in the case where the printer has cycled down).
In one implementation of the RTE module, the head-start time is modelled by the inter-page time gap.
The remainder of section 4 describes how the RTE module estimates the time taken by the renderer 20 to render part of the intermediate render job, generating a chunk 2604 of rendered data in the output buffer 2600.
4.3 Nomenclature

The following acronyms are used in the variable names for the model of the renderer:

Acronym        Corresponding module
IE             Instruction Executor 300.
EM             Edge Processing Module 400.
EM-EF or EF    Edge Fetch sub-module (part of Edge Input Module 408).
EM-PF or PF    Edge Prefetch sub-module (part of Edge Input Module 408).
EM-MC or MC    Main Controller sub-module.
EM-UM or UM    Update sub-module (part of Edge Update Module 410).
EM-SM or SM    Sort sub-module (part of Edge Update Module 410).
EM-OM or OM    Edge Output Module 414.
LAM            Priority Determination Module 500.
LOM            Optimisation Module 550.
PGM            Fill Colour Determination Module 600.
PCM            Pixel Compositing Module 700.
POC            Pixel Output Module 800.
FG             Priority Generation Module 516.

In addition, a Band is a group of consecutive scanline(s) on a page defined by a LOAD_EDGES_AND_RENDER instruction and possibly by chunk boundaries.
On a scanline across a printed page, if the consecutive pixels between two edges are all the same, then the pixels are called x-independent pixels, and the series of such pixels is called an x-independent run. If the consecutive pixels between two edges are not all the same, then the pixels are called x-dependent pixels. An x-independent pixel run can be substituted by a PIXEL_REPEAT command.
In the described implementation, PIXEL_REPEAT commands are not stored in the output FIFO buffer 2600, so the number of scanlines in a chunk is given by B ÷ (S × Ppl), where:

B = the chunk size (in bytes);
S = the number of bytes to store a pixel in the chunk; and
Ppl = the number of pixels on a scanline.
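As an illustrative sketch of this calculation (the function and variable names are my own, not from the specification):

```python
def scanlines_per_chunk(chunk_size_bytes: int,
                        bytes_per_pixel: int,
                        pixels_per_scanline: int) -> int:
    """Number of whole scanlines that fit in one output-buffer chunk:
    B / (S * Ppl), truncated to an integer."""
    return chunk_size_bytes // (bytes_per_pixel * pixels_per_scanline)

# e.g. a 64 KB chunk, 4 bytes per pixel, 4096 pixels per scanline
print(scanlines_per_chunk(64 * 1024, 4, 4096))  # → 4
```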
4.4 Estimating Page Render Time

The flow chart of Fig. 28 illustrates the operation of the RTE module. The process commences 2802 with the first scanline on the page. Then, in step 2804, the RTE module initialises the estimate of the head start time, i.e. the time available for writing to the buffer 2600 before the print engine starts reading from the buffer 2600. As discussed above, the head start time is estimated to be the inter-page time gap. The head start time is then reduced by a fixed estimate of a typical latency in the rendering pipeline 22, i.e.
the minimum number of clock cycles before the pipeline 22 produces an output. In one implementation, the fixed estimate of the latency is 52 clock cycles. The estimate of the initial head start time is limited to be always greater than or equal to zero.
Step 2804 also initialises the amount of available data 2602 to zero and sets the amount of free buffer 2606 to the full size of the output buffer 2600.
Next, in step 2806, the RTE module estimates the time required to render the current chunk of scanlines, or the remaining scanlines on the page if the number of remaining scanlines is less than a chunk. A procedure for estimating the chunk render time is described in more detail below with reference to Fig. 29. In summary, the estimating step first calculates the render time for each of the time-intensive modules in the rendering pipeline 22, including memory access times. The pipelining and/or stalling effects in the render pipeline 22 are also considered. The time used to access memory via a Memory Request Arbiter (MRA) is modelled, as it is likely that pipeline modules will have to wait for access to memory 30 while other modules are active. The probability of delays depends on the activity of the modules.
Next, in step 2808, the RTE module checks whether the estimated chunk render time is less than or equal to the remaining head-start time. If so (the YES option of step 2808), process flow proceeds to step 2814. Thus, until the head-start time is over, the test of step 2810 is bypassed. Note that once the initial head start time has passed, step 2808 will always return a NO.
If the estimated chunk render time exceeds the head start time (the NO option of step 2808), then in step 2810 the RTE module calculates the amount of data the print engine reads from buffer 2600 in a time equal to the estimated chunk render time. The amount of data read by the print engine during the chunk render time is simply the print engine constant reading rate multiplied by the estimated chunk render time.
Step 2810 checks whether the calculated amount of data read by the print engine is greater than the modelled amount of available data 2602. If so (the YES option of step 2810), the RTE module determines 2812 that the page cannot be rendered in real time.
If sufficient available data 2602 is available for the print engine during the estimated chunk render time (the NO option of step 2810), then process flow proceeds to step 2814, which checks whether there are still scanlines on the page. If there are no more scanlines to render (the NO option of step 2814), then the RTE module determines 2816 that the page can be rendered in real time.
If there are still scanlines on the page (the YES option of step 2814), then in step 2818 the RTE module updates the modelled available data 2602, the head start time, and the new amount of free buffer 2606. The amount of available data 2602 is modelled as:

New amount of available data = (old amount of available data) + (chunk size) − (amount of data read by print engine during estimated chunk render time).

The chunk size is determined by the number of scanlines in the chunk, the number of pixels per scanline and the number of bytes required to store a pixel.

The modelled amount of free buffer 2606 is:

New amount of free buffer = (fixed total size of buffer 2600) − (new amount of available data).

Step 2818 also reduces any remaining head-start time by the estimated chunk render time, down to a minimum of zero.
Next, in step 2820, the RTE module checks whether there is enough free buffer 2606 for a chunk of new data 2604 to be added to buffer 2600. If there is enough space (the YES option of step 2820), then processing returns to step 2806 to estimate the render time of the next chunk.
If there is not enough space for a new chunk (the NO option of step 2820) then the RTE module assumes that the print engine reads while the renderer 20 waits. This is modelled in step 2822 by setting the head start time to zero, setting the free buffer 2606 to the chunk size and setting the available data 2602 to be:

New amount of available data = (total size of buffer 2600) − (chunk size).
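The per-iteration bookkeeping of Fig. 28 can be sketched as a small simulation. This is a simplified model under stated assumptions, not the patented implementation: the print engine is taken to read nothing while the head start lasts, the per-chunk render-time estimates are supplied by the caller, and all names are illustrative:

```python
def page_fits_realtime(chunk_times, chunk_size, buffer_size,
                       read_rate, head_start):
    """Model of the Fig. 28 loop.

    chunk_times -- estimated render time of each chunk (seconds)
    chunk_size  -- bytes per chunk
    buffer_size -- total size of the output FIFO 2600 (bytes)
    read_rate   -- constant print-engine reading rate (bytes/second)
    head_start  -- initial head-start time (seconds)
    Returns True if every chunk is ready before the engine needs it.
    """
    available = 0                      # modelled available data 2602
    free = buffer_size                 # modelled free buffer 2606
    for t in chunk_times:              # step 2806: one chunk per iteration
        if t > head_start:             # step 2808: head start over?
            read = read_rate * t       # step 2810: data the engine consumes
            if read > available:
                return False           # step 2812: not renderable in real time
        else:
            read = 0                   # engine not yet reading
        # step 2818: merge the chunk and account for engine reads
        available = available + chunk_size - read
        free = buffer_size - available
        head_start = max(0.0, head_start - t)
        if free < chunk_size:          # steps 2820/2822: renderer stalls
            head_start = 0.0
            available = buffer_size - chunk_size
            free = chunk_size
    return True                        # step 2816: page renders in real time
```

For example, with a 100-byte buffer, 10-byte chunks and a 1 byte/s engine, three 5-second chunks fit, while a 20-second second chunk does not.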
4.5 Estimating the chunk render time

Fig. 29 shows in more detail the step 2806 of estimating the render time required for a chunk. The estimate is based on a model of the operation of the render pipeline 22. The pipeline 22 is illustrated in Fig. 3.
An intermediate render job prepared by the display list generation module of the driver software consists of instructions for the renderer 20 to execute. There are many types of instructions. The instructions of the intermediate render job that are most relevant to the RTE module are the LOAD_EDGES_AND_RENDER instructions. Each LOAD_EDGES_AND_RENDER instruction specifies a number of scanlines to be rendered. A scanline is the collection of pixels horizontally across a printed page, with a height of one pixel. The Renderer 20 reads in edge records from memory and generates rendered data for the specified number of scanlines. Edge records in memory are updated as they are being used. An edge is described by a series of Segments. As an edge is updated from one scanline to the next scanline, the current Segment is tracked in y and x coordinates, and if the current Segment ends, the next Segment becomes current. The number of scanlines specified in different LOAD_EDGES_AND_RENDER instructions may be different. The collection of scanlines specified in each LOAD_EDGES_AND_RENDER instruction is called a band.
The step 2806 of estimating the chunk render time starts in step 2900 and enters a loop 2902 which processes each band in the chunk.
The chunks of data in the Output Buffer 2600 do not necessarily coincide with the bands of the LOAD_EDGES_AND_RENDER instructions. When the chunks are superimposed on the bands of a page, if there are multiple bands in a chunk, or if there are partial bands in a chunk, the RTE module estimates the render time for each band and each partial band in the chunk. The chunk render time equals the sum of these estimates.
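The superimposition of chunks onto bands can be sketched as follows (an illustrative helper, not part of the specification; all names are assumptions). Each chunk is decomposed into whole-band or partial-band pieces, and the chunk estimate is the sum of the per-piece estimates:

```python
def band_pieces_in_chunks(band_scanlines, chunk_scanlines):
    """band_scanlines: scanline count of each LOAD_EDGES_AND_RENDER
    band, in page order.  Yields, for each chunk, a list of
    (band_index, scanlines_of_that_band_in_this_chunk) pieces."""
    pieces, left = [], chunk_scanlines
    for i, n in enumerate(band_scanlines):
        while n > 0:
            take = min(n, left)        # whole band or partial band
            pieces.append((i, take))
            n -= take
            left -= take
            if left == 0:              # chunk boundary reached
                yield pieces
                pieces, left = [], chunk_scanlines
    if pieces:                         # trailing partial chunk at end of page
        yield pieces
```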
A band in a chunk is specified by a LOAD_EDGES_AND_RENDER instruction, or part of such an instruction if the instruction covers a chunk boundary and is divided by the chunk boundary.
The LOAD_EDGES_AND_RENDER instruction is interpreted by the Instruction Executor (IE) 300, which generates commands to send down to the next module 400 of the render pipeline 22 in the Renderer 20. The commands are executed by the destination module and more commands are generated for the next module 500. This process continues until the last module 800 writes rendered data to the Output Buffer 2600.
4.5.1 Estimating execution times for the Edge Processing Module

In step 2904 the RTE module estimates the execution times for each time intensive sub-module of the Edge Processing Module 400. The Instruction Executor 300, Level Optimisation Module 550 and Pixel Compositing Module 700 are sufficiently fast that they do not significantly affect the render time of the pipeline 22. Accordingly, the model of the render pipeline used by the RTE module does not estimate the render times of the Instruction Executor 300, the Level Optimisation Module 550 and the Pixel Compositing Module 700.
One RENDER command is generated by the Instruction Executor 300 from a LOAD_EDGES_AND_RENDER instruction. The Edge Module 400 consumes the RENDER command and generates EDGE_CROSSING commands. An EDGE_CROSSING command is generated for each intersection of an edge with a scanline, for each level controlled by the edge.
There are lower level sub-modules within the Edge Processing Module 400 (EM) itself. The sub-modules are the:

• Edge Fetch sub-module (EM-EF or EF);
• Edge Prefetch sub-module (EM-PF or PF);
• Main Controller sub-module (EM-MC or MC);
• Update Module sub-module (EM-UM or UM);
• Sort Module sub-module (EM-SM or SM); and
• Output Module (EM-OM or OM) 414.
With reference to Fig. 4, the Edge Fetch sub-module, the Edge Prefetch sub-module and the Main Controller sub-module are all implemented as part of the Edge Input Module 408. The Update Module sub-module and the Sort Module are implemented within the Edge Update Module 410.
The Memory Request Arbiter (MRA) is the interface between the renderer 20 and the memory 30. The Edge Fetch sub-module (EM-EF) reads Edge records from memory and stores the Edge records in a Cache within the Edge Input Module 408. The Prefetch sub-module (EM-PF) looks at the Edge records in the Cache, reads the associated levels and segments from memory and stores the levels and segments in the Cache. The Main Controller sub-module (EM-MC) reads Edge records and the associated levels and segments from the Cache. The EM-MC generates EDGE_CROSSING commands and sends such commands down the Level Bus 498 to the Priority Determination Module 500.
The Edge records already processed by the EM-MC are sent to the Update sub-module (EM-UM). The EM-UM performs edge tracking and updates the Edge records' y and x coordinates. The Sort sub-module (EM-SM) attempts to sort the updated Edge records into a Sort Buffer (i.e. the edge pool buffer 412), in increasing order of their new x coordinates. When the Sort Buffer 412 is full and another updated Edge record (the latest Edge record) must be sorted, there are two cases that may arise. If the latest Edge record has the lowest x coordinate compared to all Edge records in the Sort Buffer 412, the latest Edge record is a Spill Edge and must be inserted into an ordered list 406 of Spill Edges in memory 32. This Spill Sorting process involves reading individual Edge records from memory 32 and writing Edge records to memory 32, perhaps many times. The order of complexity of Spill Sorting is O(n²), where n is the number of Spill Edges in memory. If the latest Edge record does not have the lowest x coordinate, then the Edge record in the Sort Buffer 412 with the least x coordinate is written to memory 32 by the Edge Output Module 414. The Sort Buffer 412 now has room for the latest Edge to be sorted in. At the end of each scanline, the Sort Buffer 412 is also written to memory by the Edge Output Module 414.
Step 2904 estimates execution times for the EF, PF, MC, UM, SM and OM submodules, taking into account the probability of edge spills for estimating the execution time for spill sorting in the OM sub-module 414.
The RTE module keeps counts of the estimated memory accesses for each submodule.
4.5.1.1 Estimate the execution time for the Edge Fetch sub-module

The Edge Fetch sub-module (EM-EF) reads Edge records from memory and stores them in the Cache.
From the edge records in the LOAD_EDGES_AND_RENDER instruction for a current band, the RTE module estimates the amount of data transfer from memory required by the Edge Fetch sub-module. The reading time and the amount of data transfer depend on the edge type, but for the most common edge type, the reading time is approximately a fixed number of clock cycles, and 16 bytes are read from memory.
The EF sub-module time and data transfer are modelled by:

EF_time = (number of edges per scanline) × (number of scanlines in the band) × (reading time per edge, in clock cycles),

EF_read = (number of edges per scanline) × (number of scanlines in the band) × 16 bytes.
4.5.1.2 Estimate the execution time for the Edge Prefetch sub-module

The execution time and data transfer of the PF sub-module are modelled by:

PF_time = (number of edges in the band) × (number of scanlines in the band) × 3 clock cycles

PF_read = (number of scanlines in the band) ×
    ((number of edges with indirect segments × 8 bytes)
    + (number of edges with indirect levels × 8 bytes)
    + (width of bitmap in bytes × number of bitmap edges))

An indirect level is a level that is indirectly addressed, and thus requires more memory reads/writes than a directly addressed level.
4.5.1.3 Estimate the execution time for the Main Controller sub-module

The execution time and data transfer for the MC sub-module are modelled by:

MC_time = (number of scanlines in the band) ×
    ((number of edges in the band × 3 clock cycles)
    + ((number of edges with indirect levels + total number of levels in all edges) × 1 clock cycle))

MC_read = (number of scanlines in the band) ×
    Σ ((number of indirect levels in an edge − 4) × 2 bytes)

where the summation is over all edges with more than four indirect levels.
4.5.1.4 Estimate the execution time for the Update Module sub-module

The execution time of the Update sub-module for the current band is modelled as:

UM_time = (number of scanlines in the band) ×
    ((number of edges that are not bitmap edges × 6.5 clock cycles)
    + (number of bitmap edges × 3 clock cycles)
    + (number of indirect segments in all edges in the band × 1 clock cycle)).
4.5.1.5 Estimate the execution time for the Sort Module sub-module

In one implementation, there are two sorters operating simultaneously in the Edge Processing Module 400. Accordingly, the number of edges per sorter can be halved and the execution time is modelled as:

SM_time = (number of scanlines in the band) × (number of edges in the band ÷ 2) × 8.7 clock cycles.
4.5.1.6 Estimate the execution time for the Output Module sub-module

The output sub-module has two output buffers from which up to 64 edges may be flushed at the end of each scanline. The modelled OM measures include the time required for this flushing, and are given by:

OM_time = (number of scanlines in the band) × (number of edges in the band ÷ 2) × 1.7 clock cycles, and

OM_write = EF_read.
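The per-band formulas of sections 4.5.1.1 to 4.5.1.6 can be gathered into one sketch. The constants 16 bytes and 6.5, 8.7 and 1.7 clock cycles come from the text; the EF per-edge cycle count is missing from the source and is left as a parameter, and the grouping of terms is a reconstruction, so treat this as an illustration rather than the patented model (PF and MC are omitted for brevity):

```python
def edge_module_estimates(scanlines, edges_per_scanline,
                          bitmap_edges=0, indirect_segments=0,
                          ef_cycles_per_edge=10):
    """Per-band Edge Processing Module estimates (section 4.5.1).
    ef_cycles_per_edge is an assumed placeholder for the per-edge
    read time that is garbled in the source."""
    non_bitmap = edges_per_scanline - bitmap_edges
    return {
        # Edge Fetch: time and bytes read (16 bytes per edge record)
        "EF_time": edges_per_scanline * scanlines * ef_cycles_per_edge,
        "EF_read": edges_per_scanline * scanlines * 16,
        # Update sub-module: 6.5 cycles per ordinary edge, 3 per bitmap
        # edge, 1 per indirect segment
        "UM_time": scanlines * (non_bitmap * 6.5
                                + bitmap_edges * 3
                                + indirect_segments * 1),
        # Two sorters halve the edges handled by each (8.7 cycles/edge)
        "SM_time": scanlines * (edges_per_scanline / 2) * 8.7,
        # Output module flushing (1.7 cycles/edge per sorter)
        "OM_time": scanlines * (edges_per_scanline / 2) * 1.7,
    }
```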
4.5.1.7 Estimate the time required for spill edge sorting

To estimate the time required for the edge processing module 400 to complete spill edge sorting, it is necessary to estimate the number of spill edges on the scanlines of the current band. This may be done by estimating how much an edge zigzags across the page. When going from one scanline to the next, an edge has a horizontal displacement (in x-coordinates). For each edge in the band, the RTE module calculates the average positive displacement per scanline (positive dx) by:

positive dx = (Σ displacement of this edge to the right) ÷ (number of scanlines in the band)

where the summation is over all scanlines in the band.
If the edge does not have any displacement to the right, the average positive dx is set to zero. If an edge is a bitmap edge, the bitmap width is taken to be the horizontal displacement from one scanline to the next.
Similarly, each edge's negative displacement per scanline (negative dx) is calculated as:

negative dx = (Σ displacement of this edge to the left) ÷ (number of scanlines in the band)

where the summation is over all scanlines in the band. If the edge does not have any displacement to the left, the average negative dx is set to zero.
Let e = the number of edges being sorted on a scanline. There are two sorters, so e = (number of edges per scanline in the band) ÷ 2.
The value e needs only to be taken into account by the RTE module if e is greater than the number of edges that can be stored in the Sort Buffer 412. In the described implementation, the Sort Buffer can store 32 edges.
The RTE module selects a pair of edges with the maximum (dx2 + dx1) value, where:

dx2 = the average positive dx of the second edge (if this value does not exist, let dx2 be zero), and

dx1 = the average negative dx of the first edge (if this value does not exist, let dx1 be zero).

Without knowing the x positions of the edges, the RTE module assumes the edges are equally likely to be anywhere across the width of the page. The probability of the edges intersecting is approximately:

P_intersect = MIN(1, (dx1 + dx2) ÷ w)

where w = the width of the page, in number of pixels per scanline.
If the first Edge intersects with any 32 other Edges selected from the e edges, the first edge becomes a Spill Edge. The probability for the first Edge to become a Spill Edge can be estimated by:

P_spill = MIN(1, (e−1)C32 × (P_intersect)^32)

The notation nCr is the mathematical combination function, where nCr = n! ÷ (r! (n − r)!), which is also known as the binomial coefficient.

L, the expected (or average) number of spill edges in a scanline, is modelled as:

L = MIN(e − 32, e × P_spill).

The number of spill reads and writes can then be modelled as:

Spill_read = 2 × (number of scanlines in the band) × L(L−1)/2 × 32 bytes, and

Spill_write = 2 × (number of scanlines in the band) × L(L+1)/2 × 32 bytes.
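A sketch of the spill-edge probability model, assuming the reconstructed reading of the formulas above and a 32-entry Sort Buffer (function names are my own):

```python
import math

def expected_spill_edges(e, p_intersect, sort_buffer=32):
    """Expected spill edges L per scanline: an edge spills if it
    intersects `sort_buffer` of the other e-1 edges, so
    P_spill = min(1, C(e-1, 32) * p_intersect**32)."""
    if e <= sort_buffer:
        return 0.0                       # everything fits in the Sort Buffer
    p_spill = min(1.0, math.comb(e - 1, sort_buffer)
                       * p_intersect ** sort_buffer)
    # capped at e - sort_buffer, the most edges that can possibly spill
    return min(e - sort_buffer, e * p_spill)

def spill_traffic(scanlines, L):
    """Modelled spill read/write traffic in bytes for the band."""
    spill_read = 2 * scanlines * (L * (L - 1) / 2) * 32
    spill_write = 2 * scanlines * (L * (L + 1) / 2) * 32
    return spill_read, spill_write
```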
4.5.2 Estimating the execution times of remaining modules

In step 2906, the RTE module estimates the execution times of the remaining time-intensive modules in the pipeline 22. The modules of the renderer 20 have fast operations and slow operations. Fast operations generate a command for the next module in the pipeline very quickly. Accordingly, the model used by the RTE module has to account for 'fast' and 'slow' times. In particular, the total execution time of a module does not necessarily allow the RTE to estimate the stalling time of a module caused by another module performing slow operations. The pipeline model breaks up the running time of a module into two parts: the time when the module is performing fast operations and the time when the module is performing slow operations.

The RTE module keeps track of the number of estimated memory accesses.
4.5.2.1 Estimate the execution time for the Priority Determination Module (LAM)

For each EDGE_CROSSING command received from the Edge Processing Module 400, the Priority Determination Module 500 updates the level activation table 530 to activate or deactivate a level.
If the EDGE_CROSSING command has a greater x-value than the current x-position, the Priority Determination Module 500 generates REPEAT_FOR commands for the Optimisation Module 550. Each command includes a list of contributing levels and the number of pixels to repeat (this number can be 1).
The number of EDGE_CROSSING commands sent to the Priority Determination Module 500 can be estimated as follows:

n_EC_commands = (number of scanlines) × Σ (number of levels controlled by the edge)  [over all Open Page or GDI edges]
    + 2 × (number of scanlines) × Σ (number of levels controlled by the Postscript edge)  [over all Postscript edges]
    + Σ (bitmap height in scanlines × maximum number of edge crossings in the bitmap)  [over all bitmap edges]

The summations relate to different types of edges, for example edges provided to the driver software in a page description language such as PostScript or provided in function calls to a graphics interface such as Microsoft GDI.
When going through the list of edges to count the number of levels, the RTE module keeps track of the total number of unique levels being controlled by edges in the band, and each unique level's bounding box. These levels are kept in a list ordered in z-order, from top levels to bottom levels. The need for such a list arises because different edges can relate to the same level. Instead of reading the level information from memory each time a level is required, the Priority Determination Module 500 maintains a list of levels that have already been activated. Such an arrangement reduces the number of memory accesses. Edge crossing commands from the same edge are located at the same x-coordinate.
Let total_n_levels = the total number of unique levels being controlled by edges in the band.

Another useful measurement is the number of active edges on a scanline in the band, which can be modelled as:

n_active_edges = (number of scanlines) × (number of Open Page or GDI edges)
    + 2 × (number of scanlines) × (number of Postscript edges)
    + Σ (bitmap height in scanlines × maximum number of edge crossings in the bitmap)  [over all bitmap edges]

The execution time of the Priority Determination Module 500 (LAM) is modelled as:

LAM_time = (number of scanlines) × 1 clock cycle + n_EC_commands × 12 clock cycles
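A minimal sketch of the n_EC_commands and LAM_time estimates (the operator placement is reconstructed from a garbled source, so treat it as an assumption; the names are illustrative):

```python
def lam_estimates(scanlines, gdi_levels_per_edge, ps_levels_per_edge,
                  bitmap_edges=()):
    """gdi_levels_per_edge / ps_levels_per_edge: levels controlled by
    each Open Page/GDI edge and each Postscript edge respectively;
    bitmap_edges: (height_in_scanlines, max_crossings) pairs."""
    n_ec = (scanlines * sum(gdi_levels_per_edge)
            + 2 * scanlines * sum(ps_levels_per_edge)      # Postscript edges weighted x2
            + sum(h * c for h, c in bitmap_edges))
    lam_time = scanlines * 1 + n_ec * 12                   # clock cycles
    return n_ec, lam_time
```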
O
IF total_n_levels > 512:
    LAM_LC_read = n_EC_commands × (1 − 512/total_n_levels) × 32 bytes
    LAM_LC_write = n_EC_commands × (1 − 512/total_n_levels) × 32 bytes
    IF total_n_levels > 4096:
        LAM_SC_read = n_EC_commands × (1 − 4096/total_n_levels) × 32 bytes
        LAM_SC_write = n_EC_commands × (1 − 4096/total_n_levels) × 32 bytes
    ELSE
        LAM_SC_read = 0
        LAM_SC_write = 0
    ENDIF
ELSE
    LAM_LC_read = 0
    LAM_LC_write = 0
ENDIF
Clip levels are ignored by the RTE module. Clip levels are never active. They can only de-activate other levels, hence reducing the workload for the Priority Generation Module 516. The number of direct fill colours is assumed to be small and direct fill colours are treated as normal fills.
583213specification -84- 8 The RTE module estimates the Priority Generation Module 516 execution time by considering the object levels (ie. not clip levels) controlled by all edges. For efficiency,
O
Z the RTE module keeps a list of unique object levels controlled by the edges being N' considered, ordered in z-order, from top levels to bottom levels. This list saves the RTE rn, 5 from searching the Level Table regularly, which is a time consuming activity. As described above, the Priority Determination Module 500 keeps a similar list, called (Ni N Summary Tables, when rendering the page. The RTE sets up the list before a set of Sscanlines is considered in order to estimate the render time. The RTE module sets up the (,i list by going through all edges and their levels, adding pointers of unique object levels to the list.
Each active level is regarded as an object and the RTE module models an object by the level's bounding box since the actual shape of objects is not easily taken into account in the RTE model. Since an estimate is made for a set of scanlines, the RTE module uses only the part of the area of the level that lies on the current set of scanlines. The area to be considered is obtained from the level's bounding box.
The pixel-sequential renderer 20 does not render objects or portions of objects that are obscured by opaque objects within a band. As this feature of the renderer reduces the render time, the feature must be considered when estimating the render time of the Priority Determination Module 500. Fig. 30 illustrates a procedure for estimating the render time, taking into account the manner in which objects overlap.
In the following description, a distinction is drawn between 'territories' and 'level areas'. The distinction may be illustrated with reference to the objects in Fig. 8A, which shows a red opaque rectangle 90 and a blue semi-transparent triangle 80. Each object 80, 90 has a level area measured in pixels that describes the area of the object.
However, the triangle 80 is superimposed on the rectangle 90 and thus the total area 583213specification occupied by the overall object is less than the sum of individual areas of objects 80 and The territory of the overall object is defined by the number of pixels in the union of
O
Sobjects 80 and The RTE uses rectangular bounding boxes to approximate levels. Thus, when considered by the RTE, the rectangle 90 and triangle 80 are approximated by bounding boxes, which are exact for the rectangle SIn modelling the render time of the priority determination module 500, the RTE 0 module needs to estimate the following x_dependentterritory, the x-dependent pixels on the printed chunk, regardless of how many levels are overlapped; and x_dependentlevel_area and xindependentlevelarea, which are estimates of the x-dependent and x-independent pixels receiving contributions from different levels on the chunk.
In calculating level areas, the levels may be imagined to be spread out so as to be non-overlapping. To estimate these variables, the RTE module goes through the list of active levels from top levels to bottom levels, accumulating some values as each level is considered.
The process starts in step 3010, in which the RTE module initialises variables, using a data structure that represents a set of polygons with vertical and horizontal sides. This data structure is used to represent the following variables, which are initialised to be empty sets: x_independent opaque_territory, xindependent transparent territory, x_dependent_opaqueterritory, and x_dependenttransparentterritory.
583213 specification -86- O These intermediate variables represent different types of levels on the current chunk. The territories are non-overlapping.
0 Z Another variable initialised in step 3010 is n_x_dependent_levels, which is the c number of levels that potentially contribute to the x_dependentlevel_area.
S 5 nx dependent_levels is initialised to zero.
xindependenttransparent_level_area is another intermediate variable initialised to zero. This area can potentially become x-dependent if there is an x-dependent level Sbelow it.
Further variables initialised to zero in step 3010 are x_dependent_territory, xdependent_level_area and xindependent_level_area.
Next, in step 3010, the RTE module enters a loop 3012 that considers each unique level in turn. As each level from the list of unique level is considered, step 3014 checks whether the current level's bounding box intersects any of the following territories: x_independent opaque_territory, x_independent_transparent_territory, x_dependent_opaque_territory, and x_dependent transparent_territory.
If there is no intersection (the NO option of step 3014), then in step 3016 the parts of the current level that do not intersect with any of the above territories are added to the appropriate territory as determined by the type of the current level.
If there is an intersection (the YES option of step 3014), then in step 3018 the RTE module adds any non-intersecting portion of the current level to the appropriate territory.
Then, in step 3020, the part of the current level that intersects one of the above territories is considered, and is added to the appropriate territory in accordance with one of the sixteen scenarios of Figs. 27A to 27P.
583213specification -87- In the case where a new x-independent opaque level 2701 is under the x_independent_opaque_territory 2700, the territory 2702 is unchanged by the intersection
O
Z with the new level 2701 (Fig. 27A).
(N
Where a new x-independent transparent level 2704 is under the c 5 x_independent_opaque_territory 2703, the intersection 2706 is taken away from the x_independentopaque_territory to form the updated x-independent opaque territory c 2705. The intersection 2706 is added to the x_independenttransparent_territory (Fig.
O 27B).
In the case where a new x-dependent opaque level 2708 is under the x_independent_opaque_territory 2707, the intersection 2710 is taken away from the x_independent_opaque_territory, yielding the updated x_independent_opaque_territory 2709. The intersection 2710 is added to the x_dependent_opaque_territory (Fig. 27C).
Where a new x-dependent transparent level 2712 is under the x_independent_opaque_territory 2711, the intersection 2714 is taken away from the x_independent_opaque_territory, forming the updated x_independent_opaque_territory 2713. The intersection 2714 is added to the x_dependent_transparent_territory (Fig. 27D).
Where a new x-independent opaque level 2716 is under the x_independent_transparent_territory 2715, the intersection 2718 is taken away from the x_independent_transparent_territory to give updated x_independent_transparent_territory 2717. The intersection 2718 is added to the x_independent_opaque_territory. The area of the intersection is also accumulated to the x_independent_transparent_level_area 2721.
The variable n_x_dependent_levels is increased by one in this case (Fig. 27E).
Where a new x-independent transparent level 2720 is under the x_independent_transparent_territory 2719, the territory 2719 is unchanged by the intersection. The area of the intersection 2722 is accumulated to the x_independent_transparent_level_area, and n_x_dependent_levels is increased by one (Fig. 27F).
Where a new x-dependent opaque level 2724 is under the x_independent_transparent_territory 2723, the intersection is taken away from the x_independent_transparent_territory 2723 (giving territory 2725). The intersection is added to the x_dependent_opaque_territory 2726, and the area of the intersection is accumulated to the x_independent_transparent_level_area 2727. In this case n_x_dependent_levels is increased by one (Fig. 27G).
Where a new x-dependent transparent level 2729 is under the x_independent_transparent_territory 2728, the intersection is taken away from the x_independent_transparent_territory to give the updated x_independent_transparent_territory 2730. The intersection is added to the x_dependent_transparent_territory 2731. The area of the intersection is also accumulated to the x_independent_transparent_level_area 2732, and n_x_dependent_levels is increased by one (Fig. 27H).
Where a new x-independent opaque level 2734 is under the x_dependent_opaque_territory 2733, the territory 2733 is unchanged by the intersection (Fig. 27I).
Where a new x-independent transparent level 2737 is under the x_dependent_opaque_territory 2736, the intersection is taken away from the territory 2736 to give updated x_dependent_opaque_territory 2738. The intersection is added to the x_independent_transparent_territory 2739. The area of the intersection is accumulated to the x_dependent_level_area 2740 and n_x_dependent_levels is increased by one (Fig. 27J).
Where a new x-dependent opaque level 2742 is under the x_dependent_opaque_territory 2741, the territory 2741 is unchanged by the intersection (Fig. 27K).
Where a new x-dependent transparent level 2745 is under the x_dependent_opaque_territory 2744, the intersection is taken away from territory 2744 to give the updated x_dependent_opaque_territory 2746. The intersection is added to the x_dependent_transparent_territory 2747 (Fig. 27L).
Where a new x-independent opaque level 2749 is under the x_dependent_transparent_territory 2748, the territory 2748 is unchanged by the intersection. The area of the intersection is accumulated to the x_dependent_level_area 2751 and n_x_dependent_levels is increased by one (Fig. 27M).
Where a new x-independent transparent level 2753 is under the x_dependent_transparent_territory 2752, the territory 2752 is unchanged by the intersection. The area of the intersection is accumulated to the x_dependent_level_area 2755 and n_x_dependent_levels is increased by one (Fig. 27N).
Where a new x-dependent opaque level 2757 is under the x_dependent_transparent_territory 2756, the intersection is taken away from the territory 2756 to yield the updated x_dependent_transparent_territory 2758. The intersection is added to the x_dependent_opaque_territory 2759. The area of the intersection is also accumulated to the x_dependent_level_area 2760 and n_x_dependent_levels is increased by one (Fig. 27O).
Where a new x-dependent transparent level 2762 is under the x_dependent_transparent_territory 2761, the territory 2761 is unchanged by the intersection. The area of the intersection is accumulated to the x_dependent_level_area 2764 and n_x_dependent_levels is increased by one (Fig. 27P).
The rectangular shapes seen in Figs. 27A-P are chosen for illustrative purposes. In practice the territories, which are approximated by a collection of polygons with vertical and horizontal edges, may have different shapes. Figs. 27A to P do not indicate the allocation of any non-intersecting portions of the new level under consideration.
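The sixteen allocation rules of Figs. 27A to 27P can be captured in a single lookup table. The sketch below (Python; the constant and function names are mine, and the polygon bookkeeping, area tallies and the n_x_dependent_levels counter are deliberately omitted) records, for each (existing territory, new level) pair, the territory that receives the intersecting portion:

```python
# The four territory/level types used throughout the RTE model.
IO = "x_independent_opaque"
IT = "x_independent_transparent"
DO = "x_dependent_opaque"
DT = "x_dependent_transparent"

# Destination of the intersection for each (existing territory, new level)
# pair, transcribed from Figs. 27A to 27P.  Where the destination equals
# the existing territory, the territory is unchanged by the intersection.
INTERSECTION_DEST = {
    (IO, IO): IO,  # Fig. 27A (unchanged)
    (IO, IT): IT,  # Fig. 27B
    (IO, DO): DO,  # Fig. 27C
    (IO, DT): DT,  # Fig. 27D
    (IT, IO): IO,  # Fig. 27E
    (IT, IT): IT,  # Fig. 27F (unchanged)
    (IT, DO): DO,  # Fig. 27G
    (IT, DT): DT,  # Fig. 27H
    (DO, IO): DO,  # Fig. 27I (unchanged)
    (DO, IT): IT,  # Fig. 27J
    (DO, DO): DO,  # Fig. 27K (unchanged)
    (DO, DT): DT,  # Fig. 27L
    (DT, IO): DT,  # Fig. 27M (unchanged)
    (DT, IT): DT,  # Fig. 27N (unchanged)
    (DT, DO): DO,  # Fig. 27O
    (DT, DT): DT,  # Fig. 27P (unchanged)
}

def route_intersection(territory: str, level: str) -> str:
    """Return the territory that receives the intersecting area."""
    return INTERSECTION_DEST[(territory, level)]
```

Note that the table is not derivable from a simple precedence rule: for example, an x-independent transparent level takes the intersection away from the x_dependent_opaque_territory (Fig. 27J), yet an x-independent opaque level leaves the x_dependent_transparent_territory unchanged (Fig. 27M), which is why the scenarios are enumerated individually.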
Next, in step 3024, the RTE module checks whether there are any more unique levels in the current band. If YES, process flow returns to step 3014 to process the contribution of the next unique level. If there are no more unique levels, process flow proceeds to step 3026, in which the RTE module calculates x_dependent_territory, x_dependent_level_area and x_independent_level_area.
Let A = the band area, n = the number of scanlines in the band, and E = the number of edges per scanline.

x_dependent_territory = area of x_dependent_opaque_territory + area of x_dependent_transparent_territory

The variables x_dependent_level_area and x_independent_level_area may be calculated using the following logic:

If n_x_dependent_levels <= 120 then
x_dependent_level_area = x_dependent_level_area + area of x_dependent_opaque_territory + area of x_dependent_transparent_territory
x_independent_level_area = x_independent_transparent_level_area + area of x_independent_opaque_territory + area of x_independent_transparent_territory

If n_x_dependent_levels is greater than 120 then
x_dependent_level_area = x_dependent_level_area + area of x_dependent_opaque_territory + area of x_dependent_transparent_territory + x_independent_transparent_level_area
x_independent_level_area = area of x_independent_opaque_territory + area of x_independent_transparent_territory

Next, in step 3028, the RTE module estimates the execution time of the priority determination module 500 and the data read and written.
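Step 3026 can be sketched as follows (Python; the arithmetic is a reconstruction, since the source text dropped the operators, and the function and parameter names are mine; the 120-level threshold is from the text):

```python
def finalise_band_areas(n_x_dep_levels, dep_opaque, dep_transp,
                        indep_opaque, indep_transp,
                        indep_transp_level_area,
                        x_dep_level_area=0.0, x_indep_level_area=0.0):
    """Step 3026: fold the territory areas into the band's level-area tallies."""
    x_dependent_territory = dep_opaque + dep_transp
    if n_x_dep_levels <= 120:
        x_dep_level_area += dep_opaque + dep_transp
        x_indep_level_area = (indep_transp_level_area
                              + indep_opaque + indep_transp)
    else:
        # Above the threshold, the x-independent transparent level area
        # is costed on the x-dependent side instead.
        x_dep_level_area += (dep_opaque + dep_transp
                             + indep_transp_level_area)
        x_indep_level_area = indep_opaque + indep_transp
    return x_dependent_territory, x_dep_level_area, x_indep_level_area
```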
If n_x_dependent_levels <= 120, then the following calculations are performed. Note that x-dependent pixels are also generated by REPEAT_FOR commands, and they are treated as x-independent pixels.

Here n_REPEAT_commands = nE
FG_time = nE * 7 clock cycles * (1 + (x_dependent_level_area + x_independent_level_area)/A)
IF total_n_levels > 512:
LAM_LC_read = LAM_LC_read + nE * (1 - 512/total_n_levels) * 32 bytes
LAM_LC_write = LAM_LC_write + nE * (1 - 512/total_n_levels) * 32 bytes
IF total_n_levels > 4096
LAM_SC_read = LAM_SC_read + nE * (1 - 4096/total_n_levels) * 32 bytes
LAM_SC_write = LAM_SC_write + nE * (1 - 4096/total_n_levels) * 32 bytes
ENDIF
ENDIF
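The case of few x-dependent levels can be sketched as follows (Python; the comparison directions and the cache-miss fraction (1 - cache_size/total_n_levels) are my reading of a passage whose operators were lost, and the function name is mine; 512 and 4096 are the cache thresholds named in the text):

```python
def fg_time_and_cache_traffic(n, E, A, x_dep_level_area,
                              x_indep_level_area, total_n_levels):
    """Model FG_time and LAM cache traffic when n_x_dependent_levels <= 120."""
    n_repeat_commands = n * E                     # one REPEAT command per edge
    fg_time = n * E * 7 * (1 + (x_dep_level_area + x_indep_level_area) / A)
    lam_lc = lam_sc = 0.0                         # bytes read (writes match)
    if total_n_levels > 512:                      # large-cache overflow
        lam_lc = n * E * (1 - 512 / total_n_levels) * 32
        if total_n_levels > 4096:                 # small-cache overflow
            lam_sc = n * E * (1 - 4096 / total_n_levels) * 32
    return fg_time, lam_lc, lam_sc
```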
If, however, n_x_dependent_levels > 120 then the following calculations are performed. Some x-dependent pixels may be generated individually instead of by REPEAT_FOR commands. Thus x-independent and x-dependent pixels must be considered separately.

Let D = area of x_dependent_territory / area of the scanlines, and I = 1 - D.
For x-independent regions the calculations are similar to the above but the number of REPEAT_FOR commands is:
n_REPEAT_commands = nEI
x_indep_FG_time = 7 clock cycles * nEI * (1 + x_independent_level_area/A)
IF total_n_levels > 512:
LAM_LC_read = LAM_LC_read + nEI * (1 - 512/total_n_levels) * 32 bytes
LAM_LC_write = LAM_LC_write + nEI * (1 - 512/total_n_levels) * 32 bytes
IF total_n_levels > 4096
LAM_SC_read = LAM_SC_read + nEI * (1 - 4096/total_n_levels) * 32 bytes
LAM_SC_write = LAM_SC_write + nEI * (1 - 4096/total_n_levels) * 32 bytes
ENDIF
ENDIF
For x-dependent regions n_levels_read = x_dependent_level_area, and
x_dep_FG_time = x_dependent_level_area * 6 clock cycles
IF total_n_levels > 512:
LAM_LC_read = LAM_LC_read + x_dependent_level_area * (1 - 512/total_n_levels) * 32 bytes
LAM_LC_write = LAM_LC_write + x_dependent_level_area * (1 - 512/total_n_levels) * 32 bytes
IF total_n_levels > 4096
LAM_SC_read = LAM_SC_read + x_dependent_level_area * (1 - 4096/total_n_levels) * 32 bytes
LAM_SC_write = LAM_SC_write + x_dependent_level_area * (1 - 4096/total_n_levels) * 32 bytes
ENDIF
ENDIF
Using the variables calculated above, the RTE module models the execution times of the priority generation module 516 and the priority determination module 500 as
FG_time = x_indep_FG_time + x_dep_FG_time
LAM_time = LAM_time + FG_time

4.5.2.2 Estimate execution time for the Fill Colour Determination Module (PGM)

The Fill Colour Determination Module 600 consumes the FILL_OPTIMISE commands and generates PIXEL_AT_LEVEL commands. The Fill Colour Determination Module 600 replaces the abstract Contributing Levels in FILL_OPTIMISE commands with the actual colour information in PIXEL_AT_LEVEL commands.
If the colour information comes from bitmaps, the Fill Colour Determination Module 600 must read the bitmaps from memory. If the bitmaps must be decompressed before use, the Fill Colour Determination Module 600 must ensure that the decompression is completed before the bitmaps are read. The decompression speed depends on the compression ratio, and this ratio is given for each bitmap.
The amount of data read from memory can be estimated from the active levels having bitmaps as the fill data. Object bounding boxes are used as estimates of the image sizes. Bitmaps may be stored in a Bitmap Cache. In this case, Cache hits are estimated based on the Cache size and the affine transformation matrix when page coordinates are mapped to bitmap coordinates. One aspect to consider is the number of bytes in the bitmap that are jumped to when the current pixel on the page is moved to the next pixel.
Other types of colour information do not take long to read compared to the time required to read bitmaps.
Any REPEAT_FOR commands reaching the Fill Colour Determination Module 600 must be for x-independent pixels and are passed down to the Pixel Compositing Module 700.
The execution time of the Fill Colour Determination Module 600 can be modelled as follows. For x-dependent pixels, let the number of levels contributing to the x-dependent territory be n_x_dependent_levels.
IF n_x_dependent_levels > 120
THEN
n_FILL_commands = x_dependent_level_area.
PGM_time = n_FILL_commands (in clock cycles)
ELSE
IF in colour mode
THEN
n_FILL_OPTIMISE_commands = x_dependent_level_area / 2
ELSE (in monochrome mode)
n_FILL_OPTIMISE_commands = x_dependent_level_area / 4
ENDIF
IF (there are 2 or more bitmaps in the band or there is at least one interpolated bitmap in the band)
THEN
PGM_time = n_FILL_OPTIMISE_commands * 2 clock cycles
ELSE
PGM_time = n_FILL_OPTIMISE_commands (in clock cycles)
ENDIF
ENDIF
IF (number of levels in the band > 64)
THEN
fill_cache_read = x_dependent_level_area * 2 * 32 bytes
ELSE
fill_cache_read = 0
ENDIF
Initialise bitmap_read = 0
If there are more than 8 bitmaps in the band, choose the largest 8 bitmaps to cache; if there are 8 bitmaps or fewer in the band, cache all of them.
For each cached bitmap:
bitmap_read = bitmap_read + area of the bounding box * bytes per pixel in the bitmap
If there are more than 8 bitmaps in the band, for each non-cached bitmap:
IF the non-cached bitmap is an interpolated bitmap
THEN
bitmap_read = bitmap_read + area of the bounding box * 2 * 32 bytes
ELSE
bitmap_read = bitmap_read + area of the bounding box * 32 bytes
ENDIF
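The bitmap accounting above can be sketched as follows (Python; "largest 8" is interpreted as largest by bounding-box area, the 32-byte factor for non-cached bitmaps is my reading of the damaged formulas, and the function name is mine):

```python
def estimate_bitmap_read(bitmaps, cache_slots=8):
    """Estimate bitmap_read for a band.

    bitmaps: list of (bbox_area, bytes_per_pixel, is_interpolated) tuples.
    """
    ordered = sorted(bitmaps, key=lambda b: b[0], reverse=True)
    cached = ordered[:cache_slots]      # the largest 8 bitmaps are cached
    uncached = ordered[cache_slots:]    # empty when there are 8 or fewer
    bitmap_read = sum(area * bpp for area, bpp, _ in cached)
    for area, _, interpolated in uncached:
        # Non-cached bitmaps are fetched in 32-byte units; interpolated
        # bitmaps need two source samples per output pixel.
        bitmap_read += area * (2 if interpolated else 1) * 32
    return bitmap_read
```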
For x-independent pixels the model is as follows:
PGM_time = PGM_time + nEI * 2 clock cycles + number of lines in the band
fill_cache_read = fill_cache_read + number of x-independent levels in the band (in bytes).
4.5.2.3 Estimate the execution time for the Pixel Output Module (POC)

The POC 800 is able to process the PIXEL_OUT commands as fast as the commands are received. Any REPEAT_FOR command must be for x-independent runs. In the preferred embodiment, the POC 800 expands this command to output one or more pixels per clock cycle.
When the POC 800 is running fast, the execution time is modelled as follows:
For x-dependent pixels:
POC_fast_time = x_dependent_territory * (17/16) clock cycles.
When the POC 800 is running slow, the execution time is modelled as follows:
For x-independent pixels:
IF (in colour mode):
THEN
POC_slow_time = (area of the band - x_dependent_territory) * (17/32) clock cycles
ELSE (in monochrome mode)
POC_slow_time = (area of the band - x_dependent_territory) * (17/64) clock cycles
ENDIF
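The POC model reduces to a few lines (Python; subtracting x_dependent_territory from the band area for the slow, x-independent case is a reconstruction of a formula whose operators were lost, and the function name is mine):

```python
def poc_times(band_area, x_dependent_territory, colour_mode):
    """Fast and slow POC execution times, in clock cycles."""
    poc_fast = x_dependent_territory * 17 / 16        # x-dependent pixels
    pixels_per_17_cycles = 32 if colour_mode else 64  # colour vs monochrome
    poc_slow = (band_area - x_dependent_territory) * 17 / pixels_per_17_cycles
    return poc_fast, poc_slow
```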
4.5.3 Modelling memory access delays

After modelling the render times of the priority determination module 500, the fill colour determination module 600 and the Pixel Output Module 800, the RTE module, in step 2908, uses the previously calculated memory access tallies of each rendering module to estimate access delays in the Memory Request Arbiter (MRA).
The MRA is the interface between the renderer 20 and memory 30. Renderer modules make read and write requests to the MRA via request sources. Each request source is a queue of requests that come from the same Renderer module for a specific purpose. For example, one request source may be for the Edge Processing Module 400 to read Edge records from memory and another request source is for the Fill Colour Determination Module 600 to read bitmaps from memory.
The MRA uses a round-robin scheme to service each request source in turn. If a request source has no pending requests, it is simply skipped for one round.

The following equations provide an approximation of the average time each request spends waiting in the queue and being serviced.
First, the RTE estimates the amount of data transfer required by each request source to render the required scanlines:
IE_R1W = 8 requests
EM_R2W_MAIN = (EM_read / 32) requests
EM_R2W_PREFETCH = (PF_read / 32) requests
EM_R2W_LEVELS = (MC_read / 32) requests
EM_R2W_SPILLSORT = (Spill_read / 32) requests
LAM_R1W_LC = (LAM_LC_read / 16) requests
LAM_R1W_SC = (LAM_SC_read / 16) requests
PGM_R2W_FILL = (fill_cache_read / 32) requests
PGM_R2W_BITMAP = (bitmap_read / 32) requests
EM_W2W_OUTPUT = ((Spill_write + OM_write) / 32) requests
LAM_W1W_LC = (LAM_LC_write / 16) requests
LAM_W1W_SC = (LAM_SC_write / 16) requests

To estimate the maximum data request rates, the model used by the RTE module assumes the Renderer 20 to run at half of the top output speed. This initial estimated speed is 1 pixel per clock cycle in colour mode and 2 pixels per clock cycle in monochrome mode. So the initial estimated render time is:
init_render_time = area of the band clock cycles if in colour mode, or
init_render_time = (area of the band / 2) clock cycles if in monochrome mode.
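The conversion from byte tallies to request counts can be sketched as follows (Python; the 32-byte and 16-byte request sizes are those appearing in the equations above, and the helper names are mine):

```python
# Bytes carried per request: 16 for the two LAM caches, 32 elsewhere.
REQUEST_BYTES = {
    "EM_R2W_MAIN": 32, "EM_R2W_PREFETCH": 32, "EM_R2W_LEVELS": 32,
    "EM_R2W_SPILLSORT": 32, "LAM_R1W_LC": 16, "LAM_R1W_SC": 16,
    "PGM_R2W_FILL": 32, "PGM_R2W_BITMAP": 32, "EM_W2W_OUTPUT": 32,
    "LAM_W1W_LC": 16, "LAM_W1W_SC": 16,
}

def request_counts(byte_tallies):
    """Convert per-source byte tallies into request counts (IE_R1W is fixed)."""
    counts = {"IE_R1W": 8}
    for source, nbytes in byte_tallies.items():
        counts[source] = nbytes / REQUEST_BYTES[source]
    return counts

def init_render_time(band_area, colour_mode):
    """Initial render-time estimate in clock cycles (half of top speed)."""
    return band_area if colour_mode else band_area / 2
```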
From the number of requests required by each request source, the RTE module calculates the specific maximum request rate for each request source, the total number of requests, and the average of the maximum request rates for all request sources combined.
The specific maximum request rate for each request source is calculated as follows, assuming the clock frequency to be 133 MHz.
λ1 = (IE_R1W * clock frequency / init_render_time) requests per second
λ2 = (EM_R2W_MAIN * clock frequency / init_render_time) requests per second
λ3 = (EM_R2W_PREFETCH * clock frequency / init_render_time) requests per second
λ4 = (EM_R2W_LEVELS * clock frequency / init_render_time) requests per second
λ5 = (EM_R2W_SPILLSORT * clock frequency / init_render_time) requests per second
λ6 = (LAM_R1W_LC * clock frequency / init_render_time) requests per second
λ7 = (LAM_R1W_SC * clock frequency / init_render_time) requests per second
λ8 = (PGM_R2W_FILL * clock frequency / init_render_time) requests per second
λ9 = (PGM_R2W_BITMAP * clock frequency / init_render_time) requests per second
λ10 = (IDM maximum reading rate in bytes per clock cycle * clock frequency / 32) requests per second
λ11 = (EM_W2W_OUTPUT * clock frequency / init_render_time) requests per second
λ12 = (LAM_W1W_LC * clock frequency / init_render_time) requests per second
λ13 = (LAM_W1W_SC * clock frequency / init_render_time) requests per second
λ14 = (IDM maximum writing rate in bytes per clock cycle * clock frequency / 32) requests per second
Note that the IDM is an Image Decompression Module that is not part of the pipeline 22. The IDM runs in parallel to the pipeline 22 and in general does not affect the speed of the pipeline 22. The IDM is used to decompress images used in the rendering of the page.
The total number of requests is modelled as follows:
total_requests = IE_R1W + EM_R2W_MAIN + EM_R2W_PREFETCH + EM_R2W_LEVELS + EM_R2W_SPILLSORT + LAM_R1W_LC + LAM_R1W_SC + PGM_R2W_FILL + PGM_R2W_BITMAP + EM_W2W_OUTPUT + LAM_W1W_LC + LAM_W1W_SC + λ10 * init_render_time/clock_frequency + λ14 * init_render_time/clock_frequency (Equation 1)

The average of the maximum request rates for all request sources combined is modelled using the following equations:
total_request_rate = (total_requests * clock frequency / init_render_time) requests per second

A queuing system similar to that used by the renderer 20 and MRA is analysed in Section 4.11, pages 206 to 209 of "Queuing Systems Volume II: Computer Applications" by Leonard Kleinrock, John Wiley and Sons, 1976. The formulae in that section can be applied in the current section of the model used by the RTE module.
It is necessary to test whether the memory bandwidth is sufficient to support rendering at top speed. Let M be the number of request sources; μ be the average request service rate in requests per second; and λ be the average request rate for a request source, as if each source is making the same number of requests. Then:
M = 14
μ = the memory bandwidth = (200 * 2^20 / 32) requests per second
λ = (total_request_rate / M)
M* is the maximum number of request sources the memory can support for rendering at top speed, each request source having a request rate of λ:
M* = 1 + μ/λ
If M <= M* then the bandwidth of memory 30 is high enough to support rendering at top speed. In this case the average time taken per memory request is simply the memory service time without any waiting time. In other words, where T is the average time a request must spend waiting in queues and being served:
T = 1/μ for each request.
If M > M* then the memory bandwidth is not high enough to support rendering at top speed. In this case:
average_number_of_requests_per_source = total_requests / 14
Initialise total_requests_above_average = 0
FOR each request source from the 14 sources in (Equation 1),
DO
IF (number of requests > average_number_of_requests_per_source)
THEN
total_requests_above_average = total_requests_above_average + number of requests from that source - average_number_of_requests_per_source
ENDIF
ENDFOR
With λ being the average request rate for each source, T is approximated by
T = (1/μ) * (1 + total_requests_above_average / (M * average_number_of_requests_per_source))

4.5.4 Updating the execution time estimates with memory access delays

In step 2910, the RTE Module updates the execution time of each modelled sub-module in the Edge Processing Module 400 by adding the corresponding access delays. Then the RTE Module estimates the Edge Module 400 execution time when running slow operations, using the Spill Sorting time in the OM sub-module. Step 2910 then estimates the Edge Module execution time when running fast operations using the maximum execution time of the remaining EM sub-modules.
If the number of edges is greater than 64, then S_EM S_EM (64 number of scanlines T /2) F_EM F_EM ((number of edges per scanline 64 number ofscanlines T/ 2).
Next, in step 2912, the estimated memory access delays are added to the execution times of the remaining time-intensive modules in the RTE model, i.e. the Priority Determination Module 500, the Fill Colour Determination Module 600 and the Pixel Output Module 800. The estimates are calculated for the slow mode and the fast mode.
For the Priority Determination Module 500: F_LAM LAM time clock frequency -103- SLAM (LAM_LC_read LAMLCwrite LAM_SCread LAM_SCwrite) N T/16 0 For the Fill Colour Determination Module 600: CN F_PGM PGMtime clock frequency S_PGM (fill_cache_read bitmap_read) T/32 c For the Pixel Output Module 800: ,FPOC POCast time clockfrequency S_POC POC slow_time clock frequency 4.5.5 Estimating the total render time for the current band In the next step 2814 of Fig. 29, the RTE module estimates the total render time for the current band, using the modelled execution times of the rendering modules. Both the fast and the slow execution times are estimated.. The RTE module takes into account the concurrency and stalling effects in the pipeline 22.
In previous steps of the method of Fig. 29, the running time of each relevant Renderer module has been calculated, including memory access times. The Renderer modules are designed to run concurrently, but some modules may slow down from time to time due to increased workload, thus stall the other modules in the render pipeline 22.
The RTE module estimates the degree of concurrency of the modules in pipeline 22.
For each module, the RTE module identifies the portion of the running time when the module is running at a slow speed due to increased workload. For the rest of the module's running time it is assumed to run at optimal speed. Let the slow and fast times of the four time-intensive modules be: S S_EM; S2 S_LAM; S3 S_PGM;
S
4
S_POC;
F =F EM; 583213specification -104-
O
0 F2 F LAM; F3 F_PGM; and
O
SF 4
FPOC.
Let C the running time of all Si with some concurrency x tmax( =1 j= x 4 where tmax Si the maximum time period for all the Si to run.
Si=1 4 t. Sj and f tm the fraction oftmax when none of the Si is running.
tmax The model estimates Ri, the reduction in stalling time of the i th module due to the presence of a buffer or buffers at either end of the module in the render pipeline 22. A buffer between two modules helps to regulate and smooth out the data flow when the relative speed between the two modules changes from fast to slow or from slow to fast.
This helps to reduce the stalling time of the faster module. The amount of reduction depends on the buffer size, the total data flow between the two modules, and how often there are relative speed changes. In the preferred embodiment, the RTE assumes that there is a change of relative speed from slow to fast once and from fast to slow once in a scanline, due to the periodic nature of scanlines on a printed page.
Step 2914 models the stalling in the edge processing module 400 as follows; Let the buffer size (B_EM_LAM) between the edge processing module 400 and the priority determination module 500 be 128 entries and calculate D_EM_LAM, the average data transfer from the edge processing module 400 to the priority determination module 500 per scanline as: D_EM_LAM 1 (n_EC_commands /number ofscanlines in the band) c-oT-ti r- 105- Let O S LAM C-S EM Z REMLAM= x C E x C) MIN(1, B/(2D)) C C C RI nREM-LAM Step 2914 calculates the stalling in the priority determination module 500 as C follows: S EM C-S LAM SRLAM EM x x C) MIN(1, B_EM_LAM/(2D_EM_LAM)) C C where B_LAMPGM the buffer size between the priority determination module 500 and the fill colour determination module 600 is 120 levels, and D_LAM_PGM is the average data transfer from LAM to PGM from the priority determination module 500 to the fill colour determination module 600) per scanline.
D_LAM_PGM (nREPEAT_commands x_dependent_level_area) n S PGM C-S LAM RLAM PGM x x C) MIN(1, BLAMPGM/(2D_LAM_PGM)) C C R2 n(RLAM EM RLAM PGM) Step 2914 models the stalling in the fill colour determination module 600 as follows: S LAM C-S PGM RPGM LAM X x C) MIN(1, B_LAM_PGM/(2D_LAM_PGM)) C C If the renderer 20 is operating in colour mode then B_PGM_POC, which is the buffer size between PGM (fill colour determination module 600) and POC (pixel output module 800) is 64 pixels. If the renderer 20 is not operating in colour mode, then B_PGM_POC is 128 pixels.
-106- SThe RTE module estimates the average data transfer from PGM 600 to POC 800 Sper scanline as D_PGMPOC (n_REPEAT commands MIN(x_dependent territory, x_dep_area))/ n Then S POC C-S PGM RGM POC x x C) MIN(1, B_PGMPOC/(2DPGMPOC)) and C C C 1 R3 n(RpGM LAM RPGMPOc).
Step 2914 estimates the stalling for the pixel output module 600 as follows: S PGM C-S POC RPOC PGM X x C) MIN(1, B_PGM_POC/(2D PGM_POC)) C C R4 nRpoc PGM Having calculated the values R1-R4 for the four time-intensive render modules, step 2914 estimates the render time for the scanlines as render time MAX(Fi) C MIN(R,) where i 1,2,3,4.
Next, in step 2916, the RTE module adds the calculated band render time to the chunk render time. Step 2916 also updates any decompression task being performed by the image decompression module by recording how much decompression is done and how much is left to be done during the calculated band render time. If the current decompression task is finished, the image decompression module may start a new task in the next band, if there are any such tasks to perform.
Other Arrangements The previous description relates to a model of a preferred implementation of the renderer 20. In cases where the renderer 20 differs from the above description, 583213soecification -107consequential changes may be made to the model used by the RTE module to estimate the o render time of the renderer 20. Some potential variations to the renderer 20 and the model used by the RTE are presented in the following sections.
5.1 Variation 1: C 5 The Output Buffer 2600 may be a ping-pong buffer. The RTE module can simulate such a buffer by fixing the chunk size to be one half the FIFO buffer size. In this case, the granularity of both reading and writing is a half-buffer.
(CN 5.2 Variation 2: A second option for print engine synchronisation is for the RTE module to calculate a suitable time period for the print engine 10 to wait. This time is passed back to the rendering system for print engine control. The print engine of printer 10 must wait for this time period after the Renderer 20 starts to render the intermediate render job prepared by the software driver before starting to print the page. If the actual Output Buffer 19 is a ping-pong buffer, then the print engine waiting time is the time taken by Renderer 20 to render and fill up the first half-buffer of the ping-pong buffer. If the actual Output Buffer 19 is a FIFO, a suitable waiting time is calculated by the RTE. A suitable waiting time may be the time to render and fill up a chunk, where the chunk size is determined based on the Output Buffer size.
5.3 Variation 3: PIXEL_REPEAT commands may be stored in the output FIFO 19. These commands are expanded into individual pixels before being sent to the print engine of printer 10. In this case, the number of scanlines that can fit into a chunk depends on the number of x-dependent pixels and the number of x-independent runs making up the scanlines. There are two ways of estimating the number of scanlines. The first way places fewer scanlines into the chunk initially and then tries to fit some more scanlines -108into the chunk if there is room. In the second method, an initial estimate is made of the
(N
number of scanlines that will fit. The estimate is refined in subsequent iterations by
O
Z examining the page contents on the scanlines being looked at. The first approach is slower but is more accurate and is guaranteed to converge. The second approach is more C< 5 flexible in terms of calculation time. The number of iterations can be as few as 1 after the c, initial estimate. However the second approach does not guarantee convergence after many iterations. In one embodiment, the second approach is used with 1 iteration, for Sefficiency reasons. Let B the chunk size (in bytes); S the number of bytes to store a pixel in the chunk; R the number of bytes to store a pixel-repeat command in the chunk; Ppi the number of pixels on a scanline; and calculate the number of scanlines that can fit into a chunk if all pixels are expanded as:
B
L-
SPpl The initial estimate of the number of scanlines is 2L.
In the first iteration, an estimate is made of the average number of edges per scanline.
Let E (the average number of edges per scanline because the left margin behaves like an edge.
The area on the scanlines with x-dependent pixels is estimated by xdependent_territory. The RTE module considers x-dependent territory in the case of dense populations of x-dependent objects, and considers x-dependent area in the case of few and sparse x-dependent objects.
Let D be the fraction of the scanlines with x-dependent pixels and I be the fraction of the scanlines with x-independent pixels: D (x_dependent_territory)/(area of the scanlines) 583213 specification 109- 0 The improved estimate of the number of scanlines, n, is then
B
n DPpl S EIR ¢€3 5 If more iterations are needed, E and D (and thus I) should be re-estimated based (Ni N, on the new estimate of the number of scanlines, n. Then a new n can be calculated.
Implementation The aforementioned preferred method(s) comprise a particular control flow. There are many other variants of the preferred method(s) which use different control flows without departing the spirit or scope of the invention. Furthermore one or more of the steps of the preferred method(s) may be performed in parallel rather sequential.
The method of render time estimation is preferably practiced using a generalpurpose computer system 3100, such as that shown in Fig. 31 wherein the processes of Figs. 26-29 may be implemented as software, such as an application program executing within the computer system 3100. In particular, the steps of method of render time estimation are effected by instructions in the software that are carried out by the computer. The instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may be stored in a computer readable medium, including the storage devices described below. The software is loaded into the computer from the computer readable medium, and then executed by the computer. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer preferably effects an advantageous apparatus for render time estimation.
The computer system 3100 is formed by a computer module 3101, input devices such as a keyboard 3102 and mouse 3103, and output devices including a display device 3114. A Modulator-Demodulator (Modem) transceiver device 3116 may be used by the computer module3101 for communicating to and from a communications -110network 3120, for example connectable via a telephone line 3121 or other functional o medium. The modem 3116 can be used to obtain access to the Internet, and other network systems, such as a Local Area Network (LAN) or a Wide Area Network (WAN), and may be incorporated into the computer module 3101 in some implementations.
The computer module 3101 typically includes at least one processor unit 3105, and a memory unit 3106, for example formed from semiconductor random access Cc memory (RAM) and read only memory (ROM). The module 3101 also includes an number of input/output interfaces including an audio-video interface 3107 that ,i couples to the video display 3114 and loudspeakers 3117, an I/O interface 3113 for the keyboard3102 and mouse3103 and optionally a joystick (not illustrated), and an interface 3108 for the modem 3116 and printer 3115. In some implementations, the modem 31116 may be incorporated within the computer module 3101, for example within the interface 3108. A storage device 3109 is provided and typically includes a hard disk drive 3110 and a floppy disk drive 3111. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 3112 is typically provided as a non-volatile source of data.
The components 3105 to 3113 of the computer module 3101, typically communicate via an interconnected bus 3104 and in a manner which results in a conventional mode of operation of the computer system 3100 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations or alike computer systems evolved therefrom.
Typically, the application program is resident on the hard disk drive 3110 and read and controlled in its execution by the processor 3105. Intermediate storage of the program and any data fetched from the network 3120 may be accomplished using the semiconductor memory 3106, possibly in concert with the hard disk drive 3110. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 3112 or 3111, or alternatively may be read by the user from the network 3120 via the modem device 3116. Still further, the software can also be loaded into the computer system 3100 from other computer readable media. The term "computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to the computer system 3100 for execution and/or processing. Examples of storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 3101. Examples of transmission media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
7.0 Industrial Applicability

It is apparent from the above that the arrangements described are applicable to the image processing industries (e.g. computer and data processing industries).
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (4)

  1. 2. A method as claimed in claim 1 comprising the step of:
determining, based on said estimated time, whether said renderer can render said render job in real time.
  2. 3. A method as claimed in claim 2 wherein said renderer writes data to an output buffer from which said data is read, and wherein said determining step comprises the sub-steps of:
maintaining an estimate of the amount of available data in said output buffer;
calculating an expected amount of data read from said output buffer in said estimated time; and
determining, if said expected amount of data exceeds said estimated available data, that said renderer cannot render said render job in real time.
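The determination recited in claim 3 is, in effect, a buffer-underrun test: if the print engine would consume more data during the estimated render time than the output buffer is estimated to hold, the band cannot be rendered in real time. The following Python sketch illustrates this comparison only; the function name, units, and the constant engine read rate are illustrative assumptions, not part of the specification.

```python
def can_render_in_real_time(estimated_render_time_s: float,
                            buffer_bytes_available: int,
                            engine_read_rate_bps: float) -> bool:
    """Predict whether the renderer keeps up with the print engine.

    estimated_render_time_s: estimated seconds to render the scanline band
    buffer_bytes_available:  estimate of data already in the output buffer
    engine_read_rate_bps:    bytes per second the engine reads from the buffer
    """
    # Data the print engine is expected to consume while the band renders.
    expected_bytes_read = engine_read_rate_bps * estimated_render_time_s
    # If the expected read exceeds the estimated available data, the output
    # buffer underruns: the band cannot be rendered in real time.
    return expected_bytes_read <= buffer_bytes_available
```

For example, with 64 KB buffered and an engine reading 2 MB/s, a band estimated at 20 ms renders in time (40 000 bytes consumed), while one estimated at 50 ms does not (100 000 bytes consumed).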
  3. 4. A method as claimed in any one of the previous claims wherein said step of calculating territories comprises the sub-steps of:
generating a list of levels that are active on said scanlines, said levels having one or more descriptive parameters;
determining, for a current one of said active levels, whether said current active level intersects a territory on said scanlines;
adding, if said current active level has a non-intersecting portion, said non-intersecting portion to a corresponding territory according to said parameters; and
updating, if said current active level has an intersecting portion, the intersected territory with the intersecting portion in accordance with combination rules based on said parameters.

5. A method as claimed in claim 4 wherein said parameters relate to opacity and to x-dependence, and wherein said combination rules comprise:
if an x-independent opaque level intersects an x-independent opaque territory, the intersecting portion remains in the x-independent opaque territory;
if an x-independent transparent level intersects an x-independent opaque territory, the intersecting portion is removed from the x-independent opaque territory and added to an x-independent transparent territory;
if an x-dependent opaque level intersects an x-independent opaque territory, the intersection is removed from the x-independent opaque territory and is added to an x-dependent opaque territory;
if an x-dependent transparent level intersects an x-independent opaque territory, the intersection is removed from the x-independent opaque territory and added to an x-dependent transparent territory;
if an x-independent opaque level intersects an x-independent transparent territory, the intersection is removed from the x-independent transparent territory and added to an x-independent opaque territory and also to an x-independent transparent level area;
if an x-independent transparent level intersects an x-independent transparent territory, the territory remains unchanged and the intersection is added to an x-independent transparent level area;
if an x-dependent opaque level intersects an x-independent transparent territory, the intersection is removed from the x-independent transparent territory and added to an x-dependent opaque territory and also to an x-independent transparent level area;
if an x-dependent transparent level intersects an x-independent transparent territory, the intersection is removed from the x-independent transparent territory and added to an x-dependent transparent territory and also to an x-independent transparent level area;
if an x-independent opaque level intersects an x-dependent opaque territory, the territory remains unchanged;
if an x-independent transparent level intersects an x-dependent opaque territory, the intersection is removed from the x-dependent opaque territory and added to an x-independent transparent territory and also to an x-dependent level area;
if an x-dependent opaque level intersects an x-dependent opaque territory, the territory is unchanged;
if an x-dependent transparent level intersects an x-dependent opaque territory, the intersection is removed from the x-dependent opaque territory and added to an x-dependent transparent territory;
if an x-independent opaque level intersects an x-dependent transparent territory, the territory remains unchanged and the intersection is added to an x-dependent level area;
if an x-independent transparent level intersects an x-dependent transparent territory, the territory remains unchanged and the intersection is added to an x-dependent level area;
if an x-dependent opaque level intersects an x-dependent transparent territory, the intersection is removed from the x-dependent transparent territory and added to an x-dependent opaque territory and also to an x-dependent level area; and
if an x-dependent transparent level intersects an x-dependent transparent territory, the x-dependent transparent territory remains unchanged and the intersection is added to an x-dependent level area.

6. A method as claimed in any one of the previous claims wherein said renderer comprises a pipeline of render modules and said model approximates the execution time of at least one said render module and the number of memory reads and writes of said at least one render module for said scanlines.

7. A method as claimed in claim 6 wherein said model provides an approximation of memory access delays due to the operation of a memory request arbiter between said renderer and a data storage.

8. A method as claimed in claim 6 or 7 wherein said model approximates stalling effects in said pipeline when the operation of one or more render modules causes the operation of other ones of the render modules to slow.

9. A method as claimed in claim 8 wherein said model approximates the effect of buffers between said render modules in reducing said stalling effects.

10. A method as claimed in any one of the previous claims wherein said render job defines a set of edges that activate said levels in said scanlines and said renderer sorts said edges, and wherein said estimating step comprises a sub-step of approximating the number of memory accesses required in sorting said edges.

11. A method as claimed in claim 4 wherein said active levels are approximated by rectangular bounding boxes.

12.
Apparatus for estimating a time taken to render a collection of consecutive scanlines from a render job, the apparatus comprising:
means for receiving a description of said render job comprising a plurality of levels;
means for receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels;
means for estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and
means for estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.

13. A computer program comprising machine-readable program code for controlling the operation of a data processing apparatus on which the program code executes to perform a method of estimating a time taken to render a collection of consecutive scanlines from a render job, the method comprising the steps of:
receiving a description of said render job comprising a plurality of levels;
receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels;
estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and
estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.

14.
A computer program product comprising machine-readable program code recorded on a machine-readable recording medium, for controlling the operation of a data processing apparatus on which the program code executes to perform a method of estimating a time taken to render a collection of consecutive scanlines from a render job, the method comprising the steps of:
receiving a description of said render job comprising a plurality of levels;
receiving a model of a pixel-based renderer, said renderer only rendering portions of levels that are not obscured by opaque levels;
estimating territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and
estimating, from said description, said model and said territories, the time taken for said renderer to render said scanlines.

15. A system comprising:
a storage unit for storing a render job comprising a plurality of levels; and
  4. a processor for estimating a time taken by a pixel-based renderer to render a collection of consecutive scanlines from said render job, wherein said processor:
estimates territories of said scanlines that receive an unobscured contribution from one or more levels in said description; and
estimates the time taken for said renderer to render said scanlines, wherein said time estimate is based on said description and said territories.

16. A method substantially as described herein with reference to Figs. 2 and 27-30.

17. An apparatus substantially as described herein with reference to Figs. 2 and 27-30.

18. A computer program substantially as described herein with reference to Figs. 2 and 27-30.

19. A computer program product substantially as described herein with reference to Figs. 2 and 27-30.

20. A system substantially as described herein with reference to Figs. 2 and 27-30.

DATED this TWENTY-SECOND Day of NOVEMBER 2004
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
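The sixteen combination rules recited for the territory-calculation step (one rule for each pairing of level type and territory type, over the four combinations of opacity and x-dependence) can be summarised as a lookup table keyed by the intersecting level's type and the intersected territory's type. The Python sketch below is an illustrative encoding only; the enum names, the area labels, and the `combine` helper are our own shorthand, not terminology from the specification.

```python
from enum import Enum

class T(Enum):
    """Level/territory classification by x-dependence and opacity."""
    XI_OP = "x-independent opaque"
    XI_TR = "x-independent transparent"
    XD_OP = "x-dependent opaque"
    XD_TR = "x-dependent transparent"

# (level type, territory type) -> (territory receiving the intersection,
#                                  level area the intersection is also added to).
# A None destination means the intersection stays in the original territory;
# a None area means the intersection is not added to any level area.
COMBINATION_RULES = {
    # intersections with an x-independent opaque territory
    (T.XI_OP, T.XI_OP): (None,    None),
    (T.XI_TR, T.XI_OP): (T.XI_TR, None),
    (T.XD_OP, T.XI_OP): (T.XD_OP, None),
    (T.XD_TR, T.XI_OP): (T.XD_TR, None),
    # intersections with an x-independent transparent territory
    (T.XI_OP, T.XI_TR): (T.XI_OP, "xi-transparent"),
    (T.XI_TR, T.XI_TR): (None,    "xi-transparent"),
    (T.XD_OP, T.XI_TR): (T.XD_OP, "xi-transparent"),
    (T.XD_TR, T.XI_TR): (T.XD_TR, "xi-transparent"),
    # intersections with an x-dependent opaque territory
    (T.XI_OP, T.XD_OP): (None,    None),
    (T.XI_TR, T.XD_OP): (T.XI_TR, "x-dependent"),
    (T.XD_OP, T.XD_OP): (None,    None),
    (T.XD_TR, T.XD_OP): (T.XD_TR, None),
    # intersections with an x-dependent transparent territory
    (T.XI_OP, T.XD_TR): (None,    "x-dependent"),
    (T.XI_TR, T.XD_TR): (None,    "x-dependent"),
    (T.XD_OP, T.XD_TR): (T.XD_OP, "x-dependent"),
    (T.XD_TR, T.XD_TR): (None,    "x-dependent"),
}

def combine(level: T, territory: T):
    """Resolve where the intersecting portion of a level and territory ends up."""
    dest, area = COMBINATION_RULES[(level, territory)]
    return (dest if dest is not None else territory, area)
```

For instance, `combine(T.XD_OP, T.XI_TR)` reports that the intersection moves to the x-dependent opaque territory and is also recorded in an x-independent transparent level area, matching the seventh rule above. Encoding the rules as a table rather than nested conditionals makes it easy to verify that all sixteen cases are covered.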
AU2004231233A 2003-12-23 2004-11-22 Render Time Estimation Abandoned AU2004231233A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2004231233A AU2004231233A1 (en) 2003-12-23 2004-11-22 Render Time Estimation

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AU2003907201 2003-12-23
AU2003907201A AU2003907201A0 (en) 2003-12-23 Render Time Estimation
AU2004231233A AU2004231233A1 (en) 2003-12-23 2004-11-22 Render Time Estimation

Publications (1)

Publication Number Publication Date
AU2004231233A1 true AU2004231233A1 (en) 2005-07-07

Family

ID=34750766

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2004231233A Abandoned AU2004231233A1 (en) 2003-12-23 2004-11-22 Render Time Estimation

Country Status (1)

Country Link
AU (1) AU2004231233A1 (en)

Similar Documents

Publication Publication Date Title
US7483036B2 (en) Reducing the number of compositing operations performed in a pixel sequential rendering system
US7714865B2 (en) Compositing list caching for a raster image processor
US6483519B1 (en) Processing graphic objects for fast rasterised rendering
US6828985B1 (en) Fast rendering techniques for rasterised graphic object based images
US7538770B2 (en) Tree-based compositing system
US7551173B2 (en) Pixel accurate edges for scanline rendering system
US5760792A (en) Fifo logical addresses for control and error recovery
US20020158881A1 (en) Apparatus and method for acceleration of 2D vector graphics using 3D graphics hardware
JP2005516315A (en) Efficient display updates from object graphics changes
AU760826B2 (en) Rendering graphic object based images
AU2004231233A1 (en) Render Time Estimation
AU744091B2 (en) Processing graphic objects for fast rasterised rendering
AU2005200948B2 (en) Compositing list caching for a raster image processor
AU2004200655B2 (en) Reducing the Number of Compositing Operations Performed in a Pixel Sequential Rendering System
AU743218B2 (en) Fast renering techniques for rasterised graphic object based images
AU2004233516B2 (en) Tree-based compositing system
AU2005201868A1 (en) Removing background colour in group compositing
AU2004231232B2 (en) Pixel accurate edges for scanline rendering system
US5946003A (en) Method and apparatus for increasing object read-back performance in a rasterizer machine
US7355602B1 (en) Surrogate stencil buffer clearing
AU2004237873A1 (en) State table optimization in expression tree based compositing
AU2005201929A1 (en) Rendering graphic object images
AU2004233469A1 (en) Rendering linear colour blends
AU2002301643B2 (en) Activating a Filling of a Graphical Object
AU4064502A (en) Compositing objects with opacity for fast rasterised rendering

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period