EP1527397A2 - Fast signal convolution using separated-spline kernel - Google Patents


Info

Publication number
EP1527397A2
Authority
EP
European Patent Office
Prior art keywords
pattern
data
data indicative
kernel
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02774117A
Other languages
German (de)
French (fr)
Inventor
David Jeremy Copeland
Richard E. Crandall
Ulrich Hofmann
Richard L. Lozes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Applied Materials Inc
Original Assignee
Applied Materials Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/866,890 (US6912323B2)
Application filed by Applied Materials Inc
Publication of EP1527397A2
Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations

Definitions

  • the invention relates to signal processing methods and apparatus for performing convolution on data indicative of a pattern (e.g., image data indicative of a pixel array).
  • the convolution kernel is (or is approximated by) a separated-spline function.
  • Convolution is commonly performed on signals in many contexts, including the fields of sound, still image, video, lithography, and radio (radar) signal processing.
  • the signals to be convolved are pattern signals.
  • Each of the expressions "pattern" and "pattern signal" is used herein in a broad sense to denote a one-dimensional sequence or two-dimensional (or higher dimensional) array of data words (which can be, but need not be, pixels).
  • the data words comprise binary bits, and the convolution is performed in discrete fashion on the binary bits using software, digital signal processing circuitry, custom hardware, or FPGA systems (field programmable gate array based computing systems).
  • data herein denotes one or more signals indicative of data
  • data word herein denotes one or more signals indicative of a data word.
  • the present invention grew out of attempts to establish a suitable "O(NN′)" algorithm (an algorithm requiring not more than on the order of NN′ multiplications and additions) for convolving a two-dimensional pattern comprising N × N′ pixels (where each of N and N′ is very large) with a Gaussian kernel (or other smooth kernel) such that the convolution is exact or very close to exact.
  • the objective in performing proximity correction is to generate a "raw" optical signal (or “raw” electron beam signal) which can be input to a set of reflective or refractive optics (or electron beam optics), in order to cause the output of the optics to produce a desired pattern on a mask or wafer.
  • a deconvolution operation is typically performed on a very large array of pixels (which determine a pattern "p") in order to correct for the well known proximity problem.
  • the proximity problem results from electron scattering in the substrate (mask or wafer) being written. Such scattering exposes broadened areas on the substrate to electrons (i.e., an area surrounding each pixel to be written in addition to the pixel itself), with the scattering effectively broadening the electron beam beyond the beam diameter with which the beam is incident on the substrate.
  • such a deconvolution operation includes at least one convolution step. Accordingly, in performing typical proximity correction, a very large array of pixels (determining a pattern "p") must be convolved with a diffusion kernel. Although such a convolution is typically performed on a pattern comprising a very large array of binary pixels, this restriction is not essential in the following discussion and is not essential to implementation of the invention. Indeed, the invention can implement convolution on data indicative of any pattern "p" with a smooth convolution kernel "d" having characteristics to be described below. For data indicative of a pattern "p" and a convolution kernel "d", we consider the cyclic convolution
    $$(d \times_N p)_n = \sum_{i + j \equiv n \pmod{N}} d_i \, p_j,$$
  • where $\times_N$ denotes that the convolution operator has cyclic character, and an acyclic convolution which differs only in the indicial constraint and range:
    $$(d \times p)_n = \sum_{i + j = n} d_i \, p_j.$$
  • the pattern is two-dimensional (a two-dimensional array of data values determines the pattern) and the summation defining the convolution (a summation which corresponds to either one of the summations set forth in the previous paragraph) is over index k as well as index j of the array $p_{jk}$.
  • the indices n, i, j and domain lengths N in the formulae set forth in the previous paragraph are 2-vectors.
  • the result of the cyclic convolution has length N (it comprises N data values), and the result of the acyclic convolution has length … + N − 1.
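  • D is the circulant matrix of d (hereinafter the "circulant" of d), whose 1-dimensional form is defined (assuming that N is greater than 3) as:
    $$D = \begin{pmatrix} d_0 & d_{N-1} & d_{N-2} & \cdots & d_1 \\ d_1 & d_0 & d_{N-1} & \cdots & d_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{N-1} & d_{N-2} & d_{N-3} & \cdots & d_0 \end{pmatrix}, \qquad D_{nj} = d_{(n-j) \bmod N},$$
    so that Dp is the cyclic convolution of d and p.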
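As a small illustration of the circulant (a minimal sketch, not code from the patent), the Python below assumes the convention $D_{nj} = d_{(n-j) \bmod N}$ and checks that the matrix-vector product Dp reproduces the cyclic convolution sum term for term. The helper names `circulant`, `matvec`, and `cyclic_convolution` and the sample values of d and p are hypothetical.

```python
def circulant(d):
    """Circulant of d: D[n][j] = d[(n - j) % N], so column j is d cyclically shifted by j."""
    n_len = len(d)
    return [[d[(n - j) % n_len] for j in range(n_len)] for n in range(n_len)]


def matvec(m, v):
    """Plain dense matrix-vector product (O(N^2) work)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cyclic_convolution(d, p):
    """(d x_N p)_n = sum of d_i * p_j over all i + j = n (mod N)."""
    n_len = len(p)
    r = [0] * n_len
    for i in range(n_len):
        for j in range(n_len):
            r[(i + j) % n_len] += d[i] * p[j]
    return r


d = [4, 3, 1, 0, 0, 1, 3]   # hypothetical smooth, symmetric kernel (N = 7)
p = [1, 0, 0, 1, 1, 0, 1]   # hypothetical binary pattern
assert matvec(circulant(d), p) == cyclic_convolution(d, p)
```

Forming D explicitly costs O(N²) storage and O(N²) work per product, which is the cost the fast methods discussed here are designed to avoid.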
  • U.S. Patent Application No. 09/480,908 discloses a fast convolution method whose central idea (in one-dimensional embodiments) is to approximate a smooth kernel d by a polynomial spline kernel f (where f is a spline function f(x) which is piecewise a polynomial of degree δ with L pieces $f_i(x)$), and then to use appropriate operators that annihilate (or flatten) each polynomial of given degree (in a manner to be explained) to calculate the convolution of f and p quickly.
  • the smooth kernel d is approximated by a spline kernel f which is not a polynomial spline kernel, but which consists of L pieces defined over adjacent segments of its domain (in typical two-dimensional cases, the latter spline kernel is a radially symmetric function whose domain is some continuous or discrete set of values of the radial parameter).
  • while spline convolution as described in U.S. Application No. 09/480,908 has features reminiscent of conventional wavelet schemes and is an O(N) algorithm (as are wavelet schemes), an advantage of "spline" convolution is that it can be performed (on data indicative of a pattern p
  • the pattern "p" can be one-dimensional in the sense that it is determined by a continuous (or discrete) one-dimensional domain of data values (e.g., pixels), or it can be two-dimensional in the sense that it is determined by a continuous two-dimensional domain of data values (or a two-dimensional array of discrete data values), or p can have dimension greater than two.
  • the pattern p is one-dimensional in the sense that it is determined by a discrete, ordered set of data values (e.g., pixels) $p_i$, where i varies from 0 to N − 1 (where N is the signal length), or it is two-dimensional in the sense that it is determined by an array of data values $p_{ij}$, where i varies from 0 to N − 1 and j varies from 0 to N − 1, or it has dimension greater than two (it is determined by a three- or higher-dimensional set of data values).
  • the kernel d is determined by an array of data values $d_{ij}$, where i varies from 0 to N − 1 and j varies from 0 to N − 1 (but the kernel d can alternatively be determined by a discrete set of data values $d_0$ through $d_{N-1}$).
  • the convolution Dp is accomplished by performing the steps of:
  • $\Delta_{\delta+1}$ has the form of the N × N circulant matrix defined as follows:
  • each entry is a binomial coefficient
  • the annihilation operators can be defined as
  • the end points of each segment (the "pivot points") of the spline kernel may be consecutive elements $d_t$ and $d_m$ of kernel d, and step (a) can be implemented by performing curve fitting to select each segment of the spline kernel as one which adequately matches a corresponding segment of the kernel d.
  • appropriate boundary conditions are satisfied at each pivot point, such as by derivative-matching or satisfying some other smoothness criterion at the pivot points.
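The curve-fitting step can be sketched as follows, assuming (as an illustration only) that each segment between consecutive pivot points is fitted by the quadratic that interpolates d at the two pivots and at the segment midpoint. This matches d exactly at every pivot point but does not enforce derivative matching, so it satisfies only the weakest of the boundary conditions mentioned above. The Gaussian width, pivot spacing, and function names are hypothetical.

```python
import math


def quadratic_through(x0, x1, x2, y0, y1, y2):
    """Return the quadratic through (x0, y0), (x1, y1), (x2, y2), in Lagrange form."""
    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return q


def spline_approximation(d, pivots):
    """Piecewise-quadratic f: on each [a, b] between consecutive pivots, interpolate d
    at a, at the midpoint, and at b, so that f and d agree at every pivot point."""
    f = list(d)
    for a, b in zip(pivots, pivots[1:]):
        m = (a + b) // 2
        q = quadratic_through(a, m, b, d[a], d[m], d[b])
        for x in range(a, b + 1):
            f[x] = q(x)
    return f


# Hypothetical example: Gaussian kernel, sigma = 4 samples, pivots every 4 samples.
sigma, n = 4.0, 33
d = [math.exp(-((x - 16) ** 2) / (2 * sigma ** 2)) for x in range(n)]
pivots = list(range(0, n, 4))          # 0, 4, ..., 32
f = spline_approximation(d, pivots)
max_err = max(abs(fx - dx) for fx, dx in zip(f, d))
print(f"max |f - d| = {max_err:.4f}")
```

With these values the standard interpolation error bound, $\max|f'''|/6 \cdot \max|\prod_i (x - x_i)|$, limits the error at integer sample points to roughly 0.011; wider segments would call for more pivots or higher-degree pieces.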
  • step (c) includes a preliminary "ignition" step in which a small number of the lowest components of r = Fp are computed by exact multiplication of p by a few rows of F, and then a step of determining the rest of the components of r using a natural recurrence relation determined by the spline kernel and the operator $\Delta_{\delta+1}$.
  • the lowest components of r are $r_0, r_1, \ldots, r_\delta$, where "δ" is the maximum degree of the spline segments of spline kernel f (for example $r_0$, $r_1$, and $r_2$ where the spline kernel comprises quadratic segments), and these (δ + 1) components are determined by exact multiplication of p by (δ + 1) rows of F.
  • the (δ + 1) components can alternatively be determined in other ways. Then, the rest of the components of r are determined using a natural recurrence relation determined by the operator $\Delta_{\delta+1}$.
  • the ignition operation, which generates the components $r_0, r_1, \ldots, r_\delta$, can be accomplished with O(N) computations.
  • the recurrence relation calculation can also be accomplished with O(N) computations.
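A minimal one-dimensional sketch of the annihilate, ignite, and recur sequence follows. It assumes cyclic convolution, a kernel sampled on integers, and the backward-difference form of the binomial-coefficient operator, $(\Delta_{\delta+1} x)_n = \sum_{k=0}^{\delta+1} (-1)^k \binom{\delta+1}{k} x_{n-k}$. Because this operator commutes with convolution, applying it to r = Fp equals the cyclic convolution of $\Delta_{\delta+1} f$ with p, which is cheap when $\Delta_{\delta+1} f$ is sparse. Function names and the test kernel are hypothetical, and integer data keep the recurrence exact.

```python
from math import comb


def difference(x, order):
    """Cyclic backward difference of the given order (binomial-coefficient circulant):
    (Delta x)_n = sum_k (-1)^k C(order, k) x[(n - k) % N]."""
    n_len = len(x)
    return [sum((-1) ** k * comb(order, k) * x[(n - k) % n_len] for k in range(order + 1))
            for n in range(n_len)]


def cyclic_convolve(f, p):
    """Reference O(N^2) cyclic convolution r = F p with F[n][j] = f[(n - j) % N]."""
    n_len = len(p)
    return [sum(f[(n - j) % n_len] * p[j] for j in range(n_len)) for n in range(n_len)]


def spline_convolve(f, p, degree):
    """Annihilate, convolve sparsely, then ignite and run the recurrence."""
    n_len, order = len(p), degree + 1
    # Annihilation: Delta f is nonzero only just past the pivot points of a
    # piecewise polynomial f whose pieces have degree <= `degree`.
    support = [(i, v) for i, v in enumerate(difference(f, order)) if v != 0]
    # s = (Delta f) conv p = Delta (F p); costs O(N * len(support)).
    s = [0] * n_len
    for i, v in support:
        for j in range(n_len):
            s[(i + j) % n_len] += v * p[j]
    # Ignition: the lowest (degree + 1) components of r = F p, by exact row products.
    r = [sum(f[(n - j) % n_len] * p[j] for j in range(n_len)) for n in range(order)]
    # Recurrence: solve (Delta r)_n = s_n for r_n, n = degree + 1, ..., N - 1.
    for n in range(order, n_len):
        r.append(s[n] - sum((-1) ** k * comb(order, k) * r[n - k] for k in range(1, order + 1)))
    return r


# Hypothetical quadratic-segment kernel: f = R^2 - m^2 for cyclic offset |m| <= R, else 0.
N, R = 16, 3
f = [0] * N
for m in range(-R, R + 1):
    f[m % N] = R * R - m * m
p = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
assert spline_convolve(f, p, degree=2) == cyclic_convolve(f, p)
```

For this kernel $\Delta_3 f$ has only four nonzero entries, two just past each pivot, so the sparse step costs about 4N multiply-adds instead of N². In floating point the recurrence can accumulate round-off, since every characteristic root of $\Delta_{\delta+1}$ equals 1.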
  • the flattening operator $\Delta_\delta$ is defined similarly.
  • the convolution Dp (where D is the circulant of smooth kernel d) includes the steps of:
  • $FDF^{-1}$ requires one transform.
  • D is fixed, and transformed on a one-time basis
  • subsequent convolutions Dp only require two transforms each, as is well known.
  • the complexity of Fourier-based cyclic convolution is thus O(N log N) operations (i.e., on the order of N log N multiplications and additions) for convolving a pattern p of length N (a pattern determined by N data values), because of the 2 or 3 FFTs (Fast Fourier Transforms) required.
  • the Fourier method is an exact method (up to round-off errors depending on the FFT precision).
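To make the Fourier baseline concrete, here is a minimal sketch (a textbook recursive radix-2 FFT, not code from the patent) of cyclic convolution as $Dp = F^{-1}(FDF^{-1})(Fp)$, where F is the discrete Fourier matrix and $FDF^{-1}$ is the diagonal matrix holding the transform of d. It assumes N is a power of two; function names and data are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fourier_cyclic_convolve(d_hat, p):
    """Dp = F^{-1} diag(d_hat) F p, where d_hat is the transform of the kernel d."""
    n = len(p)
    p_hat = fft(p)
    return [v.real / n for v in fft([a * b for a, b in zip(d_hat, p_hat)], inverse=True)]


d = [4, 3, 1, 0, 0, 0, 1, 3]
p = [1, 0, 0, 1, 1, 0, 1, 0]
d_hat = fft(d)                          # one-time transform of the fixed kernel
r = fourier_cyclic_convolve(d_hat, p)   # two transforms per convolution
direct = [sum(d[(n - j) % 8] * p[j] for j in range(8)) for n in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(r, direct))
```

Reusing `d_hat` across calls matches the two-transforms-per-convolution count above; agreement with the direct sum holds only up to floating-point round-off.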
  • $WDW^{-1}$ is sparse, where "sparse" in the present context denotes simply that any matrix-vector product Wx, for arbitrary x, involves reduced complexity O(N), rather than, say, $O(N^2)$.
  • Separated-spline convolution in accordance with the present invention is an O(N) method for convolving a pattern p determined by N data values.
  • Separated-spline convolution in accordance with the present invention has an advantage over spline convolution in accordance with U.S. Application No. 09/480,908 in that separated-spline convolution in accordance with the invention can be performed (on data indicative of a two- or higher-dimensional pattern p consisting of N data values) with dN arithmetic operations (multiplications and additions), whereas spline convolution in accordance with
  • the invention is a method for performing two-dimensional cyclic or acyclic convolution of an n-dimensional pattern "p"
  • a two-dimensional pattern p is determined by a continuous two-dimensional range of data values or two-dimensional array of discrete data values.
  • the pattern p is two-dimensional in the sense that it is determined by a discrete, ordered set of data values (e.g., pixels) $p_{ij}$, where i varies from 0 to N − 1 and j varies from 0 to
  • in step (c), a small number of the lowest components of $r_1$ are computed directly during the preliminary ignition step, and the rest of the components of $r_1$ are then determined using the natural recurrence relation.
  • in step (c), a small number of large negative components of $r_1$ are computed directly during the preliminary ignition step, and the rest of the components of $r_1$ are then determined using the natural recurrence relation.
  • $k_+(x)$ is a one-tailed Laplacian kernel
  • Each one-dimensional convolution is performed in three parts.
  • the cap (where |x| ≤ R) and the two decay regions (where x > R and x < −R) are each produced by different convolutions, whose results are added together to obtain the final result.
  • step (c) includes the step of storing $(k_1 \times p)^T$ in the memory such that each row of $(k_1 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of p(x, y).
  • step (e) is omitted.
  • the one-dimensional kernel $k_2(x)$ has two parts:
  • steps (a)-(d) are performed by an appropriately programmed processor, and the processor performs the additional step of: (e) after step (d), transposing the result of step (d) to produce r(x, y), and returning to an initial processor state.
  • each occurrence of the factor $2\sin^2(\pi/2L)$ is replaced by the factor 1.
  • the invention is a computer programmed with software for performing convolution, on data indicative of an n-dimensional pattern (where n is greater than or equal to 2), using a separated kernel in accordance with any embodiment of the inventive method.
  • Other embodiments of the invention include a digital signal processor including digital signal processing circuitry configured to perform convolution on data indicative of an n-dimensional pattern (where n is greater than or equal to 2), using a separated kernel in accordance with any embodiment of the inventive method, an apparatus (such as custom or dedicated electronic circuitry, or a field programmable gate array based computing system ("FPGA system")) configured to perform convolution on such data in accordance with any embodiment of the inventive method, and a lithography system including such digital signal processing circuitry, such custom or dedicated electronic circuitry, or such an FPGA system.
  • a computer-readable storage medium which stores computer-executable instructions, wherein the instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
  • Figure 1 is a block diagram of a computer system programmed with software for implementing the inventive method.
  • FIG. 2 is a block diagram of a lithography system including a digital signal processor configured to perform convolution (in accordance with the invention) on image data, and a device which generates a pattern signal (e.g., an optical beam or electron beam having time-varying amplitude) from the resulting convolved image data.
  • the pattern signal is provided to a set of optics (e.g., reflective or refractive optics, or electron beam optics) and the output of the optics is projected as a pattern on a glass plate, thus producing a mask useful in integrated circuit manufacture.
  • Figure 3 is a block diagram of a digital signal processor (which can be used as the digital signal processor of Fig. 2) configured to perform convolution
  • Figure 4 is a block diagram of a lithography system which is a variation on the system of Figure 2.
  • Fig. 5 is a simplified elevational view of a computer-readable storage medium (a CD-ROM) which stores computer-executable instructions, wherein the instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
  • data denotes one or more signals indicative of data words.
  • data indicative of a pattern "p" is convolved (in accordance with the invention) with data indicative of a smooth kernel "d" denotes that one or more signals indicative of the pattern p is (are) processed with another set of one or more signals indicative of the kernel d, to generate data (i.e., one or more signals) indicative of the convolution result.
  • $k(x,y) = a\left(r_0^4 - 2r_0^2(x^2 + y^2) + (x^2 + y^2)^2\right)$ for $x^2 + y^2 \le r_0^2$
  • convolution in accordance with the invention employs separated kernels.
  • the basic premise is that, in order to perform two-dimensional convolution in accordance with the invention, one "factors" the separated kernel and performs two one-dimensional convolutions. These convolutions are not dependent on the radius, and in practice this technique will reduce the number of convolution operations drastically.
  • $k(x, y) = k_x(x)\,k_y(y)$.
  • $k_1(x) = a(b^2 - x^2)$ for $|x| \le R$,
  • $A_c k_c(x) = e\,k_c(x+i) + f\,k_c(x+j) + g\,k_c(x+m) + h\,k_c(x+n)$,
  • $A_c k_c(x)$ will be zero except at a small number of specific values of the parameter x.
  • to compute $(A_c k_c \times p)(x)$ for each value of x, one need only compute a small number of additions (and an even smaller number of additions near the boundaries of the interval over which the convolution is performed).
  • a small number of initial values of $A_c^{-1}(A_c k_c \times p)(x)$ are found by direct computation to "ignite" the recursion relation calculation.
  • $A_c k_c(x) = k_c(x+3) - 3k_c(x+2) + 3k_c(x+1) - k_c(x)$.
  • to compute $(A_c k_c \times p)(x)$ for each value of x, one need only compute seven additions (and an even smaller number of additions near the boundaries of the interval over which the convolution is performed).
  • $k_c(x+3) = A_c k_c(x) + k_c(x) + 3k_c(x+2) - 3k_c(x+1)$, and recursively solving for x.
  • the final difficulty is that we need three initial values of c to do this.
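A sketch of the cap convolution with the third-difference annihilator $A_c k_c(x) = k_c(x+3) - 3k_c(x+2) + 3k_c(x+1) - k_c(x)$ follows. For illustration it assumes acyclic convolution with p taken as zero outside its N samples, output kept for x = 0, …, N − 1, and integer b and R with a = 1. The three ignition values r(0), r(1), r(2) are computed directly; every later value uses the sparse annihilated kernel plus the recurrence obtained by solving the difference equation for r(x + 3). Function names are hypothetical.

```python
def cap_kernel(b, R, a=1):
    """Quadratic cap k_c(x) = a*(b*b - x*x) for |x| <= R, as a dict offset -> value."""
    return {x: a * (b * b - x * x) for x in range(-R, R + 1)}


def conv_at(kernel, p, x):
    """Direct acyclic (kernel x p)(x) = sum_t kernel(t) p(x - t), with p zero outside [0, N)."""
    return sum(v * p[x - t] for t, v in kernel.items() if 0 <= x - t < len(p))


def annihilate(kernel):
    """A_c k(x) = k(x+3) - 3k(x+2) + 3k(x+1) - k(x), keeping only the nonzero entries."""
    def k(t):
        return kernel.get(t, 0)
    out = {}
    for x in range(min(kernel) - 3, max(kernel) + 1):
        v = k(x + 3) - 3 * k(x + 2) + 3 * k(x + 1) - k(x)
        if v:
            out[x] = v
    return out


def cap_convolve(kernel, p):
    """r(x) = (k_c x p)(x) for x = 0..N-1: ignite three values, then recur."""
    ak = annihilate(kernel)                        # sparse: entries only near the two edges
    r = [conv_at(kernel, p, x) for x in range(3)]  # ignition: r(0), r(1), r(2) directly
    for x in range(len(p) - 3):
        # Solve (A_c r)(x) = (A_c k_c x p)(x) for r(x + 3).
        r.append(conv_at(ak, p, x) + 3 * r[x + 2] - 3 * r[x + 1] + r[x])
    return r


kern = cap_kernel(b=5, R=4)                        # hypothetical cap parameters
p = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
r = cap_convolve(kern, p)
```

Because $A_c k_c$ has at most six nonzero entries, three near each edge of the cap, each output after ignition costs a handful of multiply-adds regardless of R.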
  • the one-dimensional convolution for the positive decay region is performed using the one-tailed Laplacian decay kernel
  • $A_+ k_+(x) = k_+(x+1) - e^{-\lambda} k_+(x)$.
  • $r(x+1) = (A_+ k_+ \times p)(x) + e^{-\lambda} r(x)$.
  • the one-dimensional convolution for the negative decay region is exactly the same as the one-dimensional convolution for the positive decay region, except that the recursion should be taken in the opposite direction. This slows the algorithm, since the $k_c$ and $k_+$ convolutions can be calculated on a single pass of the pointers, but the $k_-$ convolution requires a second, reversed pass.
  • $A_- k_-(x) = k_-(x-1) - e^{-\lambda} k_-(x)$.
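The decay recursions can be sketched as follows. For illustration, assume the one-tailed kernel $k_+(x) = e^{-\lambda(x - s)}$ for x ≥ s and 0 otherwise. Then $A_+ k_+(x) = k_+(x+1) - e^{-\lambda} k_+(x)$ is a single unit impulse at x = s − 1, and the recurrence reduces to $r(x) = p(x - s) + e^{-\lambda} r(x - 1)$. The negative decay region uses the mirrored kernel $k_-(x) = k_+(-x)$, handled here by running the same recursion over the reversed row. The rate λ, offset s, and function names are hypothetical.

```python
import math


def decay_direct(p, lam, start):
    """Reference O(N^2): r(x) = sum_{t >= start} exp(-lam (t - start)) p(x - t), x = 0..N-1."""
    return [sum(math.exp(-lam * (t - start)) * p[x - t] for t in range(start, x + 1))
            for x in range(len(p))]


def decay_recursive(p, lam, start):
    """O(N) one-pass version of the same convolution."""
    c = math.exp(-lam)
    r, prev = [], 0.0          # ignition: r(-1) = 0 because the kernel is one-tailed
    for x in range(len(p)):
        src = x - start        # A_+ k_+ is a unit impulse, so its convolution just shifts p
        prev = (p[src] if src >= 0 else 0.0) + c * prev
        r.append(prev)
    return r


def decay_recursive_negative(p, lam, start):
    """Mirrored kernel k_-(x) = k_+(-x): the same recursion taken in the opposite direction."""
    return decay_recursive(p[::-1], lam, start)[::-1]


p = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]     # hypothetical row of a pattern
lam, s = 0.5, 2                        # hypothetical decay rate and offset
r_pos = decay_recursive(p, lam, s)
r_neg = decay_recursive_negative(p, lam, s)
```

Each output costs one multiplication and one addition; the reversed traversal is why the negative region needs its own sweep.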
  • the algorithm takes as an input a pattern p(x, y) for (x, y) ∈ D, with D being a rectangular domain of given size. It returns r(x, y) = (k × p)(x, y) for all points in D, where k(x, y) is defined in equation (1) above.
  • step (a): Loop over the rows. For each row, compute the convolution of the row with $k_1(x)$ by computing a cap convolution (step i), a positive decay convolution (step ii), and a negative decay convolution (step iii) on the row, adding together (step iv) the three convolutions produced in steps (i), (ii), and
  • steps (i), (ii), (iii), and (iv) are performed as follows for each row of the pattern:
  • step (c): Repeat step (a), this time on rows of $(k_1 \times p)^T$ rather than on rows of p, thereby convolving all columns of $k_1 \times p$ with $k_1$; (d) transpose the result of step (c) to produce r(x, y), and return.
  • step (b) includes the step of storing $(k_1 \times p)^T$ in the memory, such that each row of $(k_1 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of p(x, y).
  • the algorithm is preferably performed by a processor programmed to compute each annihilated convolution and simultaneously inflate it. This way, there is no need to store the annihilated convolution before it is flattened. Also, both $k_+$ and $k_c$ are preferably computed on the same pass through the processor. Also, two full transpositions will typically not be necessary where a column-ferrying technique is employed to compute the cap and positive decay region convolutions, and the column return, negative convolution, and add should all be part of the same loop. It is also likely that, when employing some types of processors to perform the algorithm, in-place operations are fastest.
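The row/transpose structure of steps (a) through (d) can be sketched in a few lines, assuming a square N × N pattern, cyclic boundaries, and a separated kernel whose two one-dimensional factors are the same list `k`. For brevity each row convolution here is a direct O(N) sum per output; in the method described above that inner routine would be the cap and decay recursions instead. All names and data are hypothetical.

```python
def convolve_rows(p, k):
    """Cyclic 1-D convolution of every row of p with kernel k (len(k) == row length)."""
    n = len(k)
    return [[sum(k[(x - t) % n] * row[t] for t in range(n)) for x in range(n)] for row in p]


def transpose(a):
    return [list(col) for col in zip(*a)]


def separable_convolve(p, k):
    """Row pass, transpose, row pass, transpose back: convolves p with k(x) * k(y)."""
    rows_done = convolve_rows(p, k)              # step (a): every row of p with k
    cols_as_rows = transpose(rows_done)          # step (b): (k x p)^T
    both_done = convolve_rows(cols_as_rows, k)   # step (c): rows of (k x p)^T
    return transpose(both_done)                  # step (d): back to r(x, y)


def direct_convolve_2d(p, k):
    """Reference: r[y][x] = sum_{j,i} k(y - j) k(x - i) p[j][i], all indices cyclic."""
    n = len(k)
    return [[sum(k[(y - j) % n] * k[(x - i) % n] * p[j][i] for j in range(n) for i in range(n))
             for x in range(n)] for y in range(n)]


k = [4, 2, 1, 0, 1, 2]            # hypothetical one-dimensional factor, N = 6
p = [[(x * x + 3 * y) % 3 for x in range(6)] for y in range(6)]
assert separable_convolve(p, k) == direct_convolve_2d(p, k)
```

Per output pixel, the direct two-dimensional sum touches N² kernel values while the separated version touches 2N, which is the saving that factoring the kernel buys before any recursion is applied.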
  • the above-described embodiment of convolution in accordance with the invention is dependent on the size of the pattern since the Laplacian decay is $e^{-x}$ and the Gaussian is $e^{-x^2}$.
  • the error is dependent on the size of the domain.
  • Example 2 Squared cosine separated kernel.
  • Convolution using the squared cosine separated kernel is faster and cleaner than convolution using the above-described "quadratic with Laplacian decay" kernel, since convolution using the squared cosine kernel is independent of the pattern's dimension.
  • the squared cosine kernel also allows a mostly in-place method for the set of row convolutions in one pass, and also a ferrying technique with calculations done on the transfer. Since the squared cosine kernel is more stable, floating point arithmetic can be used.
  • the two-dimensional kernel is defined as:
  • $k_c(x) = 2\cos(\pi x / R)$ for $|x| \le R$
  • the other kernel is
  • k c is annihilated by A c .
  • the annihilation operator A that we will use is the "product" of these two operators:
  • AM Ac o AM + 2) - (1 + 2 ⁇ a ⁇ LMx + 1) ⁇ M) ⁇
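The following sketch checks numerically the fact behind this construction: the second-order operator $A_c k(x) = k(x+2) - 2\cos\theta\,k(x+1) + k(x)$ annihilates a sampled cosine $\cos(\theta x)$, the first difference $k(x+1) - k(x)$ annihilates constants, and their product $k(x+3) - (1 + 2\cos\theta)k(x+2) + (1 + 2\cos\theta)k(x+1) - k(x)$ therefore annihilates $\cos^2(\theta x/2) = (1 + \cos\theta x)/2$. The step θ (for example θ = π/R for a kernel varying as cos(πx/R)) and the helper names are assumptions, not values taken from the patent.

```python
import math


def shift_operator(coeffs, k, x):
    """Apply sum_j coeffs[j] * k(x + j) to a kernel k given as a function of x."""
    return sum(c * k(x + j) for j, c in enumerate(coeffs))


def compose(op1, op2):
    """Product of two shift operators = polynomial product of their coefficient lists."""
    out = [0.0] * (len(op1) + len(op2) - 1)
    for i, a1 in enumerate(op1):
        for j, a2 in enumerate(op2):
            out[i + j] += a1 * a2
    return out


theta = math.pi / 8                          # hypothetical angular step per sample
a_1 = [-1.0, 1.0]                            # k(x+1) - k(x): annihilates constants
a_c = [1.0, -2.0 * math.cos(theta), 1.0]     # k(x+2) - 2cos(theta) k(x+1) + k(x)
a = compose(a_1, a_c)                        # third-order product operator


def cosine(x):
    return math.cos(theta * x)


def squared_cosine(x):
    return math.cos(theta * x / 2) ** 2      # equals (1 + cos(theta x)) / 2
```

`a_c` alone does not annihilate the squared cosine (it leaves the constant part behind); only the product operator `a` removes both components.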
  • step (b): Transpose $k_2 \times p$ to produce $(k_2 \times p)^T$; (c) Repeat step (a), this time on rows of $(k_2 \times p)^T$ rather than on rows of p(x, y), thereby convolving all columns of $k_2 \times p$ with $k_2$.
  • a final step of transposing the result of step (c) is performed to produce r(x, y), and the processor then returns to its initial state.
  • step (b) includes the step of storing $(k_2 \times p)^T$ in the memory, such that each row of $(k_2 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of p(x, y).
  • the naive computation count for the described algorithm is as follows: there are h repetitions of step (a), where h is the height; computing Ak involves one multiplication and three adds, done w + R + 3 times (where w is the width); the recursion requires one multiplication and four adds done w + R + 3 times
  • VN array requires:
  • steps (d) and (e) are repeated for each additional dimension of pattern p (with the transposition operation as defined above in the Summary), and the transposition of the result of the final repetition of step (e) is a close approximation (or exactly equal) to the desired convolution Dp.
  • in step (c), a small number of the lowest components of r are computed directly during the preliminary ignition step, and the rest of the components of r are then determined using the natural recurrence relation.
  • in step (c), a small number of large negative components of r are computed directly during the preliminary ignition step, and the rest of the components of r are then determined using the natural recurrence relation.
  • discrete convolution is performed employing a matrix formalism, whereby a 2-dimensional pixel rectangle is converted into a 1-dimensional column vector using lexicographical indexing.
  • the circulant matrix F becomes an NN′-by-NN′ monstrosity, but when an annihilation operator A is applied, the operator AF will be sparse.
  • This class of embodiments of the invention has the advantage of converting nonvanishing circular regions to deterministically indexed matrix elements.
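The lexicographic formalism can be illustrated on a tiny example (a sketch with hypothetical names and data): pixel (y, x) of an H × W rectangle maps to vector position yW + x, and cyclic two-dimensional convolution becomes multiplication by an HW × HW matrix. That matrix is block-circulant with circulant blocks, which is the structure to which the annihilation operator is then applied.

```python
def lexicographic(y, x, width):
    """Pixel (row y, column x) of a height-by-width rectangle -> position in a column vector."""
    return y * width + x


def block_circulant(k2d):
    """(H*W)-by-(H*W) matrix M with (M vec(p))[lex(y, x)] = cyclic 2-D convolution of k2d and p."""
    h, w = len(k2d), len(k2d[0])
    m = [[0] * (h * w) for _ in range(h * w)]
    for y in range(h):
        for x in range(w):
            for j in range(h):
                for i in range(w):
                    m[lexicographic(y, x, w)][lexicographic(j, i, w)] = k2d[(y - j) % h][(x - i) % w]
    return m


def convolve_2d(k2d, p):
    """Reference cyclic 2-D convolution on the rectangle itself."""
    h, w = len(k2d), len(k2d[0])
    return [[sum(k2d[(y - j) % h][(x - i) % w] * p[j][i] for j in range(h) for i in range(w))
             for x in range(w)] for y in range(h)]


k2d = [[4, 2, 0, 2], [2, 1, 0, 1], [2, 1, 0, 1]]        # hypothetical 3 x 4 kernel
p = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]          # hypothetical 3 x 4 pattern
vec_p = [p[y][x] for y in range(3) for x in range(4)]   # lexicographic flattening
m = block_circulant(k2d)
vec_r = [sum(a * b for a, b in zip(row, vec_p)) for row in m]
```

Even for this 3 × 4 rectangle the matrix has 144 entries; forming it explicitly is impractical for realistic pattern sizes.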
  • FIG. 1 is a block diagram of a computer system which embodies the invention.
  • the system includes processor 2 (which is programmed with software for implementing any embodiment of the inventive convolution method), display device 4, input device 6, and memory 8 (and optionally also output device 5) coupled to processor 2.
  • processor 2 is a typical processor configured to process binary data, it is programmed with software for implementing a "discrete" implementation of the inventive method.
  • processor 2 is programmed to determine (from a user-specified convolution kernel d of interest) particular parameters of a spline kernel k which cause the spline kernel to approximate the convolution kernel d (subject to user-specified constraints).
  • processor 2 generates one or more look-up tables, stores them in memory 8 (or a cache memory associated with processor 2), and then accesses the stored look-up tables during performance of the invention. The user controls processor 2
  • Output device 5 (which can be employed instead of or in addition to display device 4) is preferably a pattern-capable device such as a sound reproduction unit, an I/O port (input/output port), or a signal processing (and/or storage) device (or system).
  • FIG. 2 is a block diagram of a lithography system including digital signal processor (“DSP") 10 which is configured to perform convolution (in accordance with the invention) on image data stored in memory unit 14.
  • the image data stored in memory unit 14 determines the pattern p to be convolved.
  • the output data is stored in memory 14 (and optionally undergoes further processing) and/or is output to "pattern signal" generation device 16.
  • Device 16 generates a pattern signal (e.g., a beam of optical or other electromagnetic radiation having time-varying amplitude or an electron beam having time-varying amplitude) in response to data it receives
  • device 16 emits a beam of optical radiation which is incident on optics 18 to cause optics 18 to project an output beam on lithography target 20.
  • Optics 18 scans the output beam across lithography target 20, in response to scan control signals from control unit 12.
  • the amplitude of the beam emitted from device 16 varies as a function of time (in response to the output data from DSP 10, which assumes the scan pattern determined by the scan control signals from unit 12) in such a manner that the scanned output beam (the output of optics 18) exposes target 20 to a pattern of pixels.
  • device 16 emits an electron beam which is incident on optics 18, to cause optics 18 to project an output electron beam on lithography target 20.
  • Optics 18 scans the output electron beam across target 20, in response to scan control signals from control unit 12.
  • the amplitude of the electron beam emitted from device 16 varies as a function of time (in response to the output data from DSP 10, which assumes the scan pattern determined by the scan control signals from unit 12) in such a manner that the scanned output beam from optics 18 exposes target 20 to a pattern of pixels.
  • device 16 can emit radiation which is focused (without being scanned) by optics 18 to project on target 20 an image comprising pixels, said image determining a pattern.
  • one embodiment of device 16 emits optical radiation which is focused by optics 18 so as to project from optics 18 as a pattern on target 20, without the need for optics 18 to scan any beam across target 20.
  • examples of such a pattern signal include a beam of optical or other radiation to be scanned by optics 18, an electron beam to be scanned by optics 18, and radiation to be focused by but not scanned by optics 18.
  • Optics 18 can be a set of reflective and/or refractive optics (with or without scanning capability, including means for moving one or more elements of the optics to scan a beam across target 20), or it can be a set of electron beam optics (with scanning capability, including means for moving one or more elements thereof to scan an electron beam across target 20).
  • the output of optics 18 is projected (e.g., including by being scanned) as a pattern on lithography target 20.
  • target 20 is a glass plate (so that projection of the pattern thereon produces a mask useful in integrated circuit manufacture) or a semiconductor wafer.
  • Optics 18 typically focuses the pattern signal so that a very small pattern is projected on target 20.
  • the "raw" pattern signal that is output from device 16 determines a pattern, diffraction artifacts (or other artifacts) introduced by optics 18 (or inherent in the interaction between the imaging beam and target
  • the "raw" pattern signal output from device 16 is an electron beam to be focused by electron beam optics 18, and scanned onto a sequence of pixels on target 20, in an effort to project on target 20 a pattern determined by the amplitude of the focused electron beam incident on each single pixel of the sequence.
  • the well known "proximity problem" (discussed above) causes exposure of an area surrounding each pixel on which the focused electron beam is incident (due to scattering of electrons away from each such pixel to the surrounding areas of the target).
  • the pattern actually produced on target 20 is determined by superposition of the results of directing the focused electron beam at each pixel of the sequence, where a multi-pixel region is exposed each time the focused electron beam is incident at one of the pixels of the sequence.
  • DSP 10 is configured to generate output data which will cause device 16 to output a "raw" pattern signal having the characteristics that are needed to produce a desired pattern on target 20.
  • DSP 10 performs a deconvolution operation on a large array of pixels (image data stored in memory 14) in order to compensate for any artifacts expected to be introduced by optics 18 and/or any expected scattering (by target 20) of an electron beam incident on target 20 from optics 18.
  • the deconvolution operation performed by DSP 10 includes a convolution operation (performed in accordance with the invention) on stored image data that it retrieves from memory 14, where the image data determines a very large array of pixels which in turn determines a pattern "p".
  • Controller 12 of the Fig. 2 system provides appropriate control signals to units 10, 14, 16, and 18, and is capable (for example) of downloading instructions to DSP 10 to cause it to execute the convolution operation with specified parameters.
  • Fig. 3 is a block diagram of a digital signal processor (DSP) which can be used as DSP 10 of Fig. 2, and which is configured to perform convolution in accordance with the invention on image data.
  • the DSP of Fig. 3 includes arithmetic computational unit (ACU) 34 which includes addition and multiplication circuitry (for performing the matrix multiplication and recurrence relation operations required to implement the convolution), program memory 30 (which stores the instructions which are executed by the DSP to perform the convolution operation), program control unit (PCU) 32, memory management unit 36, and data memory 38, connected as shown.
  • controller 12 of Fig. 2 loads appropriate instructions into memory 30, and data indicative of a pattern p (the data labeled "INPUT" in Fig. 3) is loaded into memory 38.
  • PCU 32 includes instruction fetch circuitry for fetching a sequence of the instructions from program memory 30, instruction decoding circuitry, and registers for storing control bits generated by the decoding circuitry for assertion at appropriate times to unit 36 and/or unit 34.
  • Memory management unit 36 is configured to generate address signals (each identifying a memory location in memory 38 for writing data to or reading data from) in response to control bits from PCU 32, and to assert such address signals over an address bus to memory 38.
  • unit 36 asserts address signals to data memory 38.
  • data memory 38 sends signals indicative of data to ACU 34 (over a data bus).
  • data indicative of the final convolution result is output from memory 38 (as output data "OUTPUT1") to pattern signal generator 16.
  • data indicative of the final convolution result streams directly (or through a buffer) to pattern signal generator 16 from ACU 34 (as output data "OUTPUT2").
  • Fig. 4 is a variation on the system of Fig. 2, in which elements 16, 18, and 20 are identical to identically numbered elements of Fig. 2.
  • element 46 is configured to perform convolution (in accordance with any embodiment of the invention) on image data (determining the pattern p to be convolved) which it receives from memory unit 44.
  • the output data is streamed directly from element 46 to pattern signal generation device 16, and device 16 generates a pattern signal in response to the output data from element 46.
  • convolution kernels "d" employed in the field of electron beam lithography proximity error correction are sufficiently smooth to be adequately approximated by a separated-spline kernel "k."
  • Convolution kernels that are noisy (random), such as those encountered in cryptography, are typically not sufficiently smooth to be adequately approximated by a separated-spline kernel.
  • Fig. 5 is a simplified elevational view of computer-readable storage medium 50 (which is a CD-ROM) which stores computer-executable instructions (software).
  • the instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
  • the invention is implemented by hardwired circuitry (e.g., custom or dedicated electronic circuitry) or FPGA systems (field programmable gate array based computing systems) rather than in software or by a system including a digital signal processor ("DSP").

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)

Abstract

A fast convolution method applicable to convolving a signal (indicative of an n-dimensional pattern, where n is greater than or equal to two) with a smooth kernel that can be approximated by a separated-spline kernel, and a system configured to perform such method using software or signal processing circuitry. Unlike Fourier-based convolution methods, which require on the order of N log N arithmetic operations for a signal of length N, the method of the invention requires only on the order of N arithmetic operations. Unlike wavelet-based convolution approximations (which typically also require more arithmetic operations than are required in accordance with the invention to convolve the same signal), the method of the invention is exact for convolution kernels which are spline kernels.

Description

METHOD AND APPARATUS FOR FAST SIGNAL CONVOLUTION USING SEPARATED-SPLINE KERNEL
Cross-reference to Related Application
The present application is a continuation-in-part of pending U.S. Patent Application No. 09/480,908, filed January 11, 2000 by Richard E. Crandall, and which is assigned to Etec Systems, Inc. The full text of U.S. Patent Application No. 09/480,908 is incorporated herein by reference.
Field of the Invention
The invention relates to signal processing methods and apparatus for performing convolution on data indicative of a pattern (e.g., image data indicative of a pixel array). In accordance with the invention, the convolution kernel is (or is approximated by) a separated-spline function.
Background of the Invention
Convolution is commonly performed on signals in many contexts, including the fields of sound, still image, video, lithography, and radio (radar) signal processing. Typically, the signals to be convolved are pattern signals. Each of the expressions "pattern" and "pattern signal" is used herein in a broad sense to denote a one-dimensional sequence or two-dimensional (or higher dimensional) array of data words (which can be, but need not be, pixels). Typically, the data words comprise binary bits, and the convolution is performed in discrete fashion on the binary bits using software, digital signal processing circuitry, custom hardware, or FPGA systems (field programmable gate array based computing systems).
The term "data" herein denotes one or more signals indicative of data, and the expression "data word" herein denotes one or more signals indicative of a data word. The motivations for implementing convolution rapidly, even when processing data indicative of very large patterns, are myriad. The present invention was motivated by the need for proximity correction in the field of lithography. In such problems, one attempts a two-dimensional convolution between data indicative of a large pattern "p" (where the pattern is a pixel array) and a diffusion kernel "d". Often the kernel "d" is a Gaussian or a superposition of Gaussians, or is otherwise a smooth kernel. More specifically, the present invention grew out of attempts to establish a suitable "O(NN')" algorithm (an algorithm requiring not more than on the order of NN' multiplications and additions) for convolving a two-dimensional pattern comprising NN' pixels (where each of N and N' is very large) with a Gaussian kernel (or other smooth kernel) such that the convolution is exact or very close to exact.
The objective in performing proximity correction (in the field of lithography) is to generate a "raw" optical signal (or "raw" electron beam signal) which can be input to a set of reflective or refractive optics (or electron beam optics), in order to cause the output of the optics to produce a desired pattern on a mask or wafer. To determine the characteristics of a raw optical signal (or raw electron beam signal) that are needed to produce the desired pattern on the mask or wafer, a deconvolution operation is typically performed on a very large array of pixels (which determine a pattern "p") in order to correct for the well known proximity problem. In the case of electron beam lithography, the proximity problem results from electron scattering in the substrate (mask or wafer) being written. Such scattering exposes broadened areas on the substrate to electrons (i.e., an area surrounding each pixel to be written in addition to the pixel itself), with the scattering effectively broadening the electron beam beyond the beam diameter with which the beam is incident on the substrate.
In nearly all proximity correction schemes, such a deconvolution operation includes at least one convolution step. Accordingly, in performing typical proximity correction, a very large array of pixels (determining a pattern "p") must be convolved with a diffusion kernel. Although such a convolution is typically performed on a pattern comprising a very large array of binary pixels, this restriction is not essential in the following discussion and is not essential to implementation of the invention. Indeed, the invention can implement convolution on data indicative of any pattern "p" with a smooth convolution kernel "d" having characteristics to be described below. For data indicative of a pattern "p" and a convolution kernel "d" we consider the cyclic convolution:
where $\times_C$ denotes that the convolution operator has cyclic character, and an acyclic convolution which differs only in the indicial constraint and range:
where $\times_A$ denotes that the convolution operator has acyclic character.
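For reference, the standard index forms consistent with this description are sketched below. They are an assumption, since the original formulas are not reproduced here: the cyclic form takes d zero-padded to length N, and in the acyclic form $0 \le i < M$ and $0 \le j < N$.

```latex
\[
(d \times_C p)_n = \sum_{i + j \equiv n \pmod{N}} d_i \, p_j, \qquad 0 \le n \le N - 1,
\]
\[
(d \times_A p)_n = \sum_{i + j = n} d_i \, p_j, \qquad 0 \le n \le M + N - 2.
\]
```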
For simplicity, we restrict much of the discussion herein to one-dimensional cases (in which the pattern p is an ordered set of N data values and the kernel is an ordered set of M values). Despite this, it should be appreciated that in typical embodiments of the invention, the pattern is two-dimensional (a two-dimensional array of data values determines the pattern) and the summation defining the convolution (a summation which corresponds to either one of the summations set forth in the previous paragraph) is over index k as well as index j of the array $p_{jk}$. In the case of a two-dimensional pattern p determined by an N by N array of data values, the indices n, i, j and domain lengths N in the formulae set forth in the previous paragraph are 2-vectors.
In one-dimensional cases, the result of the cyclic convolution has length N (it comprises N data values), and the result of the acyclic convolution has length M+N-1.
It is standard that a cyclic convolution $d \times_C p$ can be cast as an equivalent matrix-vector product:
$d \times_C p \equiv Dp$,
where D is the circulant matrix of d (hereinafter the "circulant" of d), whose 1-dimensional form is defined (assuming that N is greater than 3) as:
Therefore, conventional methods for cyclic convolution can be cast in the language of matrix algebra. Acyclic convolution can be obtained with similar matrix manipulations. For the sake of simplicity, we will use the symbol × hereinbelow to denote convolution having either acyclic or cyclic character. In most of the discussion the symbol will refer to convolution having acyclic character. Those of ordinary skill in the art will recognize that, given a specified acyclic convolution, a corresponding cyclic convolution can be implemented by slight modification of the parameters (e.g., the boundary conditions and definition of the circulant of the kernel) that determine the acyclic convolution.
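As a quick numerical check of the circulant relation $d \times_C p \equiv Dp$, and of the conventional zero-padding route from a cyclic convolution to an acyclic one of length M+N-1, here is a minimal Python sketch. The helper names (`cyclic_convolve`, `circulant`, `matvec`, `acyclic_via_cyclic`) are illustrative and not from the patent; the dense matrix product is deliberately O(N^2).

```python
def cyclic_convolve(d, p):
    """Direct cyclic convolution: r[n] = sum over i + j = n (mod N) of d[i] * p[j]."""
    N = len(p)
    assert len(d) == N, "pad the kernel to the pattern length first"
    return [sum(d[i] * p[(n - i) % N] for i in range(N)) for n in range(N)]


def circulant(d):
    """Circulant of d: D[n][m] = d[(n - m) mod N], so that D p equals d x_C p."""
    N = len(d)
    return [[d[(n - m) % N] for m in range(N)] for n in range(N)]


def matvec(D, p):
    """Dense matrix-vector product (O(N^2), for illustration only)."""
    return [sum(a * b for a, b in zip(row, p)) for row in D]


def acyclic_via_cyclic(d, p):
    """Acyclic convolution of length M + N - 1, obtained by zero-padding both
    operands to that length and then convolving cyclically."""
    L = len(d) + len(p) - 1
    return cyclic_convolve(list(d) + [0] * (L - len(d)), list(p) + [0] * (L - len(p)))


d = [1, 2, 3, 0, 0]
p = [4, 0, 1, 2, 5]
print(matvec(circulant(d), p) == cyclic_convolve(d, p))  # True
print(acyclic_via_cyclic([1, 2, 3], [4, 5]))            # [4, 13, 22, 15]
```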
Above-referenced U.S. Patent Application No. 09/480,908 discloses a fast convolution method whose central idea (in one-dimensional embodiments) is to approximate a smooth kernel d by a polynomial spline kernel f (where f is a spline function f(x) which is piecewise a polynomial of degree δ with L pieces $f_i(x)$), and then to use appropriate operators that annihilate (or flatten) each polynomial of given degree (in a manner to be explained) to calculate the convolution of f and p quickly. In some embodiments, the smooth kernel d is approximated by a spline kernel f which is not a polynomial spline kernel, but which consists of L pieces defined over adjacent segments of its domain (in typical two-dimensional cases, the latter spline kernel is a radially symmetric function whose domain is some continuous or discrete set of values of the radial parameter). Though "spline" convolution as described in U.S. Application No. 09/480,908 has features reminiscent of conventional wavelet schemes and is an O(N) algorithm (as are wavelet schemes), an advantage of "spline" convolution is that it can be performed (on data indicative of a pattern p consisting of N data values) with cN arithmetic operations (multiplications and additions), whereas conventional wavelet convolution on the same data would require bN arithmetic operations, where the factor "b" is typically (i.e., with typical error analysis) significantly larger than the factor "c." In other words, the implied big-O constant for the spline convolution is significantly smaller than the typical such constant for conventional wavelet convolution.
Spline convolution, as described in U.S. Application No. 09/480,908, is a method for performing cyclic or acyclic convolution of a pattern "p" (i.e., data indicative of a pattern "p") with a smooth diffusion kernel d, to generate data indicative of the convolution result r = Dp, where D is the circulant of d. The pattern "p" can be one-dimensional in the sense that it is determined by a continuous (or discrete) one-dimensional domain of data values (e.g., pixels), or it can be two-dimensional in the sense that it is determined by a continuous two-dimensional domain of data values (or a two-dimensional array of discrete data values), or p can have dimension greater than two. In typical discrete implementations, the pattern p is one-dimensional in the sense that it is determined by a discrete, ordered set of data values (e.g., pixels) $p_i$, where i varies from 0 to N-1 (where N is the signal length), or it is two-dimensional in the sense that it is determined by an array of data values $p_{ij}$, where i varies from 0 to N-1 and j varies from 0 to N-1, or it has dimension greater than two (it is determined by a three- or higher-dimensional set of data values). Typically, the kernel d is determined by an array of data values $d_{ij}$, where i varies from 0 to N-1 and j varies from 0 to N-1 (but the kernel d can alternatively be determined by a discrete set of data values $d_0$ through $d_{N-1}$).
In some embodiments described in U.S. Application No. 09/480,908, the convolution Dp is accomplished by performing the steps of:
(a) approximating the kernel d by a polynomial spline kernel f (unless the kernel d is itself a polynomial spline kernel, in which case d = f and step (a) is omitted);
(b) calculating $q = Bp = \Delta_{\delta+1} F p$, where F is the circulant of kernel f, and $\Delta_{\delta+1}$ is an annihilation operator (whose form generally depends on the degree δ of the polynomial segments of f) which operates on circulant F in such a manner that $\Delta_{\delta+1} F = B$ is sparse; and
(c) back-solving q to determine r = Fp.
In cases in which the kernel d is itself a polynomial spline kernel (so that d = f and F = D), the method yields an exact result (r = Dp). Otherwise, the error inherent in the method is (f - d) × p, where × denotes convolution, and thus the error is bounded easily.
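One simple way to make "bounded easily" concrete is the standard inequality below (not taken from the patent): each component of the error term is bounded by the $\ell_1$ norm of the kernel error times the largest pattern magnitude.

```latex
\[
\bigl|((f - d) \times p)_n\bigr|
  = \Bigl|\sum_i (f_i - d_i)\, p_{n-i}\Bigr|
  \le \sum_i |f_i - d_i|\, |p_{n-i}|
  \le \|f - d\|_1 \, \max_j |p_j| .
\]
```

For a binary pattern ($|p_j| \le 1$), every component of the result is therefore within $\|f - d\|_1$ of the exact convolution.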
In one-dimensional cases (in which the pattern to be convolved is a one-dimensional pattern of length N), $\Delta_{\delta+1}$ has the form of the N × N circulant matrix defined as follows:
in which each entry is a binomial coefficient, and δ is the maximum degree of the spline segments of spline kernel f. For example, δ = 2 where the spline kernel f comprises quadratic segments. In two- or higher-dimensional cases, the annihilation operators can be defined as
where $\partial^n_{x_k}$ is the n-th partial derivative with respect to the k-th of d coordinates. For example, the Laplacian
$\nabla^2 = \partial^2_{x_1} + \cdots + \partial^2_{x_d}$
acts on functions of $(x_1, x_2, \ldots, x_d)$.
In the one-dimensional case, the end points of each segment (the "pivot points") of spline kernel f may be consecutive elements $d_l$ and $d_m$ of kernel d, and step (a) can be implemented by performing curve fitting to select each segment of the spline kernel as one which adequately matches a corresponding segment of the kernel d. In some implementations, appropriate boundary conditions are satisfied at each pivot point, such as by derivative-matching or satisfying some other smoothness criterion at the pivot points.
In some implementations described in Application No. 09/480,908, step (c) includes a preliminary "ignition" step in which a small number of the lowest components of r = Fp are computed by exact multiplication of p by a few rows of F, and then a step of determining the rest of the components of r using a natural recurrence relation determined by the spline kernel and the operator $\Delta_{\delta+1}$. For example, in the one-dimensional case, the lowest components of r are $r_0, r_1, \ldots, r_\delta$, where δ is the maximum degree of the spline segments of spline kernel f (for example $r_0$, $r_1$, and $r_2$ where the spline kernel comprises quadratic segments), and these (δ + 1) components are determined by exact multiplication of p by (δ + 1) rows of F. The (δ + 1) components can alternatively be determined in other ways. Then, the rest of the components $r_s$ are determined using a natural recurrence relation determined by the operator $\Delta_{\delta+1}$. The "ignition" operation which generates the components $r_0, r_1, \ldots, r_\delta$ can be accomplished with O(N) computations. The recurrence relation calculation can also be accomplished with O(N) computations.
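The annihilate-then-recur idea can be illustrated in one dimension with a minimal Python sketch. It assumes an acyclic convolution with a zero-padded pattern and uses the backward (δ+1)-th difference as the annihilation operator, so the ignition values are simply zeros; the patent's circulant $\Delta_{\delta+1}$ and its ignition by exact multiplication with a few rows of F are not reproduced literally. All function names are hypothetical.

```python
from math import comb


def acyclic_convolve(d, p):
    """Reference O(M*N) acyclic convolution, r[n] = sum over i + j = n of d[i] * p[j]."""
    r = [0] * (len(d) + len(p) - 1)
    for i, di in enumerate(d):
        for j, pj in enumerate(p):
            r[i + j] += di * pj
    return r


def difference_coeffs(order):
    """Backward-difference coefficients: c[k] = (-1)^k * C(order, k)."""
    return [(-1) ** k * comb(order, k) for k in range(order + 1)]


def spline_convolve(f, p, degree):
    """Acyclic f x p for a kernel f that is piecewise polynomial of the given degree.

    1. Annihilate: g = c x f, with c the (degree+1)-th difference, is sparse.
    2. q = g x p, using only the nonzero entries of g.
    3. Back-solve c x r = q by recurrence; r[n] = 0 for n < 0 ignites it.
    """
    c = difference_coeffs(degree + 1)
    g = acyclic_convolve(c, f)                 # one-time cost, O(M * degree)
    sparse = [(i, gi) for i, gi in enumerate(g) if gi != 0]
    n_out = len(f) + len(p) - 1
    q = [0] * n_out
    for i, gi in sparse:
        for j, pj in enumerate(p):
            if i + j < n_out:
                q[i + j] += gi * pj
    r = []
    for n in range(n_out):
        # c[0] == 1, so r[n] = q[n] - sum_{k >= 1} c[k] * r[n - k]
        r.append(q[n] - sum(c[k] * r[n - k] for k in range(1, len(c)) if n - k >= 0))
    return r, len(sparse)


R = 4
f = [R * R - x * x for x in range(-R, R + 1)]   # a single quadratic piece (delta = 2)
p = [1, 0, 2, 3, 0, 0, 1, 5]
r, nnz = spline_convolve(f, p, degree=2)
print(r == acyclic_convolve(f, p), nnz)          # True 4
```

Only 4 of the 12 entries of the annihilated kernel are nonzero here, so step 2 costs about 4N operations regardless of the kernel length.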
In other embodiments, the method disclosed in U.S. Application No. 09/480,908 for performing the convolution r = Dp includes the steps of:
(a) approximating the kernel d by a polynomial spline kernel f (unless the kernel d is itself a polynomial spline kernel, in which case d = f and step (a) is omitted);
(b) calculating $q = Bp = \Delta_\delta F p$, where F is the circulant of kernel f and $\Delta_\delta$ is a flattening operator (whose form generally depends on the degree δ of the polynomial segments of f, and which operates on circulant F such that $B = \Delta_\delta F$ is almost everywhere a locally constant matrix); and
(c) back-solving $\Delta_\delta r = q$ to determine r = Fp.
In one-dimensional cases (in which p has length N), $\Delta_\delta$ has the form of the N × N circulant matrix:
in which each entry is a binomial coefficient, and δ is the maximum degree of the spline segments of spline kernel f. In higher-dimensional cases, the flattening operator $\Delta_\delta$ is defined similarly.
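A corresponding sketch of the flattening variant, under the same assumptions (acyclic, zero-padded, backward differences, hypothetical names): $\Delta_\delta$ turns a piecewise-degree-δ kernel into a piecewise-constant one, so its convolution with p reduces to windowed sums that a prefix-sum table answers in O(1) each, after which $\Delta_\delta r = q$ is back-solved by recurrence.

```python
from math import comb


def backward_difference(x, order):
    """(1 - E^-1)^order applied to x, with x taken as zero outside its range.
    The output has length len(x) + order."""
    c = [(-1) ** k * comb(order, k) for k in range(order + 1)]
    return [sum(c[k] * x[n - k] for k in range(order + 1) if 0 <= n - k < len(x))
            for n in range(len(x) + order)]


def flatten_convolve(f, p, degree):
    """Acyclic f x p for a piecewise-polynomial kernel f of the given degree."""
    h = backward_difference(f, degree)      # piecewise constant ("flattened")
    runs, start = [], 0                     # maximal runs (first, last, value) of h
    for i in range(1, len(h) + 1):
        if i == len(h) or h[i] != h[start]:
            if h[start] != 0:
                runs.append((start, i - 1, h[start]))
            start = i
    prefix = [0]                            # prefix[m] = p[0] + ... + p[m-1]
    for v in p:
        prefix.append(prefix[-1] + v)

    def window(lo, hi):                     # p[lo] + ... + p[hi], clipped to p's range
        lo, hi = max(lo, 0), min(hi, len(p) - 1)
        return prefix[hi + 1] - prefix[lo] if lo <= hi else 0

    n_out = len(f) + len(p) - 1
    # a run h[a..b] == v contributes v * (p[n-b] + ... + p[n-a]) to q[n]
    q = [sum(v * window(n - b, n - a) for a, b, v in runs) for n in range(n_out)]
    c = [(-1) ** k * comb(degree, k) for k in range(degree + 1)]
    r = []
    for n in range(n_out):                  # back-solve Delta_delta r = q
        r.append(q[n] - sum(c[k] * r[n - k] for k in range(1, degree + 1) if n - k >= 0))
    return r, runs


R = 4
f = [R * R - x * x for x in range(-R, R + 1)]
p = [1, 0, 2, 3, 0, 0, 1, 5]
r, runs = flatten_convolve(f, p, 2)
print(runs)  # [(1, 1, 7), (2, 8, -2), (9, 9, 7)]
```

The quadratic cap flattens to three constant runs, so the cost per output value depends on the number of runs, not on the kernel length.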
In other embodiments disclosed in U.S. Application No. 09/480,908, the convolution Dp (where D is the circulant of smooth kernel d) includes the steps of:
(a) approximating the kernel d by a spline kernel f which is not a polynomial spline kernel (unless the kernel d is itself such a spline kernel, other than a polynomial spline kernel, in which case d = f and step (a) is omitted);
(b) calculating q = Bp = AFp, where F is the circulant of kernel f and A is an annihilation or flattening operator, where A operates on circulant F in such a manner that AF = B is sparse when A is an annihilation operator, and A operates on circulant F in such a manner that AF = B is almost everywhere a locally constant matrix when A is a flattening operator; and
(c) back-solving Ar = q to determine r = Fp.
To better appreciate the advantages of the present invention over conventional convolution methods, we next explain two types of conventional convolution methods: Fourier-based convolution and wavelet-based convolution.
As is well known, Fourier-based convolution relies on the elegant fact that if F is a Fourier matrix, say $F_{jk} = e^{-2\pi i jk/N}$,
then the transformation $F D F^{-1}$ of the circulant is diagonal, whence we compute:
$Dp = F^{-1} (F D F^{-1}) F p$,
where the far-right operation Fp is the usual Fourier transform, the operation by the parenthetical part is (by virtue of diagonality) dyadic multiplication, and the final operation by $F^{-1}$ is the inverse Fourier transform. For arbitrary D one actually requires three Fourier transforms, because the creation of the diagonal matrix $F D F^{-1}$ requires one transform. However, if D is fixed, and transformed on a one-time basis, then subsequent convolutions Dp only require two transforms each, as is well known. The complexity of Fourier-based cyclic convolution is thus O(N log N) operations (i.e., on the order of N log N multiplications and additions) for convolving a pattern p of length N (a pattern determined by N data values), because of the 2 or 3 FFTs (Fast Fourier Transforms) required. It should be noted that the Fourier method is an exact method (up to round-off errors depending on the FFT precision).
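The diagonalization argument can be checked with a short Python sketch. It uses a naive O(N^2) DFT, with the assumed convention $F_{jk} = e^{-2\pi i jk/N}$, purely to show that $F^{-1}(F D F^{-1}) F p$ reproduces the cyclic convolution; an FFT would compute the same transforms in O(N log N). The function names are illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT with F[k][m] = exp(-2*pi*i*k*m/N); inverse uses +i and 1/N."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / N) for m in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def fourier_cyclic_convolve(d, p):
    """Cyclic convolution as F^{-1} (F D F^{-1}) F p.

    The diagonal of F D F^{-1} is the transform of d, so a fixed kernel can be
    transformed once and reused; each further convolution then needs two transforms."""
    d_hat = dft(d)                                  # diagonal of F D F^{-1}
    p_hat = dft(p)                                  # F p
    prod = [a * b for a, b in zip(d_hat, p_hat)]    # dyadic (elementwise) product
    return [v.real for v in dft(prod, inverse=True)]


d = [1, 2, 3, 0, 0]
p = [4, 0, 1, 2, 5]
direct = [sum(d[i] * p[(n - i) % 5] for i in range(5)) for n in range(5)]
approx = fourier_cyclic_convolve(d, p)
print(all(abs(a - b) < 1e-9 for a, b in zip(approx, direct)))  # True (up to round-off)
```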
Another class of conventional convolution methods consists of wavelet convolution methods, which, by their nature, are generally inexact. The idea underlying such methods is elegant and runs as follows in the matrix-algebraic paradigm. Assume that, given an N-by-N circulant D, it is possible to find a matrix W (this is typically a compact wavelet transform) which has the properties:
(1) W is unitary (i.e., $W^{-1}$ is the adjoint of W);
(2) W is sparse; and
(3) $W D W^{-1}$ is sparse, where "sparse" in the present context denotes simply that any matrix-vector product Wx, for arbitrary x, involves reduced complexity O(N), rather than, say, $O(N^2)$.
With the assumed properties, we can calculate $Dp = W^{-1} (W D W^{-1}) W p$ by way of three sparse matrix-vector multiplications, noting that unitarity implies the sparseness of $W^{-1}$. Therefore the wavelet-based convolution complexity is O(N) for convolving a pattern p determined by N data values, except that it is generally impossible to find, for a given circulant D, a matrix W that gives both sparsity properties rigorously. Typically, if the convolution kernel d is sufficiently smooth, then a sparse wavelet operator W can be found such that property (3) above holds within some acceptable approximation error. Above-noted properties (1) and (2) are common at least for the family of compact wavelets (it is property (3) that is usually approximate).
An advantage of "spline" convolution (in accordance with the teaching of U.S. Application 09/480,908) over conventional wavelet convolution is that it can be performed (on data indicative of a pattern p comprising N data values) with cN arithmetic operations, whereas conventional wavelet convolution on the same data would require bN arithmetic operations, where (assuming typical error budgets) the factor "b" is significantly larger than the factor "c." Among the other important advantages of the "spline" convolution method of Application 09/480,908 (over conventional convolution methods) are the following: spline convolution is exact with respect to the spline kernel f, whereas wavelet convolution schemes are approximate by design (and error analysis for wavelet convolution is difficult to implement); the signal lengths for signals to be convolved by spline convolution are unrestricted (i.e., they need not be powers of two as in some conventional methods, and indeed they need not have any special form); and spline convolution allows acyclic convolution without padding with zeroes.
Separated-spline convolution in accordance with the present invention (like spline convolution in accordance with U.S. Application 09/480,908) is an O(N) method for convolving a pattern p determined by N data values. Separated-spline convolution in accordance with the present invention has an advantage over spline convolution in accordance with U.S. Application No. 09/480,908 in that separated-spline convolution in accordance with the invention can be performed (on data indicative of a two- or higher-dimensional pattern p consisting of N data values) with dN arithmetic operations (multiplications and additions), whereas spline convolution in accordance with U.S. Application No. 09/480,908 on the same data would require cN arithmetic operations, where the factor "c" is larger, and typically significantly larger, than the factor "d." In other words, the implied big-O constant for separated-spline convolution according to the present invention is significantly smaller than the typical implied big-O constant for spline convolution as described in U.S. Patent Application No. 09/480,908.
Summary of the Invention
In a class of embodiments, the invention is a method for performing cyclic or acyclic convolution of an n-dimensional pattern "p" (i.e., data indicative of an n-dimensional pattern "p"), where n ≥ 2, with a smooth diffusion kernel d, to generate data indicative of the convolution result r = Dp = d × p, where D is the circulant matrix (sometimes referred to herein as the "circulant") of d, and "×" denotes convolution. A two-dimensional pattern p is determined by a continuous two-dimensional range of data values or a two-dimensional array of discrete data values. In typical discrete implementations, the pattern p is two-dimensional in the sense that it is determined by a discrete, ordered set of data values (e.g., pixels) $p_{ij}$, where i varies from 0 to N-1 and j varies from 0 to N-1. Where the pattern p has dimension greater than two, it is determined by a three- or higher-dimensional set of data values. Typically, the kernel d is determined by an array of data values $d_{ij}$, where i varies from 0 to N-1 and j varies from 0 to N-1. In preferred embodiments, the inventive method for performing the convolution r = Dp = d × p, where d is well approximated by (or equal to) a separated-spline kernel, includes the steps of:
(a) specifying the separated-spline kernel as $k(x_1, \ldots, x_n) = k_1(x_1) k_2(x_2) \cdots k_n(x_n)$, where k admits of an operator A, n is the dimension of pattern p, and $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator);
(b) calculating $q_1 = A_1 k_1 \times p$ for each row of the pattern p;
(c) back-solving $A_1 r_1 = q_1$ for said each row of the pattern to determine $r_1 = k_1 \times p$ for the pattern, by performing a preliminary ignition step in which a small number of components of $r_1$ are computed directly, and then determining the rest of the components of $r_1$ using a natural recurrence relation determined by the operator $A_1$;
(d) transposing $r_1 = k_1 \times p$ generated in step (c), to generate $r_1^T = (k_1 \times p)^T$ for the pattern, and calculating $q_2 = A_2 k_2 \times r_1^T = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$;
(e) back-solving $A_2 r_2 = q_2$ for said each row of $(k_1 \times p)^T$ to determine $r_2 = (k_2 \times k_1 \times p)^T$ for the pattern.
In the case that pattern p is two-dimensional (n = 2), the result of step (e) is typically transposed to generate data indicative of $r = (k_2 \times k_1 \times p)$, where the result r is exactly equal to k × p, which is a close approximation to the desired convolution r = Dp.
Where p is an n-dimensional pattern for which n > 2, steps (d) and (e) are iterated, in the sense that they are repeated for each additional dimension (with $q_i = A_i k_i \times q_{i-1}$ being calculated for the i-th dimension during the i-th iteration), and the transposition of the result of the final repetition of step (e) is exactly equal to k × p, which is a close approximation to the desired convolution r = Dp. For an n-dimensional array M (having axes $i_1, i_2, \ldots, i_n$), the multidimensional transposition is defined as $M^T_{i_1 i_2 \cdots i_n} = M_{i_2 i_3 \cdots i_n i_1}$. In some implementations of step (c), a small number of the lowest components of $r_1$ are computed directly during the preliminary ignition step, and the rest of the components of $r_1$ are then determined using the natural recurrence relation. In preferred implementations of step (c), a small number of large negative components of $r_1$ are computed directly during the preliminary ignition step, and the rest of the components of $r_1$ are then determined using the natural recurrence relation.
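The row-transpose-row structure of steps (b) through (e) can be illustrated for n = 2 with the Python sketch below. For clarity each one-dimensional pass is a direct convolution, whereas in the method described above each pass would itself use an annihilation operator and a recurrence; the kernels and pattern are arbitrary test data and the function names are hypothetical. The sketch only checks that the separated passes reproduce the full two-dimensional convolution with $k(x, y) = k_1(x) k_2(y)$.

```python
def conv1d(k, row):
    """Acyclic 1-D convolution; output length len(k) + len(row) - 1."""
    out = [0] * (len(k) + len(row) - 1)
    for a, ka in enumerate(k):
        for x, v in enumerate(row):
            out[a + x] += ka * v
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def separated_convolve(k1, k2, p):
    """k x p for k(x, y) = k1(x) k2(y), with p stored as rows p[y][x]."""
    r1 = [conv1d(k1, row) for row in p]              # every row, along x
    r2 = [conv1d(k2, row) for row in transpose(r1)]  # rows of the transpose run along y
    return transpose(r2)                             # back to [y][x] layout


def direct_convolve2d(k1, k2, p):
    """Reference 2-D acyclic convolution with the full outer-product kernel."""
    H, W = len(p) + len(k2) - 1, len(p[0]) + len(k1) - 1
    out = [[0] * W for _ in range(H)]
    for b, kb in enumerate(k2):
        for a, ka in enumerate(k1):
            for y, row in enumerate(p):
                for x, v in enumerate(row):
                    out[y + b][x + a] += kb * ka * v
    return out


k1 = [1, 3, 1]
k2 = [2, 1]
p = [[1, 0, 2], [0, 4, 1]]
print(separated_convolve(k1, k2, p) == direct_convolve2d(k1, k2, p))  # True
```

Per output pixel the separated form costs about len(k1) + len(k2) multiplications instead of len(k1) * len(k2); replacing each direct pass with a recurrence removes the dependence on kernel length altogether.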
In cases in which the kernel d is itself a separated-spline kernel (so that d = k, and K = D), the method yields an exact result (r = Dp). Otherwise, the error inherent in the method is (k - d) × p, and thus the error is bounded easily. In one class of preferred embodiments in which pattern p is two-dimensional, the two-dimensional kernel is defined as $k(x, y) = k_1(x) k_1(y)$, where
The one-dimensional kernel $k_1(x)$ has three parts: $k_1(x) = k_c(x) + k_+(x) + k_-(x)$, where $k_c(x)$ is a kernel,
$k_c(x) = (R^2 - x^2)$ for $|x| \le R$,
$k_+(x)$ is a one-tailed Laplacian kernel,
and $k_-(x)$ is a one-tailed Laplacian kernel,
Each one-dimensional convolution is performed in three parts. The cap (where $|x| \le R$) and the two decay regions (where $x > R$ and $x < -R$) are each produced by different convolutions, whose results are added together to obtain the final result.
More specifically, in the noted class of embodiments, the inventive method accomplishes the convolution r = d × p, where p is a two-dimensional pattern and d is a kernel that is well approximated by (or equal to) a separated-spline kernel, by performing the steps of:
(a) specifying the separated-spline kernel as $k(x, y) = k_1(x) k_1(y)$, where $k_1(x)$ is defined in the previous paragraph;
(b) computing a cap convolution $k_c \times p(x)$ for each row of the pattern p, a second convolution $k_+ \times p(x)$ for said each row, and a third convolution $k_- \times p(x)$ for said each row, and adding together the cap convolution, the second convolution, and the third convolution for said each row to generate data indicative of $q_1 = k_1 \times p = (k_c \times p) + (k_+ \times p) + (k_- \times p)$, wherein the cap convolution is accomplished by computing $A_c k_c \times p$ for said each row, where $A_c$ is an annihilation operator defined as
$A_c k_c(x) = k_c(x+3) - 3k_c(x+2) + 3k_c(x+1) - k_c(x)$,
and computing $A_c^{-1}(A_c k_c \times p)(x) = k_c \times p(x)$ using the recursion relation $k_c(x+3) = A_c k_c(x) + k_c(x) + 3k_c(x+2) - 3k_c(x+1)$; the second convolution is accomplished by computing $A_+ k_+ \times p(x)$ for said each row, where $A_+$ is an annihilation operator defined as
$A_+ k_+(x) = k_+(x+1) - d^{-1} k_+(x)$,
and computing $A_+^{-1}(A_+ k_+ \times p)(x) = k_+ \times p(x)$ using the recursion relation $(k_+ \times p)(x+1, y) = (A_+ k_+ \times p)(x, y) + d^{-1}(k_+ \times p)(x, y)$; and the third convolution is accomplished by computing $A_- k_- \times p(x)$ for said each row, where $A_-$ is an annihilation operator defined as
$A_- k_-(x) = k_-(x-1) - d^{-1} k_-(x)$,
and computing $A_-^{-1}(A_- k_- \times p)(x) = k_- \times p(x)$ by recursing from the end to the beginning using the recursion relation:
$(k_- \times p)(x-1, y) = (A_- k_- \times p)(x, y) + d^{-1}(k_- \times p)(x, y)$;
(c) transposing $k_1 \times p$ to generate $(k_1 \times p)^T$;
(d) repeating step (b), this time on rows of $(k_1 \times p)^T$ rather than on rows of p, thereby convolving all columns of $k_1 \times p$ with $k_1$; and
(e) transposing the result of step (d) to generate data indicative of $r = k \times p$.
Typically, the pattern p(x,y) is stored in a memory prior to step (b), each iteration of step (b) is performed on a different row of the pattern p(x,y) read from the memory, and step (c) includes the step of storing $(k_1 \times p)^T$ in the memory such that each row of $(k_1 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of p(x,y). Optionally, the final step (step (e)) is omitted.
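The three-part row convolution of step (b) can be sketched in Python as follows. Assumptions: since the tail formulas are not reproduced in the text, the tails are taken here as $k_+(x) = a \rho^{x-R}$ for $x > R$ and $k_-(x) = a \rho^{-x-R}$ for $x < -R$, with ρ standing in for $d^{-1}$ and the amplitude a (`amp`) a hypothetical parameter; the convolution is "same size", with the kernel centred at 0 and the pattern zero outside the row. Each part uses its annihilator and the recursion relations given above, and a small wrapper applies the row routine, transposes, and applies it again (steps (c) to (e)). All names are hypothetical.

```python
def row_convolve_k1(p, R, rho, amp):
    """r(x) = sum_j k1(x - j) p(j), x = 0..N-1, for the hypothetical k1 = kc + k_plus + k_minus:
        kc(x)      = R*R - x*x            for |x| <= R
        k_plus(x)  = amp * rho**(x - R)   for x > R    (rho plays the role of 1/d)
        k_minus(x) = amp * rho**(-x - R)  for x < -R
    Each part costs O(N): a sparse annihilated kernel plus a recurrence."""
    N = len(p)

    def P(j):                                   # pattern value, zero outside the row
        return p[j] if 0 <= j < N else 0

    def kc(x):                                  # cap part of the kernel
        return R * R - x * x if abs(x) <= R else 0

    # Cap: Ac kc(x) = kc(x+3) - 3kc(x+2) + 3kc(x+1) - kc(x) is nonzero only near |x| = R.
    g = {x: kc(x + 3) - 3 * kc(x + 2) + 3 * kc(x + 1) - kc(x) for x in range(-R - 3, R + 1)}
    g = {x: v for x, v in g.items() if v != 0}
    lo = -R - 3
    rc = {lo: 0, lo + 1: 0, lo + 2: 0}          # ignition: (kc x p)(x) = 0 for x < -R
    for x in range(lo, N - 3):
        s = sum(v * P(x - off) for off, v in g.items())          # (Ac kc x p)(x)
        rc[x + 3] = s + rc[x] + 3 * rc[x + 2] - 3 * rc[x + 1]

    # Right tail: A+ k+(x) = k+(x+1) - rho*k+(x) equals amp*rho at x = R, else 0.
    rp = [0.0] * N                              # (k+ x p)(0) = 0
    for x in range(N - 1):
        rp[x + 1] = amp * rho * P(x - R) + rho * rp[x]

    # Left tail: A- k-(x) = k-(x-1) - rho*k-(x) equals amp*rho at x = -R; recurse backwards.
    rm = [0.0] * N                              # (k- x p)(N-1) = 0
    for x in range(N - 1, 0, -1):
        rm[x - 1] = amp * rho * P(x + R) + rho * rm[x]

    return [rc[x] + rp[x] + rm[x] for x in range(N)]


def convolve2d_k1(p2d, R, rho, amp):
    """Rows first, then transpose and convolve the rows again, then transpose back."""
    rows = [row_convolve_k1(row, R, rho, amp) for row in p2d]
    cols = [row_convolve_k1(list(col), R, rho, amp) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def k1_direct(x, R, rho, amp):
    """Reference kernel value, used only to check the fast version."""
    return R * R - x * x if abs(x) <= R else amp * rho ** (abs(x) - R)


p = [1, 0, 2, 3, 0, 0, 1, 5, 0, 2]
fast = row_convolve_k1(p, R=3, rho=0.5, amp=4.0)
ref = [sum(k1_direct(x - j, 3, 0.5, 4.0) * p[j] for j in range(len(p))) for x in range(len(p))]
print(max(abs(a - b) for a, b in zip(fast, ref)) < 1e-9)   # True
```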
In another class of preferred embodiments in which pattern p is two-dimensional, the two-dimensional kernel is defined as $k(x, y) = k_2(x) k_2(y)$, where
$k_2(x) = 2\cos^2(\pi x/(2R))$ for $|x| \le R$
The one-dimensional kernel $k_2(x)$ has two parts:
$k_2(x) = k_c(x) + k_t(x)$,
where kc(x) is a kernel
= 0 for |x| > R,
and kt(x) is
More specifically, in the embodiments in which $k(x, y) = k_2(x) k_2(y)$ with $k_2(x)$ as defined in the previous paragraph, the inventive method for performing the convolution $r = d \times p$, where d is well approximated by (or is equal to) a separated-cosine kernel, includes the steps of:
(a) specifying the separated-cosine kernel as $k(x, y) = k_2(x) k_2(y)$;
(b) for each row of the pattern p(x,y), performing the following two steps:
(i) computing values $A k_2 \times p$ for said each row, for $-R - 3 \le x \le n - 1$, as follows: $A k_2(x) \times p(x) = 2\sin^2(\pi/2L)\,(p(x + R) + p(x + R + 1) - p(x - R) - p(x - R + 1))$; and
(ii) performing a recursion operation, using the recursion relation $r(x + 2) = Ar(x) + 2\sin^2(\pi/2L)\,(r(x + 1) - r(x)) + r(x - 1)$, and using the relations $r(-R - 4) = r(-R - 3) = r(-R - 2) = 0$ to ignite said recursion operation, to find $r(x) = A^{-1}(A k_2 \times p)$ for said each row, thereby producing, at the end of a final iteration of step (ii), data indicative of $k_2 \times p$;
(c) transposing $k_2 \times p$ to produce $(k_2 \times p)^T$; and
(d) repeating step (b), this time on rows of $(k_2 \times p)^T$ rather than rows of p, thereby convolving all columns of $k_2 \times p$ with $k_2$. Preferably, steps (a)-(d) are performed by an appropriately programmed processor, and the processor performs the additional step of: (e) after step (d), transposing the result of step (d) to produce r(x, y), and returning to an initial processor state. In variations on the embodiments in this class, each occurrence of the factor $2\sin^2(\pi/2L)$ is replaced by the factor 1. In other embodiments, the invention is a computer programmed with software for performing convolution, on data indicative of an n-dimensional pattern (where n is greater than or equal to 2), using a separated kernel in accordance with any embodiment of the inventive method. Other embodiments of the invention include a digital signal processor including digital signal processing circuitry configured to perform convolution on data indicative of an n-dimensional pattern (where n is greater than or equal to 2), using a separated kernel in accordance with any embodiment of the inventive method, an apparatus (such as custom or dedicated electronic circuitry, or a field programmable gate array based computing system ("FPGA system")) configured to perform convolution on such data in accordance with any embodiment of the inventive method, and a lithography system including such digital signal processing circuitry, such custom or dedicated electronic circuitry, or such an FPGA system.
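The cosine case follows the same annihilate-then-recur pattern. The Python sketch below derives its own annihilator for $k_2(x) = 2\cos^2(\pi x/(2R)) = 1 + \cos(\pi x/R)$, namely $(1 - z^{-1})(1 - 2\cos(\pi/R)\, z^{-1} + z^{-2})$, which removes the constant and the $\cos(\pi x/R)$ components inside $|x| \le R$. It does not reproduce the $2\sin^2(\pi/2L)$ factor or the index ranges of steps (i) and (ii) above, and the function names are hypothetical. Because the recurrence's characteristic roots lie on the unit circle, round-off grows slowly with row length, so the check uses a short row.

```python
import math


def conv(u, v):
    """Plain acyclic convolution of two sequences."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def annihilated_convolve(k, p, a, tol=1e-12):
    """Acyclic k x p using an annihilator a (a[0] == 1) for which a x k is sparse:
    q = (a x k) x p over the nonzero taps only, then a x r = q is solved by recurrence."""
    g = conv(a, k)
    taps = [(i, gi) for i, gi in enumerate(g) if abs(gi) > tol]
    n_out = len(k) + len(p) - 1
    q = [0.0] * n_out
    for i, gi in taps:
        for j, pj in enumerate(p):
            if i + j < n_out:
                q[i + j] += gi * pj
    r = []
    for n in range(n_out):
        r.append(q[n] - sum(a[m] * r[n - m] for m in range(1, len(a)) if n >= m))
    return r, len(taps)


R = 6
k2 = [2 * math.cos(math.pi * x / (2 * R)) ** 2 for x in range(-R, R + 1)]  # 1 + cos(pi x / R)
c = math.cos(math.pi / R)
# (1 - z^-1)(1 - 2c z^-1 + z^-2) expanded: annihilates 1, cos(pi x / R), sin(pi x / R)
a = [1.0, -(2 * c + 1), 2 * c + 1, -1.0]
p = [1, 0, 2, 3, 0, 0, 1, 5, 0, 2]
r, ntaps = annihilated_convolve(k2, p, a)
ref = conv(k2, p)
print(ntaps, max(abs(x - y) for x, y in zip(r, ref)) < 1e-9)
```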
Also within the scope of the invention is a computer-readable storage medium which stores computer-executable instructions, wherein the instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
Brief Description of the Drawings
Figure 1 is a block diagram of a computer system programmed with software for implementing the inventive method.
Figure 2 is a block diagram of a lithography system including a digital signal processor configured to perform convolution (in accordance with the invention) on image data, and a device which generates a pattern signal (e.g., an optical beam or electron beam having time-varying amplitude) from the resulting convolved image data. The pattern signal is provided to a set of optics (e.g., reflective or refractive optics, or electron beam optics) and the output of the optics is projected as a pattern on a glass plate, thus producing a mask useful in integrated circuit manufacture.
Figure 3 is a block diagram of a digital signal processor (which can be used as the digital signal processor of Fig. 2) configured to perform convolution (in accordance with the invention) on image data.
Figure 4 is a block diagram of a lithography system which is a variation on the system of Figure 2.
Fig. 5 is a simplified elevational view of a computer-readable storage medium (a CD-ROM) which stores computer-executable instructions, wherein the instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
Detailed Description of the Preferred Embodiments
Throughout the disclosure, including in the claims, the term "data" denotes one or more signals indicative of data words. Thus, the statement that data indicative of a pattern "$p$" is convolved (in accordance with the invention) with data indicative of a smooth kernel "$d$" denotes that one or more signals indicative of the pattern $p$ is (are) processed with another set of one or more signals indicative of the kernel $d$, to generate data (i.e., one or more signals) indicative of the convolution result.
Preferred embodiments of the invention will be described with reference to "annihilation" and "flattening" operators.
We initially provide heuristic motivation for the theory of separation. The theory of spline convolution is based on the premise that we begin with a piecewise kernel, each of whose regions is annihilated by an operator $A$. We then perform a convolution of the pattern with $Ak$. Since $A$ annihilates each region of the kernel, the convolution is really with the boundary of the kernel's regions. Thus, for a convolution in dimension $d$, $Ak$ is in some sense a $(d-1)$-dimensional kernel. For example, consider the kernel:
$$k(x,y) = \begin{cases} a\left(r_0^4 - 2r_0^2(x^2+y^2) + (x^2+y^2)^2\right) & \text{for } x^2+y^2 \le r_0^2,\\ 0 & \text{for } x^2+y^2 > r_0^2. \end{cases}$$
When this kernel is annihilated, what remains is a ring at $r_0$, generated by the discontinuity of $k$. The nonzero area of $Ak$ is linear in $r_0$, and this is what is meant when we say that it is of "dimension" $d-1$. Because the cost of convolving with $Ak$ therefore still grows with $r_0$, convolution in accordance with the invention employs separated kernels. The basic premise is that, in order to perform two-dimensional convolution in accordance with the invention, one "factors" the separated kernel and performs two one-dimensional convolutions. These convolutions are not dependent on the radius, and in practice this technique reduces the number of convolution operations drastically.
Assume that we have a two-dimensional kernel, k(x,y), that can be expressed as:
$$k(x,y) = k_x(x)\,k_y(y).$$
If this is so, then given a pattern $p$, we can express the result of the convolution as:
$$r(x,y) = \sum_i k_x(x-i) \sum_j k_y(y-j)\,p(i,j).$$
But notice that $p_y(i,y) = \sum_j k_y(y-j)\,p(i,j)$ is the column-by-column convolution of $p$ with $k_y$. To get the convolution result in accordance with the invention, we determine $p_y(i,y)$ and then convolve it with $k_x$ along the rows. As the following examples will show, this convolution requires only a few operations per pixel.
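The factorization above can be checked numerically. The following is a minimal C sketch, not code from this disclosure, that compares a direct two-dimensional convolution with the two one-dimensional passes (down the columns with $k_y$, then along the rows with $k_x$). The function names, the row-major layout `p[y*nx + x]`, the tap storage at index `t + R`, and the zero padding outside the pattern are assumptions made for illustration. Both passes here are still direct sums, so the sketch demonstrates only the separation, not the spline speedup.

```c
#include <assert.h>
#include <stdlib.h>

/* Direct 2-D convolution of an nx-by-ny pattern p (row-major: p[y*nx + x]) with
   k(x, y) = kx(x) * ky(y). kx and ky hold taps t = -R..R at index t + R.
   Pixels outside the pattern are treated as zero. */
void conv2d_direct(const double *p, int nx, int ny,
                   const double *kx, const double *ky, int R, double *out)
{
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++) {
            double s = 0.0;
            for (int j = y - R; j <= y + R; j++)
                for (int i = x - R; i <= x + R; i++)
                    if (i >= 0 && i < nx && j >= 0 && j < ny)
                        s += kx[x - i + R] * ky[y - j + R] * p[j * nx + i];
            out[y * nx + x] = s;
        }
}

/* Same result from two 1-D passes:
   p_y(i, y) = sum_j ky(y - j) p(i, j)    (down each column), then
   r(x, y)   = sum_i kx(x - i) p_y(i, y)  (along each row). */
void conv2d_separated(const double *p, int nx, int ny,
                      const double *kx, const double *ky, int R, double *out)
{
    double *py = malloc(sizeof *py * (size_t)nx * (size_t)ny);
    assert(py != NULL);
    for (int y = 0; y < ny; y++)
        for (int i = 0; i < nx; i++) {
            double s = 0.0;
            for (int j = y - R; j <= y + R; j++)
                if (j >= 0 && j < ny) s += ky[y - j + R] * p[j * nx + i];
            py[y * nx + i] = s;
        }
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++) {
            double s = 0.0;
            for (int i = x - R; i <= x + R; i++)
                if (i >= 0 && i < nx) s += kx[x - i + R] * py[y * nx + i];
            out[y * nx + x] = s;
        }
    free(py);
}
```

The direct version costs about $(2R+1)^2$ multiplies per pixel and the separated one about $2(2R+1)$; the spline techniques below remove the remaining dependence on $R$.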
What follows are two examples of separated-spline convolution implemented in accordance with the invention. These examples were selected to showcase the polynomial, Laplacian, and trigonometric spline techniques.
Example 1: Quadratic with Laplacian decay separated kernel
The one-dimensional kernel for this algorithm is defined as: $k_1(x) = a(b^2 - x^2)$ for $|x| \le R$,
One may assert that $(b^2 - R^2) = cR$ to achieve continuity. The two-dimensional kernel is defined as:
$$k(x,y) = k_1(x)\,k_1(y). \qquad (1)$$
Each one-dimensional convolution is performed in three parts. The cap (where $|x| \le R$) and the two decay regions (where $x > R$ and $x < -R$) are each produced by different convolutions, whose results are added together to obtain the final result.
The one-dimensional cap convolution, using the kernel $k_c(x) = a(b^2 - x^2)$ for $|x| \le R$,
is accomplished as follows. We choose an annihilation operator Ac of the form
$$A_c k_c(x) = e\,k_c(x+i) + f\,k_c(x+j) + g\,k_c(x+m) + h\,k_c(x+n),$$
where e,f, g, h, i,j, m, and n are integers.
With appropriately chosen values of $e, f, g, h, i, j, m$, and $n$, $A_c k_c(x)$ will be zero except at a small number of specific values of the parameter $x$. Thus, to compute $(A_c k_c \times p)(x)$ for each value of $x$, one need only compute a small number of additions (and an even smaller number of additions near the boundaries of the interval over which the convolution is performed). After computing $(A_c k_c \times p)(x)$ for each value of $x$, a recursion relation is employed to compute $A_c^{-1}(A_c k_c \times p)(x) = r(x)$. In typical implementations, a small number of initial values of $A_c^{-1}(A_c k_c \times p)(x)$ are found by direct computation to "ignite" the recursion relation calculation.
Consider the case that the parameters "a" and "b" in kc(x) satisfy a = 1 and b = R. In this case, we choose the annihilation operator Ac to be
$$A_c k_c(x) = k_c(x+3) - 3k_c(x+2) + 3k_c(x+1) - k_c(x).$$
With this operator $A_c$, $A_c k_c(x)$ is zero except at four values of $x$ near $x = \pm R$, where it takes the values $-2R+1$, $2R+1$, $-2R-1$, and $2R-1$. Thus, to compute $(A_c k_c \times p)(x)$ for each value of $x$, one need only compute seven additions (and an even smaller number of additions near the boundaries of the interval over which the convolution is performed). The recursion relation employed to compute $A_c^{-1}(A_c k_c \times p)(x) = r(x)$ is determined by rewriting the equation that defined $A_c$ in the form
$$k_c(x+3) = A_c k_c(x) + k_c(x) + 3k_c(x+2) - 3k_c(x+1),$$
and recursively solving for increasing $x$ (with $r$ in place of $k_c$ and $(A_c k_c \times p)(x)$ in place of $A_c k_c(x)$). The final difficulty is that we need three initial values of $r$ to do this. There are two solutions to this problem of "igniting" the convolution. First, we can solve for the values $\{r(0), r(1), r(2)\}$ via direct computation, which takes $3(2R-2)$ multiplies and $3(2R-3)$ adds. Alternatively, we can note that $r(x)$ must be zero for $x \le -R$ (recall that $k_c(\pm R) = 0$ when $b = R$), since these points lie beyond the kernel radius away from the pattern. Thus we know that $r(-R-2)$, $r(-R-1)$, and $r(-R)$ are all zero. We can then recurse to find $\{r(0), r(1), r(2)\}$ and thus ignite the convolution. The latter procedure requires $3(R+2)$ adds and $(R+2)$ multiplies, so in this case the second approach is far more economical. There are other cases in which the second approach will be less economical than the first approach.
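The cap convolution can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the authors' implementation: $a = 1$, $b = R$, a pattern row `p[0..n-1]` that is zero outside, and 64-bit integer arithmetic standing in for the fixed-point numbers recommended below. It collects the sparse annihilated kernel $A_c k_c$, forms $q = A_c k_c \times p$, and recovers $r = k_c \times p$ with the third-order recursion, ignited from $r(-R-2) = r(-R-1) = r(-R) = 0$. For this kernel a direct calculation places the four nonzero values at $x = -R-2, -R-1, R-2, R-1$ (values $2R-1$, $-2R-1$, $2R+1$, $1-2R$), which the helper `cap_annihilated` lets one check. All function names are hypothetical; `cap_conv_direct` is only a reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Cap kernel of Example 1 with a = 1, b = R: k_c(x) = R^2 - x^2 for |x| <= R, else 0. */
long long cap_kernel(int x, int R)
{
    return (x >= -R && x <= R) ? (long long)R * R - (long long)x * x : 0;
}

/* Annihilated kernel: A_c k_c(t) = k_c(t+3) - 3 k_c(t+2) + 3 k_c(t+1) - k_c(t). */
long long cap_annihilated(int t, int R)
{
    return cap_kernel(t + 3, R) - 3 * cap_kernel(t + 2, R)
         + 3 * cap_kernel(t + 1, R) - cap_kernel(t, R);
}

/* Reference: direct convolution r(x) = sum_t k_c(t) p(x - t), p zero outside [0, n). */
void cap_conv_direct(const long long *p, int n, int R, long long *out)
{
    for (int x = 0; x < n; x++) {
        long long s = 0;
        for (int t = -R; t <= R; t++)
            if (x - t >= 0 && x - t < n) s += cap_kernel(t, R) * p[x - t];
        out[x] = s;
    }
}

/* Spline version: q = (A_c k_c) * p from the 4 sparse taps, then
   r(x+3) = q(x) + r(x) + 3 r(x+2) - 3 r(x+1), ignited by r(-R-2) = r(-R-1) = r(-R) = 0. */
void cap_conv_spline(const long long *p, int n, int R, long long *out)
{
    int tap[4]; long long val[4]; int ntap = 0;
    for (int t = -R - 4; t <= R + 1; t++) {            /* collect the sparse A_c k_c */
        long long v = cap_annihilated(t, R);
        if (v != 0) { assert(ntap < 4); tap[ntap] = t; val[ntap] = v; ntap++; }
    }
    int off = R + 2;                                    /* r(x) is stored at buf[x + off] */
    long long *buf = calloc((size_t)(n + off), sizeof *buf);   /* zeros = ignition */
    assert(buf != NULL);
    for (int x = -R - 2; x + 3 < n; x++) {
        long long q = 0;
        for (int i = 0; i < ntap; i++) {
            int j = x - tap[i];
            if (j >= 0 && j < n) q += val[i] * p[j];
        }
        buf[x + 3 + off] = q + buf[x + off] + 3 * buf[x + 2 + off] - 3 * buf[x + 1 + off];
    }
    for (int x = 0; x < n; x++) out[x] = buf[x + off];
    free(buf);
}
```

Because every quantity is an integer, the recursion reproduces the direct result exactly; with floating point the same third-order recursion would amplify rounding error, which is the instability the text warns about.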
The one-dimensional convolution for the positive decay region is performed using the one-tailed Laplacian decay kernel,
which is annihilated by the simple annihilation operator A+:
$$A_+ k_+(x) = k_+(x+1) - d^{-1} k_+(x).$$
Notice that $A_+ k_+(R) = c\,d^{-R}$, while $A_+ k_+(x) = 0$ for every other $x$. Convolving the pattern with $A_+ k_+$ is therefore just a translation and scaling of the pattern, while the recursion is as simple as can be:
$$r(x+1) = (A_+ k_+ \times p)(x) + d^{-1} r(x).$$
An interesting point about this decay kernel is that it can be ignited in the region of the result. That is, we know that r is zero for all x less than R. So we fill the first R values with zeroes, then the spline actually begins.
The one-dimensional convolution for the negative decay region is exactly the same as the one-dimensional convolution for the positive decay region, except that the recursion must be taken in the opposite direction. This slows the algorithm: the $k_c$ and $k_+$ convolutions can be calculated in a single pass of the pointers, but the $k_-$ convolution requires a second, reversed pass.
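The two decay regions can be sketched the same way. The explicit tail definitions do not survive in this text, so the sketch below assumes, hypothetically, $k_+(x) = w\,d^{-(x-R-1)}$ for $x \ge R+1$ (zero otherwise) and $k_-(x) = k_+(-x)$, with $d > 1$. With $w = c\,d^{-R}$ this reproduces the annihilated forms used in the algorithm below, $(A_+k_+ \times p)(x) = c\,d^{-R}p(x-R)$ and $(A_-k_- \times p)(x) = c\,d^{-R}p(x+R)$. The positive tail is one forward pass and the negative tail one reversed pass; each ignites from a zero at its starting end. Function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical tail kernels: k+(x) = w * d^-(x-R-1) for x >= R+1, k-(x) = k+(-x),
   zero elsewhere. Returns k+(x) + k-(x) (their supports do not overlap). */
double tail_kernel(int x, int R, double w, double d)
{
    if (x < 0) x = -x;
    if (x < R + 1) return 0.0;
    double v = w;
    for (int i = R + 1; i < x; i++) v /= d;
    return v;
}

/* Reference: direct convolution with k+ + k-, p zero outside [0, n). */
void tail_conv_direct(const double *p, int n, int R, double w, double d, double *out)
{
    for (int x = 0; x < n; x++) {
        double s = 0.0;
        for (int j = 0; j < n; j++) s += tail_kernel(x - j, R, w, d) * p[j];
        out[x] = s;
    }
}

/* Spline version: one forward pass for k+ and one reversed pass for k-. */
void tail_conv_spline(const double *p, int n, int R, double w, double d, double *out)
{
    assert(d > 1.0);                /* a decaying tail is assumed */
    double r = 0.0;                 /* (k+ * p)(0) = 0: ignition */
    for (int x = 0; x < n; x++) {
        out[x] = r;                 /* (k+ * p)(x) */
        double q = (x - R >= 0) ? w * p[x - R] : 0.0;   /* (A+k+ * p)(x) = w p(x - R) */
        r = q + r / d;              /* (k+ * p)(x+1) = (A+k+ * p)(x) + d^-1 (k+ * p)(x) */
    }
    r = 0.0;                        /* (k- * p)(n-1) = 0: ignition at the far end */
    for (int x = n - 1; x >= 0; x--) {
        out[x] += r;                /* add (k- * p)(x) */
        double q = (x + R < n) ? w * p[x + R] : 0.0;    /* (A-k- * p)(x) = w p(x + R) */
        r = q + r / d;              /* (k- * p)(x-1) = (A-k- * p)(x) + d^-1 (k- * p)(x) */
    }
}
```

Each tail costs one multiply and one add per pixel, independent of how far the tail extends.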
What follows is one algorithm for implementing the above-mentioned two-dimensional convolution as a separated-spline convolution in accordance with the invention. For the algorithm statement, we will need the following definitions. The $k_-$ kernel is
and its annihilator $A_-$ is:
$$A_- k_-(x) = k_-(x-1) - d^{-1} k_-(x).$$
The algorithm takes as input a pattern $p(x,y)$ for $(x,y) \in D$, with $D$ being a rectangular domain of given size. It returns $r(x,y) = (k \times p)(x,y)$ for all points in $D$, where $k(x,y)$ is defined in equation (1) above. We ask in this instance that the size of $D$ be fixed only because, due to the nature of the Laplacian decay, the optimal "constants" describing $k$ depend on the width and height of $D$. Note that the algorithm computes the convolution by treating $k(x,y)$ as $k(x,y) = (k_c + k_+ + k_-)(x)\,(k_c + k_+ + k_-)(y) = k_1(x)\,k_1(y)$.
The algorithm is presented in a non-optimized manner so that it can be more easily understood. It is important to realize that with floating-point arithmetic there will most likely be catastrophic, chaotic behavior in the polynomial convolution, so this must be done with fixed-point numbers. The algorithm, to be referred to as "Algorithm for Quadratic with Laplacian decay separated-spline convolution," is:
(a) Loop over the rows. For each row, compute the convolution of the row with $k_1(x)$ by computing a cap convolution (step i), a positive decay convolution (step ii), and a negative decay convolution (step iii) on the row, and adding together (step iv) the three convolutions produced in steps (i), (ii), and (iii) for said row. One repetition of steps (i), (ii), (iii), and (iv) is performed as follows for each row of the pattern:
(i) Compute the $k_c$ convolution: compute $(A_c k_c \times p)(x)$, an annihilated kernel convolved with said row of the pattern, where $A_c$ is as defined above; and compute $A_c^{-1}(A_c k_c \times p)(x,y)$ using the above-described recursion relation, which is $k_c(x+3) = A_c k_c(x) + k_c(x) + 3k_c(x+2) - 3k_c(x+1)$ in the case that the parameters "a" and "b" in $k_c(x)$ satisfy $a = 1$ and $b = R$;
(ii) Compute the $k_+$ convolution: compute $(A_+ k_+ \times p)(x)$, an annihilated kernel convolved with said row of the pattern, where $A_+$ is as defined above. Realize that this is a trivial calculation, since $(A_+ k_+ \times p)(x,y) = c\,d^{-R}p(x-R,y)$; and compute $A_+^{-1}(A_+ k_+ \times p)(x,y)$ using the recursion relation:
$$(k_+ \times p)(x+1,y) = (A_+ k_+ \times p)(x,y) + d^{-1}(k_+ \times p)(x,y);$$
(iii) Compute the $k_-$ convolution: compute $(A_- k_- \times p)(x)$, an annihilated kernel convolved with said row of the pattern, where $A_-$ is as defined above. Realize that this is a trivial calculation, since $(A_- k_- \times p)(x,y) = c\,d^{-R}p(x+R,y)$; and compute $A_-^{-1}(A_- k_- \times p)(x,y)$ by recursing from the end to the beginning using the recursion relation:
$$(k_- \times p)(x-1,y) = (A_- k_- \times p)(x,y) + d^{-1}(k_- \times p)(x,y);$$
(iv) Add the results of steps (i), (ii), and (iii) for said row of the pattern, thereby producing, at the end of the last iteration of step (iv), data indicative of $k_1 \times p = (k_c \times p) + (k_+ \times p) + (k_- \times p)$;
(b) Transpose $k_1 \times p$ to produce $(k_1 \times p)^T$;
(c) Repeat step (a), this time on rows of $(k_1 \times p)^T$ rather than on rows of $p$, thereby convolving all columns of $k_1 \times p$ with $k_1$;
(d) Transpose the result of step (c) to produce $r(x,y)$, and return.
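Steps (a) to (d) amount to a generic driver: a row routine applied to every row, a transpose, the same routine again, and a transpose back. The following minimal C sketch assumes a square n-by-n row-major array overwritten in place and a caller-supplied row routine. `conv_row_121` is a hypothetical stand-in for the cap-plus-decay row convolution of steps (i) to (iv); any routine with the same signature (for example, one built from the cap and tail routines sketched earlier) can be passed instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Row routine: out[0..n-1] = (k1 * in)(x) for one row (in and out must not alias). */
typedef void (*row_conv_fn)(const double *in, int n, double *out);

/* Hypothetical example row routine: k1 = [1, 2, 1] on taps -1, 0, 1, zero padding. */
void conv_row_121(const double *in, int n, double *out)
{
    for (int x = 0; x < n; x++)
        out[x] = 2.0 * in[x] + (x > 0 ? in[x - 1] : 0.0) + (x + 1 < n ? in[x + 1] : 0.0);
}

/* In-place transpose of an n-by-n row-major array. */
static void transpose_square(double *a, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double t = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = t;
        }
}

/* Steps (a)-(d): rows with k1, transpose, rows (former columns) with k1, transpose back.
   p is overwritten with r = (k1(x) k1(y)) * p. */
void separated_conv_2d(double *p, int n, row_conv_fn conv_row)
{
    double *row = malloc(sizeof *row * (size_t)n);
    assert(row != NULL);
    for (int pass = 0; pass < 2; pass++) {
        for (int y = 0; y < n; y++) {                  /* step (a), then step (c) */
            conv_row(&p[y * n], n, row);
            for (int x = 0; x < n; x++) p[y * n + x] = row[x];
        }
        transpose_square(p, n);                         /* step (b), then step (d) */
    }
    free(row);
}
```

Only one row-sized scratch buffer is needed; the transposes let the same contiguous row routine serve both directions.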
Typically, the pattern $p(x,y)$ is stored in a memory prior to step (a), each iteration of step (i) is performed on a different row of the pattern $p(x,y)$ read from the memory, and step (b) includes the step of storing $(k_1 \times p)^T$ in the memory, such that each row of $(k_1 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of $p(x,y)$.
Some modifications of the algorithm can be made to increase its speed. For example, in all cases the algorithm is preferably performed by a processor programmed to compute each annihilated convolution and simultaneously inflate it. This way, there is no need to store the annihilated convolution before it is flattened. Also, both $k_+$ and $k_c$ are preferably computed on the same pass through the processor. Also, two full transpositions will typically not be necessary where a column-ferrying technique is employed to compute the cap and positive decay region convolutions, and the column return, negative convolution, and add should all be part of the same loop. It is also likely that, when employing some types of processors to perform the algorithm, in-place operations are fastest.
Note that, as an approximation to the Gaussian, the above-described embodiment of convolution in accordance with the invention is dependent on the size of the pattern, since the Laplacian decay is $e^{-|x|}$ while the Gaussian is $e^{-x^2}$. Thus the error depends on the size of the domain.
Example 2: Squared cosine separated kernel
We next present another separated-spline kernel (a "squared cosine" separated kernel) and an embodiment of the inventive convolution method which employs it. Convolution using the squared cosine separated kernel is faster and cleaner than convolution using the above-described "quadratic with Laplacian decay" kernel, since convolution using the squared cosine kernel is independent of the pattern's dimensions. The squared cosine kernel also allows a mostly in-place method for the set of row convolutions in one pass, and also a ferrying technique with calculations done on the transfer. Since the squared cosine kernel is more stable, floating-point arithmetic can be used. This is because, in the case of polynomials, an error grows chaotically along with the polynomial, whereas for the squared cosine kernel a small error becomes small oscillatory noise. If the pattern is in some sense "regular," the noise is expected to be unnoticeable for some (very small) level of precision. We define the one-dimensional squared cosine kernel as:
k2(x)
The two-dimensional kernel is defined as:
$$k(x,y) = k_2(x)\,k_2(y). \qquad (2)$$
As before, we will write the kernel k2(x) as the sum of two kernels. The first of these is:
$$k_c(x) = \cos(\pi x/R) \quad \text{for } |x| \le R$$
The other kernel is
$$k_1(x) = 1 \quad \text{for } |x| \le R$$
For these two kernels, it is true that $k_2(x) = k_c(x) + k_1(x)$. The kernel $k_1$ is easily annihilated by the discrete derivative, which we will call $A_1$ for notational continuity. This annihilation operator's explicit definition is
$$A_1 f(x) = f(x+1) - f(x).$$
Annihilating the cosine kernel $k_c$ is less straightforward, but if one notices that
$$\cos(\pi(x+1)/L) + \cos(\pi(x-1)/L) = 2\cos(\pi x/L)\cos(\pi/L),$$
it is apparent that if we define operator Ac by:
$$A_c f(x) = f(x+1) + f(x-1) - 2\cos(\pi/L)\,f(x),$$
then $k_c$ is annihilated by $A_c$ (taking $L = R$). The annihilation operator $A$ that we will use is the "product" of these two operators:
$$A f(x) = A_c \circ A_1 f(x) = f(x+2) - (1 + 2\cos(\pi/L))\,f(x+1) + (1 + 2\cos(\pi/L))\,f(x) - f(x-1).$$
$A$ can be seen to annihilate $k_2$ (away from the ends of its support), since $Ak_c = A_1(A_c k_c)$ is just $A_1$ acting on something that has already been annihilated, and the same can be said for $Ak_1 = A_c(A_1 k_1)$. Computation shows that the nonzero values of $Ak_2$ are:
$$Ak_2(-R-1) = Ak_2(-R) = 2\sin^2(\pi/2L), \qquad Ak_2(R-1) = Ak_2(R) = -2\sin^2(\pi/2L).$$
We next present an algorithm for convolution of $k(x,y)$, determined by the above-described squared cosine kernel $k_2(x)k_2(y) = (k_c(x) + k_1(x))(k_c(y) + k_1(y))$, with a pattern $p(x,y)$, to compute $r(x,y)$ in accordance with the invention. This algorithm is very similar to the above-described algorithm for convolution using the "quadratic with Laplacian decay" kernel. The algorithm, to be referred to as "Algorithm for squared cosine separated-spline convolution," is:
(a) Loop over the rows. For each row of the pattern $p(x,y)$, compute the convolution of said row with $k_2(x)$. This is done via a one-dimensional spline convolution including the steps:
(i) compute the values $(Ak_2 \times p)(x)$ for said row, for $-R-3 \le x \le n-1$, as follows: $(Ak_2 \times p)(x) = 2\sin^2(\pi/2L)\,\big(p(x+R) + p(x+R+1) - p(x-R) - p(x-R+1)\big)$;
(ii) use the relations $r(-R-4) = r(-R-3) = r(-R-2) = 0$ to ignite a recursion operation to find $r(x)$ for the row, via the recursion relation $r(x+2) = (Ak_2 \times p)(x) + (1 + 2\cos(\pi/L))\,(r(x+1) - r(x)) + r(x-1)$, thereby producing, at the end of the last iteration of step (ii), data indicative of $k_2 \times p$;
(b) Transpose $k_2 \times p$ to produce $(k_2 \times p)^T$;
(c) Repeat step (a), this time on rows of $(k_2 \times p)^T$ rather than on rows of $p(x,y)$, thereby convolving all columns of $k_2 \times p$ with $k_2$.
Preferably, a final step of transposing the result of step (c) is performed to produce r(x, y), and the processor then returns to its initial state.
Typically, the pattern $p(x,y)$ is stored in a memory prior to step (a), each iteration of step (i) is performed on a different row of the pattern $p(x,y)$ read from the memory, and step (b) includes the step of storing $(k_2 \times p)^T$ in the memory, such that each row of $(k_2 \times p)^T$ occupies the memory locations formerly occupied by a corresponding row of $p(x,y)$.
The naive computation count for the described algorithm is as follows: there are $h$ repetitions of step (a), where $h$ is the height; computing $Ak_2 \times p$ involves one multiplication and three adds, done $w + R + 3$ times (where $w$ is the width); and the recursion requires one multiplication and four adds, also done $w + R + 3$ times. Therefore, the entire algorithm on a square $\sqrt{N}$ by $\sqrt{N}$ array requires:
multiplies and
adds. This can be optimized in several ways. First, one can normalize in the final stage and let $Ak_2 \times p$ be computed as $p(x+R) + p(x+R+1) - p(x-R) - p(x-R+1)$. This saves $N + 2\sqrt{N} + 6$ multiplies, which significantly reduces computational cost. Second, all of the multiplies are by the same two fixed numbers, so they can be optimally hard-coded. Third, and most importantly, because this process works on rows independently, it is very easily parallelized and vectorized. The number of adds can also be reduced, but this will typically not significantly affect the computation time. More generally, in a class of preferred embodiments, the inventive method for performing the convolution $r = Dp = d \times p$ (with $\times$ denoting convolution), where $d$ is well approximated by (or equal to) a separated-spline kernel, includes the steps of:
(a) specifying the separated-spline kernel as $k(x_1, \ldots, x_n) = k_1(x_1)\,k_2(x_2) \cdots k_n(x_n)$, where $k$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator that operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j = B_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j = B_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), and $n$ is the dimension of pattern $p$;
(b) calculating $q_1 = B_1 p = A_1 k_1 \times p$ for each row of the pattern $p$;
(c) back-solving $A_1 r_1 = q_1$ for each said row of the pattern to determine $r_1 = k_1 \times p$ for said pattern, by performing a preliminary ignition step in which a small number of components of $r_1$ are computed directly, and then determining the rest of the components of $r_1$ using a natural recurrence relation determined by the operator $A_1$;
(d) transposing $r_1 = k_1 \times p$ generated in step (c), to generate $r_1^T = (k_1 \times p)^T$ for the pattern, and calculating $q_2 = B_2 r_1^T = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$; and
(e) back-solving $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$ to determine $r_2^T = (k_2 \times k_1 \times p)^T$ for the pattern.
In the case that pattern $p$ is two-dimensional ($n = 2$), the result of step (e) is typically transposed to generate data indicative of $r = k_2 \times k_1 \times p$, and the result $r$ is a close approximation to the desired convolution $Dp$. Where $n > 2$, steps (d) and (e) are repeated for each additional dimension of pattern $p$ (with the transposition operation as defined above in the Summary), and the transposition of the result of the final repetition of step (e) is a close approximation to (or exactly equal to) the desired convolution $Dp$.
In some implementations of step (c), a small number of the lowest components of $r$ are computed directly during the preliminary ignition step, and the rest of the components of $r$ are then determined using the natural recurrence relation. In preferred implementations of step (c), a small number of large negative components of $r$ are computed directly during the preliminary ignition step, and the rest of the components of $r$ are then determined using the natural recurrence relation.
In cases in which the kernel $d$ is itself a separated-spline kernel (so that $d = k$ and $K = D$), the method yields an exact result ($r = Dp$). Otherwise, the error inherent in the method is $(k - d) \times p$, and thus the error is easily bounded. What follows is an example of working source code (in C) for a function implementing the separated-spline technique for convolution in accordance with the invention. The kernel specified in the example is the squared cosine kernel discussed above. This function is called by passing pointers to the input pattern (*Pattern), the output array (*Result), and a one-dimensional working buffer (*r). Global variables are the array size, which is NX × NY, and N = max(NX, NY). This code has been optimized to run in minimal memory with few logical clauses. The downside is that the simplicity of the algorithm is not readily apparent at first. Notice that most of the code is some variation of the lines:

```c
r[x+1] = c2 * r[x] - r[x-1] + s;
s += s2*(q[x+L] + q[x+L+1] - q[x-L] - q[x-L+1]);
```

so that the actual algorithm is conceptually very compact.
Example: C source code for implementation of squared cosine separated-spline convolution:
```c
#include <math.h>

/* Globals (defined elsewhere): NX x NY array size, row stride N = max(NX, NY),
   and L, the kernel radius. The boundary cases below assume NX, NY > 2L. */
extern int NX, NY, N, L;
#ifndef PI
#define PI 3.14159265358979323846
#endif

/* r must point at least L+5 elements into a buffer of at least N+L+5 floats,
   so that r[-L-5] .. r[N-1] are valid. */
void Cossplineconv(float *Pattern, float *Result, float *r)
{
    int x, y; float *q, *Resultptr, s;
    float c = (float) cos(PI / L), c2 = 2*c, s2 = 1;

    for (y = 0; y < NY; y++) { /* X direction convolution */
        r[-L-5] = r[-L-4] = 0; s = 0; q = &Pattern[N*y]; Resultptr = &Result[N*y];
        x = -L-4; r[x+1] = c2*r[x] - r[x-1] + s; s += 0;
        x = -L-3; r[x+1] = c2*r[x] - r[x-1] + s; s += 0;
        x = -L-2; r[x+1] = c2*r[x] - r[x-1] + s; s += 0;
        x = -L-1; r[x+1] = c2*r[x] - r[x-1] + s; s += s2*q[x+L+1];
        for (x = -L; x < 0; ++x) { r[x+1] = c2*r[x] - r[x-1] + s; s += s2*(q[x+L] + q[x+L+1]); }
        for (x = 0; x < L-1; ++x) { r[x+1] = c2*r[x] - r[x-1] + s; s += s2*(q[x+L] + q[x+L+1]); }
        x = L-1; r[x+1] = c2*r[x] - r[x-1] + s; s += s2*(q[x+L] + q[x+L+1] - q[x-L+1]);
        for (x = L; x < NX-L-1; x++) {
            r[x+1] = c2*r[x] - r[x-1] + s; s += s2*(q[x+L] + q[x+L+1] - q[x-L] - q[x-L+1]);
        }
        x = NX-L-1; r[x+1] = c2*r[x] - r[x-1] + s; s += s2*(q[x+L] - q[x-L] - q[x-L+1]);
        for (x = NX-L; x < NX-1; x++) {
            r[x+1] = c2*r[x] - r[x-1] + s; s -= s2*(q[x-L] + q[x-L+1]); /* Note -=. */
        }
        for (x = 0; x < NX; x++)
            Resultptr[x] = r[x];
    }
    for (x = 0; x < NX; x++) { /* Y direction convolution */
        r[-L-5] = r[-L-4] = 0; s = 0; q = &Result[x];
        Resultptr = &Result[x];
        y = -L-4; r[y+1] = c2*r[y] - r[y-1] + s; s += 0;
        y = -L-3; r[y+1] = c2*r[y] - r[y-1] + s; s += 0;
        y = -L-2; r[y+1] = c2*r[y] - r[y-1] + s; s += 0;
        y = -L-1; r[y+1] = c2*r[y] - r[y-1] + s; s += q[N*(y+L+1)];
        for (y = -L; y < 0; ++y) { r[y+1] = c2*r[y] - r[y-1] + s; s += q[N*(y+L)] + q[N*(y+L+1)]; }
        for (y = 0; y < L-1; ++y) { r[y+1] = c2*r[y] - r[y-1] + s; s += q[N*(y+L)] + q[N*(y+L+1)]; }
        y = L-1; r[y+1] = c2*r[y] - r[y-1] + s; s += q[N*(y+L)] + q[N*(y+L+1)] - q[N*(y-L+1)];
        for (y = L; y < NY-L-1; y++) {
            r[y+1] = c2*r[y] - r[y-1] + s;
            s += q[N*(y+L)] + q[N*(y+L+1)] - q[N*(y-L)] - q[N*(y-L+1)];
        }
        y = NY-L-1; r[y+1] = c2*r[y] - r[y-1] + s; s += q[N*(y+L)] - q[N*(y-L)] - q[N*(y-L+1)];
        for (y = NY-L; y < NY-1; y++) {
            r[y+1] = c2*r[y] - r[y-1] + s; s -= q[N*(y-L)] + q[N*(y-L+1)]; /* Note -=. */
        }
        for (y = 0; y < NY; y++)
            Resultptr[N*y] = r[y];
    }
}
```
In some two-dimensional implementations of the inventive method, discrete convolution is performed employing a matrix formalism, whereby a two-dimensional pixel rectangle is converted into a one-dimensional column vector using lexicographical indexing. In this case the circulant matrix $K$ becomes an NN-by-NN monstrosity, but when an annihilation operator $A$ is applied, the operator $AK$ will be sparse. This class of embodiments of the invention has the advantage of converting nonvanishing circular regions to deterministically indexed matrix elements.
Figure 1 is a block diagram of a computer system which embodies the invention. The system includes processor 2 (which is programmed with software for implementing any embodiment of the inventive convolution method), display device 4, input device 6, and memory 8 (and optionally also output device 5) coupled to processor 2. Where processor 2 is a typical processor configured to process binary data, it is programmed with software for implementing a "discrete" implementation of the inventive method. Typically, memory 8 stores data indicative of the circulant $D$ of the convolution kernel $d$, the circulant $K$ of the factor kernel $k(x_1, x_2, \ldots, x_n) = k_1(x_1)\,k_2(x_2) \cdots k_n(x_n)$, the pattern $p$ to be convolved, intermediate quantities generated during performance of the method, and data indicative of the convolved signal $r = Kp$ resulting from the convolution. In some implementations, processor 2 is programmed to determine (from a user-specified convolution kernel $d$ of interest) particular parameters of a spline kernel $k$ which cause the spline kernel to approximate the convolution kernel $d$ (subject to user-specified constraints). In some implementations, processor 2 generates one or more look-up tables, stores them in memory 8 (or a cache memory associated with processor 2), and then accesses the stored look-up tables during performance of the invention. The user controls processor 2 (including by specifying processing parameters or constraints) using input device 6. Text and images generated by processor 2 (such as representations of a two-dimensional pattern $p$ to be convolved and the convolution result $Kp$ generated in accordance with the invention) are displayed on display device 4. Output device 5 (which can be employed instead of or in addition to display device 4) is preferably a pattern-capable device such as a sound reproduction unit, an I/O port (input/output port), or a signal processing (and/or storage) device (or system).
Figure 2 is a block diagram of a lithography system including digital signal processor ("DSP") 10 which is configured to perform convolution (in accordance with the invention) on image data stored in memory unit 14. The image data stored in memory unit 14 determines the pattern $p$ to be convolved. DSP 10 processes the image data to generate output data indicative of the convolution result $r = Kp$. The output data is stored in memory 14 (and optionally undergoes further processing) and/or is output to "pattern signal" generation device 16. Device 16 generates a pattern signal (e.g., a beam of optical or other electromagnetic radiation having time-varying amplitude or an electron beam having time-varying amplitude) in response to data it receives. In a class of embodiments, device 16 emits a beam of optical radiation which is incident on optics 18 to cause optics 18 to project an output beam on lithography target 20. Optics 18 scans the output beam across lithography target 20, in response to scan control signals from control unit 12. The amplitude of the beam emitted from device 16 varies as a function of time (in response to the output data from DSP 10, which assumes the scan pattern determined by the scan control signals from unit 12) in such a manner that the scanned output beam (the output of optics 18) exposes target 20 to a pattern of pixels. In other embodiments, device 16 emits an electron beam which is incident on optics 18, to cause optics 18 to project an output electron beam on lithography target 20. Optics 18 scans the output electron beam across target 20, in response to scan control signals from control unit 12. The amplitude of the electron beam emitted from device 16 varies as a function of time (in response to the output data from DSP 10, which assumes the scan pattern determined by the scan control signals from unit 12) in such a manner that the scanned output beam from optics 18 exposes target 20 to a pattern of pixels.
Alternatively, device 16 can emit radiation which is focused (without being scanned) by optics 18 to project on target 20 an image comprising pixels, said image determining a pattern. For example, one embodiment of device 16 emits optical radiation which is focused by optics 18 so as to project from optics 18 as a pattern on target 20, without the need for optics 18 to scan any beam across target 20.
We shall refer to the output of device 16 as a "pattern signal," recognizing that examples of such pattern signal include a beam of optical or other radiation to be scanned by optics 18, an electron beam to be scanned by optics 18, and radiation to be focused by but not scanned by optics 18. Optics 18 can be a set of reflective and/or refractive optics (with or without scanning capability, including means for moving one or more elements of the optics to scan a beam across target 20), or it can be a set of electron beam optics (with scanning capability, including means for moving one or more elements thereof to scan an electron beam across target 20). The output of optics 18 is projected (e.g., including by being scanned) as a pattern on lithography target 20.
Typically, target 20 is a glass plate (so that projection of the pattern thereon produces a mask useful in integrated circuit manufacture) or a semiconductor wafer. Optics 18 typically focuses the pattern signal so that a very small pattern is projected on target 20.
Although the "raw" pattern signal that is output from device 16 determines a pattern, diffraction artifacts (or other artifacts) introduced by optics 18 (or inherent in the interaction between the imaging beam and target 20) may cause the pattern actually produced on target 20 to differ from this pattern. For example, consider the case that the "raw" pattern signal output from device 16 is an electron beam to be focused by electron beam optics 18, and scanned onto a sequence of pixels on target 20, in an effort to project on target 20 a pattern determined by the amplitude of the focused electron beam incident on each single pixel of the sequence. In this case, the well-known "proximity problem" (discussed above) causes exposure of an area surrounding each pixel on which the focused electron beam is incident (due to scattering of electrons away from each such pixel to the surrounding areas of the target). As a result, the pattern actually produced on target 20 is determined by superposition of the results of directing the focused electron beam at each pixel of the sequence, where a multi-pixel region is exposed each time the focused electron beam is incident at one of the pixels of the sequence.
Thus, DSP 10 is configured to generate output data which will cause device 16 to output a "raw" pattern signal having the characteristics that are needed to produce a desired pattern on target 20. To accomplish this, DSP 10 performs a deconvolution operation on a large array of pixels (image data stored in memory 14) in order to compensate for any artifacts expected to be introduced by optics 18 and/or any expected scattering (by target 20) of an electron beam incident on target 20 from optics 18. The deconvolution operation performed by DSP 10 includes a convolution operation (performed in accordance with the invention) on stored image data that it retrieves from memory 14, where the image data determines a very large array of pixels which in turn determines a pattern "$p$". DSP 10 thus processes the image data in accordance with the present invention to generate data indicative of the convolution result $r = Kp$. The latter data is then asserted to device 16, or is further processed prior to assertion to device 16.
Controller 12 of the Fig. 2 system provides appropriate control signals to units 10, 14, 16, and 18, and is capable (for example) of downloading instructions to DSP 10 to cause it to execute the convolution operation with specified parameters.
Fig. 3 is a block diagram of a digital signal processor (DSP) which can be used as DSP 10 of Fig. 2, and which is configured to perform convolution in accordance with the invention on image data. The DSP of Fig. 3 includes arithmetic computational unit (ACU) 34 which includes addition and multiplication circuitry (for performing the matrix multiplication and recurrence relation operations required to implement the convolution), program memory 30 (which stores the instructions which are executed by the DSP to perform the convolution operation), program control unit (PCU) 32, memory management unit 36, and data memory 38, connected as shown. In response to commands from a user, controller 12 of Fig. 2 loads appropriate instructions into memory 30, and data indicative of a pattern $p$ (the data labeled "INPUT" in Fig. 3) is loaded into memory 38.
PCU 32 includes instruction fetch circuitry for fetching a sequence of the instructions from program memory 30, instruction decoding circuitry, and registers for storing control bits generated by the decoding circuitry for assertion at appropriate times to unit 36 and/or unit 34.
Memory management unit 36 is configured to generate address signals (each identifying a location in memory 38 to which data is to be written or from which data is to be read) in response to control bits from PCU 32, and to assert such address signals over an address bus to memory 38. Thus, in response to control bits from PCU 32 (which have been generated in PCU 32 by decoding instructions from program memory 30), unit 36 asserts address signals to data memory 38. In response to the addresses asserted by memory management unit 36, data memory 38 sends signals indicative of data to ACU 34 (over a data bus). The resulting output signals from ACU 34 (indicative of partially processed data, or of the final convolution result $r = k \times p$) can propagate over the data bus to memory 38 for storage at locations of memory 38 determined by addresses asserted by unit 36 to memory 38. In some implementations, memory 38 functions as an I/O buffer for the DSP, and data indicative of the final convolution result is output from memory 38 (as output data "OUTPUT1") to pattern signal generator 16. In other implementations, data indicative of the final convolution result streams directly (or through a buffer) to pattern signal generator 16 from ACU 34 (as output data "OUTPUT2").
Fig. 4 is a variation on the system of Fig. 2, in which elements 16, 18, and 20 are identical to identically numbered elements of Fig. 2. In the Fig. 4 embodiment, element 46 is configured to perform convolution (in accordance with any embodiment of the invention) on image data (determining the pattern $p$ to be convolved) which it receives from memory unit 44. Element 46 (which can be a digital signal processor including digital signal processing circuitry, custom or dedicated electronic circuitry, or a programmable gate array-based computing system, in each case configured to perform convolution on data in accordance with any embodiment of the inventive method) processes the image data to generate output data indicative of the convolution result $r = Kp$. The output data is streamed directly from element 46 to pattern signal generation device 16, and device 16 generates a pattern signal in response to the output data from element 46. Controller 42 of the Fig. 4 system provides appropriate control signals to elements 44, 46, 16, and 18, and is capable (for example) of downloading instructions to element 46 to cause it to execute the convolution operation with specified parameters. It is contemplated that the DSP of Fig. 3 can implement any embodiment of the inventive method. At the end of a convolution operation, processed data indicative of the convolution result $r = Kp$ will have been generated. This data can be streamed directly to device 16, or it can be further processed (e.g., in unit 34) and thereafter asserted to device 16 or to memory 14.
The inventive method can implement any convolution $r = d \times p$, provided that the convolution kernel $d$ is sufficiently smooth to be adequately approximated by a separated-spline kernel $k$, in the following sense: kernel $d$ is adequately approximated by separated-spline kernel $k$ if the error inherent in the method, which is $(k - d) \times p$, is within acceptable limits.
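The error term $(k - d) \times p$ can be made concrete with a short sketch. It assumes, as the claims do, that convolution is cyclic, so that $d \times p$ equals the circulant product $Dp$; the helper names and the toy kernels below are hypothetical. Because convolution is linear, $k \times p - d \times p = (k - d) \times p$, and its magnitude never exceeds $\sum_x |k(x) - d(x)| \cdot \max_x |p(x)|$, which is one simple (assumed, not patent-specified) way to judge whether the error is "within acceptable limits".

```python
def cyclic_convolve(kernel, signal):
    """Cyclic convolution r(x) = sum_j kernel[j] * signal[(x - j) mod N]."""
    n = len(signal)
    if len(kernel) != n:
        raise ValueError("kernel and signal must have the same length")
    return [sum(kernel[j] * signal[(x - j) % n] for j in range(n)) for x in range(n)]


def circulant(kernel):
    """Circulant matrix D of a kernel: D[x][j] = kernel[(x - j) mod N], so D p = kernel x p."""
    n = len(kernel)
    return [[kernel[(x - j) % n] for j in range(n)] for x in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def approximation_error(d, k, p):
    """Error incurred by convolving with k instead of d, i.e. (k - d) x p."""
    return cyclic_convolve([ki - di for ki, di in zip(k, d)], p)


# Hypothetical data: a smooth symmetric kernel d (index 0 is the center, negative
# offsets wrap to the end of the list) and a cruder approximant k with two taps dropped.
d = [4, 3, 1, 0, 0, 0, 1, 3]
k = [4, 3, 0, 0, 0, 0, 0, 3]
p = [0, 1, 2, 0, 0, 5, 0, 1]

err = approximation_error(d, k, p)
bound = sum(abs(ki - di) for ki, di in zip(k, d)) * max(abs(v) for v in p)
print(max(abs(e) for e in err), "<=", bound)  # prints: 6 <= 10
```

The circulant form is only useful for checking; a practical implementation never builds $D$, since its $N^2$ entries are exactly what the separated-spline method avoids.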
Typically, convolution kernels $d$ employed in the field of electron beam lithography proximity error correction are sufficiently smooth to be adequately approximated by a separated-spline kernel $k$. Convolution kernels that are noisy (random), such as those encountered in cryptography, are typically not sufficiently smooth to be adequately approximated by a separated-spline kernel $k$.
Fig. 5 is a simplified elevational view of computer-readable storage medium 50 (which is a CD-ROM) which stores computer-executable instructions (software). The instructions are such that a computer performs an embodiment of the inventive method in response to executing the instructions.
Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. For example, it is contemplated that in some embodiments the invention is implemented by hardwired circuitry (e.g., custom or dedicated electronic circuitry) or FPGA systems (field programmable gate array based computing systems) rather than in software or by a system including a digital signal processor ("DSP").

Claims

WHAT IS CLAIMED IS:
1. A method for performing convolution of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$ with a smooth kernel $d$ which is at least approximately equal to a separated-spline kernel, to generate data indicative of a convolution result $r = Dp = d \times p$, where $D$ is the circulant of $d$, and $n$ is greater than one, said method including the steps of:
(a) specifying the separated-spline kernel as $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$, where $k_j$ has circulant $K_j$, $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, and $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator);
(b) processing pattern data indicative of the pattern $p$ and kernel data indicative of the kernel $k_1$ to generate additional data indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$;
(c) processing the additional data to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to determine data indicative of $r_1 = k_1 \times p$ for said pattern, by performing a preliminary ignition step in which a small number of components of $r_1$ are computed directly, and then determining all other components of $r_1$ using a natural recurrence relation determined by the operator $A_1$;
(d) processing the data indicative of $r_1 = k_1 \times p$ to generate data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$; and
(e) processing the data indicative of $q_2 = A_2 k_2 \times r_1^T$ to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby determining data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern $p$.
2. The method of claim 1, wherein $n = 2$, and step (e) includes the step of performing a transposition on the data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ to generate data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern, where $r$ is a close approximation to the convolution $Dp$.
3. The method of claim 2, also including the steps of:
(f) generating a pattern signal in response to convolved image data determined at least in part by the data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern; and
(g) causing the pattern signal to be incident on a set of optics, and projecting a pattern image from the optics onto a lithography target in response to the pattern signal.
4. The method of claim 3, wherein the pattern signal is an optical signal.
5. The method of claim 3, wherein the pattern signal is an electron beam signal.
6. The method of claim 3, wherein the convolved image data employed in step (f) is said data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern.
7. The method of claim 1, wherein n is greater than 2, and also including the step of:
(f) after determining the data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern, performing $j$ iterations of steps (d) and (e), wherein during the $i$-th iteration of steps (d) and (e), where $i$ is a positive integer and $j = n - 2$, data indicative of $q_{i+2} = A_{i+2} k_{i+2} \times q_{i+1}$ and of $r_{i+2}^T = (k_{i+2} \times k_{i+1} \times \cdots \times k_1 \times p)^T$ are generated.
8. The method of claim 1, wherein the pattern data is stored in memory locations of a memory prior to step (b), and wherein step (d) includes the steps of:
(i) storing the data indicative of $r_1 = k_1 \times p$ in the memory such that data indicative of each column of $(k_1 \times p)$ occupies a row of the memory locations formerly occupied by the pattern data indicative of a row of $p$; and
(ii) during generation of the data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for said each row of $(k_1 \times p)^T$, reading data indicative of one said column of $(k_1 \times p)$ from one said row of the memory locations.
9. A method for performing convolution of a two-dimensional pattern $p(x, y)$ with a smooth kernel $d$ which is at least approximately equal to a separated-spline kernel, to generate data indicative of a convolution result $r = Dp = d \times p$, where $D$ is the circulant of $d$, said method including the steps of: (a) specifying the separated-spline kernel as $k(x, y) = k_1(x) k_1(y)$, where
$k_1(x) = (R^2 - x^2)$ for $|x| \le R$,
and where the kernel $k_1(x)$ satisfies:
$k_1(x) = k_c(x) + k_+(x) + k_-(x)$, where $k_c(x)$ is a kernel with $k_c(x) = (R^2 - x^2)$ for $|x| \le R$,
$k_+(x)$ is a one-tailed Laplacian kernel, and $k_-(x)$ is a one-tailed Laplacian kernel with
$k_-(x) = c\,d^{x}$ for $x < -R$;
(b) processing pattern data indicative of the pattern $p$ and kernel data indicative of the kernel $k_c$, the kernel $k_+$, and the kernel $k_-$ to generate additional data indicative of a cap convolution $k_c \times p(x)$ for each row of the pattern $p$, a second convolution $k_+ \times p(x)$ for said each row, and a third convolution $k_- \times p(x)$ for said each row, and adding together the additional data indicative of the cap convolution, the second convolution, and the third convolution for said each row to generate data indicative of $q_1 = k_1 \times p(x, y) = (k_c \times p(x, y)) + (k_+ \times p(x, y)) + (k_- \times p(x, y))$, wherein the data indicative of the cap convolution is generated by computing $A_c k_c \times p(x)$ for said each row, where $A_c$ is an annihilation operator defined as
$A_c k_c(x) := k_c(x+3) - 3k_c(x+2) + 3k_c(x+1) - k_c(x)$, and computing $A_c^{-1}(A_c k_c \times p)(x) = k_c \times p(x)$ using the recursion relation $k_c(x+3) = A_c k_c(x) + k_c(x) + 3k_c(x+2) - 3k_c(x+1)$; the data indicative of the second convolution is generated by computing
$A_+ k_+ \times p(x)$ for said each row, where $A_+$ is an annihilation operator defined as $A_+ k_+(x) = k_+(x+1) - d^{-1} k_+(x)$, and computing $A_+^{-1}(A_+ k_+ \times p)(x) = k_+ \times p(x)$ using the recursion relation:
$(k_+ \times p)(x+1, y) = (A_+ k_+ \times p)(x, y) + d^{-1}(k_+ \times p)(x, y)$; and the data indicative of the third convolution is generated by computing $A_- k_- \times p(x)$ for said each row, where $A_-$ is an annihilation operator defined as
$A_- k_-(x) = k_-(x-1) - d^{-1} k_-(x)$, and computing $A_-^{-1}(A_- k_- \times p)(x) = k_- \times p(x)$ by recursing from end to beginning using the recursion relation: $(k_- \times p)(x-1, y) = (A_- k_- \times p)(x, y) + d^{-1}(k_- \times p)(x, y)$;
(c) transposing the data indicative of $(k_1 \times p(x, y))$ to generate data indicative of $(k_1 \times p(x, y))^T$; and
(d) repeating step (b), this time by processing the data indicative of $(k_1 \times p(x, y))^T$, rather than the pattern data, with the kernel data, thereby generating data indicative of $q_2^T = (k_1(x) \times k_1(y) \times p(x, y))^T$.
10. The method of claim 9, wherein the pattern data is stored in memory locations of a memory prior to step (b), and wherein step (c) is accomplished by storing the data indicative of $(k_1 \times p)$ in the memory such that data indicative of each column of $(k_1 \times p)$ occupies a row of the memory locations formerly occupied by the pattern data indicative of a row of $p(x, y)$.
11. The method of claim 9, also including the step of:
(e) transposing the data indicative of $q_2^T = (k_1 \times k_1 \times p)^T$ to produce data indicative of $r = k \times p$.
12. The method of claim 11, also including the steps of:
(f) generating a pattern signal in response to convolved image data determined at least in part by the data indicative of $r = k \times p$; and
(g) causing the pattern signal to be incident on a set of optics, and projecting a pattern image from the optics onto a lithography target in response to the pattern signal.
13. The method of claim 12, wherein the pattern signal is an optical signal.
14. The method of claim 12, wherein the pattern signal is an electron beam signal.
15. A method for performing convolution of a two-dimensional pattern $p(x, y)$ with a smooth kernel $d$ which is at least approximately equal to a separated kernel, to generate data indicative of a convolution result $r(x, y) = Dp = d \times p$, where $D$ is the circulant of $d$, said method including the steps of:
(a) specifying the separated kernel as $k(x, y) = k_2(x) k_2(y)$, with $k(x, y)$ approximating the kernel $d$, where $k_2(x) = 2\cos^2\!\left(\frac{\pi x}{2R}\right)$ for $|x| \le R$, and
$k_2(x) = 0$ for $|x| > R$;
(b) processing pattern data indicative of each row of the pattern $p$ and kernel data indicative of the kernel $k_2$ to:
(i) generate data indicative of $A k_2 \times p$ for said each row, for $-R - 3 \le x \le n - 1$, where $A k_2(x) \times p(x) = B\,(p(x + R) + p(x + R + 1) - p(x - R) - p(x - R + 1))$; and
(ii) perform a recursion operation, using the recursion relation $r(x + 2) = Ar(x) + E\,(r(x + 1) - r(x)) + r(x - 1)$, and using the relations $r(-R - 4) = r(-R - 3) = r(-R - 2) = 0$ to ignite said recursion operation, to find $r(x) = A^{-1}(A k_2(x) \times p(x))$ for said each row, thereby producing, at the end of a final iteration of step (ii), data indicative of $k_2 \times p$;
(c) transposing the data indicative of $k_2 \times p$ to produce data indicative of $(k_2 \times p)^T$; and
(d) repeating step (b), this time on rows of the data indicative of $(k_2 \times p)^T$ rather than on the rows of the pattern data, thereby generating data indicative of convolution of all columns of $k_2 \times p$ with $k_2$.
16. The method of claim 15, wherein steps (b), (c), and (d) are performed by a programmed processor, the processor is in an initial state at the start of step (b), and after step (d), the processor performs the additional steps of:
(e) transposing the data indicative of convolution of all columns of $k_2 \times p$ with $k_2$ to produce data indicative of the convolution result $r(x, y)$; and
(f) after step (e), returning to the initial processor state.
17. The method of claim 16, also including the steps of:
(g) generating a pattern signal in response to convolved image data determined at least in part by the data indicative of the convolution result $r(x, y)$; and
(h) causing the pattern signal to be incident on a set of optics, and projecting a pattern image from the optics onto a lithography target in response to the pattern signal.
18. The method of claim 17, wherein the pattern signal is an optical signal.
19. The method of claim 17, wherein the pattern signal is an electron beam signal.
20. The method of claim 15, wherein the pattern data is stored in memory locations of a memory prior to step (b), and wherein step (c) is accomplished by storing the data indicative of $k_2 \times p$ in the memory such that data indicative of each column of $(k_2 \times p)$ occupies a row of the memory locations formerly occupied by the pattern data indicative of a row of $p(x, y)$.
21. The method of claim 15, wherein $B = 2\sin^2(\pi / 2R)$.
22. The method of claim 15, wherein $B = 1$.
23. A computer system, comprising: a processor programmed with software for convolving first data indicative of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$ with a smooth kernel $d$ at least approximately equal to a separated-spline kernel, to generate data indicative of a convolution result $r = Dp = d \times p$, where $D$ is the circulant of $d$, and $n$ is greater than one, by processing second data, indicative of a separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$ which approximates $d$, where $k_j$ has circulant $K_j$, the kernel $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), with said first data to generate third data indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$, and processing the third data to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to determine fourth data indicative of $r_1 = k_1 \times p$ for said pattern, and processing the fourth data to generate fifth data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$, and processing the fifth data to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby determining sixth data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern; at least one memory coupled to and accessible by the processor, and configured to store at least the first data, the second data, and the fifth data; and an input device coupled to the processor, wherein the processor executes the software in response to at least one control signal from the input device.
24. The system of claim 23, wherein $n = 2$, and the software causes the processor to perform a transposition on the sixth data to generate seventh data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern, wherein the seventh data is said data indicative of the convolution result $r = Dp = d \times p$.
25. A digital signal processor, including: digital signal processing circuitry configured to perform arithmetic operations on data including first data indicative of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$, and second data indicative of a separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$, where $k_j$ has circulant $K_j$, the kernel $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), and said kernel $k$ is at least approximately equal to a smooth kernel $d$ having circulant $D$, where $n$ is greater than one; and a program control unit coupled to the digital signal processing circuitry, and configured to generate control bits in response to instructions, and to assert the control bits to the digital signal processing circuitry to cause the digital signal processing circuitry to perform convolution on the first data by processing the first data and the second data to generate third data indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$, and processing the third data to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to determine fourth data indicative of $r_1 = k_1 \times p$ for said pattern, and processing the fourth data to generate fifth data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$, and processing the fifth data to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby determining sixth data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern.
26. The system of claim 25, wherein $n = 2$, and the control bits cause the digital processing circuitry to perform a transposition on the sixth data to generate seventh data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern.
27. A lithography system, including: a digital signal processor, comprising digital signal processing circuitry and a program control unit coupled to the digital signal processing circuitry, the digital signal processing circuitry being configured to perform arithmetic operations on data, including first data indicative of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$, and second data indicative of a separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$, where $k_j$ has circulant $K_j$, the kernel $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), and said spline kernel $k$ is at least approximately equal to a smooth kernel $d$ having circulant $D$, where $n$ is greater than one, in response to control bits from the program control unit, to perform convolution on the first data by processing the first data and the second data to generate third data indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$, and processing the third data to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to determine fourth data indicative of $r_1 = k_1 \times p$ for said pattern, and processing the fourth data to generate fifth data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$, and processing the fifth data to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby determining sixth data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern, and performing a transposition on the sixth data to generate seventh data indicative of $r_2 = (k_2 \times k_1 \times p)$ for said pattern; a pattern signal generator configured to generate a pattern signal in response to convolved image data determined at least in part by said seventh data; and a set of optics positioned so that the pattern signal is incident thereon, wherein the optics projects a pattern image on a lithography target in response to the pattern signal.
28. The lithography system of claim 27, wherein the digital signal processor asserts the seventh data to the pattern signal generator, and the pattern signal generator is configured to generate the pattern signal in response to said seventh data.
29. The lithography system of claim 27, wherein the pattern signal is an optical signal.
30. The lithography system of claim 29, wherein the optical signal is an optical beam and the set of optics is configured to focus the optical beam to generate a focused beam and to scan the focused beam relative to the lithography target.
31. The lithography system of claim 27, wherein the pattern signal is an electron beam signal.
32. The lithography system of claim 31, wherein the set of optics is configured to focus the electron beam signal to generate a focused electron beam and to scan the focused electron beam relative to the lithography target.
33. An apparatus configured to convolve a first signal indicative of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$ with a second signal indicative of a separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$ which approximates a smooth kernel $d$, where $k_j$ has circulant $K_j$, the kernel $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), to generate a signal indicative of a convolution result $r = Dp = d \times p$, where $D$ is the circulant of $d$, and $n$ is greater than one, by generating from the first signal and the second signal a third signal indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$, and processing the third signal to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to generate a fourth signal indicative of $r_1 = k_1 \times p$ for said pattern, and generating from the fourth signal a fifth signal indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$, and processing the fifth signal to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby generating a sixth signal indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern, whereby said sixth signal is at least approximately indicative of the convolution result $r = Dp$.
34. The apparatus of claim 33, wherein the apparatus consists of electronic circuitry.
35. The apparatus of claim 33, wherein the apparatus is a programmable gate array-based computing system.
36. A computer-readable storage medium which stores instructions, wherein the instructions are executable by a computer, and wherein the instructions are such that the computer performs a method for convolving first data indicative of an $n$-dimensional pattern $p(x_1, \ldots, x_n)$ with a smooth kernel $d$ at least approximately equal to a separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$, where $k_j$ has circulant $K_j$, the kernel $k(x_1, \ldots, x_n)$ admits of an operator $A = A_1 A_2 \cdots A_n$, where $A_j$ is an annihilation or flattening operator which operates on the circulant $K_j$ of kernel $k_j$ in such a manner that $A_j K_j$ is sparse (when $A_j$ is an annihilation operator) or $A_j K_j$ is almost everywhere a locally constant matrix (when $A_j$ is a flattening operator), to generate data indicative of a convolution result $r = Dp = d \times p$ in response to executing said instructions, where $D$ is the circulant of $d$, and $n$ is greater than one, said method including the steps of: processing second data, indicative of the separated-spline kernel $k(x_1, \ldots, x_n) = k_1(x_1) \cdots k_n(x_n)$, with the first data to generate third data indicative of $q_1 = A_1 k_1 \times p$ for each row of the pattern $p$; processing the third data to backsolve $A_1 r_1 = q_1$ for said each row of the pattern to determine fourth data indicative of $r_1 = k_1 \times p$ for said pattern; processing the fourth data to generate fifth data indicative of $q_2 = A_2 k_2 \times (k_1 \times p)^T$ for each row of $(k_1 \times p)^T$, where $r_1^T = (k_1 \times p)^T$ denotes transposition of $r_1$; processing the fifth data to backsolve $A_2 r_2^T = q_2$ for said each row of $(k_1 \times p)^T$, thereby determining sixth data indicative of $r_2^T = (k_2 \times k_1 \times p)^T$ for said pattern; and performing a transposition on the sixth data to generate seventh data indicative of $r = (k_2 \times k_1 \times p)$ for said pattern, whereby said seventh data is at least approximately indicative of the convolution result $r = Dp$.
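The row pass, transpose, row pass sequence of claims 1 and 2 can be illustrated with the simplest kernel that fits the scheme: a box kernel (a degree-0 spline), whose first difference $Af(x) = f(x+1) - f(x)$ has only two nonzero taps. The box kernel, the cyclic boundary handling, and the function names are assumptions made for illustration; they are not the kernels preferred in claims 9 and 15. Each row goes through step (b) (a sparse $q_1$), then step (c) (one ignition value computed directly, the rest by the recurrence), the result is transposed, the same row pass is applied again (steps (d) and (e)), and the final transposition of claim 2 restores the original orientation.

```python
def box_row_convolve(row, radius):
    """Cyclic convolution of one row with the box kernel k(x) = 1 for |x| <= radius.

    Step (b): A k, with (A f)(x) = f(x + 1) - f(x), has only two nonzero taps,
              so q(x) = p(x + radius + 1) - p(x - radius).
    Step (c): ignite r(0) by a direct sum, then backsolve A r = q with
              the recurrence r(x + 1) = r(x) + q(x).
    """
    n = len(row)
    if 2 * radius + 1 > n:
        raise ValueError("box kernel must not be wider than the row")
    q = [row[(x + radius + 1) % n] - row[(x - radius) % n] for x in range(n)]
    r = [0] * n
    r[0] = sum(row[(-j) % n] for j in range(-radius, radius + 1))  # ignition step
    for x in range(n - 1):
        r[x + 1] = r[x] + q[x]  # natural recurrence determined by A
    return r


def transpose(rows):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]


def separable_box_convolve(image, radius):
    """r = k2 x k1 x p for k(x, y) = k1(x) k2(y), with both factors box kernels."""
    r1 = [box_row_convolve(row, radius) for row in image]            # steps (b), (c)
    r2_t = [box_row_convolve(row, radius) for row in transpose(r1)]  # steps (d), (e)
    return transpose(r2_t)                                           # claim 2


def direct_box_convolve(image, radius):
    """Reference: direct 2-D cyclic convolution, (2R+1)^2 additions per pixel."""
    h, w = len(image), len(image[0])
    offsets = range(-radius, radius + 1)
    return [[sum(image[(y - dy) % h][(x - dx) % w] for dy in offsets for dx in offsets)
             for x in range(w)] for y in range(h)]


image = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(5)]
print(separable_box_convolve(image, 1) == direct_box_convolve(image, 1))  # prints True
```

Under these assumptions each output pixel of the separated method costs a fixed number of additions per pass, independent of the kernel radius, which is the point of replacing $k_j$ by the sparse $A_j k_j$.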
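Claims 8, 10, and 20 store the row results so that each column of $(k_1 \times p)$ lands in a row of the memory that previously held a row of $p$, so the second pass can again read contiguous rows. For a square $N \times N$ pattern held row-major in a single flat buffer, one minimal way to obtain that layout is an in-place transpose that swaps entries across the diagonal. The flat-buffer layout, the square shape, and the function name are assumptions for illustration; a non-square array would need a different (for example, cycle-following) in-place transpose.

```python
def transpose_in_place(buf, n):
    """Transpose an n x n row-major buffer in place.

    Afterwards, row j of buf (buf[j*n:(j+1)*n]) holds what was column j,
    so a row-oriented pass can read former columns as contiguous memory.
    """
    if len(buf) != n * n:
        raise ValueError("buffer must hold exactly n * n values")
    for i in range(n):
        for j in range(i + 1, n):
            buf[i * n + j], buf[j * n + i] = buf[j * n + i], buf[i * n + j]


buf = [0, 1, 2,
       3, 4, 5,
       6, 7, 8]
transpose_in_place(buf, 3)
print(buf[0:3])  # prints [0, 3, 6], the former first column
```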
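The cap part of claim 9 can be sketched as follows. The third difference $A_c$ annihilates any quadratic, so $A_c k_c$ is nonzero only at a few taps near $x = \pm R$, and $q = A_c k_c \times p$ is cheap to form; $k_c \times p$ is then recovered with $r(x+3) = q(x) + r(x) + 3r(x+2) - 3r(x+1)$, which is the claim's recursion applied to $r = k_c \times p$ (valid because $A_c$ commutes with convolution). This sketch assumes a zero-padded, non-cyclic row, so the ignition values are simply zero left of $x = -R$; it computes the sparse taps numerically rather than hard-coding them, and all function names are hypothetical.

```python
def cap_taps(radius):
    """Cap kernel k_c(x) = R^2 - x^2 for |x| <= R, zero elsewhere (as a dict of taps)."""
    return {x: radius * radius - x * x for x in range(-radius, radius + 1)}


def annihilated_taps(kernel, radius):
    """Nonzero taps of A_c k_c, where A_c f(x) = f(x+3) - 3 f(x+2) + 3 f(x+1) - f(x).

    The third difference of a quadratic is zero, so only taps near x = +-R survive.
    """
    def k(x):
        return kernel.get(x, 0)

    taps = {}
    for x in range(-radius - 3, radius + 1):
        value = k(x + 3) - 3 * k(x + 2) + 3 * k(x + 1) - k(x)
        if value != 0:
            taps[x] = value
    return taps


def cap_row_convolve(p, radius):
    """(k_c x p)(x) for x = 0..len(p)-1, with p taken as zero outside the row."""
    n = len(p)

    def at(i):
        return p[i] if 0 <= i < n else 0

    taps = annihilated_taps(cap_taps(radius), radius)
    start = -radius - 3
    r = {start: 0, start + 1: 0, start + 2: 0}  # ignition: r vanishes left of -R
    for x in range(start, n - 3):
        q = sum(v * at(x - j) for j, v in taps.items())    # (A_c k_c x p)(x), a few taps
        r[x + 3] = q + r[x] + 3 * r[x + 2] - 3 * r[x + 1]  # recursion of claim 9
    return [r[x] for x in range(n)]


def direct_row_convolve(p, kernel):
    """Reference: direct zero-padded convolution with a dict of kernel taps."""
    n = len(p)
    return [sum(v * p[x - j] for j, v in kernel.items() if 0 <= x - j < n)
            for x in range(n)]
```

With integer inputs the recursion is exact, so the fast result can be compared with the direct sum for equality.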
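The two one-tailed Laplacian parts of claim 9 reduce to first-order recursions, one run forward and one run from end to beginning. Claim 9 as reproduced here states only $k_-(x) = c\,d^{x}$ for $x < -R$; this sketch assumes the mirror form $k_+(x) = c\,d^{-x}$ for $x > R$ (the form that $A_+$ annihilates), assumes $d > 1$ and $R \ge 0$, and treats the row as zero-padded. Under these assumptions $A_+ k_+$ and $A_- k_-$ each have a single nonzero tap, $c\,d^{-(R+1)}$, at $x = R$ and $x = -R$ respectively. The function names are hypothetical.

```python
def laplacian_tails(p, radius, c, d):
    """Return (k_plus x p, k_minus x p) for a zero-padded row p.

    Assumed kernels: k_plus(x)  = c * d**(-x) for x > radius,
                     k_minus(x) = c * d**x    for x < -radius, with d > 1.
    """
    n = len(p)

    def at(i):
        return p[i] if 0 <= i < n else 0.0

    g = c * d ** -(radius + 1)  # single nonzero tap of A_+ k_+ (at R) and of A_- k_- (at -R)

    plus = [0.0] * n  # ignition: (k_plus x p)(0) = 0, since all taps of k_plus lie right of R
    for x in range(n - 1):
        # (k+ x p)(x+1) = (A+ k+ x p)(x) + d^-1 (k+ x p)(x)
        plus[x + 1] = g * at(x - radius) + plus[x] / d

    minus = [0.0] * n  # ignition: (k_minus x p)(n-1) = 0, since its taps reach past the row end
    for x in range(n - 1, 0, -1):
        # (k- x p)(x-1) = (A- k- x p)(x) + d^-1 (k- x p)(x), recursing from end to beginning
        minus[x - 1] = g * at(x + radius) + minus[x] / d
    return plus, minus


def direct_tails(p, radius, c, d):
    """Reference: direct sums over the tails, truncated to the row."""
    n = len(p)
    plus = [sum(c * d ** -j * p[x - j] for j in range(radius + 1, x + 1)) for x in range(n)]
    minus = [sum(c * d ** j * p[x - j] for j in range(x - n + 1, -radius)) for x in range(n)]
    return plus, minus
```

Adding these two tail results to the cap convolution $k_c \times p$ gives the row result $q_1$ of claim 9, step (b), under the stated assumptions.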
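The constants of claims 15 and 21 can be checked with a short derivation. This is a reconstruction under an assumed reading of the claims: $E$ is not defined in the claims as reproduced here, and the value of $B$ below follows from the derivation rather than from the claim text. The key facts are that $A$ annihilates $1 + \cos(\pi x/R)$ everywhere, and that truncating $k_2$ at $|x| = R$, where $k_2(\pm R) = 0$, leaves only four boundary taps.

```latex
% Kernel of claim 15 and the assumed third-order operator A
k_2(x) = 2\cos^2\!\left(\tfrac{\pi x}{2R}\right) = 1 + \cos\!\left(\tfrac{\pi x}{R}\right), \qquad |x| \le R
(Af)(x) = f(x+2) - E\,f(x+1) + E\,f(x) - f(x-1), \qquad E = 1 + 2\cos(\pi/R)
% Characteristic polynomial: z^3 - E z^2 + E z - 1 = (z - 1)\,(z^2 - 2\cos(\pi/R)\,z + 1),
% with roots 1 and e^{\pm i\pi/R}, so A annihilates 1 + \cos(\pi x/R) on all integers.
% Truncation at |x| = R leaves four nonzero taps:
(Ak_2)(-R-1) = (Ak_2)(-R) = B, \qquad (Ak_2)(R-1) = (Ak_2)(R) = -B, \qquad B = 2\sin^2\!\left(\tfrac{\pi}{2R}\right)
% Hence, for the convolution (k \times p)(x) = \sum_j k(j)\,p(x - j):
(Ak_2 \times p)(x) = B\,\bigl(p(x+R+1) + p(x+R) - p(x-R+1) - p(x-R)\bigr)
```

For example, $(Ak_2)(-R-1) = k_2(-R+1) = 2\cos^2\!\left(-\tfrac{\pi}{2} + \tfrac{\pi}{2R}\right) = 2\sin^2\!\left(\tfrac{\pi}{2R}\right)$, since all other samples in that stencil lie at or beyond $|x| = R$, where $k_2$ vanishes.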
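Step (b) of claim 15 can be sketched the same way. The values $E = 1 + 2\cos(\pi/R)$ and $B = 2\sin^2(\pi/2R)$ are derived, not stated in claim 15: the operator $Ar(x) = r(x+2) - E\,r(x+1) + E\,r(x) - r(x-1)$ annihilates constants and $\cos(\pi x/R)$, so it annihilates $k_2 = 1 + \cos(\pi x/R)$ inside $|x| \le R$, and truncation leaves the four boundary taps $\pm B$ behind the formula of step (b)(i). The sketch assumes a zero-padded row, uses hypothetical function names, and checks itself against a direct convolution.

```python
import math


def cos2_kernel(radius):
    """Taps of k2(x) = 2 cos^2(pi x / 2R) for |x| <= R (zero elsewhere)."""
    return {x: 2 * math.cos(math.pi * x / (2 * radius)) ** 2
            for x in range(-radius, radius + 1)}


def cos2_row_convolve(p, radius):
    """(k2 x p)(x) for x = 0..len(p)-1 via the claim-15 recursion, p zero-padded."""
    n = len(p)

    def at(i):
        return p[i] if 0 <= i < n else 0.0

    B = 2 * math.sin(math.pi / (2 * radius)) ** 2  # derived value (assumption)
    E = 1 + 2 * math.cos(math.pi / radius)          # derived value (assumption)
    start = -radius - 3
    # Ignition: r(-R-4) = r(-R-3) = r(-R-2) = 0.
    r = {start - 1: 0.0, start: 0.0, start + 1: 0.0}
    for x in range(start, n - 2):
        # Step (b)(i): (A k2 x p)(x) needs only four samples of p.
        q = B * (at(x + radius) + at(x + radius + 1) - at(x - radius) - at(x - radius + 1))
        # Step (b)(ii): r(x+2) = Ar(x) + E (r(x+1) - r(x)) + r(x-1).
        r[x + 2] = q + E * (r[x + 1] - r[x]) + r[x - 1]
    return [r[x] for x in range(n)]


def direct_row_convolve(p, kernel):
    """Reference: direct zero-padded convolution with a dict of kernel taps."""
    n = len(p)
    return [sum(v * p[x - j] for j, v in kernel.items() if 0 <= x - j < n)
            for x in range(n)]
```

The characteristic roots of this recursion lie on the unit circle, so rounding errors are not amplified exponentially; for very long rows, though, they can still accumulate, and periodic re-ignition from directly computed values is one possible safeguard (an assumption, not something the claims describe).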
EP02774117A 2001-05-29 2002-05-24 Fast signal convolution using separated-spline kernel Withdrawn EP1527397A2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US866890 2001-05-29
US09/866,890 US6912323B2 (en) 2000-01-11 2001-05-29 Method and apparatus for fast signal convolution using separated-spline kernel
PCT/US2002/016475 WO2002097655A2 (en) 2001-05-29 2002-05-24 Fast signal convolution using separated -spline kernel

Publications (1)

Publication Number Publication Date
EP1527397A2 true EP1527397A2 (en) 2005-05-04

Family

ID=25348651

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02774117A Withdrawn EP1527397A2 (en) 2001-05-29 2002-05-24 Fast signal convolution using separated-spline kernel

Country Status (5)

Country Link
EP (1) EP1527397A2 (en)
JP (1) JP2005517999A (en)
KR (1) KR100839391B1 (en)
TW (1) TWI220482B (en)
WO (1) WO2002097655A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI645335B (en) * 2016-11-14 2018-12-21 耐能股份有限公司 Convolution operation device and convolution operation method
TWI616813B (en) * 2016-11-14 2018-03-01 耐能股份有限公司 Convolution operation method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPO478897A0 (en) * 1997-01-24 1997-02-20 Canon Information Systems Research Australia Pty Ltd Scan line rendering of convolutions
US6701028B1 (en) * 2000-01-11 2004-03-02 Applied Materials, Inc. Method and apparatus for fast signal convolution using spline kernel

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO02097655A2 *

Also Published As

Publication number Publication date
WO2002097655A2 (en) 2002-12-05
KR100839391B1 (en) 2008-06-20
TWI220482B (en) 2004-08-21
JP2005517999A (en) 2005-06-16
KR20040011514A (en) 2004-02-05
WO2002097655A3 (en) 2005-03-03

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20031217

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH DE GB LI NL

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20081202