NZ532757A

NZ532757A - Method and apparatus for identifying a maximum subarray

Info

Publication number: NZ532757A
Application number: NZ53275704A
Authority: NZ
Inventors: Sung Eun Bae; Tadao Takaoka
Original assignee: Canterprise
Priority date: 2004-05-04
Filing date: 2004-05-04
Publication date: 2005-07-29
Also published as: WO2005106704A1

Abstract

A systolic array of processors programmed to identify a maximum subarray in an array of input data, and output the result.

Description

Patents Form No. 5. Our Ref: JT221293. NEW ZEALAND PATENTS ACT 1953. Complete After Provisional No. 532757. Filed: 11 April 2005.

COMPLETE SPECIFICATION

METHOD AND APPARATUS FOR IDENTIFYING A MAXIMUM SUBARRAY

We, CANTERPRISE LIMITED, a New Zealand company of University of Canterbury, 39 Creyke Road, Christchurch, New Zealand, hereby declare the invention, for which we pray that a patent may be granted to us, and the method by which it is to be performed, to be particularly described in and by the following statement:

1. Introduction

The best known upper bound for the maximum subarray problem is O(n^3 (log log n / log n)^(1/2)) [6,7]. Unfortunately, this divide-and-conquer algorithm is still not fast enough for a real-time graphics application. Suppose we wish to find the brightest rectangular portion in a motion picture. We may want an algorithm at least as fast as raster operations, so that the algorithm can identify the brightest area with a minimal delay before the next frame begins to be drawn on the screen. Considering that rasterisation takes O(mn) time, where m is the height of the screen and n is the width, we cannot expect to totally eliminate the inter-frame delay unless an algorithm with a quadratic upper bound is discovered.

Parallelism is a reasonable approach to overcoming this problem. A parallel algorithm may achieve a speedup at the cost of employing multiple processors, although increasing the number of processors does not always yield better performance.

2. Outline of the Maximum Subarray Problem

The maximum subarray problem is to identify a rectangular portion a[k..i, l..j] that has the largest sum in a two-dimensional array. We are given a two-dimensional array a[1..m, 1..n] as input. A number of different techniques are well known. For example, O(m^2 n) time and sub-cubic time sequential algorithms are disclosed in [2,3] and [6,7] respectively. Parallel algorithms achieving O(log n) time using O(n^3 / log n) processors are also disclosed in [4,5,8].

The current sequential algorithms are not fast enough for a real-time graphics application. Suppose we wish to find the brightest rectangular portion in a motion picture. We may want an algorithm at least as fast as raster operations, so that the algorithm can identify the brightest area with a minimal delay before the next frame begins to be drawn on the screen. Considering that rasterisation takes O(mn) time, where m is the height of the screen and n is the width, we cannot expect to totally eliminate the inter-frame delay unless an algorithm with a quadratic upper bound is discovered.

The current parallel algorithms, on the other hand, are not very practical, as they require an excessive number of processors and the parallel computational model they adopt is very difficult to implement in practice. A good compromise between speed and cost is required.

The K maximum subarrays problem is a generalized problem that seeks the K rectangular portions with the largest sums. No previously known technique for the maximum subarray problem can solve this problem.

3. Summary of the invention

It is an object of the invention to identify the maximum subarray in a fast and economical manner, or at least to provide a useful alternative to conventional methods and apparatus.

Accordingly, the invention provides a systolic array of processors programmed to identify a maximum subarray in an array of input data, and output the result.

The invention also provides a method of identifying a maximum subarray, the method comprising loading input data into a systolic array of processors, and identifying the maximum subarray by performing a systolic algorithm with the systolic array.

Typically the systolic array is implemented in a dedicated VLSI chip. A VLSI algorithm is given which finds the maximum subarray in O(n) time using O(n^2) processors. For the K maximum subarrays problem, another VLSI algorithm is given, which achieves O(K*n) time using O(n^2) processors.

The invention may be implemented in a number of devices. For instance, the invention may be implemented in an image processor which identifies areas in a field of image data that have a certain quantitative property, such as the brightest portion in the image. Alternatively the invention may be implemented in a data mining device, for instance analyzing stock market data. Alternatively the invention may be implemented in a movement detector which seeks to identify the maximum subarray in a field of image difference data, where the image difference data represents the difference between two frames of image data.

4. Brief Description of the drawings

Various embodiments of the invention will now be described with reference to the accompanying drawings, in which:

Figure 1 shows the layout of a VLSI circuit and how the data transmission is made. For an array of size m*n, a circuit of the same size is needed, where each cell represents a processor.

Figure 2 shows the design of each cell for the VLSI maximum subarray algorithm.

Figure 3 shows how the prefix sum for a two dimensional array can be computed.

Figure 4 shows the design of each cell for the VLSI K maximum subarrays algorithm.

Figure 5 shows how the data transmitted horizontally and vertically are synchronized at the right time.

Figures 6a to 6e show the internal process within the VLSI circuit for the maximum subarray problem.

Figures 7a and 7b show the internal process within the VLSI circuit for the K maximum subarrays problem.

5. Detailed description of the invention

VLSI Maximum Subarray Algorithm

A VLSI circuit is roughly defined as a two-dimensional systolic array with an additional control circuit. All cells are identical processor elements with a regular interconnection pattern. In the following section we describe an implementation of the sequential algorithm in [2,3] with a VLSI circuit.

Table 1: Definition of registers in a cell

Register v
    Meaning: value assigned to a cell

Register r
    Meaning: row-wise partial sum from cell(i,l) to cell(i,j)
    Initialization: r(i,j) ← v(i,j)
    Update: r(i,j) ← r_received(i,j-1) + v(i,j)

Register t
    Meaning: vertical summation of r's
    Initialization: t(i,j) ← max{0, r(i,j)}
    Update: t(i,j) ← max{0, t_received(i-1,j) + r(i,j)}

Register s
    Meaning: value of the maximum subarray discovered within (1,1)|(i,j)
    Initialization: s(i,j) ← v(i,j)
    Update: s(i,j) ← max{t(i,j), s_received(i-1,j), s_received(i,j-1), s(i,j)}

Register k
    Meaning: row number of the top boundary of the region evaluated by t(i,j)
    Initialization: k(i,j) ← i, if t(i,j) > 0; k(i,j) ← i+1, otherwise
    Update: k(i,j) ← k_received(i-1,j), if t(i,j) > 0; k(i,j) ← i+1, otherwise

Register l
    Meaning: column number of the left boundary of the region evaluated by r(i,j)
    Initialization: l(i,j) ← j
    Update: l(i,j) ← l(i,j) - 1 on each transmission from left

This algorithm uses a circuit of size m*n for a two-dimensional (m,n) array as shown in Figure 1. The cell at (i,j) is denoted by cell(i,j). Each cell(i,j) has six registers v, r, t, s, k and l, whose meanings are described in Table 1. The cells are programmed as shown in Figure 2. Each cell is also equipped with four extra memory spaces for storing the coordinates of the maximum subarray that has been discovered so far. Solid lines in Figure 2 show how the values of each register are conveyed and dotted lines show how the control signal is delivered. The control unit performs the operation, and controls the flow of data. We also assume that each cell is aware of its coordinates (i,j).

Let the values of the six registers of cell(i,j) be denoted by v(i,j), r(i,j), t(i,j), s(i,j), k(i,j) and l(i,j) respectively, and let the in-flow values of a register of cell(i,j) be explicitly marked with the subscript "received", such as r_received(i,j). Also we denote the coordinates of the maximum subarray by (r1,c1)|(r2,c2)_(i,j), which means the maximum subarray whose sum equals s(i,j) is found in a rectangle defined by (r1,c1) at the upper-left corner and (r2,c2) at the lower-right corner. In our example in Figure 6e, (1,4)|(2,4) of cell(2,4) at time = 6 indicates that the region that has the sum s = 13 is defined by the rectangle whose upper-left corner is (1,4) and bottom-right corner is (2,4).

When initiated, the value of v(i,j) is simply copied to r(i,j) and s(i,j). t(i,j) then copies r(i,j) if r(i,j) > 0, or is set to 0 otherwise. The row number i is copied to k(i,j) if t(i,j) > 0, otherwise k(i,j) ← i+1. The column number j is also copied to l(i,j). The coordinates are simply set to (i,j)|(i,j) (refer to time = 0 in Figure 6a).

The transmission of data takes place in two directions- from up or left. While a cell always receives values from up, transmission from left is triggered by the control signal. The control signal is initiated at the top boundary of the circuit, and propagated downwards at each time unit elapse. Every cell in a row receives the control signal at the same time, and starts horizontal transmission. At the next time unit elapse, the control signal is passed downwards triggering horizontal transmission at the next row.

Now we describe how the registers of each cell are updated on a transmission.

If a transmission comes from left, it causes r(i,j) ← r_received(i,j-1) + v(i,j), i.e., causes the row-wise partial sum of the left cell, r(i,j-1), to be received and added to v(i,j); then l(i,j) is decremented by 1. As l(i,j) = p indicates that the region currently covered by r(i,j) begins at the pth column, its decrement indicates that one more column on the left side is now included in the evaluation of r(i,j). If the updated r(i,j) is greater than s(i,j), it causes s(i,j) ← r(i,j) and the coordinates (r1,c1)|(r2,c2)_(i,j) ← (i, l(i,j-1))|(i,j), i.e., the row-wise sum and its corresponding region update s(i,j) and the coordinates of the subarray. On the other hand, if s_received(i,j-1), which was received from the left cell, is greater than s(i,j), this value and its coordinates are copied, such that s(i,j) ← s(i,j-1) and (r1,c1)|(r2,c2)_(i,j) ← (r1,c1)|(r2,c2)_(i,j-1).

When a transmission comes from up, it causes t(i,j) ← r(i,j) + t_received(i-1,j). As k(i,j) indicates the row number of the top boundary of the region covered by t(i,j), k(i,j) simply inherits the value of k(i-1,j). If t(i,j) is found to be greater than s(i,j), s(i,j) is updated and the coordinates of the corresponding region are set, such that (r1,c1)|(r2,c2)_(i,j) ← (k(i,j), l(i,j))|(i,j). If t(i,j) is found to be negative, we update t(i,j) ← 0 immediately, and set k(i,j) ← i+1. This operation effectively resets the accumulation and is necessary because a region of negative sum only makes the overall sum smaller, thus the subarray corresponding to t(i,j) is better discarded. Since k(i,j) is now i+1, when this cell propagates its register values to its neighbour cell below, cell(i+1,j) knows that the (i+1)th row is the top boundary of the subarray to be discovered. We also update s(i,j) if s_received(i-1,j) is greater. On this update of s(i,j) we can simply copy the coordinates of the corresponding subarray, as they are kept intact and merely relayed.

One may have noticed that transmissions also arrive from the top and left-most boundaries at the cells in the first row and the first column, as shown in Figure 6. We can assume that there are a 0th row and a 0th column that simply pass zero-valued data, such as r(i,0) = 0 and t(0,j) = 0. This design of the transmission keeps every cell busy throughout the whole process of computing r(i,j) and t(i,j), in fact performing the same computation repeatedly. This greatly simplifies the implementation of the circuit and has no harmful effect. During this process, however, l(i,j) and k(i,j) can become less than 1 according to the update scheme stated in Table 1. This is against our intuition, since they are meant to be positive row or column numbers. One may restrict them to be updated only when they are greater than 1, although the correct solution is still guaranteed without this extra care. In the snapshot given in Figure 6, l(i,j) and k(i,j) are kept positive.

Note that the update of r(i,j) always precedes that of t(i,j). The ith row of the matrix lags one time unit behind the (i-1)th row, and the jth column lags one time unit behind the (j-1)th column. This ensures r(i,j) and t(i,j) obtain their possible extension at the right time.

The transmission from cell(i,j) to cell(i+1,j) takes one unit of time, and similarly the transmission from cell(i,j) to cell(i,j+1) takes one unit of time. The solution is found at cell(m,n) after the circuit has examined all candidate subarrays, as the algorithm in [2,3] does sequentially. It takes m units of time to start the horizontal transmission of the mth row, which triggers cell(m,n) to start receiving r(m,n-1). Then an extra n-2 units of time are spent for the mth row to complete its row-wise computation, which makes the total time T = m+n-2.

This observation shows that this VLSI algorithm solves the maximum subarray problem in O(n) time with a VLSI circuit of O(n^2) size. In Figure 6e, at time = 6, s(4,4) has the value 17, and the region it refers to is (3,1)|(4,3), which is the solution of the problem.
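For checking the output of such a circuit, a sequential reference is useful. The following Python sketch is not the circuit itself and is not taken from [2,3] verbatim; it is a common O(m^2 n) formulation (column totals over a band of rows, followed by a one-dimensional scan), and the function name `max_subarray_2d` and the sample data are illustrative assumptions. It reports coordinates in the same 1-based (r1,c1)|(r2,c2) convention used above.

```python
def max_subarray_2d(a):
    """Return (best_sum, (r1, c1), (r2, c2)) with 1-based, inclusive corners."""
    m, n = len(a), len(a[0])
    best = (float("-inf"), None, None)
    for top in range(m):
        col_sum = [0] * n                      # column totals of rows top..bottom
        for bottom in range(top, m):
            for c in range(n):
                col_sum[c] += a[bottom][c]
            run, left = 0, 0                   # best run ending at the current column
            for right in range(n):
                if run <= 0:                   # a non-positive run cannot help: restart
                    run, left = col_sum[right], right
                else:
                    run += col_sum[right]
                if run > best[0]:
                    best = (run, (top + 1, left + 1), (bottom + 1, right + 1))
    return best


# Hypothetical 3x3 input (not the data of Figure 6).
example = [[1, -5, 2],
           [-3, 4, 6],
           [2, 3, -1]]
print(max_subarray_2d(example))  # (12, (2, 2), (3, 3))
```

The restart on a non-positive running sum plays the same role as resetting t(i,j) to 0 in Table 1: a region with a negative sum is discarded rather than extended.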

K maximum subarrays problem

In the following section we consider the K maximum subarrays problem, which finds the general K maximum sums.

If we are only interested in strictly disjoint portions in an array, this problem is trivial and any of the known techniques for the maximum subarray problem is applicable. However, in the general case, where overlapping is allowed, we need a new technique. The present invention, the VLSI algorithm for the K maximum subarrays problem, thus involves the design of a new sequential algorithm that can be implemented on a VLSI circuit.

The sequential algorithm designed for the K maxima problem solves the one-dimensional and two-dimensional problems in O(K*n) time and O(K*n^3) time respectively, and the VLSI algorithm invented solves the two-dimensional problem in O(K*n) time with a circuit of O(n^2) size.

New O(n) Algorithm for the Maximum Subsequence Problem

The new algorithm has its central algorithmic concept in the prefix sum. The prefix sums sum[1..n] of a one-dimensional array a[1..n] are computed by

    sum[0] ← 0;
    for i ← 1 to n do
        sum[i] ← sum[i-1] + a[i];
    end

As sum[x] = $\sum_{i=1}^{x} a[i]$, the sum of a[x..y] is computed by the subtraction of these prefix sums:

    $\sum_{i=x}^{y} a[i] = sum[y] - sum[x-1]$

To yield the maximum subsequence, we have to find indices x, y that maximise $\sum_{i=x}^{y} a[i]$. Then the following lemma is obvious.

Lemma 1. For all x, y ∈ [1..n] with x ≤ y,

    $\max_{1 \le x \le y \le n} \sum_{i=x}^{y} a[i] = \max_{1 \le x \le y \le n} (sum[y] - sum[x-1]) = \max_{1 \le y \le n} (sum[y] - \min_{1 \le x \le y} sum[x-1])$

Based on Lemma 1 we can design the following linear time algorithm that finds the maximum subsequence in a one dimensional array.

While we accumulate the prefix sum, we also maintain the minimum of the preceding prefix sums. By subtracting the minimum prefix sum from the prefix sum, we have a candidate which may become the solution for the maximum subsequence problem.
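The idea of the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `max_subsequence` and the sample input are hypothetical, indices are 0-based, and only the maximum sum is returned, not its position.

```python
def max_subsequence(a):
    """Largest sum of a non-empty contiguous run of a, using prefix sums."""
    prefix = 0             # sum[i]: prefix sum of the elements seen so far
    min_prefix = 0         # minimum of sum[0..i-1]; sum[0] = 0
    best = float("-inf")   # best candidate found so far
    for value in a:
        prefix += value
        best = max(best, prefix - min_prefix)   # candidate = sum[i] - min
        min_prefix = min(min_prefix, prefix)    # update the minimum afterwards
    return best


print(max_subsequence([3, -4, 5, -1, 2, -6]))  # 6, from the run 5, -1, 2
```

Updating the minimum only after the candidate has been formed keeps the candidate non-empty, so an all-negative input yields its largest single element.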

Algorithm 1: Maximum subsequence algorithm

    output: max
    min ← 0;      // minimum prefix sum
    max ← -∞;     // maximum subsequence
    sum[0] ← 0;
    for i ← 1 to n do
        sum[i] ← sum[i-1] + a[i];
        cand ← sum[i] - min;
        if cand > max then
            max ← cand
        end
        if sum[i] < min then
            min ← sum[i]
        end
    end

K Maximum Subsequences Algorithm

Expanding this idea, let us proceed to discuss the K maximum subsequences problem, again for the one dimensional case. Instead of having a single variable that book-keeps the minimum prefix sum, we maintain a list of K minimum prefix sums.

Let min_i be the list of K minimum prefix sums for a[1..i], given by {min_i(1), ..., min_i(K)}, and cand_i be the list {sum[i] - min_i(1), sum[i] - min_i(2), ..., sum[i] - min_i(K)}.

Let max_i be the list of K maximum subsequences for a[1..i] (then max_n is our goal), and merge(x,y) be the merged list of two sorted sequences x and y. According to the principle of optimality, the following recurrence holds.

Lemma 2. max_{i+1} is the K maximum elements of merge(max_i, cand_{i+1}).

We maintain the list of K minimum prefix sums and each time a prefix sum is computed, we subtract these K minima from this prefix sum, and prepare a list of candidate K maximum values. These K values are merged with the current maximum subsequences and we choose the best K values among them. After this, we insert the prefix sum that has been computed earlier into the list of K minimum prefix sums. Of course, if this sum is found to be greater than all current K minima, insertion will not take place.

It should be noted that we need to set one of the initial minimum prefix sums to 0. This ensures a correct candidate for the K maxima is computed even if all K minimum prefix sums are positive. In this case, a prefix sum itself is greater than all candidates sum[i] - min_i(1..K). Initialising min_i(1) = 0 provides a simple solution to overcome this problem.

Algorithm 2: K maximum subsequences algorithm

    output: max[1..K]
    for k ← 1 to K do
        min[k] ← ∞; max[k] ← -∞;
    end
    sum[0] ← 0; min[1] ← 0;
    for i ← 1 to n do
        sum[i] ← sum[i-1] + a[i];
        for k ← 1 to K do
            cand[k] ← sum[i] - min[k];
        end
        // KMax returns K maximum values from the argument array
        max ← KMax(merge(max, cand));    (*)
        insert sum[i] into min;
    end

The line (*) in the algorithm preserves the loop-invariant from step i to step i+1 as stated in Lemma 2.
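A Python sketch of the K maximum subsequences idea follows, under stated assumptions. The function name `k_max_subsequences` is hypothetical; the lists grow up to length K instead of being padded with ∞ and -∞; and `heapq.nlargest` and `bisect.insort` stand in for the KMax/merge step and the insertion step, so the per-iteration cost of this sketch is not the O(K) bound discussed in the text.

```python
import bisect
import heapq


def k_max_subsequences(a, k):
    """Return the k largest sums over all non-empty contiguous runs of a, largest first."""
    mins = [0]    # up to k smallest prefix sums seen so far, ascending; sum[0] = 0
    best = []     # up to k largest sums found so far, descending
    prefix = 0
    for value in a:
        prefix += value
        cand = [prefix - m for m in mins]       # candidates ending at this element
        best = heapq.nlargest(k, best + cand)   # stands in for KMax(merge(max, cand))
        bisect.insort(mins, prefix)             # insert sum[i] into min ...
        del mins[k:]                            # ... keeping only the k smallest
    return best


print(k_max_subsequences([3, -4, 5, -1, 2, -6], 3))  # [6, 5, 5]
```

Overlapping runs are counted separately, which is why the value 5 appears twice in the example (once for the single element 5 and once for the run 3, -4, 5, -1, 2).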

At each iteration, it takes O(K) time to generate the candidate list, and O(K) time to merge this list with the list of current maximum subsequences. Inserting a prefix sum into the list of minimum prefix sums may take at least O(log K) time, so each iteration takes O(K) time overall. As we need to perform n iterations, the final time complexity is O(K*n). When K = 1, this problem is equivalent to the maximum subsequence problem. The algorithm solves the problem in O(n) time, which is comparable to the algorithm in [2,3].

K Maximum Subarrays Algorithm

Now let us consider the two dimensional version of this problem. There are m rows in the array a[1..m][1..n], and we make the row-wise prefix sum r[i] for each row.

For simplicity, let us denote a rectangular area defined by the upper-left corner (i,g) and the lower-right corner (j,h) by P(i,g,j,h) (Figure 3). When we examine a rectangular area P(i,1,j,h), we prepare a list of the prefix sums sum, where

    $sum[h] = \sum_{row=i}^{j} \sum_{col=1}^{h} a[row][col]$

Assuming that the row-wise prefix sums r[1..m] are available, sum is obtained by

    $sum[h] = \sum_{row=i}^{j} r[row][h]$

On this prefix sum sum, we can run the K maximum subsequences algorithm, which costs O(K*n) time.
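The relationship between the row-wise prefix sums r and the prefix sums sum of a band of rows can be illustrated as follows. This is a small sketch with hypothetical helper names (`row_prefix_sums`, `strip_prefix_sums`) and 0-based indices, so rows i..j and column h here correspond to rows i+1..j+1 and column h+1 in the text.

```python
from itertools import accumulate


def row_prefix_sums(a):
    """r[i][h] = a[i][0] + ... + a[i][h] for every row i."""
    return [list(accumulate(row)) for row in a]


def strip_prefix_sums(r, i, j):
    """sum[h] for rows i..j inclusive: total of the rectangle spanning columns 0..h."""
    n = len(r[0])
    return [sum(r[row][h] for row in range(i, j + 1)) for h in range(n)]


a = [[1, 2],
     [3, 4]]
r = row_prefix_sums(a)
print(r)                           # [[1, 3], [3, 7]]
print(strip_prefix_sums(r, 0, 1))  # [4, 10]
```

Recomputing the band from scratch as done here costs O(m*n) per band; keeping a running total for the rows above, as the sumabove array of Algorithm 3 does, reduces this to O(n) per added row.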

Note that there are m+(m-1)+...+2+1 = m(m+1)/2 such rectangles P(i,1,j,n) in array a, namely P(1,1,1,n), P(1,1,2,n), ..., P(1,1,m,n), P(2,1,2,n), ..., P(2,1,m,n), ..., P(m,1,m,n). There are as many prefix sums sum as the number of such rectangles P(i,1,j,n) in this array, and we need to run the K maximum subsequences algorithm on each of these sums. This results in the overall time complexity O(K*n*m^2). The complete algorithm is given in Algorithm 3.

Algorithm 3: K maximum subarrays algorithm

    output: max[1..K]
    // Compute row-wise prefix sums
    for i ← 1 to m do
        for j ← 1 to n do
            r[i][j] ← r[i][j-1] + a[i][j];
    end
    for k ← 1 to K do max[k] ← -∞;
    sum[0] ← 0; min[1] ← 0;
    for l ← 1 to n do sumabove[l] ← 0;
    // Take a prefix sum sum containing rows [i..j] and perform the K maximum subsequences algorithm
    for i ← 1 to m do
        // Reset sumabove. sum starts from row i
        for l ← 1 to n do sumabove[l] ← 0;
        for j ← i to m do
            // Reset min
            for k ← 1 to K do min[k] ← ∞;
            min[1] ← 0;
            // Solve the K maximum subsequences problem
            for l ← 1 to n do
                sum[l] ← sumabove[l] + r[j][l];
                for k ← 1 to K do
                    cand[k] ← sum[l] - min[k];
                end
                max ← KMax(merge(max, cand));
                insert sum[l] into min;
            end
            // Copy sum to sumabove
            for l ← 1 to n do sumabove[l] ← sum[l];
        end
    end

VLSI K Maximum Subarrays Algorithm

Based on the sequential algorithm described above, let us proceed to design a VLSI algorithm for the K maximum subarrays problem.
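For reference while reading the circuit design, the sequential Algorithm 3 can be sketched in Python. This is an illustrative sketch, not the circuit: the function name `k_max_subarrays` is hypothetical, indices are 0-based, and, as an assumption made for brevity, `heapq.nlargest` and `bisect.insort` replace the KMax/merge and insertion steps, so the constant factors and the exact per-step bound differ from the O(K*n*m^2) analysis.

```python
import bisect
import heapq
from itertools import accumulate


def k_max_subarrays(a, k):
    """Return the k largest rectangle sums of the 2-D list a, largest first."""
    m, n = len(a), len(a[0])
    r = [list(accumulate(row)) for row in a]    # row-wise prefix sums
    best = []                                   # up to k largest sums, descending
    for i in range(m):                          # top row of the band
        sum_above = [0] * n                     # band prefix sums for rows i..j-1
        for j in range(i, m):                   # bottom row of the band
            strip = [sum_above[l] + r[j][l] for l in range(n)]
            mins = [0]                          # reset the minimum prefix sums
            for s in strip:                     # K maximum subsequences on the band
                cand = [s - mn for mn in mins]
                best = heapq.nlargest(k, best + cand)
                bisect.insort(mins, s)
                del mins[k:]
            sum_above = strip                   # copy sum to sumabove
    return best


print(k_max_subarrays([[1, -2], [3, 4]], 2))  # [7, 6]
```

In the example, 7 is the bottom row (3 + 4) and 6 is the whole array; the two rectangles overlap, which the general K maximum subarrays problem permits.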

The circuit layout of the VLSI algorithm for this problem remains the same as the previous design shown in Figure 1. The major differences, however, are found in the internal design of each cell and in the data transmission policy. This is because a different sequential algorithm, Algorithm 3, is adopted.

Each cell(i,j) has registers v, r, s and m_1...m_K, M_1...M_K, whose meanings are described in Table 2. Let the register of cell(i,j) have its coordinates suffixed, such as s(i,j), etc., and let m_1...m_K(i,j) be an abbreviation of m_1(i,j)...m_K(i,j) in this text.

Table 2: Definition of registers in a cell

Register v
    Meaning: value assigned to a cell

Register r
    Meaning: row-wise partial sum from cell(i,1) to cell(i,j)
    Initialisation: r(i,j) ← v(i,j)
    Update: r(i,j) ← r_received(i,j-1) + v(i,j)

Register s
    Meaning: prefix sum
    Initialisation: s(i,j) ← 0
    Update: s(i,j) ← s_received(i-1,j) + r_received(i,j-1) + v(i,j)

Registers m_1...m_K
    Meaning: K minimum prefix sums
    Initialisation: m_1 ← 0; m_2...m_K ← ∞
    Update: m_1...m_K ← min_{1..K}{ s_received(i,j-1), m_1...m_K(i,j-1) }

Registers M_1...M_K
    Meaning: K maximum subarrays
    Initialisation: M_1...M_K ← -∞
    Update: M_1...M_K ← max_{1..K}{ M_1...M_K received from (i,j-1), M_1...M_K received from (i-1,j), s(i,j) - m_1...m_K(i,j) }

The solid lines in Figure 4 deliver the value of a register, and the dotted lines show how these registers are updated. The control unit performs the operation and controls the flow of data.

When initiated, the value of v(i,j) is simply copied to r(i,j), and s(i,j) is set to 0 as there is no prefix sum computed. M_1...M_K and m_1...m_K are initialised to all -∞ and ∞ respectively. Note that m_1 is set to 0.

The transmission of data takes place in two directions, from up or left. The transmission from up is triggered by a control signal and the transmission from left comes with the control signal. The control signal is initiated at the left boundary of the circuit, and propagated towards the right at each time unit elapse. Every cell in a column receives the control signal at the same time, and starts vertical transmission at the next time unit elapse.

Now we describe how the registers of each cell are updated on a transmission.

If a transmission comes from left, it causes r(i,j) ← r_received(i,j-1) + v(i,j), i.e., causes the row-wise partial sum of the left cell, r(i,j-1), to be delivered and to update r(i,j) after being added to the value of this cell, v(i,j). cell(i,j) also gets s_received(i,j-1) and inserts this value into m_1...m_K(i,j), if it fits in.

The K minimum prefix sums m_1...m_K(i,j-1) and the K maximum subarrays M_1...M_K(i,j-1) also flow into this cell. Each of them is inserted into m_1...m_K(i,j) and M_1...M_K(i,j) respectively, if necessary.

During the updates, it also checks if there is a transmission from up. If so, it initialises its m_1...m_K because this cell now works for a new rectangle consisting of more rows above, as illustrated in Figure 4. No transmission from up suggests that this cell is evaluating a portion consisting of only one row.

When a transmission comes from up, it causes s(i,j) ← s_received(i-1,j) + r_received(i,j-1) + v(i,j), and each of the received M_1...M_K(i-1,j) is inserted into M_1...M_K(i,j).

Every cell where a transmission takes place, regardless of the type of transmission, prepares a list of candidate K maxima by subtracting the minimum prefix sums from the prefix sum s. These candidates are inserted into the maximum subarray list if they fit, so that the best K maxima can be taken. The prefix sum s(i,j) is passed towards the right so that its right neighbour cell(i,j+1) may produce m_1...m_K(i,j+1).

The solution is found at cell(m,n) after the circuit has examined all candidate portions P(i,1,j,n) for i ≤ j ∈ [1..m], as Algorithm 3 does sequentially.

The K maximum subarrays problem for a rectangle P(i,1,j,h) is solved when cell(j,h) gets the data originated from cell(i,1), because that is when the sum s of all cells in P(i,1,j,h) is computed. Let τ(i,j,h) be the number of data transmissions needed for computing the K maxima in a portion P(i,1,j,h), as shown in Figure 5. Then we show by induction that

    τ(i,j,h) = j + h - i

According to the algorithm design, the solution for this portion is found at cell(j,h), and s of cell(j,h) is computed by s ← s_received(j-1,h) + r_received(j,h-1) + v(j,h), meaning that it needs to receive relevant data from its upper and left neighbours. As P(i,1,j,h) has a data dependency on P(i,1,j-1,h) and P(i,1,j,h-1), we have to first consider the number of time units taken for computing P(i,1,j-1,h) and P(i,1,j,h-1), i.e., τ(i,j-1,h) and τ(i,j,h-1) respectively. Since it takes 1 time unit for the transmission from a cell to its adjacent neighbour, and those values from P(i,1,j-1,h) and P(i,1,j,h-1) arrive at cell(j,h) at the same time, we have the following recurrence relation:

    τ(i,j,h) = τ(i,j-1,h) + 1 = τ(i,j,h-1) + 1

The following relations are trivial:

    τ(i,i,h) = h
    τ(i,j,1) = j - i + 1

Inductively,

    τ(i,j,h) = τ(i,j-1,h) + 1 = (j - 1 + h - i) + 1 = j + h - i

and

    τ(i,j,h) = τ(i,j,h-1) + 1 = (j + h - 1 - i) + 1 = j + h - i

This result shows that this VLSI algorithm requires τ(1,m,n) = m + n - 1 data transmissions to complete. The actual time spent for each transmission, however, is dependent on the size K, as the register values are transmitted through a single line connecting two cells (see Footnote 1). In addition, each cell spends O(K) time for merging K candidates and K current maxima, thus the total time is T = O(K*τ(1,m,n)) = O(K*n) with a VLSI circuit of P = O(n^2) size.

Footnote 1. Certainly, one may consider creating K connection lines so that all O(K) data can be transmitted simultaneously. This increases the complexity of the VLSI circuit to O(K*n^2), while it does not improve the total time T = O(K*n). This is because each cell needs to spend O(K) time anyway for merging the list of candidates and the current solution set. However, the O(K) time spent in a cell can be reduced to O(log K) when every cell has an embedded merging network due to Batcher [1], giving T = O(n log K) total time. This still fails to achieve an optimal parallelism in terms of the total cost.

While the present invention has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departure from the spirit or scope of applicant's general inventive concept.

6. Bibliography

1 BATCHER, K. E. Sorting networks and their applications. In Proc. AFIPS FJCC (1968), vol. 32, pp. 307-314.

2 BENTLEY, J. Programming pearls: algorithm design techniques. Commun. ACM 27, 9 (1984), 865-873.

3 BENTLEY, J. Programming pearls: perspective on performance. Commun. ACM 27, 11 (1984), 1087-1092.

4 JODREY, K. Q. Parallel maximum sum algorithms on interconnection networks.

5 PERUMALLA, K., AND DEO, N. Parallel algorithms for maximum subsequence and maximum subarray. Parallel Processing Letters 5, 3 (1995), pp. 367-373.

6 TAKAOKA, T. Efficient algorithms for the maximum subarray problem by distance matrix multiplication. In Electronic Notes in Theoretical Computer Science (2002), J. Harland, Ed., vol. 61, Elsevier.

7 TAMAKI, H., AND TOKUYAMA, T. Algorithms for the maximum subarray problem based on matrix multiplications. In Proc. of the ninth annual ACM-SIAM symposium on Discrete algorithms (1998), Society for Industrial and Applied Mathematics, pp. 446-452.

8 WEN, Z. Fast parallel algorithms for the maximum sum problem. Parallel Computing 21, 3 (1995), pp. 461-466.

Claims

1. A systolic array of processors programmed to identify a maximum subarray in an array of input data, and output the result.

2. A systolic array according to claim 1 programmed to identify K maximum subarrays, where K is an integer greater than or equal to two.

3. A systolic array according to claim 2 programmed to calculate one or more prefix sums.

4. A systolic array according to claim 2 or 3 wherein the K maximum subarrays overlap.

5. An image processor including a systolic array programmed to identify a maximum subarray in an array of image data.

6. An image processor according to claim 5 programmed to identify K maximum subarrays, where K is an integer greater than or equal to two.

7. An image processor according to claim 6 programmed to calculate one or more prefix sums.

8. An image processor according to claim 6 or 7 wherein the K maximum subarrays overlap.

9. A movement detector including a systolic array programmed to identify a maximum subarray in an array of image difference data, the image difference data representing the difference between two frames of image data.

10. A movement detector according to claim 9 programmed to identify K maximum subarrays, where K is an integer greater than or equal to two.

11. A movement detector according to claim 10 programmed to calculate one or more prefix sums.

12. A movement detector according to claim 10 or 11 wherein the K maximum subarrays overlap.

13. Software configured to program a systolic array of processors to identify a maximum subarray in an array of input data, and output the result.

14. Software according to claim 13 configured to program the array to identify K maximum subarrays, where K is an integer greater than or equal to two.

15. Software according to claim 14 configured to program the array to calculate one or more prefix sums.

16. Software according to claim 14 or 15 wherein the K maximum subarrays overlap.

17. A method of identifying a maximum subarray, the method comprising loading input data into a systolic array of processors, and identifying the maximum subarray by performing a systolic algorithm with the systolic array.

18. A method according to claim 17 wherein the systolic algorithm identifies K maximum subarrays, where K is an integer greater than or equal to two.

19. A method according to claim 18, wherein the systolic algorithm includes the calculation of one or more prefix sums.

20. A method according to claim 18 or 19 wherein the K maximum subarrays overlap.

21. A systolic array of processors substantially as described herein with reference to Figures 2 to 7b.

22. An image processor substantially as described herein with reference to Figures 2 to 7b.

23. A movement detector substantially as described herein with reference to Figures 2 to 7b.

24. Software configured to program a systolic array of processors substantially as described herein with reference to Figures 2 to 7b.

25. A method of identifying a maximum subarray substantially as described herein with reference to Figures 2 to 7b.

CANTERPRISE LIMITED
By Their Attorneys