CN102411773B

CN102411773B - Vector-processor-oriented mean-residual normalized product correlation vectoring method

Info

Publication number: CN102411773B
Application number: CN2011102133381A
Authority: CN
Inventors: 刘仲; 陈书明; 陈跃跃; 刘衡竹; 陈海燕; 龚国辉; 万江华; 彭元喜; 扈啸; 孙书为
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2011-07-28
Filing date: 2011-07-28
Publication date: 2013-03-27
Anticipated expiration: 2031-07-28
Also published as: CN102411773A

Abstract

The invention discloses a vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient. The method comprises the following steps: setting a reference image A and a real-time image B; traversing the real-time image B and calculating the mean of its pixel values and the accumulated sum of the squared pixel values B_{ij}^{2}; traversing the reference image A, taking two sub-images A_{uv} and A_{(u+4)v} from it each time, and shuffling them to obtain four sub-images A_{(u+k)v} (k = 0, 1, 2, 3); calculating in turn the accumulated sum of the pixel values, the accumulated sum of (A_{(u+k)v})_{ij}^{2} and the accumulated sum of (A_{(u+k)v})_{ij} * B_{ij}; calculating in turn the mean-residual normalized product correlation coefficients of the sub-images A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B; and setting u to u+4 and repeating these steps until the reference image A has been traversed completely, so as to obtain all the mean-residual normalized product correlation coefficient values.

Description

Vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient

Technical field

The present invention relates to the field of image matching and its vectorized implementation, and in particular to a vectorized implementation method for the mean-residual normalized product correlation coefficient.

Background technology

As compute-intensive applications such as 4G wireless communication, radar signal processing, high-definition video and digital image processing demand ever more computation, a single-chip processor can hardly meet application requirements, and multi-core processors, vector processors in particular, have come into wide use. A vector processor generally consists of multiple processing elements (PEs) and usually supports vector-based data loads and stores. Each PE contains several independent functional units, typically including a shift unit, an ALU, a multiplier and so on. Vector processors usually support SIMD (single instruction, multiple data) operation: under the control of one vector instruction, all PEs perform the same operation simultaneously on their own local registers, thereby exploiting the data-level parallelism of the application.

Image matching is one of the many compute-dense tasks in image processing applications. Template-based image matching, for example, often needs to compute the similarity between a reference image and a real-time image using criteria such as the sum of absolute differences or the normalized product correlation coefficient (Normalized Product correlation, Nprod). Among these, the mean-residual normalized product correlation coefficient has strong noise immunity and is one of the most widely used similarity criteria in image matching. However, such a task must match the real-time image against every sub-image of the reference image one by one, so the amount of computation is very large. On a single-chip processor, a fast sliding algorithm that proceeds by rows and columns is usually adopted to reduce the amount of computation, but this fast algorithm cannot be implemented effectively on a vector processor. An efficient vectorization method is therefore the key to making full use of the abundant computing resources of a vector processor, exploiting its multi-level parallelism and improving its utilization.

The mean-residual normalized product correlation coefficient is calculated as follows. Let the reference image be A, of size M×N, and the real-time image be B, of size m×n, with M > m and N > n. The sub-image of the reference image whose upper-left corner is at (u, v) is denoted A_{uv}; its mean-residual normalized product correlation coefficient with the real-time image B is given by:

\rho(u, v) = \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{uv})_{ij} - \overline{A_{uv}} \right) \left( B_{ij} - \overline{B} \right)}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{uv})_{ij} - \overline{A_{uv}} \right)^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( B_{ij} - \overline{B} \right)^{2}}}

= \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{uv})_{ij} B_{ij} - mn \, \overline{A_{uv}} \, \overline{B}}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{uv})_{ij}^{2} - mn \, \overline{A_{uv}}^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} - mn \, \overline{B}^{2}}}

where (A_{uv})_{ij} denotes the pixel value at coordinate (i, j) of the sub-image A_{uv}, and B_{ij} denotes the pixel value at coordinate (i, j) of the real-time image B. The value ρ(u, v) computed above is the mean-residual normalized product correlation coefficient expressing the degree of match between the sub-image A_{uv} and the real-time image B. To find the best match position, all sub-images of the reference image must be traversed, the coefficient of each sub-image with the real-time image must be calculated one by one, and the minimum value among them must be found. In total (M-m)*(N-n) evaluations are needed, and each evaluation involves summing a large number of elements and summing and accumulating element products, so the amount of computation is very large. On a single-chip processor, a fast sliding algorithm that proceeds by rows and columns is usually used. Its advantage is that it reuses earlier results and avoids a great deal of repeated computation. For a vector processor, however, there are two obstacles. First, a vector processor contains multiple processing elements, which makes it difficult to reuse earlier results. Second, sub-image pixel data usually consists of 8-bit pixel values, so traversing the reference image requires reading image data at byte offsets, whereas vector processors generally do not support data reads that cross word boundaries. An effective vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient is currently lacking.
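As an illustration only, the two forms of ρ(u, v) given above can be checked against each other with a short plain-Python sketch. The function names `ncc_definition` and `ncc_expanded` are hypothetical, `sub` stands for an already extracted sub-image A_{uv}, and both images are assumed to be equally sized lists of rows with non-constant pixel values (a constant image would make a denominator zero).

```python
import math

def _flatten(image):
    """Return the pixels of a list-of-rows image in row-major order."""
    return [x for row in image for x in row]

def ncc_definition(sub, B):
    """First form: subtract the means, then correlate."""
    a, b = _flatten(sub), _flatten(B)
    mn = len(b)
    a_mean = sum(a) / mn
    b_mean = sum(b) / mn
    num = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    den_a = math.sqrt(sum((x - a_mean) ** 2 for x in a))
    den_b = math.sqrt(sum((y - b_mean) ** 2 for y in b))
    return num / (den_a * den_b)

def ncc_expanded(sub, B):
    """Second form: only raw sums, sums of squares and sums of products."""
    a, b = _flatten(sub), _flatten(B)
    mn = len(b)
    a_mean = sum(a) / mn
    b_mean = sum(b) / mn
    num = sum(x * y for x, y in zip(a, b)) - mn * a_mean * b_mean
    den_a = math.sqrt(sum(x * x for x in a) - mn * a_mean ** 2)
    den_b = math.sqrt(sum(y * y for y in b) - mn * b_mean ** 2)
    return num / (den_a * den_b)
```

For any pair of equally sized, non-constant images the two functions should agree up to floating-point rounding, which is why the coefficient can be obtained from accumulated sums alone.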

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the problems of the prior art, the invention provides a vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient that is simple in principle, easy to operate, computationally efficient, and able to improve the utilization of the processor's computing resources.

To solve the above technical problem, the present invention adopts the following technical solution:

A vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient, comprising the following steps:

(1) Let the reference image be A, of size M×N, and the real-time image be B, of size m×n, with M > m and N > n; the vector processor comprises p processing elements;

(2) The vector processor first traverses the real-time image B and reads its data into vector registers; using an SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the mean pixel value \overline{B} of the real-time image B and the accumulated sum of the squared pixel values B_{ij}^{2};

(3) The vector processor traverses the reference image A, each time fetching from it two sub-images A_{uv} and A_{(u+4)v} whose heads are 4 elements apart and whose length is 4*p elements, and obtains by a shuffle operation four sub-images A_{(u+k)v} (k = 0, 1, 2, 3) whose heads are successively 1 element apart and whose length is 4*p elements;

(4) Using the SIMD-based dot-product operation to sum the values within each processing element and the reduction operation to sum the values across processing elements, the following are calculated in turn for each sub-image A_{(u+k)v} (k = 0, 1, 2, 3): the mean pixel value \overline{A_{(u+k)v}} of all its elements, the accumulated sum of its pixel values, the accumulated sum of its squared pixel values (A_{(u+k)v})_{ij}^{2}, and the accumulated sum of the pixel products (A_{(u+k)v})_{ij} * B_{ij} of the reference image A and the real-time image B;

(5) The mean-residual normalized product correlation coefficients ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v) of the sub-images A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B are calculated in turn;

(6) Set u = u + 4 and repeat steps (3) to (6) until the reference image A has been traversed, which yields all the mean-residual normalized product correlation coefficient values of the reference image A with the real-time image B.

As a further improvement of the present invention:

In step (2), the mean pixel value \overline{B} is computed as:

\overline{B} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes e_{w};

The accumulated sum of the squared pixel values B_{ij}^{2} is computed as:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes b_{w};

where b_{w} = (B_{iw}, B_{i(w+1)}, B_{i(w+2)}, B_{i(w+3)}) is a 32-bit fixed-point vector formed from four 8-bit pixel values, and e_{w} = (1, 1, 1, 1) is a 32-bit fixed-point vector formed from four unit pixel values;

In each case the p processing elements of the vector processor compute the per-word dot products simultaneously under SIMD control, and the results of the p processing elements are then reduced by fixed-point summation; L is the loop count, with L = mn/(4p).

In step (4), the mean pixel value is:

\overline{A_{(u+k)v}} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the pixel values is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the squared pixel values (A_{(u+k)v})_{ij}^{2} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes a_{w};

The accumulated sum of the pixel products (A_{(u+k)v})_{ij} * B_{ij} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes b_{w};

where a_{w} = ((A_{uv})_{iw}, (A_{uv})_{i(w+1)}, (A_{uv})_{i(w+2)}, (A_{uv})_{i(w+3)}) is a 32-bit fixed-point vector formed from four 8-bit pixel values, and e_{w} = (1, 1, 1, 1) is a 32-bit fixed-point vector formed from four unit pixel values;

In each case the p processing elements of the vector processor compute the per-word dot products simultaneously under SIMD control, and the results of the p processing elements are then reduced by fixed-point summation.

In step (5), the mean-residual normalized product correlation coefficient of A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B is computed as:

\rho(u+k, v) = \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right) \left( B_{ij} - \overline{B} \right)}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right)^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( B_{ij} - \overline{B} \right)^{2}}}

= \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} - mn \, \overline{A_{(u+k)v}} \, \overline{B}}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} - mn \, \overline{A_{(u+k)v}}^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} - mn \, \overline{B}^{2}}}

Compared with the prior art, the advantages of the invention are:

1. The vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient is simple to implement, low in cost, easy to operate and reliable. It brings the parallel computing capability of all PEs of the vector processor into full play and fully exploits the SIMD-based data parallelism of the vector processor. Each traversal of the real-time image yields four mean-residual normalized product correlation coefficient values, which effectively improves the execution efficiency on a vector processor of image matching algorithms based on this coefficient.

2. The method of the invention is simpler and more efficient than conventional vectorization methods, its hardware cost on the target vector processor is low, and it reduces power consumption for the same function. In addition, the method is simple to implement, low in cost, easy to operate and reliable.

Description of drawings

Fig. 1 is an overall flow diagram of the present invention;

Fig. 2 is a schematic diagram of obtaining four adjacent sub-images from the sub-images A_{uv} and A_{(u+4)v} by a shuffle operation in a specific embodiment of the invention;

Fig. 3 is a schematic diagram, in a specific embodiment of the invention, of the vector processor summing the values within a PE by an SIMD-based dot-product operation and summing the values across PEs by a reduction operation.

Embodiment

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

As shown in Fig. 1, the vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient of the present invention comprises the following steps:

1. Let the reference image be A, of size M×N, and the real-time image be B, of size m×n, with M > m and N > n; the vector processor comprises p processing elements;

2. The vector processor first traverses the real-time image B and reads its data into vector registers; using an SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the mean pixel value \overline{B} of the real-time image B and the accumulated sum of the squared pixel values B_{ij}^{2};

The mean pixel value \overline{B} is computed as:

\overline{B} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes e_{w};

The accumulated sum of the squared pixel values B_{ij}^{2} is computed as:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes b_{w};

As shown in Fig. 3, (a_0, a_1, a_2, a_3) is the vector formed by the four elements a_0, a_1, a_2, a_3, and (a_i, a_{i+1}, a_{i+2}, a_{i+3}) is the vector formed by the four elements a_i, a_{i+1}, a_{i+2}, a_{i+3}; likewise, (b_0, b_1, b_2, b_3) is the vector formed by b_0, b_1, b_2, b_3, and (b_i, b_{i+1}, b_{i+2}, b_{i+3}) is the vector formed by b_i, b_{i+1}, b_{i+2}, b_{i+3}. The two groups of vectors are stored in different vector registers. The SIMD-based dot-product operation within a PE works as follows: the dot-product result of (a_0, a_1, a_2, a_3) and (b_0, b_1, b_2, b_3) is (a_0*b_0, a_1*b_1, a_2*b_2, a_3*b_3), and the dot-product result of (a_i, a_{i+1}, a_{i+2}, a_{i+3}) and (b_i, b_{i+1}, b_{i+2}, b_{i+3}) is (a_i*b_i, a_{i+1}*b_{i+1}, a_{i+2}*b_{i+2}, a_{i+3}*b_{i+3}). The reduction sum across PEs then gives the sum over the two groups of vectors: a_0*b_0 + a_1*b_1 + a_2*b_2 + a_3*b_3 + ... + a_i*b_i + a_{i+1}*b_{i+1} + a_{i+2}*b_{i+2} + a_{i+3}*b_{i+3}.
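The two-level summation of Fig. 3 can be modelled in a few lines of plain Python. This is only a sketch of the arithmetic, not of any real vector instruction set: the names `pe_dot` and `dot_then_reduce` are hypothetical, each "register" is assumed to be a list holding one 4-element word per PE, and the PEs are simulated one after another rather than in lockstep.

```python
def pe_dot(a_word, b_word):
    """Work of one PE: multiply element-wise, then sum the four products."""
    products = [x * y for x, y in zip(a_word, b_word)]
    return sum(products)

def dot_then_reduce(a_reg, b_reg):
    """a_reg and b_reg each hold one 4-element word per PE."""
    per_pe = [pe_dot(a, b) for a, b in zip(a_reg, b_reg)]  # same operation in every PE
    return sum(per_pe)                                     # reduction across PEs

ones = (1, 1, 1, 1)                       # the unit vector e_w of the text
a_reg = [(1, 2, 3, 4), (5, 6, 7, 8)]      # two PEs, four pixels each
b_reg = [(1, 1, 1, 1), (2, 0, 1, 0)]

pixel_sum = dot_then_reduce(a_reg, [ones, ones])   # sum of pixels via a (x) e
square_sum = dot_then_reduce(a_reg, a_reg)         # sum of squares via a (x) a
product_sum = dot_then_reduce(a_reg, b_reg)        # sum of products via a (x) b
```

The same routine serves all three accumulations needed by the method; only the second operand changes.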

3. The vector processor traverses the reference image A, each time fetching from it two sub-images A_{uv} and A_{(u+4)v} whose heads are 4 elements apart and whose length is 4*p elements, and obtains by a shuffle operation four adjacent sub-images A_{(u+k)v} (k = 0, 1, 2, 3) whose heads are successively 1 element apart and whose length is 4*p elements;

As shown in Fig. 2, take the case of two processing elements as an example: the processor fetches from the reference image two adjacent vectors p1 and p2 whose heads are 4 elements apart. The vector length is 4 times the number of PEs of the vector processor, so p1 and p2 each contain 8 elements. After the shuffle operation, four adjacent vectors v0, v1, v2 and v3 whose heads are 1 element apart are obtained.
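The effect of the shuffle in Fig. 2 can be sketched in plain Python as follows. The function name `shuffle_four` is hypothetical, and list slicing stands in for whatever shuffle instruction the target processor provides; the only assumption taken from the text is that p1 and p2 have equal length and that p2 starts 4 elements after p1.

```python
def shuffle_four(p1, p2):
    """Build v0..v3, whose heads are 1 element apart, from two loads 4 elements apart.

    v_k keeps the tail of p1 starting at element k and is topped up with the
    first k elements of p2 that lie beyond the end of p1.
    """
    length = len(p1)
    if len(p2) != length:
        raise ValueError("p1 and p2 must have the same length")
    beyond = length - 4          # index in p2 of the first element past the end of p1
    return [p1[k:] + p2[beyond:beyond + k] for k in range(4)]

# Two PEs, so each vector holds 4 * 2 = 8 elements.
row = list(range(100, 112))      # 12 consecutive pixels of the reference image
p1 = row[0:8]                    # word-aligned load
p2 = row[4:12]                   # next word-aligned load, 4 elements later
v0, v1, v2, v3 = shuffle_four(p1, p2)
```

Both loads start on a word boundary, yet v1, v2 and v3 begin at byte offsets 1, 2 and 3, which is how the method avoids reads that cross word boundaries.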

4. Using the SIMD-based dot-product operation to sum the values within each processing element and the reduction operation to sum the values across processing elements, the following are calculated in turn for each sub-image A_{(u+k)v} (k = 0, 1, 2, 3): the mean pixel value \overline{A_{(u+k)v}} of all its elements, the accumulated sum of its pixel values, the accumulated sum of its squared pixel values, and the accumulated sum of its pixel products with the real-time image B.

The mean pixel value is:

\overline{A_{(u+k)v}} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the pixel values is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the squared pixel values (A_{(u+k)v})_{ij}^{2} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes a_{w};

The accumulated sum of the pixel products (A_{(u+k)v})_{ij} * B_{ij} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes b_{w};

In each case the results of the p processing elements are then reduced by fixed-point summation.

5. The mean-residual normalized product correlation coefficients ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v) of the sub-images A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B are calculated in turn;

The mean-residual normalized product correlation coefficient of A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B is computed as:

\rho(u+k, v) = \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right) \left( B_{ij} - \overline{B} \right)}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right)^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( B_{ij} - \overline{B} \right)^{2}}}

= \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} - mn \, \overline{A_{(u+k)v}} \, \overline{B}}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} - mn \, \overline{A_{(u+k)v}}^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} - mn \, \overline{B}^{2}}}

Substituting k = 0, 1, 2, 3 in turn gives ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v).

Suppose that, from the preceding steps, the mean pixel value of the real-time image B is b0 and the accumulated sum of its squared pixel values is b1; the four accumulated pixel-value sums of the reference image A are s0, s1, s2, s3; the four accumulated sums of squared pixel values are q0, q1, q2, q3; and the four accumulated sums of element products of the reference image A with the real-time image B are r0, r1, r2, r3. Since the mean of the k-th sub-image is sk/mn and the mean of B is b0, the general formula gives mn \overline{A_{(u+k)v}} \overline{B} = sk * b0, mn \overline{A_{(u+k)v}}^{2} = {sk}^{2}/mn and mn \overline{B}^{2} = mn * {b0}^{2}. The four mean-residual normalized product correlation coefficient values ρ0, ρ1, ρ2, ρ3 are therefore calculated as follows:

\rho 0 = \frac{r0 - s0 \cdot b0}{\sqrt{q0 - \frac{{s0}^{2}}{mn}} \cdot \sqrt{b1 - mn \cdot {b0}^{2}}}

\rho 1 = \frac{r1 - s1 \cdot b0}{\sqrt{q1 - \frac{{s1}^{2}}{mn}} \cdot \sqrt{b1 - mn \cdot {b0}^{2}}}

\rho 2 = \frac{r2 - s2 \cdot b0}{\sqrt{q2 - \frac{{s2}^{2}}{mn}} \cdot \sqrt{b1 - mn \cdot {b0}^{2}}}

\rho 3 = \frac{r3 - s3 \cdot b0}{\sqrt{q3 - \frac{{s3}^{2}}{mn}} \cdot \sqrt{b1 - mn \cdot {b0}^{2}}}

Four mean-residual normalized product correlation coefficient values ρ0, ρ1, ρ2, ρ3 are thus obtained in each pass.
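A minimal sketch of this last arithmetic step is given below. The function name `coefficients_from_sums` is hypothetical. The denominators are written as they follow from the general formula for ρ(u+k, v), namely q_k - s_k^2/mn for the sub-image and b1 - mn*b0^2 for the real-time image, where b0 is the mean of B and b1 its sum of squares; non-constant images are assumed so that neither square root is zero.

```python
import math

def coefficients_from_sums(b0, b1, s, q, r, mn):
    """Return the four coefficients from accumulated sums.

    b0: mean pixel value of B          b1: sum of squared pixels of B
    s:  four pixel sums of A windows   q:  four sums of squared pixels
    r:  four sums of A*B products      mn: number of pixels per window
    """
    den_b = math.sqrt(b1 - mn * b0 * b0)          # shared by all four coefficients
    rho = []
    for k in range(4):
        num = r[k] - s[k] * b0                    # sum(A*B) - mn * mean(A) * mean(B)
        den_a = math.sqrt(q[k] - s[k] * s[k] / mn)
        rho.append(num / (den_a * den_b))
    return rho

# B = (2, 1, 4, 9): mean 4.0, sum of squares 102.
# Windows 0 and 2 equal B; windows 1 and 3 equal 10 - B = (8, 9, 6, 1).
rho = coefficients_from_sums(
    b0=4.0, b1=102,
    s=[16, 24, 16, 24],
    q=[102, 182, 102, 182],
    r=[102, 58, 102, 58],
    mn=4,
)
```

Because the term for B is the same for all four windows, it is computed once per group rather than once per coefficient.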

6. Set u = u + 4 and repeat steps 3 to 6 until the reference image A has been traversed, which yields all the mean-residual normalized product correlation coefficient values of the reference image A with the real-time image B. The coordinates of the best-matching sub-image can then be determined by finding the minimum of all the mean-residual normalized product correlation coefficient values.
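The overall traversal can be sketched in plain Python as below. This models only the control flow (four offsets per pass, then an advance of 4) and the arithmetic from accumulated sums; it does not model SIMD execution. The names `flat_window`, `window_sums` and `match_row` are hypothetical, images are assumed to be lists of rows, the offset that advances by 4 is assumed to run along the contiguous row direction and is called `left` here, and every window is assumed to be non-constant.

```python
import math

def flat_window(A, top, left, m, n):
    """Row-major pixels of the m x n window of A whose corner is (top, left)."""
    return [A[top + i][left + j] for i in range(m) for j in range(n)]

def window_sums(a, b):
    """Pixel sum, sum of squares and sum of products for one window."""
    return sum(a), sum(x * x for x in a), sum(x * y for x, y in zip(a, b))

def match_row(A, B, top):
    """Coefficients for every horizontal offset at one vertical offset."""
    m, n = len(B), len(B[0])
    b = [x for row in B for x in row]
    mn = m * n
    b0 = sum(b) / mn                         # step 2: mean of B
    b1 = sum(x * x for x in b)               # step 2: sum of squares of B
    den_b = math.sqrt(b1 - mn * b0 * b0)
    last = len(A[0]) - n                     # largest valid offset
    rho = {}
    left = 0
    while left <= last:
        for k in range(4):                   # steps 3 to 5: four windows per pass
            if left + k > last:
                break
            a = flat_window(A, top, left + k, m, n)
            s, q, r = window_sums(a, b)
            rho[left + k] = (r - s * b0) / (math.sqrt(q - s * s / mn) * den_b)
        left += 4                            # step 6
    return rho

B = [[1, 5, 2],
     [7, 3, 9]]
A = [[4, 1, 5, 2, 8, 6, 3, 7, 2],
     [2, 7, 3, 9, 1, 4, 8, 5, 6]]            # B is embedded at offset 1
result = match_row(A, B, top=0)
```

Here `result` maps each horizontal offset to its coefficient; repeating the call for every valid `top` would cover the whole reference image.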

In summary, the present invention efficiently supports vectorized computation of the mean-residual normalized product correlation coefficient, brings the parallel computing capability of all PEs of the vector processor into full play, and fully exploits the SIMD-based data parallelism of the vector processor, thereby effectively improving the execution efficiency of the mean-residual normalized product correlation coefficient on a vector processor.

The above is only a preferred embodiment of the present invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions within the concept of the invention fall within its scope of protection. It should be pointed out that, for those skilled in the art, improvements and modifications that do not depart from the principle of the invention shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient, characterized by comprising the following steps:

(2) The vector processor first traverses the real-time image B and reads its data into vector registers; using an SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the mean pixel value \overline{B} of the real-time image B and the accumulated sum of the squared pixel values B_{ij}^{2};

(3) The vector processor traverses the reference image A, each time fetching from it two sub-images A_{uv} and A_{(u+4)v} whose heads are 4 elements apart and whose length is 4*p elements, and obtains by a shuffle operation four adjacent sub-images A_{(u+k)v} (k = 0, 1, 2, 3) whose heads are successively 1 element apart and whose length is 4*p elements;

2. The vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient according to claim 1, characterized in that, in step (2), the mean pixel value \overline{B} is computed as:

\overline{B} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} B_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes e_{w};

The accumulated sum of the squared pixel values B_{ij}^{2} is computed as:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} b_{w} \otimes b_{w};

3. The vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient according to claim 2, characterized in that, in step (4), the mean pixel value is:

\overline{A_{(u+k)v}} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \frac{1}{mn} \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the pixel values is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{j=0}^{4(p-1)} (A_{(u+k)v})_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes e_{w};

The accumulated sum of the squared pixel values (A_{(u+k)v})_{ij}^{2} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes a_{w};

The accumulated sum of the pixel products (A_{(u+k)v})_{ij} * B_{ij} is:

\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} = \sum_{l=0}^{L-1} \sum_{w=0}^{p-1} a_{w} \otimes b_{w};

where a_{w} = ((A_{uv})_{iw}, (A_{uv})_{i(w+1)}, (A_{uv})_{i(w+2)}, (A_{uv})_{i(w+3)}) is a 32-bit fixed-point vector formed from four 8-bit pixel values, and e_{w} = (1, 1, 1, 1) is a 32-bit fixed-point vector formed from four unit pixel values;

4. The vector-processor-oriented vectorized implementation method for the mean-residual normalized product correlation coefficient according to claim 1, 2 or 3, characterized in that, in step (5), the mean-residual normalized product correlation coefficient of A_{(u+k)v} (k = 0, 1, 2, 3) with the real-time image B is computed as:

\rho(u+k, v) = \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right) \left( B_{ij} - \overline{B} \right)}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( (A_{(u+k)v})_{ij} - \overline{A_{(u+k)v}} \right)^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( B_{ij} - \overline{B} \right)^{2}}}

= \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij} B_{ij} - mn \, \overline{A_{(u+k)v}} \, \overline{B}}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (A_{(u+k)v})_{ij}^{2} - mn \, \overline{A_{(u+k)v}}^{2}} \cdot \sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} B_{ij}^{2} - mn \, \overline{B}^{2}}}