CN102411773B - Vector-processor-oriented mean-residual normalized product correlation vectoring method - Google Patents


Info

Publication number
CN102411773B
Authority
CN
China
Prior art keywords
sigma
pixel value
overbar
processor
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2011102133381A
Other languages
Chinese (zh)
Other versions
CN102411773A (en)
Inventor
刘仲
陈书明
陈跃跃
刘衡竹
陈海燕
龚国辉
万江华
彭元喜
扈啸
孙书为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN2011102133381A
Publication of CN102411773A
Application granted
Publication of CN102411773B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient. The method comprises the following steps: setting a reference image A and a real-time image B; traversing the real-time image B and calculating the mean of its pixel values and the accumulated sum of the squared pixel values $B_{ij}^2$; traversing the reference image A, each time taking two sub-images $A_{uv}$ and $A_{(u+4)v}$ from it and shuffling them to obtain four sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3); calculating in turn, for each of these sub-images, the accumulated sum of the pixel values, the accumulated sum of $(A_{(u+k)v})_{ij}^2$ and the accumulated sum of $(A_{(u+k)v})_{ij} B_{ij}$; calculating in turn the mean-residual normalized product correlation coefficients of the sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B; and setting u to u+4 and repeating these steps until the reference image A has been traversed completely, so as to obtain all mean-residual normalized product correlation coefficient values.

Description

Vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient
Technical field
The present invention relates to the field of image matching and its vectorization, and in particular to a vectorization method for the mean-residual normalized product correlation coefficient.
Background art
As the computational demands of compute-intensive applications such as 4G wireless communication, radar signal processing, HD video and digital image processing keep rising, a single chip can hardly meet application demands, and multi-core processors, vector processors in particular, are widely used. A vector processor generally consists of multiple processing elements (PEs) and usually supports vector-based data loads and stores. Each PE contains several independent functional units, typically a shift unit, an ALU, a multiplier and so on. Vector processors usually support SIMD (single instruction, multiple data) operation: under the control of one vector instruction, all PEs simultaneously perform the same operation on their own local registers, so as to exploit the data-level parallelism of the application.
Image matching is one of the compute-dense operations in image processing applications. Template-based image matching, for example, often needs to calculate the similarity between a reference image and a real-time image using measures such as the sum of absolute differences or the normalized product correlation coefficient (Normalized Product correlation, Nprod). Among these, the mean-residual normalized product correlation coefficient has strong noise immunity and is one of the most widely used similarity criteria in image matching. However, this kind of compute-dense task requires matching the real-time image against every sub-image of the reference image one by one, so the amount of calculation is very large. On a single-chip processor, a fast algorithm that slides by row and by column is usually adopted to reduce the amount of calculation, but on a vector processor this fast algorithm cannot be implemented effectively. An efficient vectorization method is therefore the key to making full use of the abundant computing resources of a vector processor, exploiting its multi-level parallelism and improving its utilization.
The mean-residual normalized product correlation coefficient is calculated as follows. Let the reference image A have size M×N and the real-time image B have size m×n, with M > m and N > n. Let $A_{uv}$ denote the sub-image of the reference image whose upper-left corner is at (u, v). Its mean-residual normalized product correlation coefficient with the real-time image B is:

$$\rho(u,v)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{uv})_{ij}-\overline{A_{uv}}\bigr)\bigl(B_{ij}-\overline{B}\bigr)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{uv})_{ij}-\overline{A_{uv}}\bigr)^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(B_{ij}-\overline{B}\bigr)^{2}}}$$

$$=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{uv})_{ij}B_{ij}-mn\,\overline{A_{uv}}\,\overline{B}}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{uv})_{ij}^{2}-mn\,\overline{A_{uv}}^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}-mn\,\overline{B}^{2}}}$$

where $(A_{uv})_{ij}$ denotes the pixel value at coordinate (i, j) in sub-image $A_{uv}$, and $B_{ij}$ denotes the pixel value at coordinate (i, j) in the real-time image B. The value ρ(u, v) calculated above is the mean-residual normalized product correlation coefficient expressing the degree of matching between sub-image $A_{uv}$ and the real-time image B. To find the best match position, all sub-images of the reference image must be traversed, the coefficient of each sub-image with the real-time image calculated one by one, and the minimum value among them taken. In total (M-m)×(N-n) evaluations are needed, and each evaluation involves summing a large number of elements, summing element products and accumulating, so the amount of calculation is very large. On a single-chip processor, a fast algorithm that slides by row and by column is usually adopted. Its advantage is that it reuses earlier results and avoids a large amount of repeated calculation. For a vector processor, however, there are two obstacles. First, a vector processor contains multiple processing elements, which makes it difficult to reuse earlier results. Second, sub-image pixel data are usually 8-bit values, so traversing the reference image requires reading image data at byte offsets, whereas vector processors generally do not support reads that cross word boundaries. An effective vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient is currently lacking.
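As a minimal sketch of the two equivalent forms of ρ(u, v) above, the following pure-Python functions evaluate the coefficient once from the definition and once from the expanded form. The function names `ncc_direct` and `ncc_expanded` and the list-of-rows image layout are illustrative assumptions, not part of the patent. The sketch also assumes that neither image is constant, so the denominator is nonzero.

```python
import math

def ncc_direct(sub, tmpl):
    """Coefficient computed from the definition (subtract the means first)."""
    a = [x for row in sub for x in row]
    b = [x for row in tmpl for x in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den_a = sum((x - mean_a) ** 2 for x in a)
    den_b = sum((y - mean_b) ** 2 for y in b)
    return num / (math.sqrt(den_a) * math.sqrt(den_b))

def ncc_expanded(sub, tmpl):
    """Same coefficient from the expanded form (raw sums, means applied last)."""
    a = [x for row in sub for x in row]
    b = [x for row in tmpl for x in row]
    mn = len(a)
    mean_a = sum(a) / mn
    mean_b = sum(b) / mn
    num = sum(x * y for x, y in zip(a, b)) - mn * mean_a * mean_b
    den_a = sum(x * x for x in a) - mn * mean_a ** 2
    den_b = sum(y * y for y in b) - mn * mean_b ** 2
    return num / (math.sqrt(den_a) * math.sqrt(den_b))
```

The expanded form matters for vectorization because it needs only three running sums per sub-image (pixels, squared pixels, pixel products), each of which maps onto a dot product followed by a reduction.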
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the problems of the prior art, the invention provides a vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient that is simple in principle, easy to operate, efficient in calculation, and able to improve the utilization of the processor's computing resources.
To solve the above technical problem, the present invention adopts the following technical solution:
A vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient, comprising the following steps:
(1) Let the reference image A have size M×N and the real-time image B have size m×n, with M > m and N > n; the vector processor comprises p processing elements;
(2) The vector processor first traverses the real-time image B and reads its data into vector registers; using a SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the pixel-value mean $\overline{B}$ of the real-time image B and the accumulated sum of the squared pixel values $B_{ij}^2$;
(3) The vector processor traverses the reference image A, each time fetching from A two sub-images $A_{uv}$ and $A_{(u+4)v}$ whose first elements are 4 elements apart and whose length is 4p, and obtains by a shuffle operation four sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) whose first elements are successively 1 element apart and whose length is 4p;
(4) Using the SIMD-based dot-product operation to sum the values within each processing element and the reduction operation to sum the values across processing elements, it calculates in turn, over all elements of each sub-image $A_{(u+k)v}$ (k = 0, 1, 2, 3), the pixel-value mean $\overline{A_{(u+k)v}}$, the accumulated sum of the pixel values, the accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$, and the accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij} B_{ij}$ of the reference image A and the real-time image B;
(5) It calculates in turn the mean-residual normalized product correlation coefficients ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v) of the sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B;
(6) Let u = u + 4 and repeat steps (3) to (6) until the reference image A has been fully traversed, which yields all mean-residual normalized product correlation coefficient values of the reference image A with the real-time image B.
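The control flow of steps (1) to (6) can be sketched in plain Python as below. This is a scalar emulation under stated assumptions, not the patented vector code: the name `all_coefficients` is hypothetical, images are lists of rows, u is taken to be the column offset and v the row offset, ordinary loops stand in for the dot-product and reduction instructions, and the sketch visits every offset at which the window fits. It also assumes no window is constant, so no denominator is zero.

```python
import math

def all_coefficients(A, B):
    """Return {(u, v): rho} for every window offset, visiting u in groups of four."""
    M, N = len(A), len(A[0])          # reference image: M rows, N columns
    m, n = len(B), len(B[0])          # real-time image: m rows, n columns
    mn = m * n
    # Step (2): the statistics of B are computed once.
    flat_b = [x for row in B for x in row]
    b_mean = sum(flat_b) / mn
    b_sq = sum(x * x for x in flat_b)
    den_b = b_sq - mn * b_mean * b_mean
    rho = {}
    for v in range(M - m + 1):
        u = 0
        while u <= N - n:
            # Steps (3) to (5): four neighbouring offsets u, u+1, u+2, u+3.
            for k in range(4):
                if u + k > N - n:
                    break             # fewer than four offsets left in this row
                s = q = r = 0
                for i in range(m):
                    for j in range(n):
                        a = A[v + i][u + k + j]
                        s += a                 # sum of pixel values
                        q += a * a             # sum of squared pixel values
                        r += a * B[i][j]       # sum of pixel products
                den_a = q - s * s / mn
                rho[(u + k, v)] = (r - s * b_mean) / math.sqrt(den_a * den_b)
            u += 4                    # Step (6)
    return rho
```

Grouping the offsets in fours mirrors the shuffle of step (3): one pair of aligned loads serves four neighbouring windows before u advances.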
As a further improvement of the present invention:
In step (2), the pixel-value mean $\overline{B}$ is calculated as:

$$\overline{B}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes e_w;$$

The accumulated sum of the squared pixel values $B_{ij}^2$ is calculated as:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes b_w;$$

where $b_w=(B_{iw},B_{i(w+1)},B_{i(w+2)},B_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}b_w\otimes e_w$ means that the p processing elements of the vector processor calculate $b_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}b_w\otimes b_w$ means that the p processing elements of the vector processor calculate $b_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction; L is the loop count, with $L=mn/(4p)$.
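To make the packed arithmetic of step (2) concrete, here is a small emulation, assuming that each PE holds one 32-bit word of four 8-bit pixels and that the loop runs L = mn/(4p) times. The helpers `pack4`, `dot4` and `template_sums`, as well as the byte order inside a word, are assumptions made for illustration and do not correspond to any particular instruction set.

```python
def pack4(p0, p1, p2, p3):
    """Pack four 8-bit pixels into one 32-bit word (first pixel in the low byte)."""
    return p0 | (p1 << 8) | (p2 << 16) | (p3 << 24)

def dot4(x, y):
    """Dot product of two packed words: the sum of the four 8-bit lane products."""
    return sum(((x >> s) & 0xFF) * ((y >> s) & 0xFF) for s in (0, 8, 16, 24))

ONES = pack4(1, 1, 1, 1)   # plays the role of e_w = (1, 1, 1, 1)

def template_sums(pixels, p):
    """Return (sum of pixels, sum of squared pixels) using L = mn / (4p) iterations."""
    assert len(pixels) % (4 * p) == 0, "sketch assumes mn is a multiple of 4p"
    L = len(pixels) // (4 * p)
    total = total_sq = 0
    for l in range(L):
        chunk = pixels[l * 4 * p:(l + 1) * 4 * p]
        words = [pack4(*chunk[4 * w:4 * w + 4]) for w in range(p)]  # one word per PE
        total += sum(dot4(b, ONES) for b in words)    # b_w (x) e_w, reduced over PEs
        total_sq += sum(dot4(b, b) for b in words)    # b_w (x) b_w, reduced over PEs
    return total, total_sq
```

Dividing the first returned value by mn gives the mean of B. Multiplying by the all-ones word turns the same dot-product unit into a plain adder, which is why the mean and the sum of squares share one code path.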
In step (4), the pixel-value mean is:

$$\overline{A_{(u+k)v}}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the pixel values is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes a_w;$$

The accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij}B_{ij}$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes b_w;$$

where $a_w=((A_{uv})_{iw},(A_{uv})_{i(w+1)},(A_{uv})_{i(w+2)},(A_{uv})_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}a_w\otimes e_w$ means that the p processing elements of the vector processor calculate $a_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes a_w$ means that the p processing elements of the vector processor calculate $a_w\otimes a_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes b_w$ means that the p processing elements of the vector processor calculate $a_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction.
In step (5), the mean-residual normalized product correlation coefficient of $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B is calculated as:

$$\rho(u+k,v)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)\bigl(B_{ij}-\overline{B}\bigr)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(B_{ij}-\overline{B}\bigr)^{2}}}$$

$$=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}-mn\,\overline{A_{(u+k)v}}\,\overline{B}}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}-mn\,\overline{A_{(u+k)v}}^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}-mn\,\overline{B}^{2}}}$$
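The step from the first form of the coefficient to the second is not spelled out in the text; it follows from the standard expansion below. The shorthand $a_{ij}$ for $(A_{(u+k)v})_{ij}$ and $\bar a$ for its mean is introduced here only for brevity and is not the patent's notation.

```latex
\begin{aligned}
\sum_{i,j}(a_{ij}-\bar a)(B_{ij}-\overline{B})
  &= \sum_{i,j} a_{ij}B_{ij} - \bar a\sum_{i,j}B_{ij}
     - \overline{B}\sum_{i,j}a_{ij} + mn\,\bar a\,\overline{B} \\
  &= \sum_{i,j} a_{ij}B_{ij} - mn\,\bar a\,\overline{B}
  \qquad\text{since } \sum_{i,j}a_{ij}=mn\,\bar a,\ \sum_{i,j}B_{ij}=mn\,\overline{B}, \\
\sum_{i,j}(a_{ij}-\bar a)^2
  &= \sum_{i,j} a_{ij}^2 - 2\bar a\sum_{i,j}a_{ij} + mn\,\bar a^{2}
   = \sum_{i,j} a_{ij}^2 - mn\,\bar a^{2}.
\end{aligned}
```

The same identity applied to B gives the second factor of the denominator, so each coefficient needs only the three accumulated sums of step (4) plus the two quantities of step (2).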
Compared with the prior art, the advantages of the present invention are:
1. The vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient of the present invention is simple to implement, low in cost, easy to operate and reliable. It gives full play to the parallel computing capability of all PEs of the vector processor and fully exploits the SIMD-based data parallelism of the vector processor. Each traversal of the real-time image yields four mean-residual normalized product correlation coefficient values, which effectively improves the execution efficiency on a vector processor of image matching algorithms based on this coefficient.
2. The method of the present invention is simpler and more efficient than traditional vectorization methods, the hardware cost of implementing it on the target vector processor is low, and power consumption is reduced for the same function. In addition, the method of the present invention is simple to implement, low in cost, easy to operate and reliable.
Description of the drawings
Fig. 1 is a schematic diagram of the overall flow of the present invention;
Fig. 2 is a schematic diagram of how the sub-images $A_{uv}$ and $A_{(u+4)v}$ yield four adjacent sub-images through the shuffle operation in the specific embodiment of the invention;
Fig. 3 is a schematic diagram of how, in the specific embodiment of the invention, the vector processor sums the values within each PE by the SIMD-based dot-product operation and sums the values across PEs by the reduction operation.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient of the present invention comprises the following steps:
1. Let the reference image A have size M×N and the real-time image B have size m×n, with M > m and N > n; the vector processor comprises p processing elements;
2. The vector processor first traverses the real-time image B and reads its data into vector registers; using a SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the pixel-value mean $\overline{B}$ of the real-time image B and the accumulated sum of the squared pixel values $B_{ij}^2$;
The pixel-value mean $\overline{B}$ is calculated as:

$$\overline{B}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes e_w;$$

The accumulated sum of the squared pixel values $B_{ij}^2$ is calculated as:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes b_w;$$

where $b_w=(B_{iw},B_{i(w+1)},B_{i(w+2)},B_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}b_w\otimes e_w$ means that the p processing elements of the vector processor calculate $b_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}b_w\otimes b_w$ means that the p processing elements of the vector processor calculate $b_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction; L is the loop count, with $L=mn/(4p)$.
As shown in Fig. 3, $(a_0, a_1, a_2, a_3)$ is the vector formed by the four elements $a_0, a_1, a_2, a_3$; $(a_i, a_{i+1}, a_{i+2}, a_{i+3})$ is the vector formed by the four elements $a_i, a_{i+1}, a_{i+2}, a_{i+3}$; $(b_0, b_1, b_2, b_3)$ is the vector formed by the four elements $b_0, b_1, b_2, b_3$; and $(b_i, b_{i+1}, b_{i+2}, b_{i+3})$ is the vector formed by the four elements $b_i, b_{i+1}, b_{i+2}, b_{i+3}$. The two groups of vectors are stored in different vector registers. The SIMD-based dot-product operation within a PE works as follows: the result for $(a_0, a_1, a_2, a_3)$ and $(b_0, b_1, b_2, b_3)$ is $(a_0 b_0, a_1 b_1, a_2 b_2, a_3 b_3)$, and the result for $(a_i, a_{i+1}, a_{i+2}, a_{i+3})$ and $(b_i, b_{i+1}, b_{i+2}, b_{i+3})$ is $(a_i b_i, a_{i+1} b_{i+1}, a_{i+2} b_{i+2}, a_{i+3} b_{i+3})$. The reduction sum across PEs then gives the sum over these two groups of vectors: $a_0 b_0 + a_1 b_1 + a_2 b_2 + a_3 b_3 + \dots + a_i b_i + a_{i+1} b_{i+1} + a_{i+2} b_{i+2} + a_{i+3} b_{i+3}$.
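The two stages described for Fig. 3 can be traced with a tiny numeric example. In this sketch two PEs each hold four lanes; the pairwise tree used to combine the PEs is an assumption chosen for illustration, since the text only states that a reduction sum is performed, and the helper names are hypothetical.

```python
def pe_multiply(a_lanes, b_lanes):
    """Stage 1: lane-wise products inside one PE."""
    return [x * y for x, y in zip(a_lanes, b_lanes)]

def reduce_sum(per_pe_products):
    """Stage 2: sum inside each PE, then combine the PEs pairwise."""
    partial = [sum(lanes) for lanes in per_pe_products]
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)          # pad so every PE has a partner
        partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
    return partial[0]

a = [[1, 2, 3, 4], [5, 6, 7, 8]]       # one row of lanes per PE
b = [[8, 7, 6, 5], [4, 3, 2, 1]]
products = [pe_multiply(x, y) for x, y in zip(a, b)]
total = reduce_sum(products)
print(products)   # [[8, 14, 18, 20], [20, 18, 14, 8]]
print(total)      # 120
```

Stage 1 needs no communication between PEs, so only the final reduction crosses PE boundaries.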
3. The vector processor traverses the reference image A, each time fetching from A two sub-images $A_{uv}$ and $A_{(u+4)v}$ whose first elements are 4 elements apart and whose length is 4p, and obtains by a shuffle operation four adjacent sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) whose first elements are successively 1 element apart and whose length is 4p;
As shown in Fig. 2, taking a PE count of 2 as an example: the processor fetches from the reference image two adjacent vectors p1 and p2 whose first elements are 4 elements apart. The vector length is 4 times the number of PEs of the vector processor, so p1 and p2 each contain 8 elements. The shuffle operation then yields four adjacent vectors v0, v1, v2 and v3 whose first elements are 1 element apart.
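The effect of the shuffle in the Fig. 2 example (2 PEs, vectors of 8 elements) can be emulated with list slicing. This is only a sketch of the data movement: `shuffle_four` is a hypothetical helper, and a real vector processor would use its own shuffle instruction rather than slices.

```python
def shuffle_four(p1, p2):
    """From two vectors whose first elements are 4 apart, build four vectors 1 apart."""
    assert len(p1) == len(p2)
    # p2 repeats p1 except for p1's first four elements, so together
    # they cover len(p1) + 4 consecutive pixels.
    window = p1[:4] + p2
    n = len(p1)
    return [window[k:k + n] for k in range(4)]

row = list(range(100, 112))   # 12 consecutive pixels of one image row
p1 = row[0:8]                 # aligned load at offset 0
p2 = row[4:12]                # aligned load at offset 4
v0, v1, v2, v3 = shuffle_four(p1, p2)
print(v1)   # [101, 102, 103, 104, 105, 106, 107, 108]
```

Only the two aligned loads touch memory; the vectors at offsets 1, 2 and 3, which would otherwise cross a word boundary, are assembled from data already in registers.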
4. Using the SIMD-based dot-product operation to sum the values within each processing element and the reduction operation to sum the values across processing elements, it calculates in turn, over all elements of each sub-image $A_{(u+k)v}$ (k = 0, 1, 2, 3), the pixel-value mean $\overline{A_{(u+k)v}}$, the accumulated sum of the pixel values, the accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$, and the accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij} B_{ij}$ of the reference image A and the real-time image B;
The pixel-value mean is:

$$\overline{A_{(u+k)v}}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the pixel values is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes a_w;$$

The accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij}B_{ij}$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes b_w;$$

where $a_w=((A_{uv})_{iw},(A_{uv})_{i(w+1)},(A_{uv})_{i(w+2)},(A_{uv})_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}a_w\otimes e_w$ means that the p processing elements of the vector processor calculate $a_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes a_w$ means that the p processing elements of the vector processor calculate $a_w\otimes a_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes b_w$ means that the p processing elements of the vector processor calculate $a_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction.
5. It calculates in turn the mean-residual normalized product correlation coefficients ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v) of the sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B;
The mean-residual normalized product correlation coefficient of $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B is calculated as:

$$\rho(u+k,v)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)\bigl(B_{ij}-\overline{B}\bigr)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(B_{ij}-\overline{B}\bigr)^{2}}}$$

$$=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}-mn\,\overline{A_{(u+k)v}}\,\overline{B}}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}-mn\,\overline{A_{(u+k)v}}^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}-mn\,\overline{B}^{2}}}$$

Substituting k = 0, 1, 2, 3 in turn gives ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v).
Suppose that, from the preceding steps, the pixel-value mean of the real-time image B is b0 and the accumulated sum of its squared pixel values is b1; that the four accumulated pixel-value sums of the reference image A are s0, s1, s2, s3; that the four accumulated sums of squared pixel values are q0, q1, q2, q3; and that the four accumulated sums of element products of the reference image A with the real-time image B are r0, r1, r2, r3. Since $\overline{A_{(u+k)v}} = s_k/mn$, the four coefficient values ρ0, ρ1, ρ2, ρ3 are calculated as follows:

$$\rho_0=\frac{r_0-s_0 b_0}{\sqrt{q_0-\dfrac{s_0^{2}}{mn}}\,\sqrt{b_1-mn\,b_0^{2}}}$$

$$\rho_1=\frac{r_1-s_1 b_0}{\sqrt{q_1-\dfrac{s_1^{2}}{mn}}\,\sqrt{b_1-mn\,b_0^{2}}}$$

$$\rho_2=\frac{r_2-s_2 b_0}{\sqrt{q_2-\dfrac{s_2^{2}}{mn}}\,\sqrt{b_1-mn\,b_0^{2}}}$$

$$\rho_3=\frac{r_3-s_3 b_0}{\sqrt{q_3-\dfrac{s_3^{2}}{mn}}\,\sqrt{b_1-mn\,b_0^{2}}}$$

Each pass thus yields the four coefficient values ρ0, ρ1, ρ2, ρ3.
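A short sketch of this final step, assuming the accumulators are already available: the hypothetical function `four_coefficients` takes b0 (mean of B), b1 (sum of squared pixels of B), the lists s, q and r, and the pixel count mn, and evaluates the expanded form of the step 5 formula with the sub-image mean written as s_k/mn. The sample numbers in the usage lines are made up for illustration.

```python
import math

def four_coefficients(b0, b1, s, q, r, mn):
    """Evaluate rho_0..rho_3 from the accumulated sums of one group of four sub-images."""
    den_b = b1 - mn * b0 * b0                  # sum of (B - mean(B))^2
    rho = []
    for k in range(4):
        num = r[k] - s[k] * b0                 # sum(A*B) - mn * mean(A) * mean(B)
        den_a = q[k] - s[k] * s[k] / mn        # sum of (A - mean(A))^2
        rho.append(num / math.sqrt(den_a * den_b))
    return rho

# B has pixels 9, 1, 4, 5, 2, 9, so b0 = 5 and b1 = 208.
# Sub-image 0 equals B, sub-image 1 equals 2*B + 1, sub-image 2 equals 10 - B,
# and sub-image 3 has pixels 1, 2, 3, 4, 5, 6.
rho = four_coefficients(5.0, 208, [30, 66, 30, 21], [208, 958, 208, 91],
                        [208, 446, 92, 107], 6)
```

The factor `den_b` depends only on B, so it is computed once and shared by all four coefficients.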
6. Let u = u + 4 and repeat steps 3 to 6 until the reference image A has been fully traversed, which yields all mean-residual normalized product correlation coefficient values of the reference image A with the real-time image B. The coordinates of the best-matching sub-image can then be determined by taking the minimum value among all the coefficient values.
In summary, the present invention efficiently supports the vectorized calculation of the mean-residual normalized product correlation coefficient, gives full play to the parallel computing capability of all PEs of the vector processor, fully exploits the SIMD-based data parallelism of the vector processor, and effectively improves the execution efficiency of the coefficient calculation on a vector processor.
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions within the concept of the present invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (4)

1. A vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient, characterized by comprising the following steps:
(1) Let the reference image A have size M×N and the real-time image B have size m×n, with M > m and N > n; the vector processor comprises p processing elements;
(2) The vector processor first traverses the real-time image B and reads its data into vector registers; using a SIMD-based dot-product operation to sum the values within each processing element and a reduction operation to sum the values across processing elements, it calculates the pixel-value mean $\overline{B}$ of the real-time image B and the accumulated sum of the squared pixel values $B_{ij}^2$;
(3) The vector processor traverses the reference image A, each time fetching from A two sub-images $A_{uv}$ and $A_{(u+4)v}$ whose first elements are 4 elements apart and whose length is 4p, and obtains by a shuffle operation four adjacent sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) whose first elements are successively 1 element apart and whose length is 4p;
(4) Using the SIMD-based dot-product operation to sum the values within each processing element and the reduction operation to sum the values across processing elements, it calculates in turn, over all elements of each sub-image $A_{(u+k)v}$ (k = 0, 1, 2, 3), the pixel-value mean $\overline{A_{(u+k)v}}$, the accumulated sum of the pixel values, the accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$, and the accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij} B_{ij}$ of the reference image A and the real-time image B;
(5) It calculates in turn the mean-residual normalized product correlation coefficients ρ(u, v), ρ(u+1, v), ρ(u+2, v), ρ(u+3, v) of the sub-images $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B;
(6) Let u = u + 4 and repeat steps (3) to (6) until the reference image A has been fully traversed, which yields all mean-residual normalized product correlation coefficient values of the reference image A with the real-time image B.
2. The vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient according to claim 1, characterized in that, in step (2), the pixel-value mean $\overline{B}$ is calculated as:

$$\overline{B}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}B_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes e_w;$$

The accumulated sum of the squared pixel values $B_{ij}^2$ is calculated as:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}b_w\otimes b_w;$$

where $b_w=(B_{iw},B_{i(w+1)},B_{i(w+2)},B_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}b_w\otimes e_w$ means that the p processing elements of the vector processor calculate $b_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}b_w\otimes b_w$ means that the p processing elements of the vector processor calculate $b_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction; L is the loop count, with $L=mn/(4p)$.
3. The vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient according to claim 2, characterized in that, in step (4), the pixel-value mean is:

$$\overline{A_{(u+k)v}}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\frac{1}{mn}\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the pixel values is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{j=0}^{4(p-1)}(A_{(u+k)v})_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes e_w;$$

The accumulated sum of the squared pixel values $(A_{(u+k)v})_{ij}^2$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes a_w;$$

The accumulated sum of the pixel-value products $(A_{(u+k)v})_{ij}B_{ij}$ is:

$$\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}=\sum_{l=0}^{L-1}\sum_{w=0}^{p-1}a_w\otimes b_w;$$

where $a_w=((A_{uv})_{iw},(A_{uv})_{i(w+1)},(A_{uv})_{i(w+2)},(A_{uv})_{i(w+3)})$ is a 32-bit fixed-point vector composed of four 8-bit pixel values, and $e_w=(1,1,1,1)$ is a 32-bit fixed-point vector composed of four unit pixel values;
$\sum_{w=0}^{p-1}a_w\otimes e_w$ means that the p processing elements of the vector processor calculate $a_w\otimes e_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes a_w$ means that the p processing elements of the vector processor calculate $a_w\otimes a_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction;
$\sum_{w=0}^{p-1}a_w\otimes b_w$ means that the p processing elements of the vector processor calculate $a_w\otimes b_w$ simultaneously under SIMD, and the results of the p processing elements are then summed by a fixed-point reduction.
4. The vector-processor-oriented vectorization method for the mean-residual normalized product correlation coefficient according to claim 1, 2 or 3, characterized in that, in step (5), the mean-residual normalized product correlation coefficient of $A_{(u+k)v}$ (k = 0, 1, 2, 3) with the real-time image B is calculated as:

$$\rho(u+k,v)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)\bigl(B_{ij}-\overline{B}\bigr)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl((A_{(u+k)v})_{ij}-\overline{A_{(u+k)v}}\bigr)^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(B_{ij}-\overline{B}\bigr)^{2}}}$$

$$=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}B_{ij}-mn\,\overline{A_{(u+k)v}}\,\overline{B}}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(A_{(u+k)v})_{ij}^{2}-mn\,\overline{A_{(u+k)v}}^{2}}\,\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}B_{ij}^{2}-mn\,\overline{B}^{2}}}$$
CN2011102133381A 2011-07-28 2011-07-28 Vector-processor-oriented mean-residual normalized product correlation vectoring method Active CN102411773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011102133381A CN102411773B (en) 2011-07-28 2011-07-28 Vector-processor-oriented mean-residual normalized product correlation vectoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011102133381A CN102411773B (en) 2011-07-28 2011-07-28 Vector-processor-oriented mean-residual normalized product correlation vectoring method

Publications (2)

Publication Number Publication Date
CN102411773A CN102411773A (en) 2012-04-11
CN102411773B true CN102411773B (en) 2013-03-27

Family

ID=45913839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011102133381A Active CN102411773B (en) 2011-07-28 2011-07-28 Vector-processor-oriented mean-residual normalized product correlation vectoring method

Country Status (1)

Country Link
CN (1) CN102411773B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104238994B (en) * 2014-09-01 2017-07-04 No. 8357 Research Institute of the Third Academy of China Aerospace Science and Industry Corporation Method for improving coprocessor operation efficiency
CN104699458A (en) * 2015-03-30 2015-06-10 Harbin Institute of Technology Fixed point vector processor and vector data access controlling method thereof
CN109165734B (en) * 2018-07-11 2021-04-02 National University of Defense Technology Matrix local response normalization vectorization implementation method
CN109712173A (en) * 2018-12-05 2019-05-03 Beijing Institute of Space Mechanics and Electricity Method for estimating image position based on Kalman filter
CN114155562A (en) * 2022-02-09 2022-03-08 Beijing Kingsoft Digital Entertainment Technology Co., Ltd. Gesture recognition method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1349159A (en) * 2001-11-28 2002-05-15 National University of Defense Technology Vector processing method of microprocessor
CN101833468A (en) * 2010-04-28 2010-09-15 Institute of Automation, Chinese Academy of Sciences Method for generating vector processing instruction set architecture in high performance computing system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8934539B2 (en) * 2007-12-03 2015-01-13 Nvidia Corporation Vector processor acceleration for media quantization

Also Published As

Publication number Publication date
CN102411773A (en) 2012-04-11

Similar Documents

Publication Publication Date Title
US20220365753A1 (en) Accelerated mathematical engine
CN102141976B (en) Method for storing diagonal data of sparse matrix and SpMV (Sparse Matrix Vector) implementation method based on the storage method
CN102411773B (en) Vector-processor-oriented mean-residual normalized product correlation vectoring method
CN102411558B (en) Vector-processor-oriented vectorization implementation method for large matrix multiplication
CN103336758A (en) Sparse matrix storage method CSRL (Compressed Sparse Row with Local Information) and SpMV (Sparse Matrix Vector Multiplication) realization method based on same
CN103294648B (en) Partitioned matrix multiplication vectorization method supporting vector processors with multiple MAC units
TWI690896B (en) Image processor, method performed by the same, and non-transitory machine readable storage medium
CN102509071B (en) Optical flow computation system and method
EP3093757B1 (en) Multi-dimensional sliding window operation for a vector processor
CN103745447B (en) Fast parallel implementation method of non-local mean filtering
CN114092336B (en) Image scaling method, device, equipment and medium based on bilinear interpolation algorithm
CN102158694A (en) Remote-sensing image decompression method based on GPU (Graphics Processing Unit)
Kunz et al. An FPGA-optimized architecture of horn and schunck optical flow algorithm for real-time applications
CN114503126A (en) Matrix operation circuit, device and method
CN102231202B (en) SAD (sum of absolute difference) vectorization realization method oriented to vector processor
Palaniappan et al. Parallel flux tensor analysis for efficient moving object detection
CN104504696A (en) Embedded parallel optimization method for image salient region detection
US20230254145A1 (en) System and method to improve efficiency in multiplication ladder-based cryptographic operations
CN102970545A (en) Static image compression method based on two-dimensional discrete wavelet transform algorithm
Amiri et al. High performance implementation of 2D convolution using Intel's advanced vector extensions
US10771089B2 (en) Method of input data compression, associated computer program product, computer system and extraction method
Menant et al. Optimized fixed point implementation of a local stereo matching algorithm onto C66x DSP
CN102012802A (en) Vector processor-oriented data exchange method and device
Fischer et al. BinArray: A scalable hardware accelerator for binary approximated CNNs
Ross et al. Implementing image processing algorithms for the epiphany many-core coprocessor with threaded mpi

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant