CN108765471B

CN108765471B - DSP image matching method based on operation resource load balancing technology

Info

Publication number: CN108765471B
Application number: CN201810268256.9A
Authority: CN
Inventors: 崔广涛; 李悦; 李彦; 韩翔宇; 韦龙飞; 柯贤锋; 潘云龙
Original assignee: China Academy of Launch Vehicle Technology CALT; Beijing Aerospace Automatic Control Research Institute
Current assignee: China Academy of Launch Vehicle Technology CALT; Beijing Aerospace Automatic Control Research Institute
Priority date: 2018-03-29
Filing date: 2018-03-29
Publication date: 2021-12-07
Anticipated expiration: 2038-03-29
Also published as: CN108765471A

Abstract

The invention relates to a DSP image matching method based on an operation resource load balancing technology. The method performs an FFT row transformation on the matching image and the reference image; performs an FFT column transformation on the row transformation results to obtain the FFT result of the matching image and the FFT result of the reference image; performs a complex multiplication of the two FFT results followed by an IFFT (inverse fast Fourier transform); and calculates the correlation surface of the matching image and the reference image. The invention balances the loads of the EDMA and the CPU by exploiting their respective characteristics within the DSP. When the EDMA is the bottleneck, the CPU shares the load by regrouping the data into a layout that the EDMA can move more efficiently; when the CPU is the bottleneck, the computation time is reduced by optimizing the algorithm structure and the order of operations, and the algorithm is converted to fixed point with as little loss of precision as possible.

Description

DSP image matching method based on operation resource load balancing technology

Technical Field

The invention relates to a DSP image matching method based on an operation resource load balancing technology. It combines a region correlation matching algorithm with DSP operation resource load balancing to quickly match and locate a real-time matching image on a reference image, and belongs to the field of image processing.

Background

Image matching in computer vision uses different matching strategies for different applications. In applications such as the terminal guidance of missile weapons, matching locates the matching image acquired by the detector on the reference image loaded before launch. To ensure that the matching image falls within the reference image, the reference image used in the matching calculation is often large, which makes the calculation very time-consuming and poses a serious challenge for applications with strict real-time requirements.

The correlation-coefficient matching method slides the matching image over the reference image as a sliding window and calculates the correlation coefficient at each position, forming a correlation surface the size of the search range; the position of the maximum of the correlation surface is the final matching position. The formula for the correlation calculation is as follows:

As the reference image or the matching image becomes larger, the calculation time grows quadratically. To weaken the influence of the search range and the matching image size, the matching image and the reference image are transformed to the frequency domain, where a complex multiplication implements the spatial-domain convolution, so that the numerator in formula (1) is obtained quickly; the correlation surface is then calculated quickly using the integral image technique. With reference to fig. 8, the steps of the fast correlation matching calculation are as follows:

1) extending the reference image to complex data, zero-padding it to the FFT size (an integer power of 2), and performing an FFT;

2) extending the matching image to complex data, zero-padding it to the same size as the reference image, and performing an FFT;

3) conjugating the transformation result of the reference image, performing a complex multiplication with the transformation result of the matching image, and normalizing the product;

4) performing an IFFT on the result of step 3) to obtain the convolution result, and calculating the correlation surface from it.
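The four steps above can be sketched in one dimension. This is only an illustrative sketch: a naive DFT stands in for an optimized FFT, the function names are hypothetical, and the final division of the correlation surface by the denominator terms is omitted. Conjugating the reference spectrum, as step 3) does, places the value for offset k at index (-k) mod n of the inverse transform.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT; stands in for an optimized FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def pad_pow2(values, min_len):
    """Steps 1) and 2): extend to complex and zero-pad to a power-of-2 length."""
    n = 1
    while n < min_len:
        n *= 2
    return [complex(v) for v in values] + [0j] * (n - len(values))

def correlate_freq(reference, template):
    ref = pad_pow2(reference, len(reference))
    tpl = pad_pow2(template, len(reference))
    n = len(ref)
    # Step 3): conjugate the reference spectrum and multiply by the template spectrum.
    prod = [r.conjugate() * t for r, t in zip(dft(ref), dft(tpl))]
    # Step 4): inverse transform (the 1/n normalization is applied inside dft).
    surface = dft(prod, inverse=True)
    valid = len(reference) - len(template) + 1
    # With this conjugation the value for offset k sits at index (-k) mod n.
    return [surface[(-k) % n].real for k in range(valid)]

def correlate_direct(reference, template):
    """Sliding-window sum of products, used only to check the fast version."""
    valid = len(reference) - len(template) + 1
    return [sum(reference[k + m] * template[m] for m in range(len(template)))
            for k in range(valid)]
```

Comparing `correlate_freq` with `correlate_direct` on a short signal shows that both give the same numerator values, while the frequency-domain version no longer scales with the product of the two sizes.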

DSPs, as high-performance processors for data processing, can serve as the embedded platform for matching computation. Because on-chip DSP memory is limited, the reference image and the matching image are usually stored in extended memory (for example SDRAM, hereinafter called external memory). The access bandwidth of external memory is low, so an EDMA ping-pong scheme can be used to run data movement in parallel with CPU computation. However, the loads of the EDMA and the CPU are not balanced, so image matching under the EDMA ping-pong architecture cannot reach optimal performance. How to balance the loads of the EDMA and the CPU is the technical problem to be solved in the art.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a DSP image matching method based on an operation resource load balancing technology. The method uses the algorithmic and structural characteristics of the image matching calculation to balance the load of the operation resources and allocate them sensibly, thereby further improving the performance of the image matching calculation.

The purpose of the invention is realized by the following technical scheme:

The DSP image matching method based on the operation resource load balancing technology comprises the following steps:

1) performing an FFT row transformation on the matching image and the reference image;

2) performing an FFT column transformation on the row transformation results of step 1) to obtain the FFT result of the matching image and the FFT result of the reference image, respectively;

3) performing a complex multiplication of the matching image FFT result and the reference image FFT result, followed by an IFFT (inverse fast Fourier transform);

4) calculating the correlation surface of the matching image and the reference image.

Preferably, the FFT row transformation of the matching image and the reference image in step 1) is performed as follows:

(1) allocating 4 rows of calculation space in the SRAM of the DSP and initializing it to all zeros;

(2) the EDMA moves 4 rows of image data of the matching image or the reference image from external memory into the SRAM, where each of the 4 rows has width W_r and the image has height H_r;

(3) performing an FFT on each row of data and storing the transformation results back in the 4 rows of calculation space;

(4) regrouping the row transformation results so that the complex data adjacent within each column become adjacent within a row, arranged in sequence;

(5) the EDMA moves the data regrouped in step (4) to external memory, 4 complex data at a time, arranged row by row;

(6) repeating steps (1) to (5) until all rows of the actual image of the matching image or the reference image have been processed.
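Steps (4) and (5) can be illustrated with a small sketch. It assumes a flat list as a stand-in for external memory and a hypothetical `store_transposed` helper that plays the role of the EDMA; the real transfer parameters are not given in the text. After regrouping, the 4 data of each column are contiguous, so one request can carry 4 complex data to the transposed location.

```python
def regroup_rows(block):
    """CPU step (4): make the data of each column adjacent.

    block is a list of rows (4 rows in the text), each of equal width.
    """
    rows, width = len(block), len(block[0])
    return [block[r][c] for c in range(width) for r in range(rows)]

def store_transposed(ext, pitch, row0, regrouped, rows=4):
    """Stand-in for EDMA step (5): one request per group of `rows` data.

    ext   : flat list emulating external memory holding the transposed image
    pitch : length of one transposed row (the extended height H)
    row0  : index of the first source row in this block
    Returns the number of requests issued.
    """
    requests = 0
    for c in range(len(regrouped) // rows):
        dst = c * pitch + row0
        ext[dst:dst + rows] = regrouped[c * rows:(c + 1) * rows]
        requests += 1
    return requests
```

Without the regrouping, each element would need its own request to reach its transposed position; with it, the number of requests per block drops by a factor of 4.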

Preferably, the FFT column transformation of the row transformation result of step 1) in step 2) is performed as follows:

(1) allocating 1 row of calculation space in the SRAM of the DSP and initializing it to all zeros;

(2) the EDMA moves the first H_r complex data of one row of the FFT row transformation result of the matching image or the reference image from external memory into the SRAM;

(3) the CPU performs an FFT on the data in the SRAM and stores the transformation result in that row of calculation space;

(4) the CPU performs the division by shifting: the matching image and the reference image are divided by H and W respectively, where H is the extended height and W is the extended width of the matching image and the reference image in external memory;

(5) the EDMA moves the division result to external memory;

(6) repeating steps (1) to (5) until all W rows of data of the matching image or the reference image have been processed.
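The shift-based division of step (4) can be sketched as follows. The helper names are hypothetical, and the sketch assumes integer (fixed-point) data and a divisor that is an exact power of 2, as the extended sizes H and W are.

```python
def log2_exact(n):
    """Return k such that n == 2**k; reject anything that is not a power of 2."""
    if n <= 0 or n & (n - 1):
        raise ValueError("size must be a positive power of 2")
    return n.bit_length() - 1

def scale_by_shift(data, size):
    """Divide every integer in data by size using an arithmetic right shift.

    Note that a right shift rounds toward negative infinity,
    so -7 >> 8 gives -1 rather than 0.
    """
    shift = log2_exact(size)
    return [v >> shift for v in data]
```

For example, `scale_by_shift([1024, -512, 7], 256)` shifts each value right by 8 bits instead of performing three integer divisions.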

Preferably, the complex multiplication of the matching image FFT result and the reference image FFT result in step 3), followed by the IFFT, is performed as follows:

(1) allocating two rows of calculation space in the SRAM of the DSP and initializing them to all zeros;

(2) the EDMA linearly moves one row of the matching image FFT result and one row of the reference image FFT result from external memory into internal memory and stores them in the two rows of calculation space respectively;

(3) the CPU performs a complex multiplication of the two data in each column and stores the product at that column's position in the first row;

(4) the EDMA linearly moves the complex multiplication result to external memory;

(5) repeating steps (1) to (4) until all W rows of data have been processed;

(6) performing the column-direction IFFT of the complex multiplication result;

(7) performing the row-direction IFFT of the complex multiplication result.
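Step (3) can be sketched on integer data. The interleaved real/imaginary layout and the optional `conjugate_second` flag are assumptions made for illustration; the text only states that the product is written back to the first row.

```python
def complex_multiply_rows(row1, row2, conjugate_second=False):
    """Multiply two rows of complex integers column by column, in place.

    Both rows are assumed to be interleaved as [re0, im0, re1, im1, ...].
    The product overwrites row1, which mirrors storing the result
    "at that column's position in the first row".
    """
    for k in range(0, len(row1), 2):
        a, b = row1[k], row1[k + 1]
        c, d = row2[k], row2[k + 1]
        if conjugate_second:
            d = -d
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        row1[k] = a * c - b * d
        row1[k + 1] = a * d + b * c
    return row1
```

Writing the product over one of the inputs keeps the calculation space at two rows, so the EDMA only has one row to move back to external memory per iteration.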

Preferably, the column-direction IFFT of the complex multiplication result is performed as follows:

6.1 allocating 4 rows of calculation space in the SRAM of the DSP and initializing it to all zeros;

6.2 the EDMA linearly moves 4 rows of complex multiplication result data from external memory into the SRAM, each row having width H;

6.3 performing an IFFT on each row of data and storing the transformation results back in the 4 rows of calculation space;

6.4 regrouping the column-direction transformation results so that the complex data adjacent within each column become adjacent within a row;

6.5 the EDMA moves the regrouped data in the SRAM to external memory, 4 complex data at a time, arranged row by row;

6.6 repeating steps 6.1 to 6.5 until the entire complex multiplication result has been processed.

Preferably, the row-direction IFFT of the complex multiplication result in step (7) is performed as follows:

7.1 allocating 1 row of calculation space in the SRAM of the DSP and initializing it to all zeros;

7.2 the EDMA linearly moves W complex data of the column-direction IFFT result from external memory into the SRAM;

7.3 the CPU performs an IFFT on the data in the SRAM and stores the transformation result in that row of calculation space;

7.4 the EDMA moves the IFFT result to external memory;

7.5 repeating steps 7.1 to 7.4 until all H rows of the column-direction IFFT result have been processed.

Preferably, the correlation surface of the matching image and the reference image in step 4) is calculated as follows:

(1) allocating 3 rows of calculation space in the SRAM of the DSP, where the first and second rows have width W_s+1 and the third row has width W_s-W_rt+1, and initializing the first and second rows to all zeros; W_s is the width of the actual reference image and W_rt is the width of the actual matching image; let i = 0 and j = 0;

(2) the EDMA linearly moves row i of the IFFT result from external memory into the 3rd row of calculation space, the datum in column j being S_ifft(i, j), where i ranges from 0 to H_s-H_rt and j ranges from 0 to W_s-W_rt; the EDMA linearly moves rows i and i+H_rt of the reference image integral image stored in external memory into the 1st and 2nd rows of calculation space;

the CPU calculates del(i, j) by adding datum j+W_rt of the 2nd row to datum j of the 1st row, then subtracting datum j+W_rt of the 1st row and datum j of the 2nd row; it then calculates the correlation coefficient corr(i, j) = [S_ifft(i, j)/del(i, j)]*10000/F_Δ, stores it in the 3rd row of calculation space, and adds 1 to j, until the correlation coefficients of all columns of row i have been calculated; F_Δ is the 1/2 power of the sum of squares of the gradient values of the effective area of the matching image that takes part in the calculation;

(3) the EDMA linearly moves the correlation coefficients in the 3rd row of calculation space to external memory;

(4) the CPU checks whether i is smaller than H_s-H_rt; if so, i is incremented by 1 and the method returns to step (2); otherwise the calculation of the correlation surface is finished.
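The per-row calculation in step (2) can be sketched as below. The sketch evaluates the four-term sum exactly as the step describes and assumes the integral image carries one zero border row and column, which matches the stated buffer width W_s+1. The per-pixel quantity being integrated, the guard against a zero denominator, and the integer evaluation order are assumptions, and the function names are hypothetical.

```python
def integral_image(values):
    """Integral image with a zero border: integ[y][x] sums values[:y][:x]."""
    h, w = len(values), len(values[0])
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        run = 0
        for x in range(w):
            run += values[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + run
    return integ

def correlation_row(s_ifft_row, integ_top, integ_bottom, w_rt, f_delta):
    """Compute corr(i, j) for one row i of the correlation surface.

    integ_top    : row i of the integral image        (1st row of calculation space)
    integ_bottom : row i + H_rt of the integral image (2nd row of calculation space)
    s_ifft_row   : row i of the IFFT result           (3rd row of calculation space)
    """
    out = []
    for j, s in enumerate(s_ifft_row):
        delta = (integ_bottom[j + w_rt] + integ_top[j]
                 - integ_top[j + w_rt] - integ_bottom[j])
        # Multiply before dividing to limit truncation in integer arithmetic.
        out.append(s * 10000 // (delta * f_delta) if delta else 0)
    return out
```

Only two rows of the integral image are needed per output row, so the SRAM footprint stays at three rows regardless of the matching image height.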

Compared with the prior art, the invention has the following advantages:

(1) The invention balances the loads of the EDMA and the CPU by exploiting their respective characteristics within the DSP. When the EDMA is the bottleneck, the CPU shares the load by regrouping the data into a layout that the EDMA can move more efficiently; when the CPU is the bottleneck, the computation time is reduced by optimizing the algorithm structure and the order of operations, and the algorithm is converted to fixed point with as little loss of precision as possible.

(2) The invention uses an integral image to calculate the correlation coefficient, which avoids excessive CPU usage during that calculation; the FFT and IFFT are performed with the linear-assembly-optimized routines provided by TI, which reduces the CPU load.

Drawings

FIG. 1 is a schematic diagram of EDMA ping-pong processing of a DSP;

FIG. 2 is a schematic diagram of the load balancing processing of FFT computing resources in the image row direction according to the present invention;

FIG. 3 is a diagram of EDMA hardware resources;

FIG. 4 is a schematic diagram of the load balancing processing of FFT computing resources in the image column direction according to the present invention;

FIG. 5 is a schematic diagram of load balancing processing of IFFT operation resources in the image column direction according to the present invention;

FIG. 6 is a schematic diagram of load balancing processing of IFFT operation resources in the image row direction according to the present invention;

FIG. 7 is a schematic diagram illustrating the load balancing processing of computing resources of image correlation planes according to the present invention;

FIG. 8 is a schematic diagram of a conventional fast correlation matching calculation method.

Detailed Description

The EDMA ping-pong processing divides the working memory into two areas, A and B. While the CPU processes the data in area A, the EDMA moves the results already computed in area B to external memory and moves new data from external memory into area B. The core of resource load balancing is balancing the data processing performance against the EDMA transfer performance; the ping-pong processing is shown schematically in fig. 1. To further improve performance, the loads of the EDMA and the CPU must be balanced, which is achieved by optimizing the order of the algorithm and regrouping data according to the characteristics of the image matching computation and of the EDMA. When the EDMA is the bottleneck, the CPU shares the load by regrouping the data into a layout that the EDMA can move more efficiently; when the CPU is the bottleneck, the computation time is reduced by optimizing the algorithm structure and the order of operations, and the algorithm is converted to fixed point with as little loss of precision as possible. Balancing the EDMA data movement against the CPU computation removes the single-resource bottleneck and enables high-performance image matching with a large search range on the DSP.
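The effect of balancing can be illustrated with a simplified timing model of the ping-pong scheme. The model and its parameter values are hypothetical and only assume that, in each stage, the CPU computes one block while the EDMA writes back the previous block and loads the next one.

```python
def serial_time(n_blocks, t_in, t_cpu, t_out):
    """Total time when load, compute and write-back never overlap."""
    return n_blocks * (t_in + t_cpu + t_out)

def pingpong_time(n_blocks, t_in, t_cpu, t_out):
    """Total time when EDMA transfers overlap with CPU computation.

    Each stage lasts as long as the slower of the two resources.
    """
    total = t_in                      # prologue: load the first block
    for k in range(n_blocks):
        edma = 0
        if k >= 1:
            edma += t_out             # write back the previous block
        if k + 1 < n_blocks:
            edma += t_in              # load the next block
        total += max(t_cpu, edma)
    return total + t_out              # epilogue: write back the last block
```

With 4 blocks and a CPU time of 3 per block, an EDMA cost of 2 + 2 gives a total of 18, while an EDMA cost of 1 + 2, which matches the CPU time, gives 15: the stage time is set by the slower resource, so shifting work from the bottleneck to the idle resource shortens the total.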

The invention provides a DSP image matching method based on an operation resource load balancing technology, which mainly comprises the following steps:

1) FFT row transformation of the matching image and the reference image

The two-dimensional FFT of an image can be decomposed into one-dimensional FFTs in two directions: the image is first transformed in the row direction and then in the column direction. After the row-direction transformation is complete, the data must be stored transposed before the column-direction transformation is performed. Because the row-direction transformation relies on EDMA two-dimensional transfers to perform the transposition, and the FFT computed with the TI library function takes less time than the EDMA transfer, the EDMA is the resource that needs to be optimized.
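The decomposition can be checked numerically with a small sketch. A naive DFT stands in for the FFT library routine and all function names are hypothetical. Transforming the rows, storing the result transposed, and transforming the rows again yields the two-dimensional spectrum in transposed form.

```python
import cmath

def dft(x):
    """Naive one-dimensional DFT; stands in for an optimized FFT routine."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def dft2_rows_then_columns(image):
    """Row pass, transposed storage, then a second row pass."""
    row_pass = [dft(row) for row in image]
    stored = transpose(row_pass)          # what the transposed store produces
    return [dft(row) for row in stored]   # result[u][v], i.e. the transposed spectrum

def dft2_direct(image):
    """Two-dimensional DFT straight from the definition, indexed [v][u]."""
    h, w = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / w + v * y / h))
                 for y in range(h) for x in range(w))
             for u in range(w)]
            for v in range(h)]
```

For an image with H rows and W columns, `dft2_rows_then_columns` returns W rows of H values, each equal to the corresponding entry of the direct transform with its indices swapped.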

The FFT constrains its parameters: the length of the complex data to be transformed must be an integer power of 2. The actual image width W_r therefore has to be extended to W and the height H_r to H, as shown in fig. 2. In the figure, 1, 2, 3 and 4 denote four complex pixels; the shaded area in the SDRAM is the image actually stored, and the white area is the virtual extension of the image, for which no SDRAM space is actually allocated.
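A minimal sketch of the extension, assuming hypothetical helper names: the line buffer is allocated at the extended width and zeroed, and only the real pixels are copied in, so the padding never has to exist in external memory.

```python
def next_pow2(n):
    """Smallest power of 2 that is not less than n."""
    p = 1
    while p < n:
        p <<= 1
    return p

def load_row(pixels):
    """Return a zeroed line buffer of the extended width with the real pixels copied in."""
    line = [0j] * next_pow2(len(pixels))
    for k, v in enumerate(pixels):
        line[k] = complex(v, 0)
    return line
```

For instance, a row of 200 pixels is extended to 256 complex values, of which the last 56 remain zero.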

Because the row-direction transformation relies on EDMA two-dimensional transfers to perform the transposition, shortening the transfer time as much as possible is the key to balancing the load. As shown in fig. 3, during a two-dimensional transfer the address generation logic must update the source and destination addresses after each transfer request has been executed and then send the next transfer request to the TCC. For a fixed total amount of data, the transfer is therefore more efficient the more contiguous data each request carries, so the invention uses the CPU, which is not the constrained resource here, to regroup the data in the SRAM of the DSP and make the EDMA two-dimensional transfer more efficient. However, the more contiguous data is desired, the more SRAM space must be allocated and the higher the CPU load per iteration, so a balance point has to be found from the measured EDMA transfer performance and CPU processing performance. In this embodiment, 4 rows of data are transferred and processed at a time to balance the load of the row-direction FFT. Although the EDMA of the DSP6416 has 4 priority queues that can handle transfer tasks in parallel, all EDMA transfers ultimately access the SDRAM, so there is no need to assign different priorities to different channels.
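The trade-off can be made concrete with a toy cost model. The model and its numbers are purely hypothetical and are not measured EDMA figures; it only assumes a fixed overhead per transfer request plus a cost per element moved.

```python
def transfer_cost(total_elements, burst, request_overhead, per_element):
    """Estimated cost of moving total_elements in requests of `burst` contiguous elements."""
    requests = -(-total_elements // burst)   # ceiling division
    return requests * request_overhead + total_elements * per_element
```

With an overhead of 10 units per request and 1 unit per element, moving 1024 elements one at a time costs 11264 units, while moving them 4 at a time costs 3584 units. Larger bursts would reduce the cost further but, as noted above, require more SRAM and more CPU regrouping work per iteration.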

The software implementation of the load-balanced row-direction FFT is shown in FIG. 2; the steps are as follows:

(2) the EDMA linearly moves 4 rows of image data of the matching image or the reference image from external memory into the SRAM; each of the four rows has width W_r and does not include the virtual extension data;

(3) an FFT is performed on each row of data and the transformation results are stored back in the 4 rows of calculation space; the FFT is computed by the CPU and the data movement is carried out by the EDMA, so the loads of the CPU and the EDMA are balanced;

(4) the row transformation results are regrouped so that the complex data adjacent within each column become adjacent within a row; the CPU uses its idle time after the FFT for this regrouping, which improves the performance of the EDMA transfer to external memory, reduces the EDMA load, and balances the load;

(5) the EDMA moves the regrouped data in the SRAM to external memory, 4 complex data at a time, arranged row by row;

(6) steps (1) to (5) are repeated until all H_r rows of the actual image of the matching image or the reference image have been processed, giving an effective transformation result of W rows and H_r columns.

2) FFT column transformation of the matching image and the reference image

A normalization, i.e. division by W × H, is required after the FFT of the reference image, and the FFT result is of integer type. To preserve calculation accuracy, the reference image is divided only by W after its FFT. Because the result is subsequently multiplied by the matching image FFT result, the division by H can be applied during the FFT of the matching image, which preserves the accuracy of the fixed-point processing during the FFT.

In the column-direction FFT the EDMA no longer needs two-dimensional transfers, while a division is required after the FFT, so the CPU is the resource that needs to be optimized. The CPU overhead should therefore be reduced as much as possible. Since W is an integer power of 2, the division can be implemented as a shift, which further reduces the CPU load and brings the loads as close to balance as possible.

The software implementation of the load-balanced column-direction FFT is shown in FIG. 4; the steps are as follows:

(1) one row of calculation space is allocated in the SRAM of the DSP and initialized to all zeros;

(2) the EDMA linearly moves H_r complex data of the FFT row transformation result of the matching image or the reference image from external memory into the SRAM;

(4) the CPU performs the division by shifting: the matching image and the reference image are divided by H and W respectively, where H is the extended height and W is the extended width of the reference image in external memory; implementing the division as a shift reduces the CPU load;

(5) the EDMA moves the division result to external memory, transferring all the complex data in a single transfer;

(6) steps (1) to (5) are repeated until all W rows of data of the matching image or the reference image have been processed, so that the CPU and the EDMA run in parallel with balanced loads.

The effective transformation result has W rows and H columns. It does not need to be transposed back, because a transposition is performed again during the subsequent IFFT, so the final result is untransposed. Since both the matching image and the reference image are stored transposed after the FFT, the result of the subsequent complex multiplication is also transposed. The IFFT therefore effectively starts with the column direction, and storing the column-direction IFFT result transposed yields the image convolution result in its normal orientation.

3) Complex multiplication of the matching image FFT result and the reference image FFT result, followed by the IFFT

(3) the CPU performs a complex multiplication of the two data in each column and stores the product at that column's position in the first row;

(5) steps (1) to (4) are repeated until all W rows of data have been processed;

(6) the column-direction IFFT is performed as shown in fig. 5. The SDRAM image in the lower left corner of fig. 5 is the FFT result of fig. 4; 1, 2, 3 and 4 in the horizontal direction have the same meaning as in fig. 4, and 1, 2, 3 and 4 in the vertical direction denote elements adjacent in the row direction. The transformation result shows that the data is restored to the normal order it had before the FFT.

6.2 the EDMA linearly moves 4 rows of complex multiplication result data from external memory into the SRAM, each row having width H;

6.3 an IFFT is performed on each row of data and the transformation results are stored back in the 4 rows of calculation space; the IFFT is computed by the CPU and the data movement is carried out by the EDMA, so the loads of the CPU and the EDMA are balanced;

6.4 the column-direction transformation results are regrouped so that the complex data adjacent within each column become adjacent within a row; the CPU uses its idle time after the IFFT for this regrouping, which improves the performance of the EDMA transfer to external memory, reduces the EDMA load, and balances the load;

6.6 steps 6.1 to 6.5 are repeated until all W rows of the complex multiplication result have been processed, giving an effective transformation result of H rows and W columns.

(7) The row-direction IFFT is performed as shown in fig. 6:

7.1 one row of calculation space is allocated in the SRAM of the DSP and initialized to all zeros;

7.4 the EDMA moves the IFFT result to external memory, transferring all the complex data in a single transfer;

4) Matching image and reference image correlation surface calculation

Correlation value calculation formula

F_Δ is the 1/2 power of the sum of squares of the gradient values of the effective area of the matching image that takes part in the calculation; it is computed each time a matching image is received. del(i, j) is the 1/2 power of the sum of squares of the gradient values of the reference image region that is correlated with the matching image. For fast calculation, the integral image of the reference image can be loaded together with the reference image before launch. To balance the load of the operation resources, the CPU load must be reduced while the calculation accuracy is preserved, so the formula is converted to fixed point:

corr(i, j) = [S_ifft(i, j)/del(i, j)]*10000/F_Δ

The load-balanced calculation of the correlation surface is shown in fig. 7; the position of the maximum correlation coefficient in the correlation surface is the matching position. In fig. 7, the shaded area is the data and results used in the calculation, the dark area is the data used to calculate the correlation values of the current row, the black area indicates the size of the matching image, and W_rt and H_rt are the width and height of the matching image.

(1) Three rows of calculation space are allocated in the SRAM of the DSP, where the first and second rows have width W_s+1 and the third row has width W_s-W_rt+1, and the first and second rows are initialized to all zeros; W_s is the width of the actual reference image and W_rt is the width of the actual matching image; let i = 0 and j = 0;

(2) the EDMA linearly moves row i of the IFFT result from external memory into the third row of calculation space, the datum in column j being S_ifft(i, j), where i ranges from 0 to H_s-H_rt and j ranges from 0 to W_s-W_rt; the EDMA linearly moves rows i and i+H_rt of the reference image integral image stored in external memory into the 1st and 2nd rows of calculation space;

the CPU calculates del(i, j) by adding datum j+W_rt of the 2nd row to datum j of the 1st row, then subtracting datum j+W_rt of the 1st row and datum j of the 2nd row; it then calculates the correlation coefficient corr(i, j) = [S_ifft(i, j)/del(i, j)]*10000/F_Δ, stores it in the 3rd row of calculation space, and adds 1 to j, until the correlation coefficients of all columns of row i have been calculated;

(4) the CPU checks whether i is smaller than H_s-H_rt; if so, i is incremented by 1 and the method returns to step (2); otherwise the calculation of the correlation surface is complete and the image matching is finished.
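Once the correlation surface is complete, the matching position is the location of its maximum. A minimal sketch, assuming the surface is available as a list of rows and using a hypothetical `find_match` helper:

```python
def find_match(surface):
    """Return (i, j, value) of the first maximum of the correlation surface."""
    best_i, best_j, best_val = 0, 0, surface[0][0]
    for i, row in enumerate(surface):
        for j, value in enumerate(row):
            if value > best_val:
                best_i, best_j, best_val = i, j, value
    return best_i, best_j, best_val
```

In a row-by-row implementation the running maximum could also be updated as each row is produced, which avoids a second pass over external memory; that variant is not described in the text and is mentioned only as a design option.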

Compared with the conventional matching calculation under the EDMA ping-pong architecture, the image matching method of the invention improves performance and reduces the calculation time by more than 20%.

The above description is only for the best mode of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Those skilled in the art will appreciate that the invention may be practiced without these specific details.

Claims

1. A DSP image matching method based on an operation resource load balancing technology is characterized by comprising the following steps:

4) calculating a correlation surface of the matched image and the reference image; the method comprises the following steps:

(4.1) opening up 3 lines of calculation space in the SRAM of the DSP, wherein the width of the first line and the second line is W _ s +1, the width of the third line is W _ s-W _ rt +1, and the calculation space is initialized to be all zero, W _ s is the width of the actual image of the reference image, and W _ rt is the width of the actual image of the matching image; let i equal to 0, j equal to 0;

(4.2) EDMA linearly moving ith row of data of IFFT result in external memory to 3 rd row of calculation space, where j column number is S_ifft(i, j), wherein the value range of i is 0-H _ s-H _ rt, and the value range of j is 0-W _ s-W _ rt; the EDMA linearly moves the ith line and the (i + H _ rt) line of the datum map integral image data stored in the external memory to the calculation spaces of the 1 st and the 2 nd lines;

calculating del (i, j) by the CPU, adding the j + W _ rt data of the 2 nd row calculation space with the j data of the 1 st row, subtracting the j + W _ rt data of the 1 st row, and subtracting the j data of the 2 nd row calculation space to obtain del (i, j); calculating correlation coefficient corr (i, j) ═ S_ifft(i,j)/del(i,j)]*10000/F_ΔStoring the data in a 3 rd row calculation space until the correlation coefficient operation of all the column data of the ith row is completed; wherein F_ΔThe square sum of the gradient values of the effective areas of the matching image participating in the calculation is 1/2 th power;

(4.3) the EDMA linearly moves the correlation coefficient results in the 3rd row of the calculation space to external memory;

and (4.4) the CPU judges whether i is smaller than H_s - H_rt; if so, i is incremented by 1 and the method returns to step (4.2); if not, the calculation of the correlation surface is finished.
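
The row-by-row loop of steps (4.1) to (4.4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DSP implementation: plain lists stand in for the three SRAM rows, the EDMA transfers are modeled as list copies, and the names `correlation_surface`, `integral_image`, `s_ifft`, `integral` and `f_delta` are introduced here for illustration only. The integral image is assumed to carry one leading row and column of zeros, which is consistent with the row width W_s + 1 given in step (4.1). The handling of a zero window sum is also an assumption; the claim does not address it.

```python
def integral_image(img):
    """Build an integral image with a leading zero row and column (assumed layout)."""
    h, w = len(img), len(img[0])
    out = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            out[y + 1][x + 1] = (img[y][x] + out[y][x + 1]
                                 + out[y + 1][x] - out[y][x])
    return out


def correlation_surface(s_ifft, integral, w_rt, h_rt, f_delta):
    """Compute corr(i, j) one row at a time, mirroring steps (4.1)-(4.4).

    s_ifft   : IFFT result, (H_s - H_rt + 1) rows of (W_s - W_rt + 1) values
    integral : integral image data of the reference image, (H_s + 1) x (W_s + 1)
    f_delta  : F_delta, the square root of the sum of squared gradient values
    """
    n_cols = len(s_ifft[0])
    surface = []
    for i in range(len(s_ifft)):
        # Step (4.2): list copies stand in for the EDMA transfers into SRAM.
        row1 = list(integral[i])           # integral-image row i
        row2 = list(integral[i + h_rt])    # integral-image row i + H_rt
        row3 = list(s_ifft[i])             # IFFT result row i
        for j in range(n_cols):
            # del(i, j): window sum from four integral-image entries.
            d = row2[j + w_rt] + row1[j] - row1[j + w_rt] - row2[j]
            # Assumption: a zero window sum produces 0 rather than a division error.
            row3[j] = (row3[j] / d) * 10000 / f_delta if d != 0 else 0.0
        # Step (4.3): appending stands in for the EDMA transfer to external memory.
        surface.append(row3)
    return surface
```

Only three rows are live at any time, so the working set stays within the small SRAM buffer regardless of the image height.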

2. The DSP image matching method based on an operation resource load balancing technology according to claim 1, wherein the method of performing the FFT row transformation on the matching image and the reference image in step 1) is:

(1.1) opening up 4 rows of calculation space in SRAM of DSP, and initializing to be all zero;

(1.2) the EDMA transfers 4 rows of image data of the matching image or the reference image from external memory into the SRAM, wherein the image data has a width of W_r and a height of H_r;

(1.3) performing an FFT on each row of data to obtain a transformation result, and storing the transformation result in the 4 rows of calculation space;

(1.4) rearranging the row transformation result so that the complex data adjacent within each column become adjacent within a row, arranged in sequence;

(1.5) the EDMA moves the data rearranged in step (1.4) to external memory, moving 4 complex data at a time and arranging them row by row;

(1.6) repeating steps (1.1) to (1.5) until all rows of the actual image of the matching image or the reference image have been processed.
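
The rearrangement in steps (1.4) and (1.5) can be illustrated with a short Python sketch. It assumes that the intent is a blockwise transpose: the CPU interleaves the 4 transformed rows so that the 4 values of each column sit next to each other, and each group of 4 is then written to its own row of external memory. The row FFT itself is omitted, and the names `interleave_rows` and `store_block` as well as the list-of-lists model of external memory are hypothetical.

```python
def interleave_rows(block):
    """Step (1.4): make column-adjacent values memory-adjacent.

    `block` holds 4 rows of equal width; the result is the flat sequence
    r0[0], r1[0], r2[0], r3[0], r0[1], r1[1], ...
    """
    return [row[c] for c in range(len(block[0])) for row in block]


def store_block(ext, flat, block_index, rows_per_block=4):
    """Step (1.5): write each group of 4 values to row c of `ext`.

    `ext` models external memory as one list per image column; the group for
    column c lands at offset rows_per_block * block_index within that row.
    """
    n_cols = len(flat) // rows_per_block
    start = block_index * rows_per_block
    for c in range(n_cols):
        group = flat[c * rows_per_block:(c + 1) * rows_per_block]
        ext[c][start:start + rows_per_block] = group
```

Under these assumptions each transfer moves 4 contiguous complex values instead of a single one, which is the kind of CPU-side regrouping the abstract describes for relieving the EDMA when it is the bottleneck.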

3. The DSP image matching method based on computation resource load balancing technique according to claim 2, characterized in that the method of FFT column transformation of the row transformation result of step 1) in step 2) is as follows:

(2.1) opening up 1 line of calculation space in the SRAM of the DSP and initializing to be all zero;

(2.2) the EDMA transfers the first H_r complex data in one row of the FFT row transformation result data of the matching image or the reference image from external memory to the SRAM;

(2.3) the CPU carries out FFT conversion on the data in the SRAM to obtain a conversion result and stores the conversion result in the row of calculation space;

(2.4) the CPU performs the division operation using shift calculation, the data of the matching image and of the reference image being divided by H and W respectively, where H is the expanded height of the matching image and the reference image in external memory and W is the expanded width of the matching image and the reference image in external memory;

(2.5) the EDMA transfers the division operation result to an external memory;

and (2.6) repeating steps (2.1) to (2.5) until all W rows of data of the matching image or the reference image have been calculated.
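
The shift-based division of step (2.4) can be sketched as follows. The sketch assumes that the data are fixed-point integers and that the expanded sizes H and W are powers of two, so that dividing by the size reduces to a right shift by log2 of the size; neither assumption is stated in the claim itself, and the helper names `shift_for` and `scale_by_shift` are hypothetical.

```python
def shift_for(n):
    """Return log2(n) for a power-of-two size n; reject anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError("size must be a power of two")
    return n.bit_length() - 1


def scale_by_shift(re, im, n):
    """Divide the real and imaginary parts by n using an arithmetic right shift.

    Note: a right shift rounds toward negative infinity, so -5 >> 1 is -3.
    """
    s = shift_for(n)
    return [v >> s for v in re], [v >> s for v in im]
```

For example, `scale_by_shift([256, -5], [64, 7], 2)` halves every component with a single shift per value, avoiding a division instruction in the inner loop.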

4. The DSP image matching method based on an operation resource load balancing technology according to claim 3, wherein the specific method of step 3), performing a complex multiplication operation on the matching image FFT transformation result and the reference image FFT transformation result and then performing an IFFT, is:

(3.1) opening up two lines of calculation space in the SRAM of the DSP, and initializing to be all zero;

(3.2) the EDMA linearly moves one row of the matching image FFT transformation result and one row of the reference image FFT transformation result from external memory to the SRAM and stores them in the two rows of calculation space respectively;

(3.3) the CPU performs a complex multiplication on the two data values in each column and stores the product at the corresponding column position of the first row;

(3.4) the EDMA linearly moves the result of the complex multiplication to external memory;

(3.5) repeating steps (3.1) to (3.4) until all W rows of data have been calculated;

(3.6) performing column direction IFFT of the complex multiplication operation result;

(3.7) performing IFFT on the complex multiplication result in the row direction.
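
Step (3.3) can be illustrated with a minimal Python sketch that multiplies two rows column by column and overwrites the first row with the product. Complex values are modeled as (real, imaginary) integer pairs, which is an assumption made here to resemble fixed-point storage; the function name `multiply_rows_in_place` is hypothetical. The claim does not state whether one spectrum is conjugated beforehand, so the sketch uses the plain complex product.

```python
def multiply_rows_in_place(row_a, row_b):
    """Per-column complex product of two rows; the result overwrites row_a.

    Each element is a (real, imag) pair. Writing into row_a mirrors the claim's
    reuse of the first row of calculation space for the product.
    """
    for col in range(len(row_a)):
        ar, ai = row_a[col]
        br, bi = row_b[col]
        # (ar + j*ai) * (br + j*bi)
        row_a[col] = (ar * br - ai * bi, ar * bi + ai * br)
    return row_a
```

Storing the product in the first row means the two-row buffer opened in step (3.1) is sufficient and no third row is needed for the output.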

5. The DSP image matching method based on operation resource load balancing technology as claimed in claim 4, wherein the method of performing column direction IFFT of complex multiplication operation result is:

6.6 repeat steps 6.1-6.5 until the complex multiplication is completed.

6. The DSP image matching method based on operation resource load balancing technology according to claim 5, wherein the specific method for performing the row direction IFFT of the complex multiplication operation result is as follows:

7.4 the EDMA transfers the IFFT operation result to external memory;