CN110781447B

CN110781447B - DDR-based high-efficiency matrix transposition processing method

Info

Publication number: CN110781447B
Application number: CN201910996540.2A
Authority: CN
Inventors: 张为; 李欣桐
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-10-19
Filing date: 2019-10-19
Publication date: 2023-04-07
Anticipated expiration: 2039-10-19
Also published as: CN110781447A

Abstract

The invention discloses a DDR-based high-efficiency matrix transposition processing method in which a write RAM and a read RAM are paired with the IP core of a DDR3 SDRAM. The matrix to be transposed is a 128 × 128 matrix whose elements are 64 bits each, and the data of each row of the matrix are treated as one small matrix. During the write operation, 16 data are written per active signal; during the read operation, 8 × 16 data are read per active signal. Within the DDR, data are read out by cyclically skipping between positions in the same row, so that row active signals occur as rarely as possible. The invention addresses the drop in overall system processing speed caused by row-skipping accesses to the DDR SDRAM when transposing large matrices. Through matrix blocking, the read rate is greatly improved at the cost of a small reduction in the write rate, which balances the DDR read and write rates during transposition and thereby improves the average read-write efficiency of the DDR.

Description

DDR-based high-efficiency matrix transposition processing method

Technical Field

The invention belongs to the field of data transposition in radar imaging technology, and relates to a processing method for quickly transposing large volumes of data during radar imaging.

Background

Synthetic aperture radar (SAR) is a radar with high-resolution imaging capability. Compared with conventional radar, SAR can operate around the clock in all weather conditions and can penetrate obstructions. It plays an important role in environmental monitoring, terrain reconnaissance, target identification, and similar fields. SAR is mainly carried on three kinds of platform: airborne, satellite-borne, and missile-borne. A focused SAR image is obtained by algorithmically processing the echo signal. Conventional SAR imaging systems are structurally complex and expensive to build, and they are also heavy and bulky. With the rapid development abroad of unmanned aerial vehicles and small, light aircraft, imaging processing for miniature SAR has attracted wide attention. Because SAR echo processing involves a large amount of data, it places high demands on fast data transfer and processing. In the SAR echo data processing flow, the fast Fourier transform (FFT) and matrix transposition are two basic operations that are used frequently.

Miniature SAR imaging processing can be implemented on either an FPGA or an ASIC. Compared with the high speed and efficiency that an ASIC derives from its specialization, an FPGA is relatively weaker in this respect. However, a dedicated ASIC has a long development cycle and can only be replaced as a whole when an update is needed. The advantages of the FPGA are its flexibility in programming and development and the ease of upgrading algorithms in a timely manner.

In general, conventional transposition is very simple to implement in software, and it is also easy to optimize there. When transposition is implemented in hardware, however, it becomes slow and consumes a large share of resources unless special methods are used. This is unacceptable for a real-time SAR imaging system that must process large volumes of data. Domestic researchers have therefore studied the problem and achieved some results. High-capacity, high-speed data transfer is realized with a DDR3 SDRAM core mounted on an FPGA platform, and the transposition is completed during the transfer. Studying transposition methods that exploit the double-data-rate read and write characteristics of the DDR family has greatly improved the overall processing efficiency of SAR. The main algorithms for matrix transposition in real-time SAR imaging systems are row-out (as shown in FIG. 1), column-out, pipeline balancing, and blocking. With the conventional row-out or column-out method, the transfer rate drops sharply at each row skip because of the row activation time inside the DDR.

In previous research, Zhou et al. (2013) adopted an address-mapping algorithm combined with the idea of dividing a large matrix into sub-blocks. Using the idea of storing data across DDR page changes, they stored the matrices in two ways, namely along the same distance direction and along the same direction, and completed the transposition by computing offset addresses during readout. Wu et al. (2017) proposed a basic model of block transposition: a standard 128 × 128 square matrix is divided into 16 × 8 small matrices, and data are written in groups and read out by cyclic skipping. This improves read efficiency while preserving write efficiency, balances the read and write efficiency of the DDR3 to some extent, and thereby improves overall transposition efficiency.

Disclosure of Invention

The invention addresses the low read efficiency caused by DDR row activation (active) signals when a conventional transposition method is used to transpose large amounts of data in a real-time SAR imaging system running on an FPGA. In addition, 64-bit check bits are introduced to verify the correctness of the result after matrix transposition.

In order to solve the above technical problem, the present invention provides a DDR-based high-efficiency matrix transposition processing method, which uses the IP core of a DDR3 SDRAM, pairs a write RAM and a read RAM with the IP core, and comprises:

step one, the matrix to be transposed is a 128 × 128 matrix whose elements are 64 bits each, and it is divided into 128 small matrices of 8 × 16, i.e., the data of each row form one small matrix;

step two, during the write operation, 16 data are written per active signal;

step three, during the read operation, 8 × 16 data are read per active signal; during reading, the address is controlled by the controller, and data are read out of the DDR by cyclically skipping between positions in the same row, so that row active signals occur as rarely as possible.

Further, in step two of the matrix transposition processing method of the present invention, data first enter the write RAM, and once the write RAM has been completely filled, writing to the DDR begins. The DDR address bus simply increments by the burst length, so the address changes that matter are those of the read address of the write RAM. In the write RAM, each time 16 data have been read, reading switches to the next row and continues with the corresponding 16 data; reading 8 rows in this way forms one small cycle. Reading then returns to the first of the 8 rows and starts on the next group of 16 data, and this logic repeats until the current 8 rows have been read completely, after which reading of the next group of 8 rows begins. Each completion of 8 groups of data is one large cycle. Through these large and small cycles, 16 large groups are finally read, each comprising 8 small groups.
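The readout order of the write RAM described above can be sketched as nested loops. This is only an illustrative model, assuming the matrix is held row-major in the write RAM; the function name `write_ram_read_order` and its parameters are hypothetical and do not come from the patent.

```python
def write_ram_read_order(n=128, rows_per_group=8, run=16):
    """Yield (row, col) pairs in the order data leave the write RAM.

    Assumption: the matrix is n x n; rows are handled in groups of 8 and
    each visit to a row reads a run of 16 consecutive data.
    """
    for group in range(n // rows_per_group):      # 16 large groups of 8 rows
        for chunk in range(n // run):             # 8 small groups per large group
            for r in range(rows_per_group):       # small cycle over the 8 rows
                row = group * rows_per_group + r
                for k in range(run):              # 16 consecutive data in that row
                    yield row, chunk * run + k


order = list(write_ram_read_order())
# After 16 data the reader moves down one row; after 8 rows it returns to
# the first row and takes the next 16 columns; after 1024 data it moves to
# the next group of 8 rows.
print(order[0], order[16], order[128], order[1024])
```

Because the DDR address only increments during this phase, the position of a datum in `order` is also its position in the DDR, counted in data.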

In step three of the matrix transposition processing method, each single cycle of skips within a DDR row reads 8 data, whose addresses in the read RAM increase sequentially, and each DDR row is read in 16 such cycles. The reading proceeds as follows. Each time one datum is read from a row, the address jumps to the corresponding position in the next small block, i.e., it skips over the 15 data in between. After the 8th datum of the cycle has been read, the DDR address pointer moves to the first datum in the first small block of the row that has not yet been read, and the single-row cycle is repeated; at the same time, the address pointer of the read RAM moves to the corresponding position in the next row. This is repeated until the 16th cycle is complete, whereupon the DDR address switches to the start of the next DDR row and the read RAM address pointer moves to the corresponding position in the next row. All of the preceding steps are repeated until 8 rows of data have been read from the DDR, at which point the read RAM address pointer wraps around to the corresponding position in the first row of the matrix. This cycle repeats until all data have been read from the DDR.

After all data have been written into the read RAM, the read state ends and the controller can verify the data. The verification uses the property that diagonal elements do not move under transposition: the high 64 bits (the check bits) of a diagonal datum are differenced with those of the two data to be verified, and if the difference between the check bits of the diagonal datum and the check bits of one of the two data equals the check bits of the other, the two data are in the correct positions. The check result is output to a register, with 1 indicating correct and 0 indicating an error, so that errors can easily be found in a later report.

Compared with the prior art, the invention has the beneficial effects that:

The invention solves the problem that, when a field-programmable gate array (FPGA) serves as the main processing chip for radar signal processing, transposing large matrices reduces the overall processing speed of the system because of row-skipping accesses to the double data rate synchronous dynamic random access memory (DDR SDRAM). Through matrix blocking, the read rate is greatly improved at the cost of a small reduction in the write rate, which balances the DDR read and write rates during transposition and thereby improves the average read-write efficiency of the DDR.

Drawings

FIG. 1 shows the DDR3 read-write sequence of a conventional matrix transposition;

FIG. 2 is a flow chart of the block-wise transposition of an embodiment of the present invention;

FIG. 3 shows the block-wise DDR3 matrix transposition write sequence of an embodiment of the present invention;

FIG. 4 shows the block-wise DDR3 matrix transposition read sequence of the present invention.

Detailed Description

The invention will be further described with reference to the following figures and specific examples, which are not intended to limit the invention in any way.

The invention provides a DDR-based high-efficiency matrix transposition processing method that uses the IP core of a DDR3 SDRAM and pairs a write RAM and a read RAM with the IP core. The matrix to be transposed is a 128 × 128 matrix whose elements are 64 bits each, and it is divided into 128 small matrices of 8 × 16, i.e., the data of each row form one small matrix. During the write operation, 16 data are written per active signal; during the read operation, 8 × 16 data are read per active signal. During reading, the address is controlled by the controller, and data are read out of the DDR by cyclically skipping between positions in the same row, so that row active signals occur as rarely as possible.
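The per-activation figures above translate into activation counts for the whole matrix by simple division. The short sketch below works this out. The last figure, one activation per datum for a conventional column-wise readout, is an assumed worst case used only for comparison and is not a number stated in the text.

```python
N = 128
TOTAL_DATA = N * N                  # 16384 data in the matrix

WRITE_PER_ACTIVE = 16               # data written per active signal
READ_PER_ACTIVE = 8 * 16            # data read per active signal

write_activations = TOTAL_DATA // WRITE_PER_ACTIVE   # 1024
read_activations = TOTAL_DATA // READ_PER_ACTIVE     # 128

# Assumption (not from the text): a conventional column-wise readout that
# changes DDR row on every access needs one activation per datum.
assumed_conventional_read_activations = TOTAL_DATA   # 16384

print(write_activations, read_activations, assumed_conventional_read_activations)
```

Under that assumption, the blocked readout needs 128 times fewer row activations than the conventional readout, while the write side pays for it with 1024 activations.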

The workflow of the matrix transposition processing method of the present invention is shown in FIG. 2 and is described in detail below with reference to FIG. 2.

(1) The original data are spliced: the data counted by the data counting module are numbered, and the numbers form a 64-bit sequence. The sequence number corresponding to each datum is spliced with that datum, combining the two into a 128-bit word to be transmitted. The 64-bit sequence number is added because, once the matrix transposition is complete, the large amount of data makes it inconvenient to verify that the transposition is correct. Logical verification of the 64-bit check bits allows the correctness of the whole module to be verified quickly and conveniently. Once the module has passed this verification the first time, the verification step can be dropped, which makes the overall process simpler and more efficient.
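The splicing step can be illustrated with plain integer arithmetic. The sketch below assumes the sequence number occupies the high 64 bits of the 128-bit word and that the counter starts at 0. The starting value and the helper names `splice`, `split` and `splice_stream` are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


def splice(seq, data):
    """Combine a 64-bit sequence number (high half) with a 64-bit datum (low half)."""
    return ((seq & MASK64) << 64) | (data & MASK64)


def split(word):
    """Return (sequence number, datum) from a 128-bit spliced word."""
    return word >> 64, word & MASK64


def splice_stream(samples):
    """Number the samples in arrival order; the counter advances by 1 per splice."""
    return [splice(count, sample) for count, sample in enumerate(samples)]


words = splice_stream([0xAAAA, 0xBBBB, 0xCCCC])
print([split(w) for w in words])
```

Dropping the check bits after verification then amounts to keeping only the second element returned by `split`.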

(2) After the check bits have been spliced, the data enter the write RAM, and once the write RAM has been completely filled, writing to the DDR can begin. While data are transferred from the write RAM to the DDR, the DDR address bus only needs to increment continuously by the burst length. The address changes that matter are those of the read address of the write RAM. In the write RAM, each time 16 data have been read, reading switches to the next row and continues with the corresponding 16 data. Each time 8 rows have been completed in this way, reading loops back to the first of the 8 rows and starts on the next group of 16 data. This logic repeats until the current 8 rows have been read completely, after which reading of the next group of 8 rows begins, and the cycle continues. With these large and small cycles there are finally 16 large groups, each containing 8 small groups, which corresponds to the blocking strategy described above of dividing the 128 × 128 matrix into 16 × 8.

(3) Once all data have been written, the write state is complete, a write-completion signal is issued, and the DDR controller switches the DDR to the read state. In the read state, the invention aims to greatly reduce the occurrence of row active signals, so frequent row changes are avoided as far as possible while reading the DDR. In the DDR, each single cycle of skips within a row reads 8 data, whose addresses in the read RAM increase sequentially, and each DDR row is read in 16 such cycles. The reading proceeds as follows. Each time one datum is read from a row, the address jumps to the corresponding position in the next small block, i.e., it skips over the 15 data in between. After the 8th datum of the current cycle has been read, the DDR address pointer moves to the first datum in the first small block of the current row that has not yet been read, and the single-row cycle is repeated; at the same time, the address pointer of the read RAM moves to the corresponding position in the next row. This is repeated until the 16th cycle is complete, whereupon the DDR address switches to the start of the next DDR row and the read RAM address pointer moves to the corresponding position in the next row. All of the preceding steps are repeated until 8 rows of data have been read from the DDR, at which point the read RAM address pointer wraps around to the corresponding position in the first row of the matrix. This cycle repeats until all data have been read from the DDR.
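The skip pattern inside a single DDR row can be sketched as follows. Offsets are counted in data rather than in DDR address units, and the row is modelled as 8 small blocks of 16 data. The function name `in_row_read_offsets` is hypothetical.

```python
def in_row_read_offsets(blocks=8, block_len=16):
    """Yield offsets (in data) within one DDR row, in readout order.

    Each cycle takes one datum from every small block, so successive reads
    are 16 data apart; 16 cycles exhaust the row without leaving it.
    """
    for cycle in range(block_len):        # 16 cycles per DDR row
        for block in range(blocks):       # 8 data per cycle
            yield block * block_len + cycle


offsets = list(in_row_read_offsets())
print(offsets[:9])   # first cycle, then the start of the second cycle
```

All 128 offsets fall inside the same row, which is why a single active signal can serve 8 × 16 reads.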

(4) After all data have been written into the read RAM, the read state ends and the controller verifies the data. The verification uses the property that diagonal elements do not move under transposition: the 64-bit check bits of a diagonal datum are differenced with those of the two data to be verified. If the difference between the check bits of the diagonal datum and the check bits of one of the two data equals the check bits of the other, the two data are in the correct positions. The check result is output to a register, with 1 indicating correct and 0 indicating an error, so that erroneous positions can easily be found in a later report.

Study materials:

A Xilinx Kintex-7 series development board carrying the DDR3 SDRAM IP core is used, and a write RAM and a read RAM are paired with the IP core.

(1) The matrix to be transposed is divided into 128 small matrices of 8 × 16, with the data of each row forming one small matrix. The original data first pass through a splicing module, which splices a 64-bit counting prefix onto the high-order bits of each 64-bit original datum. The splicer contains a simple counting module that increments by 1 each time a splice is completed. The purpose is to number the data so that the correctness of the transposition can be checked conveniently once the transposition is complete.

(2) The data then enter the write RAM, whose read address is controlled by the controller module; a DDR burst length of 8 is taken as an example. In this step the DDR address is incremented by 8 each time. With reference to FIG. 3, the read address of the write RAM changes according to the following rules:

a. Each time the address has been incremented by 16, i.e., each time 16 data have been output, the address is reset to 0 and 128 × n is added. In terms of the matrix, this points the address pointer at the start of the next row. Here n denotes the number of large cycles, and the whole writing process has 16 large cycles.

b. After the corresponding 16 data in the 8th row have been read, the address is reset to 0 again and 16 × n is added. In terms of the matrix, this pulls the address pointer back from the 8th row to the 1st row, starting from the first datum of the next group of 16 data in the current large group.

c. Steps a and b are repeated until 1 large group has been written. Then n is incremented by 1, the address is reset to 0, 128 × 8 × (n - 1) = 1024 × (n - 1) is added, and steps a and b are repeated starting from the first row of the next large group.

Steps a, b and c are then repeated until the whole matrix has been written from the write RAM to the DDR.
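Rules a to c can be collapsed into one closed-form expression for the write RAM read address. The sketch below is one reading of those rules: the text reuses the counter n at every level, whereas here each level has its own zero-based counter (`k`, `r`, `c`, `g`), which is an assumption. The function name is hypothetical.

```python
def write_ram_address(d):
    """Write RAM read address for the d-th datum sent to the DDR (d = 0..16383).

    Assumed zero-based counters, one per level of the rules:
      k: position within the run of 16 data
      r: row within the current group of 8 rows   (rule a, stride 128)
      c: group of 16 columns within the row       (rule b, stride 16)
      g: large group of 8 rows                    (rule c, stride 1024)
    """
    k = d % 16
    r = (d // 16) % 8
    c = (d // 128) % 8
    g = d // 1024
    return 1024 * g + 128 * r + 16 * c + k


addresses = [write_ram_address(d) for d in range(128 * 128)]
print(addresses[15], addresses[16], addresses[128], addresses[1024])
```

Every write RAM address is visited exactly once, so the write phase only reorders the data and loses none.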

(3) After the data have been written, they are read from the DDR. During this read, with reference to FIG. 4, the address in the DDR and the write address in the read RAM change as follows:

a. Each time 1 datum is read from the DDR, 128 is added to the DDR address and 1 is added to the read RAM address.

b. Each time step a has been executed 7 times, the DDR address is reset to 0 and 8 × n is added, where n denotes the number of times step a has been completed; at the same time, the read RAM address is reset to 0 and 128 × n is added, which switches to the next row. Finally, n is reset to 0, and steps a and b are repeated.

c. Each time step b has been executed 15 times, the DDR address is reset to 0 and 1024 × m is added, where m denotes the number of DDR rows whose data have been read, i.e., m is incremented by 1 each time step b has been completed 15 times; in the read RAM, the address is reset to 0 and 2048 × m is added. Steps a, b and c are then repeated.

d. Each time step c has been executed 7 times, the DDR address is reset to 0 and 8192 × p (i.e., 8 × 1024 × p) is added, where p denotes the number of large groups whose data have been read from the DDR, i.e., p is incremented by 1 each time step c has been completed 7 times; in the read RAM, the address is reset to 0 and 8 × p is added. Steps a, b and c are then repeated until all data have been read.
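The write and read address rules can be checked together in a small software model. The sketch below is not the hardware design. It assumes a row-major write RAM, one datum per 8 DDR address units (burst length 8), zero-based counters at each level, and a large-group stride of 8 × 1024 = 8192 address units, which is the value that keeps every DDR address a multiple of the burst length. All function names are hypothetical.

```python
N = 128      # matrix order
BURST = 8    # DDR address units occupied by one datum (burst length 8)


def write_ram_address(d):
    """Write RAM read address of the d-th datum written to the DDR."""
    g, c, r, k = d // 1024, (d // 128) % 8, (d // 16) % 8, d % 16
    return 1024 * g + 128 * r + 16 * c + k


def read_addresses():
    """Yield (ddr_address, read_ram_address) pairs in readout order."""
    for p in range(16):              # step d: large groups
        for m in range(8):           # step c: DDR rows within a large group
            for n in range(16):      # step b: cycles within one DDR row
                for s in range(8):   # step a: 8 data per cycle
                    ddr = 8192 * p + 1024 * m + 8 * n + 128 * s
                    ram = 8 * p + 2048 * m + 128 * n + s
                    yield ddr, ram


def transpose_via_ddr(flat):
    """Run the write phase and the read phase on a row-major list of N*N data."""
    ddr = {d * BURST: flat[write_ram_address(d)] for d in range(N * N)}
    read_ram = [None] * (N * N)
    for ddr_addr, ram_addr in read_addresses():
        read_ram[ram_addr] = ddr[ddr_addr]
    return read_ram


source = list(range(N * N))          # element value encodes row * N + col
result = transpose_via_ddr(source)
is_transpose = all(result[i * N + j] == source[j * N + i]
                   for i in range(N) for j in range(N))
print(is_transpose)
```

In this model a DDR row spans 1024 address units, so the two inner loops of `read_addresses` stay within one row for 128 consecutive reads.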

(4) After the transposition is complete, the correctness of the data arrangement is checked. The core idea is as follows. Let i > j, where i denotes a row and j a column of the matrix to be transposed, and judge whether (i, i) - (j, i) equals the upper 64 bits of (i, j). The comparison result is stored in a 15-bit register whose lower 14 bits hold the address and whose 15th bit indicates whether the values are equal: if they are equal, the transposition is correct and the bit is set to 1; otherwise it is set to 0. After the check is finished, the spliced high 64-bit check bits are removed. The main purpose of this step is to verify whether the transposition function has defects in the design, so as to ensure that the logic is correct.
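The layout of the 15-bit result register can be sketched with bit operations. The sketch covers only the packing of an address and a pass/fail flag, not the comparison itself. It assumes the "15th bit" is the most significant bit (bit index 14) and that the 14-bit address indexes the 128 × 128 = 16384 matrix positions. The helper names are hypothetical.

```python
ADDRESS_BITS = 14                       # 2**14 = 16384 = 128 * 128 positions
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack_check_result(address, ok):
    """Pack a 14-bit address and a pass/fail flag into a 15-bit word."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 14 bits")
    return (int(bool(ok)) << ADDRESS_BITS) | address


def unpack_check_result(word):
    """Return (address, flag) from a 15-bit result word."""
    return word & ADDRESS_MASK, (word >> ADDRESS_BITS) & 1


word = pack_check_result(128 * 5 + 3, True)
print(unpack_check_result(word))
```

A later report can then list the addresses whose flag is 0 to locate misplaced data.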

Although the present invention has been described in connection with the accompanying drawings, the present invention is not limited to the above-described embodiments, which are intended to be illustrative rather than restrictive, and many modifications may be made by those skilled in the art without departing from the spirit of the present invention as disclosed in the appended claims.

Claims

1. A high-efficiency matrix transposition processing method based on DDR uses an IP core of DDR3SDRAM, which is characterized in that a write RAM and a read RAM are collocated for the IP core, and the method comprises the following steps:

step two, writing 16 data in each 1 active signal in the writing operation process:

the data enters a write RAM, and after the write-in of the write RAM is completed completely, the data write-in of the DDR is started;

in the process of reading from the write RAM to the DDR, the address bus of the DDR increases progressively according to the burst length, wherein the change of the address bit is the change of the read address bit of the write RAM; in the write RAM, switching to the next row and continuously reading the corresponding 16 data every time the reading of the 16 data is completed; according to the logic, each time 8 rows of data are read, a small cycle is formed; then, returning to the first row in the 8 rows, starting reading of the next group of 16 data, sequentially executing the above logics until the current 8 rows of data are completely read, starting reading of the next group of 8 rows of data, and performing a cycle operation, wherein each time the reading of 8 groups of data is completed is a major cycle; according to the large circulation and the small circulation, the data reading of 16 large groups is finally completed, and each large group comprises 8 small groups;

the DDR burst length is 8, and the data writing process of the DDR is as follows:

a) In the write RAM, each time the address has been incremented by 16, i.e., each time 16 data have been output, the address is reset to 0 and 128 × n is added; in terms of the matrix, this points the address pointer at the start of the next row, where n denotes the number of the current cycle, and the whole writing process comprises 16 cycles;

b) After the corresponding 16 data in the 8th row have been read, the address is reset to 0 again and 16 × n is added; in terms of the matrix, this pulls the address pointer back from the 8th row to the 1st row, starting from the first datum of the next group of 16 data in the current large group;

c) Steps a) to b) are repeated until 1 large group has been written;

d) n is incremented by 1, the address is reset to 0, 128 × 8 × (n - 1) = 1024 × (n - 1) is added, and steps a) to b) are repeated starting from the first row of the next large group, until the whole matrix has been written from the write RAM to the DDR;

reading 8 × 16 data per 1 active signal in the reading operation process; during reading, the address is regulated and controlled by the controller, and the data is read out in DDR according to the rule that data is read out in a cycle skip mode between the same lines, so that the occurrence of line active signals is reduced as much as possible.

2. The method as claimed in claim 1, wherein in step three, each single cycle of skips within a DDR row reads 8 data, whose addresses in the read RAM increase sequentially, and each DDR row is read in 16 such cycles in total; the reading proceeds as follows: each time one datum is read from a row, the address jumps to the corresponding position in the next small block, i.e., it skips over the 15 data in between; after the 8th datum of the current cycle has been read, the DDR address pointer moves to the first datum in the first small block of the current row that has not yet been read, and the single-row cycle is repeated, while the address pointer of the read RAM moves to the corresponding position in the next row; this is repeated until the 16th cycle is complete, whereupon the DDR address switches to the start of the next DDR row and the read RAM address pointer moves to the corresponding position in the next row; all of the preceding steps are repeated until 8 rows of data have been read from the DDR, at which point the read RAM address pointer wraps around to the corresponding position in the first row of the matrix; this cycle repeats until all data have been read from the DDR.

3. The DDR-based high efficiency matrix transpose processing method of claim 2, wherein the DDR burst length is 8, and the detailed procedure for reading from the DDR is as follows:

a) Each time 1 datum is read from the DDR, 128 is added to the DDR address and 1 is added to the read RAM address;

b) Each time step a) has been executed 7 times, the DDR address is reset to 0 and 8 × n is added, where n denotes the number of times step a) has been completed; at the same time, the read RAM address is reset to 0 and 128 × n is added, which switches to the next row; then n is reset to 0, and steps a) to b) are repeated;

c) Each time step b) has been executed 15 times, the DDR address is reset to 0 and 1024 × m is added, where m denotes the number of DDR rows whose data have been read, i.e., m is incremented by 1 each time step b) has been completed 15 times;

in the read RAM, the address is reset to 0 and 2048 × m is added; thereafter, steps a) to b) are repeated;

d) Each time step c) has been executed 7 times, the DDR address is reset to 0 and 8192 × p (i.e., 8 × 1024 × p) is added, where p denotes the number of large groups whose data have been read from the DDR, i.e., p is incremented by 1 each time step c) has been completed 7 times; in the read RAM, the address is reset to 0 and 8 × p is added; thereafter, steps a), b) and c) are repeated until all data have been read.

4. The DDR-based high efficiency matrix transpose processing method of claim 1, wherein after all data is written to the read RAM, the read state is complete and the controller performs data verification.

5. The DDR-based high efficiency matrix transpose processing method of claim 4, wherein the controller performs data validation by:

using the property that diagonal elements do not move under transposition, the 64-bit check bits of a diagonal datum are differenced with those of the two data to be checked, wherein if the difference between the check bits of the diagonal datum and the check bits of one of the two data equals the check bits of the other, the two data are in the correct positions; and the check result is output to a register, with 1 indicating correct and 0 indicating an error.

6. The DDR-based high efficiency matrix transpose processing method of claim 5, wherein the addresses in the DDR are 8 in burst length; setting i > j, wherein i represents a row of a matrix to be transposed, j represents a column of the matrix to be transposed, judging whether (i, i) - (j, i) is equal to the upper 64 bits of (i, j), storing a comparison result into a register with 15 bits, wherein the lower 14 bits represent an address, the 15 th bit represents whether the address is equal, if so, indicating that the transposition is correct, and setting the comparison result to be 1, otherwise, setting 0.