CN111737169B - EDMA-based implementation method of high-capacity high-speed line-row output cache structure - Google Patents
Publication number: CN111737169B (application CN202010702851.6A); Authority: CN (China); Legal status: Active (the legal status is an assumption, not a legal conclusion).

Classifications: G06F12/02 (addressing or allocation; relocation); G06F12/0802 (addressing of a memory level requiring associative addressing means, e.g. caches); G06F12/0877 (cache access modes); G06F12/0893 (caches characterised by their organisation or structure); G06F13/28 (handling requests for access to an input/output bus using burst-mode transfer, e.g. direct memory access, DMA).
Abstract
The invention discloses an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, relating to the technical field of high-speed row-column output cache structures, and comprising the following contents: an operation that stores the original data into a new data cache, and an operation that reads data out of the new data cache. The storing operation writes each row of the original data into the new data cache, row by row, in a specific pattern; the reading operation fetches the required row and/or column data corresponding to the original data from the new data cache, in a specific pattern, for subsequent use. In large-data-processing applications this gives access to every row and column of an arbitrarily large-resolution image, reducing the data-access time and greatly increasing the data-access speed.
Description
Technical Field
The invention relates to the technical field of high-speed row-column output cache structures, in particular to an implementation method of a high-capacity high-speed row-column output cache structure based on EDMA (enhanced direct memory access).
Background
In data processing, some special applications, such as common image processing and SAR radar imaging, generally require processing an arbitrary row or column of data in an image, for example performing an FFT on that row or column; before such processing, the row or column of data must first be read.
The conventional way for a processor to read data is by code (CPU load instructions). Reading externally supplied data that must be updated frequently by code is very inefficient, and for reads of bytes at non-contiguous addresses it is extremely so: throughput can fall to roughly one percent of what an EDMA read achieves.
Reading a column of data is a non-contiguous-address access, and code reading is very slow, so an EDMA strided-read mode, i.e., reading at intervals of the row length, is usually adopted. However, for strided reads with many interval points, EDMA generally supports an element count of at most 32767; that is, it supports column reads only for rows shorter than 32767 points, and column reads for longer rows cannot be achieved directly. In addition, to exploit the transfer efficiency of EDMA, the transfer length must exceed a certain size.
In applications requiring real-time processing, such as imaging and SAR radar, data arrive in real time from a camera, radar, or similar source, and a specific amount of data must generally be processed within a specified time interval. Where rows and columns must be processed together, the read/write time of row and column data and their processing time should be as equal and as controllable as possible, so that the application can run strictly on a preset schedule without interrupting the original flow. To guarantee the maximum time for algorithm computation, the data read/write time must be reduced as far as possible; that is, an efficient data read/write mechanism and cache structure are needed to guarantee efficient data access, equal row and column processing time, and a controllable application flow.
To solve these problems, an EDMA-based high-capacity high-speed row-column output cache structure and its implementation method are provided, so that in large-data-processing applications, such as SAR radar imaging, every row and column of an arbitrarily large-resolution image, including images with more than 32767 points per row, can be accessed, and the data-access time is reduced.
Disclosure of Invention
The present invention aims to provide an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, so as to solve the problems described in the background art.
To achieve this purpose, the invention provides the following technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:

an operation that stores the original data into a new data cache, and an operation that reads data out of the new data cache;

the storing operation writes each row of the original data into the new data cache, row by row, in a specific pattern;

the reading operation fetches the required row and/or column data corresponding to the original data from the new data cache, in a specific pattern, for subsequent use;

the new data cache is a memory, accessible by EDMA (enhanced direct memory access), large enough to hold a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor);

the original data matrix is an M-row, N-column matrix, and each sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767. Storing the i-th row of the original data matrix into the new data cache comprises the following steps:
S1, store the b data at addresses 1 to b of row i of the original data matrix, in order, into the b consecutive addresses of row 1+k1 of the new data cache, starting at first address b×(i−1)+1;

S2, store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, one datum every b−1 addresses (i.e., at every b-th address) of row 2+k1 of the new data cache, starting at first address ((i−1) % a)+1;

S3, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, into the b consecutive addresses of row y+k1 of the new data cache, starting at first address b×(i−1)+1, where y is odd and 3 ≤ y ≤ N/b;

S4, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, one datum every b−1 addresses of row y+k1 of the new data cache, starting at first address ((i−1) % a)+1, where y is even and 4 ≤ y ≤ N/b;

S5, alternate S3 and S4 cyclically over all values of y from small to large, then repeat the S1-to-S4 flow over all values of i from small to large, until all data of the original data matrix are stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.

Preferably, in steps S1 to S5, k1 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a; ((i−1) % a) is the remainder of (i−1) divided by a. The S1 and S3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses; the S2 and S4 flows with EDMA moves from continuous source addresses to equally spaced destination addresses.
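The S1 to S5 storing flow can be sketched in Python. This is an interpretation, not the patent's verbatim addressing: within a block of a original rows, the contiguous first address is taken as b×((i−1) % a)+1 (the text's b×(i−1)+1 matches this in the first block), and the even-y stride is taken as a so the layout also works when a ≠ b; with a = b it coincides with the text's interval of b−1.

```python
def store_to_cache(orig, M, N, a, b):
    """Reorganize an M x N matrix (list of lists) into the new data cache:
    (M//a)*(N//b) cache rows, each of a*b addresses (an interpretive sketch)."""
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):                 # original row number, 1-based
        k1 = ((i - 1) // a) * (N // b)        # cache-row block offset (patent's k1)
        r = (i - 1) % a                       # row index inside the a x b block
        for y in range(1, N // b + 1):        # sub-matrix column band index
            chunk = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:                         # S1/S3: contiguous destination
                row[r * b : r * b + b] = chunk
            else:                             # S2/S4: equally spaced destination
                for j, v in enumerate(chunk):
                    row[r + j * a] = v
    return cache

# Demo with the 4x4 sub-matrix division used in the embodiment (a = b = 4)
M, N, a, b = 8, 8, 4, 4
orig = [[row * N + col for col in range(N)] for row in range(M)]
cache = store_to_cache(orig, M, N, a, b)
print(cache[0])  # first b-chunks of original rows 1..a, stored contiguously
```

With these parameters, cache row 1 holds the first 4-element chunks of original rows 1 to 4 back to back, while cache row 2 interleaves the second chunks with stride a, which is what makes the later column reads either strided or contiguous.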
Preferably, the operation of reading data from the new data cache comprises reading a row corresponding to the original data from the reorganized new data cache, and reading a column corresponding to the original data from the reorganized new data cache.
Preferably, reading the i-th row corresponding to the original data from the reorganized new data cache comprises the following steps:

s1, fetch b consecutive data from row 1+k2 of the new data cache, starting at first address b×(i−1)+1;

s2, starting at first address ((i−1) % a)+1 of row 2+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s3, fetch b consecutive data from row 3+k2 of the new data cache, starting at first address b×(i−1)+1;

s4, starting at first address ((i−1) % a)+1 of row 4+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s5, starting from row m+k2 of the new data cache, continue the flow of s1 to s4 over the values of m from small to large until all data are fetched, where 5 ≤ m ≤ N/b and m is an integer.

Preferably, in step s5, when reading the i-th row of data corresponding to the original data from the reorganized new data cache, starting from row m+k2: in the odd steps (s1, s3) the first address of the corresponding cache row is set to address b×(i−1)+1 of that row; in the even steps (s2, s4) it is set to address ((i−1) % a)+1 of that row, where ((i−1) % a) is the remainder of (i−1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; e.g., i = 2 means row 2 corresponding to the original data is fetched from the new data cache. k2 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a. The s1 and s3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses; the s2 and s4 flows with EDMA moves from equally spaced source addresses to continuous destination addresses.
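The s1 to s5 row-read flow is the exact inverse of the storing flow, which a short Python sketch makes explicit. It reuses the same interpretive addressing as the storing sketch (offsets via (i−1) % a inside a block; even-y stride a, equal to b in this demo); `store_to_cache` and `read_row` are illustrative names, not the patent's.

```python
def store_to_cache(orig, M, N, a, b):
    """S1-S5 (interpretive sketch): reorganize M x N data into the cache."""
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            chunk = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:
                row[r * b : r * b + b] = chunk
            else:
                for j, v in enumerate(chunk):
                    row[r + j * a] = v
    return cache

def read_row(cache, i, M, N, a, b):
    """s1-s5: fetch original row i (1-based) back out of the reorganized cache."""
    k2, r = ((i - 1) // a) * (N // b), (i - 1) % a
    out = []
    for y in range(1, N // b + 1):
        row = cache[y + k2 - 1]
        if y % 2:                         # s1/s3: b consecutive addresses
            out.extend(row[r * b : r * b + b])
        else:                             # s2/s4: one datum per b-interval
            out.extend(row[r + j * a] for j in range(b))
    return out

M, N, a, b = 8, 8, 4, 4
orig = [[row * N + col for col in range(N)] for row in range(M)]
cache = store_to_cache(orig, M, N, a, b)
assert all(read_row(cache, i, M, N, a, b) == orig[i - 1] for i in range(1, M + 1))
```

Every read segment is either a contiguous run (odd y) or a fixed-stride gather (even y), so each maps onto a single one- or two-dimensional EDMA transfer descriptor.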
Preferably, reading a column corresponding to the original data from the reorganized new data cache covers the following cases:

reading the j-th column of an odd-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;

reading the j-th column of an even-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b.

For the odd case (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b), the flow is as follows:

(1) starting at first address j of row y of the new data cache, fetch one datum every b−1 data until b data have been fetched;

(2) starting at first address j of row y+N/b of the new data cache, fetch one datum every b−1 data until b data have been fetched;

(3) starting at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b−1 data until b data have been fetched; n1 takes each value of its range from small to large until all data are fetched, where 2 ≤ n1 ≤ M/a−1 and n1 is an integer.

Steps (1) to (3) are carried out with EDMA moves from equally spaced source addresses to continuous destination addresses.
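The odd-column read above can be sketched as follows. It assumes the same interpretive layout as the earlier sketches; note that with that layout each cache row yields a data per fetch (the text says b, i.e., it assumes a = b, which holds in the demo). `read_col_odd` is an illustrative name.

```python
def store_to_cache(orig, M, N, a, b):
    """S1-S5 (interpretive sketch): reorganize M x N data into the cache."""
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            chunk = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:
                row[r * b : r * b + b] = chunk
            else:
                for j, v in enumerate(chunk):
                    row[r + j * a] = v
    return cache

def read_col_odd(cache, y, j, M, N, a, b):
    """Steps (1)-(3): column j (1-based) of the odd-y sub-matrix band, gathered
    from cache rows y, y + N/b, y + (N/b)*n1, one datum every b-1 data."""
    assert y % 2 == 1
    out = []
    for n1 in range(M // a):              # n1 = 0 is step (1), n1 = 1 is (2), ...
        row = cache[(y - 1) + n1 * (N // b)]
        out.extend(row[(j - 1) + t * b] for t in range(a))
    return out

M, N, a, b = 8, 8, 4, 4
orig = [[r * N + c for c in range(N)] for r in range(M)]
cache = store_to_cache(orig, M, N, a, b)
col = read_col_odd(cache, 1, 2, M, N, a, b)   # column 2 of the y = 1 band
```

Because the odd-y chunks were stored contiguously, the column elements sit at a fixed stride of b inside each cache row, which is exactly the equally-spaced-source, continuous-destination EDMA move named in the text.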
Preferably, reading the j-th column of an even-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, proceeds as follows:

1) starting at first address b×(j−1)+1 of row y of the new data cache, read data sequentially until b data have been fetched;

2) starting at first address b×(j−1)+1 of row y+N/b of the new data cache, read data sequentially until b data have been fetched;

3) starting at first address b×(j−1)+1 of row y+(N/b)×n2 of the new data cache, read data sequentially until b data have been fetched; n2 takes each value of its range from small to large until all data are fetched, where 2 ≤ n2 ≤ M/a−1 and n2 is an integer. Steps 1) to 3) are carried out with EDMA moves from continuous source addresses to continuous destination addresses.
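The even-column read can be sketched the same way. With the interpretive layout of the earlier sketches the first address works out to a×(j−1)+1 and each cache row yields a contiguous data (the text writes b×(j−1)+1 and b data, identical when a = b, as in this demo); `read_col_even` is an illustrative name.

```python
def store_to_cache(orig, M, N, a, b):
    """S1-S5 (interpretive sketch): reorganize M x N data into the cache."""
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            chunk = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:
                row[r * b : r * b + b] = chunk
            else:
                for j, v in enumerate(chunk):
                    row[r + j * a] = v
    return cache

def read_col_even(cache, y, j, M, N, a, b):
    """Steps 1)-3): column j (1-based) of the even-y sub-matrix band, read as
    contiguous runs from cache rows y, y + N/b, y + (N/b)*n2."""
    assert y % 2 == 0
    out = []
    for n2 in range(M // a):              # n2 = 0 is step 1), n2 = 1 is 2), ...
        row = cache[(y - 1) + n2 * (N // b)]
        out.extend(row[(j - 1) * a : (j - 1) * a + a])   # contiguous slice
    return out

M, N, a, b = 8, 8, 4, 4
orig = [[r * N + c for c in range(N)] for r in range(M)]
cache = store_to_cache(orig, M, N, a, b)
col = read_col_even(cache, 2, 1, M, N, a, b)   # column 1 of the y = 2 band
```

The strided storing of even-y chunks is what makes this read fully contiguous, so the move degenerates to the continuous-source, continuous-destination EDMA transfer named in the text.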
Compared with the prior art, the invention has the following beneficial effects: the method stores each row of the original data into the new data cache, row by row, in a specific pattern, and fetches the required row and/or column data corresponding to the original data from the new data cache in a specific pattern for subsequent use. In large-data-processing applications this enables access to every row and column of an arbitrarily large-resolution image, including images with more than 32767 points per row, and reduces the data-access time.
Detailed Description
The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
The invention provides the following technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:

an operation that stores the original data into a new data cache, and an operation that reads data out of the new data cache;

the storing operation writes each row of the original data into the new data cache, row by row, in a specific pattern;

the reading operation fetches the required row and/or column data corresponding to the original data from the new data cache, in a specific pattern, for subsequent use;

the new data cache is a memory, accessible by EDMA (enhanced direct memory access), large enough to hold a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor);

the original data matrix is an M-row, N-column matrix, and each sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767. Storing the i-th row of the original data matrix into the new data cache comprises the following steps:
S1, store the b data at addresses 1 to b of row i of the original data matrix, in order, into the b consecutive addresses of row 1+k1 of the new data cache, starting at first address b×(i−1)+1;

S2, store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, one datum every b−1 addresses (i.e., at every b-th address) of row 2+k1 of the new data cache, starting at first address ((i−1) % a)+1;

S3, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, into the b consecutive addresses of row y+k1 of the new data cache, starting at first address b×(i−1)+1, where y is odd and 3 ≤ y ≤ N/b;

S4, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, one datum every b−1 addresses of row y+k1 of the new data cache, starting at first address ((i−1) % a)+1, where y is even and 4 ≤ y ≤ N/b;

S5, alternate S3 and S4 cyclically over all values of y from small to large, then repeat the S1-to-S4 flow over all values of i from small to large, until all data of the original data matrix are stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.

In steps S1 to S5, k1 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a; ((i−1) % a) is the remainder of (i−1) divided by a. The S1 and S3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses; the S2 and S4 flows with EDMA moves from continuous source addresses to equally spaced destination addresses.
The operation of reading data from the new data cache comprises reading a row corresponding to the original data from the reorganized new data cache, and reading a column corresponding to the original data from the reorganized new data cache.
Reading the i-th row corresponding to the original data from the reorganized new data cache comprises the following steps:

s1, fetch b consecutive data from row 1+k2 of the new data cache, starting at first address b×(i−1)+1;

s2, starting at first address ((i−1) % a)+1 of row 2+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s3, fetch b consecutive data from row 3+k2 of the new data cache, starting at first address b×(i−1)+1;

s4, starting at first address ((i−1) % a)+1 of row 4+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s5, starting from row m+k2 of the new data cache, continue the flow of s1 to s4 over the values of m from small to large until all data are fetched, where 5 ≤ m ≤ N/b and m is an integer.

In step s5, when reading the i-th row of data corresponding to the original data from the reorganized new data cache, starting from row m+k2: in the odd steps (s1, s3) the first address of the corresponding cache row is set to address b×(i−1)+1 of that row; in the even steps (s2, s4) it is set to address ((i−1) % a)+1 of that row, where ((i−1) % a) is the remainder of (i−1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; e.g., i = 2 means row 2 corresponding to the original data is fetched from the new data cache. k2 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a. The s1 and s3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses; the s2 and s4 flows with EDMA moves from equally spaced source addresses to continuous destination addresses.
Reading a column corresponding to the original data from the reorganized new data cache covers the following cases:

reading the j-th column of an odd-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;

reading the j-th column of an even-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b.

For the odd case (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b), the flow is as follows:

(1) starting at first address j of row y of the new data cache, fetch one datum every b−1 data until b data have been fetched;

(2) starting at first address j of row y+N/b of the new data cache, fetch one datum every b−1 data until b data have been fetched;

(3) starting at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b−1 data until b data have been fetched; n1 takes each value of its range from small to large until all data are fetched, where 2 ≤ n1 ≤ M/a−1 and n1 is an integer.

Steps (1) to (3) are carried out with EDMA moves from equally spaced source addresses to continuous destination addresses.
Reading the j-th column of an even-column sub-matrix, i.e., the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, proceeds as follows:

1) starting at first address b×(j−1)+1 of row y of the new data cache, read data sequentially until b data have been fetched;

2) starting at first address b×(j−1)+1 of row y+N/b of the new data cache, read data sequentially until b data have been fetched;

3) starting at first address b×(j−1)+1 of row y+(N/b)×n2 of the new data cache, read data sequentially until b data have been fetched; n2 takes each value of its range from small to large until all data are fetched, where 2 ≤ n2 ≤ M/a−1 and n2 is an integer. Steps 1) to 3) are carried out with EDMA moves from continuous source addresses to continuous destination addresses.
Example (b):
A. Let the original data matrix be an M-row, N-column matrix and each sub-matrix an a-row, b-column matrix, the original data matrix being divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767. In this embodiment the original data matrix is divided into 4×4 sub-matrices, as shown in the following table:
Storing the original data into the new data cache comprises the following steps:

S1, store the b data at addresses 1 to b of row i of the original data matrix, in order, into the b consecutive addresses of row 1+k1 of the new data cache, starting at first address b×(i−1)+1;

S2, store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, one datum every b−1 addresses (i.e., at every b-th address) of row 2+k1 of the new data cache, starting at first address ((i−1) % a)+1;

S3, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, into the b consecutive addresses of row y+k1 of the new data cache, starting at first address b×(i−1)+1, where y is odd and 3 ≤ y ≤ N/b;

S4, store the b data at addresses (y−1)×b+1 to y×b of row i of the original data matrix, in order, one datum every b−1 addresses of row y+k1 of the new data cache, starting at first address ((i−1) % a)+1, where y is even and 4 ≤ y ≤ N/b;

S5, alternate S3 and S4 cyclically over all values of y from small to large, then repeat the S1-to-S4 flow over all values of i from small to large, until all data of the original data matrix are stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.

In steps S1 to S5, k1 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a; ((i−1) % a) is the remainder of (i−1) divided by a. The S1 and S3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses; the S2 and S4 flows with EDMA moves from continuous source addresses to equally spaced destination addresses. The original data, divided into a×b sub-matrix blocks, are thus stored into the new data cache, whose contents are shown in the following figure:
The operation of reading data from the new data cache comprises reading a row corresponding to the original data from the reorganized new data cache, and reading a column corresponding to the original data from the reorganized new data cache.

Reading the i-th row corresponding to the original data from the reorganized new data cache comprises the following steps:

s1, fetch b consecutive data from row 1+k2 of the new data cache, starting at first address b×(i−1)+1;

s2, starting at first address ((i−1) % a)+1 of row 2+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s3, fetch b consecutive data from row 3+k2 of the new data cache, starting at first address b×(i−1)+1;

s4, starting at first address ((i−1) % a)+1 of row 4+k2 of the new data cache, fetch one datum every b−1 data until b data have been fetched;

s5, starting from row m+k2 of the new data cache, continue the flow of s1 to s4 over the values of m from small to large until all data are fetched, where 5 ≤ m ≤ N/b and m is an integer.

In step s5, when reading the i-th row of data corresponding to the original data from the reorganized new data cache, starting from row m+k2: in the odd steps (s1, s3) the first address of the corresponding cache row is set to address b×(i−1)+1 of that row; in the even steps (s2, s4) it is set to address ((i−1) % a)+1 of that row, where ((i−1) % a) is the remainder of (i−1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; e.g., i = 2 means row 2 corresponding to the original data is fetched from the new data cache. k2 = floor((i−1)/a) × (N/b), where floor((i−1)/a) is the quotient of (i−1) divided by a with the fraction truncated, i.e., the largest integer not greater than (i−1)/a.

The s1 and s3 flows are carried out with EDMA moves from continuous source addresses to continuous destination addresses.

The s2 and s4 flows are carried out with EDMA moves from equally spaced source addresses to continuous destination addresses.
Reading a column operation corresponding to the original data from the recombined new data cache, and the method comprises the following steps:
reading j-th column data in an odd column submatrix which is odd relative to y in an (x, y) -th submatrix of the original data from the recombined new data buffer, wherein 1< ═ j < ═ b,1< ═ y < ═ N/b, and y is an odd number, and the odd column submatrix and the even column submatrix thereof are defined as the following chart:
the reading process is as follows:
(1) starting from row y of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out;
(2) starting from row y + N/b of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out;
(3) starting from row y + (N/b)×n1 of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out; n1 takes values in ascending order over its range until all data have been taken out, wherein 2 <= n1 <= M/a - 1 and n1 is an integer;
Steps (1) to (3) are completed using EDMA transfers with equally spaced source addresses and contiguous destination addresses.
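The odd-column flow (1)–(3) can likewise be sketched. This is a hypothetical Python model under the same assumptions as the row-read sketch (square a = b sub-blocks, cache rows of a·b entries, store addresses taken modulo the row group); `store` and `read_col_odd` are illustrative names, and the strided read takes a data per cache row (equal to b here since a = b).

```python
def store(orig, a, b):
    # Assumed cache layout: (M//a)*(N//b) rows of a*b entries; blocks with
    # odd y are stored contiguously, blocks with even y at stride b.
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, g = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:
                row[b * g : b * g + b] = seg
            else:
                for t in range(b):
                    row[g + b * t] = seg[t]
    return cache

def read_col_odd(cache, j, y, a, b, M, N):
    """Flows (1)-(3) for odd y: start at address j (1-based) of cache
    row y + (N//b)*n, then take one datum every b-1 positions (stride b)."""
    out = []
    for n in range(M // a):        # n = 0 is flow (1), n = 1 is flow (2), ...
        row = cache[y + (N // b) * n - 1]
        out += [row[(j - 1) + b * r] for r in range(a)]
    return out

# Column j of the y-th (odd) block column equals original column (y-1)*b + j.
orig = [[r * 8 + c for c in range(8)] for r in range(8)]
cache = store(orig, 4, 4)
assert read_col_odd(cache, 2, 1, 4, 4, 8, 8) == [row[1] for row in orig]
```

Each loop iteration corresponds to one equally-spaced-source, contiguous-destination EDMA transfer.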
Reading the j-th column of an even-column sub-matrix, i.e. the (x, y)-th sub-matrix of the original data with y even, from the reorganized new data cache, wherein 1 <= j <= b, 1 <= y <= N/b and y is even; the flow is as follows:
1) starting from row y of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out;
2) starting from row y + N/b of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out;
3) starting from row y + (N/b)×n2 of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out; n2 takes values in ascending order over its range until all data have been taken out, wherein 2 <= n2 <= M/a - 1 and n2 is an integer; steps 1) to 3) are completed using EDMA transfers with contiguous source addresses and contiguous destination addresses.
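The even-column flow 1)–3) amounts to plain contiguous reads, because the stride-b store has already gathered each column into adjacent cache addresses. A hypothetical Python sketch under the same assumptions as before (square a = b sub-blocks, cache rows of a·b entries, illustrative names `store` and `read_col_even`, a data read per cache row, equal to b since a = b):

```python
def store(orig, a, b):
    # Assumed cache layout: (M//a)*(N//b) rows of a*b entries; blocks with
    # odd y are stored contiguously, blocks with even y at stride b.
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, g = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            row = cache[y + k1 - 1]
            if y % 2:
                row[b * g : b * g + b] = seg
            else:
                for t in range(b):
                    row[g + b * t] = seg[t]
    return cache

def read_col_even(cache, j, y, a, b, M, N):
    """Flows 1)-3) for even y: contiguous read starting at address
    b*(j-1)+1 (1-based) of cache rows y, y + N//b, y + (N//b)*n2."""
    out = []
    for n in range(M // a):        # n = 0 is flow 1), n = 1 is flow 2), ...
        row = cache[y + (N // b) * n - 1]
        out += row[b * (j - 1) : b * (j - 1) + a]
    return out

# Column j of the y-th (even) block column equals original column (y-1)*b + j.
orig = [[r * 8 + c for c in range(8)] for r in range(8)]
cache = store(orig, 4, 4)
assert read_col_even(cache, 1, 2, 4, 4, 8, 8) == [row[4] for row in orig]
```

Because both source and destination are contiguous here, each iteration maps to the simplest EDMA transfer mode named in step 3).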
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. An implementation method of an EDMA-based high-capacity high-speed line-row output cache structure, characterized by comprising:
an operation of storing original data into a new data cache and an operation of reading data from the new data cache;
the operation of storing the original data into the new data cache stores each row of the original data into the new data cache row by row in a specific manner;
the operation of reading data from the new data cache fetches the required row and/or column data of the original data from the new data cache in a specific manner for subsequent use;
the new data cache is a memory large enough to hold a whole frame of data and accessible by an EDMA (enhanced direct memory access) that supports two-dimensional transfers, such as the EDMA of a DSP (digital signal processor);
the original data is a matrix of M rows and N columns; a sub-matrix is a matrix of a rows and b columns; the original data matrix is divided into sub-matrix data blocks of size a×b, the (x, y)-th sub-matrix, wherein x is an integer and 1 <= x <= M/a; y is an integer and 1 <= y <= N/b; M is divisible by a, and a >= 2; N is divisible by b, and 1000 <= b <= 32767; the operation of storing the i-th row of the original data matrix into the new data cache comprises the following steps:
S1, storing the b data at addresses 1 to b of the i-th row of the original data matrix sequentially into the b consecutive addresses starting at address b×(i-1)+1 of row 1+k1 of the new data cache;
S2, storing the b data at addresses b+1 to 2×b of the i-th row of the original data matrix sequentially into addresses at intervals of b-1 addresses, starting at address ((i-1) mod a)+1 of row 2+k1 of the new data cache;
S3, storing the b data at addresses (y-1)×b+1 to y×b of the i-th row of the original data matrix sequentially into the b consecutive addresses starting at address b×(i-1)+1 of row y+k1 of the new data cache, wherein y is odd and 3 <= y <= N/b;
S4, storing the b data at addresses (y-1)×b+1 to y×b of the i-th row of the original data matrix sequentially into addresses at intervals of b-1 addresses, starting at address ((i-1) mod a)+1 of row y+k1 of the new data cache, wherein y is even and 4 <= y <= N/b;
S5, alternately performing S3 and S4 for all values of y in ascending order, and then repeating the flows S1 to S4 for all values of i in ascending order until all data of the original data matrix have been stored in the new data cache, wherein 1 <= i <= M and i is an integer.
2. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 1, wherein: in steps S1 to S5, k1 = floor((i-1)/a) × (N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e. the largest integer not greater than (i-1)/a; ((i-1) mod a) denotes the remainder of (i-1) divided by a; the flows S1 and S3 are completed using EDMA transfers with contiguous source addresses and contiguous destination addresses; the flows S2 and S4 are completed using EDMA transfers with contiguous source addresses and equally spaced destination addresses.
3. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 2, wherein: the operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache.
4. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 3, wherein reading the i-th row of the original data from the reorganized new data cache comprises the following steps:
s1, with address b×(i-1)+1 as the start address, fetching b consecutive data from row 1+k2 of the new data cache;
s2, with address ((i-1) mod a)+1 as the start address, starting from row 2+k2 of the new data cache, taking one datum every b-1 data until b data have been taken out;
s3, with address b×(i-1)+1 as the start address, fetching b consecutive data from row 3+k2 of the new data cache;
s4, with address ((i-1) mod a)+1 as the start address, starting from row 4+k2 of the new data cache, taking one datum every b-1 data until b data have been taken out;
s5, starting from row m+k2 of the new data cache, continuing with flows analogous to s1 to s4, taking m in ascending order over its whole range until all data have been read, wherein 5 <= m <= N/b and m is an integer.
5. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 4, wherein: in step s5, when reading the data of row i of the original data from the reorganized new data cache, starting from row m+k2 of the new data cache, in the odd-numbered steps (s1, s3) the start address within the corresponding cache row is set to address b×(i-1)+1 of that row, and in the even-numbered steps (s2, s4) the start address is set to address ((i-1) mod a)+1 of that row, where ((i-1) mod a) denotes the remainder of (i-1) divided by a; i is the required row number of the original data, 1 <= i <= M and i is an integer, e.g. i = 2 means that row 2 of the original data is fetched from the new data cache; k2 = floor((i-1)/a) × (N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e. the largest integer not greater than (i-1)/a; the flows s1 and s3 are completed using EDMA transfers with contiguous source addresses and contiguous destination addresses; the flows s2 and s4 are completed using EDMA transfers with equally spaced source addresses and contiguous destination addresses.
6. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 5, wherein reading columns of the original data from the reorganized new data cache comprises the following cases:
reading the j-th column of an odd-column sub-matrix, i.e. the (x, y)-th sub-matrix of the original data with y odd, from the reorganized new data cache, wherein 1 <= j <= b, 1 <= y <= N/b and y is odd;
reading the j-th column of an even-column sub-matrix, i.e. the (x, y)-th sub-matrix of the original data with y even, from the reorganized new data cache, wherein 1 <= j <= b, 1 <= y <= N/b and y is even;
for an odd-column sub-matrix (y odd), the flow is as follows:
(1) starting from row y of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out;
(2) starting from row y + N/b of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out;
(3) starting from row y + (N/b)×n1 of the new data cache with address j as the start address, taking one datum every b-1 data until b data have been taken out; n1 takes values in ascending order over its range until all data have been taken out, wherein 2 <= n1 <= M/a - 1 and n1 is an integer;
steps (1) to (3) are completed using EDMA transfers with equally spaced source addresses and contiguous destination addresses.
7. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure as claimed in claim 6, wherein reading the j-th column of an even-column sub-matrix, i.e. the (x, y)-th sub-matrix of the original data with y even, from the reorganized new data cache, wherein 1 <= j <= b, 1 <= y <= N/b and y is even, proceeds as follows:
1) starting from row y of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out;
2) starting from row y + N/b of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out;
3) starting from row y + (N/b)×n2 of the new data cache with address b×(j-1)+1 as the start address, reading data sequentially until b data have been taken out; n2 takes values in ascending order over its range until all data have been taken out, wherein 2 <= n2 <= M/a - 1 and n2 is an integer; steps 1) to 3) are completed using EDMA transfers with contiguous source addresses and contiguous destination addresses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010702851.6A CN111737169B (en) | 2020-07-21 | 2020-07-21 | EDMA-based implementation method of high-capacity high-speed line-row output cache structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111737169A CN111737169A (en) | 2020-10-02 |
CN111737169B true CN111737169B (en) | 2020-11-27 |
Family
ID=72656064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010702851.6A Active CN111737169B (en) | 2020-07-21 | 2020-07-21 | EDMA-based implementation method of high-capacity high-speed line-row output cache structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737169B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114879584B (en) * | 2022-07-05 | 2022-10-28 | 成都智明达电子股份有限公司 | DMA controller boundary alignment method based on FPGA and circuit thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102262553A (en) * | 2011-08-03 | 2011-11-30 | 中国科学技术大学 | Method for optimizing linear system software package based on loongson 3B |
CN102930636A (en) * | 2012-11-15 | 2013-02-13 | 广州广电运通金融电子股份有限公司 | Recognition device and recognition method for paper money number |
CN105739874A (en) * | 2016-03-11 | 2016-07-06 | 沈阳聚德视频技术有限公司 | EDMA achieving method in image rotation based on DSP |
CN106303582A (en) * | 2016-08-20 | 2017-01-04 | 航天恒星科技有限公司 | A kind of Joint Source Channel decoding method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7673076B2 (en) * | 2005-05-13 | 2010-03-02 | Texas Instruments Incorporated | Concurrent read response acknowledge enhanced direct memory access unit |
2020-07-21: application CN202010702851.6A filed in China; granted as patent CN111737169B (status: Active)
Non-Patent Citations (1)
Title |
---|
Optimized Implementation of a Multi-channel Video Encoder Based on a General-purpose DSP; Li Bo et al.; Acta Electronica Sinica; No. 11, 30 Nov. 2006; pp. 2103-2108 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||