CN107748723A - Storage method and access device supporting conflict-free stepping block-by-block access - Google Patents
- Publication number: CN107748723A
- Application number: CN201710901233.2A
- Authority: CN (China)
- Prior art keywords: access, memory bank, memory, address
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Image Input (AREA)
Abstract
The invention discloses a storage method and a memory access device supporting conflict-free strided block access. The storage method comprises: configuring a two-dimensional storage space of a given size; and mapping each pixel of a two-dimensional image to a different memory bank so as to support conflict-free strided block access. The memory access device comprises a first-element bank number calculation module, a shift information calculation module, an intra-bank address sorting module and an access execution module. The invention supports conflict-free strided block access starting at any address as well as conflict-free row access at aligned addresses, and it is simple to implement and has high access efficiency.
Description
Technical field
The present invention relates to the field of memory technology for vector SIMD (Single Instruction, Multiple Data) processors, and more particularly to a storage method and a memory access device supporting conflict-free strided block access.
Background
The quality of the memory system design is an important factor in whether a vector SIMD processor can deliver its performance efficiently. The memory systems of current vector SIMD processors typically use a parallel memory mechanism in which multiple memory modules are organized in a particular way. This organization is especially suited to applications with 2D access demands, such as image and video processing. However, the data that video and image algorithms process in parallel often form rows, columns or blocks in a two-dimensional (2D) space, and mapping them onto a traditional vector SIMD processor lowers memory access efficiency.
To address this, 2D memory mechanisms are commonly used. A 2D memory mechanism is a kind of parallel memory mechanism in which the storage space is addressed by two coordinates, X and Y. It preserves the 2D relationships between pixels when image and video data are mapped, which effectively improves data reuse and execution efficiency and significantly improves the memory access efficiency of video and image algorithms on vector SIMD processors. Address mapping in a 2D memory is divided into two parts: bank mapping and intra-bank address mapping. A 2D memory mechanism uses a suitable bank mapping function to ensure that the elements of each vector access are mapped to different memory banks, and an intra-bank address mapping function to locate the specific position being accessed.
Prior-art 2D memory mechanisms only achieve conflict-free access to contiguous rows, columns or blocks. However, some sliding-window computations in video and image processing algorithms (including convolutional neural networks, sub-pixel interpolation and 2D filtering) require conflict-free strided block access, and the horizontal and vertical strides required differ between sub-steps of the computation. There is currently no effective solution that provides both conflict-free strided block access starting at any address and conflict-free row access at aligned addresses. An urgent problem is therefore how to guarantee, for various horizontal strides, vertical strides and numbers of memory banks, that every element of a strided block access under different 2D strides is mapped to a different memory bank (i.e., without conflict), while also supporting conflict-free row access at aligned addresses.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the problems in the prior art, the present invention provides a storage method and a memory access device supporting conflict-free strided block access, which support conflict-free strided block access starting at any address as well as conflict-free row access at aligned addresses, and which are simple to implement and have high access efficiency.
To solve the above technical problem, the present invention proposes the following technical solution:
A storage method supporting conflict-free strided block access, comprising the steps of:
configuring a two-dimensional storage space in which each element has coordinates (i, j);
mapping each pixel of a two-dimensional image to a different memory bank so as to support conflict-free strided block access, the mapping equation being:
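$$w = i + \left( \left\lfloor \frac{j}{h} \right\rfloor + (j \bmod 2)\cdot\left( \left\lfloor \frac{i}{M\cdot 2^{s'}} \right\rfloor \bmod 2 \right)\cdot \left\lfloor \frac{N}{4} \right\rfloor \cdot 2 \right)\cdot M\cdot 2^{s'} \qquad (1)$$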
where f(w) is the number of the memory bank obtained by the mapping; M and N are respectively the numbers of elements contained in the horizontal and vertical directions of the data block in a strided block access, and M and N are integer powers of 2; s and h are respectively the horizontal stride and the vertical stride of the strided block access, with $s = \sigma \cdot 2^{s'}$, where σ is coprime with 2 (i.e., odd).
As a further improvement of the method, after each pixel of the two-dimensional image is mapped to a memory bank, the intra-bank address of each pixel is determined according to formula (2):
$$g(i,j) = \left\lfloor \frac{i}{MN} \right\rfloor + j\cdot\left\lfloor \frac{X_m}{2MN} \right\rfloor + i\cdot\left\lfloor \frac{X_m Y_m}{2MN} \right\rfloor \qquad (2)$$
where g(i, j) is the intra-bank address, and X_m and Y_m are the dimensions of the storage space.
As a further improvement of the method, a read or write access comprises the following steps:
S1. First-element bank number calculation: calculate the bank number b0 of the first element x0 according to formula (1);
S2. Shift information calculation: calculate the corresponding shift information shift_inf from the bank number b0 of the first element x0 and the horizontal stride s;
S3. Intra-bank address calculation: calculate the 2D coordinates of the elements to be accessed from the 2D coordinates of the first element x0, the horizontal stride s and the vertical stride h, and calculate the intra-bank address A of those elements according to formula (2);
S4. Intra-bank address sorting: sort the elements of the intra-bank address A according to the bank number b0 of the first element x0 and the horizontal stride s;
S5. Access execution: for a write request, apply position selection to the original data according to the shift information shift_inf and the horizontal stride s, then write the data into the different memory banks; for a read request, apply position selection to the data according to shift_inf and s after they are read.
As a further improvement of the method, step S25 also includes a Busy signal generation step, which detects the buffer state and whether the access addresses conflict in order to generate a Busy signal.
The present invention further provides a memory access device using the above storage method supporting conflict-free strided block access, comprising:
a first-element bank number calculation module, configured to calculate the bank number b0 of the first element x0 according to formula (1);
a shift information calculation module, configured to calculate the corresponding shift information shift_inf from the bank number b0 of the first element x0 and the horizontal stride s;
an intra-bank address calculation module, configured to calculate the 2D coordinates of the elements to be accessed from the 2D coordinates of the first element x0, the horizontal stride s and the vertical stride h, and to calculate the intra-bank address A of those elements according to formula (2);
an intra-bank address sorting module, configured to sort the elements of the intra-bank address A according to the bank number b0 of the first element x0 and the horizontal stride s;
an access execution module, configured, for a write request, to apply position selection to the original data according to the shift information shift_inf and the horizontal stride s and then write the data into the different memory banks, and, for a read request, to apply position selection to the data according to shift_inf and s after they are read.
As a further improvement of the device, the first-element bank number calculation module comprises, connected in sequence, an access type decision circuit that selects on the horizontal stride s to obtain the access type, a modulo circuit that performs the modulo operation, and a bank number calculation circuit that calculates the bank number according to formula (1). The access logical address and the horizontal stride s pass in turn through the access type decision circuit, the modulo circuit and the bank number calculation circuit, which outputs the bank number b0 of the first element x0.
As a further improvement of the device, the shift information calculation module comprises a stride grouping circuit and a multiplexer circuit connected to each other. The stride grouping circuit receives the horizontal stride s, divides the strides into different groups according to s, and outputs the result to the multiplexer circuit; the multiplexer circuit receives the output of the stride grouping circuit and the bank number b0 of the first element x0, and outputs the shift information shift_inf for the different strides.
As a further improvement of the device, the intra-bank address sorting module comprises a logical address calculation circuit and an intra-block address offset circuit. The logical address calculation circuit calculates the 2D coordinates of each element to be accessed from the 2D coordinates (i, j) of the first element x0 and the horizontal stride s, and the intra-block address offset circuit calculates, according to formula (2), the offset of each element to be accessed within its memory bank.
As a further improvement of the device, the access execution module comprises a buffer, a write-data position selection circuit that applies position selection to write data, and a read-data position selection circuit that applies position selection to read data. The write-data position selection circuit receives the write shift information shift_inf and the horizontal stride s, and outputs the original data after position selection; the read-data position selection circuit receives the read shift information shift_inf and the horizontal stride s, and applies position selection to the data after they are read.
As a further improvement of the device, it further comprises a Busy signal generation circuit, which includes: a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank; a buffer empty/full detection circuit for detecting the empty/full state of the buffer; and a Busy generator for generating the Busy signal. The Busy generator asserts the Busy signal when the read/write address detection circuit detects a read/write conflict or the buffer empty/full detection circuit detects that the buffer is full.
Compared with the prior art, the advantages of the present invention are:
1) The present invention fully considers the relationships among various horizontal strides, vertical strides and the number of memory banks, and sets up a unified address mapping. This unified mapping guarantees that the data required by each access lie in different memory banks, which achieves conflict-free strided block access starting at any address as well as conflict-free row access at aligned addresses, and can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access;
2) Through a unified mapping that supports conflict-free strided block access, the present invention maps each pixel of a 2D image to a specific location in the storage device: it calculates the specific sub-bank to which the pixel is mapped and then determines the pixel's location within that sub-bank. After image data are written into the storage device according to this mapping, the hardware incurs no conflicts during strided block access, so conflict-free strided block access is achieved;
3) The present invention further implements read and write access through, in sequence, first-element bank number calculation, shift information calculation, intra-bank address calculation, intra-bank address sorting and access execution. It supports conflict-free strided block access with high access efficiency, and the whole process relies on simple operations such as addition, shifting and selection logic, so the overhead is small and it is easy to implement.
Brief description of the drawings
Fig. 1 is a flow chart of the storage method supporting conflict-free strided block access in this embodiment.
Fig. 2 is a structural diagram of the memory access device supporting conflict-free strided block access in this embodiment.
Fig. 3 is a schematic diagram of the first-element bank number calculation module in this embodiment.
Fig. 4 is a schematic diagram of the shift information calculation module in this embodiment.
Fig. 5 is a schematic diagram of the intra-bank address sorting module in this embodiment.
Fig. 6 is a schematic diagram of the access execution module in this embodiment.
Fig. 7 is a schematic diagram of the Busy signal generation module in this embodiment.
Fig. 8 is a schematic diagram of the parallel convolution operation in a specific embodiment.
Fig. 9 is a schematic diagram of the memory bank numbering under the first horizontal stride in a specific embodiment of the present invention.
Fig. 10 is a schematic diagram of the memory bank numbering under the second horizontal stride in a specific embodiment of the present invention.
Fig. 11 is a schematic diagram of the memory bank numbering under the third horizontal stride in a specific embodiment of the present invention.
Fig. 12 is a schematic diagram of the memory bank numbering under the fourth horizontal stride in a specific embodiment of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific preferred embodiments, but this does not limit the scope of protection of the invention.
As shown in Fig. 1, the storage method of this embodiment supporting conflict-free strided block access comprises the steps of:
configuring a two-dimensional storage space of size X_m × Y_m, in which each element has coordinates (i, j);
mapping each pixel of a two-dimensional image to a different memory bank so as to support conflict-free strided block access, the mapping equation being:
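$$w = i + \left( \left\lfloor \frac{j}{h} \right\rfloor + (j \bmod 2)\cdot\left( \left\lfloor \frac{i}{M\cdot 2^{s'}} \right\rfloor \bmod 2 \right)\cdot \left\lfloor \frac{N}{4} \right\rfloor \cdot 2 \right)\cdot M\cdot 2^{s'} \qquad (1)$$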
In formula (1), f(w) is the bank number; M and N are respectively the numbers of elements contained in the horizontal and vertical directions of the data block in a strided block access; M and N are integer powers of 2, and M·N equals both the SIMD width of the processor and the total number of memory banks; s and h are respectively the horizontal and vertical strides of the strided block access, with $s = \sigma \cdot 2^{s'}$, where σ is coprime with 2; "/" denotes the integer quotient and "%" the remainder.
First, a 2D storage space of size X_m × Y_m is provided (X_m and Y_m being integer powers of 2), in which each element is denoted by its coordinates (i, j). Further, let $n = \log_2(M \cdot N)$ and express s in the form $s = \sigma \cdot 2^{s'}$ with σ coprime with 2. The element at coordinates (i, j) is then mapped to a memory bank according to formula (1), giving the bank number f(w).
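The decomposition of the stride and the intermediate value w can be made concrete with a short sketch. It assumes "/" is floor division on non-negative integers and transcribes formula (1) exactly as printed, including the $\lfloor N/4 \rfloor$ factor; because the definition of f(w) itself is not given in this section, the sketch stops at w. The helper names `split_stride` and `mapping_w` are hypothetical, and computing s' by counting trailing zero bits is only one software equivalent of the multiplexer-based selection described for the access type decision circuit.

```python
from typing import Tuple


def split_stride(s: int) -> Tuple[int, int]:
    """Write the horizontal stride as s = sigma * 2**s_prime with sigma odd."""
    if s <= 0:
        raise ValueError("stride must be a positive integer")
    s_prime = (s & -s).bit_length() - 1  # number of trailing zero bits of s
    return s >> s_prime, s_prime


def mapping_w(i: int, j: int, s: int, h: int, M: int, N: int) -> int:
    """Intermediate value w of formula (1) as printed; '/' is the integer quotient."""
    _, s_prime = split_stride(s)
    period = M * (1 << s_prime)  # M * 2**s'
    # (j % 2) is 0 or 1, so this term only contributes on odd rows j
    skew = (j % 2) * ((i // period) % 2) * (N // 4) * 2
    return i + (j // h + skew) * period
```

For example, `split_stride(12)` gives `(3, 2)` because 12 = 3·2², and `mapping_w(9, 1, 2, 1, 4, 8)` evaluates the printed expression to 49.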
This embodiment fully considers the relationships among various horizontal strides, vertical strides and the number of memory banks, and sets up the unified address mapping function of formula (1). This unified mapping guarantees that the data required by each access lie in different memory banks, which achieves conflict-free strided block access starting at any address as well as conflict-free row access at aligned addresses, and can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access.
Once the bank mapping function is determined, the intra-bank address of the element at (i, j) can be determined.
In this embodiment, after each pixel of the two-dimensional image is mapped to a memory bank, the intra-bank address of each pixel is determined according to formula (2):
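$$g(i,j) = \left\lfloor \frac{i}{MN} \right\rfloor + j\cdot\left\lfloor \frac{X_m}{2MN} \right\rfloor + i\cdot\left\lfloor \frac{X_m Y_m}{2MN} \right\rfloor \qquad (2)$$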
where g(i, j) is the intra-bank address.
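Formula (2) can be evaluated directly. The sketch below is a literal transcription of the printed expression, with "/" taken as the integer quotient; it keeps both i terms exactly as they appear in the extracted text, so it only reproduces the arithmetic of the formula as printed. The function name `intra_bank_address` is hypothetical.

```python
def intra_bank_address(i: int, j: int, M: int, N: int, Xm: int, Ym: int) -> int:
    """g(i, j) from formula (2) as printed, using integer quotients throughout."""
    MN = M * N  # number of banks, equal to the SIMD width
    return i // MN + j * (Xm // (2 * MN)) + i * (Xm * Ym // (2 * MN))
```

With M = 4, N = 2 and a 64 × 64 storage space, `intra_bank_address(0, 1, 4, 2, 64, 64)` evaluates to 4, since X_m / (2MN) = 64 / 16 = 4.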
Based on formulas (1) and (2), this embodiment forms a unified mapping that supports conflict-free strided block access and maps each pixel of the two-dimensional image to a specific location in the storage device: formula (1) gives the specific sub-bank to which the pixel is mapped, and formula (2) gives the pixel's location within that sub-bank. After image data are written into the storage device according to this mapping, the hardware incurs no conflicts during strided block access, so conflict-free strided block access is achieved.
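"Conflict-free" means that no two elements of one parallel access map to the same bank. The sketch below checks exactly that, and for contrast applies a hypothetical low-order interleaving, bank = (i + M·j) mod (M·N), which is not formula (1), to a 4 × 2 block with horizontal and vertical stride 2. Treating i as the horizontal coordinate and j as the vertical one is an assumption of the sketch. The naive mapping sends all eight elements to only four banks, which is the kind of conflict the unified mapping is designed to avoid.

```python
from collections import Counter
from typing import Iterable, List


def conflicting_banks(banks: Iterable[int]) -> List[int]:
    """Return the banks hit by more than one element of a single parallel access."""
    return sorted(b for b, n in Counter(banks).items() if n > 1)


def naive_bank(i: int, j: int, M: int, N: int) -> int:
    # Hypothetical low-order interleaving used only for contrast; NOT formula (1).
    return (i + M * j) % (M * N)


M, N, s, h = 4, 2, 2, 2
# 4 x 2 strided block whose first element is at (0, 0), listed row by row
block = [(a * s, b * h) for b in range(N) for a in range(M)]
banks = [naive_bank(i, j, M, N) for i, j in block]
print(banks)                     # [0, 2, 4, 6, 0, 2, 4, 6]
print(conflicting_banks(banks))  # [0, 2, 4, 6]
```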
Based on the above storage method, when a read or write access is to be performed, the access request carries the 2D coordinates of the first element x0 (the upper-left element of the block, or the first element of the row) and the access type (strided block or row). The read/write access comprises the following steps:
S1. First-element bank number calculation: the bank number b0 of the first element x0 is calculated according to formula (1);
S2. Shift information calculation: the shift information shift_inf is calculated from the bank number b0 of the first element x0 and the horizontal stride s; shift_inf is used to order the data written into the buffer and read out of the buffer;
S3. Intra-bank address calculation: the 2D coordinates of the elements to be accessed are calculated from the 2D coordinates of the first element x0, the horizontal stride s and the vertical stride h, and the intra-bank address A of those elements is calculated according to formula (2);
S4. Intra-bank address sorting: the elements of the intra-bank address A are sorted according to the bank number b0 of the first element x0 and the horizontal stride s;
S5. Access execution: for a write request, position selection is applied to the original data according to shift_inf and the horizontal stride s, and the data are then written into the different memory banks; for a read request, position selection is applied to the data according to shift_inf and s after they are read.
The above steps can be completed in a single pipeline stage or in multiple pipelined stages.
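Steps S3 and S4 can be sketched in software as follows, under stated assumptions: the block is taken to hold M elements per row spaced by s and N rows spaced by h, with i horizontal and j vertical, and "sorting" is modeled as arranging the intra-bank addresses by bank number so that each bank receives the one address it must serve. The bank and address functions are passed in as parameters (standing in for f(w) of formula (1) and g(i, j) of formula (2)) because f(w) is not fully specified in this section. `block_coords` and `per_bank_addresses` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]  # (i, j); i assumed horizontal, j vertical


def block_coords(i0: int, j0: int, s: int, h: int, M: int, N: int) -> List[Coord]:
    """S3 sketch: M elements per row spaced by s, N rows spaced by h, row-major."""
    return [(i0 + a * s, j0 + b * h) for b in range(N) for a in range(M)]


def per_bank_addresses(
    coords: List[Coord],
    bank_of: Callable[[int, int], int],
    addr_of: Callable[[int, int], int],
) -> Dict[int, int]:
    """S3/S4 sketch: map each bank number to the intra-bank address it must serve."""
    table: Dict[int, int] = {}
    for i, j in coords:
        bank = bank_of(i, j)
        if bank in table:  # two elements in one bank: the access would conflict
            raise ValueError(f"bank conflict on bank {bank}")
        table[bank] = addr_of(i, j)
    return dict(sorted(table.items()))  # address list ordered by bank number
```

Raising an error on a repeated bank mirrors the conflict-free requirement; in the device, a correct mapping makes that branch unreachable rather than handled at run time.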
In this embodiment, step S25 further includes a Busy signal generation step, which detects the buffer state and whether the access addresses conflict in order to generate a Busy signal.
The read/write access steps of this embodiment support conflict-free strided block access with high access efficiency; the whole process relies on simple operations such as addition, shifting and selection logic, so the overhead is small and it is easy to implement.
As shown in Fig. 2 the memory access device that the present embodiment is striden by the storage method of block access using above-mentioned support Lothrus apterus,
Including:
a first-element bank number calculation module, configured to calculate the bank number b0 of the first element x0 according to formula (1);
a shift information calculation module, configured to calculate the corresponding shift information shift_inf from the bank number b0 of the first element x0 and the horizontal stride s;
an intra-bank address calculation module, configured to calculate the 2D coordinates of the elements to be accessed from the 2D coordinates of the first element x0, the horizontal stride s and the vertical stride h, and to calculate the intra-bank address A of those elements according to formula (2);
an intra-bank address sorting module, configured to sort the elements of the intra-bank address A according to the bank number b0 of the first element x0 and the horizontal stride s;
an access execution module, configured, for a write request, to apply position selection to the original data according to shift_inf and the horizontal stride s and then write the data into the different memory banks, and, for a read request, to apply position selection to the data according to shift_inf and s after they are read.
The above memory access device is simple in structure and low in cost, and it guarantees that the data required by each access lie in different memory banks. It thereby achieves conflict-free strided block access starting at any address as well as conflict-free row access at aligned addresses, and can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access.
As shown in Fig. 3, the first-element bank number calculation module of this embodiment comprises, connected in sequence, an access type decision circuit that selects on the horizontal stride s to obtain the access type, a modulo circuit that performs the modulo operation, and a bank number calculation circuit that calculates the bank number according to formula (1). The access logical address and the horizontal stride s pass in turn through the access type decision circuit, the modulo circuit and the bank number calculation circuit, which outputs the bank number b0 of the first element x0.
The access type decision circuit first determines the access type from the horizontal stride, i.e., it first determines the value of s' in formula (1); in this embodiment, a multiplexer (MUX) selects on the horizontal stride to obtain the access type s'. Because formula (1) requires modulo operations, this embodiment provides a modulo circuit to perform them. The bank number calculation circuit is a hardware structure built from shifters, adders and similar components that implements formula (1) in hardware and finally produces the bank number b0 of the first element of the access.
As shown in Fig. 4, the shift information calculation module of this embodiment comprises a stride grouping circuit and a multiplexer circuit connected to each other. The stride grouping circuit receives the horizontal stride s, divides the strides into different groups according to s, and outputs the result to the multiplexer circuit; the multiplexer circuit receives the output of the stride grouping circuit and the bank number b0 of the first element x0, and outputs the shift information shift_inf for the different strides.
Write and read operations support different access types: reads support strided block access starting at any address, while writes support row-wise writes of aligned data. The write shift information and the read shift information are therefore calculated differently. For the read shift information, the strides are first divided into groups according to the horizontal stride, and a data selector then determines the shift information shift_inf for each stride from the group and the bank number b0 of the first element. For the write shift information, the vertical coordinate of the first element's 2D coordinates is taken modulo (mod 1 to 15), and the result is selected according to the vertical stride.
As shown in Fig. 5, the intra-bank address sorting module of this embodiment comprises a logical address calculation circuit and an intra-block address offset circuit. The logical address calculation circuit calculates the 2D coordinates of each element to be accessed from the 2D coordinates (i, j) of the first element x0 and the horizontal stride s, and the intra-block address offset circuit calculates, according to formula (2), the offset of each element to be accessed within its memory bank.
As shown in Fig. 6, the access execution module of this embodiment comprises a buffer, a write-data position selection circuit that applies position selection to write data, and a read-data position selection circuit that applies position selection to read data. The write-data position selection circuit receives the write shift information shift_inf and the horizontal stride s, and outputs the original data after position selection; the read-data position selection circuit receives the read shift information shift_inf and the horizontal stride s, and applies position selection to the data after they are read.
The access execution module is logically divided into two parts. For a write request, the data to be written supplied by the user are contiguous, but their actual placement in the memory banks is not, so position selection is applied to the original data according to the write shift_inf and s so that they are written into the correct banks. For a read request, the data read from the individual memory banks must be arranged in a particular order before being supplied to the user, so position selection is applied to the data according to shift_inf and s after they are read.
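The two directions of position selection are inverse permutations of each other. The sketch below models them as a scatter (user order to bank lanes, for writes) and a gather (bank lanes to user order, for reads), driven by the bank number of each element. The encoding of shift_inf is not specified in this section, so the sketch takes the per-element bank list directly; `scatter_for_write` and `gather_after_read` are hypothetical names. If the banks happened to be consecutive starting at b0, the permutation would be a plain rotation by b0, which is one possible reading of the name "shift information", but that interpretation is an assumption.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def scatter_for_write(data: Sequence[T], banks: Sequence[int], num_banks: int) -> List[Optional[T]]:
    """Write side: element p (user order) goes to the lane of bank banks[p]."""
    lanes: List[Optional[T]] = [None] * num_banks
    for value, bank in zip(data, banks):
        lanes[bank] = value
    return lanes


def gather_after_read(lanes: Sequence[T], banks: Sequence[int]) -> List[T]:
    """Read side: element p (user order) is taken from the lane of bank banks[p]."""
    return [lanes[bank] for bank in banks]
```

For example, with banks `[3, 0, 2, 1]`, writing `['a', 'b', 'c', 'd']` fills the lanes as `['b', 'd', 'c', 'a']`, and gathering with the same bank list restores the user order.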
As shown in Fig. 7, this embodiment further includes a Busy signal generation circuit, which includes: a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank; a buffer empty/full detection circuit for detecting the empty/full state of the buffer; and a Busy generator for generating the Busy signal. The Busy generator asserts the Busy signal when the read/write address detection circuit detects a read/write conflict or the buffer empty/full detection circuit detects that the buffer is full.
An access conflict occurs if the same memory bank is accessed by both requests; the sending of access requests must then stop, otherwise data would be lost. The read/write address detection circuit detects whether the read and write requests conflict, and the buffer empty/full detection circuit detects the empty/full state of the buffer. In this embodiment the buffer follows a write-then-read principle. Taking a ping-pong buffer as an example, a 2-bit register indicates the buffer state, with four states in total: 00 means both the "ping" and "pong" buffers are empty, 01 means the "pong" buffer is full, 10 means the "ping" buffer is full, and 11 means both buffers are full. The Busy generator receives the outputs of the read/write address detection circuit and the buffer empty/full detection circuit and produces the Busy signal from the detected buffer state and read/write access situation: when a read/write conflict occurs or the buffers are full (state 11), it asserts Busy, and no further access requests are accepted.
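The Busy condition described above reduces to a small boolean function. The sketch below uses the 2-bit ping-pong state encoding given in the text (00 both empty, 01 pong full, 10 ping full, 11 both full) and models the read/write address detection as a check for any bank shared by the read and write requests. The class and function names are hypothetical, and representing each request as a list of bank numbers is an assumption of the sketch.

```python
from enum import IntEnum
from typing import Iterable


class PingPongState(IntEnum):
    """2-bit buffer status register, encoded as in the text."""
    BOTH_EMPTY = 0b00
    PONG_FULL = 0b01
    PING_FULL = 0b10
    BOTH_FULL = 0b11


def read_write_conflict(read_banks: Iterable[int], write_banks: Iterable[int]) -> bool:
    """True if the read and write requests touch at least one common bank."""
    return not set(read_banks).isdisjoint(write_banks)


def busy(state: PingPongState, read_banks: Iterable[int], write_banks: Iterable[int]) -> bool:
    """Assert Busy on a read/write bank conflict or when both buffers are full."""
    return state == PingPongState.BOTH_FULL or read_write_conflict(read_banks, write_banks)
```

For example, `busy(PingPongState.PING_FULL, [0, 1], [2, 3])` is `False`, while the same call with write banks `[1, 5]` is `True` because bank 1 is shared.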
The memory access device of this embodiment is built from simple logic circuits such as adders, shifters and selection logic, so the hardware overhead is small and it is easy to implement.
The invention is further described below using vector memory access with 4×2 blocks (M = 4, N = 2) as an example, where the buffer uses a "ping-pong" mechanism.
Fig. 8 is a schematic diagram of the parallel convolution operation in this specific embodiment. The horizontal and vertical strides of the convolution are both 2, and the convolution kernel size is 3*3. To compute the convolution of M*N output image pixels in parallel, M*N image data must be fetched from the buffer in one beat, while only one convolution kernel coefficient needs to be read per beat. In this way, the results for 8 output image pixels are obtained in 9 beats.
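Enumerating the per-beat reads shows why strided block access is needed. The sketch assumes a valid (unpadded) convolution in which output pixel (ox, oy) reads input pixel (2·ox + kx, 2·oy + ky) for kernel tap (kx, ky); `taps_for_block` is a hypothetical helper. Each of the 9 beats fetches a 4*2 block of inputs with horizontal and vertical stride 2, and only the head element moves with the kernel tap.

```python
def taps_for_block(ox0, oy0, stride=2, k=3, M=4, N=2):
    """For each of the k*k kernel taps (one tap per beat), return the M*N
    input pixels needed by the M x N output block whose top-left output
    pixel is (ox0, oy0). Coordinates are (x, y), listed row by row."""
    beats = []
    for ky in range(k):
        for kx in range(k):
            beats.append([(stride * (ox0 + m) + kx, stride * (oy0 + n) + ky)
                          for n in range(N) for m in range(M)])
    return beats
```

For example, `taps_for_block(0, 0)[0]` is `[(0, 0), (2, 0), (4, 0), (6, 0), (0, 2), (2, 2), (4, 2), (6, 2)]`: a 4*2 block with head element (0, 0) and strides s = h = 2.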
To avoid memory access conflicts, these data must be distributed across different memory banks, which requires a specific addressing scheme for the buffer, and the strides present in convolution make the buffer even harder to design. This embodiment stores the data using the storage method described above. Formula (1) computes the specific sub-bank of the storage device to which a pixel is mapped, and formula (2) then determines the pixel's position within that sub-bank. Read and write accesses are carried out in turn through head-element bank number calculation, shift information calculation, bank internal address calculation, bank internal address sorting, and memory access execution. This ensures that no conflicts arise during strided block access, i.e. conflict-free strided block access is achieved.
In this embodiment, the memory access device described above realizes conflict-free strided block access through the following steps. The head-element bank number computing module takes the two-dimensional coordinates (i, j) and the horizontal stride s supplied by the user and uses formula (1) to compute the bank number b0 to which the head element is mapped. The shift information computing module uses b0 and s to obtain the shift information shift_inf, which is used to order the data written into the banks and the data read out of them. The bank internal address ordering module computes the two-dimensional coordinates of the 8 elements required for this access from the head element's coordinates (i, j) and s, and uses formula (2) to compute each element's offset inside its bank. The memory access execution module performs position selection on the original data according to shift_inf and s and writes it into the different banks. After data are read, it likewise performs position selection according to shift_inf and s. The Busy signal generating circuit produces the Busy signal by detecting the state of the buffer and whether the memory access addresses conflict.
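The write/read data path can be sketched for a single beat. This is a minimal sketch under stated assumptions, not the patent's circuit. Instead of the shift_inf encoding, whose formula is not given in this section, it computes each element's bank directly with the reduced mapping of formulas (3) and (4) below (M = 4, N = 2). It models the eight bank ports of one beat as a list and omits the formula (2) internal address. All function names are hypothetical.

```python
def s_prime(s: int) -> int:
    """Exponent s' in s = sigma * 2**s' with sigma odd (trailing zero bits of s)."""
    return (s & -s).bit_length() - 1


def bank(i: int, j: int, s: int, h: int) -> int:
    """Bank number of pixel (i, j) via formulas (3)/(4), M = 4, N = 2."""
    if not 1 <= s <= 15:
        raise ValueError("formulas (3)/(4) cover horizontal strides 1..15 only")
    sp = s_prime(s)
    w = i + (j // h) * 4 * 2**sp          # formula (4), integer division
    return (w + (w // 8) % 2**sp) % 8     # formula (3)


def block_coords(i0, j0, s, h):
    """Coordinates of the 4x2 block whose head element is (i0, j0), row by row."""
    return [(i0 + m * s, j0 + n * h) for n in range(2) for m in range(4)]


def write_block(ports, data, i0, j0, s, h):
    """Position selection on write: route element k to the bank it maps to."""
    for k, (i, j) in enumerate(block_coords(i0, j0, s, h)):
        b = bank(i, j, s, h)
        if ports[b] is not None:
            raise RuntimeError(f"bank conflict on bank {b}")
        ports[b] = data[k]


def read_block(ports, i0, j0, s, h):
    """Position selection on read: gather the bank outputs back into block order."""
    return [ports[bank(i, j, s, h)] for (i, j) in block_coords(i0, j0, s, h)]
```

With `ports = [None] * 8`, `write_block(ports, list("abcdefgh"), 1, 0, 3, 2)` fills all eight ports without a conflict, and `read_block` on the same block restores the original order.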
For this embodiment, the 2D buffer addressing for different strides is shown schematically in Figs. 9 to 12. The SIMD width is 8, and the following uses 8 memory banks (M=4, N=2) as an example. The supported horizontal stride range is then 1 to 15, and the bank addressing function (formula (1)) reduces to:
$$f(w) = \left(w + (w/8) \bmod 2^{s'}\right) \bmod 8 \tag{3}$$
$$w = i + (j/h)\cdot 4\cdot 2^{s'} \tag{4}$$
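Formulas (3) and (4) can be checked exhaustively in software. The sketch below is an illustration, not the patent's verification. It takes `/` as integer division and reads s' off as the number of trailing zero bits of s (from s = σ·2^{s'} with σ odd), and its function names are hypothetical. It checks that every 4*2 strided block, for all horizontal strides 1 to 15 and arbitrary start points, touches eight distinct banks, and that aligned 8-element row accesses do as well.

```python
def s_prime(s: int) -> int:
    """Exponent s' in s = sigma * 2**s' with sigma odd."""
    return (s & -s).bit_length() - 1


def bank(i: int, j: int, s: int, h: int) -> int:
    """Bank number of pixel (i, j) per formulas (3) and (4) (M = 4, N = 2)."""
    sp = s_prime(s)
    w = i + (j // h) * 4 * 2**sp           # formula (4)
    return (w + (w // 8) % 2**sp) % 8      # formula (3)


def block_is_conflict_free(i0, j0, s, h) -> bool:
    """A 4x2 block with head element (i0, j0) must hit 8 distinct banks."""
    banks = {bank(i0 + m * s, j0 + n * h, s, h) for n in range(2) for m in range(4)}
    return len(banks) == 8


def row_is_conflict_free(i0, j, s, h) -> bool:
    """8 consecutive pixels of row j starting at i0 must hit 8 distinct banks."""
    return len({bank(i0 + t, j, s, h) for t in range(8)}) == 8


def sweep() -> bool:
    """Check strides 1..15, a few vertical strides, and many start points
    (row access only at 8-aligned starts, as in the text)."""
    for s in range(1, 16):
        for h in range(1, 4):
            for j0 in range(8):
                for i0 in range(128):
                    if not block_is_conflict_free(i0, j0, s, h):
                        return False
                    if i0 % 8 == 0 and not row_is_conflict_free(i0, j0, s, h):
                        return False
    return True
```

`sweep()` returns `True`. Alignment matters for row access: `row_is_conflict_free(1, 0, 2, 1)` is `False`, because with s = 2 the pixels at i = 1 and i = 8 both map to bank 1.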
Depending on the range of the horizontal stride, the bank mapping function above falls into four cases:
1. Horizontal stride s = 1, 3, 5, 7, 9, 11, 13, 15
When the horizontal stride s is 1, 3, 5, 7, 9, 11, 13 or 15, the bank numbering is as shown in Fig. 9, and the specific addressing is determined by formulas (5) and (6), with s' = 0:
$$f(w) = w \bmod 8 \tag{5}$$
$$w = i + (j/h)\cdot 4 \tag{6}$$
As shown in Fig. 9, the horizontal period of the two-dimensional buffer is 8. The filled squares in the figure represent non-aligned strided block access. Starting from logical address (1,0) with a horizontal stride of 3 and an arbitrary vertical stride, a 4*2 block reads the 8 data at logical addresses (1,0), (4,0), (7,0), (10,0), (1,j), (4,j), (7,j), (10,j). Formulas (5) and (6) map these 8 logical addresses onto the 8 banks 0 to 7, one per bank, which guarantees that the access is conflict-free. The hatched squares represent aligned, continuous row access. Because the horizontal period of the buffer is 8, the 8 physical addresses obtained by the mapping also lie in different banks, so this access is conflict-free as well. Thus, as Fig. 9 shows, neither non-aligned strided block access starting from an arbitrary address nor aligned row access produces a memory access conflict.
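As a worked check of formulas (5) and (6) for the stride-3 example, take the second row of the block at j = h (so j/h = 1), starting at i = 1 like the first row:

$$
\begin{aligned}
\text{row } 0 &: \; w = 1, 4, 7, 10 && \Rightarrow\; f(w) = 1, 4, 7, 2\\
\text{row } j &: \; w = 5, 8, 11, 14 && \Rightarrow\; f(w) = 5, 0, 3, 6
\end{aligned}
$$

The eight bank numbers form a permutation of 0 to 7, so no two elements share a bank.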
2. Horizontal stride s = 2, 6, 10, 14
When the horizontal stride s is 2, 6, 10 or 14, the bank numbering is as shown in Fig. 10, and the specific addressing is determined by formulas (7) and (8), with s' = 1:
$$f(w) = \left(w + (w/8) \bmod 2\right) \bmod 8 \tag{7}$$
$$w = i + (j/h)\cdot 8 \tag{8}$$
As shown in Fig. 10, the horizontal period of the two-dimensional buffer is 16. The filled squares represent non-aligned strided block access. Starting from logical address (1,0) with a horizontal stride of 2 and an arbitrary vertical stride, each beat of a 4*2 block reads the 8 data at logical addresses (1,0), (3,0), (5,0), (7,0), (1,j), (3,j), (5,j), (7,j). The physical addresses obtained from these 8 logical addresses through formulas (7) and (8) all fall in different banks. The hatched squares represent aligned row access, whose 8 physical addresses likewise lie in different banks. Thus, as Fig. 10 shows, neither non-aligned strided block access starting from an arbitrary address nor aligned row access produces a memory access conflict.
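Working formulas (7) and (8) through for the stride-2 example, with the second row at j = h so that j/h = 1:

$$
\begin{aligned}
\text{row } 0 &: \; w = 1, 3, 5, 7,\;\; (w/8) \bmod 2 = 0 && \Rightarrow\; f(w) = 1, 3, 5, 7\\
\text{row } j &: \; w = 9, 11, 13, 15,\;\; (w/8) \bmod 2 = 1 && \Rightarrow\; f(w) = 2, 4, 6, 0
\end{aligned}
$$

The offset (w/8) mod 2 shifts the second row by one bank, so it fills the even banks left unused by the first row.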
3. Horizontal stride s = 4, 12
When the horizontal stride s is 4 or 12, the bank numbering is as shown in Fig. 11, and the specific addressing is determined by formulas (9) and (10), with s' = 2:
$$f(w) = \left(w + (w/8) \bmod 4\right) \bmod 8 \tag{9}$$
$$w = i + (j/h)\cdot 16 \tag{10}$$
As shown in Fig. 11, the horizontal period of the two-dimensional buffer is 16. The filled squares represent non-aligned strided block access. Starting from logical address (1,0) with a horizontal stride of 4 and an arbitrary vertical stride, each beat of a 4*2 block reads the 8 data at logical addresses (1,0), (5,0), (9,0), (13,0), (1,j), (5,j), (9,j), (13,j). The physical addresses obtained from these 8 logical addresses through formulas (9) and (10) all fall in different banks. The hatched squares represent aligned row access, whose 8 physical addresses likewise lie in different banks. Thus, as Fig. 11 shows, neither non-aligned strided block access starting from an arbitrary address nor aligned row access produces a memory access conflict.
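Working formulas (9) and (10) through for the stride-4 example, with the second row at j = h so that j/h = 1:

$$
\begin{aligned}
\text{row } 0 &: \; w = 1, 5, 9, 13,\;\; (w/8) \bmod 4 = 0, 0, 1, 1 && \Rightarrow\; f(w) = 1, 5, 2, 6\\
\text{row } j &: \; w = 17, 21, 25, 29,\;\; (w/8) \bmod 4 = 2, 2, 3, 3 && \Rightarrow\; f(w) = 3, 7, 4, 0
\end{aligned}
$$

Again the eight bank numbers are all different.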
4. Horizontal stride s = 8
When the horizontal stride s is 8, the bank numbering is as shown in Fig. 12, and the specific addressing is determined by formulas (11) and (12), with s' = 3:
$$f(w) = \left(w + (w/8) \bmod 8\right) \bmod 8 \tag{11}$$
$$w = i + (j/h)\cdot 32 \tag{12}$$
As shown in Fig. 12, the horizontal period of the two-dimensional buffer is 16. The filled squares represent non-aligned strided block access. Starting from logical address (1,0) with a horizontal stride of 8 and an arbitrary vertical stride, each beat of a 4*2 block reads the 8 data at logical addresses (1,0), (9,0), (17,0), (25,0), (1,j), (9,j), (17,j), (25,j). The physical addresses obtained from these 8 logical addresses through formulas (11) and (12) all fall in different banks. The hatched squares represent aligned row access, whose 8 physical addresses likewise lie in different banks. Thus, as Fig. 12 shows, neither non-aligned strided block access starting from an arbitrary address nor aligned row access produces a memory access conflict.
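Working formulas (11) and (12) through for the stride-8 example, with the second row at j = h so that j/h = 1:

$$
\begin{aligned}
\text{row } 0 &: \; w = 1, 9, 17, 25,\;\; (w/8) \bmod 8 = 0, 1, 2, 3 && \Rightarrow\; f(w) = 1, 2, 3, 4\\
\text{row } j &: \; w = 33, 41, 49, 57,\;\; (w/8) \bmod 8 = 4, 5, 6, 7 && \Rightarrow\; f(w) = 5, 6, 7, 0
\end{aligned}
$$

Each successive element is pushed one bank further along, so the block again covers banks 0 to 7 exactly once.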
The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, shall fall within the protection scope of the technical solution of the invention.
Claims (10)
1. A storage method supporting conflict-free strided block access, characterized in that its steps include:
configuring a two-dimensional storage space in which each element has coordinates (i, j);
mapping each pixel of a two-dimensional image to a different memory bank so as to support conflict-free strided block access, the mapping formula being specifically:
$$
f(w)=
\begin{cases}
\left(w + \left(w/(M\cdot N)\right) \bmod 2^{s'}\right) \bmod (M\cdot N), & s' \le n\\[4pt]
\left(w + \dfrac{w/(M\cdot N)}{2^{\,s'-n}}\right) \bmod (M\cdot N), & s' > n
\end{cases}
\tag{1}
$$
$$w = i + \left(j/h + (j \bmod 2)\cdot\left(\left(i/(M\cdot 2^{s'})\right) \bmod 2\right)\cdot (N/4)\cdot 2\right)\cdot M\cdot 2^{s'};$$
where f(w) is the number of the memory bank obtained by the mapping; M and N are the numbers of elements contained in the horizontal and vertical directions, respectively, of the data block in strided block access, and M and N are integer powers of 2; s and h are respectively the horizontal stride and the vertical stride in strided block access; and s = σ·2^{s'}, where σ is coprime to 2.
2. The storage method supporting conflict-free strided block access according to claim 1, characterized in that, after each pixel of the two-dimensional image has been mapped to a different memory bank, the internal address of each pixel within its memory bank is determined according to formula (2):
$$g(i,j) = i/(M\cdot N) + j\cdot\left(X_m/(2\cdot M\cdot N)\right) + i\cdot\left(X_m\cdot Y_m/(2\cdot M\cdot N)\right) \tag{2}$$
where g(i, j) is the internal address within the memory bank.
3. The storage method supporting conflict-free strided block access according to claim 2, characterized in that, when a read or write access is to be performed, the access comprises the following steps:
S1. Head-element bank number calculation: compute, according to formula (1), the bank number b0 corresponding to the head element x0;
S2. Shift information calculation: compute the corresponding shift information shift_inf from the bank number b0 of the head element x0 and the horizontal stride s;
S3. Bank internal address calculation: compute the two-dimensional coordinates of the elements to be accessed from the two-dimensional coordinates of the head element x0, the horizontal stride s and the vertical stride h, and compute, according to formula (2), the internal address A of each accessed element within its bank;
S4. Bank internal address sorting: sort the elements of the internal addresses A according to the bank number b0 of the head element x0 and the horizontal stride s;
S5. Memory access execution: for a write request, perform position selection on the original data according to the shift information shift_inf and the horizontal stride s, then write the data into the different banks; for a read request, perform position selection on the data after they are read, according to shift_inf and s.
4. The storage method supporting conflict-free strided block access according to claim 3, characterized in that step S25 further comprises a Busy signal generation step of generating a Busy signal according to the state of the buffer and whether the memory access addresses conflict.
5. A memory access device using the storage method supporting conflict-free strided block access according to any one of claims 1 to 4, characterized in that it comprises:
a head-element bank number computing module, for computing the bank number b0 corresponding to the head element x0 according to formula (1);
a shift information computing module, for computing the corresponding shift information shift_inf from the bank number b0 of the head element x0 and the horizontal stride s;
a bank internal address computing module, for computing the two-dimensional coordinates of the elements to be accessed from the two-dimensional coordinates of the head element x0, the horizontal stride s and the vertical stride h, and for computing, according to formula (2), the internal address A of each accessed element within its bank;
a bank internal address ordering module, for sorting the elements of the internal addresses A according to the bank number b0 of the head element x0 and the horizontal stride s;
a memory access execution module, for, in the case of a write request, performing position selection on the original data according to the shift information shift_inf and the horizontal stride s and then writing the data into the different banks, and, in the case of a read request, performing position selection on the data after they are read, according to shift_inf and s.
6. The memory access device according to claim 5, characterized in that the head-element bank number computing module comprises, connected in sequence, a memory access type decision circuit for determining the access type from the horizontal stride s, a modulo circuit for performing the modulo operation, and a bank number computing circuit for computing the bank number according to formula (1). After the logical address of the access and the horizontal stride s are input, they pass in turn through the memory access type decision circuit, the modulo circuit and the bank number computing circuit, which outputs the bank number b0 corresponding to the head element x0.
7. The memory access device according to claim 6, characterized in that the shift information computing module comprises a stride grouping circuit and a multiplexer circuit connected to each other. The stride grouping circuit receives the horizontal stride s, divides strides into different groups according to s, and outputs the result to the multiplexer circuit. The multiplexer circuit receives the output of the stride grouping circuit and the bank number b0 corresponding to the head element x0, and outputs the shift information shift_inf for the different strides.
8. The memory access device according to claim 7, characterized in that the bank internal address ordering module comprises a logical address computing circuit and an intra-block address offset circuit. The logical address computing circuit computes the two-dimensional coordinates of each element to be accessed from the two-dimensional coordinates (i, j) of the head element x0 and the horizontal stride s, and the intra-block address offset circuit computes, according to formula (2), the offset of each element to be accessed inside its memory bank.
9. The memory access device according to claim 8, characterized in that the memory access execution module comprises a buffer, a write data position selection circuit for performing position selection on write data, and a read data position selection circuit for performing position selection on read data. The write data position selection circuit receives the write shift information shift_inf and the horizontal stride s and outputs the original data after position selection. The read data position selection circuit receives the read shift information shift_inf and the horizontal stride s and performs position selection on the data after they are read.
10. The memory access device according to any one of claims 5 to 9, characterized in that it further comprises a Busy signal generating circuit, which comprises: a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank; a buffer empty/full detection circuit for detecting the empty/full state of the buffer; and a Busy generator for generating the Busy signal. The Busy generator issues the Busy signal when the read/write address detection circuit detects a read/write conflict or when the buffer empty/full detection circuit detects that the buffer is full.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710901233.2A CN107748723B (en) | 2017-09-28 | 2017-09-28 | Storage method and access device supporting conflict-free stepping block-by-block access |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107748723A true CN107748723A (en) | 2018-03-02 |
CN107748723B CN107748723B (en) | 2020-03-20 |
Family
ID=61256004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710901233.2A Active CN107748723B (en) | 2017-09-28 | 2017-09-28 | Storage method and access device supporting conflict-free stepping block-by-block access |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107748723B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107748723B (en) | 2020-03-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||