CN107748723A - Storage method and access device supporting conflict-free stepping block-by-block access - Google Patents


Info

Publication number
CN107748723A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710901233.2A
Other languages
Chinese (zh)
Other versions
CN107748723B (en
Inventor
刘胜
陈海燕
陈小文
鲁建壮
雷元武
谭弘兵
宋蕊
曾国钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201710901233.2A priority Critical patent/CN107748723B/en
Publication of CN107748723A publication Critical patent/CN107748723A/en
Application granted granted Critical
Publication of CN107748723B publication Critical patent/CN107748723B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Image Input (AREA)

Abstract

The invention discloses a storage method and a memory access device supporting conflict-free strided block access. The storage method comprises the following steps: configuring a two-dimensional storage space of a given size; and mapping each pixel of a two-dimensional image to different memory banks so as to support conflict-free strided block access. The memory access device comprises a first-element memory bank number calculation module, a shift information calculation module, a memory bank internal address sorting module, and a memory access execution module. The invention supports conflict-free strided block access starting from an arbitrary address together with conflict-free row access at aligned addresses, and has the advantages of a simple implementation and high access efficiency.

Description

Storage method and memory access device supporting conflict-free strided block access
Technical Field
The present invention relates to the technical field of memory for vector SIMD (Single Instruction Multiple Data) processors, and in particular to a storage method and a memory access device supporting conflict-free strided block access.
Background
The design quality of the memory system is an important factor governing whether a vector SIMD processor can deliver its performance efficiently. Current vector SIMD processor memory systems typically use a parallel memory mechanism in which multiple memory modules are organized in a certain way. This organization is particularly suited to applications with 2D memory access requirements, such as image and video processing. The data that video and image algorithms need to process in parallel often form a row, a column, or a block in a two-dimensional (2D) space, and mapping such data in the traditional vector SIMD processor manner lowers memory access efficiency.
To solve this problem, a 2D memory mechanism is typically used at present. The 2D memory mechanism is one kind of parallel memory mechanism in which the storage space is represented by two coordinates, X and Y, so that the 2D relationship between pixels is not destroyed when image and video applications map their data. This effectively improves the data reuse and execution efficiency of the application, and significantly improves the memory access efficiency of video and image algorithms on vector SIMD processors. The address mapping of a 2D storage device has two parts: memory module mapping and memory bank internal address mapping. By choosing a suitable memory module mapping function, the 2D memory mechanism ensures that each vector memory access is mapped to different memory banks, while the memory bank internal address mapping function locates the specific access position.
Prior-art 2D memory mechanisms can only provide conflict-free access to contiguous rows, columns, or blocks. However, some sliding-window applications in video and image processing algorithms (including convolutional neural networks, sub-pixel interpolation, and 2D filtering) require conflict-free strided block access, and the horizontal and vertical strides required differ between sub-steps of the computation. There is currently no effective solution that provides both conflict-free strided block access starting from an arbitrary address and conflict-free row access at aligned addresses. It is therefore an urgent problem to ensure, across the various relationships between the horizontal stride, the vertical stride, and the number of memory banks, that every element of a block access under different two-dimensional strides is mapped to a different memory bank (i.e., the access is conflict-free), while also providing conflict-free row access at aligned addresses.
Summary of the Invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a storage method and a memory access device that support conflict-free strided block access starting from an arbitrary address, also provide conflict-free row access at aligned addresses, are simple to implement, and offer high access efficiency.
To solve the above technical problem, the technical solution proposed by the present invention is:
A storage method supporting conflict-free strided block access, comprising the steps of:
Configuring a two-dimensional storage space in which each element has coordinates (i, j);
Mapping each pixel of a two-dimensional image to different memory banks so as to support conflict-free strided block access, the mapping formula being:
$$ w = i + \left( (j/h) + (j\,\%\,2) \cdot i/(M \cdot 2^{s'})\,\%\,2 \cdot (N/4) \cdot 2 \right) \cdot M \cdot 2^{s'} \qquad (1) $$
In the formula, f(w) is the number of the memory bank obtained by the mapping; M and N are the numbers of elements that the data block contains in the horizontal direction and the vertical direction, respectively, during strided block access, and M and N are both integer powers of 2; s and h are the horizontal stride and the vertical stride, respectively, during strided block access; and s = σ·2^{s'}, where σ and 2 are coprime (that is, σ is odd).
As a further improvement of the method of the present invention: after each pixel of the two-dimensional image has been mapped to a different memory bank, the address of each pixel inside its memory bank is determined according to formula (2):
$$ g(i, j) = i/(M \cdot N) + j \cdot \left(X_m/(2 \cdot M \cdot N)\right) + i \cdot \left(X_m \cdot Y_m/(2 \cdot M \cdot N)\right) \qquad (2) $$
where g(i, j) is the memory bank internal address.
As a further improvement of the method of the present invention, when a read or write access needs to be performed, the access proceeds by the following steps:
S1. First-element memory bank number calculation: calculate the memory bank number b0 corresponding to the first element x0 according to formula (1);
S2. Shift information calculation: calculate the corresponding shift information shift_inf from the memory bank number b0 of the first element x0 and the horizontal stride s;
S3. Memory bank internal address calculation: calculate the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and calculate the memory bank internal addresses A of those elements according to formula (2);
S4. Memory bank internal address sorting: sort the elements of the memory bank internal addresses A according to the memory bank number b0 of the first element x0 and the horizontal stride s;
S5. Memory access execution: for a write request, perform position selection on the original data according to the shift information shift_inf and the horizontal stride s, and then write the data into the different memory banks; for a read request, after the data have been read, perform position selection on the original data according to the shift information shift_inf and the horizontal stride s.
As a further improvement of the method of the present invention: step S25 further includes a Busy signal generation step that detects the state of the buffer and whether the memory access addresses conflict in order to generate the Busy signal.
The present invention further provides a memory access device that uses the above storage method supporting conflict-free strided block access, comprising:
A first-element memory bank number calculation module, for calculating the memory bank number b0 corresponding to the first element x0 according to formula (1);
A shift information calculation module, for calculating the corresponding shift information shift_inf from the memory bank number b0 of the first element x0 and the horizontal stride s;
A memory bank internal address calculation module, for calculating the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and for calculating the memory bank internal addresses A of those elements according to formula (2);
A memory bank internal address sorting module, for sorting the elements of the memory bank internal addresses A according to the memory bank number b0 of the first element x0 and the horizontal stride s;
A memory access execution module, which, for a write request, performs position selection on the original data according to the shift information shift_inf and the horizontal stride s and then writes the data into the different memory banks, and which, for a read request, performs position selection on the original data according to the shift information shift_inf and the horizontal stride s after the data have been read.
As a further improvement of the device of the present invention: the first-element memory bank number calculation module comprises, connected in sequence, a memory access type decision circuit that selects on the horizontal stride s to obtain the memory access type, a modulo circuit that performs the modulo operation, and a memory bank number calculation circuit that calculates the memory bank number according to formula (1). The memory access logical address and the horizontal stride s are input, pass in turn through the memory access type decision circuit, the modulo circuit, and the memory bank number calculation circuit, and the memory bank number b0 corresponding to the first element x0 is output.
As a further improvement of the device of the present invention: the shift information calculation module comprises a stride grouping circuit and a multiplexer circuit connected to each other. The stride grouping circuit receives the horizontal stride s, classifies the stride into one of several groups according to s, and outputs the result to the multiplexer circuit. The multiplexer circuit receives the output of the stride grouping circuit and the memory bank number b0 corresponding to the first element x0, and outputs the shift information shift_inf for the given stride.
As a further improvement of the device of the present invention: the memory bank internal address sorting module comprises a logical address calculation circuit and an in-block address offset circuit. The logical address calculation circuit calculates the two-dimensional coordinates of each element to be accessed from the two-dimensional coordinates (i, j) of the first element x0 and the horizontal stride s, and the in-block address offset circuit calculates, according to formula (2), the offset of each element to be accessed inside its memory bank.
As a further improvement of the device of the present invention: the memory access execution module comprises a buffer, a write data position selection circuit for performing position selection on write data, and a read data position selection circuit for performing position selection on read data. The write data position selection circuit receives the write shift information shift_inf and the horizontal stride s, performs position selection on the original data, and outputs the result. The read data position selection circuit receives the read shift information shift_inf and the horizontal stride s, and performs position selection on the original data after the data have been read.
As a further improvement of the device of the present invention: the device further comprises a Busy signal generation circuit, which comprises a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank, a buffer empty/full detection circuit for detecting the empty/full state of the buffer, and a Busy generator for generating the Busy signal. The Busy generator issues the Busy signal when the read/write address detection circuit detects a read/write conflict or when the buffer empty/full detection circuit detects that the buffer is full.
Compared with the prior art, the advantages of the present invention are:
1) The present invention fully considers the various relationships between the horizontal stride, the vertical stride, and the number of memory banks, and sets up a unified address mapping. Using this unified address mapping ensures that the data required by each access lie in different memory banks, thereby achieving conflict-free strided block access starting from an arbitrary address as well as conflict-free row access at aligned addresses. This can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access;
2) By forming a unified mapping that supports conflict-free strided block access, the present invention maps each pixel of a two-dimensional image to a specific location in the storage device: it calculates the specific sub-bank of the storage device to which the pixel is mapped, and determines the specific location of the pixel inside that sub-bank. Once image data have been written to the storage device according to this mapping, the hardware encounters no conflicts during strided block access, so conflict-free strided block access is achieved;
3) The present invention further implements read and write access through the successive steps of first-element memory bank number calculation, shift information calculation, memory bank internal address calculation, memory bank internal address sorting, and memory access execution. In addition to supporting conflict-free strided block access, access efficiency is high, and the whole process is based on simple operations such as addition, shifting, and selection logic, so the overhead is small and the design is easy to implement.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the storage method supporting conflict-free strided block access in this embodiment.
Fig. 2 is a structural diagram of the memory access device supporting conflict-free strided block access in this embodiment.
Fig. 3 is a schematic diagram of the structure of the first-element memory bank number calculation module in this embodiment.
Fig. 4 is a schematic diagram of the structure of the shift information calculation module in this embodiment.
Fig. 5 is a schematic diagram of the structure of the memory bank internal address sorting module in this embodiment.
Fig. 6 is a schematic diagram of the structure of the memory access execution module in this embodiment.
Fig. 7 is a schematic diagram of the structure of the Busy signal generation module in this embodiment.
Fig. 8 is a schematic diagram of the principle of the parallel convolution operation in a specific embodiment.
Fig. 9 is a schematic diagram of the memory bank numbering under the first horizontal stride in a specific embodiment of the present invention.
Fig. 10 is a schematic diagram of the memory bank numbering under the second horizontal stride in a specific embodiment of the present invention.
Fig. 11 is a schematic diagram of the memory bank numbering under the third horizontal stride in a specific embodiment of the present invention.
Fig. 12 is a schematic diagram of the memory bank numbering under the fourth horizontal stride in a specific embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.
As shown in Fig. 1, the storage method supporting conflict-free strided block access in this embodiment comprises the steps of:
Configuring a two-dimensional storage space of size Xm×Ym, in which each element has coordinates (i, j);
Mapping each pixel of a two-dimensional image to different memory banks so as to support conflict-free strided block access, the mapping formula being:
$$ w = i + \left( (j/h) + (j\,\%\,2) \cdot i/(M \cdot 2^{s'})\,\%\,2 \cdot (N/4) \cdot 2 \right) \cdot M \cdot 2^{s'} \qquad (1) $$
In formula (1), f(w) is the memory bank number; M and N are the numbers of elements that the data block contains in the horizontal direction and the vertical direction, respectively, during strided block access; M and N are both integer powers of 2, and M*N equals both the SIMD width of the processor and the total number of memory banks; s and h are the horizontal stride and the vertical stride, respectively, during strided block access; s = σ·2^{s'}, where σ and 2 are coprime (that is, σ is odd); "/" denotes the integer quotient operation and "%" denotes the remainder operation.
First, a 2D storage space of size Xm×Ym is provided (Xm or Ym being an integer power of 2), in which each element is represented by its coordinates (i, j). Next, let n = log2(M*N), and express s in the form s = σ·2^{s'}, where σ and 2 are coprime. The element with coordinates (i, j) is then mapped to a memory bank according to formula (1), which gives the memory bank number f(w).
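The two preparatory quantities named above, the decomposition s = σ·2^{s'} with σ odd and the bank index width n = log2(M*N), can be sketched as follows. This is a minimal illustration only; the function names `split_stride` and `bank_index_bits` are hypothetical and do not appear in the patent.

```python
def split_stride(s: int) -> tuple[int, int]:
    """Return (sigma, s_prime) such that s == sigma * 2**s_prime and sigma is odd.

    Hypothetical helper: the patent only states that s is written in this form.
    """
    if s <= 0:
        raise ValueError("stride must be a positive integer")
    s_prime = 0
    while s % 2 == 0:  # strip factors of two one at a time
        s //= 2
        s_prime += 1
    return s, s_prime


def bank_index_bits(M: int, N: int) -> int:
    """Return n = log2(M*N), requiring M*N to be a power of two."""
    total = M * N
    if total <= 0 or total & (total - 1):
        raise ValueError("M*N must be a positive power of two")
    return total.bit_length() - 1
```

For a horizontal stride of 6, for instance, `split_stride(6)` yields σ = 3 and s' = 1, and a 4*2 block gives n = 3, i.e. eight memory banks.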
This embodiment fully considers the various relationships between the horizontal stride, the vertical stride, and the number of memory banks, and sets up the unified address mapping function of formula (1). Using this unified address mapping ensures that the data required by each access lie in different memory banks, thereby achieving conflict-free strided block access starting from an arbitrary address as well as conflict-free row access at aligned addresses. This can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access.
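To make the conflict-free property concrete, the sketch below enumerates the elements of one strided M*N block access and checks whether a given bank mapping sends them to pairwise different banks. The mapping `naive_bank` is a hypothetical baseline (plain low-order interleaving along i), not the patent's formula (1); it is included only to show what a conflict looks like. The sketch assumes that i is the horizontal coordinate and j the vertical one.

```python
def block_coords(i0, j0, M, N, s, h):
    """Coordinates of an M-wide, N-high block starting at (i0, j0),
    with horizontal stride s and vertical stride h."""
    return [(i0 + a * s, j0 + b * h) for b in range(N) for a in range(M)]


def is_conflict_free(bank_of, i0, j0, M, N, s, h):
    """True if every element of the block access maps to a different bank."""
    banks = [bank_of(i, j) for i, j in block_coords(i0, j0, M, N, s, h)]
    return len(set(banks)) == len(banks)


def naive_bank(i, j, num_banks=8):
    """Hypothetical baseline mapping: interleave along i only (NOT formula (1))."""
    return i % num_banks
```

With eight banks, this baseline serves an aligned row access (`M=8, N=1, s=1`) without conflict, but a 4*2 block access fails because both rows of the block land in the same four banks. That is the situation a unified mapping such as formula (1) is intended to avoid.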
Once the memory bank mapping function has been determined, the address of the element with coordinates (i, j) inside its memory bank can be determined. In this embodiment, after each pixel of the two-dimensional image has been mapped to a different memory bank, the address of each pixel inside its memory bank is determined according to formula (2):
$$ g(i, j) = i/(M \cdot N) + j \cdot \left(X_m/(2 \cdot M \cdot N)\right) + i \cdot \left(X_m \cdot Y_m/(2 \cdot M \cdot N)\right) \qquad (2) $$
where g(i, j) is the memory bank internal address.
Based on formulas (1) and (2), this embodiment forms a unified mapping that supports conflict-free strided block access and maps each pixel of a two-dimensional image to a specific location in the storage device: formula (1) gives the specific sub-bank of the storage device to which the pixel is mapped, and formula (2) gives the specific location of the pixel inside that sub-bank. Once image data have been written to the storage device according to this mapping, the hardware encounters no conflicts during strided block access, so conflict-free strided block access is achieved.
Based on the above storage method, when a read or write access needs to be performed in this embodiment, the access request carries the two-dimensional address coordinates of the first element x0 (the top-left element of the block, or the first element of the row) together with the access type (strided block access or row access). The access proceeds by the following steps:
S1. First-element memory bank number calculation: calculate the memory bank b0 corresponding to the first element x0 according to formula (1);
S2. Shift information calculation: calculate the corresponding shift information shift_inf from the memory bank b0 of the first element x0 and the horizontal stride s; the shift information shift_inf is used to order the data written into the buffer and the data read from the buffer;
S3. Memory bank internal address calculation: calculate the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and calculate the memory bank internal addresses A of those elements according to formula (2);
S4. Memory bank internal address sorting: sort the elements of the memory bank internal addresses A according to the memory bank b0 of the first element x0 and the horizontal stride s;
S5. Memory access execution: for a write request, perform position selection on the original data according to the shift information shift_inf and the horizontal stride s, and then write the data into the different memory banks; for a read request, after the data have been read, perform position selection on the original data according to the shift information shift_inf and the horizontal stride s.
The above steps can be completed in a single pipeline stage, or pipelined across multiple stages.
In this embodiment, step S25 further includes a Busy signal generation step that detects the state of the buffer and whether the memory access addresses conflict in order to generate the Busy signal.
The read and write access steps of this embodiment support conflict-free strided block access with high access efficiency. The whole process is based on simple operations such as addition, shifting, and selection logic, so the overhead is small and the design is easy to implement.
As shown in Fig. 2, the memory access device of this embodiment, which uses the above storage method supporting conflict-free strided block access, comprises:
A first-element memory bank number calculation module, for calculating the memory bank b0 corresponding to the first element x0 according to formula (1);
A shift information calculation module, for calculating the corresponding shift information shift_inf from the memory bank b0 of the first element x0 and the horizontal stride s;
A memory bank internal address calculation module, for calculating the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and for calculating the memory bank internal addresses A of those elements according to formula (2);
A memory bank internal address sorting module, for sorting the elements of the memory bank internal addresses A according to the memory bank b0 of the first element x0 and the horizontal stride s;
A memory access execution module, which, for a write request, performs position selection on the original data according to the shift information shift_inf and the horizontal stride s and then writes the data into the different memory banks, and which, for a read request, performs position selection on the original data according to the shift information shift_inf and the horizontal stride s after the data have been read.
The above memory access device has a simple structure and low cost, and ensures that the data required by each access lie in different memory banks, thereby achieving conflict-free strided block access starting from an arbitrary address as well as conflict-free row access at aligned addresses. This can greatly improve the memory access efficiency of sliding-window and other applications that require conflict-free strided block access.
As shown in Fig. 3, the first-element memory bank number calculation module of this embodiment comprises, connected in sequence, a memory access type decision circuit that selects on the horizontal stride s to obtain the memory access type, a modulo circuit that performs the modulo operation, and a memory bank number calculation circuit that calculates the memory bank number according to formula (1). The memory access logical address and the horizontal stride s are input, pass in turn through the memory access type decision circuit, the modulo circuit, and the memory bank number calculation circuit, and the memory bank number b0 corresponding to the first element x0 is output.
First, the memory access type decision circuit determines the access type from the horizontal stride, that is, it first determines the value of s' in formula (1); in this embodiment a multiplexer selects on the horizontal stride to obtain the access type s'. Because formula (1) requires a modulo operation, this embodiment provides a modulo circuit to perform it. The memory bank number calculation circuit is a hardware structure built from shifters, adders, and similar components; it implements formula (1) in hardware and finally yields the memory bank number b0 corresponding to the first element of the current access.
As shown in Fig. 4, the shift information calculation module of this embodiment comprises a stride grouping circuit and a multiplexer circuit connected to each other. The stride grouping circuit receives the horizontal stride s, classifies the stride into one of several groups according to s, and outputs the result to the multiplexer circuit. The multiplexer circuit receives the output of the stride grouping circuit and the memory bank b0 corresponding to the first element x0, and outputs the shift information shift_inf for the given stride.
Write operations and read operations support different access types: reads support strided block access starting from an arbitrary address, while writes support aligned row writes. The write shift information and the read shift information are therefore calculated differently. To calculate the read shift information, the stride is first classified into one of several groups according to the horizontal stride, and a data selector then determines the shift information shift_inf for the given stride in combination with the first-element memory bank number b0. To calculate the write shift information, a modulo operation (modulo 1 to 15) is performed on the ordinate of the two-dimensional coordinates of the first element, and the result is selected according to the vertical stride.
As shown in Fig. 5, the memory bank internal address sorting module of this embodiment comprises a logical address calculation circuit and an in-block address offset circuit. The logical address calculation circuit calculates the two-dimensional coordinates of each element to be accessed from the two-dimensional coordinates (i, j) of the first element x0 and the horizontal stride s, and the in-block address offset circuit calculates, according to formula (2), the offset of each element to be accessed inside its memory bank.
As shown in Fig. 6, the memory access execution module of this embodiment comprises a buffer, a write data position selection circuit for performing position selection on write data, and a read data position selection circuit for performing position selection on read data. The write data position selection circuit receives the write shift information shift_inf and the horizontal stride s, performs position selection on the original data, and outputs the result. The read data position selection circuit receives the read shift information shift_inf and the horizontal stride s, and performs position selection on the original data after the data have been read.
The memory access execution module is logically divided into two parts. For a write request, the data that the user sends for writing are contiguous, whereas their actual locations in the memory banks are not, so position selection is performed on the original data according to the write shift_inf and s in order to write the data into different memory banks. For a read request, the data read from the memory banks must be arranged as required before being supplied to the user, so after the data are read, position selection is performed on the original data according to shift_inf and s.
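The position selection described above amounts to a permutation between SIMD lanes and memory banks. The sketch below illustrates that idea only: it takes the per-lane bank numbers directly as input, whereas the patent derives the selection from shift_inf and s, whose exact encoding is not given here. The function names are hypothetical.

```python
def scatter_to_banks(data, banks):
    """Write path: route lane k's word to bank banks[k].

    Assumes `banks` is a permutation of 0..len(banks)-1, i.e. the access
    is conflict-free. Returns the word presented to each bank, by bank number.
    """
    if len(data) != len(banks) or sorted(banks) != list(range(len(banks))):
        raise ValueError("banks must be a permutation matching the data width")
    per_bank = [None] * len(banks)
    for lane, bank in enumerate(banks):
        per_bank[bank] = data[lane]
    return per_bank


def gather_from_banks(per_bank, banks):
    """Read path: restore lane order from the words returned by each bank."""
    return [per_bank[bank] for bank in banks]
```

When the bank numbers of consecutive lanes form a rotation, for example `[3, 4, 5, 6, 7, 0, 1, 2]`, the permutation reduces to a cyclic shift, which is consistent with the selection being describable by a small piece of shift information.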
As shown in Fig. 7, this embodiment further comprises a Busy signal generation circuit, which comprises a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank, a buffer empty/full detection circuit for detecting the empty/full state of the buffer, and a Busy generator for generating the Busy signal. The Busy generator issues the Busy signal when the read/write address detection circuit detects a read/write conflict or when the buffer empty/full detection circuit detects that the buffer is full.
A memory access conflict occurs if the same memory bank is accessed, in which case access requests must stop being sent, because data would otherwise be lost; the read/write address detection circuit detects whether the read and write requests conflict. The buffer empty/full detection circuit detects the empty/full state of the buffer. The buffer in this embodiment follows the write-after-read principle. Taking a ping-pong buffer as an example, a two-bit register is designed to indicate the buffer state, giving four states in total: 00, 01, 10, and 11. State 00 means that both halves of the ping-pong buffer are empty, 01 means that the pong half is full, 10 means that the ping half is full, and 11 means that both halves are full. The Busy generator receives the detection outputs of the read/write address detection circuit and the buffer empty/full detection circuit, and generates the Busy signal from the detected buffer state and read/write access situation: the Busy signal is issued when a read/write conflict occurs or when the buffer is full (state 11), and no further access requests are accepted at that point.
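The Busy condition can be summarized in a few lines. The following is a behavioral sketch rather than the circuit itself: the state encoding follows the text (00 empty, 01 pong full, 10 ping full, 11 both full), while the function names and the representation of requests as lists of bank numbers are assumptions made for illustration.

```python
EMPTY, PONG_FULL, PING_FULL, BOTH_FULL = 0b00, 0b01, 0b10, 0b11


def mark_full(state, half):
    """Set the bit of the half (PING_FULL or PONG_FULL) that was just filled."""
    return state | half


def mark_empty(state, half):
    """Clear the bit of the half that was just drained."""
    return state & ~half & 0b11


def busy(state, read_banks, write_banks):
    """Busy is raised on a read/write bank conflict or when both halves are full."""
    rw_conflict = not set(read_banks).isdisjoint(write_banks)
    return rw_conflict or state == BOTH_FULL
```

For example, filling the ping half and then the pong half moves the register from 00 to 10 to 11, at which point `busy` returns True regardless of the addresses being accessed.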
The above memory access device of this embodiment is built from simple logic circuits such as adders, shifters, and selection logic, so its hardware overhead is small and it is easy to implement.
The invention is further described below taking vector memory access by 4*2 blocks (M=4, N=2) as an example, where the buffer unit uses a ping-pong mechanism.
Fig. 8 is a schematic diagram of the parallel convolution operation in the specific embodiment, where the horizontal and vertical strides of the convolution are both 2 and the convolution kernel size is 3*3. Computing the convolution of M*N output image pixels in parallel requires fetching M*N image data from the buffer in each beat, while only one convolution kernel value needs to be read per beat; in this way the results for 8 output image pixels can be computed in 9 beats.
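To make the access pattern concrete, the following Python sketch lists, beat by beat, which input coordinates a 4*2 block of output pixels needs. It assumes the usual convolution indexing without padding, in which output pixel (ox, oy) and kernel tap (kx, ky) read input pixel (2*ox + kx, 2*oy + ky); the function and parameter names are hypothetical.

```python
M, N = 4, 2   # output block: 4 pixels wide, 2 pixels high
STRIDE = 2    # horizontal and vertical convolution stride
K = 3         # kernel size (3*3)


def beat_reads(ox0, oy0):
    """Yield, for each beat, the kernel tap and the M*N input coordinates it needs.

    (ox0, oy0) is the top-left output pixel of the block (assumed convention).
    """
    for ky in range(K):
        for kx in range(K):
            coords = [(STRIDE * (ox0 + m) + kx, STRIDE * (oy0 + n) + ky)
                      for n in range(N) for m in range(M)]
            yield (kx, ky), coords


beats = list(beat_reads(0, 0))
print(len(beats))   # one beat per kernel tap
print(beats[1])     # tap (1, 0) and the eight input coordinates read in that beat
```

Each beat therefore reads a 4*2 block whose elements are 2 apart horizontally and vertically, and the starting coordinate shifts with the kernel tap, which is why the buffer must serve non-aligned strided block accesses.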
To avoid memory access conflicts, these data must be distributed across different memory banks, which requires a specific addressing scheme for the buffer; the strides present in the convolution further increase the difficulty of the buffer design. This embodiment stores the data with the storage method described above: formula (1) gives the specific sub-bank in the storage device to which a pixel is mapped, and formula (2) then gives the position of the pixel within that sub-bank. Read and write accesses are carried out through the successive steps of first-element bank number calculation, shift information calculation, bank internal address calculation, bank internal address sorting, and access execution, so that no conflict occurs during strided block access and conflict-free strided block access is achieved.
In this embodiment, the above memory access device realizes conflict-free strided block access in the following steps: the first-element bank number calculation module takes the two-dimensional coordinates (i, j) and the horizontal stride s supplied by the user and uses formula (1) to calculate the bank number b0 to which the first element is mapped; the shift information calculation module obtains the data shift information shift_inf from the first-element bank number b0 and the horizontal stride s, where shift_inf is used both for writing data into the memory banks and for ordering data read from the memory banks; the bank internal address sorting module calculates the two-dimensional coordinates of the 8 required elements from the two-dimensional coordinates (i, j) of the first element and the horizontal stride s, and calculates the offset of these 8 elements inside each memory bank according to formula (2); the memory access execution module performs position selection on the original data according to shift_inf and s and writes it into the different memory banks, and, after data is read out, performs position selection on it according to shift_inf and s; the Busy signal generating circuit produces the Busy signal by detecting the buffer state and whether the access addresses conflict.
For this embodiment, Figs. 9 to 12 show the 2D buffer addressing for different strides, with a SIMD width of 8. The following description takes 8 memory banks (M=4, N=2) as an example; horizontal strides from 1 to 15 are supported, and the bank addressing function (formula (1)) reduces to:
$$f(w) = \bigl(w + (w/8)\,\%\,2^{s'}\bigr)\,\%\,8 \tag{3}$$
$$w = i + (j/h) * 4 * 2^{s'} \tag{4}$$
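As an illustration of the simplified addressing function, the sketch below evaluates formulas (3) and (4) for a single element. It assumes that "/" denotes integer division and that s' is the exponent of the largest power of two dividing s (from s = σ*2^s' with σ odd, as stated in claim 1); the function names are hypothetical.

```python
BANKS = 8  # M*N with M=4, N=2


def s_prime(s):
    """Exponent s' such that s = sigma * 2**s' with sigma odd."""
    return (s & -s).bit_length() - 1


def bank(i, j, s, h):
    """Bank number of element (i, j) for horizontal stride s and vertical stride h."""
    sp = s_prime(s)
    w = i + (j // h) * 4 * 2 ** sp                 # formula (4), '/' as integer division
    return (w + (w // BANKS) % 2 ** sp) % BANKS    # formula (3)


print([s_prime(s) for s in (1, 2, 4, 6, 8, 12)])
print(bank(10, 0, 3, 1), bank(1, 1, 2, 1))
```

For odd strides s' is 0, so the second term inside formula (3) vanishes and the mapping is a plain modulo-8 interleave; for even strides the extra term skews successive groups of 8 addresses.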
According to the range of the horizontal stride, the above bank mapping function falls into four cases:
1. Horizontal stride s = 1, 3, 5, 7, 9, 11, 13, 15
When the horizontal stride s is 1, 3, 5, 7, 9, 11, 13, or 15, the bank numbering is as shown in Fig. 9, and the specific addressing is determined by formulas (5) and (6), with s' = 0:
$$f(w) = w\,\%\,8 \tag{5}$$
$$w = i + (j/h) * 4 \tag{6}$$
As shown in Fig. 9, the horizontal period of the two-dimensional buffer is 8. The solid-filled squares in the figure represent a non-aligned strided block access: starting from logical address (1,0), with a horizontal stride of 3 and an arbitrary vertical stride, a 4*2 block reads the 8 data at logical addresses (1,0), (4,0), (7,0), (10,0), (1,j), (4,j), (7,j), (10,j). Formulas (5) and (6) map these 8 logical addresses to the 8 memory banks 0~7 respectively, which guarantees that the access does not conflict. The hatched squares represent an aligned contiguous row access; because the horizontal period of the buffer is 8, the 8 mapped physical addresses lie in different memory banks and the access does not conflict. That is, as can be seen from Fig. 9, neither a non-aligned strided block access starting from an arbitrary address nor an aligned row access causes a memory access conflict.
2. Horizontal stride s = 2, 6, 10, 14
When the horizontal stride s is 2, 6, 10, or 14, the bank numbering is as shown in Fig. 10, and the specific addressing is determined by formulas (7) and (8), with s' = 1:
$$f(w) = \bigl(w + (w/8)\,\%\,2\bigr)\,\%\,8 \tag{7}$$
$$w = i + (j/h) * 8 \tag{8}$$
As shown in Fig. 10, the horizontal period of the two-dimensional buffer is 16. The solid-filled squares in the figure represent a non-aligned strided block access: starting from logical address (1,0), with a horizontal stride of 2 and an arbitrary vertical stride, a 4*2 block reads in each beat the 8 data at logical addresses (1,0), (3,0), (5,0), (7,0), (1,j), (3,j), (5,j), (7,j). The physical addresses obtained by mapping these 8 logical addresses with formulas (7) and (8) fall in different memory banks. The hatched squares represent an aligned row access, whose 8 physical addresses likewise lie in different memory banks. That is, as can be seen from Fig. 10, neither a non-aligned strided block access starting from an arbitrary address nor an aligned row access causes a memory access conflict.
3. Horizontal stride s = 4, 12
When the horizontal stride s is 4 or 12, the bank numbering is as shown in Fig. 11, and the specific addressing is determined by formulas (9) and (10), with s' = 2:
$$f(w) = \bigl(w + (w/8)\,\%\,4\bigr)\,\%\,8 \tag{9}$$
$$w = i + (j/h) * 16 \tag{10}$$
As shown in Fig. 11, the horizontal period of the two-dimensional buffer is 16. The solid-filled squares in the figure represent a non-aligned strided block access: starting from logical address (1,0), with a horizontal stride of 4 and an arbitrary vertical stride, a 4*2 block reads in each beat the 8 data at logical addresses (1,0), (5,0), (9,0), (13,0), (1,j), (5,j), (9,j), (13,j). The physical addresses obtained by mapping these 8 logical addresses with formulas (9) and (10) fall in different memory banks. The hatched squares represent an aligned row access, whose 8 physical addresses likewise lie in different memory banks. That is, as can be seen from Fig. 11, neither a non-aligned strided block access starting from an arbitrary address nor an aligned row access causes a memory access conflict.
4. Horizontal stride s = 8
When the horizontal stride s is 8, the bank numbering is as shown in Fig. 12, and the specific addressing is determined by formulas (11) and (12), with s' = 3:
$$f(w) = \bigl(w + (w/8)\,\%\,8\bigr)\,\%\,8 \tag{11}$$
$$w = i + (j/h) * 32 \tag{12}$$
As shown in Fig. 12, the horizontal period of the two-dimensional buffer is 16. The solid-filled squares in the figure represent a non-aligned strided block access: starting from logical address (1,0), with a horizontal stride of 8 and an arbitrary vertical stride, a 4*2 block reads in each beat the 8 data at logical addresses (1,0), (9,0), (17,0), (25,0), (1,j), (9,j), (17,j), (25,j). The physical addresses obtained by mapping these 8 logical addresses with formulas (11) and (12) fall in different memory banks. The hatched squares represent an aligned row access, whose 8 physical addresses likewise lie in different memory banks. That is, as can be seen from Fig. 12, neither a non-aligned strided block access starting from an arbitrary address nor an aligned row access causes a memory access conflict.
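The four worked examples can be checked numerically with a short Python sketch. It evaluates the 8-bank mapping for each 4*2 block that starts at (1, 0) and for one aligned row of 8 consecutive elements. The assumptions are that "/" is integer division, that the second row of the block sits at j = h, and that h = 2 stands in for the "arbitrary" vertical stride; all function names are hypothetical.

```python
BANKS = 8  # M*N with M=4, N=2


def s_prime(s):
    """Exponent s' such that s = sigma * 2**s' with sigma odd."""
    return (s & -s).bit_length() - 1


def bank(i, j, s, h):
    """Bank number of element (i, j) under the simplified 8-bank mapping."""
    sp = s_prime(s)
    w = i + (j // h) * 4 * 2 ** sp
    return (w + (w // BANKS) % 2 ** sp) % BANKS


def block_banks(i0, j0, s, h, m=4, n=2):
    """Banks touched by an m*n block starting at (i0, j0) with strides s and h."""
    return [bank(i0 + k * s, j0 + r * h, s, h) for r in range(n) for k in range(m)]


def row_banks(i0, j, s, h, width=8):
    """Banks touched by `width` consecutive elements of row j starting at column i0."""
    return [bank(i0 + k, j, s, h) for k in range(width)]


for s in (3, 2, 4, 8):
    blk = block_banks(1, 0, s, h=2)
    row = row_banks(8, 0, s, h=2)
    print(s, blk, len(set(blk)) == BANKS, len(set(row)) == BANKS)
```

The check covers only the start address and strides used in the figures; it is a spot check of the examples, not a proof that every start address is conflict-free.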
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, this is not intended to limit the invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiment in accordance with the technical essence of the invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the invention.

Claims (10)

1. A storage method supporting conflict-free strided block access, characterized in that the steps comprise:
configuring a two-dimensional storage space in which each element has coordinates (i, j);
mapping each pixel of a two-dimensional image to different memory banks so as to support conflict-free strided block access, the mapping formula being:
$$f(w)=\begin{cases}\bigl(w+(w/(M*N))\,\%\,2^{s'}\bigr)\,\%\,(M*N), & s'\le n\\[4pt]\Bigl(w+\dfrac{w/(M*N)}{2^{s'-n}}\Bigr)\,\%\,(M*N), & s'>n\end{cases}\tag{1}$$
$$w = i + \bigl((j/h) + (j\,\%\,2) * i/(M*2^{s'})\,\%\,2 * (N/4) * 2\bigr) * M * 2^{s'}$$
In the formula, f(w) is the number of the memory bank obtained by the mapping; M and N are the numbers of elements that a data block contains in the horizontal and vertical directions, respectively, during strided block access, and M and N are both integer powers of 2; s and h are the horizontal stride and the vertical stride, respectively, during strided block access; and s = σ*2^s', where σ and 2 are coprime.
2. The storage method supporting conflict-free strided block access according to claim 1, characterized in that: after each pixel of the two-dimensional image has been mapped to a different memory bank, the internal address of each pixel within its memory bank is determined according to formula (2);
$$g(i,j) = i/(M*N) + j*\bigl(X_m/(2*M*N)\bigr) + i*\bigl(X_m*Y_m/(2*M*N)\bigr) \tag{2}$$
where g(i, j) is the internal address within the memory bank.
3. The storage method supporting conflict-free strided block access according to claim 2, characterized in that, when a read or write access needs to be performed, the specific steps of the access are:
S1. First-element bank number calculation: calculating the bank number b0 corresponding to the first element x0 according to formula (1);
S2. Shift information calculation: calculating the corresponding shift information shift_inf from the bank number b0 corresponding to the first element x0 and the horizontal stride s;
S3. Bank internal address calculation: calculating the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and calculating the bank internal addresses A of the elements to be accessed according to formula (2);
S4. Bank internal address sorting: sorting the elements of the bank internal addresses A according to the bank number b0 corresponding to the first element x0 and the horizontal stride s;
S5. Access execution: for a write request, performing position selection on the original data according to the shift information shift_inf and the horizontal stride s, and then writing the data into the different memory banks; for a read request, after the data is read out, performing position selection on it according to the shift information shift_inf and the horizontal stride s.
4. The storage method supporting conflict-free strided block access according to claim 3, characterized in that step S5 further includes a Busy signal generating step of detecting the state of the buffer and whether the access addresses conflict so as to produce a Busy signal.
5. A memory access device using the storage method supporting conflict-free strided block access according to any one of claims 1 to 4, characterized by comprising:
a first-element bank number calculation module for calculating the bank number b0 corresponding to the first element x0 according to formula (1);
a shift information calculation module for calculating the corresponding shift information shift_inf from the bank number b0 corresponding to the first element x0 and the horizontal stride s;
a bank internal address calculation module for calculating the two-dimensional address coordinates of the elements to be accessed from the two-dimensional address coordinates of the first element x0, the horizontal stride s, and the vertical stride h, and for calculating the bank internal addresses A of the elements to be accessed according to formula (2);
a bank internal address sorting module for sorting the elements of the bank internal addresses A according to the bank number b0 corresponding to the first element x0 and the horizontal stride s;
a memory access execution module for, in the case of a write request, performing position selection on the original data according to the shift information shift_inf and the horizontal stride s and then writing the data into the different memory banks, and, in the case of a read request, performing position selection on the data after it is read out according to the shift information shift_inf and the horizontal stride s.
6. The memory access device according to claim 5, characterized in that the first-element bank number calculation module comprises, connected in sequence, a memory access type judgment circuit for selecting on the horizontal stride s to obtain the memory access type, a modulo circuit for performing the modulo operation, and a bank number calculation circuit for calculating the bank number according to formula (1); after the memory access logical address and the horizontal stride s are input, they pass in turn through the memory access type judgment circuit, the modulo circuit, and the bank number calculation circuit, and the bank number b0 corresponding to the first element x0 is output.
7. The memory access device according to claim 6, characterized in that: the shift information calculation module comprises a stride grouping circuit and a multiplexer circuit connected to each other; the stride grouping circuit receives the horizontal stride s, divides the strides into different groups according to the horizontal stride s, and outputs the result to the multiplexer circuit; the multiplexer circuit receives the output of the stride grouping circuit and the bank number b0 corresponding to the first element x0, and outputs the shift information shift_inf for the different strides.
8. The memory access device according to claim 7, characterized in that: the bank internal address sorting module comprises a logical address calculation circuit and an intra-block address offset circuit; the logical address calculation circuit calculates the two-dimensional coordinates of each element to be accessed from the two-dimensional coordinates (i, j) of the first element x0 and the horizontal stride s; the intra-block address offset circuit calculates the offset of each element to be accessed inside each memory bank according to formula (2).
9. The memory access device according to claim 8, characterized in that: the memory access execution module comprises a buffer, a write data position selection circuit for performing position selection on write data, and a read data position selection circuit for performing position selection on read data; the write data position selection circuit receives the write shift information shift_inf and the horizontal stride s, performs position selection on the original data, and outputs the result; the read data position selection circuit receives the read shift information shift_inf and the horizontal stride s and, after the data is read out, performs position selection on it.
10. The memory access device according to any one of claims 5 to 9, characterized by further comprising a Busy signal generating circuit, the Busy signal generating circuit comprising: a read/write address detection circuit for detecting whether a read request and a write request access the same memory bank; a buffer empty/full detection circuit for detecting the empty/full state of the buffer; and a Busy generator for producing a Busy signal, the Busy generator issuing the Busy signal when the read/write address detection circuit detects a read/write conflict or when the buffer empty/full detection circuit detects that the buffer is full.
CN201710901233.2A 2017-09-28 2017-09-28 Storage method and access device supporting conflict-free stepping block-by-block access Active CN107748723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710901233.2A CN107748723B (en) 2017-09-28 2017-09-28 Storage method and access device supporting conflict-free stepping block-by-block access

Publications (2)

Publication Number Publication Date
CN107748723A true CN107748723A (en) 2018-03-02
CN107748723B CN107748723B (en) 2020-03-20

Family

ID=61256004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710901233.2A Active CN107748723B (en) 2017-09-28 2017-09-28 Storage method and access device supporting conflict-free stepping block-by-block access

Country Status (1)

Country Link
CN (1) CN107748723B (en)

Also Published As

Publication number Publication date
CN107748723B (en) 2020-03-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant