CN106293624A - A data address generation system and method - Google Patents

A data address generation system and method

Info

Publication number
CN106293624A
CN106293624A CN201510271803.5A
Authority
CN
China
Prior art keywords
address
data
instruction
step length table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510271803.5A
Other languages
Chinese (zh)
Inventor
林正浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Original Assignee
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority to CN201510271803.5A
Priority to PCT/CN2016/083018 (WO2016188392A1)
Publication of CN106293624A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a method and system for automatically learning, recording, and generating data addresses. Applied in the field of processors, it sends the data needed by an instruction to the processor core in advance, before the processor core executes the corresponding data read instruction, so that the data is ready for use; it can also predict the data address produced the next time the instruction is executed and fill the corresponding data into the data cache, thereby reducing cache misses.

Description

A data address generation system and method
Technical field
The present invention relates to the fields of computers, communications, and integrated circuits.
Background technology
In a processor system, the role of the cache is to replicate part of the contents of main memory so that the processor core can access those contents quickly, ensuring the continuous operation of the pipeline. Caches are generally divided into instruction caches and data caches. Instruction addresses have good locality, so instruction cache hit rates are very high; however, the data addresses produced when executing data access instructions have poorer locality, so data cache hit rates are not as high.
However, the data addresses of data access instructions located in loop code do follow a certain rule. In general, each time a particular data access instruction in a loop is executed, its corresponding data address increases by a constant (which may be positive, negative, or zero). This constant is the data stride of the corresponding data access instruction. Clearly, when the same data access instruction is executed twice in succession, subtracting the earlier data address from the later one yields the data stride, and adding the data stride to the later data address yields the predicted data address for the next execution of that instruction. The corresponding data can therefore be prefetched in advance from external memory into the data cache according to the predicted data address. When the data access instruction is actually executed, if the actual data address equals the predicted data address, the data cache necessarily hits; if the two differ, whether the corresponding data hits in the data cache is determined by the actual data address.
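The subtract-then-add relationship described above can be sketched in a few lines. This is an illustrative model only, not part of the patent; the function name and addresses are invented:

```python
def predict_next_address(prev_addr, curr_addr):
    """Subtract the earlier data address from the later one to get the
    data stride, then add the stride to the later address to predict the
    data address of the next execution of the same instruction."""
    stride = curr_addr - prev_addr        # may be positive, negative, or zero
    return curr_addr + stride

# A load walking a 4-byte-element array: 0x1000, 0x1004 -> predict 0x1008
print(hex(predict_next_address(0x1000, 0x1004)))   # 0x1008
```

If the predicted address turns out to equal the actual one, the prefetched line is guaranteed to hit; otherwise the actual address decides the hit as usual.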
Although the data stride is fixed in most cases, it can sometimes change. For example, in a two-level loop nest, when the inner loop code is executed repeatedly, the code executed is identical each time, so the increment of the data address (i.e., the data stride) is also usually identical. But once the outer loop code is executed, an additional increment is applied to the data address, so the difference between the current and previous data addresses is no longer the earlier data stride. When the inner loop is executed again, however, the data address increment returns to its earlier value. In this situation, if only one data stride is recorded, an incorrect predicted data address is produced whenever the loop level changes, which limits the improvement in the data cache hit rate.
The method and system apparatus proposed in the present invention can directly solve one or more of the above or other difficulties.
Summary of the invention
The present invention proposes a data address generation system and method that can generate data addresses ahead of the processor core and use them to access the data memory, reading data for the processor core to process. The data address generation system and method learn the address of each data access instruction executed by the processor core and the corresponding data address produced when the processor core executes that instruction, as well as the address increment between the data addresses produced by two consecutive executions of the same data access instruction, and store them in a step length table.
The step length table is a data structure consisting of a one-dimensional part and a two-dimensional part. The one-dimensional part is addressed by the instruction address of the data access instruction, and its content is a data address. In the two-dimensional part, one dimension is addressed by the instruction address of the data access instruction and the other by the instruction address of a branch instruction, and its content is a data address increment (stride). The step length table thus maps the instruction address of a data access instruction to the corresponding data address. This mapping is not fixed; it is a dynamic mapping that changes with the number of executions of the data access instruction and with its looping path. The system and method further take the instruction address of a successfully taken backward branch instruction as an upper bound and the corresponding branch target instruction address as a lower bound, automatically update the data addresses of the data access instructions whose instruction addresses lie between the bounds by the stride, according to the loop state of the current branch, access the data memory with the updated data addresses, and read the corresponding data to supply to the processor core.
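As a rough software analogue of the two parts of the step length table described above (an assumed, greatly simplified structure; the class and dictionary names are invented):

```python
class StepLengthTable:
    """Sketch of the step length table: a one-dimensional map from the
    data-access-instruction address to the last data address, plus a
    two-dimensional map from (data-access-instruction address, taken
    backward-branch address) to the stride for that loop level."""

    def __init__(self):
        self.last_addr = {}   # load PC -> last data address
        self.stride = {}      # (load PC, branch PC) -> stride

    def record(self, load_pc, data_addr, branch_pc=None):
        # Learn the stride for the loop level identified by branch_pc
        if load_pc in self.last_addr and branch_pc is not None:
            self.stride[(load_pc, branch_pc)] = data_addr - self.last_addr[load_pc]
        self.last_addr[load_pc] = data_addr

    def predict(self, load_pc, branch_pc):
        # Last address plus the stride of the current loop level
        if load_pc in self.last_addr and (load_pc, branch_pc) in self.stride:
            return self.last_addr[load_pc] + self.stride[(load_pc, branch_pc)]
        return None   # not yet learned
```

Because the stride is keyed by both the load and the branch, the same load instruction can carry a different stride in each loop level, which is the dynamic mapping the text describes.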
It is an object of the present invention to provide a data address generation system and method that generates data addresses in advance, hiding the latency of accessing the data memory and masking data cache misses.
To this end, the invention provides a data address generation system, including:
a step length table for storing data addresses and address increments; the data address generation system learns and records, indexed by the data access instruction address, the data address produced when the processor core executes a data access instruction, and the data address increment between two executions of that instruction, and stores them in the step length table;
the data address generation system addresses the step length table contents with the data access instruction address to produce a new data access address, accesses the data memory with it, and obtains data for the processor core.
Optionally, the table content of the step length table is the data address; the step length table is addressed by the data access instruction address.
Optionally, the table content of the step length table is the data address increment; one dimension of the step length table is addressed by the data access instruction address; the other dimension of the step length table is addressed by the instruction address of a successfully taken backward branch instruction.
Optionally, the data address generation system produces addresses by the following method:
the data address generation system adds the data address and the data address increment stored in the step length table to produce a new data address; the new data address is then saved back into the step length table.
Optionally, the system operation includes:
the data address generation system accesses the step length table contents of the data access instruction addresses lying between the address of a successfully taken backward branch instruction and its branch target instruction address; the data address generation system produces a new data memory address according to the step length table contents; the data address generation system accesses the data memory with the new data memory address, obtaining data for the processor core to process; the new data address is then saved back into the step length table.
The invention also provides a data address generation method, comprising the following steps:
learning, indexed by the data access instruction address, the data address produced when the processor core executes a data access instruction;
learning, indexed by the data access instruction address, the data address increment between two executions of the same data access instruction by the processor core;
recording the above data address and data address increment in a step length table;
addressing the step length table contents with the data access instruction address to produce a new data access address, accessing the data memory with it, and obtaining data for the processor core.
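The steps above can be sketched as a single update routine, reduced for illustration to one loop level (all names are assumptions, not the patent's implementation):

```python
def step(table, load_pc, actual_addr):
    """One execution of a data access instruction: learn the address and
    stride (the learning and recording steps), then return the predicted
    next address used to pre-access the data memory (the final step)."""
    entry = table.setdefault(load_pc, {"addr": None, "stride": None})
    if entry["addr"] is not None:
        entry["stride"] = actual_addr - entry["addr"]    # learn increment
    entry["addr"] = actual_addr                           # record address
    if entry["stride"] is not None:
        return actual_addr + entry["stride"]              # predicted next address
    return None                                           # nothing learned yet

table = {}
step(table, 0x40, 0x2000)                 # first execution: learn address only
print(hex(step(table, 0x40, 0x2008)))     # stride 8 learned -> 0x2010
```

The first execution can only record the address; from the second execution on, a stride exists and a prediction is produced.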
Optionally, the table content of the step length table is the data address; the step length table is addressed by the data access instruction address.
Optionally, the table content of the step length table is the data address increment; one dimension of the step length table is addressed by the data access instruction address; the other dimension of the step length table is addressed by the instruction address of a successfully taken backward branch instruction.
Optionally, the data address generation system produces addresses by the following method:
the data address generation system adds the data address and the data address increment stored in the step length table to produce a new data address; the new data address is then saved back into the step length table.
Optionally, the method includes:
the data address generation system accesses the step length table contents of the data access instruction addresses lying between the address of a successfully taken backward branch instruction and its branch target instruction address; the data address generation system produces a new data memory address according to the step length table contents; the data address generation system accesses the data memory with the new data memory address, obtaining data for the processor core to process; the new data address is then saved back into the step length table.
Those skilled in the art may also, guided by the description, claims, and drawings of the present invention, comprehend and understand other aspects of its content.
Beneficial effect
The system and method of the present invention can, before the processor core is about to execute a data read instruction, generate the data address in advance, read the data from the data memory, and send it to the processor core; when the processor core needs to read the data, it can take it directly, hiding the latency of accessing the data memory and masking data cache misses.
Further, the system and method of the present invention can automatically learn and record the data addresses produced by the processor core and their increments under different instruction loops, and automatically adjust the increment used to generate data addresses according to the loop level, making the generated data addresses more accurate.
Other advantages and applications of the present invention will be apparent to those skilled in the art.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of data access instructions in loops according to the present invention;
Fig. 2 is an embodiment of the loop stride storage module of the present invention;
Fig. 3 is an embodiment of the data address generation system of the present invention;
Fig. 4 is an embodiment of the row decoder of the step length table of the present invention;
Fig. 5 is a block diagram of a processor system using the data address generation system of the present invention.
Detailed description of the invention
The data buffering system and method proposed by the present invention are described in further detail below in conjunction with the drawings and specific embodiments. The advantages and features of the invention will be apparent from the following description and claims. It should be noted that the drawings all use a greatly simplified form and imprecise proportions, only to conveniently and clearly aid in illustrating the embodiments of the present invention.
It should be noted that, in order to clearly demonstrate the content of the present invention, multiple embodiments are provided to explain its different implementations; these embodiments are illustrative rather than exhaustive. In addition, for brevity, content already noted in an earlier embodiment is often omitted in later embodiments, so content not mentioned in a later embodiment can be found by referring to the earlier ones.
Although this invention may be modified and substituted in various forms, the description lists some specific embodiments and illustrates them in detail. It should be understood that the inventor's intent is not to limit the invention to the specific embodiments illustrated; on the contrary, the intent is to protect all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims. The same component numbers may be used throughout the drawings to denote the same or similar parts.
In general, a data access instruction may be located within a multi-level loop nest. Each time a loop at the same level is executed, the corresponding data stride is identical, but when loops at different levels are executed, the corresponding data strides differ. For example, for a data access instruction located in a two-level loop, the data address increases by '4' each time the inner loop is executed, i.e., the data stride is '4'; but the data address increases by '20' each time the outer loop is executed, i.e., the data stride is '20'. Using either '4' or '20' alone as the data stride of this instruction would cause some data address mispredictions. According to the technical solution of the present invention, based on the relationship between branch instructions and data access instructions, different data strides can be assigned to the different loop levels in which the same data access instruction is located, making data address prediction more accurate.
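The two-stride situation in the example above can be reproduced with a short address trace. The memory layout here is invented for illustration: rows of 32 bytes, of which the load touches four 4-byte elements per inner loop:

```python
# Hypothetical trace of one load in a two-level loop: within the inner
# loop the address increment (stride) is 4; at each outer-loop
# transition the increment seen between consecutive executions is 20.
addrs = [0x100 + i * 32 + j * 4 for i in range(3) for j in range(4)]
diffs = [b - a for a, b in zip(addrs, addrs[1:])]
print(diffs)   # [4, 4, 4, 20, 4, 4, 4, 20, 4, 4, 4]
```

A predictor holding only the last observed difference would mispredict at every level change, which is exactly the case the per-level stride storage addresses.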
Referring to Fig. 1, it is a schematic diagram of data access instructions in loops according to the present invention. In Fig. 1, the instructions are arranged from left to right in address order; instructions 11, 12, and 13 are all data access instructions, and instructions 21, 22, and 23 are all backward-jumping branch instructions. The instructions between each of these three branch instructions and its branch target instruction therefore form a loop. As shown in Fig. 1, a three-level nested loop is formed in total, where the loop corresponding to branch instruction 21 is the innermost loop and the loop corresponding to branch instruction 23 is the outermost loop. Each data access instruction in this code segment can thus be given its own loop stride storage module, providing different data strides when loops at different levels are executed.
Referring to Fig. 2, it is an embodiment of the loop stride storage module of the present invention. For ease of description, the present embodiment explains how a data stride is supplied by the loop stride storage module; how the data strides are stored into the loop stride storage module is further described in the Fig. 3 embodiment. In the present embodiment, the loop stride storage module corresponds to a three-level loop and consists of registers 31, 32, and 33 and selectors 41, 42, and 43. Register 31 and selector 41 correspond to the first-level loop (the innermost loop), register 32 and selector 42 correspond to the second-level loop, and register 33 and selector 43 correspond to the third-level loop (the outermost loop).
In the present invention, every data access instruction whose data address is predicted corresponds to a loop stride storage module as in Fig. 2. Taking the loop stride storage module corresponding to data access instruction 12 (which is located in the three-level loop) as an example: register 31 stores the data stride and valid bit corresponding to execution of the first-level loop (the loop of branch instruction 21); register 32 stores the data stride and valid bit corresponding to execution of the second-level loop (the loop of branch instruction 22); register 33 stores the data stride and valid bit corresponding to execution of the third-level loop (the loop of branch instruction 23). Since each data access instruction has its own loop stride storage module, and the corresponding registers in each module can store different values, each data access instruction can have a different data stride in each loop level it is located in. The initial value of the valid bit is '0'.
In the present embodiment, the three loop levels have a priority relationship. For example, once the first-level loop occurs, the second- and third-level loops need not be considered, and the data stride corresponding to the first-level loop (the value of register 31) is output. On the other hand, the second-level loop can be entered only when the first-level loop does not occur; and once the second-level loop occurs, the third-level loop need not be considered, and the data stride in register 32 is output. Similarly, the third-level loop can be entered only when neither the first- nor the second-level loop occurs; once it occurs, the data stride in register 33 is output. If none of the three loop levels occurs (indicating that this code segment is being executed for the first time, or is being executed again because of a loop further out), the default data stride from bus 35 is output. Thus, based on the loop stride storage module shown in the Fig. 2 embodiment, the branch decision signals output by the processor core, indicating whether the branch transfers of branch instructions 21, 22, and 23 occur, control the corresponding selectors, so that the corresponding data stride can be output.
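The priority relationship described above can be modeled as a small selection function (an illustrative sketch of the selector chain, not a hardware description; names and values are invented):

```python
def select_stride(regs, taken, default):
    """Priority selection for the loop stride storage module: the
    innermost taken loop wins; an invalid entry or no taken loop falls
    back to the default stride from the bus.
    regs:  list of (valid, stride) pairs per loop level, innermost first
    taken: list of branch-taken flags, innermost first"""
    for (valid, stride), t in zip(regs, taken):
        if t:
            return stride if valid else default
    return default

regs = [(True, 4), (True, 20), (False, 0)]        # levels 1..3
print(select_stride(regs, [True, False, False], 0))   # 4  (inner loop)
print(select_stride(regs, [False, True, False], 0))   # 20 (second level)
```

Extending the list with more (register, selector) pairs corresponds to supporting more loop levels, as the text notes later.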
For example, for the code shown in Fig. 1, when data access instructions 11, 12, and 13 are executed for the first time, branch instructions 21, 22, and 23 have not yet been executed, so none of the corresponding branch transfers has occurred; that is, the selection signals of selectors 41, 42, and 43 in the corresponding loop stride storage modules are all '0', and every module outputs the default data stride from bus 35.
When branch instruction 21 takes its branch for the first time, the first-level loop is entered and data access instructions 12 and 13 are executed again, i.e., data access instructions 12 and 13 are executed a second time. The valid bit read from register 31 at this point is '0', so the data stride stored in register 31 is invalid. The data stride is then calculated and stored into register 31 under control of the branch decision signal of branch instruction 22, and the valid bit in register 31 is set to '1'.
Assuming that thereafter the branch transfer of branch instruction 21 always occurs (i.e., the first-level loop is executed), the selection signal of selector 41 in the corresponding loop stride storage module is '1', and the loop stride storage modules corresponding to data access instructions 12 and 13 each output the valid bit and data stride in their register 31. Since the valid bit is now '1', this data stride can be used to calculate the corresponding predicted data address. Meanwhile, the data stride is recalculated by the preceding method and stored into register 31 under control of the branch decision signal of branch instruction 22, the valid bit remaining '1'. Thus, if the data stride does not change between two consecutive executions of the same data access instruction, the value in register 31 is unchanged; if the data stride has changed, the value in register 31 is updated to the new data stride value.
When branch instruction 21 is executed again and its branch transfer does not occur, while the branch transfer of branch instruction 22 does occur, the selection signals of selectors 41 and 42 in the corresponding loop stride storage modules are '0' and '1' respectively, and the loop stride storage modules corresponding to data access instructions 11, 12, and 13 each output the data stride in register 32 to calculate the corresponding predicted data address. In this way, in loops of different levels, different register values in the loop stride storage module are used as the data stride.
In addition, for data access instruction 11, if the branch transfer of branch instruction 21 occurs, the selection signal of selector 41 is '1'. Although the data stride in register 31 is then output, the first-level loop does not contain data access instruction 11, so this data stride is ignored. In other cases, the data stride in register 32 or 33 is output by the operations described above, or the default data stride from bus 35 is output, so that different loops are given different data strides.
Each loop stride storage module of the present invention, which supplies data strides for a data access instruction in a loop, corresponds to one data access instruction. By extending the loop stride storage module with more registers and selectors (each register and its corresponding selector together corresponding to one loop level), more loop levels can be supported; by providing one such loop stride storage module for every (or some) data access instruction in a loop of more levels, the function of supplying a more accurate data stride, according to the loop execution status, to each (or some) data access instruction in the multi-level loop can be implemented.
Referring to Fig. 3, it is an embodiment of the data address generation system of the present invention. It comprises what is collectively called the step length table: a storage array 52 corresponding to a plurality of data access instructions, a column decoder 50 that decodes according to branch decisions, a row decoder 54 that decodes according to the instruction addresses of data access instructions, and an address generator 60. Address generator 60 consists of subtractor 61, adder 62, selector 63, and comparator 64. Each row of storage array 52 corresponds to one data access instruction. In row decoder 54, a register is provided for each row of array 52 to store the instruction address of the data access instruction corresponding to that row, together with a comparator that compares the register content with the instruction address on data access instruction address bus 53. When the register content of a row matches the address on bus 53, decoder 54 enables the word line of that row, so that the row can be read or written. Array 52 has two read/write ports. One read/write port (37, 38) is dedicated to the rightmost column 58 of array 52; access to this column is controlled only by row decoder 54, and the entries of this column store the data address 67 of a data access instruction and a valid bit 68. The other read/write port (34, 36) is shared by every column of array 52 other than column 58, such as column 56; through it, the entry at the row selected by row decoder 54 and the column selected by column decoder 50 can be accessed; these entries store the data stride 65 corresponding to one of the plurality of data access instructions in a specific loop level, and a valid bit 66. The number of columns of array 52 other than column 58 corresponds to the maximum number of loop levels this stride storage array can support. In column decoder 50, a comparator and an address register are provided for each column of array 52, storing the instruction address of each branch instruction. When the processor core makes a 'branch taken' decision, the instruction address of the corresponding branch instruction is sent via bus 51 into decoder 50 for matching, and the column of array 52 (other than column 58) corresponding to the matching register can be accessed through read/write port 34, 36.
The processor core executes instructions in order; as can be seen from Fig. 1, execution starts from the leftmost instruction and proceeds from left to right. When the processor core decodes instruction 11 and finds that it is a data load instruction, the instruction address of instruction 11 is sent via bus 53 to row decoder 54 for matching. The match now misses, so the row replacement logic in decoder 54 allocates a row for instruction 11, i.e., the upper row 11 of array 52 in Fig. 3, and stores the instruction address of instruction 11 into the register in decoder 54 corresponding to that row. The valid bits 66 and 68 of every column in this row are set to '0'. Since the valid bit 68 in column 58 is '0', the data address generation system lets the processor core execute data load instruction 11, produce a data address, access the data memory (e.g., the data cache), and read the data for the processor core to execute. At the same time, the system stores this data address, via bus 57, selector 63, and write port 38, into field 67 of the entry at row 11, column 58 of array 52, and sets the valid bit 68 in that entry to '1'. Thereafter the processor core executes subsequent instructions; the row replacement logic in decoder 54 likewise allocates one row each for data load instructions 12 and 13, i.e., the middle and lower rows 12 and 13 of array 52 in Fig. 3, and as above stores the data addresses produced by the processor core executing instructions 12 and 13 into field 67 of the entries at rows 12 and 13, column 58, respectively, setting the valid bit 68 in each entry to '1'.
Thereafter the processor core executes branch instruction 21, whose branch is decided as 'taken'. The processor core therefore jumps backward to the branch target instruction between instruction 11 and instruction 12, and starts executing in instruction order from that instruction. Meanwhile, the instruction address of branch instruction 21 is sent via bus 51 to column decoder 50 for matching. If there is no match, the column replacement logic in decoder 50 allocates a column of array 52 for it (column 21 in the figure) and stores the instruction address of branch instruction 21 into the register in decoder 50 corresponding to that column. Bus 51 always holds the instruction address of the most recent successful branch; the address on bus 51 now matches the register in decoder 50 corresponding to column 21, so column decoder 50 selects column 21.
The processor core continues executing instructions in program order and executes instruction 12 again; when decoding instruction 12 it finds that it is a data load instruction. The instruction address of instruction 12 is therefore sent via bus 53 to row decoder 54 for matching; the match now hits, the valid bit 68 read via read port 37 from row 12, column 58 of array 52 is '1', and the valid bit 66 read via read port 36 from row 12, column 21 is '0'. Since these two valid bits are '10', the system lets the processor core execute data load instruction 12, produce a data address, access the data memory, and read the data for the processor core to execute. At the same time, the system sends this data address via bus 57 to address generator 60, where subtractor 61 subtracts from it the previous data address 67 now read via read port 37 from row 12, column 58; the difference is written into array 52 via write port 34 as the address increment (stride). Since bus 53 now holds the address of data load instruction 12 and bus 51 holds the address of branch instruction 21, this stride is stored into the stride field 65 of the entry at row 12, column 21 of array 52. The system also sets the valid bit 66 in row 12, column 21 to '1'. In the same manner, the system stores the stride of data load instruction 13 into field 65 of row 13, column 21 of array 52 and sets the valid bit 66 of row 13, column 21 to '1'.
Thereafter the processor core executes branch instruction 21 again, and its branch is again decided as 'taken'. The processor core therefore jumps backward to the branch target instruction between instruction 11 and instruction 12 and starts executing in instruction order from that instruction. Meanwhile, the instruction address of branch instruction 21 is sent via bus 51 to column decoder 50 for matching. The match now hits, so column decoder 50 selects column 21.
The processor core continues in program order and executes instruction 12 again; decoding reveals that it is a data load instruction. Its instruction address is sent over bus 53 to row decoder 54 and the match hits; the valid bit 68 read from row 12, column 58 through read port 37 is '1', and the valid bit 66 read from row 12, column 21 through read port 36 is '1'. Given this '11' combination, the system uses adder 62 to add the data address 67 read from row 12, column 58 through read port 37 to the stride 65 read from row 12, column 21 through read port 36. The sum, on bus 38, is used as the new data address to access the data memory, and the data read out are sent to the processor core for processing. The data address 57 that the processor core generates while executing instruction 12 is compared with the data address on bus 38 by comparator 64. If the two are equal, the processor core continues executing subsequent instructions in program order; the output of adder 62 is also written back via bus 38 into field 67 of row 12, column 58, and valid bit 68 of that entry stays '1'. If the two differ, the system makes the processor core discard the intermediate results obtained from the data fetched at the address on bus 38, re-execute load instruction 12 with the data obtained at data address 57, and then continue in program order. The system also makes selector 63 select the address on bus 57 to be stored via bus 38 into field 67 of the entry at row 12, column 58, keeping valid bit 68 of that entry at '1'; but it clears valid bit 66 of row 12, column 21 to '0', thereby marking the stride recorded in that entry invalid so that it must be relearned. Instruction 13 is handled by the system in the same way.
Thereafter the processor core executes branch instruction 21 again, and this time the branch is resolved as 'not taken', so the processor core continues executing in program order. It then executes branch instruction 22, whose branch is resolved as 'taken'. The processor core jumps backward to the branch target instruction before instruction 11 and resumes executing in program order from that instruction. At the same time the instruction address of branch instruction 22 is sent over bus 51 to column decoder 50 for matching. Since no match is found, the column replacement logic in decoder 50 allocates a column of array 52 (column 22 in the figure) and stores the instruction address of branch instruction 22 into the register of that column in decoder 50. The address now held on bus 51 matches the register of column 22 in decoder 50, so column decoder 50 selects column 22.
The processor core continues in program order and executes instruction 11 again; decoding reveals that it is a data load instruction. Its instruction address is sent over bus 53 to decoder 54 and the match hits; the valid bit 68 read from row 11, column 58 of array 52 through read port 37 is '1', and the valid bit 66 read from row 11, column 22 through read port 37 is '0'. The system therefore proceeds as in the earlier case where the data address 67 is valid but the stride 65 is not: it reads data from the data memory at the address the processor core produces, for the core to process; it subtracts the data address 67 read from column 58 through read port 37 from the data address delivered over bus 57, stores the difference as the stride through the write port into field 65 of row 11, column 22, and sets bit 66 to '1'. The system also stores the address on bus 57 into field 67 of the row-11, column-58 entry and keeps valid bit 68 of that entry unchanged at '1'. The system processes instructions 12 and 13 by the same method, and the operations described above then repeat cyclically.
In summary, each row of array 52 stores the data memory address, strides, and corresponding valid bits of one data access instruction. Each column of array 52 corresponds to one taken branch instruction and stores the stride and valid bit of each data access instruction with respect to that branch, while the dedicated column 58 stores the memory addresses and is read and written regardless of branch state. The system selects a row of array 52 via bus 53 using the instruction address of the data access instruction and reads valid bit 68 in its column-58 entry; it selects a column via bus 51 using the branch instruction address of the most recent taken branch and reads valid bit 66 there. According to the states of valid bits 68 and 66 in a row, the system has three operating modes. When the bits read out are '00', neither the data address nor the stride is valid; the system stores the data address 57 produced by the processor core into field 67 of the row's column-58 entry and sets the state to '10'. When the bits read out are '10', the data address is valid but the stride is not; the system computes the difference between the data address 57 produced by the processor core and the data address held in field 67 of the row's column-58 entry, stores it into field 65 of the column selected by the current branch resolution, and sets the state to '11'. When the bits read out are '11', both the data address and the stride are valid; the system adds the data address in field 67 of column 58 to the stride in field 65 of the column selected by the branch resolution to produce data address 38, accesses the data memory, and reads the data for the processor core. In this last case the system also compares the data address 57 produced by the processor core with data address 38 and, depending on the result, may take corrective action and update the state of bit 66.
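The three operating modes above can be sketched in software. This is a minimal illustrative model only: the dict-based table, the function name, and the field names are assumptions for the sketch, not the patent's hardware.

```python
def access(table, load_pc, branch_pc, core_addr):
    """One data-load access; implements the '00', '10' and '11' modes.

    table maps a load instruction address to a row holding the last data
    address (field 67 / valid bit 68) and per-branch strides (65 / 66).
    Returns the data address used to fetch the data.
    """
    row = table.setdefault(load_pc, {"addr": 0, "addr_v": 0, "strides": {}})
    st = row["strides"].get(branch_pc)
    if row["addr_v"] and st and st["v"]:          # '11': predict
        predicted = row["addr"] + st["step"]
        if predicted == core_addr:                # comparator 64 agrees
            row["addr"] = predicted               # write back field 67
            return predicted
        st["v"] = 0                               # mispredict: relearn stride
        row["addr"] = core_addr
        return core_addr
    if row["addr_v"]:                             # '10': learn the stride
        row["strides"][branch_pc] = {"v": 1,
                                     "step": core_addr - row["addr"]}
        row["addr"] = core_addr
        return core_addr
    row["addr"], row["addr_v"] = core_addr, 1     # '00': record first address
    return core_addr
```

Three accesses of the same load inside one loop, with addresses 4 apart, walk the entry through the '00', '10' and '11' states; a fourth access with a different increment invalidates the stride, as in the misprediction case above.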
The stride-based prefetching system described above can be used in several ways. In the first way, when the data read at a data address 38 produced from a row of array 52 are accepted by the processor core (i.e., the address on bus 38 equals the address on bus 57), the system immediately adds the stride on read port 36 to the address on bus 38 and sends the sum to the data cache as a guessed data address for matching. If it does not match, a fill from the lower memory hierarchy into the higher-level cache can be started. The processor core still produces data addresses over bus 57 and reads data from the data cache. In most cases this mode can hide part or all of a data cache miss. However, the guessed data address is based on the stride chosen by the result of the last taken branch and is issued before the next taken branch, whereas the data the processor core will actually read the next time it executes the same load instruction come after that next taken branch. The two consecutive taken branches are not necessarily the same branch instruction, so the stride used to produce the guessed address is not necessarily the same as the increment of the data address the processor core actually produces. In the second way, after a taken branch, the system uses the addresses of the load instructions executed before it to read, from array 52, the entries in the column selected by that branch together with the column-58 entries; when an entry's state 68, 66 is '11', it produces data address 38, reads the data from the data memory, and stores them in a data read buffer with shorter read latency, ready for the processor core to take. An embodiment of the second way follows below.
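The first way can be sketched as follows, with the cache and the lower-level memory modeled as plain dicts (an assumption for illustration only): after the accepted access, the guessed address is probed in the cache and a fill is started on a miss.

```python
def prefetch_guess(addr, stride, cache, memory):
    """Probe the cache at the guessed address (current address + stride,
    as on bus 38 plus read port 36) and fill it from the lower level on
    a miss. Returns the guessed address."""
    guess = addr + stride
    if guess not in cache:           # tag match failed: start the fill early
        cache[guess] = memory[guess]
    return guess
```

The core still reads through the cache at its own address; the guess only warms the cache ahead of time.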
Referring to Fig. 4, an embodiment of the row decoder of the step length table of the present invention is shown. Fig. 4 shows the decoding logic for one row of array 52 within row decoder 54, where 72 and 74 are registers, 76 is a selector, and 78 is a comparator. One input of comparator 78 receives the output of register 72, and the other input receives the output of selector 76. Besides equality comparison, comparator 78 can also perform greater-than-or-equal and less-than comparisons, three comparison modes in all. The comparison mode of comparator 78 is linked to the selection of selector 76: when selector 76 selects bus 73 as input, comparator 78 tests whether the address stored in register 72 is greater than or equal to the address on bus 73; when selector 76 selects bus 51, comparator 78 tests whether the address in register 72 is less than the address on bus 51; when selector 76 selects bus 53, comparator 78 tests whether the address in register 72 is equal to the address on bus 53. Output 75 is the comparison result of comparator 78.
Bus 53 carries the instruction address of a data load instruction output by the processor core. This address is compared for equality with the content of register 72 in every row of row decoder 54 by each row's comparator 78. If none matches, the row replacement logic allocates a row in decoder 54 for this load instruction and stores the address on bus 53 into that row's register 72. Thereafter the same address on bus 53 matches the address in that row's register 72 and enables the row's word line; this operation is detailed in the embodiment of Fig. 3. Now, when a backward branch instruction takes its branch, the system puts the instruction address of that branch on bus 51 and the instruction address of its branch target on bus 73. The system makes the register-72 content of every row in decoder 54 perform, in turn, the greater-than-or-equal comparison against the address on bus 73 and the less-than comparison against the address on bus 51, and the comparison result 75 is latched into register 74. When the address stored in a row's register 72 is greater than or equal to the address on bus 73 but less than the address on bus 51, the load instruction of that row lies between the branch instruction (exclusive) and the branch target instruction (inclusive); for example, in Fig. 1, instructions 12 and 13 lie in the loop of branch instruction 21, and that row's register 74 is written with result '1'. Rows whose comparison does not satisfy this condition are not between the branch target and the branch; for example, in Fig. 1, instruction 11 is not in the loop of branch instruction 21, and that row's register 74 is written with result '0'.
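The two linked comparisons amount to a half-open range test. A sketch, with numeric instruction addresses standing in for the labels of Fig. 1 (the concrete values are illustrative assumptions):

```python
def in_loop(load_pc, target_pc, branch_pc):
    """Register-74 flag for one row: '1' iff the load lies between the
    branch target (inclusive, >= bus 73) and the branch (exclusive,
    < bus 51)."""
    return target_pc <= load_pc < branch_pc

# Loads 11, 12, 13; branch 21 jumps back to just before instruction 12,
# so only 12 and 13 are inside its loop body.
flags = {pc: in_loop(pc, target_pc=12, branch_pc=21) for pc in (11, 12, 13)}
```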
Thereafter the system enables in turn the word line of every row whose register 74 holds '1', reading from array 52 the entry of that row in column 58 together with the entry of that row in the column that column decoder 50 has selected according to the branch instruction address on bus 51. If the states 68, 66 are '11', the system adds the stride in field 65 to the previous iteration's data address in field 67, obtains a new data address, addresses the data memory via bus 38, and reads the data for later use by the processor core; the new data address is also written back into field 67. Rows whose register 74 holds '0' are not operated on. Later, when the processor core executes a data load instruction, it sends the load's instruction address over bus 53 to be compared for equality with each row's register 72 in decoder 54, as before. The system inspects register 74 of the matching row. If its content is '0', the system reads states 68, 66 of that row and operates on them as described earlier. If its content is '1', the system reads the data address in field 67 of that row and compares it with the data address the processor core delivers over bus 57. If the two addresses are equal, the system resets that row's register 74 to '0' and takes no further action. If the two addresses differ, the system resets that row's register 74 to '0' and proceeds as in the earlier case where the address in field 67 and the address on bus 57 are unequal.
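The word-line sweep over flagged rows can be sketched as below; the rows, flags, memory, and read buffer are illustrative dicts, not the patent's arrays.

```python
def sweep(rows, flags, branch_pc, memory, read_buffer):
    """For every row flagged '1' in register 74 whose state is '11',
    advance the address by the stride, read the data, and stage them in
    the read buffer; the advanced address is written back to field 67."""
    for pc, row in rows.items():
        st = row["strides"].get(branch_pc)
        if flags.get(pc) and row["addr_v"] and st and st["v"]:
            row["addr"] += st["step"]                 # new address on bus 38
            read_buffer[pc] = memory[row["addr"]]     # stage for the core
```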
Referring to Fig. 5, a block diagram of a processor system using the data address generation system of the present invention is shown. Here 50 is the column decoder of the step length table, 52 is the step length table array, 54 is the row decoder of the step length table, 60 is the address generator, 80 is the processor core, 82 is the data read buffer, and 84 is the data memory. Bus 51 is the branch instruction address bus for taken branches, output by processor core 80 to column decoder 50 and row decoder 54. Bus 73 is the branch target instruction address bus, output by processor core 80 to decoder 54. Bus 53 is the data access instruction address bus, output by processor core 80 to decoder 54. Bus 57 is a data address bus, output by processor core 80 to address generator 60 and data memory 84. Bus 38 is a data address bus, output by address generator 60 to data memory 84. Bus 85 delivers data from data memory 84 to data read buffer 82 for staging, and bus 87 delivers data from buffer 82 to processor core 80. The number of columns of the step length table (50, 52, 54) determines how many loop levels or loop counts it can handle; the number of rows determines how many data access instructions it can handle.
The step length table is a one-dimensional data structure (column 58) plus a two-dimensional one. The one-dimensional structure is addressed by the instruction address of a data access instruction, and its content is a data address. In the two-dimensional structure, one dimension is addressed by the instruction address of a data access instruction and the other by the instruction address of a branch instruction, and its content is a data address increment (stride). The step length table thus maps the instruction address of a data access instruction to its corresponding data address. This mapping is not fixed: it is a dynamic mapping that changes with the number of times the data access instruction has executed and with its looping path. The system allocates one-dimensional storage resources (rows) in the data structure of the step length table using the data access instruction addresses supplied by processor core 80 over bus 53, and provides initial content (data addresses) to the allocated rows from the corresponding data addresses supplied by core 80 over bus 57. It allocates the other one-dimensional resources (columns) using the branch instruction addresses supplied by core 80 over bus 51, and stores into each column the difference between the data address supplied again by core 80 over bus 57 and the initial content already in the step length table. Afterwards, the system sends the instruction address of each taken backward branch over bus 51 as an upper limit and the corresponding branch target instruction address over bus 73 as a lower limit, so that the step length table and address generator 60 automatically update, with the corresponding strides selected by the current branch, the data addresses of the data access instructions whose instruction addresses lie between the lower and upper limits, access data memory 84 over bus 38 with the updated addresses, read the corresponding data, and store them into data read buffer 82 before processor core 80 outputs the corresponding data address 57.
Data read buffer 82 can take the form of an address-matched read buffer. In this form, each row of buffer 82 has an entry storing data and an entry storing the corresponding data address. Processor core 80 sends a data address into buffer 82 over bus 57, and the data in the data entry whose address entry matches the address on bus 57 are sent to processor core 80 over bus 87 for processing. In this form, the row replacement logic can use replacement policies such as LRU (least recently used), and column replacement can likewise use policies such as LRU. Note that in this form, when the reads from the lower limit to the upper limit are performed, a mechanism is needed to perform them in instruction-address order, because the data corresponding to the data access instructions nearest the lower limit are likely the earliest to be used by the processor core.
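The address-matched form can be sketched with an ordered map giving LRU replacement; the capacity and class name are assumptions for illustration.

```python
from collections import OrderedDict

class ReadBuffer:
    """Address-matched data read buffer: each entry pairs a data address
    with its data; replacement is least-recently-used."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()          # data address -> data

    def fill(self, addr, data):               # staged from data memory
        if addr in self.entries:
            self.entries.move_to_end(addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[addr] = data

    def read(self, addr):                     # core lookup over bus 57
        self.entries.move_to_end(addr)        # a hit refreshes LRU order
        return self.entries[addr]
```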
Another form data read buffer 82 can take is a FIFO (first-in, first-out). In this case the row allocation logic in row decoder 54 allocates row resources strictly in increasing row order, as in the embodiment of Fig. 3, where rows are allocated in the address order of instructions 11, 12, 13. Then, when processor core 80 provides the lower and upper limits to decoder 54, decoder 54 enables each word line in turn, in instruction-address order from the lower limit to the upper limit, making the step length table and address generator 60 supply the corresponding data addresses to data memory 84 in instruction order, so that the data read out are stored into the FIFO-style data read buffer 82 in instruction order. Now, each time processor core 80 executes a data load instruction, it issues one read request to FIFO 82, and 82 outputs one datum to 80. Buffer 82 is then a data queue ordered by instruction address.
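The FIFO form reduces to a plain queue: data are staged in instruction order and each executed load pops the head. A minimal sketch (class name assumed):

```python
from collections import deque

class FifoReadBuffer:
    """FIFO data read buffer: filled in instruction-address order by the
    word-line sweep, drained one datum per executed load instruction."""
    def __init__(self):
        self.q = deque()

    def fill(self, data):        # staged by the sweep, in instruction order
        self.q.append(data)

    def read(self):              # one read request per executed load
        return self.q.popleft()
```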
The row replacement logic of this form treats the step length table as a circular buffer: after the last row of the step length table has been allocated, the next allocation takes the first row, then the second, and so on. The addresses stored in registers 72 keep the mechanism described above working. Suppose there are other data access instructions before the three data access instructions 11, 12, 13 of the embodiment of Fig. 1, so that instruction 11 is allocated the bottom row of Fig. 3 and instructions 12 and 13 are then allocated the top and middle rows in turn. When instruction 22 or 23 takes its branch, the lower limit falls in the bottom row (instruction 11) and the upper limit in the middle row (instruction 13). The reads from the lower limit to the upper limit then start at the bottom row (instruction 11), pass through the top row (instruction 12), and end at the middle row (instruction 13). Columns can be organized in the same circular-buffer manner, i.e., after the last column is allocated, the next allocation takes the first column; replacement policies such as LRU can also be used.
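The circular-buffer row allocation is a simple wrap-around counter; the table size here is an illustrative assumption.

```python
class CircularAllocator:
    """Allocates step-length-table rows in strictly increasing order,
    wrapping to row 0 after the last row has been handed out."""
    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.next_row = 0

    def alloc(self):
        row = self.next_row
        self.next_row = (self.next_row + 1) % self.n_rows
        return row
```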
The third way of using the stride-based prefetching system combines the first and the second ways. After a branch resolution is produced, the data addresses of the data access instructions between the corresponding lower and upper limits are updated, the data memory is accessed with these updated addresses, the data read out are stored into data read buffer 82 for use by processor core 80, and the updated data addresses are stored into column 58 of array 52 (this is the second way above). At the same time, a stride selected by the current branch resolution is added to a data address to produce a guessed data address that is sent to the data cache; if the corresponding data are not yet in the cache, they are prefetched into the cache to cover the cache miss, but this guessed data address is not stored into column 58 of array 52. The generation of the guessed address, and the use of it to address memory and fill into the data memory, in advance, the data that the next execution of the same load instruction may need, all occur before the next branch resolution is produced, and therefore constitute the first way.
Data memory 84 can be implemented with a cache. A data cache normally has a tag unit storing all or part of the memory addresses; a memory address is sent to the tag unit for matching, and on a hit a buffer address (e.g., a way number combined with the index and block-offset parts of the memory address, in a set-associative organization) is produced to address the data RAM in the cache. The step length table disclosed by the present invention can store buffer addresses directly in field 67 of its column 58, so that the address sent over bus 38 directly addresses the data RAM in the cache without mapping through the tag unit. This requires that row addresses in the data memory be contiguous within some interval, so that address generator 60 can automatically compute the next data address by stride increments; for example, several rows with contiguous addresses in the data memory may be stored in the same way of a set-associative cache. When the buffer address crosses into a non-contiguous address region, some method must adjust the buffer address, such as changing the way number. Further, field 67 can store both the memory address and the buffer address corresponding to it, with the two addresses updated by the same stride simultaneously. The buffer address directly addresses the data RAM in the cache, while the memory address is compared with the memory address output by processor core 80 over bus 57 to verify that the address produced by address generator 60 is correct. In this form, when the buffer address crosses a non-contiguous region, the memory address in field 67 can be matched in the tag unit of the data cache to obtain a new buffer address; within a contiguous address region, the buffer address is updated directly by increments to address the data RAM in the cache.
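Keeping a (memory address, buffer address) pair in field 67 and stepping both by the stride can be sketched as below; the contiguous-region size, the positive stride, and the tag-lookup callback are all assumptions for illustration, not the patent's cache design.

```python
def step_pair(mem_addr, buf_addr, stride, region_size, tag_lookup):
    """Advance both addresses by the stride. While the buffer address
    stays inside its contiguous region, increment it directly; when it
    would cross into a non-contiguous region, re-establish it with a
    tag-unit lookup on the new memory address."""
    mem_addr += stride
    if (buf_addr % region_size) + stride < region_size:
        buf_addr += stride                  # still inside contiguous rows
    else:
        buf_addr = tag_lookup(mem_addr)     # crossed: match the tag unit
    return mem_addr, buf_addr
```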
The above embodiments all take data load instructions as examples, but the method and system of the present invention are equally applicable to data store instructions. For example, the first way described above can be applied to a write-back cache with a write-allocate policy: the memory address to be stored to is produced by the address generator, and the corresponding data are read from memory into the data cache in advance, so that the processor core avoids a cache miss when storing data into the data cache.
The step length table is a two-dimensional data structure in which one parameter addresses one dimension and another parameter addresses the other, accessing one entry of the structure. The parameters can be used for direct addressing. If, however, a parameter is not contiguous, it can be compressed. For example, in the embodiments of Figs. 3 and 4, each register 72 in row decoder 54 plays a role similar to a tag unit in a fully associative cache, compressing away the holes in the address space of array 52 (data access instructions make up roughly a third of all instructions, so their address space is not contiguous). In this respect the step length table effectively takes the form of a fully associative cache. Each address register in column decoder 50 performs the same kind of compression (branch instructions make up roughly a sixth of all instructions, so their address space is likewise not contiguous). The step length table can therefore be regarded as a two-dimensional fully associative compressed structure.
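The compression role of the address registers can be sketched as a fully associative map from sparse instruction addresses to dense row indices (the class and method names are illustrative):

```python
class TagCompressor:
    """Assigns each distinct (sparse) instruction address a dense row
    index, like the tag registers 72 of a fully associative structure."""
    def __init__(self):
        self.tags = {}                    # instruction address -> row index

    def row_for(self, pc):
        if pc not in self.tags:           # miss: allocate the next row
            self.tags[pc] = len(self.tags)
        return self.tags[pc]
```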
Although the embodiments of the present invention describe only certain architectural features and/or processes of the present invention, it should be understood that the claims of the present invention are not limited to the described features and processes. On the contrary, the described features and processes are merely a few examples realizing the claims of the present invention.
It should be appreciated that the various components listed in the above embodiments are enumerated only for convenience of description; other components may be included, or some components may be combined or omitted. These components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (such as an integrated circuit), in software, or by a combination of hardware and software.
Obviously, in light of the description of the above preferred embodiments, however rapidly the technology in this field develops and whatever presently unforeseeable advances may later be obtained, those of ordinary skill in the art may substitute, adjust, and improve the corresponding parameters and configurations of the present invention according to its principles, and all such substitutions, adjustments, and improvements shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A data address generation system, characterized by comprising:
a step length table for storing data addresses and address increments;
wherein the data address generation system learns and records, by data access instruction address, the data address produced by a processor core executing a data access instruction, and the data address increment between two executions of that instruction, and stores them in the step length table;
and the data address generation system addresses the step length table content with the data access instruction address to produce a new data access address, accesses a data memory, and obtains data for the processor core.
2. The data address generation system of claim 1, characterized in that the table content of said step length table is said data address;
and said step length table is addressed by said data access instruction address.
3. The data address generation system of claim 1, characterized in that the table content of said step length table is said data address increment;
one dimension of said step length table is addressed by said data access instruction address;
and the other dimension of said step length table is addressed by the instruction address of a taken backward branch instruction.
4. The data address generation system of claim 1, characterized in that said data address generation system produces addresses by the following method:
said data address generation system adds said data address stored in said step length table to said data address increment to produce a new data address;
and said new data address is then saved back into the step length table.
5. The data address generation system of claim 1, characterized in that:
said data address generation system accesses the step length table content for the data access instruction addresses lying between the instruction address of a taken backward branch instruction and its branch target instruction address;
said data address generation system produces new data memory addresses from said step length table content;
said data address generation system accesses said data memory with said new data memory addresses and obtains data for said processor core to process;
and said new data addresses are then saved back into the step length table.
6. A data address generation method, characterized by comprising the following steps:
learning, by data access instruction address, the data address produced by a processor core executing a data access instruction;
learning, by data access instruction address, the data address increment between two executions of the same data access instruction by the processor core;
recording the above data address and data increment in a step length table;
and addressing the step length table content with the data access instruction address to produce a new data access address, accessing a data memory, and obtaining data for the processor core.
7. The data address generation method of claim 6, characterized in that the table content of said step length table is said data address;
and said step length table is addressed by said data access instruction address.
8. The data address generation method of claim 6, characterized in that the table content of said step length table is said data address increment;
one dimension of said step length table is addressed by said data access instruction address;
and the other dimension of said step length table is addressed by the instruction address of a taken backward branch instruction.
9. The data address generation method of claim 6, characterized in that addresses are produced by the following method:
said data address stored in said step length table is added to said data address increment to produce a new data address;
and said new data address is then saved back into the step length table.
10. The data address generation method of claim 6, characterized by comprising:
accessing the step length table content for the data access instruction addresses lying between the instruction address of a taken backward branch instruction and its branch target instruction address;
producing new data memory addresses from said step length table content;
accessing said data memory with said new data memory addresses and obtaining data for said processor core to process;
and then saving said new data addresses back into the step length table.
CN201510271803.5A 2015-05-23 2015-05-23 A kind of data address produces system and method Pending CN106293624A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510271803.5A CN106293624A (en) 2015-05-23 2015-05-23 A kind of data address produces system and method
PCT/CN2016/083018 WO2016188392A1 (en) 2015-05-23 2016-05-23 Generation system and method of data address

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510271803.5A CN106293624A (en) 2015-05-23 2015-05-23 A kind of data address produces system and method

Publications (1)

Publication Number Publication Date
CN106293624A true CN106293624A (en) 2017-01-04

Family

ID=57392522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510271803.5A Pending CN106293624A (en) 2015-05-23 2015-05-23 A kind of data address produces system and method

Country Status (2)

Country Link
CN (1) CN106293624A (en)
WO (1) WO2016188392A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427332A (en) * 2019-08-05 2019-11-08 上海兆芯集成电路有限公司 Data pre-fetching device, data prefetching method and microprocessor
CN112732739A (en) * 2021-03-30 2021-04-30 南京粒聚智能科技有限公司 Method and device for analyzing data address of equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065947B (en) * 2021-11-15 2022-07-22 深圳大学 Data access speculation method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1312642A (en) * 2001-01-18 2001-09-12 清华大学 Programmable video signal processor structure based on mixed video encoding method
US20040004995A1 (en) * 2002-07-03 2004-01-08 Commasic, Inc. Buffering method and apparatus for processing digital communication signals
WO2015070771A1 (en) * 2013-11-16 2015-05-21 上海芯豪微电子有限公司 Data caching system and method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178690B (en) * 2007-12-03 2010-07-21 浙江大学 Low-power consumption high performance high speed scratch memory
JP2011209904A (en) * 2010-03-29 2011-10-20 Sony Corp Instruction fetch apparatus and processor
CN103513957B (en) * 2012-06-27 2017-07-11 上海芯豪微电子有限公司 High-performance caching method
CN104050092B (en) * 2013-03-15 2018-05-01 上海芯豪微电子有限公司 A data buffering system and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1312642A (en) * 2001-01-18 2001-09-12 清华大学 Programmable video signal processor structure based on mixed video encoding method
US20040004995A1 (en) * 2002-07-03 2004-01-08 Commasic, Inc. Buffering method and apparatus for processing digital communication signals
WO2015070771A1 (en) * 2013-11-16 2015-05-21 上海芯豪微电子有限公司 Data caching system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Wenqi et al.: "Research on Parallel Text Search Technology Based on Multi-core Processors", Journal of Zhongyuan University of Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427332A (en) * 2019-08-05 2019-11-08 上海兆芯集成电路有限公司 Data pre-fetching device, data prefetching method and microprocessor
CN110427332B (en) * 2019-08-05 2021-08-20 上海兆芯集成电路有限公司 Data prefetching device, data prefetching method and microprocessor
US11301250B2 (en) 2019-08-05 2022-04-12 Shanghai Zhaoxin Semiconductor Co., Ltd. Data prefetching auxiliary circuit, data prefetching method, and microprocessor
CN112732739A (en) * 2021-03-30 2021-04-30 南京粒聚智能科技有限公司 Method and device for analyzing data address of equipment
CN112732739B (en) * 2021-03-30 2021-07-20 南京粒聚智能科技有限公司 Method and device for analyzing data address of equipment

Also Published As

Publication number Publication date
WO2016188392A1 (en) 2016-12-01

Similar Documents

Publication Publication Date Title
CN100481026C (en) Cache memory, system, and method of storing data
CN102110058B A low-miss-rate, low-miss-penalty caching method and device
US5717890A Method for processing data by utilizing hierarchical cache memories and processing system with the hierarchical cache memories
US5535361A (en) Cache block replacement scheme based on directory control bit set/reset and hit/miss basis in a multiheading multiprocessor environment
CN1879092B (en) Cache memory and control method thereof
US5073851A (en) Apparatus and method for improved caching in a computer system
TWI522802B (en) Apparatus and method for ensuring data coherency within a cache memory hierarchy of a microprocessor
JP4298800B2 (en) Prefetch management in cache memory
JPH06348595A (en) Cache device
US7321954B2 (en) Method for software controllable dynamically lockable cache line replacement system
US20110173393A1 (en) Cache memory, memory system, and control method therefor
JP2010191638A (en) Cache device
US8180969B2 (en) Cache using pseudo least recently used (PLRU) cache replacement with locking
US20110167224A1 (en) Cache memory, memory system, data copying method, and data rewriting method
US11301250B2 (en) Data prefetching auxiliary circuit, data prefetching method, and microprocessor
US20090113137A1 (en) Pseudo least recently used (plru) cache replacement
US20100030966A1 (en) Cache memory and cache memory control apparatus
JP3900025B2 (en) Hit determination control method for shared cache memory and hit determination control method for shared cache memory
CN106066787A An instruction- and data-push based processor system and method
CN106293624A A data address generation system and method
US20050188158A1 (en) Cache memory with improved replacement policy
EP2271989A1 (en) Multiprocessing circuit with cache circuits that allow writing to not previously loaded cache lines
US5893163A (en) Method and system for allocating data among cache memories within a symmetric multiprocessor data-processing system
US6009503A (en) Cache memory indexing using virtual, primary and secondary color indexes
WO2005050454A1 (en) Cache memory and control method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201203 501, No. 14, Lane 328, Yuqing Road, Pudong New Area, Shanghai

Applicant after: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.

Address before: 200092, B, block 1398, Siping Road, Shanghai, Yangpu District 1202

Applicant before: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170104
