CN1287294C - Device and method for fast fetch target miss early detection - Google Patents

Publication number
CN1287294C
Authority
CN
China
Prior art keywords
cache memory
signal
busy
resource
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB200410005333XA
Other languages
Chinese (zh)
Other versions
CN1558331A (en
Inventor
詹姆斯恩·N·哈达吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INTELLIGENCE FIRST CO
Original Assignee
INTELLIGENCE FIRST CO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INTELLIGENCE FIRST CO filed Critical INTELLIGENCE FIRST CO
Priority to CNB200410005333XA priority Critical patent/CN1287294C/en
Publication of CN1558331A publication Critical patent/CN1558331A/en
Application granted granted Critical
Publication of CN1287294C publication Critical patent/CN1287294C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

A cache memory notifies other functional blocks of a microprocessor that a miss has occurred N clock cycles earlier than conventional approaches, where N is the number of stages in the cache pipeline. The multi-pass cache receives a plurality of busy indicators from the resources needed to complete the various transaction types. The cache distinguishes between a first set of resources, needed to complete a transaction when its cache line address hits in the cache, and a second set of resources, needed to complete a transaction of that type when the address misses in the cache. On a miss, if none of the resources in the second set for that transaction type is busy, the cache signals the miss immediately rather than sending the transaction back through the cache pipeline to be tested again, which would delay the miss signal by N additional clock cycles.

Description

Apparatus and method for early detection of cache misses
Technical field
The present invention relates to the field of data caches in microprocessors, and in particular to the early detection and signaling of cache misses.
Background art
A modern computer system includes a microprocessor. The microprocessor reads data from, and writes data to, a system memory located outside the microprocessor. Transferring data between the microprocessor and memory is quite slow compared with the speed at which the microprocessor operates on data internally. As a result, the microprocessor may sit idle much of the time, waiting for data to arrive from memory or for data to be written to memory, and performance suffers.
To address this problem, modern microprocessors include one or more cache memories. A cache memory, or cache, is a memory inside the microprocessor that stores a subset of the data in system memory. A cache stores data in cache lines. A cache line is the smallest unit of data that can be transferred between the cache and system memory. A typical cache line size is 32 bytes. When the microprocessor executes an instruction that references data, it first checks whether the cache line containing the data is present in the cache and valid. If so, the instruction can be executed immediately, because the data is already in the cache. That is, for a read or load, the microprocessor need not wait while the data is fetched from memory. Similarly, for a write or store, the microprocessor can write the data to the cache and proceed, without waiting for the data to be written to memory.
When the microprocessor detects that the cache line containing the desired data is present in the cache and valid, the condition is commonly called a cache hit, or hit. When the microprocessor detects that the cache line is absent or invalid, the condition is called a cache miss, or miss.
When a cache miss occurs, the cache must notify other functional blocks in the microprocessor that the miss has occurred, so that the missing cache line can be fetched into the cache. In conventional caches, however, the cache does not notify the other functional blocks of the miss in some cases. Instead, in some cases the cache retries the transaction that produced the miss. When a transaction is retried, the cache makes it arbitrate again with the other transactions accessing the cache and re-sequences it through the cache pipeline.
Most caches have a high hit rate. Hit rates above 90% are not uncommon, depending on the data set involved. Consequently, if the cache delays notifying the other functional blocks that a miss has occurred, the effect on performance is usually small.
However, some cache configurations typically have a low hit rate. For example, some microprocessors use a hierarchical cache architecture with multiple caches, commonly referred to as a level-one (L1) cache and a level-two (L2) cache. The L1 cache is closer to the computational elements of the microprocessor than the L2 cache and can deliver data to them faster than the L2 cache can. Some L2 caches are used as victim caches. In a victim cache configuration, when a cache line is discarded, or cast out, from the L1 cache, the line is written to the L2 cache rather than to system memory. It has been observed that some L2 victim caches, particularly those the same size as or smaller than the L1 cache, have a hit rate of about 50%.
As the hit rate of a cache decreases, delaying notification of a miss to the other functional blocks has a negative effect on performance. Therefore, what is needed is a cache that reduces the delay in notifying other functional blocks that a miss has occurred.
Summary of the invention
The present invention provides a cache memory that distinguishes between the different sets of resources needed to complete a transaction depending on whether the transaction hits or misses in the cache. The cache generates a miss action signal based only on whether the resources in the miss set are busy, and does not retry the transaction when a resource in the hit set is busy, as long as none of the resources in the miss set is busy. The hit and miss resource sets may differ from one transaction type to another.
To achieve the above object, the invention provides a cache memory. The cache memory includes a first set of resources, used to complete a transaction when the cache line address of the transaction hits in the cache. The cache memory also includes a second set of resources, used to complete the transaction when the address misses in the cache. The second set of resources differs from the first set. The cache memory includes control logic coupled to the first and second sets of resources. If the address misses in the cache and none of the resources in the second set is busy, then regardless of whether any resource in the first set is busy, the control logic asserts a miss indicator rather than retrying the transaction. The cache memory also includes a transaction type input, coupled to the control logic, that specifies which of a plurality of transaction types performable by the cache the transaction is.
If one or more resources in the second set are busy, the control logic asserts a retry indicator.
Retrying the transaction includes re-arbitrating with other transactions for access to the cache memory.
The cache memory includes a pipeline.
Retrying the transaction includes re-sequencing the transaction through the pipeline.
If the address matches the address of another transaction in the cache memory pipeline, the control logic asserts the retry indicator.
If the address hits in the cache memory and none of the resources in the first set is busy, then regardless of whether any resource in the second set is busy, the control logic asserts a hit indicator rather than retrying the transaction.
The cache memory is a level-two (L2) cache memory.
In another aspect, the invention provides a cache memory that includes control logic receiving a plurality of type signals, a hit signal, and a plurality of busy signals. The type signals specify which of a plurality of transaction types a transaction is, and the transaction specifies a cache line. The hit signal indicates whether the cache line is present in the cache. The busy signals indicate whether a corresponding plurality of resources are busy. A predetermined subset of the resources is needed to complete the transaction, and the predetermined subset is determined by the hit signal and the transaction type signals. The control logic generates a miss action signal based only on the busy signals in the predetermined subset.
If the hit signal is false, the predetermined subset of the busy signals is a first predetermined subset; if the hit signal is true, it is a second predetermined subset.
If the hit signal is false and none of the busy signals in the first predetermined subset is true, the control logic generates a true value of the miss action signal.
If the hit signal is false and one or more of the busy signals in the first predetermined subset are true, the control logic generates a true value of a retry action signal.
If the hit signal is true and none of the busy signals in the second predetermined subset is true, the control logic generates a true value of a hit action signal.
If the hit signal is true and one or more of the busy signals in the second predetermined subset are true, the control logic generates a true value of the retry action signal.
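The four cases above can be sketched in a few lines of Python. This is only an illustrative model, not the patent's circuit: the names `Action`, `select_action`, `hit_set`, and `miss_set`, and the resource names `A` and `B`, are hypothetical, and the two sets stand for the subsets already chosen for one transaction type.

```python
from enum import Enum


class Action(Enum):
    HIT = "hit"
    MISS = "miss"
    RETRY = "retry"


def select_action(hit, busy, hit_set, miss_set):
    """Pick an action from the hit signal and only the relevant busy signals.

    hit      -- True if the cache line is present in the cache
    busy     -- dict mapping a (hypothetical) resource name to its busy signal
    hit_set  -- resources this transaction type needs when it hits
    miss_set -- resources this transaction type needs when it misses
    """
    # Only the subset selected by the hit signal is examined.
    relevant = hit_set if hit else miss_set
    if any(busy[name] for name in relevant):
        return Action.RETRY
    return Action.HIT if hit else Action.MISS


# Hypothetical example: resource A is needed only on a hit, B only on a miss.
busy = {"A": True, "B": False}
print(select_action(False, busy, hit_set={"A"}, miss_set={"B"}))  # Action.MISS
print(select_action(True, busy, hit_set={"A"}, miss_set={"B"}))   # Action.RETRY
```

In the first call the busy hit-only resource is ignored, so the miss is reported at once instead of being deferred by a retry.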
In another aspect, the invention provides a method of generating cache action signals of a cache memory. The method includes determining whether a cache line address is present in the cache; if the cache line address is present in the cache, determining whether any of a first set of resources is busy; and if the cache line address is not present in the cache, determining whether any of a second set of resources is busy. The method also includes, when the cache line address is not present in the cache, generating a miss action signal if none of the second set of resources is busy, even if a resource in the first set is busy. The method also includes, before generating the miss action signal, determining the transaction type of the transaction associated with the cache line address. The method also includes generating a retry action signal if any of the second set of resources is busy when the cache line address is not present in the cache.
The method also includes generating a retry action signal if any of the first set of resources is busy when the cache line address is present in the cache.
The method also includes, when the cache line address is present in the cache, generating a hit action signal if none of the first set of resources is busy, even if a resource in the second set is busy.
The cache memory is a pipelined cache memory comprising a plurality of stages.
The method also includes comparing the cache line address with the other cache line addresses in the stages.
The method also includes generating the retry action signal if the cache line address matches one or more of the other cache line addresses in the stages.
One advantage of the present invention is that it notifies the other functional blocks in the microprocessor that a miss has occurred N clock cycles earlier, where N is the depth of the cache pipeline. Another advantage is that fewer of the transactions accessing the cache need to be retried. Because each retried transaction must arbitrate for the cache again and takes at least N clock cycles to complete, the invention reduces traffic through the cache.
Other features and advantages of the present invention will become more apparent from the following description taken together with the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a cache memory according to the present invention;
Fig. 2A, Fig. 2B and Fig. 2C are related-art block diagrams showing conventional logic included in the control logic of Fig. 1, which generates the retry action, hit action and miss action signals of Fig. 1;
Fig. 3 is a related-art flowchart illustrating operation of the cache memory of Fig. 1 with the conventional control logic of Fig. 2A, Fig. 2B and Fig. 2C;
Fig. 4 and Fig. 5 are four related-art timing diagrams illustrating, according to the flow of Fig. 3, operation of the cache memory of Fig. 1 with the conventional control logic of Fig. 2;
Fig. 6A and Fig. 6B are block diagrams of logic included in the control logic of Fig. 1 according to the present invention, which in combination with the logic of Fig. 2C generates the retry action, hit action and miss action signals of Fig. 1;
Fig. 7 is a flowchart illustrating operation of the cache memory of Fig. 1 with the control logic of Fig. 6 according to the present invention;
Fig. 8 is two timing diagrams illustrating, according to the flow of Fig. 7, operation of the cache memory of Fig. 1 with the control logic of Fig. 6 according to the present invention;
Fig. 9A and Fig. 9B are block diagrams of logic included in the control logic of Fig. 1 according to another embodiment of the present invention, which in combination with the logic of Fig. 2C generates the retry action, hit action and miss action signals of Fig. 1;
Fig. 10 is a flowchart illustrating operation of the cache memory of Fig. 1 with the control logic of Fig. 9 according to another embodiment of the present invention;
Fig. 11 is a timing diagram illustrating, according to the flow of Fig. 10, operation of the cache memory of Fig. 1 with the control logic of Fig. 9 according to another embodiment of the present invention.
Detailed description
Referring now to Fig. 1, a block diagram of a cache memory 100 according to the present invention is shown. In one embodiment, cache 100 is an L2 victim cache.
Cache 100 includes control logic 102. Control logic 102 receives a plurality of requestor signals 112. Fig. 1 shows four representative requestor signals, denoted requestor A 112A, requestor B 112B, requestor C 112C and requestor D 112D. The requestor signals 112 request control logic 102 to arbitrate among the requestors generating them for access to cache 100 (in particular, to the data and tag array 106 described below) in order to perform a transaction. Transactions may be of various types, such as load transactions, store transactions, snoop transactions and castout transactions. Control logic 102 also receives a plurality of transaction type signals 108[1:M] corresponding to the various transaction types, where M is the number of distinct transaction types. One of the transaction type signals 108 is set true to indicate the type of transaction to be performed by the requestor 112 that wins arbitration for cache 100.
Cache 100 also includes a data and tag array 106, coupled to control logic 102. The data array 106 stores the cache lines cached therein. The tag array 106 stores the addresses associated with the cache lines stored in the data array 106. The tag array 106 also stores the cache status of each cache line in the data array 106. In the embodiment of Fig. 1, the data and tag array 106 comprises a four-stage pipeline. The stages are denoted J, K, L and M, in order. The data and tag array 106 receives the cache line address 138 of a transaction accessing cache 100. The cache line address 138 proceeds down the pipeline through registers in the J, K, L and M pipeline stages. In response, the tag array 106 generates a hit signal 134, provided to control logic 102, which is true if the cache line address 138 is present in the tag array 106 and the corresponding cache line in the data array 106 is valid. In addition, the data array 106 outputs the cache line data 136 selected by the cache line address 138.
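As a rough illustration of how the hit signal 134 depends on both presence and validity, the following Python sketch models the tag array as a dictionary. The dictionary layout, the names `hit_signal` and `LINE_SIZE`, and the use of byte addresses with 32-byte lines are assumptions made for the example; a real tag array is a hardware structure with its own organization.

```python
LINE_SIZE = 32  # assumed bytes per cache line, used only for this sketch


def hit_signal(tag_array, address):
    """Return True only if the line is present in the tag array and valid.

    tag_array -- hypothetical dict mapping a cache line number to a valid flag
    address   -- byte address of the access (assumption for this sketch)
    """
    line = address // LINE_SIZE       # cache line containing the address
    return tag_array.get(line) is True


tags = {4: True, 5: False}            # line 4 valid, line 5 present but invalid
print(hit_signal(tags, 4 * 32 + 8))   # True
print(hit_signal(tags, 5 * 32))       # False
print(hit_signal(tags, 9 * 32))       # False
```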
Cache 100 also includes a plurality of address comparators 104, coupled to the data and tag array 106. The address comparators 104 receive the addresses from the address registers of the pipeline stages, compare the cache line addresses associated with the various transactions in the cache 100 pipeline, and generate address conflict signals 118 to indicate matches between the pipeline addresses. The address conflict signals 118 are used to enforce the ordering of certain transactions in the pipeline in order to guarantee data coherency. The address conflict signals 118 are provided to control logic 102.
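The comparison can be pictured with the simplified Python sketch below. It checks only the transaction at the bottom of the pipeline against the other in-flight cache line addresses, which is an assumption made for brevity; the function name `address_conflict` and the use of `None` for an empty stage are likewise hypothetical.

```python
def address_conflict(bottom_line, other_lines):
    """True if the line address at the pipeline bottom matches any other stage.

    bottom_line -- cache line address of the transaction at the bottom stage
    other_lines -- line addresses held in the other stages (None = empty stage)
    """
    return any(line is not None and line == bottom_line for line in other_lines)


# A load to line 0xA reaches the bottom while two sub-transactions that are
# still writing line 0xA occupy earlier stages: the load must wait.
print(address_conflict(0xA, [0xA, 0xA, None]))  # True
print(address_conflict(0xA, [0xB, None, None]))  # False
```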
Cache 100 also includes a plurality of resources 114, coupled to control logic 102. The resources 114 are shown representatively as resource A 114A, resource B 114B, resource C 114C and resource N 114N. The resources 114 comprise various registers, buffers or other resources needed to perform a transaction. Control logic 102 receives a plurality of resource busy signals 116, which indicate whether the corresponding resource 114 is currently busy being used by another transaction. The resource busy signals 116 are denoted resource A busy 116A, resource B busy 116B, resource C busy 116C and resource N busy 116N.
Cache 100 generates a retry action signal 126, a hit action signal 124 and a miss action signal 122 in response to the transaction type signals 108, the address conflict signals 118, the hit signal 134 and the resource busy signals 116. The retry action signal 126 indicates that the transaction at the bottom of the cache 100 pipeline must be retried. That is, the transaction must request access to the data and tag array 106 again, win arbitration, and be re-sequenced through the cache 100 pipeline. A transaction cannot complete either because of an address conflict or because one of the resources 114 needed to complete it is busy, as described below. Each transaction type requires a different set of resources 114. If one of the set of resources 114 required by the transaction at the bottom of the cache 100 pipeline is busy, control logic 102 generates a true value on the retry action signal 126. Advantageously, the present invention distinguishes between the set of resources 114 required by a particular transaction type when the cache line address 138 misses in the data and tag array 106 and the set of resources 114 required by that transaction type when the cache line address 138 hits in the data and tag array 106, as described in more detail below.
The hit action signal 124 indicates that the transaction is completing as a hit, because the cache line address 138 hit in the data and tag array 106, the required resources 114 are available, and no relevant address conflict occurred. In one embodiment, cache 100 is a multi-pass cache. That is, most transaction types must pass through the cache 100 pipeline two or more times to complete. For example, for a transaction type that stores into cache 100, the first pass reads the tag array 106 to obtain the status of the cache line specified by the cache line address 138. The second pass, and any necessary subsequent passes, then update the data and tag array 106 with the new cache line and its associated address and status. If cache 100 is being read, a true value on the hit action signal 124 indicates that the data 136 provided by the data and tag array 106 is valid. In one embodiment, the hit action signal 124 indicates the result of the first non-retried pass of a transaction through the pipeline.
The miss action signal 122 indicates that the transaction is completing as a miss, because the cache line address 138 missed in the data and tag array 106, the required resources 114 are available, and no relevant address conflict occurred. In particular, a true value on the miss action signal 122 indicates to the other functional blocks in the microprocessor that a miss occurred, and therefore that the data associated with the cache line address 138 must be obtained elsewhere, such as from system memory. Advantageously, the present invention generally notifies the other functional blocks via the miss action signal 122 sooner than a conventional cache does. In particular, in cases where a conventional cache must retry a missing transaction because a resource is busy, the present invention in some cases signals the miss N clock cycles earlier, where N is the depth of the cache 100 pipeline, as described below.
Before describing the present invention further, it is helpful to describe a conventional cache, against which the advantages of the invention can be more fully appreciated. The conventional cache is described with reference to cache 100 of Fig. 1; however, cache 100 of the present invention differs from the conventional cache in the control logic 102 of Fig. 1, as described below.
Referring now to Fig. 2A, Fig. 2B and Fig. 2C, referred to collectively as Fig. 2, a related-art block diagram is shown of conventional logic included in the control logic 102 of Fig. 1, which generates the retry action signal 126, hit action signal 124 and miss action signal 122 of Fig. 1.
Referring now to Fig. 2A, control logic 102 includes M OR gates 242, one for each of the M transaction types. The OR gates 242 generate transaction-type-specific resource busy signals 238[1:M]. The OR gates 242 receive various ones of the resource busy signals 116 of Fig. 1, and each OR gate 242 generates a corresponding one of the transaction-type-specific resource busy signals 238[1:M]. Resource busy signal 238[i] is true if any of the resource busy signals 116 entering the corresponding OR gate 242 is true. The combinations of resource busy signals 116 entering the OR gates 242 of Fig. 2A differ (as shown), reflecting that the set of resources 114 of Fig. 1 may differ for each transaction type. That is, the number and combination of resource busy signals 116 may be the same or different for each transaction type.
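The per-type OR reduction can be sketched as follows in Python. The table `REQUIRED` and the type names `load` and `castout` are hypothetical placeholders, since the text does not say which resources each transaction type uses; only the structure (one OR per type over that type's busy signals) follows the description.

```python
# Hypothetical wiring: which resource busy signals feed each type's OR gate.
REQUIRED = {
    "load": ("A", "B"),
    "castout": ("B", "C"),
}


def type_busy_signals(resource_busy, required=REQUIRED):
    """Model the M OR gates: one type-specific busy signal per transaction type.

    resource_busy -- dict mapping a resource name to its busy signal
    """
    return {
        ttype: any(resource_busy[name] for name in names)
        for ttype, names in required.items()
    }


print(type_busy_signals({"A": True, "B": False, "C": False}))
# {'load': True, 'castout': False}
```

Note that this conventional reduction does not look at the hit signal at all: a type is reported busy if any resource it might need, on a hit or on a miss, is busy.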
Referring now to Fig. 2B, control logic 102 includes a set of combinational logic comprising one OR gate 202, three AND gates 204, 214 and 224, two NOR gates 212 and 222, and one inverter 232. Control logic 102 includes one set of the combinational logic shown in Fig. 2B for each of the M transaction types; Fig. 2B shows the combinational logic for one representative transaction type. Inverter 232 receives the hit signal 134 of Fig. 1 and generates a miss signal 234.
OR gate 202 receives the transaction-type-specific resource busy signal 238[i] of Fig. 2A and the address conflict signal 118 of Fig. 1. The output of OR gate 202 is provided to AND gate 204. AND gate 204 also receives the transaction type signal 108[i] of Fig. 1 and generates a transaction-type-specific retry action[i] signal 208.
NOR gate 212 receives the transaction-type-specific resource busy signal 238[i] and the address conflict signal 118. The output of NOR gate 212 is provided to AND gate 214. AND gate 214 also receives the transaction type signal 108[i] and the hit signal 134, and generates a transaction-type-specific hit action[i] signal 218.
NOR gate 222 receives the transaction-type-specific resource busy signal 238[i] and the address conflict signal 118. The output of NOR gate 222 is provided to AND gate 224. AND gate 224 also receives the transaction type signal 108[i] and the miss signal 234, and generates a transaction-type-specific miss action[i] signal 228.
Referring now to Fig. 2C, control logic 102 includes three OR gates 252, 254 and 256. OR gate 256 receives all of the transaction-type-specific retry action signals 208 of Fig. 2B and generates the retry action signal 126 of Fig. 1. OR gate 254 receives all of the transaction-type-specific hit action signals 218 of Fig. 2B and generates the hit action signal 124 of Fig. 1. OR gate 252 receives all of the transaction-type-specific miss action signals 228 of Fig. 2B and generates the miss action signal 122 of Fig. 1. In one embodiment, control logic 102 also includes registers that, in response to a clock signal, store the values of the miss action signal 122, hit action signal 124 and retry action signal 126 output by OR gates 252, 254 and 256, respectively.
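The conventional gate network of Fig. 2B and Fig. 2C reduces to three Boolean expressions per transaction type, ORed across types. The Python sketch below is one way to write that out; the function and parameter names are hypothetical, the output registers are omitted, and the transaction type keys (`CO`, `Ld`) are just labels chosen for the example.

```python
def conventional_actions(type_select, type_busy, hit, conflict):
    """Model the conventional logic; returns (retry, hit_action, miss_action).

    type_select -- dict: transaction type -> type signal 108[i]
    type_busy   -- dict: transaction type -> type-specific busy signal 238[i]
    hit         -- hit signal 134
    conflict    -- address conflict signal 118
    """
    miss = not hit                                  # inverter 232
    retry = hit_action = miss_action = False
    for ttype, selected in type_select.items():
        blocked = type_busy[ttype] or conflict      # OR 202; NORs 212/222 invert it
        retry = retry or (selected and blocked)                        # AND 204, OR 256
        hit_action = hit_action or (selected and hit and not blocked)  # AND 214, OR 254
        miss_action = miss_action or (selected and miss and not blocked)  # AND 224, OR 252
    return retry, hit_action, miss_action


# A castout that misses with nothing busy and no conflict completes as a miss.
print(conventional_actions({"CO": True, "Ld": False},
                           {"CO": False, "Ld": False},
                           hit=False, conflict=False))  # (False, False, True)
```

Because `blocked` is computed before the hit signal is considered, a miss is never reported while any resource the type might need is busy, which is the behavior the invention improves on.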
Referring now to Fig. 3, a related-art flowchart is shown illustrating operation of the cache 100 of Fig. 1 with the conventional control logic 102 of Fig. 2. Flow begins at block 302.
At block 302, control logic 102 arbitrates among the requestors 112 of Fig. 1 for access to the data and tag array 106. Flow proceeds to block 304.
At block 304, the transaction associated with the requestor 112 that won arbitration enters the cache 100 pipeline and proceeds through it in sequence. Flow proceeds to block 306.
At block 306, the transaction reaches the bottom of the cache 100 pipeline. Flow proceeds to decision block 308.
At decision block 308, control logic 102 examines the resource busy signals 116 of Fig. 1 to determine whether any of the resources needed to perform the transaction type specified by the true transaction type signal 108 is busy. If so, flow proceeds to block 312. Otherwise, flow proceeds to decision block 314.
At block 312, control logic 102 generates a true value on the retry action signal 126 of Fig. 1 to indicate that the transaction must be retried because one or more of the resources 114 that may be needed to complete it are busy. Flow returns to block 302, where the transaction arbitrates again for access to the data and tag array 106.
At decision block 314, control logic 102 examines the address conflict signals 118 to determine whether an address conflict has occurred that would prevent the transaction from completing. If so, flow proceeds to block 312 so that the transaction is retried. Otherwise, flow proceeds to block 316.
At block 316, cache 100 completes the transaction. That is, the transaction either exits cache 100, or cache 100 performs one or more additional passes of the transaction through the cache 100 pipeline. Flow ends at block 316.
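The retry loop of this conventional flow can be sketched in Python as shown below. The function name `conventional_passes` and the idea of feeding it one `(busy, conflict)` pair per pass are assumptions made so the loop can be exercised; the block numbers in the comments refer to Fig. 3.

```python
def conventional_passes(conditions_per_pass):
    """Count pipeline passes until the transaction completes, or return None.

    conditions_per_pass -- iterable of (busy, conflict) pairs, one per pass,
                           sampled when the transaction reaches the bottom
                           (blocks 302-306 are implied by each iteration)
    """
    for passes, (busy, conflict) in enumerate(conditions_per_pass, start=1):
        if busy:          # decision block 308 -> block 312: retry
            continue
        if conflict:      # decision block 314 -> block 312: retry
            continue
        return passes     # block 316: transaction completes
    return None           # still retrying when the supplied conditions ran out


# One retry caused by an address conflict, then completion on the second pass.
print(conventional_passes([(False, True), (False, False)]))  # 2
```

Each extra pass costs at least one full trip through the pipeline, so a miss that is held back by a busy resource is reported at least that many clock cycles late.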
Referring now to Fig. 4, two related-art timing diagrams are shown illustrating, according to the flow of Fig. 3, operation of the cache 100 of Fig. 1 with the conventional control logic 102 of Fig. 2. Timing diagram 1 comprises ten columns corresponding to ten successive clock cycles and four rows corresponding to the four cache 100 pipeline stages J, K, L and M of Fig. 1. Each entry in the diagram shows the contents of the corresponding pipeline stage during the specified clock cycle.
Timing diagram 1 is an example in which two transactions generate an address conflict, and the conflict causes the second transaction to be retried. The first transaction is a castout to address A, denoted "CO A", which casts a line out to cache 100 from another cache (such as the L1 instruction cache). The second transaction is a load from the same address A into another cache (such as the L1 data cache), denoted "Ld A".
During clock cycle 1, CO A enters stage J of the pipeline and proceeds down the pipeline until it reaches the bottom stage M during clock cycle 4, as shown. Therefore, during clock cycle 4, the transaction type signal 108 of Fig. 1 corresponding to the castout transaction type is true and the other transaction type signals 108 are false. In this example, during clock 4, CO A misses in the data and tag array 106 of Fig. 1; hence the hit signal 134 is false (as shown), making the miss signal 234 of Fig. 2B true. Also in this example, during clock 4, none of the resources needed to perform a castout-type transaction is busy; hence the castout-type OR gate 242 of Fig. 2A generates a false value on the castout-specific resource busy signal 238[CO], as shown. Consequently, OR gate 252 of Fig. 2C generates a true value on the miss action signal 122, as shown.
During clocks 5 through 9, two successive castout finish operations (or sub-operations), denoted "COFin A", pass sequentially through the pipeline to store the cache line cast out of the L1 instruction cache into cache memory 100. That is, the COFin A sub-operations are the additional passes through cache memory 100 needed to store the castout cache line into cache memory 100. In one embodiment of cache memory 100, the data path into the data and tag array 106 is 16 bytes wide and a cache line is 32 bytes wide. Therefore, two COFin A operations are needed to store one 32-byte cache line.
During clock 2, Ld A enters the pipeline and proceeds down the pipeline until, during clock cycle 5, it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. During clock 5, the address comparator 104 of Fig. 1 generates a true value on address conflict signal 118, as shown, because the cache line data of the castout to address A must be stored into cache memory 100 before Ld A completes; otherwise Ld A would receive the wrong cache line data. Consequently, OR gate 202 of Fig. 2B generates a true value, which causes AND gate 204 of Fig. 2B to generate a true value on retry action [Ld] 208, which in turn causes OR gate 256 of Fig. 2C to generate a true value on retry action signal 126 of Fig. 1, as shown.
At clock 7, Ld A is retried and wins arbitration. That is, Ld A re-arbitrates for access to the data and tag array 106 and proceeds through the pipeline, reaching the bottom of the pipeline at clock 10, as shown. Therefore, during clock 10, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. Because the cache line of address A has now been cached in cache memory 100 by the castout operation, during clock 10 the data and tag array 106 generates a true value on hit signal 134, as shown. In this example, none of the load operation type resources 114 is busy. Therefore, the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, the output of NOR gate 212 of Fig. 2B is true, and AND gate 214 of Fig. 2B generates a true value on hit action [Ld] 218 of Fig. 2B, which in turn causes OR gate 254 of Fig. 2C to generate a true value on hit action signal 124 of Fig. 1, as shown.
Referring now to timing diagram 2 of Fig. 4, another example is shown in which two operations generate an address conflict, and the address conflict causes the second operation to be retried. However, in contrast to the example of timing diagram 1, in the example of timing diagram 2 the first operation generates a hit action and the second operation generates a miss action. The first operation is a load operation from address A, denoted "Ld A1", which loads from cache memory 100 into another cache memory (such as the L1 data cache). The second operation is a load operation from the same address A, denoted "Ld A2", which loads from cache memory 100 into another cache memory (such as the L1 instruction cache).
During clock cycle 1, Ld A1 enters stage J of the pipeline and proceeds down the pipeline until, during clock cycle 4, it reaches the bottom of the pipeline, as shown. Therefore, during clock cycle 4, the operation type signal 108 corresponding to the load operation type is true, and the other operation type signals 108 are false. In this example, during clock 4, Ld A1 hits in the data and tag array 106; therefore hit signal 134 is true (as shown), which makes miss signal 234 false. Also in this example, during clock 4, none of the resources required to perform a load-type operation is busy; therefore the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, OR gate 254 of Fig. 2C generates a true value on hit action signal 124, as shown. During clock 4, the first 16 bytes of the cache line provided by the data array 106 on data signals 136 are stored in a temporary register.
During clocks 5 through 8, a load finish operation, denoted "LdFin A1", passes sequentially through the pipeline and obtains the last 16 bytes of the cache line specified by cache line address 138.
During clock 2, Ld A2 enters the pipeline and proceeds down the pipeline until, during clock cycle 5, it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. During clock 5, the address comparator 104 of Fig. 1 generates a true value on address conflict signal 118, as shown. Consequently, OR gate 202 of Fig. 2B generates a true value, which causes AND gate 204 of Fig. 2B to generate a true value on retry action [Ld] 208, which in turn causes OR gate 256 of Fig. 2C to generate a true value on retry action signal 126 of Fig. 1, as shown.
At clock 6, Ld A2 is retried and wins arbitration. That is, Ld A2 re-arbitrates for access to the data and tag array 106 and proceeds through the pipeline, reaching the bottom of the pipeline at clock 9, as shown. Therefore, during clock 9, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. Because the cache line of address A was invalidated in cache memory 100 when the Ld A1 operation cached it into the other cache memory, during clock 9 the tag array 106 generates a false value on hit signal 134, as shown. In this example, none of the load operation type resources is busy. Therefore, the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, the output of NOR gate 222 of Fig. 2B is true, and AND gate 224 of Fig. 2B generates a true value on miss action [Ld] 228 of Fig. 2B, which in turn causes OR gate 252 of Fig. 2C to generate a true value on miss action signal 122 of Fig. 1, as shown.
Referring now to Fig. 5, two related art timing diagrams are shown illustrating operation of cache memory 100 of Fig. 1 having the conventional control logic 102 of Fig. 2, according to the flow of Fig. 3. Timing diagrams 3 and 4 shown in Fig. 5 are similar in many respects to timing diagrams 1 and 2 shown in Fig. 4.
Timing diagram 3 is an example in which the second of two operations hits in cache memory 100 but is retried because a resource required by the second operation type is busy; upon retry, the second operation hits in cache memory 100. The first operation is a load operation from address A into another cache memory (such as the L1 data cache), denoted "Ld A". The second operation is a load operation from address B into another cache memory (such as the L1 instruction cache), denoted "Ld B".
During clock 1, Ld A enters stage J of the pipeline and proceeds down the pipeline until, during clock cycle 4, it reaches the bottom of the pipeline, as shown. Therefore, during clock cycle 4, the operation type signal 108 corresponding to the load operation type is true, and the other operation type signals 108 are false. In this example, during clock 4, Ld A hits in the data and tag array 106; therefore hit signal 134 is true (as shown), which makes miss signal 234 false. Also in this example, during clock 4, none of the resources required to perform a load-type operation is busy; therefore the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, OR gate 254 of Fig. 2C generates a true value on hit action signal 124, as shown. During clock 4, the first 16 bytes of the cache line provided by the data array 106 on data signals 136 are stored in a temporary register. At this point, one or more of the resources 114 of Fig. 1 are needed to complete Ld A and are therefore marked busy.
During clocks 5 through 8, a load finish operation, denoted "LdFin A", passes sequentially through the pipeline and obtains the last 16 bytes of the cache line specified by cache line address 138.
During clock 2, Ld B enters the pipeline and proceeds down the pipeline until, during clock cycle 5, it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. In this example, during clock 5, Ld A causes one or more of the resources needed to perform a load-type operation to be busy; therefore the load-type OR gate 242 of Fig. 2A generates a true value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, OR gate 256 of Fig. 2C generates a true value on retry action signal 126, as shown.
At clock 6, Ld B is retried and wins arbitration. That is, Ld B re-arbitrates for access to the data and tag array 106 and proceeds through the pipeline, reaching the bottom of the pipeline at clock 9, as shown. Therefore, during clock 9, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. In this example, address B hits in the data and tag array 106, and during clock 9 the data and tag array 106 generates a true value on hit signal 134. In this example, during clock 9, none of the load operation type resources is busy. Therefore, the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, the output of NOR gate 212 of Fig. 2B is true, and AND gate 214 of Fig. 2B generates a true value on hit action [Ld] 218 of Fig. 2B, which in turn causes OR gate 254 of Fig. 2C to generate a true value on hit action signal 124 of Fig. 1, as shown.
Timing diagram 4 is an example in which the second of two operations misses in cache memory 100 but is retried because a resource required by that operation type is busy; upon retry, the second operation again misses in cache memory 100. In other words, timing diagram 4 is identical to timing diagram 3 except for clock cycles 5 and 9, because in the example of timing diagram 4 Ld B misses in the data and tag array 106. Thus, in timing diagram 4, address B misses in the tag array 106, and during clock 5 the tag array 106 generates a false value on hit signal 134.
At clock 6, Ld B is retried and wins arbitration, proceeds through the pipeline, and reaches the bottom of the pipeline at clock 9, as shown. Because address B misses in the tag array 106, during clock 9 the tag array 106 generates a false value on hit signal 134, as shown. In this example, during clock 9, none of the load operation type resources is busy. Therefore, the load-type OR gate 242 of Fig. 2A generates a false value on the load type-specific resource busy signal 238[Ld], as shown. Consequently, the output of NOR gate 222 of Fig. 2B is true, and AND gate 224 of Fig. 2B generates a true value on miss action [Ld] 228 of Fig. 2B, which in turn causes OR gate 252 of Fig. 2C to generate a true value on miss action signal 122 of Fig. 1, as shown.
As may be observed from timing diagram 4, although it was already known at clock 5 that address B missed in the data and tag array 106, the conventional cache action logic of Fig. 2 does not generate miss action 122 until clock 9, four clocks later, which is disadvantageous. That is, the four clock cycles spent retrying Ld B cause miss action 122 to be generated four clock cycles later than it otherwise could have been. This is because the conventional logic cannot distinguish between the set of resources 114 needed to complete an operation type that misses and the set of resources 114 needed to complete an operation type that hits. Consequently, other functional blocks in the microprocessor that need to know that a miss has occurred, such as bus interface logic that fetches the missing cache line from system memory, must wait longer than necessary to be notified of the miss.
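For illustration only, the conventional behavior described above can be sketched in Python. This is a behavioral model inferred from the description of Fig. 2, not the disclosed circuit. The function name `conventional_actions` and its parameter names are assumptions, and the operation type signal 108 for the operation at the bottom of the pipeline is assumed to be true.

```python
def conventional_actions(hit: bool, type_busy: bool, addr_conflict: bool):
    """Return (retry, hit_action, miss_action) for one operation type.

    type_busy models the single type-specific resource busy signal 238[i];
    it does not distinguish hit resources from miss resources.
    """
    blocked = type_busy or addr_conflict       # OR gate 202
    retry = blocked                            # AND gate 204
    hit_action = hit and not blocked           # NOR gate 212, AND gate 214
    miss_action = (not hit) and not blocked    # NOR gate 222, AND gate 224
    return retry, hit_action, miss_action


# Timing diagram 4, clock 5: Ld B misses while a load-type resource is busy,
# so the operation is retried even though the miss is already known.
first_pass = conventional_actions(hit=False, type_busy=True, addr_conflict=False)

# Timing diagram 4, clock 9: the retried Ld B misses again with nothing busy,
# and only now is the miss action generated.
second_pass = conventional_actions(hit=False, type_busy=False, addr_conflict=False)
```

In this model the miss in the first pass is masked by the busy signal, which is the four-clock delay noted above.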
However, the present invention solves this problem by recognizing that, for each operation type, the resources needed on a miss may differ from the resources needed on a hit. Advantageously, the present invention distinguishes between the hit resources and the miss resources of a given operation type; if none of the resources needed to complete a miss of that operation type is busy, retrying the operation is avoided and the miss action signal is generated immediately, as described below.
Referring now to Figs. 6A and 6B, referred to collectively as Fig. 6, block diagrams are shown illustrating cache action logic that distinguishes hit resources from miss resources, namely the logic contained in control logic 102 of Fig. 1 according to the present invention, which in combination with the logic of Fig. 2C generates retry action signal 126, hit action signal 124, and miss action signal 122 of Fig. 1.
Referring now to Fig. 6A, control logic 102 is similar to control logic 102 of Fig. 2A, except that two resource busy signals are generated for each operation type, one for a set of hit resources and the other for a set of miss resources, rather than the single resource busy signal per operation type of Fig. 2A.
Control logic 102 of Fig. 6A comprises M OR gates 642 corresponding to the M operation types. OR gates 642 are similar to OR gates 242 of Fig. 2A, except that OR gates 642 generate operation type-specific hit resource busy signals 652[1:M]. The OR gates 642 receive various ones of the resource busy signals 116 of Fig. 1, and each OR gate 642 generates a corresponding one of the operation type-specific hit resource busy signals 652[1:M]. A hit resource busy signal 652[i] is true if any of the resource busy signals 116 entering the corresponding OR gate 642 is true. The combinations of resource busy signals 116 entering the OR gates 642 of Fig. 6A differ (as shown), to illustrate that on a hit the set of resources 114 of Fig. 1 may differ for each operation type. That is, on a hit, the number and combination of resource busy signals 116 may be the same or different for each operation type.
Control logic 102 of Fig. 6A also comprises M additional OR gates 644 corresponding to the M operation types, similar to OR gates 642. However, OR gates 644 generate operation type-specific miss resource busy signals 654[1:M]. The OR gates 644 receive various ones of the resource busy signals 116 of Fig. 1, and each OR gate 644 generates a corresponding one of the operation type-specific miss resource busy signals 654[1:M]. A miss resource busy signal 654[i] is true if any of the resource busy signals 116 entering the corresponding OR gate 644 is true. The combinations of resource busy signals 116 entering the OR gates 644 of Fig. 6A differ (as shown), to illustrate that on a miss the set of resources 114 of Fig. 1 may differ for each operation type. That is, on a miss, the number and combination of resource busy signals 116 may be the same or different for each operation type.
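As a rough illustration, the two banks of OR gates of Fig. 6A can be modeled in Python as one reduction per operation type over the resources that type needs. The resource names and the membership of each set below are hypothetical and chosen only for the example; the description does not enumerate the actual resources 114. The function name `busy_signals` is likewise an assumption.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical per-type resource sets (illustration only).
HIT_RESOURCES: Dict[str, FrozenSet[str]] = {
    "Ld": frozenset({"response_buffer"}),
    "CO": frozenset({"write_port"}),
}
MISS_RESOURCES: Dict[str, FrozenSet[str]] = {
    "Ld": frozenset({"bus_request_queue"}),
    "CO": frozenset({"write_port", "victim_buffer"}),
}


def busy_signals(busy: Set[str], needs: Dict[str, FrozenSet[str]]) -> Dict[str, bool]:
    """Model one OR gate per operation type.

    The signal for a type is true if any resource in that type's set
    is currently busy (resource busy signals 116).
    """
    return {
        op_type: any(resource in busy for resource in required)
        for op_type, required in needs.items()
    }


# With only the hypothetical response buffer busy, a load that hits would be
# blocked, while a load that misses would not.
currently_busy = {"response_buffer"}
hit_busy = busy_signals(currently_busy, HIT_RESOURCES)    # models signals 652[1:M]
miss_busy = busy_signals(currently_busy, MISS_RESOURCES)  # models signals 654[1:M]
```

Keeping the two maps separate is the point of the sketch: the same busy vector yields different answers for the hit case and the miss case of one operation type.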
Referring now to Fig. 6B, control logic 102 is similar to control logic 102 of Fig. 2B, except that control logic 102 of Fig. 6B receives the hit resource busy signals 652 and the miss resource busy signals 654 of Fig. 6A, rather than the generic resource busy signals 238 of Fig. 2A, to generate the operation type-specific retry action signal 208, hit action signal 218, and miss action signal 228.
Control logic 102 comprises a set of combinational logic including three AND gates 204, 214, and 224, two NOR gates 212 and 222, and an inverter 232, all similar to the like-numbered gates of Fig. 2B. In addition, control logic 102 comprises two AND gates 664 and 666 and an OR gate 662, which is similar to OR gate 202 of Fig. 2B but has three inputs rather than two. Inverter 232 receives hit signal 134 of Fig. 1 and generates miss signal 234.
AND gate 664 receives the operation type-specific hit resource busy signal 652[i] of Fig. 6A and hit signal 134. AND gate 666 receives the operation type-specific miss resource busy signal 654[i] of Fig. 6A and miss signal 234.
OR gate 662 receives the outputs of AND gates 664 and 666 and the address conflict signal 118 of Fig. 1. The output of OR gate 662 is provided to AND gate 204. AND gate 204 also receives the operation type signal 108[i] of Fig. 1 and generates the operation type-specific retry action [i] signal 208.
NOR gate 212 receives the operation type-specific hit resource busy signal 652[i] and address conflict signal 118. The output of NOR gate 212 is provided to AND gate 214. AND gate 214 also receives the operation type signal 108[i] and hit signal 134, and generates the operation type-specific hit action [i] signal 218.
NOR gate 222 receives the operation type-specific miss resource busy signal 654[i] and address conflict signal 118. The output of NOR gate 222 is provided to AND gate 224. AND gate 224 also receives the operation type signal 108[i] and miss signal 234, and generates the operation type-specific miss action [i] signal 228.
Control logic 102 of Fig. 6 also comprises combinational logic similar to that of control logic 102 of Fig. 2C, which receives the retry action signals 208, hit action signals 218, and miss action signals 228 of Fig. 6B and generates retry action signal 126, hit action signal 124, and miss action signal 122 of Fig. 1.
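The gate network of Fig. 6B can be sketched for a single operation type i as follows. This is a behavioral model written from the description above, offered as an aid to reading rather than as the disclosed circuit; the function name `split_resource_actions` and its parameter names are assumptions. The final OR reduction of Fig. 2C across operation types is omitted, since only one operation type signal 108 is true for the operation at the bottom of the pipeline.

```python
def split_resource_actions(
    selected: bool,       # operation type signal 108[i]
    hit: bool,            # hit signal 134
    hit_busy: bool,       # hit resource busy signal 652[i]
    miss_busy: bool,      # miss resource busy signal 654[i]
    addr_conflict: bool,  # address conflict signal 118
):
    """Return (retry, hit_action, miss_action) for operation type i."""
    miss = not hit                                   # inverter 232
    hit_blocked = hit_busy and hit                   # AND gate 664
    miss_blocked = miss_busy and miss                # AND gate 666
    blocked = hit_blocked or miss_blocked or addr_conflict  # OR gate 662
    retry = selected and blocked                     # AND gate 204
    hit_action = selected and hit and not (hit_busy or addr_conflict)     # NOR 212, AND 214
    miss_action = selected and miss and not (miss_busy or addr_conflict)  # NOR 222, AND 224
    return retry, hit_action, miss_action


# A miss while only hit resources are busy is reported immediately.
early_miss = split_resource_actions(
    selected=True, hit=False, hit_busy=True, miss_busy=False, addr_conflict=False
)

# A hit while hit resources are busy must still be retried.
blocked_hit = split_resource_actions(
    selected=True, hit=True, hit_busy=True, miss_busy=False, addr_conflict=False
)
```

For a selected operation type, exactly one of the three outputs is true for every input combination, which matches the mutually exclusive retry, hit, and miss outcomes described in the text.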
Referring now to Fig. 7, a flowchart is shown illustrating operation of cache memory 100 of Fig. 1 having the control logic 102 of Fig. 6 according to the present invention, that is, operation of a cache memory that distinguishes hit resources from miss resources. Flow begins at block 702.
In block 702, control logic 102 of Fig. 1 arbitrates among the requesters 112 of Fig. 1 for access to the data and tag array 106. Flow proceeds to block 704.
In block 704, the operation associated with the requester 112 that wins arbitration enters the cache memory 100 pipeline and proceeds sequentially through it. Flow proceeds to block 706.
In block 706, the operation reaches the bottom of the cache memory 100 pipeline. Flow proceeds to decision block 708.
In decision block 708, control logic 102 of Fig. 6 examines hit signal 134 of Fig. 1 to determine whether cache line address 138 hits or misses in cache memory 100. If hit signal 134 is false (that is, a miss occurred), flow proceeds to decision block 712. Otherwise, a hit occurred and flow proceeds to decision block 716.
In decision block 712, control logic 102 examines the resource busy signals 116 of Fig. 1 to determine whether any of the resources required to perform the operation type specified by the true operation type signal 108 is busy, given that the operation misses in cache memory 100. If so, flow proceeds to block 714. Otherwise, flow proceeds to decision block 718.
In block 714, control logic 102 of Fig. 6 generates a true value on retry action signal 126 of Fig. 1 to indicate that the operation must be retried, either because one or more of the resources 114 needed to complete the operation with the specified hit or miss characteristic are busy or because an address conflict occurred, as determined in blocks 708, 712, 716, and 718. Flow returns to block 702 so that the operation re-arbitrates for access to the data and tag array 106.
In decision block 716, control logic 102 of Fig. 6 examines the resource busy signals 116 of Fig. 1 to determine whether any of the resources required to perform the operation type specified by the true operation type signal 108 is busy, given that the operation hits in cache memory 100. If so, flow proceeds to block 714. Otherwise, flow proceeds to decision block 718.
In decision block 718, control logic 102 of Fig. 6 examines address conflict signal 118 to determine whether an address conflict has occurred that would prevent the operation from completing. If so, flow proceeds to block 714 so that the operation is retried. Otherwise, flow proceeds to block 722.
In block 722, cache memory 100 completes the operation. Flow ends at block 722.
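The decision sequence of blocks 708 through 722 can be summarized in a short Python sketch. It models only the decision made when an operation reaches the bottom of the pipeline; arbitration and pipeline traversal (blocks 702 through 706) are not modeled. The function name `fig7_decision` and the string return values are assumptions made for the example.

```python
def fig7_decision(
    hit: bool,
    hit_resources_busy: bool,
    miss_resources_busy: bool,
    addr_conflict: bool,
) -> str:
    """Return "retry" (block 714) or "complete" (block 722)."""
    if not hit:
        # Decision block 708 found a miss; decision block 712 checks
        # only the resources needed when this operation type misses.
        if miss_resources_busy:
            return "retry"
    else:
        # Decision block 708 found a hit; decision block 716 checks
        # only the resources needed when this operation type hits.
        if hit_resources_busy:
            return "retry"
    # Decision block 718: an address conflict forces a retry in either case.
    if addr_conflict:
        return "retry"
    return "complete"


# A miss completes (and can be signaled) even though hit resources are busy.
outcome = fig7_decision(
    hit=False, hit_resources_busy=True, miss_resources_busy=False, addr_conflict=False
)
```

Here "complete" for a miss corresponds to generating the miss action without a further pass through the pipeline.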
Referring now to Fig. 8, two timing diagrams are shown illustrating operation of cache memory 100 of Fig. 1 having the control logic 102 of Fig. 6, according to the flow of Fig. 7 of the present invention.
The example of timing diagram 5 is similar to the example of timing diagram 3 in that a Ld A operation is immediately followed by a Ld B operation, and both hit in cache memory 100. However, the example shown in timing diagram 5 applies to control logic 102 of Fig. 6, which distinguishes between the set of resources 114 needed by a particular operation type on a miss and the set of resources 114 needed by that operation type on a hit. In particular, timing diagram 5 shows how control logic 102 of Fig. 6 distinguishes between the set of resources 114 needed by the load operation type on a miss and the set of resources 114 needed by the load operation type on a hit.
Referring now to timing diagram 5, during clock cycle 1, Ld A enters stage J of the pipeline and proceeds down the pipeline until, during clock cycle 4, it reaches the bottom of the pipeline, as shown. Therefore, during clock cycle 4, the operation type signal 108 corresponding to the load operation type is true, and the other operation type signals 108 are false. In this example, during clock 4, Ld A hits in the data and tag array 106; therefore hit signal 134 is true (as shown), which makes miss signal 234 false.
In addition, in this example, during clock 4, none of the resources 114 required to perform a load-type operation that hits is busy, and during every clock cycle none of the resources 114 required to perform a load-type operation that misses is busy; therefore the load-type OR gate 642 of Fig. 6A generates a false value on the load type-specific hit resource busy signal 652[Ld], and the load-type OR gate 644 of Fig. 6A generates a false value on the load type-specific miss resource busy signal 654[Ld], as shown. In this example, address conflict signal 118 is also false during all clock cycles. Consequently, the output of NOR gate 212 of Fig. 6B is true, and AND gate 214 of Fig. 6B generates a true value on hit action [Ld] 218 of Fig. 6B, which in turn causes OR gate 254 of Fig. 2C to generate a true value on hit action signal 124 of Fig. 1, as shown. During clock 4, the first 16 bytes of the cache line provided by the data array 106 on data signals 136 are stored in a temporary register. At this point, one or more of the resources 114 of Fig. 1 are needed to complete Ld A, which hit, and are therefore marked busy.
During clocks 5 through 8, a load finish operation, denoted "LdFin A", passes sequentially through the pipeline and obtains the last 16 bytes of the cache line specified by cache line address 138.
During clock 2, Ld B enters the pipeline and proceeds down the pipeline until, during clock cycle 5, it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. In this example, during clock 5, Ld B hits in the tag array 106; therefore hit signal 134 is true, as shown, which makes miss signal 234 false. In this example, during clock 5, Ld A causes one or more of the resources 114 needed to perform a load-type operation that hits to be busy; therefore the load-type OR gate 642 of Fig. 6A generates a true value on the load type-specific hit resource busy signal 652[Ld], as shown. Consequently, AND gate 664 of Fig. 6B outputs a true value, the output of OR gate 662 of Fig. 6B is true, and AND gate 204 of Fig. 6B generates a true value on the load type-specific retry action signal 208[Ld]. As a result, OR gate 256 of Fig. 2C generates a true value on retry action signal 126, as shown.
At clock 6, Ld B is retried and wins arbitration. That is, Ld B re-arbitrates for access to the data and tag array 106 and proceeds through the pipeline, reaching the bottom of the pipeline at clock 9, as shown. Therefore, during clock 9, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. In this example, address B hits in the tag array 106, and during clock 9 the tag array 106 generates a true value on hit signal 134, as shown. In this example, during clock 9, none of the load operation type resources 114 is busy, for either a hit or a miss. Therefore, the load-type OR gate 642 of Fig. 6A generates a false value on the load type-specific hit resource busy signal 652[Ld], and the load-type OR gate 644 of Fig. 6A generates a false value on the load type-specific miss resource busy signal 654[Ld], as shown. Consequently, the output of NOR gate 212 of Fig. 6B is true, and AND gate 214 of Fig. 6B generates a true value on hit action [Ld] 218 of Fig. 6B, which in turn causes OR gate 254 of Fig. 2C to generate a true value on hit action signal 124 of Fig. 1, as shown.
It should be noted that in the example of timing diagram 5, when Ld B hits at clock 5, the set of resources 114 needed to complete a load-type operation that hits is busy, so Ld B must be retried, as in timing diagram 3. Consequently, hit action 124 for Ld B is generated no earlier than in timing diagram 3.
The example of timing diagram 6 of Fig. 8 is similar to the example of timing diagram 4 of Fig. 5 in that a Ld A operation is immediately followed by a Ld B operation, Ld A hits in cache memory 100, and Ld B misses in cache memory 100. However, the example shown in timing diagram 6 applies to control logic 102 of Fig. 6, which distinguishes between the set of resources 114 needed by the load operation type on a miss and the set of resources 114 needed by the load operation type on a hit. Therefore, in timing diagram 6, miss action 228 for operation Ld B is advantageously generated in clock cycle 5, rather than in clock 9 as in timing diagram 4 of Fig. 5, which uses the conventional control logic 102 of Fig. 2.
Referring now to timing diagram 6, during clock cycle 1, Ld A enters stage J of the pipeline and proceeds down the pipeline until, during clock cycle 4, it reaches the bottom of the pipeline, as shown. Therefore, during clock cycle 4, the operation type signal 108 corresponding to the load operation type is true, and the other operation type signals 108 are false. In this example, during clock 4, Ld A hits in the tag array 106; therefore hit signal 134 is true (as shown), which makes miss signal 234 false.
In addition, in this example, during clock 4, none of the resources 114 required to perform a load-type operation that hits is busy; therefore the load-type OR gate 642 of Fig. 6A generates a false value on the load type-specific hit resource busy signal 652[Ld], as shown. In this example, the resources required to perform a load-type operation that misses are not busy during any clock cycle; therefore, during all clock cycles, the load-type OR gate 644 of Fig. 6A generates a false value on the load type-specific miss resource busy signal 654[Ld], as shown. During clock 4, because miss signal 234 is false, AND gate 224 of Fig. 6B generates a false value on miss action [Ld] signal 228 of Fig. 6B, as shown. In this example, address conflict signal 118 is also false during all clock cycles. Consequently, the output of NOR gate 212 of Fig. 6B is true, and AND gate 214 of Fig. 6B generates a true value on hit action [Ld] 218 of Fig. 6B, which in turn causes OR gate 254 of Fig. 2C to generate a true value on hit action signal 124 of Fig. 1, as shown. During clock 4, the first 16 bytes of the cache line provided by the data array 106 on data signals 136 are stored in a temporary register. At this point, one or more of the resources 114 of Fig. 1 are needed to complete Ld A, which hit, and are therefore marked busy.
During clocks 5 through 8, a load finish operation, denoted "LdFin A", passes sequentially through the pipeline and obtains the last 16 bytes of the cache line specified by cache line address 138.
During clock 2, Ld B enters the pipeline and proceeds down the pipeline until, during clock cycle 5, it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true, and the other operation type signals 108 are false. In this example, during clock 5, Ld B misses in the tag array 106; therefore hit signal 134 is false, as shown, which makes miss signal 234 true. In this example, as stated above, the set of miss resources 114 for a load-type operation is not busy during any clock; therefore, during all clocks, the load-type OR gate 644 of Fig. 6A generates a false value on the load type-specific miss resource busy signal 654[Ld], as shown. Consequently, during clock 5, the output of NOR gate 222 of Fig. 6B is true, and AND gate 224 of Fig. 6B generates a true value on the load type-specific miss action signal 228[Ld]. As a result, during clock 5, OR gate 252 of Fig. 2C generates a true value on miss action signal 122, as shown.
As can be seen by comparing timing diagram 6 with timing diagram 4, once it is determined during clock 5 that Ld B misses in cache memory 100 and that none of the resources needed to complete a load miss is busy, the cache memory 100 of the present invention, having control logic 102 of Fig. 6, does not retry the Ld B operation. Therefore, advantageously, the present invention generates miss action 122 four clock cycles earlier than the known technique, that is, earlier by the depth of the cache memory 100 pipeline.
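The four-cycle saving can be checked with a little arithmetic. The sketch below is only an illustration, not part of the disclosed logic: the function names are hypothetical, the pipeline depth of 4 is inferred from the example (entry at clock 2, bottom at clock 5), and it assumes that a retried operation wins arbitration immediately and re-enters the pipeline in the clock after it reached the bottom.

```python
def bottom_clock(entry_clock: int, depth: int) -> int:
    """Clock cycle in which an operation that entered at entry_clock reaches the last stage."""
    return entry_clock + depth - 1


def miss_action_clock(entry_clock: int, depth: int, retries: int) -> int:
    """Clock cycle in which the miss action is asserted after `retries` extra passes.

    Assumption: each retry re-enters the pipeline one clock after reaching the bottom.
    """
    clock = bottom_clock(entry_clock, depth)
    for _ in range(retries):
        clock = bottom_clock(clock + 1, depth)
    return clock


DEPTH = 4  # assumed pipeline depth, inferred from the Ld B example

early = miss_action_clock(entry_clock=2, depth=DEPTH, retries=0)  # no retry
late = miss_action_clock(entry_clock=2, depth=DEPTH, retries=1)   # one retry first
print(early, late, late - early)
```

Under these assumptions the difference between the two cases equals the pipeline depth, which matches the statement that the miss action is generated N clock cycles earlier.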
Now please refer to Fig. 9 A and Fig. 9 B, be can distinguish the hit/miss resource get action logic soon, be generically and collectively referred to as Fig. 9, it is to illustrate the block scheme that is contained in the logic in Fig. 1 control logic device 102 according to another embodiment of the present invention, in conjunction with the logic of Fig. 2 C can produce Fig. 1 retry actuating signal 126, hit actuating signal 124 and miss actuating signal 122.
Now please refer to Fig. 9 A, control logic device 102 is similar with the control logic device 102 of Fig. 6 A, except every kind of homework type can not produce miss resource busy signal.This is because among the embodiment of Fig. 9, when miss generation, and for one or more plant homework type, the resource devices group 114 that does not fulfil assignment required.
Now please refer to Fig. 9 B, control logic device 102 is similar with the control logic device 102 of Fig. 6 B, except the control logic device 102 of Fig. 9 B can not receive miss resource busy signal, to produce operations specific type retry actuating signal 208, to hit actuating signal 218 and miss actuating signal 228.
The control logic device 102 of Fig. 9 B comprises one group of combinational logic, and it comprises four and door 204,214,224 and 664, one rejection gate 212 and a phase inverter 232, and is all similar with the same numeral lock of Fig. 6 B.In addition, control logic device 102 comprises one or 962, with Fig. 6 B's or door 662 similar, but have two input ends, rather than three.Phase inverter 232 can receive the hiting signal 134 of Fig. 1, and produces miss signal 234.Phase inverter 932 meeting receiver address collision signals 118, and the input of inversion signal conduct with door 224 is provided.
The operations specific type that can receive Fig. 9 A with door 664 is hit resource busy signal 652[i], and hiting signal 134.Or door 962 can receive and the output of door 664, and the address conflict signal 118 of Fig. 1.Or the output of door 962 can be sent to and door 204.Also can receive the homework type signal 108[i of Fig. 1 with door 204], and produce operations specific type retry action [i] signal 208.
Rejection gate 212 can receive the operations specific type and hit resource busy signal 652[i] and address conflict signal 118.The output of rejection gate 212 can be sent to and door 214.Also can receive homework type signal 108[i with door 214] and hiting signal 134, and generation operations specific type is hit action [i] signal 218.
Can receive homework type signal 108[i with door 224], the output and the miss signal 234 of phase inverter 932, and produce the miss action of operations specific type [i] signal 228.
The control logic device 102 of Fig. 9 also comprises the control logic device 102 similar combinational logics with Fig. 2 C, in order to the retry actuating signal 208 that receives Fig. 9 B, hit actuating signal 218 and miss actuating signal 228, and produce Fig. 1 retry actuating signal 126, hit actuating signal 124 and miss actuating signal 122.
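The gate network of Fig. 9B can be written out as Boolean expressions. The following is a minimal sketch, assuming each signal is a plain Boolean and that the Fig. 2C logic simply ORs the per-type signals together; the class and function names are hypothetical, and the gate numbers in the comments are taken from the description above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Actions:
    retry: bool
    hit: bool
    miss: bool


def type_actions(op_type: bool, hit: bool, hit_busy: bool, conflict: bool) -> Actions:
    """Per-operation-type gates of Fig. 9B for one operation type i."""
    miss = not hit                                     # inverter 232 -> miss signal 234
    and_664 = hit_busy and hit                         # AND gate 664
    or_962 = and_664 or conflict                       # OR gate 962
    retry = op_type and or_962                         # AND gate 204 -> retry action [i] 208
    nor_212 = not (hit_busy or conflict)               # NOR gate 212
    hit_action = op_type and hit and nor_212           # AND gate 214 -> hit action [i] 218
    miss_action = op_type and (not conflict) and miss  # inverter 932, AND gate 224 -> 228
    return Actions(retry, hit_action, miss_action)


def cache_actions(op_types: Sequence[bool], hit: bool,
                  hit_busy: Sequence[bool], conflict: bool) -> Actions:
    """Combine the per-type signals with ORs (assumed behaviour of the Fig. 2C logic)."""
    per_type = [type_actions(t, hit, b, conflict) for t, b in zip(op_types, hit_busy)]
    return Actions(any(a.retry for a in per_type),
                   any(a.hit for a in per_type),
                   any(a.miss for a in per_type))


# A load (type 0) that misses while its hit resources are busy.
print(cache_actions([True, False], hit=False, hit_busy=[True, False], conflict=False))
```

In this sketch a miss is reported as long as there is no address conflict, whether or not the hit resources are busy, and exactly one of the three actions is true for the operation type that is selected.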
Referring now to Fig. 10, a flow chart is shown illustrating the operation of cache memory 100 of Fig. 1 having control logic 102 of Fig. 9 according to another embodiment of the present invention, that is, the operation of a cache memory that distinguishes hit resources from miss resources. Flow begins at block 1002.
At block 1002, control logic 102 of Fig. 1 arbitrates among requestors 112 of Fig. 1 for access to the data and tag array 106. Flow proceeds to block 1004.
At block 1004, the operation associated with the requestor 112 that wins arbitration enters the cache memory 100 pipeline and proceeds through it in order. Flow proceeds to block 1006.
At block 1006, the operation reaches the bottom of the cache memory 100 pipeline. Flow proceeds to decision block 1008.
At decision block 1008, control logic 102 of Fig. 9 examines hit signal 134 of Fig. 1 to determine whether cache line address 138 hits or misses in cache memory 100. If hit signal 134 is false (that is, a miss occurs), flow proceeds to decision block 1018. Otherwise, a hit occurs and flow proceeds to decision block 1016.
At decision block 1016, control logic 102 of Fig. 9 examines resource busy signals 116 of Fig. 1 to determine whether any of the resources 114 needed to complete an operation of the type specified by the true operation type signal 108 is busy when the operation hits in cache memory 100. If so, flow proceeds to block 1014. Otherwise, flow proceeds to decision block 1018.
At block 1014, control logic 102 of Fig. 9 generates a true value on retry action signal 126 of Fig. 1 to indicate that the operation must be retried, either because one or more of the resources 114 needed to complete the operation with the specified hit or miss characteristic is busy or because an address conflict has occurred, as determined at blocks 1008, 1012, 1016 and 1018. Flow returns to block 1002 so that the operation re-arbitrates for access to the data and tag array 106.
At decision block 1018, control logic 102 of Fig. 9 examines address conflict signal 118 to determine whether an address conflict has occurred that would prevent the operation from completing. If so, flow proceeds to block 1014 so that the operation is retried. Otherwise, flow proceeds to block 1022.
At block 1022, cache memory 100 completes the operation. Flow ends at block 1022.
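The flow of Fig. 10, including the loop back to arbitration on a retry, can be sketched as a small function. This is an illustrative model only: the function name and the trace format are hypothetical, and it assumes that the hit or miss outcome of the operation stays the same from one pass to the next.

```python
from typing import Iterable, Tuple


def run_operation(hit: bool, passes: Iterable[Tuple[bool, bool]]) -> Tuple[str, int]:
    """Follow one operation through the flow of Fig. 10.

    `passes` yields, for each trip through the pipeline, the pair
    (hit_resource_busy, address_conflict) seen when the operation reaches
    the bottom. Returns the outcome and the number of passes it took.
    """
    for n, (hit_busy, conflict) in enumerate(passes, start=1):
        # Blocks 1002-1006: arbitrate, enter the pipeline, reach the bottom.
        if hit and hit_busy:
            continue  # blocks 1008 -> 1016 -> 1014: retry, back to arbitration
        if conflict:
            continue  # block 1018 -> 1014: retry, back to arbitration
        return ("hit" if hit else "miss", n)  # block 1022: operation completes
    raise RuntimeError("operation was still being retried when the trace ended")


# A miss completes on its first pass even though hit resources are busy.
print(run_operation(False, [(True, False)]))
# A hit with busy hit resources needs a second pass.
print(run_operation(True, [(True, False), (False, False)]))
```

Note that the miss path skips decision block 1016 entirely, which is why busy hit resources never delay a miss in this embodiment.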
Referring now to Fig. 11, a timing diagram is shown illustrating the operation of cache memory 100 of Fig. 1 having control logic 102 of Fig. 9, according to the flow chart of Fig. 10 in another embodiment of the present invention.
The example of timing diagram 7 is similar to the example of timing diagram 6 of Fig. 8 in that an Ld A operation is immediately followed by an Ld B operation. However, both Ld A and Ld B miss in cache memory 100. Moreover, the example shown in timing diagram 7 applies to control logic 102 of Fig. 9, in which no group of resources 114 is needed to complete a load operation when a miss occurs.
Referring now to timing diagram 7, during clock cycle 1 Ld A enters stage J of the pipeline and proceeds down it until clock cycle 4, when it reaches the bottom of the pipeline, as shown. Therefore, during clock cycle 4, the operation type signal 108 corresponding to the load operation type is true and the other operation type signals 108 are false. In this example, during clock 4, Ld A misses in the data and tag array 106; therefore hit signal 134 is false (as shown), which makes miss signal 234 true.
In addition, in this example, none of the resources 114 needed to complete a load-type operation that hits is busy during any clock cycle; therefore OR gate 642 of Fig. 9A generates a false value on the load operation-type-specific hit-resource busy signal 652[Ld], as shown. In this example, address conflict signal 118 is also false during all clock cycles. Therefore, during clock 4, AND gate 224 of Fig. 9B generates a true value on the miss action [Ld] signal 228 of Fig. 9B, which in turn causes OR gate 252 of Fig. 2C to generate a true value on miss action signal 122 of Fig. 1, as shown.
During clock 2, Ld B enters the pipeline and proceeds down it until clock cycle 5, when it reaches the bottom, as shown. Therefore, during clock 5, the operation type signal 108 of Fig. 1 corresponding to the Ld operation type is true and the other operation type signals 108 are false. In this example, during clock 5, Ld B misses in tag array 106; therefore hit signal 134 is false, as shown, which makes miss signal 234 true. Therefore, during clock 5, AND gate 224 of Fig. 9B generates a true value on the load operation-type-specific miss action signal 228[Ld], which in turn causes OR gate 252 to generate a true value on miss action signal 122 during clock 5, as shown.
As can be observed from timing diagram 7, the present invention advantageously does not generate a retry action 126 for the missing Ld B operation, even though Ld A, ahead of Ld B, also misses. Therefore, even when misses occur back to back, the present invention still generates miss action 122 four clock cycles earlier than the known method.
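The back-to-back case can be traced clock by clock. The sketch below is a simplified model under stated assumptions: a four-stage pipeline (inferred from entry at clock 1 and bottom at clock 4), no address conflict, idle hit resources, and hypothetical names throughout.

```python
from typing import Dict, List, Tuple

DEPTH = 4  # assumed pipeline depth, inferred from the Ld A example


def trace(ops: List[Tuple[str, int, bool]]) -> Dict[int, Tuple[str, str]]:
    """Map each bottom-of-pipeline clock to (operation name, action asserted).

    `ops` holds (name, entry_clock, hit). With no address conflict and idle
    hit resources, the Fig. 9 logic asserts either a hit or a miss action
    and never a retry.
    """
    timeline = {}
    for name, entry_clock, hit in ops:
        bottom = entry_clock + DEPTH - 1
        timeline[bottom] = (name, "hit" if hit else "miss")
    return timeline


timeline = trace([("Ld A", 1, False), ("Ld B", 2, False)])
for clock in sorted(timeline):
    print(clock, *timeline[clock])
```

Under these assumptions the miss action is asserted in two consecutive clocks, 4 and 5, one for each load, and neither load takes a second trip through the pipeline.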
Although the present invention and its objects, features and advantages have been described in detail, other embodiments are also within the scope of the present invention. For example, the present invention may be used in a cache memory at any level of a microprocessor's cache hierarchy. Moreover, the present invention is applicable to internal or external cache memories, and to cache memories of any size or pipeline depth.
In sum, the foregoing describes only preferred embodiments of the present invention and shall not limit the scope in which the invention may be practiced. All equivalent changes and modifications made according to the claims of the present invention shall remain within the scope covered by this patent.

Claims (21)

1. A cache memory, comprising:
a first group of resources for completing an operation when a cache line address of the operation hits in the cache memory;
a second group of resources for completing the operation when the address misses in the cache memory, the second group of resources being different from the first group of resources;
control logic, coupled to the first group of resources and the second group of resources, wherein if the address misses in the cache memory and none of the resources in the second group is busy, then regardless of whether any resource in the first group is busy, the control logic asserts a miss indicator signal and does not retry the operation; and
an operation type input, coupled to the control logic, for specifying which of a plurality of operation types executable by the cache memory the operation is.
2. The cache memory as claimed in claim 1, characterized in that if one or more resources in the second group is busy, the control logic asserts a retry indicator signal.
3. The cache memory as claimed in claim 2, characterized in that retrying the operation comprises re-arbitrating with other operations for access to the cache memory.
4. The cache memory as claimed in claim 3, characterized in that the cache memory comprises a pipeline.
5. The cache memory as claimed in claim 4, characterized in that retrying the operation comprises re-sequencing the operation through the pipeline.
6. The cache memory as claimed in claim 4, characterized in that the control logic asserts the retry indicator signal if the address is the same as an address of another operation in the cache memory pipeline.
7. The cache memory as claimed in claim 1, characterized in that if the address hits in the cache memory and none of the resources in the first group is busy, then regardless of whether any resource in the second group is busy, the control logic asserts a hit indicator signal and does not retry the operation.
8. The cache memory as claimed in claim 1, characterized in that the cache memory is a second-level cache memory.
9. A cache memory, comprising:
control logic that receives a plurality of type signals, a hit signal and a plurality of busy signals, wherein the type signals specify which of a plurality of operation types an operation is, the operation specifying a cache line; the hit signal indicates whether the cache line is present in the cache memory; and the busy signals indicate whether a corresponding plurality of resources are busy, wherein a predetermined subset of the resources is needed to complete the operation, the predetermined subset being determined by the hit signal and the type signals; and wherein the control logic generates a miss action signal according to the type signals, the hit signal and only those busy signals that are in the predetermined subset.
10. The cache memory as claimed in claim 9, characterized in that the predetermined subset of the busy signals is a first predetermined subset if the hit signal is false, and a second predetermined subset if the hit signal is true.
11. The cache memory as claimed in claim 10, characterized in that if the hit signal is false and none of the busy signals corresponding to the first predetermined subset is true, the control logic generates a true value on the miss action signal.
12. The cache memory as claimed in claim 10, characterized in that if the hit signal is false and one or more of the busy signals corresponding to the first predetermined subset is true, the control logic generates a true value on a retry action signal.
13. The cache memory as claimed in claim 10, characterized in that if the hit signal is true and none of the busy signals corresponding to the second predetermined subset is true, the control logic generates a true value on a hit action signal.
14. The cache memory as claimed in claim 10, characterized in that if the hit signal is true and one or more of the busy signals corresponding to the second predetermined subset is true, the control logic generates a true value on the retry action signal.
15. A method of generating a cache action signal of a cache memory, comprising:
determining whether a cache line address is present in the cache memory;
if the cache line address is present in the cache memory, determining whether any of a first group of resources is busy;
if the cache line address is not present in the cache memory, determining whether any of a second group of resources is busy;
when the cache line address is not present in the cache memory, generating a miss action signal if none of the second group of resources is busy, even if a resource in the first group of resources is busy; and
before the step of generating the miss action signal, determining an operation type of an operation associated with the cache line address.
16. The method as claimed in claim 15, characterized in that it further comprises:
when the cache line address is not present in the cache memory, generating a retry action signal if any of the second group of resources is busy.
17. The method as claimed in claim 16, characterized in that it further comprises:
when the cache line address is present in the cache memory, generating a retry action signal if any of the first group of resources is busy.
18. The method as claimed in claim 17, characterized in that it further comprises:
when the cache line address is present in the cache memory, generating a hit action signal if none of the first group of resources is busy, even if a resource in the second group of resources is busy.
19. The method as claimed in claim 15, characterized in that the cache memory is a pipelined cache memory comprising a plurality of stages.
20. The method as claimed in claim 19, characterized in that it further comprises:
comparing the cache line address with other cache line addresses in the stages.
21. The method as claimed in claim 20, characterized in that it further comprises:
generating the retry action signal if the cache line address matches one or more of the other cache line addresses in the stages.
CNB200410005333XA 2004-01-30 2004-01-30 Device and method for fast fetch target miss early detection Expired - Lifetime CN1287294C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200410005333XA CN1287294C (en) 2004-01-30 2004-01-30 Device and method for fast fetch target miss early detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200410005333XA CN1287294C (en) 2004-01-30 2004-01-30 Device and method for fast fetch target miss early detection

Publications (2)

Publication Number Publication Date
CN1558331A CN1558331A (en) 2004-12-29
CN1287294C true CN1287294C (en) 2006-11-29

Family

ID=34350857

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200410005333XA Expired - Lifetime CN1287294C (en) 2004-01-30 2004-01-30 Device and method for fast fetch target miss early detection

Country Status (1)

Country Link
CN (1) CN1287294C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9892047B2 (en) 2009-09-17 2018-02-13 Provenance Asset Group Llc Multi-channel cache memory
US8661200B2 (en) 2010-02-05 2014-02-25 Nokia Corporation Channel controller for multi-channel cache

Also Published As

Publication number Publication date
CN1558331A (en) 2004-12-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term
CX01 Expiry of patent term

Granted publication date: 20061129