CN110297787B - Method, device and equipment for accessing memory by I/O equipment - Google Patents


Info

Publication number
CN110297787B
Authority
CN
China
Prior art keywords
cache
access
data
group
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810240206.XA
Other languages
Chinese (zh)
Other versions
CN110297787A
Inventor
李鹏 (Li Peng)
曾露 (Zeng Lu)
Current Assignee
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd filed Critical Loongson Technology Corp Ltd
Priority to CN201810240206.XA
Publication of CN110297787A
Application granted
Publication of CN110297787B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a method, an apparatus, and a device for accessing memory by an I/O device. The method calculates the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to an I/O memory-write request is written into the Cache until the moment that I/O data is read, and updates the number of Cache ways available to I/O data according to this maximum, so that the updated number of available ways equals the total number of Cache ways minus the maximum number of hit ways. I/O memory-access processing is then performed according to the number of ways available to I/O data, and the space occupied by I/O data in the Cache is dynamically adjusted in real time according to how the CPU is using the Cache. I/O memory-access performance can therefore be improved without degrading CPU performance, which further improves the overall performance of the processor and raises the space utilization of the Cache.

Description

Method, device and equipment for accessing memory by I/O equipment
Technical Field
The present invention relates to the field of processors, and in particular, to a method, an apparatus, and a device for accessing a memory by an I/O device.
Background
With the rapid development of microprocessor technology, integration density and computing power have increased dramatically, and the memory-access performance of I/O devices has become a bottleneck limiting further processor performance gains.
Traditional I/O devices generally use Direct Memory Access (DMA) or Direct Cache Access (DCA). DMA allows an I/O device to read and write memory directly, reducing the involvement of the processor core in moving I/O data; DCA improves I/O memory-access performance by allowing the I/O device to read and write the Cache directly. Although DCA improves I/O access performance, writing I/O data directly into the Cache pollutes the Cache with I/O data, which can seriously affect other processes running on the processor. To reduce this pollution, the Partition-Based DMA Cache (PBDC) approach statically divides the Cache into two regions, one for I/O data and one for processor data, thereby separating the two and reducing Cache pollution.
However, PBDC requires substantial changes to the Cache structure and the coherence protocol, so its implementation complexity is high. Moreover, because I/O access patterns are diverse, if too little Cache space is allocated to I/O data, the I/O region is insufficient and I/O data is evicted from the Cache before it is used, severely degrading overall processor performance; if too much Cache space is allocated to I/O data, the performance of other programs suffers, which also degrades overall processor performance.
Disclosure of Invention
The invention provides a method, an apparatus, and a device for accessing memory by an I/O device, to solve the following problems of existing memory-access methods: if too little Cache space is allocated to I/O data, the I/O region is insufficient and I/O data is evicted from the Cache before it is used, severely degrading overall processor performance; if too much Cache space is allocated to I/O data, the performance of other programs suffers, which also degrades overall processor performance.
One aspect of the present invention provides a method for an I/O device to access memory, comprising:
receiving an I/O memory-write request, and calculating the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to the I/O memory-write request is written into the Cache until the moment that I/O data is read;
updating the number of Cache ways available to I/O data according to the maximum number of hit ways, so that the updated number of available ways equals the total number of Cache ways minus the maximum number of hit ways;
and performing I/O memory-access processing according to the number of Cache ways available to I/O data.
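The update rule in the second step is simple arithmetic; the sketch below (the function name is illustrative, not part of the patent) makes it concrete:

```python
def updated_available_ways(total_ways: int, max_hit_ways: int) -> int:
    """Ways available to I/O data = total Cache ways minus the maximum
    number of ways hit by CPU requests while the I/O data was resident."""
    if not 0 <= max_hit_ways <= total_ways:
        raise ValueError("max_hit_ways must be in [0, total_ways]")
    return total_ways - max_hit_ways

# e.g. an 8-way Cache in which CPU requests hit at most way 5 while the
# I/O data was waiting to be read leaves 3 ways for I/O data.
ways_for_io = updated_available_ways(8, 5)
```

Because the maximum hit-way count is recomputed for each monitored I/O write, the allocation tracks CPU demand over time rather than being fixed at design time, which is the key difference from static PBDC partitioning.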
Another aspect of the present invention provides an apparatus for an I/O device to access memory, comprising:
a first computing module, configured to receive an I/O memory-write request and calculate the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to the I/O memory-write request is written into the Cache until the moment that I/O data is read;
an updating module, configured to update the number of Cache ways available to I/O data according to the maximum number of hit ways, so that the updated number of available ways equals the total number of Cache ways minus the maximum number of hit ways;
and a memory-access processing module, configured to perform I/O memory-access processing according to the number of Cache ways available to I/O data.
Another aspect of the present invention provides a computer device, comprising: a processor, a memory,
and a computer program stored on the memory and executable by the processor;
when the processor executes the computer program, the above method for an I/O device to access memory is implemented.
The invention provides a method, an apparatus, and a device for accessing memory by an I/O device. By calculating the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to an I/O memory-write request is written into the Cache until the moment that I/O data is read, the number of Cache ways available to I/O data is updated so that it equals the total number of Cache ways minus the maximum number of hit ways, and I/O memory-access processing is then performed according to that number of available ways. The number of ways available to I/O data in the Cache is thus updated dynamically, in real time, according to how the CPU uses the Cache, so that the space occupied by I/O data is adjusted dynamically and the number of ways available to I/O data is increased without significantly affecting other CPU threads. I/O memory-access performance is therefore improved without degrading CPU performance, which further improves the overall performance of the processor and raises the space utilization of the Cache.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a method for an I/O device to access a memory according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for accessing a memory by an I/O device according to a second embodiment of the present invention;
fig. 3 is a flowchart of a method for accessing a memory by an I/O device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for accessing a memory by an I/O device according to a fifth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a ninth embodiment of the present invention.
Certain embodiments of the invention have been illustrated by the above figures and are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terms to which the present invention relates will be explained first:
cache memory: also called Cache, is a high-speed small-capacity memory between a Central Processing Unit (CPU or processor for short) and a main memory in the hierarchical structure of a computer storage system, and forms a primary memory together with the main memory. Cache is usually composed of static memory chips (SRAM), and has a relatively small capacity but much higher speed than main memory, close to the speed of CPU. Because the processor executes the locality of the instruction, the processor has high hit rate in the Cache, and the processor can be searched in the main memory only when the processor can not be found in the Cache, thereby greatly improving the processing speed of the CPU.
Cache structure: the Cache is typically implemented with associative memory, in which each memory block (also called a Cache line) carries additional stored information called a tag. When the associative memory is accessed, the address is compared with every tag simultaneously, so the memory block whose tag matches can be accessed directly. The Cache in the embodiments of the invention uses a multi-way set-associative structure, which lies between a fully associative Cache and a direct-mapped Cache: it consists of several direct-mapped blocks (ways), so a given index corresponds to one set containing several possible Cache-line positions, which increases the hit rate and system efficiency.
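As an illustration of the set-associative addressing just described, the sketch below splits a physical address into tag, set index, and line offset. The line size and set count are illustrative choices, not values taken from the patent:

```python
def split_address(addr: int, line_bytes: int = 64, num_sets: int = 256):
    """Decompose a physical address into (tag, set index, line offset)
    for a set-associative Cache. The index selects one set; the tag is
    compared against every way in that set."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % num_sets
    tag = addr // (line_bytes * num_sets)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
```

Reassembling `tag * (line_bytes * num_sets) + index * line_bytes + offset` recovers the original address, which is a quick sanity check on the decomposition.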
Direct Memory Access (DMA) mode: the I/O equipment is allowed to directly read and write the memory, so that the participation degree of the processor core in the I/O data transfer process is reduced.
Direct Cache Access (DCA) mode: the I/O device is allowed to directly read and write a Cache memory (Cache) so as to improve the access performance of the I/O device.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Example one
Because I/O data has strong streaming characteristics, I/O accesses generally use consecutive addresses, i.e., they have good spatial locality. This means that with a set-associative Cache structure, each Cache set receives a roughly equal number of I/O requests, which improves the space utilization of the Cache. This property also makes a way-based Cache partition of the I/O data more effective: it partitions better than dividing the Cache per CPU thread, and it avoids more complex Cache-partition replacement schemes such as hashed index addressing. The Cache in the embodiments of the invention therefore uses a multi-way set-associative organization.
Fig. 1 is a flowchart of a method for an I/O device to access memory according to the first embodiment of the present invention. The embodiment addresses the following problem of existing memory-access methods, caused by the diversity of I/O accesses: if too little Cache space is allocated to I/O data, the I/O region is insufficient and I/O data is evicted from the Cache before it is used, severely degrading overall processor performance; if too much Cache space is allocated to I/O data, the performance of other programs suffers and overall processor performance also drops. To this end, a method for an I/O device to access memory is provided. As shown in fig. 1, the method comprises the following specific steps:
Step S101, receiving an I/O memory-write request, and calculating the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to the I/O memory-write request is written into the Cache until the moment that I/O data is read.
The Cache uses a W-way set-associative organization, where W = 2^n and n is a positive integer.
In this embodiment, the period from the moment the I/O data corresponding to the I/O memory-write request is written into the Cache until the moment that I/O data is read is recorded as the target time period. When the processor receives an I/O memory-write request, it can track the Cache hits of CPU memory-access requests during the target time period, and thereby calculate the maximum number of Cache ways hit by CPU memory-access requests in the Cache during that period.
In this embodiment, the maximum number of hit ways is the hit position closest to the LRU position among all hits by other CPU threads on CPU data in the Cache during the target time period; it reflects how heavily other CPU threads are using the Cache. The LRU position is the Least Recently Used position, i.e., the position of the next Cache line to be replaced under the Least Recently Used (LRU) replacement policy.
Step S102, updating the number of Cache ways available to I/O data according to the maximum number of hit ways, so that the updated number of available ways equals the total number of Cache ways minus the maximum number of hit ways.
For a W-way set-associative Cache, if the maximum number of hit ways counted over the target time period is N (N ≤ W), then by the LRU stack property, whenever the data in the Nth way is hit, the CPU data ahead of it in the first N ways is also hit with high probability; hence allocating only N ways to CPU data during the target time period does not affect the hit rate. The number of ways usable by I/O data is therefore, most conservatively, W − N, which has the least impact on other CPU threads.
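The W − N rule leans on the LRU stack (inclusion) property: a hit at stack depth N would also be a hit in any LRU cache of at least N ways. The small model below (illustrative, not from the patent) computes the stack depth of each hit in an access trace:

```python
def lru_hit_depths(accesses):
    """For each access, return the LRU stack depth at which it hits
    (1 = most recently used) or None on a miss. The maximum depth seen
    is the number of ways needed to capture all of these hits."""
    stack = []   # most recently used element at the front
    depths = []
    for a in accesses:
        if a in stack:
            depths.append(stack.index(a) + 1)  # 1-based stack depth
            stack.remove(a)
        else:
            depths.append(None)
        stack.insert(0, a)                      # promote to MRU
    return depths

# Trace A B A C B: the two hits occur at depths 2 and 3, so N = 3 ways
# would suffice for the CPU data and W - 3 ways could go to I/O data.
depths = lru_hit_depths(["A", "B", "A", "C", "B"])
```

This is why tracking only the deepest hit position is enough: any shallower hit is automatically covered by the same N-way allocation.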
In this embodiment, after the maximum number of hit ways of CPU memory-access requests during the period from the writing of the I/O data into the Cache until its reading has been calculated, the number of Cache ways available to I/O data is updated to the total number of Cache ways minus the maximum number of hit ways. The number of ways available to I/O data is thereby increased as far as possible without significantly affecting other CPU threads, so I/O memory-access performance can be improved without affecting CPU performance.
Step S103, performing I/O memory-access processing according to the number of Cache ways available to I/O data.
The embodiment of the invention calculates the maximum number of Cache ways hit by CPU memory-access requests during the period from the moment the I/O data corresponding to an I/O memory-write request is written into the Cache until the moment that I/O data is read, updates the number of Cache ways available to I/O data so that it equals the total number of Cache ways minus the maximum number of hit ways, and then performs I/O memory-access processing according to that number of available ways. The number of ways available to I/O data is thus updated dynamically, in real time, according to how the CPU uses the Cache, so that the space occupied by I/O data in the Cache is adjusted dynamically and the number of ways available to I/O data grows without significantly affecting other CPU threads. As a result, I/O memory-access performance is improved without degrading CPU performance, the overall performance of the processor is further improved, and the space utilization of the Cache rises.
Example two
Fig. 2 is a flowchart of a method for an I/O device to access memory according to the second embodiment of the present invention. Building on the first embodiment, this embodiment presets sampling sets, provides a monitor register for each sampling set, tracks and records only the memory accesses that fall in the sampling sets, and on that basis calculates the maximum number of Cache ways hit by CPU memory-access requests during the period from the writing of the I/O data corresponding to the I/O memory-write request into the Cache until its reading. As shown in fig. 2, step S101 may be implemented by the following steps:
step S201, receiving a memory access request, wherein the memory access request is an I/O memory access write request or a CPU memory access request.
The memory access request carries a target address of the requested access, and the target address at least comprises an index of a Cache group and a label of a Cache line.
In practical application, the memory access request received by the processor may be at least any one of an I/O memory access write request, an I/O memory access read request, a CPU memory access write request, and a CPU memory access read request. I/O data has obvious producer-consumer characteristics, i.e., CPU write I/O read, or I/O write CPU read.
Step S202, determining, from the index carried by the memory-access request, whether the target Cache set accessed by the request belongs to a sampling set.
In this embodiment, an auxiliary tag directory may be added. It records the index of each sampling set and the tags of the Cache lines in each sampling set, and is used to count how other CPU threads use the Cache during the target time period from the writing of the I/O data into the Cache until its reading. The auxiliary tag directory has the same structure as the existing tag directory of the Cache and uses an LRU replacement policy; the difference is that it tracks only the hits of CPU memory-access requests in the sampling sets and simply ignores I/O memory-access requests, so it can be used to model the behavior of CPU programs in the Cache.
The sampling sets in this embodiment are obtained by sampling from all Cache sets; they comprise several of the Cache's sets and may be chosen at random. Preferably, for any way of the Cache, at least one sampled Cache set exercises that way; that is, the sampling sets cover every way of the Cache, so that their memory-access behavior is closer to the overall memory-access behavior of the Cache. In addition, the more Cache sets are sampled, the closer the behavior of the sampling sets is to the overall memory-access behavior of the Cache.
The Cache in this embodiment uses a W-way set-associative organization, where W = 2^n and n is a positive integer. Optionally, the number of sampling sets may be W; this embodiment does not specifically limit the number of sampling sets or the way they are selected.
In this step, whether the target Cache set accessed by the memory-access request belongs to a sampling set is determined from the index carried by the request, which may specifically be implemented as follows:
determining, from the index in the target address carried by the memory-access request, whether the request accesses an index entry in the auxiliary tag directory; if it does, the target Cache set accessed by the request belongs to a sampling set; if it does not, the target Cache set does not belong to a sampling set.
Alternatively, in this embodiment the indices of the sampling sets may be recorded directly, and whether the target Cache set belongs to a sampling set is determined by whether the index carried by the request appears among the recorded sampling-set indices.
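Either variant of the membership test reduces to checking the request's set index against the recorded sampling-set indices. A minimal sketch, with an illustrative (made-up) choice of sampled indices:

```python
# Illustrative sampled set indices; a real design would spread them
# across the whole Cache so the sample reflects overall behavior.
SAMPLED_SET_INDICES = frozenset({0, 37, 101, 202})

def is_sampled_set(index: int) -> bool:
    """Step S202: does the target Cache set of a memory-access request
    belong to a sampling set?"""
    return index in SAMPLED_SET_INDICES
```

Requests whose index fails this test are simply not tracked, which keeps the monitoring hardware small relative to tracking every set.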
If the target Cache set accessed by the memory-access request belongs to a sampling set, step S203 is executed to determine whether the request is an I/O memory-access request.
If the target Cache set accessed by the memory-access request does not belong to a sampling set, the process ends and the request is not tracked.
Step S203, determining whether the memory-access request is an I/O memory-access request.
If the request is an I/O memory-write request, steps S204-S205 are executed to continue tracking it.
If the request is an I/O memory-read request, the process ends and the request is not tracked.
If the request is not an I/O memory-access request, it is a CPU memory-write or CPU memory-read request, and step S206 is executed to track the CPU memory-access request.
Step S204, if the memory-access request is an I/O memory-write request, determining whether the monitor register corresponding to the target Cache set is in the unused state.
In this embodiment, the monitor register records at least the following: whether it is in use, whether monitoring has ended, a tag, and the number of hit ways. Initially the register is in the unused state with monitoring ended, tag 0, and hit-way count 0. For example, the monitor register may contain the following fields: used, valid, tag, and LHW. The used field indicates whether the register is in use, i.e., whether I/O data is currently being monitored; it can be a flag bit, with 1 meaning in use and 0 meaning unused. The valid field indicates whether the monitoring round of this register has ended; the ending condition is that the I/O data corresponding to the tag recorded in the register has been accessed, and valid too can be a flag bit. The tag field records the tag of the I/O data to be monitored, and LHW records the number of hit ways. Initially all fields of the monitor register are set to 0.
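The register layout just described (fields used / valid / tag / LHW, all zero initially) can be sketched as a plain record. This is a software illustration of the description, not the hardware itself:

```python
from dataclasses import dataclass

@dataclass
class MonitorRegister:
    """One per sampling set; field names follow the description above."""
    used: bool = False   # is I/O data currently being monitored?
    valid: bool = False  # has the monitored I/O line been accessed?
    tag: int = 0         # tag of the monitored I/O data
    lhw: int = 0         # largest hit-way count recorded so far

    def reset(self) -> None:
        """Return all fields to their initial (all-zero) state."""
        self.used = self.valid = False
        self.tag = self.lhw = 0
```

Resetting after each monitoring round is what lets the same register be reused for the next I/O write to the set, as described in step S211 below.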
Optionally, a monitor register may be added for each sampling set, or existing registers may be reused to implement the monitor-register function.
If the monitor register corresponding to the target Cache set is not in use, step S205 is executed.
If the monitor register corresponding to the target Cache set is in use, the process ends and the I/O memory-write request is not tracked further.
Step S205, recording the tag carried by the I/O memory-write request in the monitor register corresponding to the target Cache set, and marking that register as in use.
If the monitor register corresponding to the target Cache set is unused, no access to the target Cache set is currently being tracked. In this step the tag carried by the I/O memory-write request is recorded in that monitor register, recording the fact that the I/O data of the request has been written into the Cache, and the register is marked as in use. The register then starts recording the number of Cache ways hit by CPU memory-access requests before the I/O data of the request is read, thereby tracking how CPU memory-access requests use the Cache.
Step S206, if the memory-access request is a CPU memory-access request, determining whether the monitor register corresponding to the target Cache set is in use.
If the monitor register corresponding to the target Cache set is in use, steps S207-S211 are executed to perform subsequent tracking of the CPU memory-access request.
If the monitor register corresponding to the target Cache set is unused, no access to the target Cache set is currently being tracked; the process ends and the CPU memory-access request is not tracked further.
Step S207, judging whether the CPU memory-access request hits a Cache line in the target Cache set.
If the CPU memory-access request hits a Cache line in the target Cache set, steps S208-S209 are executed to update the hit-way count recorded in the monitor register.
If the CPU memory-access request does not hit a Cache line in the target Cache set, steps S210-S211 are executed.
In this embodiment, whether the CPU memory-access request hits a Cache line in the target Cache set may specifically be determined as follows:
judging, from the tag in the target address carried by the request, whether the request accesses a tag entry in the auxiliary tag directory; if it does, the CPU memory-access request hits a Cache line in the target Cache set; if it does not, the request misses in the target Cache set.
Alternatively, in this embodiment the indices and tags of the sampling sets may be recorded directly, and the hit is determined by whether the index and tag carried by the request match a recorded sampling-set tag.
Step S208, comparing the way of the currently hit Cache line with the hit-way count recorded in the monitor register corresponding to the target Cache set.
If the way of the currently hit Cache line is less than or equal to the hit-way count recorded in the monitor register, the recorded count does not need to be updated.
Step S209, if the way of the currently hit Cache line is greater than the hit-way count recorded in the monitor register corresponding to the target Cache set, updating the recorded count to the way of the currently hit Cache line.
Step S210, judging whether the CPU memory access request hits the Cache line corresponding to the tag recorded in the monitoring register corresponding to the target Cache group.
I/O data has a pronounced producer-consumer character: either the CPU writes and the I/O device reads, or the I/O device writes and the CPU reads. If the CPU memory access request hits the Cache line corresponding to the tag recorded in the monitoring register corresponding to the target Cache group, the I/O data in that Cache line has been accessed, that is, it has been read by the CPU memory access request; in this case step S211 is executed.
If the CPU memory access request does not hit the Cache line corresponding to the recorded tag, the tracking processing of this CPU memory access request ends.
Step S211, taking the number of hit ways recorded in the monitoring register corresponding to the target Cache group as the maximum number of hit ways.
Optionally, after the number of hit ways recorded in the monitoring register corresponding to the target Cache group is taken as the maximum number of hit ways, the method further includes:
setting the number of hit ways in the monitoring register corresponding to the target Cache group to 0, and marking the monitoring register corresponding to the target Cache group as unused.
Alternatively, all fields of the monitoring register may simply be reset to 0, so that the monitoring register can be used for the next round of tracking.
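The monitor-register tracking described in steps S204-S211 can be sketched as follows. This is a minimal illustrative model, not the patented hardware: the class and function names are invented, way numbers are assumed 0-based, and the handling of step S210 reflects one plausible reading of the flow (the window closes when a CPU request targets the recorded I/O tag).

```python
class MonitorRegister:
    """One monitoring register per sampling group (illustrative layout)."""
    def __init__(self):
        self.in_use = False   # whether an I/O write is currently being tracked
        self.io_tag = None    # tag of the Cache line holding the I/O data
        self.hit_ways = 0     # number of ways hit by CPU requests so far

def track_io_write(reg, tag):
    """On an I/O memory access write to a sampling group, start tracking."""
    if not reg.in_use:
        reg.io_tag = tag
        reg.in_use = True

def track_cpu_access(reg, hit_way, req_tag):
    """On a CPU access to the same sampling group: hit_way is the 0-based way
    number of the hit Cache line (None on a miss), req_tag the tag in the
    request's target address.  Returns the maximum number of hit ways once
    the tracked I/O data is read back by the CPU, else None."""
    if not reg.in_use:
        return None
    if req_tag == reg.io_tag:
        # Steps S210/S211: the tracked I/O data is consumed by the CPU;
        # report the recorded count and reset for the next round.
        max_hit_ways = reg.hit_ways
        reg.hit_ways = 0
        reg.io_tag = None
        reg.in_use = False
        return max_hit_ways
    if hit_way is not None and hit_way + 1 > reg.hit_ways:
        # Steps S208/S209: keep the largest hit-way count seen so far.
        reg.hit_ways = hit_way + 1
    return None
```

Running this over a stream of requests yields, per I/O write, the maximum hit-way count used later to size the CPU partition of the set.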
The processing flow shown in fig. 2 has been described above by taking the tracking of a single memory access request as an example. In the embodiment of the present invention, the processing shown in fig. 2 is performed cyclically on successive memory access requests, in parallel with the processor's normal memory access handling. Without disturbing that normal handling, it dynamically computes, in real time, the maximum number of hit ways of CPU memory access requests during the period from the moment the I/O data corresponding to an I/O memory access write request is written into the Cache until that data is read.
In the embodiment of the present invention, sampling groups are set in advance, a monitoring register is provided for each sampling group, and only the access behavior of the sampling groups is tracked and recorded; this sampling approach saves hardware cost. The maximum number of hit ways of CPU memory access requests, over the period from the moment the I/O data corresponding to an I/O memory access write request is written into the Cache until that data is read, is computed dynamically, so that the space occupied by I/O data in the Cache can be adjusted dynamically and the number of ways available to I/O data can be increased without significantly affecting other CPU threads. I/O memory access performance is thereby improved without degrading CPU performance, which further improves the overall performance of the processor and the space utilization of the Cache.
Embodiment Three
Fig. 3 is a flowchart of a method for an I/O device to access a memory according to a third embodiment of the present invention. On the basis of the first or second embodiment, in this embodiment, after an I/O memory access write request is received: if the request hits the Cache, the I/O data corresponding to the request is written directly into the Cache; if the request misses the Cache, the number of ways occupied by I/O data in the target Cache group accessed by the request is calculated, and whether to process the request in direct Cache access (DCA) mode is determined according to whether that number is less than the number of ways available to I/O data in the Cache. As shown in fig. 3, the method comprises the following specific steps:
Step S301, receiving an I/O memory access write request.
Step S302, judging whether the I/O memory access write request hits the Cache.
If the I/O memory access write request hits the Cache, step S303 is executed.
If the I/O memory access write request misses the Cache, steps S304-S305 are executed.
Step S303, if the I/O memory access write request hits the Cache, directly writing the I/O data corresponding to the request into the Cache.
Step S304, if the I/O memory access write request misses the Cache, calculating the number of ways occupied by I/O data in the target Cache group accessed by the request.
In this embodiment, in order to distinguish I/O data from CPU data, two identification bits are added to the original tag register in the Cache. The extended tag register contains at least the following fields: type, accessed-or-not, and tag. The flag bit indicates the type of the stored data, distinguishing CPU data from I/O data: for example, flag = 1 indicates I/O data and flag = 0 indicates CPU data. The used flag bit indicates whether the stored data has been accessed: for example, used = 1 means the stored data has been accessed and used = 0 means it has not. The tag field records the tag corresponding to the stored data.
In this step, the number of ways occupied by I/O data in the target Cache group accessed by the I/O memory access write request may be obtained by counting the Cache lines in the group whose tag registers have flag = 1.
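The per-set count of I/O-occupied ways can be sketched directly from the flag bit described above. The `TagEntry` layout is illustrative; a real tag register would of course be hardware, not a Python object.

```python
from dataclasses import dataclass

@dataclass
class TagEntry:
    """Illustrative model of the extended tag register of one way."""
    flag: int   # 1 = I/O data, 0 = CPU data
    used: int   # 1 = the stored data has been accessed
    tag: int    # tag of the stored data

def io_ways_occupied(cache_set):
    """Step S304: count the ways in the set whose tag register marks I/O data."""
    return sum(1 for entry in cache_set if entry.flag == 1)
```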
Step S305, determining whether to adopt a direct Cache access DCA mode to carry out access processing on the I/O access and write request according to whether the number of paths occupied by the I/O data in the target Cache group accessed by the I/O access and write request is less than the number of paths available for the I/O data in the Cache.
In this embodiment, whether to perform access processing on an I/O access write request in a DCA manner is determined according to whether the number of ways occupied by I/O data in a target Cache group accessed by the I/O access write request is less than the number of ways available for I/O data in the Cache, which specifically includes the following conditions:
(1) If the number of ways occupied by I/O data in the target Cache group accessed by the I/O memory access write request is less than the number of ways available to I/O data in the Cache, the request is processed in DCA mode.
(2) If the position to be replaced stores I/O data that has already been accessed, the request is also processed in DCA mode; the position to be replaced is the position of the next Cache line to be replaced according to the replacement policy in use.
"The position to be replaced stores accessed I/O data" means that the position to be replaced stores I/O data and that this I/O data has already been accessed.
Specifically, this can be determined from the tag register corresponding to the position to be replaced: if both its flag bit and its used flag bit are 1, the position to be replaced stores accessed I/O data.
(3) If the number of ways occupied by I/O data in the target Cache group accessed by the I/O memory access write request is greater than or equal to the number of ways available to I/O data in the Cache, and the position to be replaced does not store accessed I/O data, the request is processed in DMA mode.
If the data stored in the position to be replaced is not I/O data, or is I/O data that has not been accessed, the position to be replaced is deemed not to store accessed I/O data.
Specifically, this can be determined from the tag register corresponding to the position to be replaced: a flag bit of 0 means the position does not store I/O data, and a used flag bit of 0 means the stored I/O data has not been accessed; as long as either the flag bit or the used flag bit is 0, the position to be replaced does not store accessed I/O data.
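Cases (1)-(3) above reduce to a small decision function. This is a hedged sketch with invented names; `victim` stands for the flag/used bits of the tag register at the position to be replaced.

```python
from collections import namedtuple

# Illustrative view of the victim line's tag-register identification bits.
Victim = namedtuple("Victim", "flag used")  # flag: 1 = I/O data; used: 1 = accessed

def use_dca(io_ways_in_set, io_ways_available, victim):
    """Decide DCA vs. DMA for an I/O write that misses the Cache.
    Returns True for DCA, False for DMA."""
    if io_ways_in_set < io_ways_available:
        return True                           # case (1): room left for I/O data
    if victim.flag == 1 and victim.used == 1:
        return True                           # case (2): victim holds consumed I/O data
    return False                              # case (3): fall back to DMA
```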
In the embodiment of the present invention, after an I/O memory access write request is received, the corresponding I/O data is written directly into the Cache when the request hits the Cache. When the request misses the Cache, whether to process it in direct Cache access (DCA) mode is determined according to whether the number of ways occupied by I/O data in the target Cache group is less than the number of ways available to I/O data in the Cache, thereby improving the performance of the processor.
Embodiment Four
On the basis of the third embodiment, in this embodiment, during the processing of I/O memory access write requests in DCA mode, whether the Cache replacement policy uses the LRU replacement policy or a second replacement policy is determined according to statistics on the number of Cache lines that are replaced without a second access and the number of Cache lines that receive a second access before being replaced.
The second replacement policy may be a Most Recently Used (MRU) replacement policy.
Preferably, since the target addresses of I/O write requests in DCA mode are consecutive and each piece of I/O data is read and used only once by the CPU, the second replacement policy is: once the I/O data in a Cache line has been accessed, that Cache line becomes the preferred replacement position, and target data subsequently written into the Cache is preferentially written there.
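Victim selection under this second replacement policy can be sketched as follows, assuming per-way (is_io, used) bits and an LRU order as the fallback. The names are illustrative, not from the patent.

```python
def pick_victim(cache_set, lru_order):
    """Second replacement policy sketched above: a line whose I/O data has
    already been consumed is the preferred replacement position; otherwise
    fall back to the LRU victim.
    cache_set maps way -> (is_io, used); lru_order lists ways, LRU first."""
    for way, (is_io, used) in cache_set.items():
        if is_io and used:
            return way          # consumed I/O line: replace it first
    return lru_order[0]         # otherwise the least recently used way
```

This exploits the producer-consumer property: once the CPU has read a DCA line, it is dead data and the best candidate for reuse.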
In this embodiment, since the behavior of the I/O data is initially unknown, the LRU replacement policy is adopted as the initial Cache replacement policy. A first difference is counted between the number of Cache lines replaced without a second access and the number of Cache lines that receive a second access before being replaced. When the first difference is equal to or greater than a first preset threshold, and the number of Cache lines replaced without a second access is greater than the number of Cache lines with a second access before replacement, the Cache replacement policy is updated to the second replacement policy.
Alternatively, an N-bit saturation counter may be used, initialized to all 0s. When a Cache line holding I/O data is replaced without the I/O data ever receiving a second access, the saturation counter is incremented by 1; when a Cache line holding I/O data receives a second access, the counter is decremented by 1. When the counter reaches all 1s, the Cache replacement policy for I/O data is changed to the second replacement policy.
With the counter initialized to all 0s, its saturation value (all 1s) plays the role of the first preset threshold. The first preset threshold may be set by a technician according to actual needs and is not specifically limited in this embodiment.
Further, using the tag register extended in the third embodiment, the used flag bit is widened to 2 bits: the low bit indicates whether the data has been accessed, and the high bit indicates whether a second access has occurred (it is set to 1 on a second access). During counting, when a Cache line holding I/O data (flag bit 1 in its tag register) is replaced while the high bit of its used field is still 0, the I/O data was never accessed a second time before replacement, and the saturation counter is incremented by 1. When such a Cache line is accessed while its used field has high bit 0 and low bit 1, the data has been accessed once before, so this is its second access: the used field becomes 11 and the saturation counter is decremented by 1.
Optionally, the number of Cache lines replaced without a second access (a first number) and the number of Cache lines that receive a second access before being replaced (a second number) may be counted separately; the first difference between the two counts is obtained in real time and compared with the first preset threshold. When the first difference is equal to or greater than the first preset threshold and the first number is greater than the second number, the Cache replacement policy is updated to the second replacement policy.
In this embodiment, in order to switch back from the second replacement policy to the LRU replacement policy, an auxiliary tag directory is preset. The auxiliary tag directory always uses the LRU replacement policy as its Cache replacement policy, and its other field definitions are identical to the tags in the Cache.
After updating the Cache replacement policy to the second replacement policy, the method further includes:
after the Cache replacement policy is updated to the second replacement policy, tracking the auxiliary tag directory and counting, during tracking, a second difference between the number of Cache line tags replaced without a second access and the number of Cache line tags that receive a second access before being replaced; and when the second difference reaches a second preset threshold and the number of Cache line tags replaced without a second access is smaller than the number with a second access, updating the Cache replacement policy back to the LRU replacement policy.
The second difference may be counted in the same manner as the first difference, which is not repeated here.
For example, the auxiliary tag directory may be tracked with the same kind of saturation counter: when the Cache replacement policy is updated to the second replacement policy, the counter holds all 1s; it is then incremented and decremented under the same conditions as for the first difference, and when it reaches all 0s the Cache replacement policy is changed back to the LRU replacement policy.
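The two-way switching driven by the saturation counter can be sketched as a small state machine. This is a hedged model under the reading above (counter starts at 0 under LRU, flips to the second policy at saturation, and flips back at 0 while the auxiliary tag directory is being tracked); class and method names are invented.

```python
class PolicySwitcher:
    """N-bit saturation counter driving the I/O Cache replacement policy."""

    def __init__(self, n_bits=4):
        self.max = (1 << n_bits) - 1   # all-1s value (the preset threshold)
        self.count = 0                 # initialized to all 0s under LRU
        self.policy = "LRU"

    def on_replaced_without_second_access(self):
        """A line holding I/O data was replaced with no second access."""
        self.count = min(self.count + 1, self.max)
        if self.policy == "LRU" and self.count == self.max:
            self.policy = "SECOND"     # saturated: switch to the second policy

    def on_second_access(self):
        """A line holding I/O data received its second access."""
        self.count = max(self.count - 1, 0)
        if self.policy == "SECOND" and self.count == 0:
            self.policy = "LRU"        # aux-tag tracking says LRU wins again
```

Under the second policy the increments and decrements come from the auxiliary tag directory, which keeps emulating LRU behavior so the comparison remains meaningful.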
In order to reduce hardware overhead, the auxiliary tag directory also uses sampling: it records the indexes of the sampling groups and the tags of the Cache lines in those groups. A limited number of sampling groups is sufficient to achieve high accuracy.
In addition, a memory address is generally organized as a three-dimensional structure of bank address, row address, and column address. Owing to the physical characteristics of memory, consecutive accesses to the same row are the most efficient, which greatly reduces memory row conflicts; externally this appears as higher read/write bandwidth and lower latency for consecutive addresses. When high-speed I/O accesses memory in DMA mode, the continuity of I/O data exploits this advantage naturally; when DCA mode is used, however, analysis and optimization are needed to retain it.
Write policies are generally divided into write-through and write-back: write-through writes the memory at the same time as the Cache, while write-back writes the memory only when the Cache line is replaced. To avoid frequent memory reads and writes, most modern microprocessors use write-back, which suits CPU data that is reused even when its spatial locality is weak. If the same policy is applied to I/O data, however, the benefit of the DCA approach can be greatly reduced.
Because the addresses of I/O write requests in DCA mode are consecutive, while each piece of I/O data is read and used only once by the CPU, the filtering effect of the Cache means that I/O data at consecutive addresses, spread across different groups, is replaced at different times. A single write of consecutive addresses is thus split into several writes of different addresses, written back to memory at different times, causing severe memory row conflicts. If write-through is used instead, an I/O write request in DCA mode writes the memory at the same time as the Cache; since the data will not be written again, it can simply be dropped from the Cache on replacement without a write-back, which greatly improves memory access efficiency.
In this embodiment, optionally, when memory access is processed in DCA mode, I/O memory access write requests are handled in write-back mode while the LRU replacement policy is in use, and in write-through mode while the second replacement policy is in use.
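The pairing above amounts to a one-line mapping from the active replacement policy to the write mode. The string labels are illustrative.

```python
def write_mode(replacement_policy):
    """Pairing sketched above: write-back with LRU; write-through with the
    second replacement policy, so consumed DCA lines need no write-back."""
    return "write-back" if replacement_policy == "LRU" else "write-through"
```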
In the embodiment of the present invention, during the processing of I/O memory access write requests in DCA mode, whether the Cache replacement policy uses the LRU replacement policy or another policy is determined from statistics on the number of Cache lines replaced without a second access and the number of Cache lines that receive a second access before being replaced, achieving dynamic switching of the Cache replacement policy. Different memory write modes are used for the different replacement policies, which greatly improves memory access efficiency and the overall performance of the processor.
Embodiment Five
Fig. 4 is a schematic structural diagram of an apparatus for accessing a memory by an I/O device according to a fifth embodiment of the present invention. The device for accessing the memory by the I/O equipment provided by the embodiment of the invention can execute the processing flow provided by the method for accessing the memory by the I/O equipment. As shown in fig. 4, the apparatus 40 includes: a first calculation module 401, an update module 402 and a memory access processing module 403.
Specifically, the first computing module 401 is configured to receive an I/O access write request, and compute a maximum number of hit ways of the CPU access request in a time period from when I/O data corresponding to the I/O access write request is written into the Cache to when the I/O data is read.
The updating module 402 is configured to update the available number of I/O data in the Cache according to the maximum number of hit ways, so that the updated available number of I/O data is equal to a difference between the total number of ways of the Cache and the maximum number of hit ways.
The memory access processing module 403 is configured to perform I/O memory access processing according to the number of available paths of I/O data in the Cache.
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the first embodiment, and specific functions are not described herein again.
In the embodiment of the present invention, the maximum number of hit ways of CPU memory access requests is computed over the period from the moment the I/O data corresponding to an I/O memory access write request is written into the Cache until that data is read, and the number of ways available to I/O data in the Cache is updated accordingly, so that the updated number of available ways equals the difference between the total number of Cache ways and the maximum number of hit ways; I/O memory access is then processed according to that number of available ways. The number of ways available to I/O data is thus updated dynamically, in real time, according to the CPU's use of the Cache, and the space occupied by I/O data in the Cache is adjusted dynamically. The number of ways available to I/O data can be increased without significantly affecting other CPU threads, so I/O memory access performance is improved without degrading CPU performance, which further improves the overall performance of the processor and the space utilization of the Cache.
Embodiment Six
On the basis of the fifth embodiment, in this embodiment, the first calculating module includes: the device comprises a receiving submodule, a first determining submodule, a second determining submodule, a first recording submodule, a third determining submodule, a first judging submodule, a comparing submodule, a second recording submodule and a fourth determining submodule.
The receiving submodule is used for receiving an access request, the access request carries a target address requested to be accessed, and the target address at least comprises an index of a Cache group and a label of a Cache line.
The first determining submodule is used for determining whether a target Cache group accessed by the memory access request belongs to a sampling group or not according to the index carried by the memory access request.
And the second determining submodule is used for determining whether a monitoring register corresponding to the target Cache group is in an unused state or not if the target Cache group accessed by the access request belongs to the sampling group and the access request is an I/O access write request.
The first recording submodule is used for recording a label carried by the I/O access and write request in a monitoring register corresponding to the target Cache group and marking the monitoring register corresponding to the target Cache group as an in-use state if the monitoring register corresponding to the target Cache group is in an unused state.
And the third determining sub-module is used for determining whether a monitoring register corresponding to the target Cache group is in an in-use state or not if the target Cache group accessed by the access request belongs to the sampling group and the access request is a CPU access request.
And the first judgment sub-module is used for judging whether the CPU access request hits a Cache line in the target Cache group or not if the monitoring register corresponding to the target Cache group is in the in-use state.
And the comparison sub-module is used for comparing the number of the paths corresponding to the currently hit Cache line with the number of the hit paths recorded in the monitoring register corresponding to the target Cache group if the judgment result is that the CPU access request hits the Cache line in the target Cache group.
And the second recording submodule is used for updating the number of the hit ways recorded in the monitoring register corresponding to the target Cache group into the number of the ways corresponding to the currently hit Cache line if the number of the ways corresponding to the currently hit Cache line is greater than the number of the hit ways recorded in the monitoring register corresponding to the target Cache group.
And the fourth determining submodule is used for determining the number of the hit ways recorded in the monitoring register corresponding to the target Cache group as the maximum number of the hit ways when the CPU access request hits the Cache line corresponding to the label recorded in the monitoring register corresponding to the target Cache group if the judgment result is that the CPU access request does not hit the Cache line in the target Cache group.
Optionally, the first computation module further includes a reset submodule.
The reset sub-module is used for setting the number of hit ways in the monitoring registers corresponding to the target Cache group to 0, and recording the monitoring registers corresponding to the target Cache group as an unused state.
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the second embodiment, and specific functions are not described herein again.
In the embodiment of the present invention, sampling groups are set in advance, a monitoring register is provided for each sampling group, and only the access behavior of the sampling groups is tracked and recorded; this sampling approach saves hardware cost. The maximum number of hit ways of CPU memory access requests, over the period from the moment the I/O data corresponding to an I/O memory access write request is written into the Cache until that data is read, is computed dynamically, so that the space occupied by I/O data in the Cache can be adjusted dynamically and the number of ways available to I/O data can be increased without significantly affecting other CPU threads. I/O memory access performance is thereby improved without degrading CPU performance, which further improves the overall performance of the processor and the space utilization of the Cache.
Embodiment Seven
On the basis of the fifth embodiment or the sixth embodiment, in this embodiment, the apparatus for accessing a memory by an I/O device further includes: the device comprises a judging module, a writing module, a second calculating module and a determining module.
The judging module is used for judging whether the I/O access and write request hits the Cache or not.
And the write-in module is used for directly writing the I/O data corresponding to the I/O access and write request into the Cache if the I/O access and write request hits the Cache.
And the second computing module is used for computing the number of paths occupied by the I/O data in the target Cache group accessed by the I/O access and write request if the I/O access and write request does not hit the Cache.
The determining module is used for determining whether to carry out access processing on the I/O access and write request in a direct Cache access DCA mode according to whether the number of paths occupied by the I/O data in the target Cache group accessed by the I/O access and write request is less than the number of paths available for the I/O data in the Cache.
Optionally, the determining module includes: a first processing sub-module and a second processing sub-module.
The first processing submodule is used for performing access processing on the I/O access and write request in a DCA mode if the number of paths occupied by the I/O data in a target Cache group accessed by the I/O access and write request is less than the number of paths available for the I/O data in the Cache.
The first processing submodule is also used for performing access processing on the I/O access and write request in a DCA mode if the I/O data which is accessed is stored in the position to be replaced, and the position to be replaced refers to the position of the next Cache line to be replaced according to the adopted replacement strategy.
The second processing submodule is configured to process the I/O memory access write request in DMA mode if the number of ways occupied by I/O data in the target Cache group accessed by the request is greater than or equal to the number of ways available to I/O data in the Cache and the position to be replaced does not store accessed I/O data.
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the third embodiment, and specific functions are not described herein again.
In the embodiment of the present invention, after an I/O memory access write request is received, the corresponding I/O data is written directly into the Cache when the request hits the Cache. When the request misses the Cache, whether to process it in direct Cache access (DCA) mode is determined according to whether the number of ways occupied by I/O data in the target Cache group is less than the number of ways available to I/O data in the Cache, thereby improving the performance of the processor.
Embodiment Eight
On the basis of the seventh embodiment, in this embodiment the first processing submodule is further configured to: initially adopt the LRU replacement policy as the Cache replacement policy; count a first difference between the number of Cache lines replaced without a second access and the number of Cache lines that receive a second access before being replaced; and, when the first difference is equal to or greater than a first preset threshold and the number of Cache lines replaced without a second access is greater than the number with a second access, update the Cache replacement policy to a second replacement policy. The second replacement policy is an MRU replacement policy, or: once the I/O data in a Cache line has been accessed, that Cache line becomes the preferred replacement position, and target data subsequently written into the Cache is preferentially written there.
The first processing submodule is further configured to: when the Cache replacement policy is updated to the second replacement policy, obtain a preset auxiliary tag directory, which records the indexes of the sampling groups and the tags of the Cache lines in the sampling groups and always uses the LRU replacement policy; track the auxiliary tag directory and count, during tracking, a second difference between the number of Cache line tags replaced without a second access and the number of Cache line tags that receive a second access before being replaced; and, when the second difference reaches a second preset threshold and the number of Cache line tags replaced without a second access is smaller than the number with a second access, update the Cache replacement policy back to the LRU replacement policy.
Optionally, the first processing sub-module is further configured to: when the LRU replacement policy is adopted, process the I/O write request in a write-back mode; and when the second replacement policy is adopted, process the I/O write request in a write-through mode.
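The pairing of replacement policy and memory-write mode can be expressed as a simple selection; the function and policy names below are assumptions, and the comments state the apparent rationale rather than text from the embodiment.

```python
def write_mode(policy):
    # Under LRU, lines are expected to stay resident and be reused,
    # so write-back avoids redundant memory traffic; under the second
    # policy, lines are evicted soon after one use, so write-through
    # keeps memory current and makes eviction cheap.
    return "write-back" if policy == "LRU" else "write-through"
```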
The apparatus provided in this embodiment of the present invention may specifically be used to execute the method embodiment provided in the fourth embodiment; its specific functions are not described herein again.
In this embodiment of the invention, while I/O write requests are processed in the DCA manner, whether the Cache replacement policy is the LRU (least recently used) replacement policy or another replacement policy is determined from the counted numbers of Cache lines replaced without a second access and of Cache lines that received a second access before being replaced. This realizes dynamic switching of the Cache replacement policy, and a different memory-write mode is adopted for each Cache replacement policy, so that memory access efficiency is greatly improved and the overall performance of the processor is improved.
Embodiment nine
Fig. 5 is a schematic structural diagram of a computer device according to a ninth embodiment of the present invention. As shown in Fig. 5, the device 50 includes: a processor 501, a memory 502, and a computer program stored on the memory 502 and executable by the processor 501.
When executing the computer program stored on the memory 502, the processor 501 implements the method for an I/O device to access memory provided by any of the above method embodiments.
In this embodiment of the invention, the maximum number of hit ways of CPU access requests is calculated over the period from when the I/O data corresponding to an I/O write request is written into the Cache until that I/O data is read, and the number of ways available to I/O data in the Cache is updated so that it equals the difference between the total number of ways of the Cache and the maximum number of hit ways; I/O memory access is then processed according to the number of ways available to I/O data in the Cache. The number of ways available to I/O data is thus dynamically updated in real time according to the CPU's actual use of the Cache, and the space occupied by I/O data in the Cache is adjusted dynamically. The number of ways available to I/O data can be increased without significantly affecting other CPU threads, so I/O access performance is improved without degrading CPU performance, the overall performance of the processor is further improved, and the space utilization of the Cache is increased.
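The mechanism summarized above can be sketched as a per-sampling-group monitor: between the write of an I/O line into the Cache and the read of that line, it records the largest way position hit by CPU access requests, and the ways available to I/O data become the total ways minus that maximum. The class below is a simplified single-register software model under assumed names; it is not the embodiment's hardware description, and the 1-based way counting is an interpretation of "number of hit ways".

```python
class WayMonitor:
    """Simplified model of one monitoring register for a sampling group.

    Tracks one I/O line from its write into the Cache until its read,
    recording the highest way position hit by CPU requests in between.
    All names and the 1-based way count are illustrative assumptions.
    """
    def __init__(self, total_ways=8):
        self.total_ways = total_ways
        self.in_use = False
        self.io_tag = None
        self.max_hit_way = 0        # "number of hit ways" recorded so far

    def on_io_write(self, tag):
        if not self.in_use:         # only track when the monitor is free
            self.io_tag = tag
            self.in_use = True
            self.max_hit_way = 0

    def on_cpu_hit(self, way):
        # Record the largest way position (counted from 1) hit by the CPU.
        if self.in_use and way + 1 > self.max_hit_way:
            self.max_hit_way = way + 1

    def on_io_read(self, tag):
        """On reading the tracked I/O line, return the updated number of
        ways available to I/O data: total ways minus the maximum hit ways."""
        if self.in_use and tag == self.io_tag:
            available = self.total_ways - self.max_hit_way
            self.in_use = False     # reset the monitor to the unused state
            self.max_hit_way = 0
            return available
        return None
```

For example, if CPU requests hit at most way position 6 of an 8-way Cache while an I/O line is resident, I/O data would afterwards be allowed to occupy 2 ways.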
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (19)

1. A method for an I/O device to access memory, comprising:
receiving an I/O write request, and calculating the maximum number of hit ways of CPU access requests in the period from when the I/O data corresponding to the I/O write request is written into a Cache memory until the I/O data is read, wherein the Cache memory adopts a multi-way set-associative organization structure;
updating the number of ways available to the I/O data in the Cache according to the maximum number of hit ways, so that the updated number of available ways equals the difference between the total number of ways of the Cache and the maximum number of hit ways;
and performing I/O memory access processing according to the number of ways available to the I/O data in the Cache.
2. The method of claim 1, wherein receiving an I/O write request, and calculating the maximum number of hit ways of CPU access requests by tracking hits of CPU access requests in the Cache memory during the period from when the I/O data corresponding to the I/O write request is written into the Cache memory until the I/O data is read, comprises:
receiving a memory access request, wherein the memory access request carries a target address to be accessed, and the target address comprises at least an index of a Cache group and a tag of a Cache line;
determining, according to the index carried by the memory access request, whether the target Cache group accessed by the memory access request belongs to a sampling group, wherein the sampling group comprises at least one Cache group;
if the target Cache group accessed by the memory access request belongs to a sampling group and the memory access request is an I/O write request, determining whether the monitoring register corresponding to the target Cache group is in an unused state;
and if the monitoring register corresponding to the target Cache group is in an unused state, recording the tag carried by the I/O write request in the monitoring register corresponding to the target Cache group, and marking the monitoring register corresponding to the target Cache group as being in use.
3. The method as claimed in claim 2, wherein after determining, according to the index carried by the memory access request, whether the target Cache group accessed by the memory access request belongs to a sampling group, the method further comprises:
if the target Cache group accessed by the memory access request belongs to a sampling group and the memory access request is a CPU access request, determining whether the monitoring register corresponding to the target Cache group is in use;
if the monitoring register corresponding to the target Cache group is in use, judging whether the CPU access request hits a Cache line in the target Cache group;
if the CPU access request hits a Cache line in the target Cache group, comparing the way number of the currently hit Cache line with the hit way number recorded in the monitoring register corresponding to the target Cache group;
if the way number of the currently hit Cache line is greater than the hit way number recorded in the monitoring register corresponding to the target Cache group, updating the recorded hit way number to the way number of the currently hit Cache line;
and if the CPU access request does not hit a Cache line in the target Cache group, taking the hit way number recorded in the monitoring register corresponding to the target Cache group as the maximum number of hit ways when the CPU access request hits the Cache line corresponding to the tag recorded in that monitoring register.
4. The method according to claim 3, further comprising, after taking the hit way number recorded in the monitoring register corresponding to the target Cache group as the maximum number of hit ways:
setting the hit way number in the monitoring register corresponding to the target Cache group to 0;
and marking the monitoring register corresponding to the target Cache group as unused.
5. The method of claim 1, further comprising, after receiving the I/O write request:
judging whether the I/O write request hits the Cache;
if the I/O write request hits the Cache, directly writing the I/O data corresponding to the I/O write request into the Cache;
if the I/O write request misses the Cache, calculating the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request;
and determining whether to process the I/O write request in a Direct Cache Access (DCA) manner according to whether the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is less than the number of ways available to I/O data in the Cache.
6. The method as claimed in claim 5, wherein determining whether to process the I/O write request in the Direct Cache Access (DCA) manner according to whether the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is less than the number of ways available to I/O data in the Cache comprises:
if the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is less than the number of ways available to I/O data in the Cache, processing the I/O write request in the DCA manner;
if already-accessed I/O data is stored in the position to be replaced, processing the I/O write request in the DCA manner, wherein the position to be replaced refers to the position of the Cache line to be replaced next according to the adopted replacement policy;
and if the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is greater than or equal to the number of ways available to I/O data in the Cache, or not-yet-accessed I/O data is stored in the position to be replaced, processing the I/O write request in a DMA manner.
7. The method as claimed in claim 5, further comprising, when the I/O write request is processed in the DCA manner:
initially adopting an LRU replacement policy as the Cache replacement policy;
counting a first difference between the number of Cache lines replaced without a second access and the number of Cache lines that received a second access before being replaced;
when the first difference is equal to or greater than a first preset threshold, updating the Cache replacement policy to a second replacement policy if the number of Cache lines replaced without a second access is greater than the number of Cache lines that received a second access before being replaced;
wherein the second replacement policy is an MRU replacement policy, or the second replacement policy is: after the I/O data in a Cache line has been accessed once, taking that Cache line as a priority replacement position, and when target data is subsequently written into the Cache, preferentially writing the target data into the priority replacement position.
8. The method according to claim 7, further comprising, after the Cache replacement policy is updated to the second replacement policy:
acquiring a preset auxiliary tag directory, wherein the auxiliary tag directory records the indexes of the sampling groups and the tags of the Cache lines in the sampling groups, and the replacement policy corresponding to the auxiliary tag directory is always the LRU replacement policy;
tracking the auxiliary tag directory, and counting, during the tracking, a second difference between the number of Cache line tags replaced without a second access and the number of Cache line tags that received a second access before being replaced;
and when the second difference is equal to or greater than a second preset threshold, updating the Cache replacement policy back to the LRU replacement policy if the number of Cache line tags replaced without a second access is smaller than the number of Cache line tags that received a second access before being replaced.
9. The method according to claim 7 or 8, further comprising, when the I/O write request is processed in the DCA manner:
when the LRU replacement policy is adopted, processing the I/O write request in a write-back mode;
and when the second replacement policy is adopted, processing the I/O write request in a write-through mode.
10. An apparatus for accessing memory by an I/O device, comprising:
a first calculation module, configured to receive an I/O write request and calculate the maximum number of hit ways of CPU access requests in the period from when the I/O data corresponding to the I/O write request is written into a Cache until the I/O data is read, wherein the Cache adopts a multi-way set-associative organization structure;
an updating module, configured to update the number of ways available to the I/O data in the Cache according to the maximum number of hit ways, so that the updated number of available ways equals the difference between the total number of ways of the Cache and the maximum number of hit ways;
and a memory access processing module, configured to perform I/O memory access processing according to the number of ways available to the I/O data in the Cache.
11. The apparatus of claim 10, wherein the first calculation module comprises:
a receiving sub-module, configured to receive a memory access request, wherein the memory access request carries a target address to be accessed, and the target address comprises at least an index of a Cache group and a tag of a Cache line;
a first determining sub-module, configured to determine, according to the index carried by the memory access request, whether the target Cache group accessed by the memory access request belongs to a sampling group, wherein the sampling group comprises at least one Cache group;
a second determining sub-module, configured to determine whether the monitoring register corresponding to the target Cache group is in an unused state if the target Cache group accessed by the memory access request belongs to a sampling group and the memory access request is an I/O write request;
and a first recording sub-module, configured to, if the monitoring register corresponding to the target Cache group is in an unused state, record the tag carried by the I/O write request in the monitoring register corresponding to the target Cache group and mark that monitoring register as being in use.
12. The apparatus of claim 11, wherein the first calculation module further comprises:
a third determining sub-module, configured to determine whether the monitoring register corresponding to the target Cache group is in use if the target Cache group accessed by the memory access request belongs to a sampling group and the memory access request is a CPU access request;
a first judging sub-module, configured to judge whether the CPU access request hits a Cache line in the target Cache group if the monitoring register corresponding to the target Cache group is in use;
a comparison sub-module, configured to compare the way number of the currently hit Cache line with the hit way number recorded in the monitoring register corresponding to the target Cache group if the CPU access request hits a Cache line in the target Cache group;
a second recording sub-module, configured to update the hit way number recorded in the monitoring register corresponding to the target Cache group to the way number of the currently hit Cache line if the way number of the currently hit Cache line is greater than the recorded hit way number;
and a fourth determining sub-module, configured to, if the CPU access request does not hit a Cache line in the target Cache group, take the hit way number recorded in the monitoring register corresponding to the target Cache group as the maximum number of hit ways when the CPU access request hits the Cache line corresponding to the tag recorded in that monitoring register.
13. The apparatus of claim 12, wherein the first calculation module further comprises:
a reset sub-module, configured to set the hit way number in the monitoring register corresponding to the target Cache group to 0 and mark the monitoring register corresponding to the target Cache group as unused.
14. The apparatus of claim 10, further comprising:
a judging module, configured to judge whether the I/O write request hits the Cache;
a writing module, configured to directly write the I/O data corresponding to the I/O write request into the Cache if the I/O write request hits the Cache;
a second calculation module, configured to calculate the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request if the I/O write request misses the Cache;
and a determining module, configured to determine whether to process the I/O write request in a Direct Cache Access (DCA) manner according to whether the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is less than the number of ways available to I/O data in the Cache.
15. The apparatus of claim 14, wherein the determining module comprises:
a first processing sub-module, configured to process the I/O write request in the DCA manner if the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is less than the number of ways available to I/O data in the Cache;
the first processing sub-module being further configured to process the I/O write request in the DCA manner if already-accessed I/O data is stored in the position to be replaced, wherein the position to be replaced refers to the position of the Cache line to be replaced next according to the adopted replacement policy;
and a second processing sub-module, configured to process the I/O write request in a DMA manner if the number of ways occupied by I/O data in the target Cache group accessed by the I/O write request is greater than or equal to the number of ways available to I/O data in the Cache, or not-yet-accessed I/O data is stored in the position to be replaced.
16. The apparatus of claim 15, wherein the first processing sub-module is further configured to:
initially adopt an LRU replacement policy as the Cache replacement policy;
count a first difference between the number of Cache lines replaced without a second access and the number of Cache lines that received a second access before being replaced;
when the first difference is equal to or greater than a first preset threshold, update the Cache replacement policy to a second replacement policy if the number of Cache lines replaced without a second access is greater than the number of Cache lines that received a second access before being replaced;
wherein the second replacement policy is an MRU replacement policy, or the second replacement policy is: after the I/O data in a Cache line has been accessed once, taking that Cache line as a priority replacement position, and when target data is subsequently written into the Cache, preferentially writing the target data into the priority replacement position.
17. The apparatus of claim 16, wherein the first processing sub-module is further configured to:
when the Cache replacement policy is updated to the second replacement policy, acquire a preset auxiliary tag directory, wherein the auxiliary tag directory records the indexes of the sampling groups and the tags of the Cache lines in the sampling groups, and the replacement policy corresponding to the auxiliary tag directory is always the LRU replacement policy;
track the auxiliary tag directory, and count, during the tracking, a second difference between the number of Cache line tags replaced without a second access and the number of Cache line tags that received a second access before being replaced;
and when the second difference is equal to or greater than a second preset threshold, update the Cache replacement policy back to the LRU replacement policy if the number of Cache line tags replaced without a second access is smaller than the number of Cache line tags that received a second access before being replaced.
18. The apparatus of claim 16 or 17, wherein the first processing sub-module is further configured to:
when the LRU replacement policy is adopted, process the I/O write request in a write-back mode;
and when the second replacement policy is adopted, process the I/O write request in a write-through mode.
19. A computer device, comprising: a processor, a memory,
and a computer program stored on the memory and executable by the processor;
wherein the processor, when executing the computer program, implements the method for accessing memory by an I/O device of any of claims 1-9.
CN201810240206.XA 2018-03-22 2018-03-22 Method, device and equipment for accessing memory by I/O equipment Active CN110297787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810240206.XA CN110297787B (en) 2018-03-22 2018-03-22 Method, device and equipment for accessing memory by I/O equipment

Publications (2)

Publication Number Publication Date
CN110297787A CN110297787A (en) 2019-10-01
CN110297787B true CN110297787B (en) 2021-06-01

Family

ID=68025548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810240206.XA Active CN110297787B (en) 2018-03-22 2018-03-22 Method, device and equipment for accessing memory by I/O equipment

Country Status (1)

Country Link
CN (1) CN110297787B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11210234B2 (en) 2019-10-31 2021-12-28 Advanced Micro Devices, Inc. Cache access measurement deskew
CN111026682B (en) * 2019-12-26 2022-03-08 浪潮(北京)电子信息产业有限公司 Data access method and device of board card chip and computer readable storage medium
CN112069091B (en) * 2020-08-17 2023-09-01 北京科技大学 Memory access optimization method and device applied to molecular dynamics simulation software
CN112181864B (en) * 2020-10-23 2023-07-25 中山大学 Address tag allocation scheduling and multipath cache write-back method for Path ORAM
CN113392043A (en) * 2021-07-06 2021-09-14 南京英锐创电子科技有限公司 Cache data replacement method, device, equipment and storage medium
CN114115746A (en) * 2021-12-02 2022-03-01 北京乐讯科技有限公司 Full link tracking device of user mode storage system

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2000013091A1 (en) * 1998-08-28 2000-03-09 Alacritech, Inc. Intelligent network interface device and system for accelerating communication
US6353877B1 (en) * 1996-11-12 2002-03-05 Compaq Computer Corporation Performance optimization and system bus duty cycle reduction by I/O bridge partial cache line write
CN102298556A (en) * 2011-08-26 2011-12-28 成都市华为赛门铁克科技有限公司 Data stream recognition method and device
CN104756090A (en) * 2012-11-27 2015-07-01 英特尔公司 Providing extended cache replacement state information
CN104781753A (en) * 2012-12-14 2015-07-15 英特尔公司 Power gating a portion of a cache memory
CN107368433A (en) * 2011-12-20 2017-11-21 英特尔公司 The dynamic part power-off of memory side cache in 2 grades of hierarchy of memory

Non-Patent Citations (1)

Title
Tang Yixuan, "Research on Cache Optimization Strategies and Parallel Simulation for Multi-threaded Applications", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, 2013-01-31, full text *


Similar Documents

Publication Publication Date Title
CN110297787B (en) Method, device and equipment for accessing memory by I/O equipment
US9122631B2 (en) Buffer management strategies for flash-based storage systems
US10579531B2 (en) Multi-line data prefetching using dynamic prefetch depth
US20010014931A1 (en) Cache management for a multi-threaded processor
US10642709B2 (en) Processor cache tracing
CN113641596B (en) Cache management method, cache management device and processor
US7702875B1 (en) System and method for memory compression
CN115495394A (en) Data prefetching method and data prefetching device
US7356650B1 (en) Cache apparatus and method for accesses lacking locality
CN115809028A (en) Cache data replacement method and device, graphic processing system and electronic equipment
CN111488293B (en) Access method and equipment for data visitor directory in multi-core system
CN109478164A (en) For storing the system and method for being used for the requested information of cache entries transmission
CN111414321B (en) Cache protection method and device based on dynamic mapping mechanism
CN111736900B (en) Parallel double-channel cache design method and device
CN117609314A (en) Cache data processing method, cache controller, chip and electronic equipment
US7237084B2 (en) Method and program product for avoiding cache congestion by offsetting addresses while allocating memory
US20100257319A1 (en) Cache system, method of controlling cache system, and information processing apparatus
US7797492B2 (en) Method and apparatus for dedicating cache entries to certain streams for performance optimization
US20150378935A1 (en) Storage table replacement method
US20090157968A1 (en) Cache Memory with Extended Set-associativity of Partner Sets
US20030033483A1 (en) Cache architecture to reduce leakage power consumption
CN112579482B (en) Advanced accurate updating device and method for non-blocking Cache replacement information table
US7603522B1 (en) Blocking aggressive neighbors in a cache subsystem
CN118295936B (en) Management method and device of cache replacement policy and electronic equipment
US8312221B2 (en) Cache system, cache system control method, and information processing apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Applicant after: Loongson Zhongke Technology Co.,Ltd.

Address before: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Applicant before: LOONGSON TECHNOLOGY Corp.,Ltd.

GR01 Patent grant