CN117093132A - Data processing method, device, processor and computer system - Google Patents
- Publication number
- CN117093132A CN117093132A CN202210514855.0A CN202210514855A CN117093132A CN 117093132 A CN117093132 A CN 117093132A CN 202210514855 A CN202210514855 A CN 202210514855A CN 117093132 A CN117093132 A CN 117093132A
- Authority
- CN
- China
- Prior art keywords
- memory
- memory space
- address
- processor core
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
Abstract
A data processing method, an apparatus, a processor and a computer system are disclosed, relating to the field of computers. The method comprises the following steps: a first processor core obtains a first memory access request, where the first memory access request instructs the first processor core to perform data processing on data stored in a first memory space; when the first memory space is a memory space of a memory associated with a first attraction domain of the first processor core, the first processor core determines an address of the first memory space and performs the data processing of the first memory access request according to the address of the first memory space. In this way, access conflicts that arise when multiple processor cores access memory in parallel are avoided, the overall memory-access duration is reduced, and memory-access efficiency is improved.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a data processing method, apparatus, processor and computer system.
Background
Currently, a computer system includes conventional memory (e.g., dynamic random-access memory (DRAM)) and extended memory (e.g., storage-class memory (SCM) or a solid-state drive (SSD)), where the conventional memory and the extended memory together serve as the main memory of the computer system, expanding the storage capacity of the conventional memory. Further, a processor (central processing unit, CPU) in the computer system accesses the memory using virtual memory technology; that is, the processor accesses the memory according to the physical address corresponding to a virtual address. A single CPU may include multiple processor cores (cores) that process different memory access requests respectively. When a processor core performs the data processing of an access request on an address of a memory space according to a memory access request, it also operates on metadata in the memory that describes attributes of the data in the memory. Multiple processor cores can access the memory space indicated by the physical address associated with the same virtual address, and can therefore operate on the same metadata. Consequently, the metadata is globally shared in the memory, and multiple processor cores must not operate on the same metadata at the same time, so as to avoid inconsistency in the memory accessed by the multiple processor cores.
When a first processor core of the plurality of processor cores performs the data processing of an access request on an address of a memory space according to a memory access request, the processor cores other than the first processor core must wait for the first processor core to finish its memory access before they can perform the data processing of their own access requests. The operations of the access requests may therefore generate access conflicts, resulting in long memory-access durations and ultimately in degraded processor performance. How to provide a more efficient memory-access method is thus a technical problem to be solved.
Disclosure of Invention
The present application provides a data processing method, a data processing apparatus, a processor and a computer system, to improve the memory-access efficiency of the processor.
In a first aspect, a data processing method is provided for use in a computer system including a processor and a memory, and is performed by a first processor core of the processor. The data processing method comprises the following steps: the first processor core obtains a first memory access request, where the first memory access request instructs the first processor core to perform data processing on data stored in a first memory space; when the first memory space is a memory space of the memory associated with a first attraction domain (Attraction Zone) of the first processor core, the first processor core determines an address of the first memory space and performs the data processing of the first memory access request according to the address of the first memory space.
Thus, when the first memory space to be accessed, as indicated by the first memory access request, is a memory space of the memory associated with the first attraction domain of the first processor core, the first processor core performs the data processing of the first memory access request on the address of the first memory space. Associating the first processor core with the first memory space through the attraction domain means that processor cores other than the first processor core cannot perform data processing on the address of the first memory space. Different processor cores therefore perform data processing on the addresses of different memory spaces, and operate on different metadata when doing so. The metadata operations of the processor cores do not affect one another, which is equivalent to dividing the metadata in the memory into multiple mutually isolated parts according to the attraction domains. Multiple processor cores can operate on the metadata in their respective attraction domains simultaneously, so that when the first processor core performs the data processing of an access request on an address of a memory space according to a memory access request, the other processor cores can operate on metadata outside the first attraction domain to complete their own memory accesses, without waiting for the first processor core to finish its memory access and metadata operations. Access conflicts that arise when multiple processor cores access memory in parallel are thereby avoided, the overall memory-access duration is reduced, and memory-access efficiency is improved.
In one possible implementation, the first attraction domain includes an association relationship between a memory space address and an attraction domain identifier, where the attraction domain identifier indicates the attraction domain to which the association relationship belongs, and the memory space address is the address of the memory space associated with that attraction domain.
In this implementation, the processor core determines whether a memory space is a memory space associated with its own attraction domain according to the attraction domain identifier and the memory space address contained in the association relationship, which ensures that the processor core can efficiently judge the association between a memory space and an attraction domain.
For example, the association relationship is a page table entry, and the attraction domain identifier is set in the reserved bits of the page table entry. Recording the attraction domain identifier in the reserved bits of the page table entry allows the processor to identify the attraction domain identifier without modifying the microarchitecture.
As another example, the attraction domain identifier includes a processor core identifier. The processor core identifier in the association relationship can indicate that the association relationship belongs to the attraction domain of the processor core represented by that identifier.
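The reserved-bit scheme above can be sketched as a small model. This is an illustrative assumption, not the layout mandated by the patent: here a 64-bit page table entry is modeled as an integer whose bits 52–58 are treated as software-reserved and hold the attraction domain identifier (which could, per the second example, be a processor core identifier).

```python
# Hypothetical PTE layout (assumption for illustration): bits 52-58 reserved.
RESERVED_SHIFT = 52
RESERVED_MASK = 0x7F << RESERVED_SHIFT  # 7 reserved bits

def set_attraction_domain(pte: int, domain_id: int) -> int:
    """Store domain_id in the reserved bits of a page table entry,
    leaving all other bits of the entry unchanged."""
    if domain_id > 0x7F:
        raise ValueError("domain id does not fit in the reserved bits")
    return (pte & ~RESERVED_MASK) | (domain_id << RESERVED_SHIFT)

def get_attraction_domain(pte: int) -> int:
    """Read the attraction domain identifier back out of the entry."""
    return (pte & RESERVED_MASK) >> RESERVED_SHIFT
```

Because only otherwise-unused bits are touched, existing address-translation hardware can ignore the identifier entirely, which is why no microarchitectural change is needed.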
In one possible implementation, the first memory access request may be generated by the first processor core when a thread executed by the first processor core needs to read or write data of the first memory space.
In another possible implementation, the first memory access request may be generated by a processor core other than the first processor core and sent to the first processor core, when a thread executed by that other processor core needs to read or write data of the first memory space. After the first processor core performs the data processing of the first memory access request according to the address of the first memory space, it sends the obtained data to the processor core that sent the first memory access request. This prevents a processor core from performing data processing on a memory space that is not associated with its own attraction domain, and strengthens the isolation between memory spaces associated with different attraction domains.
The processor core may also dynamically adjust the association relationships contained in its attraction domain. For example, the first processor core adds an association relationship that does not belong to any attraction domain to the first attraction domain.
The first processor core may add an association relationship that does not belong to any attraction domain to the first attraction domain as follows: the first processor core obtains a second memory access request, where the second memory access request instructs the first processor core to perform data processing on data stored in a second memory space. When the second memory space is not associated with any attraction domain and the cumulative number of times the second memory access request has been obtained is greater than a preset threshold, the first processor core performs the data processing on the data stored in the second memory space, and adds the association relationship of the address of the second memory space to the first attraction domain.
Optionally, the adding, by the first processor core, of the association relationship of the address of the second memory space to the first attraction domain may include: setting the attraction domain identifier of the association relationship containing the address of the second memory space to the attraction domain identifier of the first attraction domain.
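The threshold-based expansion step can be sketched as follows. The class name, counter structure, and the threshold value of 4 are all illustrative assumptions; the patent only specifies a "preset threshold" compared against a cumulative request count.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 4  # illustrative value; the text only says "preset threshold"

class AttractionDomain:
    def __init__(self, domain_id: int):
        self.domain_id = domain_id
        self.addresses = set()               # memory-space addresses owned by this domain
        self.access_counts = defaultdict(int)

    def handle_unowned_access(self, address: int) -> bool:
        """Count accesses to a memory space not associated with any domain;
        once the cumulative count exceeds the threshold, add the address's
        association relationship to this domain. Returns True on promotion."""
        self.access_counts[address] += 1
        if self.access_counts[address] > PROMOTION_THRESHOLD:
            self.addresses.add(address)
            return True
        return False
```

In a real implementation the final `addresses.add(...)` step would correspond to writing this domain's identifier into the reserved bits of the page table entry for that address.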
For another example, the first processor core migrates an association relationship belonging to the first attraction domain into an attraction domain other than the first attraction domain.
The first processor core may migrate an association relationship belonging to the first attraction domain into another attraction domain as follows: the first processor core obtains a third memory access request, where the third memory access request instructs the first processor core to perform data processing on data stored in a third memory space, and the third memory access request is generated by a processor core other than the first processor core and sent to the first processor core. The first processor core adds the association relationship containing the address of the third memory space to the attraction domain of the processor core that has sent the third memory access request the largest number of times.
Optionally, the adding, by the first processor core, of the association relationship containing the address of the third memory space to the attraction domain of the processor core that has sent the third memory access request the largest number of times includes: setting the attraction domain identifier of that association relationship to the attraction domain identifier of the attraction domain of the processor core that has sent the third memory access request the largest number of times.
In this way, the processor core can expand and shrink its attraction domain, so that, according to how frequently it accesses different memory spaces, the memory spaces it is most likely to access are associated with its attraction domain. This reduces the probability that a memory access request obtained by the processor core instructs it to access a memory space that does not belong to its own attraction domain, and improves the utilization of the processor core's processing resources.
As an optional implementation, the association relationships of the first attraction domain may be stored in a storage unit connected to the first processor core, where the storage unit is configured to store and query the association relationships contained in the first attraction domain. The first memory access request includes a virtual address of the first memory space, and the association relationship includes a physical address of the first memory space. The first processor core may determine the address of the first memory space as follows: the first processor core queries the association relationships in the storage unit according to the virtual address of the first memory space to obtain a first association relationship, and determines the physical address of the first memory space according to the first association relationship.
Alternatively, the storage unit may be an address translation unit.
In addition, the first processor core may, at preset intervals, delete the association relationships in the storage unit that do not belong to the first attraction domain.
This reduces the probability that an association relationship queried by the processor core in its connected storage unit is missing or invalid, and reduces the processing overhead of fetching the association relationship from memory when it is not found in the storage unit, thereby improving memory-access performance.
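The per-core storage unit and its periodic cleanup can be sketched as a small model. The class and method names are assumptions for illustration; the real storage unit would be a hardware structure such as the address translation unit mentioned above.

```python
class AssociationStore:
    """Model of a per-core storage unit caching one attraction domain's
    association relationships, keyed by virtual address."""
    def __init__(self, domain_id: int):
        self.domain_id = domain_id
        self.entries = {}  # virtual address -> (physical address, owning domain id)

    def insert(self, vaddr: int, paddr: int, domain_id: int) -> None:
        self.entries[vaddr] = (paddr, domain_id)

    def translate(self, vaddr: int):
        """Return the cached physical address, or None on a miss (the core
        would then fetch the association relationship from memory)."""
        hit = self.entries.get(vaddr)
        return hit[0] if hit else None

    def evict_foreign(self) -> None:
        """Periodic cleanup: drop associations belonging to other domains."""
        self.entries = {v: e for v, e in self.entries.items()
                        if e[1] == self.domain_id}
```

The `evict_foreign` step corresponds to the periodic deletion described above: after cleanup, lookups for this core's own attraction domain keep hitting, while stale foreign entries no longer occupy space.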
It should be noted that when the first processor core determines the physical address of a memory space according to an association relationship, regardless of whether that association relationship belongs to no attraction domain, to the first attraction domain, or to an attraction domain other than the first attraction domain, none of the processor cores needs to stop its threads to update the storage unit, which improves memory-access performance.
In a second aspect, a data processing apparatus is provided, the apparatus comprising means for performing the data processing method of the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, a processor is provided, where a processor core of the processor is configured to perform the operational steps of the data processing method of the first aspect or any one of its possible implementations.
In a fourth aspect, a computer system is provided, the computer system comprising a memory and the processor according to the third aspect, where the processor is configured to perform the operational steps of the data processing method according to the first aspect or any one of its possible implementations, to perform data processing of memory access requests on addresses of the memory's storage space.
In a fifth aspect, there is provided a computer readable storage medium comprising: computer software instructions; the computer software instructions, when executed in a computing device, cause the computing device to perform the operational steps of the method as described in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, a computer program product is provided which, when run on a computer system, causes the computer system to perform the operational steps of the method described in the first aspect or any one of its possible implementations.
The implementations provided in the above aspects may be further combined to provide additional implementations of the present application.
Drawings
FIG. 1 is a schematic diagram of a memory system with a multi-layered structure according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a computer system according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a page table entry according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a data processing step according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a further data processing step according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an attraction domain according to an embodiment of the present application;
FIG. 8 is a schematic flow chart of an expanding step of an attraction domain according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating a step of shrinking an attraction domain according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
For easy understanding, related terms and related concepts such as virtual memory related to the embodiments of the present application are described below.
(1) Hierarchical storage
Different types of storage media, such as caches, main memory, and hard disks, may be configured throughout a computer system. During data processing, owing to factors such as the communication protocol, access path, and bandwidth between the CPU and a storage medium, the CPU's access speed to different storage media often differs, so data is usually stored and accessed in a hierarchical manner.
The cache of a processor can be divided into a first-level cache (Level 1 cache), a second-level cache (Level 2 cache) and a third-level cache (Level 3 cache) according to the speed at which the CPU accesses data. A main memory (also simply called memory) may also be configured in the computer system; for example, a random access memory (RAM), a dynamic random access memory (DRAM), a solid-state drive, or a mechanical hard disk (HDD) may be used as the main memory of the computer system.
There are many types of hard disks, each with different performance. For example, an SSD has a higher data access speed than a mechanical hard disk. Data that is frequently accessed or has high performance requirements can be placed on a hard disk with high read-write performance, while data originally stored on the high-performance disk that is infrequently accessed or has low performance requirements is moved to a lower-performance disk.
Fig. 1 is a schematic diagram of a storage system with a multi-layer structure according to the present application. From the first tier to the third tier, storage capacity gradually increases, while access speed and cost gradually decrease. As shown in Fig. 1, the first tier includes the registers 111, the first-level cache 112, the second-level cache 113, and the third-level cache 114 located within the processor 210. The second tier includes memory used as the main memory of the computer system, for example, a dynamic random access memory 121, a double data rate synchronous dynamic random access memory (DDR SDRAM) 122, and a storage-class memory (SCM) 123. The main memory may simply be called memory, i.e., the memory that exchanges information directly with the CPU. The third tier includes memory used as the secondary storage, i.e., external storage, of the computer system, for example, a network storage 131, a solid-state drive (SSD) 132, and a hard disk drive 133. Secondary storage may simply be called external memory, i.e., the hard disk in this embodiment. Compared with main memory, a hard disk has a large storage capacity but a low access speed. It can be seen that the closer a memory is to the CPU, the smaller its capacity, the faster its access speed, the greater its bandwidth, and the lower its latency. Therefore, the memory in the third tier stores data that the CPU does not access frequently, improving data reliability, while the memory in the second tier can serve as a cache for data the CPU accesses frequently, significantly improving the access performance of the system.
(2) Virtual memory
Every program running in a computer system needs memory as its working storage, and if running programs occupy too many memory resources, memory may be exhausted. To solve this problem, the operating system of the computer system uses virtual memory technology, that is, it uses part of the hard disk space as additional storage for the memory. Through virtual memory technology, the operating system presents a virtual memory to users: when a program is started, the operating system loads part of the program's data into memory and leaves the rest on the hard disk, and the program can still start. During execution, when data the program needs to access is not in memory, the operating system swaps the required data into memory and then resumes the program. Conversely, the operating system swaps data in memory that the program is temporarily not accessing out to the hard disk, making room for the data to be swapped in. In this way, the system provides, through virtual memory technology, a memory that is much larger than the physical memory, i.e., virtual memory, and logically achieves memory expansion. Technically, the operating system sets up a "contiguous" virtual address space for each program, partitions the virtual address space into multiple pages with contiguous address ranges, and maps the pages to physical memory during program execution.
When a program needs to access a section of address space that is in physical memory, the mapping between the virtual address and the physical address is performed by hardware; when the program references an address space that is not in physical memory, the operating system is responsible for loading the missing part into physical memory and re-executing the failed instruction.
(3) Page table entry
Data is stored in units of bytes in a memory (e.g., main memory or hard disk), and in order to store or retrieve data correctly, each byte unit is given a unique memory address, called a physical address, which is the address actually used for data access. A logical address is the address given by a memory-access instruction in a computer system with address-translation capability, also called a relative address; that is, it is the address a machine-language instruction uses to specify an operand or an instruction. A logical address may consist of a segment selector plus the offset of the relative address within the given segment; it is calculated or converted through the addressing mode to obtain the actual effective address in memory, i.e., the physical address. A linear address is the intermediate layer between logical-address and physical-address translation: the program code generates a logical address, i.e., an offset within a segment, and adding the base address of the corresponding segment yields the linear address. The virtual address is the intra-segment offset portion of the logical address.
The virtual address space is divided into fixed-size units called pages, which correspond to page frames in physical memory, the blocks into which physical memory is partitioned. Pages are typically the same size as page frames, e.g., 4 KB, 8 KB, 16 KB, or even larger; in practice, page sizes in computer systems typically range from 512 bytes to 1 GB. This is the paging technique in virtual memory technology.
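The page/page-frame division above implies a simple address arithmetic, sketched below for an assumed 4 KB page size (one of the sizes mentioned; the page-table dictionary here stands in for the real page table described next).

```python
PAGE_SIZE = 4096  # assume 4 KB pages for illustration

def split_virtual_address(vaddr: int):
    """Split a virtual address into (page number, offset within the page)."""
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE

def physical_address(page_table: dict, vaddr: int) -> int:
    """Translate a virtual address via a page -> page-frame mapping.
    A missing page raises KeyError, modeling a page fault."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]
    return frame * PAGE_SIZE + offset
```

Because page and frame sizes match, the offset passes through the translation unchanged; only the page number is looked up in the page table.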
A page table is a table of one-to-one relationships between pages and page frames, and serves as an index when querying the page frame corresponding to a page. One page-to-page-frame correspondence in the page table is called a page table entry. The page table consists of a number of page table entries whose structure depends on the machine architecture but is basically similar across machines. For example, a page table entry contains an identifier, data, and auxiliary information: the identifier is a virtual address, the data is the physical address to which the virtual address maps, and the auxiliary information includes a valid bit, protection bits, a modified bit, an access bit, and reserved bits. The valid bit indicates whether the page is currently in memory; when a thread executing on a processor or processor core attempts to access a page that is not in memory, a page-fault interrupt is raised. The protection bits indicate the type of access allowed to the page, e.g., read-write or read-only. The modified bit and the access bit record page usage and are used in page replacement.
For example, after a memory page is modified by a program, the hardware automatically sets the modified bit. If a later program triggers a page fault and the page replacement algorithm needs to evict this page to make room for the page to be loaded, the modified bit is consulted first: if it is set, the page has been modified, i.e., it is a dirty page (Dirty Page), and its latest contents must be written back to the hard disk; otherwise, the copies in memory and on the hard disk are synchronized, and no write-back is needed. The access bit is likewise set automatically by the system when a program accesses the page, and is a value the page replacement algorithm uses: the operating system decides whether to evict a page according to whether it is being accessed, since unused pages are better candidates for eviction. The reserved bits are used to store other auxiliary information, such as the attraction domain identifier in this embodiment.
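The dirty-page write-back decision described above reduces to a small check. The dictionary-based page representation is an illustrative assumption standing in for the page table entry's flag bits.

```python
def evict_page(page: dict) -> bool:
    """During page replacement, decide whether the evicted page must be
    written back to disk. Only dirty (modified) pages need a write-back;
    clean pages are already synchronized with their disk copy."""
    write_back = bool(page["modified"])  # consult the modified bit first
    page["valid"] = False                # page is no longer in memory
    page["modified"] = False
    return write_back
```

A clean page can simply be discarded, which is why the replacement algorithm prefers evicting pages whose access bit shows they are not in use: they are both cold and often cheap to evict.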
The embodiment of the present application provides a data processing method: a first processor core obtains a first memory access request, where the first memory access request instructs the first processor core to perform data processing on data stored in a first memory space; if the first memory space is a memory space of the memory associated with a first attraction domain of the first processor core, the first processor core determines the address of the first memory space and then performs the data processing of the first memory access request according to that address. Each processor core is associated, through its attraction domain, with the memory spaces associated with that attraction domain, and the attraction domain is in turn associated with the metadata describing the attributes of the data stored in those memory spaces. Multiple processor cores performing memory accesses in parallel can therefore each process the metadata associated with their own attraction domain and the data in the corresponding memory spaces, which avoids access conflicts between the processor cores, reduces the overall memory-access duration, and improves memory-access efficiency.
It should be noted that the data processing method in the embodiment of the present application may be executed by a processor as well as by the first processor core. If the method is executed by the processor, the only difference from execution by the first processor core is that the first memory space is a storage space of the memory associated with a first attraction domain of the processor.
The following describes in detail the implementation of the embodiment of the present application with reference to the drawings.
Fig. 2 is a schematic diagram of a computer system according to an embodiment of the application. As shown in FIG. 2, computer system 200 includes a processor 210, a memory 220, and a bus 230.
The processor 210 may be a graphics processing unit (graphics processing unit, GPU), a central processing unit (central processing unit, CPU), another general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The processor 210 includes at least one processor core. A processor core, also known as a core, is the most important component of a processor. Each processor core has a fixed logic structure, including components such as a level-one cache, a level-two cache, an execution unit, an instruction-level unit, and a bus interface. Each processor core of the processor 210 has a unique processor core identifier (id) in the operating system.
Illustratively, as shown in FIG. 2, the processor 210 includes a processor core 211, a memory management unit (Memory Management Unit, MMU) 212, and a processor core 215. The memory management unit 212 is connected to the memory 220 and to the hard disk via the bus 230. The hard disk in this embodiment may also be referred to as an external memory, relative to the memory 220.
The memory management unit 212 is a hardware unit in a processor or processor core, and includes an address translation unit 213 and a page table walk unit (Table Walk Unit, TWU) 214. The memory management unit 212 is mainly used for managing virtual memory and physical memory, and for mapping virtual addresses to physical addresses. Typically there is one memory management unit per processor core. In this embodiment, one or more storage units may be connected to the processor core 211; for example, a first storage unit is the address translation unit 213 and a second storage unit is the memory 220. In some possible embodiments, the memory management unit 212 may be disposed within the processor core 211. Alternatively, the address translation unit 213 in this embodiment may also be referred to as an address translation cache (Translation Lookaside Buffer, TLB).
The address translation unit 213 is configured to store correspondences between virtual addresses and physical addresses, i.e., page table entries. If the virtual address requested by the CPU exists among the correspondences stored in the address translation unit 213, the address translation unit 213 quickly matches the page table entry containing the virtual address of the access request and returns the matched page table entry to the CPU, and the CPU accesses the memory 220 according to the physical address contained in that page table entry. Compared with the case in which the processor core 211 performs a memory access without the address translation unit, where the page table entry must first be obtained from the memory 220 and the physical address then read from that entry, the memory management unit 212 can obtain the physical address directly from the address translation unit 213, realizing virtual-to-physical address translation without accessing the memory 220 and improving the speed of translation from virtual address to physical address.
Illustratively, the process by which the address translation unit 213 matches a page table entry for a virtual address includes: the memory management unit 212 extracts the virtual address from the access request and sends it to the address translation unit 213; the address translation unit 213 looks up, in the storage space in which it stores page table entries, whether a page table entry matching the virtual address exists. If so, this indicates an address translation unit hit (TLB hit); if not, this indicates an address translation unit miss (TLB miss). Alternatively, if the page table entry queried in the address translation unit 213 is invalid, e.g., the valid bit of the page table entry indicates that the entry is invalid, the memory management unit 212 may also treat this as an address translation unit miss.
In the present embodiment, if the virtual address requested by the CPU does not exist among the correspondences stored in the address translation unit 213, the page table walk unit 214 queries and obtains the page table entry in the memory 220. The process by which the page table walk unit 214 queries and obtains the page table entry in the memory 220 includes: the page table walk unit 214 performs a page table walk (Translation Table Walk) step, accessing the memory 220 multiple times according to the multi-level page tables, such as the first-level page table and the second-level page table, to obtain the address of the page table entry, and then obtains the page table entry in the memory 220 according to that address.
Alternatively, the memory management unit 212 of the processor 210 may be an integrated memory controller (integrated memory controller, IMC).
In this embodiment of the present application, the processor core 211 may obtain a first memory access request when the program needs to access first data, where the first memory access request is used to instruct the processor core 211 to perform data processing on data stored in the first memory space, and after determining an address of the first memory space, the processor core 211 performs data processing of the first memory access request according to the address of the first memory space.
Alternatively, the first memory access request may be generated by the processor core 211 when the thread executed by the processor core 211 needs to access the first data. The first memory access request may be a data access instruction, where the data access instruction is a Load/Store instruction, and the Load/Store instruction is used to indicate data transfer between a register and a memory, and the Load/Store instruction includes a virtual address of data to be accessed by the processor core 211.
The first memory access request may alternatively be generated by a processor core other than the processor core 211 (e.g., the processor core 215) when a thread executed by that processor core needs to access the first data; the processor core that generates the first memory access request then sends it to the processor core 211. In this case the first memory access request may be a proxy (proxy) instruction, which instructs the processor core 211 to perform data processing on the data stored in the first memory space and to send the acquired data to the processor core that generated the first memory access request.
The memory 220 and the hard disk are storage devices connected to the processor 210. A memory is a storage device used to store programs and various data. In general, the larger the storage capacity of a memory, the slower its access speed; conversely, the smaller the storage capacity, the faster the access speed. The access speed refers to the data transfer speed when data is written to or read from the memory, and may also be referred to as the read-write speed. Memories may be divided into different levels according to storage capacity and access speed.
In this embodiment, the memory 220 may store multiple levels of page tables, for example, a first-level page table for storing a first-level page table index, a second-level page table for storing a second-level page table index, a third-level page table for storing a third-level page table index, and a fourth-level page table for storing page table entries. Optionally, as shown in the dashed box in FIG. 2, the dashed line between the dashed box and the memory 220 represents the first- to fourth-level page tables stored in the memory 220. The first-level page table may be a page map level-4 table (Page Map Level 4, PML4), the second-level page table may be a page directory pointer table (Page Directory Pointer Table, PDPT), the third-level page table may be a page directory table (Page Directory, PD), and the fourth-level page table may be a page table (Page Table, PT), where the page table stores one or more page table entries. When the processor core 211 queries a page table entry in the multi-level page tables according to a virtual address, the processor core 211 queries the first-level, second-level, third-level and fourth-level page tables in sequence according to the first-level, second-level and third-level page table indexes, obtains the address of the page table entry in the memory 220, and obtains the page table entry in the memory 220 according to that address.
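As a concrete illustration of the four-level lookup, the sketch below splits an x86-64-style 48-bit virtual address into the PML4, PDPT, PD and PT indices (9 bits each) plus the 12-bit page offset. The bit widths follow the common x86-64 layout and are assumptions for illustration, not something mandated by this embodiment.

```python
# Split a 48-bit virtual address into the four 9-bit indices used to walk
# PML4 -> PDPT -> PD -> PT, plus the offset within the 4 KiB page.
def split_virtual_address(va: int):
    offset = va & 0xFFF            # 12-bit offset within the page
    pt = (va >> 12) & 0x1FF        # index into the PT (fourth level)
    pd = (va >> 21) & 0x1FF        # index into the PD (third level)
    pdpt = (va >> 30) & 0x1FF      # index into the PDPT (second level)
    pml4 = (va >> 39) & 0x1FF      # index into the PML4 (first level)
    return pml4, pdpt, pd, pt, offset

pml4, pdpt, pd, pt, offset = split_virtual_address(0x00007F123456789A)
assert offset == 0x89A   # low 12 bits
assert pml4 == 0xFE      # bits 47..39
```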
The memory 220 may also store metadata, which is information describing the attributes (properties) of the data in the physical pages of the storage space of the memory 220, such as an active list (active list), a free list (free list), and an LRU list. The active list is used to maintain active physical pages, and the free list is used to maintain free physical pages. The LRU list is used to maintain how recently each active physical page was used. The least recently used (Least Recently Used, LRU) policy is a common page replacement algorithm that selects the least recently used page for eviction. LRU uses a linked list to maintain the access status of each piece of data in the cache: the position of the data in the linked list is adjusted as the data is accessed in real time, and that position then indicates whether the data has been accessed recently.
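The LRU list described above can be sketched in a few lines; the class name and internal structure are illustrative simplifications, not part of the embodiment.

```python
from collections import OrderedDict

class LRUPageList:
    """Minimal sketch of the LRU list described above: a page moves to the
    most-recently-used end when accessed; eviction takes the other end."""
    def __init__(self):
        self.pages = OrderedDict()

    def touch(self, page):
        self.pages.pop(page, None)
        self.pages[page] = True                   # re-insert at the MRU end

    def evict(self):
        page, _ = self.pages.popitem(last=False)  # remove from the LRU end
        return page

lru = LRUPageList()
for p in (1, 2, 3):
    lru.touch(p)
lru.touch(1)              # page 1 becomes the most recently used
assert lru.evict() == 2   # page 2 is now the least recently used
```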
With continued reference to FIG. 2, the bus 230 may be a different type of bus for different connection objects. For example, the bus 230 between the processor 210 and the memory 220 may be a bus of the DDR3 interface standard, and the bus 230 between the processor 210 and the external memory may be a bus of the peripheral component interconnect express (Peripheral Component Interconnect Express, PCIe) interface standard. Matching the PCIe interface standard bus, the external memory may be a PCIe SSD, a high-speed expansion card that connects the computer system 200 to its peripheral devices; in other words, a PCIe SSD takes the form of a built-in add-in card of the computer. Compared with interfaces such as SATA (serial advanced technology attachment) and SAS (serial attached small computer system interface), PCIe uses multiple queues in data transmission, which enables concurrent data transfer on a single disk and improves the efficiency of the data interface.
In combination with the computer system 200, the data processing method provided in this embodiment can be applied to a memory expansion scenario, for example, a memory expansion scenario implemented by a virtual memory technology. Specifically, the data processing method of the embodiment of the application can be applied to the scenes such as memory expansion of a data center server, memory expansion of a cloud computing server and the like.
In the application scenarios of memory expansion of a data center server and memory expansion of a cloud computing server, the operating system of the server allocates, according to a request generated by a user terminal (for example, a mobile phone, a computer or a tablet computer), a thread for processing the request of the user terminal to the processor core 211. The processor core 211 generates a first memory access request according to the thread; the first memory access request is used to instruct the processor core 211 to perform data processing on the data stored in a first memory space, and the first memory space is a storage space of the memory associated with a first attraction domain of the processor core 211, where each processor core is associated with at least one attraction domain and the same attraction domain is associated with only one processor core. The processor core 211 determines the address of the first memory space according to the first memory access request, and then performs the data processing of the first memory access request according to that address. Because the processor cores are associated with the memory spaces of their attraction domains through the attraction domains, and the attraction domains are further associated with the metadata, the multiple processor cores of the server can, when performing memory accesses in parallel, each process the metadata associated with its own attraction domain and the data in its own memory space; access conflicts among the processor cores during parallel memory access are avoided, and the overall time of memory access is reduced.
Therefore, applying the data processing method provided by this embodiment to application scenarios with a large amount of concurrent access, such as memory expansion of a data center server and memory expansion of a cloud computing server, greatly improves the efficiency of concurrent memory access and reduces the overall access latency of concurrent access.
FIG. 2 is merely a schematic diagram of a system architecture provided by an embodiment of the present application, and does not constitute any limitation on the positional relationship among the devices, apparatuses, modules, etc. shown in FIG. 2.
Next, referring to FIG. 3, the data processing method is described in detail. In this embodiment, the processor core 211 in FIG. 2 is taken as the first processor core, and access to the first data by the processor core 211 is taken as an example.
Step 310, the processor core 211 obtains a first memory access request.
As shown by the implementation of step S310 and the dashed lines in FIG. 3, the first memory access request may be initiated when the data in the first memory space of the memory 220 needs to be read or written during execution of a thread by the processor core 211 or by a processor core other than the processor core 211; the first memory access request includes the virtual address of the first memory space. The first memory access request is used to instruct the processor core 211 to perform data processing on the data stored in the first memory space. The first memory space is a storage space of the memory 220 associated with the first attraction domain of the processor core 211.
As a possible implementation, the first attraction domain includes association relationships between memory space addresses associated with the processor core 211 and an attraction domain identifier, where the attraction domain identifier of an association relationship in the first attraction domain globally and uniquely indicates the first attraction domain to which the association relationship belongs, and the memory space address is an address of a memory space associated with that first attraction domain. The structure of the first attraction domain is described in detail with reference to FIG. 7 after step 330, and is not described here.
In the virtual memory technology, the processor core generally performs conversion of a virtual address and a physical address by using page table entries, so this embodiment may use page table entries as an association relationship, and a memory space address (virtual address and physical address) included in one page table entry is associated with an attraction domain represented by an attraction domain identifier.
Illustratively, as shown in FIG. 4, this embodiment may also place the attraction domain identifier in the reserved bits of the page table entry, thereby enabling the page table entry to carry the attraction domain identifier without modifying the microarchitecture of the processor. The attraction domain identifier in the page table entry may be a processor core identifier; for example, the attraction domain identifier of the first attraction domain is the processor core identifier of the processor core 211, so that the page table entry represents the association among the memory space, the attraction domain and the processor core.
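Carrying the attraction domain identifier in otherwise-unused page-table-entry bits can be sketched as follows. The exact field position (bits 52-61, the software-available bits of an x86-64 entry) is an assumption for illustration; the embodiment only requires that reserved bits be used.

```python
# Assumed layout: a 10-bit attraction domain identifier stored in the
# software-available bits 52-61 of a 64-bit page table entry.
DOMAIN_SHIFT = 52
DOMAIN_MASK = 0x3FF << DOMAIN_SHIFT

def set_domain(pte: int, core_id: int) -> int:
    """Write a processor core id into the reserved bits of the entry."""
    return (pte & ~DOMAIN_MASK) | ((core_id & 0x3FF) << DOMAIN_SHIFT)

def get_domain(pte: int) -> int:
    """Read the attraction domain identifier back out of the entry."""
    return (pte & DOMAIN_MASK) >> DOMAIN_SHIFT

pte = 0x00000000DEADB067           # an entry with address and flag bits set
pte = set_domain(pte, core_id=211)
assert get_domain(pte) == 211      # the entry now carries the domain id
assert pte & 0xFFF == 0x067        # the low flag bits are untouched
```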
Step 320, the processor core 211 determines an address of the first memory space.
The address of the first memory space refers to the physical address of the first memory space. Taking the association relationship as the page table entry as an example: first, the processor core 211 queries, according to the virtual address of the first memory space, whether a first page table entry matching the virtual address exists in the address translation unit 213. If so, this indicates an address translation unit hit (TLB hit), and step 330 is executed; if not, this indicates an address translation unit miss (TLB miss), and the processor core 211 obtains the first page table entry from the memory 220 using the page table walk unit 214. The processor core 211 then determines the physical address of the first memory space according to the correspondence between the virtual address and the physical address in the first page table entry.
Optionally, in the case that the first memory space is a storage space of the memory 220 associated with the first attraction domain of the processor core 211, after determining that the first page table entry belongs to the first attraction domain of the processor core 211, the memory management unit 212 performs an address translation unit loading step, i.e., copies the first page table entry and stores the copied first page table entry in the address translation unit 213 of the processor core 211. After the processor core 211 stores the first page table entry in the address translation unit 213, if the processor core 211 needs to perform a memory access according to the first page table entry again, the first page table entry can be hit directly in the address translation unit 213. This avoids performing the page table walk operation again to obtain the first page table entry from the memory 220 and reduces the memory overhead and time overhead caused by the walk, thereby improving the efficiency of memory access.
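The hit/walk/fill sequence of step 320 above can be sketched as follows. The `TLB` class and the flat `page_table` dictionary are simplifications standing in for the address translation unit 213 and the multi-level page table in the memory 220.

```python
class TLB:
    """Sketch of the lookup / page-walk / fill sequence described above."""
    def __init__(self, page_table):
        self.entries = {}            # cached page table entries (the TLB)
        self.page_table = page_table # stand-in for the in-memory page table
        self.walks = 0               # how often the slow walk path ran

    def translate(self, vpn):
        if vpn in self.entries:      # TLB hit: no memory access needed
            return self.entries[vpn]
        self.walks += 1              # TLB miss: page table walk in memory
        pfn = self.page_table[vpn]
        self.entries[vpn] = pfn      # fill step: copy the entry into the TLB
        return pfn

tlb = TLB(page_table={0x1: 0xA, 0x2: 0xB})
assert tlb.translate(0x1) == 0xA and tlb.walks == 1  # miss, walk, fill
assert tlb.translate(0x1) == 0xA and tlb.walks == 1  # hit: no second walk
```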
In step 330, the processor core 211 performs data processing of the first memory access request according to the address of the first memory space.
The processor core 211 performs data processing of the first memory access request according to the physical address of the first memory space by using the memory management unit 212.
Depending on whether the first memory access request was generated by the processor core 211 or by a processor core other than the processor core 211, the memory management unit 212 performs the data processing of the first memory access request on the physical address of the first memory space as follows:
Case one: the first memory access request is generated by the processor core 211.
As shown in fig. 5, the memory management unit 212 sends the physical address of the first memory space to the memory 220, receives the first data of the storage space of the physical address of the first memory space returned by the memory 220, and sends the first data to the register 111 of the processor core 211.
Case two: the first memory access request is generated by the processor core 215.
As shown in fig. 6, numerals 1 to 4 in fig. 6 denote data transmission sequences, and the memory management unit 212 sends the physical address of the first memory space to the memory 220, receives the first data of the storage space of the physical address of the first memory space returned by the memory 220, and sends the first data to the register of the processor core 215.
In this way, the processor cores are associated with memory spaces through attraction domains, and each processor core performs data processing on the memory space associated with its own attraction domain, so that different processor cores perform data processing on addresses in different memory spaces and, consequently, operate on the metadata describing the data of the memory spaces of their own attraction domains. For example, the processor core 211 determines the address of a memory space according to a page table entry belonging to the first attraction domain and performs data processing on that address. Assume the first metadata is the part of the metadata in the memory 220 that describes the data at the addresses of the memory space associated with the first attraction domain. Since the processor core 211 performs data processing only on data at addresses of the memory space associated with the first attraction domain, the processor core 211 needs to operate only on the first metadata, which associates the processor core 211, the first attraction domain and the first metadata. Thus, different processor cores operate on different portions of the metadata in the memory 220, and different processor cores never operate on the same portion. When one of the processor cores performs data processing on its memory space, the other processor cores do not need to wait for it to finish its data processing and metadata operations before starting data processing on a new memory space.
Multiple processor cores can therefore perform data processing concurrently, which avoids the access conflicts that arise when multiple processor cores perform memory accesses in parallel, reduces the overall duration of memory access, and improves the efficiency of memory access.
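The metadata isolation described above can be sketched as follows: each attraction domain owns a disjoint metadata partition, so a core allocating a page touches only its own lists and no cross-core synchronization is required. The class and function names are illustrative, not part of the embodiment.

```python
class DomainMetadata:
    """Per-attraction-domain metadata: each domain has its own lists."""
    def __init__(self):
        self.free_list = []
        self.active_list = []

# One disjoint metadata partition per attraction domain (keyed by core id).
metadata = {core_id: DomainMetadata() for core_id in (211, 215)}

def allocate_page(core_id, page):
    md = metadata[core_id]        # only this core's partition is touched
    md.active_list.append(page)   # no lock shared with other cores is needed

allocate_page(211, page=7)
allocate_page(215, page=9)
assert metadata[211].active_list == [7]
assert metadata[215].active_list == [9]   # disjoint: nothing to contend on
```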
The attraction domain of the present application is described in detail below in conjunction with fig. 7-9, and for ease of description, the first attraction domain associated with the processor core 211 is illustrated as an example.
First, the structure of the first attraction domain is described. Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an attraction domain according to an embodiment of the present application.
The first attraction domain 700 includes one or more page table entries, and the attraction domain identifier of these page table entries is the processor core identifier of the processor core 211.
Since the attraction domains divide the metadata in the memory into multiple mutually isolated parts, the processor core 211 operates only on the first metadata, which describes the data of the memory space associated with the first attraction domain 700, when performing the data processing of the first memory access request according to the physical address of the first memory space. Thus, the first metadata may be regarded as data of the first attraction domain 700.
Alternatively, the first metadata may include an active list, a free list, and the like.
As a possible implementation, the page table entry included in the first attraction domain 700 may be a page table entry stored in the address translation unit 213 or a page table entry stored in the memory 220.
It should be noted that the page table entries included in the first attraction domain 700 may be adjusted dynamically. For example, the processor core 211 performs domain entry of page table entries according to the access frequency of the memory space, i.e., adds page table entries to the first attraction domain 700, thereby realizing expansion of the first attraction domain 700.
Next, the expansion step of the first suction domain 700 will be described in detail with reference to fig. 8.
In step 810, the processor core 211 queries the address translation unit 213 for a second page table entry according to the second memory access request, and the TLB miss occurs.
The second memory access request is used to instruct the processor core 211 to perform data processing on second data stored in a second memory space corresponding to the virtual address of the second page table entry.
Step 820, processor core 211 retrieves the second page table entry from memory 220.
The processor core 211 performs a page table walk step using the page table walk unit 214 to obtain the second page table entry from the memory 220; the attraction domain identifier in the second page table entry is an initial value, indicating that the second memory space is not associated with any attraction domain.
In step 830, if the accumulated number of times the second page table entry has been obtained is greater than a first threshold, the processor core 211 performs the data processing of the second memory access request according to the address of the second memory space given by the second page table entry, and adds the second page table entry to the first attraction domain.
Each time the processor core 211 receives the second memory access request, it uses the page table walk unit 214 to obtain the second page table entry from the multi-level page tables of the memory 220; PML4E, PDPTE, PDE and PTE in FIG. 8 represent the data queried by the second memory access request in PML4, PDPT, PD and PT, respectively. The processor core 211 may record, through a first counter, the accumulated number of times the page table walk unit 214 has obtained the second page table entry from the memory 220. For example, when the second page table entry is obtained from the memory 220 for the first time, the page table walk unit 214 uses a register as the first counter, and increments the value of the first counter by one each time the second page table entry is obtained from the memory 220. Alternatively, the first threshold may be 5, 8, 15, or the like.
Optionally, the processor core 211 adding the second page table entry to the first attraction domain 700 further comprises: the processor core 211 copies the second page table entry using the page table walk unit 214 and stores the copied second page table entry in the address translation unit 213.
In addition, the processor core 211 may add the virtual page corresponding to the second page table entry to the active list in the first attraction domain 700, i.e., the processor core 211 adds the virtual page corresponding to the second page table entry in the processor core footprint (footprints) to the active list, where the processor core footprint sequentially records the addresses of the virtual pages on which the processor core has performed data processing.
The processor core 211 may add the second page table entry to the first attraction domain by modifying the attraction domain identifier of the second page table entry to the attraction domain identifier of the first attraction domain. After the processor core 211 stores the second page table entry in the address translation unit 213, if the processor core 211 needs to perform a memory access according to the second page table entry again, the second page table entry can be queried directly in the address translation unit 213. This avoids performing the page table walk operation again to obtain the second page table entry from the memory 220 and reduces the memory overhead and time overhead caused by the walk, thereby improving the efficiency of memory access and the utilization of the storage resources of the address translation unit 213.
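The counter-driven domain entry of steps 810-830 can be sketched as follows. `WALK_THRESHOLD` stands in for the first threshold, and the flat dictionaries are simplifications of the first counter and of the attraction domain itself.

```python
WALK_THRESHOLD = 5   # stands in for the first threshold (an example value)

walk_counts = {}     # per-entry counter (a stand-in for the first counter)

def on_page_walk(vpn, pte, domain):
    """Called each time the walk unit fetches this entry from memory."""
    walk_counts[vpn] = walk_counts.get(vpn, 0) + 1
    if walk_counts[vpn] > WALK_THRESHOLD and vpn not in domain:
        domain[vpn] = pte   # domain entry: the entry joins the attraction domain

first_domain = {}
for _ in range(6):                 # six walks of the same entry
    on_page_walk(vpn=0x2, pte=0xB, domain=first_domain)
assert 0x2 in first_domain         # added once the count exceeded the threshold
```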
In addition to expansion of the first attraction domain 700, the dynamic adjustment of the page table entries contained in the first attraction domain 700 by the processor core 211 may also include reduction of the first attraction domain 700. The reduction step of the first attraction domain 700 is described in detail below with reference to FIG. 9, in which (1) to (6) show the execution order of the steps.
In step 910, the processor core 211 queries the address translation unit 213 for a third page table entry according to the third memory access request, and a TLB miss occurs.
The third memory access request is a proxy instruction sent by a processor core other than the processor core 211, and is used for instructing the processor core 211 to execute data processing on third data stored in a third memory space corresponding to the virtual address of the third page table entry.
Taking the processor core 215 as the processor core that sends the proxy instruction to the processor core 211 as an example: the processor core 215 obtains a memory access request and, according to the virtual address contained in the memory access request, fails to find the third page table entry in its own address translation unit; a TLB miss occurs, and the processor core 215 obtains the third page table entry in the memory 220 using its own page table walk unit. When the attraction domain identifier of the third page table entry is the attraction domain identifier of the first attraction domain 700, the processor core 215 sends a proxy instruction to the processor core 211, the proxy instruction containing the virtual address of the third memory space.
In step 920, processor core 211 retrieves the third page table entry from memory 220.
The processor core 211 performs a page table walk step using the page table walk unit 214 to obtain the third page table entry from the multi-level page tables in the memory 220; PML4E, PDPTE, PDE and PTE in FIG. 9 represent the data queried by the third memory access request in PML4, PDPT, PD and PT, respectively. The attraction domain identifier in the third page table entry is the attraction domain identifier of the processor core 211.
Step 930, processor core 211 migrates the third page table entry into the attraction domain of the processor core that sent the third memory access request the most times.
Assuming that the processor core 215 is the processor core that has sent the third memory access request the most times, the processor core 211 migrates the third page table entry into the attraction domain of the processor core 215, which may be done by modifying the attraction domain identifier of the third page table entry to the attraction domain identifier of the attraction domain of the processor core 215.
Each time the processor core 211 receives the proxy instruction, it obtains the third page table entry from the memory 220 using the page table walk unit 214. For each processor core that sends a proxy instruction to the processor core 211, the processor core 211 may use a counter to record the number of times that processor core has instructed the processor core 211 to perform data processing on the third data stored in the third memory space.
The processor core 211 may record, through a second counter, the number of times it has received the proxy instruction sent by the processor core 215. For example, the first time the processor core 211 receives the proxy instruction sent by the processor core 215, it uses a register as the second counter, and increments the value of the second counter by one each time the proxy instruction sent by the processor core 215 is received; when the value of the second counter is greater than a second threshold, the migration of step 930 is performed. Alternatively, the second threshold may be 5, 8, 15, or the like.
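The counter-driven migration decision can be sketched as follows. `PROXY_THRESHOLD` stands in for the second threshold, and the per-sender dictionary stands in for the second counter; both names are illustrative.

```python
PROXY_THRESHOLD = 5   # stands in for the second threshold (an example value)

proxy_counts = {}     # sender core id -> number of proxy requests received

def on_proxy_request(sender_id):
    """Called each time a proxy instruction arrives from another core."""
    proxy_counts[sender_id] = proxy_counts.get(sender_id, 0) + 1

def migration_target():
    """The domain that should receive the entry: the most frequent sender,
    once its count has crossed the threshold; otherwise no migration."""
    sender = max(proxy_counts, key=proxy_counts.get)
    return sender if proxy_counts[sender] > PROXY_THRESHOLD else None

for _ in range(6):
    on_proxy_request(215)              # core 215 keeps asking for this page
on_proxy_request(216)                  # core 216 asked only once
assert migration_target() == 215       # entry migrates to core 215's domain
```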
Optionally, migrating the third page table entry to the attraction domain of the processor core 215 by the processor core 211 further includes: the processor core 211 copies the third page table entry using the page table walk unit 214 and stores the copy in the address translation unit 213.
In addition, the processor core 211 may also add the virtual page corresponding to the third page table entry to the active table of the attraction domain of the processor core 215, i.e., the virtual page corresponding to the third page table entry is recorded in the active table maintained for the attraction domain of the processor core 215.
Therefore, the processor core 211 realizes the migration of the third page table entry among different attraction domains. When the processor core 215 performs data processing on the third memory space again, it does not need to send a proxy instruction to the processor core 211; instead, the processor core 215 performs the data processing on the third memory space according to the third page table entry in the address translation unit connected to itself, thereby improving the efficiency of memory access and reducing the consumption of communication resources among the processor cores. On the other hand, migration and domain entry cooperate to realize dynamic adjustment of the attraction domains, so that the page table entries in an attraction domain are those accessed most frequently by its processor core, thereby improving the utilization of the storage resources of the address translation unit.
Besides excess page table entries occupying the storage resources of the address translation unit, after data is swapped in or out between the memory 220 and the hard disk, every processor core checks its connected address translation unit to update the page table entries and triggers a TLB shootdown (shootdown), which halts the threads being executed by all processor cores, so the utilization of the computing resources of the processor cores is low. To improve the utilization of the computing resources of the processor cores, each processor core in this embodiment periodically updates its own address translation unit.
Taking the processor core 211 as an example, the processor core 211 periodically updates the address translation unit 213 with the page table walk unit 214. The page table walk unit 214 determines the attraction domain identifications of all page table entries in the address translation unit 213, retains the page table entries whose attraction domain identification is the attraction domain identification of the first attraction domain 700, and deletes the page table entries whose attraction domain identification is not the attraction domain identification of the first attraction domain 700.
The lengths of any two adjacent periods for updating the address translation unit 213 may be equal or unequal. The length of a period may be 10 seconds, 20 seconds, 30 seconds, or any other length.
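The periodic refresh described above can be sketched as follows; the entry layout, field names, and function name are illustrative assumptions, not this embodiment's micro-architecture:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* One cached translation in the address translation unit. */
typedef struct {
    uint64_t vaddr;
    uint64_t paddr;
    uint32_t domain_id;   /* attraction domain identification */
    bool     valid;
} atu_entry_t;

/* One refresh pass: retain only entries whose attraction domain
 * identification matches this core's own first attraction domain;
 * returns the number of foreign-domain entries deleted. */
size_t refresh_atu(atu_entry_t *entries, size_t n, uint32_t own_domain_id) {
    size_t evicted = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].valid && entries[i].domain_id != own_domain_id) {
            entries[i].valid = false;  /* delete entry not in own domain */
            evicted++;
        }
    }
    return evicted;
}
```

A timer with the preset period (10, 20, 30 seconds, or any other length) would invoke such a pass, so stale foreign-domain entries disappear without a TLB shootdown.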
Therefore, the processor core 211 completes the refreshing of page table entries by using the page table walk unit 214, avoids querying page table entries that do not belong to the first attraction domain in the address translation unit 213, and reduces the number of times a TLB shootdown is triggered, thereby improving the utilization of the computing resources of the processor core and improving the memory access performance of the processor 210.
It should be noted that, in this embodiment, when a TLB miss occurs while querying a page table entry in the address translation unit, neither the processor core 211 nor the processor core 215 initiates a TLB shootdown, thereby further improving the efficiency of memory access.
The data processing method provided according to the present embodiment is described in detail above with reference to fig. 1 to 9, and the data processing apparatus provided according to the present embodiment will be described below with reference to fig. 10.
Fig. 10 is a schematic structural diagram of a possible data processing apparatus according to this embodiment. These data processing means may be adapted to carry out the functions of the processor or processor core of the above-described method embodiments, and thus may also achieve the advantageous effects provided by the above-described method embodiments.
As shown in fig. 10, the data processing apparatus 1000 includes a transceiver module 1010 and a processing module 1020. The data processing apparatus 1000 is used to implement the functions of the processor core 211 in the method embodiments shown in fig. 3, 8 or 9 described above.
The transceiver module 1010 is configured to obtain a first memory access request, where the first memory access request is used to instruct the first processor core to perform data processing on data stored in a first memory space, and the first memory space is a memory space of the memory associated with a first attraction domain of the first processor core.
The processing module 1020 is configured to determine an address of the first memory space, and perform data processing of the first memory access request according to the address of the first memory space.
As one possible implementation, the first attraction domain includes an association relationship between a memory space address and an attraction domain identifier, where the attraction domain identifier indicates the attraction domain to which the association relationship belongs, and the memory space address is the address of the memory space associated with that attraction domain.
For example, the association relationship is a page table entry (Page Table Entry, PTE), and the attraction domain identification is set in a reserved bit of the page table entry. Because the attraction domain identifier is recorded using the reserved bits of the page table entry, the processor can identify it without modifying the micro-architecture.
As another example, the attraction domain identification includes a processor core identification. The processor core identification in the association relationship can indicate that the association relationship belongs to the attraction domain of the processor core represented by that identification.
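A minimal sketch of recording an attraction domain identification (here, a processor core identification) in the reserved/ignored bits of a 64-bit page table entry. The bit range (bits 52-58) and names are illustrative assumptions and are not specified by this embodiment:

```c
#include <stdint.h>

/* Assumed layout: bits 52-58 of the page table entry hold the attraction
 * domain identification; 7 bits allow up to 128 processor cores. */
#define DOMAIN_SHIFT 52
#define DOMAIN_MASK  (0x7FULL << DOMAIN_SHIFT)

/* Write the attraction domain identification without disturbing the
 * physical address and flag bits of the entry. */
static inline uint64_t pte_set_domain(uint64_t pte, uint64_t domain_id) {
    return (pte & ~DOMAIN_MASK) | ((domain_id << DOMAIN_SHIFT) & DOMAIN_MASK);
}

/* Read back the attraction domain identification. */
static inline uint64_t pte_get_domain(uint64_t pte) {
    return (pte & DOMAIN_MASK) >> DOMAIN_SHIFT;
}
```

Migrating an entry between attraction domains then reduces to one call to `pte_set_domain` with the new owner's identification, which is why no micro-architecture change is needed.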
In one possible implementation, the first memory access request may be generated by the first processor core in the case where a thread executed by the first processor core needs to read and write data of the first memory space.
In another possible implementation, the first memory access request may be generated by a processor core other than the first processor core and sent to the first processor core in the case where a thread executed by that processor core needs to read and write data of the first memory space.
The data processing apparatus 1000 may dynamically adjust the association relationship included in the attraction domain.
For example, the transceiver module 1010 is further configured to obtain a second memory access request. The processing module 1020 is configured to perform steps 810-830 in fig. 8.
As another example, the transceiver module 1010 is further configured to obtain a third memory access request, and the processing module 1020 is configured to perform steps 910-930 in fig. 9.
As a possible implementation, the association relationships of the first attraction domain may be stored in a storage unit connected to the first processor core, where the storage unit is configured to store and query the association relationships contained in the first attraction domain. The first memory access request includes a virtual address of the first memory space, and the memory space address of the association relationship includes the physical address of the first memory space. When determining the address of the first memory space, the processing module 1020 is specifically configured to: query the association relationships in the storage unit according to the virtual address of the first memory space to obtain a first association relationship, and determine the physical address of the first memory space according to the first association relationship.
Alternatively, the storage unit may be an address translation unit.
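A simplified sketch of that query: look up the stored association relationships by virtual page to obtain the physical address, falling back to a page table walk on a miss. The structure and names are assumptions; a real address translation unit would be an associative hardware structure rather than a linear scan:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)

/* One association relationship: virtual page -> physical frame. */
typedef struct {
    uint64_t vpage;     /* virtual page number */
    uint64_t pframe;    /* physical frame number */
    bool     valid;
} assoc_t;

/* Query the storage unit with a virtual address; on a hit, write the
 * physical address and return true; on a miss, return false. */
bool translate(const assoc_t *tbl, size_t n, uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpage = vaddr >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].vpage == vpage) {
            *paddr = (tbl[i].pframe << PAGE_SHIFT) | (vaddr & PAGE_MASK);
            return true;
        }
    }
    return false;       /* miss: fall back to a page table walk */
}
```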
In addition, the processing module 1020 is further configured to: delete, at every preset period, the association relationships in the storage unit that do not belong to the first attraction domain.
It should be appreciated that the data processing apparatus 1000 of the embodiments of the present application may be implemented by an application-specific integrated circuit (application-specific integrated circuit, ASIC) or a programmable logic device (programmable logic device, PLD), where the PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof.
The data processing apparatus 1000 according to the embodiment of the present application may correspond to performing the method described in the embodiment of the present application, and the above and other operations and/or functions of each unit in the data processing apparatus 1000 are for implementing the corresponding flow of each method in fig. 3, 8 or 9, and are not described herein for brevity.
The method steps in this embodiment may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (read-only memory, ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a computing device. The processor and the storage medium may also reside as discrete components in a computing device.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, e.g., a floppy disk, hard disk, or tape; an optical medium, such as a digital video disc (digital video disc, DVD); or a semiconductor medium, such as a solid state drive (solid state drive, SSD).

While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (14)
1. A method of data processing, the method being adapted for use in a computer system comprising a processor and a memory, the method being performed by a first processor core of the processor and comprising:
acquiring a first memory access request, wherein the first memory access request is used for instructing the first processor core to perform data processing on data stored in a first memory space, and the first memory space is a memory space of the memory associated with a first attraction domain of the first processor core;
determining the address of the first memory space;
and executing the data processing of the first memory access request according to the address of the first memory space.
2. The method of claim 1, wherein the first attraction domain comprises an association of a memory space address and an attraction domain identifier, the attraction domain identifier being used to indicate an attraction domain to which the association belongs, the memory space address being an address of a memory space associated with the attraction domain to which the association belongs.
3. The method according to claim 2, wherein the method further comprises:
acquiring a second memory access request, wherein the second memory access request is used for instructing the first processor core to perform data processing on data stored in a second memory space, and the second memory space is not associated with any attraction domain;
and if the accumulated number of times the first processor core has acquired the second memory access request is greater than a preset threshold, adding an association relationship of the address of the second memory space to the first attraction domain.
4. The method of claim 2, wherein the first processor core is connected to a storage unit, the storage unit is configured to store and query the association relationships included in the first attraction domain, the first memory access request includes a virtual address of the first memory space, the memory space address of the association relationship includes a physical address of the first memory space, and determining the address of the first memory space comprises:
querying the association relationships in the storage unit according to the virtual address of the first memory space to obtain a first association relationship;
and determining the physical address of the first memory space according to the first association relationship.
5. The method according to claim 4, wherein the method further comprises:
and deleting, at every preset period, the association relationships in the storage unit that do not belong to the first attraction domain.
6. The method of any of claims 2-5, wherein the association relationship is a page table entry and the attraction domain identification is provided in a reserved bit of the page table entry.
7. A data processing apparatus, comprising:
the transceiver module is used for acquiring a first memory access request, wherein the first memory access request is used for instructing the first processor core to perform data processing on data stored in a first memory space, and the first memory space is a memory space of the memory associated with a first attraction domain of the first processor core;
the processing module is used for determining the address of the first memory space;
the processing module is further configured to perform data processing of the first memory access request according to the address of the first memory space.
8. The apparatus of claim 7, wherein the first attraction domain comprises an association of a memory space address and an attraction domain identifier, the attraction domain identifier being used to indicate an attraction domain to which the association belongs, the memory space address being an address of a memory space associated with the attraction domain to which the association belongs.
9. The apparatus of claim 8, wherein the transceiver module is further configured to obtain a second memory access request, the second memory access request being configured to instruct the first processor core to perform data processing on data stored in a second memory space, the second memory space not being associated with any attraction domain;
the processing module is further configured to add an association relationship of the address of the second memory space to the first attraction domain when the accumulated number of times the first processor core has obtained the second memory access request is greater than a preset threshold.
10. The apparatus of claim 8, wherein a storage unit is connected to the first processor core, the storage unit being configured to store and query the association relationships included in the first attraction domain, the first memory access request includes a virtual address of the first memory space, the memory space address of the association relationship includes a physical address of the first memory space, and the processing module is further configured to:
inquiring the association relation in the storage unit according to the virtual address of the first memory space to obtain a first association relation;
and determining the physical address of the first memory space according to the first association relation.
11. The apparatus of claim 10, wherein the processing module is further configured to:
and delete, at every preset period, the association relationships in the storage unit that do not belong to the first attraction domain.
12. The apparatus according to any of claims 8-11, wherein the association relationship is a page table entry and the attraction domain identification is arranged in a reserved bit of the page table entry.
13. A processor, characterized in that it comprises a processor core for executing the operational steps of the method according to any of the preceding claims 1-6.
14. A computing system, comprising a memory and the processor as claimed in claim 13, the processor being configured to perform the operational steps of the method of any of claims 1-6 to execute the data processing of a memory access request on an address of a memory space of the memory.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210514855.0A CN117093132A (en) | 2022-05-12 | 2022-05-12 | Data processing method, device, processor and computer system |
PCT/CN2023/093749 WO2023217255A1 (en) | 2022-05-12 | 2023-05-12 | Data processing method and device, processor and computer system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210514855.0A CN117093132A (en) | 2022-05-12 | 2022-05-12 | Data processing method, device, processor and computer system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117093132A true CN117093132A (en) | 2023-11-21 |
Family
ID=88729756
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210514855.0A Pending CN117093132A (en) | 2022-05-12 | 2022-05-12 | Data processing method, device, processor and computer system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117093132A (en) |
WO (1) | WO2023217255A1 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101909966B1 (en) * | 2014-09-01 | 2018-10-19 | 후아웨이 테크놀러지 컴퍼니 리미티드 | File access method and apparatus, and storage system |
US11139967B2 (en) * | 2018-12-20 | 2021-10-05 | Intel Corporation | Restricting usage of encryption keys by untrusted software |
Also Published As
Publication number | Publication date |
---|---|
WO2023217255A1 (en) | 2023-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200057729A1 (en) | Memory access method and computer system | |
US10067684B2 (en) | File access method and apparatus, and storage device | |
US10552337B2 (en) | Memory management and device | |
CN105740164B (en) | Multi-core processor supporting cache consistency, reading and writing method, device and equipment | |
CN109074317B (en) | Adaptive deferral of lease for an entry in a translation look-aside buffer | |
CN104346294B (en) | Data read/write method, device and computer system based on multi-level buffer | |
US20160085585A1 (en) | Memory System, Method for Processing Memory Access Request and Computer System | |
US10019377B2 (en) | Managing cache coherence using information in a page table | |
US11210020B2 (en) | Methods and systems for accessing a memory | |
US10997078B2 (en) | Method, apparatus, and non-transitory readable medium for accessing non-volatile memory | |
US11237980B2 (en) | File page table management technology | |
US20170277634A1 (en) | Using Leases for Entries in a Translation Lookaside Buffer | |
WO2023035646A1 (en) | Method and apparatus for expanding memory, and related device | |
US10108553B2 (en) | Memory management method and device and memory controller | |
US10114762B2 (en) | Method and apparatus for querying physical memory address | |
CN107870867B (en) | Method and device for 32-bit CPU to access memory space larger than 4GB | |
KR20160060550A (en) | Page cache device and method for efficient mapping | |
CN116383101A (en) | Memory access method, memory management unit, chip, device and storage medium | |
JP6343722B2 (en) | Method and device for accessing a data visitor directory in a multi-core system | |
US10241906B1 (en) | Memory subsystem to augment physical memory of a computing system | |
WO2023217255A1 (en) | Data processing method and device, processor and computer system | |
CN107870870B (en) | Accessing memory space beyond address bus width | |
EP4435578A1 (en) | Cache management method and apparatus, and system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||