CN112231099A - Memory access method and device of processor - Google Patents

Memory access method and device of processor

Info

Publication number
CN112231099A
CN112231099A (application CN202011096790.XA)
Authority
CN
China
Prior art keywords
memory
processor
information
node
node information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011096790.XA
Other languages
Chinese (zh)
Inventor
张娇慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Original Assignee
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority to CN202011096790.XA
Publication of CN112231099A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources, the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/5011 Pool

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An embodiment of the invention provides a memory access method and device for a processor. The method includes: when a service flow in the processor applies for memory, acquiring the kernel identifier carried in the service flow and determining the node information of the service flow according to that identifier; and applying for memory from the memory pool corresponding to the node information so as to access the memory. The method enables memory access on a per-node basis, greatly reduces cross-node accesses, avoids synchronization between local memory and remote memory, and lowers memory-access latency. With this software-only change, system performance can therefore scale nearly linearly with the number of processor cores, effectively raising the processing speed of the domestic processor.

Description

Memory access method and device of processor
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a memory access method and apparatus for a processor.
Background
In many processor memory access schemes, for example that of the Shenwei 1621 processor, each core can access the memory banks in all nodes, but the access time varies with the distance to the node: a core accesses the memory bank of its own node with better performance than the memory banks of other nodes. The memory bank within a core's own node is therefore called local memory, and the memory banks in other nodes are called remote memory.
In the prior art, the cores of the Shenwei 1621 processor do not distinguish between nodes when accessing memory. When a core accesses its local memory area the speed is high, but when it crosses nodes the memory cache is disrupted and the two memory areas must be kept synchronized with each other; if data migration happens frequently, cache misses are generated continually, which increases access overhead and seriously degrades system performance.
Therefore, how to better implement a memory access scheme for a processor has become an urgent problem to be solved in the industry.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for accessing a memory of a processor, so as to solve the technical problems mentioned in the foregoing background art, or at least partially solve the technical problems mentioned in the foregoing background art.
In a first aspect, an embodiment of the present invention provides a memory access method for a processor, including:
when a service flow in a processor applies for a memory, acquiring a kernel identifier in the service flow, and determining node information of the service flow according to the kernel identifier;
and applying for a memory from a memory pool corresponding to the node information so as to access the memory.
More specifically, before the step of obtaining the kernel identifier in the service flow and determining the node information of the service flow according to the kernel identifier, the method further includes:
acquiring all kernel information of a processor, and grouping the kernel information to obtain a grouped core group;
and taking each core group as a node to obtain a plurality of node information.
More specifically, after the step of obtaining information of a plurality of nodes, the method further includes:
dividing the memory accessed by the processor into a plurality of memory pools with the number corresponding to the node information;
each node information corresponds to one memory pool, and the memory pools are connected through an interconnection module to perform information interaction.
More specifically, the step of applying for a memory from a memory pool corresponding to the node information to perform memory access includes:
under the condition that the application of the memory pool corresponding to the node information is successful, performing memory access by using the memory pool corresponding to the node information;
and under the condition that the memory pool corresponding to the node information fails to apply for the memory, applying for the memory from the adjacent memory pool until the application is successful.
More specifically, the number of the cores of the processor is 16, and accordingly, the obtaining of all the core information of the processor and the grouping of the core information specifically include:
the 16 cores of the processor are divided into four groups, resulting in four core groups.
In a second aspect, an embodiment of the present invention provides a memory access device for a processor, including:
the identification module is used for acquiring a kernel identification in the service flow when the service flow in the processor applies for the memory, and determining the node information of the service flow according to the kernel identification;
and the memory access module is used for applying for a memory from the memory pool corresponding to the node information so as to access the memory.
More specifically, the apparatus further comprises: a grouping module;
the grouping module is used for acquiring all kernel information of the processor and grouping the kernel information to obtain a grouped core group;
and taking each core group as a node to obtain a plurality of node information.
More specifically, the grouping module is further configured to:
dividing the memory accessed by the processor into a plurality of memory pools with the number corresponding to the node information;
each node information corresponds to one memory pool, and the memory pools are connected through an interconnection module to perform information interaction.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the memory access method of the processor according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the memory access method of the processor according to the first aspect.
In the memory access method and device for the processor provided by the embodiment of the invention, the correspondence between multiple nodes and the processor cores is established per core, the total memory is divided into a corresponding number of memory pools according to the multiple nodes, and when a service function module needs memory, the processor core is determined first and memory is then applied for from the memory pool corresponding to the node where that core is located. The embodiment of the invention thus achieves per-node memory access, greatly reduces cross-node accesses, avoids synchronization between local memory and remote memory, and lowers memory-access latency. With this software-only change, system performance can scale nearly linearly with the number of processor cores, effectively raising the processing speed of the domestic processor.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for accessing a memory of a processor according to an embodiment of the invention;
FIG. 2 is a diagram illustrating a memory access device of a processor according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specifically, the overall processor memory access scheme described in the embodiment of the present invention may be applied to the Shenwei 1621 processor. The Shenwei 1621 adopts a symmetric multi-core structure and SoC technology: a single chip integrates 16 Shenwei 64-bit RISC processor cores, the 16 cores share the on-chip level-three Cache in CC-NUMA fashion, and the level-three Cache capacity is increased to 32 MB. The Shenwei 1621 processor consists of 6 ring-network nodes, of which 4 connect the core groups and memory groups and two connect the PCIE channels. Each core can access the memory banks in all nodes, but the access time varies with the distance to the node. That is, the Shenwei 1621 processor has the problem that nodes are not distinguished when a core accesses memory: when a core crosses nodes, the memory cache is disrupted and the two memory areas must be kept synchronized with each other; if data migration happens frequently, cache misses are generated continually, which increases the access overhead of the Shenwei 1621 processor and seriously degrades system performance.
Fig. 1 is a flowchart illustrating a memory access method of a processor according to an embodiment of the present invention, as shown in fig. 1, including:
step S1, when the service flow in the processor applies for the memory, acquiring the kernel identification in the service flow, and determining the node information of the service flow according to the kernel identification;
step S2, applying for a memory from the memory pool corresponding to the node information to perform memory access.
Specifically, the processor described in the embodiment of the present invention may be the Shenwei 1621 processor, or any other processor that keeps switching between local memory and remote memory during memory access, resulting in high memory-access latency and low efficiency.
When a service applies for memory, the service flow is first parsed to obtain the core ID it carries, and the processor core from which the service flow came is determined from that core ID; the node corresponding to that processor core is then determined from the pre-divided node information, and memory is applied for from the memory pool corresponding to that node information. If the application succeeds, the memory is returned; if the application fails, the application is made to the memory pool of a node adjacent to that node. For example, when a Session applies for memory, the memory must be accessed per node, and the node is determined from the core ID in the service flow: if core 0 corresponds to node0, letting core 0 always access the memory of node0 is most efficient; otherwise core 0 falls back to accessing the memory of node1-3.
In addition, when the system is started, the embodiment of the invention firstly establishes the relationship between the processor core and the plurality of nodes, and then divides the total memory into the memory pools with corresponding number according to the number of the plurality of nodes, thereby realizing the application of the memory in the memory pool corresponding to the node where the core is located.
The embodiment of the invention establishes the correspondence between multiple nodes and the processor cores per core, divides the total memory into a corresponding number of memory pools according to the multiple nodes, determines the processor core when a service function module needs memory, and then applies for memory from the memory pool corresponding to the node where that core is located. The embodiment of the invention thus achieves per-node memory access, greatly reduces cross-node accesses, avoids synchronization between local memory and remote memory, and lowers memory-access latency. With this software-only change, system performance can scale nearly linearly with the number of processor cores, effectively raising the processing speed of the domestic processor.
On the basis of the above embodiment, before the step of obtaining the kernel identifier in the service flow and determining the node information of the service flow according to the kernel identifier, the method further includes:
acquiring all kernel information of a processor, and grouping the kernel information to obtain a grouped core group;
and taking each core group as a node to obtain a plurality of node information.
Specifically, the process described in the embodiment of the present invention establishes the correspondence between processor cores and nodes. Taking the Shenwei 1621 processor as an example, its 16 cores may be divided into 4 core groups, that is, 4 nodes; after the system starts, cores 0-3 of the 16 cores are automatically assigned to the first node, node0, cores 4-7 to the second node, node1, cores 8-11 to the third node, node2, and cores 12-15 to the fourth node, node3.
According to the embodiment of the invention, the memory access according to the nodes is realized by establishing the corresponding relation between the processor cores and the nodes, the cross-node access is greatly reduced, the synchronization between the local memory and the remote memory is avoided, and the time delay of the memory access is reduced.
On the basis of the foregoing embodiment, after the step of obtaining information of a plurality of nodes, the method further includes:
dividing the memory accessed by the processor into a plurality of memory pools with the number corresponding to the node information;
each node information corresponds to one memory pool, and the memory pools are connected through an interconnection module to perform information interaction.
Specifically, in the embodiment of the present invention, the total memory is divided into a corresponding number of memory pools according to the number of nodes, so that each node has its own independent local memory. The nodes are connected through the interconnection module and exchange information, so that each processor core can still access the memory of the whole system.
The embodiment of the invention can effectively improve the speed of accessing the memory by setting the local memory corresponding to the access node.
On the basis of the above embodiment, the step of applying for a memory from the memory pool corresponding to the node information to perform memory access specifically includes:
under the condition that the application of the memory pool corresponding to the node information is successful, performing memory access by using the memory pool corresponding to the node information;
and under the condition that the memory pool corresponding to the node information fails to apply for the memory, applying for the memory from the adjacent memory pool until the application is successful.
Specifically, the adjacent memory pool described in the embodiment of the present invention refers to a memory pool of another node adjacent to the memory pool corresponding to the node information.
For example, when a Session applies for memory, the memory must be accessed per node, and the node is determined from the core ID in the service flow: if core 0 corresponds to node0, letting core 0 always access the memory of node0 is most efficient. Otherwise core 0 falls back to accessing the memory of node1-3, and accessing the memory pool corresponding to node0 is much faster than accessing the memory pools corresponding to node1-3.
On the basis of the above embodiment, the number of the cores of the processor is 16, and accordingly, the obtaining of all the core information of the processor and the grouping of the core information specifically include:
the 16 cores of the processor are divided into four groups, resulting in four core groups.
Specifically, the 16 cores of the Shenwei 1621 form 4 core groups, that is, 4 nodes; after the system starts, cores 0-3 of the 16 cores are automatically assigned to the first node, node0, cores 4-7 to the second node, node1, cores 8-11 to the third node, node2, and cores 12-15 to the fourth node, node3.
In the embodiment of the present invention, the cores may be further divided into two groups or eight groups, which is not specifically limited in the embodiment of the present invention, and may be specifically set according to different types of processors.
The embodiment of the invention establishes the correspondence between multiple nodes and the processor cores per core, divides the total memory into a corresponding number of memory pools according to the multiple nodes, determines the processor core when a service function module needs memory, and then applies for memory from the memory pool corresponding to the node where that core is located. The embodiment of the invention thus achieves per-node memory access, greatly reduces cross-node accesses, avoids synchronization between local memory and remote memory, and lowers memory-access latency. With this software-only change, system performance can scale nearly linearly with the number of processor cores, effectively raising the processing speed of the domestic processor.
Fig. 2 is a schematic diagram of a memory access device of a processor according to an embodiment of the present invention, as shown in fig. 2, including: an identification module 210 and a memory access module 220; the identification module 210 is configured to, when a service flow in a processor applies for a memory, obtain a kernel identifier in the service flow, and determine node information of the service flow according to the kernel identifier; the memory access module 220 is configured to apply for a memory from a memory pool corresponding to the node information to perform memory access.
More specifically, the apparatus further comprises: a grouping module;
the grouping module is used for acquiring all kernel information of the processor and grouping the kernel information to obtain a grouped core group;
and taking each core group as a node to obtain a plurality of node information.
More specifically, the grouping module is further configured to:
dividing the memory accessed by the processor into a plurality of memory pools with the number corresponding to the node information;
each node information corresponds to one memory pool, and the memory pools are connected through an interconnection module to perform information interaction.
Specifically, when a service applies for memory, the service flow is parsed to obtain the core ID it carries, the processor core from which the service flow came is determined from that core ID, the node corresponding to that processor core is determined from the pre-divided node information, and memory is applied for from the memory pool corresponding to that node information. If the application succeeds, the memory is returned; if the application fails, the application is made to the memory pool of a node adjacent to that node. For example, when a Session applies for memory, the memory must be accessed per node, and the node is determined from the core ID in the service flow: if core 0 corresponds to node0, letting core 0 always access the memory of node0 is most efficient; otherwise core 0 falls back to accessing the memory of node1-3.
In addition, when the system is started, the embodiment of the invention firstly establishes the relationship between the processor core and the plurality of nodes, and then divides the total memory into the memory pools with corresponding number according to the number of the plurality of nodes, thereby realizing the application of the memory in the memory pool corresponding to the node where the core is located.
The apparatus provided in the embodiment of the present invention is used for executing the above method embodiments, and for details of the process and the details, reference is made to the above embodiments, which are not described herein again.
The embodiment of the invention establishes the correspondence between multiple nodes and the processor cores per core, divides the total memory into a corresponding number of memory pools according to the multiple nodes, determines the processor core when a service function module needs memory, and then applies for memory from the memory pool corresponding to the node where that core is located. The embodiment of the invention thus achieves per-node memory access, greatly reduces cross-node accesses, avoids synchronization between local memory and remote memory, and lowers memory-access latency. With this software-only change, system performance can scale nearly linearly with the number of processor cores, effectively raising the processing speed of the domestic processor.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic device may include: a processor (processor) 310, a communication interface (Communications Interface) 320, a memory (memory) 330, and a communication bus 340, where the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method: when a service flow in a processor applies for memory, acquiring the kernel identifier in the service flow, and determining the node information of the service flow according to the kernel identifier; and applying for memory from a memory pool corresponding to the node information so as to access the memory.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example, the method includes: when a service flow in a processor applies for a memory, acquiring a kernel identifier in the service flow, and determining node information of the service flow according to the kernel identifier; and applying for a memory from a memory pool corresponding to the node information so as to access the memory.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing server instructions, where the server instructions cause a computer to execute the method provided in the foregoing embodiments, for example, the method includes: when a service flow in a processor applies for a memory, acquiring a kernel identifier in the service flow, and determining node information of the service flow according to the kernel identifier; and applying for a memory from a memory pool corresponding to the node information so as to access the memory.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for memory access of a processor, comprising:
when a service flow in a processor applies for a memory, acquiring a kernel identifier in the service flow, and determining node information of the service flow according to the kernel identifier;
and applying for a memory from a memory pool corresponding to the node information so as to access the memory.
2. The memory access method of the processor according to claim 1, wherein before the step of obtaining the kernel identifier in the service flow and determining the node information of the service flow according to the kernel identifier, the method further comprises:
acquiring all kernel information of a processor, and grouping the kernel information to obtain a grouped core group;
and taking each core group as a node to obtain a plurality of node information.
3. The method according to claim 2, wherein after the step of obtaining the information of the plurality of nodes, the method further comprises:
dividing the memory accessible to the processor into a plurality of memory pools, the number of which corresponds to the number of nodes;
wherein each piece of node information corresponds to one memory pool, and the memory pools are connected through an interconnection module for information interaction.
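The pool layout of claim 3 can be illustrated with a minimal sketch. All names here (`MemoryPool`, `build_pools`) are hypothetical; the patent publishes no code, only the claim that the processor-visible memory is divided into one pool per node.

```python
# Hypothetical sketch of claim 3: the processor-visible memory is divided
# into one pool per node, and each node (core group) owns exactly one pool.

class MemoryPool:
    def __init__(self, node_id, capacity):
        self.node_id = node_id
        self.free = capacity              # bytes still available in this pool

    def try_alloc(self, size):
        """Reserve `size` bytes from this pool; return True on success."""
        if size <= self.free:
            self.free -= size
            return True
        return False

def build_pools(num_nodes, total_memory):
    # Divide the memory evenly so the pool count matches the node count.
    per_node = total_memory // num_nodes
    return [MemoryPool(n, per_node) for n in range(num_nodes)]

pools = build_pools(4, 4 * 1024)          # four nodes, a 1 KiB pool each
```

A real implementation would also model the interconnection module between pools; the even split above is only one possible division.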
4. The memory access method of the processor according to claim 1, wherein the step of applying for memory from the memory pool corresponding to the node information so as to perform the memory access specifically comprises:
in the case that the application to the memory pool corresponding to the node information succeeds, performing the memory access using that memory pool;
and in the case that the application to the memory pool corresponding to the node information fails, applying for memory from an adjacent memory pool until the application succeeds.
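The fallback policy of claim 4 — try the local node's pool first, then probe adjacent pools until an allocation succeeds — can be sketched as follows. The function name, the ring ordering of "adjacent" pools, and the plain byte counters are all illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch of claim 4: allocate from the pool of the flow's own
# node first; on failure, probe adjacent pools (modelled here as a ring)
# until one allocation succeeds.

def allocate(free_bytes, node_id, size):
    """free_bytes: per-node pool capacities in bytes, mutated in place.
    Returns the index of the pool that served the request, or None."""
    n = len(free_bytes)
    for step in range(n):                  # step 0 is the local pool
        pool = (node_id + step) % n        # ring order stands in for "adjacent"
        if free_bytes[pool] >= size:
            free_bytes[pool] -= size
            return pool
    return None                            # every pool is exhausted

pools = [100, 1024, 0, 1024]               # node 2's pool is already empty
```

For example, `allocate(pools, 2, 50)` skips node 2's empty pool and is served by node 3's, matching the claimed behaviour of falling through to a neighbouring pool only after the local application fails.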
5. The method according to claim 2, wherein the processor has 16 cores, and accordingly the acquiring of the information of all cores of the processor and the grouping of the cores specifically comprise:
dividing the 16 cores of the processor into four groups, thereby obtaining four core groups.
6. A memory access device for a processor, comprising:
an identification module, configured to, when a service flow in the processor applies for memory, acquire a core identifier carried in the service flow and determine node information of the service flow according to the core identifier;
and a memory access module, configured to apply for memory from the memory pool corresponding to the node information, so as to perform the memory access.
7. The memory access device of claim 6, further comprising: a grouping module;
wherein the grouping module is configured to acquire information of all cores of the processor and group the cores to obtain core groups;
and to take each core group as a node, thereby obtaining information of a plurality of nodes.
8. The memory access device of claim 7, wherein the grouping module is further configured to:
divide the memory accessible to the processor into a plurality of memory pools, the number of which corresponds to the number of nodes;
wherein each piece of node information corresponds to one memory pool, and the memory pools are connected through an interconnection module for information interaction.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the memory access method of the processor according to any one of claims 1 to 5 when executing the program.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the memory access method of the processor according to any one of claims 1 to 5.
CN202011096790.XA 2020-10-14 2020-10-14 Memory access method and device of processor Pending CN112231099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011096790.XA CN112231099A (en) 2020-10-14 2020-10-14 Memory access method and device of processor

Publications (1)

Publication Number Publication Date
CN112231099A true CN112231099A (en) 2021-01-15

Family

ID=74112706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011096790.XA Pending CN112231099A (en) 2020-10-14 2020-10-14 Memory access method and device of processor

Country Status (1)

Country Link
CN (1) CN112231099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024027395A1 (en) * 2022-07-30 2024-02-08 华为技术有限公司 Data processing method and apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110153935A1 (en) * 2009-12-17 2011-06-23 Yadong Li Numa-aware scaling for network devices
CN104268089A (en) * 2014-09-15 2015-01-07 杭州华为数字技术有限公司 Information updating method, device and system
US20170097890A1 (en) * 2015-10-05 2017-04-06 Fujitsu Limited Computer-readable recording medium storing information processing program, information processing apparatus, and information processing method
CN107969153A (en) * 2016-08-19 2018-04-27 华为技术有限公司 A kind of resource allocation methods, device and NUMA system
CN110178119A (en) * 2018-08-02 2019-08-27 华为技术有限公司 The method, apparatus and storage system of processing business request
CN111258935A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Data transmission device and method
CN111756802A (en) * 2020-05-26 2020-10-09 深圳大学 Method and system for scheduling data stream tasks on NUMA platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOYPOO: "Ultimate Performance (1): Starting from NUMA", 《HTTPS://CLOUD.TENCENT.COM/DEVELOPER/ARTICLE/1583533》 *
MEI-LING CHIANG: "Enhancing Inter-Node Process Migration for Load Balancing on Linux-Based NUMA Multicore Systems", 《IEEE》 *
GU JIAN: "Algebraic Multigrid Algorithm Optimization for NUMA Clusters", 《Computer Science》 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination