CN115080262A - Method, device, computer device and system for realizing memory sharing control



Publication number
CN115080262A
Authority
CN
China
Prior art keywords
memory
sharing control
control device
processing units
processing unit
Prior art date
Legal status
Pending
Application number
CN202110351637.5A
Other languages
Chinese (zh)
Inventor
周轶刚
朱晓明
周冠锋
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to EP22766409.1A (published as EP4300308A1)
Priority to PCT/CN2022/080620 (published as WO2022188887A1)
Priority to JP2023555494A (published as JP2024509954A)
Publication of CN115080262A
Priority to US18/460,608 (published as US20230409198A1)



Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/544: Buffers; Shared memory; Pipes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources to service a request
    • G06F9/5011: Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5016: Allocation of resources to service a request, the resource being the memory
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A method, a device, a computer device, and a system for implementing memory sharing control are provided, so as to improve the utilization rate of memory resources. In the computer device, a memory sharing control device is arranged between the processors and a memory pool, and the processors access the memory pool through the memory sharing control device. Different processing units, such as processors or cores within a processor, access at least one memory in the memory pool during different time periods, so that the memory is shared by multiple processing units and the utilization rate of memory resources is improved.

Description

Method, device, computer device and system for realizing memory sharing control
The present application claims priority to Chinese patent application No. 202110270731.8, entitled "Method and Device for Implementing Memory Sharing Control", filed with the China National Intellectual Property Administration on March 12, 2021, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of information technology, and in particular to a method, a device, a computer device, and a system for implementing memory sharing control.
Background
With the popularity of big data technology, applications in various fields demand ever more computing resources. Large-scale computing, typified by applications such as graph computing and deep learning, represents the latest direction of application development. Meanwhile, as the advancement of semiconductor processes slows, applications can no longer obtain continuously scalable performance improvements from processor upgrades alone, and multi-core processors have become mainstream.
Multi-core processor systems place an ever-growing demand on memory capacity. Memory is an indispensable component of a server, accounting for roughly 30-40% of the cost of the whole server system. Improving memory utilization is therefore an important means of reducing the total cost of ownership (TCO) of the system.
Disclosure of Invention
The present application provides a method, a device, a computer device, and a system for implementing memory sharing control, so as to improve the utilization rate of memory resources.
In a first aspect, the present application provides a computer device, including at least two processing units, a memory sharing control device, and a memory pool, where the processing unit is a processor, a core in the processor, or a combination of cores in the processor, and the memory pool includes one or more memories;
the at least two processing units are coupled with the memory sharing control device;
the memory sharing control device is configured to allocate memories from the memory pool to the at least two processing units, respectively, where at least one memory in the memory pool is accessible to different processing units at different time periods;
the at least two processing units are used for accessing the allocated memory through the memory sharing control device.
At least two processing units in the computer device can access at least one memory in the memory pool in different time periods through the memory sharing control device, so that a memory can be shared by multiple processing units, thereby improving the utilization rate of memory resources.
Optionally, at least one memory in the memory pool may be accessed by different processing units in different time periods, which means that any two processing units in the at least two processing units may access at least one memory in the memory pool in different time periods, respectively. For example, the at least two processing units include a first processing unit and a second processing unit, and in a first time period, a first memory in the memory pool is accessed by the first processing unit, and the second processing unit cannot access the first memory; and in a second time period, the first memory in the memory pool is accessed by the second processing unit, and the first processing unit cannot access the first memory. Optionally, the processor may be a Central Processing Unit (CPU), and one CPU may include 2 or more cores.
Alternatively, one of the at least two processing units may be one processor, a core of one processor, a combination of cores of one processor, or a combination of cores of different processors. Treating a combination of multiple cores in one processor, or a combination of cores from different processors, as a single processing unit allows multiple different cores executing tasks in parallel to access the same memory in parallel-computing scenarios, which improves the efficiency of parallel computation across those cores.
Optionally, the memory sharing control device may allocate the memories to the at least two processing units from the memory pool according to a received control instruction sent by an operating system in the computer device. Specifically, a driver in the operating system may send the control instruction that allocates the memory in the memory pool to the at least two processing units to the memory sharing control device through a dedicated channel. The operating system is implemented by a CPU in the computer device executing the relevant code, and the CPU running the operating system has a privileged mode, in which a driver in the operating system can send a control instruction to the memory sharing control device through a dedicated or specific channel.
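Purely as an illustrative sketch (the disclosure does not specify the programming interface of this dedicated channel), a privileged driver might expose the channel as a character device and pass an allocation command as a fixed-layout structure. The device path /dev/mem_share_ctl and the command layout below are assumptions for illustration only.

```c
/* Sketch: a hypothetical privileged-driver channel for sending a
 * memory-allocation control instruction to the memory sharing control
 * device. Device node name and command layout are assumed, not disclosed. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct mem_share_cmd {
    uint32_t unit_id;    /* identifier of the target processing unit */
    uint64_t start_addr; /* start address of the memory to allocate  */
    uint64_t size;       /* size of the allocation in bytes          */
};

int main(void) {
    /* The dedicated channel is reachable only from privileged mode. */
    int fd = open("/dev/mem_share_ctl", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct mem_share_cmd cmd = { .unit_id = 1,
                                 .start_addr = 0x100000000ULL,
                                 .size = 1ULL << 30 /* 1 GB */ };
    if (write(fd, &cmd, sizeof cmd) != (ssize_t)sizeof cmd)
        perror("write");
    close(fd);
    return 0;
}
```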
Optionally, the memory sharing control device may be implemented by a field programmable gate array (FPGA) chip, an application-specific integrated circuit (ASIC), or other similar chips. An ASIC has its circuit functions fixed at design time, and features high chip integration, ease of volume production, low cost, and small size.
In some possible implementations, the at least two processing units and the memory sharing control device are connected through a serial bus;
a first processing unit of the at least two processing units is configured to send a first memory access request in the form of a serial signal to the memory sharing control device through the serial bus, where the first memory access request is used to access a first memory allocated to the first processing unit.
A serial bus offers high bandwidth and low latency, so connecting the at least two processing units to the memory sharing control device through a serial bus ensures efficient data transmission between the processing units and the memory sharing control device.
Optionally, the serial bus is a memory semantic bus. The memory semantic bus includes, but is not limited to, a bus based on the QuickPath Interconnect (QPI), peripheral component interconnect express (PCIe), Huawei Cache Coherence System (HCCS), or compute express link (CXL) protocol interconnect.
Optionally, when the first processing unit generates the memory access request, the request is in the form of a parallel signal. The first processing unit may convert the memory access request in the form of a parallel signal into the first memory access request in the form of a serial signal through an interface capable of converting between parallel and serial signals, such as a Serdes interface, and send the first memory access request to the memory sharing control device through the serial bus.
In some possible implementations, the memory sharing control device includes a processor interface, and the processor interface is configured to:
receive the first memory access request;
and convert the first memory access request into a second memory access request in the form of a parallel signal.
The processor interface converts the first memory access request into a second memory access request in the form of a parallel signal, so that the memory sharing control device can access the first memory, realizing memory sharing without changing the existing memory access architecture.
Optionally, the processor interface is an interface capable of converting between serial and parallel signals, for example a Serdes interface.
In some possible implementations, the memory sharing control device includes a control unit, where the control unit is configured to:
establish a correspondence between the memory address of the first memory in the memory pool and a first processing unit of the at least two processing units, so as to allocate the first memory to the first processing unit from the memory pool.
Optionally, the correspondence between the memory address of the first memory and the first processing unit may be dynamically adjusted as needed.
Optionally, the memory address of the first memory may be a segment of contiguous physical memory addresses in the memory pool, which simplifies management of the first memory. Of course, the memory address of the first memory may also be several discontinuous segments of physical memory addresses in the memory pool.
Optionally, the memory address information of the first memory includes a start address of the first memory and a size of the first memory. The first processing unit has an identifier, and the establishing of the correspondence between the memory address of the first memory and the first processing unit may be establishing of a correspondence between a unique identifier of the first processing unit and memory address information of the first memory.
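Purely as an illustrative sketch of the correspondence just described (the table layout and sizes are assumptions, not part of the disclosure), the control unit could keep a table mapping each processing unit's unique identifier to the start address and size of its allocated memory, and consult it on every access:

```c
/* Sketch: correspondence between a processing unit's identifier and the
 * memory address information (start address, size) of its allocated memory.
 * The table layout and sizes are illustrative assumptions. */
#include <stdint.h>

#define MAX_ENTRIES 64

struct mem_binding {
    uint32_t unit_id;    /* unique identifier of the processing unit */
    uint64_t start_addr; /* start address of the allocated memory    */
    uint64_t size;       /* size of the allocated memory             */
    int      valid;
};

static struct mem_binding table[MAX_ENTRIES];

/* Establish the correspondence, i.e. allocate the memory to the unit. */
int bind_memory(uint32_t unit_id, uint64_t start, uint64_t size) {
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!table[i].valid) {
            table[i] = (struct mem_binding){ unit_id, start, size, 1 };
            return 0;
        }
    }
    return -1; /* table full */
}

/* Check whether a physical address is currently allocated to the unit. */
int may_access(uint32_t unit_id, uint64_t addr) {
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (table[i].valid && table[i].unit_id == unit_id &&
            addr >= table[i].start_addr &&
            addr <  table[i].start_addr + table[i].size)
            return 1;
    return 0;
}
```

Removing an entry and establishing a new one for another unit at a later time period then yields the time-sliced sharing described above.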
In some possible implementations, the memory sharing control device includes a control unit, where the control unit is configured to:
virtualize a plurality of virtual memory devices from the memory pool, where a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory;
and allocate the first virtual memory device to the first processing unit.
Optionally, a virtual memory device corresponds to a segment of contiguous physical memory addresses in the memory pool, which simplifies management of the virtual memory device. Of course, a virtual memory device may also correspond to several discontinuous segments of physical memory addresses in the memory pool.
Optionally, the first virtual memory device may be allocated to the first processing unit by establishing an access control table. For example, the access control table may include an identifier of the first processing unit, an identifier of the first virtual memory device, and information such as a start address and a size of a memory corresponding to the first virtual memory device. The access control table may further include authority information of the first processing unit accessing the first virtual memory device, attribute information of a memory to be accessed (including but not limited to information on whether the memory is a persistent memory), and the like.
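As an illustrative sketch only, an access control table entry carrying the fields listed above might look as follows; the field widths and the permission encoding are assumptions:

```c
/* Sketch: one entry of the access control table described above.
 * Field widths and the permission encoding are illustrative assumptions. */
#include <stdint.h>

struct access_ctrl_entry {
    uint32_t unit_id;    /* identifier of the first processing unit       */
    uint32_t vdev_id;    /* identifier of the first virtual memory device */
    uint64_t start_addr; /* start address of the backing memory           */
    uint64_t size;       /* size of the backing memory                    */
    uint8_t  perm;       /* permissions: bit 0 = read, bit 1 = write      */
    uint8_t  persistent; /* memory attribute: 1 if persistent memory      */
};

/* Returns nonzero if the unit may perform the access (wanted_perm uses the
 * same bit encoding as perm). */
int entry_allows(const struct access_ctrl_entry *e, uint32_t unit_id,
                 uint8_t wanted_perm) {
    return e->unit_id == unit_id && (e->perm & wanted_perm) == wanted_perm;
}
```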
In some possible implementations, the control unit is further configured to:
when a preset condition is met, remove the correspondence between the first virtual memory device and the first processing unit;
and establish a correspondence between the first virtual memory device and a second processing unit of the at least two processing units.
Optionally, the correspondence between the virtual memory device and the processing unit may be dynamically adjusted according to the requirements of the at least two processing units on the memory resources.
By dynamically adjusting the correspondence between virtual memory devices and processing units, the memory resource requirements of different processing units in different service scenarios can be met flexibly, improving the utilization rate of memory resources.
Optionally, the preset condition may be that the amount of memory access required by the first processing unit is decreased, and the amount of memory access required by the second processing unit is increased.
Optionally, the control unit is further configured to:
when a preset condition is met, remove the correspondence between the first memory and the first virtual memory device; establish a correspondence between the first memory and a second virtual memory device in the plurality of virtual memory devices; and allocate the second virtual memory device to a second processing unit of the at least two processing units. In this case, access by different processing units to the same physical memory in different time periods can be realized by changing only the correspondence between the physical memory and the virtual memory devices, without changing the correspondence between the virtual memory devices and the processing units.
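A minimal sketch of the earlier implementation, in which the virtual memory device itself is reassigned from the first to the second processing unit once the preset condition holds (rather than re-pointing the physical memory to a second virtual memory device); the table layout and the condition callback are assumptions:

```c
/* Sketch: reassign a virtual memory device to a second processing unit when
 * a preset condition is met. The binding layout and the predicate are
 * illustrative assumptions; the vdev's physical backing stays unchanged. */
#include <stdint.h>
#include <stddef.h>

struct vdev_binding {
    uint32_t vdev_id; /* virtual memory device  */
    uint32_t unit_id; /* owning processing unit */
    int      valid;
};

int remap_vdev(struct vdev_binding *tbl, size_t n, uint32_t vdev_id,
               uint32_t new_unit, int (*preset_condition_met)(void)) {
    if (!preset_condition_met())
        return -1; /* condition not met: leave the mapping untouched */
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].vdev_id == vdev_id) {
            /* Remove the old correspondence and establish the new one. */
            tbl[i].unit_id = new_unit;
            return 0;
        }
    }
    return -1; /* no such virtual memory device */
}
```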
In some possible implementations, the memory sharing control device further includes a cache unit, where the cache unit is configured to cache data read from the memory pool by any one of the at least two processing units, or to cache data evicted by any one of the at least two processing units.
Through the cache unit, the efficiency of the processing unit for accessing the memory data can be further improved.
Optionally, the cache unit may include a first-level cache and a second-level cache. The first-level cache may be a small-capacity cache with a higher read/write speed than the second-level cache, for example a nanosecond-level cache of 100 megabytes (MB); the second-level cache may be a large-capacity cache with a lower read/write speed than the first-level cache, for example a dynamic random access memory (DRAM) of 1 gigabyte (GB). With both levels, the caches speed up the processors' data access while the enlarged cache space widens the range of memory the processors can access quickly through the caches, improving the overall speed at which the processors access memory.
Optionally, data in the memory may first be cached in the second-level cache, and data in the second-level cache may be cached into the first-level cache according to a processing unit's demand for the memory data. Data evicted by a processing unit, or temporarily not needed by it, may also be cached in the first-level cache; to ensure that the first-level cache keeps enough space for other processing units to cache data, part of the data evicted by processing units into the first-level cache may be demoted to the second-level cache.
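The following sketch models that two-level flow under assumed (direct-mapped, illustrative) parameters: data enters the second-level cache first, is promoted into the small first-level cache on demand, and a displaced first-level line is demoted back to the second level:

```c
/* Sketch: two-level cache in which L1 victims are demoted to L2.
 * Capacities and the direct-mapped policy are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define L1_LINES 4   /* small, fast first-level cache    */
#define L2_LINES 64  /* large, slower second-level cache */

typedef struct { uint64_t addr; int valid; } line_t;
static line_t l1[L1_LINES], l2[L2_LINES];

static void l2_fill(uint64_t addr) { l2[addr % L2_LINES] = (line_t){ addr, 1 }; }

/* Promote a line into L1; the displaced L1 victim is kept in L2. */
static void l1_promote(uint64_t addr) {
    line_t *slot = &l1[addr % L1_LINES];
    if (slot->valid)
        l2_fill(slot->addr); /* demote the victim to the second level */
    *slot = (line_t){ addr, 1 };
}

int main(void) {
    l2_fill(0x1000);    /* memory data is cached in L2 first     */
    l1_promote(0x1000); /* promoted to L1 when the unit needs it */
    l1_promote(0x1100); /* same L1 slot: 0x1000 is demoted to L2 */
    printf("L2 slot holds %#llx\n",
           (unsigned long long)l2[0x1000 % L2_LINES].addr);
    return 0;
}
```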
In some possible implementation manners, the memory sharing control device further includes a prefetch engine, where the prefetch engine is configured to pre-fetch, from the memory pool, data that needs to be read by any one of the at least two processing units, and cache the data in the cache unit.
Optionally, the prefetch engine may implement intelligent data prediction through a specific algorithm or an artificial intelligence (AI) algorithm, so as to further improve the efficiency with which processing units access memory data.
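The disclosure leaves the prediction method open, so the sketch below stands in with one of the simplest possibilities, a stride detector: after two identical strides, the next address in the pattern is prefetched into the cache unit.

```c
/* Sketch: a minimal stride prefetcher standing in for the prefetch engine;
 * the actual prediction algorithm (specific or AI-based) is not disclosed. */
#include <stdint.h>
#include <stdio.h>

struct prefetcher {
    uint64_t last_addr;
    int64_t  last_stride;
    int      seen; /* nonzero once at least one access has been observed */
};

/* Observe one demand access; return the predicted next address to fetch
 * into the cache unit, or 0 if no stable stride is detected yet. */
uint64_t observe(struct prefetcher *p, uint64_t addr) {
    uint64_t predicted = 0;
    if (p->seen) {
        int64_t stride = (int64_t)(addr - p->last_addr);
        if (stride != 0 && stride == p->last_stride)
            predicted = addr + (uint64_t)stride; /* two matching strides */
        p->last_stride = stride;
    }
    p->last_addr = addr;
    p->seen = 1;
    return predicted;
}

int main(void) {
    struct prefetcher p = {0};
    uint64_t trace[] = { 0x1000, 0x1040, 0x1080, 0x10c0 };
    for (int i = 0; i < 4; i++) {
        uint64_t next = observe(&p, trace[i]);
        if (next)
            printf("prefetch %#llx into the cache unit\n",
                   (unsigned long long)next);
    }
    return 0;
}
```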
In some possible implementations, the memory sharing control device further includes a quality of service (Qos) engine, where the Qos engine is configured to optimize the storage, in the cache unit, of data that needs to be cached by any one of the at least two processing units. Through the Qos engine, the memory-access data of different processing units can receive different amounts of cache capacity in the cache unit. For example, a memory access request initiated by a high-priority processing unit can have exclusive cache space in the cache unit, which ensures that the data accessed by that processing unit is cached in time, thereby guaranteeing the quality of that processor's service processing.
In some possible implementations, the memory sharing control device further includes a compression/decompression engine, where the compression/decompression engine is configured to implement compression or decompression of the data related to memory access.
Optionally, the compression/decompression engine may be disabled.
The compression/decompression engine may compress the data written into the memory by a processing unit using a compression algorithm at a granularity of 4 kilobytes (KB) per page before writing it into the memory; or, when a processing unit reads compressed data from the memory, decompress the data to be read before sending it to the processor. This improves the data transmission rate and further improves the efficiency with which processing units access memory data.
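Purely to illustrate the page-granular flow (the disclosure names no compression algorithm), the sketch below uses a trivial run-length encoder as a stand-in, compressing one 4 KB page before it would be written to memory:

```c
/* Sketch: page-granular compression before a memory write. The run-length
 * encoder is an illustrative stand-in; the real algorithm is not disclosed. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Returns compressed length; out must hold up to 2 * PAGE_SIZE bytes. */
size_t rle_compress_page(const uint8_t *page, uint8_t *out) {
    size_t o = 0;
    for (size_t i = 0; i < PAGE_SIZE; ) {
        uint8_t b = page[i];
        size_t run = 1;
        while (i + run < PAGE_SIZE && page[i + run] == b && run < 255)
            run++;
        out[o++] = (uint8_t)run; /* run length */
        out[o++] = b;            /* byte value */
        i += run;
    }
    return o;
}

int main(void) {
    uint8_t page[PAGE_SIZE], out[2 * PAGE_SIZE];
    memset(page, 0xAB, PAGE_SIZE); /* a highly compressible page */
    printf("4096-byte page compressed to %zu bytes\n",
           rle_compress_page(page, out));
    return 0;
}
```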
Optionally, the memory sharing control device further includes a storage unit, where the storage unit includes software code of at least one of the Qos engine, the pre-fetch engine, and the compression/decompression engine. The memory sharing control device can read the codes in the storage unit to realize corresponding functions.
Optionally, at least one of the Qos engine, the prefetch engine, and the compression/decompression engine may be implemented by control logic of the memory sharing control device.
In some possible implementations, the first processing unit further has a local memory, and the local memory is used for memory access of the first processing unit. Optionally, the first processing unit may preferentially access the local memory. The first processing unit has a faster speed of accessing the local memory, so that the speed of accessing the memory by the first processing unit can be further improved.
In some possible implementations, the plurality of memories included in the memory pool are of different media types. For example, the memory pool may include at least one of the following memory media: DRAM, phase change memory (PCM), storage class memory (SCM), static random access memory (SRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), NAND flash memory, spin-transfer torque memory (STT-RAM), or resistive memory (RRAM). The memory pool may further include a dual in-line memory module (DIMM) or a solid state disk (SSD).
Using different memory media can meet the memory resource requirements of different processing units for different services. For example, DRAM is fast but volatile, so DRAM memory can be allocated to processing units that access hot data; PCM is non-volatile, so PCM memory can be allocated to processing units that access data needing long-term storage. In this way, the flexibility of memory access control is improved while memory resources are shared.
Taking a memory pool that includes a volatile DRAM storage medium and a non-volatile PCM storage medium as an example, the DRAM and the PCM in the memory pool may form a parallel architecture, in which there is no hierarchy between the DRAM and the PCM; or they may form a non-parallel (hierarchical) architecture, in which the DRAM serves as the cache and first-level storage medium and the PCM serves as the main memory and second-level storage medium. In the parallel DRAM/PCM architecture, the control unit can store frequently accessed hot data in the DRAM, that is, establish a correspondence between a processing unit that initiates accesses to hot data and a virtual memory device corresponding to DRAM memory, which improves the read/write speed of memory data and the service life of the main memory system. The control unit can also establish a correspondence between a processing unit that initiates accesses to infrequently accessed cold data and a virtual memory device corresponding to PCM memory, storing the cold data in the PCM, so that the non-volatility of the PCM safeguards important data. In the non-parallel architecture, exploiting the high integration density of PCM and the low read/write latency of DRAM, the control unit can use the PCM as main memory to store all kinds of data and the DRAM as a cache, further improving the efficiency and performance of memory access.
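As an illustrative sketch of such media-aware placement (the threshold policy below is an assumption; the disclosure does not specify how hot and cold data are distinguished):

```c
/* Sketch: choose a backing medium for a unit's data from its observed
 * access frequency. The threshold value is an illustrative assumption. */
#include <stdint.h>
#include <stdio.h>

enum medium { MEDIUM_DRAM, MEDIUM_PCM };

enum medium choose_medium(uint64_t accesses_per_sec) {
    const uint64_t HOT_THRESHOLD = 10000; /* assumed hot/cold boundary */
    return accesses_per_sec >= HOT_THRESHOLD
               ? MEDIUM_DRAM  /* hot data: fast but volatile     */
               : MEDIUM_PCM;  /* cold data: slower, non-volatile */
}

int main(void) {
    printf("%s\n", choose_medium(50000) == MEDIUM_DRAM ? "DRAM" : "PCM");
    printf("%s\n", choose_medium(12)    == MEDIUM_DRAM ? "DRAM" : "PCM");
    return 0;
}
```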
In a second aspect, the present application provides a system comprising at least two computer devices of the first aspect, said at least two computer devices of the first aspect being connected via a network.
A computer device in the system can access the memory pool on that computer device through its memory sharing control device, improving memory utilization, and can also access the memory pools on other computer devices through the network. This enlarges the scope of the memory pool and can further improve the utilization rate of memory resources.
Optionally, the memory sharing control device in the computer device in the system may further have a function of a network adapter, and may send the access request of the processing unit to another computer device in the system through a network to access the memory in the another computer device.
Optionally, the computer device in the system may further include a network adapter having a serial-to-parallel conversion interface (e.g., Serdes interface), and the memory sharing control device in the computer device may send the memory access request of the processing unit to the other computer devices in the system through the network adapter to access the memories in the other computer devices.
Optionally, the computer devices in the system may be connected via an ethernet-based network or a universal-bus (U-bus) based network.
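A minimal sketch of the routing decision this implies, assuming (purely for illustration) that the upper 16 bits of a global address identify the node holding the memory:

```c
/* Sketch: route a memory access either to the local memory pool or over the
 * network to a peer computer device. The global address layout (node id in
 * the top 16 bits) is an illustrative assumption. */
#include <stdint.h>
#include <stdio.h>

#define LOCAL_NODE 0u

static unsigned node_of(uint64_t gaddr) { return (unsigned)(gaddr >> 48); }

void route_access(uint64_t gaddr) {
    if (node_of(gaddr) == LOCAL_NODE)
        printf("access %#llx in the local memory pool\n",
               (unsigned long long)gaddr);
    else
        printf("forward %#llx to node %u over the network\n",
               (unsigned long long)gaddr, node_of(gaddr));
}

int main(void) {
    route_access(0x0000000000001000ULL); /* local memory pool */
    route_access(0x0002000000001000ULL); /* memory on node 2  */
    return 0;
}
```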
In a third aspect, the present application provides a memory sharing control device, where the memory sharing control device includes a control unit, a processor interface, and a memory interface;
the processor interface is used for receiving memory access requests sent by at least two processing units; wherein the processing unit is a processor, a core in a processor, or a combination of cores in a processor;
the control unit is configured to allocate memories to the at least two processing units from a memory pool, respectively, where at least one memory in the memory pool is capable of being accessed by different processing units at different time periods;
the control unit is further configured to implement, through the memory interface, access to the allocated memory by the at least two processing units.
Through the memory sharing control device, different processing units can access at least one memory in the memory pool in different time periods, the requirements of the processing units on memory resources can be met, and the utilization rate of the memory resources is improved.
Optionally, at least one memory in the memory pool may be accessed by different processing units in different time periods, which means that any two processing units in the at least two processing units may access at least one memory in the memory pool in different time periods, respectively. For example, the at least two processing units include a first processing unit and a second processing unit, and in a first time period, a first memory in the memory pool is accessed by the first processing unit, and the second processing unit cannot access the first memory; and in a second time period, the first memory in the memory pool is accessed by the second processing unit, and the first processing unit cannot access the first memory.
Optionally, the memory interface may be a Double Data Rate (DDR) controller, or the memory interface may be a memory controller with a PCM control function.
Optionally, the memory sharing control device may allocate the memories to the at least two processing units from the memory pool according to a received control instruction sent by an operating system in the computer device. Specifically, a driver in the operating system may send the control instruction that allocates the memory in the memory pool to the at least two processing units to the memory sharing control device through a dedicated channel. The operating system is implemented by a CPU in the computer device executing the relevant code, and the CPU running the operating system has a privileged mode, in which a driver in the operating system can send a control instruction to the memory sharing control device through a dedicated or specific channel.
Optionally, the memory sharing control device may be implemented by an FPGA chip, an ASIC, or other similar chips.
In some possible implementations, the processor interface is further configured to receive, through the serial bus, a first memory access request sent by a first processing unit of the at least two processing units in a serial signal, where the first memory access request is used to access a first memory allocated to the first processing unit.
A serial bus offers high bandwidth and low latency, so receiving the first memory access request sent by the first processing unit as a serial signal over the serial bus ensures efficient data transmission between the processing units and the memory sharing control device.
Optionally, the serial bus is a memory semantic bus. The memory semantic bus includes but is not limited to: QPI, PCIe, HCCS, or CXL protocol interconnect.
In some possible implementations, the processor interface is further configured to convert the first memory access request into a second memory access request in a parallel signal form, and send the second memory access request to the control unit;
the control unit is further configured to implement, through the memory interface, the access of the second memory access request to the first memory.
Optionally, the processor interface is an interface capable of converting between serial and parallel signals, for example a Serdes interface.
In some possible implementation manners, the control unit is further configured to establish a correspondence between a memory address of the first memory in the memory pool and the first processing unit, so as to allocate the first memory to the first processing unit from the memory pool.
Optionally, the correspondence between the memory address of the first memory and the first processing unit may be dynamically adjusted as needed.
Optionally, the memory address of the first memory may be a segment of contiguous physical memory addresses in the memory pool, which simplifies management of the first memory. Of course, the memory address of the first memory may also be several discontinuous segments of physical memory addresses in the memory pool.
Optionally, the memory address information of the first memory includes a start address of the first memory and a size of the first memory. The first processing unit has an identifier, and establishing the correspondence between the memory address of the first memory and the first processing unit may be establishing a correspondence between a unique identifier of the first processing unit and the memory address information of the first memory.
In some possible implementations, the control unit is further configured to virtualize a plurality of virtual memory devices from the memory pool, where a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory, and to allocate the first virtual memory device to the first processing unit.
Optionally, the virtual memory device corresponds to a segment of continuous physical memory addresses in the memory pool. The virtual memory device corresponds to a section of continuous physical memory addresses in the memory pool, and management of the virtual memory device can be simplified. Of course, the virtual memory device may also correspond to several discontinuous segments of physical memory addresses in the memory pool.
Optionally, the first virtual memory device may be allocated to the first processing unit by establishing an access control table. For example, the access control table may include an identifier of the first processing unit, an identifier of the first virtual memory device, and information such as a start address and a size of a memory corresponding to the first virtual memory device. The access control table may further include authority information of the first processing unit accessing the first virtual memory device, attribute information of a memory to be accessed (including but not limited to information on whether the memory is a persistent memory), and the like.
In some possible implementations, the control unit is further configured to: when a preset condition is met, remove the correspondence between the first virtual memory device and the first processing unit, and establish a correspondence between the first virtual memory device and a second processing unit of the at least two processing units.
Optionally, the correspondence between the virtual memory device and the processing unit may be dynamically adjusted according to the requirements of the at least two processing units on the memory resources.
By dynamically adjusting the corresponding relationship between the virtual memory device and the processing unit, the requirements of different processing units on the memory resources under different service scenes can be flexibly met, and the utilization rate of the memory resources is improved.
Optionally, the control unit is further configured to:
when a preset condition is met, remove the correspondence between the first memory and the first virtual memory device; establish a correspondence between the first memory and a second virtual memory device in the plurality of virtual memory devices; and allocate the second virtual memory device to a second processing unit of the at least two processing units. In this case, access by different processing units to the same physical memory in different time periods can be realized by changing only the correspondence between the physical memory and the virtual memory devices, without changing the correspondence between the virtual memory devices and the processing units.
In some possible implementations, the memory sharing control device further includes a cache unit;
the cache unit is configured to cache data read from the memory pool by any one of the at least two processing units, or to cache data evicted by any one of the at least two processing units.
Through the cache unit, the efficiency of the processing unit for accessing the memory data can be further improved.
Optionally, the cache unit may include a first-level cache and a second-level cache. The first-level cache may be a small-capacity cache with a higher read/write speed than the second-level cache, for example a nanosecond-level cache of 100 MB; the second-level cache may be a large-capacity cache with a lower read/write speed than the first-level cache, for example a DRAM of 1 GB. With both levels, the caches speed up the processors' data access while the enlarged cache space widens the range of memory the processors can access quickly through the caches, improving the overall speed at which the processors access memory.
In some possible implementation manners, the memory sharing control device further includes a prefetch engine, where the prefetch engine is configured to pre-fetch, from the memory pool, data that needs to be read by any one of the at least two processing units, and cache the data in the cache unit.
Optionally, the prefetch engine may implement intelligent data prediction through a specific algorithm or an AI algorithm, so as to further improve the efficiency with which processing units access memory data.
In some possible implementations, the memory sharing control device further includes a quality of service Qos engine;
and the Qos engine is configured to optimize the storage, in the cache unit, of data that needs to be cached by any one of the at least two processing units. Through the Qos engine, the memory-access data of different processing units can receive different amounts of cache capacity in the cache unit. For example, a memory access request initiated by a high-priority processing unit can have exclusive cache space in the cache unit, which ensures that the data accessed by that processing unit is cached in time, thereby guaranteeing the quality of that processor's service processing.
In some possible implementations, the memory sharing control device further includes a compression/decompression engine;
the compression/decompression engine is used for realizing compression or decompression of data related to memory access.
Optionally, the compression/decompression engine may be disabled.
Optionally, the compression/decompression engine may compress the data written into the memory by a processing unit using a compression algorithm at a granularity of 4 KB per page before writing it into the memory; or, when a processing unit reads compressed data from the memory, decompress the data to be read before sending it to the processor. This improves the data transmission rate and further improves the efficiency with which processing units access memory data.
Optionally, the memory sharing control device may further include a storage unit, where the storage unit includes software code of at least one of the Qos engine, the pre-fetch engine, and the compression/decompression engine. The memory sharing control device can read the codes in the storage unit to realize corresponding functions.
Optionally, at least one of the Qos engine, the prefetch engine, and the compression/decompression engine may be implemented by control logic of the memory sharing control device.
In a fourth aspect, the present application provides a method for memory sharing control, where the method is applied to a computer device, where the computer device includes at least two processing units, a memory sharing control device, and a memory pool, where the memory pool includes one or more memories, and the method includes:
the memory sharing control device receives a first memory access request sent by a first processing unit of the at least two processing units, where a processing unit is a processor, a core in a processor, or a combination of cores in a processor;
the memory sharing control device allocates a first memory to the first processing unit from the memory pool, where the first memory can be accessed by a second processing unit of the at least two processing units in other time periods;
and the first processing unit accesses the first memory through the memory sharing control device.
By the method, different processing units access at least one memory in the memory pool in different time periods, the requirements of the processing units on the memory resources can be met, and the utilization rate of the memory resources is improved.
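Tying the three steps together, the sketch below walks through this flow under the same illustrative assumptions as the earlier sketches (fixed pool layout, 1 GB carve-outs); none of the constants are from the disclosure:

```c
/* Sketch: receive a unit's access request, allocate a first memory for it
 * from the pool if none is bound yet, then perform the access on its behalf.
 * Pool base, carve-out size, and table size are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

struct access_request { uint32_t unit_id; uint64_t offset; };

struct binding { uint32_t unit_id; uint64_t base; int valid; };
static struct binding bindings[8];
static uint64_t next_free = 0x100000000ULL; /* assumed pool cursor */

static uint64_t allocate_first_memory(uint32_t unit_id) {
    for (int i = 0; i < 8; i++) {
        if (bindings[i].valid && bindings[i].unit_id == unit_id)
            return bindings[i].base; /* already allocated */
        if (!bindings[i].valid) {
            bindings[i] = (struct binding){ unit_id, next_free, 1 };
            next_free += 1ULL << 30; /* carve out 1 GB per unit */
            return bindings[i].base;
        }
    }
    return 0; /* pool table exhausted */
}

void handle_request(struct access_request rq) {
    uint64_t base = allocate_first_memory(rq.unit_id);
    printf("unit %u accesses physical address %#llx\n",
           (unsigned)rq.unit_id, (unsigned long long)(base + rq.offset));
}

int main(void) {
    handle_request((struct access_request){ 1, 0x40 });
    handle_request((struct access_request){ 2, 0x80 });
    return 0;
}
```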
In some possible implementations, the method further includes:
the memory sharing control device receives a first memory access request sent by a first processing unit of the at least two processing units in a serial signal form through a serial bus, wherein the first memory access request is used for accessing a first memory allocated to the first processing unit.
In some possible implementations, the method further includes:
the memory sharing control device converts the first memory access request into a second memory access request in the form of a parallel signal, and implements the access to the first memory according to the second memory access request.
In some possible implementations, the method further includes:
the memory sharing control device establishes a corresponding relationship between a memory address of the first memory in the memory pool and a first processing unit in the at least two processing units.
In some possible implementations, the method further includes:
virtualizing a plurality of virtual memory devices from the memory pool by the memory sharing control device, wherein a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory;
the memory sharing control device is further configured to allocate the first virtual memory device to the first processing unit.
In some possible implementations, the method further includes:
when a preset condition is met, the memory sharing control device releases the corresponding relation between the first virtual memory device and the first processing unit, and establishes the corresponding relation between the first virtual memory device and a second processing unit of the at least two processing units.
In some possible implementations, the method further includes:
the memory sharing control device caches data read from the memory pool by any one of the at least two processing units, or caches obsolete data by any one of the at least two processing units.
In some possible implementations, the method further includes:
the memory sharing control device pre-fetches, from the memory pool, data to be read by any one of the at least two processing units, and caches the data.
In some possible implementations, the method further includes:
the memory sharing control device optimizes the storage, in the cache storage medium, of data that needs to be cached by any one of the at least two processing units.
In some possible implementations, the method further includes:
data associated with the memory access is compressed or decompressed.
In a fifth aspect, an embodiment of the present application further provides a chip, where the chip is configured to implement the function implemented by the memory sharing control device in the third aspect.
In a sixth aspect, the present application further provides a computer-readable storage medium including program code including instructions for performing some or all of the steps of any one of the methods provided in the fourth aspect.
In a seventh aspect, this application embodiment also provides a computer program product, which when run on a computer causes any one of the methods provided in the fourth aspect to be performed.
It should be understood that the memory sharing control device, computer-readable storage medium, and computer program product provided above are all used to execute the corresponding methods provided above; for their beneficial effects, reference may be made to the beneficial effects of the corresponding methods, which are not repeated here.
Drawings
The drawings required for describing the embodiments are briefly introduced below. It should be apparent that the drawings in the following description are merely examples of the invention, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1A is a diagram of a centralized shared memory system architecture;
FIG. 1B is a diagram illustrating a distributed shared memory system;
fig. 2A is a schematic structural diagram of a memory sharing control device 200 according to an embodiment of the present disclosure;
fig. 2B is a schematic diagram of the connection relationship between the memory sharing control device 200 and the processor and the memory;
fig. 3 is a schematic diagram of an internal structure of an SRAM-type FPGA according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a Serdes interface according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating an internal structure of a memory controller 500 according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a processor 210 according to an embodiment of the present application;
fig. 7A is a schematic structural diagram of a memory sharing control device 300 according to an embodiment of the present application;
fig. 7B is a schematic structural diagram of another memory sharing control device 300 according to an embodiment of the present disclosure;
fig. 7C is a schematic structural diagram of the memory sharing control device 300 according to the embodiment of the present application when the memory sharing control device includes a cache unit;
fig. 7D is a schematic structural diagram of the memory sharing control device 300 according to the embodiment of the present application when the memory sharing control device includes a storage unit;
fig. 7E is a schematic diagram illustrating a connection relationship structure between the memory sharing control device 300 and the memory pool according to an embodiment of the present application;
fig. 7F is a schematic structural diagram of another connection relationship between the memory sharing control device 300 and the memory pool according to the embodiment of the present application;
FIG. 8A-1 is a schematic diagram of a computer device 80a according to an embodiment of the present application;
FIG. 8A-2 is a schematic diagram of another computer device 80a according to an embodiment of the present disclosure;
FIG. 8B-1 is a schematic structural diagram of a computer device 80B according to an embodiment of the present disclosure;
FIG. 8B-2 is a schematic diagram of another computer device 80B according to an embodiment of the present disclosure;
fig. 9A is a schematic structural diagram of a system 901 according to an embodiment of the present application;
fig. 9B is a schematic structural diagram of a system 902 according to an embodiment of the present application;
fig. 9C is a schematic structural diagram of a system 903 according to an embodiment of the present disclosure;
fig. 10 is a logic diagram illustrating a memory sharing implementation according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a computer device 1100 according to an embodiment of the present application;
fig. 12 is a flowchart illustrating a method for memory sharing control according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
In the description and claims of this application, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of the steps appearing in the present application does not mean that the steps in the method flow have to be executed in the chronological/logical order indicated by the naming or numbering, and the named or numbered process steps may be executed in a modified order depending on the technical purpose to be achieved, as long as the same or similar technical effects are achieved. The division of the units presented in this application is a logical division, and in practical applications, there may be another division, for example, multiple units may be combined or integrated into another system, or some features may be omitted, or not executed, and in addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, and the indirect coupling or communication connection between the units may be in an electrical or other similar form, which is not limited in this application. Furthermore, the units or sub-units described as the separate parts may or may not be physically separate, may or may not be physical units, or may be distributed in a plurality of circuit units, and some or all of the units may be selected according to actual needs to achieve the purpose of the present disclosure.
It is to be understood that the terminology used in the description of the various described examples in the description and claims of this application is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the specification and claims of this application refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" is an associative relationship that describes an associated object, meaning that three relationships may exist, e.g., A and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present application generally indicates that the preceding and following related objects are in an "or" relationship.
It should be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also understood that the term "if" may be interpreted to mean "when" ("where" or "upon") or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined." or "if [ a stated condition or event ] is detected" may be interpreted to mean "upon determining.. or" in response to determining. "or" upon detecting [ a stated condition or event ] or "in response to detecting [ a stated condition or event ]" depending on the context.
It should be appreciated that reference throughout this specification to "one embodiment," "an embodiment," "one possible implementation" means that a particular feature, structure, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "one possible implementation" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
First, some terms and related technologies referred to in this application are explained to facilitate understanding:
The memory controller: an important component inside a computer system that controls the memory and implements data exchange between the memory and the processor; it is the bridge through which the central processing unit and the memory communicate. The main function of a memory controller is to perform read and write operations on the memory, and memory controllers can be roughly classified as conventional or integrated. In a conventional computer system, the memory controller is located in the north bridge chip of the motherboard chipset. Under this structure, data transfer between any CPU and the memory follows the path "CPU - north bridge - memory - north bridge - CPU"; because the CPU's reads and writes of memory data pass through multiple stages, the delay is large. An integrated memory controller is located inside the CPU, and data transfer between any CPU and the memory follows the path "CPU - memory - CPU", which greatly reduces the data transfer latency compared with the conventional memory controller.
DRAM: a widely used memory medium. Unlike disk media, which are accessed sequentially, DRAM allows the central processor direct random access to any byte in it. The storage structure of DRAM is simple: each memory cell consists mainly of a capacitor and a transistor. A charged capacitor represents a stored "1"; a fully discharged capacitor represents a "0".
PCM: a non-volatile memory that stores information in a phase change material. Each memory cell in a PCM is made of a phase change material (e.g., chalcogenide glass) and two electrodes. The phase change material can be switched between the crystalline and amorphous states by changing the voltage applied to the electrodes and the duration of the current. In the crystalline state the resistance is low; in the amorphous state it is high. Data can thus be stored by changing the state of the phase change material. The most characteristic property of PCM is non-volatility.
Serializer/deserializer (Serdes): converts parallel data into serial data at the sending end, transmits it to the receiving end over a transmission line, and converts the serial data back into parallel data at the receiving end, which reduces the number of transmission lines and hence the system cost. Serdes is a time division multiplexing (TDM), point-to-point communication technology in which multiple low-speed parallel signals (parallel data) at the transmitting end are converted into a high-speed serial signal (serial data) and finally converted back into low-speed parallel signals at the receiving end after passing through the transmission medium. Serdes uses differential signalling, so interference noise coupled onto the two differential transmission lines cancels out, which increases the transmission speed and improves signal quality. Parallel interface technology transmits multiple data bits at once together with a synchronous clock that delimits data bytes; it is simple and easy to implement, but requires many signal lines and is generally used for short-distance data transmission. Serial interface technology transmits byte data bit by bit and is widely used in long-distance data communication.
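Reduced to software terms purely for illustration, the essence of the conversion is shifting a parallel word out bit by bit and reassembling it on the other side; real Serdes performs this in dedicated high-speed circuitry with differential signalling:

```c
/* Sketch: software analogue of Serdes conversion, serializing a 32-bit
 * parallel word into a bit stream and recovering it on the receiving end. */
#include <stdint.h>
#include <stdio.h>

/* Serializer: emit one word as 32 single-bit symbols, MSB first. */
void serialize(uint32_t word, uint8_t bits[32]) {
    for (int i = 0; i < 32; i++)
        bits[i] = (uint8_t)((word >> (31 - i)) & 1u);
}

/* Deserializer: reassemble the parallel word from the bit stream. */
uint32_t deserialize(const uint8_t bits[32]) {
    uint32_t word = 0;
    for (int i = 0; i < 32; i++)
        word = (word << 1) | bits[i];
    return word;
}

int main(void) {
    uint8_t line[32];
    serialize(0xDEADBEEFu, line);
    printf("recovered %#x\n", deserialize(line)); /* prints 0xdeadbeef */
    return 0;
}
```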
As the state of the art of integrated circuits has continued to improve, and in particular as processor architecture design has continued to advance, processor performance has increased rapidly. By comparison, memory performance has improved much more slowly. As this imbalance accumulates, the consequence is that memory access speed lags seriously behind the computing speed of the processor, and the bottleneck formed by the memory makes it difficult to exploit the advantage of a high-performance processor. For example, with the growth of high performance computing (HPC), workloads are increasingly constrained by the access speed of the memory.
Meanwhile, multi-core processors have gradually replaced single-core processors, and the number of accesses to memory (such as off-chip memory, also referred to as main memory) issued by cores executing in parallel within a processor has also greatly increased. This results in a corresponding increase in the bandwidth required between the processor and the memory.
Improving the access speed and bandwidth between the processor and the memory is generally realized by sharing memory resources.
Multiprocessor shared memory architectures can be divided into centralized shared memory systems and distributed shared memory systems, according to whether all processors access the memory in the same way. A centralized shared memory system is characterized by a small number of processors and a single interconnection: the memory is connected to all the processors in a centralized manner through a crossbar switch or a shared bus. FIG. 1A shows a typical centralized shared memory system architecture. Because access to the memory is equal, or symmetric, across all processors, such an architecture is also referred to as a uniform memory access (UMA) architecture or symmetric multiprocessing (SMP).
Because a centralized shared memory system has a single memory system, once the number of processors reaches a certain scale it can no longer provide the required memory access bandwidth, which becomes a bottleneck restricting performance. The distributed shared memory system effectively solves this problem. Fig. 1B is a schematic structural diagram of a distributed shared memory system. As shown in fig. 1B, in this system the memory is globally shared, uniformly addressed, and distributed across the processors. The address space of the memory is divided into several parts, each managed by one of the processors. Illustratively, if processor 1 is to access the memory address space managed by itself, no cross-processor hop or interconnect bus is required; if processor 1 accesses a memory space managed by another processor, for example the memory address space managed by processor N, the access must go via the interconnect bus. A distributed shared memory system is also referred to as a non-uniform memory access (NUMA) system.
In a NUMA system, the address spaces of the shared memory are managed by the individual processors. Due to the lack of a unified memory management mechanism, when one processor needs to use memory space managed by another processor, sharing of memory resources is not flexible enough, and the utilization rate of memory resources is low. Moreover, when a processor accesses a memory address space not managed by itself, the delay introduced by the interconnect bus is often large.
Embodiments of the present application provide an apparatus, a chip, a computer device, a system, and a method for implementing memory sharing control, offering a brand new memory access architecture in which a memory sharing control apparatus establishes a bridge for access between processors and a shared memory pool (which may also be referred to as a memory pool in this embodiment), so as to improve the utilization rate of memory resources.
Fig. 2A is a schematic structural diagram of a memory sharing control device 200 according to an embodiment of the present disclosure. As shown in fig. 2A, the memory sharing control apparatus 200 includes: a control unit 201, a processor interface 202 and a memory interface 203.
The memory sharing control device 200 may be a chip located between a processor (CPU or a core in the CPU) and a memory (also referred to as a main memory) in a computer device, and may be an FPGA chip, for example.
Fig. 2B is a schematic diagram of the connection relationship between the memory sharing control device 200 and the processor and the memory. As shown in fig. 2B, the processor 210 is connected to the memory sharing control device 200 through the processor interface 202, and the memory 220 is connected to the memory sharing control device 200 through the memory interface 203. The processor 210 may be a CPU or a CPU including a plurality of cores. Memory 220 includes, but is not limited to, DRAM, PCM, flash, SCM, SRAM, PROM, EPROM, STT-RAM, or RRAM. SCM (storage-class memory) is a hybrid storage technology that combines characteristics of both traditional storage devices and memory: it provides faster read and write speeds than a hard disk, but operates more slowly and costs less than DRAM. In the embodiment of the present application, the memory 220 may further include a DIMM or an SSD.
The processor interface 202 is the interface between the memory sharing control device 200 and the processor 210, and is capable of receiving a serial signal transmitted by the processor and converting it into a parallel signal. Based on the processor interface 202, the memory sharing control device 200 and the processor 210 may be connected through a serial bus. The serial bus has the characteristics of high bandwidth and low latency, which ensures the efficiency of data transmission between the processor 210 and the memory sharing control device 200. Illustratively, the processor interface 202 may be a low-latency Serdes interface. As the processor interface 202, the Serdes interface is connected to the processor via a serial bus and converts between serial and parallel signals based on serial-parallel conversion logic. The serial bus may be a memory semantic bus, including but not limited to buses interconnected based on the QPI, PCIe, HCCS, or CXL protocols.
In a particular implementation, the processor 210 may be connected to a serial bus via a Serdes interface and, through that serial bus, to the processor interface 202 (e.g., a Serdes interface) of the memory sharing control device 200. A memory access request initiated by the processor 210 is in the form of a parallel signal; the Serdes interface in the processor 210 converts it into a memory access request in the form of a serial signal, which is transmitted over the serial bus. After receiving the memory access request in the form of a serial signal from the processor 210 via the serial bus, the processor interface 202 converts it back into a memory access request in the form of a parallel signal and sends the converted request to the control unit 201. The control unit 201 may then access the corresponding memory based on the memory access request in the form of a parallel signal, for example accessing the memory in a parallel manner. In the embodiment of the present application, a parallel signal refers to a signal that transmits multiple bits at a time, and a serial signal refers to a signal that transmits one bit at a time.
Likewise, when the memory sharing control device 200 returns a response message for a memory access request to the processor 210, the response message in the form of a parallel signal is converted into a response message in the form of a serial signal through the processor interface 202 (e.g., a Serdes interface), and the response message in the form of a serial signal is transmitted to the processor 210 through the serial bus. After receiving the response message in the form of a serial signal, the processor 210 converts it into a parallel signal and then performs subsequent processing.
The memory sharing control device 200 may access the corresponding memory in memory 220 through the memory interface 203, which acts as a memory controller. For example, when the memory 220 is a shared memory pool composed of DRAM, the memory interface 203 is a DDR controller with DRAM control functions, used to implement interface control of the DRAM storage medium; when the memory 220 is a shared memory pool composed of PCM, the memory interface 203 is a memory controller with PCM control functions, used to implement interface control of the PCM storage medium.
It should be noted that showing one processor 210 in fig. 2B is only an example; the processor connected to the memory sharing control device 200 may also be a multi-core processor or a processor resource pool. The processor resource pool includes at least two processing units, where each processing unit may be a processor, a core within a processor, or a combination of cores. The cores combined into one processing unit may belong to the same processor or to different processors. When a processor executes certain tasks, multiple cores, possibly spread across different processors, are needed to execute the computing task in parallel. When cores execute a computing task in parallel, their combination can act as one processing unit and access the same block of memory in the shared memory pool.
The memory 220 shown in fig. 2B is likewise only an example; the memory connected to the memory sharing control apparatus 200 may also be a shared memory pool formed by multiple memories. At least one memory in the shared memory pool is accessible to different processing units in different time periods. The memory in the shared memory pool includes, but is not limited to, DRAM, PCM, flash, STT-RAM, or RRAM. Similarly, the memories in the shared memory pool may reside on one computer device or on different computer devices. It should be understood that the computer device may be any device in which a processor accesses a memory, such as a computer (desktop or laptop) or a server, and may also be a terminal device, such as a mobile phone terminal; the embodiment of the present application does not limit the specific form of the device.
The control unit 201 is configured to control memory access according to memory access requests, including but not limited to dividing the memory resources in the shared memory pool into multiple independent memory resources and allocating them (for example, on demand) to the processing units in the processor resource pool. An independent memory resource partitioned by the control unit 201 may be a memory storage space corresponding to a segment of physical addresses in the shared memory pool; the physical addresses of a memory resource may be contiguous or non-contiguous. For example, the memory sharing control device 200 may virtualize multiple virtual memory devices on top of the shared memory pool, where each virtual memory device corresponds to, or manages, a portion of the memory resources. By establishing correspondences between different virtual memory devices and processing units, the control unit 201 allocates the independent memory resources partitioned from the shared memory pool to the processing units in the processor resource pool.
However, the correspondence between processing units and memory resources is not fixed; it may be adjusted when certain conditions are met. That is, the correspondence between processing units and memory resources can be dynamically adjusted. The control unit 201 may adjust the correspondence by receiving a control instruction sent by a driver in the operating system and carrying out the adjustment according to that instruction. The control instruction includes information for deleting, changing, or adding a correspondence.
Illustratively, a computer device 20 (not shown) includes the processor 210, the memory sharing control device 200, and the memory 220 shown in fig. 2B. The processor 210 runs the operating system of the computer device 20 and controls the computer device 20. Assume that the computer device 20 is a server providing cloud services and that the processor 210 has 8 cores, where core A provides cloud services for user A and core B provides cloud services for user B. Based on the service needs of user A and user B, the operating system of the computer device allocates memory resource A in the memory 220 to core A, and memory resource B to core B, as resources for memory access. The operating system may send the memory sharing control device 200 control instructions for establishing the correspondence between core A and memory resource A and between core B and memory resource B, and the memory sharing control device 200 establishes these correspondences according to the instructions. In this way, when core A initiates a memory access request, the memory sharing control device 200 can determine, from the information carried in the request, the memory resource core A may access (i.e., memory resource A) and carry out core A's access to it. Suppose that, due to business requirements or time zones, user A rests and no longer uses the cloud service, reducing the demand on memory resources, while user B needs more cloud services and therefore more memory. Based on this change in the business requirements of users A and B, the operating system of the computer device 20 may issue control instructions to dissolve the correspondence between core A and memory resource A and to allocate memory resource A to core B. Specifically, a driver in the operating system may send a control instruction for deleting the correspondence between core A and memory resource A and establishing a correspondence between core B and memory resource A; the memory sharing control device 200 then reconfigures memory resource A accordingly, deleting the old correspondence and establishing the new one. In this way, memory resource A serves as memory for core A and core B in different time periods, the differing service needs of different cores are met, and the utilization rate of memory resources is improved.
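The control instructions in the above example can be pictured as small messages naming an operation and a (processing unit, memory resource) pair. The following C sketch is a hypothetical rendering only: the struct layout, names, and the apply_mapping() helper are assumptions for illustration, not the actual driver interface.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical control instruction sent by the OS driver. */
    enum map_op { MAP_ADD, MAP_DELETE, MAP_CHANGE };

    struct map_ctrl_insn {
        enum map_op op;          /* delete, change, or add a correspondence */
        uint32_t    resource_id; /* processing unit, e.g. core A or core B  */
        uint32_t    mem_res_id;  /* memory resource, e.g. memory resource A */
    };

    /* Assumed hook standing in for the device's internal table update. */
    extern bool apply_mapping(const struct map_ctrl_insn *insn);

    /* Re-create the example: delete (core A, memory resource A), then
     * add (core B, memory resource A). */
    void reassign_resource_a(uint32_t core_a, uint32_t core_b, uint32_t res_a) {
        struct map_ctrl_insn del = { MAP_DELETE, core_a, res_a };
        struct map_ctrl_insn add = { MAP_ADD,    core_b, res_a };
        apply_mapping(&del);
        apply_mapping(&add);
    }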
The driver of the operating system may send the control instruction to the memory sharing control device 200 through a dedicated or specific channel. Specifically, when the processor running the operating system is in a privileged mode, the driver can send the control instruction to the memory sharing control device 200 through the dedicated or specific channel. In this way, the driver in the operating system can send control instructions for deleting, changing, or adding correspondences through the dedicated channel.
The memory sharing control device 200 may be connected to the processor 210 through an interface supporting serial-parallel conversion (e.g., a Serdes interface), and the processor 210 may communicate with the memory sharing control device 200 through a serial bus. Owing to the serial bus's high bandwidth and low latency, the access rate of the processor 210 to the shared memory pool can be guaranteed even when the processor 210 communicates with the memory sharing control device 200 over a long distance.
Moreover, the control unit 201 may also be configured to implement cache control, compression control, or priority control of data, thereby further improving the efficiency and quality of the processor's memory accesses.
The following takes an FPGA as an example of a chip implementing the memory sharing control device 200, and exemplarily describes an implementation of the memory sharing control device 200 provided in the embodiment of the present application.
As programmable logic devices, FPGAs can be classified into three categories according to their programming principle: static random access memory (SRAM) based, antifuse, and flash. Due to the erasability and volatility of SRAM, SRAM-based FPGAs are reprogrammable but lose their configuration data on power-down. An antifuse FPGA can be programmed only once; after programming, the circuit function is fixed and cannot be modified again, and it is retained even in the power-off state.
Taking an SRAM-based FPGA as an example, the internal structure of an FPGA is exemplarily described below. Fig. 3 is a schematic diagram of the internal structure of an SRAM-type FPGA. As shown in fig. 3, the FPGA includes at least the following parts:
Programmable logic block (CLB): internally consists mainly of programmable resources such as lookup tables (LUTs), multiplexers, carry chains, and D flip-flops. It is used to implement different logic functions and is the core of the entire FPGA chip.
Programmable input/output block (IOB): provides the interface between the FPGA and external circuits, supplying appropriate drive for input/output signals to achieve matching when the electrical characteristics inside and outside the FPGA differ. Through electronic design automation (EDA) development software, different electrical standards and physical parameters can be configured as required, for example adjusting the drive current or changing the resistance of pull-up and pull-down resistors. Typically several IOBs are grouped into a bank; the number of IOBs per bank varies between FPGA chip families.
Embedded block memory (BRAM): a storage unit used for holding larger volumes of data. To meet different read and write requirements, it can be configured as a single-port RAM, dual-port RAM, content addressable memory (CAM), first in first out (FIFO) queue, or other common storage structure, and its bit width and depth can be changed according to design requirements. BRAM broadens the range of applications of the FPGA and improves its flexibility.
Switch matrix (SM): an important part of the FPGA's internal interconnect resources (IR), mainly distributed at the left end of each resource module. The switch matrices at the left ends of different modules are very similar but not identical, and they serve to connect the resources of each module. The other part of the interconnect resources in the FPGA are the interconnect metal wires (wire segments); used together with the SMs, they connect the resources of the whole chip.
Fig. 3 only shows a few main components of the FPGA chip related to the implementation of the memory sharing control device 200 in the embodiment of the present application. In a specific implementation, besides the components shown in fig. 3, the FPGA may further include other components or embedded functional units, such as a Digital Signal Processor (DSP), a Phase Locked Loop (PLL), a Multiplier (MUL), or the like.
The control unit 201 in fig. 2A or fig. 2B may be implemented by CLB in fig. 3. That is, the CLB is used to control the shared memory pool connected to the memory sharing control device 200, for example, the memory resources in the shared memory pool are divided into a plurality of blocks, and one or more blocks of memory resources are allocated to one processing unit; or virtualizing a plurality of virtual memory devices based on memory resources in the shared memory pool, wherein each virtual memory device corresponds to a physical address space in a segment of the shared memory pool, allocating one or more virtual memory devices to one processing unit, and establishing a correspondence table between the allocated virtual memory devices and the corresponding processing units.
The processor interface 202 in fig. 2A or fig. 2B may be implemented by the IOB in fig. 3, that is, an interface with a serial-to-parallel conversion function may be implemented by the IOB, for example, a Serdes interface is implemented by the IOB.
Fig. 4 is a schematic diagram of a specific structure of a Serdes interface. As shown in fig. 4, the Serdes interface mainly consists of a transmit channel and a receive channel. In the transmit channel, the input parallel data is encoded by an encoder, converted into a serial signal by a parallel-to-serial module, and output as serial data driven by a transmitter (Tx). In the receive channel, a receiver and a clock recovery circuit recover the sampling clock and the data, a serial-to-parallel module finds the byte boundaries and converts the stream into parallel data, and finally a decoder recovers the original parallel data.
The encoder and decoder perform the data encoding and decoding, ensuring DC balance of the serial data and, as far as possible, sufficient data transitions. For example, 8b/10b encoding or scrambling/descrambling schemes may be used. The parallel-to-serial and serial-to-parallel modules convert data between parallel and serial forms. The clock generation circuit produces the conversion clock for the parallel-to-serial circuit and is typically implemented with a phase-locked loop. The clock recovery circuit provides the conversion control signal for the serial-to-parallel circuit; it is likewise generally implemented with a phase-locked loop, though a phase interpolator or similar circuit may also be used.
The above description describes an implementation of a Serdes interface by way of example only, i.e., the Serdes interface shown in fig. 4 may be implemented based on the IOB in fig. 3. Of course, the functions of the Serdes interface may also be implemented based on other hardware components, for example, by using other special hardware components in the FPGA, and the embodiment of the present application does not limit the specific implementation form of the Serdes interface.
The memory interface 203 in fig. 2A or fig. 2B may be implemented based on the IOB or other special circuits. For example, when the memory interface 203 is implemented by a DDR controller, the logic structure thereof may be as shown in fig. 5.
Fig. 5 is a schematic diagram of an internal structure of a memory controller 500. Referring to fig. 5, a memory controller 500 includes:
The receiving module 501: records access request information. An access request mainly comprises the type of the request and the address of the request; from these two pieces of information, it is known what operation the access request is to perform on which memory address. The information recorded by the receiving module 501 may therefore include the type and address of the request, and may also include auxiliary information used to estimate system performance, such as the arrival time and completion time of the access request.
The control module 502: controls initialization, power-down, and similar operations of the memory. The control module 502 may also control the depth of the memory queues used for memory access control, determine whether a queue is empty or full, determine whether a memory request is complete, and decide which arbitration scheme and which scheduling method to use.
The address mapping module 503: converts between the address of an access request and an address recognizable by the memory. Illustratively, for a DDR4 memory system, the memory address is divided into six parts: channel, rank, bank group, bank, row, and column. Different address mapping modes yield different access efficiencies.
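As an illustration of such address mapping, the C sketch below slices a physical address into the six DDR4 fields. The field widths chosen (1-bit channel, 1-bit rank, 2-bit bank group, 2-bit bank, 16-bit row, 10-bit column) and the field order are assumptions for the sketch; real mappings depend on DIMM geometry and the interleaving policy.

    #include <stdint.h>

    /* Illustrative DDR4 address decomposition; field widths are assumed. */
    struct ddr4_addr {
        uint32_t channel, rank, bank_group, bank, row, column;
    };

    struct ddr4_addr map_address(uint64_t phys) {
        struct ddr4_addr a;
        a.column     = phys & 0x3FF;   phys >>= 10; /* 10-bit column     */
        a.bank       = phys & 0x3;     phys >>= 2;  /*  2-bit bank       */
        a.bank_group = phys & 0x3;     phys >>= 2;  /*  2-bit bank group */
        a.rank       = phys & 0x1;     phys >>= 1;  /*  1-bit rank       */
        a.channel    = phys & 0x1;     phys >>= 1;  /*  1-bit channel    */
        a.row        = phys & 0xFFFF;               /* 16-bit row        */
        return a;
    }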
The refresh module 504: implements timed refresh of the memory. A DRAM is internally composed of many repeated cells, each consisting of a transistor (MOSFET) and a capacitor. The capacitor stores charge and determines whether the logic state of a DRAM cell is 1 or 0. Because the capacitor leaks, charge is gradually lost and data would be lost with it, so the refresh module 504 must refresh the memory periodically.
The scheduling module 505: dispatches the access requests sent by the address mapping module 503 into different queues based on the request type. For example, an access request may be placed into the queue matching its priority, and a request is then selected from the highest-priority non-empty queue according to a preset scheduling policy, completing one scheduling decision. Here the queues are the queues used for memory access control; the scheduling policy may be determined by arrival order, with earlier-arriving requests given higher priority, or by which request is ready first, and so on.
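A minimal C sketch of the arrival-time policy described above, under assumed data structures (fixed per-priority queues, scan-based selection): scan from the highest-priority queue downward and, within a queue, pick the earliest-arriving valid request.

    #include <stddef.h>
    #include <stdint.h>

    #define NUM_PRIO    4   /* number of priority queues (assumed) */
    #define QUEUE_DEPTH 16  /* entries per queue (assumed)         */

    struct mem_request {
        uint64_t addr;
        uint64_t arrival_time;
        int      is_write;
        int      valid;
    };

    /* One queue per priority level; index 0 is the highest priority. */
    static struct mem_request queues[NUM_PRIO][QUEUE_DEPTH];

    /* Scan from the highest-priority queue down; within a queue, pick
     * the earliest-arriving valid request (first come, first served). */
    struct mem_request *schedule_next(void) {
        for (int p = 0; p < NUM_PRIO; p++) {
            struct mem_request *best = NULL;
            for (int i = 0; i < QUEUE_DEPTH; i++) {
                struct mem_request *r = &queues[p][i];
                if (r->valid && (!best || r->arrival_time < best->arrival_time))
                    best = r;
            }
            if (best)
                return best;
        }
        return NULL; /* all queues empty */
    }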
It should be noted that fig. 5 only shows some components or functional blocks of the memory controller 500; in this embodiment, the memory controller 500 may further include other components or functional blocks, for example a vila engine for multi-thread computing or a direct memory access (DMA) module, which are not described in detail herein.
The above describes an implementation of the memory sharing control device 200 taking an FPGA as an example. In a specific implementation, the memory sharing control device 200 may also be implemented by other chips or other devices capable of implementing chip-like functions. For example, the memory sharing control device 200 may also be implemented by an ASIC. An ASIC's circuit function is fixed at design time, and the chip features high integration, easy mass production, low cost, and small size. The embodiment of the present application does not limit the specific hardware implementation of the memory sharing control device 200.
In the embodiment of the present application, the processor connected to the memory sharing control device 200 may be any processor that implements processor functions. Fig. 6 is a schematic structural diagram of a processor 210 according to an embodiment of the present disclosure. As shown in fig. 6, the processor 210 includes a core 601, a memory 602, and peripherals and interfaces 603. Core 601 may include more than one core for implementing the functions of the processor 210. Fig. 6 illustrates two cores (core 1 and core 2) as an example, but this does not limit the number of cores; the processor 210 may also include 4, 8, or 16 cores. The memory 602 includes a cache or SRAM for caching data read and written by core 1 or core 2. Peripherals and interfaces 603 include a Serdes interface 6031, a memory controller 6032, input/output interfaces, a power supply, and a clock. The Serdes interface 6031 is the interface between the processor 210 and the serial bus: a memory access request in the form of a parallel signal initiated by the processor 210 is converted into a serial signal by the Serdes interface 6031 and then transmitted to the memory sharing control device 200 through the serial bus. The memory controller 6032 may function similarly to the memory controller shown in fig. 5; when the processor 210 has a local memory under its control, the processor 210 may control access to that local memory via the memory controller 6032.
It is to be understood that fig. 6 illustrates one implementation of a processor. The embodiment of the present application does not limit the specific structure or form of the processor connected to the memory sharing control device 200; any processor that can implement a computing or control function falls within the scope of the disclosure of the embodiment of the present application.
The following further describes a specific implementation of the memory sharing control device provided in the embodiment of the present application.
Fig. 7A is a schematic structural diagram of a memory sharing control device 300 according to an embodiment of the present disclosure. As shown in fig. 7A, the memory sharing control apparatus 300 includes a control unit 301, a processor interface 302, and a memory interface 303. The specific implementation of the memory sharing control device 300 shown in fig. 7A may refer to the implementation of the memory sharing control device 200 shown in fig. 2A or fig. 2B, or to the FPGA shown in fig. 3. Specifically, the control unit 301 may be implemented with reference to the control unit 201 in fig. 2A or fig. 2B, for example by the CLB shown in fig. 3; the processor interface 302 may be implemented with reference to the Serdes interface shown in fig. 4; and the memory interface 303 may be implemented with reference to the memory controller shown in fig. 5, which are not described again.
Specifically, the control unit 301 in fig. 7A may be configured to implement the following functions:
1, virtualizing a plurality of virtual memory devices based on the memory resources connected to the memory sharing control device 300;
The memory resources connected to the memory sharing control device 300 constitute a shared memory pool. The control unit 301 may uniformly address the memory resources in the shared memory pool and divide the uniformly addressed physical address space into multiple address segments, each corresponding to one virtual memory device. The address spaces of the divided segments may be of the same or different sizes; that is, the virtual memory devices may be of the same or different sizes.
A virtual memory device is not a physically existing device. Rather, it is the control unit 301's designation of a segment of memory address space in the shared memory pool that is allocated to a processing unit (a processor, a core in a processor, a combination of cores in one processor, or a combination of cores across different processors) for memory access (e.g., reads and writes); this segment is referred to as a virtual memory device. Illustratively, each virtual memory device corresponds to a memory region with consecutive physical addresses; optionally, a virtual memory device may also correspond to a non-contiguous physical address space.
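The carve-up of a uniformly addressed pool into virtual memory devices might be sketched in C as below. The equal-size, contiguous split is an assumption for simplicity; as noted above, segments may differ in size and may even be non-contiguous.

    #include <stdint.h>

    #define MAX_VDEV 64 /* maximum virtual memory devices (assumed) */

    /* A virtual memory device: a named window onto the pool's
     * unified physical address space. */
    struct virtual_mem_device {
        uint32_t device_id;
        uint64_t base; /* start physical address in the shared pool */
        uint64_t size; /* bytes managed by this virtual device      */
    };

    static struct virtual_mem_device vdevs[MAX_VDEV];

    /* Split a pool of pool_size bytes into n equal virtual devices. */
    int create_vdevs(uint64_t pool_base, uint64_t pool_size, uint32_t n) {
        if (n == 0 || n > MAX_VDEV)
            return -1;
        uint64_t stride = pool_size / n;
        for (uint32_t i = 0; i < n; i++) {
            vdevs[i].device_id = i;
            vdevs[i].base = pool_base + (uint64_t)i * stride;
            vdevs[i].size = stride;
        }
        return 0;
    }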
The control unit 301 may assign an identifier to each virtual memory device to distinguish different virtual memory devices. FIG. 7A illustrates two virtual memory devices, virtual memory device a and virtual memory device b, which correspond to different memory address spaces in the shared memory pool.

2, allocating the virtualized virtual memory devices to the processing units connected to the memory sharing control device 300;
The control unit 301 may assign a virtual memory device to a processing unit. To avoid potentially complex logic or possible traffic storms, the control unit 301 by default avoids allocating one virtual memory device to multiple processors, or to multiple cores within one processor. However, when some services need different cores in the same processor, or in different processors, to execute computing tasks in parallel, the memory corresponding to one virtual memory device can be allocated to that combination of cores through more complex logic, which improves the efficiency of service processing during parallel computation.
The control unit 301 may allocate a virtual memory device by establishing a correspondence between the identifier of the virtual memory device and the identifier of a processing unit. Illustratively, the control unit 301 establishes correspondences between virtual memory devices and different processing units based on the number of processing units connected to the memory sharing control device 300. Optionally, the control unit 301 may also first establish correspondences between processing units and virtual memory devices, and then between virtual memory devices and different memory resources, thereby establishing correspondences between processing units and different memory resources.
3, recording the correspondence between the virtual memory devices and the processing units to which they are allocated;
In a specific implementation, the control unit 301 may maintain an access control table (also referred to as a mapping table) to record the correspondence between virtual memory devices and processing units. One implementation of the access control table may be as shown in Table 1:
Device_ID | Address | Size | Access Attribute | Resource ID

TABLE 1
In Table 1, Device_ID is the identifier of a virtual memory device; Address is the start address of the physical memory addresses managed or accessible by the virtual memory device; Size is the size of the memory it manages or can access; Access Attribute indicates the access mode, i.e., read operation or write operation; and Resource ID is the identifier of the processing unit.
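Expressed as a data structure, one row of Table 1 might look like the following C struct; the field types and names are assumptions based on the field descriptions above.

    #include <stdint.h>

    /* One row of the access control (mapping) table of Table 1. */
    struct access_control_entry {
        uint32_t device_id;   /* Device_ID: virtual memory device      */
        uint64_t address;     /* Address: start of the physical region */
        uint64_t size;        /* Size: bytes managed by the device     */
        uint8_t  access_attr; /* Access Attribute: read and/or write   */
        uint32_t resource_id; /* Resource ID: owning processing unit   */
    };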
The Resource ID in Table 1 typically corresponds to one processing unit. Since a processing unit may be one processor, one core in a processor, a combination of multiple cores in one processor, or a combination of cores across different processors, the control unit 301 may additionally maintain a table mapping each Resource ID to its combination of cores, making explicit which core(s) or processor each processing unit corresponds to. For example, Table 2 shows an exemplary correspondence of Resource IDs to cores:
Resource ID | Core combination

TABLE 2
Because cores in different processors have unified identifiers within a computer device, the core IDs in Table 2 can distinguish different cores in different processors. It is to be understood that Table 2 merely shows, by way of example, the correspondence between the Resource ID of a processing unit and the corresponding core(s) or processor. The embodiment of the present application does not limit the manner in which the memory sharing control device 300 determines this correspondence.
In another implementation, if the memory connected to the memory sharing control device 300 includes both DRAM and PCM, then, owing to the non-persistent nature of the DRAM storage medium and the persistent nature of the PCM storage medium, the access control table maintained by the control unit 301 may additionally record whether each virtual memory device is persistent or non-persistent.
Table 3 is another implementation manner of the access control table provided in the embodiment of the present application:
Device_ID | Address | Size | Access Attribute | Persistent Attribute | Resource ID

TABLE 3
In Table 3, Persistent Attribute represents the persistence attribute of the virtual memory device, i.e., whether the memory address space corresponding to the virtual memory device is persistent or non-persistent.
Optionally, the access control table maintained by the control unit 301 may further include other information for implementing further memory access control. Illustratively, the access control table may further include authority information of the processing unit accessing the virtual memory device, where the authority information includes, but is not limited to, read-only access or write-only access.
4, when a memory access request sent by a processing unit is received, determining the virtual memory device corresponding to the processing unit that sent the request based on the correspondence between virtual memory devices and processing units recorded in the access control table, and accessing the corresponding memory based on the determined virtual memory device;
For example, a memory access request includes information such as the RESOURCE_ID of the processing unit, address information, and an access attribute. The RESOURCE_ID may be the ID of a combination of cores, the address information indicates the address of the memory to be accessed, and the access attribute indicates whether the request is a read or a write. The control unit 301 may look up the access control table (e.g., Table 1) by RESOURCE_ID to determine at least one corresponding virtual memory device. For example, if the determined virtual memory device is virtual memory device a in fig. 7A, it can be determined from Table 2 that the core(s) corresponding to the RESOURCE_ID can access the memory resources managed by virtual memory device a. Combining the address information and access attribute in the request, the control unit 301 then performs memory access control over the memory address space managed or accessible by virtual memory device a. Optionally, when the access control table records permission information, the control unit 301 may further control the corresponding processing unit's access to the memory based on that permission information.
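The lookup-and-check flow of this paragraph can be sketched in C as below (repeating the table-entry struct from the earlier sketch so the fragment is self-contained); the linear table walk and the bounds/permission checks are illustrative assumptions.

    #include <stddef.h>
    #include <stdint.h>

    #define ATTR_READ  0x1
    #define ATTR_WRITE 0x2

    struct access_control_entry {
        uint32_t device_id;
        uint64_t address;
        uint64_t size;
        uint8_t  access_attr;
        uint32_t resource_id;
    };

    /* Assumed: the table is populated elsewhere by the control unit. */
    extern struct access_control_entry acl_table[];
    extern size_t acl_table_len;

    /* Return an entry that lets resource_id access [addr, addr+len)
     * with the requested attribute, or NULL if none permits it. */
    struct access_control_entry *check_access(uint32_t resource_id,
                                              uint64_t addr, uint64_t len,
                                              uint8_t attr) {
        for (size_t i = 0; i < acl_table_len; i++) {
            struct access_control_entry *e = &acl_table[i];
            if (e->resource_id != resource_id)
                continue;
            if (addr >= e->address &&
                addr + len <= e->address + e->size &&
                (e->access_attr & attr) == attr)
                return e; /* request falls inside an allowed region */
        }
        return NULL;
    }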
It should be noted that the access control that the control unit 301 performs on a virtual memory device is, in effect, memory access control over the physical address space of the memory resource corresponding to that virtual memory device.
And 5, dynamically adjusting the corresponding relation between the virtual memory device and the processing unit.
The control unit 301 may dynamically adjust the virtual memory devices by changing the correspondence between processing units and virtual memory devices in the access control table based on preset conditions (e.g., the differing memory resource requirements of different processing units). For example, deleting the correspondence between a virtual memory device and a processing unit releases the memory resource behind that virtual memory device, and the released resource may then be allocated to other processing units for memory access. This may be implemented with reference to the manner in which the control unit 201 in fig. 2B dynamically deletes, changes, or adds correspondences.
In an optional implementation, the adjustment of the correspondence between processing units and memory resources in the shared memory pool may also be achieved by changing the memory resources behind each virtual memory device. For example, when the service handled by one processing unit is dormant and does not occupy much memory, memory resources managed by that processing unit's virtual memory device may be reassigned to the virtual memory device of another processing unit, so that the same memory resource is accessed by different processing units in different time periods.
Illustratively, when the memory sharing control device 300 is implemented by an FPGA chip shown in fig. 3, the CLB in fig. 3 may be configured to implement the functions of the control unit 301.
It should be noted that the control unit 301's virtualizing of multiple virtual memory devices, allocating them to the processing units connected to the memory sharing control device 300, and dynamically adjusting the correspondence between virtual memory devices and processing units may all be carried out according to control instructions sent, through a dedicated channel, by a driver of the operating system. That is, a driver in the operating system of the computer device where the memory sharing control device 300 resides sends, through the dedicated channel, instructions for virtualizing the virtual memory devices, allocating them to processing units, and dynamically adjusting the correspondences; the control unit 301 implements the corresponding function according to the received control instruction.
Because the memory sharing control device 300 is connected to the processor through a serial bus via a serial-parallel conversion interface (such as a Serdes interface), it supports long-distance transmission between the processor and the memory sharing control device 300 while preserving the speed at which the processor accesses memory, so the processor can quickly access the memory resources in the shared memory pool. And because the memory resources in the shared memory pool can be allocated to different processing units at different times for memory access, the utilization rate of the memory resources is improved.
For example, the control unit 301 in the memory sharing control apparatus 300 can dynamically adjust the correspondence between virtual memory devices and processing units: when the amount of memory space required by a processing unit increases, virtual memory devices that are unoccupied, or that were allocated to other processing units but are temporarily idle, can be reassigned to the processing unit whose memory demand has grown, i.e., a correspondence is established between the idle virtual memory devices and the demanding processing unit. In this way, existing memory resources are used effectively to meet the processing units' varying service requirements, memory space demands are satisfied under different service scenarios, and the utilization rate of memory resources is improved.
Fig. 7B is a schematic structural diagram of another memory sharing control device 300 according to an embodiment of the present disclosure. The memory sharing control device 300 shown in fig. 7B includes, in addition to the components shown in fig. 7A, a cache unit 304.
The cache unit 304 may be a random access memory (RAM) used to cache the data that processing units access in memory. For example, data that a processing unit will read can be fetched from the shared memory pool in advance and cached in the cache unit 304, so that the processing unit accesses it quickly, further improving the read rate. The cache unit 304 may also cache data evicted from the processing unit, such as Cacheline data. Through the cache unit 304, the speed at which processing units access memory data can be further improved.
In an optional implementation, the cache unit 304 may include a first-level cache and a second-level cache. As shown in fig. 7C, the cache unit 304 in the memory sharing control device 300 includes a first-level cache 3041 and a second-level cache 3042, wherein:
the first-level cache 3041 may be a small-capacity (e.g., on the order of 100 MB) cache, for example a nanosecond-latency SRAM medium, that caches the Cacheline data evicted from the processing unit.
The second-level cache 3042 may be a high-capacity (e.g., on the order of 1 GB) cache, for example a DRAM medium. The second-level cache 3042 may cache, at a granularity of 4 KB per page, Cacheline data evicted from the first-level cache, as well as data prefetched from memory 220 (e.g., a DDR or PCM medium). Cacheline data is data held in a cache. For example, the cache in the cache unit 304 is composed of three parts, a valid bit, tag bits, and data bits; each line contains these three items, and one line of data constitutes one Cacheline. When a processing unit initiates a memory access request, the request is matched against the corresponding bits of the cache, and the cached data is read from, or written into, the cache.
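The valid/tag/data decomposition of a cache line described above can be written down directly; in this C sketch the 64-byte line size is an assumption.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE 64 /* bytes per cache line (assumed) */

    /* A cache line as described: valid bit, tag bits, data bits. */
    struct cache_line {
        bool     valid;           /* is this line holding live data? */
        uint64_t tag;             /* identifies the memory block     */
        uint8_t  data[LINE_SIZE]; /* the cached data itself          */
    };

    /* A lookup hits when the line is valid and its tag matches the
     * tag bits of the requested address. */
    static inline bool cache_hit(const struct cache_line *l, uint64_t tag) {
        return l->valid && l->tag == tag;
    }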
Illustratively, when the memory sharing control device 300 is implemented by the FPGA chip shown in fig. 3, the function of the cache unit 304 may be implemented by configuring a BRAM in fig. 3, or the functions of the first-level cache 3041 and the second-level cache 3042 may be implemented by configuring a BRAM in fig. 3.
With the cache unit 304 comprising the first-level cache 3041 and the second-level cache 3042, the data access speed of the processing unit is increased through caching, the cache space is enlarged, the range of memory that processing units can access quickly through the cache is extended, and the speed at which the processor resource pool as a whole accesses memory is further increased.
Fig. 7D is a schematic structural diagram of another memory sharing control device 300 according to an embodiment of the present disclosure. As shown in fig. 7D, the memory sharing control device 300 further includes a storage unit 305. The storage unit 305 may be a volatile memory, such as a RAM, and may also include a non-volatile memory, such as a read-only memory (ROM) or flash memory. The storage unit 305 stores programs or instructions readable by the control unit 301, such as program code containing at least one process or at least one thread. The control unit 301 implements the corresponding control by executing the program code in the storage unit 305.
The program code stored in the storage unit 305 may include at least one of a QoS engine 306, a prefetch engine 307, and a compression/decompression engine 308. Fig. 7D draws these engines outside the control unit 301 and the storage unit 305 only for convenience of display; this does not mean the engines are located outside them. In a specific implementation, each engine is realized by the control unit 301 executing the corresponding code stored in the storage unit 305.
The QoS engine 306 is configured to control, according to the RESOURCE_ID in a memory access request, the storage area within the cache unit 304 (the first-level cache 3041 or the second-level cache 3042) used for the data the processing unit accesses, so that data accessed by different processing units receives different cache capacities in the cache unit 304. For example, memory access requests initiated by a high-priority processing unit can have an exclusive cache space in the cache unit 304, which guarantees that the data accessed by that processing unit is cached in time, thereby guaranteeing the service quality of such processing units.
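One hypothetical way to picture the QoS engine's per-RESOURCE_ID capacity control is a static partition of the cache into an exclusive high-priority region and a shared region; in this C sketch, the sizes and the resource_priority() helper are assumptions, not the patent's mechanism.

    #include <stdint.h>

    #define CACHE_BYTES   (1u << 20)        /* assumed 1 MB cache         */
    #define HIGH_PRIO_RSV (CACHE_BYTES / 4) /* exclusive high-prio region */

    struct cache_region {
        uint32_t offset; /* start offset within the cache */
        uint32_t size;   /* bytes reserved for this class */
    };

    /* Assumed priority lookup for a processing unit; 0 = highest. */
    extern int resource_priority(uint32_t resource_id);

    /* High-priority processing units get an exclusive region; all
     * others share the remainder of the cache. */
    struct cache_region region_for(uint32_t resource_id) {
        struct cache_region r;
        if (resource_priority(resource_id) == 0) {
            r.offset = 0;
            r.size   = HIGH_PRIO_RSV;
        } else {
            r.offset = HIGH_PRIO_RSV;
            r.size   = CACHE_BYTES - HIGH_PRIO_RSV;
        }
        return r;
    }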
The prefetch engine 307 is configured to prefetch data from memory based on a specific algorithm, fetching in advance the data a processing unit is going to read. Different prefetching schemes affect both prefetch accuracy and memory access efficiency; by implementing higher-accuracy prefetching based on a specific algorithm, the prefetch engine 307 can further improve the hit rate when processing units access memory data. Exemplary prefetching performed by the prefetch engine 307 includes, but is not limited to: prefetching Cachelines from the second-level cache into the first-level cache, or prefetching data from external DRAM or PCM into the cache.
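Since the patent leaves the specific algorithm open, the following C sketch shows only a common baseline, a sequential next-line prefetcher, under stated assumptions (64-byte lines, a hypothetical fetch_into_l1() hook).

    #include <stdint.h>

    #define LINE_SIZE 64 /* bytes per cache line (assumed) */

    /* Assumed hook: copy one line from the second-level cache (or
     * external DRAM/PCM) into the first-level cache. */
    extern void fetch_into_l1(uint64_t line_addr);

    static uint64_t last_line = UINT64_MAX;

    /* On each demand access, if it continues a sequential run,
     * prefetch the next line so a later access hits in the cache. */
    void on_demand_access(uint64_t addr) {
        uint64_t line = addr / LINE_SIZE;
        if (last_line != UINT64_MAX && line == last_line + 1)
            fetch_into_l1((line + 1) * LINE_SIZE);
        last_line = line;
    }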
The compression/decompression engine 308 is configured to compress or decompress memory access data. For example, data written to memory by a processing unit is compressed at a granularity of 4 KB per page using a compression-ratio algorithm before being written; conversely, when a processing unit reads compressed data from memory, the data is decompressed before being sent to the processing unit. Optionally, the compression/decompression engine may be shut down; with the compression/decompression engine 308 disabled, the processing unit accesses data in memory without any compression or decompression.
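The write path at 4 KB page granularity might be sketched as below; compress_page() stands in for whatever compression algorithm is used and is assumed, as is the flag modeling the engine being shut down.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096 /* 4 KB granularity, as described */

    /* Assumed codec: returns the compressed length, or 0 on failure. */
    extern size_t compress_page(const uint8_t in[PAGE_SIZE],
                                uint8_t out[PAGE_SIZE]);
    extern void write_to_memory(uint64_t addr, const uint8_t *buf, size_t len);

    static bool engine_enabled = true; /* the engine can be shut down */

    /* Write one page: compress when the engine is on and compression
     * actually shrinks the page; otherwise store it uncompressed. */
    void write_page(uint64_t addr, const uint8_t page[PAGE_SIZE]) {
        uint8_t buf[PAGE_SIZE];
        size_t clen = engine_enabled ? compress_page(page, buf) : 0;
        if (clen > 0 && clen < PAGE_SIZE)
            write_to_memory(addr, buf, clen);
        else
            write_to_memory(addr, page, PAGE_SIZE);
    }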
Described above is the case where the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 are stored in the storage unit 305 as software modules, with the control unit 301 reading the corresponding code in the storage unit to implement the corresponding functions. As an alternative implementation, at least one of the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 may also be configured directly in the control unit 301 and implemented by the control logic of the control unit 301. In that case, the control unit 301 implements the relevant functions by executing its control logic, without reading code from the storage unit 305. Illustratively, when the memory sharing control device 300 is implemented by the FPGA chip shown in fig. 3, the functions of the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 can be implemented by configuring the CLB in fig. 3.
It is understood that some of the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 may be implemented directly by the control unit 301 while others are stored in the storage unit 305, with the control unit 301 executing the corresponding functions by reading the software code in the storage unit 305. For example, the QoS engine 306 and the prefetch engine 307 may be implemented directly in the control logic of the control unit 301, while the compression/decompression engine 308 is software code stored in the storage unit 305; the control unit 301 then reads the code of the compression/decompression engine 308 in the storage unit 305 to implement its functions.
Illustratively, when the memory sharing control device 300 is implemented by the FPGA chip shown in fig. 3, the BRAM in fig. 3 may be configured to implement the functions of the storage unit 305.
The following takes the example that the memory resources connected to the memory sharing control device 300 include DRAM and PCM, and an implementation manner of the memory sharing control device 300 during memory access is exemplarily described.
Fig. 7E and 7F show two implementations in which DRAM and PCM together serve as the storage media of the shared memory pool. In the implementation shown in fig. 7E, DRAM and PCM are different types of memory contained in the shared memory pool, with no upper/lower tiering; data stored in the pool of fig. 7E is not distinguished by memory type. In fig. 7E, the DDR controller 3031 controls the DRAM storage medium and the PCM controller 3032 controls the PCM storage medium; the control unit 301 accesses the DRAM through the DDR controller 3031 and the PCM through the PCM controller 3032. In the implementation shown in fig. 7F, since DRAM is faster and higher-performing than PCM, the DRAM serves as the first-level memory, and data with a high access frequency is stored preferentially in DRAM, while the PCM serves as the second-level memory, storing data with a low access frequency or data evicted from the DRAM. In fig. 7F, the memory controller 303 comprises two parts, PCM control logic and DDR control logic. A memory access request received by the control unit 301 accesses the PCM storage medium through the PCM control logic. Based on a preset algorithm or policy, data predicted to be accessed can be cached in the DRAM in advance, so that a subsequent access request from the control unit 301 can hit the corresponding data in the DRAM through the DDR control logic, further improving memory access efficiency.
As an optional implementation, in the horizontal architecture shown in fig. 7E, the DRAM and the PCM correspond to different memory spaces. For such an architecture, the control unit 301 may store hot data with a high access frequency in the DRAM, that is, establish a correspondence between the processing unit generating the hot data and a virtual memory device backed by DRAM, so as to improve the read/write speed of memory data and the service life of the main memory system. The control unit 301 may likewise establish a correspondence between a processing unit generating cold data with a low access frequency and a virtual memory device backed by PCM, so that the cold data is stored on PCM and the security of important data is ensured by PCM's non-volatility. The vertical architecture shown in fig. 7F exploits the high integration density of PCM and the low read/write latency of DRAM: on one hand, the larger-capacity PCM main memory stores all kinds of data and reduces the number of disk accesses; on the other hand, using the DRAM as a cache further improves the efficiency and performance of memory access.
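A threshold decision is one hypothetical way to express the hot/cold placement policy for the horizontal architecture; the access counter and threshold value in this C sketch are assumptions.

    #include <stdint.h>

    #define HOT_THRESHOLD 100 /* accesses per epoch (assumed) */

    enum medium { MEDIUM_DRAM, MEDIUM_PCM };

    /* Place frequently accessed (hot) data in fast DRAM; keep
     * infrequently accessed (cold) data in non-volatile PCM. */
    enum medium place_data(uint64_t accesses_this_epoch) {
        return accesses_this_epoch >= HOT_THRESHOLD ? MEDIUM_DRAM
                                                    : MEDIUM_PCM;
    }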
It should be noted that although fig. 7E and fig. 7F include a cache (the first-level cache 3041 or the second-level cache 3042), the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308, these components are all optional. That is, the implementations of fig. 7E or 7F may omit these components entirely, or include only some of them.
Based on the characteristics of different architectures in fig. 7E and fig. 7F, the control unit 301 in the memory sharing control device 300 may create virtual memory devices with different characteristics based on different architectures, and allocate the virtual memory devices to processing units with different service requirements, so as to more flexibly meet the requirements of the processing units for accessing the memory resources, and further improve the efficiency of the processing units for accessing the memory.
Fig. 8A-1 is a schematic structural diagram of a computer device 80a according to an embodiment of the present disclosure. As shown in fig. 8A-1, the computer device 80a includes multiple processors (processor 810a to processor 810a+N), a memory sharing control device 800a, a shared memory pool composed of memories (memory 820a to memory 820a+N), and a bus 840a. The memory sharing control device 800a is connected to the processors (processor 810a to processor 810a+N) and to the shared memory pool (memory 820a to memory 820a+N) via the bus 840a. In the embodiment of the application, N is a positive integer greater than or equal to 1.
In Fig. 8A-1, each of the processors has its own local memory; for example, processor 810a has local memory 1. Each processor may access its local memory and, when it needs to expand the memory resources available to it, may access the shared memory pool through the memory sharing control device 800a. Because the unified shared memory pool is shared by all of the processors 810a to 810a+N, the utilization rate of memory resources can be improved, and the excessive latency of cross-processor access, which arises when a processor accesses local memory controlled by another processor, can be avoided.
Optionally, the memory sharing control device 800a in Fig. 8A-1 may further integrate the logic of a network adapter, so that the memory sharing control device 800a can also access memory resources in other computer devices through a network, further expanding the range of shareable memory resources and improving their utilization rate.
Fig. 8A-2 is a schematic structural diagram of another computer device 80a according to an embodiment of the present application. As shown in Fig. 8A-2, the computer device 80a further includes a network adapter 830a, which is connected to the bus 840a via a Serdes interface. The memory sharing control device 800a may access memory resources in other computer devices through the network adapter 830a. In the computer device 80a shown in Fig. 8A-2, the memory sharing control device 800a need not provide the function of a network adapter.
Fig. 8B-1 is a schematic structural diagram of a computer device 80b according to an embodiment of the present application. As shown in Fig. 8B-1, the computer device 80b includes a processor resource pool formed by a plurality of processors (processor 810b to processor 810b+N), a memory sharing control device 800b, a shared memory pool formed by a plurality of memories (memory 820b to memory 820b+N), and a bus 840b. The processor resource pool is connected to the bus 840b through the memory sharing control device 800b, and the shared memory pool is connected to the bus 840b. Unlike in Fig. 8A-1 or Fig. 8A-2, the processors in Fig. 8B-1 or Fig. 8B-2 (processor 810b to processor 810b+N) have no local memory of their own; their memory access requests are served by the shared memory pool (memory 820b to memory 820b+N) through the memory sharing control device 800b.
Optionally, the memory sharing control device 800b in Fig. 8B-1 may also integrate the logic of a network adapter, so that the memory sharing control device 800b can access memory resources in other computer devices through a network, further expanding the range of shareable memory resources and improving their utilization rate.
Fig. 8B-2 is a schematic structural diagram of another computer device 80b according to an embodiment of the present application. As shown in Fig. 8B-2, the computer device 80b further includes a network adapter 830b, which is connected to the bus 840b via a Serdes interface. The memory sharing control device 800b may access memory resources in other computer devices through the network adapter 830b. In the computer device 80b shown in Fig. 8B-2, the memory sharing control device 800b need not provide the function of a network adapter.
The memory sharing control device 800a in Fig. 8A-1 and Fig. 8A-2, and the memory sharing control device 800b in Fig. 8B-1 and Fig. 8B-2, may be implemented with reference to the memory sharing control device 200 in Fig. 2A or Fig. 2B, or the memory sharing control device 300 in Fig. 7A to Fig. 7F. The processor 810a or the processor 810b may be implemented with reference to the processor of Fig. 6 described above, and the memory 820a or the memory 820b may be a memory resource such as DRAM or PCM. The network adapter 830a is connected to the bus 840a, and the network adapter 830b to the bus 840b, through a serial interface such as a Serdes interface. The bus 840a or the bus 840b may be a PCIe bus.
In the computer device 80a or the computer device 80b, multiple processors can quickly access the shared memory pool through the memory sharing control device, improving the utilization rate of the memory resources in the pool. Meanwhile, because the network adapter (830a or 830b) is connected to the bus through a serial interface, the latency of data transmission between a processor and the network adapter does not grow significantly with distance. The computer device 80a or 80b can therefore extend the memory resources accessible to its processors to other devices connected to it, through the memory sharing control device and the network adapter, thereby further expanding the range of memory resources the processors can share, realizing memory sharing on a larger scale, and further improving memory resource utilization.
It should be understood that the computer device 80a may also include processors without local memory; such processors access memory through the shared memory pool via the memory sharing control device 800a. Likewise, the computer device 80b may include processors with local memory, which may access their local memory or access memory in the shared memory pool through the memory sharing control device 800b. Optionally, when some processors in the computer device 80b have local memory, most of the memory accesses of those processors are served by their local memory.
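The two access paths described above can be summarized in a short C sketch, assuming a simple address-range check: a processing unit with local memory serves in-range addresses locally and forwards the rest to the memory sharing control device, while a unit without local memory always uses the shared pool. The structure and all field names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct processing_unit {
    bool     has_local_memory;
    uint64_t local_base, local_size;
};

static void access_local(uint64_t addr)
{
    printf("local access  0x%llx\n", (unsigned long long)addr);
}

/* Path through the memory sharing control device to the shared pool. */
static void access_shared_pool(uint64_t addr)
{
    printf("shared access 0x%llx\n", (unsigned long long)addr);
}

static void mem_access(const struct processing_unit *pu, uint64_t addr)
{
    if (pu->has_local_memory &&
        addr >= pu->local_base && addr < pu->local_base + pu->local_size)
        access_local(addr);        /* most accesses of an 80a-style unit land here */
    else
        access_shared_pool(addr);  /* memory expansion, or a unit without local memory */
}

int main(void)
{
    struct processing_unit pu = { true, 0x0u, 0x10000u };
    mem_access(&pu, 0x100u);       /* served by local memory           */
    mem_access(&pu, 0x20000u);     /* served by the shared memory pool */
    return 0;
}
```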
Fig. 9A is a schematic structural diagram of a system 901 according to an embodiment of the present application. As shown in Fig. 9A, the system 901 includes M computer devices, for example a computer device 80a, a computer device 81a, and a computer device 82a, where M is a positive integer greater than or equal to 3. The M computer devices are connected via a network 910a, which may be an Ethernet-based network or a U-bus-based network. The computer device 81a has a structure similar to that of the computer device 80a and includes a processor resource pool formed by a plurality of processors (processor 8012a to processor 8012a+N), a memory sharing control device 8011a, a shared memory pool formed by a plurality of memories (memory 8013a to memory 8013a+N), a network adapter 8014a, and a bus 8015a. In the computer device 81a, the processor resource pool, the memory sharing control device 8011a, and the network adapter 8014a are each connected to the bus 8015a, and the shared memory pool (memory 8013a to memory 8013a+N) is connected to the memory sharing control device 8011a. A memory access request initiated by a processor in the processor resource pool reaches the shared memory pool through the memory sharing control device 8011a. The memory sharing control device 8011a may be implemented with reference to the memory sharing control device 200 in Fig. 2A or Fig. 2B, or the memory sharing control device 300 in Fig. 7A to Fig. 7F. The processor 8012a may be implemented with reference to the processor of Fig. 6 described above, and the memory 8013a may be a memory resource such as DRAM or PCM. The network adapter 8014a is connected to the bus 8015a via a serial interface, such as a Serdes interface. The bus 8015a may be a PCIe bus.
In Fig. 9A, each processor has its own local memory, which serves the majority of that processor's memory accesses. Taking the processor 8012a as an example, the processor 8012a can directly access its local memory 1, and most of its memory access requests are served there. When the processor 8012a needs more memory, for example to handle a burst of traffic, it can access the memory resources in the shared memory pool through the memory sharing control device 8011a, meeting its demand for memory resources. Optionally, the processor 8012a may also access the local memory of other processors, for example the local memory of processor N (memory N); that is, the processor 8012a may also share memory in the manner of a NUMA system.
The structures of the computer device 82a and the other computer devices up to computer device M may be similar to that of the computer device 80a and are not described in detail.
In the system 901, the processor 810a may access the shared memory pool formed by the memories 8013a through the memory sharing control device 800a, the network adapter 830a, the network 910a, the network adapter 8014a, and the memory sharing control device 8011a. That is, the memory resources accessible to the processor 810a include memory resources in the computer device 80a and memory resources in the computer device 81a. In a similar manner, the processor 810a may access memory resources in any computer device in the system 901. Thus, when a computer device is lightly loaded, for example when the processor 8012a running on the computer device 81a carries little traffic and much of the memory 8013a is idle, while the processor 810a in the computer device 80a needs a large amount of memory to run applications such as HPC, the memory resources of the computer device 81a can be allocated to the processor 810a through the memory sharing control device 800a. In this way the memory resources in the system 901 are used effectively, the memory demands of different computer devices during service processing are met, the memory resource utilization of the whole system is improved, and the effect of raising memory utilization to reduce the total cost of ownership (TCO) becomes more pronounced.
In the system 901 shown in Fig. 9A, the computer device 80a includes a network adapter 830a. In a specific implementation, the computer device 80a may omit the network adapter 830a and instead integrate the control logic of a network adapter in the memory sharing control device 800a, so that the processor 810a can access other memory resources in the network through the memory sharing control device 800a. For example, the processor 810a may access the shared memory pool formed by the memories 8013a through the memory sharing control device 800a, the network 910a, the network adapter 8014a, and the memory sharing control device 8011a. When the computer device 81a likewise omits the network adapter 8014a and the memory sharing control device 8011a implements the network adapter function, the processor 810a may access the shared memory pool formed by the memories 8013a through the memory sharing control device 800a, the network 910a, and the memory sharing control device 8011a.
Fig. 9B is a schematic structural diagram of a system 902 according to an embodiment of the present application. As shown in Fig. 9B, the system 902 includes M computer devices, for example a computer device 80b, a computer device 81b, and a computer device 82b, where M is a positive integer greater than or equal to 3. The M computer devices are connected via a network 910b, which may be an Ethernet-based network or a U-bus-based network. The computer device 81b has a structure similar to that of the computer device 80b and includes a processor resource pool formed by a plurality of processors (processor 8012b to processor 8012b+N), a memory sharing control device 8011b, a shared memory pool formed by a plurality of memories (memory 8013b to memory 8013b+N), a network adapter 8014b, and a bus 8015b. The processor resource pool is connected to the bus 8015b through the memory sharing control device 8011b, and the shared memory pool and the network adapter 8014b are also connected to the bus 8015b. The memory sharing control device 8011b may be implemented with reference to the memory sharing control device 200 in Fig. 2A or Fig. 2B, or the memory sharing control device 300 in Fig. 7A to Fig. 7F. The processor 8012b may be implemented with reference to the processor of Fig. 6 described above, and the memory 8013b may be a memory resource such as DRAM or PCM. The network adapter 8014b is connected to the bus 8015b via a serial interface, such as a Serdes interface. The bus 8015b may be a PCIe bus.
The structures of the computer device 82b and the other computer devices up to computer device M may be similar to that of the computer device 80b and are not described in detail.
In the system 902, the processor 810b may access the shared memory pool formed by the memories 8013b through the memory sharing control device 800b, the network adapter 830b, the network 910b, and the network adapter 8014b. That is, the memory resources accessible to the processor 810b include memory resources in the computer device 80b and memory resources in the computer device 81b. In a similar manner, the processor 810b may access memory resources in any computer device in the system 902, so that the memory resources of the whole system 902 are treated as shared memory resources. Thus, when the traffic load of the processor 8012b running on a computer device such as the computer device 81b is low and much of the memory 8013b is idle, while the processor 810b in the computer device 80b needs a large amount of memory to run applications such as HPC, the memory resources of the computer device 81b can be allocated to the processor 810b through the memory sharing control device 800b. In this way the memory resources in the system 902 are used effectively to meet the memory demands of different computer devices during service processing, the memory resource utilization of the system 902 is improved, and the effect of raising memory utilization to reduce the TCO becomes more pronounced.
In the system 902 shown in Fig. 9B, the computer device 80b includes a network adapter 830b. In a specific implementation, the computer device 80b may omit the network adapter 830b and instead integrate the control logic of a network adapter in the memory sharing control device 800b, so that the processor 810b can access other memory resources in the network through the memory sharing control device 800b. For example, the processor 810b may access the shared memory pool formed by the memories 8013b through the memory sharing control device 800b, the network 910b, the network adapter 8014b, and the memory sharing control device 8011b. When the computer device 81b likewise omits the network adapter 8014b and the memory sharing control device 8011b implements the network adapter function, the processor 810b may access the shared memory pool formed by the memories 8013b through the memory sharing control device 800b, the network 910b, and the memory sharing control device 8011b.
Fig. 9C is a schematic structural diagram of a system 903 according to an embodiment of the present application. As shown in Fig. 9C, the system 903 includes a computer device 80a, a computer device 81b, a computer device 82c, and further computer devices up to a computer device M. The computer device 80a in the system 903 is implemented in the same way as the computer device 80a in the system 901, and the computer device 81b in the system 903 in the same way as the computer device 81b in the system 902; the computer devices 82c to M may be similar to either the computer device 80a or the computer device 81b. The system 903 thus combines the computer device 80a of the system 901 with the computer device 81b of the system 902, and memory sharing likewise improves the utilization rate of the memory resources in the system.
It should be noted that in the systems 901 to 903, memory access requests between computer devices are transmitted over the network. Because the network adapter is connected to the memory sharing control device over a serial bus via a Serdes interface, the rate and bandwidth of the serial bus sustain the data transmission rate. Therefore, although network transmission affects the data transmission rate to some extent, from the standpoint of improving memory resource utilization this approach raises the utilization rate of memory resources while still taking the processors' memory access rate into account.
Fig. 10 is a logic diagram of memory sharing as implemented by the computer device 80a shown in Fig. 8A-1 or Fig. 8A-2, by the computer device 80b shown in Fig. 8B-1 or Fig. 8B-2, or by the systems 901 to 903 shown in Fig. 9A to Fig. 9C.
Taking the computer device 80a shown in Fig. 8A-1 as an example, the processors 1 to 4 represent any four processors (or cores within processors) in the processor resource pool formed by the processors 810a, the memory sharing control device 1000 represents the memory sharing control device 800a, and the memories 1 to 4 represent any four memories in the shared memory pool formed by the memories 820a. The memory sharing control device 1000 virtualizes four virtual memory devices (the virtual memories 1 to 4 shown in Fig. 10) on top of the memories 1 to 4, and the access control table 1001 records the correspondence between virtual memory devices and processors. When the memory sharing control device 1000 receives a memory access request from any one of the processors 1 to 4, it looks up, in the access control table 1001, the virtual memory device corresponding to the processor that sent the request, and then accesses the corresponding memory through the memory controller 1002 according to the information of that virtual memory device.
Taking the system 902 shown in Fig. 9B as an example, the processors 1 to 4 represent four processors (or cores within processors) in any one or more computer devices in the system 902, the memory sharing control device 1000 represents the memory sharing control device in any one computer device, and the memories 1 to 4 represent four memories in any one or more computer devices in the system 902. As before, the memory sharing control device 1000 virtualizes four virtual memory devices (the virtual memories 1 to 4 shown in Fig. 10) on top of the memories 1 to 4, the access control table 1001 records the correspondence between virtual memory devices and processors, and upon receiving a memory access request the memory sharing control device 1000 looks up the requesting processor's virtual memory device in the access control table 1001 and accesses the corresponding memory through the memory controller 1002.
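A minimal, self-contained C sketch of the lookup through the access control table 1001 is given below: each entry binds one processor to one virtual memory device, which records the physical memory backing it, and the resolved address is handed to the memory controller. The four-entry sizing, the table layout, and all identifiers are assumptions made for illustration, not structures disclosed by the present application.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PROCESSORS 4

struct virtual_memory {       /* one virtual memory device                */
    int      phys_mem_id;     /* which memory in the shared pool backs it */
    uint64_t base, size;      /* window inside that physical memory       */
};

/* access_control_table[i] is the virtual memory device bound to processor i. */
static const struct virtual_memory access_control_table[NUM_PROCESSORS] = {
    { 0, 0x0000u, 0x4000u },
    { 1, 0x0000u, 0x4000u },
    { 2, 0x0000u, 0x4000u },
    { 3, 0x0000u, 0x4000u },
};

/* Handle one memory access request: look up the requesting processor's
 * virtual memory device, then hand the resolved address to the memory
 * controller (printf stands in for memory controller 1002). */
static void handle_request(int processor_id, uint64_t offset)
{
    const struct virtual_memory *vm = &access_control_table[processor_id];

    if (offset >= vm->size) {
        fprintf(stderr, "access outside the allocated virtual memory device\n");
        return;
    }
    printf("processor %d -> memory %d, address 0x%llx\n", processor_id,
           vm->phys_mem_id, (unsigned long long)(vm->base + offset));
}

int main(void)
{
    handle_request(0, 0x100u);    /* routed to memory 0                       */
    handle_request(3, 0x5000u);   /* rejected: outside the 0x4000-byte window */
    return 0;
}
```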
Fig. 11 is a schematic structural diagram of a computer device 1100 according to an embodiment of the present application. As shown in Fig. 11, the computer device 1100 includes at least two processing units 1102, each of which is a processor, a core in a processor, or a combination of cores in a processor; a memory sharing control device 1101; and a memory pool including one or more memories 1103;
the at least two processing units 1102 are coupled with the memory sharing control device 1101;
the memory sharing control device 1101 is configured to allocate memories from the memory pool to the at least two processing units 1102 respectively, where at least one memory in the memory pool can be accessed by different processing units in different time periods;
the at least two processing units 1102 are configured to access the allocated memory through the memory sharing control device 1101.
That the at least two processing units 1102 are coupled with the memory sharing control device 1101 means that each of the at least two processing units 1102 is connected to the memory sharing control device 1101; any one of the processing units 1102 may be connected to the memory sharing control device 1101 directly, or through another hardware component (e.g., another chip).
The specific implementation of the computer device 1100 shown in Fig. 11 may refer to the implementations of Fig. 8A-1, Fig. 8A-2, Fig. 8B-1, and Fig. 8B-2 described above, to the implementations of the computer devices (e.g., the computer device 80a or the computer device 80b) in Fig. 9A to Fig. 9C, and to Fig. 10. The memory sharing control device 1101 in the computer device 1100 may likewise be implemented with reference to the memory sharing control device 200 in Fig. 2A or Fig. 2B, or the memory sharing control device 300 in Fig. 7A to Fig. 7F, and is not described again.
The at least two processing units 1102 in the computer device 1100 shown in Fig. 11 can access at least one memory in the memory pool in different time periods through the memory sharing control device 1101, which meets the processing units' demand for memory resources and improves the utilization rate of those resources.
Fig. 12 is a flowchart of a method for memory sharing control according to an embodiment of the present application. The method may be applied to the computer device shown in Fig. 8A-1, Fig. 8A-2, Fig. 8B-1, or Fig. 8B-2, or to a computer device (e.g., the computer device 80a or the computer device 80b) in Fig. 9A to Fig. 9C. The computer device includes at least two processing units, a memory sharing control device, and a memory pool including one or more memories. As shown in Fig. 12, the method includes:
Step 1200: the memory sharing control device receives a first memory access request sent by a first processing unit of the at least two processing units, where a processing unit is a processor, a core in a processor, or a combination of cores in a processor;
Step 1202: the memory sharing control device allocates a first memory to the first processing unit from the memory pool, where the first memory can be accessed by a second processing unit of the at least two processing units in other time periods;
Step 1204: the first processing unit accesses the first memory through the memory sharing control device.
Based on the method shown in Fig. 12, different processing units can access at least one memory in the memory pool in different time periods, which meets the processing units' demands for memory resources and improves memory resource utilization.
Specifically, the method shown in Fig. 12 may be implemented with reference to the memory sharing control device 200 in Fig. 2A or Fig. 2B, or the memory sharing control device 300 in Fig. 7A to Fig. 7F, and is not described again here.
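For illustration, the following self-contained C sketch reduces the flow of Fig. 12 to its bindings: a memory from the pool is allocated to one processing unit and, once a preset condition is met (modeled here as an explicit release), the same memory is allocated to a second unit, so that different units use it in different time periods. All names and the release condition are illustrative assumptions.

```c
#include <stdio.h>

#define POOL_SIZE 4
#define UNBOUND   (-1)

/* owner[m] records which processing unit memory m is currently bound to. */
static int owner[POOL_SIZE] = { UNBOUND, UNBOUND, UNBOUND, UNBOUND };

/* Step 1202: allocate a free memory from the pool to a processing unit. */
static int allocate_memory(int unit_id)
{
    for (int m = 0; m < POOL_SIZE; m++) {
        if (owner[m] == UNBOUND) {
            owner[m] = unit_id;
            return m;
        }
    }
    return -1;    /* pool exhausted */
}

/* Preset condition met: release the binding so another unit can use it. */
static void release_memory(int mem_id)
{
    owner[mem_id] = UNBOUND;
}

int main(void)
{
    int m = allocate_memory(1);          /* first processing unit          */
    printf("memory %d -> unit 1\n", m);

    release_memory(m);                   /* preset condition is met        */

    m = allocate_memory(2);              /* second unit, later time period */
    printf("memory %d -> unit 2\n", m);
    return 0;
}
```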
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To illustrate the interchangeability of hardware and software clearly, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.
In the several embodiments provided in the present application, the described apparatus embodiments are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (33)

1. A computer device, comprising at least two processing units, a memory sharing control device and a memory pool, wherein each processing unit is a processor, a core in a processor or a combination of cores in a processor, and the memory pool comprises one or more memories;
the at least two processing units are coupled with the memory sharing control device;
the memory sharing control device is configured to allocate memories from the memory pool to the at least two processing units, respectively, where at least one memory in the memory pool is accessible to different processing units at different time periods;
the at least two processing units are used for accessing the allocated memory through the memory sharing control device.
2. The computer device according to claim 1, wherein the at least two processing units are connected to the memory sharing control device through a serial bus;
a first processing unit of the at least two processing units is configured to send a first memory access request in the form of a serial signal to the memory sharing control device through the serial bus, where the first memory access request is used to access a first memory allocated to the first processing unit.
3. The computer device of claim 2, wherein the memory sharing control device comprises a processor interface configured to:
receive the first memory access request; and
convert the first memory access request into a second memory access request in the form of a parallel signal.
4. The computer device according to claim 2 or 3, wherein the memory sharing control device comprises a control unit configured to:
establish a correspondence between a memory address of the first memory in the memory pool and the first processing unit of the at least two processing units, so as to allocate the first memory to the first processing unit from the memory pool.
5. The computer device according to claim 2 or 3, wherein the memory sharing control device comprises a control unit configured to:
virtualize a plurality of virtual memory devices from the memory pool, wherein a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory; and
allocate the first virtual memory device to the first processing unit.
6. The computer device of claim 5, wherein the control unit is further configured to:
release the correspondence between the first virtual memory device and the first processing unit when a preset condition is met; and
establish a correspondence between the first virtual memory device and a second processing unit of the at least two processing units.
7. The computer device according to any one of claims 1 to 6, wherein the memory sharing control device further comprises a cache unit, and the cache unit is configured to cache data read from the memory pool by any one of the at least two processing units, or to cache data evicted by any one of the at least two processing units.
8. The computer device according to claim 7, wherein the memory sharing control device further comprises a prefetch engine, and the prefetch engine is configured to prefetch, from the memory pool, data that needs to be read by any one of the at least two processing units, and to cache the data in the cache unit.
9. The computer device according to claim 7 or 8, wherein the memory sharing control device further comprises a quality of service (QoS) engine, and the QoS engine is configured to optimize storage, in the cache unit, of data to be cached by any one of the at least two processing units.
10. The computer device according to any one of claims 1 to 9, wherein the memory sharing control device further comprises a compression/decompression engine, and the compression/decompression engine is configured to compress or decompress data related to memory access.
11. The computer device of any of claims 1-10, wherein the first processing unit further has a local memory, the local memory being used for memory access by the first processing unit.
12. The computer device of any of claims 1-11, wherein the plurality of memories included in the memory pool are of different media types.
13. A system, comprising at least two computer devices according to any one of claims 1 to 12, connected via a network.
14. A memory sharing control device, comprising a control unit, a processor interface and a memory interface;
the processor interface is configured to receive memory access requests sent by at least two processing units, wherein each processing unit is a processor, a core in a processor, or a combination of cores in a processor;
the control unit is configured to allocate memories to the at least two processing units from a memory pool, respectively, where at least one memory in the memory pool is capable of being accessed by different processing units at different time periods;
the control unit is further configured to implement, through the memory interface, access to the allocated memory by the at least two processing units.
15. The memory sharing control device according to claim 14, wherein:
the processor interface is further configured to receive, through a serial bus, a first memory access request sent by a first processing unit of the at least two processing units in a serial signal form, where the first memory access request is used to access a first memory allocated to the first processing unit.
16. The memory sharing control device according to claim 15, wherein:
the processor interface is further configured to convert the first memory access request into a second memory access request in the form of a parallel signal, and send the second memory access request to the control unit;
the control unit is further configured to implement, through the memory interface, the access of the second memory access request to the first memory.
17. The memory sharing control device according to claim 15 or 16, wherein:
the control unit is further configured to establish a correspondence between a memory address of the first memory in the memory pool and the first processing unit, so as to allocate the first memory to the first processing unit from the memory pool.
18. The memory sharing control device according to claim 15 or 16, wherein:
the control unit is further configured to virtualize a plurality of virtual memory devices from the memory pool, where a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory;
and to allocate the first virtual memory device to the first processing unit.
19. The memory sharing control device according to claim 18, wherein:
the control unit is further configured to: when a preset condition is met, release the correspondence between the first virtual memory device and the first processing unit, and establish a correspondence between the first virtual memory device and a second processing unit of the at least two processing units.
20. The memory sharing control device according to any one of claims 14 to 19, further comprising a cache unit;
the cache unit is configured to cache data read from the memory pool by any one of the at least two processing units, or to cache data evicted by any one of the at least two processing units.
21. The memory sharing control device according to claim 20, further comprising a prefetch engine configured to prefetch, from the memory pool, data that needs to be read by any one of the at least two processing units, and to cache the data in the cache unit.
22. The memory sharing control device according to claim 20 or 21, further comprising a quality of service (QoS) engine;
the QoS engine is configured to optimize storage, in the cache unit, of data to be cached by any one of the at least two processing units.
23. The memory sharing control device according to any one of claims 14 to 22, further comprising a compression/decompression engine;
the compression/decompression engine is configured to compress or decompress data related to memory access.
24. A method for memory sharing control, the method being applied to a computer device, the computer device including at least two processing units, a memory sharing control device, and a memory pool, the memory pool including one or more memories, the method comprising:
the memory sharing control device receives a first memory access request sent by a first processing unit of the at least two processing units, wherein each processing unit is a processor, a core in a processor, or a combination of cores in a processor;
the memory sharing control device allocates a first memory to the first processing unit from the memory pool, where the first memory can be accessed by a second processing unit of the at least two processing units in other time periods; and
the first processing unit accesses the first memory through the memory sharing control device.
25. The method of claim 24, further comprising:
the memory sharing control device receives a first memory access request sent by a first processing unit of the at least two processing units in a serial signal form through a serial bus, wherein the first memory access request is used for accessing a first memory allocated to the first processing unit.
26. The method of claim 25, further comprising:
the memory sharing control device converts the first memory access request into a second memory access request in the form of a parallel signal, and accesses the first memory according to the second memory access request.
27. The method of claim 25 or 26, further comprising:
the memory sharing control device establishes a correspondence between a memory address of the first memory in the memory pool and the first processing unit of the at least two processing units.
28. The method of claim 25 or 26, further comprising:
the memory sharing control device virtualizes a plurality of virtual memory devices from the memory pool, wherein a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory; and
the memory sharing control device allocates the first virtual memory device to the first processing unit.
29. The method of claim 28, further comprising:
when a preset condition is met, the memory sharing control device releases the corresponding relation between the first virtual memory device and the first processing unit, and establishes the corresponding relation between the first virtual memory device and a second processing unit of the at least two processing units.
30. The method according to any one of claims 24-29, further comprising:
the memory sharing control device caches data read from the memory pool by any one of the at least two processing units, or caches data evicted by any one of the at least two processing units.
31. The method of claim 30, further comprising:
the memory sharing control device prefetches, from the memory pool, data to be read by any one of the at least two processing units, and caches the data.
32. The method of claim 30 or 31, further comprising:
the memory sharing control device optimizes storage, in a cache storage medium, of data to be cached by any one of the at least two processing units.
33. The method according to any one of claims 24-32, further comprising:
the memory sharing control device compresses or decompresses data related to a memory access of any one of the at least two processing units.
CN202110351637.5A 2021-03-12 2021-03-31 Method, device, computer device and system for realizing memory sharing control Pending CN115080262A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP22766409.1A EP4300308A1 (en) 2021-03-12 2022-03-14 Method and device for achieving memory sharing control, computer device, and system
PCT/CN2022/080620 WO2022188887A1 (en) 2021-03-12 2022-03-14 Method and device for achieving memory sharing control, computer device, and system
JP2023555494A JP2024509954A (en) 2021-03-12 2022-03-14 Memory sharing control method and device, computer device, and system
US18/460,608 US20230409198A1 (en) 2021-03-12 2023-09-04 Memory sharing control method and device, computer device, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110270731 2021-03-12
CN2021102707318 2021-03-12

Publications (1)

Publication Number Publication Date
CN115080262A true CN115080262A (en) 2022-09-20

Family

ID=83245500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110351637.5A Pending CN115080262A (en) 2021-03-12 2021-03-31 Method, device, computer device and system for realizing memory sharing control

Country Status (1)

Country Link
CN (1) CN115080262A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination