CN118426976A - Memory expansion system, access method and device, medium and computer program product - Google Patents
- Publication number: CN118426976A
- Application number: CN202410889277.8A
- Authority
- CN
- China
- Prior art keywords
- memory
- access request
- memory access
- module
- programmable gate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Stored Programmes (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a memory expansion system, an access method, an access device, a medium, and a computer program product, relating to the field of computer technology. The memory expansion system comprises N graphics processors and N field programmable gate array modules; memory expansion modules are mounted on the field programmable gate array modules, the N graphics processors are connected in a ring, the N field programmable gate array modules are connected in a ring, and each graphics processor is connected with k field programmable gate array modules. The field programmable gate array module is configured to receive a memory access request, the memory access request comprising requests sent by a processor and/or a graphics processor connected with the field programmable gate array module and/or by other field programmable gate array modules; and the memory expansion module is configured to respond to the memory access request. The invention realizes memory expansion for the graphics processor and improves its processing performance.
Description
Technical Field
The present invention relates to the field of computer technology, and more particularly, to a memory expansion system, an access method and apparatus, a medium, and a computer program product.
Background
With the development of AI (artificial intelligence) and the large models built on it, more and more GPU (Graphics Processing Unit) servers and data are required to train and iterate large models. Early GPUs had a single function and limited resources and were used only for graphics and image processing; today the GPU is widely used in AI computing and has become an important data center computing chip.
GPUs are usually plugged into server slots as PCIe (Peripheral Component Interconnect Express) devices, and the memory a GPU chip relies on for computation is usually on-chip HBM (High Bandwidth Memory). This memory cannot be effectively shared between GPU cards, which creates a performance bottleneck.
Therefore, how to implement memory expansion of the GPU and improve the processing performance of the GPU is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a memory expansion system, a memory access method, a memory access device, a storage medium, and a computer program product, which realize memory expansion of the GPU and improve the processing performance of the GPU.
In order to achieve the above purpose, the invention provides a memory expansion system comprising N graphics processors and N field programmable gate array modules, wherein memory expansion modules are mounted on the field programmable gate array modules, the N graphics processors are connected in a ring, the N field programmable gate array modules are connected in a ring, each graphics processor is connected with k field programmable gate array modules, and the nth graphics processor is connected with the mth field programmable gate array module, where 2 ≤ k ≤ N and 1 ≤ n ≤ N; when n ≥ k, m ranges over [n-k+1, n], and when n < k, m ranges over [1, n] ∪ [N+n-k+1, N];
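The connectivity rule above can be sketched as a small helper (a hypothetical illustration; the function name and the 1-based indexing are assumptions, not part of the patent):

```python
def fpga_modules_for_gpu(n: int, N: int, k: int) -> list[int]:
    """Return the 1-based indices m of the k FPGA modules connected to
    GPU n in a ring of N GPUs and N FPGA modules (2 <= k <= N)."""
    if not (2 <= k <= N and 1 <= n <= N):
        raise ValueError("require 2 <= k <= N and 1 <= n <= N")
    if n >= k:
        # m in [n-k+1, n]
        return list(range(n - k + 1, n + 1))
    # n < k: the window wraps around the ring, m in [1, n] U [N+n-k+1, N]
    return list(range(1, n + 1)) + list(range(N + n - k + 1, N + 1))

# With N = 8 and k = 2, GPU 1 connects to the first and last FPGA modules:
# fpga_modules_for_gpu(1, 8, 2) -> [1, 8]
```

Every GPU thus sees exactly k FPGA modules, and the wrap-around case covers the GPUs near the start of the ring.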
The field programmable gate array module is used for receiving a memory access request, analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
The memory expansion module is used for responding to the memory access request;
The memory expansion module comprises a first memory area and a second memory area. The nth graphics processor reads data from the first memory area of the memory expansion module mounted on the nth field programmable gate array module and from the second memory areas of the memory expansion modules mounted on the other field programmable gate array modules connected with it, and stores the processed data to the second memory area of the memory expansion module mounted on the nth field programmable gate array module.
The field programmable gate array module is used for mounting the memory expansion module through an open memory interface, and the field programmable gate array module is used for sending the memory access request to the mounted memory expansion module through the open memory interface according to the analysis result;
And/or, a double data rate memory is mounted on the field programmable gate array module in the form of a dual inline memory module, and the field programmable gate array module sends the memory access request to the double data rate memory according to the analysis result so that the double data rate memory responds to the memory access request.
The memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receive a first memory access request sent by a processor in the target node; parse the first memory access request to determine the requested target memory page table; judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the first memory access request to the memory expansion module mounted on the field programmable gate array module; and unlock the target memory page table after the memory expansion module responds to the first memory access request.
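The parse → check attribute → lock → forward → respond → unlock sequence can be modeled in a few lines (a simplified sketch; `PageTableEntry`, the `attr` field, and the dictionary-backed memory are illustrative assumptions, not the patent's data structures):

```python
import threading

class PageTableEntry:
    """One page-table entry; `attr` marks pages served by the expansion memory."""
    def __init__(self, phys_addr, attr="expander"):
        self.phys_addr = phys_addr
        self.attr = attr
        self.lock = threading.Lock()  # models locking the page table entry

class FpgaAccessHandler:
    """Sketch of the flow above: parse the request, check the page attribute,
    lock the page, forward to the expansion memory, respond, then unlock."""
    def __init__(self, page_table, expansion_memory):
        self.page_table = page_table        # page number -> PageTableEntry
        self.memory = expansion_memory      # phys_addr -> data

    def handle(self, request):
        entry = self.page_table[request["page"]]   # parse: resolve target page
        if entry.attr != "expander":               # preset condition not met
            return None
        with entry.lock:                           # lock ... respond ... unlock
            if request["op"] == "read":
                return self.memory.get(entry.phys_addr)
            self.memory[entry.phys_addr] = request["data"]
            return "ok"
```

A write followed by a read of the same page then round-trips through the expansion memory, while concurrent requests for that page wait on the lock.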
The field programmable gate array module comprises a computing fast link interface or a peripheral component interconnection fast bus interface, a computing fast link, a direct memory access controller and a page table enabling module;
A computing fast link interface or a peripheral component interconnect fast bus interface for receiving a first memory access request sent by a processor in the target node;
The computing fast link and direct memory access controller are configured to parse the first memory access request to determine the requested target memory page table;
The page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if yes, locking the target memory page table;
The computing fast link and direct memory access controller are further configured to send the first memory access request to the memory expansion module mounted on the field programmable gate array module;
and the page table enabling module is further used for unlocking the target memory page table after the memory expansion module responds to the first memory access request.
The memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receive a second memory access request sent by a graphics processor and/or by other field programmable gate array modules in the target node; parse the second memory access request and convert the address of the requested target memory page table into a memory physical address, obtaining a converted second memory access request; parse the converted second memory access request to determine the requested target memory page table; judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the converted second memory access request to the memory expansion module mounted on the field programmable gate array module; and, after the memory expansion module responds to the converted second memory access request, receive the response data, construct a response data packet conforming to the format corresponding to the target memory page table based on the response data, return the response data packet to the sender of the second memory access request, and unlock the target memory page table.
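The two steps that distinguish inter-card requests — address conversion before the access and response-packet construction after it — might look like this (the page size, header fields, and function names are hypothetical, chosen only to illustrate the flow):

```python
PAGE_SIZE = 4096  # assumed page size

def to_physical(page_number: int, offset: int, page_map: dict) -> int:
    """Page-table conversion: (page, offset) -> physical address in the
    expansion memory, via a simple page -> frame map."""
    frame = page_map[page_number]
    return frame * PAGE_SIZE + offset

def build_response_packet(request_id: int, page_number: int, payload: bytes) -> dict:
    """Construct a response packet in a format keyed to the requested page."""
    return {
        "id": request_id,      # lets the sender match response to request
        "page": page_number,
        "length": len(payload),
        "payload": payload,
    }

# Convert the requested page address, serve it, then answer the sending card:
phys = to_physical(5, 16, {5: 42})           # frame 42 -> 42*4096 + 16
pkt = build_response_packet(1, 5, b"\x00" * 8)
```

The `id` field stands in for whatever correlation the real packet format carries so the requesting GPU or FPGA module can match the response to its outstanding request.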
The field programmable gate array module comprises a high-speed channelized chip-to-chip interface, an inter-card memory access request analysis module, a page table conversion module, a computing fast link and direct memory access controller, a page table enabling module, and an inter-card memory access response packet module; different graphics processors, different field programmable gate array modules, and the field programmable gate array modules and graphics processors are connected through high-speed channelized chip-to-chip interfaces;
The high-speed channelized chip-to-chip interface in the field programmable gate array module is used for receiving a second memory access request sent by the graphic processor and/or other field programmable gate array modules in the target node;
The inter-card memory access request analysis module is used for analyzing the second memory access request;
The page table conversion module is used for converting the address of the target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request;
The computing fast link and direct memory access controller are configured to parse the converted second memory access request to determine the requested target memory page table;
The page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if yes, locking the target memory page table;
The computing fast link and direct memory access controller are further configured to send the converted second memory access request to the memory expansion module mounted on the field programmable gate array module;
the inter-card memory access response packet module is configured to receive the response data after the memory expansion module responds to the converted second memory access request, and to construct a response data packet conforming to the format corresponding to the target memory page table based on the response data;
the high-speed channelized chip-to-chip interface in the field programmable gate array module is further configured to return the response packet to the sender of the second memory access request;
and the page table enabling module is also used for unlocking the target memory page table.
The memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receive a third memory access request sent by devices in other nodes; parse the third memory access request to determine the requested target memory page table; judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the third memory access request to the memory expansion module mounted on the field programmable gate array module; and unlock the target memory page table after the memory expansion module responds to the third memory access request; wherein the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphics processors in the other nodes.
The field programmable gate array module comprises a network optical module, a remote direct memory access protocol stack based on Ethernet, a computing fast link, a direct memory access controller and a page table enabling module;
the network optical module is used for receiving a third memory access request sent by equipment in other nodes;
The Ethernet-based remote direct memory access protocol stack, the computing fast link, and the direct memory access controller are configured to parse the third memory access request to determine the requested target memory page table;
The page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if yes, locking the target memory page table;
The computing fast link and direct memory access controller are further configured to send the third memory access request to the memory expansion module mounted on the field programmable gate array module;
And the page table enabling module is further used for unlocking the target memory page table after the memory expansion module responds to the third memory access request.
Wherein the graphics processor includes a high bandwidth memory.
The memory expansion system is applied to model training, training data are divided into N training sub-data, and the N training sub-data are respectively stored in first memory areas in memory expansion modules mounted on N field programmable gate array modules;
The nth graphics processor reads target training data from the first memory area of the memory expansion module mounted on the nth field programmable gate array module and from the second memory areas of the memory expansion modules mounted on the other field programmable gate array modules connected with it, performs model training based on the target training data, and stores the processed data to the second memory area of the memory expansion module mounted on the nth field programmable gate array module.
The N training sub-data are respectively stored into the first memory areas of the memory expansion modules mounted on the N field programmable gate array modules through the peripheral component interconnect express bus interface or the network optical module.
And the field programmable gate array module and the graphic processor perform data transmission through the DMA controller.
The N graphics processors share the high-bandwidth memory in the graphics processors through the ring connection.
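One step of this ring training pipeline can be sketched as follows (a toy model: the `first_regions`/`second_regions` lists, the single-neighbour choice, and `train_fn` are illustrative assumptions, not the patent's training algorithm):

```python
def ring_training_step(n: int, N: int, first_regions, second_regions, train_fn):
    """GPU n (0-based here) reads its raw shard from the first region of
    FPGA n plus previously processed data from a connected neighbour's
    second region, trains on both, and writes its result to its own
    second region for the next GPU in the ring to consume."""
    left = (n - 1) % N                        # a neighbouring FPGA module
    inputs = [first_regions[n]]
    if second_regions[left] is not None:      # neighbour already produced output
        inputs.append(second_regions[left])
    second_regions[n] = train_fn(inputs)
    return second_regions[n]

# Four GPUs, training data split into four shards:
first = [[1], [2], [3], [4]]
second = [None] * 4
for n in range(4):
    ring_training_step(n, 4, first, second,
                       train_fn=lambda xs: [sum(sum(x) for x in xs)])
# second -> [[1], [3], [6], [10]]: each GPU folds in its neighbour's output
```

The point of the sketch is the data flow, not the arithmetic: each GPU's output lands where exactly one downstream GPU will read it, so the shards circulate around the ring without a central staging buffer.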
In order to achieve the above object, the present invention provides a memory access method applied to a field programmable gate array module in the memory expansion system, the method comprising:
Receiving a memory access request; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
And analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result so that the memory expansion module responds to the memory access request.
The memory expansion system is deployed in a target node, and the receiving the memory access request includes:
Receiving a first memory access request sent by a processor in the target node;
Correspondingly, parsing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
Analyzing the first memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the first memory access request to a memory expansion module mounted on the field programmable gate array module;
Correspondingly, after the memory expansion module responds to the first memory access request, the method further comprises:
And unlocking the target memory page table.
The memory expansion system is deployed in a target node, and the receiving the memory access request includes:
Receiving a second memory access request sent by a graphic processor and/or other field programmable gate array modules in the target node;
Correspondingly, parsing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
Analyzing the second memory access request, and converting the address of a target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request;
Analyzing the converted second memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the converted second memory access request to a memory expansion module mounted on the field programmable gate array module;
Correspondingly, after the memory expansion module responds to the second memory access request, the method further comprises:
Receiving the response data after the memory expansion module responds to the converted second memory access request, constructing a response data packet conforming to the format corresponding to the target memory page table based on the response data, returning the response data packet to the sender of the second memory access request, and unlocking the target memory page table.
The memory expansion system is deployed in a target node, and the receiving the memory access request includes:
Receiving a third memory access request sent by equipment in other nodes; the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphic processors in the other nodes;
Correspondingly, parsing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
analyzing the third memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the third memory access request to a memory expansion module mounted on the field programmable gate array module;
correspondingly, after the memory expansion module responds to the third memory access request, the method further comprises:
And unlocking the target memory page table.
To achieve the above object, the present invention provides an electronic device including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the memory access method when executing the computer program.
To achieve the above object, the present invention provides a non-volatile storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the memory access method as described above.
To achieve the above object, the present invention provides a computer program product comprising a computer program which, when being executed by a processor, implements the steps of a memory access method as described above.
According to the above scheme, the memory expansion system comprises N graphics processors and N field programmable gate array modules, wherein memory expansion modules are mounted on the field programmable gate array modules, the N graphics processors are connected in a ring, the N field programmable gate array modules are connected in a ring, each graphics processor is connected with k field programmable gate array modules, and the nth graphics processor is connected with the mth field programmable gate array module, where 2 ≤ k ≤ N and 1 ≤ n ≤ N; when n ≥ k, m ranges over [n-k+1, n], and when n < k, m ranges over [1, n] ∪ [N+n-k+1, N]. The field programmable gate array module is configured to receive a memory access request, parse the memory access request to obtain an analysis result, and send the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result; the memory access request comprises memory access requests sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module. The memory expansion module is configured to respond to the memory access request. The memory expansion module comprises a first memory area and a second memory area: the nth graphics processor reads data from the first memory area of the memory expansion module mounted on the nth field programmable gate array module and from the second memory areas of the memory expansion modules mounted on the other field programmable gate array modules connected with it, and stores the processed data to the second memory area of the memory expansion module mounted on the nth field programmable gate array module.
The invention has the following beneficial effects. In the memory expansion system provided by the invention, the GPUs and the FPGA (Field Programmable Gate Array) modules form a multi-path ring network communication topology: a GPU can directly access the memory expansion modules mounted on the FPGA modules connected with it, and the processor and other FPGA modules can also access the memory expansion module mounted on the current FPGA module. This effectively expands the GPU's computing memory, reduces communication bottlenecks, increases GPU computing resource utilization, and improves GPU processing performance. Furthermore, each GPU cyclically processes data stored in the memory expansion modules along the link ring formed by the FPGA modules, making full use of the interconnection between the GPUs and the FPGAs to pipeline data processing, so that the local memory of a single GPU is expanded, the amount of data a single GPU can process is increased, and the processing performance of multiple GPUs is further improved. The invention also discloses a memory access device, an electronic device, a non-volatile storage medium, and a computer program product, which achieve the above technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1 is a block diagram of a memory access system according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an FPGA module according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating the data flow of functional modules in an FPGA module according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a first memory access method according to an exemplary embodiment;
FIG. 5 is a flow chart illustrating a second memory access method according to an exemplary embodiment;
FIG. 6 is a flow chart illustrating a third memory access method according to an example embodiment;
FIG. 7 is a flowchart illustrating a fourth memory access method according to an example embodiment;
Fig. 8 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. In addition, in the embodiments of the present invention, "first", "second", etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence.
The embodiment of the invention discloses a memory expansion system comprising N graphics processors and N field programmable gate array modules, wherein memory expansion modules are mounted on the field programmable gate array modules, the N graphics processors are connected in a ring, the N field programmable gate array modules are connected in a ring, each graphics processor is connected with k field programmable gate array modules, and the nth graphics processor is connected with the mth field programmable gate array module, where 2 ≤ k ≤ N and 1 ≤ n ≤ N; when n ≥ k, m ranges over [n-k+1, n], and when n < k, m ranges over [1, n] ∪ [N+n-k+1, N];
The field programmable gate array module is used for receiving a memory access request, analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
the memory expansion module is used for responding to the memory access request.
The memory expansion system in the embodiment can be applied to a server, wherein the server comprises a plurality of nodes, and each node is provided with a set of memory expansion system. The memory expansion system comprises N GPUs and N FPGA modules, wherein the GPUs comprise HBMs (High Bandwidth Memory, high-bandwidth memories), and each FPGA module is provided with a memory expansion module.
As an optional implementation, the field programmable gate array module mounts the memory expansion module through an open memory interface and sends the memory access request to the mounted memory expansion module through the open memory interface according to the analysis result; and/or a double data rate memory is mounted on the field programmable gate array module in the form of a dual inline memory module, and the field programmable gate array module sends the memory access request to the double data rate memory according to the analysis result so that the double data rate memory responds to the memory access request.
The memory expansion module in this embodiment may include a memory expansion module connected to the FPGA module through an open memory interface (Open Memory Interface, OMI), a double data rate memory (Double Data Rate SDRAM, DDR), and so on. DDR memory is sampled on both the rising and falling clock edges and therefore transfers data twice per clock cycle; it usually takes the form of a DIMM (Dual Inline Memory Module) that concentrates multiple memory granules on one circuit board with address, data, and control buses, and can be used in a server host or in various accelerator cards. OMI is an open-source memory bus interface using a high-speed serial transmission link; it defines only the physical-layer specification, leaving the link protocol layer out of scope, and a high-speed serial-parallel conversion controller and buffer are integrated on the circuit board where the memory granules are located. The OMI memory expansion module is connected with the field programmable gate array module through an OMI interface over a high-speed differential link, giving strong interference resistance and stable signal transmission. A single channel of the memory expansion module can run at signal frequencies in the GHz range with a data rate of 25.6 GB/s or even higher, so the total transmission bandwidth is high; multiple memory controller channels are supported, so the total bandwidth of the expanded memory is higher still and the total capacity larger, up to 256 GB per module. By using multiple OMI memory expansion modules to form a multi-channel expanded memory, the total expansion capacity can easily reach the TB level.
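The bandwidth and capacity figures above scale linearly with the channel count; a quick back-of-the-envelope helper (the 25.6 GB/s per channel and 256 GB per module are the figures quoted above; the function itself is purely illustrative):

```python
def omi_totals(channels: int, gbps_per_channel: float = 25.6,
               gb_per_module: int = 256) -> tuple[float, int]:
    """Aggregate bandwidth (GB/s) and capacity (GB) when each OMI channel
    carries one expansion module at the quoted per-channel figures."""
    return channels * gbps_per_channel, channels * gb_per_module

# Four channels: 102.4 GB/s aggregate and 1024 GB (1 TB) of expanded memory.
```

At four channels the expansion already reaches the TB level the text mentions, consistent with the claim that multi-channel configurations reach it "easily".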
In this embodiment, each GPU is interconnected with its left and right adjacent GPUs, that is, the N GPUs are connected in a ring; each FPGA module is interconnected with its left and right adjacent FPGA modules, that is, the N FPGA modules are connected in a ring. Each graphics processor is connected to k field programmable gate array modules (2 ≤ k ≤ N), and the nth graphics processor (1 ≤ n ≤ N) is connected to the mth field programmable gate array modules, where when n is not less than k the value range of m is [n-k+1, n], and when n is less than k the value range of m is [1, n] ∪ [n+N-k+1, N]. Taking k=2 as an example, each GPU is interconnected with its two adjacent FPGA modules, that is, the first GPU is connected with the first FPGA module and the last FPGA module, and the nth GPU is connected with the nth FPGA module and the (n-1)th FPGA module; the connection topology diagram is shown in fig. 1.
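The connection rule above can be sketched as a small helper function; treating the indices as 1-based positions on a ring of N modules is an illustrative reading of the ranges given in the text:

```python
# Illustrative sketch: GPU n (1-indexed) connects to k consecutive FPGA
# modules ending at index n, wrapping around the ring of N modules.

def connected_fpgas(n: int, k: int, N: int) -> set[int]:
    """FPGA module indices (1-indexed) connected to GPU n."""
    # modules n, n-1, ..., n-k+1, taken modulo N on the ring
    return {((n - 1 - i) % N) + 1 for i in range(k)}

# k=2 example from the text: GPU 1 connects to FPGA 1 and the last FPGA,
# GPU n connects to FPGA n and FPGA n-1.
print(connected_fpgas(1, 2, 4))  # {1, 4}
print(connected_fpgas(3, 2, 4))  # {2, 3}
```

For n ≥ k this reproduces the range [n-k+1, n], and for n < k it wraps to [1, n] ∪ [n+N-k+1, N], consistent with the k=2 example.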
Each of the GPU and the FPGA module supports a plurality of high-speed channelized chip-to-chip (Interlaken) interfaces and Compute Express Link (CXL) interfaces or Peripheral Component Interconnect Express (PCIe) bus interfaces, and the different GPUs, the different FPGA modules, and the GPUs and FPGA modules are connected through Interlaken. The GPU in one node can access the memory of the GPUs connected to it, can access the memory expansion modules mounted on the FPGA modules connected to it, and can also access the memory expansion modules mounted on FPGA modules in other nodes; that is, the memory accessible to a single GPU is increased, so that the calculation efficiency of the single GPU is improved. The FPGA module can offload part of the GPU's DMA (Direct Memory Access) data movement workload, reducing the occupation of GPU computing and scheduling resources.
The structure schematic diagram of the FPGA module is shown in fig. 2, and includes a field programmable gate array chip, a network optical module, a Compute Express Link interface or Peripheral Component Interconnect Express bus interface, a multi-channel high-speed channelized chip-to-chip interface, an MCU (Microcontroller Unit), a memory expansion module connected through an open memory interface, a double data rate static random access memory, and a power module. The MCU is used for monitoring states such as the voltage and temperature of the whole field programmable gate array module and sending out alarm information when any state index exceeds its threshold.
The memory expansion module comprises a first memory area and a second memory area. The nth graphics processor reads data from the first memory area in the memory expansion module mounted on the nth field programmable gate array module and the second memory areas in the memory expansion modules mounted on the other field programmable gate array modules connected with the nth graphics processor, and stores the processed data into the second memory area in the memory expansion module mounted on the nth field programmable gate array module.
Taking k=2 and the 2nd GPU as an example, the data flow is shown by the dotted line in fig. 1: the 2nd GPU simultaneously reads the data of the second memory area in the memory expansion module mounted on the 1st FPGA module (already processed by the 1st GPU) and the data of the first memory area in the memory expansion module mounted on the 2nd FPGA module, processes the data of the two memory areas, writes the result into the second memory area in the memory expansion module mounted on the 2nd FPGA module, and so on, cyclically processing the data stored on the link ring.
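The read-process-write step above can be sketched as follows; the two memory areas are modelled as plain Python lists and `process` is a placeholder for the GPU's computation:

```python
# Illustrative sketch of one step of the cyclic data flow: GPU n reads the
# previous module's second (processed) area plus its own module's first
# (input) area, processes both, and stores the result in its own second area.

def ring_step(first, second, n, process):
    """One pipeline step for GPU n (0-indexed) over N = len(first) modules."""
    N = len(first)
    prev = (n - 1) % N
    data = second[prev] + first[n]   # read both source areas
    second[n] = process(data)        # store result in own second area

first = [["a0"], ["a1"], ["a2"]]    # first memory areas (input data)
second = [["s0"], ["s1"], ["s2"]]   # second memory areas (processed data)
ring_step(first, second, 1, lambda d: [x.upper() for x in d])
print(second[1])  # ['S0', 'A1']
```

Running the step for each GPU in turn circulates the data around the link ring, as the text describes.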
The data flow diagram of each functional module in the FPGA module is shown in fig. 3. The Compute Express Link interface or Peripheral Component Interconnect Express bus interface is used for sending and receiving TLP (Transaction Layer Packet) messages exchanged with a CPU (Central Processing Unit) in the node; the Compute Express Link and direct memory access controller can mount the extended multi-channel memory controller, and the DMA controller can realize data movement between the CPU and the FPGA module and between FPGA modules; the Ethernet-based RDMA (Remote Direct Memory Access) protocol stack is responsible for processing DMA requests that access the extended memory from the network; the multi-channel Interlaken IP is connected with each Interlaken interface and is used for receiving and sending Interlaken-based protocol data packets; the inter-card memory access request parsing module is responsible for parsing command requests for reading and writing the extended memory; the inter-card memory access response packet grouping module is responsible for grouping the response data returned by read requests; and the page table conversion module is responsible for converting between page table IDs and the corresponding 64-bit address buses in the system. The page table enabling module is responsible for monitoring the read-write status and authority of the enabled page table blocks, and the network optical module, which can be a 400G network optical module, is responsible for receiving and transmitting data packets for remote access to the local expanded memory.
After the system is powered on, the FPGA module automatically runs an extended-memory self-checking program and uploads the self-checking result through the MCU; the host CPU allocates an address space for the extended memory, registers the extended memory as an independent NUMA (Non-Uniform Memory Access) node, initializes the page table ID attribute table, and so on.
When the system is powered on and initialized, the page table conversion module and the page table enabling module divide the page table according to the allocated physical address of the corresponding extended memory in a preset unit (for example, 4 KB). Each page table has states and operations such as read, write, enable and lock, and this information can be stored in an 8-bit dual-port RAM (Random Access Memory) addressed by the page table ID, namely the page table ID attribute table, defined in detail in Table 1.
TABLE 1
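Since Table 1 is not reproduced here, the following sketch assumes an illustrative bit layout for the 8-bit page table ID attribute entries (read, write, enable and lock flags packed into one byte per page table ID):

```python
# Illustrative sketch: one 8-bit attribute entry per page table ID, held in
# a byte array standing in for the dual-port RAM. The bit positions below
# are assumptions; the actual layout is defined by Table 1 of the patent.

READ_OK, WRITE_OK, ENABLED, LOCKED = 0x01, 0x02, 0x04, 0x08

class PageTableAttrs:
    def __init__(self, num_pages: int):
        self.ram = bytearray(num_pages)   # 8-bit entry per page table ID

    def enable(self, pid: int):
        self.ram[pid] |= ENABLED | READ_OK | WRITE_OK

    def try_lock(self, pid: int) -> bool:
        """Lock the page table iff it is enabled and not already locked."""
        e = self.ram[pid]
        if (e & ENABLED) and not (e & LOCKED):
            self.ram[pid] = e | LOCKED
            return True
        return False

    def unlock(self, pid: int):
        self.ram[pid] &= ~LOCKED & 0xFF

attrs = PageTableAttrs(1024)
attrs.enable(5)
print(attrs.try_lock(5), attrs.try_lock(5))  # True False
attrs.unlock(5)
print(attrs.try_lock(5))                     # True
```

The enable-and-unlocked precondition in `try_lock` mirrors the access checks described for the memory access flows below.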
The communication frames exchanged between FPGA modules, and between GPUs and FPGA modules, through the Interlaken interface can include the source ID and destination ID of the board card, the request/response frame type, the requested page table ID, the packet sequence number and the payload data; the specific format is shown in Table 2.
TABLE 2
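Since Table 2 is likewise not reproduced, the following sketch assumes illustrative field widths (8-bit IDs and frame type, 16-bit page table ID, 32-bit sequence number) for packing and unpacking such a frame:

```python
# Illustrative sketch of the Interlaken communication frame: field widths
# are assumptions, not the actual format from Table 2 of the patent.
import struct

HEADER = struct.Struct(">BBBHI")  # src ID, dst ID, frame type, page table ID, seq

def pack_frame(src, dst, ftype, page_id, seq, payload: bytes) -> bytes:
    return HEADER.pack(src, dst, ftype, page_id, seq) + payload

def unpack_frame(frame: bytes):
    src, dst, ftype, page_id, seq = HEADER.unpack_from(frame)
    return src, dst, ftype, page_id, seq, frame[HEADER.size:]

f = pack_frame(1, 2, 0x01, 0x0042, 7, b"data")
print(unpack_frame(f))  # (1, 2, 1, 66, 7, b'data')
```

A fixed header followed by variable payload keeps the parse logic on the FPGA side simple, which is consistent with the request-parsing and response-grouping modules described earlier.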
As a possible implementation manner, the memory expansion system is deployed in the target node, and the field programmable gate array module is specifically configured to: receive a first memory access request sent by a processor in the target node, parse the first memory access request to determine the requested target memory page table, judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the first memory access request to the memory expansion module mounted on the field programmable gate array module; and unlock the target memory page table after the memory expansion module responds to the first memory access request.
In a specific implementation, a processor in the target node sends a first memory access request to the FPGA module through the CXL interface or the PCIe interface. After the Compute Express Link and direct memory access controller parses the first memory access request, the attribute of the requested target memory page table is checked; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked, a read-write memory operation command is sent to the memory expansion module, the read response data is returned to the processor along the original path, and the page table state is updated to unlocked.
Referring to FIG. 3, the field programmable gate array module includes a Compute Express Link interface or Peripheral Component Interconnect Express bus interface, a Compute Express Link and direct memory access controller, and a page table enabling module. The Compute Express Link interface or Peripheral Component Interconnect Express bus interface is used for receiving the first memory access request sent by the processor in the target node; the Compute Express Link and direct memory access controller is used for parsing the first memory access request to determine the requested target memory page table; the page table enabling module is used for judging whether the attribute of the target memory page table meets the preset condition and, if so, locking the target memory page table; the Compute Express Link and direct memory access controller is further used for sending the first memory access request to the memory expansion module mounted on the field programmable gate array module; and the page table enabling module is further configured to unlock the target memory page table after the memory expansion module responds to the first memory access request. The data flow of the intra-node processor accessing the FPGA module follows the sequence 1, 10, 11, 10, 2, 1, 10 and 11 in fig. 3.
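The lock-access-unlock sequence above can be sketched as follows; the expansion module is modelled as a plain dict and all names are illustrative:

```python
# Illustrative sketch of handling a first memory access request: check the
# page table attributes, lock, forward to the expansion module, return the
# response, unlock. The dict `memory` stands in for the expansion module.

def handle_first_request(req, page_state, memory):
    """req = (op, page_id, addr, value); page_state maps page_id ->
    {'enabled': bool, 'locked': bool}. Returns the response or 'reject'."""
    op, page_id, addr, value = req
    st = page_state.get(page_id)
    if st is None or not st["enabled"] or st["locked"]:
        return "reject"                 # preset condition not met
    st["locked"] = True                 # lock the target page table
    try:
        if op == "read":
            resp = memory.get(addr)     # expansion module responds
        else:
            memory[addr] = value
            resp = "ok"
    finally:
        st["locked"] = False            # unlock after the response
    return resp

state = {7: {"enabled": True, "locked": False}}
mem = {}
print(handle_first_request(("write", 7, 0x100, 42), state, mem))  # ok
print(handle_first_request(("read", 7, 0x100, None), state, mem)) # 42
```

The lock held for the duration of the access serialises concurrent requests to the same page table, matching the judge-lock-respond-unlock order in the text.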
As a possible implementation manner, the memory expansion system is deployed in the target node, and the field programmable gate array module is specifically configured to: receive a second memory access request sent by a graphics processor and/or other field programmable gate array modules in the target node, parse the second memory access request, and convert the address of the target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request; parse the converted second memory access request to determine the requested target memory page table, judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the converted second memory access request to the memory expansion module mounted on the field programmable gate array module; and receive the response data of the memory expansion module responding to the converted second memory access request, construct a response data packet conforming to the format corresponding to the target memory page table based on the response data, return the response data packet to the sender of the second memory access request, and unlock the target memory page table.
In a specific implementation, when a GPU or another FPGA module connected through an Interlaken interface accesses the memory expansion module mounted on the current FPGA module, the multi-channel Interlaken interface first receives the second memory access request, the specific command is parsed by the inter-card memory access request analysis module, and the page table conversion module converts the request into the physical address corresponding to the target memory page table. The converted second memory access request is sent to the Compute Express Link and direct memory access controller for parsing, and the enable state and status of the target memory page table are judged; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked and access to the memory expansion module is initiated. After the DMA controller completes the read, the read response data is returned to the inter-card memory access response packet module, converted into a response data packet conforming to the format corresponding to the target memory page table, sent through the Interlaken interface back to the sender, and the page table state is updated to unlocked.
Referring to fig. 3, the high-speed channelized chip-to-chip interface in the field programmable gate array module is used for receiving the second memory access request sent by a graphics processor and/or other field programmable gate array modules in the target node; the inter-card memory access request analysis module is used for parsing the second memory access request; the page table conversion module is used for converting the address of the target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request; the Compute Express Link and direct memory access controller is used for parsing the converted second memory access request to determine the requested target memory page table; the page table enabling module is used for judging whether the attribute of the target memory page table meets the preset condition and, if so, locking the target memory page table; the Compute Express Link and direct memory access controller is further used for sending the converted second memory access request to the memory expansion module mounted on the field programmable gate array module; the inter-card memory access response packet module is used for receiving the response data of the memory expansion module responding to the converted second memory access request and constructing, based on the response data, a response data packet conforming to the format corresponding to the target memory page table; the high-speed channelized chip-to-chip interface in the field programmable gate array module is further used for returning the response data packet to the sender of the second memory access request; and the page table enabling module is further used for unlocking the target memory page table. The data flow of a GPU or another FPGA module in the node accessing the current FPGA module follows the sequence 3, 4, 5, 10, 11, 10, 2, 6, 7, 8, 10, 11 in fig. 3.
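The page table conversion step in this flow (page table ID plus in-page offset to a 64-bit physical address) can be sketched as follows; the 4 KB page unit comes from the text, while the extended-memory base address is an assumption:

```python
# Illustrative sketch of the page table conversion module's address
# translation. PAGE_SIZE follows the 4 KB preset unit in the text; the
# base address of the extended memory region is an assumed value.

PAGE_SIZE = 4 * 1024                    # preset unit from the text (4 KB)
EXT_MEM_BASE = 0x4000_0000_0000         # assumed base of the extended memory

def translate(page_id: int, offset: int) -> int:
    """Convert (page table ID, in-page offset) to a 64-bit physical address."""
    assert 0 <= offset < PAGE_SIZE
    return EXT_MEM_BASE + page_id * PAGE_SIZE + offset

print(hex(translate(3, 0x10)))  # 0x400000003010
```

Because the page size is a power of two, the hardware can implement this as a shift-and-concatenate rather than a multiply.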
As a possible implementation manner, the memory expansion system is deployed in the target node, and the field programmable gate array module is specifically configured to: receive a third memory access request sent by devices in other nodes, parse the third memory access request to determine the requested target memory page table, judge whether the attribute of the target memory page table meets a preset condition, and if so, lock the target memory page table and send the third memory access request to the memory expansion module mounted on the field programmable gate array module; and unlock the target memory page table after the memory expansion module responds to the third memory access request; wherein the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphics processors in the other nodes.
In a specific implementation, a device outside the node sends a third memory access request to the FPGA module in the target node through the network optical module. After the Ethernet-based remote direct memory access protocol stack and the Compute Express Link and direct memory access controller parse the request, the attribute of the requested target memory page table is checked; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked, a read-write memory operation command is sent to the memory expansion module, the read response data is returned to the requesting device outside the node, and the page table state is updated to unlocked.
Referring to fig. 3, the field programmable gate array module includes a network optical module, an Ethernet-based remote direct memory access protocol stack, a Compute Express Link and direct memory access controller, and a page table enabling module. The network optical module is used for receiving the third memory access request sent by devices in other nodes; the Ethernet-based remote direct memory access protocol stack and the Compute Express Link and direct memory access controller are used for parsing the third memory access request to determine the requested target memory page table; the page table enabling module is used for judging whether the attribute of the target memory page table meets the preset condition and, if so, locking the target memory page table; the Compute Express Link and direct memory access controller is further used for sending the third memory access request to the memory expansion module mounted on the field programmable gate array module; and the page table enabling module is further configured to unlock the target memory page table after the memory expansion module responds to the third memory access request. The data flow of a device outside the node (including a processor, GPU or FPGA module outside the node) accessing the current FPGA module follows the sequence 9, 4, 10, 11, 10, 2, 4, 9, 10 and 11 in fig. 3.
The FPGA module in this embodiment may access a RoCEv2 (RDMA over Converged Ethernet version 2) network or standard Ethernet through the network optical module and communicate with other computing nodes or storage nodes, and the GPU may communicate with resources outside the node through the directly connected FPGA, without the conventional scheme's need to connect to an RDMA network card through a PCIe switch, thereby improving the data transmission efficiency between nodes.
According to the memory expansion system provided by the embodiment of the invention, a multi-path ring network communication topology is formed between the GPUs and the FPGA modules, and a GPU can directly access the memory expansion modules mounted on the FPGA modules connected to it; in addition, the processor and other FPGA modules can also access the memory expansion module mounted on the current FPGA module, so that the GPU computing memory is effectively expanded, communication bottlenecks are reduced, the GPU computing resource utilization rate is increased, and the processing performance of the GPU is improved. Furthermore, each GPU cyclically processes the data stored in the memory expansion modules on the link ring formed by the FPGA modules, and the interconnection between the GPUs and FPGAs can be fully utilized to process data in a pipelined manner, so that the local memory of a single GPU is expanded, the amount of data processed by a single GPU is increased, and the processing performance of multiple GPUs is further improved.
The memory expansion system provided in the above embodiment may be applied to model training, where the training data is divided into N training sub-data, and the N training sub-data are respectively stored in the first memory areas of the memory expansion modules mounted on the N field programmable gate array modules.
As a possible implementation manner, the N training sub-data are respectively stored into the first memory areas in the memory expansion modules mounted by the N field programmable gate array modules through the peripheral component interconnect express bus interface or the network optical module.
As a possible implementation, the N graphics processors share the high bandwidth memory in the graphics processors through the ring connection. In a specific implementation, when the local memory of the GPU is sufficient, multiple GPUs access their local memory, and the GPUs can still transmit data over the ring channels interconnecting the GPUs, with the GPU's internal control logic and DMA controller responsible for the memory reads, writes and data movement of each node. That is, multiple GPUs may communicate using the Ring Allreduce algorithm, and direct sharing of HBM (High Bandwidth Memory) is limited to adjacent GPU cards.
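The Ring Allreduce pattern mentioned above can be sketched as follows; each GPU's data is chunk-partitioned and only neighbour-to-neighbour transfers are used, which is the property that lets the ring channels carry all traffic:

```python
# Illustrative sketch of Ring Allreduce over N GPUs: a scatter-reduce phase
# followed by an allgather phase, each using N-1 neighbour-only transfers.
# data[g][c] is GPU g's value for chunk c (one scalar per chunk here).

def ring_allreduce(data):
    """Return N identical lists of per-chunk sums."""
    N = len(data)
    buf = [list(row) for row in data]
    # scatter-reduce: pass partial sums around the ring N-1 times
    for s in range(N - 1):
        sends = [(g, (g - s) % N, buf[g][(g - s) % N]) for g in range(N)]
        for g, c, v in sends:
            buf[(g + 1) % N][c] += v
    # now GPU g holds the complete sum of chunk (g + 1) % N
    # allgather: circulate the completed chunks N-1 times
    for s in range(N - 1):
        sends = [(g, (g + 1 - s) % N, buf[g][(g + 1 - s) % N]) for g in range(N)]
        for g, c, v in sends:
            buf[(g + 1) % N][c] = v
    return buf

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

Every transfer goes only from a GPU to its ring successor, so the algorithm maps directly onto the adjacent-card interconnect described in the text.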
As a possible implementation manner, the field programmable gate array module and the graphics processor perform data transmission through the DMA controller. In a specific implementation, when the local memory of the GPU is insufficient, the GPU accesses the memory expansion module mounted on the FPGA module connected to it; the FPGA module can receive data movement descriptor information (including information such as a source address, a destination address and a length) from the GPU and use the DMA controller to actively perform memory data transmission between the FPGA module and the GPU, so as to release GPU computing and control resources and let the GPU concentrate more on data computation.
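The descriptor-driven DMA offload above can be sketched as follows; the descriptor fields (source address, destination address, length) come from the text, and the flat bytearray standing in for memory is an illustration:

```python
# Illustrative sketch of the data movement descriptor the GPU hands to the
# FPGA's DMA controller; the DMA engine is modelled as a copy over a flat
# bytearray, so no GPU compute resources are involved in the move itself.
from dataclasses import dataclass

@dataclass
class DmaDescriptor:
    src: int     # source address
    dst: int     # destination address
    length: int  # bytes to move

def dma_execute(mem: bytearray, d: DmaDescriptor):
    """Offloaded move: the FPGA copies the region described by the descriptor."""
    mem[d.dst:d.dst + d.length] = mem[d.src:d.src + d.length]

mem = bytearray(b"hello" + b"." * 11)
dma_execute(mem, DmaDescriptor(src=0, dst=8, length=5))
print(mem)  # bytearray(b'hello...hello...')
```

The GPU only builds and submits the descriptor; the copy itself is carried out by the DMA engine, which is the offload the text describes.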
The FPGA modules can also receive commands sent over PCIe by a host CPU in the node or a host CPU outside the node, and initiate transmission of the extended memory data among the FPGA modules according to the control logic of fig. 3, based on information such as the source address, destination address and length in the descriptor command; the transmission paths are shown as (9, 4), 10, 7 and 8 in fig. 3.
The nth graphics processor reads target training data from the first memory area in the memory expansion module mounted on the nth field programmable gate array module and the second memory areas in the memory expansion modules mounted on the other field programmable gate array modules connected with the nth graphics processor, performs model training based on the target training data, and stores the processed data into the second memory area in the memory expansion module mounted on the nth field programmable gate array module.
In a specific implementation, taking k=2 and GPU2 as an example, GPU2 simultaneously reads the data of the second memory area in the memory expansion module mounted on the FPGA1 module (already processed by GPU1) and the data of the first memory area in the memory expansion module mounted on the FPGA2 module, processes the data of the two memory areas, writes the result into the second memory area in the memory expansion module mounted on the FPGA2 module, and so on, cyclically processing the data stored on the link ring, so that the inter-card Interlaken interconnection channels can be fully utilized to process data in a pipelined manner.
Therefore, the local memory of the GPU is expanded through the memory expansion system, so that the data volume processed by a single GPU can be increased, the quantity of the GPUs required by the whole model training is reduced, the total communication time and the total communication quantity are further reduced, and the model training efficiency is improved.
The embodiment discloses a memory access method, referring to fig. 4, a flowchart of a first memory access method according to an exemplary embodiment is shown, as shown in fig. 4, including:
S101: receiving a memory access request; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
S102: and analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result so that the memory expansion module responds to the memory access request.
The execution body of the embodiment is a field programmable gate array module in the memory expansion system. In a specific implementation, the field programmable gate array module is located in a target node, and a processor, a graphics processor and other field programmable gate array modules connected with the field programmable gate array module in the target node can access the memory expansion module mounted on the field programmable gate array module, and a processor, a graphics processor and the field programmable gate array module in other nodes can also access the memory expansion module mounted on the field programmable gate array module.
Therefore, in this embodiment, the GPU may directly access the memory expansion module mounted on the FPGA module connected to the GPU, and in addition, the processor and other FPGA modules may also access the memory expansion module mounted on the current FPGA module, so as to effectively expand the GPU computing memory, reduce the communication bottleneck, increase the utilization rate of GPU computing resources, and improve the processing performance of the GPU.
The present embodiment discloses a memory access method, referring to fig. 5, and a flowchart of a second memory access method according to an exemplary embodiment is shown, as shown in fig. 5, including:
S201: receiving a first memory access request sent by a processor in a target node;
S202: analyzing the first memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if yes, locking the target memory page table, and sending the first memory access request to a memory expansion module mounted on the field programmable gate array module so that the memory expansion module responds to the first memory access request;
S203: and unlocking the target memory page table after the memory expansion module responds to the first memory access request.
In a specific implementation, a processor in the target node sends a first memory access request to the FPGA module through the CXL interface or the PCIe interface. After the Compute Express Link and direct memory access controller parses the first memory access request, the attribute of the requested target memory page table is checked; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked, a read-write memory operation command is sent to the memory expansion module, the read response data is returned to the processor along the original path, and the page table state is updated to unlocked.
The present embodiment discloses a memory access method, referring to fig. 6, and a flowchart of a third memory access method according to an exemplary embodiment is shown, as shown in fig. 6, including:
S301: receiving a second memory access request sent by a graphics processor and/or other field programmable gate array modules in the target node;
S302: analyzing the second memory access request, and converting the address of a target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request;
S303: analyzing the converted second memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if yes, locking the target memory page table, and sending the converted second memory access request to a memory expansion module mounted on the field programmable gate array module so that the memory expansion module responds to the second memory access request;
S304: after the memory expansion module responds to the second memory access request, response data of the memory expansion module responding to the converted second memory access request is received, a response data packet conforming to a format corresponding to the target memory page table is constructed based on the response data, the response data packet is returned to a sender of the second memory access request, and the target memory page table is unlocked.
In a specific implementation, when a GPU or another FPGA module connected through an Interlaken interface accesses the memory expansion module mounted on the current FPGA module, the multi-channel Interlaken interface first receives the second memory access request, the specific command is parsed by the inter-card memory access request analysis module, and the page table conversion module converts the request into the physical address corresponding to the target memory page table. The converted second memory access request is sent to the Compute Express Link and direct memory access controller for parsing, and the enable state and status of the target memory page table are judged; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked and access to the memory expansion module is initiated. After the DMA controller completes the read, the read response data is returned to the inter-card memory access response packet module, converted into a response data packet conforming to the format corresponding to the target memory page table, sent through the Interlaken interface back to the sender, and the page table state is updated to unlocked.
The present embodiment discloses a memory access method, referring to fig. 7, and a flowchart of a fourth memory access method according to an exemplary embodiment is shown, as shown in fig. 7, including:
S401: receiving a third memory access request sent by devices in other nodes; the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphics processors in the other nodes;
S402: analyzing the third memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if yes, locking the target memory page table, and sending the third memory access request to a memory expansion module mounted on the field programmable gate array module so that the memory expansion module responds to the third memory access request;
S403: and unlocking the target memory page table after the memory expansion module responds to the third memory access request.
In a specific implementation, a device outside the node sends a third memory access request to the FPGA module in the target node through the network optical module. After the Ethernet-based remote direct memory access protocol stack and the Compute Express Link and direct memory access controller parse the request, the attribute of the requested target memory page table is checked; when the target memory page table is enabled and its attribute is in the unlocked state, the page table state is updated to locked, a read-write memory operation command is sent to the memory expansion module, the read response data is returned to the requesting device outside the node, and the page table state is updated to unlocked.
Based on the hardware implementation of the program modules, and in order to implement the method according to the embodiment of the present invention, the embodiment of the present invention further provides an electronic device, and fig. 8 is a block diagram of an electronic device according to an exemplary embodiment, and as shown in fig. 8, the electronic device includes:
A communication interface 1 capable of information interaction with other devices such as network devices and the like;
And the processor 2 is connected with the communication interface 1 to realize information interaction with other devices and is used for executing the memory access method provided by one or more technical schemes when running the computer program. And the computer program is stored on the memory 3.
Of course, in practice, the various components in the electronic device are coupled together by a bus system 4. It will be appreciated that the bus system 4 is used to enable connected communications between these components. The bus system 4 comprises, in addition to a data bus, a power bus, a control bus and a status signal bus. But for clarity of illustration the various buses are labeled as bus system 4 in fig. 8.
The memory 3 in the embodiment of the present invention is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory 3 may be either volatile memory or non-volatile memory, and may include both volatile and non-volatile memory. The non-volatile memory may be, among others, a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 3 described in the embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 2. The processor 2 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 2 or by instructions in the form of software. The processor 2 described above may be a general purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The processor 2 may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present invention. The general purpose processor may be a microprocessor, or any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium in the memory 3, and the processor 2 reads the program in the memory 3 and completes the steps of the foregoing methods in combination with its hardware.
The corresponding flow in each method of the embodiments of the present invention is implemented when the processor 2 executes the program, and for brevity, will not be described in detail herein.
In an exemplary embodiment, the present invention also provides a non-volatile storage medium storing a computer program executable by the processor 2 to perform the steps of the aforementioned method.
In an exemplary embodiment, the present invention also provides a computer program product comprising a computer program to be executed by the processor 2 for performing the steps of the aforementioned method.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware related to computer program instructions; the computer program may be stored in a non-volatile storage medium, and when executed, performs the steps of the above method embodiments. Alternatively, if the above integrated units of the invention are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a non-volatile storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied essentially in the form of a software product stored in a non-volatile storage medium, including instructions for causing an electronic device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the methods described in the embodiments of the present invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention.
Claims (20)
1. A memory expansion system, characterized by comprising N graphics processors and N field programmable gate array modules, wherein a memory expansion module is mounted on each field programmable gate array module; the N graphics processors are connected in a ring, and the N field programmable gate array modules are connected in a ring; each graphics processor is connected with k field programmable gate array modules, and the nth graphics processor is connected with the mth field programmable gate array module, wherein k is greater than or equal to 2 and less than or equal to N, and n is greater than or equal to 1 and less than or equal to N; when n is greater than or equal to k, the value range of m is [n-k+1, n], and when n is less than k, the value range of m is [1, n] ∪ [N+n-k+1, N];
The field programmable gate array module is used for receiving a memory access request, analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
The memory expansion module is used for responding to the memory access request;
The memory expansion module comprises a first memory area and a second memory area; the nth graphics processor reads data from the first memory area of the memory expansion module mounted on the nth field programmable gate array module and from the second memory areas of the memory expansion modules mounted on the other field programmable gate array modules connected with the nth graphics processor, and stores the processed data to the second memory area of the memory expansion module mounted on the nth field programmable gate array module.
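The connection rule of claim 1 can be illustrated with a short Python sketch that computes, for the nth graphics processor, the indices m of its k connected field programmable gate array modules. The wrap-around interval used when n < k is a reconstruction of the claim's garbled range notation and should be read as an assumption, not authoritative claim text:

```python
def connected_fpga_indices(n, N, k):
    """Return the indices m of the FPGA modules connected to graphics
    processor n (all 1-indexed), per the ring topology of claim 1.
    When n < k the window wraps around the ring of N FPGA modules."""
    if not (2 <= k <= N and 1 <= n <= N):
        raise ValueError("require 2 <= k <= N and 1 <= n <= N")
    if n >= k:
        return list(range(n - k + 1, n + 1))
    # wrap-around case: [1, n] union [N + n - k + 1, N]
    return list(range(1, n + 1)) + list(range(N + n - k + 1, N + 1))
```

Under this reading every graphics processor sees exactly k memory expansion modules, and each FPGA module is in turn reachable from k graphics processors, which is what lets the second memory areas be shared around the ring.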
2. The memory expansion system of claim 1, wherein the field programmable gate array module mounts the memory expansion module through an open memory interface, and the field programmable gate array module sends the memory access request to the mounted memory expansion module through the open memory interface according to the analysis result;
And/or, a double data rate static random access memory is mounted on the field programmable gate array module in the form of a dual in-line memory module, and the field programmable gate array module sends the memory access request to the double data rate static random access memory according to the analysis result, so that the double data rate static random access memory responds to the memory access request.
3. The memory expansion system of claim 1, wherein the memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receiving a first memory access request sent by a processor in the target node, analyzing the first memory access request to determine the requested target memory page table, judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table and sending the first memory access request to the memory expansion module mounted on the field programmable gate array module; and unlocking the target memory page table after the memory expansion module responds to the first memory access request.
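The lock-while-serving flow of claim 3 can be sketched in a few lines of Python; the class and parameter names below are illustrative assumptions, not taken from the patent:

```python
import threading

class PageTableGate:
    """Minimal sketch of the claim-3 flow: check the target memory page
    table's attribute against a preset condition, lock the page table while
    the mounted memory expansion module services the request, then unlock."""

    def __init__(self):
        self._locks = {}               # page-table id -> threading.Lock
        self._guard = threading.Lock()

    def _lock_for(self, page_id):
        with self._guard:              # serialize lock-table updates
            return self._locks.setdefault(page_id, threading.Lock())

    def handle(self, request, attribute_ok, forward):
        page_id = request["page_table"]
        if not attribute_ok(page_id):
            return None                # preset condition not met: reject
        with self._lock_for(page_id):  # lock the target memory page table
            return forward(request)    # expansion module responds
```

Here `attribute_ok` models the preset condition on the page-table attribute and `forward` models handing the request to the mounted memory expansion module; leaving the `with` block releases the lock, which corresponds to the unlocking step after the response.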
4. The memory expansion system of claim 3, wherein the field programmable gate array module comprises a compute express link interface or a peripheral component interconnect express bus interface, a compute express link and direct memory access controller, and a page table enabling module;
the compute express link interface or the peripheral component interconnect express bus interface is used for receiving the first memory access request sent by the processor in the target node;
the compute express link and direct memory access controller is used for analyzing the first memory access request to determine the requested target memory page table;
the page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table;
the compute express link and direct memory access controller is further used for sending the first memory access request to the memory expansion module mounted on the field programmable gate array module;
and the page table enabling module is further used for unlocking the target memory page table after the memory expansion module responds to the first memory access request.
5. The memory expansion system of claim 1, wherein the memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receiving a second memory access request sent by a graphics processor and/or other field programmable gate array modules in the target node; analyzing the second memory access request, and converting the address of the target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request; analyzing the converted second memory access request to determine the requested target memory page table, judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table and sending the converted second memory access request to the memory expansion module mounted on the field programmable gate array module; and receiving response data returned by the memory expansion module in response to the converted second memory access request, constructing, based on the response data, a response data packet conforming to the format corresponding to the target memory page table, returning the response data packet to the sender of the second memory access request, and unlocking the target memory page table.
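The address-conversion and response-packet steps of claim 5 can be sketched as follows. The 4 KiB page granularity, the `page_base` mapping from page-table addresses to physical bases, and all field names are assumptions for illustration only:

```python
def handle_second_request(req, page_base, forward):
    """Sketch of the claim-5 flow: translate the requested page-table
    address into a memory physical address, forward the converted request
    to the memory expansion module, and wrap the response data in a packet
    matching the format corresponding to the target memory page table."""
    # split the address into an (assumed) 4 KiB page and an offset
    page_addr = req["addr"] & ~0xFFF
    phys = page_base[page_addr] | (req["addr"] & 0xFFF)
    converted = dict(req, addr=phys)          # converted second request
    data = forward(converted)                 # expansion module responds
    return {
        "fmt": req.get("fmt", "raw"),         # format of the page table
        "src": req["sender"],                 # return route to the sender
        "payload": data,
    }
```

`forward` stands in for the mounted memory expansion module, and the returned dict stands in for the response data packet that the inter-card response packet module of claim 6 would construct.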
6. The memory expansion system of claim 5, wherein the graphics processor comprises a high-speed channelized chip-to-chip interface, and the field programmable gate array module comprises a high-speed channelized chip-to-chip interface, an inter-card memory access request analysis module, a page table conversion module, a compute express link and direct memory access controller, a page table enabling module, and an inter-card memory access response packet module; different graphics processors, different field programmable gate array modules, and the field programmable gate array modules and the graphics processors are connected through the high-speed channelized chip-to-chip interfaces;
the high-speed channelized chip-to-chip interface in the field programmable gate array module is used for receiving the second memory access request sent by the graphics processor and/or the other field programmable gate array modules in the target node;
the inter-card memory access request analysis module is used for analyzing the second memory access request;
the page table conversion module is used for converting the address of the target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request;
the compute express link and direct memory access controller is used for analyzing the converted second memory access request to determine the requested target memory page table;
the page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table;
the compute express link and direct memory access controller is further used for sending the converted second memory access request to the memory expansion module mounted on the field programmable gate array module;
the inter-card memory access response packet module is used for receiving response data returned by the memory expansion module in response to the converted second memory access request, and constructing, based on the response data, a response data packet conforming to the format corresponding to the target memory page table;
the high-speed channelized chip-to-chip interface in the field programmable gate array module is further used for returning the response data packet to the sender of the second memory access request;
and the page table enabling module is further used for unlocking the target memory page table.
7. The memory expansion system of claim 1, wherein the memory expansion system is deployed in a target node, and the field programmable gate array module is specifically configured to: receiving a third memory access request sent by devices in other nodes, analyzing the third memory access request to determine the requested target memory page table, judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table and sending the third memory access request to the memory expansion module mounted on the field programmable gate array module; and unlocking the target memory page table after the memory expansion module responds to the third memory access request; wherein the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphics processors in the other nodes.
8. The memory expansion system of claim 7, wherein the field programmable gate array module comprises a network optical module, an ethernet-based remote direct memory access protocol stack, a compute express link and direct memory access controller, and a page table enabling module;
the network optical module is used for receiving the third memory access request sent by the devices in the other nodes;
the ethernet-based remote direct memory access protocol stack and the compute express link and direct memory access controller are used for analyzing the third memory access request to determine the requested target memory page table;
the page table enabling module is used for judging whether the attribute of the target memory page table meets a preset condition, and if so, locking the target memory page table;
the compute express link and direct memory access controller is further used for sending the third memory access request to the memory expansion module mounted on the field programmable gate array module;
and the page table enabling module is further used for unlocking the target memory page table after the memory expansion module responds to the third memory access request.
9. The memory expansion system of claim 1, wherein the graphics processor comprises a high bandwidth memory.
10. The memory expansion system of claim 1, wherein the memory expansion system is applied to model training, and training data is divided into N training sub-data, which are respectively stored in the first memory areas of the memory expansion modules mounted on the N field programmable gate array modules;
The nth graphics processor reads target training data from the first memory area of the memory expansion module mounted on the nth field programmable gate array module and from the second memory areas of the memory expansion modules mounted on the other field programmable gate array modules connected with the nth graphics processor, performs model training based on the target training data, and stores the processed data to the second memory area of the memory expansion module mounted on the nth field programmable gate array module.
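One round of the claim-10 training flow can be sketched as follows, modeling the first and second memory areas as plain dicts keyed by FPGA index. The peer computation repeats the wrap-around interval reconstructed for claim 1, and `process` is an assumed user-supplied function, neither taken from the patent:

```python
def training_round(n, N, k, first_regions, second_regions, process):
    """Graphics processor n reads its training shard from the first memory
    area of FPGA n, gathers peers' intermediate results from the second
    memory areas of its k connected FPGA modules, processes them, and
    stores the result in FPGA n's second memory area (all 1-indexed)."""
    if n >= k:
        peers = list(range(n - k + 1, n + 1))
    else:  # wrap around the ring of N FPGA modules
        peers = list(range(1, n + 1)) + list(range(N + n - k + 1, N + 1))
    shard = first_regions[n]                     # this GPU's sub-data
    gathered = [second_regions[m] for m in peers
                if second_regions.get(m) is not None]
    second_regions[n] = process(shard, gathered)  # store processed data
    return second_regions[n]
```

Because every second memory area written by one graphics processor lies within the read window of its ring neighbors, repeated rounds circulate intermediate results around the ring without staging them through host memory.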
11. The memory expansion system of claim 10, wherein the N training sub-data are respectively stored, via peripheral component interconnect express bus interfaces or network optical modules, in the first memory areas of the memory expansion modules mounted on the N field programmable gate array modules.
12. The memory expansion system of claim 10, wherein the field programmable gate array module and the graphics processor are in data communication via a DMA controller.
13. The memory expansion system of claim 10, wherein the N graphics processors share high bandwidth memory in the graphics processors via a ring connection.
14. A memory access method, applied to a field programmable gate array module in a memory expansion system according to any one of claims 1 to 13, the method comprising:
Receiving a memory access request; the memory access request comprises a memory access request sent by a processor and/or a graphics processor and/or other field programmable gate array modules connected with the field programmable gate array module;
And analyzing the memory access request to obtain an analysis result, and sending the memory access request to a memory expansion module mounted on the field programmable gate array module according to the analysis result so that the memory expansion module responds to the memory access request.
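The method of claims 14 to 17 distinguishes three request sources (a host processor, an intra-node graphics processor or FPGA module, and a remote node). A minimal dispatcher sketch, in which the `source` tag and the handler split are illustrative assumptions rather than patent text:

```python
def dispatch(request, expansion_module):
    """Parse a memory access request and forward it to the mounted memory
    expansion module according to its source, mirroring claims 15-17."""
    source = request["source"]
    if source == "host_cpu":           # first memory access request
        return expansion_module(request)
    if source in ("gpu", "fpga"):      # second request: convert address first
        converted = dict(request, addr=request["addr"] + request.get("base", 0))
        return expansion_module(converted)
    if source == "remote_node":        # third request: arrives via RDMA stack
        return expansion_module(request)
    raise ValueError("unknown request source: %s" % source)
```

The page-table locking and unlocking steps that bracket each branch in the claims are omitted here to keep the routing logic visible.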
15. The memory access method of claim 14, wherein the memory expansion system is deployed in a target node, and the receiving the memory access request comprises:
Receiving a first memory access request sent by a processor in the target node;
Correspondingly, the analyzing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
Analyzing the first memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the first memory access request to a memory expansion module mounted on the field programmable gate array module;
Correspondingly, after the memory expansion module responds to the first memory access request, the method further comprises:
And unlocking the target memory page table.
16. The memory access method of claim 14, wherein the memory expansion system is deployed in a target node, and the receiving the memory access request comprises:
Receiving a second memory access request sent by a graphic processor and/or other field programmable gate array modules in the target node;
Correspondingly, the analyzing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
Analyzing the second memory access request, and converting the address of a target memory page table requested in the second memory access request into a memory physical address to obtain a converted second memory access request;
Analyzing the converted second memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the converted second memory access request to a memory expansion module mounted on the field programmable gate array module;
Correspondingly, after the memory expansion module responds to the converted second memory access request, the method further comprises:
Receiving response data returned by the memory expansion module in response to the converted second memory access request, constructing, based on the response data, a response data packet conforming to the format corresponding to the target memory page table, returning the response data packet to the sender of the second memory access request, and unlocking the target memory page table.
17. The memory access method of claim 14, wherein the memory expansion system is deployed in a target node, and the receiving the memory access request comprises:
Receiving a third memory access request sent by equipment in other nodes; the devices in the other nodes comprise processors and/or field programmable gate array modules and/or graphic processors in the other nodes;
Correspondingly, the analyzing the memory access request to obtain an analysis result and sending the memory access request to the memory expansion module mounted on the field programmable gate array module according to the analysis result comprises:
analyzing the third memory access request to determine a target memory page table of the request, judging whether the attribute of the target memory page table meets a preset condition, if so, locking the target memory page table, and sending the third memory access request to a memory expansion module mounted on the field programmable gate array module;
correspondingly, after the memory expansion module responds to the third memory access request, the method further comprises:
And unlocking the target memory page table.
18. An electronic device, comprising:
a memory for storing a computer program;
A processor for implementing the steps of the memory access method according to any of claims 14 to 17 when executing the computer program.
19. A non-volatile storage medium having stored thereon a computer program which when executed performs the steps of the memory access method according to any of claims 14 to 17.
20. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the memory access method of any of claims 14 to 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410889277.8A CN118426976B (en) | 2024-07-04 | 2024-07-04 | Memory expansion system, access method and device, medium and computer program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118426976A true CN118426976A (en) | 2024-08-02 |
CN118426976B CN118426976B (en) | 2024-09-20 |
Family
ID=92321923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410889277.8A Active CN118426976B (en) | 2024-07-04 | 2024-07-04 | Memory expansion system, access method and device, medium and computer program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118426976B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114445260A (en) * | 2022-01-17 | 2022-05-06 | 苏州浪潮智能科技有限公司 | Distributed GPU communication method and device based on FPGA |
CN115037747A (en) * | 2022-05-31 | 2022-09-09 | 北京百度网讯科技有限公司 | Data communication method and device, distributed system, device and medium |
WO2023098032A1 (en) * | 2021-11-30 | 2023-06-08 | 苏州浪潮智能科技有限公司 | Memory space extension method and apparatus, electronic device, and storage medium |
CN117742996A (en) * | 2023-12-22 | 2024-03-22 | 海光信息技术股份有限公司 | Communication optimization method and device for calculation, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||