CN116049037A - Method and device for accessing target memory, electronic equipment and storage medium - Google Patents

Method and device for accessing target memory, electronic equipment and storage medium Download PDF

Info

Publication number
CN116049037A
CN116049037A · CN202310129935.9A
Authority
CN
China
Prior art keywords
target
memory
register
size
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310129935.9A
Other languages
Chinese (zh)
Inventor
Name withheld at the applicant's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202310129935.9A
Publication of CN116049037A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for accessing a target memory, an electronic device, and a storage medium. The method for accessing the target memory comprises the following steps: rewriting the size of the mapped storage space of a target register, where the target register includes variable-size space control information used to rewrite the size of the mapped storage space; receiving target information required for executing data processing; and mapping the target information to the current mapped storage space to determine the data processing. The method uses the rewritten target register to map the target information to the first address space and the extended address space of the target memory. The number of target registers therefore does not need to be increased, the functional diversity of accessing the target memory is improved, the method is applicable to various interconnection modes between computing devices, the flexibility of the execution of data processing is improved, the universality of the memory access mode is enhanced, and the hardware implementation cost is reduced.

Description

Method and device for accessing target memory, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to a method and a device for accessing a target memory, electronic equipment and a storage medium.
Background
Computing devices are connected to each other via an interconnection structure such as a network or a high-speed serial bus. When multiple devices are used to perform operations or data processing, a host (Host) is mainly responsible for flow control, operation analysis, and a small amount of computation, while the bulk of the operations is offloaded to computing devices (Device) for execution, and the computing devices of different computing nodes need to exchange large blocks of data with each other directly. As operational modes evolve, the need for computing devices of different computing nodes to exchange data with each other directly keeps growing.
Disclosure of Invention
At least one embodiment of the present disclosure provides a method for accessing a target memory, where the method for accessing the target memory includes: the method comprises the steps of rewriting the size of a mapping storage space of a target register to obtain a current mapping storage space, wherein the target register comprises variable-size space control information, the variable-size space control information is used for rewriting the size of the mapping storage space, and the current mapping storage space comprises a first address space and an extended address space corresponding to a target memory; receiving target information required for executing data processing; and mapping the target information to the current mapped storage space to determine the data processing.
For example, a method for accessing a target memory according to at least one embodiment of the present disclosure further includes: executing the data processing by utilizing the current mapping storage space; the data processing comprises at least one calculation operation type, and the data to be processed corresponding to the target information comprises at least one data type.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the data processing includes distributed data integration computation.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, rewriting the size of the mapped storage space of the target register to obtain the current mapped storage space includes: expanding the mapped storage space of the target register from an initial mapped storage space of a first size to the current mapped storage space of a second size, where the second size is larger than the first size.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the initial mapped storage space of the first size includes a first interface address space, the current mapped storage space of the second size includes N+1 interface address spaces, the N+1 interface address spaces include the first interface address space to the (N+1)-th interface address space, N is a positive integer, the extended address space includes N segments, the N segments include the first segment to the N-th segment, and the target information includes basic information and N sets of computation information. Mapping the target information into the current mapped storage space includes: mapping the basic information into the first address space using the first interface address space; and mapping the k-th set of computation information in the N sets of computation information into the k-th segment of the N segments using the k-th interface address space of the N+1 interface address spaces, k = 1, 2, …, N.
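The segmented layout described in this paragraph can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: the function names are invented, and it assumes (consistent with the integer-multiple relation stated below) that every extension segment has the same size as the first address space.

```python
# Illustrative sketch: a current mapped storage space laid out as a first
# address space followed by N extension segments, plus the resolution of an
# offset within the k-th set of computation information to an absolute
# address. All names and sizes are hypothetical.

def build_mapping(first_space_size: int, n_segments: int):
    """Return (start, size) ranges: index 0 is the first address space,
    indices 1..N are the N segments of the extended address space."""
    spaces = [(0, first_space_size)]  # first interface -> first address space
    for k in range(1, n_segments + 1):
        # k-th segment, assumed equal in size to the first address space
        spaces.append((k * first_space_size, first_space_size))
    return spaces

def map_target_info(spaces, k: int, offset: int) -> int:
    """Map an offset within the k-th space to an absolute mapped address."""
    start, size = spaces[k]
    assert 0 <= offset < size, "offset exceeds segment size"
    return start + offset

spaces = build_mapping(0x1000, 4)        # first space + 4 segments
addr = map_target_info(spaces, 2, 0x10)  # offset 0x10 into segment 2
```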
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the data processing includes N sets of computations, and the method further includes: executing the k-th set of computations in the N sets of computations using the k-th segment, based on the k-th set of computation information.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the size of the extended address space is an integer multiple of the size of the first address space.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the target register includes a first sub-register and a second sub-register, the first sub-register includes variable size space supporting information, the second sub-register includes the variable size space controlling information, and the overwriting the size of the mapped storage space of the target register includes: reading the variable size space support information in the first sub-register to determine a size of a mapped storage space supported by the target register; the variable size space control information in the second sub-register is rewritten based on the determined size of the mapped storage space supported by the target register.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the target register includes a resizable base address register that can be used to rewrite the size of the mapped storage space.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, rewriting the size of the mapped storage space of the target register includes: rewriting the size of the mapped storage space of the target register based on the PCIe bus protocol.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, data to be processed corresponding to the target information is stored in a first memory, the target memory is located in a target node, the first memory is located in a first node, the target node and the first node are located in a first topology, and the target information is transmitted from the first node to the target node for executing access to the target memory.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, a first computing device is located in the first node, a target computing device is located in the target node, the first computing device includes the first memory, the target computing device includes the target memory and the target register, and the target register is configured to map the target information from the first computing device to the mapped storage space of the target computing device.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus, and the target information is transmitted from the first node to the target node through the PCIe bus, so as to be used for executing access to the target memory.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, a connection of the first memory to the target memory allows remote direct memory access, and the target information is transmitted from the first node to the target node for performing the remote direct memory access to the target memory.
For example, in the method for accessing a target memory according to at least one embodiment of the present disclosure, the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus, a first network card is located in the first node, a second network card is located in the target node, the first memory is connected to the first network card through a first branch of the PCIe bus, the first network card is connected to the second network card through a first link, and the second network card is connected to the target memory through a second branch of the PCIe bus, where the first link includes a transmission medium.
For example, in a method for accessing a target memory provided in at least one embodiment of the present disclosure, the first memory is connected to the target memory through a second link, and the target information is transmitted from the first node to the target node through the second link, for performing access to the target memory, where the second link includes a private modification based on a PCIe bus protocol.
The present disclosure also provides an apparatus for accessing a target memory, where the apparatus for accessing a target memory includes: the system comprises a rewriting module, a memory management module and a memory management module, wherein the rewriting module is configured to rewrite the size of a mapping memory space of a target register to obtain a current mapping memory space, the target register comprises variable size space control information, the variable size space control information is used for rewriting the size of the mapping memory space, and the current mapping memory space comprises a first address space and an extended address space corresponding to a target memory; a receiving module configured to receive target information required for performing data processing; and a mapping module configured to map the target information to the current mapped storage space to determine the data processing.
For example, an apparatus for accessing a target memory according to at least one embodiment of the present disclosure further includes: an execution module configured to execute the data processing using the current mapped storage space; the data processing comprises at least one calculation operation type, and the data to be processed corresponding to the target information comprises at least one data type.
At least one embodiment of the present disclosure also provides an electronic device. The electronic device includes: a processor; a memory including one or more computer program modules; wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules being configured to implement the method of accessing a target memory provided by any of the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a storage medium storing non-transitory computer-readable instructions that, when executed by a computer, implement a method of accessing a target memory provided by any of the embodiments of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic diagram of a distributed data integration calculation;
FIG. 2 is a schematic diagram of a remote direct memory access scheme;
FIG. 3 is an exemplary flow chart of a method for accessing a target memory according to at least one embodiment of the present disclosure;
FIG. 4A is a schematic diagram of an example of a method for accessing a target memory according to at least one embodiment of the present disclosure;
FIG. 4B is a schematic diagram of one example of a destination register for performing data processing provided in accordance with at least one embodiment of the present disclosure;
FIG. 5A is a schematic diagram of one example of an extended configuration space of a destination register provided in accordance with at least one embodiment of the present disclosure;
FIG. 5B is a schematic diagram of one example of a first sub-register in a destination register provided in accordance with at least one embodiment of the present disclosure;
FIG. 5C is a schematic diagram of one example of a second sub-register in a destination register provided in accordance with at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another example of a method for accessing a target memory according to at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of yet another example of a method for accessing a target memory according to at least one embodiment of the present disclosure;
FIG. 8 is a schematic block diagram of an apparatus for accessing a target memory according to at least one embodiment of the present disclosure;
FIG. 9 is a schematic block diagram of an electronic device provided in accordance with at least one embodiment of the present disclosure;
FIG. 10 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure; and
FIG. 11 is a schematic diagram of a storage medium according to at least one embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the terms "a," "an," or "the" and similar terms do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The present disclosure is illustrated by the following several specific examples. Detailed descriptions of known functions and known components may be omitted for the sake of clarity and conciseness in the following description of the embodiments of the present disclosure. When any element of an embodiment of the present disclosure appears in more than one drawing, the element is identified by the same or similar reference numeral in each drawing.
In some computing tasks, it is desirable to perform the computation in a data-parallel fashion. Executing computing tasks in parallel by multiple computing devices may increase processing speed and flexibility in the computing environment. For example, the computing device may include a graphics processor (Graphics Processing Unit, GPU), a General-purpose graphics processor (GPGPU) or other type of chip or device such as an AI accelerator, or the like. For example, data in one computing node's computing device may be transferred to another computing node's computing device, completing a particular operation requirement during the transfer without having to store the transferred data in memory before reading the data out to perform the particular operation.
For example, in a large-scale artificial intelligence (Artificial Intelligence, AI) training cluster, training may be accomplished in a data-parallel manner. For example, distributed data integration (All-reduce) is a training algorithm for deep learning, which needs to be implemented in a data parallel manner; each computing device uses the same model, different training samples, and the gradient data calculated by each computing device needs to be integrated (reduce) and then updated with parameters.
FIG. 1 is a schematic diagram of a distributed data integration calculation.
For example, as shown in FIG. 1, in distributed data integration computing, the data in each work unit (worker) is split into multiple slices of data (mini-batches). For example, the data in work unit 1 includes four slices a0, a1, a2, and a3; the data in work unit 2 includes four slices b0, b1, b2, and b3; the data in work unit 3 includes four slices c0, c1, c2, and c3; and the data in work unit 4 includes four slices d0, d1, d2, and d3. Here, a work unit may be implemented as a computing device, for example.
For example, each work unit reads the slices of data it requires and synchronously computes the gradient of the loss function over each slice. Then, after the gradients of all work units are integrated, the computation model is updated. For example, as shown in FIG. 1, in the updated computation model, the data in each work unit includes the same four slices of data, namely a0+b0+c0+d0, a1+b1+c1+d1, a2+b2+c2+d2, and a3+b3+c3+d3, thereby completing the distributed data integration.
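The FIG. 1 example can be sketched as a toy All-reduce over symbolic slices. This is a minimal illustration of the reduce-then-broadcast result only, not the communication schedule an actual All-reduce implementation would use.

```python
# Minimal sketch of the FIG. 1 All-reduce result: four work units, four
# slices each; afterwards every work unit holds the integrated slices.

workers = {
    1: ["a0", "a1", "a2", "a3"],
    2: ["b0", "b1", "b2", "b3"],
    3: ["c0", "c1", "c2", "c3"],
    4: ["d0", "d1", "d2", "d3"],
}

def all_reduce(workers):
    n = len(next(iter(workers.values())))
    # integrate (reduce) slice i across all work units
    reduced = ["+".join(w[i] for w in workers.values()) for i in range(n)]
    # every work unit receives the same integrated data
    return {uid: list(reduced) for uid in workers}

result = all_reduce(workers)
# each work unit now holds ["a0+b0+c0+d0", ..., "a3+b3+c3+d3"]
```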
For example, computing tasks that employ data parallelism (e.g., distributed data integration computing or other types of computing tasks) require the movement and synchronization of large volumes of data between multiple computing devices. Remote direct memory access (Remote Direct Memory Access, RDMA) technology is widely used for data transfer between multiple computing devices. RDMA is a software-and-hardware cooperative direct memory access (Direct Memory Access, DMA) technology used between the memories of multiple computing devices, in which the memory of a computing device at a remote node may be directly accessed (e.g., read or written) from a computing device at the local node, thereby enabling direct transfer between remote node memory and local node memory.
FIG. 2 is a schematic diagram of a remote direct memory access scheme.
For example, as shown in fig. 2, in the RDMA scheme, computing device 0 and network card 0 belong to the same node 0, and computing device 1 and network card 1 belong to the same node 1. For example, node 0 and/or node 1 may be a computer or other computing platform; computing device 0 and/or computing device 1 may include a GPU, GPGPU, AI accelerator or other computing device. For example, when the memory 0 of the computing device 0 needs to access the memory 1 of the computing device 1, the network card 0 reads data to be transmitted from the memory 0 through the link 0; the network card 0 transmits the data to the network card 1 through the link 2; after the network card 1 receives the data, the data is directly written into the memory 1 through the link 1.
For example, link 2 may be implemented in an InfiniBand (InfiniBand) or Ethernet (Ethernet) manner, and link 0 and link 1 respectively belong to local hardware interconnection of the nodes where memory 0 and memory 1 are located. Therefore, when data is transmitted in an RDMA mode between memories (namely 'link 0-link 2-link 1'), the data is only transmitted through the network card or the bus, and the data does not need to pass through a host, so that the time delay of data communication between different computing devices is shortened, and the transmission performance is improved.
For example, for RDMA approaches as shown in FIG. 2, one may implement the following:
1) Implementing the computing device and network card as interface logic of a high-speed serial computer expansion bus standard (Peripheral Component Interconnect Express, PCIe) bus protocol, on which each manufacturer makes proprietary modifications to link 0 and link 1, e.g., nvLink bus, XGMI bus, etc.;
2) Fully adopting a standard but complex PCIe topology, thereby realizing a point-to-point (peer-to-peer) interconnection of memory 0 and memory 1;
3) Other topologies not related to PCIe bus protocols.
For example, for the first implementation, when computing device 0 and computing device 1 as shown in FIG. 2 belong to different vendors, some development cost may be saved by borrowing PCIe interface logic. However, in this manner, link 0 and link 1 are implemented as a point-to-point interconnect from a root complex (Root Complex, RC) device to an end device (Endpoint, EP) in PCIe (i.e., RC->EP), which is difficult to place directly under a PCIe topology controlled by the host RC, making uniform control by the host difficult to achieve. Moreover, because different manufacturers privately modify link 0 and link 1 in different ways, incompatibility may occur during data transmission. Therefore, this method can only interconnect computing devices of the same manufacturer, not computing devices of different manufacturers.
For example, for the second implementation, the interconnection between computing device 0 and computing device 1, as shown in FIG. 2, may be considered a point-to-point interconnection between two end devices in a PCIe bus. This implementation employs a full PCIe topology, which may not require link 2 in fig. 2, but requires the use of a multi-level PCIe switch (switch) to implement a point-to-point interconnect (i.e., EP- > EP) for "memory 0-memory 1", resulting in an increased complexity of the hardware topology and thus greater latency.
For example, for the third implementation, due to the difference of computing devices among different manufacturers, there is also a compatibility problem, and complicated setting or modification needs to be performed on the interconnection topology, so that the universality of the implementation is not strong, and a certain limitation exists.
To sum up, for a computing task that adopts data parallelism, on the one hand, when the data-parallel computation volume is large (for example, in the distributed data integration computation shown in FIG. 1 there is a large amount of sliced data), a computing device needs a large memory space to perform the computing operation and multiple registers to complete the address mapping; on the other hand, the ways in which data is transferred between multiple computing devices (e.g., RDMA technology) have certain drawbacks: a transmission scheme with a complete PCIe topology has high hardware complexity, while a transmission scheme that bypasses the host is difficult to control uniformly from the host, resulting in poor compatibility.
At least one embodiment of the present disclosure provides a method for accessing a target memory, where the method for accessing the target memory includes: rewriting the size of the mapping storage space of the target register to obtain the current mapping storage space; receiving target information required for executing data processing; the target information is mapped to the current mapped storage space to determine data processing. For example, the method is applicable to accessing a target memory of a target computing device.
At least one embodiment of the present disclosure further provides an apparatus, an electronic device, and a storage medium for accessing a target memory, which are configured to implement the method for accessing a target memory in the foregoing embodiments.
According to the method, the device, the electronic equipment, and the storage medium provided by at least one embodiment of the present disclosure, the mapping of the target information to the current mapped storage space is completed by rewriting the size of the mapped storage space of the target register, so as to determine the data processing to be performed. Therefore the number of target registers does not need to be increased, the functional diversity of accessing the target memory is improved, the method is applicable to various interconnection modes between computing devices, the flexibility of the execution of data processing is improved, the universality of the memory access mode is enhanced, and the hardware implementation cost is reduced.
Hereinafter, at least one embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals in different drawings will be used to refer to the same elements already described.
Fig. 3 is an exemplary flowchart of a method for accessing a target memory according to at least one embodiment of the present disclosure.
For example, as shown in FIG. 3, at least one embodiment of the present disclosure provides a method of accessing a target memory for mapping target information into a mapped memory space to perform data processing. For example, the method may include the following steps S10 to S30.
Step S10: the size of the mapping storage space of the target register is rewritten to obtain the current mapping storage space;
step S20: receiving target information required for executing data processing;
step S30: the target information is mapped to the current mapped storage space to determine data processing.
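Steps S10 to S30 can be sketched as a toy driver. The register and information objects below are hypothetical stand-ins for illustration, not a real device interface or the patent's implementation.

```python
# Toy sketch of steps S10-S30: rewrite the mapped storage space size,
# receive target information, then map it into the current space.

class TargetRegister:
    def __init__(self, initial_size: int):
        self.mapped_size = initial_size

    def rewrite_size(self, new_size: int) -> int:   # step S10
        self.mapped_size = new_size                 # current mapped space
        return self.mapped_size

def receive_target_info():                          # step S20 (hypothetical)
    return {"op": "all_reduce", "offset": 0x40, "length": 0x20}

def map_to_space(reg: TargetRegister, info: dict):  # step S30
    # the target information must fit inside the current mapped space
    assert info["offset"] + info["length"] <= reg.mapped_size
    return (info["offset"], info["offset"] + info["length"])

reg = TargetRegister(initial_size=0x100)
reg.rewrite_size(0x400)                  # first + extended address space
span = map_to_space(reg, receive_target_info())
```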
For example, the current mapped storage space includes a first address space and an extended address space corresponding to the target memory. The target information includes, for example, an access address (e.g., a logical address or a physical address) or the like corresponding to an access request for data to be processed, including, for example, reading or writing the data to be processed, which is transferred during access to the target memory for performing data processing.
For example, the data processing includes at least one type of computing operation, and the data to be processed corresponding to the target information includes at least one type of data. For example, the computing operation types of the data processing may include distributed data integration computing (All-reduce), comparison operation (Compare), etc., and an exemplary manner of the distributed data integration computing may be described with reference to fig. 1, which is not repeated herein; performing data processing may also include performing other computing operation types of computing tasks (e.g., other computing tasks employing data parallelism), as embodiments of the present disclosure are not limited in this regard. For example, the data to be processed corresponding to the target information may include FP32 (single precision floating point number), FP16 (half precision floating point number) and other data types, or may select other data types according to actual needs, which is not limited in the embodiments of the present disclosure.
For example, a computing device for performing data processing is a target computing device, and a memory of the target computing device is a target memory; taking the topology shown in fig. 2 as an example, the target computing device may be computing device 1, and the target memory may be memory 1. For example, the computing device may include a GPU, GPGPU, AI accelerator or other type of chip or device, as embodiments of the disclosure are not limited in this regard.
For example, the target computing device further includes a target register, configured to configure an access address of the target computing device to obtain a mapped storage space, where the mapped storage space includes a first address space corresponding to the target memory, and may further include an extended address space outside the first address space, so that the received target information may be mapped to the mapped storage space based on the mapped storage space recorded by the target register. In particular, the target register includes variable size space control information for rewriting the size of the mapped storage space corresponding to the target register. In some examples, the target register includes a base address register (Base Address Register, BAR) whose mapped storage space size can be rewritten.
For example, information such as the size of a hardware storage space required for performing data processing (for example, the size of an address space required for mapping data processing) may be obtained based on the type of calculation operation of data processing to be performed, the data type of data to be processed, or the number of data to be processed, or the like; by rewriting the size of the mapping memory space of the target register, the received target information can be mapped into the matched address space to perform data processing; the flexibility of the system configuration can be increased, so that the system can be suitable for different calculation operation types of data processing, data types or data volume sizes of data to be processed and the like.
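For example, as an illustration of this sizing step (the rule, names, and numbers below are assumptions made for the sketch, not taken from the disclosure), the required window might be derived from the number of computation types and the data type as follows:

```python
DTYPE_BYTES = {"FP32": 4, "FP16": 2}   # data types mentioned above

def required_mapped_space(target_memory_size, num_compute_ops):
    """Hypothetical sizing rule: one window for plain access to the target
    memory plus one same-sized extended section per computation type."""
    return (num_compute_ops + 1) * target_memory_size

def payload_bytes(dtype, element_count):
    """Bytes occupied by the data to be processed."""
    return DTYPE_BYTES[dtype] * element_count

# A 64 MB target memory with 3 computation types needs a 256 MB mapped window.
print(required_mapped_space(64 << 20, 3) == 256 << 20)   # True
print(payload_bytes("FP16", 1024))                       # 2048
```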
For example, in step S20, target information for performing data processing is received by an associated processing circuit in the target computing device. In some examples, the processing circuit may determine, based on the relationship between the target information and the rewritten current mapped storage space, the data processing to be performed, e.g., writing the data to be processed received by the target computing device to the target memory (or another storage device of the target computing device), or, after an operation is performed between the data to be processed received by the target computing device and the existing data stored in the target memory, writing the operation result to the target memory. The target information includes, for example, address information required to perform data processing, specifically including, for example but not limited to, basic information for accessing the target memory and calculation information for performing specific calculation operations. In some examples, taking data processing as the distributed data integration computation shown in fig. 1 as an example, multiple sets of calculation information may be included in the target information, where each set of calculation information includes information related to operations such as gradient integration (reduce) of the corresponding sliced data. For example, when the data processing is another type of computing task, the calculation information accordingly includes other types of information; embodiments of the present disclosure are not limited in this regard.
For example, in step S30, the target register is coupled with the processing circuit that receives the target information to map the target information to the first address space and the extended address space. For example, the first address space of the target memory is used for mapping normal access operations (such as basic information) to the target memory, and the extended address space is used for mapping specific computing operations (such as calculation information) for performing data processing, so that when an access address recorded by the target information falls into the first address space, a normal access operation to the target memory is indicated, and when the access address falls into a certain section of the extended address space, a certain type of data processing is performed on the data to be processed. In some examples, a computing operation corresponding to the extended address space may use a cache of the target computing device to cache the data to be processed; the extended address space may also be an address space that can be used to map calculation information according to actual needs, which embodiments of the present disclosure do not limit.
For example, for the case of operations that only require access to the target memory (i.e., no other computing tasks), the size of the mapped storage space of the target register may be set to be the same as the first address space size of the target memory; if the data processing includes other computing tasks (for example, distributed data integration computing and other computing tasks employing data parallelism) in addition to the operations requiring access to the target memory, the size of the mapped storage space of the target register needs to be rewritten (or expanded) so that the corresponding calculation information can be mapped into the extended address space to perform the data processing.
For example, the target memory access in steps S10 to S30 may be applicable to various interconnection manners between the target computing device and other computing devices; for example, the method is applicable to the RDMA mode as shown in fig. 2, and is also applicable to the complete PCIe topology, or other interconnection modes may be selected according to actual needs, which is not limited by the embodiments of the present disclosure.
According to the method for accessing the target memory, the size of the mapping storage space of the target register is rewritten, and the mapping from the target information to the current mapping storage space is performed by utilizing the current mapping storage space recorded by the rewritten target register, so that the number of the target registers is not required to be increased, the functional diversity of accessing the target memory is improved, the method is suitable for various interconnection modes among computing devices, the flexibility of the execution mode of data processing is improved, the universality of the memory access mode is enhanced, and the hardware realization cost is reduced.
In some examples, step S10 may further include: the mapped memory space of the destination register is expanded from the initial mapped memory space of the first size to the current mapped memory space of the second size. For example, the rewritten (current) second size is larger than the first size.
For example, the first size of the initial mapped memory space includes a first interface address space; the second size of the current mapped memory space includes N+1 interface address spaces, namely the first interface address space through the (N+1)-th interface address space, where N is a positive integer. For example, the target information includes basic information and N sets of calculation information.
For example, the extended address space includes N sections including a first section through an nth section. In some examples, the extended address space is an integer multiple (e.g., N times) of the first address space of the target memory; in other examples, the extended address space may also be a non-integer multiple of the first address space of the target memory; embodiments of the present disclosure are not limited in this regard.
For example, step S30 may further include: mapping the basic information into the first address space of the target memory by using the first interface address space; and mapping the kth set of calculation information in the N sets of calculation information into the kth section in the N sections by using the (k+1)-th interface address space in the N+1 interface address spaces, where k=1, 2, …, N.
For example, the data processing includes N sets of computations, corresponding to the N sets of calculation information, respectively. The method for accessing the target memory provided by at least one embodiment of the present disclosure may further include: mapping the kth set of computations in the N sets of computations with the kth section based on the kth set of calculation information. For example, if the access address falls within the kth section, the kth set of computations is performed on the corresponding data to be processed. For example, the N sets of computations may include N sets of gradient integration (reduce) operations on multiple sets of sliced data as in FIG. 1, as well as other types of N sets of computing operations employing multiple sets of data in parallel, as embodiments of the present disclosure are not limited in this regard.
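For example, to make the section-to-computation correspondence concrete, the following is a hedged sketch in which a write landing in section k triggers the kth reduce, here taken to be elementwise addition as in gradient integration; the handler, its names, and the sizes are illustrative assumptions only:

```python
def dispatch(offset, payload, first_size, target_memory, resident_slices):
    """Hypothetical handler: an offset inside the first address space is a plain
    write; an offset inside section k performs the kth set of computations."""
    k = offset // first_size
    if k == 0:
        target_memory[offset:offset + len(payload)] = payload   # normal access
        return "normal_write"
    # Section k: kth reduce, here elementwise addition with the resident slice.
    old = resident_slices.get(k, [0] * len(payload))
    resident_slices[k] = [a + b for a, b in zip(old, payload)]
    return f"reduce_section_{k}"

memory = [0] * 16
slices = {}
dispatch(2, [7, 7], 8, memory, slices)      # plain write into target memory
dispatch(9, [1, 2, 3], 8, memory, slices)   # first slice arrives in section 1
dispatch(9, [1, 2, 3], 8, memory, slices)   # accumulates: slice 1 is now [2, 4, 6]
```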
FIG. 4A is a schematic diagram of an example of a method for accessing a target memory according to at least one embodiment of the present disclosure; FIG. 4B is a schematic diagram of an example of a target register for performing data processing provided in at least one embodiment of the present disclosure. For example, FIG. 4A and FIG. 4B are specific examples of the method of accessing the target memory shown in FIG. 3.
For example, as shown in fig. 4A, the target computing device is configured to perform data processing, and the memory of the target computing device is a target memory; also included in the target computing device is a target register BARi for mapping target information into target memory. For example, in step S10, the size of the mapped storage space of the target register BARi is rewritten (e.g., expanded from the first size to the second size); in step S20, target information required for performing data processing is received; in step S30, the target information is mapped to the current mapped memory space using the target register BARi to determine the above data processing.
For example, the size of the mapped memory space of the target register BARi is a first size before being extended; the initial mapped memory space of the first size includes only the first interface address space BARi-APER0. For example, as shown in FIG. 4B, after the target register BARi is expanded to the second size, the current mapped memory space of the second size includes N+1 interface address spaces, namely the first interface address space through the (N+1)-th interface address space (BARi-APER0, BARi-APER1, …, BARi-APERN), where N is a positive integer.
For example, the target information includes basic information and N sets of calculation information; the basic information is used to access the target memory, and the calculation information is used to perform specific calculation operations (e.g., distributed data integration calculation and other calculation tasks employing data parallelism). For example, as shown in fig. 4B, the N sets of calculation information include the 1st set of calculation information through the Nth set of calculation information (op_1, op_2, …, op_N).
For example, the extended address space includes N sections (not shown in the figure), which include first to nth sections. For example, in FIG. 4B, the extended address space may be N times the first address space of the target memory; the N sections are the same size and each section is the same size as the first address space of the target memory. It should be noted that the sizes of the N sections may also be different, and the size of each section may be selected according to the actual requirement of data processing, which is not limited by the embodiment of the present disclosure.
For example, as shown in FIG. 4B, the first interface address space BARi-APER0 maps the basic information to the first address space of the target memory for accessing the target memory itself; the (k+1)-th interface address space BARi-APERk maps the kth set of calculation information op_k into the kth section of the extended address space, where k=1, 2, …, N. For example, the data processing includes N sets of computations, corresponding to the N sets of calculation information, respectively. For example, as shown in fig. 4B, the kth section corresponds to the kth set of calculation information op_k, which is used for performing the kth set of computations among the N sets of computations.
Note that, the manner of rewriting (or expanding) the target register as shown in fig. 4B is only an example, and other manners of rewriting (or expanding) may be selected according to the actual situation, which is not limited in the embodiments of the present disclosure.
Based on the description of fig. 4A and fig. 4B, it can be known that, in the method for accessing the target memory according to at least one embodiment of the present disclosure, the target register is expanded from one interface address space to a plurality of interface address spaces, and the mapping from the target information to the current mapping storage space is completed by using the plurality of interface address spaces, so that the number of the target registers does not need to be increased, the functional diversity of accessing the target memory is improved, and the method is suitable for multiple interconnection modes between computing devices, the flexibility of the execution mode of data processing is improved, the universality of the memory access mode is enhanced, and the hardware implementation cost is reduced.
In some examples, the target register includes a first sub-register including variable size space support information and a second sub-register including variable size space control information. For example, step S10 in fig. 3 may further include: reading the variable size space support information in the first sub-register to determine the size of the mapped storage space supported by the target register; and rewriting the variable size space control information in the second sub-register based on the determined size of the mapped storage space supported by the target register.
For example, the size of the mapped memory space of the target register may be rewritten based on the PCIe bus protocol; in particular, when the target register includes a base address register, the size of the mapped memory space of the target register may be rewritten with the optional adjustable-size BAR (Resizable BAR) feature provided in the PCIe bus protocol. For example, the first sub-register may be an adjustable-size BAR capability register (Resizable BAR Capability Register) and the second sub-register may be an adjustable-size BAR control register (Resizable BAR Control Register).
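For example, the bit layout below follows the commonly documented PCIe Resizable BAR extended capability (capability register bits 4..31 advertise sizes from 1 MB to 128 TB; the control register's BAR Size field occupies bits 8..13, and its bits 16..31 extend the advertised sizes from 256 TB to 8 EB); the exact offsets should be checked against the PCIe specification, so this is a hedged sketch rather than a definitive implementation:

```python
def supported_sizes_mb(cap_reg, ctrl_reg):
    """Decode the sizes advertised by the first sub-register (capability) and
    the extended upper bits of the second sub-register (control)."""
    sizes = [1 << (bit - 4) for bit in range(4, 32) if cap_reg & (1 << bit)]
    sizes += [(256 << 20) << (bit - 16) for bit in range(16, 32) if ctrl_reg & (1 << bit)]
    return sizes   # each entry is a size in megabytes

def write_bar_size(ctrl_reg, size_mb):
    """Rewrite the BAR Size field (bits 8..13): encoded value s selects 2^s MB."""
    s = size_mb.bit_length() - 1
    assert (1 << s) == size_mb, "BAR sizes are powers of two"
    return (ctrl_reg & ~(0x3F << 8)) | (s << 8)

cap = (1 << 4) | (1 << 13)           # advertises 1 MB and 512 MB
print(supported_sizes_mb(cap, 0))    # [1, 512]
ctrl = write_bar_size(0, 2 ** 19)    # select 512 GB, i.e. 2^19 MB
print((ctrl >> 8) & 0x3F)            # 19
```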
It should be noted that, in addition to the above method in the example, other ways may be selected to rewrite the size of the mapping storage space of the target register according to actual needs, which is not limited by the embodiments of the present disclosure.
FIG. 5A is a schematic diagram of one example of an extended configuration space of a target register provided in accordance with at least one embodiment of the present disclosure; FIG. 5B is a schematic diagram of one example of a first sub-register in a target register provided in accordance with at least one embodiment of the present disclosure; FIG. 5C is a schematic diagram of one example of a second sub-register in a target register provided in at least one embodiment of the present disclosure.
For example, as shown in FIG. 5A, the target register has an extended configuration space including a header space, a plurality of first sub-registers, and a plurality of second sub-registers; for example, the first sub-registers and the second sub-registers are configured in pairs. For example, in each pair, the first sub-register includes variable size space support information (e.g., information about the size of the mapped storage space supported by the target register), and the second sub-register includes variable size space control information (i.e., control information for changing the size of the mapped storage space). For example, the host may read the variable size space support information in the first sub-register to determine the size of the mapped storage space supported by the target register; then, under the control of the driver, the variable size space control information in the second sub-register may be rewritten based on the determined size of the mapped storage space supported by the target register, that is, step S10 in fig. 3 is performed.
For example, in some examples, the extended configuration space of the target register shown in FIG. 5A may be rewritten based on the PCIe bus protocol. For example, the adjustable-size BAR feature in the PCIe bus protocol is optional, thereby providing the possibility for the rewritability of the variable size space control information of the second sub-register. It should be noted that the above-described rewriting process needs to be completed before the host's Operating System (OS) completes the allocation of the entire PCIe address space.
For example, one PCIe function (function) includes at most 6 sets of base address registers; as shown in FIG. 5A, to implement the adjustable-size BAR feature in the PCIe protocol, each set of base address registers (i.e., target registers) has an adjustable-size BAR capability register (i.e., a first sub-register) and an adjustable-size BAR control register (i.e., a second sub-register), e.g., the first and second sub-registers are each 4B in size. Thus, to implement the adjustable-size BAR feature, the 6 sets of base address registers occupy at most 6×(4B+4B)=48B, plus the header space (4B) of the target register itself, so that the target register requires at most 48B+4B=52B. For example, because the size of the mapped memory space of each set of base address registers may be further rewritten (or expanded), the adjustable-size BAR feature may support larger memory spaces after rewriting (e.g., 512GB, or even 8EB, etc.).
For example, to achieve mapping to the target memory, the mapped storage space size of the target register needs to be the same as the first address space size of the target memory. For example, when the entire system address space is 32 bits, one base address register (4B=32 bits) is included in the target register; when the entire system address space is 64 bits, two base address registers are concatenated to form one target register (2×4B=8B=64 bits).
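For example, composing one 64-bit target register from two consecutive 32-bit base address registers can be sketched as follows (the flag-bit layout is the standard PCIe memory BAR encoding; the values are illustrative, not from the disclosure):

```python
def read_bar64(bar_lo, bar_hi):
    """Combine two consecutive 32-bit base address registers into a 64-bit BAR.
    In a memory BAR, bit 0 is the I/O indicator, bits 2:1 are the type
    (0b10 marks a 64-bit BAR), and bit 3 is the prefetchable flag."""
    assert bar_lo & 0x1 == 0, "not a memory BAR"
    assert (bar_lo >> 1) & 0x3 == 0b10, "not a 64-bit BAR"
    return ((bar_hi << 32) | bar_lo) & ~0xF   # mask off the flag bits

lo = 0xF0000000 | 0b1100   # 64-bit prefetchable memory BAR, low half
hi = 0x00000012            # high half
print(hex(read_bar64(lo, hi)))   # 0x12f0000000
```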
For example, as shown in FIG. 5B, when the first sub-register is an adjustable-size BAR capability register, the first sub-register may support multiple sizes (e.g., 1 MB to 128 TB) and various combinations thereof; the reserved space information (Reserved and Preserved, RsvdP) identified on the first sub-register is used for other functions that may be supported in the future.
It should be noted that when the first sub-register is an adjustable-size BAR capability register, the adjustable-size BAR capability register is read-only; therefore, the size combinations supported by the first sub-register are set, within the range specified by the PCIe protocol, according to the actual size of the target memory and the mapping requirements of the data processing at the hardware design stage of the computing device. In this way, the size of the mapped storage space supported by the target register can be determined by reading the variable size space support information in the first sub-register before the variable size space control information in the second sub-register is rewritten.
For example, as shown in fig. 5C, when the second sub-register is an adjustable-size BAR control register, the second sub-register includes index information, reserved space information (RsvdP), variable size space control information (BAR size), variable size register number information, a plurality of expandable size information, and the like; the second sub-register may support multiple expandable sizes (e.g., 256 TB to 8 EB) and various combinations thereof.
For example, by rewriting the variable size space control information (BAR size), the size of the mapped storage space of the target register can be rewritten (or expanded) from the original size (e.g., the first size) to a desired size (e.g., the second size). For example, the variable size space control information may first be rewritten to a multiple of the actual requirement for performing the data processing, and then a rescan of the second sub-register may be triggered to complete further rewriting, so that the size of the rewritten current mapped storage space of the target register matches the actual requirement of the data processing scheduled to be performed. For example, in this embodiment, the rewriting of the size of the mapped storage space of the target register can be completed by rewriting the variable size space control information of the second sub-register, which is simple to implement and has greater flexibility, so that there is no need to increase the number of target registers, the functional diversity of accessing the target memory is improved, and the hardware implementation cost is reduced.
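For example, the rewrite-overshoot-rescan sequence can be sketched against a stand-in device object; real driver code would access PCIe configuration space through the operating system, so everything below, including the device interface, is a hypothetical model:

```python
class FakeDevice:
    """Stand-in for the target computing device's configuration-space interface."""

    def __init__(self, supported_sizes_mb):
        self.supported = supported_sizes_mb
        self.bar_size = min(supported_sizes_mb)   # first size, before rewriting

    def read_supported_sizes(self):
        return self.supported                     # from the first sub-register

    def write_bar_size(self, size_mb):
        assert size_mb in self.supported
        self.bar_size = size_mb                   # variable size space control info

    def rescan(self):
        pass   # a real driver re-reads the second sub-register here

def resize_mapped_space(dev, needed_mb):
    """Overshoot first, rescan, then settle on the size the planned data
    processing actually needs, mirroring the order described above."""
    dev.write_bar_size(max(dev.read_supported_sizes()))
    dev.rescan()
    final = min(s for s in dev.read_supported_sizes() if s >= needed_mb)
    dev.write_bar_size(final)
    return final

dev = FakeDevice([256, 1024, 8192])
print(resize_mapped_space(dev, needed_mb=1000))   # settles on 1024
```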
It should be noted that the implementation of the extended configuration space of the target registers described in fig. 5A to 5C (e.g., the base register and the size supported by the base register are described) is merely exemplary, and other possible implementations may be selected according to actual needs, which are not limited by the embodiments of the present disclosure.
For example, the data to be processed corresponding to the target information may be stored in the first memory; the first memory may be a memory that needs to transmit data to the target memory; taking the topology shown in fig. 2 as an example, when the target memory is memory 1, the first memory may be memory 0.
For example, the target memory is located in a target node, the first memory is located in a first node, and the target node and the first node are co-located in a first topology; in a method of accessing a target memory provided in at least one embodiment of the present disclosure, target information is transmitted from a first node to a target node for performing an access to the target memory. Then, the target node processes (e.g., parses) the target information, and performs data processing according to the processing result.
For example, a first computing device is located in a first node, a target computing device is located in a target node, the first computing device includes a first memory, the target computing device includes a target memory and a target register, and the target register is configured to map target information from the first memory into a current mapped storage space.
For example, in some examples, when the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus; the target information is transmitted from the first node to the target node over the PCIe bus for performing the access to the target memory.
For example, in other examples, the first memory is coupled to the target memory via a second link; the target information is transmitted from the first node to the target node over the second link for performing the access to the target memory. For example, the second link includes proprietary modifications based on the PCIe bus protocol. For example, embodiments of the present disclosure are not limited to a particular implementation (private modification) of the second link.
For example, in still other examples, when the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus; the first network card is positioned in the first node, and the second network card is positioned in the target node. For example, the first memory is connected to the target memory through a remote direct memory access; the target information is transmitted from the first node to the target node via a remote direct memory access for performing the remote direct memory access to the target memory. For example, the first memory is connected to the first network card through a first branch of the PCIe bus, the first network card is connected to the second network card through a first link, and the second network card is connected to the target memory through a second branch of the PCIe bus. For example, the first link may include a transmission medium, such as a network cable (e.g., infiniband and/or ethernet, etc.), and embodiments of the present disclosure are not limited to a particular implementation of the first link.
Fig. 6 is a schematic diagram of another example of a method for accessing a target memory according to at least one embodiment of the present disclosure. For example, fig. 6 is a specific example of the method for accessing the target memory shown in fig. 3.
For example, as shown in fig. 6, taking a PCIe bus topology as the first topology as the description object, in the first topology, a host 0 is connected to a first computing device 0, a target computing device 1, a first network card 0, and a second network card 1 through a plurality of PCIe switches; each of the PCIe switches includes multiple PCI-to-PCI bridges (P2P bridges) to enable interconnection between the different interfaces. For example, the host 0 may be a central processing unit (Central Processing Unit, CPU), or other processors capable of implementing functions such as control or scheduling may be selected according to actual needs, which is not limited in the embodiments of the present disclosure.
For example, as shown in fig. 6, the first node 0 and the target node 1 are located in a first topology; the first computing device 0 and the first network card 0 are located in the first node 0, and the target computing device 1 and the second network card 1 are located in the target node 1; the first computing device 0 includes a first memory and the target computing device 1 includes a target memory. In particular, the target computing device 1 further comprises a target register BARi with which the host 0 maps target information from the first node 0 into the mapped storage space of the target node 1; the specific operation of the target register BARi in the target computing device may be found in the description of fig. 4A above, and will not be described in detail herein.
For example, as shown in fig. 6, the first memory in the first computing device 0 is connected to the target memory in the target computing device 1 through the PCIe bus, so that the host 0 can implement unified control over transmission of the target information and data processing in the target memory. On this basis, the transfer of the target information from the first node to the target node may be implemented in a number of ways, as described in detail below.
For example, in a first implementation, as shown in fig. 6, since the first memory is connected to the target memory through the PCIe bus, based on the target information transferred from the first node to the target node, the data to be processed corresponding to the target information may be transferred directly from the first memory to the target node through the PCIe bus (i.e., through the north bridge chipset of the host 0 without passing through the network cards), and then written into the target memory according to the target information. However, this approach requires passing through multiple stages of PCIe switches, which may introduce significant latency.
For example, in a second implementation, as shown in fig. 6, since the first memory is also connected to the target memory through the second link, the data to be processed may be transferred from the first memory to the target memory through the second link. For example, the second link may be implemented as a proprietary modification based on the PCIe bus protocol; in particular, a local interconnection between the first computing device 0 and the target computing device 1, or a point-to-point interconnection (i.e., RC- > EP) of a customized root complex device to a terminal device, may be used, or other implementation forms may be selected according to actual needs, which embodiments of the present disclosure do not limit.
For example, in a third implementation, as shown in fig. 6, since the first memory may also be connected to the target memory through RDMA, the data to be processed may be transferred from the first memory to the target memory through RDMA based on the target information. For example, as shown in fig. 6, the first memory is connected to the first network card 0 through a first branch of the PCIe bus; specifically, the first network card 0 includes an RDMA0 engine, and the data to be processed can be moved out of the first memory through the RDMA0 engine based on the target information. The first network card 0 is connected to the second network card 1 through a first link, which may be implemented by some transmission medium or means, for example by a network cable (e.g., InfiniBand or Ethernet, etc.) or another possible implementation. The second network card 1 is connected to the target memory through a second branch of the PCIe bus; specifically, the second network card 1 may directly move the data to be processed into the target memory based on the target information.
For example, this implementation transfers the target information from the first memory to the target memory through "first branch - first link - second branch"; although this is an RDMA transmission method, the first branch and the second branch are branches of the PCIe bus, that is, the first computing device 0, the target computing device 1, the first network card 0, and the second network card 1 are all PCIe terminal devices that can be uniformly managed by the host 0 (that is, each has a unique identifier in the PCIe topology), so that unified control can be implemented by the host 0 (for example, the address space is uniformly allocated by the host 0), improving the versatility of the data transmission method.
For example, for multiple implementations based on PCIe bus topology as shown in fig. 6, besides the transmission mode of RDMA (i.e., the third implementation mode), implementation modes of PCIe bus transmission mode (i.e., the first implementation mode) and local interconnection (i.e., the second implementation mode) are reserved, so that the most suitable implementation mode can be selected according to actual needs, and defects caused by only one implementation mode are avoided to a certain extent; in addition, the whole data transmission and the subsequent data processing process can realize unified control by utilizing the host, so that the universality of the target memory access method is improved.
It should be noted that the PCIe bus topology in fig. 6 is only an example, and other possible topologies may be selected according to actual needs; the specific implementation manner of transmitting the target information from the first memory to the target memory is not limited to three implementation manners, and other choices can be made according to actual needs; in addition, the computing devices in the topology structure are not limited to two, and can be expanded to a plurality of computing devices according to actual needs; embodiments of the present disclosure are not limited in this regard.
In particular, steps S10-S30 in fig. 3 may be accomplished at least in part under unified control of the host, based on, for example, the topology shown in fig. 6 or other types of topologies for which unified control is performed by the host. For example, in step S10, the host determines a specific manner of data processing (for example, a type of data processing, specifically including several sets of computation and address mapping manners) according to the data to be processed, and rewrites the size of the mapped storage space of the target register according to the size of the storage space supported by the target register. For example, in step S20, the relevant processing circuitry in the target computing device receives target information from the first computing device. Then, in step S30, the host controls the target register to map the target information to the current mapped storage space of the target computing device. Further, the target computing device performs operations required for data processing based on the target information in the current mapping storage space.
Fig. 7 is a schematic diagram of yet another example of a method for accessing a target memory according to at least one embodiment of the present disclosure. For example, fig. 7 is a specific example of the method for accessing the target memory shown in fig. 3.
For example, based on the topology shown in FIG. 6 or another type of topology under unified host control, as shown in FIG. 7, the host completes a series of preprocessing tasks before the operating system starts, so that access to the target memory can be performed after the operating system starts.
For example, as shown in the left-hand flow of FIG. 7, before the operating system boots, the host first cold-boots the target computing device and in the process completes a series of basic preparations such as power supply, clock, reset, memory and firmware loading; then, the host enumerates the computing devices (including the target computing device) connected by, for example, a PCIe bus in the system, so as to determine how many functions, how many base address registers, etc. are included in each computing device; further, for example, the host discovers that a target register BARi is present in the target computing device (the target register BARi is a size-adjustable BAR) and determines all mapped storage space sizes supported by the target register BARi; further, the host uses the actual (or a smaller) size of the target memory as the initial size of the mapped storage space of the target register BARi.
For example, as shown in the right-side flow of fig. 7, after completing the series of preprocessing operations, the host starts the operating system and loads a driver; then, the driver enables the expansion mode of the target register BARi and sets the size of the mapped storage space of the target register BARi to the maximum value it supports; specifically, the variable size space control information in the second sub-register of the target register BARi can be expanded to several times the actual requirement for executing the data processing; further, the driver triggers a rescan of the target register BARi so as to perform step S10 in fig. 3, i.e., to rewrite the size of the mapped storage space of the target register. Further, when the application program is started and run later, the host configures RDMA (or another type of transmission mode according to actual needs) and starts access to the target memory as needed, so as to execute steps S20 to S30, i.e., receive the target information corresponding to the data processing to be executed and map the target information to the current mapped storage space of the target computing device using the target register. The specific embodiments of steps S10 to S30 are detailed in the foregoing description and are not repeated here.
According to the method for accessing the target memory provided by the above embodiments, the size of the mapped storage space of the target register is rewritten, and the rewritten target register is used to complete the mapping of the target information to the current mapped storage space. Therefore, the number of target registers does not need to be increased, the functional diversity of the target memory is improved, the method is applicable to various interconnection modes among computing devices, the flexibility of data processing is improved, the universality of the memory access mode is enhanced, and the hardware implementation cost is reduced.
Fig. 8 is a schematic block diagram of an apparatus for accessing a target memory according to at least one embodiment of the present disclosure.
For example, as shown in fig. 8, the apparatus 200 for accessing a target memory includes a rewrite module 210, a receive module 220, and a map module 230.
For example, the rewrite module 210 is configured to rewrite the size of the mapped storage space of the target register to obtain the current mapped storage space; the target register includes variable size space control information for overwriting the size of the mapped memory space, the current mapped memory space including a first address space and an extended address space corresponding to the target memory. That is, the rewrite module 210 may be configured to perform, for example, step S10 shown in fig. 3.
For example, the receiving module 220 is configured to receive target information required to perform data processing. That is, the receiving module 220 may be configured to perform step S20 shown in fig. 3, for example.
For example, the mapping module 230 is configured to map the target information to the current mapped storage space to determine the data processing described above. That is, the mapping module 230 may be configured to perform step S30 shown in fig. 3, for example.
For example, further as shown in fig. 8, the apparatus 200 for accessing a target memory may further include an execution module 240, where the execution module 240 is configured to perform data processing using the current mapped storage space. For example, the data processing includes at least one type of computing operation, and the data to be processed corresponding to the target information includes at least one type of data. In some examples, the data processing includes distributed data integration computing.
For example, in some examples, the rewrite module 210 may be further configured to expand the mapped storage space of the target register from an initial mapped storage space of a first size to a current mapped storage space of a second size, wherein the second size is greater than the first size. In some examples, the size of the extended address space is an integer multiple of the size of the first address space.
For example, in some examples, the target register includes a first sub-register and a second sub-register, the first sub-register including variable size space support information and the second sub-register including the variable size space control information; the rewrite module 210 may be further configured to read the variable size space support information in the first sub-register to determine the sizes of the mapped storage space supported by the target register, and to rewrite the variable size space control information in the second sub-register based on the determined sizes of the mapped storage space supported by the target register.
For example, in some examples, the target register includes a base address register whose mapped storage space size can be rewritten. For example, in some examples, the rewrite module 210 may be further configured to rewrite the size of the mapped storage space of the target register based on the PCIe bus protocol.
For example, further, the data to be processed corresponding to the target information is stored in the first memory; the target memory is located in the target node, the first memory is located in the first node, and the target node and the first node are located in the first topology; the target information is transmitted from the first node to the target node for performing an access to the target memory.
For example, in some examples, the first topology is a PCIe bus topology, and the first memory is connected to the target memory through a PCIe bus; the target information is transmitted from the first node to the target node over the PCIe bus for performing the access to the target memory.
For example, in some examples, the connection of the first memory to the target memory allows remote direct memory access; the target information is transmitted from the first node to the target node for performing a remote direct memory access to the target memory. In some examples, a first network card is located in the first node and a second network card is located in the target node; the first memory is connected to the first network card through a first branch of the PCIe bus, the first network card is connected to the second network card through a first link, and the second network card is connected to the target memory through a second branch of the PCIe bus. For example, the first link may comprise a transmission medium such as a network cable (e.g., InfiniBand and/or Ethernet, etc.).
For example, in some examples, the first memory is connected to the target memory through a second link; the target information is transmitted from the first node to the target node over the second link for performing an access to the target memory. For example, the second link includes proprietary modifications based on the PCIe bus protocol.
Since the operational details of the apparatus 200 for accessing the target memory have already been described above in the description of the method for accessing the target memory, for example as shown in fig. 3, they are not repeated here for brevity; reference is made to the above descriptions of fig. 3 to 7.
It should be noted that, each of the above modules in the apparatus 200 for accessing a target memory shown in fig. 8 may be configured as software, hardware, firmware, or any combination thereof for performing a specific function. For example, these modules may correspond to application specific integrated circuits, to pure software code, or to a combination of software and hardware. By way of example, the device described with reference to fig. 8 may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing program instructions, but is not limited thereto.
In addition, although the apparatus 200 for accessing the target memory is described above as being divided into modules for performing the respective processes, it is apparent to those skilled in the art that the processes performed by the respective modules may be performed without any specific division of the modules in the apparatus or without explicit demarcation between the respective modules. In addition, the apparatus 200 for accessing the target memory described above with reference to fig. 8 is not limited to include the above-described modules, but may be added with some other modules (e.g., a reading module, a control module, etc.) as needed, or the above modules may be combined.
At least one embodiment of the present disclosure also provides an electronic device including a processor and a memory; the memory includes one or more computer program modules; the one or more computer program modules are stored in the memory and configured to be executed by the processor, and are configured to implement the method for accessing the target memory provided by the embodiments of the present disclosure described above. For example, the processor may be a single-core processor or a multi-core processor.
FIG. 9 is a schematic block diagram of an electronic device provided in accordance with at least one embodiment of the present disclosure.
For example, as shown in fig. 9, the electronic device 300 includes a processor 310 and a memory 320. For example, memory 320 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 310 is configured to execute non-transitory computer readable instructions that, when executed by the processor 310, may perform one or more steps in accordance with the method of accessing a target memory described above. The memory 320 and the processor 310 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, the processor 310 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a General-Purpose Graphics Processing Unit (GPGPU), a Digital Signal Processor (DSP), or another form of processing unit having the capability to access the target memory and/or program execution capability, such as a Field Programmable Gate Array (FPGA); for example, the Central Processing Unit (CPU) may be of an X86, RISC-V, ARM or similar architecture. The processor 310 may be a general-purpose processor or a special-purpose processor and may control other components in the electronic device 300 to perform the desired functions.
For example, the memory 320 may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, Erasable Programmable Read-Only Memory (EPROM), portable Compact Disc Read-Only Memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 310 to implement various functions of the electronic device 300. Various applications and data, as well as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiments of the present disclosure, specific functions and technical effects of the electronic device 300 may refer to the description of the method for accessing the target memory provided in at least one embodiment of the present disclosure, which is not repeated herein.
Fig. 10 is a schematic block diagram of another electronic device provided in accordance with at least one embodiment of the present disclosure.
For example, as shown in fig. 10, the electronic device 400 is suitable for implementing the method for accessing the target memory provided in the embodiments of the present disclosure, for example. It should be noted that the electronic device 400 shown in fig. 10 is only one example and does not impose any limitation on the functionality and scope of use of the disclosed embodiments.
For example, as shown in fig. 10, the electronic device 400 may include a processing means (e.g., a central processor, a graphics processor, etc.) 41 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 42 or a program loaded from a storage means 48 into a Random Access Memory (RAM) 43. In the RAM 43, various programs and data required for the operation of the electronic apparatus 400 are also stored. The processing device 41, the ROM 42 and the RAM 43 are connected to each other via a bus 44. An input/output (I/O) interface 45 is also connected to bus 44. In general, the following devices may be connected to the I/O interface 45: input devices 46 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 47 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 48 including, for example, magnetic tape, hard disk, etc.; and communication means 49. The communication means 49 may allow the electronic device 400 to communicate with other electronic devices wirelessly or by wire to exchange data.
While fig. 10 shows an electronic device 400 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided, and that electronic device 400 may alternatively be implemented or provided with more or fewer means.
For detailed description and technical effects of the electronic device 400, reference may be made to the above description of the method for accessing the target memory, which is not repeated herein.
Fig. 11 is a schematic diagram of a storage medium according to at least one embodiment of the present disclosure.
For example, as shown in FIG. 11, the storage medium 500 stores non-transitory computer readable instructions 510. For example, non-transitory computer readable instructions 510, when executed by a computer, perform one or more steps in a method of accessing a target memory according to the above.
For example, the storage medium 500 may be applied to the electronic device 300 shown in fig. 9. For example, the storage medium 500 may be the memory 320 in the electronic device 300. For example, the relevant description of the storage medium 500 may refer to the corresponding description of the memory 320 in the electronic device 300 shown in fig. 9, and will not be repeated here.
For the purposes of this disclosure, the following points are to be described:
(1) In the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are referred to, and other structures may refer to the general design.
(2) Features of the same and different embodiments of the disclosure may be combined with each other without conflict.
The foregoing is merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any changes or substitutions that a person skilled in the art can easily conceive within the technical scope of the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

1. A method of accessing a target memory, comprising:
rewriting the size of a mapped storage space of a target register to obtain a current mapped storage space, wherein the target register comprises variable size space control information, the variable size space control information is used for rewriting the size of the mapped storage space, and the current mapped storage space comprises a first address space and an extended address space corresponding to a target memory;
receiving target information required for executing data processing; and
mapping the target information to the current mapped storage space to determine the data processing.
2. The method of accessing a target memory according to claim 1, further comprising:
executing the data processing by utilizing the current mapped storage space;
wherein the data processing comprises at least one calculation operation type, and the data to be processed corresponding to the target information comprises at least one data type.
3. The method of accessing a target memory according to claim 1, wherein the data processing comprises distributed data integration computing.
4. The method of accessing a target memory according to claim 1, wherein the rewriting the size of the mapped storage space of the target register to obtain the current mapped storage space comprises:
expanding the mapped storage space of the target register from an initial mapped storage space of a first size to the current mapped storage space of a second size,
wherein the second size is greater than the first size.
5. The method of accessing a target memory as recited in claim 4, wherein the initial mapped storage space of the first size comprises a first interface address space, the current mapped storage space of the second size comprises N+1 interface address spaces, the N+1 interface address spaces comprise the first interface address space through an (N+1)th interface address space, N is a positive integer,
the extended address space comprises N sections, the N sections comprising a first section through an Nth section,
the target information comprises basic information and N sets of computation information, and
the mapping the target information into the current mapped storage space comprises:
mapping the basic information into the first address space using the first interface address space; and
mapping a kth set of computation information in the N sets of computation information into a kth section in the N sections using a kth interface address space in the N+1 interface address spaces,
k=1,2,…,N.
6. The method of accessing a target memory as recited in claim 5, wherein the data processing comprises N sets of computations,
the method further comprising:
mapping a kth set of computations in the N sets of computations using the kth section based on the kth set of computation information.
7. The method of accessing a target memory according to claim 4, wherein the size of the extended address space is an integer multiple of the size of the first address space.
8. The method of accessing a target memory according to claim 1, wherein the target register comprises a first sub-register and a second sub-register, the first sub-register comprising variable size space support information, the second sub-register comprising the variable size space control information,
the rewriting the size of the mapped storage space of the target register comprising:
reading the variable size space support information in the first sub-register to determine a size of a mapped storage space supported by the target register; and
rewriting the variable size space control information in the second sub-register based on the determined size of the mapped storage space supported by the target register.
9. The method of accessing a target memory according to claim 1, wherein the target register comprises a size-adjustable base address register, the size of the mapped storage space of which can be rewritten.
10. The method of accessing a target memory according to claim 1, wherein the rewriting the size of the mapped storage space of the target register comprises:
rewriting the size of the mapped storage space of the target register based on a PCIe bus protocol.
11. The method for accessing a target memory according to claim 1, wherein the data to be processed corresponding to the target information is stored in a first memory,
the target memory is located in a target node, the first memory is located in a first node, the target node and the first node are located in a first topology,
the target information is transmitted from the first node to the target node for performing an access to the target memory.
12. The method of accessing a target memory of claim 11, wherein a first computing device is located in the first node, a target computing device is located in the target node,
the first computing device includes the first memory, the target computing device includes the target memory and the target register, and the target register is used for mapping the target information from the first computing device to the mapping storage space of the target computing device.
13. The method of accessing a target memory according to claim 11, wherein the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus,
the target information is transmitted from the first node to the target node over the PCIe bus for performing an access to the target memory.
14. The method of accessing a target memory according to claim 11, wherein the connection of the first memory to the target memory allows remote direct memory access,
the target information is transmitted from the first node to the target node for performing the remote direct memory access to the target memory.
15. The method of accessing a target memory according to claim 14, wherein the first topology is a PCIe bus topology, the first memory is connected to the target memory through a PCIe bus,
a first network card is located in the first node, a second network card is located in the target node,
the first memory is connected with the first network card through a first branch of the PCIe bus, the first network card is connected with the second network card through a first link, the second network card is connected with the target memory through a second branch of the PCIe bus,
wherein the first link comprises a transmission medium.
16. The method of accessing a target memory as recited in claim 11, wherein the first memory is coupled to the target memory via a second link,
the target information is transmitted from the first node to the target node over the second link, for performing an access to the target memory,
wherein the second link comprises a proprietary modification based on a PCIe bus protocol.
17. An apparatus for accessing a target memory, comprising:
a rewriting module configured to rewrite the size of a mapped storage space of a target register to obtain a current mapped storage space, wherein the target register comprises variable size space control information, the variable size space control information is used for rewriting the size of the mapped storage space, and the current mapped storage space comprises a first address space and an extended address space corresponding to a target memory;
a receiving module configured to receive target information required for performing data processing; and
and a mapping module configured to map the target information to the current mapped storage space to determine the data processing.
18. The apparatus for accessing a target memory according to claim 17, further comprising:
an execution module configured to execute the data processing using the current mapped storage space;
the data processing comprises at least one calculation operation type, and the data to be processed corresponding to the target information comprises at least one data type.
19. An electronic device, comprising:
a processor;
a memory including one or more computer program modules;
wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules being configured to implement the method of accessing a target memory of any of claims 1-16.
20. A storage medium storing non-transitory computer readable instructions which when executed by a computer implement the method of accessing a target memory of any of claims 1-16.
CN202310129935.9A 2023-02-17 2023-02-17 Method and device for accessing target memory, electronic equipment and storage medium Pending CN116049037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310129935.9A CN116049037A (en) 2023-02-17 2023-02-17 Method and device for accessing target memory, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116049037A true CN116049037A (en) 2023-05-02

Family

ID=86120135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310129935.9A Pending CN116049037A (en) 2023-02-17 2023-02-17 Method and device for accessing target memory, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116049037A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664381A (en) * 2023-07-28 2023-08-29 深流微智能科技(深圳)有限公司 Method for GPU to access CPU extended memory and graphics processing system
CN116664381B (en) * 2023-07-28 2024-03-26 深流微智能科技(深圳)有限公司 Method for GPU to access CPU extended memory and graphics processing system
CN117170744A (en) * 2023-11-03 2023-12-05 珠海星云智联科技有限公司 DPU (differential pulse Unit) OptionRom function implementation method and related device
CN117170744B (en) * 2023-11-03 2024-01-23 珠海星云智联科技有限公司 DPU (differential pulse Unit) OptionRom function implementation method and related device

Similar Documents

Publication Publication Date Title
CN116049037A (en) Method and device for accessing target memory, electronic equipment and storage medium
CN109102065B (en) Convolutional neural network accelerator based on PSoC
CN105678378A (en) Indirectly accessing sample data to perform multi-convolution operations in parallel processing system
KR20190065789A (en) Electronic device performing training on memory device by rank unit and training method thereof
CN111459844B (en) Data storage device and method for accessing logical-to-physical address mapping table
US9799092B2 (en) Graphic processing unit and method of processing graphic data by using the same
CN111274025B (en) System and method for accelerating data processing in SSD
KR20120063829A (en) Method of data processing for non-volatile memory
US20170286287A1 (en) Method and apparatus for processing sequential writes to a block group of physical blocks in a memory device
CN102959504A (en) Method and apparatus to facilitate shared pointers in a heterogeneous platform
CN104375972A (en) Microprocessor integrated configuration controller for configurable math hardware accelerators
CN110647291A (en) Hardware assisted paging mechanism
US11126382B2 (en) SD card-based high-speed data storage method
KR20200108774A (en) Memory Device including instruction memory based on circular queue and Operation Method thereof
KR20140018813A (en) Method for managing dynamic memory reallocation and device performing the same
CN106648758A (en) Multi-core processor BOOT starting system and method
CN117751367A (en) Method and apparatus for performing machine learning operations using storage element pointers
CN107451070B (en) Data processing method and server
CN111694513A (en) Memory device and method including a circular instruction memory queue
KR102416465B1 (en) Data Processing System of Effectively Managing Shared Resource
CN117631974A (en) Access request reordering across a multi-channel interface of a memory-based communication queue
CN111104362A (en) Device and method for configuring field programmable gate array and field programmable gate array
CN117435549A (en) Method and system for communication between hardware components
US11550736B1 (en) Tensorized direct memory access descriptors
CN118043790A (en) High bandwidth collection cache

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Applicant after: Shanghai Bi Ren Technology Co.,Ltd.

Country or region after: China

Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.

Country or region before: China