WO2023051248A1 - Data access system and method, and related device - Google Patents

Data access system and method, and related device Download PDF

Info

Publication number
WO2023051248A1
WO2023051248A1 (application PCT/CN2022/118756; priority CN2022118756W)
Authority
WO
WIPO (PCT)
Prior art keywords
node
address
chip
destination address
port
Prior art date
Application number
PCT/CN2022/118756
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Tianxiang (陈天翔)
Huang Jiangle (黄江乐)
Hu Tianchi (胡天驰)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023051248A1 publication Critical patent/WO2023051248A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/06: Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1081: Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38: Information transfer, e.g. on bus
    • G06F 13/42: Bus transfer protocol, e.g. handshake; Synchronisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of storage, and in particular to a data access system, a data access method, and related devices.
  • When computing nodes perform data processing (such as big data or AI tasks), they often need a large memory capacity to store data. The data can be distributed across the memory of multiple storage nodes, and a computing node can read the data in a storage node's memory through the remote direct memory access (RDMA) protocol, thereby expanding its memory capacity.
  • Communication between the computing node and the storage node goes through network cards, and data is transferred between the two network cards through network card queues. Each time the computing node reads data, it must place the read request into a network card queue, so the read path spends considerable time preparing queue units; in some cases the queue-unit preparation time even exceeds the data transmission time. The result is low data access efficiency for computing nodes, high network delay, and a significant impact on the processing efficiency of big data or AI tasks.
  • The present application provides a data access system, method, and related devices, which address the problems of low access efficiency and high network delay when a computing node accesses the memory of a storage node.
  • In a first aspect, a data access system includes a first node and a second node connected through a cable. The first node generates a data access request that requests data in the memory of the second node, and sends the request to the second node through the cable. The second node converts the first destination address in the data access request into the local physical address corresponding to that destination address, and accesses the data in its memory according to the local physical address.
  • Because the first node and the second node are connected by a cable, communication between them does not pass through a network card or routing device. The first node therefore does not wait for network card queue-unit preparation when accessing the memory of the second node, which improves access efficiency and reduces access delay.
  • The first node includes a computing chip and an interconnection chip. The first high-speed interconnection port of the interconnection chip is connected by a cable to the second high-speed interconnection port of the processor in the second node, and the computing chip is connected to the interconnection chip through a port. The computing chip generates a data access request and sends it to the interconnection chip, which forwards it to the second node through the cable.
  • A computing chip may consist of at least one processor, such as a CPU or NPU, or a combination of a CPU and a hardware chip.
  • The aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the number of computing chips in the first node may be one or more, which is not specifically limited in this application.
  • the interconnection chip may be ASIC, PLD or a combination thereof, and the aforementioned PLD may be CPLD, FPGA, GAL or any combination thereof, which is not specifically limited in this application.
  • the number of interconnected chips in the first node may be multiple, which is not specifically limited in this application.
  • the interconnection chip is provided with a high-speed interconnection port, and the interconnection chip can perform data communication with the second node through the high-speed interconnection port.
  • the first high-speed interconnection port of the interconnection chip is connected to the second high-speed interconnection port on the second node through a cable.
  • the number of first high-speed interconnection ports on each interconnection chip may be one or more, and each first high-speed interconnection port is in a one-to-one correspondence with the second high-speed interconnection ports on the second node.
  • The high-speed interconnection port may be a high-speed serial bus port, such as a SerDes bus port, and the cable may be a copper cable, optical fiber, twisted pair, or any other medium that can transmit data; this application does not specifically limit the cable.
  • the number of high-speed interconnection ports on the first node may be one or more, and the first high-speed interconnection ports on the first node are in a one-to-one correspondence with the second high-speed interconnection ports on the second node.
  • The port of the computing chip may be a high-speed serial bus port, such as a SerDes bus port. The computing chip may be connected to the interconnection chip through a bus, such as a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The computing chip, the interconnection chip, the ports of the computing chip, and the bus may be printed together on the circuit board during manufacturing.
  • the number of ports of the computing chip may be one or more, which is not limited in this application.
  • deploying interconnection chips in the first node can enable the first node to communicate with more second nodes.
  • The greater the number of interconnection chips, the greater the number of high-speed interconnection ports that can be deployed in the first node.
  • the data communication between the computing chip, the interconnection chip and the second node of the first node may implement an addressing function through an address decoder.
  • The computing chip includes a first address decoder. The computing chip generates a data access request, determines the first port according to the first destination address in the request and the first address decoder, and sends the request to the interconnection chip through the first port. The first address decoder records the correspondence between destination addresses and ports of the computing chip.
  • A second address decoder is deployed in the interconnection chip. The interconnection chip determines the first high-speed interconnection port according to the first destination address and the second address decoder, and sends the data access request to the second node through the first high-speed interconnection port. The second address decoder records the correspondence between destination addresses and high-speed interconnection ports.
  • A third address decoder is deployed in the second node. The second node determines the local physical address corresponding to the first destination address according to the first destination address and the third address decoder. The third address decoder records the correspondence between destination addresses and local physical addresses.
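As an illustrative sketch (not the application's implementation), the three address decoders above can be modeled as lookup tables keyed by destination-address windows; all addresses, window sizes, and port names below are assumed values:

```python
# Hypothetical three-stage lookup: first decoder (computing chip) picks a chip
# port, second decoder (interconnection chip) picks a high-speed interconnection
# port, third decoder (second node) translates to a local physical address.
FIRST_DECODER = {(0x1000_0000, 0x1000): "port0"}
SECOND_DECODER = {(0x1000_0000, 0x1000): "hsi0"}
THIRD_DECODER = {(0x1000_0000, 0x1000): 0x8000_0000}  # window -> local phys base

def lookup(decoder, dest):
    """Return (value, window_base) for the window containing dest."""
    for (base, length), value in decoder.items():
        if base <= dest < base + length:
            return value, base
    raise KeyError(hex(dest))

def route(dest):
    port, _ = lookup(FIRST_DECODER, dest)            # step 1: chip port
    hsi, _ = lookup(SECOND_DECODER, dest)            # step 2: high-speed port
    phys_base, win_base = lookup(THIRD_DECODER, dest)  # step 3: translate
    return port, hsi, phys_base + (dest - win_base)
```

A request to destination 0x1000_0040 would, under these assumed tables, leave through `port0` and `hsi0` and land at local physical address 0x8000_0040.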
  • the data access system may further include a configuration node, and the configuration node may configure the first address decoder, the second address decoder, and the third address decoder.
  • The configuration node acquires at least one local physical address of the second node's memory from the second node, determines at least one corresponding destination address from the at least one local physical address, and performs the configuration. The configuration node also configures the second address decoder according to the at least one destination address combined with the high-speed interconnection port between the second node and the interconnection chip, and configures the first address decoder according to the at least one destination address combined with the chip port between the interconnection chip and the computing chip.
  • When the configuration node acquires at least one local physical address of the second node's memory, it can determine, according to the size of the second node's memory and the business requirements, the extended memory that the second node allocates for use by the first node. This extended memory can be part of the second node's memory, and memory isolation technology can be applied so that the second node itself cannot access it, thereby improving the security of the data stored in the extended memory.
  • The data access request generated by the computing chip is routed and addressed by the address decoders and delivered to the CPU of the second node corresponding to the destination address, which performs the memory read or write. This avoids waiting for network card queue preparation and improves the efficiency with which the first node reads and writes the expanded memory: the delay can reach the microsecond level (Ethernet delay is at the millisecond level), and the bandwidth can reach 400GB, higher than an RDMA network card whose bandwidth is only 100GB.
  • The whole or a part of the first destination address can be matched against the addresses in the decoder, which improves matching efficiency and, in turn, the efficiency of data access.
  • the first port may be determined according to the base address and length of the first destination address in the data access request.
  • the computing chip is specifically configured to match the base address and length of the destination address recorded in the first address decoder with the base address and length of the first destination address, and determine the first port corresponding to the matched destination address.
  • The interconnection chip matches the base address and length of the destination addresses recorded in the second address decoder with the base address and length of the first destination address, and determines the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
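As a hedged sketch of base-address-and-length matching, a decoder entry can be selected when its recorded window fully contains the span described by the request's base address and length; the entries and addresses below are assumptions for illustration:

```python
def match_window(entries, req_base, req_len):
    """entries: list of (window_base, window_len, port) recorded in a decoder.
    The request span [req_base, req_base + req_len) must fall entirely
    inside a recorded window for its port to be chosen."""
    for w_base, w_len, port in entries:
        if w_base <= req_base and req_base + req_len <= w_base + w_len:
            return port
    return None  # no recorded destination address matches
```

For example, with a single assumed entry `(0x1000_0000, 0x1000, "port0")`, a 0x100-byte request at base 0x1000_0F00 matches, while the same request with length 0x200 overruns the window and does not.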
  • the first port may be determined according to the upper address of the first destination address.
  • The computing chip matches the high-order address of the destination addresses recorded in the first address decoder with the high-order address of the first destination address, and determines the first port corresponding to the matched destination address. The number of bits of the high-order address is determined according to the memory size of the second node.
  • The interconnection chip matches the high-order address of the destination addresses recorded in the second address decoder with the high-order address of the first destination address, and determines the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
  • The number of bits of the high-order address can be determined according to the expanded memory size of the second node 120 connected to the high-speed interconnection port.
  • The ports corresponding to some of the destination addresses recorded in the first and second address decoders may be the same: destination addresses corresponding to the same port are located in the same memory and share the same base address and length, or the same high-order address. The port corresponding to the first destination address can therefore be determined either by matching the base address and length or by matching the high-order address.
  • Matching only part of the first destination address against the addresses in the decoder improves matching efficiency, speeds up the determination of the first port and the first high-speed interconnection port, and further improves the efficiency of data access.
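High-order matching can be sketched as comparing only the top bits of the destination address; the overall address width and the number of high-order bits below are assumptions for illustration, not values from this application:

```python
ADDR_WIDTH = 32  # assumed total address width
HIGH_BITS = 16   # assumed: chosen from the expanded-memory size per port

def match_high(entries, dest):
    """entries maps a high-order-bits tag -> high-speed interconnection port.
    Only the top HIGH_BITS of the destination address are compared, so one
    table entry covers an entire memory region."""
    tag = dest >> (ADDR_WIDTH - HIGH_BITS)
    return entries.get(tag)
```

With the assumed table `{0x1000: "hsi0"}`, any destination in the 0x1000_xxxx region matches in a single dictionary probe, illustrating why partial matching is cheaper than comparing full base/length pairs.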
  • When the data access request reads data from the second node's memory, after processing the request the second node can return the read data to the first node along the original route, according to the source address in the request and the first, second, and third address decoders. Details are not repeated here.
  • The first node may also omit the interconnection chip, with the high-speed interconnection port on the computing chip connected by cable to the high-speed interconnection port on the second node. The computing chip can still implement routing and addressing for data access requests through the above-mentioned address decoders.
  • the computing chip can be equipped with a second address decoder, and the second node can be equipped with a third address decoder 230.
  • For the data access request generated by the computing chip, the first high-speed interconnection port corresponding to the first destination address can be determined according to the correspondence between high-speed interconnection ports and destination addresses recorded in the second address decoder, and the request is then sent to the second node through that port. Details are not repeated here.
  • In a second aspect, a data access method is provided. The method is applied to a data access system that includes a first node and a second node connected by a cable, and includes the following steps:
  • The first node generates a data access request that requests data in the memory of the second node and sends it to the second node through the cable. The second node converts the first destination address in the data access request into the local physical address corresponding to the first destination address, and accesses the data in its memory according to that local physical address.
  • Because the first node and the second node are connected by a cable, communication between them does not pass through a network card or routing device. The first node therefore does not wait for network card queue-unit preparation when accessing the memory of the second node, which improves access efficiency and reduces access delay.
  • the first node includes a computing chip and an interconnection chip, wherein the first high-speed interconnection port of the interconnection chip is connected to the second high-speed interconnection port of the processor in the second node through a cable, and the computing chip can A data access request is generated and sent to the interconnection chip, and the interconnection chip sends the data access request to the second node through the cable.
  • The computing chip is connected to the interconnection chip through a port and includes a first address decoder. The computing chip generates a data access request, determines the first port according to the first destination address in the request and the first address decoder, and sends the request to the interconnection chip through the first port. The first address decoder records the correspondence between destination addresses and ports of the computing chip.
  • the interconnection chip includes a second address decoder, the interconnection chip determines the first high-speed interconnection port according to the first destination address and the second address decoder, and communicates to the second node through the first high-speed interconnection port A data access request is sent, wherein the second address decoder is used to record the corresponding relationship between the destination address and the high-speed interconnection port.
  • the second node includes a third address decoder, and the second node determines the local physical address corresponding to the first destination address according to the first destination address and the third address decoder, wherein the third The address decoder is used to record the correspondence between the destination address and the local physical address.
  • the data access system further includes a configuration node
  • The above method further includes the following steps: the configuration node acquires at least one local physical address of the second node's memory from the second node; the configuration node determines at least one corresponding destination address from the at least one local physical address and configures the third address decoder; the configuration node configures the second address decoder according to the at least one destination address combined with the high-speed interconnection port between the second node and the interconnection chip; and the configuration node configures the first address decoder according to the at least one destination address combined with the chip port between the interconnection chip and the computing chip.
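The configuration-node steps above can be sketched as building all three decoder tables from the local physical addresses reported by the second node; the destination-address scheme, window size, and port names are assumptions for illustration only:

```python
def configure(local_phys_addrs, hsi_port, chip_port, window=0x1000):
    """Hypothetical configuration flow: derive one destination address per
    reported local physical address, then fill the third, second, and first
    decoder tables, keyed by (destination_base, window_length)."""
    first, second, third = {}, {}, {}
    for i, phys in enumerate(local_phys_addrs):
        dest = 0x1000_0000 + i * window   # assumed global address scheme
        third[(dest, window)] = phys      # third: destination -> local phys
        second[(dest, window)] = hsi_port # second: destination -> high-speed port
        first[(dest, window)] = chip_port # first: destination -> chip port
    return first, second, third
```

After this setup, the same (destination, window) key resolves consistently at every hop, which is what lets the request reach the correct memory without any network card queue.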
  • The computing chip matches the base address and length of the destination addresses recorded in the first address decoder with the base address and length of the first destination address, and determines the first port corresponding to the matched destination address.
  • the interconnection chip matches the base address and length of the destination address recorded in the second address decoder with the base address and length of the first destination address, and determines the first high-speed interconnection port corresponding to the matched destination address.
  • the computing chip matches the high-order address of the destination address recorded in the first address decoder with the high-order address of the first destination address, and determines the first port corresponding to the matched destination address, wherein, The number of bits in the high address is determined according to the memory size of the second node.
  • The interconnection chip matches the high-order address of the destination addresses recorded in the second address decoder with the high-order address of the first destination address, and determines the first high-speed interconnection port corresponding to the matched destination address.
  • the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
  • In a third aspect, a computing node is provided, which may be the first node described in the first and second aspects. The computing node is applied to a data access system that further includes a storage node, and includes a computing chip and an interconnection chip, where the computing chip is connected to the interconnection chip through a high-speed interconnection port and the interconnection chip is connected to other nodes through a high-speed interconnection port and a cable. The computing chip generates a data access request and sends it to the interconnection chip; the request includes a first destination address indicating a location in the memory of the storage node. The interconnection chip sends the data access request to the storage node according to the first destination address.
  • In a fourth aspect, a storage node is provided, which may be the second node described in the first and second aspects. The storage node is applied to a data access system that further includes a computing node, and includes a processor. The storage node is connected to the computing node through the high-speed interconnection port of the processor and a cable. The processor receives, through the high-speed interconnection port, the data access request sent by the computing node, converts the first destination address carried in the request into the local physical address of the storage node corresponding to that destination address, and accesses the data in the memory according to the local physical address.
  • In a fifth aspect, a computing device is provided, including a processor and a memory. The memory stores code, and the processor executes the code to implement the functions of the modules implemented by the first node or the second node in the first aspect or any possible implementation of the first aspect.
  • A computer-readable storage medium is provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the methods described in the above aspects.
  • FIG. 1 is a schematic structural diagram of a data access system provided by the present application;
  • FIG. 2 is a schematic diagram of deployment of a first node and a second node in an application scenario provided by the present application;
  • FIG. 3 is a schematic structural diagram of another data access system provided by the present application.
  • FIG. 4 is an example diagram of a first address decoder provided by the present application.
  • FIG. 5 is a schematic flow chart of the steps of a data access method provided by the present application;
  • FIG. 6 is a schematic structural diagram of a computing node provided by the present application.
  • FIG. 7 is a schematic structural diagram of a storage node provided by the present application.
  • FIG. 8 is a schematic structural diagram of a computing device provided by the present application.
  • When computing nodes perform some big data or AI tasks, they need a large memory capacity to store data, and they access the data at small granularity and sparsely.
  • the access granularity may be only 64 bytes or 128 bytes, and the location of each access is highly random, resulting in low data access efficiency and high network latency for big data or AI tasks, affecting the processing efficiency of big data or AI tasks.
  • To expand memory capacity, the data can be distributed across the memory of multiple storage nodes, and the computing nodes can read the data in the storage nodes' memory through the remote direct memory access (RDMA) protocol.
  • Communication between the computing node and the storage node goes through network cards, and data is transferred between the two network cards through network card queues. Each time the computing node reads data, it must place the read request into a network card queue, so the read path spends considerable time preparing queue units; in some cases the queue-unit preparation time even exceeds the data transmission time. The result is low data access efficiency for computing nodes, high network delay, and reduced data processing efficiency.
  • PCIe memory devices can be added on the memory bus of a computing device, but the number of PCIe devices that can be added is limited, and the expansion capability of buses such as QPI and UPI is weaker still. Memory is usually expanded to at most about 1T, so the expanded memory still cannot reach the magnitude required by big data or AI tasks.
  • FIG. 1 is a schematic structural diagram of a data access system provided by the present application.
  • The data access system includes a first node 110 and a second node 120 connected through a cable 140. Specifically, the high-speed interconnection port 130 of the first node 110 is connected by cable to the high-speed interconnection port 130 of a processor in the second node 120. It should be understood that the number of second nodes 120 in FIG. 1 is for illustration only; the application does not limit the number of second nodes 120.
  • The high-speed interconnection port 130 in the first node 110 will be collectively referred to as the first high-speed interconnection port, and the high-speed interconnection port 130 in the second node 120 as the second high-speed interconnection port.
  • The first node 110 and the second node 120 can be physical servers, such as X86 or ARM servers; they can also be virtual machines (VMs) built on common physical servers using network functions virtualization (NFV) technology. A virtual machine is a complete, software-simulated computer system with full hardware functionality that runs in a completely isolated environment, such as a virtual device in cloud computing; this application does not specifically limit this. The first node 110 and the second node 120 may also each be a server cluster composed of multiple servers, where the servers may be the physical servers or virtual machines described above.
  • The high-speed interconnection port 130 may be a high-speed serial bus port, such as a SerDes bus port, and the cable 140 may be a copper cable, optical fiber, twisted pair, or other medium that can transmit data. This application does not specifically limit the cable 140.
  • the number of high-speed interconnection ports 130 on the first node 110 may be one or more, and the first high-speed interconnection ports on the first node 110 are in a one-to-one correspondence with the second high-speed interconnection ports on the second node, FIG. 1 uses three ports as an example for illustration, which is not limited in this application.
  • the first high-speed interconnection port of the first node 110 is connected to the second high-speed interconnection port of the processor of the second node 120 through a cable, and there may be one or more processors of the second node 120.
  • The first node 110 can be connected to different processors of the second node through different high-speed interconnection ports. For example, when second node 4 includes processor 4, processor 5, and processor 6, high-speed interconnection port 1 of the first node 110 can be connected to processor 4 and high-speed interconnection port 2 to processor 5, so that the data in the memory corresponding to different processors can be read through different high-speed interconnection ports.
  • the second node 120 may reserve at least one processor not connected to the first node 110, so as to ensure that after the second node 120 provides some memory to the first node, it will not affect the second node 120 to process other services.
  • the first node 110 is used to process data tasks, such as big data or AI tasks in the aforementioned content.
  • the second node 120 is used to store data, and the second node 120 can divide a part of the memory as the expanded memory of the first node 110 for use by the first node 110.
  • the first node 110 can use the data access system shown above to read data from the expanded memory allocated by the second node 120 to process big data or AI tasks, thereby implementing memory expansion of the first node 110.
  • the first node 110 and the second node 120 can be deployed in the same cabinet, and the first high-speed interconnection port of the first node 110 and the second high-speed interconnection port of the second node 120 are directly connected.
  • the servers in the entire cabinet can communicate with each other without going through a switch or a network card, so that the first node 110 can read data from the memory of the second node 120 .
  • the first node 110 can be an AI server
  • the second node 120 can be a 2P server.
  • a 2P server refers to a server with two CPUs.
  • one AI server and 8 to 10 2P servers can be placed inside a cabinet, so that a rack of servers composed of one cabinet can have 10TB to 20TB of memory, which meets the memory requirements of the AI server for data processing in most application scenarios.
  • FIG. 2 is used for illustration, and the present application does not limit it.
  • the first node 110 may generate a data access request, wherein the data access request is used to request data in the memory of the second node, and the first node 110 may send the data access request to the second node 120 through the cable 140 , the second node may convert the first destination address in the data access request into a local physical address corresponding to the destination address, and access the data in the memory of the second node according to the local physical address.
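  • purely as an illustrative sketch (the class names, addresses, and mapping table below are assumptions for illustration, not part of this application), the flow above, in which the first node generates a data access request carrying a first destination address and the second node converts that address into a local physical address before accessing its memory, can be modeled as:

```python
# Illustrative model of the request flow: the addr_map dictionary stands in
# for the address conversion performed inside the second node; a real system
# performs this translation in hardware.

class DataAccessRequest:
    def __init__(self, dest_addr, op, payload=None):
        self.dest_addr = dest_addr   # first destination address
        self.op = op                 # "read" or "write"
        self.payload = payload

class SecondNode:
    def __init__(self, memory_size, addr_map):
        self.memory = bytearray(memory_size)
        # destination address -> local physical address
        self.addr_map = addr_map

    def handle(self, req, length=1):
        local = self.addr_map[req.dest_addr]   # address conversion
        if req.op == "read":
            return bytes(self.memory[local:local + length])
        self.memory[local:local + len(req.payload)] = req.payload

node2 = SecondNode(1024, addr_map={0x8000: 0x10})
node2.handle(DataAccessRequest(0x8000, "write", b"\x2a"))
print(node2.handle(DataAccessRequest(0x8000, "read")))  # b'*'
```

  • in this sketch the first node never sees the second node's local physical addresses; it only addresses the agreed-upon destination address space, which matches the division of labor between the nodes described above.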
  • FIG. 3 is a schematic structural diagram of another data access system provided by the present application, wherein the first node 110 may include a computing chip 111, an interconnection chip 112, a port 113 of the computing chip and a bus 114, wherein, the port 113 of the computing chip of the computing chip 111 communicates with the interconnection chip 112 through the bus 114.
  • the port on the interconnection chip 112 is not drawn in FIG. 3, but in a specific implementation, the interconnection chip 112 may also have corresponding ports.
  • FIG. 1 shows only an exemplary division method, and each module unit may be merged or split into more or fewer module units, which is not specifically limited in this application.
  • the port 113 of the computing chip can be a high-speed serial bus port, such as a SERDES bus port, and the bus 114 can be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the computing chip 111, the interconnection chip 112, the port 113 of the computing chip, and the bus 114 can be uniformly printed on the circuit board during processing.
  • the number of ports 113 of the computing chip may be one or more.
  • FIG. 3 uses two ports (port 0 and port 1) as an example for illustration, which is not limited in this application.
  • the computing chip 111 may be composed of at least one general-purpose processor, such as a CPU or an NPU, or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a programmable logic device (Programmable Logic Device, PLD) or a combination thereof.
  • the above-mentioned PLD can be a complex programmable logic device (Complex Programmable Logic Device, CPLD), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a generic array logic (Generic Array Logic, GAL), or any combination thereof.
  • the computing chip 111 executes various types of digital storage instructions, which enable the first node 110 to provide a wide variety of services. Wherein, the number of computing chips 111 in the first node 110 may be one or more. FIG. 3 uses one computing chip 111 as an example for illustration, which is not specifically limited in this application.
  • the interconnect chip 112 may be an ASIC, a PLD or a combination thereof, and the PLD may be a CPLD, FPGA, GAL or any combination thereof, which is not specifically limited in this application.
  • the number of interconnection chips 112 in the first node 110 may be multiple.
  • FIG. 3 takes two interconnection chips 112 as an example (interconnection chip 1 and interconnection chip 2) for illustration, which is not specifically limited in this application.
  • the interconnection chip 112 is provided with a high-speed interconnection port 130, and the interconnection chip 112 can perform data communication with the second node 120 through the high-speed interconnection port 130.
  • the first high-speed interconnection port of the interconnection chip 112 and the second high-speed interconnection port on the second node 120 are connected through a cable 140, wherein the description of the high-speed interconnection port 130 and the cable 140 can refer to the foregoing embodiments in FIG. 1 and FIG. 2, and details are not repeated here.
  • the number of high-speed interconnection ports on each interconnection chip 112 may be one or more, and each first high-speed interconnection port is in a one-to-one correspondence with a second high-speed interconnection port on the second node, as shown in FIG. 3:
  • interconnection chip 1 includes high-speed interconnection port 2 and high-speed interconnection port 3
  • interconnection chip 2 includes high-speed interconnection port 4 and high-speed interconnection port 5.
  • the description of the second node 120 can refer to the foregoing embodiments in FIG. 1 and FIG. 2; an example is used for illustration, and this application does not make specific limitations.
  • the computing chip 111 is used to generate the above-mentioned data access request and send it to the interconnection chip 112.
  • the computing chip 111 can send the data access request to the interconnection chip 112 through the port 113 of the computing chip.
  • the interconnection chip 112 is used to send the data access request to the second node 120 through the above-mentioned cable 140.
  • deploying the interconnection chip 112 in the first node 110 enables the first node 110 to communicate with more second nodes 120: the greater the number of interconnection chips 112, the greater the number of high-speed interconnection ports 130, and the greater the number of second nodes 120 that can be connected to the first node 110, thereby expanding the memory expansion capability of the first node 110 and making the first node 110 applicable to more application scenarios.
  • the data communication between the computing chip 111 , the interconnection chip 112 and the second node 120 can implement an addressing function through an address decoder.
  • the address decoders in the computing chip 111 , the interconnection chip 112 and the second node 120 will be described in detail below with reference to FIG. 3 .
  • a first address decoder 210 is deployed in the computing chip 111, and the computing chip 111 is specifically used to generate a data access request, and according to the first destination address and the first The address decoder 210 determines the first port, and sends a data access request to the interconnection chip 112 through the first port, wherein the first address decoder 210 can record the correspondence between the destination address and the port of the computing chip.
  • a second address decoder 220 is deployed in the interconnection chip 112, and the interconnection chip 112 is specifically used to determine the first high-speed interconnection port according to the first destination address and the second address decoder 220, and send the data access request to the second node 120 through the first high-speed interconnection port, wherein the second address decoder 220 is used to record the correspondence between the destination address and the high-speed interconnection port.
  • a third address decoder 230 is deployed in the second node 120, and the second node 120 is specifically configured to determine the local physical address corresponding to the first destination address according to the first destination address and the third address decoder 230, wherein the third address decoder 230 is used to record the correspondence between the destination address and the local physical address.
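  • the three-stage addressing above can be summarized with a minimal sketch (the table contents, port names, and addresses are assumptions for illustration): the first address decoder maps a destination address to a computing-chip port, the second maps it to a high-speed interconnection port, and the third maps it to a local physical address on the second node.

```python
# Each dictionary stands in for one hardware address decoder; a real decoder
# matches address ranges rather than exact keys.
first_decoder  = {0x4000: "port0", 0x8000: "port1"}   # dest -> computing-chip port
second_decoder = {0x4000: "hsi2",  0x8000: "hsi4"}    # dest -> high-speed interconnection port
third_decoder  = {0x4000: 0x0000,  0x8000: 0x1000}    # dest -> local physical address

def route(dest_addr):
    chip_port = first_decoder[dest_addr]    # hop 1: computing chip -> interconnection chip
    hsi_port  = second_decoder[dest_addr]   # hop 2: interconnection chip -> second node
    local     = third_decoder[dest_addr]    # hop 3: second node converts the address
    return chip_port, hsi_port, local

print(route(0x8000))  # ('port1', 'hsi4', 4096)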
  • the data access system may further include a configuration node 150, and the configuration node 150 may configure the first address decoder 210, the second address decoder 220, and the third address decoder 230.
  • the configuration node 150 is used to obtain at least one local physical address of the memory of the second node from the second node 120, determine at least one corresponding destination address according to the at least one local physical address, and configure the third address decoder 230; the configuration node is also used to configure the second address decoder 220 according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; the configuration node is also used to configure the first address decoder 210 according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
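  • the configuration flow above can be sketched as follows (function name, destination-address layout, and port identifiers are assumptions for illustration): the configuration node takes the local physical addresses exported by the second node, assigns a destination address to each, and programs all three decoders from that assignment.

```python
# Sketch of the configuration node: builds the three decoder tables from the
# local physical addresses provided by the second node.

def configure(local_addrs, hsi_port, chip_port, dest_base=0x8000, step=0x1000):
    third, second, first = {}, {}, {}
    for i, local in enumerate(local_addrs):
        dest = dest_base + i * step      # assign a destination address
        third[dest]  = local             # third decoder: dest -> local physical address
        second[dest] = hsi_port          # second decoder: dest -> high-speed port
        first[dest]  = chip_port         # first decoder: dest -> computing-chip port
    return first, second, third

first, second, third = configure([0x0, 0x1000], hsi_port=5, chip_port=2)
```

  • note that all three tables are keyed by the same destination addresses, which is what lets a single first destination address steer the request through the computing-chip port, the high-speed interconnection port, and finally the local physical address.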
  • the data access request generated by the computing chip is routed and addressed by the address decoders, and the data access request is transmitted to the CPU of the second node corresponding to the destination address for memory reading and writing, thereby avoiding the waiting time for network card queue preparation and improving the efficiency with which the first node 110 reads and writes the expanded memory; the delay can even reach the microsecond level (while Ethernet delay can reach the millisecond level), and the bandwidth can reach 400GB, providing higher bandwidth and lower delay than an RDMA network card whose bandwidth is only 100GB.
  • when the configuration node 150 obtains at least one local physical address of the memory of the second node from the second node 120, it may determine, according to the size of the memory of the second node 120 and in combination with business requirements, the local physical address of the extended memory that the second node 120 allocates for use by the first node 110.
  • the extended memory used by the first node 110 may be a part of the memory of the second node 120, and this part of the extended memory may be processed through a memory isolation technology so that the second node 120 cannot access it, improving the security of the data stored in the extended memory.
  • the whole or part of the first destination address can be matched with the addresses in the decoder, so as to improve the matching efficiency and thereby improve the efficiency of data access.
  • the first port may be determined according to the base address and length of the first destination address in the data access request.
  • the computing chip 111 is specifically configured to match the base address and length of the destination addresses recorded in the first address decoder 210 with the base address and length of the first destination address, and determine the first port corresponding to the matched destination address.
  • the interconnection chip 112 is specifically used to match the base address and length of the destination addresses recorded in the second address decoder 220 with the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address, which is not detailed here.
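  • base-address-and-length matching can be sketched as follows (the entry layout and values are assumptions for illustration): each decoder entry records a base address, a length, and a port, and a destination address matches an entry when it falls inside the half-open range starting at the base address.

```python
# Sketch of base-address-and-length matching: a destination address dest_addr
# matches an entry (base, length, port) when base <= dest_addr < base + length.

def match_port(entries, dest_addr):
    for base, length, port in entries:
        if base <= dest_addr < base + length:
            return port
    return None  # no entry covers this destination address

entries = [(0x0000, 0x4000, 0),   # port 0 serves [0x0000, 0x4000)
           (0x4000, 0x4000, 1),   # port 1 serves [0x4000, 0x8000)
           (0x8000, 0x8000, 2)]   # port 2 serves [0x8000, 0x10000)
print(match_port(entries, 0x9000))  # 2
```

  • the same routine applies at both hops: the first address decoder returns a computing-chip port and the second address decoder returns a high-speed interconnection port, using their own entry tables.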
  • the first port may be determined according to the upper address of the first destination address.
  • the computing chip 111 is specifically used to match the high-order address of the destination addresses recorded in the first address decoder with the high-order address of the first destination address, and determine the first port corresponding to the matched destination address, wherein the number of bits of the high-order address is determined according to the memory size of the second node.
  • the interconnection chip 112 is specifically used to match the high-order address of the destination addresses recorded in the second address decoder with the high-order address of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address, which is not detailed here.
  • the number of digits of the high-order address can be determined according to the expanded memory size of the second node 120 connected to the high-speed interconnection port.
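  • high-order-address matching can be sketched as follows (the bit width and addresses are assumptions for illustration): if each port serves an aligned region of, say, 1 GiB (2**30 bytes) of expanded memory, everything above bit 30 of the destination address is the high-order address, and one shift plus one table lookup picks the port.

```python
# Sketch of high-order-address matching. REGION_BITS is an assumed value:
# one 1 GiB aligned region of expanded memory per high-speed port.

REGION_BITS = 30

def high_bits(addr):
    return addr >> REGION_BITS   # keep only the high-order address

decoder = {high_bits(0x40000000): 5,   # high-order address -> high-speed port 5
           high_bits(0x80000000): 6}   # high-order address -> high-speed port 6

def match_high(dest_addr):
    return decoder.get(high_bits(dest_addr))

print(match_high(0x40001234))  # 5
```

  • compared with the base-address-and-length method, only part of the destination address participates in the comparison here, which is the matching-efficiency gain the text describes.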
  • since the expanded memory provided by the second node 120 has multiple physical addresses, the ports corresponding to some of the destination addresses recorded in the first and second address decoders may be the same, and the destination addresses corresponding to the same port are located in the same memory. These destination addresses have the same base address and length, or the same high-order address, so the first port corresponding to the destination address can be determined by matching the base address and length, or by matching the high-order address.
  • for example, the second address decoder can record that destination addresses 1-10 all correspond to high-speed interconnection port 5; if the first destination address is any one of destination addresses 1-10, the corresponding high-speed interconnection port is high-speed interconnection port 5. The destination addresses corresponding to the same high-speed interconnection port are also addresses of the extended memory of the second node 4, so these destination addresses have the same base address and length, or the same high-order address.
  • the partial address of the first destination address can be matched with the partial address of the destination address in the second address decoder, thereby improving the matching efficiency.
  • FIG. 4 is an example diagram of the first address decoder 210.
  • the first address decoder 210 may include a plurality of destination addresses, and destination addresses with the same base address and length correspond to the same port of the computing chip 111. Assuming that the base address and length of the first destination address are as shown in FIG. 4, the base address and length of the first destination address can be matched with the base address and length of each destination address in the first address decoder 210 to determine that the first port corresponding to the matched destination address is port 2, and the data access request can then be transmitted to the interconnection chip 112 through port 2.
  • similarly, the high-speed interconnection port for transmitting the data access request is determined according to the base address and length of the destination addresses recorded by the second address decoder 220, which is not repeated here. It should be understood that FIG. 4 matches the first destination address based on the base address and length; the above-mentioned matching method based on the high-order address is similar and is not further illustrated here.
  • the data access system shown in FIG. 1 can also implement routing addressing for data access requests through the above address decoder.
  • the first node 110 may be equipped with a second address decoder 220, and the second node 120 may be equipped with a third address decoder 230. For the data access request generated by the first node 110, the first high-speed interconnection port corresponding to the first destination address may be determined according to the correspondence between the high-speed interconnection ports and the destination addresses recorded in the second address decoder 220, and the data access request is then sent to the second node 120 through the first high-speed interconnection port, which is not described in detail here.
  • in this case, the high-speed interconnection port can be deployed on the processor in the first node 110; in simple terms, the processor of the first node 110 and the processor of the second node 120 are directly connected through a cable.
  • if the data access request is to read data in the memory of the second node 120, after the second node 120 processes the data access request, it can combine the first, second, and third address decoders according to the source address in the data access request and return the read data to the first node 110 through the original path, which is not repeated here.
  • in summary, the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node can implement the addressing function in combination with the address decoders, so that the data access request is sent to the memory of the second node corresponding to the first destination address, realizing the memory expansion of the first node. This method does not need to deploy additional network cards or routers and does not need to wait for the preparation time of the network card queue unit, so that the first node can access the memory of the second node with high efficiency and low latency.
  • the number of second nodes can be increased to increase the expanded memory capacity of the first node, so that the scalable memory capacity of the first node is large and more services can be handled in application scenarios.
  • Fig. 5 is a data access method provided by the present application, which can be applied to the data access system shown in Fig. 1 to Fig. 4, and the method may include the following steps:
  • Step S510: the first node generates a data access request, where the data access request is used to request data in the memory of the second node.
  • Step S520: the first node sends the data access request to the second node through a cable.
  • the first high-speed interconnection port of the first node is connected to the second high-speed interconnection port of the second node through a cable; the description of the high-speed interconnection port and the cable can refer to the embodiments in FIGS. 1 to 4 and is not repeated here.
  • the first node may include a second address decoder, which is used to record the correspondence between the destination address and the high-speed interconnection port, and the first node may determine the first high-speed interconnection port corresponding to the first destination address according to the first destination address in the data access request and the second address decoder, and then send the data access request to the second node through the first high-speed interconnection port.
  • the specific description of the second address decoder can refer to the embodiments shown in FIG. 1 to FIG. 4 , which will not be repeated here.
  • Step S530: the second node converts the first destination address in the data access request into a local physical address corresponding to the first destination address, and accesses the data in the memory of the second node according to the local physical address.
  • the second node may include a third address decoder, which is used to record the correspondence between the destination address and the local physical address, and the second node may determine the local physical address corresponding to the first destination address according to the first destination address in the data access request and the third address decoder, and then access the data in the memory of the second node according to the local physical address.
  • the specific description of the third address decoder can refer to the embodiments shown in FIG. 1 to FIG. 4 , and details are not repeated here.
  • the first node may include a computing chip and an interconnection chip, and the computing chip is connected to the interconnection chip through a port.
  • the descriptions of the interconnection chip, the ports, and the bus may refer to the embodiments shown in FIG. 1 to FIG. 4 and are not repeated here.
  • the computing chip may perform step S510 to generate the above data access request, and send the data access request to the interconnection chip, and the computing chip may send the data access request to the interconnection chip through a port of the computing chip.
  • the interconnection chip sends the data access request to the second node through the first high-speed interconnection port.
  • deploying interconnection chips in the first node enables the first node to communicate with more second nodes: the greater the number of interconnection chips, the greater the number of high-speed interconnection ports that can be deployed in the first node, so that the number of second nodes connected to the first node increases, thereby expanding the memory expansion capability of the first node and making the first node applicable to more application scenarios.
  • the computing chip is equipped with a first address decoder. After the computing chip generates a data access request, it determines the first port according to the first destination address in the data access request and the first address decoder, and sends the data access request to the interconnection chip through the first port, wherein the first address decoder can record the correspondence between the destination address and the port of the computing chip.
  • a second address decoder is deployed in the interconnection chip, and the interconnection chip can determine the first high-speed interconnection port according to the first destination address and the second address decoder, and communicate to the second node through the first high-speed interconnection port A data access request is sent, wherein the second address decoder is used to record the corresponding relationship between the destination address and the high-speed interconnection port.
  • the data access system may further include a configuration node, and the configuration node may configure the first address decoder, the second address decoder, and the third address decoder before the first node generates a data access request.
  • the configuration node obtains at least one local physical address of the memory of the second node from the second node, determines at least one corresponding destination address according to the at least one local physical address, and configures the third address decoder; it configures the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; and it configures the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
  • the data access request generated by the computing chip is routed and addressed by the address decoders, and the data access request is transmitted to the CPU of the second node corresponding to the destination address for memory reading and writing, thereby avoiding the waiting time for network card queue preparation and improving the efficiency with which the first node reads and writes the expanded memory; the delay can even reach the microsecond level (while Ethernet delay can reach the millisecond level), and the bandwidth can reach 400GB, providing higher bandwidth and lower delay than an RDMA network card whose bandwidth is only 100GB.
  • when the configuration node acquires at least one local physical address of the memory of the second node from the second node, it may determine, according to the size of the memory of the second node and in combination with business requirements, the local physical address of the extended memory that the second node allocates for use by the first node.
  • the extended memory used by the first node can be part of the memory of the second node, and this part of the extended memory can be processed through memory isolation technology so that the second node cannot access it, thereby improving the security of the data stored in the extended memory.
  • the whole or part of the first destination address can be matched with the addresses in the decoder, thereby improving the matching efficiency and thus the efficiency of data access.
  • the first port may be determined according to the base address and length of the first destination address in the data access request.
  • the computing chip can match the base address and length of the destination addresses recorded in the first address decoder with the base address and length of the first destination address, and determine the first port corresponding to the matched destination address; the interconnection chip can match the base address and length of the destination addresses recorded in the second address decoder with the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address, which is not detailed here.
  • the first port may be determined according to the upper address of the first destination address.
  • the computing chip can match the high-order address of the destination addresses recorded in the first address decoder with the high-order address of the first destination address, and determine the first port corresponding to the matched destination address, wherein the number of digits of the high-order address is determined according to the memory size of the second node; the interconnection chip can match the high-order address of the destination addresses recorded in the second address decoder with the high-order address of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address, which is not detailed here.
  • the ports corresponding to some of the destination addresses recorded in the first and second address decoders may be the same, and the destination addresses corresponding to the same port are located in the same memory; these destination addresses have the same base address and length, or the same high-order address, so the first port corresponding to the destination address can be determined by matching the base address and length, or by matching the high-order address, thus improving the matching efficiency.
  • if the data access request is to read data in the memory of the second node, after the second node processes the data access request, it can combine the first, second, and third address decoders according to the source address in the data access request and return the read data to the first node through the original route, which is not repeated here.
  • in summary, the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node can implement the addressing function in combination with the address decoders, so that the data access request is sent to the memory of the second node corresponding to the first destination address, realizing the memory expansion of the first node. This method does not need to deploy additional network cards or routers and does not need to wait for the preparation time of the network card queue unit, so that the first node can access the memory of the second node with high efficiency and low latency.
  • the number of second nodes can be increased to increase the expanded memory capacity of the first node, so that the scalable memory capacity of the first node is large and more services can be handled in application scenarios.
  • FIG. 6 is a schematic structural diagram of a computing node 600 provided in the present application.
  • the computing node 600 may be the first node 110 in the aforementioned content.
  • the computing node 600 may include a computing chip 111 and an interconnection chip 112, wherein the computing chip 111 It may include a generating unit 1111 , a first matching unit 1112 and a second sending unit 1113 , and the interconnection chip 112 may include a first sending unit 1121 and a second matching unit 1122 .
  • the generating unit 1111 is configured to generate a data access request, wherein the data access request is used to request data in the memory of the second node, and specifically step S510 in the embodiment of FIG. 5 can be executed;
  • the first sending unit 1121 is configured to send the data access request to the second node through the cable, so that the second node converts the first destination address in the data access request into a local physical address corresponding to the first destination address, and according to the local The physical address accesses the data in the memory of the second node, and specifically step S520 in the embodiment of FIG. 5 can be executed.
  • the first high-speed interconnection port of the interconnection chip 112 is connected to the second high-speed interconnection port of the processor in the second node through a cable, and the generating unit 1111 is configured to generate a data access request through the computing chip 111;
  • the second sending unit 1113 is configured to send the data access request to the interconnection chip 112 through the computing chip 111 ;
  • the first sending unit 1121 is configured to send the data access request to the second node through the interconnection chip 112 through a cable.
  • the computing chip 111 is connected to the interconnection chip 112 through a port, and the computing chip 111 includes a first address decoder. The first matching unit 1112 is used to determine, through the computing chip 111, the first port according to the first destination address in the data access request and the first address decoder, wherein the first address decoder is used to record the correspondence between the destination address and the port of the computing chip; the second sending unit 1113 is used to send, through the computing chip 111, the data access request to the interconnection chip 112 through the first port.
  • the interconnection chip 112 includes a second address decoder, and the second matching unit 1122 is configured to determine, through the interconnection chip 112, the first high-speed interconnection port according to the first destination address and the second address decoder, wherein the second address decoder is used to record the correspondence between the destination address and the high-speed interconnection port; the first sending unit 1121 is used to send, through the interconnection chip 112, the data access request to the second node through the first high-speed interconnection port.
  • the first matching unit 1112 is configured to match, through the computing chip 111, the base address and length of the destination addresses recorded in the first address decoder with the base address and length of the first destination address, and determine the first port corresponding to the matched destination address; the second matching unit 1122 is configured to match, through the interconnection chip 112, the base address and length of the destination addresses recorded in the second address decoder with the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address.
  • the first matching unit 1112 is configured to match, through the computing chip 111, the upper address of the destination addresses recorded in the first address decoder with the upper address of the first destination address, and determine the first port corresponding to the matched destination address, where the number of bits of the upper address is determined according to the memory size of the second node; the second matching unit 1122 is configured to match, through the interconnection chip 112, the base address and length of the destination addresses recorded in the second address decoder with the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address.
  • the first high-speed interconnect port and the second high-speed interconnect port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
  • the first node 110 may be a physical server, such as an X86 server or an ARM server; it may also be a virtual machine (VM) implemented based on a general-purpose physical server combined with network functions virtualization (NFV) technology. A virtual machine is a complete computer system that is simulated by software, has complete hardware system functions, and runs in a completely isolated environment, such as a virtual device in cloud computing, which is not specifically limited in this application. The first node 110 may also be a server cluster composed of multiple physical servers or virtual machines.
  • FIG. 7 is a schematic structural diagram of a storage node 700 provided in the present application.
  • the storage node 700 may be the second node 120 in the embodiments of FIGS.
  • the receiving unit 121 is configured to receive a data access request, the data access request is generated by the first node, and the data access request is sent by the first node through a cable;
  • the conversion unit 122 is configured to convert the first destination address in the data access request into a local physical address corresponding to the first destination address, and access data in the memory of the second node according to the local physical address.
  • the second node 120 includes a third address decoder; the conversion unit 122 is configured to determine the local physical address corresponding to the first destination address according to the first destination address and the third address decoder, where the third address decoder is used to record the correspondence between destination addresses and local physical addresses.
  • the first high-speed interconnection port of the first node is connected to the second high-speed interconnection port of the processor in the second node through a cable, and the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports.
  • the second node 120 may be a physical server, such as an X86 server or an ARM server; it may also be a virtual machine (VM) implemented based on a general-purpose physical server combined with network functions virtualization (NFV) technology. A virtual machine is a complete computer system that is simulated by software, has complete hardware system functions, and runs in a completely isolated environment, such as a virtual device in cloud computing, which is not specifically limited in this application. The second node 120 may also be a server cluster composed of multiple physical servers or virtual machines.
  • the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node can implement an addressing function in combination with address decoders, so as to send the data access request to the memory of the second node corresponding to the first destination address and realize memory expansion of the first node.
  • The first node accesses the memory of the second node with high efficiency and low delay.
  • By adding high-speed interconnection ports, the number of second nodes can be increased, which increases the memory capacity that the first node can expand to; the scalable memory capacity of the first node is therefore very large, enabling it to handle services in more application scenarios.
  • FIG. 8 is a schematic structural diagram of a computing device provided by the present application.
  • the computing device 800 may be the first node 110 or the second node 120 in the embodiments of FIG. 1 to FIG. 7; the computing device may be a physical server, a virtual machine, or a server cluster, and may also be a chip (system) or another component that can be installed in a physical server or a virtual machine, which is not limited in this application.
  • the computing device 800 includes a processor 801, a memory 802, and a communication interface 803, where the processor 801, the memory 802, and the communication interface 803 communicate through a bus 805, or other means such as wireless transmission.
  • the processor 801 may be composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a programmable logic device (Programmable Logic Device, PLD) or a combination thereof.
  • the above-mentioned PLD can be a complex programmable logic device (Complex Programmable Logic Device, CPLD), a field programmable logic gate array (Field-Programmable Gate Array, FPGA), a general array logic (Generic Array Logic, GAL) or any combination thereof.
  • Processor 801 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 802, which enable computing device 800 to provide a wide variety of services.
  • the above-mentioned processor 801 may be a computing chip or an interconnection chip in the first node mentioned above, or may be a processor chip in the second node, which is not specifically limited in this application.
  • the processor 801 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 8 .
  • the computing device 800 may also include multiple processors, such as the processor 801 and the processor 804 shown in FIG. 8 .
  • processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the memory 802 is used to store program code, which is executed under the control of the processor 801, so as to execute the processing steps in any of the above embodiments in FIGS. 1-7.
  • One or more software modules may be included in the program code.
  • When the computing device 800 is the first node 110, the above one or more software modules may be the generating unit 1111, the first matching unit 1112, the second sending unit 1113, the second matching unit 1122, and the first sending unit 1121 in the foregoing embodiment; for the specific implementation, reference may be made to the corresponding method embodiment, and details are not repeated here.
  • When the computing device 800 is the second node 120, the above software modules may be the receiving unit 121 and the conversion unit 122; for the specific implementation, reference may be made to the method embodiment in FIG. 6, and details are not repeated here.
  • the memory 802 may include read-only memory and random-access memory, and provides instructions and data to the processor 801 .
  • Memory 802 may also include non-volatile random access memory.
  • memory 802 may also store device type information.
  • Memory 802 can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • It may also be a hard disk, a USB flash drive (universal serial bus, USB), a flash memory, a secure digital memory card (SD card), a memory stick, or the like.
  • The hard disk may be a hard disk drive (HDD), a solid state disk (SSD), or a mechanical hard disk.
  • the communication interface 803 may be a wired interface (such as an Ethernet interface), an internal interface (such as a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or modules.
  • the communication interface 803 may be used to receive a message, so that the processor 801 or the processor 804 can process the message.
  • the bus 805 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like.
  • bus 805 may also include a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus 805 in the figure.
  • FIG. 8 is only a possible implementation manner of the embodiment of the present application.
  • the computing device 800 may include more or fewer components, which is not limited here.
  • computing device 800 shown in FIG. 8 may also be a computer cluster composed of at least one physical server.
  • An embodiment of the present application provides a chip, which can be used in a server with a processor of the X86 architecture (also called an X86 server), a server with a processor of the ARM architecture (also called an ARM server), and the like.
  • the chip may include the above-mentioned device or logic circuit, and when the chip runs on the server, the server is made to execute the data access method described in the above method embodiment.
  • the chip may be a computing chip or an interconnection chip in the first node in the foregoing content, or may be a processor chip in the second node.
  • An embodiment of the present application provides a main board, which may also be called a printed circuit board (PCB).
  • the main board includes a processor, and the processor is used to execute program codes to implement the data access method described in the above method embodiments.
  • the mainboard may further include a memory, which is used to store the above program codes for execution by the processor.
  • An embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium; when the computer instructions are run on a computer, the computer is made to perform the data access method described in the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product containing instructions, including a computer program or instructions; when the computer program or instructions are run on a computer, the computer is made to execute the data access method described in the above method embodiments.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • a computer program product comprises at least one computer instruction.
  • When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • A computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage node such as a server or a data center that includes at least one set of available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., high-density digital video discs (DVD)), or semiconductor media.
  • the semiconductor media may be SSDs.

Abstract

A data access system and method, and a related device. The system comprises a first node (110) and a second node (120); the first node (110) is connected to the second node (120) by means of a cable (140), and is used for generating a data access request, wherein the data access request is used for requesting data in a memory of the second node (120); the first node (110) is used for sending the data access request to the second node (120) by means of the cable (140); and the second node (120) is used for converting a first destination address in the data access request into a local physical address corresponding to the first destination address, and accessing data in the memory of the second node (120) according to the local physical address. The system does not need to wait for the preparation time of a network card queue unit, such that the first node (110) has high efficiency and low time delay in accessing the memory of the second node (120), thereby improving the data processing efficiency of the first node (110).

Description

A data access system, method, and related device

This application claims priority to the Chinese patent application No. 202111160189.7, filed with the China Patent Office on September 30, 2021 and entitled "A data access system, method and related equipment", which is incorporated herein by reference in its entirety.
Technical Field

The present application relates to the field of storage, and in particular, to a data access system, method, and related device.
Background

With the continuous development of science and technology, the massive data generated in the era of information explosion has penetrated into every industry and business function of today, and the fields of big data and artificial intelligence (AI) have developed accordingly, becoming two very popular research directions.

When a computing node performs data processing (for example, big data or AI tasks), it often needs a large memory capacity to store data. Usually, the data can be placed in a distributed manner in the memory of multiple storage nodes, and the computing node can read the data in the memory of the storage nodes through the remote direct memory access (RDMA) protocol, thereby expanding its memory capacity.

However, under the RDMA protocol, communication between the computing node and the storage node is implemented through network cards, and data is transmitted between the two network cards through network card queues, so each time the computing node reads data, it must put the data read request into a network card queue. As a result, the data reading process consumes a large amount of time on queue unit preparation, and in some cases the queue unit preparation time is even longer than the data transmission time, leading to low data access efficiency and high network delay for the computing node and affecting the processing efficiency of big data or AI tasks.
Summary

The present application provides a data access system, method, and related device, which are used to solve the problems of low access efficiency and high network delay when a computing node accesses the memory of a storage node.

In a first aspect, a data access system is provided. The data access system includes a first node and a second node, and the first node is connected to the second node through a cable. The first node is configured to generate a data access request, where the data access request is used to request data in the memory of the second node; the first node is configured to send the data access request to the second node through the cable; and the second node is configured to convert a first destination address in the data access request into a local physical address corresponding to the first destination address, and access the data in the memory of the second node according to the local physical address.

In the system described in the first aspect, the first node and the second node are connected through a cable, and the communication between them does not pass through a network card or a routing device, so the first node does not need to additionally wait for the preparation time of a network card queue unit when accessing the memory of the second node, which improves the efficiency and reduces the delay of the first node accessing the memory of the second node.
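As an illustration only, the interaction of the first aspect can be sketched in a few lines of code (the class and method names, the address window, and the in-process "cable" are all hypothetical; the patent describes hardware, not any particular software realization):

```python
# Hypothetical end-to-end sketch of the first aspect: the first node issues a
# data access request for a destination address over a direct cable; the
# second node converts that address into a local physical address and serves
# the access from its own memory. No network card queue is involved.

class SecondNode:
    def __init__(self, base_address, memory_size):
        self.base_address = base_address      # start of its destination window
        self.memory = bytearray(memory_size)  # the second node's local memory

    def handle(self, destination_address, length):
        # Convert the first destination address to a local physical address.
        local_physical = destination_address - self.base_address
        return bytes(self.memory[local_physical:local_physical + length])

class FirstNode:
    def __init__(self, cable):
        self.cable = cable  # stands in for the direct cable connection

    def read(self, destination_address, length):
        # Send the data access request straight to the second node.
        return self.cable.handle(destination_address, length)

second = SecondNode(base_address=0x4000_0000, memory_size=16)
second.memory[0:4] = b"data"
first = FirstNode(cable=second)
```

Here `first.read(0x4000_0000, 4)` returns the bytes stored at local physical address 0 of the second node's memory.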
In a possible implementation, the first node includes a computing chip and an interconnection chip, where the first high-speed interconnection port of the interconnection chip is connected to the second high-speed interconnection port of the processor in the second node through a cable, and the computing chip is connected to the interconnection chip through a port. The computing chip is configured to generate a data access request and send the data access request to the interconnection chip, and the interconnection chip is configured to send the data access request to the second node through the cable.

The computing chip may be composed of at least one general-purpose processor, for example, a CPU, an NPU, or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. The number of computing chips in the first node may be one or more, which is not specifically limited in this application.

The interconnection chip may be an ASIC, a PLD, or a combination thereof, and the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof, which is not specifically limited in this application. The number of interconnection chips in the first node may be more than one, which is not specifically limited in this application. The interconnection chip is provided with high-speed interconnection ports, through which it can perform data communication with the second node; the first high-speed interconnection port of the interconnection chip is connected to the second high-speed interconnection port on the second node through a cable. It should be noted that the number of first high-speed interconnection ports on each interconnection chip may be one or more, and each first high-speed interconnection port is in one-to-one correspondence with a second high-speed interconnection port on the second node.

The high-speed interconnection port may be a high-speed serial bus port, such as a SERDES bus port, and the cable may be an electrical cable, an optical fiber, a twisted pair, or any other cable capable of transmitting data; this application does not specifically limit the cable. The number of high-speed interconnection ports on the first node may be one or more, and the first high-speed interconnection ports on the first node are in one-to-one correspondence with the second high-speed interconnection ports on the second node.

The port of the computing chip may be a high-speed serial bus port, such as a SERDES bus port. The computing chip may be connected to the interconnection chip through a bus, and the bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The computing chip, the interconnection chip, the port of the computing chip, and the bus may be printed together on a circuit board during manufacturing. In a specific implementation, the number of ports of the computing chip may be one or more, which is not limited in this application.
By implementing the above implementation and deploying interconnection chips in the first node, the first node can communicate with more second nodes: the more interconnection chips there are, the more high-speed interconnection ports can be deployed in the first node, and the more second nodes can be connected to the first node. This expands the memory expansion capability of the first node, so that the first node can be applied to more application scenarios.

In another possible implementation, the data communication among the computing chip and the interconnection chip of the first node and the second node may implement an addressing function through address decoders.
Optionally, the computing chip includes a first address decoder, and the computing chip is specifically configured to: generate a data access request, determine the first port according to the first destination address in the data access request and the first address decoder, and send the data access request to the interconnection chip through the first port, where the first address decoder is used to record the correspondence between destination addresses and ports of the computing chip.

Optionally, a first address decoder is deployed in the computing chip, and the computing chip is specifically configured to generate a data access request, determine the first port according to the first destination address in the data access request and the first address decoder, and send the data access request to the interconnection chip through the first port, where the first address decoder can record the correspondence between destination addresses and ports of the computing chip.
Optionally, a second address decoder is deployed in the interconnection chip, and the interconnection chip is specifically configured to determine the first high-speed interconnection port according to the first destination address and the second address decoder, and send the data access request to the second node through the first high-speed interconnection port, where the second address decoder is used to record the correspondence between destination addresses and high-speed interconnection ports.
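The two routing stages described in the optional implementations above — the first address decoder selecting a port of the computing chip, and the second address decoder selecting a high-speed interconnection port — can be sketched as two table lookups. All table contents and names below are invented for illustration:

```python
# Hypothetical two-stage lookup: the first address decoder (computing chip)
# picks the port toward the interconnection chip, and the second address
# decoder (interconnection chip) picks the high-speed interconnection port.
# Every table entry is (base, length, target).

def lookup(decoder, address):
    for base, length, target in decoder:
        if base <= address < base + length:
            return target
    return None  # the address is not covered by this decoder

first_decoder = [(0x4000_0000, 0x2000_0000, "chip_port_0")]
second_decoder = [(0x4000_0000, 0x1000_0000, "hs_port_0"),
                  (0x5000_0000, 0x1000_0000, "hs_port_1")]

def route(destination_address):
    chip_port = lookup(first_decoder, destination_address)
    hs_port = lookup(second_decoder, destination_address)
    return chip_port, hs_port
```

For example, a destination address of `0x5800_0000` falls in the first decoder's single window and in the second decoder's second window, so it is routed via `chip_port_0` and then `hs_port_1`.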
Optionally, a third address decoder is deployed in the second node, and the second node is specifically configured to determine, according to the first destination address and the third address decoder, the local physical address corresponding to the first destination address, where the third address decoder is used to record the correspondence between destination addresses and local physical addresses. The correspondence recorded by the third address decoder may be: local physical address = destination address - base address, where the base address is the starting address of an address segment (also called the first address or segment address), and destination addresses belonging to the same address segment share the same base address.
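The relation local physical address = destination address - base address can be checked with a small worked example (the numeric values are chosen arbitrarily for illustration):

```python
# Worked example of: local physical address = destination address - base address
base_address = 0x4000_0000          # starting address of the address segment
destination_address = 0x4000_2A00   # a destination address inside that segment
local_physical_address = destination_address - base_address
print(hex(local_physical_address))  # prints 0x2a00
```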
Optionally, the data access system may further include a configuration node, and the configuration node may configure the first address decoder, the second address decoder, and the third address decoder. Specifically, the configuration node is configured to obtain at least one local physical address of the memory of the second node from the second node, determine at least one corresponding destination address according to the at least one local physical address, and configure the third address decoder. The configuration node is further configured to configure the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip, and to configure the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
In a specific implementation, when the configuration node obtains at least one local physical address of the memory of the second node from the second node, it may determine, according to the size of the memory of the second node and in combination with service requirements, the local physical addresses of the extended memory that the second node carves out for use by the first node. Optionally, the extended memory used by the first node may be part of the memory of the second node, and that part of the extended memory may be processed through memory isolation technology so that the second node cannot access it, improving the security of the data stored in the extended memory.
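One way a configuration node might derive the three decoder tables from the address ranges contributed by second nodes is sketched below. The entry layouts, the fixed remote window base, and all names are assumptions made for this sketch; the patent does not prescribe a concrete algorithm:

```python
# Hypothetical configuration flow: the configuration node collects, for each
# second node, the (local_base, length) of the extended memory it carves out,
# assigns that range a destination-address window, and derives entries for
# the first, second, and third address decoders.

REMOTE_WINDOW_BASE = 0x4000_0000  # assumed start of remote-memory addresses

def configure(contributions):
    """contributions: list of (node_id, local_base, length, hs_port, chip_port)."""
    first_dec, second_dec, third_dec = [], [], []
    next_base = REMOTE_WINDOW_BASE
    for node_id, local_base, length, hs_port, chip_port in contributions:
        dest_base = next_base
        first_dec.append((dest_base, length, chip_port))   # destination -> chip port
        second_dec.append((dest_base, length, hs_port))    # destination -> high-speed port
        third_dec.append((node_id, dest_base, length, local_base))  # destination -> local physical
        next_base += length
    return first_dec, second_dec, third_dec
```

With two nodes each contributing 256 MiB, the second node's window would begin immediately after the first node's window in the destination address space.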
By implementing the above implementation, with the first, second, and third address decoders configured by the configuration node, the data access request generated by the computing chip is routed and addressed by the address decoders and transmitted to the CPU of the second node corresponding to the destination address for memory reads and writes. This avoids the waiting time of network card queue preparation and improves the efficiency with which the first node reads and writes the extended memory: the delay can even reach the microsecond level (Ethernet delay is on the millisecond level), and the bandwidth can reach 400GB, offering higher bandwidth and lower delay than an RDMA network card whose bandwidth is only 100GB.
In another possible implementation, when the first destination address is matched against the first address decoder and the second address decoder, the complete first destination address or part of it may be matched with the addresses in the decoders, thereby improving the matching efficiency and, in turn, the efficiency of data access.
可选地,可根据数据访问请求中第一目的地址的基地址和长度确定第一端口。具体地,计算芯片具体用于将第一地址译码器中记录的目的地址的基地址和长度与第一目的地址的基地址和长度进行匹配,确定匹配后的目的地址对应的第一端口。同理,互联芯片具体用于将第二地址译码器中记录的目的地址的基地址和长度与第二目的地址的基地址和长度进行匹配,确定匹配后的目的地址对应的第一高速互联端口。这里不再展开赘述。Optionally, the first port may be determined according to the base address and length of the first destination address in the data access request. Specifically, the computing chip is configured to match the base address and length of the destination addresses recorded in the first address decoder against the base address and length of the first destination address, and to determine the first port corresponding to the matched destination address. Similarly, the interconnection chip is configured to match the base address and length of the destination addresses recorded in the second address decoder against the base address and length of the second destination address, and to determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
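The base-address-and-length matching described above can be sketched as a simple range lookup (a hedged illustration only; the table layout, window sizes, and names below are assumptions, not part of the application):

```python
# Hypothetical sketch of base-address/length matching in an address decoder.
# Entry layout and names are illustrative assumptions, not the application's design.

DECODER = [
    # (base address, length in bytes, port id)
    (0x0000_0000_0000, 0x100_0000_0000, "port0"),  # first 1 TB window
    (0x100_0000_0000, 0x100_0000_0000, "port1"),   # second 1 TB window
]

def match_port(dest_addr: int) -> str:
    """Return the port whose [base, base + length) window contains dest_addr."""
    for base, length, port in DECODER:
        if base <= dest_addr < base + length:
            return port
    raise LookupError(f"no decoder entry for address {dest_addr:#x}")
```

Each decoder entry here models one expanded-memory window; a real decoder would perform this comparison in hardware rather than by iterating over a list.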
可选地,可根据第一目的地址的高位地址确定第一端口。计算芯片具体用于将第一地址译码器中记录的目的地址的高位地址与第一目的地址的高位地址进行匹配,确定匹配后的目的地址对应的第一端口,其中,高位地址的位数是根据第二节点的内存大小确定的。同理,互联芯片具体用于将第二地址译码器中记录的目的地址的高位地址与第一目的地址的高位地址进行匹配,确定匹配后的目的地址对应的第一高速互联端口。这里不再展开赘述。Optionally, the first port may be determined according to the high-order address of the first destination address. The computing chip is configured to match the high-order addresses of the destination addresses recorded in the first address decoder against the high-order address of the first destination address, and to determine the first port corresponding to the matched destination address, where the number of bits of the high-order address is determined according to the memory size of the second node. Similarly, the interconnection chip is configured to match the high-order addresses of the destination addresses recorded in the second address decoder against the high-order address of the first destination address, and to determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
举例来说,假设目的地址总长度为64bit,若一个高速互联端口对应的第二节点120的内存为1T,那么这1T内存的目的地址中,后30bit的地址不同,那么高位地址的位数可以是64-30=34bit,简单来说,位于同一个内存的目的地址前34bit是相同的,后面30bit不同,因此,根据高速互联端口所连接的第二节点120的拓展内存大小,可以确定高位地址的位数。For example, assume the total length of a destination address is 64 bits. If the memory of the second node 120 corresponding to one high-speed interconnection port is 1T, then among the destination addresses of that 1T of memory the last 30 bits differ, so the number of high-order address bits can be 64-30=34 bits. Simply put, destination addresses located in the same memory share the same first 34 bits and differ in the last 30 bits. Therefore, the number of high-order address bits can be determined according to the expanded memory size of the second node 120 connected to the high-speed interconnection port.
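The bit-width arithmetic in this example can be sketched as follows (an illustrative sketch; the function names are assumptions). With a 64-bit destination address and 30 varying low-order bits, as in the example, the high-order prefix is 64-30=34 bits, and two addresses belong to the same memory exactly when their 34-bit prefixes match:

```python
# Illustrative sketch: how many high-order bits identify one expanded-memory window,
# and how a prefix comparison decides whether two addresses share that window.

def high_order_bits(total_bits: int, low_order_bits: int) -> int:
    """High-order prefix width = total address width minus the varying low bits."""
    if not 0 <= low_order_bits <= total_bits:
        raise ValueError("low-order bit count out of range")
    return total_bits - low_order_bits

def same_window(a: int, b: int, high_bits: int, total_bits: int = 64) -> bool:
    """Two destination addresses share a window iff their high-order prefixes match."""
    shift = total_bits - high_bits
    return (a >> shift) == (b >> shift)
```

With the example's figures, `high_order_bits(64, 30)` gives 34, and addresses that differ only in their last 30 bits compare equal under `same_window`.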
应理解,由于第二节点提供的拓展内存对应的物理地址数量为多个,因此第一、第二译码器中记录的部分目的地址对应的端口可能会是同一个,对应相同端口的目的地址位于同一个内存中,这些对应相同端口的目的地址,其基地址和长度是相同的,或者,其高位地址是相同的,因此可以通过匹配基地址和长度,或者匹配高位地址来确定第一目的地址对应的端口。It should be understood that, since the expanded memory provided by the second node corresponds to multiple physical addresses, some of the destination addresses recorded in the first and second decoders may correspond to the same port. Destination addresses corresponding to the same port are located in the same memory and share the same base address and length, or the same high-order address; therefore, the port corresponding to the first destination address can be determined by matching the base address and length, or by matching the high-order address.
实施上述实现方式,将部分第一目的地址与译码器中的地址进行匹配,可以提高匹配效率,提高第一端口和第一高速互联端口的确定效率,进而提高数据访问的效率。By implementing the above implementation, matching a part of the first destination address against the addresses in the decoders can improve matching efficiency, improve the efficiency of determining the first port and the first high-speed interconnection port, and thereby improve the efficiency of data access.
需要说明的,若数据访问请求是向第二节点读取内存中的数据,第二节点对数据访问请求进行处理后,可以根据数据访问请求中的源地址,结合第一、第二和第三地址译码器,将读取到的数据原路返回至第一节点中,这里不再重复赘述。It should be noted that, if the data access request is to read data from the memory of the second node, then after the second node processes the request, it may return the read data to the first node along the original path according to the source address in the data access request, in combination with the first, second, and third address decoders. Details are not repeated here.
需要说明的,在一些实施例中,第一节点也可以不包括互联芯片,计算芯片上的高速互联端口与第二节点上的高速互联端口通过线缆连接,计算芯片也可通过上述地址译码器实现数据访问请求的路由寻址。具体地,计算芯片可部署有第二地址译码器,第二节点部署有第三地址译码器230,计算芯片生成的数据访问请求可根据第二地址译码器中记录的高速互联端口和目的地址之间的对应关系,确定第一目的地址对应的第一高速互联端口,然后通过第一高速互联端口向第二节点发送该数据访问请求,这里不展开赘述。It should be noted that, in some embodiments, the first node may not include an interconnection chip: the high-speed interconnection port on the computing chip is connected to the high-speed interconnection port on the second node through a cable, and the computing chip may likewise implement routing and addressing of data access requests through the above address decoders. Specifically, the computing chip may be deployed with the second address decoder and the second node with the third address decoder 230; for a data access request generated by the computing chip, the first high-speed interconnection port corresponding to the first destination address may be determined according to the correspondence between high-speed interconnection ports and destination addresses recorded in the second address decoder, and the request is then sent to the second node through the first high-speed interconnection port. Details are not repeated here.
第二方面,提供了一种数据访问方法,该方法应用于数据访问系统,该数据访问系统包括第一节点和第二节点,第一节点与第二节点通过线缆连接,该方法包括以下步骤:第一节点生成数据访问请求,其中,数据访问请求用于请求第二节点的内存中的数据,第一节点通过线缆发送数据访问请求至第二节点,第二节点将数据访问请求中的第一目的地址转换为第一目的地址对应的本地物理地址,并根据本地物理地址访问第二节点的内存中的数据。In a second aspect, a data access method is provided. The method is applied to a data access system that includes a first node and a second node connected by a cable, and includes the following steps: the first node generates a data access request, where the data access request is used to request data in the memory of the second node; the first node sends the data access request to the second node through the cable; and the second node converts the first destination address in the data access request into the local physical address corresponding to the first destination address, and accesses the data in the memory of the second node according to that local physical address.
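The steps of this method can be sketched end to end as follows (a hedged sketch: the address values, the translation table, and the byte-array "memory" are illustrative assumptions, not the application's design):

```python
# Hedged end-to-end sketch of the second-aspect flow: the first node issues a
# request carrying a destination address; the second node translates it to a
# local physical address and reads its memory. All names/values are assumptions.

SECOND_NODE_MEMORY = bytearray(1024)
SECOND_NODE_MEMORY[512:516] = b"DATA"

# Third address decoder: destination address -> local physical address.
THIRD_DECODER = {0x8000_0200: 512}

def make_request(dest_addr: int, length: int) -> dict:
    """First node: build a (read) data access request."""
    return {"dest": dest_addr, "len": length}

def handle_request(req: dict) -> bytes:
    """Second node: translate the destination address, then access local memory."""
    local = THIRD_DECODER[req["dest"]]
    return bytes(SECOND_NODE_MEMORY[local:local + req["len"]])
```

A request for destination address `0x8000_0200` is translated to local offset 512 and served from the second node's memory directly, with no network card queue in the path.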
实施第二方面描述的方法,第一节点和第二节点通过线缆连接,二者之间的通信交互无需通过网卡或路由设备,使得第一节点访问第二节点内存时无需额外等待网卡队列单元的准备时间,从而提高第一节点访问第二节点内存的效率,降低访问时延。By implementing the method described in the second aspect, the first node and the second node are connected by a cable, and communication between them does not pass through a network card or routing device, so the first node does not need to wait additionally for network card queue unit preparation time when accessing the second node's memory, thereby improving the efficiency of the first node's access to the second node's memory and reducing access latency.
在一可能的实现方式中,第一节点包括计算芯片和互联芯片,其中,互联芯片的第一高速互联端口与第二节点中的处理器的第二高速互联端口通过线缆连接,计算芯片可以生成数据访问请求,并将数据访问请求发送至互联芯片,互联芯片通过线缆发送数据访问请求至第二节点。In a possible implementation, the first node includes a computing chip and an interconnection chip, where the first high-speed interconnection port of the interconnection chip is connected to the second high-speed interconnection port of the processor in the second node through a cable; the computing chip may generate a data access request and send it to the interconnection chip, and the interconnection chip sends the data access request to the second node through the cable.
在一可能的实现方式中,计算芯片通过端口与互联芯片相连,计算芯片中包括第一地址译码器,计算芯片可生成数据访问请求,根据数据访问请求中的第一目的地址和第一地址译码器确定第一端口,通过第一端口向互联芯片发送数据访问请求,其中,第一地址译码器用于记录目的地址与计算芯片的端口之间的对应关系。In a possible implementation, the computing chip is connected to the interconnection chip through a port, and the computing chip includes a first address decoder. The computing chip may generate a data access request, determine the first port according to the first destination address in the data access request and the first address decoder, and send the data access request to the interconnection chip through the first port, where the first address decoder is used to record the correspondence between destination addresses and the ports of the computing chip.
在一可能的实现方式中,互联芯片中包括第二地址译码器,互联芯片根据第一目的地址和第二地址译码器确定第一高速互联端口,通过第一高速互联端口向第二节点发送数据访问请求,其中,第二地址译码器用于记录目的地址与高速互联端口之间的对应关系。In a possible implementation, the interconnection chip includes a second address decoder. The interconnection chip determines the first high-speed interconnection port according to the first destination address and the second address decoder, and sends the data access request to the second node through the first high-speed interconnection port, where the second address decoder is used to record the correspondence between destination addresses and high-speed interconnection ports.
在一可能的实现方式中,第二节点包括第三地址译码器,第二节点根据第一目的地址和第三地址译码器,确定第一目的地址对应的本地物理地址,其中,第三地址译码器用于记录目的地址和本地物理地址之间的对应关系。In a possible implementation, the second node includes a third address decoder. The second node determines the local physical address corresponding to the first destination address according to the first destination address and the third address decoder, where the third address decoder is used to record the correspondence between destination addresses and local physical addresses.
在一可能的实现方式中,数据访问系统还包括配置节点,上述方法还包括以下步骤:配置节点向第二节点获取第二节点的内存的至少一个本地物理地址,配置节点根据至少一个本地物理地址确定对应的至少一个目的地址,对第三地址译码器进行配置,配置节点根据至少一个目的地址,结合第二节点与互联芯片之间的高速互联端口,对第二地址译码器进行配置,配置节点根据至少一个目的地址,结合互联芯片与计算芯片之间的芯片端口,对第一地址译码器进行配置。In a possible implementation, the data access system further includes a configuration node, and the method further includes the following steps: the configuration node obtains at least one local physical address of the second node's memory from the second node; the configuration node determines at least one corresponding destination address according to the at least one local physical address and configures the third address decoder; the configuration node configures the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; and the configuration node configures the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
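The configuration steps above can be sketched as a single pass that derives all three decoder tables from the local physical addresses reported by the second node (a hedged sketch; the fixed-offset destination-address scheme and all names are assumptions, not the application's scheme):

```python
# Illustrative sketch of the configuration node deriving the three decoder
# tables. Mapping each local physical address to a destination address by a
# fixed offset is an assumption made for the example.

DEST_BASE = 0x8000_0000  # assumed global window where expanded memory is mapped

def configure(local_phys_addrs, interconnect_port, chip_port):
    """Return (first, second, third) decoder tables for the given ports."""
    third = {}   # destination address -> local physical address (second node)
    second = {}  # destination address -> high-speed interconnection port
    first = {}   # destination address -> computing-chip port
    for pa in local_phys_addrs:
        dest = DEST_BASE + pa
        third[dest] = pa
        second[dest] = interconnect_port
        first[dest] = chip_port
    return first, second, third
```

A request for any configured destination address can then be routed chip port → interconnection port → local physical address by consulting the three tables in order.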
在一可能的实现方式中,计算芯片将第一地址译码器中记录的目的地址的基地址和长度与第一目的地址的基地址和长度进行匹配,确定匹配后的目的地址对应的第一端口,互联芯片将第二地址译码器中记录的目的地址的基地址和长度与第一目的地址的基地址和长度进行匹配,确定匹配后的目的地址对应的第一高速互联端口。In a possible implementation, the computing chip matches the base address and length of the destination addresses recorded in the first address decoder against the base address and length of the first destination address and determines the first port corresponding to the matched destination address; the interconnection chip matches the base address and length of the destination addresses recorded in the second address decoder against the base address and length of the first destination address and determines the first high-speed interconnection port corresponding to the matched destination address.
在一可能的实现方式中,计算芯片将第一地址译码器中记录的目的地址的高位地址与第一目的地址的高位地址进行匹配,确定匹配后的目的地址对应的第一端口,其中,高位地址的位数是根据第二节点的内存大小确定的,互联芯片将第二地址译码器中记录的目的地址的基地址和长度与第一目的地址的基地址和长度进行匹配,确定匹配后的目的地址对应的第一高速互联端口。In a possible implementation, the computing chip matches the high-order addresses of the destination addresses recorded in the first address decoder against the high-order address of the first destination address and determines the first port corresponding to the matched destination address, where the number of bits of the high-order address is determined according to the memory size of the second node; the interconnection chip matches the base address and length of the destination addresses recorded in the second address decoder against the base address and length of the first destination address and determines the first high-speed interconnection port corresponding to the matched destination address.
在一可能的实现方式中,第一高速互联端口和第二高速互联端口为高速串行总线端口,第一端口为高速串行总线端口。In a possible implementation manner, the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
第三方面,提供了一种计算节点,该计算节点可以是第一方面和第二方面描述的第一节点,该计算节点应用于数据访问系统,数据访问系统还包括存储节点,计算节点包括:计算芯片和互联芯片,其中,计算芯片通过高速互联端口与互联芯片连接,互联芯片通过高速互联端口和线缆与其他节点连接;计算芯片用于生成数据访问请求,并将数据访问请求发送至互联芯片,其中,数据访问请求包括第一目的地址,第一目的地址指示存储节点中的内存的位置;互联芯片用于根据第一目的地址将数据访问请求发送至存储节点。In a third aspect, a computing node is provided. The computing node may be the first node described in the first and second aspects, and is applied to a data access system that further includes a storage node. The computing node includes a computing chip and an interconnection chip, where the computing chip is connected to the interconnection chip through a high-speed interconnection port, and the interconnection chip is connected to other nodes through high-speed interconnection ports and cables; the computing chip is configured to generate a data access request and send it to the interconnection chip, where the data access request includes a first destination address indicating a memory location in the storage node; and the interconnection chip is configured to send the data access request to the storage node according to the first destination address.
第四方面,提供了一种存储节点,该存储节点可以是第一方面和第二方面描述的第二节点,该存储节点应用于数据访问系统,数据访问系统还包括计算节点,存储节点包括处理器和内存,存储节点通过处理器的高速互联端口和线缆与计算节点连接;处理器用于通过高速互联端口接收计算节点发送的数据访问请求,将数据访问请求中携带的第一目的地址转换为第一目的地址对应的存储节点的本地物理地址,并根据本地物理地址访问内存中的数据。In a fourth aspect, a storage node is provided. The storage node may be the second node described in the first and second aspects, and is applied to a data access system that further includes a computing node. The storage node includes a processor and memory, and is connected to the computing node through a high-speed interconnection port of the processor and a cable; the processor is configured to receive, through the high-speed interconnection port, a data access request sent by the computing node, convert the first destination address carried in the data access request into the local physical address of the storage node corresponding to the first destination address, and access the data in the memory according to that local physical address.
第五方面,提供了一种计算设备,该计算设备包括处理器和存储器,存储器存储有代码,处理器用于执行第一方面或第一方面任一种可能实现方式中第一节点或第二节点实现的各个模块的功能。In a fifth aspect, a computing device is provided. The computing device includes a processor and a memory, the memory stores code, and the processor is configured to perform the functions of the modules implemented by the first node or the second node in the first aspect or any possible implementation of the first aspect.
第六方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。In a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.
附图说明Description of drawings
图1是本申请提供的一种数据访问系统的结构示意图;Fig. 1 is a schematic structural diagram of a data access system provided by the present application;
图2是本申请提供的一种应用场景下第一节点和第二节点的部署示意图;FIG. 2 is a schematic diagram of deployment of a first node and a second node in an application scenario provided by the present application;
图3是本申请提供的另一种数据访问系统的结构示意图;FIG. 3 is a schematic structural diagram of another data access system provided by the present application;
图4是本申请提供的一种第一地址译码器的示例图;FIG. 4 is an example diagram of a first address decoder provided by the present application;
图5是本申请提供的一种数据访问方法的步骤流程示意图;Fig. 5 is a schematic flow chart of steps of a data access method provided by the present application;
图6是本申请提供的一种计算节点的结构示意图;FIG. 6 is a schematic structural diagram of a computing node provided by the present application;
图7是本申请提供的一种存储节点的结构示意图;FIG. 7 is a schematic structural diagram of a storage node provided by the present application;
图8是本申请提供的一种计算设备的结构示意图。FIG. 8 is a schematic structural diagram of a computing device provided by the present application.
具体实施方式Detailed description of embodiments
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.
首先,对本申请涉及应用场景进行说明。First, the application scenarios involved in this application are described.
计算节点在执行一些大数据或者AI任务时,需要较大的内存容量来存储数据,且会小粒度稀疏地访问数据,比如推荐系统架构中,需要10TB~20TB量级的内存来存储数据,而访问粒度可能只有64byte或者128byte,且每次访问的位置随机性高,导致大数据或者AI任务的数据访问效率低,网络延迟高,影响大数据或者AI任务的处理效率。When performing some big data or AI tasks, a computing node needs a large memory capacity to store data and accesses the data sparsely at small granularity. For example, in a recommendation system architecture, memory on the order of 10TB~20TB is needed to store data, while the access granularity may be only 64 bytes or 128 bytes, and the location of each access is highly random. This leads to low data access efficiency and high network latency for big data or AI tasks, affecting their processing efficiency.
通常情况下,为了提高内存容量,可将数据分布式地放在多台存储节点的内存中,计算节点可通过远程直接内存访问(remote direct memory access,RDMA)协议读取存储节点内存中的数据,实现内存容量的扩展。Usually, in order to increase the memory capacity, the data can be distributed in the memory of multiple storage nodes, and the computing nodes can read the data in the memory of the storage nodes through the remote direct memory access (RDMA) protocol , to achieve the expansion of memory capacity.
但是,RDMA协议下,计算节点与存储节点之间的通信通过网卡实现,二者的网卡之间通过网卡队列进行数据传输,使得计算节点每次读取数据都需要将数据读取请求放入网卡队列,导致数据读取过程消耗大量时间在队列单元的准备上,甚至一些情况下,队列单元的准备时间相比数据传输的时间更长,导致计算节点数据访问效率低,网络延迟高,影响数据处理效率。However, under the RDMA protocol, communication between a computing node and a storage node is implemented through network cards, and data is transferred between the two network cards through network card queues, so that every time the computing node reads data it must place the data read request into a network card queue. As a result, the data reading process spends a great deal of time on queue unit preparation; in some cases the queue unit preparation time is even longer than the data transfer time, leading to low data access efficiency for the computing node, high network latency, and reduced data processing efficiency.
为了提高访问内存的速度,可以在计算设备的内存总线上拓展PCI内存设备,但是PCIe设备的拓展数量有限,QPI、UPI等总线的拓展能力更弱,通常最大也就拓展1T内存容量,使得拓展后的内存仍然无法达到大数据或者AI任务所需求的量级。To increase memory access speed, PCI memory devices can be added on the memory bus of a computing device, but the number of PCIe devices that can be added is limited, and buses such as QPI and UPI have even weaker expansion capability, typically expanding memory capacity by at most about 1T, so the expanded memory still cannot reach the magnitude required by big data or AI tasks.
综上可知,计算节点在执行数据处理任务时,需要较大的内存容量来存储数据,但是当前常用的RDMA方法虽然可以拓展内存至需求的量级,但是会使得计算节点数据访问效率低,使用PCI设备拓展内存虽然可以提高访问效率,但是拓展内存的能力很弱,拓展后的内存仍然无法达到需求,导致大数据或者AI任务的数据访问效率低,网络延迟高,影响大数据或者AI任务的处理效率。In summary, a computing node needs a large memory capacity to store data when performing data processing tasks. Although the currently common RDMA method can expand memory to the required magnitude, it makes the computing node's data access inefficient; and although expanding memory with PCI devices can improve access efficiency, the expansion capability is very weak and the expanded memory still cannot meet the demand. This leads to low data access efficiency and high network latency for big data or AI tasks, affecting their processing efficiency.
图1是本申请提供的一种数据访问系统的结构示意图。该数据访问系统包括第一节点110和第二节点120,其中,第一节点110和第二节点120之间通过线缆140进行连接,具体的,第一节点110的高速互联端口130与第二节点120中的处理器的高速互联端口130通过线缆连接。应理解,图1中的第二节点120的数量用于举例说明,本申请不对第二节点120的数量进行限定。为了便于区分,下文将统一称第一节点110中的高速互联端口130为第一高速互联端口,第二节点120中的高速互联端口130为第二高速互联端口。FIG. 1 is a schematic structural diagram of a data access system provided by this application. The data access system includes a first node 110 and a second node 120, where the first node 110 and the second node 120 are connected through a cable 140; specifically, the high-speed interconnection port 130 of the first node 110 and the high-speed interconnection port 130 of the processor in the second node 120 are connected through the cable. It should be understood that the number of second nodes 120 in FIG. 1 is for illustration, and this application does not limit the number of second nodes 120. For ease of distinction, the high-speed interconnection port 130 in the first node 110 is hereinafter referred to as the first high-speed interconnection port, and the high-speed interconnection port 130 in the second node 120 as the second high-speed interconnection port.
第一节点110和第二节点120可以是物理服务器,比如X86服务器、ARM服务器等;也可以是基于通用的物理服务器结合网络功能虚拟化(network functions virtualization,NFV)技术实现的虚拟机(virtual machine,VM),虚拟机指通过软件模拟的具有完整硬件系统功能的、运行在一个完全隔离环境中的完整计算机系统,比如云计算中的虚拟设备,本申请不作具体限定;第一节点110和第二节点120还可以是多个服务器组成的服务器集群,该服务器可以是前述内容中的物理服务器或者虚拟机。The first node 110 and the second node 120 may be physical servers, such as X86 servers or ARM servers; they may also be virtual machines (virtual machine, VM) implemented on general-purpose physical servers in combination with network functions virtualization (NFV) technology, where a virtual machine is a complete computer system simulated by software, with full hardware system functions, running in a fully isolated environment, such as a virtual device in cloud computing, which is not specifically limited in this application; the first node 110 and the second node 120 may also each be a server cluster composed of multiple servers, where each server may be a physical server or a virtual machine as described above.
高速互联端口130可以是高速串行总线端口,比如SERDES总线端口,线缆140可以是电缆、光纤、双绞线等可以传输数据的线缆,本申请不对线缆140进行具体限定。其中,第一节点110上高速互联端口130的数量可以是一个或者多个,且第一节点110上的第一高速互联端口与第二节点上的第二高速互联端口呈一一对应的关系,图1以3个端口为例进行举例说明,本申请不对此进行限定。The high-speed interconnection port 130 may be a high-speed serial bus port, such as a SERDES bus port, and the cable 140 may be an electrical cable, optical fiber, twisted pair, or any other cable capable of transmitting data; this application does not specifically limit the cable 140. The number of high-speed interconnection ports 130 on the first node 110 may be one or more, and the first high-speed interconnection ports on the first node 110 are in one-to-one correspondence with the second high-speed interconnection ports on the second node. FIG. 1 takes three ports as an example for illustration, which is not limited in this application.
需要说明的,第一节点110的第一高速互联端口与第二节点120的处理器的第二高速互联端口通过线缆相连,第二节点120的处理器可以是一个或者多个,在第二节点120的处理器数量为多个时,第一节点110可以通过不同的高速互联端口与第二节点的不同处理器相连,比如第二节点4包括处理器4、处理器5和处理器6时,第一节点110的高速互联端口1可以与处理器4相连,第一节点110的高速互联端口2可以与处理器5相连,通过不同的高速互联端口读取不同处理器对应的内存中的数据。应理解,第二节点120可以保留至少一个处理器不与第一节点110进行连接,从而确保第二节点120将一些内存提供给第一节点后,不会影响第二节点120处理其他业务。It should be noted that the first high-speed interconnection port of the first node 110 is connected through a cable to the second high-speed interconnection port of a processor of the second node 120, and the second node 120 may have one or more processors. When the second node 120 has multiple processors, the first node 110 may be connected to different processors of the second node through different high-speed interconnection ports. For example, when second node 4 includes processor 4, processor 5, and processor 6, high-speed interconnection port 1 of the first node 110 may be connected to processor 4 and high-speed interconnection port 2 of the first node 110 to processor 5, so that data in the memory corresponding to different processors is read through different high-speed interconnection ports. It should be understood that the second node 120 may keep at least one processor unconnected to the first node 110, so as to ensure that after the second node 120 provides some memory to the first node, the second node 120's processing of other services is not affected.
第一节点110用于处理数据任务,比如前述内容中的大数据或者AI任务。第二节点120用于存储数据,第二节点120可以将内存划分一部分作为第一节点110的拓展内存,供第一节点110使用,第一节点110可通过图1所示的数据访问系统,从第二节点120划分出的拓展内存中读取数据,进行大数据或者AI任务的处理,从而实现第一节点110的内存拓展。The first node 110 is used to process data tasks, such as the big data or AI tasks mentioned above. The second node 120 is used to store data; the second node 120 may set aside a portion of its memory as expanded memory for the first node 110 to use, and the first node 110 may, through the data access system shown in FIG. 1, read data from the expanded memory set aside by the second node 120 and process big data or AI tasks, thereby implementing memory expansion for the first node 110.
在一应用场景下,如图2所示,第一节点110和第二节点120可以部署于同一个机柜中,第一节点110的第一高速互联端口与第二节点120的第二高速互联端口直连。整个机柜内各个服务器之间不需要经过交换机或者网卡即可进行通信,实现第一节点110从第二节点120的内存中读取数据的目的。In one application scenario, as shown in FIG. 2, the first node 110 and the second node 120 may be deployed in the same cabinet, with the first high-speed interconnection port of the first node 110 directly connected to the second high-speed interconnection port of the second node 120. The servers in the cabinet can communicate with each other without going through a switch or a network card, achieving the purpose of the first node 110 reading data from the memory of the second node 120.
其中,第一节点110可以是AI服务器,第二节点120可以是2P服务器,2P服务器指的是有两个CPU的服务器,每个2P服务器拥有16个通道(channel),每个channel可以挂载2个64GB的内存条,也就是说,每个2P服务器可以为第一节点110拓展64GB×2×16=2TB内存,因此10台左右的2P服务器就可以满足10TB~20TB的内存拓展需求。并且,由于AI服务器和2P服务器在机柜中的高度,一个AI服务器与8~10台2P服务器恰好可放入一个机柜内部,使得一个机柜组成的机架式服务器可以拥有10TB~20TB的内存,符合大部分应用场景下AI服务器进行数据处理时的内存需求。应理解,图2用于举例说明,本申请不对此进行限定。The first node 110 may be an AI server and the second node 120 a 2P server, where a 2P server is a server with two CPUs. Each 2P server has 16 channels, and each channel can hold two 64GB memory modules; that is, each 2P server can expand 64GB×2×16=2TB of memory for the first node 110, so about ten 2P servers can meet a 10TB~20TB memory expansion requirement. Moreover, given the heights of the AI server and the 2P servers in a cabinet, one AI server and 8~10 2P servers fit exactly inside one cabinet, so that the rack of servers in a single cabinet can have 10TB~20TB of memory, meeting the memory requirements of an AI server for data processing in most application scenarios. It should be understood that FIG. 2 is for illustration, and this application does not limit it.
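The capacity arithmetic in this scenario can be restated as a quick check (figures taken directly from the text):

```python
# Restating the capacity math from the example: each 2P server has 16 memory
# channels, each holding two 64 GB modules; about ten such servers reach the
# 10-20 TB range described in the text.

DIMM_GB = 64            # one memory module
DIMMS_PER_CHANNEL = 2
CHANNELS = 16

per_server_gb = DIMM_GB * DIMMS_PER_CHANNEL * CHANNELS   # 2048 GB = 2 TB
ten_servers_tb = 10 * per_server_gb // 1024              # 20 TB for 10 servers
```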
具体实现中,第一节点110可以生成数据访问请求,其中,该数据访问请求用于请求第二节点内存中的数据,第一节点110可以通过线缆140发送该数据访问请求至第二节点120,第二节点可以将数据访问请求中的第一目的地址转换为目的地址对应的本地物理地址,并根据本地物理地址访问该第二节点的内存中的数据。In a specific implementation, the first node 110 may generate a data access request, where the data access request is used to request data in the second node's memory; the first node 110 may send the data access request to the second node 120 through the cable 140; and the second node may convert the first destination address in the data access request into the local physical address corresponding to the destination address and access the data in the second node's memory according to that local physical address.
示例性地,如图3所示,图3是本申请提供的另一种数据访问系统的结构示意图,其中,第一节点110可包括计算芯片111、互联芯片112、计算芯片的端口113和总线114,其中,计算芯片111的端口113与互联芯片112通过总线114进行通信,图3为了使得图中连接关系更清楚,没有将互联芯片112上的端口绘出,但是具体实现中,互联芯片112上也可拥有对应的端口,应理解,图1仅为一种示例性的划分方式,各个模块单元之间可以合并或者拆分为更多或更少的模块单元,本申请不作具体限定。For example, as shown in FIG. 3, FIG. 3 is a schematic structural diagram of another data access system provided by this application, in which the first node 110 may include a computing chip 111, an interconnection chip 112, a computing chip port 113, and a bus 114, where the port 113 of the computing chip 111 communicates with the interconnection chip 112 through the bus 114. To make the connection relationships clearer, FIG. 3 does not draw the ports on the interconnection chip 112, but in a specific implementation the interconnection chip 112 may also have corresponding ports. It should be understood that FIG. 1 is only an exemplary division; the module units may be merged or split into more or fewer module units, which is not specifically limited in this application.
计算芯片的端口113可以是高速串行总线端口,比如SERDES总线端口,总线114可以是外设部件互联标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等,计算芯片111、互联芯片112、计算芯片的端口113和总线114可以在加工时统一印制在电路板上。具体实现中,计算芯片的端口113的数量可以是一个或者多个,图3以两个端口(端口0和端口1)为例进行举例说明,本申请不对此进行限定。The computing chip port 113 may be a high-speed serial bus port, such as a SERDES bus port, and the bus 114 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The computing chip 111, interconnection chip 112, computing chip port 113, and bus 114 may be printed together on the circuit board during manufacturing. In a specific implementation, the number of computing chip ports 113 may be one or more; FIG. 3 takes two ports (port 0 and port 1) as an example for illustration, which is not limited in this application.
计算芯片111可以由至少一个通用处理器构成,例如CPU、NPU或者CPU和硬件芯片的组合。上述硬件芯片可以是专用集成电路(Application-Specific Integrated Circuit,ASIC)、可编程逻辑器件(Programmable Logic Device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(Complex Programmable Logic Device,CPLD)、现场可编程逻辑门阵列(Field-Programmable Gate Array,FPGA)、通用阵列逻辑(Generic Array Logic,GAL)或其任意组合。计算芯片111执行各种类型的数字存储指令,它能使第一节点110提供较宽的多种服务。其中,第一节点110中计算芯片111的数量可以是一个或者多个,图3以一个计算芯片111为例进行说明,本申请不作具体限定。The computing chip 111 may be composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip. The aforementioned hardware chip may be an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a programmable logic device (Programmable Logic Device, PLD) or a combination thereof. The above-mentioned PLD can be a complex programmable logic device (Complex Programmable Logic Device, CPLD), a field programmable logic gate array (Field-Programmable Gate Array, FPGA), a general array logic (Generic Array Logic, GAL) or any combination thereof. The computing chip 111 executes various types of digital storage instructions, which enable the first node 110 to provide a wide variety of services. Wherein, the number of computing chips 111 in the first node 110 may be one or more. FIG. 3 uses one computing chip 111 as an example for illustration, which is not specifically limited in this application.
互联芯片112可以是ASIC、PLD或其组合,上述PLD可以是CPLD、FPGA、GAL或其任意组合,本申请不作具体限定。其中,第一节点110中互联芯片112的数量可以是多个,图3以2个互联芯片112为例(互联芯片1和互联芯片2)进行说明,本申请不作具体限定。The interconnect chip 112 may be an ASIC, a PLD or a combination thereof, and the PLD may be a CPLD, FPGA, GAL or any combination thereof, which is not specifically limited in this application. Wherein, the number of interconnection chips 112 in the first node 110 may be multiple. FIG. 3 takes two interconnection chips 112 as an example (interconnection chip 1 and interconnection chip 2) for illustration, which is not specifically limited in this application.
The interconnection chip 112 is provided with a high-speed interconnection port 130, through which the interconnection chip 112 can perform data communication with the second node 120. The first high-speed interconnection port of the interconnection chip 112 is connected to the second high-speed interconnection port on the second node 120 through a cable 140; for descriptions of the high-speed interconnection port 130 and the cable 140, refer to the foregoing embodiments of FIG. 1 and FIG. 2, which are not repeated here. It should be noted that the number of first high-speed interconnection ports on each interconnection chip 112 may be one or more, and each first high-speed interconnection port corresponds one-to-one to a second high-speed interconnection port on a second node. FIG. 3 uses two first high-speed interconnection ports per interconnection chip 112 as an example, that is, interconnection chip 1 includes high-speed interconnection port 2 and high-speed interconnection port 3, and interconnection chip 2 includes high-speed interconnection port 4 and high-speed interconnection port 5, which is not specifically limited in this application.
For the description of the second node 120, refer to the embodiments of FIG. 1 and FIG. 2, which are not repeated here. The number of second nodes 120 may be one or more; FIG. 3 uses four second nodes (second nodes 1 to 4) as an example for illustration, which is not specifically limited in this application.
In this embodiment of the application, the computing chip 111 is configured to generate the foregoing data access request and send it to the interconnection chip 112; in a specific implementation, the computing chip 111 may send the data access request to the interconnection chip 112 through the port 113 of the computing chip. The interconnection chip 112 is configured to send the data access request to the second node 120 through the cable 140; in a specific implementation, the interconnection chip 112 sends the data access request to the second high-speed interconnection port of the second node 120 through the first high-speed interconnection port 130. It can be understood that deploying the interconnection chip 112 in the first node 110 enables the first node 110 to communicate with more second nodes 120: the more interconnection chips 112 there are, the more high-speed interconnection ports 130 can be deployed in the first node 110, and therefore the more second nodes 120 can be connected to the first node 110. This expands the memory expansion capability of the first node 110 and makes the first node 110 applicable to more application scenarios.
In this embodiment of the application, the data communication among the computing chip 111, the interconnection chip 112, and the second node 120 can implement an addressing function through address decoders. The address decoders in the computing chip 111, the interconnection chip 112, and the second node 120 are described in detail below with reference to FIG. 3.
In an embodiment, as shown in FIG. 3, a first address decoder 210 is deployed in the computing chip 111. The computing chip 111 is specifically configured to generate a data access request, determine a first port according to the first destination address in the data access request and the first address decoder 210, and send the data access request to the interconnection chip 112 through the first port, where the first address decoder 210 records the correspondence between destination addresses and the ports of the computing chip.
In an embodiment, as shown in FIG. 3, a second address decoder 220 is deployed in the interconnection chip 112. The interconnection chip 112 is specifically configured to determine a first high-speed interconnection port according to the first destination address and the second address decoder 220, and send the data access request to the second node 120 through the first high-speed interconnection port, where the second address decoder 220 records the correspondence between destination addresses and high-speed interconnection ports.
In an embodiment, as shown in FIG. 3, a third address decoder 230 is deployed in the second node 120. The second node 120 is specifically configured to determine, according to the first destination address and the third address decoder, the local physical address corresponding to the first destination address, where the third address decoder 230 records the correspondence between destination addresses and local physical addresses.
In an embodiment, as shown in FIG. 3, the data access system may further include a configuration node 150, which may configure the first address decoder 210, the second address decoder 220, and the third address decoder 230. Specifically, the configuration node 150 obtains at least one local physical address of the memory of the second node from the second node 120, determines at least one corresponding destination address according to the at least one local physical address, and configures the third address decoder accordingly; the configuration node further configures the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; and the configuration node further configures the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
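The configuration flow above can be sketched as follows. This is an illustrative model only, not the patent's implementation: it assumes a flat 64-bit destination-address space carved into consecutive segments, and the field names, the `GLOBAL_BASE` value, and the `configure` helper are all hypothetical.

```python
# Hypothetical sketch of the configuration node's work: assign each second
# node's exported memory a segment of a global destination-address space,
# then derive entries for the three address decoders described above.

GLOBAL_BASE = 0x8000_0000_0000  # assumed start of the expanded-memory window

def configure(exports):
    """exports: list of dicts with keys
       'local_base' (physical base address on the second node),
       'size'       (bytes of expanded memory exported),
       'hs_port'    (high-speed interconnection port toward that node),
       'chip_port'  (computing-chip port toward the interconnection chip)."""
    first_dec, second_dec, third_dec = [], [], []
    dest = GLOBAL_BASE
    for e in exports:
        seg = {"base": dest, "length": e["size"]}
        # Third decoder (in the second node): destination segment -> local physical base.
        third_dec.append({**seg, "local_base": e["local_base"]})
        # Second decoder (in the interconnection chip): destination segment -> high-speed port.
        second_dec.append({**seg, "hs_port": e["hs_port"]})
        # First decoder (in the computing chip): destination segment -> chip port.
        first_dec.append({**seg, "chip_port": e["chip_port"]})
        dest += e["size"]  # next exported memory gets the following segment
    return first_dec, second_dec, third_dec
```

Under this sketch, all three decoders agree on the segment boundaries, which is what lets a request fall through the chain of lookups consistently.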
It can be understood that the first, second, and third address decoders configured by the configuration node 150 ensure that a data access request generated by the computing chip is routed and addressed through the address decoders and delivered to the CPU of the second node corresponding to its destination address for memory reads and writes. This avoids the waiting time for network card queue preparation and improves the efficiency with which the first node 110 reads and writes the expanded memory: the latency can even reach the microsecond level (Ethernet latency is at the millisecond level), and the bandwidth can reach 400 GB, offering higher bandwidth and lower latency than an RDMA network card whose bandwidth is only 100 GB.
In an embodiment, when the configuration node 150 obtains at least one local physical address of the memory of the second node from the second node 120, it may determine, according to the size of the memory of the second node 120 and in combination with service requirements, the local physical addresses of the expanded memory that the second node 120 sets aside for use by the first node 110.
Optionally, the expanded memory used by the first node 110 may be part of the memory of the second node 120. This part of the expanded memory may be processed through a memory isolation technology so that the second node 120 itself cannot access it, improving the security of the data stored in the expanded memory.
Optionally, the correspondence recorded by the third address decoder 230 may be: local physical address = destination address - base address, where the base address is the starting address of an address segment, also called the first address or segment address; destination addresses belonging to the same address segment share the same base address.
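The rule "local physical address = destination address - base address" can be illustrated with a minimal sketch; the segment table below is hypothetical, not taken from the patent:

```python
# Minimal sketch of the third decoder's translation rule, assuming the
# decoder stores one (base, length) entry per configured address segment.
SEGMENTS = [
    {"base": 0x8000_0000_0000, "length": 1 << 30},  # hypothetical 1 GB segment
]

def translate(dest_addr):
    for seg in SEGMENTS:
        if seg["base"] <= dest_addr < seg["base"] + seg["length"]:
            # local physical address = destination address - base address
            return dest_addr - seg["base"]
    raise ValueError("destination address not in any configured segment")
```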
In an embodiment, when matching the first destination address against the first address decoder 210 and the second address decoder 220, either the complete first destination address or only part of it may be matched against the addresses in the decoder, thereby improving matching efficiency and hence the efficiency of data access.
Optionally, the first port may be determined according to the base address and length of the first destination address in the data access request. Specifically, the computing chip 111 is configured to match the base address and length of each destination address recorded in the first address decoder 210 against the base address and length of the first destination address, and determine the first port corresponding to the matched destination address. Similarly, the interconnection chip 112 is configured to match the base address and length of each destination address recorded in the second address decoder 220 against the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
Optionally, the first port may be determined according to the high-order bits of the first destination address. The computing chip 111 is specifically configured to match the high-order bits of each destination address recorded in the first address decoder against the high-order bits of the first destination address and determine the first port corresponding to the matched destination address, where the number of high-order bits is determined according to the memory size of the second node. Similarly, the interconnection chip 112 is specifically configured to match the high-order bits of each destination address recorded in the second address decoder against the high-order bits of the first destination address and determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
For example, assume that the total length of a destination address is 64 bits. If the memory of the second node 120 corresponding to one high-speed interconnection port is 1 TB, then among the destination addresses of that 1 TB of memory only the low-order 40 bits differ, so the number of high-order bits may be 64 - 40 = 24 bits. Simply put, destination addresses located in the same memory share the same first 24 bits and differ only in the last 40 bits. Therefore, the number of high-order bits can be determined according to the size of the expanded memory of the second node 120 connected to the high-speed interconnection port.
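The relationship between memory size and the number of high-order bits can be sketched as below, assuming byte addressing, power-of-two memory sizes, and 64-bit destination addresses; the helper names are illustrative only:

```python
def high_bits(size_bytes, addr_width=64):
    """Number of high-order address bits shared by all destination
    addresses inside one memory of `size_bytes` bytes."""
    low = (size_bytes - 1).bit_length()  # bits that vary inside the memory
    return addr_width - low

def high_part(dest_addr, size_bytes, addr_width=64):
    """High-order part of a destination address, used for port matching."""
    return dest_addr >> (addr_width - high_bits(size_bytes, addr_width))
```

Two destination addresses that fall inside the same expanded memory thus yield the same high-order part, so matching only that part is sufficient to select the port.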
It should be understood that, because the expanded memory provided by the second node 120 corresponds to multiple physical addresses, some of the destination addresses recorded in the first and second decoders may map to the same port. Destination addresses corresponding to the same port are located in the same memory; such addresses share the same base address and length, or share the same high-order bits. Therefore, the port corresponding to the first destination address can be determined by matching the base address and length, or by matching the high-order bits.
Still taking the data access system shown in FIG. 3 as an example, assume that the expanded memory provided by second node 4 corresponds to multiple physical addresses and that the third address decoder records destination addresses 1 to 10. The second address decoder may then record that destination addresses 1 to 10 correspond to high-speed interconnection port 5. If the first destination address is any one of destination addresses 1 to 10, its corresponding high-speed interconnection port is high-speed interconnection port 5. Because these destination addresses corresponding to the same high-speed interconnection port are all addresses of the expanded memory of second node 4, they share the same base address and length, or the same high-order bits. When determining the high-speed interconnection port corresponding to the first destination address, a partial address of the first destination address can therefore be matched against the partial addresses of the destination addresses in the second address decoder, thereby improving matching efficiency.
For example, FIG. 4 is an example diagram of an address decoder 210. As shown in FIG. 4, the first address decoder 210 may include multiple destination addresses, and destination addresses with the same base address and length correspond to the same port of the computing chip 111. Assuming that the base address and length of the first destination address are as shown in FIG. 4, the base address and length of the first destination address can be matched against the base address and length of each destination address in the first address decoder 210, determining that the first port corresponding to the matched destination address is port 2; the data access request can then be transmitted to the interconnection chip 112 through port 2. Similarly, the high-speed interconnection port for transmitting the data access request is determined according to the base address and length of the destination addresses recorded in the second address decoder 220; the description is not repeated here. It should be understood that FIG. 4 matches the first destination address based on base address and length; the matching method based on high-order bits described above is similar and is not illustrated separately here.
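The lookup illustrated by FIG. 4 can be sketched as a simple table scan. The table contents, port numbers, and segment sizes below are hypothetical, chosen only to show the base-address/length matching:

```python
# Hypothetical first-decoder table: each entry maps a (base, length)
# destination segment to one port of the computing chip.
FIRST_DECODER = [
    {"base": 0x8000_0000_0000, "length": 1 << 30, "port": 2},
    {"base": 0x8000_4000_0000, "length": 1 << 30, "port": 3},
]

def lookup_port(dest_base, dest_length):
    """Return the chip port whose recorded base address and length
    match those of the first destination address, or None."""
    for entry in FIRST_DECODER:
        if entry["base"] == dest_base and entry["length"] == dest_length:
            return entry["port"]
    return None
```

The second decoder in the interconnection chip would perform the same scan over its own table, returning a high-speed interconnection port instead of a chip port.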
It should be noted that the data access system shown in FIG. 1 can also implement routing and addressing of data access requests through the foregoing address decoders. Specifically, the first node 110 may be equipped with the second address decoder 220, and the second node 120 with the third address decoder 230. For a data access request generated by the first node 110, the first high-speed interconnection port corresponding to the first destination address can be determined according to the correspondence between high-speed interconnection ports and destination addresses recorded in the second address decoder 220, and the data access request is then sent to the second node 120 through the first high-speed interconnection port; details are not elaborated here. It should be understood that in the data access system shown in FIG. 1, the high-speed interconnection port may be deployed on the processor in the first node 110; simply put, the processor of the first node 110 and the processor of the second node 120 are directly connected through a cable.
It should be noted that if the data access request is for reading data in the memory of the second node 120, then after processing the data access request, the second node 120 can return the read data to the first node 110 along the original path, according to the source address in the data access request and in combination with the first, second, and third address decoders; details are not repeated here.
In summary, in the data access system provided by this application, the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node can implement the addressing function with the address decoders, thereby sending a data access request to the memory of the second node corresponding to the first destination address and realizing memory expansion for the first node. This approach requires no additional network cards or routers and no waiting for the preparation time of a network card queue unit, so the first node accesses the memory of the second node with high efficiency and low latency. Meanwhile, adding high-speed interconnection ports increases the number of second nodes and thus the expanded memory capacity of the first node, giving the first node a large scalable memory capacity capable of handling services in more application scenarios.
The data access method provided by this application is explained below with reference to FIG. 5. FIG. 5 shows a data access method provided by this application, which can be applied to the data access systems shown in FIG. 1 to FIG. 4 and may include the following steps:
Step S510: The first node generates a data access request, where the data access request is used to request data in the memory of the second node. For the description of the first node, refer to the embodiments of FIG. 1 to FIG. 4, which are not repeated here.
Step S520: The first node sends the data access request to the second node through a cable. It should be understood that the first high-speed interconnection port of the first node is connected to the second high-speed interconnection port of the second node through a cable; for descriptions of the high-speed interconnection port and the cable, refer to the embodiments of FIG. 1 to FIG. 4, which are not repeated here.
In an embodiment, the first node may include a second address decoder, which records the correspondence between destination addresses and high-speed interconnection ports. The first node may determine, according to the first destination address in the data access request and the second address decoder, the first high-speed interconnection port corresponding to the first destination address, and then send the data access request to the second node through the first high-speed interconnection port. For the specific description of the second address decoder, refer to the embodiments of FIG. 1 to FIG. 4, which are not repeated here.
Step S530: The second node converts the first destination address in the data access request into the local physical address corresponding to the first destination address, and accesses the data in the memory of the second node according to the local physical address.
In an embodiment, the second node may include a third address decoder, which records the correspondence between destination addresses and local physical addresses. The second node may determine, according to the first destination address in the data access request and the third address decoder, the local physical address corresponding to the first destination address, and then access the data in the memory of the second node according to the local physical address. For the specific description of the third address decoder, refer to the embodiments of FIG. 1 to FIG. 4, which are not repeated here.
In an embodiment, the first node may include a computing chip and an interconnection chip, and the computing chip is connected to the interconnection chip through a port; the specific connection may be the bus described above. For descriptions of the first node, the second node, the computing chip, the interconnection chip, the port, and the bus, refer to the embodiments of FIG. 1 to FIG. 4, which are not repeated here.
In a specific implementation, the computing chip may perform step S510 to generate the data access request, and send the data access request to the interconnection chip through a port of the computing chip. The interconnection chip sends the data access request to the second node through the first high-speed interconnection port.
It can be understood that deploying interconnection chips in the first node enables the first node to communicate with more second nodes: the more interconnection chips there are, the more high-speed interconnection ports can be deployed in the first node, and the more second nodes can be connected to the first node. This expands the memory expansion capability of the first node and makes the first node applicable to more application scenarios.
In an embodiment, a first address decoder is deployed in the computing chip. After generating the data access request, the computing chip determines the first port according to the first destination address in the data access request and the first address decoder, and sends the data access request to the interconnection chip through the first port, where the first address decoder records the correspondence between destination addresses and the ports of the computing chip.
In an embodiment, a second address decoder is deployed in the interconnection chip. The interconnection chip may determine the first high-speed interconnection port according to the first destination address and the second address decoder, and send the data access request to the second node through the first high-speed interconnection port, where the second address decoder records the correspondence between destination addresses and high-speed interconnection ports.
In an embodiment, the data access system may further include a configuration node, which may configure the first address decoder, the second address decoder, and the third address decoder before the first node generates a data access request. Specifically, the configuration node obtains at least one local physical address of the memory of the second node from the second node, determines at least one corresponding destination address according to the at least one local physical address, and configures the third address decoder accordingly; it configures the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; and it configures the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
It can be understood that the first, second, and third address decoders configured by the configuration node ensure that a data access request generated by the computing chip is routed and addressed through the address decoders and delivered to the CPU of the second node corresponding to its destination address for memory reads and writes. This avoids the waiting time for network card queue preparation and improves the efficiency with which the first node reads and writes the expanded memory: the latency can even reach the microsecond level (Ethernet latency is at the millisecond level), and the bandwidth can reach 400 GB, offering higher bandwidth and lower latency than an RDMA network card whose bandwidth is only 100 GB.
In an embodiment, when the configuration node obtains at least one local physical address of the memory of the second node from the second node 120, it may determine, according to the size of the memory of the second node and in combination with service requirements, the local physical addresses of the expanded memory that the second node sets aside for use by the first node 110.
Optionally, the expanded memory used by the first node may be part of the memory of the second node. This part of the expanded memory may be processed through a memory isolation technology so that the second node cannot access it, improving the security of the data stored in the expanded memory.
Optionally, the correspondence recorded by the third address decoder may be: local physical address = destination address - base address, where the base address is the starting address of an address segment, also called the first address or segment address; destination addresses belonging to the same address segment share the same base address.
In an embodiment, when matching the first destination address against the first address decoder and the second address decoder, either the complete first destination address or only part of it may be matched against the addresses in the decoder, thereby improving matching efficiency and hence the efficiency of data access.
Optionally, the first port may be determined according to the base address and length of the first destination address in the data access request. Specifically, the computing chip may match the base address and length of each destination address recorded in the first address decoder against the base address and length of the first destination address, and determine the first port corresponding to the matched destination address. Similarly, the interconnection chip may match the base address and length of each destination address recorded in the second address decoder against the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
Optionally, the first port may be determined according to the high-order bits of the first destination address. The computing chip may match the high-order bits of each destination address recorded in the first address decoder against the high-order bits of the first destination address and determine the first port corresponding to the matched destination address, where the number of high-order bits is determined according to the memory size of the second node. Similarly, the interconnection chip may match the high-order bits of each destination address recorded in the second address decoder against the high-order bits of the first destination address and determine the first high-speed interconnection port corresponding to the matched destination address. Details are not repeated here.
应理解,由于第二节点提供的拓展内存对应的物理地址数量为多个,因此第一、第二译码器中记录的部分目的地址对应的端口可能会是同一个,对应相同端口的目的地址位于同一个内存中,这些对应相同端口的目的地址,其基地址和长度是相同的,或者,其高位地址是相同的,因此可以通过匹配基地址和长度,或者匹配高位地址来确定第一目的地址对应的端口,从而提高匹配效率。It should be understood that since the expanded memory provided by the second node corresponds to multiple physical addresses, the ports corresponding to part of the destination addresses recorded in the first and second decoders may be the same, corresponding to the destination addresses of the same port Located in the same memory, these destination addresses corresponding to the same port have the same base address and length, or the high address is the same, so the first purpose can be determined by matching the base address and length, or matching the high address The port corresponding to the address, thus improving the matching efficiency.
It should be noted that, for a detailed description of matching by base address and length, reference may be made to the example in the embodiment of FIG. 4; details are not repeated here.

It should be noted that, if the data access request reads data from the memory of the second node, then after processing the request the second node may return the read data to the first node along the original route according to the source address in the data access request, with reference to the first, second, and third address decoders; details are not repeated here.
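The return path described above can be sketched as a symmetric table lookup: the read request is routed toward the destination address, and the completion carrying the data is routed back using the request's source address through the same kind of decoder tables. All table contents, port names, and addresses below are invented for illustration and are not taken from the application.

```python
def lookup(table, addr):
    """Return the egress port whose address window contains addr."""
    for base, length, port in table:
        if base <= addr < base + length:
            return port
    return None


def route(hop_tables, addr):
    """Resolve the egress port at each hop for the given address."""
    return [lookup(table, addr) for table in hop_tables]


# Forward direction: the request is routed toward the destination address
# (first decoder in the computing chip, then second decoder in the
# interconnection chip).
fwd_tables = [
    [(0x1_0000_0000, 1 << 30, "compute_chip_port0")],
    [(0x1_0000_0000, 1 << 30, "hs_port0")],
]
# Return direction: the read data is routed toward the source address,
# traversing equivalent tables in reverse order.
rev_tables = [
    [(0x0_4000_0000, 1 << 30, "hs_port_back")],
    [(0x0_4000_0000, 1 << 30, "compute_chip_port_back")],
]

assert route(fwd_tables, 0x1_0000_2000) == ["compute_chip_port0", "hs_port0"]
assert route(rev_tables, 0x0_4000_2000) == ["hs_port_back", "compute_chip_port_back"]
```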
In summary, in the data access method provided in this application, the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node implements addressing with the aid of the address decoders, so that the data access request is sent to the memory of the second node corresponding to the first destination address, thereby expanding the memory of the first node. This approach requires no additional network adapter or router and avoids the preparation time of a network adapter queue unit, so the first node accesses the memory of the second node with high efficiency and low latency. In addition, the number of second nodes can be increased by adding high-speed interconnection ports, which raises the expandable memory capacity of the first node, so that the first node can support services in more application scenarios.
FIG. 6 is a schematic structural diagram of a computing node 600 provided in this application. The computing node 600 may be the first node 110 described above, and may include a computing chip 111 and an interconnection chip 112. The computing chip 111 may include a generating unit 1111, a first matching unit 1112, and a second sending unit 1113; the interconnection chip 112 may include a first sending unit 1121 and a second matching unit 1122.

The generating unit 1111 is configured to generate a data access request, where the data access request is used to request data in the memory of the second node; specifically, step S510 in the embodiment of FIG. 5 may be performed.

The first sending unit 1121 is configured to send the data access request to the second node through the cable, so that the second node converts the first destination address in the data access request into a local physical address corresponding to the first destination address and accesses the data in the memory of the second node according to the local physical address; specifically, step S520 in the embodiment of FIG. 5 may be performed.
In an embodiment, the first high-speed interconnection port of the interconnection chip 112 is connected to the second high-speed interconnection port of the processor in the second node through a cable. The generating unit 1111 is configured to generate the data access request through the computing chip 111; the second sending unit 1113 is configured to send the data access request to the interconnection chip 112 through the computing chip 111; and the first sending unit 1121 is configured to send the data access request to the second node through the interconnection chip 112 over the cable.

In an embodiment, the computing chip 111 is connected to the interconnection chip 112 through a port, and the computing chip 111 includes a first address decoder. The first matching unit 1112 is configured to determine, through the computing chip 111, the first port according to the first destination address in the data access request and the first address decoder, where the first address decoder is used to record the correspondence between destination addresses and the ports of the computing chip; the second sending unit 1113 is configured to send, through the computing chip, the data access request to the interconnection chip 112 through the first port.

In an embodiment, the interconnection chip 112 includes a second address decoder. The second matching unit 1122 is configured to determine, through the interconnection chip 112, the first high-speed interconnection port according to the first destination address and the second address decoder, where the second address decoder is used to record the correspondence between destination addresses and high-speed interconnection ports; the first sending unit 1121 is configured to send, through the interconnection chip 112, the data access request to the second node through the first high-speed interconnection port.

In an embodiment, the first matching unit 1112 is configured to match, through the computing chip 111, the base address and length of each destination address recorded in the first address decoder against the base address and length of the first destination address, and determine the first port corresponding to the matched destination address; the second matching unit 1122 is configured to match, through the interconnection chip 112, the base address and length of each destination address recorded in the second address decoder against the base address and length of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address.

In an embodiment, the first matching unit 1112 is configured to match, through the computing chip 111, the high-order address of each destination address recorded in the first address decoder against the high-order address of the first destination address, and determine the first port corresponding to the matched destination address, where the number of bits of the high-order address is determined according to the memory size of the second node; the second matching unit 1122 is configured to match, through the interconnection chip 112, the high-order address of each destination address recorded in the second address decoder against the high-order address of the first destination address, and determine the first high-speed interconnection port corresponding to the matched destination address.

In an embodiment, the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
The first node 110 may be a physical server, for example an X86 server or an ARM server; it may also be a virtual machine (VM) implemented on a general-purpose physical server using network functions virtualization (NFV) technology, where a virtual machine is a complete, software-simulated computer system with full hardware functionality that runs in a fully isolated environment, such as a virtual device in cloud computing, which is not specifically limited in this application; it may also be a server cluster composed of multiple physical servers or virtual machines.
FIG. 7 is a schematic structural diagram of a storage node 700 provided in this application. The storage node 700 may be the second node 120 in the embodiments of FIG. 1 to FIG. 6, and may include a receiving unit 121 and a conversion unit 122.

The receiving unit 121 is configured to receive a data access request, where the data access request is generated by the first node and sent by the first node through a cable.

The conversion unit 122 is configured to convert the first destination address in the data access request into a local physical address corresponding to the first destination address, and access the data in the memory of the second node according to the local physical address.

In an embodiment, the second node 120 includes a third address decoder. The conversion unit 122 is configured to determine, according to the first destination address and the third address decoder, the local physical address corresponding to the first destination address, where the third address decoder is used to record the correspondence between destination addresses and local physical addresses.
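The translation performed by the third address decoder described above can be illustrated with a small sketch. The window layout, addresses, and function names below are assumptions for illustration, not the patented implementation: each decoder entry maps a window of destination addresses onto a window of the second node's local physical memory, preserving the offset within the window.

```python
def to_local_physical(decoder, dest_addr):
    """Translate a destination address into a local physical address.

    Each decoder entry is (dest_base, length, local_base): destination
    addresses in [dest_base, dest_base + length) map onto local physical
    addresses starting at local_base, keeping the in-window offset.
    """
    for dest_base, length, local_base in decoder:
        if dest_base <= dest_addr < dest_base + length:
            return local_base + (dest_addr - dest_base)
    raise ValueError("destination address not mapped by this node")


# Assumed example mapping: a 256 MiB window of the first node's address
# space backed by local physical memory starting at 0x8000_0000.
third_decoder = [(0x2_0000_0000, 256 << 20, 0x8000_0000)]

assert to_local_physical(third_decoder, 0x2_0000_1000) == 0x8000_1000
```

The data is then read or written at the resulting local physical address, exactly as for a local memory access on the second node.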
In an embodiment, the first high-speed interconnection port of the first node is connected to the second high-speed interconnection port of the processor in the second node through a cable, and the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports.

The second node 120 may be a physical server, for example an X86 server or an ARM server; it may also be a virtual machine (VM) implemented on a general-purpose physical server using network functions virtualization (NFV) technology, where a virtual machine is a complete, software-simulated computer system with full hardware functionality that runs in a fully isolated environment, such as a virtual device in cloud computing, which is not specifically limited in this application; it may also be a server cluster composed of multiple physical servers or virtual machines.
In summary, in the first node and the second node provided in this application, the high-speed interconnection port of the first node is connected to the high-speed interconnection port of the second node through a cable, and the first node implements addressing with the aid of the address decoders, so that the data access request is sent to the memory of the second node corresponding to the first destination address, thereby expanding the memory of the first node. This approach requires no additional network adapter or router and avoids the preparation time of a network adapter queue unit, so the first node accesses the memory of the second node with high efficiency and low latency. In addition, the number of second nodes can be increased by adding high-speed interconnection ports, which raises the expandable memory capacity of the first node, so that the first node can support services in more application scenarios.
FIG. 8 is a schematic structural diagram of a computing device provided in this application. The computing device 800 may be the first node 110 or the second node 120 in the embodiments of FIG. 1 to FIG. 7. The computing device may be a physical server, a virtual machine, or a server cluster, or may be a chip (system) or another component that can be disposed in a physical server or a virtual machine, which is not limited in this application.

Further, the computing device 800 includes a processor 801, a memory 802, and a communication interface 803, where the processor 801, the memory 802, and the communication interface 803 communicate through a bus 805, or by other means such as wireless transmission.

The processor 801 may be composed of at least one general-purpose processor, for example a CPU, an NPU, or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 801 executes various types of digitally stored instructions, for example software or firmware programs stored in the memory 802, which enable the computing device 800 to provide a wide variety of services.

In specific implementation, the processor 801 may be the computing chip or the interconnection chip in the first node described above, or may be the processor chip in the second node, which is not specifically limited in this application. In specific implementation, as an embodiment, the processor 801 may include one or more CPUs, for example CPU0 and CPU1 shown in FIG. 8.

In specific implementation, as an embodiment, the computing device 800 may also include multiple processors, for example the processor 801 and the processor 804 shown in FIG. 8. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

The memory 802 is configured to store program code, and the processor 801 controls execution of the program code to perform the processing steps in any one of the foregoing embodiments of FIG. 1 to FIG. 7. The program code may include one or more software modules. When the computing device is the first node 110, the one or more software modules may be the generating unit 1111, the first matching unit 1112, the second sending unit 1113, the second matching unit 1122, and the first sending unit 1121 in the embodiment of FIG. 6; for the specific implementation, reference may be made to the method embodiment of FIG. 6, and details are not repeated here. When the computing device is the second node 120, the one or more software modules may be the receiving unit 121 and the conversion unit 122 in the embodiment of FIG. 7; for the specific implementation, reference may be made to the method embodiment of FIG. 7, and details are not repeated here.

The memory 802 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. The memory 802 may also include a non-volatile random access memory. For example, the memory 802 may also store information about device types.
The memory 802 may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM). The memory may alternatively be a hard disk, a USB flash drive, a flash memory, a secure digital memory card (SD card), a memory stick, or the like; the hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, or the like, which is not specifically limited in this application.
The communication interface 803 may be a wired interface (for example an Ethernet interface), an internal interface (for example a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (for example a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or modules. In specific implementation, the communication interface 803 may be used to receive a packet, so that the processor 801 or the processor 804 processes the packet.

The bus 805 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 805 may be divided into an address bus, a data bus, a control bus, and the like.

In addition to a data bus, the bus 805 may also include a power bus, a control bus, a status signal bus, and the like. However, for clarity of illustration, the various buses are all labeled as the bus 805 in the figure.
It should be noted that FIG. 8 is only one possible implementation of the embodiments of this application. In practical applications, the computing device 800 may include more or fewer components, which is not limited here. For content not shown or not described in the embodiments of this application, reference may be made to the related descriptions in the foregoing embodiments of FIG. 1 to FIG. 7, and details are not repeated here.

It should be understood that the computing device 800 shown in FIG. 8 may also be a computer cluster composed of at least one physical server. For details, reference may be made to the descriptions of the specific forms of the data access system in the embodiments of FIG. 1 to FIG. 7; to avoid repetition, details are not repeated here.
An embodiment of this application provides a chip. The chip may be used in a server with an X86-architecture processor (also referred to as an X86 server), a server with an ARM-architecture processor (also referred to as an ARM server), or the like. The chip may include the foregoing devices or logic circuits. When the chip runs on a server, the server is enabled to perform the data access method described in the foregoing method embodiments.

In specific implementation, the chip may be the computing chip or the interconnection chip in the first node described above, or may be the processor chip in the second node.

An embodiment of this application provides a mainboard, also referred to as a printed circuit board (PCB). The mainboard includes a processor, and the processor is configured to execute program code to implement the data access method described in the foregoing method embodiments. Optionally, the mainboard may further include a memory, configured to store the program code for execution by the processor.
An embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions; when the computer instructions are run on a computer, the computer is enabled to perform the data access method described in the foregoing method embodiments.

An embodiment of this application provides a computer program product containing instructions, including a computer program or instructions; when the computer program or instructions are run on a computer, the computer is enabled to perform the data access method described in the foregoing method embodiments.
The foregoing embodiments may be implemented completely or partially by software, hardware, firmware, or any combination thereof. When software is used, the foregoing embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present invention are generated completely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage node such as a server or a data center that includes at least one set of available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A data access system, wherein the data access system comprises a first node and a second node, and the first node is connected to the second node through a cable;

    the first node is configured to generate a data access request, wherein the data access request is used to request data in a memory of the second node;

    the first node is configured to send the data access request to the second node through the cable; and

    the second node is configured to convert a first destination address in the data access request into a local physical address corresponding to the first destination address, and access the data in the memory of the second node according to the local physical address.
2. The system according to claim 1, wherein the first node comprises a computing chip and an interconnection chip, and a first high-speed interconnection port of the interconnection chip is connected to a second high-speed interconnection port of a processor in the second node through the cable;

    the computing chip is configured to generate the data access request and send the data access request to the interconnection chip; and

    the interconnection chip is configured to send the data access request to the second node through the cable.
3. The system according to claim 2, wherein the computing chip is connected to the interconnection chip through a port, and the computing chip comprises a first address decoder; and

    the computing chip is specifically configured to: generate the data access request, determine a first port according to the first destination address in the data access request and the first address decoder, and send the data access request to the interconnection chip through the first port, wherein the first address decoder is used to record a correspondence between destination addresses and ports of the computing chip.
4. The system according to claim 3, wherein the interconnection chip comprises a second address decoder; and

    the interconnection chip is specifically configured to: determine the first high-speed interconnection port according to the first destination address and the second address decoder, and send the data access request to the second node through the first high-speed interconnection port, wherein the second address decoder is used to record a correspondence between destination addresses and high-speed interconnection ports.
5. The system according to claim 4, wherein the second node comprises a third address decoder; and

    the second node is specifically configured to: determine, according to the first destination address and the third address decoder, the local physical address corresponding to the first destination address, wherein the third address decoder is used to record a correspondence between destination addresses and local physical addresses.
6. The system according to claim 5, wherein the data access system further comprises a configuration node;

    the configuration node is configured to obtain, from the second node, at least one local physical address of the memory of the second node;

    the configuration node is configured to determine at least one corresponding destination address according to the at least one local physical address, and configure the third address decoder;

    the configuration node is further configured to configure the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip; and

    the configuration node is further configured to configure the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
  7. The system according to any one of claims 4 to 6, wherein:
    the computing chip is specifically configured to match the base address and length of each destination address recorded in the first address decoder against the base address and length of the first destination address, and to determine the first port corresponding to the matched destination address;
    the interconnection chip is specifically configured to match the base address and length of each destination address recorded in the second address decoder against the base address and length of the first destination address, and to determine the first high-speed interconnection port corresponding to the matched destination address.
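The base-address/length matching of claim 7 is the classic address-window lookup. A minimal sketch, assuming each decoder entry is a (base, length, port) tuple (a representation chosen for the example, not specified by the patent):

```python
# Minimal sketch of claim-7 matching: a destination address selects the
# decoder entry whose window [base, base + length) contains it, and the
# lookup yields that entry's port. Entry values below are illustrative.

def match_port(entries, dest_addr):
    """Return the port of the first entry whose address window covers dest_addr."""
    for base, length, port in entries:
        if base <= dest_addr < base + length:
            return port
    raise LookupError(f"no decoder entry covers {dest_addr:#x}")
```

The same routine serves both the first address decoder (ports of the computing chip) and the second address decoder (high-speed interconnection ports); only the entry tables differ.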
  8. The system according to any one of claims 4 to 6, wherein:
    the computing chip is specifically configured to match the upper address bits of each destination address recorded in the first address decoder against the upper address bits of the first destination address, and to determine the first port corresponding to the matched destination address, wherein the number of upper address bits is determined according to the memory size of the second node;
    the interconnection chip is specifically configured to match the upper address bits of each destination address recorded in the second address decoder against the upper address bits of the first destination address, and to determine the first high-speed interconnection port corresponding to the matched destination address.
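The upper-address-bit matching of claim 8 can be sketched as follows. The key assumption (made here for illustration, and consistent with the claim's statement that the bit count depends on the second node's memory size) is that a power-of-two memory size reserves the low log2(size) bits as the in-node offset, so the remaining upper bits form the decoder key:

```python
# Sketch of claim-8 matching: strip the in-node offset bits from the
# destination address; the remaining upper bits index the decoder table.
# Assumes mem_size is a power of two; all concrete values are illustrative.

def upper_bits(addr: int, mem_size: int) -> int:
    """Return the upper bits of addr once the in-node offset is removed."""
    offset_bits = mem_size.bit_length() - 1  # log2(mem_size) for powers of two
    return addr >> offset_bits

def match_port_by_upper(decoder: dict, dest_addr: int, mem_size: int):
    """Look up the port keyed by the upper bits of the destination address."""
    return decoder[upper_bits(dest_addr, mem_size)]
```

Compared with the base/length scheme of claim 7, this reduces the match to a single shift and exact compare, at the cost of fixing all windows to the same power-of-two size.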
  9. The system according to any one of claims 2 to 8, wherein the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
  10. A data access method, wherein the method is applied to a data access system, the data access system comprises a first node and a second node, and the first node is connected to the second node through a cable, the method comprising:
    generating, by the first node, a data access request, wherein the data access request is used to request data in the memory of the second node;
    sending, by the first node, the data access request to the second node through the cable;
    converting, by the second node, a first destination address in the data access request into a local physical address corresponding to the first destination address, and accessing the data in the memory of the second node according to the local physical address.
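The receiving side of the method of claim 10 reduces to a translate-then-access step. A minimal sketch, modelling the second node's memory and its third address decoder as dictionaries (an illustrative simplification, not the patent's hardware structure):

```python
# End-to-end sketch of the second node's role in claim 10: translate the
# incoming first destination address to a local physical address via the
# decoder, then access local memory at that address. Illustrative only.

def handle_request(third_decoder: dict, memory: dict, dest_addr: int):
    local_pa = third_decoder[dest_addr]  # destination address -> local physical address
    return memory[local_pa]              # read the data at the local address
```

The point of the scheme is that the first node addresses remote memory directly over the cable, and only this final translation is node-local.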
  11. The method according to claim 10, wherein the first node comprises a computing chip and an interconnection chip, and a first high-speed interconnection port of the interconnection chip is connected through a cable to a second high-speed interconnection port of a processor in the second node;
    the generating, by the first node, of the data access request comprises:
    generating, by the computing chip, the data access request, and sending the data access request to the interconnection chip;
    the sending, by the first node, of the data access request to the second node through the cable comprises:
    sending, by the interconnection chip, the data access request to the second node through the cable.
  12. The method according to claim 11, wherein the computing chip is connected to the interconnection chip through a port, and the computing chip comprises a first address decoder;
    the generating, by the computing chip, of the data access request and the sending of the data access request to the interconnection chip comprise:
    generating, by the computing chip, the data access request, determining the first port according to the first destination address in the data access request and the first address decoder, and sending the data access request to the interconnection chip through the first port, wherein the first address decoder is used to record correspondences between destination addresses and ports of the computing chip.
  13. The method according to claim 12, wherein the interconnection chip comprises a second address decoder;
    the sending, by the interconnection chip, of the data access request to the second node through the cable comprises:
    determining, by the interconnection chip, the first high-speed interconnection port according to the first destination address and the second address decoder, and sending the data access request to the second node through the first high-speed interconnection port, wherein the second address decoder is used to record correspondences between destination addresses and high-speed interconnection ports.
  14. The method according to claim 13, wherein the second node comprises a third address decoder;
    the converting, by the second node, of the first destination address in the data access request into the local physical address corresponding to the first destination address comprises:
    determining, by the second node, the local physical address corresponding to the first destination address according to the first destination address and the third address decoder, wherein the third address decoder is used to record correspondences between destination addresses and local physical addresses.
  15. The method according to claim 14, wherein the data access system further comprises a configuration node, and the method further comprises:
    obtaining, by the configuration node, at least one local physical address of the memory of the second node from the second node;
    determining, by the configuration node, at least one corresponding destination address according to the at least one local physical address, and configuring the third address decoder;
    configuring, by the configuration node, the second address decoder according to the at least one destination address in combination with the high-speed interconnection port between the second node and the interconnection chip;
    configuring, by the configuration node, the first address decoder according to the at least one destination address in combination with the chip port between the interconnection chip and the computing chip.
  16. The method according to any one of claims 13 to 15, wherein the determining, by the computing chip, of the first port according to the first destination address in the data access request and the first address decoder comprises:
    matching, by the computing chip, the base address and length of each destination address recorded in the first address decoder against the base address and length of the first destination address, and determining the first port corresponding to the matched destination address;
    and the determining, by the interconnection chip, of the first high-speed interconnection port according to the first destination address and the second address decoder comprises:
    matching, by the interconnection chip, the base address and length of each destination address recorded in the second address decoder against the base address and length of the first destination address, and determining the first high-speed interconnection port corresponding to the matched destination address.
  17. The method according to any one of claims 13 to 15, wherein the determining, by the computing chip, of the first port according to the first destination address in the data access request and the first address decoder comprises:
    matching, by the computing chip, the upper address bits of each destination address recorded in the first address decoder against the upper address bits of the first destination address, and determining the first port corresponding to the matched destination address, wherein the number of upper address bits is determined according to the memory size of the second node;
    and the determining, by the interconnection chip, of the first high-speed interconnection port according to the first destination address and the second address decoder comprises:
    matching, by the interconnection chip, the upper address bits of each destination address recorded in the second address decoder against the upper address bits of the first destination address, and determining the first high-speed interconnection port corresponding to the matched destination address.
  18. The method according to any one of claims 10 to 17, wherein the first high-speed interconnection port and the second high-speed interconnection port are high-speed serial bus ports, and the first port is a high-speed serial bus port.
  19. A computing node, applied to a data access system, wherein the data access system further comprises a storage node, and the computing node comprises a computing chip and an interconnection chip, wherein the computing chip is connected to the interconnection chip through a chip port, and the interconnection chip is connected to the storage node through a high-speed interconnection port and a cable;
    the computing chip is configured to generate a data access request and send the data access request to the interconnection chip, wherein the data access request comprises a first destination address, and the first destination address indicates a location in the memory of the storage node;
    the interconnection chip is configured to send the data access request to the storage node according to the first destination address.
  20. A storage node, applied to a data access system, wherein the data access system further comprises a computing node, the storage node comprises a processor and a memory, and the storage node is connected to the computing node through a high-speed interconnection port of the processor and a cable;
    the processor is configured to receive, through the high-speed interconnection port, a data access request sent by the computing node, convert a first destination address carried in the data access request into a local physical address of the storage node corresponding to the first destination address, and access the data in the memory according to the local physical address.
PCT/CN2022/118756 2021-09-30 2022-09-14 Data access system and method, and related device WO2023051248A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111160189.7 2021-09-30
CN202111160189.7A CN115905036A (en) 2021-09-30 2021-09-30 Data access system, method and related equipment

Publications (1)

Publication Number Publication Date
WO2023051248A1 true WO2023051248A1 (en) 2023-04-06

Family

ID=85727930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118756 WO2023051248A1 (en) 2021-09-30 2022-09-14 Data access system and method, and related device

Country Status (2)

Country Link
CN (1) CN115905036A (en)
WO (1) WO2023051248A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5544319A (en) * 1992-03-25 1996-08-06 Encore Computer U.S., Inc. Fiber optic memory coupling system with converter transmitting and receiving bus data in parallel fashion and diagnostic data in serial fashion
CN103957155A (en) * 2014-05-06 2014-07-30 华为技术有限公司 Message transmission method and device and interconnection interface
CN103984638A (en) * 2013-02-12 2014-08-13 Lsi股份有限公司 Chained, scalable storage devices
CN111344964A (en) * 2018-04-25 2020-06-26 西部数据技术公司 Node configuration in an optical network
CN111490946A (en) * 2019-01-28 2020-08-04 阿里巴巴集团控股有限公司 FPGA connection implementation method and device based on OpenC L framework


Also Published As

Publication number Publication date
CN115905036A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
US10095645B2 (en) Presenting multiple endpoints from an enhanced PCI express endpoint device
US20210255915A1 (en) Cloud-based scale-up system composition
US20180024957A1 (en) Techniques to enable disaggregation of physical memory resources in a compute system
US10725957B1 (en) Uniform memory access architecture
CN109445905B (en) Virtual machine data communication method and system and virtual machine configuration method and device
WO2020247042A1 (en) Network interface for data transport in heterogeneous computing environments
WO2019233322A1 (en) Resource pool management method and apparatus, resource pool control unit, and communication device
US10198373B2 (en) Uniform memory access architecture
US11829309B2 (en) Data forwarding chip and server
US11741034B2 (en) Memory device including direct memory access engine, system including the memory device, and method of operating the memory device
WO2023125524A1 (en) Data storage method and system, storage access configuration method and related device
US10437747B2 (en) Memory appliance couplings and operations
EP4002139A2 (en) Memory expander, host device using memory expander, and operation method of server system including memory expander
Shim et al. Design and implementation of initial OpenSHMEM on PCIe NTB based cloud computing
US11962675B2 (en) Interface circuit for providing extension packet and processor including the same
WO2023186143A1 (en) Data processing method, host, and related device
CN115840620B (en) Data path construction method, device and medium
US20200125494A1 (en) Cache sharing in virtual clusters
WO2023051248A1 (en) Data access system and method, and related device
US10909044B2 (en) Access control device, access control method, and recording medium containing access control program
WO2022271327A1 (en) Memory inclusivity management in computing systems
US20200387396A1 (en) Information processing apparatus and information processing system
CN116185553A (en) Data migration method and device and electronic equipment
KR20220067992A (en) Memory controller performing selective and parallel error correction, system having the same and operating method of memory device
CN113722110B (en) Computer system, memory access method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874635

Country of ref document: EP

Kind code of ref document: A1