CN115269453A - Data transceiving method, processor, electronic device and computer system - Google Patents


Info

Publication number
CN115269453A
CN115269453A (publication number); CN202210894156.3A (application number)
Authority
CN
China
Prior art keywords: designated, data, write, address, cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210894156.3A
Other languages
Chinese (zh)
Inventor
邱昊楠
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210894156.3A priority Critical patent/CN115269453A/en
Publication of CN115269453A publication Critical patent/CN115269453A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/08: Addressing or allocation; relocation in hierarchically structured memory systems, e.g. virtual memory systems (under G06F 12/00 Accessing, addressing or allocating within memory systems or architectures; G06F 12/02 Addressing or allocation; relocation)
    • G06F 13/4282: Bus transfer protocol, e.g. handshake; synchronisation on a serial bus, e.g. I2C bus, SPI bus (under G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units; G06F 13/38 Information transfer, e.g. on bus; G06F 13/42 Bus transfer protocol)
    • G06F 9/5016: Allocation of resources to service a request, the resource being the memory (under G06F 9/00 Arrangements for program control; G06F 9/06 using stored programs; G06F 9/46 Multiprogramming arrangements; G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]; G06F 9/5005 to service a request; G06F 9/5011 the resources being hardware resources other than CPUs, servers and terminals)
    • G06F 2213/0026: PCI express (under G06F 2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present disclosure relates to the field of computer technology, and in particular to a data transceiving method, a processor, an electronic device, and a computer system. The data transceiving method includes: allocating a designated cache region in a cache to a designated thread, wherein the designated thread is used for receiving and sending data over a network, the designated cache region is used for caching data written to addresses in a designated buffer, and the designated buffer is established in memory by the designated thread; receiving a write request, wherein the write request includes a write address in the designated buffer and write data, the write data being data to be sent by the designated thread over the network or data received by the designated thread over the network; and, for each received write request, writing the write data of the request to the address in the designated cache region corresponding to the write address.

Description

Data transceiving method, processor, electronic device and computer system
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data transceiving method, a processor, an electronic device, and a computer system.
Background
In recent years, data center network speeds have grown rapidly, from 25Gbps to 100Gbps and then to 200Gbps, with network bandwidth roughly doubling every two years, and 400Gbps networks are about to enter commercial use. A high-speed network consumes an enormous amount of memory bandwidth: receiving and sending data on a dual-port 200Gbps network consumes about 80GBps of memory bandwidth, while the memory modules used in the servers widely deployed in data centers typically provide only about 100GBps. After the network's memory accesses are deducted, the remaining memory bandwidth cannot meet the needs of the applications deployed on the server.
If the physical configuration of data-center servers is not upgraded, currently deployed servers cannot sustain the large memory-bandwidth accesses of such high-speed networks; yet upgrading the physical configuration increases costs, including hardware cost, electricity, cooling, and the associated cost of upgrading the Central Processing Unit (CPU) configuration. More importantly, the upgrade cycle of network bandwidth is much shorter than the service life of data-center server hardware, so server hardware upgrades often cannot keep pace with network bandwidth upgrades. The problem that servers cannot sustain the large memory-bandwidth accesses of a high-speed network therefore cannot be solved by hardware upgrades alone.
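The bandwidth figures above can be checked with a quick back-of-envelope calculation; the 1x-2x multipliers below are illustrative assumptions about how many times each byte crosses the memory bus, not figures from the disclosure:

```python
# Rough arithmetic for the cited figures (illustrative, not from the patent).
line_rate_gbps = 2 * 200                 # dual-port 200 Gbps network
wire_rate_gbytes = line_rate_gbps / 8    # 400 Gbps on the wire = 50 GB/s of payload

# In the related art, each received byte is DMA-written to memory once and read
# back at least once (by the disk or CPU), so memory traffic is a small multiple
# of the wire rate; the ~80 GBps figure cited above falls between 1x and 2x.
memory_traffic_2x = 2 * wire_rate_gbytes

print(wire_rate_gbytes)    # 50.0
print(memory_traffic_2x)   # 100.0
```

Against the roughly 100GBps that a typical server's memory modules provide, this leaves little headroom for the applications themselves, which is the problem the disclosure addresses.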
Disclosure of Invention
To solve the problems in the related art, embodiments of the present disclosure provide a data transceiving method, a processor, an electronic device, and a computer system.
In a first aspect, an embodiment of the present disclosure provides a data transceiving method, including:
allocating a designated cache region in a cache to a designated thread, wherein the designated thread is used for receiving and sending data over a network, the designated cache region is used for caching data written to addresses in a designated buffer, and the designated buffer is established in memory by the designated thread;
receiving a write request, wherein the write request includes a write address in the designated buffer and write data, the write data being data to be sent by the designated thread over the network or data received by the designated thread over the network;
and, for any received write request, writing the write data of the request to the address in the designated cache region corresponding to the write address.
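Since the disclosure contains no code, the three steps above can be modeled with a small Python sketch; all names (`DesignatedCache`, `handle_write`, the address range) are hypothetical and only illustrate the routing decision, not the patent's actual hardware implementation:

```python
# Minimal simulation: write requests whose address falls inside the designated
# thread's buffer are served by the designated cache region, never reaching DRAM.
class DesignatedCache:
    def __init__(self, buffer_addresses):
        # one cache slot per address of the designated buffer (step 1: allocation)
        self.slots = {addr: None for addr in buffer_addresses}
        self.memory_writes = 0   # counts writes that would go through the memory controller

    def handle_write(self, write_addr, write_data):
        if write_addr in self.slots:          # address lies in the designated buffer
            self.slots[write_addr] = write_data   # step 3: write into the cache region
        else:
            self.memory_writes += 1           # ordinary path via the memory controller

cache = DesignatedCache(buffer_addresses=range(0x1000, 0x1010))
cache.handle_write(0x1004, b"packet")        # step 2: a write request arrives
assert cache.slots[0x1004] == b"packet" and cache.memory_writes == 0
```

The point of the sketch is the branch: data bound for the designated buffer is absorbed by the cache region, so the memory-write counter stays at zero.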
According to an embodiment of the present disclosure, wherein:
the designated cache region is not allowed to be allocated to threads other than the designated thread;
the designated cache region is located in the third-level (last-level) cache of the central processing unit and is smaller than that cache;
the designated cache region allows threads other than the designated thread to perform read and write operations, but does not allow them to perform operations other than reads and writes.
According to an embodiment of the present disclosure, the method further comprises:
receiving a read request, wherein the read request includes a read address in the designated buffer;
and, for any received read request, reading the corresponding data from the address in the designated cache region corresponding to the read address.
According to an embodiment of the present disclosure, wherein:
after the corresponding data has been read from the designated cache region according to the read address in a read request, the designated thread is allowed to write new data to the address in the designated cache region corresponding to that read address.
According to an embodiment of the present disclosure, wherein:
the designated buffer is a receive buffer for storing data received over a network;
the designated cache region is a receive cache region corresponding to the receive buffer, and is used for caching data written to addresses in the receive buffer;
the allocating of the designated cache region for the designated thread includes:
allocating, according to an instruction for loading the receive buffer into the cache, a receive cache region corresponding to the receive buffer in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving of the write request includes receiving the write request from a network card;
the write address is a write address in the receive buffer;
the write data is data received over the network;
the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the received data to the address in the receive cache region corresponding to the write address.
According to an embodiment of the present disclosure, the method further comprises:
receiving a read request from a designated storage device, the read request including a read address in the receive buffer;
and reading corresponding data from the address corresponding to the read address in the receiving cache region to the specified storage device according to the read request.
According to an embodiment of the present disclosure, wherein:
the receiving of the write request from the network card includes receiving a DMA write request from the network card over a PCIe bus;
the receiving of the read request from the designated storage device includes receiving a DMA read request from the designated storage device over the PCIe bus.
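The receive path described in the claims above (NIC DMA-writes into the receive cache region, storage device DMA-reads from it) can be sketched as a toy model; the function names and addresses are hypothetical, and real DMA requests are PCIe transactions, not Python calls:

```python
# Toy model of the receive path: both DMA endpoints are served by the receive
# cache region, so the memory controller is never involved.
rx_cache = {}   # receive cache region: RX Buffer address -> cached data

def dma_write(addr, data):
    """DMA write request from the network card over PCIe."""
    rx_cache[addr] = data            # absorbed by the LLC region

def dma_read(addr):
    """DMA read request from the designated storage device over PCIe."""
    return rx_cache[addr]            # served from the LLC region, DRAM idle

dma_write(0x2000, b"received-frame")
assert dma_read(0x2000) == b"received-frame"
```

The storage device reads exactly what the network card wrote, and in this model no DRAM access occurs on either side, which is the bandwidth saving the disclosure claims.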
According to an embodiment of the present disclosure, wherein:
the designated buffer is a send buffer for storing data to be sent over a network;
the designated cache region is a send cache region corresponding to the send buffer, and is used for caching data written to addresses in the send buffer;
the allocating of the designated cache region for the designated thread includes:
allocating, according to an instruction for loading the send buffer into the cache, a send cache region corresponding to the send buffer in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving a write request includes receiving a write request from a designated storage device;
the write address is a write address in the send buffer;
the write data is data to be sent over the network;
the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the send data to the address in the send cache region corresponding to the write address.
According to an embodiment of the present disclosure, the method further comprises:
receiving a read request from a network card, wherein the read request includes a read address in the send buffer;
and reading, according to the read request, the corresponding data from the address in the send cache region corresponding to the read address, to the network card.
According to an embodiment of the present disclosure, wherein:
the receiving a write request from a designated storage device comprises receiving a DMA write request from the designated storage device over a PCIe bus;
the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
A second aspect of the present disclosure provides a data transceiver apparatus, comprising:
an allocation module configured to allocate a designated cache region in a cache to a designated thread, wherein the designated thread is used for receiving and sending data over a network, the designated cache region is used for caching data written to addresses in a designated buffer, and the designated buffer is established in memory by the designated thread;
a first receiving module configured to receive a write request, the write request including a write address in the designated buffer and write data, the write data being data to be sent by the designated thread over the network or data received by the designated thread over the network;
and a writing module configured to write, for any received write request, the write data of the request to the address in the designated cache region corresponding to the write address.
According to an embodiment of the present disclosure, wherein:
the designated cache region is not allowed to be allocated to threads other than the designated thread;
the designated cache region is located in the third-level (last-level) cache of the central processing unit and is smaller than that cache;
the designated cache region allows threads other than the designated thread to perform read and write operations, but does not allow them to perform operations other than reads and writes.
According to an embodiment of the present disclosure, the apparatus further comprises:
a second receiving module configured to receive a read request, the read request including a read address in the designated buffer;
and a reading module configured to read, for any received read request, the corresponding data from the address in the designated cache region corresponding to the read address.
According to an embodiment of the present disclosure, wherein:
after the corresponding data has been read from the designated cache region according to the read address in a read request, the designated thread is allowed to write new data to the address in the designated cache region corresponding to that read address.
According to an embodiment of the present disclosure, wherein:
the designated buffer is a receive buffer for storing data received over a network;
the designated cache region is a receive cache region corresponding to the receive buffer, and is used for caching data written to addresses in the receive buffer;
the allocating of the designated cache region for the designated thread includes:
allocating, according to an instruction for loading the receive buffer into the cache, a receive cache region corresponding to the receive buffer in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving of the write request includes receiving the write request from a network card;
the write address is a write address in the receive buffer;
the write data is data received over the network;
the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the received data to the address in the receive cache region corresponding to the write address.
According to an embodiment of the present disclosure, wherein:
the second receiving module is configured to receive a read request from a specified storage device, the read request including a read address in the receive buffer;
the reading module is configured to read corresponding data from an address corresponding to the reading address in the receiving cache region to the specified storage device according to the reading request.
According to an embodiment of the present disclosure, wherein:
the receiving of the write request from the network card includes receiving a DMA write request from the network card over a PCIe bus;
the receiving of the read request from the designated storage device includes receiving a DMA read request from the designated storage device over the PCIe bus.
According to an embodiment of the present disclosure, wherein:
the designated buffer is a send buffer for storing data to be sent over a network;
the designated cache region is a send cache region corresponding to the send buffer, and is used for caching data written to addresses in the send buffer;
the allocating of the designated cache region for the designated thread includes:
allocating, according to an instruction for loading the send buffer into the cache, a send cache region corresponding to the send buffer in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving a write request includes receiving a write request from a designated storage device;
the write address is a write address in the send buffer;
the write data is data to be sent over the network;
the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the send data to the address in the send cache region corresponding to the write address.
According to an embodiment of the present disclosure, wherein:
the second receiving module is configured to receive a read request from a network card, the read request including a read address in the send buffer;
the reading module is configured to read, according to the read request, the corresponding data from the address in the send cache region corresponding to the read address, to the network card.
According to an embodiment of the present disclosure, wherein:
the receiving a write request from a designated storage device comprises receiving a DMA write request from the designated storage device over a PCIe bus;
the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
A third aspect of the present disclosure provides a processor, including a processor core, an integrated input/output module, and a cache, wherein:
the processor core allocates a designated cache region in a cache to a designated thread, the designated thread is used for receiving and sending data over a network, the designated cache region is used for caching data written to addresses in a designated buffer, and the designated buffer is established in memory by the designated thread;
the integrated input/output module receives a write request, the write request including a write address in the designated buffer and write data, the write data being data to be sent by the designated thread over the network or data received by the designated thread over the network;
and, for any write request received by the integrated input/output module, the cache writes the write data of the request to the address in the designated cache region corresponding to the write address.
According to an embodiment of the present disclosure, the designated cache region is not allowed to be allocated to threads other than the designated thread; it is located in the third-level (last-level) cache of the central processing unit and is smaller than that cache; and it allows threads other than the designated thread to perform read and write operations, but does not allow them to perform operations other than reads and writes.
According to an embodiment of the present disclosure, the integrated input/output module receives a read request, the read request including a read address in the designated buffer; for any read request received by the integrated input/output module, the cache reads the corresponding data from the address in the designated cache region corresponding to the read address in the read request.
According to the embodiment of the disclosure, after reading corresponding data from the designated cache region according to the read address in the read request, the operating system allows the designated thread to write data to an address in the designated cache region corresponding to the read address.
According to an embodiment of the present disclosure, the designated buffer is a receive buffer for storing data received over a network; the designated cache region is a receive cache region corresponding to the receive buffer and is used for caching data written to addresses in the receive buffer; and the allocating of the designated cache region for the designated thread includes: allocating, according to an instruction for loading the receive buffer into the cache, a receive cache region corresponding to the receive buffer in the cache.
According to an embodiment of the present disclosure, the receiving a write request includes receiving a write request from a network card; the write address is a write address in the receive buffer; the write data is data received over the network; and the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the received data to the address in the receive cache region corresponding to the write address.
According to an embodiment of the present disclosure, the integrated input/output module receives a read request from a specified storage device, the read request including a read address in the receive buffer; and the cache reads corresponding data from the address corresponding to the read address in the receiving cache region to the specified storage device according to the read request.
According to an embodiment of the present disclosure, the receiving of the write request from the network card includes the integrated input/output module receiving a DMA write request from the network card over a PCIe bus; the receiving of the read request from the designated storage device includes the integrated input/output module receiving a DMA read request from the designated storage device over the PCIe bus.
According to an embodiment of the present disclosure, the designated buffer is a send buffer for storing data to be sent over a network; the designated cache region is a send cache region corresponding to the send buffer and is used for caching data written to addresses in the send buffer; and the allocating of the designated cache region for the designated thread includes: allocating, according to an instruction for loading the send buffer into the cache, a send cache region corresponding to the send buffer in the cache.
According to an embodiment of the present disclosure, the receiving a write request includes receiving a write request from a designated storage device; the write address is a write address in the send buffer; the write data is data to be sent over the network; and the writing of the write data to the address corresponding to the write address in the designated cache region includes writing the send data to the address in the send cache region corresponding to the write address.
According to an embodiment of the present disclosure, the integrated input/output module receives a read request from a network card, the read request including a read address in the send buffer; according to the read request, the cache reads the corresponding data from the address in the send cache region corresponding to the read address, to the network card.
According to an embodiment of the present disclosure, the receiving a write request from a designated storage device includes receiving a DMA write request from the designated storage device over a PCIe bus; the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
A fourth aspect of the present disclosure provides an electronic device, including the processor according to the third aspect, a memory, a network card, and a designated storage device, where the processor is connected to the memory, the network card, and the designated storage device.
A fifth aspect of the present disclosure provides a computer system, including a plurality of computers, where the computer includes the processor according to the third aspect, a memory, a network card, and a designated storage device, and the processor is connected to the memory, the network card, and the designated storage device.
According to the technical solution provided by the embodiments of the present disclosure, data to be sent by a designated thread over a network, or data received by the designated thread over the network, is written into a designated cache region rather than being written through the memory controller into a designated buffer in memory. This significantly reduces the memory bandwidth consumed by sending and receiving data over the network, so that the high-speed data transceiving requirements of a data center are met while the applications deployed on the server are still guaranteed sufficient memory bandwidth.
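The claimed saving can be expressed as a one-line comparison of memory-controller traffic; the 2x factor for the related art (one DMA write plus one DMA read per byte) and the payload figure are illustrative assumptions, not measurements:

```python
# Illustrative comparison of per-second memory-controller traffic.
def memory_controller_bytes(related_art: bool, payload: int) -> int:
    # related art: NIC DMA-writes to DRAM, disk DMA-reads from DRAM (2 passes)
    # disclosed scheme: both DMAs served from the designated cache region (0 passes)
    return 2 * payload if related_art else 0

payload = 50_000_000_000   # ~50 GB/s of network payload (dual-port 200 Gbps)
assert memory_controller_bytes(True, payload) == 100_000_000_000
assert memory_controller_bytes(False, payload) == 0
```

In the ideal case modeled here, network I/O stops competing with applications for memory bandwidth entirely; in practice, cache evictions would reintroduce some DRAM traffic.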
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
fig. 1 is a schematic diagram illustrating a related art data transceiving method.
Fig. 2 illustrates a schematic diagram of a data transceiving method according to an embodiment of the present disclosure.
Fig. 3 shows a flow chart of a data transceiving method according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a data transceiving apparatus according to an embodiment of the present disclosure.
FIG. 5 shows a block diagram of a processor, according to an embodiment of the disclosure.
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 7 shows a schematic diagram of a computer system according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should also be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Cloud computing and big data have entered the digital-infrastructure stage: Internet businesses run everything on the cloud, from basic hardware to software services and application deployment, which places stringent demands on the network service capability of cloud data centers. Data center network speeds have been upgraded from 25Gbps to 100Gbps and 200Gbps, with 400Gbps networks already planned; as a rule, network bandwidth roughly doubles every two years. Server hardware configurations in data centers, however, are not upgraded fast enough to keep up with this growth. Sending and receiving data on a dual-port 200Gbps network consumes close to 80GBps of memory bandwidth, while the memory modules used in widely deployed data-center servers typically provide only about 100GBps; after the network's memory accesses are deducted, the remaining memory bandwidth can hardly meet the needs of the applications deployed on the server.
Specifically, data center high-speed networks generally communicate through user-mode network protocol stacks such as RDMA (Remote Direct Memory Access) or DPDK (Data Plane Development Kit). Such stacks can provide throughput of hundreds of Gbps with microsecond-level latency, but they require the application itself to manage and maintain, in user mode, the network buffers used for receiving and sending data. A network buffer is a memory area that the application requests from system memory. When the application sends data, the data is first placed in a network buffer; the CPU running the application submits the buffer to the network card, the network card reads the data from the buffer via Direct Memory Access (DMA), sends it over the network, and then notifies the CPU running the application that the send has completed. When receiving data, the application submits a network buffer to the network card; the network card receives data over the network, writes it into the buffer via DMA, and notifies the CPU running the application that the reception has completed.
When the network card sends or receives data, it performs DMA writes or DMA reads on the network buffer over the PCIe bus, writing received data into memory or reading data to be sent from memory. The hard disk, in turn, uses DMA writes to place into memory the data to be sent through the network card, or DMA reads to fetch from memory the data the network card received over the network. Whenever the network card or the hard disk reads or writes the network buffer, the access must go through the memory controller to the memory area, consuming memory bandwidth equal in size to the network bandwidth and leaving the remaining memory bandwidth hardly able to meet the needs of the applications deployed on the server.
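The post-buffer-then-notify handshake described above can be sketched as a toy model; the class and method names (`Nic`, `post_tx`, `completions`) are hypothetical stand-ins for a real RDMA verbs or DPDK API, and the "DMA read" here is just a byte copy:

```python
# Toy model of the user-mode send path: the application fills a network buffer,
# posts it to the NIC, and the NIC "DMA-reads" it and signals completion.
class Nic:
    def __init__(self):
        self.completions = []            # completion notifications for the CPU

    def post_tx(self, buffer):
        wire_data = bytes(buffer)        # NIC DMA-reads the posted buffer
        self.completions.append(len(wire_data))  # then notifies the application
        return wire_data                 # data as it would go out on the wire

nic = Nic()
tx_buffer = bytearray(b"payload")        # buffer the application requested from memory
sent = nic.post_tx(tx_buffer)
assert sent == b"payload" and nic.completions == [7]
```

The receive path is symmetric: the application posts an empty buffer, the NIC DMA-writes into it, and a completion tells the application the data is ready.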
Fig. 1 is a schematic diagram illustrating a related art data transceiving method.
As shown in fig. 1, a CPU runs a designated thread for data transmission and reception through a network, and the designated thread applies for a memory area in a memory as a network Buffer, where the network Buffer includes a receive Buffer RX Buffer and a transmit Buffer TX Buffer.
When receiving data, the network card writes the received data received through the network into the RX Buffer through the memory controller, and the hard disk reads the received data from the RX Buffer through the memory controller so as to complete the persistent storage of the received data. When data is transmitted, the hard disk writes the transmitted data to be transmitted into the TX Buffer through the memory controller, and then the network card reads the transmitted data from the TX Buffer through the memory controller and transmits the data through the network.
Fig. 2 illustrates a schematic diagram of a data transceiving method according to an embodiment of the present disclosure.
As shown in fig. 2, on the IO path in the network receiving direction, a designated thread that receives data through the network applies for a memory area in the memory as a network Buffer, which may specifically be a receiving buffer RX Buffer. According to embodiments of the present disclosure, the designated thread may be an RDMA- or DPDK-based application thread. The RX Buffer is then loaded into the third-level cache (last level cache, LLC) of the CPU by a CPU load instruction, that is, a designated cache region corresponding to the RX Buffer is allocated to the designated thread in the LLC. The designated cache region may specifically be a receiving cache region rx buffer, which is used for caching data to be written into the RX Buffer. The cache may maintain a correspondence between addresses in the RX Buffer and addresses in the rx buffer.
The CPU initiates a receive-data command to the network card and sends the meta information of the RX Buffer (including any one or more of: physical address, length, access key, and the like) to the network card. After the network card receives data sent over the network (hereinafter referred to as "received data"), it parses the receive-data command and the meta information of the RX Buffer to obtain the memory address at which the received data is to be written, and then issues a DMA write request over the PCIe (Peripheral Component Interconnect Express) bus, where the DMA write request includes the write address in the RX Buffer and the write data (i.e., the received data). The cache receives the DMA write request on the PCIe bus and, according to the correspondence between addresses in the RX Buffer and addresses in the rx buffer, writes the received data to the address in the rx buffer corresponding to the write address.
Because the RX Buffer is loaded into the LLC of the CPU in advance, the DMA write request does not write to memory; instead, the received data is written to the corresponding address in the rx buffer, saving memory bandwidth in the receiving direction of the network card.
When the received data written into the rx buffer needs to be stored in a designated storage device, the CPU sends a read-data command to the designated storage device, and the designated storage device sends a DMA read request over the PCIe bus in response, where the DMA read request includes a read address in the RX Buffer. The cache receives the DMA read request on the PCIe bus and, in response, sends the data at the address in the rx buffer corresponding to the read address to the designated storage device, according to the correspondence between addresses in the RX Buffer and addresses in the rx buffer. In this way, data received over the network can be stored in the designated storage device without reading or writing the memory, saving the memory bandwidth that the designated storage device's reads would otherwise consume.
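The receiving-direction flow above can be sketched in software. The class and method names (`LlcModel`, `dma_write`, `dma_read`) are illustrative assumptions; real hardware performs this routing in the cache and memory controller, not in software:

```python
# Minimal Python sketch of the receive path described above. Names are
# illustrative assumptions; real hardware does this in the cache/IMC.

class LlcModel:
    """Models a designated cache region that shadows the addresses of a
    designated buffer region (the RX Buffer) established in memory."""

    def __init__(self, rx_buffer_addrs):
        # Correspondence: RX Buffer (memory) address -> LLC slot.
        self.mapping = {addr: i for i, addr in enumerate(rx_buffer_addrs)}
        self.llc = [None] * len(rx_buffer_addrs)
        self.memory_accesses = 0  # accesses that fall through to DRAM

    def dma_write(self, addr, data):
        """DMA write from the network card: lands in the cache region,
        not in memory, because the address is mapped into the LLC."""
        if addr in self.mapping:
            self.llc[self.mapping[addr]] = data
        else:
            self.memory_accesses += 1  # would hit DRAM in the related art

    def dma_read(self, addr):
        """DMA read from the designated storage device: served from the
        cache region without touching memory."""
        if addr in self.mapping:
            return self.llc[self.mapping[addr]]
        self.memory_accesses += 1
        return None

# Receive flow: NIC writes received data, SSD reads it for persistence.
cache = LlcModel(rx_buffer_addrs=[0x1000, 0x1040])
cache.dma_write(0x1000, b"packet-payload")          # network card DMA write
assert cache.dma_read(0x1000) == b"packet-payload"  # storage device DMA read
assert cache.memory_accesses == 0                   # memory bus untouched
```

The sending direction is symmetric: the storage device performs the DMA write and the network card performs the DMA read against the TX Buffer addresses.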
According to an embodiment of the present disclosure, the specified storage device may be, for example, a Solid State Disk (SSD), or may be another storage device capable of performing DMA access to the LLC.
On the IO path in the network sending direction, a designated thread that sends data through the network applies for a memory area in the memory as a network Buffer, specifically a sending buffer TX Buffer. According to embodiments of the present disclosure, the designated thread may be an RDMA- or DPDK-based application thread. The TX Buffer is then loaded into the third-level cache (LLC) of the CPU by a CPU load instruction, that is, a designated cache region corresponding to the TX Buffer is allocated to the designated thread in the LLC. The designated cache region may specifically be a sending cache region tx buffer, used for caching data to be written into the TX Buffer. The cache may maintain a correspondence between addresses in the TX Buffer and addresses in the tx buffer.
The CPU issues a read-data command to the designated storage device, which sends a DMA write request over the PCIe bus; the DMA write request includes the write address in the TX Buffer and the write data (i.e., sending data to be sent over the network). The cache receives the DMA write request on the PCIe bus and, according to the correspondence between addresses in the TX Buffer and addresses in the tx buffer, writes the sending data to the address in the tx buffer corresponding to the write address.
Because the TX Buffer is loaded into the LLC of the CPU in advance, the DMA write request does not write to memory; instead, the sending data is written to the corresponding address in the tx buffer, saving memory bandwidth in the read direction of the designated storage device.
Then, the CPU sends a send-data command to the network card, and the network card sends a DMA read request over the PCIe bus in response; the DMA read request includes a read address in the TX Buffer. The cache receives the DMA read request on the PCIe bus and, in response, sends the data at the address in the tx buffer corresponding to the read address to the network card, according to the correspondence between addresses in the TX Buffer and addresses in the tx buffer. The network card packages the data and sends it over the network. In this way, data to be sent over the network can be transferred from the cache to the network card without reading or writing the memory, saving memory bandwidth in the sending direction of the network card.
Typically, the bandwidth capability of the CPU LLC is above 200 GB/s, much greater than the memory bandwidth, and can meet current and future high-throughput access requirements. According to the embodiments of the present disclosure, using the LLC to transfer data while bypassing the memory can effectively reduce the memory bandwidth consumed by network data transceiving and preserve memory bandwidth for the other applications deployed on the server.
Fig. 3 shows a flow chart of a data transceiving method according to an embodiment of the present disclosure. As shown in fig. 3, the data transceiving method includes the following steps S301 to S303:
in step S301, a designated cache region is allocated to a designated thread in a cache, where the designated thread is used to receive and send data through a network, the designated cache region is used to cache data written to addresses in a designated buffer region, and the designated buffer region is established in the memory by the designated thread;
in step S302, a write request is received, where the write request includes a write address and write data in the designated buffer, and the write data includes data to be sent by the designated thread through a network or data received by the designated thread through the network;
in step S303, for the received write request, the write data in the write request is written into the address corresponding to the write address in the designated buffer area.
According to the embodiment of the disclosure, for any write request received by the cache, the write data in the write request is written to the address in the designated cache region corresponding to the write address.
The data transceiving method according to an embodiment of the present disclosure may be performed by an electronic device, such as a computer or a server, that runs a designated thread responsible for part or all of the electronic device's data transceiving over the network. Because the data to be sent by the designated thread over the network, or the data it receives over the network, is written into the designated cache region, that data does not need to be written through the memory controller into the designated buffer region in the memory. This significantly reduces the memory bandwidth consumed by transceiving data over the network, meeting the data transceiving demands of high-speed data center networks while ensuring that the applications deployed on the server still have sufficient memory bandwidth.
According to an embodiment of the present disclosure, the method further comprises: receiving a read request, wherein the read request comprises a read address in the designated buffer; and for any received read request, reading corresponding data from an address corresponding to the read address in the specified cache region.
According to the embodiment of the disclosure, by reading the data to be sent by the designated thread through the network or the data received by the designated thread through the network from the designated cache region, the memory controller does not need to read the data to be sent by the designated thread through the network or the data received by the designated thread through the network from the designated buffer region in the memory, and the consumption of the memory bandwidth by data transceiving through the network can be further saved.
According to the embodiment of the disclosure, the addresses in the write request and the read request are both addresses in the designated buffer region, and the cache maintains a correspondence between addresses in the designated buffer region and addresses in the designated cache region. After the cache receives a write request, it determines the address in the designated cache region corresponding to the address in the designated buffer region, and writes the data to that address in the designated cache region. After the cache receives a read request, it determines the address in the designated cache region corresponding to the address in the designated buffer region, and reads the data from that address in the designated cache region.
According to the embodiment of the disclosure, the designated cache region is not allowed to be allocated to the threads other than the designated thread; the designated cache region is positioned in a third-level cache of the central processing unit and is smaller than the size of the third-level cache; the designated buffer area allows the threads except the designated thread to execute the read-write operation, and does not allow the threads except the designated thread to execute the operations except the read-write operation.
According to the embodiment of the present disclosure, all threads running on all CPU cores in one CPU socket share the LLC of that socket. The designated cache region allocated to the designated thread can be isolated from the cache regions of other threads by, for example, the existing CAT (Cache Allocation Technology), preventing the designated cache region from being allocated to other threads, which would otherwise cause misses and extra memory accesses when the network card or the designated storage device accesses the designated cache region. According to the embodiment of the disclosure, the designated buffer region is established by the designated thread, and the designated thread declares it as a private object at creation time, so that the designated thread can release as well as read and write the designated buffer region, while threads other than the designated thread can only perform read and write operations on it. Allowing other threads to read and write the designated buffer region lets them obtain the sending data or received data quickly. Meanwhile, because threads other than the designated thread cannot perform operations beyond reads and writes (for example, they cannot release the designated buffer region or apply for memory space within it), the designated thread can manage the designated buffer region centrally, ensuring that the address correspondence between the designated buffer region and the designated cache region stays unchanged and that data transceiving remains efficient and stable.
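The isolation described above can be illustrated with a way-mask model in the style of CAT. The mask values below are assumptions for illustration; real CAT is configured through model-specific registers or the Linux resctrl interface, not in application code:

```python
# Illustrative sketch of CAT-style way masks isolating the designated
# cache region from other threads. Mask values are assumptions; real CAT
# is configured via MSRs or the Linux resctrl interface.

LLC_WAYS = 12  # assume a 12-way last-level cache

def ways(mask: int) -> set[int]:
    """Set of LLC ways that a capacity mask grants access to."""
    return {w for w in range(LLC_WAYS) if mask & (1 << w)}

designated_mask = 0b000000001111  # ways 0-3 reserved for the designated thread
others_mask     = 0b111111110000  # ways 4-11 shared by all other threads

# The two masks are disjoint, so other threads can never evict lines of
# the designated cache region, and DMA accesses to it always hit the LLC.
assert ways(designated_mask) & ways(others_mask) == set()
assert ways(designated_mask) | ways(others_mask) == set(range(LLC_WAYS))
```

Disjoint masks are what prevents the eviction (and the resulting extra memory access) that would occur if the designated cache region were shared with other threads.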
According to the embodiment of the disclosure, the designated cache region is set in the LLC and is smaller than the LLC in size, ensuring that reads and writes of the designated buffer region can be completed through the designated cache region.
According to the embodiment of the disclosure, after reading corresponding data from the designated cache region according to the read address in the read request, the designated thread is allowed to write data to the address in the designated cache region corresponding to the read address.
For example, after received data is read from the designated cache region and stored in the designated storage device, or after sending data is read from the designated cache region and sent through the network card, the designated thread may be allowed to write new data to the address in the designated cache region where the read data was originally stored (i.e., the address in the designated cache region corresponding to the read address). In this way, addresses in the designated cache region can be reused, ensuring that the designated cache region always fits within the LLC and always stays resident in the LLC. By controlling the size of the designated cache region and recycling it, accesses by the network card or the designated storage device to the memory can be entirely redirected to the higher-bandwidth LLC, eliminating memory bandwidth consumption in high-speed network transceiving scenarios and avoiding a memory bandwidth bottleneck.
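The slot-recycling discipline above can be sketched with a simple free list. The `RecyclingRegion` structure is an illustrative assumption; the point it demonstrates is that slots are reused after their data is drained, so the region never outgrows its fixed allocation:

```python
# Minimal sketch of recycling addresses in the designated cache region.
# The free-list structure is an illustrative assumption; the point is
# that slots are reused so the region stays within its fixed size.

from collections import deque

class RecyclingRegion:
    def __init__(self, n_slots: int):
        self.free = deque(range(n_slots))  # slots available for new writes
        self.data = {}

    def write(self, payload) -> int:
        """Write incoming data into a free slot; raises if the region is
        full, which prompt recycling by the designated thread prevents."""
        slot = self.free.popleft()
        self.data[slot] = payload
        return slot

    def consume(self, slot):
        """Drain a slot (e.g. the SSD persisted it or the NIC sent it),
        then return the slot to the free list for reuse."""
        payload = self.data.pop(slot)
        self.free.append(slot)
        return payload

region = RecyclingRegion(n_slots=2)
s0 = region.write(b"pkt-0")
region.consume(s0)            # slot recycled after the data is drained
s1 = region.write(b"pkt-1")   # reuse keeps the region at a bounded size
s2 = region.write(b"pkt-2")
assert len(region.free) == 0  # never more live slots than allocated
```

Because the region's footprint is bounded, it can be pinned entirely inside the LLC, which is what keeps every DMA access off the memory bus.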
According to an embodiment of the present disclosure, the designated Buffer is a reception Buffer for storing reception data received through a network, and may be, for example, an RX Buffer shown in fig. 2.
According to an embodiment of the present disclosure, the designated cache region is a receiving cache region corresponding to the receiving buffer region, and is used for caching data written to addresses in the receiving buffer region; it may be, for example, the rx buffer shown in fig. 2.
According to an embodiment of the present disclosure, the allocating a designated cache area for a designated thread includes: and according to the instruction for loading the receiving buffer area into the cache, allocating a receiving cache area corresponding to the receiving buffer area in the cache.
For example, in fig. 2, according to a CPU load instruction for loading the RX Buffer into the cache, an rx buffer corresponding to the RX Buffer may be allocated in the cache, and the cache maintains the address correspondence between the RX Buffer and the rx buffer, so that when an access request for an address in the RX Buffer is received, it is converted into an access request for the corresponding address in the rx buffer and the corresponding access operation is performed.
According to an embodiment of the present disclosure, the receiving a write request includes receiving a write request from a network card; the write address comprises a write address in the receive buffer; the write data includes receive data received over a network; the writing the write data in the write request into the address corresponding to the write address in the designated cache area comprises writing the received data into the address corresponding to the write address in the receiving cache area.
According to an embodiment of the present disclosure, the method further comprises: receiving a read request from a designated storage device, the read request including a read address in the receive buffer; and reading corresponding data from the address corresponding to the read address in the receiving cache region to the specified storage device according to the read request.
According to an embodiment of the present disclosure, the receiving a write request from a network card includes the cache receiving a DMA write request from the network card through a PCIe bus; the receiving a read request from the designated storage device includes the cache receiving a DMA read request from the designated storage device over the PCIe bus.
For example, in fig. 2, the cache receives a DMA write request from the network card over the PCIe bus; the write address in the DMA write request includes a write address in the RX Buffer, and the write data includes the received data. According to the correspondence between addresses in the RX Buffer and addresses in the rx buffer, the cache converts the access to the write address into an access to the rx buffer and writes the received data to the address in the rx buffer corresponding to the write address.
According to an embodiment of the present disclosure, the designated storage device may be, for example, an SSD. The cache receives a DMA read request from the designated storage device over the PCIe bus, where the DMA read request includes a read address in the RX Buffer. According to the correspondence between addresses in the RX Buffer and addresses in the rx buffer, the cache converts the access to the read address into an access to the rx buffer and reads the data from the address in the rx buffer corresponding to the read address.
According to the embodiment of the disclosure, data received by the designated thread over the network is written from the network card into the receiving cache region, so it does not need to be written through the memory controller into the receiving buffer region in the memory. This significantly reduces the memory bandwidth consumed by receiving data over the network, meeting the data transceiving demands of high-speed data center networks while ensuring that the applications deployed on the server still have sufficient memory bandwidth.
According to the embodiment of the disclosure, the data received by the designated thread over the network is read by the designated storage device from the receiving cache region, so the memory controller does not need to read that data from the receiving buffer region in the memory, further saving the memory bandwidth consumed by data transceiving over the network.
According to an embodiment of the present disclosure, the designated Buffer is a transmission Buffer for storing transmission data to be transmitted through a network, and may be, for example, a TX Buffer shown in fig. 2.
According to an embodiment of the present disclosure, the designated cache region is a sending cache region corresponding to the sending buffer region, and is used for caching data written to addresses in the sending buffer region; it may be, for example, the tx buffer shown in fig. 2.
According to an embodiment of the present disclosure, the allocating a designated cache area for a designated thread includes: and according to the instruction for loading the sending buffer area into the cache, allocating a sending cache area corresponding to the sending buffer area in the cache.
For example, in fig. 2, according to a CPU load instruction for loading the TX Buffer into the cache, a tx buffer corresponding to the TX Buffer may be allocated in the cache, and the cache maintains the address correspondence between the TX Buffer and the tx buffer, so that when an access request for an address in the TX Buffer is received, it is converted into an access request for the corresponding address in the tx buffer and the corresponding access operation is executed.
According to an embodiment of the present disclosure, the receiving a write request includes receiving a write request from a designated storage device; the write address includes a write address in the sending buffer region; the write data includes sending data to be sent over a network; and the writing the write data in the write request to the address corresponding to the write address in the designated cache region includes writing the sending data to the address in the sending cache region corresponding to the write address.
According to an embodiment of the present disclosure, the method further comprises: receiving a read request from a network card, wherein the read request comprises a read address in the sending buffer region; and reading corresponding data from the address in the sending cache region corresponding to the read address to the network card according to the read request.
According to an embodiment of the present disclosure, wherein the receiving a write request from a designated storage device includes the cache receiving a DMA write request from the designated storage device over a PCIe bus; the receiving a read request from the network card includes the cache receiving a DMA read request from the network card over a PCIe bus.
For example, in fig. 2, according to an embodiment of the disclosure, the designated storage device may be an SSD. The cache receives a DMA write request from the designated storage device over the PCIe bus; the write address in the DMA write request includes a write address in the TX Buffer, and the write data includes the sending data. According to the correspondence between addresses in the TX Buffer and addresses in the tx buffer, the cache converts the access to the write address into an access to the tx buffer and writes the sending data to the address in the tx buffer corresponding to the write address.
The cache receives a DMA read request from the network card over the PCIe bus, where the DMA read request includes a read address in the TX Buffer. According to the correspondence between addresses in the TX Buffer and addresses in the tx buffer, the cache converts the access to the read address into an access to the tx buffer and reads the data from the address in the tx buffer corresponding to the read address.
According to the embodiment of the disclosure, data to be sent by the designated thread over the network is written from the designated storage device into the sending cache region, so it does not need to be written through the memory controller into the sending buffer region in the memory. This significantly reduces the memory bandwidth consumed by sending data over the network, meeting the data transceiving demands of high-speed data center networks while ensuring that the applications deployed on the server still have sufficient memory bandwidth.
According to the embodiment of the disclosure, the network card reads the data to be sent by the designated thread from the sending cache region, so the memory controller does not need to read that data from the sending buffer region in the memory, further saving the memory bandwidth consumed by data transceiving over the network.
Fig. 4 shows a block diagram of a data transceiving apparatus according to an embodiment of the present disclosure. Wherein the apparatus may be implemented as part or all of a processor (e.g., CPU) via software, hardware, or a combination of both.
As shown in fig. 4, the data transceiver 400 includes an allocating module 410, a first receiving module 420, and a writing module 430.
The allocation module 410 is configured to allocate a designated cache region for a designated thread in the cache, the designated thread being used for transceiving data through the network, the designated cache region being used for caching data written with addresses in a designated buffer region, the designated buffer region being established in the memory by the designated thread;
the first receiving module 420 is configured to receive a write request, the write request including a write address and write data in the designated buffer, the write data including data to be sent by the designated thread over a network or data received by the designated thread over the network;
the writing module 430 is configured to, for a received write request, write the write data in the write request to an address in the designated buffer area corresponding to the write address.
According to an embodiment of the present disclosure, wherein:
the designated cache region is not allowed to be allocated to threads other than the designated thread;
the designated cache region is positioned in a third-level cache of the central processing unit and is smaller than the size of the third-level cache;
and the designated buffer region allows threads other than the designated thread to perform read and write operations, but does not allow them to perform operations other than read and write.
According to an embodiment of the present disclosure, the apparatus 400 further comprises:
a second receiving module 440 configured to receive a read request, the read request including a read address in the designated buffer;
the reading module 450 is configured to, for any received read request, read corresponding data from an address in the designated cache area corresponding to the read address.
According to an embodiment of the present disclosure, wherein:
and after reading corresponding data from the specified cache region according to the read address in the read request, allowing the specified thread to write data to the address in the specified cache region corresponding to the read address.
According to an embodiment of the present disclosure, wherein:
the designated buffer is a reception buffer for storing reception data received through a network;
the designated cache region is a receiving cache region corresponding to the receiving buffer region, and is used for caching data written to addresses in the receiving buffer region;
the allocating of the designated cache area for the designated thread includes:
and according to the instruction for loading the receiving buffer area into the cache, allocating a receiving cache area corresponding to the receiving buffer area in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving of the write request comprises receiving the write request from a network card;
the write address comprises a write address in the receive buffer;
the write data includes receive data received over a network;
the writing the write data in the write request into the address corresponding to the write address in the designated cache area comprises writing the received data into the address corresponding to the write address in the receiving cache area.
According to an embodiment of the present disclosure, wherein:
the second receiving module 440 is configured to receive a read request from a specified storage device, the read request including a read address in the receive buffer;
the reading module 450 is configured to read corresponding data from the address in the receiving cache region corresponding to the read address to the designated storage device according to the read request.
According to an embodiment of the present disclosure, wherein:
the slave network card receives a write request, including receiving a DMA write request from the network card through a PCIe bus;
the receiving a read request from the designated storage device includes receiving a DMA read request from a network card over a PCIe bus.
According to an embodiment of the present disclosure, wherein:
the designated buffer is a transmission buffer for storing transmission data to be transmitted through a network;
the designated cache region is a sending cache region corresponding to the sending buffer region, and is used for caching data written to addresses in the sending buffer region;
the allocating of the designated cache area for the designated thread includes:
and according to the instruction for loading the sending buffer area into the cache, allocating a sending cache area corresponding to the sending buffer area in the cache.
According to an embodiment of the present disclosure, wherein:
the receiving a write request comprises receiving a write request from a specified storage device;
the write request is used for a write address in the sending buffer;
the write data includes transmission data to be transmitted through a network;
the writing the write data in the write request into the address corresponding to the write address in the designated cache area comprises writing the sending data into the address corresponding to the write address in the sending cache area.
According to an embodiment of the present disclosure, wherein:
the second receiving module 440 is configured to receive a read request from a network card, where the read request includes a read address in the sending buffer;
the reading module 450 is configured to read corresponding data from the address in the sending cache region corresponding to the read address to the network card according to the read request.
According to an embodiment of the present disclosure, wherein:
the receiving a write request from a designated storage device comprises receiving a DMA write request from the designated storage device over a PCIe bus;
the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
FIG. 5 shows a block diagram of a processor, according to an embodiment of the disclosure.
As shown in fig. 5, a processor 500 includes at least one processor core 501, an Integrated I/O module (IIO) 502, at least one Integrated Memory Controller (IMC) 503, and at least one cache 504.
Processor core 501 is coupled to IIO 502, IIO 502 is coupled to IMC 503, IMC 503 is coupled to cache 504, and cache 504 is coupled to processor core 501. According to the embodiments of the present disclosure, a connection may be direct or indirect via other components.
According to the embodiment of the present disclosure, the processor core 501 runs a designated thread for data transceiving through a network, and allocates a designated cache region for the designated thread in a cache according to a CPU load instruction. The designated cache region is used for caching data written to addresses in a designated buffer region, and the designated buffer region is established in the memory by the designated thread.
The IIO 502 receives a write request including a write address and write data in the designated buffer, the write data including data to be sent by the designated thread over a network or data received by the designated thread over a network.
In accordance with embodiments of the present disclosure, the IIO 502 forwards received DMA requests (both DMA write requests and DMA read requests) to the IMC 503; if the IMC 503 recognizes that the memory address in a DMA request has a corresponding address in the cache 504, it forwards the request to the cache 504. Since the designated buffer in the disclosed embodiments is loaded into the cache 504 and the designated cache region may not be allocated to threads other than the designated thread, an address in the designated buffer always has a corresponding address in the designated cache region; the IMC 503 therefore forwards every read or write request targeting an address in the designated buffer to the cache 504.
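The forwarding rule above — a DMA request goes to the cache exactly when its target address falls inside the pinned designated region — can be illustrated with a toy routing function. `PINNED` and `route_dma` are invented names for this simulation, not the RTL:

```python
# Sketch of the IMC routing decision: DMA requests targeting the designated
# buffer (always resident in the designated cache region) are forwarded to
# the cache; any other address is serviced from memory.

PINNED = range(0, 4096)             # address range of the designated buffer

def route_dma(addr):
    return "cache" if addr in PINNED else "memory"

print(route_dma(100))    # cache  (inside the designated buffer)
print(route_dma(9000))   # memory (ordinary address)
```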
For any write request received by the IIO 502, the cache 504 writes the write data in the write request to the address corresponding to the write address in the designated cache area.
According to the embodiment of the disclosure, the designated cache region is not allowed to be allocated to threads other than the designated thread; the designated cache region is located in the level-3 cache of the central processing unit and is smaller than the level-3 cache; and threads other than the designated thread are allowed to perform read and write operations on the designated cache region, but are not allowed to perform operations other than read and write operations on it.
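The access policy just described can be captured as a small predicate: the owning thread may do anything with its region, while other threads are limited to reads and writes (so they can exchange data with the designated thread but cannot, say, evict or re-allocate the region). The function and operation names are illustrative:

```python
# Sketch of the designated-cache-region access policy: non-owner threads
# may read and write, but may not perform any other operation on the region.

DESIGNATED_THREAD = 7

def is_allowed(thread_id, op):
    if thread_id == DESIGNATED_THREAD:
        return True                      # owner may perform any operation
    return op in ("read", "write")       # others: read/write only

assert is_allowed(3, "read")             # another thread may read
assert is_allowed(3, "write")            # ... and write
assert not is_allowed(3, "evict")        # ... but not evict or re-allocate
```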
According to the embodiment of the present disclosure, the IIO 502 receives a read request, where the read request includes a read address in the designated buffer, and for any read request received by the IIO 502, the cache 504 reads corresponding data from an address in the designated buffer corresponding to the read address in the read request.
According to the embodiment of the disclosure, after the corresponding data has been read from the designated cache region according to the read address in a read request, the operating system allows the designated thread to write data to the address in the designated cache region corresponding to that read address.
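This reuse rule — an address becomes writable again only after a read request has drained it — can be sketched with a one-slot model. A real design would more likely track occupancy with ring-buffer head/tail pointers; the `Slot` class here is an invented simplification:

```python
# Sketch of address reuse in the designated cache region: a write is only
# permitted once the previous contents have been read out.

class Slot:
    def __init__(self):
        self.data, self.full = None, False

    def write(self, data):
        assert not self.full, "slot still holds unread data"
        self.data, self.full = data, True

    def read(self):
        assert self.full, "nothing to read"
        self.full = False                # the address becomes writable again
        return self.data

s = Slot()
s.write(b"pkt0")
print(s.read())      # b'pkt0'
s.write(b"pkt1")     # allowed only after the read drained the slot
```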
According to an embodiment of the present disclosure, the designated buffer is a receiving buffer for storing data received through the network; the designated cache region is a receiving cache region corresponding to the receiving buffer and is used for caching data written to addresses in the receiving buffer; and the allocating of the designated cache region for the designated thread includes allocating, in the cache, a receiving cache region corresponding to the receiving buffer according to an instruction for loading the receiving buffer into the cache.
According to an embodiment of the present disclosure, the receiving of a write request includes receiving a write request from a network card; the write address includes a write address in the receiving buffer; the write data includes data received through the network; and writing the write data into the address corresponding to the write address in the designated cache region includes writing the received data into the address corresponding to the write address in the receiving cache region.
According to an embodiment of the present disclosure, the IIO 502 receives a read request from a designated storage device, the read request including a read address in the receiving buffer; the cache 504 reads, according to the read request, the corresponding data from the address corresponding to the read address in the receiving cache region and returns it to the designated storage device.
According to an embodiment of the present disclosure, the receiving of a write request from the network card includes the IIO 502 receiving a DMA write request from the network card through a PCIe bus; and the receiving of a read request from the designated storage device includes the IIO 502 receiving a DMA read request from the designated storage device through the PCIe bus.
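The receive path described in the preceding paragraphs can be sketched end to end: the network card DMA-writes received data into the receiving cache region, and the designated storage device later DMA-reads it out, so in this model the payload moves NIC → cache → storage. This is a software-only illustration with invented function names:

```python
# End-to-end sketch of the receive path: the NIC writes into the receiving
# cache region and the designated storage device reads from it.

receive_region = bytearray(64)               # stands in for the cache region

def nic_dma_write(addr, payload):            # DMA write from the network card
    receive_region[addr:addr + len(payload)] = payload

def storage_dma_read(addr, length):          # DMA read by the storage device
    return bytes(receive_region[addr:addr + length])

nic_dma_write(0, b"hello")
print(storage_dma_read(0, 5))    # b'hello'
```

The send path of the later paragraphs is symmetric: the storage device writes into the sending cache region and the network card reads from it.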
According to an embodiment of the present disclosure, the designated buffer is a sending buffer for storing data to be sent through the network; the designated cache region is a sending cache region corresponding to the sending buffer and is used for caching data written to addresses in the sending buffer; and the allocating of the designated cache region for the designated thread includes allocating, in the cache, a sending cache region corresponding to the sending buffer according to an instruction for loading the sending buffer into the cache.
According to an embodiment of the present disclosure, the receiving of a write request includes receiving a write request from a designated storage device; the write address includes a write address in the sending buffer; the write data includes data to be sent through the network; and writing the write data into the address corresponding to the write address in the designated cache region includes writing the data to be sent into the address corresponding to the write address in the sending cache region.
In accordance with an embodiment of the present disclosure, the IIO 502 receives a read request from the network card, the read request including a read address in the sending buffer; the cache 504 reads, according to the read request, the corresponding data from the address corresponding to the read address in the sending cache region and returns it to the network card.
According to an embodiment of the present disclosure, the receiving a write request from a designated storage device includes receiving a DMA write request from the designated storage device over a PCIe bus; the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
An embodiment of the present disclosure further provides an electronic device including the above processor, a memory, a network card, and a designated storage device, where the processor is connected to the memory, the network card, and the designated storage device.
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 6, the electronic device includes a processor, a memory, a network card, and a designated storage device, where the processor is connected to the memory, the network card, and the designated storage device.
The processor may be implemented as a processor including the data transceiving apparatus 400 described above, or as the processor 500, according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the electronic device may be a server.
FIG. 7 shows a schematic diagram of a computer system according to an embodiment of the present disclosure.
As shown in fig. 7, an embodiment of the present disclosure further provides a computer system 700, which includes a plurality of computers 700-1, 700-2, …, 700-N, where each computer includes a processor, a memory, a network card, and a designated storage device, and the processor is connected to the memory, the network card, and the designated storage device. The processor may be implemented as a processor including the data transceiving apparatus 400 described above, or as the processor 500, according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the computer system may be a server system of a data center, and the computer may be a server of the data center.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software, or by programmable hardware or dedicated hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
The foregoing description covers only preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features of similar function disclosed in (but not limited to) the present disclosure.

Claims (14)

1. A data transmitting and receiving method comprises the following steps:
allocating a designated cache region for a designated thread in a cache, wherein the designated thread is used for receiving and sending data through a network, the designated cache region is used for caching data written in an address in a designated buffer region, and the designated buffer region is established in a memory by the designated thread;
receiving a write request, wherein the write request comprises a write address and write data in the designated buffer, and the write data comprises data to be sent by the designated thread through a network or data received by the designated thread through the network;
and for the received write request, writing the write data in the write request into an address corresponding to the write address in the designated cache region.
2. The method of claim 1, wherein:
the designated cache region is not allowed to be allocated to threads other than the designated thread;
the designated cache region is positioned in a third-level cache of the central processing unit and is smaller than the size of the third-level cache;
the designated cache region allows threads other than the designated thread to perform read and write operations, and does not allow threads other than the designated thread to perform operations other than read and write operations.
3. The method of claim 1, further comprising:
receiving a read request, wherein the read request comprises a read address in the designated buffer;
for any received read request, reading corresponding data from an address corresponding to the read address in the designated cache region;
and after reading corresponding data from the specified cache region according to the read address in the read request, allowing the specified thread to write data to the address in the specified cache region corresponding to the read address.
4. The method of claim 1, wherein:
the designated buffer is a reception buffer for storing reception data received through a network;
the designated cache region is a receiving cache region corresponding to the receiving buffer and is used for caching data written to addresses in the receiving buffer;
the allocating of the designated cache area for the designated thread includes:
and according to the instruction for loading the receiving buffer area into the cache, allocating a receiving cache area corresponding to the receiving buffer area in the cache.
5. The method of claim 4, wherein:
the receiving of the write request comprises receiving the write request from a network card;
the write address comprises a write address in the receive buffer;
the write data includes receive data received over a network;
the writing the write data in the write request into the address corresponding to the write address in the designated cache area comprises writing the received data into the address corresponding to the write address in the receiving cache area.
6. The method of claim 5, further comprising:
receiving a read request from a designated storage device, the read request including a read address in the receive buffer;
and reading corresponding data from the address corresponding to the read address in the receiving cache region to the specified storage device according to the read request.
7. The method of claim 6, wherein:
the receiving of the write request from the network card comprises receiving a DMA write request from the network card through a PCIe bus;
the receiving a read request from the designated storage device includes receiving a DMA read request from the designated storage device over a PCIe bus.
8. The method of claim 1, wherein:
the designated buffer is a transmission buffer for storing transmission data to be transmitted through a network;
the designated cache region is a sending cache region corresponding to the sending buffer and is used for caching data written to addresses in the sending buffer;
the allocating of the designated cache area for the designated thread includes:
and according to the instruction for loading the sending buffer area into the cache, allocating a sending cache area corresponding to the sending buffer area in the cache.
9. The method of claim 8, wherein:
the receiving a write request comprises receiving a write request from a specified storage device;
the write address comprises a write address in the sending buffer;
the write data includes transmission data to be transmitted through a network;
and writing the write data in the write request into the address corresponding to the write address in the designated cache region comprises writing the sending data into the address corresponding to the write address in the sending cache region.
10. The method of claim 9, further comprising:
receiving a read request from a network card, wherein the read request comprises a read address in the sending buffer area;
and reading corresponding data from the address corresponding to the reading address in the sending buffer area to the network card according to the reading request.
11. The method of claim 10, wherein:
the receiving a write request from a designated storage device comprises receiving a DMA write request from the designated storage device over a PCIe bus;
the receiving a read request from the network card includes receiving a DMA read request from the network card over a PCIe bus.
12. A processor comprises a processor core, an integrated input and output module and a cache, wherein:
the processor core allocates a designated cache region for a designated thread in a cache, the designated thread is used for receiving and sending data through a network, the designated cache region is used for caching data written with addresses in a designated buffer region, and the designated buffer region is established in a memory by the designated thread;
the integrated input and output module receives a write request, wherein the write request comprises a write address and write data in the designated buffer area, and the write data comprises data to be sent by the designated thread through a network or data received by the designated thread through the network;
and for any write request received by the integrated input and output module, the cache writes write data in the write request into an address corresponding to the write address in the specified cache region.
13. An electronic device comprising the processor of claim 12, a memory, a network card, a designated storage device, the processor being coupled to the memory, the network card, the designated storage device.
14. A computer system comprising a plurality of computers, said computers comprising the processor of claim 12, a memory, a network card, a designated storage device, said processor coupled to said memory, said network card, said designated storage device.
CN202210894156.3A 2022-07-27 2022-07-27 Data transceiving method, processor, electronic device and computer system Pending CN115269453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210894156.3A CN115269453A (en) 2022-07-27 2022-07-27 Data transceiving method, processor, electronic device and computer system

Publications (1)

Publication Number Publication Date
CN115269453A true CN115269453A (en) 2022-11-01

Family

ID=83771259



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827709A (en) * 2024-03-05 2024-04-05 龙芯中科(北京)信息技术有限公司 Method, device, equipment and storage medium for realizing direct memory access
CN117827709B (en) * 2024-03-05 2024-05-03 龙芯中科(北京)信息技术有限公司 Method, device, equipment and storage medium for realizing direct memory access


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination