WO2021089036A1 - Data transmission method, network device, network system and chip - Google Patents

Data transmission method, network device, network system and chip

Info

Publication number
WO2021089036A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
written
write request
destination
network device
Prior art date
Application number
PCT/CN2020/127446
Other languages
French (fr)
Chinese (zh)
Inventor
游俊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2021089036A1 publication Critical patent/WO2021089036A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services

Definitions

  • This application relates to the storage field, and in particular to a data transmission method, network equipment, network system and chip.
  • Data is generally transferred by means of Remote Direct Memory Access (RDMA). For example, when data from a source device needs to be transferred to a destination device, the data of the source device is first transmitted to the network card of the destination device, which is equipped with an RDMA chip. The data can then be written directly into the memory of the destination device through the RDMA chip in the network card, without the participation of the processor of the destination device, which saves the computing resources of the destination device.
  • The written data needs to be verified. Because the data is transmitted through RDMA without the participation of the processor of the destination device, the destination device cannot verify the written data, and the verification must be performed by the source device. For the source device to perform the verification, the data written into the memory of the destination device must be read back to the source device, which not only consumes network resources but also increases the delay of data transmission.
  • The present invention provides a data transmission method, network device, network system, and chip for performing data verification on the network device of the destination device, which not only saves network transmission resources but also reduces the delay of data transmission.
  • The first aspect of the present invention provides a network device configured as a destination network device (such as a network card of the destination device). The network device receives a write request from a source device (such as a source computing device) and executes the write request to write the data to be written in the write request to a destination device (such as a destination computing device); after the data to be written is written to the destination device, it reads the written data from the destination device; it compares the data to be written in the write request with the read written data; when the data to be written is the same as the read written data, it returns a write success response to the source device; and when the data to be written is different from the read written data, it returns a write failure response to the source device.
  • In this way, after the data transmitted from the source device (such as the source computing device) to the destination device (such as the destination computing device) is written into the destination device through the network device of the destination device (referred to as the destination network device for short, such as the network card of the destination device), the destination network device verifies the written data without transmitting the data to the source device for verification, thereby reducing the consumption of network transmission resources in the prior art. After the destination network device verifies the written data, it directly returns a data write success response to the source device, which avoids the communication overhead of the prior art and thereby reduces the data transmission delay as a whole.
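  • The write, read-back, and compare sequence of the first aspect can be sketched as follows. This is a minimal illustration only: the `WriteRequest` and `DestinationNic` names, the `bytearray` standing in for the memory of the destination device, and the response strings are all assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    dest_addr: int   # destination address in the destination device's memory
    data: bytes      # data to be written


class DestinationNic:
    """Toy model of the destination network device (hypothetical names)."""

    def __init__(self, memory: bytearray):
        self.memory = memory  # stands in for the destination device's memory

    def handle(self, req: WriteRequest) -> str:
        cached = bytes(req.data)                           # cache the data to be written
        end = req.dest_addr + len(cached)
        self.memory[req.dest_addr:end] = cached            # write into destination memory
        read_back = bytes(self.memory[req.dest_addr:end])  # read the written data back
        # Compare locally, so nothing is sent back to the source for verification.
        return "WRITE_SUCCESS" if read_back == cached else "WRITE_FAILURE"


nic = DestinationNic(bytearray(64))
status = nic.handle(WriteRequest(dest_addr=8, data=b"hello"))
```

  • In this sketch the comparison runs entirely on the destination side, so only the final response would travel back to the source device.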
  • The network device configured as a destination network device is further configured to cache the data to be written in the write request after receiving the write request from the source device.
  • An area can be allocated in the cache of the destination network device to cache the data to be written in the write request.
  • The network device configured as a destination network device is specifically configured to: parse the write request to obtain the data to be written and the destination address of the data to be written; and write the data to be written into the storage space identified by the destination address in the destination device (such as the destination computing device).
  • The aforementioned storage space is the memory of the destination device.
  • The network device is configured as a source-end network device (such as a network card of a source-end computing device) and is configured to: receive, from the source device (such as the source computing device), stripe data containing at least two strips, the write address of each strip contained in the stripe data, and the write address of at least one piece of check data; generate the at least one piece of check data according to the at least two strips; for each strip, generate a write request according to the strip and the write address of the strip; for each piece of check data, generate a write request according to the check data and the write address of the check data; and send each generated write request to a destination device.
  • The generation of check data is offloaded from the source device (such as the source computing device) to the source network device (such as the network card of the source computing device).
  • The network device is configured as a source-end network device and is configured to: receive the data to be written sent by the source device and at least one logical address for storing at least one copy of the data to be written; lock the at least one logical address; for each logical address, generate a write request based on the data to be written and the logical address; and send each of the generated write requests to a target network device.
  • Offloading the generation of duplicate data from the source device (such as the source computing device) to the source network device (such as the network card of the source computing device) can reduce the load of the source device and save its computing resources, thereby improving the data processing efficiency of the source device as a whole.
  • The network device is configured as a source-end network device and is configured to send each of the generated write requests to a target network device in parallel.
  • Sending multiple write requests to the target network devices in parallel enhances the concurrency of data transmission of the source network device (such as the network card of the source computing device), thereby improving data transmission efficiency.
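  • As a rough illustration of sending several write requests in parallel, the sketch below uses a thread pool. The `send_write_request` function is a hypothetical stand-in for the actual transmission to a target network device, and the request dictionaries and target names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def send_write_request(request: dict) -> str:
    """Hypothetical stand-in for transmitting one write request to a target network device."""
    return f"sent {len(request['data'])} bytes to {request['target']}"


def send_in_parallel(requests: list) -> list:
    # One worker per request so that no request waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        return list(pool.map(send_write_request, requests))  # results keep input order


requests = [
    {"target": "nic-a", "data": b"copy"},
    {"target": "nic-b", "data": b"copy"},
    {"target": "nic-c", "data": b"copy"},
]
results = send_in_parallel(requests)
```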
  • the second aspect of the present invention provides a data transmission method, which can be executed by a destination network device.
  • the method includes:
  • After the data transmitted from the source device to the destination device is written into the destination device through the network device of the destination device (referred to as the destination network device), the destination network device verifies the written data without transmitting the data to the source device for verification, thereby reducing the consumption of network resources in the prior art. After the destination network device verifies the written data, it directly returns a data write success response to the source device, avoiding the communication overhead of the prior art and thereby reducing the data transmission delay as a whole.
  • The method is executed by a destination network device and further includes: after receiving the write request, caching the data to be written in the write request.
  • The write request further includes the destination address of the data to be written, and executing the write request to write the data to be written in the write request to the destination device includes: writing the data to be written in the write request into the storage unit of the destination device according to the destination address in the write request.
  • the method is executed by a source network device.
  • the method also includes:
  • generating a plurality of write requests according to the write address of the at least one piece of check data, the plurality of strips, and the at least one piece of check data, wherein each strip corresponds to one write request and each piece of check data corresponds to one write request;
  • sending each of the multiple write requests to a destination device.
  • Offloading the generation of check data from the source device (such as the source computing device) to the source network device (such as the network card of the source computing device) reduces the load of the source device and saves its computing resources, thereby improving the data processing efficiency of the source device.
  • the method further includes:
  • Each of the multiple write requests is sent to a target network device respectively.
  • Offloading the generation of duplicate data from the source device (such as the source computing device) to the source network device (such as the network card of the source computing device) can reduce the load of the source device and save its computing resources, thereby improving the data processing efficiency of the source device as a whole.
  • Each of the multiple write requests is sent to a target network device in parallel.
  • Sending multiple write requests to the target network devices in parallel enhances the concurrency of data transmission of the source network device (such as the network card of the source computing device), thereby improving data transmission efficiency.
  • The third aspect of the present invention provides a network system. The network system includes a source device (such as a source computing device), a destination device (such as a destination computing device), and, connected to the destination device, the destination network device (such as the network card of the destination computing device) provided by the first aspect or any implementation of the first aspect.
  • The fourth aspect of the present invention provides a chip, including an interface and a processor. The interface is used to provide input and output data for the processor; the processor is used to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
  • The chip can be a CPU (central processing unit), MCU (microcontroller unit), MPU (microprocessor unit), DSP (digital signal processor), SoC (system on chip), ASIC (application-specific integrated circuit), FPGA (field-programmable gate array), or PLD (programmable logic device).
  • The fifth aspect of the present invention provides a network device. The network device includes a processor and a memory; the memory stores program instructions, and the processor executes the program instructions in the memory to perform the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
  • The sixth aspect of the present invention provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
  • These computer-readable storage media include but are not limited to one or more of the following: ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), flash memory, EEPROM (electrically erasable PROM), and hard drive.
  • The present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
  • The third aspect and any of its possible implementations correspond to the first aspect and any of its possible implementations, respectively; the fourth, fifth, sixth, and seventh aspects and any of their possible implementations correspond to the second aspect and any of its possible implementations, respectively. The technical effects of these implementations are not repeated here.
  • FIG. 1 is an architecture diagram of a data transmission system provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a data transmission method provided by an embodiment of the present invention.
  • FIG. 3 is a block diagram of a network device provided by an embodiment of the present invention.
  • FIG. 4 is a hardware structure diagram of a network device provided by an embodiment of the present invention.
  • FIG. 1 is an architecture diagram of a network transmission system. The source device 10 and the destination device 20 are connected through a network 30.
  • An applicable scenario of the embodiment of the present invention is as follows: the source device 10 and the destination device 20 may be storage arrays. When a host (not shown) sends a write request to the source device 10, the source device 10 writes the data to be written in the write request into multiple destination devices 20 according to a preset reliability strategy.
  • The embodiment of the present invention takes only one destination device 20 as an example for description.
  • The source device 10 includes a processing unit 101, a storage unit 102, and a network device 103. The network device 103 includes a remote direct memory access (RDMA) chip 1031.
  • The destination device 20 includes a processing unit 201, a storage unit 202, and a network device 203. The network device 203 includes an RDMA chip 2031.
  • The processing unit 101 and the processing unit 201 can have a variety of specific implementation forms, such as a central processing unit (CPU) or a graphics processing unit (GPU), and can be a single-core processor or a multi-core processor. If implemented as a multi-core processor, the number of processors is not limited.
  • The storage unit 102 and the storage unit 202 are the memories of the source device 10 and the destination device 20, respectively, and can be dynamic random access memory (DRAM) or storage class memory (SCM). They are used to store program code and data so that the processing units can call and execute them to implement related functions.
  • The network device 103 and the network device 203 are network cards belonging to the source device 10 and the destination device 20, respectively. In other embodiments, the network device 103 and the network device 203 are switches connected to the source device 10 and the destination device 20, respectively.
  • When data is transmitted from the source device 10 to the destination device 20, the RDMA method is adopted. Specifically, the processing unit 101 of the source device 10 first sends the write request to the network device 103 of the source device 10; the RDMA chip 1031 of the network device 103 forwards the write request to the network device 203 of the destination device 20; and the RDMA chip 2031 of the network device 203 writes the data in the write request to the storage unit 202 of the destination device 20. In this process, the processing unit 201 of the destination device 20 does not need to participate, so the computing resources of the destination device 20 are saved.
  • The written data must be verified. However, because the processing unit 201 of the destination device 20 does not participate in the data transmission, data verification cannot be performed by the destination device. The commonly used method is to use the processing unit 101 of the source device 10 to verify the data written in the storage unit 202 of the destination device 20. The verification process includes:
  • After the data is written into the storage unit 202 of the destination device 20, the destination device 20 sends response information to the source device 10.
  • The processing unit 101 of the source device 10 then initiates a read request to read the data written in the storage unit 202 of the destination device 20 back to the source device 10, and compares whether the data in the write request is consistent with the data read from the storage unit 202 of the destination device 20.
  • In this method, the written data needs to be read from the destination device 20 to the source device 10, which increases the consumption of network bandwidth. In addition, because the source device must send an instruction to read the data written into the destination device, the communication overhead of the data transmission process increases, thereby increasing the overall delay of the data write operation.
  • The present invention provides a data transmission method. After the data transmitted from the source device 10 to the destination device 20 is written into the storage unit 202 of the destination device 20 through the network device 203 of the destination device 20, the network device 203 verifies the written data without transmitting the data to the source device 10 for verification, thereby reducing the consumption of network resources. After the written data is verified, the network device 203 directly returns a data write success response to the source device 10, which avoids the communication overhead in the prior art and thereby reduces the data transmission delay as a whole.
  • The data transmission method provided by the embodiment of the present invention is described below with reference to FIG. 2, taking as an example the case in which data generated by the source device 10 is written into the destination device 20.
  • FIG. 2 is a flowchart of a method for writing data from the source device 10 to the destination device 20 in an embodiment of the present invention.
  • Step S201: the source device 10 receives a write request sent by the host. The write request carries the data to be written and the logical address of the data to be written.
  • When the source device 10 is used as a storage device, it stores the data to be written in the write request after receiving the write request from the host.
  • The write request of the host may be issued by an application running on the host or by an application running in a virtual machine of the host.
  • Step S202: the processing unit 101 of the source device 10 determines the reliability strategy of the source device 10. If the source device 10 supports an erasure code (EC) or redundant array of independent drives (RAID) strategy, step S203 is executed; if the source device 10 supports a multi-copy reliability strategy, step S206 is executed.
  • Data loss may occur, for example during data transmission. Therefore, to ensure the reliability of data, reliability strategies are set up and implemented (for example, storing redundant data), so that when data is lost, the redundant data can be used for data recovery.
  • One type of reliability strategy is the EC or RAID strategy: for example, stripe data can be divided into three strips and one piece of check data calculated for the three strips, or stripe data can be divided into three strips and two pieces of check data calculated for the three strips. The other type is the multi-copy strategy: when data is stored, multiple backup copies are generated for the data to be written, and different copies are written to different destination devices, so that when the data is damaged it can be restored by reading a backup copy.
  • An application generally sets a reliability strategy in advance. After the processing unit 101 generates a write request, it checks the reliability strategy preset by the application and performs different processing according to the strategy.
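  • To make the strips-plus-check-data idea concrete, the sketch below computes one piece of check data for three strips and rebuilds a lost strip from the survivors. The text does not specify the check algorithm; byte-wise XOR parity (as used in RAID 5) is assumed here purely for illustration, and the function name and sample strips are hypothetical.

```python
from functools import reduce


def xor_parity(strips: list) -> bytes:
    """XOR equal-length strips byte by byte to obtain one piece of check data."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


strips = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe divided into three strips
parity = xor_parity(strips)            # one piece of check data (assumed XOR parity)

# Simulate losing the second strip and rebuilding it from the survivors.
recovered = xor_parity([strips[0], strips[2], parity])
```

  • Because XOR is its own inverse, combining the surviving strips with the check data yields the missing strip. Schemes with two pieces of check data need a different code and are not covered by this sketch.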
  • Step S203: if it is determined in step S202 that the reliability strategy supported by the application is an EC or RAID strategy, then when the processing unit 101 of the source device 10 determines that the data to be written makes up complete stripe data, it sends the data to be written that makes up the stripe data, together with the related address information, to the network device 103 of the source device (also referred to as the source network device).
  • When the EC or RAID strategy is used, the data to be written needs to be combined into stripe data. Each piece of stripe data is composed of multiple strips. The source device 10 sets in advance the size of each strip and the number of strips that make up the stripe data. Once the stripe data is made up, it is sent to the network device 103 of the source device. For example, if the source device 10 adopts RAID 5, the data to be written needs to fill stripe data composed of three strips before being sent to the network device 103.
  • Different strips are written to different locations, and check data of the stripe data also needs to be generated. Therefore, the source device 10 also sends the address of the location where each strip is to be written and the address of the check data to the network device 103 of the source device. In the embodiment of the present invention, although the check data for the strips is not generated in the source device 10, the address where the check data is to be stored is still sent to the network device 103.
  • Step S204: after receiving the stripe data, the network device 103 (that is, the source-end network device) calculates check data for the stripe data.
  • Check data is generated for the stripe data so that, if one of the strips is damaged, the check data can be used for recovery. The general practice is that the processing unit of the source device generates the check data after forming the stripe data, which occupies the computing resources of the processing unit of the source device and thereby affects IO processing. In the embodiment of the present invention, the check data of the stripe data can be calculated in the network device 103, which saves the computing resources of the source device 10.
  • One specific implementation is to add a dedicated chip to the network device 103 to generate the check data; another is to install program instructions in the network device 103, which are called by the microprocessor in the network device 103 to generate the check data.
  • After the network device 103 receives the stripe data, it first caches the stripe data in the cache of the network device and then calculates the check data.
  • Step S205: the network device 103 (that is, the source network device) generates multiple sub-write requests and sends them to different destination devices in parallel according to the addresses of the sub-write requests.
  • The network device 103 generates the sub-write requests according to the stripe data transmitted by the processing unit 101 of the source device, the address of each strip, and the address of the check data, where one strip corresponds to one sub-write request and one piece of check data corresponds to one sub-write request.
  • The network device 103 can send the multiple sub-write requests to different destination devices in parallel, which improves the efficiency of data transmission compared with serial transmission by the source device.
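  • The one-request-per-strip and one-request-per-check-data mapping of step S205 might be sketched as below. The `SubWriteRequest` structure, the `(destination, address)` tuples, and the device names are hypothetical; the application does not define a concrete request format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubWriteRequest:
    destination: str  # hypothetical identifier of a destination device
    address: int      # write address inside that destination device
    payload: bytes    # one strip or one piece of check data


def build_sub_write_requests(strips, strip_addrs, checks, check_addrs):
    """Return one sub-write request per strip and one per piece of check data."""
    if len(strips) != len(strip_addrs) or len(checks) != len(check_addrs):
        raise ValueError("every strip and every piece of check data needs a write address")
    payloads = list(strips) + list(checks)
    addrs = list(strip_addrs) + list(check_addrs)
    return [SubWriteRequest(dest, addr, data) for (dest, addr), data in zip(addrs, payloads)]


subs = build_sub_write_requests(
    strips=[b"AAAA", b"BBBB", b"CCCC"],
    strip_addrs=[("dev-1", 0x100), ("dev-2", 0x100), ("dev-3", 0x100)],
    checks=[b"@@@@"],
    check_addrs=[("dev-4", 0x100)],
)
```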
  • Step S206: if it is determined in step S202 that the reliability strategy supported by the application is a multi-copy strategy, the processing unit 101 of the source device 10 forwards the write request to the network device 103 of the source device 10. The write request carries the data to be written and the logical address of the data to be written.
  • Step S207: the network device 103 of the source device locks the logical address of the data to be written carried in the write request.
  • A persistence log (Plog) is generated for each application in the source device 10, and the log records the logical space allocated by the source device 10 for each application. According to the number of backup copies and the backup addresses set for the application data, the physical addresses of the storage units in the destination devices 20 that store the backup data are set for the logical space of the application.
  • After the source device 10 receives the write request, it locks the logical address carried by the write request as recorded in the Plog, to prevent the logical address from being assigned to other write requests. When the logical address is locked, the physical addresses of the storage units 202 in the destination devices 20 that correspond to the logical address are also locked, which ensures the reliability of the data.
  • Having the source device 10 perform Plog space allocation increases the load of the source device 10, so in the embodiment of the present invention this function is offloaded to the network device 103; that is, the Plog is transferred to the network device 103 for storage. After the network device 103 receives the write request sent by the source device 10, the logical address of the write request is locked in the Plog to prevent the logical address from being allocated to other write requests.
  • After the network device 103 receives the write request, it first caches the data to be written in the write request in the cache of the network device and then performs subsequent processing.
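  • The address-locking behavior can be pictured with a small lock table. This is only a sketch under stated assumptions: the `AddressLockTable` class is a hypothetical stand-in for the locking state kept with the Plog, and the `unlock` step is an assumption added for completeness, since the text does not describe when an address is released.

```python
import threading


class AddressLockTable:
    """Toy stand-in for the Plog's record of locked logical addresses (hypothetical)."""

    def __init__(self):
        self._locked = set()
        self._guard = threading.Lock()  # protects the set against concurrent requests

    def try_lock(self, logical_addr: int) -> bool:
        with self._guard:
            if logical_addr in self._locked:
                return False            # already assigned to another write request
            self._locked.add(logical_addr)
            return True

    def unlock(self, logical_addr: int) -> None:
        # Assumed release step; not described in the text.
        with self._guard:
            self._locked.discard(logical_addr)


table = AddressLockTable()
first = table.try_lock(0x2000)    # first write request obtains the address
second = table.try_lock(0x2000)   # a competing request is refused
table.unlock(0x2000)
third = table.try_lock(0x2000)    # available again after release
```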
  • Step S208: the network device 103 of the source device 10 obtains the write addresses of the multiple copies of the data to be written carried in the write request, and generates multiple sub-write requests based on the write addresses of the multiple copies and the data to be written.
  • The write addresses of the multiple copies can be determined according to the logical address of the data to be written. In this way, multiple sub-write requests can be generated according to the data to be written and the write addresses of the multiple copies. Each sub-write request carries the data to be written and the write address of one copy.
  • Step S209: the network device 103 of the source device sends the multiple sub-write requests to different destination devices 20 according to the address of each sub-write request.
  • In the embodiment of the present invention, the strips of the stripe data or the different copies are stored in different destination devices; however, storing the strips of the stripe data or the different copies at different addresses of the same destination device is also within the protection scope of the present invention.
  • Step S210: after receiving the sub-write request, the network device 203 of the destination device 20 writes the data into the storage unit 202 of the destination device 20 through RDMA. After the data is successfully written into the storage unit 202, the storage unit 202 generates a write success response.
  • Step S211: after receiving the write success response, the network device 203 reads data through RDMA from the address in the storage unit 202 indicated by the sub-write request, and compares the read data with the data to be written in the sub-write request.
  • The verification method adopted in the embodiment of the present invention is that, after the network device 203 writes the data to be written into the storage unit 202, it reads the data back from the written address. If an address offset has occurred, the data read from the written address differs from the data that was written, so the address write error can be detected.
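  • The following sketch illustrates why reading back from the intended address exposes an address offset. The `FaultyMemory` class and its `offset` parameter are hypothetical and exist only to simulate a write that lands at the wrong location; they are not part of the described embodiment.

```python
class FaultyMemory:
    """Toy memory whose writes land `offset` bytes away from the requested address."""

    def __init__(self, size: int, offset: int = 0):
        self.cells = bytearray(size)
        self.offset = offset  # hypothetical fault: 0 means the memory behaves correctly

    def write(self, addr: int, data: bytes) -> None:
        start = addr + self.offset
        self.cells[start:start + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.cells[addr:addr + length])


def write_and_verify(memory: FaultyMemory, addr: int, data: bytes) -> bool:
    memory.write(addr, data)
    return memory.read(addr, len(data)) == data  # read back from the intended address


ok = write_and_verify(FaultyMemory(32), addr=4, data=b"data")
shifted = write_and_verify(FaultyMemory(32, offset=2), addr=4, data=b"data")
```

  • With no offset the comparison succeeds; with the simulated two-byte offset the bytes at the intended address no longer match, which corresponds to the write failure case of step S213.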
  • Step S212: if the data is consistent, the network device 203 of the destination device 20 returns a write success response to the source device 10.
  • Step S213: if the data is inconsistent, the network device 203 of the destination device 20 returns a write failure response to the source device 10.
  • In this way, the data written into the storage unit 202 of the destination device 20 is verified by the network device 203 of the destination device 20, without consuming network bandwidth, and the latency of data writing is reduced.
  • FIG. 3 is an architecture diagram of a network device in an embodiment of the present invention.
  • the network device may be the network device 103 of the source device 10 in FIG. 1 or the network device 203 of the destination device 20.
  • The network device includes a verification data generation module 301, an address locking module 302, a concurrency module 303, an RDMA module 304, and a post-write verification module 305.
  • The verification data generation module 301, the address locking module 302, the concurrency module 303, and the RDMA module 304 are used when the network device serves as the network device of the source device 10. The RDMA module 304 and the post-write verification module 305 are used when the network device serves as the network device of the destination device 20.
  • The source device 10 sends the striped data to the source network device, and the verification data generation module 301 generates the verification data for the striped data.
  • The concurrency module 303 generates multiple sub-write requests according to the strips and the verification data in the striped data and the address of each strip and each piece of verification data, and concurrently sends the generated sub-write requests to different destination devices 20 according to their addresses. For details, refer to the description of step S205 in FIG. 2.
  • the source device will send the write request to the network device of the source device and at the same time send the addresses written by multiple copies to the network device of the source device.
  • The address locking module 302 locks the addresses in the storage units of the destination devices according to the address of each copy sent by the source device, to ensure the consistency of the multiple copies. For details, refer to the description of step S207 in FIG. 2. After the addresses are locked, the concurrency module 303 generates a sub-write request for each copy according to the data to be written in the write request and the address of each copy, and sends the multiple sub-write requests in parallel to different destination devices 20 according to the addresses in the sub-write requests.
  • When the network device serves as the network device of the destination device and receives a sub-write request sent by the network device of the source device, the RDMA module 304 first writes the data to be written in the sub-write request into the storage unit of the destination device. For details, refer to the description of step S210 in FIG. 2. After the data is written, the post-write verification module 305 verifies whether the written data is correct. For details, refer to the description of steps S211 to S213 in FIG. 2.
  • Each module in the network device can be implemented by software, hardware or a combination of both.
  • The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing.
  • The processor can be a single semiconductor chip, or it can be integrated with other circuits into one semiconductor chip. For example, it can form a system on chip (SoC) with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits), or it can be integrated into an ASIC as a built-in processor of the ASIC, and the ASIC integrated with the processor can be packaged separately or together with other circuits.
  • The processor may further include necessary hardware accelerators, such as a field-programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
  • The hardware circuits may be implemented by a general-purpose central processing unit (CPU), a microcontroller unit (MCU), a microprocessor unit (MPU), a digital signal processor (DSP), or a system on chip (SoC); by an application-specific integrated circuit (ASIC); or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. Such circuits may run the necessary software, or may execute the above method flow without relying on software.
  • FIG. 4 is a hardware architecture diagram of the network device in an embodiment of the present invention.
  • the network device includes a processor 410, a memory 420, an interface 430, and an RDMA chip.
  • Program instructions are stored in the memory.
  • When the verification data generation module 301, address locking module 302, concurrency module 303, and post-write verification module 305 in FIG. 3 are implemented in software, the programs corresponding to these modules are stored in the memory 420. The processor 410 calls and executes these programs to implement the functions performed by the verification data generation module 301, the address locking module 302, the concurrency module 303, and the post-write verification module 305.
  • the interface 430 is used to receive data from the source device or the destination device or send data to the source device or the destination device.
  • the RDMA chip is used to directly write the data received from the source device into the storage unit of the destination device.
  • The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing.
  • The verification data generation module 301, address locking module 302, concurrency module 303, and post-write verification module 305 in FIG. 3 can also be implemented by hardware. In that case, these modules are integrated into the chip of the network device as hardware modules, and the processor of the network device is an application-specific integrated circuit (ASIC) or an independent semiconductor chip.
  • In addition to the cores that execute software instructions to perform calculations or processing, the processor may further include necessary hardware accelerators, such as a field-programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
  • When the method is applied to data migration between the source device and the destination device, only steps S209 to S213 need to be performed.
  • In step S209, the data carried in the sent write request is the data that needs to be migrated from the source device 10 to the destination device 20.
  • The other steps are the same as steps S210 to S213 and are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data transmission method, network device, network system and chip. When the network device is configured as a destination network device, it receives a write request from a source device and executes the write request to write the data to be written in the write request to the destination device. After the data to be written is written into the destination device, the written data is read from the destination device and compared with the data to be written in the write request. When the data to be written is the same as the read written data, a write success response is returned to the source device; when the data to be written is different from the read written data, a write failure response is returned to the source device. With the method of the present application, the written data is verified by the destination network device, and there is no need to transmit the data to the source device for verification, which reduces both the consumption of network transmission resources and the time delay of data transmission.

Description

Data transmission method, network device, network system and chip
Technical field
This application relates to the storage field, and in particular to a data transmission method, a network device, a network system, and a chip.
Background
To reduce the use of a device's computing resources during data transmission, data is generally transmitted by remote direct memory access (RDMA). For example, when data of a source device needs to be transmitted to a destination device, the data is first transmitted to the network card of the destination device. The network card is provided with an RDMA chip, which can write the data directly into the memory of the destination device without involving the processor of the destination device, thereby saving the computing resources of the destination device.
In addition, to ensure the reliability of the transmitted data, the written data needs to be verified after it is written to the destination device. Because the processor of the destination device does not participate when data is transmitted by RDMA, the destination device cannot verify the written data, and verification has to be performed by the source device. For that, the data written into the memory of the destination device must be read back to the source device, which not only consumes network resources but also increases the data transmission delay.
Summary of the invention
The present invention provides a data transmission method, a network device, a network system, and a chip, which perform data verification on the network device of the destination device. This not only saves network transmission resources but also reduces the data transmission delay.
A first aspect of the present invention provides a network device configured as a destination network device (for example, the network card of a destination device). The network device receives a write request from a source device (for example, a source computing device) and executes the write request to write the data to be written in the write request into a destination device (for example, a destination computing device). After the data to be written has been written into the destination device, the network device reads the written data from the destination device and compares the data to be written in the write request with the written data that was read. When the two are the same, it returns a write success response to the source device; when they differ, it returns a write failure response to the source device.
With the network device provided by the present invention, data transmitted from a source device (for example, a source computing device) to a destination device (for example, a destination computing device) is written into the destination device by the network device of the destination device (the destination network device for short), and is then verified by that network device (the network card of the destination device), without transmitting the data back to the source device for verification. This reduces the consumption of network transmission resources compared with the prior art. After verifying the written data, the network device of the destination device directly returns a write success response to the source device, which avoids the communication overhead of the prior art and reduces the overall data transmission delay.
In a possible implementation of the first aspect of the present invention, the network device configured as a destination network device is further configured to cache the data to be written in the write request after receiving the write request from the source device. In a specific implementation, an area may be allocated in the cache of the destination network device to cache the data to be written in the write request.
In another possible implementation of the first aspect of the present invention, the network device configured as a destination network device is specifically configured to: parse the write request to obtain the data to be written and the destination address of the data to be written; and write the data to be written into the storage space identified by the destination address in the destination device (for example, a destination computing device). In a specific implementation, the storage space is the memory of the destination device.
In another possible implementation of the first aspect of the present invention, the network device is configured as a source network device (for example, the network card of a source computing device) to: receive, from the source device (for example, a source computing device), striped data containing at least two strips, together with the write address of each strip in the striped data and the write address of at least one piece of check data; generate the at least one piece of check data from the at least two strips; for each strip, generate a write request from the strip and its write address; for each piece of check data, generate a write request from the check data and its write address; and send each generated write request to a respective destination device.
In this possible implementation, the generation of check data is offloaded from the source device (for example, a source computing device) to the source network device (for example, the network card of the source computing device). This reduces the load on the source device and saves its computing resources, improving the overall data processing efficiency of the source device.
In another possible implementation of the first aspect of the present invention, the network device is configured as a source network device to: receive the data to be written sent by the source device, and at least one logical address for storing at least one logical copy of the data to be written; lock the at least one logical address; for each logical address, generate a write request from the data to be written and the logical address; and send each of the generated write requests to a respective target network device.
In this possible implementation, the generation of copy data is offloaded from the source device (for example, a source computing device) to the source network device (for example, the network card of the source computing device). This reduces the load on the source device and saves its computing resources, improving the overall data processing efficiency of the source device.
In yet another possible implementation of the first aspect of the present invention, the network device is configured as a source network device to send each of the generated write requests to a respective target network device in parallel.
In this possible implementation, sending multiple write requests to the target network devices in parallel increases the concurrency of data transmission by the source network device (for example, the network card of the source computing device), thereby improving data transmission efficiency.
A second aspect of the present invention provides a data transmission method, which can be executed by a destination network device. The method includes:
receiving a write request from a source device, and executing the write request to write the data to be written in the write request into a destination device;
after the data to be written has been written into the destination device, reading the written data from the destination device;
comparing the data to be written in the write request with the written data that was read;
when the data to be written is the same as the written data that was read, returning a write success response to the source device; and
when the data to be written is different from the written data that was read, returning a write failure response to the source device.
With the data transmission method provided by the second aspect of the present invention, data transmitted from the source device to the destination device is written into the destination device by the network device of the destination device (the destination network device for short) and is then verified by that network device, without transmitting the data back to the source device for verification. This reduces the consumption of network resources compared with the prior art. After verifying the written data, the network device of the destination device directly returns a write success response to the source device, which avoids the communication overhead of the prior art and reduces the overall data transmission delay.
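The write, read-back, and compare steps of the second aspect can be sketched as follows. This is only an illustrative model, not the patented implementation: `WriteRequest`, `StorageUnit`, `handle_write`, the response strings, and the `fault_offset` parameter are all hypothetical names, and a Python `bytearray` stands in for the destination memory that a real network card would access through RDMA.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    """Hypothetical write request: destination address plus data to be written."""
    dest_addr: int
    data: bytes


class StorageUnit:
    """Stand-in for the destination device's memory (no real RDMA involved)."""

    def __init__(self, size, fault_offset=0):
        self.mem = bytearray(size)
        # fault_offset simulates an address-offset error on the write path.
        self.fault_offset = fault_offset

    def write(self, addr, data):
        start = addr + self.fault_offset
        self.mem[start:start + len(data)] = data

    def read(self, addr, length):
        return bytes(self.mem[addr:addr + length])


def handle_write(req, storage):
    """Write, read back from the requested address, compare, and respond."""
    cached = bytes(req.data)                  # cache the data to be written
    storage.write(req.dest_addr, cached)      # execute the write request
    read_back = storage.read(req.dest_addr, len(cached))
    return "WRITE_SUCCESS" if read_back == cached else "WRITE_FAILURE"


ok = handle_write(WriteRequest(16, b"payload"), StorageUnit(64))
bad = handle_write(WriteRequest(16, b"payload"), StorageUnit(64, fault_offset=4))
print(ok, bad)
```

In the second call the simulated offset places the data at the wrong address, so the bytes read back from the requested address differ from the cached data and the sketch reports a failure without any traffic back to the source device.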
In a possible implementation of the second aspect of the present invention, the method is executed by a destination network device and further includes: after receiving the write request, caching the data to be written in the write request.
In another possible implementation of the second aspect of the present invention, the write request further includes the destination address of the data to be written, and executing the write request to write the data to be written in the write request into the destination device includes: writing the data to be written in the write request into the storage unit of the destination device according to the destination address in the write request.
In another possible implementation of the second aspect of the present invention, the method is executed by a source network device and further includes:
receiving striped data sent by the source device, together with the write addresses of the multiple strips that make up the striped data and the write address of at least one piece of check data;
generating the at least one piece of check data from the multiple strips;
generating multiple write requests from the write addresses of the multiple strips, the write address of the at least one piece of check data, the multiple strips, and the at least one piece of check data, where each strip corresponds to one write request and each piece of check data corresponds to one write request; and
sending each of the multiple write requests to a respective destination device.
In this possible implementation, the generation of check data is offloaded from the source device (for example, a source computing device) to the source network device (for example, the network card of the source computing device). This reduces the load on the source device and saves its computing resources, improving the data processing efficiency of the source device.
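The strip handling described above can be illustrated with a small sketch. The text does not say how the check data is computed, so byte-wise XOR parity is assumed here purely for illustration; `xor_parity`, `build_write_requests`, and the `devN:0x100` address strings are hypothetical.

```python
from functools import reduce


def xor_parity(strips):
    """XOR equal-length strips byte by byte (assumed check-data scheme)."""
    if len(strips) < 2 or len({len(s) for s in strips}) != 1:
        raise ValueError("need at least two strips of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def build_write_requests(strips, strip_addrs, parity_addr):
    """One (address, data) write request per strip, plus one for the check data."""
    requests = list(zip(strip_addrs, strips))
    requests.append((parity_addr, xor_parity(strips)))
    return requests


strips = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
requests = build_write_requests(
    strips, ["dev1:0x100", "dev2:0x100", "dev3:0x100"], "dev4:0x100"
)
parity = requests[-1][1]
# With XOR parity, a lost strip can be rebuilt from the others plus the parity.
rebuilt_first = xor_parity([strips[1], strips[2], parity])
```

Each strip and the single piece of check data end up in their own write request, matching the one-request-per-strip rule in the text.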
In another possible implementation of the second aspect of the present invention, the method further includes:
receiving, by the source network device, the data to be written sent by the source device and the logical address of the data to be written;
obtaining the copy addresses, corresponding to the logical address, at which multiple copies of the data to be written are to be stored;
locking the logical address;
generating multiple write requests from the data to be written and each copy address; and
sending each of the multiple write requests to a respective target network device.
In this possible implementation, the generation of copy data is offloaded from the source device (for example, a source computing device) to the source network device (for example, the network card of the source computing device). This reduces the load on the source device and saves its computing resources, improving the overall data processing efficiency of the source device.
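A minimal sketch of the multi-copy flow follows, under stated assumptions: `AddressLockTable` and `build_replica_requests` are hypothetical names, the lock is modeled as an in-memory set, and releasing the lock once the write completes is an assumption, since the text above does not say when the address is unlocked.

```python
import threading


class AddressLockTable:
    """Hypothetical lock table kept by the source network device."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locked = set()

    def try_lock(self, addr):
        with self._guard:
            if addr in self._locked:
                return False
            self._locked.add(addr)
            return True

    def unlock(self, addr):
        with self._guard:
            self._locked.discard(addr)


def build_replica_requests(data, logical_addr, copy_addrs, locks):
    """Lock the logical address, then emit one write request per copy address."""
    if not locks.try_lock(logical_addr):
        raise RuntimeError("logical address is already locked by another write")
    return [(addr, data) for addr in copy_addrs]


locks = AddressLockTable()
copy_addrs = ["dev1:0x10", "dev2:0x10", "dev3:0x10"]
requests = build_replica_requests(b"blk", 0x2000, copy_addrs, locks)

# A second write to the same logical address is refused while the lock is held.
try:
    build_replica_requests(b"new", 0x2000, copy_addrs, locks)
    conflict = False
except RuntimeError:
    conflict = True

locks.unlock(0x2000)  # assumed to happen once all copies have been written
```

Holding the lock while the copies are in flight keeps two writers from leaving the copies with different contents, which is the consistency goal the text attributes to address locking.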
In yet another possible implementation of the second aspect of the present invention, each of the multiple write requests is sent in parallel to a respective target network device.
In this possible implementation, sending multiple write requests to the target network devices in parallel increases the concurrency of data transmission by the source network device (for example, the network card of the source computing device), thereby improving data transmission efficiency.
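Parallel dispatch of the write requests might look like the sketch below. It uses a thread pool from the Python standard library as a stand-in for the concurrency a network card would provide in hardware or firmware; `send_write_request` is a hypothetical placeholder that returns a canned response instead of transmitting anything.

```python
from concurrent.futures import ThreadPoolExecutor


def send_write_request(request):
    """Placeholder transport: a real device would transmit over the network."""
    target, _data = request
    return (target, "WRITE_SUCCESS")


def send_all_parallel(requests):
    """Dispatch every write request concurrently and collect the responses."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        # map() returns results in the same order as the input requests.
        return list(pool.map(send_write_request, requests))


requests = [("dev1", b"a"), ("dev2", b"a"), ("dev3", b"a")]
responses = send_all_parallel(requests)
all_ok = all(status == "WRITE_SUCCESS" for _, status in responses)
```

Because the requests are independent, the total latency in such a scheme is bounded by the slowest target rather than by the sum of all transfers.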
A third aspect of the present invention provides a network system. The network system includes a source device (for example, a source computing device), a destination device (for example, a destination computing device), and a destination network device (for example, the network card of the destination computing device) that is connected to the destination device and is provided by the first aspect or any possible implementation of the first aspect.
A fourth aspect of the present invention provides a chip including an interface and a processor. The interface is configured to provide input and output data for the processor, and the processor is configured to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect. In a specific implementation, the chip may be implemented as a CPU (central processing unit), an MCU (microcontroller unit), an MPU (microprocessor unit), a DSP (digital signal processor), an SoC (system on chip), an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), or a PLD (programmable logic device).
A fifth aspect of the present invention provides a network device. The network device includes a processor and a memory, the memory stores program instructions, and the processor executes the program instructions in the memory to perform the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
A sixth aspect of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect. The computer-readable storage medium includes, but is not limited to, one or more of the following: ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), flash memory, EEPROM (electrically erasable PROM), and a hard drive.
In a seventh aspect, this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the method provided by the second aspect of the present invention or any possible implementation of the second aspect.
It can be understood that the third aspect of the present invention and any of its possible implementations correspond to the first aspect and any of its possible implementations, and that the fourth, fifth, sixth, and seventh aspects and any of their possible implementations correspond to the second aspect and any of its possible implementations. Their technical effects are therefore not repeated here.
附图说明Description of the drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art.
图1为本发明实施例提供的数据传输系统的架构图。FIG. 1 is an architecture diagram of a data transmission system provided by an embodiment of the present invention.
图2为本发明实施例提供的数据传输方法的流程图。Fig. 2 is a flowchart of a data transmission method provided by an embodiment of the present invention.
图3为本发明实施例提供的网络设备的模块图。Fig. 3 is a block diagram of a network device provided by an embodiment of the present invention.
图4为本发明实施例提供的网络设备的硬件结构图。Fig. 4 is a hardware structure diagram of a network device provided by an embodiment of the present invention.
具体实施方式Detailed ways
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments.
如图1所示,为一种网络传输系统的架构图。源端设备10与目的端设备20之间通过网络30连接。本发明实施例的一种适用场景为:源端设备10及目的端设备20可以为存储阵列,在主机(图未示)发送写请求至源端设备10后,源端设备10根据预设的可靠性策略,源端设备10会将写请求中的待写数据写入多个目的端设备20。为了方便描述,本发明实施例仅以一个目的端设备20为例进行说明。As shown in Figure 1, it is an architecture diagram of a network transmission system. The source device 10 and the destination device 20 are connected through a network 30. An applicable scenario of the embodiment of the present invention is: the source device 10 and the destination device 20 may be storage arrays. After the host (not shown) sends a write request to the source device 10, the source device 10 is based on a preset According to the reliability strategy, the source device 10 writes the data to be written in the write request into multiple destination devices 20. For ease of description, the embodiment of the present invention only takes one destination device 20 as an example for description.
The source device 10 includes a processing unit 101, a storage unit 102, and a network device 103. The network device 103 includes a remote direct memory access (RDMA) chip 1031. The destination device 20 includes a processing unit 201, a storage unit 202, and a network device 203. The network device 203 includes an RDMA chip 2031.
The processing unit 101 and the processing unit 201 may be implemented in various forms, for example as a central processing unit (CPU) or a graphics processing unit (GPU), and may be a single-core or a multi-core processor. If implemented as a multi-core processor, the number of processor cores is not limited.
The storage unit 102 and the storage unit 202 are the memories of the source device 10 and the destination device 20, respectively. Each may be a dynamic random access memory (DRAM) or a storage class memory (SCM). The storage units 102 and 202 store program code and data that the processing units invoke and execute to implement related functions.
In this embodiment, the network device 103 and the network device 203 are network interface cards belonging to the source device 10 and the destination device 20, respectively. In other embodiments, the network device 103 and the network device 203 are switches connected to the source device 10 and the destination device 20, respectively.
In this embodiment, data is transmitted from the source device 10 to the destination device 20 by means of RDMA. Specifically, the processing unit 101 of the source device 10 first sends the write request to the network device 103 of the source device 10; the RDMA chip 1031 of the network device 103 forwards the write request to the network device 203 of the destination device 20; and the RDMA chip 2031 of the network device 203 writes the data in the write request into the storage unit 202 of the destination device 20. When data is written from the source device 10 to the destination device 20 through RDMA, the processing unit 201 of the destination device 20 does not need to participate, which saves computing resources of the destination device 20.
To guard against errors when data is written into a storage unit, the written data is generally verified. With RDMA, however, the processing unit 201 of the destination device 20 does not take part in the data transfer, so the verification cannot be performed by the destination device. A commonly used method today is for the processing unit 101 of the source device 10 to verify the data written into the storage unit 202 of the destination device 20. The verification process includes:
(1) After the data is written into the storage unit 202 of the destination device 20, the destination device 20 sends response information to the source device 10.
(2) The processing unit 101 of the source device 10 then initiates a read request to read the data written into the storage unit 202 of the destination device 20 back to the source device 10, and the processing unit 101 of the source device 10 compares whether the data in the write request is consistent with the data read from the storage unit 202 of the destination device 20.
(3) If the comparison shows that the two are consistent, the data verification succeeds, and a write-complete response is sent to the application that generated the write request.
In this prior-art verification method, the written data has to be read back from the destination device 20 to the source device 10, which consumes additional network bandwidth. Because the source device must also send an instruction to read the data already written to the destination device, the communication overhead of the transfer increases, and the overall latency of the write operation increases as well.
To solve the foregoing problems, the present invention provides a data transmission method. After the data transmitted from the source device 10 to the destination device 20 is written into the storage unit 202 of the destination device through the network device 203 of the destination device 20, the network device 203 of the destination device 20 verifies the written data, and the data does not need to be transmitted back to the source device 10 for verification. This reduces the consumption of network resources. After verifying the written data, the network device of the destination device directly returns a write-success response to the source device, which avoids the communication overhead of the prior art and reduces the overall data transmission latency. The data transmission method provided by this embodiment is described below with reference to FIG. 2, using the writing of data generated by the source device 10 into the destination device 20 as an example.
FIG. 2 is a flowchart of a method for writing data from the source device 10 to the destination device 20 according to an embodiment of the present invention.
Step S201: The source device 10 receives a write request sent by a host. The write request carries to-be-written data and a logical address of the to-be-written data.
In this step, when the source device 10 serves as a storage device, it stores the to-be-written data in the write request after receiving the write request from the host. In a specific implementation, the write request may be issued by an application running on the host or by an application running in a virtual machine on the host.
Step S202: The processing unit 101 of the source device 10 determines the reliability policy of the source device 10. If the source device 10 supports an erasure code (EC) or redundant array of independent drives (RAID) policy, step S203 is performed. If the source device 10 supports a multi-copy reliability policy, step S206 is performed.
Data may be lost during transmission, and it may also be lost because of a storage device failure. To ensure data reliability, reliability policies are therefore configured and executed (for example, storing redundant data) so that the redundant data can be used for recovery when data is lost. There are currently two main kinds of reliability policy. The first is an erasure code (EC) or redundant array of independent drives (RAID) policy. In both cases, the to-be-written data in write requests is assembled into a stripe, the stripe is divided into multiple strips, parity data is calculated for the strips, and the strips and the parity data are stored separately. If any one strip is lost, it can be recovered from the other strips and the parity data. For example, with RAID 5 a stripe can be divided into three strips with one parity block calculated for them, and with RAID 6 a stripe can be divided into three strips with two parity blocks calculated for them. The second is a multi-copy policy: when data is written, multiple backup copies of the to-be-written data are generated and written to different destination devices, so that if the data is damaged it can be restored by reading a backup copy. An application generally has its reliability policy set in advance. After generating a write request, the processing unit 101 checks the reliability policy preset by the application and processes the request differently according to the policy.
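The single-parity case described above can be sketched as follows. This is a minimal illustration that assumes bytewise XOR parity, as in conventional RAID 5; the source does not specify the parity algorithm, and the function names are hypothetical.

```python
from functools import reduce


def split_stripe(stripe: bytes, n_strips: int) -> list[bytes]:
    """Divide one stripe into n_strips equally sized strips."""
    if len(stripe) % n_strips != 0:
        raise ValueError("stripe length must be a multiple of the strip count")
    size = len(stripe) // n_strips
    return [stripe[i * size:(i + 1) * size] for i in range(n_strips)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_strip(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing strip from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


# One stripe, three strips, one parity block (the RAID 5 example in the text).
strips = split_stripe(b"AAAABBBBCCCC", 3)
parity = xor_blocks(strips)
rebuilt = recover_strip([strips[0], strips[2]], parity)  # strip 1 was "lost"
```

The two-parity RAID 6 case needs a second, independent code and is not covered by this sketch.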
Step S203: If it is determined in step S202 that the reliability policy supported by the application is an EC or RAID policy, the processing unit 101 of the source device 10, upon determining that the to-be-written data forms a stripe, sends the to-be-written data forming the stripe and the related address information to the network device 103 of the source device (also referred to as the source network device).
When the source device 10 uses an EC or RAID policy to ensure data reliability, the to-be-written data needs to be assembled into stripes, each consisting of multiple strips. The source device 10 presets the size of each stripe and the number of strips forming a stripe. When the received to-be-written data reaches the preset stripe size, the assembled stripe is sent to the source network device 103. For example, when the source device 10 uses RAID 5, the to-be-written data is sent to the network device 103 only after it fills a stripe consisting of three strips. In addition, according to the RAID algorithm, different strips are written to different locations, and parity data also needs to be generated for the strips. The source device 10 therefore also sends the address to which each strip is to be written and the address of the parity data to the network device 103 of the source device. In this embodiment, although the parity data for the stripe is not generated in the source device 10, the address at which the parity data is to be stored is sent to the network device 103.
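The stripe assembly described above can be sketched as a small buffer that releases data only once a full stripe has accumulated. The class name, the byte-oriented sizes, and the in-memory buffer are assumptions for illustration, not details from the source.

```python
class StripeAssembler:
    """Accumulate to-be-written data until a full stripe is available."""

    def __init__(self, strip_size: int, strips_per_stripe: int):
        # Preset stripe geometry, for example three strips per stripe.
        self.stripe_size = strip_size * strips_per_stripe
        self._buffer = bytearray()

    def add(self, data: bytes) -> list[bytes]:
        """Buffer incoming data and return every stripe that is now complete."""
        self._buffer.extend(data)
        ready = []
        while len(self._buffer) >= self.stripe_size:
            ready.append(bytes(self._buffer[:self.stripe_size]))
            del self._buffer[:self.stripe_size]
        return ready

    @property
    def pending(self) -> int:
        """Number of buffered bytes that do not yet fill a stripe."""
        return len(self._buffer)


assembler = StripeAssembler(strip_size=4, strips_per_stripe=3)
first = assembler.add(b"12345")        # not enough data for a stripe yet
second = assembler.add(b"6789abcdef")  # completes one 12-byte stripe
```

Each stripe returned by `add` is what would be handed to the network device 103 together with the strip and parity addresses.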
Step S204: After receiving the stripe, the network device 103 (that is, the source network device) calculates parity data for the stripe.
To ensure data reliability, the RAID algorithm generates parity data for the striped data, so that a damaged strip can be recovered using the parity data. The usual practice today is for the processing unit of the source device to generate the parity data after assembling the stripe, which occupies computing resources of that processing unit and affects I/O processing. In this embodiment, the parity data of the stripe can be calculated in the network device 103, which saves computing resources of the source device 10. In one implementation, a dedicated chip for generating parity data is added to the network device 103. In another, program instructions are installed in the network device 103 and invoked by a microprocessor in the network device 103 to generate the parity data. In addition, after receiving the stripe, the network device 103 first caches the stripe in the cache of the network device and then calculates the parity data.
Step S205: The network device 103 (that is, the source network device) generates multiple sub-write requests and sends them in parallel to different destination devices according to the addresses of the sub-write requests.
After the parity data is generated, the network device 103 further generates multiple sub-write requests according to the stripe transmitted by the processing unit 101 of the source device, the address of each strip, and the address of the parity data. Each strip corresponds to one sub-write request, and each parity block corresponds to one sub-write request. In this embodiment, the network device 103 can send the multiple sub-write requests to different destination devices in parallel, which improves the efficiency of data transmission compared with serial transmission by the source device.
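Step S205 can be sketched as follows: one sub-write request per strip and per parity block, dispatched concurrently. The request fields, the device identifiers, and the use of a thread pool with a caller-supplied `send` function are assumptions made for illustration; a real network device would issue RDMA operations instead.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubWriteRequest:
    device: str     # hypothetical destination device identifier
    address: int    # write address on that device
    payload: bytes  # one strip or one parity block


def build_sub_writes(strips, parity_blocks, targets):
    """Pair each strip and parity block with its (device, address) target."""
    blocks = list(strips) + list(parity_blocks)
    if len(blocks) != len(targets):
        raise ValueError("one target is required per strip and parity block")
    return [SubWriteRequest(dev, addr, blk)
            for blk, (dev, addr) in zip(blocks, targets)]


def send_parallel(requests, send):
    """Call send(request) for all requests concurrently; results keep order."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        return list(pool.map(send, requests))


requests = build_sub_writes(
    [b"AAAA", b"BBBB", b"CCCC"], [b"PPPP"],
    [("dev0", 0x100), ("dev1", 0x100), ("dev2", 0x100), ("dev3", 0x100)],
)
acks = send_parallel(requests, lambda req: (req.device, len(req.payload)))
```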
Step S206: If it is determined in step S202 that the reliability policy supported by the application is a multi-copy policy, the processing unit 101 of the source device 10 forwards the write request to the network device 103 of the source device 10. The write request carries the to-be-written data and the logical address of the to-be-written data.
Step S207: The network device 103 of the source device locks the logical address of the to-be-written data carried in the write request.
Generally, a persistence log (Plog) is generated in the source device 10 for each application. The log records the logical space that the source device 10 allocates to each application and, according to the number of backup copies and the backup addresses configured for the application's data, sets for the application's logical space the physical addresses of the storage units in the destination devices 20 that store the backup data. After the source device 10 receives a write request, it locks the logical address carried in the write request as recorded in the Plog, so that the logical address is not allocated to other write requests. Once the logical address is locked, the corresponding physical addresses of the storage units 202 in the destination devices 20 are locked as well, which ensures data reliability. However, performing Plog space allocation in the source device 10 increases the load on the source device 10, so in this embodiment the function is offloaded to the network device 103. That is, the Plog is moved to the network device 103 for storage. After receiving the write request sent by the source device 10, the network device 103 locks the logical address of the write request in the Plog so that the logical address is not allocated to other write requests. In addition, after receiving the write request, the network device 103 first caches the to-be-written data in the write request in the cache of the network device and then performs subsequent processing.
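The locking behaviour of step S207 might look like the following sketch. The Plog is modelled as a plain dictionary from a logical address to its replica locations; that layout, the class name, and the `unlock` method (the source does not describe when a lock is released) are all assumptions.

```python
import threading


class PlogLockTable:
    """Toy stand-in for a Plog that supports locking logical addresses."""

    def __init__(self, replica_map: dict[int, list[tuple[str, int]]]):
        # logical address -> [(destination device, physical address), ...]
        self._replica_map = replica_map
        self._locked: set[int] = set()
        self._mutex = threading.Lock()

    def lock(self, logical_address: int) -> list[tuple[str, int]]:
        """Lock a logical address and return the replica addresses it covers."""
        with self._mutex:
            if logical_address not in self._replica_map:
                raise KeyError(f"unknown logical address {logical_address:#x}")
            if logical_address in self._locked:
                raise RuntimeError("logical address is held by another write")
            self._locked.add(logical_address)
            return list(self._replica_map[logical_address])

    def unlock(self, logical_address: int) -> None:
        """Release the lock (assumed to happen once the write completes)."""
        with self._mutex:
            self._locked.discard(logical_address)


plog = PlogLockTable({0x1000: [("dev0", 0x20), ("dev1", 0x80)]})
replicas = plog.lock(0x1000)  # a second lock(0x1000) would now raise
```

Locking the logical address implicitly reserves every mapped physical address, which mirrors the statement that the corresponding physical addresses are locked as well.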
Step S208: The network device 103 of the source device 10 obtains the write addresses of the multiple copies of the to-be-written data carried in the write request, and generates multiple sub-write requests according to the write addresses of the multiple copies and the to-be-written data.
Because the Plog records the mapping between the logical address space and the physical addresses of the storage units 202 of the multiple destination devices 20, the write addresses of the multiple copies can be determined from the logical address of the to-be-written data. Multiple sub-write requests can then be generated from the to-be-written data and the write addresses of the copies. Each sub-write request carries the to-be-written data and the write address of one copy.
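As an illustration of step S208, the sketch below expands one write into one sub-write request per copy. The dictionary form of the Plog mapping and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSubWrite:
    device: str            # hypothetical destination device identifier
    physical_address: int  # write address of this copy
    payload: bytes         # the full to-be-written data


def build_replica_sub_writes(plog_map, logical_address, data):
    """Create one sub-write request per copy recorded for the logical address."""
    return [ReplicaSubWrite(device, address, data)
            for device, address in plog_map[logical_address]]


# Assumed mapping: logical address 0x1000 is backed by copies on three devices.
plog_map = {0x1000: [("dev0", 0x20), ("dev1", 0x80), ("dev2", 0x40)]}
sub_writes = build_replica_sub_writes(plog_map, 0x1000, b"payload")
```

Unlike the EC or RAID case, every sub-write request here carries the same payload and differs only in its target.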
Step S209: The network device 103 of the source device sends the multiple sub-write requests to different destination devices 20 according to the address of each sub-write request.
In this embodiment, to achieve better reliability, each strip of the striped data or each copy is generally stored on a different destination device. Storing the strips or the copies at different addresses of the same destination device also falls within the protection scope of the present invention.
Step S210: After receiving a sub-write request, the network device 203 of the destination device 20 writes the data into the storage unit 202 of the destination device 20 through RDMA. After the data is successfully written into the storage unit 202, the storage unit 202 generates a write-success response.
Step S211: After receiving the write-success response, the network device 203 reads data through RDMA from the address in the storage unit 202 indicated by the sub-write request, and compares the read data with the to-be-written data in the sub-write request.
When data is written into the storage unit 202, errors such as a write to a skewed address may occur. To detect such errors promptly, the data needs to be verified after it is written into the storage unit 202. The verification method used in this embodiment is that, after writing the to-be-written data into the storage unit 202, the network device 203 reads the data back from the address it wrote to. If the write landed at a skewed address, the data read from the intended write address differs from the written data, and the address error is detected.
Step S212: If the data is consistent, the network device 203 of the destination device 20 returns a write-success response to the source device 10.
Step S213: If the data is inconsistent, the network device 203 of the destination device 20 returns a write-failure response to the source device 10.
Compared with the conventional data verification method, in this embodiment the network device 203 of the destination device 20 verifies the data written into the storage unit 202 of the destination device 20. The verification itself consumes no network bandwidth, and the latency of the data write is reduced.
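Steps S210 to S213 can be sketched with the destination memory modelled as a `bytearray`. The `rdma_write` helper only simulates a write and is not a real RDMA API; its `skew` parameter is a hypothetical fault injector used to reproduce the skewed-address error.

```python
def rdma_write(memory: bytearray, address: int, data: bytes, skew: int = 0) -> None:
    """Simulated write; a non-zero skew models data landing at the wrong address."""
    start = address + skew
    memory[start:start + len(data)] = data


def write_and_verify(memory: bytearray, address: int, data: bytes, skew: int = 0) -> bool:
    """Write, read back from the intended address, and compare (S210 to S213)."""
    rdma_write(memory, address, data, skew)
    read_back = bytes(memory[address:address + len(data)])
    return read_back == data  # True: write-success response, False: write-failure


ok = write_and_verify(bytearray(64), 8, b"payload")
bad = write_and_verify(bytearray(64), 8, b"payload", skew=4)
```

Because both the write and the read-back happen on the destination side, the only traffic returned to the source device is the success or failure response.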
FIG. 3 is an architecture diagram of a network device according to an embodiment of the present invention. The network device may be the network device 103 of the source device 10 in FIG. 1, or the network device 203 of the destination device 20.
The network device includes a parity data generation module 301, an address locking module 302, a concurrency module 303, an RDMA module 304, and a post-write verification module 305. The parity data generation module 301, the address locking module 302, the concurrency module 303, and the RDMA module 304 are used when the network device serves as the network device of the source device 10. The RDMA module 304 and the post-write verification module 305 are used when the network device serves as the network device of the destination device.
If the reliability policy used in the source device 10 is an EC/RAID policy, the source device 10 sends the stripe to the source network device, and the parity data generation module 301 generates parity data for the stripe. For details, refer to the description of step S204 in FIG. 2. After the parity data is generated, the concurrency module 303 generates one sub-write request for each strip and each parity block according to the number of strips in the stripe, the parity data, and the addresses of the strips and the parity data, and sends the generated sub-write requests concurrently to different destination devices 20 according to their addresses. For details, refer to the description of step S205 in FIG. 2.
If the reliability policy used in the source device 10 is a multi-copy policy, the source device sends the write request to the network device of the source device and, at the same time, sends the addresses to which the multiple copies are to be written. The address locking module 302 then locks the addresses in the storage units of the destination devices according to the copy addresses sent by the source device, to ensure consistency among the copies. For details, refer to the description of step S207 in FIG. 2. After the addresses are locked, the concurrency module 303 generates one sub-write request for each copy according to the to-be-written data in the write request and the address of each copy, and sends the sub-write requests in parallel to different destination devices 20 according to the addresses in the sub-write requests.
When the network device serves as the network device of the destination device and receives a sub-write request sent by the network device of the source device, the RDMA module 304 first writes the to-be-written data in the sub-write request into the storage unit of the destination device. For details, refer to the description of step S210 in FIG. 2. After the data is written, the post-write verification module 305 verifies whether the written data is correct. For a detailed description of the verification, refer to steps S211 to S213 in FIG. 2.
Each module in the network device may be implemented in software, hardware, or a combination of the two.
When any of the above modules or units is implemented in software, the software exists as computer program instructions stored in a memory, and a processor can execute the program instructions to implement the above method flow. The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing. The processor may be a standalone semiconductor chip, or may be integrated with other circuits into one semiconductor chip. For example, it may form a system on chip (SoC) with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits), or it may be integrated into an ASIC as a built-in processor of the ASIC, and the ASIC with the integrated processor may be packaged separately or together with other circuits. In addition to cores for executing software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
When the above modules are implemented as hardware circuits, the hardware circuits may be implemented with a general-purpose central processing unit (CPU), a microcontroller unit (MCU), a microprocessor unit (MPU), a digital signal processor (DSP), or a system on chip (SoC). They may also be implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. These circuits may run necessary software, or may execute the above method flow without relying on software.
FIG. 4 is a hardware architecture diagram of the network device according to an embodiment of the present invention. The network device includes a processor 410, a memory 420, an interface 430, and an RDMA chip. The memory stores program instructions. When the parity data generation module 301, the address locking module 302, the concurrency module 303, and the post-write verification module 305 in FIG. 3 are implemented in software, the programs corresponding to these modules are stored in the memory 420, and the processor 410 invokes and executes them to implement the functions of the parity data generation module 301, the address locking module 302, the concurrency module 303, and the post-write verification module 305. The interface 430 is used to receive data from the source device or the destination device, or to send data to the source device or the destination device. The RDMA chip is used to write the data received from the source device directly into the storage unit of the destination device. The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing.
The check data generation module 301, the address locking module 302, the concurrency module 303, and the post-write verification module 305 in FIG. 3 may also be implemented in hardware. In that case, these modules are integrated into the chip of the network device as hardware modules. When implemented in hardware, the processor of the network device may be an application-specific integrated circuit (ASIC), or may be an independent semiconductor chip. In addition to the cores for executing software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field-programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
In another embodiment of the present invention, when the method is applied to data migration between the source device and the destination device, only steps S209 to S213 need to be performed. In step S209, the data carried in the write request that is sent is the data to be migrated from the source device 10 to the destination device 20. The remaining steps are the same as steps S210 to S213 described above and are not repeated here.
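As a rough illustration of the write path reused in this migration embodiment, the following Python sketch assumes that steps S209 to S213 correspond to the flow recited in the claims: receive a write request, write the data, read it back, compare, and respond. The names `WriteRequest`, `DestinationDevice`, and `handle_write`, as well as the response strings, are hypothetical and chosen only for illustration. An in-memory byte array stands in for the destination device's storage unit; a real device would use RDMA rather than local copies.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    dest_addr: int  # destination address inside the destination device
    data: bytes     # data to be written (or migrated)


class DestinationDevice:
    """Hypothetical in-memory stand-in for the destination device's storage."""

    def __init__(self, size: int) -> None:
        self.storage = bytearray(size)

    def write(self, addr: int, data: bytes) -> None:
        self.storage[addr:addr + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.storage[addr:addr + length])


def handle_write(req: WriteRequest, dev: DestinationDevice) -> str:
    """Write the data, read it back, compare, and report the outcome."""
    cached = bytes(req.data)                        # cache the data to be written
    dev.write(req.dest_addr, cached)                # execute the write request
    written = dev.read(req.dest_addr, len(cached))  # read back what was stored
    # Respond with success only if the stored bytes match the cached copy.
    return "WRITE_SUCCESS" if written == cached else "WRITE_FAILURE"
```

Keeping a cached copy of the payload is what makes the comparison possible after the write has completed; the source device only sees the final success or failure response.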
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

  1. A network device, wherein the network device is configured as a destination network device to perform:
    receiving a write request from a source device, and executing the write request to write the to-be-written data in the write request into a destination device;
    after the to-be-written data is written into the destination device, reading the written data from the destination device;
    comparing the to-be-written data in the write request with the written data that was read;
    when the to-be-written data is the same as the written data that was read, returning a write success response to the source device; and
    when the to-be-written data is different from the written data that was read, returning a write failure response to the source device.
  2. The network device according to claim 1, wherein the network device configured as the destination network device is further configured to perform:
    after receiving the write request from the source device, caching the to-be-written data in the write request.
  3. The network device according to claim 1 or 2, wherein the network device is configured as the destination network device to specifically perform:
    parsing the write request to obtain the to-be-written data and a destination address of the to-be-written data; and
    writing the to-be-written data into the storage space identified by the destination address in the destination device.
  4. The network device according to any one of claims 1 to 3, wherein the network device is configured as a source network device to perform:
    receiving, from the source device, stripe data comprising at least two pieces of strip data, together with write addresses of the at least two pieces of strip data and a write address of at least one piece of check data;
    generating the at least one piece of check data according to the at least two pieces of strip data;
    for each piece of strip data, generating a write request according to the strip data and the write address of the strip data;
    for each piece of check data, generating a write request according to the check data and the write address of the check data; and
    sending each of the generated write requests to a destination device, respectively.
  5. The network device according to any one of claims 1 to 3, wherein the network device is configured as a source network device to perform:
    receiving to-be-written data sent by the source device, and at least one logical address for storing at least one logical copy of the to-be-written data;
    locking the at least one logical address;
    for each logical address, generating a write request according to the to-be-written data and the logical address; and
    sending each of the generated at least one write request to a target network device, respectively.
  6. The network device according to claim 4 or 5, wherein the network device is configured as the source network device to perform:
    sending, in parallel, each of the generated at least one write request to a target network device, respectively.
  7. A data transmission method, wherein the method comprises:
    receiving a write request from a source device, and executing the write request to write the to-be-written data in the write request into a destination device;
    after the to-be-written data is written into the destination device, reading the written data from the destination device;
    comparing the to-be-written data in the write request with the written data that was read;
    when the to-be-written data is the same as the written data that was read, returning a write success response to the source device; and
    when the to-be-written data is different from the written data that was read, returning a write failure response to the source device.
  8. The data transmission method according to claim 7, wherein the method further comprises:
    after receiving the write request, caching the to-be-written data in the write request.
  9. The data transmission method according to claim 7 or 8, wherein the write request further comprises a destination address of the to-be-written data, and the executing the write request to write the to-be-written data in the write request into the destination device comprises:
    writing the to-be-written data in the write request into a storage unit of the destination device according to the destination address in the write request.
  10. A network system, comprising: a source device, a destination device, and the destination network device according to any one of claims 1 to 3, which is connected to the destination device.
  11. A chip, comprising an interface and a processor, wherein the interface is configured to provide input and output data for the processor; and
    the processor is configured to execute the data transmission method according to any one of claims 7 to 9.
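The source-side behaviour recited in claims 4 and 6 can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the claims do not say how the check data is computed, so byte-wise XOR parity is assumed here, and the names `xor_parity`, `build_requests`, and `send_all`, along with the tuple-based request format, are hypothetical. The `send` callable stands in for whatever transport delivers one write request to one destination device.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, Tuple

WriteRequest = Tuple[int, bytes]  # (write address, payload); hypothetical format


def xor_parity(strips: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length strips (assumed check-data scheme)."""
    if len(strips) < 2:
        raise ValueError("a stripe needs at least two strips")
    if len({len(s) for s in strips}) != 1:
        raise ValueError("strips must have equal length")
    # zip(*strips) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def build_requests(strips: Sequence[bytes],
                   strip_addrs: Sequence[int],
                   parity_addr: int) -> List[WriteRequest]:
    """One write request per strip, plus one for the generated check data."""
    requests = list(zip(strip_addrs, strips))
    requests.append((parity_addr, xor_parity(strips)))
    return requests


def send_all(requests: Sequence[WriteRequest],
             send: Callable[[WriteRequest], str]) -> List[str]:
    """Dispatch every write request in parallel and collect the responses."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        return list(pool.map(send, requests))
```

With XOR parity, any single lost strip can be rebuilt by XOR-ing the check data with the surviving strips, which is one reason it is a common choice for this kind of check data; other erasure codes would fit the same request-building structure.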
PCT/CN2020/127446 2019-11-07 2020-11-09 Data transmission method, network device, network system and chip WO2021089036A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911083122.0A CN112788079A (en) 2019-11-07 2019-11-07 Data transmission method, network equipment, network system and chip
CN201911083122.0 2019-11-07

Publications (1)

Publication Number Publication Date
WO2021089036A1 true WO2021089036A1 (en) 2021-05-14

Family

ID=75747961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127446 WO2021089036A1 (en) 2019-11-07 2020-11-09 Data transmission method, network device, network system and chip

Country Status (2)

Country Link
CN (1) CN112788079A (en)
WO (1) WO2021089036A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672431A (en) * 2021-07-29 2021-11-19 济南浪潮数据技术有限公司 Optimization method and device for acceleration chip erasure code plug-in for realizing distributed storage
WO2023016456A1 (en) * 2021-08-09 2023-02-16 华为技术有限公司 Data sending method, network card and computing device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115729444A (en) * 2021-09-01 2023-03-03 华为技术有限公司 Data processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040123013A1 (en) * 2002-12-19 2004-06-24 Clayton Shawn Adam Direct memory access controller system
CN101577716A (en) * 2009-06-10 2009-11-11 中国科学院计算技术研究所 Distributed storage method and system based on InfiniBand network
CN103034559A (en) * 2012-12-18 2013-04-10 无锡众志和达存储技术股份有限公司 PQ (Parity Qualification) inspection module and inspection method based on RDMA (Remote Direct Memory Access) architecture design
CN105450588A (en) * 2014-07-31 2016-03-30 华为技术有限公司 RDMA-based data transmission method and RDMA network cards
CN105487937A (en) * 2015-11-27 2016-04-13 华为技术有限公司 RDMA (Remote Direct Memory Access) implementation method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672431A (en) * 2021-07-29 2021-11-19 济南浪潮数据技术有限公司 Optimization method and device for acceleration chip erasure code plug-in for realizing distributed storage
CN113672431B (en) * 2021-07-29 2023-12-22 济南浪潮数据技术有限公司 Optimization method and device for realizing acceleration chip erasure code plug-in of distributed storage
WO2023016456A1 (en) * 2021-08-09 2023-02-16 华为技术有限公司 Data sending method, network card and computing device

Also Published As

Publication number Publication date
CN112788079A (en) 2021-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20884496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20884496

Country of ref document: EP

Kind code of ref document: A1