CN104317716B - Data transmission method between distributed nodes and distributed node device - Google Patents

Info

Publication number
CN104317716B
Authority
CN
China
Prior art keywords
data
written
node
cpu
destination node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410598754.1A
Other languages
Chinese (zh)
Other versions
CN104317716A (en
Inventor
熊四兵
倪小珂
李显才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201410598754.1A priority Critical patent/CN104317716B/en
Publication of CN104317716A publication Critical patent/CN104317716A/en
Application granted granted Critical
Publication of CN104317716B publication Critical patent/CN104317716B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data transmission method between distributed nodes and a distributed node device, which improve the security of data transmission while minimizing the CPU resources consumed. In the method, the CPU of a source node obtains data to be written and verification data, where the verification data is identical to the set-length portion at the end of the data to be written, and writes the data to be written into the nonvolatile memory of a destination node through the network I/O device of the destination node. The CPU of the source node then obtains, through the network I/O device of the destination node, the set-length portion at the end of the data that has been written to the nonvolatile memory of the destination node. When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup has succeeded.

Description

Data transmission method between distributed nodes and distributed node device
Technical field
The present invention relates to the field of computer technology, and in particular to a data transmission method between distributed nodes and a distributed node device.
Background
At present, in a distributed storage system, data is distributed across multiple distributed nodes. To ensure data reliability, data is stored as multiple copies; that is, one piece of data must be stored on multiple distributed nodes, so large amounts of data must be copied frequently between distributed nodes. In the distributed storage system shown in Fig. 1, a copy of the data in distributed node A is saved to distributed node C, and a copy of the data in distributed node B is saved to distributed node D.
High-end storage demands high throughput and low latency. When data is transmitted over ordinary Ethernet, the central processing units (CPUs) of both the sender and the receiver must take part in moving the data, and the operating system must switch context frequently between kernel mode and user mode. An InfiniBand (IB) network, by contrast, provides remote direct memory access (RDMA). With RDMA, the sender and the receiver transmit data without CPU participation, which greatly increases the data transfer speed.
RDMA allows data to be transferred directly from the memory of one computer to the memory of another. The basic principle is shown in Fig. 2: the CPU does not take part in the data transfer itself (steps 2 and 4 in Fig. 2) and is involved only in the control before the transfer (step 1 in Fig. 2) and the notification after the transfer completes (step 5 in Fig. 2).
RDMA provides the following communication operations between nodes: send (Send), write (Write), read (Read), and atomic (Atomic). The data transfer of every operation is performed by the IB hardware, and the host takes no part in sending or receiving the data. The operations differ, however, in how a transfer is initiated and in the processing after it completes, as shown in Fig. 3.
The send operation is performed according to steps 1 to 4. The hosts of the sender and the receiver must both take part in negotiation before the transfer is initiated and before data is received, and when sending and receiving complete, a notification message (Notify) must be delivered to the application on the host at each end;
The write operation is performed according to steps 1 to 4. The receiver's host performs no operation at all: it takes part neither in the negotiation before the transfer nor in the notification after the transfer completes. The sender's host must take part in the notification before the data is sent and in the notification after sending completes;
The read operation is performed according to steps 1 to 4. The receiver reads data from the sender. The sender takes no part in the transfer, neither in the notification before the read nor in the notification after the read completes. The receiver must initiate the read operation and must notify its host after the read completes;
The atomic (Atomic) operation is performed according to steps 1 to 3. One end transmits 64 bits of data to the peer and operates on 64 bits of data at a fixed location on the peer, for example an atomic increment, atomic decrement, or atomic replacement. The receiver takes no part in the data transfer, and the sender must perform the notification.
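As a rough illustration of the atomic operation described above, the following Python sketch simulates 64-bit atomic updates on a local buffer. It is not RDMA code: the class `SimulatedRemoteMemory`, its method names, the little-endian layout, and the use of a lock to model indivisibility are all assumptions made for illustration. Modeling "atomic replacement" as a compare-and-swap is likewise an assumption.

```python
import struct
import threading


class SimulatedRemoteMemory:
    """Local stand-in for a peer's memory region (hypothetical, not real RDMA)."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)
        self._lock = threading.Lock()  # models the indivisibility of each operation

    def atomic_add(self, offset: int, delta: int) -> int:
        """Add delta (may be negative) to the 64-bit value at offset; return the old value."""
        with self._lock:
            (old,) = struct.unpack_from("<Q", self._buf, offset)
            struct.pack_into("<Q", self._buf, offset, (old + delta) % 2**64)
            return old

    def atomic_replace(self, offset: int, expected: int, new: int) -> int:
        """Replace the 64-bit value with new only if it equals expected; return the old value."""
        with self._lock:
            (old,) = struct.unpack_from("<Q", self._buf, offset)
            if old == expected:
                struct.pack_into("<Q", self._buf, offset, new)
            return old
```

In this sketch each call returns the value found before the update, so a caller can also use an addition of zero as an uninterrupted 64-bit read.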
In an existing distributed system, data is generally transmitted between distributed nodes using the flow shown in Fig. 4, as follows:
The sender prepares the information about the buffer (Buff) to be sent (such as the address, length, and permissions of the buffer area) and sends this Buff information to the receiver through an RDMA send operation. The hardware notifies the application that the message has been sent successfully;
After the receiver's hardware receives the message, it notifies the local application, which initiates an RDMA read operation according to the sender's Buff information. The receiver's hardware completes the read operation, reading the data in the sender's memory into local memory, and notifies the receiver's application;
After receiving the notification message, the receiver's application determines that the data has been successfully written to the nonvolatile memory and sends the sender a response message indicating that the data has been read;
After the sender's hardware receives the response message, it notifies the local application. On obtaining the response message, the local application determines that the peer has read the data successfully and can notify the client that the input/output is complete.
In this process, the receiver's CPU must take part in controlling the transfer and in the notifications, and the receiver and the sender must interact several times to complete the transfer. Moreover, these interactions cannot establish whether the data has been completely backed up to the receiver's nonvolatile memory. If the system starts subsequent data processing while the backup is incomplete, the security of the system is affected.
Summary of the invention
Embodiments of the present invention provide a data transmission method between distributed nodes and a distributed node device, so as to improve the security of data transmission while minimizing the CPU resources consumed.
The specific technical solutions provided by the embodiments of the present invention are as follows:
According to a first aspect, a data transmission method between distributed nodes is provided, including:
The central processing unit (CPU) of a source node obtains data to be written and verification data used to verify whether the data to be written has been completely backed up, where the verification data is identical to the set-length portion at the end of the data to be written;
The CPU of the source node writes the data to be written into the nonvolatile memory of a destination node through the network input/output (I/O) device of the destination node;
The CPU of the source node obtains, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, where the data of the set length is the set-length portion at the end of the written data;
When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup has succeeded.
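The four steps of the first aspect can be sketched in Python against a simulated destination memory. Everything here is a hypothetical stand-in: `SimulatedNvm` replaces both the destination node's nonvolatile memory and its network I/O device, `drop_after` artificially truncates a write to imitate an incomplete backup, and no real RDMA call is made.

```python
from typing import Optional


class SimulatedNvm:
    """Hypothetical stand-in for the destination's nonvolatile memory and network I/O device."""

    def __init__(self, size: int, drop_after: Optional[int] = None) -> None:
        self.buf = bytearray(size)
        self.drop_after = drop_after  # number of bytes that actually land; None means all

    def remote_write(self, offset: int, data: bytes) -> None:
        landed = len(data) if self.drop_after is None else min(len(data), self.drop_after)
        self.buf[offset:offset + landed] = data[:landed]

    def remote_read(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])


def backup(nvm: SimulatedNvm, offset: int, data: bytes, set_length: int) -> bool:
    """Write data, read back its set-length tail, and compare it with the verification data."""
    if not 0 < set_length <= len(data):
        raise ValueError("set_length must be between 1 and len(data)")
    check = data[-set_length:]                           # step 1: verification data
    nvm.remote_write(offset, data)                       # step 2: write to the destination
    tail_offset = offset + len(data) - set_length
    readback = nvm.remote_read(tail_offset, set_length)  # step 3: fetch the written tail
    return readback == check                             # step 4: compare
```

For example, `backup(SimulatedNvm(64), 0, b"payload-0123456789", 8)` succeeds in this simulation, while the same call on `SimulatedNvm(64, drop_after=10)` fails because the tail never reaches the simulated memory.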
With reference to the first aspect, in a first possible implementation, the obtaining, by the CPU of the source node, of the data to be written and the verification data used to verify whether the data to be written has been completely backed up includes:
The CPU of the source node obtains original data to be written and a preset marker of the set length that is used to verify whether the original data has been completely backed up, appends the marker to the end of the original data to be written, uses the data obtained after appending the marker as the data to be written, and uses the marker as the verification data;
The determining, by the CPU of the source node, that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data includes:
The CPU of the source node determines that the data backup has succeeded when the number of bytes in the marker equals the number of bytes in the data of the set length obtained from the destination node, and the content of each byte of that data is identical to the content of the byte at the same position in the marker.
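A minimal Python sketch of this marker variant follows. The marker value `MARKER` and the function names are hypothetical; the source specifies only that the marker has a preset length and preset bit values, not what those values are.

```python
from typing import Tuple

# Hypothetical preset marker of 8 bytes (64 bits); the actual value is not given in the source.
MARKER = bytes.fromhex("a55aa55aa55aa55a")


def append_marker(original: bytes, marker: bytes = MARKER) -> Tuple[bytes, bytes]:
    """Return (data_to_write, verification_data) for the marker variant."""
    return original + marker, marker


def marker_matches(readback: bytes, marker: bytes = MARKER) -> bool:
    """True when the byte counts are equal and every byte equals the marker byte at that position."""
    return len(readback) == len(marker) and all(r == m for r, m in zip(readback, marker))
```

In Python the comparison in `marker_matches` is equivalent to `readback == marker`; it is spelled out here only to mirror the two conditions in the text.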
With reference to the first aspect, in a second possible implementation, the obtaining, by the CPU of the source node, of the data to be written and the verification data used to verify whether the data to be written has been completely backed up includes:
The CPU of the source node obtains original data to be written, uses the original data to be written as the data to be written, and uses the set-length portion at the end of the original data to be written as the verification data;
The determining, by the CPU of the source node, that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data includes:
The CPU of the source node determines that the data backup has succeeded when the number of bytes in the verification data equals the number of bytes in the data of the set length obtained from the destination node, and the content of each byte of that data is identical to the content of the byte at the same position in the verification data.
With reference to the first aspect or either of the first and second possible implementations, in a third possible implementation, the obtaining, by the CPU of the source node through the network I/O device of the destination node, of the data of the set length from the data that has been written to the nonvolatile memory of the destination node, where the data of the set length is the set-length portion at the end of the written data, includes:
The CPU of the source node sends an atomic operation or a read operation of the remote direct memory access mode to the network I/O device of the source node, where the atomic operation or the read operation instructs retrieval of the set-length portion at the end of the written data;
The network I/O device of the source node transmits the atomic operation or the read operation to the network I/O device of the destination node;
The CPU of the source node receives, from the network I/O device of the destination node, the set-length portion at the end of the data that has been written to the nonvolatile memory of the destination node, obtained according to the atomic operation or the read operation.
With reference to the first aspect or either of the first and second possible implementations, in a fourth possible implementation, the writing, by the CPU of the source node, of the data to be written into the nonvolatile memory of the destination node through the network I/O device of the destination node includes:
The CPU of the source node transmits the data to be written to the network I/O device of the source node;
The network I/O device of the source node transmits the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the nonvolatile memory of the destination node.
According to a second aspect, the present invention provides a distributed node device, including a central processing unit (CPU) and a memory;
The CPU is configured to read a program in the memory to perform the following steps:
Obtain data to be written and verification data used to verify whether the data to be written has been completely backed up, where the verification data is identical to the set-length portion at the end of the data to be written; write the data to be written into the nonvolatile memory of a destination node through the network input/output (I/O) device of the destination node; obtain, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, where the data of the set length is the set-length portion at the end of the written data; and determine that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data.
With reference to the second aspect, in a first possible implementation, the device further includes an internal memory, configured to store original data to be written and a preset marker of the set length that is used to verify whether the original data has been completely backed up;
The CPU is specifically configured to:
Obtain the original data to be written and the marker from the internal memory, append the marker to the end of the original data to be written, use the data obtained after appending the marker as the data to be written, and use the marker as the verification data;
After obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes in the marker equals the number of bytes in the data of the set length obtained from the destination node, and the content of each byte of that data is identical to the content of the byte at the same position in the marker.
With reference to the second aspect, in a second possible implementation, the device further includes an internal memory, configured to store original data to be written;
The CPU is specifically configured to:
Obtain the original data to be written from the internal memory, use the original data to be written as the data to be written, and use the set-length portion at the end of the original data to be written as the verification data;
After obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes in the verification data equals the number of bytes in the data of the set length obtained from the destination node, and the content of each byte of that data is identical to the content of the byte at the same position in the verification data.
With reference to the second aspect or either of the first and second possible implementations, in a third possible implementation, the CPU is specifically configured to:
Send an atomic operation or a read operation of the remote direct memory access mode to the network I/O device of the distributed node device, where the atomic operation or the read operation instructs retrieval of the set-length portion at the end of the written data, so that the network I/O device of the distributed node device transmits the atomic operation or the read operation to the network I/O device of the destination node;
Receive, from the network I/O device of the destination node, the set-length portion at the end of the data that has been written to the nonvolatile memory of the destination node, obtained according to the atomic operation or the read operation.
With reference to the second aspect or either of the first and second possible implementations, in a fourth possible implementation, the CPU is specifically configured to:
Transmit the data to be written to the network I/O device of the distributed node device, so that the network I/O device of the distributed node device transmits the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the nonvolatile memory of the destination node.
Based on the foregoing technical solutions, in the embodiments of the present invention, the CPU of the source node directly operates the nonvolatile memory of the destination node through the network I/O device of the destination node, without participation of the destination node's CPU, both to write the data to be written into that nonvolatile memory and to obtain the set-length portion at the end of the data that has been written there. By comparing the obtained data of the set length with the verification data obtained by the CPU of the source node, it is determined whether the data has been completely backed up. The whole data transmission process is controlled unilaterally by the source node, without participation of the destination node's CPU, which improves the security of data transmission while minimizing the CPU resources consumed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of an existing distributed storage system;
Fig. 2 is a schematic diagram of the existing RDMA principle;
Fig. 3 is a schematic diagram of the existing RDMA communication operations;
Fig. 4 is a schematic flowchart of data transmission between existing distributed nodes;
Fig. 5 is a schematic flowchart of a data transmission method in a distributed system according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a data transmission method between a primary node and a standby node according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the process of saving data between a primary node and a standby node according to an embodiment of the present invention;
Fig. 8 is a schematic flowchart of a method by which a client saves data on a primary node and a standby node according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process by which a client saves data on a primary node and a standby node according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a distributed node device according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the following embodiments, each primary node that stores data in the distributed system is assigned a standby node in advance, and the standby node stores a copy of the data stored on the primary node. Storing data in the distributed system requires that the data be saved successfully on both the primary node and the standby node.
In the following embodiments, the nonvolatile memory of a distributed node must be directly accessible by a network input/output (I/O) device through a network interface. For example, the nonvolatile memory of a distributed node (such as PCIe NVRAM) can be accessed directly by a network device with a PCIe interface (such as an InfiniBand or RoCE device).
In an embodiment of the present invention, as shown in Fig. 5, the detailed flow of the data transmission method in a distributed system is as follows:
Step 501: The CPU of the source node obtains data to be written and verification data used to verify whether the data to be written has been completely backed up, where the verification data is identical to the set-length portion at the end of the data to be written.
The CPU of the source node may obtain the data to be written and the verification data in, but not limited to, the following two ways:
First, the CPU of the source node obtains original data to be written and a preset marker of the set length that is used to verify whether the original data has been completely backed up, appends the marker to the end of the original data to be written, uses the data obtained after appending the marker as the data to be written, and uses the marker as the verification data.
The length of the marker is specified in advance, and the value of each bit of the marker is also preset. For example, the marker length is specified in advance as 64 bits.
Second, the CPU of the source node obtains original data to be written, uses the original data to be written as the data to be written, and uses the set-length portion at the end of the original data to be written as the verification data.
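The second way, in which nothing is appended and the tail of the original data itself serves as the verification data, might look like the following Python sketch. The function name `tail_as_check` and the default set length of 8 bytes are assumptions for illustration; the source does not fix the set length for this variant.

```python
from typing import Tuple


def tail_as_check(original: bytes, set_length: int = 8) -> Tuple[bytes, bytes]:
    """Return (data_to_write, verification_data) where the check is the tail of the data."""
    if not 0 < set_length <= len(original):
        raise ValueError("set_length must be between 1 and len(original)")
    # The data to be written is the original data, unchanged.
    return original, original[-set_length:]
```

In this sketch the stored bytes are exactly the original data, so the destination's layout is unaffected by the check.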
Specifically, the obtained data to be written includes metadata and other data besides the metadata, where the metadata describes the structure of the other data.
Step 502: The CPU of the source node writes the data to be written into the nonvolatile memory of the destination node through the network I/O device of the destination node.
Specifically, the CPU of the source node transmits the data to be written to the network I/O device of the source node, the network I/O device of the source node transmits the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the nonvolatile memory of the destination node.
Step 503: The CPU of the source node obtains, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, where the data of the set length is the set-length portion at the end of the written data.
Specifically, the CPU of the source node sends an atomic operation or a read operation of the remote direct memory access mode to the network I/O device of the source node, where the atomic operation or the read operation instructs retrieval of the set-length portion at the end of the written data, and the network I/O device of the source node transmits the atomic operation or the read operation to the network I/O device of the destination node;
The CPU of the source node receives, from the network I/O device of the destination node, the set-length portion at the end of the data that has been written to the nonvolatile memory of the destination node, obtained according to the atomic operation or the read operation.
Specifically, the data of the set length is obtained from the nonvolatile memory of the destination node according to the length of the verification data; that is, the length of the obtained data of the set length equals the length of the verification data.
The read operation in remote direct memory access mode is defined as follows: the CPU of the source node initiates an instruction to read data, which may carry characteristics of the data to be obtained, for example the data length and information indicating the data storage location. The instruction is transmitted through the network I/O device of the source node to the network I/O device of the destination node, which obtains the data indicated by the instruction directly from the nonvolatile memory of the destination device. The whole process takes place without participation of the destination node's CPU.
In the atomic operation of the remote direct memory access mode, data is read in the same way as in the read operation; the difference is that the execution of the atomic operation cannot be interrupted.
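The read instruction described above carries a data length and a storage location, and is served by the destination's network I/O device alone. A hedged Python model of that exchange is shown below; `ReadRequest`, `DestinationNic`, and `fetch_for_check` are hypothetical names, and a `bytearray` stands in for the destination's nonvolatile memory. No real RDMA library is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical read instruction issued by the source node's CPU."""
    offset: int  # indication of where the data is stored on the destination
    length: int  # number of bytes to fetch


class DestinationNic:
    """Stand-in for the destination's network I/O device; no destination CPU logic is modeled."""

    def __init__(self, nvm: bytearray) -> None:
        self._nvm = nvm

    def serve(self, request: ReadRequest) -> bytes:
        end = request.offset + request.length
        if request.offset < 0 or request.length < 0 or end > len(self._nvm):
            raise ValueError("read request outside the memory region")
        return bytes(self._nvm[request.offset:end])


def fetch_for_check(nic: DestinationNic, tail_offset: int, verification_data: bytes) -> bytes:
    """Request exactly as many bytes as the verification data contains."""
    return nic.serve(ReadRequest(offset=tail_offset, length=len(verification_data)))
```

Deriving the request length from the verification data keeps the two lengths equal by construction, which is the condition stated in the text.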
Step 504: When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup has succeeded.
Preferably, after obtaining the data of the set length from the nonvolatile memory of the destination node, the CPU of the source node determines that the data backup has succeeded when the number of bytes in the marker equals the number of bytes in the data of the set length, and the content of each byte of the data of the set length is identical to the content of the byte at the same position in the marker.
In this embodiment, the CPU of the source node directly operates the nonvolatile memory of the destination node through the network I/O device of the destination node, without participation of the destination node's CPU, both to write the data to be written into that nonvolatile memory and to obtain the set-length portion at the end of the data that has been written there. By comparing the obtained data of the set length with the verification data obtained by the CPU of the source node, it is determined whether the data has been completely backed up. The whole data transmission process is controlled unilaterally by the source node, without participation of the destination node's CPU, which improves the security of data transmission while minimizing the CPU resources consumed.
In the embodiments of the present invention, the source node may be a primary node and the destination node a standby node; alternatively, the source node may be a client and the destination node a primary node or a standby node.
Two embodiments are provided below, one for each data backup application scenario, to describe the data transmission process in detail.
In the first embodiment, as shown in Fig. 6, the detailed flow of data transmission between a primary node and a standby node in a distributed system is as follows:
Step 601: The CPU of the primary node obtains data to be written and verification data used to verify whether the data to be written has been completely backed up, where the verification data is identical to the set-length portion at the end of the data to be written, and writes the data to be written into the nonvolatile memory of the standby node through the network I/O device of the standby node.
In a specific implementation, the CPU of the primary node receives a request to write data and obtains the original data to be written according to the request. For example, the CPU of the primary node may obtain the original data to be written from the request itself, or it may obtain the original data to be written from the storage location indicated by the request.
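The two ways of obtaining the original data from a write request could be modeled as in the following Python sketch. The shape of `WriteRequest` (an inline `payload`, or a `location` and `length` pointing into local memory) is entirely an assumption; the source does not describe the request format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteRequest:
    """Hypothetical write request received by the primary node's CPU."""
    payload: Optional[bytes] = None  # original data carried in the request itself
    location: Optional[int] = None   # or: start of the original data in local memory
    length: int = 0                  # number of bytes at that location


def original_data(request: WriteRequest, local_memory: bytes) -> bytes:
    """Return the original data to be written, from the request or from the indicated location."""
    if request.payload is not None:
        return request.payload
    if request.location is None:
        raise ValueError("request carries neither data nor a storage location")
    return local_memory[request.location:request.location + request.length]
```

Either path yields the same kind of byte string, so the later steps (choosing verification data, writing, reading back) do not need to know how the data arrived.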
Specifically, after the original data to be written is obtained, there are the following two implementations:
First, the CPU of the primary node uses the obtained original data to be written as the data to be written, and uses the set-length portion at the end of the original data to be written as the verification data;
Second, after obtaining the original data to be written and a preset marker of the set length that is used to verify whether the original data has been completely backed up, the CPU of the primary node appends the marker to the end of the original data to be written, uses the data obtained after appending the marker as the data to be written, and uses the marker as the verification data.
For example, a preset 64-bit marker is appended to the end of the original data to be written.
In a specific implementation, the CPU of the primary node obtains the start position at which data can be stored in its own nonvolatile memory and uses that start position as the first data storage position in the nonvolatile memory of the primary node; the CPU of the primary node also obtains the start position at which data can be stored in the nonvolatile memory of the standby node and uses that start position as the second data storage position in the nonvolatile memory of the standby node.
The nonvolatile memory may be any storage medium that is persistent and does not lose data on power-off, for example NVDIMM, NVRAM, or SSD.
Step 602: Through the network I/O device of the slave node, the CPU of the host node obtains data of the set length from the data that has been written into the nonvolatile memory of the slave node. The data of the set length is the set-length portion at the end of the written data, counted from the end of the written data toward its beginning.
Specifically, the CPU of the host node obtains the size of the storage space occupied by the data to be written and, through the network I/O device of the slave node, directly obtains the set-length portion at the end of the data that occupies a storage space of that size, starting at the second data storage position, in the nonvolatile memory of the slave node.
Specifically, the CPU of the host node uses an atomic operation or a read operation of remote direct memory access (RDMA) and, without participation of the CPU of the slave node, directly obtains, via the network I/O device of the slave node, the set-length portion at the end of the data that occupies a storage space of that size, starting at the second data storage position, in the nonvolatile memory of the slave node. The advantage of the atomic operation is that it ensures that the access of the host node is not affected by the slave node.
In addition, the atomic operation can flush the caches on the transmission path (for example, the PCIe path), ensuring that data cached on the transmission path reaches the nonvolatile memory of the slave node, so that the safety of the data to be written is not affected.
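The address arithmetic behind this read can be sketched as follows. The sketch is an assumption-laden illustration in Python: the nonvolatile memory of the slave node is modelled as a local bytearray, and the names tail_range and read_tail are hypothetical; a real system would issue an RDMA read or atomic operation for the computed range instead of slicing a local buffer.

```python
from typing import Tuple


def tail_range(start: int, size: int, check_len: int) -> Tuple[int, int]:
    """Return the [low, high) byte range of the last check_len bytes of a
    region that begins at start and occupies size bytes."""
    if not 0 < check_len <= size:
        raise ValueError("check_len must be between 1 and size")
    end = start + size
    return end - check_len, end


def read_tail(nvm: bytearray, start: int, size: int, check_len: int) -> bytes:
    """Stand-in for the remote read: fetch only the tail of the written region."""
    low, high = tail_range(start, size, check_len)
    return bytes(nvm[low:high])
```

Only check_len bytes cross the network for the check, regardless of how large the written data is.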
Step 603: When the CPU of the host node determines that the data of the set length obtained from the slave node is identical to the verification data, it determines that the data backup has succeeded.
Specifically, when the CPU of the host node determines that the data of the set length obtained from the slave node differs from the marker bit, exception handling is entered.
In this embodiment, the CPU of the host node directly operates the nonvolatile memory of the slave node through the network I/O device of the slave node, without participation of the CPU of the slave node, to write the data to be written into the nonvolatile memory of the slave node. In the same way, the CPU of the host node obtains the set-length portion at the end of the data that has been written into that nonvolatile memory, and determines whether the data has been backed up completely by comparing the data of the set length obtained from the slave node with the verification data. The whole data transmission process is controlled unilaterally by the host node and requires no participation of the CPU of the slave node, so that whether the data has been backed up completely is verified, and the security of data transmission is improved, while occupying as few CPU resources as possible.
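The write-then-verify flow of steps 601 to 603 can be sketched end to end as follows. This is a simulation under stated assumptions, not an implementation of the embodiment: the class SimulatedDestination, its drop_after parameter (used to imitate an incomplete transfer), and the function backup are all hypothetical, and plain bytearray copies stand in for the one-sided operations performed through the network I/O device.

```python
from typing import Optional


class SimulatedDestination:
    """Hypothetical stand-in for the slave node's nonvolatile memory."""

    def __init__(self, capacity: int, drop_after: Optional[int] = None):
        self.nvm = bytearray(capacity)
        self.drop_after = drop_after  # if set, only this many bytes land

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.nvm):
            raise ValueError("write outside the simulated memory")
        landed = data if self.drop_after is None else data[: self.drop_after]
        self.nvm[offset : offset + len(landed)] = landed

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.nvm[offset : offset + length])


def backup(dest: SimulatedDestination, start: int, original: bytes, marker: bytes) -> bool:
    """Append the marker, write, read back the tail, and compare."""
    to_write = original + marker                          # step 601
    dest.write(start, to_write)
    tail_offset = start + len(to_write) - len(marker)
    fetched = dest.read(tail_offset, len(marker))         # step 602
    return fetched == marker                              # step 603
```

The source side drives every step; the simulated destination never runs any logic of its own, which mirrors the one-sided control described above.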
The data transmission process provided by the first embodiment is described below through a specific example, with reference to Fig. 7.
Step 1: The CPU of host node A appends a marker bit to the end of the obtained original data and uses the data obtained after appending the marker bit as the data to be written.
Step 2: The CPU of host node A writes the data to be written, which is held in its own memory, into its own nonvolatile memory (NVRAM). The InfiniBand (IB) hardware of the network I/O device of host node A obtains the data to be written from the memory of host node A over the PCIe path and sends it to the InfiniBand (IB) hardware of the network I/O device of slave node B, which writes the data to be written into the nonvolatile memory (NVRAM) of slave node B over the PCIe path.
Step 3: Through the IB hardware of its own network I/O device and the IB hardware of the network I/O device of slave node B, the CPU of host node A obtains, from the data saved in step 2 in the nonvolatile memory of slave node B, the portion at the end of that data whose length equals the marker bit length, and saves it into the memory of host node A.
The operation of step 3 may be performed immediately after the operation of step 2 is triggered, regardless of whether the data write in step 2 has completed. As long as step 3 returns successfully, step 2 is guaranteed to have succeeded.
Step 4: The CPU of host node A compares the data obtained in step 3, whose length equals the marker bit length, with the marker bit to judge whether the data has been backed up successfully. If the two differ, the data transmission process is abnormal and exception handling, such as retransmission, is required. If the two are identical, the data transmission has succeeded and subsequent operations can proceed.
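One possible form of the exception handling mentioned in step 4 is bounded retransmission, sketched below in Python. The retry limit, the function name backup_with_retry, and the use of callables for the write and the tail read are assumptions made for illustration; the embodiment only names retransmission as an example of exception handling.

```python
from typing import Callable


def backup_with_retry(
    write_once: Callable[[], None],
    read_tail: Callable[[], bytes],
    marker: bytes,
    max_attempts: int = 3,  # hypothetical limit, not given in the text
) -> int:
    """Repeat steps 2 to 4 until the tail read back equals the marker.

    Returns the number of the attempt that succeeded and raises
    RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        write_once()                 # step 2: (re)transmit the data
        if read_tail() == marker:    # steps 3 and 4: fetch tail and compare
            return attempt
    raise RuntimeError("backup not verified after %d attempts" % max_attempts)
```

A caller would pass closures that perform the actual transfer and the actual tail read for the chosen destination node.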
In the second embodiment, as shown in Fig. 8, the detailed flow of the method by which a client backs up data to a host node and a slave node in a distributed system is as follows:
Step 801: The CPU of the client obtains data to be written and verification data for verifying whether the data to be written has been backed up completely, where the verification data is identical to the set-length portion at the end of the data to be written. Through the network I/O device of the host node, the CPU of the client writes the data to be written directly into the nonvolatile memory of the host node according to the first data storage position.
In a specific implementation, the CPU of the client receives a data write request and obtains the original data to be written according to that request. For example, the CPU of the client may obtain the original data to be written from the write request itself, or from the storage location indicated by the write request.
Specifically, after the original data to be written is obtained, there are the following two implementations:
First, the CPU of the client uses the obtained original data to be written as the data to be written, and uses the set-length portion at the end of the original data to be written as the verification data;
Second, the CPU of the client obtains the original data to be written and a preset marker bit of the set length that is used to verify whether the original data has been backed up completely, appends the marker bit to the end of the original data to be written, uses the data obtained after appending the marker bit as the data to be written, and uses the marker bit as the verification data.
Specifically, the CPU of the client obtains the starting position at which data can be stored in the nonvolatile memory of the host node and uses that starting position as the first data storage position information in the nonvolatile memory of the host node.
The CPU of the client transfers the data to be written to the network I/O device of the client; the network I/O device of the client transfers the data to be written to the network I/O device of the host node; and the network I/O device of the host node writes the data to be written directly into the nonvolatile memory of the host node according to the first data storage position.
Step 802: Through the network I/O device of the slave node, the CPU of the client writes the data to be written directly into the nonvolatile memory of the slave node according to the second data storage position.
Specifically, the CPU of the client obtains the starting position at which data can be stored in the nonvolatile memory of the slave node and uses that starting position as the second data storage position information in the nonvolatile memory of the slave node.
The CPU of the client transfers the data to be written to the network I/O device of the client; the network I/O device of the client transfers the data to be written to the network I/O device of the slave node; and the network I/O device of the slave node writes the data to be written directly into the nonvolatile memory of the slave node according to the second data storage position.
Step 803: The CPU of the client obtains the size of the storage space occupied by the data to be written. Through the network I/O device of the host node, it directly obtains first data, namely the set-length portion at the end of the data that occupies a storage space of that size, starting at the first data storage position, in the nonvolatile memory of the host node. Through the network I/O device of the slave node, it directly obtains second data, namely the set-length portion at the end of the data that occupies a storage space of that size, starting at the second data storage position, in the nonvolatile memory of the slave node.
Specifically, the CPU of the client uses an atomic operation or a read operation of remote direct memory access (RDMA) and, without participation of the CPU of the host node, directly obtains the first data from the nonvolatile memory of the host node via the network I/O device of the host node.
Similarly, the CPU of the client uses an atomic operation or a read operation of RDMA and, without participation of the CPU of the slave node, directly obtains the second data from the nonvolatile memory of the slave node via the network I/O device of the slave node.
The advantage of the atomic operation is that it ensures that the access of the host node is not affected by the slave node.
Step 804: When the CPU of the client determines that the first data is identical to the verification data and the second data is identical to the verification data, it determines that the data backup has succeeded.
Specifically, when the CPU of the client determines that the first data differs from the verification data and/or the second data differs from the verification data, exception handling is entered.
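The decision in step 804 can be sketched as follows: the backup counts as successful only if the tail fetched from every node equals the verification data. The function names and the use of a dictionary keyed by node name are illustrative assumptions.

```python
from typing import Dict, List


def failed_nodes(tails: Dict[str, bytes], check: bytes) -> List[str]:
    """Return the names of the nodes whose fetched tail differs from the
    verification data; an empty list means every copy was verified."""
    return [name for name, tail in tails.items() if tail != check]


def backup_succeeded(first_data: bytes, second_data: bytes, check: bytes) -> bool:
    """Step 804: both the first data (host node) and the second data
    (slave node) must equal the verification data."""
    return not failed_nodes({"host": first_data, "slave": second_data}, check)
```

Returning the list of failing nodes lets the exception handling target only the node whose copy could not be verified.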
The data transmission process provided by the second embodiment is described below through a specific example, with reference to Fig. 9.
Step 1: The CPU of the client appends a marker bit to the end of the obtained original data, uses the data obtained after appending the marker bit as the data to be written, and saves the data to be written in memory.
Step 2: The CPU of the client obtains the data to be written from its own memory and sends it to the IB hardware of the network I/O device of host node A and to the IB hardware of the network I/O device of slave node B. The IB hardware of the network I/O device of host node A writes the data to be written into the nonvolatile memory (NVRAM) of host node A over the PCIe path, and the IB hardware of the network I/O device of slave node B writes the data to be written into the nonvolatile memory (NVRAM) of slave node B over the PCIe path.
Step 3: Using an RDMA atomic operation or read operation, the CPU of the client obtains, through the IB hardware of the network I/O device of host node A, first data from the nonvolatile memory of host node A, namely the set-length portion at the end of the data saved in step 2, and saves it in memory. Likewise, using an RDMA atomic operation or read operation, it obtains, through the IB hardware of the network I/O device of slave node B, second data from the nonvolatile memory of slave node B, namely the set-length portion at the end of the data saved in step 2, and saves it in memory. The CPU of the client judges whether the data has been backed up successfully according to the first data obtained from host node A, the second data obtained from slave node B, and the marker bit, and proceeds according to the result.
In this embodiment, the CPU of the client directly obtains the first data from the nonvolatile memory of the host node through the network I/O device of the host node, directly obtains the second data from the nonvolatile memory of the slave node through the network I/O device of the slave node, and determines whether the data has been backed up successfully on the host node and the slave node by comparing the first data, the second data, and the verification data. The whole data transmission process is controlled unilaterally by the client, which avoids repeated interaction between the controlling end and the host and slave nodes. The CPUs of the host node and the slave node do not participate in the data backup process at all, which simplifies the data transmission process between distributed nodes and reduces the data transmission delay between them. Whether the data has been backed up completely is thus verified, and the security of data transmission is improved, while occupying as few CPU resources as possible.
Based on the same inventive concept, the fourth embodiment of the present invention further provides a distributed node device. For the specific implementation of the distributed node device, refer to the description of the source node in the foregoing embodiments; the source node may specifically be a host node or a client, and repeated content is not described again. As shown in Fig. 10, the distributed node device mainly includes a CPU 1001 and a memory 1002, where the CPU 1001 is configured to read a program in the memory 1002 and perform the following steps according to the program:
Obtain data to be written and verification data for verifying whether the data to be written has been backed up completely, where the verification data is identical to the set-length portion at the end of the data to be written; write the data to be written into the nonvolatile memory of the destination node through the network I/O device of the destination node; obtain, through the network I/O device of the destination node, data of the set length from the data that has been written into the nonvolatile memory of the destination node, where the data of the set length is the set-length portion at the end of the written data; and determine that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data.
In one specific implementation, the device further includes an internal memory 1003, configured to save the original data to be written and a preset marker bit of the set length that is used to verify whether the original data has been backed up completely;
The CPU 1001 is specifically configured to:
Obtain the original data to be written and the marker bit from the internal memory, append the marker bit to the end of the original data to be written, use the data obtained after appending the marker bit as the data to be written, and use the marker bit as the verification data;
After the data of the set length is obtained, determine that the data backup has succeeded when the number of bytes contained in the marker bit equals the number of bytes contained in the data of the set length, and the content of each byte of the data of the set length is identical to the content of the byte at the same position in the marker bit.
In another specific implementation, the device further includes an internal memory 1003, configured to save the original data to be written;
The CPU 1001 is specifically configured to:
Obtain the original data to be written from the internal memory, use the original data to be written as the data to be written, and use the set-length portion at the end of the original data to be written as the verification data;
After the data of the set length is obtained, determine that the data backup has succeeded when the number of bytes contained in the verification data equals the number of bytes contained in the data of the set length, and the content of each byte of the data of the set length is identical to the content of the byte at the same position in the verification data.
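The comparison used in both implementations above (equal byte counts, then equal content at every byte position) can be written out explicitly as follows. The function name tails_match is an assumption made for illustration; in Python the same result is given by the built-in equality of two bytes objects, and the loop is spelled out only to mirror the wording of the text.

```python
def tails_match(check: bytes, fetched: bytes) -> bool:
    """Return True only if both inputs have the same number of bytes and
    the byte at every position is identical."""
    if len(check) != len(fetched):
        return False
    for expected, actual in zip(check, fetched):
        if expected != actual:
            return False
    return True
```

Checking the length first matters because a shorter fetched value could otherwise match on every position that was compared.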
Preferably, the CPU 1001 is specifically configured to:
Send an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the distributed node device, where the atomic operation or the read operation instructs obtaining the set-length portion at the end of the written data, and the network I/O device of the distributed node device transfers the atomic operation or the read operation to the network I/O device of the destination node;
Receive, from the network I/O device of the destination node, the data of the set length that was obtained according to the atomic operation or the read operation, namely the set-length portion at the end of the data that has been written into the nonvolatile memory of the destination node.
Preferably, the CPU 1001 is specifically configured to:
Transfer the data to be written to the network I/O device of the distributed node device, so that the network I/O device of the distributed node device transfers the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the nonvolatile memory of the destination node.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

1. A data transmission method between distributed nodes, characterized in that the method is applied in a distributed storage system and comprises:
obtaining, by a central processing unit (CPU) of a source node, data to be written and verification data for verifying whether the data to be written has been backed up completely, wherein the verification data is identical to the set-length portion at the end of the data to be written;
writing, by the CPU of the source node, the data to be written into a nonvolatile memory of a destination node through a network input/output (I/O) device of the destination node;
obtaining, by the CPU of the source node through the network I/O device of the destination node, data of the set length from the data that has been written into the nonvolatile memory of the destination node, wherein the data of the set length is the set-length portion at the end of the written data;
determining, by the CPU of the source node, that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data.
2. The method according to claim 1, characterized in that the obtaining, by the CPU of the source node, of the data to be written and the verification data for verifying whether the data to be written has been backed up completely comprises:
obtaining, by the CPU of the source node, original data to be written and a preset marker bit of the set length for verifying whether the original data has been backed up completely, appending the marker bit to the end of the original data to be written, using the data obtained after appending the marker bit as the data to be written, and using the marker bit as the verification data;
and the determining, by the CPU of the source node, that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data comprises:
determining, by the CPU of the source node, that the data backup has succeeded when the number of bytes contained in the marker bit equals the number of bytes contained in the data of the set length obtained from the destination node, and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bit.
3. The method according to claim 1, characterized in that the obtaining, by the CPU of the source node, of the data to be written and the verification data for verifying whether the data to be written has been backed up completely comprises:
obtaining, by the CPU of the source node, original data to be written, using the original data to be written as the data to be written, and using the set-length portion at the end of the original data to be written as the verification data;
and the determining, by the CPU of the source node, that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data comprises:
determining, by the CPU of the source node, that the data backup has succeeded when the number of bytes contained in the verification data equals the number of bytes contained in the data of the set length obtained from the destination node, and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data.
4. The method according to claim 1, 2 or 3, characterized in that the obtaining, by the CPU of the source node through the network I/O device of the destination node, of the data of the set length from the data that has been written into the nonvolatile memory of the destination node, the data of the set length being the set-length portion at the end of the written data, comprises:
sending, by the CPU of the source node, an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the source node, wherein the atomic operation or the read operation instructs obtaining the set-length portion at the end of the written data;
transferring, by the network I/O device of the source node, the atomic operation or the read operation to the network I/O device of the destination node;
receiving, by the CPU of the source node from the network I/O device of the destination node, the data of the set length that was obtained according to the atomic operation or the read operation, namely the set-length portion at the end of the data that has been written into the nonvolatile memory of the destination node.
5. The method according to claim 1, 2 or 3, characterized in that the writing, by the CPU of the source node, of the data to be written into the nonvolatile memory of the destination node through the network input/output (I/O) device of the destination node comprises:
transferring, by the CPU of the source node, the data to be written to the network I/O device of the source node;
transferring, by the network I/O device of the source node, the data to be written to the network I/O device of the destination node, and writing, by the network I/O device of the destination node, the data to be written into the nonvolatile memory of the destination node.
6. A distributed node device, characterized in that the device is applied in a distributed storage system and comprises a central processing unit (CPU) and a memory;
the CPU is configured to read a program in the memory and perform the following steps:
obtaining data to be written and verification data for verifying whether the data to be written has been backed up completely, wherein the verification data is identical to the set-length portion at the end of the data to be written; writing the data to be written into a nonvolatile memory of a destination node through a network input/output (I/O) device of the destination node; obtaining, through the network I/O device of the destination node, data of the set length from the data that has been written into the nonvolatile memory of the destination node, wherein the data of the set length is the set-length portion at the end of the written data; and determining that the data backup has succeeded when the data of the set length obtained from the destination node is identical to the verification data.
7. The device according to claim 6, characterized in that the device further comprises an internal memory, configured to save original data to be written and a preset marker bit of the set length for verifying whether the original data has been backed up completely;
the CPU is specifically configured to:
obtain the original data to be written and the marker bit from the internal memory, append the marker bit to the end of the original data to be written, use the data obtained after appending the marker bit as the data to be written, and use the marker bit as the verification data;
after the data of the set length is obtained, determine that the data backup has succeeded when the number of bytes contained in the marker bit equals the number of bytes contained in the data of the set length obtained from the destination node, and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bit.
8. The device according to claim 6, characterized in that the device further comprises an internal memory, configured to save original data to be written;
the CPU is specifically configured to:
obtain the original data to be written from the internal memory, use the original data to be written as the data to be written, and use the set-length portion at the end of the original data to be written as the verification data;
after the data of the set length is obtained, determine that the data backup has succeeded when the number of bytes contained in the verification data equals the number of bytes contained in the data of the set length obtained from the destination node, and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data.
9. The device according to claim 6, 7 or 8, characterized in that the CPU is specifically configured to:
send an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the distributed node device, wherein the atomic operation or the read operation instructs obtaining the set-length portion at the end of the written data, and the network I/O device of the distributed node device transfers the atomic operation or the read operation to the network I/O device of the destination node;
receive, from the network I/O device of the destination node, the data of the set length that was obtained according to the atomic operation or the read operation, namely the set-length portion at the end of the data that has been written into the nonvolatile memory of the destination node.
10. The device according to claim 6, 7 or 8, characterized in that the CPU is specifically configured to:
transfer the data to be written to the network I/O device of the distributed node device, so that the network I/O device of the distributed node device transfers the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the nonvolatile memory of the destination node.
CN201410598754.1A 2014-10-30 2014-10-30 Data transmission method and distributed node equipment between distributed node Active CN104317716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410598754.1A CN104317716B (en) 2014-10-30 2014-10-30 Data transmission method and distributed node equipment between distributed node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410598754.1A CN104317716B (en) 2014-10-30 2014-10-30 Data transmission method and distributed node equipment between distributed node

Publications (2)

Publication Number Publication Date
CN104317716A CN104317716A (en) 2015-01-28
CN104317716B true CN104317716B (en) 2017-10-24

Family

ID=52372951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410598754.1A Active CN104317716B (en) 2014-10-30 2014-10-30 Data transmission method and distributed node equipment between distributed node

Country Status (1)

Country Link
CN (1) CN104317716B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453460B (en) * 2015-08-12 2021-01-08 腾讯科技(深圳)有限公司 File distribution method, device and system
CN106445409A (en) * 2016-09-13 2017-02-22 郑州云海信息技术有限公司 Distributed block storage data writing method and device
CN107592361B (en) * 2017-09-20 2020-05-29 郑州云海信息技术有限公司 Data transmission method, device and equipment based on dual IB network
CN108494817B (en) * 2018-02-08 2022-03-04 华为技术有限公司 Data transmission method, related device and system
CN110691062B (en) * 2018-07-06 2021-01-26 浙江大学 Data writing method, device and equipment
CN110955734B (en) * 2020-02-13 2020-08-21 北京一流科技有限公司 Distributed signature decision system and method for logic node

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188569A (en) * 2006-11-16 2008-05-28 Rao Daping Method for constructing data quanta space in network and distributed file storage system
CN101577716A (en) * 2009-06-10 2009-11-11 Institute of Computing Technology, Chinese Academy of Sciences Distributed storage method and system based on InfiniBand network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243858A1 (en) * 2006-08-01 2008-10-02 Latitude Broadband, Inc. Design and Methods for a Distributed Database, Distributed Processing Network Management System

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
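Research and Application of RDMA Technology; Liu Tianhua et al.; Journal of Shenyang Normal University (Natural Science Edition); 2006-04-30; Vol. 24, No. 2; pp. 185-188 *
Design of a CAN Bus Control System for Engineering Equipment; Shen Dongxiang et al.; Microcomputer Information; 2008-01-31; Vol. 24, No. 1-2; pp. 50-52 *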

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant