CN104317716B - Data transmission method between distributed nodes and distributed node device
- Publication number
- CN104317716B (application CN201410598754.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- written
- node
- cpu
- destination node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data transmission method between distributed nodes and a distributed node device, which improve the security of data transfer while keeping CPU usage as low as possible. In the method, the CPU of a source node obtains data to be written and verification data, where the verification data is identical to the portion of the data to be written that has a set length counted backward from the end of that data, and writes the data to be written into the non-volatile memory of a destination node through the network I/O device of the destination node. The CPU of the source node then obtains, through the network I/O device of the destination node, data of the set length from the data already written in the non-volatile memory of the destination node, namely the data of the set length counted backward from the end of the written data. When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup succeeded.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data transmission method between distributed nodes and a distributed node device.
Background
In current distributed storage systems, data is spread across multiple distributed nodes. To ensure reliability, data is stored as multiple copies, that is, each piece of data is stored on several distributed nodes, so large amounts of data have to be copied between distributed nodes frequently. In the distributed storage system shown in Fig. 1, a copy of the data in distributed node A is stored in distributed node C, and a copy of the data in distributed node B is stored in distributed node D.
High-end storage requires high throughput and low latency. When ordinary Ethernet transfers data, both the sending end and the receiving end need the central processing unit (CPU) to take part in moving the data, and the operating system has to switch context frequently between kernel mode and user mode. An InfiniBand (IB) network, by contrast, provides remote direct memory access (RDMA). With RDMA, the sending end and the receiving end transfer data without CPU participation, which greatly increases the speed of data transfer.
RDMA allows data to be transferred directly from the memory of one computer to the memory of another. The basic principle is shown in Fig. 2: the CPU does not take part in the data transfer itself (steps 2 and 4 in Fig. 2) and is involved only in the control that precedes the transfer (step 1 in Fig. 2) and in the notification after the transfer completes (step 5 in Fig. 2).
RDMA mainly provides the following communication operations between nodes: send (Send), write (Write), read (Read) and atomic (Atomic). The data transfer of each operation is carried out by the IB hardware, and the host takes no part in sending or receiving the data. The operations differ, however, in the processing needed to initiate the transfer and after the transfer completes, as shown in Fig. 3.
The send operation is performed according to steps 1 to 4. The hosts of both the sending end and the receiving end must take part in negotiation before the data transfer is initiated and before the data is received, and when sending and receiving complete, each end needs a notification message (Notify) to inform the application on its host;
The write operation is performed according to steps 1 to 4. The host at the receiving end performs no operation at all: it takes part neither in the data transfer, nor in the negotiation before the transfer, nor in the notification after the transfer completes. The host at the sending end takes part in the notification before the data is sent and in the notification after sending completes;
The read operation is performed according to steps 1 to 4. The receiving end reads data from the sending end, and the sending end takes no part in the transfer, in the notification before the read, or in the notification after the read completes. The receiving end has to initiate the read operation and has to notify its host after the read completes;
The atomic (Atomic) operation is performed according to steps 1 to 3. One end transmits 64 bits of data to the peer and operates on 64 bits of data at a fixed location on the peer, for example with an atomic add, atomic subtract or atomic replace. The receiving end takes no part in the data transfer, and the sending end needs to perform a notification operation.
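As a non-limiting illustration, the participation rules described above for the four operations can be written down as a small lookup table. The names OpProfile, RDMA_OPS and is_one_sided are hypothetical and introduced only for this sketch; the "responder" is the passive end of each operation (the receiving end for send, write and atomic, and the end holding the data for read).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpProfile:
    initiator_notified: bool      # the initiating host handles a notification
    responder_cpu_involved: bool  # the passive host's CPU has to take part


# Values transcribed from the description of Fig. 3 above.
RDMA_OPS = {
    "send": OpProfile(initiator_notified=True, responder_cpu_involved=True),
    "write": OpProfile(initiator_notified=True, responder_cpu_involved=False),
    "read": OpProfile(initiator_notified=True, responder_cpu_involved=False),
    "atomic": OpProfile(initiator_notified=True, responder_cpu_involved=False),
}


def is_one_sided(op: str) -> bool:
    """Return True if the passive host's CPU stays out of the operation."""
    return not RDMA_OPS[op].responder_cpu_involved
```

Under this reading, write, read and atomic are the operations a node can use without involving the peer's CPU, whereas send requires both hosts.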
In existing distributed systems, when data needs to be transferred between distributed nodes, the flow shown in Fig. 4 is generally used, as follows:
The sending end prepares the information about the buffer (Buff) to be sent (such as the buffer address, length and permissions) and sends the Buff information to the receiving end with an RDMA send operation; the hardware notifies the application that the message was sent successfully;
After the receiving-end hardware receives the message, it notifies the local application, which initiates an RDMA read operation according to the Buff information of the sending end; the receiving-end hardware completes the read operation, reading the data in the memory of the sending end into local memory, and notifies the receiving-end application;
After receiving the notification, the receiving-end application determines that the data has been written to non-volatile memory and sends the sending end a response message indicating that the data has been read;
After the sending-end hardware receives the response message, it notifies the local application; on obtaining the response message, the local application determines that the peer has read the data successfully and can notify the client that the input/output is complete.
In this process, the receiving-end CPU has to take part in controlling the data transfer and in handling notifications, the receiving end and the sending end have to interact several times to complete the transfer, and the interaction cannot establish whether the data has been backed up completely to the non-volatile memory of the receiving end. If the system starts subsequent data processing while the backup is incomplete, the security of the system is affected.
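To make the cost of the existing flow concrete, the four steps of Fig. 4 described above can be listed together with the side whose application has to act. This is only an illustrative tally; the list PRIOR_ART_FLOW and the helper cpu_steps are hypothetical names, and the step labels paraphrase the text above.

```python
# Each entry: (action taken in the existing flow, side whose application acts)
PRIOR_ART_FLOW = [
    ("prepare Buff info and post RDMA send", "sender"),
    ("handle notification and post RDMA read", "receiver"),
    ("handle read completion and send response", "receiver"),
    ("handle response and complete client I/O", "sender"),
]


def cpu_steps(flow, side):
    """Count the steps in which the given side's application has to act."""
    return sum(1 for _, actor in flow if actor == side)


if __name__ == "__main__":
    print("receiver-side steps:", cpu_steps(PRIOR_ART_FLOW, "receiver"))
    print("sender-side steps:", cpu_steps(PRIOR_ART_FLOW, "sender"))
```

In this tally the receiving end's application acts in two of the four steps, which is the CPU involvement the background identifies as a drawback.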
Summary of the invention
The embodiments of the present invention provide a data transmission method between distributed nodes and a distributed node device, so as to improve the security of data transfer while reducing the CPU resources consumed.
The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a data transmission method between distributed nodes is provided, including:
The central processing unit (CPU) of a source node obtains data to be written and verification data used to verify whether the data to be written has been backed up completely, where the verification data is identical to the portion of the data to be written that has a set length counted backward from the end of the data to be written;
The CPU of the source node writes the data to be written into the non-volatile memory of a destination node through the network input/output (I/O) device of the destination node;
The CPU of the source node obtains, through the network I/O device of the destination node, data of the set length from the data already written in the non-volatile memory of the destination node, where the data of the set length is the portion of the written data that has the set length counted backward from the end of the written data;
When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup succeeded.
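The four steps of the first aspect can be sketched as follows. This is a minimal simulation under stated assumptions, not the claimed implementation: SimulatedNvm is a hypothetical in-process stand-in for the destination node's non-volatile memory, its write and read methods merely model one-sided accesses served by the destination's network I/O device, and the lost_tail parameter is an invented way to simulate a transfer that stopped early. The set length check_len is assumed to be greater than zero and no larger than the data.

```python
class SimulatedNvm:
    """Hypothetical stand-in for the destination node's non-volatile memory.

    write() and read() model one-sided accesses; no destination-CPU logic
    runs anywhere in this sketch.
    """

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def write(self, offset: int, data: bytes, lost_tail: int = 0) -> None:
        # lost_tail > 0 simulates a write whose last bytes never arrived.
        kept = data[: len(data) - lost_tail]
        self.cells[offset : offset + len(kept)] = kept

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.cells[offset : offset + length])


def backup_and_verify(nvm: SimulatedNvm, offset: int, data: bytes,
                      check_len: int, lost_tail: int = 0) -> bool:
    """Write data, read back its last check_len bytes, and compare."""
    check = data[-check_len:]                       # verification data
    nvm.write(offset, data, lost_tail=lost_tail)    # write to destination
    tail_offset = offset + len(data) - check_len    # where the tail should be
    fetched = nvm.read(tail_offset, check_len)      # fetch the written tail
    return fetched == check                         # backup complete?
```

For example, writing the 16-byte payload b"payload-ENDMARK!" with check_len set to 8 verifies successfully, while the same call with lost_tail set to 3 reports failure because the last three bytes at the destination are still zero.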
With reference to the first aspect, in a first possible implementation, the CPU of the source node obtaining the data to be written and the verification data used to verify whether the data to be written has been backed up completely includes:
The CPU of the source node obtains raw data to be written and preset marker bits that have the set length and are used to verify whether the raw data has been backed up completely, appends the marker bits to the end of the raw data to be written, uses the data obtained after appending the marker bits as the data to be written, and uses the marker bits as the verification data;
The CPU of the source node determining that the data backup succeeded when the data of the set length obtained from the destination node is identical to the verification data includes:
The CPU of the source node determines that the number of bytes in the marker bits equals the number of bytes in the data of the set length obtained from the destination node, and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bits, and then determines that the data backup succeeded.
With reference to the first aspect, in a second possible implementation, the CPU of the source node obtaining the data to be written and the verification data used to verify whether the data to be written has been backed up completely includes:
The CPU of the source node obtains raw data to be written, uses the raw data to be written as the data to be written, and uses as the verification data the portion of the raw data to be written that has the set length counted backward from the end of the raw data;
The CPU of the source node determining that the data backup succeeded when the data of the set length obtained from the destination node is identical to the verification data includes:
The CPU of the source node determines that the number of bytes in the verification data equals the number of bytes in the data of the set length obtained from the destination node, and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data, and then determines that the data backup succeeded.
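The comparison rule shared by both implementations (equal byte count, then identical content at every byte position) can be sketched as below. The function name matches is hypothetical; in Python the same result is obtained with a plain equality test on two bytes objects, and the explicit form is shown only to mirror the wording of the rule.

```python
def matches(check: bytes, fetched: bytes) -> bool:
    """Apply the two-part rule: same number of bytes, same byte at each position."""
    if len(check) != len(fetched):
        return False
    return all(expected == actual for expected, actual in zip(check, fetched))
```

A call such as matches(marker, fetched_tail) returning True corresponds to determining that the data backup succeeded.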
With reference to the first aspect or either of the first and second possible implementations, in a third possible implementation, the CPU of the source node obtaining, through the network I/O device of the destination node, the data of the set length counted backward from the end of the data already written in the non-volatile memory of the destination node includes:
The CPU of the source node sends an atomic operation or a read operation in RDMA mode to the network I/O device of the source node, where the atomic operation or the read operation instructs retrieval of the data of the set length counted backward from the end of the data already written;
The network I/O device of the source node transfers the atomic operation or the read operation to the network I/O device of the destination node;
The CPU of the source node receives, from the network I/O device of the destination node, the data obtained according to the atomic operation or the read operation, namely the data of the set length counted backward from the end of the data already written in the non-volatile memory of the destination node.
With reference to the first aspect or either of the first and second possible implementations, in a fourth possible implementation, the CPU of the source node writing the data to be written into the non-volatile memory of the destination node through the network I/O device of the destination node includes:
The CPU of the source node transfers the data to be written to the network I/O device of the source node;
The network I/O device of the source node transfers the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the non-volatile memory of the destination node.
In a second aspect, the present invention provides a distributed node device, including a central processing unit (CPU) and a memory;
The CPU is configured to read a program in the memory and perform the following steps:
Obtain data to be written and verification data used to verify whether the data to be written has been backed up completely, where the verification data is identical to the portion of the data to be written that has a set length counted backward from the end of the data to be written; write the data to be written into the non-volatile memory of a destination node through the network input/output (I/O) device of the destination node; obtain, through the network I/O device of the destination node, data of the set length from the data already written in the non-volatile memory of the destination node, where the data of the set length is the portion of the written data that has the set length counted backward from the end of the written data; and, when it is determined that the data of the set length obtained from the destination node is identical to the verification data, determine that the data backup succeeded.
With reference to the second aspect, in a first possible implementation, the device further includes a main memory configured to store raw data to be written and preset marker bits that have the set length and are used to verify whether the raw data has been backed up completely;
The CPU is specifically configured to:
Obtain the raw data to be written and the marker bits from the main memory, append the marker bits to the end of the raw data to be written, use the data obtained after appending the marker bits as the data to be written, and use the marker bits as the verification data;
After obtaining the data of the set length, determine that the number of bytes in the marker bits equals the number of bytes in the data of the set length obtained from the destination node, and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bits, and then determine that the data backup succeeded.
With reference to the second aspect, in a second possible implementation, the device further includes a main memory configured to store raw data to be written;
The CPU is specifically configured to:
Obtain the raw data to be written from the main memory, use the raw data to be written as the data to be written, and set as the verification data the portion of the raw data to be written that has the set length counted backward from the end of the raw data;
After obtaining the data of the set length, determine that the number of bytes in the verification data equals the number of bytes in the data of the set length obtained from the destination node, and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data, and then determine that the data backup succeeded.
With reference to the second aspect or either of the first and second possible implementations, in a third possible implementation, the CPU is specifically configured to:
Send an atomic operation or a read operation in RDMA mode to the network I/O device of the distributed node device, where the atomic operation or the read operation instructs retrieval of the data of the set length counted backward from the end of the data already written, and have the network I/O device of the distributed node device transfer the atomic operation or the read operation to the network I/O device of the destination node;
Receive, from the network I/O device of the destination node, the data obtained according to the atomic operation or the read operation, namely the data of the set length counted backward from the end of the data already written in the non-volatile memory of the destination node.
With reference to the second aspect or either of the first and second possible implementations, in a fourth possible implementation, the CPU is specifically configured to:
Transfer the data to be written to the network I/O device of the distributed node device, so that the network I/O device of the distributed node device transfers the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the non-volatile memory of the destination node.
Based on the above technical solutions, in the embodiments of the present invention the CPU of the source node operates the non-volatile memory of the destination node directly through the network I/O device of the destination node, without participation of the CPU of the destination node, both to write the data to be written into that non-volatile memory and to obtain the data of the set length counted backward from the end of the data already written there. By comparing the obtained data of the set length with the verification data obtained by the CPU of the source node, it is determined whether the data has been backed up completely. The whole data transfer is controlled unilaterally by the source node without participation of the CPU of the destination node, which improves the security of data transfer while keeping CPU usage as low as possible.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of an existing distributed storage system;
Fig. 2 is a schematic diagram of the existing RDMA principle;
Fig. 3 is a schematic diagram of the existing RDMA communication operations;
Fig. 4 is a schematic flowchart of data transfer between existing distributed nodes;
Fig. 5 is a schematic flowchart of the data transmission method in a distributed system in an embodiment of the present invention;
Fig. 6 is a schematic flowchart of the data transmission method between host and slave nodes in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the process of saving data between host and slave nodes in an embodiment of the present invention;
Fig. 8 is a schematic flowchart of the method by which a client saves data on a host node and a slave node in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process by which a client saves data on a host node and a slave node in an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of the distributed node device in an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the following embodiments, each host node that stores data in the distributed system is assigned a slave node in advance, and the slave node stores a copy of the data stored on the host node. Storing data in the distributed system requires that the data be saved successfully on both the host node and the slave node.
In the following embodiments, the non-volatile memory of a distributed node can be accessed directly through a network interface by a network input/output (I/O) device. For example, the non-volatile memory of a distributed node (such as PCIe NVRAM) can be accessed directly by a network device with a PCIe interface (such as an InfiniBand or RoCE device).
In an embodiment of the present invention, as shown in Fig. 5, the detailed flow of the data transmission method in a distributed system is as follows:
Step 501: The CPU of the source node obtains data to be written and verification data used to verify whether the data to be written has been backed up completely, where the verification data is identical to the portion of the data to be written that has a set length counted backward from the end of the data to be written.
The CPU of the source node may obtain the data to be written and the verification data in ways that include, but are not limited to, the following two implementations:
First, the CPU of the source node obtains raw data to be written and preset marker bits that have the set length and are used to verify whether the raw data has been backed up completely, appends the marker bits to the end of the raw data to be written, uses the data obtained after appending the marker bits as the data to be written, and uses the marker bits as the verification data.
The length of the marker bits is specified in advance, and the value of each of the marker bits is also preset. For example, the length of the marker bits is specified in advance as 64 bits.
Second, the CPU of the source node obtains raw data to be written, uses the raw data to be written as the data to be written, and uses as the verification data the portion of the raw data to be written that has the set length counted backward from the end of the raw data.
Specifically, the obtained data to be written includes metadata and other data besides the metadata, where the metadata describes the structure of the other data.
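The two ways of preparing the data to be written and the verification data in step 501 can be contrasted in a short sketch. The function names and the marker value DEFAULT_MARKER are hypothetical; the text above fixes only that the marker bits have a preset length (64 bits in the example) and preset values, not what those values are.

```python
# Hypothetical 64-bit marker value; the actual preset value is not specified.
DEFAULT_MARKER = bytes.fromhex("a5a5a5a55a5a5a5a")


def prepare_with_marker(raw: bytes, marker: bytes = DEFAULT_MARKER):
    """First way: append preset marker bits; the marker is the verification data."""
    return raw + marker, marker


def prepare_with_tail(raw: bytes, check_len: int):
    """Second way: write the raw data unchanged; its tail is the verification data."""
    if not 0 < check_len <= len(raw):
        raise ValueError("check_len must be between 1 and len(raw)")
    return raw, raw[-check_len:]
```

In both cases the verification data equals the last bytes of what is actually written, which is the property required in step 501. The first way lengthens the written data by the marker length, while the second leaves the payload size unchanged.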
Step 502: The CPU of the source node writes the data to be written into the non-volatile memory of the destination node through the network I/O device of the destination node.
Specifically, the CPU of the source node transfers the data to be written to the network I/O device of the source node, the network I/O device of the source node transfers the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writes the data to be written into the non-volatile memory of the destination node.
Step 503: The CPU of the source node obtains, through the network I/O device of the destination node, data of the set length from the data already written in the non-volatile memory of the destination node, where the data of the set length is the portion of the written data that has the set length counted backward from the end of the written data.
Specifically, the CPU of the source node sends an atomic operation or a read operation in RDMA mode to the network I/O device of the source node, where the atomic operation or the read operation instructs retrieval of the data of the set length counted backward from the end of the data already written, and the network I/O device of the source node transfers the atomic operation or the read operation to the network I/O device of the destination node;
The CPU of the source node receives, from the network I/O device of the destination node, the data obtained according to the atomic operation or the read operation, namely the data of the set length counted backward from the end of the data already written in the non-volatile memory of the destination node.
Specifically, the data of the set length is obtained from the non-volatile memory of the destination node according to the length of the verification data, that is, the length of the obtained data of the set length equals the length of the verification data.
The read operation in RDMA mode is defined as follows: the CPU of the source node initiates an instruction to read data, and the instruction can carry characteristics of the data to be obtained, for example the data length and information indicating the data storage location. The instruction is transferred through the network I/O device of the source node to the network I/O device of the destination node, and the network I/O device of the destination node obtains the data indicated by the instruction directly from the non-volatile memory of the destination node. The CPU of the destination node takes no part in the whole process.
The process by which an atomic operation in RDMA mode reads data is the same as the execution of a read operation; the difference is that the execution of an atomic operation cannot be interrupted.
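The choice between the two operations can be illustrated with an in-process simulation. The sketch below assumes, purely for illustration, that the atomic operation used to fetch the tail is a 64-bit fetch-and-add with an addend of zero, which returns the stored value and leaves it unchanged; the text above does not say which atomic operation is used. The functions rdma_read, atomic_fetch_and_add and fetch_tail are hypothetical simulations over a bytearray, not calls into any RDMA library, and real hardware typically imposes alignment requirements on atomic targets that this sketch ignores.

```python
import struct


def rdma_read(nvm: bytearray, offset: int, length: int) -> bytes:
    """Simulated one-sided read of `length` bytes starting at `offset`."""
    return bytes(nvm[offset : offset + length])


def atomic_fetch_and_add(nvm: bytearray, offset: int, addend: int) -> bytes:
    """Simulated 64-bit fetch-and-add: store old + addend, return the old 8 bytes."""
    (old,) = struct.unpack_from("<Q", nvm, offset)
    struct.pack_into("<Q", nvm, offset, (old + addend) % 2**64)
    return struct.pack("<Q", old)


def fetch_tail(nvm: bytearray, end_offset: int, check_len: int) -> bytes:
    """Fetch the last check_len bytes of the data that ends at end_offset."""
    start = end_offset - check_len
    if check_len == 8:
        # Atomic operations act on 64 bits, so they fit a 64-bit marker.
        return atomic_fetch_and_add(nvm, start, 0)
    return rdma_read(nvm, start, check_len)
```

With a 64-bit marker the atomic path applies; for any other set length the sketch falls back to a read operation.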
Step 504: When the CPU of the source node determines that the data of the set length obtained from the destination node is identical to the verification data, it determines that the data backup succeeded.
Preferably, after the CPU of the source node obtains the data of the set length from the non-volatile memory of the destination node, it determines that the number of bytes in the marker bits equals the number of bytes in the data of the set length, and that the content of each byte of the data of the set length is identical to the content of the byte at the same position in the marker bits, and then determines that the data backup succeeded.
In this embodiment, the CPU of the source node operates the non-volatile memory of the destination node directly through the network I/O device of the destination node, without participation of the CPU of the destination node, both to write the data to be written into that non-volatile memory and to obtain the data of the set length counted backward from the end of the data already written there. By comparing the obtained data of the set length with the verification data obtained by the CPU of the source node, it is determined whether the data has been backed up completely. The whole data transfer is controlled unilaterally by the source node without participation of the CPU of the destination node, which improves the security of data transfer while keeping CPU usage as low as possible.
In the embodiments of the present invention, the source node may be a host node and the destination node a slave node; alternatively, the source node may be a client and the destination node a host node or a slave node.
Two embodiments are provided below to describe the data transfer process in detail for different data backup scenarios.
In the first embodiment, as shown in Fig. 6, the detailed flow of the method for transferring data between host and slave nodes in a distributed system is as follows:
Step 601: The CPU of the host node obtains data to be written and verification data used to verify whether the data to be written has been backed up completely, where the verification data is identical to the portion of the data to be written that has a set length counted backward from the end of the data to be written, and writes the data to be written into the non-volatile memory of the slave node through the network I/O device of the slave node.
In a specific implementation, the CPU of the host node receives a request to write data and obtains the raw data to be written according to the request. For example, the CPU of the host node can obtain the raw data to be written from the request itself, or it can obtain the raw data to be written from the storage location indicated by the request.
Specifically, after the raw data to be written is obtained, there are the following two implementations:
First, the CPU of the host node uses the obtained raw data to be written as the data to be written, and uses as the verification data the portion of the raw data to be written that has the set length counted backward from the end of the raw data;
Second, after obtaining the raw data to be written and preset marker bits that have the set length and are used to verify whether the raw data has been backed up completely, the CPU of the host node appends the marker bits to the end of the raw data to be written, uses the data obtained after appending the marker bits as the data to be written, and uses the marker bits as the verification data.
For example, marker bits with a preset length of 64 bits are appended to the end of the raw data to be written.
In a specific implementation, the CPU of the host node obtains the start position at which data can be stored in its own non-volatile memory and uses it as the first data storage position in the non-volatile memory of the host node; the CPU of the host node also obtains the start position at which data can be stored in the non-volatile memory of the slave node and uses it as the second data storage position in the non-volatile memory of the slave node.
The non-volatile memory may be any storage medium that is persistent and retains its contents when powered off, for example NVDIMM, NVRAM or SSD.
Step 602: Through the network I/O device of the slave node, the CPU of the host node obtains data of the set length from the data that has been written to the nonvolatile memory of the slave node; this data is the set-length portion immediately preceding the end of the written data.
Specifically, the CPU of the host node obtains the size of the storage space occupied by the data to be written and, through the network I/O device of the slave node, directly reads the nonvolatile memory of the slave node: from the data that occupies a storage space of that size starting at the second data storage position, it obtains the set-length portion immediately preceding the end of the data.
Specifically, the CPU of the host node uses an atomic operation or a read operation of remote direct memory access (RDMA) and, without the participation of the CPU of the slave node, obtains this set-length portion from the nonvolatile memory of the slave node directly via the network I/O device of the slave node. The advantage of the atomic operation is that it ensures that the access by the host node is not affected by the slave node.
The atomic operation can also flush the caches on the transmission path (for example, the PCIe path), which ensures that data cached on the transmission path does not prevent the data to be written from safely reaching the nonvolatile memory of the slave node.
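The address arithmetic behind step 602 can be sketched as follows. This is only an illustration: a local byte buffer stands in for the nonvolatile memory of the slave node, and the helper names `tail_range` and `read_tail` are hypothetical; in the described method the read would be issued as an RDMA read or atomic operation rather than a local slice.

```python
from typing import Tuple


def tail_range(start: int, size: int, set_length: int) -> Tuple[int, int]:
    """Return (offset, length) of the set-length tail of a region that
    begins at `start` and occupies `size` bytes."""
    if size <= 0 or not 0 < set_length <= size:
        raise ValueError("need 0 < set_length <= size")
    return start + size - set_length, set_length


def read_tail(nvram: bytes, start: int, size: int, set_length: int) -> bytes:
    """Stand-in for the remote read: slice the tail out of a local buffer
    that plays the role of the slave node's nonvolatile memory."""
    offset, length = tail_range(start, size, set_length)
    return bytes(nvram[offset:offset + length])
```

For example, with the second data storage position at 16 and 11 bytes written, a set length of 6 gives the range starting at offset 21.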
Step 603: When the CPU of the host node determines that the data of the set length obtained from the slave node is identical to the verification data, it determines that the data backup has succeeded.
Specifically, when the CPU of the host node determines that the data of the set length obtained from the slave node differs from the marker bit, exception handling is entered.
In this embodiment, the CPU of the host node operates the nonvolatile memory of the slave node directly through the network I/O device of the slave node, without the participation of the CPU of the slave node, to write the data to be written to the nonvolatile memory of the slave node. The CPU of the host node likewise operates the nonvolatile memory of the slave node directly through the network I/O device of the slave node to obtain, from the data written to that memory, the set-length portion immediately preceding the end of the written data, and determines whether the data has been backed up completely by comparing the set-length data obtained from the slave node with the verification data. The entire data transmission process is controlled unidirectionally by the host node, without the participation of the CPU of the slave node, so that whether the data has been backed up completely is verified, and the security of data transmission is improved, while occupying as few CPU resources as possible.
The data transmission process provided by the first embodiment is described below through a specific example, with reference to Fig. 7.
Step 1: The CPU of host node A appends a marker bit to the end of the obtained original data and takes the data obtained after appending the marker bit as the data to be written.
Step 2: The CPU of host node A writes the data to be written, held in its own main memory (memory), to its own nonvolatile memory (NVRAM). The InfiniBand (IB) hardware of the network I/O device of host node A obtains the data to be written from that main memory over the PCIe path and sends it to the InfiniBand (IB) hardware of the network I/O device of slave node B; the InfiniBand hardware of the network I/O device of slave node B writes the data to be written to the nonvolatile memory (NVRAM) of slave node B over the PCIe path.
Step 3: Via the IB hardware of its own network I/O device and the IB hardware of the network I/O device of slave node B, the CPU of host node A obtains, from the data saved in step 2 in the nonvolatile memory of slave node B, the portion immediately preceding the end of the data whose length equals that of the marker bit, and saves it to the main memory of host node A.
The operation of step 3 can be carried out immediately after the operation of step 2 has been triggered, regardless of whether the data writing process of step 2 has completed. As long as step 3 returns successfully, step 2 is guaranteed to have succeeded.
Step 4: The CPU of host node A compares the data obtained in step 3 with the marker bit to judge whether the data has been backed up successfully. If the two differ, the data transmission process is abnormal and exception handling, such as retransmission, is required; if the two are identical, the data transmission has succeeded and subsequent operations can proceed.
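Steps 1 to 4 can be simulated locally with a short Python sketch. Everything here is an assumption made for illustration: `FakeRemoteNvram` replaces the InfiniBand transfer and the NVRAM of slave node B with an in-process buffer, `drop_after` is a test hook that imitates an incomplete transfer, and the bounded retry loop is only one possible form of the exception handling, which the text describes merely as "such as retransmission".

```python
class FakeRemoteNvram:
    """Local stand-in for the slave node's nonvolatile memory.

    `drop_after` simulates a transfer that stops early: only that many
    bytes of the next write are stored. It is a test hook, not part of
    the described method."""

    def __init__(self, capacity: int) -> None:
        self.cells = bytearray(capacity)
        self.drop_after = None

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.cells):
            raise ValueError("write outside of the simulated memory")
        stored = data if self.drop_after is None else data[: self.drop_after]
        self.drop_after = None  # the fault applies to one write only
        self.cells[offset:offset + len(stored)] = stored

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.cells[offset:offset + length])


def back_up(nvram: FakeRemoteNvram, start: int, original: bytes,
            marker: bytes, max_attempts: int = 3) -> int:
    """Append the marker, write, read back the tail, and compare.
    Returns the number of attempts used; raises if none is verified."""
    data_to_write = original + marker                    # step 1
    tail_offset = start + len(data_to_write) - len(marker)
    for attempt in range(1, max_attempts + 1):
        nvram.write(start, data_to_write)                # step 2
        fetched = nvram.read(tail_offset, len(marker))   # step 3
        if fetched == marker:                            # step 4
            return attempt
    raise RuntimeError("backup could not be verified")
```

The sketch relies on the marker differing from whatever previously occupied the tail position; with a zero-filled buffer, any non-zero marker satisfies this.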
In the second embodiment, as shown in Fig. 8, the detailed method flow by which a client backs up data to a host node and a slave node in a distributed system is as follows:
Step 801: The CPU of the client obtains the data to be written and verification data used to verify whether the data to be written has been backed up completely, the verification data being identical to the set-length portion immediately preceding the end of the data to be written; through the network I/O device of the host node, it writes the data to be written directly to the nonvolatile memory of the host node according to the first data storage position.
In one specific implementation, the CPU of the client receives a data write request and obtains the original data to be written according to that request. For example, the CPU of the client may obtain the original data to be written from the data write request itself, or it may obtain the original data to be written from the storage location indicated by the data write request.
Specifically, after the original data to be written is obtained, there are the following two implementations:
First, the CPU of the client takes the obtained original data to be written as the data to be written, and takes the set-length portion immediately preceding the end of the original data to be written as the verification data;
Second, the CPU of the client obtains the original data to be written and a preset marker bit of a set length that is used to verify whether the original data has been backed up completely, appends the marker bit to the end of the original data to be written, takes the data obtained after appending the marker bit as the data to be written, and takes the marker bit as the verification data.
Specifically, the CPU of the client obtains the start position at which data can be stored in the nonvolatile memory of the host node and takes this start position as the first data storage position in the nonvolatile memory of the host node.
The CPU of the client transfers the data to be written to the network I/O device of the client; the network I/O device of the client transfers the data to be written to the network I/O device of the host node; and the network I/O device of the host node writes the data to be written directly to the nonvolatile memory of the host node according to the first data storage position.
Step 802: Through the network I/O device of the slave node, the CPU of the client writes the data to be written directly to the nonvolatile memory of the slave node according to the second data storage position.
Specifically, the CPU of the client obtains the start position at which data can be stored in the nonvolatile memory of the slave node and takes this start position as the second data storage position in the nonvolatile memory of the slave node.
The CPU of the client transfers the data to be written to the network I/O device of the client; the network I/O device of the client transfers the data to be written to the network I/O device of the slave node; and the network I/O device of the slave node writes the data to be written directly to the nonvolatile memory of the slave node according to the second data storage position.
Step 803: The CPU of the client obtains the size of the storage space occupied by the data to be written. Through the network I/O device of the host node, it directly obtains first data from the nonvolatile memory of the host node, the first data being the set-length portion immediately preceding the end of the data that occupies a storage space of that size starting at the first data storage position. Through the network I/O device of the slave node, it directly obtains second data from the nonvolatile memory of the slave node, the second data being the set-length portion immediately preceding the end of the data that occupies a storage space of that size starting at the second data storage position.
Specifically, the CPU of the client uses an atomic operation or a read operation of remote direct memory access (RDMA) and, without the participation of the CPU of the host node, obtains the first data from the nonvolatile memory of the host node directly via the network I/O device of the host node.
Similarly, the CPU of the client uses an atomic operation or a read operation of RDMA and, without the participation of the CPU of the slave node, obtains the second data from the nonvolatile memory of the slave node directly via the network I/O device of the slave node.
The advantage of the atomic operation is that it ensures that the access by the host node is not affected by the slave node.
Step 804: When the CPU of the client determines that the first data is identical to the verification data and the second data is identical to the verification data, it determines that the data backup has succeeded.
Specifically, when the CPU of the client determines that the first data differs from the verification data and/or the second data differs from the verification data, exception handling is entered.
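The decision in step 804 can be sketched in Python as below. The function name `failed_nodes` and the node labels are hypothetical; the sketch only shows that success requires both fetched tails to match the verification data and that a mismatch on either node leads to exception handling.

```python
from typing import Dict, List


def failed_nodes(verification: bytes, fetched: Dict[str, bytes]) -> List[str]:
    """Return the names of nodes whose fetched tail differs from the
    verification data; an empty list means the backup succeeded."""
    return [name for name, tail in fetched.items() if tail != verification]


check = b"\x01\x02\x03\x04"
result = failed_nodes(check, {"host": check, "slave": b"\x01\x02\x00\x00"})
print(result)  # ['slave']
```

Returning the list of mismatching nodes, rather than a single flag, lets the exception handling act only on the node whose copy could not be verified; this is a design choice of the sketch, not something the text prescribes.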
The data transmission process provided by the second embodiment is described below through a specific example, with reference to Fig. 9.
Step 1: The CPU of the client appends a marker bit to the end of the obtained original data, takes the data obtained after appending the marker bit as the data to be written, and saves the data to be written to main memory.
Step 2: The CPU of the client obtains the data to be written from its own main memory and sends it to the IB hardware of the network I/O device of host node A and to the IB hardware of the network I/O device of slave node B. The IB hardware of the network I/O device of host node A writes the data to be written to host node A's own nonvolatile memory (NVRAM) over the PCIe path; the IB hardware of the network I/O device of slave node B writes the data to be written to slave node B's own nonvolatile memory (NVRAM) over the PCIe path.
Step 3: Using an RDMA atomic operation or read operation, and via the IB hardware of the network I/O device of host node A, the CPU of the client obtains first data from the nonvolatile memory of host node A and saves it to main memory, the first data being the set-length portion immediately preceding the end of the data saved in step 2. Likewise, using an RDMA atomic operation or read operation, and via the IB hardware of the network I/O device of slave node B, it obtains second data from the nonvolatile memory of slave node B and saves it to main memory, the second data being the set-length portion immediately preceding the end of the data saved in step 2. The CPU of the client judges whether the data has been backed up successfully according to the first data obtained from host node A, the second data obtained from slave node B, and the marker bit, and proceeds according to the result.
In this embodiment, the CPU of the client obtains the first data from the nonvolatile memory of the host node directly through the network I/O device of the host node, obtains the second data from the nonvolatile memory of the slave node directly through the network I/O device of the slave node, and determines whether the data has been backed up successfully on the host node and the slave node by comparing the first data, the second data, and the verification data. The entire data transmission process is controlled unidirectionally by the client, which avoids repeated interactions between the controlling end and the host and slave nodes. The CPUs of the host node and the slave node do not participate at any point in the data backup process, which simplifies the data transmission process between distributed nodes and reduces the data transmission delay between them; whether the data has been backed up completely is verified, and the security of data transmission is improved, while occupying as few CPU resources as possible.
Based on the same inventive concept, a fourth embodiment of the invention further provides a distributed node device. For the specific implementation of the distributed node device, refer to the description of the source node in the foregoing embodiments, where the source node may be a host node or a client; repeated details are not described again. As shown in Fig. 10, the distributed node device mainly includes a CPU 1001 and a memory 1002, where the CPU 1001 is configured to read the program in the memory 1002 and perform the following steps according to the program:
Obtaining data to be written and verification data used to verify whether the data to be written has been backed up completely, the verification data being identical to the set-length portion immediately preceding the end of the data to be written; writing the data to be written to the nonvolatile memory of the destination node through the network I/O device of the destination node; obtaining, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, the data of the set length being the set-length portion immediately preceding the end of the written data; and determining that the data backup has succeeded when the data of the set length obtained from the destination node is determined to be consistent with the verification data.
In one specific implementation, the device further includes an internal memory 1003, configured to store the original data to be written and a preset marker bit of a set length that is used to verify whether the original data has been backed up completely;
The CPU 1001 is specifically configured to:
Obtain the original data to be written and the marker bit from the internal memory, append the marker bit to the end of the original data to be written, take the data obtained after appending the marker bit as the data to be written, and take the marker bit as the verification data;
After obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes contained in the marker bit is identical to the number of bytes contained in the data of the set length and the content of each byte of the data of the set length is identical to the content of the byte at the same position in the marker bit.
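The comparison rule just described (equal byte counts, then equality of each byte at the same position) can be written out explicitly as a small Python sketch; the function name `matches_marker` is an assumption made for illustration.

```python
def matches_marker(fetched: bytes, marker: bytes) -> bool:
    """True only if both byte strings have the same number of bytes and
    every byte equals the byte at the same position in the marker."""
    if len(fetched) != len(marker):
        return False
    return all(a == b for a, b in zip(fetched, marker))
```

In Python the expression `fetched == marker` gives the same result for `bytes` objects; the explicit form is shown only to mirror the two conditions stated in the text.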
In another specific implementation, the device further includes an internal memory 1003, configured to store the original data to be written;
The CPU 1001 is specifically configured to:
Obtain the original data to be written from the internal memory, take the original data to be written as the data to be written, and take the set-length portion immediately preceding the end of the original data to be written as the verification data;
After obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes contained in the verification data is identical to the number of bytes contained in the data of the set length and the content of each byte of the data of the set length is identical to the content of the byte at the same position in the verification data.
Preferably, the CPU 1001 is specifically configured to:
Send an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the distributed node device, the atomic operation or the read operation instructing that the set-length portion immediately preceding the end of the written data be obtained from the written data, the network I/O device of the distributed node device transferring the atomic operation or the read operation to the network I/O device of the destination node;
Receive the data of the set length sent by the network I/O device of the destination node, this data having been obtained according to the atomic operation or the read operation and being the set-length portion immediately preceding the end of the data written to the nonvolatile memory of the destination node.
Preferably, the CPU 1001 is specifically configured to:
Transfer the data to be written to the network I/O device of the distributed node device, the network I/O device of the distributed node device transferring the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writing the data to be written to the nonvolatile memory of the destination node.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
1. A data transmission method between distributed nodes, characterized in that the method is applied in a distributed storage system and comprises:
a central processing unit (CPU) of a source node obtaining data to be written and verification data used to verify whether the data to be written has been backed up completely, the verification data being identical to the set-length portion immediately preceding the end of the data to be written;
the CPU of the source node writing the data to be written to a nonvolatile memory of a destination node through a network input/output (I/O) device of the destination node;
the CPU of the source node obtaining, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, the data of the set length being the set-length portion immediately preceding the end of the written data;
the CPU of the source node determining that the data backup has succeeded when it determines that the data of the set length obtained from the destination node is identical to the verification data.
2. The method according to claim 1, characterized in that the CPU of the source node obtaining data to be written and verification data used to verify whether the data to be written has been backed up completely comprises:
the CPU of the source node obtaining original data to be written and a preset marker bit of a set length that is used to verify whether the original data has been backed up completely, appending the marker bit to the end of the original data to be written, taking the data obtained after appending the marker bit as the data to be written, and taking the marker bit as the verification data;
and the CPU of the source node determining that the data backup has succeeded when it determines that the data of the set length obtained from the destination node is identical to the verification data comprises:
the CPU of the source node determining that the data backup has succeeded when it determines that the number of bytes contained in the marker bit is identical to the number of bytes contained in the data of the set length obtained from the destination node and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bit.
3. The method according to claim 1, characterized in that the CPU of the source node obtaining data to be written and verification data used to verify whether the data to be written has been backed up completely comprises:
the CPU of the source node obtaining original data to be written, taking the original data to be written as the data to be written, and taking the set-length portion immediately preceding the end of the original data to be written as the verification data;
and the CPU of the source node determining that the data backup has succeeded when it determines that the data of the set length obtained from the destination node is identical to the verification data comprises:
the CPU of the source node determining that the data backup has succeeded when it determines that the number of bytes contained in the verification data is identical to the number of bytes contained in the data of the set length obtained from the destination node and that the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data.
4. The method according to claim 1, 2 or 3, characterized in that the CPU of the source node obtaining, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, the data of the set length being the set-length portion immediately preceding the end of the written data, comprises:
the CPU of the source node sending an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the source node, the atomic operation or the read operation instructing that the set-length portion immediately preceding the end of the written data be obtained from the written data;
the network I/O device of the source node transferring the atomic operation or the read operation to the network I/O device of the destination node;
the CPU of the source node receiving the data of the set length sent by the network I/O device of the destination node, this data having been obtained according to the atomic operation or the read operation and being the set-length portion immediately preceding the end of the data written to the nonvolatile memory of the destination node.
5. The method according to claim 1, 2 or 3, characterized in that the CPU of the source node writing the data to be written to the nonvolatile memory of the destination node through the network input/output (I/O) device of the destination node comprises:
the CPU of the source node transferring the data to be written to the network I/O device of the source node;
the network I/O device of the source node transferring the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writing the data to be written to the nonvolatile memory of the destination node.
6. A distributed node device, characterized in that the device is applied in a distributed storage system and comprises a central processing unit (CPU) and a memory;
the CPU is configured to read the program in the memory and perform the following steps:
obtaining data to be written and verification data used to verify whether the data to be written has been backed up completely, the verification data being identical to the set-length portion immediately preceding the end of the data to be written; writing the data to be written to a nonvolatile memory of a destination node through a network input/output (I/O) device of the destination node; obtaining, through the network I/O device of the destination node, data of the set length from the data that has been written to the nonvolatile memory of the destination node, the data of the set length being the set-length portion immediately preceding the end of the written data; and determining that the data backup has succeeded when the data of the set length obtained from the destination node is determined to be identical to the verification data.
7. The device according to claim 6, characterized in that the device further comprises an internal memory, configured to store original data to be written and a preset marker bit of a set length that is used to verify whether the original data has been backed up completely;
the CPU is specifically configured to:
obtain the original data to be written and the marker bit from the internal memory, append the marker bit to the end of the original data to be written, take the data obtained after appending the marker bit as the data to be written, and take the marker bit as the verification data;
after obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes contained in the marker bit is identical to the number of bytes contained in the data of the set length obtained from the destination node and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the marker bit.
8. The device according to claim 6, characterized in that the device further comprises an internal memory, configured to store original data to be written;
the CPU is specifically configured to:
obtain the original data to be written from the internal memory, take the original data to be written as the data to be written, and take the set-length portion immediately preceding the end of the original data to be written as the verification data;
after obtaining the data of the set length, determine that the data backup has succeeded when the number of bytes contained in the verification data is identical to the number of bytes contained in the data of the set length obtained from the destination node and the content of each byte of the data of the set length obtained from the destination node is identical to the content of the byte at the same position in the verification data.
9. The device according to claim 6, 7 or 8, characterized in that the CPU is specifically configured to:
send an atomic operation or a read operation of remote direct memory access (RDMA) to the network I/O device of the distributed node device, the atomic operation or the read operation instructing that the set-length portion immediately preceding the end of the written data be obtained from the written data, the network I/O device of the distributed node device transferring the atomic operation or the read operation to the network I/O device of the destination node;
receive the data of the set length sent by the network I/O device of the destination node, this data having been obtained according to the atomic operation or the read operation and being the set-length portion immediately preceding the end of the data written to the nonvolatile memory of the destination node.
10. The device according to claim 6, 7 or 8, characterized in that the CPU is specifically configured to:
transfer the data to be written to the network I/O device of the distributed node device, the network I/O device of the distributed node device transferring the data to be written to the network I/O device of the destination node, and the network I/O device of the destination node writing the data to be written to the nonvolatile memory of the destination node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410598754.1A CN104317716B (en) | 2014-10-30 | 2014-10-30 | Data transmission method and distributed node equipment between distributed node |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410598754.1A CN104317716B (en) | 2014-10-30 | 2014-10-30 | Data transmission method and distributed node equipment between distributed node |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104317716A (en) | 2015-01-28 |
CN104317716B (en) | 2017-10-24 |
Family
ID=52372951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410598754.1A Active CN104317716B (en) | 2014-10-30 | 2014-10-30 | Data transmission method and distributed node equipment between distributed node |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104317716B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106453460B (en) * | 2015-08-12 | 2021-01-08 | 腾讯科技(深圳)有限公司 | File distribution method, device and system |
CN106445409A (en) * | 2016-09-13 | 2017-02-22 | 郑州云海信息技术有限公司 | Distributed block storage data writing method and device |
CN107592361B (en) * | 2017-09-20 | 2020-05-29 | 郑州云海信息技术有限公司 | Data transmission method, device and equipment based on dual IB network |
CN108494817B (en) * | 2018-02-08 | 2022-03-04 | 华为技术有限公司 | Data transmission method, related device and system |
CN110691062B (en) * | 2018-07-06 | 2021-01-26 | 浙江大学 | Data writing method, device and equipment |
CN110955734B (en) * | 2020-02-13 | 2020-08-21 | 北京一流科技有限公司 | Distributed signature decision system and method for logic node |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101188569A (en) * | 2006-11-16 | 2008-05-28 | 饶大平 | Method for constructing data quanta space in network and distributed file storage system |
CN101577716A (en) * | 2009-06-10 | 2009-11-11 | 中国科学院计算技术研究所 | Distributed storage method and system based on InfiniBand network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080243858A1 (en) * | 2006-08-01 | 2008-10-02 | Latitude Broadband, Inc. | Design and Methods for a Distributed Database, Distributed Processing Network Management System |
Non-Patent Citations (2)
Title |
---|
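Research and Application of RDMA Technology; Liu Tianhua et al.; Journal of Shenyang Normal University (Natural Science Edition); 2006-04-30; Vol. 24, No. 2; pp. 185-188 * |
Design of a CAN Bus Control System for Engineering Equipment; Shen Dongxiang et al.; Microcomputer Information; 2008-01-31; Vol. 24, No. 1-2; pp. 50-52 * |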
Also Published As
Publication number | Publication date |
---|---|
CN104317716A (en) | 2015-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104317716B (en) | | Data transmission method and distributed node equipment between distributed node |
CN107209644B (en) | | Data processing method and NVMe memory |
US20190163364A1 (en) | | System and method for tcp offload for nvme over tcp-ip |
US9727503B2 (en) | | Storage system and server |
US10891253B2 (en) | | Multicast apparatuses and methods for distributing data to multiple receivers in high-performance computing and cloud-based networks |
CN111930676B (en) | | Method, device, system and storage medium for communication among multiple processors |
CN108701004A (en) | | A kind of system of data processing, method and corresponding instrument |
US10026442B2 (en) | | Data storage mechanism using storage system determined write locations |
US10116746B2 (en) | | Data storage method and network interface card |
EP3542276B1 (en) | | Flow control in remote direct memory access data communications with mirroring of ring buffers |
CN110659151B (en) | | Data verification method and device and storage medium |
CN105556930A (en) | | NVM EXPRESS controller for remote memory access |
US11809290B2 (en) | | Storage system and storage queue processing following port error |
US20220222016A1 (en) | | Method for accessing solid state disk and storage device |
US9946721B1 (en) | | Systems and methods for managing a network by generating files in a virtual file system |
EP3542519B1 (en) | | Faster data transfer with remote direct memory access communications |
US9619336B2 (en) | | Managing production data |
CN108833477B (en) | | Message transmission method, system and device based on shared memory |
US10564847B1 (en) | | Data movement bulk copy operation |
WO2024183587A1 (en) | | Message transmission methods and apparatuses, nonvolatile readable storage medium and electronic apparatus |
US9715477B2 (en) | | Shared-bandwidth multiple target remote copy |
US20200341653A1 (en) | | Method, network adapter and computer program product for processing data |
CN115129509B (en) | | Data transmission method, device and medium |
WO2022156376A1 (en) | | Method, system and device for prefetching target address, and medium |
WO2022121385A1 (en) | | File access method, storage node, and network card |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |