CN107070976A - A kind of data transmission method - Google Patents

A kind of data transmission method

Info

Publication number
CN107070976A
CN107070976A CN201710023479.4A CN201710023479A CN107070976A CN 107070976 A CN107070976 A CN 107070976A CN 201710023479 A CN201710023479 A CN 201710023479A CN 107070976 A CN107070976 A CN 107070976A
Authority
CN
China
Prior art keywords
data
output stream
transmission method
data transmission
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710023479.4A
Other languages
Chinese (zh)
Inventor
吴秋莉
尹立群
吕泽承
张炜
邬蓉蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority to CN201710023479.4A
Publication of CN107070976A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention discloses a data transmission method in the technical field of data processing. The method ensures that data streams are distributed accurately, and thereby ensures the accuracy and efficiency of distributed computing. The data transmission method includes: obtaining an output data stream from an upstream node; making a persistent backup of the output data stream according to the data stream name; sending the output data stream, according to the subscription relationship between the output data stream and the downstream nodes, to the shared memory queues that correspond one-to-one with the downstream nodes; and sending all the data streams in the shared memory queues.

Description

A data transmission method
Technical field
The present invention relates to the field of data processing, and in particular to a data transmission method.
Background
With the rapid development of information technology, applications based on data processing and data analysis have been widely welcomed and have attracted broad attention. The abundance of information sources has brought explosive growth in data volume, and performing complex computations on massive data far exceeds the processing capacity of a single computer, which has driven research into distributed systems and their key technologies. In distributed computing, the massive data that requires complex computation is split into small blocks, which are handed to multiple computers for parallel processing; the partial results are then combined to obtain the final result.
Distributed computing inevitably requires data to be distributed or forwarded many times, with each data stream sent to the corresponding computer or downstream node for processing. It is therefore necessary to ensure that data streams are distributed accurately, so as to ensure the accuracy and efficiency of distributed computing.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data transmission method that ensures the accurate distribution of data streams, and thereby ensures the accuracy and efficiency of distributed computing.
To solve the above technical problem, the present invention adopts the following technical solution:
An embodiment of the present invention provides a data transmission method, which includes:
Obtaining an output data stream from an upstream node;
Making a persistent backup of the output data stream according to the data stream name;
Sending the output data stream, according to the subscription relationship between the output data stream and the downstream nodes, to the shared memory queue that corresponds one-to-one with each downstream node;
Sending the data streams in the shared memory queues.
Preferably, the shared memory queue that corresponds one-to-one with a downstream node is named after that downstream node.
Preferably, the data transmission method further includes:
Setting an expiration duration;
Recording the start time of the persistent backup of the output data stream;
Obtaining the current time;
If the time difference between the current time and the start time is greater than or equal to the expiration duration, deleting the output data stream.
Preferably, obtaining the output data stream from the upstream node includes:
Listening on a designated port, and adding the designated port to the epoll handle;
When the name of an output data stream is received from the upstream node, opening a named pipe with the read flag;
Adding the file descriptor of the named pipe to the epoll loop;
Reading the output data stream from the upstream node from the named pipe in a loop.
Preferably, making a persistent backup of the output data stream according to the data stream name includes:
Storing the data items in the shared memory queue in key-value form.
Preferably, the key is the timestamp of the data item.
Preferably, sending all the data streams in the shared memory queues includes:
Packaging the downstream node name into a task and putting the task into a task queue;
Taking the downstream node name out of the task in the task queue, and taking data out of the corresponding shared memory queue;
Sending the data according to the configuration of the downstream node in Zookeeper.
Preferably, sending the data according to the configuration of the downstream node in Zookeeper includes:
Obtaining the flow setting of the downstream node;
Sending the data based on the flow setting.
The present invention provides a data transmission method. When an output data stream is received from an upstream node, the data stream can be stored on disk under its output stream name as a persistent backup; then, according to the subscription relationship between the output data stream and the downstream nodes, the output data stream is placed in the shared memory queue named after each downstream node and sent to the downstream nodes in a unified manner. As a result, the output data streams are delivered in a more targeted way and with higher accuracy.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the data transmission method provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides a data transmission method. As shown in Fig. 1, the data transmission method includes:
Step S1: obtain the output data stream from the upstream node.
Specifically, in this embodiment of the present invention, a designated port is listened on continuously and added to the epoll handle. When the output stream name (outstream name) sent by the upstream node is received, a named pipe (FIFO) is opened with the read (READ) flag, and the pipe's file descriptor is added to the epoll loop.
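The pipe-reading part of step S1 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it uses Python's standard `selectors` module (which is backed by epoll on Linux), it omits the port listener that receives the stream name, and the function name `read_stream_from_fifo` and its parameters are assumptions made for the example.

```python
import os
import selectors


def read_stream_from_fifo(fifo_path, max_bytes=4096, idle_timeout=5.0):
    """Read a named pipe through an event loop until the writer closes it.

    fifo_path is assumed to be an existing named pipe (see os.mkfifo).
    """
    sel = selectors.DefaultSelector()  # uses epoll where the OS provides it
    # Open with the read flag; O_NONBLOCK lets open() return even if no
    # writer has attached to the pipe yet.
    fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    sel.register(fd, selectors.EVENT_READ)
    chunks = []
    try:
        while True:
            events = sel.select(timeout=idle_timeout)
            if not events:
                break  # nothing arrived within idle_timeout; stop waiting
            for key, _mask in events:
                data = os.read(key.fd, max_bytes)
                if not data:  # empty read: the writer closed the pipe
                    return b"".join(chunks)
                chunks.append(data)
    finally:
        sel.unregister(fd)
        os.close(fd)
        sel.close()
    return b"".join(chunks)
```

In a real node the loop would keep running and hand each chunk to the persistence step instead of collecting everything in memory; the collection here only keeps the sketch short.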
Step S2: make a persistent backup of the output data stream according to the data stream name.
To make the framework more robust, a data persistence module is added to the node. The module acts as a disk queue and follows first-in-first-out (FIFO) queue semantics. It stores the processed output data streams on disk. To keep disk usage from growing without bound, the persistence module assigns an expiration time to each data item and periodically deletes expired items from disk.
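One way to picture a FIFO disk queue with expiry is the sketch below. It is an assumption-laden illustration: SQLite from the Python standard library stands in for the disk store (the patent itself names Tokyo Cabinet later on), and the class name `ExpiringDiskQueue`, the table layout, and the `now` parameter (added so the expiry rule can be exercised without waiting) are all hypothetical.

```python
import sqlite3
import time


class ExpiringDiskQueue:
    """FIFO queue stored in SQLite whose items expire after ttl_seconds."""

    def __init__(self, path, ttl_seconds):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "stored_at REAL NOT NULL, payload BLOB NOT NULL)"
        )

    def put(self, payload, now=None):
        """Append an item and record the time at which it was backed up."""
        now = time.time() if now is None else now
        with self.db:  # commits on success
            self.db.execute(
                "INSERT INTO items (stored_at, payload) VALUES (?, ?)",
                (now, payload),
            )

    def purge_expired(self, now=None):
        """Delete items whose age is >= ttl; return how many were removed."""
        now = time.time() if now is None else now
        with self.db:
            cur = self.db.execute(
                "DELETE FROM items WHERE ? - stored_at >= ?", (now, self.ttl)
            )
        return cur.rowcount

    def items(self):
        """Return the remaining payloads in insertion (FIFO) order."""
        rows = self.db.execute("SELECT payload FROM items ORDER BY seq")
        return [row[0] for row in rows]
```

A periodic job would call `purge_expired()` on a timer; the comparison mirrors the rule stated in the summary, where an item is deleted once the current time minus its start time reaches the expiration duration.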
Persistent backup helps resolve the mismatch between the speeds at which upstream and downstream nodes process data streams. In particular, when the upstream node sends much faster than the downstream node can receive and process, large numbers of packets accumulate in the upstream node's buffer and cannot be sent, so memory usage balloons and the upstream node slows down internally. The persistent backup acts as a buffer layer between the upstream and downstream nodes and helps alleviate the problems caused by the difference in their processing speeds.
In addition, to speed up distributed computing, each computer completes the splitting, merging, and processing of data streams in memory when it performs its local computation, which ensures very fast processing and response. This also has a drawback: compared with long-term data that has been written to disk, in-memory data is volatile. When one of the computers in the distributed computation restarts because of a fault or simply crashes, all the data in its memory is lost, the failure spreads, and the accuracy of the system's results suffers.
Further, many stream computing systems do not support resumable transfer, because they assume that a data stream passes through the system only once. When a node fails or the computation logic is wrong, the only option is to pass the original data stream through the system again and repeat the computation. This wastes resources, and when the data stream is unique and cannot be reproduced, the final output will not include that part of the data stream, which greatly reduces the accuracy of the system. The embodiments of the present invention therefore add persistent backup to address this drawback and to support resumable transfer and retransmission on error.
In the embodiments of the present invention, persistent backup also provides resumable transfer and retransmission on error. In existing data stream computing systems, a data stream passes through system memory once and in its entirety. Because the data cannot be recovered or replayed, such systems are very sensitive to failures and have poor fault tolerance. Recovery requires the two parties to a transfer to reach an agreement before the data stream is formally transmitted: the downstream node tells the upstream node the position at which the previous transfer ended. In the embodiments of the present invention, a positioning transfer protocol can be used to locate the position reached by the previous transfer, and the data at that position is taken from the persistent backup, so that the data stream is recovered.
Specifically, building on step S1, the output data stream processed by the upstream node is read continuously from the designated pipe and stored in a disk queue named after the output stream, which forms the persistent backup. Data items in the disk queue are stored in key-value form, with the data item's timestamp as the key, so that data items can be located quickly in the disk queue when retransmitting after an error. Based on the asynchronous read/write behavior of epoll, a state machine is implemented as a simple data transfer protocol similar to TCP/IP; it supports resumable transfer and ensures the accuracy and authenticity of the data.
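The role of the timestamp key in resuming a transfer can be illustrated with a small sketch. It assumes, purely for illustration, that the downstream node reports the timestamp of the last item it received and that timestamps are unique within one stream; the class `TimestampedBackup` and its methods are hypothetical, and an in-memory sorted list stands in for the on-disk key-value store.

```python
import bisect


class TimestampedBackup:
    """Key-value backup of one output stream, keyed by item timestamp."""

    def __init__(self):
        self._keys = []    # timestamps, kept sorted
        self._values = {}  # timestamp -> data item

    def append(self, timestamp, item):
        if timestamp in self._values:
            # Assumption of this sketch: one item per timestamp.
            raise ValueError("duplicate timestamp key")
        bisect.insort(self._keys, timestamp)
        self._values[timestamp] = item

    def replay_after(self, last_timestamp):
        """Return (timestamp, item) pairs stored after last_timestamp."""
        # Binary search locates the resume position without a full scan.
        start = bisect.bisect_right(self._keys, last_timestamp)
        return [(ts, self._values[ts]) for ts in self._keys[start:]]
```

For example, after items with timestamps 1, 2, and 3 have been backed up, `replay_after(1)` yields the items stamped 2 and 3, which is what an upstream node would resend if the downstream node reported 1 as its last received position.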
Epoll is a version of poll that the Linux kernel improved for handling large numbers of file descriptors. It is an enhanced version of the select/poll multiplexed I/O interface on Linux, and it can significantly improve a program's CPU utilization when only a few of a large number of concurrent connections are active. Another reason for its efficiency is that, when fetching events, it does not need to traverse the entire set of monitored descriptors; it only traverses the descriptors that the kernel's I/O events have asynchronously woken and added to the ready queue. Besides the level-triggered (Level Triggered) mode of select/poll, epoll also provides an edge-triggered (Edge Triggered) mode, which lets a user-space program cache I/O state, reduce calls to epoll_wait/epoll_pwait, and improve application efficiency.
In the embodiments of the present invention, an efficient disk queue can be implemented with the open-source software Tokyo Cabinet, which gives an efficient persistent backup. Tokyo Cabinet (TC for short) is a data storage engine written in C. It stores data as key-value pairs, supports several data structures such as Hash, B+ tree, and Hash Table, and reads and writes very quickly.
Step S3: according to the subscription relationship between the output data stream and the downstream nodes, send the output data stream to the shared memory queue that corresponds one-to-one with each downstream node.
By querying the subscription relationship between the output data stream and the downstream nodes, the data backed up in the disk queue is sent to the shared memory queue named after each downstream node (node); each downstream node thus has its own unique shared memory queue. To keep the receiving and sending of data consistent, the embodiments of the present invention package the downstream node name into a task and put it into the task queue. When the task type is parsed as "inside" (receiving data from an upstream node is an "outside" task; taking data from a shared memory queue is an "inside" task), the downstream node name is taken out of the task, data is taken from the corresponding shared memory queue, and the data is sent according to the configuration (IP) of that downstream node in Zookeeper. A shared message queue is also a form of inter-process communication (IPC) on Unix-like systems: multiple processes can read messages from the queue or add messages to it. A shared message queue is a linked list of messages that persists with the kernel; each message has a defined format, a specific type, and a corresponding priority.
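The routing described in step S3 can be sketched as follows. This is a simplified model under stated assumptions: plain in-process deques stand in for the shared memory queues, a dictionary stands in for the node configuration held in Zookeeper, and the names `Dispatcher`, `route`, `run_tasks`, and the `send` callback are hypothetical.

```python
from collections import defaultdict, deque


class Dispatcher:
    """Routes stream items to one queue per downstream node, then sends them."""

    def __init__(self, subscriptions, addresses):
        # subscriptions: stream name -> list of subscribed downstream node names
        # addresses: node name -> address (stand-in for the Zookeeper config)
        self.subscriptions = subscriptions
        self.addresses = addresses
        self.node_queues = defaultdict(deque)  # one queue per downstream node
        self.tasks = deque()                   # (task type, node name) pairs

    def route(self, stream_name, item):
        """Copy the item into the queue of every node subscribed to the stream."""
        for node in self.subscriptions.get(stream_name, []):
            self.node_queues[node].append(item)
            self.tasks.append(("inside", node))

    def run_tasks(self, send):
        """Process "inside" tasks: drain the named node's queue and send."""
        while self.tasks:
            task_type, node = self.tasks.popleft()
            if task_type != "inside":
                continue  # "outside" (receive) tasks are not modeled here
            queue = self.node_queues[node]
            while queue:
                send(self.addresses[node], queue.popleft())
```

Keeping one queue per downstream node means a slow subscriber only delays its own queue, which matches the one-to-one correspondence between queues and nodes that the method relies on.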
Step S4: send the data streams in the shared memory queues.
Once the downstream node corresponding to each data stream in the shared memory queues has been identified, the data in the shared memory queues can be sent.
An embodiment of the present invention provides a data transmission method. When an output data stream is received from an upstream node, the data stream can be stored on disk under its output stream name as a persistent backup; then, according to the subscription relationship between the output data stream and the downstream nodes, the output data stream is placed in the shared memory queue named after each downstream node and sent to the downstream nodes in a unified manner. As a result, the output data streams are delivered in a more targeted way and with higher accuracy.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (8)

1. A data transmission method, characterized by including:
Obtaining an output data stream from an upstream node;
Making a persistent backup of the output data stream according to the data stream name;
Sending the output data stream, according to the subscription relationship between the output data stream and the downstream nodes, to the shared memory queue that corresponds one-to-one with each downstream node;
Sending the data streams in the shared memory queues.
2. The data transmission method according to claim 1, characterized in that
the shared memory queue that corresponds one-to-one with a downstream node is named after that downstream node.
3. The data transmission method according to claim 1, characterized by further including:
Setting an expiration duration;
Recording the start time of the persistent backup of the output data stream;
Obtaining the current time;
If the time difference between the current time and the start time is greater than or equal to the expiration duration, deleting the output data stream.
4. The data transmission method according to claim 1, characterized in that obtaining the output data stream from the upstream node includes:
Listening on a designated port, and adding the designated port to the epoll handle;
When the name of an output data stream is received from the upstream node, opening a named pipe with the read flag;
Adding the file descriptor of the named pipe to the epoll loop;
Reading the output data stream from the upstream node from the named pipe in a loop.
5. The data transmission method according to claim 1, characterized in that making a persistent backup of the output data stream according to the data stream name includes:
Storing the data items in the shared memory queue in key-value form.
6. The data transmission method according to claim 5, characterized in that the key is the timestamp of the data item.
7. The data transmission method according to claim 1, characterized in that sending all the data streams in the shared memory queues includes:
Packaging the downstream node name into a task and putting the task into a task queue;
Taking the downstream node name out of the task in the task queue, and taking data out of the corresponding shared memory queue;
Sending the data according to the configuration of the downstream node in Zookeeper.
8. The data transmission method according to claim 7, characterized in that sending the data according to the configuration of the downstream node in Zookeeper includes:
Obtaining the flow setting of the downstream node;
Sending the data based on the flow setting.
CN201710023479.4A 2017-01-13 2017-01-13 A kind of data transmission method Pending CN107070976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710023479.4A CN107070976A (en) 2017-01-13 2017-01-13 A kind of data transmission method

Publications (1)

Publication Number Publication Date
CN107070976A true CN107070976A (en) 2017-08-18

Family

ID=59599158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710023479.4A Pending CN107070976A (en) 2017-01-13 2017-01-13 A kind of data transmission method

Country Status (1)

Country Link
CN (1) CN107070976A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070299980A1 (en) * 2006-06-13 2007-12-27 International Business Machines Corporation Maximal flow scheduling for a stream processing system
CN104063293A (en) * 2014-07-04 2014-09-24 华为技术有限公司 Data backup method and streaming computing system
CN105122733A (en) * 2013-02-14 2015-12-02 起元技术有限责任公司 Queue monitoring and visualization
CN105335218A (en) * 2014-07-03 2016-02-17 北京金山安全软件有限公司 Streaming computing method and streaming computing system based on local
CN105959151A (en) * 2016-06-22 2016-09-21 中国工商银行股份有限公司 High availability stream processing system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
顾昕: "分布式流式计算框架关键技术的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109343979A (en) * 2018-09-28 2019-02-15 珠海沙盒网络科技有限公司 A kind of configuring management method and system
CN112241339A (en) * 2020-10-23 2021-01-19 浪潮云信息技术股份公司 Network-based Redis persistence method
CN113535716A (en) * 2021-06-23 2021-10-22 浙江浙大中控信息技术有限公司 Efficient data storage and query management method
CN113824777A (en) * 2021-09-06 2021-12-21 武汉中科通达高新技术股份有限公司 Data management method and data management device
CN113824777B (en) * 2021-09-06 2023-12-19 武汉中科通达高新技术股份有限公司 Data management method and data management device
CN114969072A (en) * 2022-06-06 2022-08-30 北京友友天宇系统技术有限公司 Data transmission method, device and equipment based on state machine and data persistence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170818