CN107070976A - A data transmission method - Google Patents
- Publication number: CN107070976A
- Application number: CN201710023479.4A
- Authority: CN (China)
- Prior art keywords: data, output stream, transmission method, data transmission, node
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention discloses a data transmission method in the technical field of data processing. The method ensures that data streams are distributed accurately, and thereby ensures the accuracy and efficiency of distributed computing. The data transmission method includes: obtaining an output stream from an upstream node; making a persistent backup of the output stream according to the data stream name; according to the subscription relationship between the output stream and the downstream nodes, sending the output stream to the shared memory queues that correspond one-to-one with the downstream nodes; and sending all data streams in the shared memory queues.
Description
Technical field
The present invention relates to the field of data processing, and more particularly to a data transmission method.
Background
With the rapid development of information technology, applications based on data processing and data analysis have attracted wide interest and attention. The large number of information sources has brought explosive growth in data scale, and performing complex computations on massive data far exceeds the processing capability of a single computer, which has driven research into distributed systems and their key technologies. In distributed computing, the massive data that requires complex computation is cut into small blocks that are handed to multiple computers for parallel processing, and the local results are then integrated into the final result.
Distributed computing requires data to be distributed or forwarded many times, with each data stream sent to the corresponding computer or downstream node for processing. It is therefore necessary to ensure that data streams are distributed accurately, so as to ensure the accuracy and efficiency of distributed computing.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data transmission method that ensures data streams are distributed accurately, and thereby ensures the accuracy and efficiency of distributed computing.
To solve the above technical problem, the present invention adopts the following technical solution:
An embodiment of the present invention provides a data transmission method, which includes:
Obtaining an output stream from an upstream node;
Making a persistent backup of the output stream according to the data stream name;
According to the subscription relationship between the output stream and the downstream nodes, sending the output stream to the shared memory queues that correspond one-to-one with the downstream nodes;
Sending the data streams in the shared memory queues.
Preferably, the shared memory queue that corresponds one-to-one with a downstream node is named after that downstream node.
Preferably, the data transmission method further includes:
Setting an expiry duration;
Recording the start time at which the persistent backup of the output stream is made;
Obtaining the current time;
If the time difference between the current time and the start time is greater than or equal to the expiry duration, deleting the output stream.
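The expiry rule above can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the class name ExpiringBackup, its methods, and the injectable clock are all hypothetical, and an in-memory dictionary stands in for the on-disk backup.

```python
import time


class ExpiringBackup:
    """Toy backup store that deletes streams once they reach the expiry duration.

    Hypothetical illustration: a dict stands in for the on-disk backup.
    """

    def __init__(self, expiry_seconds, clock=time.time):
        self.expiry_seconds = expiry_seconds  # the configured expiry duration
        self._clock = clock                   # injectable so tests can control time
        self._streams = {}                    # stream name -> backed-up data
        self._start_times = {}                # stream name -> backup start time

    def backup(self, name, data):
        """Back up a stream and record the time at which the backup started."""
        self._streams[name] = data
        self._start_times[name] = self._clock()

    def purge_expired(self):
        """Delete every stream whose age is >= the expiry duration."""
        now = self._clock()  # obtain the current time
        expired = [
            name
            for name, started in self._start_times.items()
            if now - started >= self.expiry_seconds
        ]
        for name in expired:
            del self._streams[name]
            del self._start_times[name]
        return expired
```

In a long-running node, purge_expired would typically be called on a timer; the comparison uses "greater than or equal to", matching the rule stated above.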
Preferably, obtaining the output stream from the upstream node includes:
Listening on a designated port, and adding the designated port to the epoll handle;
When the name of an output stream is received from the upstream node, opening a named pipe with the read flag;
Adding the file descriptor of the named pipe to the epoll loop;
Reading the output stream from the upstream node from the named pipe in a loop.
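The named-pipe part of these sub-steps can be sketched roughly as follows. This is a minimal Python illustration, not the patent's implementation: it uses the standard selectors module (which is backed by epoll on Linux) instead of raw epoll calls, it omits the port-listening step, and the function names open_stream_pipe and read_stream are hypothetical.

```python
import os
import selectors


def open_stream_pipe(fifo_path):
    """Open an existing named pipe with the read flag.

    O_NONBLOCK lets open() return even if no writer has connected yet.
    """
    return os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)


def read_stream(fd, chunk_size=4096, timeout=1.0):
    """Read from the pipe in an event loop until all writers close it."""
    selector = selectors.DefaultSelector()  # epoll-backed on Linux
    selector.register(fd, selectors.EVENT_READ)
    chunks = []
    try:
        while True:
            if not selector.select(timeout):
                break  # nothing became readable within the timeout
            data = os.read(fd, chunk_size)
            if not data:
                break  # end of file: every writer has closed the pipe
            chunks.append(data)
    finally:
        selector.unregister(fd)
        selector.close()
    return b"".join(chunks)
```

The caller creates the pipe beforehand (for example with os.mkfifo) and remains responsible for closing the descriptor. A real node would keep one selector for the port and all pipes rather than creating one per read.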
Preferably, making a persistent backup of the output stream according to the data stream name includes:
Storing the data items in the shared memory queue in key-value form.
Preferably, the key is the timestamp of the data item.
Preferably, sending all data streams in the shared memory queues includes:
Packaging the downstream node name into a task and putting the task into a task queue;
Taking the downstream node name out of the task in the task queue, and taking data from the corresponding shared memory queue;
Sending the data according to the configuration of the downstream node in Zookeeper.
Preferably, sending the data according to the configuration of the downstream node in Zookeeper includes:
Obtaining the flow setting of the downstream node;
Sending the data based on the flow setting.
, can root when receiving the output stream from upstream node the invention provides a kind of data transmission method
Deposit a datastream on disk and do after persistence backup according to output stream name, and according to output stream and downstream node
Subscribing relationship, output stream is sent in the shared drive queue named with downstream node, downstream node is uniformly sent to.So that
In output stream, more targetedly, while having higher accuracy.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the data transmission method provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides a data transmission method. As shown in Fig. 1, the data transmission method includes:
Step S1: obtain the output stream from the upstream node.
Specifically, in the embodiment of the present invention, a designated port is listened on continuously and added to the epoll handle. When the output stream name (outstream name) sent by the upstream node is received, a named pipe (FIFO) is opened with the read (READ) flag, and the pipe's file descriptor is added to the epoll loop.
Step S2: make a persistent backup of the output stream according to the data stream name.
To increase the robustness of the framework, a data persistence module is added to the node. The module acts as a disk queue and follows first-in, first-out (FIFO) queue semantics. It stores the processed output stream on disk. To keep disk usage from growing without bound, the persistence module assigns an expiry time to each data item and periodically deletes expired items from disk.
A persistent backup helps resolve the mismatch between the speeds at which upstream and downstream nodes process data streams. In particular, when the sending speed of the upstream node is much greater than the receiving and processing speed of the downstream node, a large number of packets accumulate in the upstream node's buffer and cannot be sent, so memory usage grows and the upstream node slows down internally. The persistent backup adds a layer of buffering between the upstream and downstream nodes, which helps alleviate the problems caused by the difference in their processing speeds.
In addition, to improve the processing speed of distributed computing, each computer performs the cutting, merging, and processing of data streams entirely in memory when carrying out its local computation, which ensures extremely fast processing and response. This also brings a drawback: compared with long-term data that has "landed" on disk, in-memory data is volatile. When one of the computers in the distributed computation restarts because of a fault or crashes outright, all the data in its memory is lost, and the failure spreads and affects the accuracy of the system's results.
Further, many stream computing systems do not support resuming from a breakpoint, because they assume that a data stream passes through the system only once. When a node fails or the computation logic is wrong, the only remedy is to let the original data stream pass through the system again and recompute, which wastes resources. In particular, when the data stream is unique and cannot be reproduced, the final output will not include that part of the data stream, which greatly reduces the accuracy of the system. Therefore, the embodiment of the present invention adds a persistent backup to address this drawback and to support breakpoint resume and retransmission on error.
In the embodiment of the present invention, the persistent backup also provides breakpoint resume and retransmission on error. In existing data stream computing systems, a data stream passes through system memory once and in its entirety. Because the data cannot be recovered or reproduced, such systems are very sensitive to failures and have poor fault tolerance. This requires the two parties to a transfer to reach an agreement before the data stream is formally transmitted: the downstream node notifies the upstream node of the position at which the previous data stream ended. In the embodiment of the present invention, a positioning transfer protocol can be used to locate the transmission position of the previous data stream, and the data at that position is taken from the persistent backup, thereby recovering the data stream.
Specifically, on the basis of step S1, the output stream that the upstream node has finished processing is read continuously from the specified pipe and stored in the disk queue named after the output stream, making a persistent backup. Data items in the disk queue are stored in key-value form, and the key is the timestamp of the data item, which makes it easy to locate data items quickly in the disk queue when retransmitting after an error. Based on the asynchronous read/write characteristics of epoll, a state machine is implemented as a simple data transfer protocol similar to TCP/IP. It supports breakpoint resume and ensures the accuracy and authenticity of the data.
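To make the timestamp-keyed layout concrete, here is a minimal Python sketch of a disk queue that supports resuming after a known position. It is an illustration under stated assumptions, not the patent's implementation: SQLite from the Python standard library stands in for the key-value engine, the class name DiskQueue and its methods are hypothetical, and timestamps are assumed to be unique integers.

```python
import sqlite3


class DiskQueue:
    """Timestamp-keyed queue; SQLite is a stand-in for the key-value engine."""

    def __init__(self, path=":memory:"):
        # Pass a file path for real persistence; ":memory:" is for experiments.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "ts INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
        )
        self._db.commit()

    def append(self, ts, payload):
        """Store one data item under its timestamp (assumed unique)."""
        self._db.execute(
            "INSERT INTO items (ts, payload) VALUES (?, ?)", (ts, payload)
        )
        self._db.commit()

    def read_after(self, last_ts):
        """Return, in timestamp order, every item newer than last_ts.

        last_ts is the position the downstream node reports as the end of
        the previous transfer, so the result is what must be resent.
        """
        rows = self._db.execute(
            "SELECT ts, payload FROM items WHERE ts > ? ORDER BY ts", (last_ts,)
        )
        return [(ts, bytes(payload)) for ts, payload in rows]
```

Because the key is ordered, locating the resume point is an index lookup rather than a scan, which is the property the timestamp key is meant to provide.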
Epoll is an improved version of poll that the Linux kernel provides for handling large numbers of file descriptors. It is an enhanced version of the select/poll multiplexed I/O interfaces under Linux, and it can significantly improve a program's CPU utilization when only a few of a large number of concurrent connections are active. One reason is that, when fetching events, it does not need to traverse the entire set of monitored descriptors; it only traverses the descriptors that kernel I/O events have asynchronously woken and added to the ready queue. Besides the level-triggered (Level Triggered) mode of I/O events offered by select/poll, epoll also provides an edge-triggered (Edge Triggered) mode, which makes it possible for user-space programs to cache I/O state, reduces calls to epoll_wait/epoll_pwait, and improves application efficiency.
In the embodiment of the present invention, an efficient disk queue can be implemented with the open-source software Tokyo Cabinet, giving an efficient persistent backup. Tokyo Cabinet (TC for short) is a data storage engine written in C. It stores data as key-value pairs, supports several data structures such as Hash, B+ tree, and Hash Table, and reads and writes extremely fast.
Step S3: according to the subscription relationship between the output stream and the downstream nodes, send the output stream to the shared memory queues that correspond one-to-one with the downstream nodes.
By querying the subscription relationship between the output stream and the downstream nodes, the data backed up in the disk queue is sent to the shared memory queue named after each downstream node (node), so each downstream node has its own unique shared memory queue. To keep receiving and sending data consistent, in the embodiment of the present invention the downstream node name is packaged into a task and put into the task queue. When a task is parsed and its type is "inside" (receiving data from an upstream node is an "outside" task, and fetching data from a shared memory queue is an "inside" task), the downstream node name is taken out of the task, data is taken from the corresponding shared memory queue, and the data is sent according to the configuration (IP) of the corresponding downstream node in Zookeeper. A shared message queue is also a form of inter-process communication (IPC) on Unix-like systems: multiple processes can read messages from the queue or add messages to it. A shared message queue is a linked list of messages that persists with the kernel, and each message has a defined format, a specific type, and a corresponding priority.
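The routing and task handling in step S3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: in-process deques stand in for the shared memory queues, a plain dictionary stands in for the node configuration held in Zookeeper, and the class name Dispatcher, its methods, and the send callback are hypothetical.

```python
from collections import defaultdict, deque


class Dispatcher:
    """Routes stream items to per-node queues and sends them via tasks."""

    def __init__(self, subscriptions, node_addresses, send):
        self.subscriptions = subscriptions    # stream name -> subscribed node names
        self.node_addresses = node_addresses  # node name -> IP (stand-in for Zookeeper)
        self.send = send                      # callback: send(address, item)
        self.node_queues = defaultdict(deque) # one queue per downstream node name
        self.tasks = deque()                  # task queue of (type, node name)

    def route(self, stream_name, item):
        """Copy an item into the queue of every node subscribed to the stream."""
        for node in self.subscriptions.get(stream_name, ()):
            self.node_queues[node].append(item)
            # Fetching from a node queue is an "inside" task.
            self.tasks.append(("inside", node))

    def run_pending(self):
        """Process queued tasks, draining each named node's queue in order."""
        while self.tasks:
            task_type, node = self.tasks.popleft()
            if task_type != "inside":
                continue  # "outside" (receive) tasks are not modeled here
            queue = self.node_queues[node]
            while queue:
                self.send(self.node_addresses[node], queue.popleft())
```

Keying the queues by node name mirrors the one-to-one naming described above: a task only needs to carry the node name to find both the queue and the destination address.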
Step S4: send the data streams in the shared memory queues.
Once the downstream node corresponding to each data stream in the shared memory queues has been found, the data in the shared memory queues can be sent.
An embodiment of the present invention provides a data transmission method. When an output stream is received from an upstream node, the data stream can be stored on disk as a persistent backup according to the output stream name. Then, according to the subscription relationship between the output stream and the downstream nodes, the output stream is sent to the shared memory queue named after each downstream node and sent on to the downstream nodes in a unified manner. As a result, the output stream is delivered in a more targeted way and with higher accuracy.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
Claims (8)
1. A data transmission method, characterized by comprising:
Obtaining an output stream from an upstream node;
Making a persistent backup of the output stream according to the data stream name;
According to the subscription relationship between the output stream and the downstream nodes, sending the output stream to the shared memory queues that correspond one-to-one with the downstream nodes;
Sending the data streams in the shared memory queues.
2. The data transmission method according to claim 1, characterized in that
the shared memory queue that corresponds one-to-one with a downstream node is named after that downstream node.
3. The data transmission method according to claim 1, characterized by further comprising:
Setting an expiry duration;
Recording the start time at which the persistent backup of the output stream is made;
Obtaining the current time;
If the time difference between the current time and the start time is greater than or equal to the expiry duration, deleting the output stream.
4. The data transmission method according to claim 1, characterized in that obtaining the output stream from the upstream node comprises:
Listening on a designated port, and adding the designated port to the epoll handle;
When the name of an output stream is received from the upstream node, opening a named pipe with the read flag;
Adding the file descriptor of the named pipe to the epoll loop;
Reading the output stream from the upstream node from the named pipe in a loop.
5. The data transmission method according to claim 1, characterized in that making a persistent backup of the output stream according to the data stream name comprises:
Storing the data items in the shared memory queue in key-value form.
6. The data transmission method according to claim 5, characterized in that the key is the timestamp of the data item.
7. The data transmission method according to claim 1, characterized in that sending all data streams in the shared memory queues comprises:
Packaging the downstream node name into a task and putting the task into a task queue;
Taking the downstream node name out of the task in the task queue, and taking data from the corresponding shared memory queue;
Sending the data according to the configuration of the downstream node in Zookeeper.
8. The data transmission method according to claim 7, characterized in that sending the data according to the configuration of the downstream node in Zookeeper comprises:
Obtaining the flow setting of the downstream node;
Sending the data based on the flow setting.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710023479.4A CN107070976A (en) | 2017-01-13 | 2017-01-13 | A data transmission method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107070976A (en) | 2017-08-18 |
Family
ID=59599158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710023479.4A (Pending) CN107070976A (en) | A data transmission method | 2017-01-13 | 2017-01-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107070976A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070299980A1 (en) * | 2006-06-13 | 2007-12-27 | International Business Machines Corporation | Maximal flow scheduling for a stream processing system |
CN105122733A (en) * | 2013-02-14 | 2015-12-02 | 起元技术有限责任公司 | Queue monitoring and visualization |
CN105335218A (en) * | 2014-07-03 | 2016-02-17 | 北京金山安全软件有限公司 | Streaming computing method and streaming computing system based on local |
CN104063293A (en) * | 2014-07-04 | 2014-09-24 | 华为技术有限公司 | Data backup method and streaming computing system |
CN105959151A (en) * | 2016-06-22 | 2016-09-21 | 中国工商银行股份有限公司 | High availability stream processing system and method |
Non-Patent Citations (1)
Title |
---|
顾昕: "分布式流式计算框架关键技术的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109343979A (en) * | 2018-09-28 | 2019-02-15 | 珠海沙盒网络科技有限公司 | A kind of configuring management method and system |
CN112241339A (en) * | 2020-10-23 | 2021-01-19 | 浪潮云信息技术股份公司 | Network-based Redis persistence method |
CN113535716A (en) * | 2021-06-23 | 2021-10-22 | 浙江浙大中控信息技术有限公司 | Efficient data storage and query management method |
CN113824777A (en) * | 2021-09-06 | 2021-12-21 | 武汉中科通达高新技术股份有限公司 | Data management method and data management device |
CN113824777B (en) * | 2021-09-06 | 2023-12-19 | 武汉中科通达高新技术股份有限公司 | Data management method and data management device |
CN114969072A (en) * | 2022-06-06 | 2022-08-30 | 北京友友天宇系统技术有限公司 | Data transmission method, device and equipment based on state machine and data persistence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170818 |