CN108076111B - System and method for distributing data in big data platform - Google Patents

Info

Publication number
CN108076111B
Authority
CN
China
Prior art keywords
data
distribution
asynchronous
module
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611029700.9A
Other languages
Chinese (zh)
Other versions
CN108076111A (en)
Inventor
周伟
俞力
赵贵阳
周春楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyang Safety Technology Co ltd
Original Assignee
Yiyang Safety Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yiyang Safety Technology Co ltd filed Critical Yiyang Safety Technology Co ltd
Priority to CN201611029700.9A priority Critical patent/CN108076111B/en
Publication of CN108076111A publication Critical patent/CN108076111A/en
Application granted granted Critical
Publication of CN108076111B publication Critical patent/CN108076111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/562 Brokering proxy services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

A system and a method for distributing data in a big data platform are provided. Using asynchronous I/O as the technical basis, a big data distribution unit is constructed for high-speed distribution of big data. The server side and the client side run in separate threads, which improves data distribution throughput; a multi-dimensional structure storage unit ensures that the data is fully structured; and a big data management center together with a data bus module ensures that data is distributed accurately and correctly. As a result, the components run at high speed without waiting on one another for resources and make full use of the available resources. The system as a whole is also flexible.

Description

System and method for distributing data in big data platform
Technical Field
The invention relates to the field of data security, in particular to a method and a system for distributing data in a big data platform.
Background
In the prior art, big data platforms based on the Hadoop architecture offer high scalability, high reliability and high fault tolerance. At present, in-memory databases (Memory DB), non-relational databases (NoSQL) and caching (Cache) are widely used for large-volume data queries and data flows, and good progress has been made. However, in actual big data service processing, such as WAP (Wireless Application Protocol) internet logs, large-scale user email systems, blog log analysis, and user information tracking and analysis, current big data platforms have shortcomings in how they handle data I/O. For unstructured, semi-structured and large-volume services in particular, I/O processing speed is a serious problem, mainly in the following respects:
1. With large data volumes, especially under continuous large-volume writes, I/O performance is slow, and the I/O speed-up ratio does not scale linearly with the number of server nodes;
2. The processing of unstructured and semi-structured data such as logs, blogs, video and social relationship information is not optimized for the storage type and characteristics of big data, so processing is slow;
3. Multi-service synchronous writing leads to long synchronization times when the state of the network, storage devices and the like is uncertain, so that a long time is spent on ensuring data consistency.
Therefore, improving high-speed data distribution and I/O operations is the most important goal in improving the performance of big data platforms.
Disclosure of Invention
The purpose of the invention is realized by the following technical scheme.
According to an embodiment of the present invention, a system for distributing data in a big data platform is provided. The system comprises a big data distribution unit, a data bus module, a management center and a big data adaptation module, wherein:
the big data distribution unit is used for receiving data sent by a plurality of clients and storing the data in its own cache; acquiring a data distribution rule from the data bus module and processing the cached data for distribution; and then distributing the data to a target server;
the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules for concurrent reading by the big data distribution units;
the management center formulates a data distribution rule according to the resource information and sends the data distribution rule to the data bus module for storage;
and the big data adaptation module is used for collecting resource information of the big data distribution unit and the target server and sending the resource information to the management center.
Preferably, the big data distribution unit specifically includes:
the load balancing module is used for receiving data sent by a plurality of clients and caching the data into the respective caches of the asynchronous servers by using a load balancing algorithm;
the asynchronous server is used for caching the data balanced by the load balancing module; the asynchronous server is also used for acquiring data distribution rules from the data bus module, performing structure reconstruction on the data in its own cache according to a distribution rule, generating new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit;
the multi-dimensional structure storage unit is used for storing the multi-dimensional structure data generated when the plurality of asynchronous servers reconstruct their data;
and the asynchronous client is used for acquiring the data in the multi-dimensional structure storage unit, finding the resource information of the distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server.
In particular, the asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
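The load balancing step can be pictured with a minimal sketch. The text does not name a particular load balancing algorithm, so round-robin is assumed here purely for illustration; `AsyncServerStub` and `balance` are hypothetical names, not part of the described system.

```python
from collections import deque
from itertools import cycle


class AsyncServerStub:
    """Hypothetical stand-in for an asynchronous server with its own cache."""

    def __init__(self, name):
        self.name = name
        self.cache = deque()


def balance(records, servers):
    """Spread incoming records over the servers' caches in round-robin order.

    Round-robin is an assumption; any other balancing policy could be used.
    """
    targets = cycle(servers)
    for record in records:
        next(targets).cache.append(record)


servers = [AsyncServerStub("s1"), AsyncServerStub("s2"), AsyncServerStub("s3")]
balance([b"r0", b"r1", b"r2", b"r3", b"r4"], servers)
```

Each server ends up holding only its share of the records in its own cache, which is the state the structure reconstruction step then works on.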
Preferably, the asynchronous server further specifically includes:
the configuration module is used for configuring the service provided by the asynchronous server and sending the configuration information to the management center;
the data acquisition module is used for receiving the data balanced by the load balancing module and storing the data in a cache of the data acquisition module;
the distribution rule acquisition module is used for acquiring a data distribution rule from the data bus module;
the structure reconstruction module is used for applying the distribution rule to the data in its own cache, forming distribution packets each comprising a source address and port, a destination address and port, a connection protocol and a data part, and loading the packets into a double-ended contiguous data queue, i.e. a queue that supports insertion and deletion at both the head and the tail;
and the asynchronous client response module is used for responding to the request of the asynchronous client.
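As a rough illustration of the structure reconstruction module, the sketch below wraps cached records into distribution packets and loads them into a double-ended queue. The field names, the dictionary layout of `rule` and the helper `build_packets` are assumptions made for the example; the text only specifies which pieces of information a packet carries.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionPacket:
    """One packet: source/destination address and port, protocol, data part."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str
    data: bytes


def build_packets(cached, rule):
    """Wrap each cached record with the addressing fields taken from one rule."""
    return deque(
        DistributionPacket(rule["src_addr"], rule["src_port"],
                           rule["dst_addr"], rule["dst_port"],
                           rule["protocol"], record)
        for record in cached
    )


# Hypothetical rule values, chosen only for the demonstration.
rule = {"src_addr": "192.168.0.1", "src_port": 22,
        "dst_addr": "192.168.0.2", "dst_port": 8022, "protocol": "ssh"}
queue = build_packets([b"a", b"b", b"c"], rule)

first = queue.popleft()      # deletion at the head
last = queue.pop()           # deletion at the tail
queue.appendleft(first)      # insertion at the head, e.g. to send it first again
```

A double-ended queue makes both ends cheap to modify, so a packet that must be resent can be put back at the head without disturbing the rest of the queue.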
Preferably, the asynchronous client further specifically includes:
the data distribution module is used for acquiring the data in the multi-dimensional structure storage unit through the asynchronous server, finding the resource information of the distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server;
and the detection module is used for asynchronously waiting for the operation completion signal of the target server and detecting the signal; if the detected signal shows that the data distribution succeeded, the distributed data part is deleted from the multi-dimensional structure storage unit; and if the detected signal shows that the data distribution failed, the data distribution module is called to retransmit the data.
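The behaviour of the detection module can be sketched with asyncio as follows. This is a minimal illustration under stated assumptions: `send` is a hypothetical coroutine that returns True when the target server reports success, a plain dictionary stands in for the multi-dimensional structure storage unit, and the `max_attempts` limit is an addition of the example (the text does not state a retry limit).

```python
import asyncio


async def distribute_with_retry(store, key, send, max_attempts=3):
    """Send store[key]; delete it on success, resend it on failure."""
    for attempt in range(1, max_attempts + 1):
        ok = await send(store[key])      # asynchronously wait for the signal
        if ok:
            del store[key]               # success: drop the distributed part
            return attempt
    return None                          # still failing: data stays in store


async def _demo():
    outcomes = iter([False, True])       # first attempt fails, second succeeds

    async def flaky_send(payload):       # hypothetical target-server call
        await asyncio.sleep(0)
        return next(outcomes)

    store = {"p1": b"payload"}
    attempts = await distribute_with_retry(store, "p1", flaky_send)
    return attempts, store


attempts, store = asyncio.run(_demo())
```

Because the data is only deleted after a success signal, a failed delivery never loses the payload.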
According to another embodiment of the present invention, there is also provided a method performed by the above system for distributing data in a big data platform, the method including the steps of:
the big data distribution unit configures the provided service and sends configuration information to the management center;
the big data adaptation module collects configuration information of the big data distribution unit and resource information of the target server and sends the configuration information and the resource information to the management center;
the management center formulates a data distribution rule according to the configuration information and the resource information, and sends the data distribution rule to the data bus module for storage; the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules so that the big data distribution units can read the rules concurrently;
the big data distribution unit receives data sent by a plurality of clients and stores the data in a cache;
the big data distribution unit acquires a data distribution rule from the data bus module and distributes and processes data in the cache;
the big data distribution unit distributes the data to the target server.
Preferably, the big data distribution unit receives data sent by the plurality of clients through the plurality of load balancing modules, and caches the data in the cache of each asynchronous server by using a load balancing algorithm.
Preferably, the big data distribution unit acquires the data distribution rule from the data bus module through the asynchronous server; performing structure reconstruction on the data in the cache according to a distribution rule to generate new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit; the big data distribution unit obtains the data in the multi-dimensional structure storage unit through the asynchronous client, finds the resource information of the distribution target server, and distributes the data in the multi-dimensional structure storage unit to the target server.
The structure reconstruction that the asynchronous server performs on the data in its own cache to generate new multi-dimensional structure data specifically includes:
the asynchronous server applies the distribution rule to the data in its own cache, forms distribution packets each comprising a source address and port, a destination address and port, a connection protocol and a data part, and loads the packets into a double-ended contiguous data queue, i.e. a queue that supports insertion and deletion at both the head and the tail.
The asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
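The decoupling between the asynchronous server and the asynchronous client can be sketched with two coroutines that share a queue. This is only an illustration: coroutines on one event loop are used instead of separate threads, an `asyncio.Queue` stands in for the multi-dimensional structure storage unit, and the names `server_side`, `client_side` and `"target-1"` are hypothetical.

```python
import asyncio


async def server_side(inbox, storage):
    """Reconstruct cached records and hand them over without blocking."""
    for record in inbox:
        await storage.put({"dst": "target-1", "data": record})
    await storage.put(None)              # sentinel: nothing more to distribute


async def client_side(storage, delivered):
    """Pull reconstructed items and 'distribute' them as they arrive."""
    while (item := await storage.get()) is not None:
        delivered.append(item["data"])


async def run_pipeline(records):
    storage = asyncio.Queue(maxsize=2)   # stand-in for the storage unit
    delivered = []
    await asyncio.gather(server_side(records, storage),
                         client_side(storage, delivered))
    return delivered


delivered = asyncio.run(run_pipeline([b"a", b"b", b"c", b"d"]))
```

Neither side waits for the other to finish its whole workload; each yields control whenever the shared queue is full or empty.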
Preferably, after the asynchronous client distributes the data to the target server, the method further includes:
the asynchronous client asynchronously waits for an operation completion signal of the target server and detects the signal;
if the detection signal shows that the data distribution is successful, deleting the distributed data part from the multi-dimensional structure storage unit;
and if the detection signal shows that the data distribution failed, distributing the data to the target server again.
The system and the method for distributing data in a big data platform use asynchronous I/O as the technical basis to construct a big data distribution unit for high-speed distribution of big data. The server side and the client side run in separate threads, which improves data distribution throughput; the multi-dimensional structure storage unit ensures that the data is fully structured; and the big data management center together with the data bus module ensures that data is distributed accurately and correctly. As a result, the components run at high speed without waiting on one another for resources and make full use of the available resources. The system as a whole is also flexible.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic diagram of a system for distributing data in a big data platform according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a big data distribution unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an asynchronous server architecture according to an embodiment of the present invention;
FIG. 4 shows a flow diagram of a method for distributing data in a big data platform, according to another embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
According to an embodiment of the present invention, a system for distributing data in a big data platform is provided. As shown in FIG. 1, the system comprises a big data distribution unit M101, a data bus module M102, a management center M103 and a big data adaptation module M104, wherein:
the big data distribution unit is used for receiving mass data sent by a plurality of clients and storing the mass data in its own cache; acquiring a data distribution rule from the data bus module and processing the cached data for distribution; and then distributing the data to a target server;
the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules so that the big data distribution units can read them efficiently and concurrently at any time;
the management center formulates a data distribution rule according to the resource information and sends the data distribution rule to the data bus module for storage;
the big data adaptation module is used for collecting resource information of the big data distribution unit and the target server and sending the resource information to the management center; the resource information comprises at least a connection protocol, an IP address and a port.
Preferably, as shown in FIG. 2, the big data distribution unit specifically includes:
the load balancing module is used for receiving mass data sent by a plurality of clients and caching the data into the respective caches of the asynchronous servers by using a load balancing algorithm;
the asynchronous server is used for caching the data balanced by the load balancing module; the asynchronous server is also used for acquiring data distribution rules from the data bus module, performing structure reconstruction on the data in its own cache according to a distribution rule, generating new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit;
the multi-dimensional structure storage unit is used for storing the multi-dimensional structure data generated when the plurality of asynchronous servers reconstruct their data;
and the asynchronous client is used for acquiring the data in the multi-dimensional structure storage unit, finding the resource information of the distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server.
In particular, the asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
Preferably, as shown in FIG. 3, the asynchronous server further specifically includes:
the configuration module is used for configuring the service provided by the asynchronous server and sending the configuration information to the management center;
the data acquisition module is used for receiving the data balanced by the load balancing module and storing the data in a cache of the data acquisition module;
the distribution rule acquisition module is used for acquiring a data distribution rule from the data bus module;
the structure reconstruction module is used for applying the distribution rule to the data in its own cache, forming distribution packets each comprising a source address and port, a destination address and port, a connection protocol and a data part, and loading the packets into a double-ended contiguous data queue, i.e. a queue that supports efficient insertion and deletion at both the head and the tail;
and the asynchronous client response module is used for responding to the request of the asynchronous client.
Preferably, the asynchronous client further specifically includes:
the data distribution module is used for acquiring the data in the multi-dimensional structure storage unit through the asynchronous server, finding the resource information of the distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server;
and the detection module is used for asynchronously waiting for the operation completion signal of the target server and detecting the signal; if the detected signal shows that the data distribution succeeded, the distributed data part is deleted from the multi-dimensional structure storage unit; and if the detected signal shows that the data distribution failed, the data distribution module is called to retransmit the data.
According to another embodiment of the present invention, there is also provided a method for distributing data in a big data platform, performed by the above system. As shown in FIG. 4, the method includes the following steps:
the big data distribution unit configures the provided service and sends configuration information to the management center;
the big data adaptation module collects configuration information of the big data distribution unit and resource information of the target server and sends the configuration information and the resource information to the management center; the resource information at least comprises a connection protocol, an IP address and a port;
the management center formulates a data distribution rule according to the configuration information and the resource information, and sends the data distribution rule to the data bus module for storage; the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules so that the big data distribution units can read them efficiently and concurrently at any time, which ensures the consistency and correctness of the distribution rules.
For example, the management center first "formats" the resource information of the target server. The formatted data has the form: connection protocol://user name:password@host name@ip address:port, for example ssh://zhangsan:123456@localhost@127.0.0.1:22. The management center then arranges the formatted resource information into a distribution rule by convention, in the form: connection protocol://user name:password@gateway name@gateway ip address:source port->connection protocol://user name:password@destination host name@destination ip address:destination port, for example: {[ssh://zhangsan:123456@host1@192.168.0.1:22->ssh://lisi:654321@host2@192.168.0.2:8022][…][…]…}.
The big data distribution unit receives mass data sent by a plurality of clients and stores the mass data in a cache;
the big data distribution unit acquires a data distribution rule from the data bus module and distributes and processes data in the cache;
the big data distribution unit distributes the data to the target server.
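The formatted resource string and the distribution rule given in the example of the management-center step can be parsed with a short sketch. It assumes the colon-separated port form stated in the format description (`ip address:port`) and a single `->` between source and destination; `Endpoint`, `parse_endpoint` and `parse_rule` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class Endpoint(NamedTuple):
    protocol: str
    user: str
    password: str
    host: str
    ip: str
    port: int


# protocol://user:password@host@ip:port
_ENDPOINT = re.compile(
    r"(?P<protocol>\w+)://(?P<user>[^:@]+):(?P<password>[^@]+)"
    r"@(?P<host>[^@]+)@(?P<ip>[\d.]+):(?P<port>\d+)$"
)


def parse_endpoint(text):
    """Split one formatted resource string into its fields."""
    match = _ENDPOINT.match(text.strip())
    if match is None:
        raise ValueError(f"not a formatted endpoint: {text!r}")
    fields = match.groupdict()
    return Endpoint(**{**fields, "port": int(fields["port"])})


def parse_rule(text):
    """Split 'source->destination' into two Endpoint values."""
    source, destination = text.split("->")
    return parse_endpoint(source), parse_endpoint(destination)


src, dst = parse_rule(
    "ssh://zhangsan:123456@host1@192.168.0.1:22"
    "->ssh://lisi:654321@host2@192.168.0.2:8022"
)
```

Parsing the rule once into named fields lets later steps read the destination address and port directly instead of re-splitting the string for every packet.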
Preferably, the big data distribution unit receives mass data sent by a plurality of clients through a plurality of load balancing modules, and caches the data in the cache of each asynchronous server by using a load balancing algorithm;
preferably, the big data distribution unit acquires the data distribution rule from the data bus module through the asynchronous server; performing structure reconstruction on the data in the cache according to a distribution rule to generate new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit; the big data distribution unit obtains the data in the multi-dimensional structure storage unit through the asynchronous client, finds the resource information of the distribution target server, and distributes the data in the multi-dimensional structure storage unit to the target server.
The structure reconstruction that the asynchronous server performs on the data in its own cache to generate new multi-dimensional structure data specifically includes:
the asynchronous server applies the distribution rule to the data in its own cache, forms distribution packets each comprising a source address and port, a destination address and port, a connection protocol and a data part, and loads the packets into a double-ended contiguous data queue, i.e. a queue that supports efficient insertion and deletion at both the head and the tail. The asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
Preferably, after the asynchronous client distributes the data to the target server, the method further includes:
the asynchronous client asynchronously waits for an operation completion signal of the target server and detects the signal;
if the detection signal shows that the data distribution is successful, deleting the distributed data part from the multi-dimensional structure storage unit;
and if the detection signal shows that the data distribution failed, distributing the data to the target server again.
The following describes in detail a specific implementation of the core part of the present application, i.e. the asynchronous I/O part. The specific implementation manner of the asynchronous I/O part is an asynchronous processing process, and specifically includes:
The load balancing modules receive data sent by the clients through a Linux Virtual Server (LVS) cluster and forward the data to the asynchronous servers using a load balancing technique. Available techniques include DNS load balancing, HTTP load balancing, IP load balancing, link-layer load balancing and hybrid load balancing.
The asynchronous I/O uses the asyncio module provided by the Python language. The asynchronous server creates an asynchronous service by overriding the service API with get_event_loop from the asyncio module, listens on its own port, and receives the data sent by the LVS in a loop by calling the run_until_complete method.
After receiving the data, the asynchronous server calls a background method, implemented by overriding the API, and writes the data into its own cache.
The asynchronous server then feeds back a reception-complete signal to the LVS.
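The receive, cache and acknowledge cycle described above can be sketched with asyncio streams. This is an illustration under assumptions, not the implementation of the embodiment: the higher-level `asyncio.run` and `asyncio.start_server` are used in place of calling `get_event_loop` and `run_until_complete` directly, records are assumed to be newline-terminated, `b"ACK\n"` is a made-up completion signal, and `send_once` merely stands in for the LVS side.

```python
import asyncio
from collections import deque

cache = deque()                          # the server's own cache


async def handle(reader, writer):
    """Receive one newline-terminated record, cache it, acknowledge it."""
    record = await reader.readline()
    cache.append(record.rstrip(b"\n"))
    writer.write(b"ACK\n")               # reception-complete signal (assumed format)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def send_once(port, payload):
    """Stand-in for the LVS side: send a record and wait for the signal."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(payload + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await send_once(port, b"log line 1")


reply = asyncio.run(demo())
```

The record is written to the cache before the acknowledgement is sent, so a sender that has seen the signal can rely on the data being cached.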
The asynchronous server reconstructs the data in its own cache into a data structure that is convenient to access in real time. The deque data structure provided by the Python language is used here; it is an advanced, high-performance structure that also helps prevent deadlock.
The management center collects information about the target servers through the big data adaptation module and converts it into a set of distribution rules that the asynchronous server can interpret. The rules may use JSON, dict or XML format; they are serialized using the Protocol Buffers (protobuf) technology provided by Google and then stored on a bus built on the Kafka framework.
The asynchronous server uses Kafka to read and parse the distribution rules on the bus and saves the rules into its own cache.
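The rule exchange between the management center and the asynchronous server can be sketched as a serialize, publish, read and parse round trip. To keep the example self-contained, plain JSON bytes are used instead of a protobuf schema and an in-memory list stands in for the bus topic; the rule layout, the `version` field and the function names are all assumptions.

```python
import json


def serialize_rules(rules):
    """Management-center side: encode the rule set as UTF-8 JSON bytes."""
    return json.dumps({"version": 1, "rules": rules}, sort_keys=True).encode("utf-8")


def load_rules(message):
    """Asynchronous-server side: decode a message read from the bus."""
    return json.loads(message.decode("utf-8"))["rules"]


bus = []                                 # in-memory stand-in for a bus topic
rules = [{
    "source": {"protocol": "ssh", "host": "host1", "ip": "192.168.0.1", "port": 22},
    "destination": {"protocol": "ssh", "host": "host2", "ip": "192.168.0.2", "port": 8022},
}]
bus.append(serialize_rules(rules))       # publish
rule_cache = load_rules(bus[-1])         # read the latest message and cache it
```

In a real deployment the list would be replaced by a producer and a consumer of the chosen messaging system, while the two functions would stay on either side of the bus.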
The asynchronous server applies the distribution rules to the data in the deque structure; additional dimensions and depth can be generated within the deque according to the distribution rules.
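One possible reading of "more dimensions and depth" is that the flat deque is regrouped into nested deques, one per destination. The sketch below shows that interpretation only; the grouping key, the `routes` table, the address `192.168.0.3` and the `regroup` helper are assumptions of the example.

```python
from collections import deque


def regroup(flat, route):
    """Turn a flat deque into a deque of (destination, deque-of-records) pairs."""
    groups = {}
    for record in flat:
        groups.setdefault(route(record), deque()).append(record)
    return deque(groups.items())


flat = deque([("web", b"GET /"), ("mail", b"HELO"), ("web", b"POST /x")])
routes = {"web": "192.168.0.2:8022", "mail": "192.168.0.3:8022"}  # hypothetical
nested = regroup(flat, lambda record: routes[record[0]])
```

After regrouping, everything bound for one target server sits in its own inner deque, so a client can drain one destination without scanning records meant for another.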
The asynchronous server uses an asyncio-based asynchronous client to distribute data at high speed according to the rules; optional clients include aiohttp, paramiko and the like.
The asynchronous server uses the async and await keywords to make functions asynchronous: it first obtains the response asynchronously and then reads the content of the response asynchronously. Requests are initiated using ClientSession as the primary interface. A ClientSession allows cookies and related object information to be kept across multiple requests. The session must be closed after use, and closing the session is itself an asynchronous operation, so the async with keyword is used each time.
The asynchronous server establishes a client session, uses it to initiate a request, and starts multiple other asynchronous operations. Once the asynchronous distribution program is running normally, the asynchronous server adds the remaining data in the cache to the event loop.
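The request pattern described here can be sketched with aiohttp, one of the clients the text mentions. This is a minimal sketch assuming HTTP POST as the transport and hypothetical function names; aiohttp is a third-party package and is imported lazily so that `post_packet` can also be exercised with any object that offers the same `post` interface.

```python
import asyncio


async def post_packet(session, url, payload):
    """Send one payload, then asynchronously read the response content."""
    async with session.post(url, data=payload) as response:   # obtain the response
        return response.status, await response.text()         # read its content


async def distribute_all(urls_and_payloads):
    """Open one session, start all requests concurrently, close the session."""
    import aiohttp                       # third-party; imported lazily on purpose

    async with aiohttp.ClientSession() as session:             # closed on exit
        tasks = [post_packet(session, url, payload)
                 for url, payload in urls_and_payloads]
        return await asyncio.gather(*tasks)
```

Calling `asyncio.run(distribute_all([...]))` with a list of `(url, payload)` pairs would start all requests on one event loop; the `async with` blocks ensure that each response and the session itself are released even if a request raises.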
After the asynchronous server finishes distributing the data, it asynchronously waits for the detection signal sent by the target server and keeps the distributed data in its own cache; when the completion signal sent by the target server is asynchronously received, the corresponding part of the cache is released.
When the asynchronous server receives a reception-failure signal from the target server, it reconstructs that part of the data from its own cache, pushes it into the high-speed data structure again, and resends it to the target server, repeating the two steps above.
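The bookkeeping in these two steps (keep distributed data cached until confirmed, release on completion, re-push on failure) can be sketched as below. `InFlightTracker`, the packet ids and the boolean `success` flag are hypothetical; the text does not specify how signals identify the data they refer to.

```python
from collections import deque


class InFlightTracker:
    """Keeps distributed packets until the target server confirms them."""

    def __init__(self):
        self.queue = deque()             # packets waiting to be distributed
        self.in_flight = {}              # packet id -> data awaiting a signal

    def send_next(self):
        """Take the next packet off the queue and remember it as in flight."""
        packet_id, data = self.queue.popleft()
        self.in_flight[packet_id] = data
        return packet_id, data

    def on_signal(self, packet_id, success):
        """Release the cached copy; on failure, queue the packet again."""
        data = self.in_flight.pop(packet_id)
        if not success:
            self.queue.appendleft((packet_id, data))


tracker = InFlightTracker()
tracker.queue.extend([(1, b"a"), (2, b"b")])
tracker.send_next()                      # packet 1 goes out
tracker.send_next()                      # packet 2 goes out
tracker.on_signal(1, success=True)       # completion: cache entry released
tracker.on_signal(2, success=False)      # failure: packet 2 is queued again
```

Re-pushing at the head of the deque means a failed packet is the next one to be resent.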
In the above example, LVS stands for Linux Virtual Server cluster. It comprises: a load balancer, which is responsible for collecting the clients' data requests and sending the data to a group of servers for caching; a server pool, which executes the clients' data requests; and shared storage, which provides data storage for the server pool. The LVS serves as the data source for the asynchronous server.
In the above example, Kafka is used to ensure the consistency of the distribution rules. Kafka is a distributed publish-subscribe messaging system that provides high throughput for both publishing and subscribing; it supports multiple subscribers and automatically rebalances consumers on failure; it persists messages to disk and can therefore serve batch consumption (e.g. ETL for data warehousing) as well as real-time applications. Kafka is thus a good technical carrier for providing distribution rules to the asynchronous servers; when the distribution strategy is updated, the asynchronous servers can poll the Kafka bus periodically to ensure the consistency of the distribution strategy.
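Periodic polling for updated rules can be sketched as a small asyncio loop. The example is deliberately independent of any messaging library: `fetch_latest` is a hypothetical coroutine standing in for a read from the bus, and the integer `version` field used to detect updates is an assumption.

```python
import asyncio


async def poll_rules(fetch_latest, state, interval, rounds):
    """Poll the bus a fixed number of times, adopting newer rule versions."""
    for _ in range(rounds):
        latest = await fetch_latest()
        if latest["version"] > state["version"]:
            state.update(latest)         # switch to the updated strategy
        await asyncio.sleep(interval)
    return state


async def _demo():
    published = iter([{"version": 1, "rules": ["r1"]},
                      {"version": 1, "rules": ["r1"]},
                      {"version": 2, "rules": ["r1", "r2"]}])

    async def fetch_latest():            # stand-in for reading the bus
        return next(published)

    return await poll_rules(fetch_latest, {"version": 0, "rules": []},
                            interval=0.01, rounds=3)


state = asyncio.run(_demo())
```

Because the loop sleeps with `asyncio.sleep`, polling shares the event loop with data distribution instead of blocking it.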
The data distribution of the invention is based on a big data platform and adopts asynchronous I/O as a technical basis to carry out high-speed distribution, so that the usability of the whole system is greatly increased, and the efficiency is obviously improved under the condition of frequent reading and writing of unstructured data. The implementation method can effectively improve the I/O efficiency of the large data platform on the premise of keeping the consistency and the integrity of the data.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (5)

1. A system for distributing data in a big data platform, comprising: a big data distribution unit, a data bus module, a management center and a big data adaptation module; wherein
the big data distribution unit is used for receiving data sent by a plurality of clients and storing the data in its own cache; acquiring a data distribution rule from the data bus module and performing distribution processing on the cached data; and then distributing the data to a target server;
the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules so that the big data distribution units can read them concurrently;
the management center formulates a data distribution rule according to the resource information and sends the data distribution rule to the data bus module for storage;
the big data adaptation module is used for collecting configuration information of the big data distribution unit and resource information of the target server and sending the configuration information and the resource information to the management center;
the management center is used for formulating a data distribution rule according to the configuration information and the resource information;
the big data distribution unit specifically comprises:
the load balancing module is used for receiving data sent by a plurality of clients and caching the data in the asynchronous servers' own caches by using a load balancing algorithm;
the asynchronous server is used for caching the data balanced by the load balancing module; is also used for acquiring data distribution rules from the data bus module; and is used for performing structure reconstruction on the data in its own cache according to the distribution rule, generating new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit;
the asynchronous server further specifically comprises:
the configuration module is used for configuring the service provided by the asynchronous server and sending the configuration information to the management center;
the data acquisition module is used for receiving the data balanced by the load balancing module and storing the data in the asynchronous server's own cache;
the distribution rule acquisition module is used for acquiring a data distribution rule from the data bus module;
the structure reconstruction module is used for applying the distribution rule to the data in the asynchronous server's own cache, forming a distribution packet comprising a source address and port, a destination address and port, a connection protocol and a data part, and loading the distribution packet into a double-ended contiguous data queue, wherein the queue supports insertion and deletion at both its head and its tail;
the asynchronous client response module is used for responding to the request of the asynchronous client;
the multi-dimensional structure storage unit is used for storing the multi-dimensional structure data generated by the plurality of asynchronous servers through data reconstruction;
and the asynchronous client is used for acquiring the data in the multi-dimensional structure storage unit, finding the resource information of the distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server.
2. The system of claim 1, wherein the asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
3. The system of claim 1, wherein the asynchronous client further comprises:
the data distribution module is used for acquiring data in the multi-dimensional structure storage unit through the asynchronous server, finding out resource information of a distribution target server and distributing the data in the multi-dimensional structure storage unit to the target server;
the detection module is used for asynchronously waiting for the operation completion signal of the target server and detecting the signal; if the detected signal indicates that the data distribution succeeded, deleting the distributed data part from the multi-dimensional structure storage unit; and if the detected signal indicates that the data distribution failed, calling the data distribution module to retransmit the data.
4. A method of distributing data in a big data platform, the method comprising the steps of:
the big data distribution unit configures the provided service and sends configuration information to the management center;
the big data adaptation module collects configuration information of the big data distribution unit and resource information of the target server and sends the configuration information and the resource information to the management center;
the management center formulates a data distribution rule according to the configuration information and the resource information, and sends the data distribution rule to the data bus module for storage; the data bus module is a distributed publish-subscribe messaging system and is used for persistently storing data distribution rules so that the big data distribution units can read the data distribution rules concurrently;
the big data distribution unit receives data sent by a plurality of clients and stores the data in a cache;
the big data distribution unit acquires a data distribution rule from the data bus module and performs distribution processing on the data in the cache;
the big data distribution unit distributes data to the target server;
the big data distribution unit receives data sent by a plurality of clients through a plurality of load balancing modules and caches the data in the cache of each asynchronous server by using a load balancing algorithm;
the big data distribution unit acquires a data distribution rule from the data bus module through the asynchronous server; performing structure reconstruction on the data in the cache according to a distribution rule to generate new multi-dimensional structure data containing distribution resource information, and storing the new multi-dimensional structure data in a multi-dimensional structure storage unit; the big data distribution unit acquires data in the multi-dimensional structure storage unit through the asynchronous client, finds resource information of a distribution target server, and distributes the data in the multi-dimensional structure storage unit to the target server;
wherein the asynchronous server performing structure reconstruction on the data in its own cache to generate new multi-dimensional structure data specifically comprises:
the asynchronous server applies the distribution rule to the data in its own cache, forms a distribution packet comprising a source address and port, a destination address and port, a connection protocol and a data part, and loads the distribution packet into a double-ended contiguous data queue, wherein the queue supports insertion and deletion at both its head and its tail; and the asynchronous server and the asynchronous client interact using an asynchronous I/O mode.
5. The method of claim 4, further comprising, after the asynchronous client distributes data to the target server:
the asynchronous client asynchronously waits for an operation completion signal of the target server and detects the signal;
if the detected signal indicates that the data distribution succeeded, deleting the distributed data part from the multi-dimensional structure storage unit;
and if the detected signal indicates that the data distribution failed, distributing the data to the target server again.
CN201611029700.9A 2016-11-15 2016-11-15 System and method for distributing data in big data platform Active CN108076111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611029700.9A CN108076111B (en) 2016-11-15 2016-11-15 System and method for distributing data in big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611029700.9A CN108076111B (en) 2016-11-15 2016-11-15 System and method for distributing data in big data platform

Publications (2)

Publication Number Publication Date
CN108076111A (en) 2018-05-25
CN108076111B (en) 2021-07-09

Family

ID=62161323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611029700.9A Active CN108076111B (en) 2016-11-15 2016-11-15 System and method for distributing data in big data platform

Country Status (1)

Country Link
CN (1) CN108076111B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457582B (en) * 2019-08-10 2023-03-21 北京酷我科技有限公司 Data distribution method and recommendation system
CN112732996A (en) * 2021-01-11 2021-04-30 深圳市洪堡智慧餐饮科技有限公司 Multi-platform distributed data crawling method based on asynchronous aiohttp

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103731298A (en) * 2013-11-15 2014-04-16 中国航天科工集团第二研究院七〇六所 Large-scale distributed network safety data acquisition method and system
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN104092767A (en) * 2014-07-21 2014-10-08 北京邮电大学 Posting/subscribing system for adding message queue models and working method thereof
CN104754036A (en) * 2015-03-06 2015-07-01 合一信息技术(北京)有限公司 Message processing system and processing method based on kafka

Also Published As

Publication number Publication date
CN108076111A (en) 2018-05-25

Similar Documents

Publication Publication Date Title
US10439916B2 (en) Client-side fault tolerance in a publish-subscribe system
US9578081B2 (en) System and method for providing an actively invalidated client-side network resource cache
US9794365B2 (en) Re-establishing push notification channels via user identifiers
Tang et al. Design and implementation of push notification system based on the MQTT protocol
EP2485443A1 (en) System and method for managing multiple queues of non-persistent messages in a networked environment
US10075549B2 (en) Optimizer module in high load client/server systems
US20080177872A1 (en) Managing aggregation and sending of communications
CN106021315B (en) Log management method and system for application program
Lazidis et al. Publish–Subscribe approaches for the IoT and the cloud: Functional and performance evaluation of open-source systems
CN113839977B (en) Message pushing method, device, computer equipment and storage medium
KR20140072044A (en) Distributing multi-source push notifications to multiple targets
CN105306585A (en) Data synchronization method for plurality of data centers
CN111352716B (en) Task request method, device and system based on big data and storage medium
US20180336222A1 (en) Methods and systems for migrating public folders to online mailboxes
Sharvari et al. A study on modern messaging systems-kafka, rabbitmq and nats streaming
CN105183470A (en) Natural language processing systematic service platform
CN108076111B (en) System and method for distributing data in big data platform
CN110798495A (en) Method and server for end-to-end message push in cluster architecture mode
CN113630366A (en) Internet of things equipment access method and system
Hu et al. Research and implementation of campus information push system based on WebSocket
US10182119B2 (en) System and methods for facilitating communication among a subset of connections that connect to a web application
Ji et al. A push-notification service for use in the UCWW
Bertelsen et al. Federated publish/subscribe services
Meiklejohn et al. Partisan: Enabling cloud-scale erlang applications
CN103678521A (en) Distributed file monitoring system based on Hadoop frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant