CN115051981A - Zookeeper-based asynchronous downloading method and device - Google Patents


Info

Publication number
CN115051981A
CN115051981A CN202210515994.5A CN202210515994A CN115051981A CN 115051981 A CN115051981 A CN 115051981A CN 202210515994 A CN202210515994 A CN 202210515994A CN 115051981 A CN115051981 A CN 115051981A
Authority
CN
China
Prior art keywords
dynamic
data
query
queues
downloading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210515994.5A
Other languages
Chinese (zh)
Inventor
许吉来
罗晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202210515994.5A
Publication of CN115051981A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

An embodiment of the application provides a Zookeeper-based asynchronous downloading method and device for reducing the time consumed by data downloading. The method comprises the following steps: m dynamic queues are created in Zookeeper according to m Hadoop data sources, where m is the number of components supporting a query function in a Hadoop cluster, m is a positive integer greater than or equal to 1, and each dynamic queue corresponds to one data source, so that the data downloading tasks of the m dynamic queues are controlled in parallel; a monitoring program monitors changes to the m dynamic queues in real time and, when a dynamic queue changes, calls the query component corresponding to that queue to download data from the corresponding data source; and the state of the data downloading task and the downloaded data file are obtained.

Description

Zookeeper-based asynchronous downloading method and device
Technical Field
The invention relates to the technical field of electronics, in particular to a Zookeeper-based asynchronous downloading method and device.
Background
With the rapid development of information technology, big data technology is widely applied across industries, and Hadoop provides big data solutions by virtue of its low software and hardware cost and strong parallel computing capability. Because Hadoop stores a large volume of data, a user who queries data on a front-end page must query it page by page, and downloading a large volume of data from the front-end page takes a long time, which is a great inconvenience to the user.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a Zookeeper-based asynchronous downloading method that overcomes or at least partially solves them. The specific scheme is as follows:
In a first aspect, an embodiment of the present invention discloses a Zookeeper-based asynchronous downloading method, the method comprising:
creating m dynamic queues in Zookeeper according to m Hadoop data sources, and controlling the data downloading tasks of the m dynamic queues in parallel, wherein m is the number of components supporting a query function in a Hadoop cluster, m is a positive integer greater than or equal to 1, and each dynamic queue corresponds to one data source;
monitoring, by a monitoring program, changes to the m dynamic queues in real time and, according to those changes, calling the query component corresponding to each changed dynamic queue to download data from the corresponding data source;
and acquiring the state of the data downloading task and the downloaded data file.
Optionally, creating the m dynamic queues in Zookeeper includes:
acquiring the m dynamic queues, wherein each dynamic queue comprises one primary node, and each primary node corresponds to one query component;
when a download request is received, determining the query component corresponding to the download request;
determining the primary node corresponding to the download request according to the query component corresponding to the download request;
under the primary node corresponding to the download request, creating secondary nodes in the order in which download requests are received, wherein the secondary nodes correspond to the download requests one to one.
Optionally, creating the secondary nodes under the primary node in the order in which download requests are received further includes:
when a plurality of secondary nodes exist under any one primary node, the sequence numbers of those secondary nodes increase in the order in which the secondary nodes were created.
Optionally, the monitoring program monitoring changes to the m dynamic queues in real time and calling the query component corresponding to a changed dynamic queue to download data includes:
setting m monitoring programs, wherein each monitoring program monitors, in real time, changes to the secondary nodes in one dynamic queue;
when a secondary node changes, acquiring information on all secondary nodes in the dynamic queue containing the changed secondary node, according to a notification sent by Zookeeper;
acquiring the secondary node with the smallest sequence number; according to the download request, encapsulating the data information stored in that secondary node into a data-volume query statement and running it through the query component corresponding to the primary node; and deleting the secondary node with the smallest sequence number.
Optionally, the monitoring program monitoring changes to the m dynamic queues in real time and calling the query component corresponding to a changed dynamic queue to download data further includes:
when the concurrency of the query component exceeds a preset threshold, setting the monitoring program to a waiting state until the concurrency of the query component falls below the preset threshold, and then resuming monitoring of the m dynamic queues.
In a second aspect, the present invention discloses a Zookeeper-based asynchronous downloading device, which includes:
a creating unit, configured to create m dynamic queues in Zookeeper according to m Hadoop data sources and to control the data downloading tasks of the m dynamic queues in parallel, wherein m is the number of components supporting a query function in a Hadoop cluster, m is a positive integer greater than or equal to 1, and each dynamic queue corresponds to one data source;
a monitoring unit, configured to monitor changes to the m dynamic queues in real time through a monitoring program and, according to those changes, to call the query component corresponding to each changed dynamic queue to download data from the corresponding data source;
and an acquisition unit, configured to acquire the state of the data downloading task and the downloaded data file.
Optionally, the creating unit is specifically configured to:
acquire the m dynamic queues, wherein each dynamic queue comprises one primary node, and each primary node corresponds to one query component;
when a download request is received, determine the query component corresponding to the download request;
determine the primary node corresponding to the download request according to the query component corresponding to the download request;
under the primary node corresponding to the download request, create secondary nodes in the order in which download requests are received, wherein the secondary nodes correspond to the download requests one to one.
Optionally, the creating unit is specifically configured such that:
when a plurality of secondary nodes exist under any one primary node, the sequence numbers of those secondary nodes increase in the order in which the secondary nodes were created.
Optionally, the monitoring unit is specifically configured to:
set m monitoring programs, wherein each monitoring program monitors, in real time, changes to the secondary nodes in one dynamic queue;
when a secondary node changes, acquire information on all secondary nodes in the dynamic queue containing the changed secondary node, according to a notification sent by Zookeeper;
acquire the secondary node with the smallest sequence number; according to the download request, encapsulate the data information stored in that secondary node into a data-volume query statement and run it through the query component corresponding to the primary node; and delete the secondary node with the smallest sequence number.
Optionally, the monitoring unit is further configured to:
when the concurrency of the query component exceeds a preset threshold, set the monitoring program to a waiting state until the concurrency of the query component falls below the preset threshold, and then resume monitoring of the m dynamic queues.
Compared with the prior art, the invention has the following beneficial effects:
According to the method and the device, a plurality of dynamic queues are established in Zookeeper for the data of different data sources, changes to each dynamic queue are monitored, and the corresponding query component is called to download data from the corresponding data source according to those changes. Data downloading for each Hadoop component is thereby placed under queue control, downloads through different components do not interfere with one another, asynchronous downloading from multiple Hadoop data sources is parallelized and automated, and the range of uses of Zookeeper is expanded.
Drawings
FIG. 1 is a diagram of the association relationship between the roles of Zookeeper;
FIG. 2 is a schematic flowchart of a Zookeeper-based asynchronous downloading method in an embodiment of the present application;
FIG. 3 is a diagram of the Zookeeper node directory structure;
FIG. 4 is a schematic diagram of a Zookeeper-based asynchronous downloading device in an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As can be seen from the background, in the prior art downloading a large amount of data from multiple Hadoop components takes a long time. The embodiments of the present application provide a solution to this problem, and the specific implementation is described in detail below.
FIG. 1 shows the association relationship among the roles of Zookeeper. Zookeeper is used in Hadoop big data technology and plays an irreplaceable role in a distributed system. As the coordinator of the distributed system, Zookeeper allows the distributed system to be used like a single machine while being more reliable than one. Zookeeper is a cluster of multiple servers, consisting of one leader and multiple followers. Each server stores a copy of the data, the global data is consistent, reads and writes are distributed, and data update requests are forwarded by the followers to the leader, which applies them uniformly.
Referring to FIG. 2, which is a schematic flowchart of a Zookeeper-based asynchronous downloading method according to an embodiment of the present application, the method specifically includes:
S201: According to m Hadoop data sources, m dynamic queues are created in Zookeeper, and the data downloading tasks of the m dynamic queues are controlled in parallel, where m is the number of components supporting a query function in a Hadoop cluster and m is a positive integer greater than or equal to 1; each dynamic queue corresponds to one data source.
Zookeeper nodes form a tree structure with one root node and multiple branch nodes. The method makes combined use of the real-time, sequential, atomic, and consistent properties of Zookeeper node updates, and manages the Zookeeper nodes dynamically through the Java client API.
The number of components supporting a query function in the Hadoop cluster is determined from the local configuration of the cluster; such components may be Impala, HBase, Phoenix, Kylin, and the like. m dynamic queues are then created, where m is that number: for example, when the query components are Impala and HBase, m equals 2, and when they are Impala, HBase, and Phoenix, m equals 3. In other words, one dynamic queue is created for each type of query component, and each dynamic queue corresponds to one data source. Download tasks are processed through these dynamic queues, which allows them to be controlled in parallel.
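As an illustration of the paragraph above, the following minimal sketch derives m and the primary-node paths from a local configuration. It is plain Python with no Zookeeper client; the `CLUSTER_CONFIG` mapping, the `supports_query` flag, and the `primary_node_paths` helper are hypothetical names introduced only for this example.

```python
# Hypothetical local configuration: which Hadoop components are deployed
# and whether each one supports a query function.
CLUSTER_CONFIG = {
    "Impala": {"supports_query": True},
    "HBase": {"supports_query": True},
    "HDFS": {"supports_query": False},
    "Phoenix": {"supports_query": True},
}


def primary_node_paths(config):
    """Return one primary-node path per query-capable component."""
    return ["/" + name for name, opts in config.items() if opts["supports_query"]]


queues = primary_node_paths(CLUSTER_CONFIG)
m = len(queues)  # m equals the number of query-capable components
print(m, queues)
```

With this assumed configuration, three components support querying, so three dynamic queues would be created, one per component.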
In an embodiment, step S201 specifically includes:
A1: Acquire the m dynamic queues, where each dynamic queue comprises one primary node, and each primary node corresponds to one query component.
A2: When a download request is received, determine the query component corresponding to the download request.
The received download request includes information such as a task number, a user number, a data table name, and query conditions; the query component corresponding to the download request is determined from the query conditions.
A3: Determine the primary node corresponding to the download request according to the query component corresponding to the download request.
A4: Under the primary node corresponding to the download request, create secondary nodes in the order in which download requests are received; the secondary nodes correspond to the download requests one to one.
When multiple download requests that use the same query component are received, multiple secondary nodes are created under the primary node corresponding to that query component, and their sequence numbers increase in the order in which the secondary nodes are created.
The primary nodes of the m dynamic queues receive the download requests for their respective query components. After the query component required by a download request is determined, a secondary node is created under the primary node corresponding to that query component. If multiple secondary nodes exist under a primary node, their sequence numbers increase in the order in which they were created, and the sequence numbers under one primary node are independent of those under other primary nodes. For example, four primary nodes (znodes), "/Impala", "/HBase", "/Phoenix", and "/Kylin", are created to receive data download requests for Impala, HBase, Phoenix, and Kylin, respectively. After a user submits a data download request, a secondary node is created under the corresponding primary node. The node name prefix is "queue-" and the node type is PERSISTENT_SEQUENTIAL (PERSISTENT means the node is stored until it is explicitly deleted; SEQUENTIAL means a unique, increasing sequence number is appended to the name). The sequence number is formatted as 10 digits, for example 0000000001. Referring to FIG. 3, under the primary node "/Impala" the secondary nodes are named "/Impala/queue-0000000001", "/Impala/queue-0000000002", and "/Impala/queue-0000000003"; under the primary node "/Kylin" they are named "/Kylin/queue-0000000001", "/Kylin/queue-0000000002", and so on. Download requests with different query conditions are thus distributed to the corresponding dynamic queues, and the corresponding secondary nodes are created in the order in which the download requests are received.
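The naming scheme above can be sketched with a small in-memory model. This is not a Zookeeper client: the `InMemoryTree` class only imitates per-parent sequential naming, whereas in a real deployment the server assigns the sequence number. Two assumptions are made for illustration: numbering starts at 1 to match the example names in the text, and each request carries a hypothetical `component` field instead of being routed by its query conditions.

```python
class InMemoryTree:
    """Toy stand-in for a Zookeeper namespace; not a real client."""

    def __init__(self, primary_nodes):
        # children[parent] maps child name -> stored request data
        self.children = {p: {} for p in primary_nodes}
        # One counter per primary node, so numbering is independent per queue.
        # Assumption: numbering starts at 1 to match the example in the text.
        self.next_seq = {p: 1 for p in primary_nodes}

    def create_sequential(self, parent, prefix, data):
        """Create a child whose name ends in a 10-digit increasing number."""
        seq = self.next_seq[parent]
        self.next_seq[parent] = seq + 1
        name = f"{prefix}{seq:010d}"
        self.children[parent][name] = data
        return f"{parent}/{name}"


def enqueue(tree, request):
    """Route a request to the primary node of its query component."""
    parent = "/" + request["component"]  # hypothetical routing field
    return tree.create_sequential(parent, "queue-", request)


tree = InMemoryTree(["/Impala", "/Kylin"])
a = enqueue(tree, {"component": "Impala", "task": "T1", "table": "t_a", "where": "x > 1"})
b = enqueue(tree, {"component": "Impala", "task": "T2", "table": "t_b", "where": "y = 2"})
c = enqueue(tree, {"component": "Kylin", "task": "T3", "table": "t_c", "where": "z < 3"})
print(a, b, c)
```

Keeping a separate counter per primary node is what makes the numbering under "/Impala" independent of the numbering under "/Kylin".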
S202: A monitoring program monitors changes to each dynamic queue in real time and, according to those changes, calls the query component corresponding to the changed dynamic queue to download data from the corresponding data source.
Each dynamic queue is monitored in real time for changes; when a dynamic queue changes, the changed queue is identified and the query component corresponding to it is selected to download the data.
In an embodiment, step S202 specifically includes:
B1: Set m monitoring programs, each of which monitors, in real time, changes to the secondary nodes in one dynamic queue. The m monitoring programs correspond to the m dynamic queues one to one.
B2: When a secondary node changes, acquire information on all secondary nodes in the dynamic queue containing the changed secondary node, according to the notification sent by Zookeeper.
The m monitoring programs monitor changes to the secondary nodes in their dynamic queues. If a secondary node in a dynamic queue changes, then, following the notification sent by Zookeeper, information on all secondary nodes in that dynamic queue is obtained through the getChildren() method.
B3: Acquire the secondary node with the smallest sequence number; according to the download request, encapsulate the data information stored in that secondary node into a data-volume query statement and run it through the query component corresponding to the primary node; then delete the secondary node with the smallest sequence number.
The secondary node with the smallest sequence number is selected. Because the data information of the download request is stored in the secondary node, the data table name and query conditions stored in that node are obtained through the getData method and encapsulated into a data-volume query statement; the Hadoop query component corresponding to the primary node of the dynamic queue is selected to run the query; and the secondary node is deleted from the directory structure. For example, when the client-side monitoring program for Impala (the Impala watch) finds that the secondary nodes under the "/Impala" node have changed, it reads the information in the secondary node with the smallest sequence number, "/Impala/queue-XXXXXXXXXX" (where XXXXXXXXXX is initially 0000000001), queries the Impala component, and then deletes the secondary node "/Impala/queue-0000000001" from the directory structure. The download requests corresponding to the secondary nodes are thus processed in the order in which they were submitted.
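Step B3 can be sketched as follows, using a plain dictionary in place of the children of a primary node. The helper names `sequence_of`, `take_head`, and `count_statement`, and the `SELECT COUNT(*)` form of the data-volume query, are assumptions made for illustration; the text does not give the exact statement.

```python
def sequence_of(name):
    """Extract the numeric suffix from a name such as 'queue-0000000002'."""
    return int(name.rsplit("-", 1)[1])


def take_head(children):
    """Remove and return (name, data) for the child with the smallest sequence number."""
    head = min(children, key=sequence_of)
    return head, children.pop(head)


def count_statement(data):
    """Build a row-count query from the stored table name and condition."""
    return f"SELECT COUNT(*) FROM {data['table']} WHERE {data['where']}"


# Children of a hypothetical "/Impala" primary node, deliberately unordered,
# because a children listing should not be assumed to arrive sorted.
impala_children = {
    "queue-0000000003": {"table": "t_c", "where": "z < 3"},
    "queue-0000000001": {"table": "t_a", "where": "x > 1"},
    "queue-0000000002": {"table": "t_b", "where": "y = 2"},
}

name, data = take_head(impala_children)
sql = count_statement(data)
print(name, sql)
```

Selecting the head by numeric suffix rather than by listing order keeps requests in submission order. A real implementation would also validate the table name and condition before building the statement.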
A download request is rejected if the queried data volume exceeds 1,048,576 rows (the upper limit of an Excel worksheet). If the queried data volume is less than 1,048,576 rows, the data table name and query conditions are encapsulated into a query statement and the query component is queried again for the data itself. The query result is then written to an Excel file, which is compressed and stored in a directory on the WAS server under the name "[data download task number].zip" for the user to download.
A threshold is set on the concurrency of each query component. When the concurrency of a query component exceeds the preset threshold, the monitoring program is put into a waiting state until the concurrency falls below the threshold, at which point the monitoring program is woken to resume monitoring changes to the secondary nodes in the dynamic queue. Different preset thresholds may be set for different query components, the same preset threshold may be set for every query component, or a single overall threshold may be set across all query components. This keeps the query components running normally and avoids conditions such as excessive CPU and memory utilization.
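The size check and packaging step might look like the sketch below. To stay within the Python standard library it writes CSV where the text describes an Excel file, and it returns the archive bytes instead of storing them on a server. The `package_result` helper is hypothetical, and treating a count of exactly 1,048,576 rows as acceptable is an assumption, since the text does not address that case.

```python
import csv
import io
import zipfile

EXCEL_MAX_ROWS = 1_048_576  # row limit cited in the text


def package_result(task_number, row_count, rows):
    """Return (file_name, zip_bytes), or None when the request is rejected."""
    if row_count > EXCEL_MAX_ROWS:
        return None  # too many rows for one worksheet: reject the request
    text = io.StringIO()
    csv.writer(text).writerows(rows)  # CSV stands in for the Excel file here
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{task_number}.csv", text.getvalue())
    # The archive is named after the data download task number.
    return f"{task_number}.zip", archive.getvalue()


rejected = package_result("T0", EXCEL_MAX_ROWS + 1, [])
file_name, payload = package_result("T1", 2, [["id", "name"], [1, "a"]])
print(rejected, file_name, len(payload) > 0)
```

Running the count query first means an oversized request is rejected before any result rows are fetched.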
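One way to realize the waiting state is a condition variable that blocks a monitoring program while a query component is at its limit. The sketch below is an assumption-laden illustration: the `ConcurrencyGate` class is hypothetical, the threshold is read as the maximum number of queries allowed to run at once, and threads stand in for the monitoring programs.

```python
import threading


class ConcurrencyGate:
    """Blocks callers while the number of running queries is at the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.running = 0
        self.cond = threading.Condition()

    def acquire(self):
        with self.cond:
            # Waiting state: remain here until a running query finishes.
            while self.running >= self.threshold:
                self.cond.wait()
            self.running += 1

    def release(self):
        with self.cond:
            self.running -= 1
            self.cond.notify()  # wake one waiting monitoring program


gate = ConcurrencyGate(threshold=2)
observed = []  # concurrency level seen by each worker while it held a slot
observed_lock = threading.Lock()


def run_query(_):
    gate.acquire()
    try:
        with observed_lock:
            observed.append(gate.running)
    finally:
        gate.release()


threads = [threading.Thread(target=run_query, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max(observed))
```

Using one gate per query component gives per-component thresholds; sharing a single gate across components would give the overall threshold the text also mentions.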
S203: and acquiring the state of the data downloading task and the downloaded data file.
When a user asks to track a download request that the user submitted, the current execution state of the download task is displayed if the task has not finished; if the task has finished, the user can download the result.
A statistics and summary function is provided so that a system administrator can conveniently analyze the download time of each task and tune the preset thresholds of the query components.
In this embodiment, a plurality of dynamic queues are established in Zookeeper for the data of different data sources, changes to each dynamic queue are monitored, and the corresponding query component is called to download data from the corresponding data source according to those changes, thereby placing data downloading for every Hadoop component under queue control.
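The tracking behavior can be illustrated with a small state lookup. The state names in `TaskState` and the layout of the `tasks` record are hypothetical; the text only distinguishes finished from unfinished tasks.

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    REJECTED = "rejected"


def track(tasks, task_number):
    """Return the current state, plus the result file once it is available."""
    task = tasks[task_number]
    if task["state"] is TaskState.FINISHED:
        return task["state"], task["file"]
    return task["state"], None


# Hypothetical task records keyed by data download task number.
tasks = {
    "T1": {"state": TaskState.RUNNING, "file": None},
    "T2": {"state": TaskState.FINISHED, "file": "T2.zip"},
}
print(track(tasks, "T1"), track(tasks, "T2"))
```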
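A summary of this kind could be computed as in the sketch below, which groups recorded download times by query component. The `download_log` data and the chosen metrics are invented for illustration and are not taken from the text.

```python
from statistics import mean

# Hypothetical log of finished tasks: (query component, download seconds).
download_log = [("Impala", 12.0), ("Impala", 18.0), ("Kylin", 4.0)]


def summarize(log):
    """Group download times by component and compute simple statistics."""
    by_component = {}
    for component, seconds in log:
        by_component.setdefault(component, []).append(seconds)
    return {
        component: {
            "tasks": len(times),
            "mean_seconds": mean(times),
            "max_seconds": max(times),
        }
        for component, times in by_component.items()
    }


summary = summarize(download_log)
print(summary)
```

An administrator could compare figures like these across components when deciding whether a concurrency threshold is too high or too low.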
An embodiment of the invention provides a Zookeeper-based asynchronous downloading device, which comprises the following units:
A creating unit 401, configured to create m dynamic queues in Zookeeper according to m Hadoop data sources and to control the data downloading tasks of the m dynamic queues in parallel, where m is the number of components supporting a query function in a Hadoop cluster and m is a positive integer greater than or equal to 1; each dynamic queue corresponds to one data source.
A monitoring unit 402, configured to monitor changes to the m dynamic queues in real time through a monitoring program and, according to those changes, to call the query component corresponding to each changed dynamic queue to download data from the corresponding data source.
An acquisition unit 403, configured to acquire the state of the data downloading task and the downloaded data file.
The creating unit 401 is specifically configured to acquire the m dynamic queues, where each dynamic queue comprises one primary node, and each primary node corresponds to one query component;
when a download request is received, determine the query component corresponding to the download request;
determine the primary node corresponding to the download request according to the query component corresponding to the download request;
under the primary node corresponding to the download request, create secondary nodes in the order in which download requests are received; the secondary nodes correspond to the download requests one to one.
When a plurality of secondary nodes exist under any one primary node, the sequence numbers of those secondary nodes increase in the order in which the secondary nodes were created.
The monitoring unit 402 is specifically configured to set m monitoring programs, where each monitoring program monitors, in real time, changes to the secondary nodes in one dynamic queue;
when a secondary node changes, acquire information on all secondary nodes in the dynamic queue containing the changed secondary node, according to a notification sent by Zookeeper;
acquire the secondary node with the smallest sequence number; according to the download request, encapsulate the data information stored in that secondary node into a data-volume query statement and run it through the query component corresponding to the primary node; and delete the secondary node with the smallest sequence number.
When the concurrency of the query component exceeds a preset threshold, the monitoring program is set to a waiting state until the concurrency of the query component falls below the preset threshold, and monitoring of the m dynamic queues then resumes.
With this device, a plurality of dynamic queues are established in Zookeeper for the data of different data sources, changes to each dynamic queue are monitored, and the corresponding query component is called to download data from the corresponding data source according to those changes, thereby placing data downloading for every Hadoop component under queue control.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A Zookeeper-based asynchronous downloading method is characterized by comprising the following steps:
creating m dynamic queues in Zookeeper according to m Hadoop data sources, and controlling the data downloading tasks of the m dynamic queues in parallel, wherein m is the number of components supporting a query function in a Hadoop cluster, m is a positive integer greater than or equal to 1, and each dynamic queue corresponds to one data source;
monitoring, by a monitoring program, changes to the m dynamic queues in real time and, according to those changes, calling the query component corresponding to each changed dynamic queue to download data from the corresponding data source;
and acquiring the state of the data downloading task and the downloaded data file.
2. The method of claim 1, wherein creating m dynamic queues in a Zookeeper comprises:
acquiring the m dynamic queues, wherein each dynamic queue comprises a primary node; the primary node corresponds to a query component;
when a downloading request is received, determining a query component corresponding to the downloading request;
determining a primary node corresponding to the downloading request according to the query component corresponding to the downloading request;
under the primary node corresponding to the downloading request, sequentially creating secondary nodes according to the sequence of receiving the downloading request; the secondary nodes correspond to the download requests one to one.
3. The method according to claim 2, wherein said creating, under the primary node corresponding to the download request, secondary nodes in order according to the order in which the download request is received further comprises:
when a plurality of secondary nodes exist under any one primary node, the serial numbers of the plurality of secondary nodes are sequentially increased according to the sequence of creating the secondary nodes.
4. The method according to claim 2, wherein monitoring the changes in the m dynamic queues in real time by the monitoring program, and calling the query component corresponding to the changed dynamic queue to download data according to those changes, comprises:
setting up m monitoring programs, each of which monitors in real time changes to the secondary nodes of one dynamic queue;
when a secondary node changes, acquiring, according to a notification sent by Zookeeper, information on all secondary nodes in the dynamic queue in which the changed secondary node is located;
acquiring the secondary node with the minimum serial number; according to the download request, packaging the data information stored in the secondary node with the minimum serial number into a data quantity query statement, and querying through the query component corresponding to the primary node; and deleting the secondary node with the minimum serial number.
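The handling step in claim 4 (list the secondary nodes, take the one with the minimum serial number, build a query, delete the node) can be sketched as a single function. This is a hedged illustration only: the node-name format `req-<number>`, the `table` and `condition` fields, the `COUNT(*)` form of the data quantity query, and the `run_query` callback are all assumptions, since the claims do not specify them.

```python
def handle_change(children, run_query):
    """Process the head of one dynamic queue after a change notification.

    children:  dict mapping secondary-node name -> stored request info
    run_query: callable standing in for the query component
    """
    if not children:
        return None  # nothing queued, e.g. the change was a deletion
    # Pick the secondary node with the minimum serial number.
    head = min(children, key=lambda name: int(name.rsplit("-", 1)[1]))
    info = children[head]
    # Package the stored information into a data quantity query (assumed form).
    statement = f"SELECT COUNT(*) FROM {info['table']} WHERE {info['condition']}"
    result = run_query(statement)
    # Remove the processed node so the next request becomes the head.
    del children[head]
    return head, statement, result
```

Because the head is always the node with the smallest serial number, requests within one queue are served first in, first out.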
5. The method according to claim 4, wherein monitoring the changes in the m dynamic queues in real time by the monitoring program, and calling the query component corresponding to the changed dynamic queue to download data according to those changes, further comprises:
when the concurrency of the query component exceeds a preset threshold, placing the monitoring program in a waiting state until the concurrency of the query component falls below the preset threshold, and then resuming monitoring of the changes in the m dynamic queues.
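The waiting behaviour in claim 5 resembles a counting gate in front of each query component. The sketch below is one possible reading rather than the claimed mechanism: the `ConcurrencyGate` class is hypothetical, and it blocks as soon as the number of in-flight queries reaches the threshold, so that the threshold is never exceeded.

```python
import threading


class ConcurrencyGate:
    """Blocks callers while a query component is at its concurrency limit."""

    def __init__(self, threshold):
        self.threshold = threshold  # preset threshold for this component
        self.active = 0             # queries currently in flight
        self._cond = threading.Condition()

    def acquire(self):
        # Called by a monitoring program before it dispatches a query.
        with self._cond:
            while self.active >= self.threshold:
                self._cond.wait()   # waiting state
            self.active += 1

    def release(self):
        # Called when a query finishes; wakes one waiting monitoring program.
        with self._cond:
            self.active -= 1
            self._cond.notify()
```

A monitoring program would call `acquire()` before invoking the query component and `release()` once the download completes, typically inside a `try`/`finally` block.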
6. A Zookeeper-based asynchronous downloading device, the device comprising:
a creating unit, configured to create m dynamic queues in Zookeeper according to m Hadoop data sources, wherein m is the number of components supporting a query function in a Hadoop cluster and is a positive integer greater than or equal to 1, the data downloading tasks of the m dynamic queues are controlled in parallel, and each dynamic queue corresponds to one data source;
a monitoring unit, configured to monitor changes in the m dynamic queues in real time through a monitoring program and, according to those changes, call the query component corresponding to each changed dynamic queue to download data from the corresponding data source;
and the acquisition unit is used for acquiring the data downloading task state and the downloaded data file.
7. The apparatus according to claim 6, wherein the creating unit is specifically configured to:
acquiring the m dynamic queues, wherein each dynamic queue comprises a primary node; the primary node corresponds to a query component;
when a downloading request is received, determining a query component corresponding to the downloading request;
determining a primary node corresponding to the downloading request according to the query component corresponding to the downloading request;
under the primary node corresponding to the download request, creating secondary nodes in the order in which download requests are received, the secondary nodes corresponding one-to-one to the download requests.
8. The apparatus according to claim 7, wherein the creating unit is specifically configured to:
when a plurality of secondary nodes exist under any one primary node, the serial numbers of the plurality of secondary nodes are sequentially increased according to the sequence of creating the secondary nodes.
9. The apparatus of claim 7, wherein the monitoring unit is specifically configured to:
setting up m monitoring programs, each of which monitors in real time changes to the secondary nodes of one dynamic queue;
when a secondary node changes, acquiring, according to a notification sent by Zookeeper, information on all secondary nodes in the dynamic queue in which the changed secondary node is located;
acquiring the secondary node with the minimum serial number; according to the download request, packaging the data information stored in the secondary node with the minimum serial number into a data quantity query statement, and querying through the query component corresponding to the primary node; and deleting the secondary node with the minimum serial number.
10. The apparatus of claim 9, wherein the monitoring unit is further configured to:
when the concurrency of the query component exceeds a preset threshold, placing the monitoring program in a waiting state until the concurrency of the query component falls below the preset threshold, and then resuming monitoring of the changes in the m dynamic queues.
CN202210515994.5A 2022-05-12 2022-05-12 Zookeeper-based asynchronous downloading method and device Pending CN115051981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210515994.5A CN115051981A (en) 2022-05-12 2022-05-12 Zookeeper-based asynchronous downloading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210515994.5A CN115051981A (en) 2022-05-12 2022-05-12 Zookeeper-based asynchronous downloading method and device

Publications (1)

Publication Number Publication Date
CN115051981A true CN115051981A (en) 2022-09-13

Family

ID=83157612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210515994.5A Pending CN115051981A (en) 2022-05-12 2022-05-12 Zookeeper-based asynchronous downloading method and device

Country Status (1)

Country Link
CN (1) CN115051981A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
CN104065741A (en) * 2014-07-04 2014-09-24 用友软件股份有限公司 Data collection system and method
CN106294472A (en) * 2015-06-03 2017-01-04 中国移动通信集团广东有限公司 Query method and device for the Hadoop database HBase
CN110673933A (en) * 2019-08-15 2020-01-10 平安普惠企业管理有限公司 ZooKeeper-based distributed asynchronous queue implementation method, device, equipment and medium
CN112199334A (en) * 2020-10-23 2021-01-08 东北大学 Method and device for storing data stream processing check point file based on message queue

Similar Documents

Publication Publication Date Title
US7546335B2 (en) System and method for a data protocol layer and the transfer of data objects using the data protocol layer
US7970823B2 (en) System for sharing data objects among applications
TWI549080B (en) The method, system and device for sending information of category information
US10338958B1 (en) Stream adapter for batch-oriented processing frameworks
US10235384B2 (en) Servicing database operations using a messaging server
CN111143382B (en) Data processing method, system and computer readable storage medium
CN108196787B (en) Quota management method of cluster storage system and cluster storage system
CN109032803B (en) Data processing method and device and client
US10216556B2 (en) Master database synchronization for multiple applications
US11836132B2 (en) Managing persistent database result sets
US10635650B1 (en) Auto-partitioning secondary index for database tables
WO2023231339A1 (en) Transaction execution method and node in blockchain system, and blockchain system
CN115185705A (en) Message notification method, device, medium and equipment
CN113761052A (en) Database synchronization method and device
CN110929126A (en) Distributed crawler scheduling method based on remote procedure call
CN111444148A (en) Data transmission method and device based on MapReduce
CN111427920A (en) Data acquisition method, device, system, computer equipment and storage medium
CN112395337A (en) Data export method and device
CN115051981A (en) Zookeeper-based asynchronous downloading method and device
CN116108036A (en) Method and device for off-line exporting back-end system data
CN111159142A (en) Data processing method and device
US10114864B1 (en) List element query support and processing
US20180032553A1 (en) Augmenting database schema using information from multiple sources
CN114063931A (en) Data storage method based on big data
CN110650033B (en) Distributed application configuration management method and distributed computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination