CN111722980A - Data acquisition system and method - Google Patents



Publication number
CN111722980A
Authority
CN
China
Prior art keywords
node
telegraf
leader
leader node
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010529803.1A
Other languages
Chinese (zh)
Other versions
CN111722980B (en)
Inventor
徐晶
李琳
张晓颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd
Priority to CN202010529803.1A
Publication of CN111722980A
Application granted
Publication of CN111722980B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes

Abstract

An embodiment of the invention provides a data acquisition system and method. The system comprises: a plurality of servers, each provided with a Telegraf node that starts a corresponding Telegraf subprocess to acquire data; a ZooKeeper distributed framework module comprising a plurality of temporary fields registered by the Telegraf nodes on the module; and a Redis database comprising an election event blocking queue that receives the trigger messages of election events among the Telegraf nodes. Because a Telegraf node is deployed on each of the plurality of servers and a new Leader node is elected to acquire data when the current Leader node becomes abnormal, the loss of real-time monitoring data is avoided.

Description

Data acquisition system and method
Technical Field
The invention relates to the field of big data, in particular to a data acquisition system and a data acquisition method.
Background
Telegraf is a tool for collecting, in real time, performance metrics of server-side infrastructure and middleware. In the prior art, a single Telegraf process is generally deployed on one server to monitor the various infrastructure components and middleware distributed across multiple servers.
If the server hosting the Telegraf process goes down or suffers a network fault, the Telegraf process can no longer acquire the relevant data in real time. Data that is not acquired in real time is permanently lost, which affects subsequent data analysis.
Disclosure of Invention
In view of at least one of the above technical problems in the prior art, embodiments of the present invention provide a data acquisition system and method.
In a first aspect, an embodiment of the present invention provides a data acquisition system, including a plurality of servers, a ZooKeeper distributed framework module, and a Redis database, where:
each of the plurality of servers is deployed with a Telegraf node; among the Telegraf nodes in the data acquisition system, exactly one serves as the Leader node, and the Leader node is used for starting a Telegraf subprocess to acquire data;
each Telegraf node comprises an event monitoring node and a message acquisition node; the event monitoring node is used for monitoring whether the trigger condition of a Leader node election event is satisfied and, when it is satisfied, sending a message triggering the Leader node election event to the election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
the message acquisition node is used for, if it obtains the message triggering the Leader node election event from the election event blocking queue, acting as the elected Leader node and starting a corresponding Telegraf subprocess to acquire data;
the ZooKeeper distributed framework module comprises temporary fields registered by the Telegraf nodes on the ZooKeeper distributed framework module; the temporary fields serve as the basis on which the event monitoring node determines whether the trigger condition of the Leader node election event is satisfied;
the Redis database comprises the election event blocking queue, which is used for storing messages that trigger the Leader node election event.
Optionally, the Redis database further includes a distributed lock, used to determine which Telegraf node sends the trigger message of the election event to the election event blocking queue.
Optionally, the ZooKeeper distributed framework module further includes a Leader field, used to store identification information of the server where the Leader node is located.
In a second aspect, an embodiment of the present invention provides a data acquisition method applied to the data acquisition system in the first aspect, including:
monitoring whether the trigger condition of a Leader node election event is satisfied and, when it is satisfied, sending a message triggering the Leader node election event to the election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
and, if the message triggering the Leader node election event is obtained from the election event blocking queue, acting as the elected node that serves as the Leader node and starting a corresponding Telegraf subprocess to acquire data.
Optionally, the trigger condition of the Leader node election event is specifically:
the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored by the Leader node in the temporary field registered on the ZooKeeper distributed framework module exceeds a preset threshold.
Optionally, sending the message triggering the Leader node election event to the election event blocking queue includes:
a plurality of Telegraf nodes contend for the distributed lock of the Redis database;
and the Telegraf node that wins the distributed lock sends a message triggering the Leader node election event to the election event blocking queue of the Redis database.
Optionally, after the elected node serving as the Leader node starts the corresponding Telegraf subprocess to acquire data, the method further includes:
if the Telegraf subprocess fails to start normally, triggering the Leader node election event again.
Optionally, the method further comprises:
and the ZooKeeper distributed framework module sends an event notice for closing a Telegraf subprocess corresponding to the Leader node.
Optionally, the method further comprises:
and determining one Telegraf node in a plurality of Telegraf nodes as a Leader node for data acquisition by setting a Leader field in the ZooKeeper distributed framework module.
Optionally, determining one Telegraf node among the plurality of Telegraf nodes as the Leader node for data acquisition by setting the Leader field in the ZooKeeper distributed framework module includes:
setting the Leader field in the ZooKeeper distributed framework module to identification information of the server where one Telegraf node among the plurality of Telegraf nodes is located;
and each Telegraf node reads the Leader field in the ZooKeeper distributed framework module and, if the Leader field holds the identification information of the server where that Telegraf node is located, starts its corresponding Telegraf subprocess to acquire data.
According to the data acquisition system provided by the embodiment of the invention, the Telegraf nodes are respectively arranged on the plurality of servers, and a new Leader node is selected for data acquisition when the current Leader node is abnormal, so that the loss of real-time monitoring data is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention. As shown in FIG. 1, the system includes:
a plurality of servers 110, each provided with a Telegraf node 111; among the Telegraf nodes 111 in the data acquisition system, exactly one serves as the Leader node, and the Leader node is used for starting a Telegraf subprocess 112 to acquire data;
each Telegraf node 111 comprises an event monitoring node and a message acquisition node; the event monitoring node is used for monitoring whether the trigger condition of a Leader node election event is satisfied and, when it is satisfied, sending a message triggering the Leader node election event to the election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes 111 in the data acquisition system to elect one Telegraf node as the Leader node;
the message acquisition node is configured to, if it obtains the message triggering the Leader node election event from the election event blocking queue, act as the elected node serving as the Leader node and start a corresponding Telegraf subprocess 112 to acquire data;
a ZooKeeper distributed framework module 120, including a plurality of temporary fields 121 registered by the Telegraf nodes on the ZooKeeper distributed framework module; the temporary fields serve as the basis on which the event monitoring node determines whether the trigger condition of the Leader node election event is satisfied;
a Redis database 130, including an election event blocking queue 131 configured to store messages that trigger the Leader node election event.
Specifically, the embodiment of the present invention is applied to a multi-server environment in which a Telegraf node 111 is deployed on each server 110, so that a plurality of independent Telegraf nodes 111 exist in the environment. One important function of the Telegraf node 111 is to start the corresponding Telegraf subprocess 112 for data acquisition. More generally, the Telegraf node 111 manages the lifecycle of the corresponding Telegraf subprocess 112, for example by starting and stopping it.
The Telegraf subprocess 112 is a process that monitors performance metrics related to the operating system and server-side middleware. It provides an out-of-the-box set of monitoring data acquisition plug-ins for many kinds of infrastructure and server-side middleware; a user can configure the required plug-ins and adjust their parameters as needed before putting them into production use. The Telegraf node 111 in the embodiment of the invention acts as the parent process of the Telegraf subprocess 112. In a specific implementation, the Telegraf node 111 is generally implemented as a lightweight script, for example in Bash or Python.
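As a non-limiting illustration, the parent-process role of the Telegraf node described above may be sketched in Python as follows. The class name `TelegrafSupervisor` and the demo command are assumptions of this sketch and do not appear in the embodiment; the demo command merely sleeps so that the sketch runs without Telegraf installed.

```python
import subprocess
import sys
from typing import List, Optional


class TelegrafSupervisor:
    """Hypothetical parent-process wrapper that starts and stops one child process."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.child: Optional[subprocess.Popen] = None

    def is_running(self) -> bool:
        # poll() returns None while the child process is still alive.
        return self.child is not None and self.child.poll() is None

    def start(self) -> None:
        # Start the child only if it is not already running.
        if not self.is_running():
            self.child = subprocess.Popen(self.command)

    def stop(self, timeout: float = 5.0) -> None:
        if self.child is None:
            return
        if self.child.poll() is None:
            self.child.terminate()  # ask the child to exit
            try:
                self.child.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.child.kill()  # force termination if it does not exit in time
                self.child.wait()
        self.child = None


# Stand-in command (assumption): a child that only sleeps. A real deployment
# would pass the command line that launches Telegraf with its configuration.
DEMO_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]
```

A node elected as Leader would call `start()`, and a node that loses the Leader role would call `stop()`.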
Specifically, the Telegraf node 111 also determines whether the trigger condition of an election event among the plurality of Telegraf nodes 111 is satisfied, and listens for the trigger message of the election event in the Redis database 130. An election event among the plurality of Telegraf nodes 111 is needed because, in the embodiment of the present invention, only one Telegraf node among them starts a Telegraf subprocess and is responsible for data acquisition. When the Leader node runs abnormally, a new Telegraf node must be elected from the plurality of Telegraf nodes 111 to become the sole Leader node and take over the data acquisition task from the original Leader node.
Further, when the Leader node runs abnormally, the Telegraf node 111 can determine whether the trigger condition of the election event is satisfied. When it is satisfied, a trigger message of the election event appears in the Redis database 130. Because each Telegraf node 111 listens for this trigger message in the Redis database 130 in real time, it learns of the election event promptly and can take part in it.
Specifically, the Redis database 130 in the embodiment of the present invention is used as a lightweight message queue. Because it works in memory, it performs well when the volume of communication data is small, which makes it suitable as the container for election event notification and monitoring in the embodiment of the present invention. The Redis database 130 specifically includes an election event blocking queue 131, which receives the trigger messages of election events among the plurality of Telegraf nodes; that is, a Telegraf node sends the trigger message of an election event to the election event blocking queue 131.
To ensure the uniqueness of the trigger message, only one trigger message is needed when an election event is triggered. Therefore, to prevent multiple Telegraf nodes from sending a trigger message of the election event to the election event blocking queue at the same time, the Redis database 130 further includes a distributed lock 132. The Telegraf nodes must first contend for the distributed lock, and only the Telegraf node that obtains it can send the trigger message to the election event blocking queue, which guarantees the uniqueness of the trigger message.
The data acquisition system in the embodiment of the present invention further includes a ZooKeeper distributed framework module 120. As a distributed coordination service, the ZooKeeper distributed framework module 120 serves as the container for state synchronization in the system, and implements this function through the following fields:
an initialization field, used to identify whether the system has completed the initialization process;
a Leader field, used to store identification information of the server where the Leader node is located, such as the host name or IP address of the server;
temporary fields, that is, the plurality of temporary fields registered by the Telegraf nodes on the ZooKeeper distributed framework module. If a Telegraf node is the Leader node, its temporary field stores the last reporting time of the Leader node; if a Telegraf node is not the Leader node, its temporary field stores the health state of that node: the value true indicates that the node is currently running normally and is eligible to be elected as the Leader node, and the value is false otherwise.
Further, the last reporting time of the Leader node in the embodiment of the present invention is the most recent time at which the Leader node normally acquired and reported data; the Leader node updates it at a preset frequency while acquiring data. It therefore represents the health state of the Leader node.
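For illustration only, the three kinds of fields described above may be modelled in memory as follows. This is a sketch rather than a ZooKeeper client: the class `CoordinationState`, its method names, and the choice to register non-Leader nodes with an initial health of false are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class CoordinationState:
    """In-memory stand-in (assumption) for the fields kept in ZooKeeper."""

    initialized: bool = False     # initialization field
    leader: Optional[str] = None  # Leader field: host name or IP address
    # Temporary fields keyed by server identifier. The Leader's entry holds its
    # last reporting time (epoch seconds); every other entry holds a health flag.
    temporary: Dict[str, Union[float, bool]] = field(default_factory=dict)

    def register(self, host: str) -> None:
        # Assumption: a freshly registered node is not yet known to be healthy.
        self.temporary.setdefault(host, False)

    def report(self, host: str, now: float) -> None:
        # Only the Leader stores a reporting time in its temporary field.
        if host != self.leader:
            raise ValueError("only the Leader node stores a reporting time")
        self.temporary[host] = now

    def eligible_candidates(self) -> List[str]:
        # Non-Leader nodes whose health flag is exactly True may be elected.
        return sorted(
            host
            for host, value in self.temporary.items()
            if host != self.leader and value is True
        )
```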
According to the data acquisition system provided by the embodiment of the invention, the Telegraf nodes are respectively arranged on the plurality of servers, and a new Leader node is selected for data acquisition when the current Leader node is abnormal, so that the loss of real-time monitoring data is avoided.
On the basis of the foregoing embodiment, FIG. 2 is a flowchart of a data acquisition method provided in an embodiment of the present invention. The method is applied to the data acquisition system provided in the foregoing embodiment and, as shown in FIG. 2, includes:
S201, monitoring whether the trigger condition of a Leader node election event is satisfied and, when it is satisfied, sending a message triggering the Leader node election event to the election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
Specifically, the embodiment of the present invention is applied in the data acquisition system described above, in which the Telegraf node deployed on one of the servers, namely the Leader node, performs the data acquisition task through its corresponding Telegraf subprocess.
To prevent data acquisition from failing because of server downtime, network failure and similar problems, the embodiment of the invention needs to elect a new Leader node for data acquisition. First, each candidate node must determine whether the trigger condition of an election event among the plurality of Telegraf nodes is satisfied. The candidate nodes are the Telegraf nodes other than the Leader node. The Leader node, like the candidate nodes, is also able to judge the trigger condition of the election event, but the new Leader node elected by the election event usually comes from the candidate nodes.
Further, the trigger condition of the election event in the embodiment of the present invention covers at least two cases: the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored by the Leader node in the temporary field registered on the ZooKeeper distributed framework module exceeds a preset threshold. If either case holds, the trigger condition of the election event is reached and a new Leader node needs to be elected.
Specifically, for the first case, every Telegraf node in the embodiment of the present invention, including the Leader node, registers a corresponding temporary field in the ZooKeeper distributed framework module. If the temporary field corresponding to the Leader node changes from existing to absent, the candidate nodes receive a notification event from the ZooKeeper distributed framework module through ZooKeeper's event notification mechanism, which indicates that the Leader node is no longer present in the system. Possible reasons include the server hosting the Leader node going down, or a network failure disconnecting the Leader node from the ZooKeeper distributed framework module.
Specifically, for the second case, the Leader node stores the last reporting time in its temporary field in the ZooKeeper distributed framework module and updates it at the preset frequency as the data acquisition task proceeds. Meanwhile, each candidate node starts a timed checking thread that reads the last reporting time from the ZooKeeper distributed framework module. If the difference between the current time and the last reporting time exceeds the preset threshold, this indicates that reporting by the Telegraf subprocess managed by the Leader node is suffering from problems such as network congestion or packet loss, which is also a situation in which an election event needs to be triggered.
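The two cases of the trigger condition can be expressed as a single predicate. The following is a minimal sketch, assuming the temporary fields are available as a plain dictionary mapping a server identifier to the last reporting time in seconds; the function name `election_triggered` and its parameters are hypothetical.

```python
from typing import Dict


def election_triggered(
    temporary: Dict[str, float], leader: str, now: float, threshold: float
) -> bool:
    """Return True when either case of the trigger condition holds."""
    last_report = temporary.get(leader)
    if last_report is None:
        # Case 1: the Leader's temporary field does not exist.
        return True
    # Case 2: the Leader has not reported within the preset threshold.
    return (now - last_report) > threshold
```

For example, with a threshold of 30 seconds, a Leader that last reported 10 seconds ago does not trigger an election, while one that last reported 40 seconds ago does.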
Further, after the candidate node determines that the trigger condition of the election event among the multiple Telegraf nodes is satisfied, it is necessary to send a trigger message of the election event to an election event blocking queue of the Redis database. The election event blocking queue is a message container for acquiring a trigger message of an election event among the multiple Telegraf nodes, that is, in the embodiment of the present invention, the Telegraf node sends the trigger message of the election event to the election event blocking queue, and a sender may specifically be a Leader node or a candidate node.
To ensure the uniqueness of the trigger message, only one trigger message for the election event is needed when triggering the election event. Therefore, in order to prevent multiple Telegraf nodes from sending the trigger message of the election event to the election event blocking queue at the same time, the Redis database also contains a distributed lock, the multiple Telegraf nodes need to preempt the distributed lock first, and the Telegraf nodes who preempt the distributed lock can send the trigger message to the election event blocking queue, so that the uniqueness of the trigger message is realized.
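As an illustrative sketch of how the distributed lock keeps the trigger message unique, the following uses a small in-memory stand-in instead of a real Redis client. The class `FakeRedis`, the key name `election-lock`, and the message `elect` are assumptions; `set_if_absent` imitates the set-only-if-the-key-does-not-exist semantics of Redis `SET` with the `NX` option.

```python
import threading
from collections import deque
from typing import Dict, Iterable, Tuple


class FakeRedis:
    """Tiny in-memory stand-in (assumption); not a real Redis client."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._keys: Dict[str, str] = {}
        self.queue: deque = deque()

    def set_if_absent(self, key: str, value: str) -> bool:
        # Succeeds for exactly one caller per key, like SET ... NX.
        with self._guard:
            if key in self._keys:
                return False
            self._keys[key] = value
            return True

    def push(self, message: str) -> None:
        with self._guard:
            self.queue.append(message)


def try_trigger_election(store: FakeRedis, node_id: str) -> bool:
    # Only the node that wins the lock sends the trigger message.
    if store.set_if_absent("election-lock", node_id):
        store.push("elect")
        return True
    return False


def simulate(node_ids: Iterable[str]) -> Tuple[FakeRedis, Dict[str, bool]]:
    """Let several nodes contend concurrently and record who sent the message."""
    store = FakeRedis()
    results: Dict[str, bool] = {}

    def worker(node_id: str) -> None:
        results[node_id] = try_trigger_election(store, node_id)

    threads = [threading.Thread(target=worker, args=(n,)) for n in node_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, results
```

The sketch never releases the lock; a practical implementation would presumably give the lock an expiry time so that a later election can be triggered.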
S202, if the message triggering the Leader node election event is obtained from the election event blocking queue, acting as the elected node serving as the Leader node and starting a corresponding Telegraf subprocess to acquire data.
Specifically, after one Telegraf node sends the trigger message of an election event to the election event blocking queue, only one of the plurality of Telegraf nodes can obtain the trigger message, because the way the queue is monitored is exclusive. That is, once one Telegraf node has obtained the trigger message, the other nodes can no longer obtain it; only that node wins the vote in the election event and becomes the elected node, namely the new Leader node.
Specifically, the elected node, as the new Leader node, replaces the original Leader node and continues the data acquisition. Concretely, the elected node first starts its corresponding Telegraf subprocess, and that Telegraf subprocess then acquires the data.
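The exclusivity described above can be demonstrated with a small simulation. In this sketch Python's `queue.Queue` stands in for the election event blocking queue in Redis, and each thread plays the consumer thread of one Telegraf node; the function name `run_election` and the timeout value are assumptions.

```python
import queue
import threading
from typing import Iterable, List


def run_election(node_ids: Iterable[str], wait_seconds: float = 0.5) -> List[str]:
    """Return the nodes that obtained the single trigger message."""
    election_queue: "queue.Queue[str]" = queue.Queue()
    winners: List[str] = []
    winners_guard = threading.Lock()

    def consumer(node_id: str) -> None:
        try:
            # Blocking pop: each message is delivered to exactly one consumer.
            election_queue.get(timeout=wait_seconds)
        except queue.Empty:
            return  # another node obtained the message
        with winners_guard:
            winners.append(node_id)

    threads = [threading.Thread(target=consumer, args=(n,)) for n in node_ids]
    for t in threads:
        t.start()
    election_queue.put("elect")  # the single trigger message
    for t in threads:
        t.join()
    return winners
```

Whichever node appears in the returned list would then start its Telegraf subprocess; which node that is depends on thread scheduling and is not predetermined.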
According to the data acquisition method provided by the embodiment of the invention, the new Leader node is selected for data acquisition when the current Leader node is abnormal, so that the loss of real-time monitoring data is avoided.
On the basis of any of the above embodiments, FIG. 3 is a flowchart of a data acquisition method provided in an embodiment of the present invention. As shown in FIG. 3, the method is the complete flow of the initialization stage and includes:
S301, manual election of an arbitrary node;
Specifically, once the servers, infrastructure and middleware in the whole system environment are ready, network operation and maintenance personnel can designate the Telegraf node on any one server as the Leader node by manually setting the Leader field in the ZooKeeper distributed framework module.
The Leader field stores, in the ZooKeeper distributed framework module, identification information of the server where the Leader node is located, such as the host name or IP address of the server. Each Telegraf node can learn which node is the Leader node from this field.
S302, starting the Telegraf nodes;
Specifically, the initialization field in the ZooKeeper distributed framework module identifies whether the system has completed the initialization process. Therefore, at the beginning of the initialization task, the initialization field needs to be manually set to false to indicate that initialization is not yet complete.
Meanwhile, the Telegraf nodes deployed on the different servers are generally script programs implemented in a lightweight scripting language such as Bash or Python. Therefore, to start the script programs of the plurality of Telegraf nodes at the same time in the initialization stage, the Telegraf nodes can be started in batches with an automated operation and maintenance tool such as Ansible.
S303, registering temporary fields;
Specifically, the ZooKeeper distributed framework module is the container for state synchronization in the system, and the temporary fields are obtained by the Telegraf nodes registering on the ZooKeeper distributed framework module, which realizes state synchronization among the different Telegraf nodes. Therefore, after a Telegraf node starts, it registers with the ZooKeeper distributed framework module using identification information such as the host name or IP address of its server, and thereby obtains its corresponding temporary field.
S304, obtaining Leader information;
Specifically, because the Leader field in the ZooKeeper distributed framework module is set manually by the network operation and maintenance personnel, a Telegraf node does not know in advance whether it is the Leader node designated in the initialization task. Each Telegraf node therefore reads the Leader information from ZooKeeper to find out.
S305, the Leader node starts a Telegraf subprocess;
Specifically, after a Telegraf node reads the Leader information from ZooKeeper, if it finds that it is the Leader node, it starts its corresponding Telegraf subprocess, so that data is acquired by the Telegraf node designated by the network operation and maintenance personnel in the initialization task.
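The check performed in steps S304 and S305 amounts to comparing the Leader field with the node's own identification information. A minimal sketch follows, assuming the Leader field has already been read into a string; the function names `local_identifiers` and `is_designated_leader` are hypothetical.

```python
import socket
from typing import Optional, Set


def local_identifiers() -> Set[str]:
    """Collect the host name and, when it resolves, the IP address of this server."""
    host = socket.gethostname()
    identifiers = {host}
    try:
        identifiers.add(socket.gethostbyname(host))
    except OSError:
        pass  # the host name may not resolve; the host name alone is still usable
    return identifiers


def is_designated_leader(leader_field: Optional[str], own_identifiers: Set[str]) -> bool:
    # A node is the Leader when the Leader field names its own server.
    return leader_field is not None and leader_field in own_identifiers
```

A node for which `is_designated_leader(leader_field, local_identifiers())` is true would go on to start its Telegraf subprocess; all other nodes would only listen for election events.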
S306, monitoring the election event message queue;
Specifically, once the current Leader node has been designated to perform data acquisition, the initialization task needs to prepare for election events that may be started later. Each Telegraf node needs to learn promptly when an election event occurs, so each Telegraf node monitors the trigger message of the election event.
Specifically, each Telegraf node starts a consumer thread that uses the blocking pop commands of Redis (BLPOP/BRPOP) to listen on the election event blocking queue in Redis, which is empty at this point. As soon as a trigger message of an election event appears in the election event blocking queue, it is picked up by a listening Telegraf node.
S307, checking the initial state;
Specifically, the Telegraf subprocess is the process that directly performs data acquisition. To ensure that nothing abnormal happens after it starts, the Leader node, after starting the Telegraf subprocess, scans the log file of the subprocess to judge whether it is running normally.
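One possible form of the log scan is sketched below. The embodiment does not state what the scan looks for; this sketch assumes that error-level lines in the subprocess log carry an `E!` marker, and the function name `subprocess_looks_healthy` and the sample lines in the usage note are made up for illustration.

```python
from typing import Iterable


def subprocess_looks_healthy(log_lines: Iterable[str], error_marker: str = " E! ") -> bool:
    """Return False as soon as any log line carries the error marker (assumption)."""
    return not any(error_marker in line for line in log_lines)
```

For instance, a log consisting only of lines such as `2020-06-10T08:00:00Z I! Starting Telegraf` would be judged healthy, whereas a log that also contains `2020-06-10T08:00:01Z E! [inputs.redis] connection refused` would not.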
S308, updating the state of the temporary field;
specifically, after the Telegraf node registers to the ZooKeeper distributed framework module to obtain the corresponding temporary field, the state of the temporary field needs to be updated. If the Telegraf node is a Leader node, the temporary field registered by the Leader node is used for storing the last receiving and reporting time of the Leader node; if the Telegraf node is not a Leader node, the temporary field registered by the Telegraf node is used for storing the health state of the Telegraf node, when the value is true, the node is indicated to be normally operated at present and has the right of being elected as the Leader node, otherwise, the value is false.
Therefore, in the step of updating the state of the temporary field, the Leader node can update the health state of the corresponding temporary node on the ZooKeeper distributed framework module to true after the initial state check is finished and no error is found; for other nodes, the health state is set as true only by judging that the monitoring of the election event message queue is successful; and setting the corresponding health state as false for the condition that each node can not normally monitor and the Leader node can not normally start the Telegraf subprocess.
S309, abnormal alarm detection and execution.
Specifically, once a preset time (for example, 30 seconds) has elapsed after each Telegraf node starts, the nodes compete for the distributed lock in Redis. The Telegraf node that obtains the distributed lock starts an abnormal alarm monitoring thread, which scans the health value in each temporary field on the ZooKeeper distributed framework module. If any value is false, an alarm is triggered and the system environment problem must be resolved by manual intervention. Otherwise, the initialized field in ZooKeeper is set to true, which marks the initialization of the data acquisition system as complete.
According to the data acquisition method provided by the embodiment of the invention, the Telegraf node responsible for data acquisition is designated through the initialization task, and a state synchronization and message monitoring mechanism is realized among a plurality of Telegraf nodes, so that the data acquisition task can be ensured to normally run, and meanwhile, the loss of real-time monitoring data can be avoided.
On the basis of any of the above embodiments, fig. 4 is a flowchart of a data acquisition method provided in an embodiment of the present invention, and as shown in fig. 4, the method is specifically a complete flow of an election event phase, and includes:
S401, judging whether an election event is triggered or not;
to prevent data acquisition from failing when, for example, the server hosting the original Leader node goes down or suffers a network failure, the embodiment of the present invention selects a new Leader node for data acquisition in such cases. First, each candidate node is controlled to determine whether a trigger condition for an election event among the multiple Telegraf nodes is satisfied. The candidate nodes are the Telegraf nodes other than the Leader node. The Leader node can also judge the trigger condition of the election event, just like the candidate nodes, but the new Leader node chosen by the election event usually comes from the candidate nodes.
Further, the triggering conditions of the election event in the embodiment of the present invention include at least two conditions: the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference value between the last reporting time stored by the Leader node in the temporary field registered on the ZooKeeper distributed framework module and the current time exceeds a preset threshold value. If one of the two conditions is satisfied, which represents that the trigger condition of the election event is reached, a new Leader node needs to be elected.
Specifically, for the first trigger condition, every Telegraf node in the embodiment of the present invention, including the Leader node, registers a corresponding temporary field in the ZooKeeper distributed framework module. If the temporary field corresponding to the Leader node changes from existing to absent, the candidate nodes receive a notification event from the ZooKeeper distributed framework module through the event notification mechanism of ZooKeeper, which indicates that the Leader node no longer exists in the system. Possible reasons include the server hosting the Leader node going down, or a network failure disconnecting the Leader node from the ZooKeeper distributed framework module.
Specifically, for the second trigger condition, the Leader node stores the last reporting time in its temporary field on the ZooKeeper distributed framework module and updates it at a preset frequency as the data acquisition task proceeds. Meanwhile, each candidate node starts a timed check thread that reads the last reporting time from the ZooKeeper distributed framework module. If the difference between the last reporting time and the current time exceeds a preset threshold, this indicates problems such as network congestion or packet loss in the reporting of the Telegraf subprocess managed by the Leader node, and this situation also requires an election event to be triggered.
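The two trigger conditions can be combined into a single check. The following is a minimal sketch under stated assumptions: the function name `election_needed` is hypothetical, times are plain seconds, and `None` is used to represent a Leader temporary field that no longer exists.

```python
from typing import Optional


def election_needed(leader_last_report: Optional[float],
                    now: float,
                    threshold_seconds: float) -> bool:
    """Return True if either trigger condition described in the text holds.

    leader_last_report is None when the Leader's temporary field does not
    exist; otherwise it is the last reporting time stored in that field.
    """
    if leader_last_report is None:
        # Condition 1: the Leader's temporary field is gone.
        return True
    # Condition 2: the last report is older than the preset threshold.
    return (now - leader_last_report) > threshold_seconds
```

For example, with a 30-second threshold, a last report 40 seconds ago triggers an election, while a last report 10 seconds ago does not.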
S402, triggering an election event to a Redis blocking queue;
specifically, after a candidate node determines that the trigger condition of the election event among the multiple Telegraf nodes is satisfied, it sends a trigger message for the election event to the election event blocking queue of the Redis database. The election event blocking queue is the message container that holds trigger messages for election events among the multiple Telegraf nodes. In the embodiment of the present invention, the sender of the trigger message may be either the Leader node or a candidate node.
To keep the trigger message unique, only one trigger message is needed when an election event is triggered. To prevent multiple Telegraf nodes from sending trigger messages to the election event blocking queue at the same time, the Redis database also contains a distributed lock. The Telegraf nodes first compete for the distributed lock; only the node that obtains it can send the trigger message to the election event blocking queue, and it releases the lock after the message is sent. This ensures the uniqueness of the trigger message.
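The lock-then-send sequence can be illustrated with an in-memory stand-in. Everything named here is an assumption made for the sketch: `FakeStore` imitates "set only if absent" semantics rather than calling Redis, the key and message names are invented, and the check for an already pending message is an added guard that the text does not describe. The nodes are also tried one after another rather than concurrently.

```python
from collections import deque
from typing import Deque, Dict

LOCK_KEY = "election_lock"   # hypothetical key name
TRIGGER = "ELECT"            # hypothetical message payload


class FakeStore:
    """In-memory stand-in for the key-value operations used as a lock."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set_if_absent(self, key: str, value: str) -> bool:
        # Mirrors "set only if the key does not exist" semantics.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def try_send_trigger(store: FakeStore, queue: Deque[str], node_id: str) -> bool:
    """Send one trigger message if this node wins the lock; return True if sent."""
    if not store.set_if_absent(LOCK_KEY, node_id):
        return False             # another node holds the lock
    try:
        if queue:                # assumption: skip if a trigger is already pending
            return False
        queue.append(TRIGGER)
        return True
    finally:
        store.delete(LOCK_KEY)   # release after the attempt, as the text describes


store = FakeStore()
election_queue: Deque[str] = deque()
results = [try_send_trigger(store, election_queue, n)
           for n in ("node-a", "node-b", "node-c")]
```

Here only the first node sends a message; the later nodes obtain the lock in turn but find a trigger already pending, so the queue ends up holding a single message.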
S403, the nodes meeting the candidate qualification compete for votes;
specifically, after one Telegraf node sends a trigger message for the election event to the election event blocking queue, only one of the multiple Telegraf nodes can receive the message, because the listening mode is exclusive. That is, once one Telegraf node has received the trigger message, the other nodes can no longer receive it. Only that node obtains the vote in the election event and becomes the selected node, that is, the new Leader node.
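The exclusive nature of a blocking pop can be demonstrated with threads sharing one queue. In this sketch, Python's `queue.Queue` stands in for the election event blocking queue in Redis, and the node names and timeout are assumptions; it shows the principle only, not the actual Redis client calls.

```python
import queue
import threading
from typing import List

election_queue: queue.Queue = queue.Queue()
winners: List[str] = []
winners_lock = threading.Lock()


def consumer(node_id: str, wait_seconds: float = 0.5) -> None:
    """Block on the queue; the node that receives the message wins the election."""
    try:
        election_queue.get(timeout=wait_seconds)
    except queue.Empty:
        return                      # no message reached this node
    with winners_lock:
        winners.append(node_id)


threads = [threading.Thread(target=consumer, args=(f"node-{i}",)) for i in range(3)]
for t in threads:
    t.start()
election_queue.put("ELECT")         # a single trigger message
for t in threads:
    t.join()
```

After the threads finish, `winners` contains exactly one node name. Which node wins depends on thread scheduling, which matches the text: the election outcome is decided by whichever node takes the message first.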
S404, starting a Telegraf subprocess;
specifically, after a Telegraf node is selected as the new Leader node, it starts a corresponding Telegraf subprocess. When the original Leader node is abnormal, this subprocess takes over data acquisition from the Telegraf subprocess of the original Leader node, which keeps data acquisition continuous.
S405, discarding the candidate qualification;
specifically, after the selected node starts the Telegraf subprocess, it needs to detect whether the subprocess started successfully. If the start is abnormal, for example the Telegraf subprocess did not start normally or the log scan after startup shows that data cannot be reported normally, the Telegraf subprocess and the corresponding consumer thread are terminated; that is, the node gives up its candidate qualification. A trigger message for a new election event is then generated in the election event blocking queue in the Redis database, and the health state in the temporary field corresponding to the Telegraf node is updated to false.
S406, synchronizing Zookeeper state information;
specifically, if the selected node detects that the Telegraf subprocess started successfully, it formally becomes the new Leader node and begins data acquisition. At the same time, the leader field on the ZooKeeper distributed framework module is updated to the identification information of the server where the selected node is located. In addition, as the new Leader node, the selected node updates the last reporting time in its corresponding temporary field.
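Steps S404 to S406 amount to a branch on whether the subprocess started. The sketch below is one possible reading: the function name `on_elected`, the field naming scheme, and the message payload are assumptions, a dictionary stands in for the fields on the ZooKeeper distributed framework module, and terminating the consumer thread is not modeled.

```python
from collections import deque
from typing import Callable, Deque, Dict


def on_elected(node_id: str,
               start_subprocess: Callable[[], bool],
               zk_fields: Dict[str, object],
               election_queue: Deque[str],
               now: float) -> bool:
    """Handle the outcome of starting the subprocess on the selected node.

    Returns True if the node became the new Leader.
    """
    if not start_subprocess():
        # S405: give up candidacy, mark unhealthy, trigger another election.
        zk_fields[f"health/{node_id}"] = False
        election_queue.append("ELECT")
        return False
    # S406: publish the new Leader and refresh its last reporting time.
    zk_fields["leader"] = node_id
    zk_fields[f"last_report/{node_id}"] = now
    return True


fields: Dict[str, object] = {"leader": "server-1"}
pending: Deque[str] = deque()
failed = on_elected("server-2", lambda: False, fields, pending, now=100.0)
succeeded = on_elected("server-3", lambda: True, fields, pending, now=101.0)
```

In this run, `server-2` fails to start its subprocess and puts a new trigger message in the queue, and `server-3` then succeeds and replaces `server-1` in the leader field.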
S407, smoothly retiring the original Leader node;
specifically, after the selected node updates the leader field on the ZooKeeper distributed framework module, the original Leader node receives an event notification from the module. The original Leader node then checks whether its own Telegraf subprocess is still running and, if so, closes it, so that the single Telegraf subprocess in the system is switched over smoothly.
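The handover on the original Leader node can be pictured as a callback that reacts to a change of the leader field. The class and method names below are assumptions, and a boolean flag stands in for actually stopping the Telegraf subprocess.

```python
class LeaderNode:
    """Tracks whether this node's Telegraf subprocess is running."""

    def __init__(self, server_id: str, subprocess_running: bool) -> None:
        self.server_id = server_id
        self.subprocess_running = subprocess_running

    def on_leader_field_changed(self, new_leader_id: str) -> bool:
        """React to a leader-field change; return True if a subprocess was stopped."""
        if new_leader_id == self.server_id:
            return False                      # this node is the Leader; keep running
        if self.subprocess_running:
            self.subprocess_running = False   # stand-in for stopping the real process
            return True
        return False


old_leader = LeaderNode("server-1", subprocess_running=True)
stopped = old_leader.on_leader_field_changed("server-3")
```

Calling the callback a second time with the same value does nothing, so repeated notifications are harmless in this sketch.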
S408, detecting and executing an abnormal alarm;
specifically, this step is optionally executed after step S407. The selected node is controlled to check the number of temporary fields and the number of health values equal to true on the ZooKeeper distributed framework module. If either number is smaller than its specified threshold, the system is currently in an abnormal state, and an alarm is triggered so that operation and maintenance staff can intervene early.
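The threshold check in this step can be sketched as a function that returns the reasons for an alarm. The function name `alarm_reasons`, the two threshold parameters, and the message wording are assumptions; the text only states that an alarm is raised when a count falls below a specified threshold.

```python
from typing import Dict, List


def alarm_reasons(health_by_node: Dict[str, bool],
                  min_nodes: int,
                  min_healthy: int) -> List[str]:
    """Return the reasons to raise an alarm; an empty list means no alarm."""
    reasons: List[str] = []
    registered = len(health_by_node)
    healthy = sum(1 for ok in health_by_node.values() if ok)
    if registered < min_nodes:
        reasons.append(f"only {registered} temporary fields registered (minimum {min_nodes})")
    if healthy < min_healthy:
        reasons.append(f"only {healthy} healthy nodes (minimum {min_healthy})")
    return reasons
```

For instance, with thresholds of three registered nodes and two healthy nodes, a cluster reporting `{"a": True, "b": False}` yields two reasons, while three healthy nodes yield none.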
According to the data acquisition method provided by the embodiment of the invention, the new Leader node is selected for data acquisition when the current Leader node is abnormal, so that the loss of real-time monitoring data is avoided.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data acquisition system, comprising a plurality of servers, a ZooKeeper distributed framework module, and a Redis database, wherein:
a Telegraf node is deployed on each of the plurality of servers; among the Telegraf nodes of the data acquisition system, only one Telegraf node serves as a Leader node, and the Leader node is used for starting a Telegraf subprocess to acquire data;
the Telegraf nodes comprise event monitoring nodes and message acquisition nodes; the event monitoring node is used for monitoring whether the triggering condition of the Leader node election event is satisfied, and sending a message for triggering the Leader node election event to an election event blocking queue when the triggering condition of the Leader node election event is satisfied; the Leader node election event is used for triggering a Telegraf node contained in the data acquisition system to elect a Telegraf node as a Leader node;
the message acquisition node is used for, if the message triggering the Leader node election event is obtained from the election event blocking queue, acting as the elected Leader node and starting a corresponding Telegraf subprocess to acquire data;
the ZooKeeper distributed framework module comprises a temporary field registered by the Telegraf node on the ZooKeeper distributed framework module; the temporary field is used as a basis for the event monitoring node to monitor whether a trigger condition of the Leader node election event is satisfied;
the Redis database comprises an election event blocking queue, and is used for storing messages for triggering the Leader node election events.
2. The data acquisition system of claim 1, wherein the Redis database further comprises a distributed lock used to determine which Telegraf node sends the trigger message for the election event to the election event blocking queue.
3. The data acquisition system of claim 1, wherein the ZooKeeper distributed framework module further comprises a Leader field for storing identification information of a server where the Leader node is located.
4. A data acquisition method applied to the data acquisition system according to any one of claims 1 to 3, comprising:
monitoring whether a triggering condition of a Leader node election event is satisfied, and sending a message for triggering the Leader node election event to an election event blocking queue when the triggering condition of the Leader node election event is monitored to be satisfied; the Leader node election event is used for triggering a Telegraf node contained in the data acquisition system to elect a Telegraf node as a Leader node;
and if the message triggering the Leader node election event is acquired from the election event blocking queue, acting as the selected node elected as the Leader node and starting a corresponding Telegraf subprocess for data acquisition.
5. The data acquisition method according to claim 4, wherein the trigger condition of the Leader node election event is specifically:
the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored by the Leader node in the temporary field registered on the ZooKeeper distributed framework module exceeds a preset threshold value.
6. The data acquisition method of claim 4, wherein sending a message that triggers a Leader node election event to an election event blocking queue comprises:
a plurality of Telegraf nodes contend for a distributed lock of a Redis database;
and the Telegraf node that obtains the distributed lock sends a message for triggering a Leader node election event to an election event blocking queue of the Redis database.
7. The data acquisition method according to claim 4, wherein after the selected node as the Leader node starts the corresponding Telegraf subprocess for data acquisition, the method further comprises:
and if the start of the Telegraf subprocess is abnormal, triggering the Leader node election event again.
8. The data acquisition method of claim 4, further comprising:
and the ZooKeeper distributed framework module sends an event notice for closing a Telegraf subprocess corresponding to the Leader node.
9. The data acquisition method of claim 4, further comprising:
and determining one Telegraf node in a plurality of Telegraf nodes as a Leader node for data acquisition by setting a Leader field in the ZooKeeper distributed framework module.
10. The data acquisition method according to claim 9, wherein the determining one Telegraf node among the plurality of Telegraf nodes for data acquisition by setting a leader field in the ZooKeeper distributed framework module comprises:
setting a leader field in a ZooKeeper distributed framework module as identification information of a server where one Telegraf node in a plurality of Telegraf nodes is located;
and the Telegraf node reads a leader field in the ZooKeeper distributed framework module, and if the leader field is identification information of a server where the Telegraf node is located, the Telegraf node starts a corresponding Telegraf sub-process to collect data.
CN202010529803.1A 2020-06-11 2020-06-11 Data acquisition system and method Active CN111722980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010529803.1A CN111722980B (en) 2020-06-11 2020-06-11 Data acquisition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010529803.1A CN111722980B (en) 2020-06-11 2020-06-11 Data acquisition system and method

Publications (2)

Publication Number Publication Date
CN111722980A true CN111722980A (en) 2020-09-29
CN111722980B CN111722980B (en) 2023-10-20

Family

ID=72567968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010529803.1A Active CN111722980B (en) 2020-06-11 2020-06-11 Data acquisition system and method

Country Status (1)

Country Link
CN (1) CN111722980B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080071853A1 (en) * 2006-09-18 2008-03-20 Mosier Timothy J Distributed-leader-election service for a distributed computer system
CN104065741A (en) * 2014-07-04 2014-09-24 用友软件股份有限公司 Data collection system and method
CN107895009A (en) * 2017-11-10 2018-04-10 北京国信宏数科技有限责任公司 One kind is based on distributed internet data acquisition method and system
CN108512719A (en) * 2018-03-02 2018-09-07 南京易捷思达软件科技有限公司 A kind of Integrative resource monitoring system based on cloud platform of increasing income
CN109088908A (en) * 2018-06-06 2018-12-25 武汉酷犬数据科技有限公司 A kind of the distributed general collecting method and system of network-oriented
CN110247954A (en) * 2019-05-15 2019-09-17 南京苏宁软件技术有限公司 A kind of dispatching method and system of distributed task scheduling

Also Published As

Publication number Publication date
CN111722980B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US10592330B2 (en) Systems and methods for automatic replacement and repair of communications network devices
EP0591345B1 (en) Method and system for monitoring a computer system
US5634008A (en) Method and system for threshold occurrence detection in a communications network
CN104699759B (en) A kind of data base automatic operation and maintenance method
TW201944236A (en) Task processing method, apparatus, and system
CN108710544B (en) Process monitoring method of database system and rail transit comprehensive monitoring system
US8001231B2 (en) Method and apparatus for implementing a predetermined operation in device management
CN108984366B (en) Terminal monitoring processing method, device and equipment
CN111885005B (en) Container cloud platform service communication method, device, equipment and medium
CN111698121B (en) SNMP trap alarm test method and related device
CN110795264A (en) Monitoring management method and system and intelligent management terminal
CN106385343B (en) Method and device for monitoring client under distributed system and distributed system
CN111722980A (en) Data acquisition system and method
CN111737060A (en) Method and device for processing component exception and electronic equipment
CN110290019B (en) Monitoring method and system
CN113760634A (en) Data processing method and device
JP2007228421A (en) Ip network route diagnosis apparatus and ip network route diagnosis system
CN113727210B (en) Equipment information management method, system, storage medium and equipment
US20190332463A1 (en) Hardware error corrections based on policies
CN107864057B (en) Online automatic checking and alarming method based on networking state
CN104731648B (en) A kind of distributed system Centroid structure, submission, monitoring method and device
CN111464357A (en) Resource allocation method and device
JP2014067232A (en) Management device including collective management function of performance information
CN113590420B (en) Cluster state supervision method and device
US20220188724A1 (en) Maintenance management system for service providing application, maintenance management device, maintenance management method, and maintenance management program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant