CN114328604B - Method, device and medium for improving cluster data acquisition capacity - Google Patents

Info

Publication number
CN114328604B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111645492.6A
Other languages
Chinese (zh)
Other versions
CN114328604A (en)
Inventor
李兴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111645492.6A
Publication of CN114328604A
Application granted
Publication of CN114328604B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device and a medium for improving cluster data acquisition capability, and relates to the technical field of system design. First, a plurality of Leaders are selected from a plurality of nodes. Then, when each node starts, the node is controlled to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its local nodes and generates metadata by compiling statistics on the reported data. Finally, the Leaders are controlled to exchange metadata. Because the method provides a plurality of Leaders that can receive reported data, and each Leader holds the data of its local nodes together with the metadata of the other nodes, the nodes do not all report to the same master node. This avoids the shortage of disk capacity caused by concentrating the data on a single node and improves the cluster data acquisition capability.

Description

Method, device and medium for improving cluster data acquisition capacity
Technical Field
The present disclosure relates to the field of system design technologies, and in particular, to a method, an apparatus, and a medium for improving cluster data acquisition capability.
Background
With the development of information technology, the demand for data keeps growing, and this data often has to be collected from a cluster.
In the traditional data collection and reporting process, each node collects its own data and sends it to a master node (the master node can be understood as a node running the cluster management software). When an external interface issues a query, it queries the master node. This is easy to implement when the cluster is small. As the cluster grows, however, data collection and storage run into a problem: when the cluster is large (individual clusters currently under management reach 300 nodes), the volume of reported data is very large and may exceed the hard disk capacity of the management node. This prevents the cluster from being scaled further and greatly reduces the cluster data acquisition capability.
It can be seen that how to improve the cluster data acquisition capability is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, a device and a medium for improving cluster data acquisition capability, which are used for improving the cluster data acquisition capability.
In order to solve the above technical problems, the present application provides a method for improving cluster data acquisition capability, where the method includes:
selecting a plurality of Leaders from a plurality of nodes, wherein the number of Leaders is smaller than the number of nodes;
when each node starts, controlling the node to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata is data representing the location information of the actually reported data;
and controlling each Leader to exchange the metadata, so that each Leader obtains the metadata of the remaining nodes in the cluster other than its local nodes.
Preferably, after controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes, the method further includes:
when data is requested from a current Leader, determining whether the requested data is on a local node of the current Leader;
if so, acquiring the data from the local node;
and if not, acquiring the data according to the metadata of the remaining nodes.
Preferably, selecting a plurality of Leaders from a plurality of nodes includes:
selecting the plurality of Leaders from the plurality of nodes according to the number of nodes in the cluster and the storage capacity of each node.
Preferably, when each node starts, controlling the node to select a corresponding Leader from the plurality of Leaders includes:
when each node starts, acquiring the total number of nodes connected to each Leader in the cluster;
and controlling each node to select, from the plurality of Leaders, the two Leaders with the fewest connected nodes.
Preferably, after the data is reported to the two Leaders, the method further includes:
storing the data in a database.
Preferably, when one Leader is down, the data is acquired from the remaining Leaders.
Preferably, when one Leader is down, the local nodes of that Leader are controlled to select a corresponding Leader from the remaining Leaders and to report data to it; the method then returns to the step of controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes.
In order to solve the above technical problem, the present application further provides a device for improving the capability of cluster data acquisition, including:
a selecting module, configured to select a plurality of Leaders from a plurality of nodes, wherein the number of Leaders is smaller than the number of nodes;
a first control module, configured to control each node, when the node starts, to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata is data representing the location information of the actually reported data;
and a second control module, configured to control each Leader to exchange the metadata, so that each Leader obtains the metadata of the remaining nodes.
In order to solve the above technical problem, the present application further provides a device for improving the capability of cluster data acquisition, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the method for improving the cluster data acquisition capability when executing the computer program.
To solve the above technical problem, the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for improving the capability of cluster data collection described above.
The method for improving cluster data acquisition capability provided by the present application selects a plurality of Leaders from a plurality of nodes; when each node starts, controls the node to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data, the metadata being data representing the location information of the actually reported data; and controls each Leader to exchange metadata, so that each Leader obtains the metadata of the remaining nodes. Because the method provides a plurality of Leaders that can receive reported data, and each Leader holds the data of its local nodes together with the metadata of the other nodes, the nodes do not all report to the same master node. This avoids the shortage of disk capacity caused by concentrating the data on a single node and improves the cluster data acquisition capability.
In addition, the application further provides a device for improving cluster data acquisition capability and a computer readable storage medium, which correspond to the above method and have the same effects.
Drawings
For a clearer description of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for improving cluster data acquisition capability provided in the present application;
FIG. 2 is a flow chart of conventional data collection and reporting with 5 nodes and 1 Leader;
FIG. 3 is a flow chart of data collection and reporting with 5 nodes and 3 Leaders;
FIG. 4 is a flow chart of data collection and reporting when a Leader is down;
FIG. 5 is a block diagram of an apparatus for improving cluster data acquisition capability according to an embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for improving cluster data acquisition capability according to another embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments herein without making any inventive effort are intended to fall within the scope of the present application.
The core of the application is to provide a method, a device and a medium for improving the cluster data acquisition capability, which are used for improving the cluster data acquisition capability.
To help those skilled in the art better understand the solution of the present application, the application is described in further detail below with reference to the drawings and the detailed description. FIG. 1 is a flowchart of a method for improving cluster data acquisition capability provided in the present application. The method comprises the following steps:
S10: selecting a plurality of Leaders from a plurality of nodes, wherein the number of Leaders is smaller than the number of nodes.
Data may be collected from servers, terminal devices, memories and the like. All of these devices form a cluster, and each server, terminal device, memory or similar device in the cluster may be referred to as a node. The cluster comprises a plurality of nodes, and in the traditional data collection and reporting process each node collects its own data and reports it to a master node. FIG. 2 is a flow chart of conventional data collection and reporting with 5 nodes and 1 Leader. As shown in FIG. 2, there are five nodes (node1, node2, node3, node4 and node5) and one Leader; data1, data2, data3, data4 and data5 are all reported to that Leader, and a third-party system queries the data from the Leader. Because the data of all nodes in the cluster is stored on the Leader, whose hard disk capacity is limited, and because the Leader's CPU utilization becomes high when the third-party system queries a large amount of data from it, storing the data of all nodes on a single Leader restricts the horizontal expansion of the cluster. Therefore, to reduce the load on a single Leader, a plurality of Leaders are selected from the plurality of nodes to store data. It should be noted that, in the present application, the plurality of Leaders are a plurality of master nodes.
The manner of selecting the plurality of Leaders from the plurality of nodes, and the number of Leaders selected, are not limited, as long as the load on each Leader is relieved compared with the previous approach of reporting the data of all nodes in the cluster to one Leader. In implementation, the number of Leaders selected depends on the number of nodes in the cluster, the storage capacity of the nodes and so on, so a suitable number of Leaders for the cluster can be calculated from the number of nodes in the cluster and the storage capacity of each node. Which nodes serve as Leaders is usually decided according to the actual situation; for example, they may be selected according to their IP addresses or their order of arrangement in memory.
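As an illustration of choosing which nodes act as Leaders by IP address, the following is a minimal sketch. The function name pick_leaders, the sample addresses and the rule "lowest addresses first" are assumptions made for this example; the application only states that the choice may be based on the IP address.

```python
import ipaddress


def pick_leaders(node_ips, leader_count):
    """Return the leader_count nodes with the numerically lowest IP addresses.

    The "lowest address first" rule is an assumption for illustration only.
    """
    if not 0 < leader_count < len(node_ips):
        raise ValueError("leader_count must be positive and smaller than the node count")
    # ip_address() parses each string so that the ordering is numeric rather
    # than lexical ("10.0.0.3" sorts before "10.0.0.12").
    return sorted(node_ips, key=ipaddress.ip_address)[:leader_count]


# Hypothetical addresses for a 5-node cluster with 3 Leaders.
nodes = ["10.0.0.12", "10.0.0.3", "10.0.0.25", "10.0.0.7", "10.0.0.1"]
print(pick_leaders(nodes, 3))  # ['10.0.0.1', '10.0.0.3', '10.0.0.7']
```

The check at the top mirrors the requirement in S10 that the number of Leaders is smaller than the number of nodes.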
S11: when each node starts, controlling the node to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata is data representing the location information of the reported data.
In S10 the plurality of Leaders to which the nodes report data are selected; when a node starts, it is controlled to report its data to Leaders chosen from this plurality. The number of nodes connected to each Leader is not limited, but it is preferable to keep the number of nodes connected to each Leader balanced, which may be achieved with an algorithm or the like. The connected nodes may be allocated to the Leaders before any node reports data to its Leader; alternatively, the Leaders to which the remaining nodes in the cluster connect may be adjusted adaptively during reporting according to the number of nodes already connected, so that the nodes connected to each Leader stay balanced and the load on any one or several Leaders is relieved. Each node may report data to one Leader or to several Leaders, which is not limited here. If each node reports to only one Leader and that Leader goes down, the data of the node cannot be obtained and data is lost; if each node reports to many Leaders, the system becomes redundant and its load increases. The preferred mode is therefore that each node reports data to 2 Leaders. Because each node reports data to its corresponding Leader, that Leader holds the data of the nodes that report to it. A node that reports data to a Leader is referred to as a local node of that Leader, and the data of such a node is referred to as the data of the local node.
Each Leader holds the data of its local nodes and processes the reported data to generate metadata. The metadata is data representing the location information of the reported data; its form is not limited as long as it can represent that location information, that is, as long as the location of the data can be found from the metadata. In the present application, the metadata is a four-tuple [leader, node, start_time, end_time], where leader is the node name of the current Leader, node is the name of the node that reports the data, start_time is the time at which the node started reporting data, and end_time is the time of its last data report. FIG. 3 is a flow chart of data collection and reporting with 5 nodes and 3 Leaders. As shown in FIG. 3, the nodes that report data to Leader2 are node1, node2 and node5. Assume that node1, node2 and node5 report data to Leader2 from "2021-11-01 00:00:00" to "2021-11-11 00:00:00", from "2021-11-02 00:00:00" to "2021-11-10 00:00:00", and from "2021-11-03 00:00:00" to "2021-11-13 00:00:00", respectively. The metadata generated on Leader2 is then ["Leader2", "node1", "2021-11-01 00:00:00", "2021-11-11 00:00:00"], ["Leader2", "node2", "2021-11-02 00:00:00", "2021-11-10 00:00:00"], ["Leader2", "node5", "2021-11-03 00:00:00", "2021-11-13 00:00:00"]. This metadata means that Leader2 holds the data of node1 from "2021-11-01 00:00:00" to "2021-11-11 00:00:00", of node2 from "2021-11-02 00:00:00" to "2021-11-10 00:00:00", and of node5 from "2021-11-03 00:00:00" to "2021-11-13 00:00:00". Similarly, Leader1 and Leader3 also generate their corresponding metadata.
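The metadata four-tuple can be sketched as follows. This is a minimal illustration, not the implementation of the application: the names Metadata and build_metadata are hypothetical, and deriving start_time and end_time as the earliest and latest report timestamps per node is an assumption about how a Leader might compile its statistics.

```python
from collections import namedtuple
from datetime import datetime

# One metadata record: which Leader holds which node's data, and for what period.
Metadata = namedtuple("Metadata", ["leader", "node", "start_time", "end_time"])


def build_metadata(leader, reports):
    """Summarise (node, timestamp) reports received by one Leader.

    Assumption: start_time is the earliest report of a node and end_time
    is its latest report.
    """
    spans = {}
    for node, ts in reports:
        first, last = spans.get(node, (ts, ts))
        spans[node] = (min(first, ts), max(last, ts))
    return [Metadata(leader, node, s, e) for node, (s, e) in sorted(spans.items())]


# Reports received by Leader2 in the FIG. 3 example (first and last report only).
reports = [
    ("node1", datetime(2021, 11, 1)), ("node1", datetime(2021, 11, 11)),
    ("node2", datetime(2021, 11, 2)), ("node2", datetime(2021, 11, 10)),
    ("node5", datetime(2021, 11, 3)), ("node5", datetime(2021, 11, 13)),
]
for record in build_metadata("Leader2", reports):
    print(record)
```

Each printed record corresponds to one of the three four-tuples listed for Leader2 in the example.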
S12: controlling each Leader to exchange metadata, so that each Leader obtains the metadata of the remaining nodes in the cluster other than its local nodes.
In step S11 the metadata of each Leader is obtained. In the existing cluster data collection process, the data of all nodes is reported to the same master node, so the master node holds the data reported by all nodes, and a third-party system that needs data can obtain it simply by querying the master node. In the present application, by contrast, the data of the nodes is stored in groups as described in the above steps, so a third-party system connected to one Leader could obtain only the data of that Leader's local nodes and not the data of the remaining nodes in the cluster. Because the metadata obtained in the above steps contains location information, the Leaders exchange metadata with one another so that every Leader holds the metadata of its local nodes and the metadata of the remaining nodes. For example, the metadata generated by Leader2 in the above step is sent to Leader1 and Leader3; likewise, Leader1 sends the metadata it generates to Leader2 and Leader3, and Leader3 sends its metadata to Leader1 and Leader2. This realizes the metadata exchange between the Leaders. After the exchange, each Leader holds the data of its local nodes and the metadata of the remaining nodes.
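The all-to-all exchange in S12 can be sketched as below. This is an in-process illustration under simplifying assumptions: exchange_metadata is a hypothetical name, network transport is replaced by plain dictionary copies, and the records carry only (leader, node) because the application gives timestamps for Leader2 only.

```python
def exchange_metadata(own):
    """Simulate every Leader sending its metadata to every other Leader.

    own maps a Leader name to the metadata records that Leader generated.
    The result maps each Leader name to the records it received from peers.
    """
    received = {}
    for receiver in own:
        received[receiver] = [
            record
            for sender, records in own.items()
            if sender != receiver  # a Leader does not send metadata to itself
            for record in records
        ]
    return received


# FIG. 3 layout; start_time and end_time are omitted for brevity.
own = {
    "Leader1": [("Leader1", n) for n in ("node1", "node2", "node3", "node4")],
    "Leader2": [("Leader2", n) for n in ("node1", "node2", "node5")],
    "Leader3": [("Leader3", n) for n in ("node3", "node4", "node5")],
}
received = exchange_metadata(own)
print(received["Leader1"])
```

After the exchange, Leader1 knows from the received records that the data of node5 is held by Leader2 and Leader3.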
The method for improving cluster data acquisition capability provided by this embodiment selects a plurality of Leaders from a plurality of nodes; when each node starts, controls the node to select a corresponding Leader from the plurality of Leaders and to report data to that Leader, so that each Leader holds the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data, the metadata being data representing the location information of the actually reported data; and controls each Leader to exchange metadata, so that each Leader obtains the metadata of the remaining nodes. Because the method provides a plurality of Leaders that can receive reported data, and each Leader holds the data of its local nodes together with the metadata of the other nodes, the nodes do not all report to the same master node. This avoids the shortage of disk capacity caused by concentrating the data on a single node and improves the cluster data acquisition capability.
In the above embodiment, each Leader holds the data of its local nodes and the metadata of the remaining nodes in the cluster. To increase the speed at which a third-party system acquires the data it requests, a preferred embodiment further includes, after controlling each Leader to exchange metadata so that each Leader obtains the metadata of the remaining nodes:
when data is requested from a current Leader, determining whether the requested data is on a local node of the current Leader;
if so, acquiring the data from the local node;
if not, acquiring the data according to the metadata of the remaining nodes.
Each node reports data to its corresponding Leader, each Leader processes the reported data to generate metadata, and the Leaders exchange that metadata, so that every Leader holds the data of its local nodes and the metadata of the remaining nodes. When a third-party system needs to query data, it connects to one Leader and requests the data from it. The Leader may first query its local nodes and, only if the data is not found there, look up the location of the data in the metadata of the other nodes and then fetch the requested data from the corresponding Leader; alternatively, the Leader may directly look up the location of the requested data in the metadata and then fetch the data from the corresponding Leader. This is not limited here. However, the data of a local node is the node's actual data, whereas the metadata is not the node's actual data but only data representing where that data is located. Therefore, to increase the speed at which the third-party system acquires data, the preferred embodiment queries the local nodes first: if the requested data is held locally, it is returned directly; if not, its location is determined from the metadata and the data is then fetched from the corresponding Leader. As can be seen from FIG. 3 and from the metadata, in the above embodiment Leader1 holds the data of node1, node2, node3 and node4, Leader2 holds the data of node1, node2 and node5, and Leader3 holds the data of node3, node4 and node5.
When the third-party system is connected to Leader1 and queries the data of node1, Leader1 queries data1 locally and returns it directly. If the data of node5 is queried, Leader1 does not hold it locally; from the metadata Leader1 learns that Leader2 and Leader3 store the data of node5, so it sends a request to Leader2, and the data that Leader2 finds locally is returned to Leader1.
When the third-party system requests data from the cluster, the data is first looked up among the local nodes; if it is found there, it is obtained directly, and if not, it is located through the metadata. This increases the speed at which the third-party system acquires data.
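The local-first lookup described above can be sketched as follows. The Leader class, its attribute names and the learn method are hypothetical; remote calls between Leaders are replaced by direct object access, and the metadata is reduced to a node-to-Leader index.

```python
class Leader:
    """Toy Leader holding local node data plus an index of remote holders."""

    def __init__(self, name, local_data):
        self.name = name
        self.local_data = dict(local_data)  # node name -> data of that local node
        self.remote_index = {}              # node name -> names of other Leaders holding it
        self.peers = {}                     # Leader name -> Leader object

    def learn(self, leaders):
        """Stand-in for the metadata exchange: record what each peer holds."""
        for peer in leaders:
            if peer is self:
                continue
            self.peers[peer.name] = peer
            for node in peer.local_data:
                self.remote_index.setdefault(node, []).append(peer.name)

    def query(self, node):
        """Return (data, name of the Leader that served it)."""
        if node in self.local_data:              # 1. try the local nodes first
            return self.local_data[node], self.name
        holders = self.remote_index.get(node)    # 2. otherwise consult the metadata
        if not holders:
            raise KeyError(f"no Leader holds data for {node}")
        peer = self.peers[holders[0]]
        return peer.local_data[node], peer.name


# FIG. 3 layout.
leader1 = Leader("Leader1", {"node1": "data1", "node2": "data2", "node3": "data3", "node4": "data4"})
leader2 = Leader("Leader2", {"node1": "data1", "node2": "data2", "node5": "data5"})
leader3 = Leader("Leader3", {"node3": "data3", "node4": "data4", "node5": "data5"})
for leader in (leader1, leader2, leader3):
    leader.learn([leader1, leader2, leader3])

print(leader1.query("node1"))  # ('data1', 'Leader1')
print(leader1.query("node5"))  # ('data5', 'Leader2')
```

The second query reproduces the node5 case in the text: Leader1 does not hold the data, finds Leader2 in its index and returns what Leader2 holds.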
In the above embodiment, a plurality of Leaders are selected from a plurality of nodes to relieve the load that arises when all nodes report data to one Leader. As for the number of Leaders selected, preferably, selecting a plurality of Leaders from a plurality of nodes includes:
selecting the plurality of Leaders from the plurality of nodes according to the number of nodes in the cluster and the storage capacity of each node.
When the cluster is small, a small number of Leaders is enough for the cluster data acquisition capability to meet the requirement; if many Leaders are selected, the metadata exchange required between the Leaders creates redundancy across the whole cluster. When the cluster is large and too few Leaders are selected, each Leader has many connected nodes while its disk capacity is limited, so the load on each Leader increases. In implementation, therefore, a suitable number of Leaders is selected so that, as far as possible, the load on each Leader does not increase and no redundancy is created in the cluster.
In implementation, to select a suitable number of Leaders for a given cluster, the total number of nodes in the cluster is obtained first, then the storage capacity of each node is obtained, and the ratio of the total number of nodes to the storage capacity of each node is calculated; this ratio is the number of Leaders to select for a cluster of that size. Taking a cluster of 300 nodes as an example, and assuming that the disk of each node can store the reported data of 50 nodes, 300/50 = 6 Leaders are selected.
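The ratio in the worked example can be written as a small helper. The function name leader_count is hypothetical, and rounding up when the division is not exact is an assumption; the application only gives the exact case 300/50 = 6.

```python
def leader_count(total_nodes, nodes_per_leader):
    """Number of Leaders for a cluster.

    nodes_per_leader is how many nodes' reported data one node's disk can hold.
    Rounding up for inexact ratios is an assumption, not taken from the source.
    """
    if total_nodes <= 0 or nodes_per_leader <= 0:
        raise ValueError("both arguments must be positive")
    return -(-total_nodes // nodes_per_leader)  # integer ceiling division


print(leader_count(300, 50))  # 6
```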
In this embodiment, the plurality of Leaders are selected from the plurality of nodes according to the number of nodes in the cluster and the storage capacity of each node. On the one hand, each Leader stays within its allowable storage range, so its load does not increase; on the other hand, the redundancy that too many Leaders would cause in the cluster is reduced.
In practice, a Leader may go down. If a node reports data to only one Leader and that Leader goes down, the data of the node cannot be obtained when a third-party system requests it. Therefore, to ensure that the data can be acquired when a third-party system requests it, preferably, when each node starts, controlling the node to select a corresponding Leader from the plurality of Leaders includes:
when each node starts, acquiring the total number of nodes connected to each Leader in the cluster;
and controlling each node to select, from the plurality of Leaders, the two Leaders with the fewest connected nodes.
As described above, if a node reports data to only one Leader and that Leader goes down, the data of the node cannot be obtained when a third-party system requests it. In implementation, therefore, a node reports data to two or more Leaders. However, because metadata is exchanged between the Leaders, the load on the cluster increases when a node reports data to more than two Leaders, so each node preferably reports data to two Leaders.
The manner of selecting the two Leaders is not limited. However, to distribute the connected nodes evenly across the Leaders, an algorithm may be used; the specific algorithm is not limited as long as the number of nodes connected to each Leader is balanced as far as possible. In this embodiment, to keep the number of connected nodes balanced, each node selects, from the plurality of Leaders, the two Leaders with the fewest connected nodes and uploads its data to them. For example, in FIG. 3, assuming that Leader1 has 4 connected nodes, Leader2 has 2 and Leader3 has 1, when node1 reports data to two Leaders it selects the two of the 3 Leaders with the fewest connected nodes, namely Leader2 and Leader3, and reports its data to Leader2 and Leader3 respectively.
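The selection rule in this example can be sketched with the standard library. The function name pick_two_leaders is hypothetical, and breaking ties by Leader name is an assumption added so that the result is deterministic.

```python
import heapq


def pick_two_leaders(connected_counts):
    """Return the two Leaders with the fewest connected nodes, fewest first.

    connected_counts maps a Leader name to its number of connected nodes.
    Ties are broken by Leader name (an assumption for determinism).
    """
    return heapq.nsmallest(
        2, connected_counts, key=lambda name: (connected_counts[name], name)
    )


# Counts from the example: Leader1 has 4 nodes, Leader2 has 2, Leader3 has 1.
print(pick_two_leaders({"Leader1": 4, "Leader2": 2, "Leader3": 1}))  # ['Leader3', 'Leader2']
```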
In this embodiment, each node selects, from the plurality of Leaders, the two Leaders with the fewest connected nodes and uploads its data to them. Because each node reports to two Leaders, the data of the node can still be obtained when one Leader is down. Furthermore, choosing the two Leaders with the fewest connected nodes keeps the number of nodes connected to each Leader in the cluster as balanced as possible, which improves the cluster data acquisition capability.
When data is reported to a Leader, it can be held directly in memory, but memory capacity is limited. Therefore, to relieve the pressure on memory, after the data is reported to the two Leaders, the method further includes:
storing the data in a database.
If the data reported by each node to its corresponding Leader is held directly in memory, then, on the one hand, memory capacity is limited and cannot provide the storage space required when the data volume is relatively large, which increases the load on the system; on the other hand, the running speed of the CPU is affected. In implementation, therefore, after each node reports data to its corresponding Leader, the data is stored in a database.
Storing the data in the database after it is reported to the two Leaders reduces the space occupied in memory and relatively increases the running speed of the CPU.
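A Leader-side store can be sketched with SQLite from the Python standard library. The application does not name a database, so the choice of SQLite, the table name reports, its columns and the helper names are all assumptions for illustration.

```python
import sqlite3


def open_store(path):
    """Open (or create) the Leader's report database at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports ("
        "node TEXT NOT NULL, reported_at TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def save_report(conn, node, reported_at, payload):
    """Persist one report; the with-block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO reports (node, reported_at, payload) VALUES (?, ?, ?)",
            (node, reported_at, payload),
        )


def load_reports(conn, node):
    """Return all (reported_at, payload) rows of one node, oldest first."""
    cursor = conn.execute(
        "SELECT reported_at, payload FROM reports WHERE node = ? ORDER BY reported_at",
        (node,),
    )
    return cursor.fetchall()


# ":memory:" keeps the demonstration self-contained; a real Leader would pass
# a file path so that the reports are written to disk instead of held in RAM.
conn = open_store(":memory:")
save_report(conn, "node1", "2021-11-01 00:00:00", "data1")
print(load_reports(conn, "node1"))
```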
In the above embodiment, each node reports data to two Leaders. So that data can still be acquired when one of them is down, in implementation, when one Leader is down, the data is acquired from the remaining Leaders.
Each node reports data to two Leaders, so both Leaders hold the data of that node. Thus, when a Leader is down, the third-party system may connect to any Leader other than the one that is down. As shown in FIG. 3, the third-party system is connected to Leader1 and queries data from it; when Leader1 is down, the third-party system may connect to Leader2 or Leader3 to query the data.
In this embodiment, data is acquired from the remaining Leaders when one Leader is down. Because each node reports data to two Leaders, the data can still be acquired from the remaining Leaders when one Leader is down, which improves the cluster data acquisition capability.
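From the point of view of the third-party system, falling back to another Leader can be sketched as below. The names query_with_failover, fetch, DOWN and DATA are hypothetical; fetch stands in for whatever request the third-party system would send to a Leader, and a Leader that is down is modelled by raising ConnectionError.

```python
def query_with_failover(leaders, node, fetch):
    """Ask each Leader in turn and return the first successful answer.

    fetch(leader, node) is assumed to raise ConnectionError when the Leader
    is down and LookupError when it cannot supply the data.
    """
    for leader in leaders:
        try:
            return fetch(leader, node)
        except (ConnectionError, LookupError):
            continue  # try the next Leader
    raise ConnectionError(f"no reachable Leader could supply data for {node}")


# FIG. 3 layout with Leader1 down.
DOWN = {"Leader1"}
DATA = {
    "Leader1": {"node1": "data1", "node2": "data2", "node3": "data3", "node4": "data4"},
    "Leader2": {"node1": "data1", "node2": "data2", "node5": "data5"},
    "Leader3": {"node3": "data3", "node4": "data4", "node5": "data5"},
}


def fetch(leader, node):
    if leader in DOWN:
        raise ConnectionError(f"{leader} is down")
    return DATA[leader][node]  # raises KeyError (a LookupError) if not held


print(query_with_failover(["Leader1", "Leader2", "Leader3"], "node1", fetch))  # data1
```

In this toy version a Leader answers only from its own data; in the method described above, a live Leader would also use the exchanged metadata to fetch data it does not hold locally.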
In the implementation, when a Leader of the data reported by a node is in downtime, in order to ensure that a third party system can still acquire the data, in the implementation, under the condition that one Leader is in downtime, a local node contained in the Leader is controlled to select a corresponding Leader from the rest multiple Leader, and the data is reported to the Leader; and proceeds to the step of controlling each Leader to interact with metadata so as to obtain metadata of the remaining nodes in each Leader.
Fig. 4 is a flow chart of data collection and reporting when a Leader goes down. Referring to fig. 3 and fig. 4 of the present embodiment, in fig. 3, the third party system collects data by connecting with the Leader1, the local nodes on the Leader1 are node1, node2, node3 and node4, and if the Leader1 is down, at this time, the node1, node2, node3 and node4 cannot report data to the Leader1, so that it is necessary to report data to the new Leader again for the node1, node2, node3 and node4, and in fig. 4, the Leader1 is down, and the third party system is connected with the Leader2. The selection of a new Leader is not limited as long as it is other than a Leader already containing the node data. As shown in fig. 4, since node1 has reported data to header 2, node1 selects a new header to be header 3 only; since node2 has reported data to header 2, the selection of a new header by node2 can only be header 3, and the method of selecting a new header by node3 and node4 is not described here. In fig. 3 and 4, only three Leader units are shown, and when one of the Leader units is down and the number of the Leader units exceeds three, the other Leader units may be selected as new Leader units from the group consisting of node1, node2, node3 and node4, and the selection of the new Leader unit is not limited as long as the new Leader unit includes only the Leader units of the node data. After a new Leader is selected, reporting data to the Leader; and proceeds to the step of controlling each Leader to interact with metadata so as to obtain metadata of the remaining nodes in each Leader.
With the method provided in this embodiment, when one Leader goes down, its local nodes select a new Leader and report their data to it, and the step of controlling each Leader to exchange metadata is then entered, so that a third-party system can still acquire the data.
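The re-selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the application: the `Leader` class, the `reassign_after_failure` function, and the tie-breaking rule (prefer the surviving Leader with the fewest local nodes, then the lexicographically smallest name) are hypothetical. The initial layout for node1 and node2 follows the Fig. 4 discussion; the placement of node3 and node4 on Leader3 is an assumption, since the text does not describe it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Leader:
    name: str
    alive: bool = True
    # Names of the nodes whose data this Leader holds (its "local nodes").
    local_nodes: set[str] = field(default_factory=set)


def reassign_after_failure(leaders: dict[str, Leader], failed: str) -> dict[str, str]:
    """Move each local node of the failed Leader to a surviving Leader.

    Returns a mapping from node name to the newly selected Leader name.
    """
    failed_leader = leaders[failed]
    failed_leader.alive = False
    orphans = sorted(failed_leader.local_nodes)
    failed_leader.local_nodes.clear()

    assignment: dict[str, str] = {}
    for node in orphans:
        # Only surviving Leaders that do not already contain this node's
        # data are eligible, as required by the description.
        candidates = [
            leader for leader in leaders.values()
            if leader.alive and node not in leader.local_nodes
        ]
        if not candidates:
            continue  # every surviving Leader already holds this node's data
        # Hypothetical tie-break: fewest local nodes first, then name.
        target = min(candidates, key=lambda l: (len(l.local_nodes), l.name))
        target.local_nodes.add(node)
        assignment[node] = target.name
    return assignment


# Layout loosely modelled on Fig. 3 and Fig. 4 (node3/node4 placement assumed).
leaders = {
    "Leader1": Leader("Leader1", local_nodes={"node1", "node2", "node3", "node4"}),
    "Leader2": Leader("Leader2", local_nodes={"node1", "node2"}),
    "Leader3": Leader("Leader3", local_nodes={"node3", "node4"}),
}
new_leader_of = reassign_after_failure(leaders, "Leader1")
print(new_leader_of)
```

In this layout node1 and node2 end up on Leader3, matching the Fig. 4 discussion, because Leader2 already contains their data. After the reassignment, the Leaders would exchange metadata again so that each one knows where the moved nodes' data now resides.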
The foregoing embodiments describe the method for improving cluster data acquisition capability in detail; the present application further provides corresponding embodiments of a device for improving cluster data acquisition capability. It should be noted that the present application describes the device embodiments from two perspectives: that of functional modules and that of hardware.
Fig. 5 is a block diagram of a device for improving cluster data acquisition capability according to an embodiment of the present application. This embodiment is described from the perspective of functional modules and comprises:
a selecting module 10, configured to select a plurality of Leaders from a plurality of nodes, where the number of Leaders is less than the number of nodes;
a first control module 11, configured to control, when each node is started, each node to select a corresponding Leader from the plurality of Leaders and report data to that Leader, so that each Leader obtains the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata being data that indicates the location to which the actual data was reported;
and a second control module 12, configured to control each Leader to exchange metadata so that each Leader obtains the metadata of the remaining nodes.
Since the device embodiments correspond to the method embodiments, reference may be made to the description of the method embodiments for details of the device embodiments, which are not repeated here.
The device for improving cluster data acquisition capability provided by this embodiment first selects a plurality of Leaders from a plurality of nodes through the selecting module; then, through the first control module, when each node is started, it controls each node to select a corresponding Leader from the plurality of Leaders and report data to that Leader, so that each Leader obtains the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; finally, through the second control module, it controls the Leaders to exchange metadata so that each Leader obtains the metadata of the remaining nodes. The device thus has a plurality of Leaders to which data can be reported, each Leader containing the data of its local nodes and the metadata of the other nodes, so that the nodes do not all report data to the same master node. This avoids the shortage of disk capacity caused by data being concentrated on a single node and improves the cluster data acquisition capability.
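The division of labour between local data and exchanged metadata can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the implementation of the application: the class `Leader`, the functions `exchange_metadata` and `fetch`, and the record strings are hypothetical. The metadata is modelled as pairs of (Leader name, reporting node name), which are the two fields the claims name as the minimum content of the metadata.

```python
from __future__ import annotations


class Leader:
    def __init__(self, name: str) -> None:
        self.name = name
        # Actual data reported by this Leader's local nodes: node -> records.
        self.local_data: dict[str, list[str]] = {}
        # Metadata learned from other Leaders: node -> name of the Leader
        # that holds that node's actual data.
        self.remote_metadata: dict[str, str] = {}

    def report(self, node: str, record: str) -> None:
        """Called when a local node reports a record to this Leader."""
        self.local_data.setdefault(node, []).append(record)

    def metadata(self) -> list[tuple[str, str]]:
        """Location information: (Leader name, reporting node name) pairs."""
        return [(self.name, node) for node in sorted(self.local_data)]


def exchange_metadata(leaders: list[Leader]) -> None:
    """Let every Leader learn where the other Leaders' node data lives."""
    for receiver in leaders:
        for sender in leaders:
            if sender is receiver:
                continue
            for leader_name, node in sender.metadata():
                if node not in receiver.local_data:
                    receiver.remote_metadata[node] = leader_name


def fetch(current: Leader, node: str, by_name: dict[str, Leader]) -> list[str]:
    """Serve a request sent to `current` for the data of `node`."""
    if node in current.local_data:
        # The requested data is on a local node of the current Leader.
        return current.local_data[node]
    # Otherwise use the exchanged metadata to find the Leader that holds it.
    owner = current.remote_metadata.get(node)
    if owner is None:
        raise KeyError(f"no Leader is known to hold data for {node}")
    return by_name[owner].local_data[node]


leader1, leader2 = Leader("Leader1"), Leader("Leader2")
leader1.report("node1", "cpu=10")
leader2.report("node5", "cpu=20")
exchange_metadata([leader1, leader2])
by_name = {"Leader1": leader1, "Leader2": leader2}
print(fetch(leader1, "node5", by_name))
```

Only the small location records are copied between Leaders; the reported data itself stays on the Leader that received it, which is how the scheme avoids concentrating all data on one node's disk.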
Fig. 6 is a block diagram of a device for improving cluster data acquisition capability according to another embodiment of the present application. This embodiment is described from the hardware perspective; as shown in Fig. 6, the device comprises:
a memory 20 for storing a computer program;
a processor 21 for implementing the steps of the method of improving the capabilities of cluster data acquisition as mentioned in the above embodiments when executing a computer program.
The device for improving the cluster data acquisition capability provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
The processor 21 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 21 may be implemented in hardware in at least one of the following forms: a digital signal processor (Digital Signal Processor, DSP), a field-programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA). The processor 21 may also comprise a main processor and a coprocessor: the main processor, also called the central processing unit (Central Processing Unit, CPU), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may be integrated with a graphics processor (Graphics Processing Unit, GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may also include an artificial intelligence (Artificial Intelligence, AI) processor for handling computing operations related to machine learning.
The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 is used at least to store a computer program 201 which, when loaded and executed by the processor 21, implements the relevant steps of the method for improving cluster data acquisition capability disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may further include an operating system 202, data 203, and the like, and the storage may be transient or permanent. The operating system 202 may include Windows, Unix, Linux, among others. The data 203 may include, but is not limited to, the data involved in the above method for improving cluster data acquisition capability.
In some embodiments, the device for improving the capability of cluster data collection may further include a display screen 22, an input-output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the structure shown in Fig. 6 does not limit the device for improving cluster data acquisition capability, which may include more or fewer components than shown.
The device for improving cluster data acquisition capability provided by this embodiment of the application comprises a memory and a processor. When executing the program stored in the memory, the processor implements the method for improving cluster data acquisition capability described above, and the device has the same effects as that method.
Finally, the present application also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps as described in the method embodiments above.
It will be appreciated that the methods of the above embodiments, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored on a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and performs all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The computer-readable storage medium provided by the application stores a program that implements the above method for improving cluster data acquisition capability, and it has the same effects as that method.
The method, device and medium for improving cluster data acquisition capability provided by the application have been described in detail above. The embodiments in this description are described in a progressive manner, each focusing on its differences from the others, so identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for the relevant details. It should be noted that those skilled in the art can make various improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (9)

1. A method for improving the ability of clustered data acquisition, comprising:
selecting a plurality of Leaders from a plurality of nodes, wherein the number of the Leaders is smaller than the number of the nodes;
when each node is started, controlling each node to select a corresponding Leader from the plurality of Leaders and to report data to the Leader, so that each Leader obtains the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata being data that represents the location to which the actual data was reported; the metadata comprising at least the node name of the current Leader and the node name of the node reporting the data;
controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes in the cluster other than its local nodes;
after said controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes, further comprising:
when data is requested from the current Leader, judging whether the requested data is on the local node of the current Leader;
if yes, acquiring the data from the local node;
and if not, acquiring the data according to the metadata of the rest nodes.
2. The method for improving the capability of clustered data collection of claim 1, wherein selecting a plurality of Leaders from a plurality of nodes comprises:
selecting the plurality of Leaders from the plurality of nodes according to the number of the nodes in the cluster and the storage capacity of each node.
3. The method of claim 1, wherein, when each node is started, controlling each node to select a corresponding Leader from the plurality of Leaders comprises:
when each node is started, acquiring the total number of the nodes of each Leader in the cluster;
and controlling each node to select, from the plurality of Leaders, the two Leaders with the fewest connected nodes.
4. The method for improving the capability of clustered data collection according to claim 3, further comprising, after reporting data to the two Leaders:
storing the data to a database.
5. The method for improving the capability of clustered data collection according to claim 3, further comprising:
when one Leader is down, acquiring the data from the remaining Leader.
6. The method for improving the ability of clustered data acquisition of claim 1 further comprising:
when one Leader is down, controlling the local nodes contained in that Leader to select a corresponding Leader from the remaining Leaders and to report data to that Leader; and entering the step of controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes.
7. An apparatus for improving the ability of clustered data acquisition, comprising:
a selecting module, configured to select a plurality of Leaders from a plurality of nodes, wherein the number of the Leaders is smaller than the number of the nodes;
a first control module, configured to control, when each node is started, each node to select a corresponding Leader from the plurality of Leaders and to report data to the Leader, so that each Leader obtains the data of its corresponding local nodes and generates metadata by compiling statistics on the reported data; the metadata being data that represents the location to which the actual data was reported; the metadata comprising at least the node name of the current Leader and the node name of the node reporting the data;
a second control module, configured to control each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes;
after said controlling each Leader to exchange the metadata so that each Leader obtains the metadata of the remaining nodes, further comprising:
when data is requested from the current Leader, judging whether the requested data is on the local node of the current Leader;
if yes, acquiring the data from the local node;
and if not, acquiring the data according to the metadata of the rest nodes.
8. An apparatus for improving the ability of clustered data acquisition, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method of improving the capability of clustered data acquisition as claimed in any one of claims 1 to 6 when executing said computer program.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the steps of the method of improving the capability of clustered data acquisition as claimed in any one of claims 1 to 6.
CN202111645492.6A 2021-12-29 2021-12-29 Method, device and medium for improving cluster data acquisition capacity Active CN114328604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111645492.6A CN114328604B (en) 2021-12-29 2021-12-29 Method, device and medium for improving cluster data acquisition capacity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111645492.6A CN114328604B (en) 2021-12-29 2021-12-29 Method, device and medium for improving cluster data acquisition capacity

Publications (2)

Publication Number Publication Date
CN114328604A CN114328604A (en) 2022-04-12
CN114328604B true CN114328604B (en) 2024-01-12

Family

ID=81016378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111645492.6A Active CN114328604B (en) 2021-12-29 2021-12-29 Method, device and medium for improving cluster data acquisition capacity

Country Status (1)

Country Link
CN (1) CN114328604B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040183A (en) * 2018-06-27 2018-12-18 郑州云海信息技术有限公司 Node information acquisition method, device, equipment and computer readable storage medium
CN111049928A (en) * 2019-12-24 2020-04-21 北京奇艺世纪科技有限公司 Data synchronization method, system, electronic device and computer readable storage medium

Also Published As

Publication number Publication date
CN114328604A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
US20230400990A1 (en) System and method for performing live partitioning in a data store
CN109660607B (en) Service request distribution method, service request receiving method, service request distribution device, service request receiving device and server cluster
CN110096336B (en) Data monitoring method, device, equipment and medium
CN112860695B (en) Monitoring data query method, device, equipment, storage medium and program product
CN103607424B (en) Server connection method and server system
CN110784498B (en) Personalized data disaster tolerance method and device
CN113806300B (en) Data storage method, system, device, equipment and storage medium
CN108881379B (en) Method and device for data synchronization between server clusters
CN110381136B (en) Data reading method, terminal, server and storage medium
CN112084173A (en) Data migration method and device and storage medium
CN113347238A (en) Message partitioning method, system, device and storage medium based on block chain
CN114328604B (en) Method, device and medium for improving cluster data acquisition capacity
US11727003B2 (en) Scaling query processing resources for efficient utilization and performance
CN112905119B (en) Data write-in control method, device and equipment of distributed storage system
CN113377866A (en) Load balancing method and device for virtualized database proxy service
CN109327520B (en) Method and device for establishing connection between terminal and server node
CN111885159A (en) Data acquisition method and device, electronic equipment and storage medium
CN117992243B (en) Load balancing method and device for middleware and computer equipment
CN114363227B (en) Method and device for determining ECN performance, electronic equipment and storage medium
CN112804335B (en) Data processing method, data processing device, computer readable storage medium and processor
CN113282405B (en) Load adjustment optimization method and terminal
CN117119058B (en) Storage node optimization method in Ceph distributed storage cluster and related equipment
CN110457392B (en) Copy reading and writing method and device
CN110309101B (en) Data management method and Hadoop distributed file system
CN117992243A (en) Load balancing method and device for middleware and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant