CN111901405A - Multi-node monitoring method and device, electronic equipment and storage medium - Google Patents

Multi-node monitoring method and device, electronic equipment and storage medium

Info

Publication number
CN111901405A
Authority
CN
China
Prior art keywords
monitoring
monitoring data
data
node
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010706635.9A
Other languages
Chinese (zh)
Other versions
CN111901405B (en)
Inventor
郭健伟
季统凯
贺忠堂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd
Priority to CN202010706635.9A (patent CN111901405B)
Publication of CN111901405A
Priority to PCT/CN2021/073799 (patent WO2022016845A1)
Application granted
Publication of CN111901405B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/101 Server selection for load balancing based on network conditions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application disclose a multi-node monitoring method and device, electronic equipment and a storage medium. In the technical scheme provided by the embodiments, each host starts collecting monitoring data of the corresponding public resources according to preconfigured public resource monitoring items and monitoring weights; network delay data between each host and the monitoring data storage node is extracted; a reliability value is calculated for the monitoring data collected by each host from the network delay data and the monitoring weight, using a predefined reliability calculation formula; and finally the monitoring data with the highest reliability value is selected and stored in the monitoring data storage node. Because only the monitoring data with the highest reliability value is selected and stored, the collected and stored monitoring data is highly reliable and the system overhead of the cloud computing platform is reduced. Selecting the data before storage also avoids repeated acquisition and storage of the same monitoring data and reduces the occupation of database storage resources.

Description

Multi-node monitoring method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of cloud computing monitoring, in particular to a multi-node monitoring method and device, electronic equipment and a storage medium.
Background
A cloud computing platform, as an emerging commercial computing model, generally consists of a plurality of computing nodes (i.e., hosts), which manage and monitor the resources generated by the platform. When the cloud computing platform is monitored, each host generally acquires data about the resources on that host and stores the data in a corresponding database.
However, because there are multiple hosts, many resources in the cloud computing platform are shared, i.e., different hosts may share the same resource. Consequently, when monitoring data is acquired and stored, every host of the cloud computing platform repeatedly acquires and stores the data of these common resources. Repeated acquisition and storage of the same monitoring data increases the volume of stored monitoring data and occupies the storage resources of the database.
Disclosure of Invention
The embodiment of the application provides a multi-node monitoring method and device, electronic equipment and a storage medium, which can avoid repeated acquisition and storage of monitoring data and reduce occupation of storage resources in a database.
In a first aspect, an embodiment of the present application provides a multi-node monitoring method, including:
each host computer starts to collect monitoring data of corresponding public resources according to a pre-configured public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the collection frequency and the collection time point of the monitoring data;
extracting network delay data of each host and a monitoring data storage node, and calculating a reliability value of each monitoring data acquired by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight;
and screening the monitoring data with the highest reliability value and storing the monitoring data in the monitoring data storage node.
Further, each host starts to collect the monitoring data of the corresponding public resource according to the pre-configured public resource monitoring item and the monitoring weight, respectively, and the method comprises the following steps:
and each host machine acquires initial data of corresponding public resources according to the acquisition frequency and the acquisition time point, calculates an average value based on the acquisition frequency and the data quantity of the initial data, and takes the average value as the monitoring data.
Further, the screening of the monitoring data with the highest reliability value and the storing of the monitoring data in the monitoring data storage node include:
and storing the data value of the monitoring data with the highest reliability value, the data acquisition time, the name of the corresponding public resource monitoring item and the data acquisition object in the monitoring data storage node.
Further, the monitoring data storage node is a time-series database.
Further, after the monitoring data with the highest reliability value is screened and stored in the monitoring data storage node, the method further includes:
and taking the host machine corresponding to the monitoring data with the highest reliability value as a monitoring node, keeping the monitoring node to acquire the monitoring data corresponding to the public resource within a set time period, and stopping the rest host machines from acquiring the monitoring data corresponding to the public resource.
Further, after the host corresponding to the monitoring data with the highest reliability value is used as a monitoring node, the monitoring node is kept to acquire the monitoring data of the corresponding public resource within a set time period, and the rest of the hosts are stopped from acquiring the monitoring data of the corresponding public resource, the method further includes:
and after the set time period has elapsed, recalculating the reliability value of the monitoring data of the corresponding public resource acquired by each host, and re-determining the monitoring node according to the recalculated reliability value.
Further, before each host starts to collect the monitoring data of the corresponding public resource according to the pre-configured public resource monitoring item and the monitoring weight, the method further comprises the following steps:
and each host machine sets the monitoring weight of the corresponding public resource according to user definition or at random.
In a second aspect, an embodiment of the present application provides a multi-node monitoring apparatus, including:
an acquisition module, configured to start, through each host, collecting monitoring data of corresponding public resources according to a pre-configured public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the acquisition frequency and the acquisition time point of the monitoring data;
a calculation module, configured to extract network delay data of each host and a monitoring data storage node, and to calculate the reliability value of each monitoring data acquired by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight;
and a screening module, configured to screen the monitoring data with the highest reliability value and store the monitoring data in the monitoring data storage node.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and one or more processors;
the memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, cause the one or more processors to implement the multi-node monitoring method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions for performing the multi-node monitoring method according to the first aspect when executed by a computer processor.
According to the embodiments of the application, each host starts collecting monitoring data of the corresponding public resources according to the pre-configured public resource monitoring items and monitoring weights; network delay data between each host and the monitoring data storage node is extracted; the reliability value of the monitoring data collected by each host is calculated from the network delay data and the monitoring weight using a predefined reliability calculation formula; and finally the monitoring data with the highest reliability value is selected and stored in the monitoring data storage node. Because only the monitoring data with the highest reliability value is selected and stored, the collected and stored monitoring data is highly reliable and the system overhead of the cloud computing platform is reduced. Selecting the data before storage also avoids repeated acquisition and storage of the same monitoring data and reduces the occupation of database storage resources.
Drawings
Fig. 1 is a flowchart of a multi-node monitoring method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a cloud computing platform according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a multi-node monitoring apparatus according to a second embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The multi-node monitoring method provided by the application calculates a reliability value for the monitoring data collected by each host (i.e., computing node) and, based on those values, selects the monitoring data collected by a single host for storage, thereby avoiding repeated acquisition and storage of monitoring data and reducing the occupation of database storage resources. Compared with a traditional platform, a cloud computing platform generally processes services on a plurality of computing nodes, or hosts, and cloud computing monitoring monitors resources on top of this structure. Cloud computing is characterized by heavy scheduling, unified management of resources and sharing of resources, so monitoring inevitably covers common resources, and different hosts end up acquiring the monitoring data of the same common resource repeatedly. Storing the same data repeatedly easily makes the stored data volume too large; in particular, when the cloud computing platform has a large number of hosts, the data volume held by the monitoring data storage node becomes excessive. When every host acquires the same monitoring data, every host also incurs additional storage overhead for that monitoring data. The multi-node monitoring method of the embodiments is therefore provided to solve the technical problem that existing cloud computing platforms store monitoring data repeatedly and occupy excessive storage resources.
Example one:
Fig. 1 is a flowchart of a multi-node monitoring method according to an embodiment of the present application. The multi-node monitoring method provided in this embodiment may be executed by a multi-node monitoring device, which may be implemented in software and/or hardware and may be formed by one physical entity or by two or more physical entities. Generally, the multi-node monitoring device is a cloud computing platform.
The following description will be given taking the cloud computing platform as an example of a main body for executing the multi-node monitoring method. Referring to fig. 1, the multi-node monitoring method specifically includes:
S110, each host starts to collect monitoring data of corresponding public resources according to a preset public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the collection frequency and the collection time point of the monitoring data.
The cloud computing platform monitors various resources while it runs. Public resources are monitored according to the public resource monitoring items configured in advance on each host. Fig. 2 provides a schematic structural diagram of a cloud computing platform. In the cloud computing platform, each host 11 is connected to the central management node 12, which is used for configuring the relevant parameters. Each host 11 is also connected to the monitoring data storage node 13, and the monitoring data collected by the host 11 is stored in the monitoring data storage node 13. Specifically, before public resources are monitored, public resource monitoring items need to be configured on each host in advance. A public resource monitoring item indicates a public resource that the host needs to monitor, and the host acquires the monitoring data of that public resource accordingly. It should be noted that the public resource monitoring items configured on different hosts may be the same or different, and a single host usually has multiple public resource monitoring items. Generally, a corresponding number of hosts is configured for each public resource monitoring item according to actual monitoring requirements. When the corresponding public resource is monitored, every host configured with that public resource monitoring item acquires monitoring data.
In addition, on each host a monitoring weight is configured for each public resource monitoring item; the monitoring weight represents the system overhead the host pays for monitoring the corresponding public resource. Specifically, each host sets the monitoring weight of the corresponding public resource according to a user definition or at random. The monitoring weights configured for different public resource monitoring items on the same host may be the same or different. The public resource monitoring items and monitoring weights can be configured through the central management node: the user interacts with the central management node to configure the monitoring items and corresponding weights of each host, and the central management node issues the corresponding configuration information to each host according to the user's configuration operation. Alternatively, the user can configure the monitoring items and corresponding monitoring weights directly on a host. It should be noted that if no host has been configured to monitor a certain public resource monitoring item, a host is randomly configured for that item, and its monitoring weight is also set randomly.
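The configuration behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `resolve_config`, the host and item names, and the dictionary layout are all hypothetical, and the random fallback follows one reading of the text (an item with no configured host is assigned to one randomly chosen host with a random weight from 1 to 60).

```python
import random


def resolve_config(hosts, items, user_config, rng=None):
    """Merge user-defined monitoring weights with random fallbacks.

    hosts:       list of host names (hypothetical identifiers)
    items:       list of public resource monitoring item names
    user_config: {host: {item: weight}} as entered by the user
    Returns a new {host: {item: weight}} mapping.
    """
    rng = rng or random.Random()
    # Start from a copy of whatever the user configured for each host.
    config = {host: dict(user_config.get(host, {})) for host in hosts}
    for item in items:
        # If no host monitors this item, pick a host and a weight at random.
        if not any(item in per_host for per_host in config.values()):
            host = rng.choice(hosts)
            config[host][item] = rng.randint(1, 60)
    return config


if __name__ == "__main__":
    resolved = resolve_config(
        hosts=["host-1", "host-2"],
        items=["ceph_rw_rate", "shared_storage_usage"],
        user_config={"host-1": {"ceph_rw_rate": 30}},
    )
    print(resolved)
```

In a real deployment the central management node would presumably push the resolved mapping to each host; that transport is not specified in the text and is left out here.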
Furthermore, each host adjusts the acquisition frequency and the acquisition time points of the monitoring data of the corresponding public resource according to the monitoring weight configured in advance for the corresponding public resource monitoring item. The acquisition frequency is the number of times monitoring data is acquired per unit time, and an acquisition time point is the timestamp at which monitoring data is acquired. Thereafter, the monitoring data of the corresponding public resource is acquired based on the acquisition frequency and the acquisition time points.
In one embodiment, each host acquires initial data of the corresponding public resource according to the acquisition frequency and the acquisition time points, calculates an average value from the acquisition frequency and the data amounts of the initial data, and uses the average value as the monitoring data. For example, when Ceph (a distributed file system) is monitored, the read-write rate of Ceph (i.e., the monitoring data) is obtained. The read-write rate of Ceph can be understood as the average amount of data Ceph reads and writes within 1 minute. After the acquisition frequency and acquisition time points are adjusted according to the monitoring weight, if the monitoring weight is 1, the host collects the read-write amount of Ceph only once within 1 minute and then takes the average; if the monitoring weight is set to 10, the host collects the read-write amount of Ceph 10 times within 1 minute and then averages the values. Using the resulting average as the monitoring data makes the data more general and representative, and therefore more reliable. It should be noted that, in the embodiment of the present application, the monitoring weight for collecting the monitoring data of any public resource within one minute generally takes a value from 1 to 60. For example, if the monitoring weight is 30, the acquisition interval is 60/30 = 2 s, i.e., monitoring data is collected once every 2 s, and the collection time points may be 0 s, 2 s, 4 s, and so on.
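A minimal sketch of this weight-to-schedule arithmetic is shown below. The function names `collection_schedule` and `monitoring_value` are illustrative assumptions, as is the choice to start the schedule at 0 s; the only facts taken from the text are the one-minute window, the weight range of 1 to 60, and the averaging of the collected samples.

```python
from statistics import mean


def collection_schedule(weight, window_s=60):
    """Return the collection time points (in seconds) inside one window.

    A monitoring weight of w means w samples per window, i.e. one
    sample every window_s / w seconds.
    """
    if not 1 <= weight <= 60:
        raise ValueError("monitoring weight is expected to be in 1..60")
    interval = window_s / weight
    return [i * interval for i in range(weight)]


def monitoring_value(samples):
    """Average the raw samples of one window into a single monitoring value."""
    return mean(samples)


if __name__ == "__main__":
    points = collection_schedule(30)
    print(points[:3])                      # first three time points for weight 30
    print(monitoring_value([10, 20, 30]))  # average of three raw samples
```

With a weight of 30 the interval works out to 2 s, matching the example in the text; with a weight of 1 the single sample is its own average.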
S120, extracting network delay data of each host and a monitoring data storage node, and calculating the reliability value of each monitoring data acquired by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight.
Because multiple hosts collect monitoring data for a single public resource, only the monitoring data with the highest reliability is selected and stored for that resource, which avoids repeated storage of the monitoring data and saves system overhead. The reliability of the monitoring data acquired by each host is determined from the network connectivity between that host and the monitoring data storage node and from the host's monitoring weight for the public resource. In the embodiment of the application, network delay data is used to represent the network connectivity between each host and the monitoring data storage node. It will be appreciated that the better the network connectivity between a host and the monitoring data storage node, the smaller the value of its network delay data.
Specifically, a predefined reliability calculation formula is used to calculate the reliability value of the monitoring data acquired by each host, and the reliability value expresses the reliability of that monitoring data. It will be appreciated that the higher the reliability value, the higher the reliability of the monitoring data. The reliability calculation formula is as follows:
h=(w*0.3*0.01)+(100/g*0.7)*100
where h is the reliability value, w is the monitoring weight (taking a value from 1 to 60), and g is the network delay data. The coefficients 0.3 and 0.7 are the proportions in which the monitoring weight and the network connectivity, respectively, influence the reliability value; they are defined according to measured data. Network connectivity is given the larger share, 0.7, because the network connectivity between a host and the monitoring data storage node reflects the source of system overhead better than the monitoring weight does, whereas the monitoring weight is set manually or randomly and is therefore less objective. Further, because network delay is inversely proportional to network connectivity, the connectivity part of the reliability value is calculated as "(100/g*0.7)*100", unlike the way the monitoring weight part is calculated in the formula. It should be noted that this reliability calculation formula is only one way of calculating the reliability value of the monitoring data in the embodiment of the present application; in practical applications, other calculation formulas may be predefined according to the calculation requirements. In addition, the unit of the network delay data g is the millisecond; when the network delay data is extracted, only its numerical value is substituted into the formula, without the unit.
Further, based on the reliability calculation formula, network delay data of each host and the monitoring data storage node and monitoring weights corresponding to the monitoring data are extracted and substituted into the reliability calculation formula, and a reliability value corresponding to the monitoring data acquired by each host is calculated.
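The formula above translates directly into code. The sketch below assumes only what the text states (w in 1..60, g in milliseconds as a bare number); the function name `reliability` and the input validation are illustrative additions.

```python
def reliability(weight, delay_ms):
    """Reliability value h = (w*0.3*0.01) + (100/g*0.7)*100.

    weight:   monitoring weight w, expected to be in 1..60
    delay_ms: network delay g between the host and the monitoring data
              storage node, as a plain number of milliseconds
    """
    if not 1 <= weight <= 60:
        raise ValueError("monitoring weight is expected to be in 1..60")
    if delay_ms <= 0:
        raise ValueError("network delay must be a positive number")
    weight_part = weight * 0.3 * 0.01
    connectivity_part = (100 / delay_ms * 0.7) * 100
    return weight_part + connectivity_part


if __name__ == "__main__":
    # Two hypothetical hosts with the same weight but different delays.
    for name, w, g in [("host-1", 30, 5), ("host-2", 30, 20)]:
        print(name, reliability(w, g))
```

With the constants as written, the weight term is at most 0.18 (w = 60), while the connectivity term equals 7000/g, so for millisecond-scale delays the network delay dominates the result and the weight mostly separates hosts whose delays are equal.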
S130, screening the monitoring data with the highest reliability value and storing the monitoring data in the monitoring data storage node.
Finally, the reliability values calculated for the monitoring data of the hosts using the reliability calculation formula are compared, and the monitoring data with the highest reliability value is selected and stored. When that monitoring data is stored, its data value, its data acquisition time, the name of the corresponding public resource monitoring item and the data acquisition object are stored in the monitoring data storage node.
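The screening step can be sketched as a simple maximum over per-host samples. The `Sample` type, its field names and the record layout below are assumptions made for illustration; the text only specifies which four pieces of information are stored.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    host: str            # host that collected the data (hypothetical name)
    item: str            # name of the public resource monitoring item
    target: str          # data acquisition object
    value: float         # data value of the monitoring data
    collected_at: float  # data acquisition time (epoch seconds here)
    reliability: float   # reliability value computed for this sample


def select_most_reliable(samples):
    """Return the sample with the highest reliability value."""
    return max(samples, key=lambda s: s.reliability)


def to_record(sample):
    """Keep only the four fields the text says are stored."""
    return {
        "value": sample.value,
        "time": sample.collected_at,
        "item": sample.item,
        "object": sample.target,
    }


if __name__ == "__main__":
    samples = [
        Sample("host-1", "ceph_rw_rate", "ceph", 118.0, 1600000000.0, 1400.09),
        Sample("host-2", "ceph_rw_rate", "ceph", 121.0, 1600000000.0, 350.09),
    ]
    print(to_record(select_most_reliable(samples)))
```

If two hosts tie, `max` returns the first one encountered; the text does not say how ties should be broken, so that behaviour is an arbitrary choice here.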
Further, in the embodiment of the present application, the monitoring data storage node may be a relational database or a time-series database. In terms of processing efficiency, storing the monitoring data is the most time-consuming overhead. If a relational database is used, obtaining the time consumes part of the program's resources, and a relational database cannot cope with processing a large amount of data. The time-series database InfluxDB avoids the resources spent on obtaining the time: it automatically records the time of inserted data and can process a large amount of data simultaneously. Therefore, the monitoring data storage node of the embodiment of the present application is preferably a time-series database.
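As a rough illustration of how a stored record might look in InfluxDB, the sketch below formats one point in InfluxDB line protocol without a timestamp, so that the server assigns the time on insertion. The measurement name, tag keys and the helper names are hypothetical and not taken from the patent, and only the basic escaping rules for tags and measurement names are handled.

```python
def _escape_tag(text):
    """Escape commas, equals signs and spaces in tag keys and values."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def _escape_measurement(text):
    """Escape commas and spaces in the measurement name."""
    for ch in (",", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, value):
    """Build 'measurement,tag=... value=<float>' with no timestamp.

    Omitting the timestamp lets the database stamp the point itself.
    """
    tag_part = ",".join(
        f"{_escape_tag(k)}={_escape_tag(v)}" for k, v in sorted(tags.items())
    )
    return f"{_escape_measurement(measurement)},{tag_part} value={float(value)}"


if __name__ == "__main__":
    line = to_line_protocol(
        "ceph_rw_rate", {"host": "host-1", "object": "ceph pool"}, 120.5
    )
    print(line)
```

Sending the line to a server (for example over the HTTP write endpoint or a client library) is deliberately left out, since the text does not describe how the hosts talk to the storage node.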
Further, after the monitoring data with the highest reliability value is determined, the host corresponding to that monitoring data is used as the monitoring node; within a set time period, the monitoring node continues to acquire the monitoring data of the corresponding public resource, and the remaining hosts stop acquiring it. After the set time period has elapsed, the reliability value of the monitoring data of the corresponding public resource acquired by each host is recalculated, and the monitoring node is re-determined according to the recalculated reliability values.
For example, take 10 minutes as the set time period, so that public resources are monitored in cycles of 10 minutes. When the first monitoring data of a cycle is acquired (for example, the monitoring data of the first minute), every host acquires monitoring data according to its public resource monitoring items and corresponding monitoring weights. Then, for each public resource, the reliability values of the monitoring data acquired by the hosts are compared, and the monitoring data with the highest reliability value is determined and stored in the monitoring data storage node. The host corresponding to that monitoring data is taken as the monitoring node, and for the rest of the cycle the monitoring node alone collects and stores the monitoring data of that public resource. The monitoring node is re-determined in the same way every 10 minutes; the monitoring node collects and stores the monitoring data while the other hosts suspend collection of the monitoring data of the corresponding public resource, which saves system overhead on the cloud computing platform.
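The 10-minute cycle can be sketched as a small simulation. Everything here beyond the formula and the cycle length is an assumption for illustration: the function names, the one-minute granularity, and the idea of passing a callable that returns per-host delays for a given minute.

```python
def reliability(weight, delay_ms):
    """Reliability formula from the text: (w*0.3*0.01) + (100/g*0.7)*100."""
    return (weight * 0.3 * 0.01) + (100 / delay_ms * 0.7) * 100


def elect_monitor(weights, delays_ms):
    """Pick the host whose monitoring data has the highest reliability value."""
    scores = {h: reliability(weights[h], delays_ms[h]) for h in weights}
    return max(scores, key=scores.get)


def simulate(minutes, weights, delays_at, period=10):
    """Return (minute, monitoring node, collecting hosts) for each minute.

    delays_at(minute) -> {host: delay in ms}; a hypothetical probe.
    In the first minute of each cycle every host collects and a monitoring
    node is elected; for the rest of the cycle only that node collects.
    """
    monitor = None
    timeline = []
    for minute in range(minutes):
        if minute % period == 0:
            monitor = elect_monitor(weights, delays_at(minute))
            collecting = sorted(weights)
        else:
            collecting = [monitor]
        timeline.append((minute, monitor, collecting))
    return timeline


if __name__ == "__main__":
    weights = {"host-1": 30, "host-2": 30}

    def delays_at(minute):
        # host-1 is closer during the first cycle, host-2 during the second.
        if minute < 10:
            return {"host-1": 5, "host-2": 20}
        return {"host-1": 20, "host-2": 5}

    for row in simulate(12, weights, delays_at):
        print(row)
```

Only the elected host's data would be written to the storage node in each minute; the write itself is omitted to keep the sketch focused on the election cycle.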
In summary, each host collects the monitoring data of the corresponding public resources according to the pre-configured public resource monitoring items and monitoring weights; the network delay data between each host and the monitoring data storage node is extracted; the reliability value of the monitoring data collected by each host is calculated from the network delay data and the monitoring weight using a predefined reliability calculation formula; and finally the monitoring data with the highest reliability value is selected and stored in the monitoring data storage node. Because only the monitoring data with the highest reliability value is selected and stored, the collected and stored monitoring data is highly reliable and the system overhead of the cloud computing platform is reduced. Selecting the data before storage also avoids repeated acquisition and storage of the same monitoring data and reduces the occupation of database storage resources.
Example two:
On the basis of the foregoing embodiments, fig. 3 is a schematic structural diagram of a multi-node monitoring apparatus according to a second embodiment of the present application. Referring to fig. 3, the multi-node monitoring apparatus provided in this embodiment specifically includes: an acquisition module 21, a calculation module 22 and a screening module 23.
The acquisition module 21 is configured to cause each host to start collecting monitoring data of its corresponding public resource according to a pre-configured public resource monitoring item and a monitoring weight, where the monitoring weight is used to adjust the collection frequency and the collection time point of the monitoring data;
the calculation module 22 is configured to extract network delay data between each host and a monitoring data storage node, and to calculate a reliability value of the monitoring data collected by each host according to the network delay data and the monitoring weight by using a predefined reliability calculation formula;
the screening module 23 is configured to screen out the monitoring data with the highest reliability value and store it in the monitoring data storage node.
As in the first embodiment, the hosts collect monitoring data of the corresponding public resources according to the pre-configured public resource monitoring items and monitoring weights, the reliability value of the monitoring data collected by each host is calculated from the network delay data and the monitoring weights by using the predefined reliability calculation formula, and only the monitoring data with the highest reliability value is stored in the monitoring data storage node. This keeps the collection and storage of monitoring data highly reliable, reduces the system overhead of the cloud computing platform, avoids repeated collection and storage of monitoring data, and reduces the storage resources occupied in the database.
The multi-node monitoring device provided by the second embodiment of the present application can be used for executing the multi-node monitoring method provided by the first embodiment of the present application, and has corresponding functions and beneficial effects.
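The division of work among the three modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: the class and field names are hypothetical, the averaging of raw samples is simplified to a plain arithmetic mean, the reliability formula is an assumed stand-in for the unspecified "predefined reliability calculation formula", and a Python list stands in for the monitoring data storage node.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    value: float         # data value of the monitoring data
    collected_at: float  # data acquisition time (epoch seconds)
    item: str            # name of the public resource monitoring item
    target: str          # data acquisition object
    host: str            # host that collected the data


class AcquisitionModule:
    def collect(self, host, item, target, raw_values, collected_at) -> Record:
        # Average the raw samples taken by this host (simplified to a plain mean).
        return Record(mean(raw_values), collected_at, item, target, host)


class CalculationModule:
    def __init__(self, weights, delays_ms):
        self.weights = weights      # monitoring weight per host
        self.delays_ms = delays_ms  # network delay to the storage node per host

    def score(self, record: Record) -> float:
        # ASSUMED formula: the source does not state the real one.
        return self.weights[record.host] / (1.0 + self.delays_ms[record.host])


class ScreeningModule:
    def __init__(self, store: list):
        self.store = store  # stand-in for the monitoring data storage node

    def keep_best(self, records, score) -> Record:
        # Store only the record with the highest reliability value.
        best = max(records, key=score)
        self.store.append(best)
        return best


acq = AcquisitionModule()
calc = CalculationModule(
    weights={"host-a": 0.6, "host-b": 0.9},
    delays_ms={"host-a": 2.0, "host-b": 5.0},
)
store = []
records = [
    acq.collect("host-a", "disk_usage", "shared-volume-1", [40.0, 42.0, 44.0], 1000.0),
    acq.collect("host-b", "disk_usage", "shared-volume-1", [41.0, 43.0], 1000.0),
]
best = ScreeningModule(store).keep_best(records, calc.score)
print(best)
```

Passing `calc.score` into the screening module keeps the scoring rule replaceable, which matters here because the actual formula is not given in this passage.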
Example three:
An embodiment of the present application provides an electronic device. Referring to fig. 4, the electronic device includes: a processor 31, a memory 32, a communication module 33, an input device 34, and an output device 35. The number of processors in the electronic device may be one or more, and the number of memories in the electronic device may be one or more. The processor, memory, communication module, input device, and output device of the electronic device may be connected by a bus or by other means.
The memory 32 is a computer-readable storage medium that can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the multi-node monitoring method according to any embodiment of the present application (for example, the acquisition module, the calculation module, and the screening module in the multi-node monitoring apparatus). The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the device, and the like. Further, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 33 is used for data transmission.
The processor 31 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory, thereby implementing the multi-node monitoring method described above.
The input device 34 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 35 may include a display device such as a display screen.
The electronic device provided above can be used to execute the multi-node monitoring method provided in the first embodiment, and has corresponding functions and advantages.
Example four:
Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a multi-node monitoring method, the multi-node monitoring method including: each host starts to collect monitoring data of a corresponding public resource according to a pre-configured public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the collection frequency and the collection time point of the monitoring data; extracting network delay data between each host and a monitoring data storage node, and calculating a reliability value of the monitoring data collected by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight; and screening out the monitoring data with the highest reliability value and storing it in the monitoring data storage node.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media residing in different locations, e.g., in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the operations of the multi-node monitoring method described above, and may also perform related operations in the multi-node monitoring method provided in any embodiment of the present application.
The multi-node monitoring apparatus, the storage medium, and the electronic device provided in the above embodiments may execute the multi-node monitoring method provided in any embodiment of the present application. For technical details not described in detail in the above embodiments, reference may be made to the multi-node monitoring method provided in any embodiment of the present application.
The foregoing describes only the preferred embodiments of the present application and the technical principles employed. The present application is not limited to the particular embodiments described herein, and various obvious changes, rearrangements, and substitutions will be apparent to those skilled in the art without departing from the scope of protection of the present application. Therefore, although the present application has been described in some detail with reference to the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present application is determined by the scope of the claims.

Claims (10)

1. A multi-node monitoring method, comprising:
each host computer starts to collect monitoring data of corresponding public resources according to a pre-configured public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the collection frequency and the collection time point of the monitoring data;
extracting network delay data of each host and a monitoring data storage node, and calculating a reliability value of each monitoring data acquired by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight;
and screening the monitoring data with the highest reliability value and storing the monitoring data in the monitoring data storage node.
2. The multi-node monitoring method according to claim 1, wherein each host starts collecting the monitoring data of the corresponding public resource according to the pre-configured public resource monitoring item and the monitoring weight, respectively, and comprises:
and each host machine acquires initial data of corresponding public resources according to the acquisition frequency and the acquisition time point, calculates an average value based on the acquisition frequency and the data quantity of the initial data, and takes the average value as the monitoring data.
3. The multi-node monitoring method according to claim 1, wherein the step of screening the monitoring data with the highest reliability value and storing the monitoring data in the monitoring data storage node comprises:
and storing the data value of the monitoring data with the highest reliability value, the data acquisition time, the name of the corresponding public resource monitoring item and the data acquisition object in the monitoring data storage node.
4. The multi-node monitoring method of claim 3, wherein the monitoring data storage node is a time-sequential database.
5. The multi-node monitoring method according to claim 1, further comprising, after the monitoring data with the highest reliability value is screened and stored in the monitoring data storage node:
and taking the host machine corresponding to the monitoring data with the highest reliability value as a monitoring node, keeping the monitoring node to acquire the monitoring data corresponding to the public resource within a set time period, and stopping the rest host machines from acquiring the monitoring data corresponding to the public resource.
6. The multi-node monitoring method according to claim 5, wherein after the host corresponding to the monitoring data with the highest reliability value is used as a monitoring node, the monitoring node keeps acquiring the monitoring data of the corresponding public resource within a set period of time, and the rest of the hosts are stopped from acquiring the monitoring data of the corresponding public resource, the method further comprises:
and after the set time period has elapsed, recalculating the reliability value of the monitoring data of the corresponding public resource acquired by each host, and re-determining the monitoring node according to the recalculated reliability value.
7. The multi-node monitoring method according to claim 1, wherein before each host starts collecting the monitoring data of the corresponding public resource according to the pre-configured public resource monitoring item and the monitoring weight, the method further comprises:
and each host machine sets the monitoring weight of the corresponding public resource according to user definition or at random.
8. A multi-node monitoring apparatus, comprising:
an acquisition module, configured to cause each host to start collecting monitoring data of a corresponding public resource according to a pre-configured public resource monitoring item and a monitoring weight, wherein the monitoring weight is used for adjusting the collection frequency and the collection time point of the monitoring data;
a calculation module, configured to extract network delay data between each host and a monitoring data storage node, and to calculate the reliability value of the monitoring data collected by each host by using a predefined reliability calculation formula according to the network delay data and the monitoring weight;
and a screening module, configured to screen out the monitoring data with the highest reliability value and store it in the monitoring data storage node.
9. An electronic device, comprising:
a memory and one or more processors;
the memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-node monitoring method of any of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the multi-node monitoring method of any of claims 1-7 when executed by a computer processor.
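
The election and periodic re-election of a monitoring node described in claims 5 and 6 can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, each entry in `delay_rounds` stands for the network delays measured at the start of one set time period, and the scoring formula is an assumption because the claims do not state the predefined reliability calculation formula.

```python
def elect_monitoring_node(weights, delays_ms):
    # ASSUMED formula: weight / (1 + delay); the claims do not give the real one.
    scores = {host: weights[host] / (1.0 + delays_ms[host]) for host in weights}
    # The host with the highest reliability value becomes the monitoring node.
    return max(scores, key=scores.get)


def run_rounds(weights, delay_rounds):
    """Return the monitoring node chosen for each set time period."""
    chosen = []
    for delays_ms in delay_rounds:
        node = elect_monitoring_node(weights, delays_ms)
        # During this period only `node` keeps collecting; the other hosts stop.
        chosen.append(node)
    return chosen


weights = {"host-a": 0.7, "host-b": 0.7}
delay_rounds = [
    {"host-a": 1.0, "host-b": 6.0},  # first period: host-a is closer
    {"host-a": 8.0, "host-b": 2.0},  # second period: host-a has degraded
]
nodes = run_rounds(weights, delay_rounds)
print(nodes)
```

Re-scoring at the start of every period is what lets the monitoring role move to another host when the current node's network delay grows.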
CN202010706635.9A 2020-07-21 2020-07-21 Multi-node monitoring method and device, electronic equipment and storage medium Active CN111901405B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010706635.9A CN111901405B (en) 2020-07-21 2020-07-21 Multi-node monitoring method and device, electronic equipment and storage medium
PCT/CN2021/073799 WO2022016845A1 (en) 2020-07-21 2021-01-26 Multi-node monitoring method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010706635.9A CN111901405B (en) 2020-07-21 2020-07-21 Multi-node monitoring method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111901405A (en) 2020-11-06
CN111901405B (en) 2023-05-05

Family

ID=73190386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010706635.9A Active CN111901405B (en) 2020-07-21 2020-07-21 Multi-node monitoring method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111901405B (en)
WO (1) WO2022016845A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022016845A1 (en) * 2020-07-21 2022-01-27 国云科技股份有限公司 Multi-node monitoring method and apparatus, electronic device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115473834B (en) * 2022-09-14 2024-04-02 中国电信股份有限公司 Monitoring task scheduling method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102761454A (en) * 2011-04-28 2012-10-31 中兴通讯股份有限公司 Method and system for monitoring internet of things
US20170155560A1 (en) * 2015-12-01 2017-06-01 Quanta Computer Inc. Management systems for managing resources of servers and management methods thereof
CN109062699A (en) * 2018-08-15 2018-12-21 郑州云海信息技术有限公司 A kind of resource monitoring method, device, server and storage medium
CN109688106A (en) * 2018-11-19 2019-04-26 中国科学院信息工程研究所 A kind of data collaborative acquisition method and system
CN109714402A (en) * 2018-12-12 2019-05-03 胡书恺 A kind of redundant data acquisition system and its operation application method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104917836A (en) * 2015-06-10 2015-09-16 北京奇虎科技有限公司 Method and device for monitoring and analyzing availability of computing equipment based on cluster
CN107844402A (en) * 2017-11-17 2018-03-27 北京联想超融合科技有限公司 A kind of resource monitoring method, device and terminal based on super fusion storage system
CN111258870A (en) * 2020-01-17 2020-06-09 中国建设银行股份有限公司 Performance analysis method, device, equipment and storage medium of distributed storage system
CN111901405B (en) * 2020-07-21 2023-05-05 国云科技股份有限公司 Multi-node monitoring method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022016845A1 (en) 2022-01-27
CN111901405B (en) 2023-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant