CN112104493A - Acquisition and analysis system for low-delay host resource monitoring in cluster environment - Google Patents

Info

Publication number
CN112104493A
CN112104493A CN202010929325.3A CN202010929325A CN112104493A CN 112104493 A CN112104493 A CN 112104493A CN 202010929325 A CN202010929325 A CN 202010929325A CN 112104493 A CN112104493 A CN 112104493A
Authority
CN
China
Prior art keywords
data
monitoring
host
network
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010929325.3A
Other languages
Chinese (zh)
Inventor
吴晓勇
晏东
白俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ghostcloud Technology Co ltd
Original Assignee
Chengdu Ghostcloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ghostcloud Technology Co ltd
Priority to CN202010929325.3A
Publication of CN112104493A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0695 Management of faults, events, alarms or notifications the faulty arrangement being the maintenance, administration or management system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays

Abstract

The invention discloses an acquisition and analysis system for low-delay host resource monitoring in a cluster environment, and belongs to the field of clustered host resource monitoring. The monitoring system comprises a plurality of host monitoring modules, a monitoring center module and a data storage module, wherein the host monitoring modules are all connected with the monitoring center module, and the data storage module is connected with the monitoring center module. The host monitoring modules are mainly responsible for collecting host resource usage and sending the data to the monitoring center module; the monitoring center receives the data from the host monitoring modules, collects and aggregates it, and calls the data storage module interface to store the monitoring data. The invention collects monitoring data on all aspects of the host by directly parsing the files maintained by the operating system, so that monitoring data is acquired quickly and efficiently; the packaged monitoring data can be stored in the data storage module and queried through the monitoring center module. The system is suitable for monitoring hosts in a battlefield environment.

Description

Acquisition and analysis system for low-delay host resource monitoring in cluster environment
Technical Field
The invention relates to the field of clustered host resource monitoring, in particular to a low-delay host resource monitoring acquisition and analysis system in a clustered environment.
Background
With the development of internet technology, systems keep growing in scale and task volume, and the computing power of a single host can no longer meet demand. Clustering has emerged in response: multiple hosts operate in parallel, which improves the overall computing power of the system and reduces cost. A cluster also improves the scalability of the system, since hosts can be added or removed dynamically to match the overall computing performance the system requires. Reliability is greatly improved as well: when a single host fails, the other hosts can still provide services externally, which preserves the availability of the whole system and reduces the loss caused by the failure.
In particular, when clustering is applied to battlefield monitoring, the real-time performance and accuracy of the monitoring data for each host's resources in the cluster environment must be guaranteed, but no corresponding measures currently exist to guarantee them.
Disclosure of Invention
The invention aims to provide a low-delay host resource monitoring acquisition and analysis system in a cluster environment, which can ensure the real-time performance and accuracy of monitoring data of each host resource in the cluster environment.
To solve the above technical problem, the invention adopts the following technical scheme: the acquisition and analysis system for low-delay host resource monitoring in a cluster environment comprises a plurality of host monitoring modules, a monitoring center module and a data storage module, wherein the host monitoring modules are all connected with the monitoring center module, and the data storage module is connected with the monitoring center module;
each of the plurality of host monitoring modules is used for establishing and maintaining a long-lived connection with the monitoring center module after startup; the host monitoring module collects the data to be monitored at a set time interval using a resource collection method and caches the currently collected data locally; at the next collection, it computes the difference between the two collected data sets according to a resource usage calculation method to obtain resource usage data, packages the resource usage data, and sends it to the monitoring center module through a remote call;
the monitoring center module is used for receiving data from each host monitoring module, filtering out invalid data through screening, and storing the screened data to the data storage module;
the data storage module comprises a key value database and a relational database, wherein the key value database is used for storing the latest screened monitoring data of each host, and the relational database is used for storing each screened monitoring data of each host;
the monitoring center module comprises an interface for querying monitoring data, and the interface supports the following query modes: when the current real-time state of each host needs to be queried, the monitoring center module acquires the latest data from the key value database and feeds it back to the user; and
when the state of a host over a certain time period needs to be queried, the monitoring center module acquires the data for that time period from the relational database, aggregates it through an algorithm, and feeds it back to the user.
Further, the monitoring center module saves the monitoring data by calling a data storage module interface.
Further, the data to be monitored comprises CPU, memory, disk and network resource use condition data.
Further, the collection process of the CPU usage data is as follows: a host monitoring module program is installed on the host; the program reads the /proc/stat file and parses the first line to obtain the user, iowait, irq, softirq, nice, system, idle, steal, guest and guest_nice counters; the total CPU time all is recorded as the sum of the user, iowait, irq, softirq, nice, system, steal, guest and guest_nice data items, and the total CPU use time use is recorded as all minus the idle data item; to compute the CPU usage rate over a time period, all and use are each obtained once before and once after the period and their differences are calculated; with the first sample recorded as all_start and use_start and the second as all_end and use_end, the CPU usage rate useRate over the period is: useRate = (use_end - use_start) / (all_end - all_start) × 100%.
Further, the collection process of the memory usage data is as follows: the host monitoring module program reads the /proc/meminfo file and selects the MemTotal and MemAvailable entries; the total memory is recorded as the MemTotal data item, the available memory as the MemAvailable data item, and the used memory Usedmemory as MemTotal minus MemAvailable.
Further, the collection process of the disk usage data is as follows: the host monitoring module program executes the command df -k -t ext2 -t ext3 -t ext4 -t vfat -t xfs -t nfs -t nfs4 to obtain the Used and Available data items; for each file system, the amount used is recorded as Used, the amount remaining is recorded as Available, and the total is the sum of Used and Available.
Further, the collection process of the network resource usage data is as follows: the host monitoring module program reads the /proc/net/dev file and parses the data of the monitored network card to obtain, at the current time point, the total outbound traffic, recorded as networkOut, and the total inbound traffic, recorded as networkIn; to compute the network speed, networkOut and networkIn are each obtained once before and once after a time period and their differences are calculated; with the first sample recorded as networkOut_start and networkIn_start, the second as networkOut_end and networkIn_end, and the interval between the two samples as t, the outbound speed is: networkOut_speed = (networkOut_end - networkOut_start) / t, and the inbound speed is: networkIn_speed = (networkIn_end - networkIn_start) / t.
The advantage of the invention is that, with this acquisition and analysis system for low-delay host resource monitoring in a cluster environment, monitoring data on all aspects of a host can be acquired by directly parsing the files maintained by the operating system, so that monitoring data is obtained quickly and efficiently; the data is then compressed and sent to the monitoring center for summarizing and storage. Time consumption is reduced in both the collection and the transmission of monitoring data, so low-delay host resource monitoring can be achieved.
In addition, the invention can reduce the delay of data acquisition and transmission on the basis of the existing host resource monitoring technology, ensure the real-time performance and accuracy of host resource monitoring data in the cluster environment, and improve the real-time performance of monitoring the load condition of the combat equipment when being applied to the battlefield environment.
Drawings
Fig. 1 is a block diagram of a low-latency collection and analysis system for monitoring host resources in a cluster environment according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Examples
The embodiment of the invention provides a low-delay host resource monitoring acquisition and analysis system in a cluster environment, and the structural block diagram of the system is shown in figure 1, wherein the system comprises a plurality of host monitoring modules, a monitoring center module and a data storage module, the plurality of host monitoring modules are all connected with the monitoring center module, and the data storage module is connected with the monitoring center module.
In the system, each of the plurality of host monitoring modules is used for establishing and maintaining a long-lived connection with the monitoring center module after startup; the host monitoring module collects the data to be monitored at a set time interval using a resource collection method and caches the currently collected data locally; at the next collection, it computes the difference between the two collected data sets according to a resource usage calculation method to obtain resource usage data, packages the resource usage data, and sends it to the monitoring center module through a remote call. The monitoring center module is used for receiving data from each host monitoring module, filtering out invalid data through screening, and storing the screened data to the data storage module. The data storage module comprises a key value database and a relational database, wherein the key value database stores the latest screened monitoring data of each host, and the relational database stores every screened monitoring record of each host, so that historical data can be conveniently provided. The monitoring center module comprises an interface for querying monitoring data, and the interface supports the following query modes: when the current real-time state of each host needs to be queried, the monitoring center module acquires the latest data from the key value database and feeds it back to the user; and when the state of a host over a certain time period needs to be queried, the monitoring center module acquires the data for that time period from the relational database, aggregates it through an algorithm, and feeds it back to the user.
In this embodiment, in order to realize fast storage of data, the monitoring center module may store the monitoring data by calling the data storage module interface; in practical application, the data to be monitored generally includes CPU, memory, disk and network resource usage data.
Specifically, the collection process of the CPU usage data is as follows: a host monitoring module program is installed on the host; the program reads the /proc/stat file and parses the first line to obtain the user, iowait, irq, softirq, nice, system, idle, steal, guest and guest_nice counters; the total CPU time all is recorded as the sum of the user, iowait, irq, softirq, nice, system, steal, guest and guest_nice data items, and the total CPU use time use is recorded as all minus the idle data item; to compute the CPU usage rate over a time period, all and use are each obtained once before and once after the period and their differences are calculated; with the first sample recorded as all_start and use_start and the second as all_end and use_end, the CPU usage rate useRate over the period is: useRate = (use_end - use_start) / (all_end - all_start) × 100%.
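The CPU calculation above can be sketched in Python as follows. This is an illustrative sketch rather than the applicant's implementation: the function names and the sample counter values are hypothetical, and the sketch assumes that idle is counted in the total time all, since subtracting idle from all to obtain use is only meaningful in that case.

```python
# Field order of the aggregate "cpu" line in /proc/stat on current Linux kernels.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(stat_text):
    """Return the cumulative (all, use) counters from /proc/stat text."""
    first = stat_text.splitlines()[0].split()
    if first[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line first")
    values = dict(zip(FIELDS, (int(v) for v in first[1:])))
    total = sum(values.values())   # assumption: idle is counted in the total
    used = total - values["idle"]  # use = all - idle
    return total, used


def use_rate(start, end):
    """CPU usage in percent between two (all, use) samples."""
    all_start, use_start = start
    all_end, use_end = end
    delta_all = all_end - all_start
    if delta_all <= 0:             # no ticks elapsed between the samples
        return 0.0
    return (use_end - use_start) / delta_all * 100.0


# Illustrative samples; on a real host the text would come from
# open("/proc/stat").read() at the start and end of the interval.
START = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
END = "cpu  160 0 70 900 70 0 0 0 0 0\ncpu0 160 0 70 900 70 0 0 0 0 0\n"
print(use_rate(parse_cpu_line(START), parse_cpu_line(END)))
```

Because the counters in /proc/stat are cumulative since boot, only the difference between two samples says anything about the interval between them.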
The collection process of the memory usage data is as follows: the host monitoring module program reads the /proc/meminfo file and selects the MemTotal and MemAvailable entries; the total memory is recorded as the MemTotal data item, the available memory as the MemAvailable data item, and the used memory Usedmemory as MemTotal minus MemAvailable.
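A minimal Python sketch of the memory calculation is given below. The function name, the result keys and the sample text are hypothetical; only the MemTotal and MemAvailable entries and the subtraction come from the description.

```python
def parse_meminfo(text):
    """Extract total, available and used memory (in kB) from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # first token is the numeric value
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return {
        "total_kb": total,
        "available_kb": available,
        "used_kb": total - available,  # used memory = MemTotal - MemAvailable
    }


# Illustrative sample; on a real host the text would come from
# open("/proc/meminfo").read().
SAMPLE = (
    "MemTotal:       16384000 kB\n"
    "MemFree:         2048000 kB\n"
    "MemAvailable:    8192000 kB\n"
)
print(parse_meminfo(SAMPLE))
```

Unlike the CPU and network figures, this value is a snapshot, so a single read is enough and no difference between two samples is needed.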
The collection process of the disk usage data is as follows: the host monitoring module program executes the command df -k -t ext2 -t ext3 -t ext4 -t vfat -t xfs -t nfs -t nfs4 to obtain the Used and Available data items; for each file system, the amount used is recorded as Used, the amount remaining is recorded as Available, and the total is the sum of Used and Available.
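The disk collection step might look like the following Python sketch. It assumes the df invocation filters by file system type with the lowercase -t option and that the output uses the usual six-column layout (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on); the function names, result keys and sample output are hypothetical.

```python
import subprocess

# Assumed form of the df invocation: 1K blocks, restricted to the listed types.
DF_COMMAND = ["df", "-k"]
for fs_type in ("ext2", "ext3", "ext4", "vfat", "xfs", "nfs", "nfs4"):
    DF_COMMAND += ["-t", fs_type]


def parse_df(output):
    """Turn df -k output into one record per file system (sizes in kB)."""
    rows = []
    for line in output.splitlines()[1:]:        # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue                            # ignore wrapped or malformed rows
        used, available = int(fields[2]), int(fields[3])
        rows.append({
            "filesystem": fields[0],
            "mount": " ".join(fields[5:]),
            "used_kb": used,
            "available_kb": available,
            "total_kb": used + available,       # total = Used + Available
        })
    return rows


def collect_disk_usage():
    """Run df and parse its output; returns [] if df prints nothing."""
    result = subprocess.run(DF_COMMAND, capture_output=True, text=True, check=False)
    return parse_df(result.stdout)


# Illustrative output instead of a live df call.
SAMPLE = (
    "Filesystem     1K-blocks     Used Available Use% Mounted on\n"
    "/dev/sda1       41152736 10000000  29000000  26% /\n"
)
print(parse_df(SAMPLE))
```

Note that Used plus Available is usually somewhat smaller than the 1K-blocks column, because many file systems reserve blocks that df counts in neither figure.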
The collection process of the network resource usage data is as follows: the host monitoring module program reads the /proc/net/dev file and parses the data of the monitored network card to obtain, at the current time point, the total outbound traffic, recorded as networkOut, and the total inbound traffic, recorded as networkIn; to compute the network speed, networkOut and networkIn are each obtained once before and once after a time period and their differences are calculated; with the first sample recorded as networkOut_start and networkIn_start, the second as networkOut_end and networkIn_end, and the interval between the two samples as t, the outbound speed is: networkOut_speed = (networkOut_end - networkOut_start) / t, and the inbound speed is: networkIn_speed = (networkIn_end - networkIn_start) / t.
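The network speed calculation can be sketched in Python as below. The function names, the interface name eth0 and the sample counters are hypothetical. The sketch relies on the standard /proc/net/dev layout, in which each interface line carries sixteen counters after the colon, with received bytes first and transmitted bytes ninth.

```python
HEADER = (
    "Inter-|   Receive                        |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
)


def parse_net_dev(text, interface):
    """Return cumulative (inbound_bytes, outbound_bytes) for one interface."""
    for line in text.splitlines()[2:]:           # first two lines are headers
        name, sep, rest = line.partition(":")
        if sep and name.strip() == interface:
            fields = rest.split()
            return int(fields[0]), int(fields[8])
    raise KeyError(interface)


def speeds(start, end, t):
    """Return (inbound, outbound) speed in bytes per second over t seconds."""
    in_start, out_start = start
    in_end, out_end = end
    return (in_end - in_start) / t, (out_end - out_start) / t


# Illustrative samples; on a real host the text would come from
# open("/proc/net/dev").read() at the start and end of the interval.
START = HEADER + "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
END = HEADER + "  eth0: 6000 40 0 0 0 0 0 0 4000 30 0 0 0 0 0 0\n"
print(speeds(parse_net_dev(START, "eth0"), parse_net_dev(END, "eth0"), 5))
```

Measuring t with a monotonic clock rather than assuming the configured interval would keep the speed accurate when a collection round is delayed.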
In a specific application of this embodiment, after the host monitoring module is started, it establishes a long-lived connection with the remote monitoring center module and collects data at the set time interval according to the resource collection method. At the first collection, the collected data is cached locally. At the second collection, the difference between the two sets of collected data is computed according to the resource usage calculation method to obtain the resource usage data, and the CPU, memory, disk and network resource usage data are packaged and sent to the monitoring center module through a remote call. The data from the second collection is in turn cached locally and used in the calculation after the third collection, and so on for subsequent collections.
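The cache-then-difference cycle described above can be sketched in Python as follows. The class name and the canned readings are hypothetical, and the remote call to the monitoring center is left out; the sketch only shows how the previous sample is kept locally and replaced on every round.

```python
class DeltaCollector:
    """Caches the previous raw sample and returns per-interval differences."""

    def __init__(self, read_counters):
        self._read = read_counters   # callable returning {name: cumulative value}
        self._previous = None        # locally cached sample from the last round

    def collect(self):
        current = self._read()
        previous, self._previous = self._previous, current
        if previous is None:
            return None              # first collection: cache only, nothing to send
        return {name: current[name] - previous[name]
                for name in current if name in previous}


# Illustrative run with canned readings instead of real /proc data.
readings = iter([{"use": 200, "all": 1000},
                 {"use": 300, "all": 1200},
                 {"use": 330, "all": 1500}])
collector = DeltaCollector(lambda: next(readings))
for _ in range(3):
    print(collector.collect())
```

Each round therefore costs one read and one subtraction, and only one prior sample is ever held in memory.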
The monitoring center module receives data from the host monitoring modules, filters out invalid data through screening, stores the latest monitoring data of each host in the key value database, and stores every monitoring record in the relational database, so that historical monitoring information can be provided conveniently. The monitoring center provides an interface for querying monitoring data with two query modes. The first queries the current real-time state of each host: the monitoring center obtains the latest data from the key value database. The second queries the state of a host over a certain time period: the monitoring center obtains the data for that period from the relational database, aggregates it through an algorithm, and returns the aggregated data to the user.
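The two storage paths and two query modes can be illustrated with the Python sketch below. It is a stand-in under stated assumptions: a plain dictionary plays the role of the key value database, an in-memory SQLite table plays the role of the relational database, the screening rule (CPU percentage between 0 and 100) and the averaging used for aggregation are hypothetical, and the class and method names are invented for illustration.

```python
import sqlite3


class MonitoringCenter:
    def __init__(self):
        self.latest = {}                       # stand-in for the key value database
        self.db = sqlite3.connect(":memory:")  # stand-in for the relational database
        self.db.execute("CREATE TABLE samples (host TEXT, ts REAL, cpu REAL)")

    def receive(self, host, ts, cpu):
        """Screen one record; store it in both databases if it is valid."""
        if not 0.0 <= cpu <= 100.0:            # hypothetical screening rule
            return False
        self.latest[host] = {"ts": ts, "cpu": cpu}
        self.db.execute("INSERT INTO samples VALUES (?, ?, ?)", (host, ts, cpu))
        return True

    def query_latest(self, host):
        """Real-time query: served from the key value store."""
        return self.latest.get(host)

    def query_period(self, host, start, end):
        """Historical query: average CPU over [start, end], or None if empty."""
        row = self.db.execute(
            "SELECT AVG(cpu) FROM samples WHERE host = ? AND ts BETWEEN ? AND ?",
            (host, start, end),
        ).fetchone()
        return row[0]


center = MonitoringCenter()
center.receive("host-1", 1.0, 20.0)
center.receive("host-1", 2.0, 40.0)
center.receive("host-1", 3.0, 150.0)           # rejected by the screening rule
print(center.query_latest("host-1"), center.query_period("host-1", 1.0, 3.0))
```

Keeping only one record per host in the key value store makes the real-time query a single lookup, while the relational table carries the full history needed for aggregation.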
In this embodiment, monitoring data is obtained by directly reading the memory-backed files exposed by the operating system and by executing a query command, so that accurate resource usage data is obtained quickly. A long-lived connection is maintained between the host monitoring module and the monitoring center for sending data, which avoids repeated network connection setup and reduces resource consumption, so delay is effectively reduced in both acquisition and transmission. Two databases are used to store the monitoring data to suit different query scenarios. For example, in a battlefield environment the condition of each device can change at any moment, and this monitoring approach allows such changes to be detected rapidly, providing support and reference for rapid battlefield response.

Claims (7)

1. The acquisition and analysis system for low-delay host resource monitoring in cluster environment is characterized by comprising a plurality of host monitoring modules, a monitoring center module and a data storage module, wherein the host monitoring modules are all connected with the monitoring center module, and the data storage module is connected with the monitoring center module;
each of the plurality of host monitoring modules is used for establishing and maintaining a long-lived connection with the monitoring center module after startup; the host monitoring module collects the data to be monitored at a set time interval using a resource collection method and caches the currently collected data locally; at the next collection, it computes the difference between the two collected data sets according to a resource usage calculation method to obtain resource usage data, packages the resource usage data, and sends it to the monitoring center module through a remote call;
the monitoring center module is used for receiving data from each host monitoring module, filtering out invalid data through screening, and storing the screened data to the data storage module;
the data storage module comprises a key value database and a relational database, wherein the key value database is used for storing the latest screened monitoring data of each host, and the relational database is used for storing each screened monitoring data of each host;
the monitoring center module comprises an interface for querying monitoring data, and the interface supports the following query modes: when the current real-time state of each host needs to be queried, the monitoring center module acquires the latest data from the key value database and feeds it back to the user; and
when the state of a host over a certain time period needs to be queried, the monitoring center module acquires the data for that time period from the relational database, aggregates it through an algorithm, and feeds it back to the user.
2. The system for low latency monitoring of host resources in a cluster environment of claim 1, wherein the monitoring center module saves monitoring data by invoking a data storage module interface.
3. The system for low latency collection and analysis of host resource monitoring in a cluster environment of claim 1, wherein the data to be monitored includes CPU, memory, disk, and network resource usage data.
4. The system of claim 3, wherein the CPU usage data is collected as follows: a host monitoring module program is installed on the host; the program reads the /proc/stat file and parses the first line to obtain the user, iowait, irq, softirq, nice, system, idle, steal, guest and guest_nice counters; the total CPU time all is recorded as the sum of the user, iowait, irq, softirq, nice, system, steal, guest and guest_nice data items, and the total CPU use time use is recorded as all minus the idle data item; to compute the CPU usage rate over a time period, all and use are each obtained once before and once after the period and their differences are calculated; with the first sample recorded as all_start and use_start and the second as all_end and use_end, the CPU usage rate useRate over the period is: useRate = (use_end - use_start) / (all_end - all_start) × 100%.
5. The system for collection and analysis of low latency monitoring of host resources in a cluster environment according to claim 3 or 4, wherein the memory usage data is collected as follows: the host monitoring module program reads the /proc/meminfo file and selects the MemTotal and MemAvailable entries; the total memory is recorded as the MemTotal data item, the available memory as the MemAvailable data item, and the used memory Usedmemory as MemTotal minus MemAvailable.
6. The system for collection and analysis of low latency host resource monitoring in a cluster environment according to claim 3 or 4, wherein the disk usage data is collected as follows: the host monitoring module program executes the command df -k -t ext2 -t ext3 -t ext4 -t vfat -t xfs -t nfs -t nfs4 to obtain the Used and Available data items; for each file system, the amount used is recorded as Used, the amount remaining is recorded as Available, and the total is the sum of Used and Available.
7. The system for collection and analysis of low latency monitoring of host resources in a cluster environment according to claim 3 or 4, wherein the network resource usage data is collected as follows: the host monitoring module program reads the /proc/net/dev file and parses the data of the monitored network card to obtain, at the current time point, the total outbound traffic, recorded as networkOut, and the total inbound traffic, recorded as networkIn; to compute the network speed, networkOut and networkIn are each obtained once before and once after a time period and their differences are calculated; with the first sample recorded as networkOut_start and networkIn_start, the second as networkOut_end and networkIn_end, and the interval between the two samples as t, the outbound speed is: networkOut_speed = (networkOut_end - networkOut_start) / t, and the inbound speed is: networkIn_speed = (networkIn_end - networkIn_start) / t.
CN202010929325.3A 2020-09-07 2020-09-07 Acquisition and analysis system for low-delay host resource monitoring in cluster environment Pending CN112104493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010929325.3A CN112104493A (en) 2020-09-07 2020-09-07 Acquisition and analysis system for low-delay host resource monitoring in cluster environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010929325.3A CN112104493A (en) 2020-09-07 2020-09-07 Acquisition and analysis system for low-delay host resource monitoring in cluster environment

Publications (1)

Publication Number Publication Date
CN112104493A (en) 2020-12-18

Family

ID=73757530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010929325.3A Pending CN112104493A (en) 2020-09-07 2020-09-07 Acquisition and analysis system for low-delay host resource monitoring in cluster environment

Country Status (1)

Country Link
CN (1) CN112104493A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001020503A1 (en) * 1999-09-14 2001-03-22 E-Club Australia Limited A method of monitoring internet activity
US20110099268A1 (en) * 2009-10-26 2011-04-28 Hitachi, Ltd. Information processing system, and management method for storage monitoring server
CN105718351A (en) * 2016-01-08 2016-06-29 北京汇商融通信息技术有限公司 Hadoop cluster-oriented distributed monitoring and management system
CN110278102A (en) * 2018-03-15 2019-09-24 勤智数码科技股份有限公司 A kind of IT automation operational system and method
CN110309130A (en) * 2018-03-21 2019-10-08 中国人民财产保险股份有限公司 A kind of method and device for host performance monitor
CN110733038A (en) * 2019-09-30 2020-01-31 浙江工业大学 Industrial robot remote monitoring and data processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201218
