WO2023221846A1 - Computing Cluster and Data Collection Method, Device and Storage Medium Thereof - Google Patents


Info

Publication number
WO2023221846A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
frequency
collector
collectors
computing
Application number
PCT/CN2023/093405
Inventor
孙相征
何万青
Original Assignee
Alibaba (China) Co., Ltd.
Application filed by Alibaba (China) Co., Ltd.
Publication of WO2023221846A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3096 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents wherein the means or processing minimize the use of computing system or of computing system component resources, e.g. non-intrusive monitoring which minimizes the probe effect: sniffing, intercepting, indirectly deriving the monitored data from other directly available data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of cloud computing technology, and in particular to a computing cluster and its data collection method, device and storage medium.
  • High Performance Computing (HPC) refers to computing systems and environments that typically use many processors (as part of a single machine) or several computers organized into a cluster (operating as a single computing resource).
  • HPC: High Performance Computing.
  • Most cluster-based HPC systems use high-performance networks to interconnect their computers.
  • Performance monitoring and performance analysis are an indispensable part of building an HPC system.
  • The processing overhead of performance indicator data, which includes data transmission, data processing, data storage and data analysis overhead, is a challenge that HPC systems face during performance monitoring and performance analysis.
  • A common approach is to use a lower collection frequency to reduce the volume of performance indicator data.
  • However, a lower collection frequency may cause performance indicator data to be lost or distorted, which affects the accuracy of the performance analysis results and can lead to wrong decisions based on those results.
  • Various aspects of this application provide a computing cluster and its data collection method, device and storage medium, which resolve the conflict between the processing overhead of performance indicator data and the accuracy of performance analysis results: they ensure the collection accuracy of performance indicators, and thus the accuracy of performance analysis, while reducing the overhead of collecting and processing performance indicator data.
  • Embodiments of this application provide a computing cluster, including a management and control node and multiple computing nodes. Multiple collectors are deployed on each computing node, and different collectors collect different performance indicators. The management and control node is used to deploy the same job task on at least two of the computing nodes and to control those computing nodes to execute the job task. Each computing node is used to start, during execution of the job task, at least two target collectors related to the job task.
  • The at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency. When a computing node determines that it is the master node among the at least two computing nodes, it adjusts the collection frequency of its at least two target collectors according to change information of the at least two kinds of performance indicator data.
  • The master node also notifies the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors on them, so that those target collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.
  • Embodiments of this application also provide a data collection method for a computing cluster, including: during execution of a job task that is deployed on at least two computing nodes in the computing cluster, starting at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency; when it is determined that the present node is the master node among the at least two computing nodes, adjusting the collection frequency of the at least two target collectors according to change information of the at least two kinds of performance indicator data; and notifying the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that those target collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.
  • Embodiments of this application also provide another data collection method for a computing cluster, including: during execution of a job task, starting at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency; dividing the at least two target collectors into at least two associated collector groups according to the correlation between the performance indicators they are responsible for collecting; and adjusting the collection frequency of the target collectors in each associated collector group separately, according to change information of the performance indicator data collected by the target collectors in that group, so that the target collectors continue to collect at least two kinds of performance indicator data of their computing node at the adjusted collection frequency.
  • Embodiments of this application also provide a data collection device applied to any computing node in the computing cluster, including: a startup module, used to start at least two target collectors related to a job task during execution of the job task, so that they collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster; an adjustment module, used to adjust the collection frequency of the at least two target collectors according to change information of the at least two kinds of performance indicator data when it is determined that the present node is the master node among the at least two computing nodes; and a notification module, used to notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that those target collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.
  • A startup module, used to start at least two target collectors related to the job task during execution of the job task …
  • Embodiments of this application also provide a computing node, which can be applied to a computing cluster.
  • The computing node includes a memory and a processor. The memory is used to store a computer program, and the processor, coupled to the memory, is used to execute the computer program to perform the steps of the method described above.
  • Embodiments of this application also provide a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, it causes the processor to implement the steps of the above method.
  • In the embodiments of this application, the collection frequency of performance indicator data is changed adaptively according to change information of that data. This ensures collection accuracy, and therefore the accuracy of performance analysis based on the data and of decisions based on the analysis results, while reducing the overhead of collecting and processing performance indicator data. While the collection frequency is being adapted, for at least two computing nodes executing the same job task, the master node among them is responsible for the adaptive adjustment of the collection frequency.
  • When a change is needed, the master node synchronizes the adjusted frequency to the other computing nodes, which therefore do not need to perform the adaptive adjustment themselves; this reduces their processing burden.
  • Figure 1a is a schematic structural diagram of a computing cluster provided by an exemplary embodiment of the present application.
  • Figure 1b is a schematic diagram of multiple frequency groups provided by an exemplary embodiment of the present application.
  • Figure 2 is a schematic flowchart of a data collection method for a computing cluster provided by another exemplary embodiment of the present application.
  • Figure 3 is a schematic flowchart of a data collection method for a computing cluster provided by another exemplary embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a data collection device provided by another exemplary embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a data collection device provided by another exemplary embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a computing node provided by yet another exemplary embodiment of the present application.
  • FIG. 1a is a schematic structural diagram of a computing cluster 100 provided by an exemplary embodiment of the present application.
  • The computing cluster 100 in this embodiment can be implemented as a large-scale computing platform, an HPC system, one or more computer rooms, an Internet data center (IDC), a cloud computing system, or the like.
  • This embodiment does not limit the specific implementation form of the computing cluster 100.
  • The computing cluster 100 includes a management and control node 101 and a plurality of computing nodes 102. Communication connections can be established between the management and control node 101 and the computing nodes 102, and among the computing nodes 102 themselves.
  • These communication connections may be wired or wireless.
  • For example, the nodes can communicate over a mobile network.
  • The network standard of the mobile network can be any of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), 5G, WiMAX, or a new network standard that appears in the future.
  • Alternatively, the nodes can be located in the same local area network.
  • The nodes can also communicate through Bluetooth, WiFi, infrared, Zigbee or NFC.
  • The implementation forms of the management and control node 101 and the computing nodes 102 are not limited.
  • The management and control node 101 can be implemented in various forms; for example, it can be deployed on a virtual machine, cloud server, cloud host or physical machine.
  • The management and control node 101 can be deployed centrally on one physical or virtual machine, or distributed across multiple physical or virtual machines, but is not limited thereto.
  • A computing node 102 can be any device with a certain computing capability and communication capability, for example a virtual machine, a physical machine (such as a server or computer device), a cloud server, a cloud host, a virtual center, a server array or a database.
  • On the one hand, the management and control node 101 can provide users with a human-computer interaction interface and receive job tasks that users submit through it.
  • On the other hand, it can perform various management and control operations on the computing cluster 100.
  • In particular, it can manage and control the multiple computing nodes 102.
  • A job task can be deployed on one computing node 102 or on at least two computing nodes 102, depending on its type and performance requirements. Job tasks with a large amount of computation, or that require high computing efficiency, can be deployed on at least two computing nodes 102 simultaneously.
  • Specifically, the management and control node 101 can deploy the same job task on at least two of the computing nodes 102 and control those computing nodes 102 to execute it.
  • The job task may be deployed on a computing node 102 by, for example but not limited to: sending the data related to the job task to the computing node 102; or issuing a task instruction to the computing node 102 that carries the identification information of the job task.
  • In the latter case, the computing node 102 obtains the data related to the job task from a task database according to the identification information of the job task.
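The two deployment modes above, pushing the job data itself or pushing only an identifier that the node resolves against a task database, can be sketched as follows. The message classes, the dict standing in for the task database, and all names are hypothetical illustrations, not an interface defined by the application.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class JobData:
    """Mode 1 (hypothetical): the job task's data is pushed to the node directly."""
    job_id: str
    payload: dict


@dataclass(frozen=True)
class TaskInstruction:
    """Mode 2 (hypothetical): only the job task's identification information is pushed."""
    job_id: str


class ComputingNode:
    def __init__(self, task_db: dict[str, dict]):
        self.task_db = task_db            # stand-in for the task database
        self.jobs: dict[str, dict] = {}   # job_id -> job data held locally

    def receive(self, msg: Union[JobData, TaskInstruction]) -> None:
        if isinstance(msg, JobData):
            self.jobs[msg.job_id] = msg.payload
        elif isinstance(msg, TaskInstruction):
            # Fetch the job data by its identification information.
            self.jobs[msg.job_id] = self.task_db[msg.job_id]
        else:
            raise TypeError(f"unsupported deployment message: {msg!r}")
```

The second mode keeps the control message small, at the cost of one lookup against the task database on each computing node.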
  • The computing node 102 may be controlled to execute the job task by, for example but not limited to: sending a startup instruction to the computing node 102 to instruct it to start executing the job task; or sending job instruction parameters to the computing node 102.
  • The job instruction parameters include the time at which to execute the job task, for example starting the job task after 10 minutes, or starting it at a specified hour and minute.
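Resolving the start time carried in the job instruction parameters (a relative delay or an absolute clock time) might look like the sketch below. The function name and the rule that an absolute time already passed today means the same time tomorrow are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def resolve_start_time(now: datetime, delay: Optional[timedelta] = None,
                       at: Optional[time] = None) -> datetime:
    """Turn hypothetical job instruction parameters into a concrete start time."""
    if (delay is None) == (at is None):
        raise ValueError("give exactly one of delay or at")
    if delay is not None:
        return now + delay  # e.g. "start the job task after 10 minutes"
    # e.g. "start the job task at 14:30"; keep the caller's time zone, if any
    start = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    # Assumption: a time that has already passed today refers to tomorrow.
    return start if start >= now else start + timedelta(days=1)
```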
  • Each computing node 102 serves as a task execution node: it receives the job tasks deployed by the management and control node 101 and executes them under the control of the management and control node 101.
  • Multiple collectors are deployed on each computing node 102.
  • Each collector is responsible for collecting one kind of performance indicator data, and different collectors collect different kinds of performance indicator data.
  • A collector can be program code with a data collection function. It can be implemented as a plug-in or SDK that depends on a main program, or as an independent software function module; this is not limited.
  • Each collector collects performance indicator data at a certain collection frequency.
  • The amount of performance indicator data is directly related to the collection frequency: the higher the collection frequency, the more performance indicator data is collected. Based on the performance indicator data …
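Because the data volume grows linearly with the collection frequency, a back-of-envelope count makes the overhead concrete. The figures below (one hour, 8 target collectors per node, 100 nodes) are illustrative only and do not come from the application.

```python
def sample_count(duration_s: float, freq_hz: float, collectors: int = 1, nodes: int = 1) -> int:
    """Samples produced when `collectors` collectors on each of `nodes` nodes sample at `freq_hz`."""
    return round(duration_s * freq_hz) * collectors * nodes


# Illustrative: one hour, 8 target collectors per node, 100 nodes.
print(sample_count(3600, 1.0, collectors=8, nodes=100))  # 2880000
print(sample_count(3600, 0.1, collectors=8, nodes=100))  # 288000
```

Dropping from 1 Hz to 0.1 Hz cuts the volume tenfold, which is exactly why a fixed low frequency is tempting and why it risks missing short-lived changes.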
  • In this embodiment, in addition to executing job tasks under the control of the management and control node 101, each computing node 102 can start at least two collectors related to a job task during its execution, so that the started collectors collect at least two kinds of performance indicator data at the current collection frequency.
  • The collectors related to the job task that a computing node 102 starts during execution of the job task are called target collectors.
  • There are at least two target collectors. Different job tasks start different target collectors, depending on the performance indicator requirements of each job task.
  • The performance indicator data in this embodiment include but are not limited to: CPU utilization, memory utilization, remaining memory, network bandwidth, and the amounts of CPU resources, memory resources and bandwidth resources occupied by job tasks.
  • Performance analysis and performance monitoring can be performed from the perspective of the computing nodes 102 and/or the job tasks.
  • For example, the performance attributes of a computing node 102 can be analyzed or monitored.
  • These performance attributes include but are not limited to: task load, network status and the amount of currently available resources.
  • The management and control node 101 can obtain these performance attributes of a computing node 102 and use them to decide whether new job tasks can still be allocated to that node, and whether its resources need to be adjusted dynamically, for example by adding CPU or network bandwidth resources. As another example, the running status and resource consumption of job tasks, as well as the quality of service (QoS) corresponding to the job tasks, can be analyzed or monitored based on the performance indicator data; further, the management and control node 101 can obtain the running status and resource consumption of the job tasks.
  • QoS: quality of service.
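The text leaves the decision logic open. A minimal sketch of how the management and control node might turn node performance attributes into an admission decision and a scaling suggestion is shown below; it assumes load and bandwidth utilization are reported as fractions and uses fixed thresholds (80% load for admission, 90% utilization for scaling). Every field name and threshold is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAttributes:
    """Performance attributes of one computing node (all fields hypothetical)."""
    task_load: float        # fraction of capacity in use, 0.0 to 1.0
    bandwidth_util: float   # fraction of network bandwidth in use, 0.0 to 1.0
    free_cpu_cores: int
    free_mem_gb: float


def can_accept_task(attrs: NodeAttributes, cores: int, mem_gb: float,
                    max_load: float = 0.8) -> bool:
    """Admit a new job task only if the node is not overloaded and has room for it."""
    return (
        attrs.task_load < max_load
        and attrs.free_cpu_cores >= cores
        and attrs.free_mem_gb >= mem_gb
    )


def scaling_advice(attrs: NodeAttributes, high: float = 0.9) -> list[str]:
    """Suggest which resources to add when utilization reaches `high`."""
    advice = []
    if attrs.task_load >= high:
        advice.append("add_cpu")
    if attrs.bandwidth_util >= high:
        advice.append("add_bandwidth")
    return advice
```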
  • The computing cluster in this embodiment further includes a performance analysis node 104, which has communication connections with the management and control node 101 and the multiple computing nodes 102.
  • The performance analysis node 104 receives the performance indicator data reported by each computing node 102 and, based on that data, performs performance analysis and performance monitoring of the computing nodes 102 and/or the job tasks.
  • Performance analysis and monitoring are necessary for the maintenance of the computing cluster 100; they help operation and maintenance personnel understand how the whole cluster is running and observe how efficiently cluster resources are used.
  • For example, the performance analysis node 104 can analyze or monitor each computing node 102 based on the performance indicator data it reports, such as its CPU utilization, memory utilization, remaining memory and network bandwidth.
  • The performance attributes that can be analyzed or monitored include but are not limited to task load, network status and currently available resources. The performance attributes of each computing node 102 are provided to the management and control node 101 so that it can make further decisions.
  • As another example, based on the performance indicator data reported by each computing node 102, such as CPU utilization, memory utilization, and the amounts of CPU, memory and bandwidth resources occupied by job tasks, the performance analysis node 104 can analyze or monitor the running status and resource consumption of job tasks and performance data such as their QoS, and provide this data to the management and control node 101 for further decisions. For example, the management and control node 101 can analyze the runtime behavior characteristics of a job task and, based on them, adapt a better resource configuration to the job task. Such a resource configuration includes at least the number of computing nodes 102 and the CPU, memory, network and other resources on each computing node 102.
  • In this embodiment, the performance analysis node 104 can be configured to obtain the latest performance attributes of the at least two computing nodes 102 by analyzing the at least two kinds of performance indicator data they report.
  • These latest performance attributes are provided to the management and control node 101 for further decisions, so that performance management and control of the entire computing cluster forms a closed loop.
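One way the performance analysis node could reduce the reported samples to "latest performance attributes" is a per-node, per-metric sliding-window average, sketched below. The window length, the use of a plain mean, and the class and method names are assumptions; the application does not specify the analysis.

```python
from collections import defaultdict, deque
from statistics import fmean


class PerformanceAnalyzer:
    """Hypothetical analysis node: keeps recent samples and summarizes them."""

    def __init__(self, window: int = 5):
        # node -> metric -> the most recent `window` samples
        self._samples = defaultdict(lambda: defaultdict(lambda: deque(maxlen=window)))

    def report(self, node: str, metric: str, value: float) -> None:
        """Called for each performance indicator sample a computing node reports."""
        self._samples[node][metric].append(value)

    def latest_attributes(self) -> dict[str, dict[str, float]]:
        """Windowed mean per node and metric, to be handed to the control node."""
        return {
            node: {metric: fmean(values) for metric, values in metrics.items()}
            for node, metrics in self._samples.items()
        }
```

The bounded `deque` keeps memory per metric constant, so the summary tracks recent behavior rather than the whole history.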
  • In this embodiment, each computing node 102 can execute a job task and, during its execution, start at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of the computing node 102 where they are located at the current collection frequency. In addition, while the target collectors are collecting, each computing node 102 can adaptively change the collection frequency used by the target collectors according to change information of the performance indicator data.
  • This enables variable-frequency collection of performance indicator data, which ensures collection accuracy, and thus the accuracy of performance analysis and of decisions based on the data, while reducing the overhead of collecting and processing the data.
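The application does not define the adaptation rule, so the following is only a sketch of one plausible choice. It assumes that "change information" is the largest relative change between consecutive samples across all target collectors, and applies a hysteresis band: double the frequency above 20% change, halve it below 5%, otherwise leave it unchanged, clamped to 0.1 to 10 Hz. All thresholds and names are hypothetical.

```python
def max_relative_change(window: list[float]) -> float:
    """Largest relative change between consecutive samples in one collector's window."""
    return max(
        (abs(b - a) / max(abs(a), 1e-9) for a, b in zip(window, window[1:])),
        default=0.0,
    )


def next_frequency(current_hz: float, windows: list[list[float]],
                   low: float = 0.05, high: float = 0.2,
                   min_hz: float = 0.1, max_hz: float = 10.0) -> float:
    """Hypothetical rule: sample faster when data moves, slower when it is flat."""
    change = max((max_relative_change(w) for w in windows), default=0.0)
    if change > high:
        f = current_hz * 2   # data changing fast: sample more often
    elif change < low:
        f = current_hz / 2   # data nearly flat: sample less often
    else:
        f = current_hz       # inside the hysteresis band: keep as is
    return min(max_hz, max(min_hz, f))
```

The band between the two thresholds keeps the frequency from oscillating when the change hovers around a single cut-off.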
  • Because the target collectors on every computing node 102 may need frequency adjustment during collection, having each computing node 102 compute adjustments for its own target collectors one by one would involve too much data processing and be time-consuming and laborious. This is especially true in supercomputing scenarios, where the large numbers of computing nodes 102 and target collectors make the computation heavy enough to affect the overall performance of the computing nodes 102.
  • Therefore, for at least two computing nodes 102 executing the same job task, a master node 103 can be selected from among them. The master node 103 adjusts the collection frequency of its at least two target collectors according to change information of the at least two kinds of its own performance indicator data collected by those collectors, and notifies the other computing nodes 102 among the at least two computing nodes 102 to adjust the collection frequency of the at least two target collectors on them, so that the target collectors on every computing node 102 continue to collect at least two kinds of performance indicator data at the adjusted collection frequency.
  • The at least two target collectors on each computing node are responsible for collecting at least two kinds of performance indicator data of the computing node where they are located.
  • Only the master node 103 performs the data processing related to collection frequency adjustment. The other computing nodes 102 do not; they directly adjust the collection frequency of their at least two target collectors according to the notification from the master node 103.
  • This reduces the amount of data processing caused by dynamically adjusting the collection frequency and also makes adjusting the collection frequency of the target collectors more efficient.
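The division of labor described above (the master decides once, the others only apply) can be sketched in-process as follows. A direct method call stands in for the network notification, the `decide` callback stands in for whatever frequency rule the master uses, and all nodes are assumed to run the same set of target collectors for the job task; these names and assumptions are illustrative.

```python
from typing import Callable


class Node:
    """A computing node holding per-collector collection frequencies (Hz)."""

    def __init__(self, name: str, collectors: list[str], freq_hz: float = 1.0):
        self.name = name
        self.freqs = {c: freq_hz for c in collectors}

    def apply(self, new_freqs: dict[str, float]) -> None:
        """All a non-master node does: adopt the frequencies it was told."""
        self.freqs.update(new_freqs)


def master_round(master: Node, peers: list[Node],
                 decide: Callable[[str, float], float]) -> dict[str, float]:
    """Master computes new frequencies once, applies them locally, and notifies peers."""
    new = {c: decide(c, f) for c, f in master.freqs.items()}
    if new != master.freqs:   # notify only when something actually changed
        master.apply(new)
        for peer in peers:
            peer.apply(new)   # stands in for the notification sent to each peer
    return new
```

The decision runs once per collector on the master, not once per collector per node, which is the saving the text describes.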
  • Each computing node 102 starts at least two target collectors related to the job task during execution of the task.
  • One specific implementation is as follows: according to at least two performance indicators related to the job task it executes, the computing node 102 determines the at least two target collectors corresponding to those performance indicators, and then sends a startup command to them based on their locally stored identification information.
  • After receiving the startup command, the at least two target collectors start running and collect the corresponding performance indicator data.
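The lookup-and-start step can be sketched with a local registry that maps each performance indicator to the identification information of its collector. The registry contents, the `Collector` stand-in and the function name are hypothetical; the "at least two" check mirrors the wording of the embodiment.

```python
class Collector:
    """Stand-in for a collector plug-in or module; only tracks running state."""

    def __init__(self, cid: str):
        self.cid = cid
        self.running = False

    def start(self) -> None:  # receiving the startup command
        self.running = True


def start_target_collectors(required: list[str], registry: dict[str, str],
                            collectors: dict[str, Collector]) -> list[str]:
    """Map each required performance indicator to its collector ID and start it."""
    ids = list(dict.fromkeys(registry[ind] for ind in required))  # dedupe, keep order
    if len(ids) < 2:
        raise ValueError("expected at least two target collectors")
    for cid in ids:
        collectors[cid].start()
    return ids
```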
  • The at least two target collectors send the collected performance indicator data to the computing node 102 they belong to, and that computing node 102 adjusts the collection frequency of the target collectors according to change information of the collected data.
  • Since target collectors on different nodes executing the same job task may use the same or similar collection frequencies, one computing node 102 can be selected from the at least two computing nodes 102 as the master node 103, which reduces the load each computing node 102 would otherwise bear in adjusting the collection frequency of each target collector.
  • The master node 103 determines the adjusted collection frequency according to change information of the at least two kinds of performance indicator data collected by its own target collectors.
  • The master node 103 adjusts the collection frequency of its local target collectors and notifies the other computing nodes 102 of it; the other computing nodes 102 then adjust the collection frequency used by their target collectors directly according to the notification.
  • Accordingly, each computing node 102 also needs to determine whether it is the master node 103. If it is, it adjusts the collection frequency of its at least two target collectors according to change information of the at least two kinds of performance indicator data they collect, and notifies the other computing nodes 102 among the at least two computing nodes 102 to adjust the collection frequency of the target collectors on them.
  • If a computing node 102 determines that it is not the master node 103, it waits for a notification from the master node 103. Until the notification arrives, its at least two target collectors keep collecting at least two kinds of performance indicator data of the computing node at the current collection frequency; after receiving the notification, the node adjusts the collection frequency of its local target collectors, which then continue collecting at the adjusted collection frequency.
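A non-master node's behavior can be sketched as a loop that checks a notification queue once per tick without blocking: no notification means keep the current frequency. A `queue.Queue` stands in for the channel from the master node, and the function records the frequency used on each tick instead of actually collecting and sleeping; both simplifications are assumptions for illustration.

```python
import queue


def worker_loop(notifications: "queue.Queue[float]", freq_hz: float, ticks: int) -> list[float]:
    """Hypothetical non-master loop: apply the master's frequency when one arrives."""
    used = []
    for _ in range(ticks):
        try:
            freq_hz = notifications.get_nowait()  # master's notification arrived
        except queue.Empty:
            pass  # no notification yet: keep the current collection frequency
        # A real node would collect here and then wait 1 / freq_hz seconds.
        used.append(freq_hz)
    return used
```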
  • The method of selecting the master node 103 is not limited; the following method can be used, for example:
  • The master node 103 is selected by the management and control node 101. Specifically, the management and control node 101 is further used to select the master node 103 from the at least two computing nodes 102 according to their attribute information, and to send a notification message to the master node 103. Each of the at least two computing nodes 102 executing the same job task determines whether it is the master node 103 by whether it has received that notification message from the management and control node 101: if it has, it is the master node 103; if it has not, it is not.
  • The at least two computing nodes 102 may execute one or more job tasks, and different job tasks may differ in workload size, required network bandwidth and amount of resources used.
  • In an optional implementation, the management and control node 101 selects the master node 103 from the at least two computing nodes 102 according to at least one of the following performance attributes of those nodes: task load, network status and amount of available resources.
  • Task load represents the load of the tasks a computing node 102 is executing; network status represents the network bandwidth of the computing node 102 while executing tasks; and the amount of available resources represents the current remaining CPU, memory and similar resources of the computing node 102.
  • Specifically, the master node 103 can be selected from the at least two computing nodes 102 according to any one of their task load, network status or amount of available resources; according to any two of these (task load and network status, task load and available resources, or network status and available resources); or according to all three.
  • The corresponding selection rules are as follows. By task load, the computing node 102 with the smaller task load becomes the master node 103;
  • by network status, the computing node 102 with the better network status becomes the master node 103;
  • by amount of available resources, the computing node 102 with more available resources becomes the master node 103;
  • by task load and network status, the computing node 102 with the smaller task load and better network status becomes the master node 103;
  • by task load and amount of available resources, the computing node 102 with the smaller task load and more available resources becomes the master node 103;
  • by network status and amount of available resources, the computing node 102 with the better network status and more available resources becomes the master node 103;
  • by all three attributes, the computing node 102 with the smaller task load, better network status, and more available resources becomes the master node 103.
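The text does not say how several attributes are combined when more than one is used. A minimal sketch, assuming a lexicographic priority (task load first, then network status, then available resources), could look like the following; `NodeAttrs`, `select_master`, and the use of bandwidth as the network-status measure are hypothetical illustrations, not part of the described system.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class NodeAttrs:
    node_id: str
    task_load: float            # lower is better
    network_bandwidth: float    # higher is better (assumed proxy for "network status")
    available_resources: float  # higher is better (e.g. free CPU/memory share)


# Sort key per performance attribute: a smaller key means a more suitable master.
_KEYS = {
    "task_load": lambda n: n.task_load,
    "network": lambda n: -n.network_bandwidth,
    "resources": lambda n: -n.available_resources,
}


def select_master(nodes: Sequence[NodeAttrs],
                  attrs: Iterable[str] = ("task_load", "network", "resources")) -> NodeAttrs:
    """Pick the master node by comparing the chosen attributes in priority order."""
    if len(nodes) < 2:
        raise ValueError("master selection needs at least two computing nodes")
    keys = [_KEYS[a] for a in attrs]
    return min(nodes, key=lambda n: tuple(k(n) for k in keys))
```

Passing a subset of attribute names, such as `["resources"]`, covers the single-attribute and two-attribute variants listed above.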
  • The master node 103 may also be replaced dynamically.
  • Specifically, the performance analysis node 104 can analyze the at least two kinds of performance indicator data reported by each of the at least two computing nodes 102, obtain the latest performance attributes of those computing nodes 102, and provide them to the management and control node 101. The management and control node 101 is further configured to re-select a new master node 103 from the at least two computing nodes 102 based on these latest performance attributes, and to send a notification message to the new master node 103.
  • When the new master node 103 receives the notification message, it automatically takes over as the master node 103. The management and control node 101 can also send instruction information to the original master node 103 telling it to become a non-master node; on receiving the instruction information, the original master node 103 shuts down its master-node function. It should be noted that the manner in which the management and control node 101 selects the master node 103 and dynamically updates it is shown in Figure 1a.
  • Method A2: Instead of the management and control node 101 selecting the master node 103 from the at least two computing nodes 102, the computing nodes 102 can negotiate the master node 103 among themselves according to a configured master-node selection method.
  • In a specific implementation, each computing node 102 executing the same job task determines whether it is the master node 103 based on the specified attribute information of the at least two computing nodes 102 (that is, of itself and of the other computing nodes 102), combined with a preset condition that the master node 103 should satisfy with respect to that specified attribute information.
  • the specified attribute information may be the device number or IP address of the computing node 102.
  • The condition that the master node 103 should satisfy may be that the node with the largest device number or IP address serves as the master node 103, or that the node with the smallest device number or IP address serves as the master node 103.
  • Accordingly, determining whether a node is the master node 103 based on the device numbers or IP addresses of the at least two computing nodes 102 and the preset condition includes: each computing node 102 compares its own device number or IP address with those of the other computing nodes 102; if its own is the largest, it determines itself to be the master node 103, otherwise it determines that it is not.
  • Alternatively, each computing node 102 compares its own device number or IP address with those of the other computing nodes 102; if its own is the smallest, it determines itself to be the master node 103, otherwise it determines that it is not. It should be noted that the manner in which the computing nodes 102 autonomously negotiate the master node 103 is also shown in Figure 1a.
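A sketch of the autonomous negotiation in Method A2, assuming IP addresses are the specified attribute information and every node sees the same address set. `is_master` is a hypothetical helper; comparing `ipaddress` objects rather than strings matters, because as strings `"10.0.0.9"` sorts after `"10.0.0.12"`.

```python
import ipaddress
from typing import Iterable


def is_master(own_ip: str, peer_ips: Iterable[str], prefer_largest: bool = True) -> bool:
    """Return True if this node should act as the master node.

    Every node applies the same rule to the same addresses, so exactly one node
    (the one with the largest, or smallest, address) concludes it is the master.
    Mixing IPv4 and IPv6 addresses raises TypeError on comparison.
    """
    own = ipaddress.ip_address(own_ip)
    peers = [ipaddress.ip_address(p) for p in peer_ips]
    if prefer_largest:
        return all(own > p for p in peers)
    return all(own < p for p in peers)
```

Because no messages are exchanged, the rule only works if all nodes agree on the membership list of the job task.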
  • The master node 103 can adjust the collection frequencies of its at least two local target collectors based on the change information of the at least two kinds of performance indicator data they collect.
  • A specific implementation is as follows: first, the at least two target collectors are divided into at least two associated collector groups according to the correlation of the performance indicators they are responsible for collecting; then, taking the associated collector group as the unit, the collection frequency of the target collectors in each associated collector group is adjusted according to the change information of the performance indicator data collected by the target collectors in that group.
  • In other words, the collectors are grouped and the collection frequencies of strongly correlated collectors are adjusted together, group by group, so that target collectors in the same associated collector group end up with the same adjusted collection frequency. This reduces the computing resources consumed in adjusting collection frequencies and improves the overall efficiency of the adjustment.
  • For example, target collectors whose performance indicators have a correlation greater than a preset threshold can be placed in the same associated collector group, so that the at least two target collectors are divided into at least two associated collector groups.
  • For instance, the target collector for CPU utilization and the target collector for CPU floating-point operation efficiency can be placed in one associated collector group; the target collector for memory utilization and the target collector for read/write bandwidth can be placed in another; and the target collectors for network receive/transmit bandwidth and for network packet receive/transmit rate can be placed in a third. Grouping is not limited to these examples.
  • Before adjusting, it can also be determined whether the current collection frequencies of the target collectors in an associated collector group are the same; if not, they are adjusted to one common collection frequency.
  • Here, the current collection frequency is the collection frequency a target collector is currently using.
  • The common collection frequency can be chosen in any of the following optional ways: the average of the current collection frequencies of the target collectors in the associated collector group; the maximum of those current collection frequencies; or the minimum of those current collection frequencies.
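A minimal sketch of the three unification options, assuming frequencies in Hz keyed by hypothetical collector names. Note that the average need not coincide with one of the preset frequencies.

```python
from statistics import fmean


def unify_frequencies(current: dict[str, float], mode: str = "mean") -> dict[str, float]:
    """Give every collector in one associated group the same collection frequency."""
    values = current.values()
    if mode == "mean":
        target = fmean(values)
    elif mode == "max":
        target = max(values)
    elif mode == "min":
        target = min(values)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {name: target for name in current}
```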
  • Adjusting the collection frequency of the target collectors in each associated collector group is implemented as follows: for each associated collector group, the frequency conversion direction of the group is determined from the change information of the performance indicator data collected by its target collectors; the collection frequency currently used by those target collectors is then adjusted to the preset frequency closest to it in that direction among multiple preset frequencies arranged from small to large.
  • the frequency conversion directions include increasing the frequency, keeping the frequency unchanged, and decreasing the frequency.
  • The step granularity of increasing and decreasing the frequency can also be refined to obtain more kinds of frequency conversion directions.
  • For example, multiple distinct frequencies are preset and arranged from small to large, say f1, f2, f3, f4, f5. If the current collection frequency is f2 and the frequency conversion direction is to increase the frequency, the closest preset frequency in that direction is f3; similarly, if the direction is to decrease the frequency, it is f1.
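The step to the nearest preset frequency, including the rule (stated later for each associated collector group) that the frequency stays put at either end of the ladder, can be sketched with `bisect`. The values in `PRESET_HZ` are hypothetical. Because the search looks for the nearest preset strictly above or below, it also works when a unified average frequency falls between two presets.

```python
from bisect import bisect_left, bisect_right

PRESET_HZ = [0.1, 0.2, 0.5, 1.0, 2.0]  # hypothetical f1..f5 in Hz, ascending


def next_preset(current: float, direction: int, presets: list[float] = PRESET_HZ) -> float:
    """Move to the closest preset in `direction` (+1 up, -1 down, 0 hold).

    At the top or bottom of the ladder the frequency is kept unchanged.
    """
    if direction == 0:
        return current
    if direction > 0:
        i = bisect_right(presets, current)  # first preset strictly above current
        return presets[i] if i < len(presets) else current
    i = bisect_left(presets, current) - 1   # last preset strictly below current
    return presets[i] if i >= 0 else current
```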
  • Alternatively, multiple frequency groups are preset, each corresponding to one preset frequency.
  • As shown in Figure 1b, the frequency groups are ordered by collection frequency from high to low: high-frequency group m, ..., high-frequency group 1, the base-frequency group, low-frequency group 1, ..., low-frequency group n, where m and n are positive integers.
  • Initially, the at least two target collectors can all be set to the base frequency and managed in the base-frequency group. The target collectors are then divided into associated collector groups according to the correlation of the performance indicators they collect. For each associated collector group, the frequency conversion direction is determined from the change information of the performance indicator data collected by its target collectors, and the target collectors of the group are moved from their current frequency group to the adjacent frequency group in that direction.
  • For example, for an associated collector group whose frequency conversion direction is to increase the frequency, its target collectors move from the base-frequency group to high-frequency group 1; for a group whose direction is to decrease the frequency, its target collectors move from the base-frequency group to low-frequency group 1; for a group whose direction is to keep the frequency unchanged, its target collectors stay in the base-frequency group.
  • The frequency groups of the target collectors in each associated collector group can then be adjusted repeatedly in the same way.
  • Each target collector collects performance indicator data at the preset frequency corresponding to the frequency group it is currently in.
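The frequency-group view can be modeled by giving each group a signed index: 0 for the base-frequency group, 1 to m for the high-frequency groups, and -1 to -n for the low-frequency groups. A sketch under that assumption, with hypothetical frequencies for m = n = 2 and a hypothetical `FrequencyGroups` class:

```python
BASE_HZ = 1.0
# Hypothetical ladder: low group 2, low group 1, base group, high group 1, high group 2.
GROUP_HZ = {-2: 0.25, -1: 0.5, 0: BASE_HZ, 1: 2.0, 2: 4.0}


class FrequencyGroups:
    def __init__(self, collectors):
        # All target collectors start in the base-frequency group (index 0).
        self.group_of = {c: 0 for c in collectors}

    def shift(self, associated_group, direction: int) -> None:
        """Move every collector of one associated group to the adjacent frequency group."""
        for c in associated_group:
            new = self.group_of[c] + direction
            if new in GROUP_HZ:  # beyond the highest/lowest group: stay put
                self.group_of[c] = new

    def frequency(self, collector) -> float:
        return GROUP_HZ[self.group_of[collector]]
```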
  • One specific implementation of determining the frequency conversion direction of an associated collector group from the change information of the performance indicator data collected by its target collectors is as follows. First, for each associated collector group, obtain the key performance indicator data collected by the key collectors in the group.
  • the key collector is the target collector responsible for collecting key performance indicators.
  • The key performance indicator data is a subset of the indicator data collected by the target collectors in the associated collector group.
  • For example, it can be the one or more indicators ranked highest in importance. Then, the change rate of each kind of key performance indicator data is computed over a set statistical interval, and a global change rate is generated from these change rates. Finally, the frequency conversion direction of the associated collector group is determined from the global change rate.
  • the frequency conversion direction can be any one of increasing frequency, decreasing frequency, or keeping the same.
  • the change rate of each key performance indicator data within the statistical interval can be calculated according to the set statistical interval.
  • This embodiment does not limit the statistical interval; for example, the collection period corresponding to the target collector's current collection frequency can be used. If that collection period is 1 s, the statistical interval is 1 s: one key performance indicator sample is collected every 1 s, and the change between two consecutive samples is calculated.
  • Let $P_i$ denote the $i$-th key performance indicator and $\Delta(P_i)$ the change in its value between two adjacent statistical intervals.
  • Different key performance indicators are given different weights $(W_1, W_2, \ldots, W_n)$, where $W_i$ is the weight of the $i$-th key performance indicator.
  • The global change rate generated from the change rates of the key performance indicator data can then be expressed as the weighted sum:
$$\mathrm{KeyDelta} = \sum_{i=1}^{n} W_i \, \Delta(P_i)$$
  • This embodiment also sets a statistical threshold for each key performance indicator.
  • The statistical thresholds are denoted $(PT_1, PT_2, \ldots, PT_n)$, where $PT_i$ is the minimum change threshold of the $i$-th key performance indicator. Within a judgment period, each key performance indicator is checked for whether its change exceeds its threshold.
  • If the change exceeds the threshold, the change rate of that key performance indicator data is counted. Otherwise, if the change of the key performance indicator data between two adjacent periods does not exceed the corresponding threshold, the change is relatively small and the change direction can be set directly to 0, meaning that no frequency adjustment is required, that is, the frequency is kept unchanged.
  • The global change rate is generated with the weighted-sum formula above, and a frequency conversion strategy then determines the frequency conversion direction of the associated collector group from the global change rate.
  • A specific implementation uses the global change rate as the input of the frequency conversion strategy and determines the frequency conversion direction of the associated collector group from the output value: an output of 0 means the collection frequency of the target collectors does not need to change; 1 means it needs to be increased; and -1 means it needs to be decreased.
  • The output value of the frequency conversion strategy is determined as follows: calculate the weighted change rate KeyDelta of the key indicator data in the associated collector group, and let the lower and upper change-rate thresholds be $(\theta_1, \theta_2)$ with $\theta_1 < \theta_2$. When $\mathrm{KeyDelta} > \theta_2$, the collection frequency needs to be increased and the output value is 1; when $\mathrm{KeyDelta} < \theta_1$, the collection frequency needs to be decreased and the output value is -1; when $\theta_1 \le \mathrm{KeyDelta} \le \theta_2$, the original collection frequency is kept and the output value is 0.
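Putting the key-indicator steps together, a minimal sketch of the frequency conversion strategy follows. It assumes Δ(P_i) is the absolute difference between two consecutive samples (so that a large swing in either direction counts as change), that an indicator whose change does not exceed PT_i contributes 0, and that θ1 and θ2 are tuning parameters; the function names and sample values are hypothetical.

```python
from typing import Sequence


def change_amounts(prev: Sequence[float], curr: Sequence[float],
                   thresholds: Sequence[float]) -> list[float]:
    """Delta(P_i) for each key indicator between two adjacent statistical intervals.

    Assumption: the absolute change is used, and a change that does not exceed
    PT_i is treated as 0 (no contribution to frequency modulation).
    """
    deltas = []
    for before, after, pt in zip(prev, curr, thresholds):
        d = abs(after - before)
        deltas.append(d if d > pt else 0.0)
    return deltas


def frequency_direction(deltas: Sequence[float], weights: Sequence[float],
                        theta1: float, theta2: float) -> int:
    """Map KeyDelta = sum(W_i * Delta(P_i)) to 1 (raise), -1 (lower) or 0 (keep)."""
    if theta1 >= theta2:
        raise ValueError("theta1 must be smaller than theta2")
    key_delta = sum(w * d for w, d in zip(weights, deltas))
    if key_delta > theta2:
        return 1
    if key_delta < theta1:
        return -1
    return 0
```

Under these assumptions a quiet group (KeyDelta near 0) drifts toward lower frequencies, which matches the goal of cutting collection overhead when indicators are stable.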
  • To further improve the accuracy of the frequency conversion direction, another specific implementation of determining the frequency conversion direction of an associated collector group is as follows: for each associated collector group, the direction is determined from both the change information of the performance indicator data collected by the target collectors in the group and the latest performance analysis results obtained from the performance indicator data collected by the at least two target collectors.
  • Specifically, for each associated collector group, a first frequency conversion direction is determined from the change information of the performance indicator data collected by its target collectors, and a second frequency conversion direction is determined from the latest analysis results obtained from the performance indicator data collected by the at least two target collectors. If the first and second frequency conversion directions are the same, the first direction is used as the frequency conversion direction of the associated collector group. If they differ, the current collection frequency can be left unadjusted for the time being while the first frequency conversion directions over multiple statistical intervals are counted, and the frequency conversion direction of the associated collector group is finally determined from those statistics.
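One hypothetical way to combine the first and second frequency conversion directions is shown below: agreeing directions are applied at once, while disagreement holds the current frequency until a window of first directions has been collected, after which the majority is applied. The window length, the majority vote, and the `DirectionArbiter` name are assumptions; the text only says that the first directions over multiple statistical intervals are counted.

```python
from collections import Counter, deque
from typing import Optional


class DirectionArbiter:
    """Combine the change-based (first) and analysis-based (second) directions."""

    def __init__(self, window: int = 3):
        # First directions observed while the two sources disagree.
        self.pending: deque = deque(maxlen=window)

    def decide(self, first: int, second: int) -> Optional[int]:
        """Return +1/-1/0 to apply, or None to keep the current frequency for now."""
        if first == second:
            self.pending.clear()
            return first
        self.pending.append(first)
        if len(self.pending) < self.pending.maxlen:
            return None  # not enough evidence yet
        direction, _ = Counter(self.pending).most_common(1)[0]
        self.pending.clear()
        return direction
```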
  • For each associated collector group, while adjusting the collection frequency of its target collectors: if their current collection frequency is already the maximum preset frequency and the frequency conversion direction is to increase the frequency, the current collection frequency remains unchanged; likewise, if the current collection frequency of each target collector in the group is already the minimum preset frequency and the direction is to decrease the frequency, the current collection frequency remains unchanged.
  • Each computing node 102 can group its at least two local target collectors, with every computing node 102 applying the same grouping criteria.
  • Taking the associated collector group as the unit, the master node 103 can send a notification message to the other computing nodes 102.
  • The notification message carries instruction information to increase or decrease the frequency, so that the other computing nodes 102 increase or decrease the collection frequency of the target collectors in the corresponding associated collector group accordingly, specifically to the closest preset frequency in the frequency conversion direction.
  • Each computing node 102 can also unify the current collection frequencies of the target collectors in each associated collector group in the same manner, so that a given associated collector group has the same current collection frequency on different computing nodes 102. This ensures that the target collectors of the same associated collector group on every computing node 102 are raised or lowered together and use the same collection frequency, keeping the collection frequency of the same collector consistent across computing nodes 102 and facilitating subsequent analysis and processing of the same performance indicator data.
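The master-to-worker synchronization could look like the sketch below. The message type `FrequencyNotice`, the `WorkerNode` class, the preset values, and the shared starting level are hypothetical; the point is that worker nodes only apply the direction they receive and never compute one themselves, and that starting every group at the same level keeps all nodes in step.

```python
from dataclasses import dataclass

PRESETS = (0.1, 0.2, 0.5, 1.0, 2.0)  # hypothetical ascending preset frequencies (Hz)
START_LEVEL = 2                      # hypothetical shared starting index (0.5 Hz)


@dataclass(frozen=True)
class FrequencyNotice:
    group_id: str   # associated collector group, grouped identically on every node
    direction: int  # +1 raise, -1 lower


class WorkerNode:
    def __init__(self, group_ids):
        # Every node starts each group at the same preset index.
        self.level = {g: START_LEVEL for g in group_ids}

    def on_notice(self, msg: FrequencyNotice) -> None:
        lvl = self.level[msg.group_id] + msg.direction
        self.level[msg.group_id] = min(max(lvl, 0), len(PRESETS) - 1)  # clamp at ends

    def frequency(self, group_id: str) -> float:
        return PRESETS[self.level[group_id]]
```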
  • In this way, the collection frequency of performance indicator data changes adaptively with the change information of that data. This preserves collection accuracy, and with it the accuracy of the performance analysis results and of the decisions based on them, while also reducing collection overhead. Moreover, for the at least two computing nodes 102 performing the same job task, only the master node 103 among them is responsible for adaptively changing the collection frequency, and it synchronizes changes to the other computing nodes 102 when needed. The other computing nodes 102 do not perform this adaptive processing themselves, which reduces their collection overhead and further reduces the overall collection overhead.
  • the computing node 102 serving as the master node 103 implements the function of adjusting the collection frequency when the same job task is deployed on at least two computing nodes 102 .
  • The same job task may also be deployed on a single computing node 102, in which case no master node 103 needs to be selected.
  • The computing node 102 executing the job task can then adjust the collection frequencies of its local target collectors by itself as follows: during execution of the job task, it starts at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency.
  • FIG. 2 is a schematic flowchart of a data collection method for the computing cluster 100 provided by an exemplary embodiment of the present application. As shown in Figure 2, the method includes:
  • the job task is deployed on at least two computing nodes in the computing cluster 100;
  • Adjusting the collection frequencies of the at least two target collectors based on the change information of the at least two kinds of performance indicator data includes: dividing the at least two target collectors into at least two associated collector groups based on the correlation of the performance indicators they are responsible for collecting; and adjusting the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data collected by the target collectors in that group.
  • Adjusting the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data they collect includes: for each associated collector group, determining the frequency conversion direction of the group based on the change information of the performance indicator data collected by its target collectors; and adjusting the collection frequency currently used by the target collectors in the group to the closest preset frequency in that direction among multiple preset frequencies arranged from small to large.
  • The process of adjusting the collection frequency of the target collectors in the associated collector group further includes: if the current collection frequency of the target collectors in the group is the maximum preset frequency and the frequency conversion direction is to increase the frequency, keeping the current collection frequency unchanged; and if the current collection frequency of each target collector in the group is the minimum preset frequency and the frequency conversion direction is to decrease the frequency, keeping the current collection frequency unchanged.
  • The method further includes: if the current collection frequencies of the target collectors in an associated collector group differ, adjusting them to the same collection frequency.
  • Adjusting the current collection frequencies of the target collectors in the associated collector group to the same collection frequency includes: adjusting them to the average of the current collection frequencies of the target collectors in the group; or to the maximum of those current collection frequencies; or to the minimum of those current collection frequencies.
  • For each associated collector group, determining the frequency conversion direction of the group based on the change information of the performance indicator data collected by its target collectors includes: obtaining the key performance indicator data collected by the key collectors in the group, a key collector being a target collector responsible for collecting a key performance indicator; computing the change rate of each kind of key performance indicator data over a set statistical interval and generating a global change rate from these change rates; and determining the frequency conversion direction of the group based on the global change rate, the frequency conversion direction being any one of increasing the frequency, decreasing the frequency, or keeping it unchanged.
  • Alternatively, for each associated collector group, determining the frequency conversion direction of the group includes: determining it based on both the change information of the performance indicator data collected by the target collectors in the group and the latest performance analysis results obtained from the performance indicator data collected by the at least two target collectors.
  • FIG. 3 is a schematic flowchart of another data collection method for the computing cluster 100 provided by an exemplary embodiment of the present application. As shown in Figure 3, the method includes:
  • Figure 4 is a schematic structural diagram of a data collection device provided by an exemplary embodiment of the present application. As shown in Figure 4, the device includes:
  • the startup module 41 is configured to start, during execution of a job task, at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster 100;
  • the adjustment module 42 is configured to adjust the collection frequencies of the at least two target collectors based on the change information of the at least two kinds of performance indicator data when it is determined that the current computing node is the master node among the at least two computing nodes;
  • the notification module 43 is configured to notify the other computing nodes among the at least two computing nodes to adjust the collection frequencies of the at least two target collectors deployed on them, so that the at least two target collectors on the other computing nodes continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.
  • When adjusting the collection frequencies of the at least two target collectors based on the change information of the at least two kinds of performance indicator data, the adjustment module 42 is specifically configured to: divide the at least two target collectors into at least two associated collector groups based on the correlation of the performance indicators they are responsible for collecting; and adjust the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data collected by the target collectors in that group.
  • When adjusting the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data they collect, the adjustment module 42 is specifically configured to: for each associated collector group, determine the frequency conversion direction of the group based on the change information of the performance indicator data collected by its target collectors; and adjust the collection frequency currently used by the target collectors in the group to the closest preset frequency in that direction among multiple preset frequencies arranged from small to large.
  • While adjusting, for each associated collector group, the collection frequency of the target collectors in that group, the adjustment module 42 is further configured to: keep the current collection frequency unchanged if the current collection frequency of the target collectors in the group is the maximum preset frequency and the frequency conversion direction is to increase the frequency; and keep the current collection frequency unchanged if the current collection frequency of each target collector in the group is the minimum preset frequency and the frequency conversion direction is to decrease the frequency.
  • the adjustment module 42, before determining, for each associated collector group, the frequency conversion direction based on the change information of the performance indicator data collected by the target collectors in the group, is further configured to: if the current collection frequencies of the target collectors in the group differ, adjust them to the same collection frequency.
  • the adjustment module 42, when adjusting the current collection frequencies of the target collectors in the associated collector group to the same collection frequency, is specifically configured to: adjust the current collection frequency of each target collector in the group to the average of the current collection frequencies of the target collectors in the group; or to the maximum of those current collection frequencies; or to the minimum of those current collection frequencies.
  • the adjustment module 42, when determining, for each associated collector group, the frequency conversion direction of the group based on the change information of the performance indicator data collected by its target collectors, is specifically configured to: obtain the key performance indicator data collected by the key collectors in the group, where a key collector is a target collector responsible for collecting a key performance indicator; compute, at a set statistical interval, the change rate of each key performance indicator and generate a global change rate from these change rates; and determine the frequency conversion direction of the group based on the global change rate, the direction being any one of increasing the frequency, decreasing the frequency, and keeping the frequency unchanged.
  • the adjustment module 42, when determining the frequency conversion direction of each associated collector group, is specifically configured to: determine it based on both the change information of the performance indicator data collected by the target collectors in the group and the performance analysis result most recently obtained from the performance indicator data collected by the at least two target collectors.
  • FIG. 5 is a schematic structural diagram of another data collection apparatus provided by an exemplary embodiment of the present application. As shown in FIG. 5, the apparatus includes:
  • a startup module 51, configured to start, during execution of a job task, at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency;
  • a grouping module 52, configured to divide the at least two target collectors into at least two associated collector groups according to the correlation of the performance indicators they are responsible for collecting;
  • an adjustment module 53, configured to separately adjust the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data collected by those target collectors, so that the target collectors continue to collect at least two kinds of performance indicator data of their computing node at the adjusted collection frequency.
  • FIG. 6 is a schematic structural diagram of a computing node provided by an exemplary embodiment of the present application.
  • the computing node can be used as any computing node in the above-mentioned computing cluster 100.
  • the computing node includes a memory 60a and a processor 60b; the memory 60a is configured to store a computer program; the processor 60b, coupled to the memory 60a, is configured to execute the computer program to:
  • during execution of a job task, start at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster 100;
  • the processor 60b, when adjusting the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance indicator data, is specifically configured to: divide the at least two target collectors into at least two associated collector groups according to the correlation of the performance indicators they are responsible for collecting; and separately adjust the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data collected by those target collectors.
  • the processor 60b, when separately adjusting the collection frequency of the target collectors in each associated collector group according to the change information of the performance indicator data collected by those target collectors, is specifically configured to: for each associated collector group, determine the frequency conversion direction of the group based on the change information of the performance indicator data collected by its target collectors; and adjust the collection frequency currently used by the target collectors in the group to the nearest preset frequency in that direction among multiple preset frequencies, which are arranged from small to large.
  • the processor 60b is further configured to: keep the current collection frequency unchanged if the current collection frequency of the target collectors in the associated collector group is the largest preset frequency and the frequency conversion direction is to increase the frequency; and keep it unchanged if the current collection frequency of each target collector in the group is the smallest preset frequency and the frequency conversion direction is to decrease the frequency.
  • the processor 60b, before determining the frequency conversion direction of each associated collector group based on the change information of the performance indicator data collected by the target collectors in the group, is further configured to: if the current collection frequencies of the target collectors in the group differ, adjust them to the same collection frequency.
  • the processor 60b, when adjusting the current collection frequencies of the target collectors in the associated collector group to the same collection frequency, is specifically configured to: adjust the current collection frequency of each target collector in the group to the average of the current collection frequencies of the target collectors in the group; or to the maximum of those current collection frequencies; or to the minimum of those current collection frequencies.
  • the processor 60b determines the frequency conversion direction corresponding to the associated collector group based on the change information of the performance index data collected by the target collector in the associated collector group.
  • the computing node serving as the master node implements the function of adjusting the collection frequency when the same job task is deployed on at least two computing nodes.
  • the same job task may also be deployed on a single computing node, in which case no master node needs to be selected.
  • the processor 60b of the computing node that executes the job task may also perform the following operations:
  • during execution of the job task, start at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance indicator data of the computing node where they are located at the current collection frequency;
  • separately adjust the collection frequency of the target collectors in each associated collector group, so that the target collectors continue to collect at least two kinds of performance indicator data of their computing node at the adjusted collection frequency.
  • the electronic device further includes a communication component 60c, a power supply component 60d, and other components. FIG. 6 schematically shows only some of the components, which does not mean that the electronic device includes only the components shown in FIG. 6.
  • the computing node provided by this embodiment can implement the technical solutions described in the method embodiments of FIG. 2 or FIG. 3.
  • for the specific implementation principles of each of the above modules or units, see the corresponding content of the embodiment of the computing cluster 100 shown in FIG. 1a; they are not repeated here.
  • An exemplary embodiment of the present application provides a computer-readable storage medium storing computer programs/instructions.
  • when the computer programs/instructions are executed by a processor, the processor can implement the steps of the above methods, which are not described again here.
  • An exemplary embodiment of the present application provides a computer program product, which includes a computer program/instruction.
  • when the computer program/instructions are executed by a processor, the processor can implement the steps of the above methods, which are not described again here.
  • the communication component in the above embodiment is configured to facilitate wired or wireless communication between the node where the communication component is located and other nodes.
  • the node where the communication component is located can access a wireless network based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the display in the above embodiment includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • the power supply component in the above embodiment provides power for various components of the node where the power supply component is located.
  • a power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the node in which the power component resides.
  • the audio components in the above embodiments may be configured to output and/or input audio signals.
  • the audio component includes a microphone (MIC).
  • when the node where the audio component is located is in an operating mode, such as a call mode, recording mode, or voice recognition mode, the microphone is configured to receive external audio signals.
  • the received audio signal may be further stored in memory or sent via a communications component.
  • the audio component further includes a speaker for outputting audio signals.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing node to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing node, so that a series of operational steps is performed on the computer or other programmable node to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable node provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a compute node includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media include both persistent and non-persistent, removable and non-removable media, which can implement information storage by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cartridges, tape or disk storage or other magnetic storage nodes, or any other non-transmission medium that can be used to store information accessible by a computing node.
  • computer-readable media does not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

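Embodiments of the present application provide a computing cluster, a data collection method therefor, a device, and a storage medium. In these embodiments, in a computing cluster scenario, the collection frequency of performance indicator data is adaptively changed according to change information of that data. This preserves collection precision, ensuring the accuracy of performance analysis based on the performance indicator data and of decisions based on the analysis results, while reducing the overhead of collecting and processing the data. During the adaptive frequency change, for at least two computing nodes executing the same job task, the master node among them handles the adaptive frequency-change processing and, when a change is needed, synchronizes it to the other computing nodes. The other computing nodes do not need to perform this processing, which reduces their processing burden.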

Description

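Computing Cluster, Data Collection Method Therefor, Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 202210541467.1, filed with the Chinese Patent Office on May 17, 2022 and entitled "Computing Cluster, Data Collection Method Therefor, Device, and Storage Medium", the entire contents of which are incorporated herein by reference.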
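Technical Field
The present application relates to the field of cloud computing technology, and in particular to a computing cluster, a data collection method therefor, a device, and a storage medium.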
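Background
High Performance Computing (HPC) refers to computing systems and environments that typically use many processors (as part of a single machine) or several computers organized in a cluster (operating as a single computing resource). There are many types of HPC systems, ranging from large clusters of standard computers to highly specialized hardware. Most cluster-based HPC systems use high-performance networks to interconnect the computers.
Performance monitoring and performance analysis are an indispensable part of building an HPC system. For HPC systems, the processing overhead of performance indicator data during monitoring and analysis is a challenge. This overhead includes data transmission, data processing, data storage, and data analysis.
To reduce the processing overhead of performance indicator data, a common practice is to use a lower collection frequency, which reduces the volume of data. However, a lower collection frequency may cause performance indicator data to be missing or distorted. This affects the accuracy of performance analysis results and can lead to wrong decisions based on those results.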
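Summary
Multiple aspects of the present application provide a computing cluster, a data collection method therefor, a device, and a storage medium. They resolve the conflict between the processing overhead of performance indicator data and the accuracy of performance analysis results. The collection precision of performance indicators and the accuracy of performance analysis are preserved, while the overhead of collecting and processing performance indicator data is reduced.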
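An embodiment of the present application provides a computing cluster, including a management and control node and multiple computing nodes. Multiple collectors are deployed on each computing node, and different collectors collect different performance indicators. The management and control node is configured to deploy the same job task on at least two of the computing nodes and to control them to execute the job task. Each computing node is configured to start, during execution of the job task, at least two target collectors related to the job task. These collectors collect at least two kinds of performance indicator data of their computing node at the current collection frequency. When a computing node determines that it is the master node among the at least two computing nodes, it adjusts the collection frequency of its target collectors according to change information of the at least two kinds of performance indicator data. It then notifies the other computing nodes to adjust the collection frequency of their target collectors, so that those collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.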
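An embodiment of the present application further provides a data collection method for a computing cluster. The job task is deployed on at least two computing nodes in the computing cluster. The method includes the following steps:
1. During execution of the job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of their computing node at the current collection frequency.
2. When the node determines that it is the master node among the at least two computing nodes, adjust the collection frequency of the target collectors according to change information of the at least two kinds of performance indicator data.
3. Notify the other computing nodes to adjust the collection frequency of the target collectors deployed on them, so that those collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.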
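An embodiment of the present application further provides another data collection method for a computing cluster, including the following steps:
1. During execution of a job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of their computing node at the current collection frequency.
2. Divide the target collectors into at least two associated collector groups according to the correlation of the performance indicators they are responsible for collecting.
3. Separately adjust the collection frequency of the target collectors in each associated collector group according to change information of the data those collectors collect, so that the collectors continue to collect at least two kinds of performance indicator data of their computing node at the adjusted collection frequency.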
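An embodiment of the present application further provides a data collection apparatus applied to any computing node in a computing cluster. The job task is deployed on at least two computing nodes in the computing cluster. The apparatus includes the following modules:
- A startup module, configured to start, during execution of the job task, at least two target collectors related to the job task, so that they collect at least two kinds of performance indicator data of their computing node at the current collection frequency.
- An adjustment module, configured to adjust the collection frequency of the target collectors according to change information of the at least two kinds of performance indicator data when the node determines that it is the master node among the at least two computing nodes.
- A notification module, configured to notify the other computing nodes to adjust the collection frequency of the target collectors deployed on them, so that those collectors continue to collect at least two kinds of performance indicator data of their computing nodes at the adjusted collection frequency.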
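An embodiment of the present application further provides a computing node applicable to a computing cluster. The computing node includes a memory and a processor. The memory is configured to store a computer program. The processor is coupled to the memory and configured to execute the computer program to perform the steps of the methods described above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it causes the processor to implement the steps of the methods described above.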
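In the embodiments of the present application, in a computing cluster scenario, the collection frequency of performance indicator data is adaptively changed according to change information of that data. This preserves collection precision, ensuring the accuracy of performance analysis based on the performance indicator data and of decisions based on the analysis results, while reducing the overhead of collecting and processing the data. During the adaptive frequency change, for at least two computing nodes executing the same job task, the master node among them handles the adaptive frequency-change processing and, when a change is needed, synchronizes it to the other computing nodes. The other computing nodes do not need to perform this processing, which reduces their processing burden.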
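Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions explain the application and do not constitute an undue limitation on it. In the drawings:
FIG. 1a is a schematic structural diagram of a computing cluster provided by an exemplary embodiment of the present application;
FIG. 1b is a schematic diagram of multiple frequency groups provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic flowchart of a data collection method for a computing cluster provided by another exemplary embodiment of the present application;
FIG. 3 is a schematic flowchart of a data collection method for a computing cluster provided by another exemplary embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data collection apparatus provided by another exemplary embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data collection apparatus provided by another exemplary embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computing node provided by another exemplary embodiment of the present application.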
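Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.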
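FIG. 1a is a schematic structural diagram of a computing cluster 100 provided by an exemplary embodiment of the present application. The computing cluster 100 may be implemented as any of the following, and this embodiment does not limit its specific form:
- a large-scale computing platform;
- an HPC system;
- one or more machine rooms;
- an Internet Data Center (IDC);
- a cloud computing system.

As shown in FIG. 1a, the computing cluster 100 includes a management and control node 101 and multiple computing nodes 102. Communication connections can be established between the management and control node 101 and the computing nodes 102, and among the computing nodes 102 themselves.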
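In this embodiment, the communication connection may be wired or wireless. Optionally, for a wireless connection, the nodes may communicate over a mobile network. The network standard may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), 5G, WiMax, or a new standard that emerges in the future. Optionally, the nodes may also be in the same local area network. In that case, wireless connections may use Bluetooth, WiFi, infrared, zigbee, NFC, or similar technologies.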
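This embodiment does not limit the form of the management and control node 101 or the computing nodes 102. The management and control node 101 can take many forms; for example, it can be deployed on a virtual machine, a cloud server, a cloud host, or a physical machine. Optionally, it can be deployed centrally on one physical or virtual machine, or distributed across multiple physical or virtual machines, without limitation. A computing node 102 can be any device with some computing and communication capability, such as:
- a virtual machine;
- a physical machine (for example, a server or computer device);
- a cloud server or cloud host;
- a virtual center;
- a server array;
- a database.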
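The management and control node 101 has two roles. First, it can provide users with a human-machine interface through which it receives the job tasks they submit. Second, it can perform various management and control operations on the computing cluster 100, such as deploying job tasks on the computing nodes 102, controlling the computing nodes 102 to execute job tasks, and managing their task execution status.

In practice, a job task may be deployed on a single computing node 102 or on at least two computing nodes 102, depending on its type and performance requirements. A job task with a large amount of computation, or with high computing-efficiency requirements, can be deployed on at least two computing nodes 102 at once and executed by them in parallel to improve efficiency. Accordingly, the management and control node 101 may be specifically configured to deploy the same job task on at least two of the computing nodes 102 and to control them to execute it.

A job task can be deployed on a computing node 102 in either of the following ways, among others:
- deliver the data related to the job task to the computing node 102;
- deliver a task instruction carrying identification information of the job task, which the computing node 102 uses to fetch the related data from a task database.

The computing node 102 can be controlled to execute the job task in either of the following ways, among others:
- send it a start instruction telling it to begin executing the job task;
- deliver job indication parameters that include the time at which to execute the job task, for example starting the job task in 10 minutes, or at a specified time of xxx hours xxx minutes.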
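In this embodiment, each computing node 102 acts as a task execution node. It receives job tasks deployed by the management and control node 101 and executes them under that node's control. In addition, multiple collectors are deployed on each computing node 102. Each collector collects one kind of performance indicator data, and different collectors collect different kinds. A collector may be program code with a data collection function. It may be implemented as a plug-in or SDK that relies on a main program, or as an independent software module, without limitation.

Each collector collects performance indicator data at a certain collection frequency, and the amount of data collected depends directly on that frequency:
- A higher collection frequency yields more data and more precise performance analysis and monitoring, but also greater data transmission, storage, and computation overhead.
- A lower collection frequency yields less data and relatively lower precision, but also relatively lower overhead.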
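Based on the above, each computing node 102 in this embodiment does more than execute job tasks under the control of the management and control node 101. During execution of a job task, it can also start at least two collectors related to that job task, so that they collect at least two kinds of performance indicator data at the current collection frequency. For ease of description, these collectors are called target collectors. There are at least two of them, and the target collectors started for different job tasks may differ, depending on the performance indicators each job task requires.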
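Performance indicator data in this embodiment includes, but is not limited to:
- the CPU utilization, memory utilization, and remaining memory of the computing node 102;
- the network bandwidth;
- the CPU resources, memory resources, and bandwidth occupied or consumed by the job task.

From this data, performance can be analyzed and monitored from the perspective of the computing node 102 and/or the job task.

For the computing node 102, the data can be used to analyze or monitor its performance attributes, which include but are not limited to task load, network state, and currently available resources (at least the remaining CPU or memory of the computing node 102). The management and control node 101 can obtain these attributes and use them to decide two things: whether new job tasks can still be assigned to the computing node 102, and whether its resources need dynamic adjustment, such as adding CPU or network bandwidth.

For the job task, the data can be used to analyze or monitor its running status, its resource consumption, and its quality of service (QoS). The management and control node 101 can obtain this information and use it to decide whether to add or remove computing nodes 102 for the job task. The goal is to use node resources reasonably while maintaining the job task's running status and QoS.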
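Based on the above analysis, as shown in FIG. 1a, the computing cluster of this embodiment further includes a performance analysis node 104. It is communicatively connected to the management and control node 101 and the computing nodes 102; the connection methods are described above and not repeated here. In this embodiment, the performance analysis node 104 receives the performance indicator data reported by each computing node 102 and, based on that data, performs performance analysis and monitoring of the computing nodes 102 and/or job tasks. Performance analysis and monitoring are a necessary part of preventive maintenance of the computing cluster 100. They help operations staff understand how the whole cluster is running and observe how efficiently cluster resources are used.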
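Specifically, the performance analysis node 104 can use the reported data in two ways.

First, from data such as each computing node 102's CPU utilization, memory utilization, remaining memory, and network bandwidth, it can analyze or monitor the node's performance attributes. These include but are not limited to task load, network state, and currently available resources. It provides these attributes to the management and control node 101 for further decisions.

Second, and/or, from data such as CPU utilization, memory utilization, and the CPU, memory, and bandwidth resources occupied or consumed by the job task, it can analyze or monitor the job task's running status, resource consumption, QoS, and other performance data. It provides this data to the management and control node 101 for further decisions. For example, the management and control node 101 can analyze the runtime behavior of a job task and, based on it, match the job task with a better resource configuration. That configuration includes at least the number of computing nodes 102 and the CPU, memory, network, and other resources on each.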
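Based on the above, when the same job task is deployed on at least two computing nodes 102, the performance analysis node 104 may be specifically configured to do the following. It analyzes the at least two kinds of performance indicator data reported by each of those computing nodes, obtains their latest performance attributes, and provides them to the management and control node 101 for further decisions. The whole cluster thus forms a closed loop for performance management and control.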
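In the embodiments of the present application, each computing node 102 can execute a job task and, during execution, start at least two target collectors related to it. These collectors collect at least two kinds of performance indicator data of the computing node 102 at the current collection frequency. While they are collecting, each computing node 102 can also adaptively change the collection frequency used by its target collectors according to change information of the performance indicator data. This achieves variable-frequency collection, which preserves collection precision and the accuracy of analysis and decisions based on the data, while reducing collection and processing overhead.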
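In this embodiment, the target collectors on each computing node 102 may all need frequency changes during collection. If every computing node 102 computed and adjusted the frequency of each of its target collectors one by one, the data processing would be large and time-consuming. This is especially true in supercomputing scenarios with many computing nodes 102 and target collectors, where the computation would degrade the overall performance of the computing nodes 102.

To reduce the data processing caused by dynamic frequency adjustment and to adjust frequencies more efficiently, a master node 103 can be selected from the at least two computing nodes 102 deployed with the same job task. The master node 103 adjusts the collection frequency of its own target collectors according to change information of the performance indicator data of the master node 103 that they collect. It then notifies the other computing nodes 102 to adjust the collection frequency of their target collectors, so that the target collectors on every computing node 102 continue collecting at the adjusted frequency. Note that the target collectors on each computing node collect the performance indicator data of the node they run on.

In this process, only the master node 103 performs the data processing related to frequency adjustment. The other computing nodes 102 do not; they simply adjust their target collectors according to the master node's notification. This reduces the data processing caused by dynamic adjustment and makes the adjustment more efficient.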
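In the above or following embodiments of the present application, each computing node 102 can start the at least two target collectors related to the job task as follows:
1. Determine the target collectors corresponding to the at least two performance indicators related to the task it is executing.
2. Send a start instruction to those target collectors, using their locally stored identification information.
3. On receiving the start instruction, the target collectors start running and collect the corresponding performance indicator data.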
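Further, the target collectors send the collected performance indicator data to their computing node 102. That node adjusts the collection frequency of the target collectors according to change information of the at least two kinds of performance indicator data. The target collectors executing the same job task may use the same or similar collection frequencies. To reduce the frequency-adjustment workload on each computing node 102, one of the at least two computing nodes 102 can therefore be selected as the master node 103.

The master node 103 determines the adjusted collection frequency according to change information of the data collected by its own target collectors. It adjusts its local target collectors and notifies the other computing nodes 102, which directly adjust their target collectors according to the notification.

Accordingly, each of the at least two computing nodes 102 executing the same job task also needs to determine whether it is the master node 103:
- If it is, it adjusts the collection frequency of its target collectors according to change information of the data they collect, and notifies the other computing nodes 102 to adjust theirs.
- If it is not, it waits for the master node's notification. Until the notification arrives, its target collectors keep collecting at the current collection frequency. After it arrives, the node adjusts the frequency of its local target collectors, which then continue collecting at the adjusted frequency.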
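The embodiments of the present application do not limit how the master node 103 is selected; the following methods may be used, among others: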
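Method A1: the management and control node 101 selects the master node 103. Specifically, the management and control node 101 is further configured to select the master node 103 from the at least two computing nodes 102 according to their attribute information, and to send a notification message to the master node 103. Each of the at least two computing nodes 102 executing the same job task determines whether it is the master node 103 by whether it has received the notification message. If it has, it is the master node 103; otherwise it is not.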
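In this embodiment, the at least two computing nodes 102 may execute one or more job tasks. Different job tasks may differ in workload, required network bandwidth, and resources used. Accordingly, one option is for the management and control node 101 to select the master node 103 according to at least one of the following performance attributes of the at least two computing nodes 102:
- task load: the size of the load of the tasks the computing node 102 is executing;
- network state: the network bandwidth of the computing node 102 when executing tasks;
- available resources: the computing node 102's current remaining CPU, memory, and so on.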
In an optional embodiment, the master node 103 can be selected from the at least two computing nodes 102 according to any one, or any combination, of their task load, network state, and available resources. The possible bases are:
- task load alone;
- network state alone;
- available resources alone;
- task load and network state;
- task load and available resources;
- network state and available resources;
- all three.
Further, continuing the above optional embodiment, the computing node 102 chosen as the master node 103 depends on the basis used:
- task load: the node with the smaller task load;
- network state: the node with the better network state;
- available resources: the node with more available resources;
- task load and network state: the node with a smaller task load and a better network state;
- task load and available resources: the node with a smaller task load and more available resources;
- network state and available resources: the node with a better network state and more available resources;
- all three: the node with a smaller task load, a better network state, and more available resources.
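The selection rule of method A1 can be sketched as follows. This is a minimal illustration, not the patent's implementation, and several parts are assumptions:
- the `NodeAttrs` fields;
- the use of bandwidth as a stand-in for "network state";
- the lexicographic ordering of criteria, in which the first criterion decides and later ones only break ties.

The text does not say how several attributes are weighed against each other.

```python
from dataclasses import dataclass


@dataclass
class NodeAttrs:
    """Hypothetical snapshot of one computing node's performance attributes."""
    node_id: str
    task_load: float       # smaller is better
    bandwidth_mbps: float  # larger means a better network state
    free_resources: float  # e.g. free CPU cores; larger is better


def select_master(nodes, criteria=("load", "network", "resources")):
    """Pick the master node using the chosen attributes.

    Attributes are compared lexicographically in the order given: the first
    criterion decides, later ones only break ties.
    """
    def sort_key(node: NodeAttrs):
        parts = []
        for criterion in criteria:
            if criterion == "load":
                parts.append(node.task_load)
            elif criterion == "network":
                parts.append(-node.bandwidth_mbps)   # negate: larger wins
            elif criterion == "resources":
                parts.append(-node.free_resources)   # negate: larger wins
            else:
                raise ValueError(f"unknown criterion: {criterion}")
        return tuple(parts)

    return min(nodes, key=sort_key).node_id
```

Calling `select_master` again on refreshed attributes is one simple way to model the dynamic re-selection of the master node described next.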
The above ways of determining the master node 103 from the at least two computing nodes 102 are merely illustrative and not limiting.
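Further, in this embodiment, the performance attributes of the master node 103 may change over time. To keep frequency adjustment efficient, the master node 103 can therefore be replaced dynamically, as follows:
1. The performance analysis node 104 analyzes the at least two kinds of performance indicator data reported by each of the at least two computing nodes 102, obtains their latest performance attributes, and provides them to the management and control node 101.
2. The management and control node 101 reselects a new master node 103 from the at least two computing nodes 102 according to those latest attributes, and sends a notification message to it.
3. On receiving the notification message, the new master node 103 automatically becomes the master node.
4. The management and control node 101 can also instruct the original master node 103 to become a non-master node. On receiving the instruction, the original master node 103 disables its master-node functions.

FIG. 1a shows how the management and control node 101 selects the master node 103 and dynamically updates it.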
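Method A2: instead of the management and control node 101 selecting the master node 103, the computing nodes 102 can negotiate a master node 103 among themselves according to a preset selection rule. In one implementation, each computing node 102 executing the same job task decides whether it is the master node 103 using two inputs:
- specified attribute information of the at least two computing nodes 102 (itself and the others), such as the device number or IP address;
- a preset condition that the master node must satisfy with respect to that attribute, for example that the node with the largest device number or IP address becomes the master node 103, or that the node with the smallest one does.

In one optional embodiment, each computing node 102 compares its own device number or IP address with those of the other computing devices. If its own is the largest, it is the master node 103; otherwise it is not. In another optional embodiment, each computing node 102 performs the same comparison, and if its own device number or IP address is the smallest, it is the master node 103; otherwise it is not.

FIG. 1a also shows how the computing nodes 102 negotiate the master node 103 among themselves.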
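Method A2 needs no coordinator because every node evaluates the same rule over the same address list. A minimal sketch follows. It assumes each node already knows its peers' IP addresses, and that all addresses belong to the same family, because Python's `ipaddress` does not order IPv4 against IPv6. `is_master` is a hypothetical helper name.

```python
import ipaddress


def is_master(own_ip: str, peer_ips, pick_largest: bool = True) -> bool:
    """Decide locally whether this node is the master.

    Every node applies the same rule to the same set of addresses, so all
    nodes agree on exactly one master without any extra messages.
    """
    all_ips = [ipaddress.ip_address(ip) for ip in [own_ip, *peer_ips]]
    chosen = max(all_ips) if pick_largest else min(all_ips)
    return ipaddress.ip_address(own_ip) == chosen
```

Comparing parsed addresses rather than strings matters here. As plain text, `"10.0.0.7"` sorts above `"10.0.0.10"`, which would elect the wrong node.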
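In the above or following embodiments, the master node 103 can adjust the collection frequency of its at least two target collectors according to change information of the data they collect, as follows:
1. Divide the target collectors into at least two associated collector groups according to the correlation of the performance indicators they collect.
2. Taking each associated collector group as a unit, adjust the collection frequency of the group's target collectors according to change information of the data they collect.

Collectors with strong correlation are thus adjusted together, one group at a time. Target collectors in the same associated collector group therefore end up with the same adjusted collection frequency. This further reduces the computing resources consumed by frequency adjustment and improves overall adjustment efficiency.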
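In an optional embodiment, target collectors whose performance indicators are correlated above a preset threshold are placed together, yielding at least two associated collector groups. Example groups, without limitation:
- the target collectors for CPU utilization and for CPU floating-point efficiency;
- the target collectors for memory utilization and for read/write bandwidth;
- the target collectors for network receive/transmit bandwidth and for network receive/transmit packet rate.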
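The text gives example groupings but does not say how correlation is measured. One possible sketch: compute the Pearson correlation of recent samples for every pair of collectors, then use a union-find to merge pairs whose absolute correlation exceeds the threshold. The merging makes grouping transitive. The following are all assumptions: the sample-based correlation, the 0.8 default threshold, and the collector names. A static, configured grouping such as the CPU/memory/network pairs above would serve equally well.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sample series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)


def group_collectors(samples, threshold=0.8):
    """samples: dict of collector name -> list of recent values (same length)."""
    names = list(samples)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(pearson(samples[a], samples[b])) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```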
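Further, in an optional embodiment, a check is made for each associated collector group before adjusting the collection frequency of its target collectors: are the current collection frequencies of the target collectors in the group the same? If not, they are adjusted to one common collection frequency. The current collection frequency is the collection frequency a target collector is currently using.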
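Continuing the above optional embodiment, when the current collection frequencies of the target collectors in an associated collector group differ, they can be unified in any of the following ways:
- set all of them to the average of the current collection frequencies in the group;
- set all of them to the maximum current collection frequency in the group;
- set all of them to the minimum current collection frequency in the group.

These ways of unifying the current collection frequencies within a group are merely illustrative and not limiting.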
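The three unification options can be sketched as a single helper. `unify_group_frequency` and the `mode` names are hypothetical. An averaged value need not coincide with one of the preset frequencies, and the text does not address that case.

```python
from statistics import mean


def unify_group_frequency(freqs_hz, mode="mean"):
    """Return the one frequency every collector in the group should use."""
    if not freqs_hz:
        raise ValueError("empty collector group")
    if mode == "mean":
        return mean(freqs_hz)  # may fall between two preset frequencies
    if mode == "max":
        return max(freqs_hz)
    if mode == "min":
        return min(freqs_hz)
    raise ValueError(f"unknown mode: {mode}")
```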
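Similarly, continuing the above optional embodiment, once the current collection frequencies within each associated collector group have been unified, each group's frequency is adjusted according to change information of the data its target collectors collect. For each associated collector group:
1. Determine the group's frequency conversion direction from change information of the performance indicator data collected by its target collectors.
2. Adjust the collection frequency currently used by the group's target collectors to the nearest preset frequency in that direction. The preset frequencies are arranged from small to large.

In this embodiment, the frequency conversion directions are increasing the frequency, keeping it unchanged, and decreasing it, but are not limited to these. The granularity of increasing and decreasing can be refined to obtain more directions. Multiple distinct frequencies are preset in ascending order.

For example, suppose the preset frequencies in ascending order are f1, f2, f3, f4, f5, and the current collection frequency is f2. If the direction is to increase the frequency, the nearest preset frequency in that direction is f3. If the direction is to decrease the frequency, it is f1.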
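Stepping to the nearest preset frequency in the chosen direction amounts to a binary search over the ascending preset list. The sketch below makes three assumptions:
- the preset values standing in for f1 to f5 are hypothetical;
- the direction is encoded as 1, -1, or 0;
- the text's boundary rule applies: the frequency is left unchanged when it is already at the largest or smallest preset.

```python
from bisect import bisect_left, bisect_right

# Hypothetical preset frequencies f1 < f2 < f3 < f4 < f5, in Hz.
PRESET_HZ = [0.1, 0.5, 1.0, 2.0, 5.0]


def step_frequency(current_hz, direction, presets=PRESET_HZ):
    """Move to the nearest preset frequency in the given direction.

    direction: 1 = increase, -1 = decrease, 0 = keep.
    At the largest (smallest) preset, a request to increase (decrease)
    leaves the frequency unchanged.
    """
    if direction > 0:
        i = bisect_right(presets, current_hz)      # first preset strictly above
        return presets[i] if i < len(presets) else current_hz
    if direction < 0:
        i = bisect_left(presets, current_hz) - 1   # last preset strictly below
        return presets[i] if i >= 0 else current_hz
    return current_hz
```

Because the helper searches with `bisect`, it also works when the current frequency is not itself a preset, for example after averaging: `step_frequency(1.5, 1)` returns 2.0.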
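In the above embodiment, multiple frequency groups are preset, each with its own preset frequency, and the groups are ordered by collection frequency from high to low. As shown in FIG. 1b, they include high-frequency group m through high-frequency group 1, a base-frequency group, and low-frequency group 1 through low-frequency group n, where m and n are positive integers. The master node 103 proceeds as follows:
1. At the start, initialize the at least two target collectors to the base frequency and add them all to the base-frequency group for management.
2. Divide the target collectors into associated collector groups according to the correlation of their performance indicators.
3. For each associated collector group, determine its frequency conversion direction from change information of the data its target collectors collect.
4. Move the group's target collectors from their current frequency group to the nearest frequency group in that direction.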
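Specifically, each associated collector group moves according to its direction:
- Increase the frequency: its target collectors move from the base-frequency group to high-frequency group 1.
- Decrease the frequency: its target collectors move from the base-frequency group to low-frequency group 1.
- Keep the frequency unchanged: its target collectors remain in the base-frequency group.

Over time, the frequency group of each associated collector group's target collectors keeps being adjusted in the same way. A target collector in a given frequency group collects performance indicator data at that group's preset frequency.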
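The frequency groups of FIG. 1b can be modelled as integer levels:
- level 0 for the base-frequency group;
- levels 1 to m for high-frequency groups 1 to m;
- levels -1 to -n for low-frequency groups 1 to n.

Purely for illustration, the sketch assumes that level k uses `base_hz * factor**k`; the text only says each group has its own preset frequency. Clamping at levels m and -n mirrors the rule for the largest and smallest presets. `FrequencyLadder` is a hypothetical name.

```python
class FrequencyLadder:
    """Frequency groups as integer levels: -n..-1 low, 0 base, 1..m high."""

    def __init__(self, base_hz, m, n, factor=2.0):
        # Hypothetical mapping: level k collects at base_hz * factor**k.
        self.base_hz, self.m, self.n, self.factor = base_hz, m, n, factor
        self.level = {}  # collector name -> current level

    def add(self, collectors):
        for c in collectors:
            self.level[c] = 0  # every collector starts in the base group

    def move(self, collectors, direction):
        """Shift a whole associated collector group by one level (+1/-1/0)."""
        for c in collectors:
            self.level[c] = max(-self.n, min(self.m, self.level[c] + direction))

    def frequency(self, collector):
        return self.base_hz * self.factor ** self.level[collector]
```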
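In an optional embodiment, the frequency conversion direction of each associated collector group can be determined as follows:
1. For each associated collector group, obtain the key performance indicator data collected by the group's key collectors. A key collector is a target collector responsible for a key performance indicator. Key indicator data is a subset of the data the group's target collectors can collect, for example the one or more indicators ranked highest by importance.
2. At a set statistical interval, compute the change rate of each key performance indicator and generate a global change rate from these rates.
3. Determine the group's frequency conversion direction from the global change rate. The direction can be any one of increasing the frequency, decreasing the frequency, or keeping it unchanged.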
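Each key performance indicator's change rate is computed within the set statistical interval, and this embodiment does not limit the interval. Two options:
- One collection period. For example, if the period of the current collection frequency is 1 s, the interval is 1 s. One key performance indicator sample is collected every 1 s, and the change rate between each two adjacent samples is computed.
- Multiple collection periods. For example, with 10 periods the interval is 10 s. The change rate is computed every 10 s from the 10 samples collected in those 10 s.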
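One function can handle both choices of statistical interval, one collection period or several. The text does not define the change rate precisely. This sketch assumes a relative change that compares the sample at the end of each interval with the sample at its start, so an interval of one period reduces to comparing adjacent samples. The function name and the handling of a zero starting value are also assumptions.

```python
def interval_change_rates(samples, period_s, interval_s):
    """Relative change rate of one key indicator per statistical interval.

    An interval spans k = interval_s / period_s collection periods. Its rate
    compares the sample at the end of the interval with the sample at its
    start, so with k == 1 it is simply the change between adjacent samples.
    """
    k = max(1, round(interval_s / period_s))
    rates = []
    for start in range(0, len(samples) - k, k):
        before, after = samples[start], samples[start + k]
        if before == 0:
            # Relative change from zero is undefined; report 0 or infinity.
            rates.append(0.0 if after == 0 else float("inf"))
        else:
            rates.append((after - before) / abs(before))
    return rates
```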
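For ease of description, the following notation is used:
- P_i denotes each key performance indicator.
- Δ(P_i) denotes the change in the value of key indicator P_i between two adjacent statistical intervals.
- (W_1, W_2, ..., W_n) are the different weights assigned to the key performance indicators, where W_i is the weight of the i-th key performance indicator.

The global change rate generated from the changes of the key performance indicators can then be expressed as the weighted sum
$$\mathrm{KeyDelta} = \sum_{i=1}^{n} W_i \, \Delta(P_i)$$
This embodiment also sets a statistical threshold for each key performance indicator, denoted (PT_1, PT_2, ..., PT_n), where PT_i is the minimum change threshold of the i-th key performance indicator. Within a judgment period, the change of each key performance indicator is checked against its threshold:
- If the change exceeds the threshold, the change rates of the key performance indicators are computed.
- If the change in key performance indicator data between two adjacent periods does not exceed the corresponding threshold, the data has changed little. The direction can then be set directly to 0, meaning no frequency change is needed, that is, the frequency is kept unchanged.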
在统计出各关键性能指标数据的变化率的情况下,根据上述加权求和公式,生成全局变化率。进一步,根据全局变化率,以利用变频策略,确定该关联采集器分组对应的变频方向,其一种具体的实施方式如下:将全局变化率作为变频策略的输入,并根据输出值判定该关联采集器分组对应的变频方向。其中,在输出值为0时,表示不改变目标采集器的频率;在输出值为1时,表示需要提高目标采集器的采集频率;在输出值为-1时,表示需要降低目标采集器的采集频率。进一步可选地,利用变频策略,确定输出值,一种具体的实施方式如下:计算该关联采集器分组中关键指标数据的加权变化率,为了便于描述,使用KeyDelta表示关键指标数据的加权变化率,变化率阈值的上下限为(β1,β2),其中β1≤β2,当KeyDelta>β2时,需要提高采集频率,输出值为1;当KeyDelta<β1时,需要降低采集频率,输出值为-1;当β1≤KeyDelta≤β2,保持原有采集频率,输出值为0。
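A minimal sketch of the thresholded, weighted decision rule described above follows. Two points are interpretations rather than statements from the text, and the code comments mark them as such:
- the magnitudes $|\Delta(P_i)|$ are summed, so that a sharp drop also counts as a large change;
- the group keeps its frequency when no key metric exceeds its threshold $PT_i$.

The function name and the example weights and bounds are hypothetical.

```python
from typing import Sequence


def frequency_direction(deltas: Sequence[float], weights: Sequence[float],
                        thresholds: Sequence[float],
                        beta1: float, beta2: float) -> int:
    """Return 1 (increase), -1 (decrease) or 0 (keep) for one associated group.

    deltas[i]     -- Δ(P_i), change of key metric P_i between adjacent intervals
    weights[i]    -- W_i
    thresholds[i] -- PT_i, minimum change that counts for P_i
    Assumptions: magnitudes |Δ(P_i)| are weighted, and the group keeps its
    frequency when no key metric exceeds its threshold.
    """
    if not (len(deltas) == len(weights) == len(thresholds)):
        raise ValueError("deltas, weights and thresholds must align")
    if beta1 > beta2:
        raise ValueError("require beta1 <= beta2")
    if all(abs(d) <= pt for d, pt in zip(deltas, thresholds)):
        return 0  # changes too small: no frequency change needed
    key_delta = sum(w * abs(d) for w, d in zip(weights, deltas))  # KeyDelta
    if key_delta > beta2:
        return 1
    if key_delta < beta1:
        return -1
    return 0


W, PT = [0.6, 0.4], [0.05, 0.05]  # hypothetical weights and thresholds
print(frequency_direction([0.5, 0.2], W, PT, beta1=0.1, beta2=0.3))  # 1
```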
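In an optional embodiment, the frequency-change direction can be determined more accurately in another way. For each associated collector group, the direction is determined from two inputs: the change information of the performance metric data collected by the group's target collectors, and the most recent performance analysis result obtained from the data collected by the at least two target collectors.

Further optionally, a first frequency-change direction is determined from the change information of the metric data collected by the group's target collectors. A second frequency-change direction is determined from the most recent analysis result.
- If the two directions are the same, the first direction is taken as the group's direction.
- If they differ, the current collection frequency may be left unchanged for the time being. The first direction is then tracked over several consecutive statistics intervals, and the group's direction is finally decided from those consecutive first directions.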
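The combination of the first (metric-based) and second (analysis-based) directions can be sketched as a small stateful helper. The text only says that the final direction is decided from the first directions of several consecutive intervals. Two rules in the sketch are assumptions:
- requiring `confirm_intervals` identical first directions in a row;
- falling back to the first direction when no analysis result exists yet.

The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class DirectionReconciler:
    """Combine the first (metric-based) and second (analysis-based) directions.

    Directions use 1 = increase, -1 = decrease, 0 = keep.
    Assumptions: when no analysis result exists yet (second is None) the first
    direction is used; on disagreement the frequency is held, and a first
    direction that repeats for `confirm_intervals` consecutive intervals wins.
    """

    def __init__(self, confirm_intervals: int = 3) -> None:
        if confirm_intervals < 1:
            raise ValueError("confirm_intervals must be >= 1")
        self.confirm_intervals = confirm_intervals
        self.pending: Deque[int] = deque(maxlen=confirm_intervals)

    def decide(self, first: int, second: Optional[int]) -> int:
        if second is None or first == second:
            self.pending.clear()
            return first
        # Disagreement: hold the current frequency and remember `first`.
        self.pending.append(first)
        if (len(self.pending) == self.confirm_intervals
                and len(set(self.pending)) == 1):
            self.pending.clear()
            return first
        return 0


r = DirectionReconciler(confirm_intervals=3)
print([r.decide(1, 1), r.decide(1, -1), r.decide(1, 0), r.decide(1, -1)])  # [1, 0, 0, 1]
```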
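It should be noted that two boundary cases apply when adjusting an associated collector group's collection frequency. If the group's target collectors are already at the highest preset frequency and the direction is to increase, the current collection frequency is kept unchanged. Likewise, if they are already at the lowest preset frequency and the direction is to decrease, the current collection frequency is kept unchanged.
Note that in this embodiment, each computing node 102 can group its own local target collectors, and all computing nodes 102 use the same grouping criteria. The master node 103 can therefore work in units of associated collector groups. When an associated collector group's direction is to increase or decrease, the master node sends the other computing nodes 102 a notification message carrying the corresponding indication. Following that indication, the other computing nodes 102 increase or decrease the collection frequency of the target collectors in the corresponding group, specifically to the closest preset frequency in that direction.

Further optionally, after dividing its target collectors into associated collector groups, each computing node 102 may also unify the current collection frequencies within each group in the same way. The same associated collector group then has the same current collection frequency on every computing node 102. As a result, the target collectors of the same group on all nodes are increased or decreased together and use the same collection frequency. This keeps the collection frequency of the same collector consistent across computing nodes 102 and simplifies later analysis of the same performance metric data.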
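The master-to-worker synchronization described above might look like the following sketch. It assumes that every node holds the same grouping and the same preset ladder, so a notice only needs to carry a group identifier and a direction. The message class, its fields, and the in-process "broadcast" (standing in for real network messaging) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FrequencyChangeNotice:
    """Hypothetical message the master node sends to the other nodes."""
    group_id: str
    direction: int  # 1 = increase, -1 = decrease


class NodeCollectors:
    """Per-node view: identical grouping and preset ladder on every node."""

    def __init__(self, presets: List[float], groups: Dict[str, List[str]],
                 start_index: int) -> None:
        self.presets = sorted(presets)
        self.groups = groups
        # All collectors of a group share one ladder position (unified frequency).
        self.level = {gid: start_index for gid in groups}

    def apply(self, notice: FrequencyChangeNotice) -> None:
        idx = self.level[notice.group_id] + notice.direction
        if 0 <= idx < len(self.presets):  # stay put at the max/min preset
            self.level[notice.group_id] = idx

    def frequency_of(self, collector: str) -> float:
        for gid, members in self.groups.items():
            if collector in members:
                return self.presets[self.level[gid]]
        raise KeyError(collector)


def master_broadcast(nodes: List[NodeCollectors], group_id: str,
                     direction: int) -> None:
    """nodes[0] plays the master; a notice is sent only when a change is needed."""
    if direction == 0:
        return  # keep: nothing to send
    notice = FrequencyChangeNotice(group_id, direction)
    for node in nodes:  # master applies locally, the others receive the notice
        node.apply(notice)


groups = {"cpu": ["cpu_util", "ipc"], "net": ["net_rx", "net_tx"]}
cluster = [NodeCollectors([0.5, 1.0, 2.0], groups, start_index=1) for _ in range(3)]
master_broadcast(cluster, "cpu", 1)
print([n.frequency_of("ipc") for n in cluster])     # [2.0, 2.0, 2.0]
print([n.frequency_of("net_tx") for n in cluster])  # [1.0, 1.0, 1.0]
```

Sending only a direction, rather than an absolute frequency, keeps messages small. It relies on every node having started from the same unified frequency, which is why the text unifies frequencies within each group on every node.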
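In the above embodiments of this application, in the computing cluster 100 scenario, the collection frequency of performance metric data is changed adaptively according to how the data change. This preserves collection precision, and with it the accuracy of performance analysis and of decisions based on the analysis results, while reducing collection overhead.

Among at least two computing nodes 102 executing the same job task, the master node 103 alone adapts the collection frequency and, when a change is needed, synchronizes it to the other computing nodes 102. The other computing nodes 102 do not perform the adaptation themselves. This reduces their collection overhead and further lowers the overall collection overhead.
The above embodiments describe how, when the same job task is deployed on at least two computing nodes 102, the computing node 102 acting as the master node 103 adjusts the collection frequency. A job task may also be deployed on a single computing node 102. In that case no master node 103 needs to be selected, and the computing node 102 executing the job task may adjust the collection frequency of its local target collectors by itself, as follows:
- While executing the job task, it starts at least two target collectors related to the job task. These collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency.
- It divides the target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect.
- It adjusts the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors. Each group's target collectors then continue collecting at the adjusted frequency.

For the detailed implementation of each step, refer to the description of the foregoing embodiments; details are not repeated here.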
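Fig. 2 is a schematic flowchart of a data collection method for the computing cluster 100 provided by an exemplary embodiment of this application. As shown in Fig. 2, the method includes:
201. While executing a job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency. The job task is deployed on at least two computing nodes in the computing cluster 100.
202. When the node determines that it is itself the master node among the at least two computing nodes, adjust the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data.
203. Notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that those target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.
In this embodiment, adjusting the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data includes:
- dividing the target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect;
- adjusting the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors.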
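The text does not say how the correlation between performance metrics is determined. One possible data-driven criterion, shown here purely as an assumption, works as follows:
- put two collectors in the same group when the Pearson correlation of their recent samples reaches a threshold;
- merge such pairs into groups with a union-find structure.

A fixed, expert-defined mapping of metrics to groups would satisfy the description equally well. All names, the threshold and the sample series are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def group_by_correlation(series: Dict[str, Sequence[float]],
                         threshold: float = 0.8) -> List[List[str]]:
    """Union collectors whose metric series have |r| >= threshold."""
    parent = {name: name for name in series}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in combinations(series, 2):
        if abs(pearson(series[a], series[b])) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in series:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


series = {
    "cpu_util": [10, 20, 30, 40],
    "ipc": [1.0, 2.0, 3.0, 4.1],
    "net_rx": [5, 1, 5, 1],
    "net_tx": [4, 1, 4, 1],
}
groups = group_by_correlation(series)
print(groups)  # [['cpu_util', 'ipc'], ['net_rx', 'net_tx']]
```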
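In an optional embodiment, adjusting the collection frequency of each associated collector group separately includes the following. For each associated collector group, a frequency-change direction is determined from the change information of the performance metric data collected by the group's target collectors. The collection frequency currently used by those target collectors is then adjusted to the preset frequency closest in that direction, the preset frequencies being ordered from low to high.
Further optionally, the adjustment of each associated collector group's collection frequency further includes two boundary cases. If the group's target collectors are at the highest preset frequency and the direction is to increase, the current collection frequency is kept unchanged. If they are at the lowest preset frequency and the direction is to decrease, the current collection frequency is kept unchanged.
Further optionally, the method further includes a step performed before each associated collector group's frequency-change direction is determined: if the current collection frequencies of the target collectors in the group differ, they are adjusted to the same collection frequency.
In an optional embodiment, adjusting the current collection frequencies of the target collectors in the associated collector group to the same collection frequency includes setting all of them to one of the following:
- the average of the group's current collection frequencies;
- the maximum of those frequencies;
- the minimum of those frequencies.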
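The three unification options (average, maximum, minimum) map directly onto a small helper. The function name and the `mode` strings are hypothetical. The average need not coincide with one of the preset frequencies, and the text does not say whether it is snapped to one.

```python
from statistics import mean
from typing import Dict


def unify_group_frequency(freqs: Dict[str, float],
                          mode: str = "mean") -> Dict[str, float]:
    """Set every collector in one associated group to a common frequency.

    mode: "mean", "max" or "min", the three options listed in the text.
    """
    if not freqs:
        return {}
    values = list(freqs.values())
    if len(set(values)) == 1:
        return dict(freqs)  # already identical: nothing to do
    pick = {"mean": mean, "max": max, "min": min}
    if mode not in pick:
        raise ValueError(f"unknown mode: {mode}")
    target = pick[mode](values)
    return {name: target for name in freqs}


print(unify_group_frequency({"cpu_util": 1.0, "ipc": 2.0}, "max"))  # {'cpu_util': 2.0, 'ipc': 2.0}
```

Choosing the maximum favors precision, the minimum favors lower overhead, and the mean is a compromise between the two.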
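In an optional embodiment, determining each associated collector group's frequency-change direction from the change information of its performance metric data includes:
- obtaining the key performance metric data collected by the group's key collectors, a key collector being a target collector responsible for collecting a key performance metric;
- computing the change rate of each key performance metric at a set statistics interval, and generating a global change rate from these change rates;
- determining the group's direction from the global change rate, the direction being any one of increase, decrease and keep unchanged.
In an optional embodiment, the frequency-change direction of each associated collector group is determined from two inputs: the change information of the performance metric data collected by the group's target collectors, and the most recent performance analysis result obtained from the data collected by the at least two target collectors.
It should be noted that, for the principles of the specific implementation of each step of this data collection method for the computing cluster 100, refer to the corresponding content of the computing cluster 100 embodiment above; details are not repeated here.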
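Fig. 3 is a schematic flowchart of another data collection method for the computing cluster 100 provided by an exemplary embodiment of this application. As shown in Fig. 3, the method includes:
301. While executing a job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency.
302. Divide the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect.
303. Adjust the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors, so that the target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.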
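Steps 301 to 303 can be combined into a single-node control loop. The sketch below rests on three simplifying assumptions:
- the grouping of step 302 is given as input;
- the first member of each group stands in for its key collector;
- a plain relative-change rule with two bounds replaces the weighted rule described earlier.

All names, bounds and sample values are hypothetical.

```python
from typing import Callable, Dict, List

Collector = Callable[[], float]  # hypothetical: one call returns one sample


class LocalFrequencyController:
    """Single-node loop for steps 301-303; no master node is involved.

    Simplifications (assumptions, not from the text): the grouping of step 302
    is passed in, the first member of each group acts as its key collector,
    and a relative-change rule with bounds `low`/`high` picks the direction.
    """

    def __init__(self, collectors: Dict[str, Collector],
                 groups: Dict[str, List[str]], presets: List[float],
                 base_index: int, low: float = 0.02, high: float = 0.2) -> None:
        self.collectors = collectors
        self.groups = groups
        self.presets = sorted(presets)
        self.level = {gid: base_index for gid in groups}  # shared per group
        self.last: Dict[str, float] = {}
        self.low, self.high = low, high

    def sample(self) -> Dict[str, float]:
        # Step 301: each target collector takes one sample.
        return {name: read() for name, read in self.collectors.items()}

    def adjust(self, samples: Dict[str, float]) -> Dict[str, float]:
        # Step 303: choose a direction per group, move one preset, clamp at the ends.
        for gid, members in self.groups.items():
            key = members[0]
            prev = self.last.get(key)
            self.last[key] = samples[key]
            if prev is None:
                continue  # first round: no change information yet
            change = abs(samples[key] - prev) / max(abs(prev), 1e-9)
            step = 1 if change > self.high else -1 if change < self.low else 0
            top = len(self.presets) - 1
            self.level[gid] = min(max(self.level[gid] + step, 0), top)
        # Every collector in a group uses the group's preset frequency.
        return {c: self.presets[self.level[g]]
                for g, members in self.groups.items() for c in members}


cpu = iter([50, 80, 81, 81])  # jumps once, then settles
net = iter([10, 10, 10, 10])  # flat
ctl = LocalFrequencyController(
    {"cpu_util": lambda: float(next(cpu)), "net_rx": lambda: float(next(net))},
    {"compute": ["cpu_util"], "network": ["net_rx"]},
    presets=[0.5, 1.0, 2.0], base_index=1)
history = [ctl.adjust(ctl.sample()) for _ in range(4)]
print(history[1])  # {'cpu_util': 2.0, 'net_rx': 0.5}
```

In a real collector each group would sample on its own timer at its current frequency. The lock-step `sample()`/`adjust()` rounds here only keep the example deterministic.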
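It should be noted that, for the principles of the specific implementation of each step of this data collection method for the computing cluster 100, refer to the corresponding content of the computing cluster 100 embodiment above; details are not repeated here.
Fig. 4 is a schematic structural diagram of a data collection device provided by an exemplary embodiment of this application. As shown in Fig. 4, the device includes:
a starting module 41, configured to start at least two target collectors related to a job task while the job task is executed, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster 100;
an adjustment module 42, configured to adjust the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data, when the node determines that it is itself the master node among the at least two computing nodes;
a notification module 43, configured to notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that those target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.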
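In an optional embodiment, to adjust the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data, the adjustment module 42 is specifically configured to:
- divide the target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect;
- adjust the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors.
In an optional embodiment, to adjust the collection frequency of each associated collector group separately, the adjustment module 42 is specifically configured to:
- determine each group's frequency-change direction from the change information of the metric data collected by its target collectors;
- adjust the collection frequency currently used by the group's target collectors to the preset frequency closest in that direction, the preset frequencies being ordered from low to high.
Further optionally, when adjusting an associated collector group's collection frequency, the adjustment module 42 is further configured to keep the current collection frequency unchanged in two cases: if the group's target collectors are at the highest preset frequency and the direction is to increase, or if they are at the lowest preset frequency and the direction is to decrease.
Further optionally, before determining each associated collector group's frequency-change direction, the adjustment module 42 is further configured to unify the group's collection frequencies: if the current collection frequencies of the group's target collectors differ, it adjusts them to the same collection frequency.
In an optional embodiment, to adjust the current collection frequencies of the group's target collectors to the same collection frequency, the adjustment module 42 is specifically configured to set all of them to one of the following:
- the average of the group's current collection frequencies;
- the maximum of those frequencies;
- the minimum of those frequencies.
In an optional embodiment, to determine each associated collector group's frequency-change direction, the adjustment module 42 is specifically configured to:
- obtain the key performance metric data collected by the group's key collectors, a key collector being a target collector responsible for collecting a key performance metric;
- compute the change rate of each key performance metric at a set statistics interval and generate a global change rate from these change rates;
- determine the group's direction from the global change rate, the direction being any one of increase, decrease and keep unchanged.
In an optional embodiment, to determine each associated collector group's frequency-change direction, the adjustment module 42 is specifically configured to use two inputs: the change information of the metric data collected by the group's target collectors, and the most recent performance analysis result obtained from the data collected by the at least two target collectors.
It should be noted that the data collection device provided by this embodiment can implement the technical solution described in the method embodiment of Fig. 2. For the principles of the specific implementation of each module or unit, refer to the corresponding content of the computing cluster 100 embodiment shown in Fig. 1a and of the method embodiment of Fig. 2; details are not repeated here.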
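Fig. 5 is a schematic structural diagram of another data collection device provided by an exemplary embodiment of this application. As shown in Fig. 5, the device includes:
a starting module 51, configured to start at least two target collectors related to a job task while the job task is executed, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency;
a grouping module 52, configured to divide the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect;
an adjustment module 53, configured to adjust the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors, so that the target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.
It should be noted that the data collection device provided by this embodiment can implement the technical solution described in the method embodiment of Fig. 3. For the principles of the specific implementation of each module or unit, refer to the corresponding content of the computing cluster 100 embodiment shown in Fig. 1a and of the method embodiment of Fig. 3; details are not repeated here.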
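Fig. 6 is a schematic structural diagram of a computing node provided by an exemplary embodiment of this application. The computing node may serve as any computing node in the above computing cluster 100. As shown in Fig. 6, it includes a memory 60a and a processor 60b. The memory 60a is configured to store a computer program. The processor 60b is coupled to the memory 60a and configured to execute the computer program in order to:
while executing a job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster 100;
when the node determines that it is itself the master node among the at least two computing nodes, adjust the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data;
notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that those target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.
In an optional embodiment, to adjust the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data, the processor 60b is specifically configured to:
- divide the target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect;
- adjust the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors.
In an optional embodiment, to adjust the collection frequency of each associated collector group separately, the processor 60b is specifically configured to:
- determine each group's frequency-change direction from the change information of the metric data collected by its target collectors;
- adjust the collection frequency currently used by the group's target collectors to the preset frequency closest in that direction, the preset frequencies being ordered from low to high.
In an optional embodiment, when adjusting an associated collector group's collection frequency, the processor 60b is further configured to keep the current collection frequency unchanged in two cases: if the group's target collectors are at the highest preset frequency and the direction is to increase, or if they are at the lowest preset frequency and the direction is to decrease.
In an optional embodiment, before determining each associated collector group's frequency-change direction, the processor 60b is further configured to unify the group's collection frequencies: if the current collection frequencies of the group's target collectors differ, it adjusts them to the same collection frequency.
In an optional embodiment, to adjust the current collection frequencies of the group's target collectors to the same collection frequency, the processor 60b is specifically configured to set all of them to one of the following:
- the average of the group's current collection frequencies;
- the maximum of those frequencies;
- the minimum of those frequencies.
In an optional embodiment, to determine each associated collector group's frequency-change direction, the processor 60b is specifically configured to:
- obtain the key performance metric data collected by the group's key collectors, a key collector being a target collector responsible for collecting a key performance metric;
- compute the change rate of each key performance metric at a set statistics interval and generate a global change rate from these change rates;
- determine the group's direction from the global change rate, the direction being any one of increase, decrease and keep unchanged.
In an optional embodiment, to determine each associated collector group's frequency-change direction, the processor 60b is specifically configured to use two inputs: the change information of the metric data collected by the group's target collectors, and the most recent performance analysis result obtained from the data collected by the at least two target collectors.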
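The above embodiments describe how, when the same job task is deployed on at least two computing nodes, the computing node acting as the master node adjusts the collection frequency. A job task may also be deployed on a single computing node. In that case no master node needs to be selected, and the processor 60b of the computing node executing the job task may further perform the following operations:
while executing the job task, start at least two target collectors related to the job task, so that they collect at least two kinds of performance metric data of the computing node where they reside at the current collection frequency;
divide the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics they collect;
adjust the collection frequency of each associated collector group separately, according to the change information of the metric data collected by that group's target collectors, so that the target collectors continue collecting at least two kinds of performance metric data of their computing node at the adjusted collection frequency.
Further, as shown in Fig. 6, the computing node also includes other components such as a communication component 60c and a power component 60d. Fig. 6 shows only some components schematically; this does not mean that the computing node includes only the components shown in Fig. 6.
It should be noted that the computing node provided by this embodiment can implement the technical solutions described in the method embodiment of Fig. 2 or Fig. 3. For the principles of the specific implementation of each module or unit, refer to the corresponding content of the computing cluster 100 embodiment shown in Fig. 1a; details are not repeated here.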
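An exemplary embodiment of this application provides a computer-readable storage medium storing a computer program/instructions which, when executed by a processor, cause the processor to implement the steps of the methods described above; details are not repeated here.
An exemplary embodiment of this application provides a computer program product including a computer program/instructions which, when executed by a processor, cause the processor to implement the steps of the methods described above; details are not repeated here.
The communication component in the above embodiments is configured to facilitate wired or wireless communication between the node where it is located and other nodes. That node may access a wireless network based on a communication standard, such as WiFi, a mobile communication network such as 2G, 3G, 4G/LTE or 5G, or a combination thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
The display in the above embodiments includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation.
The power component in the above embodiments supplies power to the various components of the node where it is located. It may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for that node.
The audio component in the above embodiments may be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC). The microphone is configured to receive external audio signals when the node where the audio component is located is in an operating mode such as a call mode, a recording mode or a speech recognition mode. The received audio signals may be further stored in the memory or sent via the communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals.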
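Those skilled in the art should understand that embodiments of this application may be provided as a method, a system or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM and optical storage.
This application is described with reference to flowcharts and/or block diagrams of methods, nodes (systems) and computer program products according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing node to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing node to operate in a particular manner. The instructions stored in that memory then produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing node, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable node then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing node includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM) and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage;
- magnetic cassette tape, magnetic tape or disk storage or other magnetic storage nodes;
- any other non-transmission medium that can be used to store information accessible by a computing node.

As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any other variant thereof are intended to cover non-exclusive inclusion. A process, method, commodity or node that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or node. Without further limitation, an element defined by the phrase "includes a ..." does not exclude the existence of additional identical elements in the process, method, commodity or node that includes it.
The above are merely embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.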

Claims (14)

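  1. A computing cluster, characterized by comprising: a management and control node and a plurality of computing nodes, each computing node being deployed with a plurality of collectors, different collectors being used to collect different performance metrics;
    the management and control node is configured to deploy the same job task on at least two of the plurality of computing nodes and to control the at least two computing nodes to execute the job task;
    each computing node is configured to: while executing the job task, start at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance metric data of the computing node where they reside at a current collection frequency;
    when determining that it is itself the master node among the at least two computing nodes, adjust the collection frequency of the at least two target collectors according to change information of the at least two kinds of performance metric data;
    and notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors on them, so that the at least two target collectors on the other computing nodes continue to collect at least two kinds of performance metric data of the computing node where they reside at the adjusted collection frequency.
  2. A data collection method for a computing cluster, applied to any computing node in the computing cluster, characterized in that the method comprises:
    while executing a job task, starting at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance metric data of the computing node where they reside at a current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster;
    when determining that it is itself the master node among the at least two computing nodes, adjusting the collection frequency of the at least two target collectors according to change information of the at least two kinds of performance metric data;
    notifying the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that the at least two target collectors on the other computing nodes continue to collect at least two kinds of performance metric data of the computing node where they reside at the adjusted collection frequency.
  3. The method according to claim 2, characterized in that adjusting the collection frequency of the at least two target collectors according to the change information of the at least two kinds of performance metric data comprises:
    dividing the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics that the at least two target collectors are responsible for collecting;
    adjusting, separately for each associated collector group, the collection frequency of the target collectors in that group according to change information of the performance metric data collected by the target collectors in that group.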
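  4. The method according to claim 3, characterized in that adjusting, separately for each associated collector group, the collection frequency of the target collectors in that group according to change information of the performance metric data collected by the target collectors in that group comprises:
    for each associated collector group, determining a frequency-change direction corresponding to the associated collector group according to change information of the performance metric data collected by the target collectors in the group;
    adjusting the collection frequency currently used by the target collectors in the associated collector group to the preset frequency closest in the frequency-change direction among a plurality of preset frequencies, the plurality of preset frequencies being ordered from low to high.
  5. The method according to claim 4, characterized in that, for each associated collector group, adjusting the collection frequency of the target collectors in the group further comprises:
    if the current collection frequency of the target collectors in the associated collector group is the highest preset frequency and the frequency-change direction is to increase the frequency, keeping the current collection frequency unchanged;
    if the current collection frequency of the target collectors in the associated collector group is the lowest preset frequency and the frequency-change direction is to decrease the frequency, keeping the current collection frequency unchanged.
  6. The method according to claim 4, characterized in that, before determining, for each associated collector group, the frequency-change direction corresponding to the associated collector group according to change information of the performance metric data collected by the target collectors in the group, the method further comprises:
    for each associated collector group, if the current collection frequencies of the target collectors in the associated collector group differ, adjusting the current collection frequencies of the target collectors in the group to the same collection frequency.
  7. The method according to claim 6, characterized in that adjusting the current collection frequencies of the target collectors in the associated collector group to the same collection frequency comprises:
    adjusting the current collection frequency of each target collector in the associated collector group to the average of the current collection frequencies of the target collectors in the group;
    or
    adjusting the current collection frequency of each target collector in the associated collector group to the maximum of the current collection frequencies of the target collectors in the group;
    or
    adjusting the current collection frequency of each target collector in the associated collector group to the minimum of the current collection frequencies of the target collectors in the group.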
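  8. The method according to claim 4, characterized in that, for each associated collector group, determining the frequency-change direction corresponding to the associated collector group according to change information of the performance metric data collected by the target collectors in the group comprises:
    for each associated collector group, obtaining key performance metric data collected by key collectors in the associated collector group, a key collector being a target collector responsible for collecting a key performance metric;
    computing, at a set statistics interval, the change rate of each item of key performance metric data, and generating a global change rate from the change rates of the key performance metric data;
    determining, according to the global change rate, the frequency-change direction corresponding to the associated collector group, the frequency-change direction being any one of increasing the frequency, decreasing the frequency and keeping it unchanged.
  9. The method according to any one of claims 4 to 8, characterized in that, for each associated collector group, determining the frequency-change direction corresponding to the associated collector group according to change information of the performance metric data collected by the target collectors in the group comprises:
    for each associated collector group, determining the frequency-change direction corresponding to the associated collector group according to change information of the performance metric data collected by the target collectors in the group and the most recent performance analysis result obtained from the performance metric data collected by the at least two target collectors.
  10. A data collection method for a computing cluster, applied to any computing node in the computing cluster, characterized in that the method comprises:
    while executing a job task, starting at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance metric data of the computing node where they reside at a current collection frequency;
    dividing the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics that the at least two target collectors are responsible for collecting;
    adjusting, separately for each associated collector group, the collection frequency of the target collectors in that group according to change information of the performance metric data collected by the target collectors in that group, so that the target collectors continue to collect at least two kinds of performance metric data of the computing node where they reside at the adjusted collection frequency.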
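  11. A data collection device, applied to any computing node in a computing cluster, characterized in that the device comprises:
    a starting module, configured to, while a job task is being executed, start at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance metric data of the computing node where they reside at a current collection frequency, the job task being deployed on at least two computing nodes in the computing cluster;
    an adjustment module, configured to, when determining that its own node is the master node among the at least two computing nodes, adjust the collection frequency of the at least two target collectors according to change information of the at least two kinds of performance metric data;
    a notification module, configured to notify the other computing nodes among the at least two computing nodes to adjust the collection frequency of the at least two target collectors deployed on them, so that the at least two target collectors on the other computing nodes continue to collect at least two kinds of performance metric data of the computing node where they reside at the adjusted collection frequency.
  12. A data collection device, applied to any computing node in a computing cluster, characterized in that the device comprises:
    a starting module, configured to, while a job task is being executed, start at least two target collectors related to the job task, so that the at least two target collectors collect at least two kinds of performance metric data of the computing node where they reside at a current collection frequency;
    a grouping module, configured to divide the at least two target collectors into at least two associated collector groups according to the correlation among the performance metrics that the at least two target collectors are responsible for collecting;
    an adjustment module, configured to adjust, separately for each associated collector group, the collection frequency of the target collectors in that group according to change information of the performance metric data collected by the target collectors in that group, so that the target collectors continue to collect at least two kinds of performance metric data of the computing node where they reside at the adjusted collection frequency.
  13. A computing node applicable to a computing cluster, characterized in that the computing node comprises a memory and a processor; the memory is configured to store a computer program; the processor is coupled to the memory and configured to execute the computer program to perform the steps of the method according to any one of claims 2 to 10.
  14. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the processor is caused to implement the steps of the method according to any one of claims 2 to 10.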
PCT/CN2023/093405 2022-05-17 2023-05-11 Computing cluster and data collection method, device and storage medium therefor WO2023221846A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210541467.1 2022-05-17
CN202210541467.1A CN115080341A (zh) 2022-05-17 2022-05-17 Computing cluster and data collection method, device and storage medium therefor

Publications (1)

Publication Number Publication Date
WO2023221846A1 true WO2023221846A1 (zh) 2023-11-23

Family

ID=83249486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093405 WO2023221846A1 (zh) 2022-05-17 2023-05-11 Computing cluster and data collection method, device and storage medium therefor

Country Status (2)

Country Link
CN (1) CN115080341A (zh)
WO (1) WO2023221846A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080341A (zh) * 2022-05-17 2022-09-20 Alibaba (China) Co., Ltd. Computing cluster and data collection method, device and storage medium therefor

Citations (7)

Publication number Priority date Publication date Assignee Title
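CN103634092A (zh) * 2012-08-21 2014-03-12 Emerson Network Power - Embedded Computing, Inc. High-precision timer in a CPU cluster
US20140241175A1 (en) * 2013-02-22 2014-08-28 Apple Inc. Methods for Determining Relative Locations of Multiple Nodes in a Wireless Network
CN105682121A (zh) * 2016-01-29 2016-06-15 China United Network Communications Group Co., Ltd. Data collection method for a sensor network, gateway and data collection system
KR101992303B1 (ko) * 2018-03-18 2019-06-24 박성근 Data collection, analysis and monitoring system for a smart sensor network
CN110990227A (zh) * 2019-12-04 2020-04-10 Harbin Engineering University Performance collection and monitoring system for numerical-tank application characteristics and operating method thereof
CN111104303A (zh) * 2019-12-13 2020-05-05 Inspur Suzhou Intelligent Technology Co., Ltd. Server metric data collection method, device and medium
CN115080341A (zh) * 2022-05-17 2022-09-20 Alibaba (China) Co., Ltd. Computing cluster and data collection method, device and storage medium therefor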

Cited By (2)

Publication number Priority date Publication date Assignee Title
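CN117784697A (zh) * 2024-01-31 2024-03-29 Chengdu Qinchuan IoT Technology Co., Ltd. Intelligent control method for data collection terminals of a smart gas pipeline network, and IoT system
CN117784697B (zh) * 2024-01-31 2024-05-24 Chengdu Qinchuan IoT Technology Co., Ltd. Intelligent control method for data collection terminals of a smart gas pipeline network, and IoT system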

Also Published As

Publication number Publication date
CN115080341A (zh) 2022-09-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23806798

Country of ref document: EP

Kind code of ref document: A1