WO2023178923A1 - Intelligent monitoring micro-adjustment method and apparatus, device, and storage medium - Google Patents

Info

Publication number
WO2023178923A1
Authority
WO
WIPO (PCT)
Prior art keywords
hardware
monitoring
target server
information
alarm
Prior art date
Application number
PCT/CN2022/115312
Other languages
French (fr)
Chinese (zh)
Inventor
刘成平
张东
郭锋
曹永奇
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023178923A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Definitions

  • The present application relates to the field of computer monitoring technology and, more specifically, to an intelligent monitoring fine-adjustment method, apparatus, device and storage medium.
  • Data center monitoring keeps growing in scale, and using a monitoring platform for unified operation and maintenance management has become routine.
  • An intelligent monitoring fine-adjustment method, including:
  • in response to a change in the hardware of a target server, obtaining the hardware information corresponding to the changed hardware of the target server as target information; wherein the target server is any server that needs to be monitored;
  • based on the target information, matching the corresponding monitoring items from a preset monitoring item storage system, and adjusting the monitoring item information of the target server accordingly based on the matched monitoring items; wherein the monitoring item storage system stores all monitoring items that can be used when monitoring a server; and
  • monitoring the target server based on the monitoring item information of the target server.
  • An intelligent monitoring fine-adjustment apparatus, including:
  • an information acquisition module configured to: in response to a change in the hardware of the target server, obtain the hardware information corresponding to the changed hardware of the target server as target information; wherein the target server is any server that needs to be monitored;
  • a monitoring adjustment module configured to: based on the target information, match the corresponding monitoring items from the preset monitoring item storage system, and adjust the monitoring item information of the target server accordingly based on the matched monitoring items; wherein the monitoring item storage system stores all monitoring items that can be used when monitoring a server; and
  • a hardware monitoring module configured to monitor the target server based on the monitoring item information of the target server.
  • An intelligent monitoring fine-adjustment device, including:
  • a memory for storing computer-readable instructions; and
  • one or more processors configured to implement the steps of the above intelligent monitoring fine-adjustment method when executing the computer-readable instructions.
  • A non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above intelligent monitoring fine-adjustment method.
  • Figure 1 is a flow chart of an intelligent monitoring fine-adjustment method provided by one or more embodiments of the present application;
  • Figure 2 is a schematic diagram of a specific implementation of an intelligent monitoring fine-adjustment method provided by one or more embodiments of the present application;
  • Figure 3 is a schematic structural diagram of an intelligent monitoring fine-adjustment apparatus provided by one or more embodiments of the present application;
  • Figure 4 is an internal structure diagram of an intelligent monitoring fine-adjustment device provided by one or more embodiments of the present application.
  • Figure 1 shows a flow chart of an intelligent monitoring fine-adjustment method provided by an embodiment of the present application. Specifically, it may include:
  • S11: In response to a change in the hardware of the target server, obtain the hardware information corresponding to the changed hardware of the target server as target information; wherein the target server is any server that needs to be monitored.
  • The embodiment of the present application uses a monitoring system that includes a monitoring platform to monitor servers. Every server that has been included in the monitoring system needs to be monitored, so when the hardware of any of these servers changes, the corresponding monitoring item information is adjusted based on the changed hardware of that server.
  • The hardware of the server can include an HBA (Host Bus Adapter) card, a switch, a router, etc.
  • The hardware information of a piece of hardware in the server is information that can be used to identify that hardware; it can include the manufacturer, the SN (Serial Number), performance parameters, etc.
  • S12: Based on the target information, match the corresponding monitoring items from the preset monitoring item storage system, and adjust the monitoring item information of the target server accordingly based on the matched monitoring items; wherein the monitoring item storage system stores all monitoring items that can be used when monitoring a server.
  • The monitoring item storage system is a preset storage system. Its main function is to store all monitoring items that can be used by the monitoring system.
  • The monitoring items are customized, minimized monitoring components, including CPU monitoring items, MEM (memory) monitoring items, hard disk monitoring items, GPU monitoring items, network card monitoring items, power supply monitoring items, fan monitoring items, backplane monitoring items, monitoring items for external hardware such as HBA cards, etc.
  • The CPU monitoring items can include monitoring items for different manufacturers and configurations. Each CPU can use a different monitoring item according to its manufacturer, number of cores, main frequency and other parameters.
  • The monitoring items for other hardware are similar: different monitoring items are defined for hardware of the same type according to the manufacturer, configuration and other parameters that affect performance.
  • The monitoring item information of any server is the information on the monitoring items matched by each piece of hardware in that server, that is, the information on each monitoring item for which data collection needs to be performed for that server. Specifically, it can be the script of the corresponding monitoring item; other settings can also be made according to actual needs, all of which fall within the protection scope of this application.
  • When the hardware of any server changes, the monitoring items of the changed hardware are matched from the monitoring item storage system, which stores all monitoring items that can be used when monitoring a server, and the monitoring item information of that server is adjusted accordingly, so that the monitoring item information matches the hardware after the change; the server is then monitored based on its monitoring item information.
  • This application can therefore automatically match the corresponding monitoring items and adjust the monitoring item information when a server's hardware changes, so that the server's monitoring item information matches the server after the hardware change, and the server is then effectively monitored based on that information. When the hardware changes, the updated hardware can thus be quickly incorporated into the unified operation and maintenance of the infrastructure, enabling users to carry out integrated automatic operation and maintenance management more conveniently and quickly.
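  • The matching step in S12 can be pictured as a keyed lookup. The following is a minimal sketch only: the `HardwareInfo` fields, the store contents and the item names are hypothetical, and the source does not say how the storage system indexes its monitoring items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareInfo:
    kind: str          # hardware type, e.g. "cpu", "disk", "hba"
    manufacturer: str  # vendor name
    model: str         # stands in for configuration / performance parameters
    serial: str        # SN, identifies the individual part


# Hypothetical monitoring item store: (kind, manufacturer, model) -> item name.
MONITORING_ITEM_STORE = {
    ("disk", "A", "500G"): "disk_A_500G",
    ("disk", "B", "1T"): "disk_B_1T",
    ("cpu", "A", "16c-2.4GHz"): "cpu_A_16c",
}


def match_monitoring_item(hw, store=MONITORING_ITEM_STORE):
    """Return the stored monitoring item matching this hardware, or None."""
    return store.get((hw.kind, hw.manufacturer, hw.model))
```

  • Keying on type, manufacturer and configuration mirrors the statement that hardware of the same type gets different monitoring items per manufacturer and configuration; the serial number identifies the part but does not take part in the match.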
  • In the intelligent monitoring fine-adjustment method, adjusting the monitoring item information of the target server based on the matched monitoring items may include:
  • in response to the hardware change of the target server being the addition of hardware, adding the information corresponding to the matched monitoring items to the monitoring item information of the target server; or, in response to the hardware change of the target server being the deletion of hardware, deleting the information corresponding to the matched monitoring items from the monitoring item information of the target server.
  • Changes to hardware may be additions or deletions. When new hardware is added to any server, the information corresponding to the matched monitoring items can be automatically added to the monitoring item information of that server; when hardware is deleted from any server, the information of the monitoring items corresponding to the deleted hardware can be automatically deleted from the monitoring item information of that server. In this way the monitoring range of the server is updated simply and quickly without manual adjustment: monitoring starts when hardware is plugged in and stops when it is pulled out, avoiding the ineffective monitoring caused by manual omissions.
  • The monitoring item information can be a corresponding script.
  • In that case, adding the information corresponding to a monitoring item to the monitoring item information of the corresponding server means adding the monitoring item script to all monitoring item scripts of that server, and deleting the information corresponding to a monitoring item from the monitoring item information of the corresponding server means deleting the monitoring item script from all monitoring item scripts of that server; of course, other settings based on actual needs are also within the scope of protection of this application.
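  • The add and delete cases above can be sketched as follows. This is an illustration under assumptions: monitoring item information is modelled as a mapping from hardware SN to script name, and the function name and the `"added"`/`"deleted"` labels are hypothetical.

```python
def adjust_monitoring_items(server_items, change, serial, script=None):
    """Return a new SN -> script mapping after one hardware change.

    server_items: current mapping of hardware SN to monitoring item script.
    change: "added" or "deleted" (hypothetical labels).
    script: the matched monitoring item script, required when adding.
    """
    items = dict(server_items)  # copy so the caller's mapping is untouched
    if change == "added":
        if script is None:
            raise ValueError("adding hardware requires a matched script")
        items[serial] = script
    elif change == "deleted":
        items.pop(serial, None)  # no error if the SN was never monitored
    else:
        raise ValueError(f"unknown change type: {change}")
    return items
```

  • Tracking scripts per serial number means that removing one of two identical parts leaves the other part's monitoring in place.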
  • New hardware may be new hardware with the same configuration (new hardware with the same configuration and brand as hardware that already exists in the server) or new hardware with a heterogeneous configuration (new hardware whose configuration and/or brand differs from the hardware that already exists in the server).
  • In the embodiment of this application, when new hardware with the same configuration is added, the monitoring item script of the newly added hardware is automatically loaded and added to all monitoring item scripts of the corresponding server; the pressure on the monitoring system itself (i.e. the system pressure) after the new hardware is added can also be detected.
  • The indicators of system pressure can include the monitoring system's own CPU load, its own memory utilization, its own network port rate, etc. When the increase in system pressure reaches the pressure increase threshold, corresponding expansion configuration prompt information is output.
  • The pressure increase threshold can be set to 10%.
  • Other settings can also be made according to actual needs, all of which fall within the protection scope of this application.
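  • A minimal sketch of the pressure check, under stated assumptions: every indicator is expressed as a percentage, the increase is measured in percentage points, and the largest increase across indicators is compared with the threshold. The source gives only the 10% figure, so the indicator keys, the aggregation and the prompt text are hypothetical.

```python
PRESSURE_INCREASE_THRESHOLD = 10.0  # the text suggests 10%; read here as points


def pressure_increase(before, after):
    """Largest increase across all indicators, in percentage points."""
    return max(after[name] - before[name] for name in before)


def expansion_prompt(before, after, threshold=PRESSURE_INCREASE_THRESHOLD):
    """Return a prompt string if the increase reaches the threshold, else None."""
    increase = pressure_increase(before, after)
    if increase >= threshold:
        return (f"System pressure rose by {increase:.1f} points after adding "
                "hardware; consider expanding the monitoring system.")
    return None
```

  • For example, a monitoring system whose CPU load goes from 40% to 52% after hardware is added would trigger the prompt, while a rise from 40% to 45% would not.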
  • Adjusting the alarm thresholds of the monitoring items corresponding to the added hardware based on the alarm setting information may include:
  • in response to the alarm setting information indicating that the user has not independently modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting the threshold-unchanged policy and, based on that policy, keeping the currently used alarm threshold of the corresponding monitoring item unchanged; or, in response to the alarm setting information indicating that the user has independently modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting the margin-unchanged policy and, based on that policy, setting the alarm threshold of the monitoring item corresponding to the added hardware, so that the adjusted alarm threshold and the corresponding alarm threshold used before the adjustment correspond to the same numerical range of the monitored item that triggers an alarm.
  • New heterogeneous configuration hardware includes hardware of a different brand, a different configuration, etc. For new heterogeneous configuration hardware, parameters such as alarm thresholds can be intelligently adjusted according to the user's choices, and the adjustment policies can include the threshold-unchanged policy, the margin-unchanged policy, etc. Specifically, if it is detected that the user has not independently modified the alarm threshold of the newly added hardware but uses the monitoring system's default alarm threshold, this indicates that the user has some tolerance regarding the alarm threshold (that is, the alarm setting information indicates that the user has not independently modified the alarm threshold of the monitoring item corresponding to the added hardware); in this case the monitoring system automatically selects the threshold-unchanged policy by default.
  • If it is detected that the user has actively modified the alarm threshold of the newly added hardware, this indicates that the user has a deep understanding of the hardware and has made dedicated settings accordingly (that is, the alarm setting information indicates that the user has independently modified the alarm threshold of the monitoring item corresponding to the added hardware); in this case the monitoring system automatically selects the margin-unchanged policy by default. To guard against errors in this automatic selection, embodiments of the present application can also provide prompts and a window for manually modifying the corresponding alarm threshold. By adjusting the alarm thresholds of monitoring items in this way, the embodiment of the present application brings the alarms for monitoring items closer to the actual needs of users.
  • Under the threshold-unchanged policy, after hardware of a different brand or configuration is added, the monitoring item corresponding to the new hardware is automatically updated for the new hardware, while the currently used alarm threshold of the monitoring item remains unchanged. For example, a 500G hard drive from manufacturer A raises a serious alarm at a utilization rate of 80%. When it is replaced with a hard drive from manufacturer B, the monitoring system, because hardware from different manufacturers differs, automatically replaces the hard disk monitoring item of manufacturer A with the hard disk monitoring item of manufacturer B, but the serious alarm threshold is still 80% and is not changed.
  • Under the margin-unchanged policy, after hardware of a different brand or configuration is added, the monitoring item corresponding to the new hardware is automatically updated for the new hardware, and the alarm threshold of the monitoring item is automatically adjusted according to the customer's requirement that the margin remain unchanged, so that the adjusted alarm threshold corresponds to the same numerical range as the alarm threshold before the adjustment (the margin is the same). For example, a 500G hard disk of manufacturer C raises a serious alarm at a utilization rate of 80%.
  • The above rules apply to hard disk utilization, CPU utilization, memory utilization, GPU card utilization, HBA card utilization, network card speed, fan speed and other hardware performance indicators that carry alarms.
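  • The two policies can be compared with a short sketch. It assumes that "margin" means the absolute headroom left when the alarm fires (capacity multiplied by the unused fraction); that reading, the function name and the 1000G replacement capacity used in the example are assumptions, not values from the source.

```python
def adjusted_threshold(old_capacity, new_capacity, old_threshold_pct, user_modified):
    """Return the alarm threshold (in percent) for the replacement hardware.

    user_modified=False -> threshold-unchanged policy: keep the percentage.
    user_modified=True  -> margin-unchanged policy: keep the absolute headroom
                           (an assumed reading of "margin").
    """
    if not user_modified:
        return old_threshold_pct
    margin = old_capacity * (100 - old_threshold_pct) / 100  # e.g. 100G of 500G
    if margin >= new_capacity:
        raise ValueError("the old margin does not fit on the new hardware")
    return 100 - 100 * margin / new_capacity


# Hypothetical replacement of a 500G disk (serious alarm at 80%) by a 1000G disk.
unchanged = adjusted_threshold(500, 1000, 80, user_modified=False)
same_margin = adjusted_threshold(500, 1000, 80, user_modified=True)
```

  • With these inputs the threshold-unchanged policy keeps 80%, while the margin-unchanged policy keeps the 100G of headroom and therefore yields 90% on the larger disk.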
  • Embodiments of the present application can monitor the hardware status of each piece of hardware on the server in real time or at regular intervals and adjust the monitoring policy when the hardware status changes, which keeps the monitoring policy flexible and suited to actual conditions.
  • When the CPU and memory status changes to an alarm state, the monitoring and collection of indicators corresponding to the CPU and memory monitoring items can be stopped to reduce the CPU and memory consumption caused by monitoring, effectively preventing downtime; at the same time, for the indicator data of the other monitoring items on the server whose CPU and memory status has changed, appropriate measures are taken according to the alarm level.
  • This can be implemented as follows:
  • Minor CPU and memory alarms: the collection of indicators corresponding to hardware monitoring items such as hard disks, network cards and GPUs is reduced from once every polling cycle to once every three cycles, preserving performance data while taking the minor CPU and memory fault into account;
  • Moderate CPU and memory alarms: stop collecting indicators corresponding to all monitoring items;
  • Serious CPU and memory alarms: stop collecting indicators corresponding to all monitoring items.
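  • The three alarm levels above can be sketched as a per-cycle collection decision. The enum, the item-kind strings and the zero-based cycle counter are hypothetical; only the "once every three cycles" and "stop collecting" rules come from the text.

```python
from enum import Enum


class AlarmLevel(Enum):
    NORMAL = 0
    MINOR = 1
    MODERATE = 2
    SERIOUS = 3


def should_collect(item_kind, cpu_mem_level, cycle):
    """Decide whether to collect one monitoring item in a given polling cycle.

    item_kind: e.g. "cpu", "memory", "disk", "nic", "gpu" (hypothetical labels).
    cpu_mem_level: current alarm level of the server's CPU and memory.
    cycle: zero-based index of the polling cycle.
    """
    if cpu_mem_level is AlarmLevel.NORMAL:
        return True                  # no alarm: collect every cycle
    if item_kind in ("cpu", "memory"):
        return False                 # CPU/memory collection stops on any alarm
    if cpu_mem_level is AlarmLevel.MINOR:
        return cycle % 3 == 0        # other items: once every three cycles
    return False                     # moderate or serious: stop all collection
```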
  • When a new server is added, determine the new server as the target server, obtain the hardware information of each piece of hardware on the target server, add the information of all monitoring items in the monitoring item storage system to the monitoring item information of the target server, and then delete from the monitoring item information of the target server the information of every monitoring item that does not match the hardware information of any hardware on the target server.
  • For a newly added server, corresponding initial monitoring needs to be set up so that the server can be effectively monitored on that basis.
  • Full monitoring with all monitoring items under the default configuration can be adopted automatically (that is, the information of all monitoring items in the monitoring item storage system is added to the monitoring item information of the added server), and the hardware information of all the hardware of the added server is obtained at once (this hardware information can be stored in a corresponding hardware list). Full monitoring is then pruned to an accurate match on that basis, that is, the information of monitoring items in the full set that do not correspond to any hardware information of the added server is deleted. For example, if the added server does not have an external HBA card, the information corresponding to the HBA card monitoring items can be deleted from the full monitoring. In this way, accurate matching is achieved to the greatest extent after a server is added, which reduces the polling of useless monitoring items, reduces the load on the monitoring system, increases the manageable scale and saves user costs.
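  • The "add everything, then prune" initialization described above can be sketched as follows. The store layout (item name mapped to the hardware key it applies to) and all names are hypothetical.

```python
def initial_monitoring_items(store, hardware_list):
    """Build the monitoring item information for a newly added server.

    store: mapping of monitoring item name -> hardware key it applies to.
    hardware_list: hardware keys actually found on the new server.
    """
    full = dict(store)             # step 1: full monitoring with every item
    present = set(hardware_list)   # step 2: hardware collected from the server
    # step 3: drop items that match no hardware on the server
    return {name: key for name, key in full.items() if key in present}


store = {
    "cpu_A_16c": ("cpu", "A"),
    "disk_A_500G": ("disk", "A"),
    "hba_A": ("hba", "A"),
}
# A server without an external HBA card keeps only the CPU and disk items.
items = initial_monitoring_items(store, [("cpu", "A"), ("disk", "A")])
```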
  • The intelligent monitoring fine-adjustment method provided by the embodiment of the present application can be based on the monitoring platform and the three subsystems included in the monitoring system (the monitoring item storage system, the hardware matching system and the monitoring item adjustment system).
  • The specific implementation can be as follows:
  • Monitoring item storage system: its main function is to store the monitoring items used by the monitoring system, including CPU monitoring items, MEM monitoring items, hard disk monitoring items, GPU monitoring items, network card monitoring items, power supply monitoring items, fan monitoring items, backplane monitoring items, monitoring items for external hardware such as HBA cards, etc.
  • Hardware matching system: its main function is to automatically collect and store the identification information (hardware information) of all hardware of a server, including manufacturer, SN, performance parameters, etc., after the server is included in the monitoring system for the first time. The collected hardware information is the basic data the monitoring system uses to match monitoring items accurately and to adjust them automatically after later hardware updates. The hardware matching system regularly polls the server's hardware information to detect whether the server has newly added hardware with the same configuration, newly added hardware with a heterogeneous configuration, or deleted hardware. When the hardware or its status changes, the monitoring item adjustment system is called to adjust the server's hardware monitoring items accurately;
  • Monitoring item adjustment system: its main function is to intelligently adjust the hardware monitoring items polled for the server based on changes in hardware or hardware status:
  • Newly added heterogeneous configuration hardware (different brands, different configurations): parameters such as alarm thresholds are intelligently adjusted based on the user's choices, and the adjustment policies include the threshold-unchanged policy and the margin-unchanged policy;
  • Hardware status change adjustment: for hardware that changes from the normal state to an alarm state, the collection of data for the corresponding server monitoring items is stopped or its frequency is reduced.
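  • The polling performed by the hardware matching system can be sketched as a comparison of two inventory snapshots. The snapshot layout (SN mapped to a `(manufacturer, configuration)` tuple) and the function names are hypothetical.

```python
def diff_inventory(previous, current):
    """Compare two snapshots (SN -> (manufacturer, configuration)).

    Returns (added, deleted) as sets of serial numbers.
    """
    added = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()
    return added, deleted


def classify_added(serial, previous, current):
    """Label newly added hardware as same-configuration or heterogeneous."""
    known_configs = set(previous.values())
    if current[serial] in known_configs:
        return "same_configuration"
    return "heterogeneous"


previous = {"SN-1": ("A", "500G"), "SN-2": ("A", "500G")}
current = {"SN-1": ("A", "500G"), "SN-3": ("A", "500G"), "SN-4": ("B", "1T")}
added, deleted = diff_inventory(previous, current)
```

  • Each added serial number would then be passed to the monitoring item adjustment system together with its classification, and each deleted one would have its monitoring item removed.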
  • This application breaks server monitoring down into atomic monitoring items centered on hardware, and stores the corresponding monitoring items in the monitoring system according to manufacturer and configuration. After a change in the hardware of a managed server is detected, there is no need to manually re-add the monitoring item information for the server or the new hardware. Instead, monitoring of the new server or new hardware starts automatically based on the original configuration, and parameters such as alarm thresholds are adjusted automatically without manual intervention. While ensuring the accuracy of monitoring data, this is more user-friendly: it greatly reduces the customer's workload in operating and maintaining faulty servers, lowers the operation and maintenance costs of data center servers, improves the operation and maintenance efficiency of data center server equipment, and helps operation and maintenance administrators bring equipment under unified management quickly and intelligently, which ensures the stable operation of upper-layer services and provides users with a unified, complete and accurate monitoring display page.
  • The embodiment of the present application also provides an intelligent monitoring fine-adjustment apparatus, as shown in Figure 3, which may specifically include:
  • an information acquisition module 11, configured to: in response to a change in the hardware of the target server, obtain the hardware information corresponding to the changed hardware of the target server as target information; wherein the target server is any server that needs to be monitored;
  • a monitoring adjustment module 12, configured to: based on the target information, match the corresponding monitoring items from the preset monitoring item storage system, and adjust the monitoring item information of the target server accordingly based on the matched monitoring items; wherein the monitoring item storage system stores all monitoring items that can be used when monitoring a server;
  • a hardware monitoring module 13, configured to monitor the target server based on the monitoring item information of the target server.
  • The monitoring adjustment module may include:
  • a monitoring adjustment unit, configured to: in response to the hardware change of the target server being the addition of hardware, add the information corresponding to the matched monitoring items to the monitoring item information of the target server; or, in response to the hardware change of the target server being the deletion of hardware, delete the information corresponding to the matched monitoring items from the monitoring item information of the target server.
  • A pressure detection module, configured to: in response to the hardware change of the target server being the addition of hardware whose brand and configuration are the same as hardware that already exists in the target server, detect the increase in system pressure after the hardware is added and, in response to the increase in system pressure reaching the pressure increase threshold, output corresponding expansion configuration prompt information; wherein the system pressure includes CPU load and/or memory utilization and/or network port rate.
  • A threshold adjustment module, configured to: in response to the hardware change of the target server being the addition of hardware whose brand and/or configuration differs from the hardware that already exists in the target server, obtain the user's alarm setting information for the added hardware, and adjust the alarm thresholds of the monitoring items corresponding to the added hardware accordingly based on the alarm setting information.
  • The threshold adjustment module may include:
  • a threshold adjustment unit, configured to: in response to the alarm setting information indicating that the user has not independently modified the alarm threshold of the monitoring item corresponding to the added hardware, select the threshold-unchanged policy and, based on that policy, keep the currently used alarm threshold of the corresponding monitoring item unchanged; or, in response to the alarm setting information indicating that the user has independently modified the alarm threshold of the monitoring item corresponding to the added hardware, select the margin-unchanged policy and, based on that policy, set the alarm threshold of the monitoring item corresponding to the added hardware, so that the adjusted alarm threshold and the corresponding alarm threshold used before the adjustment correspond to the same numerical range of the monitored item that triggers an alarm.
  • A status monitoring module, configured to: monitor the hardware status of each piece of hardware on the target server and, in response to detecting that the hardware status of any hardware on the target server changes from the normal state to an alarm state, reduce the collection of the corresponding monitoring item data to the level that corresponds to the alarm state of that hardware.
  • An initialization module, configured to: when a new server is added, determine the new server as the target server, obtain the hardware information of each piece of hardware on the target server, add the information of all monitoring items in the monitoring item storage system to the monitoring item information of the target server, and delete from the monitoring item information of the target server the information of every monitoring item that does not match the hardware information of any hardware on the target server.
  • The embodiment of the present application also provides an intelligent monitoring fine-adjustment device, which may include:
  • a memory for storing computer-readable instructions; and
  • one or more processors configured to implement the steps of any of the above intelligent monitoring fine-adjustment methods when executing the computer-readable instructions.
  • The intelligent monitoring fine-adjustment device includes a processor, a memory, a network interface and a database connected through a system bus.
  • The processor of the intelligent monitoring fine-adjustment device provides computing and control capabilities.
  • The memory of the intelligent monitoring fine-adjustment device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer programs and a database.
  • The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium.
  • The database of the intelligent monitoring fine-adjustment device stores the obtained data, such as the hardware information corresponding to the changed hardware of the target server. For the specific data stored, refer to the limitations in the above method embodiments.
  • The network interface of the intelligent monitoring fine-adjustment device is used to communicate with external terminals through a network connection.
  • The computer program implements an intelligent monitoring fine-adjustment method when executed by the processor.
  • The intelligent monitoring fine-adjustment device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
  • Embodiments of the present application also provide a non-volatile computer-readable storage medium.
  • Computer-readable instructions are stored on the non-volatile computer-readable storage medium; when executed by one or more processors, they implement the steps of any of the above intelligent monitoring fine-adjustment methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed in the present application are an intelligent monitoring micro-adjustment method and apparatus, a device, and a storage medium. The method comprises: when hardware of a target server has changed, acquiring hardware information corresponding to the hardware, which has changed, of the target server, and taking the hardware information as target information; on the basis of the target information, matching a corresponding monitoring item from a preset monitoring item storage system, and on the basis of the matching monitoring item, correspondingly adjusting monitoring item information of the target server; and on the basis of the monitoring item information of the target server, monitoring the target server, wherein the target server is any server that needs to be monitored, and the monitoring item storage system stores all the monitoring items that can be used when a server is monitored.

Description

一种智能监控微调整方法、装置、设备及存储介质An intelligent monitoring fine-adjustment method, device, equipment and storage medium
相关申请的交叉引用Cross-references to related applications
本申请要求于2022年03月23日提交中国专利局,申请号为202210285156.3,申请名称为“一种智能监控微调整方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application requires the priority of the Chinese patent application submitted to the China Patent Office on March 23, 2022, with the application number 202210285156.3, and the application name is "An intelligent monitoring fine-adjustment method, device, equipment and storage medium", and its entire content incorporated herein by reference.
技术领域Technical field
本申请涉及计算机监控技术领域,更具体地说,涉及一种智能监控微调整方法、装置、设备及存储介质。The present application relates to the field of computer monitoring technology, and more specifically, to an intelligent monitoring fine-adjustment method, device, equipment and storage medium.
背景技术Background technique
当前的数据中心监控规模越来越大,利用监控平台来进行统一运维管理已是一种常态化的事情。随着需要监控的服务器数量的增多,服务器发生故障的数量在逐步上升,对服务器进行维修,更换硬件是一种主流解决故障的方式,发明人意识到,在服务器更换完新的硬件之后,因新硬件的厂商、配置有可能会发生变化,针对这些新更新的硬件,如何快速的纳入到基础设施的统一运维工程当中,让用户能够更加方便、快捷的进行一体化的自动运维管理,则成为本领域技术人员亟待解决的问题。The scale of current data center monitoring is getting larger and larger, and it is normal to use monitoring platforms for unified operation and maintenance management. As the number of servers that need to be monitored increases, the number of server failures is gradually increasing. Repairing servers and replacing hardware is a mainstream way to solve faults. The inventor realized that after the server is replaced with new hardware, due to The manufacturers and configurations of new hardware may change. For these newly updated hardware, how to quickly incorporate them into the unified operation and maintenance project of the infrastructure, so that users can more conveniently and quickly perform integrated automatic operation and maintenance management? This has become an urgent problem to be solved by those skilled in the art.
Summary
The present application provides the following technical solutions:
An intelligent monitoring micro-adjustment method, comprising:
in response to a change in the hardware of a target server, acquiring the hardware information corresponding to the changed hardware of the target server as target information, wherein the target server is any server that needs to be monitored;
matching a corresponding monitoring item from a preset monitoring item storage system based on the target information, and adjusting the monitoring item information of the target server accordingly based on the matched monitoring item, wherein the monitoring item storage system stores all the monitoring items that can be used when monitoring a server; and
monitoring the target server based on the monitoring item information of the target server.
An intelligent monitoring micro-adjustment apparatus, comprising:
an information acquisition module, configured to: in response to a change in the hardware of a target server, acquire the hardware information corresponding to the changed hardware of the target server as target information, wherein the target server is any server that needs to be monitored;
a monitoring adjustment module, configured to: match a corresponding monitoring item from a preset monitoring item storage system based on the target information, and adjust the monitoring item information of the target server accordingly based on the matched monitoring item, wherein the monitoring item storage system stores all the monitoring items that can be used when monitoring a server; and
a hardware monitoring module, configured to: monitor the target server based on the monitoring item information of the target server.
An intelligent monitoring micro-adjustment device, comprising:
a memory for storing computer-readable instructions; and
one or more processors configured to implement the steps of the intelligent monitoring micro-adjustment method described above when executing the computer-readable instructions.
A non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the intelligent monitoring micro-adjustment method described above.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present application; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of an intelligent monitoring micro-adjustment method provided by one or more embodiments of the present application;
FIG. 2 is a schematic diagram of a specific implementation of an intelligent monitoring micro-adjustment method provided by one or more embodiments of the present application;
FIG. 3 is a schematic structural diagram of an intelligent monitoring micro-adjustment apparatus provided by one or more embodiments of the present application;
FIG. 4 is an internal structure diagram of an intelligent monitoring micro-adjustment device provided by one or more embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, which shows a flowchart of an intelligent monitoring micro-adjustment method provided by an embodiment of the present application, the method may specifically include:
S11: In response to a change in the hardware of a target server, acquire the hardware information corresponding to the changed hardware of the target server as target information, where the target server is any server that needs to be monitored.
It should be noted that the embodiments of the present application use a monitoring system that includes a monitoring platform to monitor servers. Specifically, servers already brought into the monitoring system are monitored, so that when a hardware change is detected on any of them, the corresponding monitoring item information is adjusted based on the changed hardware of that server. The hardware of a server may include HBA (Host Bus Adapter) cards, switches, routers, and the like. The hardware information of a piece of hardware in a server is information that can be used to identify that hardware, and may include the vendor, the SN (Serial Number), performance parameters, and the like.
S12: Match a corresponding monitoring item from a preset monitoring item storage system based on the target information, and adjust the monitoring item information of the target server accordingly based on the matched monitoring item, where the monitoring item storage system stores all the monitoring items that can be used when monitoring a server.
The monitoring item storage system is a preset storage system whose main function is to store all monitoring items that the monitoring system can use. A monitoring item is a user-defined, minimal monitoring component. Monitoring items include CPU monitoring items, MEM (memory) monitoring items, hard disk monitoring items, GPU monitoring items, network card monitoring items, power supply monitoring items, fan monitoring items, backplane monitoring items, monitoring items for external plug-in HBA cards, and the like. The CPU monitoring items may include monitoring items for different vendors and configurations, and each CPU may use a different monitoring item depending on its vendor, core count, clock frequency, and other parameters. Monitoring items for other hardware are similar: different monitoring items are defined for hardware of the same type according to vendor, configuration, and other parameters that affect performance. Based on the monitoring item storage system, when the hardware of any server in the monitoring system changes, the monitoring item corresponding to the changed hardware can be matched from the monitoring item storage system, and the monitoring item information of that server is then adjusted accordingly, so that it matches all the hardware the server contains after the change. The monitoring system is then instructed to collect, in real time or at regular intervals, the data of each monitoring item in the server's monitoring item information, and the server is monitored based on the collected data. The monitoring item information of a server is the information containing the monitoring items matched to each piece of hardware in that server, that is, the information on each monitoring item for which data needs to be collected from the server. It may specifically be the corresponding monitoring item scripts, or may be set otherwise according to actual needs, all of which fall within the scope of protection of the present application.
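The matching in S12 can be illustrated with a minimal sketch. It assumes the monitoring item storage system can be modeled as a lookup table keyed by hardware type, vendor, and configuration; the `HardwareInfo` class, the `MONITORING_ITEM_STORE` table, and the script names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareInfo:
    """Hypothetical identifying information for one piece of hardware."""
    kind: str    # hardware type, e.g. "cpu" or "disk"
    vendor: str  # manufacturer
    model: str   # stands in for configuration / performance parameters


# Hypothetical store: (kind, vendor, model) -> monitoring item script name.
MONITORING_ITEM_STORE = {
    ("disk", "VendorA", "500G"): "disk_vendor_a_500g.py",
    ("disk", "VendorB", "500G"): "disk_vendor_b_500g.py",
    ("cpu", "VendorA", "32c-2.6GHz"): "cpu_vendor_a_32c.py",
}


def match_monitoring_item(hw: HardwareInfo,
                          store=MONITORING_ITEM_STORE) -> Optional[str]:
    """Return the monitoring item matched to hw, or None if the store has none."""
    return store.get((hw.kind, hw.vendor, hw.model))
```

An exact-key lookup is only one possible matching rule; a real store might fall back to a vendor-level or type-level item when no exact entry exists.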
S13: Monitor the target server based on the monitoring item information of the target server.
In the present application, when the hardware of any server changes, the monitoring item for the changed hardware is matched from the monitoring item storage system, which stores all the monitoring items that can be used when monitoring a server. The monitoring item information of that server is then adjusted so that it matches the server's hardware after the change, and the server is monitored based on its monitoring item information. When the hardware of a server changes, the present application thus automatically matches the corresponding monitoring item and adjusts the monitoring item information, so that the monitoring item information matches the server after the hardware change and the server is monitored effectively. The present application can therefore bring updated hardware quickly into the unified operation and maintenance of the infrastructure when hardware changes, so that users can perform integrated, automatic operation and maintenance management more conveniently and quickly.
In the intelligent monitoring micro-adjustment method provided by the embodiments of the present application, adjusting the monitoring item information of the target server accordingly based on the matched monitoring item may include:
in response to the hardware change of the target server being an addition of hardware, adding the information corresponding to the matched monitoring item to the monitoring item information of the target server; or, in response to the hardware change of the target server being a removal of hardware, deleting the information corresponding to the matched monitoring item from the monitoring item information of the target server.
It should be noted that a hardware change may be an addition or a removal. Specifically, when hardware is added to a server, the information corresponding to the matched monitoring item can be added automatically to the monitoring item information of that server; when hardware is removed from a server, the information on the monitoring item corresponding to the removed hardware can be deleted automatically from the monitoring item information of that server. The monitoring scope of the server is thus updated in a simple and fast way without manual adjustment: monitoring starts as soon as hardware is plugged in and stops as soon as hardware is pulled out, and invalid monitoring caused by manual omissions is avoided. In addition, the monitoring item information may be the corresponding scripts. In that case, adding the information corresponding to a monitoring item to the monitoring item information of a server means adding the monitoring item script to the full set of monitoring item scripts of that server, and deleting the information corresponding to a monitoring item from the monitoring item information of a server means deleting the monitoring item script from the full set of monitoring item scripts of that server. Other settings made according to actual needs also fall within the scope of protection of the present application.
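The add and remove adjustment described above can be sketched as follows. The sketch assumes the monitoring item information of a server is kept as a mapping from hardware serial number to monitoring item script, so that removing one of two identical components leaves the other monitored; the function name and data layout are illustrative assumptions.

```python
from typing import Dict, Optional


def adjust_monitoring_items(items: Dict[str, str], serial: str, change: str,
                            script: Optional[str] = None) -> Dict[str, str]:
    """Return an updated copy of a server's monitoring item information.

    items  : hypothetical mapping of hardware serial number -> script name
    change : "added" or "removed"
    script : matched monitoring item script (required when change == "added")
    """
    updated = dict(items)  # copy, so the caller's mapping is not mutated
    if change == "added":
        if script is None:
            raise ValueError("a matched monitoring item is required for added hardware")
        updated[serial] = script
    elif change == "removed":
        updated.pop(serial, None)  # removing unknown hardware is a no-op
    else:
        raise ValueError(f"unsupported change type: {change}")
    return updated
```

Returning a new mapping rather than editing in place keeps the previous monitoring scope available if the adjustment has to be rolled back.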
The intelligent monitoring micro-adjustment method provided by the embodiments of the present application may further include:
in response to the hardware change of the target server being an addition of hardware, where the added hardware has the same brand and configuration as hardware already present in the target server, detecting the increase in system pressure caused by monitoring the target server after the hardware is added, and, in response to the increase in system pressure reaching a pressure increase threshold, outputting corresponding capacity expansion prompt information, wherein the system pressure includes CPU load and/or memory utilization and/or network port rate.
Newly added hardware may be same-configuration hardware (hardware whose configuration and brand are both the same as those of some hardware already present in the server) or heterogeneous-configuration hardware (hardware whose configuration and/or brand differs from that of the hardware already present in the server). In the embodiments of the present application, when same-configuration hardware is added, the monitoring item script of the new hardware is loaded automatically and added to the full set of monitoring item scripts of the server. At the same time, the pressure that the new hardware places on the monitoring system itself (the system pressure) can be measured. Indicators of system pressure may include the monitoring system's own CPU load, its own memory utilization, its own network port rate, and the like. If the increase in system pressure caused by the new hardware reaches a pressure increase threshold set according to actual needs, prompt information is output advising that the configuration of the service host be expanded, to ensure that the monitoring system keeps working normally and does not go down because of the new hardware. The pressure increase threshold may be set to 10%, or set otherwise according to actual needs, all of which fall within the scope of protection of the present application.
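The pressure check can be sketched as below. The text does not say whether the 10% increase is relative or measured in percentage points; this sketch assumes a relative increase per metric and prompts when any metric reaches the threshold. The metric names are hypothetical.

```python
from typing import Dict


def pressure_increase(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Relative increase of each system pressure metric (assumed interpretation)."""
    return {
        name: (after[name] - before[name]) / before[name]
        for name in before
        if name in after and before[name] > 0  # skip metrics with no usable baseline
    }


def needs_expansion_prompt(before: Dict[str, float], after: Dict[str, float],
                           threshold: float = 0.10) -> bool:
    """True if any metric's increase reaches the pressure increase threshold."""
    return any(inc >= threshold for inc in pressure_increase(before, after).values())
```

Here `before` and `after` would be samples of the monitoring system's own CPU load, memory utilization, and network port rate taken before and after the new hardware's monitoring item is loaded.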
The intelligent monitoring micro-adjustment method provided by the embodiments of the present application may further include:
in response to the hardware change of the target server being an addition of hardware, where the added hardware differs in brand and/or configuration from the hardware already present in the target server, acquiring the user's alarm setting information for the added hardware, and adjusting the alarm threshold of the monitoring item corresponding to the added hardware accordingly based on the alarm setting information;
wherein adjusting the alarm threshold of the monitoring item corresponding to the added hardware accordingly based on the alarm setting information may include:
in response to the alarm configuration information indicating that the user has not manually modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting a threshold-unchanged strategy, and keeping the currently used alarm threshold of the corresponding monitoring item unchanged based on the threshold-unchanged strategy; or, in response to the alarm configuration information indicating that the user has manually modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting a margin-unchanged strategy, and setting the alarm threshold of the monitoring item corresponding to the added hardware based on the margin-unchanged strategy, so that the range of monitoring item values that triggers an alarm is the same under the adjusted alarm threshold as under the alarm threshold used before the adjustment.
Newly added heterogeneous-configuration hardware includes hardware of a different brand, hardware of a different configuration, and the like. For such hardware, parameters such as alarm thresholds can be adjusted intelligently according to the user's choices, and the adjustment strategies may include a threshold-unchanged strategy, a margin-unchanged strategy, and the like. Specifically, if it is detected that the user has not manually modified the alarm threshold of the new hardware and is instead using the monitoring system's default alarm threshold, this indicates that the user has some tolerance regarding the alarm threshold (that is, the alarm configuration information indicates that the user has not manually modified the alarm threshold of the monitoring item corresponding to the added hardware), and the monitoring system automatically selects the threshold-unchanged strategy by default. If it is detected that the user has actively modified the alarm threshold of the new hardware, this indicates that the user has a deep understanding of the hardware and has made a dedicated setting based on that understanding (that is, the alarm configuration information indicates that the user has manually modified the alarm threshold of the monitoring item corresponding to the added hardware), and the monitoring system automatically selects the margin-unchanged strategy by default. To guard against errors in this automatic selection, the embodiments of the present application may also provide a prompt and a window for manually modifying the alarm threshold. By adjusting the alarm thresholds of monitoring items in this way, the embodiments of the present application make monitoring alarms better match the user's actual needs.
Under the threshold-unchanged strategy, after hardware of a different brand or configuration is added, the monitoring item corresponding to the new hardware is updated automatically, but the alarm threshold of the monitoring item stays the same. For example, suppose a 500G hard disk from vendor A raises a severe alarm at 80% utilization. After it is replaced with a 500G hard disk from vendor B, the monitoring system automatically replaces vendor A's hard disk monitoring item with vendor B's hard disk monitoring item because hardware differs between vendors, but the severe alarm threshold remains 80%. Under the margin-unchanged strategy, after hardware of a different brand or configuration is added, the monitoring item corresponding to the new hardware is updated automatically, and the alarm threshold of the monitoring item is adjusted automatically so that the customer's margin stays the same: the range of values that triggers an alarm (the margin) is the same under the adjusted alarm threshold as under the previous one. For example, suppose a 500G hard disk from vendor C raises a severe alarm at 80% utilization. After it is replaced with a 1T hard disk from vendor D, the monitoring system automatically switches to vendor D's hard disk monitoring item and adjusts the severe alarm threshold. The alarm margin for vendor C's disk is 500G × (1 - 80%) = 100G. For vendor D's disk, the same margin corresponds to 100G / 1T = 0.1 of capacity (taking 1T as 1000G), so the new threshold is 1 - 0.1 = 90%, and the alarm threshold is automatically corrected to 90%. The above rules apply to hard disk utilization, CPU utilization, memory utilization, GPU card utilization, HBA card utilization, network card rate, fan speed, and other hardware with performance indicator alarms.
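The two strategies reduce to a short calculation, sketched here under the assumption that thresholds are utilization fractions and capacities are given in the same unit. The function names are illustrative, and rejecting a new capacity that cannot hold the old margin is an added assumption not stated in the text.

```python
def threshold_unchanged(old_threshold: float) -> float:
    """Threshold-unchanged strategy: keep the alarm threshold as it is."""
    return old_threshold


def margin_unchanged(old_capacity: float, old_threshold: float,
                     new_capacity: float) -> float:
    """Margin-unchanged strategy: keep the absolute headroom that triggers an alarm.

    margin        = old_capacity * (1 - old_threshold)
    new_threshold = 1 - margin / new_capacity
    """
    margin = old_capacity * (1.0 - old_threshold)
    if new_capacity <= margin:
        # Assumption: such a replacement would give a threshold of 0 or below.
        raise ValueError("new capacity is not larger than the required margin")
    return 1.0 - margin / new_capacity


# Worked example from the text: 500G disk at 80% replaced by a 1T (1000G) disk.
new_threshold = margin_unchanged(old_capacity=500, old_threshold=0.80, new_capacity=1000)
print(f"{new_threshold:.0%}")
```

The same formula applies to any indicator that has a fixed capacity and a utilization threshold, such as memory size.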
The intelligent monitoring micro-adjustment method provided by the embodiments of the present application may further include:
monitoring the hardware state of each piece of hardware on the target server;
in response to detecting that the hardware state of any hardware on the target server has changed from a normal state to an alarm state, reducing the collection frequency of the monitoring item data corresponding to that hardware, or stopping collection of the monitoring item data corresponding to that hardware, based on the level of the alarm state.
The embodiments of the present application can monitor the hardware state of each piece of hardware on a server in real time or at regular intervals, and adjust the monitoring strategy when a hardware state changes, so that the monitoring strategy is flexible and fits actual conditions. Specifically, for hardware that changes from the normal state to the alarm state, such as a CPU or memory, collection of the indicators corresponding to the CPU and memory monitoring items can be stopped once the state changes, reducing the load on the CPU and memory and effectively preventing downtime. For the indicator data of the other monitoring items on the server whose CPU or memory state has changed, appropriate measures are taken according to the alarm level, for example as follows:
Minor CPU or memory alarm: collection of the indicators corresponding to hardware monitoring items such as hard disks, network cards, and GPUs is reduced from once per polling cycle to once every three cycles, which preserves performance data while accommodating the minor CPU or memory fault;
Moderate CPU or memory alarm: collection of the indicators corresponding to all monitoring items is stopped;
Severe CPU or memory alarm: collection of the indicators corresponding to all monitoring items is stopped.
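The alarm-level rules above can be written as a small policy function. This is a sketch only: the `AlarmLevel` enum, the item kind strings, and the convention of returning `None` for "stop collecting" are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class AlarmLevel(Enum):
    NORMAL = 0
    MINOR = 1
    MODERATE = 2
    SEVERE = 3


def collection_interval(cpu_mem_level: AlarmLevel, item_kind: str) -> Optional[int]:
    """Polling cycles between collections for one monitoring item.

    cpu_mem_level : alarm level of the server's CPU / memory
    item_kind     : hypothetical monitoring item type, e.g. "cpu", "disk"
    Returns None when collection should stop.
    """
    if cpu_mem_level is AlarmLevel.NORMAL:
        return 1      # collect every polling cycle
    if item_kind in {"cpu", "memory"}:
        return None   # stop CPU / memory collection on any alarm
    if cpu_mem_level is AlarmLevel.MINOR:
        return 3      # other items: once every three cycles
    return None       # moderate or severe: stop all collection
```

A scheduler could call this once per polling cycle and skip any item whose interval is `None` or does not divide the current cycle number.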
The intelligent monitoring micro-adjustment method provided by the embodiments of the present application may further include:
when a new server is added, determining the new server to be the target server, acquiring the hardware information of each piece of hardware on the target server, adding the information of all monitoring items in the monitoring item storage system to the monitoring item information of the target server, and deleting from the monitoring item information of the target server the information corresponding to any monitoring item that matches none of the hardware information of the hardware on the target server.
When a new server is brought into the monitoring system, initial monitoring needs to be set up so that the server can be monitored effectively. Specifically, after a server is added, full monitoring with all monitoring items under the default configuration can be adopted automatically (that is, the information of all monitoring items in the monitoring item storage system is added to the monitoring item information of the added server). The hardware information of all hardware on the added server is acquired as soon as possible (this hardware information may be stored in a corresponding hardware list), and the full monitoring is then narrowed to a precise match: the information of any monitoring item in the full monitoring that corresponds to none of the hardware information of the added server is deleted. For example, if the added server has no external plug-in HBA card, the information corresponding to the HBA card monitoring items can be deleted from the full monitoring. After a server is added, precise matching is thus achieved to the greatest possible extent, which reduces useless polling of monitoring items, lowers the load on the monitoring system, and allows a larger management scale while saving user costs.
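The initialization step (start from full monitoring, then delete what matches no hardware) can be sketched as below. The store layout, keyed by a hypothetical (type, vendor, configuration) tuple, is an assumption for illustration.

```python
from typing import Dict, Iterable, Tuple

HardwareKey = Tuple[str, str, str]  # hypothetical (kind, vendor, model) key


def initialize_monitoring_items(store: Dict[HardwareKey, str],
                                hardware_list: Iterable[HardwareKey]) -> Dict[HardwareKey, str]:
    """Build the initial monitoring item information for a newly added server."""
    items = dict(store)           # step 1: full monitoring with every stored item
    present = set(hardware_list)  # hardware actually found on the server
    for key in list(items):       # step 2: delete items that match no hardware
        if key not in present:
            del items[key]
    return items
```

For a server with no HBA card, every HBA entry in the store is dropped in step 2, so those items are never polled.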
In one specific implementation, the intelligent monitoring micro-adjustment method provided by the embodiments of the present application may be implemented with a monitoring platform and three subsystems of the monitoring system (a monitoring item storage system, a hardware matching system, and a monitoring item adjustment system). As shown in FIG. 2, the specific implementation may be as follows:
Monitoring item storage system: its main function is to store the monitoring items used by the monitoring system, including CPU monitoring items, MEM monitoring items, hard disk monitoring items, GPU monitoring items, network card monitoring items, power supply monitoring items, fan monitoring items, backplane monitoring items, monitoring items for external plug-in HBA cards, and the like.
Hardware matching system: its main function is to automatically collect and store the identification information (hardware information) of all hardware of a server, including the vendor, SN, performance parameters, and the like, after the server is first brought into the monitoring system. The collected hardware information is the basic data that the monitoring system uses to match monitoring items precisely and to adjust monitoring items automatically after later hardware updates. The hardware matching system polls the hardware information of the server periodically and, on detecting newly added same-configuration hardware, newly added heterogeneous-configuration hardware, removed hardware, or a hardware state change, calls the monitoring item adjustment system to adjust the server's hardware monitoring items precisely;
Monitoring item adjustment system: its main function is to adjust the hardware monitoring items polled for a server intelligently according to changes in hardware or hardware state:
1) Initial monitoring: after a server is added, full monitoring with all monitoring items under the default configuration is adopted automatically, the data of the hardware matching system for the server is acquired as soon as possible, and the full set of monitoring items is narrowed to a precise match based on the acquired list of hardware information;
2) Newly added same-configuration hardware: the monitoring item script of the new hardware is loaded automatically to provide alarm monitoring for the new hardware; at the same time, the pressure on the monitoring system itself after the hardware is added is measured, and when the pressure increase is large, the user is reminded to expand the configuration of the service host, to ensure that the monitoring system does not go down;
3) Newly added heterogeneous-configuration hardware: this includes hardware of a different brand or a different configuration; parameters such as alarm thresholds are adjusted intelligently according to the user's choices, and the adjustment strategies include the threshold-unchanged strategy and the margin-unchanged strategy;
4) Removed hardware: when hardware removal is detected, the monitoring item script of the hardware is deleted automatically, completing the automatic update of the monitoring scope;
5) Hardware state change adjustment: for hardware that changes from the normal state to the alarm state, collection of the data corresponding to the monitoring items of the server concerned is stopped or its frequency is reduced.
This application atomizes server monitoring items: centered on the hardware, the corresponding monitoring items are stored in the monitoring system according to manufacturer and configuration. When a change in the hardware of a managed server is detected, there is no need to re-add the server or the monitoring item information for the new hardware manually; instead, monitoring of the new server or new hardware is started automatically from the existing configuration, and parameters such as alarm thresholds are adjusted automatically without manual intervention. This keeps the monitoring data accurate while better matching user expectations, greatly reduces the workload of operating and maintaining faulty servers, minimizes the operation and maintenance cost of data center servers, and improves the operation and maintenance efficiency of data center server equipment. It helps operation and maintenance administrators unify the managed devices quickly and intelligently, safeguards the stable operation of upper-layer services, and provides users with a unified, complete, and accurate monitoring display page.
An embodiment of the present application further provides an intelligent monitoring fine-adjustment apparatus which, as shown in Figure 3, may specifically include:
an information acquisition module 11, configured to: in response to a change in the hardware of a target server, obtain the hardware information corresponding to the changed hardware of the target server as target information, where the target server is any server that needs to be monitored;
a monitoring adjustment module 12, configured to: match corresponding monitoring items from a preset monitoring item storage system based on the target information, and adjust the monitoring item information of the target server accordingly based on the matched monitoring items, where the monitoring item storage system stores all monitoring items that can be used when monitoring servers;
a hardware monitoring module 13, configured to: monitor the target server based on the monitoring item information of the target server.
In the intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application, the monitoring adjustment module may include:
a monitoring adjustment unit, configured to: in response to the change in the hardware of the target server being an addition of hardware, add the information corresponding to the matched monitoring items to the monitoring item information of the target server; or, in response to the change in the hardware of the target server being a removal of hardware, delete the information corresponding to the matched monitoring items from the monitoring item information of the target server.
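By way of non-limiting illustration, the add and remove behaviour of the monitoring adjustment unit might be sketched as follows. The `ITEM_STORE` contents, the keying of monitoring items by (SN, metric) pairs, and the function names are assumptions made for this sketch only.

```python
# Hypothetical monitoring item storage system: configuration -> metric names.
ITEM_STORE = {
    ("VendorA", "disk-1T"): ["disk_usage", "disk_temperature"],
    ("VendorB", "nic-10G"): ["link_state", "port_rate"],
}


def match_items(store, serial, vendor, model):
    """Match the monitoring items for one piece of hardware from the store."""
    return {(serial, metric) for metric in store.get((vendor, model), [])}


def adjust_items(server_items, matched_items, change):
    """Return the server's monitoring item information after a hardware change."""
    if change == "add":
        return server_items | matched_items
    if change == "remove":
        return server_items - matched_items
    raise ValueError(f"unsupported change type: {change}")


items = set()
items = adjust_items(items, match_items(ITEM_STORE, "SN1", "VendorA", "disk-1T"), "add")
items = adjust_items(items, match_items(ITEM_STORE, "SN9", "VendorB", "nic-10G"), "add")
items = adjust_items(items, match_items(ITEM_STORE, "SN1", "VendorA", "disk-1T"), "remove")
```

Keying each item by the hardware it belongs to means that removing one disk leaves the items of any other disk of the same configuration untouched.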
The intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application may further include:
a pressure detection module, configured to: in response to the change in the hardware of the target server being an addition of hardware, where the added hardware has the same brand and the same configuration as hardware already present in the target server, detect the system pressure added by monitoring the target server after the hardware is added; and, in response to the added system pressure reaching a pressure increase threshold, output corresponding expansion configuration prompt information, where the system pressure includes CPU load and/or memory utilization and/or network port rate.
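By way of non-limiting illustration, the comparison against the pressure increase threshold might be sketched as follows. The metric names, the use of percentage points, and the single shared threshold are assumptions made for this sketch; the source names the pressure indicators but not how the increase is computed.

```python
def pressure_increase(before, after):
    """Per-metric increase in the monitoring system's pressure, in percentage points."""
    return {name: after[name] - before[name] for name in before}


def expansion_hints(before, after, rise_threshold):
    """Return the metrics whose increase reaches the pressure increase threshold."""
    rise = pressure_increase(before, after)
    return sorted(name for name, delta in rise.items() if delta >= rise_threshold)


# Readings taken before and after the new hardware is monitored (illustrative values).
before = {"cpu_load": 40.0, "memory_utilization": 55.0, "nic_rate": 30.0}
after = {"cpu_load": 65.0, "memory_utilization": 60.0, "nic_rate": 52.0}
hints = expansion_hints(before, after, rise_threshold=20.0)
```

A non-empty `hints` list is the condition under which expansion configuration prompt information would be output.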
The intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application may further include:
a threshold adjustment module, configured to: in response to the change in the hardware of the target server being an addition of hardware, where the added hardware differs in brand and/or configuration from hardware already present in the target server, obtain the user's alarm setting information for the added hardware, and adjust the alarm thresholds of the monitoring items corresponding to the added hardware accordingly based on the alarm setting information.
In the intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application, the threshold adjustment module may include:
a threshold adjustment unit, configured to: in response to the alarm configuration information indicating that the user has not modified the alarm threshold of the monitoring item corresponding to the added hardware, select the threshold-unchanged strategy and, based on that strategy, keep the currently used alarm threshold of the corresponding monitoring item unchanged; or, in response to the alarm configuration information indicating that the user has modified the alarm threshold of the monitoring item corresponding to the added hardware, select the margin-unchanged strategy and, based on that strategy, set the alarm threshold of the monitoring item corresponding to the added hardware so that the range of monitoring item values that requires an alarm under the adjusted alarm threshold is the same as under the corresponding alarm threshold used before the adjustment.
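By way of non-limiting illustration, the two strategies might be sketched as follows for a capacity-type monitoring item. The source gives no formula; reading "margin unchanged" as preserving the absolute headroom (for example, alarming when the same number of gigabytes remains free) is an assumption of this sketch, as are the function and parameter names.

```python
def select_strategy(user_modified_threshold):
    """Pick the strategy from the user's alarm setting, as the unit above describes."""
    return "margin_unchanged" if user_modified_threshold else "threshold_unchanged"


def adjusted_threshold_pct(strategy, old_threshold_pct, old_capacity, new_capacity):
    """Return the alarm threshold (percent used) for the added hardware."""
    if strategy == "threshold_unchanged":
        return old_threshold_pct
    if strategy == "margin_unchanged":
        # Assumed reading: keep the absolute free margin at which the alarm fires.
        margin = old_capacity * (100 - old_threshold_pct) / 100
        return max(0.0, 100 - margin * 100 / new_capacity)
    raise ValueError(f"unknown strategy: {strategy}")


# Existing 1000 GB disk alarms at 80 % used (200 GB free); a 2000 GB disk is added.
kept = adjusted_threshold_pct(select_strategy(False), 80, 1000, 2000)
shifted = adjusted_threshold_pct(select_strategy(True), 80, 1000, 2000)
```

Under these assumptions the threshold-unchanged strategy keeps 80 %, while the margin-unchanged strategy moves the threshold to 90 % so that the alarm still fires at 200 GB free.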
The intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application may further include:
a status monitoring module, configured to: monitor the hardware status of each piece of hardware on the target server and, in response to detecting that the hardware status of any hardware on the target server has changed from the normal state to an alarm state, reduce the collection frequency of the monitoring item data corresponding to that hardware, or stop the collection of the monitoring item data corresponding to that hardware, based on the level of that hardware's alarm state.
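By way of non-limiting illustration, the level-dependent handling might be sketched as follows. The mapping used here (minor: reduce; moderate or severe: stop) mirrors the one the claims give for other monitoring items; applying it to the alarming hardware itself, the level names in English, and the slowdown factor of 4 are assumptions of this sketch.

```python
SLOWDOWN_FACTOR = 4  # hypothetical: collect four times less often on a minor alarm


def collection_interval(base_interval_s, alarm_level):
    """Return the new collection interval in seconds, or None when collection stops."""
    if alarm_level == "normal":
        return base_interval_s
    if alarm_level == "minor":
        return base_interval_s * SLOWDOWN_FACTOR
    if alarm_level in ("moderate", "severe"):
        return None
    raise ValueError(f"unknown alarm level: {alarm_level}")


minor_interval = collection_interval(30, "minor")
severe_interval = collection_interval(30, "severe")
```

Returning `None` rather than a very large interval keeps "stopped" distinguishable from "slowed" for whatever scheduler consumes the value.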
The intelligent monitoring fine-adjustment apparatus provided by an embodiment of the present application may further include:
an initialization module, configured to: when a new server is added, determine the new server to be the target server, obtain the hardware information of each piece of hardware on the target server, add the information of all monitoring items in the monitoring item storage system to the monitoring item information of the target server, and delete from the monitoring item information of the target server the information corresponding to any monitoring item that matches none of the hardware information of the hardware on the target server.
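By way of non-limiting illustration, the "add everything, then prune" initialization might be sketched as follows. The store layout, the (vendor, model) configuration keys, and the function name are assumptions made for this sketch.

```python
def initialize_items(item_store, installed_configs):
    """Start from every stored monitoring item, then drop items matching no hardware."""
    # Step 1: full monitoring, every item in the storage system is added.
    all_items = {(config, metric)
                 for config, metrics in item_store.items()
                 for metric in metrics}
    # Step 2: delete items whose configuration is not present on the server.
    installed = set(installed_configs)
    return {item for item in all_items if item[0] in installed}


store = {
    ("VendorA", "disk-1T"): ["disk_usage", "disk_temperature"],
    ("VendorB", "nic-10G"): ["link_state"],
    ("VendorC", "gpu-x"): ["gpu_temperature"],
}
server_items = initialize_items(store, [("VendorA", "disk-1T"), ("VendorB", "nic-10G")])
```

The result contains only the items for the two installed configurations; the GPU item is pruned because no matching hardware was reported.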
An embodiment of the present application further provides an intelligent monitoring fine-adjustment device, which may include:
a memory, configured to store computer-readable instructions;
one or more processors, configured to implement the steps of any of the intelligent monitoring fine-adjustment methods described above when executing the computer-readable instructions.
Further, the internal structure of the intelligent monitoring fine-adjustment device may be as shown in Figure 4. The device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and the database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores data such as the obtained hardware information corresponding to the changed hardware of the target server; for the specific data stored, see the limitations in the method embodiments above. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the intelligent monitoring fine-adjustment method.
Those skilled in the art will understand that the structure shown in Figure 4 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the intelligent monitoring fine-adjustment device to which the solution is applied; a specific intelligent monitoring fine-adjustment device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
An embodiment of the present application further provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of any of the intelligent monitoring fine-adjustment methods described above.
It should be noted that, for the relevant parts of the intelligent monitoring fine-adjustment apparatus, device, and storage medium provided by the embodiments of the present application, reference may be made to the detailed description of the corresponding parts of the intelligent monitoring fine-adjustment method provided by the embodiments of the present application, which is not repeated here. In addition, those parts of the above technical solutions whose implementation principles are the same as those of corresponding technical solutions in the prior art are not described in detail, to avoid redundancy.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

  1. An intelligent monitoring fine-adjustment method, characterized by comprising:
    in response to a change in the hardware of a target server, obtaining the hardware information corresponding to the changed hardware of the target server as target information, wherein the target server is any server that needs to be monitored;
    matching corresponding monitoring items from a preset monitoring item storage system based on the target information, and adjusting the monitoring item information of the target server accordingly based on the matched monitoring items, wherein the monitoring item storage system stores all monitoring items that can be used when monitoring servers; and
    monitoring the target server based on the monitoring item information of the target server.
  2. The method according to claim 1, characterized in that adjusting the monitoring item information of the target server accordingly based on the matched monitoring items comprises:
    in response to the change in the hardware of the target server being an addition of hardware, adding the information corresponding to the matched monitoring items to the monitoring item information of the target server.
  3. The method according to claim 1, characterized in that adjusting the monitoring item information of the target server accordingly based on the matched monitoring items comprises:
    in response to the change in the hardware of the target server being a removal of hardware, deleting the information corresponding to the matched monitoring items from the monitoring item information of the target server.
  4. The method according to claim 2, characterized by further comprising:
    in response to the change in the hardware of the target server being an addition of hardware, where the added hardware has the same brand and the same configuration as hardware already present in the target server, detecting the system pressure added by monitoring the target server after the hardware is added.
  5. The method according to claim 4, characterized in that, after detecting the system pressure added by monitoring the target server after the hardware is added, the method further comprises:
    in response to the added system pressure reaching a pressure increase threshold, outputting corresponding expansion configuration prompt information, wherein the system pressure includes CPU load and/or memory utilization and/or network port rate.
  6. The method according to claim 2, characterized by further comprising:
    in response to the change in the hardware of the target server being an addition of hardware, where the added hardware differs in brand and/or configuration from hardware already present in the target server, obtaining the user's alarm setting information for the added hardware, and adjusting the alarm thresholds of the monitoring items corresponding to the added hardware accordingly based on the alarm setting information.
  7. The method according to claim 6, characterized in that adjusting the alarm thresholds of the monitoring items corresponding to the added hardware accordingly based on the alarm setting information comprises:
    in response to the alarm configuration information indicating that the user has not modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting a threshold-unchanged strategy and, based on the threshold-unchanged strategy, keeping the currently used alarm threshold of the corresponding monitoring item unchanged.
  8. The method according to claim 6, characterized in that adjusting the alarm thresholds of the monitoring items corresponding to the added hardware accordingly based on the alarm setting information comprises:
    in response to the alarm configuration information indicating that the user has modified the alarm threshold of the monitoring item corresponding to the added hardware, selecting a margin-unchanged strategy and, based on the margin-unchanged strategy, setting the alarm threshold of the monitoring item corresponding to the added hardware so that the range of monitoring item values that requires an alarm under the adjusted alarm threshold is the same as under the corresponding alarm threshold used before the adjustment.
  9. The method according to claim 1, characterized by further comprising:
    monitoring the hardware status of each piece of hardware on the target server;
    in response to detecting that the hardware status of any hardware on the target server has changed from a normal state to an alarm state, reducing the collection frequency of the monitoring item data corresponding to that hardware based on the level of that hardware's alarm state.
  10. The method according to claim 1 or 9, characterized by further comprising:
    monitoring the hardware status of each piece of hardware on the target server and, in response to detecting that the hardware status of any hardware on the target server has changed from a normal state to an alarm state, stopping the collection of the monitoring item data corresponding to that hardware based on the level of that hardware's alarm state.
  11. The method according to any one of claims 1 to 10, characterized by further comprising:
    when a new server is added, determining the new server to be the target server, obtaining the hardware information of each piece of hardware on the target server, and adding the information of all monitoring items in the monitoring item storage system to the monitoring item information of the target server.
  12. The method according to claim 11, characterized by further comprising: deleting from the monitoring item information of the target server the information corresponding to any monitoring item that matches none of the hardware information of the hardware on the target server.
  13. The method according to claim 2, characterized in that the monitoring item information is a script;
    adding the information corresponding to the matched monitoring items to the monitoring item information of the target server comprises:
    adding the matched monitoring item scripts to all of the monitoring item scripts of the target server.
  14. The method according to claim 3, characterized in that the monitoring item information is a script;
    deleting the information corresponding to the matched monitoring items from the monitoring item information of the target server comprises:
    deleting the matched monitoring item scripts from all of the monitoring item scripts of the target server.
  15. The method according to claim 9, characterized in that, in response to detecting that the hardware status of any hardware on the target server has changed from a normal state to an alarm state, the method further comprises:
    based on the level of that hardware's alarm state, reducing the collection frequency of the indicator data of other monitoring items or stopping the collection of the indicator data of other monitoring items, wherein the other monitoring items are the monitoring items corresponding to hardware on the target server other than that hardware.
  16. The method according to claim 15, characterized in that the levels of the alarm state include minor alarm, moderate alarm, and severe alarm;
    reducing the collection frequency of the indicator data of other monitoring items based on the level of that hardware's alarm state comprises:
    in response to the level of that hardware's alarm state being minor alarm, reducing the collection frequency of the indicator data of other monitoring items.
  17. The method according to claim 16, characterized in that stopping the collection of the indicator data of other monitoring items based on the level of that hardware's alarm state comprises:
    in response to the level of that hardware's alarm state being moderate alarm or severe alarm, stopping the collection of the indicator data of other monitoring items.
  18. An intelligent monitoring fine-adjustment apparatus, characterized by comprising:
    an information acquisition module, configured to: in response to a change in the hardware of a target server, obtain the hardware information corresponding to the changed hardware of the target server as target information, wherein the target server is any server that needs to be monitored;
    a monitoring adjustment module, configured to: match corresponding monitoring items from a preset monitoring item storage system based on the target information, and adjust the monitoring item information of the target server accordingly based on the matched monitoring items, wherein the monitoring item storage system stores all monitoring items that can be used when monitoring servers; and
    a hardware monitoring module, configured to: monitor the target server based on the monitoring item information of the target server.
  19. An intelligent monitoring fine-adjustment device, characterized by comprising:
    a memory, configured to store computer-readable instructions;
    one or more processors, configured to implement the steps of the intelligent monitoring fine-adjustment method according to any one of claims 1 to 13 when executing the computer-readable instructions.
  20. A non-volatile computer-readable storage medium, characterized in that the non-volatile computer-readable storage medium stores computer-readable instructions which, when executed by one or more processors, implement the steps of the intelligent monitoring fine-adjustment method according to any one of claims 1 to 13.
PCT/CN2022/115312 2022-03-23 2022-08-26 Intelligent monitoring micro-adjustment method and apparatus, device, and storage medium WO2023178923A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210285156.3A CN114389971B (en) 2022-03-23 2022-03-23 Intelligent monitoring fine adjustment method, device, equipment and storage medium
CN202210285156.3 2022-03-23

Publications (1)

Publication Number Publication Date
WO2023178923A1 true WO2023178923A1 (en) 2023-09-28

Family

ID=81205059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115312 WO2023178923A1 (en) 2022-03-23 2022-08-26 Intelligent monitoring micro-adjustment method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114389971B (en)
WO (1) WO2023178923A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389971B (en) * 2022-03-23 2022-12-23 苏州浪潮智能科技有限公司 Intelligent monitoring fine adjustment method, device, equipment and storage medium
CN116701104B (en) * 2023-05-05 2024-01-26 北京瑞祺皓迪技术股份有限公司 Algorithm adjustment method, device and monitoring system in edge monitoring equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017063505A1 (en) * 2015-10-16 2017-04-20 中兴通讯股份有限公司 Method for detecting hardware fault of server, apparatus thereof, and server
CN106961352A (en) * 2017-03-29 2017-07-18 努比亚技术有限公司 Monitoring system and monitoring method
CN108959009A (en) * 2018-07-26 2018-12-07 郑州云海信息技术有限公司 A kind of server failure analysis method and its fail analysis device
CN110908862A (en) * 2019-11-08 2020-03-24 北京浪潮数据技术有限公司 Monitoring method and device, electronic equipment and storage medium
CN112311617A (en) * 2019-08-02 2021-02-02 中国移动通信有限公司政企客户分公司 Configured data monitoring and alarming method and system
CN112346924A (en) * 2020-09-21 2021-02-09 西安交大捷普网络科技有限公司 Server monitoring method and system
WO2021139308A1 (en) * 2020-06-16 2021-07-15 平安科技(深圳)有限公司 Cloud server monitoring method, apparatus and device, and storage medium
CN113518002A (en) * 2021-05-24 2021-10-19 平安普惠企业管理有限公司 Monitoring method, device, equipment and storage medium based on server-free platform
CN114389971A (en) * 2022-03-23 2022-04-22 苏州浪潮智能科技有限公司 Intelligent monitoring fine adjustment method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9741017B2 (en) * 2009-12-08 2017-08-22 Tripwire, Inc. Interpreting categorized change information in order to build and maintain change catalogs
US9887879B2 (en) * 2015-02-13 2018-02-06 Canon Kabushiki Kaisha Monitoring apparatus and method
CN108717391B (en) * 2018-05-16 2021-09-28 平安科技(深圳)有限公司 Monitoring device and method for test process and computer readable storage medium
CN109491866A (en) * 2018-11-09 2019-03-19 郑州云海信息技术有限公司 Monitor method, apparatus, terminal and the computer readable storage medium of storage hardware
CN112650648A (en) * 2020-12-30 2021-04-13 杭州趣链科技有限公司 Monitoring method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114389971A (en) 2022-04-22
CN114389971B (en) 2022-12-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932975

Country of ref document: EP

Kind code of ref document: A1