CN118295892A - Method, apparatus, device and computer readable medium for monitoring data - Google Patents


Info

Publication number
CN118295892A
Authority
CN
China
Prior art keywords
hardware
hardware equipment
data
running state
specification parameters
Prior art date
Legal status
Pending
Application number
CN202410525933.6A
Other languages
Chinese (zh)
Inventor
王帅兵
董可新
陈国峰
孙彬彬
高新路
任杰轩
Current Assignee
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Publication of CN118295892A
Pending


Abstract

The invention discloses a method, an apparatus, a device, and a computer readable medium for monitoring data, and relates to the field of computer technology. One embodiment of the method comprises the following steps: according to a data monitoring period, a collector acquires specification parameters and current running state data of a hardware device and pushes them, in a preset data format, to a load-balanced gateway; the load-balanced gateway stores the specification parameters and current running state data of the hardware device into a message queue through a configured instance; and an identification end screens out the fault identification fields of the hardware device according to the specification parameters obtained from the message queue, identifies hardware faults in combination with the current running state of the hardware device, and sends alarms for the hardware faults. This embodiment enables centralized, large-scale server monitoring.

Description

Method, apparatus, device and computer readable medium for monitoring data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer readable medium for monitoring data.
Background
Currently, the server monitoring technology most widely used in data centers is out-of-band monitoring based on the baseboard management controller (BMC). Out-of-band monitoring can be implemented in several ways.
In the process of implementing the present invention, the inventors found at least the following problem in the prior art: centralized, large-scale server monitoring cannot be achieved by means of the BMC.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, apparatus, device, and computer readable medium for monitoring data, which can implement centralized large-scale server monitoring.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method of monitoring data, including:
according to a data monitoring period, a collector acquires specification parameters of a hardware device and current running state data of the hardware device, and pushes them in a preset data format to a load-balanced gateway;
the load-balanced gateway stores the specification parameters and current running state data of the hardware device into a message queue through a configured instance;
an identification end screens out a fault identification field of the hardware device according to the specification parameters of the hardware device obtained from the message queue, so as to identify a hardware fault in combination with the current running state of the hardware device and send an alarm for the hardware fault.
The step in which the collector acquires, according to the data monitoring period, the specification parameters and current running state data of the hardware device and pushes them in the preset data format to the load-balanced gateway comprises:
after the collector is adapted to the hardware device, acquiring the specification parameters and current running state data of the hardware device according to the data monitoring period;
and sending, by the collector, the specification parameters and current running state data of the hardware device in the preset data format to the load-balanced gateway, wherein one load-balanced gateway corresponds to a plurality of collectors.
The step in which the collector sends the specification parameters and current running state data of the hardware device in the preset data format to the load-balanced gateway comprises:
converting, by the collector, the format of the specification parameters and the format of the current running state data of the hardware device, and then performing keyword matching to obtain the specification parameters and current running state data in the preset data format;
and sending, by the collector, the specification parameters and current running state data in the preset data format to the load-balanced gateway.
The step in which the load-balanced gateway stores the specification parameters and current running state data of the hardware device into the message queue through the configured instance comprises:
running, by the load-balanced gateway, an image to start an instance, which asynchronously stores the specification parameters and current running state data of the hardware device into the message queue.
The step in which the identification end screens out the fault identification field of the hardware device according to the specification parameters obtained from the message queue, identifies the hardware fault in combination with the current running state of the hardware device, and sends the alarm comprises:
screening out, by the identification end, the fault identification field of the hardware device from a database according to the specification parameters of the hardware device obtained from the message queue;
and, when the fault identification field of the hardware device successfully matches the current running state of the hardware device, identifying the hardware fault based on the matched fault identification field and sending an alarm for the hardware fault.
The method further comprises the steps of:
extracting current operating parameters from the current running state data of the hardware device according to the specification parameters of the hardware device;
and predicting, with a machine learning model, the hardware fault type of the hardware device within a preset time period according to the historical operating parameters and current operating parameters of the hardware device.
The operating parameters comprise one or more of the following: remapped sector count, bad block increment count, wear-leveling operation count, current count of sectors pending remapping, and read error block count;
the hardware fault type includes a hard disk fault type.
According to a second aspect of an embodiment of the present invention, there is provided an apparatus for monitoring data, including:
a collector, configured to acquire, according to a data monitoring period, specification parameters and current running state data of a hardware device and push them in a preset data format to a load-balanced gateway;
the gateway, configured to store the specification parameters and current running state data of the hardware device into a message queue through a configured instance;
and an identification end, configured to screen out a fault identification field of the hardware device according to the specification parameters of the hardware device obtained from the message queue, so as to identify a hardware fault in combination with the current running state of the hardware device and send an alarm for the hardware fault.
According to a third aspect of an embodiment of the present invention, there is provided an electronic device that monitors data, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as described above.
One embodiment of the above invention has the following advantages or benefits: according to a data monitoring period, the collector acquires the specification parameters and current running state data of the hardware device and pushes them in a preset data format to the load-balanced gateway; the load-balanced gateway stores them into a message queue through a configured instance; and the identification end screens out the fault identification field of the hardware device according to the specification parameters obtained from the message queue, identifies hardware faults in combination with the current running state of the hardware device, and sends alarms for them. Because the collector can acquire specification parameters and running state data from many kinds of hardware devices, and this data is processed on that basis, centralized, large-scale server monitoring can be realized.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are provided for a better understanding of the invention and do not constitute an undue limitation on it. In the drawings:
FIG. 1 is a schematic flow chart of a method of monitoring data according to an embodiment of the invention;
FIG. 2 is a flow chart of acquiring and pushing specification parameters of a hardware device and current operating state data of the hardware device according to an embodiment of the present invention;
FIG. 3 is a flow diagram of converting the format of specification parameters of a hardware device and the format of current operating state data of the hardware device according to an embodiment of the present invention;
FIG. 4 is a flow diagram of sending an alert of a hardware failure according to an embodiment of the invention;
FIG. 5 is a flow diagram of predicting a hardware failure type of a hardware device according to an embodiment of the invention;
FIG. 6 is a schematic application diagram of a method of monitoring data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main structure of an apparatus for monitoring data according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 9 is a schematic diagram of a computer system suitable for implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The Intelligent Platform Management Interface (IPMI) is a new-generation, general-purpose interface standard for "intelligent" hardware management. With IPMI, a user can monitor the physical characteristics of a server, such as temperature, voltage, fan operating status, power supply, and chassis intrusion. The greatest advantage of IPMI is that it is independent of the CPU, BIOS, and OS, so a server can be monitored whether it is powered on or off, as long as it is connected to a power source. IPMI is a specification, and its most important physical component is the BMC. The BMC acts as the "brain" of the whole platform management system: through it, IPMI can monitor data from various sensors and record various events.
Devices also support the Simple Network Management Protocol (SNMP). SNMP is an application-layer protocol that monitors and manages network devices through a standard framework, a common language, and corresponding security mechanisms. The SNMP architecture consists of four parts: a network management platform, an SNMP agent, a network management protocol, and a management information base (MIB).
1) The network management platform is the platform on which network management software runs, such as AdventNet or SolarWinds. It sends Get and Set messages to the SNMP agent and receives the agent's responses in order to manage and monitor network devices.
2) The SNMP agent is a software module running on the managed network device that maintains the information data of the managed device and sends the management data to the network management platform when needed.
3) Network management protocol: the network management platform and the SNMP agent communicate through the network management protocol and exchange information in the form of SNMP messages. The protocol mainly supports three operations: Get, Set, and Trap. Get lets the management platform read the value of a MIB object on the agent, Set lets the management platform set the value of a MIB object on the agent, and Trap lets the agent notify the management platform of important events.
4) The management information base (MIB) is a database of network device information maintained by the SNMP agent; the network management platform can query or set the values of the variables it contains.
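To make the Get, Set, and Trap roles above concrete, here is a toy, in-memory Go model of an agent that holds a MIB. It is not an SNMP implementation (no BER encoding, no UDP transport, no community strings); the `Agent` type and its methods are hypothetical, and the two OIDs are used only as familiar examples (`sysName.0` and the standard `linkDown` trap).

```go
package main

import (
	"fmt"
	"sync"
)

// Trap is an event the agent pushes to the manager without being asked.
type Trap struct {
	OID   string
	Value string
}

// Agent is a toy stand-in for an SNMP agent: it owns a MIB (OID -> value)
// and a channel on which it emits traps. Hypothetical, for illustration only.
type Agent struct {
	mu    sync.Mutex
	mib   map[string]string
	traps chan Trap
}

func NewAgent() *Agent {
	return &Agent{mib: make(map[string]string), traps: make(chan Trap, 8)}
}

// Get models the manager reading a MIB object value from the agent.
func (a *Agent) Get(oid string) (string, bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	v, ok := a.mib[oid]
	return v, ok
}

// Set models the manager writing a MIB object value on the agent.
func (a *Agent) Set(oid, value string) {
	a.mu.Lock()
	a.mib[oid] = value
	a.mu.Unlock()
}

// RaiseTrap models the agent notifying the manager of an important event.
func (a *Agent) RaiseTrap(oid, value string) {
	a.traps <- Trap{OID: oid, Value: value}
}

func main() {
	agent := NewAgent()
	agent.Set("1.3.6.1.2.1.1.5.0", "server-01") // sysName.0
	if v, ok := agent.Get("1.3.6.1.2.1.1.5.0"); ok {
		fmt.Println("Get:", v)
	}
	agent.RaiseTrap("1.3.6.1.6.3.1.1.5.3", "linkDown")
	t := <-agent.traps
	fmt.Println("Trap:", t.OID, t.Value)
}
```

The key asymmetry the model preserves is direction: Get and Set are initiated by the manager, while a Trap originates at the agent.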
Redfish is an open industry-standard specification that provides simple, modern, and secure management for scalable platform hardware. It is a hypermedia API, so it can represent a wide range of implementations through a consistent interface, and it provides mechanisms for managing data center resources, handling events, running long-duration tasks, and discovery. Once a unified management interface based on Redfish becomes widespread, the large amount of adaptation, development, and testing work caused by differing server hardware management interfaces can be effectively reduced.
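Because Redfish is a hypermedia API, a client discovers resources by following `@odata.id` links, starting from the service root at `/redfish/v1/`. The Go sketch below decodes a trimmed, hypothetical service-root payload with `encoding/json` and lists the navigation links it exposes; a real client would fetch the payload over HTTPS with BMC credentials, which is omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// link mirrors the {"@odata.id": "..."} objects Redfish uses for navigation.
type link struct {
	ID string `json:"@odata.id"`
}

// collectLinks returns every top-level property of a Redfish resource that is
// a navigation link, keyed by property name.
func collectLinks(body []byte) (map[string]string, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, err
	}
	links := make(map[string]string)
	for name, msg := range raw {
		var l link
		// Strings, numbers and arrays fail to decode into link and are skipped.
		if json.Unmarshal(msg, &l) == nil && l.ID != "" {
			links[name] = l.ID
		}
	}
	return links, nil
}

func main() {
	// Hypothetical, trimmed service root used only for illustration.
	sample := []byte(`{
		"@odata.id": "/redfish/v1/",
		"RedfishVersion": "1.6.0",
		"Systems": {"@odata.id": "/redfish/v1/Systems"},
		"Chassis": {"@odata.id": "/redfish/v1/Chassis"},
		"Managers": {"@odata.id": "/redfish/v1/Managers"}
	}`)
	links, err := collectLinks(sample)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(links))
	for n := range links {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s -> %s\n", n, links[n])
	}
}
```

Following the returned links recursively (Systems, then each system member) is how a client walks the resource tree without hard-coding vendor-specific paths.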
Existing server BMC monitoring technology lacks a standard centralized management architecture, implementation method, and usable platform system, and centralized, large-scale server management and monitoring cannot be achieved with the existing BMC functional modules. The main shortcomings are as follows. The log information obtained through IPMI monitoring is incomplete, cannot be associated with the specific serial number (SN) of the faulty device, and does not expose the SMART information of hard disks. Server vendors and brands use different, mutually incompatible thresholds and methods for judging device faults, and the correctable error (CE) log information of memory cannot be obtained. SNMP in the prior art has a performance bottleneck, with an upper limit of 64K on collected data. Redfish is not yet mature and is not supported by the existing installed base of servers. Finally, BMC technology cannot predict hard disk faults.
In summary, centralized large-scale server management and monitoring cannot be achieved by means of BMC.
In order to solve the problem that centralized large-scale server management and monitoring cannot be realized by means of BMC, the following technical scheme in the embodiment of the invention can be adopted.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a method for monitoring data according to an embodiment of the present invention, in which hardware faults are identified based on specification parameters and running state data. As shown in FIG. 1, flow 100 specifically comprises the following steps:
S101: according to the data monitoring period, the collector acquires the specification parameters and current running state data of the hardware device and pushes them in a preset data format to the load-balanced gateway.
The technical solution of the embodiments of the invention applies to monitoring many servers. A server is an indispensable device for providing services and contains various hardware devices, such as hard disks, memory, CPUs, array cards, network cards, GPUs, fans, and power supplies. For a server to operate normally, all of its hardware devices must work normally, so data must be collected from each hardware device.
Referring to FIG. 2, FIG. 2 is a schematic flow chart (flow 200) of acquiring and pushing the specification parameters and current running state data of the hardware device according to an embodiment of the present invention. The flow specifically comprises the following steps:
S201: after the collector is adapted to the hardware device, it acquires the specification parameters and current running state data of the hardware device according to the data monitoring period.
In an embodiment of the invention, a collector collects data from the hardware devices. Because the hardware devices in a server often come from different server vendors, brands, and specifications, the collector needs to be adapted to each hardware device.
Specifically, the adaptation between the collector and a hardware device is implemented through underlying dependency packages. The underlying dependency package for each hardware device is stored in advance, and the collector adapts to the hardware device through that package so that it can obtain data from the device.
As one example, the underlying dependency packages and commands of different server vendors, brands, and specifications are adapted using the Golang programming language to mask underlying differences and remain compatible with both new and legacy devices; the collector operates on the underlying hardware devices by issuing Linux commands to obtain data from them. That is, the collector adapts to the hardware device through the Golang programming language.
After the collector is adapted to the hardware device, the specification parameters of the hardware device and the current running state data of the hardware device can be obtained according to the data monitoring period.
The data monitoring period is the time interval during which the collector collects data. The data monitoring period may be set based on the actual application scenario. The specification parameter of the hardware device is a production parameter of the hardware device. As one example, the production parameters include: server vendor, server brand, and server specification.
The current operating state data of the hardware device is indicative of the operating condition of the hardware device. The hardware device comprises one or more of the following: CPU, memory, hard disk, GPU, network card, power, fan and array card.
The current operating state data of the hardware device may be acquired in the following manner.
For the CPU and memory, the current running state data is obtained by inspecting EDAC logs and matching keywords for log error levels with regular expressions.
For the hard disk and array card, the current running state data is obtained through array card commands and standard CLI tools.
For the CPU, memory, hard disk, GPU, network card, power supply, fan, and array card, the current running state data is obtained by matching keywords for log error levels in the BMC SEL (System Event Log) with regular expressions.
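The keyword matching applied to EDAC and SEL logs can be sketched with Go's `regexp` package. The sample lines only imitate the shape of kernel EDAC messages and `ipmitool sel list` output; exact wording varies by kernel, BMC, and vendor, so the patterns in `rules` and the level labels (`CE`, `UE`, `critical`) are assumptions to be tuned per platform.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a keyword pattern with the error level it indicates.
type rule struct {
	re    *regexp.Regexp
	level string
}

// Hypothetical rules per log source; the first matching rule wins.
var rules = map[string][]rule{
	"edac": {
		{regexp.MustCompile(`EDAC MC\d+: \d+ UE `), "UE"}, // uncorrectable error
		{regexp.MustCompile(`EDAC MC\d+: \d+ CE `), "CE"}, // correctable error
	},
	"sel": {
		{regexp.MustCompile(`(?i)uncorrectable`), "UE"},
		{regexp.MustCompile(`(?i)\bcorrectable ECC`), "CE"},
		{regexp.MustCompile(`(?i)(failure|fault|critical)`), "critical"},
	},
}

// classify returns the error level of a log line from the given source,
// or "" when no keyword matches.
func classify(source, line string) string {
	for _, r := range rules[source] {
		if r.re.MatchString(line) {
			return r.level
		}
	}
	return ""
}

func main() {
	// Illustrative lines only; real logs may differ.
	samples := []struct{ source, line string }{
		{"edac", "EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0"},
		{"edac", "EDAC MC1: 1 UE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1"},
		{"sel", "   1 | 04/12/2024 | 10:15:32 | Memory #0x53 | Correctable ECC | Asserted"},
		{"sel", "   2 | 04/12/2024 | 10:16:01 | Power Supply #0x61 | Failure detected | Asserted"},
		{"sel", "   3 | 04/12/2024 | 10:17:45 | System Event | Timestamp Clock Sync | Asserted"},
	}
	for _, s := range samples {
		if level := classify(s.source, s.line); level != "" {
			fmt.Printf("[%s] %s: %s\n", s.source, level, strings.TrimSpace(s.line))
		}
	}
}
```

Ordering the rules from most to least severe matters: an "Uncorrectable ECC" entry must not be reported as a correctable one.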
S202: the collector sends the specification parameters and current running state data of the hardware device in the preset data format to the load-balanced gateway, where one load-balanced gateway corresponds to a plurality of collectors.
In one embodiment of the present invention, after acquiring the specification parameters and current running state data of the hardware device, the collector converts them into the preset data format. The purpose is to avoid being constrained by limits on data volume and by the fixed log formats of the acquired data.
Referring to FIG. 3, FIG. 3 is a schematic flow chart (flow 300) of converting the format of the specification parameters and the current running state data of the hardware device according to an embodiment of the present invention. The flow specifically comprises the following steps:
S301: after converting the format of the specification parameters and current running state data of the hardware device, the collector performs keyword matching to obtain the specification parameters and current running state data in the preset data format.
Data acquisition consists of executing Linux commands through Golang's OS library to obtain the specification parameters and current running state data of the hardware device. After executing a Linux command, the collector writes the result to a cache file or holds it temporarily in a Golang variable, and then identifies the specification parameters and current running state data in the preset data format by removing blank spaces, formatting and merging the data, and matching keywords with regular expressions. As one example, the preset data format is a JSON structure.
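The clean-up and keyword-matching step might look like the Go sketch below. It takes the raw text a Linux command printed (here a `Key: Value` layout similar to `dmidecode` output), drops blank lines and surrounding whitespace, keeps only whitelisted keywords, and marshals the result to JSON. The `keywords` table, the output field names, and the sample text are hypothetical; in the collector the text would come from running the command via `os/exec` (for example `exec.Command(name, args...).Output()`), which is not executed here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// kvRe matches "Key: Value" lines, tolerating extra spaces around the colon.
var kvRe = regexp.MustCompile(`^\s*([^:]+?)\s*:\s*(.+?)\s*$`)

// keywords maps raw command keys to fields of the preset format (hypothetical).
var keywords = map[string]string{
	"Manufacturer":  "vendor",
	"Product Name":  "model",
	"Serial Number": "sn",
}

// toPreset converts raw command output into the preset JSON format.
func toPreset(raw string) ([]byte, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // drop blank lines
		}
		m := kvRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a key/value line, e.g. a section title
		}
		if field, ok := keywords[m[1]]; ok {
			out[field] = m[2]
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return json.Marshal(out) // map keys are emitted in sorted order
}

func main() {
	raw := `
System Information
	Manufacturer: ExampleVendor
	Product Name:   X100
	Serial Number: ABC123

	UUID: 0000
`
	b, err := toPreset(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

With the sample input this prints `{"model":"X100","sn":"ABC123","vendor":"ExampleVendor"}`; keys not listed in `keywords` (such as `UUID`) are discarded.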
S302: the collector sends the specification parameters and current running state data in the preset data format to the load-balanced gateway.
The collector sends the specification parameters and current running state data in the preset data format to the load-balanced gateway using the Remote Procedure Call (RPC) protocol.
In the embodiment of the invention, one load-balanced gateway corresponds to a plurality of collectors; that is, a single load-balanced gateway may receive data collected by multiple collectors. As one example, a collector determines its gateway from the gateway's IP address and port. By configuring the correspondence between each gateway and its collectors, load balancing across multiple gateways is achieved.
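A collector-to-gateway correspondence can be derived deterministically, for example by hashing each collector's ID onto the list of gateway `IP:port` addresses. The sketch below uses FNV-1a from Go's `hash/fnv`; the hashing scheme and the addresses are illustrative assumptions, since the text only states that the correspondence is configured.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assignGateway maps a collector ID to one gateway address. The same ID always
// lands on the same gateway, and IDs spread across the available gateways.
func assignGateway(collectorID string, gateways []string) string {
	if len(gateways) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(collectorID)) // Write on a hash.Hash never returns an error
	return gateways[h.Sum32()%uint32(len(gateways))]
}

func main() {
	// Hypothetical gateway addresses.
	gateways := []string{"10.0.0.11:9000", "10.0.0.12:9000", "10.0.0.13:9000"}
	counts := map[string]int{}
	for i := 0; i < 300; i++ {
		counts[assignGateway(fmt.Sprintf("collector-%03d", i), gateways)]++
	}
	for _, gw := range gateways {
		fmt.Printf("%s serves %d collectors\n", gw, counts[gw])
	}
}
```

Plain modulo hashing reassigns most collectors when a gateway is added or removed; consistent hashing would limit that churn if the gateway pool changes often.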
In the embodiment of FIG. 3, after collecting data from the hardware device, the collector sends it to the gateway in the preset data format, enabling the hardware device to be monitored.
In one embodiment of the invention, the collector may be started after the gateway identifier and the data monitoring period are set; the gateway to which data is pushed is determined by the gateway identifier.
S102: the load-balanced gateway stores the specification parameters and current running state data of the hardware device into a message queue through the configured instance.
After receiving the specification parameters and current running state data of the hardware device, the load-balanced gateway stores them into a message queue.
The message queue holds the specification parameters and current running state data of many hardware devices. Specifically, this data may be stored into the message queue through the configured instance to achieve high availability.
In one embodiment of the invention, the load-balanced gateway runs an image to start an instance, which asynchronously stores the specification parameters and current running state data of the hardware device into the message queue.
An instance contains all of the environment dependencies needed to run the program. As one example, an instance is one of the following: a container, a virtual machine, or a physical machine. An image can be built for a container and run with a container command; a running image is an instance, such as a Docker container. Running the image multiple times starts multiple container instances: when a single instance cannot handle a large number of requests, multiple instances can be started and requests are dispatched to them by a load balancing algorithm.
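The text does not name the load balancing algorithm used to dispatch requests to container instances; round-robin is one common choice and is sketched below in Go. The instance names are hypothetical, and the atomic counter (`sync/atomic.Uint64`, Go 1.19 or later) keeps the selector safe to call from many goroutines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin hands out instances in a fixed rotation.
type RoundRobin struct {
	instances []string
	next      atomic.Uint64
}

func NewRoundRobin(instances ...string) *RoundRobin {
	return &RoundRobin{instances: instances}
}

// Pick returns the next instance, or "" when the pool is empty.
// It is safe for concurrent use.
func (r *RoundRobin) Pick() string {
	if len(r.instances) == 0 {
		return ""
	}
	n := r.next.Add(1) - 1
	return r.instances[n%uint64(len(r.instances))]
}

func main() {
	rr := NewRoundRobin("gateway-instance-1", "gateway-instance-2", "gateway-instance-3")
	for i := 0; i < 5; i++ {
		fmt.Println("request", i, "->", rr.Pick())
	}
}
```

Round-robin assumes instances have similar capacity; least-connections or weighted schemes are alternatives when they do not.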
By running the image to start instances, the load-balanced gateway can process large volumes of specification parameters and current running state data in a timely manner. In addition, to avoid affecting the normal operation of the servers, the specification parameters and current running state data may be stored into the message queue through the configured instance to achieve availability.
In one embodiment of the invention, Redis and Kafka may be used to support message queue processing.
S103: the identification end screens out the fault identification field of the hardware device according to the specification parameters obtained from the message queue, so as to identify the hardware fault in combination with the current running state of the hardware device and send an alarm for the hardware fault.
In the embodiment of the invention, the identification end identifies hardware faults from the specification parameters and current running state of the hardware device. The identification end can make this judgment asynchronously so as not to affect the operation of the hardware device.
Referring to FIG. 4, FIG. 4 is a schematic flow chart (flow 400) of sending an alarm for a hardware fault according to an embodiment of the present invention. The flow specifically comprises the following steps:
S401: the identification end screens out the fault identification fields of the hardware device from the database according to the specification parameters obtained from the message queue.
In an embodiment of the present invention, the fault identification fields are stored in the database in advance. As one example, the fault identification fields include one or more of the following: fault name, fault code, fault brand, fault source, fault type, faulty accessory, fault level, judgment expression, fault threshold, and fault description.
The identification end obtains the specification parameters of the hardware device from the message queue and uses them to match fault identification fields in the database. Because different hardware devices have different fault identification fields, the fault identification fields of a given hardware device must be selected through its specification parameters.
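Selecting fault identification fields by specification parameters amounts to a filtered lookup. The Go sketch below keeps the rules in memory instead of a database and filters on brand and accessory type; `FaultRule` mirrors only a subset of the fields listed above, and the sample rules, codes, and brand names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// FaultRule mirrors part of the fault identification fields (hypothetical subset).
type FaultRule struct {
	Name      string
	Code      string
	Brand     string // "" means the rule applies to every brand
	Accessory string // e.g. "disk", "memory"
	Level     string
	Field     string // running-state field the judgment expression reads
	Threshold float64
}

// Spec is the subset of specification parameters used for filtering.
type Spec struct {
	Brand     string
	Accessory string
}

// rulesFor returns the rules that apply to a device with the given spec.
func rulesFor(all []FaultRule, s Spec) []FaultRule {
	var out []FaultRule
	for _, r := range all {
		if !strings.EqualFold(r.Accessory, s.Accessory) {
			continue
		}
		if r.Brand != "" && !strings.EqualFold(r.Brand, s.Brand) {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rules := []FaultRule{
		{Name: "disk bad sectors", Code: "D001", Accessory: "disk", Level: "major", Field: "bad_sectors", Threshold: 100},
		{Name: "vendor-specific media error", Code: "D002", Brand: "BrandA", Accessory: "disk", Level: "minor", Field: "media_errors", Threshold: 10},
		{Name: "memory CE storm", Code: "M001", Accessory: "memory", Level: "major", Field: "ce_count", Threshold: 1000},
	}
	for _, r := range rulesFor(rules, Spec{Brand: "BrandB", Accessory: "disk"}) {
		fmt.Println(r.Code, r.Name)
	}
}
```

In a database-backed version the same filter would typically become a `WHERE` clause on the accessory and brand columns.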
S402: when the fault identification field of the hardware device successfully matches its current running state, the hardware fault is identified based on the matched field, and an alarm for the hardware fault is sent.
After the fault identification field of the hardware device is determined, it is matched against the current running state of the hardware device. As an example, the fault identification field is searched for in the current running state of the hardware device: if it is found, the match succeeds; if not, the match fails.
When the fault identification field successfully matches the current running state, the hardware fault can be identified based on the matched field. As one example, a fault identification field of the hard disk specifies that if the number of bad sectors is greater than 100, the hard disk is determined to be faulty.
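The matching step, including the example rule that more than 100 bad sectors marks a hard disk as faulty, can be sketched as follows. Here a successful match means the rule's field is present in the running state data, and the judgment expression is reduced to a single greater-than comparison; both simplifications, and the names `Rule`, `Evaluate`, and `bad_sectors`, are assumptions.

```go
package main

import "fmt"

// Rule is a simplified fault identification field: a state key, a threshold,
// and an implicit "value > threshold" judgment expression.
type Rule struct {
	Name      string
	Field     string
	Threshold float64
}

// Result reports whether a rule matched the state and whether it indicates a fault.
type Result struct {
	Matched bool
	Faulty  bool
	Value   float64
}

// Evaluate looks the rule's field up in the running state data; a missing field
// means the match failed, otherwise the threshold decides whether it is a fault.
func Evaluate(r Rule, state map[string]float64) Result {
	v, ok := state[r.Field]
	if !ok {
		return Result{}
	}
	return Result{Matched: true, Faulty: v > r.Threshold, Value: v}
}

func main() {
	rule := Rule{Name: "hard disk bad sectors", Field: "bad_sectors", Threshold: 100}
	state := map[string]float64{"bad_sectors": 132, "temperature_c": 41} // hypothetical state
	res := Evaluate(rule, state)
	if res.Matched && res.Faulty {
		fmt.Printf("ALARM: %s (value %.0f > %.0f)\n", rule.Name, res.Value, rule.Threshold)
	}
}
```

Separating "matched" from "faulty" keeps two cases apart that a single boolean would merge: a device that reports the field and is healthy, and a device that does not report it at all.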
After the hardware fault is identified based on the matched fault identification field, an alarm for the hardware fault is sent. As one example, the alarm is displayed on a web interface through a visual HTML page, and the current running state of the hardware device may be displayed alongside the alarm.
In the embodiment of FIG. 4, hardware faults are identified by matching fault identification fields against the current running state of the hardware device, thereby monitoring the hardware device.
In one embodiment of the invention, the hardware fault type of a hardware device is predicted from its running state data over a period of time.
Referring to FIG. 5, FIG. 5 is a schematic flow chart (flow 500) of predicting the hardware fault type of a hardware device according to an embodiment of the present invention. The flow specifically comprises the following steps:
S501: extract the current operating parameters from the current running state data of the hardware device according to the specification parameters of the hardware device.
In the embodiment of the invention, the technical solution in FIG. 4 identifies current faults of the hardware device, while the solution in FIG. 5 can predict future faults, so that early warning and intervention measures can be provided in advance.
First, the specification parameters and current running state data of the hardware device are obtained from the message queue and then stored.
As one example, specification parameters of the hardware device and current operating state data of the hardware device are stored in a preset format. The preset format includes: system SN number, system IP, management IP, CPU, memory, CPU core number, hard disk, array card, network card, GPU, make, model, tag, device status, part number, firmware version, and device slot number.
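The preset storage format can be pictured as one record with the seventeen fields listed above. The sketch below is an illustration only: the field names and types of `HardwareRecord` are hypothetical, and JSON is an assumed serialization.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class HardwareRecord:
    """One stored record in the preset format (field names are hypothetical)."""
    system_sn: str
    system_ip: str
    management_ip: str
    cpu: str
    memory: str
    cpu_cores: int
    hard_disk: str
    array_card: str
    network_card: str
    gpu: str
    make: str
    model: str
    tag: str
    device_status: str
    part_number: str
    firmware_version: str
    slot_number: str

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for storage or comparison.
        return json.dumps(asdict(self), sort_keys=True)


print(len(fields(HardwareRecord)))  # 17
```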
Specifically, the current operation parameters in the current operation state data of the hardware device may be extracted according to the specification parameters of the hardware device.
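One way to read "extracted according to the specification parameters" is that the device type in the specification selects which running-state keys count as operating parameters. The following sketch assumes that interpretation; the `PARAMS_BY_TYPE` table and all key names are hypothetical.

```python
# Hypothetical table: which operating-parameter keys are relevant for each
# device type named in the specification parameters.
PARAMS_BY_TYPE = {
    "hard_disk": ("remapped_sectors", "grown_bad_blocks", "wear_leveling_count",
                  "pending_sectors", "read_error_blocks"),
    "gpu": ("temperature_c", "ecc_errors"),
}


def extract_current_params(spec: dict, state: dict) -> dict:
    """Keep only the running-state entries relevant to this device type."""
    keys = PARAMS_BY_TYPE.get(spec.get("device_type"), ())
    return {key: state[key] for key in keys if key in state}


state = {"remapped_sectors": 3, "pending_sectors": 0, "fan_rpm": 4200}
print(extract_current_params({"device_type": "hard_disk"}, state))
# {'remapped_sectors': 3, 'pending_sectors': 0}
```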
S502, predicting the hardware fault type of the hardware equipment in a preset time period by adopting a machine learning model according to the historical operation parameters and the current operation parameters of the hardware equipment.
Historical operating parameters of the same hardware device may also be extracted. The historical operating parameters and the current operating parameters of the hardware device are then input into the machine learning model to predict the hardware fault type of the hardware device within a preset time period.
In an embodiment of the invention, the machine learning model is trained using hardware fault type training data and is used to predict the hardware fault type of the hardware device within a preset time period. As one example, the machine learning model is built on any of the following algorithms: random forest, decision tree, SVM, XGBoost, or LSTM.
In one embodiment of the invention, the current operating parameters of the hardware device include SMART data. The Self-Monitoring, Analysis and Reporting Technology (SMART) data of a hard disk characterizes the operating condition of the hard disk.
When the hardware device includes a hard disk, the historical SMART data and the current SMART data of the hard disk are input into the machine learning model, which predicts the hardware fault type of the hard disk within 30 days.
In one embodiment of the invention, the SMART data, i.e., the operating parameters, include one or more of the following fields: remapped sector count, grown bad block count, wear leveling count, current pending (to-be-remapped) sector count, and read error block count, where the wear leveling count is the average erase count. These operating parameters are used to predict the hard disk fault type of the hard disk within a preset time period; that is, the hardware fault type includes a hard disk fault type.
In the embodiment of fig. 5, hardware fault types can also be predicted from the historical and current operating parameters of the hardware device, so that countermeasures can be taken in advance.
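The text does not say how historical and current parameters are combined into model input. A simple assumption is a flat feature vector holding, for each parameter, its current value and its change since the oldest historical sample; the hypothetical `build_features` below does exactly that.

```python
def build_features(history: list, current: dict, keys: tuple) -> list:
    """Flatten historical and current operating parameters into one vector.

    For every key the vector holds the current value and its change relative
    to the oldest historical sample (0.0 when there is no history).
    """
    features = []
    for key in keys:
        now = float(current.get(key, 0))
        oldest = float(history[0].get(key, 0)) if history else now
        features.extend([now, now - oldest])
    return features


history = [{"remapped_sectors": 1}, {"remapped_sectors": 3}]
print(build_features(history, {"remapped_sectors": 6}, ("remapped_sectors",)))
# [6.0, 5.0]
```

Trend features like the delta often matter more for disks than absolute values, since a counter that is rising quickly signals degradation even when it is still small.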
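The algorithms named above would normally come from a library (for example scikit-learn for random forest or SVM, or the xgboost package). To keep the illustration dependency-free, the sketch below instead trains the simplest member of the decision-tree family, a depth-1 decision tree (decision stump) split by weighted Gini impurity. The labels and the two-feature layout are hypothetical, and this is not the model used in the embodiment.

```python
from collections import Counter


def _gini(labels: list) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X: list, y: list) -> tuple:
    """Fit a depth-1 decision tree by minimising weighted Gini impurity.

    Returns (feature_index, threshold, label_if_<=, label_if_>).
    """
    n = len(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must separate the samples
            score = (len(left) * _gini(left) + len(right) * _gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every feature is constant: always predict the majority
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump: tuple, x: list) -> str:
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label


X = [[0, 0], [1, 0], [50, 10], [80, 20]]  # e.g. [current value, change]
y = ["healthy", "healthy", "disk_failure", "disk_failure"]
stump = fit_stump(X, y)
print(stump)                           # (0, 1, 'healthy', 'disk_failure')
print(predict_stump(stump, [60, 15]))  # disk_failure
```

A random forest averages many deeper trees of this kind trained on resampled data, which is why it is usually preferred over a single tree in practice.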
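The five SMART fields above can be held in one snapshot per sampling period. The sketch below uses hypothetical field names and shows the stated definition of the wear leveling count as the mean erase count over all blocks; real drives report this attribute themselves, so the helper only illustrates the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartSnapshot:
    """The five operating parameters listed above (field names are hypothetical)."""
    remapped_sectors: int
    grown_bad_blocks: int
    wear_leveling_count: float  # average erase count per block
    pending_sectors: int        # sectors currently waiting to be remapped
    read_error_blocks: int


def average_erase_count(block_erase_counts: list) -> float:
    """Wear leveling count as defined above: the mean erase count over all blocks."""
    if not block_erase_counts:
        raise ValueError("no blocks reported")
    return sum(block_erase_counts) / len(block_erase_counts)


snap = SmartSnapshot(2, 1, average_erase_count([10, 20, 30]), 0, 0)
print(snap.wear_leveling_count)  # 20.0
```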
Referring to fig. 6, fig. 6 is an application diagram of a method of monitoring data according to an embodiment of the present invention.
The collector in fig. 6 acquires the specification parameters of the hardware device and the current running state data of the hardware device according to the data monitoring period, and pushes them to the gateway in a preset data format.
The gateway stores the specification parameters of the hardware equipment and the current running state data of the hardware equipment into a message queue through the configured instance.
The identification end screens out a fault identification field of the hardware equipment according to the specification parameters of the hardware equipment obtained from the message queue so as to identify the hardware fault by combining the current running state of the hardware equipment and send an alarm of the hardware fault.
The identification end can also predict hardware faults by adopting a machine learning model according to the historical operation parameters and the current operation parameters of the hardware equipment.
In the embodiment of the invention, the collector acquires, according to the data monitoring period, the specification parameters of the hardware device and the current running state data of the hardware device, and pushes them in a preset data format to the gateway after load balancing; the gateway after load balancing stores the specification parameters and the current running state data of the hardware device into a message queue through a configured instance; and the identification end screens out the fault identification field of the hardware device according to the specification parameters obtained from the message queue, identifies hardware faults in combination with the current running state of the hardware device, and sends alarms of the hardware faults. Because the collector can acquire the specification parameters and running state data of many kinds of hardware devices, and processing is built on this data, centralized monitoring of large-scale server fleets can be realized.
Referring to fig. 7, fig. 7 is a schematic diagram of the main structure of a data monitoring apparatus 700 according to an embodiment of the present invention, which may implement the data monitoring method. As shown in fig. 7, the data monitoring apparatus 700 specifically includes:
The collector 701 is configured to acquire, according to a data monitoring period, the specification parameters of the hardware device and the current running state data of the hardware device, and push them in a preset data format to a gateway after load balancing;
gateway 702, configured to store, in a message queue, a specification parameter of the hardware device and current running state data of the hardware device through a configured instance;
And the identifying end 703 is configured to screen out a fault identification field of the hardware device according to the specification parameter of the hardware device obtained from the message queue, so as to identify a hardware fault in combination with the current running state of the hardware device, and send an alarm of the hardware fault.
In one embodiment of the present invention, the collector 701 is specifically configured to obtain, after adapting to the hardware device, a specification parameter of the hardware device and current operation state data of the hardware device according to a data monitoring period;
And sending the specification parameters of the hardware equipment and the current running state data of the hardware equipment to a gateway after load balancing in a preset data format, wherein the gateway after load balancing corresponds to a plurality of collectors.
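Many collectors feeding several gateway instances behind a load balancer can be sketched with round-robin selection. This is an illustration under assumptions: in practice the load balancer is usually a separate component in front of the gateways, the JSON payload layout and gateway addresses are hypothetical, and a real collector would repeat the push once per monitoring period (for example with `time.sleep(period)` in a loop).

```python
import itertools
import json
import time
from typing import Callable


class RoundRobinBalancer:
    """Spread pushes from many collectors across several gateway instances."""

    def __init__(self, gateways: list):
        if not gateways:
            raise ValueError("at least one gateway is required")
        self._cycle = itertools.cycle(gateways)

    def pick(self) -> str:
        return next(self._cycle)


def build_payload(collector_id: str, spec: dict, state: dict) -> str:
    # One message per monitoring period, in an assumed JSON "preset data format".
    return json.dumps({"collector": collector_id, "ts": int(time.time()),
                       "spec": spec, "state": state}, sort_keys=True)


def push(balancer: RoundRobinBalancer, payload: str,
         send: Callable[[str, str], None]) -> str:
    gateway = balancer.pick()
    send(gateway, payload)  # e.g. an HTTP POST in a real deployment
    return gateway


sent = []
lb = RoundRobinBalancer(["gw-a:8080", "gw-b:8080"])
for _ in range(3):
    push(lb, build_payload("c1", {"device_type": "hard_disk"}, {"bad_tracks": 0}),
         lambda gw, body: sent.append(gw))
print(sent)  # ['gw-a:8080', 'gw-b:8080', 'gw-a:8080']
```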
In one embodiment of the present invention, the collector 701 is specifically configured to convert the format of the specification parameters and the format of the current running state data of the hardware device, and then obtain, by keyword matching, the specification parameters of the hardware device and the current running state data of the hardware device in a preset data format;
and sending the specification parameters of the hardware equipment in the preset data format and the current running state data of the hardware equipment in the preset data format to a gateway after load balancing.
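The "convert, then match keywords" step can be illustrated on plain-text tool output. The sketch assumes the raw output consists of `Label: value` or `Label = value` lines, and the `KEYWORDS` table mapping vendor labels to preset field names is hypothetical.

```python
import re

# Hypothetical keyword table: label printed by a vendor tool -> preset field name.
KEYWORDS = {
    "manufacturer": "make",
    "model": "model",
    "serial number": "system_sn",
    "firmware version": "firmware_version",
}

# "Label: value" or "Label = value", with surrounding whitespace ignored.
_LINE = re.compile(r"^\s*([^:=]+?)\s*[:=]\s*(.*?)\s*$")


def to_preset_format(raw_text: str) -> dict:
    """Convert raw tool output into key/value pairs, then keep keyword matches."""
    record = {}
    for line in raw_text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # not a key/value line
        field = KEYWORDS.get(match.group(1).lower())
        if field is not None:
            record[field] = match.group(2)
    return record


raw = "Manufacturer: Acme\nModel = X100\nSerial Number : SN42\nUptime: 3d"
print(to_preset_format(raw))
# {'make': 'Acme', 'model': 'X100', 'system_sn': 'SN42'}
```

Keeping the keyword table separate from the parser means a new vendor tool only needs new table entries.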
In one embodiment of the present invention, gateway 702 is specifically configured to run an instance started from an image, and to asynchronously store the specification parameters of the hardware device and the current running state data of the hardware device into a message queue.
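Asynchronous storage means the gateway acknowledges a push once the data is enqueued, without waiting for downstream processing. In the sketch below `asyncio.Queue` stands in for a real message queue (the patent does not name one; a broker such as Kafka or RabbitMQ would be typical), and `gateway_handle` is a hypothetical handler name.

```python
import asyncio


async def gateway_handle(payload: str, queue: asyncio.Queue) -> str:
    """Accept one push and hand it to the message queue.

    The gateway replies as soon as the payload is enqueued; consumers such as
    the identification end read the queue independently.
    """
    await queue.put(payload)
    return "accepted"


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    replies = await asyncio.gather(*(gateway_handle(f"msg-{i}", queue) for i in range(3)))
    stored = [queue.get_nowait() for _ in range(queue.qsize())]
    return list(replies), stored


replies, stored = asyncio.run(demo())
print(replies)         # ['accepted', 'accepted', 'accepted']
print(sorted(stored))  # ['msg-0', 'msg-1', 'msg-2']
```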
In one embodiment of the present invention, the identifying terminal 703 is specifically configured to screen out a failure identification field of the hardware device from a database according to the specification parameter of the hardware device obtained from the message queue;
and match the current running state of the hardware device against the fault identification field of the hardware device, identify the hardware fault based on the successfully matched fault identification field, and send an alarm of the hardware fault.
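Screening fault identification fields from a database by specification parameters can be sketched as a parameterized query. The table layout, the `'*'` wildcard for "any model" and all column names below are assumptions for illustration; an in-memory SQLite database keeps the example self-contained.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical rule table keyed by device type and model ('*' = any model)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fault_rules (device_type TEXT, model TEXT, "
                 "metric TEXT, op TEXT, threshold REAL, fault TEXT)")
    conn.executemany("INSERT INTO fault_rules VALUES (?, ?, ?, ?, ?, ?)", [
        ("hard_disk", "*", "bad_tracks", ">", 100, "hard disk faulty"),
        ("hard_disk", "X100", "pending_sectors", ">", 8, "hard disk degraded"),
        ("gpu", "*", "ecc_errors", ">", 0, "GPU memory error"),
    ])
    return conn


def screen_rules(conn: sqlite3.Connection, spec: dict) -> list:
    """Return (metric, op, threshold, fault) rows that apply to this device."""
    return conn.execute(
        "SELECT metric, op, threshold, fault FROM fault_rules "
        "WHERE device_type = ? AND model IN (?, '*') ORDER BY metric",
        (spec["device_type"], spec["model"]),
    ).fetchall()


conn = make_demo_db()
print([row[0] for row in screen_rules(conn, {"device_type": "hard_disk", "model": "X100"})])
# ['bad_tracks', 'pending_sectors']
```

Each returned row can then be matched against the current running state, as in the threshold check described for fig. 4.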
In one embodiment of the present invention, the identifying terminal 703 is further configured to extract a current operation parameter in the current operation state data of the hardware device according to the specification parameter of the hardware device;
And predicting the hardware fault type of the hardware equipment in a preset time period by adopting a machine learning model according to the historical operation parameters of the hardware equipment and the current operation parameters.
In one embodiment of the invention, the operating parameters include one or more of a remapped sector count, a bad block growth count, a wear leveling operation number, a current sector count to be mapped, and a read error block count;
the hardware failure type includes a hard disk failure type.
Fig. 8 illustrates an exemplary system architecture 800 of a method of monitoring data or an apparatus of monitoring data to which embodiments of the present invention may be applied.
As shown in fig. 8, a system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves as a medium for providing communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 805 through the network 804 using the terminal devices 801, 802, 803 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 805 may be a server providing various services, such as a background management server (by way of example only) that provides support for shopping-type websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and process received data such as a product information query request, and feed back the processing result (e.g., target push information or product information, by way of example only) to the terminal device.
It should be noted that, the method for monitoring data provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the device for monitoring data is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, there is illustrated a schematic diagram of a computer system 900 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 901.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor comprises a collector, a gateway and an identification terminal. The names of these modules do not limit the module itself in some cases, for example, the collector may also be described as "used for acquiring and pushing, according to a data monitoring period, the specification parameters of the hardware device and the current operation state data of the hardware device in a preset data format, to the gateway after load balancing".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being fitted into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
the collector acquires and pushes specification parameters of the hardware equipment and current running state data of the hardware equipment in a preset data format to a gateway after load balancing according to a data monitoring period;
The gateway after load balancing stores the specification parameters of the hardware equipment and the current running state data of the hardware equipment into a message queue through a configured instance;
The identification end screens out a fault identification field of the hardware equipment according to the specification parameters of the hardware equipment obtained from the message queue so as to identify hardware faults in combination with the current running state of the hardware equipment and send alarms of the hardware faults.
According to the technical solution of the embodiment of the invention, the collector acquires, according to the data monitoring period, the specification parameters of the hardware device and the current running state data of the hardware device, and pushes them in a preset data format to the gateway after load balancing; the gateway after load balancing stores the specification parameters and the current running state data of the hardware device into a message queue through a configured instance; and the identification end screens out the fault identification field of the hardware device according to the specification parameters obtained from the message queue, identifies hardware faults in combination with the current running state of the hardware device, and sends alarms of the hardware faults. Because the collector can acquire the specification parameters and running state data of many kinds of hardware devices, and processing is built on this data, centralized monitoring of large-scale server fleets can be realized.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and substitutions can be made depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included in the scope of the present invention. It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, etc. of users' personal information comply with the relevant laws and regulations and do not violate public order and good customs.

Claims (10)

1. A method of monitoring data, comprising:
the collector acquires and pushes specification parameters of the hardware equipment and current running state data of the hardware equipment in a preset data format to a gateway after load balancing according to a data monitoring period;
The gateway after load balancing stores the specification parameters of the hardware equipment and the current running state data of the hardware equipment into a message queue through a configured instance;
The identification end screens out a fault identification field of the hardware equipment according to the specification parameters of the hardware equipment obtained from the message queue so as to identify hardware faults in combination with the current running state of the hardware equipment and send alarms of the hardware faults.
2. The method for monitoring data according to claim 1, wherein the acquiring and pushing, by the collector, the specification parameters of the hardware device and the current operation state data of the hardware device in a preset data format according to a data monitoring period, to the gateway after load balancing includes:
After the collector is adapted to the hardware equipment, the specification parameters of the hardware equipment and the current running state data of the hardware equipment are obtained according to a data monitoring period;
And the collector sends the specification parameters of the hardware equipment and the current running state data of the hardware equipment to a gateway after load balancing in a preset data format, wherein the gateway after load balancing corresponds to a plurality of collectors.
3. The method of monitoring data according to claim 2, wherein the collector sends the specification parameters of the hardware device and the current operation state data of the hardware device to the gateway after load balancing in a preset data format, including:
the collector converts the format of the specification parameters of the hardware equipment and the format of the current running state data of the hardware equipment, and then matches keywords to obtain the specification parameters of the hardware equipment in a preset data format and the current running state data of the hardware equipment in the preset data format;
And the collector sends the specification parameters of the hardware equipment in a preset data format and the current running state data of the hardware equipment in the preset data format to a gateway after load balancing.
4. The method for monitoring data according to claim 1, wherein the gateway after load balancing stores the specification parameters of the hardware device and the current operation state data of the hardware device into a message queue through configured instances, including:
The gateway after load balancing runs an instance started from an image and asynchronously stores the specification parameters of the hardware equipment and the current running state data of the hardware equipment into a message queue.
5. The method according to claim 1, wherein the identifying means screens out a fault identification field of the hardware device according to the specification parameter of the hardware device obtained from the message queue, so as to identify a hardware fault in combination with a current operation state of the hardware device, and sends an alarm of the hardware fault, including:
the identification end screens out a fault identification field of the hardware equipment in a database according to the specification parameters of the hardware equipment obtained from the message queue;
and matching the current running state of the hardware equipment against the fault identification field of the hardware equipment, identifying the hardware fault based on the successfully matched fault identification field, and sending an alarm of the hardware fault.
6. The method of monitoring data according to claim 1, wherein the method further comprises:
Extracting current operation parameters from current operation state data of the hardware equipment according to the specification parameters of the hardware equipment;
And predicting the hardware fault type of the hardware equipment in a preset time period by adopting a machine learning model according to the historical operation parameters of the hardware equipment and the current operation parameters.
7. The method of monitoring data of claim 6, wherein the operating parameters include one or more of a remap sector count, a bad block growth count, a wear-leveling operation number, a current sector count to be mapped, and a read error block count;
the hardware failure type includes a hard disk failure type.
8. An apparatus for monitoring data, comprising:
the collector is used for acquiring and pushing the specification parameters of the hardware equipment and the current running state data of the hardware equipment in a preset data format according to the data monitoring period to the gateway after load balancing;
the gateway is used for storing the specification parameters of the hardware equipment and the current running state data of the hardware equipment into a message queue through configured instances;
and the identification end is used for screening out a fault identification field of the hardware equipment according to the specification parameters of the hardware equipment obtained from the message queue so as to identify the hardware fault in combination with the current running state of the hardware equipment and send an alarm of the hardware fault.
9. An electronic device for monitoring data, comprising:
one or more processors;
Storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202410525933.6A 2024-04-28 Method, apparatus, device and computer readable medium for monitoring data Pending CN118295892A (en)

Publications (1)

Publication Number Publication Date
CN118295892A (en) 2024-07-05

Similar Documents

Publication Publication Date Title
US10270644B1 (en) Framework for intelligent automated operations for network, service and customer experience management
US10048996B1 (en) Predicting infrastructure failures in a data center for hosted service mitigation actions
US8533337B2 (en) Continuous upgrading of computers in a load balanced environment
US20200034178A1 (en) Virtualization agnostic orchestration in a virtual computing system
US20140122931A1 (en) Performing diagnostic tests in a data center
CN109684038B (en) Docker service container log processing method and device and electronic equipment
CN111225064A (en) Ceph cluster deployment method, system, device and computer-readable storage medium
WO2021190659A1 (en) System data acquisition method and apparatus, and medium and electronic device
WO2016032442A1 (en) Computer device error instructions
CN111694707A (en) Small server cluster management system and method
CN110896362B (en) Fault detection method and device
CN115640110A (en) Distributed cloud computing system scheduling method and device
CN110221910B (en) Method and apparatus for performing MPI jobs
US20170262032A1 (en) Method and apparatus for controlling a network node
US9729464B1 (en) Method and apparatus for provisioning of resources to support applications and their varying demands
CN113220342A (en) Centralized configuration method and device, electronic equipment and storage medium
US20080216057A1 (en) Recording medium storing monitoring program, monitoring method, and monitoring system
US12020039B2 (en) Compute instance warmup operations
WO2019241199A1 (en) System and method for predictive maintenance of networked devices
CN118295892A (en) Method, apparatus, device and computer readable medium for monitoring data
CN113778780B (en) Application stability determining method and device, electronic equipment and storage medium
CN113656378A (en) Server management method, device and medium
CN110445628B (en) NGINX-based server and deployment and monitoring methods and devices thereof
CN117176613B (en) Data acquisition method and device
US20240111579A1 (en) Termination of sidecar containers

Legal Events

Date Code Title Description
PB01 Publication