CN112286755B - Out-of-band data acquisition method and device for cluster server and computer equipment - Google Patents

Out-of-band data acquisition method and device for cluster server and computer equipment

Info

Publication number
CN112286755B
Authority
CN
China
Prior art keywords
band data
cluster server
asset
cluster
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011014602.4A
Other languages
Chinese (zh)
Other versions
CN112286755A (en
Inventor
王家尧
张浩龙
原帅
吕灼恒
王雄斌
周军
李斌
沙超群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shuguang International Information Industry Co ltd
Zhongke Sugon Information Industry Chengdu Co ltd
Dawning Information Industry Co Ltd
Original Assignee
Zhongke Shuguang International Information Industry Co ltd
Zhongke Sugon Information Industry Chengdu Co ltd
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Shuguang International Information Industry Co ltd, Zhongke Sugon Information Industry Chengdu Co ltd, Dawning Information Industry Co Ltd
Priority to CN202011014602.4A
Publication of CN112286755A
Application granted
Publication of CN112286755B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a cluster server out-of-band data acquisition method, a cluster server out-of-band data acquisition device, and computer equipment. The out-of-band data acquisition method of the cluster server comprises the following steps: periodically scanning asset information of cluster servers, wherein the cluster servers are divided into a plurality of groups and each group of cluster servers corresponds to one thread; when a change in the asset information is detected, acquiring asset change information; updating the cache of the BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of the asset sensor; and collecting out-of-band data of the cluster server based on the cache of the asset sensor. The method, the device, and the computer equipment can collect out-of-band data at the second level (on the order of seconds) and locate fault points quickly, with high collection speed, improved efficiency, and reduced resource usage.

Description

Out-of-band data acquisition method and device for cluster server and computer equipment
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method, an apparatus, and a computer device for collecting out-of-band data of a cluster server.
Background
With the development of technology, service data volumes have grown rapidly, so cluster servers keep being added and cluster sizes keep growing. Monitoring large-scale cluster servers is therefore a problem to be solved. Currently, most enterprises implement server monitoring based on IPMI (Intelligent Platform Management Interface). IPMI works across different operating systems, firmware, and hardware platforms, and can intelligently monitor, control, and automatically report the operating status of a large number of servers, which reduces server system costs. Monitoring the BMC (Baseboard Management Controller) sensors of a server, such as system temperature, fans, power supplies, and voltages, is an important part of out-of-band server monitoring. If a server suffers from problems such as over-temperature or hardware damage, the fault must be located quickly and the related data acquired in time.
The existing framework for collecting out-of-band data mainly pulls the required information from one server at a time through the IPMITOOL management tool. However, this approach collects data slowly and occupies a large amount of resources on key nodes. Moreover, it does not support network probing: probing generally ends only when the command times out automatically. It therefore cannot meet the need to monitor large-scale cluster servers.
Disclosure of Invention
The object of the present invention is to solve at least to some extent one of the above-mentioned technical problems.
Therefore, the first purpose of the invention is to provide a cluster server out-of-band data acquisition method, which can realize second-level acquisition of out-of-band data, quickly locate fault points, and has the advantages of high acquisition speed, high efficiency and resource conservation.
The second purpose of the invention is to provide a cluster server out-of-band data acquisition device.
A third object of the invention is to propose a computer device.
A fourth object of the present invention is to propose a non-transitory computer readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for collecting out-of-band data of a cluster server, the method including:
periodically scanning asset information of cluster servers, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
when a change in the asset information is detected, acquiring asset change information;
updating the cache of a BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of an asset sensor;
and collecting out-of-band data of the cluster server based on the cache of the asset sensor.
Optionally, before the asset information of the cluster servers is periodically scanned, the method further comprises:
grouping all cluster servers according to preset rules, wherein the preset rules comprise at least one of: the same account and password, a maximum number of IPs, a maximum number of characters, and the same network segment.
Optionally, collecting the out-of-band data of the cluster server based on the cache of the asset sensor includes:
determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor;
and executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
Optionally, while executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index, the method further comprises:
counting the number of times the acquisition command of the out-of-band data of the cluster server has been executed;
and if the number of times the acquisition command has been executed is equal to zero, acquiring FRU information of the cluster server and adding the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
Optionally, the method further comprises:
if the number of times the acquisition command of the out-of-band data of the cluster server has been executed is not equal to zero, incrementing that number by one;
and when the number of times the acquisition command has been executed reaches a preset value, updating the FRU information of the cluster server and adding the updated FRU information to the corresponding out-of-band data.
Optionally, after collecting the out-of-band data of the cluster server based on the cache of the asset sensor, the method further comprises:
and filtering the out-of-band data based on a preset index.
Optionally, filtering the out-of-band data based on a preset index includes:
acquiring a first index name of a cluster server;
acquiring a corresponding asset sensor name according to the first index name;
and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
According to the cluster server out-of-band data acquisition method of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and the out-of-band data of the cluster server is collected based on the cache of the asset sensor. In this way, out-of-band data can be collected at the second level (on the order of seconds) and fault points can be located quickly, with high collection speed, improved efficiency, and reduced resource usage.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an out-of-band data collection device for a cluster server, including:
the scanning module is used for periodically scanning asset information of cluster servers, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
the acquisition module is used for acquiring asset change information when a change in the asset information is detected;
the updating module is used for updating the cache of the BMC management tool according to the asset change information and controlling the BMC management tool to update the cache of the asset sensor;
and the collection module is used for collecting out-of-band data of the cluster server based on the cache of the asset sensor.
Optionally, the apparatus further comprises:
the grouping module is used for grouping all the cluster servers according to preset rules before the asset information of the cluster servers is periodically scanned, wherein the preset rules comprise at least one of: the same account and password, a maximum number of IPs, a maximum number of characters, and the same network segment.
Optionally, the collection module is configured to:
determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor;
and executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
Optionally, the collection module is further configured to:
while executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index, counting the number of times the acquisition command has been executed;
and if the number of times the acquisition command has been executed is equal to zero, acquiring FRU information of the cluster server and adding the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
Optionally, the collection module is further configured to:
if the number of times the acquisition command of the out-of-band data of the cluster server has been executed is not equal to zero, incrementing that number by one;
and when the number of times the acquisition command has been executed reaches a preset value, updating the FRU information of the cluster server and adding the updated FRU information to the corresponding out-of-band data.
Optionally, the apparatus further comprises:
and the filtering module is used for filtering out-of-band data of the cluster server based on a preset index after the out-of-band data are acquired based on the cache of the asset sensor.
Optionally, the filtering module is configured to:
acquiring a first index name of a cluster server;
acquiring a corresponding asset sensor name according to the first index name;
and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
According to the cluster server out-of-band data acquisition device of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and the out-of-band data of the cluster server is collected based on the cache of the asset sensor. In this way, out-of-band data can be collected at the second level (on the order of seconds) and fault points can be located quickly, with high collection speed, improved efficiency, and reduced resource usage.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement a cluster server out-of-band data collection method according to the embodiment of the first aspect.
To achieve the above object, an embodiment of a fourth aspect of the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program is executed by a processor to implement a cluster server out-of-band data collection method according to the embodiment of the first aspect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a cluster server out-of-band data collection method in accordance with one embodiment of the invention;
FIG. 2 is a flow chart of collecting out-of-band data for a cluster server in accordance with one embodiment of the invention;
FIG. 3 is a flow chart of collecting out-of-band data of the cluster server in accordance with another embodiment of the invention;
FIG. 4 is a flow chart of collecting out-of-band data of the cluster server in accordance with yet another embodiment of the invention;
FIG. 5 is a flow chart of a cluster server out-of-band data collection method in accordance with another embodiment of the invention;
FIG. 6 is a flow diagram of cluster server grouping in accordance with one embodiment of the invention;
FIG. 7 is a flow chart of a cluster server out-of-band data collection method in accordance with yet another embodiment of the invention;
FIG. 8 is a schematic diagram of a cluster server out-of-band data collection device according to one embodiment of the invention;
FIG. 9 is a schematic diagram of an out-of-band data collection device of a cluster server according to another embodiment of the invention;
FIG. 10 is a schematic structural diagram of a cluster server out-of-band data collection device according to yet another embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
The invention is described in further detail below in connection with specific examples which are not to be construed as limiting the scope of the invention as claimed.
The method, the device and the computer equipment for collecting out-of-band data of the cluster server in the embodiment of the invention are described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a cluster server out-of-band data collection method according to one embodiment of the invention. As shown in FIG. 1, the method comprises the following steps:
S1, asset information of the cluster servers is scanned periodically.
The cluster servers can be divided into a plurality of groups, and each group of cluster servers corresponds to one thread. That is, each thread performs the periodic scan for one group of cluster servers. The asset information may include configuration information of the cluster server devices, such as the physical address, the IP address, the model and frequency of the CPU, and the size of the hard disk.
In one embodiment, it may be arranged to scan the asset information of the cluster server every 30 minutes to monitor whether the asset information changes.
S2, when the change of the asset information is monitored, acquiring asset change information.
The asset change may be adding an asset, deleting an asset, modifying an asset, and so on.
In the process of scanning the asset information of the cluster server, if the asset information is found to change, corresponding asset change information, namely which assets are added, which assets are deleted, which assets are modified, and the like, can be acquired.
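The comparison described above can be sketched in Python as a diff between two scan snapshots. This is only an illustration under stated assumptions: the snapshot layout (a dict keyed by BMC IP address) and the names `AssetChanges`, `diff_assets`, and `has_changes` are hypothetical and do not come from the patent.

```python
from typing import Any, Dict, List, NamedTuple


class AssetChanges(NamedTuple):
    """Hypothetical container for the asset change information of step S2."""
    added: List[str]
    deleted: List[str]
    modified: List[str]


def diff_assets(previous: Dict[str, Dict[str, Any]],
                current: Dict[str, Dict[str, Any]]) -> AssetChanges:
    """Compare two scan snapshots keyed by asset identifier (assumed: BMC IP)."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    # An asset present in both scans counts as modified if any attribute differs.
    modified = sorted(key for key in previous.keys() & current.keys()
                      if previous[key] != current[key])
    return AssetChanges(added, deleted, modified)


def has_changes(changes: AssetChanges) -> bool:
    """True if at least one asset was added, deleted, or modified."""
    return any(changes)
```

When `has_changes` returns false, the later cache-update steps can presumably be skipped until the next scan.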
And S3, updating the cache of the BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of the asset sensor.
After the asset change information is acquired, the changed asset information may be written to the cache of the baseboard management controller (BMC) management tool based on the asset change information; this cache is mainly used for storing the asset information. Then, the BMC management tool is controlled to update the cache of the asset sensor; this cache mainly stores the monitoring items of the asset sensor and their corresponding values. For example, freeipmi is a BMC management tool. After an asset change is detected, the IPMI detection list of freeipmi can be modified dynamically so that it reflects the changed asset information. The change is then written to the freeipmi cache so that freeipmi knows that the asset has changed. Thereafter, freeipmi may probe the changed asset and write the asset change information to the cache of the asset sensor, which determines which data the asset sensor is required to collect.
In one embodiment of the present application, the freeipmi network probe service may also be invoked while freeipmi is being updated. The freeipmi network probe service may add the changed assets to the freeipmi network probe. Network probing mainly means that freeipmi automatically probes the IPMI network of all cluster servers. If the network is unreachable, the cache of the asset sensor cannot be updated and the data collection operation cannot be performed. If the network is reachable, the cache of the asset sensor may be updated.
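The gating of the sensor-cache update on the probe result can be illustrated with a small Python sketch. It does not call freeipmi: the probe is passed in as a callable, and `refresh_sensor_cache`, `probe`, and the cache layout (host mapped to a list of monitoring items) are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, List


def refresh_sensor_cache(changed_hosts: Iterable[str],
                         probe: Callable[[str], bool],
                         sensor_cache: Dict[str, List[str]],
                         monitoring_items: List[str]) -> List[str]:
    """Update the sensor cache only for hosts whose IPMI network is reachable.

    `probe` is a hypothetical stand-in for the network probe service; it
    returns True when the host's IPMI network answers.
    Returns the hosts that were skipped because they were unreachable.
    """
    skipped = []
    for host in changed_hosts:
        if probe(host):
            # Reachable: record which monitoring items to collect for this host.
            sensor_cache[host] = list(monitoring_items)
        else:
            # Unreachable: leave the cache untouched, so no collection is attempted.
            skipped.append(host)
    return skipped
```

Keeping unreachable hosts out of the cache means later collection never waits on a command timeout for them, which appears to be the motivation for the probe in the text.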
S4, the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
After the cache of the asset sensor is updated in the previous step, it is determined which data the asset sensor specifically collects, so that out-of-band data of the corresponding cluster server may be collected based on the foregoing settings.
The specific acquisition process may be as shown in fig. 2.
S41, determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor.
The out-of-band data indicators may include, among other things, system temperature, rotational speed of the fan, power of the power supply, voltage value, etc.
S42, executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
In another embodiment of the present invention, other operations may be performed concurrently with the collection of out-of-band data. As shown in fig. 3, the specific steps include:
S43, executing the collection command of the out-of-band data of the cluster server according to the out-of-band data index, and simultaneously counting the number of times the collection command has been executed.
S44, if the number of times of executing the collection command of the out-of-band data of the cluster server is equal to zero, FRU information of the cluster server is obtained, and the FRU information is added to the corresponding out-of-band data.
The FRU (Field Replaceable Unit) information may include, but is not limited to, a serial number and vendor information. For example, if the count is zero, the current collection is the first collection operation; the FRU information of the cluster server, such as its serial number and vendor information, may then be obtained and merged into the out-of-band data of the corresponding cluster server. The FRU information may be obtained by executing the FRU command.
In yet another embodiment of the present invention, there is a case where the count result is not zero, and the specific steps may be as shown in fig. 4:
S45, if the number of times the out-of-band data collection command of the cluster server has been executed is not equal to zero, that number is incremented by one.
And S46, when the number of times of executing the acquisition command of the out-of-band data of the cluster server reaches a preset value, updating FRU information of the cluster server, and adding the updated FRU information into the corresponding out-of-band data.
Since the FRU information of the cluster server is not changed in general, it is not necessary to acquire the FRU information once every time data is acquired. The number of collection times may be counted, and when the number reaches a preset number, for example, 100 times, the count result may be set to 0. Then the next time the data is collected, the operation of obtaining the FRU information may be performed simultaneously, thereby updating the FRU information. That is, the original FRU information is emptied, and the FRU information is acquired again.
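The counting scheme of steps S43 to S46 can be sketched as follows. This is a minimal sketch under assumptions: the class name `FruRefresher`, the injected `fetch_fru` callable, and the choice to also advance the counter on the pass that fetches FRU information (otherwise it would stay at zero) are illustrative and not stated in the patent.

```python
from typing import Any, Callable, Dict


class FruRefresher:
    """Attach FRU info to collected data, re-fetching it every N collections."""

    def __init__(self, fetch_fru: Callable[[], Dict[str, Any]],
                 refresh_every: int = 100) -> None:
        self._fetch_fru = fetch_fru          # hypothetical FRU query function
        self._refresh_every = refresh_every  # the "preset value" from the text
        self._count = 0
        self._fru: Dict[str, Any] = {}

    def collect(self, sensor_data: Dict[str, Any]) -> Dict[str, Any]:
        if self._count == 0:
            # First collection, or counter was reset: discard and re-fetch FRU info.
            self._fru = self._fetch_fru()
        self._count += 1
        if self._count >= self._refresh_every:
            # Preset value reached: reset so the next collection refreshes FRU info.
            self._count = 0
        return {**sensor_data, "fru": self._fru}
```

With `refresh_every=100`, the FRU query runs on the 1st, 101st, 201st collection and so on, while every other collection reuses the stored FRU information.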
According to the cluster server out-of-band data acquisition method of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and the out-of-band data of the cluster server is collected based on the cache of the asset sensor. In this way, out-of-band data can be collected at the second level (on the order of seconds) and fault points can be located quickly, with high collection speed, improved efficiency, and reduced resource usage.
In another embodiment of the present invention, as shown in FIG. 5, the method further comprises:
S5, before the asset information of the cluster servers is periodically scanned, grouping all the cluster servers according to preset rules.
The preset rules can include the same account number and password, the maximum IP number, the maximum character number, the same network segment and the like.
For example, if the maximum IP number of each group set in grouping is 100, after the group is filled with 100 cluster servers, another group is newly built. For another example, if the network segments to which the IP addresses of the cluster servers belong are the same, the cluster servers located in the same network segment are preferentially grouped.
For example, a specific grouping process may be as shown in fig. 6.
S51, reading the configuration file to set the maximum value of the number of the IPs in each group.
S52, reading the configuration file to set the maximum value of the number of all ipmi network address characters in each group.
S53, clearing the grouping list.
Each time grouping is performed, the previous grouping list is cleared.
S54, traversing all cluster server ipmi configuration information, and grouping cluster servers based on preset conditions.
The preset conditions can comprise account passwords, maximum IP numbers, maximum character numbers and the same network segment.
The specific steps are as follows:
1. Judging whether the account passwords used by the cluster servers are the same; cluster servers with the same account password are placed in the same group, and otherwise they are placed in different groups.
2. Judging whether the number of IPs of the cluster servers in the current group exceeds the maximum IP number; if so, a new group is created.
3. Judging whether the number of characters of the cluster servers in the current group exceeds the maximum number of characters; if so, a new group is created.
4. Judging whether the cluster servers belong to the same network segment; if so, the cluster servers of the same network segment are merged into the same group.
S55, generating a grouping list.
And acquiring the account passwords of the cluster servers in each group list, and adding the account passwords to the corresponding groups so as to generate a final group list.
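The grouping flow of S51 to S55 can be sketched in Python as a single pass over the servers. Several details are assumptions made for illustration: the names `Bmc`, `Group`, and `group_servers`, the /24 prefix used to decide the network segment, and the interpretation of the character limit as the length of the comma-joined address list of a group.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Bmc:
    ip: str
    user: str
    password: str


@dataclass
class Group:
    user: str
    password: str
    ips: List[str] = field(default_factory=list)


def subnet_of(ip: str, prefix: int = 24) -> ipaddress.IPv4Network:
    # Assumption: a "network segment" is the /24 network containing the address.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def group_servers(servers: List[Bmc], max_ips: int, max_chars: int) -> List[Group]:
    groups: List[Group] = []  # S53: start from an empty grouping list
    # Sorting by credentials, then subnet, keeps same-segment hosts adjacent,
    # so they are grouped together preferentially.
    ordered = sorted(servers, key=lambda s: (
        s.user, s.password,
        int(subnet_of(s.ip).network_address),
        int(ipaddress.ip_address(s.ip))))
    current = None
    for s in ordered:
        fits = (current is not None
                and (current.user, current.password) == (s.user, s.password)
                and len(current.ips) < max_ips
                and len(",".join(current.ips + [s.ip])) <= max_chars)
        if not fits:
            # Different credentials or a limit would be exceeded: open a new group.
            current = Group(s.user, s.password)
            groups.append(current)
        current.ips.append(s.ip)
    return groups  # S55: each group carries its account and password
```

A limit on the total characters plausibly exists because the address list of a group is passed to a single command invocation, but the patent does not say so explicitly.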
By grouping the cluster servers, each group corresponds to one thread, and the acquisition efficiency can be effectively improved.
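One way to realize "one thread per group" in Python is a thread pool sized to the number of groups. The function name `collect_all` and the injected `collect_group` callable are hypothetical; the patent does not specify a threading API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_all(groups: List[List[str]],
                collect_group: Callable[[List[str]], Dict[str, str]]
                ) -> List[Dict[str, str]]:
    """Run `collect_group` once per group, each group on its own worker thread."""
    if not groups:
        return []
    # One worker per group, so no group waits behind another.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        # map() returns results in the same order as `groups`.
        return list(pool.map(collect_group, groups))
```

Because each group is bounded in size by the grouping rules, the slowest thread is bounded as well, which is presumably how the grouping improves overall collection time.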
In yet another embodiment of the present invention, as shown in FIG. 7, the cluster server out-of-band data collection method further includes:
S6, filtering the out-of-band data based on a preset index after the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
Specifically, filtering the out-of-band data may include the following steps:
First, a first index name of a cluster server is obtained. Then, the corresponding asset sensor name is obtained according to the first index name. Finally, the index value corresponding to the first index name in the out-of-band data is acquired according to the asset sensor name. In this embodiment, the index value is read from the redis database.
The first index name refers to an important index focused in the cluster server.
By filtering the out-of-band data, only the index value of the required key index is read, so that the system resource can be saved. For example, an important index in the cluster server is the temperature of the CPU or the like. Through the preset, only the index value of the temperature of the CPU, such as 60 ℃, can be read without reading the corresponding values of all indexes, thereby achieving the purpose of saving system resources.
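The three-step lookup (index name, then sensor name, then value) can be sketched as below. The embodiment reads the values from a redis database; this sketch substitutes a plain dict for the stored readings so that it stays self-contained, and the names `filter_out_of_band`, `index_to_sensor`, and the sample sensor names are hypothetical.

```python
from typing import Dict, Iterable


def filter_out_of_band(index_names: Iterable[str],
                       index_to_sensor: Dict[str, str],
                       readings: Dict[str, float]) -> Dict[str, float]:
    """Return only the values of the requested key indexes.

    index_to_sensor maps an index name to its asset sensor name;
    readings maps sensor names to collected values (a stand-in for the
    redis lookup described in the embodiment).
    """
    result = {}
    for name in index_names:
        sensor = index_to_sensor.get(name)
        # Skip indexes with no known sensor or no collected reading.
        if sensor is not None and sensor in readings:
            result[name] = readings[sensor]
    return result


# Usage with made-up sensor names: only the CPU temperature is read.
readings = {"CPU1_Temp": 60.0, "FAN1": 5400.0, "PSU1_Power": 180.0}
mapping = {"cpu_temperature": "CPU1_Temp"}
print(filter_out_of_band(["cpu_temperature"], mapping, readings))
```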
To implement the above embodiments, the invention further provides a cluster server out-of-band data acquisition device.
FIG. 8 is a schematic structural diagram of a cluster server out-of-band data collection device according to an embodiment of the present invention.
As shown in FIG. 8, the apparatus includes a scanning module 81, an acquisition module 82, an updating module 83, and a collection module 84.
The scanning module 81 is configured to periodically scan asset information of the cluster servers. The cluster servers are divided into multiple groups, and each group of cluster servers corresponds to one thread.
The acquisition module 82 is configured to acquire asset change information when a change in the asset information is detected.
The updating module 83 is configured to update the cache of the baseboard management controller BMC management tool according to the asset change information, and control the baseboard management controller BMC management tool to update the cache of the asset sensor.
The collection module 84 is configured to collect out-of-band data of the cluster server based on the cache of the asset sensor.
The collection module 84 is specifically configured to: determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor; and executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
The acquisition module 84 is further configured to: executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index, and simultaneously counting the acquisition command of the out-of-band data of the cluster server; and if the number of times of executing the acquisition command of the out-of-band data of the cluster server is equal to zero, FRU information of the cluster server is acquired, and the FRU information is added into the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
The collection module 84 is further configured to: if the number of executions of the collection command is not equal to zero, increment the number of executions by one; and, when the number of executions reaches a preset value, update the FRU information of the cluster server and add the updated FRU information to the corresponding out-of-band data.
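The counter-driven FRU handling can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the class name `FruRefresher` and the injected `fetch_fru` callable are hypothetical, and the text does not say what happens to the counter after the first fetch or after a refresh, so the sketch assumes it is set to 1 in both cases.

```python
from typing import Callable, Dict


class FruRefresher:
    """Attach FRU information to out-of-band data, refreshing it only periodically."""

    def __init__(self, fetch_fru: Callable[[], Dict[str, str]], refresh_every: int):
        if refresh_every < 1:
            raise ValueError("refresh_every must be at least 1")
        self._fetch_fru = fetch_fru          # hypothetical callable that queries the BMC
        self._refresh_every = refresh_every  # the "preset value" from the description
        self._count = 0                      # executions of the collection command
        self._fru: Dict[str, str] = {}

    def attach(self, oob_data: dict) -> dict:
        """Call once per executed collection command."""
        if self._count == 0:
            # First execution: FRU information has never been read.
            self._fru = self._fetch_fru()
            self._count = 1  # assumption: the first execution counts as one
        else:
            self._count += 1
            if self._count >= self._refresh_every:
                self._fru = self._fetch_fru()
                self._count = 1  # assumption: counting restarts after a refresh
        return {**oob_data, "fru": self._fru}
```

Because serial numbers and manufacturer details rarely change, reading them once every `refresh_every` collections rather than on every collection keeps the per-cycle command count low, which is consistent with the speed goal stated in the description.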
In another embodiment of the present invention, as shown in fig. 9, the cluster server out-of-band data collection device further includes a grouping module 85.
The grouping module 85 is configured to group all cluster servers according to a preset rule before the asset information of the cluster servers is scanned at regular intervals. The preset rule includes at least one of: the same account and password, a maximum number of IP addresses, a maximum number of characters, and the same network segment.
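One possible reading of the grouping rule, combined with the one-thread-per-group arrangement, is sketched below in Python. The sketch applies three of the four listed criteria (same account and password, same network segment, maximum number of IP addresses) and omits the character limit. The record layout (`ip`, `user`, `password` keys), the /24 segment width, and the function names are assumptions made for illustration.

```python
import ipaddress
import threading
from collections import defaultdict


def group_servers(servers, max_ips=2, prefix_len=24):
    """Group servers by credentials and network segment, capping group size."""
    buckets = defaultdict(list)
    for server in servers:
        # strict=False lets a host address stand in for its enclosing network.
        segment = ipaddress.ip_network(f"{server['ip']}/{prefix_len}", strict=False)
        buckets[(server["user"], server["password"], segment)].append(server)

    groups = []
    for members in buckets.values():
        # Split oversized buckets so that no group exceeds max_ips addresses.
        for start in range(0, len(members), max_ips):
            groups.append(members[start:start + max_ips])
    return groups


def scan_groups(groups, scan_one_group):
    """Run scan_one_group(group) with one thread per group and wait for all."""
    threads = [threading.Thread(target=scan_one_group, args=(group,)) for group in groups]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Grouping by shared credentials and segment means each thread can address its whole group with a single set of connection parameters, while the size cap bounds how long any one thread's scan can take.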
In yet another embodiment of the present invention, as shown in fig. 10, the cluster server out-of-band data collection device further includes a filtering module 86.
The filtering module 86 is configured to filter the out-of-band data based on a preset index after the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
The filtering module 86 is specifically configured to: acquire a first index name of the cluster server; acquire the corresponding asset sensor name according to the first index name; and acquire, according to the asset sensor name, the index value corresponding to the first index name in the out-of-band data.
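The three filtering steps (index name, then sensor name, then value) amount to a two-stage lookup, which the following Python sketch illustrates. It assumes the collected out-of-band data is a dictionary keyed by sensor name and that a separate mapping from index names to sensor names is available; the function name and both data layouts are hypothetical.

```python
def filter_oob_data(oob_data, index_to_sensor, wanted_indices):
    """Return only the requested index values from collected out-of-band data.

    oob_data:        sensor name -> reading (assumed layout)
    index_to_sensor: index name -> sensor name (assumed mapping)
    wanted_indices:  index names selected by the preset filter
    """
    result = {}
    for index_name in wanted_indices:
        sensor_name = index_to_sensor.get(index_name)
        if sensor_name is None or sensor_name not in oob_data:
            continue  # no sensor is mapped to this index, or it was not collected
        result[index_name] = oob_data[sensor_name]
    return result
```

Keeping the index-to-sensor mapping separate from the readings lets the same index name resolve to differently named sensors on different server models without changing the filter itself.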
It should be understood that the cluster server out-of-band data collection device of this embodiment corresponds to the cluster server out-of-band data collection method described in the embodiment of the first aspect, so the details are not repeated here.
With the cluster server out-of-band data collection device of this embodiment, asset information of the cluster servers is scanned at regular intervals, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and out-of-band data of the cluster server is collected based on the cache of the asset sensor. Out-of-band data can thus be collected within seconds and fault points located quickly, which improves collection speed and efficiency and saves resources.
In order to implement the above embodiment, the present invention also proposes a computer device.
The computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the out-of-band data acquisition method of the cluster server according to the embodiment of the first aspect when executing the computer program.
To achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium.
The non-transitory computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a cluster server out-of-band data collection method as in the embodiment of the first aspect.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
It should be noted that, in this specification, descriptions referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.

Claims (10)

1. A method for collecting out-of-band data of a cluster server, characterized by comprising the following steps:
scanning asset information of cluster servers at regular intervals, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
acquiring asset change information when a change in the asset information is detected;
updating the cache of a BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of an asset sensor;
collecting out-of-band data of the cluster server based on the cache of the asset sensor;
wherein collecting out-of-band data of the cluster server based on the cache of the asset sensor comprises:
determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor;
executing an out-of-band data acquisition command of the cluster server according to the out-of-band data index;
wherein, while the out-of-band data acquisition command of the cluster server is executed according to the out-of-band data index, the method further comprises:
counting executions of the out-of-band data acquisition command of the cluster server;
if the number of executions of the out-of-band data acquisition command of the cluster server is equal to zero, acquiring FRU information of the cluster server and adding the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information;
if the number of executions of the out-of-band data acquisition command of the cluster server is not equal to zero, incrementing the number of executions by one;
when the number of executions of the out-of-band data acquisition command of the cluster server reaches a preset value, updating the FRU information of the cluster server and adding the updated FRU information to the corresponding out-of-band data.
2. The method of claim 1, further comprising, prior to scanning the asset information of the cluster servers at regular intervals:
grouping all cluster servers according to a preset rule, wherein the preset rule comprises at least one of: the same account and password, a maximum number of IP addresses, a maximum number of characters, and the same network segment.
3. The method of claim 1, further comprising, after collecting the out-of-band data of the cluster server based on the asset sensor's cache:
filtering the out-of-band data based on a preset index.
4. The method of claim 3, wherein filtering the out-of-band data based on a preset index comprises:
acquiring a first index name of a cluster server;
acquiring a corresponding asset sensor name according to the first index name;
and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
5. An out-of-band data acquisition device of a cluster server, comprising:
a scanning module, configured to scan asset information of cluster servers at regular intervals, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
an acquisition module, configured to acquire asset change information when a change in the asset information is detected;
an updating module, configured to update the cache of a BMC management tool according to the asset change information and to control the BMC management tool to update the cache of an asset sensor;
a collection module, configured to collect out-of-band data of the cluster server based on the cache of the asset sensor;
wherein the collection module is specifically configured to:
determine an out-of-band data index to be acquired corresponding to the cache of the asset sensor;
execute an out-of-band data acquisition command of the cluster server according to the out-of-band data index;
and, while the out-of-band data acquisition command of the cluster server is executed according to the out-of-band data index, further to:
count executions of the out-of-band data acquisition command of the cluster server;
if the number of executions of the out-of-band data acquisition command of the cluster server is equal to zero, acquire FRU information of the cluster server and add the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information;
if the number of executions of the out-of-band data acquisition command of the cluster server is not equal to zero, increment the number of executions by one;
when the number of executions of the out-of-band data acquisition command of the cluster server reaches a preset value, update the FRU information of the cluster server and add the updated FRU information to the corresponding out-of-band data.
6. The apparatus as recited in claim 5, wherein the apparatus further comprises:
a grouping module, configured to group all the cluster servers according to a preset rule before the asset information of the cluster servers is scanned at regular intervals, wherein the preset rule comprises at least one of: the same account and password, a maximum number of IP addresses, a maximum number of characters, and the same network segment.
7. The apparatus as recited in claim 5, wherein the apparatus further comprises:
a filtering module, configured to filter the out-of-band data based on a preset index after the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
8. The apparatus of claim 7, wherein the filtering module is configured to:
acquire a first index name of a cluster server;
acquire a corresponding asset sensor name according to the first index name;
and acquire, according to the asset sensor name, an index value corresponding to the first index name in the out-of-band data.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the cluster server out-of-band data collection method according to any of claims 1-4 when executing the computer program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the cluster server out-of-band data collection method according to any of claims 1-4.
CN202011014602.4A 2020-09-24 2020-09-24 Out-of-band data acquisition method and device for cluster server and computer equipment Active CN112286755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011014602.4A CN112286755B (en) 2020-09-24 2020-09-24 Out-of-band data acquisition method and device for cluster server and computer equipment

Publications (2)

Publication Number Publication Date
CN112286755A CN112286755A (en) 2021-01-29
CN112286755B true CN112286755B (en) 2023-05-05

Family

ID=74421283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011014602.4A Active CN112286755B (en) 2020-09-24 2020-09-24 Out-of-band data acquisition method and device for cluster server and computer equipment

Country Status (1)

Country Link
CN (1) CN112286755B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590414B (en) * 2021-06-30 2024-08-16 济南浪潮数据技术有限公司 Method, device, equipment and medium for collecting and caching server cluster information
CN113839810B (en) * 2021-08-27 2023-04-07 济南浪潮数据技术有限公司 HPC-based server out-of-band data transmission method, device and system
CN116743561A (en) * 2022-03-02 2023-09-12 中兴通讯股份有限公司 Asset information acquisition method, electronic device, and computer-readable storage medium
CN114978660B (en) * 2022-05-17 2024-04-19 阿里巴巴(中国)有限公司 Out-of-band network construction method and out-of-band processing method based on out-of-band network
CN118503053B (en) * 2024-07-17 2024-09-27 苏州元脑智能科技有限公司 Hardware information transmission method, product, equipment and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103200199A (en) * 2013-04-15 2013-07-10 北京搜狐新媒体信息技术有限公司 Out of band (OOB) data collection system
CN104063300A (en) * 2014-01-18 2014-09-24 浪潮电子信息产业股份有限公司 Acquisition device based on FPGA (Field Programmable Gate Array) for monitoring information of high-end multi-channel server
CN106227636A (en) * 2016-07-20 2016-12-14 国网安徽省电力公司信息通信分公司 A kind of data center based on IPMI outband management system
CN107431643A (en) * 2015-02-03 2017-12-01 Netapp股份有限公司 Monitor storage cluster element
CN107465714A (en) * 2017-01-23 2017-12-12 北京思特奇信息技术股份有限公司 A kind of configuration data dynamic update system and method based on application cluster
WO2019153553A1 (en) * 2018-02-12 2019-08-15 平安科技(深圳)有限公司 Cross wide area network data return method and apparatus, computer device, and storage medium
CN110543409A (en) * 2019-08-29 2019-12-06 南方电网数字电网研究院有限公司 Hardware data acquisition method and device, computer equipment and storage medium
CN111694707A (en) * 2020-05-23 2020-09-22 苏州浪潮智能科技有限公司 Small server cluster management system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8769075B2 (en) * 2012-01-18 2014-07-01 International Business Machines Corporation Use of a systems management tool to manage an integrated solution appliance
US20130212210A1 (en) * 2012-02-10 2013-08-15 General Electric Company Rule engine manager in memory data transfers
US20170155741A1 (en) * 2015-12-01 2017-06-01 Le Holdings (Beijing) Co., Ltd. Server, method, and system for providing service data

Also Published As

Publication number Publication date
CN112286755A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112286755B (en) Out-of-band data acquisition method and device for cluster server and computer equipment
JP5284469B2 (en) Automatic discovery of physical connectivity between power outlets and IT equipment
US9589229B2 (en) Dynamic model-based analysis of data centers
CN107729210B (en) Distributed service cluster abnormity diagnosis method and device
CN112882796B (en) Abnormal root cause analysis method and device and storage medium
US8655623B2 (en) Diagnostic system and method
CN114328102B (en) Equipment state monitoring method, equipment state monitoring device, equipment and computer readable storage medium
CN102187327B (en) Trend is determined and is identified
US9210057B2 (en) Cross-cutting event correlation
US8275876B2 (en) Method and apparatus for collecting attribute-information, and computer product
CN112380089A (en) Data center monitoring and early warning method and system
CN113708986A (en) Server monitoring apparatus, method and computer-readable storage medium
US10574552B2 (en) Operation of data network
US8601318B2 (en) Method, apparatus and computer program product for rule-based directed problem resolution for servers with scalable proactive monitoring
CN111611048A (en) Migration method and device of virtual machine in cloud computing environment and computer equipment
CN106899436A (en) A kind of cloud platform failure predication diagnostic system
WO2024139333A1 (en) Method and device for predicting operating state of storage cluster
WO2020000669A1 (en) Data code analysis method and apparatus
CN115687026A (en) Multi-node server fault early warning method, device, equipment and medium
CN112732517B (en) Disk fault alarm method, device, equipment and readable storage medium
Zimmer et al. Towards self-optimization in HPC I/O
CN117439899B (en) Communication machine room inspection method and system based on big data
CN111506422A (en) Event analysis method and system
CN104883273A (en) Method and system for processing service influence model in virtualized service management platform
CN113703685B (en) Data storage method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211013

Address after: 300384 floor 1-3, No.15 Haitai Huake street, Huayuan Industrial Zone, Binhai New Area, Tianjin

Applicant after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Applicant after: Zhongke Shuguang International Information Industry Co.,Ltd.

Applicant after: ZHONGKE SUGON INFORMATION INDUSTRY CHENGDU Co.,Ltd.

Address before: 300384 floor 1-3, No.15 Haitai Huake street, Huayuan Industrial Zone, Binhai New Area, Tianjin

Applicant before: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Applicant before: Zhongke Shuguang International Information Industry Co.,Ltd.

GR01 Patent grant