Disclosure of Invention
The object of the present invention is to solve at least to some extent one of the above-mentioned technical problems.
Therefore, the first object of the invention is to provide a cluster server out-of-band data acquisition method, which can realize second-level acquisition of out-of-band data (that is, acquisition on the order of seconds), quickly locate fault points, and has the advantages of high acquisition speed, high efficiency and resource conservation.
The second object of the invention is to provide a cluster server out-of-band data acquisition device.
A third object of the invention is to propose a computer device.
A fourth object of the present invention is to propose a non-transitory computer readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for collecting out-of-band data of a cluster server, the method including:
periodically scanning asset information of cluster servers, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
when a change in the asset information is detected, acquiring asset change information;
updating the cache of a BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of an asset sensor;
and collecting out-of-band data of the cluster server based on the cache of the asset sensor.
Optionally, before the asset information of the cluster server is scanned periodically, the method further comprises:
grouping all cluster servers according to preset rules, wherein the preset rules comprise at least one of the same account number and password, the maximum IP number, the maximum character number and the same network segment.
Optionally, collecting the out-of-band data of the cluster server based on the cache of the asset sensor includes:
determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor;
and executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
Optionally, while executing the collection command for the out-of-band data of the cluster server according to the out-of-band data index, the method further includes:
counting the number of times the collection command for the out-of-band data of the cluster server has been executed;
and if the number of times the collection command has been executed is equal to zero, acquiring FRU information of the cluster server and adding the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
Optionally, the method further comprises:
if the number of times the collection command for the out-of-band data of the cluster server has been executed is not equal to zero, incrementing that number by one;
and when the number of times the collection command has been executed reaches a preset value, updating the FRU information of the cluster server and adding the updated FRU information to the corresponding out-of-band data.
Optionally, after collecting the out-of-band data of the cluster server based on the cache of the asset sensor, the method further comprises:
and filtering the out-of-band data based on a preset index.
Optionally, filtering the out-of-band data based on a preset index includes:
acquiring a first index name of a cluster server;
acquiring a corresponding asset sensor name according to the first index name;
and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
According to the cluster server out-of-band data acquisition method of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and out-of-band data of the cluster servers is collected based on the cache of the asset sensor. In this way, second-level acquisition of out-of-band data can be realized and fault points can be located quickly, with high acquisition speed, improved efficiency and reduced resource consumption.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an out-of-band data collection device for a cluster server, including:
a scanning module, configured to periodically scan asset information of cluster servers, wherein the cluster servers are divided into a plurality of groups, and each group of cluster servers corresponds to one thread;
an acquisition module, configured to acquire asset change information when a change in the asset information is detected;
an updating module, configured to update the cache of the BMC management tool according to the asset change information and to control the BMC management tool to update the cache of the asset sensor;
and a collection module, configured to collect out-of-band data of the cluster servers based on the cache of the asset sensor.
Optionally, the apparatus further comprises:
the grouping module is used for grouping all the cluster servers according to preset rules before the asset information of the cluster servers is scanned at regular time, wherein the preset rules comprise at least one of the same account number and password, the maximum IP number, the maximum character number and the same network segment.
Optionally, the collection module is configured to:
determine an out-of-band data index to be collected corresponding to the cache of the asset sensor;
and execute the collection command for the out-of-band data of the cluster server according to the out-of-band data index.
Optionally, the collection module is further configured to:
count the number of times the collection command for the out-of-band data of the cluster server has been executed while executing the collection command according to the out-of-band data index;
and if the number of times the collection command has been executed is equal to zero, acquire FRU information of the cluster server and add the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
Optionally, the collection module is further configured to:
if the number of times the collection command for the out-of-band data of the cluster server has been executed is not equal to zero, increment that number by one;
and when the number of times the collection command has been executed reaches a preset value, update the FRU information of the cluster server and add the updated FRU information to the corresponding out-of-band data.
Optionally, the apparatus further comprises:
and the filtering module is used for filtering out-of-band data of the cluster server based on a preset index after the out-of-band data are acquired based on the cache of the asset sensor.
Optionally, the filtering module is configured to:
acquiring a first index name of a cluster server;
acquiring a corresponding asset sensor name according to the first index name;
and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
According to the cluster server out-of-band data acquisition device of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and out-of-band data of the cluster servers is collected based on the cache of the asset sensor. In this way, second-level acquisition of out-of-band data can be realized and fault points can be located quickly, with high acquisition speed, improved efficiency and reduced resource consumption.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement a cluster server out-of-band data collection method according to the embodiment of the first aspect.
To achieve the above object, an embodiment of a fourth aspect of the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program is executed by a processor to implement a cluster server out-of-band data collection method according to the embodiment of the first aspect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
The invention is described in further detail below in connection with specific examples which are not to be construed as limiting the scope of the invention as claimed.
The method, the device and the computer equipment for collecting out-of-band data of the cluster server in the embodiment of the invention are described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a cluster server out-of-band data collection method according to one embodiment of the invention. As shown in FIG. 1, the method comprises the following steps:
S1, periodically scanning asset information of the cluster servers.
The cluster servers can be divided into a plurality of groups, and each group of cluster servers corresponds to one thread. That is, each thread performs the timed scan operation for one group of cluster servers. The asset information may include device configuration information of the cluster servers, such as physical address, IP address, CPU model and frequency, hard disk size, etc.
In one embodiment, it may be arranged to scan the asset information of the cluster server every 30 minutes to monitor whether the asset information changes.
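As an illustration of the one-thread-per-group timed scan, the following is a minimal Python sketch. The function name start_group_scanners, the scan_group callback and the interval_s parameter are hypothetical names introduced here; the 1800-second default only mirrors the 30-minute example above, and the actual scan logic is left to the caller.

```python
import threading


def start_group_scanners(groups, scan_group, interval_s=1800.0, stop_event=None):
    """Start one daemon thread per server group (hypothetical helper).

    Each thread calls scan_group(group) and then waits interval_s seconds,
    until stop_event is set.
    """
    if stop_event is None:
        stop_event = threading.Event()
    threads = []
    for group in groups:
        def worker(g=group):  # bind the current group to this thread
            while not stop_event.is_set():
                scan_group(g)
                # wait() returns early as soon as stop_event is set
                stop_event.wait(interval_s)

        t = threading.Thread(target=worker, daemon=True)
        t.start()
        threads.append(t)
    return stop_event, threads
```

Setting the returned event stops all scanning threads after their current scan finishes.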
S2, when a change in the asset information is detected, acquiring asset change information.
The asset change may be adding an asset, deleting an asset, modifying an asset, etc.
In the process of scanning the asset information of the cluster servers, if the asset information is found to have changed, the corresponding asset change information can be acquired, namely which assets were added, which assets were deleted, which assets were modified, and the like.
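The change detection described here can be sketched as a comparison of two scan snapshots. This is only an illustration under assumptions: the function name diff_assets and the snapshot layout (a dict keyed by server IP address, with a dict of asset fields as value) are hypothetical and not specified by the embodiment.

```python
def diff_assets(previous, current):
    """Compare two asset snapshots keyed by server IP (hypothetical layout).

    Returns the added, deleted and modified entries.
    """
    added = {ip: current[ip] for ip in current.keys() - previous.keys()}
    deleted = {ip: previous[ip] for ip in previous.keys() - current.keys()}
    modified = {
        ip: current[ip]
        for ip in current.keys() & previous.keys()
        if current[ip] != previous[ip]
    }
    return {"added": added, "deleted": deleted, "modified": modified}
```

An empty result in all three categories means the scan found no asset change, so the later cache update steps can be skipped.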
S3, updating the cache of the BMC management tool according to the asset change information, and controlling the BMC management tool to update the cache of the asset sensor.
After the asset change information is acquired, the changed asset information may be written to the cache of the baseboard management controller (BMC) management tool based on the asset change information; this cache is mainly used for storing the asset information. Then, the BMC management tool is controlled to update the cache of the asset sensor, which mainly stores the monitoring items of the asset sensor and their corresponding values. For example, freeipmi is a BMC management tool. After an asset change is detected, the ipmi detected list of freeipmi can be dynamically modified so that it reflects the changed asset information. The change is then written to the freeipmi cache so that freeipmi knows that the asset has changed. Thereafter, freeipmi may probe the changed asset and write the asset change information to the cache of the asset sensor, which determines which data the asset sensor is required to collect.
In one embodiment of the present invention, the freeipmi network probe service may also be invoked while updating freeipmi. The freeipmi network probe service may add the changed assets to the freeipmi network probe. Network probing mainly means that freeipmi automatically probes the ipmi network of all cluster servers. If the network is unreachable, the cache of the asset sensor cannot be updated and the data collection operation cannot be performed. If the network is reachable, the cache of the asset sensor may be updated.
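The rule that the asset sensor cache is only updated for reachable servers can be sketched as follows. This is a simplified illustration, not the freeipmi implementation: refresh_sensor_cache, the is_reachable and read_sensor_items callbacks, and the shape of the changes dict (lists of IP addresses under "added", "deleted" and "modified") are all assumptions made for the example. Dropping deleted servers from the cache is likewise an assumption.

```python
def refresh_sensor_cache(changes, sensor_cache, is_reachable, read_sensor_items):
    """Update sensor_cache in place for changed servers (hypothetical helper).

    changes: dict with lists of IPs under "added", "deleted", "modified".
    Returns the IPs that were skipped because the network probe failed.
    """
    # Assumption: a deleted asset no longer needs a sensor cache entry.
    for ip in changes["deleted"]:
        sensor_cache.pop(ip, None)

    skipped = []
    for ip in list(changes["added"]) + list(changes["modified"]):
        if not is_reachable(ip):
            # Network unreachable: the cache cannot be updated for this server.
            skipped.append(ip)
            continue
        sensor_cache[ip] = read_sensor_items(ip)
    return skipped
```

Servers returned in the skipped list keep their previous cache state and can be retried on the next scan.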
S4, the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
After the cache of the asset sensor is updated in the previous step, it is determined which data the asset sensor specifically collects, so that out-of-band data of the corresponding cluster server may be collected based on the foregoing settings.
The specific collection process may be as shown in FIG. 2.
S41, determining an out-of-band data index to be collected corresponding to the cache of the asset sensor.
The out-of-band data indexes, that is, the monitored metrics, may include system temperature, fan rotational speed, power supply power, voltage value, etc.
S42, executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
In another embodiment of the present invention, other operations may be performed concurrently with the collection of out-of-band data. As shown in FIG. 3, the specific steps include:
S43, executing the collection command for the out-of-band data of the cluster server according to the out-of-band data index, and at the same time counting the number of times the collection command has been executed.
S44, if the number of times the collection command for the out-of-band data of the cluster server has been executed is equal to zero, acquiring FRU information of the cluster server and adding the FRU information to the corresponding out-of-band data.
The FRU (Field Replaceable Unit) information may include, but is not limited to, serial number and vendor information. For example, if the count is zero, the current collection is the first collection operation; FRU information such as the serial number and vendor information of the cluster server may then be obtained and merged into the out-of-band data of the corresponding cluster server. The FRU information may be obtained by executing the FRU command.
In yet another embodiment of the present invention, the count is not zero, and the specific steps may be as shown in FIG. 4:
S45, if the number of times the collection command for the out-of-band data of the cluster server has been executed is not equal to zero, incrementing that number by one.
S46, when the number of times the collection command for the out-of-band data of the cluster server has been executed reaches a preset value, updating the FRU information of the cluster server and adding the updated FRU information to the corresponding out-of-band data.
Since the FRU information of a cluster server generally does not change, it is not necessary to acquire the FRU information every time data is collected. Instead, the number of collections may be counted, and when the count reaches a preset value, for example 100, the count may be reset to 0. The next time data is collected, the operation of obtaining the FRU information is performed at the same time, thereby updating the FRU information. That is, the original FRU information is cleared and the FRU information is acquired again.
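The counter-based FRU refresh can be sketched as below. This is one possible reading of steps S43 to S46, under the assumption that the counter is incremented after every collection (including the first) and reset to zero once it reaches the preset value. The class name FruCachingCollector and the read_sensors and read_fru callbacks are hypothetical; how the sensor and FRU commands are actually executed is left to the caller.

```python
class FruCachingCollector:
    """Collect sensor data on every call, FRU data only every N calls."""

    def __init__(self, read_sensors, read_fru, refresh_every=100):
        self.read_sensors = read_sensors    # hypothetical: ip -> dict of sensor values
        self.read_fru = read_fru            # hypothetical: ip -> dict of FRU fields
        self.refresh_every = refresh_every  # preset value, e.g. 100
        self.count = 0
        self.fru = {}

    def collect(self, ip):
        if self.count == 0:
            # First collection, or counter was reset: (re)acquire FRU information.
            self.fru = self.read_fru(ip)
        data = dict(self.read_sensors(ip))
        data.update(self.fru)               # add FRU information to the out-of-band data
        self.count += 1
        if self.count >= self.refresh_every:
            self.count = 0                  # next collection refreshes the FRU information
        return data
```

In this sketch one collector instance is kept per cluster server, so that each server has its own counter and cached FRU information.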
According to the cluster server out-of-band data acquisition method of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and out-of-band data of the cluster servers is collected based on the cache of the asset sensor. In this way, second-level acquisition of out-of-band data can be realized and fault points can be located quickly, with high acquisition speed, improved efficiency and reduced resource consumption.
In another embodiment of the present invention, as shown in FIG. 5, the method further comprises:
S5, before the asset information of the cluster servers is periodically scanned, grouping all the cluster servers according to preset rules.
The preset rules can include the same account number and password, the maximum IP number, the maximum character number, the same network segment and the like.
For example, if the maximum IP number of each group set in grouping is 100, after the group is filled with 100 cluster servers, another group is newly built. For another example, if the network segments to which the IP addresses of the cluster servers belong are the same, the cluster servers located in the same network segment are preferentially grouped.
For example, a specific grouping process may be as shown in FIG. 6.
S51, reading the configuration file to set the maximum value of the number of the IPs in each group.
S52, reading the configuration file to set the maximum value of the number of all ipmi network address characters in each group.
S53, clearing the grouping list.
Each time grouping is performed, the previous grouping list is cleared.
S54, traversing all cluster server ipmi configuration information, and grouping cluster servers based on preset conditions.
The preset conditions can comprise account passwords, maximum IP numbers, maximum character numbers and the same network segment.
The concrete steps are as follows:
1, judging whether the account passwords used by the cluster servers are the same; if they are the same, the cluster servers are placed in the same group; otherwise, they are placed in different groups.
2, judging whether the number of IPs of the cluster servers in the current group exceeds the maximum IP number; if so, a new group is created for the remaining cluster servers.
3, judging whether the number of characters of the cluster servers in the current group exceeds the maximum number of characters; if so, a new group is created for the remaining cluster servers.
4, judging whether the cluster servers belong to the same network segment; if so, the cluster servers of the same network segment are merged into the same group.
S55, generating a grouping list.
And acquiring the account passwords of the cluster servers in each group list, and adding the account passwords to the corresponding groups so as to generate a final group list.
By grouping the cluster servers, each group corresponds to one thread, and the acquisition efficiency can be effectively improved.
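The grouping rules can be sketched as follows. This is a simplified illustration under assumptions, not the grouping procedure of FIG. 6 itself: servers are first bucketed by identical account and password, sorted by IPv4 address so that servers of the same network segment end up adjacent, and then split whenever the maximum IP count or the maximum character count would be exceeded. The function name group_servers, the tuple layout of the input, the default limits, and counting only the address characters (without separators) are all choices made for the example.

```python
import ipaddress
from collections import defaultdict


def group_servers(servers, max_ips=100, max_chars=2000):
    """Group (ip, user, password) tuples; IPv4 addresses assumed."""
    by_credentials = defaultdict(list)
    for ip, user, password in servers:
        by_credentials[(user, password)].append(ip)

    groups = []
    for (user, password), ips in by_credentials.items():
        # Sorting numerically keeps servers of one network segment together.
        ips.sort(key=ipaddress.ip_address)
        current, chars = [], 0
        for ip in ips:
            full = len(current) >= max_ips or chars + len(ip) > max_chars
            if current and full:
                groups.append({"user": user, "password": password, "ips": current})
                current, chars = [], 0
            current.append(ip)
            chars += len(ip)
        if current:
            groups.append({"user": user, "password": password, "ips": current})
    return groups
```

Each returned group carries its shared credentials, so it can be handed directly to the thread responsible for that group.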
In yet another embodiment of the present invention, as shown in FIG. 7, the cluster server out-of-band data collection method further includes:
S6, after the out-of-band data of the cluster server is collected based on the cache of the asset sensor, filtering the out-of-band data based on a preset index.
Specifically, filtering the out-of-band data may include the following steps:
First, a first index name of a cluster server is obtained. Then, the corresponding asset sensor name is obtained according to the first index name. Finally, the index value corresponding to the first index name is obtained from the out-of-band data according to the asset sensor name. In this embodiment, the index value is read from the Redis database.
The first index name refers to an index of particular interest for the cluster server.
By filtering the out-of-band data, only the index values of the required key indexes are read, so that system resources can be saved. For example, an important index of a cluster server is the CPU temperature. With this preset, only the index value of the CPU temperature, such as 60 ℃, is read, without reading the values of all indexes, thereby saving system resources.
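The three-step lookup (index name, then asset sensor name, then index value) can be sketched as below. In the embodiment the values are read from Redis; here a plain dict stands in for that store so that the example is self-contained. The function name filter_out_of_band, the index_to_sensor mapping and the sample sensor names are hypothetical.

```python
def filter_out_of_band(out_of_band, index_to_sensor, wanted_indexes):
    """Return only the values of the wanted indexes.

    out_of_band: dict of sensor name -> value (stand-in for the Redis store).
    index_to_sensor: dict of index name -> asset sensor name.
    """
    result = {}
    for index_name in wanted_indexes:
        sensor_name = index_to_sensor.get(index_name)   # step 2: sensor name
        if sensor_name is not None and sensor_name in out_of_band:
            result[index_name] = out_of_band[sensor_name]  # step 3: index value
    return result


collected = {"CPU1 Temp": 60, "FAN1": 5400, "PSU1 Power": 180}
mapping = {"cpu_temperature": "CPU1 Temp", "fan_speed": "FAN1"}
print(filter_out_of_band(collected, mapping, ["cpu_temperature"]))
```

Indexes without a known sensor name, or whose sensor was not collected, are simply omitted from the result.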
In order to implement the above embodiments, the invention further provides a cluster server out-of-band data acquisition device.
FIG. 8 is a schematic structural diagram of a cluster server out-of-band data collection device according to an embodiment of the present invention.
As shown in FIG. 8, the apparatus includes a scanning module 81, an acquisition module 82, an updating module 83, and a collection module 84.
The scanning module 81 is configured to periodically scan asset information of the cluster servers. The cluster servers are divided into multiple groups, and each group of cluster servers corresponds to one thread.
The acquisition module 82 is configured to acquire asset change information when a change in the asset information is detected.
The updating module 83 is configured to update the cache of the baseboard management controller BMC management tool according to the asset change information, and control the baseboard management controller BMC management tool to update the cache of the asset sensor.
The collection module 84 is configured to collect out-of-band data of the cluster server based on the cache of the asset sensor.
The collection module 84 is specifically configured to: determining an out-of-band data index to be acquired corresponding to the cache of the asset sensor; and executing the acquisition command of the out-of-band data of the cluster server according to the out-of-band data index.
The collection module 84 is further configured to: count the number of times the collection command for the out-of-band data of the cluster server has been executed while executing the collection command according to the out-of-band data index; and if the number of times the collection command has been executed is equal to zero, acquire FRU information of the cluster server and add the FRU information to the corresponding out-of-band data, wherein the FRU information comprises at least one of a serial number and manufacturer information.
The collection module 84 is further configured to: if the number of times the collection command for the out-of-band data of the cluster server has been executed is not equal to zero, increment that number by one; and when the number of times the collection command has been executed reaches a preset value, update the FRU information of the cluster server and add the updated FRU information to the corresponding out-of-band data.
In another embodiment of the present invention, as shown in FIG. 9, the cluster server out-of-band data collection device further includes a grouping module 85.
The grouping module 85 is configured to group all cluster servers according to a preset rule before the asset information of the cluster servers is scanned at regular time. The preset rule comprises at least one of the same account number and password, the maximum IP number, the maximum character number and the same network segment.
In yet another embodiment of the present invention, as shown in FIG. 10, the cluster server out-of-band data collection device further includes a filtering module 86.
The filtering module 86 is configured to filter out-of-band data based on a preset index after the out-of-band data of the cluster server is collected based on the cache of the asset sensor.
The filtering module 86 is specifically configured to: acquiring a first index name of a cluster server; acquiring a corresponding asset sensor name according to the first index name; and acquiring an index value corresponding to the first index name in the out-of-band data according to the asset sensor name.
It should be understood that the description of the cluster server out-of-band data collection device in this embodiment corresponds to the description of the cluster server out-of-band data collection method in the embodiment of the first aspect, and is not repeated here.
According to the cluster server out-of-band data acquisition device of the embodiment of the invention, asset information of the cluster servers is scanned periodically, and asset change information is acquired when a change in the asset information is detected. The cache of the BMC management tool is then updated according to the asset change information, the BMC management tool is controlled to update the cache of the asset sensor, and out-of-band data of the cluster servers is collected based on the cache of the asset sensor. In this way, second-level acquisition of out-of-band data can be realized and fault points can be located quickly, with high acquisition speed, improved efficiency and reduced resource consumption.
In order to implement the above embodiment, the present invention also proposes a computer device.
The computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the out-of-band data acquisition method of the cluster server according to the embodiment of the first aspect when executing the computer program.
To achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium.
The non-transitory computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a cluster server out-of-band data collection method as in the embodiment of the first aspect.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
It should be noted that, in the description of the present specification, descriptions referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art without contradiction.