CN111953544B - Fault detection method, device, equipment and storage medium of server - Google Patents
- Publication number
- CN111953544B (application CN202010821134.5A)
- Authority
- CN
- China
- Prior art keywords
- fault
- keywords
- data information
- server
- matched
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application discloses a fault detection method of a server, which comprises the following steps: collecting fault data information generated when a server is in fault; performing word segmentation processing on each fault data information to obtain corresponding keywords; determining the matching times of each keyword and preset keywords in a preset word segmentation class library; and calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value. The method can improve the convenience of fault detection on the servers of different manufacturers or the servers of different operating systems, thereby improving the efficiency of fault detection on the servers. The application also discloses a fault detection device, equipment and a computer readable storage medium of the server, which have the beneficial effects.
Description
Technical Field
The present invention relates to the field of server detection, and in particular, to a method, an apparatus, a device, and a computer readable storage medium for detecting a failure of a server.
Background
With the rapid development of cloud computing technology, the demand for servers is increasing day by day. Because large numbers of servers run continuously for long periods, the failure rate inevitably rises, and how to quickly discover and handle server failures has become a technical problem to be solved.
Currently, each server manufacturer provides a monitoring platform to detect failures of its own servers. Because different manufacturers produce different types of servers, and the servers themselves run different operating systems, each server has to be checked with its corresponding monitoring platform. That is to say, in the prior art the monitoring platforms differ in function and are mostly based on proprietary protocols, so unified supervision and output cannot be performed. When fault detection needs to be performed on multiple types of servers and on servers running multiple types of operating systems, the operation process is complex and the detection efficiency is low.
Therefore, how to improve the convenience and efficiency of detecting the server failure is a technical problem that needs to be solved by those skilled in the art at present.
Disclosure of Invention
In view of this, an object of the present invention is to provide a method for detecting a server fault that improves the convenience and efficiency of server fault detection; another object of the present invention is to provide a fault detection apparatus, a device and a computer readable storage medium for a server, all of which have the above advantages.
In order to solve the above technical problem, the present invention provides a method for detecting a failure of a server, including:
collecting fault data information generated when a server is in fault;
performing word segmentation processing on each fault data information to obtain a corresponding keyword;
determining the matching times of each keyword and preset keywords in a preset word segmentation class library;
and calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value.
Preferably, the fault data information includes in-band fault data information and out-of-band fault data information; correspondingly, the process of collecting fault data information generated when the server fails specifically includes:
receiving the in-band fault information sent by an operating system and/or a preset monitoring platform of the server;
acquiring the in-band fault information by running a preset acquisition script in the server;
receiving the out-of-band fault data information forwarded by the BMC of the server;
correspondingly, the process of calculating the fault value of the server according to the matched keyword and the corresponding matching times and determining the fault condition of the server according to the fault value specifically includes:
and calculating a fault value of the server according to the matched keywords and the corresponding matching times which respectively correspond to the in-band fault data information and the out-of-band fault data information, and determining the fault condition of the server according to the fault value.
Preferably, after the collecting fault data information generated when the server fails, the method further includes:
judging the data format type of the fault data information;
if the fault data information is in a text format, storing the fault data information into a database, and performing word segmentation processing on each fault data information to obtain a corresponding keyword;
if the fault data information is in a graphic format, identifying characters in the fault data information to obtain the fault data information in a text format, storing the fault data information in the text format into the database, and performing word segmentation processing on each fault data information to obtain corresponding keywords.
Preferably, further comprising:
presetting a fault alarm upper limit value and a fault alarm lower limit value;
correspondingly, the process of calculating the fault value of the server according to the matched keyword and the corresponding matching times and determining the fault condition of the server according to the fault value specifically includes:
calculating the fault value of the server according to the matched keywords and the corresponding matching times;
if the fault value is larger than the lower fault alarm limit value and smaller than the upper fault alarm limit value, determining that the server can automatically repair the fault, and starting a preset fault repair program;
if the fault value is larger than the fault alarm upper limit value, determining that the server cannot automatically repair the fault, and sending corresponding alarm information.
Preferably, the process of performing word segmentation processing on each fault data information to obtain a corresponding keyword specifically includes:
judging the language type of the fault data information;
if the fault data information is Chinese, performing word segmentation processing on the fault data information by using a Chinese word segmentation tool to obtain a corresponding keyword;
and if the fault data information is English, performing word segmentation processing on the fault data information by using an English word segmentation tool to obtain a corresponding keyword.
Preferably, the process of determining the matching times of each keyword and a preset keyword in a preset segmentation class library specifically includes:
determining removal words among the keywords by using a removal word class library in the preset word segmentation class library, and deleting the removal words;
and matching the remaining keywords against the keyword class library in the word segmentation class library to determine the matching times of the remaining keywords and the preset keywords.
Preferably, after the calculating a failure value of the server according to the matched keyword and the corresponding matching number of times and determining a failure condition of the server according to the failure value, the method further includes:
and sending the fault condition to the target terminal equipment in a mail and/or short message mode.
In order to solve the above technical problem, the present invention further provides a fault detection apparatus for a server, including:
the acquisition module is used for acquiring fault data information generated when the server fails;
the word segmentation module is used for carrying out word segmentation processing on each fault data message to obtain a corresponding keyword;
the matching module is used for determining the matching times of each keyword and preset keywords in a preset segmentation class library;
and the determining module is used for calculating a fault value of the server according to the matched keywords and the corresponding matching times and determining the fault condition of the server according to the fault value.
In order to solve the above technical problem, the present invention further provides a fault detection device for a server, including:
a memory for storing a computer program;
a processor for implementing the steps of the fault detection method of any one of the above servers when executing the computer program.
In order to solve the above technical problem, the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the fault detection method for any one of the servers.
The invention provides a fault detection method of a server, which comprises the following steps: collecting fault data information generated when a server fails; performing word segmentation processing on each fault data information to obtain a corresponding keyword; determining the matching times of each keyword and preset keywords in a preset word segmentation class library; and calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value. The method thus uses NLP to perform word segmentation and matching on the collected fault data information and calculates the fault value from the matching results in order to determine the fault condition of the server. Word segmentation, matching, and fault condition determination can all be executed by one unified computer program, so servers from different manufacturers or running different operating systems do not each need their own fault detection program or monitoring platform. The method therefore makes fault detection on such servers more convenient and, in turn, more efficient.
In order to solve the technical problem, the invention also provides a fault detection device, equipment and a computer readable storage medium of the server, which have the beneficial effects.
Drawings
In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting a failure of a server according to an embodiment of the present invention;
Fig. 2 is a flowchart of another method for detecting a failure of a server according to an embodiment of the present invention;
Fig. 3 is a diagram illustrating a specific storage directory structure according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a fault detection apparatus of a server according to an embodiment of the present invention;
Fig. 5 is a structural diagram of a fault detection device of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The core of the embodiment of the invention is to provide a fault detection method of a server, which can improve the convenience and efficiency of fault detection of the server; another core of the present invention is to provide a fault detection apparatus, device and computer readable storage medium for a server, all having the above-mentioned advantages.
In order that those skilled in the art will better understand the disclosure, reference will now be made in detail to the embodiments of the disclosure as illustrated in the accompanying drawings.
Fig. 1 is a flowchart of a method for detecting a failure of a server according to an embodiment of the present invention. As shown in fig. 1, a method for detecting a failure of a server includes:
S10: collecting fault data information generated when a server fails;
In this embodiment, fault data information generated when a server fails is collected first. Specifically, the fault data information is obtained from fault files such as logs, scripts, and fault screenshots. It should be noted that, in actual operation, the specific manner of collecting the fault data information of the server is not limited. For example, a preset acquisition script may be run in the server to collect the fault data information and return it automatically, or the fault data information sent by the operating system of the server or by a preset monitoring platform corresponding to the server may be received directly.
S20: performing word segmentation processing on each fault data information to obtain a corresponding keyword;
S30: determining the matching times of each keyword and preset keywords in a preset word segmentation class library.
In this embodiment, after the fault data information is collected, it is analyzed using a natural language processing (NLP) method. Specifically, before NLP parsing is performed, a word segmentation class library for NLP needs to be constructed. The word segmentation class library mainly defines technical terms related to server faults so as to facilitate word segmentation. Each entry in the word segmentation class library needs to include a word name, a part of speech, and a fault index. The word name distinguishes the different names of faults; the part of speech (noun, adjective, and so on) helps the word segmentation tool distinguish parts of speech; and the fault index represents the influence coefficient of a keyword obtained by word segmentation on the fault and is used for calculating the fault value.
It should be noted that, in actual operation, the word segmentation class library includes a general thesaurus and an exclusive thesaurus: the general thesaurus is used for analyzing all server faults in a unified manner, and the exclusive thesaurus is used for segmenting special faults. After the word segmentation class library is constructed, the fault data information that was collected in advance and stored in a database is read one item at a time, in time order and server by server, and a word segmentation tool is called to segment it, yielding the word segmentation result, namely the keywords. Each keyword is then matched against the preset keywords in the word segmentation class library, and the matched keywords and their corresponding matching times are recorded.
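As an illustration only, the library entries described above (word name, part of speech, fault index) and the split into a general and an exclusive thesaurus could be modeled as follows. The entry values, the names `LibraryEntry`, `GENERAL_THESAURUS`, `EXCLUSIVE_THESAURUS` and `lookup`, and the choice to consult the exclusive thesaurus first are all assumptions; the patent does not specify a storage layout or a lookup order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str            # word name: the fault term itself
    part_of_speech: str  # e.g. "noun", "adjective"
    fault_index: float   # influence coefficient used when computing the fault value


# Hypothetical entries; the terms and indices are illustrative, not from the patent.
GENERAL_THESAURUS = {
    "timeout": LibraryEntry("timeout", "noun", 2.0),
    "overheat": LibraryEntry("overheat", "verb", 5.0),
}
EXCLUSIVE_THESAURUS = {
    "ecc": LibraryEntry("ecc", "noun", 4.0),
}


def lookup(word):
    """Return the entry for a word, or None if it is not a preset keyword.

    Assumption: the exclusive thesaurus takes precedence over the general one.
    """
    key = word.lower()
    return EXCLUSIVE_THESAURUS.get(key) or GENERAL_THESAURUS.get(key)
```

A call such as `lookup("ECC")` returns the exclusive entry, while a word found in neither thesaurus yields `None` and is treated as unmatched.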
S40: and calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value.
Specifically, after the keywords matched with the preset keywords in the word segmentation class library and the corresponding matching times are determined, a fault value is calculated for each server, and the fault condition of the server is determined according to the fault value, that is, whether the server is abnormal. For example, with a preset threshold, a fault value greater than the threshold indicates that the server has a fault that needs to be repaired; otherwise, the fault of the server is negligible.
The embodiment of the invention provides a fault detection method for a server, which comprises the following steps: collecting fault data information generated when a server fails; performing word segmentation processing on each fault data information to obtain a corresponding keyword; determining the matching times of each keyword and preset keywords in a preset word segmentation class library; and calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value. The method thus uses NLP to perform word segmentation and matching on the collected fault data information and calculates the fault value from the matching results in order to determine the fault condition of the server. Word segmentation, matching, and fault condition determination can all be executed by one unified computer program, so servers from different manufacturers or running different operating systems do not each need their own fault detection program or monitoring platform. The method therefore makes fault detection on such servers more convenient and, in turn, more efficient.
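The text does not give a formula for the fault value. One plausible reading, sketched here purely as an assumption, is a weighted sum in which each matched keyword contributes its fault index multiplied by its matching times. The function name `fault_value` and the sample data are hypothetical.

```python
def fault_value(match_counts, fault_index):
    """Weighted sum over matched keywords (an assumed formula, not the patent's).

    match_counts: matched keyword -> matching times
    fault_index:  preset keyword  -> influence coefficient
    """
    return sum(fault_index[word] * count for word, count in match_counts.items())


# Hypothetical fault indices and match counts for one server.
FAULT_INDEX = {"timeout": 2.0, "overheat": 5.0}
counts = {"timeout": 2, "overheat": 1}
value = fault_value(counts, FAULT_INDEX)  # 2.0 * 2 + 5.0 * 1
```

Other aggregations (for example a capped or normalized sum) would fit the description equally well; the weighted sum is simply the most direct use of a per-keyword influence coefficient.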
On the basis of the above embodiment, this embodiment further describes and optimizes the technical solution, and specifically, in this embodiment, the fault data information includes in-band fault data information and out-of-band fault data information; correspondingly, the process of collecting fault data information generated when the server fails specifically includes:
receiving in-band fault information sent by an operating system and/or a preset monitoring platform of a server;
acquiring in-band fault information by running a preset acquisition script in a server;
receiving out-of-band fault data information forwarded by a BMC of a server;
correspondingly, the process of calculating the fault value of the server according to the matched keywords and the corresponding matching times and determining the fault condition of the server according to the fault value specifically comprises the following steps:
and calculating the fault value of the server according to the matched keywords and the corresponding matching times which respectively correspond to the in-band fault data information and the out-of-band fault data information, and determining the fault condition of the server according to the fault value.
Specifically, in this embodiment, the collected fault data information includes in-band fault data information and out-of-band fault data information. In-band fault information is collected in two modes. Mode one: the fault data information sent by the operating system of the server or by the preset monitoring platform of the server is passively received through an information forwarding server. Mode two: a preset acquisition script is run inside the operating system of the server (an intrusive mode), and the script collects the in-band fault information of the server and returns it automatically. Out-of-band fault information is collected passively: the fault data information forwarded by the BMC of the server is received through the information forwarding server.
That is, in this embodiment, the fault data information has two parts: in-band fault data information and out-of-band fault data information. The matched keywords and the corresponding matching times are therefore determined separately for the in-band fault data information and for the out-of-band fault data information against the preset word segmentation class library. The fault value of the server is then calculated from both sets of matched keywords and matching times, and the fault condition of the server is determined according to the fault value.
Therefore, in the embodiment, the accuracy of calculating the fault value of the server can be improved by acquiring the in-band fault data information and the out-of-band fault data information and calculating the fault value of the server according to the two kinds of fault data information, so that the accuracy of detecting the fault of the server can be improved.
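A minimal sketch of combining the two channels, assuming that in-band and out-of-band match counts are simply added with equal weight before the fault value is computed. The patent does not say how the two sets are combined; `combined_fault_value` and its parameters are hypothetical names.

```python
from collections import Counter


def combined_fault_value(in_band_counts, out_of_band_counts, fault_index):
    """Merge per-channel match counts, then apply an assumed weighted sum."""
    total = Counter(in_band_counts) + Counter(out_of_band_counts)
    return sum(fault_index[word] * count for word, count in total.items())
```

For example, with hypothetical indices `{"timeout": 2.0, "overheat": 5.0}`, in-band counts `{"timeout": 2}` and out-of-band counts `{"timeout": 1, "overheat": 1}` merge to three `timeout` matches and one `overheat` match. A deployment could also weight the channels differently, for instance trusting BMC events more than operating system logs.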
On the basis of the foregoing embodiment, the present embodiment further describes and optimizes the technical solution, and specifically, after acquiring fault data information generated when the server fails, the present embodiment further includes:
judging the data format type of the fault data information;
if the fault data information is in a text format, storing the fault data information into a database, and performing word segmentation processing on each fault data information to obtain a corresponding keyword;
and if the fault data information is in a graphic format, identifying characters in the fault data information to obtain the fault data information in a text format, storing the fault data information in the text format into a database, and performing word segmentation processing on each fault data information to obtain a corresponding keyword.
Specifically, in this embodiment, after collecting the fault data information generated when the server fails, the data format type of the collected fault data information is first identified. The data format types include a text format and a graphic format, the log or script file is generally in the text format, and the fault screenshot is generally in the graphic format.
When the fault data information is in a text format, storing the fault data information in the text format into a database, and performing word segmentation processing on each fault data information to obtain a corresponding keyword; if the fault data information is in a graphic format, characters in the fault data information in the graphic format need to be identified through an Optical Character Recognition (OCR) technology to obtain the fault data information in a text format, then the fault data information in the text format is stored in a database, and word segmentation processing is performed on each fault data information to obtain a corresponding keyword.
Therefore, the method of the embodiment can process the fault data information in the text format and the graphic format, increase the diversity and the richness of the fault data information, determine the fault value based on the more diversified fault data information to perform fault detection on the server, and further improve the accuracy of the fault detection.
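The format check and the text/graphic branches can be sketched as below. Judging the format by file suffix is an assumption, as are the suffix lists and the names `detect_format` and `to_text`. The OCR step is passed in as a callable so the sketch does not depend on any particular OCR library.

```python
from pathlib import Path

# Hypothetical suffix lists; a real system might inspect file contents instead.
TEXT_SUFFIXES = {".log", ".txt", ".sh"}
GRAPHIC_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def detect_format(path):
    """Classify a fault file as "text" or "graphic" from its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return "text"
    if suffix in GRAPHIC_SUFFIXES:
        return "graphic"
    raise ValueError(f"unsupported fault file format: {suffix!r}")


def to_text(path, ocr):
    """Return fault text; `ocr` is any callable mapping an image path to text."""
    if detect_format(path) == "text":
        return Path(path).read_text(encoding="utf-8", errors="replace")
    return ocr(path)
```

The returned text would then be stored in the database and handed to the word segmentation step, which is the same for both branches.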
On the basis of the above embodiments, the present embodiment further describes and optimizes the technical solution, and specifically, the present embodiment further includes:
presetting a fault alarm upper limit value and a fault alarm lower limit value;
correspondingly, the process of calculating the fault value of the server according to the matched keywords and the corresponding matching times and determining the fault condition of the server according to the fault value specifically comprises the following steps:
calculating a fault value of the server according to the matched keywords and the corresponding matching times;
if the fault value is larger than the lower fault alarm limit value and smaller than the upper fault alarm limit value, determining that the server can automatically repair the fault, and starting a preset fault repair program;
if the fault value is larger than the fault alarm upper limit value, the server is determined to be incapable of automatically repairing the fault, and corresponding alarm information is sent out.
Specifically, in this embodiment, a fault alarm upper limit value and a fault alarm lower limit value are preset, and after the fault value corresponding to the server is calculated, it is compared with both limits. If the fault value is larger than the fault alarm lower limit value and smaller than the fault alarm upper limit value, that is, it exceeds the lower limit but does not reach the upper limit, the current fault of the server is an automatically repairable fault, so a preset fault repair program is started to self-repair the server. If the fault value is greater than the fault alarm upper limit value, the current fault cannot be repaired automatically, so corresponding alarm information is sent out to notify the person responsible for the server to repair the fault manually. If the fault value is smaller than the fault alarm lower limit value, the current fault can be ignored, and collection of fault data information from the server continues.
Therefore, in the embodiment, the calculated fault value is further compared with the preset upper fault alarm limit value and the preset lower fault alarm limit value to further analyze the fault condition of the server, and when the fault of the server is an automatically repairable fault, the preset fault repairing program is started to self-repair the server, so that the operation that a technician needs to perform troubleshooting and repairing is relatively reduced, and the workload of the technician is relatively reduced.
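The three-way decision can be written as a small function. The text does not say what happens when the fault value equals one of the limits; this sketch assumes a value equal to the upper limit is still auto-repairable and a value equal to the lower limit is ignored. The function name and the returned labels are hypothetical.

```python
def classify_fault(value, lower, upper):
    """Map a fault value to an action using the preset alarm limits."""
    if lower >= upper:
        raise ValueError("lower limit must be below upper limit")
    if value > upper:
        return "alarm"        # cannot self-repair: notify the responsible person
    if value > lower:
        return "auto_repair"  # start the preset fault repair program
    return "ignore"           # negligible: keep collecting fault data
```

With limits of 3 and 10, a fault value of 5 selects automatic repair, 12 raises an alarm, and 1 is ignored.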
On the basis of the foregoing embodiment, this embodiment further describes and optimizes the technical solution, and specifically, in this embodiment, the process of performing word segmentation processing on each fault data information to obtain a corresponding keyword specifically includes:
judging the language type of the fault data information;
if the fault data information is Chinese, performing word segmentation processing on the fault data information by using a Chinese word segmentation tool to obtain a corresponding keyword;
and if the fault data information is English, performing word segmentation processing on the fault data information by using an English word segmentation tool to obtain a corresponding keyword.
Specifically, in this embodiment, when word segmentation processing is performed on each fault data information to obtain a corresponding keyword, the language type of the fault data information is determined first. The language types include Chinese and English, and the word segmentation tool corresponding to the language type is used to segment the fault data information and obtain the corresponding keywords. As a preferred embodiment, the Chinese word segmentation tool may be Ansj and the English word segmentation tool may be NLTK; this embodiment does not limit the specific word segmentation tool used.
Therefore, the language type of the fault data information is further distinguished, and the word segmentation processing is performed on the fault data information by using the word segmentation tool corresponding to the language type, so that the accuracy of the keywords obtained by word segmentation can be further improved, the accuracy of calculating the fault value is improved, and the accuracy of detecting the fault of the server is improved.
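A rough sketch of the language dispatch follows. The language check (any CJK character means Chinese) is an assumed heuristic, and the two tokenizers are regex stand-ins rather than the Ansj or NLTK tools named above; in particular, the Chinese stand-in returns runs of characters and does not perform real word segmentation.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")


def detect_language(text):
    """Assumed heuristic: any CJK character marks the text as Chinese."""
    return "zh" if CJK.search(text) else "en"


def english_tokens(text):
    # Stand-in for an English tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def chinese_tokens(text):
    # Stand-in only: runs of CJK characters, not true word segmentation.
    return re.findall(r"[\u4e00-\u9fff]+", text)


TOKENIZERS = {"zh": chinese_tokens, "en": english_tokens}


def segment(text, tokenizers=TOKENIZERS):
    """Pick the tokenizer that matches the detected language and apply it."""
    return tokenizers[detect_language(text)](text)
```

Replacing the values in `TOKENIZERS` with wrappers around real segmentation tools would leave the dispatch logic unchanged.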
On the basis of the foregoing embodiment, this embodiment further describes and optimizes the technical solution, and specifically, in this embodiment, the process of determining the matching times of each keyword and a preset keyword in a preset segmentation class library specifically includes:
determining removal words among the keywords by using a removal word class library in the preset word segmentation class library, and deleting the removal words;
and matching the remaining keywords against the keyword class library in the word segmentation class library to determine the matching times of the remaining keywords and the preset keywords.
Specifically, in this embodiment, a removed word class library is also set up when the word segmentation class library is set up. The removed word class library contains preset removal words, which include words that are irrelevant to determining whether the server is faulty and words that may interfere with or obscure that determination. Each keyword of the fault data information is matched against the preset removal words in the removed word class library to identify the removed words among the keywords, and the matched removed words are deleted to obtain the remaining keywords. The remaining keywords are then matched against the keyword class library in the word segmentation class library to determine which of them match preset keywords and the corresponding matching times.
That is, in this embodiment the removed words are screened out of the keywords in advance, and only the remaining keywords are matched against the keyword class library in the word segmentation class library. This improves the accuracy of the matched keywords and of the corresponding matching times, and therefore the accuracy of the calculated fault value and of the fault detection of the server.
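The two-stage matching described above (delete removed words first, then count matches against the preset keywords) can be sketched as follows. The function name and the use of plain Python sets for the two class libraries are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_keyword_matches(
    keywords: Iterable[str],
    removal_words: Set[str],
    preset_keywords: Set[str],
) -> Dict[str, int]:
    """Drop removed words, then count how often each preset keyword occurs."""
    # Stage 1: delete every keyword that appears in the removed word class library.
    remaining = [k for k in keywords if k not in removal_words]
    # Stage 2: count only the remaining keywords that match a preset keyword.
    counts = Counter(k for k in remaining if k in preset_keywords)
    return dict(counts)
```

The returned mapping holds, for each matched keyword, its matching times; keywords that are neither removed nor preset are simply ignored.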
On the basis of the foregoing embodiment, this embodiment further describes and optimizes the technical solution. Specifically, after calculating the fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value, this embodiment further includes:
sending the fault condition to the target terminal device by mail and/or short message.
Specifically, in this embodiment, a target terminal device that is to receive the fault condition is preset. After the fault value of the server has been calculated according to the matched keywords and the corresponding matching times, and the fault condition of the server has been determined according to the fault value, the fault condition is sent to the target terminal device by mail and/or short message. Technicians can therefore obtain the fault detection result of the server remotely through the target terminal device, which further improves the convenience of server fault detection.
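For the mail channel, a notification could be assembled along the following lines. This is only a sketch under assumptions: the function name, the subject and body wording, and the example addresses are hypothetical, and the short message channel is not shown.

```python
from email.message import EmailMessage


def build_fault_mail(
    server_name: str,
    fault_value: float,
    condition: str,
    sender: str,
    recipient: str,
) -> EmailMessage:
    """Assemble (but do not send) a mail that reports a server's fault condition."""
    msg = EmailMessage()
    msg["Subject"] = f"[Fault alert] {server_name}: {condition}"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text body carrying the values a technician needs for triage.
    msg.set_content(
        f"Server: {server_name}\n"
        f"Fault value: {fault_value}\n"
        f"Fault condition: {condition}\n"
    )
    return msg
```

The message can then be handed to the standard library's `smtplib.SMTP(...).send_message(msg)` with whatever mail host the deployment uses; the host and credentials are outside the scope of this sketch.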
In order to make those skilled in the art better understand the technical solutions in the present application, the following describes the technical solutions in the embodiments of the present application in detail with reference to practical application scenarios. Fig. 2 is a flowchart of another method for detecting a failure of a server according to an embodiment of the present invention; as shown in fig. 2, taking a target server as an execution subject and taking fault detection on a plurality of client servers as an example, a specific process of a fault detection method for a server is as follows:
1. Preparation phase
The equipment information of each client server that requires fault detection, including the server name, the server address (IP), the fault alarm upper limit value and the fault alarm lower limit value, is entered into a server information table. The table structure is as follows:
TABLE 1 Server information table
Serial number | Server name | IP | Upper limit of fault alarm | Lower limit of fault alarm | Current alarm value |
An NLP (natural language processing) word segmentation class library is created. It comprises a general word library and an exclusive word library, each of which is provided with a keyword class library and a removed word class library; preset keywords are added to the keyword class library and preset removal words are added to the removed word class library. General fault identification keywords are stored in the general word library, and fault identification keywords that are not suitable for the general word library are added to the exclusive word library. When each keyword is entered, its part of speech (such as noun or adjective) and its fault index must also be added. Taking Ansj Chinese word segmentation as an example, the general word library and the exclusive word library need to be created and loaded into a configuration file, and the fault index corresponding to each keyword needs to be stored separately in a database table.
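One possible in-memory shape for the keyword entries described above is sketched below. The class and field names are hypothetical, and the rule that an exclusive entry overrides a general entry with the same word is an assumption of this sketch; the source does not say how such a collision would be handled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LexiconEntry:
    """One preset keyword with the attributes entered alongside it."""
    word: str
    part_of_speech: str  # e.g. "noun", "adjective"
    fault_index: float   # weight later used when computing the fault value


def build_keyword_library(
    general: Iterable[LexiconEntry],
    exclusive: Iterable[LexiconEntry],
) -> Dict[str, LexiconEntry]:
    """Merge the general and exclusive word libraries into one lookup table."""
    library = {entry.word: entry for entry in general}
    # Assumption: on a duplicate word, the exclusive entry takes precedence.
    library.update({entry.word: entry for entry in exclusive})
    return library
```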
2. Information collection
The collected fault data information includes in-band fault data information and out-of-band fault data information. Specifically, the target address of the information forwarding server is configured in advance as the IP of the server on which the information acquisition module is located, that is, the IP of the target server. An information acquisition program is then started; it automatically issues an information acquisition script to each client server at regular intervals, keeps the link to the information forwarding server open, and waits to receive fault data information. After receiving the command to run the information acquisition script, the client server runs the script automatically and returns the execution result to the target server. When a client server fails, the corresponding fault data information is acquired through the operating system of the client server or a corresponding preset monitoring platform and is forwarded to the target server through the information forwarding server. When the BMC of a client server detects fault data information, that information is likewise forwarded to the target server through the information forwarding server.
Fig. 3 is a schematic diagram of a storage directory structure provided in an embodiment of the present invention. As shown in fig. 3, the collected fault data information is stored as files: in-band fault data information is stored in the in_band folder and out-of-band fault data information is stored in the out_of_band folder. Within each folder, fault data information is stored per server (e.g., server01), and the faults of the same server are stored per timestamp (e.g., 1594532194).
3. Data storage
A data storage program is started and enters the storage directory of the fault data information.
The in_band folder and the out_of_band folder are scanned separately to obtain the list of all folders under each of them. Starting from the first folder (for example, the server01 directory), the timestamp folders under that directory are scanned and the files in each timestamp folder are read to obtain the fault data information, whose data format type is then identified. If the fault data information is in a text format, it is written directly into the server fault information table; if it is in a graphic format, OCR (optical character recognition) is called to recognize the characters in it, and the resulting fault data information in text format is stored in the server fault information table. It should be noted that, when fault data information is stored in the server fault information table, in-band fault data information and out-of-band fault data information must be stored in their respective columns.
TABLE 2 Server fault information table
Serial number | Server name | In-band fault information | Out-of-band fault information | Update time |
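The directory scan and format check of the data storage step can be sketched as follows, assuming the band/server/timestamp layout described for the storage directory. The function names, the set of suffixes treated as text, and the `ocr` callback (a placeholder for whatever OCR engine is used) are assumptions of this sketch.

```python
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple

# Assumption: these suffixes mark text files; anything else is treated as an image.
TEXT_SUFFIXES = {".txt", ".log"}


def iter_fault_files(root: Path) -> Iterator[Tuple[str, str, str, Path]]:
    """Yield (band, server, timestamp, file) for every stored fault file."""
    for band in ("in_band", "out_of_band"):
        band_dir = root / band
        if not band_dir.is_dir():
            continue
        for server_dir in sorted(p for p in band_dir.iterdir() if p.is_dir()):
            for ts_dir in sorted(p for p in server_dir.iterdir() if p.is_dir()):
                for file in sorted(p for p in ts_dir.iterdir() if p.is_file()):
                    yield band, server_dir.name, ts_dir.name, file


def load_fault_text(path: Path, ocr: Optional[Callable[[Path], str]] = None) -> str:
    """Return the fault text, reading text files directly and images through OCR."""
    if path.suffix.lower() in TEXT_SUFFIXES:
        return path.read_text(encoding="utf-8")
    if ocr is None:
        raise ValueError(f"no OCR function supplied for {path.name}")
    # Graphic format: delegate character recognition to the supplied callback.
    return ocr(path)
```

Each yielded tuple carries the band name, so the caller can write the text into the in-band or out-of-band column of the server fault information table as appropriate.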
4. NLP analysis
NLP analysis is started, and the contents of the server fault information table are read one by one. According to the language type of the fault data information that is read, a Chinese word segmentation tool (for example, Ansj) or an English word segmentation tool (for example, NLTK) is called to segment the in-band fault data information and the out-of-band fault data information separately, and the word segmentation results (keywords) are stored in a fault information word segmentation result table. The table structure is as follows:
TABLE 3 Fault information word segmentation result table
Serial number | Server name | In-band word segmentation result | Out-of-band word segmentation result | Update time |
The fault information word segmentation result table is then read, and the contents of the in-band word segmentation result column and the out-of-band word segmentation result column are matched against the preset word segmentation class library. Specifically, matching is performed with the general word library and the exclusive word library: the removed words among the keywords are deleted, and the matched keywords and the matching number of each keyword are stored in a fault matching result table. The table structure is as follows:
TABLE 4 Fault matching result table
Serial number | Server name | In-band keyword matching result | Out-of-band keyword matching result | Update time |
5. Fault calculation
A fault calculation program is started. Taking each client server as a unit, the program reads the fault matching result table and calculates the fault value of the server. In this embodiment, the fault value of the server is calculated as follows:
wherein f_m represents the fault value of the server; a represents the number of keywords in the in-band fault data information that match preset keywords in the word segmentation class library; b represents the number of keywords in the out-of-band fault data information that match preset keywords in the word segmentation class library; c (c = a + b) represents the total number of keywords in the in-band and out-of-band fault data information that match preset keywords in the word segmentation class library; δ represents the coefficient level of in-band faults; γ represents the coefficient level of out-of-band faults; n_i represents the matching number of keyword i among the a matched keywords; n_j represents the matching number of keyword j among the b matched keywords; x_i represents the matching coefficient of keyword i among the a matched keywords; x_j represents the matching coefficient of keyword j among the b matched keywords; and x_k (x_k = x_i ∪ x_j) represents the matching coefficient of keyword k (k = i ∪ j) among the c matched keywords.
The calculated current failure value of the client server is stored in the server information table (table 1).
6. Result push
A result pushing program is started and reads the fault alarm upper limit value f_max(m) and the fault alarm lower limit value f_min(m) of each client server recorded in the server information table. It then reads the current fault value f_m of the client server and judges whether (f_max(m) > f_m) && (f_m > f_min(m)) is satisfied. If so, it is determined that the client server can repair the fault automatically, and a preset fault repair program is started to repair the fault automatically. If f_max(m) < f_m is satisfied, the current fault of the server cannot be repaired automatically, so alarm information is sent to the target terminal device by mail and/or short message to notify the person responsible for the server to repair the fault manually. If neither condition is satisfied, that is, f_min(m) > f_m, the fault data information of the client server continues to be collected.
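The three-way decision of the result pushing step can be sketched as follows. The enum and function names are hypothetical. The source does not say what happens when the fault value equals one of the limits; this sketch assumes such values fall through to continued collection, because neither strict inequality holds.

```python
from enum import Enum


class FaultAction(Enum):
    AUTO_REPAIR = "auto_repair"          # start the preset fault repair program
    ALERT = "alert"                      # notify the responsible person
    KEEP_COLLECTING = "keep_collecting"  # continue collecting fault data


def decide_action(fault_value: float, lower: float, upper: float) -> FaultAction:
    """Map a fault value f_m to an action using the limits f_min(m) and f_max(m)."""
    if lower < fault_value < upper:
        return FaultAction.AUTO_REPAIR
    if fault_value > upper:
        return FaultAction.ALERT
    # Assumption: values at or below the lower limit, or exactly on the upper
    # limit, trigger no action other than continued collection.
    return FaultAction.KEEP_COLLECTING
```

For example, with a lower limit of 10 and an upper limit of 50, a fault value of 30 would select automatic repair and a fault value of 80 would select an alert.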
According to the fault detection method of the server provided by the embodiment of the invention, word segmentation matching is performed on the collected fault data information based on NLP, the fault value is calculated from the word segmentation matching results, and the fault condition of the server is then determined. Because the word segmentation matching and the determination of the fault condition can be executed by a unified computer program, there is no need to set up a different fault detection program or monitoring platform for servers from different manufacturers or servers running different operating systems. The method therefore makes fault detection more convenient for such servers and improves the efficiency of server fault detection.
The embodiments of the fault detection method of a server according to the present invention have been described in detail above. The present invention further provides a fault detection device, fault detection equipment, and a computer-readable storage medium corresponding to the method.
Fig. 4 is a structural diagram of a fault detection apparatus for a server according to an embodiment of the present invention, and as shown in fig. 4, the fault detection apparatus for the server includes:
the acquisition module 41 is used for acquiring fault data information generated when the server fails;
the word segmentation module 42 is configured to perform word segmentation processing on each fault data information to obtain a corresponding keyword;
a matching module 43, configured to determine matching times of each keyword with preset keywords in a preset segmentation class library;
and the determining module 44 is configured to calculate a fault value of the server according to the matched keyword and the corresponding matching times, and determine a fault condition of the server according to the fault value.
The fault detection device of the server provided by the embodiment of the invention has the beneficial effects of the fault detection method of the server.
As a preferred embodiment, the acquisition module specifically includes:
the first acquisition submodule is used for receiving in-band fault information sent by an operating system of the server and/or a preset monitoring platform;
the second acquisition submodule is used for acquiring in-band fault information by running a preset acquisition script in the server;
the third acquisition submodule is used for receiving the out-of-band fault data information forwarded by the BMC of the server;
correspondingly, the determining module specifically comprises:
and the determining submodule is used for calculating the fault value of the server according to the matched keywords respectively corresponding to the in-band fault data information and the out-of-band fault data information and the corresponding matching times, and determining the fault condition of the server according to the fault value.
As a preferred embodiment, the failure detection apparatus of a server further includes:
the judging module is used for judging the data format type of the fault data information;
the first execution module is used for storing the fault data information into a database and calling the word segmentation module if the fault data information is in a text format;
and the second execution module is used for identifying characters in the fault data information if the fault data information is in a graphic format, obtaining the fault data information in a text format, storing the fault data information in the text format into a database, and calling the word segmentation module.
As a preferred embodiment, the failure detection apparatus of a server further includes:
the alarm value setting module is used for presetting a fault alarm upper limit value and a fault alarm lower limit value;
correspondingly, the determining module specifically includes:
the calculation submodule is used for calculating a fault value of the server according to the matched keywords and the corresponding matching times;
the third execution module is used for determining that the server can automatically repair the fault and starting a preset fault repair program if the fault value is greater than the lower fault alarm limit value and less than the upper fault alarm limit value;
and the fourth execution module is used for determining that the server cannot automatically repair the fault if the fault value is greater than the fault alarm upper limit value, and sending corresponding alarm information.
As a preferred embodiment, the failure detection apparatus of a server further includes:
and the sending module is used for sending the fault condition to the target terminal equipment in a mail and/or short message mode.
Fig. 5 is a structural diagram of a fault detection device of a server according to an embodiment of the present invention, and as shown in fig. 5, the fault detection device of the server includes:
a memory 51 for storing a computer program;
a processor 52 for implementing the steps of the fault detection method of the server as described above when executing the computer program.
The fault detection equipment of the server provided by the embodiment of the invention has the beneficial effects of the fault detection method of the server.
In order to solve the above technical problem, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the fault detection method of the server as described above.
The computer-readable storage medium provided by the embodiment of the invention has the beneficial effects of the fault detection method of the server.
The method, the device, the equipment and the computer readable storage medium for detecting the fault of the server provided by the invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are set forth only to help understand the method and its core ideas of the present invention. It should be noted that, for those skilled in the art, without departing from the principle of the present invention, it is possible to make various improvements and modifications to the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Claims (10)
1. A method for detecting a failure of a server, comprising:
collecting fault data information generated when a server fails;
performing word segmentation processing on each fault data information to obtain corresponding keywords;
determining the matching times of each keyword and a preset keyword in a preset word segmentation class library;
calculating a fault value of the server according to the matched keywords and the corresponding matching times, and determining the fault condition of the server according to the fault value;
the fault value of the server is calculated according to the matched keywords and the corresponding matching times, and is specifically calculated according to the following formula:
wherein f_m is the fault value; a is the number of keywords in the in-band fault data information that match the preset keywords in the word segmentation class library; b is the number of keywords in the out-of-band fault data information that match the preset keywords in the word segmentation class library; c = a + b, where c is the total number of keywords in the in-band and out-of-band fault data information that match the preset keywords in the word segmentation class library; δ is the coefficient grade of the in-band fault; γ is the coefficient grade of the out-of-band fault; n_i is the matching number of keyword i among the a matched keywords; n_j is the matching number of keyword j among the b matched keywords; x_i is the matching coefficient of keyword i among the a matched keywords; x_j is the matching coefficient of keyword j among the b matched keywords; x_k = x_i ∪ x_j, where x_k is the matching coefficient of keyword k among the c matched keywords, and k = i ∪ j.
2. The method of claim 1, wherein the fault data information comprises the in-band fault data information and the out-of-band fault data information; correspondingly, the process of collecting fault data information generated when the server fails specifically includes:
receiving the in-band fault information sent by an operating system and/or a preset monitoring platform of the server;
acquiring the in-band fault information by running a preset acquisition script in the server;
receiving the out-of-band fault data information forwarded by the BMC of the server;
correspondingly, the process of calculating the fault value of the server according to the matched keyword and the corresponding matching times and determining the fault condition of the server according to the fault value specifically includes:
and calculating a fault value of the server according to the matched keywords and the corresponding matching times which respectively correspond to the in-band fault data information and the out-of-band fault data information, and determining the fault condition of the server according to the fault value.
3. The method of claim 1, wherein after collecting failure data information generated when the server fails, further comprising:
judging the data format type of the fault data information;
if the fault data information is in a text format, storing the fault data information into a database, and performing word segmentation processing on each fault data information to obtain a corresponding keyword;
if the fault data information is in a graphic format, identifying characters in the fault data information to obtain the fault data information in a text format, storing the fault data information in the text format into the database, and performing word segmentation processing on each fault data information to obtain corresponding keywords.
4. The method of claim 1, further comprising:
presetting a fault alarm upper limit value and a fault alarm lower limit value;
calculating the fault value of the server according to the matched keywords and the corresponding matching times;
if the fault value is larger than the lower fault alarm limit value and smaller than the upper fault alarm limit value, determining that the server can automatically repair the fault, and starting a preset fault repair program;
if the fault value is larger than the fault alarm upper limit value, determining that the server cannot automatically repair the fault, and sending corresponding alarm information.
5. The method according to claim 1, wherein the process of performing word segmentation processing on each fault data message to obtain a corresponding keyword specifically includes:
judging the language type of the fault data information;
if the fault data information is Chinese, performing word segmentation processing on the fault data information by using a Chinese word segmentation tool to obtain a corresponding keyword;
and if the fault data information is English, performing word segmentation processing on the fault data information by using an English word segmentation tool to obtain a corresponding keyword.
6. The method according to claim 1, wherein the process of determining the matching times of each keyword with preset keywords in a preset segmentation class library specifically comprises:
determining a removed word in the keyword by using a removed word class library in the preset word segmentation class library, and deleting the removed word;
and matching the rest keywords by using the keyword class library in the word segmentation class library to determine the matching times of the rest keywords and the preset keywords.
7. The method according to any one of claims 1 to 6, wherein after the calculating a failure value of the server according to the matched keyword and the corresponding matching times and determining a failure condition of the server according to the failure value, further comprising:
and sending the fault condition to the target terminal equipment in a mail and/or short message mode.
8. A failure detection apparatus for a server, comprising:
the acquisition module is used for acquiring fault data information generated when the server fails;
the word segmentation module is used for carrying out word segmentation processing on each fault data information to obtain corresponding keywords;
the matching module is used for determining the matching times of each keyword and preset keywords in a preset segmentation class library;
the determining module is used for calculating a fault value of the server according to the matched keywords and the corresponding matching times and determining the fault condition of the server according to the fault value;
the fault value of the server is calculated according to the matched keywords and the corresponding matching times, and is specifically calculated according to the following formula:
wherein f_m is the fault value; a is the number of keywords in the in-band fault data information that match the preset keywords in the word segmentation class library; b is the number of keywords in the out-of-band fault data information that match the preset keywords in the word segmentation class library; c = a + b, where c is the total number of keywords in the in-band and out-of-band fault data information that match the preset keywords in the word segmentation class library; δ is the coefficient grade of the in-band fault; γ is the coefficient grade of the out-of-band fault; n_i is the matching number of keyword i among the a matched keywords; n_j is the matching number of keyword j among the b matched keywords; x_i is the matching coefficient of keyword i among the a matched keywords; x_j is the matching coefficient of keyword j among the b matched keywords; x_k = x_i ∪ x_j, where x_k is the matching coefficient of keyword k among the c matched keywords, and k = i ∪ j.
9. A failure detection apparatus of a server, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method of fault detection of a server according to any of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of a method of fault detection of a server according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010821134.5A CN111953544B (en) | 2020-08-14 | 2020-08-14 | Fault detection method, device, equipment and storage medium of server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010821134.5A CN111953544B (en) | 2020-08-14 | 2020-08-14 | Fault detection method, device, equipment and storage medium of server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111953544A CN111953544A (en) | 2020-11-17 |
CN111953544B true CN111953544B (en) | 2023-04-07 |
Family
ID=73342966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010821134.5A Active CN111953544B (en) | 2020-08-14 | 2020-08-14 | Fault detection method, device, equipment and storage medium of server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111953544B (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101499064A (en) * | 2008-02-01 | 2009-08-05 | 华为技术有限公司 | Method and apparatus for building pattern matching state machine |
CN108182523A (en) * | 2017-12-26 | 2018-06-19 | 新疆金风科技股份有限公司 | The treating method and apparatus of fault data, computer readable storage medium |
CN109271272B (en) * | 2018-10-15 | 2022-05-17 | 江苏物联网研究发展中心 | Big data assembly fault auxiliary repair system based on unstructured log |
CN109902153B (en) * | 2019-04-02 | 2020-11-06 | 杭州安脉盛智能技术有限公司 | Equipment fault diagnosis method and system based on natural language processing and case reasoning |
2020
- 2020-08-14 CN CN202010821134.5A patent/CN111953544B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111953544A (en) | 2020-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263009B (en) | Method, device and equipment for generating log classification rule and readable storage medium | |
CN110955550B (en) | Cloud platform fault positioning method, device, equipment and storage medium | |
CN113448935A (en) | Method, electronic device and computer program product for providing log information | |
CN111045902A (en) | Pressure testing method and device for server | |
CN111581057B (en) | General log analysis method, terminal device and storage medium | |
CN112328499A (en) | Test data generation method, device, equipment and medium | |
CN109582504A (en) | A kind of data reconstruction method and device for apple equipment | |
CN112395195A (en) | Method, device and equipment for processing automatic test data and storage medium | |
CN112100070A (en) | Version defect detection method and device, server and storage medium | |
CN111865673A (en) | Automatic fault management method, device and system | |
CN113312258B (en) | Interface testing method, device, equipment and storage medium | |
CN117407242B (en) | Low-cost zero-sample online log analysis method based on large language model | |
CN111966339B (en) | Buried point parameter input method and device, computer equipment and storage medium | |
CN111309584A (en) | Data processing method and device, electronic equipment and storage medium | |
CN113392000A (en) | Test case execution result analysis method, device, equipment and storage medium | |
CN111953544B (en) | Fault detection method, device, equipment and storage medium of server | |
CN111767213A (en) | Method and device for testing database check points, electronic equipment and storage medium | |
CN116340172A (en) | Data collection method and device based on test scene and test case detection method | |
CN116074183A (en) | C3 timeout analysis method, device and equipment based on rule engine | |
CN113037521B (en) | Method for identifying state of communication equipment, communication system and storage medium | |
CN115186001A (en) | Patch processing method and device | |
CN113010339A (en) | Method and device for automatically processing fault in online transaction test | |
CN113342657A (en) | Method and device for detecting code exception | |
CN118295864B (en) | Linux operating system hardware error identification method and system | |
CN112925754B (en) | File descriptor overflow reporting method, device and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||