KR101783201B1 - System and method for managing servers totally - Google Patents
System and method for managing servers totally
- Publication number: KR101783201B1 (application KR1020150178246A)
- Authority: KR (South Korea)
- Prior art keywords: server, managed, failure, battery, managed server
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/30—Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
-
- G06Q50/32—
-
- H04L51/22—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a server integrated management system and method for managing servers in an integrated manner. The server integrated management system, which integrates and manages two or more managed servers, includes: a management server that collects hardware information and software information from the two or more managed servers to identify and manage the status of each server; a database that stores the collected hardware information and software information and provides the stored information to the management server; and a manager terminal, used by a manager of the server integrated management system, that communicates with the management server, displays the status of the managed servers on a screen, and transmits commands input by the manager to the management server. To prevent similar failures by analyzing failure patterns, the management server, when a predetermined event occurs in a managed server, transmits to that managed server a predicted failure occurrence message indicating that a failure may occur as a result of the event, together with a solution to the expected failure. According to the present invention, failures can be prevented in advance by predicting them, issuing an alert, and providing a solution, thereby reducing the damage caused by server failures.
Description
The present invention relates to a server integrated management system and method for managing servers in a unified manner. More particularly, the present invention relates to a server integrated management system and method that analyze failure patterns occurring in servers in order to prevent failures in advance.
BACKGROUND ART
Recently, as computers have become larger and faster, computer troubles due to system errors and viruses occur frequently. In a large-capacity server in particular, troubles arising from operations such as running application programs and storing, reading, and transmitting data can occur frequently. Therefore, each company employs a separate server manager who manages these servers and handles failures when they occur.
However, specialized skills are required for server management, and a considerable expense is required to employ such skilled personnel. Therefore, especially in a small-sized enterprise, a suitable person is selected as a server manager instead of employing a professional engineer as a server manager. In such a case, it is difficult to manage the server smoothly, and it is almost impossible to smoothly cope with a server failure.
In addition, even when a server manager with specialized server management skills is employed, if the manager is away from the server, for example on a business trip, it is difficult for the manager to be notified promptly of the server's situation. Even when the manager is informed of a server failure, it is difficult to deal with it immediately from a remote location. As a result, the server may be seriously damaged.
Conventionally, when a failure occurs in a server integrated management system that manages a plurality of servers, the system detects the failure and restores the server after the fact. However, this post-failure recovery method has the problems that the operation of the server is interrupted while the failed server is being recovered, that losses arise from the interruption of server use, and that the labor and cost of recovery are large.
SUMMARY OF THE INVENTION
The present invention has been conceived to solve the above-mentioned problems, and its object is to provide a server integrated management system and method that prevent failures by detecting, in advance, failure patterns occurring in servers.
The objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be clearly understood by those skilled in the art from the following description.
In order to achieve the above object, a server integrated management system according to the present invention, in which two or more managed servers are integrated and managed, includes: a management server that collects hardware information and software information from the two or more managed servers to identify and manage the status of each server; a database that stores the collected hardware information and software information and provides the stored information to the management server; and an administrator terminal, used by a manager of the server integrated management system, that communicates with the management server, displays the status of the managed servers on a screen, and transmits commands input by the manager to the management server. To prevent similar failures by analyzing the failure patterns of the managed servers, the management server, when a predetermined event occurs in a managed server, transmits to that managed server an expected failure message describing the failure that may occur as a result of the event, together with a solution to the expected failure.
The management server may transmit the predicted failure occurrence message, together with a solution to the expected failure, to the manager in charge of the corresponding managed server registered in the database through a short message service (SMS) and e-mail.
The management server checks the backup battery unit (BBU) cycle of the managed server and informs the managed server of the contents of the backup battery unit cycle when the predetermined period has elapsed.
The management server checks the BBU charge capacity of the managed server and notifies the managed server of the content when the charge efficiency of the battery decreases to a predetermined value or less. For example, the management server checks the charge capacity of the BBU of the managed server and informs the managed server of the content when the charge efficiency of the battery is reduced to 40% or less.
The management server checks the remaining capacity of the BBU of the managed server, and when the remaining capacity of the battery is equal to or less than a predetermined value, it can notify the managed server of the content. For example, the management server checks the remaining capacity of the BBU of the managed server, and notifies the managed server of the remaining capacity of the battery when the remaining battery capacity is 10% or less.
The management server checks the BBU write policy of the managed server and notifies the managed server of the write policy when the write policy is changed.
The managed servers may include a Dell server. When the management server detects an abnormal operation in the operating system (OS) after a kernel update on the Dell server, it can transmit a predicted failure occurrence message for the failure that may result to the corresponding managed server, together with a solution to the expected failure.
The management server may diagnose the memory production cycle of the managed server, judge memory from a predetermined production cycle to be defective, and notify the managed server accordingly.
In a server integrated management method performed in a server integrated management system that integrates and manages two or more managed servers according to the present invention, the server integrated management system collects hardware information and software information from the two or more managed servers to identify and manage the status of each server, analyzes the failure patterns of the managed servers, and, when a predetermined event identified through that analysis occurs, transmits to the corresponding managed server a predicted failure occurrence message indicating that a failure may occur as a result of the event, together with a solution to the expected failure.
The server integrated management system may transmit the predicted failure occurrence message, together with a solution to the expected failure, to the manager in charge of the registered managed server through a short message service (SMS) and e-mail.
The server integrated management system can check the backup battery unit (BBU) cycle of the managed server and notify the management server of the content when the predetermined period has elapsed.
The server integrated management system checks the BBU charge capacity of the managed server and notifies the managed server of the content when the charge efficiency of the battery decreases to a predetermined value or less.
The server integrated management system can check the BBU charging capacity of the managed server and notify the managed server of the contents when the charging efficiency of the battery is reduced to 40% or less.
The server integrated management system checks the remaining capacity of the BBU of the managed server and notifies the managed server of the remaining capacity of the battery when the remaining capacity of the battery is less than a predetermined value.
The server integrated management system can check the remaining capacity of the BBU of the managed server and notify the managed server of the remaining capacity of the battery when the remaining battery capacity is 10% or less.
The server integrated management system can check the BBU write policy of the managed server and notify the managed server of the changed contents when the write policy is changed.
The managed servers may include a Dell server. When the server integrated management system detects an abnormal operation in the operating system (OS) after a kernel update on the Dell server, it can transmit a predicted failure occurrence message for the failure that may result to the corresponding managed server, together with a solution to the expected failure.
The server integrated management system may diagnose the memory production cycle of the managed server, judge memory from a predetermined production cycle to be defective, and notify the managed server accordingly.
According to the present invention, a failure occurring in the server can be prevented in advance by predicting and alerting a failure occurring in the server and providing a solution, thereby reducing the damage caused by the server failure.
In addition, according to the present invention, the failure pattern generated in the server is analyzed and updated to actively cope with various server failures.
In addition, according to the present invention, not only a server failure is notified in advance, but also a solution method thereof is presented, thereby providing convenience in that the server manager can manage the server more easily.
FIG. 1 is a diagram illustrating a network configuration of a server integrated management system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an internal configuration of a server integrated management system according to an embodiment of the present invention.
FIGS. 3 to 12 are screen examples of a server integrated management system according to an embodiment of the present invention.
FIGS. 13 to 16 are exemplary report screens of the server integrated management system according to an embodiment of the present invention.
FIG. 17 is an example of a screen when an event occurs in the server according to an embodiment of the present invention.
FIG. 18 is a flowchart illustrating a server integrated management method according to an embodiment of the present invention.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" specify the presence of a feature, number, step, operation, element, component, or combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.
In the following description of the present invention with reference to the accompanying drawings, the same components are denoted by the same reference numerals regardless of the figure numbers, and redundant explanations thereof are omitted.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, well-known functions or constructions are not described in detail, since they would obscure the invention in unnecessary detail.
The present invention relates to a server integration management system that integrates and manages two or more managed servers.
FIG. 1 is a diagram illustrating a network configuration of a server integrated management system according to an embodiment of the present invention.
Referring to FIG. 1, the server integrated management system of the present invention includes a
The server integrated management system integrates and manages a plurality of
The
In addition, the
The manager accesses the
The
The
The administrator terminal 130 is a terminal used by an administrator who manages the server integrated management system and communicates with the
In the present invention, the
The
The
The
The
In addition, the
FIG. 2 is a block diagram illustrating an internal configuration of a server integrated management system according to an embodiment of the present invention.
2, the
The
The administrator terminal 130 includes an
The information collecting unit 11 of the
The
The
The
The
The
The
The information storage unit 113 serves to store the analyzed information in the
The
The command transmission unit 115 transmits the received command to the
The
The present invention relates to a server integrated management system that integrates and manages a plurality of servers, diagnoses various functions of each server, predicts failures and issues alerts in advance, and presents a solution.
First, among various functions of the server, a backup battery unit (BBU) will be exemplified in the present invention.
Taking a Dell server as an example, it is necessary to check the battery status of the BBU and replace the battery preemptively to prevent loss of cache data due to battery controller failure. To do this, the full charging efficiency (%) of the battery is read from the log of the Dell server, equipment whose full charging efficiency is below 50% is identified, and its battery is replaced. Battery charging efficiency naturally falls to about 70% after 36 months, so a battery showing a further reduction of about 20% can be judged to have poor charging efficiency.
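The arithmetic behind the 50% figure can be sketched as follows. This is only an illustration of the rule as the paragraph states it; the constant and function names are hypothetical and do not come from the patent.

```python
# Figures quoted in the paragraph above; all names here are hypothetical.
NATURAL_EFFICIENCY_AFTER_36_MONTHS = 70.0  # percent, natural decline
ADDITIONAL_DROP_JUDGED_POOR = 20.0         # further drop, in percentage points

# 70% natural level minus a further 20 points gives the 50% replacement line.
REPLACEMENT_THRESHOLD = NATURAL_EFFICIENCY_AFTER_36_MONTHS - ADDITIONAL_DROP_JUDGED_POOR


def needs_replacement(full_charge_efficiency_pct: float) -> bool:
    """Flag a battery whose full charging efficiency is below the threshold."""
    return full_charge_efficiency_pct < REPLACEMENT_THRESHOLD


for efficiency in (72.0, 50.0, 48.5):
    print(efficiency, needs_replacement(efficiency))
```

Treating exactly 50% as acceptable follows the wording "less than 50%"; a stricter reading would only change the comparison operator.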
FIG. 3 to FIG. 6 show an example of a BBU management function according to an embodiment of the present invention.
Referring to FIGS. 3 to 6, the server integrated management system of the present invention performs a BBU cycle check, a charge capacity check, a remaining capacity check, and a write policy check, thereby preventing cache data loss and other risk factors in advance.
FIG. 3 shows an example of a BBU cycle check screen. While the battery is charging, the disk write policy changes from WriteBack to WriteThrough, which can cause reduced speed and data loss. When the cycle approaches 90 days, the related information is reported to the relevant server.
FIG. 4 shows an example of a screen for checking the BBU charge capacity. The symptom is that the charge efficiency of the battery drops and charging is required frequently. When the charge efficiency of the battery decreases to 40% or less, the related information is reported to the relevant server.
FIG. 5 shows an example of a screen for checking the remaining capacity of the BBU. The symptom is that the remaining battery level may fall to a dangerous level and the disk write policy may change. The processing method is to charge the battery, and when the remaining battery level is 10% or less, the related information is reported to the relevant server.
FIG. 6 shows an example of the BBU write policy check screen. The symptom is that the write policy of the RC card changes and the speed drops. The processing method is to check the RC card and the battery, and the server whose policy changed is identified through the notification function.
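The four BBU checks described for FIGS. 3 to 6 can be sketched as a single function. This is a minimal illustration, not the patented implementation: the `BbuStatus` fields and the 7-day margin used for "approaching 90 days" are assumptions, while the 90-day, 40%, and 10% figures come from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text; the 7-day warning margin is an assumption.
CYCLE_PERIOD_DAYS = 90
CYCLE_WARNING_MARGIN_DAYS = 7
MIN_CHARGE_EFFICIENCY_PCT = 40.0
MIN_REMAINING_CAPACITY_PCT = 10.0


@dataclass
class BbuStatus:
    """Hypothetical snapshot of one managed server's backup battery unit."""
    host: str
    days_since_last_cycle: int
    charge_efficiency_pct: float
    remaining_capacity_pct: float
    previous_write_policy: str
    current_write_policy: str


def check_bbu(status: BbuStatus) -> List[str]:
    """Return the notifications raised by the four BBU checks."""
    notices = []
    if status.days_since_last_cycle >= CYCLE_PERIOD_DAYS - CYCLE_WARNING_MARGIN_DAYS:
        notices.append("BBU cycle is approaching 90 days")
    if status.charge_efficiency_pct <= MIN_CHARGE_EFFICIENCY_PCT:
        notices.append("BBU charge efficiency is 40% or less")
    if status.remaining_capacity_pct <= MIN_REMAINING_CAPACITY_PCT:
        notices.append("BBU remaining capacity is 10% or less")
    if status.current_write_policy != status.previous_write_policy:
        notices.append(
            f"write policy changed: {status.previous_write_policy} -> {status.current_write_policy}"
        )
    return notices


example = BbuStatus("db01", 85, 38.0, 55.0, "WriteBack", "WriteThrough")
for notice in check_bbu(example):
    print(f"{example.host}: {notice}")
```

Returning a list rather than stopping at the first problem lets one server report several symptoms at once, which matches the way the screens present each check separately.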
FIGS. 7 to 13 are views showing functions of a server integrated management system according to an embodiment of the present invention.
FIG. 7 is an example of a screen for displaying various OS information such as Windows, Linux, and VMware.
Referring to FIG. 7, the physical system, OS, and software information of the managed server can be retrieved at a time.
FIG. 8 is an example of a screen for allowing a software status and a specific software version to be viewed for the entire managed server.
Referring to FIG. 8, in the server integrated management system of the present invention, a list of the software installed on each system across all managed servers can be queried at once, without accessing the individual systems of the managed servers.
FIG. 9 is an example of a screen for identifying a job history of a specific equipment through a condition search, and it is possible to quickly grasp job history information of a specific equipment because it supports condition search through accumulated data.
FIG. 10 illustrates an example of a prediction model in which a pattern of a similar obstacle can be analyzed and prevented and counteracted.
Referring to FIG. 10, similar failures can be prevented by designating a risk group for a specific failure pattern.
The server integrated management system of the present invention can identify the fault information by searching for a date condition such as a fault occurrence date, a work date and time, a completion date and the like.
FIG. 11 is an example of a screen in which faults are searched by date condition and listed by month.
FIG. 12 is an example of a screen for analyzing a failure pattern to prevent similar failures.
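A date-condition search over fault records, as described for FIG. 11, might look like the sketch below. The `FaultRecord` fields and sample data are hypothetical; the patent names the searchable dates (occurrence date, work date, completion date) but not a data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class FaultRecord:
    """Hypothetical fault record; field names are illustrative only."""
    host: str
    fault_name: str
    occurred_on: date
    completed_on: Optional[date] = None


def faults_between(records: List[FaultRecord], start: date, end: date) -> List[FaultRecord]:
    """Return faults whose occurrence date lies in [start, end]."""
    return [r for r in records if start <= r.occurred_on <= end]


def monthly_counts(records: List[FaultRecord]) -> Dict[Tuple[int, int], int]:
    """Count faults per (year, month) of occurrence."""
    return dict(Counter((r.occurred_on.year, r.occurred_on.month) for r in records))


records = [
    FaultRecord("db01", "RAID controller battery fail", date(2015, 3, 2), date(2015, 3, 3)),
    FaultRecord("web01", "Fan noise", date(2015, 3, 20)),
    FaultRecord("web02", "BSOD after patch", date(2015, 4, 1)),
]
print(monthly_counts(records))
print([r.host for r in faults_between(records, date(2015, 3, 1), date(2015, 3, 31))])
```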
12 shows that the search condition includes the
FIGS. 13 to 16 are exemplary report screens of the server integrated management system according to an embodiment of the present invention.
Referring to FIG. 13, there is shown an example of a risk group management screen of a preventive check report.
As shown in FIG. 13, the risk group management screen presents a graph and a chart in the upper part of the screen so that the contents can be grasped easily, and lists the risk group name, description, target equipment, normal count, abnormal count, and related items in a table so that the details of each risk group can be identified easily.
FIG. 14 shows an example of a job management screen in the preventive check report, displayed as a table with fault name, job classification, group, model, operator, and status items, along with a chart.
FIG. 15 shows an example of the inventory management screen in the preventive check report, displayed as a table with host name, change number, model, change date and time, and status items, along with a chart.
FIG. 16 shows an example of a system management screen in the preventive check report, displayed as a table with template, total number, model, person in charge, and registration date items, along with a chart.
According to the present invention, when an event occurs, the system diagnoses that a failure may occur in the server through the event, warns the system of the server in advance, and transmits information about the solution. In this regard, there are a variety of events occurring in the server, and new events may occur that have not occurred before. Hereinafter, some events among the events that may occur in the server will be exemplified in the present invention.
1. Fan noise on Dell R720 servers after the latest iDRAC7 version 1.51.51 is applied (readings over 12,000 RPM).
The recommended solution is to downgrade to iDRAC7 version 1.46.45.
2. Power usage rate in
Referring to FIG. 17, Dell servers as well as HP servers operate their power supplies in active-standby mode by default when drawing power from the rack PDU, so the ratio of primary PSUs needs to be adjusted to keep the load balanced.
3. Operating system error after Dell R620 server kernel update.
At this time, when the
4. Service becomes unavailable due to a shortage of TCP/IP ports.
This is a phenomenon where the network TIME_WAIT session is not closed when the Uptime is more than 497 days in
5. Event log generation on Windows 2003 and 2008.
6. Memory production cycle diagnosis.
(R730, R930, R630), and the failed OS is a server that contains the hotfix KB3064209 in the
In the present invention, the
7. Device setup stops responding when a PCIe-type SSD is used.
The workaround is to update the BIOS from 1.1.4 to 1.2.10.
8. Temperature sensor alerts continue to occur on 12G servers after an update, due to sensor failure.
The solution is to check for BIOS version 2.5.2 and update to the latest firmware.
9. BSOD on boot after a patch update.
This is caused by a Windows error in update KB2982791 from the August 2014 Patch Tuesday.
The failure target is the Windows 2008 server, and the failure can be solved through a patch update.
10. DNS connection error on Windows 2012 clients using Active Directory.
When logging in to the domain account on the server, the error "Username or password is incorrect" occurs even though the account and password are normal.
AES256-CTS-HMAC-SHA1-96, AES128-CTS-HMAC-SHA1-96, RC4 without using DES-CBC-MD5 and DES-CBC-CRC encryption from
11. Vulnerabilities in the GNU Bash 4.3 Shell.
Using the Bash vulnerability, attackers are known to be able to perform content and code changes on Web servers, Web site tampering, user data leakage, and DDoS attacks. In addition, attack scenarios of Bash code injection vulnerability under various circumstances such as SSH and DHCP protocol are being raised.
The failure target is Red
12. Buffer overflow vulnerability in the GNU C library (glibc).
The vulnerable code is reached through the gethostbyname() and gethostbyname2() functions, which are frequently used when connecting to a network. An external attacker can remotely execute arbitrary code on a vulnerable server.
The target of the failure is Red
13. Bugs in Red Hat V5 and V6 series operating systems.
Red
The failure target is Red
14. RAID controller battery failure.
I/O performance is degraded because the RAID controller cache becomes unavailable. The failure target is the RAID controller battery of the Dell PERC 5i and 6i, and the remedy is to replace that battery in advance every 4 to 5 years.
15. System down due to a CPU IERR error.
The failure target is servers using the Intel iBridge V2 CPU (PE R720, PE R920), and the troubleshooting method is to change the BIOS settings.
For example, in System Profile Settings, set System Profile to Custom, CPU Power Management to Maximum Performance, C1E to Disabled, C States to Disabled, and Monitor/Mwait to Disabled.
16. Use of iDRAC firmware (F/W) 1.50.50 (search for servers running this version).
Upgrade to iDRAC firmware 1.51.51.
1) Firmware upgrade from the OS
2) Upgrade through media
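The event list above pairs each known event with a recommended action, which suggests a simple lookup from inventory data to solutions. The sketch below illustrates that idea with two of the listed events. The inventory keys (`model`, `idrac7`, `ssd_type`, `bios`) and the matching conditions are assumptions for illustration; the patent does not specify how events are matched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Inventory = Dict[str, str]  # hypothetical flat view of collected HW/SW information


@dataclass(frozen=True)
class EventRule:
    event: str
    applies: Callable[[Inventory], bool]
    solution: str


# Events and solutions are taken from the list above; the predicates are assumed.
RULES: List[EventRule] = [
    EventRule(
        event="Fan noise on Dell R720 with iDRAC7 1.51.51",
        applies=lambda inv: inv.get("model") == "R720" and inv.get("idrac7") == "1.51.51",
        solution="Downgrade to iDRAC7 version 1.46.45.",
    ),
    EventRule(
        event="Device setup stops responding with a PCIe-type SSD",
        applies=lambda inv: inv.get("ssd_type") == "PCIe" and inv.get("bios") == "1.1.4",
        solution="Update the BIOS from 1.1.4 to 1.2.10.",
    ),
]


def expected_failures(inventory: Inventory) -> List[EventRule]:
    """Return every rule whose condition matches the given inventory."""
    return [rule for rule in RULES if rule.applies(inventory)]


server = {"model": "R720", "idrac7": "1.51.51", "bios": "2.0.0"}
for rule in expected_failures(server):
    print(f"{rule.event}: {rule.solution}")
```

Keeping rules as data means a newly observed event can be added without touching the matching logic, in line with the remark that new, previously unseen events may occur.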
FIG. 18 is a flowchart illustrating a server integrated management method according to an embodiment of the present invention.
Referring to FIG. 18, a server integrated management method in a server integrated management system that integrates and manages two or more managed servers is as follows.
First, hardware information and software information are collected from two or more managed servers, and the status of each server is acquired and managed (S210).
Then, the failure pattern of the managed server is analyzed (S220).
When a predetermined event identified through the failure pattern analysis occurs in a managed server (S230), a predicted failure occurrence message indicating that a failure may occur as a result of the event is transmitted to that managed server (S240).
At the same time, a solution to the expected failure is transmitted to the corresponding managed server (S250).
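Steps S210 to S250 can be sketched as one pass over already-collected inventories. This is a simplified illustration under stated assumptions: the `Pattern` tuple, the inventory dictionaries, and the returned "outbox" of messages are hypothetical stand-ins for the collection, analysis, and transmission components, which the patent does not describe at code level.

```python
from typing import Callable, Dict, List, Tuple

Inventory = Dict[str, str]
# (event name, detector over an inventory, solution text); all hypothetical.
Pattern = Tuple[str, Callable[[Inventory], bool], str]


def run_cycle(inventories: Dict[str, Inventory], patterns: List[Pattern]) -> List[Dict[str, str]]:
    """Run one pass of S210 to S250 and return the messages to send."""
    outbox = []
    for host, inventory in inventories.items():      # S210: status of each managed server
        for event, detect, solution in patterns:      # S220: patterns from failure analysis
            if detect(inventory):                      # S230: predetermined event occurred
                outbox.append({
                    "to": host,
                    "message": f"Expected failure: {event}",  # S240: failure message
                    "solution": solution,                     # S250: solution sent with it
                })
    return outbox


patterns = [
    ("BBU remaining capacity low",
     lambda inv: float(inv.get("bbu_remaining_pct", "100")) <= 10.0,
     "Charge or replace the battery."),
]
inventories = {
    "db01": {"bbu_remaining_pct": "8"},
    "web01": {"bbu_remaining_pct": "90"},
}
for item in run_cycle(inventories, patterns):
    print(item)
```

Building the message and the solution into one record reflects the text's statement that the solution is transmitted at the same time as the failure message.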
In the present invention, the server integrated management system may transmit the predicted failure occurrence message, together with a solution to the expected failure, to the manager in charge of the registered managed server through a short message service (SMS) and e-mail.
The server integrated management system checks the BBU (Backup Battery Unit) cycle of the managed server and notifies the managed server when a predetermined period has elapsed.
Further, the server integrated management system checks the BBU charge capacity of the managed server and notifies the managed server when the charge efficiency of the battery decreases to a predetermined value or less. For example, the server integrated management system checks the BBU charge capacity of the managed server and notifies the managed server when the charge efficiency of the battery decreases to 40% or less.
The server integrated management system checks the remaining capacity of the BBU of the managed server and notifies the managed server of the remaining capacity of the battery when the remaining capacity of the battery is less than a predetermined value. For example, the server integrated management system checks the remaining capacity of the BBU of the managed server, and notifies the managed server of the remaining capacity of the battery when the remaining battery capacity is 10% or less.
The server integrated management system checks the BBU write policy of the managed server and notifies the managed server of the write policy when the write policy is changed.
In one embodiment of the present invention, a Dell server is included among the managed servers. When the server integrated management system detects an abnormal operation on the operating system (OS) after a kernel update on the Dell server, it transmits a predicted failure occurrence message for the failure that may result to the corresponding managed server, and can also transmit a solution for the expected failure to that server.
The server integrated management system diagnoses the memory production cycle of the managed server, determines that memory from a predetermined production cycle is defective, and informs the managed server of this.
While the present invention has been described with reference to several preferred embodiments, these embodiments are illustrative and not restrictive. It will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit of the invention and the scope of the appended claims.
110
130
11
13
15
112 Information analysis unit
113 Information storage unit
114 Command receiving unit
115 Command transmitting unit
131 Command transmission unit
Claims (20)
A management server for collecting hardware information and software information from two or more managed servers to identify and manage the status of each server;
A database for storing the collected hardware information and software information, and providing the stored information to the management server; and
A manager terminal for communicating with the management server, displaying the status of the managed servers on a screen, and transmitting commands input by the manager to the management server,
Wherein, in order to prevent the occurrence of similar failures by analyzing the failure pattern of the managed server, the management server, when a predetermined event occurs in a managed server, transmits to that managed server a predicted failure occurrence message describing that a failure may occur according to the generated event, together with a solution to the expected failure,
Wherein if the predetermined event is a phenomenon in which network TIME_WAIT sessions remain and cannot be closed when the uptime is equal to or greater than a predetermined number of days on a specific OS version, the expected failure is a service interruption due to a lack of ports, and the solution to the expected failure is to remove the updated patch of the specific OS version,
Wherein if the predetermined event is a RAID controller battery failure, the expected failure is a degradation of I/O performance due to the inability to use the RAID controller cache, and the solution to the expected failure is to replace the RAID controller battery,
Wherein if the predetermined event is a phenomenon in which power is concentrated on one of a plurality of rack PDUs in a managed server whose power supplies are set to operate in active-standby mode by default, the expected failure is a collapse of the balance of the power usage rate, and the solution to the expected failure is to adjust the primary ratio of the PSU.
Wherein the management server transmits the predicted failure occurrence message, together with a solution to the expected failure, to the manager in charge of the corresponding managed server registered in the database via short message service (SMS) and e-mail.
Wherein the management server checks the backup battery unit (BBU) cycle of the managed server and informs the managed server of the content when the predetermined period has elapsed.
Wherein the management server checks the BBU charge capacity of the managed server and informs the managed server of the content when the charging efficiency of the battery decreases to a predetermined value or less.
Wherein the management server checks the BBU charge capacity of the managed server and informs the managed server of the content when the charge efficiency of the battery is reduced to 40% or less.
Wherein the management server checks the remaining capacity of the BBU of the managed server and notifies the managed server of the remaining capacity of the battery when the remaining capacity of the battery is less than a predetermined value.
Wherein the management server checks the remaining capacity of the BBU of the managed server and informs the managed server of the remaining capacity of the battery when the remaining capacity of the battery is 10% or less.
Wherein the management server checks the BBU write policy of the managed server and notifies the managed server when the write policy is changed.
Wherein, when the management server detects an abnormal operation on the operating system (OS) after a kernel update on the managed server, the management server transmits a predicted failure occurrence message for the failure that may result, together with a solution to the expected failure, to the corresponding managed server.
Wherein the management server diagnoses a memory production cycle of the managed server, determines that a predetermined memory production cycle is defective, and informs the managed server of the content.
The server integrated management system collecting hardware information and software information from two or more managed servers to identify and manage the status of each server;
Analyzing a failure pattern of the managed server; and
As a result of analyzing the failure pattern, when a predetermined event occurs, transmitting to the corresponding managed server a predicted failure occurrence message indicating that a failure may occur according to the generated event, together with a solution to the expected failure,
Wherein if the predetermined event is a phenomenon in which network TIME_WAIT sessions remain and cannot be closed when the uptime is equal to or greater than a predetermined number of days on a specific OS version, the expected failure is a service interruption due to a lack of ports, and the solution to the expected failure is to remove the updated patch of the specific OS version,
Wherein if the predetermined event is a RAID controller battery failure, the expected failure is a degradation of I/O performance due to the inability to use the RAID controller cache, and the solution to the expected failure is to replace the RAID controller battery,
Wherein if the predetermined event is a phenomenon in which power is concentrated on one of a plurality of rack PDUs in a managed server whose power supplies are set to operate in active-standby mode by default, the expected failure is a collapse of the balance of the power usage rate, and the solution to the expected failure is to adjust the primary ratio of the PSU.
Wherein the server integrated management system transmits the predicted failure occurrence message, together with a solution to the expected failure, to the manager in charge of the registered managed server via short message service (SMS) and e-mail.
Wherein the server integrated management system checks the backup battery unit (BBU) cycle of the managed server and informs the managed server of the contents when the predetermined period is reached.
Wherein the server integrated management system checks the BBU charge capacity of the managed server and informs the managed server of the content when the charging efficiency of the battery drops below a predetermined value.
Wherein the server integrated management system checks the BBU charging capacity of the managed server and informs the managed server of the content when the charging efficiency of the battery is reduced to 40% or less.
Wherein the server integrated management system checks the remaining capacity of the BBU of the managed server and informs the managed server of the remaining capacity of the battery when the remaining capacity of the battery is less than a predetermined value.
Wherein the server integrated management system checks the remaining capacity of the BBU of the managed server and informs the managed server of the remaining capacity of the battery when the remaining capacity of the battery is 10% or less.
Wherein the server integrated management system checks the BBU write policy of the managed server and, when the write policy is changed, notifies the managed server of the changed content.
Wherein, when the server integrated management system detects an abnormal operation on the operating system (OS) after a kernel update on the managed server, it transmits to the corresponding managed server a predicted failure occurrence message for the failure that may result, together with a solution to the expected failure.
Wherein the server integrated management system diagnoses a memory production cycle of the managed server, determines that a predetermined memory production cycle is defective, and informs the managed server of the content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150178246A KR101783201B1 (en) | 2015-12-14 | 2015-12-14 | System and method for managing servers totally |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170070568A KR20170070568A (en) | 2017-06-22 |
KR101783201B1 KR101783201B1 (en) | 2017-10-13 |
Family
ID=59282914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150178246A KR101783201B1 (en) | 2015-12-14 | 2015-12-14 | System and method for managing servers totally |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101783201B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102176028B1 (en) | 2020-08-24 | 2020-11-09 | (주)에오스와이텍 | System for Real-time integrated monitoring and method thereof |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102139058B1 (en) * | 2019-05-10 | 2020-07-29 | (주)비앤에스컴 | Cloud computing system for zero client device using cloud server having device for managing server and local server |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010526352A (en) | 2006-11-16 | 2010-07-29 | サムスン エスディーエス カンパニー リミテッド | Performance fault management system and method using statistical analysis |
US20150095718A1 (en) | 2013-09-30 | 2015-04-02 | Fujitsu Limited | Locational Prediction of Failures |
2015-12-14: KR application KR1020150178246A, patent KR101783201B1, status: active (IP Right Grant)
Non-Patent Citations (1)
Title |
---|
Watanabe and 4 others. 'Online failure prediction in cloud datacenters by real-time message pattern learning'. IEEE 4th International Conference on Cloud Computing Technology and Science, 2012, pp.504-511. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right |