CN103905238A

CN103905238A - Data center abnormal information collection system and method

Info

Publication number: CN103905238A
Application number: CN201210584066.0A
Authority: CN
Inventors: 林明珉
Original assignee: Hongfujin Precision Industry Shenzhen Co Ltd; Hon Hai Precision Industry Co Ltd
Current assignee: Hongfujin Precision Industry Shenzhen Co Ltd; Hon Hai Precision Industry Co Ltd
Priority date: 2012-12-28
Filing date: 2012-12-28
Publication date: 2014-07-02

Abstract

The invention provides a method for collecting abnormal information in a data center. The method is applied to a monitoring server that is connected to each server of the data center. The method comprises: creating a list in the monitoring server for storing the abnormal information sent by each server; receiving, at every preset time interval, the abnormal information sent by each server; storing the received abnormal information in the list; sending, at every preset time interval, the abnormal information in the list to a client; and, when an abnormal server returns to normal, receiving a clear instruction for that server and removing its abnormal information from the list. The invention further provides a data center abnormal information collection system. With the invention, maintenance personnel do not need to inspect the servers one by one, which is more convenient for them, improves maintenance efficiency, and improves the stability of the data center.

Description

Data center abnormal information collection system and method

Technical field

The present invention relates to an information collection system and method, and in particular to a system and method for collecting abnormal information in a data center.

Background art

A data center, also called a server farm, is a facility that houses computer systems and associated components such as telecommunications and storage systems. It generally contains anywhere from several servers to tens of thousands of servers. A data center typically includes redundant and backup power supplies, redundant data communication connections, environmental controls (for example, air conditioning and fire suppression), and security equipment. The most important equipment in a data center is the servers that store data.

Servers in a data center inevitably fail during operation; for example, a server's CPU may be damaged or its hard disk may fail. To keep the servers running normally, dedicated maintenance personnel must inspect the data center so that a failed server can be repaired promptly. However, because a data center contains a very large number of servers, maintenance personnel need a certain amount of time to find out which server has failed. This lowers maintenance efficiency, lengthens repair time, and reduces the stability of the data center.

Summary of the invention

In view of the above, it is necessary to provide a data center abnormal information collection system that makes it possible to learn promptly which server of the data center has failed, thereby improving maintenance efficiency, shortening repair time, and improving the stability of the data center.

It is also necessary to provide a data center abnormal information collection method that makes it possible to learn promptly which server of the data center has failed, thereby improving maintenance efficiency, shortening repair time, and improving the stability of the data center.

A data center abnormal information collection system runs on a monitoring server that is connected to each server of a data center. The system comprises: a creation module for creating a list in the monitoring server, the list being used to store the abnormal information sent by the servers; a receiver module for receiving, at every preset time interval, the abnormal information sent by each server; a storage module for storing the received abnormal information of each server in the list; and a sending module for sending, at every preset time interval, the abnormal information in the list to a client. The receiver module is further configured to receive, when a server in which an abnormal condition occurred returns to normal, a clear instruction for that server, so that the abnormal information of that server is removed from the list.

A data center abnormal information collection method is applied to a monitoring server that is connected to each server of a data center. The method comprises: creating a list in the monitoring server, the list being used to store the abnormal information sent by the servers; receiving, at every preset time interval, the abnormal information sent by each server; storing the received abnormal information of each server in the list; sending, at every preset time interval, the abnormal information in the list to a client; and, when a server in which an abnormal condition occurred returns to normal, receiving a clear instruction for that server, so that the abnormal information of that server is removed from the list.

Compared with the prior art, the data center abnormal information collection system and method provided by the invention make it possible to learn promptly which server of the data center has failed, which improves maintenance efficiency, shortens repair time, and improves the stability of the data center.

Brief description of the drawings

Fig. 1 is a diagram of the application environment of a preferred embodiment of the data center abnormal information collection system of the present invention.

Fig. 2 is a structural diagram of a preferred embodiment of the monitoring server of the present invention.

Fig. 3 is a flowchart of a preferred embodiment of the data center abnormal information collection method of the present invention.

Fig. 4 is a structural diagram of the data center of the present invention.

Description of main reference numerals

Client	10
Monitoring server	20
Database	30
Network	40
Data center	50
Server	500
Communication device	510
Data center abnormal information collection system	200
Creation module	210
Distribution module	220
Receiver module	230
Storage module	240
Sending module	250
Memory	260
Processor	270

The following detailed description further illustrates the present invention with reference to the above drawings.

Detailed description of the embodiments

Fig. 1 is a diagram of the application environment of a preferred embodiment of the data center abnormal information collection system 200 of the present invention. The data center abnormal information collection system 200 runs in a monitoring server 20. The monitoring server 20 communicates with a data center 50 over a network 40.

The network 40 may be the Internet, a local area network, or another communication network.

The data center 50 comprises multiple servers 500, which are blade servers. In this embodiment, the structure of the data center 50 is as shown in Fig. 4: the servers 500 are arranged together in a stacked manner. Each server 500 also includes a communication device 510 for connecting to the monitoring server 20 through the network 40. The communication device 510 may be, but is not limited to, a Bluetooth module, a Wi-Fi module, a Wideband Code Division Multiple Access (WCDMA) module, or a Long Term Evolution (LTE) module. The network connection between the communication device 510 and the monitoring server 20 may be wireless or wired. Because the data center 50 contains a large number of servers 500, wireless connections save space inside the data center 50. Therefore, to save data center space, the communication device 510 in this embodiment connects to the monitoring server 20 over a wireless network.

A Dynamic Host Configuration Protocol (DHCP) service is installed on the monitoring server 20. Through the DHCP service, Internet Protocol (IP) addresses can be allocated to the communication device 510 of each server 500 in the data center 50, so that the monitoring server 20 can communicate with each communication device 510 of the data center 50. The monitoring server 20 may be a personal computer, a network server, or any other suitable computer. In addition, the monitoring server 20 may be placed inside the data center 50, or one of the servers 500 in the data center 50 may serve as the monitoring server.

The monitoring server 20 is connected to a database 30 through a database connection, which may be an Open Database Connectivity (ODBC) connection or a Java Database Connectivity (JDBC) connection. The database 30 stores the abnormal information sent by each server 500 of the data center 50. The abnormal information includes the number of the server 500 in which the abnormal condition occurred and the number of the affected hardware component of that server 500 (for example, CPU, memory stick, hard disk, USB interface, power supply, fan, or optical drive). Each server 500 detects abnormal conditions by itself: when one occurs, the server 500 records which hardware component is abnormal, stores the number of that component in a log, and then sends it to the monitoring server 20 through the communication device 510.

It should be noted that the database 30 may be independent of the monitoring server 20 or located within it; for example, the database 30 may be stored on a hard disk or flash disk of the monitoring server 20. For reasons of system security, the database 30 in this embodiment is independent of the monitoring server 20.

The client 10 provides an interactive interface to maintenance personnel, so that they can operate it conveniently and the data generated during operation is stored in the monitoring server 20. The client 10 may be a personal computer, a notebook computer, a mobile phone, a tablet computer, or any other device that can connect to the monitoring server 20. In this preferred embodiment, the client 10 is a mobile phone, because it is convenient for maintenance personnel to carry.

Fig. 2 is a structural diagram of a preferred embodiment of the monitoring server 20 of the present invention. In addition to the data center abnormal information collection system 200, the monitoring server 20 comprises a memory 260 and a processor 270. The data center abnormal information collection system 200 comprises a creation module 210, a distribution module 220, a receiver module 230, a storage module 240, and a sending module 250. The program code of modules 210 to 250 is stored in the memory 260, and the processor 270 executes this code to implement the functions provided by the data center abnormal information collection system 200.

The creation module 210 creates a list in the monitoring server 20; the list is used to store the abnormal information sent by the servers 500. Specifically, an operating system is installed on each server 500, and while running, the operating system checks by itself whether each hardware component is operating normally. For example, if the CPU of a server 500 fails while executing the instructions of some program and cannot continue working, the operating system records the CPU failure and saves it to a log file. The server 500 periodically sends the abnormal information in the log file to the monitoring server 20 through the communication device 510. A log file generally contains a great deal of abnormal information. If every server 500 transmitted all the information in its log file to the monitoring server 20, the network 40 would become congested and the storage load on the monitoring server 20 would increase. To avoid congesting the network 40 and to reduce the storage load on the monitoring server 20, each server 500 extracts only part of the abnormal information in its log and sends that part to the monitoring server 20, such as the number of the server 500 and the number of the hardware component in which the abnormal condition occurred.
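The server-side filtering described above can be illustrated with a minimal Python sketch. The patent does not specify a log format, so the line layout, the `hw=` field, the `ERROR` level, and the function name `extract_abnormal_info` are all assumptions made for illustration only.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical log line layout, assumed for illustration only:
#   "<timestamp> <LEVEL> hw=<hardware number> <free text>"
LOG_LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) hw=(?P<hw>\w+)\b")


def extract_abnormal_info(server_number: Tuple[int, int],
                          log_lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Reduce a log to (row, column, hardware number) records for ERROR lines."""
    row, column = server_number
    seen = set()
    records = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("level") != "ERROR":
            continue  # skip unparseable lines and non-error entries
        hw = match.group("hw")
        if hw not in seen:  # report each faulty component only once
            seen.add(hw)
            records.append((row, column, hw))
    return records


log = [
    "2012-12-28T10:00:00 INFO hw=0101 temperature normal",
    "2012-12-28T10:05:00 ERROR hw=0101 machine check exception",
    "2012-12-28T10:06:00 ERROR hw=0101 machine check exception",
    "2012-12-28T10:07:00 ERROR hw=03 fan stalled",
]
print(extract_abnormal_info((20, 1), log))
# [(20, 1, '0101'), (20, 1, '03')]
```

Only the short records, not the full log text, would then travel over the network, which is the point of the extraction step.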

The distribution module 220 allocates IP addresses to the communication device 510 of each server 500 of the data center 50 through the DHCP service of the monitoring server 20, in order to establish a communication connection with each server 500.
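As a rough illustration of the bookkeeping behind address allocation, the following Python sketch hands out one address per server from a fixed subnet. It is not an implementation of the DHCP protocol itself (no discover/offer/request exchange and no lease expiry); the class name `AddressPool` and the subnet `10.0.0.0/24` are assumptions.

```python
import ipaddress
from typing import Dict, Tuple

ServerNumber = Tuple[int, int]  # (row, column) of a server in the data center


class AddressPool:
    """Assigns one address per server from a subnet (simplified, not real DHCP)."""

    def __init__(self, subnet: str) -> None:
        # hosts() yields the usable addresses of the subnet in ascending order
        self._free = iter(ipaddress.ip_network(subnet).hosts())
        self._leases: Dict[ServerNumber, object] = {}

    def allocate(self, server_number: ServerNumber):
        """Return the server's address, assigning a new one on first request."""
        if server_number not in self._leases:
            try:
                self._leases[server_number] = next(self._free)
            except StopIteration:
                raise RuntimeError("address pool exhausted") from None
        return self._leases[server_number]


pool = AddressPool("10.0.0.0/24")
print(pool.allocate((20, 1)))  # 10.0.0.1
print(pool.allocate((20, 2)))  # 10.0.0.2
print(pool.allocate((20, 1)))  # 10.0.0.1 (same server keeps its address)
```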

The receiver module 230 receives the abnormal information of each server 500 at every preset time interval (for example, one hour). The received abnormal information includes the number of the server 500 and the number of the hardware component of the server 500 in which the abnormal condition occurred. Because the servers 500 of the data center 50 are arranged in a stacked manner, the number of each server 500 comprises the row number and the column number of that server 500 in the data center 50, so the number of a server 500 reflects its physical position. As shown in Fig. 4, n denotes the column in which a server 500 is located and m denotes the row. Suppose a server 500 is numbered (20, 1); this means that the server 500 is located at row 20, column 1 of the data center 50. The hardware number of a server 500 may consist of digits, letters, or a combination of both. For example, "01" denotes the CPU, "02" the hard disk, "03" the fan, "04" the optical drive, "05" the memory stick, "06" the power supply, and "07" the USB interface. If a server 500 contains several identical hardware components, a distinguishing number is appended to the original number to identify the specific component. For example, if a server 500 contains two CPUs, then in the number "0102" the first two digits "01" denote the CPU and the last two digits "02" are the distinguishing number, denoting the second CPU of that server 500.
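The numbering scheme above can be decoded mechanically. The following Python sketch is one way to do it, assuming two-digit type codes with an optional numeric distinguishing suffix as in the examples; the names `HARDWARE_TYPES`, `AbnormalInfo`, and `decode` are illustrative and do not come from the patent.

```python
from typing import NamedTuple, Optional

# Type codes taken from the examples in the description
HARDWARE_TYPES = {
    "01": "CPU",
    "02": "hard disk",
    "03": "fan",
    "04": "optical drive",
    "05": "memory stick",
    "06": "power supply",
    "07": "USB interface",
}


class AbnormalInfo(NamedTuple):
    row: int
    column: int
    hardware: str
    unit: Optional[int]  # distinguishing number, None if absent


def decode(row: int, column: int, hardware_number: str) -> AbnormalInfo:
    """Split a hardware number into its type code and distinguishing number."""
    type_code, suffix = hardware_number[:2], hardware_number[2:]
    hardware = HARDWARE_TYPES[type_code]  # KeyError for unknown codes
    unit = int(suffix) if suffix else None
    return AbnormalInfo(row, column, hardware, unit)


print(decode(20, 1, "0102"))
# AbnormalInfo(row=20, column=1, hardware='CPU', unit=2)
print(decode(3, 7, "03"))
# AbnormalInfo(row=3, column=7, hardware='fan', unit=None)
```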

The storage module 240 stores the received abnormal information of each server 500 in the list. Specifically, the storage module 240 stores the abnormal information in the list in order of the row and column numbers of the servers 500.
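A minimal Python sketch of storing records in row and column order follows. It assumes each record is a `(row, column, hardware number)` tuple, so that ordinary tuple comparison yields the desired order; the function name `store` and the duplicate check are assumptions, since the patent does not say how repeated reports are handled.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, int, str]  # (row, column, hardware number)


def store(abnormal_list: List[Record], record: Record) -> None:
    """Insert a record so the list stays sorted by row, then column."""
    index = bisect.bisect_left(abnormal_list, record)
    # Assumed behaviour: a record that is already present is not stored twice
    if index == len(abnormal_list) or abnormal_list[index] != record:
        abnormal_list.insert(index, record)


abnormal_list: List[Record] = []
for rec in [(20, 1, "0101"), (3, 7, "02"), (20, 1, "0101"), (3, 2, "03")]:
    store(abnormal_list, rec)
print(abnormal_list)
# [(3, 2, '03'), (3, 7, '02'), (20, 1, '0101')]
```

Keeping the list ordered by position means that a maintenance round can follow the physical layout of the racks.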

The sending module 250 sends the abnormal information in the list to the client 10 at every preset time interval (for example, ten minutes). Specifically, every ten minutes the sending module 250 checks whether the list contains abnormal information and, if it does, sends that information to the client 10 so that maintenance personnel know which hardware component of which server 500 has a problem. For example, when maintenance personnel receive the information (20, 1, 0101), it indicates that the first CPU of the server 500 at row 20, column 1 of the data center 50 has failed.
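The periodic check-and-send behaviour can be sketched in Python as below. The transport to the client is not specified in the patent, so `send` is a caller-supplied callback; the names `report_once` and `report_periodically`, and the `rounds` and `sleep` parameters (added so the loop can be bounded and tested), are assumptions.

```python
import time
from typing import Callable, List, Optional, Tuple

Record = Tuple[int, int, str]  # (row, column, hardware number)


def report_once(abnormal_list: List[Record],
                send: Callable[[List[Record]], None]) -> bool:
    """Send the list to the client only if it contains abnormal information."""
    if not abnormal_list:
        return False
    send(list(abnormal_list))  # send a copy so later changes do not affect it
    return True


def report_periodically(abnormal_list: List[Record],
                        send: Callable[[List[Record]], None],
                        interval_seconds: float = 600.0,
                        rounds: Optional[int] = None,
                        sleep: Callable[[float], None] = time.sleep) -> None:
    """Check the list every interval; run forever when rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        report_once(abnormal_list, send)
        done += 1
        sleep(interval_seconds)


# Usage with a stubbed sleep so the example finishes immediately
sent: List[List[Record]] = []
report_periodically([(20, 1, "0101")], sent.append, rounds=2, sleep=lambda s: None)
print(sent)
# [[(20, 1, '0101')], [(20, 1, '0101')]]
```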

The receiver module 230 also receives, when a server 500 in which an abnormal condition occurred returns to normal, a clear instruction for that server 500, so that the abnormal information of that server 500 is removed from the list. That is, after maintenance personnel have repaired the server 500 and the server 500 has again operated normally for a certain period (for example, one hour), a clear instruction is sent to the monitoring server 20; on receiving the instruction, the monitoring server 20 removes the abnormal information of that server 500 from the list.
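Handling a clear instruction amounts to deleting every record of one server from the list. A minimal Python sketch, assuming records are `(row, column, hardware number)` tuples and that the clear instruction identifies the server by its row and column; the function name `clear_server` is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, int, str]  # (row, column, hardware number)


def clear_server(abnormal_list: List[Record], row: int, column: int) -> int:
    """Remove all records of the given server in place; return how many were removed."""
    before = len(abnormal_list)
    abnormal_list[:] = [r for r in abnormal_list if (r[0], r[1]) != (row, column)]
    return before - len(abnormal_list)


records = [(3, 7, "02"), (20, 1, "0101"), (20, 1, "03")]
print(clear_server(records, 20, 1))  # 2
print(records)                       # [(3, 7, '02')]
```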

Fig. 3 is a flowchart of a preferred embodiment of the data center abnormal information collection method of the present invention.

Step S10: the creation module 210 creates a list in the monitoring server 20; the list is used to store the abnormal information sent by the servers 500. Specifically, an operating system is installed on each server 500, and while running, the operating system checks by itself whether each hardware component is operating normally. For example, if the CPU of a server 500 fails while executing the instructions of some program and cannot continue working, the operating system records the CPU failure and saves it to a log file. The server 500 periodically sends the abnormal information in the log file to the monitoring server 20 through the communication device 510. A log file generally contains a great deal of abnormal information. If every server 500 transmitted all the information in its log file to the monitoring server 20, the network 40 would become congested and the storage load on the monitoring server 20 would increase. To avoid congesting the network 40 and to reduce the storage load on the monitoring server 20, each server 500 extracts only part of the abnormal information in its log and sends that part to the monitoring server 20, such as the number of the server 500 and the number of the hardware component in which the abnormal condition occurred.

Step S20: the distribution module 220 allocates IP addresses to the communication device 510 of each server 500 of the data center 50 through the DHCP service of the monitoring server 20, in order to establish a communication connection with each server 500.

Step S30: at every preset time interval (for example, one hour), the receiver module 230 obtains the abnormal information sent by each server 500. The received abnormal information includes the number of the server 500 and the number of the hardware component of the server 500 in which the abnormal condition occurred. Because the servers 500 of the data center 50 are arranged in a stacked manner, the number of each server 500 comprises the row number and the column number of that server 500 in the data center 50, so the number of a server 500 reflects its physical position. As shown in Fig. 4, n denotes the column in which a server 500 is located and m denotes the row. Suppose a server 500 is numbered (20, 1); this means that the server 500 is located at row 20, column 1 of the data center 50. The hardware number of a server 500 may consist of digits, letters, or a combination of both. For example, "01" denotes the CPU, "02" the hard disk, "03" the fan, "04" the optical drive, "05" the memory stick, "06" the power supply, and "07" the USB interface. If a server 500 contains several identical hardware components, a distinguishing number is appended to the original number to identify the specific component. For example, if a server 500 contains two CPUs, then in the number "0102" the first two digits "01" denote the CPU and the last two digits "02" are the distinguishing number, denoting the second CPU of that server 500.

Step S40: the storage module 240 stores the received abnormal information of each server 500 in the list. Specifically, the storage module 240 stores the abnormal information in the list in order of the row and column numbers of the servers 500.

Step S50: at every preset time interval (for example, ten minutes), the sending module 250 sends the abnormal information in the list to the client 10. Specifically, every ten minutes the sending module 250 checks whether the list contains abnormal information and, if it does, sends that information to the client 10 so that maintenance personnel know which hardware component of which server 500 has a problem. For example, when maintenance personnel receive the information (20, 1, 0101), it indicates that the first CPU of the server 500 at row 20, column 1 of the data center 50 has failed.

Step S60: when a server 500 in which an abnormal condition occurred returns to normal, the receiver module 230 receives a clear instruction for that server 500, so that the abnormal information of that server 500 is removed from the list. That is, after maintenance personnel have repaired the server 500 and the server 500 has again operated normally for a certain period (for example, one hour), a clear instruction is sent to the monitoring server 20; on receiving the instruction, the monitoring server 20 removes the abnormal information of that server 500 from the list.
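Steps S10 and S30 to S60 can be combined into one small in-memory Python sketch to show how they interact. This is an illustration under stated assumptions rather than the patented implementation: the class name `MonitoringServer`, its method names, the dictionary-based storage, and the `send_to_client` callback are all hypothetical, and step S20 (address allocation), networking, and timers are left out.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

ServerNumber = Tuple[int, int]  # (row, column)
Record = Tuple[int, int, str]   # (row, column, hardware number)


class MonitoringServer:
    """In-memory sketch of the flow; step S20 and all networking are omitted."""

    def __init__(self, send_to_client: Callable[[List[Record]], None]) -> None:
        # S10: create the structure that holds the abnormal information
        self.abnormal: Dict[ServerNumber, Set[str]] = {}
        self._send = send_to_client

    def receive(self, server: ServerNumber, hardware_numbers: Iterable[str]) -> None:
        # S30 + S40: accept one server's report and store it
        numbers = set(hardware_numbers)
        if numbers:
            self.abnormal.setdefault(server, set()).update(numbers)

    def report(self) -> List[Record]:
        # S50: send everything, ordered by row and column, if anything is stored
        records = [(row, column, hw)
                   for (row, column), hws in sorted(self.abnormal.items())
                   for hw in sorted(hws)]
        if records:
            self._send(records)
        return records

    def clear(self, server: ServerNumber) -> None:
        # S60: a clear instruction removes that server's abnormal information
        self.abnormal.pop(server, None)


inbox: List[List[Record]] = []
monitor = MonitoringServer(inbox.append)
monitor.receive((20, 1), ["0101"])
monitor.receive((3, 7), ["02", "03"])
print(monitor.report())  # [(3, 7, '02'), (3, 7, '03'), (20, 1, '0101')]
monitor.clear((20, 1))
print(monitor.report())  # [(3, 7, '02'), (3, 7, '03')]
```

In a real deployment, `receive` and `report` would be driven by the two preset intervals (one hour and ten minutes in the examples), and `clear` by the instruction issued after a repaired server has run normally again.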

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A data center abnormal information collection system, the system running on a monitoring server that is connected to each server of a data center, characterized in that the system comprises:

a creation module for creating a list in the monitoring server, the list being used to store the abnormal information sent by the servers;

a receiver module for receiving, at every preset time interval, the abnormal information sent by each server;

a storage module for storing the received abnormal information of each server in the list; and

a sending module for sending, at every preset time interval, the abnormal information in the list to a client;

wherein the receiver module is further configured to receive, when a server in which an abnormal condition occurred returns to normal, a clear instruction for that server, so as to remove the abnormal information of that server from the list.

2. The data center abnormal information collection system of claim 1, characterized in that the abnormal information comprises the number of the server and the number of the hardware component of the server in which the abnormal condition occurred.

3. The data center abnormal information collection system of claim 2, characterized in that the number of the server comprises the row number and the column number of the server in the data center.

4. The data center abnormal information collection system of claim 1, characterized in that the servers of the data center are arranged together in a stacked manner.

5. A data center abnormal information collection method, the method being applied to a monitoring server that is connected to each server of a data center, characterized in that the method comprises:

creating a list in the monitoring server, the list being used to store the abnormal information sent by the servers;

receiving, at every preset time interval, the abnormal information sent by each server;

storing the received abnormal information of each server in the list;

sending, at every preset time interval, the abnormal information in the list to a client; and

when a server in which an abnormal condition occurred returns to normal, receiving a clear instruction for that server, so as to remove the abnormal information of that server from the list.

6. The data center abnormal information collection method of claim 5, characterized in that the abnormal information comprises the number of the server and the number of the hardware component of the server in which the abnormal condition occurred.

7. The data center abnormal information collection method of claim 6, characterized in that the number of the server comprises the row number and the column number of the server in the data center.

8. The data center abnormal information collection method of claim 5, characterized in that the servers of the data center are arranged together in a stacked manner.