WO2019095448A1 - Monitoring system for a server group of a distance education system - Google Patents

Monitoring system for a server group of a distance education system

Info

Publication number
WO2019095448A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
monitoring
processing
unit
processing server
Prior art date
Application number
PCT/CN2017/114405
Other languages
English (en)
French (fr)
Inventor
陈鹏宇
刘善果
滕凯
Original Assignee
深圳市鹰硕技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市鹰硕技术有限公司
Publication of WO2019095448A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Definitions

  • The invention relates to the field of Internet education technology, in particular to distance education systems based on Internet technology, and more specifically to a monitoring system for a server group of a distance education system.
  • Distance education is a form of education in which students and teachers, or students and educational institutions, conduct systematic teaching and communication mainly through multiple media, delivering education to one or more students outside the campus. Modern distance education delivers courses off campus through audio, video (live or recorded), and Internet technologies, in both real-time and non-real-time forms. Distance education is currently booming. It is an important means of addressing problems such as educational resources and educational opportunities: it enriches teaching methods and reduces restrictions on where teaching takes place, so it is popular with many students and teachers.
  • Current distance education systems are typically built on a single-machine centralized architecture.
  • A single-machine centralized distance education system is usually provided with a teaching terminal, one or more learning terminals, and a server that communicates over a network with the teaching terminal and the one or more learning terminals. The server may store teaching resources, process requests from the teaching terminal or the learning terminals, and distribute and receive teaching resources according to those requests.
  • An Internet-based distance education service system is disclosed in Chinese Patent Application Publication No. CN107195212; it relies on centralized processing by a single server. Specifically, it includes an administrator client, a user client, and a cloud server. The administrator client includes an administrator login module, a teaching video publishing module, and a teaching video classification module. The administrator login module lets the administrator register and log in to an administrator member account. The teaching video publishing module lets the administrator publish teaching videos and sends the published videos to the teaching video classification module, which classifies them and sends the classified videos to the cloud server. The user client includes a user login module and a video display module. The user login module lets the user register and log in to a member account, and the video display module receives the classified teaching videos sent by the cloud server and displays them for the user to click and view.
  • A single-machine centralized distance education system and an implementation method thereof are disclosed in Chinese Patent Publication No. CN104464412A. The system includes an education terminal, a client, and a server; the education terminal and the client exchange information, and the server and the client exchange information through the Internet.
  • The common solution is to upgrade the single centralized processing server in the distance education system, speeding up processing by improving that server's performance. This approach can improve the processing efficiency of the system. However, if the processing-time requirement is very demanding or the amount of data to be processed is very large, raising the performance of the processing server alone can no longer meet the processing requirements. In other words, this solution does not address, at the technical level, the imbalance between the limited processing power of the processing server and the massive volume of processing task requests.
  • Another promising solution, which can fundamentally solve the above technical problem, is to replace the existing single server with a server group that can compute in parallel. The number of servers used for task processing is first increased, and the batch task requests to be processed are then distributed to multiple servers for parallel computing according to certain rules, either by manual setting or by automatic system configuration. When the performance of a single processing server cannot meet the processing requirements, multiple task processing servers share its pending task requests, so that business needs in terms of processing power and processing time can be met.
  • A distance education system with single-server centralized processing is disclosed in Chinese Patent Publication No. CN105184498A. It includes a log management module that records all user operations and through which management personnel can query all log information. The log information includes user operation records, server monitoring logs, and server alarm logs, so the working state of the server in the education system is monitored by reading the monitoring log of the processing server.
  • the method for obtaining the monitoring information of the server node of the whole cabinet is disclosed in the Chinese Patent Publication No.
  • Compared with the prior art, this method for obtaining the monitoring information of the server nodes of the entire cabinet is simplified, as is the interaction between the node BMC and the node middle board: the node middle board can obtain a large amount of data from the node BMC at a time, especially real-time change information, which helps improve system response time and is highly practical.
  • A BMC-based storage server chassis management system and method is also known. Its chassis monitoring module determines from the state information of the monitored device whether that state is abnormal and reports abnormal information to the storage module. The storage module then determines the command, and the command management module issues the corresponding action command to manage the monitored device, ensuring normal operation of the chassis and improving the stability of the storage server. Reading the BMC's running status information through the IPMI protocol is easier to manage than chassis management based on the SES protocol; the process is simple and reduces management and development costs. The processes of the chassis management module and the storage module are also linked through IPC, so that once the storage module detects a serious abnormality, the input and output streams of the data are interrupted, which protects the data and improves the reliability of the storage server.
  • A terminal state reminding method and related device and system are disclosed in Chinese Patent Publication No. CN104464412A filed by Facebook Group Holding Co., Ltd. In specific cases, reminder information is sent to the management side so that the state of the terminal is known. The state of the first terminal can thus be learned without professional knowledge, which is easier than the traditional approach of inspecting the SEL and can reduce the monitor's workload to some extent.
  • The single-server monitoring methods described above are not applicable to a server group with multiple processing servers, because when many servers jointly provide network services externally, the working status of every server must be monitored. Take a server group with 100 servers as an example. If the single-server method were retained, a monitoring device would have to be configured for each processing server to detect abnormal states through logs. This is unacceptable both in the hardware cost of the monitoring equipment and in the workload imposed on monitoring personnel. It is also difficult for a monitor to find the abnormal server from the contents of log files in a short time, so the abnormal condition of a faulty server cannot be identified and eliminated promptly. In other words, this deficiency in server group monitoring is what technically restricts the large-scale use of server groups for data processing in distance education systems.
  • The present invention provides a monitoring system for a server group of a distance education system, the server group comprising a plurality of processing servers, each of which includes a baseboard management controller (BMC), wherein:
  • The monitoring system includes a monitoring terminal, and the monitoring terminal includes a server status real-time monitoring unit that can establish network connections with the processing servers in the server group. The server status real-time monitoring unit periodically connects to the processing servers over the network and, when it discovers an abnormality in a processing server's network service, requires the baseboard management controller of the processing server to report the current status information of that processing server.
  • The present invention tests whether the server group is providing network services by having the monitoring terminal actively and periodically connect to the server group. When it discovers that a processing server in the group cannot provide network services, the monitoring terminal actively requests the baseboard management controller of each processing server in the group to provide the current status information of its own processing server. The manager of the server group can thus understand the abnormal state of the group promptly and clearly and take corresponding measures immediately, which improves the management efficiency of the server group.
  • The plurality of processing servers in the server group share the same network address for providing the network service, while their baseboard management controllers have network addresses that differ from one another. The server status real-time monitoring unit connects to the processing servers through their unified network address to test the network service, and connects to each baseboard management controller through that controller's own network address to request that it report current status information.
  • The server group provides a stable network service through a single network address, accepts access requests via a unified port, and then distributes the load within the group. Because the baseboard management controllers have different network addresses, the monitoring terminal can receive the current status information of each processing server separately.
  • The server status real-time monitoring unit comprises: at least one network input port for network connection with a baseboard management controller, configured to receive the information returned by the baseboard management controller of at least one processing server; and a network output/input port for network connection with the server group.
  • The number of network input ports is the same as the number of processing servers in the server group, and each network input port communicates with the baseboard management controller of one processing server. The monitoring terminal according to the present invention can therefore receive the information returned by the baseboard management controllers of the respective processing servers in parallel.
  • The monitoring system further comprises at least one monitoring proxy server configured to connect to a plurality of processing servers, to accept the status information of the processing servers connected to it, and to transmit the received status information to the monitoring terminal, wherein the number of network input ports is the same as the number of monitoring proxy servers and each network input port is communicatively connected to one monitoring proxy server.
  • The number of network input ports in the monitoring terminal then only needs to match the number of monitoring proxy servers, which greatly reduces the hardware requirements and cost of the monitoring terminal. Because fewer servers are directly connected to the monitoring terminal, the terminal is no longer at risk of processing reports late or overloading its processor when many processing servers transmit report information at the same time, so that even a monitoring terminal with reduced processing speed can monitor the server group normally.
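The port saving described above can be illustrated with a small calculation. This is only a sketch: the function name `input_ports_needed` and the figure of 8 processing servers per proxy are assumptions made for illustration, while the total of 40 processing servers is the example given later in the description.

```python
import math
from typing import Optional


def input_ports_needed(num_servers: int, servers_per_proxy: Optional[int] = None) -> int:
    """Number of network input ports the monitoring terminal needs.

    Without proxies there is one port per processing server; with proxies
    there is one port per monitoring proxy server.
    """
    if servers_per_proxy is None:
        return num_servers
    # Hypothetical even split: each proxy serves up to `servers_per_proxy` servers.
    return math.ceil(num_servers / servers_per_proxy)


direct_ports = input_ports_needed(40)      # one port per BMC
proxied_ports = input_ports_needed(40, 8)  # assumed 8 servers per proxy
```

Under these assumed figures the terminal would need 5 input ports instead of 40.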
  • The monitoring proxy server includes an information input unit, a data buffer pool, an information output unit, and a control unit communicably connected to each of the other three. The information input unit accepts status information from the processing servers connected to it and passes it to the data buffer pool. The control unit is configured to prioritize the status information of the plurality of processing servers in the data buffer pool and to transmit the sorted status information to the monitoring terminal in order. Because the status information of the processing server most likely to be faulty is transmitted first, the monitoring terminal can identify and respond to a failed processing server faster than when multiple processing servers report simultaneously.
  • The data buffer pool comprises a queue and a local database. The queue is configured to sort the status information of the plurality of processing servers, and the local database is configured to store the status information temporarily before it is transmitted to the monitoring terminal. The control unit prioritizes the status information in the queue according to the mean time between failures of the processing servers and the reliability index of each server.
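A minimal sketch of such a buffer pool is shown below, using a heap as the queue and a dictionary as the local database. The description does not say how the mean time between failures and the reliability index are combined, so the scoring function `failure_priority` (their product, lower meaning "forward first") is purely an assumption, as are the class and field names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass(order=True)
class StatusReport:
    priority: float                      # only this field is used for ordering
    server_id: str = field(compare=False)
    status: dict = field(compare=False)


def failure_priority(mtbf_hours: float, reliability: float) -> float:
    """Hypothetical score: a short MTBF and low reliability give a small value,
    and small values are forwarded to the monitoring terminal first."""
    return mtbf_hours * reliability


class BufferPool:
    def __init__(self) -> None:
        self._queue: List[StatusReport] = []   # the sorting queue
        self._local_db: Dict[str, dict] = {}   # temporary store before transmission

    def accept(self, server_id: str, status: dict, mtbf_hours: float, reliability: float) -> None:
        """Called by the information input unit for each incoming report."""
        self._local_db[server_id] = status
        score = failure_priority(mtbf_hours, reliability)
        heapq.heappush(self._queue, StatusReport(score, server_id, status))

    def drain(self) -> Iterator[Tuple[str, dict]]:
        """Called by the information output unit; yields reports in priority order."""
        while self._queue:
            report = heapq.heappop(self._queue)
            self._local_db.pop(report.server_id, None)
            yield report.server_id, report.status


pool = BufferPool()
pool.accept("11", {"temp_c": 48}, mtbf_hours=1000, reliability=0.99)
pool.accept("12", {"temp_c": 92}, mtbf_hours=100, reliability=0.50)
pool.accept("13", {"temp_c": 61}, mtbf_hours=500, reliability=0.90)
order = [server_id for server_id, _ in pool.drain()]
```

With the assumed scores (990, 50, and 450), the report from server "12" is forwarded first, followed by "13" and then "11".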
  • The monitoring system further comprises a server monitoring group that is composed of a plurality of processing servers in the server group and has a common output port, the common output port being configured to output the status information of a faulty server. The number of network input ports is the same as the number of server monitoring groups, and each network input port is connected to the common output port of one server monitoring group.
  • The processing servers in the server group are divided into a plurality of monitoring groups, each with a common output port, which reduces the number of network input ports and the processing requirements of the monitoring terminal.
  • Each processing server in the same server monitoring group has a wireless communication unit communicably connected to its baseboard management controller, and each baseboard management controller has a setting unit, a monitoring part, and a status information output unit. The setting unit is used to set the name of the server monitoring group to which the processing server belongs and the processing server's two communication neighbors in the chain (its upstream and downstream communication partners). The wireless communication unit includes a transmitting unit and a receiving unit; the transmitting unit and receiving unit of each processing server are configured to communicate with these two neighbors so as to form a complete closed-loop communication chain within the monitoring group.
  • This takes full advantage of the fact that the processing servers in the server group are arranged compactly in space: a closed data communication link is formed between the processing servers using short-range wireless communication, so that multiple processing servers are combined into one server monitoring group with a common output port.
  • The monitoring part is configured to determine whether the status information of the processing server exceeds a status threshold and, when it does, to instruct the wireless communication unit to stop transmitting and receiving information, which breaks the closed-loop communication chain. The common output port then issues an instruction to the processing server at the break point of the chain, so that the status information output unit of that server sends the status information held by its baseboard management controller to the common output port, which transmits it onward.
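The break-detection idea can be sketched as a small in-memory simulation. This is not the wireless protocol itself, which the description does not detail: the class `GroupMember`, the single temperature reading, and the 80 °C threshold are all hypothetical stand-ins for "status information" and "status threshold".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupMember:
    name: str
    temperature_c: float        # one example status value
    threshold_c: float = 80.0   # hypothetical status threshold

    def relays(self) -> bool:
        """A member keeps relaying on the chain only while its status is within the threshold."""
        return self.temperature_c <= self.threshold_c


def find_break(ring: List[GroupMember]) -> Optional[GroupMember]:
    """Walk the chain in order and return the first member that stopped relaying."""
    for member in ring:
        if not member.relays():
            return member
    return None


def collect_fault_report(ring: List[GroupMember]) -> Optional[dict]:
    """What the common output port would forward: the status of the member at the break point."""
    broken = find_break(ring)
    if broken is None:
        return None  # chain is closed, nothing to report
    return {"server": broken.name, "temperature_c": broken.temperature_c}


ring = [GroupMember("11", 55.0), GroupMember("12", 91.0), GroupMember("13", 60.0)]
report = collect_fault_report(ring)
```

Here the chain breaks at server "12", so only that server's status is forwarded through the common output port.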
  • The server group further includes an allocation unit that exchanges Transmission Control Protocol (TCP) data with the plurality of processing servers. The allocation unit is configured to establish an allocation matrix that dynamically tracks the number of idle processing processes on each processing server. The allocation unit periodically checks whether there is a task request to be processed and, if so, checks the number of idle processing processes of each processing server in the allocation matrix. If there is an idle processing server, the pending task is allocated to it.
  • The allocation unit automatically assigns batch tasks to the respective processing servers to meet processing capacity and processing time requirements. The processing servers in the server group can therefore be fully utilized, and batch tasks can be processed promptly even when there is a large number of access requests.
  • FIG. 1 is a general architectural diagram of a distance education system in accordance with the present invention.
  • FIG. 2 is a block diagram of a first embodiment of a monitoring system in accordance with the present invention.
  • FIG. 3 is a flow chart showing the operation of the monitoring system of Figure 2;
  • FIG. 4 is a block diagram of a second embodiment of a monitoring system in accordance with the present invention.
  • FIG. 5 is a block diagram of a server of a third embodiment of a monitoring system in accordance with the present invention.
  • FIG. 6 is a block diagram of a third embodiment of a monitoring system in accordance with the present invention.
  • Reference numerals: 1 processing server group; 101 allocation unit; 11(a) processing server; 12(a) processing server; 13(a) processing server; 14(a) processing server; 15 processing server; 2 client; 3 network port; 4 baseboard management controller; 41 EWS interface; 42 monitoring part; 43 setting unit; 44 status information output unit; 5 monitoring terminal; 51 server status real-time monitoring unit; 5a network input/output port; 5b network input port; 61 proxy server; 62 proxy server; 61a information input unit; 61b data buffer pool; 61c information output unit; 61d control unit; 7
  • The distance education system includes a processing server group 1, which includes a plurality of processing servers 11-14, 11a-14a, etc. The number of processing servers is set according to the volume of task requests to be processed. As an example, the present invention has 40 processing servers, only some of which are shown in FIG. 1 for simplicity.
  • To enable the plurality of processing servers in server group 1 to jointly provide network services (for example, but not limited to, providing a teaching website for users at the client 2), the network ports of the processing servers in server group 1 have the same network address (IP). The processing servers 11-14 etc. can thus jointly provide network services to the client 2 and share the load internally. Any client 2 can connect to the network port of server group 1 through this single network address and receive computing services from the processing servers.
  • The allocation unit forwards each access request to a processing server in server group 1 that has an idle processing process, according to a certain allocation rule, for example by means of a hash (HASH) algorithm.
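One way such a hash-based rule could look is sketched below. The description does not specify the hash function or its input, so the use of SHA-256 over a request identifier, the function name `pick_server`, and the shape of the `idle` mapping are all assumptions for illustration.

```python
import hashlib
from typing import Dict, Optional


def pick_server(request_id: str, idle: Dict[str, int]) -> Optional[str]:
    """Map a request onto one of the processing servers that has an idle process.

    `idle` maps a server name to its number of idle processing processes.
    Returns None when every server is busy.
    """
    candidates = sorted(name for name, count in idle.items() if count > 0)
    if not candidates:
        return None
    # SHA-256 is an assumed choice; it gives the same result on every run,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


idle_processes = {"11": 2, "12": 0, "13": 1}
chosen = pick_server("request-0001", idle_processes)
```

The busy server "12" is never selected, and the same request identifier always maps to the same server as long as the set of idle servers is unchanged.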
  • The client 2 may be a teaching terminal or a learning terminal operated by users of the distance education system, such as teachers and/or students, over a wired or wireless network.
  • The server group 1 can thereby carry out the teaching and training functions of distance education. Since the processing servers 11-14 in server group 1 have the same network address, batch processing requests from the client 2 reach server group 1 via that single address in the direction of arrow A and are then distributed to the processing servers 11-14, which share the processing so that batch task requests are handled efficiently. The processing servers in server group 1 process their assigned task requests and return the results of the requested operations to the client 2 in the direction of arrow B, realizing the interaction between server group 1 and the client 2 and completing the teaching work of the distance education system.
  • An allocation unit 101 may be provided in server group 1. Each of the processing servers 11-14 etc. may send data to the allocation unit 101 via the Transmission Control Protocol (TCP) to inform it of the total number of tasks the server can currently take on, so that the allocation unit can allocate access task requests.
  • An allocation matrix may be established in the allocation unit to record the current task capacity of each processing server, and the task allocation of each processing server is managed through this matrix. Specifically, the matrix mainly records the number of idle processing processes on each processing server 11-14 for reference when assigning tasks.
  • After receiving batch processing task requests, the allocation unit sorts them according to the first in, first out (FIFO) rule to form a task request queue that waits for allocation to the processing servers.
  • The allocation unit periodically checks the task request queue for task requests waiting to be executed. If there is one, the allocation unit checks the allocation matrix for the current number of idle processing processes on each processing server. If a processing server's number of idle processing processes is greater than 0, that server can accept task requests, and the allocation unit can allocate the request, for example via the HASH algorithm. If every processing server is currently busy, the allocation unit waits for a processing server to become idle.
  • The allocation unit takes the pending tasks from the task queue in order, allocates them to idle processing servers by sending TCP data, and adjusts the number of idle processing processes of each processing server in the allocation matrix accordingly.
  • When the allocation unit finds that multiple processing servers have idle processing processes, the present invention distributes the task requests according to the number of idle processing processes on each processing server in order to make full use of the processing servers' resources. For example, the allocation unit can assign three processing task requests to processing server 11, two to processing server 12, and finally one to processing server 13.
  • This allocation scheme gives more tasks to processing servers with many idle processing processes and fewer or no tasks to those with few, which helps make full use of all processing servers, balances the handling of external tasks, and gets tasks processed promptly.
  • The allocation unit may also decide the allocation from the number of task requests to be allocated together with the number of idle processing processes on each processing server. For example, each processing server may be assigned the same number of task requests. Alternatively, tasks may first be assigned to one processing server until its number of idle processing processes is 0, with the remaining tasks then assigned to other idle processing servers.
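The FIFO queue, the allocation matrix, and the "more idle processes, more tasks" rule can be sketched together as follows. This is an in-memory illustration only: the class `AllocationUnit`, its method names, and the idle counts of 3, 2, and 1 are assumptions (the description gives the resulting split but not the idle counts), and the TCP exchange with the processing servers is omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class AllocationUnit:
    def __init__(self, idle_processes: Dict[str, int]) -> None:
        # Allocation matrix: processing server -> number of idle processing processes.
        self.idle: Dict[str, int] = dict(idle_processes)
        self.queue: Deque[str] = deque()  # FIFO task request queue

    def submit(self, task: str) -> None:
        self.queue.append(task)

    def report_idle(self, server: str, count: int) -> None:
        """A processing server reports its current idle count (sent over TCP in the description)."""
        self.idle[server] = count

    def dispatch(self) -> List[Tuple[str, str]]:
        """Assign queued tasks, oldest first, to the server with the most idle processes."""
        assignments: List[Tuple[str, str]] = []
        while self.queue:
            server = max(self.idle, key=self.idle.get)
            if self.idle[server] <= 0:
                break  # every server is busy; remaining tasks stay queued
            task = self.queue.popleft()
            self.idle[server] -= 1
            assignments.append((task, server))
        return assignments


unit = AllocationUnit({"11": 3, "12": 2, "13": 1})  # assumed idle counts
for n in range(1, 8):
    unit.submit(f"task-{n}")
assignments = unit.dispatch()
per_server = {s: sum(1 for _, srv in assignments if srv == s) for s in unit.idle}
```

With these assumed idle counts, server 11 receives three tasks, server 12 two, and server 13 one, and the seventh task waits in the queue until a server reports idle capacity again.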
  • Each processing server includes a Baseboard Management Controller (BMC) 4.
  • The BMC 4 is a microcontroller embedded in the main board of the processing server 11. It mainly monitors status information of the processing server 11, such as temperature, power mode, cooling fan speed, operating system state, or the state of other hardware such as drive devices, and transmits warning signals to the monitoring terminal 5 of the monitoring system in good time.
  • The network ports 3 of the respective processing servers have the same network address, whereas the BMCs of the respective processing servers have different network addresses. Each BMC can provide an embedded web server (EWS) interface 41 at its own network address. The monitoring terminal 5 of the present invention can thus connect to the network address of the BMC 4 of each processing server, log in to the EWS interface 41, receive from it the status information of the processing server (for example, the hardware and software status within the processing server 11), and monitor the processing server 11.
  • A server status real-time monitoring unit 51 (hereinafter the monitoring unit) is configured in the monitoring terminal 5. The monitoring unit 51 includes: a plurality of network input ports 5b for network connection with the BMC 4, used to receive the information returned by the BMC 4, the number of network input ports 5b being the same as the number of processing servers in server group 1 and, for example, in one-to-one correspondence with the EWS interfaces 41 of the BMCs; and a network output/input port 5a for network connection with the server group. Via port 5a the monitoring unit 51 can connect uniformly to the network ports of the processing servers to test whether the processing servers 11-14 can normally provide network services externally, for example to the client 2. If each processing server connects normally to the monitoring unit 51 of the monitoring terminal 5, the monitoring unit 51 disconnects from each server after a predetermined time (for example, 1 second). The monitoring unit 51 then waits for a predetermined period (for example, 1 hour) and again tries to connect to each processing server to test whether it can still provide network services normally, thereby monitoring server group 1.
  • When the monitoring unit 51 detects that a processing server in server group 1 cannot provide the network service normally, it connects to the BMC 4 of each processing server through that BMC's network address and sends the BMC 4 an instruction to report its current status information. The report command is issued through the EWS interface 41 provided by the BMC 4 and requires the BMC 4 to report the current status information of its processing server. When the monitoring unit 51 receives the current status information reported by the BMC of each processing server, it records that information and displays it in real time on the display of the monitoring terminal 5 for the monitoring personnel, for example the server administrator, so that the administrator can identify the faulty processing server and perform maintenance based on the status information records of the respective processing servers.
  • FIG. 3 shows an operating method of the monitoring system of FIG. 2. With this method the monitoring system periodically monitors the operating state of the server group 1, and when the server group 1 becomes abnormal, its monitoring personnel or administrator can immediately obtain the system status information of the server group 1 and deal with the fault in real time.
  • In step 101 of FIG. 3, the monitoring terminal 5 of the present invention periodically makes a network connection with the server group 1. That is, the monitoring unit 51 of the monitoring terminal 5 uses the unified network address of the server group 1 and its network output/input port 5a to connect to each processing server of the server group 1 at regular intervals (for example, once every hour) to test whether the network service provided by the server group 1 is normal.
  • As shown in step 102 of FIG. 3, if no abnormality occurs in the network service provided by any processing server in the server group 1, the monitoring terminal 5 disconnects from each processing server in the server group 1. As shown in step 103 of FIG. 3, the monitoring terminal 5 then waits until the current period ends and performs step 101 again, connecting to each processing server of the server group 1 through its network output/input port 5a, thereby achieving periodic monitoring.
  • If, in step 102, the monitoring unit 51 finds that it cannot make a normal network connection with one or more processing servers in the server group 1, this means that the server group 1 contains a faulty processing server.
  • In that case, as shown in step 104 of FIG. 3, the monitoring unit 51 of the monitoring terminal 5 connects to the BMC of each processing server through the network input ports 5b, which are network-connected to the EWS interfaces 41 of the BMCs, and issues an instruction requiring each BMC 4 to report the current status information of its own processing server.
  • The monitoring unit 51 then records the fault state and current status information of the processing servers and displays them in real time on the display of the monitoring terminal 5. The monitoring personnel of the server group 1 can thus see the status of each processing server in the server group 1 in real time on the monitoring terminal 5, deal with the failed processing server immediately, or analyze the cause of the abnormality.
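The flow of FIG. 3 can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: `monitoring_cycle`, `probe_service`, and `query_bmc` are hypothetical names, and the two callables stand in for the connection through port 5a and the report request through ports 5b. A caller would run one pass, wait for the period (for example, one hour), and repeat.

```python
from typing import Callable, Dict, List


def monitoring_cycle(
    servers: List[str],
    probe_service: Callable[[str], bool],
    query_bmc: Callable[[str], dict],
) -> Dict[str, dict]:
    """One pass of the flow: probe every server; on any failure, query all BMCs.

    Returns an empty dict when every probe succeeds (the caller then waits
    for the next period), otherwise the status record reported by each BMC,
    keyed by server name.
    """
    # Steps 101-102: test the network service of every processing server.
    unreachable = [name for name in servers if not probe_service(name)]
    if not unreachable:
        return {}  # Step 103: nothing abnormal, wait for the next period.
    # Step 104: ask the BMC of each processing server for its current status.
    return {name: query_bmc(name) for name in servers}


# Demonstration with stand-in callables (hypothetical data, no network I/O).
demo_servers = ["server-11", "server-12", "server-13", "server-14"]
demo_status = monitoring_cycle(
    demo_servers,
    probe_service=lambda name: name != "server-12",  # server-12 does not answer
    query_bmc=lambda name: {"server": name, "temperature_c": 41},
)
```

Passing the probe and the BMC query in as callables keeps the sketch independent of any particular network library.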
  • In summary, the above embodiment uses the monitoring terminal 5 to connect actively and periodically to the server group 1 in order to test the overall operating state of the server group 1 and to discover abnormalities in the external network connections of its processing servers.
  • When such an abnormality is found, the monitoring unit 51 of the monitoring terminal 5 actively requests the BMC 4 of each processing server to provide the current status information of that processing server, so that the monitoring personnel can promptly and clearly understand the status of each processing server after the server group 1 becomes abnormal.
  • The monitoring personnel can then immediately take corresponding measures, which improves the monitoring efficiency of the server group 1. The present invention can thus monitor faults in a large server group 1 at low hardware and labor cost, which removes an obstacle to using server groups for task request processing in remote education systems.
  • The purpose of the following preferred embodiment is to further reduce the hardware cost of the monitoring system, that is, to reduce the number of network input ports 5b in the monitoring terminal 5 that are network-connected to the EWS interfaces 41 of the BMCs 4 of the processing servers, without affecting the monitoring of the server group 1.
  • The monitoring system of the server group 1 according to the second embodiment of the present invention includes a plurality of monitoring proxy servers 61-62; FIG. 4 shows two by way of example (those skilled in the art will appreciate that the number is not limited to two). Each monitoring proxy server 61 or 62 serves several processing servers in the server group 1 at the same time, so that the monitoring proxy server acts as the transmission interface between a plurality of processing servers in the server group 1 and a single monitoring terminal 5.
  • The number of network input ports 5b of the monitoring terminal then only needs to match the number of monitoring proxy servers, which greatly reduces the hardware requirements and cost of the monitoring terminal.
  • Apart from the monitoring proxy servers, the components of the second embodiment have substantially the same structure as those of the first embodiment. Components having substantially the same functions as in the first embodiment are therefore given the same reference numerals here.
  • The monitoring system mainly includes: a server group 1 having a plurality of processing servers 11-14, 11a-14a, and so on; two monitoring proxy servers 61 and 62; and a monitoring terminal 5.
  • The processing servers 11-14 are each connected to the monitoring proxy server 61, and the processing servers 11a-14a are each connected to the monitoring proxy server 62, so that each monitoring proxy server is linked to several processing servers.
  • The monitoring proxy server 61 and the monitoring proxy server 62 are each connected to the monitoring terminal 5, so that the status information they receive from the processing servers can be transmitted to the monitoring terminal 5.
  • The number of monitoring proxy servers is smaller than the number of processing servers 11-14 and 11a-14a in the server group 1 to be monitored, and each monitoring proxy server can serve several processing servers.
  • As a result, the number of network input ports of the monitoring terminal 5 and the processing capability required of the monitoring terminal can be greatly reduced without affecting the monitoring function of the monitoring system.
  • By way of example, the monitoring system shows eight processing servers 11-14 and 11a-14a, served by the two monitoring proxy servers 61 and 62 respectively. The monitoring terminal 5 therefore needs only two network input ports rather than eight, which effectively reduces its load and hardware cost.
  • Each monitoring proxy server first sends a signal to the monitoring terminal 5 to register with a network input port 5b of the monitoring terminal 5.
  • The processing servers in the server group 1 to be monitored likewise first register with the monitoring terminal 5 and, once registered, accept the assignment made by the monitoring terminal 5. Each processing server thereby knows through which monitoring proxy server it must transmit its own status information to the monitoring terminal.
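The source does not say how the monitoring terminal decides which processing server goes to which monitoring proxy server. As one possible illustration, the sketch below splits the registered servers into contiguous, near-equal blocks, which happens to reproduce the 11-14 / 11a-14a grouping of FIG. 4. The function name `assign_to_proxies` and the server labels are hypothetical.

```python
from typing import Dict, List


def assign_to_proxies(servers: List[str], proxies: List[str]) -> Dict[str, str]:
    """Map each registered server to one proxy, in contiguous near-equal blocks.

    Assumes at least one proxy has registered.
    """
    per_proxy = -(-len(servers) // len(proxies))  # ceiling division
    return {name: proxies[i // per_proxy] for i, name in enumerate(servers)}


registered = ["11", "12", "13", "14", "11a", "12a", "13a", "14a"]
assignment = assign_to_proxies(registered, ["proxy-61", "proxy-62"])

# The terminal needs one network input port per proxy in use, not per server.
ports_needed = len(set(assignment.values()))
```

With eight servers and two proxies, `ports_needed` is 2, matching the port count given in the text.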
  • In operation, each processing server transmits its own status information, such as temperature, power mode, cooling fan speed, operating system state, or the state of other hardware-driven devices, to its assigned monitoring proxy server 61 or 62. The monitoring proxy server buffers and prioritizes the information and then passes it in order to the monitoring terminal 5 for display and analysis.
  • The monitoring proxy server 61 includes an information input unit 61a, a data buffer pool 61b, an information output unit 61c, and a control unit 61d. The information input unit 61a, the data buffer pool 61b, and the information output unit 61c are communicatively connected in sequence, and the control unit 61d is communicatively connected to each of the information input unit 61a, the data buffer pool 61b, and the information output unit 61c.
  • When the monitoring proxy server 61 receives status information from a processing server assigned to it, the information input unit first passes it to the data buffer pool for buffering and sends the control unit a notification that input has been received.
  • The data buffer pool mainly includes a queue and a local database. The queue orders the incoming status information that is waiting to be processed, and the local database stores the status information temporarily until it has been successfully transmitted to the monitoring terminal 5.
  • The control unit sorts the status information of the processing servers in the queue according to a predetermined rule, so that the status information is passed to the monitoring terminal in that order.
  • In practice, the probability that several processing servers fail or become abnormal at exactly the same time is essentially nil.
  • Instead, the processing servers in the server group 1 fail or become abnormal at different times, depending on the service life and load of each processing server.
  • The status information of the processing server most likely to have failed can therefore be transmitted first and the rest afterwards. Because the status information of the processing servers is sorted in this way, the monitoring terminal can identify and respond to a failed processing server faster than if all processing servers reported simultaneously.
  • The control unit records the mean time between failures (MTBF) of each processing server and the reliability index of each processing server in order to estimate the probability that the processing server fails, and uses that estimate to prioritize the transmission of status information.
  • The reliability index is set by the administrator on the basis of the computing power and storage space of the processing server.
  • A weighting factor is assigned to the MTBF and to the reliability index, and the weighting factors sum to 1.
  • For example, the MTBF has a weight of 0.6 and the reliability index has a weight of 0.4.
  • For each processing server, the control unit calculates the difference between its MTBF and the time for which the server group 1 has been running, multiplies that difference by the weight of the MTBF, and adds the weighted reliability index.
  • The resulting value gives the current failure risk factor of each processing server.
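The calculation described above can be illustrated as follows. This is one reading of the text rather than a confirmed formula: it assumes the reliability index is multiplied by its own weight, and that a smaller result means a higher risk (less time remaining before the expected failure, lower reliability), so that status information is sent in ascending order of the result. The names and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

W_MTBF = 0.6         # weight of the MTBF term (value from the text)
W_RELIABILITY = 0.4  # weight of the reliability index (value from the text)


@dataclass
class ServerRecord:
    name: str
    mtbf_hours: float         # mean time between failures
    reliability_index: float  # set by the administrator


def risk_value(rec: ServerRecord, uptime_hours: float) -> float:
    """0.6 * (MTBF - uptime) + 0.4 * reliability index.

    Assumed reading: the smaller the value, the more likely a failure.
    """
    return W_MTBF * (rec.mtbf_hours - uptime_hours) + W_RELIABILITY * rec.reliability_index


def transmission_order(records: List[ServerRecord], uptime_hours: float) -> List[str]:
    """Names of the servers, most failure-prone first under the assumption above."""
    return [r.name for r in sorted(records, key=lambda r: risk_value(r, uptime_hours))]


# Hypothetical figures for two servers after 2000 hours of group uptime.
records = [
    ServerRecord("server-11", mtbf_hours=5000, reliability_index=80),
    ServerRecord("server-12", mtbf_hours=3000, reliability_index=90),
]
order = transmission_order(records, uptime_hours=2000)
```

With these made-up figures server-12 is closer to its expected failure time than server-11, so its status information would be placed first.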
  • For example, if the control unit determines that the processing server 12 is more likely to fail than the processing server 11, then when the monitoring terminal 5 requests the BMC 4 of each processing server to report its status information, the control unit in the monitoring proxy server 61 ranks the status information of the processing server 12 ahead of the status information of the processing server 11 in the queue.
  • The prioritized status information may then be stored temporarily in the local database of the data buffer pool, so that it is not lost before it is sent out. This design improves the security of the monitoring system and protects the integrity of the data. Subsequently, when the control unit detects that the network input port 5b of the monitoring terminal 5 can accept status information, it issues an instruction allowing the status information stored in the local database to be transmitted through the information output unit to the network input port 5b of the monitoring terminal, which completes the transmission of the status information.
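A minimal sketch of the data buffer pool behaviour described above, under stated assumptions: the queue is modelled as a heap ordered by a risk value (smaller means sent earlier, which is an assumption), the "local database" is a plain dictionary, and `port_ready` and `send` are hypothetical callables standing in for the check on network input port 5b and the information output unit.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple


class MonitoringProxyBuffer:
    """Queue plus temporary store; entries stay stored until they are sent."""

    def __init__(self) -> None:
        self._queue: List[Tuple[float, int, str]] = []  # (risk value, arrival no., server)
        self._store: Dict[str, dict] = {}               # stand-in for the local database
        self._arrival = itertools.count()

    def accept(self, server: str, status: dict, risk_value: float) -> None:
        """Information input: buffer the status and enqueue it by priority."""
        self._store[server] = status
        heapq.heappush(self._queue, (risk_value, next(self._arrival), server))

    def flush(self, port_ready: Callable[[], bool],
              send: Callable[[str, dict], None]) -> List[str]:
        """Send buffered status in priority order while the port accepts data."""
        sent = []
        while self._queue and port_ready():
            _, _, server = self._queue[0]
            if server in self._store:
                send(server, self._store[server])  # if this raises, the entry stays buffered
                del self._store[server]
                sent.append(server)
            heapq.heappop(self._queue)
        return sent

    def pending(self) -> int:
        return len(self._store)


received = []
buf = MonitoringProxyBuffer()
buf.accept("server-11", {"temperature_c": 40}, risk_value=1832.0)
buf.accept("server-12", {"temperature_c": 88}, risk_value=636.0)
held_back = buf.flush(port_ready=lambda: False, send=lambda s, st: received.append(s))
sent_order = buf.flush(port_ready=lambda: True, send=lambda s, st: received.append(s))
```

The entry is removed from the store only after `send` returns, which mirrors the requirement that status information is kept until it has been transmitted.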
  • The second embodiment operates as follows. First, the monitoring proxy servers and the processing servers register, and the processing servers are assigned to monitoring proxy servers, so that each processing server knows through which monitoring proxy server it returns its status information to the monitoring terminal 5.
  • The monitoring terminal 5 then periodically makes a network connection with the server group 1 to test whether the network service provided by the server group 1 is normal. If no abnormality occurs in the network service provided by any processing server in the server group 1, the monitoring terminal 5 disconnects from each processing server in the server group 1. After waiting for a period of time, the monitoring terminal 5 connects again to each processing server of the server group 1, thereby regularly monitoring the operating state of the server group 1.
  • If the monitoring terminal 5 finds that it cannot make a normal network connection with one or more processing servers in the server group 1, it requests each BMC 4 to report the current status information of its own processing server.
  • When the BMCs of the processing servers report their status information, the status information is first transmitted to the information input unit of the associated monitoring proxy server, which accepts it and forwards it to the queue of the data buffer pool.
  • The control unit then prioritizes it. After receiving the notification from the information input unit, the control unit sorts the status information of the processing servers in the queue according to the predetermined rule, and the sorted status information is stored temporarily in the local database.
  • When the control unit detects that the network input port of the monitoring terminal can accept status information, it allows the status information stored in the local database to be sent through the information output unit to the network input port of the monitoring terminal, which completes the transmission of the status information.
  • In summary, by providing at least one monitoring proxy server as the transmission interface between the processing servers and the monitoring terminal, the second embodiment effectively limits both the number of network input ports of the monitoring terminal and the processing capability required of the monitoring terminal.
  • Because fewer servers are connected directly to the monitoring terminal, the embodiment also avoids the situation in which too many reports arrive from the processing servers at once and the monitoring terminal falls behind or its processor is overloaded. Even a monitoring terminal with a lower processing speed can therefore monitor the server group normally.
  • a third embodiment in accordance with the present invention is illustrated in Figures 5-6.
  • The third embodiment further reduces the hardware requirements for the monitoring terminal 5 on the basis of the above embodiments.
  • The third embodiment makes full use of the fact that the processing servers in the server group are arranged compactly in space: short-range wireless communication between the processing servers forms a closed data communication link, so that several processing servers are combined into a single server monitoring group with a common output port for monitoring.
  • In this way the processing servers in the server group 1 are divided into several monitoring groups, each having a common output port, which reduces the number of network input ports of the monitoring terminal and its processing requirements.
  • FIG. 5 shows a processing server in the server group according to the third embodiment of the present invention.
  • The processing server includes the BMC 4 and a wireless communication unit 8 in communication with it.
  • The wireless communication unit 8 may be an NFC (Near Field Communication) module or another short-range communication module.
  • The wireless communication unit 8 includes a transmitting unit 81 and a receiving unit 82.
  • The BMC 4 includes a setting unit, a monitoring unit, and a status information output unit. As an example, the functions of these units of the BMC can be executed by a central processing unit in the BMC.
  • Five processing servers in the server group 1, namely the processing servers 11, 12, 13, 14, and 15, form a server monitoring group 7 having a common output port 71.
  • the number of processing servers in each server monitoring group is not limited to five, and may be set to be more or less than five as needed.
  • the entire server farm 1 can be divided into a plurality of server monitoring groups.
  • The setting unit in the BMC of each processing server is used to set the name of the server monitoring group to which the processing server belongs and the upstream and downstream communication neighbors of the processing server.
  • For example, the processing server 11 is the first processing server in the monitoring group: the name of its monitoring group is set to Group 1, its upstream neighbor is the processing server 15, and its downstream neighbor is the processing server 12. The processing server 12 is the second processing server: its group name is Group 1, its upstream neighbor is the processing server 11, and its downstream neighbor is the processing server 13.
  • The processing server 13 is the third processing server: its group name is Group 1, its upstream neighbor is the processing server 12, and its downstream neighbor is the processing server 14. The processing server 14 is the fourth processing server: its group name is Group 1, its upstream neighbor is the processing server 13, and its downstream neighbor is the processing server 15.
  • The processing server 15 is the fifth processing server: its group name is Group 1, its upstream neighbor is the processing server 14, and its downstream neighbor is the processing server 11. With these settings in the setting units of the BMCs of the processing servers 11-15, the five processing servers of the same monitoring group, Group 1, form the complete closed-loop communication chain shown in FIG. 6.
  • the processing server 11, the processing server 12, the processing server 13, the processing server 14, and the processing server 15 communicate with each other through respective wireless communication units 8.
  • When the monitoring terminal 5 finds that it cannot make a normal network connection with one or more processing servers in the server group, the monitoring terminal requests each BMC 4 to report the current status information of its own processing server.
  • The BMC of each processing server then begins a self-test.
  • The monitoring unit in the BMC 4 of each processing server compares the monitored status values, for example temperature, power mode, cooling fan speed, operating system state, or the state of other hardware-driven devices, with the preset status thresholds of the normal operating state.
  • Depending on whether a status value has exceeded the threshold of the normal operating state, the monitoring unit sends a corresponding operating instruction to the wireless communication unit 8.
  • Each transmitting unit 81 transmits information to the upstream and downstream communication neighbors in the same monitoring group, and each receiving unit 82 receives information from those neighbors. After the receiving unit 82 receives information from the upstream or downstream neighbor, the transmitting unit 81 also sends response information back to that neighbor.
  • When the monitoring unit of one or more processing servers in the same server monitoring group finds that a status value of its own processing server has exceeded the threshold of the normal operating state, it instructs its own wireless communication unit to stop transmitting and receiving information.
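The threshold comparison performed by the BMC monitoring unit could look like the sketch below. The quantities come from the text, but the numeric ranges, the dictionary keys, and the function names are hypothetical; the source gives no threshold values.

```python
from typing import Dict, List, Tuple

# Hypothetical normal-operation ranges (low, high) per monitored quantity.
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (5.0, 75.0),
    "fan_rpm": (1500.0, 12000.0),
}


def out_of_range(readings: Dict[str, float],
                 ranges: Dict[str, Tuple[float, float]] = NORMAL_RANGES) -> List[str]:
    """Names of the monitored quantities whose reading lies outside its range."""
    return [key for key, (low, high) in ranges.items()
            if key in readings and not (low <= readings[key] <= high)]


def radio_enabled(readings: Dict[str, float]) -> bool:
    """The wireless unit keeps transmitting and receiving only while all readings are normal."""
    return not out_of_range(readings)


healthy = {"temperature_c": 40.0, "fan_rpm": 3000.0}
overheated = {"temperature_c": 90.0, "fan_rpm": 3000.0}
```

Here `radio_enabled(overheated)` is false, which corresponds to the instruction to stop transmitting and receiving.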
  • The complete closed-loop communication chain of the server monitoring group is then broken, for example at the first processing server 11 and the second processing server 12, which indicates that the server monitoring group contains a failed processing server.
  • Since each processing server has its own upstream and downstream communication neighbors, the faulty or abnormal processing server in the server monitoring group can easily be located from the break point of the closed-loop communication chain.
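As an illustration of locating the break point, the sketch below models the five-server ring and flags a server when the links to both of its neighbors are down. The rule and the names (`RING`, `neighbours`, `suspected_faulty`) are assumptions for illustration; the source only states that the fault is located from the break point. Under this rule a healthy server sitting between two silent ones would also be flagged, so it is a simplification.

```python
from typing import Callable, List, Set, Tuple

RING: List[str] = ["server-11", "server-12", "server-13", "server-14", "server-15"]


def neighbours(ring: List[str], name: str) -> Tuple[str, str]:
    """Return (upstream, downstream) neighbors of a server in the closed loop."""
    i = ring.index(name)
    return ring[i - 1], ring[(i + 1) % len(ring)]


def suspected_faulty(ring: List[str],
                     link_ok: Callable[[str, str], bool]) -> List[str]:
    """Servers whose links to both the upstream and the downstream neighbor are down."""
    suspects = []
    for name in ring:
        upstream, downstream = neighbours(ring, name)
        if not link_ok(upstream, name) and not link_ok(name, downstream):
            suspects.append(name)
    return suspects


def make_link_check(silent: Set[str]) -> Callable[[str, str], bool]:
    """Simulated link state: a link works only if neither endpoint has gone silent."""
    return lambda a, b: a not in silent and b not in silent


one_fault = suspected_faulty(RING, make_link_check({"server-13"}))
two_faults = suspected_faulty(RING, make_link_check({"server-11", "server-12"}))
```

The second case corresponds to the example in the text, where the chain is broken at the first and second processing servers.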
  • The common output port 71 of the server monitoring group then issues an instruction to the monitoring unit of the processing server identified as faulty.
  • The monitoring unit of the faulty processing server instructs its status information output unit to transmit the status information of its BMC to the common output port.
  • The common output port passes the status information to the monitoring terminal through the network input port 5b of the monitoring terminal, which accomplishes the monitoring of the server monitoring group.
  • In summary, merely by adding a simple short-range communication unit to the BMC, the third embodiment can divide the processing servers in the server group 1 into groups and quickly identify the faulty server in each server monitoring group. Specifically, the number of network input ports 5b in the third embodiment equals the number of server monitoring groups, which is only a fraction of the number of processing servers. Compared with the first embodiment, this greatly reduces the number of network input ports 5b of the monitoring terminal 5 and the processing capability required of the monitoring terminal, and ultimately reduces the hardware cost of the monitoring system without affecting the monitoring of the server group 1.
  • Compared with the second embodiment, the monitoring proxy servers are eliminated, which further reduces the hardware cost of the monitoring system without affecting the monitoring of the server group 1.
  • The system of the present invention can actively request the baseboard management controller to provide the current status information of its own processing server, so that the manager of the server group can promptly and clearly understand the status of the server group after an abnormality occurs and can immediately take corresponding measures, which improves the management efficiency of the server group.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

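A monitoring system for a server group (1) of a remote education system. The server group comprises a plurality of processing servers (11-14), each of which includes a baseboard management controller (4). The monitoring system comprises a monitoring terminal (5) containing a server status real-time monitoring unit (51) that can make network connections with the processing servers (11-14) in the server group (1). The server status real-time monitoring unit (51) of the monitoring terminal (5) periodically connects to the processing servers (11-14) and, on finding that the network connection with a processing server (11-14) is abnormal, requires the baseboard management controller (4) of that processing server to report the current status information of the processing server (11-14). By actively requiring the baseboard management controller (4) to provide the current status information of its own server (11-14), the state of the server group (1) can be learned promptly after an abnormality occurs and corresponding measures can be taken immediately.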

Description

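Monitoring system for a server group of a remote education system
Technical Field
The present invention relates to the field of Internet education technology, in particular to a remote education system based on Internet technology, and specifically to a monitoring system for a server group of a remote education system.
Background Art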
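With the development of science and technology, numerous remote education products have appeared. Remote education is a form of education in which students and teachers, and students and educational institutions, teach and communicate systematically mainly through various media; it delivers courses to students at one or more locations off campus. Modern remote education refers to delivering courses beyond the campus by means of audio, video (live or recorded), and Internet technologies, both real-time and non-real-time. Remote education is currently flourishing. It is an important means of addressing imbalances in educational resources and opportunities; it enriches teaching methods and relaxes the constraints on where teaching takes place, and is therefore popular with many students and teachers.
Current remote education systems are typically built on a single-machine centralized architecture. Such a system usually comprises a teaching terminal, one or more learning terminals, and a server that communicates over a network with the teaching terminal and the learning terminal or terminals. The server can store teaching resources, process the various requests from the teaching or learning terminals, and distribute and receive teaching resources in response to those requests.
As an example of a single-machine centralized remote education system, Chinese invention patent publication CN107195212 discloses an Internet-based remote education service system that typically works by single-server centralized processing. Specifically, it comprises an administrator client, a user client, and a cloud server. The administrator client includes an administrator login module, a teaching video publishing module, and a teaching video classification module. The administrator login module lets administrators register and log in to administrator member accounts. The teaching video publishing module lets administrators publish teaching videos and sends the published videos to the teaching video classification module, which classifies them and sends the classified videos to the cloud server. The user client includes a user login module and a video display module. The user login module lets users register and log in to member accounts, and the video display module receives the classified teaching videos sent by the cloud server and displays them for users to click and view.
Further, Chinese invention patent publication CN104464412A also discloses a remote education system of this single-machine centralized type and a method of implementing it. That system comprises an education terminal, a client, and a server; the education terminal exchanges information with the client, and the server exchanges information with the client over the Internet.
As remote education continues to flourish, the number of teaching and learning terminals covered by remote education systems keeps growing, course content keeps increasing, and the share of video and audio files in courses keeps rising. These factors inevitably cause the volume of data exchanged over the Internet to grow exponentially. Practical use has shown that, under these conditions, conventional single-machine centralized processing is no longer adequate in real-time data processing and computing efficiency: the real-time performance and responsiveness of the whole system cannot be guaranteed. This leads to a poor user experience, damages the commercial reputation of the operator using the remote education system, and technically constrains the operator's potential market share. It is a practical challenge that urgently needs to be overcome technically.
The usual response is to upgrade the single centralized processing server in the remote education system, speeding up processing by improving the server's performance. When batches of task requests are not demanding on processing time, this improves the processing efficiency of the system; but when the processing time requirements are strict or the data volume is very large, improving server performance alone can no longer meet the requirements. In other words, this solution does not fundamentally resolve, at the technical level, the imbalance between the limited processing capacity of the processing server and the massive volume of task requests.
Another, promising solution that can fundamentally solve the above technical problem is to replace the existing single server with a server group capable of parallel computing: first increase the number of servers that process tasks, and then, through manual setting or automatic system configuration, distribute the batches of task requests to multiple servers according to certain rules for parallel processing. This fundamentally solves the problem in the prior art: when improving the performance of a single processing server cannot meet the demand, multiple task processing servers can be deployed to share the pending task requests of the single server, thereby meeting the application's requirements in both processing capacity and processing time.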
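Although the server-group solution has obvious advantages, it also has the following technical difficulty, which restricts its large-scale application: monitoring the reliability of the server group. In existing remote education systems with single-server centralized processing, the complexity of the whole system is low, and monitoring a single server is enough to monitor the reliability of the whole system. In one known way of monitoring the operating state of a processing server, a baseboard management controller is provided in the processing server; the controller monitors the overall state of the processing server and records it in a System Event Log (hereinafter SEL) file, which monitoring personnel read at regular intervals or which is sent to a terminal device for the server administrator to view when needed.
As one example, Chinese invention patent publication CN105184498A discloses a remote education system with single-server centralized processing that includes a log management module. The log management module records all user operations, and administrators can query all log information through it; the log information includes user operation records, server monitoring logs, and server alarm logs. The working state of the server in the education system is thus monitored by reading the monitoring log of the processing server. Chinese invention patent publication CN105868077A discloses a method for obtaining monitoring information of the server nodes of a whole rack. Compared with the prior art, the method simplifies the communication between the node BMC and the node midplane; the node midplane can obtain a large amount of data from the node BMC at once, especially real-time changing information, which greatly benefits system response time and is highly practical.
Further, Chinese invention patent publication CN106844162A discloses a BMC-based chassis management system and method for storage servers. A chassis monitoring module monitors the devices, determines from their status information whether the status is abnormal, and reports abnormal information to a storage module; the storage module makes a further judgment and orders the management module to issue corresponding action commands, thereby managing the monitored devices, ensuring normal operation of the chassis, and improving the stability of the storage server. Reading the operating status information on the BMC through the IPMI protocol is easier to manage and simpler than chassis management based on the SES protocol, and it reduces management and development costs. In addition, the processes of the chassis management module and the storage module are linked through IPC, so that once the storage module determines that a serious abnormality has occurred, the data input and output streams are interrupted, which protects the data and improves the reliability of the storage server.
To reduce the workload of reading system event logs, Chinese invention patent publication CN104464412A, filed by Alibaba Group Holding Limited, discloses a terminal status reminder method and related devices and system. It learns the status of a terminal by sending reminder information to a management terminal under specific conditions, so the status of the first terminal can be learned without specialist knowledge. Compared with the traditional approach of checking the SEL, this is easier to implement and can reduce the workload of monitoring personnel to some extent.
However, the approaches described above for monitoring a single server are not suitable for monitoring a server group with multiple processing servers, because when a server group containing many servers jointly provides network services to the outside, the working state of every server must be monitored. Take a server group of 100 servers as an example: if the single-server approach were retained, each processing server would be given its own monitoring device that monitors abnormal states by recording logs. This is clearly unacceptable, both in the hardware cost of the monitoring devices that the server group would need and in the workload facing the monitoring personnel. Monitoring personnel can hardly find, within a short time, the parts of a massive volume of log files that record server abnormalities, and so cannot promptly understand and eliminate the abnormal condition of the faulty server. In other words, it is precisely this deficiency in server group monitoring that technically restricts the large-scale use of server groups for data processing in remote education systems.
In view of this, how to provide a monitoring system suitable for the server group of a remote education system is a problem that developers of remote education systems urgently need to solve in order to apply server group technology to processing task requests in such systems.
The applicant hereby states that the above section sets out only technical content known to the applicant. It merely provides background information related to the present disclosure and does not necessarily constitute prior art to the present application.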
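Summary of the Invention
To overcome the deficiencies of the prior art, the present invention provides a monitoring system for a server group of a remote education system. The server group comprises a plurality of processing servers, each of which includes a baseboard management controller. The monitoring system comprises a monitoring terminal containing a server status real-time monitoring unit that can make network connections with the processing servers in the server group. The server status real-time monitoring unit of the monitoring terminal periodically connects to the processing servers and, on finding that the network connection with a processing server is abnormal, requires the baseboard management controller of that processing server to report the current status information of the processing server.
Thus, the invention has the monitoring terminal connect actively and periodically to the server group to test how the server group provides network services to the outside. When it finds that the server group contains a processing server that cannot provide network services, the monitoring terminal actively requires the baseboard management controller of each processing server in the server group to provide the current status information of its own processing server. After an abnormality occurs, the manager of the server group can therefore promptly and clearly understand the state of the server group and immediately take corresponding measures, which improves the management efficiency of the server group.
Preferably, the processing servers in the server group share one network address for jointly providing network services to the outside, while the baseboard management controllers have network addresses that differ from one another. The server status real-time monitoring unit connects to the processing servers at their unified network address to test the network service, and connects to each baseboard management controller at that controller's network address to require it to report current status information.
Thus, the server group according to the invention provides a stable network service to the outside through a single network address, accepts access requests through a unified port, and then distributes the load within the server group. At the same time, baseboard management controllers with different network addresses ensure that the monitoring terminal can receive the current status information of each processing server separately.
Preferably, the server status real-time monitoring unit includes: at least one network input port for network connection with the baseboard management controllers, the network input port receiving the reports of the baseboard management controller of at least one processing server; and a network output/input port for network connection with the server group.
Preferably, the number of network input ports equals the number of processing servers in the server group, and each network input port is communicatively connected to the baseboard management controller of one processing server.
Thus, the monitoring terminal according to the invention can receive the reports from the baseboard management controllers of all processing servers in parallel.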
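Preferably, the monitoring system further includes at least one monitoring proxy server, which is configured to be connected to a plurality of processing servers, to receive the status information of the processing servers connected to it, and to transmit the received status information to the monitoring terminal. The number of network input ports equals the number of monitoring proxy servers, and each network input port is communicatively connected to one monitoring proxy server.
Thus, in this solution the number of network input ports in the monitoring terminal only needs to equal the number of monitoring proxy servers, which greatly reduces the hardware requirements and cost of the monitoring terminal. At the same time, because fewer servers are connected directly to the monitoring terminal, the problem of the monitoring terminal falling behind or its processor being overloaded when too many reports arrive from the processing servers at once is avoided, so that even a monitoring terminal with a lower processing speed can monitor the server group normally.
Preferably, the monitoring proxy server includes an information input unit, a data buffer pool, and an information output unit that are communicatively connected in sequence, and a control unit that is communicatively connected to each of the information input unit, the data buffer pool, and the information output unit. The information input unit receives status information from the processing servers connected to it and passes it to the data buffer pool. The control unit prioritizes the status information of the processing servers in the data buffer pool and transmits the sorted status information in order to the monitoring terminal through the information output unit.
Thus, the status information of multiple processing servers can be sorted and that of the processing server most likely to have failed sent first. Compared with simultaneous reporting by multiple processing servers, this further improves the speed with which the monitoring terminal identifies and responds to a failed processing server.
Preferably, the data buffer pool includes a queue and a local database. The queue is configured to order the status information of the processing servers, and the local database is configured to store the status information temporarily until it has been transmitted to the monitoring terminal. The control unit prioritizes the status information in the queue according to the mean time between failures of the processing servers and the reliability index of each server.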
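Preferably, the monitoring system further includes server monitoring groups, each composed of several processing servers in the server group and having a common output port configured to output the status information of a faulty server. The number of network input ports equals the number of server monitoring groups, and each network input port is communicatively connected to the common output port of one server monitoring group.
Thus, the processing servers in the server group are divided into several monitoring groups, each having a common output port, which reduces the number of network input ports and the processing requirements of the monitoring terminal.
Preferably, every processing server in the same server monitoring group has a wireless communication unit communicatively connected to its baseboard management controller. Every baseboard management controller has a setting unit, a monitoring unit, and a status information output unit. The setting unit sets the name of the server monitoring group to which the processing server belongs and the upstream and downstream communication neighbors of the processing server. The wireless communication unit includes a transmitting unit and a receiving unit; the transmitting and receiving units of each processing server are configured to communicate with its upstream and downstream neighbors, forming a complete closed-loop communication chain within the monitoring group.
Thus, full use is made of the fact that the processing servers in the server group are arranged compactly in space: short-range wireless communication between the processing servers forms a closed data communication link, which combines several processing servers into a server monitoring group with a common output port for monitoring.
Preferably, the monitoring unit is configured to determine whether the status information of the processing server exceeds a status threshold and, if so, to instruct the wireless communication unit to stop transmitting and receiving information, thereby breaking the closed-loop communication chain. The common output port then issues an instruction to the processing server at the break point of the chain, so that its status information output unit sends the status information of its baseboard management controller to the common output port, which passes it to the monitoring terminal.
Preferably, the server group further includes an allocation unit connected to the processing servers for data exchange over the Transmission Control Protocol. The allocation unit establishes an allocation matrix to dynamically manage the number of idle processes on each processing server. The allocation unit periodically checks whether there are pending task requests; if so, it checks the number of idle processes of each processing server in the allocation matrix and, if an idle processing server exists, assigns the pending tasks to it.
Thus, with an allocation unit and multiple processing servers, the allocation unit automatically distributes batches of tasks to the processing servers, meeting the requirements on processing capacity and processing time. The processing servers in the server group can therefore be fully used, and batches of tasks can be processed promptly and quickly even under a large number of access requests.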
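Brief Description of the Drawings
Additional technical features and advantages of the invention will be made clear with reference to the embodiments shown in the following drawings, in which:
FIG. 1 is an overall architecture diagram of the remote education system according to the invention;
FIG. 2 is an architecture diagram of a first embodiment of the monitoring system according to the invention;
FIG. 3 is a flowchart of the operation of the monitoring system of FIG. 2;
FIG. 4 is an architecture diagram of a second embodiment of the monitoring system according to the invention;
FIG. 5 is an architecture diagram of a server in a third embodiment of the monitoring system according to the invention; and
FIG. 6 is an architecture diagram of the third embodiment of the monitoring system according to the invention.
List of Reference Numerals
1 processing server group; 101 allocation unit; 11(a) processing server; 12(a) processing server; 13(a) processing server; 14(a) processing server; 15 processing server; 2 client terminal; 3 network port; 4 baseboard management controller; 41 EWS interface; 42 monitoring unit; 43 setting unit; 44 status information output unit; 5 monitoring terminal; 51 server status real-time monitoring unit; 5a network input/output port; 5b network input port; 61 monitoring proxy server; 62 monitoring proxy server; 61a information input unit; 61b data buffer pool; 61c information output unit; 61d control unit; 7 server monitoring group; 71 common output port; 8 wireless communication unit; 81 transmitting unit; 82 receiving unit
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. The components of the embodiments generally described and shown in the drawings can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.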
如附图1所示,其中公开了根据本发明的远程教育系统的总体架构示意图,其中在附图1中,以组件的图形来表示各个功能模块的功能和相互间的技术关联。如附图1所示,其中该远程教育系统包括:处理服务器群1,其中该处理服务器群1包括多个处理服务器11-14、11a-14a等在本发明中可以根据待处理的任务请求量多少来设定处理服务器的数量,作为不限定本发明的一个实例,在本发明中该处理服务器群1具有40台处理服务器,为了简便起见,在附图1中仅示意性地示出了部分处理服务器。
在本发明中,为了使服务器群1中的多个处理服务器能够共同对外提供网络服务(例如但不限于提供一个供用户端2中的用户进行操作的教学网站),该处理服务器群1中的多个处理服务器的网络端口具有相同的网络地址(IP),该多个处理服务器11-14等可以共同地用于向用户端2提供网络服务从而实现内部处理服务器的负载分担。在使用中,任意用户端2可以根据该同一网络地址连上或者访问服务器群1的网络端口,从而接受来自该处理服务器的运算服务。在存在访问请求时,由分配单元通过一定的哈希(HASH)算法,将访问请求按照一定的分配规则转换入服务器群1中具有闲置处理进程的处理服务器。
在本发明中,具体来说,用户端2可以是教学端或学习端,该用户端2由远程教育系统的运营商的员工例如教师和/或消费者例如学生操作以通过有线网络或者无线网络或者蓝牙的方式与该服务器群1进行交互,进而实现远程教育的教学培训工作。由于该服务器群1中的多个处理服务器11-14具有同一网络地址,因此来自用户端2的成批的处理请求可以如图示箭头A方向经由同一网络地址访问服务器群1并随后分发给多个处理服务器11-14进行处理和承担,从而能够高效地处理成批的任务请求,随后服务器群1中的多个处理服务器对所分配的任务请求进行相应处理并以图示箭头B方向的方式将响应于用户请求的操作回传给用户端2,由此实现了服务器群1与用户端2之间的交互并最终完成远程教育系统的教学工作。
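As an example of how server cluster 1 distributes access requests from the client 2, an allocation unit 101 may be provided in server cluster 1. Each processing server 11-14 etc. can send data to the allocation unit 101 over the Transmission Control Protocol (TCP) to report the total number of tasks it can currently take on, so that the allocation unit can distribute access task requests. The allocation unit can build an allocation matrix that records the current task load of each processing server and use it to manage task assignment. Specifically, the matrix mainly records the number of idle processes on each processing server 11-14, for reference when tasks are assigned.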
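After receiving a batch of task requests, the allocation unit orders them on a first-in, first-out (FIFO) basis to form a task request queue awaiting assignment to the processing servers. The allocation unit checks the queue periodically for pending task requests. If there are any, it checks the allocation matrix for the current number of idle processes on each processing server. If any processing server has more than 0 idle processes, task requests can be assigned to it, and the allocation unit assigns them, for example by means of the HASH algorithm. If all processing servers are currently busy, it waits until one becomes idle.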
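The allocation unit then takes the pending tasks from the queue in order, assigns them to idle processing servers by sending TCP data, and adjusts the number of idle processes of each processing server in the allocation matrix accordingly. When the allocation unit finds that several processing servers have idle processes, it spreads the tasks according to how many idle processes each has, so as to make full use of every server's resources. For example, suppose there are currently 6 access task requests, and processing servers 11, 12 and 13 have 5, 4 and 3 idle processes respectively. The allocation unit can then assign 3 task requests to processing server 11, 2 to processing server 12, and 1 to processing server 13. Under this scheme, servers with more idle processes receive more tasks and servers with fewer idle processes receive fewer or none, which helps all processing servers perform to their full capacity and balances the external workload so that tasks are handled promptly.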
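The description gives the 6-task example but not the exact rule that produces the 3/2/1 split. One rule that reproduces it, offered here only as a minimal sketch and not as the patent's method, is to hand each queued task to whichever server currently has the most idle processes. The function name `distribute` and the server labels are hypothetical.

```python
from collections import deque


def distribute(tasks, idle_counts):
    """Assign FIFO-ordered tasks, each to the server with the most idle processes.

    `idle_counts` stands in for one snapshot of the allocation matrix:
    a mapping of server name to number of idle processes (assumed layout).
    """
    remaining = dict(idle_counts)  # working copy, so the caller's matrix is untouched
    assignment = {server: [] for server in idle_counts}
    pending = deque(tasks)  # first in, first out
    while pending and any(remaining.values()):
        # max() returns the first maximal key, so ties go to the earlier server
        server = max(remaining, key=remaining.get)
        assignment[server].append(pending.popleft())
        remaining[server] -= 1
    # Tasks still pending must wait until some server reports idle processes again
    return assignment, remaining, list(pending)


assignment, remaining, waiting = distribute(
    ["t1", "t2", "t3", "t4", "t5", "t6"],
    {"server11": 5, "server12": 4, "server13": 3},
)
counts = {server: len(ts) for server, ts in assignment.items()}
print(counts)
```

With the figures from the example, the servers receive 3, 2 and 1 tasks and each is left with 2 idle processes. Other rules (equal shares, or filling one server first) are equally compatible with the text.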
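Of course, as those skilled in the art will readily appreciate, the allocation unit may also decide the distribution from the number of task requests to be assigned and the number of idle processes on each processing server in other ways. For example, each processing server may receive the same number of task requests. Alternatively, the batch of tasks may first be assigned to one processing server until its number of idle processes reaches 0, after which the remaining tasks are assigned to other idle processing servers.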
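Fig. 2 shows, by way of example, the monitoring system according to the invention for server cluster 1 of the remote education system. Each of the processing servers 11-14 etc. in server cluster 1 includes a baseboard management controller (BMC) 4. Taking processing server 11 as an example, its BMC 4 is a microcontroller embedded in the motherboard of processing server 11. It mainly monitors status information of processing server 11 such as temperature, power mode, cooling fan speed, operating system state, or the state of other hardware devices, and sends alert signals to the monitoring terminal 5 of the monitoring system when appropriate. Unlike the network ports 3 of the processing servers, which share one network address, the BMCs 4 of the processing servers each have a different network address. Based on its own network address, each BMC can provide an embedded web server (EWS) interface 41. This allows the monitoring terminal 5 to connect over the network to the address of each BMC 4, log in to the EWS interface 41, receive the status information of each processing server through it (for example, the hardware and software state of processing server 11), and monitor that server.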
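The monitoring terminal 5 is equipped with a real-time server status monitoring unit (hereinafter the monitoring unit) 51. The monitoring unit 51 includes multiple network input ports 5b connected to the BMCs 4 over the network. Each network input port 5b receives the reports of the BMC of one processing server, so the number of network input ports 5b equals the number of processing servers in server cluster 1, and the ports are, for example, connected one-to-one with the EWS interfaces 41 of the BMCs. The monitoring unit 51 also includes one network input/output port 5a connected to the server cluster over the network. Through port 5a, the monitoring unit 51 connects to the network ports of all processing servers to test whether each processing server 11-14 can provide network services normally to the outside, for example to the client 2. If every processing server can connect normally to the monitoring unit 51, the monitoring unit 51 disconnects from the servers after a predetermined time (for example, 1 second), waits for a predetermined period (for example, 1 hour), and then tries to connect to each processing server again to test whether they can all still provide network services normally. In this way server cluster 1 is monitored.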
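When one or more processing servers in server cluster 1 fail, the monitoring unit 51 detects that the cluster contains processing servers that cannot provide network services normally. It then connects to each BMC 4 at that BMC's network address and instructs it to report its current status information, for example by issuing a report command through the EWS interface 41 provided by the BMC 4. Once the monitoring unit 51 has received the current status information reported by the BMC of each processing server, it records this information and displays it in real time on the display of the monitoring terminal 5 for the staff monitoring the cluster, such as the server administrator, who can then use the status records to identify the failed processing server and repair it.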
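Fig. 3 shows the operating method of the monitoring system of Fig. 2. The monitoring system monitors the operation of server cluster 1 at regular intervals and, when an abnormality occurs in the cluster, lets its monitoring staff or administrators obtain the system status information immediately so that they can act in real time. Specifically, as shown in step 101 of Fig. 3, the monitoring terminal 5 periodically connects to server cluster 1 over the network: the monitoring unit 51 of the monitoring terminal 5 uses the shared network address of server cluster 1 to connect, through its network input/output port 5a, to each processing server at regular intervals (for example, every hour) and test whether the network service provided by server cluster 1 is normal.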
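As shown in step 102 of Fig. 3, if none of the network services provided by the processing servers in server cluster 1 is abnormal, the monitoring terminal 5 disconnects from the processing servers. As shown in step 103 of Fig. 3, it then waits until the current period ends and performs step 101 again, connecting once more through its network input/output port 5a to each processing server of server cluster 1, and thereby monitors the operation of the cluster at regular intervals.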
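Conversely, if in step 102 the monitoring unit 51 finds that it cannot establish a normal network connection with one or more processing servers in server cluster 1, the cluster contains a failed processing server. In step 104 of Fig. 3, the monitoring unit 51 of the monitoring terminal 5 then connects to the BMC of each processing server through the network input ports 5b, which are connected to the EWS interfaces 41 of the baseboard management controllers, and instructs each BMC 4 to report the current status information of its own processing server. When the monitoring terminal 5 receives the status information returned by the BMCs, the monitoring unit 51 records each processing server's fault condition and current status information and displays them in real time on the display of the monitoring terminal 5. The monitoring staff can thus see the status of every processing server in server cluster 1 in real time on the monitoring terminal 5 and either deal with the failed processing server at once or analyze why it failed.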
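Steps 101 to 104 of Fig. 3 can be sketched as a small polling loop. This is an illustration under stated assumptions rather than the patented implementation: `probe_service` and `query_bmc` are hypothetical callables standing in for the connection test over port 5a and the EWS report request over ports 5b, and the status values in the demo are invented.

```python
import time


def monitor_once(servers, probe_service, query_bmc):
    """One pass of Fig. 3: probe every server, query all BMCs if any probe fails."""
    # Steps 101/102: test the network service of each processing server
    unreachable = [s for s in servers if not probe_service(s)]
    if not unreachable:
        return {"healthy": True, "unreachable": [], "status": {}}
    # Step 104: ask the BMC of every processing server for its current status
    status = {s: query_bmc(s) for s in servers}
    return {"healthy": False, "unreachable": unreachable, "status": status}


def run(servers, probe_service, query_bmc, interval_s=3600, cycles=1, sleep=time.sleep):
    """Repeat the check; step 103 is the wait between two passes."""
    reports = []
    for i in range(cycles):
        reports.append(monitor_once(servers, probe_service, query_bmc))
        if i + 1 < cycles:
            sleep(interval_s)
    return reports


# Demo with stand-in callables (no real network access)
down = {"server12"}
report = monitor_once(
    ["server11", "server12"],
    probe_service=lambda s: s not in down,
    query_bmc=lambda s: {"os_state": "hung" if s in down else "running"},
)
print(report["unreachable"], report["status"])
```

Passing `sleep` as a parameter keeps the one-hour wait out of tests, and a real deployment would replace the two lambdas with actual network calls.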
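In summary, in the above embodiment the monitoring terminal 5 actively and periodically connects to server cluster 1 to test its overall operation. When it finds that the external network connection of a processing server in the cluster is abnormal, the monitoring unit 51 of the monitoring terminal 5 actively asks the BMC 4 of each processing server for that server's current status information. After an abnormality occurs, the monitoring staff can therefore see the state of each processing server in server cluster 1 promptly and clearly and take the corresponding measures at once, which improves the efficiency of monitoring server cluster 1. The invention thus provides fault monitoring for a large server cluster 1 at low hardware and labor cost, which removes a technical obstacle to using server clusters to process task requests in remote education systems.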
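Further improvements on the above embodiment are described below as preferred embodiments. Their purpose is to reduce the hardware cost of the monitoring system further, namely to reduce the number of network input ports 5b in the monitoring terminal 5 that are connected to the EWS interfaces 41 of the processing servers' BMCs 4, without impairing the monitoring of server cluster 1.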
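Fig. 4 shows a second embodiment of the invention. Compared with the above embodiment, the monitoring system for server cluster 1 according to the second embodiment includes multiple monitoring proxy servers 61-62. Fig. 4 shows two by way of example, although those skilled in the art will understand that the number is not limited to two. One monitoring proxy server 61 or 62 serves several processing servers in server cluster 1 at once and acts as the transmission interface between those processing servers and the single monitoring terminal 5. The number of network input ports 5b in the monitoring terminal 5 therefore only needs to match the number of monitoring proxy servers, which greatly reduces the hardware requirements and cost of the monitoring terminal. At the same time, because fewer servers communicate directly with the monitoring terminal, the problem of the terminal falling behind or its processor being overloaded when too many processing servers report simultaneously is avoided, so that server cluster 1 can be monitored normally even with a monitoring terminal of lower processing speed. Apart from the monitoring proxy servers, the components of the second embodiment have essentially the same structure as those of the first embodiment. Components with essentially the same function as in the first embodiment are therefore given the same reference numerals here.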
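As shown in Fig. 4, the monitoring system according to the second embodiment mainly includes server cluster 1 with multiple processing servers 11-14, 11a-14a etc., two monitoring proxy servers 61 and 62, and the monitoring terminal 5. As shown in the figure, processing servers 11-14 are each connected to monitoring proxy server 61, while processing servers 11a-14a are each connected to monitoring proxy server 62, which enables information transfer between each monitoring proxy server and the EWS interfaces 41 of the BMCs 4 of its processing servers. Monitoring proxy servers 61 and 62 are each connected to the monitoring terminal 5 and can thus forward the status information received from the processing servers to it.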
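In this embodiment, there are fewer monitoring proxy servers than processing servers 11-14 and 11a-14a to be monitored in server cluster 1, and each monitoring proxy server can serve several processing servers. This greatly reduces the number of network input ports of the monitoring terminal 5 and the processing capacity required of it, without affecting the monitoring function. In the example of Fig. 4, the monitoring system has 8 processing servers 11-14 and 11a-14a, served by the two monitoring proxy servers 61-62. The monitoring terminal 5 therefore needs only two network input ports instead of 8, which effectively reduces its load and hardware cost.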
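After monitoring proxy servers 61 and 62 start, they first send a signal to the monitoring terminal 5 to register with its network input ports 5b. The processing servers to be monitored in server cluster 1 must likewise first register with the monitoring terminal 5 and, once registered, accept the assignment made by the monitoring terminal 5. In this way each processing server learns which monitoring proxy server it must use to pass its status information to the monitoring terminal. In this embodiment, a processing server sends its status information, such as temperature, power mode, cooling fan speed, operating system state, or the state of other hardware devices, to its assigned monitoring proxy server 61 or 62, which buffers the information, sorts it by priority, and then feeds it in order to the monitoring terminal 5 for display and analysis.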
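Monitoring proxy server 61 includes an information input unit 61a, a data buffer pool 61b, an information output unit 61c, and a control unit 61d. The information input unit 61a, data buffer pool 61b, and information output unit 61c are connected in sequence, and the control unit 61d is connected to each of them. When monitoring proxy server 61 receives status information from a processing server assigned to it, the information input unit first passes the information to the data buffer pool for buffering and notifies the control unit that input has been received.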
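The data buffer pool mainly comprises a queue and a local database. The queue orders the incoming status information awaiting processing, and the local database temporarily stores status information until it has been successfully delivered to the monitoring terminal 5. After receiving the notification from the information input unit, the control unit sorts the status information of the processing servers in the queue according to a predetermined rule, so that it is delivered to the monitoring terminal in an orderly way.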
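It is practically impossible for all the processing servers in server cluster 1 to fail or become abnormal at the same time. In actual use they fail at different times, depending on each server's service life and load. It is therefore desirable to prioritize the transmission of status information by probability of failure: the status information of the processing server most likely to have failed is sent first, and that of servers unlikely or very unlikely to have failed is sent later or last. Because the status information of the processing servers is sorted and that of the most likely failed server is sent first, the monitoring terminal identifies and responds to a failed processing server faster than if all processing servers reported at once.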
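To this end, as a preferred sorting rule, the control unit in this application records the mean time between failures of the processing servers and a reliability index for each processing server, uses them to estimate how likely each server is to have failed, and ranks the transmission priority of the status information accordingly. Specifically, the control unit records each processing server's mean time between failures (MTBF) and its reliability index (set by the administrator on the basis of the server's computing capacity and storage space). A weighting factor is assigned to each of the MTBF and the reliability index, the weights summing to 1. For example, the MTBF weight may be 0.6 and the reliability index weight 0.4. From the time server cluster 1 has been running, the control unit calculates for each processing server the difference between its MTBF and the elapsed running time, multiplies the difference by the MTBF weight, and adds the reliability index value to obtain the server's current failure risk factor. For example, suppose processing server 11 has an MTBF of 1000 hours and processing server 12 an MTBF of 800 hours, and a processing server fails when server cluster 1 has been running for 700 hours. If the two servers are about equally reliable, the control unit judges that processing server 12 is more likely to have failed than processing server 11. When the monitoring terminal 5 asks the BMCs 4 of the processing servers to report their status information, the control unit in monitoring proxy server 61 therefore ranks the status information of processing server 12 in the queue ahead of that of processing server 11. Likewise, if processing servers 11 and 12 have nearly the same MTBF but processing server 12 has greater computing capacity and storage space than processing server 11 (so that it is less likely to go down through data overflow or insufficient computing power), then processing server 11 is the more likely to have failed, and the control unit ranks the status information of processing server 11 ahead of that of processing server 12.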
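The priority rule can be sketched as a small scoring function. Several points here are assumptions, not statements from the description: the score is read as "lower means closer to the expected failure, so report first", the reliability index is multiplied by its own weight (the text only spells out the MTBF product), and the reliability values and the name `ServerProfile` are hypothetical.

```python
from dataclasses import dataclass

W_MTBF = 0.6         # example weight from the description
W_RELIABILITY = 0.4  # example weight from the description


@dataclass
class ServerProfile:
    name: str
    mtbf_hours: float
    reliability_index: float  # set by the administrator; scale is an assumption


def risk_score(profile, elapsed_hours):
    """Assumed convention: a lower score means the server is more likely to have failed."""
    margin = profile.mtbf_hours - elapsed_hours  # hours left before the expected failure
    return W_MTBF * margin + W_RELIABILITY * profile.reliability_index


def prioritize(profiles, elapsed_hours):
    """Order servers so that the most failure-prone one reports first."""
    return sorted(profiles, key=lambda p: risk_score(p, elapsed_hours))


# The MTBF example from the text: 1000 h vs. 800 h after 700 h of operation,
# with equal (hypothetical) reliability indices.
order = prioritize(
    [ServerProfile("server11", 1000, 50), ServerProfile("server12", 800, 50)],
    elapsed_hours=700,
)
print([p.name for p in order])
```

For the 700-hour example this places server 12 ahead of server 11, matching the ordering stated in the text for equally reliable servers.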
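Once the control unit has sorted the status information of the processing servers in the queue, the prioritized status information can be stored temporarily in the local database of the data buffer pool so that it is not lost before being sent. This design improves the security of the monitoring system and preserves data integrity. When the control unit then detects that network input port 5b of the monitoring terminal 5 can accept status information, it issues an instruction allowing the status information stored in the local database to be sent through the information output unit to network input port 5b, which completes the transmission.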
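The interplay of queue, local database and output unit can be sketched as follows. This is a simplified in-memory model under assumptions: the class name `MonitoringProxy`, the `priority_of` callback and the `terminal_ready` and `send` callables are hypothetical, and a real local database would persist to disk.

```python
class MonitoringProxy:
    """Simplified model of units 61a-61d: receive, sort, store, forward."""

    def __init__(self, priority_of):
        self.priority_of = priority_of  # lower value means "send earlier" (assumption)
        self.queue = []      # incoming status reports, not yet sorted
        self.local_db = []   # sorted reports kept until delivery

    def receive(self, server, status):
        """Information input unit: buffer a report from a processing server."""
        self.queue.append((server, status))

    def sort_and_store(self):
        """Control unit: sort the queue by priority, then keep it in the local store."""
        self.queue.sort(key=lambda item: self.priority_of(item[0]))
        self.local_db.extend(self.queue)
        self.queue.clear()

    def flush(self, terminal_ready, send):
        """Information output unit: forward reports while the terminal accepts them."""
        sent = 0
        while self.local_db and terminal_ready():
            server, status = self.local_db[0]
            send(server, status)
            self.local_db.pop(0)  # removed only after send() has returned
            sent += 1
        return sent


ranks = {"server11": 2, "server12": 1}  # hypothetical priorities
proxy = MonitoringProxy(priority_of=ranks.get)
proxy.receive("server11", {"fan_rpm": 5200})
proxy.receive("server12", {"fan_rpm": 0})
proxy.sort_and_store()
delivered = []
sent = proxy.flush(lambda: True, lambda server, status: delivered.append(server))
print(sent, delivered)
```

Removing a record only after `send` returns mirrors the stated purpose of the local database, which is to keep status information from being lost before it reaches the terminal.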
The operation of the second embodiment is described in detail below.
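First, the monitoring proxy servers and processing servers are registered and the processing servers are assigned as described above, so that each processing server knows which monitoring proxy server it uses to return status information to the monitoring terminal 5. The monitoring terminal 5 then connects to server cluster 1 at regular intervals to test whether the network service it provides is normal. If none of the network services provided by the processing servers is abnormal, the monitoring terminal 5 disconnects from the processing servers, waits for a period, and connects to each processing server of server cluster 1 again, thereby monitoring the operation of the cluster at regular intervals.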
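When the monitoring terminal 5 finds that it cannot establish a normal network connection with one or more processing servers in server cluster 1, it asks the BMCs 4 to report the current status information of their processing servers. When the BMCs of the processing servers report, the status information is first sent to the information input unit of the assigned monitoring proxy server, which, on receiving it, places it in the queue of the data buffer pool to await priority sorting by the control unit. After receiving the notification from the information input unit, the control unit sorts the status information of the processing servers in the queue according to the predetermined rule, and the sorted status information is stored temporarily in the local database. When the control unit detects that the network input port of the monitoring terminal can accept status information, it issues an instruction allowing the status information stored in the local database to be sent through the information output unit to that port, which completes the transmission.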
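As can be seen, compared with the first embodiment, the second embodiment uses at least one monitoring proxy server as the transmission interface between multiple processing servers and the monitoring terminal, which keeps down both the number of network input ports of the monitoring terminal and the processing capacity required of it. At the same time, because fewer servers communicate with the monitoring terminal, the problem of the terminal falling behind or its processor being overloaded when too many processing servers report simultaneously is avoided, so that the server cluster can be monitored normally even with a monitoring terminal of lower processing speed.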
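Figs. 5 and 6 show a third embodiment of the invention, which further reduces the hardware requirements of the monitoring terminal 5 relative to the above embodiments. The third embodiment takes full advantage of the fact that the processing servers in the server cluster are arranged compactly in space: short-range wireless communication between the processing servers forms a closed data communication link, so that multiple processing servers are combined into a server monitoring group with a common output port and monitored as a group. The processing servers in server cluster 1 are thereby divided into several monitoring groups, each with a common output port, which reduces the number of network input ports of the monitoring terminal and the processing required of it.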
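Fig. 5 shows a processing server of the server cluster according to the third embodiment. In the third embodiment, a processing server includes the BMC 4 and a wireless communication unit 8 that communicates with it. The wireless communication unit 8 may be an NFC (Near Field Communication) module or another short-range communication module, and includes a transmitting part 81 and a receiving part 82. The BMC 4 includes a setting part, a monitoring part, and a status information output part. By way of example, the setting part and the other parts of the BMC may be executed by the BMC's central processing unit.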
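As shown in Fig. 6, in this embodiment, by way of example, five processing servers of server cluster 1, namely processing servers 11, 12, 13, 14 and 15, form a server monitoring group 7 with a common output port 71. Those skilled in the art will appreciate that the number of processing servers in a server monitoring group is not limited to five and may be set higher or lower as required. The whole of server cluster 1 can thus be divided into multiple server monitoring groups.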
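In this embodiment, the setting part in each processing server's BMC sets the name of the server monitoring group to which the processing server belongs and the server's upstream and downstream communication neighbors. In the server monitoring group shown in Fig. 6, every server is given the group name Group 1. Processing server 11 is the first processing server, with processing server 15 as its upstream neighbor and processing server 12 as its downstream neighbor; processing server 12 is the second, with upstream neighbor 11 and downstream neighbor 13; processing server 13 is the third, with upstream neighbor 12 and downstream neighbor 14; processing server 14 is the fourth, with upstream neighbor 13 and downstream neighbor 15; and processing server 15 is the fifth, with upstream neighbor 14 and downstream neighbor 11. With these settings made by the setting parts in the BMCs of processing servers 11-15, the five processing servers are gathered in the same monitoring group, Group 1, and form the complete closed-loop communication chain shown in Fig. 6. In this embodiment, processing servers 11, 12, 13, 14 and 15 communicate with one another through their respective wireless communication units 8.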
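The upstream and downstream settings above follow a simple ring pattern, which the sketch below generates from an ordered list of servers. The function name `build_ring` and the dictionary layout are illustrative assumptions, since the description does not say how the setting part stores these values.

```python
def build_ring(group_name, servers):
    """Return, per server, its group name and its upstream/downstream neighbours."""
    n = len(servers)
    return {
        server: {
            "group": group_name,
            "upstream": servers[(i - 1) % n],    # previous server, wrapping around
            "downstream": servers[(i + 1) % n],  # next server, wrapping around
        }
        for i, server in enumerate(servers)
    }


ring = build_ring("Group 1", ["server11", "server12", "server13", "server14", "server15"])
print(ring["server11"])
print(ring["server15"])
```

For the five servers of Fig. 6 this yields server 15 as the upstream neighbor of server 11 and server 11 as the downstream neighbor of server 15, which closes the loop.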
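When the monitoring terminal 5 finds that it cannot establish a normal network connection with one or more processing servers in the server cluster, it asks the BMCs 4 to report the current status information of their processing servers. In this embodiment, the BMC of each processing server then begins a self-check. Specifically, the monitoring part in each BMC 4 compares the monitored values, such as temperature, power mode, cooling fan speed, operating system state, or the state of other hardware devices, with preset status thresholds for normal operation. Depending on whether these values exceed the thresholds for normal operation, the monitoring part issues the corresponding operating instruction to the wireless communication unit 8.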
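The self-check can be pictured as a range comparison followed by a command to the radio. The sketch below is only an illustration: the threshold values, the reading names and the command strings are hypothetical, and the description does not define how thresholds are represented.

```python
# Hypothetical normal-operation ranges: (lowest allowed, highest allowed)
NORMAL_RANGES = {
    "temperature_c": (5.0, 85.0),
    "fan_rpm": (1000.0, 20000.0),
}


def exceeded(readings, normal_ranges=NORMAL_RANGES):
    """Return the names of all readings that fall outside their normal range."""
    return [
        name
        for name, value in readings.items()
        if name in normal_ranges
        and not (normal_ranges[name][0] <= value <= normal_ranges[name][1])
    ]


def radio_instruction(readings, normal_ranges=NORMAL_RANGES):
    """Monitoring part: decide which instruction goes to the wireless communication unit."""
    return "stop" if exceeded(readings, normal_ranges) else "transmit_and_receive"


print(radio_instruction({"temperature_c": 45.0, "fan_rpm": 6000.0}))
print(radio_instruction({"temperature_c": 92.0, "fan_rpm": 6000.0}))
```

A server whose readings stay in range keeps its radio active, while a single out-of-range reading is enough to silence it and so break the loop.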
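In the wireless communication unit, each transmitting part 81 sends information to the upstream and downstream communication neighbors in the same monitoring group, and each receiving part 82 receives information from them. The transmitting part 81 also sends a response to the upstream or downstream neighbor after the receiving part 82 has received information from it.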
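In use, when the monitoring part of every processing server in a server monitoring group finds that none of its own server's status values exceeds the status thresholds for normal operation, each monitoring part instructs its wireless communication unit 8 to transmit and receive information normally. In this case the complete closed-loop communication chain of the server monitoring group is unbroken, which indicates that all processing servers in the group are working normally.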
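When the monitoring part of one or more processing servers in a server monitoring group finds that its own server's status values exceed the status thresholds for normal operation, it instructs its wireless communication unit to stop transmitting and receiving information. In this case the closed-loop communication chain of the server monitoring group is broken, for example at the first processing server 11 and the second processing server 12, which indicates that the group contains a faulty processing server. Because every processing server has its own upstream and downstream communication neighbors, the failed or abnormal processing server in the group can easily be located from the break point of the closed-loop communication chain.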
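How the break point identifies the faulty server is not spelled out in the description. One plausible reading, sketched here purely as an assumption, is that every server still transmitting notices which of its two neighbors no longer answers. The `responds` predicate simulates the radio state, and the ring table repeats the Fig. 6 settings.

```python
# Upstream/downstream settings of Group 1 as given for Fig. 6
RING = {
    "server11": {"upstream": "server15", "downstream": "server12"},
    "server12": {"upstream": "server11", "downstream": "server13"},
    "server13": {"upstream": "server12", "downstream": "server14"},
    "server14": {"upstream": "server13", "downstream": "server15"},
    "server15": {"upstream": "server14", "downstream": "server11"},
}


def find_break_points(ring, responds):
    """Return the servers that at least one healthy neighbour cannot reach."""
    suspects = []
    for server, cfg in ring.items():
        if not responds(server):
            continue  # a silent server cannot report on its neighbours
        for neighbour in (cfg["upstream"], cfg["downstream"]):
            if not responds(neighbour) and neighbour not in suspects:
                suspects.append(neighbour)
    return suspects


silent = {"server12"}  # simulated: server 12 has switched its radio off
faulty = find_break_points(RING, lambda s: s not in silent)
print(faulty)
```

Because each server has exactly two neighbors, a silent server is flagged as long as at least one neighbor is still healthy. If every server in the group is silent, nothing is reported, which is a limitation of this sketch.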
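Once the failed processing server has been identified from the break point of the closed-loop communication chain, the common output port 71 of the server monitoring group issues an instruction to the monitoring part of that server. On receiving the instruction from the common output port 71, the monitoring part directs its status information output part to send the status information of its BMC to the common output port. The common output port receives the status information of the failed server's BMC and passes it to the monitoring terminal through network input port 5b, which completes the monitoring of the server monitoring group.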
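As can be seen, compared with the first and second embodiments, the third embodiment merely adds a simple short-range communication unit to the BMC in order to divide the processing servers of server cluster 1 into groups and quickly identify the failed server in each server monitoring group. Specifically, in the third embodiment the number of network input ports 5b equals the number of server monitoring groups, which ranges from a fraction of the number of processing servers down to one part in several tens. Compared with the first embodiment, this greatly reduces the number of network input ports 5b of the monitoring terminal 5 and the processing capacity required of it, and ultimately lowers the hardware cost of the monitoring system without impairing the monitoring of server cluster 1.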
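Compared with the second embodiment, the monitoring proxy servers are eliminated, which also lowers the hardware cost of the monitoring system without impairing the monitoring of server cluster 1.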
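The technical solutions of the invention have been described in detail above with reference to the drawings. Although preferred embodiments of the invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they understand the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.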
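The above are merely preferred embodiments of the invention and are not intended to limit it. Those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.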
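Industrial Applicability
The system of the invention can actively ask the baseboard management controller for the current status information of its own processing server, so that after an abnormality occurs in the server cluster its administrators can see the state of the cluster promptly and clearly and take the corresponding measures at once, which improves the efficiency of managing the server cluster.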

Claims (10)

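  1. A monitoring system for a server cluster of a remote education system, the server cluster including multiple processing servers each of which includes a baseboard management controller, characterized in that: the server cluster further includes an allocation unit connected to the multiple processing servers for data transfer over the Transmission Control Protocol, the allocation unit building an allocation matrix to dynamically manage the number of idle processes on each processing server; the allocation unit periodically checks whether there are pending task requests and, if so, checks the number of idle processes of each processing server in the allocation matrix and, if an idle processing server exists, assigns the pending tasks to the idle processing server;
    the monitoring system includes a monitoring terminal containing a real-time server status monitoring unit that can connect over a network to the processing servers in the server cluster; the real-time server status monitoring unit of the monitoring terminal periodically connects to the processing servers and, on finding that a network connection to a processing server is abnormal, asks the baseboard management controller of each processing server to report the current status information of that processing server.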
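  2. The monitoring system according to claim 1, wherein
    the multiple processing servers in the server cluster have one and the same network address for jointly providing a network service to the outside, and the baseboard management controllers have network addresses that differ from one another; the real-time server status monitoring unit connects to the processing servers at their shared network address to test the network service, and connects to each baseboard management controller at that controller's network address to ask it to report current status information.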
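  3. The monitoring system according to claim 1 or 2, wherein
    the real-time server status monitoring unit includes: at least one network input port connected over a network to the baseboard management controllers, the network input port receiving the reports of the baseboard management controller of at least one processing server; and a network input/output port connected over a network to the server cluster.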
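  4. The monitoring system according to claim 2 or 3, wherein
    the number of network input ports equals the number of processing servers in the server cluster, and each network input port is in communication with the baseboard management controller of one processing server.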
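  5. The monitoring system according to claim 2 or 3, wherein
    the monitoring system further includes at least one monitoring proxy server configured to connect to multiple processing servers, to receive the status information of the processing servers connected to it, and to forward the received status information to the monitoring terminal, wherein the number of network input ports equals the number of monitoring proxy servers and each network input port is in communication with one monitoring proxy server.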
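  6. The monitoring system according to claim 5, wherein
    the monitoring proxy server includes an information input unit, a data buffer pool, and an information output unit connected in sequence, and a control unit connected to each of the information input unit, the data buffer pool, and the information output unit, wherein the information input unit receives status information from the processing servers connected to it and passes the status information to the data buffer pool, and the control unit sorts the status information of the multiple processing servers in the data buffer pool by priority and sends the sorted status information in order to the monitoring terminal through the information output unit.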
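  7. The monitoring system according to claim 6, wherein
    the data buffer pool includes a queue and a local database, the queue being configured to order the status information of the multiple processing servers and the local database being configured to store the status information temporarily until it has been sent to the monitoring terminal, wherein the control unit sorts the status information in the queue by priority according to the mean time between failures of the processing servers and the reliability index of each server.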
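  8. The monitoring system according to claim 2 or 3, wherein
    the monitoring system further includes a server monitoring group that is formed by multiple processing servers of the server cluster and has a common output port, the common output port being configured to output the status information of a failed server, wherein the number of network input ports equals the number of server monitoring groups and each network input port is in communication with the common output port of one server monitoring group.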
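  9. The monitoring system according to claim 8, wherein
    each processing server in the same server monitoring group has a wireless communication unit in communication with its baseboard management controller, and each baseboard management controller has a setting part, a monitoring part, and a status information output part; the setting part sets the name of the server monitoring group to which the processing server belongs and the processing server's upstream and downstream communication neighbors; the wireless communication unit includes a transmitting part and a receiving part, the transmitting part and receiving part of each processing server being configured to communicate with its upstream and downstream communication neighbors so as to form a complete closed-loop communication chain within the monitoring group.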
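  10. The monitoring system according to claim 9, wherein
    the monitoring part is configured to determine whether the status information of the processing server exceeds a status threshold and, when it does, to instruct the wireless communication unit to stop transmitting and receiving information so that the closed-loop communication chain is broken; the common output port issues an instruction to the processing server at the break point of the closed-loop communication chain, so that its status information output part sends the status information of its baseboard management controller to the common output port, which forwards it to the monitoring terminal.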
PCT/CN2017/114405 2017-11-17 2017-12-04 一种远程教育系统的服务器群的监测系统 WO2019095448A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711142005.8 2017-11-17
CN201711142005.8A CN109800120B (zh) 2017-11-17 2017-11-17 一种远程教育系统的服务器群的监测系统

Publications (1)

Publication Number Publication Date
WO2019095448A1 true WO2019095448A1 (zh) 2019-05-23

Family

ID=66538357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/114405 WO2019095448A1 (zh) 2017-11-17 2017-12-04 一种远程教育系统的服务器群的监测系统

Country Status (2)

Country Link
CN (1) CN109800120B (zh)
WO (1) WO2019095448A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112512027A (zh) * 2020-12-25 2021-03-16 浪潮电子信息产业股份有限公司 一种服务器及通信方法、系统、计算机可读存储介质
CN115858293A (zh) * 2022-12-13 2023-03-28 支付宝(杭州)信息技术有限公司 基于虚拟化的数据处理方法、装置及系统
CN116205766A (zh) * 2023-04-28 2023-06-02 北京华医网科技股份有限公司 一种应用于医学继教的大数据分析方法和系统

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035231A (zh) * 2020-09-01 2020-12-04 中国银行股份有限公司 一种数据处理系统、方法及服务器群
CN112035259A (zh) * 2020-09-01 2020-12-04 中国银行股份有限公司 一种数据处理系统、方法及服务器群
CN117354075B (zh) * 2023-12-05 2024-03-15 广州炫视智能科技有限公司 一种多用户交互方法及交互系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101004743A (zh) * 2006-01-21 2007-07-25 鸿富锦精密工业(深圳)有限公司 分布式文档转换系统及方法
CN102404160A (zh) * 2010-09-13 2012-04-04 中国移动通信集团福建有限公司 智能监控实现方法和系统
CN102523234A (zh) * 2011-12-29 2012-06-27 山东中创软件工程股份有限公司 一种应用服务器集群实现方法及系统
CN103237838A (zh) * 2010-12-07 2013-08-07 巴斯夫欧洲公司 包含纳米多孔填料的三聚氰胺树脂泡沫
US20140068033A1 (en) * 2012-09-05 2014-03-06 John Berger Systems, methods, and articles of manufacture to manage alarm configurations of servers
CN106021070A (zh) * 2016-04-29 2016-10-12 乐视控股(北京)有限公司 服务器集群监测方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103227838B (zh) * 2013-05-10 2015-09-30 中国工商银行股份有限公司 一种多重负载均衡处理装置与方法
CN105184498A (zh) * 2015-09-18 2015-12-23 成都虹昇光电科技有限公司 一种教育装备管理平台

Also Published As

Publication number Publication date
CN109800120B (zh) 2020-12-08
CN109800120A (zh) 2019-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17932106

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17932106

Country of ref document: EP

Kind code of ref document: A1