CN108268355A - Monitoring system and method for a data center - Google Patents

Monitoring system and method for a data center

Info

Publication number
CN108268355A
CN108268355A CN201611268506.6A CN201611268506A CN108268355A CN 108268355 A CN108268355 A CN 108268355A CN 201611268506 A CN201611268506 A CN 201611268506A CN 108268355 A CN108268355 A CN 108268355A
Authority
CN
China
Prior art keywords
monitoring
level
monitoring data
convergence
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611268506.6A
Other languages
Chinese (zh)
Inventor
曾祥洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Sichuan Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Sichuan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Sichuan Co Ltd
Priority to CN201611268506.6A
Publication of CN108268355A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data

Abstract

This application relates to a monitoring system and method for a data center. The monitoring system includes a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, and the monitoring data convergence layer has a multi-layer structure. The monitoring resource layer includes multiple monitoring resource groups; each monitoring resource group collects monitoring data according to rules issued by the configuration center and reports the monitoring data to the monitoring data convergence layer. The monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, classifies it, and performs convergence processing on each class of monitoring data. The monitoring center stores the monitoring data. The configuration center configures the grouping information of the monitoring resource groups, the convergence node grouping information of the monitoring data convergence layer, and the convergence strategy, and issues them to the monitoring resource layer and the monitoring data convergence layer.

Description

Monitoring system and method for a data center
Technical field
The present invention relates generally to the field of computer and network technology, and more particularly to a monitoring system and method for a data center.
Background art
With the continuous advance of information technology in society and the rapid spread of Internet technology, the number of computing devices keeps growing. In the near future, an ultra-large-scale data center is expected to involve hundreds of thousands or even millions of devices. The volume of data to be processed therefore keeps growing, and so does the demand for massive data processing in every field. Because the storage space and computing power of a single machine can no longer meet this demand, distributed computing and parallel computing have developed rapidly and been widely applied, eventually evolving into grid computing. The monitoring information of a large-scale distributed system is massive, and the monitored resources are multi-level and multi-source; the dynamic nature and complexity of a big data platform bring many difficulties to its monitoring system.
Summary of the invention
According to one aspect of the present application, a monitoring system for a data center is provided. The monitoring system includes a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, and the monitoring data convergence layer has a multi-layer structure. The monitoring resource layer includes multiple monitoring resource groups; each monitoring resource group collects monitoring data according to rules issued by the configuration center and reports the monitoring data to the monitoring data convergence layer. The monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, classifies it, and performs convergence processing on each class of monitoring data. The monitoring center stores the monitoring data. The configuration center configures the grouping information of the monitoring resource groups, the convergence node grouping information of the monitoring data convergence layer, and the convergence strategy, and issues them to the monitoring resource layer and the monitoring data convergence layer.
According to another aspect of the present application, a monitoring method for a data center is provided. The monitoring method is performed in a monitoring system that includes a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, where the monitoring data convergence layer has a multi-layer structure. The monitoring method includes: the monitoring resource layer collects monitoring data and reports it to the monitoring data convergence layer, where the monitoring resource layer includes multiple monitoring resource groups and each monitoring resource group collects monitoring data according to rules issued by the configuration center; the monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, classifies it, and performs convergence processing on each class of monitoring data; the monitoring center stores the monitoring data; and the configuration center configures the grouping information of the monitoring resource groups, the convergence node grouping information of the monitoring data convergence layer, and the convergence strategy, and issues them to the monitoring resource layer and the monitoring data convergence layer.
The monitoring system and method for a data center according to the embodiments of the present application provide a technical solution that can reduce the pressure on the monitoring center.
Brief description of the drawings
The present application may be better understood from the following description of its embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a structural diagram of a monitoring system for a data center according to one embodiment of the present application;
Fig. 2 shows a flowchart of a monitoring method for a data center according to one embodiment of the present application;
Fig. 3 shows an operational flowchart of the monitoring center according to one embodiment of the present application; and
Fig. 4 shows a structural diagram of an information processing device that can implement one or more of the monitoring resource layer nodes, monitoring data convergence layer nodes, monitoring center, and configuration center of the present application.
Detailed description
The features and exemplary embodiments of various aspects of the present application are described in detail below. The following description covers many specific details in order to provide a thorough understanding of the application. However, it will be apparent to those skilled in the art that the application may be practiced without some of these specific details. The description of the embodiments below is intended only to provide a clearer understanding of the application by showing examples of it. The application is in no way limited to any specific configuration set forth below, but covers any modification, substitution, and improvement of the relevant features, structures, operations, and so on without departing from the spirit of the application.
Monitoring is an important component of a big data platform. As the volume of monitoring data grows substantially, current systems and methods can no longer meet the monitoring needs of increasingly large data centers. This leads to problems such as monitoring delay, partial loss of collected data, and persistently high resource consumption on monitoring center equipment, so that the data cannot be monitored effectively.
In addition, when the server cluster of a big data system collects, analyzes, and summarizes software and hardware information, the large number of servers, the many types of deployed software, and the excessive complexity of the relevant indicators may cause problems such as a heavy abnormal-alarm workload and inefficient monitoring and alarming.
The present application adopts a multi-layer monitoring structure: each layer converges its corresponding resources and analyzes and processes them, which greatly reduces the pressure on the monitoring center and achieves effective monitoring of big data.
Fig. 1 shows a structural diagram of a monitoring system 100 for a data center according to one embodiment of the present application.
In one embodiment, the system 100 may include a monitoring resource layer 102, a monitoring data convergence layer 101, a monitoring center 103, and a configuration center 104. In one embodiment, the monitoring data convergence layer 101 has a multi-layer structure. In one embodiment, depending on the scale of the data center and the amount of data to be monitored, the monitoring data convergence layer may be divided into N layers according to business, region, network conditions, and so on, where N is a positive integer; for example, as shown in Fig. 1, monitoring data convergence layer one 101_1, monitoring data convergence layer two 101_2, monitoring data convergence layer three 101_3, ..., and monitoring data convergence layer N 101_N. The present application does not limit the number of monitoring data convergence layers. However, each additional data processing layer adds monitoring delay, so the number of layers needs to be set according to the traffic volume, business characteristics, and so on of the data center.
In one embodiment, the monitoring resource layer 102 may serve as the base layer of the system 100. In one embodiment, the monitoring resource layer 102 may include multiple monitoring resource groups 1021, and each monitoring resource group 1021 may collect monitoring data according to rules issued by the configuration center 104 and report the monitoring data to the monitoring data convergence layer 101.
In one embodiment, at least one monitoring resource group 1021 of the multiple monitoring resource groups 1021 may include resources that need to be monitored. A resource may be, for example, software, hardware, or an attribute of a combination thereof, such as a business attribute.
In one embodiment, a monitoring node may be deployed for each resource that needs to be monitored, in order to monitor the resource and collect data. In one embodiment, the monitoring node may be an agent.
In one embodiment, the monitoring data convergence layer 101 receives the monitoring data sent by the monitoring resource layer 102, classifies the monitoring data, and performs convergence processing on each class of monitoring data.
In one embodiment, the monitoring center 103 is used to store monitoring data.
In one embodiment, the configuration center 104 configures the grouping information of the monitoring resource groups, the convergence node grouping information of the monitoring data convergence layer, and the convergence strategy, and issues them to the monitoring resource layer and the monitoring data convergence layer.
In one embodiment, each monitoring node may report its collected data separately, for example to the first monitoring data convergence layer (e.g., monitoring data convergence layer one 101_1). In one embodiment, a main monitoring node may be selected from the monitoring nodes in each monitoring resource group according to a predetermined rule; the other monitoring nodes gather their collected data at the main monitoring node, and the main monitoring node reports the data. In one embodiment, different monitoring resource groups may use either of these two modes to report to the monitoring data convergence layer, according to a predetermined rule.
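The two reporting modes can be illustrated with a minimal Python sketch. The class `MonitoringNode`, the functions `report_direct` and `report_via_main`, and the sample values are hypothetical names chosen for illustration; the patent does not prescribe any data format or API.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringNode:
    """Hypothetical agent deployed on one monitored resource."""
    name: str
    samples: list = field(default_factory=list)


def report_direct(group):
    """Mode 1: every node reports its own samples (one message per node)."""
    return [(node.name, list(node.samples)) for node in group if node.samples]


def report_via_main(group, main_name):
    """Mode 2: nodes hand their samples to the main monitoring node,
    which sends a single merged message for the whole group."""
    merged = []
    for node in group:
        merged.extend(node.samples)
    return [(main_name, merged)]


group = [
    MonitoringNode("agent-a", [{"cpu": 0.41}]),
    MonitoringNode("agent-b", [{"cpu": 0.77}]),
]
direct = report_direct(group)                # two upstream messages
via_main = report_via_main(group, "agent-a")  # one upstream message
```

Reporting through a main node reduces the number of upstream connections at the cost of an extra hop inside the group, which is one reason a per-group choice between the modes may be useful.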
In one embodiment, the monitoring data convergence layer 101 may include a first monitoring data convergence layer and a second monitoring data convergence layer, for example monitoring data convergence layer one 101_1 and monitoring data convergence layer two 101_2 (not shown in Fig. 1). The second monitoring data convergence layer may be the upper convergence layer of the first monitoring data convergence layer.
In one embodiment, the first monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer 102, converges the monitoring data to obtain first monitoring information, and sends the first monitoring information to the second monitoring data convergence layer.
In one embodiment, the second monitoring data convergence layer receives the first monitoring information from the first monitoring data convergence layer, converges the first monitoring information to obtain second monitoring information, and sends the second monitoring information to the monitoring center 103 or to the upper convergence layer of the second monitoring data convergence layer (for example, monitoring data convergence layer three 101_3).
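A rough sketch of this layer-to-layer flow follows. The patent does not specify what "converging" computes, so the summary used here (maximum value and sample count per business and metric) is an assumption, as are the function `converge` and the field names.

```python
def sample(business, metric, value):
    """Hypothetical raw record as reported by the monitoring resource layer."""
    return {"business": business, "metric": metric, "max": value, "count": 1}


def converge(records):
    """Merge records sharing a (business, metric) key into one summary.
    The max/count summary is an illustrative assumption."""
    buckets = {}
    for rec in records:
        key = (rec["business"], rec["metric"])
        summary = buckets.setdefault(
            key,
            {"business": rec["business"], "metric": rec["metric"],
             "max": rec["max"], "count": 0},
        )
        summary["max"] = max(summary["max"], rec["max"])
        summary["count"] += rec["count"]
    return [buckets[key] for key in sorted(buckets)]


# Two first-layer convergence groups, e.g. one per region.
group_east = [sample("billing", "latency_ms", 120), sample("billing", "latency_ms", 180)]
group_west = [sample("billing", "latency_ms", 150), sample("search", "latency_ms", 40)]

# First layer: each group converges its own data (first monitoring information).
first_info = converge(group_east) + converge(group_west)
# Second layer: converges what the first layer sent up (second monitoring information).
second_info = converge(first_info)
```

Four raw records become three first-layer summaries and two second-layer summaries, so the volume reaching the monitoring center shrinks at each layer.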
In one embodiment, the monitoring center 103 may be responsible, for example, for one or more of the following: storing and analyzing monitoring information, analyzing the actions of the automatic system-fault response mechanism, issuing automatic fault-handling actions, and sending alarm information.
In one embodiment, the monitoring center 103 may include a monitoring data center 1031. In one embodiment, the monitoring data center 1031 may be used to store monitoring data. In one embodiment, all monitoring data is stored at the monitoring data center 1031. In one embodiment, part of the monitoring data is stored at the monitoring data center 1031.
In one embodiment, the monitoring center 103 may include a monitoring data analysis center 1032. In one embodiment, the monitoring data analysis center 1032 may be used to analyze monitoring data. In one embodiment, the monitoring data analysis center 1032 may perform one or more of the following: preliminary analysis and judgment of fault information, system performance information analysis, business performance status data analysis, resource trend analysis, device load capacity analysis, resource expansion demand analysis, and generation of various monitoring-related reports.
In one embodiment, the monitoring center 103 may include a monitoring notification and alarm center 1033. In one embodiment, the monitoring notification and alarm center 1033 may be responsible for sending the notification and alarm information of the monitoring system. In one embodiment, the monitoring notification and alarm center 1033 may send the relevant notification and alarm information to the relevant personnel according to the corresponding configuration.
In one embodiment, the monitoring center 103 may include a monitoring custom self-healing analysis center 1034. In one embodiment, the monitoring custom self-healing analysis center 1034 may be used to further analyze the faults preliminarily analyzed by the monitoring data analysis center 1032 and to generate instructions for resolving the corresponding faults (for example, device crashes, device system-level faults, business faults, and so on).
In one embodiment, the monitoring center 103 may include a monitoring custom action issuing center 1035. In one embodiment, the monitoring custom action issuing center 1035 issues the instructions generated by the monitoring custom self-healing analysis center 1034 to the corresponding device nodes in order to resolve the fault. In one embodiment, the issued instructions may include, but are not limited to: restarting the system, restarting the device, restarting the business service, and hard-restarting the device by powering it off and on again (where the corresponding power management device is accessible).
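The interplay of the alarm center, the self-healing analysis center, and the action issuing center might be sketched as follows. The fault type names, the mapping `SELF_HEALING_INSTRUCTIONS`, and the callback signatures are all hypothetical; the patent names the instruction categories but not how faults map to them. Faults outside the self-healing scope are routed to the alarm path, as described for the Fig. 3 flow.

```python
# Hypothetical mapping from fault type to a self-healing instruction.
SELF_HEALING_INSTRUCTIONS = {
    "system_hang": "restart_system",
    "device_crash": "power_cycle",
    "service_down": "restart_service",
}


def handle_fault(fault, send_alert, dispatch):
    """Route one preliminarily analyzed fault.
    In scope: generate an instruction and issue it to the device node.
    Out of scope: notify personnel for manual handling."""
    instruction = SELF_HEALING_INSTRUCTIONS.get(fault["type"])
    if instruction is None:
        send_alert(fault["node"], f"manual handling needed: {fault['type']}")
        return "alerted"
    dispatch(fault["node"], instruction)
    return "dispatched"


alerts, commands = [], []


def send_alert(node, message):
    alerts.append((node, message))


def dispatch(node, instruction):
    commands.append((node, instruction))


first = handle_fault({"node": "web-7", "type": "service_down"}, send_alert, dispatch)
second = handle_fault({"node": "db-2", "type": "disk_failure"}, send_alert, dispatch)
```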
In one embodiment, the monitoring data convergence layer may include multiple data convergence groups 1011, and each data convergence group operates according to rules issued by the configuration center. In one embodiment, a data convergence group 1011 may include multiple nodes. In one embodiment, one of the nodes in the data convergence group 1011 may be selected as the main convergence node, which sends the monitoring information.
In one embodiment, the monitoring data convergence layer may classify the monitoring data, which specifically includes: dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information according to the real-time monitoring attribute and the monitoring validity of the monitoring data.
In one embodiment, the monitoring data convergence layer performs convergence processing on each class of monitoring data, which specifically includes: if the received monitoring data is real-time monitoring information, sending the real-time monitoring information in real time; if the received monitoring data is non-real-time monitoring information, sending the non-real-time monitoring information asynchronously to the monitoring center; and if the received monitoring data is non-monitoring information, discarding the non-monitoring information. This improves response speed and reduces the pressure on the monitoring center.
In one embodiment, real-time information has the highest priority and may be sent in real time by the main convergence node of the data convergence group 1011 to the monitoring center 103 or to a higher-level monitoring data convergence layer. In one embodiment, non-real-time information may be sent asynchronously and directly to the monitoring center 103 by the main convergence node, depending on resource usage, network traffic, and so on. In one embodiment, non-monitoring information (for example, invalid information or information that is no longer needed) may be discarded directly by each node or by the main convergence node. This reduces the pressure on the monitoring center 103 and on higher-level monitoring data convergence layers.
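A minimal sketch of the three-way classification and handling follows. The record fields `valid` and `real_time`, the class `MainConvergenceNode`, and the explicit `flush` call are assumptions made for illustration; in practice the asynchronous upload would be triggered by whatever timing the convergence strategy configures.

```python
import queue
from enum import Enum


class Kind(Enum):
    REAL_TIME = "real-time"
    NON_REAL_TIME = "non-real-time"
    NON_MONITORING = "non-monitoring"


def classify(record):
    """Classify by validity first, then by the real-time attribute.
    Both field names are hypothetical."""
    if not record.get("valid", True):
        return Kind.NON_MONITORING
    return Kind.REAL_TIME if record.get("real_time", False) else Kind.NON_REAL_TIME


class MainConvergenceNode:
    def __init__(self, forward_now, upload_later):
        self.forward_now = forward_now    # real-time path to the upper layer or center
        self.upload_later = upload_later  # asynchronous path to the monitoring center
        self.pending = queue.Queue()      # buffer for non-real-time records
        self.dropped = 0

    def handle(self, record):
        kind = classify(record)
        if kind is Kind.REAL_TIME:
            self.forward_now(record)
        elif kind is Kind.NON_REAL_TIME:
            self.pending.put(record)
        else:
            self.dropped += 1             # non-monitoring information is discarded
        return kind

    def flush(self):
        """Upload buffered records, e.g. when load or traffic is low."""
        sent = 0
        while not self.pending.empty():
            self.upload_later(self.pending.get())
            sent += 1
        return sent


realtime_out, async_out = [], []
node = MainConvergenceNode(realtime_out.append, async_out.append)
node.handle({"metric": "disk_full", "real_time": True})
node.handle({"metric": "daily_usage", "real_time": False})
node.handle({"metric": "stale", "valid": False})
flushed = node.flush()
```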
In one embodiment, the configuration center 104 may be used to store various configuration strategies and issue them to the corresponding nodes.
In one embodiment, the configuration strategies may include: the grouping information of the monitoring resource layer 102, the main monitoring node selection mechanism of the monitoring resource layer 102, the monitoring strategy, the convergence strategy of the monitoring data convergence layer, and the main convergence node selection mechanism of the monitoring data convergence layer.
In one embodiment, the configuration center 104 may include a monitoring group configuration 1041. In one embodiment, the monitoring group configuration 1041 may store the grouping information configuration of the monitored resources. In other words, the monitoring group configuration 1041 may store information related to the monitored resources, including but not limited to the monitoring resource group to which each resource belongs. In one embodiment, the configuration center 104 may group the monitored resources according to business information, physical region, business logic, network region, and so on. In one embodiment, because a monitoring node (for example, an agent) is deployed on each resource, once the monitoring resource group of each resource has been configured, the monitoring node is assigned to the corresponding group as well.
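Grouping by resource attributes could look like the following sketch. The attribute names (`business`, `region`), the group identifier format, and the helper names are hypothetical.

```python
from collections import defaultdict


def group_resources(resources, keys=("business", "region")):
    """Assign each resource to a group identified by the chosen attributes."""
    groups = defaultdict(list)
    for res in resources:
        group_id = "/".join(str(res[k]) for k in keys)
        groups[group_id].append(res["name"])
    return dict(groups)


def agent_groups(groups):
    """Each resource hosts one agent, so the agent inherits the resource's group."""
    return {f"agent@{name}": gid for gid, names in groups.items() for name in names}


resources = [
    {"name": "db-1", "business": "billing", "region": "east"},
    {"name": "db-2", "business": "billing", "region": "east"},
    {"name": "web-1", "business": "search", "region": "west"},
]
groups = group_resources(resources)
agents = agent_groups(groups)
```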
In one embodiment, the configuration center 104 may include a convergence group configuration 1042. In one embodiment, the convergence group configuration 1042 may store the node grouping information configuration of the monitoring data convergence layer. In other words, the convergence group configuration 1042 may store information related to the nodes of the monitoring data convergence layer, including but not limited to the data convergence group to which each node belongs and the layer to which each data convergence group belongs.
In one embodiment, the configuration center 104 may include a monitoring strategy configuration 1043. In one embodiment, the monitoring strategy configuration 1043 may store the strategy configuration related to monitoring. In one embodiment, this strategy configuration includes but is not limited to: the information to be collected, the collection frequency, the collection method, and the node of the monitoring data convergence layer to which the monitoring resource layer reports the information. In one embodiment, the strategy configuration may also include the monitoring probe strategy that the main monitoring node needs to actively initiate toward resources in the same group, for example network delay checks within the group and node liveness checks within the group.
In one embodiment, the configuration center 104 may include a convergence strategy configuration 1044. In one embodiment, the convergence strategy configuration 1044 may store the convergence strategy configuration of the monitoring data convergence layer. In one embodiment, the convergence strategy configuration may include the monitoring data convergence layer's preliminary analysis strategy for the converged information, the information classification strategy (dividing information into real-time information, non-real-time information, non-monitoring information, and so on), the configuration of the upper-level receiving node of the main convergence node (for example, which node of which data convergence group receives the information), and the strategy for asynchronously uploading non-real-time monitoring information to the monitoring center (for example, the timing and manner of the asynchronous upload).
In one embodiment, the configuration center 104 may include a monitoring group/convergence group main node selection mechanism configuration 1045. In one embodiment, the monitoring group/convergence group main node selection mechanism configuration 1045 may store the selection mechanism configuration for the main monitoring node of a monitoring group and the main convergence node of a convergence group. Because large-scale data center environments are complex and resources differ, the main node selection strategy of each monitoring group/convergence group is constrained by the corresponding node resources, network resources, input/output (I/O) resources, and so on, as well as by the corresponding monitoring strategy.
In one embodiment, each monitoring group/convergence group may set its own selection mechanism according to its business characteristics. For example, in a monitoring resource group under heavy I/O pressure, the node with lower I/O pressure may be preferred as the main monitoring node. As another example, in a monitoring resource group with high processing resource (for example, processor) consumption, the node with lower processing resource utilization may be preferred as the main monitoring node. In more complex environments, the main node selection strategy may need to combine multiple factors.
For example, the above strategies may vary with the specific business; the strategies for analytics-type services and for business-operations-type services differ. In one embodiment, for analytics-type services, the processing resource utilization at a specific point in time or period need not be considered, whereas for business-operations-type services this attribute needs to be considered.
In one embodiment, for different businesses, the attributes to be considered in the corresponding strategies include but are not limited to one or more of the following: I/O resource usage, processing resource usage, network traffic, number of connections, and so on. In one embodiment, some businesses may not need to consider any of these attributes, while others may need to consider all of them.
In one embodiment, the attributes considered may be real-time data or non-real-time historical data.
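One way to express such a per-group selection mechanism is a weighted load score, sketched below. The weighted-sum scoring, the attribute names, and the tie-break on node name are assumptions for illustration; the patent only states which kinds of attributes may be considered.

```python
def select_main_node(nodes, weights):
    """Pick the node with the lowest weighted load.
    `weights` names the attributes a group cares about; attributes that
    are omitted are ignored. Ties are broken by node name."""
    def score(node):
        return sum(w * node["load"][attr] for attr, w in weights.items())

    return min(nodes, key=lambda n: (score(n), n["name"]))["name"]


# Load figures may come from live measurements or from historical data.
nodes = [
    {"name": "n1", "load": {"io": 0.9, "cpu": 0.2, "net": 0.3}},
    {"name": "n2", "load": {"io": 0.2, "cpu": 0.8, "net": 0.3}},
    {"name": "n3", "load": {"io": 0.6, "cpu": 0.6, "net": 0.9}},
]

io_bound_group = select_main_node(nodes, {"io": 1.0})    # prefers low I/O pressure
cpu_bound_group = select_main_node(nodes, {"cpu": 1.0})  # prefers low CPU utilization
mixed_group = select_main_node(nodes, {"io": 0.5, "cpu": 0.5})
```

A group whose business does not depend on a given attribute simply leaves it out of its weights, which matches the idea that some businesses consider none of the attributes and others consider all of them.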
In one embodiment, the configuration center 104 may include a configuration strategy issuing center 1046. In one embodiment, the configuration strategy issuing center 1046 may be responsible for issuing configuration strategies to the monitoring resource layer nodes and monitoring data convergence layer nodes, that is, for issuing the strategy configuration of one or more of the monitoring group configuration 1041, the convergence group configuration 1042, the monitoring strategy configuration 1043, the convergence strategy configuration 1044, and the monitoring group/convergence group main node selection mechanism configuration 1045 to the corresponding nodes.
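The configuration center's stores and the issuing step might be modeled as below. The class `ConfigCenter`, its field layout, and the policy keys are hypothetical; the comments map each field to the reference numeral it is meant to illustrate.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigCenter:
    monitoring_groups: dict = field(default_factory=dict)     # 1041: agent -> group
    convergence_groups: dict = field(default_factory=dict)    # 1042: node -> (group, layer)
    monitoring_policies: dict = field(default_factory=dict)   # 1043: group -> policy
    convergence_policies: dict = field(default_factory=dict)  # 1044: group -> policy
    main_node_selection: dict = field(default_factory=dict)   # 1045: group -> mechanism

    def distribute(self):
        """1046: build the configuration bundle each node should receive."""
        bundles = {}
        for node, group in self.monitoring_groups.items():
            bundles[node] = {
                "group": group,
                "policy": self.monitoring_policies.get(group, {}),
                "main_node_selection": self.main_node_selection.get(group),
            }
        for node, (group, layer) in self.convergence_groups.items():
            bundles[node] = {
                "group": group,
                "layer": layer,
                "policy": self.convergence_policies.get(group, {}),
                "main_node_selection": self.main_node_selection.get(group),
            }
        return bundles


center = ConfigCenter(
    monitoring_groups={"agent-1": "billing-east"},
    convergence_groups={"conv-1": ("east-l1", 1)},
    monitoring_policies={"billing-east": {"metrics": ["cpu", "io"], "interval_s": 30}},
    convergence_policies={"east-l1": {"upstream": "conv-9", "async_upload": "off-peak"}},
    main_node_selection={"billing-east": "lowest_io", "east-l1": "lowest_cpu"},
)
bundles = center.distribute()
```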
Fig. 2 shows a flowchart 200 of a monitoring method for a data center according to one embodiment of the present application. The monitoring method may be performed in a monitoring system that includes a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, where the monitoring data convergence layer has a multi-layer structure.
At step 201, the configuration center configures the grouping information of the monitoring resource groups, the convergence node grouping information of the monitoring data convergence layer, and the convergence strategy, and issues them to the monitoring resource layer and the monitoring data convergence layer.
At step 202, the monitoring resource layer collects monitoring data and reports it to the monitoring data convergence layer. The monitoring resource layer includes multiple monitoring resource groups, and each monitoring resource group collects monitoring data according to rules issued by the configuration center.
At step 203, the monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, classifies the monitoring data, and performs convergence processing on each class of monitoring data.
At step 204, the monitoring center stores the monitoring data.
In one embodiment, monitoring resource layer can include multiple monitoring resource groups, and each resource group that monitors can wrap Include one or more nodes.
In one embodiment, monitoring data convergence-level includes at least the first monitoring data convergence-level and the second monitoring data Convergence-level, the second monitoring data convergence-level are the upper strata convergence-level of the first monitoring data convergence-level, and method 200 can include:The One monitoring data convergence-level picks the monitoring data that monitoring resource layer is sent, for being converged to obtain first to monitoring data Monitoring information, and the first monitoring information is sent to the second monitoring data convergence-level;Second monitoring data convergence-level is supervised from first It controls data convergence-levels and receives the first monitoring information, the first monitoring information is converged to obtain the second monitoring information, and by the Two monitoring informations are sent to the upper strata convergence-level of monitoring center or the second monitoring data convergence-level.
In one embodiment, monitoring data convergence-level is classified to monitoring data and can be included in step 203:According to Monitoring data is divided into real time monitoring information, non real-time monitoring information by the real time monitoring attribute of monitoring data and effective monitoring With non-supervised information.
In one embodiment, in step 203 monitoring data convergence-level convergence processing is carried out to every class monitoring data can be with Including:If the monitoring data received is real time monitoring information, real time monitoring information is sent in real time;If the monitoring data received is non- Information is monitored in real time, by non real-time monitoring information asynchronous transmission to monitoring center;If the monitoring data received is non-supervised information, Abandon non-supervised information.
In one embodiment, method 200 further includes configuration center and following item is configured:Monitor the main prison of resource layer Control the main aggregation node of node selection mechanism, the strategy of monitoring, the convergence strategy of monitoring data convergence-level, monitoring data convergence-level Selection mechanism.For example, being configured in configuration center with monitoring relevant strategy, as above, which is issued to monitoring by configuration center Node in resource layer.For example, the convergence strategy in configuration center configuration monitoring data convergence-level is configured, as above, by configuration The heart issues the node that the convergence strategy is configured in each monitoring data convergence-level.
In one embodiment, following item can be configured in configuration center:Monitor the main monitoring node choosing of resource layer The main aggregation node selection machine for system, the strategy of monitoring, the convergence strategy of monitoring data convergence-level, the monitoring data convergence-level of selecting a good opportunity System.
In one embodiment, monitoring data convergence-level can include multiple data convergence groups, and method 200 can also wrap It includes:Each data convergence group is operated according to the rule that configuration center issues.
In one embodiment, method 200 can also include:Master is selected from multiple nodes in monitoring data convergence group Aggregation node sends monitoring information.
In one embodiment, each node for monitoring the monitoring resource group in resource layer arrives collected data summarization Collected data are transmitted up monitoring data convergence by the main monitoring node of corresponding monitoring resource group by the main monitoring node Layer.In one embodiment, each node for monitoring the monitoring resource group in resource layer sends out collected data directly up It is sent to monitoring data convergence-level.
In one embodiment, a node of the monitoring data convergence layer can classify the data received from the monitoring resource layer or from a lower-level monitoring data convergence layer, and then converge the corresponding data to the main aggregation node of the data convergence group to which it belongs. In another embodiment, the main aggregation node of the monitoring data convergence layer can classify the data after the data of the other nodes in its data convergence group have been converged to it.
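The classification step can be sketched using the three classes named in claims 3 and 4: real-time monitoring information, non-real-time monitoring information, and non-monitoring information. The record fields `valid` and `real_time` are hypothetical stand-ins for the monitoring validity and real-time attribute; the application does not define a record format.

```python
from enum import Enum


class Category(Enum):
    REAL_TIME = "real-time monitoring information"
    NON_REAL_TIME = "non-real-time monitoring information"
    NON_MONITORING = "non-monitoring information"


def classify(record: dict) -> Category:
    """Classify one record by monitoring validity, then by real-time attribute."""
    if not record.get("valid", False):          # hypothetical field name
        return Category.NON_MONITORING
    if record.get("real_time", False):          # hypothetical field name
        return Category.REAL_TIME
    return Category.NON_REAL_TIME


def converge(records: list) -> tuple:
    """Split records into those to send immediately and those to send asynchronously.

    Non-monitoring information is discarded, as claim 4 describes.
    """
    send_now, send_async = [], []
    for record in records:
        category = classify(record)
        if category is Category.REAL_TIME:
            send_now.append(record)
        elif category is Category.NON_REAL_TIME:
            send_async.append(record)
    return send_now, send_async
```

Dropping non-monitoring information at the convergence layer keeps it from consuming bandwidth and storage further up the hierarchy.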
Fig. 3 shows an operational flowchart 300 of the monitoring center according to one embodiment of the application.
In one embodiment, at 301, the monitoring data convergence layer 101 sends the collected real-time or non-real-time data to the monitoring center 103 in a synchronous or asynchronous manner, and the monitoring center 103 stores the data in the monitoring data center 1031.
In one embodiment, at 302, the monitoring data analysis center 1032 reads the data from the monitoring data center 1031 for analysis.
In one embodiment, at 303, the monitoring data analysis center 1032 stores the analysis results in the monitoring data center 1031.
In one embodiment, the monitoring data analysis center 1032 analyzes the data read from the monitoring data center 1031; if a fault is found, then at 304 it notifies the configuration center 104 to update the configuration information of the nodes related to the faulty node.
In one embodiment, the monitoring data analysis center 1032 performs a preliminary analysis of the fault.
In one embodiment, at 305, if the fault is not within the scope of the self-healing strategy, the monitoring data analysis center 1032 notifies the monitoring notification and alarm center 1033 to send the corresponding notification and alarm information to the appropriate personnel so that the fault can be handled manually.
In one embodiment, at 306, if the fault is within the scope of the self-healing strategy, the monitoring data analysis center 1032 sends the fault information to the monitoring custom self-healing analysis center 1034.
In one embodiment, at 307, the monitoring custom self-healing analysis center 1034 analyzes the fault information, generates the corresponding self-healing instruction, and issues the self-healing instruction to the monitoring custom action issuing center 1035.
In one embodiment, the monitoring custom action issuing center 1035 issues the corresponding self-healing operation instruction to the corresponding node, which performs the self-healing operation. For example, at 309, the monitoring custom action issuing center 1035 issues a self-healing operation instruction for a monitoring resource node to that monitoring resource node, which performs the self-healing operation. For example, at 309, the monitoring custom action issuing center 1035 issues a self-healing operation instruction for a monitoring data aggregation node to that monitoring data aggregation node, which performs the self-healing operation.
In one embodiment, if the self-healing operation on the faulty node succeeds at 310, then at 311 the configuration center 104 is notified to issue the strategy for the related nodes, and the corresponding nodes are brought back into the monitoring range.
In one embodiment, if the self-healing operation on the faulty node fails at 312, then at 313 the monitoring notification and alarm center 1033 is notified to send the corresponding notification and alarm to the appropriate personnel for manual handling.
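The decision flow from fault detection to either self-healing or manual handling can be summarized in a short sketch. The step numbers in the comments follow flowchart 300; the function name, the action labels, the `type` field, and the `try_heal` callback are illustrative assumptions rather than interfaces defined by the application.

```python
def handle_fault(fault: dict, self_healable: set, try_heal) -> list:
    """Return the ordered actions taken for one detected fault.

    `self_healable` is the set of fault types covered by the self-healing
    strategy; `try_heal` is a callable that runs the self-healing operation
    on the node and returns True on success.
    """
    actions = ["update_related_node_config"]              # 304
    if fault["type"] not in self_healable:                # 305: outside self-healing scope
        actions.append("alert_personnel")
        return actions
    actions.append("issue_self_healing_instruction")      # 306, 307, then issue to node
    if try_heal(fault):                                   # 310: self-healing succeeded
        actions.append("reissue_strategy_and_rejoin")     # 311
    else:                                                 # 312: self-healing failed
        actions.append("alert_personnel")                 # 313
    return actions
```

Manual alerting appears on two branches: faults outside the self-healing scope, and faults whose self-healing attempt fails.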
Fig. 4 shows a structural diagram of an information processing device 400. One or more of the nodes of the monitoring resource layer, the nodes of the monitoring data convergence layer, the monitoring center, and the configuration center in the embodiments of the application can be implemented by the information processing device 400. As shown in Fig. 4, the device 400 can include one or more of the following components: a processor 420, a memory 430, a power supply component 440, an input/output (I/O) interface 460, and a communication interface 480. These components can be communicatively connected, for example, through a bus 410.
The processor 420 controls the overall operation of the device 400, for example operations associated with data communication and computation. The processor 420 can include one or more processing cores and can execute instructions to carry out all or part of the steps of the methods described herein. The processor 420 can include various devices with processing capability, including but not limited to a general-purpose processor, a special-purpose processor, a microprocessor, a microcontroller, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), and so on. The processor 420 can include a cache 425 or can communicate with a cache 425 to improve data access speed.
The memory 430 is configured to store various types of instructions and/or data to support the operation of the device 400. Examples of such data include the instructions and data of any application or method operating on the device 400. The memory 430 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The memory 430 can include semiconductor memory, such as random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and so on. The memory 430 can also include any memory that uses, for example, paper media, magnetic media, and/or optical media, such as paper tape, hard disk, magnetic tape, floppy disk, magneto-optical disk (MO), CD, DVD, Blu-ray, and so on.
The power supply component 440 provides power to the various components of the device 400. The power supply component 440 can include an internal battery and/or an external power interface, and can include a power management system and other components associated with generating, managing, and distributing power for the device 400.
The I/O interface 460 provides an interface that allows a user to interact with the device 400. The I/O interface 460 can include, for example, interfaces based on technologies such as PS/2, RS-232, USB, FireWire, Lightning, VGA, HDMI, and DisplayPort, allowing the user to interact with the device 400 through peripheral devices such as a keyboard, mouse, touchpad, touchscreen, joystick, buttons, microphone, loudspeaker, display, camera, and projector.
The communication interface 480 is configured to enable the device 400 to communicate with other devices in a wired or wireless manner. Through the communication interface 480, the device 400 can access a wireless network based on one or more communication standards, for example a Wi-Fi, 2G, 3G, or 4G communication network. In one exemplary embodiment, the communication interface 480 can also receive a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. An exemplary communication interface 480 can include interfaces based on communication technologies such as near-field communication (NFC), radio-frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), and Bluetooth (BT).
The functional blocks shown in the structural block diagrams described above can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments can be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio-frequency (RF) links, and so on. Code segments can be downloaded via a computer network such as the Internet or an intranet.
"One embodiment" is referred to above. It should be understood, however, that a feature mentioned in a given embodiment is not necessarily applicable only to that embodiment, but may be used in other embodiments or in combination with other embodiments.
The application has been described above with reference to specific embodiments, but those skilled in the art will understand that the implementations mentioned herein are illustrative of the application and that the specific embodiments listed are only application examples. They do not mean that the application is limited to these examples, and various modifications, combinations, and changes can be made to these specific embodiments without departing from the spirit and scope defined by the appended claims or their equivalents.

Claims (12)

1. A monitoring system for a data center, characterized in that the monitoring system comprises a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, the monitoring data convergence layer being a multilayer structure, wherein:
The monitoring resource layer comprises multiple monitoring resource groups; each monitoring resource group collects monitoring data according to the rules issued by the configuration center and reports the monitoring data to the monitoring data convergence layer;
The monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, classifies the monitoring data, and performs convergence processing on each class of monitoring data;
The monitoring center is configured to store the monitoring data;
The configuration center is configured to configure the grouping information of the monitoring resource groups, the aggregation node grouping information of the monitoring data convergence layer, and the convergence strategy, and to issue them to the monitoring resource layer and the monitoring data convergence layer.
2. The monitoring system according to claim 1, characterized in that the monitoring data convergence layer comprises at least a first monitoring data convergence layer and a second monitoring data convergence layer, the second monitoring data convergence layer being the upper convergence layer of the first monitoring data convergence layer;
The first monitoring data convergence layer receives the monitoring data sent by the monitoring resource layer, converges the monitoring data to obtain first monitoring information, and sends the first monitoring information to the second monitoring data convergence layer;
The second monitoring data convergence layer receives the first monitoring information from the first monitoring data convergence layer, converges the first monitoring information to obtain second monitoring information, and sends the second monitoring information to the monitoring center or to the upper convergence layer of the second monitoring data convergence layer.
3. The monitoring system according to claim 1 or 2, characterized in that the classification of the monitoring data by the monitoring data convergence layer specifically comprises:
According to the real-time monitoring attribute and the monitoring validity of the monitoring data, dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information.
4. The monitoring system according to claim 3, characterized in that the convergence processing performed by the monitoring data convergence layer on each class of monitoring data specifically comprises:
If the received monitoring data is real-time monitoring information, sending the real-time monitoring information in real time;
If the received monitoring data is non-real-time monitoring information, sending the non-real-time monitoring information asynchronously to the monitoring center;
If the received monitoring data is non-monitoring information, discarding the non-monitoring information.
5. The monitoring system according to claim 1, characterized in that the configuration center is further configured to configure:
The main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the convergence strategy of the monitoring data convergence layer, and the main aggregation node selection mechanism of the monitoring data convergence layer.
6. The monitoring system according to claim 1, characterized in that the monitoring data convergence layer comprises multiple data convergence groups, and each data convergence group operates according to the rules issued by the configuration center.
7. A monitoring method for a data center, characterized in that the monitoring method is performed in a monitoring system comprising a monitoring resource layer, a monitoring data convergence layer, a monitoring center, and a configuration center, wherein the monitoring data convergence layer is a multilayer structure, the monitoring method comprising:
The configuration center configuring the grouping information of the monitoring resource groups, the aggregation node grouping information of the monitoring data convergence layer, and the convergence strategy, and issuing them to the monitoring resource layer and the monitoring data convergence layer;
The monitoring resource layer collecting monitoring data and reporting the monitoring data to the monitoring data convergence layer, the monitoring resource layer comprising multiple monitoring resource groups, each monitoring resource group collecting monitoring data according to the rules issued by the configuration center;
The monitoring data convergence layer receiving the monitoring data sent by the monitoring resource layer, classifying the monitoring data, and performing convergence processing on each class of monitoring data; and
The monitoring center storing the monitoring data.
8. The monitoring method according to claim 7, characterized in that the monitoring data convergence layer comprises at least a first monitoring data convergence layer and a second monitoring data convergence layer, the second monitoring data convergence layer being the upper convergence layer of the first monitoring data convergence layer, the method comprising:
The first monitoring data convergence layer receiving the monitoring data sent by the monitoring resource layer, converging the monitoring data to obtain first monitoring information, and sending the first monitoring information to the second monitoring data convergence layer;
The second monitoring data convergence layer receiving the first monitoring information from the first monitoring data convergence layer, converging the first monitoring information to obtain second monitoring information, and sending the second monitoring information to the monitoring center or to the upper convergence layer of the second monitoring data convergence layer.
9. The monitoring method according to claim 7 or 8, characterized in that the classification of the monitoring data by the monitoring data convergence layer comprises:
According to the real-time monitoring attribute and the monitoring validity of the monitoring data, dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information.
10. The monitoring method according to claim 9, characterized in that the convergence processing performed by the monitoring data convergence layer on each class of monitoring data comprises:
If the received monitoring data is real-time monitoring information, sending the real-time monitoring information in real time;
If the received monitoring data is non-real-time monitoring information, sending the non-real-time monitoring information asynchronously to the monitoring center;
If the received monitoring data is non-monitoring information, discarding the non-monitoring information.
11. The monitoring method according to claim 7, characterized in that the configuration center further configures the following items: the main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the convergence strategy of the monitoring data convergence layer, and the main aggregation node selection mechanism of the monitoring data convergence layer.
12. The monitoring method according to claim 7, characterized in that the monitoring data convergence layer comprises multiple data convergence groups, the monitoring method comprising: each data convergence group operating according to the rules issued by the configuration center.
CN201611268506.6A 2016-12-31 2016-12-31 For the monitoring system and method for data center Pending CN108268355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611268506.6A CN108268355A (en) 2016-12-31 2016-12-31 For the monitoring system and method for data center

Publications (1)

Publication Number Publication Date
CN108268355A true CN108268355A (en) 2018-07-10

Family

ID=62771238

Country Status (1)

Country Link
CN (1) CN108268355A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7318094B1 (en) * 2002-02-13 2008-01-08 Cisco Technology, Inc. Apparatus, system and device for collecting, aggregating and monitoring network management information
CN103092158A (en) * 2012-12-31 2013-05-08 深圳先进技术研究院 Large building energy consumption real-time monitoring system based on wireless sensor network
CN103414571A (en) * 2013-08-03 2013-11-27 东北大学 Information collecting and transmitting convergence node used for industrial monitoring
CN104052634A (en) * 2014-05-30 2014-09-17 国家电网公司 Information security monitoring system and method
CN105991366A (en) * 2015-03-05 2016-10-05 中国移动通信集团福建有限公司 Service monitoring method and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062699A (en) * 2018-08-15 2018-12-21 郑州云海信息技术有限公司 A kind of resource monitoring method, device, server and storage medium
CN109204389A (en) * 2018-09-12 2019-01-15 济南轨道交通集团有限公司 A kind of subway equipment fault diagnosis and self-healing method, system
CN110417597A (en) * 2019-07-29 2019-11-05 中国工商银行股份有限公司 For monitoring method and device, electronic equipment and the readable storage medium storing program for executing of certificate
CN110417597B (en) * 2019-07-29 2022-11-01 中国工商银行股份有限公司 Method and device for monitoring certificate, electronic equipment and readable storage medium
CN111934923A (en) * 2020-07-30 2020-11-13 深圳市高德信通信股份有限公司 CDN network quality monitoring system based on internet
CN113923131A (en) * 2021-09-10 2022-01-11 北京世纪互联宽带数据中心有限公司 Monitoring information determination method and device, computing equipment and storage medium
CN113923131B (en) * 2021-09-10 2023-08-22 北京世纪互联宽带数据中心有限公司 Monitoring information determining method and device, computing equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180710