CN108337100A - A kind of method and apparatus of cloud platform monitoring - Google Patents

A kind of method and apparatus of cloud platform monitoring

Info

Publication number
CN108337100A
Authority
CN
China
Prior art keywords
related information
data
layer
misoperation
hierarchical structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710043469.7A
Other languages
Chinese (zh)
Other versions
CN108337100B (en
Inventor
龚国成
舒忠玲
刘强
余永华
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile M2M Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile M2M Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile M2M Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201710043469.7A priority Critical patent/CN108337100B/en
Publication of CN108337100A publication Critical patent/CN108337100A/en
Application granted granted Critical
Publication of CN108337100B publication Critical patent/CN108337100B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention discloses a cloud platform monitoring method. The method includes: establishing a corresponding hierarchical structure for at least one virtual unit to be monitored in the cloud platform; acquiring the operation data of the virtual unit; when the collected operation data is determined to be misoperation data, determining that a failure event has occurred in the cloud platform and marking the related information of the misoperation data; and, after determining that the failure event has occurred, tracing the failure event to its source according to the related information. An embodiment of the invention also discloses a device for cloud platform monitoring.

Description

A kind of method and apparatus of cloud platform monitoring
Technical field
The present invention relates to the field of cloud computing technology, and in particular to a method and apparatus for cloud platform monitoring.
Background
Cloud platforms are large-scale, virtualized, dynamic, and real-time. A cloud platform monitoring system must therefore be able to monitor resources at scale, monitor virtual and dynamic resources, and let monitoring reports be viewed in real time, and the monitoring service itself must be measurable. Existing cloud platform monitoring systems have the following shortcomings: when a failure event is detected in the cloud platform, it is difficult to determine which monitored device caused the failure, that is, tracing the failure event to its source is difficult; and the functions of the monitoring systems on many cloud platforms are relatively fixed, so dynamic extension of system functions is hard to achieve.
Summary of the invention
To solve the above technical problems, embodiments of the present invention aim to provide a method and apparatus for cloud platform monitoring that make it possible to trace a failure event to its source.
The technical solution of the present invention is realized as follows:
An embodiment of the present invention provides a method of cloud platform monitoring, including:
Establishing a corresponding hierarchical structure for at least one virtual unit to be monitored in the cloud platform; from top to bottom, the hierarchical structure includes: the identification information of the virtual unit, located at the top layer, and at least one functional group of the virtual unit, located at the second layer; each functional group indicates one function of the virtual unit when it runs;
Acquiring the operation data of the virtual unit;
When the collected operation data is determined to be misoperation data, determining that a failure event has occurred in the cloud platform, and marking the related information of the misoperation data; the related information of the misoperation data includes: the functional group of the virtual unit associated with the misoperation data in the second layer of the hierarchical structure, and the virtual unit identification information associated with the misoperation data in the top layer of the hierarchical structure;
After determining that a failure event has occurred in the cloud platform, tracing the failure event to its source according to the related information.
In the above scheme, the hierarchical structure further includes: the IP address of at least one server used by at least one functional group of the virtual unit, located at the third layer; at least one monitoring project corresponding to the IP address of the at least one server, located at the fourth layer; and the operation data of the virtual unit corresponding to the at least one monitoring project, located at the bottom layer.
In the above scheme, the related information of the misoperation data further includes: the monitoring project associated with the misoperation data in the fourth layer of the hierarchical structure, and the server IP address associated with the misoperation data in the third layer of the hierarchical structure;
Marking the related information of the misoperation data includes: when the collected operation data is determined to be misoperation data, marking the fourth-layer related information of the misoperation data according to the misoperation data in the bottom layer; marking the third-layer related information according to the marked fourth-layer related information; marking the second-layer related information according to the marked third-layer related information; and marking the top-layer related information according to the marked second-layer related information.
In the above scheme, tracing the failure event to its source according to the related information, after determining that a failure event has occurred in the cloud platform, includes: querying, according to the related information marked in the hierarchical structure, the related information of the misoperation data in at least one layer other than the bottom layer, thereby tracing the failure event to its source.
In the above scheme, the method further includes: after establishing the corresponding hierarchical structure for the at least one virtual unit, adding or deleting at least one functional group in the second layer of the hierarchical structure.
An embodiment of the present invention further provides a device for cloud platform monitoring. The device includes an establishing module, an acquisition module, a processing module, and a locating module, wherein:
The establishing module is configured to establish a corresponding hierarchical structure for at least one virtual unit to be monitored in the cloud platform; from top to bottom, the hierarchical structure includes: the identification information of the virtual unit, located at the top layer, and at least one functional group of the virtual unit, located at the second layer; each functional group indicates one function of the virtual unit when it runs;
The acquisition module is configured to acquire the operation data of the virtual unit;
The processing module is configured to determine, when the collected operation data is determined to be misoperation data, that a failure event has occurred in the cloud platform, and to mark the related information of the misoperation data; the related information of the misoperation data includes: the functional group of the virtual unit associated with the misoperation data in the second layer of the hierarchical structure, and the virtual unit identification information associated with the misoperation data in the top layer of the hierarchical structure;
The locating module is configured to trace the failure event to its source according to the related information when it is determined that a failure event has occurred in the cloud platform.
In the above scheme, the hierarchical structure further includes: the IP address of at least one server used by at least one functional group of the virtual unit, located at the third layer; at least one monitoring project corresponding to the IP address of the at least one server, located at the fourth layer; and the operation data of the virtual unit corresponding to the at least one monitoring project, located at the bottom layer.
In the above scheme, the related information of the misoperation data further includes: the monitoring project associated with the misoperation data in the fourth layer of the hierarchical structure, and the server IP address associated with the misoperation data in the third layer of the hierarchical structure;
The processing module is specifically configured to: when the collected operation data is determined to be misoperation data, mark the fourth-layer related information of the misoperation data according to the misoperation data in the bottom layer; mark the third-layer related information according to the marked fourth-layer related information; mark the second-layer related information according to the marked third-layer related information; and mark the top-layer related information according to the marked second-layer related information.
In the above scheme, the locating module is specifically configured to: when it is determined that a failure event has occurred in the cloud platform, query, according to the related information marked in the hierarchical structure, the related information of the misoperation data in at least one layer other than the bottom layer, thereby tracing the failure event to its source.
In the above scheme, the establishing module is further configured to add or delete at least one functional group in the second layer of the hierarchical structure after the corresponding hierarchical structure has been established for the at least one virtual unit.
In the embodiments of the present invention, a corresponding hierarchical structure is established for at least one virtual unit to be monitored in the cloud platform; the operation data of the virtual unit is acquired; when the collected operation data is determined to be misoperation data, it is determined that a failure event has occurred in the cloud platform, and the related information of the misoperation data is marked; after the failure event is determined, it is traced to its source according to the related information. In this way, the failure event can be traced to its source.
Description of the drawings
Fig. 1 is a flowchart of the first embodiment of the cloud platform monitoring method of the present invention;
Fig. 2 is a schematic diagram of the first hierarchical structure of a virtual unit to be monitored in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the first composition structure of the cloud platform monitoring device in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the second hierarchical structure of a virtual unit to be monitored in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the second composition structure of the cloud platform monitoring device in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments.
First embodiment
Fig. 1 is a flowchart of the first embodiment of the cloud platform monitoring method of the present invention. As shown in Fig. 1, the method includes:
Step 100: Establish a corresponding hierarchical structure for at least one virtual unit to be monitored in the cloud platform.
In this step, the hierarchical structure includes, from top to bottom: the identification information of the virtual unit, located at the top layer, and at least one functional group of the virtual unit, located at the second layer; each functional group indicates one function of the virtual unit when it runs.
Preferably, the hierarchical structure further includes: the Internet Protocol (IP) address of at least one server used by at least one functional group of the virtual unit, located at the third layer; at least one monitoring project corresponding to the IP address of the at least one server, located at the fourth layer; and the operation data of the virtual unit corresponding to the at least one monitoring project, located at the bottom layer.
The embodiment of the present invention abstracts each key host of the system to be monitored as a virtual unit. The attribute information of a virtual unit serves as its identification information and distinguishes different virtual units. Here, the attribute information of a virtual unit includes: the device's abbreviated name, the device description, the project or product name it belongs to, the functional group it belongs to, the IP address, and the operating system and version.
Preferably, the embodiment of the present invention establishes the hierarchical structure of a virtual unit to be monitored in the order product, functional group, server address, monitoring project, monitoring content.
Fig. 2 is a schematic diagram of the first hierarchical structure of a virtual unit to be monitored in an embodiment of the present invention. As shown in Fig. 2, the hierarchical structure of the virtual unit consists of: the product identification information at the top layer; N functional groups of the virtual unit at the second layer, where N is an integer greater than 0; all IP addresses used by each functional group at the third layer; the monitoring projects corresponding to each IP address at the fourth layer; and the operation data of the virtual unit corresponding to each monitoring project at the bottom layer.
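The five-layer structure of Fig. 2 could be represented, for illustration, as nested mappings. The sketch below is only one possible in-memory layout; the class name `VirtualUnitHierarchy`, the method names, and the sample product, group, and IP values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VirtualUnitHierarchy:
    """Hypothetical layout: product -> functional group -> server IP
    -> monitoring project -> operation data (metric name -> value)."""
    product_id: str  # top layer: product identification information
    groups: Dict[str, Dict[str, Dict[str, Dict[str, float]]]] = field(
        default_factory=dict
    )

    def record(self, group: str, ip: str, project: str,
               metric: str, value: float) -> None:
        """Store one operation-data value, creating upper layers as needed."""
        (self.groups.setdefault(group, {})
                    .setdefault(ip, {})
                    .setdefault(project, {}))[metric] = value

    def paths(self) -> List[Tuple[str, str, str, str, str]]:
        """List every top-to-bottom path through the hierarchy."""
        return [
            (self.product_id, g, ip, p, m)
            for g, ips in self.groups.items()
            for ip, projects in ips.items()
            for p, metrics in projects.items()
            for m in metrics
        ]


unit = VirtualUnitHierarchy("product-A")
unit.record("web", "10.0.0.1", "host", "cpu_usage", 0.42)
unit.record("web", "10.0.0.1", "service", "nginx_requests_per_second", 120.0)
```

Each path returned by `paths()` names one item of operation data together with every layer above it, which is the information the later marking steps rely on.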
Illustratively, a monitoring project may be a host, an application program, a service, user behavior, middleware, a database, and so on. When the monitoring project is a host, the corresponding monitoring content may include: total central processing unit (CPU) usage, user-mode CPU usage, kernel-mode CPU usage, interrupt CPU usage, remaining hard disk space, hard disk usage, average disk I/O time, average disk I/O throughput, physical memory usage, swap memory usage, network upstream rate, network downstream rate, and so on;
When the monitoring project is an application program, the corresponding monitoring content may include the operation data and access records of certain critical applications; the availability and quality of the application are determined by evaluating this monitoring content, for example the number of calls to key APIs and their response status;
When the monitoring project is a service, the corresponding monitoring content may include the operating status of large service software, for example the cumulative Nginx request count, Nginx requests per second, the number of active Nginx connections, the number of connections dropped by Nginx, and the operating status of Tomcat, MySQL, and Apache;
When the monitoring project is user behavior, the corresponding monitoring content includes access monitoring, uniform resource locator (URL) monitoring, and content monitoring. Access monitoring obtains the user access speed; URL monitoring covers response time and failure rate, to show the real-time access state of the service; content monitoring tracks changes in web page elements;
When the monitoring project is middleware or a database, the corresponding monitoring content includes data such as I/O throughput, CPU usage, and disk usage.
In this step, management operations on the hierarchical structure can also be performed. Specifically, after the corresponding hierarchical structure has been established for the at least one virtual unit, at least one functional group in the second layer of the hierarchical structure can be added or deleted.
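As a rough illustration of this management operation, the sketch below adds and deletes second-layer functional groups in a plain dictionary. The dictionary layout and the function names are assumptions made for the example, not part of the patent.

```python
def add_functional_group(hierarchy: dict, group: str) -> None:
    """Add an empty functional group to the second layer if it is absent."""
    hierarchy["groups"].setdefault(group, {})


def delete_functional_group(hierarchy: dict, group: str) -> None:
    """Remove a functional group and everything below it, if present."""
    hierarchy["groups"].pop(group, None)


# Hypothetical hierarchy: one product with one functional group and one server.
hierarchy = {"product_id": "product-A", "groups": {"web": {"10.0.0.1": {}}}}
add_functional_group(hierarchy, "storage")
delete_functional_group(hierarchy, "web")
```

Deleting a group here also discards the lower layers beneath it; whether a real system should archive that data first is a design decision the text does not address.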
Step 101: Acquire the operation data of the virtual unit.
Optionally, the operation data of the monitoring projects of the virtual unit is collected as monitoring data, the monitoring data is parsed through protocol adaptation, and the parsed data is buffered in a message queue. Depending on business requirements, the parsed data can be stored directly in a database according to preset storage rules, or it can first be processed and then stored in the database according to the preset storage rules.
Processing the parsed data before storing it includes: first, stream processing of the parsed data, in which computation, alerting, and similar processing are completed according to the configured strategies; second, training a data processing model on historical data and then processing the parsed data with that model.
Here, storage rules are set so that data is classified and stored in separate databases. Illustratively, the storage rules can be: a key-value database stores metadata; a relational database stores user information, processing result information, configuration information, history alarms, historical statistics, and similar data; a non-relational (Not Only Structured Query Language, NoSQL) database persistently stores historical data.
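The illustrative storage rules above amount to a lookup from data category to database type. A minimal sketch follows; the category keys and store labels are hypothetical names chosen for the example.

```python
# Hypothetical mapping from data category to the kind of database that stores it.
STORAGE_RULES = {
    "metadata": "key_value",
    "user_info": "relational",
    "processing_result": "relational",
    "configuration": "relational",
    "history_alarm": "relational",
    "history_statistics": "relational",
    "historical_data": "nosql",
}


def select_store(data_type: str) -> str:
    """Return the database kind configured for a data category."""
    try:
        return STORAGE_RULES[data_type]
    except KeyError:
        raise ValueError(f"no storage rule configured for {data_type!r}") from None
```

Raising an error for an unknown category, rather than falling back to a default store, is a choice made in this sketch so that unclassified data is noticed early.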
Step 102: When the collected operation data is determined to be misoperation data, determine that a failure event has occurred in the cloud platform, and mark the related information of the misoperation data. The related information of the misoperation data includes: the functional group of the virtual unit associated with the misoperation data in the second layer of the hierarchical structure, and the virtual unit identification information associated with the misoperation data in the top layer of the hierarchical structure.
In actual implementation, the cloud platform monitors the operation data in real time according to the configured strategies; when the operation data satisfies a preset warning strategy, the collected operation data is determined to be misoperation data.
Preferably, the related information of the misoperation data can also include: the monitoring project associated with the misoperation data in the fourth layer of the hierarchical structure, and the server IP address associated with the misoperation data in the third layer of the hierarchical structure;
Correspondingly, when the collected operation data is determined to be misoperation data, the fourth-layer related information of the misoperation data (the monitoring project associated with the misoperation data in the fourth layer of the hierarchical structure) is marked according to the misoperation data in the bottom layer; the third-layer related information (the server IP address associated with the misoperation data in the third layer) is marked according to the marked fourth-layer related information; the second-layer related information (the functional group associated with the misoperation data in the second layer) is marked according to the marked third-layer related information; and the top-layer related information (the virtual unit identification information associated with the misoperation data in the top layer) is marked according to the marked second-layer related information.
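The layer-by-layer marking can be pictured as a walk up parent links, starting from the bottom-layer data and ending at the top layer. The sketch below is an assumption about how this might be coded; the `Node` class, the layer names, and the sample values are hypothetical.

```python
class Node:
    """One entry in the hierarchy, linked to the entry in the layer above."""

    def __init__(self, layer, value, parent=None):
        self.layer = layer
        self.value = value
        self.parent = parent
        self.marked = False


def mark_related_information(leaf):
    """Mark each ancestor of the bottom-layer node, fourth layer first,
    and return the marked related information keyed by layer name."""
    related = {}
    node = leaf.parent  # the fourth layer is marked from the bottom-layer data
    while node is not None:
        node.marked = True
        related[node.layer] = node.value
        node = node.parent  # each layer is marked from the one below it
    return related


product = Node("product", "product-A")
group = Node("functional_group", "functional group 1", product)
server = Node("server_ip", "10.0.0.1", group)
project = Node("monitoring_project", "host", server)
datum = Node("operation_data", {"cpu_usage": 0.97}, project)

related = mark_related_information(datum)
```

The insertion order of `related` mirrors the marking order described in the text: monitoring project, server IP address, functional group, then product identification.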
When several items of misoperation data exist in the cloud platform at the same time, the related information is marked by searching the hierarchical structure layer by layer from bottom to top, which determines the monitoring project, IP address, functional group, and product identification information corresponding to each item of misoperation data. Optionally, to distinguish the related information of different misoperation data, identification information for each item of misoperation data can be added when the related information is marked. For example, misoperation data 1 through misoperation data X correspond to identification information 1 through identification information X respectively, so that checking the identification information shows which item of misoperation data a given piece of related information in the hierarchical structure belongs to.
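One way to keep the marks of simultaneous misoperation data apart is to store, on every marked node, the set of identifiers that reached it. The following sketch assumes a child-to-parent table with hypothetical node keys; it is an illustration, not the patent's data format.

```python
from collections import defaultdict

# Hypothetical child -> parent links for two servers in one functional group.
PARENT = {
    "cpu_usage@10.0.0.1": "host@10.0.0.1",
    "disk_usage@10.0.0.2": "host@10.0.0.2",
    "host@10.0.0.1": "10.0.0.1",
    "host@10.0.0.2": "10.0.0.2",
    "10.0.0.1": "group-1",
    "10.0.0.2": "group-1",
    "group-1": "product-A",
}


def mark_all(abnormal):
    """abnormal maps an identifier to the bottom-layer node it was seen on.
    Returns, for every upper-layer node, the identifiers marked on it."""
    marks = defaultdict(set)
    for anomaly_id, leaf in abnormal.items():
        node = PARENT.get(leaf)
        while node is not None:  # bottom-up, one layer at a time
            marks[node].add(anomaly_id)
            node = PARENT.get(node)
    return marks


marks = mark_all({1: "cpu_usage@10.0.0.1", 2: "disk_usage@10.0.0.2"})
```

Shared ancestors such as the functional group carry both identifiers, while each server carries only its own, so a lookup by identifier recovers exactly one path.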
Step 103: After determining that a failure event has occurred in the cloud platform, trace the failure event to its source according to the related information.
In this step, after it is determined that a failure event has occurred in the cloud platform, the related information of the misoperation data in at least one layer other than the bottom layer is queried according to the related information marked in the hierarchical structure, thereby tracing the failure event to its source.
Illustratively, suppose the CPU usage of the host of virtual unit A is to be monitored. When the monitored CPU usage exceeds the threshold, the related information of the host's CPU usage is marked. The related information then includes: the bottom-layer monitoring content in the hierarchical structure of virtual unit A is CPU usage, the fourth-layer monitoring project is host monitoring, the third-layer server address is IP address 1, the second layer is functional group 1, and the top layer is the product identification. When the current CPU usage is found to be abnormal, the hierarchical structure is marked to determine all related information of the misoperation data, and the misoperation data is then traced to its source by looking up the related information marked in the hierarchical structure of virtual unit A.
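The lookup in the virtual unit A example can be sketched as a query over the marked related information for one or more layers above the bottom layer. The layer names, the function name, and the sample values below are hypothetical.

```python
# Layers above the bottom layer, ordered from top to bottom (names assumed).
LAYERS = ("product", "functional_group", "server_ip", "monitoring_project")


def trace_to_source(related, layers=LAYERS):
    """Return the marked related information for the requested layers,
    ordered from the top layer downward."""
    return {layer: related[layer] for layer in layers if layer in related}


# Related information marked for the abnormal CPU usage of virtual unit A.
related = {
    "monitoring_project": "host",
    "server_ip": "IP address 1",
    "functional_group": "functional group 1",
    "product": "product A",
}

full_trace = trace_to_source(related)
server_only = trace_to_source(related, ("server_ip",))
```

Passing a subset of layers corresponds to querying "at least one layer other than the bottom layer"; asking for all of them yields the complete path from product to monitoring project.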
It should be understood that the cloud platform may contain multiple virtual units to be monitored. When a failure event occurs and the affected virtual unit must be located quickly, the top-layer product identification information in the hierarchical structures of all virtual units can be queried. The virtual unit with the failure event is found by checking whether the top-layer product identification information of each hierarchical structure has been marked as related information of a failure event, which improves the efficiency of locating failure events.
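Because every marked fault propagates to the top layer, locating the affected virtual units only requires a scan of the top-layer entries. A minimal sketch, assuming each hierarchy exposes a hypothetical `top_layer_marked` flag:

```python
def faulty_units(hierarchies):
    """Return the product identifiers whose top layer carries a fault mark."""
    return [h["product_id"] for h in hierarchies if h["top_layer_marked"]]


# Hypothetical top-layer view of three monitored virtual units.
hierarchies = [
    {"product_id": "product-A", "top_layer_marked": False},
    {"product_id": "product-B", "top_layer_marked": True},
    {"product_id": "product-C", "top_layer_marked": False},
]

affected = faulty_units(hierarchies)
```

The scan touches one entry per virtual unit rather than every item of operation data, which is where the efficiency gain described above would come from.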
The embodiment of the present invention may further include dynamic management of cloud platform system functions. The management includes adding or deleting at least one system function; the system functions include operation monitoring, device details, configuration management, history alarms, forecast analysis, user management, data export, and so on. In actual implementation, dynamic extension of system functions can be realized through a RESTful API, which meets the monitoring and management requirements of the cloud platform and overcomes the fixed monitoring functions of existing cloud platforms.
Operation monitoring function: based on the tree list built from the virtual unit hierarchical structure, the state of each level can be viewed through a three-level directory of product, functional group, and server IP address. The operating status of a virtual unit is indicated by color. For example, green indicates normal, yellow indicates that an alarm event has occurred, and red indicates that an abnormal event has occurred; the current monitored value is shown at the same time.
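The color coding can be sketched as a mapping from status to color. The text does not say how the color of a directory level is derived from the entries beneath it; the roll-up below, which shows the most severe child status, is an assumption made for illustration.

```python
# Status names and their ordering are assumptions; the colors follow the text.
SEVERITY = {"normal": 0, "alarm": 1, "anomaly": 2}
COLOR = {"normal": "green", "alarm": "yellow", "anomaly": "red"}


def node_color(child_states):
    """Color for a directory node: the color of its most severe child status.
    A node without children is shown as normal."""
    worst = max(child_states, key=SEVERITY.__getitem__, default="normal")
    return COLOR[worst]
```

With this rule a functional group turns red as soon as any of its servers reports an abnormal event, so the problem is visible from the product level downward.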
Device details viewing function: a device can be found through the directory tree and its detailed operating condition inspected, for example which applications, middleware, and databases run on the device, how many processes are running, and the resources each process uses, including data such as I/O throughput, CPU usage, memory usage, and disk usage. Graphical presentation of the data is also supported.
Configuration management function: identifies all resources of the cloud platform, including hardware resources, server groupings, process resources, port resources, IP resources, services, software, and similar information.
The configured strategies include warning strategies, fail-over policies, and so on. Here, a warning strategy includes information such as the alarm trigger condition, alarm object, alarm recipient, and alarm delivery method; a warning strategy can be associated with a product and a policy type. For example, an alarm trigger condition can be: when a monitored value in a product exceeds the alarm threshold, an alarm of the corresponding level is generated. An alarm trigger condition can also be a simple conditional expression such as A ≥ C, A ≤ C, A > C, or A < C, where A is the monitored value and C is the alarm threshold. Alarm thresholds are customizable and apply mainly to individual monitoring data items.
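The simple conditional expressions A ≥ C, A ≤ C, A > C, and A < C can be evaluated with a small table of comparison operators. In the sketch below, the `WarningStrategy` class, its field names, and the sample thresholds are hypothetical.

```python
import operator
from dataclasses import dataclass

# The four comparison forms named in the text, with A the monitored value
# and C the alarm threshold.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class WarningStrategy:
    metric: str       # which monitored value the rule applies to
    op: str           # one of ">=", "<=", ">", "<"
    threshold: float  # C, the customizable alarm threshold
    level: str        # alarm level generated when the rule fires

    def triggered(self, value: float) -> bool:
        """Evaluate the condition 'value op threshold'."""
        return _OPS[self.op](value, self.threshold)


cpu_rule = WarningStrategy("cpu_usage", ">", 0.90, "critical")
disk_rule = WarningStrategy("disk_free_gb", "<=", 5.0, "warning")
```

Each rule looks at a single monitored value, which matches the statement that thresholds apply mainly to individual monitoring data items.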
Fail-over policy: generic failures can be handled by the system automatically according to the fail-over policy. For example, when a disk is full, the system automatically deletes garbage files and expands capacity; when CPU or memory usage is too high, it kills invalid processes.
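A fail-over policy of this kind can be sketched as a table from failure type to an ordered list of recovery actions. The failure and action names below are hypothetical labels for the examples in the text; the sketch only plans actions and does not execute anything.

```python
# Hypothetical policy table built from the examples in the text.
FAILOVER_POLICY = {
    "disk_full": ["delete_garbage_files", "expand_capacity"],
    "cpu_usage_high": ["kill_invalid_processes"],
    "memory_usage_high": ["kill_invalid_processes"],
}


def plan_recovery(failure: str) -> list:
    """Return the recovery actions for a generic failure, in order.
    Unknown failures yield an empty plan, leaving them to an operator."""
    return list(FAILOVER_POLICY.get(failure, []))
```

Returning an empty plan for an unrecognized failure reflects the text's restriction of automatic handling to generic failures.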
History alarm query function: history alarm information can be queried by conditions such as alarm time, warning strategy type, and alarm level, and the handling status of each alarm can be viewed.
Forecast analysis function: provides management of data processing models, prediction, and querying of prediction results. Model management mainly manages the data processing models that the data analysis engine trains on historical data. The prediction function mainly uses a model to perform forecast analysis on monitoring data for hosts, services, applications, users, and so on; it can process online data in real time and can also process historical data in offline batches, which improves forecast analysis efficiency. Prediction result query displays the forecast analysis results related to hosts, services, applications, and users according to the query conditions.
User management function: supports two levels of users, super administrators and ordinary users. A super administrator can assign permissions to ordinary users and audit, view, and modify their information; an ordinary user can only view data and modify basic information.
Data export function: by configuring data export rules, monitoring data, processing results, alarm information, and so on are exported for convenient viewing. Export rules may include the export time, the volume of data exported, the export format, and the storage location of the exported data. For example, data is exported automatically in the background at fixed intervals to formats such as Excel or PDF.
In the embodiment of the present invention, monitoring data is processed online by stream processing, which improves data processing efficiency; data query and analysis support both real-time processing of online data and offline batch processing of historical data; a data prediction mechanism is also provided, and processing results can be read directly from the database for display, which improves query performance; classified storage of data in separate databases is supported, which improves data read efficiency; a hierarchical structure of monitoring projects is established, which supports flexible extension and management of cloud platform monitoring projects; and functions are presented and operated through a RESTful API, which facilitates dynamic extension of system functions.
In the embodiment of the present invention, a corresponding hierarchical structure is established for at least one virtual unit to be monitored in the cloud platform; the operation data of the virtual unit is acquired; when the collected operation data is determined to be misoperation data, it is determined that a failure event has occurred in the cloud platform, and the related information of the misoperation data is marked; after the failure event is determined, it is traced to its source according to the related information. In this way, the failure event can be traced to its source.
Second embodiment
Fig. 3 is a schematic diagram of a first composition structure of the cloud platform monitoring apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes an establishing module 300, an acquisition module 301, a processing module 302, and a locating module 303, wherein:
The establishing module 300 is configured to establish a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform. The hierarchical structure includes, from top to bottom: identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer. Each functional group indicates one function performed when the virtual device runs.
The acquisition module 301 is configured to collect operation data of the virtual device.
The processing module 302 is configured to, when the collected operation data is determined to be abnormal operation data, determine that a fault event has occurred in the cloud platform and mark the associated information of the abnormal operation data. The associated information of the abnormal operation data includes: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the identification information of the virtual device in the top layer of the hierarchical structure that is associated with the abnormal operation data.
The locating module 303 is configured to, when it is determined that a fault event has occurred in the cloud platform, trace the fault event to its source according to the associated information.
Preferably, the hierarchical structure may further include: at the third layer, the IP address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
Fig. 4 is a schematic diagram of a second hierarchical structure of a virtual device to be monitored in an embodiment of the present invention. As shown in Fig. 4, the hierarchical structure of the virtual device is established in the order: product, functional group, IP address, monitoring item (including application programs, middleware, and databases), and monitoring content (I/O ports, CPU, hard disk, etc.). The servers of one product can be divided into multiple functional groups by function, and one functional group can be implemented by multiple servers. A server can host middleware and databases, and the indicators of hosts, services, and applications are monitored through processes. This hierarchical structure makes it easy to extend and manage monitoring items flexibly and facilitates fault event tracking.
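The five-layer structure of Fig. 4 can be sketched as nested records. This is a minimal illustration, not the patented implementation: the class names, field names, and sample values (product-A, login, 10.0.0.1, and so on) are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MonitoringItem:
    """Fourth layer: an application, middleware, or database on a server."""
    name: str
    # Bottom layer: monitoring content (e.g. CPU, hard disk) -> latest value.
    content: Dict[str, float] = field(default_factory=dict)


@dataclass
class Server:
    """Third layer: a server identified by its IP address."""
    ip: str
    items: Dict[str, MonitoringItem] = field(default_factory=dict)


@dataclass
class FunctionalGroup:
    """Second layer: one function of the product, served by one or more servers."""
    name: str
    servers: Dict[str, Server] = field(default_factory=dict)


@dataclass
class Product:
    """Top layer: identification information of the virtual device."""
    product_id: str
    groups: Dict[str, FunctionalGroup] = field(default_factory=dict)

    def record(self, group, ip, item, content, value):
        """Store one operation-data value, creating missing layers on demand."""
        g = self.groups.setdefault(group, FunctionalGroup(group))
        s = g.servers.setdefault(ip, Server(ip))
        m = s.items.setdefault(item, MonitoringItem(item))
        m.content[content] = value

    def paths(self):
        """List every (product, group, ip, item, content) path in the tree."""
        return [
            (self.product_id, g.name, s.ip, m.name, c)
            for g in self.groups.values()
            for s in g.servers.values()
            for m in s.items.values()
            for c in m.content
        ]


product = Product("product-A")
product.record("login", "10.0.0.1", "database", "cpu", 35.0)
product.record("login", "10.0.0.2", "middleware", "hard_disk", 71.5)
product.record("billing", "10.0.0.3", "application", "cpu", 12.0)
```

Because every value is reachable only through its product, functional group, and IP address, each piece of operation data carries the full path needed for later fault tracking.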
Preferably, the associated information of the abnormal operation data further includes: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data.
The processing module 302 is specifically configured to, when the collected operation data is determined to be abnormal operation data: mark the fourth-layer associated information of the abnormal operation data according to the abnormal operation data in the bottom layer; mark the third-layer associated information of the abnormal operation data according to the marked fourth-layer associated information; mark the second-layer associated information of the abnormal operation data according to the marked third-layer associated information; and mark the top-layer associated information of the abnormal operation data according to the marked second-layer associated information.
The locating module 303 is specifically configured to, when it is determined that a fault event has occurred in the cloud platform, query, according to the associated information marked in the hierarchical structure, the associated information of at least one layer other than the bottom layer among the associated information of the abnormal operation data, so as to trace the fault event to its source.
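The layer-by-layer marking and the subsequent query can be sketched as follows. This is a hedged illustration: the parent table, the key format, and the fixed threshold used to decide that data is abnormal are all assumptions, since the text does not specify how abnormality is determined or how the layers are linked.

```python
# Names of the layers above the bottom layer (layer 1 is the top layer).
LAYERS = {4: "monitoring_item", 3: "server_ip", 2: "functional_group", 1: "product_id"}

# Hypothetical parent table: each node maps to its node in the layer above.
PARENT = {
    "10.0.0.1/database/cpu": "10.0.0.1/database",  # bottom layer -> fourth layer
    "10.0.0.1/database": "10.0.0.1",               # fourth layer -> third layer
    "10.0.0.1": "login",                           # third layer -> second layer
    "login": "product-A",                          # second layer -> top layer
}


def is_abnormal(value, threshold=90.0):
    """Assumed rule: a value above the threshold counts as abnormal."""
    return value > threshold


def mark_association(data_key):
    """Mark associated information from the fourth layer up to the top layer.

    Each layer's mark is derived from the mark of the layer directly below it.
    """
    marked = {}
    node = data_key
    for layer in (4, 3, 2, 1):
        node = PARENT[node]
        marked[LAYERS[layer]] = node
    return marked


def trace_fault(data_key, value):
    """Return the marked associated information for abnormal data, else None."""
    if not is_abnormal(value):
        return None
    return mark_association(data_key)


fault = trace_fault("10.0.0.1/database/cpu", 97.0)
normal = trace_fault("10.0.0.1/database/cpu", 10.0)
```

With these sample values, fault holds the monitoring item, server IP address, functional group, and product identifier of the abnormal reading, which is the information a locating step would query.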
The establishing module 300 is further configured to, after the corresponding hierarchical structure is established for the at least one virtual device, add or delete at least one functional group in the second layer of the hierarchical structure.
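Adding or deleting a functional group in the second layer can be sketched with a plain dictionary. The layout (product identifier mapped to functional groups, each mapped to a list of server IP addresses) and the function names are assumptions made for this example.

```python
# Hypothetical hierarchy: product id -> functional group -> server IP addresses.
hierarchy = {"product-A": {"login": ["10.0.0.1"], "billing": ["10.0.0.3"]}}


def add_group(hierarchy, product_id, group, server_ips=()):
    """Add a functional group to the second layer of an existing product."""
    groups = hierarchy[product_id]
    if group in groups:
        raise ValueError(f"functional group {group!r} already exists")
    groups[group] = list(server_ips)


def delete_group(hierarchy, product_id, group):
    """Delete a functional group and return the servers that belonged to it."""
    return hierarchy[product_id].pop(group)


add_group(hierarchy, "product-A", "reporting", ["10.0.0.4"])
removed = delete_group(hierarchy, "product-A", "billing")
```

Only the second layer changes; the top layer and the other functional groups are left untouched.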
In practical applications, the establishing module 300, the acquisition module 301, the processing module 302, and the locating module 303 may each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like located in a terminal device.
Third embodiment
Fig. 5 is a schematic diagram of a second composition structure of the cloud platform monitoring apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes an acquisition module, a data analysis module, a data storage module, a management module, and a Web end.
The acquisition module includes a data acquisition unit, a protocol adaptation unit, and a message buffer unit, wherein:
The data acquisition unit is configured to collect the running logs of the virtual device, generate data points, and upload them periodically. It should be noted that when data points are uploaded, the packet structure follows the REST API specification, and data point values are built as single-layer JavaScript Object Notation (JSON), which facilitates cross-platform and cross-language data use and interaction.
The protocol adaptation unit is configured to perform protocol adaptation and parsing on the monitoring data.
The message buffer unit is configured to cache the parsed data in a message queue so that the data analysis module can read and process it.
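A single-layer JSON data point, as produced by the data acquisition unit, might look like the output of the sketch below. The field names (device_id, timestamp, cpu, disk_used) and the function name are assumptions; the text only requires that the value be a flat JSON object.

```python
import json
import time


def build_data_point(device_id, metrics, timestamp=None):
    """Build a single-layer (flat) JSON data point; nested values are rejected."""
    for key, value in metrics.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"metric {key!r} is nested; data points must be single-layer")
    point = {
        "device_id": device_id,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    point.update(metrics)
    return json.dumps(point, sort_keys=True)


payload = build_data_point("vm-001", {"cpu": 35.2, "disk_used": 71.5},
                           timestamp=1484784000)
```

Keeping the object flat means any consumer can read it with a standard JSON parser without knowing a nested schema.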
The data analysis module is configured to read the parsed monitoring data from the message queue and perform online processing or offline batch processing on the monitoring data, and to judge whether the processed data is abnormal operation data. When the collected operation data is determined to be abnormal operation data, the module determines that a fault event has occurred in the cloud platform and marks all associated information of the abnormal operation data in the hierarchical structure; when the collected operation data is determined to be normal operation data, monitoring continues.
Specifically, the main functions of the data analysis module include: first, real-time data stream processing, in which parsed data is read according to a configured policy to complete processing such as calculation and alarming; second, model training using historical data, where the model can be customized according to the service and is used for calculation and predictive analysis; third, using the model to monitor online data in real time or to perform offline batch processing on historical data, which improves the efficiency of predictive analysis.
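The three functions above can be sketched with a deliberately simple model. This is an assumption-laden illustration: the text does not say which model is used, so the sketch "trains" only a mean and standard deviation on historical data and flags values that deviate by more than k standard deviations.

```python
from statistics import mean, stdev


def train_model(history):
    """Train a trivial model: the mean and sample standard deviation of history."""
    return {"mean": mean(history), "std": stdev(history)}


def is_anomalous(model, value, k=3.0):
    """Flag a value that lies more than k standard deviations from the mean."""
    return abs(value - model["mean"]) > k * model["std"]


def process_stream(stream, model, k=3.0):
    """Online processing: yield an alarm for each anomalous value as it arrives."""
    for index, value in enumerate(stream):
        if is_anomalous(model, value, k):
            yield {"index": index, "value": value, "alarm": True}


def process_batch(values, model, k=3.0):
    """Offline batch processing: return all anomalous values at once."""
    return [v for v in values if is_anomalous(model, v, k)]


model = train_model([50, 52, 48, 51, 49])
alarms = list(process_stream([50, 53, 90, 47], model))
outliers = process_batch([49, 10, 51], model)
```

The same model object serves both the streaming path and the batch path, which mirrors the third function in the text.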
The data storage module is configured to classify data and store it in separate databases. Here, the storage rule may be: a key-value database stores metadata; a relational database stores data such as user information, processing result information, configuration information, historical alarms, and historical statistics; a non-relational (Not Only Structured Query Language, NoSQL) database persistently stores historical data.
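The storage rule can be sketched as a routing table from data category to database type. The category names and the in-memory lists that stand in for the three databases are assumptions made for this example.

```python
# Assumed mapping from data category to database type, following the rule above.
BACKEND_BY_CATEGORY = {
    "metadata": "key_value",
    "user_info": "relational",
    "processing_result": "relational",
    "configuration": "relational",
    "history_alarm": "relational",
    "history_statistics": "relational",
    "history_data": "nosql",
}


class StorageRouter:
    """Route each record to a backend chosen by its category."""

    def __init__(self):
        # In-memory lists stand in for the three real databases.
        self.backends = {"key_value": [], "relational": [], "nosql": []}

    def store(self, category, record):
        """Store the record and return the name of the backend that received it."""
        backend = BACKEND_BY_CATEGORY.get(category)
        if backend is None:
            raise KeyError(f"no storage rule for category {category!r}")
        self.backends[backend].append(record)
        return backend


router = StorageRouter()
first = router.store("metadata", {"device_id": "vm-001"})
second = router.store("history_alarm", {"device_id": "vm-001", "level": "critical"})
third = router.store("history_data", {"device_id": "vm-001", "cpu": 97.0})
```

Keeping the rule in one table makes it easy to add a category without touching the callers.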
The Web end is configured to manage system functions and to access and operate on monitoring events.
Specifically, through RESTful APIs the Web end can extend system functions dynamically to meet the monitoring and management requirements on the cloud platform, which addresses the defect that the monitoring functions on existing cloud service platforms are fixed.
The Web end can also use RESTful APIs to provide flexible access to monitoring data, processing results, alarm information, and the like. When a fault event occurs on a monitored virtual device, all associated information of the abnormal operation data is presented through the Web end, which enables source tracing of the fault event and improves the efficiency of handling fault events.
In the embodiment of the present invention, monitoring data queries fall into the following three cases:
First, for monitoring data or processing results that are queried frequently, the data storage module stores the monitoring data or processing results in advance, and the Web end reads them directly from the data storage module and displays them.
Second, for query requests that require real-time processing, the data analysis module processes the monitoring data online according to a preset configuration policy and sends the processed monitoring data to the Web end, which displays it; at the same time, the processed monitoring data is stored in the data storage module.
Third, when a query request is issued for all monitoring data, the data analysis module batch-processes the monitoring data offline and stores the processed monitoring data in the data storage module; the Web end then reads the monitoring data from the data storage module and displays it.
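The three query cases can be sketched as a small dispatcher. The class and enum names are hypothetical, a dictionary stands in for the data storage module, and the processing step is an arbitrary callable (here an average) chosen only for illustration.

```python
from enum import Enum


class QueryKind(Enum):
    FREQUENT = 1   # case 1: results already stored
    REAL_TIME = 2  # case 2: online processing on request
    FULL = 3       # case 3: offline batch over all monitoring data


class MonitorQueryService:
    def __init__(self, raw_data, process):
        self.raw_data = raw_data  # raw monitoring values, oldest first
        self.process = process    # callable standing in for the data analysis module
        self.store = {}           # dictionary standing in for the data storage module

    def query(self, kind, key):
        if kind is QueryKind.FREQUENT:
            # Case 1: read the stored result directly.
            return self.store[key]
        if kind is QueryKind.REAL_TIME:
            # Case 2: process the newest value online, return it, and store it.
            result = self.process(self.raw_data[-1:])
            self.store[key] = result
            return result
        # Case 3: batch-process everything, store it, then read it back from storage.
        self.store[key] = self.process(self.raw_data)
        return self.store[key]


service = MonitorQueryService([10, 20, 30], lambda values: sum(values) / len(values))
live = service.query(QueryKind.REAL_TIME, "cpu")
cached = service.query(QueryKind.FREQUENT, "cpu")
full = service.query(QueryKind.FULL, "cpu_all")
```

In this sketch the real-time query also populates the store, so a later frequent query for the same key is served without reprocessing.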
The management module includes a heartbeat management unit, a configuration management unit, a service management unit, and an upgrade unit, wherein:
The heartbeat management unit is configured to monitor host status.
The configuration management unit is configured to identify all resources of the cloud platform. The configuration content includes hardware asset information, server grouping information, monitoring policy information, alarm policies, fault recovery policies, and the like, and the unit supports create, delete, update, and query operations on resources and policies.
The service management unit is configured to manage user permissions and basic information.
The upgrade unit is configured to provide upgrade services for the cloud platform.
In the embodiment of the present invention, a corresponding hierarchical structure is established using the product identifier, functional groups, IP addresses, and the like of the virtual device, which facilitates source tracing of fault events.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing is only a preferred embodiment of the present invention, is not intended to limit the scope of the present invention.

Claims (10)

1. A method for cloud platform monitoring, characterized in that the method comprises:
establishing a corresponding hierarchical structure for at least one virtual device to be monitored in a cloud platform, wherein the hierarchical structure comprises, from top to bottom: identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group indicates one function performed when the virtual device runs;
collecting operation data of the virtual device;
when the collected operation data is determined to be abnormal operation data, determining that a fault event has occurred in the cloud platform, and marking associated information of the abnormal operation data, wherein the associated information of the abnormal operation data comprises: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the identification information of the virtual device in the top layer of the hierarchical structure that is associated with the abnormal operation data;
after it is determined that a fault event has occurred in the cloud platform, tracing the fault event to its source according to the associated information.
2. The method according to claim 1, characterized in that the hierarchical structure further comprises: at the third layer, the IP address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
3. The method according to claim 2, characterized in that the associated information of the abnormal operation data further comprises: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
marking the associated information of the abnormal operation data comprises: when the collected operation data is determined to be abnormal operation data, marking the fourth-layer associated information of the abnormal operation data according to the abnormal operation data in the bottom layer; marking the third-layer associated information of the abnormal operation data according to the marked fourth-layer associated information; marking the second-layer associated information of the abnormal operation data according to the marked third-layer associated information; and marking the top-layer associated information of the abnormal operation data according to the marked second-layer associated information.
4. The method according to claim 1, characterized in that tracing the fault event to its source according to the associated information after it is determined that a fault event has occurred in the cloud platform comprises: after it is determined that a fault event has occurred in the cloud platform, querying, according to the associated information marked in the hierarchical structure, the associated information of at least one layer other than the bottom layer among the associated information of the abnormal operation data, so as to trace the fault event to its source.
5. The method according to claim 1, characterized in that the method further comprises: after the corresponding hierarchical structure is established for the at least one virtual device, adding or deleting at least one functional group in the second layer of the hierarchical structure.
6. An apparatus for cloud platform monitoring, characterized in that the apparatus comprises an establishing module, an acquisition module, a processing module, and a locating module, wherein:
the establishing module is configured to establish a corresponding hierarchical structure for at least one virtual device to be monitored in a cloud platform, wherein the hierarchical structure comprises, from top to bottom: identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group indicates one function performed when the virtual device runs;
the acquisition module is configured to collect operation data of the virtual device;
the processing module is configured to, when the collected operation data is determined to be abnormal operation data, determine that a fault event has occurred in the cloud platform and mark associated information of the abnormal operation data, wherein the associated information of the abnormal operation data comprises: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the identification information of the virtual device in the top layer of the hierarchical structure that is associated with the abnormal operation data;
the locating module is configured to, when it is determined that a fault event has occurred in the cloud platform, trace the fault event to its source according to the associated information.
7. The apparatus according to claim 6, characterized in that the hierarchical structure further comprises: at the third layer, the IP address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
8. The apparatus according to claim 7, characterized in that the associated information of the abnormal operation data further comprises: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
the processing module is specifically configured to, when the collected operation data is determined to be abnormal operation data: mark the fourth-layer associated information of the abnormal operation data according to the abnormal operation data in the bottom layer; mark the third-layer associated information of the abnormal operation data according to the marked fourth-layer associated information; mark the second-layer associated information of the abnormal operation data according to the marked third-layer associated information; and mark the top-layer associated information of the abnormal operation data according to the marked second-layer associated information.
9. The apparatus according to claim 6, characterized in that the locating module is specifically configured to, when it is determined that a fault event has occurred in the cloud platform, query, according to the associated information marked in the hierarchical structure, the associated information of at least one layer other than the bottom layer among the associated information of the abnormal operation data, so as to trace the fault event to its source.
10. The apparatus according to claim 6, characterized in that the establishing module is further configured to, after the corresponding hierarchical structure is established for the at least one virtual device, add or delete at least one functional group in the second layer of the hierarchical structure.
CN201710043469.7A 2017-01-19 2017-01-19 Cloud platform monitoring method and device Active CN108337100B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710043469.7A CN108337100B (en) 2017-01-19 2017-01-19 Cloud platform monitoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710043469.7A CN108337100B (en) 2017-01-19 2017-01-19 Cloud platform monitoring method and device

Publications (2)

Publication Number Publication Date
CN108337100A (en) 2018-07-27
CN108337100B (en) 2021-07-09

Family

ID=62922221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710043469.7A Active CN108337100B (en) 2017-01-19 2017-01-19 Cloud platform monitoring method and device

Country Status (1)

Country Link
CN (1) CN108337100B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634813A (en) * 2018-12-11 2019-04-16 平安科技(深圳)有限公司 Electronic device, cloud platform exception confirmation method and storage medium
CN110855473A (en) * 2019-10-16 2020-02-28 平安科技(深圳)有限公司 Monitoring method, device, server and storage medium
CN113799850A (en) * 2021-08-25 2021-12-17 通号城市轨道交通技术有限公司 Running state monitoring method and device, electronic equipment and storage medium
CN117724880A (en) * 2023-06-13 2024-03-19 荣耀终端有限公司 Fault information processing method, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130227338A1 (en) * 2012-02-28 2013-08-29 International Business Machines Corporation Reconfiguring interrelationships between components of virtual computing networks
CN104142848A (en) * 2013-05-08 2014-11-12 西安邮电大学 Virtual machine identifier and use method thereof
CN104486406A (en) * 2014-12-15 2015-04-01 浪潮电子信息产业股份有限公司 Layered resource monitoring method based on cloud data center
CN106130809A (en) * 2016-09-07 2016-11-16 东南大学 A kind of IaaS cloud platform network failure locating method based on log analysis and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634813A (en) * 2018-12-11 2019-04-16 平安科技(深圳)有限公司 Electronic device, cloud platform exception confirmation method and storage medium
CN110855473A (en) * 2019-10-16 2020-02-28 平安科技(深圳)有限公司 Monitoring method, device, server and storage medium
WO2021073433A1 (en) * 2019-10-16 2021-04-22 平安科技(深圳)有限公司 Monitoring method and device, server, and storage medium
CN113799850A (en) * 2021-08-25 2021-12-17 通号城市轨道交通技术有限公司 Running state monitoring method and device, electronic equipment and storage medium
CN117724880A (en) * 2023-06-13 2024-03-19 荣耀终端有限公司 Fault information processing method, electronic device and storage medium

Also Published As

Publication number Publication date
CN108337100B (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant