CN108337100A - A kind of method and apparatus of cloud platform monitoring - Google Patents
- Publication number
- CN108337100A (CN201710043469.7A)
- Authority
- CN
- China
- Prior art keywords
- related information
- data
- layer
- misoperation
- hierarchical structure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/065—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0823—Errors, e.g. transmission errors
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Debugging And Monitoring (AREA)
Abstract
An embodiment of the invention discloses a cloud platform monitoring method. The method includes: establishing a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform; collecting operation data of the virtual device; when the collected operation data is determined to be abnormal operation data, determining that a failure event has occurred in the cloud platform and marking the association information of the abnormal operation data; and, after determining that the failure event has occurred, tracing the failure event to its source according to the association information. An embodiment of the invention also discloses a cloud platform monitoring apparatus.
Description
Technical field
The present invention relates to the field of cloud computing technology, and in particular to a method and apparatus for cloud platform monitoring.
Background
Cloud platforms are large-scale, virtualized, dynamic, and real-time. A cloud platform monitoring system must therefore be able to monitor large-scale resources, virtual resources, and dynamic resources, support real-time viewing of monitoring reports, and make the monitoring service measurable. Existing cloud platform monitoring systems have the following shortcomings: when a failure event is detected in the cloud platform, it is difficult to determine which monitored device caused the failure, that is, tracing a failure event to its source is difficult; in addition, the functions of the monitoring system on many cloud platforms are relatively fixed, so dynamic extension of system functions is hard to achieve.
Summary of the invention
To solve the above technical problems, embodiments of the present invention aim to provide a method and apparatus for cloud platform monitoring that enable a failure event to be traced to its source.
The technical solution of the present invention is realized as follows:
An embodiment of the present invention provides a cloud platform monitoring method, including:
Establishing a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform, where the hierarchical structure includes, from top to bottom: the identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group represents one function of the virtual device when it runs;
Collecting operation data of the virtual device;
When the collected operation data is determined to be abnormal operation data, determining that a failure event has occurred in the cloud platform, and marking the association information of the abnormal operation data, where the association information of the abnormal operation data includes: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the virtual device identification information in the top layer of the hierarchical structure that is associated with the abnormal operation data;
After determining that a failure event has occurred in the cloud platform, tracing the failure event to its source according to the association information.
In the above solution, the hierarchical structure further includes: at the third layer, the IP address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
In the above solution, the association information of the abnormal operation data further includes: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
Marking the association information of the abnormal operation data includes: when the collected operation data is determined to be abnormal operation data, marking the fourth-layer association information of the abnormal operation data according to the abnormal operation data in the bottom layer; marking the third-layer association information of the abnormal operation data according to the marked fourth-layer association information; marking the second-layer association information of the abnormal operation data according to the marked third-layer association information; and marking the top-layer association information of the abnormal operation data according to the marked second-layer association information.
In the above solution, tracing the failure event to its source according to the association information, after determining that a failure event has occurred in the cloud platform, includes: after determining that a failure event has occurred in the cloud platform, querying, according to the association information marked in the hierarchical structure, the association information of at least one layer other than the bottom layer among the association information of the abnormal operation data, thereby tracing the failure event to its source.
In the above solution, the method further includes: after the corresponding hierarchical structure is established for the at least one virtual device, adding or deleting at least one functional group in the second layer of the hierarchical structure.
An embodiment of the present invention also provides a cloud platform monitoring apparatus. The apparatus includes an establishing module, an acquisition module, a processing module, and a locating module, where:
The establishing module is configured to establish a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform; the hierarchical structure includes, from top to bottom: the identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group represents one function of the virtual device when it runs;
The acquisition module is configured to collect operation data of the virtual device;
The processing module is configured to, when the collected operation data is determined to be abnormal operation data, determine that a failure event has occurred in the cloud platform and mark the association information of the abnormal operation data; the association information of the abnormal operation data includes: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the virtual device identification information in the top layer of the hierarchical structure that is associated with the abnormal operation data;
The locating module is configured to, when it is determined that a failure event has occurred in the cloud platform, trace the failure event to its source according to the association information.
In the above solution, the hierarchical structure further includes: at the third layer, the IP address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
In the above solution, the association information of the abnormal operation data further includes: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
The processing module is specifically configured to, when the collected operation data is determined to be abnormal operation data, mark the fourth-layer association information of the abnormal operation data according to the abnormal operation data in the bottom layer; mark the third-layer association information of the abnormal operation data according to the marked fourth-layer association information; mark the second-layer association information of the abnormal operation data according to the marked third-layer association information; and mark the top-layer association information of the abnormal operation data according to the marked second-layer association information.
In the above solution, the locating module is specifically configured to, when it is determined that a failure event has occurred in the cloud platform, query, according to the association information marked in the hierarchical structure, the association information of at least one layer other than the bottom layer among the association information of the abnormal operation data, thereby tracing the failure event to its source.
In the above solution, the establishing module is further configured to, after the corresponding hierarchical structure is established for the at least one virtual device, add or delete at least one functional group in the second layer of the hierarchical structure.
In the embodiment of the present invention, a corresponding hierarchical structure is established for at least one virtual device to be monitored in the cloud platform; operation data of the virtual device is collected; when the collected operation data is determined to be abnormal operation data, it is determined that a failure event has occurred in the cloud platform, and the association information of the abnormal operation data is marked; after the failure event is determined, the failure event is traced to its source according to the association information. In this way, the failure event can be traced to its source.
Brief description of the drawings
Fig. 1 is a flowchart of the first embodiment of the cloud platform monitoring method of the present invention;
Fig. 2 is a schematic diagram of the first hierarchical structure of a virtual device to be monitored in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the first composition structure of the cloud platform monitoring apparatus in the embodiment of the present invention;
Fig. 4 is a schematic diagram of the second hierarchical structure of a virtual device to be monitored in the embodiment of the present invention;
Fig. 5 is a schematic diagram of the second composition structure of the cloud platform monitoring apparatus in the embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.
First embodiment
Fig. 1 is a flowchart of the first embodiment of the cloud platform monitoring method of the present invention. As shown in Fig. 1, the method includes:
Step 100: Establish a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform.
In this step, the hierarchical structure includes, from top to bottom: the identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group represents one function of the virtual device when it runs.
Preferably, the hierarchical structure further includes: at the third layer, the Internet Protocol (IP) address of at least one server used by the at least one functional group of the virtual device; at the fourth layer, at least one monitoring item corresponding to the IP address of the at least one server; and at the bottom layer, the operation data of the virtual device corresponding to the at least one monitoring item.
The embodiment of the present invention abstracts the key hosts of each system to be monitored as virtual devices, and uses the attribute information of a virtual device as identification information to distinguish different virtual devices. Here, the attribute information of a virtual device includes: device short name, device description, the project or product it belongs to, the functional group it belongs to, IP address, and operating system and version.
Preferably, the embodiment of the present invention establishes the hierarchical relationship of the virtual device to be monitored in the order product, functional group, server address, monitoring item, monitoring content.
Fig. 2 is a schematic diagram of the first hierarchical structure of a virtual device to be monitored in the embodiment of the present invention. As shown in Fig. 2, the hierarchical structure of the virtual device consists of: the product identification information at the top layer; N functional groups of the virtual device at the second layer, where N is an integer greater than 0; all IP addresses used by each functional group at the third layer; the monitoring items corresponding to each IP address at the fourth layer; and the operation data of the virtual device corresponding to each monitoring item at the bottom layer.
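The five layers described above can be sketched as nested records. This is only an illustrative model, not the patent's implementation: the class names, field names, and sample values (`product-A`, `10.0.0.1`, `cpu_usage`) are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MonitoringItem:
    """Layer 4: a monitoring item; `data` holds the bottom-layer operation data."""
    name: str
    data: dict[str, float] = field(default_factory=dict)


@dataclass
class Server:
    """Layer 3: one server IP address used by a functional group."""
    ip: str
    items: list[MonitoringItem] = field(default_factory=list)


@dataclass
class FunctionalGroup:
    """Layer 2: one function of the virtual device."""
    name: str
    servers: list[Server] = field(default_factory=list)


@dataclass
class VirtualDevice:
    """Top layer: product identification of the virtual device."""
    product_id: str
    groups: list[FunctionalGroup] = field(default_factory=list)


def build_example() -> VirtualDevice:
    """Build a one-branch hierarchy with hypothetical sample values."""
    host = MonitoringItem("host", {"cpu_usage": 35.0, "disk_usage": 60.0})
    server = Server("10.0.0.1", [host])
    return VirtualDevice("product-A", [FunctionalGroup("web", [server])])
```

Because each layer owns a list of the layer below it, adding or removing a functional group only touches `VirtualDevice.groups`.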
Illustratively, monitoring items may include: host, application, service, user behavior, middleware, or database. When the monitoring item is a host, the corresponding monitoring content may include: total central processing unit (CPU) usage, user-mode CPU usage, kernel-mode CPU usage, interrupt CPU usage, remaining disk space, disk usage, average disk I/O time, average disk I/O throughput, physical memory usage, swap memory usage, network upstream rate, network downstream rate, and so on;
When the monitoring item is an application, the corresponding monitoring content may include: the operation data and access records of certain key applications; the availability and quality of the application are determined by evaluating this monitoring content. For example: the number of calls to key APIs and their response status;
When the monitoring item is a service, the corresponding monitoring content may include: the running status of large-scale service software. For example: Nginx cumulative request count, Nginx requests per second, Nginx active connections, Nginx dropped connections, and the running status of Tomcat, MySQL, and Apache;
When the monitoring item is user behavior, the corresponding monitoring content includes: access monitoring, uniform resource locator (URL) monitoring, and content monitoring. Access monitoring obtains user access speed; URL monitoring covers response time and failure rate, to understand the real-time access status of the service; content monitoring tracks changes to web page elements;
When the monitoring item is middleware or a database, the corresponding monitoring content includes data such as I/O throughput, CPU usage, and disk usage.
In this step, management operations on the hierarchical structure can also be performed. Specifically: after the corresponding hierarchical structure is established for the at least one virtual device, at least one functional group in the second layer of the hierarchical structure is added or deleted.
Step 101: Collect the operation data of the virtual device.
Optionally, the operation data of the monitored items of the virtual device is collected as monitoring data, the monitoring data is parsed through protocol adaptation, and the parsed data is buffered in a message queue. Depending on business requirements, the parsed data can be stored in a database directly according to preset storage rules, or it can be processed first and then stored in a database according to the preset storage rules.
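The collection path above (protocol adaptation, then buffering in a message queue) might look like the following minimal sketch. The two wire formats, the adapter names, and the in-process `queue.Queue` standing in for a real message queue are all assumptions for illustration; the patent does not name specific protocols or queue software.

```python
import json
import queue


def parse_json(raw: bytes) -> dict:
    """Hypothetical adapter for agents that report JSON."""
    return json.loads(raw.decode("utf-8"))


def parse_kv(raw: bytes) -> dict:
    """Hypothetical adapter for agents that report 'key=value;key=value' text."""
    pairs = (p.split("=", 1) for p in raw.decode("utf-8").split(";") if p)
    return {k: float(v) for k, v in pairs}


# Protocol adaptation: pick a parser by protocol name.
ADAPTERS = {"json": parse_json, "kv": parse_kv}

# In-process stand-in for the message queue that buffers parsed data.
buffer: queue.Queue = queue.Queue()


def collect(protocol: str, raw: bytes) -> None:
    """Parse one raw sample with the matching adapter and buffer the result."""
    buffer.put(ADAPTERS[protocol](raw))


collect("json", b'{"cpu_usage": 91.5}')
collect("kv", b"cpu_usage=12.0;disk_usage=40.5")
```

A consumer would then drain `buffer` and either store each record directly or process it first, as the paragraph describes.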
Processing the parsed data before storing it includes: first, performing data stream processing on the parsed data and completing processing such as computation and alarming according to the configured policies; second, training a data processing model with historical data and then processing the parsed data with the data processing model.
Here, storage rules are set in order to store data by category and in separate databases. Illustratively, the storage rules may be: a key-value database stores metadata; a relational database stores data such as user information, processing result information, configuration information, historical alarms, and historical statistics; a non-relational (Not Only Structured Query Language, NoSQL) database is used for persistent storage of historical data.
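The example storage rule above amounts to a lookup from data category to store type. The sketch below assumes hypothetical category names and uses in-memory lists as stand-ins for the three kinds of database.

```python
from collections import defaultdict

# Hypothetical mapping from data category to store type, following the example rule.
STORAGE_RULES = {
    "metadata": "key_value",
    "user_info": "relational",
    "processing_result": "relational",
    "configuration": "relational",
    "historical_alarm": "relational",
    "historical_statistics": "relational",
    "historical_data": "nosql",
}

# In-memory stand-ins for the three databases.
stores = defaultdict(list)


def store(category: str, record: dict) -> str:
    """Append the record to the store selected by its category; return the store name."""
    target = STORAGE_RULES[category]
    stores[target].append(record)
    return target
```

Keeping the rule in a table means a new category can be routed by adding one entry rather than changing the storage code.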
Step 102: When the collected operation data is determined to be abnormal operation data, determine that a failure event has occurred in the cloud platform, and mark the association information of the abnormal operation data. The association information of the abnormal operation data includes: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the virtual device identification information in the top layer of the hierarchical structure that is associated with the abnormal operation data.
In actual implementation, the cloud platform monitors operation data in real time according to the configured policies; when the operation data satisfies a preset alarm policy, the collected operation data is determined to be abnormal operation data.
Preferably, the association information of the abnormal operation data may also include: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
Correspondingly, when the collected operation data is determined to be abnormal operation data, the fourth-layer association information of the abnormal operation data (that is, the monitoring item in the fourth layer of the hierarchical structure associated with the abnormal operation data) is marked according to the abnormal operation data in the bottom layer; according to the marked fourth-layer association information, the third-layer association information (that is, the server IP address in the third layer associated with the abnormal operation data) is marked; according to the marked third-layer association information, the second-layer association information (that is, the functional group in the second layer associated with the abnormal operation data) is marked; and according to the marked second-layer association information, the top-layer association information (that is, the virtual device identification information in the top layer associated with the abnormal operation data) is marked.
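The layer-by-layer marking just described can be sketched as a walk up parent links, starting from the abnormal bottom-layer data. The `Node` class, its `parent` pointer, and the sample labels are assumptions made for this example; the patent does not specify how the hierarchy is stored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """One entry in the hierarchy; `parent` points to the layer above."""
    layer: str
    label: str
    parent: Node | None = None
    marked: bool = False


def mark_upward(data_node: Node) -> list[str]:
    """Mark layer 4, then 3, then 2, then the top layer, starting from abnormal data."""
    marked = []
    node = data_node.parent
    while node is not None:
        node.marked = True
        marked.append(f"{node.layer}:{node.label}")
        node = node.parent
    return marked


# Hypothetical single-branch hierarchy.
product = Node("top", "product-A")
group = Node("layer2", "group-1", product)
server = Node("layer3", "10.0.0.1", group)
item = Node("layer4", "host", server)
cpu = Node("bottom", "cpu_usage", item)

path = mark_upward(cpu)
```

Each layer is marked only from the layer directly below it, which mirrors the order given in the text.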
When multiple pieces of abnormal operation data are present in the cloud platform at the same time, the association information is marked by searching the hierarchical structure layer by layer from bottom to top, determining the monitoring item, IP address, functional group, and product identification information corresponding to each piece of abnormal operation data. Optionally, to distinguish the association information of different abnormal operation data, identification information of the respective abnormal operation data can be added when the association information is marked. For example, abnormal operation data 1 to abnormal operation data X correspond to identification information 1 to identification information X respectively; in this way, by checking the identification information, it can be determined which abnormal operation data a piece of association information in the hierarchical structure belongs to.
Step 103: After determining that a failure event has occurred in the cloud platform, trace the failure event to its source according to the association information.
In this step, after it is determined that a failure event has occurred in the cloud platform, the association information of at least one layer other than the bottom layer, among the association information of the abnormal operation data, is queried according to the association information marked in the hierarchical structure, thereby tracing the failure event to its source.
Illustratively, suppose the CPU usage of the host of virtual device A needs to be monitored. When the monitored CPU usage exceeds a threshold, the association information of the host's CPU usage is marked. The association information then includes: in the hierarchical structure of virtual device A, the bottom-layer monitoring content is CPU usage, the fourth-layer monitoring item is host monitoring, the third-layer server address is IP address 1, the second layer is functional group 1, and the top layer is the product identification. When the current CPU usage is detected as abnormal, the hierarchical structure is marked to determine all association information of the abnormal operation data, and the abnormal operation data is then traced to its source by looking up the association information marked in the hierarchical structure of virtual device A.
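The CPU usage example can be walked through in a few lines. The threshold value of 90 and the dictionary keys are assumptions for illustration; the layer values are the ones named in the example.

```python
CPU_THRESHOLD = 90.0  # assumed alarm threshold, in percent

# Hierarchy path of the CPU usage metric of virtual device A, as in the example.
PATH = {
    "monitoring_content": "cpu_usage",          # bottom layer
    "monitoring_item": "host monitoring",       # fourth layer
    "server_ip": "IP address 1",                # third layer
    "functional_group": "functional group 1",   # second layer
    "product": "virtual device A",              # top layer
}

marked: dict[str, str] = {}


def check_cpu(value: float) -> bool:
    """Mark the association information when CPU usage exceeds the threshold."""
    if value > CPU_THRESHOLD:
        marked.update(PATH)
        return True
    return False


def trace() -> dict[str, str]:
    """Return the marked association information of all layers except the bottom."""
    return {k: v for k, v in marked.items() if k != "monitoring_content"}
```

Calling `check_cpu(95.0)` marks the path, after which `trace()` returns the monitoring item, server address, functional group, and product for the failure.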
It can be understood that there may be multiple virtual devices to be monitored in the cloud platform. Therefore, when a failure event occurs and the virtual device on which it occurred needs to be located quickly, the top-layer product identification information in the hierarchical structures of all virtual devices can be queried; by judging whether the top-layer product identification information in each hierarchical structure is marked as association information of the failure event, the virtual device with the failure event is determined, which improves the efficiency of locating failure events.
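Because every anomaly is propagated up to the top layer, locating the affected devices only requires checking one flag per hierarchy. A minimal sketch, with a hypothetical `DeviceHierarchy` record that keeps just the top-layer state:

```python
from dataclasses import dataclass


@dataclass
class DeviceHierarchy:
    """Top layer of one virtual device's hierarchy (lower layers omitted here)."""
    product_id: str
    top_marked: bool = False


def failed_devices(devices: list) -> list:
    """Return the product IDs whose top layer is marked as failure association info."""
    return [d.product_id for d in devices if d.top_marked]


devices = [
    DeviceHierarchy("product-A"),
    DeviceHierarchy("product-B", top_marked=True),
    DeviceHierarchy("product-C"),
]
```

Only the hierarchies returned by `failed_devices` then need to be searched layer by layer.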
An embodiment of the present invention may also include dynamic management of the cloud platform's system functions. Management covers adding or deleting at least one system function; the system functions include operation monitoring, device details, configuration management, historical alarms, predictive analysis, user management, and data export. In actual implementation, dynamic extension of system functions can be realized through a RESTful API, which meets the monitoring and management needs of the cloud platform and addresses the defect that the monitoring functions of existing cloud platforms are fixed.
Operation monitoring function: based on a tree list built from the hierarchical structure of the virtual devices, the status of each level can be viewed through a three-level directory of product, functional group, and server IP address. Color codes indicate the running status of a virtual device, for example: green indicates normal, yellow indicates that an alarm event has occurred, and red indicates that an abnormal event has occurred; the current monitored value is displayed at the same time.
Device detail viewing function: a device can be found through the directory tree and its detailed running status viewed. For example: which applications, middleware, and databases are on the device, how many processes are running, and the resource usage of those processes, including data such as I/O throughput, CPU usage, memory usage, and disk usage; graphical presentation of the data is supported.
Configuration management function: identifies all resources of the cloud platform, including information such as hardware resources, server groupings, process resources, port resources, IP resources, services, and software.
The configured policies include alarm policies, fault recovery policies, and so on. Here, an alarm policy includes information such as the alarm trigger condition, alarm object, alarm recipient, and alarm delivery method; an alarm policy can be associated with a product and a policy type. For example, the alarm trigger condition can be: when a monitored value in a product exceeds the alarm threshold, an alarm of the corresponding level should be generated. The alarm trigger condition may also be a simple conditional expression, such as A≥C, A≤C, A>C, or A<C, where A is the monitored value and C is the alarm threshold. The alarm threshold can be customized, and a threshold mainly applies to a single monitored metric.
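The simple conditional expressions above can be evaluated with a small operator table. The function name `triggers` and the ASCII spellings `>=` and `<=` for ≥ and ≤ are choices made for this sketch.

```python
import operator

# Comparison operators allowed in a trigger condition "A op C".
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def triggers(value: float, op: str, threshold: float) -> bool:
    """Evaluate the condition `A op C` for monitored value A and alarm threshold C."""
    return OPS[op](value, threshold)
```

For instance, `triggers(95.0, ">=", 90.0)` is true, while `triggers(90.0, ">", 90.0)` is false because the strict comparison excludes the threshold itself.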
Fault recovery policy: common faults can be handled automatically by the system according to the fault recovery policy. For example, when a disk is full, the system automatically deletes junk files and expands capacity; when CPU or memory usage is too high, it kills invalid processes.
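A fault recovery policy of this kind can be modeled as a table from fault type to recovery action. The fault names and the placeholder actions, which only return a description instead of touching the system, are assumptions for illustration.

```python
from typing import Callable, Optional


def clean_disk() -> str:
    """Placeholder action for a full disk (a real system would act here)."""
    return "delete junk files and expand capacity"


def kill_invalid() -> str:
    """Placeholder action for excessive CPU or memory usage."""
    return "kill invalid processes"


# Hypothetical fault names mapped to the recovery actions from the example.
RECOVERY_POLICY: dict = {
    "disk_full": clean_disk,
    "cpu_high": kill_invalid,
    "memory_high": kill_invalid,
}


def recover(fault: str) -> Optional[str]:
    """Run the configured action for a common fault; return None if none is configured."""
    action: Optional[Callable[[], str]] = RECOVERY_POLICY.get(fault)
    return action() if action else None
```

Faults without an entry fall through to `None`, which leaves room for manual handling of anything that is not a common fault.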
Historical alarm query function: historical alarm information can be queried by conditions such as alarm time, alarm policy type, and alarm level, and the handling status of alarms can be viewed.
Predictive analysis function: provides management of data processing models, prediction, and querying of prediction results. Model management mainly manages the data processing models that the data analysis engine trains with historical data; the prediction function mainly uses the models to perform predictive analysis on monitoring data for hosts, services, applications, and users. Online data can be monitored in real time, and historical data can be processed in offline batches, which improves the efficiency of predictive analysis; prediction result query displays the predictive analysis results related to hosts, services, applications, and users according to the query conditions.
User management function: supports two levels of users, super administrators and ordinary users. A super administrator can assign permissions to ordinary users and audit, view, and modify their information; an ordinary user can only view data and modify basic information.
Data export function: by configuring data export rules, monitoring data, processing results, alarm information, and so on are exported for convenient viewing. Export rules may include: export time, volume of data to export, export format, and storage location of exported data. For example, data is exported automatically in the background at fixed intervals, to formats such as Excel or PDF.
In the embodiment of the present invention, monitoring data is processed online using data stream processing, which improves data processing efficiency; for data query and analysis, both real-time processing of online data and offline batch processing of historical data are supported, and a data prediction mechanism is provided so that processing results can be read directly from the database and displayed, which optimizes query performance; classified storage of data in separate databases is supported, which improves data read performance; a hierarchical structure of monitoring items is established, which supports flexible expansion and management of cloud platform monitoring items; functions are presented and operated through a RESTful API, which facilitates dynamic extension of system functions.
In the embodiment of the present invention, a corresponding hierarchical structure is established for at least one virtual device to be monitored in the cloud platform; operation data of the virtual device is collected; when the collected operation data is determined to be abnormal operation data, it is determined that a failure event has occurred in the cloud platform, and the association information of the abnormal operation data is marked; after the failure event is determined, the failure event is traced to its source according to the association information. In this way, the failure event can be traced to its source.
Second embodiment
Fig. 3 is a schematic diagram of the first composition structure of the cloud platform monitoring apparatus in the embodiment of the present invention. As shown in Fig. 3, the apparatus includes: an establishing module 300, an acquisition module 301, a processing module 302, and a locating module 303, where:
The establishing module 300 is configured to establish a corresponding hierarchical structure for at least one virtual device to be monitored in the cloud platform; the hierarchical structure includes, from top to bottom: the identification information of the virtual device at the top layer, and at least one functional group of the virtual device at the second layer; each functional group represents one function of the virtual device when it runs;
The acquisition module 301 is configured to collect operation data of the virtual device;
The processing module 302 is configured to, when the collected operation data is determined to be abnormal operation data, determine that a failure event has occurred in the cloud platform and mark the association information of the abnormal operation data; the association information of the abnormal operation data includes: the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data, and the virtual device identification information in the top layer of the hierarchical structure that is associated with the abnormal operation data;
The locating module 303 is configured to, when it is determined that a failure event has occurred in the cloud platform, trace the failure event to its source according to the association information.
Preferably, the hierarchical structure may further include: the IP address of at least one server used by the at least one functional group of the virtual device, at the third layer; at least one monitoring item corresponding to the IP address of the at least one server, at the fourth layer; and the operation data of the virtual device corresponding to the at least one monitoring item, at the bottom layer.
Fig. 4 is a schematic diagram of a second hierarchical structure of a virtual device to be monitored in an embodiment of the present invention. As shown in Fig. 4, the hierarchical structure of the virtual device is established in the order product - functional group - IP address - monitoring item (including application program, middleware and database) - monitoring content (I/O port, CPU, hard disk, etc.). The servers of one product can be divided into multiple functional groups by function, and one functional group can be implemented by multiple servers. Middleware and databases can be deployed on a server, and the indicators of hosts, services and applications are monitored through processes. This hierarchical structure makes it easy to expand and manage monitoring items flexibly and facilitates the tracking of fault events.
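The five layers described for Fig. 4 can be pictured as a nested mapping. The following is a minimal sketch only; the product name, functional group names, IP addresses and metric names are hypothetical placeholders and do not come from the patent.

```python
# Hypothetical five-layer hierarchy:
# product -> functional group -> server IP -> monitoring item -> monitoring content.
hierarchy = {
    "product-A": {                                  # top layer: product identifier
        "web-frontend": {                           # second layer: functional group
            "10.0.0.11": {                          # third layer: server IP address
                "application": ["cpu", "io_port"],  # fourth layer -> bottom layer
                "middleware": ["cpu", "disk"],
            },
            "10.0.0.12": {
                "database": ["disk"],
            },
        },
    },
}


def iter_paths(tree, prefix=()):
    """Yield every top-to-bottom path as a tuple of five labels."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield from iter_paths(child, prefix + (key,))
    else:  # a list of monitoring-content names at the bottom layer
        for leaf in tree:
            yield prefix + (leaf,)


paths = list(iter_paths(hierarchy))
```

Because every bottom-layer entry sits on exactly one path, adding a functional group or a monitoring item only means adding a key at the matching depth.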
Preferably, the related information of the abnormal operation data further includes: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data.
The processing module 302 is specifically configured to, when the collected operation data is determined to be abnormal operation data: mark, according to the abnormal operation data in the bottom layer, the fourth-layer related information in the related information of the abnormal operation data; mark, according to the marked fourth-layer related information, the third-layer related information in the related information of the abnormal operation data; mark, according to the marked third-layer related information, the second-layer related information in the related information of the abnormal operation data; and mark, according to the marked second-layer related information, the top-layer related information in the related information of the abnormal operation data.
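The layer-by-layer marking described above can be sketched as a walk along parent links, starting from the abnormal sample in the bottom layer. This is an illustrative sketch under the assumption that each node stores a reference to its parent; the `Node` class, the layer names and the sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    layer: str                       # hypothetical layer name, e.g. "server_ip"
    label: str                       # value at that layer, e.g. "10.0.0.11"
    parent: Optional["Node"] = None  # link to the layer above; None at the top


def mark_related_info(bottom: Node) -> dict:
    """Record one label per layer, from the fourth layer up to the top layer."""
    related = {}
    node = bottom.parent
    while node is not None:
        related[node.layer] = node.label
        node = node.parent
    return related


product = Node("product", "product-A")
group = Node("functional_group", "web-frontend", product)
server = Node("server_ip", "10.0.0.11", group)
item = Node("monitoring_item", "middleware", server)
sample = Node("operation_data", "cpu=99%", item)  # abnormal bottom-layer sample

related = mark_related_info(sample)
```

Each step uses only the information marked one layer below it, which mirrors the fourth, third, second, top order in the text.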
The locating module 303 is specifically configured to, when it is determined that a fault event has occurred in the cloud platform, query, according to the related information marked in the hierarchical structure, the related information of at least one layer other than the bottom layer in the related information of the abnormal operation data, thereby tracing the fault event to its source.
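A query over marked records might look like the sketch below. It assumes that each abnormal sample has already been stored as a flat record keyed by layer name; the record layout, the field names and the per-layer counting helper are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical marked records, one per abnormal sample.
marked = [
    {"product": "product-A", "functional_group": "web-frontend",
     "server_ip": "10.0.0.11", "monitoring_item": "middleware",
     "operation_data": "cpu=99%"},
    {"product": "product-A", "functional_group": "web-frontend",
     "server_ip": "10.0.0.11", "monitoring_item": "database",
     "operation_data": "disk=97%"},
    {"product": "product-A", "functional_group": "billing",
     "server_ip": "10.0.0.20", "monitoring_item": "application",
     "operation_data": "io=stalled"},
]

# Every layer except the bottom one, ordered from nearest the data to the top.
UPPER_LAYERS = ("monitoring_item", "server_ip", "functional_group", "product")


def trace_to_source(record, layers=UPPER_LAYERS):
    """Return the marked labels of the requested layers for one fault record."""
    return [(layer, record[layer]) for layer in layers]


def faults_per(layer, records):
    """Count fault records per label of one layer, for example per server IP."""
    return Counter(r[layer] for r in records)


path = trace_to_source(marked[0])
by_server = faults_per("server_ip", marked)
```

Grouping by a single layer, as `faults_per` does, is one way to see whether several abnormal samples point at the same server or functional group.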
The establishing module 300 is further configured to add or delete at least one functional group in the second layer of the hierarchical structure after the corresponding hierarchical structure has been established for the at least one virtual device.
In practical applications, the establishing module 300, the acquisition module 301, the processing module 302 and the locating module 303 can each be implemented by a central processing unit (Central Processing Unit, CPU), a microprocessor (Micro Processor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field-programmable gate array (Field Programmable Gate Array, FPGA) or the like located in a terminal device.
Third embodiment
Fig. 5 is a second schematic structural diagram of the cloud platform monitoring apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes an acquisition module, a data analysis module, a data storage module, a management module and a Web end.
The acquisition module includes a data acquisition unit, a protocol adaptation unit and a message buffer unit, wherein:
The data acquisition unit is configured to collect the running logs of the virtual device, generate data points and upload them periodically. It should be noted that, when data points are uploaded, the data packet structure may follow REST API conventions, and the data point values are built as single-layer JavaScript Object Notation (JSON), which makes the data easy to use and exchange across platforms and languages.
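A single-layer JSON data point has no nested objects or arrays, so any consumer can read it as plain key-value pairs. The sketch below shows one possible shape; all field names and values are hypothetical, since the patent does not list the fields of a data point.

```python
import json
import time


def build_data_point(product, group, ip, item, metric, value, ts=None):
    """Return a flat (single-layer) JSON string; field names are hypothetical."""
    point = {
        "product": product,
        "functional_group": group,
        "server_ip": ip,
        "monitoring_item": item,
        "metric": metric,
        "value": value,
        "timestamp": int(time.time()) if ts is None else ts,
    }
    return json.dumps(point, sort_keys=True)


payload = build_data_point(
    "product-A", "web-frontend", "10.0.0.11",
    "middleware", "cpu_percent", 97.5, ts=1700000000,
)
decoded = json.loads(payload)
```

Carrying the hierarchy labels in each point is a design choice of this sketch; it lets a receiver mark related information without a separate lookup.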
The protocol adaptation unit is configured to adapt and parse the protocols of the monitoring data.
The message buffer unit is configured to buffer the parsed data in a message queue so that the data analysis module can read and process it.
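The hand-off between parsing and analysis can be sketched with an in-process queue. This is only a stand-in: the patent does not name a message queue product, and the use of Python's standard `queue.Queue` together with the helper names below is an assumption for illustration.

```python
import json
import queue

# In-memory stand-in for the message queue between acquisition and analysis.
message_queue = queue.Queue(maxsize=1000)


def parse_and_buffer(raw: bytes) -> None:
    """Decode one JSON payload (the parsing step) and buffer the result."""
    message_queue.put(json.loads(raw.decode("utf-8")))


def drain(max_items: int) -> list:
    """Read up to max_items parsed records without blocking."""
    items = []
    while len(items) < max_items:
        try:
            items.append(message_queue.get_nowait())
        except queue.Empty:
            break
    return items


parse_and_buffer(b'{"metric": "cpu_percent", "value": 97.5}')
parse_and_buffer(b'{"metric": "disk_percent", "value": 41.0}')
batch = drain(10)
```

The buffer decouples the upload rate from the analysis rate: the consumer reads whatever has accumulated, in arrival order.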
The data analysis module is configured to read the parsed monitoring data from the message queue and perform online processing or offline batch processing on it, and to judge whether the processed data is abnormal operation data. When the collected operation data is determined to be abnormal operation data, the module determines that a fault event has occurred in the cloud platform and marks all related information of the abnormal operation data in the hierarchical structure; when the collected operation data is determined to be normal operation data, monitoring continues.
Specifically, the main functions of the data analysis module include: first, real-time data stream processing, in which parsed data is read according to a configuration strategy and processing such as calculation and alerting is completed; second, model training using historical data, where the model can be customized according to the service and is used for calculation and predictive analysis; third, real-time monitoring of online data using the model, or offline batch processing of historical data, which improves the efficiency of predictive analysis.
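The difference between the online path and the offline batch path can be sketched as follows. The patent does not specify how abnormal data is recognized, so the fixed per-metric limits used here are an assumed stand-in for the configuration strategy or a trained model, and all names are hypothetical.

```python
from statistics import mean

# Hypothetical configuration strategy: upper limit per metric.
LIMITS = {"cpu_percent": 90.0, "disk_percent": 85.0}


def is_abnormal(point):
    """Online check: a point is abnormal when it exceeds its configured limit."""
    limit = LIMITS.get(point["metric"])
    return limit is not None and point["value"] > limit


def online_alerts(stream):
    """Real-time path: yield each abnormal point as soon as it is seen."""
    for point in stream:
        if is_abnormal(point):
            yield point


def offline_means(history):
    """Batch path: average value per metric over stored history."""
    grouped = {}
    for point in history:
        grouped.setdefault(point["metric"], []).append(point["value"])
    return {metric: mean(values) for metric, values in grouped.items()}


points = [
    {"metric": "cpu_percent", "value": 97.5},
    {"metric": "disk_percent", "value": 41.0},
    {"metric": "cpu_percent", "value": 12.0},
]
alerts = list(online_alerts(points))
means = offline_means(points)
```

The online path reacts per point, while the batch path needs the whole history before it can produce a result, which is why the two are kept as separate functions here.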
The data storage module is configured to classify data and store it in separate databases. Here, the storage rules may be: a Key-Value database stores metadata; a relational database stores data such as user information, processing result information, configuration information, historical alarms and historical statistics; a non-relational (Not Only Structured Query Language, NoSQL) database persistently stores historical data.
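The storage rules above amount to a routing table from data category to database type. The sketch below uses in-memory lists as stand-ins for the three kinds of database; the category keys and the `save` helper are hypothetical names chosen for illustration.

```python
# Routing table that follows the storage rules in the text.
STORE_FOR_KIND = {
    "metadata": "key_value",
    "user_info": "relational",
    "processing_result": "relational",
    "configuration": "relational",
    "history_alarm": "relational",
    "history_statistics": "relational",
    "historical_data": "nosql",
}

# In-memory stand-ins for the three database types.
stores = {"key_value": [], "relational": [], "nosql": []}


def save(kind, record):
    """Append the record to the backend chosen for its category."""
    backend = STORE_FOR_KIND[kind]
    stores[backend].append(record)
    return backend


used = [
    save("metadata", {"metric": "cpu_percent", "unit": "%"}),
    save("history_alarm", {"server_ip": "10.0.0.11", "level": "critical"}),
    save("historical_data", {"metric": "cpu_percent", "value": 97.5}),
]
```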
The Web end is configured to manage system functions and to access monitoring events.
Specifically, the Web end can dynamically extend system functions through RESTful APIs, which meets the monitoring and management requirements of the cloud platform and overcomes the defect that monitoring functions on existing cloud service platforms are fixed.
The Web end can also provide flexible access to monitoring data, processing results, alarm information and the like through RESTful APIs. When a fault event occurs on a monitored virtual device, all related information of the abnormal operation data is presented through the Web end, so that the fault event can be traced to its source and handled more efficiently.
When monitoring data is queried, the embodiment of the present invention distinguishes the following three cases:
First, for monitoring data or processing results that are requested frequently, the data storage module stores the monitoring data or processing results in advance, and the Web end reads them directly from the data storage module and displays them;
Second, for query requests that require real-time processing, the data analysis module processes the monitoring data online according to a preset configuration strategy and sends the processed monitoring data to the Web end, which displays it; at the same time, the processed monitoring data is stored in the data storage module;
Third, when a query request is issued for all monitoring data, the data analysis module performs offline batch processing on the monitoring data and stores the processed monitoring data in the data storage module, and the Web end reads the monitoring data from the data storage module and displays it.
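The three query cases can be summarized as a small dispatch function. This is a sketch under stated assumptions: the case names, the dictionary used as the data storage module and the `process` callback standing in for the data analysis module are all hypothetical.

```python
def answer_query(kind, key, store, raw=None, process=None):
    """Serve one query; kind is 'frequent', 'realtime' or 'full' (hypothetical names)."""
    if kind == "frequent":
        # Case 1: the result was stored in advance and is read directly.
        return store[key]
    if kind == "realtime":
        # Case 2: process online, return the result at once, and store it as well.
        result = process(raw)
        store[key] = result
        return result
    if kind == "full":
        # Case 3: batch-process offline, store first, then read back from storage.
        store[key] = process(raw)
        return store[key]
    raise ValueError(f"unknown query kind: {kind}")


def double(rows):
    """Placeholder for the processing done by the data analysis module."""
    return [2 * x for x in rows]


store = {"cpu_daily": [1, 2, 3]}
r1 = answer_query("frequent", "cpu_daily", store)
r2 = answer_query("realtime", "cpu_now", store, raw=[4, 5], process=double)
r3 = answer_query("full", "cpu_all", store, raw=[1, 2], process=double)
```

In all three cases the processed result ends up in storage, so a repeated request can be served through the first path.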
The management module includes a heartbeat management unit, a configuration management unit, a service management unit and an upgrade unit, wherein:
The heartbeat management unit is configured to monitor host status.
The configuration management unit is configured to identify all resources of the cloud platform. The configuration content includes hardware asset information, server grouping information, monitoring strategy information, alarm strategies, failover strategies and the like, and create, delete, update and query operations on resources and strategies are supported.
The service management unit is configured to manage user permissions and basic information.
The upgrade unit is configured to provide upgrade services for the cloud platform.
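As an illustration of the heartbeat management unit listed above, host status can be derived from the time of the last heartbeat received from each host. The timeout rule, the 30-second default and the function name below are assumptions; the patent only states that the unit monitors host status.

```python
def host_states(last_seen, now, timeout=30.0):
    """Mark a host 'up' if its last heartbeat is within timeout seconds, else 'down'."""
    return {
        host: "up" if now - seen <= timeout else "down"
        for host, seen in last_seen.items()
    }


# Hypothetical heartbeat timestamps in seconds.
states = host_states({"10.0.0.11": 100.0, "10.0.0.12": 60.0}, now=105.0)
```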
In the embodiment of the present invention, a corresponding hierarchical structure is established using the product identifier, functional groups, IP addresses and the like of the virtual device, which facilitates tracing fault events to their source.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, optical storage and the like) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention.
Claims (10)
1. A cloud platform monitoring method, characterized in that the method comprises:
establishing a corresponding hierarchical structure for at least one virtual device to be monitored in a cloud platform, wherein the hierarchical structure comprises, from top to bottom, identification information of the virtual device at the top layer and at least one functional group of the virtual device at the second layer, each functional group indicating one function of the virtual device when it runs;
collecting operation data of the virtual device;
when the collected operation data is determined to be abnormal operation data, determining that a fault event has occurred in the cloud platform, and marking related information of the abnormal operation data, wherein the related information of the abnormal operation data comprises the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data and the identification information of the virtual device in the top layer of the hierarchical structure that is associated with the abnormal operation data; and
after it is determined that a fault event has occurred in the cloud platform, tracing the fault event to its source according to the related information.
2. The method according to claim 1, characterized in that the hierarchical structure further comprises: the IP address of at least one server used by the at least one functional group of the virtual device, at the third layer; at least one monitoring item corresponding to the IP address of the at least one server, at the fourth layer; and operation data of the virtual device corresponding to the at least one monitoring item, at the bottom layer.
3. The method according to claim 2, characterized in that the related information of the abnormal operation data further comprises: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
marking the related information of the abnormal operation data comprises: when the collected operation data is determined to be abnormal operation data, marking, according to the abnormal operation data in the bottom layer, the fourth-layer related information in the related information of the abnormal operation data; marking, according to the marked fourth-layer related information, the third-layer related information in the related information of the abnormal operation data; marking, according to the marked third-layer related information, the second-layer related information in the related information of the abnormal operation data; and marking, according to the marked second-layer related information, the top-layer related information in the related information of the abnormal operation data.
4. The method according to claim 1, characterized in that tracing the fault event to its source according to the related information after it is determined that a fault event has occurred in the cloud platform comprises: after it is determined that a fault event has occurred in the cloud platform, querying, according to the related information marked in the hierarchical structure, the related information of at least one layer other than the bottom layer in the related information of the abnormal operation data, thereby tracing the fault event to its source.
5. The method according to claim 1, characterized in that the method further comprises: after the corresponding hierarchical structure has been established for the at least one virtual device, adding or deleting at least one functional group in the second layer of the hierarchical structure.
6. A cloud platform monitoring apparatus, characterized in that the apparatus comprises an establishing module, an acquisition module, a processing module and a locating module, wherein:
the establishing module is configured to establish a corresponding hierarchical structure for at least one virtual device to be monitored in a cloud platform, wherein the hierarchical structure comprises, from top to bottom, identification information of the virtual device at the top layer and at least one functional group of the virtual device at the second layer, each functional group indicating one function of the virtual device when it runs;
the acquisition module is configured to collect operation data of the virtual device;
the processing module is configured to, when the collected operation data is determined to be abnormal operation data, determine that a fault event has occurred in the cloud platform and mark related information of the abnormal operation data, wherein the related information of the abnormal operation data comprises the functional group of the virtual device in the second layer of the hierarchical structure that is associated with the abnormal operation data and the identification information of the virtual device in the top layer of the hierarchical structure that is associated with the abnormal operation data; and
the locating module is configured to, when it is determined that a fault event has occurred in the cloud platform, trace the fault event to its source according to the related information.
7. The apparatus according to claim 6, characterized in that the hierarchical structure further comprises: the IP address of at least one server used by the at least one functional group of the virtual device, at the third layer; at least one monitoring item corresponding to the IP address of the at least one server, at the fourth layer; and operation data of the virtual device corresponding to the at least one monitoring item, at the bottom layer.
8. The apparatus according to claim 7, characterized in that the related information of the abnormal operation data further comprises: the monitoring item in the fourth layer of the hierarchical structure that is associated with the abnormal operation data, and the server IP address in the third layer of the hierarchical structure that is associated with the abnormal operation data;
the processing module is specifically configured to, when the collected operation data is determined to be abnormal operation data: mark, according to the abnormal operation data in the bottom layer, the fourth-layer related information in the related information of the abnormal operation data; mark, according to the marked fourth-layer related information, the third-layer related information in the related information of the abnormal operation data; mark, according to the marked third-layer related information, the second-layer related information in the related information of the abnormal operation data; and mark, according to the marked second-layer related information, the top-layer related information in the related information of the abnormal operation data.
9. The apparatus according to claim 6, characterized in that the locating module is specifically configured to, when it is determined that a fault event has occurred in the cloud platform, query, according to the related information marked in the hierarchical structure, the related information of at least one layer other than the bottom layer in the related information of the abnormal operation data, thereby tracing the fault event to its source.
10. The apparatus according to claim 6, characterized in that the establishing module is further configured to add or delete at least one functional group in the second layer of the hierarchical structure after the corresponding hierarchical structure has been established for the at least one virtual device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710043469.7A CN108337100B (en) | 2017-01-19 | 2017-01-19 | Cloud platform monitoring method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108337100A (en) | 2018-07-27 |
CN108337100B (en) | 2021-07-09 |
Family
ID=62922221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710043469.7A Active CN108337100B (en) | 2017-01-19 | 2017-01-19 | Cloud platform monitoring method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108337100B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130227338A1 (en) * | 2012-02-28 | 2013-08-29 | International Business Machines Corporation | Reconfiguring interrelationships between components of virtual computing networks |
CN104142848A (en) * | 2013-05-08 | 2014-11-12 | 西安邮电大学 | Virtual machine identifier and use method thereof |
CN104486406A (en) * | 2014-12-15 | 2015-04-01 | 浪潮电子信息产业股份有限公司 | Layered resource monitoring method based on cloud data center |
CN106130809A (en) * | 2016-09-07 | 2016-11-16 | 东南大学 | A kind of IaaS cloud platform network failure locating method based on log analysis and system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109634813A (en) * | 2018-12-11 | 2019-04-16 | 平安科技(深圳)有限公司 | Electronic device, cloud platform exception confirmation method and storage medium |
CN110855473A (en) * | 2019-10-16 | 2020-02-28 | 平安科技(深圳)有限公司 | Monitoring method, device, server and storage medium |
WO2021073433A1 (en) * | 2019-10-16 | 2021-04-22 | 平安科技(深圳)有限公司 | Monitoring method and device, server, and storage medium |
CN113799850A (en) * | 2021-08-25 | 2021-12-17 | 通号城市轨道交通技术有限公司 | Running state monitoring method and device, electronic equipment and storage medium |
CN117724880A (en) * | 2023-06-13 | 2024-03-19 | 荣耀终端有限公司 | Fault information processing method, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108337100B (en) | 2021-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111984499B (en) | Fault detection method and device for big data cluster | |
CN109408347B (en) | A kind of index real-time analyzer and index real-time computing technique | |
CN112653586B (en) | Time-space big data platform application performance management method based on full link monitoring | |
US20170109676A1 (en) | Generation of Candidate Sequences Using Links Between Nonconsecutively Performed Steps of a Business Process | |
US20200372007A1 (en) | Trace and span sampling and analysis for instrumented software | |
US20170109668A1 (en) | Model for Linking Between Nonconsecutively Performed Steps in a Business Process | |
CN106487574A (en) | Automatic operating safeguards monitoring system | |
CN108763957A (en) | A kind of safety auditing system of database, method and server | |
US20100070981A1 (en) | System and Method for Performing Complex Event Processing | |
US20170109667A1 (en) | Automaton-Based Identification of Executions of a Business Process | |
CN111339175B (en) | Data processing method, device, electronic equipment and readable storage medium | |
CN106940677A (en) | One kind application daily record data alarm method and device | |
CN108337100A (en) | A kind of method and apparatus of cloud platform monitoring | |
US20170109636A1 (en) | Crowd-Based Model for Identifying Executions of a Business Process | |
US20170109638A1 (en) | Ensemble-Based Identification of Executions of a Business Process | |
CN112052134A (en) | Service data monitoring method and device | |
US9922116B2 (en) | Managing big data for services | |
CN112039726A (en) | Data monitoring method and system for content delivery network CDN device | |
CN108182263A (en) | A kind of date storage method of data center's total management system | |
CN111177139A (en) | Data quality verification monitoring and early warning method and system based on data quality system | |
CN112181704A (en) | Big data task processing method and device, electronic equipment and storage medium | |
CN109032904A (en) | Monitored, management server and data acquisition, analysis method and management system | |
CN115333966A (en) | Nginx log analysis method, system and equipment based on topology | |
US20170109640A1 (en) | Generation of Candidate Sequences Using Crowd-Based Seeds of Commonly-Performed Steps of a Business Process | |
CN106528448A (en) | Distributed caching mechanism for multi-source heterogeneous electronic commerce big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||