CN109728938A - A method for assessing system service level - Google Patents
A method for assessing system service level
- Publication number: CN109728938A
- Application number: CN201811511461.XA
- Authority: CN (China)
- Prior art keywords: time, host, service, task, interface
- Legal status: Withdrawn (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to the field of cloud platform monitoring and management, and in particular to a method for assessing system service level. The method comprises the following steps: S10: obtain the running status and resource usage of the host; S20: obtain the running status and resource usage of the virtual machines; S30: obtain the response speed of the cloud platform interfaces; S40: obtain the running status of the cloud platform's long tasks; S50: calculate the system service capability according to weights. The invention gives a comprehensive view of the system's running status and resource usage, notifies the administrator promptly when an anomaly occurs, and can to some extent prevent a fault from spreading.
Description
Technical field
The present invention relates to the field of cloud platform monitoring and management, and in particular to a method for assessing system service level.
Background
With the rapid development of cloud computing technology, cloud platforms are flourishing. The administrators and operations staff of a cloud platform care about more than the usage of individual resources; they would rather understand the health of the entire platform as a whole. The monitoring modules that traditional cloud platforms provide, however, cover only the usage of the resources under the platform. A more comprehensive method for assessing system service level is therefore needed.
Summary of the invention
The technical problem solved by the present invention is to provide a method for assessing system service level that can assess the health of a cloud platform more comprehensively and act on the result.
The technical solution by which the present invention solves the above problem is as follows.
The method comprises the following steps:
S10: obtain the running status and resource usage of the host;
S20: obtain the running status and resource usage of the virtual machines;
S30: obtain the response speed of the cloud platform interfaces;
S40: obtain the running status of the cloud platform's long tasks;
S50: calculate the system service capability according to weights.
The host running status includes the running status of each service and the connectivity with the external network; the resource usage includes host load, CPU usage, memory usage and storage usage.
The running status of each service is obtained as follows: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll every service in the list and check its running state; 4) for each state, count the services in that state and list them; 5) when a key service is found not to be in its expected state, promptly send an alarm message to the administrator.
Connectivity with the external network is checked by pinging and accessing public websites: if a normal response is returned, the host is considered connected to the external network; if the network packets cannot be sent, the external network is considered unreachable.
The resource usage of the host is obtained as follows: 1) set alarm thresholds for host load, CPU usage, memory usage and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage and storage usage respectively: for local data, call local commands directly; for a remote host, obtain the data through the SNMP service; 3) determine whether the host load, CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
The virtual machine running status includes the system running status, the connectivity with the external network and the connectivity with the host; the resource usage includes CPU usage, memory usage and storage usage.
The system running status is judged from the heartbeat signal sent from inside the virtual machine, which shows whether the system has hung or is otherwise abnormal; an agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
Connectivity with the host is determined by pinging the host's IP address or host name and checking whether the heartbeat signal can be sent successfully and a reply from the host is received.
The resource usage of the virtual machine is obtained as follows: 1) set alarm thresholds for CPU usage, memory usage and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage and storage usage respectively; 3) determine whether the CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
The interface is a synchronous interface called internally by the cloud platform. Step S30 is specifically: 1) set the alarm threshold for interface response; 2) place an interceptor at the key point of communication and record the name of the called interface, the call start time and the execution end time; 3) subtract the call start time from the execution end time to obtain the response time of the interface; 4) store the interface name, call start time, execution end time and response time in a database for subsequent processing; 5) determine whether the response time of the interface exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
The long task is an asynchronous interface called internally by the cloud platform. It is a long-running operation that appears in functions, such as creating a virtual machine or uploading an image, that take a long time to complete.
Step S40 is specifically: 1) set the alarm threshold for long-task running status; 2) before a long task starts, generate a task ID and record the task ID, task name, start time, current state and operator information in a database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) when the long operation completes, send a message to the management end, which updates the task record according to the task ID, operation result, current time and task duration in the message; 5) determine whether the task's running status exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
Step S50 is specifically: 1) set the score alarm threshold; 2) assign weights to the collected data by module category and alarm level; 3) for each module, calculate a module score from the weights; 4) calculate the overall score from the module weights; 5) determine whether the score is below the threshold; if so, trigger an alarm and promptly notify the administrator.
The method supports plug-in extension, so assessment factors can be added on demand. A plug-in is defined as follows: 1) add the operation entry and the path of its implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the assessment module, sub-steps, weights and specific logic to the implementation class; 4) add the new module to the overall assessment process.
With the method for assessing system service level of the present invention, the running status and resource usage of the system can be grasped comprehensively, the administrator is notified promptly when an anomaly occurs, and the spread of a fault can to some extent be avoided.
Brief description of the drawings
The present invention is further described below with reference to the drawings:
Fig. 1 is a flowchart of the method of the present invention.
Detailed description
As shown in Fig. 1, which is a flowchart of the method of the present invention, the method comprises:
S10: obtain the running status and resource usage of the host.
The host running status includes the running status of each service, the connectivity with the external network and so on; the resource usage includes host load, CPU usage, memory usage, storage usage and so on.
The running status of each service is obtained as follows: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll every service in the list and check its running state; 4) for each state, count the services in that state and list them; 5) when a key service is found not to be in its expected state, promptly send an alarm message to the administrator.
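The five steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `ServiceSpec`, `poll_services` and the `get_state` callback are hypothetical names, and how a state is actually read (for example by wrapping `systemctl is-active`) is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    expected_state: str  # state the service should be in, e.g. "active"
    critical: bool       # whether this is a key service


def poll_services(specs, get_state):
    """Poll every service, group names by observed state, and collect
    alarm messages for key services that are not in their expected state.

    get_state is a caller-supplied function mapping a service name to its
    current state string (an assumption of this sketch).
    """
    by_state = defaultdict(list)
    alerts = []
    for spec in specs:
        state = get_state(spec.name)
        by_state[state].append(spec.name)
        if spec.critical and state != spec.expected_state:
            alerts.append(
                f"key service {spec.name} is {state}, expected {spec.expected_state}"
            )
    counts = {state: len(names) for state, names in by_state.items()}
    return counts, dict(by_state), alerts
```

The returned `counts` and per-state lists correspond to step 4, and `alerts` holds the messages that step 5 would send to the administrator.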
Connectivity with the external network is checked by pinging and accessing some public websites: if a normal response is returned, the host is considered connected to the external network; if the network packets cannot be sent, the external network is considered unreachable.
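A connectivity check of this kind might look like the sketch below. It assumes a Linux `ping` (the `-c` and `-W` flags) and uses a plain TCP connection to port 443 as a stand-in for "accessing a public website"; the function names and the choice of targets are illustrative only.

```python
import socket
import subprocess


def ping_ok(host, timeout_s=2):
    """Return True if one ICMP echo request to host is answered (Linux ping flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def tcp_ok(host, port=443, timeout_s=2):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def external_network_reachable(targets, probes=(ping_ok, tcp_ok)):
    """Treat the external network as reachable if any probe succeeds for any target."""
    return any(probe(target) for target in targets for probe in probes)
```

Checking several targets with more than one probe reduces false alarms when a single public site is down or filters ICMP.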
The resource usage is obtained as follows: 1) set alarm thresholds for host load, CPU usage, memory usage and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage and storage usage respectively: for local data, call local commands directly; for a remote host, obtain the data through the SNMP service; 3) determine whether the host load, CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
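As a rough sketch of steps 1) to 3) for the local case, the code below collects two metrics with the Python standard library instead of shell commands and compares them with thresholds. The metric names, the per-CPU load normalization and the threshold values are assumptions of this example; remote collection over SNMP is not shown.

```python
import os
import shutil


def local_host_metrics(path="/"):
    """Collect load and storage usage for the local host (Unix only)."""
    load1, _, _ = os.getloadavg()       # 1-minute load average
    disk = shutil.disk_usage(path)      # total, used, free in bytes
    return {
        "load_per_cpu": load1 / (os.cpu_count() or 1),
        "storage_pct": 100.0 * disk.used / disk.total,
    }


def exceeded(metrics, thresholds):
    """Return the metrics whose value is above the configured alarm threshold."""
    return {
        name: value
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    }
```

A caller could run `exceeded(local_host_metrics(), {"load_per_cpu": 1.0, "storage_pct": 85.0})` and notify the administrator whenever the result is not empty.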
S20: obtain the running status and resource usage of the virtual machines.
The virtual machine running status includes the system running status, the connectivity with the external network, the connectivity with the host and so on; the resource usage includes CPU usage, memory usage, storage usage and so on.
The system running status is judged from the heartbeat signal sent from inside the virtual machine, which shows whether the system has hung or is otherwise abnormal; an agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
Connectivity with the external network is checked by pinging and accessing some public websites: if a normal response is returned, the virtual machine is considered connected to the external network; if the network packets cannot be sent, the external network is considered unreachable.
Connectivity with the host is determined by pinging the host's IP address or host name and checking whether the heartbeat signal can be sent successfully and a reply from the host is received.
The resource usage is obtained as follows: 1) set alarm thresholds for CPU usage, memory usage and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage and storage usage respectively; 3) determine whether the CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
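libvirt reports a domain's CPU consumption as cumulative CPU time in nanoseconds (for example the `cpuTime` field of `virDomainInfo`) rather than as a percentage, so a usage rate is typically derived from two samples. The helper below is a sketch of that arithmetic only; its name and parameters are illustrative, and taking the samples from libvirt is left to the caller.

```python
def cpu_usage_percent(cpu_time_ns_before, cpu_time_ns_after, interval_s, vcpus):
    """Derive a CPU usage percentage from two cumulative CPU-time samples.

    cpu_time_ns_*: cumulative CPU time of the domain in nanoseconds
    interval_s:    wall-clock seconds between the two samples
    vcpus:         number of virtual CPUs assigned to the domain
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    used_ns = cpu_time_ns_after - cpu_time_ns_before
    capacity_ns = interval_s * 1e9 * vcpus   # CPU time available in the interval
    # Clamp to [0, 100] to absorb sampling jitter.
    return max(0.0, min(100.0, 100.0 * used_ns / capacity_ns))
```

For instance, one second of CPU time consumed over a one-second interval on a two-vCPU domain corresponds to 50 percent usage.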
S30: obtain the response speed of the cloud platform interfaces.
The interface is a synchronous interface called internally by the cloud platform. Because the caller must wait for the response after issuing a request, the duration of the interface call can be measured.
The steps are as follows: 1) set the alarm threshold for interface response; 2) place an interceptor at the key point of communication and record the name of the called interface, the call start time and the execution end time; 3) subtract the call start time from the execution end time to obtain the response time of the interface; 4) store the interface name, call start time, execution end time and response time in a database for subsequent processing; 5) determine whether the response time of the interface exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
Under normal conditions interface response times are on the order of milliseconds. An overly long response time indicates that some service or the network is abnormal; promptly notifying the administrator to handle it can greatly reduce the scope of impact and protect the user experience.
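One way to realize such an interceptor in Python is a decorator around the interface function, as sketched below. The in-memory `records` list stands in for the database, and `timed_interface` and the `alert` callback are hypothetical names chosen for this example.

```python
import functools
import time

records = []  # stand-in for the database table of call records


def timed_interface(threshold_s, alert=print):
    """Wrap a synchronous interface: record its name, start time, end time
    and response time, and raise an alarm when the threshold is exceeded."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start_wall = time.time()          # timestamp to store
            start = time.perf_counter()       # monotonic timer for the duration
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                records.append({
                    "interface": func.__name__,
                    "start": start_wall,
                    "end": start_wall + elapsed,
                    "response_s": elapsed,
                })
                if elapsed > threshold_s:
                    alert(f"{func.__name__} took {elapsed:.3f}s "
                          f"(threshold {threshold_s}s)")
        return wrapper
    return decorator
```

Recording in a `finally` block means failed calls are timed as well, which matters because slow failures are often the first sign of a service or network problem.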
S40: obtain the running status of the cloud platform's long tasks.
The long task is an asynchronous interface called internally by the cloud platform. It is a long-running operation that generally appears in functions, such as creating a virtual machine or uploading an image, that take a long time to complete.
The steps are as follows: 1) set the alarm threshold for long-task running status; 2) before a long task starts, generate a task ID and record the task ID, task name, start time, current state, operator and other information in a database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) when the long operation completes, send a message to the management end, which updates the task record according to the task ID, operation result, current time and task duration in the message; 5) determine whether the task's running status exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
Because long tasks differ in content, the time they take to execute also differs. Thresholds should therefore be set per category of task, and a threshold can also be set for one particular task.
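The task bookkeeping and per-category thresholds can be sketched as follows. The in-memory `tasks` dict stands in for the database, the threshold values and task names are invented for illustration, and sending the task ID in a request header is left to the caller.

```python
import time
import uuid

tasks = {}  # stand-in for the task table in the database

# Hypothetical per-category thresholds in seconds, with a fallback default.
THRESHOLDS_S = {"create_vm": 300.0, "upload_image": 1800.0}
DEFAULT_THRESHOLD_S = 600.0


def start_task(name, operator, now=None):
    """Generate a task ID and record the task before the long operation starts."""
    task_id = uuid.uuid4().hex
    tasks[task_id] = {
        "name": name,
        "operator": operator,
        "started": time.time() if now is None else now,
        "state": "running",
        "result": None,
        "elapsed_s": None,
    }
    return task_id


def finish_task(task_id, result, now=None):
    """Update the task record from the completion message.

    Returns True when the task took longer than its threshold (alarm case).
    """
    task = tasks[task_id]
    finished = time.time() if now is None else now
    task.update(
        state="finished",
        result=result,
        finished=finished,
        elapsed_s=finished - task["started"],
    )
    limit = THRESHOLDS_S.get(task["name"], DEFAULT_THRESHOLD_S)
    return task["elapsed_s"] > limit
```

Looking the threshold up by task name mirrors the per-category setting described above; a per-task override could be added by storing an optional limit on the task record itself.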
S50: calculate the system service capability according to weights.
In this step an overall score is calculated from the collected information using the configured weights, showing the overall condition of the system.
The steps are as follows: 1) set the score alarm threshold; 2) assign weights to the collected data by module category and alarm level; 3) for each module, calculate a module score from the weights; 4) calculate the overall score from the module weights; 5) determine whether the score is below the threshold; if so, trigger an alarm and promptly notify the administrator.
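The patent does not give the scoring formula, so the sketch below assumes one plausible reading: each check carries a weight determined by its alarm level, a module's score is the weighted share of passing checks on a 0 to 100 scale, and the overall score is the weighted average of the module scores. All names and weight values are hypothetical.

```python
# Hypothetical weights per alarm level; a failed critical check costs the most.
LEVEL_WEIGHTS = {"info": 1, "warning": 3, "critical": 6}


def module_score(checks):
    """Score one module from a list of (alarm_level, passed) pairs."""
    total = sum(LEVEL_WEIGHTS[level] for level, _ in checks)
    if total == 0:
        return 100.0  # nothing to check counts as healthy
    passed = sum(LEVEL_WEIGHTS[level] for level, ok in checks if ok)
    return 100.0 * passed / total


def overall_score(module_scores, module_weights):
    """Weighted average of module scores using per-module weights."""
    total = sum(module_weights[m] for m in module_scores)
    return sum(module_scores[m] * module_weights[m] for m in module_scores) / total


def below_threshold(score, threshold):
    """Step 5: the alarm fires when the score drops below the threshold."""
    return score < threshold
```

With these example weights, a host module with one failed warning-level check out of a critical, a warning and an info check scores 70, and averaging it equally with a fully healthy module gives 85.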
Because the factors that can be used to assess system service level go well beyond those described above, the method supports plug-in extension so that assessment factors can be added on demand.
A plug-in is defined as follows: 1) add the operation entry and the path of its implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the assessment module, sub-steps, weights and specific logic to the implementation class; 4) add the new module to the overall assessment process.
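The patent names entry_point.txt but does not describe its format. The sketch below assumes one entry per line in the form `name = package.module:ClassName` and shows how such entries could be parsed and the implementation classes loaded with `importlib`; both function names are hypothetical.

```python
import importlib


def parse_entry_points(text):
    """Parse lines of the assumed form 'name = package.module:ClassName'.

    Blank lines and lines starting with '#' are ignored.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, target = line.partition("=")
        entries[name.strip()] = target.strip()
    return entries


def load_class(target):
    """Import 'package.module:ClassName' and return the class object."""
    module_name, _, class_name = target.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

The overall assessment process could then iterate over the parsed entries, instantiate each loaded class and include its module score and weight in the total.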
Claims (11)
1. A method for assessing system service level, characterized in that the method comprises the following steps:
S10: obtaining the running status and resource usage of the host;
S20: obtaining the running status and resource usage of the virtual machines;
S30: obtaining the response speed of the cloud platform interfaces;
S40: obtaining the running status of the cloud platform's long tasks;
S50: calculating the system service capability according to weights.
2. The method according to claim 1, characterized in that:
the host running status includes the running status of each service and the connectivity with the external network, and the resource usage includes host load, CPU usage, memory usage and storage usage;
the running status of each service is obtained as follows: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll every service in the list and check its running state; 4) for each state, count the services in that state and list them; 5) when a key service is found not to be in its expected state, promptly send an alarm message to the administrator.
3. The method according to claim 2, characterized in that:
connectivity with the external network is checked by pinging and accessing public websites: if a normal response is returned, the host is considered connected to the external network; if the network packets cannot be sent, the external network is considered unreachable.
4. The method according to claim 2, characterized in that:
the resource usage of the host is obtained as follows: 1) set alarm thresholds for host load, CPU usage, memory usage and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage and storage usage respectively: for local data, call local commands directly; for a remote host, obtain the data through the SNMP service; 3) determine whether the host load, CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
5. The method according to claim 1, characterized in that:
the virtual machine running status includes the system running status, the connectivity with the external network and the connectivity with the host, and the resource usage includes CPU usage, memory usage and storage usage;
the system running status is judged from the heartbeat signal sent from inside the virtual machine, which shows whether the system has hung or is otherwise abnormal; an agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
6. The method according to claim 5, characterized in that:
connectivity with the host is determined by pinging the host's IP address or host name and checking whether the heartbeat signal can be sent successfully and a reply from the host is received.
7. The method according to claim 5, characterized in that:
the resource usage of the virtual machine is obtained as follows: 1) set alarm thresholds for CPU usage, memory usage and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage and storage usage respectively; 3) determine whether the CPU usage, memory usage or storage usage exceeds its threshold; if so, trigger an alarm and promptly notify the administrator.
8. The method according to claim 1, characterized in that:
the interface is a synchronous interface called internally by the cloud platform; step S30 is specifically: 1) set the alarm threshold for interface response; 2) place an interceptor at the key point of communication and record the name of the called interface, the call start time and the execution end time; 3) subtract the call start time from the execution end time to obtain the response time of the interface; 4) store the interface name, call start time, execution end time and response time in a database for subsequent processing; 5) determine whether the response time of the interface exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
9. The method according to claim 1, characterized in that:
the long task is an asynchronous interface called internally by the cloud platform; it is a long-running operation that appears in functions, such as creating a virtual machine or uploading an image, that take a long time to complete;
step S40 is specifically: 1) set the alarm threshold for long-task running status; 2) before a long task starts, generate a task ID and record the task ID, task name, start time, current state and operator information in a database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) when the long operation completes, send a message to the management end, which updates the task record according to the task ID, operation result, current time and task duration in the message; 5) determine whether the task's running status exceeds the threshold; if so, trigger an alarm and promptly notify the administrator.
10. The method according to claim 1, characterized in that:
step S50 is specifically: 1) set the score alarm threshold; 2) assign weights to the collected data by module category and alarm level; 3) for each module, calculate a module score from the weights; 4) calculate the overall score from the module weights; 5) determine whether the score is below the threshold; if so, trigger an alarm and promptly notify the administrator.
11. The method according to any one of claims 1 to 10, characterized in that:
the method supports plug-in extension, so assessment factors can be added on demand; a plug-in is defined as follows: 1) add the operation entry and the path of its implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the assessment module, sub-steps, weights and specific logic to the implementation class; 4) add the new module to the overall assessment process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811511461.XA CN109728938A (en) | 2018-12-11 | 2018-12-11 | A kind of method of assessment system service level |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811511461.XA CN109728938A (en) | 2018-12-11 | 2018-12-11 | A kind of method of assessment system service level |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109728938A (en) | 2019-05-07 |
Family
ID=66294859
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811511461.XA Withdrawn CN109728938A (en) | 2018-12-11 | 2018-12-11 | A kind of method of assessment system service level |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109728938A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115442262A (en) * | 2022-08-01 | 2022-12-06 | 阿里巴巴(中国)有限公司 | Resource evaluation method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102123061A (en) * | 2011-03-28 | 2011-07-13 | 杭州电子科技大学 | Method for determining performance of Web server |
CN104184604A (en) * | 2013-05-24 | 2014-12-03 | 北京天地超云科技有限公司 | Cloud platform basic framework supervision system |
CN104333488A (en) * | 2014-11-04 | 2015-02-04 | 哈尔滨工业大学 | Cloud service platform performance test method |
US20150277858A1 (en) * | 2012-10-02 | 2015-10-01 | Nec Corporation | Performance evaluation device, method, and medium for information system |
CN105376100A (en) * | 2015-12-09 | 2016-03-02 | 国云科技股份有限公司 | Distributed alarm rule assessment method suitable for cloud platform resource monitoring |
CN107786616A (en) * | 2016-08-30 | 2018-03-09 | 江苏蓝创聚联数据与应用研究院有限公司 | Main frame intelligent monitor system based on high in the clouds |
CN108512719A (en) * | 2018-03-02 | 2018-09-07 | 南京易捷思达软件科技有限公司 | A kind of Integrative resource monitoring system based on cloud platform of increasing income |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102123061A (en) * | 2011-03-28 | 2011-07-13 | 杭州电子科技大学 | Method for determining performance of Web server |
US20150277858A1 (en) * | 2012-10-02 | 2015-10-01 | Nec Corporation | Performance evaluation device, method, and medium for information system |
CN104184604A (en) * | 2013-05-24 | 2014-12-03 | 北京天地超云科技有限公司 | Cloud platform basic framework supervision system |
CN104333488A (en) * | 2014-11-04 | 2015-02-04 | 哈尔滨工业大学 | Cloud service platform performance test method |
CN105376100A (en) * | 2015-12-09 | 2016-03-02 | 国云科技股份有限公司 | Distributed alarm rule assessment method suitable for cloud platform resource monitoring |
CN107786616A (en) * | 2016-08-30 | 2018-03-09 | 江苏蓝创聚联数据与应用研究院有限公司 | Main frame intelligent monitor system based on high in the clouds |
CN108512719A (en) * | 2018-03-02 | 2018-09-07 | 南京易捷思达软件科技有限公司 | A kind of Integrative resource monitoring system based on cloud platform of increasing income |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115442262A (en) * | 2022-08-01 | 2022-12-06 | 阿里巴巴(中国)有限公司 | Resource evaluation method and device, electronic equipment and storage medium |
CN115442262B (en) * | 2022-08-01 | 2024-02-06 | 阿里巴巴(中国)有限公司 | Resource evaluation method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104657250B (en) | A kind of monitoring system and its monitoring method that performance monitoring is carried out to cloud host | |
CN109660380A (en) | Monitoring method, platform, system and the readable storage medium storing program for executing of operation condition of server | |
CN107508722B (en) | Service monitoring method and device | |
CN106612199B (en) | A kind of network monitoring data is collected and analysis system and method | |
CN105610648B (en) | A kind of acquisition method and server of O&M monitoring data | |
US8270579B2 (en) | Methods, computer program products, and systems for managing voice over internet protocol (VOIP) network elements | |
US20060230309A1 (en) | System for remote fault management in a wireless network | |
JP2004021549A (en) | Network monitoring system and program | |
CN111800354B (en) | Message processing method and device, message processing equipment and storage medium | |
CN112256542B (en) | eBPF-based micro-service system performance detection method, device and system | |
EP1890427B1 (en) | A system and method for monitoring the device port state | |
CN102983990A (en) | Method and device for management of virtual machine | |
US20110172963A1 (en) | Methods and Apparatus for Predicting the Performance of a Multi-Tier Computer Software System | |
CN112350854B (en) | Flow fault positioning method, device, equipment and storage medium | |
CN109428779A (en) | A kind of monitoring alarm method and device of distributed service | |
GB2594107A (en) | Network analytics | |
CN102195791A (en) | Alarm analysis method, device and system | |
CN108390907A (en) | A kind of management monitoring system and method based on Hadoop clusters | |
CN111339466A (en) | Interface management method and device, electronic equipment and readable storage medium | |
CN108039956A (en) | Using monitoring method, system and computer-readable recording medium | |
CN109728938A (en) | A kind of method of assessment system service level | |
CN111628903B (en) | Monitoring method and monitoring system for transaction system running state | |
CN110798660A (en) | Integrated operation and maintenance system based on cloud federal audio and video fusion platform | |
CN108449212B (en) | MAS message transmission method based on event association | |
CN110198246B (en) | Method and system for monitoring flow |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190507 |