CN109728938A - A method for assessing system service level - Google Patents

A method for assessing system service level

Info

Publication number
CN109728938A
CN109728938A CN201811511461.XA CN201811511461A CN109728938A CN 109728938 A CN109728938 A CN 109728938A CN 201811511461 A CN201811511461 A CN 201811511461A CN 109728938 A CN109728938 A CN 109728938A
Authority
CN
China
Prior art keywords
time
host
service
task
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811511461.XA
Other languages
Chinese (zh)
Inventor
孔美琪
季统凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd
Priority to CN201811511461.XA
Publication of CN109728938A

Abstract

The present invention relates to the field of cloud platform monitoring and management technology, and in particular to a method for assessing system service level. The method comprises the following steps: S10: obtain the running status and resource usage of the host; S20: obtain the running status and resource usage of the virtual machines; S30: obtain the response speed of the cloud platform interfaces; S40: obtain the running status of the cloud platform's long tasks; S50: calculate the system service capability according to weights. The invention gives a comprehensive view of the system's running status and resource usage, notifies the administrator promptly when an anomaly occurs, and can to some extent prevent a fault from spreading.

Description

A method for assessing system service level
Technical field
The present invention relates to the field of cloud platform monitoring and management technology, and in particular to a method for assessing system service level.
Background
With the rapid development of cloud computing technology, cloud platforms are flourishing. Cloud platform administrators and operations staff care about more than the usage of individual resources; they prefer to understand the health of the entire cloud platform as a whole. The monitoring modules offered by traditional cloud platforms, however, cover only the usage of the resources under the platform. A more comprehensive method for assessing system service level is therefore needed.
Summary of the invention
The technical problem solved by the present invention is to provide a method for assessing system service level that can assess the health of a cloud platform more comprehensively and act on the result.
The technical solution by which the present invention solves the above problem is as follows.
The method comprises the following steps:
S10: obtain the running status and resource usage of the host;
S20: obtain the running status and resource usage of the virtual machines;
S30: obtain the response speed of the cloud platform interfaces;
S40: obtain the running status of the cloud platform's long tasks;
S50: calculate the system service capability according to weights.
The host running status includes the running status of each service and the status of connectivity with the external network; the resource usage includes host load, CPU usage, memory usage, and storage usage.
The steps for obtaining the running status of each service are: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll all services in the list and check their running state; 4) group the services by state and record, for each state, the number of services and the list of services in that state; 5) when a key service is found not to be in its expected state, promptly send an alert to the administrator.
Connectivity with the external network is judged by pinging and accessing public websites: if a normal response is returned, the host is considered connected to the external network; if network packets cannot be sent out, the host is considered disconnected from the external network.
The steps for obtaining the host's resource usage are: 1) set alert thresholds for host load, CPU usage, memory usage, and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage, and storage usage respectively, calling local commands directly for the local machine and going through the SNMP service for a remote host; 3) judge whether the host load, CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
The virtual machine running status includes the system running status, the status of connectivity with the external network, and the status of connectivity with the host; the resource usage includes CPU usage, memory usage, and storage usage.
The system running status is judged from heartbeat signals sent from inside the virtual machine, which show whether the system has hung: an agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
Connectivity with the host is determined by pinging the host's IP address or hostname, and by whether heartbeat signals can be sent successfully and the host's reply received.
The steps for obtaining the virtual machine's resource usage are: 1) set alert thresholds for CPU usage, memory usage, and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage, and storage usage respectively; 3) judge whether the CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
The interface is a synchronous interface called internally by the cloud platform. Step S30 is specifically: 1) set the interface response alert threshold; 2) place an interceptor at a key point of the communication path and record the name of the called interface, the call start time, and the execution end time; 3) subtract the call start time from the execution end time to obtain the interface's response time; 4) store the interface name, call start time, execution end time, and response time in a database for subsequent processing; 5) judge whether the interface's response time exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
The long task is an asynchronous interface called internally by the cloud platform. It is a long-running operation that appears in functions such as creating a virtual machine or uploading an image, which take a long time to complete.
Step S40 is specifically: 1) set the alert threshold for long task execution; 2) before the long task starts, generate a task ID and record the task ID, task name, start time, current state, and operator information in the database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) after the long operation completes, send a message to the management side, which updates the task record according to the task ID, operation result, current time, and task duration in the message; 5) judge whether the task's execution exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
Step S50 is specifically: 1) set the score alert threshold; 2) assign weights to the collected data by module category and alert level; 3) for each module, calculate the module score from the weights; 4) calculate the overall score from the module weights; 5) judge whether the score is below the threshold; a score below the threshold triggers an alert, and the administrator is notified promptly.
The method supports plug-in extension, so assessment factors can be added on demand. The steps for defining a plug-in are: 1) add the operation entry and the path of the corresponding implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the evaluation module, sub-steps, weights, and specific logic to the implementation class; 4) add the new module to the overall assessment process.
With the method for assessing system service level of the present invention, the running status and resource usage of the system can be grasped comprehensively; the administrator is notified promptly of any anomaly, which can to some extent prevent a fault from spreading.
Brief description of the drawings
The present invention is further described below with reference to the drawings:
Fig. 1 is a flowchart of the method of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, which is a flowchart of the method of the present invention, the method comprises:
S10: obtain the running status and resource usage of the host.
The host running status includes the running status of each service, the status of connectivity with the external network, and so on; the resource usage includes host load, CPU usage, memory usage, storage usage, and so on.
The steps for obtaining the running status of each service are as follows: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll all services in the list and check their running state; 4) group the services by state and record, for each state, the number of services and the list of services in that state; 5) when a key service is found not to be in its expected state, promptly send an alert to the administrator.
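The five service-polling steps can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: `ServiceSpec`, `poll_services`, and the stubbed state lookup are hypothetical names, and a real probe would query the host's service manager instead of a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ServiceSpec:
    name: str            # service included in the statistics
    expected_state: str  # state the service should be in
    critical: bool       # whether it is a key service


def poll_services(
    specs: List[ServiceSpec],
    get_state: Callable[[str], str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Poll each service, group names by observed state, collect alerts."""
    by_state: Dict[str, List[str]] = defaultdict(list)
    alerts: List[str] = []
    for spec in specs:
        state = get_state(spec.name)
        by_state[state].append(spec.name)
        # Only key services that are off their expected state raise an alert.
        if spec.critical and state != spec.expected_state:
            alerts.append(
                f"key service {spec.name} is {state}, expected {spec.expected_state}"
            )
    return dict(by_state), alerts


# Usage with a stubbed state lookup (hypothetical service names).
specs = [
    ServiceSpec("api", "running", True),
    ServiceSpec("scheduler", "running", True),
    ServiceSpec("report", "running", False),
]
observed = {"api": "running", "scheduler": "stopped", "report": "stopped"}
by_state, alerts = poll_services(specs, observed.__getitem__)
counts = {state: len(names) for state, names in by_state.items()}
```

Passing the state lookup in as a function keeps the grouping and alerting logic independent of how service state is actually read.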
Connectivity with the external network is judged by pinging and accessing some public websites: if a normal response is returned, the host is considered connected to the external network; if network packets cannot be sent out, the host is considered disconnected from the external network.
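A rough sketch of the website-access half of this check is given below. The function names are hypothetical, the rule that one successful target is enough is an assumption, and the ping half (which would shell out to the system `ping` command) is omitted.

```python
import http.client
import urllib.request
from typing import Callable, Iterable


def http_probe(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if the URL answers with a normal (2xx or 3xx) response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers a malformed URL.
        return False


def external_network_reachable(
    targets: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Assumption: one normally answering target counts as connected."""
    return any(probe(target) for target in targets)


# Usage with a stub probe, so the example runs without network access.
reachable = external_network_reachable(
    ["site-a", "site-b"], probe=lambda target: target == "site-b"
)
unreachable = external_network_reachable(["site-a"], probe=lambda target: False)
```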
The steps for obtaining resource usage are as follows: 1) set alert thresholds for host load, CPU usage, memory usage, and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage, and storage usage respectively, calling local commands directly for the local machine and going through the SNMP service for a remote host; 3) judge whether the host load, CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
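The threshold comparison in step 3 might look like the sketch below. The metric names and threshold values are illustrative assumptions. `local_metrics` reads only load and storage through the Python standard library (`os.getloadavg` is available on Unix-like systems only); CPU and memory collection and the SNMP path for remote hosts are left out.

```python
import os
import shutil
from typing import Dict, List


def local_metrics(path: str = "/") -> Dict[str, float]:
    """Collect the subset of local metrics the standard library exposes."""
    load_1min, _, _ = os.getloadavg()  # Unix-like systems only
    usage = shutil.disk_usage(path)
    return {
        "load": load_1min,
        "storage_pct": 100.0 * usage.used / usage.total,
    }


def check_thresholds(
    metrics: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return the names of metrics whose value exceeds the set threshold."""
    return [
        name
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]


# Usage with sample readings (hypothetical values).
thresholds = {"load": 8.0, "cpu_pct": 90.0, "mem_pct": 90.0, "storage_pct": 90.0}
sample = {"load": 9.5, "cpu_pct": 40.0, "mem_pct": 72.0, "storage_pct": 91.0}
exceeded = check_thresholds(sample, thresholds)
```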
S20: obtain the running status and resource usage of the virtual machines.
The virtual machine running status includes the system running status, the status of connectivity with the external network, the status of connectivity with the host, and so on; the resource usage includes CPU usage, memory usage, storage usage, and so on.
The system running status is judged from heartbeat signals sent from inside the virtual machine, which show whether the system has hung or hit another anomaly. An agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
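On the host side, heartbeat-based hang detection can be sketched as below. The class name, the timeout value, and the rule that a virtual machine is suspect once its last heartbeat is older than the timeout are assumptions for illustration; the source does not specify them.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time pushed by the agent in each VM."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record(self, vm_id: str, now: float) -> None:
        """Called whenever an agent pushes a status report."""
        self._last_seen[vm_id] = now

    def stale(self, now: float) -> List[str]:
        """Return VMs whose last heartbeat is older than the timeout."""
        return sorted(
            vm_id
            for vm_id, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage with explicit timestamps in seconds (hypothetical VM names).
monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.record("vm-a", now=100.0)
monitor.record("vm-b", now=125.0)
suspect = monitor.stale(now=140.0)
```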
Connectivity with the external network is judged by pinging and accessing some public websites: if a normal response is returned, the virtual machine is considered connected to the external network; if network packets cannot be sent out, it is considered disconnected from the external network.
Connectivity with the host is determined by pinging the host's IP address or hostname, and by whether heartbeat signals can be sent successfully and the host's reply received.
The steps for obtaining resource usage are as follows: 1) set alert thresholds for CPU usage, memory usage, and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage, and storage usage respectively; 3) judge whether the CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
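The source does not say how a CPU percentage is derived from libvirt output. One common approach, assumed here, is to sample a domain's cumulative CPU time (libvirt reports it in nanoseconds) twice and divide the difference by the elapsed wall-clock time and the number of virtual CPUs. The function below shows only that arithmetic and makes no libvirt calls.

```python
def cpu_usage_percent(
    cpu_time_ns_start: int,
    cpu_time_ns_end: int,
    interval_s: float,
    vcpus: int,
) -> float:
    """CPU usage over a sampling interval, as a percentage of all vCPUs."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    busy_s = (cpu_time_ns_end - cpu_time_ns_start) / 1e9
    return 100.0 * busy_s / (interval_s * vcpus)


# Usage: 5 s of CPU time consumed over a 5 s interval on a 2-vCPU guest.
usage = cpu_usage_percent(0, 5_000_000_000, interval_s=5.0, vcpus=2)
over_threshold = usage > 90.0  # hypothetical alert threshold
```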
S30: obtain the response speed of the cloud platform interfaces.
The interface is a synchronous interface called internally by the cloud platform. The caller must wait for the response after issuing a request, so the duration of the interface call can be measured.
The steps are as follows: 1) set the interface response alert threshold; 2) place an interceptor at a key point of the communication path and record the name of the called interface, the call start time, and the execution end time; 3) subtract the call start time from the execution end time to obtain the interface's response time; 4) store the interface name, call start time, execution end time, and response time in a database for subsequent processing; 5) judge whether the interface's response time exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
Under normal conditions, interface response times are on the order of milliseconds. An overly long response time indicates that some service or the network is abnormal; notifying the administrator promptly so the problem can be handled greatly narrows the scope of impact and protects the user experience.
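The interceptor steps of S30 can be approximated with a decorator, as in this sketch. The decorator form, the in-memory SQLite table, the table and column names, and the sample interface are all assumptions; the source only states that an interceptor records the name and the two timestamps and stores them with the response time.

```python
import functools
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interface_calls "
    "(name TEXT, start_ts REAL, end_ts REAL, response_s REAL)"
)
alerts = []


def intercept(threshold_s: float):
    """Record name, start, end and response time of each wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                end = time.time()
                elapsed = end - start  # end time minus start time
                conn.execute(
                    "INSERT INTO interface_calls VALUES (?, ?, ?, ?)",
                    (func.__name__, start, end, elapsed),
                )
                if elapsed > threshold_s:
                    alerts.append(f"{func.__name__} took {elapsed:.3f}s")
        return wrapper
    return decorator


@intercept(threshold_s=0.01)
def list_instances():
    time.sleep(0.05)  # stand-in for a slow synchronous interface
    return ["vm-a"]


result = list_instances()
rows = conn.execute("SELECT name, response_s FROM interface_calls").fetchall()
```

Recording in a `finally` clause means a call that raises is still timed and stored.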
S40: obtain the running status of the cloud platform's long tasks.
The long task is an asynchronous interface called internally by the cloud platform. It is a long-running operation that typically appears in functions such as creating a virtual machine or uploading an image, which take a long time to complete.
The steps are as follows: 1) set the alert threshold for long task execution; 2) before the long task starts, generate a task ID and record the task ID, task name, start time, current state, operator, and other information in the database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) after the long operation completes, send a message to the management side, which updates the task record according to the task ID, operation result, current time, and task duration in the message; 5) judge whether the task's execution exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
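A minimal sketch of the long-task bookkeeping follows. An in-memory dictionary stands in for the database, and the header name `X-Task-Id`, the message field names, and the class name are hypothetical; the source names only the kinds of information recorded.

```python
import uuid
from typing import Dict


class LongTaskTracker:
    """Keeps one record per long task, keyed by generated task ID."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self.records: Dict[str, dict] = {}

    def begin(self, name: str, operator: str, start_time: float) -> Dict[str, str]:
        """Create the task record and return the header carrying the task ID."""
        task_id = uuid.uuid4().hex
        self.records[task_id] = {
            "name": name,
            "operator": operator,
            "start_time": start_time,
            "state": "running",
        }
        return {"X-Task-Id": task_id}

    def on_complete(self, message: dict) -> bool:
        """Update the record from the completion message; True means alert."""
        record = self.records[message["task_id"]]
        record.update(
            state=message["result"],
            end_time=message["finished_at"],
            elapsed_s=message["elapsed_s"],
        )
        return message["elapsed_s"] > self.threshold_s


# Usage with explicit timestamps (hypothetical task and operator).
tracker = LongTaskTracker(threshold_s=300.0)
headers = tracker.begin("create_vm", operator="alice", start_time=1000.0)
task_id = headers["X-Task-Id"]
alert = tracker.on_complete(
    {"task_id": task_id, "result": "success", "finished_at": 1400.0, "elapsed_s": 400.0}
)
```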
Because the content of long tasks varies, the time they take to execute also varies, so thresholds should be set per class of task; a threshold can also be set for one specific task.
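One way to combine per-class and per-task thresholds is a lookup in which the task-specific value wins. The precedence rule, the fallback default, and all names and values below are assumptions made for illustration.

```python
from typing import Dict


def resolve_threshold(
    task_name: str,
    task_class: str,
    per_task: Dict[str, float],
    per_class: Dict[str, float],
    default_s: float,
) -> float:
    """A task-specific threshold overrides the one for its class."""
    if task_name in per_task:
        return per_task[task_name]
    return per_class.get(task_class, default_s)


# Usage (hypothetical thresholds, in seconds).
per_class = {"vm": 300.0, "image": 1800.0}
per_task = {"upload_large_image": 3600.0}
t_create = resolve_threshold("create_vm", "vm", per_task, per_class, 600.0)
t_upload = resolve_threshold("upload_large_image", "image", per_task, per_class, 600.0)
t_other = resolve_threshold("export_report", "report", per_task, per_class, 600.0)
```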
S50: calculate the system service capability according to weights.
This step computes an overall score from the collected information according to the configured weights, showing the overall state of the system.
The steps are as follows: 1) set the score alert threshold; 2) assign weights to the collected data by module category and alert level; 3) for each module, calculate the module score from the weights; 4) calculate the overall score from the module weights; 5) judge whether the score is below the threshold; a score below the threshold triggers an alert, and the administrator is notified promptly.
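The source does not give the scoring formula. The sketch below assumes a weighted average at both levels: each module score is the weight-normalized average of its item scores, and the overall score is the weight-normalized average of the module scores. Module names, weights, and the 0 to 100 scale are illustrative.

```python
from typing import Dict, List, Tuple


def weighted_average(items: List[Tuple[float, float]]) -> float:
    """Weighted average of (score, weight) pairs, normalized by total weight."""
    total_weight = sum(weight for _, weight in items)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(score * weight for score, weight in items) / total_weight


def overall_score(
    module_items: Dict[str, List[Tuple[float, float]]],
    module_weights: Dict[str, float],
) -> float:
    """Score each module from its items, then combine by module weight."""
    module_scores = {
        name: weighted_average(items) for name, items in module_items.items()
    }
    return weighted_average(
        [(module_scores[name], module_weights[name]) for name in module_scores]
    )


# Usage: item scores on a 0-100 scale with integer weights (hypothetical).
module_items = {
    "host": [(100.0, 1), (60.0, 1)],
    "vm": [(90.0, 1)],
    "interface": [(50.0, 1)],
}
module_weights = {"host": 4, "vm": 4, "interface": 2}
score = overall_score(module_items, module_weights)
alert = score < 80.0  # hypothetical score alert threshold
```

Here the host module scores 80, so the overall score is (80*4 + 90*4 + 50*2) / 10 = 78, which falls below the example threshold of 80.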
Because the factors by which system service level can be assessed go well beyond those described above, the method supports plug-in extension, so assessment factors can be added on demand.
The steps for defining a plug-in are as follows: 1) add the operation entry and the path of the corresponding implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the evaluation module, sub-steps, weights, and specific logic to the implementation class; 4) add the new module to the overall assessment process.
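The source does not describe the layout of entry_point.txt. The sketch below assumes one `name = module.path:ClassName` entry per line, with `#` comments, and loads the implementation class with `importlib`. The entry shown and the helper names are hypothetical; a standard-library class stands in for a real plug-in in the usage lines.

```python
import importlib
from typing import Dict


def parse_entry_points(text: str) -> Dict[str, str]:
    """Map each operation entry to the path of its implementation class."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, target = line.partition("=")
        entries[name.strip()] = target.strip()
    return entries


def load_class(target: str):
    """Import 'package.module:ClassName' and return the class object."""
    module_name, _, class_name = target.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Usage: an assumed file body with one hypothetical plug-in entry.
sample = """
# operation entry = path of the implementation class
network_latency = plugins.latency:LatencyEvaluator
"""
entries = parse_entry_points(sample)
# Stand-in target so the example runs without a real plug-in package.
stand_in = load_class("collections:OrderedDict")
```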

Claims (11)

1. A method for assessing system service level, characterized in that the method comprises the following steps:
S10: obtain the running status and resource usage of the host;
S20: obtain the running status and resource usage of the virtual machines;
S30: obtain the response speed of the cloud platform interfaces;
S40: obtain the running status of the cloud platform's long tasks;
S50: calculate the system service capability according to weights.
2. The method according to claim 1, characterized in that:
The host running status includes the running status of each service and the status of connectivity with the external network; the resource usage includes host load, CPU usage, memory usage, and storage usage;
The steps for obtaining the running status of each service are: 1) define the services to be included in the statistics, the state each service should be in, and whether it is a key service; 2) obtain the list of services to be counted; 3) poll all services in the list and check their running state; 4) group the services by state and record, for each state, the number of services and the list of services in that state; 5) when a key service is found not to be in its expected state, promptly send an alert to the administrator.
3. The method according to claim 2, characterized in that:
Connectivity with the external network is judged by pinging and accessing public websites: if a normal response is returned, the host is considered connected to the external network; if network packets cannot be sent out, the host is considered disconnected from the external network.
4. The method according to claim 2, characterized in that:
The steps for obtaining the host's resource usage are: 1) set alert thresholds for host load, CPU usage, memory usage, and storage usage; 2) invoke commands to obtain the host load, CPU usage, memory usage, and storage usage respectively, calling local commands directly for the local machine and going through the SNMP service for a remote host; 3) judge whether the host load, CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
5. The method according to claim 1, characterized in that:
The virtual machine running status includes the system running status, the status of connectivity with the external network, and the status of connectivity with the host; the resource usage includes CPU usage, memory usage, and storage usage.
The system running status is judged from heartbeat signals sent from inside the virtual machine, which show whether the system has hung: an agent that collects the running state runs inside the virtual machine and periodically pushes the virtual machine's running status to the host.
6. The method according to claim 5, characterized in that:
Connectivity with the host is determined by pinging the host's IP address or hostname, and by whether heartbeat signals can be sent successfully and the host's reply received.
7. The method according to claim 5, characterized in that:
The steps for obtaining the virtual machine's resource usage are: 1) set alert thresholds for CPU usage, memory usage, and storage usage; 2) call libvirt commands to obtain the CPU usage, memory usage, and storage usage respectively; 3) judge whether the CPU usage, memory usage, or storage usage exceeds its threshold; exceeding a threshold triggers an alert, and the administrator is notified promptly.
8. The method according to claim 1, characterized in that:
The interface is a synchronous interface called internally by the cloud platform. Step S30 is specifically: 1) set the interface response alert threshold; 2) place an interceptor at a key point of the communication path and record the name of the called interface, the call start time, and the execution end time; 3) subtract the call start time from the execution end time to obtain the interface's response time; 4) store the interface name, call start time, execution end time, and response time in a database for subsequent processing; 5) judge whether the interface's response time exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
9. The method according to claim 1, characterized in that:
The long task is an asynchronous interface called internally by the cloud platform. It is a long-running operation that appears in functions such as creating a virtual machine or uploading an image, which take a long time to complete;
Step S40 is specifically: 1) set the alert threshold for long task execution; 2) before the long task starts, generate a task ID and record the task ID, task name, start time, current state, and operator information in the database; 3) set the task ID in the request header and call the interface to execute the long operation; 4) after the long operation completes, send a message to the management side, which updates the task record according to the task ID, operation result, current time, and task duration in the message; 5) judge whether the task's execution exceeds the threshold; exceeding the threshold triggers an alert, and the administrator is notified promptly.
10. The method according to claim 1, characterized in that:
Step S50 is specifically: 1) set the score alert threshold; 2) assign weights to the collected data by module category and alert level; 3) for each module, calculate the module score from the weights; 4) calculate the overall score from the module weights; 5) judge whether the score is below the threshold; a score below the threshold triggers an alert, and the administrator is notified promptly.
11. The method according to any one of claims 1 to 10, characterized in that:
The method supports plug-in extension, so assessment factors can be added on demand. The steps for defining a plug-in are: 1) add the operation entry and the path of the corresponding implementation class to entry_point.txt; 2) create the implementation class at that path; 3) add the evaluation module, sub-steps, weights, and specific logic to the implementation class; 4) add the new module to the overall assessment process.
CN201811511461.XA 2018-12-11 2018-12-11 A method for assessing system service level Withdrawn CN109728938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811511461.XA CN109728938A (en) 2018-12-11 2018-12-11 A method for assessing system service level

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811511461.XA CN109728938A (en) 2018-12-11 2018-12-11 A method for assessing system service level

Publications (1)

Publication Number Publication Date
CN109728938A true CN109728938A (en) 2019-05-07

Family

ID=66294859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811511461.XA Withdrawn CN109728938A (en) 2018-12-11 2018-12-11 A method for assessing system service level

Country Status (1)

Country Link
CN (1) CN109728938A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102123061A (en) * 2011-03-28 2011-07-13 杭州电子科技大学 Method for determining performance of Web server
US20150277858A1 (en) * 2012-10-02 2015-10-01 Nec Corporation Performance evaluation device, method, and medium for information system
CN104184604A (en) * 2013-05-24 2014-12-03 北京天地超云科技有限公司 Cloud platform basic framework supervision system
CN104333488A (en) * 2014-11-04 2015-02-04 哈尔滨工业大学 Cloud service platform performance test method
CN105376100A (en) * 2015-12-09 2016-03-02 国云科技股份有限公司 Distributed alarm rule assessment method suitable for cloud platform resource monitoring
CN107786616A (en) * 2016-08-30 2018-03-09 江苏蓝创聚联数据与应用研究院有限公司 Main frame intelligent monitor system based on high in the clouds
CN108512719A (en) * 2018-03-02 2018-09-07 南京易捷思达软件科技有限公司 A kind of Integrative resource monitoring system based on cloud platform of increasing income

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442262A (en) * 2022-08-01 2022-12-06 阿里巴巴(中国)有限公司 Resource evaluation method and device, electronic equipment and storage medium
CN115442262B (en) * 2022-08-01 2024-02-06 阿里巴巴(中国)有限公司 Resource evaluation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104657250B (en) A kind of monitoring system and its monitoring method that performance monitoring is carried out to cloud host
CN109660380A (en) Monitoring method, platform, system and the readable storage medium storing program for executing of operation condition of server
CN107508722B (en) Service monitoring method and device
CN106612199B (en) A kind of network monitoring data is collected and analysis system and method
CN105610648B (en) A kind of acquisition method and server of O&M monitoring data
US8270579B2 (en) Methods, computer program products, and systems for managing voice over internet protocol (VOIP) network elements
US20060230309A1 (en) System for remote fault management in a wireless network
JP2004021549A (en) Network monitoring system and program
CN111800354B (en) Message processing method and device, message processing equipment and storage medium
CN112256542B (en) eBPF-based micro-service system performance detection method, device and system
EP1890427B1 (en) A system and method for monitoring the device port state
CN102983990A (en) Method and device for management of virtual machine
US20110172963A1 (en) Methods and Apparatus for Predicting the Performance of a Multi-Tier Computer Software System
CN112350854B (en) Flow fault positioning method, device, equipment and storage medium
CN109428779A (en) A kind of monitoring alarm method and device of distributed service
GB2594107A (en) Network analytics
CN102195791A (en) Alarm analysis method, device and system
CN108390907A (en) A kind of management monitoring system and method based on Hadoop clusters
CN111339466A (en) Interface management method and device, electronic equipment and readable storage medium
CN108039956A (en) Using monitoring method, system and computer-readable recording medium
CN109728938A (en) A kind of method of assessment system service level
CN111628903B (en) Monitoring method and monitoring system for transaction system running state
CN110798660A (en) Integrated operation and maintenance system based on cloud federal audio and video fusion platform
CN108449212B (en) MAS message transmission method based on event association
CN110198246B (en) Method and system for monitoring flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190507