CN108683569A - Service monitoring method and system for cloud service infrastructure - Google Patents

Publication number
CN108683569A
Authority
CN
China
Prior art keywords
testing
server
data
log
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810585690.XA
Other languages
Chinese (zh)
Other versions
CN108683569B (en)
Inventor
严寒冰
李佳
马莉雅
李志辉
温森浩
姚力
朱芸茜
王小群
张腾
陈阳
李世淙
徐剑
王适文
饶毓
肖崇蕙
贾子骁
张帅
吕志泉
韩志辉
雷君
周彧
周昊
高川
楼书逸
文静
杜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING RUICHI XINAN TECHNOLOGY Co Ltd
National Computer Network and Information Security Management Center
Original Assignee
BEIJING RUICHI XINAN TECHNOLOGY Co Ltd
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING RUICHI XINAN TECHNOLOGY Co Ltd, National Computer Network and Information Security Management Center
Priority to CN201810585690.XA
Publication of CN108683569A
Application granted
Publication of CN108683569B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/0246 Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
    • H04L41/0273 Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using web services for network management, e.g. simple object access protocol [SOAP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Abstract

The present invention proposes a service monitoring method and system for cloud service infrastructure, belonging to the field of cloud service infrastructure. The system comprises a control center server and testing servers deployed in different regions. A cloud service testing module is deployed on each testing server; a testing task dispatch module, a data acquisition module, a testing data analysis module, a testing alarm module and a database are deployed on the control center server. Testing servers are set up in different regions, and the user configures monitoring tasks in the WEB interface of the control center server. The testing task dispatch module verifies the correctness of each task's destination IP and issues the configuration file of the monitoring tasks to each testing server. By applying wide-range asynchronous testing to routers and to the DNS recursive servers that provide service, the invention monitors the cloud service infrastructure, reduces packet loss, achieves load balancing, and offers higher robustness and stability.

Description

Service monitoring method and system for cloud service infrastructure
Technical field
The invention belongs to the field of cloud service infrastructure and relates to a method and system for testing and verifying the network connectivity of cloud service infrastructure.
Background art
With the rapid development of cloud computing and the mobile Internet, more and more business is carried out in the cloud, and cloud services have become indispensable in our work. The infrastructure underlying cloud services is the top priority: only a sound foundation can guarantee the robustness of the system and the accuracy of the data.
Cloud services are a commercial delivery model that emerged as computing developed into the Internet era. They mainly take the form of services at many levels, such as computing and networking and the storage, reading, downloading, probing and analysis of information resources. Because they are secure, stable and offer mass storage, cloud services have come to be favored by enterprises and individuals. Consequently, how to ensure the stability of cloud services has become a focus of attention, and resolving cloud service connectivity and verifying the security, stability and data integrity of cloud services is an important part of that.
Cloud service vendors operate large-scale networks across multiple office points, with many servers in different regions. Poor network conditions or poor lines can make a cloud service unstable and cause data loss, which is a serious loss for users and enterprises.
Current monitoring methods are mostly black-box tests: from the final result one can only tell that a problem exists, not determine what the problem is. Different application scenarios call for different methods; when data goes wrong, locating the problem efficiently and accurately is the key to resolving a cloud service failure. The network connectivity of a cloud service vendor's infrastructure can therefore be tested by dial testing (actively probing) routers, DNS (Domain Name System) servers and business modules.
Existing techniques that monitor the operating status of different business systems by active probing with specially constructed TCP/IP (Transmission Control Protocol/Internet Protocol) network messages mainly include:
1) The National Laboratory for Applied Network Research (NLANR), whose research is distributed and focuses on high-performance connectivity [Reference 1: Mcgregor T, Braun H W, Brown J. The NLANR Network Analysis Infrastructure [J]. Communications Magazine IEEE, 2000, 38(5): 122-128]. NLANR supports three monitoring modes: passive traffic monitoring, active measurement, and control information monitoring. AMP (Active Measurement Program) is its active measurement project; it mainly measures round-trip time (RTT), packet loss, topology and throughput between sites. In AMP, monitors send ICMP (Internet Control Message Protocol) messages to one another every minute and run traceroute to the other monitors every ten minutes. Throughput can be measured by bulk TCP data transfer, bulk UDP (User Datagram Protocol) data transfer, ping-F and treno.
2) Ethernet OAM detection [Reference 2: Zhang Hao. Implementing fault detection and fault isolation in Ethernet OAM [C]. China Association for Science and Technology Annual Meeting, sub-venue of the symposium on informatization and social development, 2008]. It uses continuity check messages (Continuity Check messages) as heartbeat signals to detect the connection status between terminals; link trace messages (Link Trace messages) to record the end-to-end hop path, similar to the IP-layer Traceroute tool; and loopback messages (Loopback messages), similar to the Ping function of ICMP, to detect connectivity between terminals.
3) Cisco Service Assurance Agent / Cisco IOS IP Service Level Agreements, built into Cisco IOS devices. It supports active probing and active monitoring and offers many configurable options, such as UDP/TCP port numbers, the ToS field, VRF instance, source IP, destination IP and web URL. The tool can measure the following performance parameters: one-way delay, round-trip delay, delay variation, packet loss, packet ordering, voice quality score, network resource availability, application performance and server response time.
None of the above techniques meets the requirements of testing cloud service infrastructure. Because cloud service networks are large, wide-ranging and spread over many office points, wide-range testing across the vendor's office points is needed: accurate asynchronous tests for different protocols (UDP, TCP, HTTP), different office points and different operators, with the testing system separated from the data analysis system, so that the cause of an anomaly can be found accurately and staff can locate where the problem in the cloud service system lies.
Summary of the invention
To meet the above needs and monitor the services of cloud service infrastructure in a cloud service environment, the present invention provides a service monitoring method and system for cloud service infrastructure. It monitors the data of routers and DNS recursive servers and, given the wide range and many office points of cloud services, performs asynchronous, wide-range, time-sharing testing.
The present invention provides a service monitoring system for cloud service infrastructure, comprising a control center server and testing servers deployed in different regions. A cloud service testing module is deployed on each testing server; a testing task dispatch module, a data acquisition module, a testing data analysis module, a testing alarm module and a database are deployed on the control center server.
The user configures monitoring tasks through the testing task dispatch module, which issues the configuration file to the corresponding testing servers. The configuration file describes the user-configured monitoring tasks, which cover multiple source IPs, multiple destination IPs and multiple protocols. The testing task dispatch module verifies the destination IPs in the configuration file; tasks that fail verification are not issued.
After receiving the configuration file, the cloud service testing module traverses the tasks in it and verifies the correctness of each task's destination IP. For tasks that pass verification, it configures valid data packets and performs data testing in asynchronous time-sharing mode. The cloud service testing module sends two kinds of testing data: one is data packets carrying domain names in a specified format, sent to the DNS recursive servers that provide service in the cloud service; the other is data packets sent to the destination IP, whose number is set according to the obtained list of routers that the cloud service traffic passes through and the routers' sampling ratios. After sending data packets to the destination IP and to the DNS recursive servers that provide service, the cloud service testing module records the packet-sending log S-Log and sends it to the testing data analysis module of the control center server.
The data acquisition module obtains data fingerprint information from the packet-sending log S-Log and traverses the office-point databases. Before querying, it first verifies the connection to the office-point database. If the connection fails or the query times out, the problem is recorded in the problem log E-Log; if the connection succeeds, the data stored in the office-point database is queried. When traversal of the office-point databases is complete, a capture file and the capture-file log R-Log are generated. The data fingerprint information is a six-tuple (source IP, destination IP, source port, destination port, protocol number, rule ID).
The testing data analysis module obtains, for a given task, the capture-file log R-Log, the problem log E-Log and the packet-sending log S-Log. It first traverses the problem log E-Log, marks the problematic office-point databases in the database and records the corresponding office-point database problems. It then traverses the packet-sending log S-Log and compares it with the log R-Log. If the testing data is flow data sent to a destination IP, it uses the data fingerprint information to calculate the average sampling ratio of the routers passed through and then the storage rate of the task. If the testing data is data sent to DNS recursive servers, it compares the logs by data fingerprint information and calculates the storage rate of the task.
The testing alarm module holds preset thresholds for both the flow monitoring of routers and the monitoring of the DNS recursive servers that provide service. It compares the task storage rate calculated by the testing data analysis module with the threshold and raises an alarm for tasks whose storage rate is below the threshold.
The service monitoring method for cloud service infrastructure proposed by the present invention comprises the following steps:
Step 1: Set up testing servers in different regions and configure them to use asynchronous time-sharing testing, in which testing data is sent asynchronously and a set number of data packets is sent at every set interval;
Step 2: The user configures monitoring tasks in the WEB interface of the control center server. The testing task dispatch module verifies the correctness of each task's destination IP by checking whether the IP's information in the IP library (country, province, city, operator) is correct; if it is correct, the configuration file of the monitoring tasks is generated;
Step 3: The testing task dispatch module issues the configuration file of the monitoring tasks to each testing server. The testing server verifies the correctness of the destination IPs in the configuration file; a destination IP that fails verification is not tested, and the error information is fed back to the control center server;
Step 4: Obtain the list of routers that the cloud service traffic passes through and the routers' sampling ratios, and set the number of testing data packets that each testing server sends to the destination IP according to the router sampling ratio and the required probability that a flow log is sampled. After each round of sending testing data packets, the testing server generates the packet-sending log S-Log and sends it to the control center server.
If the sampling ratio of the router is 1/X, the probability that a flow log is sampled is required to exceed G%, and the number of testing data packets is Y, then the following relationship holds: $1 - (1 - 1/X)^Y > G\%$. That is, when Y data packets are sent, the probability that a flow log is sampled is $1 - (1 - 1/X)^Y$.
Step 5: The testing server obtains the list of DNS recursive servers that provide service in the cloud service, sends data packets carrying domain names in a specified format to the recursive DNS servers, captures the packets, and generates a PCAP file;
Step 6: The testing server sends the PCAP file to the intermediate robot, which authenticates the data packets to ensure security. The testing server sends the data packets out through the intermediate robot, generates the packet-sending log S-Log, and sends it to the control center server;
Step 7: The control center server verifies the connection status of each office-point database; if the connection fails or the query times out, the problem is recorded in the problem log E-Log;
Step 8: The control center server obtains the fingerprint information of the data packets from the received packet-sending log and queries each office-point database to generate a capture file. The fingerprint information is (source IP, destination IP, source port, destination port, protocol number, rule ID).
Step 9: For a given task, the control center server obtains the capture-file log R-Log and the problem log E-Log and looks up the corresponding packet-sending log S-Log. By traversing the problem log E-Log, it marks the problem office-point databases in the control center's database and records the problems found;
Step 10: The control center server traverses the packet-sending log S-Log and, for normal office-point databases, compares the packet-sending log S-Log with the log R-Log. If the testing data is flow data sent to a destination IP, it uses the data fingerprint information to calculate the average sampling ratio of the routers passed through and then the storage rate of the task. If the testing data is data sent to DNS recursive servers, it compares the logs by data fingerprint information and calculates the storage rate of the task;
Step 11: Compare the calculated storage rate with the corresponding preset storage-rate threshold; if it is below the threshold, a prompt is shown in the WEB interface.
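The sampling requirement in Step 4 can be made concrete with a short calculation. The sketch below is only an illustration: it assumes each flow is sampled independently with probability 1/X, so that the probability that at least one of Y flows is sampled is 1 - (1 - 1/X)^Y, and it looks for the smallest Y that pushes this probability above G%. The function names are hypothetical and do not come from the patent.

```python
import math


def sampled_probability(x: int, y: int) -> float:
    """Probability that at least one of y flows is sampled at ratio 1/x.

    Assumes every flow is sampled independently (an assumption of this sketch).
    """
    return 1.0 - (1.0 - 1.0 / x) ** y


def min_packets(x: int, g_percent: float) -> int:
    """Smallest Y such that 1 - (1 - 1/X)^Y > G/100."""
    if x < 1 or not 0 < g_percent < 100:
        raise ValueError("need X >= 1 and 0 < G < 100")
    if x == 1:
        return 1  # every flow is sampled, one packet is enough
    target = g_percent / 100.0
    # Closed-form estimate, then adjust for the strict inequality and rounding.
    y = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / x)))
    while sampled_probability(x, y) <= target:
        y += 1
    while y > 1 and sampled_probability(x, y - 1) > target:
        y -= 1
    return y
```

A testing server could call `min_packets(X, G)` once per router sampling ratio and use the largest result along the path as its packet count; that policy is likewise a suggestion of this sketch rather than something the text specifies.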
Compared with traditional service monitoring techniques, the method and system of the present invention have the following advantages and positive effects:
(1) The method and system provide a data monitoring scheme for routers and DNS recursive servers. Using the sampling behavior of routers and the domain name resolution function of recursive servers, they identify specific data and perform comparative analysis, applying black-box testing to fault-test the business system without affecting the existing business system. Through flow monitoring of routers, the calculated storage rate can reveal problems on the cloud service link. DNS data forwarded by the intermediate robot is monitored and analyzed with authentication, which provides secure DNS monitoring and verifies the resolution function of the DNS servers.
(2) The method and system use asynchronous, wide-range, time-sharing monitoring. Given the wide range and many office points of cloud services, they perform asynchronous wide-range time-sharing testing and separate testing from the monitoring and analysis platform. This reduces the load on servers and the loss of data packets, greatly improves the flow data collection rate, achieves load balancing, has little impact on the system under test, and improves the robustness, stability and effectiveness of the service monitoring system.
(3) The system is deployed in a distributed manner, with the cloud service testing module separated from the testing data analysis module, so accurate asynchronous tests can be run against service office points at different locations and with different operators. The invention introduces load balancing into the method to process data in parallel and realizes asynchronous monitoring in which the testing module is separated from the statistics module. Statistics are not affected by the testing module, which enhances the availability of the system and improves its performance and scalability.
Description of the drawings
Fig. 1 is the overall structure diagram of the service monitoring system for cloud service infrastructure of the present invention;
Fig. 2 is the functional flow chart of the cloud service testing module in the system of the present invention;
Fig. 3 is the functional flow chart of the data acquisition module in the system of the present invention;
Fig. 4 is the functional flow chart of the testing data analysis module in the system of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described below with reference to the drawings and embodiments.
The service monitoring method and system for cloud service infrastructure provided by the invention apply wide-range asynchronous dial testing to routers and to the DNS recursive servers that provide service in order to monitor the cloud service infrastructure. They offer real-time monitoring and real-time prompting, pinpoint problems down to the infrastructure, and give effective support for troubleshooting cloud service errors.
As shown in Fig. 1, the invention discloses a service monitoring system for cloud service infrastructure, comprising testing servers deployed in different regions and a control center server. A cloud service testing module is deployed on each testing server; a testing task dispatch module, a data acquisition module, a testing data analysis module, a testing alarm module and a database are deployed on the control center server.
The control center server is a server cluster. Each of the above modules deployed on it can be implemented on a separate server, or, where resources are limited, several modules can be integrated on one server.
The user configures testing tasks through the testing task dispatch module, which issues the configuration file to the corresponding testing servers. In the testing task dispatch module, the user configures and issues tasks through the WEB interface, setting up testing tasks with multiple source IPs (different provinces), destination IPs at multiple office points and multiple protocols (TCP, UDP, HTTP). The different tasks are issued to the corresponding testing servers, which then carry out the testing. The testing task dispatch module provides an IP verification function that verifies the destination IP; tasks whose IP does not conform are not issued.
After receiving the configuration file issued by the testing task dispatch module, the cloud service testing module verifies the destination IP information and then performs testing in a time-sharing, intermittent manner to ensure the stability of testing. It sends a large number of data packets to the corresponding destination IPs and to the recursive servers that provide service, records the packet-sending log, and compresses the log for transmission to the testing data analysis server. As shown in Fig. 2, the cloud service testing module reads the configuration file, traverses its tasks and verifies the correctness of each task's destination IP address. When verification passes, it configures the corresponding task rules, sends data packets to the destination IP and appends the sending records to the packet-sending log S-Log. After all tasks in the configuration file have been processed, it returns the packet-sending log S-Log to the control center server.
Both the testing task dispatch module and the cloud service testing module verify the correctness of a destination IP address by querying the IP library to check whether the destination IP's information (country, province, city, operator) is correct. If it is correct, verification passes; otherwise it fails.
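The destination IP check described above can be sketched as a lookup against an IP library followed by a field-by-field comparison. The sketch below is a minimal illustration under stated assumptions: the IP library is modeled as an in-memory dictionary, and the names `IpInfo`, `Task`, `verify_destination_ip` and `split_tasks` are hypothetical. The patent does not specify the library's format or interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class IpInfo:
    country: str
    province: str
    city: str
    operator: str


@dataclass(frozen=True)
class Task:
    dest_ip: str
    expected: IpInfo  # attributes the user configured for this destination


def verify_destination_ip(task: Task, ip_library: Dict[str, IpInfo]) -> bool:
    """Pass only if the library knows the IP and all four fields match."""
    return ip_library.get(task.dest_ip) == task.expected


def split_tasks(tasks: Iterable[Task],
                ip_library: Dict[str, IpInfo]) -> Tuple[List[Task], List[Task]]:
    """Return (tasks to issue, tasks rejected by verification)."""
    accepted: List[Task] = []
    rejected: List[Task] = []
    for task in tasks:
        if verify_destination_ip(task, ip_library):
            accepted.append(task)
        else:
            rejected.append(task)
    return accepted, rejected
```

Because the same check runs on both the control center and the testing server, keeping it a pure function of the task and the library makes it easy to reuse on both sides.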
The cloud service testing module uses asynchronous time-sharing testing and sets the number of testing data packets according to the obtained list of routers that the cloud service traffic passes through and the routers' sampling ratios. To the DNS recursive servers that provide service in the cloud service, it sends data packets carrying domain names in a specified format. The time-sharing, interval-based sending of data packets can be configured flexibly according to the different network environments of each region, so as to avoid affecting the normal business of the destination servers and consuming local network bandwidth.
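As an illustration of a probe that carries a domain name in a specified format, the sketch below builds a standard DNS query message (12-byte header plus one question) for a generated domain name. The naming scheme in `probe_domain` and the zone `probe.example.com` are assumptions made for this example; the patent does not disclose the actual domain format.

```python
import struct


def probe_domain(task_id: int, seq: int, zone: str = "probe.example.com") -> str:
    """Hypothetical naming scheme that embeds the task and sequence number."""
    return f"t{task_id}-s{seq}.{zone}"


def encode_qname(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length in {domain!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_dns_query(domain: str, query_id: int) -> bytes:
    """Build a recursive A-record query for the given domain."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=1 (A), QCLASS=1 (IN).
    question = encode_qname(domain) + struct.pack("!HH", 1, 1)
    return header + question
```

The resulting bytes could be sent over UDP port 53 to a recursive server. Embedding identifiers in the queried name is one way to recognize a probe later in stored data, which is the role the text assigns to the fingerprint.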
The data acquisition module obtains data fingerprint information from the packet-sending log S-Log. As shown in Fig. 3, it traverses the office-point databases and verifies the connection to each office-point database before querying. If the connection fails or the query times out, the problem is recorded in the problem log E-Log; if the connection succeeds, the data stored in the office-point database is queried. When all office-point databases have been traversed and the query task is complete, it generates a capture file, whose suffix is .ok, and the capture-file log R-Log. The data fingerprint information is a six-tuple (source IP, destination IP, source port, destination port, protocol number, rule ID). The data fingerprint is also called dyeing information and uniquely identifies one testing data message. The dyeing method and marks are composed from the module information of the testing system, which gives great flexibility and operability.
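The traversal in Fig. 3 can be sketched as a loop over office-point databases that separates successful queries from connection failures and timeouts. In the sketch below each office-point database is represented by a plain callable that either returns the fingerprints it has stored or raises `ConnectionError` or `TimeoutError`. This stand-in, and the names `Fingerprint` and `collect`, are assumptions of the example, not the patent's interfaces.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Fingerprint(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    rule_id: int


# Hypothetical stand-in for querying one office-point database.
QueryFn = Callable[[List[Fingerprint]], List[Fingerprint]]


def collect(fingerprints: Iterable[Fingerprint],
            office_dbs: Dict[str, QueryFn],
            ) -> Tuple[List[Tuple[str, Fingerprint]], List[Tuple[str, str]]]:
    """Return (R-Log entries, E-Log entries) for one round of collection."""
    wanted = list(fingerprints)
    r_log: List[Tuple[str, Fingerprint]] = []
    e_log: List[Tuple[str, str]] = []
    for name, query in office_dbs.items():
        try:
            found = query(wanted)
        except (ConnectionError, TimeoutError) as exc:
            # Connection failure or query timeout goes to the problem log.
            e_log.append((name, type(exc).__name__))
            continue
        r_log.extend((name, fp) for fp in found)
    return r_log, e_log
```

Recording failures instead of aborting lets the later analysis step distinguish an office-point database that is unreachable from one that simply holds no matching records.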
The testing data analysis module obtains, for a given task, the capture-file log R-Log, the problem log E-Log and the packet-sending log S-Log. As shown in Fig. 4, it first marks the problematic office-point databases according to the problem log E-Log, then, for each task, traverses its packet-sending log S-Log and compares the log R-Log with the S-Log. When it encounters a problem office-point database, it marks it in the database of the control center server and records the corresponding office-point database problem. For normal office-point databases, if the testing data is flow data sent to a destination IP, it uses the data fingerprint information to calculate the average sampling ratio of the routers passed through and then the storage rate of each task; if the testing data is data sent to DNS recursive servers, it compares the logs by data fingerprint information and calculates the storage rate of each task. The testing data analysis module also compiles statistics on how the storage rate changes in each time period and reports the trend. In Fig. 4, after the packet-sending log of a task has been traversed, the capture-file log R-Log, the problem log E-Log and the packet-sending log S-Log that took part in the analysis are moved from the current analysis directory to the backup directory.
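The text does not give the exact formula for the storage rate, so the sketch below is only one plausible reading. It assumes the storage rate is the number of sent fingerprints that are found among the stored records, divided by the number expected to be stored, where the expected number is the sent count multiplied by the average sampling ratio for router flow data and simply the sent count for DNS data (ratio 1.0). The function name and this definition are assumptions of the example.

```python
from collections import Counter
from typing import Hashable, Iterable


def storage_rate(sent: Iterable[Hashable],
                 stored: Iterable[Hashable],
                 avg_sampling_ratio: float = 1.0) -> float:
    """Assumed definition: matched records / (sent records * sampling ratio)."""
    if not 0 < avg_sampling_ratio <= 1:
        raise ValueError("avg_sampling_ratio must be in (0, 1]")
    sent_counts = Counter(sent)      # fingerprints from the S-Log
    stored_counts = Counter(stored)  # fingerprints from the R-Log
    total_sent = sum(sent_counts.values())
    if total_sent == 0:
        return 0.0
    # A sent fingerprint counts as matched at most as often as it was stored.
    matched = sum(min(n, stored_counts[fp]) for fp, n in sent_counts.items())
    expected = total_sent * avg_sampling_ratio
    return min(matched / expected, 1.0)  # sampling noise can exceed expectation
```

Matching on the full six-tuple fingerprint keeps unrelated traffic that happens to share an address or port from inflating the rate.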
The testing alarm module holds preset thresholds for both the flow monitoring of routers and the monitoring of the DNS recursive servers that provide service. It compares the task storage rate calculated by the testing data analysis module with the corresponding threshold and raises an alarm for tasks whose storage rate is below the threshold.
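The threshold comparison can be sketched in a few lines. The threshold values, the two monitoring kinds `router_flow` and `dns`, and the message format below are illustrative assumptions; the patent only states that thresholds are preset for each kind of monitoring.

```python
from typing import Dict, List, Tuple

# Hypothetical preset thresholds, one per kind of monitoring.
THRESHOLDS: Dict[str, float] = {"router_flow": 0.90, "dns": 0.95}


def tasks_to_alarm(results: Dict[str, Tuple[str, float]],
                   thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one alarm message per task whose storage rate is below threshold.

    `results` maps a task id to (monitoring kind, storage rate).
    """
    alarms: List[str] = []
    for task_id, (kind, rate) in results.items():
        limit = thresholds[kind]
        if rate < limit:
            alarms.append(
                f"task {task_id}: {kind} storage rate {rate:.2%} "
                f"is below threshold {limit:.2%}"
            )
    return alarms
```

In the described system the resulting messages would be surfaced in the WEB interface; here they are simply returned as strings.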
The local data base of database central server in order to control flows daily record wherein storing collected stream log information In information other than the finger print information (source IP, destination IP, source port, destination interface, protocol number, rule ID) for being included, also wrap Containing the content in a part of data packet.
The business monitoring method of cloud service-oriented infrastructure provided by the invention, includes the following steps 1~11.
Step 1:Using asynchronous a wide range of timesharing testing technology, asynchronous test pattern is realized, by testing and monitoring point Analysis is detached, and carries out timesharing testing, is reduced the pressure of destination IP server, is reduced the loss of data packet.
In this step, more testing servers are provided, for P different province, there is testing server in each province, Load-balancing function is realized, to ensure the robustness and convergence of algorithm.P is the integer more than 2.
Every testing server uses asynchronous timesharing testing mode, i.e. testing data are sent asynchronous, to every K data Packet is arranged testing interval t, can largely improve storage rate.K is positive integer.Timesharing testing is carried out, destination IP clothes are reduced The pressure of business device, reduces the loss of data packet.Testing interval 1s is arranged in every 1000 data packets in the embodiment of the present invention, can be with Largely improve storage rate.
The asynchronous mode of the method is also reflected in the following: because loading of testing data into storage is delayed (10 min to 30 min), cloud service testing is separated from testing data analysis. That is, testing data analysis does not run on the same platform as the testing servers, and the two do not affect each other.
Step 2: The user issues a monitoring task through the WEB interface of the control center server. The testing mission dispatching module verifies the destination IPs of the monitoring task by querying the IP library to check whether the information for each IP (country, province, city, operator) is correct. An incorrect IP triggers a prompt; if all are correct, the monitoring task configuration file is generated and the method proceeds to step 3.
Step 3: The testing mission dispatching module issues the configuration file of the monitoring task to the testing servers. Each testing server likewise verifies the destination IPs in the configuration file by querying the IP library to check whether the IP information (country, province, city, operator) is correct. If a destination IP is incorrect, it is not tested, and an error prompt is fed back to the control center server.
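The destination IP check performed in steps 2 and 3 might look like the sketch below. The in-memory `IP_LIBRARY`, its example entries, and the function names are all hypothetical; the patent does not describe the IP library's format or query interface.

```python
import ipaddress

# Hypothetical IP library: IP string -> (country, province, city, operator).
IP_LIBRARY = {
    "203.0.113.10": ("CN", "Beijing", "Beijing", "ExampleNet"),
    "203.0.113.20": ("CN", "Guangdong", "Shenzhen", "ExampleNet"),
}

def validate_target(ip, expected):
    """Return None if the target is valid, otherwise a short error message."""
    try:
        ipaddress.ip_address(ip)  # reject syntactically invalid addresses
    except ValueError:
        return "malformed IP address"
    actual = IP_LIBRARY.get(ip)
    if actual is None:
        return "IP not found in IP library"
    if actual != tuple(expected):
        return "country/province/city/operator mismatch"
    return None

def split_targets(targets):
    """Split {ip: expected_info} into (valid_ips, {ip: error})."""
    valid, errors = [], {}
    for ip, expected in targets.items():
        error = validate_target(ip, expected)
        if error is None:
            valid.append(ip)
        else:
            errors[ip] = error
    return valid, errors
```

In this sketch the control center would generate the configuration file only when `errors` is empty (step 2), while a testing server would test the `valid` list and report `errors` back (step 3).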
Step 4: Set the number of data packets that each testing server sends to the destination IP, so that the flow records formed by the data packets can be sampled and collected by the routers on the path to the destination IP address. Obtain the list of routers that the cloud service traffic passes through and the sampling ratio of each router. If a router has a sampling ratio of 1/X, its log is likewise sampled data. Because the flow logs generated by a router are sampled, the flow test for the router must ensure that the flow logs generated by the test packets can still be captured under sampling.
Suppose the sampling ratio of the router is 1/X, the probability that a flow log is sampled is required to exceed G%, and the number of testing data packets is Y, i.e. the number of flows sent is Y. X and Y are positive integers, and G is a positive number less than 100.
When Y = 1, the probability that the flow log is collected is $P = 1/X$; for example, if X = 1000, the collection probability is 1/1000.
When Y > 1, the probability that at least one flow log is sampled is $P = 1 - (1 - 1/X)^{Y}$.
Accordingly, at a sampling ratio of 1:1000, for the probability that a flow log is sampled to exceed 99.99%, Y is set to 10000, which gives a probability of 99.995483%;
At a sampling ratio of 1:2000, Y is set to 20000, which gives a probability of 99.995471%;
At a sampling ratio of 1:5000, Y is set to 50000, which gives a probability of 99.995465%.
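The quoted percentages can be checked with a short calculation. The sketch assumes that each of the Y flows is sampled independently with probability 1/X, so that the probability of capturing at least one flow log is 1 - (1 - 1/X)^Y; this closed form is inferred from the figures above rather than quoted from the text, and the function names are illustrative.

```python
import math

def capture_probability(x, y):
    """Probability that at least one of y flows is sampled at ratio 1/x."""
    return 1.0 - (1.0 - 1.0 / x) ** y

def min_packets(x, g_percent):
    """Smallest y whose capture probability exceeds g_percent (x > 1)."""
    target = g_percent / 100.0
    # Solve 1 - (1 - 1/x)^y > target for y, then correct for rounding.
    y = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / x)))
    while capture_probability(x, y) <= target:
        y += 1
    return y

if __name__ == "__main__":
    for x, y in [(1000, 10000), (2000, 20000), (5000, 50000)]:
        print(f"1:{x}, Y={y}: {capture_probability(x, y) * 100:.6f}%")
```

Under this assumption, the three settings used in the embodiment (Y = 10X) all clear the 99.99% requirement, and `min_packets` gives the smallest Y that would do so for a given sampling ratio.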
In addition, since the router needs time to output flow logs, a sending interval t must be considered to keep packet reception stable. For large-scale testing of cloud services, 50000 data packets are tested for each destination IP. After each round of sending testing data packets, the testing server generates the packet-sending log S-Log and sends it to the control center server.
Step 5: The testing server obtains the list of DNS recursive servers that provide service in the cloud service, sends data packets carrying domain names in a specified format to the recursive DNS servers, and captures the packets to obtain PCAP files.
PCAP is a common packet storage format; mainstream packet capture software, including Wireshark, can generate capture files in this format.
Step 6: The testing server sends the PCAP files to the intermediary (man-in-the-middle) machine, which verifies the data packets to ensure security. The intermediary machine checks whether the source IP of each data packet is in the whitelist of the control center and whether the domain name is a standard domain name. If the IP is not in the whitelist or the domain name is not standard, the machine parses the PCAP file and reassembles the data packet, forging the source IP as a whitelisted IP, forging a specific standard domain name, and so on. The intermediary machine then sends out both the data packets that passed verification and the reassembled data packets. Each time testing data packets have been sent through the intermediary machine, the testing server generates the packet-sending log S-Log and sends it to the control center server.
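The verify-or-rebuild decision made by the intermediary machine can be sketched on an already-parsed probe. PCAP parsing and packet reassembly are deliberately left out; the `DnsProbe` type, the whitelist contents, and the suffix rule used to decide what counts as a "standard" domain name are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DnsProbe:
    src_ip: str   # source IP of the probe packet
    qname: str    # queried domain name

# Hypothetical control-center whitelist and "standard domain" rule.
WHITELIST = {"198.51.100.5"}
STANDARD_SUFFIX = ".probe.example"

def is_standard(qname):
    return qname.endswith(STANDARD_SUFFIX)

def verify_or_rebuild(probe, fallback_ip="198.51.100.5",
                      fallback_qname="check.probe.example"):
    """Return the probe unchanged if it passes both checks, else a rebuilt copy."""
    src_ip = probe.src_ip if probe.src_ip in WHITELIST else fallback_ip
    qname = probe.qname if is_standard(probe.qname) else fallback_qname
    return replace(probe, src_ip=src_ip, qname=qname)
```

A probe that already satisfies both checks is forwarded as is; otherwise only the failing field is substituted, which mirrors the description of reassembling the packet with a whitelisted source IP or a specific standard domain name.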
Step 7: Verify the connection status of the office-point databases. Because the data are stored in the office-point databases of multiple locations, a database verification function is added. The control center server verifies the connection status of each database; if the connection fails or a query times out, the problem is recorded in the problem log E-Log.
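One way to picture the connection check of step 7 is shown below. The sketch assumes each office-point database is represented by a zero-argument callable that runs a trivial query, and it models the E-Log as a list of dictionaries; both are hypothetical, since the patent does not specify the database driver or the log format.

```python
import time

def check_sites(sites, timeout_s=5.0):
    """Probe each site; return (healthy_site_names, e_log_entries).

    sites maps a site name to a zero-argument callable that raises on
    connection failure. The elapsed time is measured after the call returns,
    so a real deployment would also set a timeout in the database driver.
    """
    healthy, e_log = [], []
    for name, probe in sites.items():
        start = time.monotonic()
        try:
            probe()
        except Exception as exc:
            e_log.append({"site": name, "problem": f"connection failed: {exc}"})
            continue
        if time.monotonic() - start > timeout_s:
            e_log.append({"site": name, "problem": "query timeout"})
        else:
            healthy.append(name)
    return healthy, e_log
```

Only the sites returned in `healthy` would then be queried in step 8, and the `e_log` entries are what step 9 later traverses to mark problematic databases.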
Step 8: The control center obtains the fingerprint information of the data from the packet-sending log sent by the testing server, i.e. the fingerprint six-tuple (source IP, destination IP, source port, destination port, protocol number, rule ID), then queries the data of multiple office points according to the fingerprint information and generates the data acquisition .ok file;
Step 9: The control center server obtains the data acquisition log R-Log of the task, looks up the corresponding packet-sending log S-Log, traverses the problem log E-Log to determine the problematic office-point databases and their problems, and marks them in the database of the control center;
Step 10: Traverse the packet-sending log S-Log and, for normal office-point databases, compare it with the data acquisition log R-Log by fingerprint information. If the testing data are flow data, calculate the average sampling ratio of the routers that the flow passes through, derive the stored quantity, and calculate the storage rate of the task; if they are DNS data, compare directly by fingerprint and calculate the storage rate of the task;
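The comparison in step 10 can be sketched with fingerprints held as six-tuples. The patent does not give the exact storage rate formula, so the definitions here are assumptions: for DNS data, the rate is the fraction of sent fingerprints found in the R-Log; for flow data, the number of matches is divided by the count expected under the average sampling ratio and capped at 1.

```python
def dns_storage_rate(s_log, r_log):
    """Fraction of sent fingerprints that appear in the acquisition log."""
    sent = set(s_log)
    if not sent:
        return 0.0
    return len(sent & set(r_log)) / len(sent)

def flow_storage_rate(s_log, r_log, sampling_ratios):
    """Matches divided by the count expected under the average sampling ratio.

    sampling_ratios holds one ratio per traversed router, e.g. 1/1000.
    Dividing by the expected count is an assumed definition, not the patent's.
    """
    sent = set(s_log)
    if not sent or not sampling_ratios:
        return 0.0
    average_ratio = sum(sampling_ratios) / len(sampling_ratios)
    expected = len(sent) * average_ratio
    stored = len(sent & set(r_log))
    return min(1.0, stored / expected)

if __name__ == "__main__":
    # Fingerprint: (src IP, dst IP, src port, dst port, protocol number, rule ID)
    sent = [("198.51.100.5", "203.0.113.10", 40000 + i, 53, 17, 1) for i in range(4)]
    print(dns_storage_rate(sent, sent[:3]))
```

Using sets makes the comparison insensitive to record order and to duplicate rows in either log, which seems a reasonable reading of "compare by fingerprint".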
Step 11: Compare the calculated storage rate with the preset threshold; if it is below the threshold, a prompt is shown on the WEB interface together with the corresponding problem description. In the embodiment of the invention, the storage rate thresholds in this step are all set to 65%.
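The threshold comparison of step 11 reduces to a simple filter. In this sketch the task names and the per-task override dictionary are hypothetical; only the 65% default comes from the embodiment.

```python
def tasks_below_threshold(rates, thresholds=None, default=0.65):
    """Return the sorted names of tasks whose storage rate is below threshold.

    rates maps task name -> storage rate in [0, 1]; thresholds optionally
    overrides the default threshold for individual tasks.
    """
    thresholds = thresholds or {}
    return sorted(
        task for task, rate in rates.items()
        if rate < thresholds.get(task, default)
    )

if __name__ == "__main__":
    print(tasks_below_threshold({"flow-task": 0.90, "dns-task": 0.50}))
```

A rate exactly equal to the threshold does not trigger an alert here, following the wording "less than the threshold".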
Compared with the prior art, the method of the invention calculates the storage rate through flow monitoring of routers and thereby locates problems in the cloud service link. Through the DNS data monitoring and analysis method forwarded by a man-in-the-middle, it performs identity verification, provides secure DNS monitoring, and verifies the resolution function of the DNS servers. The asynchronous, large-scale, time-shared monitoring method provided by the invention reduces packet loss through time-shared testing, achieves load balancing, has little impact on the system under test, and offers high robustness and stability.

Claims (5)

1. A service monitoring system for cloud service infrastructure, characterized by comprising a control center server and testing servers deployed in various regions; wherein a cloud service testing module is deployed on each testing server, and a testing mission dispatching module, a data acquisition module, a testing data analysis module, a testing alarm module, and a database are deployed on the control center server;
The user configures monitoring tasks through the testing mission dispatching module, which issues the configuration file to the corresponding testing servers; the configuration file describes the user-configured monitoring tasks with multiple source IPs, multiple destination IPs, and multiple protocols; the testing mission dispatching module verifies the destination IPs in the configuration file and does not issue tasks that fail verification;
After receiving the configuration file, the cloud service testing module traverses the tasks in the configuration file and verifies the correctness of each task's destination IP; for tasks that pass verification, it constructs legal data packets and performs data testing in the asynchronous time-shared testing mode; the cloud service testing module handles two kinds of testing data: one is data packets carrying domain names in a specified format, sent to the DNS recursive servers that provide service in the cloud service; the other is data packets sent to the destination IP, the number of which is set according to the obtained list of routers that the cloud service traffic passes through and the sampling ratios of those routers; after sending data packets to the destination IP and to the DNS recursive servers that provide service, the cloud service testing module records the packet-sending log S-Log and sends it to the testing data analysis module of the control center server;
The data acquisition module obtains data fingerprint information from the packet-sending log S-Log and traverses the office-point databases; before querying data, it first verifies the connection to each office-point database; if the connection fails or the query times out, the problem is recorded in the problem log E-Log; if the connection succeeds, the stored data of the office-point database are queried; after all office-point databases have been traversed, the data acquisition file and the data acquisition log R-Log are generated;
The testing data analysis module obtains, for a given task, the data acquisition log R-Log, the problem log E-Log, and the packet-sending log S-Log; it first traverses the problem log E-Log, marks the problematic office-point databases in the database, and records the corresponding problems; it then traverses the packet-sending log S-Log and compares it with the R-Log; if the testing data are flow data sent to a destination IP, it calculates, according to the data fingerprint information, the average sampling ratio of the routers traversed and then the storage rate of the task; if the testing data are data sent to a DNS recursive server, it compares records by data fingerprint information and calculates the storage rate of the task;
The testing alarm module has preset thresholds for both the flow monitoring of routers and the monitoring of the DNS recursive servers that provide service; it compares the storage rate of each task calculated by the testing data analysis module with the threshold and raises an alarm for tasks whose storage rate is below the threshold.
2. The service monitoring system for cloud service infrastructure according to claim 1, characterized in that the control center server is a service cluster, and each module deployed on the control center server is implemented by a separate server.
3. The service monitoring system for cloud service infrastructure according to claim 1, characterized in that the testing mission dispatching module, for the destination IP of a monitoring task, queries the IP library to check whether the information of the IP (country, province, city, operator) is correct, and verification passes if it is correct.
4. The service monitoring system for cloud service infrastructure according to claim 1, characterized in that the data fingerprint information is expressed as a six-tuple (source IP, destination IP, source port, destination port, protocol number, rule ID).
5. A service monitoring method for cloud service infrastructure, characterized by comprising the following steps:
Step 1: Deploy testing servers in different regions and set them to use the asynchronous time-shared testing mode; the asynchronous time-shared testing mode means that testing data are sent asynchronously and a set number of data packets is sent at each set time interval;
Step 2: The user configures a monitoring task on the WEB interface of the control center server; the testing mission dispatching module verifies the correctness of the task's destination IPs and, if they are correct, generates the configuration file of the monitoring task;
Step 3: The testing mission dispatching module issues the configuration file of the monitoring task to each testing server; the testing server verifies the correctness of the destination IPs in the configuration file; if a destination IP is incorrect, it is not tested, and the error is fed back to the control center server;
Step 4: Obtain the list of routers that the cloud service traffic passes through and the sampling ratio of each router; according to the sampling ratio of the routers and the set probability condition for a flow log being sampled, set the testing data packets that the testing server sends to the destination IP; after each round of sending testing data packets, the testing server generates the packet-sending log S-Log and sends it to the control center server;
If the sampling ratio of the router is 1/X, the probability that a flow log is sampled is required to exceed G%, and the number of testing data packets is Y, then the following relationship holds: when Y data packets are sent, the probability that a flow log is sampled is $1 - (1 - 1/X)^{Y}$, which is required to exceed G%;
Step 5: The testing server obtains the list of DNS recursive servers that provide service in the cloud service, sends data packets carrying domain names in a specified format to the recursive DNS servers, and captures the packets to generate PCAP files;
Step 6: The testing server sends the PCAP files to the intermediary (man-in-the-middle) machine, which verifies the data packets to ensure security; the testing server sends the data packets out through the intermediary machine, generates the packet-sending log S-Log, and sends it to the control center server;
Step 7: The control center server verifies the connection status of the office-point databases; if the connection fails or a query times out, the problem is recorded in the problem log E-Log;
Step 8: The control center server obtains fingerprint information from the packet-sending log sent by the testing server, queries the data of multiple office points according to the fingerprint information, and generates the data acquisition file; the fingerprint information is (source IP, destination IP, source port, destination port, protocol number, rule ID);
Step 9: For a given task, the control center server obtains the data acquisition log R-Log and the problem log E-Log and looks up the corresponding packet-sending log S-Log; by traversing the problem log E-Log, it identifies the problematic office-point databases and their problems and marks them in the database of the control center;
Step 10: The control center server traverses the packet-sending log S-Log and, for normal office-point databases, compares the S-Log with the R-Log; if the testing data are flow data sent to a destination IP, it calculates, according to the data fingerprint information, the average sampling ratio of the routers traversed and then the storage rate of the task; if the testing data are data sent to a DNS recursive server, it compares records by data fingerprint information and calculates the storage rate of the task;
Step 11: Compare the calculated storage rate with the corresponding preset storage rate threshold; if it is below the threshold, a prompt is shown on the WEB interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810585690.XA CN108683569B (en) 2018-06-06 2018-06-06 Service monitoring method and system for cloud service infrastructure

Publications (2)

Publication Number Publication Date
CN108683569A true CN108683569A (en) 2018-10-19
CN108683569B CN108683569B (en) 2020-06-09

Family

ID=63810284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810585690.XA Active CN108683569B (en) 2018-06-06 2018-06-06 Service monitoring method and system for cloud service infrastructure

Country Status (1)

Country Link
CN (1) CN108683569B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727389A (en) * 2009-11-23 2010-06-09 中兴通讯股份有限公司 Automatic test system and method of distributed integrated service
CN201601833U (en) * 2009-12-28 2010-10-06 福建邮科通信技术有限公司 Automatic detecting system for wireless network
CN102546269A (en) * 2010-12-07 2012-07-04 中国移动通信集团广东有限公司 Method and system capable of fast monitoring internet protocol (IP) network
KR20140039686A (en) * 2012-09-25 2014-04-02 에스케이텔레콤 주식회사 Apparatus and method for providing quality analysis of data service
CN104753735A (en) * 2013-12-31 2015-07-01 中国移动通信集团上海有限公司 Dialing testing system and method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109921925A (en) * 2019-02-15 2019-06-21 北京奇艺世纪科技有限公司 A kind of dial testing method and device
CN112463572A (en) * 2019-09-06 2021-03-09 福建天泉教育科技有限公司 Cross-border multi-service dial testing software testing system and method thereof
CN112463572B (en) * 2019-09-06 2023-09-15 福建天泉教育科技有限公司 Cross-border multi-service dial testing software testing system and method thereof
CN110519303A (en) * 2019-09-30 2019-11-29 北京市天元网络技术股份有限公司 Communication means and system across xegregating unit
CN112100133A (en) * 2020-11-04 2020-12-18 广州市玄武无线科技股份有限公司 Distributed log processing system
CN112866053A (en) * 2020-12-31 2021-05-28 天翼物联科技有限公司 Internet of things testing method, system and device and storage medium
CN113572644A (en) * 2021-07-26 2021-10-29 武汉众邦银行股份有限公司 Internet cloud dial-up test automatic monitoring method and device
CN113572644B (en) * 2021-07-26 2024-01-23 武汉众邦银行股份有限公司 Internet cloud dial testing automatic monitoring method and device

Also Published As

Publication number Publication date
CN108683569B (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN108683569A (en) A kind of the business monitoring method and system of cloud service-oriented infrastructure
Dhamdhere et al. Inferring persistent interdomain congestion
US8443074B2 (en) Constructing an inference graph for a network
Cinque et al. Microservices monitoring with event logs and black box execution tracing
US7076547B1 (en) System and method for network performance and server application performance monitoring and for deriving exhaustive performance metrics
US9210050B2 (en) System and method for a testing vector and associated performance map
Carneiro et al. Flowmonitor: a network monitoring framework for the network simulator 3 (ns-3)
US8578017B2 (en) Automatic correlation of service level agreement and operating level agreement
US9712415B2 (en) Method, apparatus and communication network for root cause analysis
CN107113203B (en) Apparatus, system and method for debugging network connectivity
US9172593B2 (en) System and method for identifying problems on a network
US20030005145A1 (en) Network service assurance with comparison of flow activity captured outside of a service network with flow activity captured in or at an interface of a service network
CN108900640A (en) Node calls link generation method, device, computer equipment and storage medium
US20140280904A1 (en) Session initiation protocol testing control
CN101656642B (en) Method, device and system for testing authentication performance of network access equipment
Khanna et al. Automated online monitoring of distributed applications through external monitors
Cinque et al. An exploratory study on zeroconf monitoring of microservices systems
CN114389792B (en) WEB log NAT (network Address translation) front-back association method and system
Viipuri Traffic analysis and modeling of IP core networks
Madariaga et al. PePa ping dataset: Comprehensive contextualization of periodic passive ping in wireless networks
Bocchi et al. Statistical network monitoring: Methodology and application to carrier-grade NAT
Toll et al. IoTreeplay: Synchronous Distributed Traffic Replay in IoT Environments
Flach et al. Diagnosing slow web page access at the client side
US11899568B2 (en) Enriched application outage insights
Putra Cloud-based Distributed Internet Measurement Platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant