CN112769622A - Cluster service fault early warning system based on RPC service monitoring - Google Patents


Info

Publication number
CN112769622A
Authority
CN
China
Prior art keywords
early warning
data
fault early
monitoring
cluster
Prior art date
2021-01-18
Legal status
Pending
Application number
CN202110060005.3A
Other languages
Chinese (zh)
Inventor
孙冬英
Current Assignee
Individual
Original Assignee
Individual
Priority date
2021-01-18
Filing date
2021-01-18
Publication date
2021-05-07
Application filed by Individual
Priority to CN202110060005.3A
Publication of CN112769622A
Current legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/30 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
    • H04L9/3066 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of cluster service fault early warning and discloses a cluster service fault early warning system based on RPC service monitoring. The system comprises a cloud computing server CCScsfa and a computer terminal PCTcsfa. The cloud computing server CCScsfa runs the server-side software of the cluster service fault early warning system and is deployed in a remote cloud. The computer terminal PCTcsfa runs the client-side software of the system, is used to perform operation and maintenance management tasks on the cluster servers, and is communicatively connected to the cloud computing server CCScsfa through network communication equipment. The cluster service fault early warning system further comprises data acquisition agent nodes CNi, a gateway server, and a data computing center; the data computing center is communicatively connected to the gateway server, and the gateway server is communicatively connected to the data acquisition agent nodes CNi. The invention addresses the technical problem of how to monitor a cluster system and issue early warnings by monitoring its remote procedure call (RPC) services.

Description

Cluster service fault early warning system based on RPC service monitoring
Technical Field
The invention relates to the technical field of cluster service fault early warning, in particular to a cluster service fault early warning system based on RPC service monitoring.
Background
With the continuous development of computer manufacturing and network communication technology, cluster systems have gradually replaced traditional mainframes and supercomputers owing to advantages such as good scalability and high cost-performance, and they are now widely used in many industries. However, a cluster system is usually built from commodity computer nodes and cannot guarantee stable, reliable operation: node failures, network problems, and even sudden surges in access volume can all interrupt cluster service. Cluster monitoring has therefore become a key technology for ensuring robust cluster service.
The nodes of a cluster system usually communicate through remote service calls: each node accomplishes its functions by calling the remote service interfaces of other nodes and, in turn, provides services externally through its own remote call interfaces. A remote procedure call (RPC) is an inter-process communication method that allows a process to call a remote service interface over a network to perform a function. Because RPC communication crosses hosts, however, its probability of failure rises sharply due to network faults and failures of the peer. Tracking metrics such as the success rate and latency of RPC services reveals the current state of cluster service and helps guarantee its quality. Monitoring the remote procedure calls of a cluster system, and issuing early warnings based on that monitoring, is therefore an effective way to ensure the service quality of the cluster system.
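The success-rate and latency metrics mentioned above are per-call measurements. A minimal Python sketch of how an RPC framework might capture them follows; the `CallRecord` schema and the `instrumented` decorator are hypothetical names, since the patent does not specify how the framework records calls.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class CallRecord:
    """One monitoring sample for a single RPC invocation (hypothetical schema)."""
    interface: str      # name of the RPC interface that was called
    success: bool       # True if the call returned without raising
    latency_ms: float   # duration of the call in milliseconds
    timestamp: float    # Unix time at which the call started


def instrumented(interface: str, sink: List[CallRecord]) -> Callable:
    """Hypothetical decorator: appends a CallRecord to `sink` for every call, successful or not."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def inner(*args: Any, **kwargs: Any) -> Any:
            started_at = time.time()
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on both success and exception, so failed calls are recorded too.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                sink.append(CallRecord(interface, ok, elapsed_ms, started_at))
        return inner
    return wrap


# Usage: wrap a local stand-in for an RPC handler; make one good and one bad call.
records: List[CallRecord] = []


@instrumented("UserService.get", records)
def get_user(uid: int) -> dict:
    if uid < 0:
        raise ValueError("invalid uid")
    return {"id": uid}


get_user(1)
try:
    get_user(-1)
except ValueError:
    pass
# records now holds one successful and one failed sample
```

Recording in a `finally` clause is one simple way to ensure failures are counted; without that, the success rate would be biased upward.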
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a cluster service fault early warning system based on RPC service monitoring, aimed at solving the technical problem of how to monitor a cluster system and issue early warnings by monitoring its remote procedure call services.
(II) Technical solution
To achieve this purpose, the invention provides the following technical solution:
a cluster service fault early warning system based on RPC service monitoring comprises a cloud computing server CCScsfa and a computer terminal PCTcsfa: the cloud computing server CCScsfa runs the server-side software of the cluster service fault early warning system and is deployed in a remote cloud; the computer terminal PCTcsfa runs the client-side software of the system and is used to perform cluster server operation and maintenance management tasks; and the computer terminal PCTcsfa is communicatively connected to the cloud computing server CCScsfa through network communication equipment;
the cluster service fault early warning system comprises a data acquisition agent node CNi, a gateway server and a data computing center, wherein the data computing center is in communication connection with the gateway server, and the gateway server is in communication connection with the acquisition agent node CNi.
Furthermore, a data acquisition agent node CNi is deployed on each monitored node; it collects, through inter-process communication, the monitoring data reported by the RPC framework of the monitored process and actively sends the data to the gateway server.
Furthermore, the data computing center performs real-time computation and real-time analysis of large-scale monitoring data streams and mainly comprises a data cleaning module, a data statistics module, a result analysis and alarm module, and a data storage module.
Furthermore, the result analysis and alarm module analyzes the statistical results of the monitoring data stream against threshold judgment rules and determines whether an alarm needs to be sent to the operation and maintenance platform.
(III) Beneficial technical effects
Compared with the prior art, the invention has the following beneficial technical effects:
a data acquisition agent node CNi is deployed on each monitored node and collects, through inter-process communication, the monitoring data reported by the RPC framework of the monitored process. The data are actively sent through the gateway server to the data computing center, which performs real-time computation and real-time analysis of large-scale monitoring data streams, analyzes the statistical results of those streams against threshold judgment rules, and determines whether an alarm needs to be sent to the operation and maintenance platform. The invention thereby achieves monitoring and early warning of a cluster system by monitoring its remote procedure call services.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A cluster service fault early warning system based on RPC service monitoring comprises a cloud computing server CCScsfa and a computer terminal PCTcsfa: the cloud computing server CCScsfa runs the server-side software of the cluster service fault early warning system and is deployed in a remote cloud; the computer terminal PCTcsfa runs the client-side software of the system and is used to perform cluster server operation and maintenance management tasks; and the computer terminal PCTcsfa is communicatively connected to the cloud computing server CCScsfa through network communication equipment;
the cluster service fault early warning system comprises a data acquisition agent node CNi, a gateway server and a data computing center, wherein the data computing center is in communication connection with the gateway server, and the gateway server is in communication connection with the acquisition agent node CNi;
the data acquisition agent node CNi is deployed on each monitored node; it collects, through inter-process communication, the monitoring data reported by the RPC framework of the monitored process and actively sends the data to the gateway server;
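A minimal sketch of the agent's collect-and-push behavior, assuming samples arrive one at a time from the monitored process and are pushed to the gateway in batches. `CollectionAgent`, `batch_size`, and the `send` callable (a stand-in for the network request to the gateway server) are all hypothetical; the inter-process communication channel itself is abstracted away.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # one monitoring sample, e.g. {"interface": "A", "success": True, "latency_ms": 3.0}


class CollectionAgent:
    """Hypothetical data acquisition agent CNi: buffers samples and pushes them in batches."""

    def __init__(self, node_id: str, send: Callable[[Dict[str, Any]], None], batch_size: int = 100) -> None:
        self.node_id = node_id
        self.send = send              # stand-in for the network request to the gateway server
        self.batch_size = batch_size
        self.buffer: List[Record] = []

    def report(self, record: Record) -> None:
        """Receive one sample from the monitored process (the IPC channel is abstracted away)."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Actively push all buffered samples to the gateway as a single report."""
        if not self.buffer:
            return
        self.send({"node": self.node_id, "records": self.buffer})
        self.buffer = []              # start a fresh list; the one already sent is left untouched


# Usage: a list's append method stands in for the gateway.
sent: List[Dict[str, Any]] = []
agent = CollectionAgent("CN1", sent.append, batch_size=2)
agent.report({"interface": "A", "success": True, "latency_ms": 3.0})
agent.report({"interface": "A", "success": False, "latency_ms": 9.0})  # buffer full: pushed
agent.report({"interface": "B", "success": True, "latency_ms": 1.0})
agent.flush()  # e.g. called periodically to push a partially filled buffer
```

Batching trades a little reporting delay for fewer requests to the gateway; a periodic `flush` bounds that delay when traffic is light.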
the gateway server handles the data reporting requests of the data acquisition agent nodes CNi and aggregates the monitoring data;
furthermore, the data computing center performs real-time computation and real-time analysis of large-scale monitoring data streams and mainly comprises a data cleaning module, a data statistics module, a result analysis and alarm module, and a data storage module;
the data cleaning module acquires the raw monitoring data stream from the gateway server and checks it for validity and timeliness;
the data statistics module computes statistics over the monitoring data stream, including system performance statistics for each machine and service performance statistics for each RPC interface;
the result analysis and alarm module analyzes the statistical results of the monitoring data stream against threshold judgment rules and determines whether an alarm needs to be sent to the operation and maintenance platform;
the data storage module stores the statistics and analysis results of the monitoring data in a database;
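The gateway's two duties, handling reporting requests and aggregating the data, can be sketched as follows. The report format (a node ID plus a list of records) and the `Gateway` class are assumptions for illustration; a real gateway would sit behind a network endpoint rather than a method call.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class Gateway:
    """Hypothetical gateway server: handles agent reports and aggregates them into one stream."""

    def __init__(self) -> None:
        self.stream: Deque[Dict[str, Any]] = deque()
        self.reports_handled = 0

    def handle_report(self, report: Dict[str, Any]) -> None:
        """Process one reporting request: tag each record with its source node and enqueue it."""
        node = report["node"]
        for record in report["records"]:
            self.stream.append({**record, "node": node})
        self.reports_handled += 1

    def drain(self) -> List[Dict[str, Any]]:
        """Hand the aggregated raw stream to the data computing center and clear it."""
        out = list(self.stream)
        self.stream.clear()
        return out


gw = Gateway()
gw.handle_report({"node": "CN1", "records": [{"interface": "A", "latency_ms": 3.0}]})
gw.handle_report({"node": "CN2", "records": [{"interface": "A", "latency_ms": 5.0},
                                             {"interface": "B", "latency_ms": 1.0}]})
batch = gw.drain()
```

Tagging each record with its node ID at the gateway lets later stages group by machine as well as by RPC interface.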
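The validity and timeliness checks are not specified in detail. The sketch below assumes one plausible reading: a record is valid if it has the expected fields with sensible types and a non-negative latency, and timely if its timestamp is no older than a maximum age (60 s here) and no more than a small clock skew (5 s) in the future. The field names and both limits are hypothetical.

```python
import time
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Expected fields and types of a raw record (hypothetical schema).
REQUIRED = {"node": str, "interface": str, "success": bool,
            "latency_ms": (int, float), "timestamp": (int, float)}


def is_valid(r: Record) -> bool:
    """Validity: every required field is present with the right type, and latency is non-negative."""
    for key, typ in REQUIRED.items():
        if key not in r or not isinstance(r[key], typ):
            return False
    if isinstance(r["latency_ms"], bool):  # bool is a subclass of int; reject it explicitly
        return False
    return r["latency_ms"] >= 0


def is_timely(r: Record, now: float, max_age_s: float = 60.0, max_skew_s: float = 5.0) -> bool:
    """Timeliness: not older than max_age_s and not more than max_skew_s in the future."""
    age = now - r["timestamp"]
    return -max_skew_s <= age <= max_age_s


def clean(records: List[Record], now: Optional[float] = None) -> List[Record]:
    """Keep records passing both checks; validity runs first so is_timely always sees a timestamp."""
    now = time.time() if now is None else now
    return [r for r in records if is_valid(r) and is_timely(r, now)]


# Usage with a fixed "now" so the result is reproducible.
now = 1_000_000.0
raw = [
    {"node": "CN1", "interface": "A", "success": True, "latency_ms": 3.0, "timestamp": now - 1},     # kept
    {"node": "CN1", "interface": "A", "success": True, "latency_ms": -2.0, "timestamp": now - 1},    # negative latency
    {"node": "CN2", "interface": "B", "success": False, "latency_ms": 7.0, "timestamp": now - 600},  # stale
    {"node": "CN2", "interface": "B", "latency_ms": 7.0, "timestamp": now - 1},                      # missing field
]
cleaned = clean(raw, now=now)
```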
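For the per-interface service performance statistics, the sketch below computes call count, success rate, mean latency, and 99th-percentile latency (nearest-rank method) from cleaned records. The choice of metrics and the percentile method are assumptions; per-machine system statistics (CPU, memory, and so on) are omitted.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List


def percentile(sorted_vals: List[float], p: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100.0 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def interface_stats(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Group cleaned records by RPC interface and compute (assumed) service performance metrics."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for r in records:
        groups[r["interface"]].append(r)
    stats: Dict[str, Dict[str, float]] = {}
    for iface, rs in groups.items():
        latencies = sorted(r["latency_ms"] for r in rs)
        successes = sum(1 for r in rs if r["success"])
        stats[iface] = {
            "calls": len(rs),
            "success_rate": successes / len(rs),
            "mean_latency_ms": sum(latencies) / len(latencies),
            "p99_latency_ms": percentile(latencies, 99),
        }
    return stats


records = [
    {"interface": "A", "success": True, "latency_ms": 10.0},
    {"interface": "A", "success": True, "latency_ms": 20.0},
    {"interface": "A", "success": True, "latency_ms": 30.0},
    {"interface": "A", "success": False, "latency_ms": 40.0},
    {"interface": "B", "success": True, "latency_ms": 5.0},
]
stats = interface_stats(records)
```

For interface `A`, with latencies of 10, 20, 30, and 40 ms and one failure, this yields a success rate of 0.75, a mean of 25.0 ms, and a p99 of 40.0 ms.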
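The text states only that alarms are decided by threshold judgment rules. The minimal sketch below assumes that each rule compares one statistic against a fixed threshold. The rule set, the thresholds (95% success rate, 500 ms p99 latency), and the message format are hypothetical, and forwarding alarms to the operation and maintenance platform is left out.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    """One threshold judgment rule (hypothetical form): alarm when op(value, threshold) is true."""
    metric: str
    op: Callable[[float, float], bool]
    threshold: float
    description: str


# Example rules; metrics and thresholds are illustrative assumptions.
RULES = [
    ThresholdRule("success_rate", operator.lt, 0.95, "success rate below 95%"),
    ThresholdRule("p99_latency_ms", operator.gt, 500.0, "p99 latency above 500 ms"),
]


def evaluate(stats: Dict[str, Dict[str, float]], rules: List[ThresholdRule] = RULES) -> List[str]:
    """Return one alarm message per (interface, rule) pair whose threshold is crossed."""
    alarms: List[str] = []
    for iface in sorted(stats):
        for rule in rules:
            value = stats[iface].get(rule.metric)
            if value is not None and rule.op(value, rule.threshold):
                alarms.append(f"{iface}: {rule.description} (observed {value})")
    return alarms


stats = {
    "A": {"success_rate": 0.75, "p99_latency_ms": 40.0},
    "B": {"success_rate": 1.0, "p99_latency_ms": 900.0},
    "C": {"success_rate": 0.99, "p99_latency_ms": 12.0},
}
alarms = evaluate(stats)
# A non-empty list means an alarm should be sent to the operation and maintenance platform.
```

Here `A` trips the success-rate rule and `B` trips the latency rule, so two alarms are produced; `C` stays within both thresholds.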
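A minimal sketch of the storage step using Python's built-in `sqlite3` module with an in-memory database. The table layout and the `store` function are assumptions for illustration; the patent does not name a database or schema.

```python
import sqlite3
from typing import Dict, List


def store(conn: sqlite3.Connection, window_end: float,
          stats: Dict[str, Dict[str, float]], alarms: List[str]) -> None:
    """Persist one statistics window and its alarms (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interface_stats ("
        "window_end REAL, interface TEXT, calls INTEGER, success_rate REAL, p99_latency_ms REAL)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS alarms (window_end REAL, message TEXT)")
    conn.executemany(
        "INSERT INTO interface_stats VALUES (?, ?, ?, ?, ?)",
        [(window_end, iface, s["calls"], s["success_rate"], s["p99_latency_ms"])
         for iface, s in stats.items()],
    )
    conn.executemany("INSERT INTO alarms VALUES (?, ?)", [(window_end, m) for m in alarms])
    conn.commit()


conn = sqlite3.connect(":memory:")
store(conn, 1_000_060.0,
      {"A": {"calls": 4, "success_rate": 0.75, "p99_latency_ms": 40.0}},
      ["A: success rate below 95% (observed 0.75)"])
rows = conn.execute("SELECT interface, calls, success_rate FROM interface_stats").fetchall()
```

Keying both tables by the end of the statistics window makes it straightforward to line alarms up with the statistics that triggered them.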
furthermore, the server software of a communication authority authentication system is installed and run on the operating system of the computer terminal PCTcsfa;
to prevent an illegitimate network node impersonating the cloud computing server CCScsfa from sending false cluster service fault early warning information to the computer terminal PCTcsfa through the cluster service fault early warning system, the communication authority authentication system authenticates the identity of the cloud computing server CCScsfa before the computer terminal PCTcsfa accepts any early warning information sent by it. The authentication method comprises the following steps:
Step 1: the cloud computing server CCScsfa registers its communication authority with the communication authority authentication system, specifically:
the cloud computing server CCScsfa randomly selects a private key $x$, computes the public key $y = xP$, and publishes $y$ to the communication authority authentication system, where $P$ is a generator of an elliptic curve $E$ defined over a finite field $F$;
Step 2: when the cloud computing server CCScsfa sends cluster service fault early warning information to the computer terminal PCTcsfa, the communication authority authentication system authenticates the identity of the cloud computing server CCScsfa, specifically:
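As a worked illustration of $y = xP$, take a toy curve chosen only so the arithmetic fits on a line (an assumption: the patent does not name a curve, and a curve this small offers no security): $E: Y^2 = X^3 + 2X + 2$ over $\mathbb{F}_{17}$, with generator $P = (5, 1)$ of prime order 19. Capital $X, Y$ denote point coordinates to keep them distinct from the private key $x$ and public key $y$. With private key $x = 3$, one doubling and one addition give the public key:

```latex
\begin{aligned}
2P &:\ \lambda = \frac{3\cdot 5^2 + 2}{2\cdot 1} \equiv 9\cdot 9 \equiv 13 \pmod{17},\quad
X = 13^2 - 2\cdot 5 \equiv 6,\quad Y = 13\,(5 - 6) - 1 \equiv 3
\;\Rightarrow\; 2P = (6, 3)\\
y = 3P = 2P + P &:\ \lambda = \frac{1 - 3}{5 - 6} = 2,\quad
X = 2^2 - 6 - 5 \equiv 10,\quad Y = 2\,(6 - 10) - 3 \equiv 6
\;\Rightarrow\; y = (10, 6)
\end{aligned}
```

Recovering $x$ from $y$ is the elliptic-curve discrete logarithm problem, which is trivial for this toy group but considered infeasible for standard curves with a group order of roughly 256 bits.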
the cloud computing server CCScsfa randomly selects an integer $N$, computes $M = NP$, and sends $M$ to the communication authority authentication system;
the communication authority authentication system randomly selects a challenge bit $l \in \{0, 1\}$ and sends $l$ to the cloud computing server CCScsfa;
the cloud computing server CCScsfa computes $N + lx$ and sends it to the communication authority authentication system;
the communication authority authentication system verifies whether the equation $(N + lx)P = M + ly$ holds, which it does for an honest server because $(N + lx)P = NP + l(xP) = M + ly$;
if the equation holds, the cloud computing server CCScsfa has proved that it knows the communication private key $x$ and therefore holds legitimate communication authority, and the computer terminal PCTcsfa accepts the cluster service fault early warning information sent by the cloud computing server CCScsfa;
throughout the authentication process, the private key $x$ takes part only in local computation and is never transmitted, so an illegitimate eavesdropper cannot capture it on the line, and the identity authentication of the cloud computing server CCScsfa is zero-knowledge;
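The two authentication steps can be simulated end to end. The sketch below uses a toy curve ($Y^2 = X^3 + 2X + 2$ over $\mathbb{F}_{17}$, generator $P = (5, 1)$ of prime order $n = 19$), chosen only so the arithmetic stays readable; it offers no security. Two further points matter. First, the response $N + lx$ is reduced modulo $n$, which leaves $(N + lx)P$ unchanged. Second, because the challenge $l$ is a single bit, an impostor who guesses $l$ in advance can pass one round with probability 1/2. The sketch therefore repeats the round `rounds` times, which is an assumption (the text describes a single round) and lowers an impostor's success probability to $2^{-\text{rounds}}$. `add`, `mul`, and `authenticate` are hypothetical helper names. The code requires Python 3.8+ for `pow(v, -1, p)`.

```python
import secrets
from typing import Optional, Tuple

# Toy parameters for illustration only (no security): Y^2 = X^3 + 2X + 2 over F_17,
# generator P = (5, 1) of prime order n = 19. A real system would use a standard curve.
Point = Optional[Tuple[int, int]]  # None represents the point at infinity O
p, a, n = 17, 2, 19
P: Point = (5, 1)


def add(A: Point, B: Point) -> Point:
    """Elliptic-curve point addition in affine coordinates."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                               # A + (-A) = O
    if A == B:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p    # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p           # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def mul(k: int, A: Point) -> Point:
    """Scalar multiplication kA by double-and-add."""
    R: Point = None
    while k > 0:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R


def authenticate(x: int, y: Point, rounds: int = 20) -> bool:
    """Prover (CCScsfa) uses private key x; verifier knows only the public key y = xP."""
    for _ in range(rounds):
        N = secrets.randbelow(n - 1) + 1       # prover: random nonce N in [1, n-1]
        M = mul(N, P)                           # prover -> verifier: M = NP
        l = secrets.randbelow(2)                # verifier -> prover: challenge bit l
        s = (N + l * x) % n                     # prover -> verifier: N + l*x (mod n)
        if mul(s, P) != add(M, mul(l, y)):      # verifier: does (N + l*x)P = M + l*y hold?
            return False
    return True


x = 7                 # private key chosen at registration (Step 1), kept by CCScsfa
y = mul(x, P)         # public key y = xP, published (Step 1)
print(authenticate(x, y))              # an honest server always passes: True
print(authenticate((x + 1) % n, y))    # False unless all 20 challenges happen to be 0
```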
wherein a remote procedure call (RPC) is an inter-process communication protocol that allows an application process to request a service from an application process on a remote computer over the network without needing to know the details of the underlying network. RPC uses a client/server architecture: the process that actively requests a service acts as the client, and the process that provides the service acts as the server. The client requests the server's RPC service through a local RPC interface. The basic communication flow of the RPC protocol is as follows: the client passes the requested RPC service interface method and its parameters to the local RPC interface; the local interface packages the request into an RPC message, which is transmitted to the server over the network; the server parses the RPC message, converts it into a call request, executes the corresponding service interface, and returns the execution result to the client; the RPC call ends when the client receives the result.
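The flow just described (package the call into a message, transmit it, parse and execute it on the server, return the result) can be illustrated with a minimal in-process sketch. The JSON message layout (loosely modeled on JSON-RPC), the `RpcServer` and `RpcClient` classes, and the direct function call standing in for the network are all assumptions for illustration, not the protocol of any particular RPC framework.

```python
import json
from typing import Any, Callable, Dict


class RpcServer:
    """Server side: parses RPC messages and dispatches them to registered service interfaces."""

    def __init__(self) -> None:
        self.methods: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.methods[name] = fn

    def handle(self, raw: bytes) -> bytes:
        request = json.loads(raw)                      # parse the RPC message
        try:
            fn = self.methods[request["method"]]       # convert it into a call request
            body = {"id": request["id"], "result": fn(*request["params"])}
        except Exception as exc:                        # report failures back to the caller
            body = {"id": request["id"], "error": str(exc)}
        return json.dumps(body).encode()               # return the execution result


class RpcClient:
    """Client side: the local RPC interface that packages calls into messages (hypothetical layout)."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self.transport = transport                     # stand-in for the network
        self.next_id = 0

    def call(self, method: str, *params: Any) -> Any:
        self.next_id += 1
        raw = json.dumps({"id": self.next_id, "method": method, "params": list(params)}).encode()
        reply = json.loads(self.transport(raw))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]                         # the RPC call ends when the result arrives


server = RpcServer()
server.register("add", lambda a, b: a + b)
client = RpcClient(server.handle)   # in-process "network" for illustration
print(client.call("add", 2, 3))     # prints 5
```

Returning errors as messages rather than letting them escape the server mirrors the point made in the Background: cross-host calls fail more often, so the caller needs an explicit failure signal, which is also what the monitoring pipeline counts.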
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A cluster service fault early warning system based on RPC service monitoring, characterized by comprising: a cloud computing server CCScsfa, which runs the server-side software of the cluster service fault early warning system and is deployed in a remote cloud; and a computer terminal PCTcsfa, which runs the client-side software of the cluster service fault early warning system and is used to perform cluster server operation and maintenance management tasks, wherein the computer terminal PCTcsfa is communicatively connected to the cloud computing server CCScsfa through network communication equipment;
the cluster service fault early warning system comprises a data acquisition agent node CNi, a gateway server and a data computing center, wherein the data computing center is in communication connection with the gateway server, and the gateway server is in communication connection with the acquisition agent node CNi.
2. The RPC service monitoring-based cluster service fault early warning system of claim 1, wherein the data collection agent node CNi is deployed at each monitoring node, and is responsible for collecting monitoring data reported by a monitored process RPC framework through inter-process communication and actively sending the data to a gateway server.
3. The RPC service monitoring-based cluster service fault early warning system as claimed in claim 2, wherein the data computing center is responsible for real-time computation and real-time analysis of large-scale monitoring data streams, and mainly comprises a data cleaning module, a data statistics module, a result analysis and alarm module, and a data storage module.
4. The RPC service monitoring-based cluster service fault early warning system of claim 3, wherein the result analysis and alarm module analyzes the statistical result of the monitoring data stream based on a threshold judgment rule to determine whether an alarm needs to be sent to the operation and maintenance platform.
CN202110060005.3A 2021-01-18 2021-01-18 Cluster service fault early warning system based on RPC service monitoring Pending CN112769622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110060005.3A CN112769622A (en) 2021-01-18 2021-01-18 Cluster service fault early warning system based on RPC service monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110060005.3A CN112769622A (en) 2021-01-18 2021-01-18 Cluster service fault early warning system based on RPC service monitoring

Publications (1)

Publication Number Publication Date
CN112769622A (en) 2021-05-07

Family

ID=75702271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110060005.3A Pending CN112769622A (en) 2021-01-18 2021-01-18 Cluster service fault early warning system based on RPC service monitoring

Country Status (1)

Country Link
CN (1) CN112769622A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150032886A1 (en) * 2011-11-23 2015-01-29 Shen Wang Remote Real-Time Monitoring System based on cloud computing
CN106713014A (en) * 2016-11-30 2017-05-24 华为技术有限公司 Monitored host in monitoring system, the monitoring system and monitoring method
CN107688322A (en) * 2017-08-31 2018-02-13 天津中新智冠信息技术有限公司 Containerization management system
CN107943668A (en) * 2017-12-15 2018-04-20 江苏神威云数据科技有限公司 Computer server cluster log monitoring method and monitoring platform
CN108874567A (en) * 2018-07-19 2018-11-23 广州市创乐信息技术有限公司 Service processing method and system
CN109714222A (en) * 2017-10-26 2019-05-03 创盛视联数码科技(北京)有限公司 Highly available distributed computer monitoring system and monitoring method thereof
CN110460490A (en) * 2019-07-05 2019-11-15 武汉虹信通信技术有限责任公司 Internet of Things-based server cluster monitoring system and method
CN110912773A (en) * 2019-11-25 2020-03-24 深圳晶泰科技有限公司 Cluster monitoring system and monitoring method for multiple public cloud computing platforms
CN111428109A (en) * 2020-03-25 2020-07-17 浙江知多多网络科技有限公司 Patent early warning system based on patent big data machine learning
CN111539622A (en) * 2020-04-22 2020-08-14 国网信通亿力科技有限责任公司 Collective enterprise project management platform based on cloud platform and micro-service architecture
CN111563018A (en) * 2020-04-28 2020-08-21 北京航空航天大学 Resource management and monitoring method of man-machine-object fusion cloud computing platform


Non-Patent Citations (1)

Title
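刘一田 (Liu Yitian) et al.: "柔性微服务监控框架" ("Flexible microservice monitoring framework"), 《计算机系统应用》 (Computer Systems & Applications) *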

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113708967A (en) * 2021-08-26 2021-11-26 中化信息技术有限公司 System monitoring disaster tolerance early warning device and early warning method
CN113708967B (en) * 2021-08-26 2024-04-16 中化信息技术有限公司 System monitoring disaster recovery early warning device and early warning method
CN114500306A (en) * 2021-12-21 2022-05-13 上海赛可出行科技服务有限公司 Monitoring service automatic sampling verification method based on dimensionality
CN114500306B (en) * 2021-12-21 2024-01-09 上海赛可出行科技服务有限公司 Dimension-based monitoring service automatic sampling verification method
CN115314770A (en) * 2022-08-02 2022-11-08 郑州煤机液压电控有限公司 Fully mechanized coal mining face complete equipment distributed data transmission system and method
CN115314770B (en) * 2022-08-02 2023-08-22 郑州恒达智控科技股份有限公司 Fully mechanized coal mining face complete equipment distributed data transmission system and method

Similar Documents

Publication Publication Date Title
CN112769622A (en) Cluster service fault early warning system based on RPC service monitoring
US20210250220A1 (en) Data Collection and Processing Method, Apparatus, and System
CN111970386B (en) Internet of things communication data processing method of intelligent lamp pole
CN102929773A (en) Information collection method and device
CN106612199A (en) Network monitoring data collection and analysis system and method
CN112491593B (en) Network element alarm processing method and device
US10742672B2 (en) Comparing metrics from different data flows to detect flaws in network data collection for anomaly detection
CN112468592B (en) Terminal online state detection method and system based on electric power information acquisition
CN106533791A (en) End-to-end business quality optimization apparatus and method based on big data platform
WO2014008694A1 (en) Signaling monitoring device for implementing ps domain distributed architecture
CN111131332A (en) Network service interconnection and flow acquisition, analysis and recording system
CN111541645A (en) VoIP service knowledge base construction method and system
WO2022052412A1 (en) Violation data identification method and apparatus, and electronic device
CN112817815A (en) Network server fault warning system based on business layer monitoring big data
US20190104084A1 (en) Managing access to logical objects in software defined networks
CN115834699A (en) Service call chain tracking implementation method and system
CN110275815A (en) A kind of system exception alert processing method and device
CN109544727A (en) A kind of cloud computing vehicle trouble statistical analysis technique
CN107733941A (en) A kind of realization method and system of the data acquisition platform based on big data
CN116302862B (en) Monitoring alarm method and system under micro-service architecture
CN116668988A (en) C-V2X unified access gateway and access method based on multi-source sensing equipment
CN113873033B (en) Intelligent edge computing gateway platform with fault-tolerant function
CN110099116B (en) Big data-based subnet security evaluation method
CN103618790A (en) Method and system for obtaining API service
CN113313592A (en) Intelligent service transaction and supervision system based on block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210507
