CN117033117A - Real-time service monitoring management method, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117033117A
Authority
CN
China
Prior art keywords
module
service
switching
state
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310820620.9A
Other languages
Chinese (zh)
Inventor
周拓
黄卓杰
张锦秀
黄微
朱渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xuanwu Wireless Technology Co Ltd
Original Assignee
Guangzhou Xuanwu Wireless Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xuanwu Wireless Technology Co Ltd
Priority to CN202310820620.9A
Publication of CN117033117A
Legal status: pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a real-time service monitoring and management method, system, electronic device and storage medium. The method comprises: constructing a module state list and a module weighting information list, where the module state list records the state information of service modules in real time and the module weighting information list records the weighting information of the modules; constructing a switching rule list and configuring automatic switching rules and switching types in it, the automatic switching rules comprising a failover rule and a service switching rule; polling the switching rule list to determine a current module, and determining the automatic switching rule according to the switching type of the current module; and executing automatic switching and/or alarming on the service module according to the monitoring data value, based on the failover rule or the service switching rule. The invention can, to a certain extent, control the risk after a service module fails, improves the extensibility of the service monitoring scheme, and can be widely applied in the technical field of service monitoring.

Description

Real-time service monitoring management method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of service monitoring technologies, and in particular, to a method, a system, an electronic device, and a storage medium for real-time service monitoring management.
Background
The service monitoring system is an important component for guaranteeing the stability and reliability of system services. By collecting the relevant metric data, operation and maintenance personnel can check the running condition of the system in real time and track and handle anomalies promptly when they occur. A monitoring system based on Prometheus can automatically discover and collect data from service targets and obtain the running-state metrics of the service system in real time. Combined with Grafana, which uses Prometheus as a data source, the monitoring data values can be displayed visually; combined with Alertmanager, alarm rules configured in Prometheus can trigger alarm notifications by email and other channels. Here, Prometheus is open-source monitoring software, Grafana is an open-source data visualization platform, and Alertmanager is a powerful and critical component of the monitoring system that provides notification delivery for Prometheus.
However, the process from a system fault, to an alarm being raised, to manual intervention takes time, and it is difficult to respond promptly once an alarm has been raised. This uncontrollable delay is a latent risk to the stability and reliability of the whole service system and may cause the system to run unstably.
Moreover, monitoring alarms are currently implemented with Prometheus and Prometheus rules, a rule language built on Prometheus that is used to define and manage monitoring metrics. The rule flows for monitoring and alarming on a service system are complex: in specific service scenarios, service switching logic cannot be implemented through operations such as configurable weighted statistics, so rule configuration is inflexible and the monitoring and alarming lacks extensibility.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the invention provides a real-time service monitoring and management method, system, electronic device and storage medium that can, to a certain extent, control the risk after a service module fails and improve the extensibility of the service monitoring scheme.
In one aspect, an embodiment of the present invention provides a real-time service monitoring and management method, including:
constructing a module state list and a module weighting information list; the module state list is used for recording the state information of the service module in real time; the module weighting information list is used for recording the weighting information of the module;
constructing a switching rule list, and configuring an automatic switching rule and a switching type in the switching rule list; wherein, the automatic switching rule comprises a fault switching rule and a service switching rule;
polling the switching rule list to determine a current module, and determining the automatic switching rule according to the switching type of the current module;
and executing automatic switching and/or alarming on the service module based on the fault switching rule or the service switching rule in the automatic switching rule according to the monitoring data value.
Optionally, in the step of executing automatic switching and/or alarming on the service module based on the failover rule or the service switching rule in the automatic switching rule according to the monitored data value, the step of executing automatic switching of the module according to the failover rule includes:
acquiring a current module name, a checking time range, a first judgment threshold and a target module name;
actively acquiring a monitoring data value from the time-series database of Prometheus according to the current module name and the checking time range;
comparing the monitoring data value with the first judgment threshold value, determining a first module state of a current module, and updating the first module state into the module state list;
and when the first module state is a fault state, determining whether to execute module switching according to the second module state of the target module.
Optionally, when the first module state is a failure state, determining whether to execute module switching according to the second module state of the target module includes:
querying a second module state of the target module in the module state list;
when the second module state is a fault state, generating first alarm notification information to raise an alarm for the target module, and continuing to poll the switching rule list;
and when the second module state is a normal state, changing the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
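The failover decision in the sub-steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the module state list, URL addresses and traffic entry are plain in-memory dicts, reaching the first judgment threshold is assumed to mean a fault (the detailed description leaves this direction configurable), and `query_value`, `alarm` and the `forward_to` key are hypothetical stand-ins for the Prometheus query, the notification channel and the traffic entry configuration.

```python
from typing import Callable, Dict

NORMAL, FAULT = "normal", "fault"


def run_failover_rule(
    current: str,
    target: str,
    first_threshold: float,
    time_range: str,
    module_states: Dict[str, str],  # module name -> NORMAL or FAULT
    module_urls: Dict[str, str],    # module name -> URL address
    traffic_entry: Dict[str, str],  # hypothetical entry: {"forward_to": url}
    query_value: Callable[[str, str], float],  # hypothetical Prometheus lookup
    alarm: Callable[[str], None],              # hypothetical notifier
) -> bool:
    """Apply one failover rule; return True if traffic was switched to `target`."""
    value = query_value(current, time_range)
    # Assumption: reaching the first judgment threshold means a fault.
    first_state = FAULT if value >= first_threshold else NORMAL
    module_states[current] = first_state  # update the module state list
    if first_state != FAULT:
        return False
    # Current module is faulty; check the target (unknown modules count as normal).
    if module_states.get(target, NORMAL) == FAULT:
        alarm(f"failover target {target} is in fault state")
        return False  # caller continues polling the switching rule list
    # Target is normal: repoint the traffic entry from current to target.
    traffic_entry["forward_to"] = module_urls[target]
    return True
```

Returning a flag instead of looping inside the function leaves the polling of the switching rule list to the caller, which matches the "continue polling" wording.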
Optionally, in the step of executing automatic switching and/or alarming on the service module based on the failover rule or the service switching rule in the automatic switching rule according to the monitoring data value, the step of executing automatic module switching according to the service switching rule includes:
acquiring a current module name, a service name, a checking time range, a second judgment threshold and a target module name;
actively acquiring a monitoring data value from the time-series database of Prometheus according to the current module name and the checking time range;
querying the module weighting information list according to the current module name and the service name to obtain a weighting value;
determining a weighted monitoring index value according to the monitoring data value and the weighted value;
comparing the weighted monitoring index value with the second judging threshold value to determine a first service state of the current module;
when the first service state is a service switching state, generating second alarm notification information to raise an alarm for the current module, and querying the second module state of the target module by the target module name;
when the second module state is a fault state, continuing to poll the switching rule list;
and when the second module state is a normal state, changing the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
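The service switching sub-steps can be sketched the same way. This is a hedged illustration, not the patented implementation: weights live in a dict keyed by (module name, service name), reaching the second judgment threshold is assumed to mean the service switching state, and `query_value`, `alarm` and the `forward_to` key are hypothetical stand-ins.

```python
from typing import Callable, Dict, Tuple

NORMAL, FAULT = "normal", "fault"


def run_service_switch_rule(
    current: str,
    service: str,
    target: str,
    second_threshold: float,
    time_range: str,
    weights: Dict[Tuple[str, str], float],  # (module, service) -> weighting value
    module_states: Dict[str, str],          # module name -> NORMAL or FAULT
    module_urls: Dict[str, str],            # module name -> URL address
    traffic_entry: Dict[str, str],          # hypothetical: {"forward_to": url}
    query_value: Callable[[str, str], float],  # hypothetical Prometheus lookup
    alarm: Callable[[str], None],              # hypothetical notifier
) -> bool:
    """Apply one service switching rule; return True if traffic moved to `target`."""
    value = query_value(current, time_range)
    # Weighted monitoring index value = weighting value x monitoring data value.
    weighted = weights[(current, service)] * value
    # Assumption: reaching the second threshold means the service switching state.
    if weighted < second_threshold:
        return False
    alarm(f"{current}/{service}: weighted index {weighted:.2f} >= {second_threshold}")
    if module_states.get(target, NORMAL) == FAULT:
        return False  # target unavailable; keep polling the switching rule list
    traffic_entry["forward_to"] = module_urls[target]
    return True
```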
Optionally, the method further comprises:
and polling the module state list, checking the module state of each service module in real time, and generating third alarm notification information to alarm when the module state is a fault.
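A single pass of this state check can be sketched with the standard-library `email.message.EmailMessage`, since email is one of the alarm channels the text mentions. The function name, subject format and addresses are hypothetical; the sketch only builds the messages and does not send them.

```python
from email.message import EmailMessage
from typing import Dict, List

FAULT = "fault"


def build_fault_alarms(
    module_states: Dict[str, str], sender: str, recipient: str
) -> List[EmailMessage]:
    """One pass over the module state list: one alarm email per faulty module."""
    messages: List[EmailMessage] = []
    for module, state in module_states.items():
        if state != FAULT:
            continue
        msg = EmailMessage()
        msg["Subject"] = f"[ALARM] service module {module} is in fault state"
        msg["From"] = sender
        msg["To"] = recipient
        msg.set_content(f"Module {module} reported state '{state}'. Please investigate.")
        messages.append(msg)
    return messages
```

The messages could then be delivered with `smtplib.SMTP(host).send_message(msg)`; in a running system the pass would repeat on a timer.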
Optionally, the method further comprises:
determining an automatic monitoring program, and adding the automatic monitoring program into a service module when the service module is deployed and started;
acquiring monitoring data of the service module collected by the automatic monitoring program;
generating a monitoring data value according to the monitoring data;
wherein the determining an automated monitoring program comprises:
configuring a module name label for each service module, the module name label being unique;
configuring a service name label for each piece of business logic, the service name label being unique within the current service module.
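One way the module and service name labels could show up in the collected data is as Prometheus labels. The sketch below assumes the label names `module` and `service` and a metric named `business_calls_total`. All three are illustrative and not taken from the source. `LabelRegistry` is a hypothetical helper that enforces the two uniqueness rules.

```python
from typing import Dict, List, Set


class LabelRegistry:
    """Hypothetical registry enforcing the uniqueness rules for name labels."""

    def __init__(self) -> None:
        self._services: Dict[str, Set[str]] = {}

    def register_module(self, module: str) -> None:
        # Module name labels must be unique across all service modules.
        if module in self._services:
            raise ValueError(f"module name label {module!r} already registered")
        self._services[module] = set()

    def register_service(self, module: str, service: str) -> None:
        # Service name labels need only be unique within their own module.
        services = self._services[module]
        if service in services:
            raise ValueError(f"service name label {service!r} already used in {module!r}")
        services.add(service)


def render_metrics(module: str, counts: Dict[str, float],
                   metric: str = "business_calls_total") -> List[str]:
    """Render per-service counters in Prometheus text exposition format."""
    lines = [f"# TYPE {metric} counter"]
    for service, value in sorted(counts.items()):
        # Assumes label values need no escaping (no quotes, backslashes or newlines).
        lines.append(f'{metric}{{module="{module}",service="{service}"}} {value}')
    return lines
```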
Optionally, the method further comprises:
storing the request addresses of the modules registered with the server in a timed queue;
polling, at a fixed time interval, the module request addresses in the timed queue by means of HTTP requests to obtain monitoring data of the service modules;
storing the monitoring data in a time-series database according to pre-configured metric labels, the metric labels comprising the module name label and the service name label.
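A minimal sketch of the timed-queue polling, assuming an in-memory dict as the time-series store keyed by (metric name, module label, service label). The `fetch` callable is a hypothetical hook. In practice it would issue an HTTP GET (for example with `urllib.request.urlopen`) and parse the module's response, whose format the source does not specify.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# One fetched reading: (metric name, module label, service label, value).
Reading = Tuple[str, str, str, float]
SeriesKey = Tuple[str, str, str]
Sample = Tuple[float, float]  # (unix timestamp, value)


def poll_once(queue: Deque[str], fetch: Callable[[str], List[Reading]],
              store: Dict[SeriesKey, List[Sample]]) -> None:
    """Poll each registered address once, in queue order, and append samples."""
    for _ in range(len(queue)):
        address = queue.popleft()
        queue.append(address)  # rotate: the address is polled again next round
        try:
            readings = fetch(address)
        except OSError:
            continue  # network error: skip this address for this round
        now = time.time()
        for metric, module, service, value in readings:
            store.setdefault((metric, module, service), []).append((now, value))


def run(queue: Deque[str], fetch: Callable[[str], List[Reading]],
        store: Dict[SeriesKey, List[Sample]], interval_s: float, rounds: int) -> None:
    """Repeat the polling pass at a fixed interval for a given number of rounds."""
    for _ in range(rounds):
        started = time.monotonic()
        poll_once(queue, fetch, store)
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Catching `OSError` also covers `urllib.error.URLError`, which subclasses it, so one unreachable module does not stop the polling of the others.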
On the other hand, the embodiment of the invention also provides a real-time service monitoring and managing system, which comprises:
the list construction unit, used for constructing a module state list and a module weighting information list, where the module state list records the state information of service modules in real time and the module weighting information list records the weighting information of the modules; the unit is further used for constructing a switching rule list and configuring automatic switching rules and switching types in it, the automatic switching rules comprising a fault switching rule and a service switching rule;
the rule determining unit is used for polling the switching rule list to determine a current module and determining the automatic switching rule according to the switching type of the current module;
and the rule switching unit is used for executing automatic switching and/or alarming on the service module based on the fault switching rule or the service switching rule in the automatic switching rule according to the monitoring data value.
The system may further comprise a first unit for:
and polling the module state list, checking the module state of each service module in real time, and generating third alarm notification information to alarm when the module state is a fault.
The system may further comprise a second unit for:
determining an automatic monitoring program, and adding the automatic monitoring program into a service module when the service module is deployed and started;
acquiring monitoring data of the service module collected by the automatic monitoring program;
generating a monitoring data value according to the monitoring data;
wherein the determining an automated monitoring program comprises:
configuring a module name label for each service module, the module name label being unique;
configuring a service name label for each piece of business logic, the service name label being unique within the current service module.
The system may further comprise a third unit for:
storing the request addresses of the modules registered with the server in a timed queue;
polling, at a fixed time interval, the module request addresses in the timed queue by means of HTTP requests to obtain monitoring data of the service modules;
storing the monitoring data in a time-series database according to pre-configured metric labels, the metric labels comprising the module name label and the service name label.
The content of the method embodiment of the invention applies to the system embodiment; the system embodiment implements the same functions as the method embodiment and achieves the same beneficial effects.
In another aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
The content of the method embodiment of the invention applies to the electronic device embodiment; the electronic device embodiment implements the same functions as the method embodiment and achieves the same beneficial effects.
In another aspect, embodiments of the present invention also provide a computer storage medium in which a processor-executable program is stored, which when executed by a processor is configured to implement the method as described above.
The content of the method embodiment of the invention applies to the computer-readable storage medium embodiment; the computer-readable storage medium embodiment implements the same functions as the method embodiment and achieves the same beneficial effects.
The embodiments of the invention have the following beneficial effects. A module state list is built to record the state information of service modules in real time, and a module weighting information list is built to record module weighting information. A switching rule list is built in which automatic switching rules and switching types are configured; because these rules can be configured as required, the scheme is highly extensible. The switching rule list is polled to determine the current module, the automatic switching rule is determined from that module's switching type, and automatic switching and/or alarming is executed on the service module according to the failover rule or the service switching rule. A failover scheme can therefore be configured in advance so that, when a fault occurs, the module is switched automatically first and the service keeps running stably. Raising an alarm for the faulty module reminds the relevant personnel to carry out fault handling and maintenance. Together, this controls, to a certain extent, the risk after a service module fails and improves the extensibility of the service monitoring scheme.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic step diagram of a real-time service monitoring and management method according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of steps S300 to S400 according to an embodiment of the present invention;
Fig. 3 is a detailed flowchart of step S510 according to an embodiment of the present invention;
Fig. 4 is a detailed flowchart of steps S610 to S730 according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a real-time service monitoring and management system according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that although functional modules are divided in the block diagrams and logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the block diagrams, or in an order different from that in the flowcharts. The terms first/S100, second/S200, and the like in the description, the claims and the above-described figures are used to distinguish between similar objects and do not necessarily describe a particular sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic step diagram of a real-time service monitoring and managing method according to an embodiment of the present invention, where the method includes the following steps:
S100, constructing a module state list and a module weighting information list; the module state list is used for recording the state information of the service module in real time; the module weighting information list is used for recording the weighting information of the module.
In particular, a module status list is constructed, which may be stored locally for recording status information of the service modules in real time. The module state list records a module name, a module state, and a URL address, and the module state may include a normal state and a fault state.
A module weighting information list is constructed, which may be stored locally and records specific weighting information of the modules: a module name, a service name and a weighting value. The module weighting information list provides a web interface through which the module weighting information can be configured.
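The two lists can be sketched as small in-memory structures. This is a hedged illustration: the class and method names are hypothetical, and local persistence and the web interface are left out. The web interface is represented only by the setter it might call.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

VALID_STATES = {"normal", "fault"}


@dataclass
class ModuleStateEntry:
    module_name: str
    state: str  # "normal" or "fault"
    url: str


class ModuleStateList:
    """Module state list: module name -> (state, URL address)."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModuleStateEntry] = {}

    def upsert(self, module_name: str, state: str, url: str) -> None:
        if state not in VALID_STATES:
            raise ValueError(f"unknown module state {state!r}")
        self._entries[module_name] = ModuleStateEntry(module_name, state, url)

    def get(self, module_name: str) -> Optional[ModuleStateEntry]:
        return self._entries.get(module_name)


class ModuleWeightList:
    """Module weighting information list: (module, service) -> weighting value."""

    def __init__(self) -> None:
        self._weights: Dict[Tuple[str, str], float] = {}

    def set_weight(self, module_name: str, service_name: str, weight: float) -> None:
        # A web interface handler could call this when an operator edits a weight.
        self._weights[(module_name, service_name)] = weight

    def get_weight(self, module_name: str, service_name: str) -> Optional[float]:
        return self._weights.get((module_name, service_name))
```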
S200, constructing a switching rule list, and configuring an automatic switching rule and a switching type in the switching rule list; the automatic switching rules comprise a fault switching rule and a service switching rule.
Specifically, a switching rule list is constructed, which may be stored locally and records the automatic switching rules, including failover rules and service switching rules. The switching rule list records the switching type (whether the rule is a failover), the checking time range, the state judgment threshold, the current module name, the service name and the target module name. The state judgment threshold is either a first judgment threshold or a second judgment threshold: when the switching type is failover, it is the first judgment threshold; when the switching type is service switching, it is the second judgment threshold.
In some embodiments, the switching type is configured through a "whether to fail over" parameter: when its value indicates "fail over", the switching type is failover; when its value indicates "do not fail over", the switching type is service switching.
The switching rule list provides a web interface through which its automatic switching rules can be configured.
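A switching rule record could be modelled as below. This is a sketch under assumptions: the field names are hypothetical, and the `"5m"` time-range syntax assumes PromQL duration notation. A single threshold field is read as the first or second judgment threshold depending on the "whether to fail over" flag, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchRule:
    is_failover: bool        # the "whether to fail over" parameter
    check_time_range: str    # e.g. "5m" (assumed PromQL duration syntax)
    threshold: float         # state judgment threshold
    current_module: str
    target_module: str
    service_name: Optional[str] = None  # needed only for service switching

    @property
    def switch_type(self) -> str:
        return "failover" if self.is_failover else "service_switch"

    @property
    def threshold_kind(self) -> str:
        # The same field is read as the first or second judgment threshold.
        return "first" if self.is_failover else "second"

    def validate(self) -> None:
        # Service switching looks up a weighting value by service name.
        if not self.is_failover and not self.service_name:
            raise ValueError("a service switching rule needs a service name")
```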
S300, determining a current module by polling the switching rule list, and determining an automatic switching rule according to the switching type of the current module.
Specifically, referring to fig. 2, which is a detailed flowchart of steps S300 to S400 provided by the embodiment of the present invention, the switching rule list is polled to determine the current module. When the switching type of the current module is failover, the automatic switching rule is determined to be the failover rule and the failover flow is entered; when the switching type is service switching, the automatic switching rule is determined to be the service switching rule and the service switching flow is entered.
S400, executing automatic switching and/or alarming on the service module based on the failover rule or the service switching rule in the automatic switching rule according to the monitoring data value.
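The dispatch in step S300 amounts to one pass over the rule list that branches on the switching type. In the sketch below, `Rule` is a trimmed, hypothetical record, and the two callbacks stand in for the failover and service switching flows. A real deployment would repeat the pass on a timer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    current_module: str
    is_failover: bool  # switching type of the current module


def poll_rules(rules: Iterable[Rule],
               on_failover: Callable[[Rule], None],
               on_service_switch: Callable[[Rule], None]) -> None:
    """One pass over the switching rule list (step S300)."""
    for rule in rules:
        # The switching type selects which automatic switching flow runs.
        if rule.is_failover:
            on_failover(rule)
        else:
            on_service_switch(rule)
```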
Referring to fig. 2, in some embodiments, the step of automatically switching modules according to the fail-over rule includes the following steps a) through d):
a) Acquire the current module name, the checking time range, the first judgment threshold and the target module name.
Specifically, the current module name, checking time range, first judgment threshold and target module name recorded for the current module in the switching rule list are obtained. The checking time range determines the time range over which monitoring data are obtained, the first judgment threshold is used to judge whether the state of the current module is normal, and the target module name identifies the target module to switch to.
b) Actively acquire a monitoring data value from the time-series database of Prometheus according to the current module name and the checking time range.
Specifically, a PromQL query statement is constructed from the current module name and the checking time range, and the monitoring data value within the checking time range is actively obtained from the time-series database of Prometheus. PromQL is the data query language built into Prometheus; it supports rich querying, aggregation and logical computation over time-series data.
c) Compare the monitoring data value with the first judgment threshold to determine the first module state of the current module, and update the first module state in the module state list.
In some embodiments, whether reaching the first judgment threshold means that the current module is in a fault state or in a normal state can be configured according to actual needs.
d) When the first module state is a fault state, determine whether to perform module switching according to the second module state of the target module.
Specifically, determining whether to perform the module switching according to the second module state of the target module includes:
(1) Query the second module state of the target module in the module state list;
(2) When the second module state is a fault state, generate first alarm notification information to raise an alarm for the target module, and return to step S300 to continue polling the switching rule list.
Specifically, in the embodiment of the invention, the alarm notification information can be sent through any channel with an alarm notification function, such as email, SMS, instant-messaging software or an office collaboration platform, to remind the relevant personnel to investigate the cause and handle the fault.
(3) When the second module state is a normal state, change the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
The switching function of the service module is realized through the replacement of the URL address.
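Steps b) and c) can be illustrated with a query against the Prometheus HTTP API instant-query endpoint `/api/v1/query`. The metric name and the `module` label are hypothetical. The choice of `sum(increase(...[range]))` to reduce the checking time range to a single value is also an assumption, because the source does not give the actual PromQL statement. An empty result is treated as 0.0, again an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_promql(metric: str, module: str, time_range: str) -> str:
    # Total increase of a (hypothetical) counter for one module over the range.
    return f'sum(increase({metric}{{module="{module}"}}[{time_range}]))'


def build_query_url(base_url: str, promql: str) -> str:
    # Prometheus HTTP API instant-query endpoint.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar(body: str) -> float:
    """Extract one value from an instant-vector response; 0.0 if the result is empty."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return 0.0
    # Each sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


def query_value(base_url: str, metric: str, module: str, time_range: str) -> float:
    url = build_query_url(base_url, build_promql(metric, module, time_range))
    with urlopen(url, timeout=5) as resp:
        return parse_scalar(resp.read().decode("utf-8"))
```

The returned value can then be compared with the first judgment threshold in step c).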
In some embodiments, the step of automatically switching modules according to the service switching rules includes the following steps e) to l):
e) Acquire the current module name, service name, checking time range, second judgment threshold and target module name.
Specifically, the current module name, service name, checking time range, second judgment threshold and target module name recorded for the current module in the switching rule list are obtained. The service name is used to look up the weighting value, the checking time range defines the period over which monitoring data is fetched, the second judgment threshold is used to judge whether the current module must be switched for service reasons, and the target module name identifies the module to switch to.
f) Actively acquire the monitoring data value from the Prometheus time-series database according to the current module name and the checking time range.
g) Query the module weighting information list by the current module name and service name to obtain the weighting value.
h) Determine the weighted monitoring index value from the monitoring data value and the weighting value, calculated as:
weighted monitoring index value = weighting value × monitoring data value
i) Compare the weighted monitoring index value with the second judgment threshold to determine the first service state of the current module.
The first service state is either a service switching state or a normal state.
j) When the first service state is the service switching state, generate second alarm notification information to alert on the current module, and obtain the second module state of the target module by the target module name.
k) When the second module state is a fault state, return to step S300 to continue polling the switching rule list.
l) When the second module state is normal, change the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
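Steps g) to l) can be sketched in the same style as the fault switching case. The data shapes are hypothetical: weights are keyed by `(module name, service name)`, a missing weight defaults to `1.0`, and the weighted value must *exceed* the second judgment threshold to trigger switching. None of these three details is fixed by the source.

```python
from typing import Dict, Tuple


def evaluate_service_switch(
    current: str,
    service: str,
    target: str,
    monitoring_value: float,
    second_threshold: float,
    weights: Dict[Tuple[str, str], float],
    module_states: Dict[str, str],
    module_urls: Dict[str, str],
    traffic_entry: Dict[str, str],
    route: str,
) -> Tuple[float, str]:
    """Apply steps g) to l) for one service switching rule.

    Returns (weighted monitoring index value, outcome), where outcome is
    "normal", "target_fault" or "switched".
    """
    # g) weighting value keyed by (module name, service name); 1.0 if unset (assumption)
    weight = weights.get((current, service), 1.0)
    # h) weighted monitoring index value = weighting value x monitoring data value
    weighted = weight * monitoring_value
    # i) assumption: only a value above the second judgment threshold triggers switching
    if weighted <= second_threshold:
        return weighted, "normal"
    # j) the caller raises the second alarm; here we check the target's state by name
    if module_states.get(target) != "normal":
        # k) target unavailable: leave traffic alone and resume polling (S300)
        return weighted, "target_fault"
    # l) repoint the traffic entry to the target module's URL
    traffic_entry[route] = module_urls[target]
    return weighted, "switched"
```

Returning the weighted value alongside the outcome makes it easy to log the figure that triggered (or did not trigger) the switch.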
In some embodiments, the weighting value and the second judgment threshold may be set according to actual usage. For example, if each execution of service A costs 0.1 yuan and the cost of the service must be kept within 10 yuan, the weighting value can be set to 0.1 and the second judgment threshold to 10. The first service state is determined from these values, and whether to switch the service is then decided from the first service state. Configuring the weighting value and the second judgment threshold makes it easy to track and tally various runtime costs and allows the service cost to be controlled precisely.
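As a worked instance of the cost example, assume (hypothetically) that the monitoring data value is the number of calls to service A within the checking time range, and that exceeding the threshold puts the module into the service switching state. With weighting value $w = 0.1$ and second judgment threshold $T_2 = 10$:

```latex
\[
120 \text{ calls: } \quad w \times v = 0.1 \times 120 = 12 > T_2 = 10 \;\Rightarrow\; \text{service switching state}
\]
\[
80 \text{ calls: } \quad w \times v = 0.1 \times 80 = 8 \le T_2 = 10 \;\Rightarrow\; \text{normal state}
\]
```

The weighted monitoring index value is then simply the accumulated cost in yuan, which is why the threshold can be read directly as a budget.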
Referring to fig. 3, an embodiment of the present invention may further include the following step S510:
S510: Poll the module state list, check the module state of each service module in real time, and generate third alarm notification information when a module state is a fault.
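Step S510 amounts to a periodic scan of the module state list. The sketch below assumes the list is a `module name -> "normal"/"fault"` dictionary and that alarms go through a caller-supplied callback; `rounds` is a hypothetical parameter added only so the loop terminates in a demonstration.

```python
import time
from typing import Callable, Dict, List

AlarmSender = Callable[[str, str], None]


def scan_module_states(module_states: Dict[str, str], send_alarm: AlarmSender) -> List[str]:
    """One pass of S510: raise a third alarm for every module whose state is 'fault'."""
    faulty = sorted(name for name, state in module_states.items() if state == "fault")
    for name in faulty:
        send_alarm(name, "third alarm: module state is fault")
    return faulty


def run_scanner(
    get_states: Callable[[], Dict[str, str]],
    send_alarm: AlarmSender,
    interval_s: float,
    rounds: int,
) -> None:
    """Repeat the scan on a fixed interval; `rounds` bounds the loop so the sketch terminates."""
    for _ in range(rounds):
        scan_module_states(get_states(), send_alarm)
        time.sleep(interval_s)
```

As written, a module that stays faulty is alarmed on every pass; real deployments often suppress repeats until the state changes, which would be a small extension of `scan_module_states`.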
Before the lists are constructed, referring to fig. 4, an embodiment of the present invention may further include the following steps S610 to S630:
S610: Determine an automatic monitoring program and add it to the service module when the service module is deployed and started.
Specifically, the metrics to be generated and the modules to be monitored are selected. Each service module is configured with a module name label that is unique across the whole system; each piece of service logic is configured with a service name label that is unique within the current service module.
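The two uniqueness rules mean the pair (module name, service name) identifies a piece of service logic unambiguously across the system. A hypothetical validation step, run when the monitoring program is configured, might look like this; the input shape `[(module name, [service names])]` is an assumption for illustration.

```python
from typing import List, Set, Tuple


def validate_labels(modules: List[Tuple[str, List[str]]]) -> Set[Tuple[str, str]]:
    """Check label uniqueness and return the set of (module, service) keys.

    Module name labels must be globally unique; service name labels only need
    to be unique inside their own module. Raises ValueError on a duplicate.
    """
    seen_modules: Set[str] = set()
    keys: Set[Tuple[str, str]] = set()
    for module_name, service_names in modules:
        if module_name in seen_modules:
            raise ValueError(f"duplicate module name label: {module_name!r}")
        seen_modules.add(module_name)
        seen_services: Set[str] = set()
        for service_name in service_names:
            if service_name in seen_services:
                raise ValueError(
                    f"duplicate service name label {service_name!r} in module {module_name!r}"
                )
            seen_services.add(service_name)
            keys.add((module_name, service_name))
    return keys
```

The same service name (for example `pay`) may appear in two different modules, because the module name label already disambiguates it.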
S620: Collect monitoring data of the service module through the automatic monitoring program.
Because the automatic monitoring program is added when the service module is deployed and started, metric instrumentation points can be embedded in the service module non-invasively, so monitoring data is collected without changing the module code. Prometheus Metrics monitoring data, one type of metric data in Prometheus monitoring, is generated automatically while the service system is running.
S630: Generate the monitoring data value from the monitoring data.
An embodiment of the invention may further include the following steps S710 to S730:
S710: Store the module request addresses registered with the server in a timing queue.
The URL address of each service module is registered through the Prometheus client, and the request address of each registered service module is stored in the timing queue.
S720: At a fixed time interval, poll the module request addresses in the timing queue with HTTP requests to obtain the monitoring data of the service modules.
At each interval, the module addresses in the timing queue are requested over HTTP to obtain the Prometheus Metrics monitoring data of each module, so the operating condition of every service module is monitored.
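A minimal sketch of S710/S720, assuming the timing queue is an in-memory `deque` of metrics URLs and each poll is a plain HTTP GET via the standard library. The `ScrapeQueue` class and its methods are hypothetical names, and `fetch` is injectable so the polling logic can be exercised without a network. A failed request is recorded as `None`, which a later step can map to the "0" (faulty) marker.

```python
import time
import urllib.request
from collections import deque
from typing import Callable, Deque, Dict, Optional


def http_fetch(url: str, timeout: float = 5.0) -> str:
    """GET a metrics endpoint and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


class ScrapeQueue:
    """Timing queue of registered module request addresses (S710/S720)."""

    def __init__(self, fetch: Callable[[str], str] = http_fetch) -> None:
        self.addresses: Deque[str] = deque()
        self.fetch = fetch

    def register(self, address: str) -> None:
        # S710: store each registered address once.
        if address not in self.addresses:
            self.addresses.append(address)

    def poll_once(self) -> Dict[str, Optional[str]]:
        """S720: request every address once; None marks a failed request."""
        results: Dict[str, Optional[str]] = {}
        for _ in range(len(self.addresses)):
            address = self.addresses.popleft()
            try:
                results[address] = self.fetch(address)
            except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
                results[address] = None
            self.addresses.append(address)  # rotate the address back into the queue
        return results

    def run(self, interval_s: float, rounds: int,
            sink: Callable[[Dict[str, Optional[str]]], None]) -> None:
        for _ in range(rounds):
            started = time.monotonic()
            sink(self.poll_once())
            # Sleep only for what remains of the interval to keep a fixed cadence.
            time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Measuring elapsed time with `time.monotonic()` keeps the polling cadence close to the configured interval even when some modules respond slowly.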
For example, based on the monitoring data, a service module is marked "1" when it is running normally and "0" when it has failed; other embodiments may use other markers, and the embodiments of the invention are not limited in this regard. The monitoring data can also count the service calls within a service module: each time a service is called, the monitoring data value corresponding to that service name is incremented by 1.
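The 1/0 health marker and the per-service call counter map naturally onto a gauge and a counter in the Prometheus text exposition format. The sketch below renders that format with the standard library only; the metric names `module_up` and `service_calls_total` and the label names `module_name`/`service_name` are hypothetical. The name `module_up` is chosen to avoid clashing with the `up` series that Prometheus itself records for each scrape target.

```python
from collections import defaultdict
from typing import Dict


def _escape(value: str) -> str:
    # Label values escape backslash, double quote and newline in the text format.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class ModuleMetrics:
    """In-process metrics for one service module (hypothetical metric names)."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self.healthy = True
        self.calls: Dict[str, int] = defaultdict(int)

    def record_call(self, service_name: str) -> None:
        # Each call of a service adds 1 to that service's monitoring data value.
        self.calls[service_name] += 1

    def render(self) -> str:
        """Render the metrics in Prometheus text exposition format."""
        module = _escape(self.module_name)
        lines = [
            "# TYPE module_up gauge",
            f'module_up{{module_name="{module}"}} {1 if self.healthy else 0}',
            "# TYPE service_calls_total counter",
        ]
        for service_name in sorted(self.calls):
            lines.append(
                f'service_calls_total{{module_name="{module}",'
                f'service_name="{_escape(service_name)}"}} {self.calls[service_name]}'
            )
        return "\n".join(lines) + "\n"
```

Serving `render()` from an HTTP endpoint would give the scraper in S720 something to poll; in practice an official Prometheus client library would normally produce this output.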
S730: Store the monitoring data in the time-series database under preconfigured metric name labels, which include the module name label and the service name label.
Once the data is in the time-series database, PromQL queries can retrieve it by these labels when a module switch is being evaluated.
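A sketch of the query side, assuming the hypothetical counter `service_calls_total` with `module_name`/`service_name` labels and the checking time range expressed as a PromQL duration such as `5m`. It builds a PromQL expression, forms a request URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`), and extracts the single sample from a JSON response. Note that `increase()` extrapolates over the window, so the result can be non-integer.

```python
import json
from urllib.parse import urlencode


def _quote(value: str) -> str:
    # PromQL string literal: escape backslashes and double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(metric: str, module_name: str, service_name: str, window: str) -> str:
    """Total increase of a counter for one module/service over the checking time range."""
    selector = f"{metric}{{module_name={_quote(module_name)},service_name={_quote(service_name)}}}"
    return f"sum(increase({selector}[{window}]))"


def build_query_url(prometheus_base: str, promql: str) -> str:
    # Instant-query endpoint of the Prometheus HTTP API.
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_value(body: str) -> float:
    """Extract the single sample from an instant-vector response; 0.0 if there is none."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        return 0.0
    _timestamp, value = result[0]["value"]
    return float(value)
```

Returning `0.0` for an empty result treats "no samples in the window" as "no calls"; a stricter variant could raise instead, so a missing module is not silently read as idle.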
The embodiments of the invention provide the following beneficial effects:
1. The embodiments construct a module state list that records the state information of service modules in real time and a module weighting information list that records module weighting information, and construct a switching rule list in which automatic switching rules and switching types are configured. Because the rules in the list can be configured as needed, the scheme is highly extensible;
2. The switching rule list is polled to determine the current module, the automatic switching rule is selected according to the switching type of the current module, and automatic switching and/or alarming is performed on the service module according to the fault switching rule or the service switching rule. A failover scheme can therefore be configured in advance, so that when a fault occurs the module is switched automatically first, keeping the service running stably;
3. Alarming on a faulty module prompts the relevant personnel to handle and maintain it, which contains the risk after a service module fails to a certain extent and improves the extensibility of the service monitoring scheme;
4. For a specific service, the second judgment threshold can be configured and the weighted monitoring index value computed with the weighting function, which makes it easy to tally various runtime costs and enables precise control of service cost.
The following is an application example provided by the embodiment of the present invention:
Determine an automatic monitoring program and add it to the service module when the service module is deployed and started; acquire the monitoring data of the service module collected by the automatic monitoring program; and generate the monitoring data value from the monitoring data. Determining the automatic monitoring program includes configuring, for each service module, a module name label that is unique, and configuring, for each piece of service logic, a service name label that is unique within the current service module. Construct a module state list and a module weighting information list: the module state list records the state information of service modules in real time, and the module weighting information list records module weighting information.
Store the module request addresses registered with the server in a timing queue; at a fixed time interval, poll the module request addresses in the timing queue with HTTP requests to obtain the monitoring data of the service modules; and store the monitoring data in the time-series database under preconfigured metric name labels, which include the module name label and the service name label.
Construct a switching rule list and configure automatic switching rules and switching types in it; the automatic switching rules include a fault switching rule and a service switching rule. Poll the switching rule list to determine the current module, and determine the automatic switching rule according to the switching type of the current module. Then, according to the monitoring data value, perform automatic switching and/or alarming on the service module based on the fault switching rule or the service switching rule.
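The three lists and the polling step can be modelled with a few plain data structures. The sketch below is a hypothetical layout, not the patent's schema: field names such as `check_range` and `threshold`, the `(module, service)` weight key, and the handler-per-switch-type dispatch are assumptions chosen to mirror the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class SwitchType(str, Enum):
    FAILOVER = "failover"  # fault switching rule
    SERVICE = "service"    # service switching rule


@dataclass
class SwitchRule:
    current_module: str
    target_module: str
    switch_type: SwitchType
    check_range: str        # checking time range, e.g. "5m" (format assumed)
    threshold: float        # first or second judgment threshold, by switch type
    service_name: str = ""  # only used by service switching rules


@dataclass
class MonitorLists:
    # module state list: module name -> "normal" / "fault"
    module_states: Dict[str, str] = field(default_factory=dict)
    # module weighting information list: (module name, service name) -> weighting value
    weights: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # switching rule list
    switch_rules: List[SwitchRule] = field(default_factory=list)

    def poll_once(self, handlers: Dict[SwitchType, Callable[[SwitchRule], str]]) -> List[str]:
        """One pass over the switching rule list: choose the handler by switch type."""
        return [handlers[rule.switch_type](rule) for rule in self.switch_rules]
```

Keeping the rules as data means new rules can be added to the list at runtime without code changes, which is the extensibility the description claims.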
In another aspect, as shown in fig. 5, an embodiment of the present invention provides a real-time service monitoring and management system 10, which includes:
a list construction unit 11 for constructing a module state list and a module weighting information list, where the module state list records the state information of the service module in real time and the module weighting information list records the weighting information of the module; the unit is also used for constructing a switching rule list in which automatic switching rules and switching types are configured, the automatic switching rules including a fault switching rule and a service switching rule;
a rule determining unit 12, configured to poll the switching rule list to determine a current module, and determine an automatic switching rule according to a switching type of the current module;
the rule switching unit 13 is configured to perform automatic switching and/or alerting on the service module based on the fail-over rule or the service switching rule in the automatic switching rules according to the monitored data value.
It should be noted that, in some embodiments, the system may further include a first unit 14 configured to:
poll the module state list, check the module state of each service module in real time, and generate third alarm notification information when a module state is a fault.
In some embodiments, the system may further comprise a second unit 15 for:
determining an automatic monitoring program, and adding the automatic monitoring program into a service module when the service module is deployed and started;
acquiring monitoring data of a service module collected by an automatic monitoring program;
generating a monitoring data value according to the monitoring data;
wherein determining the automatic monitoring program includes:
for a service module, configuring a module name label, the module name label being unique;
for service logic, configuring a service name label, the service name label being unique within the current service module.
In some embodiments, the system may further comprise a third unit 16 for:
storing a module request address registered to a server into a timing queue;
according to the fixed time interval, the module request address in the timing queue is polled in the form of HTTP protocol request to obtain the monitoring data of the service module;
storing the monitoring data into a time sequence database according to a pre-configured index name label; the index name label comprises a module name label and a service name label.
The content of the method embodiment of the invention applies to the system embodiment; the system embodiment has the same specific functions as the method embodiment and achieves the same beneficial effects.
In another aspect, an embodiment of the present invention further provides an electronic device 20, referring to fig. 6, including: a processor 21 and a memory 22; the memory 22 is used for storing programs; the processor 21 executes a program to implement the method as described above.
The content of the method embodiment of the invention applies to the electronic device embodiment; the electronic device embodiment has the same functions as the method embodiment and achieves the same beneficial effects.
In another aspect, embodiments of the present invention also provide a computer storage medium in which a processor-executable program is stored, which when executed by a processor is configured to implement the method as described above.
The content of the method embodiment of the invention applies to the computer-readable storage medium embodiment; the storage medium embodiment has the same functions as the method embodiment and achieves the same beneficial effects.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If they are instead implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A real-time service monitoring and management method, characterized by comprising the following steps:
constructing a module state list and a module weighting information list; the module state list is used for recording the state information of the service module in real time; the module weighting information list is used for recording the weighting information of the module;
constructing a switching rule list, and configuring an automatic switching rule and a switching type in the switching rule list; wherein, the automatic switching rule comprises a fault switching rule and a service switching rule;
polling the switching rule list to determine a current module, and determining the automatic switching rule according to the switching type of the current module;
and executing automatic switching and/or alarming on the service module based on the fault switching rule or the service switching rule in the automatic switching rule according to the monitoring data value.
2. The method according to claim 1, wherein in the step of performing automatic switching and/or alerting on the service module based on a fail-over rule or a service switching rule of the automatic switching rules according to the monitored data values, the step of performing automatic switching of the module according to the fail-over rule comprises:
acquiring a current module name, an inspection time range, a first judgment threshold value and a target module name;
actively acquiring a monitoring data value from a time-series database of Prometheus according to the current module name and the checking time range;
comparing the monitoring data value with the first judgment threshold value, determining a first module state of a current module, and updating the first module state into the module state list;
and when the first module state is a fault state, determining whether to execute module switching according to the second module state of the target module.
3. The real-time service monitoring and management method according to claim 2, wherein, when the first module state is a fault state, determining whether to execute module switching according to the second module state of the target module comprises:
querying a second module state of the target module in the module state list;
when the second module state is a fault state, generating first alarm notification information to alarm the target module, and continuing to poll the switching rule list;
and when the second module state is a normal state, changing the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
4. The method according to claim 1, wherein in the step of performing automatic switching and/or alerting on the service module based on a fail-over rule or a service switching rule in the automatic switching rules according to the monitored data values, the step of performing automatic switching of the module according to the service switching rule comprises:
acquiring a current module name, a service name, an inspection time range, a second judgment threshold value and a target module name;
actively acquiring a monitoring data value from a time-series database of Prometheus according to the current module name and the checking time range;
inquiring in the module weighting information list to obtain a weighting value according to the current module name and the service name;
determining a weighted monitoring index value according to the monitoring data value and the weighted value;
comparing the weighted monitoring index value with the second judging threshold value to determine a first service state of the current module;
when the first service state is a service switching state, generating second alarm notification information to alarm the current module, and acquiring a second module state of the target module according to the target module name;
when the second module state is a fault, continuing to poll the switching rule list;
and when the second module state is normal, changing the URL forwarding address of the traffic entry from the first URL address of the current module to the second URL address of the target module.
5. The method for monitoring and managing real-time services according to claim 1, further comprising:
and polling the module state list, checking the module state of each service module in real time, and generating third alarm notification information to alarm when the module state is a fault.
6. The method for monitoring and managing real-time services according to claim 1, further comprising:
determining an automatic monitoring program, and adding the automatic monitoring program into a service module when the service module is deployed and started;
acquiring monitoring data of the service module collected by the automatic monitoring program;
generating a monitoring data value according to the monitoring data;
wherein the determining an automated monitoring program comprises:
for the service module, configuring a module name label, the module name label being unique;
for service logic, configuring a service name label, the service name label being unique within the current service module.
7. The method for monitoring and managing real-time services according to claim 1, further comprising:
storing a module request address registered to a server into a timing queue;
according to fixed time intervals, the module request addresses in the timing queue are polled in the form of HTTP protocol requests, and monitoring data of the service module are obtained;
storing the monitoring data into a time sequence database according to a pre-configured index name label; the index name label comprises a module name label and a service name label.
8. A real-time service monitoring and management system, comprising:
the list construction unit, used for constructing a module state list and a module weighting information list, the module state list being used for recording the state information of the service module in real time and the module weighting information list being used for recording the weighting information of the module; the list construction unit is also used for constructing a switching rule list in which automatic switching rules and switching types are configured, wherein the automatic switching rule comprises a fault switching rule and a service switching rule;
the rule determining unit is used for polling the switching rule list to determine a current module and determining the automatic switching rule according to the switching type of the current module;
and the rule switching unit is used for executing automatic switching and/or alarming on the service module based on the fault switching rule or the service switching rule in the automatic switching rule according to the monitoring data value.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program implements the method of any one of claims 1 to 7.
10. A computer storage medium in which a processor executable program is stored, characterized in that the processor executable program is for implementing the method according to any one of claims 1 to 7 when being executed by the processor.
CN202310820620.9A 2023-07-05 2023-07-05 Real-time service monitoring management method, system, electronic equipment and storage medium Pending CN117033117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310820620.9A CN117033117A (en) 2023-07-05 2023-07-05 Real-time service monitoring management method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117033117A true CN117033117A (en) 2023-11-10

Family

ID=88640285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310820620.9A Pending CN117033117A (en) 2023-07-05 2023-07-05 Real-time service monitoring management method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117033117A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716206A (en) * 2013-12-30 2014-04-09 中国烟草总公司湖南省公司 Service system operation monitoring method and server
CN112148561A (en) * 2020-09-28 2020-12-29 建信金融科技有限责任公司 Service system running state prediction method and device and server
CN113986649A (en) * 2021-09-27 2022-01-28 湖南麒麟信安科技股份有限公司 System monitoring device and method based on prometheus service
CN114302350A (en) * 2021-12-30 2022-04-08 胜斗士(上海)科技技术发展有限公司 Service provider fault switching method and device, electronic equipment and storage medium
CN115586932A (en) * 2022-10-14 2023-01-10 西安雷风电子科技有限公司 Dynamic modification and effect-taking system and method for prometheus rule configuration file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination