CN112910733A - Full link monitoring system and method based on big data - Google Patents

Publication number
CN112910733A
Authority
CN
China
Prior art keywords
data
module
node module
repair
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110127207.5A
Other languages
Chinese (zh)
Inventor
傅义平
孙稳
周兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Huaxing Digital Technology Co Ltd
Original Assignee
Shanghai Huaxing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Huaxing Digital Technology Co Ltd filed Critical Shanghai Huaxing Digital Technology Co Ltd
Priority to CN202110127207.5A priority Critical patent/CN112910733A/en
Publication of CN112910733A publication Critical patent/CN112910733A/en
Withdrawn legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Abstract

The application provides a big data-based full link monitoring system and a big data-based full link monitoring method. In use, a data acquisition module is embedded in an application link (for example, a microservice) and collects service data from each node module. The collected service data is sent to an observation analysis module, which aggregates it using a preset calculation mode to obtain state data. An alarm module generates alarm information from the state data, grades the state data into a plurality of fault levels according to a preset grading, and executes the alarm operation corresponding to each fault level, for example notifying the relevant personnel by email, short message, or telephone. A system self-healing repair module then performs system repair on the application link according to the alarm information.

Description

Full link monitoring system and method based on big data
Technical Field
The application relates to the technical field of application link monitoring, in particular to a full link monitoring system and method based on big data.
Background
Although the microservice application link architecture is relatively new, many enterprises have adopted it because it suits the fast-moving, agile development culture of today's Internet. In such architectures, service data of the application link, such as response duration, throughput, and working condition data, must be monitored; only then can the working condition of the application link be known and effective measures be taken. While the microservice architecture brings flexibility, extensibility, scalability, and high availability, its complexity poses a major challenge to monitoring, the most important part of operation and maintenance work: how to monitor the service data of every node of the application link and apply the corresponding repair to the application link.
Disclosure of Invention
In view of this, the present application provides a system and a method for monitoring a full link based on big data, which solves the problem that the current application link is difficult to monitor and repair.
In a first aspect, the present application provides a big data-based full link monitoring system, including: a data acquisition module, connected to the application link and configured to: collect service data of the application link; an observation analysis module, communicatively connected to the data acquisition module and configured to: receive the service data and perform an aggregation operation on the service data to obtain state data; an alarm module, communicatively connected to the observation analysis module and configured to: receive the state data, generate alarm information from the state data, and grade the state data into a plurality of fault levels to perform the corresponding alarm operation; and a system self-healing repair module, communicatively connected to the alarm module and configured to: receive the alarm information and perform system repair on the application link based on the alarm information.
With reference to the first aspect, in a possible implementation manner, the observation analysis module is further configured to: classifying the service data according to the dimension index; and deriving the status data from the result of the classification.
With reference to the first aspect, in a possible implementation manner, the observation analysis module is further configured to: and distributing the service data to a data stream server, and classifying the service data in the data stream server according to the dimension indexes.
With reference to the first aspect, in a possible implementation manner, the system self-healing repair module is further configured to: executing corresponding repair operation on the application link according to a preset repair strategy based on the alarm information; or obtaining a new repair strategy by adopting machine learning based on the preset repair strategy, and executing repair operation on the application link according to the new repair strategy.
With reference to the first aspect, in a possible implementation manner, the system further includes: a cluster control module, connected to the application link and communicatively connected to the system self-healing repair module, and configured to: call different application program interfaces of the application link; wherein the system self-healing repair module is further configured to: control the cluster control module to call an application program interface so as to execute a repair operation on the application link.
With reference to the first aspect, in a possible implementation manner, the acquiring service data of the application link includes: accessing each service node module in the application link; collecting the service data in the service node module in a fully decoupled manner; and summarizing the service data and sending the service data to the observation analysis module in batches.
With reference to the first aspect, in a possible implementation manner, the big data-based full link monitoring system is applied to remote control, wherein the service node modules of the remote control comprise: a login node module, a batch processing node module, a gateway node module, a first middleware node module, and a second middleware node module; wherein the login node module is configured to: generate work order information according to a user instruction; and receive the working condition data forwarded by the second middleware node module, judge the result, record the judgment result in a work order table, and display the judgment result to the user; the batch processing node module is communicatively connected to the login node module and is configured to: receive the work order information, detect whether the device is online, and if so, push an online message to the user; the gateway node module is communicatively connected to the login node module and to the device, and is configured to: receive the work order information, send the work order information to the device, and collect working condition data of the device; the first middleware node module is communicatively connected to the gateway node module and is configured to: receive and consume the working condition data, and filter and check the working condition data; the second middleware node module is communicatively connected to the first middleware node module and is configured to: receive the working condition data filtered and checked by the first middleware node module, push the working condition data to the user, and forward the working condition data to the login node module; and the data acquisition module is connected to each service node module of the remote control and collects the service data in the service node modules.
In a second aspect, the present application provides a big data-based full link monitoring method, which is applied to a big data-based full link monitoring system, and includes the steps of: collecting service data of an application link; performing aggregation operation on the service data to obtain state data; calculating the state data to generate alarm information and carrying out alarm operation; and performing system repair on the application link based on the alarm information.
In a third aspect, the present application provides an electronic device, comprising: a processor; and a memory for storing the processor-executable instructions; the processor is used for executing the big data-based full link monitoring method.
In a fourth aspect, the present application provides a computer-readable storage medium, where the storage medium stores a computer program for executing the above big data-based full link monitoring method.
In use, the data acquisition module is embedded in an application link, which may be, for example, a microservice, and collects service data from each node module of the microservice. The collected service data is sent to the observation analysis module, which aggregates it using a preset calculation mode to obtain state data. The alarm module generates alarm information from the state data, grades the state data into a plurality of fault levels according to a preset grading, and executes the alarm operation corresponding to each fault level, for example notifying the relevant personnel by email, short message, or telephone. The system self-healing repair module then performs system repair on the application link according to the alarm information.
Drawings
Fig. 1 is a schematic block diagram of a big data based full link monitoring system according to an embodiment of the present disclosure.
Fig. 2 is a schematic block diagram of a big data based full link monitoring system according to another embodiment of the present application.
Fig. 3 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to another embodiment of the present application.
Fig. 5 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to another embodiment of the present application.
Fig. 6 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to another embodiment of the present application.
Fig. 7 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to another embodiment of the present application.
Fig. 8 is a schematic diagram illustrating method steps of a big data-based full link monitoring method according to another embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In some embodiments, as shown in fig. 1, the present application provides a big data based full link monitoring system, including: the system comprises a data acquisition module 101, an observation and analysis module 102, an alarm module 103 and a system self-healing repair module 104; wherein, the data acquisition module 101 is connected with the application link 105, and is configured to: collecting service data of the application link 105; the observation analysis module 102 is communicatively connected to the data acquisition module 101, and is configured to: receiving service data, and performing aggregation operation on the service data to obtain state data; the alarm module 103 is communicatively connected to the observation and analysis module 102, and is configured to: receiving state data, calculating the state data to generate alarm information, and grading the state data into a plurality of fault grades to perform corresponding alarm operation; the system self-healing repair module 104 is in communication connection with the alarm module 103, and is configured to: alarm information is received and system repair is performed on the application link 105 based on the alarm information.
In use, the data acquisition module 101 is embedded in the application link 105, which may be, for example, a microservice, and collects service data from each node module of the microservice. The collected service data is sent to the observation analysis module 102, which aggregates it using a preset calculation mode to obtain state data. The alarm module 103 then generates alarm information from the state data, grades the state data into a plurality of fault levels according to a preset grading, and executes the alarm operation corresponding to each fault level, for example notifying the relevant personnel by email, short message, or telephone. The system self-healing repair module 104 performs system repair on the application link 105 according to the alarm information.
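The data flow between the four modules can be illustrated with a minimal Python sketch. It is not part of the application: the function `monitor_once`, the `Alarm` record, and the callables passed in are hypothetical names, and each callable merely stands in for one module.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alarm:
    state: str        # state data that triggered the alarm
    fault_level: str  # level assigned by the alarm module


def monitor_once(
    collect: Callable[[], list],
    aggregate: Callable[[list], Optional[str]],
    grade: Callable[[str], str],
    notify: Callable[[Alarm], None],
    repair: Callable[[Alarm], None],
) -> Optional[Alarm]:
    """Run one pass: collect, aggregate, alarm, then self-healing repair."""
    service_data = collect()              # data acquisition module 101
    state = aggregate(service_data)       # observation analysis module 102
    if state is None:                     # assumption: None means "healthy"
        return None
    alarm = Alarm(state=state, fault_level=grade(state))  # alarm module 103
    notify(alarm)
    repair(alarm)                         # system self-healing repair module 104
    return alarm
```

Passing the modules in as callables keeps the sketch independent of any particular collection agent, alarm channel, or repair mechanism.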
In some embodiments, the observation analysis module is further configured to: classifying the service data according to the dimension indexes; and obtaining status data according to the classification result.
In use, this embodiment implements the aggregation operation by classification. The dimension indexes may be, for example, request time, success status, and other tags; the aggregation calculation may be, for example, averaging, taking the maximum, taking the minimum, summing, or computing a proportion. The state data is obtained once the service data has been aggregated, and it represents the operating state of the application link. For example, a failure detected by the data acquisition module in steps 1, 2, and 3 of the application link is represented by first state data; a failure in steps 4, 5, and 6 is represented by second state data; and a failure in steps 7, 8, and 9 is represented by third state data. Different state data may correspond to different alarm information, and the system self-healing repair module then executes the repair operation corresponding to each piece of alarm information.
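The aggregation by dimension index described above can be sketched as follows. This is only an illustration: the record fields `node`, `duration_ms`, and `success` and the function name `aggregate_by_dimension` are assumptions, not names from the application.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_dimension(records, dimension):
    """Group service data by one dimension index and aggregate each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record)

    result = {}
    for key, rows in groups.items():
        durations = [row["duration_ms"] for row in rows]
        result[key] = {
            "avg_ms": mean(durations),      # average calculation
            "max_ms": max(durations),       # maximum value
            "min_ms": min(durations),       # minimum value
            "total_ms": sum(durations),     # addition
            # proportion of successful requests in the group
            "success_ratio": sum(1 for row in rows if row["success"]) / len(rows),
        }
    return result
```

Grouping by a different field, such as a hypothetical `step` tag, would yield per-step figures from which the state data could be derived.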
In some embodiments, the observation analysis module is further configured to: and distributing the service data to a data stream server, and classifying the service data in the data stream server according to the dimension indexes.
In use, this embodiment may employ a dynamic language for the classification. The service data sent by the data acquisition module is distributed to a data stream server, where it is classified according to the dimension indexes using the dynamic language so as to obtain the state data.
In some embodiments, the system self-healing repair module is further configured to: executing corresponding repair operation on the application link according to a preset repair strategy based on the alarm information; or, obtaining a new repair strategy by adopting machine learning based on a preset repair strategy, and executing repair operation on the application link according to the new repair strategy.
In use, this embodiment may preset the correspondence between alarm information and repair strategies; for example, first alarm information corresponds to a restart strategy, second alarm information corresponds to a capacity expansion strategy, and third alarm information corresponds to a no-action strategy. Alternatively, a new repair strategy can be obtained from the preset repair strategies through machine learning. Once the repair strategy is determined, the relevant interface of the application link is called to restart the application link, expand its capacity, or take no action.
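A preset correspondence of this kind amounts to a lookup table. The sketch below is illustrative only: the alarm identifiers, the `RepairPolicy` enum, and the choice to fall back to no action for an unknown alarm are assumptions, and the machine-learning alternative is not modeled.

```python
from enum import Enum


class RepairPolicy(Enum):
    RESTART = "restart"
    SCALE_OUT = "scale_out"   # capacity expansion
    NO_ACTION = "no_action"


# Hypothetical preset correspondence between alarm information and strategies.
PRESET_POLICIES = {
    "alarm-1": RepairPolicy.RESTART,
    "alarm-2": RepairPolicy.SCALE_OUT,
    "alarm-3": RepairPolicy.NO_ACTION,
}


def select_policy(alarm_id, presets=PRESET_POLICIES):
    """Return the preset strategy; unknown alarms default to no action (assumption)."""
    return presets.get(alarm_id, RepairPolicy.NO_ACTION)
```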
In some embodiments, the alert module is further configured to: and classifying the state data into a plurality of fault levels according to preset classification, and executing corresponding alarm operation according to the fault levels.
In use, a preset grading maps different state data to different fault levels; once the fault level has been derived from the state data, the corresponding alarm operation is performed (for example, notifying staff by telephone, short message, or email). For example, the highest fault level may be reported to the relevant personnel by telephone, the middle fault level by short message, and the low fault level by email. This flexible notification scheme keeps fault notification timely and shortens fault duration.
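The grading and notification scheme can be illustrated with two lookup tables. The level names, the state identifiers in the usage example, and the function `alarm_operation` are hypothetical; only the telephone, short message, and email pairing comes from the text.

```python
# Notification channel per fault level, following the example in the text.
NOTIFY_CHANNEL = {
    "high": "telephone",
    "medium": "short message",
    "low": "email",
}


def alarm_operation(state, grading):
    """Map state data to a fault level, then to a notification channel.

    `grading` is the preset grading: a dict from state data to fault level.
    """
    level = grading[state]
    return level, NOTIFY_CHANNEL[level]
```

For instance, with a preset grading of `{"state-1": "high", "state-2": "medium"}`, `alarm_operation("state-2", grading)` selects the short message channel.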
In some embodiments, the big data-based full link monitoring system further comprises: a cluster control module, connected to the application link and communicatively connected to the system self-healing repair module, and configured to: call different application program interfaces of the application link; wherein the system self-healing repair module is further configured to: control the cluster control module to call an application program interface so as to execute a repair operation on the application link.
When the embodiment is used, different Application Program Interfaces (APIs) of the application link can be called by using the cluster control module, so that different repair operations are performed on the application link. After the repair strategy is determined, the system self-healing repair module can call the cluster control module, so as to call an application program interface of the application link to realize corresponding repair operation.
In some embodiments, collecting service data of the application link comprises the following procedures: accessing each service node module in the application link; acquiring service data in a service node module in a completely decoupled manner; and summarizing the service data and sending the service data to the observation analysis module in batches.
In use, the data acquisition module may adopt a non-invasive agent mode: an agent module accesses each node module to collect its service data. The data acquisition module is therefore transparent to the service code and completely decoupled from each node module of the application link. The data acquisition module summarizes the service data in a circular buffer queue and sends the summarized service data to the observation analysis module in batches, which reduces the number of interactions and hence the network overhead. The data may be transmitted over a gRPC-based protocol, which makes the transmission fast and reliable.
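The buffering and batch sending can be sketched as below. This is a simplified illustration, not the implementation in the application: the class `BatchingCollector` and its parameters are hypothetical, a bounded `collections.deque` stands in for the circular buffer queue, and a plain callback `send` replaces the gRPC transport.

```python
from collections import deque


class BatchingCollector:
    """Buffer service data in a bounded queue and send it in batches."""

    def __init__(self, send, capacity=1000, batch_size=100):
        # With maxlen set, the deque drops the oldest entry when full,
        # which is the overwrite behaviour of a circular buffer.
        self._buffer = deque(maxlen=capacity)
        self._send = send              # stand-in for the gRPC call (assumption)
        self._batch_size = batch_size  # should not exceed capacity

    def record(self, item):
        self._buffer.append(item)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send everything buffered so far as one batch."""
        if not self._buffer:
            return
        batch = list(self._buffer)
        self._buffer.clear()
        self._send(batch)
```

Sending one batch per `batch_size` records instead of one message per record is what reduces the number of interactions with the observation analysis module.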
In some embodiments, as shown in fig. 2, the big data-based full link monitoring system is applied to remote control, wherein the service node modules of the remote control include: a login node module 201, a batch processing node module 202, a gateway node module 203, a first middleware node module 204, and a second middleware node module 205; wherein the login node module 201 is configured to: generate work order information according to a user instruction; and receive the working condition data forwarded by the second middleware node module 205, judge the result, record the judgment result in the work order table, and display the judgment result to the user; the batch processing node module 202 is communicatively connected to the login node module 201 and is configured to: receive the work order information, detect whether the device 206 is online, and if so, push an online message to the user; the gateway node module 203 is communicatively connected to the login node module 201 and to the device 206, and is configured to: receive the work order information, send the work order information to the device 206, and collect the working condition data of the device 206; the first middleware node module 204 is communicatively connected to the gateway node module 203 and is configured to: receive and consume the working condition data, and filter and check the working condition data; the second middleware node module 205 is communicatively connected to the first middleware node module 204 and is configured to: receive the working condition data filtered and checked by the first middleware node module 204, push the working condition data to the user, and forward the working condition data to the login node module 201; and the data acquisition module is connected to each service node module of the remote control and collects the service data in the service node modules.
When the system is used, the remote control is used as an application link, the data acquisition module acquires data of the remote control, and full link monitoring of the remote control is achieved. Specifically, the complete process of issuing the remote control command includes:
1) a user accesses from the outside and issues an instruction through the login node module 201 to generate work order information for controlling the equipment 206;
2) the login node module 201 pushes the work order information to the batch processing node module 202;
3) the batch processing node module 202 communicates with the gateway node module 203, detects whether the device 206 is online at intervals (e.g., 2 minutes) through the gateway node module 203, and if so, sends an online message to the second middleware node module 205, and the second middleware node module 205 pushes the online message to the user so that the user knows that the device 206 is online;
4) after the gateway node module 203 receives the work order information, if the device 206 is connected to the gateway, the work order information is issued to the device 206, otherwise, the issuing fails;
5) the equipment 206 receives the work order information and executes work corresponding to the work order information;
6) the device 206 uploads the condition data to the gateway node module 203;
7) the gateway node module 203 writes the working condition data into a buffer database corresponding to the first middleware node module 204;
8) the first middleware node module 204 receives and consumes the operating condition data;
9) the first middleware node module 204 filters the working condition data and checks whether working condition data relating to the remote control is present, then pushes the working condition data to the second middleware node module 205, and the second middleware node module 205 pushes the working condition data to the user;
10) the second middleware node module 205 sends the working condition data to the login node module 201, and the login node module 201 judges the result from the working condition data;
11) the login node module 201 records the judgment result in the work order table and displays the judgment result to the user.
Based on this remote control instruction issuing process, the data acquisition module sets instrumentation points and collects service data in the login node module 201, the batch processing node module 202, the first middleware node module 204, and the gateway node module 203. The data acquisition module can generally use agent access to instrument each node module and collect data, and the observation analysis module 102 performs aggregation analysis on the service data. The aggregation analysis may work, for example, as follows: a failure in steps 1-5 represents first state information, a failure in steps 6-8 represents second state information, and a failure in steps 9-11 represents third state information; other combinations of failed steps corresponding to other state information can also be predefined.
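The example mapping from a failed step to state information can be written as a small range lookup. The sketch assumes that the observation analysis module has already determined which step failed; the names `STATE_RANGES` and `classify_failure` are hypothetical.

```python
# Step ranges follow the example above: steps 1-5, 6-8, and 9-11.
STATE_RANGES = [
    (range(1, 6), "first state information"),
    (range(6, 9), "second state information"),
    (range(9, 12), "third state information"),
]


def classify_failure(failed_step):
    """Return the state information for the step at which the flow failed."""
    for steps, state in STATE_RANGES:
        if failed_step in steps:
            return state
    raise ValueError(f"step {failed_step} is outside the 11-step flow")
```

Other predefined combinations of failed steps would be added as further entries in the table.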
In a specific embodiment, the data acquisition module, the observation analysis module 102, the alarm module 103, and the system self-healing repair module 104 may work as follows:
if a remote control instruction completes all of steps 1-11 within 5 minutes, the instruction issued by the user has been executed successfully;
if the device 206 is not online, execution proceeds only as far as step 2, and steps 3-11 are executed after the device 206 comes online. Steps 1-5 issue the work order information corresponding to the user instruction to the device 206, and steps 6-11 feed back the result of the device 206 executing the work order information;
when execution of the work order information fails, the data acquisition module collects the service data, and the observation analysis module 102 aggregates the service data to determine which step failed and generates an alarm that is pushed to the alarm module 103;
if any of steps 1-5 fails, the alarm module 103 marks the alarm as an emergency type, marks the grade as the highest grade, and notifies the relevant personnel by telephone;
if any of steps 6-11 fails, the alarm module 103 marks the alarm as a general type, marks the grade as the highest grade, and notifies the relevant personnel by short message;
the system self-healing repair module 104 selects a repair strategy for self-healing according to the operating state of the application link at the time the work order information fails. For example, if the CPU utilization of the corresponding application link has reached 100% when the work order information fails, a capacity expansion strategy is selected and the application link is expanded. If the heartbeat detection of the corresponding application link receives no reply when the work order information fails, a restart strategy is selected and the application link is restarted through the cluster control module 106.
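The strategy selection in the last item can be sketched as follows. The class `RecordingClusterControl` is a hypothetical stand-in for the cluster control module 106 that records calls instead of invoking real application program interfaces, and checking the heartbeat before the CPU utilization is an assumption, since the text does not say which condition takes precedence.

```python
class RecordingClusterControl:
    """Hypothetical stand-in for the cluster control module: records calls only."""

    def __init__(self):
        self.calls = []

    def restart(self, link):
        self.calls.append(("restart", link))

    def scale_out(self, link):
        self.calls.append(("scale_out", link))


def self_heal(link, cpu_utilization, heartbeat_ok, cluster):
    """Choose and apply a repair strategy from the link's operating state."""
    if not heartbeat_ok:           # no heartbeat reply: restart strategy
        cluster.restart(link)
        return "restart"
    if cpu_utilization >= 1.0:     # CPU at 100%: capacity expansion strategy
        cluster.scale_out(link)
        return "scale_out"
    return "no_action"             # neither condition met: leave the link alone
```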
In some embodiments, as shown in fig. 3, the present application provides a big data-based full link monitoring method applied to a big data-based full link monitoring system, including the steps of:
step 301, collecting service data of an application link;
step 302, performing an aggregation operation on the service data to obtain state data;
step 303, generating alarm information from the state data and performing an alarm operation; and
step 304, performing system repair on the application link based on the alarm information.
When the method is used, the service data of the application link is collected, the state data is obtained through calculation from the service data, the alarm is performed according to the state data, and finally the corresponding system repair is performed on the application link, so that the monitoring on the application link is completed.
In some embodiments, as shown in fig. 4, in the step of collecting service data of the application link, the method further includes the steps of:
step 401, accessing each service node module in the application link;
step 402, collecting the service data in the service node modules in a completely decoupled manner; and
step 403, summarizing the service data and sending the service data to the observation analysis module in batches.
In this embodiment, a non-invasive agent mode, such as a Java agent, can be used to collect data from each service node module. The monitoring logic is inserted by bytecode modification, so the data acquisition module is transparent to the service code and complete decoupling can be achieved. The service data is summarized in a circular buffer queue and the summarized service data is sent in batches, which reduces the number of interactions and hence the network overhead. The data may be transmitted over a gRPC-based protocol, which makes the transmission fast and reliable.
In some embodiments, as shown in fig. 5, in the step of deriving the state data from the service data aggregation operation, the method further includes the steps of:
step 501, receiving the service data sent in batches;
step 502, distributing the service data to a data stream server;
step 503, classifying the service data in the data stream server according to the dimension indexes; and
step 504, obtaining the state data according to the classification result.
In use, this embodiment performs the aggregation operation in a dynamic language. The dimension indexes may be, for example, request time, success status, and other tags; the aggregation calculation may be, for example, averaging, taking the maximum, taking the minimum, summing, or computing a proportion. The state data is obtained once the service data has been aggregated. The aggregation analysis may work, for example, as follows: a failure in steps 1-5 of the process in the application link represents first state information, a failure in steps 6-8 represents second state information, and a failure in steps 9-11 represents third state information; other combinations of failed steps corresponding to other state information can also be predefined.
In some embodiments, as shown in fig. 6, the step of calculating the state data to generate alarm information and performing the alarm operation further includes the following steps:
step 601, grading the state data into a plurality of fault levels according to preset grades; and
step 602, executing the corresponding alarm operation according to the fault level.
In use, a preset grading maps different state data to different fault levels. Once the fault level has been determined from the state data, the corresponding alarm operation is performed, such as notifying staff by telephone, short message, or email. For example, the highest fault level may be notified by telephone, a medium fault level by short message, and a low fault level by email. This flexible notification scheme keeps fault notification timely and shortens the fault duration.
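The grading and notification described above might look like the following sketch. The thresholds in grade, the use of a success ratio as the graded state value, and the channel functions (which only record a message here instead of placing a call or sending a message) are hypothetical; the text only fixes the idea that each fault level has its own notification channel.

```python
from enum import Enum
from typing import Callable, Dict, List


class FaultLevel(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def grade(success_ratio: float) -> FaultLevel:
    """Grade state data into a fault level (thresholds are assumptions)."""
    if success_ratio < 0.5:
        return FaultLevel.HIGH
    if success_ratio < 0.9:
        return FaultLevel.MEDIUM
    return FaultLevel.LOW


# Stand-ins for real telephone / short message / email gateways.
notifications: List[str] = []

CHANNELS: Dict[FaultLevel, Callable[[str], None]] = {
    FaultLevel.HIGH: lambda msg: notifications.append(f"phone: {msg}"),
    FaultLevel.MEDIUM: lambda msg: notifications.append(f"sms: {msg}"),
    FaultLevel.LOW: lambda msg: notifications.append(f"email: {msg}"),
}


def alarm(node: str, success_ratio: float) -> FaultLevel:
    """Grade the state data and notify through the channel for that level."""
    level = grade(success_ratio)
    CHANNELS[level](f"{node} success ratio {success_ratio:.2f}")
    return level


alarm("gateway", 0.30)
alarm("login", 0.75)
alarm("batch", 0.99)
```

Keeping the level-to-channel mapping in a table means a new level or channel can be added without touching the grading logic.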
In some embodiments, as shown in fig. 7, the step of performing system repair on the application link based on the alarm information further includes the following steps:
step 701, calling a corresponding repair strategy according to the alarm information;
step 702, calling a corresponding application program interface according to the repair strategy; and
step 703, controlling the called application program interface to execute system repair on the application link.
In this embodiment, the correspondence between different alarm information and different repair strategies is generally preset. For example, first alarm information corresponds to a restart strategy, second alarm information corresponds to a capacity expansion strategy, and third alarm information corresponds to a no-action strategy; other correspondences are also possible.
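Steps 701 to 703 amount to two table lookups followed by a call, which can be sketched as below. The alarm names, strategy names, and the functions restart_service, scale_out, and no_action are placeholders invented for this illustration; a real system would call the application program interfaces of the application link instead of appending to a list.

```python
from typing import Callable, Dict, List

# Record of the interface calls made, so the sketch has an observable effect.
calls: List[str] = []


def restart_service(node: str) -> None:
    calls.append(f"restart:{node}")


def scale_out(node: str) -> None:
    calls.append(f"scale_out:{node}")


def no_action(node: str) -> None:
    calls.append(f"no_action:{node}")


# Step 701: preset correspondence between alarm information and repair strategy.
STRATEGY_FOR_ALARM: Dict[str, str] = {
    "first_alarm": "restart",
    "second_alarm": "scale_out",
    "third_alarm": "no_action",
}

# Step 702: correspondence between repair strategy and application program interface.
API_FOR_STRATEGY: Dict[str, Callable[[str], None]] = {
    "restart": restart_service,
    "scale_out": scale_out,
    "no_action": no_action,
}


def repair(alarm_type: str, node: str) -> str:
    """Look up the strategy, call the matching interface (step 703), return the strategy."""
    # Unknown alarm types fall back to taking no action (an assumption).
    strategy = STRATEGY_FOR_ALARM.get(alarm_type, "no_action")
    API_FOR_STRATEGY[strategy](node)
    return strategy


repair("first_alarm", "gateway")
repair("second_alarm", "login")
repair("unknown_alarm", "batch")
```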
In some embodiments, as shown in fig. 8, the step of performing system repair on the application link based on the alarm information further includes the following steps:
step 801, calling a corresponding repair strategy according to the alarm information;
step 802, training by adopting machine learning and based on the called repair strategy to obtain a newly-built repair strategy;
step 803, calling a corresponding application program interface according to the newly-built repair strategy; and
step 804, controlling the called application program interface to execute system repair on the application link.
In use, a new repair strategy can be obtained through machine learning based on the preset repair strategy. After the repair strategy is determined, the relevant interface of the application link is called to restart the application link, expand its capacity, or take no action. Obtaining new repair strategies through machine learning makes the application more extensible and suitable for more application scenarios.
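The text does not specify which machine learning technique is used. Purely as an illustration of how a preset strategy could be refined from experience, the sketch below swaps in a very simple stand-in: it records whether each repair attempt succeeded and, once enough samples exist, prefers the strategy with the highest observed success rate. The class StrategyLearner, the min_samples parameter, and all alarm and strategy names are assumptions, not part of the described method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Preset correspondence, used until enough outcomes have been observed.
PRESET: Dict[str, str] = {"first_alarm": "restart", "second_alarm": "scale_out"}


class StrategyLearner:
    """Success-rate based strategy selection (a stand-in for machine learning)."""

    def __init__(self, preset: Dict[str, str], min_samples: int = 3) -> None:
        self._preset = dict(preset)
        self._min_samples = min_samples
        # (alarm_type, strategy) -> [successes, attempts]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, alarm_type: str, strategy: str, repaired: bool) -> None:
        """Store the outcome of one repair attempt."""
        stats = self._stats[(alarm_type, strategy)]
        stats[0] += int(repaired)
        stats[1] += 1

    def choose(self, alarm_type: str) -> str:
        """Return the best observed strategy, or the preset one if data is scarce."""
        candidates = {
            strategy: successes / attempts
            for (alarm, strategy), (successes, attempts) in self._stats.items()
            if alarm == alarm_type and attempts >= self._min_samples
        }
        if not candidates:
            return self._preset.get(alarm_type, "no_action")
        return max(candidates, key=candidates.get)


learner = StrategyLearner(PRESET)
before = learner.choose("first_alarm")  # no data yet, so the preset applies
for _ in range(3):
    learner.record("first_alarm", "restart", repaired=False)
    learner.record("first_alarm", "scale_out", repaired=True)
after = learner.choose("first_alarm")
```

In this example the preset restart strategy is used first; after three failed restarts and three successful expansions for the same alarm, the learner selects expansion instead.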
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 9. Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 9, the electronic device 90 includes one or more processors 901 and memory 902.
The processor 901 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 90 to perform desired functions.
Memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory, or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 901 to implement the big data based full link monitoring method of the various embodiments of the present application above or other desired functions. Various contents such as service data may also be stored in the computer-readable storage medium.
In one example, the electronic device 90 may further include: an input device 903 and an output device 904, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 903 may include, for example, a keyboard, a mouse, and the like.
The output device 904 may output various information to the outside, including the determined service data, condition data, work order information, and the like. The output device 904 may include, for example, a display, a communication network, a remote output device connected thereto, and so forth.
Of course, for simplicity, only some of the components of the electronic device 90 relevant to the present application are shown in fig. 9, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 90 may include any other suitable components, depending on the particular application.
In addition to the above methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the big data based full link monitoring method according to various embodiments of the present application described in the present specification.
The computer program product may include program code for carrying out operations for embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps in the big data based full link monitoring method according to various embodiments of the present application.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to". It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modifications, equivalents and the like that are within the spirit and principle of the present application should be included in the scope of the present application.

Claims (10)

1. A big data-based full link monitoring system, comprising:
a data acquisition module connected with the application link and configured to: collecting service data of the application link;
an observation analysis module, communicatively coupled to the data acquisition module, configured to: receiving the service data, and carrying out aggregation operation on the service data to obtain state data;
the alarm module is in communication connection with the observation analysis module and is configured to: receiving the state data, calculating the state data to generate alarm information, and grading the state data into a plurality of fault grades to perform corresponding alarm operation; and
the system self-healing repair module is in communication connection with the alarm module and is configured to: receiving the alarm information, and performing system repair on the application link based on the alarm information.
2. The big-data based full-link monitoring system according to claim 1,
the observation analysis module is further configured to: classifying the service data according to the dimension index; and deriving the status data from the result of the classification.
3. The big-data based full-link monitoring system according to claim 2,
the observation analysis module is further configured to: distributing the service data to a data stream server, and classifying the service data in the data stream server according to the dimension indexes.
4. The big-data based full-link monitoring system according to claim 3,
the system self-healing repair module is further configured to: executing corresponding repair operation on the application link according to a preset repair strategy based on the alarm information; or obtaining a new repair strategy by adopting machine learning based on the preset repair strategy, and executing repair operation on the application link according to the new repair strategy.
5. The big-data based full-link monitoring system according to claim 4, further comprising:
the cluster control module is connected with the application link and is in communication connection with the system self-healing repair module, and is configured to: calling different application program interfaces of the application link;
wherein the system self-healing repair module is further configured to: calling an application program interface by controlling the cluster control module to execute repair operation on the application link.
6. The big-data based full-link monitoring system according to claim 5, wherein the collecting service data of the application link comprises:
accessing each service node module in the application link;
collecting the service data in the service node module in a fully decoupled manner; and
summarizing the service data and sending the service data to the observation analysis module in batches.
7. The big data based full link monitoring system according to claim 6, applied to a remote control;
wherein the service node module of the remote control comprises: the system comprises a login node module, a batch processing node module, a gateway node module, a first middleware node module and a second middleware node module;
wherein the login node module is configured to: generating work order information according to a user instruction; receiving the working condition data transferred by the second middleware node module, executing result judgment, recording a judgment result in a work order table and displaying the judgment result to a user;
the batch processing node module is in communication connection with the login node module and is configured to: receiving the work order information, detecting whether the equipment is on line or not, and if so, pushing an on-line message to a user;
the gateway node module is in communication connection with the login node module and the device respectively, and is configured to: receiving the work order information, sending the work order information to equipment, and collecting equipment working condition data;
the first middleware node module is communicatively coupled to the gateway node module and configured to: receiving and consuming the working condition data, and filtering and detecting the working condition data;
the second middleware node module is communicatively coupled to the first middleware node module and configured to: receiving the working condition data filtered and detected by the first middleware node module, pushing the working condition data to a user and transferring the working condition data to the login node module;
the data acquisition module is connected with each service node module of the remote control and acquires the service data in the service node module.
8. A big data-based full link monitoring method applied to the big data-based full link monitoring system of claim 1, comprising the steps of:
collecting service data of an application link;
performing aggregation operation on the service data to obtain state data;
calculating the state data to generate alarm information and carrying out alarm operation; and
performing system repair on the application link based on the alarm information.
9. An electronic device, characterized in that the electronic device comprises:
a processor; and
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the big data based full link monitoring method of claim 8.
10. A computer-readable storage medium, wherein the storage medium stores a computer program for executing the big data based full link monitoring method according to claim 8.
CN202110127207.5A 2021-01-29 2021-01-29 Full link monitoring system and method based on big data Withdrawn CN112910733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110127207.5A CN112910733A (en) 2021-01-29 2021-01-29 Full link monitoring system and method based on big data

Publications (1)

Publication Number Publication Date
CN112910733A true CN112910733A (en) 2021-06-04

Family

ID=76121246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110127207.5A Withdrawn CN112910733A (en) 2021-01-29 2021-01-29 Full link monitoring system and method based on big data

Country Status (1)

Country Link
CN (1) CN112910733A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710430A (en) * 2022-04-06 2022-07-05 深圳依时货拉拉科技有限公司 Bidirectional communication control method, computer readable storage medium and computer device
CN115051922A (en) * 2022-05-09 2022-09-13 中国联合网络通信集团有限公司 Link control method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000079466A2 (en) * 1999-06-23 2000-12-28 Visicu, Inc. Telemedical expert service provision for intensive care units
US20160284084A1 (en) * 2015-03-23 2016-09-29 Ohio State Innovation Foundation System and method for segmentation and automated measurement of chronic wound images
CN106802854A (en) * 2017-02-22 2017-06-06 郑州云海信息技术有限公司 A kind of failure monitoring system of multi controller systems
CN108494590A (en) * 2018-03-15 2018-09-04 苏州思必驰信息科技有限公司 A kind of big data data quality monitoring method and device end to end
CN108710545A (en) * 2018-03-23 2018-10-26 上海精鲲计算机科技有限公司 A kind of remote monitoring fault self-recovery system
US10217066B1 (en) * 2017-08-28 2019-02-26 Deere & Company Methods and apparatus to monitor work vehicles and to generate worklists to order the repair of such work vehicles should a machine failure be identified
CN109447290A (en) * 2018-11-21 2019-03-08 国网浙江电动汽车服务有限公司 A kind of charging station intelligent fault O&M method
CN110138600A (en) * 2019-04-28 2019-08-16 北京大米科技有限公司 A kind of prompt information output method, device, storage medium and server

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710430A (en) * 2022-04-06 2022-07-05 深圳依时货拉拉科技有限公司 Bidirectional communication control method, computer readable storage medium and computer device
CN115051922A (en) * 2022-05-09 2022-09-13 中国联合网络通信集团有限公司 Link control method and device, electronic equipment and storage medium
CN115051922B (en) * 2022-05-09 2023-07-18 中国联合网络通信集团有限公司 Link control method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210604

WW01 Invention patent application withdrawn after publication