CN110535713A - Monitoring management system and method for managing and monitoring - Google Patents

Monitoring management system and method for managing and monitoring

Info

Publication number
CN110535713A
Authority
CN
China
Prior art keywords
monitoring
queue
information
message queue
monitoring information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810509664.9A
Other languages
Chinese (zh)
Other versions
CN110535713B (en)
Inventor
杨猛
邵利铎
鹿慧
何栋
于灏
欧创新
王路远
王磊
刘松
王龙涛
刘皓
刘震
蔡雨佳
张娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PICC PROPERTY AND CASUALTY Co Ltd
Original Assignee
PICC PROPERTY AND CASUALTY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PICC PROPERTY AND CASUALTY Co Ltd
Priority to CN201810509664.9A
Publication of CN110535713A
Application granted
Publication of CN110535713B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H04L43/06 Generation of reports
    • H04L43/16 Threshold monitoring
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms

Abstract

The present application discloses a monitoring management system comprising a message queue cluster, a monitoring management platform, and a database. The message queue cluster includes at least one message queue gateway and multiple message queue nodes. A gateway monitoring data collection agent collects monitoring information of the message queue gateway and reports it to the monitoring management platform; a node monitoring data collection agent collects monitoring information of the message queue node and reports it to the monitoring management platform. The monitoring management platform analyzes the monitoring information to obtain a monitoring result and stores the monitoring information in the database. The database stores the monitoring information provided by the monitoring management platform. With this system, mechanisms such as information collection and analysis summarize the running status and health of the current message queue cluster, which greatly improves the operation and maintenance assurance of the system.

Description

Monitoring management system and method for managing and monitoring
Technical field
The present application relates to the field of computer technology, and in particular to a monitoring management system. The application also relates to a monitoring management method.
Background
Message queue middleware is a general-purpose message queue product that is widely used in scenarios such as data distribution and message interaction.
In the prior art, message-oriented middleware products (for example, IBM MQ) are often used for message interaction, but a complete monitoring management system for monitoring the operation of the message queue cluster is lacking. A main problem when using message queue technology is that a failure of the message queue cluster cannot be discovered in time, so the related operation and maintenance work cannot be carried out in time. For example, when messages pile up in a queue or the MQ cluster fails, these anomalies cannot be discovered promptly, which results in low reliability and low maintainability of the system.
The prior art therefore suffers from low reliability and low maintainability when message queue middleware products are used.
Summary of the invention
The present application provides a monitoring management system and a monitoring management method to solve the existing problems of low reliability and low maintainability when message queue middleware products are used.
The application provides a monitoring management system comprising: a message queue cluster, a monitoring management platform, and a database.
The message queue cluster includes at least one message queue gateway and multiple message queue nodes.
The message queue gateway distributes the messages of connected application systems to the message queue nodes according to the load of the message queue nodes. The message queue gateway runs a gateway monitoring data collection agent, which collects the monitoring information of the message queue gateway and reports the collected monitoring information to the monitoring management platform.
The message queue node receives the messages of the connected application systems provided by the message queue gateway and either processes those messages or stores them in the form of a message queue. The message queue node runs a node monitoring data collection agent, which collects the monitoring information of the message queue node and reports the collected monitoring information to the monitoring management platform.
The monitoring management platform obtains the monitoring information reported by the gateway monitoring data collection agent and the node monitoring data collection agent, analyzes the monitoring information to obtain a monitoring result, and stores the monitoring information in the database.
The database stores the monitoring information provided by the monitoring management platform.
Optionally, the gateway monitoring data collection agent is specifically configured to collect the monitoring information of the message queue gateway periodically and report the collected monitoring information to the monitoring management platform.
Optionally, the node monitoring data collection agent is specifically configured to collect the monitoring information of the message queue node periodically and report the collected monitoring information to the monitoring management platform.
Optionally, the monitoring management platform includes:
an early-warning submodule, configured to determine, according to the monitoring result for the monitoring information, whether an early warning is required and, if so, to perform early-warning processing;
a statistics and display submodule, configured to perform multi-dimensional statistics and display on the monitoring information in the database.
Optionally, the early-warning submodule is specifically configured to:
when the monitoring result for the monitoring information reaches the early-warning condition threshold set for that monitoring information, notify the system administrator by SMS or email; or issue an image or sound early warning or alarm.
Optionally, the early-warning submodule is further configured to set warning levels for the monitoring information, where different warning levels correspond to different early-warning condition thresholds set for the monitoring information.
Optionally, the statistics and display submodule includes:
an MQ cluster system operating-status statistics and display submodule, configured to collect statistics on and display the operating status of the MQ cluster system; or
a hardware status statistics and display submodule, configured to collect statistics on and display the hardware health status; or
a queue manager statistics and display submodule, configured to collect statistics on and display the queue managers.
Optionally, the MQ cluster system operating-status statistics and display submodule is specifically configured to display a topology diagram of the connected application business systems.
Optionally, the queue manager statistics and display submodule is specifically configured to:
collect statistics on and display the data count of a queue manager; or
collect statistics on and display the queue information in a queue manager; or
collect statistics on the numbers of put successes, put failures, get successes, and get failures of a queue manager; or
collect statistics on and display the put successes, put failures, get successes, and get failures of the queues in a queue manager.
Optionally, the statistics and display of the queue manager use at least one of the following modes:
monthly statistics, daily statistics, hourly statistics, per-minute statistics, historical statistics, custom statistics.
Optionally, the monitoring information includes at least one of the following:
queue manager state; message channel state; message queue information; error queue information; dead letter queue information; queue statistics.
Optionally, the queue statistics include queue data flow-direction information and/or data traffic.
The application further provides a monitoring management method applied to the above monitoring management system. The method includes:
a message queue gateway reporting its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent running on the message queue gateway;
a message queue node reporting its own monitoring information to the monitoring management platform through the node monitoring data collection agent running on the message queue node;
the monitoring management platform analyzing the monitoring information to obtain a monitoring result for the monitoring information, and storing the monitoring information in the database.
Optionally, the message queue gateway reporting its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent running on the message queue gateway includes:
the message queue gateway periodically reporting its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent running on the message queue gateway.
Optionally, the message queue node reporting its own monitoring information to the monitoring management platform through the node monitoring data collection agent running on the message queue node includes:
the message queue node periodically reporting its own monitoring information to the monitoring management platform through the node monitoring data collection agent running on the message queue node.
Optionally, the method further includes:
determining, according to the monitoring result for the monitoring information, whether an alarm or early warning is required.
Optionally, determining whether an alarm or early warning is required according to the monitoring result for the monitoring information includes:
judging whether the monitoring result for the monitoring information reaches the alarm condition set for that monitoring information and, if so, performing alarm processing; or
judging whether the monitoring result for the monitoring information reaches the early-warning condition threshold set for that monitoring information and, if so, performing early-warning processing.
Optionally, the method further includes:
setting warning levels for the monitoring information, where different warning levels correspond to different early-warning condition thresholds set for the monitoring information.
Optionally, the method further includes:
performing multi-dimensional statistics and display on the monitoring information in the database.
Optionally, performing multi-dimensional statistics and display on the monitoring information in the database includes:
collecting statistics on and displaying the operating status of the MQ cluster system; or
collecting statistics on and displaying the hardware health status; or
collecting statistics on and displaying the queue managers.
Optionally, collecting statistics on and displaying the operating status of the MQ cluster system includes:
displaying a topology diagram of the connected application business systems.
Optionally, collecting statistics on and displaying the queue managers includes:
collecting statistics on and displaying the data count of a queue manager; or
collecting statistics on and displaying the queue information in a queue manager; or
collecting statistics on the numbers of put successes, put failures, get successes, and get failures of a queue manager; or
collecting statistics on and displaying the put successes, put failures, get successes, and get failures of the queues in a queue manager.
Optionally, the statistics and display of the queue manager use at least one of the following modes:
monthly statistics, daily statistics, hourly statistics, per-minute statistics, historical statistics, custom statistics.
Optionally, the monitoring information includes at least one of the following:
queue manager state; message channel state; message queue information; error queue information; dead letter queue information; queue statistics.
Optionally, the queue statistics include queue data flow-direction information and/or data traffic.
Compared with the prior art, the present application has the following advantages:
In the monitoring management system and monitoring management method provided by the application, the message queue gateway and the message queue nodes report their own monitoring information to the monitoring management platform, and the platform analyzes the monitoring information to obtain a monitoring result. Problems in the message queue cluster can therefore be discovered promptly and handled accordingly. Mechanisms such as information collection and analysis summarize the running status and health of the current message queue cluster, which greatly improves the operation and maintenance assurance of the system.
Brief description of the drawings
Fig. 1 is a schematic diagram of a monitoring management system provided by the first embodiment of the application.
Fig. 2 is a schematic diagram, provided by the first embodiment of the application, of a monitoring agent obtaining monitoring information and sending the obtained monitoring information to the monitoring management platform.
Fig. 3 is a functional schematic diagram of a monitoring management platform provided by the first embodiment of the application.
Fig. 4 is a schematic diagram, provided by the first embodiment of the application, of the monitoring management platform sending warning information to a system administrator's mailbox.
Fig. 5 is a schematic diagram, provided by the first embodiment of the application, of collecting statistics on and displaying the data count of a queue manager.
Fig. 6 is a schematic diagram, provided by the first embodiment of the application, of collecting statistics on and displaying the queue messages in a queue manager.
Fig. 7 is a schematic diagram, provided by the first embodiment of the application, of collecting statistics on the numbers of put successes, put failures, get successes, and get failures of a queue manager.
Fig. 8 is a schematic diagram, provided by the first embodiment of the application, of the put successes, put failures, get successes, and get failures of each queue in a queue manager.
Fig. 9 is a schematic diagram, provided by the first embodiment of the application, of displaying the topology diagram of the connected application business systems.
Fig. 10 is a flowchart of a monitoring management method provided by the second embodiment of the application.
Detailed description
Many specific details are set out in the following description to provide a thorough understanding of the application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance; the application is therefore not limited to the specific implementations disclosed below.
The first embodiment of the application provides a monitoring management system. The message queue cluster in the first embodiment is described using an MQ (IBM MQ) cluster as an example. A detailed description follows with reference to Figs. 1 to 9.
The system comprises: a message queue cluster 101, a monitoring management platform 102, and a database 103.
The message queue cluster 101 includes at least one message queue gateway and multiple message queue nodes.
The message queue gateway distributes the messages of connected application systems to the message queue nodes according to the load of the message queue nodes. The message queue gateway runs a gateway monitoring data collection agent, which collects the monitoring information of the message queue gateway and reports the collected monitoring information to the monitoring management platform.
The message queue node receives the messages of the application systems provided by the message queue gateway and either processes those messages or stores them in the form of a message queue. The message queue node runs a node monitoring data collection agent, which collects the monitoring information of the message queue node and reports the collected monitoring information to the monitoring management platform.
It should be noted that the gateway monitoring data collection agent and the node monitoring data collection agent may be the same program or different programs.
As shown in Fig. 1, MQ gateway server 1 and MQ gateway server 2 are message queue gateways.
An MQ gateway server is a message queue gateway server in the MQ cluster. It serves as the gateway of the entire MQ cluster and mainly handles connection requests from applications: through a load-balancing mechanism, the message data of connected application systems is distributed to the MQ nodes by the message queue gateway server.
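The load-based distribution described above can be sketched as follows. This is only an illustrative sketch: the text does not specify the load-balancing policy, so the least-loaded rule, the use of queue length as the load measure, and the names QueueNode and Gateway are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class QueueNode:
    """Hypothetical stand-in for an MQ cluster node."""
    name: str
    queue: list = field(default_factory=list)

    @property
    def load(self) -> int:
        # Assumption: load is approximated by the number of waiting messages.
        return len(self.queue)


class Gateway:
    """Hypothetical gateway that forwards each message to the least-loaded node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def dispatch(self, message) -> QueueNode:
        # min() returns the first node among those with the smallest load.
        target = min(self.nodes, key=lambda node: node.load)
        target.queue.append(message)
        return target


nodes = [QueueNode("node1"), QueueNode("node2")]
gateway = Gateway(nodes)
for i in range(4):
    gateway.dispatch(f"msg-{i}")
```

With two initially empty nodes, the four messages alternate between them, so each node ends up holding two messages.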
The monitoring information includes hardware information, such as CPU utilization, disk usage, file size, processes, and network information. It also includes the following: queue manager state; message channel state; message queue information; error queue information; dead letter queue information; queue statistics; and so on. The queue statistics include queue data flow-direction information and/or data traffic information.
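One possible way to represent such a monitoring report is sketched here. The field names and the JSON serialization are assumptions made for illustration; the text lists the kinds of information collected but does not define a data format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HardwareInfo:
    cpu_percent: float
    memory_percent: float
    disk_percent: float


@dataclass
class QueueStats:
    name: str
    depth: int       # number of messages currently in the queue
    put_count: int   # messages put during the sampling period
    get_count: int   # messages taken out during the sampling period


@dataclass
class MonitoringInfo:
    source: str                 # gateway or node that produced the report
    queue_manager_state: str
    channel_state: str
    hardware: HardwareInfo
    queues: list = field(default_factory=list)
    dead_letter_depth: int = 0


info = MonitoringInfo(
    source="mq-gateway-1",
    queue_manager_state="RUNNING",
    channel_state="RUNNING",
    hardware=HardwareInfo(cpu_percent=41.0, memory_percent=63.5, disk_percent=70.2),
    queues=[QueueStats(name="Q1", depth=12, put_count=100, get_count=88)],
)

# asdict() converts nested dataclasses (including those inside lists) to dicts.
payload = json.dumps(asdict(info))
```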
As shown in Fig. 1, the gateway monitoring data collection agents (Agent) running on MQ gateway server 1 and MQ gateway server 2 collect monitoring information and report the collected monitoring information of the message queue gateway to the monitoring management platform. Preferably, the gateway monitoring data collection agent collects the monitoring information periodically and reports it to the monitoring management platform, so that the platform can handle abnormal conditions promptly based on the monitoring information and monitor the system in real time. The gateway monitoring data collection agent is a software program running on the message queue gateway; it collects various kinds of monitoring information from the gateway and sends them to the monitoring management platform. As shown in Fig. 2, the agent obtains monitoring information by calling API interfaces and sends it to the monitoring management platform server. System hardware information (memory, CPU, disk, and so on) is obtained mainly by querying the API interfaces of the operating system, while MQ cluster information is obtained through the SDK API interfaces provided by IBM MQ. The MQ data collected mainly include: MQ cluster, MQ queue manager, queue, queue depth, message channel, IP, port, the numbers of messages put and taken out, and time.
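A minimal sketch of such a periodic collection agent is given below, using only the Python standard library. The hardware query is limited to disk usage and CPU count, and collect_mq_info is a hypothetical stub: a real agent would call the IBM MQ SDK there, which is not shown. The report callback stands in for sending data to the monitoring management platform.

```python
import os
import shutil
import time


def collect_hardware_info(path="."):
    """Query the operating system for basic hardware information."""
    usage = shutil.disk_usage(path)
    disk_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {"disk_percent": disk_percent, "cpu_count": os.cpu_count()}


def collect_mq_info():
    # Hypothetical stub: placeholder values instead of real IBM MQ SDK calls.
    return {"queue_manager_state": "RUNNING", "queues": {}}


def run_agent(report, interval_seconds, rounds, sleep=time.sleep):
    """Collect and report monitoring information a fixed number of times."""
    for i in range(rounds):
        info = {"timestamp": time.time()}
        info.update(collect_hardware_info())
        info.update(collect_mq_info())
        report(info)
        if i < rounds - 1:
            sleep(interval_seconds)


# Usage: collect three samples and keep them in a list instead of sending them.
received = []
run_agent(received.append, interval_seconds=0, rounds=3)
```

Passing the reporting function as a parameter keeps the collection logic independent of the transport used to reach the platform.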
As shown in Fig. 1, MQ cluster node 1, MQ cluster node 2, MQ cluster node 3, and MQ cluster node 4 are message queue nodes.
The node monitoring data collection agents (Agent) running on MQ cluster nodes 1 to 4 collect the monitoring information of the message queue nodes and report it to the monitoring management platform. Preferably, the node monitoring data collection agent collects the monitoring information of the message queue node periodically and reports it to the monitoring management platform, so that the platform can handle abnormal conditions promptly based on the monitoring information and monitor the message queue nodes in real time. The node monitoring data collection agent is a software program running on the message queue node; it collects various kinds of monitoring information from the node and sends them to the monitoring management platform.
The monitoring management platform 102 obtains the monitoring information reported by the gateway monitoring data collection agent and the node monitoring data collection agent, analyzes the monitoring information to obtain a monitoring result, and stores the monitoring information in the database. Fig. 3 shows a functional schematic diagram of the monitoring management platform.
The monitoring management platform may include:
an alarm submodule, configured to raise alarms according to the monitoring result for the monitoring information;
an early-warning submodule, configured to determine, according to the monitoring result for the monitoring information, whether an early warning is required and, if so, to perform early-warning processing;
a statistics and display submodule, configured to perform multi-dimensional statistics and display on the monitoring information in the database.
The alarm submodule raises an alarm when the monitoring information is abnormal, that is, when the monitoring result for the monitoring information meets an alarm condition. For example, an alarm can be raised when the monitoring result meets any of the following conditions: memory usage exceeds 95%; CPU usage exceeds 95%; hard disk usage exceeds 95%; the dead letter queue contains data; a "message channel unavailable" message appears.
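The example alarm conditions listed above can be sketched as a simple rule check. The dictionary keys and the function name alarm_reasons are assumptions for illustration; only the conditions themselves come from the text.

```python
def alarm_reasons(info):
    """Return the list of alarm conditions met by one monitoring report."""
    reasons = []
    if info.get("memory_percent", 0) > 95:
        reasons.append("memory usage exceeds 95%")
    if info.get("cpu_percent", 0) > 95:
        reasons.append("CPU usage exceeds 95%")
    if info.get("disk_percent", 0) > 95:
        reasons.append("hard disk usage exceeds 95%")
    if info.get("dead_letter_depth", 0) > 0:
        reasons.append("dead letter queue contains data")
    if "message channel unavailable" in info.get("events", []):
        reasons.append("message channel unavailable")
    return reasons


healthy = {"memory_percent": 60, "cpu_percent": 30, "disk_percent": 50}
faulty = {
    "memory_percent": 97,
    "cpu_percent": 30,
    "disk_percent": 50,
    "dead_letter_depth": 4,
    "events": ["message channel unavailable"],
}
```

An empty list means no alarm is needed; any non-empty list can be passed on to the notification step.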
The early-warning submodule gives warnings in advance. When the monitoring result for the monitoring information reaches the early-warning condition threshold set for that monitoring information, the system administrator can be notified by SMS or email; alternatively, an image or sound early warning or alarm is issued.
Because different monitoring information corresponds to different monitored items, different early-warning condition thresholds can be set for different monitoring information. When the configured threshold is reached, early-warning or alarm processing is performed.
For example, if the early-warning threshold for a message queue is set to 90,000 messages, an early warning is required when the number of messages in that queue is greater than or equal to 90,000. If the memory early-warning threshold for an MQ node is set to a memory usage of 85%, an early warning is required when memory usage reaches 85%.
It should be noted that different early-warning thresholds can be set for different message queues. For example, for a message queue that is sensitive to real-time delivery, the threshold can be set to 10 messages; for a message queue that carries a large data volume, the threshold can be set to 2,000 messages.
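Per-queue thresholds of this kind can be sketched as a lookup table with a default. The queue names "realtime.queue" and "bulk.queue" are hypothetical; the threshold values 10, 2,000, and 90,000 and the "greater than or equal to" comparison follow the examples in the text.

```python
# Default threshold, taken from the 90,000-message example.
DEFAULT_THRESHOLD = 90_000

# Hypothetical queue names with the per-queue thresholds from the examples.
thresholds = {
    "realtime.queue": 10,   # queue sensitive to real-time delivery
    "bulk.queue": 2000,     # queue with a large data volume
}


def needs_warning(queue_name, depth):
    """An early warning is needed once the depth reaches the queue's threshold."""
    return depth >= thresholds.get(queue_name, DEFAULT_THRESHOLD)
```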
Preferably, the early-warning submodule is also used to set warning levels for the monitoring information, where different warning levels correspond to different early-warning condition thresholds set for that monitoring information. For example, different thresholds can be set for CPU utilization: thresholds of 70%, 80%, and 90% correspond to a level-1, level-2, and level-3 early warning respectively.
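The CPU example can be sketched as a mapping from utilization to warning level. Treating a value exactly at a threshold as having reached that level is an assumption, chosen to match the "greater than or equal to" wording used for queue thresholds; the function name is hypothetical.

```python
# (threshold in percent, warning level), checked from the highest level down.
CPU_LEVELS = [(90, 3), (80, 2), (70, 1)]


def cpu_warning_level(cpu_percent):
    """Return the warning level for a CPU utilization, or 0 if none applies."""
    for threshold, level in CPU_LEVELS:
        if cpu_percent >= threshold:
            return level
    return 0
```

Checking the highest threshold first ensures that a value such as 95% is reported as level 3 rather than level 1.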
Preferably, so that system administrators (including the administrator of this monitoring management system and the administrators of each connected application system) learn of early warnings or alarms in real time, the monitoring management platform can bind the mobile phone numbers and/or email addresses of the system administrators. Fig. 4 shows a schematic diagram of the monitoring management platform sending warning information to a system administrator's mailbox.
Preferably, advisory information is carried in the early-warning or alarm message.
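Building an email notification that carries advisory information could look like the following sketch. The addresses, subject format, and function name are assumptions; the message is only constructed here, and actually sending it (for example through an SMTP server) is left out.

```python
from email.message import EmailMessage


def build_warning_email(admin_address, item, value, threshold, advice=""):
    """Build (but do not send) an early-warning email for one monitored item."""
    msg = EmailMessage()
    msg["From"] = "mq-monitor@example.com"  # hypothetical sender address
    msg["To"] = admin_address
    msg["Subject"] = f"[MQ early warning] {item}"
    body = f"{item} is {value}, which reaches the threshold {threshold}."
    if advice:
        body += f"\nAdvice: {advice}"
    msg.set_content(body)
    return msg


mail = build_warning_email(
    "admin@example.com",
    "queue depth of Q1",
    100_500,
    100_000,
    advice="consider expanding capacity",
)
```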
The following scenario illustrates why early-warning or alarm processing matters when the configured early-warning threshold is reached.
For example, in a data exchange platform in the insurance industry, an MQ cluster is used to receive and forward messages. Suppose a message queue is set up for underwriting policies, with its early-warning threshold set to 100,000 messages. If the policies pending claims settlement are sent in one concentrated batch each day, they may put pressure on the underwriting-policy message queue. An early warning is issued when the number of messages in this queue is greater than or equal to 100,000. If early warnings have been issued for a continuous period (for example, 5 days), a capacity-expansion suggestion can be carried in the early-warning message, so that the system administrator can expand capacity in time, keep the system running normally, and improve its reliability.
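The rule of attaching a capacity-expansion suggestion after several consecutive days of warnings can be sketched as follows. The input format (one peak queue depth per day) and the function name are assumptions; the 100,000 threshold and the 5-day window follow the example.

```python
def expansion_advice(daily_peak_depths, threshold=100_000, days=5):
    """Return an advice string if the last `days` daily peaks all hit the threshold."""
    recent = daily_peak_depths[-days:]
    if len(recent) == days and all(depth >= threshold for depth in recent):
        return "Early warnings on {} consecutive days: consider expanding capacity.".format(days)
    return ""


steady = [40_000, 120_000, 50_000, 110_000, 105_000]
overloaded = [90_000, 101_000, 100_000, 130_000, 115_000, 102_000]
```

Only the most recent days are inspected, so an isolated peak earlier in the history does not trigger the suggestion.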
The statistics and display submodule includes:
an MQ cluster system operating-status statistics and display submodule, configured to collect statistics on and display the operating status of the MQ cluster system; or
a hardware status statistics and display submodule, configured to collect statistics on and display the hardware health status; or
a queue manager statistics and display submodule, configured to collect statistics on and display the queue managers.
The queue manager statistics and display submodule is specifically configured to:
collect statistics on and display the data count of a queue manager; or
collect statistics on and display the queue messages in a queue manager; or
collect statistics on the numbers of put successes, put failures, get successes, and get failures of a queue manager; or
collect statistics on and display the put successes, put failures, get successes, and get failures of the queues in a queue manager.
As shown in Fig. 5, the data count of each queue manager is counted and displayed; for example, the data count of one of the queue managers shown is 420573.
The statistics and display of the queue manager use at least one of the following modes:
monthly statistics, daily statistics, hourly statistics, per-minute statistics, historical statistics, custom statistics. As shown in Fig. 6, the queue messages in a queue manager are counted and displayed; for example, the queue messages of the queues QAREINS001, QAREINS001, QAREINS001 managed by the queue manager QMGWC can be counted and displayed by day or by minute.
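Statistics at different time granularities can be sketched by grouping records on a truncated timestamp. The record layout (timestamp, queue name, message count) and the sample values are hypothetical; only the granularities and the queue name QAREINS001 come from the text.

```python
from collections import Counter
from datetime import datetime

# strftime patterns that truncate a timestamp to each granularity.
FORMATS = {
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def count_by(records, granularity):
    """Sum message counts per (queue, time bucket) for the chosen granularity."""
    fmt = FORMATS[granularity]
    totals = Counter()
    for timestamp, queue, count in records:
        totals[(queue, timestamp.strftime(fmt))] += count
    return dict(totals)


# Hypothetical sample records: (timestamp, queue name, number of messages).
records = [
    (datetime(2018, 5, 24, 9, 0, 0), "QAREINS001", 5),
    (datetime(2018, 5, 24, 9, 0, 30), "QAREINS001", 7),
    (datetime(2018, 5, 24, 10, 15, 0), "QAREINS001", 3),
]
```

Custom statistics could be supported in the same way by filtering the records to a chosen time range before grouping.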
Fig. 7 shows a schematic diagram of the statistics on the numbers of put successes, put failures, get successes, and get failures of a queue manager. As shown in Fig. 7, the volume of data sent by the head office to each branch on the current day, and the volume of data received from each branch, can be counted and displayed. If the figures counted by the data sender are inconsistent with those counted by the data receiver, it can be concluded that the business data are inconsistent.
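The sender-versus-receiver comparison can be sketched as a simple reconciliation of two count tables. The branch names, counts, and the function name are hypothetical.

```python
def find_mismatches(sent, received):
    """Return {branch: (sent_count, received_count)} where the two counts differ."""
    mismatches = {}
    for branch in sorted(set(sent) | set(received)):
        sent_count = sent.get(branch, 0)
        received_count = received.get(branch, 0)
        if sent_count != received_count:
            mismatches[branch] = (sent_count, received_count)
    return mismatches


# Hypothetical daily counts: what the head office sent and what branches received.
sent_by_head_office = {"branch_a": 1200, "branch_b": 830, "branch_c": 45}
received_by_branches = {"branch_a": 1200, "branch_b": 828}
```

A branch missing from either table is treated as a count of zero, so a branch that received nothing is also reported.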
Fig. 8 shows the put successes, put failures, get successes, and get failures of each queue in a queue manager. From the statistics for a given queue in Fig. 8, it can be judged whether the rate at which data is put roughly matches the rate at which it is taken out, and whether data is accumulating in the MQ cluster.
The MQ cluster system operating status statistics and display submodule is specifically configured to display the topology diagram of the docking application business systems. Fig. 9 is a schematic diagram of such a topology diagram.
It should be noted that, because the monitoring information that the message queue gateway reports to the monitoring management platform is not in the data format required by the monitoring management platform, the monitoring management platform may first parse and process the monitoring information, and then store the parsed and processed monitoring information in the database.
The database 104 is configured to store the monitoring information provided by the monitoring management platform.
The database may be a repository that organizes, stores, and manages data according to a data structure. The database may be a relational database, such as Oracle or SQL Server, and database query statements can be used to query information from it. The database may be deployed on the same physical server as the monitoring management platform; alternatively, to ensure the security of data storage, the database may be deployed on a different physical server from the monitoring management platform.
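As a rough illustration of storing monitoring information in a relational database and reading it back with a query statement, the sketch below uses Python's built-in `sqlite3` module purely so that it runs stand-alone; the description itself names Oracle and SQL Server. The table name, column names, host names, and sample values are all hypothetical.

```python
import sqlite3

# In-memory database so the sketch has no external dependencies.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE monitoring_info (
        host TEXT NOT NULL,
        metric TEXT NOT NULL,
        value REAL NOT NULL,
        collected_at TEXT NOT NULL
    )
    """
)

# Hypothetical samples reported by two nodes.
rows = [
    ("mq-node-1", "cpu_percent", 42.0, "2024-01-01T10:00:00"),
    ("mq-node-1", "cpu_percent", 97.5, "2024-01-01T10:01:00"),
    ("mq-node-2", "cpu_percent", 30.0, "2024-01-01T10:01:00"),
]
conn.executemany("INSERT INTO monitoring_info VALUES (?, ?, ?, ?)", rows)
conn.commit()


def latest_value(conn, host, metric):
    """Return the most recently collected value for a host and metric, or None."""
    row = conn.execute(
        "SELECT value FROM monitoring_info "
        "WHERE host = ? AND metric = ? "
        "ORDER BY collected_at DESC LIMIT 1",
        (host, metric),
    ).fetchone()
    return None if row is None else row[0]


print(latest_value(conn, "mq-node-1", "cpu_percent"))
```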
This concludes the detailed description of the monitoring management system provided by the first embodiment of the present application. In the first embodiment, the monitoring information of the message queue gateways and message queue nodes is collected and reported to the monitoring management platform; the monitoring management platform analyzes the monitoring information and obtains monitoring results, so that problems in the message queue cluster can be discovered promptly and an early warning or alarm can be issued. The monitoring management platform can also compile and display statistics on the monitoring information. Through mechanisms such as monitoring information collection, statistics and display, and early warning, the current operating status and health status of the message queue cluster are collected and summarized in real time, which greatly improves operation and maintenance support for the system.
The second embodiment of the present application provides a monitoring management method, which is applied to the monitoring management system of the first embodiment. It is described in detail below with reference to Fig. 2 through Fig. 10.
As shown in Fig. 10, in step S1001, the message queue gateway reports its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent program running on the message queue gateway.
The message queue gateway is a message queue gateway server in the message queue cluster (for example, an MQ cluster). As the gateway of the entire message queue cluster, it mainly handles connection requests from applications; through a load-balancing mechanism, the message data of the application systems is distributed by the message queue gateway server to the message queue nodes (for example, MQ nodes). As shown in Fig. 1, MQ gateway server 1 and MQ gateway server 2 are message queue gateways.
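The application does not specify which load-balancing mechanism the gateway uses. One simple possibility, shown here only as an assumption, is to send each message to the node that currently holds the fewest pending messages. The node names and the functions `pick_node` and `dispatch` are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_node(pending: Dict[str, int]) -> str:
    """Choose the node with the fewest pending messages (ties: lowest name)."""
    if not pending:
        raise ValueError("no message queue nodes available")
    return min(sorted(pending), key=lambda name: pending[name])


def dispatch(messages: List[str], pending: Dict[str, int]) -> List[Tuple[str, str]]:
    """Assign each message to the least-loaded node, updating the load table."""
    assignments = []
    for msg in messages:
        node = pick_node(pending)
        pending[node] += 1
        assignments.append((msg, node))
    return assignments


# Hypothetical pending-message counts per MQ node.
pending = {"mq-node-1": 3, "mq-node-2": 1, "mq-node-3": 2}
assignments = dispatch(["m1", "m2", "m3"], pending)
print(assignments)
```

Other policies, such as round-robin or weighting by CPU load, would fit the same description equally well.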
The monitoring information includes hardware information, such as CPU utilization, disk usage, file size, process, and network information. It further includes the following information: queue manager status; message channel status; message queue information; error queue information; dead letter queue information; queue statistics information; and so on. The queue statistics information includes queue data flow direction information and/or data traffic information.
The monitoring management platform is a software system configured to obtain the monitoring information reported to it by the message queue gateways and message queue nodes, analyze the monitoring information to obtain monitoring results for it, and store the monitoring information in the database.
As shown in Fig. 1, the gateway monitoring data collection agent programs (Agents) running on MQ gateway server 1 and MQ gateway server 2 can collect monitoring information and report the collected monitoring information of the message queue gateway to the monitoring management platform. Preferably, the gateway monitoring data collection agent program collects monitoring information at regular intervals and reports the collected monitoring information of the message queue gateway to the monitoring management platform, so that the monitoring management platform can handle abnormal conditions promptly according to the monitoring information and monitor the system in real time. The gateway monitoring data collection agent program is a software program running on the message queue gateway; it can collect various kinds of monitoring information from the message queue gateway and send the collected monitoring information to the monitoring management platform. As shown in Fig. 2, the gateway monitoring data collection agent program obtains monitoring information by calling API interfaces and sends the obtained monitoring information to the monitoring management platform server. System hardware information (memory, CPU, disk, and so on) is mainly queried through the API interfaces of the operating system, and information about the MQ cluster is obtained through the SDK API interfaces provided by IBM MQ. The MQ data obtained mainly includes: MQ cluster, MQ queue manager, queue, queue depth, message channel, IP, port, the numbers of items put and taken out, time, and so on.
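A minimal sketch of the timed collect-and-report loop is given below. It covers only a small part of the hardware information, read through Python standard-library calls, and leaves out the MQ data that the description says comes from the IBM MQ SDK. The field names, the `report` callback, and the interval are assumptions made for illustration.

```python
import json
import os
import shutil
import socket
import time
from typing import Callable, Dict


def collect_hardware_info(path: str = ".") -> Dict[str, object]:
    """Collect a small hardware snapshot using operating-system facilities."""
    usage = shutil.disk_usage(path)
    return {
        "host": socket.gethostname(),
        "collected_at": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_used_percent": round(100.0 * usage.used / usage.total, 2),
    }


def run_agent(report: Callable[[str], None], rounds: int, interval_s: float) -> None:
    """Collect and report a snapshot `rounds` times, sleeping between rounds."""
    for i in range(rounds):
        report(json.dumps(collect_hardware_info()))
        if i < rounds - 1:
            time.sleep(interval_s)


# Here the "platform" is just a list; a real agent would send over the network.
reports = []
run_agent(reports.append, rounds=2, interval_s=0.01)
print(len(reports))
```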
As shown in Fig. 10, in step S1002, the message queue node reports its own monitoring information to the monitoring management platform through the node monitoring data collection agent program running on the message queue node.
As shown in Fig. 1, MQ cluster node 1, MQ cluster node 2, MQ cluster node 3, and MQ cluster node 4 are message queue nodes.
The node monitoring data collection agent programs (Agents) running on MQ cluster node 1, MQ cluster node 2, MQ cluster node 3, and MQ cluster node 4 can collect the monitoring information of the message queue nodes and report the collected monitoring information to the monitoring management platform. Preferably, the node monitoring data collection agent program collects the monitoring information of the message queue node at regular intervals and reports it to the monitoring management platform, so that the monitoring management platform can handle abnormal conditions promptly according to the monitoring information and monitor the message queue nodes in real time. The node monitoring data collection agent program is a software program running on a message queue node; it can collect various kinds of monitoring information from the message queue node and send the collected monitoring information to the monitoring management platform.
As shown in Fig. 10, in step S1003, the monitoring management platform analyzes the monitoring information, obtains the monitoring results for the monitoring information, and stores the monitoring information in the database.
After the monitoring management platform analyzes the monitoring information and obtains the monitoring results for it, the platform can also determine, according to those monitoring results, whether an alarm or early warning is needed.
Determining whether an alarm or early warning is needed according to the monitoring results for the monitoring information includes:
Judging whether the monitoring results for the monitoring information meet the alarm condition set for the monitoring information, and if so, performing alarm processing; or
Judging whether the monitoring results for the monitoring information reach the early-warning condition threshold set for the monitoring information, and if so, performing early-warning processing.
When the monitoring results of the monitoring information meet the alarm condition set for the monitoring information, an alarm is raised. For example, an alarm may be raised when the monitoring results meet one of the following conditions: memory usage exceeds 95%; CPU usage exceeds 95%; hard disk usage exceeds 95%; the dead letter queue contains data; a "message channel unavailable" message appears. The monitoring management platform notifies the system administrator by SMS or e-mail; the alarm can also be presented on the monitoring management platform interface or by sound.
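The example alarm conditions above can be expressed as a simple rule check. The sketch below is illustrative only; the snapshot keys and the function `alarm_reasons` are hypothetical, and each condition is assumed to trigger an alarm on its own.

```python
from typing import Dict, List

USAGE_LIMIT = 95.0  # percent, from the example conditions in the description


def alarm_reasons(snapshot: Dict[str, object]) -> List[str]:
    """Return the alarm conditions that a monitoring snapshot satisfies."""
    reasons = []
    usage_checks = (
        ("memory_percent", "memory usage"),
        ("cpu_percent", "CPU usage"),
        ("disk_percent", "hard disk usage"),
    )
    for key, label in usage_checks:
        # "Exceeds 95%" is read as strictly greater than the limit.
        if snapshot.get(key, 0.0) > USAGE_LIMIT:
            reasons.append(f"{label} exceeds 95%")
    if snapshot.get("dead_letter_depth", 0) > 0:
        reasons.append("dead letter queue contains data")
    if snapshot.get("channel_unavailable", False):
        reasons.append("message channel unavailable")
    return reasons


# Hypothetical snapshot for one MQ node.
snapshot = {
    "memory_percent": 96.0,
    "cpu_percent": 50.0,
    "disk_percent": 95.0,
    "dead_letter_depth": 2,
    "channel_unavailable": False,
}
print(alarm_reasons(snapshot))
```

An empty result means that no alarm is needed; a non-empty result could be passed to whatever SMS, e-mail, or on-screen notification channel the platform uses.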
When the monitoring results for the monitoring information reach the early-warning condition threshold set for the monitoring information, the system administrator can be notified by SMS or e-mail; alternatively, a visual or audible early warning or alarm can be issued.
Because different monitoring information corresponds to different monitored items, different early-warning condition thresholds can be set for different monitoring information. When the set early-warning condition threshold is reached, early-warning or alarm processing is performed.
For example, if the early-warning condition threshold set for a message queue is 90,000 messages, an early warning is needed when the number of messages in that queue is greater than or equal to 90,000; if the memory early-warning condition threshold set for an MQ node is a memory usage of 85%, an early warning is needed when memory usage reaches 85%.
It should be noted that the early-warning condition thresholds set for different message queues differ. For example, for a message queue that is sensitive to real-time performance, the threshold can be set to 10 messages; for a message queue with a large data volume, the threshold can be set to 2000 messages.
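Per-queue thresholds of this kind amount to a lookup followed by a greater-than-or-equal comparison. In the sketch below the queue names are hypothetical; the threshold values are the ones used as examples in the description.

```python
from typing import Dict

# Hypothetical queue names mapped to the example thresholds (message counts).
QUEUE_THRESHOLDS: Dict[str, int] = {
    "general_queue": 90_000,
    "realtime_queue": 10,
    "bulk_queue": 2000,
}


def needs_warning(queue_name: str, depth: int,
                  thresholds: Dict[str, int] = QUEUE_THRESHOLDS) -> bool:
    """True when the queue depth has reached the threshold set for that queue."""
    threshold = thresholds.get(queue_name)
    # Queues without a configured threshold never trigger a warning here.
    return threshold is not None and depth >= threshold


print(needs_warning("realtime_queue", 10))
```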
Preferably, warning levels can also be set for monitoring information, with different warning levels corresponding to different early-warning condition thresholds set for the monitoring information. For example, different early-warning condition thresholds can be set for CPU utilization: thresholds of 70%, 80%, and 90%, corresponding to level-1, level-2, and level-3 early warnings respectively.
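The mapping from a CPU utilization value to a warning level can be sketched as below. The function name and the convention that level 0 means "no warning" are assumptions; the thresholds are the example values from the description, and "reaching" a threshold is read as greater than or equal to it.

```python
from typing import List, Tuple

# (threshold in percent, warning level), ordered from highest to lowest.
CPU_LEVELS: List[Tuple[float, int]] = [(90.0, 3), (80.0, 2), (70.0, 1)]


def warning_level(value: float, levels: List[Tuple[float, int]] = CPU_LEVELS) -> int:
    """Return the highest warning level whose threshold the value has reached."""
    for threshold, level in levels:
        if value >= threshold:
            return level
    return 0  # below every threshold: no warning


print(warning_level(85.0))
```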
Preferably, so that system administrators (including the administrator of this monitoring management system and the administrators of each docking application system) learn of early-warning or alarm information in real time, the monitoring management platform can bind the mobile phone numbers and/or e-mail addresses of the system administrators. Fig. 4 is a schematic diagram of the monitoring management platform sending warning information to a system administrator's mailbox.
Preferably, the early-warning or alarm message carries suggestion information.
The importance of performing early-warning or alarm processing when the set early-warning condition threshold is reached is illustrated below with a scenario.
For example, a data sorting platform in the insurance industry uses an MQ cluster to receive and forward messages. Suppose a message queue for underwriting policies is set up, with an early-warning condition threshold of 100,000 messages. If policies pending claim settlement are sent in a single batch once a day, this is likely to put pressure on the underwriting policy message queue. An early warning is issued when the number of messages in this queue is greater than or equal to 100,000; if early warnings have been issued for a continuous period (for example, 5 days), a capacity expansion suggestion can be carried in the early-warning information, so that the system administrator can expand capacity in time, keeping the system running normally and improving its reliability.
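The scenario can be sketched as a check on the daily peak depth of the queue: a warning is produced whenever the latest peak reaches the threshold, and a capacity expansion suggestion is appended once warnings have occurred on enough consecutive days. The function names, the message wording, and the use of one peak value per day are assumptions made for illustration.

```python
from typing import List, Optional


def trailing_streak(daily_peaks: List[int], threshold: int) -> int:
    """Count how many of the most recent days in a row reached the threshold."""
    streak = 0
    for depth in reversed(daily_peaks):
        if depth < threshold:
            break
        streak += 1
    return streak


def build_warning(daily_peaks: List[int], threshold: int = 100_000,
                  streak_days: int = 5) -> Optional[str]:
    """Return a warning message for the latest day, or None if none is needed."""
    streak = trailing_streak(daily_peaks, threshold)
    if streak == 0:
        return None
    message = f"queue depth reached {daily_peaks[-1]} (threshold {threshold})"
    if streak >= streak_days:
        message += f"; warned {streak} days in a row, consider expanding capacity"
    return message


# Hypothetical peaks for the last five days, all at or above the threshold.
print(build_warning([100_000, 120_000, 110_000, 130_000, 105_000]))
```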
In addition to early warning and alarm, the monitoring management platform can also perform multidimensional statistics and display on the monitoring information in the database.
Performing multidimensional statistics and display on the monitoring information in the database includes:
Compiling and displaying statistics on the MQ cluster system operating status; or
Compiling and displaying statistics on hardware health status; or
Compiling and displaying statistics on queue managers.
Compiling and displaying statistics on queue managers includes:
Compiling and displaying statistics on the number of data items of a queue manager; or
Compiling and displaying statistics on the queue messages in a queue manager;
Compiling statistics on the numbers of data items for put successes, put failures, get successes, and get failures of a queue manager;
Compiling and displaying statistics on the put successes, put failures, get successes, and get failures of the queues in a queue manager.
As shown in Fig. 5, statistics on the number of data items of a queue manager are compiled and displayed; for example, the number of data items of the new reinsurance queue manager is 420573.
Statistics on a queue manager are compiled and displayed in at least one of the following modes:
By month, by day, by hour, by minute, by history, or by a user-defined range. As shown in Fig. 6, statistics on the queue messages in a queue manager are compiled and displayed; for example, the queue messages of queues QAREINS001, QAREINS001, QAREINS001 managed by queue manager QMGWC can be counted and displayed by day or by minute.
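Counting by month, day, hour, or minute can be viewed as grouping message timestamps into buckets of the chosen granularity. The sketch below shows this for four of the listed modes; historical and user-defined ranges are not covered. The function `count_by`, the format table, and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Bucket label format for each supported granularity.
BUCKET_FORMATS = {
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def count_by(timestamps: Iterable[datetime], granularity: str) -> Dict[str, int]:
    """Count timestamps per bucket at the requested granularity."""
    fmt = BUCKET_FORMATS[granularity]
    return dict(Counter(ts.strftime(fmt) for ts in timestamps))


# Hypothetical put times of four messages on one queue.
events = [
    datetime(2024, 1, 1, 10, 0, 5),
    datetime(2024, 1, 1, 10, 0, 40),
    datetime(2024, 1, 1, 11, 15, 0),
    datetime(2024, 1, 2, 9, 0, 0),
]
print(count_by(events, "day"))
```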
Fig. 7 is a schematic diagram of statistics on the numbers of data items for put successes, put failures, get successes, and get failures of a queue manager. As shown in Fig. 7, the volume of data sent by the head office to each branch office on the current day, and the volume of data received from the branch offices, can be counted and displayed. If the numbers of items put and taken out do not match, it can be determined that the data of the docking business systems is inconsistent.
Fig. 8 shows the put successes, put failures, get successes, and get failures of each queue in a queue manager. From the statistics for a given queue in Fig. 8, it can be judged whether the rate at which data is put roughly matches the rate at which it is taken out, and whether data is accumulating in the system.
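One way to judge whether data is accumulating is to track the running difference between items put and items taken out over successive intervals. This is a sketch under assumptions: the function names, the per-interval counts, and the `tolerance` parameter are hypothetical.

```python
from typing import List


def backlog_series(put_counts: List[int], get_counts: List[int]) -> List[int]:
    """Cumulative (put - get) after each interval."""
    backlog, series = 0, []
    for put, got in zip(put_counts, get_counts):
        backlog += put - got
        series.append(backlog)
    return series


def is_accumulating(put_counts: List[int], get_counts: List[int],
                    tolerance: int = 0) -> bool:
    """True when the backlog at the end of the window exceeds the tolerance."""
    series = backlog_series(put_counts, get_counts)
    return bool(series) and series[-1] > tolerance


# Hypothetical per-minute counts for one queue.
puts = [100, 120, 130]
gets = [100, 100, 100]
print(backlog_series(puts, gets), is_accumulating(puts, gets))
```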
Compiling and displaying statistics on the MQ cluster system operating status includes displaying the topology diagram of the docking application business systems. Fig. 9 is a schematic diagram of such a topology diagram.
It should be noted that, because the monitoring information that the message queue gateway reports to the monitoring management platform is not in the data format required by the monitoring management platform, the monitoring management platform may first parse and process the monitoring information, and then store the parsed and processed monitoring information in the database.
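The parse-then-store step might look like the sketch below. The application does not describe the reported format, so the `key=value;...` string used here is purely an assumed example, as are the table layout and the function names. Python's built-in `sqlite3` stands in for the database.

```python
import sqlite3
from typing import Dict


def parse_report(raw: str) -> Dict[str, object]:
    """Convert an assumed 'key=value;...' report into typed, renamed fields."""
    fields = dict(item.split("=", 1) for item in raw.strip().split(";") if item)
    return {
        "host": fields["host"],
        "cpu_percent": float(fields["cpu"]),
        "memory_percent": float(fields["mem"]),
        "disk_percent": float(fields["disk"]),
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gateway_report "
    "(host TEXT, cpu_percent REAL, memory_percent REAL, disk_percent REAL)"
)


def store_report(conn: sqlite3.Connection, raw: str) -> Dict[str, object]:
    """Parse a raw report and insert the normalized record into the database."""
    record = parse_report(raw)
    conn.execute(
        "INSERT INTO gateway_report "
        "VALUES (:host, :cpu_percent, :memory_percent, :disk_percent)",
        record,
    )
    conn.commit()
    return record


stored = store_report(conn, "host=mq-gw-1;cpu=42.5;mem=71;disk=63.2")
print(stored)
```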
This concludes the detailed description of the monitoring management method provided by the second embodiment of the present application. In the second embodiment, the monitoring information of the message queue gateways and message queue nodes is collected and reported to the monitoring management platform; the monitoring management platform analyzes the monitoring information and obtains monitoring results, so that problems in the message queue cluster can be discovered promptly and an early warning or alarm can be issued. The monitoring management platform can also compile and display statistics on the monitoring information. Through mechanisms such as monitoring information collection, statistics and display, and early warning, the current operating status and health status of the message queue cluster are collected and summarized in real time, which greatly improves operation and maintenance support for the system.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art may make possible changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be as defined by the claims of the present invention.

Claims (25)

1. A monitoring management system, characterized by comprising: a message queue cluster, a monitoring management platform, and a database;
The message queue cluster includes at least one message queue gateway and a plurality of message queue nodes;
The message queue gateway is configured to distribute the messages of docking application systems to the message queue nodes according to the load of the message queue nodes; the message queue gateway runs a gateway monitoring data collection agent program, and the gateway monitoring data collection agent program is configured to collect the monitoring information of the message queue gateway and report the collected monitoring information of the message queue gateway to the monitoring management platform;
The message queue node is configured to receive the messages of the docking application systems provided by the message queue gateway, and to process the messages of the docking application systems or store them in the form of message queues; the message queue node runs a node monitoring data collection agent program, and the node monitoring data collection agent program is configured to collect the monitoring information of the message queue node and report the collected monitoring information of the message queue node to the monitoring management platform;
The monitoring management platform is configured to obtain the monitoring information reported by the gateway monitoring data collection agent program and the node monitoring data collection agent program, analyze the monitoring information to obtain monitoring results, and store the monitoring information in the database;
The database is configured to store the monitoring information provided by the monitoring management platform.
2. The monitoring management system according to claim 1, characterized in that the gateway monitoring data collection agent program is specifically configured to collect the monitoring information of the message queue gateway at regular intervals and report the collected monitoring information of the message queue gateway to the monitoring management platform.
3. The monitoring management system according to claim 1, characterized in that the node monitoring data collection agent program is specifically configured to collect the monitoring information of the message queue node at regular intervals and report the collected monitoring information of the message queue node to the monitoring management platform.
4. The monitoring management system according to claim 1, characterized in that the monitoring management platform includes:
An early-warning submodule, configured to determine whether an early warning is needed according to the monitoring results for the monitoring information, and if so, to perform early-warning processing;
A statistics and display submodule, configured to perform multidimensional statistics and display on the monitoring information in the database.
5. The monitoring management system according to claim 4, characterized in that the early-warning submodule is specifically configured to:
When the monitoring results for the monitoring information reach the early-warning condition threshold set for the monitoring information, notify the system administrator by SMS or e-mail; or issue a visual or audible early warning or alarm.
6. The monitoring management system according to claim 4, characterized in that the early-warning submodule is further configured to set warning levels for monitoring information, with different warning levels corresponding to different early-warning condition thresholds set for the monitoring information.
7. The monitoring management system according to claim 4, characterized in that the statistics and display submodule includes:
An MQ cluster system operating status statistics and display submodule, configured to compile and display statistics on the MQ cluster system operating status; or
A hardware health statistics and display submodule, configured to compile and display statistics on hardware health status; or
A queue manager statistics and display submodule, configured to compile and display statistics on queue managers.
8. The monitoring management system according to claim 7, characterized in that the MQ cluster system operating status statistics and display submodule is specifically configured to display the topology diagram of the docking application business systems.
9. The monitoring management system according to claim 7, characterized in that the queue manager statistics and display submodule is specifically configured to:
Compile and display statistics on the number of data items of a queue manager; or
Compile and display statistics on the queue messages in a queue manager; or
Compile statistics on the numbers of data items for put successes, put failures, get successes, and get failures of a queue manager; or
Compile and display statistics on the put successes, put failures, get successes, and get failures of the queues in a queue manager.
10. The monitoring management system according to claim 7, characterized in that statistics on a queue manager are compiled and displayed in at least one of the following modes:
By month, by day, by hour, by minute, by history, or by a user-defined range.
11. The monitoring management system according to claim 1, characterized in that the monitoring information includes at least one of the following kinds of information:
Queue manager status; message channel status; message queue information; error queue information; dead letter queue information; queue statistics information.
12. The monitoring management system according to claim 11, characterized in that the queue statistics information includes: queue data flow direction information and/or data traffic.
13. A monitoring management method, characterized in that it is applied to the monitoring management system according to claim 1, the method comprising:
The message queue gateway reporting its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent program running on the message queue gateway;
The message queue node reporting its own monitoring information to the monitoring management platform through the node monitoring data collection agent program running on the message queue node;
The monitoring management platform analyzing the monitoring information, obtaining the monitoring results for the monitoring information, and storing the monitoring information in the database.
14. The method according to claim 13, characterized in that the message queue gateway reporting its own monitoring information to the monitoring management platform through the gateway monitoring data collection agent program running on the message queue gateway comprises:
The message queue gateway reporting its own monitoring information to the monitoring management platform at regular intervals through the gateway monitoring data collection agent program running on the message queue gateway.
15. The method according to claim 13, characterized in that the message queue node reporting its own monitoring information to the monitoring management platform through the node monitoring data collection agent program running on the message queue node comprises:
The message queue node reporting its own monitoring information to the monitoring management platform at regular intervals through the node monitoring data collection agent program running on the message queue node.
16. The method according to claim 13, characterized by further comprising:
Determining whether an alarm or early warning is needed according to the monitoring results for the monitoring information.
17. The method according to claim 16, characterized in that determining whether an alarm or early warning is needed according to the monitoring results for the monitoring information comprises:
Judging whether the monitoring results for the monitoring information meet the alarm condition set for the monitoring information, and if so, performing alarm processing; or
Judging whether the monitoring results for the monitoring information reach the early-warning condition threshold set for the monitoring information, and if so, performing early-warning processing.
18. The method according to claim 17, characterized by further comprising:
Setting warning levels for monitoring information, with different warning levels corresponding to different early-warning condition thresholds set for the monitoring information.
19. The method according to claim 13, characterized by further comprising:
Performing multidimensional statistics and display on the monitoring information in the database.
20. The method according to claim 19, characterized in that performing multidimensional statistics and display on the monitoring information in the database comprises:
Compiling and displaying statistics on the MQ cluster system operating status; or
Compiling and displaying statistics on hardware health status; or
Compiling and displaying statistics on queue managers.
21. The method according to claim 20, characterized in that compiling and displaying statistics on the MQ cluster system operating status comprises:
Displaying the topology diagram of the docking application business systems.
22. The method according to claim 20, characterized in that compiling and displaying statistics on queue managers comprises:
Compiling and displaying statistics on the number of data items of a queue manager; or
Compiling and displaying statistics on the queue messages in a queue manager;
Compiling statistics on the numbers of data items for put successes, put failures, get successes, and get failures of a queue manager;
Compiling and displaying statistics on the put successes, put failures, get successes, and get failures of the queues in a queue manager.
23. The method according to claim 22, characterized in that statistics on a queue manager are compiled and displayed in at least one of the following modes:
By month, by day, by hour, by minute, by history, or by a user-defined range.
24. The method according to claim 13, characterized in that the monitoring information includes at least one of the following kinds of information:
Queue manager status; message channel status; message queue information; error queue information; dead letter queue information; queue statistics information.
25. The method according to claim 24, characterized in that the queue statistics information includes: queue data flow direction information and/or data traffic.
CN201810509664.9A 2018-05-24 2018-05-24 Monitoring management system and monitoring management method Active CN110535713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810509664.9A CN110535713B (en) 2018-05-24 2018-05-24 Monitoring management system and monitoring management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810509664.9A CN110535713B (en) 2018-05-24 2018-05-24 Monitoring management system and monitoring management method

Publications (2)

Publication Number Publication Date
CN110535713A true CN110535713A (en) 2019-12-03
CN110535713B CN110535713B (en) 2021-08-03

Family

ID=68657435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810509664.9A Active CN110535713B (en) 2018-05-24 2018-05-24 Monitoring management system and monitoring management method

Country Status (1)

Country Link
CN (1) CN110535713B (en)

Also Published As

Publication number Publication date
CN110535713B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN110535713A (en) Monitoring management system and method for managing and monitoring
US10701214B2 (en) System and method for real-time analysis of network traffic
US20050281276A1 (en) Data analysis and flow control system
US6141777A (en) System and method for reporting telecommunication service conditions
US5627886A (en) System and method for detecting fraudulent network usage patterns using real-time network monitoring
US20030135382A1 (en) Self-monitoring service system for providing historical and current operating status
CN108197261A (en) A kind of wisdom traffic operating system
US8755499B2 (en) Methods, computer program products, and systems for managing voice over internet protocol (VOIP) network elements
US8040231B2 (en) Method for processing alarm data to generate security reports
CN109885453A (en) Big data platform monitoring system based on flow data processing
CN110224865A (en) A kind of log warning system based on Stream Processing
CN110221947A (en) Warning information method for inspecting, system, computer installation and readable storage medium storing program for executing
CN107509119A (en) A kind of monitoring alarm method and device
CN111049673A (en) Method and system for counting and monitoring API call in service gateway
CN114036022A (en) Monitoring alarm processing method, device, equipment and medium
US20070147595A1 (en) Methods, systems, and computer program products for providing routing of communications
CN108235353A (en) A kind of monitoring system of the urban rail system based on LTE-M communications
CN111582796B (en) Express monitoring system and method based on image recognition
CN110097381A (en) A kind of complaint automatic identification early warning system and method applied to airport service
US20180374333A1 (en) Autonomous Cloud-Based Third Party Monitoring
CN114297020A (en) Enterprise industrial control safety brain platform system and operation method
Cisco Events
CN109508356B (en) Data abnormality early warning method, device, computer equipment and storage medium
CN111983960A (en) Monitoring system and method
KR20050003240A (en) System and method for network failure management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant