CN106940671A - Method, apparatus and system for monitoring the operation of mission threads in a cluster

Method, apparatus and system for monitoring the operation of mission threads in a cluster

Info

Publication number
CN106940671A
Authority
CN
China
Prior art keywords
mission thread
time
cluster
thread
database
Prior art date
Legal status
Granted
Application number
CN201610004928.6A
Other languages
Chinese (zh)
Other versions
CN106940671B (en)
Inventor
郦军杰
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610004928.6A
Publication of CN106940671A
Application granted
Publication of CN106940671B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The embodiments of the present application disclose a method, apparatus and system for monitoring the operation of mission threads in a cluster. The method includes: a task server in the cluster updates, based on a heartbeat time association channel and with a first preset time as the period, the current heartbeat time information of a mission thread stored in a database; a monitoring server in the cluster obtains the current heartbeat time information of the mission thread from the database with a second preset time as the period; the monitoring server calculates the time sum of the current heartbeat time information of the mission thread and the first preset time, and compares the time sum with the current time; and the monitoring server controls the operation of the mission thread in the cluster according to the result of the comparison. With the technical solution provided by the embodiments of the present application, whether a mission thread is running normally can be judged directly, without relying on the task status of the mission thread, which improves system stability and reduces system risk.

Description

Method, apparatus and system for monitoring the operation of mission threads in a cluster
Technical field
The present application relates to the field of Internet communication technology, and in particular to a method, apparatus and system for monitoring the operation of mission threads in a cluster.
Background
With the rapid development of Internet communication technology, some large-scale Internet systems, because of business complexity and similar reasons, run many mission threads in a large cluster environment. Once started, a mission thread generally needs a fairly long time to complete its business processing, and a monitoring server has to ensure that each mission thread runs stably during that time.
In the prior art, two monitoring methods are mainly used to keep each mission thread running stably in a large cluster environment: a collect-and-report mode and an active query-back mode. In the collect-and-report mode, the status of each mission thread is reported to the monitoring server; when the monitoring server collects a task-failure status report for a mission thread, it restarts a task copy of that mission thread so that the mission thread keeps running stably. In the active query-back mode, every mission thread has to register with the monitoring server, so that the monitoring server can query the mission threads one by one at a certain period, obtain the task status of each mission thread through the query, and decide from the queried task status whether to restart a task copy of the mission thread. In the collect-and-report mode, however, network problems or machine delays can prevent the task status from reaching the monitoring server, so the monitoring server can only restart the task copy under a rule such as receiving no task status report for a period of time. The active query-back mode is likewise limited by machine and network factors and cannot obtain the task status accurately, so it can only restart the copy of the mission thread under a condition such as finding no task status within a preset number of queries.
Therefore, neither of the two existing monitoring methods can obtain the task status accurately. A task copy of a mission thread may be restarted blindly although the task has not died, so that the same mission thread is started repeatedly and reports concurrency errors over a long period, which leads to high monitoring and development costs and to system risk.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method, apparatus and system for monitoring the operation of mission threads in a cluster, which can keep each mission thread running stably in a large cluster environment, improve system stability and reduce system risk.
To solve the above technical problem, the method, apparatus and system for monitoring the operation of mission threads in a cluster provided by the embodiments of the present application are implemented as follows:
A method for monitoring the operation of mission threads in a cluster, the method including:
a task server in the cluster updating, based on a heartbeat time association channel and with a first preset time as the period, the current heartbeat time information of a mission thread stored in a database;
a monitoring server in the cluster obtaining the current heartbeat time information of the mission thread from the database with a second preset time as the period;
the monitoring server calculating the time sum of the current heartbeat time information of the mission thread and the first preset time, and comparing the time sum with the current time;
the monitoring server controlling the operation of the mission thread in the cluster according to the result of the comparison.
A method for monitoring the operation of mission threads in a cluster, the method including:
obtaining the current heartbeat time information of a mission thread from the database with a second preset time as the period;
calculating the time sum of the current heartbeat time information of the mission thread and the first preset time, the first preset time being the period with which the task server updates the current heartbeat time information of the mission thread stored in the database, and comparing the time sum with the current time;
controlling the operation of the mission thread in the cluster according to the result of the comparison.
A method for monitoring the operation of mission threads in a cluster, the method including:
when the monitoring server starts a mission thread, checking whether a heartbeat time association channel has been established between the mission thread and the database;
when the result of the check is yes, updating, based on the heartbeat time association channel and with a first preset time as the period, the current heartbeat time information of the mission thread stored in the database.
An apparatus for monitoring the operation of mission threads in a cluster, the apparatus including:
a heartbeat time information obtaining module, configured to obtain the current heartbeat time information of a mission thread from the database with a second preset time as the period;
a data calculation module, configured to calculate the time sum of the current heartbeat time information of the mission thread and the first preset time, the first preset time being the period with which the task server updates the current heartbeat time information of the mission thread stored in the database;
a data comparison module, configured to compare the time sum calculated by the data calculation module with the current time;
a mission thread operation control module, configured to control the operation of the mission thread in the cluster according to the comparison result of the data comparison module.
An apparatus for monitoring the operation of mission threads in a cluster, the apparatus including:
a checking module, configured to check, when the monitoring server starts a mission thread, whether a heartbeat time association channel has been established between the mission thread and the database;
a heartbeat time information updating module, configured to update, when the check result of the checking module is yes, the current heartbeat time information of the mission thread stored in the database, based on the heartbeat time association channel and with a first preset time as the period.
A system for monitoring the operation of mission threads in a cluster, the system including:
a monitoring server, configured to obtain the current heartbeat time information of a mission thread from the database with a second preset time as the period; to calculate the time sum of the current heartbeat time information of the mission thread and the first preset time, the first preset time being the period with which the task server updates the current heartbeat time information of the mission thread stored in the database, and to compare the time sum with the current time; and to control the operation of the mission thread in the cluster according to the result of the comparison;
a task server, configured to check, when the monitoring server starts a mission thread, whether a heartbeat time association channel has been established between the mission thread and the database; and, when the result of the check is yes, to update the current heartbeat time information of the mission thread stored in the database, based on the heartbeat time association channel and with the first preset time as the period;
a database, configured to store the current heartbeat time information of the mission threads in the cluster.
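As an illustration of the database's role in this system, the following minimal sketch stores one heartbeat row per mission thread. The table layout, the column names and the functions open_store, update_heartbeat and read_heartbeat are assumptions made for the example; the application does not specify a schema or a database product, and SQLite is used here only because it ships with Python.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one heartbeat row per mission thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE heartbeat ("
        " thread_id TEXT PRIMARY KEY,"   # hypothetical identifier of the mission thread
        " beat_time REAL NOT NULL)"      # current heartbeat time, in seconds
    )
    return conn

def update_heartbeat(conn, thread_id: str, beat_time: float) -> None:
    """Task-server side: overwrite the stored heartbeat time of a thread."""
    conn.execute(
        "INSERT OR REPLACE INTO heartbeat (thread_id, beat_time) VALUES (?, ?)",
        (thread_id, beat_time),
    )

def read_heartbeat(conn, thread_id: str):
    """Monitoring-server side: return the stored heartbeat time, or None."""
    row = conn.execute(
        "SELECT beat_time FROM heartbeat WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return None if row is None else row[0]

conn = open_store()
update_heartbeat(conn, "A", 100.0)
update_heartbeat(conn, "A", 130.0)   # a later beat replaces the earlier one
```

Only the most recent heartbeat time is kept per thread, which is all the comparison performed by the monitoring server needs.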
In the embodiments of the present application, the task server updates the current heartbeat time information of a mission thread stored in the database, based on the heartbeat time association channel and with the first preset time as the period. After obtaining the current heartbeat time information of the mission thread from the database, the monitoring server calculates the time sum of the current heartbeat time information and the first preset time, and judges whether the mission thread is running normally by comparing the time sum with the current time. Finally, the operation of the mission threads in the cluster can be controlled effectively according to the result of the comparison. Compared with the prior art, the embodiments of the present application make it possible to judge directly whether a mission thread is running normally without relying on the task status of the mission thread, which improves system stability and reduces system risk.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of the method for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 2 is a schematic flowchart of another embodiment of the method for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 3 is a schematic flowchart of another embodiment of the method for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the apparatus for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 5 is a schematic module diagram of one embodiment of the mission thread operation control module provided by the present application;
Fig. 6 is a schematic structural diagram of another embodiment of the apparatus for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 7 is a schematic structural diagram of another embodiment of the apparatus for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 8 is a schematic structural diagram of another embodiment of the apparatus for monitoring the operation of mission threads in a cluster provided by the present application;
Fig. 9 is a schematic structural diagram of one embodiment of the system for monitoring the operation of mission threads in a cluster provided by the present application.
Detailed description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
The implementation of the embodiments of the present application is described in detail below with several specific examples.
An embodiment of the method for monitoring the operation of mission threads in a cluster is introduced first. Fig. 1 is a schematic flowchart of one embodiment of the method provided by the present application. The present application provides the method operation steps described in the embodiments or flowcharts, but more or fewer operation steps may be included on the basis of routine work that requires no creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual system or client product executes the method, the steps may be executed in the order shown in the embodiments or drawings, or in parallel (for example in a parallel-processor or multi-threaded environment). Specifically, as shown in Fig. 1, the method may include:
S110: A task server in the cluster updates, based on a heartbeat time association channel and with a first preset time as the period, the current heartbeat time information of a mission thread stored in a database.
In the embodiments of the present application, a task server in the cluster may update the current heartbeat time information of a mission thread stored in the database, based on the heartbeat time association channel and with the first preset time as the period. In practice, a cluster generally includes task servers corresponding to multiple mission threads and a monitoring server that monitors those mission threads. The task server may include a client, and may also include a server that controls multiple mission threads. The first preset time may be set according to the actual application; in general, it may be shorter than the processing time of the mission thread. Heartbeat time association channels correspond one-to-one with the running mission threads. Once a heartbeat time association channel has been established between a mission thread and the database, the heartbeat time information of that mission thread can be stored in the database. The current heartbeat time information may be the current time that the task server stores in the database while the mission thread is running. For example, if mission thread A is running and the first preset time has elapsed since the current heartbeat time information was last updated in the database, the task server may store the current time of mission thread A in the database as its current heartbeat time information.
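The update rule in the mission thread A example can be sketched as follows. This is only an illustration under stated assumptions: a plain dict stands in for the database, times are plain numbers of seconds, and the names heartbeat_due and tick are hypothetical.

```python
def heartbeat_due(last_beat, now, first_preset):
    """True when no beat is stored yet or the first preset time has elapsed."""
    return last_beat is None or now - last_beat >= first_preset

def tick(db, thread_id, now, first_preset):
    """Store `now` as the current heartbeat time if an update is due.

    `db` is a dict standing in for the database. Returns True if a write happened.
    """
    if heartbeat_due(db.get(thread_id), now, first_preset):
        db[thread_id] = now
        return True
    return False

db = {}
# Mission thread A is observed at 0 s, 10 s, 30 s, 45 s and 60 s with a 30 s period.
writes = [tick(db, "A", t, first_preset=30) for t in (0, 10, 30, 45, 60)]
```

With these inputs a heartbeat is written at 0 s, 30 s and 60 s, and the observations at 10 s and 45 s leave the stored value unchanged.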
Further, in practice, when business processing is required, corresponding mission threads may be allocated to the business in the cluster. Accordingly, the method may also include:
the monitoring server starting the mission threads in the cluster, and establishing a heartbeat time association channel between each mission thread in the cluster and the database.
Specifically, a mission thread can generally establish only one heartbeat time association channel with the database. Once a heartbeat time association channel has been established between a mission thread and the database, the mission thread can, based on that channel, update the current heartbeat time information stored in the database while it is running. Accordingly, the method may also include:
when a mission thread starts, the task server checking whether a heartbeat time association channel has been established between the mission thread and the database;
when the result of the check is yes, the task server performing, based on the heartbeat time association channel, the operation of updating the current heartbeat time information of the mission thread stored in the database with the first preset time as the period.
Accordingly, the method may also include:
when the result of the check is no, stopping the current task of the mission thread.
Checking, when a mission thread starts running, whether a heartbeat time association channel has been established between the mission thread and the database ensures that the mission thread continues its business processing only while it holds the corresponding heartbeat time association channel; when it does not hold one, the task server can stop the current task. Mission threads of the same business are thus mutually exclusive, which avoids blindly restarting a task copy of a mission thread whose task has not died and resolves the long-running concurrency errors caused by starting the same mission thread repeatedly.
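The start-up check described above can be sketched as a simple gate. The set of channel holders, the thread identifiers and the function start_mission_thread are assumptions made for the example; the application does not say how a channel is represented.

```python
def start_mission_thread(thread_id, channels, run, stop):
    """Run the task only if `thread_id` holds a heartbeat time association channel.

    `channels` is a set of thread ids that currently hold a channel (an assumed
    representation). `run` and `stop` are callbacks supplied by the task server.
    """
    if thread_id in channels:
        run(thread_id)
        return "running"
    stop(thread_id)
    return "stopped"

started, stopped = [], []
channels = {"A-copy2"}   # only the newer copy of task A holds the channel
first = start_mission_thread("A-copy1", channels, started.append, stopped.append)
second = start_mission_thread("A-copy2", channels, started.append, stopped.append)
```

Because only one copy of a task can pass the gate, two copies of the same business do not process it at the same time.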
S120: A monitoring server in the cluster obtains the current heartbeat time information of the mission thread from the database with a second preset time as the period.
In the embodiments of the present application, after step S110, the monitoring server in the cluster may obtain the current heartbeat time information of the mission thread from the database with the second preset time as the period. The second preset time may be set according to the actual application and may be less than or equal to the first preset time. For example, when the first preset time is 30 s, the second preset time may be 25 s.
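The relation between the two periods can be captured in a small configuration object. The class MonitorConfig and its field names are hypothetical, and the sketch treats the relation between the two periods as a hard constraint, whereas the text only presents it as one possible setting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MonitorConfig:
    first_preset: float    # seconds between heartbeat updates (task server)
    second_preset: float   # seconds between heartbeat reads (monitoring server)

    def __post_init__(self):
        if self.first_preset <= 0 or self.second_preset <= 0:
            raise ValueError("both periods must be positive")
        if self.second_preset > self.first_preset:
            raise ValueError("second preset time should not exceed the first")

# The values from the example in the text: 30 s for updates, 25 s for reads.
cfg = MonitorConfig(first_preset=30, second_preset=25)
```

Reading at least as often as the task server writes means a missed heartbeat is noticed within roughly one update period.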
Specifically, the monitoring server in the embodiments of the present application may access the database through the database's API (Application Programming Interface) to obtain the current heartbeat time information of the mission thread. Access through the API can effectively avoid data communication problems caused by network problems.
S130: The monitoring server calculates the time sum of the current heartbeat time information of the mission thread and the first preset time, and compares the time sum with the current time.
In the embodiments of the present application, after step S120, the monitoring server may calculate the time sum of the current heartbeat time information of the mission thread and the first preset time, and compare the time sum with the current time. For example, suppose the current heartbeat time information that mission thread A last updated in the database is 09:32:10, which can be expressed in seconds as Time_heartbeat = 9*3600 + 32*60 + 10, and suppose the first preset time is 25 s, which can be expressed as Time_first = 25. The time sum of the current heartbeat time information of mission thread A and the first preset time is then 09:32:35, which can be expressed as Time_heartbeat + Time_first = 9*3600 + 32*60 + 35.
Specifically, after the time sum of the current heartbeat time information of the mission thread and the first preset time has been calculated, the current time can be obtained and compared with the time sum.
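The arithmetic of the mission thread A example can be checked with a short sketch in which every quantity is expressed in seconds since midnight, so the first preset time of 25 s is simply 25. The helper names to_seconds and to_hms are hypothetical.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(seconds: int) -> str:
    """Convert seconds since midnight back to an HH:MM:SS string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

time_heartbeat = to_seconds("09:32:10")   # last stored heartbeat of thread A
time_first = 25                           # first preset time, in seconds
time_sum = time_heartbeat + time_first    # latest moment the next beat is expected
```

Here time_sum corresponds to 09:32:35, the value the monitoring server compares with the current time.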
S140: The monitoring server controls the operation of the mission thread in the cluster according to the result of the comparison.
In the embodiments of the present application, after step S130, the monitoring server may control the operation of the mission thread in the cluster according to the result of the comparison. Specifically, this may include either of the following:
when the result of the comparison is that the time sum is less than the current time, the monitoring server restarts a task copy of the mission thread in the cluster to which the time sum corresponds;
when the result of the comparison is that the time sum is greater than or equal to the current time, the monitoring server continues to perform the operation of obtaining the current heartbeat time information of the mission thread from the database with the second preset time as the period, so that the mission thread in the cluster can complete its run.
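The two cases above amount to a single comparison, sketched here with times in seconds since midnight. The enum Action and the function decide are hypothetical names introduced for the example.

```python
from enum import Enum

class Action(Enum):
    RESTART_COPY = "restart task copy"
    KEEP_POLLING = "keep polling"

def decide(time_heartbeat, time_first, time_now):
    """Apply the rule of step S140 to one mission thread."""
    if time_heartbeat + time_first < time_now:
        return Action.RESTART_COPY   # the expected heartbeat did not arrive in time
    return Action.KEEP_POLLING       # time sum >= current time: thread is healthy

beat = 9 * 3600 + 32 * 60 + 10                       # heartbeat stored at 09:32:10
late = decide(beat, 25, 9 * 3600 + 32 * 60 + 45)     # current time 09:32:45
on_time = decide(beat, 25, 9 * 3600 + 32 * 60 + 35)  # current time 09:32:35
```

The boundary case, where the time sum equals the current time, falls on the healthy side, matching the "greater than or equal to" wording of the second case.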
Specifically, in practice, take the time sum of mission thread A above, Time_heartbeat + Time_first = 9*3600 + 32*60 + 35, as an example. Suppose the current time is 09:32:45, which can be expressed as Time_now = 9*3600 + 32*60 + 45. Then Time_heartbeat + Time_first < Time_now, so the result of the comparison in step S130 is that the time sum is less than the current time. The monitoring server can restart a task copy of the mission thread to which the time sum corresponds, so that the mission thread keeps running and the corresponding business continues to be processed. Accordingly, after the monitoring server restarts the task copy of the mission thread in the cluster to which the time sum corresponds, the method may also include:
the monitoring server establishing a heartbeat time association channel between the mission thread and the database.
Specifically, after the monitoring server restarts the task copy of the mission thread, establishing a heartbeat time association channel between the mission thread and the database ensures that the mission thread now running holds the corresponding heartbeat time association channel. Whether or not the original mission thread has already stopped the corresponding task, it stops the task automatically once it finds that it no longer holds the heartbeat time association channel.
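The hand-over of the channel from the original mission thread to the restarted copy can be sketched as follows. The class ChannelTable, the task name "task-A" and the thread identifiers are assumptions made for the example; the application does not describe how channel ownership is recorded.

```python
class ChannelTable:
    """Maps each task to the single thread currently holding its channel."""

    def __init__(self):
        self._owner = {}

    def establish(self, task, thread_id):
        self._owner[task] = thread_id   # replaces any previous holder

    def holds(self, task, thread_id):
        return self._owner.get(task) == thread_id

def restart_copy(table, task, new_thread_id):
    """Monitoring-server side: start a new copy and give it the channel."""
    table.establish(task, new_thread_id)
    return new_thread_id

def should_keep_running(table, task, thread_id):
    """Thread side: continue only while still holding the channel."""
    return table.holds(task, thread_id)

table = ChannelTable()
table.establish("task-A", "thread-1")        # original mission thread
restart_copy(table, "task-A", "thread-2")    # heartbeat of thread-1 went stale
```

After the restart, thread-1 no longer passes its own check and stops, even if it was merely slow rather than dead, so at most one copy keeps processing task-A.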
Specifically, in practice, take the time sum of mission thread A above, Time_heartbeat + Time_first = 9*3600 + 32*60 + 35, as an example again, and suppose instead that the current time is earlier than the time sum, for example 09:32:30, which can be expressed as Time_now = 9*3600 + 32*60 + 30. Then Time_heartbeat + Time_first > Time_now, so the result of the comparison in step S130 is that the time sum is greater than the current time, and the corresponding mission thread can be judged to be running stably. The mission thread can continue to update the current heartbeat time information stored in the database with the first preset time as the period. Accordingly, the monitoring server can continue to perform S120, obtaining the current heartbeat time information of the mission thread from the database with the second preset time as the period, so that the mission thread in the cluster can complete its run.
As can be seen, in this embodiment of the method for monitoring the operation of mission threads in a cluster, the task server updates the current heartbeat time information of a mission thread stored in the database, based on the heartbeat time association channel and with the first preset time as the period. After obtaining the current heartbeat time information of the mission thread from the database, the monitoring server calculates the time sum of the current heartbeat time information and the first preset time, and judges whether the mission thread is running normally by comparing the time sum with the current time. The operation of the mission threads in the cluster can then be controlled effectively according to the result of the comparison. At the same time, checking at mission thread start-up whether a heartbeat time association channel has been established between the mission thread and the database makes mission threads of the same business mutually exclusive, which avoids blindly restarting a task copy of a mission thread whose task has not died and effectively resolves the long-running concurrency errors caused by starting the same mission thread repeatedly. Compared with the prior art, the embodiments of the present application make it possible to judge directly whether a mission thread is running normally without relying on its task status, and also resolve those concurrency errors, which effectively improves system stability and reduces system risk.
Another embodiment of the method for monitoring the operation of mission threads in a cluster is introduced below, with the monitoring server as the executing entity. Fig. 2 is a schematic flowchart of this embodiment of the method provided by the present application. The present application provides the method operation steps described in the embodiments or flowcharts, but more or fewer operation steps may be included on the basis of routine work that requires no creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. In actual execution, the steps may be executed in the order shown in the embodiments or drawings, or in parallel (for example in a parallel-processor or multi-threaded environment). Specifically, as shown in Fig. 2, the method may include:
S210: Obtain the current heartbeat time information of a mission thread from the database with a second preset time as the period.
S220: Calculate the time sum of the current heartbeat time information of the mission thread and the first preset time, the first preset time being the period with which the task server updates the current heartbeat time information of the mission thread stored in the database, and compare the time sum with the current time.
S230: Control the operation of the mission thread in the cluster according to the result of the comparison.
Specifically, controlling the operation of the mission thread in the cluster according to the result of the comparison may include either of the following:
when the result of the comparison is that the time sum is less than the current time, restarting a task copy of the mission thread in the cluster to which the time sum corresponds;
when the result of the comparison is that the time sum is greater than or equal to the current time, continuing to perform the operation of obtaining the current heartbeat time information of the mission thread from the database with the second preset time as the period, so that the mission thread in the cluster can complete its run.
Further, after restarting the task copy of the mission thread in the cluster to which the time sum corresponds, the method may also include:
establishing a heartbeat time association channel between the mission thread and the database.
Further, the method may also include:
starting the mission threads in the cluster, and establishing a heartbeat time association channel between each mission thread in the cluster and the database.
As can be seen from the technical solution of the above embodiment, after the monitoring server obtains the current heartbeat time information of a mission thread from the database with the second preset time as the period, it calculates the time sum of the current heartbeat time information and the first preset time, the first preset time being the period with which the task server updates the current heartbeat time information of the mission thread stored in the database, and judges whether the mission thread is running normally by comparing the time sum with the current time. The operation of the mission threads in the cluster can then be controlled effectively according to the result of the comparison. Compared with the prior art, the embodiments of the present application make it possible to judge directly whether a mission thread is running normally without relying on the task status of the mission thread, which effectively improves system stability and reduces system risk.
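One monitoring pass over several mission threads, covering S210 to S230, might look like the sketch below. The dict of stored heartbeat times stands in for the database read, and the function monitor_cycle and the restart callback are hypothetical; the thread ids and times are made up for the illustration.

```python
def monitor_cycle(heartbeats, first_preset, now, restart):
    """Run one monitoring pass and return the ids of the threads restarted.

    `heartbeats` maps thread id -> stored heartbeat time in seconds (the
    result of S210). `restart` is called for each thread judged stale.
    """
    restarted = []
    for thread_id, beat in sorted(heartbeats.items()):
        if beat + first_preset < now:    # S220: time sum is less than current time
            restart(thread_id)           # S230: restart the task copy
            restarted.append(thread_id)
    return restarted

calls = []
stale = monitor_cycle(
    {"A": 100, "B": 128, "C": 60}, first_preset=30, now=140, restart=calls.append
)
```

With a 30 s period and a current time of 140 s, threads A and C have missed their expected heartbeat and are restarted, while B is left alone. In a real deployment this pass would be repeated every second preset time.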
Another embodiment of the method for monitoring the operation of mission threads in a cluster is introduced below, with the task server as the executing entity. Fig. 3 is a schematic flowchart of this embodiment of the method provided by the present application. The present application provides the method operation steps described in the embodiments or flowcharts, but more or fewer operation steps may be included on the basis of routine work that requires no creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. In actual execution, the steps may be executed in the order shown in the embodiments or drawings, or in parallel (for example in a parallel-processor or multi-threaded environment). Specifically, as shown in Fig. 3, the method may include:
S310: When the monitoring server starts a task thread, check whether a heartbeat time association channel has been established between the task thread and the database;
S320: When the result of the check is yes, update and store the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with the first preset time as the period.
Further, the method may also include:
When the result of the check is no, stop the running of the current task of the task thread.
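Steps S310 and S320, together with the stop branch, can be sketched as follows. This is only an illustration under stated assumptions: the database is modelled as an in-memory dictionary mapping a task identifier to its heartbeat time, the set of established heartbeat time association channels is modelled as a set of task identifiers, and the class and method names are hypothetical.

```python
from typing import Dict, Set


class TaskServer:
    """Illustrative task-server side of the heartbeat scheme (names are hypothetical)."""

    def __init__(self, db: Dict[str, float], channels: Set[str], first_period: float):
        self.db = db                      # stand-in for the database
        self.channels = channels          # tasks with an established channel
        self.first_period = first_period  # first preset time
        self.running: Set[str] = set()

    def on_task_started(self, task_id: str, now: float) -> bool:
        # S310: check whether the channel to the database has been established.
        if task_id not in self.channels:
            # Check result "no": stop the current task of this task thread.
            self.running.discard(task_id)
            return False
        # Check result "yes": the task may run and writes its first heartbeat.
        self.running.add(task_id)
        self.db[task_id] = now
        return True

    def tick(self, now: float) -> None:
        # S320: refresh the stored heartbeat once per first preset time.
        for task_id in self.running:
            if now - self.db[task_id] >= self.first_period:
                self.db[task_id] = now
```

In a real deployment `tick` would be driven by a timer on the task server; here the caller passes the time explicitly so that the behaviour is easy to follow.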
As can be seen from the technical solutions provided by the above embodiments of the present application, when a task thread is started, the task server checks whether a heartbeat time association channel has been established between the task thread and the database. This ensures that task threads of the same service are mutually exclusive, avoids blindly restarting a task copy of a task thread whose task has not actually terminated, and thereby effectively resolves the prolonged concurrency errors caused by repeatedly starting the task thread. Once the heartbeat time association channel has been established, the current heartbeat time information of the task thread is updated and stored in the database, based on the channel, with the first preset time as the period, so that the monitoring server can subsequently judge from that information whether the task thread is running normally and thus effectively control the running of the task threads in the cluster. Compared with the prior art, the embodiments of the present application resolve the prolonged concurrency errors caused by repeatedly starting the task thread, effectively improve system stability, and reduce system risk.
In another aspect, the present application also provides an embodiment of a device for monitoring the running of task threads in a cluster. Fig. 4 is a schematic structural diagram of an embodiment of the monitoring device provided by the present application. As shown in Fig. 4, the device 400 may include:
A heartbeat time information obtaining module 410, which may be used to obtain the current heartbeat time information of a task thread in the database with the second preset time as the period.
A data calculation module 420, which may be used to calculate the sum of the current heartbeat time information of the task thread and the first preset time, the first preset time being the period with which the task server updates and stores the current heartbeat time information of the task thread in the database.
A data comparison module 430, which may be used to compare the time sum calculated by the data calculation module with the current time.
A task thread running control module 440, which may be used to control the running of the task threads in the cluster according to the comparison result of the data comparison module.
The present application provides a specific implementation of the task thread running control module 440. Specifically, Fig. 5 is a schematic block diagram of an embodiment of the task thread running control module provided by the present application. As shown in Fig. 5, in one embodiment of the present application the task thread running control module 440 may include:
A first control processing unit 441, which may be used to restart the task copy of the task thread in the cluster corresponding to the time sum when the comparison result is that the time sum is less than the current time;
A second control processing unit 442, which may be used, when the comparison result is that the time sum is greater than or equal to the current time, to continue performing the operation of obtaining the current heartbeat time information of the task thread in the database with the second preset time as the period until the running of the task thread in the cluster is complete.
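The division of work between the two control processing units can be sketched as a single monitoring pass. The sketch assumes that the database is modelled as a dictionary of heartbeat times and that restarting a task copy is delegated to a caller-supplied callback; `control_once` and `restart` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List


def control_once(
    db: Dict[str, float],
    first_period: float,
    now: float,
    restart: Callable[[str], None],
) -> List[str]:
    """Run one monitoring pass and return the identifiers of restarted tasks."""
    restarted: List[str] = []
    for task_id, last_heartbeat in list(db.items()):
        time_sum = last_heartbeat + first_period
        if time_sum < now:
            # First control processing unit: the heartbeat is overdue,
            # so the task copy of this task thread is restarted.
            restart(task_id)
            restarted.append(task_id)
        # Second control processing unit: otherwise nothing is done here and
        # the task is simply polled again after the second preset time.
    return restarted
```

The monitoring server would call such a function once per second preset time; tasks that are not restarted in a pass remain subject to the next pass.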
Fig. 6 is a schematic structural diagram of another embodiment of the device for monitoring the running of task threads in a cluster provided by the present application. In another embodiment, as shown in Fig. 6, the device 400 may further include:
A first association channel establishing module 450, which may be used to establish the heartbeat time association channel between the task thread and the database after the task copy of the task thread in the cluster corresponding to the time sum has been restarted.
Fig. 7 is a schematic structural diagram of another embodiment of the device for monitoring the running of task threads in a cluster provided by the present application. In another embodiment, as shown in Fig. 7, the device 400 may further include:
A second association channel establishing module 460, which may be used to start the task threads in the cluster and to establish, for each task thread in the cluster, the heartbeat time association channel between that task thread and the database.
In the present application, the heartbeat time information obtaining module 410 obtains the current heartbeat time information of a task thread from the database with the second preset time as the period. The data calculation module 420 then calculates the sum of the current heartbeat time information and the first preset time. Next, the data comparison module 430 judges whether the task thread is running normally by comparing the time sum with the current time. Finally, the task thread running control module 440 effectively controls the running of the task threads in the cluster according to the comparison result, which improves system stability and reduces system risk.
In another aspect, the present application also provides another embodiment of a device for monitoring the running of task threads in a cluster. Fig. 8 is a schematic structural diagram of an embodiment of the monitoring device provided by the present application. As shown in Fig. 8, the device 800 may include:
A checking module 810, which may be used, when the monitoring server starts a task thread, to check whether a heartbeat time association channel has been established between the task thread and the database;
A heartbeat time information updating module 820, which may be used, when the check result of the checking module is yes, to update and store the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with the first preset time as the period.
In another embodiment, the device 800 may further include:
A task running stopping module, which may be used to stop the running of the current task of the task thread when the check result of the checking module is no.
The checking module 810 provided by the present application checks, when a task thread is started, whether a heartbeat time association channel has been established between the task thread and the database. This ensures that task threads of the same service are mutually exclusive, avoids blindly restarting a task copy of a task thread whose task has not actually terminated, and effectively resolves the prolonged concurrency errors caused by repeatedly starting the task thread.
In another aspect, the present application also provides a system for monitoring the running of task threads in a cluster. Fig. 9 is a schematic structural diagram of an embodiment of the monitoring system provided by the present application. As shown in Fig. 9, the system 900 may include:
A monitoring server 910, which may be used to obtain the current heartbeat time information of a task thread in the database with the second preset time as the period; to calculate the sum of the current heartbeat time information of the task thread and the first preset time, the first preset time being the period with which the task server updates and stores the current heartbeat time information of the task thread in the database, and to compare the time sum with the current time; and to control the running of the task threads in the cluster according to the result of the comparison;
A task server 920, which may be used, when the monitoring server starts a task thread, to check whether a heartbeat time association channel has been established between the task thread and the database; and, when the result of the check is yes, to update and store the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with the first preset time as the period;
A database 930, which may be used to store the current heartbeat time information of the task threads in the cluster.
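How the three parts of the system could share heartbeat data through the database can be sketched with an in-memory SQLite database from the Python standard library. The application does not specify a database product or schema, so the `heartbeat` table, its columns, and the helper function names are assumptions made purely for illustration.

```python
import sqlite3
from typing import List


def open_db() -> sqlite3.Connection:
    # Stand-in for database 930: one row per task thread (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE heartbeat (task_id TEXT PRIMARY KEY, beat_time REAL NOT NULL)"
    )
    return conn


def update_heartbeat(conn: sqlite3.Connection, task_id: str, now: float) -> None:
    # Task server side: called once per first preset time for each task thread.
    conn.execute(
        "INSERT OR REPLACE INTO heartbeat (task_id, beat_time) VALUES (?, ?)",
        (task_id, now),
    )
    conn.commit()


def find_stalled(conn: sqlite3.Connection, first_period: float, now: float) -> List[str]:
    # Monitoring server side: called once per second preset time.
    # A task is stalled when heartbeat time + first preset time < current time.
    rows = conn.execute(
        "SELECT task_id FROM heartbeat WHERE beat_time + ? < ? ORDER BY task_id",
        (first_period, now),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping only the latest heartbeat per task thread keeps the table small, and the monitoring server needs nothing from the task server other than this row, which matches the idea of not relying on the task status of the task thread.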
As can be seen from the above, in the embodiments of the method, device and system for monitoring the running of task threads in a cluster provided by the present application, the current heartbeat time information of a task thread is updated and stored in the database, based on the heartbeat time association channel, with the first preset time as the period. After obtaining the current heartbeat time information of the task thread from the database, the monitoring server calculates the sum of the current heartbeat time information and the first preset time, and judges whether the task thread is running normally by comparing the time sum with the current time. The running of the task threads in the cluster can then be effectively controlled according to the comparison result. In addition, checking at task thread startup whether a heartbeat time association channel has been established between the task thread and the database ensures that task threads of the same service are mutually exclusive, avoids blindly restarting a task copy of a task thread whose task has not actually terminated, and effectively resolves the prolonged concurrency errors caused by repeatedly starting the task thread. Compared with the prior art, the embodiments of the present application can judge directly whether a task thread is running normally without relying on the task status of the task thread, and at the same time resolve the prolonged concurrency errors caused by repeatedly starting the task thread, which effectively improves system stability and reduces system risk.
Although the content of the present application mentions descriptions of data processing such as data interaction among the database, the monitoring server and the task server, the present application is not limited to cases that fully conform to a standard or to the data processing application environment mentioned. The above descriptions in the embodiments of the present application are merely applications of some embodiments of the present application. Of course, other non-inventive variations consistent with the processing method steps described in the above embodiments can still achieve the same application, and are not described again here.
Although the present application provides the method operation steps as described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps enumerated in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual device or client product executes the method, the steps may be performed sequentially or in parallel (for example, in an environment with parallel processors or multithreaded processing) according to the method shown in the embodiments or drawings.
The devices or modules described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above devices are described as being divided into various modules by function. Of course, when implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing a given function may also be implemented by a combination of multiple submodules or subunits.
It is also known to those skilled in the art that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the devices included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, the devices for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in some parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that there are many variations and changes of the present application that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the present application.

Claims (14)

1. A method for monitoring the running of task threads in a cluster, characterized in that the method comprises:
a task server in the cluster updating and storing the current heartbeat time information of a task thread in a database, based on a heartbeat time association channel, with a first preset time as the period;
a monitoring server in the cluster obtaining the current heartbeat time information of the task thread in the database with a second preset time as the period;
the monitoring server calculating the sum of the current heartbeat time information of the task thread and the first preset time, and comparing the time sum with the current time;
the monitoring server controlling the running of the task threads in the cluster according to the result of the comparison.
2. A method for monitoring the running of task threads in a cluster, characterized in that the method comprises:
obtaining the current heartbeat time information of a task thread in a database with a second preset time as the period;
calculating the sum of the current heartbeat time information of the task thread and a first preset time, the first preset time being the period with which a task server updates and stores the current heartbeat time information of the task thread in the database, and comparing the time sum with the current time;
controlling the running of the task threads in the cluster according to the result of the comparison.
3. The method according to claim 2, characterized in that controlling the running of the task threads in the cluster according to the result of the comparison comprises any one of the following:
when the result of the comparison is that the time sum is less than the current time, restarting the task copy of the task thread in the cluster corresponding to the time sum;
when the result of the comparison is that the time sum is greater than or equal to the current time, continuing to perform the operation of obtaining the current heartbeat time information of the task thread in the database with the second preset time as the period until the running of the task thread in the cluster is complete.
4. The method according to claim 3, characterized in that, after restarting the task copy of the task thread in the cluster corresponding to the time sum, the method further comprises:
establishing the heartbeat time association channel between the task thread and the database.
5. The method according to claim 2, characterized in that the method further comprises:
starting the task threads in the cluster, and establishing, for each task thread in the cluster, the heartbeat time association channel between that task thread and the database.
6. A method for monitoring the running of task threads in a cluster, characterized in that the method comprises:
when a monitoring server starts a task thread, checking whether a heartbeat time association channel has been established between the task thread and a database;
when the result of the check is yes, updating and storing the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with a first preset time as the period.
7. The method according to claim 6, characterized in that the method further comprises:
when the result of the check is no, stopping the running of the current task of the task thread.
8. A device for monitoring the running of task threads in a cluster, characterized in that the device comprises:
a heartbeat time information obtaining module, configured to obtain the current heartbeat time information of a task thread in a database with a second preset time as the period;
a data calculation module, configured to calculate the sum of the current heartbeat time information of the task thread and a first preset time, the first preset time being the period with which a task server updates and stores the current heartbeat time information of the task thread in the database;
a data comparison module, configured to compare the time sum calculated by the data calculation module with the current time;
a task thread running control module, configured to control the running of the task threads in the cluster according to the comparison result of the data comparison module.
9. The device according to claim 8, characterized in that the task thread running control module comprises:
a first control processing unit, configured to restart the task copy of the task thread in the cluster corresponding to the time sum when the result of the comparison is that the time sum is less than the current time;
a second control processing unit, configured, when the result of the comparison is that the time sum is greater than or equal to the current time, to continue performing the operation of obtaining the current heartbeat time information of the task thread in the database with the second preset time as the period until the running of the task thread in the cluster is complete.
10. The device according to claim 9, characterized in that the device further comprises:
a first association channel establishing module, configured to establish the heartbeat time association channel between the task thread and the database after the task copy of the task thread in the cluster corresponding to the time sum has been restarted.
11. The device according to claim 8, characterized in that the device further comprises:
a second association channel establishing module, configured to start the task threads in the cluster and to establish, for each task thread in the cluster, the heartbeat time association channel between that task thread and the database.
12. A device for monitoring the running of task threads in a cluster, characterized in that the device comprises:
a checking module, configured, when a monitoring server starts a task thread, to check whether a heartbeat time association channel has been established between the task thread and a database;
a heartbeat time information updating module, configured, when the check result of the checking module is yes, to update and store the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with a first preset time as the period.
13. The device according to claim 12, characterized in that the device further comprises:
a task running stopping module, configured to stop the running of the current task of the task thread when the check result of the checking module is no.
14. A system for monitoring the running of task threads in a cluster, characterized in that the system comprises:
a monitoring server, configured to obtain the current heartbeat time information of a task thread in a database with a second preset time as the period; to calculate the sum of the current heartbeat time information of the task thread and a first preset time, the first preset time being the period with which a task server updates and stores the current heartbeat time information of the task thread in the database, and to compare the time sum with the current time; and to control the running of the task threads in the cluster according to the result of the comparison;
a task server, configured, when the monitoring server starts a task thread, to check whether a heartbeat time association channel has been established between the task thread and the database; and, when the result of the check is yes, to update and store the current heartbeat time information of the task thread in the database, based on the heartbeat time association channel, with the first preset time as the period;
a database, configured to store the current heartbeat time information of the task threads in the cluster.
CN201610004928.6A 2016-01-05 2016-01-05 Method, device and system for monitoring running of task threads in cluster Active CN106940671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610004928.6A CN106940671B (en) 2016-01-05 2016-01-05 Method, device and system for monitoring running of task threads in cluster

Publications (2)

Publication Number Publication Date
CN106940671A (en) 2017-07-11
CN106940671B (en) 2020-08-04

Family

ID=59469754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610004928.6A Active CN106940671B (en) 2016-01-05 2016-01-05 Method, device and system for monitoring running of task threads in cluster

Country Status (1)

Country Link
CN (1) CN106940671B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1717658A (en) * 2002-11-27 2006-01-04 甲骨文国际公司 Heartbeat mechanism for cluster systems
CN101854373A (en) * 2009-04-01 2010-10-06 华为技术有限公司 Target switching method, server node and colony system
CN102880475A (en) * 2012-10-23 2013-01-16 上海普元信息技术股份有限公司 Real-time event handling system and method based on cloud computing in computer software system
US8639818B1 (en) * 2012-12-25 2014-01-28 Kaspersky Lab Zao System and method for reliable and timely task completion in a distributed computing environment
US20140244687A1 (en) * 2013-02-24 2014-08-28 Technion Research & Development Foundation Limited Processing query to graph database
CN104866380A (en) * 2015-06-18 2015-08-26 北京搜狐新媒体信息技术有限公司 Method and device for processing state transition of cluster management system
CN104915256A (en) * 2015-06-05 2015-09-16 惠州Tcl移动通信有限公司 Method and system for realizing real-time scheduling of task

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109391495A (en) * 2017-08-10 2019-02-26 阿里巴巴集团控股有限公司 Send and receive method, apparatus, computer-readable medium and the electronic equipment of heartbeat message
CN109992436A (en) * 2017-12-29 2019-07-09 华为技术有限公司 Thread block detection method and equipment
CN109213684A (en) * 2018-09-18 2019-01-15 北京工业大学 Program detecting method and application based on OpenMP thread heartbeat detection technology
CN109213684B (en) * 2018-09-18 2022-01-28 北京工业大学 Program detection method based on OpenMP thread heartbeat detection technology and application
CN113438122A (en) * 2021-05-14 2021-09-24 济南浪潮数据技术有限公司 Heartbeat management method and device for server, computer equipment and medium
CN113438122B (en) * 2021-05-14 2022-05-17 济南浪潮数据技术有限公司 Heartbeat management method and device for server, computer equipment and medium
CN113434291A (en) * 2021-06-25 2021-09-24 湖北央中巨石信息技术有限公司 Real-time scheduling optimization method based on channel
CN114328083A (en) * 2021-11-30 2022-04-12 苏州浪潮智能科技有限公司 WDT monitoring method, device and medium
CN114328083B (en) * 2021-11-30 2023-11-14 苏州浪潮智能科技有限公司 WDT monitoring method, device and medium

Also Published As

Publication number Publication date
CN106940671B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.