WO2017063521A1 - Coroutine Monitoring Method and Apparatus - Google Patents

Coroutine Monitoring Method and Apparatus

Info

Publication number
WO2017063521A1
WO2017063521A1 PCT/CN2016/101467 CN2016101467W WO2017063521A1 WO 2017063521 A1 WO2017063521 A1 WO 2017063521A1 CN 2016101467 W CN2016101467 W CN 2016101467W WO 2017063521 A1 WO2017063521 A1 WO 2017063521A1
Authority
WO
WIPO (PCT)
Prior art keywords
coroutine
monitoring
running
preset
running time
Application number
PCT/CN2016/101467
Inventor
尹德升 (Yin Desheng)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2017063521A1

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F11/00: Error detection; error correction; monitoring
    • G06F11/30: Monitoring

Definitions

  • The apparatus and method disclosed in the embodiments provided in the present application may be implemented in other manners.
  • The apparatus embodiments described above are merely illustrative.
  • The division into units is merely a division by logical function.
  • In an actual implementation there may be other ways of dividing them; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units.
  • Some or all of the units may be selected as needed to implement the solution of this embodiment.
  • The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer-executable instructions configured to perform the method of any of the above embodiments.
  • The present disclosure also provides an electronic device, whose structure is shown schematically in FIG. 5.
  • The electronic device includes:
  • at least one processor 501 (one processor 501 is taken as an example in FIG. 5) and a memory 502, and may further include a communication interface 503 and a bus 504.
  • The processor 501, the communication interface 503, and the memory 502 communicate with one another through the bus 504.
  • The communication interface 503 can be used for information transfer.
  • The processor 501 can invoke logic instructions in the memory 502 to perform the methods of the above embodiments.
  • The logic instructions in the memory 502 may be implemented in the form of a software functional unit and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • As a computer-readable storage medium, the memory 502 can store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure.
  • The processor 501 runs the software programs, instructions, and modules stored in the memory 502 to execute functional applications and data processing, that is, to implement the coroutine monitoring method of the foregoing method embodiments.
  • The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application required by at least one function. The data storage area may store data created through use of the terminal device, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium.
  • All or part of the technical solution of the present application may be embodied as a software product stored in a storage medium. The product includes one or more instructions that cause a computer device (a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • The coroutine monitoring method and apparatus provided by the embodiments of the present disclosure reduce system delay and improve network quality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

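A coroutine monitoring method and apparatus. In a first process, a monitoring thread monitors at least one coroutine in the first process and determines whether any of them has run longer than its preset running duration; each of the at least one coroutine records its own preset running duration in advance (210). If the running time of a first coroutine among the at least one coroutine exceeds the first coroutine's first preset running duration, the monitoring thread terminates the first coroutine (220).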

Description

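Coroutine Monitoring Method and Apparatus
Technical Field
The present disclosure relates to the field of communication technologies, for example, to a coroutine monitoring method and apparatus.
Background
OpenStack (an open-source cloud computing management platform project) is a free and open-source cloud computing management platform. Through a variety of complementary services, OpenStack provides an Infrastructure as a Service (IaaS) solution. It is an open-source project that aims to provide software for building and managing public and private clouds.
A thread is the unit of scheduling in the Linux operating system. Multiple threads in the system obtain the Central Processing Unit (CPU) and run according to scheduling policies such as priority preemption or time-slice round-robin. When the CPU has multiple cores, these threads can execute simultaneously.
A coroutine can be regarded as a user-space thread. The operating system (OS) knows nothing about the existence of coroutines, so developers must design the scheduling inside a thread to carry out cooperative multitasking. After the operating system schedules the thread that hosts the coroutines, the thread performs a second level of scheduling over its coroutines.
When coroutines are used and the coroutines sharing a common resource do not all belong to the same thread, a mutex is needed to protect data consistency. This leads to the following problem:
If one coroutine using the common resource becomes abnormal, the coroutines that later need that resource can never acquire the lock required to operate on the data. They remain in a waiting state indefinitely, which causes system delay.
Summary
The present disclosure provides a coroutine monitoring method and apparatus that reduce system delay caused by coroutine anomalies.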
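In a first aspect, an embodiment of the present disclosure provides a coroutine monitoring method, including:
in a first process, monitoring, by a monitoring thread, at least one coroutine in the first process, and determining whether any of the at least one coroutine has a running time exceeding a preset running duration, where each of the at least one coroutine has its preset running duration recorded in advance; and
if it is determined that the running time of a first coroutine among the at least one coroutine exceeds a first preset running duration of the first coroutine, terminating the first coroutine by means of the monitoring thread.
Optionally, before the at least one coroutine in the first process is monitored by the monitoring thread, the method further includes:
when the at least one coroutine runs, recording, by each of the at least one coroutine, its own start time and its own preset running duration.
Optionally, after the first coroutine is terminated, the method further includes:
initializing the first coroutine.
Optionally, after the first coroutine is initialized, the method further includes:
re-running the first coroutine.
Optionally, after the at least one coroutine in the first process is monitored by the monitoring thread, the method further includes:
if the running time of a second coroutine among the at least one coroutine, upon completion, does not exceed a second preset running duration of the second coroutine, deregistering, by the second coroutine, the recorded start time of the second coroutine and the second preset running duration.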
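In a second aspect, a coroutine monitoring apparatus is provided, including:
a monitoring module, configured to: in a first process, monitor, by a monitoring thread, at least one coroutine in the first process, and determine whether any of the at least one coroutine has a running time exceeding a preset running duration, where each of the at least one coroutine has its preset running duration recorded in advance; and
a termination module, configured to: if it is determined that the running time of a first coroutine among the at least one coroutine exceeds a first preset running duration of the first coroutine, terminate the first coroutine by means of the monitoring thread.
Optionally, the apparatus further includes a recording module, configured to:
before the monitoring module monitors the at least one coroutine in the first process by means of the monitoring thread, record, by each of the at least one coroutine when it runs, its own start time and its own preset running duration.
Optionally, the apparatus further includes an initialization module, configured to:
initialize the first coroutine after the termination module terminates the first coroutine.
Optionally, the apparatus further includes an operation module, configured to:
re-run the first coroutine after the initialization module initializes the first coroutine.
Optionally, the apparatus further includes a deregistration module, configured to:
after the monitoring module monitors the at least one coroutine in the first process by means of the monitoring thread, if the running time of a second coroutine among the at least one coroutine, upon completion, does not exceed a second preset running duration of the second coroutine, deregister, by the second coroutine, the recorded start time of the second coroutine and the second preset running duration.
The present disclosure further provides a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above method.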
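The present disclosure further provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the above method.
A monitoring thread may be run in a process to monitor the coroutines that need monitoring (that is, the at least one coroutine). Each of these coroutines records its own preset running duration in advance. If the monitoring thread detects a coroutine whose running time exceeds its preset running duration, it can terminate that coroutine so that the other coroutines continue running. This prevents an anomaly in one coroutine from stopping all the others, minimizes system delay, and improves network quality.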
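Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hierarchical relationship among coroutines, threads, and processes;
FIG. 2 is a flowchart of a coroutine monitoring method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of a coroutine monitoring apparatus according to an embodiment of the present disclosure; and
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.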
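Detailed Description
To make the technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. Where no conflict arises, the embodiments and their features may be combined with one another in any way.
FIG. 1 shows the hierarchical relationship among processes, threads, and coroutines. A process is a unit of resource organization and allocation, and all execution flows in a process share the same process space. A thread is the system's unit of scheduling. The threads of a process work together in the shared process space under operating-system scheduling, and on a multi-core CPU several threads can run simultaneously. A coroutine is a second level of scheduling within a thread, invisible to the operating system. Coroutines under the same thread execute sequentially, so they do not need mutual-exclusion protection of common resources among themselves.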
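In one scenario, the Nova-compute service in OpenStack is a process that resides on a compute node and carries out virtual machine operation instructions. To respond promptly to users' operations on virtual machines, it uses a three-level management mechanism: multiple threads, each with multiple coroutines under it (see FIG. 1). Usage information for some physical resources is shared by multiple threads, which gives rise to the following problem:
The Nova-compute service periodically retrieves local physical resource information and reports it to the OpenStack database. If coroutines under different threads need these physical resources, a mutex must be added to the resource-operation flow so that only one thread can access the resource at any moment. Suppose a flow such as resource reporting blocks, for example because of a message server anomaly. Then every coroutine that needs to operate on the resource cannot obtain the data-operation lock and can only wait. Meanwhile the service status may still look normal on the surface, so the system needs to detect the problem and self-heal in a timely manner.
With this problem in mind, the embodiments of the present disclosure run a monitoring thread in a process to monitor the coroutines that need monitoring (that is, the at least one coroutine). Each of these coroutines records its preset running duration in advance. If the monitoring thread detects a coroutine whose running time exceeds its preset running duration, it can terminate that coroutine so that the other coroutines continue to run. This prevents an anomaly in one coroutine from stopping all the others, minimizes system delay, and improves network quality.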
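Referring to FIG. 2, an embodiment of the present disclosure provides a coroutine monitoring method, whose flow is as follows.
In step 210, in a first process, a monitoring thread monitors at least one coroutine in the first process to determine whether any of them has a running time exceeding its preset running duration. Each of the at least one coroutine has its preset running duration recorded in advance.
In step 220, if it is determined that the running time of a first coroutine among the at least one coroutine exceeds the first preset running duration of the first coroutine, the monitoring thread terminates the first coroutine.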
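Steps 210 and 220 reduce to comparing each coroutine's elapsed time against its own preset duration. The following is a minimal sketch under assumed names: `Record`, `find_overdue`, `monitor_once`, and the `terminate` callback are all hypothetical. How a coroutine is actually terminated is left to the callback.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Record:
    """Hypothetical registration-table entry recorded by a coroutine."""
    start: float   # start time of the current run (monotonic seconds)
    preset: float  # preset running duration, in seconds


def find_overdue(table: dict[str, Record], now: float) -> list[str]:
    """Step 210: ids whose running time exceeds their own preset duration."""
    return [cid for cid, rec in table.items() if now - rec.start > rec.preset]


def monitor_once(table: dict[str, Record],
                 terminate: Callable[[str], None],
                 now: float | None = None) -> list[str]:
    """Step 220: terminate every overdue coroutine via the supplied callback."""
    now = time.monotonic() if now is None else now
    overdue = find_overdue(table, now)
    for cid in overdue:
        terminate(cid)
    return overdue
```

Each record carries its own `preset`, so coroutines with different tolerances can share one table.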
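In the embodiments of the present disclosure, the first process may be any process running in the system; that is, the monitoring thread may run in any process in the system. For example, to monitor the coroutines of a particular process, the monitoring thread can be run in that process.
In the embodiments of the present disclosure, the monitoring thread may run periodically or on demand.
In the embodiments of the present disclosure, if a coroutine needs to be monitored, it can be given some preparatory processing. Afterwards, when the coroutine runs, it records its start time and its preset running duration. The preset running duration may be set by a user or automatically by the system. One way of doing this is as follows:
The coroutine is decorated with a Python decorator (Python being a computer programming language).
Python is an object-oriented, interpreted programming language. A Python decorator wraps a function to add extra functionality without changing the function's original processing flow. A decorator is itself a function: its argument is the function to be wrapped, and it returns the wrapped function.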
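Suppose the decorating function is Function1 and the decorated function is Function2. Then @Function1 is written immediately before the definition of Function2, that is:
@Function1
def Function2(): ...
Whenever Function2 is called, the additional functionality that Function1 supplies for Function2 is invoked automatically.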
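The Function1/Function2 relationship can be made concrete with a small runnable decorator. Only the two function names come from the text. The call counter is a hypothetical stand-in for the "additional functionality", and `functools.wraps` preserves the wrapped function's name and docstring.

```python
import functools


def Function1(func):
    """Decorator: takes the function to wrap and returns the wrapped version."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # hypothetical "additional functionality"
        return func(*args, **kwargs)  # original processing flow, unchanged
    wrapper.calls = 0
    return wrapper


@Function1
def Function2(x):
    return x * 2
```

Calling `Function2(21)` still returns 42, as the undecorated function would, and also increments `Function2.calls`.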
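In the embodiments of the present disclosure, such a decorator can therefore be defined, that is, a function is defined (called the monitoring function here). Monitoring is achieved by attaching the monitoring function to the monitored coroutine, that is, to the function in which the monitored coroutine runs.
For example, the monitoring function defined in the embodiments of the present disclosure can perform the following tasks:
append a record to the system's registration table, registering this coroutine as a checkpoint;
record the coroutine's entry time (the start time of the run) and the running duration that can be tolerated for the coroutine (its preset running duration); and
after the coroutine finishes its original flow, delete this record from the registration table if the coroutine's preset running duration was not exceeded.
Once the monitoring function is defined, it can be attached to the function in which the monitored coroutine runs, which completes the "decoration" of the coroutine.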
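A monitoring function of this kind can be sketched as a decorator factory that performs the three tasks listed above. The sketch assumes coroutines are written with Python's asyncio; the disclosure does not specify the coroutine library. `REGISTRY`, `REGISTRY_LOCK`, `monitored`, and `report_resources` are hypothetical names, and the table layout (record id mapped to task, start time, and preset duration) is an assumption.

```python
import asyncio
import functools
import itertools
import threading
import time

# Hypothetical registration table: record id -> (task, start time, preset duration).
REGISTRY = {}
REGISTRY_LOCK = threading.Lock()  # a monitoring thread may read the table concurrently
_record_ids = itertools.count()


def monitored(preset_seconds):
    """Decorator factory that attaches the monitoring function to a coroutine function."""
    def decorate(coro_func):
        @functools.wraps(coro_func)
        async def wrapper(*args, **kwargs):
            rid = next(_record_ids)
            start = time.monotonic()
            with REGISTRY_LOCK:
                # Tasks 1 and 2: register a checkpoint with entry time and preset duration.
                REGISTRY[rid] = (asyncio.current_task(), start, preset_seconds)
            result = await coro_func(*args, **kwargs)  # original flow, unchanged
            if time.monotonic() - start <= preset_seconds:
                with REGISTRY_LOCK:
                    REGISTRY.pop(rid, None)  # task 3: deregister when within the preset
            return result
        return wrapper
    return decorate


@monitored(preset_seconds=5.0)
async def report_resources():
    """Hypothetical monitored coroutine; returns how many records exist mid-run."""
    await asyncio.sleep(0)
    with REGISTRY_LOCK:
        return len(REGISTRY)
```

A coroutine that overruns its preset keeps its record, mirroring the third task, so a monitoring thread can still find it. Removing such records is then the monitor's job. A `threading.Lock` guards the table because the monitoring thread runs outside the event loop.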
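Optionally, before the at least one coroutine in the first process is monitored by the monitoring thread, the method further includes:
when the at least one coroutine runs, recording, by each of the at least one coroutine, its own start time and its own preset running duration.
For example, each of the at least one coroutine may be a "decorated" coroutine in the first process. The at least one coroutine may be all of the coroutines in the first process or only some of them, for example, those that need to access the same common resource.
When a decorated coroutine starts running, the monitoring function makes it register a record in the registration table before executing its original flow. The record holds the coroutine's start time and preset running duration. Once the record is made, the coroutine begins executing its original flow.
If multiple coroutines in a process are decorated, the start times they record may differ or coincide, and their preset running durations may also be the same or different. That is, a preset running duration can be set for each coroutine individually, to meet the different running requirements of different coroutines.
When the monitoring thread runs, it can monitor all decorated coroutines in a process, whether they are scheduled by one thread or by several threads.
If the monitoring thread finds that a monitored coroutine has run longer than the preset running duration it recorded, the monitoring thread can terminate that coroutine. This keeps the coroutine, as far as possible, from affecting the other coroutines.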
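The monitoring thread itself can be sketched as a daemon `threading.Thread` that periodically scans the registration table and terminates overdue coroutines. The sketch assumes asyncio tasks as the coroutines. OpenStack services of that era were built on eventlet green threads, so asyncio is only a standard-library stand-in. `start_monitor`, `stuck`, and `main` are hypothetical names. Because asyncio objects are not thread-safe, the thread does not call `task.cancel()` directly; it schedules the call on the event loop with `loop.call_soon_threadsafe`.

```python
import asyncio
import threading
import time

# Hypothetical registration table shared with monitored coroutines:
# asyncio task -> (start time, preset running duration in seconds).
REGISTRY = {}
REGISTRY_LOCK = threading.Lock()


def start_monitor(loop, interval=0.05):
    """Start a daemon monitoring thread; set the returned Event to stop it."""
    stop = threading.Event()

    def scan():
        while not stop.wait(interval):  # wake up once per interval
            now = time.monotonic()
            with REGISTRY_LOCK:
                overdue = [task for task, (start, preset) in REGISTRY.items()
                           if now - start > preset]
                for task in overdue:
                    del REGISTRY[task]  # drop the terminated coroutine's record
            for task in overdue:
                # asyncio is not thread-safe: let the event loop perform the cancel.
                loop.call_soon_threadsafe(task.cancel)

    threading.Thread(target=scan, daemon=True).start()
    return stop


async def stuck():
    """Hypothetical coroutine that waits forever, like one awaiting a lost reply."""
    await asyncio.Event().wait()


async def main():
    stop = start_monitor(asyncio.get_running_loop())
    task = asyncio.create_task(stuck())
    with REGISTRY_LOCK:
        REGISTRY[task] = (time.monotonic(), 0.1)  # preset running duration: 0.1 s
    try:
        await task
        return "completed"
    except asyncio.CancelledError:
        return "terminated"
    finally:
        stop.set()
```

`asyncio.run(main())` returns `"terminated"` after about the 0.1 s preset plus one scan interval. The stuck coroutine never finishes by itself.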
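Optionally, after the first coroutine is terminated, the method further includes:
initializing the first coroutine.
After the first coroutine is terminated, it can be initialized. A coroutine retains the state of its previous invocation, i.e., a particular combination of all its local state, and each re-entry resumes that state. Suppose the first coroutine is not initialized and the anomaly or fault behind it has not been cleared. The next run of the first coroutine may then re-enter the faulty state and fail again. Initializing the first coroutine therefore prevents it from re-entering the previous abnormal state and gives its next run the best chance of succeeding.
If a monitored coroutine is detected to have timed out (i.e., exceeded its recorded preset running time), the monitoring thread can both terminate it and re-initialize it. As described in the scenario above, this kind of blocking stops other coroutines from executing. Its most common causes are:
- an anomaly in the link between the local node and the message server; or
- the local node waiting indefinitely for a message that has already been lost.

These coroutines handle periodic flows, so a failure in one period has essentially no impact on the system. A simple and efficient self-healing strategy is therefore to reset the coroutine. The coroutine then re-establishes its link with the server and restarts its periodic processing flow, which resolves the blocking.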
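The reset strategy can be sketched as a periodic task whose state persists between runs. `PeriodicReporter`, `reset` and the `connect` factory are hypothetical stand-ins for, say, the heartbeat coroutine and its link to the message server. The sketch only shows that re-initialization discards the stale link and the pending-reply flag. The next period therefore reconnects instead of resuming the broken state.

```python
import itertools

class PeriodicReporter:
    """Hypothetical periodic task whose state survives between runs."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a link to the message server
        self.reset()

    def reset(self):
        """Re-initialize: drop the (possibly broken) link and the reply being waited for."""
        self.link = None
        self.awaiting_reply = False

    def run_once(self, reply_arrives):
        if self.link is None:
            self.link = self._connect()  # (re)establish the link on demand
        self.awaiting_reply = True       # request sent, waiting for the reply
        if reply_arrives:
            self.awaiting_reply = False
        return self.link

link_ids = itertools.count(1)
reporter = PeriodicReporter(lambda: f"link-{next(link_ids)}")

first = reporter.run_once(reply_arrives=False)  # reply lost: left waiting on link-1
# ...the monitoring thread terminates the run, then re-initializes the task...
reporter.reset()
second = reporter.run_once(reply_arrives=True)  # next period: fresh link-2, reply received
```

Without the `reset()` call, the second run would reuse link-1, the same link on which the reply was lost.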
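Optionally, after the first coroutine is initialized, the method further includes:
running the first coroutine again.
After the first coroutine is initialized, it can run again when the next period arrives and continue to provide its function.
Optionally, after the monitoring thread monitors the at least one coroutine in the first process, the method further includes:
if a second coroutine among the at least one coroutine finishes running without its running time exceeding a second preset running time of the second coroutine, the second coroutine deregisters the recorded start time of its run and the second preset running time.
If a monitored coroutine's running time does not exceed the preset running time it recorded beforehand, the monitoring thread does not terminate it, and the coroutine stops on its own. Before stopping, the coroutine can automatically deregister, from the registration table, the start time and preset running time it recorded earlier. This keeps the registration table from accumulating too many entries. It also lets the coroutine record afresh the next time it is scheduled, avoiding the confusion of one coroutine having several stale records.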
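The following describes a scenario and how the technical solution of the embodiments is applied in it.
Referring to FIG. 3, the Nova-compute service on an OpenStack compute node contains a large number of coroutines that run periodically on a timer. The two most common are:
- coroutine 1, which periodically updates the compute node's resource information; and
- coroutine 2, which periodically reports the Nova-compute service's heartbeat (keep-alive) information.

Both coroutines report data to the OpenStack database through a message server, for example QPID (an object-oriented messaging middleware that uses the Advanced Message Queuing Protocol, AMQP) or RabbitMQ (an enterprise messaging system). An abnormal event such as a restart of the message server can break the link between the compute node and the message server, or cause some reply messages to be lost. The message sender (e.g., coroutine 1 or coroutine 2) then waits for a reply indefinitely. A coroutine in this waiting state does not release the data lock it has already acquired. The other coroutines can never obtain the lock they need to operate on the data, so they remain stuck waiting on the mutex.
Virtual machine operations are generally performed on site by a system administrator, or initiated by the system during disaster recovery and backup or elastic service scaling. They have relatively strict real-time requirements and must not go unanswered for long. Worse, this kind of anomaly caused by the message server cannot recover by itself. The coroutine waiting for a reply in the virtual machine flow on the left of FIG. 3 may stay in that state however long it waits, so the other coroutines can never run. The result is system latency or even system failure, which makes terminating this abnormal state critical.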
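The blocking pattern described above can be reproduced in a few lines, along with the reason that terminating the stuck sender helps. This sketch again assumes asyncio instead of eventlet:

- `reporter` is a hypothetical stand-in for coroutine 1 or coroutine 2.
- `updater` is a hypothetical coroutine that needs the same data lock.
- The lost reply is modeled as an `asyncio.Event` that is never set.

`async with` releases the lock when its holder is cancelled, so cancelling the stuck coroutine lets the waiting one proceed.

```python
import asyncio

async def reporter(lock, reply):
    """Stand-in for coroutine 1/2: takes the data lock, then waits for a reply."""
    async with lock:
        await reply.wait()  # the reply was lost, so this never returns on its own

async def updater(lock):
    """Another coroutine that needs the same data lock."""
    async with lock:
        return "updated"

async def main():
    lock, reply = asyncio.Lock(), asyncio.Event()
    stuck = asyncio.create_task(reporter(lock, reply))
    await asyncio.sleep(0)  # let reporter run and acquire the lock
    try:
        await asyncio.wait_for(updater(lock), timeout=0.05)
        blocked = False
    except asyncio.TimeoutError:
        blocked = True      # updater could not obtain the lock
    stuck.cancel()          # terminate the stuck coroutine ...
    try:
        await stuck
    except asyncio.CancelledError:
        pass
    # ... leaving `async with` released the lock, so the updater now succeeds
    return blocked, await updater(lock)

blocked, after = asyncio.run(main())
```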
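With the technical solution of the embodiments, a monitoring function can be defined and attached to the coroutines to be monitored. When a monitored coroutine runs, it first registers a record in the registration table holding the start time of its run and its preset running time. It then proceeds to execute its original flow.
The monitoring thread may be, for example, a periodically executed flow that checks every record in the registration table for timeout. When a record has timed out, the corresponding coroutine is terminated and initialized, for example by re-initializing its link to the message server. The coroutine can run again in the next period. Because it has been initialized, the blocking is released automatically and the system heals itself.
If a monitored coroutine finishes its original flow without exceeding the preset running time it recorded beforehand, the monitoring thread does not terminate it, and it runs normally. After finishing its original flow, the coroutine deregisters the record it registered earlier in the registration table.
The solution provided by the embodiments addresses a specific failure in OpenStack systems. Such a system can appear healthy yet be unable to execute normal virtual machine operation commands because some coroutines are blocked, and the solution alleviates this. In addition, the same design principle applies to detecting deadlocks and infinite loops in any Python process.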
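One period of the monitoring thread described above might look like the sketch below. The registry layout, `monitor_pass`, and the `terminate`/`reinitialize` callbacks are assumptions made for illustration. In a real deployment they would map onto whatever the coroutine runtime offers for stopping a coroutine and rebuilding its message-server link. `run_monitor` shows the periodic loop and assumes that `stop` is a `threading.Event`. Locking of the shared table is omitted for brevity.

```python
import time
from dataclasses import dataclass

@dataclass
class Entry:
    start: float        # start time of the registered run (seconds)
    max_runtime: float  # preset running time recorded by the coroutine

def monitor_pass(registry, now, terminate, reinitialize):
    """Check every registered record once and handle the ones that have timed out."""
    handled = []
    for name, entry in list(registry.items()):  # copy, because we delete while iterating
        if now - entry.start > entry.max_runtime:
            terminate(name)      # stop the overrunning coroutine
            reinitialize(name)   # e.g. rebuild its link to the message server
            del registry[name]   # it registers afresh when it runs next period
            handled.append(name)
    return handled

def run_monitor(registry, period, stop, terminate, reinitialize):
    """Periodic monitoring loop; `stop` is a threading.Event used to end it."""
    while not stop.wait(period):  # wait() returns False on timeout, True once stop is set
        monitor_pass(registry, time.monotonic(), terminate, reinitialize)

actions = []
registry = {"update_resources": Entry(0.0, 60.0), "report_heartbeat": Entry(0.0, 5.0)}
handled = monitor_pass(registry, 10.0,
                       terminate=lambda n: actions.append(("terminate", n)),
                       reinitialize=lambda n: actions.append(("reinitialize", n)))
```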
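Referring to FIG. 4, an embodiment of the present disclosure provides a coroutine monitoring apparatus, which may include a monitoring module 401 and a termination module 402. The modules of the apparatus may be implemented in hardware, in software, or in a combination of both.
The monitoring module 401 is configured to, in a first process, monitor at least one coroutine in the first process by means of a monitoring thread. It determines whether the running time of any of the at least one coroutine exceeds its preset running time. Each of the at least one coroutine has recorded its own preset running time in advance.
The termination module 402 is configured to terminate a first coroutine among the at least one coroutine by means of the monitoring thread. It does so if it is determined that the running time of the first coroutine exceeds a first preset running time of the first coroutine.
Optionally, the apparatus further includes a recording module, configured to:
before the monitoring module 401 monitors the at least one coroutine in the first process by means of the monitoring thread, cause each of the at least one coroutine, when it runs, to record the start time of its own run and its own preset running time.
Optionally, the apparatus further includes an initialization module, configured to:
initialize the first coroutine after the termination module 402 terminates the first coroutine.
Optionally, the apparatus further includes an operation module, configured to:
run the first coroutine again after the initialization module initializes the first coroutine.
Optionally, the apparatus further includes a deregistration module, configured to:
after the monitoring module 401 monitors the at least one coroutine in the first process by means of the monitoring thread, if a second coroutine among the at least one coroutine finishes running without exceeding a second preset running time of the second coroutine, deregister, by means of the second coroutine, the recorded start time of the second coroutine's run and the second preset running time.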
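In the embodiments of the present disclosure, a monitoring thread can run in a process and monitor the coroutines that need monitoring, each of which records its own preset running time in advance. If the monitoring thread detects a coroutine whose running time exceeds its preset running time, it can terminate that coroutine so that the other coroutines can continue to run. This prevents an anomaly in one coroutine from stalling all the others, reduces system latency as far as possible, and improves network quality.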
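Those skilled in the art will understand that the above division into functional units is only an example, chosen for convenience and brevity of description. In practice, the above functions may be assigned to different functional units as needed. That is, the internal structure of the apparatus may be divided into different functional units to perform all or some of the functions described above. For the working processes of the systems, apparatuses and units described above, refer to the corresponding processes in the foregoing method embodiments.
The devices and methods disclosed in the embodiments of this application may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and an actual implementation may divide them differently. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. The couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to implement the solution of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. An integrated unit may be implemented in hardware or as a software functional unit.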
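An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer-executable instructions configured to perform the method of any of the above embodiments.
The present disclosure further provides an electronic device, whose structure is shown schematically in FIG. 5. Referring to FIG. 5, the electronic device includes:
at least one processor 501 (FIG. 5 shows one processor 501 as an example) and a memory 502, and may further include a communications interface 503 and a bus 504. The processor 501, the communications interface 503 and the memory 502 communicate with one another through the bus 504. The communications interface 503 can be used for information transmission. The processor 501 can invoke logic instructions in the memory 502 to perform the method of the above embodiments.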
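In addition, the logic instructions in the memory 502 may be implemented as software functional units. When they are sold or used as an independent product, they may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 502 can store software programs and computer-executable programs, such as the program instructions/modules corresponding to the method in the embodiments of the present disclosure. The processor 501 runs the software programs, instructions and modules stored in the memory 502 to perform functional applications and data processing. In doing so, it implements the coroutine monitoring method of the above method embodiments.
The memory 502 may include a program storage area and a data storage area:
- The program storage area can store an operating system and the application program required for at least one function.
- The data storage area can store data created through use of the terminal device, among other things.

In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. All or part of the technical solution of this application may be embodied as a software product stored in a storage medium. The software product includes one or more instructions that cause a computer device (such as a personal computer, a server or a network device) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Industrial Applicability
The coroutine monitoring method and apparatus provided by the embodiments of the present disclosure reduce system latency and improve network quality.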

Claims (11)

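  1. A coroutine monitoring method, comprising:
    in a first process, monitoring, by a monitoring thread, at least one coroutine in the first process to determine whether the running time of any of the at least one coroutine exceeds a preset running time, wherein each of the at least one coroutine has recorded its own preset running time in advance; and
    if it is determined that the running time of a first coroutine among the at least one coroutine exceeds a first preset running time of the first coroutine, terminating the first coroutine by the monitoring thread.
  2. The method of claim 1, wherein before the monitoring, by the monitoring thread, of the at least one coroutine in the first process, the method further comprises:
    when the at least one coroutine runs, recording, by each of the at least one coroutine, the start time of its own run and its own preset running time.
  3. The method of claim 1 or 2, wherein after the terminating of the first coroutine, the method further comprises:
    initializing the first coroutine.
  4. The method of claim 3, wherein after the initializing of the first coroutine, the method further comprises:
    running the first coroutine again.
  5. The method of claim 4, wherein after the monitoring, by the monitoring thread, of the at least one coroutine in the first process, the method further comprises:
    if the running time of a second coroutine among the at least one coroutine when it finishes running does not exceed a second preset running time of the second coroutine, deregistering, by the second coroutine, the recorded start time of the second coroutine's run and the second preset running time.
  6. A coroutine monitoring apparatus, comprising:
    a monitoring module, configured to, in a first process, monitor at least one coroutine in the first process by means of a monitoring thread and determine whether the running time of any of the at least one coroutine exceeds a preset running time, wherein each of the at least one coroutine has recorded its own preset running time in advance; and
    a termination module, configured to, if it is determined that the running time of a first coroutine among the at least one coroutine exceeds a first preset running time of the first coroutine, terminate the first coroutine by means of the monitoring thread.
  7. The apparatus of claim 6, further comprising a recording module configured to:
    before the monitoring module monitors the at least one coroutine in the first process by means of the monitoring thread, cause each of the at least one coroutine, when it runs, to record the start time of its own run and its own preset running time.
  8. The apparatus of claim 6 or 7, further comprising an initialization module configured to:
    initialize the first coroutine after the termination module terminates the first coroutine.
  9. The apparatus of claim 8, further comprising an operation module configured to:
    run the first coroutine again after the initialization module initializes the first coroutine.
  10. The apparatus of claim 9, further comprising a deregistration module configured to:
    after the monitoring module monitors the at least one coroutine in the first process by means of the monitoring thread, if the running time of a second coroutine among the at least one coroutine when it finishes running does not exceed a second preset running time of the second coroutine, deregister, by means of the second coroutine, the recorded start time of the second coroutine's run and the second preset running time.
  11. A non-transitory computer-readable storage medium storing computer-executable instructions configured to perform the method of any one of claims 1 to 5.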
PCT/CN2016/101467 2015-10-15 2016-10-08 Coroutine monitoring method and apparatus WO2017063521A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510665928.6A CN106598801A (zh) 2015-10-15 2015-10-15 A coroutine monitoring method and apparatus
CN201510665928.6 2015-10-15

Publications (1)

Publication Number Publication Date
WO2017063521A1 true WO2017063521A1 (zh) 2017-04-20

Family

ID=58517814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101467 WO2017063521A1 (zh) 2015-10-15 2016-10-08 Coroutine monitoring method and apparatus

Country Status (2)

Country Link
CN (1) CN106598801A (zh)
WO (1) WO2017063521A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
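CN110955520A (zh) * 2019-11-05 2020-04-03 China Electric Power Research Institute Co., Ltd. Multi-task scheduling method and system for electric energy meters
CN112015551A (zh) * 2020-08-26 2020-12-01 BOE Technology Group Co., Ltd. Coroutine pool management method and apparatus
CN112905267A (zh) * 2019-12-03 2021-06-04 Alibaba Group Holding Ltd. Method, apparatus and device for a virtual machine to access a coroutine library
CN113641506A (zh) * 2021-07-02 2021-11-12 Dilu Technology Co., Ltd. Golang-based multi-coroutine synchronization barrier method and apparatus
CN117032844A (zh) * 2023-10-07 2023-11-10 Beijing Jidu Technology Co., Ltd. Coroutine link tracing apparatus and method, and smart vehicle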

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
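CN107329812B (zh) * 2017-06-09 2018-07-06 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for running a coroutine
CN107463438B (zh) * 2017-08-03 2020-09-08 Zhengzhou Yunhai Information Technology Co., Ltd. Information processing method, apparatus and system for multiple OpenStack environments
CN109257411B (zh) * 2018-07-31 2021-12-24 Ping An Technology (Shenzhen) Co., Ltd. Service processing method, call management system and computer device
CN110618868A (zh) * 2019-08-29 2019-12-27 Fanpu Digital Technology Co., Ltd. Method, apparatus and storage medium for writing data in batches
CN112181600B (zh) * 2020-10-21 2021-07-13 Gansu Bailong E-commerce Technology Co., Ltd. Cloud computing resource management method and system
CN116663868B (zh) * 2023-08-01 2024-04-19 Jiangmen Keneng Electronics Co., Ltd. PCB assembly progress monitoring system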

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
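CN1967487A (zh) * 2005-11-18 2007-05-23 SAP AG Cooperative scheduling using coroutines and threads
CN103473031A (zh) * 2013-01-18 2013-12-25 Long Jian Cooperative concurrent message bus, active component assembly model and component splitting method
CN103473032A (zh) * 2013-01-18 2013-12-25 Long Jian Independent active component and runnable active component assembly model and component splitting method
CN104142858A (zh) * 2013-11-29 2014-11-12 Tencent Technology (Shenzhen) Co., Ltd. Blocking task scheduling method and apparatus
US20150220352A1 (en) * 2014-02-05 2015-08-06 Travis T. Wilson Method and System for Executing Third-Party Agent Code in a Data Processing System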

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
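CN110955520A (zh) * 2019-11-05 2020-04-03 China Electric Power Research Institute Co., Ltd. Multi-task scheduling method and system for electric energy meters
CN112905267A (zh) * 2019-12-03 2021-06-04 Alibaba Group Holding Ltd. Method, apparatus and device for a virtual machine to access a coroutine library
CN112905267B (zh) * 2019-12-03 2024-05-10 Alibaba Group Holding Ltd. Method, apparatus and device for a virtual machine to access a coroutine library
CN112015551A (zh) * 2020-08-26 2020-12-01 BOE Technology Group Co., Ltd. Coroutine pool management method and apparatus
CN112015551B (zh) * 2020-08-26 2024-06-04 BOE Technology Group Co., Ltd. Coroutine pool management method and apparatus
CN113641506A (zh) * 2021-07-02 2021-11-12 Dilu Technology Co., Ltd. Golang-based multi-coroutine synchronization barrier method and apparatus
CN117032844A (zh) * 2023-10-07 2023-11-10 Beijing Jidu Technology Co., Ltd. Coroutine link tracing apparatus and method, and smart vehicle
CN117032844B (zh) * 2023-10-07 2024-01-09 Beijing Jidu Technology Co., Ltd. Coroutine link tracing apparatus and method, and smart vehicle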

Also Published As

Publication number Publication date
CN106598801A (zh) 2017-04-26

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16854900

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16854900

Country of ref document: EP

Kind code of ref document: A1