WO2018094739A1 - Method for processing a service, service node, control node, and distributed system - Google Patents
- Publication number
- WO2018094739A1 (PCT/CN2016/107504)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- service
- node
- feature information
- accessed
- information
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Definitions
- Embodiments of the present invention relate to the field of storage technologies, and in particular, to a method, a service node, a control node, and a distributed system for processing a service.
- To ensure high availability, distributed systems generally adopt multi-node redundancy; when a single node becomes abnormal, redundant nodes can immediately take over its services and ensure business continuity.
- A typical distributed system can support thousands of online nodes at the same time.
- Because the software running on all nodes is homogeneous, when a software defect causes a node abnormality (with a serious consequence such as a reset) and other online nodes take over its services, those nodes will with high probability trigger the same software defect and suffer the same abnormality.
- As a result, the nodes in the system become abnormal one after another, which ultimately leads to cluster redundancy failure and service interruption. Informally, such a problem is called a common cause failure that makes multiple nodes reset in succession.
- the present application provides a method for processing a service, a service node, a control node, and a distributed system, to improve system stability.
- The present application provides a method for processing a service, applied to a distributed system that includes a control node and at least two service nodes, where the at least two service nodes include a first service node and a second service node.
- The method includes: the control node receives, from the first service node, service feature information of a service in which an abnormality occurred; the control node generates a control command according to the service feature information and sends the control command to the second service node, where the control command instructs the second service node to refuse to process the service characterized by the service feature information.
- The second service node is a service node that has a service backup relationship with the first service node, and the control command includes the service feature information reported by the first service node.
- The control command may also be sent to a service node that has not failed and has a service backup relationship with the first service node, or to other service nodes. In this way, when a service node that has received the control command has a service to process, it can refuse the service according to the control command, which avoids repeated abnormalities with the same cause and improves the stability of the system.
- The control node stores abnormal service information and a management policy for managing abnormal services; the abnormal service information includes the service feature information of each service in which an abnormality has occurred and the number of times the abnormality has occurred.
- The process of generating the control command includes: the control node updates the abnormal service information according to the received service feature information, and generates the control command when it determines that the service feature information and the corresponding number of abnormalities in the updated abnormal service information meet the condition for performing control in the management policy.
- The management policy can be set according to actual requirements, so that the control node manages abnormal services in more diverse ways and has better adaptability.
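The count-based policy described above can be sketched as follows. This is a minimal illustration only: the class name `ControlNode`, the `threshold` field, the dictionary shape of the control command, and the feature string are all assumptions made for the example and do not come from the patent text.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ControlNode:
    # Hypothetical management policy: issue a control command once a
    # service has been reported abnormal at least `threshold` times.
    threshold: int = 2
    # Abnormal service information: feature info -> number of abnormalities.
    abnormal_counts: Counter = field(default_factory=Counter)

    def report_abnormal(self, feature: str):
        """Record one abnormality report; return a control command or None."""
        self.abnormal_counts[feature] += 1
        if self.abnormal_counts[feature] >= self.threshold:
            # The command carries the feature info of the service to refuse.
            return {"action": "refuse", "feature": feature}
        return None


control = ControlNode(threshold=2)
first_report = control.report_abnormal("lun-7/write")
second_report = control.report_abnormal("lun-7/write")
```

With a threshold of 2, the first report only updates the count, and the second report produces a command that would then be sent to the backup service nodes.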
- After receiving the control command, the second service node updates the locally stored service feature information of the service to be controlled according to the service feature information in the control command.
- Before accessing a service, the second service node obtains the service feature information of the service to be accessed, and determines to refuse to process the service to be accessed according to that information and the updated service feature information of the service to be controlled.
- The process in which the second service node obtains the service feature information of the service to be accessed before accessing the service includes: the second service node obtains the service feature information of the service to be accessed from a received service access request, where the service access request includes that information; or the second service node obtains the service feature information of the service to be accessed according to a management and maintenance task.
- The control command that the second service node receives from the control node may include a control duration. In that case, after receiving the control command, the second service node records the service feature information in the control command as the service feature information of the service to be controlled, starts a timer, and sets the timer duration to the control duration. Before accessing a service, the second service node obtains the service feature information of the service to be accessed and, before the timer expires, determines to refuse to process the service to be accessed according to that information and the recorded service feature information of the service to be controlled. After the timer expires, the service is no longer controlled.
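The timer-bounded control described above might look like the sketch below. It assumes the control duration is given in seconds and models the timer as an expiry timestamp rather than a real timer object; `ServiceNode`, `on_control_command`, `should_refuse`, and the injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import time


class ServiceNode:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # Feature info of each controlled service -> time at which control ends.
        self._controlled = {}

    def on_control_command(self, feature, duration_s):
        """Record the controlled service and start its (virtual) timer."""
        self._controlled[feature] = self._clock() + duration_s

    def should_refuse(self, feature):
        """Return True while the service is controlled and the timer is running."""
        expires_at = self._controlled.get(feature)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Timer expired: the service is no longer controlled.
            del self._controlled[feature]
            return False
        return True


# A fake clock keeps the example deterministic.
now = [0.0]
node = ServiceNode(clock=lambda: now[0])
node.on_control_command("lun-7/write", duration_s=60)
refused_before_expiry = node.should_refuse("lun-7/write")
now[0] = 61.0
refused_after_expiry = node.should_refuse("lun-7/write")
```

Passing the clock in as a parameter is only a testing convenience; a real node would more likely use a monotonic system clock or a scheduled timer.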
- Before the control node receives the service feature information of the service in which the abnormality occurred, the method further includes: the first service node starts a management and maintenance task, and determines to execute the task according to the service feature information of the service to be accessed by the task and the locally stored service feature information of the service to be controlled; when executing the task triggers an abnormality, the first service node obtains the service feature information of the service in which the abnormality occurred according to the ID of the thread executing the task, and sends the obtained service feature information to the control node.
- The method further includes: the first service node receives a service access request sent by the control node, and determines to execute the service access request according to the service feature information of the service to be accessed corresponding to the request and the locally stored service feature information of the service to be controlled; when executing the service access request triggers a service abnormality, the first service node obtains the service feature information of the service access request according to the ID of the thread executing the request, and sends the obtained service feature information to the control node.
- the service characteristic information of the service to be controlled may be stored locally on the first service node, or may be stored on the device that the first service node can query.
- the abnormal service information may be stored locally on the second service node or on the device that the second service node can query.
- Each time a service node processes a service, it determines, according to the service feature information of the service to be controlled, whether the received service access request or its own running task can be executed, which improves the stability of the entire system.
- An embodiment of the present invention provides another method for processing a service, applied to a distributed system that includes a control node and at least two service nodes, where the at least two service nodes include a first service node and a second service node, and the second service node has a service backup relationship with the first service node.
- The method includes: the second service node receives a control command sent by the control node and updates the locally stored service feature information of the service to be controlled according to the service feature information in the control command, where the control command is generated by the control node according to the service feature information, reported by the first service node, of a service in which an abnormality occurred, and includes the service feature information of the service to be controlled; before accessing a service, the second service node obtains the service feature information of the service to be accessed; and the second service node determines to refuse to process the service to be accessed according to that information and the updated service feature information of the service to be controlled.
- The second aspect focuses on the second service node: after receiving the control command, it controls service access according to the information carried in the control command, which prevents service nodes in the distributed system from failing one after another because of the same service access and improves the stability of the system.
- The process of determining to refuse to process the service to be accessed includes: when the second service node finds, in the service feature information of the service to be controlled, service feature information identical to that of the service to be accessed, it refuses to process the service to be accessed.
- The second service node obtaining the service feature information of the service to be accessed before accessing the service includes: the second service node obtains that information from a received service access request, where the service access request includes the service feature information of the service to be accessed; or the second service node obtains that information according to a management and maintenance task.
- The method further includes: the second service node starts a management and maintenance task, and determines to execute the task according to the service feature information of the service to be accessed by the task and the locally stored service feature information of the service to be controlled; when executing the task triggers an abnormality, the second service node either obtains the thread context of the thread executing the task according to the thread's ID and obtains the service feature information of the service in which the abnormality occurred from the thread context, or obtains the service context of the service being executed by the thread according to the thread's ID and obtains that information from the service context; the second service node then sends the obtained service feature information to the control node.
- The method further includes: the second service node receives a service access request sent by a client, and determines to execute the request according to the service feature information of the service to be accessed corresponding to the request and the locally stored service feature information of the service to be controlled; when executing the request triggers a service abnormality, the second service node either obtains the thread context of the thread executing the request according to the thread's ID and obtains the service feature information of the service in which the abnormality occurred from the thread context, or obtains the service context of the service access request according to the thread's ID and obtains that information from the service context; the second service node then sends the obtained service feature information to the control node.
- The second service node may also fail when it runs a management and maintenance task or processes a service access request from a client.
- In that case, the second service node likewise obtains the service feature information of the service in which the abnormality occurred and sends it to the control node, so that the control node and the other service nodes can perform control and avoid the same failure on other service nodes.
- an embodiment of the present invention provides another method for processing a service, where the method is applied to a distributed system, where the distributed system includes a first service node.
- The method includes: the first service node receives service feature information of a service in which an abnormality occurred on another node, and updates locally stored abnormal service information according to the received service feature information, where the abnormal service information includes the service feature information of services in which abnormalities occurred; before accessing a service, the first service node obtains the service feature information of the service to be accessed; and the first service node determines to refuse to process the service to be accessed according to that information and the updated abnormal service information.
- In the method provided by this embodiment, the first service node collects abnormal service information by itself and determines whether to refuse to process the service to be accessed according to that information, instead of according to a control command from a control node. In this way, even if no control node is deployed in the distributed system, abnormal services can still be controlled and the stability of the system improved.
- The process in which the first service node refuses to process the service to be accessed according to the service feature information of that service and the abnormal service information includes: when the first service node finds, in the abnormal service information, service feature information identical to that of the service to be accessed, it refuses to process the service to be accessed.
- The first service node stores a management policy for managing abnormal services, and the abnormal service information further includes the number of abnormalities corresponding to the service feature information of each service in which an abnormality occurred. In this case, the process in which the first service node refuses to process the service to be accessed includes: the first service node finds, in the abnormal service information, the service feature information identical to that of the service to be accessed and the corresponding number of abnormalities; when the first service node determines that the found service feature information and the corresponding number of abnormalities meet the condition for performing control in the management policy, it refuses to process the service to be accessed.
- the first service node can also set a timer for the abnormal service according to the management and control policy. Before the timer expires, the service to be accessed is refused according to the service characteristic information of the to-be-accessed service and the abnormal service information; or, after the timer expires, it is not controlled.
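For the variant without a control node, a service node could keep the abnormal service information itself and apply the policy locally. The sketch below covers only the count-based condition and omits the timer; `PeerServiceNode`, `min_failures`, and the returned status strings are hypothetical names chosen for the example.

```python
from collections import Counter


class PeerServiceNode:
    """Service node that tracks abnormal services itself (no control node)."""

    def __init__(self, min_failures=2):
        # Hypothetical policy condition: refuse after this many abnormalities.
        self.min_failures = min_failures
        # Abnormal service information: feature info -> abnormality count.
        self.abnormal = Counter()

    def on_peer_abnormal(self, feature):
        """Update local abnormal service information from another node's report."""
        self.abnormal[feature] += 1

    def handle(self, feature):
        """Check the policy before processing the service to be accessed."""
        if self.abnormal[feature] >= self.min_failures:
            return "refused"
        return "processed"


peer = PeerServiceNode(min_failures=2)
peer.on_peer_abnormal("lun-7/write")
after_one_report = peer.handle("lun-7/write")
peer.on_peer_abnormal("lun-7/write")
after_two_reports = peer.handle("lun-7/write")
```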
- The first service node obtaining the service feature information of the service to be accessed before accessing the service includes: the first service node obtains that information from a received service access request, where the service access request includes the service feature information of the service to be accessed; or the first service node obtains, according to a management and maintenance task, the service feature information of the service to be accessed by the task.
- the distributed system further includes a second service node, where the second service node is a service node having a service backup relationship with the first service node.
- The method further includes: the second service node starts a management and maintenance task, and determines to execute the task according to the service feature information of the service to be accessed by the task and the locally stored abnormal service information; when executing the task triggers an abnormality, the second service node obtains the service feature information of the service in which the abnormality occurred according to the ID of the thread executing the task, and sends the obtained service feature information to the first service node.
- Before the first service node receives the service feature information of the service in which the abnormality occurred, the method further includes: the second service node receives a service access request sent by a client, and determines to execute the request according to the service feature information of the service to be accessed and the locally stored abnormal service information; when executing the request triggers a service abnormality, the second service node obtains the service feature information of the service in which the abnormality occurred according to the ID of the thread executing the request, and sends the obtained service feature information to the first service node.
- Obtaining the service feature information of the service in which the abnormality occurred according to the ID of the thread executing the management and maintenance task may include: obtaining the thread context of the thread according to the thread's ID and obtaining the service feature information from the thread context; or obtaining the service context of the service being executed by the thread according to the thread's ID and obtaining the service feature information from the service context.
- Obtaining the service feature information of the service access request according to the ID of the thread executing the request may include: obtaining the thread context of the thread according to the thread's ID and obtaining the service feature information of the service in which the abnormality occurred from the thread context; or obtaining the service context of the service access request according to the thread's ID and obtaining the service feature information from the service context.
- An embodiment of the present invention provides a service node, where the service node is applied to a distributed system in which a control node is deployed.
- The service node includes: a receiving unit, configured to receive a control command sent by the control node, where the control command is generated by the control node according to the service feature information, reported by another service node, of a service in which an abnormality occurred, and includes the service feature information of the service to be controlled; an update unit, configured to update the locally stored service feature information of the service to be controlled according to the received service feature information; an obtaining unit, configured to obtain the service feature information of a first to-be-accessed service before the service is accessed; and a processing unit, configured to determine to refuse to process the first to-be-accessed service according to the service feature information of the first to-be-accessed service and the updated service feature information of the service to be controlled.
- Before the service node processes a service, it determines whether to refuse the service according to whether the service is a controlled service designated by the control node. This prevents service nodes in the distributed system from triggering abnormalities one after another because of the same service access, and improves the stability of the distributed system.
- The obtaining unit is configured to obtain the service feature information of the first to-be-accessed service from a received first service access request, where the first service access request includes that information; or the obtaining unit is configured to obtain the service feature information of the first to-be-accessed service according to a management and maintenance task.
- The service node further includes a sending unit. The processing unit is further configured to start a management and maintenance task, determine to execute the task according to the service feature information of a second to-be-accessed service accessed by the task and the service feature information of the service to be controlled, and, when executing the task triggers an abnormality, obtain the thread context of the thread executing the task according to the thread's ID and obtain the service feature information of the service in which the abnormality occurred from the thread context. The sending unit is configured to send the obtained service feature information.
- The service node further includes a sending unit. The receiving unit is further configured to receive a service access request that includes the service feature information of a service to be accessed. The processing unit is further configured to determine to execute the service access request according to the service feature information of the service to be accessed and the abnormal service information, and, when executing the request triggers a service abnormality, either obtain the thread context of the thread executing the request according to the thread's ID and obtain the service feature information of the service in which the abnormality occurred from the thread context, or obtain the service context of the service access request according to the thread's ID and obtain that information from the service context. The sending unit is configured to send the obtained service feature information.
- The service node further includes a storage unit configured to store the service feature information of the service to be controlled; the processing unit is configured to refuse to process the service to be accessed when it finds, in the service feature information of the service to be controlled, service feature information identical to that of the service to be accessed.
- an embodiment of the present invention provides another service node, where the service node also includes a receiving unit, an updating unit, an obtaining unit, and a processing unit.
- The receiving unit is configured to receive service feature information of a service in which an abnormality occurred on another service node; the update unit is configured to update the abnormal service information according to the received service feature information, where the abnormal service information includes the service feature information of services in which abnormalities occurred; the obtaining unit is configured to obtain the service feature information of a first to-be-accessed service before the service is accessed; and the processing unit is configured to refuse to process the first to-be-accessed service according to the service feature information of the first to-be-accessed service and the recorded abnormal service information.
- The service node collects information about abnormalities on other service nodes in the system and, before processing a service, determines whether to refuse it according to the collected abnormal service information. This prevents service nodes in the distributed system from triggering abnormalities one after another because of the same service access, and improves the stability of the distributed system.
- The obtaining unit is configured to obtain the service feature information of the first to-be-accessed service from a received first service access request, where the first service access request includes that information; or the obtaining unit is configured to obtain the service feature information of the first to-be-accessed service according to a management and maintenance task.
- The service node further includes a sending unit. The processing unit is further configured to start a management and maintenance task, determine to execute the task according to the service feature information of a second to-be-accessed service accessed by the task and the abnormal service information, and, when executing the task triggers an abnormality, either obtain the thread context of the thread executing the task according to the thread's ID and obtain the service feature information of the service in which the abnormality occurred from the thread context, or obtain the service context of the service being executed by the thread according to the thread's ID and obtain that information from the service context. The sending unit is configured to send the obtained service feature information.
- The service node further includes a sending unit. The receiving unit is further configured to receive a service access request that includes the service feature information of the service to be accessed. The processing unit is further configured to determine to execute the service access request according to the service feature information of the service to be accessed and the abnormal service information, and, when executing the request triggers a service abnormality, either obtain the thread context of the thread executing the request according to the thread's ID and obtain the service feature information of the service in which the abnormality occurred from the thread context, or obtain the service context of the service access request according to the thread's ID and obtain that information from the service context. The sending unit is configured to send the obtained service feature information.
- The service node further includes a storage unit configured to record the abnormal service information; the processing unit is configured to refuse to process the first to-be-accessed service when it finds, in the abnormal service information, service feature information identical to that of the first to-be-accessed service.
- The service node further includes a storage unit configured to record the abnormal service information and to store a management policy for managing abnormal services, where the abnormal service information further includes the number of abnormalities corresponding to the service feature information of each service in which an abnormality occurred; the processing unit is configured to find, in the abnormal service information, the service feature information identical to that of the first to-be-accessed service and the corresponding number of abnormalities, and to refuse to process the first to-be-accessed service when it determines that the found service feature information and the corresponding number of abnormalities meet the condition for performing control in the management policy.
- An embodiment of the present invention provides a distributed system that includes a control node and at least two service nodes, where the at least two service nodes include a first service node and a second service node. The control node is configured to receive, from the first service node, service feature information of a service in which an abnormality occurred, generate a control command according to the service feature information, and send the control command to the second service node, where the control command includes the service feature information and instructs the second service node to refuse to process the service characterized by the service feature information. The second service node is configured to receive the control command, update the locally stored service feature information of the service to be controlled according to the service feature information in the control command, obtain the service feature information of the service to be accessed before accessing a service, and refuse to process the service to be accessed according to that information and the updated service feature information of the service to be controlled.
- The second service node is a service node that has a service backup relationship with the first service node.
- The second service node is configured to obtain the service feature information of the service to be accessed from a received service access request, where the service access request includes that information, or to obtain, according to a management and maintenance task, the service feature information of the service to be accessed by the task.
- the first service node is configured to start a management and maintenance task, determine, according to the stored abnormal service information, to perform the management and maintenance task, and, when an abnormality occurs during execution of the task, obtain the service feature information of the service in which the abnormality occurred according to the ID of the thread that performs the task, and send the obtained service feature information to the control node.
- the first service node is configured to receive a service access request sent by the control node, determine, according to the locally stored service feature information of the service to be controlled, to execute the service access request, and, when an abnormality occurs while the service access request is being executed, obtain the service feature information of the service access request according to the ID of the thread that executes the service access request, and send the obtained service feature information to the control node.
- an embodiment of the present invention provides a control node, including a communication interface, a processor, and a memory, where the communication interface, the processor, and the memory are connected by a bus, the communication interface is used for communicating with external devices, instructions are stored in the memory, and the processor executes the instructions in the memory to perform the steps performed by the control node in the first aspect above.
- an embodiment of the present invention further provides a service node, including a communication interface, a processor, and a memory, where the communication interface, the processor, and the memory are connected by a bus, the communication interface is used for communicating with external devices, instructions are stored in the memory, and the processor executes the instructions in the memory to perform the steps of the second and third aspects above.
- an embodiment of the present invention provides a program product, the program product comprising instructions, when the program product is executed by a computer, causing the computer to perform the method of any one of the first aspect to the third aspect.
- the control node or the service node collects the service feature information of services in which an abnormality occurs in the distributed system and, based on the collected information, decides whether to control the abnormal service, thereby improving the stability of the distributed system.
- FIG. 1 is a schematic structural diagram of a distributed system
- FIG. 2 is a schematic flow chart of a method for processing a service
- FIG. 3 is a schematic flow chart of a method for processing a service
- FIG. 4 is a schematic flow chart of a method for processing a service
- FIG. 5 is a schematic flowchart of a method for processing a service
- FIG. 6 is a schematic structural diagram of a distributed system
- FIG. 7 is a schematic structural diagram of a service node
- FIG. 9 is a schematic structural diagram of a control node or a service node.
- the distributed system 100 includes a plurality of clients (1, 2, ... N), a plurality of service nodes (1, 2, ... M), and a control node.
- M and N are natural numbers greater than or equal to 2.
- the control node may be a metadata server; the service node may be a storage node or a computing node; and the client may be various application servers, file servers, or terminal users.
- a distributed system may also include more than one control node. When there is more than one control node in a distributed system, the control nodes may be configured as active and standby. For example, one of them is set as the primary control node, and the rest are set as standby control nodes.
- the control node mentioned in the embodiment of the present invention is a control node that is processing a service, and may be a primary control node or a standby control node that takes over the primary control node.
- the node mentioned in the embodiment of the present invention may be a server in a specific application scenario.
- the control node may be a control server
- the storage node may be a storage server
- the computing node may be an authentication server, which is not limited in this application.
- when a service abnormality occurs in a service node in the distributed system (hereinafter referred to as the first service node), the service node reports the service feature information of the abnormal service to the control node.
- the control node generates a control command according to the received service feature information, and sends the generated control command to the service node in the distributed system that has a service backup relationship with the service node where the abnormality occurred (hereinafter referred to as the second service node).
- there may be more than one second service node, and more than one service node in the distributed system may experience an abnormality.
- the control command may also be sent to other service nodes in the system, which is not limited in the embodiment of the present invention.
- the foregoing control command is used to indicate that the service node refuses to process the service characterized by the service characteristic information, and the foregoing control command includes the service feature information of the service in which the abnormality occurs.
- the embodiment of the present invention provides a method for processing a service, as shown in FIG. 2, and the specific process includes:
- the control node receives the service access request 1 sent by the client 1, and sends the service access request 1 to the service node 1.
- the service access request 1 includes the service feature information of the service to be accessed. Alternatively, in some distributed systems, the service node may also receive the service access request 1 directly from the client 1.
- for example, the service node is a storage node, and the service feature information carried in the service access request may include a service object ID, an operation address range, and an operation code.
- the operation code can be used to indicate operations such as read operations, write operations, or file system services.
- alternatively, the service node may be a storage node, and the service feature information carried in the service access request may be the operation (put/get), the key, and the value.
- alternatively, the service node may be a computing node, and the service feature information carried in the service access request may be an interface name and interface parameters (the number of which is not fixed). This application does not limit the service node.
- the service may be writing data to a certain address range of the service node, or reading data from a certain address range, which is not limited herein.
- after receiving the service access request 1, the service node 1 determines to execute the service access request 1 according to the service feature information of the service to be accessed and the service feature information of the service to be controlled.
- the service feature information of the service to be controlled may be stored locally or on a device accessible by the service node 1, which is not limited in the embodiment of the invention.
- if the same service feature information can be found in the recorded service feature information of the service to be controlled, the service accessed by the service access request 1 caused an abnormality during a previous service access and needs to be controlled. Conversely, if the same service feature information is not found in the locally recorded service feature information of the service to be controlled, either the service accessed by the service access request 1 has not caused an access abnormality during previous service accesses and therefore does not need to be controlled, or it has triggered an abnormality during a previous service access but does not need to be controlled.
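- the lookup described above can be sketched as a simple set-membership test. This is a minimal illustration only: the `ServiceFeature` type, its field names, and the `should_execute` helper are assumptions introduced here and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class ServiceFeature:
    """Hypothetical layout of the service feature information of a storage-node request."""
    object_id: str
    address_range: Tuple[int, int]  # (start, end) of the operated address range
    opcode: str                     # for example "read" or "write"


def should_execute(request: ServiceFeature, controlled: Set[ServiceFeature]) -> bool:
    """Return True when no identical record exists among the services to be controlled."""
    return request not in controlled


# Service feature information of services to be controlled, as recorded by the node.
controlled = {ServiceFeature("lun-7", (0, 4096), "write")}

print(should_execute(ServiceFeature("lun-7", (0, 4096), "write"), controlled))  # False
print(should_execute(ServiceFeature("lun-7", (0, 4096), "read"), controlled))   # True
```

- in this sketch two requests match only when every field is equal; a real system might instead match on overlapping address ranges, which the text does not specify.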
- the service node 1 invokes a thread to execute the received service access request 1. In the process of executing the service access request by the thread, the thread generates an abnormality.
- the service node 1 obtains service characteristic information of the service in which the abnormality occurs according to the ID of the thread, where the service feature information includes a service object ID of the service in which the abnormality occurs, an address range of the operation, and an operation code.
- the service feature information of the service in which the abnormality occurred can be obtained in either of two ways.
- Manner 1: the thread in the service node 1 acquires, according to the ID of the thread, the service context of the service access request 1 that the thread is executing, and obtains the service feature information of the service in which the abnormality occurred from that service context.
- Manner 2: the thread in the service node 1 obtains the service feature information of the service in which the abnormality occurred from the thread context corresponding to the ID of the thread.
- the service node 1 sends the service feature information of the service in which the abnormality occurred to the control node.
- the control node generates a control instruction according to the service characteristic information of the received abnormal service, where the control instruction includes service characteristic information of the abnormal service.
- the control command is used to indicate how one or more service nodes in the distributed system respond when receiving the same request as the service access request 1 described above, and the control command carries the service feature information of the service to be controlled. In this step, the control node generates a control command as soon as it receives a service abnormality reported by a service node.
- alternatively, the control node may also store abnormal service information and a control policy for managing service abnormalities.
- the abnormal service information includes the service feature information of services in which an abnormality has occurred and the number of times the abnormality has occurred.
- the control policy describes under which conditions which service nodes need to refuse to process a certain service.
- for example, the control policy may be: when the number of abnormalities triggered by a service access request exceeds a preset threshold, generate a control command instructing the service nodes in the distributed system that have a backup relationship with the service node where the abnormality occurred to refuse to process requests identical to that service access request; when the number of abnormalities triggered by a service access request does not exceed the preset threshold, no control is applied and no control command is generated.
- in this case, the specific process of generating the control command may be: the control node updates the abnormal service information according to the received service feature information of the abnormal service, and generates a control command when it determines that the service feature information and the corresponding number of occurrences of the abnormality meet the condition for performing control in the control policy.
- the abnormal business information and the control strategy described above can be set according to actual needs. Take the management and control strategy as an example. On the one hand, configure different management and control strategies for different types of operations. In other words, different management policies can be configured for operation types such as write operations, read operations, and file system services. If different management policies are configured for different types of operations, when a service abnormality occurs, a management policy corresponding to the operation type represented by the operation code is found according to the operation code in the service characteristic information. On the other hand, the content of the management strategy can also be configured according to actual needs.
- for example, the control policy may include: if the number of abnormalities triggered by a service access request exceeds a preset threshold, send a control command to the other service nodes in the distributed system in which no abnormality has occurred, instructing them to refuse to process requests identical to that service access request; if the number of abnormalities triggered by a service access request is greater than 1 but does not exceed the threshold, a temporary control command may be sent.
- a temporary control command instructs the service nodes in the distributed system to refuse to process requests identical to the service access request within a preset time period.
- in that case, the issued control command also includes the control duration.
- the control policy can also specify to which service nodes in the distributed system the control command or the temporary control command is sent.
- for example, the control command or the temporary control command is sent to the service nodes that have a service backup relationship with the failed service node.
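- one way to picture the policy described above is a per-feature counter on the control node that yields no command, a temporary command, or an unlimited command depending on the count. The class and field names, the threshold value, and the duration are illustrative assumptions, not values given in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Optional


@dataclass(frozen=True)
class ControlCommand:
    feature: Hashable            # service feature information of the service to be controlled
    duration_s: Optional[float]  # control duration; None means no time limit


class ControlNode:
    """Hypothetical control node holding abnormal service information and a control policy."""

    def __init__(self, threshold: int, temp_duration_s: float):
        self.threshold = threshold
        self.temp_duration_s = temp_duration_s
        self.abnormal_counts: Counter = Counter()  # feature -> number of abnormalities

    def on_abnormality(self, feature: Hashable) -> Optional[ControlCommand]:
        """Update the abnormal service information and apply the control policy."""
        self.abnormal_counts[feature] += 1
        count = self.abnormal_counts[feature]
        if count > self.threshold:
            return ControlCommand(feature, None)                  # control command
        if count > 1:
            return ControlCommand(feature, self.temp_duration_s)  # temporary control command
        return None                                               # no control yet


node = ControlNode(threshold=3, temp_duration_s=60.0)
results = [node.on_abnormality(("lun-7", "write")) for _ in range(4)]
```

- with a threshold of 3, the first report produces no command, the second and third produce temporary commands, and the fourth produces a command without a time limit. Which service nodes receive the command (for example, only backup nodes) is a separate policy choice not modeled here.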
- when the service access request mentioned above triggers an abnormality on the service node 1, the same service access request may already have triggered an abnormality on another service node, and that abnormality may already have been reported to the control node. That is, more than one service node in a distributed system may trigger an abnormality due to the same service access request.
- the control node sends the generated control command to the service node in the distributed system that has a service backup relationship with the service node where the abnormality occurs.
- alternatively, the control node may send the obtained service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred.
- these service nodes in which no abnormality has occurred include the service nodes that have a service backup relationship with the service node where the abnormality occurred.
- in the following, only the service node 2 is taken as an example to describe the processing of a service node after it receives the control command. It can be understood that other service nodes that receive the control command perform the same processing as the service node 2.
- the control node may also send a prompt message to the client to prompt the client user to intervene.
- after receiving the control command, the service node 2 records the service feature information in the control command as the service feature information of the service to be controlled.
- when temporary control is determined according to the control policy, the control command also carries the control duration.
- in that case, when the service node 2 receives the control command, it starts a timer and sets the duration of the timer to the control duration.
- the timer can be set for a specific service.
- the client 1 sends a service access request 2 to the service node 2, and the service access request 2 includes service characteristic information of the service requested by the client.
- the service access request 2 may also be sent by another client.
- after receiving the service access request 2, the service node 2 determines to reject the received service access request 2 according to the service feature information in the received service access request 2 and the service feature information of the service to be controlled.
- the specific judgment method is the same as that in step 202, except that in this step the service node 2 finds the same service feature information in the service feature information of the service to be controlled. That is to say, the service access request 2 received by the service node 2 needs to be controlled. Of course, if the service node 2 does not find the same service feature information in the recorded service feature information of the service to be controlled, the service node 2 can continue to process the service access request 2.
- the specific process of processing the service access request 2 is the same as in the prior art, and is not described here.
- when the service node 2 is provided with a timer, if the service access request 2 is received before the timer expires, the service node 2 determines that the service access request 2 needs to be controlled. If the service access request 2 is received after the timer expires, the service node 2 does not apply control; the timer is terminated, and the record corresponding to the service feature information in the service access request 2 is deleted from the service feature information of the service to be controlled.
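- the timer behaviour can be approximated by storing an expiry time next to each controlled record and checking it lazily when a request arrives. The sketch below takes that approach instead of running a real timer; the class name, its methods, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Hashable, Optional


class ControlledServices:
    """Hypothetical per-node table of services to be controlled, with optional expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expiry: Dict[Hashable, Optional[float]] = {}  # feature -> deadline or None

    def record(self, feature: Hashable, duration_s: Optional[float] = None) -> None:
        """Record a feature from a control command; duration_s is the control duration."""
        self._expiry[feature] = None if duration_s is None else self._clock() + duration_s

    def is_controlled(self, feature: Hashable) -> bool:
        """Return True if a request with this feature must be rejected right now."""
        if feature not in self._expiry:
            return False
        deadline = self._expiry[feature]
        if deadline is not None and self._clock() >= deadline:
            # The control duration has elapsed: delete the record and stop controlling.
            del self._expiry[feature]
            return False
        return True


# A fake clock makes the expiry behaviour easy to follow.
now = [0.0]
table = ControlledServices(clock=lambda: now[0])
table.record(("lun-7", "write"), duration_s=10.0)
before_expiry = table.is_controlled(("lun-7", "write"))
now[0] = 10.0
after_expiry = table.is_controlled(("lun-7", "write"))
```

- here `before_expiry` is true and `after_expiry` is false, and the record is removed on the first check after the deadline, mirroring the deletion step described above.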
- the service node 2 returns a response message rejecting the access to the client 1.
- in this embodiment, the control node collects the abnormalities that occur in the system.
- when an abnormality occurs in a service node, the control node may send control commands to the service nodes that have a service backup relationship with that service node.
- these service nodes can then reject the same service access request, which prevents the same service access request from causing further abnormalities and improves the stability of the system.
- an embodiment of the present invention further provides a method for processing a service access request.
- in this method, the abnormality is triggered while the service node 1 performs a self-running task.
- the method comprises the following steps:
- the method for determining whether the service to be accessed needs to be controlled is the same as that in step 202.
- the self-running task may be a task performed to ensure the normal operation of the service node, for example, a periodic data verification task or a periodic hardware status inspection task.
- the service feature information may include a service object ID and a task ID. Sometimes, the service feature information may also include an operation address range. Among them, the task ID is the unique identifier of the task being executed.
- the above self-running tasks can also be replaced with other management and maintenance tasks, such as manually triggered configuration, control, maintenance commands, and the like.
- the other task processing is the same as the self-running task, and will not be described here.
- the service node 1 triggers an exception when executing the task 1, and obtains the service feature information of the task in which the abnormality occurred from the thread context of the thread executing the task 1, where the service feature information includes the task object ID and the task ID of the service that caused the abnormality.
- this step uses Manner 2; alternatively, it can also be implemented with Manner 1, that is, the service feature information of the abnormal service is obtained from the service context of the service that the thread is executing.
- the service node 1 reports the obtained service feature information to the control node.
- the control node updates the stored abnormal service information according to the received service feature information of the abnormal service, and generates a control command according to the updated abnormal service information and a pre-configured control policy for managing service abnormalities.
- the control command includes the service feature information of the abnormal service.
- as in step 206, the abnormal service information and the control policy for managing service abnormalities are stored locally in the control node.
- for the control command and the specific processing, refer to the embodiment of FIG. 2; details are not described herein again.
- alternatively, as in step 206, the control node may generate a control command as soon as it receives a service abnormality reported by a service node.
- when the service node 1 triggers an exception while running a certain self-running task, the same self-running task may already have triggered an exception on another service node, and that exception may already have been reported to the control node. That is, more than one service node in a distributed system may trigger an exception because of the same self-running task.
- the control node may also send a prompt message to the client to prompt the client user to intervene.
- the control node sends the generated control command to the service node in the distributed system that has a service backup relationship with the service node where the abnormality occurs.
- alternatively, the control node may send the obtained service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred.
- these service nodes in which no abnormality has occurred include the service nodes that have a service backup relationship with the service node where the abnormality occurred.
- in the following, only the service node 2 is taken as an example to describe the processing of a service node after it receives the control command.
- after receiving the control command, the service node 2 records the service feature information in the control command as the service feature information of the service to be controlled.
- when temporary control is determined according to the control policy, the control command also carries the control duration.
- in that case, when the service node 2 receives the control command, it starts a timer and sets the duration of the timer to the control duration.
- when the service node 2 starts the self-running task 2, the service node 2 obtains the service feature information of the to-be-accessed service accessed by the self-running task 2, and determines, according to the service feature information of the to-be-accessed service and the service feature information of the service to be controlled, to refuse to process the to-be-accessed service.
- the method for determining whether the to-be-accessed service needs to be controlled is the same as that in step 202.
- in this step, the service node 2 finds the same service feature information in the recorded service feature information of the service to be controlled. That is to say, the service accessed by the self-running task 2 needs to be controlled.
- of course, if the service node 2 does not find the same service feature information, the service node 2 can continue to execute the self-running task 2.
- the specific process of executing a self-running task is no different from the prior art and is not described here.
- when the service node 2 is provided with a timer, if the service accessed by the self-running task 2 is determined before the timer expires, the service node 2 determines that the to-be-accessed service needs to be controlled. If it is determined after the timer expires, the service node 2 does not apply control; the timer is terminated, and the record corresponding to the service feature information of the to-be-accessed service is deleted from the service feature information of the service to be controlled.
- in this embodiment, the control node collects the abnormalities that occur when service nodes perform management and maintenance tasks.
- the control node sends a control command to the service nodes in the distributed system that have a service backup relationship with the service node where the abnormality occurred.
- the service nodes that receive the control command can refuse to perform the task, thereby improving system stability.
- when a predetermined condition is met, the control node deletes, from the locally saved abnormal service information, the records related to the service feature information defined by the predetermined condition; correspondingly, each service node deletes, from the locally stored service feature information of the service to be controlled, the records related to the service feature information defined by the predetermined condition.
- the predetermined condition can be set according to actual needs, for example, user intervention or recovery of the abnormal service. With this setup, the control can be reset, which makes it more flexible and adaptable.
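- the reset step amounts to deleting every record that matches the service feature information named by the predetermined condition. A minimal sketch follows, in which the `clear_matching` helper, the tuple layout of a feature, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, Hashable


def clear_matching(records: Dict[Hashable, int],
                   matches: Callable[[Hashable], bool]) -> int:
    """Delete every record whose feature satisfies `matches`; return how many were removed."""
    to_delete = [feature for feature in records if matches(feature)]
    for feature in to_delete:
        del records[feature]
    return len(to_delete)


# Example: the abnormal service on object "lun-7" has recovered, so its records are cleared.
records = {("lun-7", "write"): 4, ("lun-7", "read"): 1, ("lun-9", "write"): 2}
removed = clear_matching(records, lambda feature: feature[0] == "lun-7")
```

- the same helper could be applied both to the abnormal service information on the control node and to the table of services to be controlled on each service node.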
- in a SAN or NAS distributed system, an application server or a file server is connected to multiple storage nodes through a network. When the application server or file server needs to write data to or read data from a storage node, it acts as the client that initiates the service access request, and the service access request is sent to the storage node through the network.
- an embodiment of the present invention provides a method for processing a service, which is applied to a distributed system without a control node.
- the method comprises the following steps:
- the client 1 sends a service access request 1 to the service node 1, where the service access request 1 includes service feature information of the service to be accessed.
- the service feature information is different in different distributed systems.
- the service feature information herein may include a service object ID, an operation address range, and an operation code.
- the opcode can be used to indicate operations such as read operations, write operations, or file system services.
- after receiving the service access request 1, the service node 1 determines, according to the service feature information carried in the service access request 1 and the abnormal service information, to execute the service accessed by the service access request 1.
- the abnormal business information here can be stored locally or on a device accessible by the service node.
- the abnormal service information may include service characteristic information of the service in which the abnormality has occurred, or the service characteristic information of the service in which the abnormality has occurred and the corresponding number of occurrences of the abnormality.
- the setting of the management strategy can also be based on actual needs.
- in one case, the abnormal service information includes only the service feature information of services in which an abnormality has occurred. If no service feature information identical to that carried in the access request is found in the locally recorded abnormal service information, the service accessed by the service access request has not caused an access abnormality during previous service accesses and does not need to be controlled. Conversely, if service feature information identical to that carried in the access request is found in the recorded abnormal service information, the service accessed by the service access request caused an abnormality during a previous service access and needs to be controlled.
- in another case, the abnormal service information includes the service feature information of services in which an abnormality has occurred and the corresponding number of times the abnormality has occurred.
- suppose the control policy is: when the number of abnormalities triggered by a service access request exceeds a preset threshold, control is required; when the number of abnormalities triggered by a service access request does not exceed the preset threshold, no control is applied. Then finding the same service feature information in the locally recorded service feature information of services that have triggered an abnormality does not by itself indicate that the service accessed by the service access request needs to be controlled; whether to apply control must be further determined according to the control policy.
- if the service node 1 determines that the number of abnormalities triggered by the received service access request does not exceed the preset threshold, it determines to execute the service accessed by the service access request. Although another service node may have experienced an abnormality due to the same service access request and notified the service node 1, so that the service node 1 stores the service feature information, this does not mean that the service access request is subject to control. Conversely, if the service node 1 determines that the number of abnormalities triggered by the received service access request exceeds the preset threshold, it determines to refuse to execute the service accessed by the service access request.
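- the two cases above differ only in whether a count threshold is applied. The following sketch expresses both in one function; the `needs_control` name and the convention that `threshold=None` selects the first case are assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional


def needs_control(feature: Hashable,
                  abnormal_counts: Dict[Hashable, int],
                  threshold: Optional[int] = None) -> bool:
    """Decide locally whether a request with this feature must be refused."""
    count = abnormal_counts.get(feature, 0)
    if threshold is None:
        # First case: any recorded abnormality for this feature means control.
        return count > 0
    # Second case: control only when the number of abnormalities exceeds the threshold.
    return count > threshold


abnormal_counts = {("lun-7", "write"): 2}
print(needs_control(("lun-7", "write"), abnormal_counts))               # True
print(needs_control(("lun-7", "write"), abnormal_counts, threshold=3))  # False
print(needs_control(("lun-9", "read"), abnormal_counts, threshold=3))   # False
```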
- the service node 1 invokes a thread to execute a service access request 1.
- while the thread executes the service access request 1, an abnormality occurs in the thread.
- the service node 1 obtains service characteristic information of the service in which the abnormality occurs according to the ID of the thread, where the service feature information includes a service object ID of the service in which the abnormality occurs, an address range of the operation, and an operation code.
- the service node 1 sends the obtained service feature information of the abnormal service to the service node in the distributed system that has a service backup relationship with the service node where the abnormality occurs.
- alternatively, the service node 1 may send the determined service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred.
- these service nodes in which no abnormality has occurred include the service nodes that have a service backup relationship with the service node where the abnormality occurred.
- in the following, the service node 2 is taken as an example to describe the processing of a service node after it receives the service feature information of the service in which the abnormality occurred. It can be understood that other service nodes that receive this service feature information perform the same processing as the service node 2.
- the service node 1 may also send a prompt message to the client to prompt the client user to intervene.
- the service node 2 updates the recorded abnormal service information according to the service characteristic information of the received abnormal service.
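- without a control node, the reporting path is node to node. The sketch below models it with in-process method calls standing in for network messages; the `ServiceNode` class, its attributes, and the choice to notify only backup peers are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, List


class ServiceNode:
    """Hypothetical service node that exchanges abnormality reports with its backup peers."""

    def __init__(self, name: str):
        self.name = name
        self.backup_peers: List["ServiceNode"] = []
        self.abnormal_counts: Counter = Counter()  # recorded abnormal service information

    def report_abnormality(self, feature: Hashable) -> None:
        """Called on the node where the abnormality occurred: notify each backup peer."""
        for peer in self.backup_peers:
            peer.on_peer_abnormality(feature)

    def on_peer_abnormality(self, feature: Hashable) -> None:
        """Update the recorded abnormal service information with the received feature."""
        self.abnormal_counts[feature] += 1


node1, node2, node3 = ServiceNode("1"), ServiceNode("2"), ServiceNode("3")
node1.backup_peers = [node2]  # only node 2 backs up node 1 in this example
node1.report_abnormality(("lun-7", "write"))
```

- after the call, node 2 holds one record for the feature and node 3 holds none. Whether the reporting node also records the abnormality locally is not stated in the text and is left out of the sketch.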
- the abnormal service information can be the same as in the previous step 402 or other embodiments.
- the client 1 sends a service access request 2 and is allocated to the service node 2 by the control node.
- the service access request 2 may also be sent by another client.
- after receiving the service access request 2, the service node 2 determines, according to the service feature information carried in the service access request 2 and the recorded abnormal service information, to refuse to process the service accessed by the service access request 2.
- in this step, the service node 2 determines that the service access request 2 needs to be controlled. It can be understood that service nodes other than the service node 1 may also have triggered an abnormality due to the same service access request as the service access request 2, and may already have notified the service node 2. Therefore, when the service node 2 receives the service access request 2, either the abnormal service information recorded on the service node 2 includes the service feature information in the service access request 2, or the service node 2 determines, according to the service feature information in the service access request 2 and the corresponding number of abnormalities, that the service access request 2 meets the condition for performing control in the control policy, thereby triggering the control. For specific control policies, refer to the description of the above embodiments.
- the service node 2 returns a response message rejecting the access to the client 1.
- each service node collects the abnormal situations of the other service nodes in the system; when one or more service nodes become abnormal due to an external service access request, the other service nodes that have not become abnormal control subsequently received service access requests according to their own configured management and control policies.
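The behaviour of the service node 2 in the steps above (update the recorded abnormal service information, then check each incoming request against it) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class and method names are hypothetical, and the two fields of the feature information are an assumption borrowed from the later embodiment that names a service object ID and a task ID.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureInfo:
    """Service feature information (field choice is an assumption)."""
    service_object_id: str
    task_id: str


@dataclass
class ServiceNode:
    name: str
    # Recorded abnormal service information: feature info of services
    # that have already caused an abnormality on some other node.
    abnormal_info: set = field(default_factory=set)

    def on_abnormal_report(self, info: FeatureInfo) -> None:
        """Update the local record when a peer reports an abnormal service."""
        self.abnormal_info.add(info)

    def handle_request(self, info: FeatureInfo) -> str:
        """Check the record before touching the service; reject on a match."""
        if info in self.abnormal_info:
            return "rejected"
        return "processed"


node2 = ServiceNode("service node 2")
bad = FeatureInfo(service_object_id="obj-7", task_id="task-1")
node2.on_abnormal_report(bad)  # reported by service node 1 after its abnormality
print(node2.handle_request(bad))
print(node2.handle_request(FeatureInfo("obj-8", "task-1")))
```

The point of the design is that the check happens before the service is accessed, so the node never runs the code path that brought down its peer.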
- the embodiment of the present invention further provides a method for processing a service access request, which is applied to a distributed system without a control node.
- the process of triggering an abnormality occurs when the service node 1 performs a self-running task.
- the method comprises the following steps:
- the service feature information of the to-be-accessed service includes a service object ID and a task ID.
- the self-running task and the service feature information are the same as those described in the step 301.
- the self-running task can be replaced with another management and maintenance task.
- for details of the diagnosis, refer to the description in step 301.
- the abnormal service information and the process of determining whether to perform the management maintenance task refer to the related description in step 402 and other embodiments. 502.
- the service node 1 triggers an exception when executing the task 1, and obtains the service feature information of the task in which the abnormality occurs from the thread context of the thread executing the task 1; the service feature information includes the task object ID and the task ID of the service causing the abnormality.
- the service node 1 sends the obtained service characteristic information of the abnormality to the service node in the distributed system that has a service backup relationship with the service node where the abnormality occurs.
- the service node 1 may also send the determined service characteristic information of the abnormality to the service nodes in the distributed system in which no abnormality has occurred.
- These service nodes that do not have an abnormality include service nodes that have a service backup relationship with the service node where the abnormality occurs.
- the service node 2 is taken as an example to describe the processing procedure of the service node after receiving the service feature information of the service in which the abnormality has occurred. It can be understood that other service nodes that receive the service feature information of the service in which the abnormality has occurred will also perform the same processing as the service node 2.
- the service node 2 updates the abnormal service information according to the service characteristic information of the received abnormal service.
- the abnormal service information can be the same as that described in the previous step 402 or other embodiments. Details are not repeated here.
- when the service node 2 starts the self-running task 2, the service node 2 obtains the service feature information of the service to be accessed by the task 2, and determines, according to the service feature information of the service to be accessed and the recorded abnormal service information, to refuse to process the to-be-accessed service.
- the embodiments of FIG. 4 and FIG. 5 can also follow the embodiments corresponding to FIG. 2 and FIG. 3 and set the management and control policy according to the actual situation. For example, if the number of exceptions triggered by a service access request is greater than 1 and the threshold is not exceeded, temporary control can be configured. That is to say, in steps 406 and 504, when the service node 2 receives the service feature information of the service in which the abnormality occurs and updates the abnormal service information, it also determines whether the timer corresponding to the service feature information has been started. If it has been started, the timer is cleared and restarted. If it has not been started, the timer is started and a duration is set for it, that is, the duration of the control is set.
- in step 408 of the above embodiment, it is originally determined to refuse to perform the service accessed by the service access request 2.
- when the service node 2 is set with a timer, it is necessary to further consider whether the timer has expired. That is to say, the service feature information in the service access request 2, the recorded abnormal service information, and the timer setting must be considered together to determine whether to perform the service accessed by the service access request 2. If the service access request 2 is received before the timer expires, the service node 2 determines that the received service access request needs to be controlled. If the service access request 2 is received after the timer expires, the service node 2 does not perform control, and may delete the recorded service characteristic information of the service to be controlled.
- when the service node 2 is provided with a timer, it is also necessary to consider in step 505 whether the timer has expired. If the access to the service by the self-running task 2 is determined before the timer expires, the service node 2 determines that the access needs to be controlled. If the access to the service by the self-running task 2 is determined after the timer expires, the service node 2 does not perform control, and may delete the recorded service characteristic information of the service to be controlled.
- the difference from the embodiment of FIG. 4 is that, in this embodiment, when a service node in the distributed system becomes abnormal while executing a self-running task, the service feature information of the abnormal service is sent to the service nodes having a service backup relationship with it.
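The timer behaviour described above (start or restart on every report, control only until expiry, drop the record afterwards) can be sketched with stored expiry times instead of real timers. This is an illustrative assumption, not the patent's mechanism; `TimedControlRecord` and the injectable `clock` parameter are hypothetical and exist only to make the example deterministic.

```python
import time


class TimedControlRecord:
    """Abnormal-service record whose entries expire after a control duration."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock   # injectable so the example does not need to sleep
        self._expiry = {}    # feature info -> time at which control ends

    def on_abnormal_report(self, info):
        # Starting and restarting the timer are the same operation here:
        # the expiry is pushed out from the current time.
        self._expiry[info] = self.clock() + self.duration_s

    def must_reject(self, info):
        expiry = self._expiry.get(info)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[info]  # timer expired: stop controlling, drop the record
            return False
        return True


now = [0.0]
record = TimedControlRecord(duration_s=60, clock=lambda: now[0])
record.on_abnormal_report(("obj-7", "task-1"))
now[0] = 30.0
print(record.must_reject(("obj-7", "task-1")))  # before expiry
now[0] = 61.0
print(record.must_reject(("obj-7", "task-1")))  # after expiry
```

A repeated report for the same feature information extends the control window, which matches the "clear and restart" step in the text.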
- the service node that receives the service feature information of the abnormal service updates its local abnormal service information.
- the service node in the distributed system refuses to perform the task according to the abnormal business information, thereby improving system stability.
- when a predetermined condition is met, each service node also deletes, from the locally saved service feature information of the services to be controlled, the records that match the service feature information defined by the predetermined condition.
- the predetermined conditions can be set according to actual needs, such as user intervention in a certain service or the recovery of an abnormal service. With this setting, control can be reset, which makes service control more flexible and adaptable.
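Resetting control when a predetermined condition is met amounts to removing the matching records from the local store. A minimal sketch, assuming the store is a set of `(service_object_id, task_id)` tuples and the condition is expressed as a predicate; both are assumptions made for illustration, and `clear_controlled` is a hypothetical helper.

```python
def clear_controlled(controlled, matches_condition):
    """Remove every controlled record selected by the predetermined condition.

    Returns the removed records so the caller can log or audit them.
    """
    removed = {info for info in controlled if matches_condition(info)}
    controlled.difference_update(removed)
    return removed


controlled = {("obj-7", "task-1"), ("obj-7", "task-2"), ("obj-9", "task-1")}

# Example condition: a user has intervened on service object "obj-7".
removed = clear_controlled(controlled, lambda info: info[0] == "obj-7")
print(sorted(removed))
print(sorted(controlled))
```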
- FIG. 6 shows a possible structural diagram of a distributed system including a control node 601 and at least two service nodes (taking the service node 602 and the service node 603 as an example), where the service node 603 is a service node having a service backup relationship with the service node 602.
- the control node 601 is configured to receive the service feature information, reported by the service node 602, of the service in which an abnormality has occurred, generate a control command according to the service feature information, and send the control command to the service node 603, where the control command includes the service feature information and is used to instruct the service node 603 to refuse to process the service characterized by the service feature information; the service node 603 is configured to receive the control command, update, according to the service feature information in the control command, the locally stored service feature information of the services to be controlled, obtain the service feature information of the to-be-accessed service before accessing a service, and refuse to process the to-be-accessed service according to the service feature information of the to-be-accessed service and the updated service feature information of the services to be controlled.
- the service node 603 is configured to obtain the feature information of the to-be-accessed service from the received service access request, where the service access request includes the service feature information of the to-be-accessed service; or to obtain, according to the management and maintenance task, the service feature information of the to-be-accessed service accessed by the management and maintenance task.
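The two sources of feature information named above (a received service access request, or a management and maintenance task started by the node itself) can be unified behind a single lookup. The types and field names below are hypothetical; the text does not specify how either object is structured.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class AccessRequest:
    client: str
    feature_info: Tuple[str, str]  # carried in the request itself


@dataclass(frozen=True)
class MaintenanceTask:
    task_id: str
    service_object_id: str         # the service the task will access


def feature_info_of(work: Union[AccessRequest, MaintenanceTask]) -> Tuple[str, str]:
    """Return the feature information of the service that `work` will access."""
    if isinstance(work, AccessRequest):
        return work.feature_info
    return (work.service_object_id, work.task_id)


print(feature_info_of(AccessRequest("client 1", ("obj-7", "task-1"))))
print(feature_info_of(MaintenanceTask(task_id="task-2", service_object_id="obj-7")))
```

With this in place, the reject check does not need to care whether the work came from a client or from the node's own scheduler.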
- the service node 602 sends the service feature information of the abnormal service to the control node 601 in two possible implementation manners.
- the first type: the service node 602 is configured to start the management and maintenance task, determine to perform the management and maintenance task according to the locally stored abnormal service information, and, when an abnormality occurs in the execution of the management and maintenance task, obtain the service characteristic information of the abnormal service according to the ID of the thread that executes the management and maintenance task, and send the obtained service feature information to the control node 601.
- the second type: the service node 602 is configured to receive the service access request sent by the control node, determine to execute the service access request according to the locally stored service feature information of the services to be controlled, and, when a service abnormality is triggered in executing the service access request, obtain the service feature information of the service access request according to the ID of the thread that executes the service access request, and send the obtained service feature information to the control node 601.
- obtaining the service feature information of the abnormal service according to the ID of the thread that executes the management and maintenance task specifically includes: obtaining the thread context of the thread according to the ID of the thread that executes the management and maintenance task, and obtaining the service characteristic information of the abnormal service from the thread context; or obtaining, according to the ID of the thread that executes the management and maintenance task, the service context of the service being executed by the thread, and obtaining the service characteristic information of the abnormal service from the service context.
- obtaining the service feature information of the service access request according to the ID of the thread that executes the service access request specifically includes: obtaining the thread context of the thread according to the ID of the thread that executes the service access request, and obtaining the service characteristic information of the abnormal service from the thread context; or obtaining the service context of the service access request according to the ID of the thread that executes the service access request, and obtaining the service characteristic information of the abnormal service from the service context.
- the embodiment of the present invention also provides a distributed system without deploying a control node.
- the distributed system includes a first service node and a second service node, and the second service node is a service node having a service backup relationship with the first service node.
- the first service node is configured to receive the service feature information of the abnormal service that is sent by the second service node, update the locally stored abnormal service information according to the received service feature information, obtain the service feature information of the service to be accessed before accessing a service, and determine, according to the service characteristic information of the to-be-accessed service and the updated abnormal service information, to refuse to process the to-be-accessed service.
- the abnormal service information includes service characteristic information of a service in which an abnormality has occurred.
- the second service node is configured to obtain the service feature information of the service that is abnormal on the node, and send the service feature information of the abnormal service to the first service node.
- the second service node is configured to start the management and maintenance task, determine to perform the management and maintenance task according to the service feature information of the to-be-accessed service accessed by the management and maintenance task and the locally stored abnormal service information, and, when an abnormality occurs in executing the management and maintenance task, obtain the service feature information of the abnormal service according to the ID of the thread that executes the management and maintenance task, and send the obtained service feature information to the first service node.
- the second service node is configured to receive a service access request sent by the client, where the service access request includes the service feature information of the service to be accessed; determine to execute the service access request according to the service feature information of the to-be-accessed service and the locally stored abnormal service information; and, when a service abnormality is triggered in executing the service access request, either obtain the thread context of the thread according to the ID of the thread that executes the service access request and obtain the service characteristic information of the abnormal service from the thread context, or obtain the service context of the service access request according to the ID of the thread that executes the service access request and obtain the service characteristic information of the abnormal service from the service context; and send the obtained service feature information to the first service node.
- obtaining the service feature information of the abnormal service according to the ID of the thread that performs the management and maintenance task specifically includes: obtaining the thread context of the thread according to the ID of the thread that executes the management and maintenance task, and obtaining the service characteristic information of the abnormal service from the thread context; or obtaining, according to the ID of the thread that executes the management and maintenance task, the service context of the service being executed by the thread, and obtaining the service characteristic information of the abnormal service from the service context.
- obtaining the service feature information according to the ID of the thread that executes the service access request specifically includes: obtaining the thread context of the thread according to the ID of the thread that executes the service access request, and obtaining the service characteristic information of the abnormal service from the thread context; or obtaining the service context of the service access request according to the ID of the thread that executes the service access request, and obtaining the service feature information of the abnormal service from the service context.
- each device in the distributed system for example, the service node, the first service node, and the second service node, in order to implement the above functions, includes corresponding hardware structures and/or software modules for performing the respective functions.
- in combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
- FIG. 7 shows a possible structural diagram of a service node involved in the present application.
- the service node can implement the functions of the service node 1 and/or the service node 2 in the method embodiments of FIG. 4 and FIG. 5 above.
- the terminology and implementation details not defined in this embodiment may refer to the method embodiments of FIG. 4 and FIG. 5 above.
- the service node may include a receiving unit 701, an updating unit 702, an obtaining unit 703, and a processing unit 704.
- the receiving unit 701 is configured to receive the service feature information of the service in which the abnormality occurs
- the updating unit 702 is configured to update the abnormal service information according to the received service feature information, where the abnormal service information includes the service feature information of the abnormally generated service.
- the obtaining unit 703 is configured to obtain the service feature information of the first to-be-accessed service before accessing a service, and the processing unit 704 is configured to refuse to process the first to-be-accessed service according to the service feature information of the first to-be-accessed service and the recorded abnormal service information.
- the obtaining unit 703 is configured to obtain the feature information of the first to-be-accessed service from the received first service access request, where the first service access request includes the service feature information of the first to-be-accessed service.
- alternatively, the obtaining unit 703 is configured to obtain the service feature information of the first to-be-accessed service according to the management maintenance task.
- the service node further includes a sending unit 705, where the processing unit 704 is configured to start a management maintenance task, determine to execute the management maintenance task according to the service feature information of the second to-be-accessed service accessed by the management maintenance task and the recorded abnormal service information, and, when an abnormality occurs in executing the management maintenance task, obtain, according to the ID of the thread executing the management maintenance task, the service context of the service being executed by the thread, and obtain the service feature information of the abnormal service from the service context; and the sending unit 705 is configured to send out the obtained service feature information.
- the service node further includes a sending unit 705, where the receiving unit 701 is further configured to receive a service access request, where the service access request includes the service feature information of the service to be accessed; the processing unit 704 is further configured to determine, according to the service characteristic information of the to-be-accessed service and the recorded abnormal service information, to execute the service access request, and, when a service abnormality is triggered in executing the service access request, obtain the service context according to the ID of the thread that executes the service access request and obtain the service feature information of the abnormal service from the service context; the sending unit 705 is configured to send the obtained service feature information.
- the service node may further include a storage unit 706, where the storage unit 706 is configured to store the abnormal service feature information, and the processing unit 704 is configured to refuse to process the first to-be-accessed service when it finds, in the abnormal service information, service feature information that is the same as the service feature information of the first to-be-accessed service.
- the storage unit 706 is further configured to store the abnormal service feature information and store a control policy for managing abnormal services.
- the abnormal service feature information further includes the number of times the abnormality occurs corresponding to the recorded service feature information.
- the processing unit 704 is configured to find, in the abnormal service feature information, the service feature information that is the same as the service feature information of the first to-be-accessed service and the corresponding number of occurrences of the abnormality, and to refuse to process the first to-be-accessed service when it determines that the found service characteristic information and the corresponding number of occurrences of the abnormality meet the condition for performing control in the management and control policy.
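When the record also keeps the number of occurrences, the reject decision becomes a policy check rather than a plain membership test. The sketch below assumes the simplest possible policy, "reject once the count reaches a threshold"; the text leaves the policy contents open, so the threshold and the `CountingRecord` class are illustrative assumptions only.

```python
from collections import Counter


class CountingRecord:
    """Abnormal service information with per-service occurrence counts."""

    def __init__(self, threshold):
        # Hypothetical policy: control applies once count >= threshold.
        self.threshold = threshold
        self.counts = Counter()

    def on_abnormal_report(self, info):
        self.counts[info] += 1

    def must_reject(self, info):
        # Counter returns 0 for unseen keys, so unknown services pass.
        return self.counts[info] >= self.threshold


record = CountingRecord(threshold=2)
info = ("obj-7", "task-1")
record.on_abnormal_report(info)
print(record.must_reject(info))  # one occurrence, below the threshold
record.on_abnormal_report(info)
print(record.must_reject(info))  # two occurrences, threshold reached
```

A threshold above 1 tolerates a one-off fault that may have been caused by the node rather than the service, at the cost of allowing one more node to hit the defect.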
- the service node involved in the foregoing method embodiments of FIG. 2 and FIG. 3 may also include a receiving unit, an updating unit, an obtaining unit, and a processing unit.
- the receiving unit is configured to receive the control command sent by the control node, where the control command is generated by the control node according to the service characteristic information, reported by other service nodes, of the service in which an abnormality has occurred, and the control command includes the service feature information of the service to be controlled; and the update unit is configured to update the locally stored service feature information of the services to be controlled according to the received service feature information.
- the obtaining unit is configured to obtain the service feature information of the first to-be-accessed service before accessing a service.
- the processing unit is configured to determine, according to the service feature information of the first to-be-accessed service and the updated service feature information of the services to be controlled, to refuse to process the to-be-accessed service.
- the acquiring unit is configured to obtain the feature information of the first to-be-accessed service from the received first service access request, where the first service access request includes the service feature information of the first to-be-accessed service; or the acquiring unit is configured to obtain the service feature information of the first to-be-accessed service according to the management and maintenance task.
- the service node includes a sending unit, where the processing unit is further configured to start a management maintenance task, determine to perform the management maintenance task according to the service feature information of the second to-be-accessed service accessed by the management maintenance task and the service feature information of the services to be controlled, and, when an abnormality occurs in the execution of the management maintenance task, either obtain the thread context of the thread according to the ID of the thread executing the management maintenance task and obtain the service characteristic information of the abnormal service from the thread context, or obtain, according to the ID of the thread executing the management maintenance task, the service context of the service being executed by the thread and obtain the service characteristic information of the abnormal service from the service context.
- the sending unit is configured to send the obtained service feature information.
- the service node further includes a sending unit, where the receiving unit is further configured to receive a service access request, where the service access request includes the service feature information of the service to be accessed; the processing unit is further configured to determine, according to the service feature information of the to-be-accessed service and the abnormal service information, to execute the service access request, and, when a service abnormality is triggered in executing the service access request, either obtain the thread context of the thread according to the ID of the thread that executes the service access request and obtain the service characteristic information of the abnormal service from the thread context, or obtain the service context of the service access request according to the ID of the thread executing the service access request and obtain the service feature information of the abnormal service from the service context; the sending unit is configured to send the obtained service feature information.
- the processing unit is configured to refuse to process the to-be-accessed service when the service feature information of the to-be-accessed service is found in the service feature information of the services to be controlled.
- FIG. 8 shows a possible structural diagram of the control node involved in the above implementation.
- the control node includes a receiving unit 801, an instruction generating unit 802, and a transmitting unit 803.
- the receiving unit 801 is configured to receive service feature information of the abnormally reported service reported by the first service node in the distributed system.
- the instruction generating unit 802 is configured to generate a control instruction according to the service feature information, where the control instruction includes the service feature information.
- the sending unit 803 is configured to send the control command to the second service node in the distributed system, to indicate that the second service node refuses to process the service characterized by the service feature information.
- the second service node is a service node that has a service backup relationship with the first service node.
- the control node further includes a storage unit 804.
- the storage unit 804 stores abnormal service information and a management and control policy for managing abnormal services, and the abnormal service information includes the service characteristic information of the services in which an abnormality has occurred and the number of times the abnormality has occurred.
- the instruction generating unit 802 is configured to update the abnormal service information according to the received service feature information, and to generate the control instruction when it determines that the service feature information in the abnormal service information and the corresponding number of occurrences of the abnormality meet the condition for executing control in the control policy.
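The control node's three units (receive a report, generate a control instruction when the policy condition is met, send it to the nodes that back up the reporter) can be sketched as one small class. The backup map, the threshold policy, the dictionary shape of the instruction, and the `send` callback are all assumptions made for illustration.

```python
from collections import Counter


class ControlNode:
    def __init__(self, backups, threshold, send):
        self.backups = backups      # reporting node -> nodes that back it up
        self.threshold = threshold  # hypothetical policy: act once count >= threshold
        self.send = send            # callable(node, instruction)
        self.counts = Counter()     # abnormal service information with counts

    def on_report(self, reporter, info):
        """Update the record; if the policy condition holds, instruct the backups."""
        self.counts[info] += 1
        if self.counts[info] < self.threshold:
            return []
        instruction = {"feature_info": info}  # the instruction carries the feature info
        targets = list(self.backups.get(reporter, []))
        for node in targets:
            self.send(node, instruction)
        return targets


sent = []
control = ControlNode(
    backups={"node-602": ["node-603"]},
    threshold=1,
    send=lambda node, instruction: sent.append((node, instruction)),
)
control.on_report("node-602", ("obj-7", "task-1"))
print(sent)
```

Centralizing the count on the control node means individual service nodes only need to store the list of controlled services, not the policy itself.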
- the service node and the control node involved in the foregoing embodiments of the present invention may be implemented by a processor executing software instructions.
- the software instructions may be composed of corresponding software modules, which may be stored in a memory such as a random access memory (RAM), a flash memory, a read only memory (ROM), an erasable programmable ROM (EPROM), an electrically erasable programmable read only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art.
- An exemplary storage medium is coupled to the processor to enable the processor to read information from, and write information to, the storage medium.
- the storage medium can also be an integral part of the processor.
- the service node includes a processor 901, a memory 902, a communication interface 903, and a bus 904.
- the processor 901, the memory 902, and the communication interface 903 are connected to each other through the bus 904.
- the bus 904 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
- the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 9, but it does not mean that there is only one bus or one type of bus.
- the communication interface 903 is configured to communicate with the external device and communicate with the processor 901.
- the memory 902 stores computer executable instructions.
- the control node may also include a processor 901, a memory 902, a communication interface 903, and a bus 904, except that the instructions stored in the memory are different.
- when the processor 901 executes the instructions in the memory 902, the functions of the control node in the method embodiments are performed.
- the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof.
- when implemented in software, the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium.
- Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
- a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.
- the functions of the above-described embodiments of the present invention may also be implemented by a computer program product including instructions that, when executed by a computer, cause the computer to perform some or all of the steps of the above method embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Human Computer Interaction (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Hardware Redundancy (AREA)
Abstract
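Embodiments of the present invention relate to a method for processing a service, a service node, a control node, and a distributed system. The method for processing a service is applied to a distributed system and specifically includes: a control node receives service feature information, reported by a service node, of a service in which an abnormality has occurred; generates a control instruction according to the service feature information, where the control instruction includes the service feature information; and sends the control instruction to a service node in the distributed system that has a service backup relationship with the service node in which the abnormality occurred. Other service nodes can thereby refuse to process services of the same kind, which improves the stability of the system.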
Description
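Embodiments of the present invention relate to the field of storage technologies, and in particular, to a method for processing a service, a service node, a control node, and a distributed system.
To ensure high availability, distributed systems generally adopt multi-node redundancy; when an abnormality occurs in a single node, a redundant node can immediately take over its services and ensure service continuity. A typical distributed system can support thousands of online nodes at the same time. In a distributed system, because all nodes run homogeneous software, when a software defect causes a node abnormality (with serious consequences such as a reset), the other online nodes that take over the service will, with high probability, trigger the same software defect and suffer the same abnormality. Nodes in the system then become abnormal one after another, which ultimately causes cluster redundancy to fail and services to be interrupted. Informally, this kind of problem is called successive resets of multiple nodes caused by a common-cause failure.
So far, the prior art offers no good solution to this problem.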
Summary
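In view of this, the present application provides a method for processing a service, a service node, a control node, and a distributed system, to improve the stability of the system.
According to a first aspect, the present application provides a method for processing a service, applied to a distributed system. The distributed system includes a control node and at least two service nodes, and the at least two service nodes include a first service node and a second service node. The method includes: the control node receives service feature information, reported by the first service node, of a service in which an abnormality has occurred; the control node generates a control instruction according to the service feature information and sends the control instruction to the second service node, to instruct the second service node to refuse to process the service characterized by the service feature information. The second service node is a service node that has a service backup relationship with the first service node, and the control instruction includes the service feature information, reported by the first service node, of the service in which the abnormality has occurred.
It can be understood that the control instruction may be sent to service nodes that have not failed and that have a service backup relationship with the first service node, or may be sent to other service nodes in addition to those that have a service backup relationship with the first service node. With the above method, when a service node that has received the control instruction has a service to process, it can refuse to process the service according to the control instruction. This prevents the node from becoming abnormal for the same reason and thereby improves the stability of the system.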
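In a possible design, the control node stores abnormal service information and a control policy for managing abnormal services, and the abnormal service information includes the service feature information of services in which an abnormality has occurred and the number of times the abnormality has occurred. In that case, generating the control instruction includes: the control node updates the abnormal service information according to the received service feature information, and generates the control instruction when it determines that the service feature information in the updated abnormal service information and the corresponding number of occurrences of the abnormality meet the condition for executing control in the control policy.
It can be understood that, because a control policy is introduced and can be set according to actual requirements, the control node can control abnormal services in more varied ways and adapts better.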
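In another possible design, after receiving the control instruction, the second service node updates, according to the service feature information in the control instruction, the locally stored service feature information of the services to be controlled. Before accessing a service, the second service node first obtains the service feature information of the to-be-accessed service, and determines, according to the service feature information of the to-be-accessed service and the updated service feature information of the services to be controlled, to refuse to process the to-be-accessed service. Optionally, the process in which the second service node obtains the service feature information of the to-be-accessed service before accessing the service includes: the second service node obtains the feature information of the to-be-accessed service from a received service access request, where the service access request includes the service feature information of the to-be-accessed service; or the second service node obtains the service feature information of the to-be-accessed service according to a management and maintenance task.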
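It can be understood that the content of a control policy can vary widely, so the need for temporary control may be determined according to a temporary control policy and the abnormal service information. That is, the control instruction that the second service node receives from the control node includes a control duration. After receiving the control instruction, the second service node records the service feature information in the control instruction as the service feature information of a service to be controlled, starts a timer, and sets the duration of the timer to the control duration. Before accessing a service, the second service node first obtains the service feature information of the to-be-accessed service and, before the timer expires, determines, according to the service feature information of the to-be-accessed service and the updated abnormal service information, to refuse to process the to-be-accessed service; or, after the timer expires, it performs no control.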
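In another possible design, before the control node receives the service feature information of the service in which the abnormality has occurred, the method further includes: the first service node starts a management and maintenance task, and determines to execute the management and maintenance task according to the service feature information of the to-be-accessed service accessed by the management and maintenance task and the locally stored service feature information of the services to be controlled; the first service node triggers an abnormality when executing the management and maintenance task, obtains the service feature information of the service in which the abnormality has occurred according to the ID of the thread executing the management and maintenance task, and sends the obtained service feature information to the control node. Alternatively, before the control node receives the service feature information of the service in which the abnormality has occurred, the method further includes: the first service node receives a service access request sent by the control node, and determines to execute the service access request according to the service feature information of the to-be-accessed service corresponding to the service access request and the locally stored service feature information of the services to be controlled; a service abnormality is triggered when the service access request is executed, and the first service node obtains the service feature information of the service access request according to the ID of the thread executing the service access request and sends the obtained service feature information to the control node.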
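It can be understood that the service feature information of the services to be controlled may be stored locally on the first service node or on a device that the first service node can query. The abnormal service information may be stored locally on the second service node or on a device that the second service node can query. Each time before processing a service, a service node determines, according to the service feature information of the services to be controlled, whether a received service access request or a self-running task can be executed. This improves the stability of the whole system.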
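According to a second aspect, an embodiment of the present invention provides another method for processing a service. The method is applied to a distributed system that includes a control node and at least two service nodes; the at least two service nodes include a first service node and a second service node, and the second service node is a service node that has a service backup relationship with the first service node. The method includes: the second service node receives a control instruction sent by the control node and updates, according to the received service feature information, the locally stored service feature information of the services to be controlled, where the control instruction is generated by the control node according to the service feature information, reported by the first service node, of a service in which an abnormality has occurred, and the control instruction includes the service feature information of the service to be controlled; the second service node obtains the service feature information of the to-be-accessed service before accessing a service; and the second service node determines, according to the service feature information of the to-be-accessed service and the updated service feature information of the services to be controlled, to refuse to process the to-be-accessed service.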
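Compared with the method of the first aspect, the second aspect focuses on how the second service node, after receiving the control instruction, controls service access requests according to the information carried in the control instruction. This keeps service nodes in the distributed system from failing because of the same service access and improves the stability of the system.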
在一个可能的设计中,所述确定拒绝处理所述待访问业务的过程包括:所述第二业务节点在所述需管控的业务的业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
在另一个可能的设计中,所述第二业务节点访问业务之前获得待访问业务的业务特征信息包括:所述第二业务节点从接收到的业务访问请求中获得所述待访问业务的业务特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,所述第二业务节点根据管理维护任务获得所述待访问业务的业务特征信息。
在一个可能的设计中,所述方法进一步包括:所述第二业务节点启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述管理维护任务;所述第二业务节点在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
在一种可能的设计中,所述方法进一步包括:所述第二业务节点接收客户端发送的业务访问请求,根据所述业务访问请求对应的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述业务访问请求;所述第二业务节点在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
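It can be understood that the second service node may itself fail while running a management and maintenance task or while processing a client's service access request. In that case, the second service node also obtains the service feature information of the abnormal service on this node and sends it to the control node, so that the control node and the other service nodes perform control and the other service nodes do not suffer the same failure.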
第三方面,本发明实施例提供了另外一种处理业务的方法,该方法应用于分布式系统中,该分布式系统包括第一业务节点。该方法包括:所述第一业务节点接收其他节点发生异常的业务的业务特征信息,根据接收到的业务特征信息更新本地存储的异常业务信息,所述异常业务信息包括发生过异常的业务的业务特征信息;所述第一业务节点访问业务之前获得待访问业务的业务特征信息;所述第一业务节点根据所述待访问业务的业务特征信息以及更新后的异常业务信息确定拒绝处理所述待访问业务。
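Unlike the embodiment of the first aspect, in the method of this embodiment the first service node collects the abnormal service information itself and decides whether to refuse to process the service to be accessed according to that information, rather than according to a control instruction from a control node. In this way, even if no control node is deployed in the distributed system, abnormal services can still be controlled and the stability of the system improved.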
在一个可能的设计中,所述第一业务节点根据所述待访问业务的业务特征信息以及所述异常业务信息拒绝处理所述待访问业务的过程包括:所述第一业务节点在所述异常业务信息中找到与所述待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
在另外一个可能的设计中,所述第一业务节点上存储有用于管理异常业务的管控策略,且所述异常业务特征信息还包括与所述发生过异常的业务的业务特征信息对应的发生异常的次数,所述第一业务节点根据所述待访问业务的业务特征信息以及所述异常业务信息拒绝处理所述待访问业务包括:所述第一业务节点在所述异常业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息及对应的发生异常的次数;所述第一业务节点确定找到的业务特征信息及对应的发生异常的次数符合所述管控策略中执行管控的条件时,拒绝处理所述待访问业务。
可以理解的是,管控策略的内容是比较多样化的,所以第一业务节点上除了存储发生异常的业务的业务特征信息之外,还可以根据管控策略为该异常业务设置定时器,并在所述定时器超时之前,根据所述待访问业务的业务特征信息以及所述异常业务信息拒绝处理所述待访问业务;或者,在定时器超时之后,不作管控。
一种可能的设计中,所述第一业务节点访问业务之前获得待访问业务的业务特征信息包括:所述第一业务节点从接收到的业务访问请求中获得所述待访问业务的业务特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,所述第一业务节点根据管理维护任务获得所述管理维护任务所访问的所述待访问的业务的业务特征信息。
在另外一种可能的设计中,所述分布式系统还包括第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点。所述第一业务节点获得其他业务节点发生异常的业务的业务特征信息之前包括:所述第二业务节点启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的异常业务信息确定执行所述管理维护任务;所述第二业务节点在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。或者,所述第一业务节点获得其他业务节点发生异常的业务的业务特征信息之前还包括:所述第二业务节点接收客户端发送的业务访问请求,根据待访问业务的业务特征信息以及本地存储的异常业务信息确定执行所述业务访问请求;所述第二业务节点在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。
在一种可能的实现中,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息可以包括:根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息。
在另外一种可能的实现中,根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务特征信息可以包括:根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息。
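In a fourth aspect, an embodiment of the present invention provides a service node, applied to a distributed system in which a control node is deployed. The service node includes: a receiving unit, configured to receive a control instruction sent by the control node, where the control instruction is generated by the control node according to service feature information, reported by another service node, of a service in which an abnormality occurred, and the control instruction contains the service feature information of the service to be controlled; an updating unit, configured to update the locally stored service feature information of services to be controlled according to the received service feature information; an obtaining unit, configured to obtain, before a service is accessed, the service feature information of a first service to be accessed; and a processing unit, configured to determine, according to the service feature information of the first service to be accessed and the updated service feature information of services to be controlled, to refuse to process the first service to be accessed.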
由于业务节点在处理业务之前,会基于历史中该业务是否为控制节点下发的需管控的业务而确定是否要拒绝处理。这样就避免了分布式系统中的业务节点因为同样的业务访问而连续触发异常。提高了分布式系统的稳定性。
在一种可能的设计中,所述获取单元用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,所述获取单元用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
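In a possible implementation, the service node includes a sending unit. The processing unit is further configured to start a management and maintenance task and determine, according to the service feature information of a second service to be accessed by the task and the service feature information of services to be controlled, to execute the task; and, when an abnormality occurs during execution of the task, to obtain the thread context of the thread executing the task according to the thread's ID and obtain the service feature information of the abnormal service from the thread context, or to obtain, according to the thread's ID, the service context of the service that the thread is executing and obtain the service feature information of the abnormal service from the service context. The sending unit is configured to send out the obtained service feature information.
In a possible implementation, the service node includes a sending unit. The receiving unit is further configured to receive a service access request that includes the service feature information of the service to be accessed. The processing unit is further configured to determine, according to the service feature information of the service to be accessed and the abnormal service information, to execute the service access request; and, when a service abnormality is triggered during execution of the request, to obtain the thread context of the thread executing the request according to the thread's ID and obtain the service feature information of the abnormal service from the thread context, or to obtain, according to the thread's ID, the service context of the service access request and obtain the service feature information of the abnormal service from the service context. The sending unit is configured to send out the obtained service feature information.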
可选的,上述任意一种可能的设计或实现中,所述的业务节点还包括存储单元,所述的存储单元用于存储所述需管控的业务的业务特征信息,所述处理单元,用于在所述需管控的业务的业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
第五方面,本发明实施例提供了另外一种业务节点,该业务节点也包括接收单元、更新单元、获取单元以及处理单元。其中,接收单元,用于接收其他业务节点发生异常的业务的业务特征信息;更新单元,用于根据接收到的业务特征信息更新异常业务信息,所述异常业务信息包括发生异常的业务的业务特征信息;获取单元,还用于在访问业务之前获得第一待访问业务的业务特征信息;处理单元,用于根据所述第一待访问业务的业务特征信息以及记录的异常业务信息拒绝处理所述待访问业务。
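The service node collects information about abnormalities on other service nodes in the system and, before processing a service, decides whether to refuse it based on the collected abnormal service information. This keeps service nodes in the distributed system from triggering abnormalities one after another because of the same service access, and improves the stability of the distributed system.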
在一种可能的设计中,所述获取单元用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,所述获取单元用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
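In a possible design, the service node includes a sending unit. The processing unit is further configured to start a management and maintenance task and determine, according to the service feature information of a second service to be accessed by the task and the recorded service feature information, to execute the task; and, when an abnormality occurs during execution of the task, to obtain the thread context of the thread executing the task according to the thread's ID and obtain the service feature information of the abnormal service from the thread context, or to obtain, according to the thread's ID, the service context of the service that the thread is executing and obtain the service feature information of the abnormal service from the service context. The sending unit is configured to send out the obtained service feature information.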
在另外一种可能的设计中,所述业务节点,还包括发送单元,其中,所述接收单元,还用于接收业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息;所述处理单元,还用于根据所述待访问业务的业务特征信息和异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
在一种可能的设计中,所述业务节点还包括存储单元,所述存储单元用于记录所述异常业务特征信息;所述处理单元,用于在所述异常业务信息中找到与所述第一待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述第一待访问业务。
在一种可能的设计中,所述业务节点还包括存储单元,所述存储单元用于记录所述异常业务特征信息,以及存储用于管理异常业务的管控策略,其中,所述异常业务特征信息还包括与所述发生异常的业务的业务特征信息对应的发生异常的次数;所述处理单元,用于在所述异常业务特征信息中找到与所述第一待访问业务的业务特征信息相同的业务特征信息及对应的发生异常的次数,并在确定找到的业务特征信息及对应的发生异常的次数符合管控策略中执行管控的条件时,拒绝处理所述第一待访问业务。
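In a sixth aspect, an embodiment of the present invention provides a distributed system that includes a control node and at least two service nodes, the at least two service nodes including a first service node and a second service node. The control node is configured to receive service feature information, reported by the first service node, of a service in which an abnormality occurred, generate a control instruction according to the service feature information, and send it to the second service node, where the control instruction includes the service feature information and instructs the second service node to refuse to process the service characterized by the service feature information. The second service node is configured to receive the control instruction, update the locally stored service feature information of services to be controlled according to the service feature information in the control instruction, obtain the service feature information of a service to be accessed before accessing it, and refuse to process the service to be accessed according to that information and the updated service feature information of services to be controlled. The second service node is a service node that has a service backup relationship with the first service node.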
在一种可能的设计中,所述第二业务节点用于从接收到的业务访问请求中获得所述待访问业务的特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,根据管理维护任务获得所述管理维护任务所访问的待访问业务的业务特征信息。
在一种可能的设计中,所述第一业务节点,用于启动管理维护任务,根据本地存储的异常业务信息确定执行所述管理维护任务,并在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
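In a possible design, the first service node is configured to receive a service access request sent by the control node, determine to execute the service access request according to the locally stored service feature information of services to be controlled, and, when a service abnormality is triggered during execution of the request, obtain the service feature information of the service access request according to the ID of the thread executing the request and send the obtained service feature information to the control node.
In a seventh aspect, an embodiment of the present invention provides a control node that includes a communication interface, a processor, and a memory connected through a bus. The communication interface is used to communicate with external devices and with the processor. The memory stores instructions, and the processor executes the instructions in the memory to perform the steps performed by the control node in the first aspect.
In an eighth aspect, an embodiment of the present invention further provides a service node that includes a communication interface, a processor, and a memory connected through a bus. The communication interface is used to communicate with external devices and with the processor. The memory stores instructions, and the processor executes the instructions in the memory to perform the steps in the second aspect and the third aspect.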
第九方面,本发明实施例提供了一种程序产品,该程序产品包括指令,当该程序产品被计算机执行的时候,使得该计算机执行上述第一方面到第三方面任意一方面的方法。
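It can be understood that, in the embodiments of the above aspects, there may be multiple service nodes that have a service backup relationship with the first service node; that is, there may be other such service nodes besides the second service node.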
相较于现有技术,本发明实施例提供的方案中,控制节点或者业务节点会收集分布式系统中发生异常的业务的业务特征信息。基于收集到的信息,来决定是否对发生异常的业务进行管控,从而提高了分布式系统的稳定性。
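To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below.
Fig. 1 is a schematic structural diagram of a distributed system;
Fig. 2 is a schematic flowchart of a method for processing a service;
Fig. 3 is a schematic flowchart of a method for processing a service;
Fig. 4 is a schematic flowchart of a method for processing a service;
Fig. 5 is a schematic flowchart of a method for processing a service;
Fig. 6 is a schematic structural diagram of a distributed system;
Fig. 7 is a schematic structural diagram of a service node;
Fig. 8 is a schematic structural diagram of a control node;
Fig. 9 is a schematic structural diagram of a control node or a service node.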
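The technical solutions provided in the embodiments of the present invention are described below with reference to the accompanying drawings.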
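As shown in Fig. 1, the distributed system 100 includes multiple clients (1, 2, …, N), multiple service nodes (1, 2, …, M), and a control node, where M and N are natural numbers greater than or equal to 2. The control node may be a metadata server; a service node may be a storage node or a computing node; and a client may be an application server, a file server, an end user, or the like. The distributed system may also include two or more control nodes. When there is more than one control node, they may be configured as active and standby: for example, one is set as the active control node and the rest as standby control nodes. With active and standby control nodes, corresponding policies are configured on each control node so that the control nodes cooperate. The clients, control nodes, and service nodes communicate over a communication network. The control node mentioned in the embodiments is the control node that is currently processing services; it may be the active control node or a standby control node that has taken over from the active one. In a specific application scenario, the nodes mentioned in the embodiments may be servers. For example, the control node may be a control server, a storage node may be a storage server, and a computing node may be an authentication server; this application does not limit this.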
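When a service node in the distributed system (referred to below as the first service node) encounters a service abnormality, it reports the service feature information of the abnormal service to the control node. The control node generates a control instruction according to the received service feature information and sends it to the service nodes in the distributed system that have a service backup relationship with the abnormal service node (referred to below as second service nodes). There may be more than one service node with a service backup relationship to the abnormal service node. The control instruction may of course also be sent to other service nodes in the system; the embodiments do not limit this. The control instruction instructs a service node to refuse to process the service characterized by the service feature information, and it includes the service feature information of the abnormal service.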
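With reference to the distributed system described in Fig. 1, an embodiment of the present invention provides a method for processing a service. As shown in Fig. 2, the specific process includes:
201. The control node receives service access request 1 sent by client 1 and sends it to service node 1. Service access request 1 includes the service feature information of the service to be accessed. Alternatively, in some distributed systems, the service node may receive service access request 1 directly from client 1.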
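Service feature information is used to characterize a service. In a distributed storage service system, the service nodes are storage nodes, and the service feature information carried in a service access request may include a service object ID, an operation address range, and an operation code. The operation code may indicate a read operation, a write operation, a file system service, or another operation. In a distributed object service system, the service nodes may be storage nodes, and the service feature information carried in a service access request may be put/get, key, and value. In a distributed authentication service system, the service nodes may be computing nodes, and the service feature information carried in a service access request may be an interface name and interface parameters (of variable number). This application does not limit the service nodes. A service may be, for example, writing data to an address range of a service node or reading data from an address range; no limitation is imposed here.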
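The three kinds of service feature information listed above could be represented as immutable records, so that two requests for the same service compare equal and can be looked up in a set. The class and field names below are assumptions chosen for illustration; the text only names the pieces of information, not a data layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StorageFeature:
    """Distributed storage service: object ID, address range, opcode."""
    object_id: str
    address_range: Tuple[int, int]
    opcode: str  # e.g. "read", "write", "fs-service"


@dataclass(frozen=True)
class ObjectFeature:
    """Distributed object service: put/get, key, value."""
    operation: str  # "put" or "get"
    key: str
    value: Optional[bytes] = None


@dataclass(frozen=True)
class AuthFeature:
    """Distributed authentication service: interface name and parameters."""
    interface: str
    params: Tuple[str, ...] = ()  # variable number of parameters


a = StorageFeature("vol-1", (0, 4096), "write")
b = StorageFeature("vol-1", (0, 4096), "write")
c = StorageFeature("vol-1", (0, 4096), "read")
same_service = a == b          # identical feature information
distinct = len({a, b, c})      # frozen dataclasses are hashable
```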
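202. After receiving service access request 1, service node 1 determines, according to the service feature information of the service to be accessed and the service feature information of services to be controlled, to execute service access request 1.
The service feature information of services to be controlled may be stored locally or on a device that service node 1 can access; the embodiments do not limit this. If the same service feature information can be found in the recorded service feature information of services to be controlled, the service accessed by service access request 1 was abnormal during earlier access and needs to be controlled. Conversely, if the same service feature information cannot be found in the locally recorded service feature information of services to be controlled, either the service accessed by service access request 1 has not been abnormal during earlier access and therefore needs no control, or it has triggered an abnormality before but does not need control.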
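The check in step 202 amounts to a membership test against the recorded service feature information of services to be controlled. A minimal sketch, assuming the feature information is a hashable tuple and using hypothetical names (`ServiceNode`, `handle_request`):

```python
class ServiceNode:
    def __init__(self):
        # Service feature information of services to be controlled.
        self.controlled = set()

    def handle_request(self, feature, execute):
        """Refuse the request if its feature information is recorded as
        controlled; otherwise run the supplied handler."""
        if feature in self.controlled:
            return "rejected"
        return execute(feature)


node = ServiceNode()
write_req = ("obj-7", (0, 4096), "write")  # object ID, range, opcode

# No matching record yet, so the request is executed (as in step 202).
outcome_before = node.handle_request(write_req, lambda f: "executed")

# After a control instruction records the feature, the same request is refused.
node.controlled.add(write_req)
outcome_after = node.handle_request(write_req, lambda f: "executed")
```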
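203. Service node 1 invokes a thread to execute the received service access request 1. While the thread is executing the request, an abnormality occurs in the thread.
204. Service node 1 obtains the service feature information of the abnormal service according to the ID of the thread. The service feature information includes the service object ID, the operation address range, and the operation code of the abnormal service.
The service feature information of the abnormal service can be obtained in two ways. Way 1: the thread in service node 1 obtains, according to its thread ID, the service context of service access request 1 that it is executing, and obtains the service feature information of the abnormal service from that service context. Way 2: the thread in service node 1 obtains the service feature information of the abnormal service from the thread context corresponding to its thread ID.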
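The two ways of recovering the feature information can be sketched as follows. This is an illustration under several assumptions: the abnormality is modeled as a Python exception, the "service context" is a dictionary keyed by thread ID, and the "thread context" is a `threading.local` object. None of these choices come from the source, which does not say how either context is stored.

```python
import threading

# Way 1: service contexts indexed by the ID of the executing thread.
service_contexts = {}
# Way 2: a per-thread context object.
thread_context = threading.local()
# Feature information that would be reported to the control node.
reported = []


def execute(feature, fail):
    tid = threading.get_ident()
    service_contexts[tid] = feature   # record what this thread is serving
    thread_context.feature = feature
    try:
        if fail:
            raise RuntimeError("simulated service abnormality")
    except RuntimeError:
        via_service_context = service_contexts[tid]   # way 1
        via_thread_context = thread_context.feature   # way 2
        reported.append((via_service_context, via_thread_context))
    finally:
        service_contexts.pop(tid, None)


request = ("obj-7", (0, 4096), "write")
worker = threading.Thread(target=execute, args=(request, True))
worker.start()
worker.join()
```

Both lookups yield the same feature information here; the difference is only where the node keeps the association between a thread and the service it is executing.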
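205. Service node 1 reports the service feature information of the abnormal service to the control node.
206. The control node generates a control instruction according to the received service feature information of the abnormal service. The control instruction includes the service feature information of the abnormal service.
The control instruction indicates how one or more service nodes in the distributed system should respond when they receive a request identical to service access request 1, and it carries the service feature information of the service to be controlled. In this step, the control node generates a control instruction whenever it receives a service abnormality reported by a service node.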
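Alternatively, the control node may store abnormal service information and a control policy for managing service abnormalities. The abnormal service information includes the service feature information of services in which an abnormality has occurred and the number of times the abnormality occurred. The control policy describes under which conditions which service nodes need to refuse to process a given service.
For example, the control policy may be: when the number of abnormalities triggered by a service access request exceeds a preset threshold, generate a control instruction that instructs the service nodes having a backup relationship with the abnormal service node to refuse requests identical to that service access request; when the number of abnormalities triggered by a service access request does not exceed the preset threshold, apply no control, so no control instruction is generated. The specific process of generating the control instruction may then be: the control node updates the abnormal service information according to the received service feature information of the abnormal service; when the control node determines that the service feature information and the corresponding number of abnormalities meet the condition for performing control in the control policy, it generates the control instruction.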
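As can be seen, both the abnormal service information and the control policy can be set according to actual needs. Take the control policy as an example. On one hand, different control policies can be configured for different types of operations, that is, for write operations, read operations, file system services, and so on. If so, when a service abnormality occurs, the control policy corresponding to the operation type is found according to the operation code in the service feature information. On the other hand, the content of a control policy can also be configured as needed. For example, a control policy may specify: if the number of abnormalities triggered by a service access request exceeds a preset threshold, send a control instruction to the other service nodes in the distributed system that have not yet become abnormal, instructing them to refuse requests identical to that service access request; if the number is greater than 1 but does not exceed the threshold, a temporary control instruction may be sent. A temporary control instruction instructs service nodes in the distributed system to refuse requests identical to the service access request for a preset period. When temporary control is needed, the delivered control instruction also includes a control duration. The control policy may also specify which service nodes in the distributed system receive the control instruction or temporary control instruction, for example the service nodes that have a service backup relationship with the failed service node. It can also be understood that, by the time the service access request triggers an abnormality on service node 1, it may already have triggered abnormalities on other service nodes that have been reported to the control node. In other words, more than one service node in the distributed system may trigger an abnormality because of the same service access request.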
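The per-operation policy with a temporary tier might look like the sketch below. The threshold values, durations, policy table, and the `on_report` function are hypothetical; only the decision structure (no control on the first abnormality, temporary control while the count is above 1 and within the threshold, ordinary control above the threshold) follows the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    threshold: int        # above this count: ordinary (untimed) control
    temp_duration: float  # control duration for temporary control, seconds


# Hypothetical policies selected by the operation code.
POLICIES = {"write": Policy(threshold=3, temp_duration=60.0),
            "read": Policy(threshold=5, temp_duration=30.0)}
DEFAULT_POLICY = Policy(threshold=3, temp_duration=60.0)

counts = Counter()  # abnormal service information: feature -> count


def on_report(feature):
    """Return a control instruction for this report, or None."""
    _object_id, _address_range, opcode = feature
    counts[feature] += 1
    n = counts[feature]
    policy = POLICIES.get(opcode, DEFAULT_POLICY)
    if n > policy.threshold:
        return {"feature": feature, "duration": None}   # ordinary control
    if n > 1:
        return {"feature": feature,
                "duration": policy.temp_duration}       # temporary control
    return None                                         # first occurrence


feature = ("obj-7", (0, 4096), "write")
results = [on_report(feature) for _ in range(4)]
```

With the write policy above, the four reports produce no instruction, two temporary instructions, and then an ordinary one.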
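207. The control node delivers the generated control instruction to the service nodes in the distributed system that have a service backup relationship with the abnormal service node.
Alternatively, in this step, the control node may send the obtained service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred. These include the service nodes that have a service backup relationship with the abnormal service node. Service node 2 is used below as an example to describe how a service node handles a received control instruction. Other service nodes that receive the control instruction handle it in the same way as service node 2.
Optionally, in this step, the control node may also send prompt information to the client, prompting the client user to intervene.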
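208. After receiving the control instruction, service node 2 records the service feature information in the control instruction as service feature information of a service to be controlled.
Optionally, when temporary control is determined according to the control policy, the control instruction also carries a control duration. In this case, on receiving the control instruction, service node 2 starts a timer and sets its duration to the control duration. The timer may be set for one specific service.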
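209. Client 1 sends service access request 2, which the control node assigns to service node 2. Service access request 2 includes the service feature information of the service requested by the client.
Alternatively, service access request 2 in this step may be sent by another client.
210. After receiving service access request 2, service node 2 determines, according to the service feature information in service access request 2 and the service feature information of services to be controlled, to refuse to process service access request 2.
The determination is made in the same way as in step 202, except that in this step service node 2 finds the same service feature information in the service feature information of services to be controlled. That is, service access request 2 received by service node 2 needs to be controlled. Of course, if service node 2 does not find the same service feature information in the recorded service feature information of services to be controlled, it can continue to process service access request 2. The specific processing is no different from the prior art and is not described again here.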
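Further, when a timer is set on service node 2, if service access request 2 is received before the timer expires, service node 2 determines that service access request 2 needs to be controlled. If service access request 2 is received after the timer expires, service node 2 applies no control; it also ends the timer and deletes the record corresponding to the service feature information in service access request 2 from the service feature information of services to be controlled.
211. Service node 2 returns an access-denied response message to client 1.
In the above method embodiment, the control node collects information about abnormalities in the system. When one or more service nodes in the distributed system encounter a service abnormality while processing the same service access request from an external client, the control node sends a control instruction to the service nodes that have a service backup relationship with the abnormal service nodes. When the same service access request attempts to access the service nodes that have been notified, they can refuse it, which keeps the same request from making them abnormal and improves the stability of the system.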
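As shown in Fig. 3, an embodiment of the present invention further provides a method for processing a service access request. Unlike the embodiment in Fig. 2, in this embodiment the abnormality is triggered while service node 1 is executing a self-running task. The method includes the following steps:
301. When starting self-running task 1, service node 1 obtains the service feature information of the service to be accessed by self-running task 1, and determines, according to the service feature information of the service to be accessed and the service feature information of services to be controlled, to execute self-running task 1.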
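Whether the service to be accessed needs control is determined in the same way as in step 202. A self-running task may be a task executed to keep the service node running normally, for example a periodic data verification task or a periodic hardware status inspection task. In this embodiment, the service feature information may include a service object ID and a task ID, and sometimes also an operation address range. The task ID is the unique identifier of the task being executed.
Alternatively, the self-running task may be replaced with another management and maintenance task, for example a manually triggered configuration, control, or maintenance command. Other tasks are processed in the same way as a self-running task, and details are not repeated here.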
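302. Service node 1 triggers an abnormality while executing task 1 and obtains the service feature information of the abnormal task from the thread context of the thread executing task 1. The service feature information includes the service object ID and the task ID of the service that caused the abnormality.
For the process of obtaining the service feature information of the abnormal service, refer to the description of the embodiment corresponding to Fig. 2. This step uses Way 2; alternatively, Way 1 may be used, that is, the service feature information of the abnormal service is obtained from the service context of the service that the thread is executing.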
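303. Service node 1 reports the obtained service feature information to the control node.
304. The control node updates the locally stored abnormal service information according to the received service feature information of the abnormal service, and generates a control instruction according to the updated abnormal service information and a preconfigured control policy for managing service abnormalities. The control instruction includes the service feature information of the abnormal service.
Unlike step 206, in this step the control node locally stores abnormal service information and a control policy for managing service abnormalities. For the control policy, the control instruction, and the specific processing, refer to the embodiment in Fig. 2; details are not repeated here. Of course, depending on actual requirements, this step may also be the same as step 206: the control node generates a control instruction whenever it receives a service abnormality reported by a service node.
By the time service node 1 triggers an abnormality by running a task on its own, other service nodes may already have triggered abnormalities by running the same task and reported them to the control node. In other words, more than one service node in the distributed system may trigger an abnormality because of the same self-running task.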
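Optionally, in this step, the control node may also send prompt information to the client, prompting the client user to intervene.
305. The control node delivers the generated control instruction to the service nodes in the distributed system that have a service backup relationship with the abnormal service node.
Alternatively, in this step, the control node may send the obtained service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred. These include the service nodes that have a service backup relationship with the abnormal service node. Service node 2 is used below as an example to describe how a service node handles a received control instruction.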
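306. After receiving the control instruction, service node 2 records the service feature information in the control instruction as service feature information of a service to be controlled.
Optionally, when temporary control is determined according to the control policy, the control instruction also carries a control duration. In this case, on receiving the control instruction, service node 2 starts a timer and sets its duration to the control duration.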
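307. When starting self-running task 2, service node 2 obtains the service feature information of the service to be accessed by self-running task 2, and determines, according to the service feature information of the service to be accessed and the service feature information of services to be controlled, to refuse to process the service to be accessed.
Whether the service to be accessed needs control is determined in the same way as in step 202, except that in this step service node 2 finds the same service feature information in the recorded service feature information of services to be controlled. That is, the service accessed by self-running task 2 needs to be controlled. Of course, if service node 2 does not find the same service feature information in the recorded service feature information of services to be controlled, it can continue to execute self-running task 2. The specific execution of a self-running task is no different from the prior art and is not described again here.
Further, when a timer is set on service node 2, if the access to the service by self-running task 2 is determined before the timer expires, service node 2 determines that the service to be accessed needs to be controlled. If the access is determined after the timer expires, service node 2 applies no control; it also ends the timer and deletes the record corresponding to the service feature information of the service to be accessed from the service feature information of services to be controlled.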
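It can be understood that other service nodes in the distributed system that receive the control instruction handle it in the same way as service node 2; details are not repeated here. Unlike the embodiment in Fig. 2, in this embodiment the control node collects abnormalities that occur when service nodes execute management and maintenance tasks. When one or more service nodes become abnormal because they run the same self-running task, the control node sends a control instruction to the service nodes that have a service backup relationship with the abnormal service nodes. When the same self-running task starts, the service nodes that received the control instruction can refuse to execute it, which improves system stability.
It can be understood that the embodiments corresponding to Fig. 2 and Fig. 3 may further include the following step:
When a predetermined condition is met, the control node deletes, from the locally saved abnormal service information, the records related to the service feature information specified by the predetermined condition; correspondingly, each service node deletes, from the locally saved service feature information of services to be controlled, the records related to that service feature information. The predetermined condition can be set according to actual requirements, for example user intervention or recovery of the abnormal service. With this arrangement, control can be reset, which makes it more flexible and adaptable.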
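Unlike the networking shown in Fig. 1, some distributed systems, such as a SAN or NAS, do not deploy a separate control node. In a SAN or NAS network, an application server or file server is typically connected to multiple storage nodes through the network. When the application server needs to write data to or read data from a storage node, the application server or file server acts as the client that initiates the service access request, and the request is sent to the storage node through the network.
As shown in Fig. 4, an embodiment of the present invention provides a method for processing a service, applied to a distributed system in which no control node is deployed. The method includes the following steps: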
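401. Client 1 sends service access request 1 to service node 1. Service access request 1 includes the service feature information of the service to be accessed.
As described in step 201, the content of the service feature information differs between distributed systems. Taking a distributed storage service system as an example, the service feature information here may include a service object ID, an operation address range, and an operation code. The operation code may indicate a read operation, a write operation, a file system service, or another operation.
402. After receiving service access request 1, service node 1 determines, according to the service feature information carried in service access request 1 and the abnormal service information, to execute the service accessed by service access request 1.
The abnormal service information may be stored locally or on a device that the service node can access. As in the preceding embodiments, the abnormal service information may include the service feature information of services in which an abnormality has occurred, or that service feature information together with the corresponding number of abnormalities. The control policy can also be set according to actual needs.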
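In one possible implementation, the abnormal service information includes the service feature information of services in which an abnormality has occurred, and no service feature information identical to that carried in the access request is found in the locally recorded abnormal service information. That is, the service accessed by the request has not been abnormal during earlier access and therefore needs no control. Conversely, if service feature information identical to that carried in the access request can be found in the recorded abnormal service information, the service accessed by the request was abnormal during earlier access and needs to be controlled.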
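In another possible implementation, the abnormal service information includes the service feature information of services in which an abnormality has occurred and the corresponding number of abnormalities. The control policy is: when the number of abnormalities triggered by a service access request exceeds a preset threshold, control is needed; when it does not exceed the preset threshold, no control is applied. In this case, finding the same service feature information among the locally recorded service feature information of services that have triggered abnormalities does not by itself mean that the service accessed by the request needs to be controlled; the control policy must also be consulted. When service node 1 determines that the number of abnormalities triggered by the received service access request does not exceed the preset threshold, it determines to execute the service accessed by the request. The reason is that another service node may have become abnormal because of the same service access request and notified service node 1, so that service node 1 has stored the service feature information, but this does not mean that the request must be controlled. Conversely, if the number of abnormalities triggered by the received service access request exceeds the preset threshold, service node 1 determines to refuse to execute the service accessed by the request.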
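Without a control node, each service node keeps its own counts and applies the threshold itself. A minimal sketch, with the class name `PeerNode` and its methods assumed for illustration:

```python
class PeerNode:
    """A service node that decides on its own, from abnormality reports
    sent by its peers, whether to refuse a service."""

    def __init__(self, threshold):
        self.threshold = threshold  # preset threshold from the control policy
        self.abnormal = {}          # feature -> number of abnormalities

    def on_peer_report(self, feature):
        # Update the locally recorded abnormal service information.
        self.abnormal[feature] = self.abnormal.get(feature, 0) + 1

    def should_reject(self, feature):
        # A matching record alone is not enough: the count must exceed
        # the threshold before the service is refused.
        return self.abnormal.get(feature, 0) > self.threshold


node = PeerNode(threshold=1)
feature = ("obj-7", (0, 4096), "write")
node.on_peer_report(feature)
after_one = node.should_reject(feature)   # recorded, but within threshold
node.on_peer_report(feature)
after_two = node.should_reject(feature)   # threshold exceeded
```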
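403. Service node 1 invokes a thread to execute service access request 1. While the thread is executing service access request 1, an abnormality occurs in the thread.
404. Service node 1 obtains the service feature information of the abnormal service according to the ID of the thread. The service feature information includes the service object ID, the operation address range, and the operation code of the abnormal service.
For how the service feature information of the abnormal service is obtained, refer to the description in step 204; details are not repeated here.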
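405. Service node 1 sends the obtained service feature information of the abnormal service to the service nodes in the distributed system that have a service backup relationship with the abnormal service node.
Alternatively, in this step, service node 1 may send the determined service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred. These include the service nodes that have a service backup relationship with the abnormal service node.
Service node 2 is used below as an example to describe how a service node handles received service feature information of an abnormal service. Other service nodes that receive the service feature information of the abnormal service handle it in the same way as service node 2.
Optionally, in this step, service node 1 may also send prompt information to the client, prompting the client user to intervene.
406. Service node 2 updates the recorded abnormal service information according to the received service feature information of the abnormal service. The abnormal service information may be the same as in step 402 or in the other embodiments.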
407. Client 1 sends service access request 2, which arrives at service node 2.
Alternatively, service access request 2 in this step may be sent by another client.
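408. After receiving service access request 2, service node 2 determines, according to the service feature information carried in service access request 2 and the recorded abnormal service information, to refuse to execute the service accessed by service access request 2.
That is, service node 2 here determines that service access request 2 needs to be controlled. It can be understood that service nodes other than service node 1 may also have triggered abnormalities because of a service access request identical to service access request 2 and have notified service node 2. So when service node 2 receives service access request 2, either the abnormal service information recorded on service node 2 includes the service feature information in service access request 2, or service node 2 determines, from that service feature information and the corresponding number of abnormalities, that service access request 2 meets the condition for performing control in the control policy, and control is triggered. For the specific control policy, refer to the description of the above embodiments.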
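409. Service node 2 returns an access-denied response message to client 1.
In the solution of this embodiment, each service node collects information about abnormalities on the other service nodes in the system. When one or more service nodes become abnormal because of an external service access request, the other service nodes that have not become abnormal control subsequently received service access requests according to the control policy configured on them. Compared with the prior art, the embodiments of the present invention therefore offer better stability.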
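As shown in Fig. 5, an embodiment of the present invention further provides a method for processing a service access request, applied to a distributed system in which no control node is deployed. Unlike the embodiment in Fig. 4, in this embodiment the abnormality is triggered while service node 1 is executing a self-running task. The method includes the following steps:
501. When starting self-running task 1, service node 1 obtains the service feature information of the service to be accessed by self-running task 1, and determines, according to the service feature information of the service to be accessed and the abnormal service information, to execute the self-running task. The service feature information of the service to be accessed includes a service object ID and a task ID.
The self-running task and the service feature information are the same as described in step 301. The self-running task may also be replaced with another management and maintenance task; see the description in step 301. For the abnormal service information and the process of deciding whether to execute the management and maintenance task, refer to step 402 and the related descriptions in the other embodiments.
502. Service node 1 triggers an abnormality while executing task 1 and obtains the service feature information of the abnormal task from the thread context of the thread executing task 1. The service feature information includes the service object ID and the task ID of the service that caused the abnormality.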
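503. Service node 1 sends the obtained service feature information of the abnormal service to the service nodes in the distributed system that have a service backup relationship with the abnormal service node.
Alternatively, in this step, service node 1 may send the determined service feature information of the abnormal service to the service nodes in the distributed system in which no abnormality has occurred. These include the service nodes that have a service backup relationship with the abnormal service node.
Service node 2 is used below as an example to describe how a service node handles received service feature information of an abnormal service. Other service nodes that receive the service feature information of the abnormal service handle it in the same way as service node 2.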
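504. Service node 2 updates the abnormal service information according to the received service feature information of the abnormal service.
The abnormal service information may be the same as described in step 402 or in the other embodiments; details are not repeated here.
505. When starting self-running task 2, service node 2 obtains the service feature information of the service to be accessed by task 2, and determines, according to the service feature information of the service to be accessed and the recorded abnormal service information, to refuse to process the service to be accessed.
The abnormal service information and the possible implementations are as described in step 402; details are not repeated here.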
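Clearly, the embodiments in Fig. 4 and Fig. 5 may also set control policies according to actual conditions, with reference to the embodiments corresponding to Fig. 2 and Fig. 3. For example, temporary control may be configured for the case where the number of abnormalities triggered by a service access request is greater than 1 but does not exceed the threshold. That is, in steps 406 and 504, when service node 2 receives the service feature information of an abnormal service and updates the recorded abnormal service information, it also checks whether the timer corresponding to that service feature information has been started. If it has, the timer is reset to zero and timing restarts; if not, the timer is started and its duration, that is, the control duration, is set.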
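The start-or-restart behavior of the per-feature timer can be sketched with deadlines instead of real timers. The class `AbnormalTimers`, its method names, and the injectable clock are assumptions for illustration only.

```python
import time


class AbnormalTimers:
    """One control timer per piece of service feature information."""

    def __init__(self, duration, clock=time.monotonic):
        self.duration = duration  # control duration in seconds
        self._clock = clock       # injectable for testing
        self._deadline = {}       # feature -> time at which control ends

    def on_abnormal_report(self, feature):
        """Start the timer for this feature, or restart it if running."""
        now = self._clock()
        running = feature in self._deadline and now < self._deadline[feature]
        self._deadline[feature] = now + self.duration  # (re)start from zero
        return "restarted" if running else "started"

    def is_controlled(self, feature):
        deadline = self._deadline.get(feature)
        return deadline is not None and self._clock() < deadline


now = [0.0]  # fake clock for the example
timers = AbnormalTimers(duration=10, clock=lambda: now[0])
first = timers.on_abnormal_report("feature-A")   # timer not yet started
now[0] = 8.0
second = timers.on_abnormal_report("feature-A")  # still running: restart
now[0] = 15.0
still_controlled = timers.is_controlled("feature-A")  # new deadline is 18
now[0] = 18.0
expired = not timers.is_controlled("feature-A")
```

Because a repeated report restarts the timer, a service that keeps failing stays under control for as long as the failures continue, and control lapses on its own once reports stop.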
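In step 408 of the above embodiment, the original outcome is to refuse to execute the service accessed by service access request 2. When a timer is set on service node 2, however, whether the timer has expired must also be considered. That is, the service feature information in service access request 2, the recorded abnormal service information, and the timer setting are considered together to decide whether to execute the service accessed by service access request 2. If service access request 2 is received before the timer expires, service node 2 determines that the request needs to be controlled. If it is received after the timer expires, service node 2 applies no control and may delete the recorded service feature information of the service to be controlled.
Further, when a timer is set on service node 2, whether the timer has expired must also be considered in step 505. If the access to the service by self-running task 2 is determined before the timer expires, service node 2 determines that the service to be accessed needs to be controlled. If the access is determined after the timer expires, service node 2 applies no control and may delete the recorded service feature information of the service to be controlled.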
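It can be understood that other service nodes in the distributed system that receive the service feature information of the abnormal service handle it in the same way as service node 2; details are not repeated here.
Unlike the embodiment in Fig. 4, in this embodiment, when a service node becomes abnormal because it executes a self-running task, it sends the service feature information of the abnormal service to the service nodes that have a service backup relationship with it. A service node that receives the service feature information of the abnormal service updates its local abnormal service information. When the same self-running task starts, the service nodes in the distributed system refuse to execute it according to the abnormal service information, which improves system stability.
It can be understood that the embodiments corresponding to Fig. 4 and Fig. 5 may further include the following step:
When a predetermined condition is met, each service node deletes, from the locally saved service feature information of services to be controlled, the records related to the service feature information specified by the predetermined condition. The predetermined condition can be set according to actual requirements, for example user intervention in a service or recovery of certain abnormal services. With this arrangement, control can be reset, which makes service control more flexible and adaptable.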
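The method provided in the embodiments of the present invention has been described in detail above with reference to Figs. 1 to 5, from the perspective of interaction between the devices. The distributed system involved in the above embodiments is described below with reference to Fig. 6. Fig. 6 shows a possible schematic structure of the distributed system, which includes a control node 601 and at least two service nodes (service node 602 and service node 603 are used as examples). Service node 603 is a service node that has a service backup relationship with service node 602. The control node 601 is configured to receive service feature information, reported by service node 602, of a service in which an abnormality occurred, generate a control instruction according to the service feature information, and send it to service node 603. The control instruction includes the service feature information and instructs service node 603 to refuse to process the service characterized by the service feature information. Service node 603 is configured to receive the control instruction, update the locally stored service feature information of services to be controlled according to the service feature information in the control instruction, obtain the service feature information of a service to be accessed before accessing it, and refuse to process the service to be accessed according to that information and the updated service feature information of services to be controlled. It can be understood that there may be multiple service nodes that have a service backup relationship with service node 602; that is, there may be other such service nodes besides service node 603.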
在一个示例中,业务节点603,用于从接收到的业务访问请求中获得所述待访问业务的特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,根据管理维护任务获得所述管理维护任务所访问的待访问业务的业务特征信息。
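In the above embodiment, service node 602 may send the service feature information of the abnormal service to control node 601 in two possible ways. In the first, service node 602 is configured to start a management and maintenance task, determine to execute the task according to the locally stored abnormal service information, and, when an abnormality occurs during execution of the task, obtain the service feature information of the abnormal service according to the ID of the thread executing the task and send the obtained service feature information to control node 601. In the second, service node 602 is configured to receive a service access request sent by the control node, determine to execute the request according to the locally stored service feature information of services to be controlled, and, when a service abnormality is triggered during execution of the request, obtain the service feature information of the service access request according to the ID of the thread executing the request and send the obtained service feature information to control node 601.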
一种可能的实现中,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息具体包括:根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息。
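In another possible implementation, obtaining the service feature information of the service access request according to the ID of the thread executing the request specifically includes: obtaining the thread context of the thread according to the thread's ID and obtaining the service feature information of the abnormal service from the thread context; or obtaining the service context of the service access request according to the thread's ID and obtaining the service feature information of the abnormal service from the service context. For terms and implementation details not defined in this embodiment, refer to the method embodiments of Fig. 2 and Fig. 3.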
除了上述分布式系统,本发明实施例还提供了一种没有部署控制节点的分布式系统。这种分布式系统包括第一业务节点和第二业务节点,且第二业务节点为与所述第一业务节点有业务备份关系的业务节点。其中,第一业务节点用于接收第二业务节点发送的发生异常的业务的业务特征信息,根据接收到的业务特征信息更新本地存储的异常业务信息,在访问业务之前获得待访问业务的业务特征信息,并根据所述待访问业务的业务特征信息以及所述更新后的异常业务信息确定拒绝处理所述待访问业务。其中,所述异常业务信息包括发生过异常的业务的业务特征信息。第二业务节点,用于获得本节点上发生异常的业务的业务特征信息,将所述发生异常的业务的业务特征信息发送给第一业务节点。
一种可实现的方式中,第二业务节点,用于启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的异常业务信息确定执行所述管理维护任务,在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。
在另外一种可实现的方式中,第二业务节点,用于接收客户端发送的业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息,根据所述待访问业务的业务特征信息和本地存储的异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。
可选的,上述的实现中,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息具体包括:根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息。
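Optionally, in the above implementation, obtaining the service feature information of the service access request according to the ID of the thread executing the request specifically includes: obtaining the thread context of the thread according to the thread's ID and obtaining the service feature information of the abnormal service from the thread context; or obtaining the service context of the service access request according to the thread's ID and obtaining the service feature information of the abnormal service from the service context. For terms and implementation details not defined in this embodiment, refer to the method embodiments of Fig. 4 and Fig. 5.
It can be understood that, to implement the above functions, the devices in the distributed system, for example the service node, the first service node, and the second service node, include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.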
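Fig. 7 shows a possible schematic structure of the service node involved in this application. The service node can implement the functions of service node 1 and/or service node 2 in the method embodiments of Fig. 4 and Fig. 5. For terms and implementation details not defined in this embodiment, refer to the method embodiments of Fig. 4 and Fig. 5. As shown in Fig. 7, the service node may include a receiving unit 701, an updating unit 702, an obtaining unit 703, and a processing unit 704. The receiving unit 701 is configured to receive the service feature information of a service in which an abnormality occurred. The updating unit 702 is configured to update the abnormal service information according to the received service feature information, where the abnormal service information includes the service feature information of services in which an abnormality occurred. The obtaining unit 703 is configured to obtain, before a service is accessed, the service feature information of a first service to be accessed. The processing unit 704 is configured to refuse to process the first service to be accessed according to its service feature information and the recorded abnormal service information.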
一种可能的实现中,获取单元703用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,获取单元703用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
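In a possible implementation, the service node further includes a sending unit 705. The processing unit 704 is configured to start a management and maintenance task and determine, according to the service feature information of a second service to be accessed by the task and the recorded service feature information, to execute the management and maintenance task; and, when an abnormality occurs during execution of the task, to obtain the thread context of the thread executing the task according to the thread's ID and obtain the service feature information of the abnormal service from the thread context, or to obtain, according to the thread's ID, the service context of the service that the thread is executing and obtain the service feature information of the abnormal service from the service context. The sending unit 705 is configured to send out the obtained service feature information.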
一种可能的实现中,业务节点还包括有发送单元705,其中,接收单元701还用于接收业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息;处理单元704,还用于根据所述待访问业务的业务特征信息和记录的异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元705,用于将所述获得的业务特征信息发送出去。
可选的,在上述的任意一种实现中,还可以包括存储单元706,存储单元706用于存储所述异常业务特征信息,处理单元704,用于在所述异常业务信息中找到与所述第一待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述第一待访问业务。
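Optionally, any of the above implementations may further include a storage unit 706, configured to store the abnormal service feature information and a control policy for managing abnormal services. The abnormal service feature information further includes the number of abnormalities corresponding to the recorded service feature information. In this embodiment, the processing unit 704 is configured to find, in the abnormal service feature information, service feature information identical to that of the first service to be accessed together with the corresponding number of abnormalities, and to refuse to process the first service to be accessed when it determines that the found service feature information and the corresponding number of abnormalities meet the condition for performing control in the control policy.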
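Referring to Fig. 7, the service node involved in the method embodiments of Fig. 2 and Fig. 3 may also include a receiving unit, an updating unit, an obtaining unit, and a processing unit. The receiving unit is configured to receive a control instruction sent by the control node, where the control instruction is generated by the control node according to service feature information, reported by another service node, of a service in which an abnormality occurred, and the control instruction contains the service feature information of the service to be controlled. The updating unit is configured to update the locally stored service feature information of services to be controlled according to the received service feature information. The obtaining unit is configured to obtain, before a service is accessed, the service feature information of a first service to be accessed. The processing unit is configured to determine, according to the service feature information of the first service to be accessed and the updated service feature information of services to be controlled, to refuse to process the first service to be accessed.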
在一种可能的实现中,所述获取单元用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,所述获取单元用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
在一种可能的实现中,所述业务节点包括发送单元,其中,所述处理单元,还用于启动管理维护任务,根据管理维护任务所访问的第二待访问业务的业务特征信息以及业务特征信息确定执行所述管理维护任务,并在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
一种可能的实现中,上述的业务节点还包括发送单元,其中,所述接收单元,还用于接收业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息;所述处理单元,还用于根据所述待访问业务的业务特征信息和异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
可选的,在上述的任意一种实现中,所述处理单元,用于在所述需管控的业务的业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
Note that, for terms and implementation details not defined in this embodiment, refer to the method embodiments of Fig. 2 and Fig. 3.
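Fig. 8 shows a possible schematic structure of the control node involved in the above embodiments. As shown in Fig. 8, the control node includes a receiving unit 801, an instruction generating unit 802, and a sending unit 803. The receiving unit 801 is configured to receive service feature information, reported by a first service node in the distributed system, of a service in which an abnormality occurred. The instruction generating unit 802 is configured to generate a control instruction according to the service feature information, where the control instruction includes the service feature information. The sending unit 803 is configured to send the control instruction to a second service node in the distributed system, to instruct the second service node to refuse to process the service characterized by the service feature information. The second service node is a service node that has a service backup relationship with the first service node.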
在一种具体的实现中,所述控制节点还包括了存储单元804。存储单元804上存储有异常业务信息以及用于管理异常业务的管控策略,所述异常业务信息包括发生过异常的业务的业务特征信息以及发生异常的次数。其中,指令生成单元802,用于根据接收到的所述业务特征信息更新所述异常业务信息,在确定所述异常业务信息中的业务特征信息及对应的发生异常的次数符合所述管控策略中执行管控的条件时,生成所述管控指令。
另外,本发明上述实施例中所涉及的业务节点及控制节点都可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于存储器中,比如,随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read Only Memory,ROM)、可擦除可编程只读存储器(Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于业务节点中。当然,处理器和存储介质也可以作为分立组件存在于业务节点中。
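Referring to Fig. 9, the service node includes a processor 901, a memory 902, a communication interface 903, and a bus 904. The processor 901, the memory 902, and the communication interface 903 are connected to one another through the bus 904. The bus 904 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 9, but this does not mean there is only one bus or one type of bus. The communication interface 903 is used to communicate with external devices and with the processor 901. The memory 902 stores computer-executable instructions; when the processor 901 executes the instructions in the memory 902, the functions of the service node in the above method embodiments are performed. The control node may likewise include the processor 901, the memory 902, the communication interface 903, and the bus 904, except that the instructions stored in the memory differ; when the processor 901 executes the instructions in the memory 902, the functions of the control node in the above method embodiments are performed.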
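Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer. The functions of the above embodiments may also be implemented by a computer program product that includes instructions; when the program product is executed by a computer, the computer performs some or all of the steps in the above method embodiments.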
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的保护范围。
Claims (32)
- 一种处理业务的方法,应用于分布式系统中,该分布式系统包括控制节点和至少两个业务节点,所述至少两个业务节点包括第一业务节点和第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点,其特征在于,所述的方法包括:所述控制节点接收所述第一业务节点上报的发生异常的业务的业务特征信息;所述控制节点根据所述业务特征信息生成管控指令,所述管控指令中包括所述业务特征信息;所述控制节点将所述管控指令发送给所述第二业务节点,用于指示所述第二业务节点拒绝处理所述业务特征信息所表征的业务。
- 如权利要求1所述的方法,其特征在于,所述控制节点上存储有异常业务信息以及用于管理异常业务的管控策略,所述异常业务信息包括发生过异常的业务的业务特征信息以及发生异常的次数,所述生成管控指令包括:所述控制节点根据接收到的所述业务特征信息更新所述异常业务信息;所述控制节点确定更新后的异常业务信息中的所述业务特征信息及对应的发生异常的次数符合所述管控策略中执行管控的条件时,生成所述管控指令。
- 如权利要求1或2所述的方法,其特征在于,所述方法还包括:所述第二业务节点接收所述管控指令,根据所述管控指令中的所述业务特征信息更新本地存储的需管控的业务的业务特征信息;所述第二业务节点访问业务之前获得待访问业务的业务特征信息;所述第二业务节点根据获得的所述待访问业务的业务特征信息以及更新后的需管控的业务的业务特征信息确定拒绝处理所述待访问业务。
- 如权利要求3所述的方法,其特征在于,所述第二业务节点访问业务之前获得待访问业务的业务特征信息包括:所述第二业务节点从接收到的业务访问请求中获得所述待访问业务的业务特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,所述第二业务节点根据管理维护任务获得所述待访问业务的业务特征信息。
- 如权利要求1或2所述的方法,其特征在于,所述控制节点接收发生异常 的业务的业务特征信息之前还包括:所述第一业务节点接收所述控制节点发送的业务访问请求,根据所述业务访问请求对应的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述业务访问请求;执行所述业务访问请求时触发业务异常,根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务特征信息,将获得的所述业务特征信息发送给所述控制节点。
- 如权利要求1或2所述的方法,其特征在于,所述控制节点接收发生异常的业务的业务特征信息之前还包括:所述第一业务节点启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述管理维护任务;所述第一业务节点执行所述管理维护任务时触发异常,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将获得的所述业务特征信息发送给所述控制节点。
- 一种处理业务的方法,该方法应用于分布式系统中,该分布式系统包括控制节点和至少两个业务节点,所述至少两个业务节点包括第一业务节点和第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点,其特征在于,所述的方法包括:所述第二业务节点接收所述控制节点发送的管控指令,根据接收到的业务特征信息更新本地存储的需管控的业务的业务特征信息,所述管控指令是由所述控制节点根据所述第一业务节点上报的发生异常的业务的业务特征信息生成的,且,所述管控指令包含需管控的业务的业务特征信息;所述第二业务节点访问业务之前获得待访问业务的业务特征信息;所述第二业务节点根据所述待访问业务的业务特征信息以及所述更新后的需管控的业务的业务特征信息确定拒绝处理所述待访问业务。
- 如权利要求7所述的方法,其特征在于,所述第二业务节点根据所述待访问业务的业务特征信息以及所述更新后的需管控的业务的业务特征信息确定拒绝处理所述待访问业务包括:所述第二业务节点在所述需管控的业务的业务特征信息中找到与所述待访问 业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
- 如权利要求7所述的方法,其特征在于,所述第二业务节点访问业务之前获得待访问业务的业务特征信息包括:所述第二业务节点从接收到的业务访问请求中获得所述待访问业务的业务特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,所述第二业务节点根据管理维护任务获得所述待访问业务的业务特征信息。
- 如权利要求7所述的方法,其特征在于,所述方法还包括所述第二业务节点启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述管理维护任务;所述第二业务节点在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
- 如权利要求7所述的方法,其特征在于,所述方法还包括:所述第二业务节点接收客户端发送的业务访问请求,根据所述业务访问请求对应的待访问业务的业务特征信息以及本地存储的需管控的业务的业务特征信息确定执行所述业务访问请求;所述第二业务节点在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
- 一种处理业务的方法,该方法应用于分布式系统中,该分布式系统包括第一业务节点,其特征在于,所述的方法包括:所述第一业务节点接收其他业务节点发生异常的业务的业务特征信息,根据接收到的业务特征信息更新本地存储的异常业务信息,所述异常业务信息包括发生过异常的业务的业务特征信息;所述第一业务节点访问业务之前获得待访问业务的业务特征信息;所述第一业务节点根据所述待访问业务的业务特征信息以及所述更新后的异常业务信息确定拒绝处理所述待访问业务。
- 如权利要求12所述的方法,其特征在于,所述第一业务节点上存储有用于管理异常业务的管控策略,且所述异常业务特征信息还包括与所述发生过异常的业务的业务特征信息对应的发生异常的次数,所述第一业务节点根据所述待访问业务的业务特征信息以及所述异常业务信息拒绝处理所述待访问业务包括:所述第一业务节点在所述异常业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息及对应的发生异常的次数;所述第一业务节点确定找到的业务特征信息及对应的发生异常的次数符合所述管控策略中执行管控的条件时,拒绝处理所述待访问业务。
- 如权利要求12所述的方法,其特征在于,所述第一业务节点访问业务之前获得待访问业务的业务特征信息包括:所述第一业务节点从接收到的业务访问请求中获得所述待访问业务的业务特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,所述第一业务节点根据管理维护任务获得所述管理维护任务所访问的所述待访问的业务的业务特征信息。
- 如权利要求12-14任意一项所述的方法,所述分布式系统还包括第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点,其特征在于,所述第一业务节点接收其他业务节点发生异常的业务的业务特征信息之前包括:所述第二业务节点启动管理维护任务,根据所述管理维护任务所访问的待访问业务的业务特征信息以及本地存储的异常业务信息确定执行所述管理维护任务;所述第二业务节点在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。
- 如权利要求12-14任一项所述的方法,其特征在于,所述分布式系统还包括第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点,所述第一业务节点接收其他业务节点发生异常的业务的业务特征信息之前还包括:所述第二业务节点接收客户端发送的业务访问请求,根据所述业务访问请求对 应的待访问业务的业务特征信息以及本地存储的异常业务信息确定执行所述业务访问请求;所述第二业务节点在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述第一业务节点。
- 一种业务节点,所述业务节点包括:接收单元,用于接收业务控制节点发送的管控指令,根据接收到的业务特征信息更新本地存储的需管控的业务的业务特征信息,所述管控指令是由所述控制节点根据其他业务节点上报的发生异常的业务的业务特征信息生成的,且,所述管控指令包含需管控的业务的业务特征信息;更新单元,用于根据接收到的业务特征信息更新本地存储的需管控的业务的业务特征信息;获取单元,还用于在访问业务之前获得第一待访问业务的业务特征信息;处理单元,用于根据所述第一待访问业务的业务特征信息以及更新后的需管控的业务的业务特征信息确定拒绝处理所述待访问业务。
- 如权利要求17所述的业务节点,其特征在于,所述获取单元用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,所述获取单元用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
- 如权利要求17所述的业务节点,其特征在于,所述业务节点包括发送单元,其中,所述处理单元,还用于启动管理维护任务,根据管理维护任务所访问的第二待访问业务的业务特征信息以及业务特征信息确定执行所述管理维护任务,并在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
- 如权利要求17所述的业务节点,其特征在于,还包括发送单元,其中,所述接收单元,还用于接收业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息;所述处理单元,还用于根据所述待访问业务的业务特征信息和异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
- 如权利要求17-20任意一项所述的业务节点,其特征在于,所述处理单元,用于在所述需管控的业务的业务特征信息中找到与所述待访问业务的业务特征信息相同的业务特征信息时,拒绝处理所述待访问业务。
- 一种业务节点,所述业务节点包括:接收单元,用于接收其他业务节点发生异常的业务的业务特征信息;更新单元,用于根据接收到的业务特征信息更新本地存储的异常业务信息,所述异常业务信息包括发生过异常的业务的业务特征信息;获取单元,还用于在访问业务之前获得第一待访问业务的业务特征信息;处理单元,用于根据所述第一待访问业务的业务特征信息以及更新后的异常业务信息确定拒绝处理所述待访问业务。
- 如权利要求22所述的业务节点,其特征在于,所述获取单元用于在访问业务之前从接收到的第一业务访问请求中获得所述第一待访问业务的特征信息,所述第一业务访问请求中包括所述第一待访问业务的业务特征信息;或者,所述获取单元用于根据管理维护任务获得所述第一待访问业务的业务特征信息。
- 如权利要求22所述的业务节点,其特征在于,所述业务节点包括发送单元,其中,所述处理单元,还用于启动管理维护任务,根据管理维护任务所访问的第二待访问业务的业务特征信息以及业务特征信息确定执行所述管理维护任务,并在 执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述管理维护任务的线程的ID获得所述线程正在执行的业务的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
- 如权利要求22所述的业务节点,其特征在于,还包括发送单元,其中,所述接收单元,还用于接收业务访问请求,所述业务访问请求中包括待访问业务的业务特征信息;所述处理单元,还用于根据所述待访问业务的业务特征信息和异常业务信息确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述线程的线程上下文,从所述线程上下文中获得所述发生异常的业务的业务特征信息,或根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务上下文,从所述业务上下文中获得所述发生异常的业务的业务特征信息;所述发送单元,用于将所述获得的业务特征信息发送出去。
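- The service node according to any one of claims 22 to 25, wherein the service node further includes a storage unit, configured to record the abnormal service feature information and to store a control policy for managing abnormal services, the abnormal service feature information further including the number of abnormalities corresponding to the service feature information of the services in which an abnormality has occurred; and the processing unit is configured to find, in the abnormal service feature information, service feature information identical to that of the first service to be accessed together with the corresponding number of abnormalities, and to refuse to process the first service to be accessed when it determines that the found service feature information and the corresponding number of abnormalities meet the condition for performing control in the control policy.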
- 一种分布式系统,其特征在于,所述的分布式系统包括控制节点和至少两个业务节点,所述至少两个业务节点包括第一业务节点和第二业务节点,所述第二业务节点为与所述第一业务节点有业务备份关系的业务节点,其特征在于,所述的控制节点用于接收第一业务节点上报的发生异常的业务的业务特征信息,根据所述业务特征信息生成管控指令并发送给所述第二业务节点,所述管控指令中包括所述业务特征信息,所述管控指令用于指示所述第二业务节点拒绝处理所述业务特征信息所表征的业务;所述第二业务节点用于接收所述管控指令,根据所述管控指令中的业务特征信息更新本地存储的需管控的业务的业务特征信息,并在访问业务之前获得待访问业务的业务特征信息,根据所述待访问业务的业务特征信息以及更新后的需管控的业务的业务特征信息拒绝处理所述待访问业务。
- 如权利要求27所述的分布式系统,其特征在于,所述第二业务节点,用于从接收到的业务访问请求中获得所述待访问业务的特征信息,所述业务访问请求中包括所述待访问业务的业务特征信息;或者,根据管理维护任务获得所述管理维护任务所访问的待访问业务的业务特征信息。
- 如权利要求27或28所述的分布式系统,其特征在于,所述第一业务节点,用于启动管理维护任务,根据本地存储的异常业务信息确定执行所述管理维护任务,并在执行所述管理维护任务中发生异常时,根据执行所述管理维护任务的线程的ID获得所述发生异常的业务的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
- 如权利要求27或28所述的分布式系统,其特征在于,所述第一业务节点,用于接收所述控制节点发送的业务访问请求,根据本地存储的需管控的业务的确定执行所述业务访问请求,并在执行所述业务访问请求中触发业务异常时,根据执行所述业务访问请求的线程的ID获得所述业务访问请求的业务特征信息,将所述获得的业务特征信息发送给所述控制节点。
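- A control node, including a communication interface, a processor, and a memory connected through a bus, wherein the communication interface is used to communicate with external devices and with the processor, the memory stores instructions, and the processor executes the instructions in the memory to perform the method according to claim 1 or 2.
- A service node, including a communication interface, a processor, and a memory connected through a bus, wherein the communication interface is used to communicate with external devices and with the processor, the memory stores instructions, and the processor executes the instructions in the memory to perform the method according to any one of claims 7 to 16.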
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680003721.5A CN108377670A (zh) | 2016-11-28 | 2016-11-28 | 一种处理业务的方法、业务节点、控制节点和分布式系统 |
PCT/CN2016/107504 WO2018094739A1 (zh) | 2016-11-28 | 2016-11-28 | 一种处理业务的方法、业务节点、控制节点和分布式系统 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/107504 WO2018094739A1 (zh) | 2016-11-28 | 2016-11-28 | 一种处理业务的方法、业务节点、控制节点和分布式系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018094739A1 true WO2018094739A1 (zh) | 2018-05-31 |
Family
ID=62194626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/107504 WO2018094739A1 (zh) | 2016-11-28 | 2016-11-28 | 一种处理业务的方法、业务节点、控制节点和分布式系统 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108377670A (zh) |
WO (1) | WO2018094739A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115589307A (zh) * | 2022-09-07 | 2023-01-10 | 支付宝(杭州)信息技术有限公司 | 分布式系统的风险监测方法和装置 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
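CN1898923A (zh) * | 2004-10-28 | 2007-01-17 | Nippon Telegraph and Telephone Corporation | Denial-of-service attack detection system and denial-of-service attack detection method |
CN103685459A (zh) * | 2012-09-24 | 2014-03-26 | NEC Corporation | Distributed system, server computer, distributed management server, and failure prevention method |
CN106130786A (zh) * | 2016-07-26 | 2016-11-16 | Tencent Technology (Shenzhen) Co., Ltd. | Network fault detection method and apparatus |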
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267345B (zh) * | 2008-03-10 | 2010-12-08 | ZTE Corporation | Service node backup method and distributed system |
US9442791B2 (en) * | 2014-11-07 | 2016-09-13 | International Business Machines Corporation | Building an intelligent, scalable system dump facility |
2016
- 2016-11-28 CN CN201680003721.5A patent/CN108377670A/zh active Pending
- 2016-11-28 WO PCT/CN2016/107504 patent/WO2018094739A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108377670A (zh) | 2018-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110071821B (zh) | | Method, node, and storage medium for determining the state of a transaction log |
US10020980B2 | | Arbitration processing method after cluster brain split, quorum storage apparatus, and system |
JP6141189B2 (ja) | | Providing transparent failover in a file system |
US8533525B2 | | Data management apparatus, monitoring apparatus, replica apparatus, cluster system, control method and computer-readable medium |
US11330071B2 | | Inter-process communication fault detection and recovery system |
US20150339200A1 | | Intelligent disaster recovery |
JP2007279890A (ja) | | Backup system and backup method |
WO2016177130A1 (zh) | | Method and apparatus for selecting a communication node |
CN110888889A (zh) | | Data information update method, apparatus, and device |
US10558547B2 | | Methods for proactive prediction of disk failure in a RAID group and devices thereof |
CN110602136B (zh) | | Cluster access method and related products |
CN110069365B (zh) | | Method for managing a database, corresponding apparatus, and computer-readable storage medium |
CN112948128A (zh) | | Target-side selection method, system, and computer-readable medium |
US20120124221A1 | | Element terminal and communication system |
US20220138036A1 | | Safely recovering workloads within a finite timeframe from unhealthy cluster nodes |
CN106452836A (zh) | | Master node setting method and apparatus |
US9092396B2 | | Standby system device, a control method, and a program thereof |
CN113821168A (zh) | | Shared storage migration system and method, electronic device, and storage medium |
CN114296909B (zh) | | Method and system for automatic node scale-out and scale-in based on Kubernetes events |
CN111342986B (zh) | | Distributed node management method and apparatus, distributed system, and storage medium |
CN108200151B (zh) | | iSCSI Target load balancing method and apparatus in a distributed storage system |
WO2018094739A1 (zh) | | Service processing method, service node, control node, and distributed system |
CN109474694A (zh) | | Management and control method and apparatus for a NAS cluster based on a SAN storage array |
WO2017092539A1 (zh) | | Virtual machine repair method, virtual machine apparatus, system, and service function network element |
US10348675B1 | | Distributed management of a storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16922537 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 16922537 Country of ref document: EP Kind code of ref document: A1 |