CN110941528B

CN110941528B - Log buried point setting method, device and system based on fault

Info

Publication number: CN110941528B
Application number: CN201911085865.1A
Authority: CN
Inventors: 魏亚文
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2022-04-08
Anticipated expiration: 2039-11-08
Also published as: CN110941528A

Abstract

The embodiments of this specification disclose a fault-based log burial point setting method, device and system. The method comprises: receiving service fault information, wherein the service fault information comprises a service identifier and a fault code; acquiring one or more sub fault codes matched with the service identifier and the fault code in the service fault information according to a preset correspondence among the service identifier, the fault code and the sub fault codes; acquiring, from the one or more sub fault codes, a sub fault code for which no corresponding log burial point is set on the current service system, and acquiring log burial point description information corresponding to that sub fault code, wherein the log burial point description information comprises burial point coordinates; acquiring the program source code running online on the current service system, and dynamically adding a corresponding log burial point code segment at the corresponding position of the program source code according to the burial point coordinates; and dynamically loading the program source code to which the log burial point code segment has been added to the service system.

Description

Log buried point setting method, device and system based on fault

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a method, a device and a system for setting a log burial point based on a fault.

Background

In the case of a large number of customers of a business system, even a minor failure of the business system may instantaneously affect tens of thousands of customers. Therefore, in order to avoid the influence on the client due to the failure of the service system, the service system needs to be capable of quickly realizing self-healing when the failure occurs.

In order to enable the service system to quickly realize self-healing when a fault occurs, the fault needs to be located first, and in order to locate the fault, a log is required to be buried for the fault which may occur so as to collect information related to the fault from the service system.

There is therefore a need for a faster or more reliable solution for setting log burial points for faults that may occur in a service system.

Disclosure of Invention

An embodiment of the present specification provides a fault-based log burial point setting method, including:

receiving service failure information, wherein the service failure information comprises: a service identifier and a fault code;

acquiring one or more sub fault codes matched with the service identifier and the fault code in the service fault information according to a preset corresponding relation among the service identifier, the fault code and the sub fault code;

acquiring a sub fault code which is not provided with a corresponding log burying point on the current service platform from the one or more sub fault codes, and acquiring log burying point description information corresponding to the sub fault code which is not provided with the corresponding log burying point, wherein the log burying point description information comprises: burying point coordinates;

acquiring the program source code of each service system running online on the current service platform, and dynamically adding corresponding log burial point code segments at corresponding positions of the program source code according to the burial point coordinates in the log burial point description information;

and dynamically loading the program source code added with the log burial point code segment to the current service platform.

An embodiment of the present specification further provides a fault-based log burial point setting device, including:

a receiving unit, configured to receive service failure information, where the service failure information includes: a service identifier and a fault code;

a first obtaining unit, configured to obtain one or more sub fault codes matched with the service identifier and the fault code in the service fault information according to a preset correspondence among the service identifier, the fault code and the sub fault code;

a second obtaining unit, configured to obtain, from the one or more sub fault codes, a sub fault code that is not provided with a corresponding log burying point on the current service platform, and obtain log burying point description information corresponding to the sub fault code that is not provided with the corresponding log burying point, where the log burying point description information includes: burying point coordinates;

a third obtaining unit, configured to obtain the program source code of each service system running online on the current service platform;

an embedding unit, configured to dynamically add corresponding log burial point code segments at corresponding positions of the program source code according to the burial point coordinates in the log burial point description information;

and a dynamic loading unit, configured to dynamically load the program source code to which the log burial point code segments have been added to the current service platform.

An embodiment of the present specification further provides a computing device, including:

at least one processor; and

a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method described above.

Embodiments of the present specification also provide a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the above-described method.

An embodiment of the present specification further provides a service self-healing system, including: a service module, a monitoring module, a positioning module, a decision module, a data module, a plan module and an execution plan module, wherein,

the service module runs each service system on the service platform, executes the function of the service system and provides service for users;

the monitoring module is used for collecting and analyzing the logs generated by the service module, generating fault alarm information and sending the fault alarm information to the positioning module when abnormal logs are found;

the positioning module receives the fault alarm information, matches the fault alarm information against preset fault model data to obtain the service fault information corresponding to the fault alarm information, and sends the service fault information to the execution plan module and the decision module, wherein the service fault information comprises: a service identifier and a fault code;

the execution plan module receives the service fault information and sets a log burial point for the service system run by the service module according to the method of any one of claims 1 to 6;

the decision module determines whether the service system is self-healing or not according to the fault code in the service fault information and the current values of a plurality of preset monitoring indexes, and selects a self-healing plan from preset plans when the service system is not self-healing;

the plan module executes the flow code corresponding to the plan selected by the decision module, and performs emergency processing on the service in the service system, wherein the emergency processing includes at least one of the following: product degradation, service degradation, and service offline;

the data module stores basic data in the service self-healing system, wherein the basic data comprises: the fault model data, the current values of the plurality of monitoring indexes, the preset plan and the emergency treatment executed by the plan module.

The embodiments of this specification adopt the following technical scheme. When service fault information including a fault code is received, the minimum fault code matched with the fault code is obtained first. If no log burial point matched with the obtained minimum fault code is set on the current service platform, the log burial point description information corresponding to the minimum fault code is obtained. The program source code of each service system running online on the current service platform is then obtained, and the corresponding log burial points are dynamically added to the program source code according to the log burial point description information. Finally, the program source code to which the log burial points have been added is dynamically loaded to replace the program source code previously run by the service systems on the service platform.

The technical scheme adopted by the specification can achieve the following beneficial effects: the log burying points are not required to be buried in all possible faults when the service system program source code is written initially, but the log burying points are dynamically added according to the faults occurring in operation in the operation process of the service platform, so that noise and interference caused by facing all logs at the same time are avoided.

Drawings

The accompanying drawings, which are included to provide a further understanding of one or more embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the one or more embodiments of the disclosure and together with the description serve to explain the one or more embodiments of the disclosure and not to limit the disclosure in any way. In the drawings:

FIG. 1 is a schematic block diagram of an example of a service self-healing system;

FIG. 2 is a flowchart of a fault-based log burial point setting method in one embodiment of the present specification;

FIG. 3 is a schematic structural diagram of an example of an execution plan tree in one embodiment of the present specification;

FIG. 4 is a flowchart of an example of a fault-based log burial point setting method in a Java system in one embodiment of the present specification;

FIG. 5 is a schematic block diagram of a fault-based log burial point setting device 600 in one embodiment of the present specification;

FIG. 6 is an architecture diagram of an example of a service self-healing system according to an embodiment of the present specification; and

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification.

Detailed Description

To make the objects, technical solutions and advantages of one or more embodiments of the present specification clearer, these embodiments are described more fully below with reference to specific embodiments and the accompanying drawings. It is to be understood that the embodiments described are only some, and not all, of the embodiments of the specification. All other embodiments that a person skilled in the art can derive from the embodiments given in the specification without creative effort fall within the protection scope of the specification.

The technical solutions provided by one or more embodiments of the present specification are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of an example of a service self-healing system according to the present specification. As shown in FIG. 1, the service self-healing system 100 includes: a service module 110, a monitoring module 120, a positioning module 130, a decision module 140, a plan module 150, and a data module 160.

The service module 110 runs the service system, executes the functions of the service system, and provides services to users. In practical applications, self-healing of the service system can be implemented through a near-end package, that is, the self-healing logic runs directly in the service platform, and the service platform provides the hosting capability.

The monitoring module 120 collects and analyzes the logs generated during operation of the service platform, so that abnormal logs can be found and reported in time. In the self-healing system, the monitoring module 120 may generate fault alarm information when it finds an abnormal log and send the fault alarm information to the positioning module 130. In practical applications, the monitoring module 120 may collect logs either through periodic reporting by the client or through periodic polling.

The positioning module 130 is triggered when it receives the fault alarm information sent by the monitoring module 120, and matches the received fault alarm information against preset service fault information (i.e., fault model data) to obtain the service fault information corresponding to the received fault alarm information. In a specific application, fault codes corresponding to the various faults can be enumerated in advance to represent those faults. For example, RZ_DB_NO_CONNECT indicates that the database of RZ cannot be connected, and CZ_TAIR_SLOW indicates a delay in cross-city data cache transmission.

The decision module 140 takes the fault code obtained by the positioning module 130 and the current values of monitoring indexes in multiple dimensions as parameters, and calculates through a dynamic script whether to self-heal automatically; if not, it selects, from the preset plans, a plan for the service system to execute self-healing. The selection result includes, but is not limited to, the plan number, the executor, and the predicted execution time of the selected plan. The monitoring indexes include, but are not limited to: range dimensions, service dimensions, index vectors, system information, release information, physical machine information, and admission information.

The plan module 150 executes the flow code corresponding to the plan selected by the decision module 140, and each subsystem in the service system performs automatic emergency treatment of the emergency in combination with its own services, such as product degradation, service degradation, or service offline.

The data module 160 provides basic data support for the above modules. The stored data includes, but is not limited to, the fault model data, the plan data, the monitoring indexes in multiple dimensions (which may be referred to as decision data set data), and the self-healing process data (which may be referred to as hemostasis record data). In addition, the data module 160 may serve as a training data source for an intelligent operation and maintenance (AIOps) training platform. AIOps applies artificial intelligence to the operation and maintenance field: based on existing operation and maintenance data (logs, monitoring information, application information, and the like), it uses machine learning to address problems that automated operation and maintenance cannot solve.
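
The matching performed by the positioning module can be sketched in Java as follows. This is only an illustrative sketch: the class `FaultModelSketch`, the keyword-based fault model, the alarm texts and the service identifier `"pay"` are assumptions made for the example; only the two fault code names are taken from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of fault-model matching in a positioning module. */
class FaultModelSketch {

    /** Fault codes enumerated in advance; both names appear in the description. */
    enum FaultCode { RZ_DB_NO_CONNECT, CZ_TAIR_SLOW }

    /** Service fault information: a service identifier plus a fault code. */
    static final class ServiceFaultInfo {
        final String serviceId;
        final FaultCode faultCode;

        ServiceFaultInfo(String serviceId, FaultCode faultCode) {
            this.serviceId = serviceId;
            this.faultCode = faultCode;
        }
    }

    // Assumed fault model: a keyword found in the alarm text maps to a fault code.
    private static final Map<String, FaultCode> FAULT_MODEL = new LinkedHashMap<>();
    static {
        FAULT_MODEL.put("cannot connect to database", FaultCode.RZ_DB_NO_CONNECT);
        FAULT_MODEL.put("cache timeout", FaultCode.CZ_TAIR_SLOW);
    }

    /** Returns the matched fault information, or null when no model entry matches. */
    static ServiceFaultInfo locate(String serviceId, String alarmText) {
        for (Map.Entry<String, FaultCode> entry : FAULT_MODEL.entrySet()) {
            if (alarmText.contains(entry.getKey())) {
                return new ServiceFaultInfo(serviceId, entry.getValue());
            }
        }
        return null;
    }
}
```

A real fault model would likely match on structured alarm fields rather than on substrings; the keyword map is used here only to keep the sketch short.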

In the embodiments of the present specification, log burial points are set dynamically based on the service fault information located by the positioning module 130, so that the monitoring module 120 can collect as complete a set of logs as possible for the service fault information, the positioning module 130 can locate the fault more accurately, and the decision module 140 can select a more appropriate self-healing plan.

Fig. 1 shows an example of a service self-healing system for illustrative purposes only, and one or more embodiments of the present disclosure may also be applied to service self-healing systems of other architectures.

Example 1

The embodiment provides a log burial point setting method based on a fault, which can be used in a service self-healing system shown in fig. 1 to dynamically set a log burial point for a service system in the service self-healing system.

FIG. 2 is a flowchart of a fault-based log burial point setting method in one embodiment of the present specification. As shown in FIG. 2, in step 202, service fault information is received, which includes a service identifier and a fault code. In practical applications, if the method is used in the service self-healing system shown in FIG. 1, the service fault information may be sent by the positioning module 130.

After receiving the service failure information, in step 204, one or more sub-failure codes matched with the service identifier and the failure code in the service failure information are obtained according to the preset correspondence between the service identifier, the failure code and the sub-failure codes.

In practical application, the final service fault reason may not be determined according to the fault code in the service fault information, and the fault code in the service fault information needs to be analyzed to find the final service fault reason, so in the embodiment, the corresponding relationship between the service identifier, the fault code and the sub-fault code may be preset, and after the service fault information is received, the sub-fault code matched with the service identifier and the fault code in the service fault information is found from the corresponding relationship.

For example, if the service fault code is an in-place payment check failure (dfmjysb), the true cause of the fault cannot be analyzed directly from the fault code. In the preset correspondence, the set of sub fault codes matching this fault code is [downstream system problem, database problem, machine problem]. The downstream system that may have the problem can be determined according to the service identifier in the service fault information; similarly, the database and the machine that may have the problem can also be determined according to the service identifier.
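
As a minimal sketch, the preset correspondence can be held in a map keyed by the service identifier and the fault code. The class name, the `"serviceId|faultCode"` key format, the service identifier `"pay"` and the English sub fault code names are assumptions chosen for illustration; only the fault code `dfmjysb` and its three sub faults come from the example above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the preset (service identifier, fault code) to sub fault code correspondence. */
class SubFaultLookupSketch {

    // Assumed key format: "serviceId|faultCode".
    private static final Map<String, List<String>> CORRESPONDENCE = new HashMap<>();
    static {
        CORRESPONDENCE.put("pay|dfmjysb", Arrays.asList(
                "DOWNSTREAM_SYSTEM_PROBLEM", "DATABASE_PROBLEM", "MACHINE_PROBLEM"));
    }

    /** Returns the matched sub fault codes, or an empty list when nothing matches. */
    static List<String> subFaultCodes(String serviceId, String faultCode) {
        List<String> found = CORRESPONDENCE.get(serviceId + "|" + faultCode);
        return found == null ? Collections.<String>emptyList() : found;
    }
}
```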

In an optional implementation manner of this embodiment, in order to facilitate accurate identification of the service fault information, the service fault information may further include a timestamp (indicating the time when the fault occurs) in addition to the service identifier and the fault code, and the time when the fault occurs may be determined by using the timestamp, so that the fault may be more conveniently located.

In an optional implementation of this embodiment, the service identifier and sub fault code matched with the service identifier and fault code in the service fault information may still fail to locate the final cause of the service fault, so further matching is required. Therefore, in this optional implementation, in step 204, the next-level service identifier and sub fault code matched with the service identifier and fault code in the service fault information are first obtained according to the preset correspondence among the service identifier, the fault code and the sub fault code; the correspondence is then searched again for the next-level service identifier and sub fault code matched with those just obtained, and this recursion is repeated until the last-level service identifier and sub fault code are matched. For example, take the sub fault code "downstream system problem" (which contains the service identifier of the downstream system that may have the problem and the sub fault code) in the set matched above for the in-place payment check failure. The next-level sub fault set matching this sub fault may include [problem calling the downstream system, problem with the data returned by the downstream system]. If the sub faults in this set have no matching next-level sub fault, the sub fault set finally determined for the "in-place payment check failure (dfmjysb)" includes: problem calling the downstream system, problem with the data returned by the downstream system, database problem, and machine problem.
Through this optional implementation, the current service fault can be traced to its final cause, so that a log burial point can subsequently be set for that cause to collect appropriate log information.
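
The recursive matching can be sketched as a depth-first walk that keeps only the last-level sub fault codes. The sketch below is an illustration under simplifying assumptions: the class name and the English code names are hypothetical, the service identifier is omitted from the key for brevity, and the correspondence is assumed to contain no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of recursive matching down to the last-level sub fault codes. */
class RecursiveSubFaultSketch {

    // Assumed correspondence: a fault code maps to its next-level sub fault codes.
    private static final Map<String, List<String>> NEXT_LEVEL = new HashMap<>();
    static {
        NEXT_LEVEL.put("dfmjysb", Arrays.asList(
                "DOWNSTREAM_SYSTEM_PROBLEM", "DATABASE_PROBLEM", "MACHINE_PROBLEM"));
        NEXT_LEVEL.put("DOWNSTREAM_SYSTEM_PROBLEM", Arrays.asList(
                "DOWNSTREAM_CALL_PROBLEM", "DOWNSTREAM_DATA_PROBLEM"));
    }

    /** Returns every last-level sub fault code reachable from the given fault code. */
    static List<String> resolve(String faultCode) {
        List<String> result = new ArrayList<>();
        collect(faultCode, result);
        return result;
    }

    private static void collect(String code, List<String> out) {
        List<String> children = NEXT_LEVEL.get(code);
        if (children == null) {
            out.add(code); // no next level: this is a final cause
            return;
        }
        for (String child : children) {
            collect(child, out);
        }
    }
}
```

With this data, `resolve("dfmjysb")` yields the four final causes listed in the example: the two downstream sub faults, the database problem, and the machine problem.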

After the one or more sub fault codes corresponding to the service fault information are matched, in step 206, a sub fault code for which no corresponding log burial point is set on the current service platform is acquired from the one or more sub fault codes, and the log burial point description information corresponding to that sub fault code is acquired. The log burial point description information includes burial point coordinates, which specify the position at which the code segment of the log burial point description information is inserted. Optionally, the log burial point description information may further include a variable name, which indicates a variable related to the log to be collected.

For a certain sub fault code, the log information to be collected may be multifaceted in order to implement self-healing, and one or more sub fault codes in the sub fault codes obtained in step 204 may already have a corresponding log burying point, for example, when a certain fault occurs in the service platform before, a log burying point is already set for the sub fault code. Therefore, in step 206, the sub fault codes which are not provided with the corresponding log burying points on the current service platform are obtained from one or more of the sub fault codes obtained in step 204, and then log addition is performed on the sub fault codes, so that repeated log burying points can be avoided.

Optionally, the sub-fault code may indicate a service system related to the fault, and it may be determined whether the current service platform sets a corresponding log burying point by checking whether the service system sets the corresponding log burying point. By the method, whether the current service platform is provided with the log burying point corresponding to the sub fault code can be quickly determined.

In a specific application, the faults occurring on the service platform may be generally predictable, and for each fault, the log information that needs to be collected may be determined in advance, so in an optional implementation manner of this embodiment, a corresponding relationship between the sub fault code and the log burying point description information may be set in advance, and in step 206, the log burying point description information corresponding to the sub fault code not having the corresponding log burying point is obtained according to the corresponding relationship. Through the optional implementation mode, by presetting the corresponding relation between the sub fault codes and the log buried point description information, when a fault occurs, the log buried point description information corresponding to the sub fault codes can be obtained according to the preset condition, instead of obtaining the log buried point description information by analyzing according to the sub fault codes, and therefore the time for dynamically burying the log buried points can be saved.
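
Step 206 can be sketched as a filter followed by a lookup in the preset correspondence. In the sketch below, the class names, the representation of the burial point coordinates as a class name plus a line number, and the example values in the usage note are assumptions for illustration; the specification only states that the description information contains coordinates and, optionally, variable names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of step 206: skip instrumented sub fault codes, look up the rest. */
class BurialPointSelectionSketch {

    /** Assumed shape of the log burial point description information. */
    static final class BurialPointDescription {
        final String className;           // assumed part of the coordinates
        final int line;                   // assumed part of the coordinates
        final List<String> variableNames; // variables whose values should be logged

        BurialPointDescription(String className, int line, String... variableNames) {
            this.className = className;
            this.line = line;
            this.variableNames = Arrays.asList(variableNames);
        }
    }

    /**
     * Keeps only sub fault codes without an existing log burial point and returns
     * their preset description information, in the order the codes were given.
     */
    static Map<String, BurialPointDescription> select(
            List<String> subFaultCodes,
            Set<String> alreadySet,
            Map<String, BurialPointDescription> preset) {
        Map<String, BurialPointDescription> selected = new LinkedHashMap<>();
        for (String code : subFaultCodes) {
            if (alreadySet.contains(code)) {
                continue; // avoid setting the same log burial point twice
            }
            BurialPointDescription description = preset.get(code);
            if (description != null) {
                selected.put(code, description);
            }
        }
        return selected;
    }
}
```

For instance, if `MACHINE_PROBLEM` is already in `alreadySet`, calling `select` with `DATABASE_PROBLEM` and `MACHINE_PROBLEM` returns only the description for `DATABASE_PROBLEM`.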

In practical applications, the log burial point description information may be referred to as an anchor point (Anchor), and may specifically be a dynamically generated code block. In an optional implementation of this embodiment, the code block may be configured in advance through an operation platform, so that when a suspected fault occurs the operation platform can dynamically load the corresponding code block, giving the system the ability to dynamically start collecting more information at the right time.

To quickly match the burial point description information corresponding to the received service fault information, in an optional implementation of this embodiment, the preset correspondence among the service identifier, the fault code and the sub fault code, together with the correspondence between the sub fault code and the log burial point description information, may be stored in a tree structure. The service identifier and the fault code serve as a root node, and the corresponding next-level service identifier and sub fault code serve as its child nodes; each of these may in turn serve as the root node of the next level, and so on. The leaf node under a last-level sub fault code is the burial point description information corresponding to that sub fault code. This tree may be referred to as an execution plan tree.

For example, FIG. 3 is a schematic structural diagram of an example of an execution plan tree in this specification. The execution plan tree shown in FIG. 3 includes two services, service 1 and service 2. The service fault code 1.1 of service 1 relates to system a and system b; for this fault, two anchor points (Anchors), anchor 1 and anchor 2, need to be configured in system a, while system b obviously does not have the fault and therefore needs no anchor point. The service fault code 1.2 of service 1 relates to system a, in which a corresponding anchor point, anchor 3, needs to be configured for the fault. The sub fault corresponding to the service fault code 2.1 of service 2 is the sub fault code 2.1.1, which relates to system a and system c; for this fault, anchor 4 needs to be configured in system a and anchor 5 in system c. If the service identifier in the received service fault information is 2 and the service fault code is 2.1, matching the execution plan tree gives the following result: anchor 4 is configured in system a, and anchor 5 is configured in system c.
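
The example tree of FIG. 3 can be sketched as a small Java structure. This is a hedged illustration, not the patented data layout: the `Node` class, the string labels, and the choice to attach the anchors to the system nodes are all assumptions made to keep the example compact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an execution plan tree shaped like the FIG. 3 example. */
class ExecutionPlanTreeSketch {

    static final class Node {
        final String label;                             // service, fault code, sub fault code or system
        final List<Node> children = new ArrayList<>();
        final List<String> anchors = new ArrayList<>(); // non-empty only on system nodes

        Node(String label) { this.label = label; }

        Node add(Node child) { children.add(child); return this; }

        Node anchor(String name) { anchors.add(name); return this; }

        Node child(String wanted) {
            for (Node c : children) {
                if (c.label.equals(wanted)) return c;
            }
            return null;
        }
    }

    /** Builds the tree described for FIG. 3. */
    static Node exampleTree() {
        Node service1 = new Node("service 1")
            .add(new Node("1.1")
                .add(new Node("system a").anchor("anchor 1").anchor("anchor 2"))
                .add(new Node("system b")))
            .add(new Node("1.2")
                .add(new Node("system a").anchor("anchor 3")));
        Node service2 = new Node("service 2")
            .add(new Node("2.1")
                .add(new Node("2.1.1")
                    .add(new Node("system a").anchor("anchor 4"))
                    .add(new Node("system c").anchor("anchor 5"))));
        return new Node("root").add(service1).add(service2);
    }

    /** Returns system name to anchors for the given service and fault code. */
    static Map<String, List<String>> match(Node root, String serviceId, String faultCode) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Node service = root.child(serviceId);
        Node fault = (service == null) ? null : service.child(faultCode);
        if (fault != null) collect(fault, result);
        return result;
    }

    private static void collect(Node node, Map<String, List<String>> out) {
        if (!node.anchors.isEmpty()) {
            out.computeIfAbsent(node.label, k -> new ArrayList<>()).addAll(node.anchors);
        }
        for (Node c : node.children) collect(c, out);
    }
}
```

Calling `match(exampleTree(), "service 2", "2.1")` returns anchor 4 for system a and anchor 5 for system c, which is the result stated above. System b never appears in a result because it carries no anchors.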

After the log burial point description information is acquired, the log burial point is dynamically embedded in the service system. In step 208, the program source code running online on the current service system is acquired, and the corresponding log burial point code segment is dynamically added to the program source code according to the acquired burial point description information.

In an optional implementation of this embodiment, when step 208 is executed, the code segment of the burial point description information may first be added to the program source code in memory and the source code then compiled; variable memory allocation and the log printing method are then handled in the resulting computer-recognizable code file. Thus, in this optional implementation, step 208 may include the following steps:

Step 2081: loading the program source code into memory, and adding the code segment of the burial point description information at the code line corresponding to the burial point coordinates according to the burial point description information.

Step 2082: compiling the program source code in memory, that is, the program source code to which the code segment of the burial point description information has been added, into a code file recognizable by the computer.

Step 2083: scanning the code file line by line; after a variable name is parsed from the code file, allocating memory for the variable corresponding to the variable name and storing it in the thread stack memory of the service system, and dynamically adding a static method call for log printing at the current line of the code file. Allocating memory for the variable and storing it in the thread stack memory of the service system provides the parameters for the subsequent call to the log printing method.

Step 2084: storing the code file in memory to local storage.

Through the optional implementation mode, static method calling of log printing is added to the code file which can be identified by the computer, redundant codes can be reduced, and performance is improved.
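
Step 2081 alone can be sketched as inserting a snippet into the in-memory source lines. The sketch assumes that the burial point coordinate is a 1-based line number and that the code segment is a single line; the class and method names are hypothetical. Compilation (step 2082) and bytecode enhancement (step 2083) are not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step 2081: add an anchor snippet at a line coordinate. */
class AnchorInsertionSketch {

    /**
     * Returns a copy of the source lines with the snippet inserted so that it
     * becomes line {@code line} (1-based); the original list is left untouched.
     */
    static List<String> insertAt(List<String> sourceLines, int line, String snippet) {
        if (line < 1 || line > sourceLines.size() + 1) {
            throw new IllegalArgumentException("coordinate out of range: " + line);
        }
        List<String> patched = new ArrayList<>(sourceLines);
        patched.add(line - 1, snippet);
        return patched;
    }
}
```

Working on a copy keeps the original source available, which is convenient if the patched source later fails to compile.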

After the corresponding log burial point code segment is dynamically added to the program source code, in step 210, the program source code added with the log burial point code segment is dynamically loaded to the service system, so that the log burial point is dynamically buried in the service system, the service self-healing system applied in the specification can acquire more complete log information subsequently, and the self-healing capability is improved.

When the program source code is dynamically loaded, if it has already been processed into a locally stored code file according to the optional implementation above, then in an optional implementation of this embodiment, in step 210, the loader of the current thread of the service system is obtained first, the locally stored code file is loaded into memory using that loader, and the memory address referenced by the service system is then replaced, through a reflection call, with the address in memory of the loaded local code file. Because the code file is loaded with the loader of the service system's current thread, the newly loaded class objects and the original class objects of the service system are loaded by the same loader, which avoids the class object isolation problem.
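
The compile-then-load flow can be approximated with standard JDK APIs, as in the sketch below. It deliberately differs from the implementation described above: it loads the compiled class through a child `URLClassLoader` whose parent is the current thread's context class loader, rather than through that loader itself, and it does not replace any address referenced by a running service. The class `DynamicLoadSketch` and its method are hypothetical, and a JDK (not only a JRE) is assumed so that `ToolProvider.getSystemJavaCompiler()` returns a compiler.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical sketch: compile patched source, store it locally, load it, call it. */
class DynamicLoadSketch {

    /**
     * Compiles a public class in the default package and invokes one of its
     * public static no-argument methods by reflection.
     */
    static Object compileLoadAndCall(String className, String source, String methodName) {
        try {
            // Store the patched source in a temporary directory (not cleaned up here).
            Path dir = Files.createTempDirectory("burial-point");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system Java compiler; a JDK is required");
            }
            // The .class file is written next to the source, i.e. stored locally.
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed with status " + status);
            }

            // Child loader of the thread's context class loader (a simplification).
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            try (URLClassLoader loader =
                     new URLClassLoader(new URL[] { dir.toUri().toURL() }, parent)) {
                Class<?> loaded = loader.loadClass(className);
                Method method = loaded.getMethod(methodName);
                return method.invoke(null);
            }
        } catch (Exception e) {
            throw new IllegalStateException("dynamic load failed", e);
        }
    }
}
```

Because a child loader is used here, the loaded class is not defined by the same loader as the application's own classes, so this sketch does not by itself avoid the class object isolation problem mentioned above. In production Java systems, redefining already loaded classes is more commonly done through the `java.lang.instrument` API.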

The method for setting a log burial point based on a fault provided in the present specification is described below by taking a Java system as an example, and it should be noted that the present specification is described by taking the Java system as an example, but the method provided in the present specification is not limited to being applied to the Java system only, and may be applied to other systems, and the present specification is not particularly limited thereto.

FIG. 4 is a flowchart of an example of a fault-based log burial point setting method in a Java system in an embodiment of the present specification.

First, in order to facilitate understanding of the example provided in the present specification, a part of terms referred to in the example is explained below.

(1) Dynamic script: code that can be executed dynamically, often a Groovy script.

(2) Near-end package: provides part of a capability to the caller in the form of a jar package, which avoids calling through a remote service.

(3) Java stack: a data structure in Java that is allocated and released automatically by the system and stores function parameter values, local variable values, and the like.

(4) javac instruction: javac is the Java language compiler. It reads the definitions of classes and interfaces written in the Java language and compiles them into bytecode class files.

(5) Java class loader: the Java ClassLoader is part of the Java Runtime Environment (JRE) and is responsible for dynamically loading Java classes into the memory space of the Java virtual machine.

(6) class file: after a Java class is compiled, a bytecode file ending in .class is generated, and the relevant information of the class object is stored in this bytecode file.

(7) Java agent: developers can build an agent that is independent of the application to monitor and assist programs running on the JVM, and even to replace and modify certain class definitions.
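To make term (7) concrete, the following is a minimal sketch of a Java agent built on the standard java.lang.instrument API. The class name LogPointAgent, the REPLACEMENTS registry, and the sample class name are assumptions made for illustration; this specification does not state how its agent is implemented.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical agent: the class name and the registry are illustrative only.
class LogPointAgent {
    // Internal class name (for example "com/example/OrderService") -> replacement bytecode.
    static final Map<String, byte[]> REPLACEMENTS = new ConcurrentHashMap<>();

    static final class Swapper implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Returning null tells the JVM to keep the original class definition.
            return className == null ? null : REPLACEMENTS.get(className);
        }
    }

    // Entry point invoked by the JVM when it is started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Swapper());
    }
}
```

A transformer of this kind only sees classes as they are loaded or retransformed, so it shows the capability named in the definition rather than the reflection-based replacement described later in this example.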

As shown in fig. 4, in this embodiment, setting a log burying point may include four stages:

41. plan preprocessing;

42. anchor burying;

43. class bytecode enhancement;

44. dynamic loading.

The plan preprocessing stage corresponds to steps 202 to 206 in fig. 2, the anchor burying and class bytecode enhancement stages correspond to step 208 in fig. 2, and the dynamic loading stage corresponds to step 210 in fig. 2.

Each of these stages is described below with reference to fig. 4.

As shown in fig. 4, plan preprocessing 41 may include: plan tree loading 411, service fault matching 412, and plan tree execution 413. After the service fault information is received, plan tree loading 411 is executed: the plan tree is loaded from the cache or the database and cached in memory. This is an initialization operation. The plan tree may be preset based on the faults that may occur in the current service system. Service fault matching 412 is then performed: taking the received service fault information as the entry, the plan tree is matched recursively until the final causes of the service fault are determined. The service fault information includes, but is not limited to, a service identifier, a fault code, and a timestamp. For example, for a service fault code (dfmjysb), the sub fault codes obtained by matching the plan tree include: a problem with a downstream system, a problem with the database, and a problem with the machine. That is, log enhancement needs to be performed for three sub fault codes. After the sub fault codes corresponding to the service fault information are obtained, plan tree execution 413 is executed: each obtained sub fault code is analyzed to determine whether log enhancement has already been enabled for it, that is, whether a log burying point for that sub fault code is already set in the service system. For each sub fault code for which log enhancement has not been enabled, an anchor is prepared to be buried. The information in the anchor includes, but is not limited to: an execution plan id, a system, burying point coordinates, a variable name, log prefix information, and a variable printing rule. The burying point coordinates may be determined by three dimensions, class + method + specified insertion position, and the variable name may be the name of a variable definition that is accessible in the Java stack when the referenced code runs. The structure of the plan tree in this example can be seen in fig. 3.
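The recursive matching in 412 and the filtering in 413 can be sketched as follows. This is a minimal illustration under stated assumptions: the class PlanTreeSketch, the "serviceId/faultCode" key format, and the sample codes in the usage note are hypothetical and do not come from this specification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plan tree; all names and key formats are illustrative.
class PlanTreeSketch {
    // Key "serviceId/faultCode" -> child keys. A key without children is
    // treated as a leaf, that is, a final sub fault code.
    private final Map<String, List<String>> children = new HashMap<>();

    void link(String parent, String... kids) {
        children.put(parent, List.of(kids));
    }

    // Service fault matching: descend recursively from the reported fault
    // until only leaf sub fault codes remain. Assumes the data is a tree (no cycles).
    List<String> match(String key) {
        List<String> leaves = new ArrayList<>();
        List<String> kids = children.get(key);
        if (kids == null || kids.isEmpty()) {
            leaves.add(key);
            return leaves;
        }
        for (String kid : kids) {
            leaves.addAll(match(kid));
        }
        return leaves;
    }

    // Plan tree execution: keep only the sub fault codes for which
    // log enhancement has not been enabled yet.
    List<String> needingAnchors(String key, Set<String> alreadyEnhanced) {
        List<String> result = new ArrayList<>();
        for (String leaf : match(key)) {
            if (!alreadyEnhanced.contains(leaf)) {
                result.add(leaf);
            }
        }
        return result;
    }
}
```

For instance, after link("pay/dfmjysb", "pay/downstream", "pay/database", "pay/machine"), calling needingAnchors("pay/dfmjysb", Set.of("pay/machine")) would leave the downstream and database sub fault codes as the ones that still need an anchor.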

As shown in fig. 4, anchor burying 42 may include: source code reading 421, anchor insertion 422, and class file loading 423. In source code reading 421, the program source code of each service system currently running online on the service platform can be pulled from the git code repository in real time. Anchor insertion 422 then inserts the anchor at the designated line: specifically, the java file of the service system that corresponds to the burying point coordinates in the anchor is loaded into memory, and the anchor-related information is added to the corresponding line. After the anchor is added to the designated line, class file loading 423 is performed, in which the java source file in memory is converted into a class bytecode file by the javac instruction, in preparation for the subsequent bytecode enhancement.
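Anchor insertion 422 and class file loading 423 can be sketched as follows. This is only an illustration: the class AnchorInserter, the variable name __anchor, and the "prefix|variable" text form of the anchor are assumptions, since this specification does not fix how the anchor is written into the source. Compilation uses the standard javax.tools API, which requires a JDK rather than a bare JRE.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper; the textual form of the anchor is an assumption.
class AnchorInserter {
    // Insert an anchor declaration before the 1-based target line of the source.
    static List<String> insertAnchor(List<String> source, int line,
                                     String variableName, String logPrefix) {
        if (line < 1 || line > source.size() + 1) {
            throw new IllegalArgumentException("line out of range: " + line);
        }
        List<String> out = new ArrayList<>(source);
        out.add(line - 1, "String __anchor = \"" + logPrefix + "|" + variableName + "\";");
        return out;
    }

    // Write the modified source to a file and compile it with the system
    // Java compiler (javac). Returns true when compilation succeeds.
    static boolean compile(Path javaFile, List<String> source) throws IOException {
        Files.write(javaFile, source);
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
        return compiler != null && compiler.run(null, null, null, javaFile.toString()) == 0;
    }
}
```

Keeping the insertion as a pure function over a list of lines makes it easy to check the result before anything is compiled.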

As shown in fig. 4, class file bytecode enhancement 43 may include: method signature verification 431, target line number acquisition (VisitLineNumber) 432, anchor loading and parsing 433, local variable type access (VisitVarInsn) 434, adding a log printing class method call (InvokeStatic) 435, operation instruction interception (visitInsn) 436, and byte stream writing to local storage 437. In practical applications, class file bytecode enhancement 43 may be implemented by the near-end package of the service system.

To prevent loading the wrong method, method signature verification 431 is performed first: the class bytecode file obtained by conversion during anchor burying 42 is read again, and during reading the target method, that is, the method in which the anchor is buried, is filtered out step by step according to the coordinate information in the anchor. After the target method is determined, in VisitLineNumber 432 the bytecode file continues to be scanned line by line, checking whether the line number currently recorded is the target address; if it is, the line currently being scanned is determined to be the line where the anchor is inserted. After the line number of the buried anchor is obtained, the variable name in the anchor is obtained, and anchor loading and parsing 433 is executed: scanning of the bytecode file continues, and after the locally buried anchor variable is parsed, memory is allocated for the variable and it is stored in the stack memory of the current thread. After the anchor local variable is stored in the stack memory of the current thread, in local variable type access 434 (which can be completed by the VisitVarInsn instruction), a variable load instruction is prepared to load the anchor information stored in the previous step onto the current operand stack, so as to provide parameters for the subsequent log printing method. After the specified line (that is, the line corresponding to the target address) is parsed, in adding a log printing class method call (InvokeStatic) 435, a static method call for log printing is dynamically added at the specified line; the method may be provided in advance in the form of a jar package. For example, a static method call for log printing may be added in the following manner:

After the log printing method is dynamically inserted, one example of the bytecode file is:

In operation instruction interception (visitInsn) 436, it is confirmed whether the bytecode file has any abnormal condition. When it is determined that there is no abnormality, byte stream writing to local storage 437 is performed: the enhanced class bytecode files are uniformly stored locally, and the class name may be defined as the original class name + "enhance" + hashFile(class file name).
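The naming rule and the local write in 437 can be sketched as follows. This specification does not define hashFile, so the version below, which takes the first 8 hexadecimal characters of a SHA-256 digest of the class file name, is purely an assumption; the class EnhancedClassStore is likewise hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for naming and storing enhanced class files.
class EnhancedClassStore {
    // Assumed stand-in for hashFile: first 8 hex characters of SHA-256
    // computed over the class file name.
    static String hashFile(String classFileName) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(classFileName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Original class name + "enhance" + hashFile(class file name).
    static String enhancedName(String originalClassName, String classFileName) {
        return originalClassName + "enhance" + hashFile(classFileName);
    }

    // Write the enhanced bytecode to <dir>/<enhanced name>.class and return the path.
    static Path store(Path dir, String originalClassName, String classFileName,
                      byte[] bytecode) throws IOException {
        Path target = dir.resolve(enhancedName(originalClassName, classFileName) + ".class");
        return Files.write(target, bytecode);
    }
}
```

A deterministic suffix means the same class file always maps to the same enhanced name, so a repeated enhancement overwrites the earlier file instead of accumulating copies.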

Dynamic loading 44 loads the enhanced class bytecode file generated in the previous stage. In a specific application, to better perform dynamic loading, a java agent may be started in the current environment. As shown in fig. 4, dynamic loading 44 may include: class loader acquisition 441, class loading 442, object instantiation and reference 443, and reflection call 444.

Class loader acquisition 441 obtains the class loader of the current thread, in order to ensure that the added class object and the original class object are loaded by the same loader, thereby avoiding the class object isolation problem. After the class loader is obtained, class loading 442 is performed, loading the enhanced class bytecode file so that the JVM environment can access it. Object instantiation and reference 443 is then executed, initializing the object logic inside the Java environment. After initialization is completed, reflection call 444 is carried out, in which the enhanced class bytecode file replaces the program currently run by the service system. Because the java agent is started in the current environment, all direct calls made after the service system starts have been changed into reflection calls by means of bytecode enhancement. Therefore, in step 444 the memory address of the reflection call is replaced with the in-memory address of the enhanced class bytecode file, which realizes dynamic loading of the whole class.
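The idea behind steps 441 to 444 can be sketched with standard reflection as follows. This is a simplified illustration, not the mechanism of this specification: the class ReflectiveSwitch and the nested Original and Enhanced classes are hypothetical, and swapping an object reference held in an AtomicReference stands in for replacing the memory address of the reflection call.

```java
import java.lang.reflect.Method;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical dispatcher: every call goes through reflection, so the
// target can be replaced at run time without touching the callers.
class ReflectiveSwitch {
    public static class Original {
        public String handle(String id) { return "handled " + id; }
    }

    // Stand-in for the enhanced class: same behavior plus a log prefix.
    public static class Enhanced extends Original {
        @Override
        public String handle(String id) { return "[log] id=" + id + " -> " + super.handle(id); }
    }

    private final AtomicReference<Object> target = new AtomicReference<>();

    ReflectiveSwitch(Object initial) { target.set(initial); }

    // 441 to 443: load a class by name through the current thread's loader
    // and create an instance of it.
    static Object instantiate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) loader = ReflectiveSwitch.class.getClassLoader();
        try {
            return Class.forName(className, true, loader).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    // 444: point subsequent reflective calls at the enhanced instance.
    void replace(Object enhanced) { target.set(enhanced); }

    Object call(String methodName, String arg) {
        Object current = target.get();
        try {
            Method m = current.getClass().getMethod(methodName, String.class);
            return m.invoke(current, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }
}
```

Because callers only hold the ReflectiveSwitch, a call made after replace(...) reaches the enhanced instance without a restart, which is the effect that reflection call 444 aims for.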

If log burying points are set before the service system goes online, they must be set for every possible fault of the service system; otherwise, the specific cause of some faults may not be found because key logs are missing. On the other hand, because faults are themselves low-probability events, setting log burying points for all of them may produce many redundant and invalid logs and increase the storage burden and cost. Moreover, although setting all log burying points at one time is simple to implement, if the service system is updated after going online, all of them need to be set again, and continuous management is not possible. With the method of dynamically setting log burying points, when a service system fails, log burying points can be set dynamically for that fault; that is, a log burying point is set for a low-probability event only when that event occurs. This avoids the interference caused by collecting too many logs while still collecting the required log information, so that fault location is more accurate.

It should be noted that the steps of the method provided in embodiment 1 may all be executed by the same device, or different steps may be executed by different devices. For example, steps 202 and 204 may be executed by device 1 and steps 206 to 210 by device 2; as another example, step 202 may be executed by device 1 and steps 204 to 210 by device 2; and so on.

Embodiment 2

This embodiment provides a fault-based log burial point setting apparatus, which can be used to implement the method described in embodiment 1.

Fig. 5 is a schematic block diagram of a fault-based log burial point setting apparatus 500 according to the present disclosure, and as shown in fig. 5, in a software implementation, the apparatus 500 may include: a receiving unit 510, a first obtaining unit 520, a second obtaining unit 530, a third obtaining unit 540, a burying unit 550, and a dynamic loading unit 560.

In this embodiment, only the functions of the above units are described; for other details, refer to the description in embodiment 1, which is not repeated here.

In this embodiment, the receiving unit 510 is configured to receive service fault information, where the service fault information includes a service identifier and a fault code; the first obtaining unit 520 is configured to obtain one or more sub fault codes matching the service identifier and the fault code in the service fault information according to a preset correspondence among the service identifier, the fault code, and the sub fault codes; the second obtaining unit 530 is configured to obtain, from the one or more sub fault codes, a sub fault code for which no corresponding log burying point is set on the current service platform, and to obtain the log burying point description information corresponding to that sub fault code, where the log burying point description information includes burying point coordinates; the third obtaining unit 540 is configured to obtain the program source code of each service system running online on the current service platform; the burying unit 550 is configured to dynamically add a corresponding log burying point code segment at the corresponding position of the program source code according to the burying point coordinates in the log burying point description information; and the dynamic loading unit 560 is configured to dynamically load the program source code to which the log burying point code segment has been added onto the current service platform.

In an optional implementation manner of this embodiment, the first obtaining unit 520 may obtain one or more sub fault codes that match the service identifier and the fault code in the service fault information by:

obtaining a next-level service identifier and sub fault code matching the service identifier and the fault code in the service fault information according to the preset correspondence among the service identifier, the fault code, and the sub fault codes; then finding, according to the same correspondence, the next-level service identifier and sub fault code matching those; and repeating this recursion until the last-level service identifier and sub fault code are matched.

In an optional implementation manner of this embodiment, the second obtaining unit 530 may obtain the log burying point description information corresponding to the sub fault code for which no corresponding log burying point is set by:

obtaining the log burying point description information corresponding to the sub fault code for which no corresponding log burying point is set, according to a preset correspondence between sub fault codes and log burying point description information.

In an optional implementation manner of this embodiment, the log burying point description information may further include the name of the variable to be collected; the burying unit 550 may include:

a source code loading module, configured to load the program source code into memory;

a source code modification module, configured to add, according to the burying point description information, a code segment of the burying point description information to the code line corresponding to the burying point coordinates;

a source code compiling module, configured to compile the program source code in memory into a code file that the computer can recognize;

a code file modification module, configured to, after the variable name is parsed from the code file, allocate memory for the variable corresponding to the variable name, store it in the thread stack memory of the service system, and dynamically add a static method call for log printing at the current line of the code file; and

a storage module, configured to store the code file in memory to local storage.

In an optional implementation manner of this embodiment, the dynamic loading unit 560 may include:

an acquisition module, configured to obtain the loader of the current thread of the current service platform;

a loading module, configured to load the locally stored code file by using the loader; and

a replacing module, configured to replace the memory address referenced by the current service platform with the memory address of the loaded local code file by means of a reflection call.

With this apparatus, after the service platform goes online, log burying points can be set dynamically for the current fault, and the collection of invalid and redundant logs is avoided.

Embodiment 3

The embodiment provides a service self-healing system.

Fig. 6 is an architectural schematic diagram of an example of the service self-healing system of this specification. As shown in fig. 6, the service self-healing system 600 adds an execution plan module 670 to the service self-healing system 100 shown in fig. 1.

In one embodiment, the service module 610 runs each service system on the service platform, executes the functions of each service system, and provides services to users; the monitoring module 620 collects and analyzes the logs generated by the service module 610 and, when an abnormal log is found, generates fault alarm information and sends it to the positioning module 630; the positioning module 630 receives the fault alarm information, matches it against predetermined fault model data to obtain the corresponding service fault information, and sends the service fault information to the execution plan module 670 and the decision module 640, where the service fault information includes a service identifier and a fault code; the execution plan module 670 receives the service fault information and sets log burying points for the service system run by the service module 610 according to at least one implementation described in embodiment 1 above; the decision module 640 determines whether the service system has self-healed according to the fault code in the service fault information and the current values of a plurality of preset monitoring indexes, and, if the service system has not self-healed, selects from the preset plans a self-healing plan for the service system to execute; the plan module 660 executes the flow code corresponding to the plan selected by the decision module 640 and performs emergency processing on the service in the service system, where the emergency processing includes at least one of: product degradation, service degradation, and taking the service offline; and the data module 650 stores the basic data of the service self-healing system, where the basic data includes, but is not limited to: the fault model data, the current values of the plurality of monitoring indexes, the preset plans, the emergency processing performed by the plan module 660, and the like.
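The hand-off from the positioning module 630 to the execution plan module 670 and the decision module 640 can be sketched as a simple fan-out. Everything in this sketch is hypothetical: the class SelfHealingWiring, the alarm keys, and the use of in-process callbacks are assumptions made for illustration, and the real modules may communicate in any other way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical wiring of modules 630, 640 and 670; names are illustrative.
class SelfHealingWiring {
    // Service fault information: a service identifier and a fault code.
    static final class FaultInfo {
        final String serviceId;
        final String faultCode;

        FaultInfo(String serviceId, String faultCode) {
            this.serviceId = serviceId;
            this.faultCode = faultCode;
        }
    }

    // Positioning module: matches an alarm against the fault model data and
    // forwards the resulting fault information to every registered receiver.
    static final class PositioningModule {
        private final Map<String, FaultInfo> faultModel = new HashMap<>();
        private final List<Consumer<FaultInfo>> receivers = new ArrayList<>();

        void learn(String alarm, FaultInfo info) { faultModel.put(alarm, info); }

        void register(Consumer<FaultInfo> receiver) { receivers.add(receiver); }

        // Returns false when the alarm matches nothing in the fault model data.
        boolean onAlarm(String alarm) {
            FaultInfo info = faultModel.get(alarm);
            if (info == null) {
                return false;
            }
            for (Consumer<FaultInfo> receiver : receivers) {
                receiver.accept(info);
            }
            return true;
        }
    }
}
```

Registering the execution plan module and the decision module as two independent receivers reflects that log burying point setting and self-healing decisions start from the same service fault information but proceed separately.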

In the above technical solution, the execution plan module 670 adds logs while the service system of the service module 610 is running, and configures and manages log burying points by means of execution plans, so that developers do not need to add extra log code to meet self-healing requirements. Through execution plan configuration, the designated log burying point code logic can be added automatically at the designated burying point coordinates. In addition, with the execution plan approach, self-healing alarm logs or information need not be printed by default, which reduces disk pressure; they are enabled step by step only when a fault is suspected, until the cause of the fault is finally located.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of this specification. Referring to fig. 7, at the hardware level, the electronic device includes a processor and optionally further includes an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.

The memory is used to store programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form a fault-based log buried point setting device on a logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:

acquiring a sub fault code which is not provided with a corresponding log burying point on the current service system from the one or more sub fault codes, and acquiring log burying point description information corresponding to the sub fault code which is not provided with the corresponding log burying point, wherein the log burying point description information comprises: a buried point coordinate and a variable name;

acquiring a program source code running on the current service system line, and dynamically adding a corresponding log embedded point code segment into the program source code according to the embedded point description information;

and dynamically loading the program source code added with the log burial point code segment to the service system.

The method performed by the fault-based log burying point setting apparatus disclosed in the embodiment shown in fig. 5 of one or more embodiments of this specification may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps, and logic blocks disclosed in one or more embodiments of this specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with one or more embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device may also execute the method of fig. 2 and implement the functions of the fault-based log burying point setting apparatus in the embodiment shown in fig. 5, which are not described again here.

Of course, in addition to a software implementation, the electronic device of one or more embodiments of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or a logic device.

This specification embodiment also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 2, and in particular to perform the following operations:

In general, the above description is only a preferred embodiment of one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims

1. A log burial point setting method based on faults comprises the following steps:

2. The method according to claim 1, wherein obtaining one or more sub fault codes matched with the service identifier and the fault code in the service fault information according to a preset correspondence between the service identifier, the fault code, and the sub fault codes comprises:

3. The method of claim 2, wherein obtaining the log burial point description information corresponding to the sub fault code without the corresponding log burial point, comprises:

4. The method according to claim 3, wherein the preset correspondence among the service identifier, the fault code, and the sub fault code, and the preset correspondence between the sub fault code and the log burying point description information, are stored in a tree structure.

5. The method of claim 1, wherein the log burying point description information further comprises the name of the variable to be collected; and dynamically adding a corresponding log burying point code segment at the corresponding position of the program source code according to the burying point coordinates of the log burying point description information comprises:

loading the program source code into a memory, and adding code segments of the embedded point description information into code lines corresponding to the embedded point coordinates;

compiling the program source codes in the memory into code files which can be identified by a computer;

scanning the code file line by line, after the variable name is analyzed from the code file, allocating a memory for the variable corresponding to the variable name and storing the memory into a thread stack memory where the service system is located, and dynamically adding a static method calling code for log printing into the current line of the code file;

and storing the code file of the memory to the local.

6. The method of claim 5, wherein dynamically loading the program source code to which the log burying point code segment has been added onto the current service platform comprises:

acquiring a loader of a current thread of the current service platform, and loading the code file stored in the local into a memory by using the loader;

and replacing the memory address referenced by the current service platform with the corresponding address of the local code file in the memory in a reflection calling mode.

7. A fault-based log burial point setting device comprises:

a third obtaining unit, configured to obtain a program source code of each service system that is running on the current service platform line;

the embedding unit is used for dynamically adding a corresponding log embedded point code segment into a corresponding position of the program source code according to the embedded point coordinates in the embedded point description information;

8. The apparatus according to claim 7, wherein the first obtaining unit obtains one or more sub fault codes matching the service identifier and the fault code in the service fault information by:

9. The apparatus according to claim 7, wherein the second obtaining unit obtains the log burying point description information corresponding to the sub fault code for which no corresponding log burying point is set by:

10. The apparatus of claim 7, wherein the log burial point description information further comprises: the name of the variable collected; the embedded unit includes:

the source code loading module loads the program source code into a memory;

the source code modification module is used for adding code segments of the buried point description information into a code line corresponding to the buried point coordinates;

the code file modification module is used for distributing a memory for the variable corresponding to the variable name and storing the memory into a thread stack memory where the service system is located after the variable name is analyzed from the code file, and dynamically adding a static method calling code for log printing into the current line of the code file;

11. The apparatus of claim 10, wherein the dynamic loading unit comprises:

the loading module loads the code file stored locally by using the loader;

12. A computing device, comprising:

at least one processor; and

a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1 to 6.

13. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any of claims 1 to 6.

14. A service self-healing system, comprising: a service module, a monitoring module, a positioning module, a decision-making module, a data module, a plan module and an execution plan module, wherein,

the service module runs each service system on the service platform, executes the function of each service system and provides service for users;