CN108427624B

CN108427624B - System stability risk identification method and device

Info

Publication number: CN108427624B
Application number: CN201710075892.5A
Authority: CN
Inventors: 周涛明; 董建峰; 徐旭
Original assignee: Advanced New Technologies Co Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2017-02-13
Filing date: 2017-02-13
Publication date: 2021-03-02
Anticipated expiration: 2037-02-13
Also published as: CN108427624A

Abstract

The application provides a method and device for identifying system stability risks, and relates to the technical field of data identification. The device comprises: a log receiving device for receiving a log of a target system; a system identification device for identifying the associated systems on the target system's link according to the log; and a risk identification device for extracting risk characteristic values of the associated systems and identifying the stability risk of the target system. With this technical solution, the stability risk of the target system is identified automatically, which reduces labor costs and improves the efficiency of stability management.

Description

System stability risk identification method and device

Technical Field

The application belongs to the technical field of data identification, and particularly relates to a method and device for identifying system stability risks.

Background

System stability risk is usually combed through in the following steps: critical link determination -> dependency combing -> risk identification. When the associated systems are all direct dependencies of the target system, the stability formula is as shown in Fig. 1: link stability of the target system = stability of associated system 1 × stability of associated system 2 × … × stability of associated system n. As the formula shows, link stability is directly tied to the direct dependencies, which is why combing link dependencies is a common approach. To improve stability, the link is first combed to determine whether each dependency is a direct dependency; if it is, whether a dual (redundant) link exists is then checked. If redundant links are in place, the stability risk is relatively low. Assessing stability risk generally requires two steps:

First, dependency combing is performed. The combing methods commonly used in the industry require a large amount of manual effort to comb the link, which is costly. Moreover, because individuals differ in experience and ability, some risk points may be missed, causing stability problems.

Then, risk identification is carried out for each dependency. Common risks include:

(1) Single-point and single-link risk: database (DB) single-point risk (capacity risk, and a single-database failure causing all services to become unavailable), single-machine-room risk, and hotspot link risk.

(2) Bypass risk: the impact of non-critical traffic on critical traffic. Because non-critical and critical traffic may share resources, non-critical traffic can directly affect critical links. This risk is typically assessed from the application's overall call volume.

(3) Capacity risk: a gap, if any, between the capacity the service requires and the capacity actually available. Capacity risk is identified partly by combing and partly by full-link stress testing. Assessing it requires, for one thing, obtaining the upper capacity limit of each dependency.

(4) Avalanche risk: when a single point of failure occurs, continued and growing access increases resource consumption until all requests are affected.

(5) Cascading redundancy risk: in a dual-link setup, when the primary link fails, the other link lacks the capacity to absorb the resulting traffic surge and therefore also becomes unavailable.

All of the above risks lead to poor stability. Some prior-art tools can provide data such as inter-system dependencies and call volumes, but they have the following technical problems:

(1) lacking a systematic methodology built into the tooling, most tools still rely on human experience and ability, and manual methods cannot be effectively codified in the tools;

(2) efficiency is low, with heavy manual effort and repeated work.

Disclosure of Invention

In view of this, the present application provides a method and a device for identifying system stability risks, which automatically identify the stability risk of a target system, thereby reducing labor costs and improving the efficiency of stability management.

In order to achieve the above purpose, the present application provides the following technical solutions:

according to a first aspect of the present application, a method for identifying a risk of system stability is provided, including:

receiving a log of a target system;

identifying an associated system on the target system link according to the log;

and extracting the risk characteristic value of the associated system, and identifying the stability risk of the target system.

According to a second aspect of the present application, a system stability risk identification device is proposed, comprising:

the log receiving device is used for receiving a log of a target system;

the system identification device is used for identifying the associated system on the target system link according to the log;

and the risk identification device is used for extracting the risk characteristic value of the associated system and identifying the stability risk of the target system.

According to this technical solution, the log of the target system is analyzed to first identify the corresponding associated systems; the risk characteristic values of the associated systems are then extracted, and the stability risk of the target system is identified from them. System stability risk is thus identified automatically, which reduces repeated work and greatly improves working efficiency.

In order to make the aforementioned and other objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

Fig. 1 shows a schematic diagram of link dependency combing in the prior art;

Fig. 2 shows a block diagram of a system stability risk identification device according to an exemplary embodiment of the present application;

Fig. 3 shows a block diagram of a first implementation of the risk identification device in the system stability risk identification device according to an exemplary embodiment of the present application;

Fig. 4 shows a block diagram of a second implementation of the risk identification device in the system stability risk identification device according to an exemplary embodiment of the present application;

Fig. 5 shows a block diagram of a third implementation of the risk identification device in the system stability risk identification device according to an exemplary embodiment of the present application;

Fig. 6 shows a flowchart of a method for identifying system stability risks according to an exemplary embodiment of the present application;

Fig. 7 shows a flowchart of a first embodiment of step S103 in Fig. 6;

Fig. 8 shows a flowchart of a second embodiment of step S103 in Fig. 6;

Fig. 9 shows a flowchart of a third embodiment of step S103 in Fig. 6;

Fig. 10 is a schematic diagram of a target system and an associated system.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present application.

Although the present application provides method operational steps or apparatus configurations as illustrated in the following examples or figures, more or fewer operational steps or modular units may be included in the methods or apparatus based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution sequence of the steps or the module structure of the apparatus is not limited to the execution sequence or the module structure shown in the embodiment or the drawings of the present application. When the described method or module structure is applied to a practical device or an end product, the method or module structure according to the embodiment or the figures may be executed sequentially or executed in parallel (for example, in the environment of parallel processors or multi-thread processing, or even in the environment of distributed processing).

Fig. 2 is a block diagram illustrating a structure of a system stability risk identification device according to an exemplary embodiment of the present application, and referring to fig. 2, the present application provides a system stability risk identification device including:

The log receiving device 100 is configured to receive a log of a target system.

The system identification device 200 is configured to identify the associated systems on the link of the target system according to the log. In a specific embodiment, the log of the target system records all associated systems called by the target system, so all associated systems on the link can be identified from the log. The number of associated systems may be n, where n is a natural number. Referring to Fig. 10, system A is the target system and system B is an associated system of system A.

A distinctive characteristic of stability risk, as the stability definition formula shows, is that the stability of a target system's overall link equals the product of the stabilities of its associated systems (i.e., its dependencies).
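The product relationship can be illustrated with a short calculation. The following is a minimal sketch rather than part of the described device: the function name `link_stability` and the stability figures are assumptions chosen only for illustration, and each stability is assumed to be expressed as a fraction between 0 and 1.

```python
import math

def link_stability(dependency_stabilities):
    """Return link stability as the product of the direct dependencies' stabilities."""
    for s in dependency_stabilities:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"stability must be within [0, 1], got {s}")
    return math.prod(dependency_stabilities)

# Hypothetical stability figures for four direct dependencies.
stabilities = [0.999, 0.999, 0.9995, 0.99]
print(f"{link_stability(stabilities):.4f}")
```

Because every factor is at most 1, each additional direct dependency can only lower the overall link stability, and the least stable dependency bounds the result from above.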

In a specific embodiment, take a target system that displays product details as an example. The logic of the product detail page depends on calling multiple services to produce the display. Specifically, the log of the target system shows that the product price and product name are displayed by calling the product basic-information server, logistics information is displayed by calling the logistics information server, reviews are displayed by calling the review server, and store information is displayed by calling the store server. Thus, when the target system displays product details, the associated systems include the product basic-information server, the logistics information server, the review server, and the store server.
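How associated systems might be read out of a call log can be sketched as follows. The source does not specify a log format, so the `caller=... callee=...` line layout, the system names, and the function `identify_associated_systems` are all assumptions made for illustration.

```python
import re

# Hypothetical log line layout; the source does not define one.
CALL_PATTERN = re.compile(r"caller=(?P<caller>\S+)\s+callee=(?P<callee>\S+)")

def identify_associated_systems(log_lines, target_system):
    """Collect the distinct systems called by the target system, in first-seen order."""
    associated = []
    for line in log_lines:
        match = CALL_PATTERN.search(line)
        if match and match.group("caller") == target_system:
            callee = match.group("callee")
            if callee not in associated:
                associated.append(callee)
    return associated

# Illustrative log for a product detail page (names are made up).
sample_log = [
    "10:00:01 caller=product-detail callee=product-info-server",
    "10:00:01 caller=product-detail callee=logistics-info-server",
    "10:00:02 caller=product-detail callee=review-server",
    "10:00:02 caller=product-detail callee=store-server",
    "10:00:03 caller=product-detail callee=product-info-server",
    "10:00:03 caller=checkout callee=payment-server",
]
print(identify_associated_systems(sample_log, "product-detail"))
```

Calls made by other systems (here, `checkout`) are ignored, and repeated calls to the same associated system are counted once.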

The risk identification device 300 is configured to extract the risk characteristic value of the associated system and identify the stability risk of the target system.

In one embodiment of the present application, the device further comprises:

a storage and display device, configured to store and display the risk characteristic value of the associated system and the stability risk of the target system. In a specific embodiment, where the target system needs to display product details and the risk identification device identifies the associated store information server as a strong dependency, the strong-dependency risk characteristic value of that associated server is saved and displayed, and the stability risk of the target system's store link is a strong dependency risk.

In the present application, fault injection is performed on all the associated systems of the link one by one to further implement risk identification.

Fig. 3 is a block diagram illustrating a first embodiment of a risk identification device according to an exemplary embodiment of the present application, and referring to fig. 3, the first embodiment of the risk identification device includes:

a system setting module 301, configured to set the associated system as unavailable, specifically by modifying the access address of the associated system or by setting the network of the associated system to a fault state;

a first judging module 302, configured to judge whether the link of the target system is available after the setting and, if not (in a specific embodiment, the link of the target system is considered unavailable if users cannot access the target system), to execute the second judging module;

a second judging module 303, configured to judge whether the associated system is a database, to execute the first identifying module if it is, and to execute the second identifying module if it is not;

a first identifying module 304, configured to identify that the risk characteristic value of the database is a database single point, and the stability risk of the target system is a single point risk;

a second identifying module 305, configured to identify that the risk characteristic value of the associated system is a strong dependency, and the stability risk of the target system is a strong dependency risk.

In the first embodiment, fault injection is used: an associated system (i.e., a dependency) is set as unavailable, either by disconnecting its network or by changing its address to a wrong one. If the entire link becomes unavailable (inaccessible to users) after the fault is injected and the associated system is a database, the risk characteristic value is identified as a database single point and the stability risk of the target system is a single point risk. If the link becomes unavailable and the associated system is not a database, the risk characteristic value is identified as a strong dependency and the stability risk of the target system is a strong dependency risk.
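The decision logic of the first embodiment can be sketched as a small function. This is an illustrative sketch only: the function name `classify_fault_injection_result` and the choice to return `None` when the link stays available are assumptions, since the source does not say what is recorded in that case.

```python
from typing import Optional, Tuple

def classify_fault_injection_result(
    link_available: bool, is_database: bool
) -> Optional[Tuple[str, str]]:
    """Map a fault-injection outcome to (risk characteristic value, stability risk).

    Returns None when the link stays available after the fault is injected
    (an assumption; the source only describes the unavailable case).
    """
    if link_available:
        return None
    if is_database:
        return ("database single point", "single point risk")
    return ("strong dependency", "strong dependency risk")

# The store information server is made unavailable and the link breaks.
print(classify_fault_injection_result(link_available=False, is_database=False))
```

Keeping the classification separate from the fault injection itself means the same rule can be applied whether the outcome comes from a live test or from recorded results.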

In a specific embodiment, where the target system needs to display product details and the risk identification device identifies the associated store information server as a strong dependency, the strong-dependency risk characteristic value of that associated server is extracted, and the stability risk of the target system's store link is a strong dependency risk.

Fig. 4 is a block diagram illustrating a second implementation of the risk identification apparatus in the system stability risk identification device according to an exemplary embodiment of the present application, and referring to fig. 4, the apparatus in the second implementation includes:

a third judging module 306, configured to judge whether there is a cache on the link according to the log, and if yes, execute a fourth judging module;

the fourth judging module 307 is configured to judge whether the associated system is a single-database, execute the third identifying module if the associated system is a single-database, and execute the first log analyzing module if the associated system is not a single-database;

the third identifying module 308 is configured to identify that the risk characteristic value of the associated system is cache breakdown, and the stability risk of the target system is a redundancy cascade risk;

the first log analysis module 309 is configured to analyze the log to obtain an access hotspot of the cached single key value;

a fourth identifying module 310, configured to identify the risk characteristic value of the associated system as a cache hotspot when the access hotspot reaches a preset first threshold, where the stability risk of the target system is a hotspot storage risk.

In the second embodiment, whether a cache exists on the link is judged from the log. When a cache exists and the associated system is a single database, the risk characteristic value of the associated system is identified as cache breakdown, and the stability risk of the target system is a redundancy cascade risk. When a cache exists and the associated system is not a single database, cache hotspots are analyzed from the log: the full-day access log is analyzed, and if the access hotspot of a single cached key exceeds a first threshold (which can be set according to the situation, e.g., 50%), a hotspot storage risk exists.
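The branching of the second embodiment can be sketched as follows. This is a sketch under stated assumptions: the function name `classify_cache_risk` is hypothetical, the access hotspot is assumed to be expressed as the share of accesses going to the single hottest cached key, and "reaches the threshold" is implemented as greater than or equal to.

```python
from typing import Optional, Tuple

def classify_cache_risk(
    has_cache: bool,
    is_single_database: bool,
    hotspot_share: float,
    first_threshold: float = 0.5,  # example value from the text (50%)
) -> Optional[Tuple[str, str]]:
    """Return (risk characteristic value, stability risk), or None if no risk is identified."""
    if not has_cache:
        return None  # this embodiment only applies when a cache is on the link
    if is_single_database:
        return ("cache breakdown", "redundancy cascade risk")
    if hotspot_share >= first_threshold:
        return ("cache hotspot", "hotspot storage risk")
    return None

print(classify_cache_risk(has_cache=True, is_single_database=False, hotspot_share=0.62))
```

Note that the hotspot check is reached only when the associated system is not a single database, mirroring the order of the fourth judging module and the first log analysis module.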

Fig. 5 is a block diagram illustrating a third implementation of the risk identification apparatus in the system stability risk identification device according to an exemplary embodiment of the present application, and referring to fig. 5, in the third implementation, the apparatus includes:

a fifth judging module 311, configured to judge whether there is a database on the link according to the log, and if yes, execute a second log analyzing module;

the second log analysis module 312 is configured to analyze the log to obtain an access hotspot of a single key value of the database;

a fifth identifying module 313, configured to identify, when the access hotspot reaches a preset second threshold, that the risk characteristic value of the associated system is a database hotspot and the stability risk of the target system is a hotspot storage risk.

In the third embodiment, whether a database exists on the link is judged from the log. When a database exists, database hotspots are analyzed from the log: the full-day access log is analyzed, and if the access hotspot of a single database key exceeds a second threshold (which can be set according to the situation, e.g., 50%), a hotspot storage risk exists.
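One way to measure the access hotspot of a single key is the share of all accesses that go to the most frequently accessed key. The source does not define the metric precisely, so this interpretation, the function names `hottest_key_share` and `has_database_hotspot`, and the sample keys are assumptions for illustration.

```python
from collections import Counter

def hottest_key_share(accessed_keys):
    """Return (key, share) for the most accessed key; share is its fraction of all accesses."""
    counts = Counter(accessed_keys)
    if not counts:
        return None, 0.0
    key, hits = counts.most_common(1)[0]
    return key, hits / sum(counts.values())

def has_database_hotspot(accessed_keys, second_threshold=0.5):
    """True when the hottest key's share reaches the (configurable) second threshold."""
    _, share = hottest_key_share(accessed_keys)
    return share >= second_threshold

# Hypothetical full-day access log reduced to the keys that were read.
keys = ["item:1"] * 6 + ["item:2"] * 3 + ["item:3"]
print(hottest_key_share(keys))
print(has_database_hotspot(keys))
```

In this sample, one key receives 6 of 10 accesses, so with a 50% threshold it would be flagged as a database hotspot.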

As described above, the present application provides a system stability risk identification device that analyzes the log of a target system to first identify the corresponding associated systems, then extracts the risk characteristic values of the associated systems and identifies the stability risk of the target system. System stability risk is thus identified automatically, which reduces repeated work and greatly improves working efficiency.

Having described the device of the present application, the method for identifying system stability risks is described next with reference to the drawings. For the implementation of the method, refer to the implementation of the device; repeated details are omitted.

Fig. 6 is a flowchart illustrating a method for identifying a risk of system stability according to an exemplary embodiment of the present application, and referring to fig. 6, the method for identifying a risk of system stability provided by the present application includes:

S101: receiving a log of a target system.

S102: identifying the associated systems on the link of the target system according to the log. In a specific embodiment, the log of the target system records all associated systems called by the target system, so all associated systems on the link can be identified from the log. The number of associated systems may be n, where n is a natural number. Referring to Fig. 10, system A is the target system and system B is an associated system of system A.

S103: extracting the risk characteristic value of the associated system, and identifying the stability risk of the target system.

In one embodiment of the present application, the method further comprises:

S104: storing and displaying the risk characteristic value of the associated system and the stability risk of the target system. In a specific embodiment, where the target system needs to display product details and the risk identification device identifies the associated store information server as a strong dependency, the strong-dependency risk characteristic value of that associated server is saved and displayed, and the stability risk of the target system's store link is a strong dependency risk.

In the present application, fault injection is performed on all the associated systems of the link one by one to realize risk identification.
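Injecting faults one associated system at a time can be sketched as a loop that disables a system, checks the link, and then restores the system. Everything here is a toy model: the address registry, the `invalid://unreachable` placeholder, and the names `system_unavailable` and `find_blocking_dependencies` are assumptions, and a real link check would exercise the actual target system rather than a dictionary.

```python
from contextlib import contextmanager

@contextmanager
def system_unavailable(registry, name):
    """Temporarily replace a system's access address with a wrong one, then restore it."""
    original = registry[name]
    registry[name] = "invalid://unreachable"  # hypothetical stand-in for a wrong address
    try:
        yield
    finally:
        registry[name] = original

def find_blocking_dependencies(registry, link_is_available):
    """Inject a fault into each associated system in turn.

    Returns the systems whose unavailability makes the link unavailable.
    """
    blocking = []
    for name in list(registry):
        with system_unavailable(registry, name):
            if not link_is_available(registry):
                blocking.append(name)
    return blocking

# Toy link check: the page works only if the store server is reachable.
registry = {"store-server": "http://store", "review-server": "http://review"}

def toy_link_check(reg):
    return reg["store-server"].startswith("http://")

print(find_blocking_dependencies(registry, toy_link_check))
```

Restoring each system in a `finally` block keeps one injected fault from leaking into the next test, so every associated system is evaluated in isolation.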

Fig. 7 shows a flowchart of a first embodiment of step S103, please refer to fig. 7, in the first embodiment, the step includes:

S201: setting the associated system as unavailable, specifically by modifying the access address of the associated system or by setting the network of the associated system to a fault state;

S202: judging whether the link of the target system is available after the setting, and if not (in a specific embodiment, the link of the target system is considered unavailable if users cannot access the target system), executing S203;

S203: judging whether the associated system is a database; if so, executing S204, otherwise executing S205;

S204: identifying that the risk characteristic value of the database is a database single point, and the stability risk of the target system is a single point risk;

S205: identifying that the risk characteristic value of the associated system is a strong dependency, and the stability risk of the target system is a strong dependency risk.

Fig. 8 shows a flowchart of a second embodiment of step S103, and referring to fig. 8, in the second embodiment, the step includes:

S301: judging whether a cache exists on the link according to the log, and if so, executing S302;

S302: judging whether the associated system is a single database; if so, executing S303, otherwise executing S304;

S303: identifying that the risk characteristic value of the associated system is cache breakdown, and the stability risk of the target system is a redundancy cascade risk;

S304: analyzing the log to obtain the access hotspot of a single cached key;

S305: when the access hotspot reaches a preset first threshold, identifying the risk characteristic value of the associated system as a cache hotspot, and the stability risk of the target system as a hotspot storage risk.

Fig. 9 shows a flowchart of a third embodiment of step S103. Referring to Fig. 9, in the third embodiment, the step includes:

S401: judging whether a database exists on the link according to the log, and if so, executing S402;

S402: analyzing the log to obtain the access hotspot of a single database key;

S403: when the access hotspot reaches a preset second threshold, identifying the risk characteristic value of the associated system as a database hotspot, and the stability risk of the target system as a hotspot storage risk.

As described above, the application provides a method for identifying system stability risks that analyzes the log of a target system to first identify the corresponding associated systems, then extracts the risk characteristic values of the associated systems and identifies the stability risk of the target system. System stability risk is thus identified automatically, which reduces repeated work and greatly improves working efficiency.

It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.

The units, devices, modules, etc. set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of a plurality of sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the present application has been described with examples, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and permutations without departing from the spirit of the application.

Claims

1. A method for identifying risk of system stability, the method comprising:

receiving a log of a target system;

extracting a risk characteristic value of the correlation system, and identifying a stability risk of the target system; which comprises the following steps:

judging whether a cache exists on the link or not according to the log;

if so, continuously judging whether the associated system is a single-database;

when the judgment result is yes, the risk characteristic value of the correlation system is cache breakdown, and the stability risk of the target system is a redundancy cascade risk;

otherwise, analyzing the log to obtain an access hotspot of the cached single key value;

when the access hotspot reaches a preset first threshold value, the risk characteristic value of the association system is a cache hotspot, and the stability risk of the target system is a hotspot storage risk.

2. The method according to claim 1, wherein the number of the correlation systems is n, and n is a natural number.

3. The method of claim 2, wherein extracting risk feature values of the associated system and identifying stability risks of the target system comprises:

setting the associated system as unavailable;

judging whether the set link of the target system is available;

when the judgment result is negative, continuously judging whether the associated system is a database;

when the judgment result is yes, the risk characteristic value of the database is a database single point, and the stability risk of the target system is a single point risk;

otherwise, the risk characteristic value of the associated system is a strong dependence, and the stability risk of the target system is a strong dependence risk.
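The fault-injection check recited in claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the callbacks `set_unavailable` and `link_available`, the flag `is_database`, and the function name are hypothetical stand-ins for whatever mechanism actually disables the associated system and probes the link.

```python
from typing import Callable, Optional, Tuple


def identify_dependency_risk(
    set_unavailable: Callable[[], None],
    link_available: Callable[[], bool],
    is_database: bool,
) -> Optional[Tuple[str, str]]:
    """Return (risk characteristic value, stability risk), or None if the link survives."""
    # Set the associated system as unavailable (hypothetical callback).
    set_unavailable()

    # If the link of the target system is still available, no risk is reported.
    if link_available():
        return None

    # The link is broken: a database is a single point, anything else
    # is a strong dependence.
    if is_database:
        return ("database single point", "single point risk")
    return ("strong dependence", "strong dependence risk")
```

Passing the two operations as callbacks keeps the classification logic independent of how the fault is injected, for example by modifying an access address or by setting the network to a fault.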

4. The method of claim 3, wherein setting the associated system as unavailable comprises:

modifying an access address of the associated system, or setting the network of the associated system to a fault state.

5. The method of claim 2, wherein extracting risk feature values of the associated system and identifying stability risks of the target system comprises:

judging whether a database exists on the link or not according to the log;

if so, analyzing the log to obtain an access hotspot of the single key value of the database;

when the access hotspot reaches a preset second threshold value, the risk characteristic value of the associated system is a database hotspot, and the stability risk of the target system is a hotspot storage risk.
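The database hotspot check recited in claim 5 can be sketched in the same way. The log line format `"<component> <key>"`, the component label `db`, and the function name are assumptions made for illustration. The claims do not specify a log format or how the access hotspot is measured, so the access count of the hottest key is used here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def identify_database_hotspot(
    log_lines: Iterable[str],
    second_threshold: int,
) -> Optional[Tuple[str, str, str]]:
    """Return (hot key, risk characteristic value, stability risk), or None."""
    counts: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        # Hypothetical format: "<component> <key>"; count database accesses only.
        if len(parts) == 2 and parts[0] == "db":
            counts[parts[1]] += 1

    # No database access in the log: no database found on the link.
    if not counts:
        return None

    # Compare the hottest single key against the preset second threshold.
    key, hits = counts.most_common(1)[0]
    if hits >= second_threshold:
        return (key, "database hotspot", "hotspot storage risk")
    return None
```

Returning the hot key alongside the two risk labels is a design choice of this sketch, so that the stored result can point to the record that caused the finding.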

6. The method according to any one of claims 3 to 5, further comprising:

storing and displaying the risk characteristic value of the associated system and the stability risk of the target system.

7. An apparatus for identifying risk of system stability, the apparatus comprising:

the log receiving device is used for receiving a log of a target system;

the risk identification device is used for extracting a risk characteristic value of the associated system and identifying the stability risk of the target system;

wherein the risk identification device comprises:

the third judging module is used for judging whether a cache exists on the link according to the log, and if so, the fourth judging module is executed;

the fourth judging module is used for judging whether the associated system is a single database, and if so, executing the third identification module, otherwise executing the first log analysis module;

the third identification module is configured to identify that the risk characteristic value of the associated system is cache breakdown, and the stability risk of the target system is a redundancy cascade risk;

the first log analysis module is used for analyzing the log to obtain an access hotspot of a single key value of the cache;

and the fourth identification module is used for identifying, when the access hotspot reaches a preset first threshold, that the risk characteristic value of the associated system is a cache hotspot and the stability risk of the target system is a hotspot storage risk.

8. The apparatus of claim 7, wherein the number of the associated systems is n, and n is a natural number.

9. The apparatus of claim 8, wherein the risk identification device comprises:

a system setting module for setting the associated system as unavailable;

the first judging module is used for judging whether the link of the target system is available after the setting, and executing the second judging module when the judgment result is negative;

the second judging module is used for judging whether the associated system is a database, and if so, executing the first identification module, otherwise executing the second identification module;

the first identification module is used for identifying that the risk characteristic value of the database is a database single point, and the stability risk of the target system is a single point risk;

the second identification module is used for identifying that the risk characteristic value of the associated system is a strong dependence, and the stability risk of the target system is a strong dependence risk.

10. The apparatus of claim 9, wherein the system setting module setting the associated system as unavailable comprises: the system setting module modifying an access address of the associated system, or setting the network of the associated system to a fault state.

11. The apparatus of claim 8, wherein the risk identification device comprises:

the fifth judging module is used for judging whether a database exists on the link according to the log, and if so, executing the second log analysis module;

the second log analysis module is used for analyzing the log to obtain an access hotspot of the single key value of the database;

and the fifth identification module is used for identifying, when the access hotspot reaches a preset second threshold, that the risk characteristic value of the associated system is a database hotspot and the stability risk of the target system is a hotspot storage risk.

12. The apparatus according to any one of claims 9 to 11, further comprising:

and the storage and display device is used for storing and displaying the risk characteristic value of the associated system and the stability risk of the target system.