WO2023207689A1 - Change risk assessment method and apparatus, and storage medium - Google Patents

Change risk assessment method and apparatus, and storage medium

Info

Publication number
WO2023207689A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
change
target
alarm information
level
Prior art date
Application number
PCT/CN2023/089099
Other languages
French (fr)
Chinese (zh)
Inventor
吕彪
戚依宁
王绍哲
党浩
方崇荣
祝顺民
蒋江伟
程鹏
陈积明
Original Assignee
浙江大学
Priority date
Filing date
Publication date
Application filed by 浙江大学
Publication of WO2023207689A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design

Definitions

  • This application relates to the field of cloud technology, and in particular to a change risk assessment method, equipment and storage medium.
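  • An embodiment of this application provides a change risk assessment method, including:
  • in response to a risk assessment instruction, determining the change impact scope corresponding to a target change event based on topology information preset in the cloud network, where the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • collecting multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
  • determining, according to the topology information, the alarm impact scope corresponding to each piece of alarm information, where the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • selecting, from the multiple pieces of alarm information, at least one piece of target alarm information adapted to the target change event, where the alarm impact scope corresponding to the target alarm information overlaps the change impact scope; and
  • performing a risk assessment on the target change event based on the at least one piece of target alarm information.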
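  • An embodiment of the present application also provides a computing device, including a memory and a processor;
  • the memory is used to store one or more computer instructions;
  • the processor is coupled to the memory and executes the one or more computer instructions to:
  • in response to a risk assessment instruction, determine the change impact scope corresponding to a target change event based on topology information preset in the cloud network, where the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • collect multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
  • determine, according to the topology information, the alarm impact scope corresponding to each piece of alarm information, where the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • select, from the multiple pieces of alarm information, at least one piece of target alarm information adapted to the target change event, where the alarm impact scope corresponding to the target alarm information overlaps the change impact scope; and
  • perform a risk assessment on the target change event based on the at least one piece of target alarm information.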
  • Embodiments of the present application also provide a computer-readable storage medium that stores computer instructions.
  • When the computer instructions are executed by one or more processors, the one or more processors are caused to execute the aforementioned change risk assessment method.
  • The change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This ensures that the observation range of the change risk assessment is large enough and helps improve the accuracy of the risk assessment. Alarm information can also be introduced as the basis for change risk assessment. By collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, potential alarms that have not yet appeared in the cloud network can be discovered in a timely manner.
  • Figure 1a is a schematic flow chart of a change risk assessment method provided by an exemplary embodiment of the present application
  • Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application
  • Figure 2a is a logical schematic diagram of network scope expansion of a change object provided by an exemplary embodiment of the present application
  • Figure 2b is a logical schematic diagram of extending the network range of an alarm occurrence object provided by an exemplary embodiment of the present application
  • Figure 3 is a logical schematic diagram of a solution for selecting target alarm information for target change events provided by an exemplary embodiment of the present application
  • Figure 4 is a schematic diagram of the effect of a change impact scope after matching provided by an exemplary embodiment of the present application
  • Figure 5 is a logical schematic diagram of a risk threshold determination scheme provided by an exemplary embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application.
  • The change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This ensures that the observation scope of the change risk assessment is large enough and helps improve the accuracy of the risk assessment. Alarm information can also be introduced as the basis for change risk assessment. By collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, problems that have not yet appeared in the cloud network can be discovered promptly.
  • Figure 1a is a schematic flowchart of a change risk assessment method provided by an exemplary embodiment of the present application.
  • The method can be executed by a change risk assessment device.
  • The change risk assessment device can be implemented as software, hardware, or a combination of the two, and can be integrated in a computing device. Referring to Figure 1a, the method may include:
  • Step 100: In response to the risk assessment instruction, determine the change impact scope corresponding to the target change event based on the topology information preset in the cloud network.
  • The topology information includes the affiliation relationships between objects at different levels in the cloud network and the association relationships between objects at the same level; the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level;
  • Step 101: Collect multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
  • Step 102: Determine the alarm impact scope corresponding to each of the multiple pieces of alarm information according to the topology information.
  • The alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • Step 103: Select, from the multiple pieces of alarm information, at least one piece of target alarm information that is adapted to the target change event, where the alarm impact scope corresponding to the target alarm information overlaps the change impact scope;
  • Step 104: Perform a risk assessment on the target change event based on the at least one piece of target alarm information.
  • the change risk assessment method provided in this embodiment can be applied to operation and maintenance scenarios of changes in cloud networks.
  • the cloud network may refer to a software-defined computing network in a cloud computing infrastructure. Of course, the definition here is only in a narrow sense.
  • the cloud network in this embodiment may generally refer to a network architecture built based on cloud technology. In this embodiment, operations such as code modification, configuring new functions, and bug fixes performed on functional components that occur in the cloud network can be called changes.
  • The change risk assessment method provided in this embodiment can be applied to the testing phase before a change is launched, replacing the existing testing method. It can also be used to keep tracking and evaluating the change after it is launched, so that change risks that went undetected in the testing phase are discovered promptly, thereby effectively avoiding cloud failures caused by changes.
  • the target change event in this embodiment can be used to refer to any change that occurs in the cloud network.
  • A change object is usually specified in a change.
  • For example, a certain gateway component in the cloud network can be specified as the change object.
  • Preferably, multiple change objects in the same change event are located under the same level object. For example, gateway component A and switch component B can be specified in the above-mentioned bug-fix change, with gateway component A and switch component B located in the same availability zone (level object).
  • Of course, this is only a preference, and this embodiment is not limited to it.
  • Multiple change objects in the same change event need not satisfy the above restriction.
  • In that case, it is only necessary to group the change objects within the change event, analyze the change risk group by group, and then combine the analysis results to assess the risk of the change event.
  • For ease of description, the preferred implementation above is used by default to define the target change event in the remainder of this text.
  • The risk assessment instruction can be triggered periodically or according to other trigger conditions.
  • In this way, the change risk assessment logic shown in Figure 1a is executed repeatedly for the duration of the change event, for example every 10 seconds, so that the change event is continuously tracked and evaluated.
  • The end time of the tracking assessment can be set as needed. For example, the assessment can stop a preset time period after the change event begins, or stop once the risk assessment result corresponding to the change event has stayed below a preset standard for a preset time period. This embodiment does not limit this.
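  • The two example stop conditions above can be sketched as a small predicate. This is a minimal sketch, not the patent's implementation: `TrackingPolicy`, `should_stop`, and every numeric default other than the 10-second period are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackingPolicy:
    interval_s: float = 10.0          # evaluation period (the text's example: every 10 seconds)
    max_duration_s: float = 3600.0    # hypothetical: stop this long after the change begins
    quiet_threshold: float = 1.0      # hypothetical "preset standard" for the risk value
    quiet_duration_s: float = 600.0   # hypothetical: how long the value must stay below it


def should_stop(policy, elapsed_s, history):
    """Decide whether tracking of a change event can end.

    history is a list of (timestamp_s, risk_value) pairs, oldest first,
    with timestamps measured from the start of the change event.
    """
    # Condition 1: a preset period after the change began has elapsed.
    if elapsed_s >= policy.max_duration_s:
        return True
    # Condition 2: the risk value has stayed below the standard long enough.
    quiet_since = None
    for ts, value in history:
        if value < policy.quiet_threshold:
            if quiet_since is None:
                quiet_since = ts
        else:
            quiet_since = None  # the quiet streak is broken
    return quiet_since is not None and elapsed_s - quiet_since >= policy.quiet_duration_s
```

  • A scheduler would call `should_stop` after each evaluation round and keep triggering risk assessment instructions every `interval_s` seconds until it returns `True`.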
  • the change impact scope corresponding to the target change event can be reasonably expanded in response to the risk assessment instruction.
  • the expansion of the impact scope of the change refers to extending the impact scope of the target change event from the change object to a larger scope.
  • For example, if the change object is a physical gateway, the original change impact scope of the target change event is a single device in the cloud network. It can be extended step by step to the cluster to which the device belongs, then to the availability zone to which the device belongs, then to the region to which the device belongs, and so on.
  • the scope of change influence of the target change event can be expanded, and a large enough observation range is provided for the target change event. Observing the target change event within a large enough observation range can effectively improve the accuracy of risk assessment.
  • multi-level objects can be divided into the cloud network, and the topology information in the cloud network can be predefined based on the multi-level objects.
  • The multi-level objects can include but are not limited to instances, network elements, applications, devices, clusters, availability zones, regions, etc.
  • topology information can include affiliation relationships between objects at different levels and associations between objects at the same level.
  • An exemplary affiliation relationship can be that the device belongs to the cluster, the cluster belongs to the availability zone, and the availability zone belongs to the region;
  • An exemplary association relationship between objects of the same level can be a resource association between different instances, or a resource association between an instance and an application. This embodiment does not limit the specific relationship logic contained in the topology information.
  • The specifications of objects at different levels are different. For example, the specifications of a region are larger than those of an availability zone. The specifications of objects at the same level can be the same or similar. For example, instances and applications are objects of the same level.
  • this embodiment does not limit this.
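  • One possible in-memory form of such topology information is sketched below, purely for illustration. The `Topology` class, the `KIND_TIER` table, and all object names are assumptions; the only facts taken from the text are the example affiliation chain (device, cluster, availability zone, region) and that instances and applications sit at the same level.

```python
from dataclasses import dataclass, field

# Hypothetical level table: kinds with the same number count as the same level.
KIND_TIER = {
    "instance": 0, "application": 0,
    "device": 1, "cluster": 2, "availability_zone": 3, "region": 4,
}


@dataclass
class Topology:
    kind: dict = field(default_factory=dict)     # object -> kind name
    parent: dict = field(default_factory=dict)   # object -> superior object (affiliation)
    peers: dict = field(default_factory=dict)    # object -> associated same-level objects

    def add(self, obj, kind, parent=None):
        """Register an object and, optionally, the object it belongs to."""
        self.kind[obj] = kind
        if parent is not None:
            self.parent[obj] = parent

    def associate(self, a, b):
        """Record a symmetric association between two objects of the same level."""
        if KIND_TIER[self.kind[a]] != KIND_TIER[self.kind[b]]:
            raise ValueError("associations are only recorded between same-level objects")
        self.peers.setdefault(a, set()).add(b)
        self.peers.setdefault(b, set()).add(a)


# Hypothetical example data.
topo = Topology()
topo.add("region-1", "region")
topo.add("az-a", "availability_zone", parent="region-1")
topo.add("cluster-1", "cluster", parent="az-a")
topo.add("device-1", "device", parent="cluster-1")
topo.add("instance-1", "instance", parent="device-1")
topo.add("app-1", "application", parent="device-1")
topo.associate("instance-1", "app-1")
```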
  • Based on the topology information in the cloud network, a change object or an alarm occurrence object can be expanded to a larger scope of influence, including expansion within the same level and expansion to a higher level.
  • The level objects to which the change object belongs can be searched level by level according to the topology information preset in the cloud network, so as to obtain at least one layer of superior objects corresponding to the change object; objects of the same level that are associated with the change object can also be searched for. The change impact scope determined in this way can thus include at least one layer of superior objects corresponding to the change object of the target change event and the associated same-level objects.
  • An object of a specified level can be used as the end condition.
  • That is, the level objects to which the change object belongs can be searched level by level, and the search ends once the object of the specified level is found, yielding at least one layer of superior objects corresponding to the change object of the target change event.
  • The object at the specified level may be a region.
  • This is only exemplary and is not limited in this embodiment.
  • the obtained change impact scope can be expanded to record multiple levels of objects that may be affected by the target change event.
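  • The level-by-level search with a specified end level can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `expand_change_scope`, the dictionary layout, and the example objects are hypothetical, and associated same-level objects are simply listed without being expanded further.

```python
def expand_change_scope(change_obj, parent, kind, peers, stop_kind="region"):
    """Walk the affiliation chain upward from the change object.

    parent maps an object to its superior object, kind maps an object to its
    level name, and peers maps an object to its associated same-level objects.
    The walk ends once an object of stop_kind has been reached.
    """
    superiors = []
    node = change_obj
    while kind[node] != stop_kind and node in parent:
        node = parent[node]
        superiors.append(node)
    return {
        "object": change_obj,
        "superiors": superiors,
        "associated": sorted(peers.get(change_obj, ())),
    }


# Hypothetical example data: a gateway device inside a cluster, zone and region.
PARENT = {"gateway-1": "cluster-1", "cluster-1": "az-a", "az-a": "region-1"}
KIND = {
    "gateway-1": "device", "gateway-2": "device", "cluster-1": "cluster",
    "az-a": "availability_zone", "region-1": "region",
}
PEERS = {"gateway-1": {"gateway-2"}}

scope = expand_change_scope("gateway-1", PARENT, KIND, PEERS)
```

  • Passing a lower `stop_kind`, such as `"availability_zone"`, ends the search earlier and yields a narrower observation range.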
  • Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application.
  • the change event can be obtained from the change system in the cloud network, and the change object will be specified in the change event.
  • this embodiment also innovatively proposes to introduce alarm information as the basis for change risk assessment.
  • Mature monitoring systems are usually deployed in cloud networks.
  • The monitoring system is used to monitor the operating status of various points in the cloud network, such as traffic status, packet loss status, delay status, etc. A large amount of alarm information is generated in the monitoring system.
  • In this embodiment, the alarm information generated by the monitoring system in the cloud network can be collected and used as the basis for change risk assessment.
  • The change risk assessment work can be triggered periodically or by other trigger conditions.
  • Each time a risk assessment instruction is triggered, the alarm information newly generated in the cloud network since the last risk assessment can be collected and used as the basis for the current change risk assessment.
  • Monitoring systems in cloud networks usually adopt a single-point monitoring method.
  • The monitored objects are usually at the instance, device, network element, application and similar levels.
  • As a result, the alarm occurrence objects in the alarm information are usually single points.
  • In step 101, multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs may be collected.
  • In step 102, the alarm impact scope corresponding to each of the multiple pieces of alarm information can be determined according to the topology information.
  • The alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated same-level objects.
  • the alarm information may include information such as the alarm occurrence object, alarm level, and alarm description content.
  • the alarm occurrence object refers to the object in the cloud network where abnormal conditions occur, and the abnormal conditions on the objects trigger the monitoring system to send out alarm information.
  • the level objects of alarm occurrence objects in different alarm information may be different.
  • the level objects of alarm occurrence objects may include but are not limited to the aforementioned instance, device, network element, application and other levels.
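  • As an illustration of steps 101 and 102 operating on such alarm records, the sketch below models an alarm with the fields named above and filters alarms to the preset time range after the change event. The `Alarm` class, the field names, the `"P1"`-style level strings, and `collect_alarms` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    obj: str          # alarm occurrence object, e.g. an instance or a device
    level: str        # alarm level as reported by the monitoring system, e.g. "P1"
    description: str  # alarm description content
    ts: float         # time the alarm was raised, in seconds (hypothetical field)


def collect_alarms(alarms, change_ts, window_s):
    """Keep the alarms raised within window_s seconds after the change event."""
    return [a for a in alarms if change_ts <= a.ts <= change_ts + window_s]


# Hypothetical example data.
ALARMS = [
    Alarm("eip-1", "P2", "packet loss", ts=90.0),
    Alarm("ecs-7", "P1", "high latency", ts=130.0),
    Alarm("avs-device-2", "P3", "traffic drop", ts=900.0),
]
recent = collect_alarms(ALARMS, change_ts=100.0, window_s=300.0)
```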
  • the original scope of influence corresponding to the alarm information can be expanded so that the alarm information can cover a larger scope of influence, which can assign the alarm information to the expanded scope of influence.
  • The impact scope of a change may span components.
  • The abnormalities caused by the change may appear at only a few points, while the abnormalities it causes at other points may not yet be visible.
  • By extending the scope of influence of the alarm information, an alarm discovered at a single point is reasonably expanded to a larger scope, so that potential alarms that have not yet appeared within the expanded alarm scope are fully uncovered. These potential alarms can then take part in the change risk assessment process, providing a more comprehensive basis for the change risk assessment work.
  • step 100 and step 102 may be performed synchronously, and the order is not limited in this embodiment.
  • the logic of expanding the alarm impact scope is basically similar to the logic of expanding the change impact scope. The details of the expansion operation of the alarm impact scope will not be repeated here.
  • In step 103, at least one piece of target alarm information adapted to the target change event can be selected from the multiple pieces of alarm information.
  • The change impact scope and the alarm impact scopes can be analyzed for overlap, and the alarm information whose alarm impact scope overlaps the change impact scope can be used as the target alarm information. That is to say, the alarm impact scope of the target alarm information needs to overlap the change impact scope of the target change event, and the overlapping part contains at least one level object.
  • the observation range of the target change event is expanded, and the coverage of the alarm information is expanded.
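  • If each impact scope is represented as a plain set of object identifiers, the overlap test of step 103 reduces to a set intersection. The sketch below assumes that representation; `select_target_alarms` and the example identifiers are hypothetical.

```python
def select_target_alarms(change_scope, alarm_scopes):
    """Return, per alarm id, the objects shared with the change impact scope.

    Alarms whose scope shares no object with the change scope are dropped,
    so every returned overlap contains at least one level object.
    """
    targets = {}
    for alarm_id, scope in alarm_scopes.items():
        overlap = scope & change_scope
        if overlap:
            targets[alarm_id] = overlap
    return targets


# Hypothetical example data.
CHANGE_SCOPE = {"avs-device-1", "avs-cluster-1", "az-a", "region-1"}
ALARM_SCOPES = {
    "alarm-1": {"ecs-7", "avs-device-1", "avs-cluster-1", "az-a", "region-1"},
    "alarm-2": {"xgw-cluster-3", "az-a", "region-1"},
    "alarm-3": {"ecs-9", "avs-device-8", "az-z", "region-2"},
}
targets = select_target_alarms(CHANGE_SCOPE, ALARM_SCOPES)
```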
  • This embodiment describes the matching process in step 103 from the perspective of the target change event, but it should be understood that this embodiment does not limit the primary and secondary roles of alarm information and change events in the matching process. Target alarm information can be searched for among multiple pieces of alarm information from the perspective of each change event, or matching change events can be searched for among multiple change events from the perspective of each piece of alarm information, in which case the alarm information naturally becomes the target alarm information corresponding to the matched change event. Moreover, the matching operations may be performed synchronously or asynchronously, which is not limited in this embodiment.
  • a risk assessment can be performed on the target change event based on the at least one piece of target alarm information.
  • whether there is a risk in the target change event can be assessed by analyzing at least one target alarm information.
  • multiple implementation methods can be used to perform change risk assessment on target change events based on target alarm information. The specific implementation methods will be described in detail in subsequent embodiments.
  • the change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This can ensure that the observation scope of the change risk assessment work is large enough and help improve the accuracy of the risk assessment.
  • Alarm information can also be introduced as the basis for change risk assessment. By collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, potential alarms that have not yet appeared in the cloud network can be discovered in a timely manner and fully included in the change risk assessment process. On this basis, target alarm information that matches the target change event can also be found by judging whether the alarm impact scope overlaps the change impact scope.
  • Since the alarm information can cover the entire network, the alarm information matching scheme proposed in this embodiment allows alarm information from components other than the change object to be introduced into the risk assessment of the target change event. This can effectively solve the current difficulty of evaluating changes across components (different components are usually the responsibility of different departments; at present, change testing is usually carried out only by the department responsible for the change object, and the departments responsible for other related components are not even aware of the change).
  • the topology information in the cloud network may adopt a tree structure, that is, the topology information in the cloud network may be represented in the form of a topology tree.
  • the topology tree can follow a hierarchical structure, and objects of different levels are reasonably distributed in each layer of the topology tree.
  • For example, the layer above device-class objects can be cluster-class objects, the layer above those can be availability-zone-class objects, and the layer above those can be region-class objects.
  • A solution for determining the change impact scope may be: according to the tree structure corresponding to the topology information, organize the at least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level into a change topology tree with the change object as the root node, to represent the change impact scope.
  • Figure 2a is a logical schematic diagram for determining the change impact scope provided by an exemplary embodiment of the present application. Referring to Figure 2a, the change object in the target change event is an AVS device.
  • The AVS device can be extended level by level to the AVS cluster, availability zone and region to which it belongs, thereby obtaining the change impact scope, that is, the change topology tree on the far right of Figure 2a.
  • Figure 2a only shows the topology tree structure when the change object is a device.
  • If the initial level object to which the change object belongs is of another kind, the scope can be expanded adaptively according to the topology information preset in the cloud network.
  • A solution for determining the alarm impact scope may be: according to the tree structure corresponding to the topology information, organize the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated objects of the same level into an alarm topology tree with the alarm occurrence object as the root node, to represent the alarm impact scope.
  • Figure 2b is a logical schematic diagram for determining the alarm impact scope provided by an exemplary embodiment of the present application.
  • Referring to Figure 2b, the first alarm information is an EIP instance alarm. In this case, the alarm occurrence object in the first alarm information, that is, the EIP instance, can be found first, and the ECS instance that has a resource association with the EIP instance can be found based on the topology information preset in the cloud network. Next, the XGW cluster that hosts the EIP instance and the AVS device that hosts the ECS instance can be found. Then, in a manner similar to Figure 2a, the XGW cluster is extended level by level to the availability zone and region to which it belongs, and the AVS device is extended level by level to the AVS cluster, availability zone and region to which it belongs, thereby obtaining the alarm impact scope, that is, the alarm topology tree on the far right of Figure 2b.
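  • The expansion just described can be traced with a small sketch that emits the edges of the alarm topology tree. The identifiers, the edge labels, and the function `alarm_topology_edges` are hypothetical; the sketch also assumes, for brevity, that the XGW cluster and the AVS cluster sit in the same availability zone, which the text does not state.

```python
# Affiliation chain (object -> superior object); all identifiers are hypothetical.
PARENT = {
    "eip-instance": "xgw-cluster",
    "ecs-instance": "avs-device",
    "avs-device": "avs-cluster",
    "xgw-cluster": "az",
    "avs-cluster": "az",   # assumption: both clusters share one availability zone
    "az": "region",
}
# Same-level resource associations of the alarm occurrence object.
ASSOCIATED = {"eip-instance": ["ecs-instance"]}


def alarm_topology_edges(alarm_obj):
    """Build the edge list of the alarm topology tree for one alarm object."""
    edges = []
    starts = [alarm_obj]
    # First expand within the same level through resource associations.
    for peer in ASSOCIATED.get(alarm_obj, []):
        edges.append((alarm_obj, peer, "resource-association"))
        starts.append(peer)
    # Then climb the affiliation chain from each starting object.
    seen = set()
    for start in starts:
        node = start
        while node in PARENT:
            edge = (node, PARENT[node], "belongs-to")
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
            node = PARENT[node]
    return edges


edges = alarm_topology_edges("eip-instance")
nodes = {n for a, b, _ in edges for n in (a, b)}
```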
  • Multi-level objects are divided in the cloud network, and multiple actual objects may exist under each level object.
  • For example, instance 1, instance 2, instance 3 and other instances may exist under the instance-class level object, and several availability zones such as availability zone A and availability zone B may exist under the availability-zone-class level object.
  • In the process of constructing the change topology tree for the target change event, it is only necessary to locate the change object in the topology information of the cloud network according to the level object to which it belongs. Based on the topology information, the actual objects at other levels that have a topological relationship with the change object can then be determined.
  • When both the change impact scope and the alarm impact scope are represented by topology trees, then in the process of selecting at least one piece of target alarm information adapted to the target change event from the multiple pieces of alarm information, if an alarm topology tree and the change topology tree have overlapping tree nodes, it can be determined that the corresponding alarm impact scope and the change impact scope overlap.
  • FIG. 3 is a logical schematic diagram of a solution for selecting target alarm information for a target change event provided by an exemplary embodiment of the present application.
  • In Figure 3, the left side is the change topology tree corresponding to the target change event, and the right side shows the alarm topology trees corresponding to two pieces of alarm information. Both alarm topology trees have tree nodes that overlap with the change topology tree, so both pieces of alarm information in Figure 3 can be determined as target alarm information corresponding to the target change event. In this way, the topology trees allow the target alarm information adapted to the target change event to be determined conveniently, quickly and accurately.
  • Using topology trees to represent the change impact scope and the alarm impact scope not only presents, clearly and comprehensively, the at least one level of objects added by the scope expansion operation, but also presents topology information such as affiliation and resource association between the expanded objects and the change object.
  • Of course, other methods can be used to characterize the change impact scope and the alarm impact scope. For example, a data structure of the form [first-level objects, second-level objects...; topology information between level objects] can be used, or a collection-plus-tag representation can be used. This embodiment is not limited to these.
  • FIG. 4 is a schematic diagram of the effect of a change impact scope after matching provided by an exemplary embodiment of the present application.
  • After matching, the at least one piece of target alarm information is associated with the change impact scope corresponding to the target change event.
  • Specifically, each piece of target alarm information is associated with the smallest overlapping level object in the change impact scope.
  • For example, the AVS device 2 node and the AVS device 3 node in the change topology tree are each associated with alarm information; there is also a piece of P1-level target alarm information whose smallest level object overlapping with the change topology tree is AVS cluster 1, so that target alarm information can be associated with the AVS cluster 1 node in the change topology tree.
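  • A minimal sketch of this association step is given below, assuming impact scopes are sets of object identifiers and that each object's level is known. The `TIER` ordering, `attach_alarms`, and the example alarms are hypothetical and only loosely modeled on the situation described for Figure 4.

```python
# Hypothetical ordering of levels, lowest first.
TIER = {"device": 1, "cluster": 2, "availability_zone": 3, "region": 4}


def attach_alarms(change_scope, alarm_scopes, kind):
    """Attach each matching alarm to its smallest overlapping level object.

    Returns a mapping from a node of the change impact scope to the ids of
    the target alarms associated with that node.
    """
    attached = {}
    for alarm_id, scope in alarm_scopes.items():
        common = change_scope & scope
        if not common:
            continue  # not a target alarm for this change event
        # Lowest level first; the name breaks ties deterministically.
        lowest = min(common, key=lambda obj: (TIER[kind[obj]], obj))
        attached.setdefault(lowest, []).append(alarm_id)
    return attached


# Hypothetical example data.
KIND = {
    "avs-device-2": "device", "avs-device-3": "device",
    "avs-cluster-1": "cluster", "az-a": "availability_zone", "region-1": "region",
}
CHANGE_SCOPE = set(KIND)
ALARM_SCOPES = {
    "alarm-a": {"ecs-7", "avs-device-2", "avs-cluster-1", "az-a", "region-1"},
    "alarm-b": {"ecs-8", "avs-device-3", "avs-cluster-1", "az-a", "region-1"},
    "alarm-p1": {"avs-cluster-1", "az-a", "region-1"},
    "alarm-far": {"xgw-cluster-9", "az-z", "region-2"},
}
attached = attach_alarms(CHANGE_SCOPE, ALARM_SCOPES, KIND)
```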
  • various implementation methods may be used to perform risk assessment on target change events based on at least one piece of target alarm information.
  • In one implementation, the alarm level recorded in each piece of target alarm information can be obtained; the risk assessment value corresponding to the target change event is calculated based on the alarm levels of the at least one piece of target alarm information; and if the risk assessment value meets a preset condition, it is determined that the target change event carries a risk.
  • the alarm level is the existing information in the alarm information, which is used to represent the severity, impact, etc. of the corresponding abnormal event.
  • the alarm level can be extracted from at least one target alarm information, and the extracted alarm level can be used as the basis for change risk assessment. This makes the calculation logic of risk assessment values more concise and clever.
  • The risk assessment value can be used to characterize the risk level of the target change event.
  • The higher the risk assessment value, the higher the risk level of the target change event, and the higher the likelihood and severity of the cloud failure it may cause.
  • In one implementation, the degree of correlation between each piece of target alarm information and the target change event can be determined; a weight is assigned to each piece of target alarm information according to its degree of correlation; and the risk assessment value corresponding to the target change event is calculated based on the alarm level and weight of each piece of target alarm information.
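  • The text does not fix a formula at this point, so the following is only one plausible reading: map each alarm level to a numeric severity, multiply by the alarm's weight, sum the products, and compare the sum with a threshold. The `SEVERITY` table, the weighted-sum form, the threshold value, and both function names are assumptions made for illustration.

```python
# Hypothetical mapping from alarm level to a numeric severity score.
SEVERITY = {"P1": 8.0, "P2": 4.0, "P3": 2.0, "P4": 1.0}


def risk_value(target_alarms):
    """Weighted sum over (alarm_level, weight) pairs of target alarms."""
    return sum(SEVERITY[level] * weight for level, weight in target_alarms)


def is_risky(target_alarms, threshold=10.0):
    """Hypothetical preset condition: the risk value reaches the threshold."""
    return risk_value(target_alarms) >= threshold


# Hypothetical example: one closely correlated P1 alarm and one weaker P3 alarm.
example = [("P1", 1.0), ("P3", 0.5)]
value = risk_value(example)
```

  • Other aggregations, such as taking the maximum weighted severity, would fit the same description; the choice here is for brevity only.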
  • the degree of correlation is used to represent the degree of fit between the alarm impact scope and the change impact scope.
  • the target alarm information is assigned a higher degree of correlation.
  • the smallest-level object in the overlap between the change impact scope and the alarm impact scope of each piece of target alarm information can be found; a correlation degree is then assigned to each piece of target alarm information according to the level distance between its smallest-level object and the change object.
  • the level distance is essentially the number of levels in the affiliation relationship between the smallest level object and the change object.
  • the lowest tree node in the overlap between the change impact scope and the alarm impact scope of each piece of target alarm information can be found, where a single tree node corresponds to one level object; the level distance of each piece of target alarm information is then determined from the topological distance, in the change topology tree that represents the change impact scope, between its lowest tree node and the change object.
  • pieces of target alarm information whose lowest tree nodes occupy the same position in the change topology tree have the same level distance.
  • the lowest tree node overlapping between the alarm topology tree on the upper right and the change topology tree on the left is AVS device 2; while the lowest tree node overlapping between the alarm topology tree on the lower right and the change topology tree on the left is available.
  • the target alarm information corresponding to the alarm topology tree on the upper right will obtain a higher degree of correlation than the target alarm information corresponding to the alarm topology tree on the lower right.
  • level distance can be used to express the degree of correlation.
  • the level distance between the alarm topology tree on the upper right and the change topology tree is 1, so its corresponding target alarm information can be assigned a correlation degree of 1; the level distance between the alarm topology tree on the lower right and the change topology tree is 2, so its corresponding target alarm information can be assigned a correlation degree of 2.
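The level-distance idea can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the application: the affiliation relationships are stored in a hypothetical `parent_of` map, the alarm impact scope is a plain set of object names, and same-level associated objects are ignored. All object names are invented for illustration.

```python
from typing import Dict, Optional, Set


def level_distance(change_object: str,
                   parent_of: Dict[str, str],
                   alarm_scope: Set[str]) -> Optional[int]:
    """Walk upward from the change object and return how many affiliation
    levels separate it from the first object that also lies in the alarm
    impact scope. Returns None when the two scopes do not overlap."""
    node: Optional[str] = change_object
    distance = 0
    while node is not None:
        if node in alarm_scope:
            return distance
        node = parent_of.get(node)  # move one level up; None at the top
        distance += 1
    return None


# Hypothetical topology: device -> cluster -> zone -> region.
parent_of = {"device-1": "cluster-1", "cluster-1": "zone-a", "zone-a": "region-1"}

near = level_distance("device-1", parent_of, {"device-2", "cluster-1", "zone-a"})
far = level_distance("device-1", parent_of, {"cluster-9", "zone-a", "region-1"})
none = level_distance("device-1", parent_of, {"cluster-7", "zone-b"})
```

In this sketch an alarm that shares the cluster with the change object gets a smaller distance than one that only shares the availability zone, and an alarm with no shared object is not a target alarm at all.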
  • a weight can be assigned to at least one target alarm information according to the degree of correlation.
  • target alarm information with a higher degree of correlation can be assigned a higher weight to reflect its higher reference role for target change events.
  • target alarm information with the same degree of correlation can be assigned the same weight.
  • the weight can be calculated according to the following formula:
  • p is the above-mentioned correlation degree (which can be represented by topological distance)
  • ⁇ and ⁇ are empirical parameters
  • q is the alarm proportion in the level object corresponding to the current level distance.
  • the alarm ratio is used to represent the proportion of the number of objects matching alarm information within the same level within the scope of change to the total number of objects within that level.
  • the alarm ratio of this level can be 1 (that is, 100%); for the same reason, the three AVS devices under AVS cluster 1 in the level above all match alarm information, so the alarm ratio there can also be 1; in availability zone a, one level further up, only AVS cluster 1 matches alarm information, so the alarm ratio of availability zone a can be 1/21 (that is, only 1 of the 21 clusters in the availability zone matches alarm information); and at the next level up, the alarm ratio can be 1/5.
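The alarm ratio described above is a simple proportion, which the following sketch computes for one level. The function name and object names are hypothetical; only the 21-cluster and 3-device figures come from the example in the text.

```python
from fractions import Fraction
from typing import Iterable, Set


def alarm_ratio(objects_in_level: Iterable[str], alarmed: Set[str]) -> Fraction:
    """Share of the objects in one level of the change impact scope
    that match at least one piece of alarm information."""
    objects = list(objects_in_level)
    if not objects:
        raise ValueError("the level contains no objects")
    matched = sum(1 for obj in objects if obj in alarmed)
    return Fraction(matched, len(objects))


# 21 clusters in the availability zone, only one of them matches an alarm.
clusters = [f"cluster-{i}" for i in range(1, 22)]
ratio_zone = alarm_ratio(clusters, {"cluster-1"})

# All three devices under the cluster match an alarm.
devices = ["device-1", "device-2", "device-3"]
ratio_cluster = alarm_ratio(devices, set(devices))
```

Using `Fraction` keeps the ratios exact (1/21 and 1), which makes them easy to compare against the figures in the example.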
  • the initial risk value assigned to the alarm level of each piece of target alarm information can be obtained; based on the weight of each piece of target alarm information, a weighted sum of the initial risk values of the target alarm information is computed; and the risk assessment value corresponding to the target change event is determined from the result of the weighted sum.
  • an exemplary solution for assigning initial risk values to different alarm levels may be: determining the basic risk values corresponding to different alarm levels; counting the historical frequency of occurrence of different alarm levels in the cloud network; assigning adjustment coefficients to the different alarm levels based on their respective historical frequencies; and, for each alarm level, weighting its basic risk value by its adjustment coefficient to obtain the initial risk value corresponding to that alarm level.
  • a higher adjustment coefficient can be assigned to them, so that their initial risk values are higher and their impact on the final risk assessment value will be greater.
  • the basic risk value is fine-tuned by taking into account the historical frequency of occurrence of different alarm levels in the cloud network.
  • the initial risk value of the corresponding alarm level of at least one target alarm information can be obtained.
  • v represents the risk assessment value of the target change event
  • v p represents the sum of risk assessment values caused by all target alarm information with a topological distance of p
  • p represents the aforementioned topological distance
  • f(p) represents the weight of the target alarm information with a correlation degree of p.
  • x represents the initial risk value corresponding to each target alarm information with a correlation degree of p. It can be seen that the risk assessment value of the target change event is equal to the weighted sum of the initial risk values of all target alarm information plus the initial risk value of all target alarm information.
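The weighted-sum step can be sketched as follows. This is only an illustration under stated assumptions: the weight formula itself is not reproduced in this text, so the weight is passed in as a caller-supplied function, and `toy_weight` is a made-up placeholder rather than the formula of the application. Each alarm is modelled as a hypothetical `(p, x)` pair of topological distance and initial risk value.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def risk_assessment_value(alarms: Iterable[Tuple[int, float]],
                          weight_of: Callable[[int], float]) -> float:
    """Group target alarms by topological distance p, sum the initial risk
    values x within each group, and add up the groups weighted by weight_of(p)."""
    by_distance: Dict[int, List[float]] = defaultdict(list)
    for p, x in alarms:
        by_distance[p].append(x)
    return sum(weight_of(p) * sum(xs) for p, xs in by_distance.items())


def toy_weight(p: int) -> float:
    """Placeholder weight (an assumption): closer alarms count more."""
    return 1.0 / (1 + p)


# Two alarms at distance 1 and one alarm at distance 2.
alarms = [(1, 4.0), (1, 2.0), (2, 3.0)]
v = risk_assessment_value(alarms, toy_weight)
```

Passing the weight as a function keeps the sketch independent of the actual formula, which also depends on the empirical parameters and the alarm ratio q.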
  • the target alarm information that matches the target change event in the cloud network can be used as the basis for risk assessment, and the degree to which each piece of target alarm information participates in calculating the risk assessment value can be set according to its degree of correlation with the target change event. The target alarm information is thus considered comprehensively, which avoids one-sided judgments about the risk of the target change event based on a small amount of target alarm information.
  • some alarm information in the cloud network may be caused by user behavior, and such alarm information is difficult to filter out accurately.
  • even when alarm information caused by user behavior participates in the calculation of the risk assessment value, it is usually local and temporary, so its degree of participation in the calculation is low. Its influence on the final risk assessment value is thereby weakened, which effectively avoids misjudgments of risk caused by user behavior.
  • the above implementation can associate target alarm information with target change events and comprehensively consider, through dimensions such as alarm ratio, weight, adjustment coefficient, and initial risk value, the degree of influence each piece of target alarm information should have on the risk assessment value, so that the target alarm information is analyzed reasonably to obtain the risk assessment value.
  • a risk threshold can also be set, and the aforementioned preset condition can be that the risk assessment value exceeds the risk threshold.
  • if the risk assessment value calculated for the target change event exceeds the risk threshold, it can be determined that the target change event is risky.
  • a reminder notification can be issued; the reminder notification can be output to the operation and maintenance personnel for the operation and maintenance personnel to confirm the handling plan for the target change event, for example, it can be to suspend the change or modify the change online, etc.
  • this embodiment is not limited here.
  • An exemplary solution for determining the risk threshold may be: continuously collecting the risk assessment values calculated for historical change events in the cloud network as assessment value samples; performing distribution fitting on the collected assessment value samples according to the number of times each risk assessment value has been recorded, to obtain a fitting function; and selecting the risk threshold based on the fitting function.
  • Figure 5 is a logical schematic diagram of a risk threshold determination scheme provided by an exemplary embodiment of the present application.
  • the risk assessment values in the assessment value samples can be used as the X-axis and the number of times each risk assessment value has been recorded as the Y-axis, to obtain the distribution data of the risk assessment values and generate the corresponding distribution fitting function.
  • if the number of evaluation value samples is not lower than the specified number, the evaluation value samples are sorted by distribution based on the fitting function, the target evaluation value sample that matches the preset false alarm rate is selected from the sorted samples, and the risk evaluation value corresponding to that sample is used as the risk threshold; if the number of evaluation value samples is lower than the specified number, the evaluation value samples are sorted by distribution based on the fitting function, the target evaluation value sample that matches the preset cumulative probability of the distribution is selected from the sorted samples, and the risk evaluation value corresponding to that sample is used as the risk threshold.
  • the number of evaluation value samples in Figure 5 is below 500 (the specified number mentioned above). If there are 100 evaluation value samples and the preset cumulative probability of the distribution is 99%, the risk threshold is calculated to be 9.4. A target change event whose risk assessment value is higher than 9.4 is then deemed risky.
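Selecting a threshold from a cumulative probability can be sketched as below. This is a simplification and an assumption: it takes the empirical quantile of the sorted samples directly instead of fitting a distribution function as the text describes, and the function name and sample data are invented, so it does not reproduce the 9.4 figure of the example.

```python
import math
from typing import Sequence


def select_threshold(samples: Sequence[float], cumulative_probability: float) -> float:
    """Return the smallest sample value such that at least the requested
    share of all samples is less than or equal to it."""
    if not samples:
        raise ValueError("no evaluation value samples collected")
    if not 0.0 < cumulative_probability <= 1.0:
        raise ValueError("cumulative_probability must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(cumulative_probability * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical history: 100 recorded risk assessment values 1.0 .. 100.0.
samples = [float(i) for i in range(1, 101)]
threshold = select_threshold(samples, 0.99)


def is_risky(value: float) -> bool:
    """A change is flagged when its value exceeds the selected threshold."""
    return value > threshold
```

With a 99% cumulative probability, roughly 1% of historical change events would have exceeded the threshold, which is how the cumulative probability relates to an acceptable false alarm rate.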
  • the risk assessment value corresponding to the target change event can be calculated simply, efficiently, and accurately based on the target alarm information, and whether the target change event is risky can be determined by judging whether the risk assessment value exceeds the risk threshold. In this way, the risk of change events can be discovered in a timely manner during the change testing phase or the operation phase after the change is launched, thereby effectively avoiding the failures that the change may bring to the cloud.
  • the steps of the method provided in the above embodiments may all be executed by the same device, or the method may be executed by different devices.
  • Some of the processes described in the above embodiments and drawings include multiple operations that appear in a specific order, but it should be clearly understood that these operations may not be performed in the order in which they appear in this article or may be performed in parallel.
  • The serial numbers of the operations, such as 101 and 102, are only used to distinguish different operations and do not themselves represent any execution order. Additionally, these processes may include more or fewer operations, and the operations may be performed sequentially or in parallel.
  • FIG. 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application. As shown in FIG. 6 , the computing device includes: a memory 60 and a processor 61 .
  • the processor 61 is coupled to the memory 60 and is used to execute the computer program in the memory 60 for:
  • In response to the risk assessment instruction, determine the change impact scope corresponding to the target change event based on the topology information preset in the cloud network.
  • the topology information includes the affiliation relationships between objects at different levels in the cloud network and the association relationships between objects at the same level.
  • the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • the processor 61 in the process of determining at least one layer of superior objects corresponding to the change objects of the target change event included in the change impact scope, is configured to:
  • in the topology information, search level by level for the level objects to which the change object belongs, ending the search when an object of the specified level is found, to obtain at least one layer of superior objects corresponding to the change object of the target change event;
  • in the topology information, search level by level for the level objects to which the alarm occurrence object belongs, ending the search when an object of the specified level is found, to obtain at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information.
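The level-by-level search for superior objects can be sketched as follows. The `parent_of` and `level_of` maps, the level names, and the object names are all hypothetical; the sketch only shows the upward walk that stops at a specified level and leaves out same-level associated objects.

```python
from typing import Dict, List


def superior_objects(start: str,
                     parent_of: Dict[str, str],
                     level_of: Dict[str, str],
                     stop_level: str) -> List[str]:
    """Collect the objects that `start` belongs to, one level at a time,
    and stop once an object of `stop_level` has been collected."""
    chain: List[str] = []
    node = parent_of.get(start)
    while node is not None:
        chain.append(node)
        if level_of.get(node) == stop_level:
            break  # the specified level was reached, end the search
        node = parent_of.get(node)
    return chain


parent_of = {"device-1": "cluster-1", "cluster-1": "zone-a", "zone-a": "region-1"}
level_of = {"cluster-1": "cluster", "zone-a": "zone", "region-1": "region"}

scope_up_to_zone = superior_objects("device-1", parent_of, level_of, "zone")
```

The same function serves both the change object and an alarm occurrence object, since the two searches differ only in their starting point.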
  • the topology information adopts a tree structure
  • the processor 61 can also be used to:
  • At least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level are organized into a change topology tree with the change object as the root node to represent the scope of change influence;
  • At least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects at the same level are organized into an alarm topology tree with the alarm occurrence object as the root node to represent the scope of alarm influence.
  • the processor 61 in the process of risk assessment of a target change event based on at least one piece of target alarm information, is configured to:
  • if the risk assessment value is higher than the risk threshold, it is determined that the target change event is risky.
  • the processor 61 in the process of calculating the risk assessment value corresponding to the target change event based on the corresponding alarm level of at least one item of target alarm information, is used to:
  • the processor 61 in the process of determining the degree of correlation between at least one piece of target alarm information and the target change event, is configured to:
  • a correlation degree is assigned to at least one piece of target alarm information based on the level distance between the minimum level object corresponding to each of the at least one piece of target alarm information and the change object.
  • the processor 61 in the process of searching for the smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to at least one piece of target alarm information, is used to:
  • if the change impact scope and the alarm impact scope are represented by topology trees, search for the lowest tree node in the overlap between the change impact scope and the alarm impact scope corresponding to each piece of target alarm information, where a single tree node corresponds to one level object;
  • the process of determining level distance is used for:
  • the processor 61 in the process of calculating the risk assessment value corresponding to the target change event based on the corresponding alarm level and weight of at least one piece of target alarm information, is used to:
  • the processor 61 in the process of assigning initial risk values to different alarm levels, is used to:
  • the corresponding basic risk values are weighted according to the corresponding adjustment coefficients to obtain the initial risk values corresponding to different alarm levels.
  • the processor 61 can also be used to:
  • the risk threshold is selected.
  • the computing device also includes: a communication component 62, a power supply component 63 and other components. Only some components are schematically shown in FIG. 6 , which does not mean that the computing device only includes the components shown in FIG. 6 .
  • embodiments of the present application also provide a computer-readable storage medium storing a computer program.
  • when the computer program is executed, it can implement each step that can be executed by a computing device in the above method embodiments.
  • the memory in Figure 6 above is used to store computer programs, and can be configured to store various other data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the communication component in Figure 6 mentioned above is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access wireless networks based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the power supply component in Figure 6 above provides power to the various components of the device in which the power supply component is located.
  • a power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which the power component resides.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media includes both persistent and non-volatile, removable and non-removable media that can be implemented by any method or technology for storage of information.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transient computer-readable media (transitory media), such as modulated data signals and carrier waves.

Abstract

Provided in the embodiments of the present application are a change risk assessment method and apparatus, and a storage medium. The change influence range of a target change event can be rationally expanded, such that it can be ensured that the observation range of change risk assessment work is sufficiently large, which is conducive to improving the accuracy of risk assessment; alarm information can also be introduced to serve as a basis for change risk assessment, the alarm influence range of each piece of alarm information is rationally expanded, such that potential alarms which have not yet appeared in a cloud network can be discovered in a timely manner, and these potential alarms are fully involved in a change risk assessment process; and target alarm information which matches the target change event can also be discovered by means of determining whether there is an overlapping part between the alarm influence range and the change influence range, such that the expanded change influence range can be corrected to a more accurate range, and required alarm information can be accurately and comprehensively hit to calculate a change risk value. Therefore, the risk of a change can be efficiently and accurately assessed.

Description

A change risk assessment method, device and storage medium
This application claims priority to Chinese patent application No. 202210459479.X, filed with the China Patent Office on April 27, 2022 and entitled "A change risk assessment method, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud technology, and in particular to a change risk assessment method, device and storage medium.
Background
With the development of cloud technology, clouds keep growing in scale and functions are updated more and more frequently. Operations performed on functional components in a cloud network, such as modifying code, configuring new functions, and fixing bugs, can be called changes.
At present, operation and maintenance personnel rigorously analyze and test a change before submitting it, and a change that passes the tests is considered risk-free. However, the real cloud environment may differ from the test environment in scale, software and hardware versions, workload, component interactions, and other respects, so the test results for a change are not sufficiently accurate. Changes that have passed testing may still be defective, and once such defective changes go online they may cause catastrophic failures in the cloud.
Summary
Various aspects of this application provide a change risk assessment method, device and storage medium for assessing the risk of a change more reasonably and more accurately.
An embodiment of this application provides a change risk assessment method, including:
in response to a risk assessment instruction, determining a change impact scope corresponding to a target change event according to topology information preset in a cloud network, where the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and the associated objects at the same level;
collecting multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
determining, according to the topology information, the alarm impact scope corresponding to each of the multiple pieces of alarm information, where the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated objects at the same level;
selecting, from the multiple pieces of alarm information, at least one piece of target alarm information that matches the target change event, where the alarm impact scope corresponding to the target alarm information overlaps with the change impact scope;
performing a risk assessment on the target change event based on the at least one piece of target alarm information.
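The selection step in the method above, which keeps only alarms whose impact scope overlaps the change impact scope, can be sketched as follows. Modelling each scope as a plain set of object names is an assumption made for illustration, and the alarm identifiers and object names are hypothetical.

```python
from typing import Dict, List, Set


def select_target_alarms(change_scope: Set[str],
                         alarm_scopes: Dict[str, Set[str]]) -> List[str]:
    """Return the identifiers of the alarms whose impact scope shares
    at least one object with the change impact scope."""
    return [alarm_id
            for alarm_id, scope in alarm_scopes.items()
            if change_scope & scope]  # a non-empty intersection means overlap


change_scope = {"device-1", "cluster-1", "zone-a"}
alarm_scopes = {
    "alarm-1": {"device-2", "cluster-1", "zone-a"},  # same cluster
    "alarm-2": {"device-9", "cluster-7", "zone-b"},  # unrelated zone
    "alarm-3": {"cluster-4", "zone-a"},              # same zone only
}

targets = select_target_alarms(change_scope, alarm_scopes)
```

Alarms that share no object with the change impact scope are dropped before any risk value is calculated, so they cannot influence the assessment.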
An embodiment of this application also provides a computing device, including a memory and a processor;
the memory is used to store one or more computer instructions;
the processor is coupled to the memory and is used to execute the one or more computer instructions for:
in response to a risk assessment instruction, determining a change impact scope corresponding to a target change event according to topology information preset in a cloud network, where the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and the associated objects at the same level;
collecting multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
determining, according to the topology information, the alarm impact scope corresponding to each of the multiple pieces of alarm information, where the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated objects at the same level;
selecting, from the multiple pieces of alarm information, at least one piece of target alarm information that matches the target change event, where the alarm impact scope corresponding to the target alarm information overlaps with the change impact scope;
performing a risk assessment on the target change event based on the at least one piece of target alarm information.
An embodiment of this application also provides a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to execute the aforementioned change risk assessment method.
In the embodiments of this application, the change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network, which ensures that the observation range of the change risk assessment is large enough and helps improve the accuracy of the assessment. Alarm information can also be introduced as the basis for change risk assessment: by collecting the alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, potential alarms that have not yet appeared in the cloud network can be discovered in a timely manner and fully involved in the change risk assessment process. On this basis, target alarm information that matches the target change event can be found by judging whether the alarm impact scope and the change impact scope overlap. In this way, the expanded change impact scope can be corrected to a more accurate scope, and the required alarm information can be hit accurately and comprehensively to calculate the change risk value. As a result, the risk of a change can be assessed more comprehensively, efficiently, and accurately.
附图说明Description of the drawings
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the attached picture:
Figure 1a is a schematic flowchart of a change risk assessment method provided by an exemplary embodiment of the present application;
Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application;
Figure 2a is a logical schematic diagram of expanding the network scope of a change object, provided by an exemplary embodiment of the present application;
Figure 2b is a logical schematic diagram of expanding the network scope of an alarm occurrence object, provided by an exemplary embodiment of the present application;
Figure 3 is a logical schematic diagram of a solution for selecting target alarm information for a target change event, provided by an exemplary embodiment of the present application;
Figure 4 is a schematic diagram of the effect of a change impact scope after matching, provided by an exemplary embodiment of the present application;
Figure 5 is a logical schematic diagram of a risk threshold determination solution provided by an exemplary embodiment of the present application;
Figure 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Currently, the risk of a change is assessed by testing, but the results of this approach are not accurate enough, which may lead to catastrophic failures in the cloud. To this end, in some embodiments of the present application, the change impact scope of a target change event can be reasonably expanded based on topology information preset in the cloud network. This ensures that the observation range used for change risk assessment is large enough, which helps improve the accuracy of the assessment. Alarm information can also be introduced as a basis for change risk assessment: by collecting the alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each item of alarm information based on the topology information, potential alarms that have not yet surfaced in the cloud network can be discovered in time and fully taken into account in the change risk assessment. On this basis, target alarm information matching the target change event can be found by determining whether an alarm impact scope overlaps the change impact scope. In this way, the expanded change impact scope can be corrected to a more accurate scope, and the required alarm information can be hit accurately and comprehensively in order to calculate the change risk value. As a result, the risk of a change can be assessed more comprehensively, efficiently, and accurately.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
Figure 1a is a schematic flowchart of a change risk assessment method provided by an exemplary embodiment of the present application. The method can be executed by a change risk assessment apparatus, which can be implemented as software, hardware, or a combination of the two, and which can be integrated in a computing device. Referring to Figure 1a, the method may include:
Step 100: in response to a risk assessment instruction, determine the change impact scope corresponding to a target change event according to topology information preset in the cloud network, where the topology information contains the affiliation relationships between objects of different levels in the cloud network and the association relationships between objects of the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event as well as the associated objects of the same level;
Step 101: collect multiple items of alarm information generated in the cloud network within a preset time range after the target change event occurs;
Step 102: determine, according to the topology information, the alarm impact scope corresponding to each of the multiple items of alarm information, where an alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information as well as the associated objects of the same level;
Step 103: select, from the multiple items of alarm information, at least one item of target alarm information that fits the target change event, where the alarm impact scope corresponding to the target alarm information overlaps the change impact scope;
Step 104: perform a risk assessment on the target change event based on the at least one item of target alarm information.
The change risk assessment method provided by this embodiment can be applied to operation and maintenance scenarios involving changes in a cloud network. Here, a cloud network may refer to the software-defined computing network in a cloud computing infrastructure. This definition is only the narrow sense; in this embodiment, the cloud network may refer generally to any network architecture built on cloud technology. In this embodiment, operations performed on functional components in the cloud network, such as modifying code, configuring new functions, and fixing bugs, may be referred to as changes. The method provided by this embodiment can be applied in the test phase before a change goes live, replacing existing test methods. It can also be applied after a change goes live to keep tracking and assessing the change, so that change risks missed in the test phase are discovered in time and cloud failures caused by changes are effectively avoided.
As the scale of the cloud keeps growing, the number of changes occurring in the cloud network is huge: the changes caused by version releases and routine operation and maintenance in a single day alone may exceed ten thousand. The target change event in this embodiment may refer to any single change occurring in the cloud network. A change usually specifies a change object; for example, a bug-fix change may specify a gateway component in the cloud network as the change object. In this embodiment, a change event may specify one or more change objects. When there are multiple change objects, they share at least one layer of common superior level object. For example, the bug-fix change above may specify gateway component A and switch component B, where gateway component A and switch component B are located in the same availability zone (a level object). This is only a preferred arrangement, and this embodiment is not limited to it: the multiple change objects in one change event need not satisfy this constraint. In that case, it is sufficient to group the change objects within the change event according to the above requirement, analyze the change risk group by group, and then combine the analysis results to obtain the risk assessment of the change event. For ease of explanation, the preferred arrangement above is used by default to define the target change event in the rest of this description.
In this embodiment, the risk assessment instruction can be triggered periodically or by other trigger conditions. It should be understood that the change risk assessment logic shown in Figure 1a is executed repeatedly for the duration of the change event, so that the change event is tracked and assessed, for example once every 10 s. When tracking ends can be set as needed. For example, the assessment can stop a preset duration after the change event starts, or stop once the risk assessment result of the change event has stayed below a preset standard for a preset duration, and so on. This embodiment does not limit this.
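The tracking behavior described above can be sketched as a simple loop. This is a minimal illustration, not the apparatus's actual implementation: the function name `track_change`, its parameters, and the use of simulated time (no real sleeping) are all assumptions made for the example. Only the 10 s interval and the two stop conditions come from the text.

```python
from typing import Callable, List


def track_change(
    assess: Callable[[], float],
    interval_s: float = 10.0,       # re-assess every 10 s, as in the example above
    max_duration_s: float = 600.0,  # hypothetical: stop this long after the change starts
    risk_threshold: float = 0.5,    # hypothetical "preset standard" for the risk value
    calm_duration_s: float = 60.0,  # hypothetical: how long risk must stay low before stopping
) -> List[float]:
    """Repeatedly assess a change event and return the risk values observed.

    Time is simulated by counting intervals, so the sketch is deterministic.
    """
    results: List[float] = []
    elapsed = 0.0   # simulated time since the change event started
    calm_for = 0.0  # how long the risk has stayed below the threshold
    while elapsed < max_duration_s:
        risk = assess()
        results.append(risk)
        # Reset the calm timer whenever the risk rises to the threshold or above.
        calm_for = calm_for + interval_s if risk < risk_threshold else 0.0
        if calm_for >= calm_duration_s:
            break  # risk stayed low long enough: stop tracking
        elapsed += interval_s
    return results
```

In a real deployment, `assess` would run steps 100 to 104 once, and the loop would wait between iterations instead of counting simulated seconds.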
On this basis, referring to Figure 1a, in step 100 the change impact scope corresponding to the target change event can be reasonably expanded in response to the risk assessment instruction. Expanding the change impact scope means extending the impact scope of the target change event from the change object to a larger scope. For example, when the change object is a physical gateway, the original change impact scope of the target change event is a single device in the cloud network. It can be extended step by step to the cluster to which the device belongs, then to the availability zone to which the device belongs, then to the region to which the device belongs, and so on. Expanding the change impact scope in this way provides a sufficiently large observation range for the target change event, and observing the target change event within such a range can effectively improve the accuracy of the risk assessment.
In this embodiment, objects of multiple levels can be defined in the cloud network, and the topology information of the cloud network can be predefined based on them. The level objects may include, but are not limited to, instances, network elements, applications, devices, clusters, availability zones, and regions. In the cloud network, the topology information can contain the affiliation relationships between objects of different levels and the association relationships between objects of the same level. An example of an affiliation relationship is that a device belongs to a cluster, a cluster belongs to an availability zone, and an availability zone belongs to a region. An example of an association relationship between objects of the same level is a resource association between different instances, or between an instance and an application. This embodiment does not limit the specific relationship logic contained in the topology information. In this embodiment, objects of different levels differ in size; for example, a region is larger than an availability zone. Objects of the same level may be of the same or similar size; for example, instances and applications are objects of the same level. This is only exemplary and does not limit this embodiment. Based on the topology information in the cloud network, a change object or an alarm occurrence object can be expanded to a larger impact scope, which covers both expansion within the same level and expansion to higher levels.
On this basis, in this embodiment, when determining the change impact scope corresponding to the target change event, the level objects to which the change object belongs can be looked up level by level according to the topology information preset in the cloud network, to obtain at least one layer of superior objects corresponding to the change object. Objects of the same level that are associated with the change object can also be looked up. The change impact scope thus determined can include at least one layer of superior objects corresponding to the change object of the target change event as well as the associated objects of the same level. In this embodiment, an object of a specified level can serve as the end condition when looking up the superior objects: the level objects to which the change object belongs are looked up level by level until an object of the specified level is found, at which point the lookup ends and at least one layer of superior objects corresponding to the change object has been obtained. For example, the object of the specified level may be a region. This is only exemplary and does not limit this embodiment.
In this way, the change impact scope obtained by expansion records the objects at multiple levels that the target change event may affect.
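The level-by-level lookup with a specified end level can be sketched as follows. The dictionaries `PARENT`, `LEVEL`, and `ASSOCIATED`, the object names, and the function names are hypothetical stand-ins for the preset topology information. Including the change object itself in the returned scope is likewise an assumption of this sketch.

```python
from typing import Dict, List, Set

# Hypothetical affiliation relationships: object -> the superior object it belongs to.
PARENT: Dict[str, str] = {
    "avs-device-1": "avs-cluster-1",
    "avs-cluster-1": "zone-a",
    "zone-a": "region-1",
}
# Hypothetical level of each object.
LEVEL: Dict[str, str] = {
    "avs-device-1": "device",
    "avs-device-2": "device",
    "avs-cluster-1": "cluster",
    "zone-a": "availability_zone",
    "region-1": "region",
}
# Hypothetical same-level association relationships.
ASSOCIATED: Dict[str, Set[str]] = {"avs-device-1": {"avs-device-2"}}


def superiors(obj: str, stop_level: str = "region") -> List[str]:
    """Walk up the affiliation chain until an object of stop_level is reached."""
    chain: List[str] = []
    current = obj
    while current in PARENT:
        current = PARENT[current]
        chain.append(current)
        if LEVEL.get(current) == stop_level:
            break  # end condition: an object of the specified level was found
    return chain


def change_impact_scope(change_obj: str, stop_level: str = "region") -> Set[str]:
    """Change object + its superior objects + associated same-level objects."""
    scope = {change_obj}
    scope.update(superiors(change_obj, stop_level))
    scope.update(ASSOCIATED.get(change_obj, set()))
    return scope
```

For instance, `superiors("avs-device-1", stop_level="availability_zone")` stops at `zone-a` and never reaches the region.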
Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application. Referring to Figure 1b, change events can be obtained from the change system in the cloud network, and each change event specifies its change object.
Referring to Figure 1b, this embodiment also proposes introducing alarm information as a basis for change risk assessment. A mature monitoring system is usually deployed in a cloud network to monitor the running state at each point in the network, such as traffic state, packet loss state, and latency state, and it generates a large amount of alarm information. In this embodiment, the alarm information generated by the monitoring system in the cloud network can be collected and used as a basis for change risk assessment. As mentioned above, change risk assessment can be triggered periodically or by other trigger conditions. Each time a risk assessment instruction is triggered, the alarm information newly generated in the cloud network since the end of the previous risk assessment can be collected as the basis for the current assessment.
Currently, for the sake of monitoring precision, monitoring systems in cloud networks usually use single-point monitoring, and the monitored objects are usually at the instance, device, network element, or application level. Accordingly, the alarm occurrence object in an item of alarm information is usually a single point. To this end, in this embodiment, referring to Figure 1a, in step 101 multiple items of alarm information generated in the cloud network within a preset time range after the target change event occurs can be collected.
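Step 101 amounts to a time-window filter over the monitoring system's alarms. The sketch below assumes a hypothetical `Alarm` record (occurrence object, level, description, timestamp) and a hypothetical `collect_alarms` helper. The actual alarm format of a given monitoring system is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alarm:
    obj: str          # alarm occurrence object, e.g. a single instance or device
    level: str        # alarm level (the naming scheme is an assumption)
    description: str  # alarm description content
    timestamp: float  # time the alarm was raised, in seconds


def collect_alarms(alarms: Iterable[Alarm], change_start: float, window_s: float) -> List[Alarm]:
    """Keep alarms raised within window_s seconds after the change event started."""
    return [a for a in alarms if change_start <= a.timestamp <= change_start + window_s]
```

When assessment is triggered repeatedly, `change_start` could instead be the end time of the previous assessment, so that only newly generated alarms are collected.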
On this basis, in step 102, the alarm impact scope corresponding to each of the multiple items of alarm information can be determined according to the topology information. An alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information as well as the associated objects of the same level. An item of alarm information may contain the alarm occurrence object, the alarm level, the alarm description, and similar information. The alarm occurrence object is the object in the cloud network on which an abnormal condition occurred and triggered the monitoring system to issue the alarm information. The alarm occurrence objects in different items of alarm information may be at different levels, including but not limited to the instance, device, network element, and application levels mentioned above. In this embodiment, the original impact scope of an item of alarm information can be expanded so that the alarm information covers a larger impact scope; in effect, the alarm is attributed to the expanded scope. In many cases, when a change is defective, its impact may cross components. Within a short time, the anomalies caused by the change may have surfaced at only a few points, while the anomalies it causes at other points have not yet appeared. For this reason, this embodiment expands the impact scope of the alarm information, reasonably extending an alarm found at a single point to a larger scope. This makes it possible to discover potential alarms in the expanded alarm impact scope that have not yet surfaced and to take them fully into account in the change risk assessment, giving the assessment a more comprehensive basis.
In practice, the scope expansion operations in step 100 and step 102 can be performed concurrently; this embodiment does not limit their order. The logic for expanding an alarm impact scope is essentially the same as the logic for expanding the change impact scope, so the details are not repeated here.
On this basis, referring to Figures 1a and 1b, in step 103 at least one item of target alarm information that fits the target change event can be selected from the multiple items of alarm information. Here, the change impact scope and the alarm impact scopes can be analyzed for overlap, and the alarm information whose alarm impact scope overlaps the change impact scope is taken as target alarm information. That is, the alarm impact scope of the target alarm information must overlap the change impact scope of the target change event, with the overlapping part containing at least one level object. As mentioned above, this embodiment enlarges both the observation range of the target change event and the coverage of the alarm information. By analyzing the expanded change impact scope and alarm impact scopes for overlap, the level objects in the change impact scope where alarms may occur can be determined quickly, which corrects the expanded change impact scope to a more accurate scope. Because the corrected scope takes into account both the alarms that have already surfaced in the cloud network and the potential alarms covered by step 102 that have not yet surfaced, it not only keeps the change risk assessment comprehensive and accurate but also narrows the observation range of the change and reduces the amount of computation.
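When each impact scope is represented as a set of object identifiers, the overlap analysis reduces to set intersection. The following is a minimal sketch under that assumption; the names `select_target_alarms` and `corrected_scope`, the alarm identifiers, and the choice to define the corrected scope as the union of all overlaps are illustrative, not taken from the source.

```python
from typing import Dict, Set


def select_target_alarms(
    change_scope: Set[str], alarm_scopes: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return, for each alarm whose scope overlaps the change scope, the overlapping objects."""
    matched: Dict[str, Set[str]] = {}
    for alarm_id, scope in alarm_scopes.items():
        overlap = change_scope & scope
        if overlap:  # the overlap must contain at least one level object
            matched[alarm_id] = overlap
    return matched


def corrected_scope(matched: Dict[str, Set[str]]) -> Set[str]:
    """Assumed correction rule: keep only the objects shared with some matched alarm."""
    return set().union(*matched.values())


change_scope = {"avs-device-1", "avs-cluster-1", "zone-a", "region-1"}
alarm_scopes = {
    "alarm-1": {"eip-1", "xgw-cluster-1", "zone-a", "region-1"},
    "alarm-2": {"ecs-9", "zone-b", "region-2"},
}
matched = select_target_alarms(change_scope, alarm_scopes)
```

Here only `alarm-1` is selected, because its scope shares `zone-a` and `region-1` with the change scope, while `alarm-2` lies entirely in another region.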
It is worth noting that this embodiment describes the matching process in step 103 from the perspective of the target change event. It should be understood, however, that this embodiment does not fix which of the alarm information and the change event plays the primary role in matching. Target alarm information can be sought among multiple items of alarm information from the perspective of each change event, or a matching change event can be sought among multiple change events from the perspective of each item of alarm information, in which case that alarm information becomes target alarm information for the matched change event. The matching operations can be synchronous or asynchronous; this embodiment does not limit this.
After at least one item of target alarm information that fits the target change event has been determined, in step 104 a risk assessment can be performed on the target change event based on the at least one item of target alarm information. In this embodiment, whether the target change event carries risk can be assessed by analyzing the at least one item of target alarm information. Various implementations can be used to assess the change risk of the target change event based on the target alarm information; they are described in detail in subsequent embodiments.
Accordingly, in this embodiment, the change impact scope of the target change event can be reasonably expanded based on topology information preset in the cloud network. This ensures that the observation range used for change risk assessment is large enough, which helps improve the accuracy of the assessment. Alarm information can also be introduced as a basis for change risk assessment: by collecting the alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each item of alarm information based on the topology information, potential alarms that have not yet surfaced in the cloud network can be discovered in time and fully taken into account in the change risk assessment. On this basis, target alarm information matching the target change event can be found by determining whether an alarm impact scope overlaps the change impact scope. In this way, the expanded change impact scope can be corrected to a more accurate scope, and the required alarm information can be hit accurately and comprehensively in order to calculate the change risk value. As a result, the risk of a change can be assessed more comprehensively, efficiently, and accurately. In addition, because alarm information can cover the entire network, the alarm matching solution proposed in this embodiment allows alarm information from components other than the change object to be brought into the risk assessment of the target change event. This effectively addresses the current inability to assess changes across components. (Different components are usually owned by different departments; change testing is currently carried out only by the department that owns the change object, and the departments that own other related components may not even be aware of the change.)
In the embodiments above or below, the topology information of the cloud network can take a tree structure; that is, it can be represented in the form of a topology tree. The topology tree can follow a layered structure, with objects of different levels assigned to the appropriate layers of the tree. For example, the layer above device-class objects can hold cluster-class objects, the layer above that availability-zone-class objects, and the layer above that region-class objects.
On this basis, one way to determine the change impact scope is as follows: according to the tree structure corresponding to the topology information, organize the at least one layer of superior objects corresponding to the change object of the target change event, together with the associated objects of the same level, into a change topology tree with the change object as the root node, to represent the change impact scope. Figure 2a is a logical schematic diagram of determining the change impact scope, provided by an exemplary embodiment of the present application. Referring to Figure 2a, the change object in the target change event is an AVS device. According to the topology information in the cloud network, the AVS device can be extended level by level to the AVS cluster, availability zone, and region to which it belongs, yielding the change impact scope, which is the change topology tree on the far right of Figure 2a. Figure 2a shows only the topology tree structure for the case where the change object is a device. When the change object starts at a different level, the expansion can be adapted according to the topology information preset in the cloud network.
Similarly, one way to determine an alarm impact scope is as follows: according to the tree structure corresponding to the topology information, organize the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information, together with the associated objects of the same level, into an alarm topology tree with the alarm occurrence object as the root node, to represent the alarm impact scope. Figure 2b is a logical schematic diagram of determining an alarm impact scope, provided by an exemplary embodiment of the present application. Referring to Figure 2b, the first alarm information is an EIP instance alarm. In this case, the alarm occurrence object in the first alarm information, namely the EIP instance, is found first, and the ECS instance that has a resource association with the EIP instance is found based on the topology information preset in the cloud network. Next, the XGW cluster hosting the EIP instance and the AVS device hosting the ECS instance are found. Then, in a manner similar to Figure 2a, the XGW cluster is extended level by level to the availability zone and region to which it belongs, and the AVS device is extended level by level to the AVS cluster, availability zone, and region to which it belongs. This yields the alarm impact scope, which is the alarm topology tree on the far right of Figure 2b.
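The EIP example can be traced with a small sketch that records the expansion as a set of edges starting from the alarm occurrence object. The object names, the `ASSOCIATED` and `SUPERIOR` dictionaries, and the assumption that the XGW cluster and the AVS cluster sit in the same availability zone are all hypothetical; Figure 2b may show a different layout.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical resource association between same-level objects.
ASSOCIATED: Dict[str, List[str]] = {"eip-1": ["ecs-1"]}
# Hypothetical hosting / affiliation relationships: object -> superior object.
SUPERIOR: Dict[str, str] = {
    "eip-1": "xgw-cluster-1",
    "ecs-1": "avs-device-1",
    "avs-device-1": "avs-cluster-1",
    "xgw-cluster-1": "zone-a",
    "avs-cluster-1": "zone-a",
    "zone-a": "region-1",
}


def alarm_topology_edges(alarm_obj: str) -> Set[Tuple[str, str]]:
    """Expand from the alarm occurrence object; return (lower, upper) or (object, peer) edges."""
    edges: Set[Tuple[str, str]] = set()
    starts = [alarm_obj]
    # 1) Same-level expansion: follow resource associations of the alarm object.
    for peer in ASSOCIATED.get(alarm_obj, []):
        edges.add((alarm_obj, peer))
        starts.append(peer)
    # 2) Upward expansion: climb the superior chain from each starting object.
    for start in starts:
        node = start
        while node in SUPERIOR:
            edges.add((node, SUPERIOR[node]))
            node = SUPERIOR[node]
    return edges


edges = alarm_topology_edges("eip-1")
nodes = {n for edge in edges for n in edge}
```

With this sample data the two branches meet again at `zone-a`, so the shared edge to `region-1` is stored only once.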
It should be understood that, in this embodiment, objects of multiple levels are defined in the cloud network, and multiple actual objects can exist under each level. For example, instance 1, instance 2, instance 3, and other instances can exist under the instance level, while availability zone A, availability zone B, and other availability zones can exist under the availability zone level. In this embodiment, when constructing the change topology tree for the target change event, the change object only needs to be located in the cloud network's topology information according to the level to which it belongs; the actual objects at other levels that have a topological relationship with the change object can then be determined from the topology information.
When both the change impact scope and the alarm impact scope are represented by topology trees, in the process of selecting, from the multiple items of alarm information, at least one item of target alarm information adapted to the target change event, if the alarm topology tree and the change topology tree share overlapping tree nodes, it can be determined that the corresponding alarm impact scope overlaps the change impact scope.
Figure 3 is a logical schematic diagram of a scheme for selecting target alarm information for a target change event provided by an exemplary embodiment of the present application. Referring to Figure 3, the left side shows the change topology tree corresponding to the target change event, and the right side shows the alarm topology trees corresponding to two items of alarm information. Both alarm topology trees share tree nodes with the change topology tree, so both items of alarm information in Figure 3 can be determined as target alarm information corresponding to the target change event. In this way, topology trees make it possible to determine the adapted target alarm information for a target change event conveniently, quickly and accurately.
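The overlap test can be illustrated with a small sketch. It assumes that each impact scope has already been flattened into a set of object identifiers (tree nodes); the function name and the sample data are hypothetical.

```python
def select_target_alarms(change_scope, alarm_scopes):
    """Keep the alarms whose impact scope shares at least one node with the change scope.

    change_scope: set of object identifiers in the change impact scope.
    alarm_scopes: dict mapping an alarm identifier to its set of object identifiers.
    """
    return [alarm_id for alarm_id, scope in alarm_scopes.items() if change_scope & scope]


# Hypothetical data loosely following the situation described for Figure 3.
change_scope = {"ecs-1", "avs-device-2", "zone-a", "region-1"}
alarm_scopes = {
    "alarm-upper-right": {"ecs-9", "avs-device-2", "zone-a", "region-1"},
    "alarm-lower-right": {"xgw-cluster-1", "zone-a", "region-1"},
    "alarm-elsewhere": {"ecs-7", "avs-device-8", "zone-b", "region-2"},
}
targets = select_target_alarms(change_scope, alarm_scopes)
```

Here the first two alarms are selected because their scopes intersect the change scope, while the third, which lies in a different region, is ignored.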
Accordingly, in this embodiment, topology trees can be used to represent the change impact scope and the alarm impact scope. This not only presents, clearly and comprehensively, the at least one layer of level objects produced by the scope expansion operation, but also presents topology information such as the membership relationships and resource associations between the expanded objects and the change object. Of course, in this embodiment, other representations of the change impact scope and the alarm impact scope can also be used. For example, a data structure of the form [first-level object, second-level object, …; topology information between level objects] can be used, or a set-plus-label representation; this embodiment is not limited in this respect.
Figure 4 is a schematic diagram of a change impact scope after matching, provided by an exemplary embodiment of the present application. Referring to Figure 4, after the above matching process, the at least one item of target alarm information is associated with the change impact scope corresponding to the target change event. Specifically, each item of target alarm information is associated with the smallest level object that it overlaps in the change impact scope. For example, in Figure 4, for one P3-level item of target alarm information, the smallest level object that overlaps the change topology tree is AVS device 1, so this item can be associated with the AVS device 1 node in the change topology tree. In the same way, alarm information is also associated with the AVS device 2 node and the AVS device 3 node in the change topology tree. There is also one P1-level item of target alarm information whose smallest overlapping level object is AVS cluster 1, so this item can be associated with the AVS cluster 1 node in the change topology tree.
In the above or following embodiments, various implementations may be used to perform risk assessment on the target change event based on the at least one item of target alarm information.
In an optional implementation, the alarm level recorded in each item of target alarm information can be obtained; the risk assessment value corresponding to the target change event is calculated according to the alarm level of each item of target alarm information; and if the risk assessment value satisfies a preset condition, it is determined that the target change event is risky. The alarm level is existing information in the alarm information and represents the severity, degree of impact and so on of the corresponding abnormal event.
In this implementation, the alarm level can be extracted from each item of target alarm information and used as the basis for change risk assessment. This keeps the calculation logic of the risk assessment value simple.
The risk assessment value represents the degree of risk of the target change event. In general, a higher risk assessment value means a higher degree of risk: the change is more likely to cause a failure in the cloud, and the resulting failure may be more severe.
In this implementation, in the process of calculating the risk assessment value corresponding to the target change event according to the alarm level of each item of target alarm information, the degree of correlation between each item of target alarm information and the target change event can be determined; a weight is assigned to each item of target alarm information according to its degree of correlation; and the risk assessment value corresponding to the target change event is calculated according to the alarm level and weight of each item of target alarm information.
Here, the degree of correlation represents how well the alarm impact scope fits the change impact scope: the more the change impact scope overlaps the alarm impact scope, the better the fit, and the higher the degree of correlation that can be assigned to the corresponding target alarm information. To this end, the smallest level object in the overlapping part between the change impact scope and the alarm impact scope of each item of target alarm information can be found, and a degree of correlation can be assigned to each item of target alarm information according to the level distance from its smallest level object to the change object. The level distance is essentially the number of levels in the membership relationship between the smallest level object and the change object.
Optionally, if the change impact scope and the alarm impact scope are represented by topology trees, the lowest tree node in the overlapping part between the change impact scope and the alarm impact scope of each item of target alarm information can be found, where a single tree node corresponds to one level object. The level distance of each item of target alarm information is then determined according to the topological distance, within the change topology tree that represents the change impact scope, between its lowest tree node and the change object. Items of target alarm information whose lowest tree nodes occupy the same position in the change topology tree have the same level distance. Referring to Figure 3, the lowest tree node shared by the upper-right alarm topology tree and the change topology tree on the left is AVS device 2, while the lowest tree node shared by the lower-right alarm topology tree and the change topology tree is availability zone a. Clearly, the target alarm information corresponding to the upper-right alarm topology tree obtains a higher degree of correlation than the target alarm information corresponding to the lower-right alarm topology tree. In practice, the level distance can be used to express the degree of correlation, so that a smaller value indicates a closer association. Referring to Figure 3, the level distance between the upper-right alarm topology tree and the change topology tree is 1, so the corresponding target alarm can be assigned a degree of correlation of 1; the level distance between the lower-right alarm topology tree and the change topology tree is 2, so the corresponding target alarm can be assigned a degree of correlation of 2.
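A minimal sketch of finding the lowest overlapping node and its level distance is given below. It assumes that the change topology tree has been reduced to a mapping from each node to its distance from the change object; that mapping, the function name and the sample data are hypothetical.

```python
def level_distance(change_levels, alarm_scope):
    """Return (node, distance) for the lowest overlapping node, or None without overlap.

    change_levels: dict mapping each node of the change topology tree to its
                   level distance from the change object (the change object is 0).
    alarm_scope:   set of nodes in the alarm impact scope.
    """
    overlap = [(dist, node) for node, dist in change_levels.items() if node in alarm_scope]
    if not overlap:
        return None
    dist, node = min(overlap)  # smallest distance; ties broken by node name
    return node, dist


# Hypothetical change tree arranged so that the distances match the Figure 3 discussion.
change_levels = {"ecs-1": 0, "avs-device-2": 1, "zone-a": 2, "region-1": 3}

upper_right = level_distance(change_levels, {"ecs-9", "avs-device-2", "zone-a", "region-1"})
lower_right = level_distance(change_levels, {"xgw-cluster-1", "zone-a", "region-1"})
```

In this toy setup the first alarm attaches to `avs-device-2` at distance 1 and the second to `zone-a` at distance 2, matching the two cases described for Figure 3.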
On this basis, a weight can be assigned to each item of target alarm information according to its degree of correlation. In general, target alarm information with a higher degree of correlation can be assigned a higher weight, reflecting its greater reference value for the target change event. In addition, items of target alarm information with the same degree of correlation can be assigned the same weight.
Optionally, the weight can be calculated according to the following formula:
Here, p is the degree of correlation described above (which can be expressed as the topological distance), α and β are empirical parameters, and q is the alarm ratio of the level objects corresponding to the current level distance. The alarm ratio represents, for one level of the change impact scope, the number of objects at that level that matched alarm information as a proportion of all objects at that level. For example, referring to Figure 4, at the lowest level of the change topology tree, all ECS instances under AVS device 1, AVS device 2 and AVS device 3 matched alarm information, so the alarm ratio of this layer can be 1 (that is, 100%). For the same reason, the three AVS devices under AVS cluster 1 at the next level up all matched alarm information, so the alarm ratio there can also be 1. At the level above that, only AVS cluster 1 under availability zone a matched alarm information, so the alarm ratio of availability zone a can be 1/21 (that is, only 1 of the 21 clusters in the availability zone matched alarm information), and the alarm ratio of the region at the level above can be 1/5. On this basis, to calculate the weight for alarm information with a level distance of 1 in the change impact scope (corresponding to the lowest-level objects, AVS devices 1 to 3, in the figure), q in the above formula can be set to 1, yielding f(1). Similarly, to calculate the weight for alarm information with a level distance of 3, q in the above formula can be set to 1/21, yielding f(3).
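The alarm ratio q can be computed per level as in the following sketch. The function name and the object identifiers are hypothetical, and exact fractions are used only so that values such as 1/21 are shown without rounding.

```python
from fractions import Fraction


def alarm_ratio(objects_at_level, alarmed_objects):
    """Share of the objects at one level of the change scope that matched an alarm."""
    if not objects_at_level:
        raise ValueError("level has no objects")
    matched = sum(1 for obj in objects_at_level if obj in alarmed_objects)
    return Fraction(matched, len(objects_at_level))


# Hypothetical identifiers following the Figure 4 discussion.
devices = ["avs-device-1", "avs-device-2", "avs-device-3"]
clusters = [f"avs-cluster-{i}" for i in range(1, 22)]  # 21 clusters in zone a

q_devices = alarm_ratio(devices, set(devices))          # every device matched an alarm
q_clusters = alarm_ratio(clusters, {"avs-cluster-1"})   # only one cluster matched
```

The resulting ratios are 1 for the device level and 1/21 for the cluster level, which are the values of q used for f(1) and f(3) in the text.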
In the process of calculating the risk assessment value corresponding to the target change event according to the alarm level and weight of each item of target alarm information, the initial risk value assigned to the alarm level of each item of target alarm information can be obtained; based on the weight of each item of target alarm information, a weighted sum of the initial risk values of the items of target alarm information is computed; and the risk assessment value corresponding to the target change event is determined from the result of the weighted sum.
Optionally, an exemplary scheme for assigning initial risk values to different alarm levels may be: determine the base risk value corresponding to each alarm level; count the historical frequency with which each alarm level has occurred in the cloud network; assign an adjustment coefficient to each alarm level based on its historical frequency; and, for each alarm level, weight the base risk value by the corresponding adjustment coefficient to obtain the initial risk value of that alarm level. An alarm level with a higher historical frequency can be assigned a higher adjustment coefficient, so that its initial risk value is somewhat higher and its influence on the final risk assessment value is somewhat greater. In this exemplary scheme, the base risk value is fine-tuned by the historical frequency with which each alarm level occurs in the cloud network, so the initial risk value of each alarm level changes dynamically as the alarm situation in the cloud network changes. Of course, in this implementation, other exemplary schemes can also be used to assign initial risk values to different alarm levels, and the scheme is not limited to the one described here.
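The frequency-based adjustment can be sketched as follows. The text does not specify how the adjustment coefficient is derived from the historical frequency, so the rule used here (one plus the level's share of all historical alarms) is purely an assumption for illustration, as are the base risk values and counts.

```python
def initial_risk_values(base_risk, history_counts):
    """Scale each alarm level's base risk by a frequency-dependent coefficient.

    base_risk:      dict mapping an alarm level to its base risk value.
    history_counts: dict mapping an alarm level to its historical occurrence count.
    Assumed rule:   coefficient = 1 + (share of all historical alarms at that level).
    """
    total = sum(history_counts.values())
    result = {}
    for level, base in base_risk.items():
        share = history_counts.get(level, 0) / total if total else 0.0
        result[level] = base * (1.0 + share)
    return result


# Hypothetical base values and counts.
values = initial_risk_values({"P1": 8.0, "P3": 2.0}, {"P1": 1, "P3": 3})
```

With these inputs, P1 receives a coefficient of 1.25 and P3 a coefficient of 1.75, so the more frequent level is adjusted upward more strongly, as the text describes. Recomputing the values as new alarms arrive gives the dynamic behavior mentioned above.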
On this basis, the initial risk value of the alarm level of each item of target alarm information can be obtained.
In this implementation, the calculation logic of the risk assessment value can be expressed by the following formulas:
v_p = x * (1 + f(p))
v = v_1 + v_2 + … + v_p
Here, v denotes the risk assessment value of the target change event, v_p denotes the sum of the risk assessment values contributed by all target alarm information with a topological distance of p, p denotes the topological distance described above, f(p) denotes the weight assigned to target alarm information with a degree of correlation of p, and x denotes the initial risk value of each item of target alarm information with a degree of correlation of p. It follows that the risk assessment value of the target change event equals the weighted sum of the initial risk values of all target alarm information plus the sum of the initial risk values of all target alarm information.
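The two formulas above can be sketched in a few lines. The weight function f is passed in as a parameter because its exact form is not shown in this text; the `1 / p` stand-in used in the example is hypothetical and serves only to make the arithmetic concrete.

```python
def risk_value(alarms, weight):
    """Compute v as the sum over all target alarms of x * (1 + f(p)).

    alarms: iterable of (initial_risk_value, level_distance) pairs.
    weight: callable implementing f(p); its real form is not given here.
    """
    return sum(x * (1.0 + weight(p)) for x, p in alarms)


# Two alarms at distance 1 with initial risk 2.0, one alarm at distance 2 with 8.0.
alarms = [(2.0, 1), (2.0, 1), (8.0, 2)]
v = risk_value(alarms, weight=lambda p: 1.0 / p)  # hypothetical stand-in for f(p)
```

With the stand-in weight, the contributions are v_1 = 2.0 * 2 + 2.0 * 2 = 8.0 and v_2 = 8.0 * 1.5 = 12.0, so v = 20.0. Summing per alarm is equivalent to first grouping the alarms by distance into v_p and then adding the groups.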
Calculating the risk assessment value in this way uses the target alarm information that occurs in the cloud network and matches the target change event as the basis for risk assessment, and it grades how strongly each item of target alarm information participates in the calculation according to its degree of correlation with the target change event. As a result, the risk evidence provided by all target alarm information is considered together, which avoids a one-sided judgment that the target change event is risky because of a small amount of target alarm information. For example, alarm information in the cloud network may be caused by user behavior, and such alarm information is difficult to filter out accurately. In this implementation, even if alarm information caused by user behavior takes part in the calculation of the risk assessment value, such alarm information is usually local and temporary, so its participation in the calculation is not strong. This implicitly weakens the influence of such alarm information on the final risk assessment value and can effectively avoid misjudgments in risk assessment caused by user behavior.
The above implementation associates target alarm information with the target change event and considers, across dimensions such as alarm ratio, weight, adjustment coefficient and initial risk value, how much influence each item of target alarm information should have on the risk assessment value, so that the target alarm information is analyzed reasonably to obtain the risk assessment value.
It should be understood that, in this embodiment, other implementations can also be used to calculate the risk assessment value from the target alarm information, for example using machine learning to learn the mapping between alarm information and risk assessment values; these are not described in detail here.
Further, in this embodiment, a risk threshold can be set, and the aforementioned preset condition can be set to exceeding the risk threshold. In this way, when the risk assessment value calculated for the target change event exceeds the risk threshold, it can be determined that the target change event is risky. When the target change event is determined to be risky, a reminder notification can be issued. The reminder notification can be output to operation and maintenance personnel so that they can confirm how to handle the target change event, for example by suspending the change or modifying the change online; this embodiment is not limited in this respect.
An exemplary scheme for determining the risk threshold may be: continuously collect the risk assessment values calculated for historical change events that occur in the cloud network, as assessment value samples; perform distribution fitting on the collected assessment value samples according to the number of times each risk assessment value has been recorded, to obtain a fitting function; and select the risk threshold based on the fitting function.
Figure 5 is a logical schematic diagram of a risk threshold determination scheme provided by an exemplary embodiment of the present application. Referring to Figure 5, the risk assessment values in the assessment value samples can be taken as the X axis and the number of times each risk assessment value was recorded as the Y axis, so as to obtain the distribution data of the risk assessment values and generate the corresponding distribution fitting function.
In the process of selecting the risk threshold: if the number of assessment value samples is higher than a specified number, the assessment value samples are sorted by distribution based on the fitting function, the target assessment value sample that matches a preset false alarm rate is selected from the sorted samples, and the risk assessment value of that target sample is used as the risk threshold. If the number of assessment value samples is lower than the specified number, the assessment value samples are sorted by distribution based on the fitting function, the target assessment value sample that matches a preset cumulative distribution probability is selected from the sorted samples, and the risk assessment value of that target sample is used as the risk threshold. For example, in Figure 5 there are fewer than 500 assessment value samples (500 being the specified number mentioned above). If there are 100 assessment value samples and the preset cumulative distribution probability is 99%, the calculated risk threshold is 9.4. A target change event whose risk assessment value is higher than 9.4 is then deemed to be risky.
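The two branches of the threshold selection can be sketched as follows. The text does not name the fitted distribution, so a normal distribution is assumed here for the small-sample branch, and the large-sample branch simply picks an empirical quantile; the function name, default parameters and the treatment of a sample count exactly equal to the specified number are all assumptions.

```python
import math
from statistics import NormalDist


def risk_threshold(samples, min_samples=500, false_alarm_rate=0.01, cumulative_prob=0.99):
    """Pick a risk threshold from historical risk assessment values.

    Many samples: take the sorted sample above which at most
    `false_alarm_rate` of the samples lie.
    Few samples:  fit a normal distribution (an assumption) and take the value
    at the preset cumulative probability.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if len(samples) >= min_samples:
        ordered = sorted(samples)
        rank = math.ceil((1.0 - false_alarm_rate) * len(ordered))
        return ordered[min(len(ordered), rank) - 1]
    fitted = NormalDist.from_samples(samples)
    return fitted.inv_cdf(cumulative_prob)


large = risk_threshold(list(range(1, 1001)))       # 1000 samples: empirical quantile
small = risk_threshold([1.0, 2.0, 3.0, 4.0, 5.0])  # 5 samples: fitted distribution
```

A change event whose risk assessment value exceeds the returned threshold would then be flagged as risky. The sketch does not attempt to reproduce the 9.4 value from the Figure 5 example, since that depends on sample data not given in the text.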
Accordingly, in this embodiment, the risk assessment value corresponding to the target change event can be calculated simply, efficiently and accurately based on the target alarm information, and whether the target change event is risky can be judged by checking whether the risk assessment value exceeds the risk threshold. The risk of a change event can thus be discovered in time during the change testing phase or during the operation phase after the change goes live, which effectively avoids failures that the change might bring to the cloud.
It should be noted that the steps of the method provided in the above embodiments may all be executed by the same device, or the method may be executed by different devices. Some of the processes described in the above embodiments and drawings include multiple operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or executed in parallel. Operation numbers such as 101 and 102 are only used to distinguish the operations from one another and do not themselves represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel.
Figure 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application. As shown in Figure 6, the computing device includes a memory 60 and a processor 61.
The processor 61 is coupled to the memory 60 and is configured to execute the computer program in the memory 60, so as to:
in response to a risk assessment instruction, determine the change impact scope corresponding to the target change event according to the topology information preset in the cloud network, where the topology information contains the membership relationships between objects of different levels in the cloud network and the associations between objects of the same level, and the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and the associated same-level objects;
collect multiple items of alarm information generated in the cloud network within a preset time range after the target change event occurs;
determine, according to the topology information, the alarm impact scope corresponding to each of the multiple items of alarm information, where the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated same-level objects;
select, from the multiple items of alarm information, at least one item of target alarm information adapted to the target change event, where the alarm impact scope corresponding to the target alarm information overlaps the change impact scope;
perform risk assessment on the target change event based on the at least one item of target alarm information.
In an optional embodiment, in the process of determining the at least one layer of superior objects, included in the change impact scope, corresponding to the change object of the target change event, the processor 61 is configured to:
search, level by level according to the topology information, for the level object to which the change object belongs, and end the search once an object of a specified level is found, so as to obtain the at least one layer of superior objects corresponding to the change object of the target change event;
and, in the process of determining the at least one layer of superior objects, included in the alarm impact scope, corresponding to the alarm occurrence object of the alarm information, is configured to:
search, level by level according to the topology information, for the level object to which the alarm occurrence object belongs, and end the search once an object of the specified level is found, so as to obtain the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information.
In an optional embodiment, the topology information adopts a tree structure, and the processor 61 may further be configured to:
organize, according to the tree structure corresponding to the topology information, the at least one layer of superior objects corresponding to the change object of the target change event and the associated same-level objects into a change topology tree with the change object as the root node, so as to represent the change impact scope;
organize, according to the tree structure corresponding to the topology information, the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated same-level objects into an alarm topology tree with the alarm occurrence object as the root node, so as to represent the alarm impact scope.
In an optional embodiment, in the process of performing risk assessment on the target change event based on the at least one item of target alarm information, the processor 61 is configured to:
obtain the alarm level recorded in each item of target alarm information;
calculate the risk assessment value corresponding to the target change event according to the alarm level of each item of target alarm information;
if the risk assessment value is higher than the risk threshold, determine that the target change event is risky.
In an optional embodiment, in the process of calculating the risk assessment value corresponding to the target change event according to the alarm level of each item of target alarm information, the processor 61 is configured to:
determine the degree of correlation between each item of target alarm information and the target change event;
assign a weight to each item of target alarm information according to its degree of correlation;
calculate the risk assessment value corresponding to the target change event according to the alarm level and weight of each item of target alarm information.
In an optional embodiment, in the process of determining the degree of correlation between each item of target alarm information and the target change event, the processor 61 is configured to:
find the smallest level object in the overlapping part between the change impact scope and the alarm impact scope of each item of target alarm information;
assign a degree of correlation to each item of target alarm information according to the level distance from its smallest level object to the change object.
In an optional embodiment, when searching for the smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information, the processor 61 is configured to:
If the change impact scope and the alarm impact scope are represented by topology trees, find the lowest tree node in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information, where a single tree node corresponds to one level object;
When determining the level distance, the processor 61 is configured to:
Determine the level distance corresponding to each of the at least one piece of target alarm information according to the topological distance, in the change topology tree representing the change impact scope, between the corresponding lowest tree node and the change object.
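As a rough illustration of the tree-based variant, the sketch below stores the topology as a hypothetical child-to-parent map and walks upward from the change object until it reaches a node that also lies above the alarm occurrence object. The number of hops walked is taken as the topological distance. The map layout and the function names are assumptions, and same-level associated objects are left out for brevity.

```python
def chain_to_top(node, parent):
    """Return the node followed by all of its superior objects, bottom-up."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def lowest_shared_node(change_obj, alarm_obj, parent):
    """Return (lowest shared tree node, hops from the change object), or None."""
    alarm_chain = set(chain_to_top(alarm_obj, parent))
    for hops, node in enumerate(chain_to_top(change_obj, parent)):
        if node in alarm_chain:
            return node, hops
    return None


# Hypothetical topology: child -> parent (assumed to contain no cycles).
parent = {
    "vm-1": "host-1",
    "vm-2": "host-1",
    "vm-3": "host-2",
    "host-1": "cluster-1",
    "host-2": "cluster-1",
}
same_host = lowest_shared_node("vm-1", "vm-2", parent)
same_cluster = lowest_shared_node("vm-1", "vm-3", parent)
```

Because the change topology tree is rooted at the change object, counting hops from the change object corresponds to the depth of the shared node in that tree.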
In an optional embodiment, when calculating the risk assessment value corresponding to the target change event according to the alarm level and weight corresponding to each of the at least one piece of target alarm information, the processor 61 is configured to:
Obtain the initial risk value assigned to the alarm level corresponding to each of the at least one piece of target alarm information;
Perform a weighted summation of the initial risk values corresponding to the at least one piece of target alarm information, using the weight corresponding to each piece of target alarm information;
Determine the risk assessment value corresponding to the target change event according to the result of the weighted summation.
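The weighted summation can be sketched as follows. The alarm level names, the initial risk values, and the weights are invented for illustration, and the sketch assumes that the weighted sum is used directly as the risk assessment value, which is only one way of deriving the value from the sum.

```python
def risk_assessment_value(target_alarms, initial_risk):
    """Weighted sum of the initial risk values of the target alarms.

    target_alarms: iterable of (alarm_level, weight) pairs.
    initial_risk: mapping from alarm level to its initial risk value.
    """
    return sum(weight * initial_risk[level] for level, weight in target_alarms)


# Hypothetical inputs, not taken from the source.
initial_risk = {"critical": 10.0, "warning": 2.0}
target_alarms = [("critical", 1.0), ("warning", 0.5)]
score = risk_assessment_value(target_alarms, initial_risk)
```

With these inputs the critical alarm contributes its full initial risk value, while the warning alarm, being less closely correlated with the change, contributes half of its value.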
In an optional embodiment, when assigning initial risk values to different alarm levels, the processor 61 is configured to:
Determine the basic risk value corresponding to each alarm level;
Count the historical frequency with which each alarm level has occurred in the cloud network;
Assign an adjustment coefficient to each alarm level based on its historical frequency;
For each alarm level, weight the corresponding basic risk value by the corresponding adjustment coefficient to obtain the initial risk value corresponding to that alarm level.
In an optional embodiment, the processor 61 may also be configured to:
Continuously collect the risk assessment values calculated for historical change events that occurred in the cloud network, as assessment value samples;
Perform distribution fitting on the collected assessment value samples according to the number of times each risk assessment value has been recorded, to obtain a fitting function;
Select the risk threshold based on the fitting function.
Further, as shown in Figure 6, the computing device also includes other components such as a communication component 62 and a power supply component 63. Figure 6 shows only some components schematically, which does not mean that the computing device includes only the components shown in Figure 6.
It is worth noting that, for the technical details of the above computing device embodiments, reference may be made to the relevant descriptions in the foregoing method embodiments. To save space, they are not repeated here, but this should not reduce the scope of protection of the present application.
Correspondingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program. When executed, the computer program can implement each step of the above method embodiments that can be executed by the computing device.
The memory in Figure 6 above is used to store computer programs and can be configured to store various other data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phonebook data, messages, pictures, videos, and so on. The memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The communication component in Figure 6 above is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, a mobile communication network such as 2G, 3G, 4G/LTE, or 5G, or a combination thereof. In an exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component in Figure 6 above provides power to the various components of the device where the power supply component is located. The power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device where the power supply component is located.
Those skilled in the art will understand that embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, in forms such as random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and variations may be made to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (12)

  1. A change risk assessment method, comprising:
    in response to a risk assessment instruction, determining a change impact scope corresponding to a target change event according to topology information preset in a cloud network, wherein the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to a change object of the target change event and the associated objects at the same level;
    collecting multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
    determining, according to the topology information, an alarm impact scope corresponding to each of the multiple pieces of alarm information, wherein the alarm impact scope includes at least one layer of superior objects corresponding to an alarm occurrence object of the alarm information and the associated objects at the same level;
    selecting, from the multiple pieces of alarm information, at least one piece of target alarm information adapted to the target change event, wherein the alarm impact scope corresponding to the target alarm information overlaps with the change impact scope;
    performing a risk assessment on the target change event based on the at least one piece of target alarm information.
  2. The method according to claim 1, wherein the process of determining the at least one layer of superior objects corresponding to the change object of the target change event included in the change impact scope comprises:
    searching, level by level according to the topology information, for the level objects to which the change object belongs, and ending the search once an object of a specified level is found, to obtain the at least one layer of superior objects corresponding to the change object of the target change event;
    and the process of determining the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information included in the alarm impact scope comprises:
    searching, level by level according to the topology information, for the level objects to which the alarm occurrence object belongs, and ending the search once an object of a specified level is found, to obtain the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information.
  3. The method according to claim 2, wherein the topology information adopts a tree structure, and the method further comprises:
    organizing, according to the tree structure corresponding to the topology information, the at least one layer of superior objects corresponding to the change object of the target change event and the associated objects at the same level into a change topology tree with the change object as the root node, to represent the change impact scope;
    organizing, according to the tree structure corresponding to the topology information, the at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated objects at the same level into an alarm topology tree with the alarm occurrence object as the root node, to represent the alarm impact scope.
  4. The method according to claim 1, wherein performing a risk assessment on the target change event based on the at least one piece of target alarm information comprises:
    obtaining the alarm level recorded in each of the at least one piece of target alarm information;
    calculating a risk assessment value corresponding to the target change event according to the alarm level corresponding to each of the at least one piece of target alarm information;
    if the risk assessment value is higher than a risk threshold, determining that the target change event is risky.
  5. The method according to claim 4, wherein calculating the risk assessment value corresponding to the target change event according to the alarm level corresponding to each of the at least one piece of target alarm information comprises:
    determining the degree of correlation between each of the at least one piece of target alarm information and the target change event;
    assigning a weight to each of the at least one piece of target alarm information according to its degree of correlation;
    calculating the risk assessment value corresponding to the target change event according to the alarm level and weight corresponding to each of the at least one piece of target alarm information.
  6. The method according to claim 5, wherein determining the degree of correlation between each of the at least one piece of target alarm information and the target change event comprises:
    finding the smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information;
    assigning the degree of correlation to each of the at least one piece of target alarm information according to the level distance between its corresponding smallest level object and the change object.
  7. The method according to claim 6, wherein finding the smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information comprises:
    if the change impact scope and the alarm impact scope are represented by topology trees, finding the lowest tree node in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information, wherein a single tree node corresponds to one level object;
    and the process of determining the level distance comprises:
    determining the level distance corresponding to each of the at least one piece of target alarm information according to the topological distance, in the change topology tree representing the change impact scope, between the corresponding lowest tree node and the change object.
  8. The method according to claim 5, wherein calculating the risk assessment value corresponding to the target change event according to the alarm level and weight corresponding to each of the at least one piece of target alarm information comprises:
    obtaining the initial risk value assigned to the alarm level corresponding to each of the at least one piece of target alarm information;
    performing a weighted summation of the initial risk values corresponding to the at least one piece of target alarm information, using the weight corresponding to each piece of target alarm information;
    determining the risk assessment value corresponding to the target change event according to the result of the weighted summation.
  9. The method according to claim 8, wherein the process of assigning initial risk values to different alarm levels comprises:
    determining the basic risk value corresponding to each alarm level;
    counting the historical frequency with which each alarm level has occurred in the cloud network;
    assigning an adjustment coefficient to each alarm level based on its historical frequency;
    for each alarm level, weighting the corresponding basic risk value by the corresponding adjustment coefficient to obtain the initial risk value corresponding to that alarm level.
  10. The method according to claim 4, further comprising:
    continuously collecting the risk assessment values calculated for historical change events that occurred in the cloud network, as assessment value samples;
    performing distribution fitting on the collected assessment value samples according to the number of times each risk assessment value has been recorded, to obtain a fitting function;
    selecting the risk threshold based on the fitting function.
  11. A computing device, comprising a memory and a processor;
    the memory is configured to store one or more computer instructions;
    the processor is coupled to the memory and configured to execute the one or more computer instructions to:
    in response to a risk assessment instruction, determine a change impact scope corresponding to a target change event according to topology information preset in a cloud network, wherein the topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level, and the change impact scope includes at least one layer of superior objects corresponding to a change object of the target change event and the associated objects at the same level;
    collect multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
    determine, according to the topology information, an alarm impact scope corresponding to each of the multiple pieces of alarm information, wherein the alarm impact scope includes at least one layer of superior objects corresponding to an alarm occurrence object of the alarm information and the associated objects at the same level;
    select, from the multiple pieces of alarm information, at least one piece of target alarm information adapted to the target change event, wherein the alarm impact scope corresponding to the target alarm information overlaps with the change impact scope;
    perform a risk assessment on the target change event based on the at least one piece of target alarm information.
  12. A computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to execute the change risk assessment method according to any one of claims 1-10.
PCT/CN2023/089099 2022-04-27 2023-04-19 Change risk assessment method and apparatus, and storage medium WO2023207689A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210459479.X 2022-04-27
CN202210459479.XA CN115102834B (en) 2022-04-27 2022-04-27 Change risk assessment method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023207689A1 true WO2023207689A1 (en) 2023-11-02

Family

ID=83287651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089099 WO2023207689A1 (en) 2022-04-27 2023-04-19 Change risk assessment method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN115102834B (en)
WO (1) WO2023207689A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102834B (en) * 2022-04-27 2024-04-16 浙江大学 Change risk assessment method, device and storage medium
CN116977062B (en) * 2023-08-04 2024-01-23 江苏臻云技术有限公司 Risk label management system and method for financial business

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178038A1 (en) * 2015-12-22 2017-06-22 International Business Machines Corporation Discovering linkages between changes and incidents in information technology systems
CN107124299A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 Resource method for early warning and system based on resource topology
CN113313419A (en) * 2021-06-23 2021-08-27 中国农业银行股份有限公司 Information system window change risk obtaining method and device
CN113450033A (en) * 2021-09-02 2021-09-28 广州嘉为科技有限公司 CMDB-based change influence analysis method and management equipment
CN113792554A (en) * 2021-09-18 2021-12-14 中国建设银行股份有限公司 Method and device for evaluating change influence based on knowledge graph
CN115102834A (en) * 2022-04-27 2022-09-23 浙江大学 Change risk assessment method, equipment and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546274A (en) * 2010-12-20 2012-07-04 中国移动通信集团广西有限公司 Alarm monitoring method and alarm monitoring equipment in communication service
CN103856344B (en) * 2012-12-05 2017-09-15 中国移动通信集团北京有限公司 A kind of alarm event information processing method and device
CN104125217A (en) * 2014-06-30 2014-10-29 复旦大学 Cloud data center real-time risk assessment method based on mainframe log analysis
CN106209829A (en) * 2016-07-05 2016-12-07 杨林 A kind of network security management system based on warning strategies
CN107204876B (en) * 2017-05-22 2020-09-29 成都网络空间安全技术有限公司 Network security risk assessment method
CN108108902B (en) * 2017-12-26 2021-06-29 创新先进技术有限公司 Risk event warning method and device
US11734636B2 (en) * 2019-02-27 2023-08-22 University Of Maryland, College Park System and method for assessing, measuring, managing, and/or optimizing cyber risk
CN114338435B (en) * 2020-09-24 2024-02-09 腾讯科技(深圳)有限公司 Network change monitoring method, device, computer equipment and storage medium
CN112329022A (en) * 2020-11-11 2021-02-05 浙江长三角车联网安全技术有限公司 Intelligent network automobile information security risk assessment method and system
CN112446640A (en) * 2020-12-10 2021-03-05 中国农业银行股份有限公司 Information system change risk assessment method, related equipment and readable storage medium
CN112540905A (en) * 2020-12-18 2021-03-23 青岛特来电新能源科技有限公司 System risk assessment method, device, equipment and medium under micro-service architecture
CN112559023A (en) * 2020-12-24 2021-03-26 中国农业银行股份有限公司 Method, device and equipment for predicting change risk and readable storage medium

Also Published As

Publication number Publication date
CN115102834A (en) 2022-09-23
CN115102834B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
US11082285B2 (en) Network event grouping
WO2023207689A1 (en) Change risk assessment method and apparatus, and storage medium
JP7145764B2 (en) Network advisor based on artificial intelligence
CN106886485B (en) System capacity analysis and prediction method and device
US20180349797A1 (en) Data driven methods and systems for what if analysis
CN110519365B (en) Method for changing equipment service and service changing system
US10965541B2 (en) Method and system to proactively determine potential outages in an information technology environment
US20170068581A1 (en) System and method for relationship based root cause recommendation
US10909018B2 (en) System and method for end-to-end application root cause recommendation
CN106776288B (en) A kind of health metric method of the distributed system based on Hadoop
RU2716029C1 (en) System for monitoring quality and processes based on machine learning
Xu et al. Lightweight and adaptive service api performance monitoring in highly dynamic cloud environment
CN105184886A (en) Cloud data center intelligence inspection system and cloud data center intelligence inspection method
CN114064196A (en) System and method for predictive assurance
JP6252309B2 (en) Monitoring omission identification processing program, monitoring omission identification processing method, and monitoring omission identification processing device
CN113297044A (en) Operation and maintenance risk early warning method and device
CN114490303A (en) Fault root cause determination method and device and cloud equipment
CN114819367A (en) Public service platform based on industrial internet
CN114676002A (en) PHM technology-based system operation and maintenance method and device
US11558271B2 (en) System and method of comparing time periods before and after a network temporal event
CN117172721B (en) Data flow supervision early warning method and system for financing service
Yu et al. Predicting gray fault based on context graph in container-based cloud
US11886451B2 (en) Quantization of data streams of instrumented software and handling of delayed data by adjustment of a maximum delay
US20240004765A1 (en) Data processing method and apparatus for distributed storage system, device, and storage medium
US20230105304A1 (en) Proactive avoidance of performance issues in computing environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795122

Country of ref document: EP

Kind code of ref document: A1