WO2023207689A1 - Change risk assessment method and apparatus, and storage medium - Google Patents

Change risk assessment method and apparatus, and storage medium

Info

Publication number
WO2023207689A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
change
target
alarm information
level
Prior art date
Application number
PCT/CN2023/089099
Other languages
English (en)
Chinese (zh)
Inventor
吕彪
戚依宁
王绍哲
党浩
方崇荣
祝顺民
蒋江伟
程鹏
陈积明
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Publication of WO2023207689A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design

Definitions

  • This application relates to the field of cloud technology, and in particular to a change risk assessment method, equipment and storage medium.
  • the embodiment of this application provides a change risk assessment method, including:
  • In response to a risk assessment instruction, determine the change impact scope corresponding to the target change event based on the topology information preset in the cloud network.
  • The topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level; the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • the alarm influence scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • a risk assessment is performed on the target change event.
  • An embodiment of the present application also provides a computing device, including a memory and a processor;
  • the memory is used to store one or more computer instructions
  • the processor is coupled to the memory for executing the one or more computer instructions for:
  • In response to a risk assessment instruction, determine the change impact scope corresponding to the target change event based on the topology information preset in the cloud network.
  • The topology information includes affiliation relationships between objects at different levels in the cloud network and association relationships between objects at the same level; the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • the alarm influence scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • a risk assessment is performed on the target change event.
  • Embodiments of the present application also provide a computer-readable storage medium that stores computer instructions.
  • When the computer instructions are executed by one or more processors, the one or more processors are caused to execute the aforementioned change risk assessment method.
  • The change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This ensures that the observation range of the change risk assessment is large enough and helps improve the accuracy of the risk assessment. Alarm information can also be introduced as the basis for change risk assessment: by collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, potential alarms that have not yet appeared in the cloud network can be discovered in a timely manner.
  • Figure 1a is a schematic flow chart of a change risk assessment method provided by an exemplary embodiment of the present application
  • Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application
  • Figure 2a is a logical schematic diagram of network scope expansion of a change object provided by an exemplary embodiment of the present application
  • Figure 2b is a logical schematic diagram of extending the network range of an alarm occurrence object provided by an exemplary embodiment of the present application
  • Figure 3 is a logical schematic diagram of a solution for selecting target alarm information for target change events provided by an exemplary embodiment of the present application
  • Figure 4 is a schematic diagram of the effect of a modified impact scope after matching provided by an exemplary embodiment of the present application
  • Figure 5 is a logical schematic diagram of a risk threshold determination scheme provided by an exemplary embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application.
  • The change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This ensures that the observation scope of the change risk assessment is large enough and helps improve the accuracy of the risk assessment. Alarm information can also be introduced as the basis for change risk assessment: by collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, problems that have not yet appeared in the cloud network can be discovered promptly.
  • Figure 1a is a schematic flowchart of a change risk assessment method provided by an exemplary embodiment of the present application.
  • The method can be executed by a change risk assessment device.
  • The change risk assessment device can be implemented as software, hardware, or a combination of the two, and can be integrated in a computing device. Referring to Figure 1a, the method may include:
  • Step 100: In response to the risk assessment instruction, determine the change impact scope corresponding to the target change event based on the topology information preset in the cloud network.
  • The topology information includes the affiliation relationships between objects at different levels in the cloud network and the association relationships between objects at the same level; the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level;
  • Step 101: Collect multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs;
  • Step 102: Determine the alarm impact scope corresponding to each of the multiple pieces of alarm information according to the topology information.
  • The alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • Step 103: Select, from the multiple pieces of alarm information, at least one piece of target alarm information that is adapted to the target change event, where the alarm impact scope corresponding to the target alarm information overlaps with the change impact scope;
  • Step 104: Perform a risk assessment on the target change event based on the at least one piece of target alarm information.
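  • Steps 100 to 104 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the topology contents, the `Alarm` fields, the numeric severity scale, and the "sum of alarm levels compared with a threshold" rule are all hypothetical, and superiors of associated same-level objects are also included in the scope as one possible reading.

```python
from dataclasses import dataclass

# Hypothetical topology: child -> superior object (affiliation relationships).
PARENT = {
    "device-1": "cluster-1",
    "device-2": "cluster-1",
    "cluster-1": "az-A",
    "az-A": "region-1",
    "device-9": "cluster-9",
    "cluster-9": "az-Z",
    "az-Z": "region-2",
}
# Hypothetical same-level associations (empty in this toy example).
ASSOCIATED = {}


@dataclass(frozen=True)
class Alarm:
    obj: str          # alarm occurrence object
    level: int        # assumed numeric severity, larger means more severe
    timestamp: float  # seconds


def impact_scope(obj):
    """Steps 100/102: the object, its associated peers, and their superiors."""
    scope = set()
    for start in [obj, *ASSOCIATED.get(obj, [])]:
        node = start
        while node is not None:
            scope.add(node)
            node = PARENT.get(node)
    return scope


def assess(change_obj, change_time, alarms, window, threshold):
    change_scope = impact_scope(change_obj)                    # step 100
    recent = [a for a in alarms                                # step 101
              if change_time <= a.timestamp <= change_time + window]
    targets = [a for a in recent                               # steps 102-103
               if impact_scope(a.obj) & change_scope]
    risk_value = sum(a.level for a in targets)                 # step 104 (assumed rule)
    return risk_value >= threshold, targets


alarms = [Alarm("device-2", 3, 10.0),
          Alarm("device-9", 5, 12.0),
          Alarm("device-2", 4, 500.0)]
risky, targets = assess("device-1", 0.0, alarms, window=60.0, threshold=3)
```

  • In this toy run only the first alarm is selected: the second occurs in an unrelated region and the third falls outside the time window.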
  • The change risk assessment method provided in this embodiment can be applied to operation and maintenance scenarios for changes in cloud networks.
  • The cloud network may refer to a software-defined computing network in a cloud computing infrastructure. This definition is only a narrow one.
  • The cloud network in this embodiment may more generally refer to a network architecture built on cloud technology. In this embodiment, operations performed on functional components in the cloud network, such as code modification, configuration of new functions, and bug fixes, can be called changes.
  • The change risk assessment method provided in this embodiment can be applied to the testing phase before a change is launched, replacing the existing testing method. It can also be used to keep tracking and evaluating the change after it is launched, so as to promptly discover change risks that were not detected in the testing phase, thereby effectively avoiding cloud failures caused by changes.
  • The target change event in this embodiment can refer to any change that occurs in the cloud network.
  • A change object is usually specified in a change.
  • For example, a certain gateway component in the cloud network can be specified as the change object.
  • As another example, gateway component A and switch component B can both be specified in the above-mentioned bug-fix change, with gateway component A and switch component B located in the same availability zone (level object).
  • Of course, this is only preferred, and this embodiment is not limited to this.
  • Multiple change objects in the same change event need not satisfy the above restriction.
  • In that case, it is only necessary to group the change objects within the change event, analyze the change risks group by group, and then combine the analysis results to assess the risk of the change event.
  • For ease of description, the above-mentioned preferred implementation is used by default to define the target change event in the following text.
  • The risk assessment instruction can be triggered periodically or according to other trigger conditions.
  • The change risk assessment logic shown in Figure 1a is executed repeatedly for the duration of the change event, for example every 10 seconds, so that the change event is continuously tracked and evaluated.
  • The end time of the tracking assessment can be set as needed. For example, the assessment can be stopped once a preset time period has elapsed after the change event begins, or once the risk assessment result corresponding to the change event has remained below a preset standard for a preset time period. This embodiment does not limit this.
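  • The periodic triggering and the two example stop conditions described above can be sketched as a simple loop. The function name `track_change`, its parameters, and the default durations are assumptions made for illustration; `assess_once` stands in for one pass of the Figure 1a logic and is expected to return a numeric risk assessment value.

```python
import time


def track_change(assess_once, interval_s=10.0, max_duration_s=3600.0,
                 quiet_duration_s=600.0, risk_limit=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Re-run the assessment every interval_s seconds until a stop condition holds."""
    start = clock()
    quiet_since = None   # time at which the risk value first dropped below the limit
    history = []
    while True:
        now = clock()
        value = assess_once()
        history.append(value)
        if value < risk_limit:
            if quiet_since is None:
                quiet_since = now
            # Stop condition 2: risk has stayed below the limit long enough.
            if now - quiet_since >= quiet_duration_s:
                return history, "risk stayed below limit"
        else:
            quiet_since = None
        # Stop condition 1: preset time period after the change began has elapsed.
        if now - start >= max_duration_s:
            return history, "maximum duration reached"
        sleep(interval_s)
```

  • The `clock` and `sleep` parameters are injectable only so that the loop can be exercised without real waiting; in normal use the defaults apply.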
  • The change impact scope corresponding to the target change event can be reasonably expanded in response to the risk assessment instruction.
  • Expanding the change impact scope means extending the impact scope of the target change event from the change object to a larger scope.
  • For example, if the change object is a physical gateway, the original change impact scope of the target change event is a single device in the cloud network. It can be gradually extended to the cluster, the availability zone, and the region to which the device belongs.
  • In this way, the change impact scope of the target change event is expanded, providing a large enough observation range for the target change event. Observing the target change event within a large enough observation range can effectively improve the accuracy of the risk assessment.
  • The cloud network can be divided into multi-level objects, and the topology information in the cloud network can be predefined based on these multi-level objects.
  • The multi-level objects can include but are not limited to instances, network elements, applications, devices, clusters, availability zones, and regions.
  • The topology information can include affiliation relationships between objects at different levels and association relationships between objects at the same level.
  • An exemplary affiliation relationship is that a device belongs to a cluster, the cluster belongs to an availability zone, and the availability zone belongs to a region;
  • An exemplary association relationship between objects of the same level is a resource association between different instances, or between instances and applications. This embodiment does not limit the specific relationship logic contained in the topology information.
  • The specifications of objects at different levels are different.
  • For example, the specification of a region is larger than that of an availability zone.
  • The specifications of objects at the same level can be the same or similar.
  • For example, instances and applications are objects of the same level.
  • Of course, this embodiment does not limit this.
  • Based on the topology information in the cloud network, a change object or an alarm occurrence object can be expanded to a larger scope of influence, including expansion within the same level and expansion to higher levels.
  • The level objects to which the change object belongs can be searched level by level according to the topology information preset in the cloud network, so as to obtain at least one layer of superior objects corresponding to the change object; objects of the same level that are associated with the change object can also be searched. The determined change impact scope can thus include at least one layer of superior objects corresponding to the change object of the target change event and the associated same-level objects.
  • During the search, an object of a specified level can be used as the end condition.
  • That is, the level objects to which the change object belongs can be searched level by level, and the search ends once an object of the specified level is found, yielding at least one layer of superior objects corresponding to the change object of the target change event.
  • For example, the object at the specified level may be a region.
  • Of course, this is only exemplary and is not limited in this embodiment.
  • In this way, the expanded change impact scope records the objects at multiple levels that may be affected by the target change event.
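  • The level-by-level search with a specified end level can be sketched as follows. The `Topology` container, its field names, and the example object names are assumptions for illustration; the text does not prescribe a concrete data structure at this point.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    level_of: dict    # object name -> level name
    parent_of: dict   # object name -> superior object (affiliation relationship)
    associated: dict  # object name -> list of associated same-level objects


def expand_scope(topology, start, stop_level="region"):
    """Collect the start object, its same-level associates, and its superiors.

    The upward search ends once an object of stop_level has been reached,
    or earlier if the topology records no further superior object.
    """
    scope = {start}
    scope.update(topology.associated.get(start, ()))
    node = start
    while topology.level_of[node] != stop_level and node in topology.parent_of:
        node = topology.parent_of[node]
        scope.add(node)
    return scope


# Hypothetical example topology.
topo = Topology(
    level_of={"gateway-1": "device", "gateway-2": "device",
              "cluster-1": "cluster", "az-A": "availability_zone",
              "region-1": "region"},
    parent_of={"gateway-1": "cluster-1", "cluster-1": "az-A", "az-A": "region-1"},
    associated={"gateway-1": ["gateway-2"]},
)
full_scope = expand_scope(topo, "gateway-1")
az_scope = expand_scope(topo, "gateway-1", stop_level="availability_zone")
```

  • Passing a lower `stop_level` narrows the observation range, which is one way the end condition described above could be made configurable.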
  • Figure 1b is a logical schematic diagram of a change risk assessment solution provided by an exemplary embodiment of the present application.
  • the change event can be obtained from the change system in the cloud network, and the change object will be specified in the change event.
  • this embodiment also innovatively proposes to introduce alarm information as the basis for change risk assessment.
  • Mature monitoring systems are usually deployed in cloud networks.
  • The monitoring system is used to monitor the operating status of various points in the cloud network, such as traffic status, packet loss status, and delay status. A large amount of alarm information is generated in the monitoring system.
  • In this embodiment, the alarm information generated by the monitoring system in the cloud network can be collected and used as the basis for change risk assessment.
  • As mentioned above, the change risk assessment can be triggered periodically or by other trigger conditions.
  • When a risk assessment instruction is triggered, the alarm information newly added in the cloud network since the last risk assessment can be collected and used as the basis for the current change risk assessment.
  • Monitoring systems in cloud networks usually adopt single-point monitoring.
  • The monitored objects are usually at the instance, device, network element, or application level.
  • Accordingly, the alarm occurrence objects in the alarm information are usually single points.
  • In step 101, multiple pieces of alarm information generated in the cloud network within a preset time range after the target change event occurs may be collected.
  • The alarm impact scope corresponding to each of the multiple pieces of alarm information can be determined according to the topology information.
  • The alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated same-level objects.
  • The alarm information may include information such as the alarm occurrence object, the alarm level, and the alarm description content.
  • The alarm occurrence object refers to the object in the cloud network on which an abnormal condition occurs; the abnormal condition on the object triggers the monitoring system to issue the alarm information.
  • The levels of the alarm occurrence objects in different alarm information may be different.
  • The levels of alarm occurrence objects may include but are not limited to the aforementioned instance, device, network element, and application levels.
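  • A minimal sketch of an alarm record and of incremental collection (only alarms added since the last assessment, and only within the preset time range after the change) might look like this. The `AlarmInfo` and `AlarmCollector` names, the string alarm levels, and the use of a numeric timestamp are assumptions; a real monitoring system would expose its own interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmInfo:
    occurrence_object: str  # object on which the abnormal condition occurred
    level: str              # alarm level, e.g. "P1" (assumed notation)
    description: str        # alarm description content
    timestamp: float        # time the alarm was generated, in seconds


class AlarmCollector:
    """Returns, on each call, only alarms not yet handed out for this change."""

    def __init__(self, change_time, window_s):
        self.window_end = change_time + window_s  # end of the preset time range
        self.last_seen = change_time              # upper bound of the previous collection

    def collect(self, all_alarms, now):
        upper = min(now, self.window_end)
        new = [a for a in all_alarms if self.last_seen < a.timestamp <= upper]
        self.last_seen = max(self.last_seen, upper)
        return new


alarms = [
    AlarmInfo("gw-1", "P2", "packet loss", 90.0),   # before the change: ignored
    AlarmInfo("gw-1", "P2", "packet loss", 105.0),
    AlarmInfo("gw-2", "P1", "high delay", 130.0),
    AlarmInfo("gw-3", "P3", "traffic drop", 170.0),  # after the window: ignored
]
collector = AlarmCollector(change_time=100.0, window_s=60.0)
first_batch = collector.collect(alarms, now=110.0)
second_batch = collector.collect(alarms, now=200.0)
```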
  • In this way, the original scope of influence corresponding to the alarm information can be expanded so that the alarm information covers a larger scope of influence; in effect, the alarm information is attributed to the expanded scope of influence.
  • The impact scope of a change may span components.
  • The abnormalities caused by a change may so far have appeared at only a few points, while the abnormalities it causes at other points may not yet be visible.
  • An alarm discovered at a single point is reasonably expanded to a larger scope by extending the impact scope of the alarm information, so that potential alarms that have not yet appeared within the expanded alarm scope can be fully discovered. These potential alarms can then take part in the change risk assessment process, providing a more comprehensive basis for the change risk assessment.
  • step 100 and step 102 may be performed synchronously, and the order is not limited in this embodiment.
  • the logic of expanding the alarm impact scope is basically similar to the logic of expanding the change impact scope. The details of the expansion operation of the alarm impact scope will not be repeated here.
  • In step 103, at least one piece of target alarm information adapted to the target change event can be selected from the multiple pieces of alarm information.
  • The change impact scope and the alarm impact scopes can be analyzed for overlap, and the alarm information whose alarm impact scope overlaps with the change impact scope can be used as the target alarm information. That is, the alarm impact scope of the target alarm information needs to overlap with the change impact scope of the target change event, and the overlapping part contains at least one level object.
  • In this way, the observation range of the target change event is expanded, and the coverage of the alarm information is expanded.
  • This embodiment describes the matching process in step 103 from the perspective of the target change event, but it should be understood that this embodiment does not limit the primary and secondary roles of alarm information and change events in the matching process. Target alarm information can be searched for among multiple pieces of alarm information from the perspective of each change event, or matching change events can be searched for among multiple change events from the perspective of each piece of alarm information, in which case the alarm information naturally becomes the target alarm information of the matched change event. Moreover, the matching operation may be synchronous or asynchronous, which is not limited in this embodiment.
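  • Because the match is symmetric, a single pass over the scopes can produce both views: target alarms per change event and matched change events per alarm. The sketch below assumes scopes are already available as plain sets of object names; the identifiers used in the example are hypothetical.

```python
def match_scopes(change_scopes, alarm_scopes):
    """Pair every change event with every alarm whose scope overlaps its own.

    change_scopes: change id -> set of objects in the change impact scope
    alarm_scopes:  alarm id  -> set of objects in the alarm impact scope
    """
    by_change = {change_id: [] for change_id in change_scopes}
    by_alarm = {alarm_id: [] for alarm_id in alarm_scopes}
    for change_id, change_scope in change_scopes.items():
        for alarm_id, alarm_scope in alarm_scopes.items():
            if change_scope & alarm_scope:  # at least one shared level object
                by_change[change_id].append(alarm_id)
                by_alarm[alarm_id].append(change_id)
    return by_change, by_alarm


change_scopes = {
    "change-1": {"device-1", "cluster-1", "az-A", "region-1"},
    "change-2": {"device-7", "cluster-7", "az-B", "region-1"},
}
alarm_scopes = {
    "alarm-1": {"device-2", "cluster-1", "az-A"},
    "alarm-2": {"device-9", "cluster-9", "az-Z", "region-2"},
}
by_change, by_alarm = match_scopes(change_scopes, alarm_scopes)
```

  • Here `alarm-1` becomes target alarm information for `change-1` only, and `alarm-2` matches no change event.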
  • In step 104, a risk assessment can be performed on the target change event based on the at least one piece of target alarm information.
  • Whether the target change event carries a risk can be assessed by analyzing the at least one piece of target alarm information.
  • Multiple implementation methods can be used to perform the change risk assessment on the target change event based on the target alarm information. The specific implementation methods are described in detail in subsequent embodiments.
  • In summary, the change impact scope of the target change event can be reasonably expanded based on the topology information preset in the cloud network. This ensures that the observation scope of the change risk assessment is large enough and helps improve the accuracy of the risk assessment.
  • Alarm information can also be introduced as the basis for change risk assessment. By collecting alarm information generated in the cloud network and reasonably expanding the alarm impact scope of each piece of alarm information based on the topology information, potential alarms that have not yet appeared in the cloud network can be discovered in a timely manner and can fully take part in the change risk assessment process. On this basis, target alarm information that matches the target change event can be found by judging whether the alarm impact scope overlaps with the change impact scope.
  • Since the alarm information can cover the entire network, the alarm information matching scheme proposed in this embodiment allows alarm information from components other than the change object to be introduced into the risk assessment of the target change event. This can effectively solve the current dilemma of being unable to evaluate changes across components (different components are usually the responsibility of different departments; currently, change testing is usually carried out only by the department responsible for the change object, and the departments responsible for other related components may not even be aware of the change).
  • The topology information in the cloud network may adopt a tree structure; that is, the topology information in the cloud network may be represented in the form of a topology tree.
  • The topology tree can follow a hierarchical structure, with objects of different levels distributed across the layers of the topology tree.
  • For example, the layer above a device-class object can be a cluster-class object, the layer above that can be an availability-zone-class object, and the layer above that can be a region-class object.
  • A solution for determining the change impact scope may be: according to the tree structure corresponding to the topology information, at least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level are organized into a change topology tree with the change object as the root node, to represent the change impact scope.
  • Figure 2a is a logical schematic diagram for determining the impact scope of a change provided by an exemplary embodiment of the present application. Referring to Figure 2a, the change object in the target change event is the AVS device.
  • The AVS device can be extended level by level to the AVS cluster, availability zone, and region to which it belongs, thereby obtaining the change impact scope, that is, the change topology tree on the far right of Figure 2a.
  • Figure 2a only shows the topology tree structure when the change object is a device.
  • If the initial level object to which the change object belongs is another kind of object, the scope can be expanded adaptively according to the topology information preset in the cloud network.
  • A solution for determining the alarm impact scope may be: according to the tree structure corresponding to the topology information, at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and the associated objects of the same level are organized into an alarm topology tree with the alarm occurrence object as the root node, to represent the alarm impact scope.
  • Figure 2b is a logical schematic diagram for determining the impact scope of an alarm provided by an exemplary embodiment of the present application.
  • Referring to Figure 2b, the first alarm information is an EIP instance alarm. In this case, the alarm occurrence object in the first alarm information, that is, the EIP instance, can be found first, and the ECS instance that has a resource association with the EIP instance can be found based on the topology information preset in the cloud network. Next, the XGW cluster that hosts the EIP instance and the AVS device that hosts the ECS instance can be found. Then, in a manner similar to Figure 2a, the XGW cluster is extended level by level to the availability zone and region to which it belongs, and the AVS device is extended level by level to the AVS cluster, availability zone, and region to which it belongs, yielding the alarm impact scope, that is, the alarm topology tree on the far right of Figure 2b.
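  • The EIP example can be traced with a small breadth-first expansion rooted at the alarm occurrence object. The relationship tables below restate the example in the text; that the XGW cluster and the AVS cluster share one availability zone is an assumption made here for brevity, and the function name and edge labels are hypothetical.

```python
from collections import deque

# Affiliation (hosted-by / belongs-to) relationships from the example.
AFFILIATION = {
    "EIP instance": "XGW cluster",
    "ECS instance": "AVS device",
    "XGW cluster": "availability zone",   # assumed to be the same zone as below
    "AVS device": "AVS cluster",
    "AVS cluster": "availability zone",
    "availability zone": "region",
}
# Same-level resource associations from the example.
ASSOCIATION = {"EIP instance": ["ECS instance"]}


def build_alarm_tree(root):
    """Expand from the alarm occurrence object; return the nodes and labelled edges."""
    nodes = {root}
    edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        neighbours = [(peer, "associated") for peer in ASSOCIATION.get(node, [])]
        if node in AFFILIATION:
            neighbours.append((AFFILIATION[node], "belongs_to"))
        for target, relation in neighbours:
            edges.append((node, target, relation))
            if target not in nodes:
                nodes.add(target)
                queue.append(target)
    return nodes, edges


nodes, edges = build_alarm_tree("EIP instance")
```

  • The resulting node set is the alarm impact scope; keeping the labelled edges preserves the affiliation and association information that a tree representation is meant to convey.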
  • As mentioned above, the cloud network is divided into multi-level objects, and multiple actual objects may exist under each level object.
  • For example, several instances such as instance 1, instance 2, and instance 3 may exist under the instance-class level object, and several availability zones such as availability zone A and availability zone B may exist under the availability-zone-class level object.
  • Therefore, when constructing the change topology tree for the target change event, it is only necessary to locate the change object in the topology information of the cloud network according to the level object to which it belongs; the actual objects under other level objects that have topological relationships with the change object can then be determined based on the topology information.
  • When both the change impact scope and the alarm impact scope are represented by topology trees, in the process of selecting at least one piece of target alarm information adapted to the target change event from the multiple pieces of alarm information, if the alarm topology tree and the change topology tree share overlapping tree nodes, it can be determined that the corresponding alarm impact scope overlaps with the change impact scope.
  • FIG. 3 is a logical schematic diagram of a solution for selecting target alarm information for a target change event provided by an exemplary embodiment of the present application.
  • In Figure 3, the left side is the change topology tree corresponding to the target change event, and the right side shows the alarm topology trees corresponding to two pieces of alarm information. Both alarm topology trees have tree nodes that overlap with the change topology tree, so both pieces of alarm information in Figure 3 can be determined as target alarm information corresponding to the target change event. In this way, the target alarm information adapted to the target change event can be determined conveniently, quickly, and accurately through the topology trees.
  • Using a topology tree to represent the change impact scope and the alarm impact scope not only presents, clearly and comprehensively, the level objects added by the scope expansion operation, but also presents topological information such as the affiliation and resource association relationships between the expanded objects and the change object.
  • Of course, other methods can also be used to characterize the change impact scope and the alarm impact scope.
  • For example, a data structure of the form [first-level objects, second-level objects...; topological information between level objects] can be used, or a collection + tag representation, etc. This embodiment is not limited to this.
  • FIG. 4 is a schematic diagram of the effect of a modified impact scope after matching provided by an exemplary embodiment of the present application.
  • After matching, the at least one piece of target alarm information is associated with the change impact scope corresponding to the target change event.
  • Specifically, each piece of target alarm information is associated with the smallest level object in its overlap with the change impact scope.
  • For example, in Figure 4, the AVS device 2 node and the AVS device 3 node in the change topology tree are also associated with alarm information; in addition, for a piece of P1-level target alarm information whose smallest level object overlapping with the change topology tree is AVS cluster 1, that target alarm information can be associated with the AVS cluster 1 node in the change topology tree.
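  • Attaching each piece of target alarm information to the smallest overlapping level object can be sketched as follows. The level ranking, the object names other than those mentioned for Figure 4, and the alarm identifiers are assumptions for illustration.

```python
# Assumed ordering of levels from smallest to largest.
LEVEL_RANK = {"device": 0, "cluster": 1, "availability_zone": 2, "region": 3}


def lowest_overlap(change_scope, alarm_scope, level_of):
    """Return the smallest-level object shared by both scopes, or None."""
    common = sorted(change_scope & alarm_scope)  # sorted for a deterministic tie-break
    if not common:
        return None
    return min(common, key=lambda obj: LEVEL_RANK[level_of[obj]])


def attach_alarms(change_scope, alarm_scopes, level_of):
    """Map each node of the change scope to the alarms attached to it."""
    attached = {}
    for alarm_id, alarm_scope in alarm_scopes.items():
        node = lowest_overlap(change_scope, alarm_scope, level_of)
        if node is not None:
            attached.setdefault(node, []).append(alarm_id)
    return attached


level_of = {
    "AVS device 1": "device", "AVS device 2": "device", "AVS device 3": "device",
    "AVS cluster 1": "cluster", "availability zone A": "availability_zone",
    "region 1": "region",
}
change_scope = set(level_of)
alarm_scopes = {
    "alarm-device-2": {"AVS device 2", "AVS cluster 1", "availability zone A", "region 1"},
    "alarm-P1": {"AVS device 9", "AVS cluster 1", "availability zone A", "region 1"},
}
attached = attach_alarms(change_scope, alarm_scopes, level_of)
```

  • In this toy data the P1 alarm lands on the AVS cluster 1 node because its own device is not part of the change scope, mirroring the situation described for Figure 4.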
  • various implementation methods may be used to perform risk assessment on target change events based on at least one piece of target alarm information.
  • The alarm level recorded in each piece of target alarm information can be obtained; the risk assessment value corresponding to the target change event is calculated based on the alarm levels of the at least one piece of target alarm information; if the risk assessment value meets the preset conditions, it is determined that the target change event carries a risk.
  • The alarm level is information already present in the alarm information; it represents the severity, impact, etc. of the corresponding abnormal event.
  • The alarm level can be extracted from each piece of target alarm information and used as the basis for change risk assessment. This keeps the calculation logic of the risk assessment value concise.
  • The risk assessment value can be used to characterize the risk level of the target change event.
  • The higher the risk assessment value, the higher the risk level of the target change event, and the higher the likelihood and severity of the cloud failure it may cause.
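  • One simple way to turn alarm levels into a risk assessment value is to map each level to a score and sum the scores. The text does not give a formula at this point, so the score table, the summation, and the "greater than or equal to a threshold" condition below are assumptions chosen only to illustrate the idea.

```python
# Assumed mapping from alarm level to a numeric score (P1 is the most severe).
LEVEL_SCORE = {"P1": 4, "P2": 3, "P3": 2, "P4": 1}


def risk_value(alarm_levels):
    """Risk assessment value as the sum of the scores of the target alarms."""
    return sum(LEVEL_SCORE[level] for level in alarm_levels)


def has_risk(alarm_levels, threshold):
    """Preset condition (assumed): the value reaches or exceeds a threshold."""
    return risk_value(alarm_levels) >= threshold


value = risk_value(["P1", "P3"])
flagged = has_risk(["P1", "P3"], threshold=5)
```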
  • The degree of correlation between each piece of target alarm information and the target change event can be determined; a weight is assigned to each piece of target alarm information according to its degree of correlation; and the risk assessment value corresponding to the target change event is calculated based on the alarm level and weight of each piece of target alarm information.
  • The degree of correlation represents the degree of fit between the alarm impact scope and the change impact scope.
  • The better the fit, the higher the degree of correlation assigned to the target alarm information.
  • The smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to each piece of target alarm information can be searched for; a degree of correlation is then assigned to each piece of target alarm information according to the level distance between its smallest level object and the change object.
  • The level distance is essentially the number of levels in the affiliation relationship between the smallest level object and the change object.
  • the lowest tree node in the overlapping portion between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information can be searched for, where a single tree node corresponds to one level object; the level distance corresponding to the at least one piece of target alarm information is then determined based on the topological distance, in the change topology tree that represents the change impact scope, between that lowest tree node and the change object.
  • pieces of target alarm information whose corresponding lowest tree nodes occupy the same position in the topology tree representing the change impact scope have the same level distance.
  • the lowest tree node overlapping between the alarm topology tree on the upper right and the change topology tree on the left is AVS device 2; while the lowest tree node overlapping between the alarm topology tree on the lower right and the change topology tree on the left is available.
  • the target alarm information corresponding to the alarm topology tree on the upper right will obtain a higher degree of correlation than the target alarm information corresponding to the alarm topology tree on the lower right.
  • level distance can be used to express the degree of correlation.
  • the level distance between the alarm topology tree on the upper right and the change topology tree is 1, so its corresponding target alarm information can be assigned a correlation degree of 1; if the level distance between an alarm topology tree and the change topology tree is 2, its corresponding target alarm information can be assigned a correlation degree of 2.
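  • The search for the lowest overlapping node and the resulting level distance can be sketched as follows. This is a minimal illustration, not the implementation of this application: it assumes each object has a single superior object, it ignores same-level associated objects, and the object names (`vm-7`, `device-2`, and so on) and the `PARENT_OF` mapping are hypothetical.

```python
from typing import Dict, List, Optional, Set


def chain_to_top(obj: str, parent_of: Dict[str, str]) -> List[str]:
    """Return obj followed by each of its superior objects, lowest first."""
    chain = [obj]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def level_distance(change_obj: str, alarm_obj: str,
                   parent_of: Dict[str, str]) -> Optional[int]:
    """Number of levels between the change object and the lowest node
    shared by the change scope and the alarm scope.

    Returns None when the two scopes do not overlap at all.
    """
    alarm_scope: Set[str] = set(chain_to_top(alarm_obj, parent_of))
    # Walk upward from the change object; the first hit is the lowest overlap.
    for distance, node in enumerate(chain_to_top(change_obj, parent_of)):
        if node in alarm_scope:
            return distance
    return None


# Hypothetical affiliation relationships (child -> superior object).
PARENT_OF = {
    "vm-7": "device-2",
    "device-2": "cluster-1",
    "device-3": "cluster-1",
    "cluster-1": "zone-a",
    "device-9": "cluster-8",
    "cluster-8": "zone-b",
}
```

  • Under these assumptions, an alarm on `device-2` overlaps a change on `vm-7` one level up, an alarm on `device-3` overlaps it two levels up (at `cluster-1`), and an alarm on `device-9` does not overlap it at all, so it would not count as target alarm information.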
  • a weight can be assigned to at least one target alarm information according to the degree of correlation.
  • target alarm information with a higher degree of correlation can be assigned a higher weight to reflect its higher reference role for target change events.
  • target alarm information with the same degree of correlation can be assigned the same weight.
  • the weight can be calculated according to a formula in which p is the above-mentioned correlation degree (which can be represented by topological distance), q is the alarm proportion in the level object corresponding to the current level distance, and the remaining two symbols are empirical parameters.
  • the alarm ratio is used to represent the proportion of the number of objects matching alarm information within the same level within the scope of change to the total number of objects within that level.
  • the alarm ratio of this layer can be 1 (that is, 100%); for the same reason, the three AVS devices under AVS cluster 1 in the level above also all match alarm information, so that alarm ratio can also be 1; if, in availability zone a one level further up, only AVS cluster 1 matches alarm information, the alarm ratio of availability zone a can be 1/21 (that is, only 1 of the 21 clusters in the availability zone matches alarm information); and, one level further up again, the alarm ratio within that area can be 1/5.
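  • The alarm ratio described above is a simple proportion per level, which the following sketch makes concrete. The function name `alarm_ratio` and the object names are hypothetical; `Fraction` is used only so that values such as 1/21 stay exact.

```python
from fractions import Fraction
from typing import Iterable, Set


def alarm_ratio(objects_in_level: Iterable[str], alarmed: Set[str]) -> Fraction:
    """Share of the objects in one level that match alarm information."""
    objects = list(objects_in_level)
    if not objects:
        raise ValueError("a level must contain at least one object")
    matched = sum(1 for obj in objects if obj in alarmed)
    return Fraction(matched, len(objects))


# Hypothetical levels: 3 devices in one cluster, 21 clusters in one zone.
DEVICES = ["device-1", "device-2", "device-3"]
CLUSTERS = [f"cluster-{i}" for i in range(1, 22)]
```

  • With all three devices alarmed the ratio is 1, and with only `cluster-1` alarmed among the 21 clusters it is 1/21, matching the figures in the example above.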
  • the initial risk value assigned to the alarm level corresponding to each of the at least one piece of target alarm information can be obtained; based on the weight corresponding to each of the at least one piece of target alarm information, a weighted sum of the initial risk values corresponding to the at least one piece of target alarm information is performed; and the risk assessment value corresponding to the target change event is determined based on the result of the weighted sum.
  • an exemplary solution for assigning initial risk values to different alarm levels may be: determining the basic risk values corresponding to different alarm levels; counting the historical frequency of occurrence of different alarm levels in the cloud network; assigning adjustment coefficients to the different alarm levels based on their respective historical frequencies; and, for each alarm level, weighting the corresponding basic risk value by the corresponding adjustment coefficient to obtain the initial risk value of that alarm level.
  • a higher adjustment coefficient can be assigned to them, so that their initial risk values are higher and their impact on the final risk assessment value will be greater.
  • the basic risk value is fine-tuned by taking into account the historical frequency of occurrence of different alarm levels in the cloud network.
  • the initial risk value of the corresponding alarm level of at least one target alarm information can be obtained.
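  • The adjustment of basic risk values by historical frequency can be illustrated with the sketch below. The text does not state how a frequency maps to an adjustment coefficient, so the rule used here (coefficient = 2 minus the level's share of all historical alarms, which favors rarer levels) is purely an assumption for illustration, as are the level names and numbers.

```python
from typing import Dict


def adjustment_coefficients(history_counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed rule: rarer alarm levels get a coefficient closer to 2,
    more frequent ones a coefficient closer to 1."""
    total = sum(history_counts.values())
    if total <= 0:
        raise ValueError("historical counts must contain at least one alarm")
    return {level: 2.0 - count / total for level, count in history_counts.items()}


def initial_risk_values(base_values: Dict[str, float],
                        history_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each basic risk value by the coefficient of its alarm level."""
    coefficients = adjustment_coefficients(history_counts)
    return {level: base * coefficients[level] for level, base in base_values.items()}


# Hypothetical basic risk values and historical alarm counts per level.
BASE_VALUES = {"critical": 8.0, "warning": 2.0}
HISTORY_COUNTS = {"critical": 25, "warning": 75}
```

  • With these hypothetical inputs, `critical` accounts for a quarter of past alarms and receives a coefficient of 1.75, while `warning` receives 1.25; any other monotone mapping could be substituted without changing the overall procedure.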
  • v represents the risk assessment value of the target change event
  • v p represents the sum of risk assessment values caused by all target alarm information with a topological distance of p
  • p represents the aforementioned topological distance
  • f(p) represents the weight of the target alarm information with a correlation degree of p.
  • x represents the initial risk value corresponding to each target alarm information with a correlation degree of p. It can be seen that the risk assessment value of the target change event is equal to the weighted sum of the initial risk values of all target alarm information plus the initial risk value of all target alarm information.
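  • One reading of the symbol definitions above is that the initial risk values x are summed per topological distance p, each sum is scaled by f(p), and the scaled sums are added up to give v. The sketch below implements that reading only; the weight function `example_weight` (1/p, for p of at least 1) is a hypothetical stand-in, not the weight formula of this application.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def risk_assessment_value(alarms: Iterable[Tuple[int, float]],
                          weight_of: Callable[[int], float]) -> float:
    """Weighted sum of initial risk values, grouped by distance p.

    Each alarm is a (p, x) pair: its topological distance and its
    initial risk value.
    """
    sums_by_p: Dict[int, float] = defaultdict(float)
    for p, x in alarms:
        sums_by_p[p] += x
    # v_p = f(p) * (sum of x at distance p); v = sum over p of v_p.
    return sum(weight_of(p) * total for p, total in sums_by_p.items())


def example_weight(p: int) -> float:
    """Hypothetical weight: closer alarms (smaller p >= 1) count more."""
    return 1.0 / p
```

  • For example, two alarms at distance 1 with initial risk values 3.0 and 2.0 and one alarm at distance 2 with value 4.0 give 1.0 × 5.0 + 0.5 × 4.0 = 7.0 under this stand-in weight.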
  • the target alarm information that matches the target change event in the cloud network can be used as the basis for risk assessment, and the degree to which each piece of target alarm information participates in the calculation of the risk assessment value can be set according to its degree of correlation with the target change event, so that all target alarm information is considered comprehensively and one-sided judgments about the risk of the target change event based on a small amount of target alarm information are avoided.
  • some alarm information in the cloud network may be caused by user behavior, and such alarm information is difficult to eliminate accurately.
  • when alarm information caused by user behavior participates in the calculation of the risk assessment value, its participation is limited because such alarm information is usually local and temporary; its influence on the final risk assessment value is thereby weakened, which can effectively avoid misjudged risk assessments caused by user behavior.
  • the above implementation can associate target alarm information with target change events, and comprehensively consider how much influence each piece of target alarm information should have on the risk assessment value through dimensions such as alarm proportion, weight, adjustment coefficient, and initial risk value, so that the target alarm information is analyzed reasonably to obtain the risk assessment value.
  • a risk threshold can also be set, and the aforementioned preset condition can be defined as exceeding the risk threshold.
  • if the risk assessment value calculated for the target change event exceeds the risk threshold, it can be determined that the target change event is risky.
  • a reminder notification can be issued; the reminder notification can be output to the operation and maintenance personnel for the operation and maintenance personnel to confirm the handling plan for the target change event, for example, it can be to suspend the change or modify the change online, etc.
  • this embodiment is not limited here.
  • An exemplary solution for determining the risk threshold may be: continuously collect the risk assessment values calculated for historical change events in the cloud network as evaluation value samples; perform distribution fitting on the evaluation value samples according to the number of times each risk assessment value has been recorded, to obtain a fitting function; and select the risk threshold based on the fitting function.
  • Figure 5 is a logical schematic diagram of a risk threshold determination scheme provided by an exemplary embodiment of the present application.
  • the risk assessment values in the evaluation value samples can be used as the X-axis and the number of times each risk assessment value has been recorded as the Y-axis, to obtain the distribution data of the risk assessment values and generate the corresponding distribution fitting function.
  • the evaluation value samples are sorted by distribution based on the fitting function; from the sorted evaluation value samples, the target evaluation value sample that matches the preset false alarm rate is selected; and the risk assessment value corresponding to the target evaluation value sample is used as the risk threshold. If the number of evaluation value samples is lower than the specified number, the evaluation value samples are sorted by distribution based on the fitting function; from the sorted evaluation value samples, the target evaluation value sample that matches the preset cumulative distribution probability is selected; and the risk assessment value corresponding to the target evaluation value sample is used as the risk threshold.
  • the number of evaluation value samples in Figure 5 is less than 500 (corresponding to the specified number mentioned above). If there are 100 evaluation value samples and the preset cumulative distribution probability is 99%, the risk threshold is calculated to be 9.4. In this way, if the risk assessment value of a target change event is higher than 9.4, the event is deemed risky.
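  • Selecting a threshold by cumulative probability can be sketched as below. Note the substitution: instead of the fitted distribution function described above, this sketch reads the threshold directly off the sorted samples (an empirical quantile), so it will not reproduce the 9.4 of Figure 5. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def select_threshold(samples: Sequence[float], cumulative_probability: float) -> float:
    """Smallest sample such that at least the given share of all
    samples is less than or equal to it (empirical quantile)."""
    if not samples:
        raise ValueError("at least one evaluation value sample is required")
    if not 0.0 < cumulative_probability <= 1.0:
        raise ValueError("cumulative_probability must be in (0, 1]")
    ordered = sorted(samples)
    # The small tolerance guards against floating-point error in the product.
    rank = math.ceil(cumulative_probability * len(ordered) - 1e-9)
    return ordered[max(rank, 1) - 1]


def is_risky(risk_value: float, threshold: float) -> bool:
    """A change event is treated as risky only when it exceeds the threshold."""
    return risk_value > threshold
```

  • With 100 samples and a cumulative probability of 99%, the function returns the 99th smallest sample; a new change event is then flagged only if its risk assessment value is strictly above that value.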
  • the risk assessment value corresponding to the target change event can be calculated simply, efficiently, and accurately based on the target alarm information, and whether the target change event is risky can be determined by judging whether the risk assessment value exceeds the risk threshold. In this way, the risk of change events can be discovered in a timely manner during the change testing phase or the operation phase after the change is launched, thereby effectively avoiding the failures that the change may bring to the cloud.
  • the steps of the method provided in the above embodiments may all be executed by the same device, or the method may be executed by different devices.
  • Some of the processes described in the above embodiments and drawings include multiple operations that appear in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear in this document or in parallel.
  • The serial numbers of the operations, such as 101, 102, etc., are only used to distinguish different operations and do not themselves represent any execution order. Additionally, these processes may include more or fewer operations, and the operations may be performed sequentially or in parallel.
  • FIG. 6 is a schematic structural diagram of a computing device provided by another exemplary embodiment of the present application. As shown in FIG. 6, the computing device includes a memory 60 and a processor 61.
  • the processor 61 is coupled to the memory 60 and is used to execute the computer program in the memory 60 for:
  • In response to a risk assessment instruction, determine the change impact scope corresponding to the target change event based on the preset topology information in the cloud network.
  • the topology information includes the affiliation relationships between objects at different levels in the cloud network and the association relationships between objects at the same level.
  • the change impact scope includes at least one layer of superior objects corresponding to the change object of the target change event and associated objects of the same level;
  • the alarm impact scope includes at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects of the same level;
  • the processor 61 in the process of determining at least one layer of superior objects corresponding to the change objects of the target change event included in the change impact scope, is configured to:
  • From the topology information, search level by level for the level objects to which the change object belongs, ending the search once an object of the specified level is found, to obtain at least one layer of superior objects corresponding to the change object of the target change event;
  • the level objects to which the alarm occurrence object belongs are searched level by level until the object of the specified level is found and the search is terminated to obtain at least one layer of superior objects corresponding to the alarm occurrence object of the alarm information.
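  • The level-by-level search for superior objects can be sketched as follows. The sketch assumes the topology information is available as two simple mappings (each object's superior object and each object's level name); these mappings, the function name `superior_objects`, and the object names are hypothetical. The same routine would apply to a change object or to an alarm occurrence object.

```python
from typing import Dict, List


def superior_objects(obj: str, parent_of: Dict[str, str],
                     level_of: Dict[str, str], stop_level: str) -> List[str]:
    """Collect the superior objects of obj, lowest first, stopping once
    an object of stop_level has been reached (or the top of the tree)."""
    superiors: List[str] = []
    current = obj
    while current in parent_of:
        current = parent_of[current]
        superiors.append(current)
        if level_of[current] == stop_level:
            break
    return superiors


# Hypothetical topology information.
PARENT_OF = {"device-2": "cluster-1", "cluster-1": "zone-a", "zone-a": "region-x"}
LEVEL_OF = {"device-2": "device", "cluster-1": "cluster",
            "zone-a": "zone", "region-x": "region"}
```

  • Choosing `zone` as the specified level limits the scope of `device-2` to `cluster-1` and `zone-a`; choosing `region` extends it by one more layer.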
  • the topology information adopts a tree structure
  • the processor 61 can also be used to:
  • At least one layer of superior objects corresponding to the change object of the target change event and the associated objects of the same level are organized into a change topology tree with the change object as the root node to represent the scope of change influence;
  • At least one layer of superior objects corresponding to the alarm occurrence object of the alarm information and associated objects at the same level are organized into an alarm topology tree with the alarm occurrence object as the root node to represent the scope of alarm influence.
  • the processor 61 in the process of risk assessment of a target change event based on at least one piece of target alarm information, is configured to:
  • If the risk assessment value is higher than the risk threshold, it is determined that the target change event is risky.
  • the processor 61 in the process of calculating the risk assessment value corresponding to the target change event based on the corresponding alarm level of at least one item of target alarm information, is used to:
  • the processor 61 in the process of determining the degree of correlation between at least one piece of target alarm information and the target change event, is configured to:
  • a correlation degree is assigned to at least one piece of target alarm information based on the level distance between the minimum level object corresponding to each of the at least one piece of target alarm information and the change object.
  • the processor 61 in the process of searching for the smallest level object in the overlap between the change impact scope and the alarm impact scope corresponding to at least one piece of target alarm information, is used to:
  • If the change impact scope and the alarm impact scope are represented by topology trees, search for the lowest tree node in the overlap between the change impact scope and the alarm impact scope corresponding to each of the at least one piece of target alarm information, where a single tree node corresponds to one level object;
  • the process of determining level distance is used for:
  • the processor 61 in the process of calculating the risk assessment value corresponding to the target change event based on the corresponding alarm level and weight of at least one piece of target alarm information, is used to:
  • the processor 61 in the process of assigning initial risk values to different alarm levels, is used to:
  • the corresponding basic risk values are weighted according to the corresponding adjustment coefficients to obtain the initial risk values corresponding to different alarm levels.
  • the processor 61 can also be used to:
  • the risk threshold is selected.
  • the computing device also includes: a communication component 62, a power supply component 63 and other components. Only some components are schematically shown in FIG. 6 , which does not mean that the computing device only includes the components shown in FIG. 6 .
  • embodiments of the present application also provide a computer-readable storage medium storing a computer program.
  • When the computer program is executed, it can implement each step that can be executed by a computing device in the above method embodiments.
  • the memory in Figure 6 above is used to store computer programs, and can be configured to store various other data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the communication component in Figure 6 mentioned above is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access wireless networks based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • The power supply component in Figure 6 above provides power to the various components of the device where the power supply component is located.
  • a power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which the power component resides.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media include both persistent and non-persistent, removable and non-removable media, which can store information by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transient computer-readable media (transitory media), such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present invention relate to a change risk assessment method and apparatus, and a storage medium. The change impact scope of a target change event can be reasonably expanded to ensure that the observation range of a change risk assessment task is sufficiently large, thereby improving the accuracy of risk assessment; alarm information can also be introduced as a basis for change risk assessment, and the alarm impact scope of each piece of alarm information is reasonably expanded, so that potential alarms that have not yet appeared in a cloud network can be discovered in a timely manner and are fully involved in the change risk assessment process; and target alarm information that matches the target change event can also be discovered by determining whether there is an overlapping portion between the alarm impact scope and the change impact scope, so that the expanded change impact scope can be corrected to become more accurate, and the required alarm information can be detected accurately and comprehensively for calculating a change risk value. The risk of a change can therefore be assessed efficiently and accurately.
PCT/CN2023/089099 2022-04-27 2023-04-19 Change risk assessment method and apparatus, and storage medium WO2023207689A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210459479.X 2022-04-27
CN202210459479.XA CN115102834B (zh) 2022-04-27 2022-04-27 一种变更风险评估方法、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023207689A1 true WO2023207689A1 (fr) 2023-11-02

Family

ID=83287651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089099 WO2023207689A1 (fr) 2022-04-27 2023-04-19 Change risk assessment method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN115102834B (fr)
WO (1) WO2023207689A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102834B (zh) * 2022-04-27 2024-04-16 浙江大学 一种变更风险评估方法、设备及存储介质
CN116977062B (zh) * 2023-08-04 2024-01-23 江苏臻云技术有限公司 一种用于金融业务的风险标签管理系统及方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178038A1 (en) * 2015-12-22 2017-06-22 International Business Machines Corporation Discovering linkages between changes and incidents in information technology systems
CN107124299A (zh) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 基于资源拓扑的资源预警方法及系统
CN113313419A (zh) * 2021-06-23 2021-08-27 中国农业银行股份有限公司 信息系统窗口变更风险获取方法和装置
CN113450033A (zh) * 2021-09-02 2021-09-28 广州嘉为科技有限公司 一种基于cmdb的变更影响分析方法及管理设备
CN113792554A (zh) * 2021-09-18 2021-12-14 中国建设银行股份有限公司 一种基于知识图谱的变更影响评估方法和装置
CN115102834A (zh) * 2022-04-27 2022-09-23 浙江大学 一种变更风险评估方法、设备及存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546274A (zh) * 2010-12-20 2012-07-04 中国移动通信集团广西有限公司 一种通信业务中的告警监控方法及设备
CN103856344B (zh) * 2012-12-05 2017-09-15 中国移动通信集团北京有限公司 一种告警事件信息处理方法及装置
CN104125217A (zh) * 2014-06-30 2014-10-29 复旦大学 一种基于主机日志分析的云数据中心实时风险评估方法
CN106209829A (zh) * 2016-07-05 2016-12-07 杨林 一种基于告警策略的网络安全管理系统
CN107204876B (zh) * 2017-05-22 2020-09-29 成都网络空间安全技术有限公司 一种网络安全风险评估方法
CN108108902B (zh) * 2017-12-26 2021-06-29 创新先进技术有限公司 一种风险事件告警方法和装置
US11734636B2 (en) * 2019-02-27 2023-08-22 University Of Maryland, College Park System and method for assessing, measuring, managing, and/or optimizing cyber risk
CN114338435B (zh) * 2020-09-24 2024-02-09 腾讯科技(深圳)有限公司 网络变更监控方法、装置、计算机设备和存储介质
CN112329022A (zh) * 2020-11-11 2021-02-05 浙江长三角车联网安全技术有限公司 一种智能网汽车信息安全风险评估方法及系统
CN112446640A (zh) * 2020-12-10 2021-03-05 中国农业银行股份有限公司 信息系统变更风险评估方法、相关设备及可读存储介质
CN112540905A (zh) * 2020-12-18 2021-03-23 青岛特来电新能源科技有限公司 一种微服务架构下系统风险评估方法、装置、设备及介质
CN112559023A (zh) * 2020-12-24 2021-03-26 中国农业银行股份有限公司 一种变更风险的预测方法、装置、设备及可读存储介质

Also Published As

Publication number Publication date
CN115102834A (zh) 2022-09-23
CN115102834B (zh) 2024-04-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795122

Country of ref document: EP

Kind code of ref document: A1