CN114785673B - Method and device for acquiring abnormal information during active-standby switching - Google Patents


Info

Publication number
CN114785673B
Authority
CN
China
Prior art keywords
abnormal information
main
switching
standby
captured
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210451829.8A
Other languages
Chinese (zh)
Other versions
CN114785673A
Inventor
鲍佳鹏
刘书超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou DPTech Technologies Co Ltd
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Application filed by Hangzhou DPTech Technologies Co Ltd
Priority to CN202210451829.8A
Publication of CN114785673A
Application granted
Publication of CN114785673B
Status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0661 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L49/00 Packet switching elements
    • H04L49/70 Virtual switches

Abstract

The present disclosure relates to a method and an apparatus for acquiring abnormal information during active/standby switching in a multi-master VSM environment. The method includes: presetting the number of active/standby switching cycles; restarting the first main control board, capturing abnormal information from the first main control board while it restarts and from the interface boards and switch fabric boards while they start, collecting any captured abnormal information, and, when the abnormal information is judged to indicate a hang problem, stopping active/standby switching and locating the problem; restarting the second main control board and acquiring abnormal information in the same way as for the first main control board; performing main/standby frame switching by restarting the main frame and acquiring abnormal information in the same way as for the main control boards; and repeating these steps until either captured abnormal information indicates a hang problem, in which case active/standby switching stops, or the actual number of active/standby switching cycles equals the preset number.

Description

Method and device for acquiring abnormal information during active-standby switching
Technical Field
The disclosure relates to the technical field of VSM, and in particular to a method and a device for acquiring abnormal information during active/standby switching in a multi-master VSM environment.
Background
As networks continue to grow in scale, the complexity of configuring and maintaining them increases substantially. VSM (Virtual Switching Matrix) is a virtualization technology that combines multiple physical L2-L7 devices into one logical device for management and use. VSM greatly simplifies networking, improves network reliability, and makes the network easier to configure and maintain.
Devices in a VSM system are either the master member device or standby member devices. The roles are determined by role election, and a VSM has exactly one master member device at any time; all other members are standby member devices. The master member device manages and controls the whole VSM system: it distributes all VSM configuration information to the standby member devices, and it maintains and manages the state information of the data link layer and of the upper-layer protocol state machines running in the VSM, synchronizing this information to the standby member devices. A standby member device is controlled and managed by the master member device, acts as its backup, and can also forward data traffic. When the master member device fails, the system automatically elects a new master member device from the standby member devices to take over.
Because high-end switches occupy critical positions in the network, they must not become a single point of failure. They are therefore usually equipped with two main control boards: an active main control board and a standby main control board. The active main control board is the core of the control plane; it communicates with external devices and with the service boards and keeps every module in the system functioning normally. The standby main control board serves only as a backup and does not communicate with external devices or the service boards. When the active main control board fails, the system automatically performs active/standby switching and the standby main control board takes over, so that services continue to run normally.
To verify device reliability, testers often need to check, after active/standby switching, whether the main control board starts normally and whether its functions work. Although devices can be made to restart automatically, active/standby switching in a VSM environment, and especially in a multi-master VSM environment, still has to be triggered manually. Devices, particularly frame (chassis) devices, take a relatively long time to start, so switching manually consumes a great deal of testers' time and lowers efficiency. In addition, existing schemes cannot capture the abnormal information produced while the device's switch fabric boards and interface boards start.
Therefore, a method and apparatus are needed that perform active/standby switching in a multi-master VSM environment automatically, without manual operation, and acquire the abnormal information produced during switching.
Disclosure of Invention
In view of this, the present disclosure provides a method and apparatus for obtaining abnormal information during active-standby switching in a multi-master VSM environment.
According to one aspect of the present disclosure, a method for acquiring abnormal information during active/standby switching in a multi-master VSM environment is provided. The method includes: presetting the number of active/standby switching cycles; first control-board switching, in which the first main control board of the main frame is restarted, abnormal information is captured from the first main control board during its restart and from the interface boards and switch fabric boards of the main frame during their start-up, the captured abnormal information is collected, and, when it is judged to be hang-problem abnormal information, active/standby switching is stopped and the problem is located; second control-board switching, in which the second main control board of the main frame is restarted and abnormal information is captured, collected and judged in the same way, active/standby switching being stopped and the problem located if it is hang-problem abnormal information; main/standby frame switching, in which the main frame is restarted and abnormal information from the main frame during its restart and from its interface boards and switch fabric boards during start-up is captured, collected and judged in the same way; and repeating the first control-board switching, the second control-board switching and the main/standby frame switching, which together make up one active/standby switching cycle, until active/standby switching is stopped because captured abnormal information is hang-problem abnormal information, or until the actual number of active/standby switching cycles equals the preset number.
According to the method for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the method further includes abnormal information processing: extracting keywords from the collected abnormal information; matching the collected abnormal information, based on the extracted keywords, against an abnormal information base containing historical abnormal information; if the match succeeds, returning the associated historical abnormal information and the personnel information corresponding to it; and if the match fails, adding the collected abnormal information to the abnormal information base as newly discovered abnormal information.
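A minimal sketch of the keyword matching follows, assuming Python, assuming that keywords are simply lower-case word tokens filtered by a small stopword list, and assuming that a Jaccard-overlap threshold decides whether two records match. The description does not say how keywords are extracted or scored, and `KnownIssue`, `extract_keywords` and `match_or_add` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical stopword list; a real deployment would tune it to its own logs.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "on", "at", "for", "and"}


def extract_keywords(text: str) -> Set[str]:
    """Lower-case word tokens that start with a letter, minus stopwords.

    Pure digit runs (slot numbers, timestamps) never match the pattern, so the
    same fault reported on different slots yields the same keywords.
    """
    tokens = re.findall(r"[a-z_][a-z0-9_]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


@dataclass
class KnownIssue:
    keywords: Set[str]
    description: str
    owner: str  # person recorded for this historical abnormal information


def match_or_add(text: str, base: List[KnownIssue], threshold: float = 0.5) -> Optional[KnownIssue]:
    """Return the best-matching known issue, or record `text` as new and return None."""
    keywords = extract_keywords(text)
    best, best_score = None, 0.0
    for issue in base:
        union = keywords | issue.keywords
        score = len(keywords & issue.keywords) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = issue, score
    if best is not None and best_score >= threshold:
        return best
    # No sufficiently similar record: add it as newly discovered abnormal information.
    base.append(KnownIssue(keywords=keywords, description=text, owner="unassigned"))
    return None
```

Jaccard overlap is used here only because it is easy to read and explain; any keyword-scoring scheme could take its place.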
According to the method for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the method further includes: when the captured abnormal information is judged to be non-hang-problem abnormal information, collecting it and performing abnormal information processing on it while active/standby switching continues.
According to the method for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the method further includes: when the first control-board switching, the second control-board switching or the main/standby frame switching completes without any abnormal information being captured, displaying on screen the number of restarts performed so far in the active/standby switching process and the system time at which the switching completed.
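The on-screen status line could be produced by a helper like the one below, a sketch assuming Python. The function name and the message format are hypothetical; the description only says that the restart count and the system time are displayed.

```python
from datetime import datetime
from typing import Optional


def completion_message(restart_count: int, now: Optional[datetime] = None) -> str:
    """Status line shown when a switching step finishes with no abnormal information.

    `restart_count` is the number of restarts performed so far in the
    active/standby switching process; `now` defaults to the current system time.
    """
    now = now or datetime.now()
    return f"Switching completed: {restart_count} restarts, system time {now:%Y-%m-%d %H:%M:%S}"
```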
According to the method for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the number of active/standby switching cycles may be finite or unlimited.
According to another aspect of the present disclosure, a device for acquiring abnormal information during active/standby switching in a multi-master VSM environment is provided. The device includes: an active/standby switching count presetting component, configured to preset the number of active/standby switching cycles; a first control-board switching component, configured to restart the first main control board of the main frame, capture abnormal information from the first main control board during its restart and from the interface boards and switch fabric boards of the main frame during their start-up, collect the captured abnormal information, and, when it is judged to be hang-problem abnormal information, stop active/standby switching and locate the problem; a second control-board switching component, configured to restart the second main control board of the main frame and to capture, collect and judge abnormal information in the same way, stopping active/standby switching and locating the problem when it is hang-problem abnormal information; and a main/standby frame switching component, configured to restart the main frame, capture abnormal information from the main frame during its restart and from its interface boards and switch fabric boards during start-up, collect it, and, when it is judged to be hang-problem abnormal information, stop active/standby switching and locate the problem. The first control-board switching, the second control-board switching and the main/standby frame switching, which together make up one active/standby switching cycle, are repeated until active/standby switching is stopped because captured abnormal information is hang-problem abnormal information, or until the actual number of active/standby switching cycles equals the preset number.
The device for acquiring abnormal information during active/standby switching in a multi-master VSM environment according to the present disclosure further includes an abnormal information processing component, configured to extract keywords from the collected abnormal information, match the collected abnormal information against an abnormal information base containing historical abnormal information based on the extracted keywords, return the associated historical abnormal information and the corresponding personnel information if the match succeeds, and add the collected abnormal information to the abnormal information base as newly discovered abnormal information if the match fails.
According to the device for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the abnormal information processing component is further configured to collect captured abnormal information that is judged to be non-hang-problem abnormal information and to perform abnormal information processing on it.
The device for acquiring abnormal information during active/standby switching in a multi-master VSM environment according to the present disclosure further includes an information display component, configured to display on screen the number of restarts performed so far in the active/standby switching process and the system time at which the switching completed, when the first control-board switching, the second control-board switching or the main/standby frame switching completes without any abnormal information being captured.
According to the device for acquiring abnormal information during active/standby switching in a multi-master VSM environment, the number of active/standby switching cycles may be finite or unlimited.
In summary, the method and device for acquiring abnormal information during active/standby switching in a multi-master VSM environment acquire the abnormal information produced during switching automatically, including the abnormal information produced while the device's switch fabric boards and interface boards start. Hang problems are located and recorded; non-hang problems are recorded and active/standby switching continues, so that as many problems as possible are found. In addition, a problem library summarizes the known abnormal information that occurs during restarts. Automating the process saves human resources and improves efficiency; problems can be found outside working hours, which makes test scheduling more flexible; the approach applies to a wider range of test environments; and it finds more problems.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are only some embodiments of the present application and other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a flowchart illustrating a method for acquiring exception information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Fig. 2 is a detailed flowchart of a method for acquiring exception information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of an apparatus for acquiring anomaly information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, systems, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the present disclosure, and therefore, should not be taken to limit the scope of the present disclosure.
In a VSM environment, the master member device is generally called the main frame and a standby member device the standby frame. Main/standby frame switching restarts the whole main frame; after the main frame restarts, the standby frame becomes the main frame. High-end switches in a VSM environment are typically equipped with two main control boards, an active main control board and a standby main control board. When the active main control board fails, the system automatically performs active/standby switching and the standby main control board takes over, so that services continue to run normally. In the method and device of the embodiments of the present disclosure, the two main control boards of a high-end switch are referred to as the first main control board and the second main control board. To verify device reliability, testing often needs to confirm that the main control board starts normally after active/standby switching and that its functions work.
Fig. 1 is a flowchart illustrating a method for acquiring exception information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
As shown in Fig. 1, in step S102, the number of active/standby switching cycles is preset.
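Step S102 could take the preset count from the command line, as in this sketch assuming Python's standard `argparse` module. The `--count` flag and the convention that 0 means "no limit" are hypothetical choices, made so that one option covers both the finite and the unlimited case that the description allows.

```python
import argparse
from typing import Optional, Sequence


def parse_switch_count(argv: Optional[Sequence[str]] = None) -> Optional[int]:
    """Read the preset number of active/standby switching cycles.

    Returns the count, or None when the test should loop without limit.
    """
    parser = argparse.ArgumentParser(description="Automated active/standby switching test")
    parser.add_argument("--count", type=int, default=0,
                        help="number of switching cycles; 0 means unlimited (hypothetical flag)")
    args = parser.parse_args(argv)
    if args.count < 0:
        parser.error("--count must be >= 0")  # prints usage and exits
    return None if args.count == 0 else args.count
```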
In step S104, the first control-board switching is performed: the first main control board of the main frame is restarted; abnormal information is captured from the first main control board during its restart and from the interface boards and switch fabric boards of the main frame during their start-up; any captured abnormal information is collected; and, when it is judged to be hang-problem abnormal information, active/standby switching is stopped and the corresponding hang problem is located.
More specifically, abnormal information is captured while the first main control board restarts, and at the same time abnormal information is captured as the interface boards and switch fabric boards of the main frame start. When abnormal information is captured, a prompt marks the start of the abnormal information and it is collected. The collected abnormal information is then classified. If it indicates a hang problem, active/standby switching is stopped and the hang problem is located. Examples of hang problems are: the main control board or another board prints "Starting write exception" on its serial port and restarts; the software version cannot be recognized or is empty; or the device or a board enters kdb during start-up.
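The hang-problem judgment could be sketched as a table of console-log signatures, assuming Python and assuming that each board's boot log is available as text. Only the "Starting write exception" prompt is quoted in the description; the patterns for an unrecognized or empty version and for entering kdb are hypothetical stand-ins, and the sketch does not check for the restart that accompanies the first example.

```python
import re
from typing import Optional

# Hypothetical signatures for the three hang examples in step S104.
HANG_SIGNATURES = {
    # Quoted in the description; on a real device it is followed by a restart.
    "write_exception_restart": re.compile(r"Starting write exception"),
    # Illustrative wording for "version cannot be recognized or is empty".
    "version_unrecognized_or_empty": re.compile(
        r"version\s*(?:is\s*)?(?:empty|unknown|not\s+recogni[sz]ed)", re.IGNORECASE
    ),
    # Illustrative: a kdb prompt at the start of a line, or a message saying kdb was entered.
    "entered_kdb": re.compile(r"^\s*kdb>|entering kdb", re.IGNORECASE | re.MULTILINE),
}


def find_hang_signature(boot_log: str) -> Optional[str]:
    """Return the name of the first hang signature present in a boot log, or None."""
    for name, pattern in HANG_SIGNATURES.items():
        if pattern.search(boot_log):
            return name
    return None
```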
In step S106, the second control-board switching is performed: the second main control board of the main frame is restarted; abnormal information is captured from the second main control board during its restart and from the interface boards and switch fabric boards of the main frame during their start-up; any captured abnormal information is collected; and, when it is judged to be hang-problem abnormal information, active/standby switching is stopped and the corresponding hang problem is located.
More specifically, after the first control-board switching completes, the second main control board of the main frame is restarted. Abnormal information is captured while the second main control board restarts, and at the same time abnormal information is captured as the interface boards and switch fabric boards of the main frame start. When abnormal information is captured, a prompt marks the start of the abnormal information and it is collected. The collected abnormal information is then classified; if it indicates a hang problem, active/standby switching is stopped and the hang problem is located.
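Because the restarting main control board and the main frame's interface boards and switch fabric boards boot at the same time, their consoles have to be read in parallel. Below is a minimal sketch, assuming Python's `concurrent.futures` and assuming that each board's console is wrapped in a hypothetical zero-argument reader function that returns the text it captured. A real reader, for example one connected through a console server, would need its own read timeout so that a hung board cannot block the capture indefinitely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical: a reader blocks until its board has finished booting (or its own
# timeout expires) and returns everything printed on that board's console.
Reader = Callable[[], str]


def capture_boot_logs(readers: Dict[str, Reader]) -> Dict[str, str]:
    """Read all board consoles concurrently and return a board-name -> log mapping."""
    with ThreadPoolExecutor(max_workers=max(1, len(readers))) as pool:
        futures = {board: pool.submit(read) for board, read in readers.items()}
        # result() re-raises any exception a reader hit, so a broken console is
        # not silently reported as a clean boot.
        return {board: future.result() for board, future in futures.items()}
```

With this shape, one routine can serve steps S104, S106 and S108; only the set of readers changes with the board or frame being restarted.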
In step S108, main/standby frame switching is performed: the main frame is restarted; abnormal information is captured from the main frame during its restart and from its interface boards and switch fabric boards during start-up; any captured abnormal information is collected; and, when it is judged to be hang-problem abnormal information, active/standby switching is stopped and the problem is located.
More specifically, main/standby frame switching is performed after the main frame has completed both control-board switchings. Main/standby frame switching restarts the whole main frame; after the main frame restarts, the standby frame becomes the main frame. Abnormal information is captured in the same way as during control-board switching: the whole main frame is restarted and abnormal information from its main control boards is captured during the restart, while abnormal information is also captured as its interface boards and switch fabric boards start. When abnormal information is detected, a prompt marks its start and it is collected. The collected abnormal information is then classified and handled as in control-board switching: if it indicates a hang problem, active/standby switching is stopped and the hang problem is located.
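The role change in main/standby frame switching can be made explicit with a small topology model, sketched here in Python for an assumed two-frame VSM. `Frame`, `VsmTopology` and the board and frame names are hypothetical; the model encodes only what the description states: each cycle restarts the first and then the second main control board of the current main frame, then the main frame itself, after which the standby frame becomes the main frame.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    name: str
    control_boards: Tuple[str, str]  # (first main control board, second main control board)


@dataclass
class VsmTopology:
    main: Frame
    standby: Frame

    def plan_cycle(self) -> List[str]:
        """Restart targets for one active/standby switching cycle, in order."""
        first, second = self.main.control_boards
        return [f"{self.main.name}/{first}", f"{self.main.name}/{second}", self.main.name]

    def complete_frame_switch(self) -> None:
        """After the whole main frame restarts, the standby frame becomes the main frame."""
        self.main, self.standby = self.standby, self.main
```

Because the roles swap at the end of every cycle, successive cycles in this model alternate between the two frames, so the main control boards of both frames are exercised.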
In step S110, it is determined whether the actual number of active/standby switching cycles equals the preset number.
More specifically, the first control-board switching step, the second control-board switching step and the main/standby frame switching step, which together make up one active/standby switching cycle, are executed repeatedly. Active/standby switching stops either when abnormal information captured in one of these steps is hang-problem abnormal information, or when the actual number of active/standby switching cycles equals the preset number.
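The repetition and stopping rule of step S110 could look like the following Python sketch. It assumes that each switching step is a callable that restarts its target and returns a verdict on the abnormal information it captured; `Verdict`, `run_switching` and the use of `None` for an unlimited count are hypothetical names and conventions, not taken from the description.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Verdict(Enum):
    CLEAN = "clean"        # no abnormal information captured
    NON_HANG = "non_hang"  # collected and processed; switching continues
    HANG = "hang"          # fatal: stop switching and locate the problem


Step = Callable[[], Verdict]


def run_switching(steps: Sequence[Step], target: Optional[int]) -> Tuple[int, bool]:
    """Repeat the switching steps until a hang problem or `target` completed cycles.

    steps:  first control-board switching, second control-board switching and
            main/standby frame switching, in that order; one pass is one cycle.
    target: preset number of active/standby switching cycles; None means unlimited.
    Returns (completed_cycles, stopped_on_hang).
    """
    completed = 0
    while target is None or completed < target:
        for step in steps:
            if step() is Verdict.HANG:
                return completed, True  # stop immediately so the hang can be located
        completed += 1
    return completed, False
```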
The method for acquiring abnormal information during active/standby switching in a multi-master VSM environment according to the embodiment of the present disclosure further includes abnormal information processing: keywords are extracted from the collected abnormal information and matched against an abnormal information base containing historical abnormal information. If the match succeeds, the historical abnormal information associated with the collected abnormal information and the personnel information corresponding to it are returned; if the match fails, the collected abnormal information is added to the abnormal information base as newly discovered abnormal information.
More specifically, an abnormal information base is established for the abnormal information collected during active/standby switching; it summarizes the abnormal information known to occur when the device restarts. When abnormal information occurs during active/standby switching, it is collected, key strings are extracted from it, and these are matched against the abnormal information base. If the match succeeds, the bug ticket numbers of the related problems are extracted and the responsible persons are indicated; if no match is found, the collected abnormal information is added to the base as newly discovered abnormal information.
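One way to keep the abnormal information base between runs is a JSON file keyed by a normalized message signature, sketched below in Python. The file layout, the `bug_id` and `owner` field names and the digit-masking normalization are assumptions for illustration; exact-signature lookup is a simpler stand-in for keyword matching. The description only says that a match yields the bug ticket number and the responsible person and that unmatched information is added as new.

```python
import json
import re
from pathlib import Path
from typing import Dict, Optional


def signature(message: str) -> str:
    """Normalize a log line so repeats of the same fault compare equal:
    lower-case it and replace every run of digits (slots, timestamps) with '#'."""
    return re.sub(r"\d+", "#", message.strip().lower())


class AbnormalInfoBase:
    """Hypothetical abnormal information base stored as JSON: signature -> record."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.records: Dict[str, dict] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def lookup_or_add(self, message: str) -> Optional[dict]:
        """Return the known record (with bug_id and owner), or add the message as new."""
        key = signature(message)
        if key in self.records:
            return self.records[key]
        # Newly discovered abnormal information: no bug ticket or owner yet.
        self.records[key] = {"example": message, "bug_id": None, "owner": None}
        self.path.write_text(json.dumps(self.records, indent=2, ensure_ascii=False), encoding="utf-8")
        return None
```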
The method for acquiring abnormal information during active/standby switching in a multi-master VSM environment according to the embodiment of the present disclosure further includes: when the captured abnormal information is judged to be non-hang-problem abnormal information, collecting it and performing abnormal information processing on it.
More specifically, if the problem is non-hanging, such as failure of the start item in the start process, redundant prompt information in the start process, prompt err item in the start process, etc., the active/standby switching is continued to find as many problems as possible. Since the test personnel can perform other test works when the test personnel rest or execute the script when executing the script, the test personnel cannot pay attention to the running of the script at any time, and in order to find out more problems, the scheme selects to continue executing the script for the non-hang-up problems.
The method for acquiring abnormal information during active-standby switching in the multi-master VSM environment according to an embodiment of the present disclosure further includes a display step. It applies when the first active-standby control board switching, the second active-standby control board switching, or the active-standby frame switching is complete and no abnormal information has been captured. The screen then shows the number of restarts performed during the active-standby switching process and the system time at which the switching completed.
In the method for acquiring abnormal information during active-standby switching in the multi-master VSM environment, the number of active-standby switches may be either limited or unlimited.
More specifically, in this scheme the script stops when a hang-up problem occurs, because a hang-up problem is fatal and must be resolved first. Once the test environment is set up, testers can run the script. They can either limit the number of executions or let the script loop indefinitely.
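The redisplay after a clean switchover, and the choice between a limited and an unlimited number of switches, can be sketched as two small helpers. The message format and the function names are hypothetical. The source only states that the restart count and the system time are shown.

```python
import itertools
from datetime import datetime
from typing import Iterator, Optional


def round_numbers(preset_count: Optional[int]) -> Iterator[int]:
    """Return an iterator over switch numbers 1, 2, ...; unlimited when preset_count is None."""
    if preset_count is None:
        return itertools.count(1)            # endless-loop mode
    return iter(range(1, preset_count + 1))  # limited mode


def redisplay(restarts: int, finished_at: datetime) -> str:
    """Format the screen line shown when a switchover completes with no abnormality."""
    return (f"switchover complete: restarts={restarts}, "
            f"system time={finished_at:%Y-%m-%d %H:%M:%S}")
```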
Fig. 2 is a detailed flowchart of a method for acquiring exception information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
As shown in fig. 2, in step S202, the first main control board is restarted.
In step S204, abnormal information is captured. Specifically, abnormal information is captured while the first main control board restarts. At the same time, abnormal information is captured while the interface boards and switch fabric boards start up.
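Step S204 captures abnormal information from the restarting main control board and, at the same time, from the interface boards and switch fabric boards. One way to sketch that concurrency is a reader thread per board console feeding a shared queue. The board names (`mpu1`, `lpu3`), the console lines, and the `ABNORMAL_MARKERS` substrings are hypothetical. In practice each source would be a serial or telnet console stream rather than a list.

```python
import queue
import threading
from collections.abc import Iterable

# Assumption: substrings that mark a console line as abnormal; a real script
# would use device-specific patterns.
ABNORMAL_MARKERS = ("err", "fail", "exception", "kdb")


def capture_all(sources: dict[str, Iterable[str]]) -> list[tuple[str, str]]:
    """Read every board console concurrently; return (board, line) pairs that look abnormal."""
    found: queue.Queue = queue.Queue()  # thread-safe sink shared by all readers

    def reader(board: str, lines: Iterable[str]) -> None:
        for line in lines:
            if any(marker in line.lower() for marker in ABNORMAL_MARKERS):
                found.put((board, line))

    threads = [threading.Thread(target=reader, args=(name, lines))
               for name, lines in sources.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all readers are done, so draining the queue below is race-free
    results = []
    while not found.empty():
        results.append(found.get())
    return results
```

Results from different boards arrive in arbitrary order, so a caller that needs a stable order should sort them or attach timestamps.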
In step S206, it is determined whether or not abnormality information is captured.
If abnormal information is captured in step S206, the flow proceeds to step S212. There the prompt "start-up has abnormality" is shown and the abnormal information is collected.
In step S214, it is determined whether the collected abnormality information is hang-up problem type abnormality information.
If step S214 determines that the collected abnormal information is hang-up problem abnormal information, the flow proceeds to step S216, where the abnormal information is located.
If the result of step S214 is no, the flow returns to step S204 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) No further abnormal information is captured while the first main control board restarts, and the active-standby switching completes. The flow proceeds to step S218 and the new main control board is restarted.
If no abnormal information is captured in step S206, the flow proceeds to step S208, which determines whether the active-standby switching is complete. If yes, the flow proceeds to step S210, where the number of restarts and the system time are displayed on the screen. If no, the flow returns to step S204 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) The active-standby switching completes with no abnormal information captured during the restart of the first main control board. The flow proceeds to step S218 and the new main control board is restarted.
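The per-phase loop of steps S204 to S216 can be sketched as a function. It consumes captured events until the switchover finishes or a hang-up problem appears. The same shape applies to steps S220 to S232 and S236 to S248. The `SWITCH_DONE` sentinel, the event stream, and the `is_hang_up` callback are illustrative assumptions. Raising an error when the stream ends early is a design choice the source does not describe.

```python
from collections.abc import Callable, Iterable

# Hypothetical sentinel emitted by the capture source once the switchover has finished.
SWITCH_DONE = object()


def run_phase(events: Iterable[object],
              is_hang_up: Callable[[str], bool],
              collected: list) -> bool:
    """Drive one restart phase (e.g. S204 to S216 for the first main control board).

    Returns True when the switchover completes (S208 yes, then S210 and the next phase),
    or False when a hang-up abnormality stops active-standby switching (S216).
    """
    for event in events:              # S204: keep capturing
        if event is SWITCH_DONE:      # S206 no and S208 yes
            return True
        collected.append(event)       # S206 yes -> S212: prompt and collect
        if is_hang_up(event):         # S214
            return False              # S216: stop switching and locate the problem
        # S214 no: non-hang-up, already recorded; loop back to S204
    raise RuntimeError("capture ended before the switchover completed")
```

Non-hang-up entries accumulate in `collected`, so they can be passed to abnormal information processing after the phase.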
In step S218, the second main control board is restarted.
In step S220, abnormal information is captured. Specifically, abnormal information is captured while the second main control board restarts. At the same time, abnormal information is captured while the interface boards and switch fabric boards start up.
In step S222, it is determined whether or not abnormality information is captured.
If abnormal information is captured in step S222, the flow proceeds to step S228. There the prompt "start-up has abnormality" is shown and the abnormal information is collected.
In step S230, it is determined whether the collected abnormality information is hang-up problem type abnormality information.
If step S230 determines that the collected abnormal information is hang-up problem abnormal information, the flow proceeds to step S232, where the abnormal information is located.
If the result of step S230 is no, the flow returns to step S220 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) No further abnormal information is captured while the second main control board restarts, and the active-standby switching completes. The flow proceeds to step S234 and the entire main frame is restarted.
If no abnormal information is captured in step S222, the flow proceeds to step S224, which determines whether the active-standby switching is complete. If yes, the flow proceeds to step S226, where the number of restarts and the system time are displayed on the screen. If no, the flow returns to step S220 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) The active-standby switching completes with no abnormal information captured during the restart of the second main control board. The flow proceeds to step S234 and the entire main frame is restarted.
In step S234, the main frame is restarted.
In step S236, abnormal information is captured. Specifically, abnormal information is captured while the main frame and its main control boards restart. At the same time, abnormal information is captured while the interface boards and switch fabric boards start up.
In step S238, it is determined whether or not abnormality information is captured.
If abnormal information is captured in step S238, the flow proceeds to step S244. There the prompt "start-up has abnormality" is shown and the abnormal information is collected.
In step S246, it is determined whether the collected abnormality information is hang-up problem type abnormality information.
If step S246 determines that the collected abnormal information is hang-up problem abnormal information, the flow proceeds to step S248, where the abnormal information is located.
If the result of step S246 is no, the flow returns to step S236 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) No further abnormal information is captured while the main frame restarts, and the active-standby switching completes. A new round of active-standby switching begins, and the flow returns to step S202 to restart the first main control board.
If no abnormal information is captured in step S238, the flow proceeds to step S240, which determines whether the active-standby switching is complete. If yes, the flow proceeds to step S242, where the number of restarts and the system time are displayed on the screen. If no, the flow returns to step S236 and the subsequent steps. This loop ends in one of two ways:
(a) Hang-up problem abnormal information is captured. The abnormal information is located and active-standby switching ends.
(b) The active-standby switching completes after the main frame restarts. The flow returns to step S202 and the first main control board is restarted.
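The role changes in this flow can be modeled with a small state sketch. Restarting the active main control board hands mastership to its peer. Restarting the whole main frame makes the standby frame the new main frame. The sketch assumes exactly two frames with two main control boards each. The `Vsm`/`Frame` classes and names such as `mpu1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    boards: list[str]      # assumed: exactly two main control boards
    active_board: int = 0  # index of the active main control board


@dataclass
class Vsm:
    frames: list[Frame]    # assumed: exactly two member devices (frames)
    active_frame: int = 0  # index of the main frame

    def main_frame(self) -> Frame:
        return self.frames[self.active_frame]

    def active_board_name(self) -> str:
        f = self.main_frame()
        return f.boards[f.active_board]

    def restart_active_board(self) -> str:
        """Restart the main frame's active board; its peer becomes active."""
        f = self.main_frame()
        f.active_board = 1 - f.active_board
        return f.boards[f.active_board]

    def restart_main_frame(self) -> str:
        """Restart the whole main frame; the standby frame becomes the main frame."""
        self.active_frame = 1 - self.active_frame
        return self.main_frame().name
```

After two board restarts the original main control board is active again. A frame restart then moves mastership to the other member device, and the next round starts there.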
Fig. 3 is a schematic diagram of an apparatus for acquiring abnormal information during active-standby switching in a multi-master VSM environment according to an embodiment of the present disclosure. As shown in Fig. 3, the apparatus includes the following components:
- an active-standby switching count presetting component 302
- a first active-standby control board switching component 304
- a second active-standby control board switching component 306
- an active-standby frame switching component 308
- an abnormal information processing component 310
- a redisplay information component 312
The active-standby switching count presetting component 302 is configured to preset the number of active-standby switches.
The first active-standby control board switching component 304 is configured to restart the first main control board of the main frame. It captures abnormal information from the first main control board during its restart, and from the interface boards and switch fabric boards of the main frame during their startup. It collects any captured abnormal information. When the captured abnormal information is determined to be hang-up problem abnormal information, it stops active-standby switching and locates that information.
The second active-standby control board switching component 306 is configured to restart the second main control board of the main frame. It captures abnormal information from the second main control board during its restart, and from the interface boards and switch fabric boards of the main frame during their startup. It collects any captured abnormal information. When the captured abnormal information is determined to be hang-up problem abnormal information, it stops active-standby switching and locates that information.
The active-standby frame switching component 308 is configured to restart the main frame. It captures abnormal information from the main frame during its restart, and from the interface boards and switch fabric boards of the main frame during their startup. It collects any captured abnormal information. When the captured abnormal information is determined to be hang-up problem abnormal information, it stops active-standby switching and locates that information.
The first and second active-standby control board switching and the active-standby frame switching together complete one active-standby switch. They are repeated until one of two conditions is met: active-standby switching is stopped because hang-up problem abnormal information has been captured, or the actual number of active-standby switches equals the preset number.
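The repetition rule for these components can be sketched as a loop over three phase callables. The loop stops on the first hang-up or once the preset count is reached, and passing `None` models the unlimited mode. The `Phase` signature and the returned reason strings are assumptions made for illustration.

```python
from collections.abc import Callable
from typing import Optional

# Hypothetical phase signature: restart one target, capture and handle abnormal
# information, and return False if a hang-up problem stopped switching.
Phase = Callable[[], bool]


def run_switching(phases: list[Phase], preset_count: Optional[int]) -> tuple[int, str]:
    """Repeat board 1 -> board 2 -> frame phases until a hang-up or the preset count.

    preset_count=None means loop without limit. Returns (completed_switches, reason).
    """
    completed = 0
    while preset_count is None or completed < preset_count:
        for phase in phases:
            if not phase():
                return completed, "hang-up"
        completed += 1  # all phases finished: one active-standby switch completed
    return completed, "preset count reached"
```

Here `phases` would normally hold three callables for components 304, 306, and 308, in that order.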
According to an embodiment of the present disclosure, the apparatus for acquiring abnormal information during active-standby switching in the multi-master VSM environment further includes the abnormal information processing component 310. This component extracts keywords from the collected abnormal information and, based on them, matches that information against an abnormal information base containing historical abnormal information. If the match succeeds, it returns the historical abnormal information associated with the collected abnormal information and the personnel information corresponding to that associated information. If the match fails, it adds the collected abnormal information to the base as newly discovered abnormal information.
In the apparatus according to an embodiment of the present disclosure, the abnormal information processing component 310 has a further role. When the captured abnormal information is determined to be non-hang-up problem abnormal information, it collects that information and performs abnormal information processing on it.
According to an embodiment of the present disclosure, the apparatus further includes the redisplay information component 312. It acts when the first active-standby control board switching, the second active-standby control board switching, or the active-standby frame switching is complete and no abnormal information has been captured. It then displays on the screen the number of restarts performed during the active-standby switching process and the system time at which the switching completed.
In the apparatus for acquiring abnormal information during active-standby switching in the multi-master VSM environment, the number of active-standby switches may be either limited or unlimited.
In summary, the method and apparatus can automatically acquire abnormal information that occurs during active-standby switching in a multi-master VSM environment. This includes abnormal information produced when the device's switch fabric boards and interface boards start up. Hang-up problem abnormal information is located and recorded. Non-hang-up problem abnormal information is recorded and active-standby switching continues, so that as many problems as possible are found. In addition, known abnormal information that occurs during restarts is summarized in a problem library. The approach offers several benefits:
- Automated operation saves labor and improves efficiency.
- Problems can be found outside working hours, which makes test scheduling more flexible.
- It applies to a wider range of test environments.
- It finds problems more effectively.
In general, active-standby switching proceeds as follows. The script starts by restarting the active main control board and capturing abnormal information during its restart. While it captures main control board abnormal information, it also captures abnormal information as the interface boards and switch fabric boards start up. When abnormal information is detected, the script prompts "start-up has abnormality", collects the abnormal information, and then classifies it.
- Hang-up problems stop the script, and the problem is located. Examples: the serial console of the main control board or of a board prints "Starting write exception" and restarts; the version cannot be identified or is empty; or kdb is entered while the device or a board is starting.
- Non-hang-up problems let the script continue, so that as many problems as possible are found. Examples: a startup item fails during startup, redundant prompt messages appear during startup, or err prompts appear during startup.
Testers may rest or do other test work while the script runs, so they cannot watch it at all times. To find more problems, the scheme therefore keeps running the script for non-hang-up problems. A problem library is created for abnormal information; it summarizes the abnormal information known to occur during restarts.
When abnormal information occurs during switching, it is collected, key characters are extracted from it, and they are matched in the problem library. If the match succeeds, the bug ticket numbers of the related problems are extracted and the responsible persons are indicated. If there is no match, the abnormal information is added to the problem library as newly discovered. After the active-standby switching completes, the number of completed startups and the time are displayed. The new main control board is then restarted, and the steps are repeated.
Active-standby frame switching is performed after the active-standby main control board switching has been performed twice. Generally, the active member device is called the main frame and the standby member device is called the standby frame. Active-standby frame switching restarts the entire main frame; once the main frame restarts, the standby frame becomes the main frame.
Abnormal information is captured in the same way as in main control board switching. The entire main frame is restarted first, and abnormal information is captured during the restart. Abnormal information from the interface boards and switch fabric boards during startup is captured at the same time as main control board abnormal information. When abnormal information is detected, the prompt "start-up has abnormality" is shown, the abnormal information is collected, and it is then classified. It is processed in the same way as in main control board switching. After the active-standby frame switching completes, the number of completed startups and the time are displayed, and a new cycle begins.
The script stops when a hang-up problem occurs, because a hang-up problem is fatal and must be resolved first. Once the test environment is set up, testers can run the script. They can either limit the number of executions or let the script loop indefinitely.
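The hang-up versus non-hang-up decision can be sketched as a pattern check over each captured console line. The three patterns below paraphrase the hang-up examples listed above: "Starting write exception" followed by a restart, a version that cannot be identified or is empty, and entering kdb. The exact console text on a real device, and the `Severity`/`classify` names, are assumptions. Anything that does not match is treated as non-hang-up, and switching continues.

```python
import re
from enum import Enum


class Severity(Enum):
    HANG_UP = "hang-up"          # fatal: stop the script and locate the problem
    NON_HANG_UP = "non-hang-up"  # record it, process it, keep switching


# Assumed console wording, paraphrasing the hang-up examples in the description.
HANG_UP_PATTERNS = [
    re.compile(r"Starting write exception", re.IGNORECASE),
    re.compile(r"version\s+(cannot be identified|is empty)", re.IGNORECASE),
    re.compile(r"\bkdb\b", re.IGNORECASE),  # kernel debugger entered during startup
]


def classify(line: str) -> Severity:
    """Classify one captured abnormal line as hang-up or non-hang-up."""
    if any(p.search(line) for p in HANG_UP_PATTERNS):
        return Severity.HANG_UP
    return Severity.NON_HANG_UP
```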
The software product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing, but is not limited to these. More specific examples (a non-exhaustive list) of readable storage media include:
- an electrical connection having one or more wires
- a portable disk
- a hard disk
- random access memory (RAM)
- read-only memory (ROM)
- erasable programmable read-only memory (EPROM or flash memory)
- optical fiber
- portable compact disc read-only memory (CD-ROM)
- an optical storage device
- a magnetic storage device
- any suitable combination of the foregoing
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute:
- entirely on the user's computing device;
- partly on the user's device, as a stand-alone software package;
- partly on the user's computing device and partly on a remote computing device; or
- entirely on the remote computing device or server.
Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computing device, for example through the Internet using an Internet service provider.
Those skilled in the art will appreciate that the modules may be distributed among the devices as described in the embodiments. Alternatively, with corresponding changes, they may be located in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
From the above description of the embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present application may therefore be embodied as a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network. It includes several instructions that cause a computing device (such as a personal computer, server, mobile terminal, or network device) to perform the method according to the embodiments of the present application.
The exemplary embodiments of the present application have been particularly shown and described above. It is to be understood that this application is not limited to the detailed structures, arrangements, or implementations described herein. On the contrary, the application is intended to cover the various modifications and equivalent arrangements that fall within the spirit and scope of the appended claims.

Claims (6)

1. A method for acquiring abnormal information during active-standby switching in a multi-master VSM environment comprises the following steps:
presetting the number of active-standby switches;
a first active-standby control board switching step: restarting a first main control board of a main frame; capturing abnormal information from the first main control board during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collecting the captured abnormal information; when the captured abnormal information is determined to be hang-up problem abnormal information, stopping active-standby switching and locating the captured abnormal information; and when the captured abnormal information is determined to be non-hang-up problem abnormal information, performing abnormal information processing on it and continuing active-standby switching;
a second active-standby control board switching step: restarting a second main control board of the main frame; capturing abnormal information from the second main control board during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collecting the captured abnormal information; when the captured abnormal information is determined to be hang-up problem abnormal information, stopping active-standby switching and locating the captured abnormal information; and when the captured abnormal information is determined to be non-hang-up problem abnormal information, performing abnormal information processing on it and continuing active-standby switching;
an active-standby frame switching step: restarting the main frame; capturing abnormal information from the main frame during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collecting the captured abnormal information; and when the captured abnormal information is determined to be hang-up problem abnormal information, stopping active-standby switching and locating the captured abnormal information;
and repeating the first active-standby control board switching step, the second active-standby control board switching step, and the active-standby frame switching step, each repetition completing one active-standby switch, until either active-standby switching is stopped because hang-up problem abnormal information is captured in any of these steps, or the actual number of active-standby switches equals the preset number.
2. The method for obtaining abnormal information during active-standby switching in the multi-master VSM environment according to claim 1, further comprising:
processing the abnormal information: extracting keywords from the collected abnormal information and matching the extracted keywords in an abnormal information base containing historical abnormal information;
if the matching is successful, returning the historical abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information; and
if the matching is unsuccessful, adding the collected abnormal information to the abnormal information base as newly discovered abnormal information.
3. The method for obtaining abnormal information during active-standby switching in the multi-master VSM environment according to claim 1, further comprising:
when it is determined that the first active-standby control board switching, the second active-standby control board switching, or the active-standby frame switching is complete and no abnormal information has been captured, displaying on the screen the number of restarts performed during the active-standby switching process and the system time at which the active-standby switching completed.
4. A device for acquiring abnormal information during active-standby switching in a multi-master VSM environment comprises:
an active-standby switching count presetting component, configured to preset the number of active-standby switches;
a first active-standby control board switching component, configured to: restart the first main control board of the main frame; capture abnormal information from the first main control board during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collect the captured abnormal information; when the captured abnormal information is determined to be hang-up problem abnormal information, stop active-standby switching and locate the captured abnormal information; and when the captured abnormal information is determined to be non-hang-up problem abnormal information, process it and continue active-standby switching;
a second active-standby control board switching component, configured to: restart the second main control board of the main frame; capture abnormal information from the second main control board during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collect the captured abnormal information; when the captured abnormal information is determined to be hang-up problem abnormal information, stop active-standby switching and locate the captured abnormal information; and when the captured abnormal information is determined to be non-hang-up problem abnormal information, process it and continue active-standby switching;
an active-standby frame switching component, configured to: restart the main frame; capture abnormal information from the main frame during its restart and from the interface board and the switch fabric board of the main frame during their startup; after abnormal information is captured, collect the captured abnormal information; and when the captured abnormal information is determined to be hang-up problem abnormal information, stop active-standby switching and locate the captured abnormal information;
wherein the first active-standby control board switching, the second active-standby control board switching, and the active-standby frame switching are repeated, each repetition completing one active-standby switch, until either active-standby switching is stopped because the captured abnormal information is hang-up problem abnormal information, or the actual number of active-standby switches equals the preset number.
5. The apparatus for obtaining exception information during active-standby switching in a multi-master VSM environment according to claim 4, further comprising:
an abnormal information processing component, configured to extract keywords from the collected abnormal information and, based on the extracted keywords, perform matching in an abnormal information base containing historical abnormal information;
if the matching is successful, return the historical abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information; and
if the matching is unsuccessful, add the collected abnormal information to the abnormal information base as newly discovered abnormal information.
6. The apparatus for obtaining exception information during active-standby switching in a multi-master VSM environment according to claim 4, further comprising:
a redisplay information component, configured to display on screen, when it determines that the first active/standby switching, the second active/standby switching, or the active/standby frame switching has finished without any abnormal information being captured, the number of restarts performed so far in the active/standby switching process and the system time at which that switching finished.
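The redisplay behaviour of claim 6 reduces to building a status line only when a switching stage ends cleanly. The sketch below assumes a plain-text message; the function name, parameters and message wording are hypothetical, since the claim only states which two values (restart count and finishing system time) are shown.

```python
from datetime import datetime
from typing import List, Optional


def redisplay_message(stage: str, restart_count: int, captured: List[str],
                      finished_at: Optional[datetime] = None) -> Optional[str]:
    """Build the on-screen summary for a switching stage that ended cleanly.

    Returns None when abnormal information was captured, because the summary
    is only shown when nothing abnormal was found.
    """
    if captured:
        return None
    finished_at = finished_at or datetime.now()  # default to the current system time
    return (f"{stage} finished, no abnormal information captured; "
            f"restarts so far: {restart_count}; "
            f"system time: {finished_at:%Y-%m-%d %H:%M:%S}")
```

Passing `finished_at` explicitly keeps the function deterministic for testing; in use it would normally be left to default to the current time.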
CN202210451829.8A 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching Active CN114785673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210451829.8A CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210451829.8A CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Publications (2)

Publication Number Publication Date
CN114785673A CN114785673A (en) 2022-07-22
CN114785673B true CN114785673B (en) 2023-08-22

Family

ID=82433520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210451829.8A Active CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Country Status (1)

Country Link
CN (1) CN114785673B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1722627A (en) * 2004-07-13 2006-01-18 华为技术有限公司 (Huawei Technologies Co., Ltd.) A method and device for realizing switching between main and backup units in communication equipment
CN101106443A (en) * 2007-08-10 2008-01-16 中兴通讯股份有限公司 (ZTE Corporation) A system and method for controlling switch of primary and backup board
CN105959128A (en) * 2015-08-11 2016-09-21 杭州迪普科技有限公司 (Hangzhou DPTech Technologies Co., Ltd.) Fault processing method and device and network device
WO2016177231A1 (en) * 2015-07-10 2016-11-10 中兴通讯股份有限公司 (ZTE Corporation) Dual-control-based active-backup switching method and device
CN106533736A (en) * 2016-10-13 2017-03-22 杭州迪普科技股份有限公司 (Hangzhou DPTech Technologies Co., Ltd.) Network device reboot method and apparatus
CN113162808A (en) * 2021-04-30 2021-07-23 中国工商银行股份有限公司 (Industrial and Commercial Bank of China Ltd.) Storage link fault processing method and device, electronic equipment and storage medium
CN114138534A (en) * 2021-12-01 2022-03-04 斑马网络技术有限公司 (Banma Network Technology Co., Ltd.) Recovery and positioning method, device and equipment for system hang-up fault and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7818615B2 (en) * 2004-09-16 2010-10-19 Invensys Systems, Inc. Runtime failure management of redundantly deployed hosts of a supervisory process control data acquisition facility
US8706914B2 (en) * 2007-04-23 2014-04-22 David D. Duchesneau Computing infrastructure
US10574513B2 (en) * 2017-06-16 2020-02-25 Cisco Technology, Inc. Handling controller and node failure scenarios during data collection

Non-Patent Citations (1)

Title
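Zhao Hairong (赵海荣). Design and Implementation of a Stacking System Based on the 802.1BR Protocol. China Outstanding Master's Theses Database, 2017, full text. *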

Also Published As

Publication number Publication date
CN114785673A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
JP6396887B2 (en) System, method, apparatus, and non-transitory computer readable storage medium for providing mobile device support services
CN107666493B (en) Database configuration method and equipment thereof
CN109245966A (en) The monitoring method and device of the service state of cloud platform
US20020002448A1 (en) Means for incorporating software into avilability models
CN110581785B (en) Reliability evaluation method and device
CN111478796B (en) Cluster capacity expansion exception handling method for AI platform
CN113515316A (en) Novel edge cloud operating system
CN114785673B (en) Method and device for acquiring abnormal information during active-standby switching
CN111090537B (en) Cluster starting method and device, electronic equipment and readable storage medium
CN115098294B (en) Abnormal event processing method, electronic equipment and management terminal
CN112099879B (en) Configuration information management method and device, computer equipment and storage medium
CN111737130B (en) Public cloud multi-tenant authentication service testing method, device, equipment and storage medium
CN114679295A (en) Firewall security configuration method and device
Cisco Operational Traps

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant