CN114785673A - Method and device for acquiring abnormal information during main/standby switching under multi-master control VSM environment - Google Patents


Publication number
CN114785673A
Authority
CN
China
Prior art keywords
abnormal information
main
standby
switching
captured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210451829.8A
Other languages
Chinese (zh)
Other versions
CN114785673B (en
Inventor
鲍佳鹏
刘书超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou DPTech Technologies Co Ltd
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou DPTech Technologies Co Ltd filed Critical Hangzhou DPTech Technologies Co Ltd
Priority to CN202210451829.8A priority Critical patent/CN114785673B/en
Publication of CN114785673A publication Critical patent/CN114785673A/en
Application granted granted Critical
Publication of CN114785673B publication Critical patent/CN114785673B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0661 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/70 Virtual switches

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure relates to a method and a device for acquiring abnormal information during main/standby switching in a multi-master VSM environment. The method comprises the following steps: presetting a number of main/standby switching times; restarting the first main control board, capturing abnormal information of the first main control board during its restart and abnormal information of the interface board and the switching network board during their startup, collecting the captured abnormal information, and, when the abnormal information is judged to indicate a hang-up problem, stopping the main/standby switching and locating the problem; restarting the second main control board, wherein abnormal information is acquired in the same way as during the restart of the first main control board; switching the main frame and the standby frame by restarting the main frame, wherein abnormal information is acquired in the same way as during the restart of a main control board; and repeating the above steps until either abnormal information indicating a hang-up problem is captured or the actual number of main/standby switchings equals the preset number, whereupon the main/standby switching stops.

Description

Method and device for acquiring abnormal information during main/standby switching under multi-master control VSM environment
Technical Field
The present disclosure relates to the field of VSM technologies, and in particular, to a method and an apparatus for acquiring abnormal information during active/standby switching in a multi-master VSM environment.
Background
As the network scale continues to grow, the complexity of configuration and maintenance increases dramatically. The VSM (Virtual Switching Matrix) technology is a virtualization technology for virtualizing a plurality of physical devices of layers L2 to L7 into one logical device for management and use. By the VSM technology, the networking complexity can be greatly simplified, the network reliability is improved, and the network is easier to configure and maintain.
Devices in a VSM system are divided into main member devices and standby member devices. Only one main member device can exist in a VSM; all other member devices are standby member devices, and the roles are determined by role election. Specifically, the main member device manages and controls the whole VSM system, issues all VSM configuration information to the standby member devices, maintains and manages the state information of the data link layer and of the upper-layer protocol state machines running in the VSM, and synchronizes that information to the standby member devices. A standby member device is controlled and managed by the main member device, operates as its backup, and can also forward data services. When the main member device fails, the system automatically elects a new main member device from the standby member devices to take over its work.
Because of its position in the network, a high-end switch is critical and must not become a single point of failure. High-end switches are therefore usually equipped with two main control boards, referred to as the main control board and the standby control board. The main control board is the core of the control plane and exchanges service traffic with external devices and with the service boards so that each module in the system functions normally; the standby control board serves only as a backup of the main control board and does not communicate with external devices or the service boards. When the main control board fails, the system automatically performs a main/standby switchover, and the standby board takes over from the main board to keep services running.
To improve device reliability, testing often needs to verify that the main control board starts normally and functions correctly after a main/standby switchover. Although schemes for automatically restarting a device exist, main/standby switching in a VSM environment, and particularly in a multi-master VSM environment, still requires manual operation. Devices, especially frame-type devices, take a long time to start, so performing the switching manually consumes a large amount of tester time and reduces working efficiency. In addition, existing schemes cannot acquire the abnormal information generated while the switching network board and the interface board of the device start up.
Therefore, there is a need for a method and a device for automatically obtaining abnormal information during active/standby switching in a multi-master VSM environment without manual operation.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for acquiring abnormal information during active/standby switching in a multi-master VSM environment.
According to one aspect of the present disclosure, a method for acquiring abnormal information during main/standby switching in a multi-master VSM environment is provided. The method includes: presetting a number of main/standby switching times; performing a first main/standby control board switching, in which the first main control board of the main frame is restarted, abnormal information of the first main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured, the captured abnormal information is collected, and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the corresponding problem is located; performing a second main/standby control board switching, in which the second main control board of the main frame is restarted, abnormal information of the second main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured, the captured abnormal information is collected, and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the corresponding problem is located; performing a main/standby frame switching, in which the main frame is restarted, abnormal information of the main frame during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured, the captured abnormal information is collected, and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the corresponding problem is located; and repeating the first main/standby control board switching, the second main/standby control board switching and the main/standby frame switching, which together complete one main/standby switching, until the main/standby switching is stopped because the captured abnormal information is hang-up problem abnormal information, or until the actual number of main/standby switchings equals the preset number.
The method for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to the present disclosure further includes: processing the abnormal information, namely extracting keywords from the collected abnormal information and matching the extracted keywords against an abnormal information base containing historical abnormal information; if the matching succeeds, returning the abnormal information associated with the collected abnormal information together with the personnel information corresponding to the associated abnormal information; and if the matching fails, adding the collected abnormal information to the abnormal information base as newly found abnormal information.
The method further includes: when the captured abnormal information is judged to be non-hang-up problem abnormal information, collecting and processing the captured abnormal information.
The method further includes: when it is judged that the first main/standby control board switching, the second main/standby control board switching or the main/standby frame switching has completed and no abnormal information has been captured, displaying on the screen the number of restarts performed so far in the main/standby switching process and the system time at which the switching completed.
In the method, the preset number of main/standby switching times may be finite or infinite.
According to another aspect of the present disclosure, a device for acquiring abnormal information during main/standby switching in a multi-master VSM environment is provided. The device includes: a main/standby switching count presetting component, configured to preset the number of main/standby switching times; a first main/standby control board switching component, configured to restart the first main control board of the main frame, capture abnormal information of the first main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup, collect the captured abnormal information, and, when the captured abnormal information is judged to be hang-up problem abnormal information, stop the main/standby switching and locate the corresponding problem; a second main/standby control board switching component, configured to restart the second main control board of the main frame, capture abnormal information of the second main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup, collect the captured abnormal information, and, when the captured abnormal information is judged to be hang-up problem abnormal information, stop the main/standby switching and locate the corresponding problem; and a main/standby frame switching component, configured to restart the main frame, capture abnormal information of the main frame during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup, collect the captured abnormal information, and, when the captured abnormal information is judged to be hang-up problem abnormal information, stop the main/standby switching and locate the corresponding problem. The first main/standby control board switching, the second main/standby control board switching and the main/standby frame switching, which together complete one main/standby switching, are repeated until the main/standby switching is stopped because the captured abnormal information is hang-up problem abnormal information, or until the actual number of main/standby switchings equals the preset number.
The device for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to the present disclosure further includes: an abnormal information processing component, configured to extract keywords from the collected abnormal information and match them against an abnormal information base containing historical abnormal information; if the matching succeeds, to return the abnormal information associated with the collected abnormal information together with the personnel information corresponding to the associated abnormal information; and if the matching fails, to add the collected abnormal information to the abnormal information base as newly found abnormal information.
In the device, the abnormal information processing component is further configured to: when the captured abnormal information is judged to be non-hang-up problem abnormal information, collect and process the captured abnormal information.
The device further includes: an echo information component, configured to display on the screen the number of restarts performed so far in the main/standby switching process and the system time at which the switching completed, when it is judged that the first main/standby control board switching, the second main/standby control board switching or the main/standby frame switching has completed and no abnormal information has been captured.
In the device, the preset number of main/standby switching times may be finite or infinite.
In summary, the method and the device of the present disclosure automatically acquire abnormal information during main/standby switching in a multi-master VSM environment, including abnormal information generated while the switching network board and the interface board of the device start up. Main/standby switching stops when hang-up problem abnormal information is obtained, whereas non-hang-up problem abnormal information is recorded and main/standby switching continues, so that as many problems as possible are found. In addition, abnormal information known to occur at restart is summarized in a problem library. Automated operation saves human resources and improves efficiency; problems can be found outside working hours, which makes test-time management more flexible; the approach applies to a wider range of test environments; and its ability to detect problems is greater.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are only some embodiments of the present application, and other drawings may be derived from those drawings by those skilled in the art without inventive effort.
Fig. 1 is a schematic flowchart of a method for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Fig. 2 is a detailed flowchart illustrating a method for acquiring exception information during active/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a device for acquiring exception information during active/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known methods, systems, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
In a VSM environment, the main member device is generally referred to as the main frame and a standby member device as a standby frame. Main/standby frame switching restarts the whole main frame; once the main frame has restarted, the standby frame becomes the main frame. A high-end switch in a VSM environment is generally equipped with two main control boards, referred to as the main control board and the standby control board. When the main control board fails, the system automatically performs a main/standby switchover, and the standby board takes over from the main board to keep services running. In the method and device for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to the embodiments of the present disclosure, the two main control boards of a high-end switch are referred to as the first main control board and the second main control board. To improve device reliability, testing often needs to verify that the main control board starts normally and functions correctly after a main/standby switchover.
Fig. 1 is a schematic flowchart illustrating a method for acquiring exception information during active/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
As shown in fig. 1, in step S102, the number of main/standby switching times is preset.
In step S104, the first main/standby control board switching is performed. Specifically, the first main control board of the main frame is restarted; abnormal information of the first main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured; the captured abnormal information is collected; and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the hang-up problem corresponding to the captured abnormal information is located.
More specifically, abnormal information is captured while the first main control board restarts. At the same time, abnormal information generated while the interface board and the switching network board start up is captured. Once abnormal information has been captured, a prompt indicates that abnormal information occurred during startup, and the abnormal information is collected and then classified. If it indicates a hang-up problem, for example the serial port of the main control board or of another board prompts 'staring write exception' and the board restarts, the version cannot be identified or is empty, or kdb is entered while the device or a board is starting, the main/standby switching is stopped and the hang-up problem is located.
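The hang-up check described above can be sketched as a scan of the serial console output for known markers. This is a minimal illustration only: the function name `is_hang_up`, the pattern list, and the assumed `version:` line format are hypothetical. Only the quoted prompt string and the mention of kdb come from the description.

```python
import re

# Marker patterns for hang-up problems. The first two are taken from the
# examples in the text; the "version:" line format is an assumption made
# purely for illustration.
HANG_UP_PATTERNS = [
    re.compile(r"staring write exception"),   # prompt string quoted verbatim from the description
    re.compile(r"\bkdb\b", re.IGNORECASE),    # kdb entered during startup
    re.compile(r"^\s*version\s*:\s*(unknown)?\s*$", re.IGNORECASE),  # empty or unidentified version (assumed format)
]


def is_hang_up(console_lines):
    """Return True if any console line matches a hang-up marker."""
    return any(
        pattern.search(line)
        for line in console_lines
        for pattern in HANG_UP_PATTERNS
    )
```

A caller would feed this function the lines read from the serial port of each board during startup and stop the switching loop as soon as it returns True.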
In step S106, the second main/standby control board switching is performed. Specifically, the second main control board of the main frame is restarted; abnormal information of the second main control board during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured; the captured abnormal information is collected; and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the hang-up problem corresponding to the captured abnormal information is located.
More specifically, after the first main/standby control board switching completes, the second main control board of the main frame is restarted. Abnormal information is captured while the second main control board restarts; at the same time, abnormal information generated while the interface board and the switching network board start up is captured. Once abnormal information has been captured, a prompt indicates that abnormal information occurred during startup, and the abnormal information is collected and then classified. If it indicates a hang-up problem, the main/standby switching is stopped and the hang-up problem is located.
In step S108, the main/standby frame switching is performed. Specifically, the main frame is restarted; abnormal information of the main frame during its restart and abnormal information of the interface board and the switching network board of the main frame during their startup are captured; the captured abnormal information is collected; and, when the captured abnormal information is judged to be hang-up problem abnormal information, the main/standby switching is stopped and the corresponding problem is located.
More specifically, once the main frame has completed the two main/standby control board switchings, the main/standby frame switching is performed. Main/standby frame switching restarts the whole main frame; once the main frame has restarted, the standby frame becomes the main frame. Abnormal information is captured during main/standby frame switching in the same way as during main/standby control board switching. Specifically, the whole main frame is restarted first, and abnormal information of the main control board of the main frame is captured during the restart; at the same time, abnormal information generated while the interface board and the switching network board of the main frame start up is captured. Once abnormal information has been captured, a prompt indicates that abnormal information occurred during startup, and the abnormal information is collected and then classified. The abnormal information is handled in the same way as during main/standby control board switching: if it indicates a hang-up problem, the main/standby switching is stopped and the hang-up problem is located.
In step S110, it is determined whether the actual number of main/standby switchings equals the preset number of main/standby switching times.
More specifically, the first main/standby control board switching step, the second main/standby control board switching step and the main/standby frame switching step are executed repeatedly, one pass through all three completing one main/standby switching. The repetition ends either when abnormal information captured in any of the three steps is hang-up problem abnormal information, in which case the main/standby switching is stopped, or when the actual number of main/standby switchings equals the preset number.
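The control flow of steps S102 to S110 can be sketched as a loop over three injected step functions. The names `Anomaly`, `RunResult` and `run_switchover_test` are hypothetical, and each step is assumed to restart its target and return the anomalies it captured. How the restart is triggered and how the console is read are outside this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Anomaly:
    step: str       # which switching step captured it
    text: str       # collected console text
    hang_up: bool   # True for hang-up problem abnormal information


@dataclass
class RunResult:
    cycles_completed: int = 0
    stopped_on_hang_up: bool = False
    anomalies: List[Anomaly] = field(default_factory=list)


# A step restarts its target (first board, second board, or whole main frame)
# and returns whatever anomalies were captured during startup.
Step = Callable[[], List[Anomaly]]


def run_switchover_test(steps: List[Step], preset_cycles: int) -> RunResult:
    """Repeat the switching steps until a hang-up is seen or the preset count is reached."""
    result = RunResult()
    while result.cycles_completed < preset_cycles:          # S110: compare actual with preset count
        for step in steps:                                  # S104, S106, S108 in order
            captured = step()
            result.anomalies.extend(captured)               # non-hang-up anomalies are recorded, testing continues
            if any(a.hang_up for a in captured):
                result.stopped_on_hang_up = True            # hang-up: stop switching immediately
                return result
        result.cycles_completed += 1                        # one full main/standby switching done
    return result
```

Passing the steps in as callables keeps the loop independent of any particular device access method, so it can be exercised with stub steps before being connected to real hardware.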
The method for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to the embodiment of the present disclosure further includes processing the abnormal information: keywords are extracted from the collected abnormal information and matched against an abnormal information base containing historical abnormal information. If the matching succeeds, the abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information are returned; if the matching fails, the collected abnormal information is added to the abnormal information base as newly found abnormal information.
More specifically, an abnormal information base is built from the abnormal information collected during main/standby switching; it summarizes the abnormal information known to occur at restart. When abnormal information occurs during main/standby switching, it is collected, keywords are extracted from it, and the keywords are matched against the abnormal information base. If the matching succeeds, the bug ticket number of the related problem is extracted and the responsible person is indicated; if no match is found, the collected abnormal information is added to the abnormal information base as newly found abnormal information.
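The matching against the abnormal information base can be sketched as a lookup keyed on extracted keywords. The text does not specify how keywords are extracted or compared, so this sketch assumes the simplest possible rule: the keyword set is every lowercase word of three or more letters, and two messages match when their keyword sets are equal. The class and field names, the bug ticket format and the owner names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass
class KnownIssue:
    bug_id: str   # bug ticket number (hypothetical format)
    owner: str    # responsible person
    sample: str   # first message seen for this issue


def extract_keywords(text: str) -> FrozenSet[str]:
    # Keep alphabetic words only, so slot numbers and addresses do not
    # prevent the same fault from matching on a different board.
    return frozenset(re.findall(r"[a-z_]{3,}", text.lower()))


class AnomalyBase:
    """In-memory stand-in for the abnormal information base."""

    def __init__(self) -> None:
        self._issues: Dict[FrozenSet[str], KnownIssue] = {}

    def register(self, text: str, bug_id: str, owner: str) -> None:
        self._issues[extract_keywords(text)] = KnownIssue(bug_id, owner, text)

    def lookup_or_add(self, text: str) -> Optional[KnownIssue]:
        """Return the known issue on a match; otherwise store the message as new and return None."""
        key = extract_keywords(text)
        hit = self._issues.get(key)
        if hit is None:
            self._issues[key] = KnownIssue(bug_id="", owner="", sample=text)
        return hit
```

Exact equality of keyword sets is deliberately strict; a real library would more likely use a similarity threshold or curated keywords per issue, which this sketch does not attempt.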
The method for acquiring abnormal information during main/standby switching in a multi-master-control VSM environment according to the embodiment of the present disclosure further includes: when the captured abnormal information is determined to be non-hang-up problem abnormal information, collecting the captured abnormal information and processing it.
More specifically, if a non-hang-up problem occurs, such as a startup item failing during startup, redundant prompt information appearing during startup, or an err item being reported during startup, the main/standby switching continues so as to find as many problems as possible. The script mostly runs while the tester is resting or doing other test work, so the tester cannot watch it the whole time; the script therefore continues executing in order to find more problems.
According to the method for acquiring abnormal information during main/standby switching in a multi-master-control VSM environment of the embodiment of the present disclosure, the method further includes: when it is determined that the first main/standby control board switching, the second main/standby control board switching, or the main/standby frame switching is completed and no abnormal information has been captured, the screen echoes the number of restarts performed so far in the main/standby switching process and the system time at which the main/standby switching was completed.
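As a small illustration of the echo described above, the line shown on the screen could be built as follows. The function name `format_echo` and the message layout are assumptions made for this sketch; the disclosure only states that the restart count and the system time are displayed.

```python
from datetime import datetime
from typing import Optional


def format_echo(restart_count: int, finished_at: Optional[datetime] = None) -> str:
    """Build the line echoed to the screen after a switching completes without anomalies."""
    # Use the current system time when no explicit completion time is given.
    finished_at = finished_at or datetime.now()
    return f"restart #{restart_count} completed at {finished_at:%Y-%m-%d %H:%M:%S}"
```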
According to the method for acquiring the abnormal information during the main/standby switching in the multi-main control VSM environment, the number of times of the main/standby switching can be a limited number or an infinite number.
More specifically, in the present solution the script is stopped when a hang-up problem occurs, because a hang-up problem is fatal and must be solved first. The tester can execute the script once the test environment has been set up, and can choose either to limit the number of script executions or to let the script loop indefinitely.
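The choice between a limited and an unlimited number of executions can be sketched with a single iterator. This is an illustration only; the helper name `switchover_rounds` and the convention that `None` means "loop indefinitely" are assumptions of this sketch.

```python
import itertools
from typing import Iterator, Optional


def switchover_rounds(limit: Optional[int]) -> Iterator[int]:
    """Yield 1-based round numbers: `limit` rounds, or endlessly when limit is None."""
    if limit is None:
        return itertools.count(1)       # unlimited: the caller must break out
    return iter(range(1, limit + 1))    # limited: stops after `limit` rounds
```

A driver script would iterate over `switchover_rounds(...)` and break out of the loop as soon as a hang-up problem is detected.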
Fig. 2 is a detailed flowchart illustrating a method for acquiring exception information during active/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure.
As shown in fig. 2, in step S202, the first main control board is restarted.
In step S204, abnormality information is captured. Specifically, abnormal information is captured during the restart of the first main control board and, at the same time, abnormal information is captured while the interface board and the switching network board start up.
In step S206, it is determined whether or not abnormality information is captured.
If it is determined in step S206 that abnormality information has been captured, the process proceeds to step S212. In step S212, a "startup abnormality" prompt is presented, and the abnormality information is collected.
In step S214, it is determined whether the collected abnormality information is hang-up problem abnormality information.
If it is determined in step S214 that the collected abnormality information is hang-up problem abnormality information, the process proceeds to step S216. In step S216, the abnormality information is located.
If the result of the determination in step S214 is "no", the process re-enters step S204 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the first main control board and this main/standby switching is completed, in which case the process proceeds to step S218 and the new main control board is restarted.
If it is determined in step S206 that no abnormality information has been captured, the process proceeds to step S208. In step S208, it is determined whether the main/standby switching is completed. If the result of the determination in step S208 is "yes", the process proceeds to step S210. In step S210, the screen echoes the number of completed restarts and the system time. If the result of the determination in step S208 is "no", the process re-enters step S204 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the first main control board and this main/standby switching is completed, in which case the process proceeds to step S218 and the new main control board is restarted.
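The branch structure of steps S204 to S216 for one restart can be sketched as a single function. This is a minimal illustration under assumptions: console output is modeled as an iterable of text lines, and the three predicates (`is_abnormal`, `is_hang`, `is_complete`) are hypothetical callables supplied by the caller, since the disclosure does not define how each check is implemented.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class StageResult(Enum):
    COMPLETED = "completed"    # switching finished (S208 "yes", then S210)
    HANG = "hang"              # hang-up problem found (S214 "yes", then S216)
    INCOMPLETE = "incomplete"  # output ended before completion was seen


def run_stage(
    console_lines: Iterable[str],
    is_abnormal: Callable[[str], bool],
    is_hang: Callable[[str], bool],
    is_complete: Callable[[str], bool],
) -> Tuple[StageResult, List[str]]:
    """Watch one restart; return the outcome and the abnormal lines collected."""
    collected: List[str] = []
    for line in console_lines:          # S204: capture
        if is_abnormal(line):           # S206: was abnormal information captured?
            collected.append(line)      # S212: prompt and collect
            if is_hang(line):           # S214: hang-up problem?
                return StageResult.HANG, collected  # S216: locate, then stop
            continue                    # non-hang-up: keep capturing
        if is_complete(line):           # S208: switching completed?
            return StageResult.COMPLETED, collected
    return StageResult.INCOMPLETE, collected
```

A caller would echo the restart count and system time when the result is `COMPLETED` and nothing was collected, and would move on to the next restart stage in either `COMPLETED` case.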
In step S218, the second main control board is restarted.
In step S220, abnormality information is captured. Specifically, abnormal information is captured during the restart of the second main control board and, at the same time, abnormal information is captured while the interface board and the switching network board start up.
In step S222, it is determined whether or not abnormality information is captured.
If it is determined in step S222 that abnormality information has been captured, the flow proceeds to step S228. In step S228, a "startup abnormality" prompt is presented, and the abnormality information is collected.
In step S230, it is determined whether the collected abnormality information is hang-up problem abnormality information.
If it is determined in step S230 that the collected abnormality information is hang-up problem abnormality information, the process proceeds to step S232. In step S232, the abnormality information is located.
If the result of the determination in step S230 is "no", the process re-enters step S220 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the second main control board and this main/standby switching is completed, in which case the process proceeds to step S234 and the entire main frame is restarted.
If it is determined in step S222 that no abnormality information has been captured, the process proceeds to step S224. In step S224, it is determined whether the main/standby switching is completed. If the result of the determination in step S224 is "yes", the process proceeds to step S226. In step S226, the screen echoes the number of completed restarts and the system time. If the result of the determination in step S224 is "no", the process re-enters step S220 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the second main control board and this main/standby switching is completed, in which case the process proceeds to step S234 and the entire main frame is restarted.
In step S234, the main frame is restarted.
In step S236, abnormality information is captured. Specifically, abnormal information is captured during the restart of the main frame, including the restart of the main control board of the main frame, and, at the same time, abnormal information is captured while the interface board and the switching network board start up.
In step S238, it is determined whether or not abnormality information is captured.
If it is determined in step S238 that abnormality information has been captured, the process proceeds to step S244. In step S244, a "startup abnormality" prompt is presented, and the abnormality information is collected.
In step S246, it is determined whether the collected abnormality information is hang-up problem abnormality information.
If it is determined in step S246 that the collected abnormality information is hang-up problem abnormality information, the process proceeds to step S248. In step S248, the abnormality information is located.
If the result of the determination in step S246 is "no", the process re-enters step S236 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the main frame and this main/standby switching is completed, in which case the process re-enters step S202 and the first main control board is restarted.
If it is determined in step S238 that no abnormality information has been captured, the process proceeds to step S240. In step S240, it is determined whether the main/standby switching is completed. If the result of the determination in step S240 is "yes", the process proceeds to step S242. In step S242, the screen echoes the number of completed restarts and the system time. If the result of the determination in step S240 is "no", the process re-enters step S236 and its subsequent steps. This continues until either abnormal information is captured that is hang-up problem abnormal information, in which case the abnormal information is located and the main/standby switching ends; or no abnormal information is captured during the restart of the main frame and this main/standby switching is completed, in which case the process re-enters step S202 and the first main control board is restarted.
Fig. 3 is a schematic diagram of a device for acquiring abnormal information during main/standby switching in a multi-master VSM environment according to an embodiment of the present disclosure. As shown in fig. 3, the device includes: a main/standby switching times presetting component 302, a first main/standby control board switching component 304, a second main/standby control board switching component 306, a main/standby frame switching component 308, an abnormal information processing component 310, and a back display information component 312.
The main/standby switching times presetting component 302 is configured to preset the number of main/standby switching times. The first main/standby control board switching component 304 is configured to restart the first main control board of the main frame, capture abnormal information of the first main control board during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information. The second main/standby control board switching component 306 is configured to restart the second main control board of the main frame, capture abnormal information of the second main control board during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information. The main/standby frame switching component 308 is configured to restart the main frame, capture abnormal information of the main frame during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information. The first main/standby control board switching, the second main/standby control board switching, and the main/standby frame switching are repeated, each pass completing one main/standby switching, until the main/standby switching is stopped because the captured abnormal information is hang-up problem abnormal information, or until the actual number of main/standby switchings equals the preset number of main/standby switching times.
According to the apparatus for acquiring abnormal information during main/standby switching in a multi-master VSM environment of the embodiment of the present disclosure, the apparatus further includes an abnormal information processing component 310, configured to extract keywords from the collected abnormal information and match them against an abnormal information base containing historical abnormal information; if the matching succeeds, to return the abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information; and if the matching fails, to add the collected abnormal information to the abnormal information base as newly found abnormal information.
According to the apparatus for acquiring abnormal information during main/standby switching in a multi-master VSM environment of the embodiment of the present disclosure, the abnormal information processing component 310 is further configured to: when the captured abnormal information is determined to be non-hang-up problem abnormal information, collect the captured abnormal information and process it.
According to the apparatus for acquiring abnormal information during main/standby switching in a multi-master VSM environment of the embodiment of the present disclosure, the apparatus further includes a back display information component 312, configured to echo on the screen, when it is determined that the first main/standby control board switching, the second main/standby control board switching, or the main/standby frame switching is completed and no abnormal information has been captured, the number of restarts performed in the main/standby switching process and the system time at which the main/standby switching was completed.
According to the device for acquiring the abnormal information during the main/standby switching in the multi-main control VSM environment, the main/standby switching times can be limited or infinite.
In summary, the method and the device for acquiring abnormal information during main/standby switching in a multi-master VSM environment can automatically acquire abnormal information during main/standby switching in that environment, including abnormal information from the startup of the device's switching network board and interface board. Hang-up problem abnormal information is acquired, while non-hang-up problem abnormal information is recorded and the main/standby switching continues so as to find as many problems as possible; in addition, known abnormal information occurring at restart is summarized by creating a problem library. Automated operation saves human resources and improves efficiency; problems can be found outside working hours, which makes test-time management more flexible; the applicable test environments are broader; and the ability to detect problems is stronger.
Generally speaking, for the main/standby control board switching, once the script is started the active main control board is restarted, and abnormal information is captured during its restart. While the main control board's abnormal information is being captured, abnormal information from the startup of the interface board and the switching network board is captured as well. After abnormal information is captured, a startup abnormality prompt is given, the abnormal information is collected, and the collected abnormal information is then classified. If it is a hang-up problem, for example the serial port of the main control board or of a board card reports "staring write exception" and the board restarts, the version cannot be identified or is empty, or the device or a board card enters kdb during startup, the script is stopped and the hang-up problem is located. If it is a non-hang-up problem, for example a startup item fails during startup, redundant prompt information appears during startup, or an err item is reported during startup, the script continues executing so as to find as many problems as possible. The script mostly runs while the tester is resting or doing other test work, so the tester cannot watch it the whole time; to find more problems, the present solution therefore continues executing the script for non-hang-up problems. For abnormal information, a problem library is created that summarizes known abnormal information occurring at restart.
After abnormal information occurs during the switching, it is collected; keywords are then extracted from the abnormal information and matched against the problem library. If the matching succeeds, the bug ticket numbers of the related problems are extracted and the relevant responsible persons are indicated. If there is no match, the abnormal information is added to the problem library as a new finding. After the main/standby switching is completed, the number of completed startups and the time are echoed. The new active main control board is then restarted, and the above steps are repeated.
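The distinction between hang-up and non-hang-up problems described above can be illustrated with a simple pattern-based classifier. The regular expressions below are assumptions chosen for illustration only; actual console messages depend on the device, and the disclosure does not specify them. The function name `classify` is likewise hypothetical.

```python
import re
from typing import Optional

# Assumed, illustrative patterns; real console messages depend on the device.
HANG_PATTERNS = [
    re.compile(r"\bkdb\b", re.IGNORECASE),                 # device dropped into kdb
    re.compile(r"version\s*:\s*$", re.IGNORECASE),         # version string is empty
    re.compile(r"version .*unrecognized", re.IGNORECASE),  # version not identified
]
NON_HANG_PATTERNS = [
    re.compile(r"\bfail(ed)?\b", re.IGNORECASE),           # a startup item failed
    re.compile(r"\berr(or)?\b", re.IGNORECASE),            # an err item was printed
]


def classify(line: str) -> Optional[str]:
    """Return 'hang', 'non-hang', or None when the line looks normal."""
    # Hang-up patterns are checked first because they are fatal.
    if any(p.search(line) for p in HANG_PATTERNS):
        return "hang"
    if any(p.search(line) for p in NON_HANG_PATTERNS):
        return "non-hang"
    return None
```

Under this sketch, a script would stop and locate the problem on `"hang"`, and record the line and keep running on `"non-hang"`.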
In the main/standby frame switching, the main frame and the standby frame are switched after the main frame has completed the two main/standby control board switchings. Generally, the master member device is called the main frame, and the standby member device is called the standby frame. Main/standby frame switching restarts the entire main frame; after the main frame is restarted, the standby frame becomes the main frame. The flow for capturing abnormal information is the same as for main/standby control board switching. First, the entire main frame is restarted, and abnormal information is captured during the restart. While the main control board's abnormal information is being captured, abnormal information from the startup of the interface board and the switching network board is captured as well. After abnormal information is captured, a startup abnormality prompt is given, the abnormal information is collected, and the collected abnormal information is then classified. The abnormal information is processed in the same way as for main/standby control board switching. After the main/standby frame switching is completed, the number of completed startups and the time are echoed. A new cycle then begins. The script is stopped when a hang-up problem occurs, because a hang-up problem is fatal and must be solved first. The tester can execute the script once the test environment has been set up, and can choose either to limit the number of script executions or to let the script loop indefinitely.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external computing devices (e.g., through the internet using an internet service provider).
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus as described in the embodiments, or may be modified accordingly and located in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiment of the present application.
Exemplary embodiments of the present application are specifically illustrated and described above. It is to be understood that the application is not limited to the details of construction, arrangement or method of operation set forth herein; on the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for obtaining abnormal information during main/standby switching under a multi-main control VSM environment comprises the following steps:
presetting main/standby switching times;
a first main/standby control board switching step: restarting a first main control board of a main frame, capturing abnormal information of the first main control board during the restart and abnormal information of an interface board and a switching network board of the main frame during startup, collecting the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stopping the main/standby switching and locating the captured abnormal information;
a second main/standby control board switching step: restarting a second main control board of the main frame, capturing abnormal information of the second main control board during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collecting the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stopping the main/standby switching and locating the captured abnormal information;
a main/standby frame switching step: restarting the main frame, capturing abnormal information of the main frame during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collecting the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stopping the main/standby switching and locating the captured abnormal information;
and repeating the first main/standby control board switching step, the second main/standby control board switching step, and the main/standby frame switching step, each pass completing one main/standby switching, until the main/standby switching is stopped because abnormal information captured in the first main/standby control board switching step, the second main/standby control board switching step, or the main/standby frame switching step is hang-up problem abnormal information, or until the actual number of main/standby switchings equals the preset number of main/standby switching times.
2. The method of claim 1 for obtaining exception information during active/standby switching in a multi-master VSM environment, further comprising:
an abnormal information processing step of extracting keywords from the collected abnormal information and matching the extracted keywords against an abnormal information base containing historical abnormal information,
if the matching is successful, returning the abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information, and
if the matching is unsuccessful, adding the collected abnormal information to the abnormal information base as newly found abnormal information.
3. The method of claim 2 for obtaining exception information during active/standby switching in a multi-master VSM environment, further comprising:
when the captured abnormal information is determined to be non-hang-up problem abnormal information, collecting the captured abnormal information and processing the captured abnormal information.
4. The method according to claim 1, further comprising:
when the first main-standby control board switching, the second main-standby control board switching or the main-standby frame switching is judged to be completed and abnormal information is not captured, the number of times of restarting the main-standby switching process and the system time when the main-standby switching is completed are displayed on a screen.
5. The method according to claim 1, wherein the number of times of active/standby switching may be a finite number or an infinite number.
6. A device for obtaining abnormal information during main/standby switching under a multi-main control VSM environment comprises:
the main/standby switching times presetting component is used for presetting main/standby switching times;
a first main/standby control board switching component, configured to restart a first main control board of a main frame, capture abnormal information of the first main control board during the restart and abnormal information of an interface board and a switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information;
a second main/standby control board switching component, configured to restart a second main control board of the main frame, capture abnormal information of the second main control board during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information;
a main/standby frame switching component, configured to restart the main frame, capture abnormal information of the main frame during the restart and abnormal information of the interface board and the switching network board of the main frame during startup, collect the captured abnormal information after it is captured, and, when the captured abnormal information is determined to be hang-up problem abnormal information, stop the main/standby switching and locate the captured abnormal information;
wherein the first main/standby control board switching, the second main/standby control board switching, and the main/standby frame switching are repeated, each pass completing one main/standby switching, until the main/standby switching is stopped because the captured abnormal information is hang-up problem abnormal information, or until the actual number of main/standby switchings equals the preset number of main/standby switching times.
7. The apparatus of claim 6, further comprising:
an abnormality information processing component for performing keyword extraction on the collected abnormality information and performing matching in an abnormality information base containing historical abnormality information based on the extracted keywords,
if the matching is successful, returning the abnormal information associated with the collected abnormal information and the personnel information corresponding to the associated abnormal information, and
if the matching is unsuccessful, adding the collected abnormal information to the abnormal information base as newly found abnormal information.
8. The apparatus for acquiring abnormal information during main/standby switching in a multi-master control VSM environment according to claim 7, wherein the abnormal information processing component is further configured to:
when the captured abnormal information is judged not to be of the hang-up problem type, collect the captured abnormal information and process it.
9. The apparatus according to claim 6, further comprising:
an echo display component, configured to display on the screen the number of restarts and the system time, during the main/standby switching process and when the main/standby switching is finished, when it is judged that the first main/standby control board switching, the second main/standby control board switching or the main/standby frame switching has completed and no abnormal information has been captured.
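As a small illustration of the on-screen display in claim 9, the Python sketch below formats one status line containing the restart count and the system time. The function name `format_echo`, the line layout and the timestamp format are hypothetical; the claim only states which values are displayed.

```python
from datetime import datetime
from typing import Optional


def format_echo(restart_count: int, finished: bool,
                now: Optional[datetime] = None) -> str:
    """Build one status line with the system time and restart count.

    now defaults to the current system time; it is a parameter so the
    output can be reproduced in tests.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    state = "switchover finished" if finished else "switchover in progress"
    return f"[{stamp}] restarts={restart_count} {state}"
```

A caller would print this line after each switching step that completes without capturing abnormal information, and once more with `finished=True` at the end of the run.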
10. The apparatus according to claim 6, wherein the number of main/standby switching times may be finite or infinite.
CN202210451829.8A 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching Active CN114785673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210451829.8A CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210451829.8A CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Publications (2)

Publication Number Publication Date
CN114785673A (en) 2022-07-22
CN114785673B (en) 2023-08-22

Family

ID=82433520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210451829.8A Active CN114785673B (en) 2022-04-26 2022-04-26 Method and device for acquiring abnormal information during active-standby switching

Country Status (1)

Country Link
CN (1) CN114785673B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1722627A (en) * 2004-07-13 2006-01-18 华为技术有限公司 A method and device for realizing switching between main and backup units in communication equipment
US20060069946A1 (en) * 2004-09-16 2006-03-30 Krajewski John J Iii Runtime failure management of redundantly deployed hosts of a supervisory process control data acquisition facility
CN101106443A (en) * 2007-08-10 2008-01-16 中兴通讯股份有限公司 A system and method for controlling switch of primary and backup board
US20090216910A1 (en) * 2007-04-23 2009-08-27 Duchesneau David D Computing infrastructure
CN105959128A (en) * 2015-08-11 2016-09-21 杭州迪普科技有限公司 Fault processing method and device and network device
WO2016177231A1 (en) * 2015-07-10 2016-11-10 中兴通讯股份有限公司 Dual-control-based active-backup switching method and device
CN106533736A (en) * 2016-10-13 2017-03-22 杭州迪普科技股份有限公司 Network device reboot method and apparatus
US20180367371A1 (en) * 2017-06-16 2018-12-20 Cisco Technology, Inc. Handling controller and node failure scenarios during data collection
CN113162808A (en) * 2021-04-30 2021-07-23 中国工商银行股份有限公司 Storage link fault processing method and device, electronic equipment and storage medium
CN114138534A (en) * 2021-12-01 2022-03-04 斑马网络技术有限公司 Recovery and positioning method, device and equipment for system hang-up fault and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RUIPENG LI, HAI JIANG: "Adaptive and Fault Tolerant Simulation of Relativistic Particle Transport with Data-Level Checkpointing" *
ZHAO HAIRONG: "Design and Implementation of a Stacking System Based on the 802.1BR Protocol" *

Also Published As

Publication number Publication date
CN114785673B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN108600029B (en) Configuration file updating method and device, terminal equipment and storage medium
CN105611330B (en) Smart television maintenance method and system, server and mobile terminal
CN107451040B (en) Method and device for positioning fault reason and computer readable storage medium
US10037238B2 (en) System and method for encoding exception conditions included at a remediation database
US20220050765A1 (en) Method for processing logs in a computer system for events identified as abnormal and revealing solutions, electronic device, and cloud server
CN113825164A (en) Network fault repairing method and device, storage medium and electronic equipment
CN112306802A (en) Data acquisition method, device, medium and electronic equipment of system
CN115033419B (en) Method and system for realizing hardware fault self-healing
CN112671586A (en) Automatic migration and guarantee method and device for service configuration
CN111090537B (en) Cluster starting method and device, electronic equipment and readable storage medium
CN115098294B (en) Abnormal event processing method, electronic equipment and management terminal
CN114785673A (en) Method and device for acquiring abnormal information during main/standby switching under multi-master control VSM environment
US20090083747A1 (en) Method for managing application programs by utilizing redundancy and load balance
CN110333964A (en) Abnormal log processing method and processing device, electronic equipment, storage medium
CN112596750B (en) Application testing method and device, electronic equipment and computer readable storage medium
CN115729727A (en) Fault repairing method, device, equipment and medium
CN113076210A (en) Server fault diagnosis result notification method, system, terminal and storage medium
CN113127029A (en) Firmware updating method and device, electronic equipment and storage medium
CN112818204A (en) Service processing method, device, equipment and storage medium
CN112035295A (en) Virtual machine crash event processing method, system, terminal and storage medium
RU2187835C1 (en) Computer maintenance method and system
CN112003727A (en) Multi-node server power supply testing method, system, terminal and storage medium
Kandan et al. A Generic Log Analyzer for automated troubleshooting in container orchestration system
CN113608750B (en) Deployment method and device of monitoring component, computer equipment and storage medium
CN113259531B (en) Automatic voice service operation and maintenance method and system for call center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant