WO2018001326A1 - Method and device for acquiring fault information - Google Patents

Method and device for acquiring fault information

Info

Publication number
WO2018001326A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
service
otn
information
data
Prior art date
Application number
PCT/CN2017/090871
Other languages
French (fr)
Chinese (zh)
Inventor
刘庆明
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2018001326A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B17/00: Monitoring; Testing

Definitions

  • the present disclosure relates to the field of communications technologies, and, for example, to a fault information acquisition method and apparatus.
  • OTN: Optical Transport Network
  • in the first type of approach, when the faulty environment can be retained, maintenance personnel access the environment remotely or go directly to the engineering site and use the positioning interface provided by the service board to perform loopback tests, advancing each loopback point step by step to isolate the fault point. They read the monitoring registers reserved by the service board to see whether they differ from those under normal service, check whether the monitoring variables reserved by the board match the expected values, and analyze the alarm status of the board and its associated boards. However, these operations are performed in the actual OTN network: the staff, such as maintenance or R&D personnel, operate directly on the live network environment, which may affect normal service operation, and the scattered monitoring registers and software variables require personnel with a high level of professional competence;
  • the related technology has certain defects for the fault location of the OTN device.
  • the present disclosure provides a method and a device for acquiring fault information, which address the defects in the related art that fault location on OTN devices tends to affect the running stability of live network services and that the location process is not timely.
  • the embodiment provides a method for acquiring fault information, which is applied to the server, and may include:
  • the service node is a service node of the preset service flow model of the first OTN board;
  • the monitoring information includes the data and the performance parameters of the service node at the moment when the first alarm occurs;
  • the sample data range is the preset data range and performance parameter range for the service node data;
  • obtaining the first alarm code of the first alarm and determining the first OTN board corresponding to the first alarm includes the following:
  • the alarm code of each alarm is obtained; the alarm code carries the time information, the location information, and the alarm name of the alarm;
  • the service node includes: a service split node, a service encapsulation node, and a hardware node.
  • the performance parameters include: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current.
  • the method further includes:
  • uploading the first alarm code, the monitoring information, and the sample data range to the cloud server includes:
  • encrypting the first alarm code, the monitoring information, and the sample data range according to a first preset encryption algorithm, and then uploading them to the cloud server.
  • uploading the bit error rate and the packet information to the cloud server includes:
  • encrypting the bit error rate and the packet information according to a second preset encryption algorithm, and then uploading them to the cloud server.
  • the embodiment further provides a method for acquiring fault information, which is applied to the client, and may include:
  • when the server detects that the OTN service generates an alarm, the client obtains, from the cloud server, the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code; the monitoring information is the data and the performance parameters, at the moment when the first alarm occurs, of the service node of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range for the service node data;
  • the first alarm code, the monitoring information, the sample data range, and the preset service flow model are displayed through a visual view.
  • displaying the first alarm code, the monitoring information, the sample data range, and the preset service flow model through the visual view includes:
  • the first alarm code, the monitoring information, and the sample data range are decrypted into plaintext according to the first preset encryption algorithm and displayed on the preset service flow model through the visual view.
  • the method further includes:
  • when the server detects that the bit error rate (BER) of the OTN service exceeds a preset value, the client obtains, from the cloud server, the second OTN board corresponding to the bit error rate and the packet information of the service node of the preset service flow model of the second OTN board; the packet information includes: the number of received data packets and the number of transmitted data packets;
  • the bit error rate and the packet information are displayed through the visual view.
  • displaying the bit error rate and the packet information through the visual view includes:
  • the bit error rate and the packet information are decrypted into plaintext according to the second preset encryption algorithm and displayed through the visual view.
  • the embodiment further provides a fault information obtaining device, which is applied to the server, and may include:
  • the alarm monitoring module is configured to: when it is detected that an optical transport network (OTN) service generates an alarm, obtain the first alarm code of the first alarm and determine the first OTN board corresponding to the first alarm;
  • the information acquisition module is configured to acquire the monitoring information of a service node and the sample data range, where the service node is a service node of the preset service flow model of the first OTN board; the monitoring information includes the data and the performance parameters of the service node at the moment when the first alarm occurs, and the sample data range is the preset data range and performance parameter range for the service node data;
  • the data uploading module is configured to upload the first alarm code, the monitoring information, and the sample data range to the cloud server.
  • the alarm monitoring module includes:
  • the alarm code acquisition submodule is configured to acquire the alarm code of an alarm when an alarm monitoring point in the service topology of the OTN service generates the alarm; the alarm code carries the time information, the location information, and the alarm name of the alarm;
  • the first alarm determining submodule is configured to determine, according to the alarm codes, the first alarm, whose alarm time is earliest, and the first alarm code of the first alarm;
  • the board determining submodule is configured to determine, according to the first alarm code, the first OTN board corresponding to the first alarm.
  • the service node includes: a service split node, a service encapsulation node, and a hardware node.
  • the performance parameters include: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current.
  • the device further includes:
  • the error monitoring module is configured to obtain the second OTN board corresponding to the bit error rate when the bit error rate of the OTN service exceeds a preset value;
  • a packet information obtaining module configured to acquire data packet information of a service node of a preset service flow model of the second OTN board, where the data packet information includes: a quantity of the received data packet and a quantity of the sent data packet;
  • the error information uploading module is configured to upload the error rate and the data packet information to the cloud server.
  • the data upload module is used to:
  • the first alarm code, the monitoring information, and the sample data range are encrypted according to the first preset encryption algorithm, and then uploaded to the cloud server.
  • the embodiment further provides a fault information obtaining device, which is applied to the client, and may include:
  • the data acquisition module is configured to: when the server detects that the OTN service generates an alarm, obtain, from the cloud server, the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code; the monitoring information is the data and the performance parameters, at the moment when the first alarm occurs, of the service node of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range for the service node data;
  • the view display module is configured to display the first alarm code, the monitoring information, and the sample data range and the preset service flow model through the visual view.
  • the view display module is set to:
  • decrypt the first alarm code, the monitoring information, and the sample data range into plaintext according to the first preset encryption algorithm, and display them on the preset service flow model through the visual view.
  • the device further includes:
  • the bit error display module is configured to: when the bit error rate of the OTN service exceeds a preset value, obtain, from the cloud server, the second OTN board corresponding to the bit error rate and the packet information of the service node of the preset service flow model of the second OTN board; the packet information includes: the number of received data packets and the number of transmitted data packets;
  • the bit error rate and the packet information are displayed through the visual view.
  • the embodiment further provides a computer readable storage medium storing computer executable instructions for performing any of the above methods.
  • the embodiment further provides a server device comprising one or more processors, a memory, and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the corresponding fault information acquisition method described above.
  • the embodiment also provides a client device comprising one or more processors, a memory, and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the corresponding fault information acquisition method described above.
  • the embodiment further provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium; the computer program comprises program instructions that, when executed by a computer, cause the computer to perform any of the methods described above.
  • with the method and device for acquiring fault information, when it is detected that the OTN service generates an alarm, the server determines, according to the alarm codes, the first OTN board whose alarm time is earliest, obtains the monitoring information and the sample data range of each service node of the first OTN board, and uploads them to the cloud server; the client obtains from the cloud server the monitoring information and the sample data range of the first OTN board whose alarm time is earliest, matches the acquired data to the preset service flow model, and displays the service node data and the performance parameters in the monitoring information against the corresponding sample data range in a visual model, so that maintenance personnel can analyze the fault data according to the monitoring information and the sample data range; the disclosure obtains the fault information corresponding to the fault starting point directly through the server, avoids checking one by one through a large amount of historical data, improves the speed of fault location, and does not require staff to operate in the on-site environment, thereby reducing the risk that the fault location process affects the running stability of live network services.
  • FIG. 1 is a schematic flowchart of a method for acquiring fault information provided by a first embodiment
  • FIG. 2 is a schematic structural diagram of a fault information acquiring apparatus according to a second embodiment
  • FIG. 3 is a schematic flowchart of a method for acquiring fault information provided by a third embodiment
  • FIG. 4 is a schematic structural diagram of a fault information acquiring apparatus according to a fourth embodiment
  • FIG. 5 is a schematic diagram of an application scenario provided by the fifth embodiment
  • FIG. 6 is a schematic diagram of a system level model provided by the fifth embodiment.
  • FIG. 7 is a schematic diagram of a single board level model provided by the fifth embodiment.
  • FIG. 8 is a schematic structural diagram of hardware of a server device according to Embodiment 5;
  • FIG. 9 is a schematic structural diagram of hardware of a client device according to Embodiment 5.
  • this embodiment provides a method for acquiring fault information, which is applied to a server, and may include steps 110-130.
  • step 110: when it is detected that an optical transport network (OTN) service generates an alarm, the first alarm code of the first alarm is obtained, and the first OTN board corresponding to the first alarm is determined.
  • the OTN service has multiple alarm monitoring points in the service topology.
  • the alarm monitoring point is usually set on the service node.
  • the alarm monitoring point generates an alarm code after the alarm is generated.
  • the corresponding OTN board can be determined according to the alarm code.
  • step 120: the monitoring information of the service node and the sample data range are obtained, where the service node is a service node of the preset service flow model of the first OTN board, and the monitoring information includes the data and the performance parameters of the service node at the moment when the first alarm occurs.
  • the sample data range is the preset data range and performance parameter range for the service node data.
  • the monitoring information of the preset service node of the first OTN board is obtained, where the monitoring information includes the data transmitted by the service node at the moment when the first alarm occurs and the performance parameters.
  • the performance parameters include: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current; the sample data range consists of the data range corresponding to the service node data in the monitoring information, that is, its normal value range, and the performance parameter range, that is, the normal range of the performance parameters.
  • a monitoring point is set in the service node of the sub-service flow model to obtain monitoring information of the node.
  • the service node includes: a service split node, a service encapsulation node, and a hardware node.
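The comparison implied by step 120, monitoring information checked against the sample data range, can be sketched as follows. This is a minimal illustration and not the implementation described in the disclosure: the `NodeReading` type, the node and parameter names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeReading:
    """One monitored value captured at the moment of the first alarm (hypothetical type)."""
    node: str        # service node name, e.g. "hardware" (assumed naming)
    parameter: str   # performance parameter name (assumed naming)
    value: float


def out_of_range(
    readings: List[NodeReading],
    sample_ranges: Dict[Tuple[str, str], Tuple[float, float]],
) -> List[NodeReading]:
    """Return the readings whose value lies outside its normal (sample) range."""
    flagged = []
    for reading in readings:
        bounds = sample_ranges.get((reading.node, reading.parameter))
        if bounds is None:
            continue  # no sample range known for this pair; nothing to compare
        low, high = bounds
        if not (low <= reading.value <= high):
            flagged.append(reading)
    return flagged


# Illustrative values only; they are not taken from the source.
readings = [
    NodeReading("service_split", "clock_frequency_mhz", 155.52),
    NodeReading("hardware", "optical_power_dbm", -31.0),
    NodeReading("hardware", "bias_current_ma", 35.0),
]
sample_ranges = {
    ("service_split", "clock_frequency_mhz"): (155.0, 156.0),
    ("hardware", "optical_power_dbm"): (-14.0, 1.0),
    ("hardware", "bias_current_ma"): (10.0, 60.0),
}
suspects = out_of_range(readings, sample_ranges)
```

Keying the ranges by (node, parameter) keeps each service node's normal range independent, which matches the per-node monitoring points described above.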
  • step 130 the first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server.
  • the first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server, so that fault information no longer has to be collected manually by maintenance personnel, and maintenance or R&D personnel need not operate directly in the live network environment, which avoids affecting the operation of normal services.
  • the risk to the stability of live network services during fault location is reduced, cumbersome on-site operations are not required, and the speed of fault location is improved.
  • step 110 includes the following steps:
  • the alarm code of the alarm is obtained.
  • the alarm code carries the time information, the location information, and the alarm name of the alarm.
  • the time information is used to determine the sequence of the alarms generated by each alarm monitoring point.
  • the location information is used to determine the OTN board corresponding to the alarm, so that maintenance personnel can be directed to check the cause of the alarm.
  • according to the alarm codes, the first alarm, whose alarm time is earliest, and the first alarm code of the first alarm are determined.
  • the first OTN board corresponding to the first alarm code of the first alarm is the fault starting point.
  • obtaining the fault starting point directly avoids checking one by one through a large amount of historical data.
  • the first alarm, that is, the alarm generated earliest, is identified according to the time information of each alarm code, and the first alarm code corresponding to the first alarm is obtained, so that the relevant information of the first alarm can be obtained based on the first alarm code.
  • the first OTN board corresponding to the first alarm is determined according to the first alarm code.
  • the first OTN board, where the alarm occurred earliest, is determined according to the location information in the first alarm code.
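The sub-steps of step 110 can be sketched as: collect the alarm codes, pick the one with the earliest time information, and read the board from its location information. The `AlarmCode` fields mirror the three items the text says an alarm code carries; the field types, the board identifier format, and the alarm names used here are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class AlarmCode:
    """An alarm code carrying time, location, and alarm name (hypothetical layout)."""
    time: datetime   # time information of the alarm
    location: str    # location information, assumed to identify the OTN board
    name: str        # alarm name


def first_alarm(codes: Iterable[AlarmCode]) -> AlarmCode:
    """Return the alarm code with the earliest time, treated as the fault starting point."""
    return min(codes, key=lambda code: code.time)


# Illustrative alarm codes; names and locations are made up for the example.
codes = [
    AlarmCode(datetime(2017, 6, 29, 10, 0, 5), "shelf1/slot7", "ODU_AIS"),
    AlarmCode(datetime(2017, 6, 29, 10, 0, 2), "shelf1/slot3", "LOS"),
    AlarmCode(datetime(2017, 6, 29, 10, 0, 9), "shelf2/slot1", "ODU_BDI"),
]
origin = first_alarm(codes)
first_board = origin.location  # the first OTN board, per the location information
```

Because only the earliest alarm is selected, later alarms that are consequences of the same fault do not have to be examined one by one.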
  • uploading the first alarm code, the monitoring information, and the sample data range to the cloud server including:
  • the first alarm code, the monitoring information, and the sample data range are encrypted according to the first preset encryption algorithm, and then uploaded to the cloud server.
  • the monitoring information and the sample data range need to be encrypted and uploaded to the cloud server.
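The encrypt-then-upload step, and the matching decryption on the client, might look like the sketch below. The source does not name the first preset encryption algorithm, so the cipher is passed in as a callable; `toy_xor` is a deliberately insecure placeholder that only makes the example runnable, and the JSON payload layout is likewise an assumption.

```python
import json
from typing import Any, Callable, Dict


def toy_xor(data: bytes, key: bytes = b"demo-key") -> bytes:
    """Placeholder cipher for illustration only; NOT secure and NOT the patent's algorithm."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_payload(
    alarm_code: str,
    monitoring: Dict[str, Any],
    sample_ranges: Dict[str, Any],
    encrypt: Callable[[bytes], bytes] = toy_xor,
) -> bytes:
    """Serialize the three items named in the text and encrypt them before upload."""
    plaintext = json.dumps(
        {
            "alarm_code": alarm_code,
            "monitoring": monitoring,
            "sample_ranges": sample_ranges,
        },
        sort_keys=True,
    ).encode("utf-8")
    return encrypt(plaintext)


def decrypt_payload(
    ciphertext: bytes,
    decrypt: Callable[[bytes], bytes] = toy_xor,
) -> Dict[str, Any]:
    """Client side: recover the plaintext fields from the downloaded ciphertext."""
    return json.loads(decrypt(ciphertext).decode("utf-8"))


ciphertext = encrypt_payload(
    "ALM-0001",  # hypothetical alarm code value
    {"hardware": {"optical_power_dbm": -31.0}},
    {"hardware": {"optical_power_dbm": [-14.0, 1.0]}},
)
recovered = decrypt_payload(ciphertext)
```

In a real deployment the `encrypt` and `decrypt` callables would wrap whichever vetted algorithm the server and client have agreed on in advance.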
  • the method further includes:
  • when the bit error rate (BER) of the OTN service exceeds the preset value, the second OTN board corresponding to the bit error rate is obtained.
  • when the bit error rate exceeds a certain value (greater than the above preset value), an alarm will be generated.
  • therefore, once the preset value is exceeded, the second OTN board needs to be checked for packet loss on the service nodes of its preset service flow model; if packet loss occurs, the corresponding service node is checked, so as to avoid a system alarm caused by packet loss.
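A minimal sketch of the packet-loss check on the second OTN board follows. The text only states that the packet information contains the number of received and transmitted data packets; treating a node that transmitted fewer packets than it received as a packet-loss suspect is an assumption made here, as are the type, node names, and threshold values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PacketInfo:
    """Packet counters of one service node (hypothetical type)."""
    node: str
    received: int   # number of received data packets
    sent: int       # number of transmitted data packets


def nodes_with_packet_loss(
    ber: float,
    preset_ber: float,
    packet_info: List[PacketInfo],
) -> List[str]:
    """If the BER exceeds the preset value, list nodes that sent fewer packets than they received."""
    if ber <= preset_ber:
        return []  # below the preset value: no check is triggered
    return [info.node for info in packet_info if info.sent < info.received]


# Illustrative counters only.
packet_info = [
    PacketInfo("service_split", received=10_000, sent=10_000),
    PacketInfo("service_encapsulation", received=10_000, sent=9_850),
    PacketInfo("hardware", received=9_850, sent=9_850),
]
lossy = nodes_with_packet_loss(ber=1e-5, preset_ber=1e-6, packet_info=packet_info)
```

Running the check as soon as the lower preset value is crossed gives maintenance personnel a chance to act before the higher threshold raises a system alarm.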
  • uploading the error rate and the packet information to the cloud server including:
  • the error rate and the packet information are encrypted according to the second preset encryption algorithm, and then uploaded to the cloud server.
  • the bit error rate and the packet information corresponding to the second OTN board are encrypted by the second preset encryption algorithm and then uploaded to the cloud server.
  • the first OTN board, where the alarm occurred earliest, is determined according to the alarm codes, and the monitoring information and the sample data range of each service node of the first OTN board are uploaded to the cloud server.
  • maintenance personnel then analyze the fault data according to the monitoring information and the sample data range; the fault information corresponding to the fault starting point is obtained directly through the server, which avoids checking one by one through a large amount of historical data, improves the speed of fault location, and removes the need for staff to operate in the on-site environment, reducing the risk that fault location affects the running stability of live network services.
  • the embodiment provides a fault information acquiring apparatus, which is applied to a server, and may include:
  • the alarm monitoring module 201 is configured to: when it is detected that an optical transport network (OTN) service generates an alarm, acquire the first alarm code of the first alarm and determine the first OTN board corresponding to the first alarm.
  • the OTN service has multiple alarm monitoring points in the service topology, and an alarm monitoring point can be set on a service node. After an alarm monitoring point generates an alarm, an alarm code is generated, and the corresponding OTN board can be determined according to the alarm code.
  • the information obtaining module 202 is configured to acquire the monitoring information of the service node and the sample data range, where the service node is a service node of the preset service flow model of the first OTN board; the monitoring information includes the data and the performance parameters of the service node at the moment when the first alarm occurs, and the sample data range is the preset data range and performance parameter range for the service node data.
  • the monitoring information of the preset service node of the first OTN board is obtained.
  • the monitoring information includes the data transmitted by the service node at the moment when the first alarm occurs, and the performance parameters.
  • the performance parameters include: clock frequency, peripheral chip state, optical power (incoming and outgoing), optical module bias voltage, bias current, and temperature;
  • the sample data range consists of the data range corresponding to the service node data in the monitoring information, that is, its normal value range, and the range of the performance parameters.
  • a monitoring point is set in the service node of the sub-service flow model to obtain monitoring information of the node.
  • the data uploading module 203 is configured to upload the first alarm code, the monitoring information, and the sample data range to the cloud server.
  • the first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server, so that fault information no longer has to be collected manually by maintenance personnel, and maintenance or R&D personnel need not operate directly in the live network environment, which avoids affecting the operation of normal services and reduces the risk to the stability of live network services.
  • the alarm monitoring module 201 includes:
  • the alarm code acquisition submodule is configured to acquire the alarm code of an alarm when an alarm monitoring point in the service topology of the OTN service generates the alarm; the alarm code carries the time information, the location information, and the alarm name of the alarm;
  • the first alarm determining submodule is configured to determine, according to the alarm codes, the first alarm, whose alarm time is earliest, and the first alarm code of the first alarm;
  • the board determining submodule is configured to determine, according to the first alarm code, the first OTN board corresponding to the first alarm.
  • the service node includes: a service split node, a service encapsulation node, and a hardware node.
  • the performance parameters include: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current.
  • the device further includes:
  • the error monitoring module is configured to obtain a second OTN board corresponding to the error rate when the error rate of the OTN service exceeds a preset value
  • a packet information obtaining module configured to acquire data packet information of a service node of a preset service flow model of the second OTN board, where the data packet information includes: a quantity of the received data packet and a quantity of the sent data packet;
  • the error information uploading module is configured to upload the error rate and the data packet information to the cloud server.
  • the data uploading module 203 is configured to:
  • the first alarm code, the monitoring information, and the sample data range are encrypted according to the first preset encryption algorithm, and then uploaded to the cloud server.
  • the error information uploading module is set to:
  • the error rate and the packet information are encrypted according to the second preset encryption algorithm, and then uploaded to the cloud server.
  • the first OTN board, where the alarm occurred earliest, is determined according to the alarm codes, and the monitoring information and the sample data range of each service node of the first OTN board are uploaded to the cloud server.
  • this enables maintenance personnel to analyze the fault data remotely according to the monitoring information and the sample data range.
  • the server obtains the fault information corresponding to the fault starting point directly, which avoids troubleshooting through a large amount of historical data, improves the speed of fault location, and removes the need for staff to operate in the on-site environment, reducing the risk that fault location affects the running stability of live network services.
  • the embodiment provides a method for acquiring fault information, which is applied to the client, and may include steps 310-320.
  • step 310: when the server detects that the OTN service generates an alarm, the client obtains, from the cloud server, the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code.
  • the first alarm is the alarm whose alarm time is earliest.
  • when the server detects that the OTN service generates an alarm, the first alarm code of the earliest alarm, together with the monitoring information and the sample data range, is obtained from the cloud server.
  • when the server detects that the OTN service generates an alarm, the server uploads the service link data at the time of the fault to the cloud server and notifies the client; the client then obtains from the cloud server the first alarm code of the earliest alarm, the monitoring information, and the sample data range, and performs the subsequent visual view display according to the data acquired from the cloud server.
  • the OTN service has multiple alarm monitoring points in the service topology.
  • the alarm monitoring point can be set on the service node.
  • the alarm monitoring point generates an alarm code after the alarm is generated.
  • the OTN board whose alarm time is earliest is usually the fault starting point.
  • that is, the first OTN board corresponding to the first alarm code of the first alarm is the fault starting point; obtaining the fault starting point directly avoids checking one by one through a large amount of historical data, reduces the workload of the relevant staff, and improves the speed of fault location.
  • step 320 the first alarm code, the monitoring information, and the sample data range and the preset service flow model are displayed through the visual view.
  • maintenance personnel can compare the data according to the visual model and quickly filter out the faulty service node; this locates OTN device faults remotely, no longer relies on the actual scene of the live network fault, avoids cumbersome alarm data analysis, and improves the speed of fault location.
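As a rough illustration of step 320, the sketch below renders a text-only stand-in for the visual view: the service flow model is walked node by node, and each monitored value is shown next to its sample data range. The actual view in the disclosure is graphical; the function name, the output layout, and all values here are hypothetical.

```python
from typing import Dict, List, Tuple


def render_view(
    first_alarm_code: str,
    flow_model: List[str],
    monitoring: Dict[str, Dict[str, float]],
    sample_ranges: Dict[str, Dict[str, Tuple[float, float]]],
) -> str:
    """Lay out monitored values along the service flow model, flagging out-of-range ones."""
    lines = [f"First alarm code: {first_alarm_code}"]
    for node in flow_model:
        lines.append(f"[{node}]")
        for param, value in monitoring.get(node, {}).items():
            bounds = sample_ranges.get(node, {}).get(param)
            if bounds is None:
                lines.append(f"  {param} = {value}")  # no range to compare against
                continue
            low, high = bounds
            status = "ok" if low <= value <= high else "OUT OF RANGE"
            lines.append(f"  {param} = {value} (normal {low} to {high}) {status}")
    return "\n".join(lines)


# Illustrative model and values only.
flow_model = ["service_split", "service_encapsulation", "hardware"]
monitoring = {
    "service_split": {"clock_frequency_mhz": 155.52},
    "hardware": {"optical_power_dbm": -31.0},
}
sample_ranges = {
    "service_split": {"clock_frequency_mhz": (155.0, 156.0)},
    "hardware": {"optical_power_dbm": (-14.0, 1.0)},
}
view = render_view("ALM-0001", flow_model, monitoring, sample_ranges)
```

Printing `view` lists every node of the flow model in order, so a node whose value is marked OUT OF RANGE stands out in its position along the service path.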
  • step 320 may include:
  • the first alarm code, the monitoring information, and the sample data range are decrypted into the plaintext according to the first preset encryption algorithm, and the preset service flow model is displayed through the visual view.
  • the data obtained from the cloud server is the ciphertext encrypted according to the first preset encryption algorithm. Therefore, the ciphertext needs to be decrypted to obtain the plaintext, and displayed on the service flow model.
  • the method further includes:
  • when the server detects that the bit error rate (BER) of the OTN service exceeds a preset value, the client obtains, from the cloud server, the second OTN board corresponding to the bit error rate and the packet information of the service node of the preset service flow model of the second OTN board; the packet information includes: the number of received data packets and the number of transmitted data packets;
  • the bit error rate and the packet information are displayed through the visual view.
  • when the bit error rate exceeds a certain value (greater than the above preset value), an alarm will be generated.
  • therefore, once the preset value is exceeded, the second OTN board needs to be checked for packet loss on the service nodes of its preset service flow model; if packet loss occurs, the corresponding service node is checked, so as to avoid a system alarm caused by packet loss.
  • when the second OTN board corresponding to the bit error rate and the packet information are obtained from the cloud server, the bit error rate and the packet information are displayed visually on the service flow model, so that maintenance personnel can accurately locate the service node where packet loss occurs.
  • bit error rate and the packet information are displayed through the visual view, including:
  • the error rate and the packet information are decrypted into plaintext according to the second preset encryption algorithm, and the packet information is displayed through the visual view.
  • the error rate and the packet information obtained from the cloud server are ciphertexts encrypted according to the second preset encryption algorithm. Therefore, the ciphertext needs to be decrypted to obtain plaintext, and displayed on the service flow model.
  • the monitoring information and the sample data range of the first OTN board whose alarm time is earliest are obtained from the cloud server, and the data acquired from the cloud server is matched to the preset service flow model.
  • the service node data and the performance parameters in the monitoring information are displayed against the corresponding sample data range in a visual model, so that maintenance personnel can compare the data according to the visual model and quickly filter out the faulty service node.
  • OTN device faults are thus located remotely, without depending on the actual scene of the fault on the live network, and cumbersome alarm data analysis is avoided.
  • the fault information corresponding to the fault starting point is obtained directly from the cloud server through the client, which avoids troubleshooting one by one through a large amount of historical data, improves the speed of fault location, and removes the need for staff to operate in the on-site environment, reducing the risk that fault location affects the running stability of live network services.
  • the embodiment provides a fault information obtaining apparatus, which is applied to a client, and may include:
  • the data acquisition module 401 is configured to: when the server detects that the OTN service generates an alarm, obtain, from the cloud server, the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code; the monitoring information is the data and the performance parameters of the service node of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range for the service node data.
  • the first alarm is the alarm whose alarm time is earliest.
  • the first alarm code, the monitoring information, and the sample data range of the earliest alarm are obtained from the cloud server.
  • the OTN service has multiple alarm monitoring points in the service topology.
  • the alarm monitoring point can be set on the service node.
  • the alarm monitoring point generates an alarm code after the alarm is generated.
  • the OTN board on which the earliest alarm occurs is usually the fault starting point.
  • the first OTN board corresponding to the first alarm code of the first alarm is therefore taken as the fault starting point; obtaining the fault starting point directly avoids checking a large amount of historical data item by item and reduces the workload of the relevant staff.
  • the view display module 402 is configured to display the first alarm code, the monitoring information and the sample data range, together with the preset service flow model, through the visual view.
  • the preset service flow model is matched, and the service node data and performance parameters in the monitoring information are displayed in the visual model alongside the corresponding sample data range, so that maintenance personnel can compare the data in the visual model.
  • the faulty service node can then be picked out quickly, so that OTN device faults are located remotely without relying on the actual fault scenario on the existing network; this avoids the cumbersome analysis of alarm data and speeds up fault location.
  • the view display module 402 is configured to:
  • the device further includes:
  • the bit error display module is configured to: when the server detects that the bit error rate of the OTN service exceeds a preset value, obtain from the cloud server the second OTN board corresponding to the bit error rate and the packet information of the service nodes of the preset service flow model of the second OTN board.
  • the packet information includes: the number of received data packets and the number of transmitted data packets;
  • the bit error rate and the packet information are displayed in a visual view.
  • the bit error display module is configured to:
  • decrypt the bit error rate and the packet information into plaintext according to the second preset encryption algorithm and display them through the visual view.
  • the monitoring information and the sample data range of the first OTN board, on which the earliest alarm occurred, are obtained from the cloud server, and the preset service flow model is matched according to the data acquired from the cloud server.
  • the service node data and performance parameters in the monitoring information are displayed in the visual model alongside the corresponding sample data range, so that maintenance personnel can compare the data in the visual model and quickly pick out the faulty service node.
  • locating OTN device faults remotely no longer depends on the actual fault scenario on the live network, and the cumbersome process of analyzing the alarm data is avoided.
  • the fault information corresponding to the fault starting point is obtained directly from the cloud server through the client, which avoids checking a large amount of historical data item by item, speeds up fault location, and removes the need for staff to operate in the on-site environment, reducing the risk of affecting the operational stability of the existing network during fault location.
  • the application scenarios shown in Figure 5 mainly include an OTN device, a server, a cloud server, and a client.
  • a visual data model of the OTN board can be established on the server side: the board is abstracted into a visual view based on the architecture of its hardware layout, and the service flow direction and the monitoring point data along the service flow are displayed in that view.
  • the board-level hardware includes a client-side service board, a line-side service board, and a cross-board group.
  • the board hardware includes an OTN service board hardware device, an optical module, a field programmable gate array (FPGA), a framing chip and a clock module.
  • a system-level service processing model is abstracted.
  • the model takes each board as its basic unit.
  • the alarms at a given point in time are visualized; that is, the alarms affecting the service, such as the alarms of the optical input port, the optical output port, the dispatch receiving port, the dispatch sending port, the internal receiving port, the internal sending port and the backplane-related ports, are shown together with the bit error rate.
  • the relevant staff can quickly determine which node on the service link was the first to fail and in which time period alarms occurred, without having to sift through historical alarm data against the service topology.
  • a service flow processing model is abstracted for the service board. The service flow can be divided into an A-direction service flow and a B-direction service flow, where the A-direction refers to the service travelling from the optical port to the cross matrix.
  • the B-direction refers to the service travelling from the cross matrix to the optical port.
  • the service flows processed by the board are abstracted into different models. For example, a board may access a 10GE service at its optical port and send it into the cross system as an optical channel data unit (ODU2).
  • the processing model of the A-direction service flow in the board is: optical port (10GE-LAN) → GFP-F → ODU2 → cross system; the processing model of the B-direction service flow is: cross system → ODU2 → GFP-F → 10GE-LAN;
  • the 10GE-LAN → GFP-F mapping is implemented in the framing chip (framer) and the GFP-F → ODU2 mapping is implemented in the FPGA; the abstracted service flow model presents 10GE-related alarms at the optical port.
  • the framing chip presents Generic Framing Procedure (GFP) layer alarms.
  • the FPGA presents Optical Channel Data Unit (ODU) layer alarms.
  • the optical module part displays performance values such as input optical power and bias current, the clock part shows the normal range for the measured clock frequency, the protection part shows all alarm states that currently trigger switching, such as signal failure and signal degradation, and the AOSD part shows the alarm state for switching the laser.
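  • The 10GE example above can be sketched as a small data structure. This is only an illustrative sketch: the `Stage` type, the `first_alarming_stage` helper and the layer labels are assumptions introduced here, not names from the disclosure. It lists the stages in A-direction order, derives the B-direction by reversing them, and returns the first stage along a direction that presents an active alarm layer.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    component: str    # where the stage is implemented on the board
    alarm_layer: str  # which alarm layer is presented at this stage


# A-direction: optical port (10GE-LAN) -> GFP-F -> ODU2 -> cross system
A_DIRECTION = (
    Stage("optical port", "10GE"),
    Stage("framer", "GFP"),
    Stage("FPGA", "ODU"),
)
# B-direction runs the same stages from the cross system back to the optical port
B_DIRECTION = tuple(reversed(A_DIRECTION))


def first_alarming_stage(direction, alarming_layers) -> Optional[Stage]:
    """Return the first stage along the direction whose alarm layer is active."""
    for stage in direction:
        if stage.alarm_layer in alarming_layers:
            return stage
    return None


# Hypothetical situation: GFP and ODU layer alarms are active, 10GE is clean
upstream = first_alarming_stage(A_DIRECTION, {"GFP", "ODU"})
downstream = first_alarming_stage(B_DIRECTION, {"GFP", "ODU"})
```

  • Walking the same stage list in both directions keeps the A-direction and B-direction models consistent with each other, which is one reason to store the model as ordered data rather than as two separate hand-written lists.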
  • the data-to-cloud step uploads the service data to the cloud server, so that R&D or maintenance personnel can obtain the corresponding service data through the client and analyze the service fault.
  • This step may include data collection and data uploading. During data collection, the boards traversed by the faulty link to be analyzed need to be identified, the data collection device is started, and the service data is stored and uploaded in the agreed format. To meet the security requirements of the service data, the data can be encrypted before it is uploaded to the cloud server. The data collection, data encryption and upload functions can be enabled at the same time.
  • the service flow model may include the boards involved in the faulty service link and the alarms of the ports on those boards. By selecting the time period in which the fault occurred, the user can view how alarms propagate along the entire link and thereby determine the point at which the fault occurred.
  • the board-level service flow model is obtained by using the data of the board corresponding to the fault occurrence point.
  • the board-level service flow model can include information such as alarms, clock frequencies and the values of important monitoring node registers, and shows whether the values of the monitoring node registers are within the normal range.
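  • A minimal sketch of the range check described above, assuming that monitoring values and sample ranges are both keyed by node and parameter name. The `out_of_range` helper, the parameter names and all numeric values are hypothetical and chosen only for illustration.

```python
def out_of_range(monitoring_info, sample_ranges):
    """Collect every monitored value that falls outside its sample range.

    Returns {node: {parameter: (value, (low, high))}}.
    """
    findings = {}
    for node, params in monitoring_info.items():
        for param, value in params.items():
            bounds = sample_ranges.get(node, {}).get(param)
            if bounds is None:
                continue  # no sample range is known for this parameter
            low, high = bounds
            if not (low <= value <= high):
                findings.setdefault(node, {})[param] = (value, (low, high))
    return findings


# Hypothetical readings captured at the moment of the first alarm
monitoring = {
    "optical module": {"input_power_dbm": -31.5, "bias_current_ma": 35.0},
    "clock": {"frequency_mhz": 155.52},
}
# Hypothetical preset sample data ranges for the same nodes
ranges = {
    "optical module": {
        "input_power_dbm": (-14.0, 0.5),
        "bias_current_ma": (10.0, 80.0),
    },
    "clock": {"frequency_mhz": (155.51, 155.53)},
}
suspects = out_of_range(monitoring, ranges)
```

  • In this example only the input optical power lies outside its range, so a visual view could highlight the optical module as the node to inspect first.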
  • the embodiment further provides a computer readable storage medium storing computer executable instructions for performing any of the above methods.
  • the server device includes: a processor 510, a memory 520, a communication interface 530 and a bus 540.
  • the processor 510, the memory 520, and the communication interface 530 can complete communication with each other through the bus 540.
  • Communication interface 530 can be used for information transfer.
  • the processor 510 can call the logic instructions in the memory 520 to perform the corresponding fault information acquisition method in the above embodiment.
  • the memory 520 may include a storage program area and a storage data area, and the storage program area may store an operating system and an application required for at least one function.
  • the storage data area can store data and the like created according to the use of the server device.
  • the memory may include volatile memory, such as random access memory, and may also include non-volatile memory, for example at least one disk storage device, flash memory device, or other non-transitory solid state storage device.
  • when the logic instructions in the memory 520 described above are implemented in the form of software functional units and sold or used as separate products, they can be stored in a computer readable storage medium.
  • the technical solution of the present disclosure may be embodied in the form of a computer software product, which may be stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in this embodiment.
  • the storage medium may be a non-transitory storage medium or a transitory storage medium.
  • the non-transitory storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
  • FIG. 9 is a schematic diagram of a hardware structure of a client device according to the embodiment.
  • the client device includes: one or more processors 610 and a memory 620.
  • One processor 610 is taken as an example in FIG.
  • the client device may further include: an input device 630 and an output device 640.
  • the processor 610, the memory 620, the input device 630, and the output device 640 in the client device may be connected by a bus or other means, and the bus connection is taken as an example in FIG.
  • the input device 630 can receive input numeric or character information
  • the output device 640 can include a display device such as a display screen.
  • the memory 620 is a computer readable storage medium that can be used to store software programs, computer executable programs, and modules.
  • the processor 610 executes a plurality of functional applications and data processing by executing software programs, instructions, and modules stored in the memory 620 to implement corresponding fault information acquisition methods in the above embodiments.
  • the memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application required for at least one function; the storage data area may store data created according to usage of the client device, and the like.
  • the memory may include volatile memory such as random access memory (RAM), and may also include non-volatile memory such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • Memory 620 can be a non-transitory computer storage medium or a transitory computer storage medium.
  • the non-transitory computer storage medium is, for example, at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 620 can optionally include memory remotely located relative to processor 610, which can be connected to the client device over a network. Examples of the above networks may include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • Input device 630 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the client device.
  • the output device 640 can include a display device such as a display screen.
  • the client device of this embodiment may also include a communication device 650 for transmitting and/or receiving information over a communication network.
  • a person skilled in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing related hardware, and the program can be stored in a non-transitory computer readable storage medium.
  • the program, when executed, may include the flows of the method embodiments described above, wherein the non-transitory computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
  • the present disclosure provides a fault information acquisition method and device, which can directly obtain the fault information corresponding to the fault starting point when a service fault occurs. This avoids checking a large amount of historical data item by item, speeds up fault location, and removes the need for staff to operate in the on-site environment, reducing the risk of affecting the operational stability of the existing network during fault location.

Abstract

A method and a device for acquiring fault information. The method is applicable to a server, and comprises: when monitoring that an optical transport network (OTN) service issues an alarm, acquiring a first alarm code of a first alarm and determining a first OTN single board corresponding to the first alarm; acquiring the monitored information and sample data ranges of a service node, the service node being the service node of a preset service stream model of the first OTN single board; the monitored information comprising the data and performance parameters of the service node when the first alarm is being issued, the sample data ranges being the range of the data and the range of the performance parameters of the service node, set in advance; and uploading the first alarm code, the monitored information and the sample data ranges to a cloud server.

Description

Fault information acquisition method and device
Technical field
The present disclosure relates to the field of communications technologies, for example to a fault information acquisition method and device.
Background
With the rapid development of communication technology, the application network of Optical Transport Network (OTN) services is gradually expanding, and the service rate carried by a single OTN service board keeps increasing. When a service board fails, the impact is wide-ranging. It is therefore required that, when a service board fails, services can be quickly switched to the protection network, or that other means can be taken quickly to restore services in an unprotected environment; and that, after services are restored, the fault is located and a fault analysis, workarounds and solutions are produced. At present, there are two main methods for locating engineering faults:
First, when the fault environment can be preserved, the environment is accessed remotely or maintenance personnel go directly to the equipment room at the engineering site and use the locating interfaces provided by the service board to perform loopback tests, advancing through the loopback points step by step to analyze the fault point; read the monitoring registers reserved on the service board and compare them with normal service to check for anomalies; read the monitoring storage variables reserved on the board and check whether they match expectations; and analyze the alarm states of the board and its associated boards to infer the fault point. However, all of these operations are performed on the actual OTN network; relevant staff, such as maintenance or R&D personnel, operating directly on the live network may affect normally running services, and the scattered monitoring registers and software variables require the staff to have a high level of expertise;
Second, when the fault environment cannot be preserved, relevant environment information can be collected and the fault reproduced in the laboratory by simulating the engineering operations and running automated repeated operations. Experience with fault location shows that sporadic on-site faults may not be reproducible because of differences in environment and operating methods. Remote location of OTN device faults depends heavily on the actual fault environment of the live network, alarm data analysis is cumbersome, and fault location is not timely.
Therefore, the related technology has certain defects in locating faults of OTN devices.
Summary of the invention
The present disclosure provides a fault information acquisition method and device, which can overcome the defects in the related art that fault location for OTN devices tends to affect the operational stability of live network services and is not timely.
This embodiment provides a fault information acquisition method, applied to a server, which may include:
when it is detected that an optical transport network (OTN) service issues an alarm, acquiring a first alarm code of a first alarm and determining a first OTN board corresponding to the first alarm;
acquiring monitoring information and a sample data range of service nodes, the service nodes being the service nodes of a preset service flow model of the first OTN board; the monitoring information includes the data and performance parameters of the service nodes at the moment the first alarm occurs, and the sample data range is the preset data range and performance parameter range of the service node data; and
uploading the first alarm code, the monitoring information and the sample data range to a cloud server.
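As a rough illustration of the upload step, the sketch below bundles the three items into one JSON document before they are sent. The disclosure does not specify a data format, so the `build_upload_payload` helper, the field names and the example values are all assumptions made for illustration.

```python
import json


def build_upload_payload(first_alarm_code, monitoring_info, sample_ranges):
    """Bundle the first alarm code, monitoring information and sample data range.

    All field names are hypothetical; the source only states that these
    three items are uploaded to the cloud server.
    """
    payload = {
        "first_alarm_code": first_alarm_code,
        "monitoring_info": monitoring_info,
        "sample_data_range": sample_ranges,
    }
    # sort_keys gives a stable text representation for the same content
    return json.dumps(payload, sort_keys=True)


# Hypothetical example values
example = build_upload_payload(
    "2017-06-29T10:15:02|shelf1-slot3-port1|LOS",
    {"optical_port": {"optical_power_dbm": -31.5}},
    {"optical_port": {"optical_power_dbm": [-14.0, 0.5]}},
)
```

Keeping the alarm code, the readings and their ranges in one document means the client can render a node's value next to its expected range without a second lookup.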
Optionally, acquiring the first alarm code of the first alarm and determining the first OTN board corresponding to the first alarm, when it is detected that the optical transport network (OTN) service issues an alarm, includes:
when an alarm monitoring point in the service topology of the monitored OTN service issues an alarm, acquiring the alarm code of the alarm, the alarm code carrying the time information, location information and alarm name of the alarm;
determining, according to the alarm codes, the first alarm, whose alarm time is the earliest, and the first alarm code; and
determining the first OTN board according to the first alarm code.
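The three steps above can be sketched as follows. The `AlarmCode` type, the "board/port" location format and the alarm names are assumptions for illustration; the source only states that an alarm code carries time information, location information and an alarm name.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmCode:
    time: datetime  # when the alarm occurred
    location: str   # hypothetical "board/port" location string
    name: str       # alarm name


def find_first_alarm(alarm_codes):
    """Return the alarm code with the earliest alarm time."""
    if not alarm_codes:
        raise ValueError("no alarm codes to analyse")
    return min(alarm_codes, key=lambda code: code.time)


def board_of(alarm_code):
    """Extract the board from the location (assumed format 'board/port')."""
    return alarm_code.location.split("/", 1)[0]


# Hypothetical alarms reported by monitoring points along one service link
alarms = [
    AlarmCode(datetime(2017, 6, 29, 10, 15, 4), "line-board-2/port1", "ODU2-AIS"),
    AlarmCode(datetime(2017, 6, 29, 10, 15, 2), "client-board-1/port3", "LOS"),
    AlarmCode(datetime(2017, 6, 29, 10, 15, 3), "cross-board-1/port7", "GFP-LOF"),
]
first_alarm = find_first_alarm(alarms)
first_board = board_of(first_alarm)
```

Here the earliest alarm is the one on `client-board-1`, so that board would be treated as the fault starting point and the later alarms as likely consequences.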
Optionally, the service nodes include: service splitting nodes, service encapsulation nodes and hardware nodes.
Optionally, the performance parameters include: clock frequency, peripheral chip status, optical power, optical module bias voltage and bias current.
Optionally, the method further includes:
when it is detected that the bit error rate of the OTN service exceeds a preset value, acquiring a second OTN board corresponding to the bit error rate;
acquiring packet information of the service nodes of the preset service flow model of the second OTN board, the packet information including the number of received data packets and the number of sent data packets; and
uploading the bit error rate and the packet information to the cloud server.
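A minimal sketch of this optional branch is given below. It assumes, purely for illustration, that a healthy node forwards every packet it receives, so a difference between the two counters marks a node worth inspecting; the source does not state how the counters are evaluated. The `packet_info_if_errored` helper and all values are hypothetical.

```python
def packet_info_if_errored(bit_error_rate, preset_value, node_counters):
    """Build the data to upload when the bit error rate exceeds the preset value.

    node_counters is a list of (node_name, received, sent) tuples ordered
    along the service flow. Returns None when no upload is needed.
    """
    if bit_error_rate <= preset_value:
        return None
    return {
        "bit_error_rate": bit_error_rate,
        "packet_info": [
            {"node": name, "received": rx, "sent": tx, "mismatch": rx != tx}
            for name, rx, tx in node_counters
        ],
    }


# Hypothetical per-node packet counters on the second OTN board
counters = [("10GE-LAN", 1000, 1000), ("GFP-F", 1000, 990), ("ODU2", 990, 990)]
report = packet_info_if_errored(1e-6, 1e-9, counters)
quiet = packet_info_if_errored(1e-12, 1e-9, counters)
```

In this example the counters differ only at the GFP-F node, which is where a reader of the visual view would look first.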
Optionally, uploading the first alarm code, the monitoring information and the sample data range to the cloud server includes:
encrypting the first alarm code, the monitoring information and the sample data range according to a first preset encryption algorithm and then uploading them to the cloud server.
Optionally, uploading the bit error rate and the packet information to the cloud server includes:
encrypting the bit error rate and the packet information according to a second preset encryption algorithm and then uploading them to the cloud server.
This embodiment further provides a fault information acquisition method, applied to a client, which may include:
when the server detects that the optical transport network (OTN) service issues an alarm, acquiring, by the client from the cloud server, the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code; the monitoring information is the data and performance parameters, at the moment the first alarm occurs, of the service nodes of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range of the service node data; and
displaying the first alarm code, the monitoring information and the sample data range, together with the preset service flow model, through a visual view.
Optionally, displaying the first alarm code, the monitoring information and the sample data range, together with the preset service flow model, through the visual view includes:
decrypting the first alarm code, the monitoring information and the sample data range into plaintext according to the first preset encryption algorithm, and displaying them together with the preset service flow model through the visual view.
Optionally, the method includes:
when the server detects that the bit error rate of the OTN service exceeds a preset value, acquiring, by the client from the cloud server, the second OTN board corresponding to the bit error rate and the packet information of the service nodes of the preset service flow model of the second OTN board, the packet information including the number of received data packets and the number of sent data packets; and
displaying the bit error rate and the packet information through the visual view.
Optionally, displaying the bit error rate and the packet information through the visual view includes:
decrypting the bit error rate and the packet information into plaintext according to the second preset encryption algorithm, and displaying them through the visual view.
This embodiment further provides a fault information acquisition device, applied to a server, which may include:
an alarm monitoring module, configured to, when it is detected that the optical transport network (OTN) service issues an alarm, acquire the first alarm code of the first alarm and determine the first OTN board corresponding to the first alarm;
an information acquisition module, configured to acquire the monitoring information and the sample data range of the service nodes, the service nodes being the service nodes of the preset service flow model of the first OTN board; the monitoring information includes the data and performance parameters of the service nodes at the moment the first alarm occurs, and the sample data range is the preset data range and performance parameter range of the service node data; and
a data upload module, configured to upload the first alarm code, the monitoring information and the sample data range to the cloud server.
Optionally, the alarm monitoring module includes:
an alarm code acquisition submodule, configured to, when an alarm monitoring point in the service topology of the monitored OTN service issues an alarm, acquire the alarm code of the alarm, the alarm code carrying the time information, location information and alarm name of the alarm;
a first alarm determination submodule, configured to determine, according to the alarm codes, the first alarm, whose alarm time is the earliest, and the first alarm code of the first alarm; and
a board determination submodule, configured to determine, according to the first alarm code, the first OTN board corresponding to the first alarm.
Optionally, the service nodes include: service splitting nodes, service encapsulation nodes and hardware nodes.
Optionally, the performance parameters include: clock frequency, peripheral chip status, optical power, optical module bias voltage and bias current.
Optionally, the device further includes:
a bit error monitoring module, configured to, when it is detected that the bit error rate of the OTN service exceeds a preset value, acquire the second OTN board corresponding to the bit error rate;
a packet information acquisition module, configured to acquire the packet information of the service nodes of the preset service flow model of the second OTN board, the packet information including the number of received data packets and the number of sent data packets; and
a bit error information upload module, configured to upload the bit error rate and the packet information to the cloud server.
Optionally, the data upload module is configured to:
encrypt the first alarm code, the monitoring information and the sample data range according to the first preset encryption algorithm and then upload them to the cloud server.
This embodiment further provides a fault information acquisition device, applied to a client, which may include:
a data acquisition module, configured to, when the server detects that the optical transport network (OTN) service issues an alarm, acquire from the cloud server the first alarm code of the first alarm and the monitoring information and sample data range corresponding to the first alarm code; the monitoring information is the data and performance parameters, at the moment the first alarm occurs, of the service nodes of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range of the service node data; and
a view display module, configured to display the first alarm code, the monitoring information and the sample data range, together with the preset service flow model, through a visual view.
Optionally, the view display module is configured to:
decrypt the first alarm code, the monitoring information and the sample data range into plaintext according to the first preset encryption algorithm, and display them together with the preset service flow model through the visual view.
Optionally, the device further includes:
a bit error display module, configured to, when it is detected that the bit error rate of the OTN service exceeds a preset value, acquire from the cloud server the second OTN board corresponding to the bit error rate and the packet information of the service nodes of the preset service flow model of the second OTN board, the packet information including the number of received data packets and the number of sent data packets; and
display the bit error rate and the packet information through the visual view.
This embodiment further provides a computer readable storage medium storing computer executable instructions for performing any one of the above methods.
This embodiment further provides a server device, which includes one or more processors, a memory and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the corresponding fault information acquisition method described above.
This embodiment further provides a client device, which includes one or more processors, a memory and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the corresponding fault information acquisition method described above.
This embodiment further provides a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform any one of the above methods.
With the fault information acquisition method and apparatus provided by the present disclosure, when an OTN service is detected raising an alarm, the first OTN board, that is, the board whose alarm occurred earliest, is located according to the alarm codes, and the monitoring information of each service node of the first OTN board is uploaded to the cloud server together with the sample data range. The monitoring information and sample data range of that first OTN board are then obtained from the cloud server and matched against the preset service flow model, and the service node data and performance parameters in the monitoring information are displayed next to the corresponding sample data ranges in a visual model, so that maintenance personnel can analyze the fault data by comparing the monitoring information with the sample data range. Because the fault information corresponding to the fault origin is obtained directly through the server, there is no need to check large volumes of historical data item by item, which speeds up fault location. Because personnel do not need to operate in the on-site environment, the risk that fault location affects the stability of live network services is also reduced.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the fault information acquisition method provided by the first embodiment;
FIG. 2 is a schematic structural diagram of the fault information acquisition apparatus provided by the second embodiment;
FIG. 3 is a schematic flowchart of the fault information acquisition method provided by the third embodiment;
FIG. 4 is a schematic structural diagram of the fault information acquisition apparatus provided by the fourth embodiment;
FIG. 5 is a schematic diagram of the application scenario provided by the fifth embodiment;
FIG. 6 is a schematic diagram of the system-level model provided by the fifth embodiment;
FIG. 7 is a schematic diagram of the board-level model provided by the fifth embodiment;
FIG. 8 is a schematic diagram of the hardware structure of a server device provided by the fifth embodiment;
FIG. 9 is a schematic diagram of the hardware structure of a client device provided by the fifth embodiment.
Detailed Description
First Embodiment
Referring to FIG. 1, this embodiment provides a fault information acquisition method applied to a server, which may include steps 110 to 130.
In step 110, when an optical transport network (OTN) service is detected raising an alarm, the first alarm code of a first alarm is obtained and the first OTN board corresponding to the first alarm is determined.
Typically, the service topology of an OTN service contains multiple alarm monitoring points, usually placed at service nodes. After an alarm monitoring point raises an alarm, an alarm code is generated, and the corresponding OTN board can be determined from that alarm code.
In step 120, monitoring information and a sample data range of the service nodes are obtained, where the service nodes are those of the preset service flow model of the first OTN board. The monitoring information includes the data and performance parameters of the service nodes at the moment the first alarm occurred, and the sample data range is the preset data range and performance parameter range for the service node data.
Specifically, the monitoring information of the preset service nodes of the first OTN board is obtained. It includes the data transmitted by each service node at the moment the first alarm occurred, together with the performance parameters.
Optionally, the performance parameters include clock frequency, peripheral chip status, optical power, optical module bias voltage, bias current, and the like. The sample data range consists of the data range corresponding to the service node data in the monitoring information (the normal value range of that data) and the performance parameter range (the normal value range of the performance parameters).
Optionally, monitoring points are set at the service nodes of the sub-service flow model to obtain the monitoring information of those nodes.
Optionally, the service nodes include service splitting nodes, service encapsulation nodes, and hardware nodes.
In step 130, the first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server.
In this embodiment, the first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server so that personnel can obtain the fault information remotely. Maintenance or R&D personnel do not need to operate directly in the live network environment, which avoids affecting normal service operation, reduces the risk that fault location affects the stability of live network services, removes the need for tedious on-site operations, and speeds up fault location.
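The three items uploaded in step 130 can be pictured as a single record. The following is a minimal sketch only: the JSON layout, the field names, and the function `build_upload_record` are assumptions made for illustration, since the disclosure does not specify an upload format.

```python
import json


def build_upload_record(alarm_code, monitoring_info, sample_ranges):
    """Bundle the three items uploaded in step 130 into one JSON string.

    monitoring_info: {node: {parameter: value}} captured when the alarm occurred.
    sample_ranges:   {node: {parameter: [low, high]}} giving the normal ranges.
    Both layouts are hypothetical; the source does not define them.
    """
    # Every monitored parameter needs a normal range, otherwise the
    # later comparison against the sample data range is impossible.
    for node, params in monitoring_info.items():
        missing = set(params) - set(sample_ranges.get(node, {}))
        if missing:
            raise ValueError(f"{node}: no sample range for {sorted(missing)}")
    record = {
        "alarm_code": alarm_code,
        "monitoring_info": monitoring_info,
        "sample_data_range": sample_ranges,
    }
    return json.dumps(record, sort_keys=True)
```

Keeping the sample data range in the same record as the monitoring information means whoever reads the record later can compare values against their normal ranges without a second lookup.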
Optionally, step 110 includes the following steps:
First, when alarm monitoring points in the service topology of the monitored OTN service raise alarms, the alarm codes of those alarms are obtained.
Each alarm code carries the time information, location information, and alarm name of the alarm. The time information is used to determine the order in which the alarm monitoring points raised alarms, the location information is used to determine the OTN board corresponding to the alarm, and the alarm name helps maintenance personnel target their investigation of the alarm cause.
Second, the first alarm, that is, the alarm with the earliest alarm time, and its first alarm code are determined from the alarm codes.
The OTN board whose alarm occurred earliest is usually the origin of the fault, so the first OTN board, which corresponds to the first alarm code of the earliest alarm, is taken as the fault origin. Obtaining the fault origin directly avoids checking large volumes of historical data item by item and reduces the workload of the personnel involved. From the time information in each alarm code, the first alarm with the earliest alarm time can be determined and its first alarm code obtained, so that the information related to the first alarm can be derived from the first alarm code.
Third, the first OTN board corresponding to the first alarm is determined from the first alarm code.
Specifically, the first OTN board, whose alarm occurred earliest, is determined from the location information in the first alarm code.
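The three sub-steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `AlarmCode` class, its field layout, and the example location strings and alarm names are hypothetical, because the disclosure states only that an alarm code carries time information, location information, and an alarm name.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmCode:
    """Hypothetical alarm code with the three fields named in the text."""
    time: datetime   # when the alarm occurred
    location: str    # identifies the OTN board (format is an assumption)
    name: str        # alarm name, used to guide the investigation


def find_first_alarm(alarm_codes):
    """Second sub-step: return the alarm code with the earliest time."""
    if not alarm_codes:
        raise ValueError("no alarm codes collected")
    return min(alarm_codes, key=lambda code: code.time)


def locate_first_board(alarm_codes):
    """Third sub-step: return the first alarm code and its board location."""
    first = find_first_alarm(alarm_codes)
    return first, first.location
```

For example, given alarms raised at 10:00:05 on one board and 10:00:01 on another, `locate_first_board` returns the 10:00:01 alarm and its board, which the method treats as the fault origin.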
Optionally, uploading the first alarm code, the monitoring information, and the sample data range to the cloud server includes:
encrypting the first alarm code, the monitoring information, and the sample data range with a first preset encryption algorithm, and then uploading them to the cloud server.
To improve data security, the monitoring information and the sample data range need to be encrypted before being uploaded to the cloud server.
Optionally, the method further includes:
when the bit error rate of the OTN service is detected to exceed a preset value, obtaining the second OTN board corresponding to the bit error rate;
The bit error rate of every board on a link can be monitored, so when the bit error rate of the OTN service is detected to exceed the preset value, the OTN board corresponding to that bit error rate, namely the second OTN board, can be determined.
obtaining the packet information of the service nodes of the preset service flow model of the second OTN board, where the packet information includes the number of received packets and the number of sent packets; and
uploading the bit error rate and the packet information to the cloud server.
Normally, an alarm is raised only when the bit error rate exceeds a certain threshold, which is higher than the preset value above. When the bit error rate is not high enough to raise an alarm but does exceed the preset value, the service nodes of the preset service flow model of the second OTN board need to be checked for packet loss. If packet loss has occurred, the corresponding service node can be inspected, which prevents the packet loss from escalating into a system alarm.
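The packet-loss check described above can be sketched as follows. This is a sketch under assumptions: the disclosure says only that the packet information contains received and sent packet counts, so treating a node that sends fewer packets than it receives as a lossy node is an interpretation, and the function and node names are hypothetical.

```python
def exceeds_preset(bit_error_rate, preset_value):
    """True when the bit error rate is above the preset (pre-alarm) value."""
    return bit_error_rate > preset_value


def nodes_with_packet_loss(packet_info):
    """Return the service nodes that sent fewer packets than they received.

    packet_info maps a node name to a (received, sent) pair of counts.
    The "sent < received" criterion is an assumption, not from the source.
    """
    return [
        node
        for node, (received, sent) in packet_info.items()
        if sent < received
    ]
```

The intended use is to call `nodes_with_packet_loss` only after `exceeds_preset` returns true for the second OTN board, so that healthy boards are not inspected.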
Optionally, uploading the bit error rate and the packet information to the cloud server includes:
encrypting the bit error rate and the packet information with a second preset encryption algorithm, and then uploading them to the cloud server.
To improve data security, the bit error rate and packet information corresponding to the second OTN board are encrypted with the second preset encryption algorithm before being uploaded to the cloud server.
In this embodiment, when an OTN service is detected raising an alarm, the first OTN board, whose alarm occurred earliest, is located according to the alarm codes, and the monitoring information of each service node of the first OTN board is uploaded to the cloud server together with the sample data range, so that maintenance personnel can analyze the fault data by comparing the monitoring information with the sample data range. Because the server directly obtains the fault information corresponding to the fault origin, there is no need to check large volumes of historical data item by item, which speeds up fault location. Because personnel do not need to operate in the on-site environment, the risk that fault location affects the stability of live network services is also reduced.
Second Embodiment
Referring to FIG. 2, this embodiment provides a fault information acquisition apparatus applied to a server, which may include:
an alarm monitoring module 201, configured to, when an optical transport network (OTN) service is detected raising an alarm, obtain the first alarm code of a first alarm and determine the first OTN board corresponding to the first alarm.
Typically, the service topology of an OTN service contains multiple alarm monitoring points, which can be placed at service nodes. After an alarm monitoring point raises an alarm, an alarm code is generated, and the corresponding OTN board can be determined from that alarm code.
an information acquisition module 202, configured to obtain monitoring information and a sample data range of the service nodes, where the service nodes are those of the preset service flow model of the first OTN board. The monitoring information includes the data and performance parameters of the service nodes at the moment the first alarm occurred, and the sample data range is the preset data range and performance parameter range for the service node data.
Specifically, the monitoring information of the preset service nodes of the first OTN board is obtained. It includes the data transmitted by each service node at the moment the first alarm occurred, together with the performance parameters. Optionally, the performance parameters include clock frequency, peripheral chip status, input and output optical power, optical module bias voltage, bias current, temperature, and the like. The sample data range consists of the data range corresponding to the service node data in the monitoring information (the normal value range) and the performance parameter range.
Optionally, monitoring points are set at the service nodes of the sub-service flow model to obtain the monitoring information of those nodes.
a data upload module 203, configured to upload the first alarm code, the monitoring information, and the sample data range to the cloud server.
The first alarm code, the monitoring information, and the sample data range are uploaded to the cloud server so that personnel can obtain the fault information remotely. Maintenance or R&D personnel do not need to operate directly in the live network environment, which avoids affecting normal service operation, reduces the risk that fault location affects the stability of live network services, and speeds up fault location.
Optionally, the alarm monitoring module 201 includes:
an alarm code acquisition submodule, configured to obtain the alarm codes of the alarms when alarm monitoring points in the service topology of the monitored OTN service raise alarms, where each alarm code carries the time information, location information, and alarm name of the alarm;
a first alarm determination submodule, configured to determine, from the alarm codes, the first alarm with the earliest alarm time and the first alarm code of the first alarm; and
a board determination submodule, configured to determine, from the first alarm code, the first OTN board corresponding to the first alarm.
Optionally, the service nodes include service splitting nodes, service encapsulation nodes, and hardware nodes.
Optionally, the performance parameters include clock frequency, peripheral chip status, optical power, optical module bias voltage, and bias current.
Optionally, the apparatus further includes:
an error monitoring module, configured to obtain the second OTN board corresponding to the bit error rate when the bit error rate of the OTN service is detected to exceed a preset value;
a packet information acquisition module, configured to obtain the packet information of the service nodes of the preset service flow model of the second OTN board, where the packet information includes the number of received packets and the number of sent packets; and
an error information upload module, configured to upload the bit error rate and the packet information to the cloud server.
Optionally, the data upload module 203 is configured to:
encrypt the first alarm code, the monitoring information, and the sample data range with a first preset encryption algorithm, and then upload them to the cloud server.
Optionally, the error information upload module is configured to:
encrypt the bit error rate and the packet information with a second preset encryption algorithm, and then upload them to the cloud server.
In this embodiment, when an OTN service is detected raising an alarm, the first OTN board, whose alarm occurred earliest, is located according to the alarm codes, and the monitoring information of each service node of the first OTN board is uploaded to the cloud server together with the sample data range, so that maintenance personnel can remotely analyze the fault data by comparing the monitoring information with the sample data range. Because the server directly obtains the fault information corresponding to the fault origin, there is no need to check large volumes of historical data item by item, which speeds up fault location. Because personnel do not need to operate in the on-site environment, the risk that fault location affects the stability of live network services is also reduced.
Third Embodiment
Referring to FIG. 3, this embodiment provides a fault information acquisition method applied to a client, which may include steps 310 and 320.
In step 310, when the server detects that an optical transport network (OTN) service raises an alarm, the client obtains, from the cloud server that has already received the first alarm code of the first alarm, the first alarm code together with the monitoring information and sample data range corresponding to it. The monitoring information is the data and performance parameters, at the moment the first alarm occurred, of the service nodes of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range for the service node data.
The first alarm is the alarm with the earliest alarm time. When the server detects that the OTN service raises an alarm, the first alarm code of that earliest alarm, the monitoring information, and the sample data range are obtained from the cloud server.
For example, when the server detects that the OTN service raises an alarm, the server uploads the service link data at the time of the fault to the cloud server and notifies the client. The client then obtains from the cloud server the first alarm code of the earliest alarm, the monitoring information, and the sample data range, and uses this data for the subsequent visual display. Typically, the service topology of an OTN service contains multiple alarm monitoring points, which can be placed at service nodes, and an alarm code is generated after an alarm monitoring point raises an alarm. The OTN board whose alarm occurred earliest is usually the origin of the fault, so the first OTN board, which corresponds to the first alarm code of the earliest alarm, is taken as the fault origin. Obtaining the fault origin directly avoids checking large volumes of historical data item by item, reduces the workload of the personnel involved, and speeds up fault location.
In step 320, the first alarm code, the monitoring information, and the sample data range are displayed together with the preset service flow model in a visual view.
Specifically, the data obtained from the cloud server, such as the first alarm code, the monitoring information, and the sample data range, is matched against the preset service flow model, and the service node data and performance parameters in the monitoring information are displayed next to the corresponding sample data ranges in the visual model. Maintenance personnel can then compare the data in the visual model and quickly single out the faulty service node. This allows OTN equipment faults to be located remotely without depending on the actual live network fault scenario, avoids the tedious work of analyzing alarm data, and speeds up fault location.
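The comparison that the visual model supports can be sketched in code. This is a minimal sketch assuming the monitoring information and the sample data range are both nested dictionaries keyed by service node and parameter name; that layout, the parameter names, and the function names are illustrative assumptions, and the actual display would be graphical rather than tabular.

```python
def compare_with_ranges(monitoring_info, sample_ranges):
    """Pair each monitored value with its normal range.

    Returns rows of (node, parameter, value, low, high, in_range), which
    is the information a view would show side by side for each node.
    """
    rows = []
    for node, params in monitoring_info.items():
        for name, value in params.items():
            low, high = sample_ranges[node][name]
            rows.append((node, name, value, low, high, low <= value <= high))
    return rows


def abnormal_nodes(rows):
    """Return, in first-seen order, the nodes with any out-of-range value."""
    seen = []
    for node, *_details, in_range in rows:
        if not in_range and node not in seen:
            seen.append(node)
    return seen
```

A view built on these rows could highlight every node returned by `abnormal_nodes`, which corresponds to the step where maintenance personnel single out the faulty service node.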
Optionally, step 320 may include:
decrypting the first alarm code, the monitoring information, and the sample data range into plaintext according to the first preset encryption algorithm, and displaying them together with the preset service flow model in a visual view.
The data obtained from the cloud server is ciphertext encrypted with the first preset encryption algorithm, so the ciphertext must be decrypted into plaintext before it is displayed on the service flow model.
Optionally, the method further includes:
when the server detects that the bit error rate of the OTN service exceeds a preset value, obtaining, by the client from the cloud server, the second OTN board corresponding to the bit error rate and the packet information of the service nodes of the preset service flow model of the second OTN board, where the packet information includes the number of received packets and the number of sent packets; and
displaying the bit error rate and the packet information in a visual view.
Normally, an alarm is raised only when the bit error rate exceeds a certain threshold, which is higher than the preset value above. When the bit error rate is not high enough to raise an alarm but does exceed the preset value, the service nodes of the preset service flow model of the second OTN board need to be checked for packet loss. If packet loss has occurred, the corresponding service node can be inspected, which prevents the packet loss from escalating into a system alarm.
When the second OTN board corresponding to the bit error rate and the packet information are obtained from the cloud server, the bit error rate and the packet information are displayed visually on the service flow model, so that maintenance personnel can pinpoint the service node where packet loss occurred.
Optionally, displaying the bit error rate and the packet information in a visual view includes:
decrypting the bit error rate and the packet information into plaintext according to the second preset encryption algorithm, and displaying them in a visual view.
The bit error rate and packet information obtained from the cloud server are ciphertext encrypted with the second preset encryption algorithm, so the ciphertext must be decrypted into plaintext before it is displayed on the service flow model.
In this embodiment, when an OTN service is detected raising an alarm, the monitoring information and sample data range of the first OTN board, whose alarm occurred earliest, are obtained from the cloud server and matched against the preset service flow model, and the service node data and performance parameters in the monitoring information are displayed next to the corresponding sample data ranges in the visual model. Maintenance personnel can then compare the data in the visual model and quickly single out the faulty service node. This allows OTN equipment faults to be located remotely without depending on the actual live network fault scenario and avoids the tedious work of analyzing alarm data. Because the client obtains the fault information corresponding to the fault origin directly from the cloud server, there is no need to check large volumes of historical data item by item, which speeds up fault location. Because personnel do not need to operate in the on-site environment, the risk that fault location affects the stability of live network services is also reduced.
Fourth Embodiment
Referring to FIG. 4, this embodiment provides a fault information acquisition apparatus applied to a client, which may include:
a data acquisition module 401, configured to, when the server detects that an optical transport network (OTN) service raises an alarm, obtain, from the cloud server that has already received the first alarm code of the first alarm, the first alarm code together with the monitoring information and sample data range corresponding to it. The monitoring information is the data and performance parameters, at the moment the first alarm occurred, of the service nodes of the preset service flow model of the first OTN board corresponding to the first alarm, and the sample data range is the preset data range and performance parameter range for the service node data.
The first alarm is the alarm with the earliest alarm time. When the OTN service is detected raising an alarm, the first alarm code of that earliest alarm, the monitoring information, and the sample data range are obtained from the cloud server.
Typically, the service topology of an OTN service contains multiple alarm monitoring points, which can be placed at service nodes, and an alarm code is generated after an alarm monitoring point raises an alarm. The OTN board whose alarm occurred earliest is usually the origin of the fault, so the first OTN board, which corresponds to the first alarm code of the earliest alarm, is taken as the fault origin. Obtaining the fault origin directly avoids checking large volumes of historical data item by item and reduces the workload of the personnel involved.
a view display module 402, configured to display the first alarm code, the monitoring information, and the sample data range together with the preset service flow model in a visual view.
Specifically, the data obtained from the cloud server is matched against the preset service flow model, and the service node data and performance parameters in the monitoring information are displayed next to the corresponding sample data ranges in the visual model. Maintenance personnel can then compare the data in the visual model and quickly single out the faulty service node. This allows OTN equipment faults to be located remotely without depending on the actual live network fault scenario, avoids the tedious work of analyzing alarm data, and speeds up fault location.
Optionally, the view display module 402 is configured to:
decrypt the first alarm code, the monitoring information, and the sample data range into plaintext according to the first preset encryption algorithm, and display them together with the preset service flow model in a visual view.
Optionally, the apparatus further includes:
an error display module, configured to: when the server detects that the bit error rate of the OTN service exceeds a preset value, obtain from the cloud server the second OTN board corresponding to the bit error rate and the packet information of the service nodes of the preset service flow model of the second OTN board, where the packet information includes the number of received packets and the number of sent packets; and
display the bit error rate and the packet information in a visual view.
Optionally, the error display module is configured to:
decrypt the bit error rate and the packet information into plaintext according to the second preset encryption algorithm, and display them in a visual view.
In this embodiment, when an OTN service is detected raising an alarm, the monitoring information and sample data range of the first OTN board, whose alarm occurred earliest, are obtained from the cloud server and matched against the preset service flow model, and the service node data and performance parameters in the monitoring information are displayed next to the corresponding sample data ranges in the visual model. Maintenance personnel can then compare the data in the visual model and quickly single out the faulty service node. This allows OTN equipment faults to be located remotely without depending on the actual live network fault scenario and avoids the tedious process of analyzing alarm data. Because the client obtains the fault information corresponding to the fault origin directly from the cloud server, there is no need to check large volumes of historical data item by item, which speeds up fault location. Because personnel do not need to operate in the on-site environment, the risk that fault location affects the stability of live network services is also reduced.
Fifth embodiment
Referring to FIG. 5, this embodiment describes the fault information acquisition method using an application scenario. The application scenario shown in FIG. 5 mainly includes an OTN device, a server, a cloud server, and a client.
A visual data model of the OTN board can be established on the server: the hardware layout of the OTN board is abstracted into a visual view, which shows the service flow direction and the monitoring point data on the service flow. Referring to FIG. 6 and FIG. 7, board-level hardware includes client-side service boards, line-side service boards, cross-connect board groups, and the like; in-board hardware includes the hardware devices of the OTN service board, the optical module, a field programmable gate array (FPGA), a framer chip, a clock module, and the like.
Optionally, a system-level service processing model is abstracted. The model takes each board as its basic unit and visualizes the alarms at a given point in time; that is, it presents the alarms of the optical input port, optical output port, scheduling receive port, scheduling transmit port, internal receive port, internal transmit port, and backplane-related ports, together with performance parameters that affect the service, such as the bit error rate.
Based on the service topology of the engineering site and the service processing model described above, the relevant staff can quickly determine which node on the service link raised an alarm first, during which time periods alarms were present, and so on, without having to sift through historical alarms according to the on-site topology.
Optionally, a model of how a service board processes service flows is abstracted. Taking a single board as the reference, service flows are divided into A-direction and B-direction flows: the A direction is from the optical port to the cross matrix, and the B direction is from the cross matrix to the optical port. According to the actual service mapping and configuration, the service flows processed by the board are abstracted into different models. For example, a 10 Gbps client-side board accesses a 10GE service, and the service then flows into the cross system under the optical channel data unit (ODU2). For this service, the A-direction service flow model in the board is: optical port (10GE-LAN) → GFP-F → ODU2 → cross system, and the B-direction service flow model is: cross system → ODU2 → GFP-F → 10GE-LAN;
The 10GE-LAN → GFP-F mapping is implemented in the framer chip, and the GFP-F → ODU2 processing is implemented in the FPGA. The abstracted service flow model presents 10GE-related alarms at the optical port.
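The 10GE example can be written down as a small data structure. The following is only an illustrative sketch: the names `Stage`, `A_DIRECTION`, `b_direction`, and `render` are hypothetical and do not come from the disclosure, and labelling the last stage's component as "cross matrix" is an assumption. It shows the A-direction chain as an ordered tuple of stages, with the B direction obtained by traversing the same chain in reverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str       # signal layer at this point of the service flow
    component: str  # hardware block that presents alarms for this layer


# A direction (optical port -> cross matrix) for the 10GE example.
# Component labels follow the description; "cross matrix" is an assumed label.
A_DIRECTION = (
    Stage("10GE-LAN", "optical port"),
    Stage("GFP-F", "framer"),
    Stage("ODU2", "FPGA"),
    Stage("cross system", "cross matrix"),
)


def b_direction(a_flow):
    """Return the B-direction flow: the same stages from cross matrix to optical port."""
    return tuple(reversed(a_flow))


def render(flow):
    """Format a flow as a readable chain of layer names."""
    return " -> ".join(stage.name for stage in flow)


if __name__ == "__main__":
    print("A:", render(A_DIRECTION))
    print("B:", render(b_direction(A_DIRECTION)))
```

Keeping the component next to each layer lets a viewer attach an alarm to the block that reports it, for instance GFP-layer alarms to the framer.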
Referring to FIG. 7, the framer chip presents alarms of the Generic Framing Procedure (GFP) layer, and the FPGA presents alarms of the optical channel data unit (ODU) layer. The optical module part shows performance values such as input and output optical power and bias current; the clock part shows the normal range of the measured clock frequency; the protection part shows all alarm states that currently trigger switching, such as signal fail and signal degrade; and the Aspect-Oriented Software Development (AOSD) part shows the alarm state for switching the laser on and off. Different abstract models are designed for different service board functions and different services, for visual display.
In this embodiment, service data is uploaded to the cloud server through cloud synchronization, so that R&D or maintenance personnel can obtain the corresponding service data through the client and analyze service faults. This stage may include data collection and data upload. During data collection, the boards traversed by the faulty link to be analyzed are identified, the data collection device is started, and the service data is stored and uploaded in an agreed format. To protect the service data, it may be encrypted before being uploaded to the cloud server. The data collection, data encryption, and upload functions can be enabled at the same time.
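The order of operations in this stage (store in an agreed format, encrypt, upload) can be sketched as follows. This is a minimal sketch under stated assumptions: the disclosure names neither the agreed format nor the encryption algorithm, so JSON is assumed as the format, and the cipher and the transport are passed in as caller-supplied functions. The names `package` and `collect_and_upload` are hypothetical.

```python
import json


def package(record):
    """Serialize one collected record to bytes (JSON is an assumed 'agreed format')."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")


def collect_and_upload(records, encrypt, upload):
    """Serialize, encrypt, and upload each record; return how many were sent.

    encrypt: callable taking bytes and returning bytes (the preset algorithm,
             which the disclosure does not specify).
    upload:  callable taking bytes and sending them to the cloud server.
    """
    count = 0
    for record in records:
        upload(encrypt(package(record)))
        count += 1
    return count
```

Passing the cipher as a parameter keeps the sketch neutral about which preset encryption algorithm a deployment chooses; the client would apply the matching decryption before display.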
After obtaining the service data, R&D personnel can derive the system-level OTN service flow model from the service topology. The system-level service flow model may include the boards involved in the faulty service link and the alarms of each port on those boards. By selecting the time period in which the fault occurred, they can see how alarms propagated along the entire link and thereby determine the point where the fault occurred. The data of the board corresponding to that fault point is then used to obtain the board-level service flow model, which may include information such as alarms inside the board, clock frequencies, and the values of important monitoring node registers, so that it can be determined whether the register values are within their normal ranges.
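The two checks described here, finding the earliest alarm in a selected time window and comparing monitored values against their normal ranges, can be sketched as below. The record layout and the names `Alarm`, `first_alarm`, and `out_of_range` are assumptions made for illustration; the disclosure does not define a concrete data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alarm:
    board: str           # board on the faulty service link
    port: str            # port that raised the alarm
    name: str            # alarm name
    raised_at: datetime  # time the alarm occurred


def first_alarm(alarms, start, end):
    """Return the earliest alarm raised within [start, end], or None if there is none."""
    window = [a for a in alarms if start <= a.raised_at <= end]
    return min(window, key=lambda a: a.raised_at) if window else None


def out_of_range(readings, sample_ranges):
    """Return the sorted names of readings that fall outside their (low, high) range.

    Readings with no configured range are ignored.
    """
    bad = []
    for name, value in readings.items():
        if name in sample_ranges:
            low, high = sample_ranges[name]
            if not (low <= value <= high):
                bad.append(name)
    return sorted(bad)
```

The board named by `first_alarm` would be the candidate fault point, and `out_of_range` would then be applied to that board's clock frequencies and register values.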
This embodiment further provides a computer-readable storage medium storing computer-executable instructions for performing any one of the above methods.
FIG. 8 is a schematic diagram of the hardware structure of a server device provided by this embodiment. As shown in FIG. 8, the server device includes a processor 510 and a memory 520, and may further include a communications interface 530 and a bus 540.
The processor 510, the memory 520, and the communications interface 530 can communicate with one another through the bus 540. The communications interface 530 can be used for information transmission. The processor 510 can call logic instructions in the memory 520 to perform the corresponding fault information acquisition method in the above embodiments.
The memory 520 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the server device, and the like. In addition, the memory may include volatile memory such as random access memory, and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
In addition, when the logic instructions in the memory 520 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. The technical solution of the present disclosure may be embodied in the form of a computer software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in this embodiment.
The storage medium may be a non-transitory storage medium or a transitory storage medium. The non-transitory storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
FIG. 9 is a schematic diagram of the hardware structure of a client device provided by this embodiment. As shown in FIG. 9, the client device includes one or more processors 610 and a memory 620. One processor 610 is taken as an example in FIG. 9.
The client device may further include an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 in the client device may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 9.
The input device 630 can receive input numeric or character information, and the output device 640 may include a display device such as a display screen.
The memory 620, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules. The processor 610 runs the software programs, instructions, and modules stored in the memory 620 to execute various functional applications and data processing, thereby implementing the corresponding fault information acquisition method in the above embodiments.
The memory 620 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the client device, and the like. In addition, the memory may include volatile memory such as random access memory (RAM), and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
The memory 620 may be a non-transitory computer storage medium or a transitory computer storage medium. The non-transitory computer storage medium is, for example, at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 620 may optionally include memory located remotely from the processor 610, and such remote memory may be connected to the client device through a network. Examples of such networks include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 630 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the client device. The output device 640 may include a display device such as a display screen.
The client device of this embodiment may further include a communication device 650 for transmitting and/or receiving information over a communication network.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a non-transitory computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The non-transitory computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Industrial applicability
The present disclosure provides a fault information acquisition method and device. When a service fault occurs, the fault information corresponding to the fault starting point can be obtained directly, which avoids checking a large amount of historical data item by item, speeds up fault location, and removes the need for staff to operate in the on-site environment, thereby reducing the risk of affecting the stability of live network services during fault location.

Claims (21)

  1. A fault information acquisition method, applied to a server, comprising:
    when an alarm is detected on an optical transport network (OTN) service, obtaining a first alarm code of a first alarm and determining a first OTN board corresponding to the first alarm;
    obtaining monitoring information and a sample data range of service nodes, the service nodes being service nodes of a preset service flow model of the first OTN board; the monitoring information comprising data and performance parameters of the service nodes at the moment the first alarm occurred, and the sample data range being a preset data range and performance parameter range for the service node data; and
    uploading the first alarm code, the monitoring information, and the sample data range to a cloud server.
  2. The method according to claim 1, wherein the obtaining of the first alarm code of the first alarm and the determining of the first OTN board corresponding to the first alarm, when an alarm is detected on the OTN service, comprises:
    when an alarm monitoring point in the service topology of the OTN service raises an alarm, obtaining an alarm code of the alarm, the alarm code carrying time information, location information, and an alarm name of the alarm;
    determining, according to the alarm code, the first alarm, which is the alarm that occurred earliest, and the first alarm code; and
    determining the first OTN board according to the first alarm code.
  3. The method according to claim 1, wherein the service nodes comprise: a service splitting node, a service encapsulation node, and a hardware node.
  4. The method according to claim 1, wherein the performance parameters comprise: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current.
  5. The method according to claim 1, further comprising:
    when the bit error rate of the OTN service is detected to exceed a preset value, obtaining a second OTN board corresponding to the bit error rate;
    obtaining data packet information of service nodes of a preset service flow model of the second OTN board, the data packet information comprising: the number of received data packets and the number of sent data packets; and
    uploading the bit error rate and the data packet information to the cloud server.
  6. The method according to claim 1, wherein the uploading of the first alarm code, the monitoring information, and the sample data range to the cloud server comprises:
    encrypting the first alarm code, the monitoring information, and the sample data range according to a first preset encryption algorithm, and then uploading them to the cloud server.
  7. The method according to claim 5, wherein the uploading of the bit error rate and the data packet information to the cloud server comprises:
    encrypting the bit error rate and the data packet information according to a second preset encryption algorithm, and then uploading them to the cloud server.
  8. A fault information acquisition method, applied to a client, comprising:
    when a server detects an alarm on an optical transport network (OTN) service, obtaining, by the client, from a cloud server that has obtained a first alarm code of a first alarm, the first alarm code of the first alarm, monitoring information corresponding to the first alarm code, and a sample data range; the monitoring information being data and performance parameters, at the moment the first alarm occurred, of service nodes of a preset service flow model of a first OTN board corresponding to the first alarm, and the sample data range being a preset data range and performance parameter range for the service node data; and
    displaying the first alarm code, the monitoring information, and the sample data range together with the preset service flow model in a visual view.
  9. The method according to claim 8, wherein the displaying of the first alarm code, the monitoring information, and the sample data range together with the preset service flow model in a visual view comprises:
    decrypting the first alarm code, the monitoring information, and the sample data range into plaintext according to a first preset encryption algorithm, and displaying the plaintext together with the preset service flow model in a visual view.
  10. The method according to claim 8, further comprising:
    when the server detects that the bit error rate of the OTN service exceeds a preset value, obtaining, by the client, from the cloud server, a second OTN board corresponding to the bit error rate and data packet information of service nodes of a preset service flow model of the second OTN board, the data packet information comprising: the number of received data packets and the number of sent data packets; and
    displaying the bit error rate and the data packet information in a visual view.
  11. The method according to claim 10, wherein the displaying of the bit error rate and the data packet information in a visual view comprises:
    decrypting the bit error rate and the data packet information into plaintext according to a second preset encryption algorithm, and displaying the plaintext in a visual view.
  12. A fault information acquisition device, applied to a server, comprising:
    an alarm monitoring module, configured to, when an alarm is detected on an optical transport network (OTN) service, obtain a first alarm code of a first alarm and determine a first OTN board corresponding to the first alarm;
    an information obtaining module, configured to obtain monitoring information and a sample data range of service nodes, the service nodes being service nodes of a preset service flow model of the first OTN board; the monitoring information comprising data and performance parameters of the service nodes at the moment the first alarm occurred, and the sample data range being a preset data range and performance parameter range for the service node data; and
    a data upload module, configured to upload the first alarm code, the monitoring information, and the sample data range to a cloud server.
  13. The device according to claim 12, wherein the alarm monitoring module comprises:
    an alarm code obtaining submodule, configured to, when an alarm monitoring point in the service topology of the OTN service raises an alarm, obtain an alarm code of the alarm, the alarm code carrying time information, location information, and an alarm name of the alarm;
    a first alarm determining submodule, configured to determine, according to the alarm code, the first alarm, which is the alarm that occurred earliest, and the first alarm code of the first alarm; and
    a board determining submodule, configured to determine, according to the first alarm code, the first OTN board corresponding to the first alarm.
  14. The device according to claim 12, wherein the service nodes comprise: a service splitting node, a service encapsulation node, and a hardware node.
  15. The device according to claim 12, wherein the performance parameters comprise: clock frequency, peripheral chip state, optical power, optical module bias voltage, and bias current.
  16. The device according to claim 12, further comprising:
    a bit error monitoring module, configured to, when the bit error rate of the OTN service is detected to exceed a preset value, obtain a second OTN board corresponding to the bit error rate;
    a data packet information obtaining module, configured to obtain data packet information of service nodes of a preset service flow model of the second OTN board, the data packet information comprising: the number of received data packets and the number of sent data packets; and
    a bit error information upload module, configured to upload the bit error rate and the data packet information to the cloud server.
  17. The device according to claim 12, wherein the data upload module is configured to:
    encrypt the first alarm code, the monitoring information, and the sample data range according to a first preset encryption algorithm, and then upload them to the cloud server.
  18. A fault information acquisition device, applied to a client, comprising:
    a data obtaining module, configured to, when a server detects an alarm on an optical transport network (OTN) service, obtain, from a cloud server that has obtained a first alarm code of a first alarm, the first alarm code of the first alarm, monitoring information corresponding to the first alarm code, and a sample data range; the monitoring information being data and performance parameters, at the moment the first alarm occurred, of service nodes of a preset service flow model of a first OTN board corresponding to the first alarm, and the sample data range being a preset data range and performance parameter range for the service node data; and
    a view display module, configured to display the first alarm code, the monitoring information, and the sample data range together with the preset service flow model in a visual view.
  19. The device according to claim 18, wherein the view display module is configured to:
    decrypt the first alarm code, the monitoring information, and the sample data range into plaintext according to a first preset encryption algorithm, and display the plaintext together with the preset service flow model in a visual view.
  20. The device according to claim 18, further comprising:
    a bit error display module, configured to, when the server detects that the bit error rate of the OTN service exceeds a preset value, obtain, from the cloud server, a second OTN board corresponding to the bit error rate and data packet information of service nodes of a preset service flow model of the second OTN board, the data packet information comprising: the number of received data packets and the number of sent data packets; and
    display the bit error rate and the data packet information in a visual view.
  21. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1-7 and 8-11.
PCT/CN2017/090871 2016-06-29 2017-06-29 Method and device for acquiring fault information WO2018001326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610506815.6 2016-06-29
CN201610506815.6A CN107547127A (en) 2016-06-29 2016-06-29 A kind of failure information obtaining method and device

Publications (1)

Publication Number Publication Date
WO2018001326A1 true WO2018001326A1 (en) 2018-01-04

Family

ID=60785824

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/090871 WO2018001326A1 (en) 2016-06-29 2017-06-29 Method and device for acquiring fault information

Country Status (2)

Country Link
CN (1) CN107547127A (en)
WO (1) WO2018001326A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112054860A (en) * 2020-09-15 2020-12-08 上海华兴数字科技有限公司 Radio frequency fault recurrence method and system
CN112217691A (en) * 2020-02-19 2021-01-12 杜义平 Network diagnosis processing method and device based on cloud platform
CN112350854A (en) * 2020-10-22 2021-02-09 中国建设银行股份有限公司 Flow fault positioning method, device, equipment and storage medium
CN113132128A (en) * 2019-12-30 2021-07-16 北京华为数字技术有限公司 Prompt information processing method, device and storage medium
CN114268562A (en) * 2021-11-01 2022-04-01 贵州电网有限责任公司 Transmission link detection device, system and method for chip relay protection
CN115396281A (en) * 2021-05-07 2022-11-25 中国移动通信集团设计院有限公司 Alarm visualization method, device, equipment and computer readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113904718B (en) * 2021-12-09 2022-04-01 深圳市飞速创新技术股份有限公司 Optical module detection method, terminal equipment and computer readable storage medium
CN115276779B (en) * 2022-06-23 2023-07-04 中国联合网络通信集团有限公司 Optical transport network circuit information acquisition method, device, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1905590A (en) * 2006-08-16 2007-01-31 华为技术有限公司 Single chip information acquiring method
US20140169783A1 (en) * 2012-12-17 2014-06-19 Steven Arvo Surek Fault localization using tandem connection monitors in optical transport network
CN103973496A (en) * 2014-05-21 2014-08-06 华为技术有限公司 Fault diagnosis method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132128A (en) * 2019-12-30 2021-07-16 北京华为数字技术有限公司 Prompt information processing method, device and storage medium
CN113132128B (en) * 2019-12-30 2022-07-19 北京华为数字技术有限公司 Prompt information processing method, device and storage medium
CN112217691A (en) * 2020-02-19 2021-01-12 杜义平 Network diagnosis processing method and device based on cloud platform
CN112054860A (en) * 2020-09-15 2020-12-08 上海华兴数字科技有限公司 Radio frequency fault recurrence method and system
CN112350854A (en) * 2020-10-22 2021-02-09 中国建设银行股份有限公司 Flow fault positioning method, device, equipment and storage medium
CN112350854B (en) * 2020-10-22 2022-11-18 中国建设银行股份有限公司 Flow fault positioning method, device, equipment and storage medium
CN115396281A (en) * 2021-05-07 2022-11-25 中国移动通信集团设计院有限公司 Alarm visualization method, device, equipment and computer readable storage medium
CN115396281B (en) * 2021-05-07 2023-10-27 中国移动通信集团设计院有限公司 Alarm visualization method, device, equipment and computer readable storage medium
CN114268562A (en) * 2021-11-01 2022-04-01 贵州电网有限责任公司 Transmission link detection device, system and method for chip relay protection

Also Published As

Publication number Publication date
CN107547127A (en) 2018-01-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17819323; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase. Ref country code: DE

122 EP: PCT application non-entry in European phase. Ref document number: 17819323; Country of ref document: EP; Kind code of ref document: A1