CN115037643A - Method and device for acquiring and labeling network health state data - Google Patents

Info

Publication number
CN115037643A
Authority
CN
China
Prior art keywords
network
data
network performance
performance data
health
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210299221.8A
Other languages
Chinese (zh)
Other versions
CN115037643B (en)
Inventor
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fiberhome Telecommunication Technologies Co Ltd
Wuhan Fiberhome Technical Services Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Wuhan Fiberhome Technical Services Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd, Wuhan Fiberhome Technical Services Co Ltd
Priority to CN202210299221.8A
Publication of CN115037643A
Application granted
Publication of CN115037643B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04J MULTIPLEX COMMUNICATION
    • H04J3/00 Time-division multiplex systems
    • H04J3/16 Time-division multiplex systems in which the time allocation to individual channels within a transmission cycle is variable, e.g. to accommodate varying complexity of signals, to vary number of channels transmitted
    • H04J3/1605 Fixed allocated frame structures
    • H04J3/1652 Optical Transport Network [OTN]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Maintenance And Management Of Digital Transmission (AREA)

Abstract

The invention relates to the technical field of communication and provides a method and a device for acquiring and labeling network health state data. The method comprises: collecting network performance data and alarm information, performing a primary cleaning of the network performance data according to the alarm information to remove the network performance data collected in a network fault state, and forming a first data set; selecting one or more data items in the first data set as one or more pieces of characteristic information, and labeling each piece of network performance data in the first data set with a network health state according to the characteristic information; and performing a secondary cleaning of the first data set according to the network health state to remove the network performance data collected during network accidental events, forming a second data set for machine learning. Through the primary cleaning and the secondary cleaning, the invention removes the network performance data collected during network faults or network accidental events, and thereby improves the effect of machine learning.

Description

Method and device for acquiring and labeling network health state data
Technical Field
The invention relates to the technical field of communication, in particular to a method and a device for acquiring and labeling network health state data.
Background
With the development of artificial intelligence technology, all industries have begun to use machine learning to solve problems in their own fields, improving efficiency and reducing cost. In the network field, with the continuing expansion of network scale and the rapid development of 5G networks, the traditional manual operation and maintenance mode can no longer meet the need to locate problems quickly and resolve hidden faults, and it is the industry consensus that artificial intelligence should be introduced to improve operation and maintenance efficiency. In data-based machine learning, the quality of the data directly determines the effect of machine learning, so data processing is often the most important part of the machine learning process. Meanwhile, supervised learning requires a large number of labeled real samples for training, and at present most data labeling is performed manually.
In AI machine learning related to network performance and network health, for example network performance prediction, network degradation trend analysis, and tracing of the fault points behind network performance degradation, the collected data must be labeled for supervised learning. Manual labeling, however, has several problems. First, there is no unified standard for the network health state: most labeling processes define and divide the network health state according to operation and maintenance experience, various performance data, alarms and the like, so the bases for choosing labels vary, and differences in the experience of the labeling personnel affect the effect of machine learning differently. Second, the samples are usually collected from a network management system and, depending on the network state, may contain dirty data, such as data collected during network faults or network accidental events; if such data is fed directly into machine learning, the effect of machine learning may suffer.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
Disclosure of Invention
The invention aims to solve the problem that the dirty data of the collected samples in the case of network faults or network accidental events influences the effect of machine learning.
The invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for acquiring and labeling network health state data, including:
collecting network performance data and alarm information, performing a primary cleaning of the network performance data according to the alarm information to remove the network performance data collected in a network fault state, and forming a first data set;
selecting one or more data items in the first data set as one or more pieces of characteristic information, and marking the health state of the network for each piece of network performance data in the first data set according to the characteristic information;
and cleaning the first data set for the second time according to the network health state, removing the network performance data when the network accidental events exist, and forming a second data set for machine learning.
Preferably, the acquiring network performance data and alarm information specifically includes:
according to a network performance index required to be evaluated by machine learning, acquiring one or more first data items related to the network performance index in a first network layer where the network performance index is located at intervals of a preset period, acquiring one or more second data items related to the network performance index in one or more second network layers related to the first network layer, and acquiring alarm information related to the network performance index in the first network layer and the second network layer;
and combining all the first data items and all the second data items acquired in each preset period to serve as corresponding network performance data in the preset period.
Preferably, performing the primary cleaning of the network performance data according to the alarm information specifically includes:
and judging whether an alarm exists in the corresponding preset period according to the alarm information, and if so, removing the network performance data corresponding to the preset period.
Preferably, the marking a network health status for each piece of network performance data in the first data set according to the feature information specifically includes:
presetting a range interval for each piece of characteristic information, and obtaining the network health state of the network performance data according to one or more characteristic health states corresponding to the network performance data when the values of the characteristic information are at different positions of the range interval.
Preferably, when there is no hierarchical relationship among the selected pieces of feature information, the obtaining of the network health state of the network performance data according to one or more feature health states corresponding to the network performance data specifically includes:
setting different proportions for each characteristic information, calculating the total proportion of each characteristic health state in a plurality of characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total proportion as the network health state corresponding to the network performance data.
Preferably, when a hierarchical relationship exists between the selected pieces of feature information, the obtaining of the network health status of the network performance data according to one or more feature health statuses corresponding to the network performance data specifically includes:
and determining a corresponding network health state range according to the characteristic health state corresponding to each characteristic information, and selecting a common network health state from the multiple network health state ranges corresponding to the multiple characteristic information as the network health state corresponding to the network performance data.
Preferably, the performing the secondary cleaning on the first data set according to the network health status specifically includes:
finding first network performance data in the first data set, wherein the network health state of the first network performance data is different from that of the immediately preceding piece of network performance data;
taking each piece of first network performance data in the first data set as target data, and if another piece of first network performance data exists within a first preset number of pieces of network performance data following the target data, removing the target data from the first data set.
Preferably, before said targeting each of the first network performance data in the first data set, the method further comprises:
and dividing the first data set into one or more intervals by taking the number of preset data pieces as the interval size, and removing all the network performance data in the interval from the first data set if the number of the first network performance data in the interval exceeds a second preset number.
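The secondary cleaning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `find_transitions`, `secondary_clean`, `lookahead`, `interval_size` and `max_transitions` are hypothetical, the labeled records are represented only by their network health states in time order, and state changes are computed once on the first data set before anything is removed.

```python
from typing import List, Optional, Sequence


def find_transitions(states: Sequence[str]) -> List[int]:
    """Indices of records whose state differs from the immediately preceding record."""
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


def secondary_clean(
    states: Sequence[str],
    lookahead: int,                        # hypothetical name for the "first preset number"
    interval_size: Optional[int] = None,   # hypothetical name for the "preset number of data pieces"
    max_transitions: Optional[int] = None  # hypothetical name for the "second preset number"
) -> List[int]:
    """Return the indices of the records kept after secondary cleaning."""
    transitions = set(find_transitions(states))
    removed = set()

    # Optional interval pre-pass: drop a whole interval when it contains
    # more state changes than the allowed maximum.
    if interval_size is not None and max_transitions is not None:
        for start in range(0, len(states), interval_size):
            block = range(start, min(start + interval_size, len(states)))
            if sum(i in transitions for i in block) > max_transitions:
                removed.update(block)

    # Per-target check: drop a state-change record when another state change
    # follows within the next `lookahead` records.
    for t in sorted(transitions - removed):
        following = range(t + 1, min(t + 1 + lookahead, len(states)))
        if any(j in transitions for j in following):
            removed.add(t)

    return [i for i in range(len(states)) if i not in removed]
```

With a short sub-health blip inside a run of health records, only the blip is removed, while a lasting change to deterioration is kept because no further state change follows it within the look-ahead window.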
Preferably, the network health status includes: health, sub-health and deterioration.
In a second aspect, the present invention further provides an apparatus for implementing the network health status data collection annotation described in the first aspect, where the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of network health status data collection tagging of the first aspect.
In a third aspect, the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, which are executed by one or more processors, for performing the method for acquiring and labeling network health status data according to the first aspect.
The invention removes the network performance data when the network fault or the network incident occurs through the primary cleaning and the secondary cleaning, and improves the machine learning effect. In addition, in the preferred embodiment of the invention, the quality of data required by machine learning is ensured by formulating a standard flow of marking and cleaning, so that the effect of machine learning is ensured.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a flowchart of a method for acquiring and labeling network health status data according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for acquiring and labeling network health status data according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for acquiring and labeling network health status data according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for acquiring and labeling network health status data according to an embodiment of the present invention;
FIG. 5 is a network hierarchy architecture diagram of an optical transport network OTN provided by an embodiment of the present invention;
FIG. 6 is a flowchart of a method for acquiring and labeling network health status data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a plurality of pieces of collected network performance data provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of collected alarm information provided by an embodiment of the present invention;
FIG. 9 is a schematic illustration of a formed second data set provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a network health status data collecting and labeling apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
an embodiment 1 of the present invention provides a method for acquiring and labeling network health state data, as shown in fig. 1, specifically including:
in step 201, network performance data and alarm information are collected, a primary cleaning of the network performance data is performed according to the alarm information, the network performance data collected in a network fault state is removed, and a first data set is formed.
Collecting the plurality of pieces of network performance data specifically means selecting, according to the network performance indexes that machine learning needs to focus on, one or more data items related to those indexes in the network layer where the indexes are located, and combining them as the network performance data. One piece of network performance data is generally collected per preset period, where the preset period is determined by a person skilled in the art through empirical analysis. The data items contained in the collected network performance data may be: Bit Interleaved Parity (BIP) errors, Cyclic Redundancy Check (CRC) errors, packet loss, packet loss rate, Bit Error Rate (BER), Background Block Errors (BBE), Background Block Error Ratio (BBER), Errored Seconds (ES), Severely Errored Seconds (SES), Unavailable Time (UAT), alarm information, etc. The selected network performance data may differ according to the network performance indexes that machine learning needs to focus on. The alarm information consists of the related alarms generated in the network layer where those network performance indexes are located. A network fault specifically refers to a fault that cannot be repaired or that may persist for a long time, such as a broken optical fiber or a network loop. When a network fault occurs, the network performance data may contain errors, and using such data for machine learning may lead to incorrect results. The alarm information is therefore used to determine whether a network fault has occurred, and the network performance data collected during the network fault is removed.
In step 202, one or more data items in the first data set are selected as one or more feature information, and a network health status is labeled for each piece of network performance data in the first data set according to the feature information.
The characteristic information is selected according to the network performance indexes that machine learning needs to focus on. A piece of network performance data comprises one or more data items. The network health state generally includes two states, namely a health state and a degradation state. However, to make machine learning more accurate and achieve better network performance prediction, more than two network health states may be defined. For example, three network health states may be defined: health, sub-health and deterioration. When machine learning determines that the network is in the sub-health state, corresponding measures can be taken to prevent the network health state from deteriorating further, thereby ensuring continuous and stable operation of the network.
In step 203, the first data set is cleaned for the second time according to the network health status, and the network performance data when the network incident exists is removed, so as to form a second data set for machine learning.
The network accidental event specifically refers to short-term events such as network routing change, network circuit switching, network maintenance operation, power switching and the like, and when the network accidental event occurs, the network performance data may have errors, so that the network performance data when the network accidental event occurs is removed through secondary cleaning.
The embodiment provides a method for acquiring and labeling network health state data, which removes network performance data during network faults or network accidental events through primary cleaning and secondary cleaning, and improves the effect of machine learning.
In actual network applications, machine learning is mainly used for network performance prediction, network degradation trend analysis, tracing the source of network performance degradation, and the like. In currently deployed network technologies, a network is usually multi-layered. If tracing relies only on the related data items in the network layer where the network performance index is located, the source of network performance degradation can be traced only to that network layer and cannot be traced further to other related network layers. Therefore, as shown in fig. 2, the following preferred embodiment exists, in which collecting the network performance data and the alarm information specifically includes:
in step 301, according to a network performance index required to be evaluated by machine learning, at preset intervals, one or more first data items related to the network performance index are collected in a first network layer where the network performance index is located, and one or more second data items related to the network performance index are collected in one or more second network layers related to the first network layer.
In step 302, alarm information related to the network performance index is collected in the first network layer and the second network layer.
In step 303, all the first data items and all the second data items collected in each preset period are combined to be used as corresponding network performance data in the preset period.
Wherein one or more second data items may be collected in a second network layer. The network performance indexes to be evaluated by machine learning may be one or more, the second network layer is another network layer in the network different from the first network layer, and the related one or more second network layers are specifically one or more network layers through which data reaching the first network layer passes before reaching the first network layer in the data transmission process, and these network layers may affect the network condition of the first network layer.
The second network layer is typically a lower network layer of the first network layer, and due to the nature of network data transmission from the lower network layer to an upper network layer, the data will also pass through the second network layer before reaching the first network layer.
The preset period is obtained by analyzing according to the network transmission condition and the machine learning requirement by the technical personnel in the field.
Even when data items of the second network layer are also collected, only data items of the first network layer may be selected as the feature information in step 202.
This preferred embodiment collects not only the data items of the first network layer where the network performance index is located but also the data items of the second network layers related to the first network layer, so that when network performance degradation is traced, the source can be traced not only within the first network layer but also down to the lower network layers, and the true cause of the degradation can be determined.
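Combining the first and second data items of one preset period into a single piece of network performance data can be sketched as follows. This is only an illustration: the function name `combine_period`, the layer labels and the key format `layer.item` are assumptions made for the example and are not specified by the embodiment.

```python
from typing import Dict


def combine_period(
    first_layer_items: Dict[str, float],
    second_layer_items: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Merge the data items collected in one preset period into one record.

    Keys are prefixed with the layer label so that, when degradation is traced,
    each value can still be attributed to the layer it came from.
    """
    record = {f"first.{name}": value for name, value in first_layer_items.items()}
    for layer, items in second_layer_items.items():
        for name, value in items.items():
            record[f"{layer}.{name}"] = value
    return record


# Hypothetical sample values for one period.
record = combine_period({"ES": 0, "SES": 0}, {"lower": {"BER": 1e-9}})
```

Keeping the layer label in each key is one possible way to preserve the information needed to trace degradation back to a lower network layer.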
The alarm information is usually reported by an alarm module in a network management system. Because of the multi-layer architecture of a network, when one network layer has a fault, other network layers may also fail or be unable to operate normally. That is, under the current design of the alarm module, alarm information appearing in one network layer may indicate that other network layers associated with it also have a fault. If the primary cleaning is performed only according to the alarm information of the network layer where the network performance index concerned by machine learning is located, only the network performance data for part of the network faults may be removed; to remove the network performance data for all network faults, the cleaning must also take into account the alarm information of the related network layers. In this preferred embodiment, therefore, alarm information is collected in both the first network layer and the second network layers related to it, so that the alarm information appearing in all network layers related to the first network layer participates in the primary cleaning, thereby removing all network performance data collected while a network fault associated with the network performance index exists.
On the basis of the above preferred embodiment, there is also a preferred implementation manner that the performing of the cleaning of the network performance data according to the alarm information includes:
and judging whether an alarm exists in the corresponding preset period according to the alarm information, and if so, removing the network performance data corresponding to the preset period.
The alarm information typically includes one or more alarm items, each of which is used to determine whether there is an alarm of a corresponding category.
In the embodiment, the network performance data when the alarm exists is removed through the time correlation between the alarm information and the network performance data, and the embodiment not only carries out data cleaning through the alarm information in the first network layer but also through the alarm information in the second network layer related to the first network layer, so that when the related alarm occurs in any related network layer, the acquired abnormal network performance data can not be used for machine learning, and the machine learning effect is ensured.
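The time correlation used by the primary cleaning can be sketched as follows. The sketch assumes, purely for illustration, that each record carries the start time of its preset period under a hypothetical key `period_start`, and that the alarms from the first and second network layers have already been gathered into one list of timestamps.

```python
from typing import Dict, List


def primary_clean(
    records: List[Dict[str, float]],
    alarm_times: List[float],
    period: float,
) -> List[Dict[str, float]]:
    """Keep only the records whose preset period contains no alarm.

    A record covers the half-open window [period_start, period_start + period).
    """
    def has_alarm(start: float) -> bool:
        return any(start <= t < start + period for t in alarm_times)

    return [r for r in records if not has_alarm(r["period_start"])]


# Hypothetical 15-minute periods (900 s) with one alarm raised at t = 950 s.
records = [{"period_start": 0.0}, {"period_start": 900.0}, {"period_start": 1800.0}]
first_data_set = primary_clean(records, alarm_times=[950.0], period=900.0)
```

Here the record for the second period is dropped because the alarm falls inside its window, and the remaining records form the first data set.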
In the foregoing embodiment, one or more data items may be selected as feature information. When a single data item is selected as feature information for labeling the network performance data, the network health state of the corresponding network performance data is determined by that single piece of feature information. When multiple data items are selected, how to obtain the network health state of the corresponding network performance data is a problem to be solved, for which the following preferred embodiment is provided, specifically including:
presetting a range interval for each piece of characteristic information, determining a characteristic health state for each piece of characteristic information according to where its value falls within the range interval, and obtaining the network health state of the network performance data according to the one or more characteristic health states so determined.
Wherein the characteristic health status may include some or all of the network health status.
In this embodiment, a range interval is preset for each piece of feature information to obtain the characteristic health state corresponding to that feature information, and the network health state is then derived from these characteristic health states, so that a unique network health state can be obtained for the network performance data even when a plurality of data items are selected as feature information.
The network health state of the network performance data is obtained according to the one or more characteristic health states corresponding to the network performance data. A commonly used approach is to take the characteristic health state that occurs most often among the plurality of characteristic health states corresponding to the network performance data as the network health state.
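Mapping a feature value to a characteristic health state by its position in a preset range interval can be sketched as follows. The sketch assumes that a larger value is worse and that the interval is split by two ascending bounds into three sub-intervals; the function name `feature_health_state` and the bound values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# The three network health states named in the description, from best to worst.
STATES = ("health", "sub-health", "deterioration")


def feature_health_state(value: float, bounds: Sequence[float]) -> str:
    """Return the state of the sub-interval in which `value` falls.

    `bounds` holds ascending cut points; a value equal to a cut point is
    assigned to the worse of the two neighbouring states.
    """
    if len(bounds) != len(STATES) - 1 or list(bounds) != sorted(bounds):
        raise ValueError("expected two ascending bounds")
    return STATES[bisect_right(bounds, value)]


# Hypothetical bounds for a packet loss rate.
state = feature_health_state(0.005, bounds=(0.001, 0.01))
```

The bounds themselves would be preset per feature; the values shown are placeholders, not figures from the embodiment.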
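The commonly used approach just described amounts to a majority vote over the characteristic health states, which can be sketched as follows. The function name `majority_state` is hypothetical, and the tie-breaking rule (the state encountered first wins) is an assumption of the sketch, since the description does not address ties.

```python
from collections import Counter
from typing import Sequence


def majority_state(feature_states: Sequence[str]) -> str:
    """Return the characteristic health state that occurs most often."""
    # Counter.most_common keeps first-encountered order among equal counts.
    return Counter(feature_states).most_common(1)[0][0]


label = majority_state(["health", "sub-health", "health"])
```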
When every data item influences the network to the same degree, obtaining the network health state in the above manner is feasible. In practice, however, the selected data items do not all influence the network health state equally. For example, suppose the two selected data items are the network average delay and the packet loss rate: the average delay reflects the network transmission speed, whereas the packet loss rate reflects the stability of network transmission. For those skilled in the art, when machine learning is used for network degradation trend analysis, transmission stability has a larger influence on the analysis than transmission speed. If the network health state is obtained in the above manner, machine learning may conclude that the network is degrading whenever the average delay is too large, which may not be what those skilled in the art want. There is therefore a further preferred implementation: when there is no hierarchical relationship among the selected pieces of feature information, deriving the network health state of the network performance data according to one or more feature health states corresponding to the network performance data specifically includes:
setting different proportions for each characteristic information, calculating the total proportion of each characteristic health state in a plurality of characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total proportion as the network health state corresponding to the network performance data.
Wherein, the proportion is set by analysis of the influence degree of each characteristic information on the network health state by the technicians in the field.
The optimal implementation mode reflects the influence degree on the network health state by presetting different proportions for different characteristic information, so that the obtained network health state is more accurate, and the machine learning effect is optimized. And the preferred implementation mode can realize automatic marking of the network performance data by presetting the ratio sum without the participation of manpower, thereby standardizing the network health state evaluation flow without depending on the experience of marking personnel.
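The proportion-based labeling can be sketched as follows, using the average delay and packet loss rate example from the text. The function name `weighted_state`, the feature keys and the proportions 0.3 and 0.7 are illustrative assumptions; the embodiment leaves the actual proportions to those skilled in the art.

```python
from collections import defaultdict
from typing import Dict


def weighted_state(feature_states: Dict[str, str], proportions: Dict[str, float]) -> str:
    """Sum the proportions per characteristic health state and return the heaviest state."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, state in feature_states.items():
        totals[state] += proportions[feature]
    return max(totals, key=totals.get)


# Hypothetical proportions: transmission stability outweighs transmission speed.
label = weighted_state(
    {"avg_delay": "deterioration", "packet_loss_rate": "health"},
    {"avg_delay": 0.3, "packet_loss_rate": 0.7},
)
```

With these proportions a large average delay alone does not label the record as deterioration, which matches the motivation given for the proportions.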
A hierarchical relationship may exist among the selected pieces of characteristic information. For example, when the two selected data items are ES and SES, SES is at a higher level than ES; that is, whenever an SES occurs, an ES necessarily exists. In this case, the network health state may be inaccurate if it is calculated by proportion or by taking the characteristic health state that occurs most often. Therefore, in a preferred implementation, when a hierarchical relationship exists among the selected pieces of characteristic information, deriving the network health state of the network performance data according to one or more characteristic health states corresponding to the network performance data specifically includes:
determining a range of network health states according to the characteristic health state corresponding to each piece of characteristic information, and selecting, from the ranges corresponding to the plurality of pieces of characteristic information, a network health state common to all the ranges as the network health state corresponding to the network performance data.
In this preferred implementation, each piece of characteristic information corresponds to one range, and the network health state common to all the ranges is taken as the network health state corresponding to the network performance data, so that an accurate network health state is obtained and the machine learning effect is improved. In addition, by presetting the range of network health states corresponding to each characteristic health state of each piece of characteristic information, the network performance data can be labeled automatically without human involvement, which standardizes the network health state judgment process and removes its dependence on the experience of labeling personnel.
On the basis of the above preferred implementations, if some of the selected pieces of characteristic information have a hierarchical relationship and others do not, the pieces having the hierarchical relationship can be regarded as a single piece of characteristic information, for which a single proportion is preset. The network health state corresponding to the hierarchically related pieces is obtained through the range-based preferred implementation and is converted into a corresponding single characteristic health state. This single characteristic health state and its single proportion then take part in the subsequent calculation, and the final network health state calculated by proportion is taken as the network health state corresponding to the network performance data.
In the above embodiment, the second cleaning is performed on the first data set according to the network health state to remove the network performance data collected when a network sporadic event exists. This raises the problem of how to distinguish network performance data collected during a network sporadic event from the other network performance data. A common approach is for those skilled in the art to formulate, from experience, the flag information that appears when each kind of network sporadic event occurs, and to determine whether a network sporadic event exists by checking whether that flag information is present in the network performance data. However, a single network sporadic event may need to be judged comprehensively from several data items, and network sporadic events are of many types, so this approach is complicated to implement and performs poorly. To solve this problem, as shown in fig. 3, there is the following preferred embodiment, which specifically includes:
In step 401, first network performance data is found in the first data set, the first network performance data being network performance data whose network health state differs from that of the preceding piece of network performance data.
In step 402, each piece of first network performance data in the first data set is taken as target data, and if other first network performance data exists among the first preset number of pieces of network performance data following the target data, the target data is removed from the first data set.
The first preset number is determined by those skilled in the art according to the collection period of the network performance data and empirical analysis.
Because the network health state is necessarily affected when a network sporadic event occurs, and the second cleaning only aims to remove the network performance data collected during network sporadic events regardless of their type, this preferred embodiment uses the network health state as the basis for identifying network sporadic events. When the network health state changes, it is determined whether the state recovers or changes to yet another state shortly afterwards. If it does, the change is considered not to be sustained, that is, to be caused by a network sporadic event, and the data is removed. This ensures that the network performance data in the second data set is valid for machine learning and safeguards the machine learning effect.
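Steps 401 and 402 can be sketched as follows. This is an illustrative reading in Python, not the patent's code: the function names are hypothetical, the first record is assumed never to be marked because it has no predecessor, and marks are assumed to be computed once on the unmodified sequence so that removals do not cascade.

```python
def mark_state_changes(states):
    """Step 401: flag each record whose state differs from the previous record's."""
    marked = []
    old = None
    for i, state in enumerate(states):
        # Assumption: the first record has no predecessor and is never flagged.
        marked.append(i > 0 and state != old)
        old = state
    return marked

def remove_sporadic(states, first_preset_number):
    """Step 402: return the indices of the records that are kept.

    A flagged record is dropped when another flagged record appears within
    the next `first_preset_number` records, i.e. the change did not last.
    """
    marked = mark_state_changes(states)
    kept = []
    for i in range(len(states)):
        window = marked[i + 1 : i + 1 + first_preset_number]
        if marked[i] and any(window):
            continue  # short-lived change: treated as a sporadic event
        kept.append(i)
    return kept

# Hypothetical label sequence: a one-sample blip at index 2, then a lasting change.
states = [0, 0, 1, 0, 0, 0, 0, -1, -1, -1, -1]
print(remove_sporadic(states, 3))  # [0, 1, 3, 4, 5, 6, 7, 8, 9, 10]
```

In this example only the record at index 2 is removed: its state reverts at the very next record, whereas the change at index 7 persists.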
In practice, when network sporadic events occur frequently within a certain time period, the network operation in that period is considered insufficiently stable, and so is the network performance data collected in it. Using such unstable data for machine learning may impair the learning effect. To further improve the machine learning effect, in combination with the above preferred embodiment and as shown in fig. 4, the following associated step is added to give a relatively complete logical flow:
In step 400, the first data set is divided into one or more intervals, each containing the preset number of data pieces, and if the number of pieces of first network performance data in an interval exceeds a second preset number, all the network performance data in that interval is removed from the first data set.
The preset number of data pieces is determined by those skilled in the art from the total amount of network performance data required by machine learning and the accuracy requirement of machine learning. The second preset number is determined by those skilled in the art from the preset number of data pieces and the accuracy requirement of machine learning. When the number of pieces of network performance data in a divided interval does not reach the second preset number, the judgment is made according to the ratio of the first network performance data to all the network performance data in the interval: when this ratio exceeds a preset ratio, all the network performance data in the interval is removed. The preset ratio is determined by those skilled in the art from the preset number of data pieces and the accuracy requirement of machine learning, and is usually the ratio of the second preset number to the preset number of data pieces.
In this preferred implementation, the first data set is divided into intervals and the stability of each interval is evaluated. If network sporadic events are considered to occur frequently in an interval, that is, the network health state changes frequently, the network performance data of the whole interval is removed, so that the network performance data used for machine learning has a certain stability and the machine learning effect is improved.
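The interval-based stability check of step 400 can be sketched as follows. This is a minimal Python illustration under assumptions: the function and parameter names are hypothetical, `marked` is a list of booleans flagging the first network performance data, the ratio rule is applied exactly as worded (only when an interval holds fewer records than the second preset number), and the preset ratio defaults to the second preset number divided by the preset number of data pieces.

```python
def drop_unstable_intervals(marked, interval_size, second_preset_number,
                            preset_ratio=None):
    """Return the indices of records that survive the interval stability check."""
    if preset_ratio is None:
        # Usual default per the text: second preset number / preset number of pieces.
        preset_ratio = second_preset_number / interval_size
    kept = []
    for start in range(0, len(marked), interval_size):
        chunk = marked[start:start + interval_size]
        changes = sum(chunk)  # number of flagged records in this interval
        if len(chunk) < second_preset_number:
            # Short interval: judge by the share of flagged records instead.
            unstable = changes / len(chunk) > preset_ratio
        else:
            unstable = changes > second_preset_number
        if not unstable:
            kept.extend(range(start, start + len(chunk)))
    return kept

# Hypothetical flags: an unstable interval, a stable one, and a short unstable tail.
marked = ([True] * 6 + [False] * 2
          + [False, True, False, False, True, False, False, False]
          + [True] * 3)
print(drop_unstable_intervals(marked, interval_size=8, second_preset_number=5))
```

Here the first interval has 6 flagged records (more than 5) and the 3-record tail is entirely flagged, so only indices 8 to 15 remain.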
In the preferred embodiments and implementations of this embodiment, the criteria and methods for cleaning and labeling are specified in detail, so that by combining them, data can be collected, cleaned and labeled automatically without human involvement, ensuring that the machine learning effect is not affected by processes such as manual labeling.
In the present embodiment, the expressions like "first", "second" and "third" have no special limited meaning, and are used for description only for the convenience of describing different individuals among a class of objects, and should not be interpreted as having a special limited meaning in order or otherwise.
Example 2:
Based on the method described in embodiment 1, this embodiment describes the implementation process in a specific application scenario, using the technical terms of that scenario.
In this embodiment, machine learning is used to learn, train on, predict and evaluate the network health state of the ODU (Optical Channel Data Unit) layer of an OTN (Optical Transport Network), and the network performance index of concern for this machine learning is bit errors.
In this embodiment, to facilitate execution by a computer program, health is defined, by enumeration or otherwise, as 0, sub-health as 1, and degradation as -1.
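One possible way to express this numeric encoding is an integer enumeration. The sketch below is only an illustration; the class and member names are assumptions, while the values 0, 1 and -1 are those given in this embodiment.

```python
from enum import IntEnum

class HealthState(IntEnum):
    """Numeric encoding of the three network health states."""
    HEALTH = 0
    SUB_HEALTH = 1
    DEGRADATION = -1

# A stored numeric label can be turned back into a readable name.
label = HealthState(-1)
print(label.name, int(label))  # DEGRADATION -1
```

Because `IntEnum` members compare equal to plain integers, labels stored as 0, 1 or -1 remain directly usable as numeric training targets.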
Fig. 5 is a network layer architecture diagram of an OTN, which includes an OTS layer (Optical Transmission Section layer), an OMS layer (Optical Multiplex Section layer), an OCH layer (Optical Channel layer), and a client layer. The OCH layer in turn includes the OCH (Optical Channel), the OTU (Optical Channel Transport Unit) layer, the ODU (Optical Channel Data Unit) layer, and the OPU (Optical Channel Payload Unit) layer. Data is transmitted from the lower layers to the upper layers; that is, before reaching the ODU layer, the data may also pass through the OTS layer, the OMS layer and the OCH layer.
Suppose that, following analysis by those skilled in the art, the preset period for collecting network performance data is 15 minutes, the first preset number is 3, the second preset number is 5, and the preset number of data pieces is 8. The specific steps of generating the data set required for this machine learning are then as shown in fig. 6 and include:
In step 501, data items related to bit errors of the ODU layer are collected from the network management system every 15 minutes in four network layers, namely the ODU layer, the OTS layer, the OMS layer and the OCH layer; the collected data items are merged into network performance data and stored; and the alarm items in the four network layers are collected and merged as alarm information. The data may be stored in a file, a cache or a database. The stored pieces of network performance data are shown in fig. 7: collection starts at t_0 and ends at t_m, and each piece of network performance data includes n data items. The stored pieces of alarm information from t_0 to t_m are shown in fig. 8, and each piece of alarm information includes one or more alarm items. Alarm item n in fig. 8 and data item n in fig. 7 do not mean that the number of alarm items equals the number of data items; they only indicate that the network performance data may include a plurality of data items and that the alarm information may include a plurality of alarm items.
In step 502, whether an alarm exists in each preset period is judged according to the alarm information; if an alarm exists, the corresponding network performance data is deleted, and the set formed by the remaining stored network performance data is the first data set. An alarm item is usually 0 when there is no corresponding alarm and a non-zero value when there is one. Thus, as shown in fig. 8, an alarm exists at t_1, so the network performance data at t_1 should be removed.
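The alarm-based first cleaning of step 502 can be sketched as follows. This is a minimal Python illustration, not the patent's code: the function name, the dictionary layout keyed by collection period, the item names and the sample values are all hypothetical, and a period with no stored alarm information is assumed to have no alarm.

```python
def first_cleaning(performance, alarms):
    """Keep only the periods whose alarm items are all 0 (no alarm)."""
    first_data_set = {}
    for period, record in performance.items():
        alarm_items = alarms.get(period, {})  # assumption: missing means no alarm
        if any(value != 0 for value in alarm_items.values()):
            continue  # an alarm exists in this period: drop its record
        first_data_set[period] = record
    return first_data_set

# Hypothetical sample data for three 15-minute periods.
performance = {
    "t_0": {"ES": 2, "SES": 0},
    "t_1": {"ES": 40, "SES": 12},
    "t_2": {"ES": 3, "SES": 0},
}
alarms = {
    "t_0": {"alarm_1": 0, "alarm_2": 0},
    "t_1": {"alarm_1": 1, "alarm_2": 0},
    "t_2": {"alarm_1": 0, "alarm_2": 0},
}
kept = first_cleaning(performance, alarms)
print(list(kept))  # ['t_0', 't_2']
```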
In step 503, ES and SES in the first data set are selected as the characteristic information; the corresponding characteristic health states are obtained from their respective preset range intervals; the corresponding ranges of network health states are obtained from those characteristic health states; and the network health state common to the ranges for ES and SES is selected for labeling.
A first interval and a second interval are preset for ES and SES respectively. When the value of ES is greater than the maximum value of the first interval, or the value of SES is greater than the maximum value of the second interval, the corresponding characteristic health state is considered to be degradation, that is, the value of the characteristic health state is -1. When the value of ES is smaller than the minimum value of the first interval, or the value of SES is smaller than the minimum value of the second interval, the corresponding characteristic health state is considered to be health, that is, the value of the characteristic health state is 0. When the value of ES is greater than or equal to the minimum value of the first interval and less than or equal to the maximum value of the first interval, or the value of SES is greater than or equal to the minimum value of the second interval and less than or equal to the maximum value of the second interval, the corresponding characteristic health state is sub-health, that is, the value of the characteristic health state is 1.
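The interval rule can be sketched as a small function. The sketch reads the three cases as: above the interval maximum is degradation (-1), below the interval minimum is health (0), and inside the closed interval is sub-health (1). The function name and the interval bounds used in the example are hypothetical; the patent does not give concrete bounds.

```python
def feature_health_state(value, interval_min, interval_max):
    """Map a data-item value to a characteristic health state (0, 1 or -1)."""
    if value > interval_max:
        return -1  # degradation
    if value < interval_min:
        return 0   # health
    return 1       # sub-health: interval_min <= value <= interval_max

# Hypothetical first interval for ES: [5, 20] errored seconds per period.
for es in (2, 5, 20, 21):
    print(es, feature_health_state(es, 5, 20))
```

The same function would serve SES with its own second interval, since both characteristics are judged by the same three cases.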
The selectable range of the network health state is defined as 0 or 1 when the characteristic health state corresponding to SES is 0, and as -1 when the characteristic health state corresponding to SES is 1 or -1. The selectable range of the network health state is defined as 0 or -1 when the characteristic health state corresponding to ES is 0, and as 1 or -1 when the characteristic health state corresponding to ES is 1 or -1. For example, when in a piece of network performance data the characteristic health state of ES is -1 and that of SES is 1, the two ranges are {1, -1} and {-1}, so the corresponding network health state is -1, that is, degradation.
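The range lookup and the selection of the common state can be sketched with set intersection. The two tables below transcribe the ranges stated in this embodiment; the names `ES_RANGES`, `SES_RANGES` and `network_health_state` are hypothetical, and raising an error when the intersection is not a single state is an assumption, since the text does not cover that case.

```python
# Selectable network health states for each characteristic health state.
SES_RANGES = {0: {0, 1}, 1: {-1}, -1: {-1}}
ES_RANGES = {0: {0, -1}, 1: {1, -1}, -1: {1, -1}}

def network_health_state(es_state, ses_state):
    """Return the network health state common to the ES and SES ranges."""
    common = ES_RANGES[es_state] & SES_RANGES[ses_state]
    if len(common) != 1:
        # Not addressed in the text; treated as an error in this sketch.
        raise ValueError(f"no unique common state for ES={es_state}, SES={ses_state}")
    return next(iter(common))

print(network_health_state(es_state=-1, ses_state=1))  # -1
print(network_health_state(es_state=0, ses_state=0))   # 0
print(network_health_state(es_state=1, ses_state=0))   # 1
```

With these two tables, each of the nine combinations of ES and SES characteristic health states yields exactly one common state, so the error branch is never reached for valid inputs.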
The labeling may be adding information of a network health state to the network performance data, or storing the labeled network health state in other manners such as mapping.
In step 504, the network performance data in the first data set is accessed in sequence. A variable old records the network health state of the previously accessed piece of network performance data; old is compared with the network health state of the piece currently being accessed, and if the two differ, the current piece is marked. The mark here only distinguishes the piece from other network performance data and carries no other information.
In step 505, the first data set is divided into intervals of 8 pieces of network performance data each, 8 being the preset number of data pieces; the number of marked pieces in each interval is counted, and if that number exceeds 5, all the network performance data in the interval is deleted.
In step 506, the marked pieces of network performance data are accessed in sequence, starting from the first marked piece. Let the piece currently accessed be the i-th piece in the first data set; it is judged whether any marked piece exists among the (i+1)-th to (i+3)-th pieces, and if so, the i-th piece is removed. This continues until the last marked piece has been accessed. The set formed by the remaining stored network performance data is the second data set, which is used for machine learning; fig. 9 shows the resulting second data set.
In the present embodiment, the expressions like "first", "second" and "third" have no special limited meaning, and are used for description only for the convenience of describing different individuals among a class of objects, and should not be interpreted as having a special limited meaning in order or otherwise.
Example 3:
Fig. 10 is a schematic structural diagram of a device for collecting and labeling network health state data according to an embodiment of the present invention. The device of this embodiment includes one or more processors 21 and a memory 22. In fig. 10, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 10 illustrates the connection by a bus as an example.
The memory 22, as a non-volatile computer-readable storage medium, can be used to store a non-volatile software program and a non-volatile computer-executable program, such as the method for network health status data collection annotation in embodiment 1. The processor 21 performs the method of network health status data collection annotation by executing non-volatile software programs and instructions stored in the memory 22.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22, and when executed by the one or more processors 21, perform the method for network health status data collection annotation in embodiments 1 and 2, for example, perform the steps shown in fig. 1-4 and fig. 6 described above.
It should be noted that, because the information interaction and execution processes between the modules and units of the above apparatus and system are based on the same concept as the method embodiments of the present invention, reference may be made to the description of the method embodiments for details, which are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be performed by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method for collecting and labeling network health state data is characterized by comprising the following steps:
collecting network performance data and alarm information, cleaning the network performance data once according to the alarm information, removing the network performance data in a network fault state, and forming a first data set;
selecting one or more data items in the first data set as one or more pieces of characteristic information, and marking the health state of the network for each piece of network performance data in the first data set according to the characteristic information;
and carrying out secondary cleaning on the first data set according to the network health state, removing the network performance data when the network accidental event exists, and forming a second data set for machine learning.
2. The method for acquiring and labeling network health status data according to claim 1, wherein the acquiring network performance data and alarm information specifically comprises:
according to a network performance index required to be evaluated by machine learning, acquiring one or more first data items related to the network performance index in a first network layer where the network performance index is located at intervals of a preset period, acquiring one or more second data items related to the network performance index in one or more second network layers related to the first network layer, and acquiring alarm information related to the network performance index in the first network layer and the second network layer;
and combining all the first data items and all the second data items acquired in each preset period to serve as corresponding network performance data in the preset period.
3. The method for acquiring and labeling network health status data according to claim 2, wherein the cleaning the network performance data once according to the alarm information specifically comprises:
and judging whether an alarm exists in the corresponding preset period according to the alarm information, and if so, removing the network performance data corresponding to the preset period.
4. The method for acquiring and labeling network health status data according to claim 1, wherein labeling a network health status for each piece of network performance data in the first data set according to the feature information specifically comprises:
presetting a range interval for each piece of characteristic information, and obtaining the network health state of the network performance data according to one or more characteristic health states corresponding to the network performance data when the values of the characteristic information are at different positions of the range interval.
5. The method for acquiring and labeling network health status data according to claim 4, wherein when there is no hierarchical relationship among the selected pieces of feature information, the obtaining of the network health status of the network performance data according to one or more feature health statuses corresponding to the network performance data specifically comprises:
setting different proportions for each characteristic information, calculating the total proportion of each characteristic health state in a plurality of characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total proportion as the network health state corresponding to the network performance data.
6. The method for acquiring and labeling network health status data according to claim 4, wherein when a hierarchical relationship exists between the selected plurality of feature information, the obtaining of the network health status of the network performance data according to one or more feature health statuses corresponding to the network performance data specifically comprises:
and determining a range of the corresponding network health state according to the characteristic health state corresponding to each characteristic information, and selecting a common network health state from the range of the plurality of network health states corresponding to the plurality of characteristic information as the network health state corresponding to the network performance data.
7. The method for acquiring and labeling network health state data according to claim 1, wherein the performing of the second cleaning on the first data set according to the network health state specifically comprises:
finding first network performance data in the first data set that is different from the network health status of the previous network performance data;
taking each piece of first network performance data in the first data set as target data, and if first network performance data exists among a first preset number of pieces of network performance data after the target data, removing the target data from the first data set.
8. The method of network health status data collection annotation of claim 7, wherein prior to said targeting each of the first network performance data in the first data set, the method further comprises:
and dividing the first data set into one or more intervals by taking the number of preset data pieces as the interval size, and removing all the network performance data in the interval from the first data set if the number of the first network performance data in the interval exceeds a second preset number.
9. The method for network health status data collection annotation of any one of claims 1-8, wherein the network health status comprises: health, sub-health and deterioration.
10. An apparatus for acquiring and labeling network health state data, the apparatus comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of network health status data collection tagging of any one of claims 1-9.
CN202210299221.8A 2022-03-25 2022-03-25 Method and device for collecting and labeling network health state data Active CN115037643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210299221.8A CN115037643B (en) 2022-03-25 2022-03-25 Method and device for collecting and labeling network health state data

Publications (2)

Publication Number Publication Date
CN115037643A CN115037643A (en) 2022-09-09
CN115037643B CN115037643B (en) 2023-05-30

Family

ID=83119586

Country Status (1)

Country Link
CN (1) CN115037643B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant