CN115037643B - Method and device for collecting and labeling network health state data - Google Patents

Method and device for collecting and labeling network health state data

Info

Publication number
CN115037643B
Authority
CN
China
Prior art keywords
network
data
network performance
performance data
health
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210299221.8A
Other languages
Chinese (zh)
Other versions
CN115037643A (en)
Inventor
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fiberhome Telecommunication Technologies Co Ltd
Wuhan Fiberhome Technical Services Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Wuhan Fiberhome Technical Services Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd, Wuhan Fiberhome Technical Services Co Ltd
Priority to CN202210299221.8A
Publication of CN115037643A
Application granted
Publication of CN115037643B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04J MULTIPLEX COMMUNICATION
    • H04J3/00 Time-division multiplex systems
    • H04J3/16 Time-division multiplex systems in which the time allocation to individual channels within a transmission cycle is variable, e.g. to accommodate varying complexity of signals, to vary number of channels transmitted
    • H04J3/1605 Fixed allocated frame structures
    • H04J3/1652 Optical Transport Network [OTN]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Maintenance And Management Of Digital Transmission (AREA)

Abstract

The invention relates to the technical field of communication, and provides a method and a device for collecting and labeling network health state data. The method comprises: collecting network performance data and alarm information, performing primary cleaning on the network performance data according to the alarm information, and removing the network performance data collected in a network fault state to form a first data set; selecting one or more data items in the first data set as one or more pieces of characteristic information, and labeling each piece of network performance data in the first data set with a network health state according to the characteristic information; and performing secondary cleaning on the first data set according to the network health state, removing the network performance data collected while a network contingency event was present, to form a second data set for machine learning. By removing, through primary and secondary cleaning, the network performance data collected during network faults or network contingency events, the invention improves the machine learning effect.

Description

Method and device for collecting and labeling network health state data
Technical Field
The invention relates to the technical field of communication, in particular to a method and a device for collecting and labeling network health state data.
Background
With the development of artificial intelligence technology, many industries have begun to use machine learning to solve problems in their own fields, improving efficiency and reducing cost. In the network field, as network scale keeps expanding and 5G networks develop rapidly, the traditional manual operation and maintenance mode can no longer meet the need to locate problems quickly and resolve latent faults, and it has become common in the industry to introduce artificial intelligence technology to improve operation and maintenance efficiency. In data-driven machine learning, the quality of the data directly determines the effect of machine learning, so data processing is often the most important part of the machine learning process. Meanwhile, supervised learning requires a large number of real samples to be labeled before the data can be used for training, and at present most data labeling is done manually.
In AI machine learning related to network performance and network health, for example network performance prediction, network degradation trend analysis, and tracing of the fault points behind network performance degradation, the collected data must be labeled for supervised learning. Manual labeling, however, causes several problems. There is no unified standard for the network health state: most labeling processes define and divide the network health state according to operation and maintenance experience, various performance values, alarms and the like, so the labels chosen vary widely, and differences in the experience of the labeling personnel lead to different machine learning results. In addition, samples are usually collected from the network management system and, depending on the network state, may contain "dirty data" that harms machine learning, such as data collected during a network fault or a network contingency event. If such data is fed directly into machine learning, the effect of machine learning may suffer.
In view of this, overcoming the drawbacks of the prior art is a problem to be solved in the art.
Disclosure of Invention
The invention aims to solve the problem that dirty data present in the collected samples, produced during network faults or network contingency events, degrades the effect of machine learning.
The invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for collecting and labeling network health state data, including:
collecting network performance data and alarm information, performing primary cleaning on the network performance data according to the alarm information, and removing the network performance data collected in a network fault state to form a first data set;
selecting one or more data items in the first data set as one or more pieces of characteristic information, and labeling each piece of network performance data in the first data set with a network health state according to the characteristic information;
and performing secondary cleaning on the first data set according to the network health state, and removing the network performance data collected while a network contingency event was present, to form a second data set for machine learning.
Preferably, collecting the network performance data and alarm information specifically includes:
according to the network performance index to be evaluated by machine learning, collecting, at every preset period, one or more first data items related to the network performance index in the first network layer where the network performance index is located, one or more second data items related to the network performance index in one or more second network layers related to the first network layer, and alarm information related to the network performance index in the first network layer and the second network layer;
and merging all the first data items and all the second data items collected in each preset period to serve as the network performance data corresponding to that preset period.
Preferably, performing the primary cleaning on the network performance data according to the alarm information specifically includes:
judging, according to the alarm information, whether an alarm exists in the corresponding preset period, and if so, removing the network performance data corresponding to that preset period.
Preferably, labeling each piece of network performance data in the first data set with a network health state according to the characteristic information specifically includes:
presetting a range interval for each piece of characteristic information, where the position of the value of the characteristic information within its range interval determines the corresponding characteristic health state; and obtaining the network health state of the network performance data from the one or more characteristic health states corresponding to the network performance data.
Preferably, when no hierarchical relationship exists among the selected pieces of characteristic information, obtaining the network health state of the network performance data from the one or more characteristic health states corresponding to the network performance data specifically includes:
assigning a weight (duty ratio) to each piece of characteristic information, calculating the total weight of each characteristic health state among the characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total weight as the network health state corresponding to the network performance data.
Preferably, when a hierarchical relationship exists among the selected pieces of characteristic information, obtaining the network health state of the network performance data from the one or more characteristic health states corresponding to the network performance data specifically includes:
determining a corresponding network health state range from the characteristic health state of each piece of characteristic information, and selecting the network health state common to the network health state ranges of all the pieces of characteristic information as the network health state corresponding to the network performance data.
Preferably, performing the secondary cleaning on the first data set according to the network health state specifically includes:
finding, in the first data set, each piece of first network performance data, namely network performance data whose network health state differs from that of the immediately preceding piece of network performance data;
and taking each piece of first network performance data in the first data set as target data, and removing the target data from the first data set if further first network performance data exists among the first preset number of pieces of network performance data that follow the target data.
Preferably, before taking each piece of first network performance data in the first data set as target data, the method further comprises:
dividing the first data set into one or more intervals, using a preset number of pieces of data as the interval size, and removing all network performance data in an interval from the first data set if the number of pieces of first network performance data in that interval exceeds a second preset number.
Preferably, the network health state includes: healthy, sub-healthy and degraded.
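The secondary cleaning described above can be sketched roughly as follows. This is only an illustrative reading of the text, not the patented implementation: the record layout `(period, state)`, the function names, and the parameters `window`, `interval_size` and `max_changes` are all hypothetical, and each function recomputes the state-change points on whatever list it is given, which the text does not specify.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (collection period index, network health state)


def change_points(records: List[Record]) -> List[int]:
    """Positions whose state differs from the immediately preceding record."""
    return [i for i in range(1, len(records))
            if records[i][1] != records[i - 1][1]]


def drop_noisy_intervals(records: List[Record], interval_size: int,
                         max_changes: int) -> List[Record]:
    """Remove every interval containing more than max_changes change points."""
    changes = set(change_points(records))
    kept: List[Record] = []
    for start in range(0, len(records), interval_size):
        chunk = range(start, min(start + interval_size, len(records)))
        if sum(i in changes for i in chunk) <= max_changes:
            kept.extend(records[i] for i in chunk)
    return kept


def drop_transient_changes(records: List[Record], window: int) -> List[Record]:
    """Remove a change point if another change follows within `window` records."""
    changes = set(change_points(records))
    drop = {i for i in changes
            if any(j in changes for j in range(i + 1, i + 1 + window))}
    return [r for i, r in enumerate(records) if i not in drop]
```

Under this reading, a single "sub-healthy" sample sandwiched between "healthy" samples is treated as a short-lived contingency event and dropped, while a sustained change of state is kept.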
In a second aspect, the present invention further provides an apparatus for implementing the network health status data acquisition labeling in the first aspect, where the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, the instructions being executable by the processor for performing the method of network health status data acquisition labeling of the first aspect.
In a third aspect, the present invention also provides a non-volatile computer storage medium storing computer executable instructions for execution by one or more processors to perform the method of network health status data acquisition annotation of the first aspect.
The invention removes, through primary and secondary cleaning, the network performance data collected during network faults or network contingency events, thereby improving the machine learning effect. In addition, in the preferred embodiments of the invention, standardizing the labeling and cleaning processes ensures the quality of the data required by machine learning and thus the effect of machine learning.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flowchart of a method for network health status data collection labeling provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a method for network health status data collection labeling provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a method for network health status data collection labeling provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a method for network health status data collection labeling provided by an embodiment of the present invention;
FIG. 5 is a network layer architecture diagram of an OTN according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for network health status data collection labeling provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a plurality of collected network performance data provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of collected alert information provided by an embodiment of the present invention;
FIG. 9 is a schematic illustration of a formed second data set provided by an embodiment of the present invention;
FIG. 10 is a schematic diagram of an architecture of a network health status data collection labeling device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, terms such as "inner", "outer", "longitudinal", "transverse", "upper", "lower", "top", "bottom", and the like refer to an orientation or positional relationship based on that shown in the drawings, and are merely for convenience in describing the present invention and do not require that the present invention must be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1:
Embodiment 1 of the invention provides a method for collecting and labeling network health state data which, as shown in FIG. 1, specifically comprises the following steps:
In step 201, network performance data and alarm information are collected, primary cleaning is performed on the network performance data according to the alarm information, and the network performance data collected in a network fault state is removed to form a first data set.
Collecting the network performance data specifically means that, according to the network performance index that machine learning needs to focus on, one or more data items related to that index are selected and combined into a piece of network performance data, and one such piece is collected at every preset period in the network layer where the index is located. The preset period is determined by a person skilled in the art through empirical analysis. The data items contained in the collected network performance data may include: bit interleaved parity (BIP) errors, cyclic redundancy check (CRC) errors, packet loss, packet loss rate, bit error rate (BER), background block errors (BBE), background block error ratio (BBER), errored seconds (ES), severely errored seconds (SES), unavailable time (UAT), alarm information, and the like. The selected network performance data may vary with the network performance index that machine learning needs to focus on. The alarm information is typically the related alarms generated at the network layer where that index is located. A network fault specifically refers to an irreparable fault or a fault that may persist for a long time, such as an optical fiber break or a network loop. When a network fault exists, the network performance data may be erroneous, and using it for machine learning may make the learning result incorrect. The alarm information is therefore used to determine whether a network fault exists, and the network performance data collected while a network fault exists is removed.
In step 202, one or more data items in the first data set are selected as one or more characteristic information, and a network health status is marked for each piece of network performance data in the first data set according to the characteristic information.
The characteristic information is selected according to the network performance index that machine learning needs to focus on. A piece of network performance data comprises one or more data items. The network health state generally has two values, healthy and degraded, but to make machine learning more accurate and achieve better network performance prediction, more than two states may be defined, for example three: healthy, sub-healthy and degraded. When machine learning determines that the network is in the sub-healthy state, corresponding measures can be taken to prevent the network from degrading further, thereby ensuring continuous and stable operation of the network.
In step 203, secondary cleaning is performed on the first data set according to the network health state, and the network performance data collected while a network contingency event was present is removed, so as to form a second data set for machine learning.
The network contingency event specifically refers to an event existing in a short period such as network route change, network circuit switching, network maintenance operation, power supply switching and the like, and when the network contingency event occurs, network performance data can be wrong, so that the network performance data when the network contingency event exists is removed through secondary cleaning.
This embodiment provides a method for collecting and labeling network health state data which removes, through primary and secondary cleaning, the network performance data collected during network faults or network contingency events, and thereby improves the machine learning effect.
In practical network applications, machine learning is mainly used for network performance prediction, network degradation trend analysis, network performance degradation tracing and the like. In currently deployed network technologies, the network is generally multi-layered. If tracing uses only the data items of the network layer where the network performance index is located, the source of network performance degradation can be traced only within that network layer and not further into other related network layers. Therefore, as shown in FIG. 2, the following preferred embodiment exists, in which collecting the network performance data and alarm information specifically comprises:
in step 301, according to a network performance index to be evaluated by machine learning, one or more first data items related to the network performance index are collected in a first network layer where the network performance index is located, and one or more second data items related to the network performance index are collected in one or more second network layers related to the first network layer every preset period.
In step 302, alarm information related to the network performance index is collected in a first network layer and a second network layer.
In step 303, all the first data items and all the second data items acquired in each preset period are combined as corresponding network performance data in the preset period.
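Steps 301 to 303 can be illustrated with a minimal sketch. The dictionary layout, the `first.` / `second.` key prefixes and the function names below are assumptions made for the example, as is the choice to keep only periods for which both layers reported data; the text itself does not prescribe a data format.

```python
from typing import Dict

Items = Dict[str, float]  # data item name -> collected value


def merge_period(first_items: Items, second_items: Items) -> Items:
    """Combine first-layer and second-layer data items into one record."""
    record = {f"first.{name}": value for name, value in first_items.items()}
    record.update({f"second.{name}": value
                   for name, value in second_items.items()})
    return record


def build_performance_data(first_by_period: Dict[int, Items],
                           second_by_period: Dict[int, Items]) -> Dict[int, Items]:
    """One merged piece of network performance data per preset period.

    Assumption: a period is kept only if both layers reported data for it.
    """
    periods = sorted(set(first_by_period) & set(second_by_period))
    return {p: merge_period(first_by_period[p], second_by_period[p])
            for p in periods}
```

Prefixing each data item with its layer keeps the origin of every value visible, which is what later allows degradation to be traced from the first network layer down to a second network layer.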
One or more second data items may be collected in each second network layer. There may be one or more network performance indexes to be evaluated by machine learning. A second network layer is a network layer in the network other than the first network layer. The related one or more second network layers are specifically the network layers through which data passes during transmission before reaching the first network layer. These network layers may affect the network condition of the first network layer: when the network performance of the first network layer degrades, the cause may lie in another network layer traversed during transmission. Data items are therefore also collected in the network layers related to the first network layer, to assist in tracing network performance degradation.
The second network layer is typically the lower network layer of the first network layer, and because of the nature of the transfer of network data from the lower network layer to the upper network layer, the data also passes through the second network layer before reaching the first network layer.
The preset period is obtained by analysis of network transmission conditions and machine learning requirements by a person skilled in the art.
Even when data items of the second network layer are also collected, only data items of the first network layer may be selected as the characteristic information in step 202.
The preferred embodiment not only collects the data items of the first network layer where the network performance index is located, but also collects the data items of the second network layer related to the first network layer, so that when the network performance degradation tracing is carried out, the degradation source of the first network layer can be traced back, the tracing can be carried out continuously to the lower network layer, and the real reason of the network performance degradation can be determined.
The alarm information is usually reported by an alarm module in the network management system. Because of the multi-layer architecture of the network, when one network layer fails, other network layers may suffer the same fault or be unable to operate normally. In that case, if alarms exist in several network layers, the alarm module consolidates the highly correlated alarms and shows the user only the alarm with the highest processing priority or the earliest detected fault. In other words, under the current alarm module design, an alarm in one network layer may indicate that other related network layers are also faulty. If primary cleaning relied only on the alarm information of the network layer where the network performance index of interest is located, it might remove the network performance data for only some of the network faults; cleaning must also take the alarm information of the related network layers into account. In the preferred embodiment, alarm information is therefore collected in the first network layer and in the second network layers related to it, so that every alarm raised in any network layer related to the first network layer takes part in the primary cleaning, and all network performance data collected while a network fault related to the network performance index exists is removed.
On the basis of the above preferred embodiment, there is a further preferred implementation in which performing the primary cleaning on the network performance data according to the alarm information specifically includes:
judging whether an alarm exists in the corresponding preset period according to the alarm information, and if so, removing network performance data corresponding to the preset period.
The alert information typically includes one or more alert items, each alert item for determining whether an alert of a corresponding category exists.
In this embodiment, the network performance data collected while an alarm exists is removed by correlating the alarm information with the network performance data in time. The data is cleaned using the alarm information of the first network layer and also that of the second network layers related to the first network layer, so that whenever a related alarm occurs in any related network layer, the abnormal network performance data collected at that time is not used for machine learning, thereby safeguarding the effect of machine learning.
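The time correlation used by the primary cleaning can be sketched as below. This is a simplified illustration under stated assumptions: each alarm is a dictionary with hypothetical `layer` and `timestamp` (seconds) fields, alarms are treated as instantaneous, and a period index is obtained as `timestamp // period_seconds`.

```python
from typing import Dict, List


def primary_clean(performance_data: Dict[int, dict],
                  alarms: List[dict],
                  period_seconds: int) -> Dict[int, dict]:
    """Drop the network performance data of every period in which an alarm exists.

    Alarms from the first network layer and from the related second network
    layers are treated alike: any alarm marks its whole period as faulty.
    """
    alarmed_periods = {int(alarm["timestamp"] // period_seconds)
                       for alarm in alarms}
    return {period: record
            for period, record in performance_data.items()
            if period not in alarmed_periods}
```

An alarm that lasts longer than one period would need start and end times instead of a single timestamp; that extension is left out here because the text does not describe the alarm format.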
In the above embodiment, one or more data items may be selected as the characteristic information. When a single data item is selected, the network health state of the corresponding network performance data is determined by that single piece of characteristic information. When several data items are selected, the question of how to derive the network health state of the corresponding network performance data must be solved, which leads to the following preferred embodiment, specifically including:
presetting a range interval for each piece of characteristic information, where the position of the value of the characteristic information within its range interval determines the corresponding characteristic health state; and obtaining the network health state of the network performance data from the one or more characteristic health states corresponding to the network performance data.
Wherein the characteristic health status may comprise some or all of the network health status.
In this embodiment, a range interval is preset for each piece of characteristic information to obtain the characteristic health state corresponding to that piece of characteristic information, and the network health state is then derived from the characteristic health states, so that a unique network health state is obtained for the network performance data even when several data items are selected as characteristic information.
The network health state of the network performance data is generally obtained from the one or more corresponding characteristic health states as follows: the characteristic health state that occurs most often among the characteristic health states corresponding to the network performance data is taken as the network health state.
This majority approach is suitable when every data item affects the network to the same degree. In practice, however, the selected data items do not affect the network health state equally. For example, suppose the two selected data items are the network average delay and the packet loss rate. The average delay reflects the network transmission speed, while the packet loss rate reflects the stability of network transmission. For a person skilled in the art, when network degradation trends are analyzed through machine learning, transmission stability matters more to the analysis than transmission speed. If the network health state obtained by the majority approach is used for machine learning, the model may conclude that the network is degraded merely because the average delay is too large, which is not what a person skilled in the art wants. There is therefore a preferred implementation in which, when no hierarchical relationship exists among the selected pieces of characteristic information, obtaining the network health state of the network performance data from the one or more corresponding characteristic health states specifically includes:
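A minimal sketch of range-interval labeling followed by the majority rule is given below. The threshold values, the data item names and the three state strings are hypothetical placeholders chosen for illustration; the text leaves the actual range intervals to the practitioner. Ties are not addressed in the text; here `Counter.most_common` resolves them in favor of the state encountered first.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical range intervals: (upper bound of "healthy", upper bound of "sub-healthy").
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "packet_loss_rate": (0.001, 0.01),
    "avg_delay_ms": (20.0, 50.0),
    "crc_errors": (0.0, 10.0),
}


def feature_state(name: str, value: float) -> str:
    """Map a characteristic's value to a characteristic health state."""
    healthy_max, sub_healthy_max = THRESHOLDS[name]
    if value <= healthy_max:
        return "healthy"
    if value <= sub_healthy_max:
        return "sub-healthy"
    return "degraded"


def majority_state(record: Dict[str, float]) -> str:
    """Network health state = most frequent characteristic health state."""
    states = [feature_state(name, value)
              for name, value in record.items() if name in THRESHOLDS]
    return Counter(states).most_common(1)[0][0]
```

For a record with a low packet loss rate, zero CRC errors and a moderately high delay, two characteristics vote "healthy" and one votes "sub-healthy", so the record is labeled "healthy".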
setting a different duty ratio (weight) for each piece of characteristic information, calculating the total duty ratio of each characteristic health state among the plurality of characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total duty ratio as the network health state corresponding to the network performance data.
The duty ratios are set by a person skilled in the art according to an analysis of the degree to which each piece of characteristic information influences the network health state.
In this preferred implementation, different duty ratios are preset for different pieces of characteristic information to reflect their degree of influence on the network health state, so that the obtained network health state is more accurate and the machine learning effect is optimized. With the duty ratios preset, the network performance data can be labeled automatically without human participation, so the network health state judgment flow is standardized and does not depend on the experience of labeling personnel.
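As an illustrative sketch only, the duty-ratio calculation described above might be written as follows in Python. The function name `weighted_network_state`, the feature names and the duty ratio values are assumptions made for illustration and do not come from this embodiment. The numeric encoding of states (0, 1, -1) is borrowed from embodiment 2. The handling of ties is not specified here, and the sketch simply keeps the first state that reaches the highest total.

```python
from collections import defaultdict
from typing import Dict


def weighted_network_state(feature_states: Dict[str, int],
                           duty_ratios: Dict[str, float]) -> int:
    """Return the characteristic health state with the highest total duty ratio."""
    totals: Dict[int, float] = defaultdict(float)
    for feature, state in feature_states.items():
        # Add this feature's duty ratio to the total of the state it votes for.
        totals[state] += duty_ratios[feature]
    # max() keeps the first state that reaches the highest total (assumed tie rule).
    return max(totals, key=lambda state: totals[state])


# Hypothetical record: the delay looks degraded (-1), the packet loss looks healthy (0).
states = {"avg_delay": -1, "packet_loss_rate": 0}
ratios = {"avg_delay": 0.3, "packet_loss_rate": 0.7}  # assumed values
print(weighted_network_state(states, ratios))
```

With equal duty ratios for every piece of characteristic information, the same function reduces to selecting the most frequent characteristic health state.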
A hierarchical relationship may exist between the selected pieces of characteristic information. For example, when the two selected data items are ES and SES, SES is at a higher level than ES; that is, whenever SES occurs, ES necessarily occurs as well. In this case, calculating the network health state by duty ratios, or selecting the most frequent characteristic health state as the network health state, may give an inaccurate network health state. There is therefore a preferred implementation in which, when a hierarchical relationship exists between the selected pieces of characteristic information, obtaining the network health state of the network performance data according to the one or more characteristic health states corresponding to the network performance data specifically includes:
determining a corresponding network health state range according to the characteristic health state corresponding to each piece of characteristic information, and selecting the network health state common to the plurality of network health state ranges corresponding to the plurality of pieces of characteristic information as the network health state corresponding to the network performance data.
In this preferred implementation, each piece of characteristic information is mapped to a range of network health states, and the network health state common to all of the ranges is taken as the network health state corresponding to the network performance data, so that an accurate network health state is obtained and the machine learning effect is optimized. With the range of network health states corresponding to each characteristic health state of each piece of characteristic information preset, the network performance data can be labeled automatically without human participation, so the network health state judgment flow is standardized and does not depend on the experience of labeling personnel.
On the basis of the above preferred implementations, if some of the selected pieces of characteristic information have a hierarchical relationship and others do not, the pieces having the hierarchical relationship can be treated as a single piece of characteristic information with a single preset duty ratio. The network health state corresponding to the hierarchically related pieces is first obtained through the range-based preferred implementation above and is then converted into a corresponding characteristic health state. This single piece of characteristic information, with its single characteristic health state and single duty ratio, then takes part in the subsequent calculation; that is, the final network health state is calculated by duty ratios and is taken as the network health state corresponding to the network performance data.
In the above embodiment, the first data set is cleaned a second time according to the network health state in order to remove the network performance data recorded while a network sporadic event exists. This raises the problem of how to distinguish network performance data recorded during a network sporadic event from other network performance data. One common means is for a person skilled in the art to define, from experience, the flag information that appears when each kind of network sporadic event occurs, and to determine whether a network sporadic event exists by checking whether that flag information is present in the network performance data. However, a network sporadic event may need to be judged comprehensively from a plurality of data items, and network sporadic events are of many kinds, so this approach is complicated to implement and performs poorly. To solve this problem, as shown in fig. 3, there is the following preferred embodiment, which specifically includes:
in step 401, first network performance data is found in the first data set, the first network performance data being network performance data whose network health state differs from that of the previous piece of network performance data.
In step 402, each piece of first network performance data in the first data set is taken in turn as target data, and if further first network performance data exists among the first preset number of pieces of network performance data following the target data, the target data is removed from the first data set.
The first preset number is determined by a person skilled in the art from the collection period of the network performance data and from empirical analysis.
A network sporadic event necessarily affects the network health state, and the purpose of the secondary cleaning is only to remove the network performance data recorded during a network sporadic event, regardless of the type of event. This preferred embodiment therefore uses the network health state as the basis for detecting network sporadic events. When the network health state changes, it is judged whether the state recovers or changes to another state shortly afterwards. If it does, the change is considered not to be sustained, that is, it is a fluctuation caused by a network sporadic event, and the corresponding data is removed. This guarantees the validity of the network performance data in the second data set used for machine learning, and thus the effect of machine learning.
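Steps 401 and 402 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, each piece of network performance data is represented only by its labeled network health state, the first piece is assumed never to count as first network performance data because it has no predecessor, and positions are evaluated against the first data set as it stands before any removal.

```python
from typing import List, Sequence


def find_change_points(states: Sequence[int]) -> List[int]:
    """Step 401: indices whose health state differs from the previous piece."""
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


def secondary_clean(states: Sequence[int], first_preset_number: int) -> List[int]:
    """Step 402: return the indices kept after removing short-lived changes."""
    changed = set(find_change_points(states))
    removed = set()
    for i in changed:
        # Look at the next `first_preset_number` pieces after the target data.
        window = range(i + 1, min(i + 1 + first_preset_number, len(states)))
        if any(j in changed for j in window):
            removed.add(i)  # the state changed again soon, so the target is dropped
    return [i for i in range(len(states)) if i not in removed]


# Hypothetical labels: a one-period blip to sub-health, then a lasting degradation.
labels = [0, 0, 1, 0, 0, 0, 0, -1, -1, -1, -1]
print(find_change_points(labels))
print(secondary_clean(labels, first_preset_number=3))
```

In this example the change at index 2 is followed by another change at index 3, so index 2 is removed, while the change at index 7 persists and is kept.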
In practice, when network sporadic events occur frequently within a certain period of time, the network performance data in that period is considered insufficiently stable, and using unstable network performance data for machine learning may impair its effect. To further optimize the effect of machine learning, the above preferred embodiment is extended with the following step; fig. 4 combines the relevant steps of the present embodiment to show the relatively complete logic:
in step 400, the first data set is divided into one or more intervals, with the preset number of data pieces as the interval size, and if the number of pieces of first network performance data in an interval exceeds the second preset number, all the network performance data in that interval is removed from the first data set.
The preset number of data pieces is determined by a person skilled in the art from the total amount of network performance data required for machine learning and the accuracy requirement of machine learning. The second preset number is determined by a person skilled in the art from the preset number of data pieces and the accuracy requirement of machine learning. When the number of pieces of network performance data in a divided interval does not reach the second preset number, the judgment is made from the ratio of the first network performance data to all the network performance data in the interval; that is, all the network performance data in the interval is removed when this ratio exceeds a preset ratio. The preset ratio is determined by a person skilled in the art from the preset number of data pieces and the accuracy requirement of machine learning, and is usually the ratio of the second preset number to the preset number of data pieces.
In this preferred implementation, the first data set is divided into intervals and the stability of each interval is assessed. If network sporadic events are considered frequent in an interval, that is, the network health state changes frequently, the network performance data of the whole interval is removed, which guarantees a certain stability of the network performance data used for machine learning and optimizes the effect of machine learning.
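The interval-based stability check of step 400 might look like the sketch below. The function name `drop_unstable_intervals` and the list-of-states representation are assumptions for illustration. The sketch follows the wording above literally: the ratio rule is applied only to an interval holding fewer pieces than the second preset number, and the preset ratio is taken to be the second preset number divided by the preset number of data pieces.

```python
from typing import List, Sequence


def drop_unstable_intervals(states: Sequence[int],
                            interval_size: int,
                            second_preset_number: int) -> List[int]:
    """Return the indices kept after removing intervals with too many state changes."""
    # First network performance data: state differs from the previous piece.
    changed = {i for i in range(1, len(states)) if states[i] != states[i - 1]}
    preset_ratio = second_preset_number / interval_size  # usual choice per the text
    kept: List[int] = []
    for start in range(0, len(states), interval_size):
        idx = range(start, min(start + interval_size, len(states)))
        n_changed = sum(1 for i in idx if i in changed)
        if len(idx) < second_preset_number:
            # Short interval: judge by the ratio of changed pieces to all pieces.
            unstable = n_changed / len(idx) > preset_ratio
        else:
            unstable = n_changed > second_preset_number
        if not unstable:
            kept.extend(idx)
    return kept


# Hypothetical labels with interval size 4 and second preset number 2.
labels = [0, 1, 0, 1, 0, 0, 0, 0, 1]
print(drop_unstable_intervals(labels, interval_size=4, second_preset_number=2))
```

Here the first interval contains three changes and is removed, the second contains one and is kept, and the final single-piece interval is removed by the ratio rule.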
Because the preferred embodiments and implementations of the present embodiment specify the standards and methods for cleaning and labeling in detail, combining them achieves automatic data collection, cleaning and labeling without human participation, so that the effect of machine learning is not affected by manual labeling and similar processes.
In the present embodiment, expressions such as "first", "second" and "third" have no special limiting meaning. They are used only to distinguish different individuals among objects of the same kind and should not be interpreted as implying an order or any other special limitation.
Example 2:
This embodiment is based on the method described in embodiment 1 and, in combination with a specific application scenario, describes the implementation process of the invention in that scenario using the technical terms of the scenario.
In this embodiment, machine learning is used to learn, train, predict and evaluate the network health state of the ODU (Optical Channel Data Unit) layer of an OTN (Optical Transport Network). The network performance index of interest for the machine learning is bit errors, and the network health states set for the machine learning are health, sub-health and degradation.
In this embodiment, to facilitate execution by a computer program, health is defined as 0, sub-health as 1, and degradation as -1, by enumeration or otherwise.
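One possible way to express this encoding is a Python enumeration. The class and member names below are assumptions chosen for illustration; only the three numeric values come from this embodiment.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """Network health states with the numeric values used in this embodiment."""
    HEALTHY = 0
    SUB_HEALTHY = 1
    DEGRADED = -1


# The members compare equal to their integer values and can be looked up by value.
print(HealthState(-1).name, int(HealthState.SUB_HEALTHY))
```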
As shown in fig. 5, the network layer architecture of the OTN comprises the OTS layer (Optical Transmission Section layer), the OMS layer (Optical Multiplexing Section layer), the OCH layer (Optical Channel layer) and the client layer. The OCH layer further comprises the OCH (optical channel), the OTU layer (optical channel transport unit), the ODU layer (optical channel data unit) and the OPU layer (optical channel payload unit). Data is transferred from the lower layers to the upper layers, that is, data may pass through the OTS layer, the OMS layer and the OCH layer before it reaches the ODU layer.
With a preset period of 15 minutes between network performance data collections, a first preset number of 3, a second preset number of 5 and a preset number of data pieces of 8, the data set required for the machine learning is generated as shown in fig. 6, in the following specific steps:
in step 501, every 15 minutes, the data items related to ODU-layer bit errors in the four network layers, namely the ODU layer, the OTS layer, the OMS layer and the OCH layer, are collected from the network management system and combined into one piece of network performance data for storage, and the alarm items in the four network layers are collected and combined as alarm information. The data may be stored in a file, a cache or a database. The pieces of network performance data collected and stored from t0 to tm are shown in fig. 7, and each piece contains n data items. The pieces of alarm information stored from t0 to tm are shown in fig. 8, and each piece contains one or more alarm items. The "alarm item n" in fig. 8 and the "data item n" in fig. 7 do not mean that the number of alarm items equals the number of data items; they only indicate that the network performance data may contain a plurality of data items and the alarm information may contain a plurality of alarm items.
In step 502, whether an alarm exists in the corresponding preset period is judged from the alarm information. If so, the corresponding network performance data is deleted, and the set formed by the remaining stored network performance data is the first data set. An alarm item generally uses 0 to indicate that the corresponding alarm is absent and a non-zero value to indicate that it is present; therefore, since an alarm exists at t1 in fig. 8, the network performance data at t1 should be removed.
In step 503, ES and SES in the first data set are selected as characteristic information. The corresponding characteristic health states are obtained from their respective preset range intervals, the corresponding ranges of network health states are obtained from those characteristic health states, and the network health state common to the ranges for ES and SES is selected for labeling.
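The alarm-based cleaning of step 502 can be sketched as below. The function name `primary_clean`, the dictionary layout keyed by collection period, and the sample values are all assumptions for illustration; the sketch also assumes that a period with no stored alarm information is treated as having no alarm.

```python
from typing import Dict, List


def primary_clean(performance: Dict[str, Dict[str, float]],
                  alarms: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Keep only the periods in which every alarm item is 0 (no alarm present)."""
    return {
        period: record
        for period, record in performance.items()
        # A non-zero alarm item means the corresponding alarm exists in this period.
        if not any(item != 0 for item in alarms.get(period, []))
    }


# Hypothetical data for three collection periods; an alarm exists at t1.
performance = {
    "t0": {"ES": 0, "SES": 0},
    "t1": {"ES": 12, "SES": 4},
    "t2": {"ES": 1, "SES": 0},
}
alarms = {"t0": [0, 0], "t1": [0, 3], "t2": [0, 0]}
first_data_set = primary_clean(performance, alarms)
print(list(first_data_set))
```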
A first interval and a second interval are preset for ES and SES respectively. When the value of ES is larger than the maximum value of the first interval, or the value of SES is larger than the maximum value of the second interval, the corresponding characteristic health state is degradation, that is, the value of the characteristic health state is -1. When the value of ES is smaller than the minimum value of the first interval, or the value of SES is smaller than the minimum value of the second interval, the corresponding characteristic health state is health, that is, the value of the characteristic health state is 0. When the value of ES is greater than or equal to the minimum value and less than or equal to the maximum value of the first interval, or the value of SES is greater than or equal to the minimum value and less than or equal to the maximum value of the second interval, the corresponding characteristic health state is sub-health, that is, the value of the characteristic health state is 1.
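The mapping from a data item value to a characteristic health state can be sketched as a small threshold function. The function name and the interval bounds in the example are hypothetical. The sketch assumes that a value below the interval minimum maps to health (0), consistent with the encoding health = 0, sub-health = 1, degradation = -1 defined for this embodiment.

```python
def feature_health_state(value: float, interval_min: float, interval_max: float) -> int:
    """Map one data item value to a characteristic health state (0, 1 or -1)."""
    if value > interval_max:
        return -1  # above the interval: degradation
    if value < interval_min:
        return 0   # below the interval: health (assumed reading)
    return 1       # inside the interval, bounds included: sub-health


# Hypothetical first interval [1, 10] for ES.
for es_value in (0, 1, 5, 10, 11):
    print(es_value, feature_health_state(es_value, 1, 10))
```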
For SES, it is defined that when the corresponding characteristic health state is 0, the selectable range of the network health state is 0 or 1, and when it is 1 or -1, the selectable range is -1. For ES, it is defined that when the corresponding characteristic health state is 0, the selectable range of the network health state is 0 or -1, and when it is 1 or -1, the selectable range is 1 or -1. For example, when in one piece of network performance data the characteristic health state of ES is -1 and that of SES is 1, the corresponding network health state is -1, namely the degradation state.
The labeling may add the network health state to the network performance data as an additional item of information, or may store the labeled network health state in another way, such as a mapping.
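The range definitions above can be checked with a short sketch that intersects the two ranges. The dictionary and function names are assumptions for illustration; the range contents are taken directly from the definitions for SES and ES given above.

```python
# Selectable network health state ranges, keyed by characteristic health state.
SES_RANGES = {0: {0, 1}, 1: {-1}, -1: {-1}}
ES_RANGES = {0: {0, -1}, 1: {1, -1}, -1: {1, -1}}


def network_health_state(es_state: int, ses_state: int) -> int:
    """Return the network health state common to the ES range and the SES range."""
    common = ES_RANGES[es_state] & SES_RANGES[ses_state]
    if len(common) != 1:
        # Defensive check; with the ranges above this never triggers.
        raise ValueError(f"no unique common state for ES={es_state}, SES={ses_state}")
    return next(iter(common))


# The example from the text: ES state -1 and SES state 1 give degradation.
print(network_health_state(-1, 1))
```

With these ranges, every combination of ES and SES characteristic health states yields exactly one common network health state, so the labeling is unambiguous.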
In step 504, the network performance data in the first data set is accessed in sequence. A variable old records the network health state of the previously accessed network performance data; old is compared with the network health state of the currently accessed network performance data, and if the two differ, the current network performance data is marked. The mark only distinguishes this data from other network performance data and carries no other information.
In step 505, the first data set is divided into intervals of 8 pieces of network performance data, 8 being the preset number of data pieces. The number of marked pieces of network performance data in each interval is counted, and if it exceeds 5, all network performance data in the interval is deleted.
In step 506, the marked network performance data is accessed in sequence, starting from the first marked piece. Let the currently accessed piece be the i-th piece of network performance data in the first data set; it is judged whether any marked network performance data exists among the (i+1)-th to (i+3)-th pieces, and if so, the i-th piece is removed. This continues until the last marked piece has been accessed. The set formed by the remaining stored network performance data is the second data set, which is used for machine learning and is shown in fig. 9.
In the present embodiment, expressions such as "first", "second" and "third" have no special limiting meaning. They are used only to distinguish different individuals among objects of the same kind and should not be interpreted as implying an order or any other special limitation.
Example 3:
Fig. 10 is a schematic diagram of the architecture of a device for collecting and labeling network health state data according to an embodiment of the invention. The device of the present embodiment includes one or more processors 21 and a memory 22. In fig. 10, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or in another manner; connection by a bus is taken as an example in fig. 10.
As a non-volatile computer-readable storage medium, the memory 22 can store non-volatile software programs and non-volatile computer-executable programs, such as the method for collecting and labeling network health state data in embodiment 1. The processor 21 performs the method by running the non-volatile software programs and instructions stored in the memory 22.
The memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 22 may optionally include memory located remotely from processor 21, which may be connected to processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and when executed by the one or more processors 21 perform the method of network health status data acquisition labeling of embodiments 1 and 2 described above, for example, performing the steps shown in fig. 1-4 and 6 described above.
It should be noted that, because the content of information interaction and execution process between modules and units in the above-mentioned device and system is based on the same concept as the processing method embodiment of the present invention, specific content may be referred to the description in the method embodiment of the present invention, and will not be repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A method for collecting and labeling network health status data, comprising the following steps:
collecting network performance data and alarm information, performing a primary cleaning of the network performance data according to the alarm information, and removing the network performance data in a network fault state to form a first data set;
selecting one or more data items in the first data set as one or more pieces of characteristic information, and marking the health state of the network for each piece of network performance data in the first data set according to the characteristic information;
performing secondary cleaning on the first data set according to the network health state, and removing network performance data when a network sporadic event exists to form a second data set for machine learning;
labeling the network health state for each piece of network performance data in the first data set according to the characteristic information, specifically including:
presetting a range interval for each piece of characteristic information, and obtaining the network health state of the network performance data according to one or more characteristic health states corresponding to the network performance data, wherein the characteristic health states correspond to the characteristic information when the values of the characteristic information are in different positions of the range interval;
the second cleaning of the first data set according to the network health status specifically includes:
finding first network performance data in the first data set, the first network performance data being network performance data whose network health state differs from that of the previous piece of network performance data;
and taking each piece of first network performance data in the first data set as target data, and removing the target data from the first data set if further first network performance data exists among a first preset number of pieces of network performance data following the target data.
2. The method for collecting and labeling network health status data according to claim 1, wherein the collecting network performance data and alarm information specifically comprises:
according to network performance indexes to be evaluated by machine learning, acquiring one or more first data items related to the network performance indexes in a first network layer where the network performance indexes are located, acquiring one or more second data items related to the network performance indexes in one or more second network layers related to the first network layer, and acquiring alarm information related to the network performance indexes in the first network layer and the second network layer every interval of preset period;
and merging all the first data items and all the second data items acquired in each preset period to serve as corresponding network performance data in the preset period.
3. The method for collecting and labeling network health status data according to claim 2, wherein the cleaning the network performance data according to the alarm information comprises:
judging whether an alarm exists in the corresponding preset period according to the alarm information, and if so, removing network performance data corresponding to the preset period.
4. The method for collecting and labeling network health status data according to claim 1, wherein when no hierarchical relationship exists between the selected plurality of feature information, the method for obtaining the network health status of the network performance data according to the one or more feature health statuses corresponding to the network performance data specifically comprises:
and setting different duty ratios for each piece of characteristic information, calculating the total duty ratio of each characteristic health state in a plurality of characteristic health states corresponding to the network performance data, and taking the characteristic health state with the highest total duty ratio as the network health state corresponding to the network performance data.
5. The method for collecting and labeling network health status data according to claim 1, wherein when a hierarchical relationship exists between the selected plurality of feature information, the method for obtaining the network health status of the network performance data according to the one or more feature health statuses corresponding to the network performance data specifically comprises:
and determining a corresponding network health state range according to the characteristic health state corresponding to each piece of characteristic information, and selecting a common network health state from the plurality of network health state ranges corresponding to the plurality of characteristic information as the network health state corresponding to the network performance data.
6. The method of network health data acquisition labeling of claim 1, wherein prior to targeting each piece of first network performance data in the first dataset, the method further comprises:
dividing the first data set into one or more intervals by taking the preset number of data as the interval size, and removing all network performance data in the interval from the first data set if the number of the first network performance data in the interval exceeds the second preset number.
7. The method for collecting and labeling network health status data according to any one of claims 1-6, wherein the network health status comprises: health, sub-health and deterioration.
8. A device for collecting and labeling network health status data, the device comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of network health status data acquisition labeling of any of claims 1-7.
CN202210299221.8A 2022-03-25 2022-03-25 Method and device for collecting and labeling network health state data Active CN115037643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210299221.8A CN115037643B (en) 2022-03-25 2022-03-25 Method and device for collecting and labeling network health state data

Publications (2)

Publication Number Publication Date
CN115037643A CN115037643A (en) 2022-09-09
CN115037643B true CN115037643B (en) 2023-05-30

Family

ID=83119586

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106209432A (en) * 2016-06-30 2016-12-07 中国人民解放军国防科学技术大学 Network equipment subhealth state method for early warning based on dynamic threshold and device
CN106992904A (en) * 2017-05-19 2017-07-28 湖南省起航嘉泰网络科技有限公司 Network equipment health degree appraisal procedure based on dynamic comprehensive weight
CN111131199A (en) * 2019-12-11 2020-05-08 中移(杭州)信息技术有限公司 Method, device, server and storage medium for controlling traffic cleaning of service attack
CN111355649A (en) * 2018-12-20 2020-06-30 阿里巴巴集团控股有限公司 Flow reinjection method, device and system
CN111641535A (en) * 2020-05-28 2020-09-08 中国工商银行股份有限公司 Network monitoring method, network monitoring device, electronic equipment and medium
CN111736566A (en) * 2019-03-25 2020-10-02 南京智能制造研究院有限公司 Remote equipment health prediction method based on machine learning and edge calculation
CN111934936A (en) * 2020-09-10 2020-11-13 广州虎牙科技有限公司 Network state detection method and device, electronic equipment and storage medium
CN112838960A (en) * 2019-11-22 2021-05-25 中兴通讯股份有限公司 Communication data cleaning method, device, network equipment and storage medium
CN113568900A (en) * 2021-02-06 2021-10-29 高云 Big data cleaning method based on artificial intelligence and cloud server
CN113660115A (en) * 2021-07-28 2021-11-16 上海纽盾科技股份有限公司 Network security data processing method, device and system based on alarm
CN113934720A (en) * 2021-10-18 2022-01-14 北京八分量信息科技有限公司 Data cleaning method and equipment and computer storage medium
CN114036711A (en) * 2021-09-18 2022-02-11 浪潮通信信息系统有限公司 Network quality degradation detection method and system
CN114049637A (en) * 2021-11-10 2022-02-15 重庆大学 Method and system for establishing target recognition model, electronic equipment and medium
CN114218402A (en) * 2021-12-17 2022-03-22 迈创企业管理服务股份有限公司 Method for recommending computer hardware fault replacement part

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10541903B2 (en) * 2015-10-02 2020-01-21 Futurewei Technologies, Inc. Methodology to improve the anomaly detection rate
US10484255B2 (en) * 2017-06-19 2019-11-19 Cisco Technology, Inc. Trustworthiness index computation in a network assurance system based on data source health monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant