WO2023179073A1 - OTN digital twin network generation method and system based on vertical federated learning - Google Patents

OTN digital twin network generation method and system based on vertical federated learning

Info

Publication number
WO2023179073A1
WO2023179073A1 (PCT/CN2022/134719, CN2022134719W)
Authority
WO
WIPO (PCT)
Prior art keywords
domain
root cause
fault root
otn
model
Prior art date
Application number
PCT/CN2022/134719
Other languages
English (en)
French (fr)
Inventor
王大江
黄卓垚
王其磊
肖红运
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023179073A1

Classifications

    • H04Q11/0062 — Network aspects of selecting arrangements for multiplex systems using optical switching
    • H04Q11/0067 — Provisions for optical access or distribution networks, e.g. Gigabit Ethernet Passive Optical Network (GE-PON), ATM-based Passive Optical Network (A-PON), PON-Ring
    • H04Q2011/0079 — Operation or maintenance aspects
    • H04Q2011/0083 — Testing; Monitoring
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/27 — Regression, e.g. linear or logistic regression
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/048 — Activation functions
    • G06N3/08 — Learning methods
    • G06N3/09 — Supervised learning
    • G06N3/098 — Distributed learning, e.g. federated learning
    • H04L41/0631 — Management of faults, events, alarms or notifications using root cause analysis
    • H04L63/0428 — Network security for providing a confidential data exchange wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L9/008 — Cryptographic mechanisms or cryptographic arrangements involving homomorphic encryption
    • H04L9/40 — Network security protocols

Definitions

  • embodiments of the present application provide a multi-domain orchestration system, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the OTN digital twin network generation method described in the second aspect.
  • Figure 5 is a flow chart of a method for constructing a single-domain training set based on fault root cause markers and related alarm information provided by an embodiment of the present application;
  • Figure 6 is a method flow chart of an iterative training process provided by an embodiment of the present application.
  • Figure 7 is a flow chart of the interaction between the single-domain management and control system and the multi-domain orchestration system in a single round of iterative training provided by an embodiment of the present application;
  • the functional model of the OTN DT network layer needs to be capable of global, whole-network analysis of the cross-domain OTN physical network, and sample data must be collected from each domain to train the functional model of the OTN DT network layer.
  • supervised learning is usually used to build a cross-domain OTN fault root cause identification algorithm model.
  • embodiments of this application provide an OTN digital twin network generation method and system, which uses vertical federated learning technology to construct training sample data for a cross-domain fault root cause identification model, solving data privacy issues and improving model generalization capabilities.
  • an embodiment of the present application provides a method for generating an OTN digital twin network.
  • the OTN digital twin network is mapped to an OTN multi-domain physical network system that includes multiple single-domain physical networks.
  • the OTN multi-domain physical network system also includes single-domain management and control systems and a multi-domain orchestration system.
  • each single-domain management and control system corresponds to a single-domain physical network.
  • the single-domain management and control system and the multi-domain orchestration system have cross-domain fault root cause identification models of the same structure.
  • the implementation of the method in the single-domain management and control system and in the multi-domain orchestration system of the OTN multi-domain physical network system is described in detail below in two parts:
  • Step S110: perform homomorphic encryption on the local fault root cause mark to obtain an encrypted fault root cause mark;
  • Step S120: receive all encrypted alarm sample sequences corresponding to the encrypted fault root cause mark, where each encrypted alarm sample sequence is obtained by the single-domain management and control system of the corresponding single domain homomorphically encrypting its relevant alarm information;
  • Step S130: generate a single-domain training set from the encrypted fault root cause mark and the encrypted alarm sample sequences;
  • Step S150: report the model parameter update amount to the multi-domain orchestration system, so that the multi-domain orchestration system can generate an OTN digital twin network based on the model parameter update amount and the topology information of each single-domain management and control system.
  • the main architecture of vertical federated learning includes three entities, namely entity A, entity B and coordinator C.
  • the samples of A and B come from the same data-side users: the sample IDs on the data side are the same, but the samples of the two parties have different feature dimensions.
  • the brief process of vertical federated learning includes:
  • Step 3: Party A and Party B each calculate the encrypted gradient and add an additional mask; Party B also calculates the encrypted loss. Party A and Party B then send the encrypted results to coordinator C.
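Step 3 above can be sketched as follows. This is a toy illustration only: a pairwise additive mask (+r at party A, −r at party B) stands in for the homomorphic encryption and masking of the real scheme, so that coordinator C learns only the aggregated gradient and never an individual party's gradient. All names are illustrative.

```python
import random

def mask_gradients(grad_a, grad_b, seed=0):
    """Parties A and B mask their local gradients with shared pairwise randomness."""
    rng = random.Random(seed)                      # shared seed plays the role of a pairwise key
    r = [rng.uniform(-1, 1) for _ in grad_a]
    masked_a = [g + m for g, m in zip(grad_a, r)]  # party A adds the mask
    masked_b = [g - m for g, m in zip(grad_b, r)]  # party B subtracts the same mask
    return masked_a, masked_b

def coordinator_aggregate(masked_a, masked_b):
    # the pairwise masks cancel on summation, leaving only the true gradient sum
    return [a + b for a, b in zip(masked_a, masked_b)]

grad_a, grad_b = [0.5, -0.2], [0.1, 0.3]
ma, mb = mask_gradients(grad_a, grad_b)
agg = coordinator_aggregate(ma, mb)                # elementwise grad_a + grad_b
```

A real deployment would replace the mask with homomorphic encryption (e.g. an additively homomorphic scheme), but the aggregation logic at C is the same.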
  • the solution for constructing a cross-domain DT network layer based on vertical federated learning in the embodiment of this application is an isomorphic cross-domain VFML DT modeling solution, which builds a cross-domain fault root cause identification model as the cross-domain OTN network DT case; the structure of the OTN multi-domain physical network is shown in Figure 2:
  • the OTN multi-domain physical network includes multiple single-domain physical networks; these single-domain physical networks can be built on the switching technology of the same vendor or on the switching technologies of different vendors, and each single-domain physical network is equipped with a single-domain management and control system (Operations & Maintenance Center, OMC).
  • Each network element node within a single domain does not have AI training capabilities.
  • Each single-domain management and control system has AI training modeling capabilities.
  • Multi-domain orchestration systems across vendors also have AI training modeling capabilities.
  • a cross-domain fault diagnosis and analysis case is used as a training sample: the fault root cause mark occurs in domain k, while the related input alarms inferred during model training are scattered across other domains; the fault root cause mark and the related alarm information of each domain all belong to the same cross-domain fault analysis case. The related alarms and root cause marks must therefore be collected from every domain to train the model, which matches the application scenario and training conditions required for vertical federated learning.
  • the multi-domain orchestration system functions as an edge server.
  • each single-domain management and control system reports the cross-domain fault root cause identification model parameter updates trained by its own AI algorithm to the multi-domain orchestration system via homomorphic encryption; the multi-domain orchestration system decrypts and aggregates the model parameter updates reported by all single-domain management and control systems, and at the same time updates the public cross-domain fault root cause identification model parameters it maintains.
  • the multi-domain orchestration system broadcasts the updated cross-domain fault root cause identification public model parameters to each single-domain management and control system.
  • Each single-domain management and control system uses the public model parameters to refresh the cross-domain fault root cause identification model parameters of its domain, and iteratively initiates the next round of model training and interaction with the multi-domain orchestration system based on this.
  • this solution can select the relevant OTN domains that participate in vertical federated learning training according to each batch of fault root cause training samples. For example, if the fault root cause marks of one batch relate to alarms in K OTN domains, the model training and updating for that batch occur between the multi-domain orchestrator and those K single-domain management and control systems; if the next batch of fault root cause marks relates to alarms in M OTN domains, the model training and updating for that batch occur between the multi-domain orchestrator and those M single-domain management and control systems. The single-domain management and control systems participating in the two batches may or may not overlap, but the multi-domain orchestration system always participates in the cross-domain fault root cause identification model training of every batch, which gives the finally trained public cross-domain fault root cause identification model stronger generalization ability and robustness.
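The per-batch participant selection described above can be sketched as a small set computation: each training batch involves only the domains holding that batch's root cause marks and related alarms, while the orchestrator joins every batch. Domain names and the sample dictionary layout are illustrative assumptions, not from the patent.

```python
def domains_for_batch(root_cause_samples):
    """Collect the set of single-domain OMCs that must join this training batch."""
    involved = set()
    for sample in root_cause_samples:
        involved.update(sample["alarm_domains"])    # domains holding related alarms
        involved.add(sample["root_cause_domain"])   # domain where the root cause mark lies
    return involved

# Two illustrative batches whose participating domains partially overlap.
batch1 = [{"root_cause_domain": "domain_k", "alarm_domains": {"domain_1", "domain_2"}}]
batch2 = [{"root_cause_domain": "domain_m", "alarm_domains": {"domain_2", "domain_3"}}]
participants1 = domains_for_batch(batch1)
participants2 = domains_for_batch(batch2)
```

The orchestrator itself is implicit here: it coordinates every batch regardless of which single-domain systems are selected.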
  • the multi-domain orchestration system will generate the entire network cross-domain OTN DT network layer based on the cross-domain fault root cause identification model obtained through final training, the encrypted network topology information reported by each domain, and other functional model information.
  • in the training, the RNN+Softmax algorithm models (RNN+Softmax is used as an example here) of each single-domain management and control system and of the multi-domain orchestration system, which plays the role of edge server, have the same structure: the input vector attributes and number of input vector parameters of the RNN+Softmax model, the number of layers, the number of neurons in each layer, the inter-layer activation functions, the connection relationships, the output vector attributes, the number of output vector parameters, and so on, which ensures unified training and synchronous refresh of the RNN+Softmax model parameters.
  • each fault root cause training sample of the supervised-learning RNN+Softmax model consists of the following two parts. Input: a cross-domain alarm sample sequence from time 0 to time t; output: the fault root cause mark.
  • this batch of model training involves K OTN domains; each domain has fault root cause marks, and each fault root cause mark has corresponding related fault alarm information in every domain.
  • assume the number of fault root cause marks in domain k is n_kL; then, for i_kL ∈ [1, n_kL], we have:
  • n_A is a positive integer.
  • the model input alarm sample vector at time t corresponding to each fault root cause mark is spliced from the associated alarms of all K domains; n_A therefore denotes the upper limit on the number of associated alarms each domain contributes to a spliced input alarm sample vector. Assuming the encrypted alarm sample sequences are all l-dimensional column vectors, if the number of associated alarms a certain domain can provide for the spliced alarm sample vector is less than n_A, the remaining elements are filled with zeros.
  • the OTN multi-domain fault root cause takes K*m values in total, which can represent the domain where the root-cause fault is located and the type of the fault root cause; these K*m values constitute its range of values.
  • x_ikL represents a fault root cause mark of domain k, and the corresponding alarm sample sequence element at time t represents the alarm sampling of the alarm sample sequence x_ikL at time t, expressed in vector form.
  • the vector has K*n_A alarm elements in total and is composed of the alarm elements of each domain; for example, its first n_A elements represent the associated alarm elements from domain 1. Due to the zero-padding process, x_ikL is also an l-dimensional vector.
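The splicing and zero-padding described above can be sketched directly: the model input at time t concatenates at most n_A alarm elements from each of the K domains, zero-filling any shortfall so the spliced vector always has K*n_A elements. The list-of-lists input layout is an illustrative assumption.

```python
def splice_alarm_vector(per_domain_alarms, n_a):
    """Splice the time-t input alarm sample vector from K domains' associated alarms.

    per_domain_alarms: list of K lists of alarm elements, each of length <= n_a.
    Returns a flat vector of K * n_a elements, zero-padded per domain.
    """
    vector = []
    for alarms in per_domain_alarms:
        assert len(alarms) <= n_a, "a domain contributes at most n_A associated alarms"
        vector.extend(alarms)
        vector.extend([0.0] * (n_a - len(alarms)))  # zero-fill the remaining elements
    return vector

# K = 3 domains, n_A = 4: domains 2 and 3 have fewer than n_A alarms at time t.
x_t = splice_alarm_vector([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [7.0]], n_a=4)
```

Here `len(x_t)` is 3 × 4 = 12, matching the K*n_A-dimensional (l-dimensional) vector described in the text.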
  • this solution treats the cross-domain fault root cause identification model constructed with RNN technology in the cross-OTN-domain DT case as a logistic regression model that solves a multi-classification problem.
  • the RNN model output adopts the form of Softmax, as follows:
  • the cost function of the OTN cross-domain fault root cause, obtained when the RNN model infers the n_kL associated alarm sample sequences x_ikL, can be expressed by the following formula:
  • the 1{·} operation takes the value 1 when the expression in the curly brackets is true and 0 when it is false.
  • θ = (θ_1, θ_2, θ_3, ..., θ_K*m) can be represented by a K*m-dimensional column vector, where θ_j represents the RNN model parameters related to the fault root cause value j.
  • the gradient of the objective function J(θ) with respect to the RNN model parameters θ can be expressed as follows, after which the training of the RNN model parameters θ can be completed through the gradient descent iterative algorithm:
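The multi-class logistic regression view above — a Softmax output over the K*m fault root cause values, a cross-entropy cost J(θ) built with the 1{·} indicator, and gradient descent on θ — can be sketched minimally. For simplicity θ is one plain weight vector per class and inputs are unencrypted feature vectors; this illustrates the mathematical form only, not the patented RNN model.

```python
import math

def softmax(scores):
    mx = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict(theta, x):
    # one linear score per fault-root-cause class, normalized by Softmax
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in theta])

def cost_and_grad(theta, samples):
    """samples: list of (x, y) with y the true class index (fault root cause value)."""
    n = len(samples)
    grad = [[0.0] * len(theta[0]) for _ in theta]
    j = 0.0
    for x, y in samples:
        p = predict(theta, x)
        j -= math.log(p[y]) / n                     # 1{y == j} selects the true class term
        for c in range(len(theta)):
            err = p[c] - (1.0 if c == y else 0.0)   # softmax gradient: p_c - 1{c == y}
            for d in range(len(x)):
                grad[c][d] += err * x[d] / n
    return j, grad

theta = [[0.0, 0.0], [0.0, 0.0]]                    # toy case: 2 classes, 2 features
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
j0, g = cost_and_grad(theta, samples)
theta = [[w - 0.5 * gw for w, gw in zip(row, grow)] for row, grow in zip(theta, g)]
j1, _ = cost_and_grad(theta, samples)               # one gradient step lowers the cost
```

With all-zero parameters the prediction is uniform, so the initial cost is ln 2; a single gradient-descent step reduces it, matching the iterative training described.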
  • the construction method can be achieved through the following steps:
  • Step S210: combine the encrypted alarm sample sequences to obtain complete encrypted alarm training samples;
  • Step S220: construct the single-domain training set based on the correspondence between the encrypted fault root cause mark and the complete encrypted alarm training samples.
  • the fault root cause mark of a certain single-domain management and control system and the relevant alarm information of all the other single-domain management and control systems are homomorphically encrypted and exchanged at the same time, yielding the encrypted fault root cause mark and the encrypted alarm sample sequences corresponding to it.
  • a single-domain training set is then constructed from these two sets of encrypted information; this single-domain training set belongs to the single domain in which the fault root cause mark is located.
  • Step S223: receive the second encrypted alarm sample vectors homomorphically encrypted by the other single domains, where the second alarm sample vectors corresponding to the second encrypted alarm sample vectors are related to the fault root cause mark;
  • Step S224: merge the first encrypted alarm sample vector and the second encrypted alarm sample vectors to obtain the encrypted alarm sample sequence of the current single domain;
  • Step S225: construct the single-domain training set based on the encrypted fault root cause mark provided by domain k and the encrypted alarm sample sequence.
  • the encrypted fault root cause mark is obtained by domain k homomorphically encrypting its fault root cause mark.
  • the part of the input alarm sample vector of the RNN model at time t that lies in domain 1 is homomorphically encrypted to obtain the first encrypted alarm sample vector (in this scheme, encryption is denoted by en() and decryption by dec()), and the encrypted alarm vector is sent to the other domains.
  • the expression of the first encrypted alarm sample vector is as follows:
  • the above two steps produce the sampled sequence for times 0 to t. Adjusting the sampling time and repeating the above two steps yields the encrypted input alarm sample vectors corresponding to the other times; collecting the encrypted input alarm sample vectors at all times finally gives the complete encrypted alarm sample sequence en(x_ikL(1)), as well as the encrypted fault root cause mark provided by domain k.
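The assembly of the complete encrypted alarm sample sequence described above can be sketched as follows. The en()/dec() pair follows the notation of this scheme, but here it is a toy additive mask standing in for real homomorphic encryption (an explicit assumption) purely so the sequence-assembly logic is runnable.

```python
MASK = 1000.0  # toy "key": NOT real cryptography, a stand-in for homomorphic encryption

def en(vector):
    """Toy encryption of one input alarm sample vector."""
    return [v + MASK for v in vector]

def dec(cipher):
    """Toy decryption, inverse of en()."""
    return [c - MASK for c in cipher]

def build_encrypted_sequence(per_time_vectors):
    """Encrypt the input alarm sample vector at each sampling time 0..t and
    collect them into the complete encrypted alarm sample sequence."""
    return [en(x) for x in per_time_vectors]

samples = [[0.1, 0.0], [0.2, 0.3], [0.0, 0.4]]   # sampling times 0, 1, 2
seq = build_encrypted_sequence(samples)
recovered = [dec(c) for c in seq]                 # round-trips to the originals
```

In the real scheme only the holder of the key can run dec(); each domain sees the other domains' alarm vectors solely in encrypted form.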
  • step S150 above may include the following steps:
  • Step S310: report the model parameter update amount to the multi-domain orchestration system;
  • Step S320: receive the public model parameters issued by the multi-domain orchestration system, where the public model parameters are obtained from the model parameter update amount and the initial public model parameters, and the initial public model parameters are issued by the multi-domain orchestration system to the single-domain management and control systems before iterative training;
  • Step S330: update the model parameters of the local cross-domain fault root cause identification model according to the public model parameters;
  • Step S340: iteratively train the cross-domain fault root cause identification model based on the model parameter update amount and the public model parameters until the cross-domain fault root cause identification model meets the end conditions, so that the multi-domain orchestration system can generate the OTN digital twin network based on the trained cross-domain fault root cause identification model and the topology information of each single-domain management and control system.
  • since the single-domain management and control system and the multi-domain orchestration system have cross-domain fault root cause identification models of the same structure, iterative training is carried out by transferring model parameters during the training process.
  • the single-domain management and control system calculates the gradient after each training round and sends the model parameter update amount to the multi-domain orchestration system.
  • the multi-domain orchestration system judges convergence based on the model parameter update amounts; if the model has not converged, new public model parameters are calculated from the model parameter update amounts and sent to the single-domain management and control systems, and so on iteratively until the parameters converge.
  • Step S410: the single-domain management and control system receives the updated public model parameters;
  • Step S420: the single-domain management and control system calculates the gradient of the updated public model parameters based on the cross-domain fault root cause identification model of its own single domain and performs homomorphic encryption;
  • Step S430: the single-domain management and control system determines the model parameter update amount of the cross-domain fault root cause identification model from the gradient calculation result and reports it to the multi-domain orchestration system;
  • Step S440: the multi-domain orchestration system updates the public model parameters according to the model parameter update amounts and delivers the public model parameters to the single-domain management and control systems.
  • the multi-domain orchestration system sends the public model parameter θ_p to each single-domain management and control system.
  • each single-domain management and control system performs a gradient calculation based on the public model parameter θ_p, obtains the model parameter update amount, and reports it to the multi-domain orchestration system, which checks the convergence condition:
  • θ_p+1 is the public model parameter used for the (p+1)-th iteration.
  • θ_p+1 continues to be issued to each single-domain management and control system for the (p+1)-th round of iteration.
  • the gradient of ⁇ p is calculated based on the cost function of the cross-domain fault root cause identification model of single domain 1 and homomorphic encryption is performed:
  • model parameter update amount is calculated, and single domain 1 homomorphically encrypts the model parameter update amount and reports it to the multi-domain orchestration system.
  • the model parameter update amount is calculated according to the following formula:
  • after the multi-domain orchestration system obtains the encrypted model parameter update amounts of all single domains, including single domain 1, the above formula for updating the public model parameters is used to obtain the public model parameters θ_p+1 of round p+1, which are sent to each single domain.
  • en(g_1) represents the encrypted model parameter update amount obtained by the gradient calculation
  • a is the learning rate
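The orchestrator-side round described above can be sketched as: decrypt each single domain's reported en(g_k), aggregate, apply a θ_p+1 = θ_p − a·g update, and check a simple convergence condition on the update magnitude. The additive en()/dec() pair is again a toy stand-in for homomorphic encryption, and averaging the domain gradients is an illustrative aggregation choice.

```python
MASK = 1000.0  # toy stand-in for a homomorphic-encryption key

def en(vec):
    return [v + MASK for v in vec]

def dec(vec):
    return [v - MASK for v in vec]

def orchestrator_round(theta_p, encrypted_updates, a=0.1, eps=1e-6):
    """One round at the multi-domain orchestration system.

    encrypted_updates: the en(g_k) reported by each participating single domain.
    Returns (theta_p_plus_1, converged).
    """
    grads = [dec(u) for u in encrypted_updates]
    g_agg = [sum(col) / len(grads) for col in zip(*grads)]  # aggregate over domains
    theta_next = [w - a * g for w, g in zip(theta_p, g_agg)]
    converged = max(abs(a * g) for g in g_agg) < eps        # negligible step => stop
    return theta_next, converged

theta_p = [0.5, -0.5]
updates = [en([0.2, 0.4]), en([0.0, 0.2])]                  # from two single domains
theta_next, done = orchestrator_round(theta_p, updates)
```

If not converged, θ_p+1 is broadcast back to each single-domain management and control system for the next iteration, exactly as the text describes.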
  • the method includes but is not limited to the following step S510:
  • Step S510: receive the model parameter update amounts generated by the single-domain management and control systems, and generate an OTN digital twin network based on the model parameter update amounts and the topology information of each single-domain physical network;
  • the model parameter update amount is obtained by the single-domain management and control system training the cross-domain fault root cause identification model of the corresponding single domain with the single-domain training set.
  • the single-domain training set is generated by the single-domain management and control system from the encrypted fault root cause mark and the encrypted alarm sample sequence corresponding to the encrypted fault root cause mark.
  • the encrypted fault root cause mark is obtained by the single-domain management and control system homomorphically encrypting the fault root cause mark of its own single domain.
  • the encrypted alarm sample sequence is obtained by the single-domain management and control system homomorphically encrypting the relevant alarm information of its own single domain.
  • the multi-domain orchestration system performs iterative training based on the model parameter update amounts uploaded by the single-domain management and control systems and, based on the results of the iterative training, combines the topology information of each single-domain physical network to generate the OTN digital twin network.
  • in step S510, the iterative training that generates the OTN digital twin network may include the following steps:
  • Step S511: generate public model parameters based on the model parameter update amount and the initial public model parameters, and issue the public model parameters, where the initial public model parameters are issued by the multi-domain orchestration system to the single-domain management and control systems before iterative training;
  • Step S512: iteratively train the cross-domain fault root cause identification model based on the model parameter update amount and the public model parameters until the cross-domain fault root cause identification model meets the end conditions;
  • the end conditions of the iterative training of the multi-domain orchestration system are the same as the end conditions of the iterative training of the single-domain management and control system, and will not be repeated here.
  • the data of each single-domain physical network is homomorphically encrypted and exchanged based on vertical federated learning technology, and a single-domain training set is constructed from the fault root cause mark and the relevant alarm information corresponding to the fault root cause mark.
  • the cross-domain fault root cause identification model is trained with the single-domain training set.
  • during training, convergence is judged based on the public model parameters and the model parameter update amounts, and the OTN digital twin network is generated from the trained cross-domain fault root cause identification model and the topology information of each single-domain physical network.
  • the embodiments of this application meet the privacy protection requirements of alarm information, user business data and other related data of each single domain in a multi-domain network.
  • this application applies to the case where the multi-domain orchestration system acts as an edge server in the OTN multi-domain physical network.
  • the method of the embodiments of this application can also use the computing power of edge devices to perform parallel training, improving model training efficiency.


Abstract

This application discloses an OTN digital twin network generation method and system based on vertical federated learning. The data of each single-domain physical network is homomorphically encrypted through vertical federated learning before being exchanged; a single-domain training set is constructed from the fault root cause mark and the related alarm information corresponding to the fault root cause mark; the cross-domain fault root cause identification model is trained with the single-domain training set; during training, convergence is judged based on the public model parameters and the model parameter update amounts; and the OTN digital twin network is then generated from the trained cross-domain fault root cause identification model and the topology information of each single-domain physical network.

Description

OTN digital twin network generation method and system based on vertical federated learning
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese patent application No. 202210286945.9 filed on March 23, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of digital twin technology applications, and in particular to an OTN digital twin network generation method and system based on vertical federated learning.
背景技术
基于数字孪生可以进行监测、分析、预测、诊断、训练、仿真,并将仿真结果反馈给物理对象,从而帮助对物理对象进行优化和决策。涉及数字孪生模型构造、数字模型状态的实时更新、基于数字孪生的仿真分析和控制决策等相关技术则可统称为DT(Digital Twin,数字孪生)技术。
在电信领域应用DT技术可以实现对整网OTN(Optical Transport Network,光传送网)的分析能力。DT网络层是对整个跨域OTN物理网络的模型抽象,DT网元间通信不受物理网络空间限制、对DT网元的可视性与可操作性也不受物理网络分域等空间限制、管控限制,因此作为跨域OTN物理网络的OTN DT网络层,需要该OTN DT网络层的功能模型具备对跨域OTN物理网络的整网全局分析能力,并从各域采集样本数据进行该OTN DT网络层功能模型的训练。
一般而言,整个跨域OTN物理网络中各个单域具有大量训练数据的收集、处理,并且各单域有告警、网络拓扑、用户业务等相关数据信息的隐私保护诉求,不宜全部开放并集中上报,因此如何让各个单域管控系统收集到完整的跨域故障根因识别模型的训练样本数据并完成训练、增强模型推理的泛化能力,并保护各个单域的数据隐私,成为对跨域OTN网络的数字孪生功能模型建模的瓶颈,亟待解决。
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of this application provide an OTN digital twin network generation method and system based on vertical federated learning.
In a first aspect, an embodiment of this application provides an OTN digital twin network generation method based on vertical federated learning, applied to any single-domain management and control system in an OTN multi-domain physical network system. The OTN multi-domain physical network system further includes a multi-domain orchestration system, and the multi-domain orchestration system and the single-domain management and control systems have cross-domain fault root cause identification models of identical structure. The method includes: homomorphically encrypting local fault root cause labels to obtain encrypted fault root cause labels; receiving all encrypted alarm sample sequences corresponding to the encrypted fault root cause labels, each encrypted alarm sample sequence obtained by the single-domain management and control system of the corresponding domain homomorphically encrypting its related alarm information; generating a single-domain training set from the encrypted fault root cause labels and the encrypted alarm sample sequences; training the local cross-domain fault root cause identification model on the single-domain training set to obtain the model parameter update of the cross-domain fault root cause identification model; and reporting the model parameter update to the multi-domain orchestration system, so that the multi-domain orchestration system generates the OTN digital twin network based on the model parameter update and the topology information of each single-domain management and control system.
In a second aspect, an embodiment of this application provides an OTN digital twin network generation method based on vertical federated learning, applied to the multi-domain orchestration system in an OTN multi-domain physical network system. The OTN multi-domain physical network system further includes single-domain management and control systems, and the multi-domain orchestration system and the single-domain management and control systems have cross-domain fault root cause identification models of identical structure. The method includes: receiving the model parameter updates generated by the single-domain management and control systems, and generating the OTN digital twin network based on the model parameter updates and the topology information of each single-domain physical network; where each model parameter update is obtained by a single-domain management and control system training the cross-domain fault root cause identification model of its domain on a single-domain training set, the single-domain training set is generated by that system from encrypted fault root cause labels and the corresponding encrypted alarm sample sequences, the encrypted fault root cause labels are obtained by homomorphically encrypting the fault root cause labels of its domain, and the encrypted alarm sample sequences are obtained by homomorphically encrypting the related alarm information of its domain.
In a third aspect, an embodiment of this application provides a single-domain management and control system, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the OTN digital twin network generation method of the first aspect.
In a fourth aspect, an embodiment of this application provides a multi-domain orchestration system, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the OTN digital twin network generation method of the second aspect.
In a fifth aspect, an embodiment of this application provides an OTN multi-domain physical network system, including the single-domain management and control system of the third aspect and the multi-domain orchestration system of the fourth aspect, the multi-domain orchestration system being connected to the single-domain management and control system.
Other features and advantages of this application will be set forth in the following description, and will in part become apparent from the description or be understood by practicing this application. The objectives and other advantages of this application can be realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
Brief Description of the Drawings
The drawings are provided for a further understanding of the technical solution of this application and constitute a part of the description; together with the examples of this application they serve to explain the technical solution and do not limit it.
Fig. 1 is an overall flowchart of the OTN digital twin network generation method performed by a single-domain management and control system according to an embodiment of this application;
Fig. 2 is an architecture diagram of an OTN cross-domain physical network according to an embodiment of this application;
Fig. 3 is an architecture diagram of the cross-domain fault root cause identification model according to an embodiment of this application;
Fig. 4 is a flowchart of a method for building a single-domain training set according to an embodiment of this application;
Fig. 5 is a flowchart of a method for building a single-domain training set from fault root cause labels and related alarm information according to an embodiment of this application;
Fig. 6 is a flowchart of the iterative training process according to an embodiment of this application;
Fig. 7 is a flowchart of a single training iteration between the single-domain management and control system and the multi-domain orchestration system according to an embodiment of this application;
Fig. 8 is an overall flowchart of the OTN digital twin network generation method performed by the multi-domain orchestration system according to an embodiment of this application;
Fig. 9 is a flowchart of the iterative training process according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
The global communications industry is moving from the era of connectivity and the cloud era into the era of intelligence, and new opportunities and challenges are driving an accelerated, comprehensive network transformation and upgrade. Against this background, TM Forum proposed the concept of AN (Autonomous Networks) in 2019, and after more than two years the AN concept has reached industry consensus. It aims, through the integration of network technology and digital technology, to enable the digital transformation of operator networks, to provide vertical industries and consumers with innovative ICT services and user experience featuring zero wait, zero touch and zero trouble, and to build self-configuring, self-healing and self-optimizing network capabilities across the whole lifecycle of operator network operation. At the same time, the emerging digital twin technology has entered a period of rapid development and is widely applied in manufacturing, supply chains and other fields to support industrial digital transformation. Digital transformation in the telecommunications field likewise has a strong demand for digital twin technology, which the industry regards as the foundation of digital transformation and as an important support and component for realizing the AN autonomous network architecture and its technologies. By accurately sensing its own state and that of the external environment, a network digital twin establishes a digital mirror of the internal and external environment and integrates capabilities such as simulation, prevention and prediction, playing a key enabling role in scenarios such as low-cost trial and error, intelligent decision-making and predictive maintenance.
When applied to operator communication network scenarios under the AN autonomous network architecture, an OTN DT network model needs to be built in the complex OTN multi-domain network environment to generate an OTN DT network layer, so that DT technology can provide analysis capability over the entire OTN. The DT network layer is a model abstraction of the whole cross-domain OTN physical network: communication between DT network elements is not limited by physical network space, and the visibility and operability of DT network elements are not limited by physical-network domain partitioning or management constraints. The functional model of the OTN DT network layer must therefore be capable of global, whole-network analysis of the cross-domain OTN physical network, and sample data must be collected from each domain to train it. Taking the construction of a cross-domain OTN fault root cause identification functional model — a perception-class algorithm model of the cross-domain OTN DT network layer — as an example, supervised learning is usually used to build the cross-domain OTN fault root cause identification algorithm model. A complete training sample consists of an input part (alarms from each domain), the fault root cause value output by an RNN (Recurrent Neural Network), and the root cause label provided by the domain where the fault root cause lies. The collection of training samples for cross-domain fault root cause identification is further complicated by the following:
a. The uncertainty of where a fault root cause occurs — the fault root causes of different training samples come from different domains: the true root cause of one training sample may occur in domain A, with the fault root cause label provided by domain A, while that of the next training sample occurs in domain B, with the label provided by domain B;
b. The alarms in the input part and the fault root cause label of the same training sample come from different domains. A single domain must obtain the related alarm information and fault root cause labels provided by the other domains to assemble a complete training sample; otherwise it cannot train its own RNN model.
Therefore, for training a cross-domain OTN fault root cause identification model, the management and control system of a single OTN domain cannot collect complete training samples and finish training within its own domain alone. If, instead, unified model training were performed by the multi-domain orchestration system that coordinates all single-domain management and control systems, the following problems would arise:
1. Collecting and processing the large amount of training data of each single domain increases the computational load on the multi-domain orchestration system and is inconsistent with its functional positioning;
2. Each single domain has privacy-protection requirements for alarms, network topology, user services and other related data, which should not be fully opened up and reported centrally to the multi-domain orchestration system.
In short, how to let each single-domain management and control system collect complete training sample data for the cross-domain fault root cause identification model, complete the training, strengthen the generalization ability of model inference, and protect the data privacy of each domain has become a bottleneck in modeling the DT case functional model of a cross-domain OTN network and urgently needs to be solved.
On this basis, embodiments of this application provide an OTN digital twin network generation method and system that use vertical federated learning to construct the training sample data of the cross-domain fault root cause identification model, solving the data-privacy problem and improving the generalization ability of the model.
Referring to Fig. 1, an embodiment of this application provides an OTN digital twin network generation method. The OTN digital twin network maps an OTN multi-domain physical network system that includes multiple single-domain physical networks; the OTN multi-domain physical network system further includes single-domain management and control systems and a multi-domain orchestration system, with one single-domain management and control system per single-domain physical network, and the single-domain management and control systems and the multi-domain orchestration system have cross-domain fault root cause identification models of identical structure. The method steps performed by a single-domain management and control system and by the multi-domain orchestration system in the OTN multi-domain physical network system are described in two parts below.
For any single-domain management and control system in the OTN multi-domain physical network system, the method includes but is not limited to the following steps S110, S120, S130, S140 and S150:
Step S110: homomorphically encrypt the local fault root cause labels to obtain encrypted fault root cause labels;
Step S120: receive all encrypted alarm sample sequences corresponding to the encrypted fault root cause labels, each encrypted alarm sample sequence obtained by the single-domain management and control system of the corresponding domain homomorphically encrypting its related alarm information;
Step S130: generate a single-domain training set from the encrypted fault root cause labels and the encrypted alarm sample sequences;
Step S140: train the local cross-domain fault root cause identification model on the single-domain training set to obtain the model parameter update of the cross-domain fault root cause identification model;
Step S150: report the model parameter update to the multi-domain orchestration system, so that the multi-domain orchestration system generates the OTN digital twin network based on the model parameter update and the topology information of each single-domain management and control system.
The main architecture of vertical federated learning involves three entities: party A, party B and coordinator C. In vertical federated learning scenarios, the samples of A and B come from the same set of data users — the sample IDs are the same, but the feature dimensions of the samples differ. The brief flow of vertical federated learning is as follows:
Step 1: coordinator C creates a key pair and sends the public key to parties A and B.
Step 2: A and B encrypt and exchange intermediate results, which are used to help compute gradients and loss values.
Step 3: A and B compute encrypted gradients and each adds an additional mask; B also computes the encrypted loss. A and B send the encrypted results to C.
Step 4: coordinator C decrypts the gradient and loss information and sends the results back to A and B, which remove the masks from the gradient information and update their model parameters accordingly.
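The key-pair creation and homomorphic encrypt/exchange steps above can be sketched with a toy Paillier cryptosystem — a common additively homomorphic scheme. The description does not name a specific scheme, so this is only an illustrative assumption, and the tiny primes used here are for demonstration (real deployments use moduli of 2048 bits or more):

```python
import random
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def keygen(p=293, q=433):
    """Toy Paillier key pair; p and q are tiny primes for illustration only."""
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                 # standard simplifying choice of generator
    mu = pow(lam, -1, n)      # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    r = random.randrange(2, n)
    while gcd(r, n) != 1:     # r must be coprime with n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def he_add(pk, c1, c2):
    """Ciphertext product decrypts to plaintext sum: en(m1)*en(m2) -> en(m1+m2)."""
    n, _ = pk
    return (c1 * c2) % (n * n)
```

For example, decrypting `he_add(pk, encrypt(pk, 7), encrypt(pk, 35))` recovers 42 without either party ever seeing the other's plaintext — the property that the encrypted exchange of intermediate results in step 2 relies on.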
Building on vertical federated learning, the scheme of the embodiments of this application for constructing a cross-domain DT network layer is a homogeneous cross-domain VFML DT modeling scheme. Taking the construction of the cross-domain fault root cause identification model of a cross-domain OTN network DT case as an example, the structure of the OTN multi-domain physical network is shown in Fig. 2: the OTN multi-domain physical network comprises multiple single-domain physical networks, which may be built on switching technology from the same vendor or from different vendors. Each single-domain physical network is provided with a single-domain management and control system (Operations & Maintenance Center, OMC), and all single-domain management and control systems connect to the multi-domain orchestration system, which is responsible for outputting the trained cross-domain fault root cause identification model and performing DT modeling based on the topology information of each single-domain physical network. The ideas and features of the embodiments of this application are as follows:
1. Network-element nodes within a single domain have no AI training capability; each single-domain management and control system has AI training and modeling capability, and so does the cross-vendor multi-domain orchestration system. A cross-domain fault diagnosis case serves as one training sample: the fault root cause label arises in domain k, while the related input alarms from which the model training infers that label are scattered across the other domains. Both the fault root cause label and the related alarm information of every domain belong to the same cross-domain fault analysis case, and the related alarms and root cause label of all domains must be gathered before the model can be trained — which matches the application scenario and training conditions required by vertical federated learning.
2. In this scheme the multi-domain orchestration system acts as an edge server. Each single-domain management and control system homomorphically encrypts the model parameter update of the cross-domain fault root cause identification model produced by its own AI training and reports it to the multi-domain orchestration system, which decrypts and aggregates the model parameter updates reported by all single-domain management and control systems and at the same time refreshes the public parameters of the cross-domain fault root cause identification model it maintains.
3. The multi-domain orchestration system broadcasts the updated public model parameters of the cross-domain fault root cause identification model to all single-domain management and control systems.
4. Each single-domain management and control system refreshes the parameters of its own cross-domain fault root cause identification model with the public model parameters, and on that basis launches the next round of model training and interaction with the multi-domain orchestration system.
5. If a communication failure occurs between some single-domain management and control system and the multi-domain orchestration system, or the node itself fails, so that that domain's management and control system cannot upload its model parameter gradient, this does not affect the multi-domain orchestration system's update of its own public model parameters or its interaction with the other single-domain management and control systems.
6. The scheme can select the relevant OTN domains to participate in vertical federated training according to the batch of fault root cause training samples. For example, if the fault root cause labels of the current batch relate to alarms from K OTN domains, then model training and updating for that batch takes place between the multi-domain orchestrator and those K single-domain management and control systems; if the fault root cause labels of the next batch relate to alarms from M OTN domains, then training and updating for that batch takes place between the multi-domain orchestrator and those M single-domain management and control systems. The single-domain management and control systems participating in two batches may or may not overlap, but the multi-domain orchestration system always participates in every batch of cross-domain fault root cause identification model training, which ensures that the final public cross-domain fault root cause identification model has stronger generalization ability and robustness.
7. The multi-domain orchestration system generates the whole-network cross-domain OTN DT network layer from the finally trained cross-domain fault root cause identification model, the encrypted network topology information reported by each domain, and other functional model information.
8. Because the domains are networked and built by different equipment manufacturers, each domain's management and control system must encrypt its model parameter update of the cross-domain fault root cause identification model before reporting it; likewise, the domain network topology information reported to the multi-domain orchestration system is encrypted as needed.
9. When training the public cross-domain fault root cause identification model with vertical federated learning, it must be ensured that the RNN+Softmax algorithm model (RNN+Softmax is taken as an example here) of every single-domain management and control system and that of the multi-domain orchestration system, which acts as the edge server during training, have the same structure: the vector attributes and number of vector parameters of the RNN+Softmax model input, the number of layers of the RNN+Softmax model, the number of neurons in each layer, the activation functions and connections between layers, the vector attributes of the output, the number of output vector parameters, and so on — so that the RNN+Softmax model parameters can be trained uniformly and refreshed synchronously.
Referring to Fig. 3, the modeling approach of the embodiments of this application is as follows:
Taking a cross-domain fault root cause identification model with an RNN+softmax structure as an example, for a given batch of training samples, each fault root cause training sample of the supervised RNN+Softmax model consists of two parts — input: a cross-domain alarm sample sequence from time 0 to time t; output: a fault root cause label. In the RNN+Softmax model training for cross-domain fault root cause identification, suppose K OTN domains participate in this batch of model training, each domain has fault root cause labels, and each domain has related fault alarm information corresponding to an individual fault root cause label. Let the number of fault root cause labels of domain k be n_kL; then for i_kL ∈ [1, n_kL]:

[formula image]

where n_A is a positive integer. The model input alarm sample vector at time t for each fault root cause label is formed by concatenating the related alarms of each of the K domains, so n_A is the upper bound on the number of related alarms each domain contributes to each concatenated input alarm sample vector. Assuming the encrypted alarm sample sequences are all l-dimensional column vectors, if the number of related alarms n_A that a domain can provide for concatenating the alarm sample vector satisfies n_A < l, the remaining elements are zero-padded.

[formula image]

is a scalar denoting the i_kL-th fault root cause label of domain k. With K OTN domains participating in the training batch and m fault root cause types per domain, the OTN multi-domain fault root causes take K*m values in total, which can express both the domain in which the root cause lies and the type of the fault root cause; this is the value range of the label.
x_{i_kL} denotes the alarm sample sequence corresponding to the fault root cause label [formula image] of domain k.
[formula image] denotes the alarm sample of the sequence x_{i_kL} at time t, expressed as a vector of K*n_A alarm elements composed of alarm elements from each domain. For example, [formula image] denotes a related alarm element from domain 1. Owing to zero-padding, x_{i_kL} is also an l-dimensional vector.
Referring to the cross-domain fault root cause identification model shown in Fig. 3, the input sample sequence is [formula image], and the softmax classification layer outputs the fault root cause y_ω(x_{i_kL}), the fault root cause inferred by the RNN from the i_kL-th alarm sample sequence, expressed as a (K*m)-dimensional vector of the probabilities of the various fault root causes. A fault root cause may be "fiber aging", "equipment undervoltage", "optical module failure", etc., and the corresponding fault alarm information may be "loss of frame (LOF)", "loss of signal (LOS)", "optical power degradation (PD)", etc. Here ω(ω_1, ω_2, ω_3, …, ω_{K*m}) denotes the RNN+softmax model parameter vector, and θ(θ_α, θ_β, b) denotes the RNN model parameter vector.
From the AI perspective, this scheme treats the cross-domain fault root cause identification model of the cross-OTN-domain DT case built with RNN technology as a logistic regression model solving a multi-class classification problem. With the RNN model output in Softmax form, we have:

P(y_{i_kL} = j | x_{i_kL}; ω) = exp(ω_j^T x_{i_kL}) / Σ_{l=1}^{K·m} exp(ω_l^T x_{i_kL})

The cost function of the OTN cross-domain fault root cause inferred by the RNN model from the n_kL related alarm sample sequences x_{i_kL} can be expressed as:

J(ω) = −(1/n_kL) Σ_{i_kL=1}^{n_kL} Σ_{j=1}^{K·m} 1{y_{i_kL} = j} · log[ exp(ω_j^T x_{i_kL}) / Σ_{l=1}^{K·m} exp(ω_l^T x_{i_kL}) ]

where the operator 1{·} takes the value 1 when the expression in braces is true and 0 when it is false.
ω(ω_1, ω_2, ω_3, …, ω_{K*m}) can be expressed as a (K*m)-dimensional column vector, with ω_j the RNN model parameters associated with the root cause value j.
The objective function for training the cross-domain fault root cause identification model of the OTN DT case is:

min_ω J(ω)

The gradient of the objective function J(ω) with respect to the RNN model parameters ω can be expressed as follows, after which training of ω can be completed by a gradient-descent iterative algorithm:

∇_{ω_j} J(ω) = −(1/n_kL) Σ_{i_kL=1}^{n_kL} x_{i_kL} · ( 1{y_{i_kL} = j} − P(y_{i_kL} = j | x_{i_kL}; ω) )

where ∇_{ω_j} J(ω) denotes the gradient of the objective function J(ω) with respect to the RNN model parameter ω_j.
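As a concrete sketch of the cost function and gradient above, the classifier can be treated as plain softmax regression over feature vectors — the RNN feature extraction is omitted here for brevity, so this is an illustrative simplification rather than the full model of the embodiments:

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def grad_J(W, X, y):
    """Gradient of the softmax cross-entropy cost J(w) with respect to W.

    X: (n, l) alarm feature vectors, y: (n,) root-cause labels in [0, C),
    W: (l, C) with one column w_j per root-cause value j.
    """
    n = X.shape[0]
    P = softmax(X @ W)                     # P[i, j] = P(y_i = j | x_i; w)
    Y = np.zeros_like(P)
    Y[np.arange(n), y] = 1.0               # the indicator 1{y_i = j}
    return -X.T @ (Y - P) / n              # matches the gradient formula above


def gd_step(W, X, y, lr=0.5):
    """One gradient-descent iteration on the model parameters."""
    return W - lr * grad_J(W, X, y)
```

Repeatedly applying `gd_step` on labeled alarm-feature data drives the parameter columns ω_j toward values that assign high probability to the true root cause of each sample.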
Referring to Fig. 4, the training set of the cross-domain fault root cause identification model can be built by the following steps:
Step S210: merge the encrypted alarm sample sequences to obtain complete encrypted alarm training samples;
Step S220: build the single-domain training set from the correspondence between the encrypted fault root cause labels and the complete encrypted alarm training samples.
Following the way parties A and B encrypt and exchange intermediate data in vertical federated learning, the fault root cause labels of one single-domain management and control system and the related alarm information of all the other single-domain management and control systems are homomorphically encrypted and exchanged, yielding encrypted fault root cause labels and the encrypted alarm sample sequences corresponding to them; the single-domain training set, which belongs to the domain where the fault root cause labels lie, is built from these two groups of encrypted information.
Referring to Fig. 5, let the domain of the single-domain management and control system holding the fault root cause labels be domain k. For any single-domain management and control system other than that of domain k, building the training set of its domain includes:
Step S221: determine, from the fault root cause labels of domain k, the first alarm sample vector related to the fault root cause labels;
Step S222: homomorphically encrypt the first alarm sample vector to obtain the first encrypted alarm sample vector, and send the first encrypted alarm sample vector to the other domains;
Step S223: receive the second encrypted alarm sample vectors homomorphically encrypted by the other domains, the second alarm sample vectors corresponding to them being related to the fault root cause labels;
Step S224: merge the first encrypted alarm sample vector and the second encrypted alarm sample vectors to obtain the encrypted alarm sample sequence of the current domain;
Step S225: build the single-domain training set from the encrypted fault root cause labels provided by domain k and the encrypted alarm sample sequence, the encrypted fault root cause labels being obtained by domain k homomorphically encrypting its fault root cause labels.
Referring to Fig. 2, take OTN single domain 1 of vendor 1 obtaining a complete encrypted training sample as an example, and suppose the fault root cause label of this sample is provided by domain k.
First, the domain-1 part of the RNN model's input alarm sample vector at time t corresponding to the fault root cause label [formula image] is homomorphically encrypted to obtain the first encrypted alarm sample vector (in this scheme the encryption operator is written en() and the decryption operator dec()), and the encrypted alarm vector is sent to all other domains.
The expression of the first encrypted alarm sample vector is:

[formula image]

where [formula image] denotes the domain-1 part of the RNN model's input alarm sample vector at time t corresponding to the fault root cause label [formula image]; the parts of the vector that do not belong to domain-1 alarm information are zero-padded.
Then, the homomorphically encrypted parts — each within its own domain — of the RNN model's input alarm sample vector at time t corresponding to the fault root cause label [formula image] are obtained from the other domains and merged according to the homomorphic merging formula f(En(m1), En(m2), …, En(mk)) = En(f(m1, m2, …, mk)):

[formula image]

The two steps above cover the sampled sequence from time 0 to time t. The sampling time is adjusted and the two steps are repeated (let the encrypted input alarm sample vectors corresponding to the other times be [formula image]); the encrypted input alarm sample vectors at all times are collected, and finally the complete encrypted alarm sample sequence en(x_{i_kL}(1)) is obtained, together with the encrypted fault root cause label [formula image] provided by domain k.
In addition, the other domains proceed similarly and each obtains a complete encrypted cross-domain fault root cause identification training sample, i.e. the second encrypted alarm sample vector; in the same way, each domain obtains all the encrypted cross-domain fault root cause identification training samples. Following the federated-learning algorithm, the other OTN domains that do not contribute input alarms and fault root cause labels to this group of n_kL samples (i.e. the domains outside the K participating domains) take part neither in the encrypted acquisition and exchange of the related training samples nor in the federated model training and parameter updating of the cross-domain fault root cause identification for this group of n_kL samples.
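The per-domain zero-padding and merging described above can be sketched as follows. This is a plaintext stand-in: in the real protocol each component is encrypted, and the elementwise sum is carried out on ciphertexts via the homomorphic merging formula f(En(m1), …, En(mk)) = En(f(m1, …, mk)):

```python
def domain_padded_vector(domain_idx, alarms, K, n_A):
    """Place one domain's related alarms at its own offset in the
    K*n_A-dimensional input alarm sample vector; every other entry,
    which belongs to another domain, is zero-padded."""
    v = [0.0] * (K * n_A)
    for i, a in enumerate(alarms[:n_A]):
        v[domain_idx * n_A + i] = a
    return v


def merge_vectors(vectors):
    """Elementwise sum of the padded per-domain vectors. Because each
    position is nonzero in at most one vector, the sum reassembles the
    complete cross-domain alarm sample vector."""
    return [sum(col) for col in zip(*vectors)]
```

For example, with K = 3 domains and n_A = 2 alarms per domain, merging the three padded vectors yields the full 6-dimensional input vector, with zeros where a domain supplied fewer than n_A related alarms.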
For the cross-domain fault root cause identification model above and the single-domain training set built as described, the training flow of the cross-domain fault root cause identification model in the cross-domain OTN case is presented below.
The overall vertical federated training flow between the RNN+Softmax cross-domain fault root cause identification model of the multi-domain orchestration system and the corresponding RNN models of the single-domain management and control systems is as follows:
Initialize the public model parameters ω_0 of the multi-domain orchestration system's cross-domain fault root cause identification model and set the iteration counter k = 0; collect the single-domain training set for cross-domain fault root cause identification whose fault root cause labels lie in single domain k, the set containing n_kL samples; train the cross-domain fault root cause identification models of the multi-domain orchestration system and of each single-domain management and control system on the n_kL samples, incrementing the iteration counter by 1 after each training round, and check whether k exceeds the limit K — if not, continue iterating; if it does, end training.
Referring to Fig. 6, the training process of step S150 above may include the following steps:
Step S310: report the model parameter update to the multi-domain orchestration system;
Step S320: receive the public model parameters issued by the multi-domain orchestration system, the public model parameters being obtained from the model parameter updates and the initial public model parameters, which the multi-domain orchestration system issues to the single-domain management and control systems before iterative training;
Step S330: update the model parameters of the local cross-domain fault root cause identification model with the public model parameters;
Step S340: iteratively train the cross-domain fault root cause identification model based on the model parameter updates and the public model parameters until the cross-domain fault root cause identification model meets the stopping condition, so that the multi-domain orchestration system generates the OTN digital twin network from the trained model and the topology information of each single-domain management and control system.
Because the single-domain management and control systems and the multi-domain orchestration system have cross-domain fault root cause identification models of identical structure, iterative training proceeds by passing model parameters: after each training round a single-domain management and control system computes the gradient and sends its model parameter update to the multi-domain orchestration system, which judges convergence from the model parameter updates; if not converged, it computes new public model parameters from the updates and issues them to the single-domain management and control systems, and so on iteratively until the parameters converge.
Referring to Fig. 7, based on the iterative training flow above, a single iteration of the embodiments of this application proceeds as follows:
Step S410: the single-domain management and control system receives the updated public model parameters;
Step S420: the single-domain management and control system computes, from its domain's cross-domain fault root cause identification model, the gradient with respect to the updated public model parameters and homomorphically encrypts it;
Step S430: determine the model parameter update of the cross-domain fault root cause identification model from the gradient computation and report the model parameter update to the multi-domain orchestration system;
Step S440: the multi-domain orchestration system updates the public model parameters from the model parameter updates and issues the public model parameters to the single-domain management and control systems.
For a single iteration: suppose the current iteration number is p and the current public model parameters of the multi-domain orchestration system are ω_p. The multi-domain orchestration system issues the public model parameters ω_p to each single-domain management and control system; each system computes gradients with respect to ω_p to obtain its model parameter update and reports the update to the multi-domain orchestration system, which checks the convergence condition:

[formula image]

If converged, iteration ends; if not converged, the public model parameters are updated according to:

[formula image]

and the iteration counter is incremented by 1, i.e. to p+1, where ω_{p+1} are the public model parameters for round p+1. ω_{p+1} is then issued to each single-domain management and control system for round p+1 of the iteration.
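The orchestrator's per-round logic can be sketched as below. Since the exact formulas appear only as images, this assumes that convergence is declared when the norm of the mean decrypted update falls below a threshold eps, and that the public parameters are otherwise refreshed by subtracting the mean update — both are illustrative assumptions consistent with the surrounding description:

```python
import math


def orchestrator_round(omega_p, decrypted_updates, eps=1e-4):
    """One orchestrator round: convergence check, then public-parameter refresh.

    decrypted_updates: list of per-domain updates dec(en(g_k)), one vector per
    participating domain. Returns (omega_{p+1}, converged).
    """
    K = len(decrypted_updates)
    mean = [sum(u[i] for u in decrypted_updates) / K
            for i in range(len(omega_p))]
    if math.sqrt(sum(m * m for m in mean)) < eps:
        return omega_p, True            # converged: end the iteration
    omega_next = [w - m for w, m in zip(omega_p, mean)]
    return omega_next, False            # broadcast omega_next for round p+1
```

A domain that fails to upload its update is simply left out of `decrypted_updates`, which matches the fault-tolerance behavior described in point 5 above.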
For round p+1 of the iteration, suppose ω_p has been issued to each single-domain management and control system. On this basis, the round-(p+1) vertical federated training interaction between the multi-domain orchestration system and a single-domain management and control system (taking single domain 1 as an example) is as follows:
First, the cost function of single domain 1's cross-domain fault root cause identification model is differentiated with respect to ω_p and the gradient is homomorphically encrypted:

[formula image]

Then, the model parameter update is computed, and single domain 1 homomorphically encrypts the model parameter update and reports it to the multi-domain orchestration system; the model parameter update is computed according to:

[formula image]

Finally, after obtaining the encrypted model parameter updates [formula image] of all single domains including single domain 1, the multi-domain orchestration system uses the public-model-parameter update formula above to compute the round-(p+1) public model parameters ω_{p+1} and issues them to each single domain.
Here en(g_1) denotes the encrypted model parameter update obtained from the gradient computation, [formula image] denotes the encrypted model parameter update of single domain 1 reported by its management and control system to the multi-domain orchestration system in round p+1, and a is the learning rate.
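The single-domain side of round p+1 can be sketched as follows. The scaling of the gradient by the learning rate a before encryption is an assumption, since the update formula is only given as an image; `grad_fn` and `encrypt_fn` are illustrative stand-ins for the local cost-function gradient and the homomorphic encryption routine:

```python
def domain_round(omega_p, grad_fn, encrypt_fn, a=0.1):
    """Compute the local gradient at the public parameters omega_p, scale it
    by the learning rate a into the model parameter update, and encrypt it
    componentwise for upload to the multi-domain orchestration system."""
    g = grad_fn(omega_p)                    # gradient of local cost at omega_p
    update = [a * gi for gi in g]           # the model parameter update amount
    return [encrypt_fn(u) for u in update]  # en(g_1) reported upstream
```

The returned ciphertext list is what the orchestration system decrypts and aggregates with the other domains' updates before broadcasting ω_{p+1}.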
Referring to Fig. 8, for the multi-domain orchestration system in the OTN multi-domain physical network system, the method includes but is not limited to the following step S510:
Step S510: receive the model parameter updates generated by the single-domain management and control systems, and generate the OTN digital twin network based on the model parameter updates and the topology information of each single-domain physical network;
where each model parameter update is obtained by a single-domain management and control system training the cross-domain fault root cause identification model of its domain on a single-domain training set; the single-domain training set is generated by the single-domain management and control system from encrypted fault root cause labels and the encrypted alarm sample sequences corresponding to them; the encrypted fault root cause labels are obtained by the single-domain management and control system homomorphically encrypting the fault root cause labels of its domain; and the encrypted alarm sample sequences are obtained by the single-domain management and control system homomorphically encrypting the related alarm information of its domain.
Likewise, referring to steps S110 to S150 above, the multi-domain orchestration system performs iterative training from the model parameter updates uploaded by the single-domain management and control systems and, based on the result of the iterative training combined with the topology information of each single-domain physical network, generates the OTN digital twin network.
Referring to Fig. 9, correspondingly, on the multi-domain orchestration system side, performing the iterative training of step S510 to generate the OTN digital twin network may include the following steps:
Step S511: generate public model parameters from the model parameter updates and the initial public model parameters, and issue the public model parameters, the initial public model parameters having been issued by the multi-domain orchestration system to the single-domain management and control systems before iterative training;
Step S512: iteratively train the cross-domain fault root cause identification model based on the model parameter updates and the public model parameters until the cross-domain fault root cause identification model meets the stopping condition;
Step S513: generate the OTN digital twin network from the trained cross-domain fault root cause identification model and the topology information of each single-domain management and control system.
The stopping condition of the multi-domain orchestration system's iterative training is the same as that of the single-domain management and control systems' iterative training described above and is not repeated here.
Through the scheme above, the data of each single-domain physical network is homomorphically encrypted based on vertical federated learning; a single-domain training set is built from the fault root cause labels and the related alarm information corresponding to them; the cross-domain fault root cause identification model is trained on the single-domain training set, with convergence judged during training from the public model parameters and the model parameter updates; the OTN digital twin network is then generated from the trained cross-domain fault root cause identification model and the topology information of each single-domain physical network. The embodiments of this application meet the privacy-protection requirements for alarm information, user service data and other related data of each single domain in a multi-domain network; moreover, when the multi-domain orchestration system acts as an edge server in the OTN multi-domain physical network, the method of the embodiments can also exploit the computing power of edge devices for parallel training, improving model-training efficiency.
It is worth noting that, besides OTN networks, the scheme of the embodiments of this application can also be applied to other homogeneous networks such as PTN, POTN and IP networks. Although the cross-domain fault identification models above are all described with the RNN+softmax structure as an example, in the field of neural network algorithms the RNN clearly has many variants or replaceable algorithms, and the classification layer may use classification algorithms other than softmax; these are not enumerated one by one here, and those skilled in the art can choose suitable algorithms to build the cross-domain fault identification model according to the actual situation.
An embodiment of this application further provides a single-domain management and control system, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the aforementioned single-domain-side OTN digital twin network generation method.
An embodiment of this application further provides a multi-domain orchestration system, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the aforementioned orchestration-side OTN digital twin network generation method.
An embodiment of this application further provides an OTN multi-domain physical network system, including the aforementioned single-domain management and control system and multi-domain orchestration system that perform the OTN digital twin network generation methods, the single-domain management and control system being connected to the multi-domain orchestration system for data interaction.
The OTN digital twin network generation method provided by the embodiments of this application has at least the following beneficial effects: the data of each single-domain physical network is homomorphically encrypted and exchanged through vertical federated learning; a single-domain training set is built from the fault root cause labels and the related alarm information corresponding to them; the cross-domain fault root cause identification model is trained on the single-domain training set, with convergence judged during training from the public model parameters and the model parameter updates; the OTN digital twin network is then generated from the trained cross-domain fault root cause identification model and the topology information of each single-domain physical network. The embodiments of this application meet the privacy-protection requirements for alarm information, user service data and other related data of each single domain in a multi-domain network; moreover, when the multi-domain orchestration system acts as an edge server in the OTN multi-domain physical network, the method of the embodiments can also exploit the computing power of edge devices for parallel training, improving model-training efficiency.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware and suitable combinations thereof. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The above describes several implementations of this application in detail, but this application is not limited to the embodiments above; those familiar with the art may also make various equivalent variations or substitutions without departing from the essence of this application, and these equivalent variations or substitutions are all included within the scope defined by the claims of this application.

Claims (15)

  1. An OTN (optical transport network) digital twin network generation method based on vertical federated learning, applied to any single-domain management and control system in an OTN multi-domain physical network system, the OTN multi-domain physical network system further comprising a multi-domain orchestration system, the multi-domain orchestration system and the single-domain management and control systems having cross-domain fault root cause identification models of identical structure; the method comprising:
    homomorphically encrypting local fault root cause labels to obtain encrypted fault root cause labels;
    receiving all encrypted alarm sample sequences corresponding to the encrypted fault root cause labels, each encrypted alarm sample sequence being obtained by the single-domain management and control system of the corresponding domain homomorphically encrypting related alarm information;
    generating a single-domain training set from the encrypted fault root cause labels and the encrypted alarm sample sequences;
    training the local cross-domain fault root cause identification model on the single-domain training set to obtain a model parameter update of the cross-domain fault root cause identification model; and
    reporting the model parameter update to the multi-domain orchestration system, for the multi-domain orchestration system to generate an OTN digital twin network based on the model parameter update and topology information of each single-domain management and control system.
  2. The OTN digital twin network generation method of claim 1, wherein each encrypted alarm sample sequence is an l-dimensional column vector, and in the encrypted alarm sample sequence the elements other than those of the related alarm information are zero-padded.
  3. The OTN digital twin network generation method of claim 2, wherein generating the single-domain training set from the encrypted fault root cause labels and the encrypted alarm sample sequences comprises:
    merging the encrypted alarm sample sequences to obtain complete encrypted alarm training samples; and
    building the single-domain training set from the correspondence between the encrypted fault root cause labels and the complete encrypted alarm training samples.
  4. The OTN digital twin network generation method of claim 1, wherein reporting the model parameter update to the multi-domain orchestration system, for the multi-domain orchestration system to generate the OTN digital twin network based on the model parameter update and the topology information of each single-domain management and control system, comprises:
    reporting the model parameter update to the multi-domain orchestration system;
    receiving public model parameters issued by the multi-domain orchestration system, the public model parameters being obtained from the model parameter update and initial public model parameters, the initial public model parameters being issued by the multi-domain orchestration system to the single-domain management and control systems before iterative training;
    updating model parameters of the local cross-domain fault root cause identification model with the public model parameters; and
    iteratively training the cross-domain fault root cause identification model based on the model parameter update and the public model parameters until the cross-domain fault root cause identification model meets a stopping condition, so that the multi-domain orchestration system generates the OTN digital twin network from the trained cross-domain fault root cause identification model and the topology information of each single-domain management and control system.
  5. The OTN digital twin network generation method of claim 4, wherein the stopping condition for iteratively training the cross-domain fault root cause identification model is:
    [formula image]
    where K denotes that there are K single domains, en() denotes the homomorphic encryption operator, dec() denotes the public-key decryption operator of the homomorphic encryption, and [formula image] denotes the model parameter update of the k-th single domain in the p-th iteration.
  6. The OTN digital twin network generation method of claim 5, wherein, when the iterative training of the cross-domain fault root cause identification model does not meet the stopping condition, the public model parameters are updated as:
    [formula image]
    where ω_{p+1} are the public model parameters for round p+1 of the iteration.
  7. The OTN digital twin network generation method of claim 1, further comprising, before reporting the topology information:
    encrypting the topology information.
  8. An OTN digital twin network generation method based on vertical federated learning, applied to the multi-domain orchestration system in an OTN multi-domain physical network system, the OTN multi-domain physical network system further comprising single-domain management and control systems, the multi-domain orchestration system and the single-domain management and control systems having cross-domain fault root cause identification models of identical structure; the method comprising:
    receiving model parameter updates generated by the single-domain management and control systems, and generating an OTN digital twin network based on the model parameter updates and topology information of each single-domain physical network;
    wherein each model parameter update is obtained by a single-domain management and control system training the cross-domain fault root cause identification model of its domain on a single-domain training set, the single-domain training set is generated by that single-domain management and control system from encrypted fault root cause labels and the encrypted alarm sample sequences corresponding to the encrypted fault root cause labels, the encrypted fault root cause labels are obtained by the single-domain management and control system homomorphically encrypting the fault root cause labels of its domain, and the encrypted alarm sample sequences are obtained by the single-domain management and control system homomorphically encrypting the related alarm information of its domain.
  9. The OTN digital twin network generation method of claim 8, wherein generating the OTN digital twin network based on the model parameter updates and the topology information of each single-domain physical network comprises:
    generating public model parameters from the model parameter updates and initial public model parameters, and issuing the public model parameters, the initial public model parameters being issued by the multi-domain orchestration system to the single-domain management and control systems before iterative training;
    iteratively training the cross-domain fault root cause identification model based on the model parameter updates and the public model parameters until the cross-domain fault root cause identification model meets a stopping condition; and
    generating the OTN digital twin network from the trained cross-domain fault root cause identification model and the topology information of each single-domain management and control system.
  10. The OTN digital twin network generation method of claim 9, wherein the stopping condition for iteratively training the cross-domain fault root cause identification model is:
    [formula image]
    where K denotes that there are K single domains, en() denotes the homomorphic encryption operator, dec() denotes the public-key decryption operator of the homomorphic encryption, and [formula image] denotes the model parameter update of the k-th single domain in the p-th iteration.
  11. The OTN digital twin network generation method of claim 10, wherein, when the iterative training of the cross-domain fault root cause identification model does not meet the stopping condition, the public model parameters are updated as:
    [formula image]
    where ω_{p+1} are the public model parameters for round p+1 of the iteration.
  12. The OTN digital twin network generation method of claim 8, wherein the cross-domain fault root cause identification model is composed of multiple recurrent neural network (RNN) units and a softmax classification layer.
  13. A single-domain management and control system, comprising at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the OTN digital twin network generation method of any one of claims 1 to 7.
  14. A multi-domain orchestration system, comprising at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the OTN digital twin network generation method of any one of claims 8 to 12.
  15. An OTN multi-domain physical network system, comprising the single-domain management and control system of claim 13 and the multi-domain orchestration system of claim 14, the multi-domain orchestration system being connected to the single-domain management and control system.
PCT/CN2022/134719 2022-03-23 2022-11-28 基于纵向联邦学习的otn数字孪生网络生成方法及系统 WO2023179073A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210286945.9A CN116866740A (zh) 2022-03-23 2022-03-23 基于纵向联邦学习的otn数字孪生网络生成方法及系统
CN202210286945.9 2022-03-23

Publications (1)

Publication Number Publication Date
WO2023179073A1 true WO2023179073A1 (zh) 2023-09-28

Family

ID=88099744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134719 WO2023179073A1 (zh) 2022-03-23 2022-11-28 基于纵向联邦学习的otn数字孪生网络生成方法及系统

Country Status (2)

Country Link
CN (1) CN116866740A (zh)
WO (1) WO2023179073A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737749A (zh) * 2020-06-28 2020-10-02 南方电网科学研究院有限责任公司 基于联邦学习的计量装置告警预测方法及设备
CN111897673A (zh) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 运维故障根因识别方法、装置、计算机设备和存储介质
CN113259148A (zh) * 2020-12-31 2021-08-13 中兴通讯股份有限公司 基于联邦学习的告警关联检测方法、系统、网络及介质
WO2021160686A1 (en) * 2020-02-10 2021-08-19 Deeplife Generative digital twin of complex systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021160686A1 (en) * 2020-02-10 2021-08-19 Deeplife Generative digital twin of complex systems
CN111737749A (zh) * 2020-06-28 2020-10-02 南方电网科学研究院有限责任公司 基于联邦学习的计量装置告警预测方法及设备
CN111897673A (zh) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 运维故障根因识别方法、装置、计算机设备和存储介质
CN113259148A (zh) * 2020-12-31 2021-08-13 中兴通讯股份有限公司 基于联邦学习的告警关联检测方法、系统、网络及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI JIAN, SHAO YUNFENG; LU YI; WU JUN: "Overview of Federal Learning and its Application in Telecom Industry", INFORMATION AND COMMUNICATIONS TECHNOLOGY AND POLICY, no. 9, 15 September 2020 (2020-09-15), pages 35 - 41, XP093094434 *

Also Published As

Publication number Publication date
CN116866740A (zh) 2023-10-10

Similar Documents

Publication Publication Date Title
Raskutti et al. Learning directed acyclic graph models based on sparsest permutations
CN112733967B (zh) 联邦学习的模型训练方法、装置、设备及存储介质
US20230039182A1 (en) Method, apparatus, computer device, storage medium, and program product for processing data
US8051330B2 (en) Fault location in telecommunications networks using bayesian networks
CN105074735B (zh) 通过在多个学习机器之间共享信息来加速学习
Manias et al. Concept drift detection in federated networked systems
Rusek et al. Message-passing neural networks learn little’s law
Oliva et al. Distributed finite-time average-consensus with limited computational and storage capability
CN113746663B (zh) 机理数据双驱动结合的性能劣化故障根因定位方法
CN114666204B (zh) 一种基于因果强化学习的故障根因定位方法及系统
US11948077B2 (en) Network fabric analysis
Friesen et al. Machine learning for zero-touch management in heterogeneous industrial networks-a review
Zhao et al. Spatiotemporal graph convolutional recurrent networks for traffic matrix prediction
WO2023179073A1 (zh) 基于纵向联邦学习的otn数字孪生网络生成方法及系统
CN117217820A (zh) 供应链采购需求智能集成预测方法及系统
Liu et al. EAGLE: Heterogeneous GNN-based Network Performance Analysis
Zhang et al. A novel virtual network fault diagnosis method based on long short-term memory neural networks
Wehbe et al. A deep learning approach for probabilistic security in multi-robot teams
JP2023543128A (ja) 動的アテンショングラフネットワークに基づくマーケティング裁定取引ネット暗黒産業の識別方法
WO2023168976A1 (zh) 光传送网性能预测方法、系统、电子设备及存储介质
Salami et al. Diffusion social learning over weakly-connected graphs
Wang et al. Distributed Optimization, Game and Learning Algorithms
Shi et al. Randomized optimal consensus of multiagent systems based on a novel intermittent projected subgradient algorithm
Shen et al. Long-term multivariate time series forecasting in data centers based on multi-factor separation evolutionary spatial–temporal graph neural networks
Doostmohammadian et al. Discretized Distributed Optimization Over Dynamic Digraphs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933124

Country of ref document: EP

Kind code of ref document: A1