CN114938337A - Model training method and device and electronic equipment - Google Patents

Model training method and device and electronic equipment

Info

Publication number
CN114938337A
Authority
CN
China
Prior art keywords
network
data
sub
model
sample data
Prior art date
Legal status
Pending
Application number
CN202210380734.1A
Other languages
Chinese (zh)
Inventor
田光见
饶思维
叶强
段艳杰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210380734.1A
Publication of CN114938337A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application provides a model training method and device and electronic equipment. The method comprises: determining N types of sample data, wherein the sample data comprise data generated by network equipment in a communication network, and N is greater than or equal to 1; and performing self-supervised training on a first target model based on the sample data, wherein the first target model comprises a first sub-model, the first sub-model comprises a first network and a second network, the first network comprises N first sub-networks, each first sub-network is used for performing feature extraction on one type of sample data to obtain N first features, the second network comprises N second sub-networks, and each second sub-network is used for predicting one type of sample data based on the N first features. In this way, unified modeling of multiple different types of live-network data is realized, the business association relations among different data are effectively modeled, and the difficulty of network operation and maintenance is reduced.

Description

Model training method and device and electronic equipment
Technical Field
The present application relates to the technical field of Artificial Intelligence (AI), and in particular, to a model training method and apparatus, and an electronic device.
Background
With the development of communication technologies, the numbers of physical sites, logical sites, frequency spectrums, and the like in the global communication network have gradually increased, so that large-scale heterogeneous networks (such as Long Term Evolution (LTE), the Universal Mobile Telecommunications System (UMTS), and Wireless Local Area Networks (WLAN)) coexist. When large-scale heterogeneous networks coexist, the network environment becomes increasingly complex, which greatly increases the difficulty of network operation and maintenance.
Disclosure of Invention
The application provides a model training method, a model training device, electronic equipment, a computer storage medium, a computer program product and a chip, which can realize unified modeling of multiple different types of live-network data, thereby effectively modeling the business association relations among different data and reducing the difficulty of network operation and maintenance.
In a first aspect, the present application provides a model training method, which may include: determining N types of sample data, wherein the sample data comprise data generated by network equipment in a communication network, and N is greater than or equal to 1; and performing self-supervised training on a first target model based on the sample data, wherein the first target model comprises a first sub-model, the first sub-model comprises a first network and a second network, the first network comprises N first sub-networks, each first sub-network is used for performing feature extraction on one type of sample data to obtain N first features, the second network comprises N second sub-networks, and each second sub-network is used for predicting one type of sample data based on the N first features. For example, the sample data may be the live-network data described below, and the first sub-model may be the sub-model two described below.
Therefore, unified modeling of multiple different types of live-network data can be realized, so that the business association relations among different data can be effectively modeled, and the difficulty of network operation and maintenance is reduced. Meanwhile, different networks for feature extraction and networks for data prediction are designed for different types of sample data, so that modeling can be performed effectively according to the characteristics of each data type, the internal structural characteristics of the different types of data are better captured, and the accuracy of model prediction is improved.
In one possible implementation, the N types of sample data include: discrete Token sequence data, high-dimensional continuous time series data and event sequence data, wherein the first network is an encoder, and the second network is a decoder; the first sub-network corresponding to the discrete Token sequence data is a Transformer-based encoding network, and the second sub-network corresponding to the discrete Token sequence data is a Transformer-based decoding network; the first sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both encoding networks based on a recurrent neural network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both RNN-based decoding networks.
In a possible implementation manner, the first target model further includes a second sub-model, the second sub-model is obtained by training based on knowledge related to the communication field, and the second sub-model is used for generating a representation of the knowledge related to the communication field contained in the N types of sample data to obtain a second feature; wherein each second sub-network is configured to predict a type of sample data based on the second feature and the N first features. In this way, communication principle knowledge is introduced through the second sub-model, which overcomes the defect that operation and maintenance analysis modeling only a single type of data cannot reflect business facts, and addresses the problem that a data anomaly does not necessarily mean a business anomaly; meanwhile, compared with introducing communication principle knowledge through manually written rules, the workload of manually summarizing rules is reduced, and the generalization capability and extensibility of the technical scheme are improved. For example, the second sub-model may be the sub-model one described below, and the knowledge related to the communication field may be the communication principle knowledge described below.
In a possible implementation manner, when the sample data is discrete Token sequence data and/or event sequence data, obtaining semantic representation of the sample data through a second sub-model to obtain a second feature; and when the sample data is high-dimensional continuous time sequence data, obtaining the semantic representation of the sample data in the preset time through the second submodel to obtain a second characteristic.
In one possible implementation, after the self-supervised training is performed on the first target model based on the sample data, the method further includes: determining a target neural network corresponding to a target task; adding the target neural network to the first target model to obtain a second target model, wherein the input of the target neural network is the output of the first network in the first sub-model; and training the second target model based on labeled sample data corresponding to the target task. In this way, the pre-trained first target model is reused, a network layer adapted to the target task is superimposed on it, and the new network structure is iteratively trained for a small number of rounds with task-specific labeled data, so that the whole network is only slightly adjusted to adapt to the specific target task; this reduces the training workload and the demand for labeled samples of the second target model and improves model training efficiency. By way of example, this process may be understood as the "downstream task fine-tuning" process described below.
In a second aspect, the present application provides a model training apparatus, comprising: a determination unit and a training unit. The determining unit is used for determining N types of sample data, the sample data comprises data generated by network equipment in a communication network, and N is larger than or equal to 1. The training unit is used for carrying out self-supervision training on a first target model based on sample data, wherein the first target model comprises a first sub-model, the first sub-model comprises a first network and a second network, the first network comprises N types of first sub-networks, each type of first sub-network is used for carrying out feature extraction on one type of sample data to obtain N types of first features, the second network comprises N types of second sub-networks, and each type of second sub-network is used for predicting one type of sample data based on the N types of first features.
In one possible implementation, the N types of sample data include: discrete Token sequence data, high-dimensional continuous time series data and event sequence data, wherein the first network is an encoder, and the second network is a decoder; the first sub-network corresponding to the discrete Token sequence data is a Transformer-based encoding network, and the second sub-network corresponding to the discrete Token sequence data is a Transformer-based decoding network; the first sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both encoding networks based on a recurrent neural network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both RNN-based decoding networks.
In a possible implementation manner, the first target model further includes a second submodel, the second submodel is obtained based on knowledge training related to the communication field, and the second submodel is used for generating a representation of knowledge related to the communication field included in the N types of sample data to obtain a second feature; wherein each second sub-network is configured to predict a type of sample data based on the second features and the N first features.
In a possible implementation manner, when the sample data is discrete Token sequence data and/or event sequence data, obtaining semantic representation of the sample data through a second sub-model to obtain a second feature; and when the sample data is high-dimensional continuous time sequence data, obtaining the semantic representation of the sample data in the preset time through the second submodel to obtain a second characteristic.
In one possible implementation, the apparatus further includes: the application unit is used for determining a target neural network corresponding to a target task after a first target model is obtained through training, and adding the target neural network into the first target model to obtain a second target model, wherein the input of the target neural network is the output of the first network in the first sub-model; the application unit is further configured to: and training the second target model based on the sample data with the labels corresponding to the target tasks.
In a third aspect, the present application provides an electronic device, comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory; wherein, when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method as described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product, which, when run on a processor, causes the processor to perform the method as described in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a chip comprising at least one processor and an interface; at least one processor obtains program instructions or data through an interface; the at least one processor is configured to execute program instructions to implement the method described in the first aspect or any of its possible implementations.
It is understood that the beneficial effects of the second to sixth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
Fig. 1 is a schematic view of an application scenario of a network operation and maintenance model provided in an embodiment of the present application;
fig. 2 is a schematic architecture diagram of a network operation and maintenance model provided in an embodiment of the present application;
FIG. 3 is a process for pre-training a first sub-model using a neural network architecture of BERT according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a framework for implementing a coder-decoder through a neural network of a conditional variational auto-encoder CVAE according to an embodiment of the present application;
fig. 5 is a schematic diagram of a process for injecting knowledge of communication principles in data of an existing network according to an embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The term "and/or" herein is an association relationship describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. The symbol "/" herein denotes a relationship in which the associated object is or, for example, a/B denotes a or B.
The terms "first" and "second," and the like, in the description and in the claims herein are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first response message and the second response message, etc. are for distinguishing different response messages, not for describing a specific order of the response messages.
In the embodiments of the present application, the words "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise specified, "a plurality" means two or more, for example, a plurality of processing units means two or more processing units, or the like; plural elements means two or more elements, and the like.
First, technical terms referred to in the present application will be described.
(1) Pre-training
In machine learning, a specific task often has only a very small amount of labeled training data, so a model cannot learn and summarize useful rules from that data alone; unlabeled resources, however, are abundant. By using as much unlabeled training data as possible, extracting from it as many common features as possible, and training a large model in advance, a specific task can then be realized on the basis of that large model, which lightens the learning burden. The large model obtained by pre-training contains many layers and parameters and is obtained in a self-supervised learning manner, and it can provide better feature representations for the samples of the model to be trained for a subsequent specific task.
(2) Downstream task fine-tuning
A pre-trained neural network structure and its parameters (i.e., the large model) are reused, a network layer adapted to a specific task is superimposed on that basis, and the new network structure is iteratively trained for a small number of rounds with task-specific labeled data (i.e., the downstream task), so that the whole network is slightly adjusted to adapt to the specific downstream task. For example, when the downstream task is a classification task, a network layer for classification can be superimposed on the large model obtained by pre-training. Because the pre-trained large model already represents the common features in the data well, a network model for a specific task can subsequently be obtained by only slightly adjusting the parameters of the large model, so that the specific task can be handled.
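As an illustration of the fine-tuning idea above, the following is a minimal PyTorch-style sketch; the pre-trained encoder, the hidden dimension, the number of classes and the labeled data loader are hypothetical placeholders rather than part of this application.

import torch
import torch.nn as nn

class FineTunedClassifier(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = pretrained_encoder                # large model from pre-training
        self.head = nn.Linear(hidden_dim, num_classes)   # task-specific layer superimposed on top

    def forward(self, x):
        features = self.encoder(x)                       # common features learned in pre-training
        return self.head(features)                       # downstream-task prediction

def fine_tune(model: nn.Module, labeled_loader, epochs: int = 3, lr: float = 1e-5):
    # A small number of iterative rounds with a small learning rate, so the whole
    # network is only slightly adjusted to the downstream task.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()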
Next, the technical means of the present application will be described.
The problems faced by the current network operation and maintenance can be summarized as follows:
1. There are many data types, their structure is complex, and the types are related to each other. The live-network data that network operation and maintenance needs to process include multiple types of data such as Call History Records (CHR), Key Performance Indicators (KPI) and alarms; these data have certain association relationships, and when service analysis is performed manually, multiple types of data often need to be analyzed simultaneously.
2. Data ≠ business. A data anomaly in the live network is not equivalent to a business anomaly, and the data cannot fully reflect the business facts; without the support of network communication protocol knowledge, purely data-driven models have a high false-alarm rate.
3. Abnormal data are few and difficult to collect. The normal data generated in the normal operation state are massive, whereas abnormal data are sporadic, which makes it difficult to build a model for the abnormal data.
Generally, when modeling the live-network data of a communication network, methods that analyze only a single type of data or simply fuse multiple types of data are too simple to handle complex business problems effectively, let alone capture the business association relations among different data. In addition, the communication principle knowledge introduced in existing live-network data analysis relies on manually summarized rules, so its generalization capability is limited and its extensibility is poor.
In view of the above, the present application provides a network operation and maintenance model that fuses communication principles with live-network data. The live-network data processed by network operation and maintenance are uniformly modeled within one encoder-decoder framework, and under this unified framework different neural network structures are designed for different types of live-network data according to their structural characteristics, so that the characteristics of the different data types are modeled effectively. Meanwhile, communication principle knowledge is introduced by pre-training a language model on domain text, which on the one hand provides a better computable knowledge model and on the other hand reduces the workload of manually summarizing rules, improving the generalization capability and extensibility of the technical scheme. The network operation and maintenance model can be built by pre-training: massive normal network data can easily be collected from normally running network equipment, and the model is pre-trained in a self-supervised, prediction-based manner; the pre-trained model can then be used to support fine-tuning for a variety of downstream tasks.
In this embodiment, the network operation and maintenance model may include two sub-models. Sub-model one is a language model pre-trained on domain-related text (including but not limited to technical documents, product descriptions, communication protocols, cases, etc.); sub-model two is a multi-modal encoder-decoder, which realizes unified modeling of the various forms of live-network sequence data (including discrete Token sequences such as signaling and CHR, high-dimensional continuous time series such as KPIs, and event sequences such as alarms and logs). Sub-model two can be pre-trained based on large-scale normal live-network data, while sub-model one is used to generate representations of the communication principles that are fused with the live-network data, and its parameters can be optimized at the same time. Sub-model one and sub-model two obtained by pre-training can be used to support various downstream network operation and maintenance tasks.
For example, fig. 1 shows an application scenario diagram of the network operation and maintenance model. As shown in fig. 1, the process of obtaining the network operation and maintenance model by pre-training may be described as follows: sub-model one is pre-trained using domain texts, and sub-model two is pre-trained using live-network data. During pre-training, sub-model one is used to provide communication principle knowledge that is fused into sub-model two. Sub-model one and sub-model two obtained by pre-training form the final network operation and maintenance model.
The process of downstream task fine-tuning can be described as follows: when the pre-trained sub-model one and sub-model two are used for fine-tuning on downstream tasks, the task-specific neural networks of the downstream tasks can be superimposed on the pre-trained sub-model one and sub-model two, that is, sub-model one and sub-model two serve as the input of the downstream task's neural network framework. Sub-model one can be used for downstream tasks on text data, and sub-model two can be used for downstream tasks on live-network data. Fine-tuning means superimposing the downstream-task network on the pre-trained model and then performing a few rounds of iterative training of the overall neural network framework on the downstream task's data to obtain the final model of the downstream task.
For example, fig. 2 shows an architecture diagram of the network operation and maintenance model. As shown in fig. 2, the network operation and maintenance model may include two sub-models. Sub-model one can be a pre-trained language model, namely the pre-trained language model of the communication field shown in the figure; sub-model two may be an encoder-decoder framework.
Sub-model one can be used to inject and fuse communication principle knowledge into sub-model two, that is, to fuse the communication principles into the live-network data, and it can be a language model pre-trained on domain-related text. Communication principle knowledge may be domain-related text, such as technical documents, product descriptions, communication protocols and cases, which is recorded and stored in natural-language form. Sub-model one may be, but is not limited to, a model such as BERT (Bidirectional Encoder Representations from Transformers) or a generative pre-trained Transformer (GPT) that is trained with communication principle knowledge. For example, fig. 3 illustrates a pre-training process for sub-model one using the neural network architecture of BERT. In fig. 3, taking an input domain-related text of 5 words as an example, one of the words, w4 in the figure, is masked; all words are then feature-encoded by a Transformer encoder, the encoded features are input into a fully connected layer to predict w4, and the predicted w4 is compared with the real w4 to realize self-supervised training.
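The masked-word prediction in fig. 3 can be sketched as follows in PyTorch; the vocabulary size, model dimensions, layer counts, mask id and token ids are illustrative assumptions, not values taken from this application.

import torch
import torch.nn as nn

VOCAB, D_MODEL, MASK_ID = 30000, 256, 0

class MaskedLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # Transformer encoder
        self.fc = nn.Linear(D_MODEL, VOCAB)                        # fully connected prediction layer

    def forward(self, token_ids):
        return self.fc(self.encoder(self.embed(token_ids)))        # per-position vocabulary logits

model = MaskedLanguageModel()
tokens = torch.tensor([[11, 52, 7, 96, 23]])   # w1..w5 of a domain-related text (illustrative ids)
masked = tokens.clone()
masked[0, 3] = MASK_ID                          # mask w4, as in fig. 3
logits = model(masked)
# self-supervision: compare the predicted w4 with the real w4
loss = nn.CrossEntropyLoss()(logits[0, 3].unsqueeze(0), tokens[0, 3].unsqueeze(0))
loss.backward()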
Sub-model two may be a multi-modal encoder-decoder. Most network operation and maintenance data (namely, live-network data) are sequence data and, according to their form, can be classified into the following types: discrete Token sequence data (such as CHR), high-dimensional continuous time series data (such as KPIs), and event sequence data (such as alarms). Unified modeling of these three types of sequence data can therefore be realized through a multi-modal encoder-decoder framework, and the model is pre-trained based on the data generated by network equipment in the normal state. Illustratively, as shown in fig. 4, the above encoder-decoder framework may be implemented by, but is not limited to, a conditional variational autoencoder (CVAE) neural network. With continuing reference to fig. 4, a Transformer can be used (but is not limited to) as the encoding-decoding network for the discrete Token sequence; for the high-dimensional continuous time series and the event sequence, a recurrent neural network (RNN) can be used as the encoding-decoding network. In fig. 4, in the encoding stage, the neural network selected for each kind of sequence data may generate a hidden variable corresponding to that sequence data. In the decoding stage, for the discrete Token sequence, a prediction result can be generated directly through the decoding network; for the high-dimensional continuous time series, the distribution of the values can be predicted through the network and specific values are then obtained by sampling from that distribution; for the event sequence, since the time points at which events occur are irregular, a continuously changing trajectory can be fitted by a neural ordinary differential equation, and the prediction of events is then obtained by sampling. In some embodiments, in the pre-training process, after the three types of data collected from normally operating live-network equipment are aligned along the time axis, the data in the previous time window may be input into the encoder for encoding, the prediction of the data in the next time window is obtained from the decoder, and self-supervised training is realized by comparing it with the real data of the next time window.
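A condensed sketch of the multi-modal encoder described above is given below in PyTorch; the dimensions, the choice of GRU as the recurrent network and the pooling of the Token-sequence features are illustrative assumptions, and the CVAE machinery and the neural-ODE event decoder are omitted for brevity.

import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, vocab=5000, d=128, kpi_dim=64, alarm_dim=32):
        super().__init__()
        # discrete Token sequence (e.g. CHR): Transformer-based encoding network
        self.tok_embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.tok_enc = nn.TransformerEncoder(layer, num_layers=2)
        # high-dimensional continuous time series (e.g. KPI): RNN-based encoding network
        self.kpi_enc = nn.GRU(kpi_dim, d, batch_first=True)
        # event sequence (e.g. alarm): RNN-based encoding network
        self.evt_enc = nn.GRU(alarm_dim, d, batch_first=True)

    def forward(self, chr_tokens, kpi_window, alarm_window):
        z_tok = self.tok_enc(self.tok_embed(chr_tokens)).mean(dim=1)  # hidden variable of the Token sequence
        _, h_kpi = self.kpi_enc(kpi_window)                           # hidden variable of the KPI series
        _, h_evt = self.evt_enc(alarm_window)                         # hidden variable of the event sequence
        return z_tok, h_kpi[-1], h_evt[-1]                            # one first feature per data type

encoder = MultiModalEncoder()
features = encoder(torch.randint(0, 5000, (1, 20)),   # previous-window CHR tokens
                   torch.randn(1, 30, 64),            # previous-window KPI values
                   torch.randn(1, 5, 32))             # previous-window alarm features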
With continued reference to fig. 2, when the network operation and maintenance model is pre-trained, the pre-trained language model of the communication field (i.e., sub-model one) may first be obtained by training on communication principle knowledge. Then, the live-network data, such as a discrete Token sequence (e.g., CHR), a high-dimensional continuous time series (e.g., KPI) and an event sequence (e.g., alarms), are respectively input into the encoder of sub-model two for encoding. During encoding, the language features obtained at different stages can be input into sub-model one, where the language features may include the names, definitions and textual descriptions of the morphology of the live-network data, so that the communication principle knowledge is fused into the live-network data. The hidden variables corresponding to each type of live-network data are obtained after encoding by the encoder, and another hidden variable is obtained after processing by sub-model one; these hidden variables can form a new feature, into which the communication principle knowledge has been fused. Finally, the new feature can be input into the decoder for decoding to complete the prediction of the live-network data at the next moment or within a time period, and self-supervised training is realized based on the predicted data and the real data, thereby obtaining the pre-trained network operation and maintenance model. Illustratively, during pre-training the parameters of sub-model one may be optimized simultaneously, realizing its secondary training. For example, the live-network data may also be directly input into sub-model one for processing, and the processed result is combined with the result processed in sub-model two to obtain the new feature composed of hidden variables.
In some embodiments, when the communication principle knowledge (which may be the language features contained in the different kinds of live-network data) is incorporated into the live-network data, the following applies. For discrete Token sequence data such as CHR, each Token is a semantic unit (i.e., a word), so its semantic representation can be obtained directly through sub-model one. For high-dimensional continuous time series data such as KPIs, the name of the KPI and a morphological description of the values within a sliding window are combined into a natural-language text, and the semantic representation of the KPI is obtained through sub-model one. For event sequence data such as alarms, the definition of the alarm is already natural language, so its semantic representation can be obtained through sub-model one. On the whole, the semantic representations of the various types of data within the encoder's input window can be converted into a vector and finally fused with the encoding vector of the live-network data values; the communication principle knowledge is injected into the live-network data in this way. For example, taking KPIs and alarms as an example, as shown in fig. 5, a KPI description text can be obtained from each of the sliding windows 51, 52 and 53, and each alarm consists of its own description text; the corresponding semantic representations can be obtained by inputting the 3 KPI description texts and the 2 alarm description texts into sub-model one (i.e., the "Pre-trained Language Model" in the figure).
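A small sketch of this knowledge-injection step follows; the function standing in for sub-model one, the KPI name, the alarm text and the vector dimensions are hypothetical illustrations, not parts of this application.

import torch

def language_model_embed(text: str) -> torch.Tensor:
    # Placeholder for sub-model one: in the described scheme this would be a
    # pre-trained communication-domain language model (e.g. a BERT-style encoder).
    return torch.zeros(128)

def describe_kpi_window(name: str, values) -> str:
    # Combine the KPI name with a simple morphological description of the values
    # inside the sliding window into a natural-language text.
    trend = "rising" if values[-1] > values[0] else "falling or flat"
    return f"KPI {name} is {trend} in the current window, mean value {sum(values) / len(values):.3f}"

kpi_text = describe_kpi_window("RRC setup success rate", [0.99, 0.97, 0.92])   # hypothetical KPI
alarm_text = "Cell unavailable alarm raised on base station 17"                # hypothetical alarm

semantic_vec = torch.stack([language_model_embed(kpi_text),
                            language_model_embed(alarm_text)]).mean(dim=0)     # semantic representation
data_encoding = torch.randn(128)                  # encoding vector of the live-network data values
fused = torch.cat([data_encoding, semantic_vec])  # knowledge-injected feature passed to the decoder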
It can be understood that the analysis of network operation and maintenance live-network data, such as anomaly detection, prediction and prevention, and root cause classification, is essentially implemented through machine learning tasks such as regression, classification and generation. The analysis effect depends, on the one hand, on introducing business knowledge, i.e., communication principles; on the other hand, it depends on the quality of the computable vectors into which the live-network data are converted. Starting from the goal of generating high-quality representations of live-network data, and considering that in network operation and maintenance normal live-network data are massive while abnormal live-network data are difficult to acquire, a high-quality data representation model can be constructed in this embodiment by pre-training on normal live-network data. Thus, pre-training can be achieved by modeling the live-network data with an encoder-decoder framework, with feature encoding of the live-network data in the encoder part and self-supervised prediction in the decoder part.
In addition, the live-network data analyzed in network operation and maintenance can be roughly divided into discrete Token sequences (such as signaling and CHR), high-dimensional continuous time series (such as KPIs) and event sequences (such as alarms and logs). These data have different characteristics: the value at each timestamp of a discrete Token sequence is a set of Tokens, the value at each timestamp of a continuous time series is a real value, and an event sequence consists of discrete values irregularly distributed over the timestamps. Therefore, different neural networks can be designed within the encoder-decoder framework to model the three types of data, effectively adapting to the characteristics of each type and realizing a high-quality fused representation of the various kinds of live-network data. In addition, in the embodiments of the present application, both sub-model one and sub-model two are constructed by pre-training and can be used to support various downstream network operation and maintenance tasks: sub-model one can support downstream tasks of the text type, such as machine question answering and text classification, and sub-model two can support analysis tasks on live-network data, such as anomaly detection and prediction and prevention.
In addition, in the embodiments of the present application, the live-network data processed by network operation and maintenance are uniformly modeled within one encoder-decoder framework, which realizes unified modeling of multiple different types of live-network data and allows the business association relations among different data to be modeled effectively. Meanwhile, according to the structural characteristics of the live-network data, different neural network structures are designed for different types of live-network data under the unified encoder-decoder framework, so that the characteristics of the different data types are modeled effectively. Furthermore, a domain pre-trained language model is used as the base of communication principle knowledge: the names, Tokens and morphological descriptions of the live-network data are turned into corresponding text-form data, converted into dense vectors by the pre-trained language model, and fused with the vector representations of the live-network data, so that the communication principles are fused into the live-network data. This overcomes the defect that traditional operation and maintenance analysis, which models only a single type of data, cannot reflect business facts, and addresses the problem that a data anomaly does not necessarily mean a business anomaly; meanwhile, compared with introducing communication principle knowledge through manually written rules, it reduces the workload of manually summarizing rules and improves the generalization capability and extensibility of the technical scheme.
In some embodiments, when downstream task fine-tuning is performed, a downstream-task-specific neural network may be superimposed on the basis of the aforementioned sub-model one and/or sub-model two, that is, sub-model one and sub-model two serve as the input of the downstream task's neural network framework. Fine-tuning means performing a few rounds of iterative training on the downstream task's data with the overall framework on which the downstream task's neural network has been superimposed, to obtain the final model of the downstream task.
When downstream task fine-tuning is performed, for text-type network operation and maintenance data, including but not limited to consultation questions and fault descriptions, the corresponding semantic vector representations can be generated through sub-model one and used for the related downstream tasks, including but not limited to machine question answering and text classification.
For live-network data consisting of discrete Token sequences (such as CHR), high-dimensional continuous time series (such as KPIs) and event sequences (such as alarms), the corresponding vector representations (i.e., representations that fuse the communication principles with the numerical characteristics of the data) can be generated by sub-model one and sub-model two and used for the related downstream tasks, including but not limited to anomaly detection and prediction and prevention.
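As an illustration, a fused representation of this kind could feed a small downstream anomaly-detection head, as in the following sketch; the head architecture, dimensions and threshold are illustrative assumptions.

import torch
import torch.nn as nn

# 3 per-type data features of 128 dims each plus a 128-dim communication-principle feature
anomaly_head = nn.Sequential(nn.Linear(3 * 128 + 128, 64), nn.ReLU(), nn.Linear(64, 1))

def is_anomalous(fused_representation: torch.Tensor, threshold: float = 0.5) -> bool:
    # fused_representation: communication-principle feature concatenated with the
    # per-type live-network data features produced by the pre-trained sub-models
    score = torch.sigmoid(anomaly_head(fused_representation))
    return bool(score.item() > threshold)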
It can be understood that, compared with existing approaches that model only a single type of sequence data for network operation, the network operation and maintenance model in the embodiments of the present application can realize unified modeling of multiple types of live-network data within one encoder-decoder framework and can effectively model the business association relations among different data. Meanwhile, different network structures can be designed according to the characteristics of the various types of network operation and maintenance sequence data, so that the internal structural characteristics of the different data can be modeled better. In addition, using a pre-trained domain language model as the base of communication principle knowledge provides, on the one hand, a better computable knowledge model and, on the other hand, reduces the workload of manually summarizing rules, improving the generalization capability and extensibility of the network operation and maintenance model.
In some embodiments, the sub-models in the network operation and maintenance model described above may be adaptively selected, and no constraint is made here. For example, the network operation and maintenance model may be composed of only the sub-model two, or may be composed of the sub-model one and the sub-model two.
Next, a model training method provided by the embodiments of the present application is described based on the above description. It can be understood that this method has been set forth in the foregoing description, and reference may be made to that description for some or all of its details.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating a model training method according to an embodiment of the present disclosure. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 6, the model training method includes:
In S601, N types of sample data are determined, the sample data comprise data generated by network equipment in a communication network, and N is greater than or equal to 1. The sample data may be live-network data generated by the network equipment and collected from the network, such as CHR, KPIs and alarms. For example, the N types of sample data may include: discrete Token sequence data, high-dimensional continuous time series data, and event sequence data.
In S602, self-supervised training is performed on a first target model based on the sample data, where the first target model includes a first sub-model, the first sub-model includes a first network and a second network, the first network includes N first sub-networks, each first sub-network is used to perform feature extraction on one type of sample data to obtain N first features, the second network includes N second sub-networks, and each second sub-network is used to predict one type of sample data based on the N first features.
In this embodiment, after the sample data are determined, the pre-established first target model may be subjected to self-supervised training based on the sample data to obtain a model adapted to the sample data. The first target model may include a first sub-model, the first sub-model includes a first network and a second network, the first network includes N first sub-networks, each first sub-network is used for feature extraction on one type of sample data to obtain N first features, the second network includes N second sub-networks, and each second sub-network is used for predicting one type of sample data based on the N first features. Illustratively, the first sub-model may be the sub-model two depicted in fig. 2, the first network may be the encoder in fig. 2, and the second network may be the decoder in fig. 2.
Illustratively, the first sub-network corresponding to the discrete Token sequence data is a Transformer-based encoding network, and the second sub-network corresponding to the discrete Token sequence data is a Transformer-based decoding network; the first sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both encoding networks based on a recurrent neural network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both RNN-based decoding networks.
Therefore, by uniformly modeling the existing network data of various different types, the business association relation among different data can be effectively modeled, and the difficulty of network operation and maintenance is reduced. Meanwhile, different types of networks for feature extraction and networks for data prediction are designed according to different types of sample data, and modeling can be effectively performed according to the characteristics of the different types of data, so that the internal structural characteristics of the different types of data can be better modeled, and the accuracy of model prediction is improved.
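As a concrete illustration of the self-supervised step S602, the following PyTorch-style sketch fuses the N first features extracted from the previous time window and lets each decoder predict its own data type for the next window; the shapes, the simple linear stand-ins for the decoding networks and the loss choices are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
decoders = nn.ModuleDict({
    "token": nn.Linear(3 * d, 5000),  # stands in for the Transformer-based decoding network
    "kpi":   nn.Linear(3 * d, 64),    # stands in for the RNN-based decoding network
    "event": nn.Linear(3 * d, 32),    # stands in for the RNN-based decoding network
})
optimizer = torch.optim.Adam(decoders.parameters(), lr=1e-4)

def self_supervised_step(first_features, next_window):
    # first_features: the N first features extracted from the previous time window
    # next_window: the real data of the next time window, used as the training target
    fused = torch.cat(first_features, dim=-1)
    loss = (F.cross_entropy(decoders["token"](fused), next_window["token"])
            + F.mse_loss(decoders["kpi"](fused), next_window["kpi"])
            + F.mse_loss(decoders["event"](fused), next_window["event"]))
    optimizer.zero_grad()
    loss.backward()      # self-supervision: the predictions are compared with the
    optimizer.step()     # real next-window data, so no manual labels are needed
    return loss.item()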
In some embodiments, the first target model may further include a second sub-model, the second sub-model is obtained by training based on knowledge related to the communication field, and the second sub-model is used for generating a representation of the knowledge related to the communication field contained in the N types of sample data to obtain a second feature; each second sub-network is then configured to predict a type of sample data based on the second feature and the N first features. In this way, communication principle knowledge is introduced through the second sub-model, which overcomes the defect that operation and maintenance analysis modeling only a single type of data cannot reflect business facts, and addresses the problem that a data anomaly does not necessarily mean a business anomaly; meanwhile, compared with introducing communication principle knowledge through manually written rules, it reduces the workload of manually summarizing rules and improves the generalization capability and extensibility of the technical scheme. Illustratively, the second sub-model may be the sub-model one described in fig. 2, and the knowledge related to the communication field may be the communication principle knowledge described above. Illustratively, the second feature and the N first features may be combined into the hidden variables shown in fig. 2.
In some embodiments, when the sample data is discrete Token sequence data and/or event sequence data, obtaining semantic representation of the sample data through a second sub-model to obtain a second feature; and when the sample data is high-dimensional continuous time sequence data, obtaining the semantic representation of the sample data in the preset time through the second submodel to obtain a second characteristic.
In some embodiments, after the self-supervised training is performed on the first target model based on the sample data, the method may further comprise: determining a target neural network corresponding to a target task; adding the target neural network to the first target model to obtain a second target model, wherein the input of the target neural network is the output of the first network in the first sub-model; and training the second target model based on the sample data corresponding to the target task. In this way, the pre-trained first target model is reused, a network layer adapted to the target task is superimposed on it, and the new network structure is iteratively trained for a small number of rounds with task-specific labeled data, so that the whole network is only slightly adjusted to adapt to the specific target task; this reduces the training workload and the demand for labeled samples of the second target model and improves model training efficiency. By way of example, this process may be understood as the "downstream task fine-tuning" process described above.
Based on the method in the above embodiment, the embodiment of the present application provides a model training device. Referring to fig. 7, fig. 7 is a schematic structural diagram of a model training device according to an embodiment of the present disclosure. As shown in fig. 7, the model training apparatus 700 includes: a determination unit 710 and a training unit 720. The determining unit 710 is used for determining N types of sample data, the sample data comprise data generated by network equipment in a communication network, and N is greater than or equal to 1. The training unit 720 is configured to perform self-supervised training on a first target model based on the sample data, where the first target model includes a first sub-model, the first sub-model includes a first network and a second network, the first network includes N first sub-networks, each first sub-network is used to perform feature extraction on one type of sample data to obtain N first features, the second network includes N second sub-networks, and each second sub-network is used to predict one type of sample data based on the N first features.
In some embodiments, the N types of sample data include: discrete Token sequence data, high-dimensional continuous time series data and event sequence data, wherein the first network is an encoder, and the second network is a decoder; the first sub-network corresponding to the discrete Token sequence data is a Transformer-based encoding network, and the second sub-network corresponding to the discrete Token sequence data is a Transformer-based decoding network; the first sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both encoding networks based on a recurrent neural network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time series data and the event sequence data are both RNN-based decoding networks.
In some embodiments, the first target model further includes a second submodel, the second submodel is obtained based on knowledge training related to the communication field, and the second submodel is used for generating a representation of knowledge related to the communication field, which is contained in the N types of sample data, so as to obtain a second feature; wherein each second sub-network is configured to predict a type of sample data based on the second features and the N first features.
In some embodiments, when the sample data is discrete Token sequence data and/or event sequence data, obtaining semantic representation of the sample data through a second sub-model to obtain a second feature; and when the sample data is high-dimensional continuous time sequence data, obtaining the semantic representation of the sample data in the preset time through the second submodel to obtain a second characteristic.
In some embodiments, the apparatus further includes: an application unit (not shown in the figure), configured to determine, after the first target model is obtained through training, a target neural network corresponding to a target task, and add the target neural network to the first target model to obtain a second target model, where the input of the target neural network is the output of the first network in the first sub-model; the application unit is further configured to: train the second target model based on the sample data corresponding to the target task.
It should be understood that the above-mentioned apparatus is used for executing the method in the above-mentioned embodiments, and the implementation principle and technical effect of the apparatus are similar to those described in the above-mentioned method, and the working process of the apparatus may refer to the corresponding process in the above-mentioned method, and is not described herein again.
Based on the method in the foregoing embodiment, an embodiment of the present application provides an electronic device. The electronic device may include: at least one memory for storing a program; at least one processor for executing programs stored in the memory; wherein the processor is configured to perform the method of the above embodiments when the program stored in the memory is executed.
Based on the methods in the foregoing embodiments, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and when the computer program runs on a processor, causes the processor to execute the methods in the foregoing embodiments.
Based on the methods in the foregoing embodiments, the present application provides a computer program product, which is characterized by causing a processor to execute the methods in the foregoing embodiments when the computer program product runs on the processor.
Based on the method in the embodiment, the embodiment of the application also provides a chip. Referring to fig. 8, fig. 8 is a schematic structural diagram of a chip according to an embodiment of the present disclosure. As shown in fig. 8, chip 800 includes one or more processors 801 and interface circuits 802. Optionally, chip 800 may also include bus 803. Wherein:
the processor 801 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 801. The processor 801 described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The methods, steps disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The interface circuit 802 may be used for sending or receiving data, instructions or information, and the processor 801 may perform processing by using the data, instructions or other information received by the interface circuit 802, and may send out processing completion information through the interface circuit 802.
Optionally, chip 800 also includes a memory, which may include a read-only memory and a random access memory and provides operating instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory (NVRAM).
Optionally, the memory stores executable software modules or data structures, and the processor may perform corresponding operations by calling the operation instructions stored in the memory (the operation instructions may be stored in an operating system).
Optionally, the interface circuit 802 may be used to output the execution result of the processor 801.
It should be noted that the functions corresponding to the processor 801 and the interface circuit 802 may be implemented by hardware design, software design, or a combination of hardware and software, which is not limited herein.
It will be appreciated that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in a processor.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. In addition, in some possible implementation manners, each step in the foregoing embodiments may be selectively executed according to an actual situation, may be partially executed, or may be completely executed, which is not limited herein.
It is understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but may be any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for convenience of description and distinction and are not intended to limit the scope of the embodiments of the present application.

Claims (14)

1. A method of model training, the method comprising:
determining N types of sample data, wherein the sample data comprises data generated by network equipment in a communication network, and N is greater than or equal to 1;
and performing self-supervised training on a first target model based on the sample data, wherein the first target model comprises a first sub-model, the first sub-model comprises a first network and a second network, the first network comprises N first sub-networks, each first sub-network is used for performing feature extraction on one type of sample data to obtain N first features, the second network comprises N second sub-networks, and each second sub-network is used for predicting one type of sample data based on the N first features.
2. The method of claim 1, wherein the N types of sample data comprise discrete Token sequence data, high-dimensional continuous time-series data, and event sequence data, the first network is an encoder, and the second network is a decoder;
the first sub-network corresponding to the discrete Token sequence data is a network that performs encoding based on a Transformer, and the second sub-network corresponding to the discrete Token sequence data is a network that performs decoding based on the Transformer;
the first sub-networks corresponding to the high-dimensional continuous time-series data and the event sequence data are both networks that perform encoding based on a Recurrent Neural Network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time-series data and the event sequence data are both networks that perform decoding based on the RNN.
3. The method according to claim 1 or 2, wherein the first target model further comprises a second sub-model, the second sub-model is obtained by training based on knowledge related to the communication field, and the second sub-model is used for generating a representation of the communication-field knowledge contained in the N types of sample data to obtain a second feature;
wherein each of the second sub-networks is configured to predict a type of sample data based on the second feature and the N first features.
4. The method according to claim 3, wherein when the sample data is discrete Token sequence data and/or event sequence data, a semantic representation of the sample data is obtained through the second sub-model to obtain the second feature;
and when the sample data is high-dimensional continuous time-series data, a semantic representation of the sample data within a preset time period is obtained through the second sub-model to obtain the second feature.
5. The method of any of claims 1-4, wherein after performing the self-supervised training of the first target model based on the sample data, the method further comprises:
determining a target neural network corresponding to a target task;
adding the target neural network into the first target model to obtain a second target model, wherein an input of the target neural network is an output of the first network in the first sub-model;
and training the second target model based on sample data with labels corresponding to the target task.
6. A model training apparatus, the apparatus comprising:
a determining unit, configured to determine N types of sample data, where the sample data includes data generated by a network device in a communication network, and N is greater than or equal to 1;
a training unit, configured to perform self-supervised training on a first target model based on the sample data, wherein the first target model comprises a first sub-model, the first sub-model comprises a first network and a second network, the first network comprises N first sub-networks, each first sub-network is used for performing feature extraction on one type of sample data to obtain N first features, the second network comprises N second sub-networks, and each second sub-network is used for predicting one type of sample data based on the N first features.
7. The apparatus of claim 6, wherein the N types of sample data comprise discrete Token sequence data, high-dimensional continuous time-series data, and event sequence data, the first network is an encoder, and the second network is a decoder;
the first sub-network corresponding to the discrete Token sequence data is a network that performs encoding based on a Transformer, and the second sub-network corresponding to the discrete Token sequence data is a network that performs decoding based on the Transformer;
the first sub-networks corresponding to the high-dimensional continuous time-series data and the event sequence data are both networks that perform encoding based on a Recurrent Neural Network (RNN), and the second sub-networks corresponding to the high-dimensional continuous time-series data and the event sequence data are both networks that perform decoding based on the RNN.
8. The apparatus according to claim 6 or 7, wherein the first target model further comprises a second sub-model, the second sub-model is obtained by training based on knowledge related to the communication field, and the second sub-model is used for generating a representation of the communication-field knowledge contained in the N types of sample data to obtain a second feature;
wherein each of the second sub-networks is configured to predict a type of sample data based on the second feature and the N first features.
9. The apparatus according to claim 8, wherein when the sample data is discrete Token sequence data and/or event sequence data, a semantic representation of the sample data is obtained through the second sub-model to obtain the second feature;
and when the sample data is high-dimensional continuous time-series data, a semantic representation of the sample data within a preset time period is obtained through the second sub-model to obtain the second feature.
10. The apparatus of any of claims 6-9, further comprising:
an application unit, configured to: after the first target model is obtained through training, determine a target neural network corresponding to a target task, and add the target neural network into the first target model to obtain a second target model, wherein an input of the target neural network is an output of the first network in the first sub-model;
the application unit is further configured to train the second target model based on sample data with labels corresponding to the target task.
11. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the program stored in the memory;
wherein the processor is configured to perform the method of any of claims 1-5 when the program stored in the memory is executed.
12. A computer-readable storage medium, having stored thereon a computer program which, when run on a processor, causes the processor to carry out the method according to any one of claims 1-5.
13. A computer program product, characterized in that, when the computer program product runs on a processor, the processor is caused to execute the method according to any one of claims 1-5.
14. A chip comprising at least one processor and an interface;
the at least one processor obtains program instructions or data through the interface;
the at least one processor is configured to execute the program instructions to implement the method of any one of claims 1-5.
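
For readers who want a concrete picture of the claimed structure, the sketch below shows one possible realization of the multi-type encoder-decoder of claims 1 and 2 in PyTorch. It is a minimal, non-authoritative illustration under assumed names: MultiTypeModel, TokenEncoder, SeqEncoder, SeqDecoder, d_model, and the mean-pooled feature fusion are all hypothetical choices made for the example, not details taken from the application, and the Token-data decoder is simplified to an RNN here even though claim 2 pairs the Token data with a Transformer-based decoder.

    # Hypothetical sketch of the multi-type encoder-decoder of claims 1-2; all names are assumptions.
    import torch
    import torch.nn as nn

    class TokenEncoder(nn.Module):
        # Transformer-based first sub-network for discrete Token sequence data (claim 2).
        def __init__(self, vocab_size, d_model=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, tokens):                      # tokens: (batch, seq_len) int64
            return self.encoder(self.embed(tokens))     # (batch, seq_len, d_model)

    class SeqEncoder(nn.Module):
        # RNN-based first sub-network for time-series or event sequence data (claim 2).
        def __init__(self, in_dim, d_model=128):
            super().__init__()
            self.rnn = nn.GRU(in_dim, d_model, batch_first=True)

        def forward(self, x):                           # x: (batch, seq_len, in_dim)
            out, _ = self.rnn(x)
            return out                                  # (batch, seq_len, d_model)

    class SeqDecoder(nn.Module):
        # Second sub-network: predicts one data type from the fused first features.
        # (Claim 2 pairs the Token data with a Transformer-based decoder; a GRU is used
        # for all three types here only to keep the sketch short.)
        def __init__(self, d_model, out_dim):
            super().__init__()
            self.rnn = nn.GRU(d_model, d_model, batch_first=True)
            self.head = nn.Linear(d_model, out_dim)

        def forward(self, fused):                       # fused: (batch, seq_len, d_model)
            out, _ = self.rnn(fused)
            return self.head(out)                       # (batch, seq_len, out_dim)

    class MultiTypeModel(nn.Module):
        # First sub-model: N per-type encoders (first network) and N per-type decoders
        # (second network); every decoder conditions on all N first features.
        def __init__(self, vocab_size, ts_dim, ev_dim, d_model=128):
            super().__init__()
            self.encoders = nn.ModuleDict({
                "token": TokenEncoder(vocab_size, d_model),
                "ts":    SeqEncoder(ts_dim, d_model),
                "event": SeqEncoder(ev_dim, d_model),
            })
            self.decoders = nn.ModuleDict({
                "token": SeqDecoder(d_model, vocab_size),
                "ts":    SeqDecoder(d_model, ts_dim),
                "event": SeqDecoder(d_model, ev_dim),
            })

        def forward(self, batch):                       # batch: dict with keys "token", "ts", "event"
            feats = {k: enc(batch[k]) for k, enc in self.encoders.items()}
            # Fuse the N first features into one shared context (simple mean-pooling here).
            context = torch.stack([f.mean(dim=1) for f in feats.values()]).mean(dim=0)
            preds = {}
            for k, dec in self.decoders.items():
                seq_len = batch[k].shape[1]
                fused = context.unsqueeze(1).repeat(1, seq_len, 1)
                preds[k] = dec(fused)                   # per-type prediction of the sample data
            return preds

In such a sketch, self-supervised training would compare each preds[k] against the corresponding input (cross-entropy for the Token sequence, a regression loss for the time-series and event data), matching the "each second sub-network predicts one type of sample data" structure of claim 1; the knowledge-derived second feature of claim 3 and the task-specific target neural network of claim 5 would then be attached on top of the first network's output.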
CN202210380734.1A 2022-04-12 2022-04-12 Model training method and device and electronic equipment Pending CN114938337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210380734.1A CN114938337A (en) 2022-04-12 2022-04-12 Model training method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114938337A true CN114938337A (en) 2022-08-23

Family

ID=82862473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210380734.1A Pending CN114938337A (en) 2022-04-12 2022-04-12 Model training method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114938337A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635918A (en) * 2018-10-30 2019-04-16 银河水滴科技(北京)有限公司 The automatic training method of neural network and device based on cloud platform and preset model
CN112580666A (en) * 2020-12-18 2021-03-30 北京百度网讯科技有限公司 Image feature extraction method, training method, device, electronic equipment and medium
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113312916A (en) * 2021-05-28 2021-08-27 北京航空航天大学 Financial text event extraction method and device based on triggered word morphological learning
CN114118417A (en) * 2022-01-28 2022-03-01 苏州浪潮智能科技有限公司 Multi-mode pre-training method, device, equipment and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117130943A (en) * 2023-10-26 2023-11-28 北京一平方科技有限公司 Test case generation and operation and maintenance data analysis method based on large language model
CN117130943B (en) * 2023-10-26 2024-02-20 北京一平方科技有限公司 Test case generation and operation and maintenance data analysis method based on large language model
CN117221839A (en) * 2023-11-09 2023-12-12 北京中科网芯科技有限公司 5G signaling identification method and system thereof
CN117221839B (en) * 2023-11-09 2024-01-16 北京中科网芯科技有限公司 5G signaling identification method and system thereof

Similar Documents

Publication Publication Date Title
US10387899B2 (en) Systems and methods for monitoring and analyzing computer and network activity
US9053436B2 (en) Methods and system for providing simultaneous multi-task ensemble learning
CN101849399B (en) System and method for rule based content filtering
CN114938337A (en) Model training method and device and electronic equipment
CN112015639B (en) Method and device for generating satellite-borne software test cases
AU2017348460A1 (en) Systems and methods for monitoring and analyzing computer and network activity
CN116508005A (en) Learning anomaly detection and root cause analysis from distributed tracking
US11513872B2 (en) System and AI pattern model for actionable alerts for events within a ChatOps platform
CN111182162A (en) Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence
CN110011990A (en) Intranet security threatens intelligent analysis method
CN116701031A (en) Root cause model training method, analysis method and device in micro-service system
US20230078134A1 (en) Classification of erroneous cell data
US20230133541A1 (en) Alert correlating using sequence model with topology reinforcement systems and methods
Xie et al. Logm: Log analysis for multiple components of hadoop platform
CN112199252A (en) Abnormity monitoring method and device and electronic equipment
CN117112909A (en) Big data recommendation method and big data mining system applied to cloud digital service
CN114090320A (en) Fault detection method and device
CN111767739A (en) PPTL3-based WeChat cluster online monitoring method and system
US20230376372A1 (en) Multi-modality root cause localization for cloud computing systems
CN116502162A (en) Abnormal computing power federal detection method, system and medium in edge computing power network
JI et al. Log Anomaly Detection Through GPT-2 for Large Scale Systems
Loyola et al. Learning feature representations from change dependency graphs for defect prediction
Fan et al. An integrated interactive framework for natural language to SQL translation
CN110309311A (en) A kind of event handling strategy determines method and device
Xiong et al. Deep Learning Neural Networks with Auto-adjustable Attention Mechanism for Server Fault Diagnosis Under Log Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination