CN113839794B - Data processing method, device, equipment and storage medium - Google Patents

Publication number
CN113839794B
Authority
CN
China
Prior art keywords
user
user flow
label
user traffic
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010514076.1A
Other languages
Chinese (zh)
Other versions
CN113839794A (en)
Inventor
李唯源
李琴
朱艳宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN202010514076.1A
Publication of CN113839794A
Application granted
Publication of CN113839794B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/29 Flow control; Congestion control using a combination of thresholds

Abstract

The invention discloses a data processing method, a data processing device, data processing equipment and a storage medium. The method comprises: acquiring a first request sent by a Network Function (NF) entity; sending a second request to a Session Management Function (SMF) entity based on the first request; receiving the user traffic to be processed sent by the SMF entity; determining the type corresponding to the user traffic to be processed by using a first recognition model, where the first recognition model is obtained by training a second recognition model at least once using at least one first user traffic carrying a first label and at least one second user traffic carrying a second label, and the second recognition model is obtained by training a preset recognition model using the at least one second user traffic carrying the second label; analyzing the type corresponding to the user traffic to be processed to obtain an analysis result; and sending the analysis result to the NF entity.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of wireless technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
In the fifth-generation (5G) mobile communication system, a Network Data Analytics Function (NWDAF) entity is introduced to analyze data in the network. Generally, the NWDAF entity may analyze user data monitored by other network entities in the network. For example, the NWDAF entity may acquire the location information of a terminal from other network entities and use it to predict the movement track of the terminal. However, the related art does not provide a technical solution in which an NWDAF entity analyzes the type of user traffic generated by a terminal.
Disclosure of Invention
In view of this, embodiments of the present invention are intended to provide a data processing method, apparatus, device, and storage medium.
The technical scheme of the embodiment of the invention is realized as follows:
at least one embodiment of the present invention provides a data processing method applied to an NWDAF entity, the method including:
acquiring a first request sent by a Network Function (NF) entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed;
sending a second request to a Session Management Function (SMF) entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
determining the type corresponding to the user flow to be processed by utilizing a first identification model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; a second tag carried by the second user traffic is acquired from an Application Function (AF) entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label;
analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
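The four operations above can be pictured as a single request handler running on the NWDAF entity. The following is a minimal sketch only: the function name, the dictionary keys, and the three callables (`fetch_from_smf`, `recognise`, `analyse`) are hypothetical stand-ins introduced for illustration and are not service operations defined by the source.

```python
def handle_first_request(first_request, fetch_from_smf, recognise, analyse):
    """Process one analysis request from an NF entity.

    All three callables are hypothetical stand-ins, not real network APIs.
    """
    flow_id = first_request["flow_id"]      # identification information in the first request
    flow = fetch_from_smf(flow_id)          # second request: obtain the traffic from the SMF entity
    traffic_type = recognise(flow)          # first recognition model determines the type
    analysis = analyse(traffic_type)        # analyze the determined type
    # The returned dictionary plays the role of the analysis result sent to the NF entity.
    return {"flow_id": flow_id, "type": traffic_type, "analysis": analysis}


# Usage with stub callables standing in for the SMF entity and the model.
reply = handle_first_request(
    {"flow_id": "flow-1"},
    fetch_from_smf=lambda fid: {"id": fid, "dst_port": 443},
    recognise=lambda flow: "video" if flow["dst_port"] == 443 else "other",
    analyse=lambda t: {"type": t, "count": 1},
)
```

Passing the SMF access and the model in as callables keeps the sketch independent of any particular transport or model library.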
Further, in accordance with at least one embodiment of the present invention, the method further comprises:
sending a third request to the SMF entity; the third request is used for requesting to acquire at least one first user flow and at least one second user flow; and sending a fourth request to the AF entity; the fourth request is used for requesting to acquire second tags respectively corresponding to the at least one second user traffic;
receiving at least one first user flow and at least one second user flow sent by the SMF entity; receiving second labels respectively corresponding to at least one second user flow sent by the AF entity;
training a preset recognition model by using at least one second user flow carrying a second label to obtain a second recognition model; respectively identifying the at least one first user flow by using the second identification model to obtain first labels respectively corresponding to the at least one first user flow;
and training the second recognition model by using the at least one first user flow carrying the first label and the at least one second user flow carrying the second label to obtain the first recognition model.
Furthermore, according to at least one embodiment of the present invention, the training the second recognition model by using the at least one first user traffic carrying the first label and the at least one second user traffic carrying the second label to obtain the first recognition model includes:
selecting, based on a preset rule, at least one third user flow meeting a preset condition from a first set, and excluding the selected at least one third user flow from the first set; the first set is made up of the at least one first user flow;
training the second recognition model by using the selected at least one third user flow, the corresponding first label and the at least one second user flow carrying the second label to obtain the first recognition model;
and repeating the selecting and the training until the first set is empty or the number of iterations of training the second recognition model is equal to a count threshold.
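The iteration described above resembles a self-training loop. The sketch below is illustrative only: it swaps in a toy nearest-centroid classifier for the recognition model, selects a fixed-size batch per round, and keeps previously selected flows in later rounds. The names `fit_centroids`, `predict`, `self_train`, `batch_size` and `max_rounds` are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples):
    """Fit a toy nearest-centroid classifier from (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def self_train(labeled, unlabeled, batch_size=2, max_rounds=10):
    """Iterate until the first set is empty or the round limit is reached."""
    model = fit_centroids(labeled)  # stands in for the second recognition model
    # First labels are produced once by the second recognition model.
    first_set = [(features, predict(model, features)) for features in unlabeled]
    selected = []
    rounds = 0
    while first_set and rounds < max_rounds:
        # Select a batch of third user flows and exclude it from the first set.
        batch, first_set = first_set[:batch_size], first_set[batch_size:]
        selected.extend(batch)
        # Retrain on labeled data plus everything selected so far (an assumption).
        model = fit_centroids(labeled + selected)
        rounds += 1
    return model, rounds


labeled = [((0.0, 0.0), "video"), ((10.0, 10.0), "web")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (0.5, 0.2)]
model, rounds = self_train(labeled, unlabeled)
```

With three unlabeled flows and a batch size of two, the loop runs twice and then stops because the first set is empty.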
Furthermore, according to at least one embodiment of the present invention, the selecting, based on the preset rule, at least one third user traffic meeting the preset condition from the first set includes:
selecting a preset number of user flows from the first set;
and taking the selected at least one user flow as at least one third user flow meeting the preset condition.
Furthermore, according to at least one embodiment of the present invention, the selecting, based on a preset rule, at least one third user traffic satisfying a preset condition from the first set includes:
determining a confidence corresponding to each first user traffic in the first set;
sorting the first user flows in the first set by confidence to obtain a sorting result;
selecting, from the sorting result, at least one first user flow whose confidence is greater than a confidence threshold;
and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
Furthermore, in accordance with at least one embodiment of the present invention, in training the second recognition model once, the method further comprises:
adjusting the confidence threshold value from a first value to a second value;
wherein the first value is greater than the second value.
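As an illustration of confidence-based selection with a decreasing threshold, the sketch below sorts the first set by confidence, keeps the flows above the threshold, and lowers the threshold after a training round. The fixed `step` and the `floor` are assumptions; the source only states that the threshold moves from a larger first value to a smaller second value.

```python
def select_confident(first_set, threshold):
    """Split (flow_id, confidence) pairs into selected and remaining flows."""
    ranked = sorted(first_set, key=lambda item: item[1], reverse=True)
    selected = [item for item in ranked if item[1] > threshold]
    remaining = [item for item in ranked if item[1] <= threshold]
    return selected, remaining


def lower_threshold(threshold, step=0.1, floor=0.5):
    """Move the threshold to a smaller value; step and floor are assumed."""
    return max(floor, threshold - step)


flows = [("a", 0.95), ("b", 0.60), ("c", 0.85)]
selected, remaining = select_confident(flows, threshold=0.8)
next_threshold = lower_threshold(0.8)
```

Starting with a strict threshold admits only the most reliable pseudo-labels first; relaxing it later lets the remaining flows enter once the model has improved.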
Furthermore, in accordance with at least one embodiment of the present invention, the determining the confidence level corresponding to the respective first user traffic includes:
searching a second set, based on the first label and the generation time of the corresponding first user flow, for a second user flow that matches the first label and the generation time; the second set is composed of the at least one second user flow carrying the second label; determining a first numerical value based on one of the source port and the destination port of the found second user flow; the first numerical value characterizes a degree of association between the corresponding first user flow and the second user flow;
determining an identification probability of the corresponding first user traffic and determining a second numerical value based on the identification probability; the second numerical value characterizes an identification accuracy of a first tag corresponding to the respective first user traffic;
determining a first feature vector of a corresponding first user traffic; determining a second feature vector using the at least one second user traffic; determining the Euclidean distance between the first feature vector and the second feature vector; determining a third value based on the determined euclidean distance; the third value characterizes a similarity of the respective first user traffic to the at least one second user traffic;
and determining the confidence corresponding to the corresponding first user flow based on the first numerical value, the second numerical value, the third numerical value, the first weight corresponding to the first numerical value, the second weight corresponding to the second numerical value, and the third weight corresponding to the third numerical value.
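One way to read the combination above is as a weighted sum of three scores. The sketch below is an assumption-laden illustration: the source does not specify how the port comparison, the recognition probability, or the Euclidean distance are mapped to numbers, so the 0/1 port score, the direct use of the probability as the second value, and the `1 / (1 + d)` similarity are all hypothetical choices.

```python
from math import dist


def association_score(first_flow, matched_second_flow):
    """First value: 1.0 if a port matches the found labeled flow (assumed mapping)."""
    if matched_second_flow is None:
        return 0.0
    same_src = first_flow["src_port"] == matched_second_flow["src_port"]
    same_dst = first_flow["dst_port"] == matched_second_flow["dst_port"]
    return 1.0 if (same_src or same_dst) else 0.0


def similarity_score(first_vector, second_vector):
    """Third value: maps the Euclidean distance into (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + dist(first_vector, second_vector))


def confidence(v1, v2, v3, w1, w2, w3):
    """Weighted combination of the three values (a weighted sum is assumed)."""
    return w1 * v1 + w2 * v2 + w3 * v3


first = {"src_port": 5000, "dst_port": 443}
second = {"src_port": 6000, "dst_port": 443}
v1 = association_score(first, second)
v2 = 0.9  # recognition probability reported by the model, used directly
v3 = similarity_score((0.0, 0.0), (3.0, 4.0))
score = confidence(v1, v2, v3, w1=0.2, w2=0.5, w3=0.3)
```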
Furthermore, in accordance with at least one embodiment of the present invention, in training the second recognition model once, the method further comprises:
determining a training result for training the second recognition model;
when the accuracy of the training result is smaller than an accuracy threshold, adjusting the proportion of the first weight among the first weight, the second weight and the third weight;
and recalculating the confidence corresponding to the corresponding first user flow by using the adjusted first weight.
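A hedged sketch of the weight adjustment follows. The source does not say in which direction or by how much the first weight changes, so the signed `delta` and the renormalization that keeps the three weights summing to one are assumptions made for illustration.

```python
def adjust_first_weight(weights, accuracy, accuracy_threshold, delta=0.1):
    """Change the share of the first weight when accuracy is below the threshold.

    delta may be positive or negative; its sign and size are assumptions.
    """
    if accuracy >= accuracy_threshold:
        return weights
    w1, w2, w3 = weights
    w1 = max(0.0, w1 + delta)
    total = w1 + w2 + w3
    if total == 0:
        return weights  # nothing sensible to renormalize
    # Renormalize so the three weights still sum to one.
    return (w1 / total, w2 / total, w3 / total)


new_weights = adjust_first_weight((0.2, 0.5, 0.3), accuracy=0.7, accuracy_threshold=0.8)
```

After the adjustment, the confidence of each remaining first user flow would be recomputed with the new weights before the next selection round.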
Furthermore, according to at least one embodiment of the present invention, the selecting, based on a preset rule, at least one third user traffic satisfying a preset condition from the first set includes:
for each first user traffic in the first set, dividing at least one first user traffic with the same first label into a group to obtain at least one group of user traffic;
and selecting at least one third user flow meeting preset conditions from the at least one group of user flows.
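Grouping by first label can be sketched as below. Picking the highest-confidence flows from every group is an assumed selection condition, chosen here so that each traffic type contributes to a training round; the tuple layout and the `per_group` parameter are likewise hypothetical.

```python
from collections import defaultdict


def group_by_first_label(first_set):
    """Group (flow_id, first_label, confidence) tuples by their first label."""
    groups = defaultdict(list)
    for flow in first_set:
        groups[flow[1]].append(flow)
    return dict(groups)


def pick_per_group(groups, per_group=1):
    """Take the most confident flows from each group (assumed condition)."""
    picked = []
    for label in sorted(groups):
        ranked = sorted(groups[label], key=lambda flow: flow[2], reverse=True)
        picked.extend(ranked[:per_group])
    return picked


first_set = [
    ("f1", "video", 0.9),
    ("f2", "web", 0.7),
    ("f3", "video", 0.6),
    ("f4", "web", 0.8),
]
groups = group_by_first_label(first_set)
third_flows = pick_per_group(groups)
```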
Furthermore, according to at least one embodiment of the present invention, the training the second recognition model by using the selected at least one third user traffic and the corresponding first label, and at least one second user traffic carrying a second label includes:
updating the parameters of the network structure of the second recognition model to obtain an updated second recognition model;
and training the updated second recognition model by using the selected at least one third user flow and the at least one second user flow carrying the second label in the second set.
At least one embodiment of the present invention provides a data processing apparatus including:
the acquisition unit is used for acquiring a first request sent by the NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed; sending a second request to the SMF entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
the first processing unit is used for determining the type corresponding to the user flow to be processed by utilizing a first recognition model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; a second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label;
the second processing unit is used for analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
At least one embodiment of the present invention provides a network device, including:
the communication interface is used for acquiring a first request sent by the NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed; sending a second request to the SMF entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
the processor is used for determining the type corresponding to the user flow to be processed by utilizing the first recognition model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; the second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label; the processor is also used for analyzing the type corresponding to the user flow to be processed to obtain an analysis result, and sending the analysis result to the NF entity.
At least one embodiment of the invention provides a network device comprising a processor and a memory storing a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of any of the above methods when running the computer program.
At least one embodiment of the invention provides a storage medium having a computer program stored thereon, wherein the computer program is configured to perform the steps of any of the methods described above when executed by a processor.
The embodiments of the invention provide a data processing method, a data processing device, data processing equipment and a storage medium. The NWDAF entity can determine the type corresponding to the user traffic to be processed by using the first recognition model and analyze that type. To obtain the first recognition model, the NWDAF entity trains the preset recognition model by using at least one second user traffic acquired from the SMF entity and the second labels, acquired from the AF entity, respectively corresponding to the at least one second user traffic, which yields the second recognition model. It then applies the second recognition model to at least one first user traffic acquired from the SMF entity to obtain the first label corresponding to each first user traffic. The first user traffic carrying the first labels thus expands the set of second user traffic carrying the second labels, and the second recognition model is trained with the expanded user traffic to obtain the first recognition model.
Drawings
FIG. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating an implementation of a first recognition model obtained by training a second recognition model by an NWDAF entity according to an embodiment of the present invention;
FIG. 3a is a schematic flow chart illustrating an implementation process of the NWDAF entity training the preset recognition model by using the second user traffic in the labeled data set to obtain a second recognition model according to the embodiment of the present invention;
FIG. 3b is a schematic flow chart illustrating an implementation of the NWDAF entity determining the first label corresponding to the first user traffic according to the embodiment of the present invention;
FIG. 4 is a first flowchart illustrating an implementation process of the NWDAF entity selecting at least one third user traffic satisfying a preset condition from the first set according to the embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating an implementation process of the NWDAF entity selecting at least one third user traffic satisfying the preset condition from the first set according to the embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating an implementation process of selecting at least one third user traffic meeting a preset condition from the at least one first user traffic whose confidence is greater than the confidence threshold according to the embodiment of the present invention;
FIG. 7 is a schematic flow chart illustrating an implementation process of the NWDAF entity selecting at least one third user traffic satisfying the preset condition from the first set according to the embodiment of the present invention;
FIG. 8 is a schematic flow chart illustrating an implementation of iterative training of a second recognition model by an NWDAF entity in accordance with an embodiment of the present invention;
FIG. 9 is a schematic flow chart illustrating an implementation of an interaction between an NWDAF entity and other entities in a network in accordance with an embodiment of the present invention;
FIG. 10 is a schematic flow chart illustrating an implementation of the NWDAF entity obtaining at least one first user traffic and at least one second user traffic from the SMF entity according to the embodiment of the present invention;
FIG. 11 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
Before the technical solutions of the embodiments of the present invention are introduced, the related art is described.
In the related art, user traffic analysis is an essential step for an operator to realize intelligent operation and maintenance, and can support the operator in performing customized mobility management, traffic routing regulation, service quality improvement and the like. User traffic analysis refers to identifying the type of user traffic generated by an application program of the terminal, and performing statistics and prediction on the identified type, namely traffic type identification plus traffic statistics and prediction. In the fifth-generation (5G) mobile communication system, an NWDAF entity is introduced, which has the function of analyzing data in the network. Generally, the NWDAF entity may analyze user data monitored by other network entities in the network. For example, the NWDAF entity may acquire the location information of a terminal from other network entities and use it to predict the movement track of the terminal. However, the related art does not provide a technical solution in which an NWDAF entity analyzes the type of user traffic generated by a terminal.
Based on this, in various embodiments of the present invention, a first request sent by the NF entity is obtained; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed; sending a second request to the SMF entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity; determining the type corresponding to the user flow to be processed by utilizing the user flow to be processed and combining a first recognition model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; a second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label; analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
An embodiment of the present invention provides a data processing method applied to an NWDAF entity, and as shown in fig. 1, the method includes:
step 101: acquiring a first request sent by an NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed;
step 102: sending a second request to the SMF entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
step 103: determining the type corresponding to the user flow to be processed by utilizing a first recognition model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; the second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label;
step 104: analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
Here, in step 101, the User traffic to be processed may refer to a data stream transmitted between an application of the terminal and the network, such as a Transmission Control Protocol (TCP) stream and a User Datagram Protocol (UDP) stream. The Identification information of the user traffic to be processed may include an Identification (ID) corresponding to the user traffic to be processed, an analysis index, and a user or a user group corresponding to the user traffic to be processed. The analysis index can be used for identifying, counting and predicting the type of the user traffic to be processed.
Here, in step 102, the SMF entity may be configured to store user traffic generated by a terminal, so that after the NWDAF entity establishes a Protocol Data Unit (PDU) session with the SMF entity, the NWDAF entity may obtain the user traffic to be processed from PDU session information sent by the SMF entity. The user traffic to be processed may include data such as the packet port number, the packet size, and the number of packets corresponding to the user traffic.
Here, in step 103, the related art trains the recognition model with only a small amount of labeled data, which leads to low recognition accuracy. To avoid this problem, the labeled data may be expanded with unlabeled data, and the recognition model may be trained with the expanded data. To make this expansion possible while reducing the cost of labeling the unlabeled data, the second labels corresponding to the at least one second user traffic may be obtained from the AF entity, and the preset recognition model may be trained with the at least one second user traffic carrying the second labels to obtain the second recognition model. The second recognition model may then be used to recognize the at least one first user traffic and obtain the corresponding first labels, so that the at least one first user traffic carrying the first labels expands the second user traffic carrying the second labels and the recognition accuracy of the second recognition model is improved. The first label may refer to an application type corresponding to the first user traffic; the second label may refer to an application type corresponding to the second user traffic. The first recognition model and the second recognition model may be neural network models, which have strong recognition capability and can realize fine-grained recognition.
Here, in step 104, the analyzing the type corresponding to the user traffic to be processed may refer to counting a generation cycle, a port number, and the like of the user traffic corresponding to the type within a certain time range, and predicting a probability that the user traffic corresponding to the type is generated in a next cycle by using the counted data.
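The statistics and prediction in step 104 could, for instance, be approximated by counting the periods in which the type was observed and using that empirical frequency as the probability for the next period. This estimator, the tuple layout of `observations`, and the parameter names are assumptions; the source does not specify a prediction method.

```python
def analyse_type(observations, traffic_type, period_s, window_s):
    """Summarise one traffic type over a time window.

    observations: iterable of (timestamp_s, traffic_type, dst_port) tuples.
    """
    n_periods = int(window_s // period_s)
    if n_periods <= 0:
        raise ValueError("window_s must cover at least one period")
    active_periods = set()
    ports = set()
    for timestamp_s, observed_type, dst_port in observations:
        in_window = 0 <= timestamp_s < n_periods * period_s
        if observed_type == traffic_type and in_window:
            active_periods.add(int(timestamp_s // period_s))
            ports.add(dst_port)
    return {
        "periods_with_traffic": len(active_periods),
        "ports": sorted(ports),
        # Empirical frequency used as a naive estimate for the next period.
        "next_period_probability": len(active_periods) / n_periods,
    }


observations = [(5, "video", 443), (65, "video", 443), (70, "web", 80), (190, "video", 8443)]
result = analyse_type(observations, "video", period_s=60, window_s=240)
```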
In practical application, in order to train the second recognition model to obtain the first recognition model, the NWDAF entity may obtain training data required for training the second recognition model from other network element entities; the training data at least comprises labeled training data. Therefore, the NWDAF entity may obtain at least one first user traffic and at least one second user traffic from the SMF entity, obtain a second label corresponding to each of the at least one second user traffic from the AF entity, form a labeled data set using the at least one second user traffic and the corresponding second label, and form an unlabeled data set using the at least one first user traffic.
Based on this, in an embodiment, the method further comprises:
sending a third request to the SMF entity; the third request is used for requesting to acquire at least one first user flow and at least one second user flow; and sending a fourth request to the AF entity; the fourth request is used for requesting to obtain second tags respectively corresponding to the at least one second user flow;
receiving at least one first user flow and at least one second user flow sent by the SMF entity; receiving second labels respectively corresponding to at least one second user flow sent by the AF entity;
training a preset recognition model by using at least one second user flow carrying a second label to obtain a second recognition model; respectively identifying the at least one first user flow by using the second identification model to obtain first tags respectively corresponding to the at least one first user flow;
and training the second recognition model by using the at least one first user flow carrying the first label and the at least one second user flow carrying the second label to obtain the first recognition model.
Here, at least one first user flow and at least one second user flow sent by the SMF entity are received; after receiving the second labels respectively corresponding to the at least one second user traffic sent by the AF entity, the NWDAF entity may generate an unlabeled data set based on the at least one first user traffic, and generate a labeled data set based on the at least one second user traffic carrying the second labels.
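The construction of the two data sets can be sketched as a join between the traffic received from the SMF entity and the labels received from the AF entity. The assumption here is that both are keyed by a shared flow identifier and that a flow counts as "second user traffic" exactly when the AF entity supplied a label for it; neither detail is stated in the source.

```python
def build_data_sets(smf_flows, af_labels):
    """Split SMF traffic into a labeled and an unlabeled data set.

    smf_flows: dict mapping flow_id to a feature tuple (hypothetical layout).
    af_labels: dict mapping flow_id to an application-type label.
    """
    labeled = [
        (features, af_labels[flow_id])
        for flow_id, features in smf_flows.items()
        if flow_id in af_labels
    ]
    unlabeled = [
        features
        for flow_id, features in smf_flows.items()
        if flow_id not in af_labels
    ]
    return labeled, unlabeled


smf_flows = {"f1": (443, 1200), "f2": (80, 300), "f3": (443, 1100)}
af_labels = {"f1": "video", "f2": "web"}
labeled_set, unlabeled_set = build_data_sets(smf_flows, af_labels)
```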
In one example, as shown in fig. 2, a process of training the second recognition model to obtain the first recognition model by the NWDAF entity is described, which includes:
step 201: the NWDAF entity obtains at least one first user flow and at least one second user flow from the SMF entity;
Step 202: the NWDAF entity obtains the second labels respectively corresponding to the at least one second user flow from the AF entity.
Step 203: the NWDAF entity generates an unlabeled data set based on the at least one first user traffic and generates a labeled data set based on at least one second user traffic carrying a second label.
Step 204: the NWDAF entity trains the preset recognition model by using the second user flow in the labeled data set to obtain a second recognition model.
Here, as shown in fig. 3a, the NWDAF entity trains the preset recognition model by using the second user traffic in the tagged data set to obtain a second recognition model, including the following steps:
step 1: aiming at each second user flow in the labeled data set, constructing a feature vector corresponding to the corresponding second user flow by using the relevant information of the corresponding second user flow to obtain at least one feature vector;
the related information of the second user traffic may refer to a message port number, a message size, a message arrival interval time, a message number, and the like corresponding to the user traffic.
Here, the NWDAF entity may further send a request to the AMF entity to obtain TAC information sent by the AMF entity; and the TAC information represents the model of the terminal corresponding to the traffic of the user to be processed. The TAC information may be used to generate a feature vector corresponding to the second user traffic.
Step 2: taking the at least one feature vector and the corresponding second labels as input data, and training the preset recognition model until a classification model reaching the expected recognition accuracy, namely the second recognition model, is obtained.
The Network structure of the preset recognition model may be a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN).
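As an illustration of step 1 above, the following minimal sketch builds a feature vector from the kinds of information listed (port number, message size, message arrival interval, message count). The function name `build_feature_vector`, the `(arrival_time, size)` packet layout and the use of mean statistics are assumptions made for this example; TAC information is omitted for brevity.

```python
from statistics import mean


def build_feature_vector(port, packets):
    """Build [port, mean size, mean inter-arrival time, packet count] for one flow.

    `packets` is a list of (arrival_time_seconds, size_bytes) tuples. This
    layout and the choice of mean statistics are assumptions for illustration.
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    times = sorted(t for t, _ in packets)
    sizes = [size for _, size in packets]
    # Gaps between consecutive arrivals; empty when the flow has one packet.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = mean(gaps) if gaps else 0.0
    return [float(port), float(mean(sizes)), float(mean_gap), float(len(packets))]


vector = build_feature_vector(443, [(0.0, 100), (0.5, 300), (1.5, 200)])
```

A vector of this kind could then be paired with its second label and fed to whichever network structure is chosen for the preset recognition model.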
Step 205: the NWDAF entity identifies the first user traffic in the unlabeled data set by using the second identification model to obtain an application type corresponding to the first user traffic; and taking the application type as a first label corresponding to the first user traffic.
Here, as shown in fig. 3b, the NWDAF entity determining the first label corresponding to the first user traffic includes the following steps:
step 1: for each first user flow in the non-tag data set, constructing a feature vector corresponding to the corresponding first user flow by using relevant information of the corresponding first user flow to obtain at least one feature vector;
Step 2: taking the at least one feature vector as test data, and identifying the test data by using the second recognition model to obtain the application types respectively corresponding to the at least one first user flow; the application types are used as the first labels, also called pseudo labels, corresponding to the first user flows.
Step 206: the NWDAF entity trains the second recognition model by utilizing at least one first user flow carrying the first label and at least one second user flow carrying the second label to obtain a first recognition model.
It should be noted that steps 201 to 206 may be performed by an AI analysis module in the NWDAF entity.
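Steps 204 to 206 can be sketched as follows. This is only an illustrative sketch: a nearest-centroid classifier stands in for the preset recognition model (which the description says may be a CNN or an RNN), and the names `train_centroids`, `predict` and `pseudo_label_training`, as well as the sample data, are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Train a stand-in nearest-centroid model from (feature_vector, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], vec))


def pseudo_label_training(labeled, unlabeled):
    # Step 204: train on the labeled data set only -> second recognition model.
    second_model = train_centroids(labeled)
    # Step 205: the predicted application type becomes the first (pseudo) label.
    pseudo = [(vec, predict(second_model, vec)) for vec in unlabeled]
    # Step 206: retrain on labeled plus pseudo-labeled data -> first recognition model.
    first_model = train_centroids(labeled + pseudo)
    return first_model, pseudo


labeled = [([80.0, 1.0], "video"), ([82.0, 1.2], "video"), ([10.0, 9.0], "chat")]
unlabeled = [[79.0, 0.9], [12.0, 8.5]]
first_model, pseudo = pseudo_label_training(labeled, unlabeled)
```

The stand-in classifier keeps the example self-contained; the three-stage structure (train, pseudo-label, retrain) is the part that mirrors the described steps.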
Here, the NWDAF entity trains the second recognition model to obtain the first recognition model, which has the following advantages:
the second recognition model is trained by a semi-supervised transductive classification method to obtain the first recognition model. That is, a preset recognition model is first trained by using the at least one second user flow acquired from the SMF entity and the second labels, acquired from the AF entity, respectively corresponding to the at least one second user flow, to obtain the second recognition model. The second recognition model is then used to identify the at least one first user flow acquired from the SMF entity, to obtain the first label corresponding to each first user flow. The first user flows carrying the first labels thus expand the set of second user flows carrying the second labels, and the second recognition model is trained by using the expanded user flows to obtain the first recognition model. Because the first labels are obtained through the second recognition model, the unlabeled first user traffic does not require separate labeling; the first user traffic and the corresponding first labels are fully used to expand the labeled second user traffic, so that the classification accuracy of the second recognition model can be improved and the labeling cost can be reduced.
In practical application, the second recognition model is obtained by training with the labeled data, so the labels it produces for the unlabeled data may not be accurate enough, which in turn may lower the recognition accuracy of the first recognition model. To avoid this problem, each time the second recognition model is trained, a part of the data may be selected from the unlabeled data set and combined with the data in the labeled data set to train the second recognition model. The unlabeled data set may consist of at least one first user traffic; the labeled data set may consist of at least one second user traffic carrying a second label.
Based on this, in an embodiment, the training the second recognition model by using the at least one first user traffic carrying the first label and the at least one second user traffic carrying the second label to obtain the first recognition model includes:
based on a preset principle, at least one third user flow meeting a preset condition is selected from the first set; excluding the at least one third user traffic from the first set; the first set is made up of at least one first user traffic;
training the second recognition model by using the selected at least one third user flow, the corresponding first label and the at least one second user flow carrying the second label to obtain a first recognition model;
and so on until the first set is empty, or the iteration number of training the second recognition model is equal to the threshold number.
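The loop described above (select, exclude, train, repeat until the first set is empty or the iteration threshold is reached) can be sketched as follows. The function `iterative_training` and its `select` and `train` callbacks are hypothetical placeholders for the selection principle and the model training; the early stop when nothing qualifies is an added assumption.

```python
def iterative_training(first_set, second_set, select, train, max_iterations):
    """Repeat select -> exclude -> train until first_set is empty or the cap is hit.

    select(remaining, model) returns the third user flows chosen this round;
    train(training_data) returns a model. Both are caller-supplied placeholders.
    """
    remaining = list(first_set)
    training_data = list(second_set)
    model = train(training_data)  # initial model trained on labeled data only
    iterations = 0
    while remaining and iterations < max_iterations:
        third = select(remaining, model)
        if not third:
            break  # assumption: stop early when no flow satisfies the condition
        # Exclude the selected flows from the first set.
        remaining = [flow for flow in remaining if flow not in third]
        # The selected flows join the labeled data used for the next round.
        training_data.extend(third)
        model = train(training_data)
        iterations += 1
    return model, remaining, iterations


# Toy usage: the "model" is just the training-set size, three flows per round.
result = iterative_training(
    list(range(7)), ["a", "b"],
    select=lambda remaining, model: remaining[:3],
    train=lambda data: len(data),
    max_iterations=10,
)
```

In the toy usage the loop ends because the first set becomes empty; lowering `max_iterations` makes it end on the iteration threshold instead.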
The following describes how to select at least one third user traffic satisfying the preset condition from the first set.
Case 1: based on a batch selection principle, at least one third user flow meeting a preset condition is selected from the first set.
In practical application, when a second recognition model is trained once, the input data of the second recognition model may refer to batch data (batch) formed by a plurality of user flows, so that a preset number of first user flows may be selected from the first set, and the second recognition model is trained by using the selected preset number of first user flows, the corresponding first tags, and at least one second user flow carrying the second tags.
Based on this, in an embodiment, the selecting, based on the preset rule, at least one third user traffic meeting the preset condition from the first set includes:
selecting at least one user flow of a preset quantity from the first set;
and taking the selected at least one user flow as at least one third user flow meeting the preset condition.
Here, the preset number may be equal to T; wherein T is equal to integer multiple of N, T and N are positive integers, and N represents the number of user traffic contained in one batch.
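A minimal sketch of this batch selection, assuming flows are simply taken from the front of the first set (the description does not say which flows are picked). The function name `select_batch` and the parameter `num_batches` (the integer multiple relating T to N) are hypothetical.

```python
def select_batch(first_set, batch_size, num_batches=1):
    """Pick T = num_batches * batch_size flows; return (selected, rest).

    Taking flows from the front of the list is an assumption; if fewer than
    T flows remain, all of them are returned.
    """
    if batch_size <= 0 or num_batches <= 0:
        raise ValueError("batch_size and num_batches must be positive integers")
    t = batch_size * num_batches
    return first_set[:t], first_set[t:]


selected, rest = select_batch(list(range(10)), batch_size=4, num_batches=2)
```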
In an example, as shown in fig. 4, a process of the NWDAF entity selecting at least one third user traffic satisfying a preset condition from the first set is described, including:
Step 401: training the preset recognition model by using the second user flow in the second set to obtain a second recognition model.
The second set may be referred to as a tagged data set, and may be composed of at least one second user traffic carrying a second tag.
Step 402: identifying the first user traffic in the first set by using the second identification model to obtain an application type corresponding to the first user traffic; and taking the application type as a first label corresponding to the first user traffic.
Step 403: selecting at least one user flow of a preset quantity from the first set; and taking the selected at least one user flow as at least one third user flow meeting the preset condition.
The first set may be referred to as an unlabeled data set, and may be composed of at least one first user traffic.
Then, the second recognition model is trained by using the selected at least one third user flow, the corresponding first label and the at least one second user flow carrying the second label, to obtain the first recognition model.
Here, the NWDAF entity selects at least one third user traffic satisfying the preset condition from the first set, which has the following advantages:
according to a batch selection principle, at least one third user flow meeting preset conditions is selected from a first set, namely a non-tag data set, and the selected at least one third user flow is transferred to a second set, namely a tag data set, so that input data for training a second recognition model can be expanded, and therefore the accuracy of the second recognition model in recognizing subsequent user flows can be improved by utilizing the expanded input data to train the second recognition model.
Case 2: based on a confidence principle, at least one third user flow meeting the preset condition is selected from the first set.
In practical application, when a second recognition model is trained for one time, the more accurate a label corresponding to input data of the second recognition model is, the higher the recognition accuracy of a first recognition model obtained by training the second recognition model is, so that the confidence degree corresponding to at least one first user flow can be determined, at least one first user flow with the confidence degree larger than the confidence degree threshold value is selected from a first set based on the determined confidence degree, and the second recognition model is trained by using at least one first user flow with the confidence degree larger than the confidence degree threshold value, the corresponding first label and at least one second user flow carrying the second label.
Based on this, in an embodiment, the selecting, based on a preset rule, at least one third user traffic meeting a preset condition from the first set includes:
determining a confidence corresponding to each first user traffic in the first set;
sequencing each first user flow in the first set according to the confidence coefficient to obtain a sequencing result;
selecting at least one first user flow with the confidence degree larger than a confidence degree threshold value from the sequencing result;
and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
It should be noted that, in the initial stage of training the second recognition model, in order to avoid introducing the first user traffic with the first label being not accurate enough, the confidence threshold is set to be higher, but as the number of times of iterative training performed on the second recognition model increases, the recognition capability and the recognition accuracy of the second recognition model are improved, so that in the subsequent stage of training the second recognition model, a "dynamically updating confidence threshold" strategy may be adopted to improve the iteration speed of training the second recognition model.
Based on this, in an embodiment, when the second recognition model is trained once, the method further includes:
adjusting the confidence threshold value from a first value to a second value;
wherein the first value is greater than the second value.
For example, assuming that the number of iterations for training the second recognition model is denoted by i, the confidence threshold drops by j% each time i is incremented by 1. The adjusted confidence threshold may also be determined according to the number of first user traffic carrying the first label that is added in each iteration; that is, each time the second recognition model is trained, if the number of newly added first user traffic carrying the first label decreases, the confidence threshold may be decreased by j%. The adjusted confidence threshold may also be determined according to the result of each training of the second recognition model; that is, if the recognition accuracy of the second recognition model becomes higher, the confidence threshold may be decreased by j%.
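The iteration-based variant of the "dynamically updating confidence threshold" strategy can be sketched as below. The sketch assumes that "drops by j%" means a relative decrease per iteration (it could also be read as j percentage points); the function name `decayed_threshold` and the `floor` parameter are hypothetical additions.

```python
def decayed_threshold(initial, j_percent, iteration, floor=0.0):
    """Confidence threshold after `iteration` updates.

    Each update lowers the threshold by j_percent percent of its current
    value (relative decay, an assumption). `floor` is a hypothetical lower
    bound that keeps the threshold from decaying towards zero.
    """
    if not 0.0 <= j_percent < 100.0:
        raise ValueError("j_percent must be in [0, 100)")
    value = initial * (1.0 - j_percent / 100.0) ** iteration
    return max(value, floor)


# With an initial threshold of 0.9 and j = 10, two iterations give about 0.729.
threshold = decayed_threshold(0.9, 10, 2)
```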
In practical application, if the confidence corresponding to the first user traffic is higher, the accuracy of the first tag of the first user traffic is higher, so that at least one first user traffic carrying the first tag is used to expand at least one second user traffic carrying the second tag, and the accuracy of training the second recognition model is higher by using the expanded user traffic.
Based on this, in an embodiment, the determining the confidence level corresponding to the corresponding first user traffic includes:
searching second user flow matched with the first label and the generation time from a second set based on the first label and the generation time of the corresponding first user flow; the second set is composed of at least one second user traffic carrying a second label; determining a first numerical value based on one of the found source port and destination port of the second user flow; the first value characterizes a degree of association between the respective first user traffic and second user traffic;
determining an identification probability of the corresponding first user traffic and determining a second numerical value based on the identification probability; the second numerical value characterizes an identification accuracy of a first tag corresponding to the respective first user traffic;
determining a first feature vector of a corresponding first user traffic; determining a second feature vector using the at least one second user traffic; determining Euclidean distance between the first characteristic vector and the second characteristic vector; determining a third value based on the determined euclidean distance; the third value characterizes a similarity of the respective first user traffic to the at least one second user traffic;
and determining the confidence corresponding to the corresponding first user flow based on the first numerical value, the second numerical value and the third numerical value, and the first weight corresponding to the first numerical value, the second weight corresponding to the second numerical value and the third weight corresponding to the third numerical value.
It should be noted that, after first user traffic carrying the first label is added and the second recognition model is trained once in combination with the second user traffic carrying the second label, the recognition accuracy of the second recognition model may fail to improve or may even decrease; therefore, a "rollback confidence threshold" strategy may be adopted to improve the recognition accuracy of the second recognition model.
Based on this, in an embodiment, when the second recognition model is trained once, the method further includes:
determining a training result for training the second recognition model;
when the accuracy of the training result is smaller than an accuracy threshold, adjusting the ratio of a first weight in the first weight, the second weight and a third weight;
and recalculating the confidence corresponding to the corresponding first user flow by using the adjusted first weight.
Here, in actual application, the ratio of the second weight in the first weight, the second weight, and the third weight may also be adjusted in combination with actual requirements; or, the ratio of the third weight in the first weight, the second weight and the third weight is adjusted.
The following describes a process of calculating the confidence corresponding to the first user traffic in detail.
Here, the first value may characterize a first dimension, i.e., a degree of association between the first user traffic and the second user traffic, and may be represented by a Flow correlation Score (FS). The second value may characterize a second dimension, i.e., a recognition probability corresponding to the first user traffic, and may be represented by a recognition Probability Score (PS). The third value may characterize a third dimension, i.e., a similarity between the first user traffic and the second user traffic, and may be represented by a Distance Score (DS).
Specifically, FS may be calculated in one of the following ways:
if the source port of the second user flow is the same as the source port of the first user flow, the FS corresponding to the first user flow is equal to s1; otherwise, the FS corresponding to the first user flow is equal to s2;
if the difference value between the port number corresponding to the source port of the second user flow and the port number corresponding to the source port of the first user flow is equal to a preset threshold value, the FS corresponding to the first user flow is equal to s1; otherwise, the FS corresponding to the first user flow is equal to s2; the preset threshold value is +1 or-1;
if the destination port of the second user traffic is the same as the destination port of the first user traffic, the FS corresponding to the first user traffic is equal to s1; otherwise, the FS corresponding to the first user flow is equal to s2;
if the destination IP corresponding to the destination port of the second user flow is the same as the destination IP corresponding to the destination port of the first user flow, the FS corresponding to the first user flow is equal to s1; otherwise, the FS corresponding to the first user flow is equal to s2;
wherein the value of s1 is greater than the value of s2; the first label of the first user flow is the same as the second label of the second user flow, and the generation time of the first user flow is close to the generation time of the second user flow.
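The four alternative FS rules can be sketched as follows. The sketch assumes the second user flow passed in has already been matched on label and generation time; the `Flow` fields, the rule names and the default values of s1 and s2 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_port: int
    dst_port: int
    dst_ip: str


def flow_score(first, second, rule="src_port", s1=1.0, s2=0.0):
    """Return s1 if the chosen rule relates the two flows, otherwise s2.

    `second` is assumed to already match `first` on label and generation
    time. The s1/s2 defaults are placeholders; only s1 > s2 is required.
    """
    if s1 <= s2:
        raise ValueError("s1 must be greater than s2")
    if rule == "src_port":
        related = first.src_port == second.src_port
    elif rule == "adjacent_src_port":
        # Source port numbers differ by exactly +1 or -1.
        related = abs(second.src_port - first.src_port) == 1
    elif rule == "dst_port":
        related = first.dst_port == second.dst_port
    elif rule == "dst_ip":
        related = first.dst_ip == second.dst_ip
    else:
        raise ValueError(f"unknown rule: {rule}")
    return s1 if related else s2


a = Flow(src_port=50000, dst_port=443, dst_ip="10.0.0.1")
b = Flow(src_port=50001, dst_port=443, dst_ip="10.0.0.2")
```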
PS can be calculated by:
PS=k1×p;
wherein k1 represents a preset proportionality coefficient, and p represents the recognition probability of the second recognition model for recognizing the first user traffic. Here, when the type of the first user traffic is recognized using the second recognition model, the output layer of the second recognition model may convert the numerical value output by the intermediate layer into a recognition probability.
The DS may be calculated by:
DS = k2 / d(x, x_0);
wherein k2 represents a preset proportionality coefficient, and d(x, x_0) represents the Euclidean distance between the feature vector x and the feature vector x_0; the feature vector x represents the feature vector corresponding to the first user flow; the feature vector x_0 represents a feature vector determined by using the second user traffic having the same second label among the at least one second user traffic, and may specifically be the average of the feature vectors corresponding to the second user traffic having the same second label.
The confidence level corresponding to the first user traffic may be calculated according to equation (1).
CC = θ_1 × FS + θ_2 × PS + θ_3 × DS (1)
Wherein CC represents the confidence corresponding to the first user flow, FS represents the first numerical value, PS represents the second numerical value, DS represents the third numerical value, θ_1, θ_2 and θ_3 represent the first weight, the second weight and the third weight respectively, and θ_1 + θ_2 + θ_3 = 1.
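The PS, DS and CC calculations can be sketched together as below, assuming the three weights sum to 1. The function names, the default coefficients and weights, and the small `eps` guard against a zero distance are hypothetical choices for this example.

```python
import math


def probability_score(p, k1=1.0):
    """PS = k1 * p, where p is the recognition probability output by the model."""
    return k1 * p


def mean_vector(vectors):
    """Average of the feature vectors of second user flows sharing one label (x_0)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance_score(x, x0, k2=1.0, eps=1e-9):
    """DS = k2 / d(x, x0); eps avoids division by zero (an added assumption)."""
    return k2 / max(math.dist(x, x0), eps)


def confidence(fs, ps, ds, weights=(0.3, 0.4, 0.3)):
    """CC = theta1 * FS + theta2 * PS + theta3 * DS, per formula (1)."""
    theta1, theta2, theta3 = weights
    if not math.isclose(theta1 + theta2 + theta3, 1.0):
        raise ValueError("weights must sum to 1")
    return theta1 * fs + theta2 * ps + theta3 * ds


x0 = mean_vector([[1.0, 2.0], [3.0, 4.0]])
ds = distance_score([0.0, 0.0], [3.0, 4.0], k2=10.0)
cc = confidence(1.0, probability_score(0.8), ds, weights=(0.5, 0.25, 0.25))
```

Because DS grows without bound as the distance shrinks, an implementation might also cap or normalize it so that the three scores stay on comparable scales; the description does not specify this.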
In an example, as shown in fig. 5, a process for the NWDAF entity to select at least one third user traffic satisfying a preset condition from the first set is described, which includes:
Step 501: training the preset recognition model by using the second user flow in the second set to obtain a second recognition model.
The second set may be referred to as a tagged data set, and may be composed of at least one second user traffic carrying a second tag.
Step 502: identifying the first user traffic in the first set by using the second identification model to obtain an application type corresponding to the first user traffic; and taking the application type as a first label corresponding to the first user traffic.
Step 503: determining a confidence corresponding to each first user flow in the first set;
the first set may be referred to as an unlabeled data set, and may be composed of at least one first user traffic.
Step 504: sequencing each first user flow in the first set according to the confidence coefficient to obtain a sequencing result; selecting at least one first user flow with the confidence degree larger than a confidence degree threshold value from the sequencing result; and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
Then, the second recognition model is trained by using the selected at least one third user flow, the corresponding first label and the at least one second user flow, to obtain the first recognition model.
Here, as shown in fig. 6, selecting at least one third user flow satisfying a preset condition from the at least one first user flow whose confidence is greater than the confidence threshold includes the following steps:
step 601: descending and sorting each first user flow in the first set according to the confidence coefficient to obtain a sorting result;
step 602: determining a confidence threshold according to the iteration number of the current training of the second recognition model;
step 603: selecting at least one first user flow with the confidence degree larger than a confidence degree threshold value from the sequencing result;
step 604: if the number of the at least one first user traffic with the confidence coefficient larger than the confidence coefficient threshold value is larger than or equal to the preset number, executing step 605; otherwise, step 606 is performed.
Step 605: and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
Here, assuming that the number of at least one first user traffic whose confidence is greater than the confidence threshold is represented by M and the preset number is represented by N, if M is greater than or equal to N, the number of at least one third user traffic satisfying the preset condition is calculated according to equation (2).
L=M-(M mod N) (2)
Wherein L represents the number of the at least one third user flow meeting the preset condition; M mod N represents the remainder of M divided by N; and N represents the number of user traffic contained in one batch.
Step 606: and performing upsampling on at least one second user flow in the second set, and obtaining at least one third user flow meeting a preset condition by using the user flow quantity obtained by the upsampling and at least one first user flow of which the confidence coefficient is greater than a confidence coefficient threshold value.
Here, assuming that the number of at least one first user traffic whose confidence is greater than the confidence threshold is represented by M and the preset number is represented by N, if M is less than N, L is calculated according to equation (3).
L=M+E (3)
Where E = N-M, may represent the number of user traffic upsampled from at least one second user traffic in the second set.
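Steps 604 to 606 can be sketched as follows. The sketch assumes the confident flows are already sorted in descending order of confidence, so that trimming drops the lowest-confidence ones, and that upsampling means random sampling with replacement from the second set; the function names `third_flow_count` and `fill_batch` are hypothetical.

```python
import random


def third_flow_count(m, n):
    """Return (L, E) for M confident flows and batch size N.

    M >= N: L = M - (M mod N), E = 0   (formula (2))
    M <  N: L = M + E with E = N - M   (formula (3))
    """
    if m < 0 or n <= 0:
        raise ValueError("m must be non-negative and n must be positive")
    if m >= n:
        return m - (m % n), 0
    extra = n - m
    return m + extra, extra


def fill_batch(confident, second_set, n, seed=0):
    """Trim confident flows to whole batches, or pad them by upsampling.

    Sampling with replacement from second_set is an assumption about what
    "upsampling" means here; second_set must be non-empty when padding.
    """
    total, extra = third_flow_count(len(confident), n)
    if extra == 0:
        return confident[:total]
    rng = random.Random(seed)
    return confident + rng.choices(second_set, k=extra)


counts = third_flow_count(50, 12)
padded = fill_batch([1, 2], ["a"], n=4)
```

Either way, the number of selected flows is a whole multiple of the batch size N.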
Here, the NWDAF entity selects at least one third user traffic satisfying the preset condition from the first set, which has the following advantages:
based on a confidence principle, at least one third user flow meeting preset conditions is selected from the first set, namely the non-labeled data set, and the selected at least one third user flow is transferred to the second set, namely the labeled data set. In addition, the confidence of the first user traffic is calculated from the dimensions such as the correlation degree between the user traffic, the identification probability corresponding to the user traffic, the Euclidean distance between the user traffic and the like by considering the characteristics of the user traffic, so that the accuracy of the calculated confidence can be guaranteed to the maximum extent.
Case 3: based on a category balancing principle, at least one third user flow meeting a preset condition is selected from the first set.
In practical application, when a second recognition model is trained once, input data of the second recognition model may be composed of a plurality of user traffic of different types, and therefore, a first user traffic of different types may be selected from a first set, and the second recognition model is trained by using the selected first user traffic, a corresponding first tag, and at least one second user traffic carrying a second tag.
Based on this, in an embodiment, the selecting, based on a preset rule, at least one third user traffic meeting a preset condition from the first set includes:
for each first user traffic in the first set, dividing at least one first user traffic with the same first label into a group to obtain at least one group of user traffic;
and selecting at least one third user flow meeting preset conditions from the at least one group of user flows.
Here, a preset number of user traffics may be selected from each group of user traffics in the at least one group of user traffics, so as to obtain at least one third user traffic satisfying a preset condition.
Here, if the first labels corresponding to the plurality of first user flows are the same, the application types corresponding to the plurality of first user flows are the same, and assuming that the first set includes P types of application types of first user flows, the number of at least one third user flow satisfying the preset condition is calculated according to formula (3), and the number of user flows selected from user flows of different application types in each group is calculated according to formula (4).
L=N×m (3)
R=(N×m)/P (4)
Wherein L represents the number of at least one third user traffic satisfying a preset condition; N represents the number of user traffic contained in one batch, and m is a positive integer; R represents the number of user traffic selected from each group of user traffic, each group corresponding to one application type; P represents the number of application types of the first user traffic contained in the first set.
For example, suppose that dividing the first user traffic in the first set into groups with the same first label yields 3 groups of user traffic: the first group includes 12 user traffic, the second group includes 54 user traffic, and the third group includes 90 user traffic. Assume N = 12 and m = 1. When the second recognition model is trained for the first time, L = N × m = 12 user traffic are selected from the three groups of user traffic with different application types; specifically, R = (N × m)/P = (12 × 1)/3 = 4 user traffic may be selected from each of the first, second and third groups. When the second recognition model is trained for the second time, L = N × m = 12 user traffic are again selected; specifically, R = (N × m)/P = (12 × 1)/3 = 4 user traffic may be selected from the remaining user traffic of each of the three groups. This is repeated until the first set is empty or the number of iterative trainings of the second recognition model is equal to the number threshold.
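The worked example above can be reproduced with the following sketch of category-balanced selection. The function name `balanced_select` is hypothetical, and the use of integer division for R and of "take from the front of each group" as the picking rule are assumptions that the description does not spell out.

```python
from collections import defaultdict


def balanced_select(pseudo_labeled, n, m=1):
    """Select R = (N * m) / P flows from each first-label group.

    pseudo_labeled is a list of (flow, first_label) pairs. Returns
    (selected, remaining). Integer division is an assumption for the case
    where N * m is not a multiple of P.
    """
    groups = defaultdict(list)
    for flow, label in pseudo_labeled:
        groups[label].append(flow)
    if not groups:
        return [], []
    p = len(groups)          # number of application types
    r = (n * m) // p         # per-group quota, formula (4)
    selected, remaining = [], []
    for label, flows in groups.items():
        selected += [(flow, label) for flow in flows[:r]]
        remaining += [(flow, label) for flow in flows[r:]]
    return selected, remaining


# Three groups of 12, 54 and 90 flows, N = 12, m = 1 -> 4 flows per group.
data = (
    [(i, "a") for i in range(12)]
    + [(i, "b") for i in range(54)]
    + [(i, "c") for i in range(90)]
)
selected, remaining = balanced_select(data, n=12, m=1)
```

Calling the function again on `remaining` corresponds to the second training round in the example.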
In an example, as shown in fig. 7, a process of the NWDAF entity selecting at least one third user traffic satisfying a preset condition from the first set is described, including:
Step 701: training the preset recognition model by using the second user flow in the second set to obtain a second recognition model.
The second set may be referred to as a tagged data set, and may be composed of at least one second user traffic carrying a second tag.
Step 702: identifying the first user flow in the first set by using the second identification model to obtain an application type corresponding to the first user flow; and taking the application type as a first label corresponding to the first user traffic.
Step 703: dividing at least one first user flow with the same first label into one group aiming at each first user flow in the first set to obtain at least one group of user flows; selecting at least one third user flow meeting preset conditions from the at least one group of user flows;
the first set may be referred to as an unlabeled data set, and may be composed of at least one first user traffic.
Then, the second recognition model is trained by using the selected at least one third user flow, the corresponding first label and the at least one second user flow, to obtain the first recognition model.
Here, the NWDAF entity selects at least one third user traffic satisfying the preset condition from the first set, which has the following advantages:
based on a balance principle, at least one third user flow meeting preset conditions is selected from the first set, namely the non-labeled data set, and the selected at least one third user flow is transferred to the second set, namely the labeled data set. In addition, the confidence coefficient threshold value can be dynamically updated according to the iteration times of training the second recognition model, so that the iteration speed can be increased, and the noise generated in the process of training the second recognition model can be reduced, thereby reducing the error of the classification result output by the second recognition model.
The following describes how to train the second recognition model.
In case 1, when a second recognition model is trained once, the network structure of the second recognition model is not changed.
In actual application, static iterative training is performed on the second recognition model based on incremental learning, that is, the network structure of the second recognition model is not changed, and the input data volume of the second recognition model increases with the increase of the number of iterations.
In case 2, when a second recognition model is trained once, the network structure of the second recognition model changes.
In practical application, based on reinforcement learning, dynamic iterative training is performed on the second recognition model, that is, the network structure of the second recognition model changes, and the input data volume of the second recognition model increases with the increase of the iteration number.
Based on this, in an embodiment, the training the second recognition model by using the selected at least one third user traffic and the corresponding first label, and at least one second user traffic carrying a second label includes:
updating the parameters of the network structure of the second recognition model to obtain an updated second recognition model;
and training the second recognition model by using the selected at least one third user flow and the at least one second user flow in the second set carrying the second label.
The parameters of the network structure of the second recognition model may include a learning rate, a number of layers, a number of nodes, and the like.
In one example, as shown in fig. 8, a process for iterative training of a second recognition model by an NWDAF entity is described, comprising:
Step 801: judging whether the current iterative training is the first iterative training; when it is the first iterative training, step 802 is executed; otherwise, when it is the R-th iterative training, step 803 is executed;
wherein R is an integer greater than 1.
Step 802: training the preset recognition model by using at least one second user traffic carrying a second label to obtain a second recognition model.
Step 803: judging whether the parameters of the network structure of the second recognition model need to be updated; when the parameters need to be updated, executing step 804; otherwise, executing step 805.
Step 804: updating the parameters of the network structure of the second recognition model to obtain an updated second recognition model; and training the updated second recognition model by using the selected at least one third user traffic and the at least one second user traffic, in the second set, carrying the second label.
Step 805: keeping the parameters of the network structure of the second recognition model unchanged; and training the second recognition model by using the selected at least one third user traffic and the at least one second user traffic, in the second set, carrying the second label.
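As a non-limiting illustration, the branching of steps 801 to 805 may be sketched as follows. The `Model` class, its `fit` method, and the `needs_update` flag are hypothetical stand-ins introduced only for this sketch; the embodiments do not specify how the need for a parameter update is decided.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a recognition model (not from the embodiments)."""
    params: dict = field(default_factory=lambda: {"learning_rate": 0.1, "layers": 2})
    seen_samples: int = 0

    def fit(self, samples):
        # Placeholder for real training: only record how much data was used.
        self.seen_samples = len(samples)
        return self


def train_iteration(iteration, model, labeled, selected, needs_update, new_params=None):
    """One pass through steps 801-805."""
    if iteration == 1:
        # Step 802: the first iteration trains the preset model on labelled traffic only.
        return model.fit(labeled)
    if needs_update:
        # Step 804 (first half): update the network-structure parameters.
        model.params.update(new_params or {})
    # Step 805, or the training half of step 804: train on labelled plus selected traffic.
    return model.fit(labeled + selected)
```

In this sketch the amount of input data grows with each iteration in both branches, which corresponds to the static and dynamic iterative training described in cases 1 and 2.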
Here, the NWDAF entity performs iterative training on the second recognition model, which has the following advantages:
the second recognition model may be subjected to static iterative training based on incremental learning; the second recognition model can be dynamically and iteratively trained based on reinforcement learning, and the second recognition model can be ensured to adapt to the increase of the input data volume by changing the network structure of the second recognition model, so that the training accuracy is improved.
In one example, as shown in fig. 9, an interaction procedure between an NWDAF entity and other entities in a network is described, including:
step 901: the NF entity sends a user flow analysis request or a subscription request to the NWDAF entity;
here, the user traffic analysis request or the subscription request corresponds to the first request, where the first request is used to request analysis on the type of user traffic to be processed; and the first request carries the ID corresponding to the user traffic to be processed, the analysis index and the user or user group corresponding to the user traffic to be processed.
Step 902: the NWDAF entity acquires at least one first user flow and at least one second user flow from the SMF entity; and acquiring a corresponding second label of the at least one second user traffic from the AF entity.
Here, as shown in fig. 10, the NWDAF entity obtaining at least one first user traffic and at least one second user traffic from the SMF entity includes the following steps:
step 1a: the NWDAF entity establishes a PDU session with the SMF entity and sends a data acquisition subscription request to the SMF entity;
step 1b: receiving a data acquisition notification message sent by an SMF entity; the data acquisition notification message comprises at least one first user flow and at least one second user flow;
here, the NWDAF entity obtaining the corresponding second label of the at least one second user traffic from the AF entity, comprises the steps of:
step 2a: the NWDAF entity sends a data acquisition subscription request to the AF entity;
step 2b: the NWDAF entity receives a data acquisition notification message sent by the AF entity; the data acquisition notification message includes the application type of the at least one second user traffic, and the acquired application type is used as the second label corresponding to the second user traffic.
Here, the NWDAF entity may also send a data acquisition subscription request to the AMF entity; receiving TAC information sent by an AMF entity; and the TAC information is used for generating a feature vector corresponding to the second user flow.
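As a rough illustration of how the data obtained in steps 1a to 2b may be combined, the sketch below pairs the user traffic notified by the SMF entity with the application types notified by the AF entity. It assumes, purely for illustration, that both notifications can be keyed by a shared flow identifier; the dictionary layout and the name `build_sets` are hypothetical.

```python
def build_sets(smf_flows, af_app_types):
    """Split SMF-reported flows into an unlabelled first set and a labelled second set.

    smf_flows:    {flow_id: feature data} from the SMF notification (assumed layout)
    af_app_types: {flow_id: application type} from the AF notification (assumed layout)
    """
    # Flows for which the AF entity reported an application type carry a second label.
    second_set = {
        fid: (features, af_app_types[fid])
        for fid, features in smf_flows.items()
        if fid in af_app_types
    }
    # The remaining flows form the first set and are labelled later by the second model.
    first_set = {
        fid: features
        for fid, features in smf_flows.items()
        if fid not in af_app_types
    }
    return first_set, second_set
```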
Step 903: the NWDAF entity identifies the type of the user traffic to be processed by using a first recognition model, and analyzes the type corresponding to the user traffic to be processed to obtain an analysis result.
The first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; the second label carried by the second user traffic is acquired from an Application Function (AF) entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second recognition model is obtained by training a preset recognition model by utilizing the at least one second user flow carrying the second label.
Here, the first recognition model may classify user traffic in one of the following ways:
dividing user traffic with the same five-tuple into one type;
dividing user traffic carrying the same application layer protocol into one type; the application layer protocol includes the File Transfer Protocol (FTP) and a Peer-to-Peer (P2P) network protocol;
dividing user traffic belonging to the same kind of application program into one type; the application programs include video application programs and browser web pages;
dividing user traffic belonging to a specific application program into one type;
dividing user traffic corresponding to the same operation into one type; the operations include a download operation, a login operation, and a payment operation.
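The first of the classification ways listed above, grouping by five-tuple, may be sketched as follows. The `Packet` record and its field names are assumptions introduced for this sketch; a five-tuple is taken here in its usual sense of source address, destination address, source port, destination port, and transport protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int


def group_by_five_tuple(packets):
    """Group packets so that each five-tuple forms one type of user traffic."""
    groups = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        groups[key].append(p)
    return dict(groups)
```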
Here, the analyzing, by the NWDAF entity, of the type corresponding to the user traffic to be processed may include: determining the proportion of the user traffic to be processed within the user traffic of the same type. The obtained analysis result can help operators manage and control the network efficiently, plan the network reasonably, analyze the traffic of a specified application type, understand the inbound and outbound traffic of the network, locate faults quickly when the network fails, and redistribute traffic when the network is congested, so that users obtain better quality of service.
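One possible reading of the proportion mentioned above is a simple ratio of volumes. The sketch below assumes, for illustration only, that traffic is measured in bytes and that the list of same-type volumes includes the traffic to be processed; the embodiments do not fix the unit of measurement.

```python
def share_within_type(pending_bytes, same_type_bytes):
    """Fraction of a type's total volume contributed by the traffic to be processed.

    same_type_bytes: byte counts of all flows of that type, including the pending one.
    """
    total = sum(same_type_bytes)
    # Guard against an empty or zero-volume type to avoid division by zero.
    return pending_bytes / total if total else 0.0
```

For example, a pending flow of 250 bytes among same-type flows of 250, 250, and 500 bytes accounts for a quarter of that type's volume.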
Step 904: the NWDAF entity sends the analysis result to the NF entity.
Here, after step 904, the NWDAF entity may exit the process, or may continue to acquire the next user traffic to be processed from the SMF entity and the AF entity, and complete the type identification and analysis of the user traffic to be processed.
By adopting the technical scheme of the embodiment of the invention, the NWDAF entity can determine the type corresponding to the user flow to be processed by utilizing the first recognition model and analyze the type corresponding to the user flow to be processed, and can train the preset recognition model by utilizing at least one second user flow acquired from the SMF entity and at least one second label respectively corresponding to the second user flow acquired from the AF entity to obtain a second recognition model; then, at least one first user flow acquired from the SMF entity is combined with the second recognition model to obtain at least one first tag corresponding to each first user flow, so that the number of the second user flows carrying the second tags is expanded by the first user flows carrying the first tags, and the second recognition model is trained by the expanded user flows to obtain the first recognition model.
In order to implement the data processing method according to the embodiment of the present invention, an embodiment of the present invention further provides a data processing apparatus, which is disposed on the NWDAF entity, and fig. 11 is a schematic structural diagram of the data processing apparatus according to the embodiment of the present invention; as shown in fig. 11, the apparatus includes:
an obtaining unit 111, configured to obtain a first request sent by an NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed; and based on the first request, sending a second request to the SMF entity; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
the first processing unit 112 is configured to determine, by using a first recognition model, a type corresponding to the user traffic to be processed; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; the second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label;
the second processing unit 113 is configured to analyze a type corresponding to the user traffic to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
In one embodiment, the apparatus further comprises:
a sending unit, configured to send a third request to the SMF entity; the third request is used for requesting to acquire at least one first user flow and at least one second user flow; and sending a fourth request to the AF entity; the fourth request is used for requesting to acquire second tags respectively corresponding to the at least one second user traffic;
receiving at least one first user flow and at least one second user flow sent by the SMF entity; receiving second labels respectively corresponding to at least one second user flow sent by the AF entity;
training a preset recognition model by using at least one second user flow carrying a second label to obtain a second recognition model; respectively identifying the at least one first user flow by using the second identification model to obtain first labels respectively corresponding to the at least one first user flow;
and training the second recognition model by using the at least one first user flow carrying the first label and the at least one second user flow carrying the second label to obtain the first recognition model.
In an embodiment, the first processing unit 112 is specifically configured to:
based on a preset principle, at least one third user flow meeting a preset condition is selected from the first set; excluding the at least one third user traffic from the first set; the first set is made up of at least one first user traffic;
training the second recognition model by using the selected at least one third user flow, the corresponding first label and the at least one second user flow carrying the second label to obtain a first recognition model;
and so on until the first set is empty, or the iteration number of training the second recognition model is equal to the threshold number.
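The iteration described above, which stops when the first set is empty or the iteration count reaches the threshold number, may be sketched as follows. The callables `select` and `train` are hypothetical placeholders for the preset selection principle and the model training; the early exit when nothing is selected is an added safeguard of this sketch and is not stated in the embodiments.

```python
def self_train(first_set, second_set, select, train, max_iterations):
    """Move selected pseudo-labelled traffic into the labelled set and retrain.

    first_set:  first user traffic carrying first labels (from the second model)
    second_set: second user traffic carrying second labels (from the AF entity)
    """
    remaining = list(first_set)
    labeled = list(second_set)
    iterations = 0
    while remaining and iterations < max_iterations:
        chosen = select(remaining)
        if not chosen:
            break  # nothing meets the preset condition; avoid looping forever
        # Exclude the selected third user traffic from the first set.
        remaining = [f for f in remaining if f not in chosen]
        labeled.extend(chosen)
        train(labeled)
        iterations += 1
    return labeled, remaining, iterations
```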
In an embodiment, the first processing unit 112 is specifically configured to:
selecting at least one user flow of a preset quantity from the first set;
and taking the selected at least one user flow as at least one third user flow meeting the preset condition.
In an embodiment, the selecting, based on a preset rule, at least one third user traffic meeting a preset condition from the first set includes:
determining a confidence corresponding to each first user flow in the first set;
sequencing each first user flow in the first set according to the confidence coefficient to obtain a sequencing result;
selecting at least one first user flow with the confidence degree larger than a confidence degree threshold value from the sequencing result;
and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
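The ranking and threshold steps above may be sketched as follows. The function name, the `confidence` callable, and the optional `limit` parameter are assumptions of this sketch; the embodiments leave the final preset condition open.

```python
def select_confident(flows, confidence, threshold, limit=None):
    """Rank flows by confidence and keep those strictly above the threshold.

    confidence: callable mapping a flow to its confidence value (assumed interface)
    limit:      optional cap on how many flows are returned (illustrative preset condition)
    """
    # Sort in descending order of confidence to obtain the sorting result.
    ranked = sorted(flows, key=confidence, reverse=True)
    above = [f for f in ranked if confidence(f) > threshold]
    return above if limit is None else above[:limit]
```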
In an embodiment, the first processing unit 112 is specifically configured to:
when the second recognition model is trained for one time, the confidence coefficient threshold value is adjusted from a first value to a second value;
wherein the first value is greater than the second value.
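The embodiments state only that the confidence threshold is lowered from a first value to a smaller second value after each training. One possible schedule, assumed here purely for illustration, is a fixed decrement with a lower bound; the `step` and `floor` values are hypothetical.

```python
def next_threshold(current, step=0.05, floor=0.5):
    """Lower the confidence threshold by a fixed step, never going below the floor."""
    return max(floor, current - step)
```

Lowering the threshold gradually admits more of the remaining first user traffic in later iterations, when the model is expected to be more reliable.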
In an embodiment, the first processing unit 112 is specifically configured to:
searching a second set, based on the first label and the generation time of the corresponding first user traffic, for second user traffic matching the first label and the generation time; the second set is composed of at least one second user traffic carrying a second label; determining a first numerical value based on one of the source port and the destination port of the found second user traffic; the first numerical value characterizes a degree of association between the corresponding first user traffic and the second user traffic;
determining an identification probability of the corresponding first user traffic and determining a second numerical value based on the identification probability; the second numerical value characterizes an identification accuracy of a first tag corresponding to the respective first user traffic;
determining a first feature vector of a corresponding first user traffic; determining a second feature vector using the at least one second user traffic; determining the Euclidean distance between the first feature vector and the second feature vector; determining a third value based on the determined euclidean distance; the third value characterizes a similarity of the respective first user traffic and the at least one second user traffic;
and determining the confidence corresponding to the corresponding first user flow based on the first numerical value, the second numerical value and the third numerical value, and the first weight corresponding to the first numerical value, the second weight corresponding to the second numerical value and the third weight corresponding to the third numerical value.
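One way to combine the three numerical values and their weights is a weighted sum, sketched below. Both the weighted-sum form and the mapping from Euclidean distance to a similarity value, taken here as 1 / (1 + d), are assumptions of this sketch; the embodiments do not give the exact formulas.

```python
import math


def similarity_from_distance(vec_a, vec_b):
    """Map the Euclidean distance between two feature vectors into (0, 1].

    The mapping 1 / (1 + d) is an illustrative choice, not taken from the embodiments.
    """
    d = math.dist(vec_a, vec_b)
    return 1.0 / (1.0 + d)


def confidence(v1, v2, v3, w1, w2, w3):
    """Weighted sum of the association, accuracy, and similarity values (assumed form)."""
    return w1 * v1 + w2 * v2 + w3 * v3
```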
In an embodiment, the first processing unit 112 is specifically configured to:
when the second recognition model is trained for one time, determining a training result for training the second recognition model;
when the accuracy of the training result is smaller than an accuracy threshold, adjusting the proportion of the first weight among the first weight, the second weight and the third weight;
and recalculating the confidence corresponding to the corresponding first user flow by using the adjusted first weight.
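The weight adjustment above may be sketched as follows. The embodiments do not state in which direction or by how much the proportion of the first weight changes; this sketch assumes an increase by a hypothetical `delta`, followed by renormalization so that the three weights still sum to one.

```python
def adjust_first_weight(weights, accuracy, accuracy_threshold, delta=0.1):
    """Raise the share of the first weight when training accuracy is too low.

    weights: tuple (w1, w2, w3); the direction and size of the change are assumptions.
    """
    if accuracy >= accuracy_threshold:
        return weights  # accuracy is acceptable; leave the weights unchanged
    w1, w2, w3 = weights
    w1 += delta
    total = w1 + w2 + w3
    # Renormalize so the weights remain proportions of a whole.
    return (w1 / total, w2 / total, w3 / total)
```

The confidence of each first user traffic would then be recalculated with the returned weights.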
In an embodiment, the first processing unit 112 is specifically configured to:
for each first user traffic in the first set, dividing at least one first user traffic with the same first label into a group to obtain at least one group of user traffic;
and selecting at least one third user flow meeting preset conditions from the at least one group of user flows.
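Grouping by first label and then selecting from each group keeps the selected traffic balanced across labels. In the sketch below, the tuple layout and the rule of taking the `per_label` highest-confidence flows from every group are assumptions introduced for illustration.

```python
from collections import defaultdict


def balanced_select(flows, per_label):
    """Pick up to per_label flows from each first-label group.

    flows: iterable of (flow_id, first_label, confidence) tuples (assumed layout)
    """
    groups = defaultdict(list)
    for flow in flows:
        groups[flow[1]].append(flow)
    chosen = []
    for label in sorted(groups):
        # Within each group, prefer the flows with the highest confidence.
        best = sorted(groups[label], key=lambda f: f[2], reverse=True)
        chosen.extend(best[:per_label])
    return chosen
```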
In an embodiment, the first processing unit 112 is specifically configured to:
updating the parameters of the network structure of the second recognition model to obtain an updated second recognition model;
and training the updated second recognition model by using the selected at least one third user traffic and the at least one second user traffic, in the second set, carrying the second label.
In practical application, the obtaining unit 111 may be implemented by a communication interface in a data processing apparatus; the first processing unit 112 and the second processing unit 113 may be implemented by a processor in a data processing apparatus in combination with a communication interface.
It should be noted that: in the data processing apparatus provided in the above embodiment, when performing data processing, only the division of each program module is exemplified, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the data processing apparatus and the data processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments, and are not described herein again.
An embodiment of the present invention further provides a network device, as shown in fig. 12, including:
a communication interface 121 capable of performing information interaction with other devices;
a processor 122, connected with the communication interface 121 and configured to execute, when running a computer program, the method provided by one or more technical solutions on the network device side; the computer program is stored in the memory 123.
It should be noted that: the specific processing procedures of the processor 122 and the communication interface 121 are detailed in the method embodiment, and are not described herein again.
Of course, in practice, the various components in the network device 120 are coupled together by a bus system 124. It will be appreciated that the bus system 124 is used to enable communications among the components. The bus system 124 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 124 in FIG. 12.
Memory 123 in the embodiments of the present application is used to store various types of data to support the operation of network device 120. Examples of such data include: any computer program for operating on network device 120.
The method disclosed in the embodiment of the present application may be applied to the processor 122, or implemented by the processor 122. The processor 122 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 122. The Processor 122 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The processor 122 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 123, and the processor 122 reads the information in the memory 123 and performs the steps of the foregoing method in combination with its hardware.
In an exemplary embodiment, the network device 120 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the aforementioned methods.
It will be appreciated that the memory (memory 123) described herein can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present invention further provides a storage medium, specifically a computer storage medium, which is a computer readable storage medium, for example, a memory 123 storing a computer program, where the computer program is executable by a processor 122 of a network device 120 to perform the steps described in the foregoing network device side method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In addition, the technical solutions described in the embodiments of the present invention may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (14)

1. A data processing method applied to a Network Data Analytics Function (NWDAF) entity, the method comprising:
acquiring a first request sent by a network function NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed;
sending a second request to a Session Management Function (SMF) entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
determining the type corresponding to the user flow to be processed by utilizing a first identification model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; a second label carried by the second user traffic is acquired from an Application Function (AF) entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second identification model is obtained by training a preset identification model by utilizing the at least one second user flow carrying the second label;
analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
2. The method of claim 1, further comprising:
sending a third request to the SMF entity; the third request is used for requesting to acquire at least one first user flow and at least one second user flow; and sending a fourth request to the AF entity; the fourth request is used for requesting to acquire a second label corresponding to the at least one second user flow;
receiving at least one first user flow and at least one second user flow sent by the SMF entity; receiving a second label corresponding to at least one second user traffic sent by the AF entity;
training a preset recognition model by using at least one second user flow carrying a second label to obtain a second recognition model; identifying the at least one first user flow by using the second identification model to obtain a first label corresponding to the at least one first user flow;
and training the second recognition model by using the at least one first user flow carrying the first label and the at least one second user flow carrying the second label to obtain the first recognition model.
3. The method according to claim 2, wherein the training the second recognition model by using the at least one first user traffic carrying a first label and the at least one second user traffic carrying a second label to obtain the first recognition model comprises:
based on a preset principle, at least one third user flow meeting a preset condition is selected from the first set; excluding the at least one third user traffic from the first set; the first set is made up of at least one first user traffic;
training the second recognition model by using the selected at least one third user flow, the corresponding first label and the at least one second user flow carrying the second label to obtain a first recognition model;
and so on until the first set is empty, or the iteration number of training the second recognition model is equal to the threshold number.
4. The method according to claim 3, wherein the selecting at least one third user traffic satisfying a preset condition from the first set based on a preset rule comprises:
selecting at least one user flow with a preset quantity from the first set;
and taking the selected at least one user flow as at least one third user flow meeting the preset condition.
5. The method according to claim 3, wherein the selecting at least one third user traffic satisfying a preset condition from the first set based on a preset rule comprises:
determining a confidence corresponding to each first user flow in the first set;
sequencing each first user flow in the first set according to the confidence coefficient to obtain a sequencing result;
selecting at least one first user flow with the confidence degree larger than a confidence degree threshold value from the sequencing result;
and selecting at least one third user flow meeting a preset condition from the at least one first user flow with the confidence coefficient larger than the confidence coefficient threshold value.
6. The method of claim 5, wherein in training the second recognition model once, the method further comprises:
adjusting the confidence threshold value from a first value to a second value;
wherein the first value is greater than the second value.
7. The method of claim 5, wherein determining the confidence level for the respective first user traffic comprises:
searching a second set, based on the first label and the generation time of the corresponding first user traffic, for second user traffic matching the first label and the generation time; the second set is composed of at least one second user traffic carrying a second label; determining a first numerical value based on one of the source port and the destination port of the found second user traffic; the first numerical value characterizes a degree of association between the corresponding first user traffic and the second user traffic;
determining an identification probability of the corresponding first user traffic and determining a second numerical value based on the identification probability; the second numerical value characterizes an identification accuracy of a first tag corresponding to the respective first user traffic;
determining a first feature vector of a corresponding first user traffic; determining a second feature vector using the at least one second user traffic; determining the Euclidean distance between the first feature vector and the second feature vector; determining a third value based on the determined euclidean distance; the third value characterizes a similarity of the respective first user traffic to the at least one second user traffic;
and determining the confidence corresponding to the corresponding first user flow based on the first numerical value, the second numerical value, the third numerical value, the first weight corresponding to the first numerical value, the second weight corresponding to the second numerical value, and the third weight corresponding to the third numerical value.
8. The method of claim 7, wherein in training the second recognition model once, the method further comprises:
determining a training result for training the second recognition model;
when the accuracy of the training result is smaller than an accuracy threshold, adjusting the proportion of the first weight among the first weight, the second weight and the third weight;
and recalculating the confidence corresponding to the corresponding first user flow by using the adjusted first weight.
9. The method according to claim 3, wherein the selecting at least one third user traffic satisfying a preset condition from the first set based on a preset rule comprises:
for each first user traffic in the first set, dividing at least one first user traffic with the same first label into a group to obtain at least one group of user traffic;
and selecting at least one third user flow meeting preset conditions from the at least one group of user flows.
10. The method according to any one of claims 3 to 9, wherein the training of the second recognition model by using the selected at least one third user traffic and the corresponding first label, and at least one second user traffic carrying a second label comprises:
updating the parameters of the network structure of the second recognition model to obtain an updated second recognition model;
and training the second recognition model by using the selected at least one third user flow and the at least one second user flow carrying the second label.
11. A data processing apparatus, characterized by comprising:
the acquiring unit is used for acquiring a first request sent by the NF entity; the first request is used for requesting the analysis of the type of the user traffic to be processed; the first request carries identification information of the user traffic to be processed; sending a second request to the SMF entity based on the first request; the second request is used for requesting to acquire the user flow to be processed; receiving the user flow to be processed sent by the SMF entity;
the first processing unit is used for determining the type corresponding to the user flow to be processed by utilizing a first recognition model; the first identification model is obtained by utilizing at least one first user flow carrying a first label and at least one second user flow carrying a second label to train a second identification model at least once; the first label carried by the first user flow is obtained by identifying the first user flow by using the second identification model; the second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; the second recognition model is obtained by training a preset recognition model by using the at least one second user flow carrying the second label;
the second processing unit is used for analyzing the type corresponding to the user flow to be processed to obtain an analysis result; and sending the analysis result to the NF entity.
12. A network device, comprising:
a communication interface, configured to acquire a first request sent by an NF entity, wherein the first request is used to request analysis of the type of user traffic to be processed and carries identification information of the user traffic to be processed; send a second request to an SMF entity based on the first request, wherein the second request is used to request the user traffic to be processed; and receive the user traffic to be processed sent by the SMF entity;
a processor, configured to determine the type corresponding to the user traffic to be processed by using a first recognition model; wherein the first recognition model is obtained by training a second recognition model at least once by using at least one first user traffic carrying a first label and at least one second user traffic carrying a second label; the first label carried by the first user traffic is obtained by identifying the first user traffic by using the second recognition model; the second label carried by the second user traffic is acquired from an AF entity; the first user traffic and the second user traffic are obtained from an SMF entity; and the second recognition model is obtained by training a preset recognition model by using the at least one second user traffic carrying the second label; the processor being further configured to analyze the type corresponding to the user traffic to be processed to obtain an analysis result, and to send the analysis result to the NF entity.
13. A network device comprising a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is adapted to perform the steps of the method of any one of claims 1 to 10 when running the computer program.
14. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 10.
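
The training relationship recited in claims 10 to 12 (a second recognition model trained on traffic labelled via the AF entity, used to label further traffic, then retrained to yield the first recognition model) resembles a self-training loop. The following is only a minimal sketch under stated assumptions, not the claimed implementation: the nearest-centroid classifier, the numeric feature vectors, the confidence score, the `threshold` used to select the "third user traffic", and the traffic type names "video" and "voice" are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Fit a nearest-centroid model from (features, label) pairs.

    Hypothetical stand-in for 'training a recognition model'.
    Returns {label: centroid_vector}.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Return (label, confidence) for one traffic sample.

    Confidence is an assumed heuristic: the distance to the runner-up
    centroid divided by the sum of the two smallest distances.
    """
    ranked = sorted(
        ((math.dist(features, centroid), label)
         for label, centroid in model.items()),
        key=lambda pair: pair[0],
    )
    best_distance, best_label = ranked[0]
    if len(ranked) == 1:
        return best_label, 1.0
    second_distance = ranked[1][0]
    total = best_distance + second_distance
    confidence = 1.0 if total == 0 else second_distance / total
    return best_label, confidence


def self_train(labelled, unlabelled, threshold=0.8, rounds=1):
    """Sketch of the label-then-retrain loop.

    labelled:   'second user traffic' with 'second labels' (from the AF entity).
    unlabelled: 'first user traffic' whose 'first labels' the model predicts.
    threshold:  assumed selection rule for the 'third user traffic'.
    """
    # Second recognition model: trained only on AF-labelled traffic.
    model = train_centroids(labelled)
    for _ in range(rounds):
        selected = []
        for features in unlabelled:
            label, confidence = predict(model, features)
            if confidence >= threshold:
                selected.append((features, label))
        # Retrain on selected pseudo-labelled traffic plus labelled traffic.
        model = train_centroids(labelled + selected)
    # After at least one round this plays the role of the first recognition model.
    return model
```

With two labelled samples and three unlabelled ones, only the unlabelled samples that the initial model classifies with confidence at or above the threshold are folded back into training; an ambiguous sample lying midway between the two classes is left out.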
CN202010514076.1A 2020-06-08 2020-06-08 Data processing method, device, equipment and storage medium Active CN113839794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010514076.1A CN113839794B (en) 2020-06-08 2020-06-08 Data processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113839794A CN113839794A (en) 2021-12-24
CN113839794B true CN113839794B (en) 2023-03-31

Family

ID=78963685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010514076.1A Active CN113839794B (en) 2020-06-08 2020-06-08 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113839794B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674831A (en) * 2018-06-14 2020-01-10 佛山市顺德区美的电热电器制造有限公司 Data processing method and device and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112017021248A2 (en) * 2015-04-03 2018-06-26 Huawei Tech Co Ltd network management system and method, domain role entity, and operations support system.
US10986516B2 (en) * 2017-03-10 2021-04-20 Huawei Technologies Co., Ltd. System and method of network policy optimization

Also Published As

Publication number Publication date
CN113839794A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN115022176B (en) NWDAF network element selection method and device, electronic equipment and readable storage medium
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
US11729286B2 (en) Feature-based network embedding
CN113436620B (en) Training method of voice recognition model, voice recognition method, device, medium and equipment
Lee et al. Performance analysis of local exit for distributed deep neural networks over cloud and edge computing
CN114422267B (en) Flow detection method, device, equipment and medium
CN112115372B (en) Parking lot recommendation method and device
CN111563560A (en) Data stream classification method and device based on time sequence feature learning
CN113839794B (en) Data processing method, device, equipment and storage medium
WO2021169478A1 (en) Fusion training method and apparatus for neural network model
CN112861894A (en) Data stream classification method, device and system
CN115695280A (en) Routing method and device based on edge node, electronic equipment and storage medium
CN114071448B (en) Data transmission method, related network node and storage medium
CN114238658A (en) Link prediction method and device of time sequence knowledge graph and electronic equipment
CN114422453A (en) Method, device and storage medium for online planning of time-sensitive streams
JP2022103149A (en) Image processing method and computing device
CN111209100B (en) Service processing and data source determining method
CN113705683A (en) Recommendation model training method and device, electronic equipment and storage medium
WO2023052827A1 (en) Processing a sequence of data items
CN113051400A (en) Method and device for determining annotation data, readable medium and electronic equipment
CN111526055A (en) Route planning method and device and electronic equipment
CN114550453B (en) Model training method, model determining method, electronic device and computer storage medium
CN113572627B (en) Data processing method and data processing device
CN115225518B (en) Base station traffic processing method and device and network equipment
CN111582482B (en) Method, apparatus, device and medium for generating network model information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant