CN117609814A - SD-WAN intelligent flow scheduling optimization method and system - Google Patents


Info

Publication number
CN117609814A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410095264.3A
Other languages
Chinese (zh)
Other versions
CN117609814B (en
Inventor
韩伟
李碧妍
易夕冬
张天松
肖连菊
翁祖逖
冯康
高宝军
黄展鹏
何烈军
刘文佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Aofei Data Technology Co ltd
Original Assignee
Guangdong Aofei Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Aofei Data Technology Co ltd
Priority to CN202410095264.3A
Publication of CN117609814A
Application granted
Publication of CN117609814B
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of intelligent traffic scheduling, and in particular to an SD-WAN intelligent traffic scheduling optimization method and system. The method comprises the following steps: acquiring a sequence corresponding to each dimension in the SD-WAN, and clustering the data in each sequence several times to obtain the clustering results corresponding to each dimension in each clustering mode; obtaining an influence evaluation value of each dimension on the other features according to the difference in data values between each clustering result and its cluster center, the fluctuation of the data within each clustering result, and the difference between the clustering results of each dimension and the clustering results of the other dimensions; correcting the covariance between the sequence corresponding to each dimension and the sequences corresponding to the other dimensions to obtain corrected covariances; and scheduling the traffic based on the corrected covariances. The invention improves the accuracy and credibility of intelligent traffic scheduling.

Description

SD-WAN intelligent flow scheduling optimization method and system
Technical Field
The invention relates to the technical field of intelligent flow scheduling, in particular to an SD-WAN intelligent flow scheduling optimization method and system.
Background
With the continuous expansion of enterprise networks and the growing number of network applications, the software-defined wide area network (SD-WAN), a new-generation network architecture, has gradually become an important component of enterprise networks. Through centralized control and intelligent routing, SD-WAN provides a more flexible and efficient way of connecting networks, so that enterprises can better manage and optimize network traffic. In complex network environments, however, the dynamic changes and uncertainty of network traffic often cause fluctuations in network performance and the occurrence of anomalies. Conventional traffic scheduling methods mostly rely on predefined rules and static parameters, which adapt poorly to dynamic changes in the network environment. Conventional methods also generally fail to exploit the latent information in traffic data, so the potential for performance optimization is not fully realized.
To address these problems, traffic data is usually subjected to deep data analysis to extract multidimensional feature vectors, which are then reduced in dimension. A commonly used method is PCA dimension reduction, which finds the key features in the multidimensional feature vectors and thereby helps the system better understand and adapt to dynamic changes in network state. Conventional PCA dimension reduction algorithms typically select parameters using the explained variance ratio of the eigenvalues. However, when PCA is combined with traffic scheduling and its dimension reduction parameters are adjusted adaptively, the multidimensional feature vectors, being obtained through deep traffic analysis, may be strongly correlated with one another. Overly strong correlation between the features means that a principal component with high variance may not represent the direction carrying the most important information. Strongly correlated features can also leave the direction of the principal components insufficiently clear, which reduces their interpretability and in turn lowers the reliability of SD-WAN intelligent traffic scheduling.
Disclosure of Invention
In order to solve the problem of lower accuracy of analysis results in the process of analyzing SD-WAN flow data in the existing method, and further the problem of lower reliability of SD-WAN flow scheduling, the invention aims to provide an SD-WAN intelligent flow scheduling optimization method and system, and the adopted technical scheme is as follows:
in a first aspect, the present invention provides an SD-WAN intelligent traffic scheduling optimization method, which includes the following steps:
acquiring a flow data packet in an SD-WAN, and acquiring a sequence corresponding to each dimension based on the flow data packet;
clustering the data in the sequence corresponding to each dimension several times to obtain the clustering results corresponding to each dimension in each clustering mode; obtaining influence evaluation values of each dimension on other features according to the difference of data values between each clustering result corresponding to each dimension in each clustering mode and its clustering center, the fluctuation of the data in each clustering result corresponding to each dimension in each clustering mode, and the difference between the clustering results corresponding to each dimension in different clustering modes and the clustering results corresponding to the other dimensions in different clustering modes;
obtaining corresponding corrected covariance according to covariance between sequences corresponding to each dimension and sequences corresponding to other dimensions, the number of types of data in the sequences corresponding to each dimension and the influence evaluation value; constructing a target covariance matrix based on the corrected covariance;
and scheduling the SD-WAN traffic based on the target covariance matrix.
Preferably, the clustering of the data in the sequence corresponding to each dimension for several times based on the data in the sequence corresponding to each dimension to obtain each clustering result corresponding to each dimension in each clustering mode includes:
for the sequence corresponding to the kth dimension:
clustering data in a sequence corresponding to a kth dimension by adopting a mean shift clustering algorithm to obtain a plurality of first clustering results;
constructing a plurality of tuples corresponding to the kth dimension and the other dimensions based on each datum in the sequence corresponding to the kth dimension and the data of the other dimensions in the row where that datum is located;
for the jth dimension other than the kth dimension: clustering all the tuples corresponding to the kth dimension and the jth dimension by adopting a mean shift clustering algorithm to obtain the clustering results of the kth dimension and the jth dimension.
Preferably, the obtaining the impact evaluation value of each dimension on other features according to the difference of the data value between each clustering result corresponding to each dimension in each clustering mode and the clustering center, the fluctuation condition of the data in each clustering result corresponding to each dimension in each clustering mode, and the difference between the clustering result corresponding to each dimension in different clustering modes and the clustering result corresponding to each other dimension in different clustering modes includes:
for the kth dimension:
for any clustering result: obtaining a discrete index corresponding to the clustering result according to the difference between each datum in the clustering result and the clustering center;
according to the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension, obtaining an influence degree value of the kth dimension, wherein the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension are in negative correlation with the influence degree value;
for the jth dimension other than the kth dimension: obtaining an influence index of the kth dimension on the jth dimension according to the discrete index corresponding to each clustering result of the kth dimension and the jth dimension and the variance of the kth-dimension data in each clustering result of the kth dimension and the jth dimension, wherein both the discrete index and the variance are in a negative correlation relation with the influence index; and determining an influence evaluation value of the kth dimension on the jth dimension based on the influence index and the influence degree value.
Preferably, the obtaining of the discrete index includes:
for any clustering result of the kth dimension and the jth dimension other than the kth dimension: recording the difference between each jth-dimension datum in the clustering result and the jth-dimension datum of the clustering center as the first difference index of that datum; and determining the arithmetic square root of the mean of the first difference indexes of all the jth-dimension data in the clustering result as the discrete index corresponding to the clustering result.
Preferably, the impact index of the kth dimension on the jth dimension is calculated using the following formula:
wherein the formula uses the following quantities: the influence index of the kth dimension on the jth dimension; the number of clustering results of the kth dimension and the jth dimension other than the kth dimension; the maximum value among the discrete indexes corresponding to all clustering results of the kth dimension and the jth dimension; the discrete index corresponding to each such clustering result; the variance of the kth-dimension data over all tuples in each such clustering result; a logarithmic function with base 2; exp(), an exponential function with the natural constant as base; and a preset second adjustment parameter, which is greater than 0.
Preferably, the determining the impact evaluation value of the kth dimension on the jth dimension based on the impact index and the impact degree value includes:
determining the ratio of the influence index of the kth dimension on the jth dimension to the influence degree value of the kth dimension as the influence evaluation value of the kth dimension on the jth dimension.
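As a minimal illustration of this ratio, the sketch below divides a set of influence indexes by the influence degree value of the kth dimension. The function name, the dictionary layout, and the numeric values are hypothetical and are not taken from the patent; both inputs are assumed to have been computed beforehand.

```python
def influence_evaluation(influence_index_kj, influence_degree_k):
    """Influence evaluation value of dimension k on dimension j.

    Ratio of the influence index of k on j to the influence degree
    value of k, as described in the text above.
    """
    if influence_degree_k == 0:
        raise ValueError("influence degree value must be non-zero")
    return influence_index_kj / influence_degree_k


# Hypothetical inputs for k = 0: influence indexes on dimensions 1 and 2.
influence_indexes = {1: 0.8, 2: 0.2}
influence_degree = 0.5  # hypothetical influence degree value of dimension 0

evaluations = {
    j: influence_evaluation(index, influence_degree)
    for j, index in influence_indexes.items()
}
```

With these made-up inputs, dimension 1 receives a larger evaluation value than dimension 2, since its influence index is larger and both share the same influence degree value.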
Preferably, the obtaining the corresponding corrected covariance according to the covariance between the sequence corresponding to each dimension and the sequences corresponding to other dimensions, the number of types of data in the sequence corresponding to each dimension, and the impact evaluation value includes:
for the kth dimension and the jth dimension other than the kth dimension:
the ratio between the hyperbolic tangent function value of the number of kinds of data in the sequence corresponding to the kth dimension and the influence index of the kth dimension on the jth dimension is recorded as a correction coefficient;
and determining the product of the covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension and the correction coefficient as the corrected covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension.
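The correction described above can be sketched as follows. This is only an illustration under stated assumptions: "the number of kinds of data" is read as the number of distinct values in the sequence, the covariance is the sample covariance with an n-1 denominator (the text does not say which estimator is used), and the influence index is passed in as a precomputed positive number. All function names and values are hypothetical.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def corrected_covariance(seq_k, seq_j, influence_index_kj):
    """Covariance of seq_k and seq_j multiplied by the correction coefficient.

    Correction coefficient = tanh(number of distinct values in seq_k)
    divided by the influence index of dimension k on dimension j.
    """
    if influence_index_kj <= 0:
        raise ValueError("influence index is assumed to be positive here")
    kinds = len(set(seq_k))  # assumption: "kinds of data" = distinct values
    coefficient = math.tanh(kinds) / influence_index_kj
    return covariance(seq_k, seq_j) * coefficient


# Hypothetical sequences: dimension k takes only two distinct values.
seq_k = [0.0, 0.0, 1.0, 1.0]
seq_j = [1.0, 1.0, 3.0, 3.0]
corrected = corrected_covariance(seq_k, seq_j, influence_index_kj=2.0)
```

Repeating this for every pair of dimensions would fill the off-diagonal entries of the target covariance matrix; how the diagonal entries are treated is not stated in this passage.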
Preferably, the scheduling the traffic of the SD-WAN based on the target covariance matrix includes:
performing adaptive dimension reduction on the data in the target covariance matrix according to the explained variance ratio to obtain dimension-reduced data; and scheduling the traffic of the SD-WAN based on the dimension-reduced data.
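One common way to choose the number of principal components adaptively is to keep the smallest number whose cumulative share of the total variance reaches a threshold. The sketch below shows that selection step only. It assumes the eigenvalues of the target covariance matrix have already been computed, and the threshold value is a hypothetical choice that the text does not specify.

```python
def components_for_threshold(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative variance share
    reaches the threshold.

    eigenvalues: eigenvalues of the (target) covariance matrix.
    threshold: required cumulative share, in (0, 1]; 0.95 is an
    assumed default, not a value from the patent.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return count
    # Rounding can leave the running sum just below 1.0; keep everything.
    return len(ordered)


# Hypothetical eigenvalues; the shares are 0.5, 0.25, 0.125, 0.125.
n_components = components_for_threshold([4.0, 2.0, 1.0, 1.0], threshold=0.85)
```

Because the corrected covariances are rescaled entry by entry, the target matrix is not guaranteed to be positive semi-definite, so a real implementation would need to decide how to handle negative eigenvalues; this sketch does not.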
Preferably, the obtaining the sequence corresponding to each dimension based on the traffic data packet includes:
carrying out deep analysis on the flow data packet to obtain high-dimensional vector data, and inputting the high-dimensional vector data into a PCA algorithm to obtain an initial matrix; each column vector in the initial matrix serves as a sequence corresponding to one dimension.
In a second aspect, the present invention provides an SD-WAN intelligent traffic scheduling optimization system, including a memory and a processor, where the processor executes a computer program stored in the memory to implement the above-mentioned SD-WAN intelligent traffic scheduling optimization method.
The invention has at least the following beneficial effects:
1. The method first clusters the data in the sequence corresponding to each dimension multiple times to obtain the clustering results of each dimension in each clustering mode, thereby establishing the relevance among different features.
2. The method makes better use of the inherent structure and relations of the data, deepens the mining of relevance in the data, improves the ability to interpret and characterize the features of the traffic data, and provides stronger support for further analysis and application, thereby improving the accuracy and reliability of SD-WAN intelligent traffic scheduling.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an SD-WAN intelligent traffic scheduling optimization method according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given to an SD-WAN intelligent traffic scheduling optimization method according to the present invention with reference to the accompanying drawings and the preferred embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the SD-WAN intelligent traffic scheduling optimization method provided by the invention with reference to the accompanying drawings.
An embodiment of an SD-WAN intelligent traffic scheduling optimization method:
The specific scenario addressed by this embodiment is as follows. Conventional PCA dimension reduction algorithms typically select parameters using the explained variance ratio of the eigenvalues. However, when PCA is combined with traffic scheduling and its dimension reduction parameters are adjusted adaptively, the multidimensional feature vectors of the traffic data, being obtained through deep traffic analysis, may be strongly correlated with one another. Overly strong correlation between the features means that a principal component with high variance may not represent the direction carrying the most important information. Strongly correlated features can also leave the direction of the principal components insufficiently clear, which reduces their interpretability. In this embodiment, traffic data packets in the SD-WAN are first acquired and subjected to deep analysis to obtain high-dimensional vectors; the high-dimensional vectors are then analyzed, and the covariance between the sequence corresponding to each dimension and the sequences corresponding to the other dimensions is corrected to obtain a target covariance matrix; finally, the traffic of the SD-WAN is intelligently scheduled based on the target covariance matrix.
The embodiment provides an SD-WAN intelligent traffic scheduling optimization method, as shown in fig. 1, which includes the following steps:
step S1, obtaining a flow data packet in the SD-WAN, and obtaining a sequence corresponding to each dimension based on the flow data packet.
To achieve better management and optimization through SD-WAN, the traffic data is typically subjected to deep data analysis, and high-dimensional feature vectors of the traffic data are obtained from the analysis results. These high-dimensional feature vectors represent the data features and data changes of the various aspects of the traffic data.
In this embodiment, traffic data packets in the SD-WAN are first collected and subjected to deep analysis to obtain high-dimensional vector data, and the high-dimensional vector data are input into a PCA algorithm to obtain an initial matrix. The initial matrix is specifically:
$$X=\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}$$
wherein $X$ represents the initial matrix, $m$ represents the number of rows of the initial matrix, $n$ represents the number of columns of the initial matrix, $x_{11}$ represents the data in row 1, column 1 of the initial matrix, $x_{1n}$ represents the data in row 1, column n, $x_{m1}$ represents the data in row m, column 1, and $x_{mn}$ represents the data in row m, column n.
Each column of data in the initial matrix is used as the sequence corresponding to one dimension, so this embodiment obtains sequences corresponding to a plurality of dimensions. The PCA algorithm is prior art and is not described in further detail here.
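The column-to-sequence step can be sketched in a few lines. The matrix values below are made up for illustration, and the function name is hypothetical; the only thing taken from the text is that each column of the initial matrix becomes the sequence of one dimension.

```python
def column_sequences(matrix):
    """Return one sequence per column; sequence k holds column k, top to bottom."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same number of columns")
    return [[row[k] for row in matrix] for k in range(n_cols)]


# Hypothetical 3 x 2 initial matrix: 3 rows and 2 dimensions.
initial_matrix = [
    [0.5, 10.0],
    [0.7, 12.0],
    [0.6, 11.0],
]

sequences = column_sequences(initial_matrix)
```

Keeping the row order inside each sequence matters for the later steps, because tuples are built from values that share a row.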
Step S2, clustering the data in the sequence corresponding to each dimension for a plurality of times based on the data in the sequence corresponding to each dimension to obtain clustering results corresponding to each dimension in each clustering mode; and obtaining the influence evaluation value of each dimension on other characteristics according to the difference of the data value between each clustering result corresponding to each dimension in each clustering mode and the clustering center, the fluctuation condition of the data in each clustering result corresponding to each dimension in each clustering mode, and the difference between the clustering result corresponding to each dimension in different clustering modes and the clustering result corresponding to each other dimension in different clustering modes.
After deep analysis of the traffic data packets, multidimensional feature vectors of the traffic data are obtained. Strong correlation may exist between these feature vectors, and overly strong correlation may mean that a principal component with large variance does not represent the principal information of the data. Among the multidimensional features of traffic data, the features that influence other features are often characteristic features of the data, for example data indicating that the traffic belongs to a certain transmission protocol or a certain data format; such data are often strongly correlated with the other feature vectors of the traffic data. The principal component acquisition process of PCA can therefore be optimized by identifying the characteristic data in the multidimensional data and computing the correlation between the characteristic data and the other feature data.
Characteristic data in the traffic data usually represent properties of the data packet and are strongly correlated with the various features of the traffic data, so they must first be identified in the multidimensional data. Such data take strictly limited values, and each value corresponds closely to a specific meaning; that is, the value range is small, usually only a few fixed values, and when these values are the same the data of the other feature vectors aggregate strongly, so such data have a large influence on the other data. Based on this, the embodiment clusters the data in the sequence corresponding to each dimension multiple times, and evaluates the degree of influence of each dimension on the other features according to the difference of data values between each clustering result corresponding to each dimension in each clustering mode and its clustering center, the fluctuation of the data in each clustering result corresponding to each dimension in each clustering mode, and the difference between the clustering results corresponding to each dimension in different clustering modes and the clustering results corresponding to the other dimensions in different clustering modes, so as to obtain the influence evaluation value of each dimension on the other features.
For the sequence corresponding to the kth dimension:
First, the data in the sequence corresponding to the kth dimension are clustered with a mean shift clustering algorithm to obtain several clustering results, each of which is recorded as a first clustering result; that is, several first clustering results of the kth dimension are obtained. Then, tuples corresponding to the kth dimension and each other dimension are constructed from each datum in the sequence corresponding to the kth dimension and the data of the other dimensions in the row where that datum is located. For the jth dimension other than the kth dimension: all tuples corresponding to the kth dimension and the jth dimension are clustered with the mean shift clustering algorithm to obtain the clustering results of the kth dimension and the jth dimension. In this embodiment, the number of seed points for mean shift clustering is set to 20, and the drift radius is set according to the specific situation. The mean shift clustering algorithm is prior art and is not described in detail here. The clustering center of each clustering result is then obtained.
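The two clustering passes described above can be sketched with a small flat-kernel mean shift written in plain Python. This is an illustrative simplification, not the patent's implementation: it seeds a search from every point rather than from 20 seed points, merges converged modes that lie within half the radius of each other, and uses made-up data and radii. The helper names are hypothetical.

```python
import math


def mean_shift(points, radius, max_iter=100, tol=1e-9):
    """Flat-kernel mean shift; returns (centers, labels)."""
    modes = []
    for start in points:
        current = tuple(start)
        for _ in range(max_iter):
            # Points inside the window centered on the current position.
            neighbours = [p for p in points if math.dist(current, p) <= radius]
            if not neighbours:
                break
            shifted = tuple(sum(axis) / len(neighbours) for axis in zip(*neighbours))
            moved = math.dist(current, shifted)
            current = shifted
            if moved < tol:
                break
        modes.append(current)

    # Merge modes that converged to (almost) the same position.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) <= radius / 2:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


def build_tuples(matrix, k, j):
    """Pair the kth-dimension value of each row with the jth-dimension value."""
    return [(row[k], row[j]) for row in matrix]


# Hypothetical data: column 0 is a two-valued flag, column 1 a measurement.
matrix = [
    [0.0, 1.0], [0.0, 1.2], [0.0, 0.9],
    [1.0, 5.0], [1.0, 5.1], [1.0, 4.8],
]

# First clustering results: dimension 0 on its own.
first_centers, first_labels = mean_shift([(row[0],) for row in matrix], radius=0.5)
# Clustering results of dimension 0 together with dimension 1.
pair_centers, pair_labels = mean_shift(build_tuples(matrix, 0, 1), radius=1.0)
```

In this toy data both passes separate the rows into the same two groups, which is the situation the text associates with a dimension that strongly influences another.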
For the kth dimension:
For any clustering result of the kth dimension and the jth dimension other than the kth dimension: the difference between each jth-dimension datum in the clustering result and the jth-dimension datum of the clustering center is recorded as the first difference index of that datum, and the arithmetic square root of the mean of the first difference indexes of all jth-dimension data in the clustering result is determined as the discrete index corresponding to the clustering result. The discrete index corresponding to the clustering result is calculated as:
$$Q=\sqrt{\frac{1}{V}\sum_{v=1}^{V} d_v}$$
wherein $Q$ represents the discrete index corresponding to the clustering result, $V$ represents the number of two-tuples in the clustering result, $d_v$ represents the first difference index of $x_v$, the jth-dimension datum in the vth two-tuple of the clustering result, and $O$ represents the jth-dimension datum of the clustering center of the clustering result. The first difference index reflects the difference between $x_v$ and $O$. The discrete index reflects how tightly the clustering result is aggregated: the smaller the discrete index, the stronger the aggregation of the data; the larger the discrete index, the weaker the aggregation. In this way, the discrete index corresponding to each clustering result of the kth dimension in each clustering mode can be obtained.
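A short sketch of the discrete index follows. It assumes the first difference index is the squared difference between a datum and the cluster-center value, which makes the discrete index a root-mean-square deviation; the text only says "difference", so this reading is an assumption, and the function name and sample values are hypothetical.

```python
import math


def discrete_index(values, center):
    """Square root of the mean first difference index of a clustering result.

    values: jth-dimension data of the tuples in one clustering result.
    center: jth-dimension datum of that result's cluster center.
    Assumption: first difference index = squared difference.
    """
    if not values:
        raise ValueError("a clustering result must contain at least one datum")
    first_difference = [(value - center) ** 2 for value in values]
    return math.sqrt(sum(first_difference) / len(first_difference))


loose = discrete_index([1.0, 3.0], center=2.0)   # data spread around the center
tight = discrete_index([2.0, 2.0], center=2.0)   # data sitting on the center
```

Here `loose` evaluates to 1.0 and `tight` to 0.0, matching the statement that a smaller discrete index means more tightly aggregated data.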
According to the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension, obtaining an influence degree value of the kth dimension, wherein the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension are in negative correlation with the influence degree value. The negative correlation indicates that the dependent variable decreases with increasing independent variable, and the dependent variable increases with decreasing independent variable, which may be a subtraction relationship, a division relationship, or the like, and is determined by the actual application. As a specific embodiment, a specific calculation formula of the influence degree value is given, where the specific calculation formula of the influence degree value of the kth dimension is:
wherein the formula uses the following quantities: the influence degree value of the kth dimension; the maximum value among the discrete indexes corresponding to all first clustering results of the kth dimension; the number of first clustering results of the kth dimension; the discrete index corresponding to the ith first clustering result of the kth dimension; the variance of the data in the ith first clustering result of the kth dimension; a logarithmic function with base 2; exp(), an exponential function with the natural constant as base; and a preset first adjustment parameter, which is greater than 0.
The maximum discrete index is obtained by taking the largest of the discrete indexes corresponding to all first clustering results of the kth dimension. The preset first adjustment parameter is introduced into the calculation formula of the influence degree value to prevent the denominator from being 0; in this embodiment it is 0.01, and in a specific application the implementer can set it according to the specific situation. The larger the discrete indexes corresponding to the clustering results of the kth dimension in each clustering mode, the weaker the overall aggregation of all data in the sequence corresponding to the kth dimension, and the smaller the influence degree of the kth dimension. A further term of the formula reflects the maximum aggregation of the data corresponding to the kth dimension after clustering: the larger its value, the stronger the overall aggregation of the data corresponding to the kth dimension. The variance of the data in each clustering result reflects the fluctuation of the kth-dimension data within that clustering result: the larger the variance, the weaker the overall aggregation, and, consistent with the negative correlation stated above, the smaller the influence degree value of the kth dimension.
For the jth dimension other than the kth dimension: an influence index of the kth dimension on the jth dimension is obtained according to the discrete index corresponding to each clustering result of the kth dimension and the jth dimension other than the kth dimension, and the variance of the kth dimension data in each of those clustering results. Both the discrete index corresponding to each clustering result and the variance of the kth dimension data in each clustering result are in a negative correlation relation with the influence index. The specific calculation formula of the influence index of the kth dimension on the jth dimension is as follows:
wherein the symbols of the formula denote, in order: the influence index of the kth dimension on the jth dimension; the number of clustering results of the kth dimension and the jth dimension other than the kth dimension; the maximum value of the discrete indexes corresponding to all clustering results of the kth dimension and the jth dimension other than the kth dimension; the discrete index corresponding to the ith clustering result of the kth dimension and the jth dimension other than the kth dimension; and the variance of the kth dimension data in all tuples of the ith clustering result of the kth dimension and the jth dimension other than the kth dimension. log2() denotes the base-2 logarithmic function, exp() denotes the exponential function with the natural constant as its base, and the last symbol denotes the preset second adjustment parameter, which is greater than 0.
The maximum discrete index is obtained as follows: the maximum value among the discrete indexes corresponding to all clustering results of the kth dimension and the jth dimension other than the kth dimension is taken. The preset second adjustment parameter is introduced into the calculation formula of the influence index to prevent the denominator from being 0; in this embodiment it is set to 0.01, and in a specific application the implementer can set it according to the specific situation. The larger the discrete index corresponding to each clustering result of the jth dimension in each clustering mode, the weaker the overall aggregation of all data in the sequence corresponding to the jth dimension, and the smaller the influence degree of the jth dimension. For the corresponding term in the formula, the larger its value, the stronger the overall aggregation of the data corresponding to the jth dimension; a logarithmic function is used to control its value range. The variance of the data in each clustering result reflects the fluctuation of the data corresponding to the jth dimension within that clustering result; the larger the variance, the larger the difference in the degree of influence of the kth dimension on the jth dimension, and the smaller the influence index of the kth dimension on the jth dimension.
The ratio of the influence index of the kth dimension on the jth dimension to the influence degree value of the kth dimension is determined as the influence evaluation value of the kth dimension on the jth dimension. The influence index reflects the aggregation of the kth-dimension data after it is clustered jointly with the jth dimension; the influence degree value reflects the aggregation obtained by clustering on the kth-dimension data values alone. Their ratio therefore reflects how the aggregation of the kth-dimension data changes once the jth dimension is taken into account: the larger the ratio, the more concentrated the distribution of the kth-dimension data after joint clustering with the jth dimension, and the stronger the overall correlation between the kth dimension and the jth dimension. The influence index of the kth dimension on the jth dimension reflects the overall correlation between the features of the kth dimension and the features of the jth dimension: the larger its value, the more concentrated the data range of the kth dimension when the jth-dimension data is the same, and the stronger the correlation between the kth dimension and the jth dimension.
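Written as an equation, the ratio described above takes the following form. The symbols are placeholders chosen here for illustration (the source's own symbols are not recoverable from the text): $E_{k \to j}$ for the influence evaluation value, $Y_{k \to j}$ for the influence index of the kth dimension on the jth dimension, and $F_k$ for the influence degree value of the kth dimension.

```latex
E_{k \to j} = \frac{Y_{k \to j}}{F_k}
```

For instance, with hypothetical values $Y_{k \to j} = 0.6$ and $F_k = 0.3$, the evaluation value is $E_{k \to j} = 2$, meaning the kth-dimension data is twice as aggregated under joint clustering with the jth dimension as it is when clustered alone.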
This embodiment calculates the degree of influence between features of different dimensions by comparing the result of clustering the kth dimension jointly with the jth dimension against the result of clustering only on the data values of the kth dimension, so the obtained correlation is more accurate.
By adopting the method, the influence evaluation value of each dimension on other characteristics can be obtained.
Step S3, obtaining the corresponding corrected covariance according to the covariance between the sequence corresponding to each dimension and the sequences corresponding to other dimensions, the number of types of data in the sequence corresponding to each dimension, and the influence evaluation value; and constructing a target covariance matrix based on the corrected covariance.
In the covariance calculation between different features, the stronger the correlation between two dimension vectors, the less faithfully the covariance between them represents their true dimension information, and the more the covariance should be corrected. In addition, the possibility that both dimensions hold flag-class data needs to be considered; one characteristic of flag-class data is that its value range is small and is often not a continuous range but a set of individual values. Based on this, in this embodiment, the covariance between the sequence corresponding to each dimension and the sequences corresponding to the other dimensions is corrected according to the number of types of data in the sequence corresponding to each dimension and the influence evaluation value of each dimension on other features, so as to obtain the corresponding corrected covariance.
Specifically, for the kth dimension and the jth dimension other than the kth dimension:
the ratio between the hyperbolic tangent function value of the number of kinds of data in the sequence corresponding to the kth dimension and the influence index of the kth dimension on the jth dimension is recorded as a correction coefficient; and determining the product of the covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension and the correction coefficient as the corrected covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension. The specific calculation formula of the corrected covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension is as follows:
wherein the symbols of the formula denote, in order: the corrected covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension other than the kth dimension; the covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension other than the kth dimension; and the number of kinds of data in the sequence corresponding to the kth dimension. tanh() denotes the hyperbolic tangent function. The specific method for calculating covariance is prior art and is not described in detail here.
The ratio of the hyperbolic tangent value to the influence index is the correction coefficient, which is used to correct the initial covariance between different features. The larger the influence index of the kth dimension on the jth dimension, the more concentrated the data range of the kth dimension when the jth-dimension data is the same, the stronger the correlation between the jth dimension and the kth dimension, the larger the share of the covariance that is caused by this correlation, and the smaller the real covariance. The more kinds of data in the sequence corresponding to the kth dimension, the wider the value range of the kth dimension, the less likely it is to be flag-class feature data, and the more accurate the correlation calculation.
By adopting the method, the covariance between every two different features can be corrected, and the corresponding corrected covariance is obtained.
In this embodiment, a covariance matrix is constructed according to the corrected covariance between different features, and the covariance matrix constructed at this time is recorded as a target covariance matrix. Thus far, the present embodiment acquires the target covariance matrix.
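The correction step can be sketched as follows, using only the relationships stated above: the correction coefficient is tanh(number of kinds of data in the kth sequence) divided by the influence index of the kth dimension on the jth dimension, and the corrected covariance is the ordinary covariance multiplied by that coefficient. The function names, the use of the sample covariance (n - 1 denominator), the precomputed `influence_index` table, and leaving the diagonal uncorrected are all assumptions made for this illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (assumed estimator)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def corrected_covariance(seq_k, seq_j, influence_index_kj):
    """Covariance scaled by tanh(number of distinct values in seq_k) / influence index."""
    if influence_index_kj == 0:
        raise ValueError("influence index must be nonzero")
    kinds = len(set(seq_k))  # number of kinds of data in the kth sequence
    coefficient = math.tanh(kinds) / influence_index_kj
    return covariance(seq_k, seq_j) * coefficient


def target_covariance_matrix(sequences, influence_index):
    """Build the matrix of corrected covariances.

    influence_index[k][j] is a hypothetical, precomputed influence index of
    dimension k on dimension j. Assumption: diagonal entries are left as the
    plain variance, since the text only describes correcting pairs k != j.
    """
    d = len(sequences)
    matrix = [[0.0] * d for _ in range(d)]
    for k in range(d):
        for j in range(d):
            if k == j:
                matrix[k][j] = covariance(sequences[k], sequences[k])
            else:
                matrix[k][j] = corrected_covariance(
                    sequences[k], sequences[j], influence_index[k][j]
                )
    return matrix


# Hypothetical data: two continuous-looking features and one flag-like feature.
sequences = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 0.0, 1.0],
]
influence_index = [
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]
for row in target_covariance_matrix(sequences, influence_index):
    print(row)
```

Because the coefficient depends on the kth sequence and on the direction of the influence index, the resulting matrix is not guaranteed to be symmetric; how the source handles that is not stated in the text.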
And step S4, scheduling the SD-WAN traffic based on the target covariance matrix.
The present embodiment has obtained the target covariance matrix in step S3, and will schedule the traffic of the SD-WAN based on the target covariance matrix next.
Specifically, in this embodiment, the eigenvalues and eigenvectors of the target covariance matrix are calculated, adaptive dimension reduction is then performed according to the variance interpretation rate (the explained variance ratio) to obtain dimension-reduced data, and the traffic of the SD-WAN is scheduled based on the dimension-reduced data.
So far, the intelligent dispatching of SD-WAN traffic is realized by adopting the method provided by the embodiment.
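One common way to make dimension reduction "adaptive" through the explained variance ratio is to keep the smallest number of leading eigenvectors whose cumulative share of the total eigenvalue sum reaches a threshold. The sketch below illustrates that selection rule under stated assumptions: the eigenvalues and eigenvectors are taken as already computed, the eigenvalues are real and non-negative, and the 0.95 default threshold is a hypothetical choice. The source does not specify the threshold or the exact rule.

```python
def select_components(eigenvalues, threshold=0.95):
    """Indices of the leading eigenvalues whose cumulative explained
    variance ratio first reaches `threshold` (hypothetical default)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # Visit eigenvalues from largest to smallest.
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    kept, running = [], 0.0
    for i in order:
        kept.append(i)
        running += eigenvalues[i]
        if running / total >= threshold:
            break
    return kept


def project(rows, eigenvectors, kept):
    """Project each data row onto the kept eigenvectors.

    eigenvectors[i] is the eigenvector (as a list) paired with eigenvalues[i].
    """
    return [
        [sum(x * v for x, v in zip(row, eigenvectors[i])) for i in kept]
        for row in rows
    ]


# Hypothetical eigen-decomposition of a 3x3 matrix with axis-aligned eigenvectors.
eigenvalues = [1.0, 6.0, 3.0]
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
kept = select_components(eigenvalues, threshold=0.85)
reduced = project([[1.0, 2.0, 3.0]], eigenvectors, kept)
print(kept, reduced)
```

The reduced rows are what a downstream scheduling policy would consume; that policy itself is outside the scope of this sketch.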
In this embodiment, the data in the sequence corresponding to each dimension is first clustered several times to obtain the clustering results corresponding to each dimension in each clustering mode, thereby establishing the relevance among different features.
An embodiment of an SD-WAN intelligent traffic scheduling optimization system:
the SD-WAN intelligent flow scheduling optimization system comprises a memory and a processor, wherein the processor executes a computer program stored in the memory to realize the SD-WAN intelligent flow scheduling optimization method.
Since the SD-WAN intelligent traffic scheduling optimization method has already been described in the method embodiment above, it is not described again in this embodiment.
It should be noted that: the foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An intelligent SD-WAN flow scheduling optimization method is characterized by comprising the following steps:
acquiring a flow data packet in an SD-WAN, and acquiring a sequence corresponding to each dimension based on the flow data packet;
clustering the data in the sequence corresponding to each dimension for a plurality of times based on the data in the sequence corresponding to each dimension to obtain clustering results corresponding to each dimension in each clustering mode; obtaining influence evaluation values of each dimension on other characteristics according to the difference of data values between the corresponding clustering results and the clustering centers of each dimension in each clustering mode, the fluctuation condition of data in the corresponding clustering results of each dimension in each clustering mode, and the difference between the corresponding clustering results of each dimension in different clustering modes and the corresponding clustering results of each dimension in different clustering modes;
obtaining corresponding corrected covariance according to covariance between sequences corresponding to each dimension and sequences corresponding to other dimensions, the number of types of data in the sequences corresponding to each dimension and the influence evaluation value; constructing a target covariance matrix based on the corrected covariance;
and scheduling the SD-WAN traffic based on the target covariance matrix.
2. The method for intelligent traffic scheduling optimization of SD-WAN according to claim 1, wherein clustering the data in the sequence corresponding to each dimension for several times based on the data in the sequence corresponding to each dimension to obtain each clustering result corresponding to each dimension in each clustering mode, comprises:
for the sequence corresponding to the kth dimension:
clustering data in a sequence corresponding to a kth dimension by adopting a mean shift clustering algorithm to obtain a plurality of first clustering results;
constructing a plurality of tuples corresponding to the kth dimension and other dimensions based on each data in the sequence corresponding to the kth dimension and each data of the other dimensions on which the sequence is located;
for the j-th dimension other than the k-th dimension: and clustering all the tuples corresponding to the kth dimension and each dimension except the kth dimension by adopting a mean shift clustering algorithm to obtain a clustering result of the kth dimension and each dimension except the kth dimension.
3. The method for intelligent traffic scheduling optimization of SD-WAN according to claim 2, wherein the obtaining the impact evaluation value of each dimension on other features according to the difference of the data value between the corresponding clustering result and the clustering center in each clustering mode, the fluctuation condition of the data in the corresponding clustering result in each clustering mode, the difference between the corresponding clustering result in different clustering modes and the corresponding clustering result in different clustering modes, comprises:
for the kth dimension:
for any clustering result: obtaining a discrete index corresponding to the clustering result according to the difference between each datum in the clustering result and the clustering center;
according to the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension, obtaining an influence degree value of the kth dimension, wherein the discrete index corresponding to the clustering result of the kth dimension and each dimension except the kth dimension and the variance of the data in the clustering result of the kth dimension and each dimension except the kth dimension are in negative correlation with the influence degree value;
for the jth dimension other than the kth dimension: according to the discrete index corresponding to each clustering result of the kth dimension and the jth dimension other than the kth dimension, and the variance of the kth dimension data in each clustering result of the kth dimension and the jth dimension other than the kth dimension, obtaining an influence index of the kth dimension on the jth dimension, wherein both the discrete index corresponding to each such clustering result and the variance of the kth dimension data in each such clustering result are in a negative correlation relation with the influence index; and determining an influence evaluation value of the kth dimension on the jth dimension based on the influence index and the influence degree value.
4. A method for intelligent traffic scheduling optimization for SD-WAN according to claim 3, wherein the obtaining of the discrete index comprises:
for any clustering result of the kth dimension and the jth dimension other than the kth dimension: the difference between each jth dimension data in the clustering result and the jth dimension data corresponding to the clustering center is recorded as a first difference index of each jth dimension data; and the arithmetic square root of the average value of the first difference indexes of all the jth dimension data in the clustering result is determined as the discrete index corresponding to the clustering result.
5. A method for intelligent traffic scheduling optimization for SD-WAN according to claim 3, wherein the impact index of the kth dimension on the jth dimension is calculated by using the following formula:
wherein the symbols of the formula denote, in order: the influence index of the kth dimension on the jth dimension; the number of clustering results of the kth dimension and the jth dimension other than the kth dimension; the maximum value of the discrete indexes corresponding to all clustering results of the kth dimension and the jth dimension other than the kth dimension; the discrete index corresponding to the ith clustering result of the kth dimension and the jth dimension other than the kth dimension; and the variance of the kth dimension data in all tuples of the ith clustering result of the kth dimension and the jth dimension other than the kth dimension. log2() denotes the base-2 logarithmic function, exp() denotes the exponential function with the natural constant as its base, and the last symbol denotes the preset second adjustment parameter, which is greater than 0.
6. The method for optimizing SD-WAN intelligent traffic scheduling according to claim 3, wherein said determining an impact evaluation value of the kth dimension to the jth dimension based on the impact index and the impact level value comprises:
and determining the ratio of the influence index of the kth dimension to the influence degree value of the kth dimension as the influence evaluation value of the kth dimension to the jth dimension.
7. The method for optimizing SD-WAN intelligent traffic scheduling according to claim 1, wherein the obtaining the corresponding corrected covariance according to the covariance between the sequence corresponding to each dimension and the sequences corresponding to other dimensions, the number of kinds of data in the sequence corresponding to each dimension, and the impact evaluation value comprises:
for the kth dimension and the jth dimension other than the kth dimension:
the ratio between the hyperbolic tangent function value of the number of kinds of data in the sequence corresponding to the kth dimension and the influence index of the kth dimension on the jth dimension is recorded as a correction coefficient;
and determining the product of the covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension and the correction coefficient as the corrected covariance between the sequence corresponding to the kth dimension and the sequence corresponding to the jth dimension except the kth dimension.
8. The method for intelligent traffic scheduling optimization of SD-WAN according to claim 1, wherein said scheduling of SD-WAN traffic based on said target covariance matrix comprises:
performing self-adaptive dimension reduction processing on the data in the target covariance matrix through a variance interpretation rate to obtain dimension reduced data; and scheduling the traffic of the SD-WAN based on the reduced-dimension data.
9. The method for optimizing SD-WAN intelligent traffic scheduling according to claim 1, wherein said obtaining a sequence corresponding to each dimension based on the traffic data packet comprises:
carrying out deep analysis on the flow data packet to obtain high-dimensional vector data, and inputting the high-dimensional vector data into a PCA algorithm to obtain an initial matrix; each column vector in the initial matrix serves as a sequence corresponding to one dimension.
10. An SD-WAN intelligent traffic scheduling optimization system comprising a memory and a processor, characterized in that said processor executes a computer program stored in said memory to implement an SD-WAN intelligent traffic scheduling optimization method according to any of claims 1-9.
CN202410095264.3A 2024-01-24 2024-01-24 SD-WAN intelligent flow scheduling optimization method and system Active CN117609814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410095264.3A CN117609814B (en) 2024-01-24 2024-01-24 SD-WAN intelligent flow scheduling optimization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410095264.3A CN117609814B (en) 2024-01-24 2024-01-24 SD-WAN intelligent flow scheduling optimization method and system

Publications (2)

Publication Number Publication Date
CN117609814A (en) 2024-02-27
CN117609814B (en) 2024-05-07

Family

ID=89960251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410095264.3A Active CN117609814B (en) 2024-01-24 2024-01-24 SD-WAN intelligent flow scheduling optimization method and system

Country Status (1)

Country Link
CN (1) CN117609814B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105703954A (en) * 2016-03-17 2016-06-22 福州大学 Network data flow prediction method based on ARIMA model
CN108566341A (en) * 2018-04-08 2018-09-21 西安交通大学 Flow control methods in a kind of SD-WAN environment
CN113393673A (en) * 2021-08-17 2021-09-14 深圳市城市交通规划设计研究中心股份有限公司 Traffic signal scheduling plan and time interval optimization method and device
CN115865780A (en) * 2022-11-07 2023-03-28 中电信数智科技有限公司 Network flow intelligent scheduling method based on multi-dimensional data
WO2023088362A1 (en) * 2021-11-19 2023-05-25 贵州白山云科技股份有限公司 Network traffic processing method and apparatus, and medium and electronic device
CN116760718A (en) * 2023-05-21 2023-09-15 北京工业大学 SDN flow scheduling method based on machine learning classification prediction
US20230344846A1 (en) * 2019-12-23 2023-10-26 Boon Logic Inc. Method for network traffic analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高静 等: "网络差异数据的优化挖掘模型仿真分析研究", 《微电子学与计算机》, vol. 33, no. 07, 31 July 2016 (2016-07-31), pages 136 - 139 *

Also Published As

Publication number Publication date
CN117609814B (en) 2024-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant