CN108770002B - Base station flow analysis method, device, equipment and storage medium - Google Patents

Info

Publication number
CN108770002B
CN108770002B CN201810396528.3A CN201810396528A CN108770002B CN 108770002 B CN108770002 B CN 108770002B CN 201810396528 A CN201810396528 A CN 201810396528A CN 108770002 B CN108770002 B CN 108770002B
Authority
CN
China
Prior art keywords
base station
calculating
traffic
matrix
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810396528.3A
Other languages
Chinese (zh)
Other versions
CN108770002A (en)
Inventor
杜翠凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jiesai Communication Planning And Design Institute Co ltd
GCI Science and Technology Co Ltd
Original Assignee
Guangzhou Jiesai Communication Planning And Design Institute Co ltd
GCI Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jiesai Communication Planning And Design Institute Co ltd, GCI Science and Technology Co Ltd
Priority to CN201810396528.3A
Publication of CN108770002A
Application granted
Publication of CN108770002B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a base station traffic analysis method comprising the following steps: collecting traffic time series of at least two base stations in a communication network; calculating a feature vector of at least one traffic pattern feature of each base station according to the traffic time series of each base station; calculating the weight of each traffic pattern feature according to the feature vectors; generating a target feature matrix of the base station patterns of the communication network according to the feature vectors and the weights; and clustering the target feature matrix to obtain a clustering result, so that the base station patterns can be analyzed according to the clustering result. The invention also discloses a base station traffic analysis apparatus, device and storage medium. The method can improve the accuracy and stability of the clustering result, thereby improving the performance of base station pattern prediction and analysis.

Description

Base station flow analysis method, device, equipment and storage medium
Technical Field
The present invention relates to the field of communications network technologies, and in particular, to a method, an apparatus, a device, and a storage medium for analyzing base station traffic.
Background
With the continuous development of science and technology, mobile terminals have become increasingly popular, and many popular applications, including web browsing, video, music and social networking, have moved from fixed networks to mobile communication networks. With the gradual formation of the mobile Internet of Things, mobile communication data is growing explosively. Operators have accelerated base station deployment to meet the ever-increasing traffic demands of mobile users. To meet the requirements of various scenarios, base stations of various types, such as macro base stations, microcells and picocells, form a diverse heterogeneous network structure. Spatio-temporal analysis of base station traffic has become a key indicator for operators' network performance evaluation and planning design. In the prior art, base station traffic is generally analyzed using the original time series: the Euclidean distance is used to measure the similarity between time series, and clustering is performed on that similarity.
However, in the process of implementing the present invention, the inventor finds that the existing method only considers the value difference of the time sequence at the corresponding time point, and measures the similarity between the time sequences by using the euclidean distance, so that the result is easily affected by the value at the individual time point, the accuracy and stability of the clustering result are reduced, and the performance of the predicted result is poor.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a method, an apparatus, a device and a storage medium for analyzing base station traffic, which can improve the accuracy and stability of clustering results, thereby improving the performance of base station pattern prediction analysis results.
In a first aspect, an embodiment of the present invention provides a base station traffic analysis method, including:
collecting flow time sequences of at least two base stations in a communication network;
calculating a feature vector of at least one traffic pattern feature of each base station according to the traffic time sequence of each base station;
calculating the weight of each flow mode feature according to the feature vector;
generating a target feature matrix of a base station mode of the communication network according to the feature vector and the weight;
and clustering the target characteristic matrix to obtain a clustering result, so that a base station mode can be analyzed according to the clustering result.
In a first implementation form of the first aspect, the traffic pattern features include correlation, scale component, entropy, and shape similarity; the calculating, according to the traffic time series of each base station, the feature vector of at least one traffic pattern feature of each base station specifically includes:
and calculating the correlation characteristic vector, the scale component characteristic vector, the entropy characteristic vector and the shape similarity characteristic vector of each base station according to the flow time sequence of each base station.
According to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the calculating, according to the traffic time series of each base station, the correlation feature vector, the scale component feature vector, the entropy feature vector, and the shape similarity feature vector of each base station specifically includes:
calculating a Pearson correlation coefficient of flow time sequences of every two base stations to obtain a correlation characteristic vector of each base station;
calculating a frequency variation trend coefficient of the flow time sequence of each base station to obtain a scale component feature vector of each base station;
calculating the flow entropy of the flow time sequence of each base station to obtain the entropy characteristic vector of each base station;
and calculating the sum of the distances between the points of the flow time sequence of each two base stations and the points to obtain the shape similarity characteristic vector of each base station.
In a third implementation manner of the first aspect, the calculating, according to the feature vector, a weight of each flow pattern feature specifically includes:
calculating a first evaluation value of each feature vector of each base station according to the feature vectors;
generating an index evaluation matrix according to the first evaluation value;
and calculating the weight of each flow pattern characteristic according to the index evaluation matrix.
According to a third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the calculating, according to the index evaluation matrix, a weight of each flow pattern feature specifically includes:
standardizing the index evaluation matrix to generate an index standardized matrix; the index standardization matrix comprises a second evaluation value which corresponds to each feature vector and is subjected to standardization processing;
calculating the entropy value and the entropy redundancy of each flow mode characteristic according to the index standardization matrix;
and calculating the weight of each flow pattern characteristic according to the entropy value and the entropy redundancy.
According to a fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the generating a target feature matrix of a base station mode of the communication network according to the feature vector and the weight specifically includes:
and generating a target characteristic matrix of a base station mode of the communication network according to the index standardization matrix and the weight of each traffic mode characteristic.
In a sixth implementation form of the first aspect, the clustering is K-means clustering.
In a second aspect, an embodiment of the present invention provides an apparatus for analyzing base station traffic, including:
the time sequence acquisition module is used for acquiring flow time sequences of at least two base stations in a communication network;
the characteristic vector calculation module is used for calculating a characteristic vector of at least one flow mode characteristic of each base station according to the flow time sequence of each base station;
the weight calculation module is used for calculating the weight of each flow mode characteristic according to the characteristic vector;
the characteristic matrix generating module is used for generating a target characteristic matrix of a base station mode of the communication network according to the characteristic vector and the weight;
and the matrix clustering module is used for clustering the target characteristic matrix to obtain a clustering result so as to analyze the base station mode according to the clustering result.
In a third aspect, an embodiment of the present invention further provides a base station traffic analysis device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the base station traffic analysis device implements any one of the above base station traffic analysis methods.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the foregoing base station traffic analysis methods.
One of the above technical solutions has the following advantages: clustering of the base stations is carried out by adopting a mode that the flow pattern characteristics of the time sequence form a characteristic vector, and the complexity of calculation is reduced on the premise of ensuring the acquisition of the flow characteristics of the base stations; calculating a feature vector of the flow pattern features, wherein each flow pattern feature reflects the change trend or stability and the like of the flow of each base station, and analyzing the base station patterns from different dimensions to improve the stability of clustering results; the weight of each flow pattern feature is calculated, the contribution value of each feature vector to the total base station flow is fully considered, the features of different base stations can be distinguished, and the stability of a clustering result is improved, so that the clustering result can better reflect the problems of the movement periodicity of mobile users and the stability and trend of the base station flow. Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for analyzing traffic of a base station according to a first embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a base station traffic analysis apparatus according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a first embodiment of the present invention provides a base station traffic analysis method, which can be executed on a base station traffic analysis device, and includes the following steps:
S10, collecting the traffic time series of at least two base stations in the communication network.
In this embodiment, the base station traffic analysis device may be a mobile terminal such as a mobile phone, a notebook computer, a PDA (personal digital assistant), a PAD (tablet computer) or a digital broadcast receiver, or a fixed terminal such as a digital TV, a desktop computer or a server. The traffic time series of base stations in a communication network may be collected on the device; for example, the traffic time series of all base stations in the communication network of a certain city are collected from an operator of that city. In this embodiment, the traffic time series of each base station in the communication network may be collected at a certain time granularity or over a certain time period, for example 1 hour or 1 day, and the collected traffic time series may be stored. For example, the traffic time series of a certain base station over one month is collected as P = <p_1, p_2, ..., p_m>, where p_m represents the traffic value of the m-th time period.
S20, calculating a feature vector of at least one traffic pattern feature of each base station according to the traffic time series of each base station.
In this embodiment, traffic pattern features are extracted and calculated from the acquired traffic time series, giving feature vectors that describe each traffic time series in terms of correlation, traffic distribution, scale component, entropy, shape similarity, stability, model universality and the like. In this embodiment, given the characteristics of mobile communication heterogeneous networks, four major base station traffic pattern features are extracted: correlation, scale component, entropy and shape similarity. The correlation describes the spatial correlation of base station network resources; the scale component describes the time variation trend of the base station traffic; the entropy describes the demand stability of the base station traffic; and the shape similarity describes how similar the traffic time series of base stations are. As an example, the correlation feature vector of each base station is formed by calculating the correlation between every two base stations. Suppose the communication network includes 3 base stations A, B and C, the correlation between base station A and itself is 1, the correlation between base station A and base station B is 0.5, and the correlation between base station A and base station C is 0.7; the correlation feature vector of base station A is then (1, 0.5, 0.7), and the correlation feature vectors of the other base stations can be calculated in the same way. The scale component feature vector, the entropy feature vector and the shape similarity feature vector of each base station can likewise be calculated from the traffic time series of the base stations.
In this embodiment, the traffic time series of any two base stations generally need to have the same length; for a base station whose data is missing or erroneous, the data can be processed by a time-series sliding or averaging method.
S30, calculating the weight of each traffic pattern feature according to the feature vectors.
In this embodiment, since each traffic pattern feature has different roles, positions, and influences from those of other traffic pattern features, different weights need to be given according to the degree of importance of each traffic pattern feature, and therefore, in calculating the weights, the correlation between each traffic pattern feature and the distribution of each traffic pattern feature itself need to be considered. Here, the weight of each flow pattern feature (i.e., each index) is calculated by an entropy method.
Specifically, according to the feature vectors, calculating a first evaluation value of each feature vector of each base station; generating an index evaluation matrix according to the first evaluation value; and calculating the weight of each flow pattern characteristic according to the index evaluation matrix.
In this embodiment, assuming that there are n base stations and m traffic pattern features, the resulting index evaluation matrix is Y = (y_ij)_{n×m}, where y_ij is the evaluation value of the j-th feature vector of the i-th base station (i.e., the first evaluation value). It is therefore necessary to calculate an evaluation value for each feature vector from all the feature vectors. As an example, assume there are 3 base stations (A, B and C) and 2 traffic pattern features (correlation and shape similarity). The feature vector of each traffic pattern feature of each base station is calculated, and the similarity between the feature vectors is then calculated using the cosine measure, which gives the first evaluation value of each feature vector. Suppose the calculated similarities of the correlation feature vectors of base stations A, B and C are 107.2501, 97.5471 and 156.4992, respectively, and the calculated similarities of the shape similarity feature vectors are 188.6891, 159.6681 and 27.5779, respectively. From these first evaluation values, the index evaluation matrix can be generated as follows:
Feature   Similarity of correlations   Similarity of shape similarity
A         107.2501                     188.6891
B         97.5471                      159.6681
C         156.4992                     27.5779
Therefore, the weight of each flow pattern feature (i.e., each index) can be calculated from the index evaluation matrix obtained as described above.
In this embodiment, specifically, the index evaluation matrix is normalized to generate an index normalization matrix; the index standardization matrix comprises a second evaluation value which corresponds to each feature vector and is subjected to standardization processing; calculating the entropy value and the entropy redundancy of each flow mode characteristic according to the index standardization matrix; and calculating the weight of each flow pattern characteristic according to the entropy value and the entropy redundancy.
In this embodiment, since different indexes mostly have different physical dimensions, each index value may be made dimensionless before the indexes are compared (for example, by the extreme-value, mean-value or standard-deviation method), and the indexes must also be converted to the same direction. After the index evaluation matrix has been made same-direction and dimensionless, the index standardization matrix is obtained as follows:
X = (x_ij)_{n×m}, where x_ij is the evaluation value of the j-th feature vector of the i-th base station after standardization (i.e., the second evaluation value).
The entropy method is then used to calculate the weight of each traffic pattern feature (i.e., each index). According to the definition of entropy, the entropy value e_j and the entropy redundancy h_j of the j-th traffic pattern feature are:
$$e_j = -K \sum_{i=1}^{n} x_{ij} \ln x_{ij}$$
$$h_j = 1 - e_j$$
where K is a constant. The entropy weight (i.e., the weight) w_j of traffic pattern feature j is then:
$$w_j = \frac{h_j}{\sum_{j=1}^{m} h_j}$$
Therefore, the weight of each traffic pattern feature can be calculated in the above manner; for example, the weight of the correlation is 0.3 and the weight of the shape similarity is 0.3.
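The entropy-method steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the common form of the entropy method, in which each column is rescaled to proportions that sum to 1, the constant K equals 1/ln(n), the entropy is e_j = -K Σ p_ij ln p_ij, and 0·ln 0 is treated as 0. The function name `entropy_weights` is hypothetical; the input reuses the example index evaluation matrix.

```python
import math

def entropy_weights(matrix):
    """Return one entropy weight per column (traffic pattern feature).

    Assumptions (not stated in the source): columns are rescaled to
    proportions, K = 1 / ln(n), and 0 * ln(0) is treated as 0.
    """
    n = len(matrix)       # number of base stations
    m = len(matrix[0])    # number of traffic pattern features
    k = 1.0 / math.log(n)
    redundancy = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        props = [v / total for v in column]
        e_j = -k * sum(p * math.log(p) for p in props if p > 0)
        redundancy.append(1.0 - e_j)            # h_j = 1 - e_j
    h_sum = sum(redundancy)
    return [h / h_sum for h in redundancy]      # w_j = h_j / sum of all h

# Example index evaluation matrix (rows: base stations A, B, C).
Y = [[107.2501, 188.6891],
     [97.5471, 159.6681],
     [156.4992, 27.5779]]
weights = entropy_weights(Y)
print(weights)
```

A feature whose values are spread more unevenly across base stations has lower entropy and therefore receives a larger weight. If every column is perfectly uniform, all redundancies are 0 and the final division is undefined, so a real implementation would need to handle that case.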
S40, generating a target feature matrix of the base station patterns of the communication network according to the feature vectors and the weights.
In this embodiment, each feature vector of each base station is combined with the calculated weight of the corresponding traffic pattern feature to obtain the total contribution of each feature vector, and a target feature matrix is generated, so that the target feature matrix takes the contribution value of each feature vector into consideration to some extent. Specifically, a target feature matrix of a base station mode of the communication network is generated according to the index normalization matrix and the weight of each traffic mode feature. In this embodiment, each second evaluation value in the index normalization matrix is calculated according to the feature vector of the corresponding base station, so after calculating the weight of each traffic pattern feature, each vector in the index normalization matrix (i.e. each second evaluation value) is multiplied by the corresponding weight to obtain an objective feature matrix of the traffic pattern of the base station with the weight taken into account, and the matrix can be put into a clustering algorithm for clustering. By way of example, assume that the index normalization matrix is as follows:
Original features   Similarity of correlations   Similarity of shape similarity
A                   0.4                          0.1
B                   0.5                          0.7
C                   0.2                          0.6
The weight of the correlation calculated at the same time is 0.3, and the weight of the shape similarity is 0.3. Thus, the resulting target feature matrix is as follows:
Post-processing features   Similarity of correlations   Similarity of shape similarity
A                          0.12                         0.03
B                          0.15                         0.21
C                          0.06                         0.18
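The element-wise weighting in this example can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `weighted_feature_matrix` is hypothetical, and the inputs are the example values given in the text.

```python
def weighted_feature_matrix(normalized, weights):
    """Multiply every second evaluation value by the weight of its feature."""
    return [[x * w for x, w in zip(row, weights)] for row in normalized]

# Example index standardization matrix (rows: base stations A, B, C).
X = [[0.4, 0.1],
     [0.5, 0.7],
     [0.2, 0.6]]
w = [0.3, 0.3]   # example weights of correlation and shape similarity

target = weighted_feature_matrix(X, w)
for name, row in zip("ABC", target):
    print(name, [f"{v:.2f}" for v in row])
```

Each row of `target` is the weighted feature vector of one base station and can be passed directly to a clustering algorithm.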
S50, clustering the target feature matrix to obtain a clustering result, so that the base station patterns can be analyzed according to the clustering result.
In this embodiment, an unsupervised clustering algorithm can be used to group samples with higher similarity into one class; that is, the different classes of base stations in the communication network can be obtained through clustering. The target feature matrix calculated above is fed into the clustering algorithm, so that the base stations are partitioned and base stations with the same characteristics fall into one class; the periodic movement characteristics of mobile users can then be studied and analyzed through the classification result of the base stations.
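As one possible instance of this clustering step, the following is a minimal K-means sketch in pure Python (K-means is the clustering named elsewhere in this document). The function name `kmeans`, the fixed random seed, the use of squared Euclidean distance and the choice K = 2 are illustrative assumptions, and the input is the example target feature matrix.

```python
import random

def kmeans(rows, k, max_iter=100, seed=0):
    """Cluster `rows` into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Pick k distinct samples as the initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:     # assignments stable: centroids stop changing
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:              # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Example target feature matrix (rows: base stations A, B, C).
target_matrix = [[0.12, 0.03], [0.15, 0.21], [0.06, 0.18]]
labels, centroids = kmeans(target_matrix, k=2)
print(labels, centroids)
```

K-means can converge to different partitions depending on the initial centroids, so in practice the clustering would typically be repeated with several seeds and values of K.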
In summary, clustering of the base stations is performed by adopting a way that the traffic pattern characteristics of the time sequence form a characteristic vector, and the complexity of calculation is reduced on the premise of ensuring the acquisition of the traffic characteristics of the base stations; calculating a feature vector of the flow pattern features, wherein each flow pattern feature reflects the change trend or stability and the like of the flow of each base station, and analyzing the base station patterns from different dimensions to improve the stability of clustering results; the weight of each flow pattern feature is calculated, the contribution value of each feature vector to the total base station flow is fully considered, the features of different base stations can be distinguished, the accuracy and the stability of a clustering result are improved, and the clustering result can better reflect the problems of the moving periodicity of a mobile user and the stability and the trend of the base station flow.
Second embodiment of the invention:
on the basis of the first embodiment, the calculating the correlation feature vector, the scale component feature vector, the entropy feature vector, and the shape similarity feature vector of each base station according to the traffic time series of each base station specifically includes:
calculating a Pearson correlation coefficient of flow time sequences of every two base stations to obtain a correlation characteristic vector of each base station; calculating a frequency variation trend coefficient of the flow time sequence of each base station to obtain a scale component feature vector of each base station; calculating the flow entropy of the flow time sequence of each base station to obtain the entropy characteristic vector of each base station; and calculating the sum of the distances between the points of the flow time sequence of each two base stations and the points to obtain the shape similarity characteristic vector of each base station.
In this embodiment, the correlation of base station traffic time series represents the spatial correlation of network resource usage in mobile communications. Here, the Pearson correlation coefficient is used to describe the correlation of any two base station traffic time series. In general, the closer two base stations are in space, the larger their Pearson correlation coefficient, and vice versa. Specifically, the formula is as follows:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
where cov(X, Y) is the covariance between the traffic time series of base station X and that of base station Y, σ_X σ_Y is the product of the standard deviations of the two traffic time series, and N represents the number of base stations.
As an example, the communication network includes 3 base stations A, B and C, and the correlation coefficient table 1 is obtained by calculating the pearson correlation coefficient between two base stations, i.e. the correlation between two base stations:
TABLE 1 correlation coefficient Table
Coefficient of correlation   A     B     C
A                            1     0.5   0.7
B                            0.5   1     0.4
C                            0.7   0.4   1
Thus, the correlation characteristic vector of the base station a is (1, 0.5, 0.7); similarly, the correlation eigenvector of base station B is (0.5, 1, 0.4), and the correlation eigenvector of base station C is (0.7, 0.4, 1).
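The construction of correlation feature vectors can be sketched as follows. The function names `pearson` and `correlation_feature_vectors` are hypothetical, and the three short traffic series are made-up toy data rather than values from the source; the sketch also assumes that all series have equal length and that none of them is constant (a constant series has zero standard deviation, which makes the coefficient undefined).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_feature_vectors(series):
    """Map each base station to its row of pairwise correlations."""
    names = list(series)
    return {a: [pearson(series[a], series[b]) for b in names] for a in names}

# Hypothetical toy traffic series for base stations A, B and C.
traffic = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
vectors = correlation_feature_vectors(traffic)
print(vectors["A"])
```

Each row of the resulting table is the correlation feature vector of one base station, and the entry for a base station with itself is always 1.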
In the embodiment, the scale component of the base station can reflect the time variation trend of the base station traffic. And the scale component needs to adopt multiple Haar wavelet transforms to carry out multi-scale decomposition on the base station flow data, and extract the scale component signal of the low-frequency component of the base station flow data, so that the frequency change trend coefficient of the time sequence of the base station flow is obtained. The main purpose of wavelet transform is to find the scaling and translation of the time series. The expansion reflects the frequency domain change characteristic of the base station flow, and the translation reflects the time domain change characteristic of the base station flow, and the formula is as follows:
[Formula: wavelet transform of the base station traffic time series, with scale component a and a translation amount]
where the scale component a controls the dilation of the wavelet function and corresponds to the frequency variation trend of the time series, and the translation amount controls the shift of the wavelet function and corresponds to the time variation trend of the time series. For example, the scale component feature vector of base station A is calculated to be 0.5 from the traffic time series of base station A using the above formula.
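The multi-scale Haar decomposition mentioned above can be illustrated with a short sketch that keeps only the low-frequency (approximation) coefficients at each level. The function name `haar_approximation`, the orthonormal 1/√2 scaling and the padding of odd-length input are assumptions made for illustration; the source does not specify how the low-frequency component is reduced to a single trend coefficient, so that step is not shown.

```python
import math

def haar_approximation(series, levels):
    """Low-frequency (approximation) coefficients after `levels` Haar steps."""
    approx = list(series)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:              # pad odd-length input with its last value
            approx.append(approx[-1])
        # One Haar step: scaled pairwise sums keep the slow-varying trend.
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx), 2)]
    return approx

# Hypothetical hourly traffic values for one base station.
traffic = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
level1 = haar_approximation(traffic, 1)
level2 = haar_approximation(traffic, 2)
print(level1)
print(level2)
```

Each additional level halves the number of coefficients and smooths the series further, which is why the low-frequency component reflects the longer-term variation of the traffic.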
In this embodiment, the traffic entropy of the base station is a stability feature used to characterize traffic variation of the base station, and if the traffic variation of a base station is more orderly, the traffic entropy of the base station is lower, and the moving period feature of the mobile user is more obvious, and vice versa. The formula is as follows:
$$H(X) = H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \ln p_i$$
where H(X) = H(p_1, p_2, ..., p_n) is the traffic entropy of base station X, and p_i is the probability of the i-th of the n possible outcomes for the traffic interval of base station X. If the traffic of base station X follows a uniform distribution, the traffic entropy reaches its maximum value H(X) = ln n. For example, the entropy feature vector of base station A is calculated to be 2 from the traffic time series of base station A using the above formula.
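A minimal sketch of the traffic entropy calculation follows. The source does not say how the traffic intervals are defined, so this sketch assumes equal-width bins between the minimum and maximum of the series; the function name `traffic_entropy` and the `bins` parameter are hypothetical.

```python
import math

def traffic_entropy(series, bins=4):
    """Shannon entropy (natural log) of the traffic values over equal-width bins."""
    lo, hi = min(series), max(series)
    if hi == lo:                      # constant traffic: perfectly ordered
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in series:
        index = min(int((value - lo) / width), bins - 1)  # clamp the maximum
        counts[index] += 1
    total = len(series)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

uniform = traffic_entropy([0.0, 1.0, 2.0, 3.0], bins=4)
constant = traffic_entropy([5.0, 5.0, 5.0, 5.0], bins=4)
print(uniform, math.log(4), constant)
```

When the values fall evenly into all bins, the entropy equals ln n with n the number of bins, which matches the maximum described in the text; perfectly regular (constant) traffic gives 0.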
In the present embodiment, the similarity of the shapes of the base stations is obtained by measuring the sum of the distances between the points of the traffic time series of the two base stations and the points, and if the shapes of the two base stations are more similar on the traffic time series, the sum of the distances is smaller. In consideration of the characteristics of the base station traffic, the embodiment uses Dynamic Time Warping (DTW) to measure the similarity of the shapes of the base station traffic data. The DTW distance is matched according to the time-warping path with the minimum cost, so that the problem of similarity measurement after time-warping of base station traffic is well solved.
For example, assume that the one-month traffic records of any 2 mobile base stations are P = <p1, p2, …, pm> and Q = <q1, q2, …, qn>, respectively; because dynamic time warping is used, the two traffic series are not required to contain the same number of data points. The similarity between the two base station traffic shapes is then expressed as:
DTW(P,Q)=f(m,n)
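$$f(i,j) = \lVert p_i - q_j \rVert + \min\{\, f(i-1,j),\ f(i,j-1),\ f(i-1,j-1) \,\}, \qquad f(0,0) = 0,\quad f(i,0) = f(0,j) = \infty$$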
where ||pi - qj|| is the two-norm of the difference between the coordinates of the two points, that is, the Euclidean distance between the two points.
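A minimal sketch of the DTW distance, assuming the standard recurrence f(i, j) = ||pi - qj|| + min(f(i-1, j), f(i, j-1), f(i-1, j-1)) with f(0, 0) = 0 and infinite cost on the remaining border. Traffic samples are treated as scalars, so the two-norm reduces to an absolute difference; the function name `dtw_distance` and the sample series are hypothetical.

```python
import math

def dtw_distance(p, q):
    """DTW(P, Q) = f(m, n), computed by dynamic programming."""
    m, n = len(p), len(q)
    # f[i][j] is the minimum cumulative cost of aligning p[:i] with q[:j].
    f = [[math.inf] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(p[i - 1] - q[j - 1])  # Euclidean distance for scalars
            f[i][j] = cost + min(f[i - 1][j], f[i][j - 1], f[i - 1][j - 1])
    return f[m][n]

# The two series may have different lengths (m = 3, n = 4 here).
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # same shape, stretched in time
print(dtw_distance([0, 0], [1, 1]))
```

The first pair differs only by a repeated sample, so the warping path absorbs the stretch and the distance is zero; a point-by-point comparison would not even be defined for series of unequal length.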
As an example, suppose the communication network includes 3 base stations A, B and C; the DTW distance between every two base stations is calculated to obtain the pairwise shape similarity, as shown in Table 2:
Table 2: Shape similarity table
DTW distance A B C
A 0 5 7
B 5 0 4
C 7 4 0
Therefore, the shape similarity feature vector of base station A is (0, 5, 7); similarly, the shape similarity feature vector of base station B is (5, 0, 4), and the shape similarity feature vector of base station C is (7, 4, 0).
By the above method, four feature vectors characterizing the traffic patterns of massive traffic data are calculated. Each traffic pattern feature reflects an aspect of each base station's traffic, such as its variation trend or stability, so that base station modes are analyzed from different dimensions and the accuracy and stability of the clustering results are improved.
Third embodiment of the invention:
On the basis of the first embodiment, the following is further specified:
The clustering is K-means clustering.
In this embodiment, the K-means algorithm is used to cluster the feature variables of the base station traffic pattern features. The specific process is as follows. Step 1: determine the number K of base station classes. Step 2: the model randomly selects K base stations as initial centroids (class centers), assigns each sample to the class of its most similar centroid, then recalculates the centroid of each class, and repeats the similarity calculation between each sample and the centroids until the centroids no longer change. Step 3: output the final centroids together with each class. In this embodiment, the K value may be obtained experimentally; generally, a specific K value is chosen after observing the clustering effect over multiple clustering runs. Different cities have different K values, and the K value of a developed city is not more than 7. As an example, assume that the communication network has 6 base stations P1, P2, P3, P4, P5 and P6 and that only two traffic pattern features of each base station are considered; the resulting target feature matrix is as follows:
P1 0 0
P2 1 2
P3 3 1
P4 8 8
P5 9 10
P6 10 7
First, assume that the K value is 2, that is, the base stations are divided into 2 classes; P1 is selected as the centroid of the first class and P2 as the centroid of the second class. The similarity between each of the other base stations and the two centroids is calculated, and the first calculation result is: first class: P1; second class: P2, P3, P4, P5, P6. Then the centroid of each class is calculated: the new centroid of the first class is (0, 0), i.e. P1, and the new centroid of the second class is (6.2, 5.6). The second calculation result is then: first class: P1, P2, P3; second class: P4, P5, P6. Repeating the calculation gives the third calculation result: first class: P1, P2, P3; second class: P4, P5, P6. The third result is identical to the second, which shows that convergence has been reached and clustering is finished; finally, each class and the centroid of each class are output.
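The worked example can be reproduced with a short sketch, assuming Euclidean distance is used as the (dis)similarity measure between a base station and a centroid. The function name `kmeans` and its return values are hypothetical; the initial centroids are fixed to P1 and P2 as in the example rather than chosen randomly.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means; returns final labels, centroids, and per-iteration labels."""
    centroids = [tuple(c) for c in initial_centroids]
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: each point joins the class of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(pt, centroids[k]))
            for pt in points
        ]
        history.append(new_labels)
        if new_labels == labels:  # assignments unchanged, so centroids are stable
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a class is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids, history

# Target feature matrix of the six base stations from the example.
stations = {"P1": (0, 0), "P2": (1, 2), "P3": (3, 1),
            "P4": (8, 8), "P5": (9, 10), "P6": (10, 7)}
points = list(stations.values())
labels, centroids, history = kmeans(points, [stations["P1"], stations["P2"]])
for step, assignment in enumerate(history, start=1):
    print(step, dict(zip(stations, assignment)))
```

Class index 0 corresponds to the first class and index 1 to the second class; the three entries of `history` correspond to the three calculation results in the text.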
By the above method, base stations can be clustered quickly and accurately using the K-means algorithm. Base station modes can be analyzed from the clustering results, a series of network optimization tasks such as base station handover and dynamic clustering can be carried out based on the analyzed base station modes, and the periodic movement characteristics of mobile users can be reflected through the base station classification results.
Referring to fig. 2, a fourth embodiment of the present invention provides a device for analyzing base station traffic, including:
a time sequence acquisition module 10, configured to acquire traffic time sequences of at least two base stations in a communication network;
a feature vector calculation module 20, configured to calculate a feature vector of at least one traffic pattern feature of each base station according to the traffic time sequence of each base station;
a weight calculation module 30, configured to calculate a weight of each flow pattern feature according to the feature vector;
a feature matrix generation module 40, configured to generate a target feature matrix of a base station mode of the communication network according to the feature vector and the weight;
and the matrix clustering module 50 is configured to cluster the target feature matrix to obtain a clustering result, so that a base station mode can be analyzed according to the clustering result.
Preferably, the traffic pattern features include correlation, scale component, entropy and shape similarity; the feature vector calculation module 20 specifically includes:
and the vector calculation unit is used for calculating the correlation characteristic vector, the scale component characteristic vector, the entropy characteristic vector and the shape similarity characteristic vector of each base station according to the flow time sequence of each base station.
Further, the vector calculation unit specifically is:
the first vector calculation unit is used for calculating the Pearson correlation coefficient of the flow time sequence of every two base stations to obtain the correlation characteristic vector of each base station;
the second vector calculation unit is used for calculating a frequency change trend coefficient of the flow time sequence of each base station to obtain a scale component feature vector of each base station;
a third vector calculation unit, configured to calculate a traffic entropy of the traffic time series of each base station, so as to obtain an entropy feature vector of each base station;
and the fourth vector calculation unit is used for calculating the sum of point-to-point distances between the flow time sequences of every two base stations to obtain the shape similarity characteristic vector of each base station.
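As an illustration of the first vector calculation unit, the following sketch computes pairwise Pearson correlation coefficients and, by analogy with the shape similarity table, takes each base station's row of coefficients as its correlation feature vector; that row-wise interpretation is an assumption. The function names and the sample series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Note: undefined (division by zero) if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_feature_vectors(series_by_station):
    """Row of pairwise coefficients for each station (assumed layout)."""
    names = list(series_by_station)
    return {
        a: [pearson(series_by_station[a], series_by_station[b]) for b in names]
        for a in names
    }

# Hypothetical traffic series for three base stations.
traffic = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
vectors = correlation_feature_vectors(traffic)
print(vectors["A"])
```

Here A and B rise together (coefficient close to 1) while A and C move in opposite directions (coefficient close to -1).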
Preferably, the weight calculating module 30 is specifically:
a first evaluation value calculation unit configured to calculate a first evaluation value of each of the feature vectors of each of the base stations based on the feature vectors;
an evaluation matrix generation unit configured to generate an index evaluation matrix according to the first evaluation value;
and the weight calculation unit is used for calculating the weight of each flow pattern characteristic according to the index evaluation matrix.
Further, the weight calculating unit specifically includes:
the second evaluation value calculation unit is used for carrying out standardization processing on the index evaluation matrix to generate an index standardization matrix; the index standardization matrix comprises a second evaluation value which corresponds to each feature vector and is subjected to standardization processing;
the entropy calculation unit is used for calculating the entropy and the entropy redundancy of each flow mode characteristic according to the index standardization matrix;
and the entropy weight calculation unit is used for calculating the weight of each flow pattern characteristic according to the entropy value and the entropy redundancy.
Further, the feature matrix generating module 40 is specifically:
and the target characteristic matrix generating unit is used for generating a target characteristic matrix of a base station mode of the communication network according to the index standardization matrix and the weight of each traffic mode characteristic.
Preferably, the clustering is K-means clustering.
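The weight calculation and feature matrix generation units can be sketched with the commonly used entropy weight method. The specific formulas below are assumptions: each column of the index evaluation matrix is standardized to proportions p_ij = x_ij / sum_i x_ij, the entropy value is e_j = -(1 / ln m) * sum_i p_ij ln p_ij, the entropy redundancy is d_j = 1 - e_j, and the weight is w_j = d_j / sum_j d_j. The function names and the sample matrix are hypothetical.

```python
import math

def entropy_weights(evaluation_matrix):
    """Return (index standardization matrix, weights) for m stations x n features.

    Assumptions: m >= 2, all values non-negative, every column sum positive,
    and at least one column is not perfectly uniform.
    """
    m, n = len(evaluation_matrix), len(evaluation_matrix[0])
    col_sums = [sum(row[j] for row in evaluation_matrix) for j in range(n)]
    standardized = [[row[j] / col_sums[j] for j in range(n)]
                    for row in evaluation_matrix]
    k = 1.0 / math.log(m)
    # Entropy value of each feature; zero proportions contribute nothing.
    entropy = [
        -k * sum(row[j] * math.log(row[j]) for row in standardized if row[j] > 0)
        for j in range(n)
    ]
    redundancy = [1.0 - e for e in entropy]  # entropy redundancy
    total = sum(redundancy)
    weights = [d / total for d in redundancy]
    return standardized, weights

def target_feature_matrix(standardized, weights):
    """Scale each standardized column by the weight of its feature."""
    return [[w * v for v, w in zip(row, weights)] for row in standardized]

# Hypothetical first evaluation values: 2 base stations x 2 features.
standardized, weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
target = target_feature_matrix(standardized, weights)
print(weights, target)
```

In this sketch the first feature takes the same value for both base stations, so it carries no discriminating information and receives a weight of (approximately) zero, while the second feature receives essentially all of the weight.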
A fifth embodiment of the present invention provides a device for analyzing base station traffic. The base station traffic analysis apparatus of this embodiment includes: a processor, a display, a memory, and a computer program stored in the memory and executable on the processor, such as a program for base station traffic analysis. The processor, when executing the computer program, implements the steps in the embodiments of the method for analyzing traffic of the base station, such as step S10 shown in fig. 1. Alternatively, the processor, when executing the computer program, implements the functions of the units in the above-mentioned device embodiments, for example, the time-series acquisition module 10 shown in fig. 2.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the device for analyzing the traffic of the base station.
The base station traffic analysis device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The base station traffic analysis device may include, but is not limited to, a processor, a memory, and a display. Those skilled in the art will understand that the above components are merely an example of the base station traffic analysis device and do not limit it; the device may include more or fewer components than those listed, combine certain components, or use different components. For example, the base station traffic analysis device may further include input and output devices, a network access device, a bus, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the base station traffic analysis device and connects the various parts of the entire device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the base station traffic analysis device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, a text conversion function, etc.), and the like; the data storage area may store data (such as audio data, text message data, etc.) created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
If the integrated modules of the base station traffic analysis device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (9)

1. A method for analyzing base station traffic is characterized by comprising the following steps:
collecting flow time sequences of at least two base stations in a communication network;
calculating a feature vector of at least one traffic pattern feature of each base station according to the traffic time sequence of each base station;
calculating the weight of each flow mode feature according to the feature vector;
generating a target feature matrix of a base station mode of the communication network according to the feature vector and the weight;
clustering the target characteristic matrix to obtain a clustering result so as to analyze a base station mode according to the clustering result;
wherein, the calculating the weight of each flow pattern feature according to the feature vector specifically comprises:
calculating a first evaluation value of each feature vector of each base station according to the feature vectors;
generating an index evaluation matrix according to the first evaluation value;
and calculating the weight of each flow pattern characteristic according to the index evaluation matrix.
2. The base station traffic analysis method according to claim 1, wherein the traffic pattern features include correlation, scale component, entropy, and shape similarity; the calculating, according to the traffic time series of each base station, the feature vector of at least one traffic pattern feature of each base station specifically includes:
and calculating the correlation characteristic vector, the scale component characteristic vector, the entropy characteristic vector and the shape similarity characteristic vector of each base station according to the flow time sequence of each base station.
3. The base station traffic analysis method according to claim 2, wherein the calculating the correlation feature vector, the scale component feature vector, the entropy feature vector, and the shape similarity feature vector of each base station according to the traffic time series of each base station specifically includes:
calculating a Pearson correlation coefficient of flow time sequences of every two base stations to obtain a correlation characteristic vector of each base station;
calculating a frequency variation trend coefficient of the flow time sequence of each base station to obtain a scale component feature vector of each base station;
calculating the flow entropy of the flow time sequence of each base station to obtain the entropy characteristic vector of each base station;
and calculating the sum of the distances between the points of the flow time sequence of each two base stations and the points to obtain the shape similarity characteristic vector of each base station.
4. The method for analyzing base station traffic according to claim 1, wherein the calculating the weight of each traffic pattern feature according to the index evaluation matrix specifically includes:
standardizing the index evaluation matrix to generate an index standardized matrix; the index standardization matrix comprises a second evaluation value which corresponds to each feature vector and is subjected to standardization processing;
calculating the entropy value and the entropy redundancy of each flow mode characteristic according to the index standardization matrix;
and calculating the weight of each flow pattern characteristic according to the entropy value and the entropy redundancy.
5. The base station traffic analysis method according to claim 4, wherein the generating of the target feature matrix of the base station mode of the communication network according to the feature vector and the weight specifically includes:
and generating a target characteristic matrix of a base station mode of the communication network according to the index standardization matrix and the weight of each traffic mode characteristic.
6. The base station traffic analysis method according to claim 1, wherein the clustering is K-means clustering.
7. An apparatus for analyzing traffic of a base station, comprising:
the time sequence acquisition module is used for acquiring flow time sequences of at least two base stations in a communication network;
the characteristic vector calculation module is used for calculating a characteristic vector of at least one flow mode characteristic of each base station according to the flow time sequence of each base station;
the weight calculation module is used for calculating the weight of each flow mode characteristic according to the characteristic vector;
the characteristic matrix generating module is used for generating a target characteristic matrix of a base station mode of the communication network according to the characteristic vector and the weight;
the matrix clustering module is used for clustering the target characteristic matrix to obtain a clustering result so as to analyze a base station mode according to the clustering result;
the weight calculation module specifically comprises:
a first evaluation value calculation unit configured to calculate a first evaluation value of each of the feature vectors of each of the base stations based on the feature vectors;
an evaluation matrix generation unit configured to generate an index evaluation matrix according to the first evaluation value;
and the weight calculation unit is used for calculating the weight of each flow pattern characteristic according to the index evaluation matrix.
8. A base station traffic analysis device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the base station traffic analysis method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program runs, the computer-readable storage medium controls an apparatus to execute the base station traffic analysis method according to any one of claims 1 to 6.
CN201810396528.3A 2018-04-27 2018-04-27 Base station flow analysis method, device, equipment and storage medium Active CN108770002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810396528.3A CN108770002B (en) 2018-04-27 2018-04-27 Base station flow analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810396528.3A CN108770002B (en) 2018-04-27 2018-04-27 Base station flow analysis method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108770002A CN108770002A (en) 2018-11-06
CN108770002B true CN108770002B (en) 2021-08-10

Family

ID=64012225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810396528.3A Active CN108770002B (en) 2018-04-27 2018-04-27 Base station flow analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108770002B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110139299B (en) * 2019-05-14 2022-04-01 鹰潭泰尔物联网研究中心 Clustering analysis method for base station flow in cellular network
CN110650058B (en) * 2019-10-08 2022-03-04 河南省云安大数据安全防护产业技术研究院有限公司 Network traffic analysis method, device, storage medium and equipment
CN112235152B (en) * 2020-09-04 2022-05-10 北京邮电大学 Flow size estimation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572881A (en) * 2011-12-29 2012-07-11 华为技术有限公司 Method and device for analyzing and displaying data traffic
CN102695186A (en) * 2012-05-30 2012-09-26 华为技术有限公司 Method and device for regional flow analysis
CN103747477A (en) * 2014-01-15 2014-04-23 广州杰赛科技股份有限公司 Network flow analysis and prediction method and device
WO2015018445A1 (en) * 2013-08-08 2015-02-12 Telecom Italia S.P.A. Management of data collected for traffic analysis
CN106060849A (en) * 2016-05-26 2016-10-26 重庆大学 Network type optimization allocation method in heterogeneous network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2613189A (en) * 2021-11-26 2023-05-31 British Telecomm Wireless telecommunications network
WO2023094059A1 (en) * 2021-11-26 2023-06-01 British Telecommunications Public Limited Company Clustering of access points in a wireless telecommunications network and configuration of an access point in a cluster
GB2613189B (en) * 2021-11-26 2023-11-22 British Telecomm Wireless telecommunications network

Also Published As

Publication number Publication date
CN108770002A (en) 2018-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant