CN114745731A - Data analysis method, device, equipment and storage medium - Google Patents

Data analysis method, device, equipment and storage medium

Info

Publication number
CN114745731A
Authority
CN
China
Prior art keywords
characteristic information
kqi
wireless side
sequencing
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011543493.5A
Other languages
Chinese (zh)
Inventor
高爱丽
刘阳
吕万
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Beijing Co Ltd
Priority to CN202011543493.5A
Publication of CN114745731A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application discloses a data analysis method, apparatus, device, and storage medium. The method comprises: acquiring a key quality indicator (KQI) of a target service and wireless side characteristic information of a target cell; calculating, with a first algorithm, the correlation between the wireless side characteristic information and the quality difference (poor-quality) characteristic information of the KQI, and ranking the results to obtain a first ranking of the wireless side characteristic information; calculating, with a second algorithm and a third algorithm respectively, the importance of the wireless side characteristic information to the quality difference characteristic information of the KQI, and ranking the results to obtain a second ranking and a third ranking of the wireless side characteristic information; and combining the first, second, and third rankings to obtain the wireless side characteristic information that is strongly related to the KQI quality difference. According to the embodiments of the application, the main wireless side cause of a poor KQI can be identified more accurately and efficiently.

Description

Data analysis method, device, equipment and storage medium
Technical Field
The present application belongs to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer storage medium for data analysis.
Background
Generally, in a wireless communication network, the user experience of a service running in an application is often represented by a Key Quality Indicator (KQI). When the KQI of some services in a communication network deteriorates, a telecommunication operator wants to quickly and accurately locate the wireless side cause affecting the KQI so that the network can be optimized accordingly.
In the related art, the main wireless side factors influencing the KQI are generally determined by analysis based on long-accumulated wireless network optimization experience. This way of selecting and analyzing the main wireless side factors has limitations, and the accuracy of the analysis result is not high.
Disclosure of Invention
The embodiment of the application provides a data analysis method, a data analysis device, data analysis equipment and a computer storage medium, which can analyze wireless side characteristic information influencing KQI by combining various algorithms, so that a wireless side main cause when the quality of a KQI index is poor can be identified more accurately and efficiently.
In a first aspect, an embodiment of the present application provides a method for data analysis, where the method includes:
acquiring a key quality index KQI of a target service and wireless side characteristic information of a target cell;
calculating the correlation degree of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a first algorithm, and sequencing to obtain a first sequence of the wireless side characteristic information;
respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively sequencing to obtain a second sequence and a third sequence of the wireless side characteristic information;
and combining and sequencing the first sequencing, the second sequencing and the third sequencing to obtain the wireless side characteristic information which is strongly related to the KQI quality difference.
Optionally, the calculating, by using a first algorithm, a correlation between the wireless-side feature information and the quality difference feature information of the KQI, and performing sequencing to obtain a first sequence of the wireless-side feature information includes:
calculating the correlation degree of the wireless side characteristic information and the quality difference characteristic information of the KQI according to the type of the target cell;
and when the relevancy meets the preset characteristic importance condition, sequencing the wireless side characteristic information corresponding to the relevancy to obtain a first sequence.
Optionally, the calculating, by using a second algorithm, the importance of the wireless side feature information and the quality difference feature information of the KQI, and performing sequencing to obtain a second sequence of the wireless side feature information includes:
calculating a first importance degree of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI;
and sequencing the wireless side characteristic information according to the first importance to obtain a second sequence.
Optionally, the calculating, by using a third algorithm, the importance of the wireless side feature information and the quality difference feature information of the KQI, and performing sequencing to obtain a third sequence of the wireless side feature information includes:
calculating the evidence weight of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI;
calculating a second importance degree according to the evidence weight;
and when the second importance degree meets the preset threshold condition, sequencing the wireless side characteristic information according to the second importance degree to obtain a third sequence.
Optionally, the step of combining and sorting the first sorting, the second sorting and the third sorting to obtain the wireless side feature information strongly related to the quality difference of the KQI includes:
respectively converting the ranking corresponding to the first ranking, the second ranking and the third ranking into scores;
respectively calculating the average value of the scores corresponding to the first sequence, the second sequence and the third sequence;
according to the preset weight value of each average value, combining and sorting the first sorting, the second sorting and the third sorting to obtain a combined sorting result;
and according to the sequencing result, obtaining the wireless side characteristic information which is strongly related to the KQI quality difference.
Optionally, the obtaining the key quality indicator KQI of the target service and the radio side feature information of the target cell includes:
and preprocessing the KQI and the wireless side characteristic information by using a preset service logic algorithm to obtain the screened KQI and the screened wireless side characteristic information.
Optionally, the second algorithm comprises a CatBoost algorithm and the third algorithm comprises an optimal iterative binning algorithm.
In a second aspect, an embodiment of the present application provides an apparatus for data analysis, where the apparatus includes:
the acquisition module is used for acquiring a key quality index KQI of the target service and the wireless side characteristic information of the target cell;
the first calculation module is used for calculating the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a first algorithm and obtaining a first sequence of the wireless side characteristic information by sequencing;
the second calculation module is used for respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively sequencing to obtain a second sequence and a third sequence of the wireless side characteristic information;
and the sequencing module is used for carrying out combined sequencing on the first sequencing, the second sequencing and the third sequencing to obtain the wireless side characteristic information which is strongly related to the KQI quality difference.
In a third aspect, an embodiment of the present application provides an apparatus for data analysis, where the apparatus includes:
a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the method of data analysis as described in the first aspect and optional aspects of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of data analysis as described in the first aspect and optional embodiments of the first aspect.
The data analysis method, device, equipment and computer storage medium in the embodiments of the application can calculate the correlation between the acquired wireless side characteristic information and the quality difference characteristic information of the KQI by using a first algorithm, and determine a first sequence. And respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively determining a second sequence and a third sequence. Then, the three sorts are combined to determine the wireless side characteristic information which is strongly related to the KQI quality difference. Therefore, the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI is analyzed based on various different algorithms, the wireless side characteristic information which possibly causes the quality difference of the KQI can be predicted more accurately and efficiently, and the accuracy and the efficiency of the wireless side main cause when the quality difference of the KQI index is determined are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a method of data analysis provided by some embodiments of the present application;
FIG. 2 is a schematic flow chart diagram of a method of data analysis provided in further embodiments of the present application;
FIG. 3 is a schematic diagram of an apparatus for data analysis according to further embodiments of the present application;
FIG. 4 is a hardware structure diagram of a data analysis device provided in some embodiments of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In wireless communication networks, KQIs typically include a weak signal to poor quality ratio, a video stalling rate, a game stalling rate, a web page opening delay, and the like. Factors on the wireless side that affect the KQI include, for example, faults, interference, physical resource block utilization, the number of users, Measurement Report (MR) coverage, the Channel Quality Indicator (CQI) ratio, and the like.
Existing selection of the main wireless side factors lists the wireless side factors influencing the KQI according to long-accumulated wireless network optimization experience. Because manual selection is not comprehensive, features may be wrongly selected, under-selected, or missed altogether, and important features are easily omitted when the analyst lacks experience. As a result, the wireless side cause of a poor KQI cannot be accurately identified.
In order to solve the prior art problems, embodiments of the present application provide a method, an apparatus, a device, and a computer storage medium for data analysis, which can analyze an association relationship between wireless side feature information and quality difference feature information of a KQI based on multiple different algorithms, can more accurately and efficiently predict wireless side feature information that may cause the quality difference of the KQI, and improve accuracy and efficiency of a wireless side main cause when determining the quality difference of a KQI index.
The following describes a method, an apparatus, a device and a computer storage medium for data analysis according to embodiments of the present application with reference to the accompanying drawings. It should be noted that these examples are not intended to limit the scope of the present disclosure.
First, a method for data analysis provided in the embodiments of the present application will be described.
Fig. 1 is a schematic flow chart of a method for data analysis according to an embodiment of the present application. As shown in fig. 1, in the application embodiment, the method for data analysis may include the following steps:
s101: and acquiring the KQI of the target service and the wireless side characteristic information of the target cell.
Here, a plurality of KQIs of the target service are acquired. The wireless side characteristic information may include a Key Performance Indicator (KPI) index, an MR statistical index, and communication data acquired within a predetermined time period of the target cell at the wireless side of the communication equipment manufacturer.
For example, for the selected data volume, data for 120,000 cells across the entire network may be collected over a week, with one record collected per cell per hour. The total amount of data is 120000 × 10 × 24 = 28800000.
S102: and calculating the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a first algorithm, and sequencing to obtain a first sequence of the wireless side characteristic information.
In some embodiments of the present application, the first algorithm may comprise a correlation algorithm. And calculating the correlation degree between the wireless side characteristic information and the quality difference characteristic information of the KQI through a correlation algorithm, and determining the wireless side characteristic information related to the quality difference of the KQI.
The quality difference characteristic information of the KQI is a label with two values, quality difference (poor quality) and non-quality difference; a KQI is labeled as quality difference when its poor-quality proportion exceeds 5%.
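The labeling rule can be sketched as follows. This is a minimal illustration that assumes each record carries a poor-quality proportion for the KQI; the function name and the sample values are hypothetical, and only the 5% threshold comes from the text.

```python
POOR_QUALITY_THRESHOLD = 0.05  # "more than 5%" from the description

def label_quality_difference(poor_ratio: float) -> int:
    """Return 1 (quality difference) if the poor-quality proportion exceeds 5%, else 0."""
    return 1 if poor_ratio > POOR_QUALITY_THRESHOLD else 0

# Hypothetical poor-quality proportions for three hourly cell records.
labels = [label_quality_difference(r) for r in [0.01, 0.05, 0.08]]
print(labels)
```

A record exactly at 5% is treated as non-quality difference here, because the text says "more than 5%".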
The first ordering may be an ordered list that orders the degrees of correlation from high to low.
S103: and respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively sequencing to obtain a second sequence and a third sequence of the wireless side characteristic information.
In some embodiments of the present application, the second algorithm may include a supervised model algorithm such as a Catboost algorithm, and the third algorithm may include a supervised model algorithm such as an optimal iterative binning algorithm.
The second rank and the third rank may be rank lists obtained by ranking the importance degrees from high to low, respectively.
S104: and combining and sequencing the first sequencing, the second sequencing and the third sequencing to obtain the wireless side characteristic information which is strongly related to the KQI quality difference.
Here, the first rank, the second rank, and the third rank may be ranked lists including quality difference feature information of the KQI and corresponding wireless-side feature information.
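The combination in S104 can be sketched as below. The description only states that ranks are converted to scores and combined with preset weights, so the scoring rule (top rank gets the highest score), the weighted sum, the weights, and the feature names are all assumptions made for illustration.

```python
def ranks_to_scores(ranking):
    """Map an ordered feature list (best first) to scores: first gets n, last gets 1."""
    n = len(ranking)
    return {feature: n - position for position, feature in enumerate(ranking)}

def combine_rankings(rankings, weights):
    """Weighted sum of per-ranking scores; a feature absent from a ranking adds 0."""
    combined = {}
    for ranking, weight in zip(rankings, weights):
        for feature, score in ranks_to_scores(ranking).items():
            combined[feature] = combined.get(feature, 0.0) + weight * score
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined, key=lambda f: (-combined[f], f))

# Hypothetical rankings from the first, second, and third algorithms.
first = ["prb_util", "interference", "cqi", "users"]
second = ["interference", "prb_util", "users", "cqi"]
third = ["interference", "cqi", "prb_util", "users"]
final = combine_rankings([first, second, third], weights=(0.4, 0.3, 0.3))
print(final)
```

Features at the head of the combined list are the candidates for wireless side characteristic information strongly related to the KQI quality difference.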
In summary, the data analysis method according to the embodiment of the present application can calculate the correlation between the acquired wireless-side characteristic information and the quality difference characteristic information of the KQI by using the first algorithm, and determine the first rank. And respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively determining a second sequence and a third sequence. Then, the three sorts are combined to determine the wireless side characteristic information which is strongly related to the KQI quality difference. Therefore, the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI is analyzed based on various different algorithms, the wireless side characteristic information which possibly causes the quality difference of the KQI can be predicted more accurately and efficiently, and the accuracy and the efficiency of the wireless side main cause when the quality difference of the KQI index is determined are improved.
In order to better explain the method for data analysis in the embodiments of the present application, the following describes in detail the implementation of the method. In some embodiments of the present application, as shown in fig. 2, fig. 2 is a schematic flow chart of a method for data analysis provided in another embodiment of the present application. The method can be embodied as the following steps:
s201: and acquiring the KQI of the target service and the wireless side characteristic information of the target cell.
In some embodiments of the present application, various raw data collected may need to be preprocessed. Preprocessing may include data cleansing, culling and replacement of outliers, and interpolation of missing values.
In some embodiments of the present application, data cleansing may include removing or replacing "dirty data" in the collected raw data. Illustratively, numerical fields containing "#N/A", "#VALUE!", "NIL", "/0", and the like may be set to null. Where ratio-type indicators are expressed on inconsistent scales, they can be normalized to a common range of 0 to 1 for ease of calculation.
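A minimal sketch of this cleansing step is shown below. The dirty tokens come from the text; min-max scaling is used as one possible way to bring an indicator into the range 0 to 1, which is an assumption since the description does not name the normalization method. Function names are hypothetical.

```python
DIRTY_TOKENS = {"#N/A", "#VALUE!", "NIL", "/0"}

def clean_value(raw):
    """Return a float, or None when the raw field is dirty, empty, or not numeric."""
    text = str(raw).strip()
    if text == "" or text in DIRTY_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None

def min_max_normalize(values):
    """Scale non-null values to [0, 1]; None entries are passed through unchanged."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    low, high = min(present), max(present)
    span = high - low
    return [None if v is None else (0.0 if span == 0 else (v - low) / span)
            for v in values]

raw_column = ["10", "#N/A", "30", "NIL", "20"]
cleaned = [clean_value(v) for v in raw_column]
normalized = min_max_normalize(cleaned)
print(normalized)
```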
In some embodiments of the present application, abnormal values, also known as outliers, are individual values in a sample that deviate significantly from the other observations of the sample to which they belong. Abnormal values can be handled by combining physical discrimination with model-based identification. Illustratively, exploratory analysis is used to check whether any data fall clearly outside the valid value range, for example a call completion rate of -3.5%, and such values are removed and replaced on the basis of physical judgment. Abnormal values can also be removed and replaced by using the mean plus or minus three standard deviations rule, the box plot method, or a skewness-based algorithm.
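The three detection approaches named above can be sketched as follows. This only flags candidate values and leaves the replacement strategy open; the valid range of 0 to 100 for a percentage indicator, the box plot factor of 1.5, and the sample data are assumptions for illustration.

```python
import statistics

def physically_invalid(rates, low=0.0, high=100.0):
    """Physical discrimination: a percentage rate outside [low, high] is invalid."""
    return [r for r in rates if not low <= r <= high]

def three_sigma_outliers(values):
    """Flag values more than three standard deviations away from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > 3 * std]

def boxplot_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical call completion rates (%), including the -3.5% example from the text.
rates = [97.0 + 0.1 * i for i in range(20)] + [-3.5]
print(physically_invalid(rates), three_sigma_outliers(rates), boxplot_outliers(rates))
```

The three-sigma rule needs a reasonably large sample to flag anything, since a single extreme value also inflates the standard deviation it is compared against.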
In some embodiments of the present application, missing values may be handled by mean imputation, hot deck imputation, regression imputation, or multiple imputation. In addition, data with a high missing rate may be deleted, depending on the missing-value analysis and actual requirements.
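As an illustration, mean imputation and a missing-rate check might look like the sketch below. It covers only the simplest of the listed methods; the function names and the idea of comparing the missing rate against a caller-chosen limit are assumptions.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def mean_impute(values):
    """Replace each missing entry with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to impute from
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

column = [1.0, None, 3.0]
print(missing_rate(column), mean_impute(column))
```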
It is to be understood that the manner of preprocessing of the corresponding data may not be limited herein. In practical application, a suitable data preprocessing mode can be selected according to specific requirements.
S202: and preprocessing the KQI and the wireless side characteristic information by using a preset service logic algorithm to obtain the screened KQI and the screened wireless side characteristic information.
In some embodiments of the present application, the key quality indicator KQI and the wireless side feature information may also be pre-screened by using a preset service logic algorithm, so as to obtain the screened key quality indicator KQI and the wireless side feature information.
Here, the preset business logic algorithm may be a business logic analysis algorithm according to expert experience. And searching the wireless side characteristic information which possibly has influence on the KQI according to the service logic.
Illustratively, 20 pieces of wireless side characteristic information are analyzed through business logic according to expert experience, wherein the target business KQI is a micro-business. The capacity class has 6 characteristics, the quality class has 5 characteristics, the time delay class has 4 characteristics, and the coverage class has 5 characteristics.
S203: and calculating the correlation degree of the wireless side characteristic information and the quality difference characteristic information of the KQI according to the type of the target cell.
S204: and when the correlation degree meets a preset feature importance condition, sorting the wireless side feature information to obtain a first sort.
Here, the preset feature importance condition may be a characteristic value of feature importance determined according to the type of the target cell and the corresponding degree of correlation of the target cell. And sequencing the wireless side feature information according to a preset feature importance condition.
In some embodiments of the present application, the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI, and the autocorrelation among the items of wireless side characteristic information, are calculated using the Pearson correlation coefficient. The Pearson correlation coefficient is calculated by formula (1) as follows:
$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \quad (1) $$
where r represents the Pearson correlation coefficient, X and Y represent the wireless side characteristic information and the quality difference characteristic information of the KQI respectively, and $\bar{X}$ and $\bar{Y}$ are the average value of the wireless side characteristic information and the average value of the quality difference characteristic information of the KQI.
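Formula (1) can be computed directly, as in the sketch below. The function name and the sample values are hypothetical; returning None for a constant series is a design choice made here so that undefined correlations can later be treated as null.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

# Hypothetical feature values X and 0/1 quality difference labels Y for one cell.
x = [1, 2, 3, 4]
y = [0, 0, 1, 1]
r = pearson_r(x, y)
print(r)
```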
In some embodiments of the present application, the wireless side characteristic information X that is highly correlated with the quality difference characteristic information Y of the KQI may differ from cell to cell, so if the correlation is calculated directly over all cells, the correlations of the same X in different cells may cancel each other out. To improve the accuracy of the calculation and to facilitate ranking the features by correlation, the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI is calculated and ranked as follows:
first, group the data by target cell and, for each cell, calculate the correlation between each item of wireless side characteristic information and the quality difference characteristic information of the KQI;
next, calculate the proportion of cells whose absolute correlation is greater than 0.3 as an "importance" measure between the wireless side characteristic information and the quality difference characteristic information of the KQI, using formula (2) as follows:
IMC(i)=count(ABS(r)>0.3)/count(NOTNULL(r)) (2)
where IMC(i) represents the proportion of cells in which the absolute value of the correlation of the i-th feature is greater than 0.3, count represents counting, ABS represents the absolute value, r represents the correlation coefficient, and NOTNULL represents non-null.
The wireless side characteristic information is then ranked according to IMC. In addition, before ranking, features whose IMC value is less than 0.01 can be eliminated as a conservative filter.
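The per-cell procedure and formula (2) can be sketched as follows, assuming the per-cell correlation coefficients have already been calculated and that an undefined coefficient is stored as None. The feature names and coefficient values are hypothetical.

```python
def imc(correlations, threshold=0.3):
    """Formula (2): share of non-null per-cell correlations with |r| > threshold."""
    valid = [r for r in correlations if r is not None]
    if not valid:
        return None
    strong = sum(1 for r in valid if abs(r) > threshold)
    return strong / len(valid)

def rank_by_imc(per_cell_r, min_imc=0.01):
    """Drop features with IMC below min_imc, then rank the rest from high to low."""
    scores = {feature: imc(rs) for feature, rs in per_cell_r.items()}
    kept = {f: s for f, s in scores.items() if s is not None and s >= min_imc}
    return sorted(kept, key=lambda f: (-kept[f], f))

# Hypothetical per-cell correlation coefficients for three features.
per_cell_r = {
    "prb_utilization": [0.62, -0.41, 0.05, None],
    "user_count": [0.10, 0.02, None, None],
    "cqi_ratio": [0.35, 0.28, -0.33, 0.12],
}
first_ranking = rank_by_imc(per_cell_r)
print(first_ranking)
```

Counting cells instead of pooling all samples keeps a positive correlation in one cell from cancelling a negative one in another, which is the concern raised in the text.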
In some embodiments of the present application, multicollinearity means that model estimates are distorted or difficult to estimate accurately because the explanatory variables (X) are highly correlated with one another, and it should be eliminated. Therefore, to reduce the adverse effect of high correlation on the prediction accuracy of the algorithm, the autocorrelation among the items of wireless side characteristic information and among the quality difference characteristic information of the KQI is calculated to form a correlation coefficient matrix, strongly correlated features are marked, and the strongly correlated features are removed.
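One way to act on the correlation coefficient matrix is a greedy filter, sketched below. The description does not say which feature of a strongly correlated pair is removed or what counts as "strong", so the keep-the-higher-ranked rule, the 0.8 threshold, and the matrix values are assumptions.

```python
def drop_strongly_correlated(ordered_features, corr, threshold=0.8):
    """Keep a feature only if it is not strongly correlated with one already kept.

    ordered_features: feature names, most important first.
    corr: nested dict, corr[a][b] is the correlation coefficient between a and b.
    """
    kept = []
    for feature in ordered_features:
        if all(abs(corr[feature][other]) <= threshold for other in kept):
            kept.append(feature)
    return kept

# Hypothetical correlation coefficient matrix for three features.
corr = {
    "prb_utilization": {"prb_utilization": 1.0, "user_count": 0.93, "cqi_ratio": -0.20},
    "user_count": {"prb_utilization": 0.93, "user_count": 1.0, "cqi_ratio": -0.15},
    "cqi_ratio": {"prb_utilization": -0.20, "user_count": -0.15, "cqi_ratio": 1.0},
}
remaining = drop_strongly_correlated(["prb_utilization", "user_count", "cqi_ratio"], corr)
print(remaining)
```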
S205: and calculating the first importance of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI.
S206: and sequencing the wireless side characteristic information according to the first importance to obtain a second sequence.
In some embodiments of the present application, the performing step of calculating the importance of the wireless-side characteristic information and the quality difference characteristic information of the KQI by using the second algorithm and obtaining the second ranking of the wireless-side characteristic information by ranking may include S205 to S206.
In some embodiments of the present application, the second algorithm may be the CatBoost algorithm. The CatBoost algorithm is used to calculate the first importance of the wireless side characteristic information with respect to the quality difference characteristic information of the KQI, and the wireless side characteristic information is sorted by this importance to determine the second sequence.
S207: and calculating the evidence weight of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI.
S208: and calculating a second importance degree according to the evidence weight.
S209: and when the second importance degree meets the preset threshold condition, sequencing the wireless side characteristic information according to the second importance degree to obtain a third sequence.
In some embodiments of the present application, the performing step of calculating the importance of the wireless-side characteristic information and the quality difference characteristic information of the KQI by using the third algorithm and obtaining the third ranking of the wireless-side characteristic information by ranking may include S207 to S209.
In some embodiments of the present application, the third algorithm may be an optimal iterative binning algorithm. The realization of the optimal iterative binning algorithm comprises the following processes:
First, binning is initialized. The initial maximum number of bins (max_bins) is obtained from the minimum sample proportion of each class (min_per), using formula (3):
max_bins = trunc(1/min_per) (3)
where trunc denotes truncation to an integer.
Second, the initial split points are determined. The initial split points of each feature are calculated as quantiles, with the integer sequence from 0 to max_bins divided by max_bins as the quantile levels, using formula (4):
cutp = quantile[df, (0:max_bins)/max_bins] (4)
where quantile is the quantile function, df is the feature, and 0:max_bins denotes the sequence of integers from 0 to max_bins.
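The two initialization steps can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the application: the function names are hypothetical, min_per is assumed to be a proportion in (0, 1], the quantile is assumed to use linear interpolation, and dropping duplicate cut points is an added assumption for heavily tied data.

```python
import math

def max_bins_from_min_per(min_per):
    """Formula (3): max_bins = trunc(1 / min_per)."""
    if not 0 < min_per <= 1:
        raise ValueError("min_per is assumed to be a proportion in (0, 1]")
    return math.trunc(1 / min_per)

def quantile(values, q):
    """Quantile of a list for 0 <= q <= 1, using linear interpolation (assumed definition)."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def initial_cut_points(values, min_per):
    """Formula (4): cut points at the quantile levels k / max_bins, k = 0..max_bins."""
    max_bins = max_bins_from_min_per(min_per)
    cuts = [quantile(values, k / max_bins) for k in range(max_bins + 1)]
    # Assumption: identical cut points would produce empty bins, so keep each value once.
    return sorted(set(cuts))
```

For example, with min_per = 0.25 the sketch yields four initial bins whose boundaries are the minimum, the three quartiles, and the maximum of the feature.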
Third, sparse target classes are merged. If the column percentage of one target class in a bin is below a limit, for example below 0.1%, that bin is merged with other bins; for numerical variables, adjacent predictor classes are merged.
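A minimal Python sketch of the sparse-class merge for a numerical variable is given below. It assumes each bin is represented by a hypothetical pair (quality difference count, non-quality-difference count), that the column percentage is the bin count divided by the class total, and that a sparse bin is merged into its left neighbor (or its right neighbor if it is the first bin). The text does not specify which neighbor is chosen.

```python
def merge_sparse_bins(bins, limit=0.001):
    """Merge bins whose share of either target class is below `limit` into an adjacent bin.

    bins: ordered list of (bad_count, good_count) pairs for a numerical variable.
    """
    bins = [tuple(b) for b in bins]
    total_bad = sum(bad for bad, _ in bins)
    total_good = sum(good for _, good in bins)
    if total_bad == 0 or total_good == 0:
        raise ValueError("both target classes must be present")
    changed = True
    while changed and len(bins) > 1:
        changed = False
        for i, (bad, good) in enumerate(bins):
            if bad / total_bad < limit or good / total_good < limit:
                # Assumption: prefer the left neighbor; the first bin merges to the right.
                j = i - 1 if i > 0 else i + 1
                merged = (bad + bins[j][0], good + bins[j][1])
                lo, hi = min(i, j), max(i, j)
                bins[lo:hi + 1] = [merged]
                changed = True
                break
    return bins
```

The class totals do not change when bins are merged, so they are computed once before the loop.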
Finally, iterative binning and merging is performed. It comprises an initial merge and a final merge.
Initial merge. Bins with similar weight of evidence (WOE) are merged, and the corresponding WOE and information value (IV) are computed step by step. Here, the information value IV represents the second importance. The WOE expresses the difference between the proportion of quality difference samples in the current group relative to all quality difference samples and the proportion of non-quality-difference samples in the current group relative to all non-quality-difference samples; the larger this difference, the larger the WOE.
WOE is calculated by formula (5):
WOE_i = ln(py_i / pn_i) = ln((#y_i / #y_T) / (#n_i / #n_T)) (5)
where py_i is the proportion of response (quality difference) samples in this group relative to all response samples, pn_i is the proportion of non-response (non-quality-difference) samples in this group relative to all non-response samples, #y_i is the number of response samples in this group, #n_i is the number of non-response samples in this group, #y_T is the number of all response samples in the sample, and #n_T is the number of all non-response samples in the sample.
IV is calculated by formula (6):
IV_i = (py_i - pn_i) × WOE_i (6)
Based on the IV of each group of a feature variable, the IV of the whole feature variable can be calculated using the summation formula (7):
IV = Σ_i IV_i (7)
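The WOE and IV calculation can be sketched in Python as follows, assuming the usual definitions that match the symbols defined for formulas (5) to (7): WOE_i = ln(py_i / pn_i), IV_i = (py_i - pn_i) × WOE_i, and a total IV equal to the sum over groups. The function name and the (response count, non-response count) representation of a group are hypothetical, and every count is assumed to be nonzero.

```python
import math

def woe_iv(bins):
    """Return (WOE per group, IV per group, total IV).

    bins: list of (response_count, non_response_count) pairs, all counts > 0.
    """
    total_resp = sum(y for y, _ in bins)      # #y_T
    total_non = sum(n for _, n in bins)       # #n_T
    woes, ivs = [], []
    for y, n in bins:
        py = y / total_resp                   # py_i
        pn = n / total_non                    # pn_i
        woe = math.log(py / pn)
        woes.append(woe)
        ivs.append((py - pn) * woe)
    return woes, ivs, sum(ivs)
```

A group in which both proportions are equal has a WOE of 0 and contributes nothing to the IV, which is why a feature whose groups all look alike receives a low importance.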
Final merge. Adjacent bins with the most similar WOE are merged step by step; merging stops once the IV decreases by more than a preset percentage threshold or a specified value. The preset hyper-parameters of the algorithm model are iterated over until the IV of each feature is maximized subject to the stop-merging condition. The information value IV is taken as the second importance, that is, the feature importance, and the characteristic information is sorted according to the second importance to obtain the third ranking.
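The final merge can be illustrated with the following Python sketch. It is one possible reading of the stopping rule, not the implementation of the application: the function names, the relative threshold max_iv_drop, and the lower limit min_bins are assumptions, WOE and IV are taken in their usual form (log ratio of class proportions, and proportion difference times WOE), and all counts are assumed to be nonzero.

```python
import math

def woe_values(bins):
    """WOE of each bin, where bins is a list of (bad_count, good_count) pairs."""
    total_bad = sum(b for b, _ in bins)
    total_good = sum(g for _, g in bins)
    return [math.log((b / total_bad) / (g / total_good)) for b, g in bins]

def information_value(bins):
    """Total IV of a feature: sum over bins of (py - pn) * WOE."""
    total_bad = sum(b for b, _ in bins)
    total_good = sum(g for _, g in bins)
    return sum((b / total_bad - g / total_good) * w
               for (b, g), w in zip(bins, woe_values(bins)))

def merge_similar_bins(bins, max_iv_drop=0.05, min_bins=2):
    """Greedily merge the adjacent pair with the closest WOE until IV drops too much."""
    bins = [tuple(b) for b in bins]
    while len(bins) > min_bins:
        woes = woe_values(bins)
        # Index of the adjacent pair whose WOE values are most similar.
        i = min(range(len(bins) - 1), key=lambda k: abs(woes[k] - woes[k + 1]))
        merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
        candidate = bins[:i] + [merged] + bins[i + 2:]
        iv_before = information_value(bins)
        iv_after = information_value(candidate)
        # Stop when this merge would lose more than the allowed share of IV.
        if iv_before > 0 and (iv_before - iv_after) / iv_before > max_iv_drop:
            break
        bins = candidate
    return bins
```

Merging two bins never increases the IV, so the threshold controls how much discriminating power is traded for a coarser, more stable binning.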
In addition, in some embodiments of the present application, the optimal iterative binning algorithm further includes extreme value handling.
Illustratively, if the cross table of bins against target classes contains a frequency of 0, then, so that WOE and IV can still be computed, an offset ρ is added to each column percentage and the values are recalculated. If an extreme WOE occurs, the corresponding bin is merged, as is done for intervals with sparse target classes.
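The zero-frequency adjustment can be sketched as follows. The function name and the default value of the offset rho are hypothetical; the sketch assumes the offset is applied to every column percentage only when at least one cell of the cross table is 0, and that WOE is the log ratio of the two column percentages.

```python
import math

def woe_with_offset(bins, rho=0.0001):
    """WOE per bin, adding the offset rho to each column percentage if any count is 0.

    bins: list of (bad_count, good_count) pairs.
    """
    total_bad = sum(b for b, _ in bins)
    total_good = sum(g for _, g in bins)
    needs_offset = any(b == 0 or g == 0 for b, g in bins)
    shift = rho if needs_offset else 0.0
    return [math.log((b / total_bad + shift) / (g / total_good + shift))
            for b, g in bins]
```

Without the offset, a bin with no quality difference samples would require the logarithm of 0, which is undefined.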
Here, the IV calculated by optimal iterative binning serves as the feature importance: features can not only be sorted by IV but also be divided into importance levels according to IV. The algorithm can bin both continuous variables and categorical variables, and it offers strong functionality and good interpretability.
S210: Convert the ranks in the first ranking, the second ranking and the third ranking into scores, respectively.
S211: Calculate the average of the scores corresponding to the first ranking, the second ranking and the third ranking.
S212: Combine and sort the first ranking, the second ranking and the third ranking according to the preset weight value of each average to obtain a combined ranking result.
S213: Obtain, according to the ranking result, the wireless side characteristic information that is strongly related to the KQI quality difference.
In some embodiments of the present application, after the first ranking, the second ranking and the third ranking of the wireless side characteristic information are obtained, a combined ranking algorithm may be used to combine them, so as to obtain the wireless side characteristic information that is strongly related to the KQI quality difference.
In some embodiments of the present application, the first ranking, the second ranking and the third ranking are produced by different algorithms and can be regarded as three rankings obtained by three different ranking methods. First, the rank assigned by each ranking method is converted into a score by the rank scoring method, using formula (8):
R_k = n - r_k + 1 (8)
where R_k denotes the score and r_k denotes the rank under the k-th ranking method, with i = 1, 2, …, n and k = 1, 2, …, p. That is, 1st place scores n points, …, n-th place scores 1 point, and in general the r-th place scores n - r + 1 points. If several items share the same rank, each receives the average of the scores of the positions they occupy.
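The rank scoring method of formula (8), including the averaging of tied positions, can be sketched in Python as follows. The function name and the feature names in the usage note are hypothetical, and the input is assumed to be an importance value per feature where a larger value means a better rank.

```python
def rank_scores(importance):
    """Convert importances into rank scores R = n - r + 1, averaging over tied positions.

    importance: dict mapping feature name to importance (larger is better).
    """
    n = len(importance)
    ordered = sorted(importance, key=importance.get, reverse=True)
    scores = {}
    i = 0
    while i < n:
        # Extend j to cover every feature tied with the one at position i.
        j = i
        while j + 1 < n and importance[ordered[j + 1]] == importance[ordered[i]]:
            j += 1
        # Positions i..j (0-based) hold ranks i+1..j+1.
        tied = [n - rank + 1 for rank in range(i + 1, j + 2)]
        average = sum(tied) / len(tied)
        for name in ordered[i:j + 1]:
            scores[name] = average
        i = j + 1
    return scores
```

For four features where the second and third are tied, the scores are 4, 2.5, 2.5 and 1: the tied pair shares the average of the scores for 2nd and 3rd place.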
Then, the average of the scores over the ranking methods is calculated, using formula (9):
R_avg = (R_1 + R_2 + … + R_p) / p (9)
The items are then re-ranked according to their average scores.
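Averaging the scores and re-ranking can be sketched as below. This sketch assumes that every ranking method orders the same set of features without ties, that the average is taken over the p ranking methods with equal weight, and that remaining ties in the average are broken by feature name. The function name and these choices are assumptions made for illustration.

```python
def combine_rankings(rankings):
    """Combine several rankings of the same features by their average rank score.

    rankings: list of p lists, each ordering the same n features from best to worst.
    Returns (combined order, average score per feature).
    """
    n = len(rankings[0])
    totals = {}
    for order in rankings:
        for position, name in enumerate(order, start=1):
            # Rank score of this feature under this method: n - r + 1.
            totals[name] = totals.get(name, 0) + (n - position + 1)
    means = {name: total / len(rankings) for name, total in totals.items()}
    # Highest average first; ties broken alphabetically (assumption).
    combined = sorted(means, key=lambda name: (-means[name], name))
    return combined, means
```

A feature that is ranked highly by all three methods ends up near the top, while a feature favored by only one method is pulled toward the middle.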
In some embodiments of the present application, because the ranking methods differ in importance, a weight value may be determined in advance for each ranking method. The average scores of the ranking methods are sorted in descending order of the preset weight values. The first ranking, the second ranking and the third ranking are then combined and sorted according to the sorted averages to obtain the combined ranking result. Illustratively, the first rank corresponds to a preset weight value
In some embodiments of the present application, an Analytic Hierarchy Process (AHP) may be used to determine weight values corresponding to different ranking methods in advance.
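As an illustration of how AHP can turn pairwise judgments into weights, the sketch below uses the row geometric mean approximation of the AHP priority vector. The application does not state which AHP computation is used, so this approximation, the function name, and the comparison matrix in the usage note are all assumptions.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method.

    matrix: n x n pairwise comparison matrix, where matrix[a][b] states how much
    more important method a is than method b (and matrix[b][a] is its reciprocal).
    """
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]
```

For a hypothetical matrix [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]], which says the first method is twice as important as the second and four times as important as the third, the weights are approximately 0.571, 0.286 and 0.143.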
In some embodiments of the present application, after the ranking result is determined, it may be corrected by taking the feature missing rate and the service logic into account, for example by considering the predictive power of the wireless side characteristic information and whether the wireless side characteristic information has network optimization value, so as to finally determine the wireless side characteristic information that is strongly related to the KQI quality difference. This improves the accuracy of determining the main wireless side cause of the KQI quality difference.
In summary, the data analysis method provided in the embodiments of the present application analyzes the association between the wireless side characteristic information and the quality difference characteristic information of the KQI using several different algorithms and determines the feature importance relationship between the two. The wireless side characteristic information that may cause the KQI quality difference can therefore be predicted more accurately and efficiently, which improves the accuracy and efficiency of determining the main wireless side cause of a KQI quality difference.
Based on the data analysis method provided by the above embodiment, correspondingly, the application also provides a specific implementation manner of the data analysis device. Please see the examples below.
As shown in fig. 3, fig. 3 is a schematic structural diagram of a data analysis apparatus according to another embodiment of the present application. In an embodiment of the present application, the data analysis apparatus may include:
an obtaining module 301, configured to obtain a key quality indicator KQI of the target service and wireless side feature information of the target cell.
A first calculating module 302, configured to calculate a correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI by using a first algorithm, and obtain a first sequence of the wireless side characteristic information by sequencing.
The second calculating module 303 is configured to calculate importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by using a second algorithm and a third algorithm, and sort the importance to obtain a second rank and a third rank of the wireless side characteristic information.
A sorting module 304, configured to perform combined sorting on the first sorting, the second sorting, and the third sorting to obtain wireless side feature information that is strongly related to the KQI quality difference.
Each module/unit in the apparatus shown in fig. 3 has a function of implementing each step in fig. 1 and 2, and can achieve the corresponding technical effect, and for brevity, no further description is provided herein.
In summary, the data analysis apparatus provided in this embodiment of the present application may be used to execute the data analysis method described in the above embodiments. The method analyzes the association between the wireless side characteristic information and the quality difference characteristic information of the KQI using several different algorithms and determines the feature importance relationship between the two. The wireless side characteristic information that may cause the KQI quality difference can therefore be predicted more accurately and efficiently, which improves the accuracy and efficiency of determining the main wireless side cause of a KQI quality difference.
Based on the data analysis method provided by the above embodiment, correspondingly, the application also provides a specific implementation manner of the data analysis device. Please see the examples below.
Fig. 4 is a hardware structure diagram of a data analysis device provided in some embodiments of the present application.
The apparatus for data analysis may comprise a processor 401 and a memory 402 in which computer program instructions are stored.
Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (or fixed) media, where appropriate. Memory 402 may be internal or external to the data analysis device, where appropriate. In a particular embodiment, memory 402 is a non-volatile solid-state memory. In a particular embodiment, memory 402 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement the method of data analysis in any of the above embodiments.
In one example, the data analysis device may also include a communication interface 403 and a bus 410. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 410 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 410 includes hardware, software, or both to couple the components of the data analysis device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 410 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable buses or interconnects are contemplated by the present application.
The data analysis device may perform the method of data analysis in the embodiments of the present application, thereby implementing the method of data analysis described in conjunction with fig. 1 and 2.
In addition, in combination with the data analysis method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium for implementing it. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the data analysis method of any of the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A method of data analysis, comprising:
acquiring a key quality index KQI of a target service and wireless side characteristic information of a target cell;
calculating the correlation degree of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a first algorithm, and sequencing to obtain a first sequence of the wireless side characteristic information;
respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively sequencing to obtain a second sequence and a third sequence of the wireless side characteristic information;
and combining and sequencing the first sequencing, the second sequencing and the third sequencing to obtain the wireless side characteristic information which is strongly related to the KQI quality difference.
2. The method according to claim 1, wherein the calculating the correlation between the radio-side characteristic information and the quality difference characteristic information of the KQI by using a first algorithm and obtaining a first sequence of the radio-side characteristic information by sequencing comprises:
calculating the correlation degree of the wireless side characteristic information and the quality difference characteristic information of the KQI according to the type of the target cell;
and when the relevancy meets a preset characteristic importance condition, sorting the wireless side characteristic information corresponding to the relevancy to obtain a first sort.
3. The method according to claim 1, wherein the calculating the importance of the wireless-side characteristic information and the quality difference characteristic information of the KQI by using the second algorithm and obtaining the second sequence of the wireless-side characteristic information by sequencing comprises:
calculating a first importance of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI;
and sorting the wireless side characteristic information according to the first importance to obtain a second sorting.
4. The method according to claim 1, wherein the calculating the importance of the wireless-side characteristic information and the quality difference characteristic information of the KQI by using a third algorithm and obtaining a third sequence of the wireless-side characteristic information by sequencing comprises:
calculating the evidence weight of the wireless side characteristic information corresponding to the quality difference characteristic information of the KQI according to the quality difference characteristic information of the KQI;
calculating a second importance degree according to the evidence weight;
and when the second importance degree meets a preset threshold condition, sequencing the wireless side characteristic information according to the second importance degree to obtain a third sequence.
5. The method according to claim 1, wherein the combining and sorting the first sorting, the second sorting and the third sorting to obtain the radio side characteristic information strongly correlated to the KQI quality difference comprises:
respectively converting the ranking corresponding to the first ranking, the second ranking and the third ranking into scores;
respectively calculating the average value of the scores corresponding to the first sequence, the second sequence and the third sequence;
combining and sorting the first sorting, the second sorting and the third sorting according to the preset weight value of each average value to obtain a combined sorting result;
and obtaining the wireless side characteristic information which is strongly related to the KQI quality difference according to the sequencing result.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the Key Quality Indicator (KQI) of the target service and the radio side feature information of the target cell comprises:
and preprocessing the KQI and the wireless side characteristic information by using a preset service logic algorithm to obtain the screened KQI and the screened wireless side characteristic information.
7. The method according to any one of claims 1 to 5, wherein the second algorithm comprises a Catboost algorithm and the third algorithm comprises an optimal iterative binning algorithm.
8. An apparatus for data analysis, the apparatus comprising:
the acquisition module is used for acquiring a key quality index KQI of the target service and the wireless side characteristic information of the target cell;
the first calculation module is used for calculating the correlation between the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a first algorithm and obtaining a first sequence of the wireless side characteristic information by sequencing;
the second calculation module is used for respectively calculating the importance of the wireless side characteristic information and the quality difference characteristic information of the KQI by adopting a second algorithm and a third algorithm, and respectively sequencing to obtain a second sequence and a third sequence of the wireless side characteristic information;
and the sequencing module is used for carrying out combined sequencing on the first sequencing, the second sequencing and the third sequencing to obtain the wireless side characteristic information which is strongly related to the KQI quality difference.
9. An apparatus for data analysis, the apparatus comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method of data analysis as claimed in any one of claims 1 to 7.
10. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement a method of data analysis as claimed in any one of claims 1 to 7.
CN202011543493.5A 2020-12-24 2020-12-24 Data analysis method, device, equipment and storage medium Pending CN114745731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011543493.5A CN114745731A (en) 2020-12-24 2020-12-24 Data analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011543493.5A CN114745731A (en) 2020-12-24 2020-12-24 Data analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114745731A true CN114745731A (en) 2022-07-12

Family

ID=82273969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011543493.5A Pending CN114745731A (en) 2020-12-24 2020-12-24 Data analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114745731A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination