CN114722061B

CN114722061B - Data processing method and device, equipment and computer readable storage medium

Info

Publication number: CN114722061B
Application number: CN202210370986.6A
Authority: CN
Inventors: 谭涵秋; 宋捷
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-04-08
Filing date: 2022-04-08
Publication date: 2023-11-14
Anticipated expiration: 2042-04-08
Also published as: CN114722061A

Abstract

The embodiments of the application disclose a data processing method, apparatus, device, and computer readable storage medium. The method comprises: inputting feature data from measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data; if the reconstruction error value is greater than a preset error threshold, comparing the feature data with the abnormal feature data contained in a preset repository to obtain a deviation rate of the feature data relative to the abnormal feature data; determining, based on the relationship between the deviation rate and a preset deviation threshold, the anomaly that the feature data represents; and, if the feature data represents a new anomaly, storing the feature data in the preset repository. By using the self-encoder model and the reconstruction error value, the method can accurately identify abnormal feature data in the measurement report data.

Description

Data processing method and device, equipment and computer readable storage medium

Technical Field

The present application relates to the field of communications technologies, and in particular, to a data processing method and apparatus, a device, and a computer readable storage medium.

Background

At present, many 5G network configurations follow a uniform, one-size-fits-all pattern and cannot meet the requirements of fine-grained network management. In particular, when the network becomes abnormal, it cannot be repaired automatically in real time. Measurement report data contains a large amount of feature data; this feature data is compared with the abnormal feature data stored in a repository to determine whether it is abnormal, which in turn indicates whether the measurement report data is abnormal and whether the base station that generated it is abnormal.

However, the prior art does not fully mine the feature parameters, cannot update the abnormal feature data in the repository in real time, and cannot accurately identify abnormal measurement report data, so no matching exception handling scheme can be selected.

Therefore, how to improve the accuracy of identifying abnormal feature data in measurement report data is a problem that urgently needs to be solved.

Disclosure of Invention

To solve the above technical problems, embodiments of the present application provide a data processing method and apparatus, an electronic device, and a computer readable storage medium, which determine whether feature data is abnormal by means of a reconstruction error value, and determine which anomaly the feature data represents according to the deviation rate of the feature data relative to the abnormal feature data.

Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.

According to an aspect of an embodiment of the present application, there is provided a data processing method including: inputting feature data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data; if the reconstruction error value is greater than a preset error threshold, comparing the feature data with abnormal feature data contained in a preset repository to obtain a deviation rate of the feature data relative to the abnormal feature data; determining, based on the relationship between the deviation rate and a preset deviation threshold, the anomaly that the feature data represents; and if the feature data represents a new anomaly, storing the feature data in the preset repository.

In another embodiment, before the feature data in the measurement report data is input into the trained self-encoder model to obtain the reconstruction error value, the method further includes: constructing an initial self-encoder model, and preprocessing feature data extracted from measurement report sample data to obtain feature sample data; inputting the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data; and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model with the feature sample data to obtain the trained self-encoder model.

In another embodiment, preprocessing the feature data extracted from the measurement report sample data to obtain the feature sample data includes: clustering the feature data extracted from the measurement report sample data to obtain clustered feature data of a plurality of categories; performing binary classification on the clustered feature data of each category to obtain the contribution degree of the clustered feature data of each category; and determining the clustered feature data whose contribution degree is greater than a preset contribution threshold as the feature sample data.

In another embodiment, inputting the feature sample data into the initial self-encoder model to obtain the similarity coefficient of the feature sample data includes: inputting the feature sample data into the initial self-encoder model to obtain an anomaly degree of the feature sample data and an anomaly distance between the feature sample data and standard abnormal feature data in the initial self-encoder model; and calculating the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance.

In another embodiment, calculating the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance includes: acquiring a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance; multiplying the anomaly degree by its weight value to obtain a first value, and multiplying the anomaly distance by its weight value to obtain a second value; and summing the first value and the second value to obtain the similarity coefficient of the feature sample data.
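The weighted sum described above can be sketched as follows. This is a minimal illustration, assuming the anomaly degree and anomaly distance are already available as plain numbers; the function name and the example weights are hypothetical, since the text does not specify weight values or how the two inputs are scaled.

```python
def similarity_coefficient(anomaly_degree: float, anomaly_distance: float,
                           degree_weight: float, distance_weight: float) -> float:
    """S = degree_weight * anomaly_degree + distance_weight * anomaly_distance."""
    first_value = anomaly_degree * degree_weight        # anomaly degree x its weight
    second_value = anomaly_distance * distance_weight   # anomaly distance x its weight
    return first_value + second_value                   # summation gives the coefficient


# Hypothetical inputs and equal weights, chosen only for illustration.
print(similarity_coefficient(0.8, 0.6, degree_weight=0.5, distance_weight=0.5))
```

Keeping the two weights as explicit parameters makes it easy to tune how strongly each of the two anomaly measures influences the coefficient.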

In another embodiment, determining the anomaly that the feature data represents based on the relationship between the deviation rate and the preset deviation threshold includes: comparing the deviation rate with the preset deviation threshold; if the deviation rate is greater than the preset deviation threshold, determining that the feature data represents a new anomaly; and if the deviation rate is smaller than or equal to the preset deviation threshold, determining that the feature data does not represent a new anomaly.

In another embodiment, storing the feature data in the preset repository includes: determining an exception handling scheme corresponding to the new anomaly represented by the feature data, the exception handling scheme being used to handle the anomaly so as to restore normal operation; and storing the exception handling scheme and the feature data in the preset repository in association with each other.
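The associative storage step can be sketched as a small in-memory structure. This is a hypothetical stand-in for the preset repository, assuming feature data are numeric vectors and a handling scheme is a text description; the class name, method names, and the example scheme string are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnomalyRepository:
    """Hypothetical in-memory stand-in for the preset repository.

    Each entry associates one abnormal feature vector with the exception
    handling scheme intended to restore normal operation.
    """

    entries: list[tuple[tuple[float, ...], str]] = field(default_factory=list)

    def store(self, feature: list[float], scheme: str) -> None:
        # Keep the feature data and its handling scheme together in one record.
        self.entries.append((tuple(feature), scheme))

    def scheme_for(self, feature: list[float]) -> str | None:
        # Exact-match lookup only; matching by deviation rate is left out here.
        key = tuple(feature)
        for stored, scheme in self.entries:
            if stored == key:
                return scheme
        return None


repo = AnomalyRepository()
repo.store([0.92, 0.15], "hypothetical scheme: adjust antenna downtilt")
print(repo.scheme_for([0.92, 0.15]))
```

Storing each scheme in the same record as its feature data means that retrieving a known anomaly also retrieves the action needed to handle it.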

According to an aspect of an embodiment of the present application, there is provided a data processing apparatus including:

an acquisition module configured to input feature data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data; a comparison module configured to, if the reconstruction error value is greater than a preset error threshold, compare the feature data with abnormal feature data contained in a preset repository to obtain a deviation rate of the feature data relative to the abnormal feature data; a determining module configured to determine, based on the relationship between the deviation rate and a preset deviation threshold, the anomaly that the feature data represents; and an updating module configured to store the feature data in the preset repository if the feature data represents a new anomaly.

In another embodiment, the data processing apparatus further includes a model construction module configured to: construct an initial self-encoder model and preprocess feature data extracted from measurement report sample data to obtain feature sample data; input the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data; and, if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, train the initial self-encoder model with the feature sample data to obtain the trained self-encoder model.

In another embodiment, the model construction module includes: a preprocessing unit configured to cluster the feature data extracted from the measurement report sample data to obtain clustered feature data of a plurality of categories; a classification unit configured to perform binary classification on the clustered feature data of each category to obtain the contribution degree of the clustered feature data of each category; and a determining unit configured to determine the clustered feature data whose contribution degree is greater than a preset contribution threshold as the feature sample data.

In another embodiment, the model construction module includes: an anomaly parameter unit configured to input the feature sample data into the initial self-encoder model to obtain an anomaly degree of the feature sample data and an anomaly distance between the feature sample data and standard abnormal feature data in the initial self-encoder model; and a similarity coefficient calculation unit configured to calculate the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance.

In another embodiment, the similarity coefficient calculation unit further includes: an acquisition block configured to acquire a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance; an operation block configured to multiply the anomaly degree by its weight value to obtain a first value, and to multiply the anomaly distance by its weight value to obtain a second value; and a summation block configured to sum the first value and the second value to obtain the similarity coefficient of the feature sample data.

In another embodiment, the summation block is specifically configured to: acquire a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance; multiply the anomaly degree by its weight value to obtain a first value, and multiply the anomaly distance by its weight value to obtain a second value; and sum the first value and the second value to obtain the similarity coefficient of the feature sample data.

In another embodiment, the determining module is specifically configured to: compare the deviation rate with the preset deviation threshold; if the deviation rate is greater than the preset deviation threshold, determine that the feature data represents a new anomaly; and if the deviation rate is smaller than or equal to the preset deviation threshold, determine that the feature data does not represent a new anomaly.

In another embodiment, the updating module is specifically configured to: determine an exception handling scheme corresponding to the new anomaly represented by the feature data, the exception handling scheme being used to handle the anomaly so as to restore normal operation; and store the exception handling scheme and the feature data in the preset repository in association with each other.

According to an aspect of an embodiment of the present application, there is provided an electronic device including: a controller; and a memory storing one or more programs which, when executed by the controller, perform the method described above.

According to an aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions, which when executed by a processor of a computer, cause the computer to perform the above-described method.

According to an aspect of embodiments of the present application, there is also provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method described above.

In the technical solution provided by the embodiments of the present application, feature data in measurement report data is input into the trained self-encoder model to obtain its reconstruction error value; whether the feature data is abnormal is determined from the reconstruction error value; the anomaly represented by the feature data is determined from its deviation rate relative to the abnormal feature data; and feature data representing a new anomaly is stored in the preset repository, thereby updating the abnormal feature data in the repository. Using the self-encoder model and the reconstruction error value, abnormal feature data in the measurement report data can be identified accurately, and the newly identified abnormal feature data keeps the repository up to date in real time. Subsequent anomaly investigation that relies on the abnormal feature data in the repository is therefore more accurate, and abnormal measurement report data is identified more accurately.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application. The drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings without inventive effort. In the drawings:

FIG. 1 is a schematic diagram of an implementation environment to which the present application relates;

FIG. 2 is a flow chart of a data processing method according to an exemplary embodiment of the present application;

FIG. 3 is a flow chart of a process for constructing a self-encoder model, as proposed based on the embodiment shown in FIG. 2;

FIG. 4 is a flow chart of a process for preprocessing feature data that is proposed based on the embodiment shown in FIG. 3;

FIG. 5 is a flow chart of a process for calculating similarity coefficients for feature sample data based on the embodiment of FIG. 3;

FIG. 6 is a flow chart of a process for calculating similarity coefficients for feature sample data based on the embodiment shown in FIG. 5;

FIG. 7 is a flow chart of a process for determining the anomaly represented by the feature data, proposed based on the embodiment shown in FIG. 3;

FIG. 8 is a flow chart of a process for storing an exception handling scheme in association with the feature data, proposed based on the embodiment shown in FIG. 2;

FIG. 9 is a schematic diagram illustrating a process of automatically handling abnormal situations by a base station based on the data processing method of the present application according to an exemplary embodiment of the present application;

FIG. 10 is a schematic diagram of a data processing apparatus according to an exemplary embodiment of the present application;

FIG. 11 is a schematic diagram of a computer system of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

In the present application, the term "plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.

Referring first to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to the present application. The implementation environment comprises a terminal 100, a server 200 and a base station 300, wherein the terminal 100, the server 200 and the base station 300 communicate with each other through a wired or wireless network.

The terminal 100 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like; for example, it may be any electronic device capable of visual display, such as a smartphone, a tablet, a notebook computer, or a desktop computer, which is not limited herein.

The terminal 100 can generate MR (Measurement Report) data reflecting information such as network quality, user behavior habits, and the surrounding environment, and can transmit the MR data to the server 200 in real time.

The server 200 extracts feature data from the received MR data, inputs the feature data into a pre-trained self-encoder model to obtain the reconstruction error value output by the model, compares the reconstruction error value with the preset error threshold, obtains the deviation rate of the feature data, and, if the deviation rate is greater than the preset deviation threshold, stores the feature data in the repository as new abnormal feature data. The server 200 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers (a plurality of servers may form a blockchain, each server being a node on the blockchain), or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms, which is not limited herein.

The base station 300 serves as a relay that provides communication services for the terminal 100: request signals sent by the terminal 100 are transmitted to the operator through the base station 300. Meanwhile, if the server 200 detects that the MR data transmitted by the terminal 100 is abnormal, the server 200 sends the corresponding exception handling scheme to the base station 300, which then performs exception handling according to that scheme.

This embodiment can detect in real time whether the measurement report data sent by a terminal is abnormal and, if so, send the corresponding exception handling scheme to the base station, so that the base station can handle abnormal conditions automatically in real time. The method is therefore suitable for scenarios with strict real-time data requirements, such as communication base station maintenance.

Existing methods generally feed all data other than known abnormal feature data into a self-encoder model as normal data. Such data often contains substantial noise, which directly distorts the distribution of normal samples that the self-encoder learns, so the learned distribution is inaccurate.

Referring now to FIG. 2, FIG. 2 is a flowchart of a data processing method according to an exemplary embodiment of the present application, which may be performed by the server 200 in the implementation environment shown in FIG. 1. The method may also be applied to other implementation environments and executed by server devices in those environments, which is not limited in this embodiment. As shown in FIG. 2, the method includes at least S210 to S240, described in detail as follows:

S210: Input the feature data in the measurement report data into the trained self-encoder model to obtain a reconstruction error value corresponding to the feature data.

The measurement report data is reported by the user terminal in real time and reflects information such as network quality, user behavior habits, and the surrounding environment.

The self-encoder (autoencoder, AE) model of this embodiment is trained in advance, using normal feature data during training.

The reconstruction error value is used to judge whether the feature data input into the self-encoder model is abnormal: the reconstruction error value output by the model is compared with the preset error threshold to determine whether the feature data is abnormal, and hence whether the measurement report data corresponding to the feature data is abnormal.

As an example of S210, the measurement report data is first preprocessed, including null filling, abnormal data removal, and normalization, to obtain preprocessed measurement report data. Feature data is then extracted from the preprocessed measurement report data and input into the trained self-encoder model to obtain the reconstruction error value corresponding to the feature data.
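The three preprocessing steps named above can be sketched in plain Python. This is a minimal sketch under stated assumptions: nulls are filled with the column mean, rows whose z-score exceeds a cut-off in any column are removed, and each column is min-max scaled to [0, 1]. The function name, these concrete rules, and the `z_limit` parameter are hypothetical; the text names only null filling, abnormal data removal, and normalization.

```python
from __future__ import annotations

import statistics


def preprocess(rows: list[list[float | None]], z_limit: float = 3.0) -> list[list[float]]:
    """Null filling, abnormal-value removal and min-max normalization (assumed rules)."""
    n_cols = len(rows[0])
    # 1) Null filling: replace None with the mean of the observed column values.
    means = [statistics.fmean(r[c] for r in rows if r[c] is not None) for c in range(n_cols)]
    filled = [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]
    # 2) Abnormal-data removal: drop rows whose |z-score| exceeds z_limit in any column.
    stdevs = [statistics.pstdev([r[c] for r in filled]) for c in range(n_cols)]
    kept = [
        r for r in filled
        if all(s == 0 or abs(r[c] - means[c]) / s <= z_limit for c, s in enumerate(stdevs))
    ]
    # 3) Normalization: min-max scale each column of the kept rows to [0, 1].
    lows = [min(r[c] for r in kept) for c in range(n_cols)]
    highs = [max(r[c] for r in kept) for c in range(n_cols)]
    return [
        [0.0 if highs[c] == lows[c] else (r[c] - lows[c]) / (highs[c] - lows[c])
         for c in range(n_cols)]
        for r in kept
    ]


print(preprocess([[None, 1.0], [2.0, 3.0], [4.0, 5.0]]))  # [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
```

Normalizing after outlier removal keeps a single extreme value from compressing the scaled range of all the other rows.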

S220: If the reconstruction error value is greater than the preset error threshold, compare the feature data with the abnormal feature data contained in the preset repository to obtain the deviation rate of the feature data relative to the abnormal feature data.

The deviation rate in this embodiment is the deviation rate of the feature data relative to the abnormal feature data: the feature data is compared with the abnormal feature data to obtain a deviation amount, and the deviation amount is divided by the total amount of the abnormal feature data to obtain the deviation rate.
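The text does not define the deviation amount or the total amount precisely, so the sketch below makes an assumption. It takes the deviation amount to be the sum of absolute element-wise differences and the total amount to be the sum of absolute values of the stored abnormal feature vector. It then compares the feature data against the closest repository entry. Both function names are hypothetical.

```python
from __future__ import annotations

import math


def deviation_rate(feature: list[float], abnormal: list[float]) -> float:
    """Deviation amount divided by the total amount of the abnormal feature data.

    Assumed reading: deviation amount = sum |x_i - a_i|, total amount = sum |a_i|.
    """
    if len(feature) != len(abnormal):
        raise ValueError("feature vectors must have the same length")
    total_amount = sum(abs(a) for a in abnormal)
    if total_amount == 0:
        raise ValueError("abnormal feature data must not be all zeros")
    deviation_amount = sum(abs(x - a) for x, a in zip(feature, abnormal))
    return deviation_amount / total_amount


def nearest_deviation_rate(feature: list[float], repository: list[list[float]]) -> float:
    """Smallest deviation rate over all stored abnormal feature vectors.

    An empty repository yields infinity, so any anomaly would count as new.
    """
    return min((deviation_rate(feature, a) for a in repository), default=math.inf)


print(nearest_deviation_rate([1.0, 2.0], [[5.0, 5.0], [1.0, 2.0]]))  # 0.0
```

Using the closest entry means the feature data counts as a known anomaly if it resembles any stored anomaly closely enough, rather than needing to be close to all of them.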

Feature data whose reconstruction error value is greater than the preset error threshold is determined to be abnormal data. To further determine whether it matches the abnormal feature data already in the repository, the feature data is compared with the abnormal feature data contained in the repository to obtain its deviation rate relative to the abnormal feature data.

S230: Determine the anomaly that the feature data represents based on the relationship between the deviation rate and the preset deviation threshold.

The anomaly represented by the feature data is determined from the magnitude relationship between the deviation rate and the preset deviation threshold. For example, if the deviation rate is greater than the preset deviation threshold, the feature data is determined to represent a new anomaly; if the deviation rate is smaller than or equal to the preset deviation threshold, the feature data is determined to represent no anomaly or no new anomaly.
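Steps S220 to S240 can be combined into one decision function, sketched below under stated assumptions. The repository is a plain list of feature vectors, the deviation-rate function is passed in by the caller, and the feature data is compared against its closest stored entry. The function name and the returned labels are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable


def screen_feature(
    feature: list[float],
    reconstruction_error: float,
    error_threshold: float,
    repository: list[list[float]],
    deviation_threshold: float,
    deviation_rate: Callable[[list[float], list[float]], float],
) -> str:
    """Apply S220 to S240 to one feature vector and report the outcome."""
    # Output of S210: a reconstruction error at or below the threshold looks normal.
    if reconstruction_error <= error_threshold:
        return "normal"
    # S220: deviation rate against the closest stored abnormal feature data
    # (taking the minimum over the repository is an assumption of this sketch).
    rate = min((deviation_rate(feature, a) for a in repository), default=math.inf)
    # S230: at or below the deviation threshold, no new anomaly is represented.
    if rate <= deviation_threshold:
        return "no new anomaly"
    # S240: store the new anomaly so the repository stays up to date.
    repository.append(list(feature))
    return "new anomaly"
```

Because the repository is updated in place, a second occurrence of the same new anomaly is classified as "no new anomaly", which matches the intent of keeping the repository current.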

S240: If the feature data represents a new anomaly, store the feature data in the preset repository.

In this embodiment, feature data representing a new anomaly is stored in the preset repository so as to update the abnormal feature data in the repository.

The preset repository of this embodiment includes abnormal feature data and exception handling schemes for the abnormal feature data; it may of course also include other data, which is not limited herein.

In this embodiment, feature data in the measurement report data is input into the trained self-encoder model to obtain its reconstruction error value, whether the feature data is abnormal is determined from the reconstruction error value, the anomaly represented by the feature data is determined from its deviation rate relative to the abnormal feature data, and feature data representing a new anomaly is stored in the preset repository to update the abnormal feature data there. The self-encoder model and its reconstruction error value allow abnormal feature data in the measurement report data to be identified accurately, and the newly identified abnormal feature data keeps the repository current, so later anomaly investigation based on the repository is more accurate and abnormal measurement report data is identified more accurately.

Referring to FIG. 3, FIG. 3 is a flowchart of a process for constructing the self-encoder model, proposed based on the embodiment shown in FIG. 2. The method further includes S310 to S330 before S210 shown in FIG. 2, described in detail below:

S310: Construct an initial self-encoder model, and preprocess the feature data extracted from the measurement report sample data to obtain feature sample data.

The initial self-encoder model of this embodiment learns the sample distribution of normal feature data by using minimization of the reconstruction error as its objective function.
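To make the reconstruction-error objective concrete, the following trains a toy linear self-encoder with a single hidden unit, using full-batch gradient descent in plain Python. The architecture, learning rate, epoch count, initialization, and sample data are all assumptions for illustration, not the model of this application; a practical self-encoder would typically be a multi-layer neural network.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=500):
    """Toy self-encoder: h = w_enc . x (one hidden unit), x_hat = w_dec * h.

    Trained by full-batch gradient descent on the mean squared
    reconstruction error, i.e. the objective being minimized.
    """
    dim, n = len(data[0]), len(data)
    w_enc = [0.1] * dim  # small fixed init keeps the sketch deterministic
    w_dec = [0.1] * dim
    for _ in range(epochs):
        g_enc = [0.0] * dim
        g_dec = [0.0] * dim
        for x in data:
            h = sum(we * xi for we, xi in zip(w_enc, x))       # encode
            r = [wd * h - xi for wd, xi in zip(w_dec, x)]      # residual x_hat - x
            # dLoss/dh for Loss = mean_j r_j ** 2
            dh = sum(2.0 / dim * rj * wd for rj, wd in zip(r, w_dec))
            for j in range(dim):
                g_dec[j] += 2.0 / dim * r[j] * h
                g_enc[j] += dh * x[j]
        for j in range(dim):
            w_dec[j] -= lr * g_dec[j] / n
            w_enc[j] -= lr * g_enc[j] / n
    return w_enc, w_dec


def reconstruction_error(x, w_enc, w_dec):
    """Mean squared difference between x and its reconstruction."""
    h = sum(we * xi for we, xi in zip(w_enc, x))
    return sum((wd * h - xi) ** 2 for wd, xi in zip(w_dec, x)) / len(x)


# Hypothetical "normal" samples lying on the line x2 = 2 * x1.
normal = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_enc, w_dec = train_linear_autoencoder(normal)
print(reconstruction_error([0.5, 1.0], w_enc, w_dec))   # close to 0
print(reconstruction_error([1.0, -2.0], w_enc, w_dec))  # much larger
```

Because training sees only normal samples, data that does not follow the learned pattern reconstructs poorly. That larger reconstruction error is what the preset error threshold in S210 relies on.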

Preprocessing the feature data in this embodiment includes screening out feature data that contributes little to the self-encoder model. For example, the feature data is preprocessed with the KMEANS (K-means) clustering model, the contribution degree of the feature data to the KMEANS model is obtained by means of a classification model, and the feature data is then screened according to the contribution degree.

S320: Input the feature sample data into the initial self-encoder model to obtain the similarity coefficient of the feature sample data.

In this embodiment, the similarity coefficient of the feature sample data is obtained by inputting the feature sample data into the initial self-encoder model, which is an untrained self-encoder model.

S330: If the similarity coefficient of the feature sample data is smaller than the preset similarity threshold, train the initial self-encoder model with the feature sample data to obtain the trained self-encoder model.

The preset similarity threshold in this embodiment is a preset parameter. For example, if the preset similarity threshold is 0.75, feature data with a similarity coefficient greater than or equal to 0.75 is considered potentially abnormal; after this potentially abnormal feature data is filtered out, the remaining feature data is used as normal data to train the initial self-encoder model, yielding the trained self-encoder model.
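The filtering rule above can be sketched as a simple selection over per-sample coefficients. This assumes a similarity coefficient has already been computed for each sample. The function name is hypothetical, and 0.75 is used only because it is the example threshold given in the text.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def select_normal_samples(samples: Sequence[T], coefficients: Sequence[float],
                          similarity_threshold: float = 0.75) -> list[T]:
    """Keep samples whose similarity coefficient is below the threshold.

    Samples at or above the threshold are treated as potentially abnormal and
    are excluded from the training set of the initial self-encoder model.
    """
    if len(samples) != len(coefficients):
        raise ValueError("one similarity coefficient is needed per sample")
    return [s for s, c in zip(samples, coefficients) if c < similarity_threshold]


training_set = select_normal_samples([[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]], [0.40, 0.75, 0.12])
print(training_set)  # [[0.1, 0.2], [0.3, 0.1]]
```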

This embodiment further clarifies how the self-encoder model is trained: the similarity coefficient of the feature data is obtained, and only feature data whose similarity coefficient is smaller than the preset similarity threshold is used as normal data to train the initial self-encoder model. This optimization makes the output of the trained self-encoder model more accurate.

Existing methods generally input features into the self-encoder model for computation after preprocessing such as null filling and outlier handling. However, problems such as collinearity and redundancy between features often harm model training. To solve this, this embodiment first clusters the feature data, then trains a classifier using the cluster labels, screens the original features by the feature contribution degrees output by the classifier, and eliminates the clustered feature data that contributes little to the model.

Fig. 4 is a flowchart of the feature data preprocessing process based on the embodiment shown in Fig. 3. Building on S310, this step specifically includes S410 to S430, which are described in detail below:

S410: Cluster the feature data extracted from the measurement report sample data to obtain clustered feature data of a plurality of categories.

For example, the clustering in this embodiment uses a K-means (KMEANS) model: the feature data extracted from the measurement report sample data is input into the K-means model, the optimal number of clusters K is determined by the elbow method, and the cluster category of each feature data item is output, giving clustered feature data of a plurality of categories.
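The clustering and elbow selection can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: each row of measurement report sample data is treated as one point, Lloyd's K-means is used with a deterministic farthest-first initialization (a simplification of the usual random or k-means++ start), and the elbow is taken as the K with the largest second difference of the inertia curve, which is only one of several common ways to read an elbow plot. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _init_centers(points: List[Point], k: int) -> List[List[float]]:
    """Farthest-first initialization: deterministic and spreads centers apart."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], float]:
    """Lloyd's K-means. Returns (cluster label per point, inertia)."""
    centers = _init_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def elbow_k(points: List[Point], k_max: int) -> int:
    """Pick K at the elbow: the largest second difference of the inertia curve."""
    if k_max < 3:
        raise ValueError("k_max must be at least 3 to locate an elbow")
    # inertias[i] is the inertia for K = i + 1
    inertias = [kmeans(points, k)[1] for k in range(1, k_max + 1)]
    best_k, best_bend = 2, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

On twelve toy points forming three well-separated groups of four, `elbow_k(points, 4)` selects K = 3, and `kmeans(points, 3)` then assigns each group its own label.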

S420: Perform binary classification on the clustered feature data of each of the plurality of categories to obtain the contribution degree of the clustered feature data of each category.

First, a classifier, such as an RFC (random forest classifier), is trained for each cluster category; the clustered feature data of each category is then input into the corresponding trained classifier to obtain the contribution degree of the clustered feature data of that category.

S430: Determine the clustered feature data whose contribution degree is greater than a preset contribution threshold as the feature sample data.

For example, with a preset contribution threshold of 0.05, feature data whose contribution degree is less than or equal to 0.05 is removed, and this screening is repeated until the contribution degree of every remaining feature is greater than 0.05. The clustered feature data whose contribution degree is greater than 0.05 is then determined as the feature sample data.
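The repeated screening of S430 can be sketched as the loop below. The sketch assumes that contribution degrees are recomputed after each removal round; in the embodiment this would mean retraining the per-category classifiers (for example random forests) and reading their feature importances. Here `contribution_fn` stands in for that step, and the toy contribution function, the feature names `f1` to `f5` and their raw scores are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Maps the currently kept feature names to their contribution degrees.
ContributionFn = Callable[[Sequence[str]], Dict[str, float]]


def screen_features(
    features: Sequence[str],
    contribution_fn: ContributionFn,
    threshold: float = 0.05,
) -> List[str]:
    """Repeatedly drop features whose contribution degree is <= threshold.

    Contributions are recomputed after every removal round, and screening
    stops once every remaining feature contributes more than the threshold.
    """
    kept = list(features)
    while kept:
        contrib = contribution_fn(kept)
        weak = {f for f in kept if contrib[f] <= threshold}
        if not weak:
            break
        kept = [f for f in kept if f not in weak]
    return kept


# Hypothetical raw importance scores for illustration only.
RAW = {"f1": 50.0, "f2": 30.0, "f3": 15.0, "f4": 4.0, "f5": 1.0}


def toy_contribution(kept: Sequence[str]) -> Dict[str, float]:
    """Stand-in for classifier importances: shares that sum to 1 over kept features."""
    total = sum(RAW[f] for f in kept)
    return {f: RAW[f] / total for f in kept}
```

With the default threshold of 0.05, `screen_features(list(RAW), toy_contribution)` keeps `f1`, `f2` and `f3`; with a threshold of 0.2, only `f1` and `f2` survive.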

This embodiment further explains the preprocessing of the feature data: the feature data is clustered, binary classification is performed on the clustered feature data to obtain the contribution degree of each category, clustered feature data whose contribution degree does not exceed the preset contribution threshold is eliminated, and clustered feature data whose contribution degree is greater than the threshold is determined as feature sample data. The preprocessed feature sample data is thus better suited to the subsequent self-encoder model, so that abnormal feature data can be identified more reliably.

Fig. 5 is a flowchart of calculating the similarity coefficient of the feature sample data based on the embodiment shown in Fig. 3. Building on S320, this step specifically includes S510 to S520, which are described in detail below:

S510: Input the feature sample data into the initial self-encoder model to obtain the anomaly degree of the feature sample data and the anomaly distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model.

The anomaly degree is a parameter that measures how anomalous the input sample data is. Using the IF (Isolation Forest) method, the feature sample data and the standard abnormal feature data are partitioned in feature space, from which the distance between the feature sample data and the standard abnormal feature data is determined.

S520: Calculate the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance.

The calculation in this embodiment may use conventional arithmetic operations such as addition, subtraction, multiplication and division to obtain the similarity coefficient of the sample data.

This embodiment further shows that the similarity coefficient of the feature sample data is calculated from its anomaly degree and anomaly distance, so that whether the feature sample data is anomalous can be determined more accurately.

Fig. 6 is a flowchart of calculating the similarity coefficient of the feature sample data based on the embodiment shown in Fig. 5. Building on S520, this step specifically includes S610 to S630, which are described in detail below:

S610: Acquire a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance.

In this embodiment, each weight value is a constant, for example a constant between 0 and 1.

S620: Multiply the anomaly degree by its corresponding weight value to obtain a first value, and multiply the anomaly distance by its corresponding weight value to obtain a second value.

Illustratively, if the anomaly degree is A with a corresponding weight of 0.2, and the anomaly distance is B with a corresponding weight of 0.8, the first value is 0.2A and the second value is 0.8B.

S630: Sum the first value and the second value to obtain the similarity coefficient of the feature sample data.

For example, if the anomaly degree is A with a corresponding weight of 0.6 (first value 0.6A) and the anomaly distance is B with a corresponding weight of 0.4 (second value 0.4B), the similarity coefficient of the feature sample data is 0.6A + 0.4B.

Illustratively, the similarity coefficient of the feature sample data is calculated as follows:

TS(x) = θ·IS(x) + (1 - θ)·SS(x), 0 < θ < 1;

where TS(x) denotes the similarity coefficient of the feature sample data, IS(x) denotes its anomaly degree, SS(x) denotes its anomaly distance, and θ is a constant weight.
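The weighted sum of S610 to S630 maps directly onto the formula above. A minimal Python sketch follows; the function name and the sample values are hypothetical.

```python
def similarity_coefficient(anomaly_degree: float, anomaly_distance: float, theta: float) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x), with 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    first_value = theta * anomaly_degree               # S620: weighted anomaly degree
    second_value = (1.0 - theta) * anomaly_distance    # S620: weighted anomaly distance
    return first_value + second_value                  # S630: sum of the two values
```

With θ = 0.6, an anomaly degree of 0.8 and an anomaly distance of 0.5 give a similarity coefficient of about 0.68, below the example similarity threshold of 0.75, so such a sample would be kept as normal training data. Because the two weights are θ and 1 - θ, they always sum to 1, as in the 0.2/0.8 and 0.6/0.4 examples above.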

The anomaly degree is calculated by the following formula:

wherein,

where E(h(x)) denotes the average path length between the feature sample data and the abnormal feature data stored in the repository, h(x) denotes the path length between the training feature data and the abnormal feature data stored in the repository, x denotes the position value of the feature sample data, c(n) denotes a coefficient, n denotes the number of feature samples, and H(n) denotes the harmonic series, whose magnitude is determined by the value of n.
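The formula images for the anomaly degree are not reproduced in this text. The symbols defined above (E(h(x)), c(n), n and the harmonic number H) match the standard Isolation Forest score s(x, n) = 2^(-E(h(x))/c(n)) with c(n) = 2H(n - 1) - 2(n - 1)/n, so the sketch below assumes that standard form; it is not confirmed to be the exact formula of this embodiment. Building the isolation trees that produce the path lengths is omitted, and E(h(x)) is taken as an input. Function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic_estimate(i: int) -> float:
    """H(i) estimated as ln(i) + Euler's constant, as in the Isolation Forest paper."""
    return math.log(i) + EULER_GAMMA


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n samples."""
    if n > 2:
        return 2.0 * harmonic_estimate(n - 1) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def anomaly_degree(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); values close to 1 indicate anomalies."""
    cn = c(n)
    if cn == 0.0:
        raise ValueError("n must be at least 2")
    return 2.0 ** (-mean_path_length / cn)
```

An average path length equal to c(n) gives a score of 0.5, while paths much shorter than c(n) push the score toward 1, which is the usual sign that a sample is easy to isolate and therefore anomalous.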

The anomaly distance is calculated by the following formula:

where μ_i denotes the position value of the i-th abnormal feature data in the repository.

This embodiment further clarifies the calculation of the similarity coefficient of the feature sample data and gives specific formulas, so that the similarity coefficient is more accurate.

Finding new anomalies in a large number of data samples is difficult. In this embodiment, a chi-square test is used to determine whether newly detected abnormal feature data and the known anomalies belong to the same data distribution. If they do not, the new anomaly is stored in the repository in association with its abnormal feature data, which keeps the set of anomaly types in the repository complete.

Fig. 7 is a flowchart of determining the anomaly condition characterized by the feature data, based on the embodiment shown in Fig. 3. Building on S230, this step further includes S710 to S730, which are described in detail below:

S710: Compare the deviation rate with a preset deviation threshold.

The preset deviation threshold is the key threshold for determining the anomaly condition characterized by the feature data.

Illustratively, the deviation rate is calculated using Pearson's chi-square method, with the following formula:

where d denotes the feature vector of the feature sample data, x denotes the feature vector of the abnormal feature data in the repository, χ²(α) denotes the preset deviation threshold, and χ² denotes the deviation rate of the feature sample data relative to the abnormal feature data.

S720: If the deviation rate is greater than the preset deviation rate threshold, determine that the feature data characterizes a new anomaly.

If χ² > χ²(α), for example with χ²(α) = 0.1, feature sample data whose deviation rate is greater than 0.1 is determined to characterize a new anomaly.

S730: If the deviation rate is less than or equal to the preset deviation rate threshold, determine that the feature data does not characterize a new anomaly.

If χ² ≤ χ²(α), for example when χ²(α) is 0.1 and the deviation rate of the feature sample data is 0.05, it is determined that the feature sample data does not characterize a new anomaly.
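Because the deviation-rate formula is also missing from this text, the following Python sketch assumes Pearson's chi-square statistic χ² = Σ (d_i - x_i)² / x_i, treating the stored abnormal feature vector x as the expected values (so its components must be positive). It further assumes, as an illustrative choice, that feature data characterizes a new anomaly only when its deviation from every stored abnormal feature vector exceeds the threshold; the default of 0.1 is the example value of χ²(α) used above. Function names are hypothetical.

```python
from typing import Sequence


def chi_square(d: Sequence[float], x: Sequence[float]) -> float:
    """Pearson statistic: sum of (d_i - x_i)^2 / x_i, with x_i treated as expected values."""
    if len(d) != len(x):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for observed, expected in zip(d, x):
        if expected <= 0:
            raise ValueError("expected (repository) values must be positive")
        total += (observed - expected) ** 2 / expected
    return total


def is_new_anomaly(
    d: Sequence[float],
    repository: Sequence[Sequence[float]],
    threshold: float = 0.1,
) -> bool:
    """S710-S730: new anomaly if the deviation from every stored anomaly exceeds the threshold."""
    if not repository:
        return True  # nothing stored yet, so any anomaly is new
    return min(chi_square(d, x) for x in repository) > threshold
```

For example, d = [3.0, 4.0] against a stored vector [2.0, 4.0] gives χ² = 0.5 > 0.1 and is reported as a new anomaly, whereas d = [2.1, 4.0] gives χ² ≈ 0.005 and is not.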

This embodiment clarifies how the deviation rate of the feature sample data is compared with the preset deviation rate threshold to determine whether the feature data characterizes a new anomaly: if the deviation rate is greater than the threshold, the feature data characterizes a new anomaly; if it is less than or equal to the threshold, it does not. Feature data characterizing a new anomaly is subsequently stored in the repository, so that the abnormal feature data in the repository is updated in real time.

Fig. 8 is a flowchart of storing an exception handling scheme in association with the feature data, based on the embodiment shown in Fig. 2. Building on S230, this step further includes S810 to S820, which are described in detail below:

S810: Determine an exception handling scheme corresponding to the new anomaly characterized by the feature data; the exception handling scheme is used to handle the anomaly so as to restore normal operation.

If the feature data characterizes a new anomaly, the preset repository is queried for an exception handling scheme matching the anomaly; if one exists, the scheme has already been preset in the repository. Alternatively, no matching scheme may exist, that is, the feature data characterizes an entirely new anomaly for which the preset repository holds no associated exception handling scheme.

S820: Store the exception handling scheme in the preset repository in association with the feature data.

If the preset repository contains no exception handling scheme associated with the new abnormal feature data, an exception handling scheme for that abnormal feature data is associated with it, and the scheme and the abnormal feature data are stored together in the preset repository.
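The association storage of S810 and S820 can be sketched with a small in-memory repository. The record layout, the matching predicate `matches` and the scheme factory `make_scheme` are all hypothetical: the text does not specify how a preset scheme is matched to an anomaly or how a scheme for a brand-new anomaly is produced.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class AnomalyRecord:
    features: List[float]  # abnormal feature data
    scheme: str            # associated exception handling scheme


@dataclass
class AnomalyRepository:
    records: List[AnomalyRecord] = field(default_factory=list)

    def find_scheme(self, features: Vector, matches: Callable[[Vector, Vector], bool]) -> Optional[str]:
        """S810 lookup: return the scheme of the first stored record that matches, if any."""
        for record in self.records:
            if matches(features, record.features):
                return record.scheme
        return None

    def store_new_anomaly(
        self,
        features: Vector,
        matches: Callable[[Vector, Vector], bool],
        make_scheme: Callable[[Vector], str],
    ) -> AnomalyRecord:
        """S810 + S820: reuse a matching preset scheme or create one, then store both together."""
        scheme = self.find_scheme(features, matches)
        if scheme is None:
            # Brand-new anomaly: the repository holds no associated scheme yet.
            scheme = make_scheme(features)
        record = AnomalyRecord(list(features), scheme)
        self.records.append(record)
        return record
```

Keeping the feature data and its scheme in one record means that a later lookup for a similar anomaly returns the scheme directly.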

Based on the above data processing method, this embodiment provides a method for automatically optimizing base station parameters.

First, measurement report data from 15,971 base station cells over 7 days is collected to train the self-encoder model, and the available features shown in Table 1 can be selected as the feature data in the measurement report data.

Table 1: available characteristic data in measurement report data


Fig. 9 is a schematic diagram of a base station automatically handling an abnormal situation based on the data processing method of the present application, according to an exemplary embodiment of the present application.

The server 200 receives the measurement report data sent by the terminal 100, extracts the feature data from the measurement report data, inputs it into the self-encoder model, and processes it as described in S210 to S240 above, which is not repeated here. This embodiment focuses on automating the handling of abnormal situations after base station parameters become abnormal. When new abnormal feature data is determined, a new exception handling scheme is associated with it and stored in the repository, and the scheme is sent to the base station 300 so that the base station 300 handles the abnormal situation automatically according to the scheme.
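The server-side flow described above can be outlined end to end in Python. This is a sketch under assumptions: the trained self-encoder is represented by a `reconstruct` callable, the reconstruction error is taken as the mean squared error (a common choice; the text does not fix the error measure), the deviation check is passed in as `is_new_anomaly`, and `make_scheme` and `send_to_base_station` are hypothetical hooks for creating an exception handling scheme and delivering it to the base station 300.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Record = Tuple[List[float], str]  # (abnormal feature data, exception handling scheme)


def reconstruction_error(x: Vector, reconstruct: Callable[[Vector], Vector]) -> float:
    """Mean squared error between the input and the self-encoder's reconstruction."""
    x_hat = reconstruct(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def process_report(
    features: Vector,
    reconstruct: Callable[[Vector], Vector],
    error_threshold: float,
    is_new_anomaly: Callable[[Vector, List[List[float]]], bool],
    repository: List[Record],
    make_scheme: Callable[[Vector], str],
    send_to_base_station: Callable[[str], None],
) -> Optional[str]:
    """Return the dispatched scheme if the report reveals a new anomaly, else None."""
    # A small reconstruction error means the report looks normal.
    if reconstruction_error(features, reconstruct) <= error_threshold:
        return None
    # Compare against the abnormal feature data already in the repository.
    known = [stored for stored, _ in repository]
    if not is_new_anomaly(features, known):
        return None  # matches a known anomaly; that case is not sketched here
    # Store the new anomaly together with its scheme, then dispatch the scheme.
    scheme = make_scheme(features)
    repository.append((list(features), scheme))
    send_to_base_station(scheme)
    return scheme
```

Passing the model and the deviation check in as callables keeps the control flow of S210 to S240 visible without committing to a particular model library.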

This embodiment further applies the data processing method to the practical scenario of a base station automatically handling abnormal situations: the abnormal feature data and exception handling schemes in the server's repository can be updated in real time, and the exception handling scheme is sent to the base station so that it handles the abnormal situation automatically.

Another aspect of the present application further provides a data processing apparatus, as shown in Fig. 10, which is a schematic structural diagram of the data processing apparatus according to an exemplary embodiment of the present application. The data processing apparatus comprises:

an obtaining module 1010, configured to input the feature data in the measurement report data into the trained self-encoder model to obtain a reconstruction error value corresponding to the feature data;

a comparison module 1030, configured to, if the reconstruction error value is greater than the preset error threshold, compare the feature data with the abnormal feature data contained in the preset repository to obtain a deviation rate of the feature data relative to the abnormal feature data;

a determining module 1050, configured to determine, based on the relationship between the deviation rate and a preset deviation threshold, the anomaly condition characterized by the feature data; and

an update module 1070, configured to store the feature data in the preset repository if the feature data characterizes a new anomaly.

In another embodiment, the data processing apparatus further comprises:

a model construction module, configured to construct an initial self-encoder model; preprocess the feature data extracted from the measurement report sample data to obtain feature sample data; input the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data; and, if the similarity coefficient of the feature sample data is smaller than the preset similarity threshold, train the initial self-encoder model with the feature sample data to obtain a trained self-encoder model.

In another embodiment, the model construction module includes:

a preprocessing unit, configured to cluster the feature data extracted from the measurement report sample data to obtain clustered feature data of a plurality of categories;

a classification unit, configured to perform binary classification on the clustered feature data of each of the plurality of categories to obtain the contribution degree of the clustered feature data of each category; and

a determining unit, configured to determine the clustered feature data whose contribution degree is greater than a preset contribution threshold as the feature sample data.

In another embodiment, the model construction module includes:

an anomaly parameter unit, configured to input the feature sample data into the initial self-encoder model to obtain the anomaly degree of the feature sample data and the anomaly distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model; and

a similarity coefficient calculation unit, configured to calculate the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance.

In another embodiment, the similarity coefficient calculation unit further includes:

an acquisition block, configured to acquire a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance;

an operation block, configured to multiply the anomaly degree by its corresponding weight value to obtain a first value, and multiply the anomaly distance by its corresponding weight value to obtain a second value; and

a summation block, configured to sum the first value and the second value to obtain the similarity coefficient of the feature sample data.

In another embodiment, the summation block is specifically configured to: acquire a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance; multiply the anomaly degree by its corresponding weight value to obtain a first value, and multiply the anomaly distance by its corresponding weight value to obtain a second value; and sum the first value and the second value to obtain the similarity coefficient of the feature sample data.

In another embodiment, the determining module 1050 is specifically configured to compare the deviation rate with the preset deviation threshold; if the deviation rate is greater than the preset deviation rate threshold, determine that the feature data characterizes a new anomaly; and if the deviation rate is less than or equal to the preset deviation rate threshold, determine that the feature data does not characterize a new anomaly.

In another embodiment, the update module 1070 is specifically configured to determine an exception handling scheme corresponding to the new anomaly characterized by the feature data, the exception handling scheme being used to handle the anomaly so as to restore normal operation; and to store the exception handling scheme in the preset repository in association with the feature data.

It should be noted that the data processing apparatus and the data processing method provided in the foregoing embodiments belong to the same concept; the specific manner in which each module and unit performs operations has been described in detail in the method embodiments and is not repeated here.

Another aspect of the present application also provides an electronic device, including: a controller; and a memory storing one or more programs which, when executed by the controller, implement the data processing method in the embodiments described above.

Referring to Fig. 11, which is a schematic diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application, according to an exemplary embodiment of the present application.

It should be noted that, the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in Fig. 11, the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processes, such as the methods in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The RAM 1103 also stores various programs and data required for system operation. The CPU 1101, ROM 1102 and RAM 1103 are connected to one another by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to the bus 1104.

The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. Removable media 1111, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, are mounted on the drive 1110 as needed, so that a computer program read from them can be installed into the storage section 1108 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1109 and/or installed from the removable media 1111. When the computer program is executed by the central processing unit (CPU) 1101, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device. A computer program carried on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, program segment or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.

Another aspect of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.

Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the data processing method provided in the above-described respective embodiments.

According to an aspect of the embodiments of the present application, a computer system is also provided, including a central processing unit (CPU) that can perform various appropriate actions and processes, such as the methods in the above embodiments, according to a program stored in a Read-Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). The RAM also stores various programs and data required for system operation. The CPU, ROM and RAM are connected to one another by a bus. An Input/Output (I/O) interface is also connected to the bus.

The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. Removable media such as magnetic disks, optical disks, magneto-optical disks and semiconductor memories are mounted on the drive as needed, so that a computer program read from them can be installed into the storage section as needed.

The foregoing is merely illustrative of the preferred embodiments of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make corresponding variations or modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be defined by the claims.

Claims

1. A method of data processing, comprising:

constructing an initial self-encoder model, and preprocessing feature data extracted from measurement report sample data to obtain feature sample data;

inputting the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data;

wherein inputting the feature sample data into the initial self-encoder model to obtain the similarity coefficient of the feature sample data comprises:

inputting the feature sample data into the initial self-encoder model to obtain an anomaly degree of the feature sample data and an anomaly distance between the feature sample data and standard abnormal feature data in the initial self-encoder model;

calculating the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance;

if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model with the feature sample data to obtain a trained self-encoder model, wherein feature sample data whose similarity coefficient is greater than or equal to the preset similarity threshold is regarded as having a potential anomaly;

inputting feature data in measurement report data into the trained self-encoder model to obtain a reconstruction error value corresponding to the feature data, the measurement report data reflecting network quality, user behavior habits and surrounding environment information;

if the reconstruction error value is greater than a preset error threshold, comparing the feature data with abnormal feature data contained in a preset repository to obtain a deviation rate of the feature data relative to the abnormal feature data;

comparing the deviation rate with a preset deviation rate threshold;

if the deviation rate is greater than the preset deviation rate threshold, determining that the feature data characterizes a new anomaly;

if the deviation rate is less than or equal to the preset deviation rate threshold, determining that the feature data does not characterize a new anomaly;

if the feature data characterizes a new anomaly, storing the feature data in the preset repository;

determining an exception handling scheme corresponding to the new anomaly characterized by the feature data, the exception handling scheme being used to handle the anomaly so as to restore normal operation; and

storing the exception handling scheme in the preset repository in association with the feature data.

2. The method of claim 1, wherein preprocessing the feature data extracted from the measurement report sample data to obtain feature sample data comprises:

clustering the feature data extracted from the measurement report sample data to obtain clustered feature data of a plurality of categories;

performing binary classification on the clustered feature data of each of the plurality of categories to obtain a contribution degree of the clustered feature data of each category; and

determining the clustered feature data whose contribution degree is greater than a preset contribution threshold as the feature sample data.

3. The method according to claim 1, wherein calculating the similarity coefficient of the feature sample data based on the anomaly degree and the anomaly distance comprises:

acquiring a weight value corresponding to the anomaly degree and a weight value corresponding to the anomaly distance;

performing multiplication operation on the anomaly degree and a weight value corresponding to the anomaly degree to obtain a first value, and performing multiplication operation on the anomaly distance and a weight value corresponding to the anomaly distance to obtain a second value;

and carrying out summation operation on the first value and the second value to obtain the similarity coefficient of the characteristic sample data.
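The computation in claim 3 is a weighted sum, which a short Python function makes explicit. The claim does not state how the two weight values are acquired, so this sketch simply takes them as arguments; the function and parameter names are illustrative only.

```python
def similarity_coefficient(abnormality_degree, abnormality_distance,
                           degree_weight, distance_weight):
    """Weighted sum of the abnormality degree and the abnormality distance."""
    first_value = abnormality_degree * degree_weight       # degree times its weight
    second_value = abnormality_distance * distance_weight  # distance times its weight
    return first_value + second_value
```

For example, `similarity_coefficient(0.5, 2.0, 0.5, 0.25)` returns 0.75, the sum of a first value of 0.25 and a second value of 0.5.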

4. A data processing apparatus, comprising:

a model construction module configured to construct an initial self-encoder model and to preprocess the characteristic data extracted from the measurement report sample data to obtain characteristic sample data;

an acquisition module configured to input the characteristic data in the measurement report data into the trained self-encoder model to obtain a reconstruction error value corresponding to the characteristic data, the measurement report data reflecting network quality, user behavior habits and surrounding environment information;

a comparison module configured to, if the reconstruction error value is greater than a preset error threshold, perform a deviation comparison between the characteristic data and the abnormal characteristic data contained in a preset storage library to obtain a deviation rate of the characteristic data relative to the abnormal characteristic data;

a determining module configured to compare the deviation rate with a preset deviation rate threshold and, if the deviation rate is greater than the preset deviation rate threshold, to determine that the characteristic data represents a new abnormality;

and an updating module configured to store the characteristic data in the preset storage library if the characteristic data represents a new abnormality.

5. An electronic device, comprising:

a controller;

a memory for storing one or more programs that, when executed by the controller, cause the controller to implement the method of any of claims 1 to 3.

6. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any of claims 1 to 3.