CN107943809B - Data quality monitoring method and device and big data computing platform - Google Patents

Info

Publication number
CN107943809B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610895875.1A
Other languages
Chinese (zh)
Other versions
CN107943809A (en
Inventor
解敏
陈欢
范茸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610895875.1A priority Critical patent/CN107943809B/en
Publication of CN107943809A publication Critical patent/CN107943809A/en
Application granted granted Critical
Publication of CN107943809B publication Critical patent/CN107943809B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

A data quality monitoring method, a device and a big data computing platform are provided, wherein the data quality monitoring device collects historical data related to parameters needing to be monitored, and establishes a prediction model of the parameters by taking the historical data as a sample; predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result; and acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result. The application realizes intelligent setting of the monitoring threshold. Compared with the manual setting of the threshold value, the method is more accurate, and the false alarm rate is reduced. Based on the prediction analysis method, the application also provides a user data quality monitoring method of the big data computing service and a corresponding big data computing platform.

Description

Data quality monitoring method and device and big data computing platform
Technical Field
The invention relates to data processing, in particular to a data quality monitoring method and device and a big data computing platform.
Background
Big data is becoming a strategic direction not only for the major internet companies; other industries are also beginning to explore it. Data quality problems in big data are far more severe than those in traditional databases. A big data service performs petabyte-scale computation every day, so monitoring data during the data output process is particularly important for guaranteeing data quality. If the quality of the data content does not meet the standard, data monitoring raises an alarm to notify the user, which prevents larger-scale pollution of downstream data.
A Data Quality Center (DQC) system can monitor data for big data computing services such as MaxCompute (formerly ODPS). Once a data quality monitoring threshold is set, the data produced by each day's business is checked against the threshold according to a data quality rule, and the system raises an alarm if the data falls outside the normal value range defined by the threshold. For example, if the relative error between the data produced on the current day and the data quality rule statistic (such as the seven-day average or the maximum) lies within the normal value range defined by the monitoring threshold (within plus or minus 10%), the data is considered normal; otherwise an alarm is raised and downstream data tasks are blocked. The alarm is then investigated: if a problem is found, the code is modified; otherwise the alarm was a false alarm.
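The fixed-threshold rule described above can be illustrated with a minimal sketch. The function name `fixed_threshold_check` and the sample values are hypothetical and are not taken from any DQC system; the sketch assumes the rule statistic is the seven-day average and the tolerance is plus or minus 10%.

```python
from statistics import mean


def fixed_threshold_check(history, today, tolerance=0.10):
    """Return True when today's value is within +/- tolerance of the mean of history.

    Hypothetical helper: the rule statistic is assumed to be the average of
    the supplied history (for example, the last seven days).
    """
    baseline = mean(history)
    if baseline == 0:
        # Relative error is undefined for a zero baseline; only 0 counts as normal.
        return today == 0
    relative_error = (today - baseline) / baseline
    return abs(relative_error) <= tolerance


last_seven_days = [100, 102, 98, 101, 99, 103, 97]   # illustrative values, mean is 100
print(fixed_threshold_check(last_seven_days, 105))   # 5% above the mean: normal
print(fixed_threshold_check(last_seven_days, 120))   # 20% above the mean: alarm
```

Because the tolerance is a constant chosen by hand, this rule treats every day the same way, which is the limitation discussed in the following paragraphs.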
In the related art, a threshold value for monitoring data quality is manually set based on judgment of manual experience. However, this threshold setting method has the following disadvantages:
First, the threshold depends too heavily on the experience of the person who sets it. Take financial transaction data, such as the total loan amount of a loan service: setter A may consider a fluctuation of about 5% to be reasonable, while setter B may consider 3% to be reasonable.
Second, the business data produced each day changes constantly, yet it is always checked against the same threshold, which may cause false alarms.
Third, if the upstream relationships of the data or the business change after the threshold is set, and the setter is unaware of the change and continues to verify against the original threshold, the threshold may become unreasonable and alarms will no longer be accurate.
When the monitoring threshold is set unreasonably, normal fluctuation (a rise or fall) of the business data can exceed the normal value range defined by the threshold, which causes frequent false alarms and increases the operation and maintenance workload.
At present, data quality monitoring for big data computing services targets general business data on the big data computing platform, such as a cloud computing platform database; it does not monitor the quality of the data of a specific served user. A user who wants to monitor the quality of their own data must design a monitoring program and download the data from the big data computing platform, which is very difficult for the user.
In addition, a Service-Level Agreement (SLA) is a contract between a network service provider and a customer that defines terms such as service type, quality of service and customer payment. At present, however, SLAs for big data computing do not use data quality alarms as one of the parameters for measuring service quality. The evaluation of service quality is therefore incomplete, which hinders monitoring of the service quality of big data computing services.
Disclosure of Invention
In view of this, the embodiments of the present invention provide the following solutions.
A method of data quality monitoring, comprising:
collecting historical data related to parameters needing to be monitored, and establishing a prediction model of the parameters by taking the historical data as a sample;
predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
and acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result.
A data quality monitoring apparatus comprising:
a model building module, configured to collect historical data related to parameters needing to be monitored and to establish a prediction model of the parameters by taking the historical data as a sample;
the threshold value determining module is used for predicting the current data of the parameters according to the prediction model and determining the monitoring threshold value of the parameters according to the prediction result;
and the alarm processing module is used for acquiring the current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold value, and carrying out alarm processing according to a comparison result.
A data quality monitoring apparatus comprising a processor and a memory, wherein:
the memory is configured to store program code;
the processor is configured to: reading the program code and performing the following: collecting historical data related to parameters needing to be monitored, and establishing a prediction model of the parameters by taking the historical data as a sample; predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result; and acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result.
The data quality monitoring method and the data quality monitoring device predict the normal fluctuation range of the parameters in a data modeling mode, and realize intelligent setting of the monitoring threshold. Compared with the manual setting of the threshold value, the method is more accurate, reduces the false alarm rate and lightens the workload of operation and maintenance.
In view of this, the embodiment of the present invention further provides the following solutions.
A user data quality monitoring method for big data computing service comprises the following steps:
the big data computing platform collects historical data related to parameters needing to be monitored from stored user data, and predicts current data of the parameters according to the historical data to obtain a prediction result;
and the big data computing platform acquires the current data of the parameters, compares the current data with the prediction result, and performs alarm processing on the quality of the user data according to the comparison result.
A big data computing platform comprising a user data quality monitoring module, the user data quality monitoring module comprising:
the prediction unit is used for acquiring historical data related to the parameters needing to be monitored from the stored user data, and predicting the current data of the parameters according to the historical data to obtain a prediction result;
and the alarm unit is used for acquiring the current data of the parameters, comparing the current data with the prediction result and carrying out alarm processing on the quality of the user data according to the comparison result.
A big data computing platform comprising a processor and a memory, wherein:
the memory is configured to store program code;
the processor is configured to: reading the program code and performing the following: acquiring historical data related to parameters needing to be monitored from stored user data, and predicting the current data of the parameters according to the historical data to obtain a prediction result; and acquiring current data of the parameters, comparing the current data with the prediction result, and performing alarm processing on the user data quality according to the comparison result.
The user data quality monitoring method and the big data computing platform provide data quality monitoring service for data of specific users, expand the field of big data computing service, and are beneficial to optimization of big data computing service.
Drawings
FIG. 1 is a flow chart of a data quality monitoring method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data quality monitoring apparatus according to an embodiment of the present invention;
FIG. 3 is a flow chart of a user data quality monitoring method according to the second embodiment of the present invention;
FIG. 4 is a block diagram of a big data computing platform, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the variation of an incremental parameter of the present invention;
FIG. 6 is a graph illustrating variation of an exemplary periodically varying parameter of the present invention;
FIG. 7 is a schematic diagram of the cloud computing architecture of the third example of the present invention;
FIG. 8 is a flow chart of the user data quality monitoring method of the third example of the present invention;
FIG. 9 is a signaling interaction diagram between the user device and the big data computing platform in the third example of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Example one
This embodiment uses a statistical prediction method: the current data of a parameter is predicted from the parameter's historical data, the actual current data is collected and compared with the prediction result, and alarm processing is performed according to the comparison result. This embodiment concerns data quality monitoring of data in a data warehouse deployed in a computing cluster that provides big data computing services. The invention is not limited to this, however, and can also be used for data quality monitoring in other systems and on other nodes.
As shown in fig. 1, the data quality monitoring method of the present embodiment includes:
Step 110, collecting historical data related to parameters needing to be monitored, and establishing a prediction model of the parameters by taking the historical data as a sample;
Statistical prediction is a branch of prediction methodology concerned with using statistical methods to make quantitative inferences about how things will develop. A data model is one means of prediction. There are many quantitative prediction methods, including trend extrapolation, time series prediction and regression prediction. This embodiment adopts regression prediction: regression analysis is performed with the historical data as a sample, and a regression model with the parameter as the dependent variable is established. Other predictive analysis methods may also be used.
Regression analysis is a body of theory and computational methods for studying how one variable (the dependent, or explained, variable) depends on another variable (the independent, or explanatory, variable). Starting from a set of sample data, regression analysis determines the mathematical relationship between the variables and performs statistical tests on how far that relationship can be trusted. With the fitted relationship, the value of the dependent variable can be predicted from the value of the independent variable, the interval in which the dependent variable lies can be determined by interval analysis, and the accuracy of the prediction can be stated.
By the number of independent variables, regression models are divided into univariate and multivariate regression models; by the form of the dependency between the variables, they are divided into linear regression models, nonlinear regression models and so on. In this embodiment, the variables of the regression model correspond to parameters of the business. If the parameter depends on other parameters, the regression model preferably includes those other parameters among its independent variables. Changes in the other parameters then affect the prediction for the monitored parameter; in other words, the regression model can sense changes in upstream relationships or in the business. If the parameter does not depend on other parameters, the regression model is preferably an autoregressive model, which predicts the current-period value of the parameter from the values of the same parameter in preceding periods. Because the historical data on which the regression model is built is updated continuously as new parameter data arrives, the regression model (for example, its regression coefficients) is also updated, and the predicted value of the parameter changes accordingly.
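The two modeling cases described above can be sketched with a single ordinary least squares helper. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a linear model with one independent variable, and the names `fit_line`, `predict_from_upstream` and `predict_autoregressive` are hypothetical. For the autoregressive case it assumes a first-order model, in which each value is regressed on the value of the preceding period.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_from_upstream(upstream_history, param_history, upstream_today):
    """Parameter depends on an upstream parameter: use it as the independent variable."""
    a, b = fit_line(upstream_history, param_history)
    return a + b * upstream_today


def predict_autoregressive(param_history):
    """No dependency on other parameters: regress each value on the previous one.

    Needs at least three historical values so that two (previous, next) pairs exist.
    """
    a, b = fit_line(param_history[:-1], param_history[1:])
    return a + b * param_history[-1]
```

Refitting on every update period, with the newest data appended to the history, reproduces the behavior described above in which the regression coefficients and the predicted value change as the data changes.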
In this embodiment, the parameter to be monitored includes at least one of the following parameters:
the number of records in the data table;
the file size of the data table;
a field in the data table;
statistics of fields in the data table.
Where:
The number of records in the data table: for example, in a user access table each record represents one user access, so the number of records represents the number of user accesses received by the big data platform. As another example, the number of records in a registered user table represents the number of registered users. The data table may be an originally generated table (which may be called a source table) or a summary table generated from a source table, such as a weekly, monthly or annual report table. The source table is an upstream table of these reports, and the parameters in the reports typically depend on the related parameters in their upstream tables.
The file size of the data table is the data volume of the file, such as 1 MB or 2 MB. The file size can reflect, to some extent, whether the data in the table is abnormal. For example, if the number of records in the user access table is normal but every value is 0 (that is, the data is abnormal), the file will be much smaller than normal, so the anomaly can be detected by monitoring the file size.
The fields in the data table may be user-related parameters such as user ID, name and address, service-related parameters such as access time, number of accesses, data traffic and cost, or parameters of any other type.
The statistical value of a field in the data table is a statistic computed over that field, for example the number or proportion of records whose user ID is null, or the number or proportion of records whose user ID is duplicated. If there are too many records with a null or duplicated user ID, or their proportion is too large, the data is abnormal.
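As an illustration of the monitored parameters listed above, the following sketch computes the record count and the null and duplicate statistics of one field. It assumes the table is available in memory as a list of dictionaries; the function name `table_metrics` and the field name `user_id` are hypothetical.

```python
from collections import Counter


def table_metrics(rows, field="user_id"):
    """Compute record count plus null and duplicate statistics for one field."""
    total = len(rows)
    values = [row.get(field) for row in rows]
    null_count = sum(1 for v in values if v is None)
    counts = Counter(v for v in values if v is not None)
    # A record counts as duplicated when its non-null value appears more than once.
    duplicate_count = sum(c for c in counts.values() if c > 1)
    return {
        "record_count": total,
        "null_count": null_count,
        "null_ratio": null_count / total if total else 0.0,
        "duplicate_count": duplicate_count,
        "duplicate_ratio": duplicate_count / total if total else 0.0,
    }


rows = [{"user_id": 1}, {"user_id": 1}, {"user_id": None}, {"user_id": 2}]
print(table_metrics(rows))
```

Each value in the returned dictionary can serve as one monitored parameter, with its own history and its own prediction model.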
Step 120, predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
The current data is the most recently updated data. The update period of the parameter may be one week, one day, one hour and so on; if it is one day, the current data is the data with which the parameter was updated on the current day, and so on.
The regression model yields a predicted value for the parameter. This value is an approximation and does not by itself express its error range, which is usually given as an interval. For a given confidence level, a confidence interval can be calculated from the regression model; for example, at a confidence level of 95%, the calculated interval is one within which the parameter falls with 95% probability. In this embodiment, a confidence interval for the current data of the parameter is calculated from the regression model, and the boundaries of the confidence interval are taken as the monitoring thresholds of the parameter.
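One simple way to turn a predicted value into interval boundaries is sketched below. It is a deliberately simplified approximation and not necessarily the interval calculation intended by the source: it assumes normally distributed residuals, assumes a model with two fitted coefficients, and ignores the uncertainty of the coefficient estimates. The function name `interval_thresholds` is hypothetical.

```python
from statistics import NormalDist


def interval_thresholds(actuals, fitted, prediction, confidence=0.95):
    """Return (lower, upper) monitoring thresholds around a predicted value.

    actuals and fitted are the historical values and the model's fitted
    values for the same periods; their differences are the residuals.
    """
    residuals = [a - f for a, f in zip(actuals, fitted)]
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    # Residual standard deviation with n - 2 degrees of freedom
    # (assumption: intercept and one slope were estimated).
    s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    # Two-sided normal quantile, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return prediction - z * s, prediction + z * s
```

A higher confidence level widens the interval, which lowers the false alarm rate at the cost of missing smaller anomalies.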
In another embodiment, the predicted value of the current data of the parameter (the approximation above) is calculated from the regression model, and the monitoring threshold is then determined as the product of the predicted value and a fluctuation coefficient. The fluctuation coefficient is determined from the range of fluctuation between the historical data and the historical predicted values calculated with the regression model. For example, suppose the parameter is updated daily, the parameter has been predicted for each day of the past month, and the ratio between the predicted value and the actual value has been calculated for each day. If, while the data was normal, the ratio was always below 115% and above 80%, then 115% can be taken as the fluctuation coefficient K1 for the upper limit of the parameter value and 80% as the fluctuation coefficient K2 for the lower limit. If the predicted value of the current data is X, then X · K1 is the monitoring threshold for the upper limit and X · K2 is the monitoring threshold for the lower limit. Because the fluctuation coefficients are obtained by statistical analysis, they can sense changes in the data.
It should be noted that the monitoring threshold may include both the monitoring threshold corresponding to the upper limit of the parameter value and the monitoring threshold corresponding to the lower limit of the parameter value, or may include only one of them.
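The fluctuation-coefficient variant can be sketched as follows. The orientation of the ratio is an assumption: the sketch divides the actual value by the predicted value, so that multiplying a new prediction X by the coefficients bounds the actual value. The function names are hypothetical, and the history is assumed to contain only periods in which the data was normal.

```python
def fluctuation_coefficients(actuals, predictions):
    """Return (K2, K1): the smallest and largest actual/predicted ratio observed."""
    ratios = [a / p for a, p in zip(actuals, predictions) if p != 0]
    if not ratios:
        raise ValueError("no usable (actual, prediction) pairs")
    return min(ratios), max(ratios)


def thresholds_from_prediction(prediction, k_lower, k_upper):
    """Return (X * K2, X * K1) as the lower and upper monitoring thresholds."""
    return prediction * k_lower, prediction * k_upper


# Illustrative history in which the ratio stayed between 80% and 115%.
k2, k1 = fluctuation_coefficients([80, 100, 115], [100, 100, 100])
print(thresholds_from_prediction(200, k2, k1))
```

To monitor only one side, as the note above allows, a caller can simply ignore one element of the returned pair.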
Step 130, acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result.
During alarm processing, a data quality alarm is raised if the current data falls outside the normal value range defined by the monitoring threshold; otherwise no alarm is raised. In the example above, where the monitoring thresholds are X · K1 and X · K2, the normal value range is [X · K2, X · K1]: if the current data of the parameter falls within [X · K2, X · K1], it is judged normal; otherwise it is judged abnormal.
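The comparison and alarm step can be sketched as below. The names `within_normal_range` and `alarm_processing` and the `notify` callback are hypothetical; passing `None` for a bound disables it, which covers the case where only an upper or only a lower monitoring threshold is configured.

```python
def within_normal_range(current, lower=None, upper=None):
    """Return True when current lies inside the range; a None bound is not checked."""
    if lower is not None and current < lower:
        return False
    if upper is not None and current > upper:
        return False
    return True


def alarm_processing(name, current, lower=None, upper=None, notify=print):
    """Raise a data quality alarm through notify; return True when an alarm was raised."""
    if within_normal_range(current, lower, upper):
        return False
    notify(f"data quality alarm: {name}={current} is outside [{lower}, {upper}]")
    return True
```

For instance, with thresholds of 160 and 230, a current value of 250 triggers one notification, while a value of 200 triggers none.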
The present embodiment further provides a data quality monitoring apparatus, as shown in fig. 2, including:
a model building module 10, configured to collect historical data related to parameters needing to be monitored and to establish a prediction model of the parameters by taking the historical data as a sample;
a threshold determining module 20, configured to predict current data of the parameter according to the prediction model, and determine a monitoring threshold of the parameter according to a prediction result;
and the alarm processing module 30 is configured to acquire current data of the parameter, compare the current data with a normal value range defined by the monitoring threshold, and perform alarm processing according to a comparison result.
Optionally,
the parameters for which the model building module collects historical data include at least one of the following: the number of records in the data table, the file size of the data table, a field in the data table, and a statistical value of a field in the data table.
Optionally,
the model building module establishing a prediction model of the parameter by taking the historical data as a sample includes: performing regression analysis with the historical data as a sample, and establishing a regression model with the parameter as the dependent variable. In one example, if the parameter depends on other parameters, the regression model is one whose independent variables include the other parameters; if the parameter does not depend on other parameters, the regression model is an autoregressive model.
Optionally,
the threshold determining module predicting the current data of the parameter according to the prediction model and determining the monitoring threshold of the parameter according to the prediction result includes:
calculating a confidence interval of the current data of the parameter according to the regression model, and determining the boundary of the confidence interval as the monitoring threshold of the parameter; or
calculating a predicted value of the current data of the parameter according to the regression model, and determining the monitoring threshold of the parameter according to the product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to the fluctuation range between the historical data and the historical predicted values calculated based on the regression model.
The present embodiment further provides a data quality monitoring apparatus, including a processor and a memory, wherein:
the memory is configured to store program code;
the processor is configured to: reading the program code and performing the following: collecting historical data related to parameters needing to be monitored, and establishing a prediction model of the parameters by taking the historical data as a sample; predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result; and acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result.
Optionally,
the parameters for which the processor collects historical data include at least one of the following: the number of records in the data table, the file size of the data table, a field in the data table, and a statistical value of a field in the data table.
Optionally,
the processor establishing a prediction model of the parameter by taking the historical data as a sample includes: performing regression analysis with the historical data as a sample, and establishing a regression model with the parameter as the dependent variable. In one example, if the parameter depends on other parameters, the regression model is one whose independent variables include the other parameters; if the parameter does not depend on other parameters, the regression model is an autoregressive model.
Optionally,
the processor predicting the current data of the parameter according to the prediction model and determining the monitoring threshold of the parameter according to the prediction result includes:
calculating a confidence interval of the current data of the parameter according to the regression model, and determining the boundary of the confidence interval as the monitoring threshold of the parameter; or
calculating a predicted value of the current data of the parameter according to the regression model, and determining the monitoring threshold of the parameter according to the product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to the fluctuation range between the historical data and the historical predicted values calculated based on the regression model.
The data quality monitoring method and the data quality monitoring device predict the normal fluctuation range of the parameters in a data modeling mode, and realize intelligent setting of the monitoring threshold. Compared with the manual setting of the threshold value, the method is more accurate, reduces the false alarm rate and lightens the workload of operation and maintenance.
Example two
In order to solve the problem that the existing big data computing service cannot provide data quality monitoring for a user, the embodiment provides a user data quality monitoring method for the big data computing service, as shown in fig. 3, where the method includes:
Step 210, the big data computing platform collects historical data related to parameters needing to be monitored from stored user data, and predicts current data of the parameters according to the historical data to obtain a prediction result;
The big data computing platform may be a cloud computing platform or the like, and the user data may be the data of an individual user, an enterprise user and so on. The data may be of various types, such as data generated when the user performs business processing on the big data computing platform, log data, or data generated by accessing the big data computing platform.
The user data alarm of the big data computing service may be offered as a value-added service that users subscribe to and that is provided only to users who have applied for it, or it may be provided to all users.
Step 220, the big data computing platform collects the current data of the parameters, compares the current data with the prediction result, and performs alarm processing on the quality of the user data according to the comparison result.
In this embodiment, the method of the first embodiment may be used to predict the current data of the parameter from the historical data: a prediction model of the parameter is established with the historical data as a sample, the current data of the parameter is predicted according to the prediction model, and the monitoring threshold of the parameter is determined according to the prediction result. During alarm processing of user data quality, if the current data falls outside the normal value range defined by the monitoring threshold, a data quality alarm is sent to the corresponding user; otherwise no alarm is sent.
In this embodiment, the parameter to be monitored includes at least one of the following parameters: the number of records of the data table, the file size of the data table, the fields in the data table and the statistics of the fields in the data table. Any other parameter is also possible.
In the modeling of this embodiment, as in the first embodiment, regression analysis may be performed using the historical data as a sample, and a regression model using the parameter as a dependent variable may be established. Similarly, if the parameter has a dependency relationship with other parameters, the regression model may adopt a regression model in which the independent variables include the other parameters; the regression model may employ an autoregressive model if the parameters are not dependent on other parameters.
In this embodiment, when determining the monitoring threshold, a confidence interval of the current data of the parameter may be calculated according to the regression model, and the boundary of the confidence interval is determined as the monitoring threshold of the parameter. Alternatively, a predicted value of the current data of the parameter may be calculated from the regression model, and the monitoring threshold of the parameter is determined from the product of the predicted value and a fluctuation coefficient, where the fluctuation coefficient is determined from the fluctuation range between the historical data and the historical predicted values calculated based on the regression model.
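The second way of setting the threshold can be sketched as follows. This is only an illustrative sketch: the embodiment does not fix how the fluctuation coefficient is derived, so the sketch assumes it is the largest relative deviation between historical values and their historical predictions, and all function names and sample numbers are hypothetical.

```python
def fluctuation_coefficient(actuals, predictions):
    """Assumed definition: largest relative deviation of history from its predictions."""
    return max(abs(a - p) / abs(p) for a, p in zip(actuals, predictions))


def monitoring_thresholds(predicted, coefficient):
    """Lower and upper thresholds from the product of predicted value and coefficient."""
    delta = abs(predicted) * coefficient
    return predicted - delta, predicted + delta


def needs_alarm(current, lower, upper):
    """An alarm is raised when the current data leaves the normal value range."""
    return not (lower <= current <= upper)


# Hypothetical history of a record count and the values the model predicted for it.
history_actual = [1000, 1040, 1100, 1130]
history_predicted = [1010, 1050, 1080, 1150]

coefficient = fluctuation_coefficient(history_actual, history_predicted)
lower, upper = monitoring_thresholds(1200, coefficient)  # 1200 is today's predicted value
print(needs_alarm(1190, lower, upper), needs_alarm(1300, lower, upper))
```

With these sample numbers the normal range is roughly 1178 to 1222, so a current value of 1190 passes and 1300 triggers an alarm.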
In this embodiment, the method further includes: the big data computing platform counts the alarm times generated by monitoring the user data quality in unit time, and sends the statistical result to a corresponding user as the service performance parameter agreed in the service level agreement SLA related to the big data computing service.
Other features of the data quality monitoring method in the first embodiment may also be used in this embodiment.
The embodiment further provides a big data computing platform, which includes a user data quality monitoring module, as shown in fig. 4, where the user data quality monitoring module includes:
the prediction unit 50 is used for acquiring historical data related to parameters needing to be monitored from stored user data, and predicting the current data of the parameters according to the historical data to obtain a prediction result;
and the alarm unit 60 is used for acquiring the current data of the parameters, comparing the current data with the prediction result, and carrying out alarm processing on the quality of the user data according to the comparison result.
Optionally,
the predicting unit predicts the current data of the parameters according to the historical data to obtain a prediction result, and the predicting unit comprises: establishing a prediction model of the parameters by taking the historical data as a sample, predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
the alarm unit collects the current data of the parameters, compares the current data with the prediction result, and carries out alarm processing of user data quality according to the comparison result, and the alarm processing comprises the following steps: and acquiring current data of the parameters and comparing the current data with the monitoring threshold, and if the current data exceeds a normal value range defined by the monitoring threshold, sending a data quality alarm to a corresponding user.
Optionally,
the predicting unit predicts the current data of the parameters according to the historical data to obtain a prediction result, and the predicting unit comprises: performing regression analysis by taking the historical data as a sample, and establishing a regression model by taking the parameter as a dependent variable; calculating a confidence interval of the current data of the parameter according to the regression model, and determining the boundary of the confidence interval as a monitoring threshold value of the parameter, or calculating a predicted value of the current data of the parameter according to the regression model, and determining the monitoring threshold value of the parameter according to the product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to a fluctuation range between the historical data and the historical predicted value calculated based on the regression model.
Optionally,
the parameters to be monitored comprise at least one of the following parameters: the number of records of the data table, the file size of the data table, the fields in the data table and the statistics of the fields in the data table. Any other parameter is also possible.
Optionally,
the user data quality monitoring module further comprises: and the counting module is used for counting the alarm times generated by monitoring the user data quality in unit time, and sending the counting result to a corresponding user as a service performance parameter agreed in a service level agreement SLA related to the big data computing service.
The embodiment also provides a big data computing platform, which comprises a processor and a memory, wherein:
the memory, configured to: saving the program code;
the processor is configured to: reading the program code and performing the following: acquiring historical data related to parameters needing to be monitored from stored user data, and predicting the current data of the parameters according to the historical data to obtain a prediction result; and acquiring current data of the parameters, comparing the current data with the prediction result, and performing alarm processing on the user data quality according to the comparison result.
Optionally,
the processor predicts the current data of the parameter according to the historical data, and comprises the following steps: establishing a prediction model of the parameters by taking the historical data as a sample, predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
the processor collects the current data of the parameters, compares the current data with the prediction result, and carries out alarm processing on the quality of the user data according to the comparison result, wherein the alarm processing comprises the following steps: and acquiring current data of the parameters and comparing the current data with the monitoring threshold, and if the current data exceeds a normal value range defined by the monitoring threshold, sending a data quality alarm to a corresponding user.
Optionally,
the processor predicts the current data of the parameter according to the historical data, and comprises the following steps: performing regression analysis by taking the historical data as a sample, and establishing a regression model by taking the parameter as a dependent variable; and calculating a confidence interval of the current data of the parameter according to the regression model, determining the boundary of the confidence interval as the monitoring threshold of the parameter, or calculating a predicted value of the current data of the parameter according to the regression model, and determining the monitoring threshold of the parameter according to the product of the predicted value and a fluctuation coefficient.
Optionally,
the processor further performs the following: and counting the alarm times generated by monitoring the user data quality in unit time, and sending the counting result to a corresponding user as a service performance parameter agreed in a service level agreement SLA related to the big data computing service.
The user data quality monitoring method and the big data computing platform of this embodiment provide a data quality monitoring service for the data of a specific user, which extends the scope of the big data computing service and helps to optimize it. In addition, in view of the characteristics of the big data computing service, the number of data anomalies occurring per unit time is used as a new parameter for evaluating service quality in the SLA (Service-Level Agreement) related to the big data computing service, which facilitates execution of the SLA and improves the service quality evaluation system of the big data computing service.
The following description is given with reference to several examples of specific applications.
Example one
The monitored data in this example is data in an offline data warehouse in the cloud computing system; the data in the offline data warehouse is imported periodically from front-end databases such as a MySQL database or an Oracle business database. The imported data exists in the form of data tables, which may be referred to as source data or source tables. Typically, summary tables are generated from the source data. The monitored data may be data related to a source table or data related to a summary table generated from a source table.
In data quality monitoring, this example can select a suitable prediction model according to the characteristics of the data table. For example, when the number of records in the registered user table is monitored, the number of records only increases and never decreases, because a user's record is not deleted even if the user deregisters after registering; only its state is marked as deregistered. Fig. 5 is a schematic diagram of the change in this parameter. The registered user table is a source table imported from another system and has no dependency on other tables in the offline data warehouse, so an autoregressive (AR) model can be selected as the prediction model; that is, the number of records on the current day is predicted from the historical number of records of the table. When the characteristics of a table are not easily classified, a graph can be drawn from the historical data of the parameter and a suitable prediction model selected by a method such as graph analysis.
For the autoregressive model, the historical data to be collected is the historical data of the monitored parameter itself. The quantity of historical data collected can be determined by weighing factors such as precision and amount of calculation: all the historical data can be collected as sample data, or only a fixed quantity or a fixed proportion of it. In this example, historical data is collected for the number of records of each data table to be monitored and an autoregressive model is established for each, i.e., batch modeling. This example collects the historical data and rebuilds the model once per update period of the parameter, to keep the prediction model effective.
In the process of collecting historical data, data can be firstly cleaned, and abnormal data or burst data can be removed. After sufficient historical data is collected from the database, the regression coefficients in the regression model can be determined by methods in regression analysis, such as the least squares method, to obtain the mathematical expression of the regression model.
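The cleaning step can be sketched as follows. The example does not name a cleaning method, so this sketch assumes a median-absolute-deviation filter applied to daily increments of a record count; the function name, the factor `k` and the sample values are all hypothetical.

```python
from statistics import median


def remove_outliers(values, k=5.0):
    """Drop values farther from the median than k times the median absolute deviation."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against, so keep everything
    return [v for v in values if abs(v - centre) <= k * mad]


# Hypothetical daily increments of a record count; 5000 is a burst to be removed.
daily_increments = [100, 102, 101, 5000, 103, 99]
cleaned = remove_outliers(daily_increments)
print(cleaned)
```

Filtering increments rather than raw counts is a design choice of the sketch: a steadily growing count has no stable median, while its day-to-day increments usually do.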
The mathematical expression of the autoregressive model established in this example can be expressed as:
rx=a0+a1*r1+a2*r2+……+an*rn
where rx is the predicted current data of the parameter, r1, r2, ……, rn are the historical data of the parameter, a0, a1, ……, an are regression coefficients, and n is the number of independent variables. When n is 1, the corresponding regression model is a unary linear autoregressive model; when n is greater than or equal to 2, the corresponding regression model is a multiple linear autoregressive model.
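A least-squares fit of an autoregressive model of this form can be sketched in plain Python as follows. The helper names (`solve`, `fit_ar`, `predict_next`) and the sample series are hypothetical, and the sketch assumes the normal equations are non-singular, which does not hold for every series and order.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ar(series, order):
    """Least-squares estimate of [a0, a1, ..., an]; a1 weights the most recent value."""
    rows = [[1.0] + [series[t - i] for i in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = [series[t] for t in range(order, len(series))]
    k = order + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict_next(series, coeffs):
    """Evaluate rx = a0 + a1*r1 + ... + an*rn on the most recent values."""
    order = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[i] * series[-i] for i in range(1, order + 1))


# Hypothetical daily record counts that grow by 20 per day.
series = [100, 120, 140, 160, 180, 200]
coeffs = fit_ar(series, order=1)
print(coeffs, predict_next(series, coeffs))
```

For this series the fit recovers a0 close to 20 and a1 close to 1, so the predicted next value is close to 220.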
In another example, a regression model whose independent variables include other parameters is used, and its expression can be written as:
Y=b0+b1x1+…+bkxk+e
where Y is the dependent variable, x1, x2, …, xk are the independent variables, the independent variables and the dependent variable are in a linear relation, b0 is a constant term, b1, b2, …, bk are regression coefficients, and e is a random variable (the error term).
After the mathematical expression is obtained, the confidence interval of the predicted value of the dependent variable can be calculated by the interval estimation method of a unary or multiple linear regression model, and the confidence level can be adjusted according to actual needs and prediction results. The confidence level is the probability that the actual value falls within the interval; if the confidence level is set to 95%, the current data is expected to fall within the interval with a probability of about 95%. The boundaries of the interval can therefore be used as the monitoring thresholds of the parameter, and the low-probability case in which the current data falls outside the interval can be treated as an anomaly that triggers an alarm. Another method for setting the monitoring threshold is to calculate a predicted value (point value) of the parameter and then determine the monitoring threshold of the parameter from the product of the predicted value and a fluctuation coefficient, where the fluctuation coefficient is determined from the fluctuation range between the historical data and the historical predicted values calculated based on the regression model. Corresponding examples have been given above and are not repeated here.
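The interval-based threshold can be sketched for the unary linear case as follows. This is a sketch under stated assumptions: it uses the standard interval for a single new observation of a simple linear regression, and it replaces the Student t quantile with a normal quantile because the Python standard library has no t distribution, so the interval is slightly too narrow for small samples. Function names and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; also returns the mean of x and Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, x_bar, sxx


def interval_thresholds(xs, ys, x_new, confidence=0.95):
    """Lower and upper monitoring thresholds for the value observed at x_new."""
    b0, b1, x_bar, sxx = fit_line(xs, ys)
    n = len(xs)
    residual_ss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(residual_ss / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation (assumption)
    half_width = z * s * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    centre = b0 + b1 * x_new
    return centre - half_width, centre + half_width


# Hypothetical history: x is the day index, y is the monitored record count.
days = [1, 2, 3, 4, 5, 6]
counts = [102, 198, 305, 395, 510, 600]
lower, upper = interval_thresholds(days, counts, x_new=7)
print(lower, upper)
```

A current value between the two thresholds is treated as normal; anything outside would raise a data quality alarm. Raising the confidence level widens the interval and reduces the number of alarms.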
By monitoring the data tables in a plurality of projects by adopting the method in practice, the generated data quality alarm not only covers the discovered data quality problems, but also discovers a plurality of newly added data quality problems in the set testing time. The number of false positives is greatly reduced compared with the case of manually setting the threshold, which shows that the method is effective and more reasonable compared with the method of manually setting the monitoring threshold.
Example two
This example is similar to the basic case of the previous example, and only the differences will be described further below.
The number of records of the data table targeted in this example is periodically changed, the period being one month, as shown in fig. 6. It is assumed that the data table is the source table.
For the characteristics of the table and the number of records, when the prediction model is selected, an autoregressive moving average model (ARMA model) may be selected, and the AR model may also be regarded as a special ARMA model. The expression of the ARMA model is as follows:
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p} + ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + … + α_q ε_{t-q}
where Y_t is the current data of the predicted parameter, Y_{t-1}, Y_{t-2}, ……, Y_{t-p} are the historical data of the parameter, β and α are regression coefficients, and ε_t, ε_{t-1}, ……, ε_{t-q} is a random variable sequence.
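A one-step forecast from an ARMA model of this form adds the weighted recent values and the weighted recent random terms to the constant term, taking the expected value of the current random term as zero. The sketch below assumes the coefficients have already been estimated (estimation is not shown); the function name, coefficient values and sample data are hypothetical.

```python
def arma_forecast(history, residuals, beta, alpha):
    """One-step ARMA forecast.

    beta  = [beta_0, beta_1, ..., beta_p]; history[-1] is the most recent value.
    alpha = [alpha_1, ..., alpha_q]; residuals[-1] is the most recent random term.
    The current random term is replaced by its expected value, zero.
    """
    ar_part = sum(b * history[-i] for i, b in enumerate(beta[1:], start=1))
    ma_part = sum(a * residuals[-j] for j, a in enumerate(alpha, start=1))
    return beta[0] + ar_part + ma_part


# Hypothetical ARMA(2, 2) coefficients and recent observations.
history = [100, 110, 120]
residuals = [2.0, -1.0]
forecast = arma_forecast(history, residuals, beta=[5.0, 0.6, 0.4], alpha=[0.5, 0.2])
print(forecast)
```

The forecast can then be turned into a monitoring threshold in the same way as for the AR model.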
For a table whose period is one month, the period length is not constant: some months have 30 days and some have 31 days. The AR model can therefore be used to predict the missing values so that the data of every period is padded to 31 days, which yields an ARMA model with a stable period.
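Padding each period to 31 days can be sketched as follows. The predictor passed in stands for the fitted AR model; the `extrapolate` function used here is only a stand-in with fixed illustrative coefficients, and all names and sample values are hypothetical.

```python
def pad_period(values, predict_next, length=31):
    """Append predicted values until the period holds `length` data points."""
    padded = list(values)
    while len(padded) < length:
        padded.append(predict_next(padded))
    return padded


def extrapolate(series):
    """Stand-in predictor: AR(2) form with fixed coefficients a0=0, a1=2, a2=-1."""
    return 2 * series[-1] - series[-2]


# Hypothetical 30-day month whose daily value grows by one each day.
thirty_day_month = list(range(1, 31))
padded = pad_period(thirty_day_month, extrapolate)
print(len(padded), padded[-1])
```

Months that already contain 31 values are returned unchanged, so every period ends up with the same length.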
After the ARMA model is established, the monitoring threshold of the corresponding parameter can be set based on the confidence interval or the predicted value calculated from the model, and alarm processing is then performed according to the result of comparing the current data of the parameter with the monitoring threshold. The details are not repeated here.
Example three
The present example takes cloud computing as an example to illustrate how to provide a service for data quality monitoring for users in a big data computing service.
FIG. 7 is a schematic diagram illustrating an architecture of cloud computing. The system comprises an application layer, a platform layer, a resource layer, a user access layer and a management layer. The resource layer refers to cloud computing services on an infrastructure layer, and the services can provide virtualized resources so as to hide complexity of physical resources. The platform layer provides encapsulation of resource layer services for users, so that the users can build own applications. The application layer provides software services. The user access layer is various support services required by the cloud computing service which is convenient for users to use, and a corresponding access interface is required to be provided for the cloud computing service of each layer. The management layer provides management functions for all layers of cloud computing services.
In this example, the service monitoring function of the management layer needs to be extended by adding a data quality monitoring function for user data. The data quality monitoring function takes the user data generated by the database service provided by the platform layer as the monitoring object and collects the user data from the corresponding storage in the resource layer. In this example, data quality monitoring for user data is a service that must be customized by the user (this is optional), so the software of the application layer needs to be upgraded to provide the customization function. At the user access layer, an option for user data quality monitoring needs to be added to the service directory, and a corresponding customization function needs to be added to the subscription management interface. Optionally, the service directory may also include options for the parameters to be monitored, provided according to the characteristics of each user's data. The data quality monitoring result of the user data reflects the quality of the big data computing service provided to the user and can serve as an important basis for service quality management in the management layer.
The user data quality monitoring method of the present example includes a customization process and a monitoring process, as shown in fig. 8, including:
step 310, a user terminal (such as a computer, a mobile phone, etc.) accesses a big data computing platform and obtains a service directory provided by the big data computing platform, wherein the service directory includes an option of a user data quality monitoring service; optionally, the next-level directory of the data quality monitoring service further includes a parameter option to be monitored;
step 320, the user terminal selects to subscribe the user data quality monitoring service, and optionally selects the parameters to be monitored from the provided parameter options;
step 330, the user terminal sends a subscription request for the user data quality monitoring service to the big data computing platform, optionally carrying the parameters to be monitored;
step 340, after receiving the subscription request, the big data computing platform determines the storage location of the corresponding user data, adds a data quality monitoring function to the user data in the service monitoring function, and returns a response of successful subscription to the user terminal;
step 350, the big data computing platform collects historical data related to parameters needing to be monitored in user data, a prediction model of the parameters is established by taking the historical data as a sample, current data of the parameters are predicted according to the prediction model, and monitoring threshold values of the parameters are determined;
step 360, the big data computing platform collects the current data of the parameters and compares the current data with the monitoring threshold, and if the current data exceeds the normal value range defined by the monitoring threshold, a data quality alarm message is sent to the user;
After receiving the data quality alarm, the user terminal can check whether the data anomaly is caused by a problem on the user side and, if so, carry out corresponding processing, such as modifying a program. A data anomaly may also be caused by a problem on the big data computing platform side, so the administrator of the big data computing platform also needs to carry out corresponding processing.
step 370, the big data computing platform counts the number of alarms generated by user data quality monitoring per unit time (such as a day, a week, a month, or any set time period), and sends the statistical result to the user terminal.
The number of times of the data exception occurring in unit time can be used as a new parameter for evaluating the Service quality in a Service-Level Agreement (SLA) related to a big data computing platform, and an evaluation system for the Service quality of big data computing Service is perfected.
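Counting alarms per unit time, as in step 370, can be sketched as follows. The function name, the supported units and the sample timestamps are hypothetical; how the result is reported against the SLA is not shown.

```python
from collections import Counter
from datetime import datetime

# Formatting patterns that map an alarm timestamp to its unit of time.
UNIT_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m"}


def count_alarms(alarm_times, unit="day"):
    """Number of data quality alarms raised in each unit of time."""
    fmt = UNIT_FORMATS[unit]
    return Counter(t.strftime(fmt) for t in alarm_times)


# Hypothetical alarm timestamps for one user.
alarms = [
    datetime(2016, 10, 13, 8, 0),
    datetime(2016, 10, 13, 21, 30),
    datetime(2016, 10, 14, 9, 15),
]
per_day = count_alarms(alarms)
per_month = count_alarms(alarms, unit="month")
print(per_day, per_month)
```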
Fig. 9 is a schematic diagram illustrating signaling interaction between the user terminal and the big data computing platform according to the present example, and relevant steps are as described above for fig. 8, and will not be repeated here.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method of data quality monitoring, comprising:
collecting historical data related to parameters needing to be monitored, performing regression analysis by taking the historical data as a sample, and establishing a regression model by taking the parameters as dependent variables;
calculating a confidence interval of the current data of the parameter according to the regression model, and determining the boundary of the confidence interval as a monitoring threshold value of the parameter; or
Calculating a predicted value of current data of the parameter according to the regression model, determining a monitoring threshold value of the parameter according to a product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to a fluctuation range between the historical data and a historical predicted value calculated based on the regression model;
and acquiring current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold, and performing alarm processing according to a comparison result.
2. The method of claim 1, wherein:
the parameters to be monitored comprise at least one of the following parameters:
the number of records in the data table;
the file size of the data table;
a field in the data table;
statistics of fields in the data table.
3. The method of claim 2, wherein:
if the parameters have dependency relationship with other parameters, the regression model adopts a regression model with independent variables including the other parameters;
if the parameters do not have dependency relationship on other parameters, the regression model adopts an autoregressive model.
4. A data quality monitoring apparatus, comprising:
the model building module is used for acquiring historical data related to parameters needing to be monitored, performing regression analysis by taking the historical data as samples and establishing a regression model by taking the parameters as dependent variables;
the threshold value determining module is used for calculating a confidence interval of the current data of the parameter according to the regression model and determining the boundary of the confidence interval as the monitoring threshold value of the parameter; or
Calculating a predicted value of current data of the parameter according to the regression model, determining a monitoring threshold value of the parameter according to a product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to a fluctuation range between the historical data and a historical predicted value calculated based on the regression model;
and the alarm processing module is used for acquiring the current data of the parameters, comparing the current data with a normal value range defined by the monitoring threshold value, and carrying out alarm processing according to a comparison result.
5. The apparatus of claim 4, wherein:
the model building module collects historical data related to parameters needing to be monitored, wherein the parameters needing to be monitored comprise at least one of the following parameters: the record number of the data table, the file size of the data table, the field in the data table, and the statistic value of the field in the data table.
6. A data quality monitoring apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the process of any one of claims 1 to 3.
7. A user data quality monitoring method for big data computing service comprises the following steps:
the big data computing platform collects historical data related to parameters needing to be monitored from stored user data, performs regression analysis by taking the historical data as a sample, and establishes a regression model by taking the parameters as dependent variables; calculating a confidence interval of the current data of the parameter according to the regression model, determining the boundary of the confidence interval as a monitoring threshold of the parameter, or calculating a predicted value of the current data of the parameter according to the regression model, determining the monitoring threshold of the parameter according to the product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to a fluctuation range between the historical data and the historical predicted value calculated based on the regression model;
and the big data computing platform acquires the current data of the parameters, compares the current data with the prediction result, and performs alarm processing on the quality of the user data according to the comparison result.
8. The method of claim 7, wherein:
the big data computing platform predicts the current data of the parameters according to historical data to obtain a prediction result, and the prediction result comprises the following steps: establishing a prediction model of the parameters by taking the historical data as a sample, predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
the big data computing platform collects the current data of the parameters, compares the current data with the prediction result, and carries out alarm processing on the quality of the user data according to the comparison result, and the alarm processing comprises the following steps: and acquiring current data of the parameters, comparing the current data with the monitoring threshold, and sending a data quality alarm to a corresponding user if the current data exceeds a normal value range defined by the monitoring threshold.
9. The method of claim 7 or 8, wherein:
the method further comprises the following steps: the big data computing platform counts the alarm times generated by monitoring the user data quality in unit time, and sends the statistical result to a corresponding user as the service performance parameter agreed in the service level agreement SLA related to the big data computing service.
10. A big data computing platform comprises a user data quality monitoring module, and is characterized in that: the user data quality monitoring module comprises:
the prediction unit is used for acquiring historical data related to parameters needing to be monitored from stored user data, performing regression analysis by taking the historical data as a sample, and establishing a regression model by taking the parameters as dependent variables; calculating a confidence interval of the current data of the parameter according to the regression model, determining the boundary of the confidence interval as a monitoring threshold of the parameter, or calculating a predicted value of the current data of the parameter according to the regression model, determining the monitoring threshold of the parameter according to the product of the predicted value and a fluctuation coefficient, wherein the fluctuation coefficient is determined according to a fluctuation range between the historical data and the historical predicted value calculated based on the regression model;
and the alarm unit is used for acquiring the current data of the parameters, comparing the current data with the prediction result and carrying out alarm processing on the quality of the user data according to the comparison result.
11. The big data computing platform of claim 10, wherein:
the predicting unit predicts the current data of the parameters according to the historical data to obtain a prediction result, and the predicting unit comprises: establishing a prediction model of the parameters by taking the historical data as a sample, predicting the current data of the parameters according to the prediction model, and determining the monitoring threshold of the parameters according to the prediction result;
the alarm unit collects the current data of the parameters, compares the current data with the prediction result, and carries out alarm processing of user data quality according to the comparison result, and the alarm processing comprises the following steps: and acquiring current data of the parameters and comparing the current data with the monitoring threshold, and if the current data exceeds a normal value range defined by the monitoring threshold, sending a data quality alarm to a corresponding user.
12. The big data computing platform of claim 10 or 11, wherein:
the user data quality monitoring module further comprises: and the counting module is used for counting the alarm times generated by monitoring the user data quality in unit time, and sending the counting result to a corresponding user as a service performance parameter agreed in a service level agreement SLA related to the big data computing service.
13. A big data computing platform comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the process of any of claims 7-9.
CN201610895875.1A 2016-10-13 2016-10-13 Data quality monitoring method and device and big data computing platform Active CN107943809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610895875.1A CN107943809B (en) 2016-10-13 2016-10-13 Data quality monitoring method and device and big data computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610895875.1A CN107943809B (en) 2016-10-13 2016-10-13 Data quality monitoring method and device and big data computing platform

Publications (2)

Publication Number Publication Date
CN107943809A CN107943809A (en) 2018-04-20
CN107943809B CN107943809B (en) 2022-02-01

Family

ID=61928525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610895875.1A Active CN107943809B (en) 2016-10-13 2016-10-13 Data quality monitoring method and device and big data computing platform

Country Status (1)

Country Link
CN (1) CN107943809B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920310B (en) * 2018-05-23 2022-05-03 携程旅游网络技术(上海)有限公司 Abnormal value detection method and system of interface data
CN109035021B (en) * 2018-07-17 2020-06-09 阿里巴巴集团控股有限公司 Method, device and equipment for monitoring transaction index
CN110874674B (en) * 2018-08-29 2023-06-27 阿里巴巴集团控股有限公司 Abnormality detection method, device and equipment
CN109584057B (en) * 2018-09-28 2023-08-11 创新先进技术有限公司 Transaction detail data acquisition method, device and server
CN110399400B (en) * 2018-10-31 2023-08-15 腾讯科技(深圳)有限公司 Method, device, equipment and medium for detecting abnormal data
CN109709379B (en) * 2018-12-11 2020-11-20 河南辉煌科技股份有限公司 Track circuit alarm limit adjusting method based on big data
CN110033130A (en) * 2019-03-28 2019-07-19 阿里巴巴集团控股有限公司 The monitoring method and device of abnormal traffic
CN110309042A (en) * 2019-07-10 2019-10-08 西安点告网络科技有限公司 The method and platform of ad data monitoring
CN110598179A (en) * 2019-08-19 2019-12-20 国网新源控股有限公司 Method for setting threshold value of pumping and storage unit sensor based on multiple regression analysis
CN110530658A (en) * 2019-09-28 2019-12-03 河北工程大学 A kind of high-speed railway vehicle vibration detecting system and its detection method
CN110749307A (en) * 2019-12-03 2020-02-04 国家电网有限公司 Power transmission line displacement settlement determination method and system based on Beidou positioning
CN111090644A (en) * 2019-12-26 2020-05-01 成都康赛信息技术有限公司 Data consistency evaluation method based on data distribution fluctuation rate
CN111598449A (en) * 2020-05-15 2020-08-28 安阳工学院 Urban and rural planning monitoring management system
CN114193029A (en) * 2020-09-18 2022-03-18 宝山钢铁股份有限公司 Welding machine weld joint pre-evaluation method based on big data analysis model
CN112162878B (en) * 2020-09-30 2021-09-28 深圳前海微众银行股份有限公司 Database fault discovery method and device, electronic equipment and storage medium
CN112836964A (en) * 2021-02-02 2021-05-25 曹洪 Enterprise abnormity assessment system and assessment method
CN112882896A (en) * 2021-02-23 2021-06-01 广州虎牙科技有限公司 Data monitoring method and device and electronic equipment
CN113242153B (en) * 2021-06-08 2023-04-18 广东嘉贸通科技有限公司 Application-oriented monitoring analysis method based on network traffic monitoring
CN114623281A (en) * 2021-12-01 2022-06-14 哈尔滨圣昌科技开发有限公司 Pipeline prediction analysis alarm system and use method thereof
CN115097796B (en) * 2022-07-08 2023-04-21 广州市物码信息科技有限公司 Quality control system and method for simulating big data and correcting AQL value

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117306A (en) * 2010-01-04 2011-07-06 阿里巴巴集团控股有限公司 Method and system for monitoring ETL (extract-transform-load) data processing process
CN104063747A (en) * 2014-06-26 2014-09-24 上海交通大学 Performance abnormality prediction method in distributed system and system
CN104113872A (en) * 2013-04-22 2014-10-22 中国移动通信集团湖北有限公司 Method and system for data service monitoring
CN104301413A (en) * 2014-10-17 2015-01-21 国云科技股份有限公司 Oracle distributed real-time monitoring method orienting cloud databases
CN104820154A (en) * 2015-05-25 2015-08-05 重庆大学 Power supply data visualized monitoring system based on visualization technology
CN104850933A (en) * 2015-04-10 2015-08-19 国电南瑞科技股份有限公司 Scheduling automation data checking system and method based on credible characteristic values
CN105681303A (en) * 2016-01-15 2016-06-15 中国科学院计算机网络信息中心 Big data driven network security situation monitoring and visualization method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7477960B2 (en) * 2005-02-16 2009-01-13 Tokyo Electron Limited Fault detection and classification (FDC) using a run-to-run controller
CN105095614A (en) * 2014-04-18 2015-11-25 国际商业机器公司 Method and device for updating prediction model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Monitoring Data Streams at Process Level in Scientific Big Data Batch Clusters"; E Kuehn et al.; 《2014 IEEE/ACM International Symposium on Big Data Computing (BDC)》; 20141201; 90-95 *
"Hospital medical data quality monitoring based on data warehouse and OLAP technology"; 刘晓辉 et al.; 《中国卫生信息管理杂志》; 20070502; 476-480 *
"Discussion of an operator-based data quality management and control system"; 林碧兰 et al.; 《中国新通信》; 20160905; 57-59 *

Also Published As

Publication number Publication date
CN107943809A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN107943809B (en) Data quality monitoring method and device and big data computing platform
US10666525B2 (en) Distributed multi-data source performance management
US10896203B2 (en) Digital analytics system
US10592811B1 (en) Analytics scripting systems and methods
US9286354B2 (en) Systems and/or methods for forecasting future behavior of event streams in complex event processing (CEP) environments
US8014983B2 (en) Computer-implemented system and method for storing data analysis models
US20170371757A1 (en) System monitoring method and apparatus
CN109413175B (en) Information processing method and device and electronic equipment
CN109120463B (en) Flow prediction method and device
US20050096949A1 (en) Method and system for automatic continuous monitoring and on-demand optimization of business IT infrastructure according to business objectives
US20200134642A1 (en) Method and system for validating ensemble demand forecasts
US20200134640A1 (en) Method and system for generating ensemble demand forecasts
US11716422B1 (en) Call center load balancing and routing management
US11816687B2 (en) Personalized approach to modeling users of a system and/or service
CN114202256A (en) Architecture upgrading early warning method and device, intelligent terminal and readable storage medium
US10439919B2 (en) Real time event monitoring and analysis system
CN116149848A (en) Load prediction method and device, electronic equipment and storage medium
CN114862435A (en) Customer loss early warning method, equipment and medium
CN114385121A (en) Software design modeling method and system based on business layering
CN112308419A (en) Data processing method, device, equipment and computer storage medium
Vela et al. Estimating the effect of network element events in a wireless network
CN111277445B (en) Method and device for evaluating performance of online node server
US20220027251A1 (en) System for monitoring activity in a process and method thereof
CN113742118B (en) Method and system for detecting anomalies in data pipes
Shah An Exogenous Factor Aware Resource Prediction Model for Auto-Scaling in Cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant