CN108829718B - Data processing method and device - Google Patents

Info

Publication number
CN108829718B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810426890.0A
Other languages
Chinese (zh)
Other versions
CN108829718A (en
Inventor
周葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd

Abstract

The embodiment of the invention provides a data processing method and a data processing device, wherein the method comprises the following steps: determining a data index to be processed; acquiring historical data of the data index; determining a predicted value of the data index according to the historical data; when an actual value of the data index is acquired, calculating a difference value between the actual value and the predicted value; and if the difference does not exceed a preset threshold, judging that the actual value is valid data. According to the data processing method and device, future data of a certain data index are predicted through historical data of the data index, after an actual value of the data is obtained through actual collection, whether the actual value is effective or not is determined through judging whether the actual value meets the development trend of the historical data, and therefore a link of data verification is added in the data processing process, a data user cannot obtain wrong data, and the accuracy of a subsequent data analysis result is guaranteed.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and a data processing apparatus.
Background
In computer science, data is a general term for all symbols that can be input to a computer and processed by a computer program, including numbers, letters, symbols, and analog quantities that carry a certain meaning. The objects stored and processed by computers are now quite extensive, and the data representing these objects has become increasingly complex.
Take the video industry as an example. When a video website provides functions such as video playback, it continuously collects the various data generated as users search for and watch videos. After processing, this data yields metrics such as click rate, search rate, and exposure for analysts to examine, providing a basis for the company's subsequent operating decisions.
However, in the production process, the finally generated data may contain errors of varying degrees because of differences in data acquisition modes or faults during transmission. If an analyst directly obtains and uses the wrong data, the data analysis results may be inaccurate. More seriously, the wrong data may also mislead the company's operating decisions.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a method of data processing and a corresponding apparatus of data processing that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, including:
determining a data index to be processed;
acquiring historical data of the data index;
determining a predicted value of the data index according to the historical data;
when an actual value of the data index is acquired, calculating a difference value between the actual value and the predicted value;
and if the difference does not exceed a preset threshold, judging that the actual value is valid data.
Optionally, the step of obtaining the historical data of the data index includes:
determining the acquisition period of the historical data;
and acquiring historical data of the data index in the acquisition period.
Optionally, the step of determining the predicted value of the data index according to the historical data includes:
generating a prediction model for the data index from the historical data;
and calculating the predicted value of the data index by adopting the prediction model.
Optionally, the step of generating a predictive model for the data indicator from the historical data comprises:
setting default regression parameter values;
and training a preset autoregressive model by changing the default regression parameter value, so that the prediction error of the historical data is smaller than a second preset threshold value, and generating a prediction model.
Optionally, the method further comprises:
if the difference exceeds the preset threshold, carrying out error correction processing on the data calculation script;
and recalculating the actual value of the data index by adopting the corrected data calculation script.
In order to solve the above problem, an embodiment of the present invention discloses a data processing apparatus, including:
the data index determining module is used for determining a data index to be processed;
the historical data acquisition module is used for acquiring historical data of the data index;
the predicted value determining module is used for determining the predicted value of the data index according to the historical data;
the difference value calculation module is used for calculating the difference value between the actual value and the predicted value when the actual value of the data index is acquired;
and the effective data judging module is used for judging that the actual value is effective data if the difference value does not exceed a preset threshold value.
Optionally, the historical data obtaining module includes:
the acquisition period determining submodule is used for determining the acquisition period of the historical data;
and the historical data acquisition submodule is used for acquiring the historical data of the data index in the acquisition period.
Optionally, the predicted value determining module includes:
the prediction model generation submodule is used for generating a prediction model aiming at the data index according to the historical data;
and the predicted value calculating submodule is used for calculating the predicted value of the data index by adopting the prediction model.
Optionally, the prediction model generation sub-module includes:
the regression parameter value setting unit is used for setting default regression parameter values;
and the autoregressive model training unit is used for training a preset autoregressive model by changing the default regression parameter value, so that the prediction error of the historical data is smaller than a second preset threshold value, and a prediction model is generated.
Optionally, the apparatus further comprises:
the error correction processing module is used for carrying out error correction processing on the data calculation script if the difference value exceeds the preset threshold value;
and the actual value calculating module is used for recalculating the actual value of the data index by adopting the corrected data calculating script.
Compared with the background art, the embodiment of the invention has the following advantages:
according to the embodiment of the invention, the data index to be processed is determined, the historical data of the data index is obtained, and then the predicted value of the data index can be determined according to the historical data, so that when the actual value of the data index is acquired, the difference value between the actual value and the predicted value can be calculated, and if the difference value between the actual value and the predicted value does not exceed the preset threshold value, the actual value can be judged to be effective data. According to the data processing method and device, future data of a certain data index are predicted through historical data of the data index, after an actual value of the data is obtained through actual collection, whether the actual value is effective or not is determined through judging whether the actual value meets the development trend of the historical data, and therefore a link of data verification is added in the data processing process, a data user cannot obtain wrong data, and the accuracy of a subsequent data analysis result is guaranteed.
Drawings
FIG. 1 is a flow chart illustrating steps of a method of data processing according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating steps of another method of data processing according to one embodiment of the present invention;
FIG. 3 is a business flow diagram of a method of data processing according to an embodiment of the invention;
FIG. 4 is a schematic block diagram of an embodiment of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a schematic flow chart illustrating steps of a data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, determining a data index to be processed;
in the embodiment of the present invention, the data index may refer to a kind of preliminarily processed or calculated data obtained by processing or counting the collected original data in some way. For example, the data indicators may be advertisement exposure, video program search volume, click through volume, or advertisement revenue, among others. The present embodiment does not limit the specific type of the data index.
In the embodiment of the present invention, it may be determined which data index needs to be processed currently, and then the data index is processed to determine whether the currently acquired or calculated data is real and valid data.
For example, to determine whether the data of the advertisement exposure obtained by statistics is accurate, the advertisement exposure can be used as the data index to be processed currently.
Step 102, acquiring historical data of the data index;
in embodiments of the present invention, historical data may refer to data over a period of time in the past. For example, the past day, week, month, or year, etc. Those skilled in the art can determine the collection period of the historical data according to actual needs, which is not limited in this embodiment.
In general, future data for a certain data index may be predicted from historical data for that index. Therefore, to ensure the accuracy of the prediction, historical data covering as long a period as possible can be acquired. Balancing calculation efficiency against calculation accuracy, the past year can be used as the acquisition period of the historical data.
Step 103, determining a predicted value of the data index according to the historical data;
in an embodiment of the present invention, the predicted value of the data index may refer to data of a future time point or time period of the index that has not been generated yet. For example, predicted exposure data for today's advertisements.
In a specific implementation, after the historical data of the data index is acquired, a data prediction model may be constructed by using the historical data, and the current predicted value of the data index is calculated by using the model.
Take advertising exposure as an example. After acquiring the advertisement exposure data of 365 days in the past year, a prediction model can be generated according to the data, and then the advertisement exposure of today is predicted through the model.
Of course, building a prediction model from historical data and then calculating the predicted value of the data index is only an example, and a person skilled in the art may determine the predicted value of the data index in other ways according to actual needs, which is not limited in this embodiment.
Step 104, when an actual value of the data index is acquired, calculating a difference value between the actual value and the predicted value;
in the embodiment of the present invention, the actual value of the data index may refer to actually acquired data. Generally, the data may be a data value obtained by processing or counting raw data through a data calculation script.
For example, the actual value of the advertising exposure data may refer to the advertising exposure for the day that is counted.
In the embodiment of the present invention, in order to determine whether the actual value of the acquired data index is valid data, the actual value may be first compared with the predicted value of the data index determined in step 103.
In a particular implementation, when comparing the actual value and the predicted value, a difference between the two may be calculated. For the convenience of subsequent calculations, the above-mentioned difference may further refer to an absolute value of the difference.
Step 105, if the difference does not exceed a preset threshold, determining that the actual value is valid data.
In the embodiment of the present invention, if the difference between the actual value and the predicted value calculated in step 104 is within the preset threshold range, the actual value can be considered to lie within the error range of the predicted value and to conform to the development trend of the historical data, so the actual value is valid with a very high probability. At this point, the actual value may be marked as valid data and passed to the business side for use.
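The check in steps 104 and 105 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `is_valid` and the example numbers are assumptions, and the threshold is treated as an absolute bound on the absolute difference, as the text describes.

```python
def is_valid(actual: float, predicted: float, threshold: float) -> bool:
    """Return True when |actual - predicted| does not exceed the threshold."""
    difference = abs(actual - predicted)  # step 104: absolute difference
    return difference <= threshold        # step 105: compare with the preset threshold


# Hypothetical numbers: a predicted exposure of 1000 with a tolerance of 100.
print(is_valid(1050.0, 1000.0, 100.0))
print(is_valid(1200.0, 1000.0, 100.0))
```

A difference exactly equal to the threshold is accepted here, since the text only rejects values whose difference exceeds the threshold.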
In the embodiment of the invention, the data index to be processed is determined, the historical data of the data index is obtained, and then the predicted value of the data index can be determined according to the historical data, so that when the actual value of the data index is acquired, the difference value between the actual value and the predicted value can be calculated, and if the difference value between the actual value and the predicted value does not exceed the preset threshold value, the actual value can be judged to be valid data. According to the data processing method and device, future data of a certain data index are predicted through historical data of the data index, after an actual value of the data is obtained through actual collection, whether the actual value is effective or not is determined through judging whether the actual value meets the development trend of the historical data, and therefore a link of data verification is added in the data processing process, a data user cannot obtain wrong data, and the accuracy of a subsequent data analysis result is guaranteed.
Referring to fig. 2, a schematic flow chart illustrating steps of another data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 201, determining a data index to be processed;
in the embodiment of the present invention, the data index may refer to a kind of preliminarily processed or calculated data obtained by processing or counting the collected original data in some way. For example, the data indicators may be advertisement exposure, video program search volume, click through volume, or advertisement revenue, among others. The present embodiment does not limit the specific type of the data index.
In the embodiment of the present invention, it may be determined which data index needs to be processed currently, and then the data index is processed to determine whether the currently acquired or calculated data is real and valid data.
For example, to determine whether the data of the advertisement exposure obtained by statistics is accurate, the advertisement exposure can be used as the data index to be processed currently.
Step 202, determining the acquisition period of historical data;
in the embodiment of the present invention, the historical data may refer to data in a past period of time, that is, a collection period of the historical data. For example, the acquisition period may be the past day, month, or year, etc.
For ease of understanding, the present embodiment is described by taking the past year as the historical data collection period.
For example, if the current time point is January 1, 2018, the collection period of the historical data for a certain data index may be the one year from January 1, 2017 to December 31, 2017.
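The one-year window in this example can be computed with standard date arithmetic. The sketch below assumes the window starts one calendar year before the current date and ends on the previous day; the function name `collection_period` and the handling of February 29 are illustrative choices, not taken from the text.

```python
from datetime import date, timedelta
from typing import Tuple


def collection_period(today: date) -> Tuple[date, date]:
    """Return (start, end) of the one-year window ending the day before `today`."""
    end = today - timedelta(days=1)
    try:
        start = today.replace(year=today.year - 1)
    except ValueError:
        # `today` is February 29 and the previous year has no such day.
        start = today.replace(year=today.year - 1, day=28)
    return start, end


start, end = collection_period(date(2018, 1, 1))
print(start, end)  # 2017-01-01 2017-12-31
```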
Step 203, acquiring historical data of the data index in the acquisition period;
in the embodiment of the invention, after the acquisition period of the historical data is determined, the data of the corresponding index in the corresponding period can be extracted.
In a specific implementation, the daily summary results of the index can be looked up in the data statistics table, and all data of the index for the past 365 days can then be retrieved using the following SQL statement:
SELECT dt, SUM(revenue) FROM table WHERE dt BETWEEN DATE_SUB(DATE_FORMAT(NOW(), '%Y-%m-%d'), INTERVAL 1 YEAR) AND DATE_SUB(DATE_FORMAT(NOW(), '%Y-%m-%d'), INTERVAL 1 DAY) GROUP BY dt
In the embodiment of the invention, for the convenience of subsequent calculation, the historical data can be converted into a vector after it is obtained. Specifically, the data summary result of the past 365 days can be converted into a vector of length 365, $(x_1, x_2, \ldots, x_{365})$.
It should be noted that, for acquisition periods of different lengths, the vectors obtained from the historical data also differ in length. For example, if the past 90 days are taken as the acquisition period, a vector of length 90, i.e., $(x_1, x_2, \ldots, x_{90})$, is obtained. This embodiment is not limited in this respect.
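One way to turn the per-day query results into such a vector is sketched below. The function name `to_vector`, the dictionary input keyed by date, and the decision to raise an error on a missing day are all assumptions of this sketch; the text does not say how gaps in the daily data should be handled.

```python
from datetime import date, timedelta
from typing import Dict, List


def to_vector(daily_totals: Dict[date, float], start: date, end: date) -> List[float]:
    """Order per-day totals from `start` to `end` (inclusive) into a list.

    A missing day raises ValueError, because silently skipping it would
    shift every later value by one position.
    """
    vector = []
    day = start
    while day <= end:
        if day not in daily_totals:
            raise ValueError(f"no data for {day.isoformat()}")
        vector.append(float(daily_totals[day]))
        day += timedelta(days=1)
    return vector


# Hypothetical three-day result set, deliberately out of order.
rows = {date(2017, 1, 2): 20, date(2017, 1, 1): 10, date(2017, 1, 3): 30}
print(to_vector(rows, date(2017, 1, 1), date(2017, 1, 3)))  # [10.0, 20.0, 30.0]
```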
Step 204, generating a prediction model aiming at the data index according to the historical data;
in the embodiment of the present invention, an autoregressive model may be used to train the historical data, so as to obtain a prediction model for the data index.
An autoregressive model (AR model for short) is a statistical method for processing time series. It uses the earlier values of the same variable $x$, i.e., $x_1$ to $x_{t-1}$, to predict the current value $x_t$, and assumes that they are linearly related. Autoregressive models are widely used in economics, informatics, and the prediction of natural phenomena.
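For reference, the linear relationship described above is commonly written in the following standard form of an autoregressive model of order $p$. The symbols $c$ (constant term), $\varphi_i$ (regression parameters), and $\varepsilon_t$ (error term) are the usual textbook notation and do not appear in the original text.

```latex
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t
```

With $p = 7$, each value is predicted from the seven values that precede it.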
In a specific implementation, default regression parameter values may be set first, and a preset autoregressive model is then trained by changing the default regression parameter values until the prediction error on the historical data is smaller than a second preset threshold, thereby generating the prediction model.
Take advertisement exposure as the data index, for example. Since advertisement data has a weekly (7-day) period, a seventh-order autoregressive model may be selected for training. Using the 365 days of historical data, a group of default regression parameters is set, and the values of the regression parameters are then updated repeatedly by gradient descent so that the model's prediction error on the historical values is minimized, yielding the prediction model.
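The training procedure described above can be sketched in plain Python as batch gradient descent on the mean squared error. Several details are assumptions of this sketch rather than statements from the text: the names `fit_ar` and `loss_and_grads`, the zero-valued default parameters, the learning rate, the epoch limit, the tolerance `tol` standing in for the second preset threshold, and the division of the series by its largest absolute value so that a single learning rate works for metrics of any magnitude.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]


def loss_and_grads(c: float, w: List[float], samples: List[Sample]):
    """Mean squared prediction error and its gradients w.r.t. c and w."""
    n = len(samples)
    sq, grad_c, grad_w = 0.0, 0.0, [0.0] * len(w)
    for lags, target in samples:
        err = c + sum(wi * li for wi, li in zip(w, lags)) - target
        sq += err * err
        grad_c += 2 * err / n
        for i, li in enumerate(lags):
            grad_w[i] += 2 * err * li / n
    return sq / n, grad_c, grad_w


def fit_ar(series: Sequence[float], order: int = 7, lr: float = 0.05,
           max_epochs: int = 500, tol: float = 1e-4):
    """Fit an AR(order) model; returns (c, w, scale, mse) on the scaled series."""
    if len(series) <= order:
        raise ValueError("need more observations than the model order")
    scale = max(abs(v) for v in series) or 1.0
    xs = [v / scale for v in series]
    # Each sample pairs the `order` preceding values (most recent first)
    # with the value that followed them.
    samples = [(xs[t - order:t][::-1], xs[t]) for t in range(order, len(xs))]
    c, w = 0.0, [0.0] * order  # default regression parameter values
    for _ in range(max_epochs):
        mse, grad_c, grad_w = loss_and_grads(c, w, samples)
        if mse < tol:  # error on historical data is below the threshold
            break
        c -= lr * grad_c
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    mse, _, _ = loss_and_grads(c, w, samples)
    return c, w, scale, mse
```

Because the returned parameters apply to the scaled series, a prediction made with them has to be multiplied by `scale` to return to the original units.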
Step 205, calculating a predicted value of the data index by using the prediction model;
In a specific implementation, after a data prediction model is constructed from the historical data, the current predicted value of the data index can be calculated by the model. That is, once the model has been trained on the historical data, it can be used to predict the value at the next time point.
For example, to predict the advertisement exposure for January 1, 2018, a model can be trained on the historical advertisement exposure data from January 1, 2017 to December 31, 2017; the resulting prediction model can then be used to predict the advertisement exposure for January 1, 2018.
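Once the regression parameters are known, predicting the next point is a single weighted sum over the most recent values. The sketch below assumes the parameters are given as an intercept and a list of weights ordered from the most recent lag to the oldest; the name `predict_next` and the optional `scale` factor (for models fitted on a rescaled series) are illustrative.

```python
from typing import Sequence


def predict_next(history: Sequence[float], intercept: float,
                 weights: Sequence[float], scale: float = 1.0) -> float:
    """Predict the value following `history` with a fitted AR model.

    weights[0] multiplies the most recent value, weights[1] the one
    before it, and so on.
    """
    order = len(weights)
    if len(history) < order:
        raise ValueError("history is shorter than the model order")
    lags = [v / scale for v in reversed(history[-order:])]
    return scale * (intercept + sum(w * l for w, l in zip(weights, lags)))


# Hypothetical parameters that simply repeat the value from seven days earlier.
weekly_repeat = [0, 0, 0, 0, 0, 0, 1]
print(predict_next(list(range(1, 15)), 0.0, weekly_repeat))  # 8.0
```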
Step 206, when the actual value of the data index is acquired, calculating the difference value between the actual value and the predicted value;
in the embodiment of the present invention, the actual value of the data index may refer to actually acquired data. Generally, the data may be a data value obtained by processing or counting raw data through a data calculation script.
For example, the actual value of the advertisement exposure data may refer to the advertisement exposure counted for January 1, 2018.
In the embodiment of the present invention, in order to determine whether the actual value of the acquired data index is valid data, the actual value may be first compared with the predicted value of the data index determined in step 205.
In a particular implementation, when comparing the actual value and the predicted value, a difference between the two may be calculated. For the convenience of subsequent calculations, the above-mentioned difference may further refer to an absolute value of the difference.
Step 207, if the difference does not exceed a preset threshold, determining that the actual value is valid data;
In the embodiment of the present invention, if the difference between the actual value and the predicted value calculated in step 206 is within the preset threshold range, the actual value can be considered to lie within the error range of the predicted value and to conform to the development trend of the historical data, so the actual value is valid with a very high probability. At this point, the actual value may be marked as valid data and passed to the business side for use.
Step 208, if the difference exceeds the preset threshold, performing error correction processing on the data calculation script;
in the embodiment of the invention, if the difference between the actual value and the predicted value obtained by calculation is too large and is not within the range of the preset threshold, the actual value is considered to be not within the error range of the predicted value and not conform to the development trend of historical data, and the actual value is invalid with a very high probability.
At this time, the data calculation script may be subjected to error correction processing to eliminate a fault causing an error in the data calculation process, and step 209 is performed to recalculate the actual value of the data index using the corrected data calculation script.
Step 209, recalculating the actual value of the data index by using the corrected data calculation script.
In the embodiment of the present invention, after the actual value of the data index is obtained through recalculation, the step 206 may be executed again to continuously determine the validity of the actual value obtained through recalculation.
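The loop formed by steps 206 to 209 can be sketched as follows. The two callables are hypothetical stand-ins: `compute_actual` represents running the data calculation script, and `correct_script` represents whatever correction is applied to it, which the text describes as manual. The `max_attempts` limit is an assumption added so that the sketch always terminates.

```python
from typing import Callable, Optional


def validate_with_retries(compute_actual: Callable[[], float],
                          correct_script: Callable[[], None],
                          predicted: float, threshold: float,
                          max_attempts: int = 3) -> Optional[float]:
    """Return a validated actual value, or None if every attempt fails."""
    for attempt in range(max_attempts):
        actual = compute_actual()                  # step 206 (or 209 on a retry)
        if abs(actual - predicted) <= threshold:   # step 207: value is valid
            return actual
        if attempt < max_attempts - 1:
            correct_script()                       # step 208: fix the script
    return None
```

Returning `None` after the last failed attempt leaves it to the caller to decide how to escalate, which the text does not cover.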
In the embodiment of the invention, a data verification step is added to data transmission to judge whether the actually calculated value conforms to the development trend of the historical data, and thereby whether the actual value is valid data. If the actual value is judged to be unusable invalid data, the data calculation script can be corrected manually and the value recalculated, so that the accuracy of data calculation is guaranteed, a data user cannot acquire wrong data, and the accuracy of the subsequent data analysis result is guaranteed.
For ease of understanding, the method of data processing of the present embodiment is described below as a complete example.
FIG. 3 is a schematic business flow diagram of a data processing method according to an embodiment of the present invention. In FIG. 3, when a data analyst performs a data calculation task, a prediction model of the data index can be constructed from historical data and used to predict the value for a given date, so that after the actual value of the index on the same date is calculated, the predicted value and the actual value can be compared. If the difference between the actual value and the predicted value meets the preset requirement, the actual value is marked as valid data and transmitted to the data demand side for analysis. If the difference between the actual value and the predicted value does not meet the preset requirement, the actual value can be considered unusable invalid data. In that case, the data calculation script can be corrected and the actual value of the index for that day recalculated, ensuring the validity of the data.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present invention.
Referring to fig. 4, a schematic block diagram of an embodiment of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a data index determining module 401, configured to determine a data index to be processed;
a historical data obtaining module 402, configured to obtain historical data of the data index;
a predicted value determining module 403, configured to determine a predicted value of the data index according to the historical data;
a difference calculation module 404, configured to calculate a difference between the actual value and the predicted value when the actual value of the data index is acquired;
and an effective data determining module 405, configured to determine that the actual value is effective data if the difference does not exceed a preset threshold.
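The division into modules 401 to 405 can be pictured as a small class in which each module corresponds to one field or one line of a method. This is only a structural sketch: the class name, the use of injected callables for the history and prediction modules, and the example indicator name are assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DataProcessingApparatus:
    load_history: Callable[[str], Sequence[float]]  # stands in for module 402
    predict: Callable[[Sequence[float]], float]     # stands in for module 403
    threshold: float                                # the preset threshold

    def check(self, indicator: str, actual: float) -> bool:
        """Return True when `actual` is judged valid for `indicator`."""
        # Module 401: the caller chooses the indicator to be processed.
        history = self.load_history(indicator)
        predicted = self.predict(history)
        difference = abs(actual - predicted)        # module 404
        return difference <= self.threshold         # module 405


# Hypothetical wiring: a fixed history and a plain average as the predictor.
apparatus = DataProcessingApparatus(
    load_history=lambda name: [10.0, 12.0, 11.0],
    predict=lambda history: sum(history) / len(history),
    threshold=2.0,
)
print(apparatus.check("ad_exposure", 12.5))  # True: |12.5 - 11.0| <= 2.0
```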
In the embodiment of the invention, the data index to be processed is determined by the data index determining module, the historical data of the data index is obtained by the historical data obtaining module, and then the predicted value of the data index is determined by the predicted value determining module according to the historical data, so that when the actual value of the data index is acquired, the difference value between the actual value and the predicted value can be calculated by the difference value calculating module, and if the difference value between the actual value and the predicted value does not exceed the preset threshold value, the actual value can be judged as the effective data by the effective data judging module. According to the data processing method and device, future data of a certain data index are predicted through historical data of the data index, after an actual value of the data is obtained through actual collection, whether the actual value is effective or not is determined through judging whether the actual value meets the development trend of the historical data, and therefore a link of data verification is added in the data processing process, a data user cannot obtain wrong data, and the accuracy of a subsequent data analysis result is guaranteed.
In this embodiment of the present invention, the historical data obtaining module 402 may specifically include the following sub-modules:
the acquisition period determining submodule is used for determining the acquisition period of the historical data;
and the historical data acquisition submodule is used for acquiring the historical data of the data index in the acquisition period.
In this embodiment of the present invention, the predicted value determining module 403 may specifically include the following sub-modules:
the prediction model generation submodule is used for generating a prediction model aiming at the data index according to the historical data;
and the predicted value calculating submodule is used for calculating the predicted value of the data index by adopting the prediction model.
In this embodiment of the present invention, the prediction model generation sub-module may specifically include the following units:
the regression parameter value setting unit is used for setting default regression parameter values;
and the autoregressive model training unit is used for training a preset autoregressive model by changing the default regression parameter value, so that the prediction error of the historical data is smaller than a second preset threshold value, and a prediction model is generated.
In the embodiment of the present invention, the apparatus may further include the following modules:
the error correction processing module is used for carrying out error correction processing on the data calculation script if the difference value exceeds the preset threshold value;
and the actual value calculating module is used for recalculating the actual value of the data index by adopting the corrected data calculating script.
In the embodiment of the invention, a data verification step is added to data transmission: the difference calculation module calculates the difference between the actual value and the predicted value, and the valid data determining module judges whether the actually calculated value conforms to the development trend of the historical data, and thereby whether the actual value is valid data. If the actual value is judged to be unusable invalid data, the data calculation script can be corrected through the error correction processing module and the value calculated again, so that the accuracy of data calculation is guaranteed, a data user cannot acquire wrong data, and the accuracy of the subsequent data analysis result is guaranteed.
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for the relevant details, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data processing method and the data processing device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (6)

1. A method for data processing, which is applied in a data transfer process and before a data analysis process, comprises the following steps:
when data transmission is needed, determining a data index to be processed;
acquiring historical data of the data index;
determining a predicted value of the data index according to the historical data;
when an actual value of the data index is acquired, calculating a difference value between the actual value and the predicted value; the actual value of the data index is obtained by processing or counting the original data through a data calculation script;
if the difference value does not exceed a preset threshold value, judging that the actual value is valid data; and transmitting the data to a data user so that the data user can analyze the data by using the effective data;
wherein the method further comprises:
if the difference exceeds the preset threshold, carrying out error correction processing on the data calculation script;
recalculating the actual value of the data index by adopting the corrected data calculation script, determining the recalculated actual value as effective data, and transmitting the effective data to a data user so that the data user performs data analysis by using the effective data;
wherein the step of determining the predicted value of the data indicator based on the historical data comprises:
generating a prediction model for the data index from the historical data;
and calculating the predicted value of the data index by adopting the prediction model.
2. The method of claim 1, wherein the step of obtaining historical data of the data indicators comprises:
determining the acquisition period of the historical data;
and acquiring historical data of the data index in the acquisition period.
3. The method of claim 1, wherein generating a predictive model for the data metric based on the historical data comprises:
setting default regression parameter values;
and training a preset autoregressive model by changing the default regression parameter value, so that the prediction error of the historical data is smaller than a second preset threshold value, and generating a prediction model.
4. A data processing device, which is applied in a data transmission process and before a data analysis process, comprises:
the data index determining module is used for determining a data index to be processed when data transmission is needed;
the historical data acquisition module is used for acquiring historical data of the data index;
the predicted value determining module is used for determining the predicted value of the data index according to the historical data;
the difference value calculation module is used for calculating the difference value between the actual value and the predicted value when the actual value of the data index is acquired; the actual value of the data index is obtained by processing or counting the original data through a data calculation script;
the effective data judgment module is used for judging that the actual value is effective data and transmitting the effective data to a data user if the difference value does not exceed a preset threshold value, so that the data user can analyze the data by using the effective data;
wherein the apparatus further comprises:
the error correction processing module is used for carrying out error correction processing on the data calculation script if the difference value exceeds the preset threshold value;
the actual value calculation module is used for recalculating the actual value of the data index by adopting the corrected data calculation script, determining the recalculated actual value as effective data and transmitting the effective data to a data user so as to enable the data user to utilize the effective data to perform data analysis;
wherein the predictor determination module comprises:
the prediction model generation submodule is used for generating a prediction model aiming at the data index according to the historical data;
and the predicted value calculating submodule is used for calculating the predicted value of the data index by adopting the prediction model.
5. The apparatus of claim 4, wherein the historical data acquisition module comprises:
the acquisition period determining submodule is used for determining the acquisition period of the historical data;
and the historical data acquisition submodule is used for acquiring the historical data of the data index in the acquisition period.
6. The apparatus of claim 4, wherein the predictive model generation sub-module comprises:
the regression parameter value setting unit is used for setting default regression parameter values;
and the autoregressive model training unit is used for training a preset autoregressive model by changing the default regression parameter value, so that the prediction error of the historical data is smaller than a second preset threshold value, and a prediction model is generated.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810426890.0A CN108829718B (en) 2018-05-07 2018-05-07 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810426890.0A CN108829718B (en) 2018-05-07 2018-05-07 Data processing method and device

Publications (2)

Publication Number Publication Date
CN108829718A CN108829718A (en) 2018-11-16
CN108829718B true CN108829718B (en) 2021-04-06

Family

ID=64147515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810426890.0A Active CN108829718B (en) 2018-05-07 2018-05-07 Data processing method and device

Country Status (1)

Country Link
CN (1) CN108829718B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008386B (en) * 2019-01-17 2023-08-01 创新先进技术有限公司 Data generation, processing and evaluation method, device, equipment and medium
CN111311086B (en) * 2020-02-11 2024-02-09 中国银联股份有限公司 Capacity monitoring method, device and computer readable storage medium
CN113590989A (en) * 2020-04-30 2021-11-02 北京金山云网络技术有限公司 Data processing method and device for real-time computing abnormality and electronic equipment
CN113689078A (en) * 2021-07-27 2021-11-23 中国科学院地理科学与资源研究所 Survey data verification method and device
CN113744890A (en) * 2021-11-03 2021-12-03 北京融信数联科技有限公司 Reworking and production-resuming analysis method, system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111284A (en) * 2009-12-28 2011-06-29 北京亿阳信通软件研究院有限公司 Method and device for predicting telecom traffic
CN103745279A (en) * 2014-01-24 2014-04-23 广东工业大学 Method and device for monitoring energy consumption abnormity
CN103886018A (en) * 2014-02-21 2014-06-25 车智互联(北京)科技有限公司 Data predication device, data predication method and electronic equipment
CN105676670A (en) * 2014-11-18 2016-06-15 北京翼虎能源科技有限公司 Method and system for processing energy data
CN107958297A (en) * 2016-10-17 2018-04-24 华为技术有限公司 A kind of product demand forecasting method and product demand prediction meanss

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080126144A1 (en) * 2006-07-21 2008-05-29 Alex Elkin Method and system for improving the accuracy of a business forecast

Also Published As

Publication number Publication date
CN108829718A (en) 2018-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant