CN109697247B - Method and device for detecting data accuracy - Google Patents

Method and device for detecting data accuracy

Publication number
CN109697247B
Authority
CN
China
Prior art keywords
preset
time
result data
detection time
timestamp
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811648569.3A
Other languages
Chinese (zh)
Other versions
CN109697247A (en
Inventor
韩红根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201811648569.3A
Publication of CN109697247A
Application granted
Publication of CN109697247B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements

Abstract

The application discloses a method and a device for detecting data accuracy, wherein the method comprises the following steps: calculating a first difference value, wherein the first difference value is a difference value between result data acquired at the current detection time in the result data sequence and previous result data; the result data sequence is obtained by arranging the result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change between the result data; acquiring difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging the detection time in the sequence from first to last; and under the condition of meeting the preset condition, determining that the result data acquired at the current detection time is accurate. By the method and the device, whether the result data obtained at the current detection time is accurate can be determined.

Description

Method and device for detecting data accuracy
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for detecting data accuracy.
Background
Currently, there are high requirements on the accuracy of the result data in many application scenarios.
For example, in a video advertising scenario, the value of a preset index corresponding to a certain timestamp needs to be calculated from the data generated at that timestamp. For instance, the exposure rate value corresponding to 14:20 must be calculated from the large amount of data generated at 14:20, and it serves as a reference for the user in deciding how much to top up the account.
Since the accuracy of the obtained preset index value corresponding to a certain timestamp has a significant influence on the decision of the user, the accuracy of the obtained preset index value corresponding to a certain timestamp needs to be ensured.
Disclosure of Invention
The application provides a method and a device for detecting data accuracy, and aims to solve the problem of how to detect whether the value of a preset index corresponding to a timestamp is accurate.
In order to achieve the above object, the present application provides the following technical solutions:
The application discloses a method for detecting data accuracy, which comprises the following steps:
calculating a first difference value, wherein the first difference value is a difference value between result data acquired at the current detection time in a result data sequence and previous result data; the result data sequence is obtained by arranging result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change among the result data;
acquiring difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging detection time according to the sequence from first to last;
and determining that the result data obtained by the current detection time is accurate under the condition of meeting a preset condition.
Wherein the preset conditions include: the acquired preset number of difference values and the first difference value are all smaller than a preset threshold value.
Wherein the preset conditions include: the acquired preset number of difference values and the first difference value are all smaller than the preset threshold value, and no unexecuted calculation task exists among the generated calculation tasks, wherein a calculation task is used for performing at least one calculation on data generated by a preset device to obtain at least one calculation result; the result data obtained at the current detection time is the most recently calculated data in the at least one calculation result.
Wherein, the method further includes:
determining that result data obtained by the current detection time is inaccurate under the condition that no unexecuted calculation task exists in the generated calculation tasks and the preset number of difference values and the first difference values are not all smaller than the preset threshold value;
when the time reaches a first target timestamp, acquiring the last calculated data from the at least one calculation result as result data, and executing the step of calculating a first difference value; the first target timestamp is a minimum timestamp which is greater than the current detection time in a plurality of preset timestamps; and the time length between two adjacent timestamps in the preset plurality of timestamps is preset time length.
Wherein, the method further includes:
determining that result data acquired at the current detection time is inaccurate under the condition that unexecuted calculation tasks exist in the generated calculation tasks;
determining the total time length required for completing the unexecuted calculation tasks according to the number of the unexecuted calculation tasks and the preset time length required for executing one calculation task;
determining a timestamp obtained by delaying the total duration at the current detection time as a reference timestamp;
determining a minimum timestamp which is greater than the reference timestamp in the preset timestamps as a second target timestamp;
and when the time reaches the second target timestamp, acquiring data calculated last time by the second target timestamp from the at least one calculation result as result data, and executing the step of calculating the first difference value.
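The rescheduling rule in the preceding steps can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the function name `second_target_timestamp`, the use of integer seconds for timestamps, and the sample numbers are all hypothetical.

```python
from bisect import bisect_right

def second_target_timestamp(current_detection_time, pending_tasks,
                            seconds_per_task, preset_timestamps):
    """Return the smallest preset timestamp greater than the reference timestamp.

    All names and the integer-seconds time unit are hypothetical.
    preset_timestamps must be sorted in ascending order.
    """
    # Total time needed to finish the unexecuted calculation tasks.
    total_duration = pending_tasks * seconds_per_task
    # Reference timestamp: current detection time delayed by the total duration.
    reference = current_detection_time + total_duration
    # Smallest preset timestamp strictly greater than the reference.
    idx = bisect_right(preset_timestamps, reference)
    if idx == len(preset_timestamps):
        return None  # no preset timestamp lies after the reference
    return preset_timestamps[idx]

preset = list(range(0, 600, 60))  # preset timestamps spaced 60 s apart
next_check = second_target_timestamp(100, pending_tasks=3, seconds_per_task=30,
                                     preset_timestamps=preset)
```

With no pending tasks the same function yields the smallest preset timestamp greater than the current detection time, which matches the description of the first target timestamp.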
The difference value between the result data obtained at the current detection time and the previous result data is calculated as follows:
calculating the difference between the result data obtained at the current detection time and the previous result data;
calculating the ratio of the difference to the target time length as the difference value; the target time length is the length of time between the detection time corresponding to the previous result data and the current detection time.
The application also provides a detection device for data accuracy, which comprises:
the calculating unit is used for calculating a first difference value, wherein the first difference value is a difference value between result data acquired at the current detection time in the result data sequence and previous result data; the result data sequence is obtained by arranging result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change among the result data;
the first acquisition unit is used for acquiring difference values which are obtained by respectively calculating a preset number of detection time before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging detection time according to the sequence from first to last;
and the first determining unit is used for determining that the result data acquired by the current detection time is accurate under the condition that a preset condition is met.
Wherein the preset condition in the first determination unit includes: the acquired preset number of difference values and the first difference value are all smaller than a preset threshold value.
Wherein the preset condition in the first determination unit includes: the acquired preset number of difference values and the first difference value are all smaller than the preset threshold value, and no unexecuted calculation task exists among the generated calculation tasks, wherein a calculation task is used for performing at least one calculation on data generated by a preset device to obtain at least one calculation result; and the result data obtained at the current detection time is the most recently calculated data in the at least one calculation result.
Wherein, the device further includes:
a second determining unit, configured to determine that result data obtained at the current detection time is inaccurate when there is no unexecuted computation task in the generated computation tasks and a preset number of difference values and the first difference values are not all smaller than the preset threshold;
a second obtaining unit, configured to obtain, when time reaches a first target timestamp, data calculated last time from the at least one calculation result as result data, and perform the step of calculating the first difference value; the first target timestamp is a minimum timestamp which is greater than the current detection time in a plurality of preset timestamps; and the time length between two adjacent timestamps in the preset plurality of timestamps is preset time length.
Wherein, the device further includes:
a third determination unit configured to determine that result data obtained at the current detection time is inaccurate when there is an unexecuted calculation task among the generated calculation tasks;
a fourth determining unit, configured to determine, according to the number of the unexecuted computing tasks and a preset time required for executing one computing task, a total time required for completing the unexecuted computing task;
a fifth determining unit, configured to determine that a timestamp obtained by delaying the total duration at the current detection time is a reference timestamp;
a sixth determining unit, configured to determine a minimum timestamp, which is greater than the reference timestamp, of the preset timestamps to be a second target timestamp;
a third obtaining unit, configured to, when the time reaches the second target timestamp, obtain, from the at least one calculation result, data calculated last by the second target timestamp as result data, and perform the step of calculating the first difference value.
Wherein, the computing unit is specifically configured to:
calculating the difference between the result data obtained at the current detection time and the previous result data;
calculating the ratio of the difference to the target time length as the difference value; the target time length is the length of time between the detection time corresponding to the previous result data and the current detection time.
In the method and the device for detecting data accuracy, a first difference value is calculated, wherein the first difference value is the difference value between the result data acquired at the current detection time in a result data sequence and the previous result data; the result data sequence is obtained by arranging result data in order of acquisition time from earliest to latest. Based on the detection time sequence, the difference values calculated at a preset number of detection times before the current detection time are then acquired. In other words, starting from the current detection time and moving backward through the detection time sequence, a run of consecutive difference values is obtained, including the difference value calculated at the current detection time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an exemplary diagram of an application scenario of a data accuracy detection apparatus provided in the present application;
Fig. 2 is a flowchart of an embodiment of a calculation method for generating a value of a preset index corresponding to a timestamp according to the present application;
Fig. 3 is a flowchart of an embodiment of a method for detecting whether calculation tasks have accumulated according to the present application;
Fig. 4 is a flowchart illustrating an embodiment of a method for detecting whether a latest value of a preset index corresponding to a target generation timestamp is accurate according to the present application;
Fig. 5 is a flowchart illustrating an embodiment of a method for detecting whether a latest value of a preset index corresponding to a target generation timestamp is accurate according to the present application;
Fig. 6 is a schematic structural diagram of an embodiment of a data accuracy detection apparatus according to the present application.
Detailed Description
The inventor found that the amount of data on which the calculation of the preset index value corresponding to a certain timestamp is based is very large, so the final value of the preset index can be obtained only after multiple calculations over that data. In the course of obtaining the final value, some intermediate values of the preset index are also produced. If the value obtained for the timestamp is one of these intermediate values rather than the final value, a decision based on it will be erroneous.
Fig. 1 is an exemplary view of an application scenario of the data accuracy detection apparatus of the present application, and fig. 1 includes an advertisement log server and the data accuracy detection apparatus. Wherein, the advertisement log server generates an advertisement log data stream; and the data accuracy detection device is used for detecting whether the latest value of the preset index corresponding to any timestamp generated in the advertisement log data stream is accurate or not.
The data accuracy detection device in fig. 1 may be integrated in the advertisement log server or may be provided independently.
The technical solutions in the embodiments of the present application will be clearly and accurately described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 2 shows a calculation method for generating a value of a preset index corresponding to a timestamp, which includes the following steps:
S201, parsing the data in the message queue to obtain multiple pieces of parsed data.
In the present embodiment, the data producer transmits the produced data to the message queue in real time, so that the subsequent data processing apparatus extracts the data from the message queue and processes the extracted data. The data producer is a device that produces data, such as a log server.
Taking the example of generating the advertisement log by the log server, the generated advertisement log is output to the message queue, and the subsequent data processing device extracts the advertisement log from the message queue and performs processing such as analysis calculation on the extracted advertisement log.
In this embodiment, the message queue makes the path from data generation to data processing real-time. The data producer can send data to the message queue as soon as the data is generated, and a data processing device can process data as soon as it is available in the message queue, without waiting for the data producer to finish generating all of its data. This ensures the real-time performance of the path from data generation to data processing.
In this step, the data in the message queue is parsed to obtain multiple pieces of parsed data, where each piece of parsed data corresponds to the generation timestamp of that piece of data. The generation timestamp of a piece of data is the time at which that piece of data was generated; for example, if the advertisement log server generates 10,000 pieces of data at 14:30:23, the generation timestamp of those 10,000 pieces of data is 14:30:23.
In this embodiment, the process of parsing the data in the message queue may be performed in a distributed manner, for example, a plurality of processors may be used to parse the data in the message queue at the same time.
S202, storing a preset number of pieces of data in the analyzed pieces of data according to the sequence of generating the time stamps.
To make it easier to later determine the generation timestamps whose preset index values need to be checked for accuracy, in this embodiment a preset number of pieces of data are stored from the parsed pieces of data in the order of their generation timestamps. Specifically, the data may be saved in a log delay table.
In this embodiment, the parsing of the data in the message queue may be distributed. In that case, this step is performed each time any one of the processors finishes parsing data.
It should be noted that, after the data in the message queue is analyzed in S201, a plurality of pieces of analyzed data are obtained, and for the analyzed data obtained in S201, the two steps S202 and S203 may be executed in parallel.
S203, calculating the value of the preset index according to the analyzed data, and storing the calculated value of the preset index according to the sequence of calculation.
In this step, the preset index may be determined by a technician according to the actual situation and is not limited in this embodiment. Which contents of the parsed data the value of the preset index is calculated from, and how it is calculated, are prior art and are not described here.
In this embodiment, one generation timestamp corresponds to many pieces of data, so multiple calculations may be needed to obtain the final value of the preset index corresponding to that generation timestamp. In this step, each calculation result of the preset index for each generation timestamp is stored in the order of calculation; that is, the number of stored calculation results grows over time. Each calculation result includes: the generation timestamp of the data, the preset index calculated this time (for example, the exposure rate), the value of the preset index obtained this time (for example, the exposure rate value), and the like.
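The append-only storage described above, and the later lookup of the most recently calculated value, might look like the following sketch. The in-memory dictionary, the function names `save_result` and `latest_value`, and the sample exposure-rate numbers are assumptions for illustration; the application does not specify how the result data table is stored.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the result data table:
# generation timestamp -> list of (index name, value) in calculation order.
result_table = defaultdict(list)

def save_result(generation_ts, index_name, value):
    """Append one calculation result; earlier results are kept."""
    result_table[generation_ts].append((index_name, value))

def latest_value(generation_ts, index_name):
    """Return the most recently calculated value, or None if none exists."""
    for name, value in reversed(result_table.get(generation_ts, [])):
        if name == index_name:
            return value
    return None

save_result("14:30:23", "exposure_rate", 0.41)  # intermediate value
save_result("14:30:23", "exposure_rate", 0.47)  # later calculation
```

Because results are only appended, a reader that looks up the latest value between two calculations sees an intermediate value, which is exactly the situation the detection method is meant to catch.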
In this embodiment, in the process of calculating the value of the preset index for the data corresponding to any timestamp, each calculated index corresponds to a set of preset logic for calculating the data. For convenience of description, a calculation task that has been generated but not yet executed is referred to as an accumulated calculation task.
It should be noted that the above-mentioned S201 to S203 are processes of performing analysis calculation on data in the message queue, and the following S301 to S303 are processes of detecting a state of the calculation task.
Specifically, Fig. 3 illustrates a method for detecting whether calculation tasks have accumulated, which includes the following steps:
S301, detecting at preset time intervals whether accumulated calculation tasks exist; if so, executing S302; otherwise, executing S301 again.
In this embodiment, the process that calculates the parsed data may expose an interface; if accumulated calculation tasks exist, the interface displays the number of currently accumulated calculation tasks. How the accumulated calculation tasks displayed in the interface are determined is prior art and is not described in detail here.
In this step, to detect whether accumulated calculation tasks exist, the number of currently accumulated calculation tasks may be crawled, using crawler technology, from the interface that displays it. If the crawled number is not zero, accumulated calculation tasks currently exist; if it is zero, no accumulated calculation tasks currently exist.
S302, determining the time length required to complete the accumulated calculation tasks as the delay duration.
In this step, each calculation task corresponds to a preset calculation duration, so the product of the total number of accumulated calculation tasks and the preset calculation duration is the time required to complete the accumulated calculation tasks. For convenience of description, this calculated time is referred to as the delay duration.
S303, saving the calculation task state and the delay duration, and returning to S301.
In this embodiment, the calculation task state is either an accumulation state or a normal state: if accumulated calculation tasks exist, the state is the accumulation state; otherwise it is the normal state. The calculation task state is detected once every preset duration, and the detected state and the delay duration are saved in this step.
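One pass through S301 to S303 can be sketched as below. This is a simplified illustration: how the backlog count is obtained (for example, by crawling the interface) is left out, and the names `TaskStatus` and `check_backlog`, the state strings, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskStatus:
    state: str            # "accumulated" or "normal"
    delay_seconds: float  # time needed to clear the backlog

def check_backlog(accumulated_tasks, seconds_per_task):
    """One pass of S301-S303 given an already-observed backlog count."""
    if accumulated_tasks > 0:
        # S302: delay duration = number of accumulated tasks x preset duration.
        return TaskStatus("accumulated", accumulated_tasks * seconds_per_task)
    return TaskStatus("normal", 0.0)

# Three successive checks with backlog counts 0, 5 and 1 (sample values).
status_log = [check_backlog(n, seconds_per_task=2.0) for n in (0, 5, 1)]
```

In a running system the check would be repeated at the preset interval and each `TaskStatus` would be persisted so that the accuracy detection process can read it.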
S201 to S203 and S301 to S303 are bases for determining whether the value of the preset index corresponding to the timestamp is accurate, that is, determining whether the value of the preset index corresponding to any timestamp is accurate, and the determination needs to be performed based on the results obtained in S201 to S203 and S301 to S303.
It should be noted that the three processes of calculating the value of the preset index corresponding to the generated timestamp in S201 to S203, detecting the calculation task in S301 to S303, and detecting whether the value of the preset index corresponding to the generated timestamp is accurate are executed in parallel.
The detection process of whether the value of the preset index corresponding to the timestamp is accurate is described below.
The data corresponding to each generation timestamp produced by the data producer is parsed and calculated in the order of the generation timestamps. The latest values of the preset index corresponding to the generation timestamps therefore become accurate in the same order; that is, the latest value of the preset index corresponding to an earlier generation timestamp becomes accurate first.
For example, if the generation timestamps of the data produced by the data producer are, in order, 14:23:30, 14:23:31 and 14:23:32, then the latest values of the corresponding preset indexes become accurate in that same order: first 14:23:30, then 14:23:31, then 14:23:32.
That is, while the latest value of the preset index corresponding to 14:23:30 is not yet accurate, the latest values corresponding to 14:23:31 and 14:23:32 are not accurate either. Checking the latest values for 14:23:31 and 14:23:32 at the same time as the latest value for 14:23:30 is still being checked would therefore reduce detection efficiency.
Therefore, in this embodiment, in order to improve the detection efficiency, whether the latest value of the preset index corresponding to each generation timestamp sequentially generated by the data generator is accurate is sequentially detected by detecting whether the value of the preset index corresponding to each generation timestamp in the sliding time window is accurate.
The sliding time window refers to a time range formed from a starting time point to an ending time point. In this embodiment, the starting time point is a minimum generating time stamp in the generating time stamps with inaccurate values of the corresponding preset indexes, and the ending time point is any one generating time stamp greater than or equal to the starting time point in the obtained log delay table.
For example, if the minimum timestamp among the currently inaccurate generation timestamps is 15:20:25, and the generation timestamps in the log delay table are 15:20:26, 15:20:27, 15:20:28 and 15:20:29, then the end time point may be any one of the four generation timestamps in the log delay table.
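The window selection can be illustrated with a short sketch. It assumes that the timestamps inside the window are drawn from the inaccurate generation timestamps together with those in the log delay table, and that timestamps are same-day "HH:MM:SS" strings; the function name `sliding_window` and these choices are assumptions, not details stated in the application.

```python
def sliding_window(inaccurate_ts, log_delay_ts, end_ts):
    """Return the generation timestamps inside [start, end_ts], sorted.

    start is the smallest generation timestamp whose value is still inaccurate.
    Timestamps are "HH:MM:SS" strings, which sort chronologically within a day.
    """
    start = min(inaccurate_ts)
    if end_ts < start:
        raise ValueError("end point must not precede the start point")
    candidates = set(inaccurate_ts) | set(log_delay_ts)
    return sorted(ts for ts in candidates if start <= ts <= end_ts)

window = sliding_window(
    inaccurate_ts=["15:20:25"],
    log_delay_ts=["15:20:26", "15:20:27", "15:20:28", "15:20:29"],
    end_ts="15:20:27",
)
```

Choosing 15:20:27 as the end point here gives a window of three generation timestamps, each of which is then checked for accuracy.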
In this step, it is necessary to determine whether the value of the preset index corresponding to each generation timestamp included in the sliding time window is accurate. For example, the sliding time window includes 3 generating timestamps, and in this step, it is necessary to detect whether values of preset indexes corresponding to the 3 generating timestamps are accurate.
In this embodiment, the process of detecting whether the value of the preset index is accurate is the same for every generation timestamp in the sliding time window. For convenience of description, this embodiment takes any one generation timestamp in the sliding time window (the target generation timestamp) as an example to describe that process.
Specifically, fig. 4 discloses a method for detecting whether the latest value of the preset index corresponding to the target generation timestamp is accurate in the embodiment of the present application.
The detection process is executed cyclically. Detection times are spaced at a preset detection interval, and each time one of them is reached, the detection process is triggered once. The cycle continues until the preset condition is judged to be met in some execution of the detection process, which indicates that the value of the preset index corresponding to the target generation timestamp is accurate.
Each execution of the detection process corresponds to one detection time. In each execution, the most recently calculated value among the calculated values (calculation results) of the preset index needs to be obtained, that is, the latest value of the preset index. For convenience of description, the calculation result obtained in each execution is called result data.
Since the process in this embodiment is executed cyclically, for convenience of description the sequence of detection times arranged from earliest to latest is referred to as the detection time sequence, and the sequence of result data arranged by acquisition time from earliest to latest is referred to as the result data sequence.
Specifically, the method can comprise the following steps:
S401, when the target detection time is reached, obtaining the latest value of the preset index corresponding to the target generation timestamp from the result data table.
In the present embodiment, for convenience of description, any one of the generation time stamps in the sliding time window is referred to as a target generation time stamp. In this step, the initial value of the target detection time is a first detection time among a plurality of detection times at intervals of a preset detection time period, and the first detection time may be set by a user.
In this embodiment, since the value (calculation result) obtained by each calculation of the preset index is recorded in the result data table according to the calculation sequence, in this step, the latest value of the preset index corresponding to the target generation timestamp is obtained from the result data table.
Calculating the value of the preset index involves at least one calculation, and each calculation yields a calculation result. The calculation of the preset index value is also independent of the data accuracy detection process in this embodiment. Therefore, the latest value obtained in this step is the value most recently calculated as of the current detection time. For convenience of description, this latest value is referred to as result data, that is, the result data obtained at the current detection time.
S402, calculating a difference value between the latest values of the preset indexes obtained last time when the latest value of the preset index obtained this time is obtained, and obtaining a first difference value.
In this embodiment, this time refers to the current detection time, and the last time refers to the detection time adjacent to the current detection time in the historical detection time. If the embodiment is executed for the first time, the last time does not exist, that is, the latest value of the preset index acquired last time is null.
In this step, the difference value reflects the degree of change between the latest values of the preset index obtained in two adjacent acquisitions. Specifically, the difference value may be a difference or a rate of change, where the difference is the latest value of the preset index obtained this time minus the latest value obtained last time, and the rate of change is the ratio of that difference to the time interval between this acquisition and the last acquisition. Of course, in practical applications the difference value may take other forms; its specific content is not limited in this embodiment, as long as it reflects the degree of change between the latest values of the preset index corresponding to the target generation timestamp obtained in two adjacent acquisitions.
For convenience of description, a difference value between the latest value of the preset index obtained this time and the latest value of the preset index obtained last time is referred to as a first difference value.
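As a minimal sketch of S402, the function below computes either form of the difference value described above. The function name and the use of `None` for a missing previous value are assumptions made for illustration.

```python
from typing import Optional


def difference_value(current: float, previous: Optional[float],
                     interval: Optional[float] = None) -> Optional[float]:
    """Difference value between two adjacent acquisitions.

    Returns None when there is no previous value (first execution).
    With interval=None the plain difference is returned; otherwise the
    rate of change, i.e. the difference divided by the time interval.
    """
    if previous is None:
        return None
    diff = current - previous
    if interval is None:
        return diff
    if interval <= 0:
        raise ValueError("interval must be positive")
    return diff / interval


first_difference = difference_value(120.0, 100.0)          # plain difference
first_rate = difference_value(120.0, 100.0, interval=60.0)  # rate of change
```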
S403, obtaining difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence.
If this step is performed for the first time, the difference values calculated at the preset number of detection times before the current detection time are null.
S404, judging whether a preset condition is met, if so, executing S405, and if not, executing S406.
In this step, the preset condition includes: the first difference value and the acquired preset number of difference values are all smaller than a preset threshold, where the same threshold value is used for every comparison.
If this step is performed for the first time, there is only one difference value, namely the first difference value, and the difference values acquired for earlier detection times are null. A null difference value may be regarded as infinite, so on the first execution the judgment result is necessarily that the preset condition is not satisfied.
It should be noted that the preset condition in this step is only one implementation manner, and in practice, a person skilled in the art may determine the specific content of the preset condition according to the actual situation.
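The judgment in S404 can be sketched as follows. The padding of missing history with null values follows the "regarded as infinite" rule stated above; comparing magnitudes with `abs()` is an assumption of this sketch, since the text does not say whether the sign of a difference value matters, and the function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def condition_met(first_diff: Optional[float],
                  previous_diffs: Sequence[Optional[float]],
                  preset_number: int,
                  threshold: float) -> bool:
    """True if the first difference value and the last `preset_number`
    earlier difference values are all smaller than the threshold."""
    recent = list(previous_diffs)[-preset_number:] if preset_number > 0 else []
    # Missing history is padded with None, which is treated as infinite.
    padded = [None] * (preset_number - len(recent)) + recent
    diffs = [first_diff] + padded
    return all((math.inf if d is None else abs(d)) < threshold for d in diffs)


stable = condition_met(0.5, [0.1, 0.2], preset_number=2, threshold=1.0)
first_run = condition_met(None, [], preset_number=2, threshold=1.0)
```

On the first execution the history is empty, so the padded null values make the condition fail, as the text requires.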
S405, determining that the latest value of the preset index corresponding to the target generation timestamp is accurate.
In this step, it is determined that the latest value of the preset index corresponding to the target generation timestamp is accurate.
S406, it is determined that the latest value of the preset index corresponding to the target generation timestamp is inaccurate.
In this step, it is determined that the latest value of the preset index corresponding to the target generation timestamp is inaccurate.
And S407, updating the target detection time.
In this embodiment, whether the latest value of the preset index corresponding to the target generation timestamp is accurate is determined cyclically, and the cycle interval is the preset detection duration; that is, the detection times are distributed at intervals of the preset detection duration. For example, if the preset detection duration is one minute, the detection times are 14:30, 14:31, 14:32, and so on.
In this step, the target detection time is updated to the minimum timestamp greater than the current detection time among the preset timestamps, where the current detection time is the target detection time reached in S401 during the current execution. For example, if the target detection time in S401 is 14:30 in the current execution, the target detection time after this step is 14:31, so that S401 is executed again when the time reaches 14:31.
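The update in S407 can be sketched with a generator over the detection time sequence. The names `detection_times` and `next_target` are hypothetical, and the linear scan is chosen for readability rather than efficiency.

```python
from datetime import datetime, timedelta
from itertools import count
from typing import Iterator


def detection_times(first: datetime, period: timedelta) -> Iterator[datetime]:
    """Yield the detection time sequence: first, first + period, ..."""
    for k in count():
        yield first + k * period


def next_target(first: datetime, period: timedelta,
                current: datetime) -> datetime:
    """Smallest detection time strictly greater than `current`."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    for t in detection_times(first, period):
        if t > current:
            return t
    raise AssertionError("unreachable: the sequence is unbounded")


first = datetime(2018, 12, 30, 14, 30)
target = next_target(first, timedelta(minutes=1), current=first)
```

With a first detection time of 14:30 and a one-minute period, the call above reproduces the 14:30 to 14:31 example in the text.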
In practice, the process that calculates the value of the preset index may fail, causing calculation tasks to accumulate. While tasks are accumulated, S404 may find that the acquired preset number of difference values and the first difference value are all smaller than the preset threshold even though part of the data corresponding to the target generation timestamp has not yet been calculated; that is, the latest value of the preset index obtained so far is not the final calculation result for all the data corresponding to that generation timestamp. Because the preset condition of S404 is nevertheless satisfied, the latest value would be judged accurate, so the reliability of such a detection result is low.
To improve the reliability of the detection result, Fig. 5 shows another process for detecting whether the latest value of the preset index corresponding to the target generation timestamp is accurate.
In each execution of this detection process, the preset condition requires, in addition to the acquired preset number of difference values and the first difference value all being smaller than the preset threshold, that there be no unexecuted calculation task among the generated calculation tasks. That is, at a given detection time, the latest value of the preset index corresponding to the target generation timestamp is determined to be accurate only when both requirements hold; otherwise the latest value is determined to be inaccurate at that detection time, and when the next target detection time is reached the process is executed again in the same way, until the preset condition is met and the latest value is determined to be accurate.
Specifically, the process may include the steps of:
S501, when the target detection time is reached, obtaining the latest value of the preset index corresponding to the target generation timestamp from the result data table.
S502, calculating a difference value between the latest value of the preset index obtained this time and the latest value of the preset index obtained last time to obtain a first difference value.
S503, obtaining difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence.
The implementation details of S501 to S503 are the same as S401 to S403 corresponding to fig. 4, and are not described again here.
S504, judging whether preset conditions are met, if so, executing S505, and if not, executing S506.
In this step, the preset conditions include: the acquired preset number of difference values and the first difference value are both smaller than a preset threshold value, and no unexecuted calculation task exists in the generated calculation tasks.
The preset condition is not met in either of the following cases:
Case 1: the generated calculation tasks include unexecuted calculation tasks;
Case 2: the generated calculation tasks include no unexecuted calculation task, but the acquired preset number of difference values and the first difference value are not all smaller than the preset threshold.
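The three possible outcomes of S504 can be sketched as a small classifier. The `Outcome` enum, the function name, and the representation of the task backlog as an integer count are assumptions for illustration; as before, null difference values fail the comparison and magnitudes are compared with `abs()` as an assumption.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    ACCURATE = "accurate"
    PENDING_TASKS = "inaccurate: unexecuted calculation tasks exist"
    NOT_STABLE = "inaccurate: difference values not all below threshold"


def judge(first_diff: Optional[float],
          previous_diffs: Sequence[Optional[float]],
          threshold: float,
          unexecuted_tasks: int) -> Outcome:
    """Classify one detection according to the S504 condition."""
    # First case: a backlog exists, so the latest value may be stale.
    if unexecuted_tasks > 0:
        return Outcome.PENDING_TASKS
    diffs = [first_diff, *previous_diffs]
    if all(d is not None and abs(d) < threshold for d in diffs):
        return Outcome.ACCURATE
    # Second case: no backlog, but the value is still changing.
    return Outcome.NOT_STABLE


outcome = judge(0.0, [0.0, 0.0], threshold=1.0, unexecuted_tasks=4)
```

The example call shows why the backlog check matters: the difference values are all zero, yet the result is not reported as accurate because tasks remain unexecuted.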
And S505, determining that the latest value of the preset index corresponding to the target generation timestamp is accurate.
S506, it is determined that the latest value of the preset index corresponding to the target generation timestamp is inaccurate.
S507, updating the target detection time.
In this step, if there is no unexecuted calculation task among the generated calculation tasks, but the acquired preset number of difference values and the first difference value are not all smaller than the preset threshold, the detection time that next triggers the detection process is the minimum timestamp greater than the current detection time among the plurality of detection times distributed at intervals of the preset detection duration. For example, if the current detection time is 14:30 and the preset detection duration is 1 minute, the target detection time is 14:31. For convenience of description, the detection time determined in this case is referred to as the first target detection time, and the first target detection time is used as the target detection time.
If there are unexecuted calculation tasks among the generated calculation tasks, the latest value of the preset index obtained in this execution may be the same as the latest value obtained in the previous execution. Therefore, even if the acquired preset number of difference values and the first difference value are all smaller than the preset threshold, it cannot be ensured that the latest value of the preset index corresponding to the target generation timestamp is accurate.
In this case a delay duration can be determined, namely a duration after which the accumulated calculation tasks can be completed. The detection time that triggers the next detection is then the minimum timestamp, among the plurality of preset detection timestamps, that is later than the current detection time postponed by the delay duration. For convenience of description, this detection time is referred to as the second target detection time; the second target detection time is used as the target detection time, and the detection process is executed again when the target detection time is reached.
Specifically, the determining method for determining the detection time for triggering the next detection process includes:
A1, calculating the time point obtained by postponing the current detection time by the delay duration, as a reference timestamp.
A2, determining the minimum time stamp larger than the reference time stamp as the second target detection time from a plurality of detection times distributed in the preset detection time length.
For example, if the current detection time is 14:30, the preset detection duration is 2 minutes, and the delay duration is 3 minutes, then the reference timestamp is 14:33 and the second target detection time is 14:34.
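Steps A1 and A2 can be sketched with `datetime` arithmetic. The function name and the assumption that detection times lie on a regular grid starting at a known first detection time are illustrative choices, not details given in the text.

```python
from datetime import datetime, timedelta


def second_target_detection_time(first: datetime, period: timedelta,
                                 current: datetime,
                                 delay: timedelta) -> datetime:
    """A1: postpone the current detection time by the delay duration.
    A2: pick the smallest scheduled detection time greater than that."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    reference = current + delay                      # A1: reference timestamp
    # A2: number of whole periods up to the reference, plus one, gives the
    # first grid point strictly after the reference (never before `first`).
    steps = max((reference - first) // period + 1, 0)
    return first + steps * period


start = datetime(2018, 12, 30, 14, 30)
second_target = second_target_detection_time(
    first=start,
    period=timedelta(minutes=2),
    current=start,
    delay=timedelta(minutes=3),
)
```

With the values from the example (14:30, a 2-minute period, a 3-minute delay), the reference timestamp is 14:33 and the function returns 14:34.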
Beneficial effects: in this embodiment, data generated by a data generator in a message queue is analyzed, and the analyzed data corresponding to any generation timestamp is calculated sequentially to obtain a plurality of values of the preset index corresponding to that generation timestamp; whether the latest value of the preset index corresponding to the generation timestamp is accurate is then detected based on those values. Specifically, in each pass of a cyclic detection process, the latest value of the preset index corresponding to the generation timestamp is acquired, the first difference value is determined from the acquired latest value, and whether the latest value of the preset index corresponding to the generation timestamp is accurate is judged.
One way of judging whether the latest value of the preset index is accurate may include: the acquired preset number of difference values and the first difference value are both smaller than a preset threshold value.
In practice, when the program that calculates the value of the preset index corresponding to a timestamp fails, unexecuted calculation tasks exist among the generated calculation tasks, so calculation tasks accumulate. Because of the failure, the latest value of the preset index obtained in several consecutive detection processes is the same intermediate value, so the acquired preset number of difference values and the first difference value are all judged to be smaller than the preset threshold even though the latest value of the preset index is not accurate. To improve the reliability of a detection result that reports the latest value as accurate, this embodiment additionally determines whether an unexecuted calculation task exists among the generated calculation tasks. Only when the acquired preset number of difference values and the first difference value are all smaller than the preset threshold and no unexecuted calculation task exists is the latest value of the preset index corresponding to the generation timestamp determined to be accurate, and the detection result is then more reliable.
In order to improve the detection efficiency, when an unexecuted computing task exists in the generated computing tasks, the total time length required by the unexecuted computing task is calculated, and the next detection process is carried out at the second target detection time after the current detection time is delayed by the total time length.
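A minimal sketch of the total duration calculation follows. It assumes, as in the description of the fourth determining unit, that the total is the number of unexecuted tasks multiplied by a preset duration for one task; the function name and the sample figures (6 tasks, 30 seconds each) are hypothetical.

```python
from datetime import timedelta


def total_duration(unexecuted_tasks: int, per_task: timedelta) -> timedelta:
    """Total time needed to finish the backlog, assuming every task takes
    the same preset duration and tasks are executed one after another."""
    if unexecuted_tasks < 0:
        raise ValueError("unexecuted_tasks must be non-negative")
    return unexecuted_tasks * per_task


delay = total_duration(6, timedelta(seconds=30))
```

Skipping the detections that would fall inside this delay avoids repeated checks that are known in advance to report an inaccurate result.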
Fig. 6 shows a device for detecting data accuracy, which includes:
a calculating unit 601, configured to calculate a first difference value, where the first difference value is a difference value between result data obtained at a current detection time in a result data sequence and previous result data; the result data sequence is obtained by arranging result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change among the result data;
a first obtaining unit 602, configured to obtain difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging detection time according to the sequence from first to last;
a first determining unit 603, configured to determine that, when a preset condition is met, result data obtained at the current detection time is accurate.
In one implementation, the preset condition in the first determining unit 603 includes: the acquired preset number of difference values and the first difference value are all smaller than a preset threshold.
In another implementation, the preset condition in the first determining unit 603 includes: the acquired preset number of difference values and the first difference value are all smaller than the preset threshold, and no unexecuted calculation task exists among the generated calculation tasks, where the calculation tasks are used for calculating data generated by a preset device at least once to obtain at least one calculation result; and the result data obtained at the current detection time is the most recently calculated data among the at least one calculation result.
The device may further include:
a second determining unit, configured to determine that result data obtained at the current detection time is inaccurate when there is no unexecuted computation task in the generated computation tasks and a preset number of difference values and the first difference values are not all smaller than the preset threshold;
a second obtaining unit, configured to obtain, when time reaches a first target timestamp, data calculated last time from the at least one calculation result as result data, and perform the step of calculating the first difference value; the first target timestamp is a minimum timestamp which is greater than the current detection time in a plurality of preset timestamps; and the time length between two adjacent timestamps in the preset plurality of timestamps is preset time length.
The device may further include:
a third determination unit configured to determine that result data obtained at the current detection time is inaccurate when there is an unexecuted calculation task among the generated calculation tasks;
a fourth determining unit, configured to determine, according to the number of the unexecuted computing tasks and a preset time required for executing one computing task, a total time required for completing the unexecuted computing task;
a fifth determining unit, configured to determine that a timestamp obtained by delaying the total duration at the current detection time is a reference timestamp;
a sixth determining unit, configured to determine a minimum timestamp, which is greater than the reference timestamp, of the preset timestamps to be a second target timestamp;
a third obtaining unit, configured to, when the time reaches the second target timestamp, obtain, from the at least one calculation result, data calculated last by the second target timestamp as result data, and perform the step of calculating the first difference value.
The calculating unit 601 is specifically configured to:
calculating the difference between the result data obtained at the current detection time and the previous result data;
calculating the ratio of the difference to a target duration as the difference value, where the target duration is the duration between the detection time corresponding to the previous result data and the current detection time.
The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for detecting data accuracy is characterized in that the method is used for detecting whether the latest value of a preset index corresponding to any timestamp generated in an advertisement log data stream is accurate or not, and the method comprises the following steps:
calculating a first difference value, wherein the first difference value is a difference value between result data acquired at the current detection time in a result data sequence and previous result data; the result data sequence is obtained by arranging result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change among the result data;
acquiring difference values respectively calculated at a preset number of detection times before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging detection time according to the sequence from first to last;
determining that the result data obtained by the current detection time is accurate under the condition that a preset condition is met;
wherein the preset conditions include: the acquired preset number of difference values and the first difference value are both smaller than the preset threshold value, and no unexecuted calculation task exists in the generated calculation tasks, wherein the calculation tasks are used for calculating data generated by preset equipment at least once to obtain at least one calculation result; and the result data obtained by the current detection time is the data calculated last time in the at least one calculation result.
2. The method of claim 1, further comprising:
determining that result data obtained by the current detection time is inaccurate under the condition that no unexecuted calculation task exists in the generated calculation tasks and the preset number of difference values and the first difference values are not all smaller than the preset threshold value;
when the time reaches a first target timestamp, acquiring the last calculated data from the at least one calculation result as result data, and executing the step of calculating a first difference value; the first target timestamp is a minimum timestamp which is greater than the current detection time in a plurality of preset timestamps; and the time length between two adjacent timestamps in the preset plurality of timestamps is preset time length.
3. The method of claim 1, further comprising:
determining that result data acquired at the current detection time is inaccurate under the condition that unexecuted calculation tasks exist in the generated calculation tasks;
determining the total time length required for completing the unexecuted calculation tasks according to the number of the unexecuted calculation tasks and the preset time length required for executing one calculation task;
determining a timestamp obtained by delaying the total duration at the current detection time as a reference timestamp;
determining a minimum timestamp which is greater than the reference timestamp in the preset timestamps as a second target timestamp;
and when the time reaches the second target timestamp, acquiring data calculated last time by the second target timestamp from the at least one calculation result as result data, and executing the step of calculating the first difference value.
4. The method according to claim 1, wherein the difference value between the result data obtained at the current detection time and the previous result data is calculated by:
calculating the difference between the result data obtained at the current detection time and the previous result data;
calculating the ratio of the difference to a target duration as the difference value, where the target duration is the duration between the detection time corresponding to the previous result data and the current detection time.
5. A device for detecting data accuracy, characterized in that the device is used for detecting whether the latest value of a preset index corresponding to any generation timestamp in an advertisement log data stream is accurate, and comprises:
the calculating unit is used for calculating a first difference value, wherein the first difference value is a difference value between result data acquired at the current detection time in the result data sequence and previous result data; the result data sequence is obtained by arranging result data according to the sequence of the acquired time from first to last; the difference value is a parameter value reflecting the degree of change among the result data;
the first acquisition unit is used for acquiring difference values which are obtained by respectively calculating a preset number of detection time before the current detection time based on the detection time sequence; the detection time sequence is obtained by arranging detection time according to the sequence from first to last;
the first determining unit is used for determining that the result data acquired by the current detection time is accurate under the condition that a preset condition is met;
wherein the preset condition in the first determination unit includes: the acquired preset number of difference values and the first difference value are both smaller than the preset threshold value, and no unexecuted calculation task exists in the generated calculation tasks, wherein the calculation tasks are used for calculating data generated by preset equipment at least once to obtain at least one calculation result; and the result data obtained by the current detection time is the data calculated last time in the at least one calculation result.
6. The apparatus of claim 5, further comprising:
a second determining unit, configured to determine that result data obtained at the current detection time is inaccurate when there is no unexecuted computation task in the generated computation tasks and a preset number of difference values and the first difference values are not all smaller than the preset threshold;
a second obtaining unit, configured to obtain, when time reaches a first target timestamp, data calculated last time from the at least one calculation result as result data, and perform the step of calculating the first difference value; the first target timestamp is a minimum timestamp which is greater than the current detection time in a plurality of preset timestamps; and the time length between two adjacent timestamps in the preset plurality of timestamps is preset time length.
7. The apparatus of claim 5, further comprising:
a third determination unit configured to determine that result data obtained at the current detection time is inaccurate when there is an unexecuted calculation task among the generated calculation tasks;
a fourth determining unit, configured to determine, according to the number of the unexecuted computing tasks and a preset time required for executing one computing task, a total time required for completing the unexecuted computing task;
a fifth determining unit, configured to determine that a timestamp obtained by delaying the total duration at the current detection time is a reference timestamp;
a sixth determining unit, configured to determine a minimum timestamp, which is greater than the reference timestamp, of the preset timestamps to be a second target timestamp;
a third obtaining unit, configured to, when the time reaches the second target timestamp, obtain, from the at least one calculation result, data calculated last by the second target timestamp as result data, and perform the step of calculating the first difference value.
8. The apparatus according to claim 5, wherein the computing unit is specifically configured to:
calculating the difference between the result data obtained at the current detection time and the previous result data;
calculating the ratio of the difference to a target duration as the difference value, where the target duration is the duration between the detection time corresponding to the previous result data and the current detection time.
CN201811648569.3A 2018-12-30 2018-12-30 Method and device for detecting data accuracy Active CN109697247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811648569.3A CN109697247B (en) 2018-12-30 2018-12-30 Method and device for detecting data accuracy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811648569.3A CN109697247B (en) 2018-12-30 2018-12-30 Method and device for detecting data accuracy

Publications (2)

Publication Number Publication Date
CN109697247A CN109697247A (en) 2019-04-30
CN109697247B true CN109697247B (en) 2021-05-18

Family

ID=66233122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811648569.3A Active CN109697247B (en) 2018-12-30 2018-12-30 Method and device for detecting data accuracy

Country Status (1)

Country Link
CN (1) CN109697247B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928255B (en) * 2019-11-20 2021-02-05 珠海格力电器股份有限公司 Data anomaly statistical alarm method and device, storage medium and electronic equipment
CN111563078B (en) * 2020-07-15 2020-11-10 浙江大华技术股份有限公司 Data quality detection method and device based on time sequence data and storage device
CN113189664B (en) * 2021-04-26 2022-04-22 拉扎斯网络科技(上海)有限公司 Object placing state detection method and device
CN115017099A (en) * 2022-08-08 2022-09-06 深圳市华曦达科技股份有限公司 Distributed network task cooperation method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871190A (en) * 2016-09-23 2018-04-03 阿里巴巴集团控股有限公司 A kind of operational indicator monitoring method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965773B (en) * 2015-07-09 2019-06-04 网易(杭州)网络有限公司 Terminal, Caton detection method, device and game Caton detection method, device
CN105187863B (en) * 2015-07-31 2018-06-08 小米科技有限责任公司 Play the method and device of advertisement
KR101896002B1 (en) * 2015-10-15 2018-09-06 (주) 솔텍시스템 Server for efficiently compressing real time processing data
CN105653407A (en) * 2015-12-08 2016-06-08 网易(杭州)网络有限公司 Terminal, jam measuring method, device, game jam measuring method and apparatus
CN106095787A (en) * 2016-05-30 2016-11-09 重庆大学 A kind of Symbolic Representation method of time series data
CN107968731B (en) * 2016-10-20 2019-03-15 腾讯科技(深圳)有限公司 The aobvious number method for detecting abnormality of one kind and server
CN107423435B (en) * 2017-08-04 2020-05-12 电子科技大学 Multi-level anomaly detection method for multi-dimensional space-time data
CN108959174A (en) * 2018-07-27 2018-12-07 中国大唐集团新能源科学技术研究院有限公司 A kind of calculation method of wind power system generated energy

Also Published As

Publication number Publication date
CN109697247A (en) 2019-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant