CN111882349B - Data processing method, device and storage medium - Google Patents


Info

Publication number
CN111882349B
Authority
CN
China
Prior art keywords
target object
data
exposure
time
determining
Legal status
Active
Application number
CN202010673436.2A
Other languages
Chinese (zh)
Other versions
CN111882349A (en)
Inventor
冯志祥
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010673436.2A
Publication of CN111882349A
Application granted
Publication of CN111882349B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q30/00 Commerce
            • G06Q30/02 Marketing; Price estimation or determination; Fundraising
              • G06Q30/0201 Market modelling; Market analysis; Collecting market data
                • G06Q30/0202 Market predictions or forecasting for commercial activities
              • G06Q30/0241 Advertisements
                • G06Q30/0242 Determining effectiveness of advertisements
                  • G06Q30/0244 Optimization
                • G06Q30/0273 Determination of fees for advertising
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data processing method, a device, and a storage medium. The method comprises: acquiring log data of a target object within a preset time period; determining, based on the log data of the target object within the preset time period, a time difference between the Nth exposure time of the target object and the Nth click time for the target object, where N is a positive integer; when the time difference is less than or equal to a first threshold, calculating an expected exposure duration of the target object; when the expected exposure duration of the target object is less than a preset duration threshold and the time difference is greater than a second threshold, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data; and deleting the abnormal data from the log data. When data processed by this method is used as training data, the click-through rate prediction model for the target object achieves higher prediction accuracy.

Description

Data processing method, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, and a storage medium.
Background
Internet advertising has developed rapidly. Compared with traditional media, it propagates faster and, as a new advertising medium, has become the preferred way for small and medium-sized enterprises to raise their profile. As internet technology develops, many enterprises place advertisements to take advantage of the internet's fast information propagation, mainly to increase the reach of those advertisements, which makes predicting advertisement clicks all the more important.
Personalized advertisement recommendation is booming. If the recommendation algorithm is not accurate enough, it harms both the user experience and the revenue of the advertising platform. To capture user behavior quickly and update the recommendation algorithm promptly, most products rely on streaming computation.
If click data generated in period A flows back (is reported) with a delay and lands in period B, and the exposure volume in period B is smaller than in period A, the overall click-through rate observed in period B is falsely high, so a model updated with period-B data overestimates the click-through rate. Some advertisers exploit this flaw by deliberately creating advertisements with a very small budget that is used up within 1-2 minutes. Exposure then stops after about 2 minutes because the budget is exhausted, but because clicks flow back slowly, the click-through rate observed for that short window is abnormally high. A model trained on this data therefore overestimates the click-through rate the next time an advertisement of the same type is delivered.
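A toy calculation can make the delayed-click effect concrete. The sketch below is a simplified model and not part of the patent: it assumes every period has the same true CTR and that a fixed fraction of each period's clicks is reported in the following period. The function `observed_ctrs` and its parameters are hypothetical.

```python
def observed_ctrs(exposures, true_ctr, delayed_fraction):
    """Per-period CTR as a streaming trainer would observe it.

    Hypothetical model: every period has the same true CTR, and a fixed
    fraction of each period's clicks is reported in the *next* period.
    """
    observed = []
    carried_clicks = 0.0  # clicks from the previous period that arrive late
    for n_exposures in exposures:
        clicks = n_exposures * true_ctr
        on_time = clicks * (1 - delayed_fraction)
        seen_clicks = on_time + carried_clicks
        carried_clicks = clicks * delayed_fraction
        if n_exposures:
            observed.append(seen_clicks / n_exposures)
        else:
            # Clicks without exposures: the ratio is unbounded (or undefined if no clicks).
            observed.append(float("inf") if seen_clicks else float("nan"))
    return observed
```

With 10,000 exposures in period A, 1,000 in period B, a true CTR of 2%, and half of each period's clicks arriving one period late, `observed_ctrs([10_000, 1_000], 0.02, 0.5)` gives roughly 1% for period A and 11% for period B. A window with zero exposures but late clicks yields an infinite ratio, mirroring the case of click data without exposure data.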
If an advertisement stops exposure right after the last time point of the preceding training period, the current training window contains only its click data and no exposure data. This misleads the model into overestimating the click-through rate along dimensions such as the advertisement and the advertiser, and advertisements that should not be exposed get exposed because of these inflated estimates, which harms the user experience and can easily cause economic loss to the advertising platform. Some advertisers game this situation, for example by suddenly stopping delivery for a while and then resuming it at large volume (referred to as "stop-and-restart"), in order to mislead the model into higher estimates and thereby win more exposure opportunities with lower bids under otherwise identical conditions. An advertiser can also set a small budget that is exhausted within a few minutes so that delivery stops automatically, achieving the same effect as a "stop-and-restart".
The general ranking formula for advertisement recommendation is bid × pCTR × pCVR × quality factor. With the other variables unchanged, an overestimated pCTR lets an advertiser obtain exposure with a lower bid, achieving "low bid, high exposure". This correspondingly causes revenue loss for the platform, and it also degrades the user experience because the accuracy of advertisement recommendation suffers.
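The effect of an inflated pCTR on this ranking formula can be illustrated numerically. The sketch below is hypothetical: the function names, the competitor's numbers, and the simplification that an advertiser only needs to match the competitor's score are for illustration only.

```python
def ranking_score(bid, pctr, pcvr, quality):
    """bid x pCTR x pCVR x quality factor, the general ranking formula."""
    return bid * pctr * pcvr * quality


def min_matching_bid(competitor_score, pctr, pcvr, quality):
    """Smallest bid whose ranking score equals the competitor's score."""
    return competitor_score / (pctr * pcvr * quality)


# Hypothetical competitor whose pCTR is estimated correctly.
competitor = ranking_score(bid=2.0, pctr=0.02, pcvr=0.1, quality=1.0)

# Same ad, first with an accurate pCTR, then with a pCTR inflated 2x by bad training data.
honest_bid = min_matching_bid(competitor, pctr=0.02, pcvr=0.1, quality=1.0)
inflated_bid = min_matching_bid(competitor, pctr=0.04, pcvr=0.1, quality=1.0)
```

Doubling the estimated pCTR (0.02 to 0.04) halves the bid needed to match the competitor, from about 2.0 to about 1.0, which is the "low bid, high exposure" effect described above.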
Consequently, in the prior art the data used to train the advertisement click-through rate prediction model contains large errors, which lowers the accuracy of click-through rate prediction. It is therefore necessary to provide a data processing method, device, and storage medium that can accurately identify abnormal click data, thereby improving the accuracy of the model's training data and, in turn, the model's prediction accuracy.
Disclosure of Invention
The application provides a data processing method, a data processing device, and a storage medium that can accurately identify abnormal click data, thereby improving the accuracy of the model's training data and the model's prediction accuracy.
In one aspect, the present application provides a data processing method, including:
acquiring log data of a target object within a preset time period;
determining, based on the log data of the target object within the preset time period, a time difference between the Nth exposure time of the target object and the Nth click time for the target object, where N is a positive integer;
when the time difference is less than or equal to a first threshold, calculating an expected exposure duration of the target object;
when the expected exposure duration of the target object is less than a preset duration threshold and the time difference is greater than a second threshold, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data;
and deleting the abnormal data from the log data.
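The claimed steps can be sketched as a single filter over matched exposure/click pairs. This is a minimal sketch, not the applicant's implementation: the `ExposureClick` record, the threshold values, and passing the expected exposure duration as a callable are all assumptions. Pairs whose time difference exceeds the first threshold are not covered by these claim steps (the embodiment at S205 treats them as abnormal), so this sketch simply keeps them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExposureClick:
    """The Nth exposure of a target object and the Nth click on it (times in seconds)."""
    exposure_time: float
    click_time: float


def remove_abnormal(
    pairs: List[ExposureClick],
    first_threshold: float,
    second_threshold: float,
    duration_threshold: float,
    expected_exposure_duration: Callable[[], float],
) -> List[ExposureClick]:
    """Return the pairs that are not flagged as abnormal by the claimed steps."""
    duration: Optional[float] = None
    kept: List[ExposureClick] = []
    for pair in pairs:
        diff = pair.click_time - pair.exposure_time
        abnormal = False
        if diff <= first_threshold:
            if duration is None:
                # The expected exposure duration belongs to the target object,
                # so it is computed once, and only when some pair needs it.
                duration = expected_exposure_duration()
            abnormal = duration < duration_threshold and diff > second_threshold
        if not abnormal:
            kept.append(pair)  # deleting abnormal data = dropping both exposure and click
    return kept
```

For example, with a 900 s first threshold, a 300 s second threshold, a 600 s duration threshold, and a short-lived object whose expected exposure duration is 120 s, the pair with a 640 s delay is removed while the 30 s pair is kept.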
Another aspect provides a data processing apparatus, the apparatus comprising:
a log data acquisition module, configured to acquire log data of a target object within a preset time period;
a time difference determination module, configured to determine, based on the log data of the target object within the preset time period, a time difference between the Nth exposure time of the target object and the Nth click time for the target object, where N is a positive integer;
an expected exposure duration calculation module, configured to calculate the expected exposure duration of the target object when the time difference is less than or equal to a first threshold;
an abnormal data determination module, configured to determine, when the expected exposure duration of the target object is less than a preset duration threshold and the time difference is greater than a second threshold, the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data;
and a deletion module, configured to delete the abnormal data from the log data.
Another aspect provides a data processing apparatus comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the data processing method as described above.
Another aspect provides a computer storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement the data processing method as described above.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing method.
The data processing method, device, and equipment provided by the present application have the following technical effects:
First, log data of the target object within a preset time period is acquired, the time difference between each exposure time of the target object and the corresponding click time is determined from the log data, and part of the abnormal data is screened out based on this time difference. Second, the data is screened again based on the exposure duration of the target object. Abnormal click data can therefore be identified more comprehensively and accurately, and when the processed data is used as training data, the click-through rate prediction model for the target object achieves higher prediction accuracy.
Drawings
To describe the technical solutions and advantages of the embodiments of the present application or the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for marking a target object according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a method for determining the time difference between the Nth exposure time of the target object and the Nth click time for the target object according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a method for calculating the expected exposure duration of the target object according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a method for determining an object click-through rate prediction model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a blockchain system according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a block structure according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of a method for screening training data for an advertisement click-through rate prediction model according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or local area network to implement the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on applied under the cloud computing business model. It can form a resource pool that is used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. Background services of technical network systems, such as video websites, image websites, and other web portals, require large amounts of computing and storage resources. As the internet industry develops and spreads, each item may come to carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels is processed separately, and industrial data of all kinds needs strong back-end system support, which can only be provided through cloud computing.
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that yields stronger decision-making, insight discovery, and process optimization capabilities only with new processing modes. With the advent of the cloud era, big data has attracted increasing attention, and it requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies suited to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
Artificial Intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, AI is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can respond in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning, and decision-making. AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like.
Specifically, the solutions provided in the embodiments of the present application relate to machine learning, a field of artificial intelligence. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular sequence or chronological order. It should be understood that data so used is interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise", "include", and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present disclosure. As shown in FIG. 1, the data processing system may include at least a server 01 and a client 02.
Specifically, in this embodiment, the server 01 may be an independently operating server, a distributed server, or a server cluster composed of multiple servers. The server 01 may include a network communication unit, a processor, a memory, and the like, and may be configured to process log data of a target object.
Specifically, in this embodiment, the client 02 may be a physical device such as a smartphone, desktop computer, tablet computer, notebook computer, digital assistant, or smart wearable device, or software running on such a device, such as a web page or an application that a service provider offers to users. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. Specifically, the client 02 may be used to query abnormal objects online.
A data processing method of the present application is described below. FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. This specification provides the method operation steps as described in the embodiments or the flowchart, but more or fewer steps may be included based on conventional or non-inventive effort. The order of steps listed in the embodiments is merely one of many possible execution orders and does not represent the only one. In practice, a system or server product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the methods shown in the embodiments or drawings. Specifically, as shown in FIG. 2, the method may include:
S201: Acquire log data of the target object within a preset time period.
In this embodiment, the target object may be an advertisement, and the log data may include an exposure log and a click log: the exposure log records the time points at which the advertisement is exposed, and the click log records the time points at which the advertisement is clicked. The preset time period may be set according to actual conditions, for example one week, one month, or one quarter.
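Step S201 can be sketched as selecting one object's exposure and click records within a time window. The `LogEntry` schema, the event names, and the half-open window `[start, start + period)` are assumptions made for illustration; the text does not specify the log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    """One exposure-log or click-log record (hypothetical schema)."""
    object_id: str      # e.g. an advertisement ID
    event: str          # "exposure" or "click"
    timestamp: datetime


def logs_in_period(
    entries: Iterable[LogEntry], object_id: str, start: datetime, period: timedelta
) -> Tuple[List[datetime], List[datetime]]:
    """Return the sorted exposure times and click times of one object in [start, start + period)."""
    end = start + period
    exposures: List[datetime] = []
    clicks: List[datetime] = []
    for entry in entries:
        if entry.object_id != object_id or not (start <= entry.timestamp < end):
            continue  # other object, or outside the preset time period
        if entry.event == "exposure":
            exposures.append(entry.timestamp)
        elif entry.event == "click":
            clicks.append(entry.timestamp)
    return sorted(exposures), sorted(clicks)
```

A one-week preset period would be passed as `timedelta(weeks=1)`; a month or quarter would need calendar-aware boundaries rather than a fixed `timedelta`.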
In an embodiment of this specification, as shown in FIG. 3, before the step of acquiring log data of the target object within a preset time period, the method further includes:
S2001: During playback of the target object, determine whether a stop-playing request has been received;
S2003: If a stop-playing request has been received, mark the target object with first identification information;
S2005: If no stop-playing request has been received, mark the target object with second identification information.
In this embodiment, target objects may be stored according to their identification information.
Specifically, in this embodiment, the target object may be an advertisement, and during its playback it is detected whether a stop-playing request from the advertiser has been received. Receiving such a request indicates that the advertiser is adopting a "stop-and-restart" strategy in order to mislead the advertisement click-through rate prediction model into overestimating the click-through rate, and thereby obtain more exposure opportunities with lower bids under the same conditions. The target object (advertisement) is then marked with the first identification information, so that advertisements using the "stop-and-restart" strategy can easily be found and their corresponding click data discarded, which safeguards the prediction accuracy of the advertisement click-through rate prediction model.
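Steps S2001-S2005 and the storage by identification information can be sketched as follows. The sketch assumes the stop-playing requests observed during playback have already been collected into a set of object IDs; `Mark`, `mark_objects`, and `group_by_mark` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class Mark(Enum):
    FIRST = "first"    # a stop-playing request was received (suspected stop-and-restart)
    SECOND = "second"  # no stop-playing request was received


def mark_objects(object_ids: Iterable[str], stop_requests: Set[str]) -> Dict[str, Mark]:
    """Mark each object by whether a stop-playing request arrived during its playback."""
    return {oid: (Mark.FIRST if oid in stop_requests else Mark.SECOND) for oid in object_ids}


def group_by_mark(marks: Dict[str, Mark]) -> Dict[Mark, List[str]]:
    """Store objects according to their identification information."""
    groups: Dict[Mark, List[str]] = {mark: [] for mark in Mark}
    for oid, mark in marks.items():
        groups[mark].append(oid)
    return groups
```

Grouping by mark makes it cheap to look up every object flagged as a possible stop-and-restart case when its click data is screened later.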
In this embodiment, the advertiser side may obtain log data of multiple target objects, predict the click-through rate of each target object, and rank the target objects by click-through rate. The advertiser may then determine delivery budgets according to this ranking, for example setting a lower delivery budget for top-ranked target objects and a higher delivery budget for bottom-ranked ones.
S203: Determine, based on the log data of the target object within the preset time period, the time difference between the Nth exposure time of the target object and the Nth click time for the target object, where N is a positive integer.
Specifically, in this embodiment, as shown in FIG. 4, determining the time difference between the Nth exposure time of the target object and the Nth click time for the target object based on the log data within the preset time period may include:
S2031: Determine the Nth exposure time of the target object based on its exposure log within the preset time period;
S2033: Determine the Nth click time for the target object based on its click log within the preset time period;
S2035: Calculate the time difference between the Nth exposure time of the target object and the Nth click time for the target object.
In this embodiment, each exposure corresponds to one time point, and each exposure time point may correspond to one click time point, which lags the exposure time point by some delay. The Nth exposure time and the Nth click time for the target object correspond to each other: the Nth click is the click behavior generated for the Nth exposure, where N = 1, 2, 3, ….
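Steps S2031-S2035 can be sketched by pairing the Nth exposure with the Nth click in time order. Pairing purely by ordinal position follows the correspondence stated above; a production system might instead join the two logs on a shared identifier, which is an assumption not covered by the text. Times are in seconds, and `nth_time_differences` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def nth_time_differences(
    exposure_times: Sequence[float], click_times: Sequence[float]
) -> List[Tuple[int, float]]:
    """Return (N, Nth click time - Nth exposure time) for N = 1, 2, ...

    Exposures beyond the last recorded click have no partner and are
    skipped by zip().
    """
    exposures = sorted(exposure_times)
    clicks = sorted(click_times)
    return [(n, c - e) for n, (e, c) in enumerate(zip(exposures, clicks), start=1)]
```

For exposures at 0, 60, and 120 s and clicks at 30 and 700 s, the result is `[(1, 30), (2, 640)]`; the third exposure has no click yet.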
In a specific embodiment where the target object is an advertisement, let X and Y denote the numbers of exposures and clicks of the advertisement at each time point, where Yn-1 denotes the clicks on the exposures that occurred at Tn-1, and Y0 denotes the clicks that flow back at the last time point of the previous training period. It can be seen that the clicks at the first time point and the exposure data at the last time point of the training data may be inconsistent with the actual scenario. The exposure data and click data of the advertisement at different time points are shown in Table 1 below:
TABLE 1
Time T0 T1 T2 …… Tn
Number of exposures X0 X1 X2 …… Xn
Number of clicks Y0 Y1 …… Yn-1
In this embodiment, as shown in Table 1, the time difference between the Nth exposure time of the target object and the Nth click time for the target object may be T1-T0 (for X0 and Y0), T2-T1 (for X1 and Y1), and so on.
S205: When the time difference is less than or equal to the first threshold, calculate the expected exposure duration of the target object.
Specifically, in this embodiment, before the step of calculating the expected exposure duration of the target object when the time difference is less than or equal to the first threshold, the method further includes:
determining identification information of the target object;
determining the first threshold according to the identification information of the target object;
where determining the first threshold according to the identification information of the target object includes:
when the identification information of the target object is the first identification information, determining a first preset value as the first threshold;
and when the identification information of the target object is the second identification information, determining a second preset value as the first threshold.
In this embodiment, whether the target object adopts the "stop-and-restart" strategy can be determined from its identification information. When the identification information is the first identification information, the target object is determined to adopt the "stop-and-restart" strategy, and a first preset value T2 may be used as the first threshold; when the identification information is the second identification information, a second preset value T1 is used as the first threshold (here T1 and T2 denote preset threshold values, not the time points in Table 1). The values of T1 and T2 can be set according to actual conditions, for example with T1 > T2. Setting a smaller first threshold for target objects that adopt the "stop-and-restart" strategy improves the accuracy of abnormal data screening.
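The threshold selection can be sketched as a small lookup. The identification strings and the default values (900 s for T1, matching the 15 min example given for the first threshold, and 300 s for T2) are placeholders chosen for illustration, not values prescribed by the text.

```python
def first_threshold_for(identification: str, t1: float = 900.0, t2: float = 300.0) -> float:
    """Return the first threshold (seconds) for a target object.

    identification: "first" for objects marked as using stop-and-restart,
    "second" otherwise. t1 and t2 are placeholder values with t1 > t2.
    """
    if identification == "first":
        return t2  # stricter (smaller) threshold for suspected stop-and-restart objects
    if identification == "second":
        return t1
    raise ValueError(f"unknown identification: {identification!r}")
```

Raising on an unknown mark, rather than silently falling back to T1, keeps an unmarked object from escaping the stricter threshold by accident.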
Specifically, in this embodiment of the present specification, before the step of calculating the expected exposure time of the target object when the time difference is smaller than or equal to the first threshold, the method further includes:
judging whether the time difference is larger than the first threshold value.
Correspondingly, the method further comprises the following steps:
and when the time difference is larger than the first threshold, determining the exposure data corresponding to the Nth exposure time and the clicking data corresponding to the Nth clicking time as abnormal data.
And when the time difference is smaller than or equal to a first threshold value, determining exposure data corresponding to the Nth exposure time and click data corresponding to the Nth click time as undetermined data.
In this embodiment of the present specification, the first threshold may be set according to actual conditions, for example to 15 min. When the time difference is greater than the first threshold, both the corresponding exposure data and click data may be determined to be abnormal data and deleted from the log data. When the time difference is smaller than or equal to the first threshold, the corresponding exposure data and click data are not immediately treated as normal data; instead, they are marked as undetermined data and screened further, so that abnormal data can be screened out more comprehensively.
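A minimal Python sketch of this first screening stage, assuming exposure and click times are available as `datetime` objects and that the Nth exposure is paired with the Nth click by sorted order. The helper names are hypothetical, and a production system would more likely join each click to its exposure by a request or impression ID.

```python
from datetime import datetime
from typing import List


def time_differences(exposure_times: List[datetime],
                     click_times: List[datetime]) -> List[float]:
    """Pair the Nth exposure with the Nth click (after sorting both lists)
    and return each click-minus-exposure difference in minutes."""
    return [(c - e).total_seconds() / 60.0
            for e, c in zip(sorted(exposure_times), sorted(click_times))]


def first_stage_label(diff_min: float, first_threshold_min: float) -> str:
    """'abnormal' if the delay exceeds the first threshold, otherwise
    'pending' (undetermined data passed on to the second stage)."""
    return "abnormal" if diff_min > first_threshold_min else "pending"
```

For example, exposures at 12:00 and 12:05 with clicks at 12:10 and 12:25 give delays of 10 and 20 minutes; with a 15-minute first threshold the first pair stays pending and the second is abnormal.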
Specifically, in the embodiment of the present specification, as shown in fig. 5, the calculating an expected exposure time of the target object when the time difference is smaller than or equal to a first threshold includes:
S2051: when the time difference is smaller than or equal to the first threshold, acquiring virtual data of the target object;
S2053: determining an expected exposure value of the target object based on the virtual data of the target object;
S2055: acquiring the playing speed of the target object;
S2057: determining the expected exposure duration value of the target object according to the expected exposure value of the target object and the playing speed of the target object.
In this embodiment, determining the expected exposure duration value of the target object according to the expected exposure value and the playing speed of the target object may include:
calculating the ratio of the expected exposure value of the target object to the playing speed of the target object;
and determining the ratio of the expected exposure value of the target object to the playing speed of the target object as the expected exposure duration value of the target object.
In this embodiment of the present specification, the virtual data may be a budget of a target object, and when the target object is an advertisement, the virtual data is an advertisement budget.
In a specific embodiment, assume that the budget set by the advertiser for the advertisement is A (yuan), the mean CPM of the advertisement delivery platform is C (yuan), and the average playing speed of the advertisement delivery platform is D (advertisements/min), where CPM (cost per mille) is the fee charged per thousand exposures;
the expected total exposure of the advertisement can be calculated from the total budget:
$$E = \frac{A}{C} \times 1000$$
and the expected total exposure duration of the advertisement (in minutes) is:
$$T = \frac{E}{D} = \frac{1000\,A}{C \cdot D}$$
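The two formulas can be checked numerically. The Python sketch below assumes the budget and CPM are in yuan and the play speed is in advertisements per minute; the function names and the example numbers (A = 100, C = 20, D = 10000) are hypothetical.

```python
def expected_exposure(budget_yuan: float, cpm_yuan: float) -> float:
    """Expected total exposures: budget / CPM * 1000, since CPM is the
    fee charged per thousand exposures."""
    return budget_yuan / cpm_yuan * 1000.0


def expected_exposure_duration(budget_yuan: float, cpm_yuan: float,
                               play_speed_per_min: float) -> float:
    """Expected total exposure duration in minutes: the ratio of the
    expected exposure value to the playing speed."""
    return expected_exposure(budget_yuan, cpm_yuan) / play_speed_per_min
```

With these inputs the advertisement is expected to receive 5000 exposures and to exhaust its budget in about 0.5 minute, which is the kind of very short exposure window the second screening stage targets.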
in the embodiment of the present specification, the expected exposure time value of the target object may be calculated according to the virtual data corresponding to the target object, so that the target object with short exposure time may be determined according to the expected exposure time value, and the abnormal click data may be further determined.
In the embodiment of the present specification, the abnormal data may also be determined directly from the virtual data of the target object and the time difference, for example, the click data whose budget is less than 200 yuan and whose click delay (time difference) exceeds 2 minutes may be filtered.
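The direct budget rule can be written as a single predicate. In this sketch the 200-yuan and 2-minute limits come from the example above; the function and parameter names are hypothetical.

```python
def is_abnormal_by_budget(budget_yuan: float, click_delay_min: float,
                          budget_limit_yuan: float = 200.0,
                          delay_limit_min: float = 2.0) -> bool:
    """True when a low-budget ad receives a click with a long delay,
    i.e. the click should be filtered out."""
    return budget_yuan < budget_limit_yuan and click_delay_min > delay_limit_min
```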
In the embodiments of the present specification, "Minute (MIN)" is used as the time unit, but in practical applications, the specific time granularity may be determined according to the product form, and for example, "second", "hour" or the like may be used as the time unit.
In this embodiment of the present specification, the expected exposure duration value of the target object may alternatively be calculated first, after which the log data of the target object within the preset time period are obtained and the time difference between the Nth exposure time of the target object and the Nth click time of the target object is determined.
S207: when the expected exposure duration value of the target object is smaller than a preset duration threshold and the time difference is greater than a second threshold, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data.
In the embodiment of the present specification, when the target object is an advertisement and the advertiser speculates by setting a low budget in the hope of obtaining high exposure, the total exposure duration is short, for example 1 minute. In that case, clicks on the advertisement that flow back with a delay of more than 1 minute may be discarded and filtered out. The preset duration threshold may be set to 1 min, or to another duration according to actual conditions.
Specifically, in this embodiment of the present specification, before the step of determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data when the expected exposure duration value of the target object is smaller than the preset duration threshold and the time difference is greater than the second threshold, the method further includes:
judging whether the expected value of the exposure time length of the target object is smaller than a preset time length threshold value or not;
and when the expected exposure time length value of the target object is greater than or equal to the preset time length threshold value, determining the exposure data corresponding to the Nth exposure time and the clicking data corresponding to the Nth clicking time as normal data.
Specifically, in this embodiment of the present specification, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data when the expected exposure duration value of the target object is smaller than the preset duration threshold and the time difference is greater than the second threshold includes:
when the expected exposure time length value of the target object is smaller than a preset time length threshold, judging whether the time difference is larger than a second threshold;
and when the time difference is larger than a second threshold value, determining the exposure data corresponding to the Nth exposure time and the clicking data corresponding to the Nth clicking time as abnormal data.
In the embodiment of the present specification, the second threshold may be set smaller than both the first preset value and the second preset value. When the expected exposure duration value of the target object is smaller than the preset duration threshold, the exposure duration of the target object can be judged to be short; if the time difference is also greater than the second threshold, that is, the click return delay is long, the target object can be judged to exhibit low-budget, high-exposure speculative behavior, and the corresponding exposure data and click data are abnormal data.
In an embodiment of the present specification, the method further comprises:
and when the time difference is smaller than or equal to a second threshold value, determining the exposure data corresponding to the Nth exposure time and the clicking data corresponding to the Nth clicking time as normal data.
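Step S207 together with the normal-data branches can be sketched as one labeling function for undetermined data. The defaults of 1 min for both the preset duration threshold and the second threshold follow the 1-minute example given earlier and are assumptions, as is the function name.

```python
def second_stage_label(expected_duration_min: float, diff_min: float,
                       duration_threshold_min: float = 1.0,
                       second_threshold_min: float = 1.0) -> str:
    """Label undetermined exposure/click data as 'normal' or 'abnormal'."""
    if expected_duration_min >= duration_threshold_min:
        return "normal"    # exposure window long enough: keep the pair
    if diff_min > second_threshold_min:
        return "abnormal"  # short exposure window but long click delay
    return "normal"        # short window, short delay: keep the pair
```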
In embodiments of the present description, the normal data may be used to train an object click rate prediction model.
S209: deleting the abnormal data from the log data.
In an embodiment of this specification, as shown in fig. 6, after the step of deleting the abnormal data from the log data, the method further includes:
S2011: determining the log data from which the abnormal data have been deleted as training data;
S2013: training a preset algorithm model on the training data to obtain an object click rate prediction model.
In the embodiment of the present specification, the log data from which the abnormal data have been deleted may be used as training data for the object click rate prediction model, which improves the accuracy of the training data and hence the prediction accuracy of the model.
In an embodiment of the present specification, after the step of obtaining an object click rate prediction model, the method further includes:
determining test data of an object to be tested;
inputting the test data of the object to be tested into the object click rate prediction model to obtain the predicted click rate of the object to be tested;
and determining the delivery result of the object to be tested according to the predicted click rate of the object to be tested.
In an embodiment of this specification, determining the delivery result of the object to be tested according to its predicted click rate may include:
when the predicted click rate of the object to be tested is greater than a preset click rate threshold, determining the object to be tested as a delivery object;
and when the predicted click rate of the object to be tested is smaller than or equal to the preset click rate threshold, determining the object to be tested as a non-delivery object.
In the embodiment of the description, the click rate of the object to be tested can be predicted with the trained object click rate prediction model, and the objects to deliver can be selected according to the predicted click rate, which increases the click rate of the delivered objects.
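The delivery decision is a strict comparison against the click rate threshold. A hypothetical Python sketch, assuming predicted click rates have already been produced by the trained model and are keyed by object ID:

```python
from typing import Dict, List


def select_delivery_objects(predicted_ctrs: Dict[str, float],
                            ctr_threshold: float) -> List[str]:
    """Return IDs of objects whose predicted click rate is strictly greater
    than the threshold (delivery objects); all others are non-delivery."""
    return [obj_id for obj_id, ctr in predicted_ctrs.items()
            if ctr > ctr_threshold]
```

An object whose predicted click rate exactly equals the threshold is treated as a non-delivery object, matching the "smaller than or equal to" branch above.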
In an embodiment of the present specification, the method may further include:
storing the training data based on a blockchain system, the blockchain system comprising a plurality of nodes forming a point-to-point network therebetween.
In some embodiments, the blockchain system may have the structure shown in fig. 7, in which a plurality of nodes form a peer-to-peer (P2P) network; the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the blockchain system, any machine, such as a server or a terminal, can join and become a node, and a node comprises a hardware layer, a middle layer, an operating system layer, and an application layer.
The functions of each node in the blockchain system shown in fig. 7 involve:
1) Routing: a basic function of every node, used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual business requirements. It records the data related to the implemented functions as record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once they have successfully verified its source and integrity.
3) Blockchain: a series of blocks connected to one another in the chronological order in which they were generated. Once a new block has been added to the blockchain it cannot be removed, and the blocks record the record data submitted by the nodes in the blockchain system.
In some embodiments, the block structure may be as shown in fig. 8: each block includes the hash value of the transaction records stored in that block (the hash value of the block itself) and the hash value of the previous block, and the blocks are linked by these hash values to form a blockchain. A block may also include information such as the timestamp at which it was generated. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains information used to verify the validity of its contents (anti-counterfeiting) and to generate the next block.
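The hash linking described above can be illustrated with a single-node Python sketch using SHA-256 from the standard library. It is a deliberately simplified toy: it omits the P2P network, digital signatures, and consensus, and the `Block` fields and helper names are hypothetical rather than part of any particular blockchain system.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import List

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    prev_hash: str        # hash value of the previous block
    records: List[dict]   # record data, e.g. rows of training data
    timestamp: float      # time stamp at block generation
    block_hash: str = ""  # hash value of this block

    def compute_hash(self) -> str:
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash,
             "records": self.records, "timestamp": self.timestamp},
            sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], records: List[dict]) -> Block:
    """Create a block that points at the current tip and append it."""
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV_HASH
    block = Block(len(chain), prev_hash, records, time.time())
    block.block_hash = block.compute_hash()
    chain.append(block)
    return block


def verify_chain(chain: List[Block]) -> bool:
    """Recompute every hash and check every back-link."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev or block.block_hash != block.compute_hash():
            return False
    return True
```

Editing any stored record changes that block's recomputed hash, so `verify_chain` fails; this is the tamper evidence that motivates storing training data on a chain.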
In the embodiment of the present disclosure, the object click rate prediction model may be an advertisement click rate prediction model. Machine learning methods play an important role in advertisement click rate prediction. Shallow machine learning models serve as a baseline and include many model types; for example, factorization models, which are widely used in recommendation systems, are nonlinear models whose advantages show on problems with nonlinear features. The GBDT (gradient boosting decision tree) model is an important method in machine learning: the same base learning algorithm is applied to the training set repeatedly to obtain a series of weak classifiers, which are finally combined into a strong classifier. Deep neural networks are also important for advertisement click rate prediction; with multiple hidden layers, they have strong learning and data-fitting capabilities. Gradient-based optimization algorithms are another key element of click rate estimation models, including first-order gradient optimization algorithms and second-order quasi-Newton optimization algorithms.
Advertisements are charged after placement, mainly on a per-click basis, so click prediction is a key component of an advertising system and directly affects it. Advertisement clicks can be predicted with statistical methods or with machine-learning methods: statistical methods are adequate for small data volumes, whereas for large data volumes machine-learning methods are needed to estimate the click rate of each sample from its features.
Click-through rate prediction for advertisements is often modeled as a binary classification problem: given the advertisement, the user information, and other contextual information, predict whether a click will occur. The model can be trained on advertisement log data. First, the advertising platform collects users' historical click behavior on displayed advertisements; then the websites the users browsed, the page content, the time, and so on are recorded by the log system. After data cleaning and feature processing, the log data are converted into a suitable format in which click and non-click are represented by corresponding numbers, yielding the input data of the advertisement click rate prediction model. A machine learning algorithm is applied to this data to train the model; when a new advertisement delivery request is received, the corresponding data are fed into the model to calculate the advertisement's predicted click rate. The method of the present application can counter advertisers' speculative behavior and reasonably clean the data of conventional advertisements near the end of their delivery, thereby providing higher-quality training data for the advertisement recommendation algorithm. Because the upper limit of model performance is determined by the data, this helps the advertisement click rate prediction model predict click rates accurately and make accurate advertisement recommendations.
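As an illustration of the binary-classification formulation and of first-order gradient optimization, the sketch below trains a logistic regression model with per-sample gradient descent in pure Python. Logistic regression is a stand-in chosen for brevity, not the model prescribed by the specification (which mentions factorization models, GBDT, and deep neural networks); the toy features and labels in the usage test are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_ctr(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted click probability; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit weights by per-sample (first-order) gradient descent on log loss.

    Labels follow the click / non-click encoding: 1 = click, 0 = no click.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_ctr(w, xi) - yi  # d(log loss) / d(logit)
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w
```

In practice the feature vectors would come from the cleaned log data (for example, one-hot encoded advertisement, user, and context fields), and training would use the log data left after the abnormal data have been deleted.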
According to the method and the device, the abnormal click data of each advertisement are labeled automatically and in real time according to the advertisement's budget, so the labeling adapts to the circumstances of each advertisement and abnormal data can be discovered proactively in real time. This allows the advertising platform to identify abnormal advertisement data more comprehensively and prevents the platform's revenue from being affected by advertisers' abnormal behavior.
In an embodiment of this specification, when the expected exposure time of the target object is smaller than a preset time threshold, and the time difference is greater than a second threshold, the method further includes:
determining the target object as an abnormal object;
the method further comprises the following steps:
determining the number of abnormal objects in an object delivery platform, wherein the object delivery platform comprises at least two target objects;
determining the proportion of abnormal objects in the object delivery platform according to the number of abnormal objects in the object delivery platform;
and when the proportion of abnormal objects in the object delivery platform is greater than a preset proportion threshold, sending alarm prompt information.
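The proportion check reduces to counting flagged objects. A minimal sketch, assuming each target object on the platform has already been labeled abnormal or not; the names and the example threshold are hypothetical.

```python
from typing import Dict


def abnormal_ratio(abnormal_flags: Dict[str, bool]) -> float:
    """Proportion of abnormal objects among all target objects on the platform."""
    if len(abnormal_flags) < 2:
        raise ValueError("the delivery platform is assumed to hold at least two target objects")
    return sum(abnormal_flags.values()) / len(abnormal_flags)


def should_alarm(abnormal_flags: Dict[str, bool], ratio_threshold: float) -> bool:
    """True when the abnormal proportion strictly exceeds the preset threshold."""
    return abnormal_ratio(abnormal_flags) > ratio_threshold
```

With two abnormal advertisements out of four, the proportion is 0.5, so a threshold of 0.3 triggers the alarm while a threshold of 0.5 does not.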
In the embodiment of the specification, abnormal objects are determined from the virtual data of the target objects, so the proportion of abnormal objects on the object delivery platform can be determined, and an alarm prompt message can be sent when that proportion is high; based on the alarm prompt message, the object delivery platform can adopt an appropriate strategy to reduce the proportion of abnormal objects. Because abnormal clicks are judged from the budget, global analysis can also be performed on offline data, making it easy to see how prevalent speculative behavior is on the advertisement delivery platform. Corresponding strategies can then be adopted in time to keep the phenomenon from spreading, curb the malignant growth of speculative behavior, and ensure healthy competition on the advertising platform.
The data processing method of the present application is described below in conjunction with the process of screening training data for the advertisement click-through rate prediction model. As shown in fig. 9, during advertisement delivery it is determined whether the advertiser stops delivery; if so, a stop-bid flag is added to the advertisement. A click log and an exposure log are then generated, and the time difference between each exposure time and the corresponding click time is calculated from the two logs. For an advertisement without the stop-bid flag, it is judged whether the time difference is greater than T1; for an advertisement with the stop-bid flag, it is judged whether the time difference is greater than T2. If so, the corresponding click data and exposure data are discarded. If not, the expected exposure duration of the advertisement is calculated from the advertisement budget. When the expected exposure duration is greater than or equal to the preset duration threshold, the corresponding exposure data and click data are used as training data for the model. When the expected exposure duration is smaller than the preset duration threshold, it is judged whether the time difference is greater than T3; if so, the corresponding exposure data and click data are discarded; if not, they are used as training data for the advertisement click-through rate prediction model. With this data processing method, abnormal advertisement data are screened out more comprehensively and the accuracy of the training data is improved, which in turn improves the prediction accuracy of the advertisement click-through rate prediction model.
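The complete flow of fig. 9 can be condensed into one screening function. This Python sketch assumes a platform-wide mean CPM and play speed, per-pair time differences already computed in minutes, and example thresholds T1 = 15, T2 = 5, T3 = 1 and a 1-minute duration threshold; all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClickRecord:
    ad_id: str
    has_stop_flag: bool  # True if the advertiser stopped delivery (stop-bid flag)
    budget_yuan: float   # advertisement budget A
    diff_min: float      # Nth click time minus Nth exposure time, in minutes


def keep_for_training(record: ClickRecord, cpm_yuan: float,
                      play_speed_per_min: float, t1: float = 15.0,
                      t2: float = 5.0, t3: float = 1.0,
                      duration_threshold_min: float = 1.0) -> bool:
    """Return True to keep the exposure/click pair as training data."""
    # Stage 1: flag-dependent first threshold (T2 for stop-bid ads, else T1).
    first = t2 if record.has_stop_flag else t1
    if record.diff_min > first:
        return False
    # Stage 2: expected exposure duration = budget / CPM * 1000 / play speed.
    expected_duration = record.budget_yuan / cpm_yuan * 1000.0 / play_speed_per_min
    if expected_duration >= duration_threshold_min:
        return True
    return record.diff_min <= t3


def build_training_data(records: List[ClickRecord], cpm_yuan: float,
                        play_speed_per_min: float, **thresholds: float) -> List[ClickRecord]:
    """Drop abnormal pairs; the remainder is the training data."""
    return [r for r in records
            if keep_for_training(r, cpm_yuan, play_speed_per_min, **thresholds)]
```

Both stages run in one pass per record, so the screening can be applied as log data stream in rather than only as an offline batch.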
According to the technical scheme provided by the embodiment of the specification, the log data of the target object in the preset time period are obtained, the time difference between the exposure time of the target object and the corresponding click time is determined according to the log data, and part of abnormal data is screened out according to the time difference between the exposure time of the target object and the corresponding click time; secondly, performing secondary screening on the data based on the exposure duration of the target object; therefore, abnormal click data can be identified more comprehensively and accurately; the data processed by the method is used as training data, and the prediction accuracy rate of the target object click rate prediction model is higher.
An embodiment of the present application further provides a data processing apparatus, as shown in fig. 10, the apparatus includes:
the log data acquisition module 1010 is configured to acquire log data of a target object within a preset time period;
a time difference determining module 1020, configured to determine a time difference between an nth exposure time of the target object and an nth click time for the target object based on log data of the target object within a preset time period; wherein N is a positive integer;
an expected exposure time calculation module 1030, configured to calculate an expected exposure time of the target object when the time difference is smaller than or equal to a first threshold;
an abnormal data determining module 1040, configured to determine, when the expected exposure duration value of the target object is smaller than a preset duration threshold and the time difference is greater than a second threshold, exposure data corresponding to the nth exposure time and click data corresponding to the nth click time as abnormal data;
a deleting module 1050 configured to delete the abnormal data from the log data.
In some embodiments, the apparatus may further comprise:
the judging module is used for judging whether the time difference is larger than a first threshold value or not;
and the data determining module is used for determining the exposure data corresponding to the Nth exposure time and the clicking data corresponding to the Nth clicking time as abnormal data when the time difference is larger than the first threshold.
In some embodiments, the exposure duration expected value calculation module may include:
a virtual data acquisition unit configured to acquire virtual data of the target object when the time difference is less than or equal to a first threshold;
an expected exposure value determining unit configured to determine an expected exposure value of the target object based on the virtual data of the target object;
a play speed acquisition unit for acquiring a play speed of the target object;
and the expected exposure time value determining unit is used for determining the expected exposure time value of the target object according to the expected exposure value of the target object and the playing speed of the target object.
In some embodiments, the apparatus may further comprise:
a playing request judging module, configured to judge whether a playing stop request is received in the playing process of the target object;
the first marking module is used for marking the target object by adopting first identification information if the playing stopping request is received;
and the second marking module is used for marking the target object by adopting second identification information if the request for stopping playing is not received.
In some embodiments, the apparatus may further comprise:
the identification information determining module is used for determining the identification information of the target object;
the first threshold value determining module is used for determining the first threshold value according to the identification information of the target object;
the first threshold determination module may include:
a first determining unit, configured to determine a first preset value as the first threshold when the identification information of the target object is first identification information;
and the second determining unit is used for determining a second preset value as the first threshold value when the identification information of the target object is second identification information.
In some embodiments, the apparatus may further comprise:
the training data determining module is used for determining the log data after the abnormal data are deleted as training data;
and the object click rate prediction model determining module is used for training a preset algorithm model on the training data to obtain an object click rate prediction model.
In some embodiments, the apparatus may further comprise:
the test data determining module is used for determining the test data of the object to be tested;
the predicted click rate determining module is used for inputting the test data of the object to be tested into the object click rate prediction model to obtain the predicted click rate of the object to be tested;
and the delivery result determining module is used for determining the delivery result of the object to be tested according to the predicted click rate of the object to be tested.
In some embodiments, the apparatus may further comprise:
an abnormal object determination module, configured to determine the target object as an abnormal object;
a quantity determining module, configured to determine the number of abnormal objects in an object delivery platform, wherein the object delivery platform comprises at least two target objects;
an abnormal object proportion determining module, configured to determine the proportion of abnormal objects in the object delivery platform according to the number of abnormal objects in the object delivery platform;
and an alarm prompt information sending module, configured to send alarm prompt information when the proportion of abnormal objects in the object delivery platform is greater than a preset proportion threshold.
The apparatus embodiments and the method embodiments described above are based on the same inventive concept.
The embodiment of the present application provides a data processing device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method provided by the above method embodiment.
Embodiments of the present application further provide a computer storage medium, where the storage medium may be disposed in a terminal to store at least one instruction or at least one program for implementing a data processing method in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method provided in the method embodiments.
Optionally, in the embodiments of the present specification, the storage medium may be located on at least one of a plurality of network servers in a computer network. Optionally, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and various other media that can store program code.
The memory described in the embodiments of the present disclosure may be used to store software programs and modules, and the processor performs various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by functions, and the like, and the data storage area may store data created through use of the apparatus, and the like. Further, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The data processing method provided by the embodiments of the present application can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, fig. 11 is a block diagram of the hardware structure of a server for the data processing method provided in the embodiments of the present application. As shown in fig. 11, the server 1100 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 1110 (the processor 1110 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1130 for storing data, and one or more storage media 1120 (for example, one or more mass storage devices) storing application programs 1123 or data 1122. The memory 1130 and the storage medium 1120 may provide transient or persistent storage. The program stored in the storage medium 1120 may include one or more modules, each of which may include a series of instruction operations for the server. Further, the central processing unit 1110 may be configured to communicate with the storage medium 1120 and execute, on the server 1100, the series of instruction operations in the storage medium 1120. The server 1100 may also include one or more power supplies 1160, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1140, and/or one or more operating systems 1121, such as Windows Server, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The input/output interface 1140 may be used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the server 1100. In one example, the input/output interface 1140 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the input/output interface 1140 may be a radio frequency (RF) module used to communicate with the Internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1100 may also include more or fewer components than shown in FIG. 11, or have a different configuration than shown in FIG. 11.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the method provided by the embodiment.
According to the data processing method, the data processing device or the data processing storage medium, log data of a target object in a preset time period are obtained, the time difference between the exposure time of the target object and the corresponding click time is determined according to the log data, and part of abnormal data is screened out according to the time difference between the exposure time of the target object and the corresponding click time; secondly, performing secondary screening on the data based on the exposure duration of the target object; therefore, abnormal click data can be identified more comprehensively and accurately; the data processed by the method is used as training data, and the prediction accuracy rate of the target object click rate prediction model is higher.
It should be noted that the order of the embodiments of this application is for description only and does not indicate that any embodiment is better or worse than another. Specific embodiments of this specification have been described above; other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. Likewise, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described progressively: for parts that are the same or similar across embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and storage medium embodiments are substantially similar to the method embodiments and are therefore described briefly; for relevant details, refer to the corresponding parts of the method embodiments.
Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware or by a program instructing related hardware. The program may be stored in a computer storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The above descriptions are merely exemplary embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its protection scope.

Claims (9)

1. A method of data processing, the method comprising:
during playback of a target object, if a request to stop playback is received, marking the target object with first identification information;
if no request to stop playback is received, marking the target object with second identification information;
acquiring log data of the target object within a preset time period;
determining, based on the log data of the target object within the preset time period, a time difference between an Nth exposure time of the target object and an Nth click time for the target object, wherein N is a positive integer;
determining a first threshold according to the identification information of the target object, wherein the identification information of the target object is either the first identification information or the second identification information;
when the time difference is less than or equal to the first threshold, calculating an expected exposure duration of the target object;
when the expected exposure duration of the target object is less than a preset duration threshold and the time difference is greater than a second threshold, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data;
and deleting the abnormal data from the log data.
2. The method of claim 1, wherein, before calculating the expected exposure duration of the target object when the time difference is less than or equal to the first threshold, the method further comprises:
determining whether the time difference is greater than the first threshold;
correspondingly, the method further comprises:
when the time difference is greater than the first threshold, determining the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data.
3. The method of claim 1, wherein calculating the expected exposure duration of the target object when the time difference is less than or equal to the first threshold comprises:
when the time difference is less than or equal to the first threshold, acquiring virtual data of the target object;
determining an expected exposure amount of the target object based on the virtual data of the target object;
acquiring the playback speed of the target object;
and determining the expected exposure duration of the target object according to the expected exposure amount and the playback speed of the target object.
4. The method of claim 1, wherein determining the first threshold according to the identification information of the target object comprises:
when the identification information of the target object is the first identification information, using a first preset value as the first threshold;
and when the identification information of the target object is the second identification information, using a second preset value as the first threshold.
5. The method of claim 1, wherein, after deleting the abnormal data from the log data, the method further comprises:
determining the log data from which the abnormal data have been deleted as training data;
and training a preset algorithm model on the training data to obtain an object click-through rate prediction model.
6. The method of claim 5, wherein, after obtaining the object click-through rate prediction model, the method further comprises:
determining test data of an object to be tested;
inputting the test data of the object to be tested into the object click-through rate prediction model to obtain a predicted click-through rate of the object to be tested;
and determining a delivery result for the object to be tested according to its predicted click-through rate.
7. The method of claim 1, wherein, when the expected exposure duration of the target object is less than the preset duration threshold and the time difference is greater than the second threshold, the method further comprises:
determining the target object as an abnormal object;
and the method further comprises:
determining the number of abnormal objects on an object delivery platform, wherein the object delivery platform comprises at least two target objects;
determining the proportion of abnormal objects on the object delivery platform according to the number of abnormal objects on the platform;
and when the proportion of abnormal objects on the object delivery platform is greater than a preset proportion threshold, sending alarm prompt information.
8. A data processing apparatus, comprising:
a first marking module, configured to mark a target object with first identification information if a request to stop playback is received during playback of the target object;
a second marking module, configured to mark the target object with second identification information if no request to stop playback is received;
a log data acquisition module, configured to acquire log data of the target object within a preset time period;
a time difference determination module, configured to determine, based on the log data of the target object within the preset time period, a time difference between an Nth exposure time of the target object and an Nth click time for the target object, wherein N is a positive integer;
a first threshold determination module, configured to determine a first threshold according to the identification information of the target object, wherein the identification information of the target object is either the first identification information or the second identification information;
an expected exposure duration calculation module, configured to calculate an expected exposure duration of the target object when the time difference is less than or equal to the first threshold;
an abnormal data determination module, configured to determine, when the expected exposure duration of the target object is less than a preset duration threshold and the time difference is greater than a second threshold, the exposure data corresponding to the Nth exposure time and the click data corresponding to the Nth click time as abnormal data;
and a deletion module, configured to delete the abnormal data from the log data.
9. A computer storage medium storing at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the data processing method according to any one of claims 1 to 7.
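Claim 2 adds a first-stage rule that flags a pair outright when its time difference exceeds the first threshold, so the expected exposure duration is needed only for the remaining pairs. A minimal Python sketch of that decision order follows; the function name, the returned label strings, and the choice to pass the expected exposure duration as a zero-argument callable (so it is computed only when needed) are illustrative assumptions.

```python
from typing import Callable


def classify_pair(diff: float, first_threshold: float, second_threshold: float,
                  expected_duration_fn: Callable[[], float],
                  duration_threshold: float) -> str:
    """Label one exposure/click pair using the rules of claims 1 and 2."""
    # Claim 2: a time difference above the first threshold is abnormal.
    if diff > first_threshold:
        return "abnormal_stage1"
    # Claim 1: the expected exposure duration is computed only when
    # diff <= first_threshold, hence the lazy callable.
    expected_duration = expected_duration_fn()
    if expected_duration < duration_threshold and diff > second_threshold:
        return "abnormal_stage2"
    return "normal"


# Usage with made-up values (seconds).
labels = [
    classify_pair(40.0, 30.0, 6.0, lambda: 15.0, 8.0),
    classify_pair(9.0, 10.0, 6.0, lambda: 5.0, 8.0),
    classify_pair(2.0, 30.0, 6.0, lambda: 15.0, 8.0),
]
print(labels)  # ['abnormal_stage1', 'abnormal_stage2', 'normal']
```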
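Claim 3 does not state how the "virtual data" yield an expected exposure amount, or how that amount and the playback speed combine. The sketch below assumes, purely for illustration, that the virtual data are sampled exposure amounts whose arithmetic mean is the expected amount, and that the expected exposure duration is that amount divided by the playback speed (for example, frames divided by frames per second gives seconds).

```python
from statistics import fmean
from typing import Sequence


def expected_exposure_amount(virtual_samples: Sequence[float]) -> float:
    """Assumption: the expected amount is the mean of the virtual samples."""
    if not virtual_samples:
        raise ValueError("virtual data must not be empty")
    return fmean(virtual_samples)


def expected_exposure_duration(virtual_samples: Sequence[float],
                               playback_speed: float) -> float:
    """Assumption: duration = expected amount / playback speed."""
    if playback_speed <= 0:
        raise ValueError("playback speed must be positive")
    return expected_exposure_amount(virtual_samples) / playback_speed


# Hypothetical units: exposure amounts in frames, speed in frames per second.
print(expected_exposure_duration([200, 250, 300], 25.0))  # 10.0
```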
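Claim 5 leaves the "preset algorithm model" unspecified. As one possible stand-in, the sketch below trains a plain logistic regression by batch gradient descent on feature vectors built from the cleaned log data, with label 1 for a click and 0 for no click. The feature layout, learning rate, and epoch count are hypothetical; a production system would more likely rely on an established machine-learning library.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ctr(model: Model, x: Sequence[float]) -> float:
    """Predicted click probability for one feature vector."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_ctr_model(samples: Sequence[Sequence[float]], labels: Sequence[int],
                    lr: float = 0.5, epochs: int = 1000) -> Model:
    """Batch gradient descent on the mean log loss (logistic regression)."""
    n, dim = len(samples), len(samples[0])
    model: Model = ([0.0] * dim, 0.0)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Derivative of the log loss with respect to the logit.
            error = predict_ctr(model, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights, bias = model
        model = ([w - lr * g / n for w, g in zip(weights, grad_w)],
                 bias - lr * grad_b / n)
    return model


# Hypothetical per-exposure features from the cleaned logs;
# label 1 = clicked, 0 = not clicked.
samples = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
labels = [1, 1, 0, 0]
model = train_ctr_model(samples, labels)
print(predict_ctr(model, [1.0, 0.0]), predict_ctr(model, [0.0, 0.0]))
```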
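Claim 6 does not specify how the predicted click-through rate determines the delivery result. The following sketch assumes a simple hypothetical rule: objects whose predicted click-through rate reaches a threshold are delivered, highest rate first, up to a fixed number of slots. Any callable can serve as the prediction model; the identifiers and numbers in the usage lines are made up.

```python
from typing import Callable, Dict, List, Sequence


def decide_delivery(test_data: Dict[str, Sequence[float]],
                    predict: Callable[[Sequence[float]], float],
                    ctr_threshold: float, max_slots: int) -> List[str]:
    """Hypothetical rule: deliver the top objects whose predicted CTR
    is at least ctr_threshold, at most max_slots of them."""
    # Input each object's test data into the model.
    predicted = {obj: predict(features) for obj, features in test_data.items()}
    eligible = [obj for obj, ctr in predicted.items() if ctr >= ctr_threshold]
    # Highest predicted click-through rate first.
    eligible.sort(key=lambda obj: predicted[obj], reverse=True)
    return eligible[:max_slots]


# Usage: a stand-in model that returns the first feature as the CTR.
chosen = decide_delivery(
    {"ad_a": [0.031], "ad_b": [0.012], "ad_c": [0.045]},
    predict=lambda x: x[0], ctr_threshold=0.02, max_slots=5)
print(chosen)  # ['ad_c', 'ad_a']
```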
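The platform-level check of claim 7 reduces to a ratio comparison. In this minimal sketch, the object identifiers, the threshold value, and the alert callback (which defaults to print) are hypothetical; the strict "greater than" comparison follows the claim.

```python
from typing import Callable, Iterable


def abnormal_ratio(platform_objects: Iterable[str],
                   abnormal_objects: Iterable[str]) -> float:
    """Share of the platform's target objects that were marked abnormal."""
    objects = set(platform_objects)
    if len(objects) < 2:
        raise ValueError("claim 7 assumes at least two target objects")
    return len(objects & set(abnormal_objects)) / len(objects)


def check_platform(platform_objects: Iterable[str],
                   abnormal_objects: Iterable[str],
                   ratio_threshold: float,
                   alert: Callable[[str], None] = print) -> bool:
    """Send an alarm prompt when the abnormal ratio exceeds the threshold."""
    ratio = abnormal_ratio(platform_objects, abnormal_objects)
    if ratio > ratio_threshold:
        alert(f"abnormal object ratio {ratio:.0%} exceeds {ratio_threshold:.0%}")
        return True
    return False


alerts = []
fired = check_platform(["a", "b", "c", "d"], {"b"}, 0.2, alert=alerts.append)
print(fired, alerts)  # True ['abnormal object ratio 25% exceeds 20%']
```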

Priority Applications (1)

Application number: CN202010673436.2A (CN111882349B)
Priority date: 2020-07-14
Filing date: 2020-07-14
Title: Data processing method, device and storage medium

Publications (2)

CN111882349A, published 2020-11-03
CN111882349B (this document), published 2021-09-14

Family ID: 73151093

Country Status (1)

CN: CN111882349B

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant