CN104091276A - Click stream data online analyzing method and related device and system - Google Patents

Publication number
CN104091276A
Authority
CN
China
Prior art keywords
key information
click stream
advertisement
information
time window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310672117.XA
Other languages
Chinese (zh)
Other versions
CN104091276B (en)
Inventor
王洋
张书彬
薛伟
李勇
肖磊
刘大鹏
言艳花
姜磊
郭伟昭
胡少锋
柳金晶
黄丕培
徐妙
蔡斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201310672117.XA
Publication of CN104091276A
Application granted
Publication of CN104091276B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention discloses a click stream data online analyzing method and a related device and system. The click stream data online analyzing method comprises the steps that click stream data are acquired from an advertisement server; key information is extracted from the click stream data; the click stream type corresponding to the key information is determined; a first time window is determined; according to the click stream type corresponding to the key information and the corresponding relationship between log time corresponding to the key information and the first time window, whether the key information needs to be filtered is determined; and if the fact that the key information does not need to be filtered is determined, the key information is used to generate the training data of an advertising forecast model. According to the technical scheme provided by the embodiment of the invention, the limitation of a processing resource on the acquired training data is reduced; the instantaneity of the advertising forecast model is improved; and the fitness of the advertising forecast model and online real-time data is improved.

Description

Method for online analysis of click stream data and related device and system
Technical Field
The invention relates to the field of Internet technologies, and in particular to a method for online analysis of click stream data and a related device and system.
Background
Advertisement push is an important internet service.
Advertisement push tools are commonly used by operators. When an advertisement push tool such as Guangdiantong predicts advertisement delivery, the advertisement delivery prediction model needs to be trained with users' daily click stream data. Existing advertisement push tools such as Guangdiantong generally train the advertisement delivery prediction model with training data obtained through offline analysis.
During research and practice, the inventors found that the prior art has at least the following technical problems: obtaining training data through offline analysis is limited by processing resources, so it is difficult to meet the high real-time requirements of the advertisement delivery prediction model, and a model trained on data obtained through offline analysis sometimes fits online real-time data poorly.
Disclosure of Invention
Embodiments of the invention provide a method for online analysis of click stream data and a related device and system, which aim to reduce the limitation that processing resources impose on obtaining training data, improve the real-time performance of the advertisement delivery prediction model, and improve the goodness of fit between the advertisement delivery prediction model and online real-time data.
An embodiment of the present invention provides a method for analyzing click stream data online, which is applied to a distributed system, and the method includes:
acquiring click stream data from an advertisement service server;
extracting key information contained in the click stream data;
determining a click stream type corresponding to the key information;
determining a first time window;
determining whether the key information needs to be filtered or not according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window;
and if it is determined that the key information does not need to be filtered, generating training data of the advertisement delivery prediction model by using the key information.
Another aspect of an embodiment of the present invention provides an apparatus for online analyzing clickstream data, which is applied to a distributed system, and the apparatus may include:
the acquisition unit is used for acquiring click stream data from the advertisement service server;
the extracting unit is used for extracting key information contained in the click stream data;
the type determining unit is used for determining the click stream type corresponding to the key information;
a time window determining unit for determining a first time window;
the filtering control unit is used for determining whether the key information needs to be filtered or not according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window;
and the generating unit is used for generating training data of the advertisement delivery prediction model by using the key information extracted by the extracting unit if the filtering control unit determines that the key information does not need to be filtered.
Another aspect of the embodiments of the present invention provides a communication system, which may include:
the system comprises an advertisement service server and an analysis and prediction platform;
the analysis and prediction platform is used for acquiring click stream data from the advertisement service server; extracting key information contained in the click stream data; determining a click stream type corresponding to the key information; determining a first time window; determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if it is determined that the key information does not need to be filtered, generating training data of the advertisement delivery prediction model by using the key information.
It can be seen that, in some embodiments of the present invention, click stream data is obtained from an advertisement service server; the key information contained in the click stream data is extracted; a first time window and the click stream type corresponding to the key information are determined; whether the key information needs to be filtered is determined according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if the key information does not need to be filtered, training data of the advertisement delivery prediction model is generated by using the key information. Compared with the existing offline analysis mechanism, this technical solution helps reduce the limitation that processing resources impose on obtaining training data (the click stream data does not need to accumulate to a certain amount before being analyzed), improve the real-time performance of the advertisement delivery prediction model, and improve the goodness of fit between the model and online real-time data. In addition, the solution filters the key information according to its click stream type and the relation between its log time and the determined first time window, which improves the validity of the key information used and of the generated training data, so that the trained advertisement delivery prediction model better matches what actually happens online.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart illustrating a method for analyzing click stream data online according to an embodiment of the present invention;
FIG. 2-a is a schematic structural diagram of a communication system according to an embodiment of the present invention;
FIG. 2-b is a schematic diagram of a logical architecture of an analysis prediction platform according to an embodiment of the present invention;
FIG. 2-c is a schematic flow chart of a method for analyzing click stream data online according to an embodiment of the present invention;
FIG. 3-a is a schematic diagram of a process, provided by an embodiment of the present invention, of determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and a first time window;
FIG. 3-b is a flow chart illustrating a method for processing key information written to a negative sample buffer according to an embodiment of the present invention;
FIG. 4-a is a schematic diagram of an apparatus for online analysis of clickstream data according to an embodiment of the present invention;
FIG. 4-b is a schematic diagram of another online clickstream data analysis apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an analytical prediction platform provided by an embodiment of the present invention;
FIG. 6-a is a schematic diagram of a distributed communication system provided by an embodiment of the present invention;
FIG. 6-b is a schematic diagram of an analytic prediction platform constructed based on a distributed architecture according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a server according to an embodiment of the present invention.
Detailed Description
Embodiments of the invention provide a method for online analysis of click stream data and a related device and system, which aim to reduce the limitation that processing resources impose on obtaining training data, improve the real-time performance of the advertisement delivery prediction model, and improve the goodness of fit between the advertisement delivery prediction model and online real-time data.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Details are given below.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
One embodiment of the method for online analysis of click stream data, which may be applied to a distributed system, may include: acquiring click stream data from an advertisement service server; extracting key information contained in the click stream data; determining the click stream type corresponding to the key information; determining a first time window; determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if it is determined that the key information does not need to be filtered, generating training data of the advertisement delivery prediction model by using the key information.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for analyzing clickstream data online according to an embodiment of the present invention. As shown in FIG. 1, a method for analyzing clickstream data online according to an embodiment of the present invention may be applied to a distributed system, and the method may include the following steps:
101. Click stream data is obtained from an advertisement service server.
The click stream data refers to a data stream formed by recording the click and/or exposure behavior that occurs when an advertisement is shown.
102. The key information contained in the click stream data is extracted.
In some embodiments of the present invention, the key information may include an advertisement identifier, an advertisement space identifier, a user identifier (e.g., a mailbox, a QQ number, or a mobile phone number), and the like. The key information may of course also include other information.
103. The click stream type corresponding to the key information is determined.
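The extraction in step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (a dictionary with `ad_id`, `slot_id`, and `user_id` fields) and the names `KeyInfo` and `extract_key_info` are assumptions, since the source does not specify a log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyInfo:
    """Key information of one click stream record (field names are hypothetical)."""
    ad_id: str
    slot_id: str
    user_id: str


def extract_key_info(record: dict) -> Optional[KeyInfo]:
    """Return the key information of a record, or None if a required field is missing."""
    try:
        return KeyInfo(
            ad_id=str(record["ad_id"]),
            slot_id=str(record["slot_id"]),
            user_id=str(record["user_id"]),
        )
    except KeyError:
        return None


# Assumed example record; a real log line would first need parsing into this shape.
record = {"ad_id": "ad-17", "slot_id": "slot-3", "user_id": "u-42",
          "type": "exposure", "log_time": 1386000000}
key = extract_key_info(record)
```

Making `KeyInfo` frozen keeps it hashable, so it can serve directly as a dictionary key or index in later lookups.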
The click stream type corresponding to the key information may be exposure or click.
104. A first time window is determined.
The duration of the first time window may range from 3 to 10 minutes or other durations, for example. The expiration time of the first time window may be, for example, a log time corresponding to the newly acquired clickstream data including the key information. The duration of the first time window may be determined by preset parameters or may be determined according to user instructions or may be determined in other ways.
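One way to picture the first time window is as an interval that ends at the log time of the newest record and reaches back by the configured duration. The sketch below assumes times in seconds, a default duration of 5 minutes (inside the 3 to 10 minute range mentioned above), and inclusive bounds; `TimeWindow` and `first_time_window` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWindow:
    start: float  # seconds, inclusive (assumption)
    end: float    # seconds, inclusive (assumption)

    def contains(self, log_time: float) -> bool:
        """True if the log time falls inside the window."""
        return self.start <= log_time <= self.end


def first_time_window(latest_log_time: float, duration_s: float = 300.0) -> TimeWindow:
    """Window that expires at the newest record's log time and spans duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return TimeWindow(start=latest_log_time - duration_s, end=latest_log_time)


window = first_time_window(latest_log_time=1000.0, duration_s=300.0)
```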
It is understood that there is no necessary order of execution between step 103 and step 104.
105. Whether the key information needs to be filtered is determined according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window.
106. If it is determined that the key information does not need to be filtered, training data of the advertisement delivery prediction model is generated by using the key information.
Wherein a piece of training data can be considered as an information aggregate in which a plurality of kinds of information including key information are aggregated.
In some embodiments of the present invention, the extracted key information included in the clickstream data may be written into a buffer, and after the key information is read from the buffer, the training data of the advertisement delivery prediction model may be generated using the key information.
For example, after the key information contained in the click stream data is extracted, the ad slot classification corresponding to the key information may be determined (i.e., the extracted key information is classified by ad slot), and the key information is added to a queue corresponding to that ad slot classification (different ad slot classifications may correspond to different queues; for example, ad slot classifications and queues may be one-to-one). Generating the training data of the advertisement delivery prediction model by using the key information may then specifically include: after reading the key information from the queue corresponding to the ad slot classification, generating training data of the advertisement delivery prediction model by using the key information. One purpose of classifying the key information is to allow it to be processed at ad slot granularity when the training data is generated.
In some embodiments of the present invention, generating the training data of the advertisement delivery prediction model by using the key information may include: calling a streaming computing topology (or another computing unit), using the key information as an index to look up matching attribute information and feature information in an online storage server, and generating the training data of the advertisement delivery prediction model from the key information, the attribute information, and the feature information. In some embodiments of the present invention, the obtained training data may be written into a distributed file system for training the advertisement delivery prediction model.
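The per-classification queues described above can be sketched with in-memory queues. The mapping `SLOT_CLASS` from ad slot identifier to classification, the fallback class `"other"`, and the class `SlotQueues` are all assumptions for illustration; the source does not say how the classification is derived or which queue technology is used.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical mapping from ad slot identifier to ad slot classification.
SLOT_CLASS = {"slot-1": "banner", "slot-2": "banner", "slot-3": "feed"}


class SlotQueues:
    """One FIFO queue per ad slot classification (one-to-one, as in the example)."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)

    def put(self, key_info: dict) -> str:
        """Classify the key information by ad slot and enqueue it; return the class used."""
        slot_class = SLOT_CLASS.get(key_info["slot_id"], "other")
        self._queues[slot_class].append(key_info)
        return slot_class

    def get(self, slot_class: str) -> Optional[dict]:
        """Dequeue the oldest key information of a classification, or None if empty."""
        q = self._queues.get(slot_class)
        return q.popleft() if q else None


queues = SlotQueues()
queues.put({"ad_id": "ad-17", "slot_id": "slot-3", "user_id": "u-42"})
queues.put({"ad_id": "ad-9", "slot_id": "slot-1", "user_id": "u-7"})
```

A consumer that builds training data for one ad slot classification then only needs to read from that classification's queue.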
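The lookup-and-join step can be illustrated with a plain dictionary standing in for the online storage server. Everything named here is an assumption: the store layout, the field names, the function `build_training_record`, and the use of a 1/0 label for click versus exposure.

```python
def build_training_record(key_info: dict, label: int, store: dict) -> dict:
    """Join key information with attribute/feature information found in the store.

    `store` is an in-memory stand-in for the online storage server, indexed by
    ("user", user_id) and ("ad", ad_id). Missing entries contribute no fields.
    """
    user_features = store.get(("user", key_info["user_id"]), {})
    ad_features = store.get(("ad", key_info["ad_id"]), {})
    # One training record aggregates key information, looked-up features, and the label.
    return {**key_info, **user_features, **ad_features, "label": label}


# Assumed example contents of the online store.
store = {
    ("user", "u-42"): {"user_age": 30, "user_gender": "f"},
    ("ad", "ad-17"): {"advertiser_id": "adv-1", "ad_category": "games"},
}
record = build_training_record(
    {"ad_id": "ad-17", "slot_id": "slot-3", "user_id": "u-42"}, label=1, store=store
)
```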
The specific manner of determining whether the key information needs to be filtered may be various according to the click stream type corresponding to the key information and the corresponding relationship between the log time corresponding to the key information and the first time window.
For example, determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relationship between the log time corresponding to the key information and the first time window may include: if the click stream type corresponding to the key information is determined to be a click, determining that the key information does not need to be filtered; or if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is outside a first time window, determining that the key information needs to be filtered; or if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has the click stream type of click is also acquired within the first time window, determining that the key information needs to be filtered; or if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has the click stream type of click and/or exposure is not acquired within the first time window, it is determined that the key information does not need to be filtered.
Here, "another click stream data" may refer to one other piece of click stream data or to several other pieces of click stream data.
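The four example rules can be condensed into a single decision function. This is a sketch under assumptions: times are in seconds, window bounds are inclusive, and the caller already knows whether a click record with the same key information was seen inside the window. The rules do not say what happens when only another exposure (and no click) was seen; the sketch keeps the record in that case, which is an assumption.

```python
from enum import Enum


class StreamType(Enum):
    CLICK = "click"
    EXPOSURE = "exposure"


def needs_filtering(stream_type: StreamType, log_time: float,
                    window_start: float, window_end: float,
                    click_seen_in_window: bool) -> bool:
    """Return True if the key information should be filtered out."""
    # Rule 1: clicks are never filtered.
    if stream_type is StreamType.CLICK:
        return False
    # Rule 2: an exposure logged outside the first time window is filtered.
    if not (window_start <= log_time <= window_end):
        return True
    # Rule 3: an exposure inside the window is filtered if a click with the
    # same key information was also acquired inside the window.
    # Rule 4: otherwise the exposure is kept.
    return click_seen_in_window
```

For example, with a window of `[700, 1000]`, an exposure logged at 800 is kept when no matching click was seen and filtered when one was.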
In some embodiments of the present invention, the attribute information and the feature information may include, for example, at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
The advertisement delivery prediction model mentioned in the embodiments of the present invention may be a Logistic Regression model, a Factorization Machine model, a custom advertisement delivery prediction model, or another type of advertisement delivery prediction model.
Tests show that the faster the advertisement delivery prediction model is updated, that is, the better its real-time performance, the better it generally fits online real-time data. Updating the model depends mainly on its training data. Therefore, whether the training data can be ready as soon as possible, whether it is the latest, and whether it reflects the current online click behavior as faithfully as possible are key factors in ensuring the stability of the advertisement delivery prediction model and improving its quality.
It can be seen that, in this embodiment, click stream data is obtained from an advertisement service server, the key information contained in the click stream data is extracted, and training data of the advertisement delivery prediction model is generated by using the key information. Compared with the existing offline analysis mechanism, this technical solution helps reduce the limitation that processing resources impose on obtaining training data (the click stream data does not need to accumulate to a certain amount before being analyzed), improve the real-time performance of the advertisement delivery prediction model, and improve the goodness of fit between the model and online real-time data. In addition, the solution filters the key information according to its click stream type and the relation between its log time and the determined first time window, which improves the validity of the key information used and of the generated training data, so that the trained advertisement delivery prediction model better matches what actually happens online.
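For the Logistic Regression option, the predicted click probability is the logistic function applied to a weighted sum of features. The sketch below only illustrates that formula; the feature encoding, the weights, and the function name `predict_ctr` are assumptions, and the source does not describe how its model is parameterized or trained.

```python
import math


def predict_ctr(features, weights, bias=0.0):
    """Logistic regression prediction: sigmoid(bias + sum(w_i * x_i))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# With all-zero features and zero bias the model is undecided.
p = predict_ctr([0.0, 0.0], [1.0, -2.0])
```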
To facilitate a better understanding and an implementation of the above-described aspects of the embodiments of the present invention, a few specific application scenarios are presented below by way of example.
Referring to FIG. 2-a, FIG. 2-a shows a communication system architecture diagram. The communication system shown in FIG. 2-a comprises an analysis and prediction platform, an online storage server, and a plurality of advertisement service servers. FIG. 2-b shows a logical architecture diagram of the analysis and prediction platform.
Referring to fig. 2-c, fig. 2-c is a schematic flow chart illustrating a method for analyzing clickstream data online according to another embodiment of the present invention. As shown in FIG. 2-c, another embodiment of the present invention provides a method for online analyzing clickstream data, which may include the following:
201. The analysis and prediction platform acquires click stream data from the advertisement service server.
The click stream data refers to a data stream formed by recording the click and/or exposure behavior that occurs when an advertisement is shown.
202. The analysis and prediction platform calls a streaming computing topology to extract the key information contained in the click stream data.
In some embodiments of the present invention, the key information may include an advertisement identifier, an advertisement space identifier, a user identifier (e.g., a user identifier such as a mailbox, a QQ number, a mobile phone number, etc.), and the like, and of course, the key information may also include some other key information.
For example, the analysis and prediction platform may write the click stream data acquired from the advertisement service server into a queue. After the platform calls the streaming computing topology and takes the click stream data out of the queue, it extracts the key information contained in the click stream data. Using the queue makes it possible to control the processing speed of the click stream data.
203. The analysis and prediction platform classifies the extracted key information by advertisement space to determine the advertisement space classification corresponding to the key information.
204. The analysis and prediction platform adds the key information to the queue corresponding to the advertisement space classification. Different ad slot classifications may correspond to different queues; for example, ad slot classifications and queues may be one-to-one.
205. After reading the key information from the queue corresponding to the advertisement space classification, the analysis and prediction platform calls a streaming computing topology, determines the click stream type corresponding to the key information and a first time window, and determines whether the key information needs to be filtered based on a preset filtering policy. If the key information does not need to be filtered, the platform may use the key information as an index to look up matching attribute information and feature information in the online storage server, and generate training data of the advertisement delivery prediction model from the key information, the attribute information, and the feature information.
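A bounded queue is one simple way a queue can control processing speed: producers are told when the queue is full, and the consumer drains at its own pace. The sketch uses Python's standard `queue.Queue`; the capacity of 2, the batch size, and the helper names `ingest` and `drain` are illustrative assumptions, not details from the source.

```python
import queue

# Small capacity chosen only to make the example easy to follow.
raw_queue: "queue.Queue[dict]" = queue.Queue(maxsize=2)


def ingest(record: dict) -> bool:
    """Try to enqueue a raw click stream record; return False if the queue is full."""
    try:
        raw_queue.put_nowait(record)
        return True
    except queue.Full:
        return False


def drain(batch_size: int) -> list:
    """Take at most batch_size records out of the queue for key information extraction."""
    batch = []
    for _ in range(batch_size):
        try:
            batch.append(raw_queue.get_nowait())
        except queue.Empty:
            break
    return batch


accepted = [ingest({"seq": i}) for i in range(3)]
batch = drain(batch_size=5)
```

In a blocking variant, `put` without `_nowait` would make the producer wait instead of dropping, which is the usual choice when records must not be lost.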
In some scenarios, the same advertisement in the same advertisement slot may be exposed to the same user within a period of time before and after that user clicks on it. Suppose a click is taken to mean that the user likes the advertisement and an exposure without a click is taken to mean that the user dislikes it. If the user clicks shortly after an exposure, the user is considered to like the advertisement, and the earlier "dislike" record is preferably eliminated. Therefore, some key information whose click stream type is exposure can be cleaned out through a preset filtering policy. The preset filtering policy may take various forms and may be set according to specific needs.
Streaming computing topologies (e.g., streaming computing topologies in fig. 2-b, each of which includes several processing units) can be considered as a unit for implementing one computing flow. Wherein different streaming computing topologies may provide training data for different ad placement prediction models.
In some embodiments of the invention, key information stored in a queue may be reused by multiple streaming computing topologies that produce training data. Because the key information is classified by advertisement space, streaming computing topologies that generate training data for different advertisement delivery prediction models can use the key information of the same advertisement space, and a single streaming computing topology can also use the key information of several advertisement spaces. In other words, the key information of a particular advertisement space can be used to generate training data for multiple models.
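The many-to-many relation between advertisement space classifications and topologies can be sketched as a subscription table. The topology names, the classification names, and the function `topologies_for` are hypothetical; the source does not describe how topologies declare which queues they read.

```python
# Hypothetical subscriptions: topology name -> ad slot classifications it consumes.
SUBSCRIPTIONS = {
    "ctr_model_a": {"banner", "feed"},
    "ctr_model_b": {"feed"},
    "ctr_model_c": {"search"},
}


def topologies_for(slot_class: str, subscriptions: dict) -> list:
    """Return the topologies that should receive key information of this classification."""
    return [name for name, classes in subscriptions.items() if slot_class in classes]


feed_consumers = topologies_for("feed", SUBSCRIPTIONS)
```

Here key information classified as `feed` feeds two models, while `ctr_model_a` draws on two classifications.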
In some embodiments of the present invention, the number and types of the attribute information and the feature information required for generating the training data corresponding to each advertisement delivery prediction model may be adjusted according to different requirements as long as the required attribute information and feature information are stored in the online storage server in advance.
Specifically, determining whether the key information needs to be filtered based on a preset filtering policy may be determining whether the key information needs to be filtered according to a click stream type corresponding to the key information and a correspondence between a log time corresponding to the key information and a first time window.
For example, determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relationship between the log time corresponding to the key information and the first time window may include: if the click stream type corresponding to the key information is determined to be a click, determining that the key information does not need to be filtered; or if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is out of a first time window, determining that the key information needs to be filtered; or if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has the click stream type of click is also acquired within the first time window, determining that the key information needs to be filtered; or if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has the click stream type of click and/or exposure is not acquired within the first time window, it is determined that the key information does not need to be filtered.
The duration of the first time window may range from 3 to 10 minutes or other durations, for example. The expiration time of the first time window may be, for example, the log time corresponding to the newly acquired click stream data that includes the key information. The time window setting may be pushed to the streaming computing system through a distributed reliable coordination service, which also allows the window size to be adjusted dynamically without stopping the topology computation.
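Adjusting the window size without stopping the computation can be pictured as a shared, thread-safe setting that a callback updates while workers keep reading it. The sketch deliberately leaves out the coordination service itself: `WindowConfig` and `on_update` are hypothetical names, and `on_update` stands in for whatever notification the coordination service would deliver.

```python
import threading


class WindowConfig:
    """Thread-safe holder for the current time window duration, in seconds."""

    def __init__(self, duration_s: float) -> None:
        self._lock = threading.Lock()
        self._duration_s = duration_s

    def on_update(self, new_duration_s: float) -> None:
        """Callback to invoke when the coordination service reports a new value."""
        if new_duration_s <= 0:
            raise ValueError("window duration must be positive")
        with self._lock:
            self._duration_s = new_duration_s

    @property
    def duration_s(self) -> float:
        """Value read by processing units each time they evaluate a record."""
        with self._lock:
            return self._duration_s


config = WindowConfig(duration_s=300.0)
config.on_update(600.0)  # simulated change pushed while the topology keeps running
```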
In some embodiments of the present invention, if the click stream type corresponding to the key information is a click, the key information is further written into the positive sample buffer; if the click stream type corresponding to the key information is determined to be exposure, the log time corresponding to the key information is within a first time window, and the key information does not exist in a first time window in a positive sample buffer area, the key information can be written into a negative sample buffer area, and if another click stream data which contains the key information and has a click stream type of click is also acquired in the first time window, the key information is determined to need to be filtered; if another clickstream data which contains the key information and has a clickstream type of click and/or exposure is not acquired within the first time window, it is determined that the key information does not need to be filtered.
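The interplay of the positive and negative sample buffers can be sketched as below. This is a simplified illustration under assumptions: keys are opaque hashable values, times are in seconds, each buffer keeps only the most recent log time per key, and an exposure is released as a negative sample once the window has passed without a matching click. The class `SampleBuffers` and its method names are hypothetical.

```python
class SampleBuffers:
    """Positive buffer for clicks, negative buffer for exposures awaiting confirmation."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.positive = {}  # key -> log time of the latest click
        self.negative = {}  # key -> log time of the pending exposure

    def on_click(self, key, log_time: float) -> None:
        """Record a click; it cancels a pending exposure of the same key in the window."""
        self.positive[key] = log_time
        pending = self.negative.get(key)
        if pending is not None and abs(log_time - pending) <= self.window_s:
            del self.negative[key]

    def on_exposure(self, key, log_time: float) -> None:
        """Buffer an exposure unless a click of the same key lies within the window."""
        clicked = self.positive.get(key)
        if clicked is not None and abs(log_time - clicked) <= self.window_s:
            return  # filtered: the user clicked this advertisement nearby in time
        self.negative[key] = log_time

    def flush_negatives(self, now: float) -> list:
        """Release exposures whose window elapsed without a click as negative samples."""
        ready = [k for k, t in self.negative.items() if now - t > self.window_s]
        for k in ready:
            del self.negative[k]
        return ready


buffers = SampleBuffers(window_s=300.0)
buffers.on_exposure("k1", 100.0)
buffers.on_exposure("k2", 110.0)
buffers.on_click("k1", 200.0)          # cancels the pending exposure of k1
negatives = buffers.flush_negatives(now=500.0)
```

Keeping one entry per key is a simplification; a fuller version would track every exposure separately.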
In some embodiments of the present invention, the attribute information and the feature information may include, for example, at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
The advertisement delivery prediction model mentioned in the embodiments of the present invention may be a Logistic Regression model, a Factorization Machine model, a custom advertisement delivery prediction model, or another type of advertisement delivery prediction model.
Tests show that, in general, the faster the advertisement delivery prediction model is updated (that is, the better its real-time performance), the better the model fits the online real-time data. Updating the advertisement delivery prediction model depends mainly on the training data of the model. Therefore, whether the training data can be ready as soon as possible, whether the training data is the latest, and whether the training data reflects the current online click situation as faithfully as possible are key factors in ensuring the stability and improving the quality of the advertisement delivery prediction model.
206. The analysis and prediction platform writes the obtained training data of the advertisement delivery prediction model into the distributed file system for training the advertisement delivery prediction model.
It can be seen that, in this embodiment, the analysis and prediction platform acquires click stream data from the advertisement service server, extracts the key information contained in the click stream data, and generates training data of the advertisement delivery prediction model by using the key information. Compared with the existing offline analysis mechanism, the technical solution of the present invention helps to reduce the constraint that processing resources place on obtaining training data (the click stream data need not be accumulated to a certain quantity before being analyzed to obtain training data), to improve the real-time performance of the advertisement delivery prediction model, and to improve the goodness of fit between the advertisement delivery prediction model and the online real-time data. In addition, the solution filters the key information according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the determined first time window, which improves the validity of the key information used and hence the validity of the generated training data, so that an advertisement delivery prediction model that better fits the actual scenario can be trained.
With reference to fig. 3-a and fig. 3-b, a manner of determining whether the key information needs to be filtered according to a click stream type corresponding to the key information and a correspondence between a log time corresponding to the key information and a first time window is described by way of example in some scenarios.
As shown in fig. 3-a, after the key information in the click stream data is obtained, the click stream type corresponding to the key information may be determined. If the click stream type corresponding to the key information is determined to be click, the key information may be written into a positive sample buffer (pSample), training data of the advertisement delivery prediction model may be generated by using the key information, and the key information may be deleted from pSample after the log time (Log_time) corresponding to the key information falls within the first time window. If the click stream type corresponding to the key information is exposure, it may be judged whether the log time corresponding to the key information falls outside the first time window. If the log time falls outside the first time window, the key information may be filtered. If the log time falls within the first time window, it may be judged whether identical key information currently exists in the positive sample buffer: if so, the key information may be filtered; if not, the key information may be written into a negative sample buffer (nSample).
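The flow of fig. 3-a can be sketched roughly as below. The function name `handle_record`, the dictionary-based buffers, the tuple layout of the key, and the label value 1 for a positive sample are all assumptions made for illustration; removal of aged entries from pSample is left out for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (advertisement id, ad space id, user id).
Key = Tuple[str, str, str]


def handle_record(key: Key, stream_type: str, log_time: float,
                  window_start: float, window_end: float,
                  p_sample: Dict[Key, float], n_sample: Dict[Key, float],
                  emitted: List[Tuple[Key, int]]) -> str:
    """Route one piece of key information as in fig. 3-a (simplified)."""
    if stream_type == "click":
        p_sample[key] = log_time   # remember the click in pSample
        emitted.append((key, 1))   # assumed label: 1 marks a positive sample
        return "positive"
    if not (window_start <= log_time <= window_end):
        return "filtered"          # exposure outside the first time window
    if key in p_sample:
        return "filtered"          # the same key was already clicked
    n_sample[key] = log_time       # hold the exposure in nSample for later
    return "buffered"


p_sample: Dict[Key, float] = {}
n_sample: Dict[Key, float] = {}
emitted: List[Tuple[Key, int]] = []
key_a = ("ad-1", "slot-9", "user-7")
key_b = ("ad-2", "slot-9", "user-7")
results = [
    handle_record(key_a, "click", 100.0, 0.0, 300.0, p_sample, n_sample, emitted),
    handle_record(key_a, "exposure", 120.0, 0.0, 300.0, p_sample, n_sample, emitted),
    handle_record(key_b, "exposure", 130.0, 0.0, 300.0, p_sample, n_sample, emitted),
    handle_record(key_b, "exposure", 900.0, 0.0, 300.0, p_sample, n_sample, emitted),
]
```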
Referring to fig. 3-b, fig. 3-b illustrates one way of processing the key information written into the negative sample buffer. As shown in fig. 3-b, after sleeping for a set duration, it may be judged whether the log time (Update Time) corresponding to the key information most recently written into the negative sample buffer, whose click stream type is exposure, does not fall into the first time window (here, the deadline of the first time window is the current system time). If that log time does not fall into the first time window, it is judged whether at least one piece of key information exists in the negative sample buffer; if no key information exists in the negative sample buffer, the procedure returns to the step of sleeping for the set duration. If at least one piece of key information exists in the negative sample buffer, it is further judged whether the same key information exists in the positive sample buffer: if so, the corresponding key information is deleted from the negative sample buffer (that is, the key information is filtered); if not, training data of the advertisement delivery prediction model may be generated by using the key information, and the key information is then deleted from the negative sample buffer.
As shown in fig. 3-b, if it is determined that the log time corresponding to the key information most recently written into the negative sample buffer, whose click stream type is exposure, already falls into the first time window, it may be further judged whether at least one piece of key information exists in the negative sample buffer. If no key information exists in the negative sample buffer, the procedure may return to the step of sleeping for the set duration. If at least one piece of key information exists in the negative sample buffer, it may be further judged whether the log time corresponding to the key information that was written earliest among those currently in the negative sample buffer does not fall into the first time window; if it does not fall into the first time window, the procedure returns to the step of sleeping for the set duration. If that log time falls into the first time window, it may be further judged whether the same key information exists in the positive sample buffer: if so, the corresponding key information is deleted from the negative sample buffer (that is, the key information is filtered); if not, training data of the advertisement delivery prediction model may be generated by using the key information, and the key information is then deleted from the negative sample buffer.
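A simplified reading of the periodic processing in fig. 3-b is sketched below. It does not reproduce every branch of the figure: it assumes that an exposure in the negative sample buffer is resolved once its log time is at least one window length older than the current system time, that entries were written in non-decreasing log-time order, and that the label value 0 marks a negative sample. The function name `sweep_negative_buffer` is hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # hypothetical (advertisement id, ad space id, user id)


def sweep_negative_buffer(n_sample: OrderedDict, p_sample: Dict[Key, float],
                          now: float, window_seconds: float,
                          emitted: List[Tuple[Key, int]]) -> int:
    """Resolve aged exposures in nSample; return how many were resolved."""
    resolved = 0
    while n_sample:
        key, log_time = next(iter(n_sample.items()))  # earliest-written entry
        if now - log_time < window_seconds:
            break  # still waiting for a possible click; sleep and retry later
        n_sample.popitem(last=False)                  # delete from nSample
        if key not in p_sample:
            emitted.append((key, 0))                  # no click seen: negative sample
        resolved += 1                                 # clicked keys are just dropped
    return resolved


p_sample = {("ad-1", "slot-9", "user-7"): 100.0}
n_sample = OrderedDict()
n_sample[("ad-1", "slot-9", "user-7")] = 90.0    # exposure that was later clicked
n_sample[("ad-2", "slot-9", "user-7")] = 95.0    # exposure never clicked
n_sample[("ad-3", "slot-9", "user-7")] = 380.0   # too recent to resolve
emitted: List[Tuple[Key, int]] = []
resolved = sweep_negative_buffer(n_sample, p_sample, now=400.0,
                                 window_seconds=300.0, emitted=emitted)
```

In a running system this function would be called in a loop, with a sleep of the set duration between calls.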
It is understood that fig. 3-a and fig. 3-b merely illustrate one possible manner of determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the first time window. In practical applications, the manner may be adapted to the specific situation and is not limited to the above examples.
The following also provides a related apparatus for implementing the above-described scheme.
Referring to fig. 4-a, an embodiment of the present invention further provides an apparatus 400 for online analyzing clickstream data, which may include: an acquisition unit 410, an extraction unit 420, a type determination unit 430, a time window determination unit 440, a filter control unit 450, and a generation unit 460.
The acquisition unit 410 is configured to acquire click stream data from the advertisement service server.
An extracting unit 420, configured to extract key information included in the clickstream data.
A type determining unit 430, configured to determine a click stream type corresponding to the key information.
A time window determining unit 440 for determining the first time window.
The filtering control unit 450 is configured to determine whether the key information needs to be filtered according to the click stream type corresponding to the key information and a corresponding relationship between the log time corresponding to the key information and the first time window.
A generating unit 460, configured to generate training data of an advertisement delivery prediction model by using the key information extracted by the extracting unit if the filtering control unit 450 determines that the key information does not need to be filtered.
Referring to fig. 4-b, in some embodiments of the present invention, the apparatus 400 for online analyzing clickstream data further comprises:
a classification unit 470, configured to determine an ad slot classification corresponding to the key information, and add the key information to a queue corresponding to the ad slot classification;
the generating unit 460 is specifically configured to, after reading the key information from the queue corresponding to the ad slot classification, generate training data of an ad placement prediction model by using the key information.
In some embodiments of the present invention, the generating unit 460 is specifically configured to invoke a stream computing topology, and find out attribute information and feature information that match the key information in an online storage server by using the key information as an index; and generating training data of the advertisement putting prediction model by using the key information, the attribute information and the characteristic information.
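The lookup and assembly step described above might look like the following sketch. In-memory dictionaries stand in for the online storage server, and the function name `build_training_record`, the field names, and the key layout are hypothetical.

```python
from typing import Any, Dict, Tuple

Key = Tuple[str, str, str]  # hypothetical (advertisement id, ad space id, user id)


def build_training_record(key: Key, label: int,
                          user_store: Dict[str, Dict[str, Any]],
                          ad_store: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Use the key information as an index and merge the matching attributes."""
    ad_id, slot_id, user_id = key
    record: Dict[str, Any] = {
        "ad_id": ad_id, "slot_id": slot_id, "user_id": user_id, "label": label,
    }
    record.update(user_store.get(user_id, {}))  # e.g. user age, gender, liveness
    record.update(ad_store.get(ad_id, {}))      # e.g. advertiser id, ad category
    return record


# Stand-ins for the online storage server (assumed layout).
user_store = {"user-7": {"user_age": 30, "user_gender": "f"}}
ad_store = {"ad-1": {"advertiser_id": "adv-42", "ad_category": "games"}}

record = build_training_record(("ad-1", "slot-9", "user-7"), 1, user_store, ad_store)
```

Keys missing from either store simply contribute no extra fields, so the record still carries the key information and the label.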
In some embodiments of the present invention, the filtering control unit 450 is specifically configured to:
if the click stream type corresponding to the key information is determined to be click, determining that the key information does not need to be filtered;
or,
if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is out of a first time window, determining that the key information needs to be filtered;
or,
if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has a click stream type of click is also acquired within the first time window, it is determined that the key information needs to be filtered;
or,
if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within the first time window, and no other click stream data that contains the key information and whose click stream type is click and/or exposure is acquired within the first time window, determining that the key information does not need to be filtered.
In some embodiments of the present invention, the duration of the first time window is in a range of 3-10 minutes.
In some embodiments of the present invention, the key information includes an advertisement identifier, an advertisement space identifier, and a user identifier.
In some embodiments of the invention, the attribute information and the feature information comprise at least one of: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
The advertisement delivery prediction model mentioned in the embodiments of the present invention may be a Logistic Regression model, a Factorization Machine model, a custom advertisement delivery prediction model, or another type of advertisement delivery prediction model.
Tests show that, in general, the faster the advertisement delivery prediction model is updated (that is, the better its real-time performance), the better the model fits the online real-time data. Updating the advertisement delivery prediction model depends mainly on the training data of the model. Therefore, whether the training data can be ready as soon as possible, whether the training data is the latest, and whether the training data reflects the current online click situation as faithfully as possible are key factors in ensuring the stability and improving the quality of the advertisement delivery prediction model.
It can be understood that the functions of the functional modules of the device 400 for online analyzing clickstream data according to this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
It can be seen that, in this embodiment, the apparatus 400 for online analyzing click stream data acquires click stream data from the advertisement service server, extracts the key information contained in the click stream data, and generates training data of the advertisement delivery prediction model by using the key information. Compared with the existing offline analysis mechanism, the technical solution of the present invention helps to reduce the constraint that processing resources place on obtaining training data (the click stream data need not be accumulated to a certain quantity before being analyzed to obtain training data), to improve the real-time performance of the advertisement delivery prediction model, and to improve the goodness of fit between the advertisement delivery prediction model and the online real-time data. In addition, the solution filters the key information according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the determined first time window, which improves the validity of the key information used and hence the validity of the generated training data, so that an advertisement delivery prediction model that better fits the actual scenario can be trained.
Referring to fig. 5, an embodiment of the present invention provides an analysis and prediction platform 500, which may include:
a processor 510, a memory 520, an input device 530, and an output device 540. The number of processors 510 in the analysis and prediction platform 500 may be one or more; one processor is taken as an example in fig. 5. In some embodiments of the invention, the processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 5.
The memory 520 may be used to store software programs and modules, and the processor 510 executes various functional applications and data processing of the analysis and prediction platform 500 by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the device, and the like. Further, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. The input device 530 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the analysis and prediction platform 500. The output device 540 may include a display device such as a display screen.
Wherein, the processor 510 executes the following steps: acquiring click stream data from an advertisement service server; extracting key information contained in the click stream data; determining the click stream type corresponding to the key information; determining a first time window; determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if the key information is determined not to need to be filtered, generating training data of the advertisement putting prediction model by using the key information.
The click stream data refers to a data stream formed by sensing click and/or exposure behavior occurring when an advertisement is shown.
In some embodiments of the present invention, the key information may include an advertisement identifier, an advertisement space identifier, a user identifier (e.g., a mailbox, a QQ number, or a mobile phone number), and the like. The key information may, of course, also include other information.
Wherein a piece of training data can be considered as an information aggregate in which a plurality of kinds of information including key information are aggregated.
In some embodiments of the present invention, the processor 510 may write the extracted key information contained in the clickstream data into a buffer, and after reading the key information from the buffer, generate training data of an advertisement delivery prediction model using the key information.
For example, after the key information contained in the click stream data is extracted, the ad slot classification corresponding to the key information may be determined (i.e., the ad slot classification is determined according to the extracted key information), and the key information is added to a queue corresponding to that ad slot classification (e.g., different ad slot classifications may correspond to different queues, and ad slot classifications and queues may be in one-to-one correspondence). Generating the training data of the advertisement delivery prediction model by using the key information may include: after the key information is read from the queue corresponding to the ad slot classification, generating training data of the advertisement delivery prediction model by using the key information. One purpose of classifying the key information is to allow the key information to be processed at the granularity of the ad slot when the training data is generated.
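The per-classification queues described above can be sketched as follows, assuming a one-to-one mapping between ad slot classifications and queues. The mapping table `SLOT_CLASS`, the classification names, and the helper names `enqueue` and `drain` are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple

Key = Tuple[str, str, str]  # hypothetical (advertisement id, ad space id, user id)

# Assumed mapping from ad space identifier to ad slot classification.
SLOT_CLASS = {"slot-9": "feed", "slot-3": "banner"}

# One queue per ad slot classification.
queues: DefaultDict[str, Deque[Key]] = defaultdict(deque)


def enqueue(key: Key) -> str:
    """Classify the key information by ad slot and append it to that queue."""
    _, slot_id, _ = key
    slot_class = SLOT_CLASS.get(slot_id, "other")
    queues[slot_class].append(key)
    return slot_class


def drain(slot_class: str) -> List[Key]:
    """Read all pending key information for one classification, in order."""
    pending = queues[slot_class]
    out: List[Key] = []
    while pending:
        out.append(pending.popleft())
    return out


enqueue(("ad-1", "slot-9", "user-7"))
enqueue(("ad-2", "slot-3", "user-7"))
enqueue(("ad-3", "slot-9", "user-8"))
feed_batch = drain("feed")
```

Draining one queue at a time keeps training data generation at the granularity of a single ad slot classification.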
In some embodiments of the present invention, generating, by the processor 510, training data of the advertisement delivery prediction model by using the key information may include: invoking a stream computing topology (or invoking another computing unit), and looking up, in an online storage server, attribute information and feature information that match the key information by using the key information as an index; and generating training data of the advertisement delivery prediction model by using the key information, the attribute information, and the feature information. In some embodiments of the present invention, the obtained training data of the advertisement delivery prediction model may be written into a distributed file system for training the advertisement delivery prediction model.
For example, determining, by the processor 510, whether the key information needs to be filtered according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the first time window may include: if the click stream type corresponding to the key information is determined to be click, determining that the key information does not need to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is outside the first time window, determining that the key information needs to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure, the log time corresponding to the key information is within the first time window, and other click stream data that contains the key information and whose click stream type is click is also acquired within the first time window, determining that the key information needs to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure, the log time corresponding to the key information is within the first time window, and no other click stream data that contains the key information and whose click stream type is click and/or exposure is acquired within the first time window, determining that the key information does not need to be filtered.
The duration of the first time window may be, for example, 3 to 10 minutes, or another duration. The deadline (end time) of the first time window may be, for example, the log time corresponding to the most recently acquired click stream data containing the key information.
In some embodiments of the present invention, the attribute information and the feature information may include, for example, at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
The advertisement delivery prediction model mentioned in the embodiments of the present invention may be a Logistic Regression model, a Factorization Machine model, a custom advertisement delivery prediction model, or another type of advertisement delivery prediction model.
Tests show that, in general, the faster the advertisement delivery prediction model is updated (that is, the better its real-time performance), the better the model fits the online real-time data. Updating the advertisement delivery prediction model depends mainly on the training data of the model. Therefore, whether the training data can be ready as soon as possible, whether the training data is the latest, and whether the training data reflects the current online click situation as faithfully as possible are key factors in ensuring the stability and improving the quality of the advertisement delivery prediction model.
It can be seen that, in this embodiment, the analysis and prediction platform 500 acquires click stream data from the advertisement service server, extracts the key information contained in the click stream data, and generates training data of the advertisement delivery prediction model by using the key information. Compared with the existing offline analysis mechanism, the technical solution of the present invention helps to reduce the constraint that processing resources place on obtaining training data (the click stream data need not be accumulated to a certain quantity before being analyzed to obtain training data), to improve the real-time performance of the advertisement delivery prediction model, and to improve the goodness of fit between the advertisement delivery prediction model and the online real-time data. In addition, the solution filters the key information according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the determined first time window, which improves the validity of the key information used and hence the validity of the generated training data, so that an advertisement delivery prediction model that better fits the actual scenario can be trained.
Referring to fig. 6-a, an embodiment of the present invention provides a distributed communication system, which may include:
an advertisement service server 610 and an analysis and prediction platform 620. In fig. 6-a, a plurality of advertisement service servers 610 are illustrated as an example.
Referring to fig. 6-a, analytics prediction platform 620 may include one or more analytics prediction servers 621. The plurality of analytical prediction servers 621 may be built based on a distributed architecture.
The analysis and prediction platform 620 is configured to obtain click stream data from the advertisement service server 610; extracting key information contained in the click stream data; determining the click stream type corresponding to the key information; determining a first time window; determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if the key information is determined not to need to be filtered, generating training data of the advertisement putting prediction model by using the key information.
The click stream data refers to a data stream formed by sensing click and/or exposure behavior occurring when an advertisement is shown.
In some embodiments of the present invention, the key information may include an advertisement identifier, an advertisement space identifier, a user identifier (e.g., a mailbox, a QQ number, or a mobile phone number), and the like. The key information may, of course, also include other information.
Wherein a piece of training data can be considered as an information aggregate in which a plurality of kinds of information including key information are aggregated.
In some embodiments of the present invention, the extracted key information included in the clickstream data may be written into a buffer, and after the key information is read from the buffer, the training data of the advertisement delivery prediction model may be generated using the key information.
For example, after the key information contained in the click stream data is extracted, the ad slot classification corresponding to the key information may be determined (i.e., the ad slot classification is determined according to the extracted key information), and the key information is added to a queue corresponding to that ad slot classification (e.g., different ad slot classifications may correspond to different queues, and ad slot classifications and queues may be in one-to-one correspondence). Generating the training data of the advertisement delivery prediction model by using the key information may include: after the key information is read from the queue corresponding to the ad slot classification, generating training data of the advertisement delivery prediction model by using the key information. One purpose of classifying the key information is to allow the key information to be processed at the granularity of the ad slot when the training data is generated.
In some embodiments of the present invention, in terms of generating training data of the advertisement delivery prediction model by using the key information, the analysis and prediction platform 620 may be specifically configured to invoke a stream computing topology (or invoke another computing unit), look up, in an online storage server, attribute information and feature information that match the key information by using the key information as an index, and generate training data of the advertisement delivery prediction model by using the key information, the attribute information, and the feature information. In some embodiments of the present invention, the obtained training data of the advertisement delivery prediction model may be written into a distributed file system for training the advertisement delivery prediction model.
For example, in terms of determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the first time window, the analysis and prediction platform 620 may be specifically configured to: if the click stream type corresponding to the key information is determined to be click, determine that the key information does not need to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is outside the first time window, determine that the key information needs to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure, the log time corresponding to the key information is within the first time window, and other click stream data that contains the key information and whose click stream type is click is also acquired within the first time window, determine that the key information needs to be filtered; or, if the click stream type corresponding to the key information is determined to be exposure, the log time corresponding to the key information is within the first time window, and no other click stream data that contains the key information and whose click stream type is click and/or exposure is acquired within the first time window, determine that the key information does not need to be filtered.
The duration of the first time window may be, for example, 3 to 10 minutes, or another duration. The deadline (end time) of the first time window may be, for example, the log time corresponding to the most recently acquired click stream data containing the key information.
In some embodiments of the present invention, the attribute information and the feature information may include, for example, at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
The advertisement delivery prediction model mentioned in the embodiments of the present invention may be a Logistic Regression model, a Factorization Machine model, a custom advertisement delivery prediction model, or another type of advertisement delivery prediction model.
Tests show that, in general, the faster the advertisement delivery prediction model is updated (that is, the better its real-time performance), the better the model fits the online real-time data. Updating the advertisement delivery prediction model depends mainly on the training data of the model. Therefore, whether the training data can be ready as soon as possible, whether the training data is the latest, and whether the training data reflects the current online click situation as faithfully as possible are key factors in ensuring the stability and improving the quality of the advertisement delivery prediction model.
It can be seen that, in this embodiment, the analysis and prediction platform 620 obtains click stream data from the advertisement service server 610, extracts the key information contained in the click stream data, and generates training data of the advertisement delivery prediction model by using the key information. Because the click stream data is obtained from the advertisement service server 610 in real time and is analyzed online in real time to obtain the training data of the advertisement delivery prediction model, the technical solution of the present invention, compared with the existing offline analysis mechanism, helps reduce the constraint that processing resources place on obtaining training data (the click stream data does not need to accumulate to a certain quantity before it is analyzed), improve the real-time performance of the advertisement delivery prediction model, and improve how well the model matches the online real-time data. In addition, the solution filters the key information according to the click stream type corresponding to the key information and the relationship between the log time corresponding to the key information and the determined first time window. This improves the validity of the key information used, and hence the validity of the generated training data, so that the trained advertisement delivery prediction model better fits the scenario that actually occurs.
Referring to fig. 7, fig. 7 is a schematic diagram of a server structure according to an embodiment of the present invention. The server 700 may vary significantly depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors) and memory 732, one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. Memory 732 and storage medium 730 may be, among other things, transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server.
Further, the central processor 722 may be configured to communicate with the storage medium 730 and to execute, on the server 700, a series of instruction operations stored in the storage medium 730. The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input-output interfaces 758, and/or one or more operating systems 741, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth. The steps performed by the analysis and prediction platform, the analysis and prediction server, the online storage server, or the advertisement service server described in the embodiments of figs. 1, 2, and 3-a to 3-b above may be based on the server architecture shown in fig. 7.
An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program that, when executed, performs some or all of the steps of the method for online analysis of click stream data described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (16)

1. A method for analyzing click stream data online, which is applied to a distributed system, the method comprising:
acquiring click stream data from an advertisement service server;
extracting key information contained in the click stream data;
determining a click stream type corresponding to the key information;
determining a first time window;
determining whether the key information needs to be filtered or not according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window;
and if it is determined that the key information does not need to be filtered, generating training data of the advertisement delivery prediction model by using the key information.
2. The method of claim 1,
the method further comprises the following steps:
determining the advertisement space classification corresponding to the key information;
adding the key information to a queue corresponding to the advertisement space classification;
the generating of the training data of the advertisement delivery prediction model by using the key information includes: after reading the key information from the queue corresponding to the advertisement space classification, generating training data of the advertisement delivery prediction model by using the key information.
3. The method according to claim 1 or 2,
the generating of the training data of the advertisement delivery prediction model by using the key information comprises:
calling a streaming computing topology, and searching an online storage server for attribute information and feature information matched with the key information by taking the key information as an index; and generating training data of the advertisement delivery prediction model by using the key information, the attribute information and the feature information.
4. The method according to any one of claims 1 to 3, wherein the determining whether the key information needs to be filtered according to the click stream type corresponding to the key information and the correspondence between the log time corresponding to the key information and the first time window comprises:
if the click stream type corresponding to the key information is determined to be click, determining that the key information does not need to be filtered;
if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is out of a first time window, determining that the key information needs to be filtered;
or,
if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has a click stream type of click is also acquired within the first time window, it is determined that the key information needs to be filtered;
or,
and if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is in a first time window, and another click stream data which contains the key information and has the click stream type of click and/or exposure is not acquired in the first time window, determining that the key information does not need to be filtered.
5. The method of claim 4,
the duration range of the first time window is 3-10 minutes.
6. The method according to claim 1 or 2,
the key information comprises advertisement identification, advertisement position identification and user identification.
7. The method according to claim 1 or 2,
the attribute information and the feature information include at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
8. An apparatus for analyzing clickstream data online, applied to a distributed system, the apparatus comprising:
the acquisition unit is used for acquiring click stream data from the advertisement service server;
the extracting unit is used for extracting key information contained in the click stream data;
the type determining unit is used for determining the click stream type corresponding to the key information;
a time window determining unit for determining a first time window;
the filtering control unit is used for determining whether the key information needs to be filtered or not according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window;
and the generating unit is used for generating training data of the advertisement delivery prediction model by using the key information extracted by the extracting unit if the filtering control unit determines that the key information does not need to be filtered.
9. The apparatus of claim 8,
the device further comprises:
the classification unit is used for determining the advertisement space classification corresponding to the key information and adding the key information into a queue corresponding to the advertisement space classification;
the generating unit is specifically configured to read the key information from the queue corresponding to the advertisement space classification, and then generate training data of an advertisement delivery prediction model using the key information.
10. The apparatus according to claim 8 or 9,
the generating unit is specifically configured to, if the filtering control unit determines that the key information does not need to be filtered, invoke a streaming computing topology, and find, in an online storage server, attribute information and feature information that match the key information by using the key information as an index; and generate training data of the advertisement delivery prediction model by using the key information, the attribute information and the feature information.
11. The apparatus according to any one of claims 8 to 10,
the filtering control unit is specifically configured to determine that the key information does not need to be filtered if it is determined that the click stream type corresponding to the key information is a click;
if the click stream type corresponding to the key information is determined to be exposure and the log time corresponding to the key information is out of a first time window, determining that the key information needs to be filtered;
or,
if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is within a first time window, and another click stream data which contains the key information and has a click stream type of click is also acquired within the first time window, it is determined that the key information needs to be filtered;
or,
and if it is determined that the click stream type corresponding to the key information is exposure, the log time corresponding to the key information is in a first time window, and another click stream data which contains the key information and has the click stream type of click and/or exposure is not acquired in the first time window, determining that the key information does not need to be filtered.
12. The apparatus of claim 11,
the duration range of the first time window is 3-10 minutes.
13. The apparatus according to claim 8 or 9,
the key information comprises advertisement identification, advertisement position identification and user identification.
14. The apparatus according to claim 8 or 9,
the attribute information and the feature information include at least one of the following information: user age, user liveness, user gender, advertiser identification, advertisement category information, advertisement image information.
15. A distributed communications system, comprising:
an advertisement service server and an analysis and prediction platform;
the analysis and prediction platform is used for acquiring click stream data from the advertisement service server; extracting key information contained in the click stream data; determining a click stream type corresponding to the key information; determining a first time window; determining whether the key information needs to be filtered or not according to the click stream type corresponding to the key information and the corresponding relation between the log time corresponding to the key information and the first time window; and if it is determined that the key information does not need to be filtered, generating training data of the advertisement delivery prediction model by using the key information.
16. The communication system according to claim 15, wherein in the aspect of generating the training data of the advertisement delivery prediction model by using the key information, the analysis and prediction platform is specifically configured to invoke a streaming computing topology, find, in an online storage server, attribute information and feature information that match the key information by using the key information as an index, and generate the training data of the advertisement delivery prediction model by using the key information, the attribute information, and the feature information.
CN201310672117.XA 2013-12-10 2013-12-10 The method of on-line analysis clickstream data and relevant apparatus and system Active CN104091276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310672117.XA CN104091276B (en) 2013-12-10 2013-12-10 The method of on-line analysis clickstream data and relevant apparatus and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310672117.XA CN104091276B (en) 2013-12-10 2013-12-10 The method of on-line analysis clickstream data and relevant apparatus and system

Publications (2)

Publication Number Publication Date
CN104091276A CN104091276A (en) 2014-10-08
CN104091276B CN104091276B (en) 2015-08-26

Family

ID=51638991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310672117.XA Active CN104091276B (en) 2013-12-10 2013-12-10 The method of on-line analysis clickstream data and relevant apparatus and system

Country Status (1)

Country Link
CN (1) CN104091276B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1340785A (en) * 2000-09-01 2002-03-20 国际商业机器公司 System and method for visibly analyzing click stream data with parallel coordinate system
US20080183718A1 (en) * 2002-03-07 2008-07-31 Man Jit Singh Clickstream analysis methods and systems
US20110231256A1 (en) * 2009-07-25 2011-09-22 Kindsight, Inc. Automated building of a model for behavioral targeting
CN102254265A (en) * 2010-05-18 2011-11-23 北京首家通信技术有限公司 Rich media internet advertisement content matching and effect evaluation method
CN103178982A (en) * 2011-12-23 2013-06-26 阿里巴巴集团控股有限公司 Method and device for analyzing log

Also Published As

Publication number Publication date
CN104091276B (en) 2015-08-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant