CN107886352B - Advertisement settlement method and system

Info

Publication number: CN107886352B
Application number: CN201711023402.3A
Authority: CN
Inventors: 肖培林; 李东升
Original assignee: Weimeng Chuangke Network Technology China Co Ltd
Current assignee: Weimeng Chuangke Network Technology China Co Ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2021-11-19
Anticipated expiration: 2037-10-27
Also published as: CN107886352A

Abstract

Embodiments of the invention provide a method and a system for advertisement settlement. The method comprises: acquiring a plurality of advertisement log data and determining the identification information of the advertiser corresponding to each advertisement log data; sending each advertisement log data to the aggregation queue of the corresponding advertiser; aggregating the advertisement log data in the aggregation queue of each advertiser in parallel; for the aggregated data of each advertiser's advertisement log data, judging whether the aggregated data meets any one of a plurality of preset pushing conditions and, if so, sending the aggregated data to the cache queue corresponding to the current advertiser; and, in each cache queue, sending each aggregated data to the statistical coroutine corresponding to that cache queue, which writes the aggregated data into a database. The invention greatly improves the write efficiency of the database, so that a large amount of advertisement log data can be processed in a timely and efficient manner.

Description

Advertisement settlement method and system

Technical Field

The invention relates to the technical field of internet advertisement settlement, in particular to a method and a system for advertisement settlement.

Background

Existing online advertisement settlement generally calculates the amount an advertiser has spent by counting advertisement exposure or interaction logs; when the amount spent reaches or exceeds the limit set by the advertiser, a notification is sent to take the advertisement offline. Once an advertisement is offline, it generates no further consumption and occupies no bandwidth resources. However, the volume of advertisement logs grows with advertisement exposure. If the log volume rises rapidly within a short time, or the same advertiser delivers a large number of advertisements, the processing limit of the settlement system or the database is reached and the excess logs cannot be processed in time. The offline notification then cannot be sent in time, the advertisements keep being delivered, bandwidth resources are occupied, and the extra delivery brings no advertising revenue.

In an existing method for addressing advertisement over-delivery, the received advertisements to be settled are parsed and first aggregated by advertiser and advertisement plan; the aggregated advertisement data are pushed to the buffer queues whose serial numbers correspond to the advertisers, which reduces the number of logs per advertisement plan; the aggregated advertisement data in the buffer queues are then aggregated a second time, which shortens the queues and speeds up settlement.

In the process of implementing the invention, the inventors found that the prior art has at least the following problems. Although the prior art solves the over-delivery caused by a large volume of logs from a single advertiser within a short time, some defects remain:

1. Capacity expansion of the distributed preprocessing module and the statistical module is inflexible: when the log volume increases sharply, the preprocessing module is the most likely to reach its processing limit. Because the prior art is constrained by its software architecture, front-end log distribution is not flexible enough and the preprocessing module cannot be expanded quickly; once a problem occurs, a large number of logs cannot be processed in time and settlement is delayed;

2. The aggregation algorithm is not intelligent enough: the prior art aggregates a buffer queue with a fixed speed factor, so the aggregation granularity is raised, and processing thereby accelerated, only when a large backlog has built up in the queue; when the backlog is small, this method cannot clear it in time.

Disclosure of Invention

The embodiment of the invention provides a method and a system for advertisement settlement, which can process a large amount of advertisement log data timely and efficiently.

In one aspect, an embodiment of the present invention provides a method for advertisement settlement, including:

acquiring a plurality of advertisement log data, and determining the identification information of an advertiser corresponding to each advertisement log data;

according to the determined identification information of the advertiser corresponding to each advertisement log data, sending each advertisement log data to the aggregation queue of the corresponding advertiser;

aggregating the advertisement log data in the aggregation queue of each advertiser in parallel to obtain the aggregation data of the advertisement log data of each advertiser;

aiming at the aggregate data of the advertisement log data of each advertiser, the following steps are respectively executed: judging whether the aggregated data of the current advertiser advertisement log data meets any one of a plurality of preset pushing conditions or not according to the aggregated data of the current advertiser advertisement log data and the current time, and sending the aggregated data of the current advertiser advertisement log data to a cache queue corresponding to the current advertiser when any one preset pushing condition is met;

and in each cache queue, sending each aggregated data to a statistical coroutine corresponding to the current cache queue, and writing each aggregated data into a database through the statistical coroutine.

In another aspect, an embodiment of the present invention provides a system for advertisement settlement, including:

the access submodule is used for acquiring a plurality of advertisement log data and determining the identification information of the advertiser corresponding to each advertisement log data;

the preprocessing submodule is used for sending each piece of advertisement log data to the aggregation queue of the corresponding advertiser according to the determined identification information of the advertiser corresponding to each piece of advertisement log data;

the aggregation submodule is used for aggregating the advertisement log data in the aggregation queue of each advertiser in parallel to obtain the aggregation data of the advertisement log data of each advertiser;

the judgment submodule is used for executing the following steps for the aggregated data of the advertisement log data of each advertiser: judging whether the aggregated data of the current advertiser advertisement log data meets any one of a plurality of preset pushing conditions according to the aggregated data of the current advertiser advertisement log data and the current time, and sending the aggregated data of the current advertiser advertisement log data to a cache queue corresponding to the current advertiser when any one preset pushing condition is met;

and the statistical submodule is used for sending, in each cache queue, each aggregated data to a statistical coroutine corresponding to the current cache queue and writing each aggregated data into a database through the statistical coroutine.

The technical scheme has the following beneficial effects: the advertisement log data of the same advertiser are guaranteed to be sent to the same aggregation queue and processed in the same aggregation coroutine, which avoids the loss of efficiency that occurs when the log data of one advertiser are scattered across different aggregation queues and greatly improves the processing efficiency for each advertiser's advertisement log data; by predefining different pushing conditions, the aggregated advertisement log data are sent quickly to the corresponding statistical queues, which is an important precondition for processing a large amount of advertisement log data in time; meanwhile, the advertiser's advertisement log data do not need to be locked while being written into the database, which greatly improves the write efficiency of the database and, in turn, the ability to process a large amount of advertisement log data in time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow diagram of a method for advertisement settlement in one embodiment of the present invention;

FIG. 2 is a schematic diagram of a system for advertisement settlement according to another embodiment of the present invention;

FIG. 3 is a diagram illustrating the structure of queue capacity in a preferred embodiment of the present invention;

FIG. 4 is a diagram illustrating an image of an aggregation coefficient function according to a preferred embodiment of the present invention;

FIG. 5 is a system architecture diagram of advertisement settlement in a preferred embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, which is a flowchart of a method for advertisement settlement in an embodiment of the present invention, the method includes:

101. acquiring a plurality of advertisement log data, and determining the identification information of an advertiser corresponding to each advertisement log data;

102. according to the determined identification information of the advertiser corresponding to each advertisement log data, sending each advertisement log data to the aggregation queue of the corresponding advertiser;

103. aggregating the advertisement log data in the aggregation queue of each advertiser in parallel to obtain the aggregation data of the advertisement log data of each advertiser;

104. aiming at the aggregate data of the advertisement log data of each advertiser, the following steps are respectively executed: judging whether the aggregated data of the current advertiser advertisement log data meets any one of a plurality of preset pushing conditions or not according to the aggregated data of the current advertiser advertisement log data and the current time, and sending the aggregated data of the current advertiser advertisement log data to a cache queue corresponding to the current advertiser when any one preset pushing condition is met;

105. and in each cache queue, sending each aggregated data to a statistical coroutine corresponding to the current cache queue, and writing each aggregated data into a database through the statistical coroutine.

Optionally, the method further comprises:

acquiring the number of preset preprocessing routines, and creating a plurality of preprocessing routines matched with the number of the preprocessing routines;

acquiring the number of preset aggregation queues, creating a plurality of aggregation queues matched with the number of the aggregation queues and a plurality of aggregation coroutines in one-to-one correspondence with the aggregation queues, and determining an aggregation queue serial number uniquely corresponding to each aggregation queue;

acquiring the number of preset statistical queues, creating a plurality of statistical queues matched with the number of the statistical queues and a plurality of statistical coroutines corresponding to the statistical queues one by one, and determining a unique statistical queue serial number corresponding to each statistical queue;

the acquiring of the plurality of advertisement log data and the determining of the identification information of the advertiser corresponding to each advertisement log data includes:

acquiring a plurality of advertisement log data from each partition in a distributed publish-subscribe message system Kafka through the plurality of preprocessing routines, and respectively analyzing the acquired advertisement log data;

according to the analyzed advertisement log data, determining the identification information of the advertiser corresponding to the advertisement log data;

the analyzed advertisement log data comprise identification information of an advertisement plan, identification information of an advertiser and an advertisement consumption value corresponding to the advertisement log data;

after determining the identification information of the advertiser corresponding to each advertisement log data according to each analyzed advertisement log data, the method further includes:

and according to the determined identification information of the advertiser corresponding to each advertisement log data, performing data processing on the identification information of each advertiser in each partition in Kafka by a preset data processing algorithm.

Preferably, the sending, according to the identification information of the advertiser corresponding to each determined advertisement log data, each advertisement log data to the aggregation queue of the advertiser corresponding to each determined advertisement log data includes:

for each piece of advertisement log data, the following operations are respectively executed:

calculating a remainder obtained by dividing the identification information of the advertiser by the number of the aggregation queues according to the identification information of the advertiser corresponding to the determined current advertisement log data to obtain a first remainder value;

determining an aggregation queue sequence number matched with the first remainder value;

and sending the current advertisement log data to the aggregation queue corresponding to the determined aggregation queue serial number.
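The routing rule above can be illustrated with a minimal Python sketch. The function names `aggregation_queue_index` and `route`, the dictionary layout of a parsed log record, the zero-based queue numbering, and the use of numeric advertiser identifiers are all assumptions made for illustration; the text only specifies that the remainder of the advertiser identification divided by the number of aggregation queues selects the queue.

```python
def aggregation_queue_index(advertiser_id: int, num_queues: int) -> int:
    """Return the (zero-based) aggregation queue index for an advertiser.

    Hypothetical helper: the first remainder value is used directly as
    the queue index.
    """
    if num_queues <= 0:
        raise ValueError("num_queues must be positive")
    return advertiser_id % num_queues


def route(logs, num_queues):
    """Distribute parsed log records into per-queue lists by advertiser."""
    queues = [[] for _ in range(num_queues)]
    for log in logs:
        index = aggregation_queue_index(log["advertiser_id"], num_queues)
        queues[index].append(log)
    return queues
```

Because the index depends only on the advertiser identifier, every log of one advertiser lands in the same queue, which is the property the later aggregation step relies on.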

Preferably, the aggregating the advertisement log data in the aggregation queue of each advertiser in parallel to obtain the aggregated data of the advertisement log data of each advertiser includes:

according to the determined identification information of the advertisers and the identification information of the advertisement plans corresponding to the advertisement log data, aggregating the advertisement log data of each advertisement plan of each advertiser in parallel to obtain the aggregated data of each advertisement plan of each advertiser; wherein, the aggregate data of all the advertisement plans of any advertiser form the aggregate data of the advertisement log data of the advertiser;

wherein, after aggregating the advertisement log data of each advertisement plan of each advertiser in parallel to obtain the aggregated data of each advertisement plan of each advertiser, the method further comprises:

combining the advertisement consumption values of each advertisement plan of the same advertiser according to the aggregation data of each advertisement plan of each advertiser to obtain the aggregation consumption value of each advertisement plan of each advertiser;
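As a rough illustration of this per-plan aggregation, the following Python sketch sums the consumption values of all log records that share an advertiser and an advertisement plan. The record keys `advertiser_id`, `plan_id` and `cost`, and the function name `aggregate`, are hypothetical; the text does not define a record layout.

```python
from collections import defaultdict


def aggregate(logs):
    """Sum consumption values per (advertiser_id, plan_id).

    Each log is assumed to be a dict with the hypothetical keys
    'advertiser_id', 'plan_id' and 'cost'.
    """
    totals = defaultdict(lambda: {"count": 0, "cost": 0})
    for log in logs:
        key = (log["advertiser_id"], log["plan_id"])
        totals[key]["count"] += 1            # number of logs merged
        totals[key]["cost"] += log["cost"]   # aggregate consumption value
    return dict(totals)
```

The sketch also keeps a count of merged records, which is one plausible input for a count-based pushing condition.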

for the aggregated data of each advertisement plan of each advertiser, the following steps are performed:

acquiring current time, and judging whether any one of a plurality of preset pushing conditions is met according to the aggregation consumption value of the current advertisement plan of the current advertiser and the current time;

when any preset pushing condition is met, calculating a remainder obtained by dividing the identification information of the advertiser by the number of the statistical queues according to the identification information of the current advertiser to obtain a second remainder value;

determining a cache queue serial number of the current aggregation data matched with the second remainder value;

and sending the aggregated data of the current advertisement plan of the current advertiser to a cache queue corresponding to the cache queue serial number.

Preferably, the determining whether the aggregated data of the current advertiser advertisement log data satisfies any one of a predetermined plurality of push conditions includes:

acquiring a preset aggregation number threshold and an aggregation coefficient corresponding to aggregation data of a current advertisement plan of a current advertiser, determining a first pushing threshold according to the aggregation number threshold and the aggregation coefficient, judging whether an aggregation consumption value of the current advertisement plan of the current advertiser reaches the first pushing threshold, and if so, meeting a preset first pushing condition;

acquiring a preset aggregation consumption threshold, determining a second pushing threshold according to the aggregation consumption threshold and the aggregation coefficient, judging whether the aggregation consumption value of the current advertisement plan of the current advertiser reaches the second pushing threshold, and if so, meeting a preset second pushing condition;

acquiring a preset first difference threshold, determining first pushing time for pushing the advertisement log data of the current advertisement plan of the current advertiser for the latest time from the current time, calculating a first time difference between the current time and the first pushing time, and if the first time difference is greater than the first difference threshold, meeting a preset third pushing condition;

and acquiring a preset second difference threshold, determining second pushing time for pushing the advertisement log data of the current advertisement plan of the current advertiser to a cache queue at the latest time from the current time, calculating a second time difference between the current time and the second pushing time, and if the second time difference is greater than the second difference threshold, meeting a preset fourth pushing condition.
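The four pushing conditions can be sketched in Python as follows. Several points are assumptions and not stated in the text: the class and field names are hypothetical; each pushing threshold is taken to be the product of the preset threshold and the aggregation coefficient; and the first condition is applied to the number of aggregated log records, whereas the translated text compares the aggregate consumption value in both the first and second conditions.

```python
from dataclasses import dataclass


@dataclass
class PlanAggregate:
    """Aggregated state of one advertisement plan (hypothetical layout)."""
    count: int                   # number of log records aggregated so far
    cost: float                  # aggregate consumption value
    last_push_time: float        # most recent push of this plan's log data
    last_cache_push_time: float  # most recent push to the cache queue


@dataclass
class PushConfig:
    """Preset thresholds; the names are assumptions for illustration."""
    count_threshold: int          # aggregation number threshold
    cost_threshold: float         # aggregation consumption threshold
    first_diff_threshold: float   # seconds
    second_diff_threshold: float  # seconds


def should_push(agg, cfg, coefficient, now):
    """Return True if any one of the four pushing conditions holds."""
    first_push_threshold = cfg.count_threshold * coefficient   # assumed: product
    second_push_threshold = cfg.cost_threshold * coefficient   # assumed: product
    return (
        agg.count >= first_push_threshold
        or agg.cost >= second_push_threshold
        or now - agg.last_push_time > cfg.first_diff_threshold
        or now - agg.last_cache_push_time > cfg.second_diff_threshold
    )
```

The two time-based conditions act as a safety net: even a plan with very little traffic is eventually pushed, so its consumption cannot sit in memory indefinitely.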

Preferably, the obtaining of the aggregation coefficient corresponding to the aggregation data of the current advertisement plan of the current advertiser includes:

determining a cache queue serial number of the aggregated data of the current advertisement plan of the current advertiser according to the aggregated data of the current advertisement plan of the current advertiser, and determining a capacity proportion of elements in a cache queue corresponding to the cache queue serial number to the capacity of the cache queue;

and determining an aggregation coefficient of the aggregation data of the current advertising plan of the current advertiser according to the capacity proportion based on a preset calculation rule.
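The passage above does not spell out the preset calculation rule, so the following Python sketch uses a purely hypothetical one: a linear interpolation between a lower and an upper coefficient as the cache queue fills up. The function names and the bounds `low=0.5` and `high=2.0` are illustrative assumptions, not values from the text.

```python
def capacity_ratio(queue_length: int, queue_capacity: int) -> float:
    """Proportion of the cache queue capacity occupied by its elements."""
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be positive")
    return min(max(queue_length / queue_capacity, 0.0), 1.0)


def aggregation_coefficient(ratio: float, low: float = 0.5, high: float = 2.0) -> float:
    """Map a capacity proportion in [0, 1] to an aggregation coefficient.

    Hypothetical rule: a fuller cache queue yields a larger coefficient
    (coarser aggregation), an emptier queue a smaller one (earlier pushes).
    """
    return low + (high - low) * ratio
```

Under this assumed rule, a nearly empty cache queue lowers the pushing thresholds so that small backlogs are pushed promptly, while a nearly full queue raises them so that each push carries more aggregated data.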

Fig. 2 is a schematic structural diagram of a system for advertisement settlement in an embodiment of the present invention, including:

the access sub-module 21 is configured to obtain a plurality of advertisement log data, and determine identification information of an advertiser corresponding to each advertisement log data;

the preprocessing submodule 22 is configured to send each piece of advertisement log data to the aggregation queue of the corresponding advertiser according to the determined identification information of the advertiser corresponding to each piece of advertisement log data;

the aggregation submodule 23 is configured to aggregate the advertisement log data in the aggregation queue of each advertiser in parallel to obtain aggregated data of the advertisement log data of each advertiser;

the judgment submodule 24 is configured to perform the following steps for the aggregated data of the advertisement log data of each advertiser: judging whether the aggregated data of the current advertiser advertisement log data meets any one of a plurality of preset pushing conditions according to the aggregated data of the current advertiser advertisement log data and the current time, and sending the aggregated data of the current advertiser advertisement log data to a cache queue corresponding to the current advertiser when any one preset pushing condition is met;

and the statistical submodule 25 is configured to send, in each cache queue, each aggregated data to a statistical coroutine corresponding to the current cache queue, and write each aggregated data into the database through the statistical coroutine.

Optionally, the system further comprises:

the first acquisition and creation submodule is used for acquiring the number of preset preprocessing routines and creating a plurality of preprocessing routines matched with the number of the preprocessing routines;

the second obtaining and creating submodule is used for obtaining the preset number of aggregation queues, creating a plurality of aggregation queues matched with the number of the aggregation queues and a plurality of aggregation coroutines corresponding to the aggregation queues one by one, and determining an aggregation queue serial number uniquely corresponding to each aggregation queue;

the third obtaining and creating submodule is used for obtaining the number of preset statistical queues, creating a plurality of statistical queues matched with the number of the statistical queues and a plurality of statistical coroutines corresponding to the statistical queues one by one, and determining a unique statistical queue serial number corresponding to each statistical queue;

wherein, the access submodule comprises:

the acquisition and analysis unit is used for acquiring a plurality of advertisement log data from each partition in the distributed publish-subscribe message system Kafka through the plurality of preprocessing routines and analyzing the acquired advertisement log data respectively;

the first determining unit is used for determining the identification information of the advertiser corresponding to each advertisement log data according to each analyzed advertisement log data;

wherein the first determining unit is further used for

Preferably, the pre-processing submodule is particularly for

Preferably, the aggregation sub-module is specifically configured to aggregate, in parallel, the advertisement log data of each advertisement plan of each advertiser according to the identification information of the advertiser and the identification information of the advertisement plan corresponding to each determined advertisement log data, so as to obtain aggregated data of each advertisement plan of each advertiser; wherein, the aggregate data of all the advertisement plans of any advertiser form the aggregate data of the advertisement log data of the advertiser;

wherein the aggregation submodule is also used for

the judgment sub-module is specifically configured to perform the following steps for the aggregated data of each advertisement plan of each advertiser:

Preferably, the judgment sub-module includes:

the first judging unit is used for acquiring a preset aggregation number threshold and an aggregation coefficient corresponding to the aggregation data of the current advertisement plan of the current advertiser, determining a first pushing threshold according to the aggregation number threshold and the aggregation coefficient, judging whether the aggregation consumption value of the current advertisement plan of the current advertiser reaches the first pushing threshold, and if so, meeting a preset first pushing condition;

a second judging unit, configured to obtain a preset aggregate consumption threshold, determine a second pushing threshold according to the aggregate consumption threshold and the aggregation coefficient, and judge whether an aggregate consumption value of a current advertisement plan of a current advertiser reaches the second pushing threshold, and if so, meet a predetermined second pushing condition;

a third judging unit, configured to obtain a preset first difference threshold, determine a first pushing time for pushing advertisement log data of a current advertisement plan of a current advertiser for the last time from the current time, calculate a first time difference between the current time and the first pushing time, and if the first time difference is greater than the first difference threshold, satisfy a predetermined third pushing condition;

and the fourth judging unit is used for acquiring a preset second difference threshold, determining a second pushing time for pushing the advertisement log data of the current advertisement plan of the current advertiser to the cache queue at the latest time from the current time, calculating a second time difference between the current time and the second pushing time, and if the second time difference is greater than the second difference threshold, meeting a preset fourth pushing condition.

Preferably, the determining sub-module further includes:

the second determining unit is used for determining a cache queue serial number of the aggregated data of the current advertisement plan of the current advertiser according to the aggregated data of the current advertisement plan of the current advertiser, and determining a capacity proportion of elements in a cache queue corresponding to the cache queue serial number to the capacity of the cache queue;

and the calculation unit is used for determining the aggregation coefficient of the aggregation data of the current advertisement plan of the current advertiser according to the capacity proportion based on a preset calculation rule.

The technical scheme of the embodiment of the invention has the following beneficial effects: the advertisement log data of the same advertiser are guaranteed to be sent to the same aggregation queue and processed in the same aggregation coroutine, which avoids the loss of efficiency that occurs when the log data of one advertiser are scattered across different aggregation queues and greatly improves the processing efficiency for each advertiser's advertisement log data; by predefining different pushing conditions, the aggregated advertisement log data are sent quickly to the corresponding statistical queues, which is an important precondition for processing a large amount of advertisement log data in time; meanwhile, the advertiser's advertisement log data do not need to be locked while being written into the database, which greatly improves the write efficiency of the database and, in turn, the ability to process a large amount of advertisement log data in time.

The above technical solutions of the embodiments of the present invention are described in detail below with reference to application examples:

the application example of the invention aims to process a large amount of advertisement log data timely and efficiently.

As shown in fig. 1, for example, in online advertisement settlement system A, the advertisement log files are all stored in the distributed publish-subscribe message system Kafka. System A extracts data from each partition of Kafka and determines the identification information of the advertiser corresponding to each advertisement log data; for example, the identification information of the advertiser corresponding to advertisement log data 1 is 1001, and that of the advertiser corresponding to advertisement log data 2 is 1002. Advertisement log data 1 is then sent to the aggregation queue corresponding to advertiser 1001, for example aggregation queue 1, and advertisement log data 2 is sent to the aggregation queue corresponding to advertiser 1002, for example aggregation queue 2. The advertisement log data of advertiser 1001 and advertiser 1002 are aggregated in parallel in aggregation queue 1 and aggregation queue 2, yielding the aggregated data of the advertisement log data of advertiser 1001 and of advertiser 1002. For the aggregated data of advertiser 1001, the current time is acquired, whether any one of a plurality of preset pushing conditions is met is judged according to the aggregated data and the current time, and when any one preset pushing condition is met the aggregated data is sent to the corresponding cache queue, for example cache queue 1. The same steps are performed for the aggregated data of advertiser 1002, which is sent to its corresponding cache queue, for example cache queue 2, when any one preset pushing condition is met. In cache queue 1, the aggregated data of advertiser 1001 is sent to the statistical coroutine corresponding to cache queue 1, for example statistical coroutine 1, which writes it into the database; in cache queue 2, the aggregated data of advertiser 1002 is sent to the statistical coroutine corresponding to cache queue 2, for example statistical coroutine 2, which writes it into the database.

It should be noted that, as those skilled in the art will appreciate, Kafka (Apache Kafka) is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (web browsing, searching and other user behavior) are a key factor in many social functions on the modern web, and because of throughput requirements these data are usually handled by processing logs and log aggregation. Kafka is a viable solution for systems that, like Hadoop (a distributed system infrastructure developed by the Apache Foundation), handle log data and offline analysis but also require real-time processing. The purpose of Kafka is to unify online and offline message processing through the parallel loading mechanism of Hadoop, and also to provide real-time consumption through a cluster. A coroutine is also called a micro-thread. Like subroutines, coroutines are program components; they are more general and flexible than subroutines but are less widely used in practice. Coroutines originated in the Simula and Modula-2 languages and are also supported by other languages.

In a preferred embodiment, the method further comprises: acquiring a preset number of preprocessing coroutines, and creating a plurality of preprocessing coroutines matching that number; acquiring a preset number of aggregation queues, creating a plurality of aggregation queues matching that number and a plurality of aggregation coroutines in one-to-one correspondence with the aggregation queues, and determining an aggregation queue serial number uniquely corresponding to each aggregation queue; and acquiring a preset number of statistical queues, creating a plurality of statistical queues matching that number and a plurality of statistical coroutines in one-to-one correspondence with the statistical queues, and determining a statistical queue serial number uniquely corresponding to each statistical queue.

For example, in the online advertisement settlement system A, a preset number of preprocessing coroutines is acquired, for example 2, and 2 preprocessing coroutines matching that number are created; a preset number of aggregation queues is acquired, for example 3, and 3 aggregation queues matching that number and 3 aggregation coroutines in one-to-one correspondence with the 3 aggregation queues are created, wherein the one-to-one correspondence between the aggregation queues and the aggregation coroutines can be realized through a hash function, and the aggregation queue serial number uniquely corresponding to each aggregation queue is determined, for example 1, 2 and 3; a preset number of statistical queues is acquired, for example 4, and 4 statistical queues matching that number and 4 statistical coroutines in one-to-one correspondence with the 4 statistical queues are created, wherein the one-to-one correspondence between the statistical queues and the statistical coroutines can be realized through a hash function, and the statistical queue serial number uniquely corresponding to each statistical queue is determined, for example 1, 2, 3 and 4.

Through this embodiment, because each module is composed of coroutines, the number of coroutines can be adjusted freely and flexibly according to the business volume, which safeguards data processing efficiency and, when the volume of advertisement log data increases sharply, provides a necessary precondition for processing a large amount of advertisement log data in time.
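The creation of numbered queues with one coroutine per queue can be illustrated with a minimal sketch. It uses Python's asyncio as a stand-in for the coroutines of the embodiment; the names `worker` and `run_stage`, and the use of `None` as a stop signal, are assumptions made for illustration and do not come from the embodiment.

```python
import asyncio


async def worker(serial: int, queue: asyncio.Queue, handled: list) -> None:
    """Drain one queue; a None item is a stop signal (an assumption of this sketch)."""
    while True:
        item = await queue.get()
        if item is None:
            break
        handled.append((serial, item))


async def run_stage(num_queues: int, items: list) -> list:
    """Create num_queues queues numbered from 1, with exactly one coroutine per queue."""
    handled = []
    queues = {serial: asyncio.Queue() for serial in range(1, num_queues + 1)}
    tasks = [asyncio.create_task(worker(s, q, handled)) for s, q in queues.items()]
    # Each item is addressed to a queue by its serial number.
    for serial, item in items:
        await queues[serial].put(item)
    # Ask every coroutine to stop once its queue has been drained.
    for q in queues.values():
        await q.put(None)
    await asyncio.gather(*tasks)
    return handled
```

Changing `num_queues` is the only adjustment needed to scale a stage up or down, which is the flexibility the embodiment attributes to a coroutine-based design.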

The acquiring of the plurality of advertisement log data and the determining of the identification information of the advertiser corresponding to each advertisement log data includes: acquiring a plurality of advertisement log data from each partition in the distributed publish-subscribe message system Kafka through the plurality of preprocessing coroutines, and parsing the acquired advertisement log data respectively; and determining the identification information of the advertiser corresponding to each advertisement log data according to each parsed advertisement log data.

The parsed advertisement log data comprise the identification information of an advertisement plan, the identification information of an advertiser and an advertisement consumption value corresponding to the advertisement log data.

For example, in the online advertisement settlement system A, a plurality of advertisement log data are acquired from each partition in the distributed publish-subscribe message system Kafka through the 2 preprocessing coroutines that have been created, and the acquired advertisement log data are parsed respectively; according to the parsed advertisement log data, the identification information of the advertiser corresponding to each advertisement log data is determined, for example, the identification information of the advertiser corresponding to the advertisement log data 1 is 1001, and the identification information of the advertiser corresponding to the advertisement log data 2 is 1002.
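The parsing step can be sketched as follows. The embodiment does not state the log format, so the JSON encoding and the field names `plan_id`, `advertiser_id` and `cost` are assumptions made purely for illustration; only the three parsed values themselves are taken from the description.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AdLog:
    plan_id: str        # identification information of the advertisement plan
    advertiser_id: int  # identification information of the advertiser
    cost: float         # advertisement consumption value


def parse_ad_log(raw: str) -> AdLog:
    """Parse one raw log line (assumed to be JSON) into its three fields."""
    record = json.loads(raw)
    return AdLog(
        plan_id=str(record["plan_id"]),
        advertiser_id=int(record["advertiser_id"]),
        cost=float(record["cost"]),
    )
```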

After determining the identification information of the advertiser corresponding to each advertisement log data according to each parsed advertisement log data, the method further includes: according to the parsed identification information of the advertiser corresponding to each advertisement log data, performing data processing on the identification information of each advertiser in each partition in Kafka through a preset data processing algorithm.

For example, in the online advertisement settlement system A, according to the identification information of the advertiser corresponding to each parsed advertisement log data, the stored advertisement log data are hashed by advertiser identification information across the partitions of Kafka through a predetermined data processing algorithm, such as a hash algorithm, so that each preprocessing coroutine subsequently obtains, from at least one partition, the advertisement log data of each advertiser stored in that partition.

Through this embodiment, the advertisement log data of different advertisers are hashed so that the logs of the same advertiser exist only in the same Kafka partition, which prevents the same advertisement log data from being processed repeatedly by several preprocessing coroutines, and the advertisement log data of one Kafka partition are processed by only one preprocessing coroutine; because different preprocessing coroutines read different Kafka partitions, the data are separated along the advertiser dimension, which provides a necessary precondition for not locking on the advertiser in subsequent database operations.
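The effect of hashing by advertiser can be illustrated with a small sketch. It does not call Kafka and is not meant to reproduce Kafka's own partitioner; `partition_for` is a hypothetical helper, and CRC32 is chosen here only because it is a deterministic hash available in the standard library.

```python
import zlib


def partition_for(advertiser_id: int, num_partitions: int) -> int:
    """Map an advertiser to one partition; the same advertiser always gets the same one."""
    key = str(advertiser_id).encode("utf-8")
    return zlib.crc32(key) % num_partitions
```

Because the result depends only on the advertiser identification information, all logs of one advertiser land in a single partition, and a coroutine that owns that partition is the only reader of that advertiser's logs.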

In a preferred embodiment, the step 102 of sending each piece of advertisement log data to the aggregation queue of the corresponding advertiser according to the determined identification information of the advertiser corresponding to each piece of advertisement log data includes: for each piece of advertisement log data, performing the following operations respectively: according to the determined identification information of the advertiser corresponding to the current advertisement log data, calculating the remainder obtained by dividing the identification information of the advertiser by the number of the aggregation queues to obtain a first remainder value; determining the aggregation queue serial number matched with the first remainder value; and sending the current advertisement log data to the aggregation queue corresponding to the determined aggregation queue serial number.

For example, in the online advertisement settlement system A, the following operations are performed for the advertisement log data 1: according to the identification information 1001 of the advertiser corresponding to the current advertisement log data 1, the remainder of dividing the identification information 1001 by the number of aggregation queues, 3, is calculated to obtain a first remainder value of 2; the aggregation queue serial number matched with the first remainder value is determined to be 2; and the parsed current advertisement log data 1 is sent to the aggregation queue 2. For the advertisement log data 2, the following operations are performed: according to the identification information 1002 of the advertiser corresponding to the current advertisement log data 2, the remainder of dividing the identification information 1002 by the number of aggregation queues, 3, is calculated to obtain a first remainder value of 0; the aggregation queue serial number matched with the first remainder value is determined to be 3; and the parsed current advertisement log data 2 is sent to the aggregation queue 3.
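The remainder-based routing in this example can be written as a short sketch. The function name `queue_serial` is hypothetical, and the rule that a remainder of 0 matches the last queue is inferred from the example of advertiser 1002 rather than stated as a general rule in the embodiment.

```python
def queue_serial(advertiser_id: int, num_queues: int) -> int:
    """Return the 1-based serial number of the queue for this advertiser."""
    remainder = advertiser_id % num_queues
    # Assumption inferred from the example: remainder 0 maps to the last queue.
    return remainder if remainder != 0 else num_queues
```

With 3 aggregation queues, `queue_serial(1001, 3)` gives 2 and `queue_serial(1002, 3)` gives 3, matching the serial numbers in the example.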

Through this embodiment, the advertisement log data of the same advertiser are sent to the same aggregation queue and processed in the same aggregation coroutine, which avoids the loss of efficiency that occurs when the log data of the same advertiser are dispersed across different aggregation queues, greatly improves the efficiency of processing the advertisement log data of each advertiser, and further provides an important precondition for processing a large amount of advertisement log data in time.

In a preferred embodiment, the aggregating of the advertisement log data in the aggregation queue of each advertiser in parallel in step 103 to obtain the aggregated data of the advertisement log data of each advertiser includes: according to the determined identification information of the advertisers and the identification information of the advertisement plans corresponding to the advertisement log data, aggregating the advertisement log data of each advertisement plan of each advertiser in parallel to obtain the aggregated data of each advertisement plan of each advertiser; wherein the aggregated data of all the advertisement plans of any advertiser form the aggregated data of the advertisement log data of that advertiser.

Wherein, after aggregating the advertisement log data of each advertisement plan of each advertiser in parallel to obtain the aggregated data of each advertisement plan of each advertiser, the method further comprises: and combining the advertisement consumption values of each advertisement plan of the same advertiser according to the aggregation data of each advertisement plan of each advertiser to obtain the aggregation consumption value of each advertisement plan of each advertiser.
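The per-plan aggregation and the merging of consumption values can be sketched as below. The tuple layout `(advertiser_id, plan_id, cost)` and the `count` field are assumptions made for illustration; the embodiment only specifies that logs are grouped by advertiser and advertisement plan and that their consumption values are combined.

```python
from collections import defaultdict


def aggregate(logs):
    """Merge logs that share (advertiser_id, plan_id) into one aggregated record."""
    totals = defaultdict(lambda: {"count": 0, "cost": 0.0})
    for advertiser_id, plan_id, cost in logs:
        entry = totals[(advertiser_id, plan_id)]
        entry["count"] += 1    # number of log entries merged so far
        entry["cost"] += cost  # aggregated consumption value of the plan
    return dict(totals)
```

Many log entries for one plan collapse into a single record, so the downstream stage receives far fewer items than the number of raw logs.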

Wherein, for the aggregated data of each advertisement plan of each advertiser, the following steps are performed respectively: acquiring the current time, and judging whether any one of a plurality of preset pushing conditions is met according to the aggregation consumption value of the current advertisement plan of the current advertiser and the current time; when any preset pushing condition is met, calculating, according to the identification information of the current advertiser, the remainder obtained by dividing the identification information of the advertiser by the number of the statistical queues to obtain a second remainder value; determining the cache queue serial number of the current aggregation data matched with the second remainder value; and sending the aggregated data of the current advertisement plan of the current advertiser to the cache queue corresponding to the cache queue serial number.

For example, in the online advertisement settlement system A, the obtained advertisement log data includes advertisement log data 1 and advertisement log data 2, where the advertisement log data 1 includes log data of an advertisement plan a and log data of an advertisement plan b of the advertiser with identification information 1001, and the advertisement log data 2 includes log data of an advertisement plan c of the advertiser with identification information 1002. For the aggregated data of the advertisement plan a of the advertiser 1001, the following steps are performed: acquiring the current time, and judging whether any one of a plurality of preset pushing conditions is met according to the aggregation consumption value of the current advertisement plan a of the current advertiser 1001 and the current time; when any preset pushing condition is met, calculating, according to the identification information 1001 of the current advertiser, the remainder obtained by dividing the identification information 1001 by the number of statistical queues, 4, to obtain a second remainder value of 1; determining that the cache queue serial number of the current aggregation data matched with the second remainder value is 1; and sending the aggregated data of the current advertisement plan a of the current advertiser 1001 to the cache queue 1. The log data of the advertisement plan b of the advertiser 1001 and the log data of the advertisement plan c of the advertiser 1002 are processed in a similar way to the log data of the advertisement plan a of the advertiser 1001, which is not described again here.

Through this embodiment, the advertisement log data of different advertisers are further dispersed, the limiting case being that each statistical coroutine processes the advertisement log data of only one advertiser; when the advertisement log data of an advertiser is written into the database by a given statistical coroutine, it is guaranteed to be written only by that statistical coroutine, so the advertisement log data of the advertiser does not need to be locked while being written into the database, and the writing efficiency of the database is greatly improved.

In a preferred embodiment, the determining whether the aggregated data of the current advertiser's ad log data satisfies any one of a predetermined plurality of push conditions in step 104 includes:

acquiring a preset aggregation number threshold and an aggregation coefficient corresponding to the aggregation data of the current advertisement plan of the current advertiser, determining a first pushing threshold according to the aggregation number threshold and the aggregation coefficient, judging whether the aggregation consumption value of the current advertisement plan of the current advertiser reaches the first pushing threshold, and if so, determining that a preset first pushing condition is met; the first pushing threshold is determined by calculation, for example as the product of the aggregation number threshold and the aggregation coefficient.

acquiring a preset aggregation consumption threshold, determining a second pushing threshold according to the aggregation consumption threshold and the aggregation coefficient, judging whether the aggregation consumption value of the current advertisement plan of the current advertiser reaches the second pushing threshold, and if so, determining that a preset second pushing condition is met; the second pushing threshold is determined by calculation, for example as the product of the aggregation consumption threshold and the aggregation coefficient.
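The two pushing conditions above can be sketched as follows, taking each pushing threshold as the product given in the examples. The function names and parameter names are hypothetical, the aggregation coefficient is passed in as a plain number, and the aggregated value is compared with both thresholds exactly as the two conditions are worded above.

```python
def push_threshold(base_threshold: float, coefficient: float) -> float:
    """A pushing threshold is the configured threshold scaled by the aggregation coefficient."""
    return base_threshold * coefficient


def meets_any_push_condition(aggregated_value: float,
                             number_threshold: float,
                             consumption_threshold: float,
                             coefficient: float) -> bool:
    """Return True when the first or the second pushing condition is met."""
    first = push_threshold(number_threshold, coefficient)        # first pushing threshold
    second = push_threshold(consumption_threshold, coefficient)  # second pushing threshold
    return aggregated_value >= first or aggregated_value >= second
```

A larger coefficient raises both thresholds at once, so more log data is merged before a record is pushed to its cache queue.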

In a preferred embodiment, the obtaining an aggregation coefficient corresponding to aggregation data of a current advertisement plan of a current advertiser includes: determining a cache queue serial number of the aggregated data of the current advertisement plan of the current advertiser according to the aggregated data of the current advertisement plan of the current advertiser, and determining a capacity proportion of elements in a cache queue corresponding to the cache queue serial number to the capacity of the cache queue; and determining an aggregation coefficient of the aggregation data of the current advertising plan of the current advertiser according to the capacity proportion based on a preset calculation rule.

For example, in the online advertisement settlement system A, the aggregation coefficient corresponding to the aggregation data of the current advertisement plan of the current advertiser is calculated according to the "water level" of the corresponding statistical queue, so as to adjust the aggregation threshold and increase the aggregation granularity. The water level refers to the ratio of the number of elements already in a queue to the queue capacity, with a value range of (0, 1); see fig. 3. Specifically, according to the aggregated data of the current advertisement plan of the current advertiser, the cache queue serial number of that aggregated data is determined, and the ratio of the elements in the cache queue corresponding to the cache queue serial number to the capacity of the cache queue is determined, such as 0.2; based on a predetermined calculation rule, the following calculation formula is given:

where level is the queue water level and max is the expansion coefficient (configurable, 200 by default). Then, from the capacity ratio of 0.5, the aggregation coefficient of the aggregated data of the current advertisement plan of the current advertiser may be determined to be 5.

As shown in fig. 4, when the "water level" of the corresponding statistical queue is below a threshold (less than 0.2), the aggregation coefficient is 1 and the aggregation threshold is the default value in the configuration. When the "water level" of the corresponding statistical queue rises (above 0.2), the aggregation coefficient increases sharply and the aggregation threshold is multiplied accordingly, so that more advertisement log data of the same advertisement plan of the same advertiser are aggregated into one piece of data and sent to the corresponding statistical queue. If the "water level" of the corresponding statistical queue continues to rise, the high aggregation state is maintained, but the growth rate of the aggregation coefficient gradually decreases to prevent excessive aggregation from causing too large a settlement delay. As the data sent to the queue gradually decreases, the "water level" of the corresponding statistical queue falls and the aggregation coefficient falls with it, slowly at first and then quickly, so as to prevent the aggregation coefficient from fluctuating back and forth.

According to the embodiment, the aggregation coefficient is dynamically calculated according to the water level of the queue, and the calculation method can be flexibly configured, so that the time delay during processing of a large amount of advertisement log data is controllable, and further, when the amount of the advertisement log data is increased sharply, important precondition guarantee is provided for timely processing of a large amount of advertisement log data.
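The water-level measurement can be sketched with a bounded queue from Python's standard library. The calculation rule that turns a water level into an aggregation coefficient is deliberately left as a caller-supplied function, because this sketch makes no assumption about its form; `water_level` and `aggregation_coefficient` are hypothetical names.

```python
import queue
from typing import Callable


def water_level(q: queue.Queue) -> float:
    """Ratio of the elements currently in the queue to the queue capacity."""
    if q.maxsize <= 0:
        raise ValueError("the queue must be bounded to have a water level")
    return q.qsize() / q.maxsize


def aggregation_coefficient(q: queue.Queue, rule: Callable[[float], float]) -> float:
    """Apply a configurable calculation rule to the current water level."""
    return rule(water_level(q))
```

Keeping the rule as a parameter mirrors the statement that the calculation method can be flexibly configured: the measurement stays fixed while the mapping to a coefficient can be swapped.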

In a specific application scenario, the system architecture diagram of the online advertisement settlement system B is shown in fig. 5, and the online advertisement settlement system B includes a settlement module, wherein the settlement module is composed of an access sub-module, a preprocessing sub-module, an aggregation sub-module, a judgment sub-module, and a statistics sub-module:

1. An access submodule and a preprocessing submodule: the access submodule and the preprocessing submodule jointly comprise M preprocessing coroutines, which are responsible for pulling advertisement log data from Kafka and parsing each advertisement log to obtain data such as the identification information of an advertisement plan, the identification information of an advertiser and an advertisement consumption value; the preprocessing coroutines work in parallel, are independent of each other and are configurable in number, wherein the number of preprocessing coroutines is generally less than or equal to the number of Kafka partitions; the parsed advertisement log data are sent to different aggregation queues in the aggregation submodule according to the identification information of the advertisers processed by the hash algorithm.

2. An aggregation submodule and a judgment submodule: the aggregation submodule and the judgment submodule jointly comprise O aggregation queues and O aggregation coroutines, wherein the aggregation queues correspond to the aggregation coroutines one to one, and the O aggregation queues are responsible for receiving and caching the advertisement log data sent by the preprocessing submodule and combining the advertisement log data of the same advertiser, so as to reduce the total data volume and thereby reduce the pressure on the statistics submodule; the O aggregation coroutines work in parallel, are independent of each other and are configurable in number, wherein the number of aggregation coroutines is generally greater than or equal to the number of preprocessing coroutines jointly included in the access submodule and the preprocessing submodule.

3. A statistics submodule: the statistics submodule comprises P statistical queues and P statistical coroutines, wherein the statistical queues correspond to the statistical coroutines one to one; the P statistical queues are responsible for receiving and caching the advertisement log data sent by the aggregation submodule, for summing the advertisement consumption value data of each advertiser along the advertiser dimension and the advertisement plan dimension, and for sending the summed advertisement log data to the corresponding statistical coroutines, which update it to a database; the statistical coroutines work in parallel, are independent of each other and are configurable in number, wherein the number of statistical coroutines is generally greater than or equal to the number of aggregation coroutines of the aggregation submodule.

The embodiment of the present invention provides a system for advertisement settlement, which can implement the method embodiment provided above, and for specific function implementation, reference is made to the description in the method embodiment, which is not repeated herein.

It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.

In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".

Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The various illustrative logical blocks, or elements, described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.

In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store program code in the form of instructions or data structures and which can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium, and is thus included if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wirelessly, e.g., by infrared, radio, or microwave. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method of advertisement settlement, comprising:

in each cache queue, sending each aggregated data to a statistical coroutine corresponding to the current cache queue, and writing each aggregated data into a database through the statistical coroutine;

the method further comprises the following steps:

2. The method of claim 1, wherein after determining, according to the parsed advertisement log data, identification information of an advertiser corresponding to each advertisement log data, the method further comprises:

3. The method of claim 1, wherein the sending the advertisement log data to the aggregation queues of the advertisers respectively corresponding to the determined identification information of the advertisers respectively corresponding to the advertisement log data comprises:

4. The method of claim 1, wherein the aggregating the advertisement log data in the aggregation queue of each advertiser in parallel to obtain aggregated data of the advertisement log data of each advertiser comprises:

5. The method of claim 4, wherein determining whether aggregated data of current advertiser ad log data satisfies any of a predetermined plurality of push conditions comprises:

6. The method of claim 5, wherein obtaining the aggregation coefficient corresponding to the aggregation data of the current advertisement plan of the current advertiser comprises:

7. A system for advertisement settlement, comprising:

the preprocessing submodule is used for respectively sending each piece of advertisement log data to the aggregation queue of the corresponding advertiser according to the determined identification information of the advertiser corresponding to each piece of advertisement log data;

the judgment submodule is used for respectively executing the following steps aiming at the aggregated data of the advertisement log data of each advertiser: judging whether the aggregated data of the current advertiser advertisement log data meets any one of a plurality of preset pushing conditions or not according to the aggregated data of the current advertiser advertisement log data and the current time, and sending the aggregated data of the current advertiser advertisement log data to a cache queue corresponding to the current advertiser when any one preset pushing condition is met;

the statistics submodule is used for sending, in each cache queue, each aggregated data to the statistical coroutine corresponding to the current cache queue, and writing each aggregated data into a database through the statistical coroutine;

the system further comprises:

wherein, the access submodule comprises:

8. The system of claim 7,

the first determining unit is further configured to perform data processing on the identification information of each advertiser in each partition in Kafka through a predetermined data processing algorithm according to the identification information of the advertiser corresponding to each determined advertisement log data.

9. The system of claim 7, wherein the preprocessing submodule is specifically configured to

10. The system according to claim 7, wherein the aggregation sub-module is specifically configured to aggregate, in parallel, the advertisement log data of each advertisement plan of each advertiser according to the determined identification information of the advertiser and the identification information of the advertisement plan corresponding to each advertisement log data, so as to obtain aggregated data of each advertisement plan of each advertiser; wherein, the aggregate data of all the advertisement plans of any advertiser form the aggregate data of the advertisement log data of the advertiser;

the aggregation sub-module is further configured to combine the advertisement consumption values of each advertisement plan of the same advertiser according to the aggregation data of each advertisement plan of each advertiser to obtain an aggregation consumption value of each advertisement plan of each advertiser;

11. The system of claim 10, wherein the determining sub-module comprises:

12. The system of claim 11, wherein the determining sub-module further comprises: