CN114187038A

CN114187038A - Attribution scene-based data processing method and device

Info

Publication number: CN114187038A
Application number: CN202111474499.6A
Authority: CN
Inventors: 王壮伟; 李涛
Original assignee: Zhengzhou Apas Digital Cloud Information Technology Co ltd
Current assignee: Zhengzhou Apas Digital Cloud Information Technology Co ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2022-03-15

Abstract

The application discloses a data processing method and device based on attribution scenes, which are used for solving the problems of low attribution accuracy and low efficiency of advertisement click data at present. The method comprises the following steps: acquiring click data corresponding to the advertisement data, and generating a data identifier corresponding to the click data; according to the business attribute corresponding to the click data, splitting the click data into attribution dimension data and click information; determining a first target database narrow table corresponding to attribution dimension data; determining a second target database narrow table corresponding to the click information; each database narrow table is created based on different time units; and storing the attribution dimension data and the data identification in a first target database narrow table in an associated manner, and storing the click information and the data identification in a second target database narrow table in an associated manner, so that the advertisement attribution party can acquire the click data through different database narrow tables and perform attribution processing. The data processing method based on the technical scheme can improve attribution accuracy and efficiency.

Description

Attribution scene-based data processing method and device

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus based on attribution scenarios.

Background

With the widespread adoption of the Internet, advertisement delivery has become an indispensable and effective channel for acquiring new users. Attribution is the process of determining exactly which advertisement click led to a given installation or activation during advertisement placement. Attribution faces a series of problems and challenges, including large volumes of click data, highly concurrent click data, system availability, system performance, multiple attribution dimensions, and attribution accuracy, and solving them is a headache for every advertiser.

Currently, the industry generally processes advertisement click data in one of the following ways. First, the advertisement click data is stored in a traditional relational database, which can neither store large amounts of data nor bear heavy concurrency. Second, a newer database that combines OLAP (On-Line Analytical Processing) and OLTP (On-Line Transaction Processing), such as the TiDB database, is adopted. Such a database supports both OLTP and OLAP and can bear concurrent data, but the TiDB database still has a performance bottleneck, has limited capacity for massive concurrency on large data volumes, and cannot store cold data and hot data separately, so cold data and hot data cannot be distinguished during attribution, which lowers attribution accuracy. Third, the advertisement click data is stored using a big data scheme such as an OLAP database. Such a scheme has difficulty supporting effective multi-condition queries over structured data, so the query results are inaccurate and attribution accuracy is low. It can be seen that, with the existing data processing approaches used in this scenario, both attribution accuracy and attribution efficiency are low.

Disclosure of Invention

The embodiment of the application aims to provide a data processing method and device based on attribution scenes, and aims to solve the problems of low attribution accuracy and low efficiency of current advertisement click data.

In order to solve the above technical problem, the embodiment of the present application is implemented as follows:

in one aspect, an embodiment of the present application provides a data processing method based on attribution scenarios, including:

acquiring click data corresponding to advertisement data, and generating a data identifier corresponding to the click data;

according to the business attribute corresponding to the click data, dividing the click data into attribution dimension data and click information; the service attribute comprises user identification information and/or equipment identification information corresponding to the click data; the click information comprises at least one of click content, timestamp information and the device identification information;

determining a first target database narrow table corresponding to attribution dimension data according to the attribution dimension data, the timestamp information, the number of first databases and the number of first database narrow tables included in each first database; determining a second target database narrow table corresponding to the click information according to the data identification, the timestamp information, the number of second databases and the number of second database narrow tables included in each second database; each database narrow table is created based on different time units;

and storing the attribution dimension data and the data identification in the first target database narrow table in an associated manner, and storing the click information and the data identification in the second target database narrow table in an associated manner, so that the advertisement attribution party can acquire the click data through different database narrow tables and perform attribution processing.

In another aspect, an embodiment of the present application provides an attribution scenario-based data processing apparatus, including:

the acquisition and generation module is used for acquiring click data corresponding to the advertisement data and generating a data identifier corresponding to the click data;

the splitting module is used for splitting the click data into the attribution dimension data and the click information according to the service attribute corresponding to the click data; the service attribute comprises user identification information and/or equipment identification information corresponding to the click data; the click information comprises at least one of click content, timestamp information and the device identification information;

a first determining module, configured to determine, according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database, a first target database narrow table corresponding to the attribution dimension data; determining a second target database narrow table corresponding to the click information according to the data identification, the timestamp information, the number of second databases and the number of second database narrow tables included in each second database; each database narrow table is created based on different time units;

and the association storage module is used for storing the attribution dimension data and the data identification into the first target database narrow table in an association manner, and storing the click information and the data identification into the second target database narrow table in an association manner, so that the advertisement attribution party can obtain the click data through different database narrow tables and perform attribution processing.

In yet another aspect, an embodiment of the present application provides an attribution context-based data processing apparatus, including a processor and a memory electrically connected to the processor, where the memory stores a computer program, and the processor is configured to call and execute the computer program from the memory to implement:

In another aspect, an embodiment of the present application provides a storage medium for storing a computer program, where the computer program is executed by a processor to implement the following processes:

By adopting the technical scheme of the embodiment of the application, the click data corresponding to the advertisement data is obtained, the data identification corresponding to the click data is generated, the click data is divided into the attribution dimension data and the click information according to the service attribute corresponding to the click data, so that the first target database narrow table corresponding to the attribution dimension data is determined according to the attribution dimension data, the timestamp information, the number of the first databases and the number of the first database narrow tables included in each first database, and the second target database narrow table corresponding to the click information is determined according to the data identification, the timestamp information, the number of the second databases and the number of the second database narrow tables included in each second database. Each database narrow table is created based on different time units, so that each click data can be stored in the database narrow table corresponding to the corresponding time unit based on the timestamp information of each click data, the effect of separating and storing cold and hot data (namely data in different periods) is realized, and the influence on the whole service and performance of the database in the process of clearing the cold data can be avoided. 
Furthermore, storing the attribution dimension data and the data identification in association in the first target database narrow table, and the click information and the data identification in association in the second target database narrow table, shards the click data across databases and tables. This supports storage of massive click data, bears massive concurrency on large data volumes, stores massive click data efficiently, and avoids failures to store click data. In addition, in the technical scheme, the advertisement attribution party can quickly and accurately acquire the click data through different database narrow tables and perform attribution processing, which improves the accuracy and speed of looking up click data and thereby the accuracy and efficiency of advertisement attribution.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a schematic flow chart diagram of a method for attribution context based data processing according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a method of attribution context based data processing according to an embodiment of the present application;

FIG. 3 is a schematic flow chart diagram of a method of attribution context based data processing according to another embodiment of the present application;

FIG. 4 is a schematic flow chart diagram of a method of attributing requests according to an embodiment of the present application;

FIG. 5 is a schematic block diagram of an attribution scenario-based data processing apparatus according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a hardware structure of an attribution scenario-based data processing apparatus according to an embodiment of the present application.

Detailed Description

The embodiment of the application provides a data processing method and device based on attribution scenes, which are used for solving the problems of low attribution accuracy and low efficiency of advertisement click data at present.

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In one embodiment, the attribution scene-based data processing method provided by the application can be applied to an advertisement attribution scene, wherein the advertisement attribution scene can comprise an attribution scene of commodity purchase, an attribution scene of application installation and activation and the like. The following describes in detail a data processing method based on attribution scenarios, which is provided in an embodiment of the present application, by taking advertisement attribution scenarios as an example.

Fig. 1 is a schematic flow chart of an attribution scenario-based data processing method according to an embodiment of the present application, as shown in fig. 1, the method including:

S102, acquiring click data corresponding to the advertisement data, and generating a data identifier corresponding to the click data.

The click data corresponding to the advertisement data can be acquired from the advertisement putting platform used for displaying the advertisement data. Illustratively, the advertising data may be promotional advertisements for applications, promotional advertisements for goods, and the like.

Optionally, because the snowflake algorithm can generate a unique identifier in a distributed scenario, the data identifier corresponding to the click data may be generated based on the snowflake algorithm, and the data identifier may be characterized by a UUID (Universally Unique Identifier).
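The identifier generation in S102 can be illustrated with a minimal sketch of a snowflake-style generator. The application does not specify a bit layout, so the common 41-bit timestamp / 10-bit worker / 12-bit sequence layout, the custom epoch `EPOCH_MS`, and the class name `SnowflakeGenerator` are all assumptions made here for illustration.

```python
import threading
import time

# Assumed layout (not specified in the application):
# 41 bits of milliseconds since a custom epoch | 10 bits worker id | 12 bits sequence
EPOCH_MS = 1_609_459_200_000  # 2021-01-01T00:00:00Z, a hypothetical custom epoch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical generator of unique, increasing 64-bit identifiers."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms < self._last_ms:
                # Clock is behind the last issued timestamp: reuse that timestamp.
                now_ms = self._last_ms
            if now_ms == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: advance by one.
                    now_ms += 1
            else:
                self._sequence = 0
            self._last_ms = now_ms
            return (
                ((now_ms - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, identifiers from one worker sort by creation time, which is what later allows the timestamp to be recovered from the identifier.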

S104, splitting the click data into the attribution dimension data and the click information according to the service attribute corresponding to the click data.

The service attribute may include user identification information, device identification information, and the like corresponding to the click data. The click information may include click content, timestamp information, the device identification information described above, and the like. Optionally, the device identification information may include Android ID (Android unique device identifier), Android ID_MD5 (Android unique device identifier hashed with the MD5 message-digest algorithm), OAID (Open Anonymous Device Identifier), OAID_MD5 (OAID hashed with the MD5 message-digest algorithm), IMEI (International Mobile Equipment Identity), IMEI_MD5 (IMEI hashed with the MD5 message-digest algorithm), IP (Internet Protocol; here, the IP address from which the user clicked the advertisement data) + UA (User Agent, information identifying the user's operating system and version, browser and version, etc.), and the like. The user identification information may include a communication number of the user. Each item of user identification information or device identification information may correspond to one attribution dimension.

The service attribute may be a service attribute of the advertisement delivery platform that generates the click data; the user identification information and device identification information that advertisement delivery platforms with different service attributes can acquire for click data may be the same or different. For example, suppose the user identification information and device identification information corresponding to click data include an Android ID, an OAID, an IMEI_MD5, and a communication number of the user, and the same advertisement data is delivered on both advertisement delivery platform A (which, according to its service attribute, can obtain only the Android ID, the OAID, and the communication number of the user) and advertisement delivery platform B (which, according to its service attribute, can obtain only the OAID, the IMEI_MD5, and the communication number of the user). Then only the Android ID, the OAID, and the communication number can be obtained from platform A, and only the OAID, the IMEI_MD5, and the communication number can be obtained from platform B. Therefore, according to the service attribute corresponding to the click data, the click data generated by advertisement delivery platform A can be split into click information plus three attribution dimensions (Android ID, OAID, and the communication number of the user), and the click data generated by advertisement delivery platform B can be split into click information plus three attribution dimensions (OAID, IMEI_MD5, and the communication number of the user).
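The split in S104 might look like the following sketch. The record format and all field names (`android_id`, `phone_number`, and so on) are hypothetical; the application does not define them. As a simplification, identifiers are moved entirely into the attribution dimension data, although the text allows the click information to carry device identification information as well.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names; the application does not fix a record format.
ATTRIBUTION_FIELDS = (
    "android_id", "android_id_md5", "oaid", "oaid_md5",
    "imei", "imei_md5", "ip_ua", "phone_number",
)


def split_click(click: Dict[str, Any]) -> Tuple[List[Tuple[str, str]], Dict[str, Any]]:
    """Split one click record into attribution dimension data and click information."""
    # One (dimension name, value) pair per identifier the platform actually supplied.
    dimensions = [
        (name, click[name]) for name in ATTRIBUTION_FIELDS if click.get(name)
    ]
    # Everything else (click content, timestamp, ...) is kept as click information.
    click_info = {k: v for k, v in click.items() if k not in ATTRIBUTION_FIELDS}
    return dimensions, click_info
```

A click from a platform that supplies only an Android ID, an OAID, and a communication number would therefore yield three dimension rows plus one click-information record.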

S106, determining a first target database narrow table corresponding to attribution dimension data according to the attribution dimension data, the timestamp information, the number of the first databases and the number of first database narrow tables included in each first database; and determining a second target database narrow table corresponding to the click information according to the data identification, the timestamp information, the number of the second databases and the number of second database narrow tables included in each second database.

Wherein each database narrow table is created based on a different time unit. The first database may be a attribution dimension library and the second database may be a click details library.

S108, storing the attribution dimension data and the data identification in the first target database narrow table in an associated manner, and storing the click information and the data identification in the second target database narrow table in an associated manner, so that the advertisement attribution party can acquire the click data through different database narrow tables and perform attribution processing.

In one embodiment, before determining the first target database narrow table corresponding to the attribution dimension data (i.e., S106) according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database, the databases and database narrow tables may be created through the following steps A1-A4:

step A1, obtaining historical click data and historical attribution dimensions corresponding to the historical click data.

The historical click data may be the historical click data corresponding to the advertisement data in S102. When obtaining the historical click data, the historical click data in one attribution period can be obtained according to the attribution period. The attribution period may be one week, two weeks, three weeks, etc.

Step A2, determining the number of the first database and the number of the second database according to the data size corresponding to the historical click data and the dimension number of the historical attribution dimension.

The number of second databases may be determined according to the data volume corresponding to the historical click data and the TPS (Transactions Per Second, i.e., system throughput) of a single database. Assuming that a single database supports a TPS of 10,000 and the peak TPS of the historical click data service is 10,000, the number of second databases is 1.

Each historical attribution dimension corresponding to the historical click data can be regarded as one piece of data, so the number of first databases may be determined by evaluating the number of dimensions of the historical click data in combination with the TPS of a single database. Assuming that a single database supports a TPS of 10,000 and the peak TPS of the historical click data service is 100,000, the number of first databases is 10.
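The sizing rule in step A2 can be sketched as a ceiling division of peak load by per-database capacity. This reads the example figures as a single-database TPS of 10,000 against peaks of 10,000 and 100,000, which is the reading under which the stated results of 1 and 10 databases follow; the function name is an assumption.

```python
def database_count(peak_tps: int, single_db_tps: int) -> int:
    """Smallest number of databases whose combined TPS covers the peak load."""
    if peak_tps <= 0 or single_db_tps <= 0:
        raise ValueError("TPS values must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-peak_tps // single_db_tps)
```

With these figures, `database_count(10_000, 10_000)` gives 1 second database and `database_count(100_000, 10_000)` gives 10 first databases.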

Step A3, determining the number of the first database narrow tables and the number of the second database narrow tables according to the data volume corresponding to the historical click data and the data volume storage threshold of each database narrow table.

In this embodiment, the number of first database narrow tables and the number of second database narrow tables may be analyzed according to the data volume corresponding to the historical click data, the number of historical attribution dimensions, and the data volume storage threshold of each database narrow table. Since each database narrow table is created based on a different time unit, and assuming that one time unit is one week, analyzing these numbers amounts to determining how many database narrow tables need to be created each week.

Assume the historical click data contains about 200 million records per day and there are 10 historical attribution dimensions, so the data volume corresponding to the historical click data is about 2 billion records per day. If this data is divided among 10 databases, each database receives about 200 million records per day, or about 1.4 billion per week. If each database narrow table can store 50 million records, about 28 database narrow tables are needed per week, and at least 64 database narrow tables may be set per week to support flexible expansion of the database. It should be noted that databases are generally expanded by powers of 2 (2 to the power 5 is 32, and 2 to the power 6 is 64), so setting at least 64 database narrow tables per week, for example, makes the databases more expandable. In practical applications, at least 32 or at least 128 database narrow tables may also be set per week, and the present application does not limit this.
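The arithmetic of this example can be reproduced with a short sketch. The function names are illustrative assumptions; the figures are those of the example (200 million clicks per day, 10 dimensions, 10 databases, 50 million rows per narrow table).

```python
def tables_per_week(daily_clicks: int, dimensions: int, databases: int,
                    table_capacity: int, days: int = 7) -> int:
    """Narrow tables each database needs per weekly time unit."""
    rows_per_db_per_day = daily_clicks * dimensions // databases
    rows_per_db_per_week = rows_per_db_per_day * days
    # Ceiling division: a partly filled table still has to exist.
    return -(-rows_per_db_per_week // table_capacity)


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()
```

Here `tables_per_week(200_000_000, 10, 10, 50_000_000)` gives 28, and the next power of two is 32; the example's choice of 64 corresponds to taking one further doubling as headroom.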

Step A4, respectively creating a first database and a second database according to the number, and creating a first database narrow table in the first database and a second database narrow table in the second database.

In the embodiment, the number of the databases and the number of the narrow tables of the databases are estimated according to the historical click data and the historical attribution dimensions, so that the databases and the narrow tables of the databases are created according to the number, the number of the databases and the narrow tables of the databases is more reasonable, the number of the databases and the narrow tables of the databases can be changed along with the continuous updating of the historical click data, the databases and the narrow tables of the databases in corresponding number are created flexibly, and the effect of elastic expansion and contraction is achieved.

In one embodiment, the first database may include a plurality of first sub-databases corresponding to time units, and each of the first sub-databases includes a plurality of first database narrow tables.

Because the volume of click data corresponding to the advertisement data is huge, reaching several thousand clicks per second, and click data older than the attribution period has little value for attribution, the click data needs to be separated into cold and hot data so that click data older than the attribution period can be cleaned up quickly without affecting database performance. In the attribution process, when click data is searched in response to an activation operation on the advertisement data, the attribution period generally does not exceed seven days, so dividing time units by week is an effective way to separate cold data from hot data. New click data is stored in the database space of the current week, click data of the previous week is queried during attribution, and click data from before the previous week can be emptied at any time; click data is therefore kept in three weekly spaces to achieve cold-hot separation.

Illustratively, the first database may include first sub-databases corresponding to 3 time units, and following the example in step A3, each of the first sub-databases may include 64 first database narrow tables.
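One way to picture the three weekly spaces is as a rotating ring, sketched below. The mapping of a week number to a slot by modulo 3 follows the later steps of this embodiment; the role names `current`, `previous`, and `expired` are assumptions introduced here.

```python
SLOT_COUNT = 3  # three weekly sub-databases, as in the example above


def slot_roles(week_number: int) -> dict:
    """Which slot holds this week's, last week's, and already expired data."""
    return {
        "current": week_number % SLOT_COUNT,         # receives new click data
        "previous": (week_number - 1) % SLOT_COUNT,  # still queried during attribution
        "expired": (week_number - 2) % SLOT_COUNT,   # can be emptied at any time
    }
```

Each week the roles rotate by one slot, so the expired slot can be emptied before it becomes the current slot again, without touching the tables that are being written or queried.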

In this embodiment, determining the first target database narrow table corresponding to the attribution dimension data according to the attribution dimension data, the timestamp information, the number of the first databases, and the number of the first database narrow tables included in each first database (i.e., S106) may be specifically performed as the following steps B1-B4:

Step B1, determining the attribution dimension characterization value corresponding to the attribution dimension data according to the attribution dimension data.

In the attribution process, activation information generated by an activation operation on the advertisement data is matched against multiple kinds of attribution dimension data, such as Android ID, Android ID_MD5, OAID_MD5, IMEI_MD5, and IP + UA. How to locate and look up data quickly while distributing it evenly is therefore the key concern in database and table sharding, and for the multiple kinds of attribution dimension data to share the same sharding strategy, they must go through the same conversion or encryption algorithm.

Optionally, in this embodiment, a 64-bit Cyclic Redundancy Check (CRC64) algorithm may be used to convert each item of attribution dimension data into a CRC64 value (i.e., the attribution dimension characterization value). In an application scenario with massive click data, the CRC64 values obtained are very unlikely to repeat, the processing efficiency is high, and the generated data length is moderate. In practical applications, other conversion or encryption algorithms may also be used, and the present application is not limited thereto.
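A characterization value of this kind could be computed as in the sketch below. The application does not say which CRC64 variant is meant; the ECMA-182 polynomial with a zero initial value and no bit reflection is assumed here, and the function names and UTF-8 encoding are likewise illustrative choices. Python's standard library has no CRC64, so a plain bitwise implementation is shown.

```python
# Assumed variant: CRC-64/ECMA-182 (polynomial below, init 0, no reflection, no final XOR).
CRC64_POLY = 0x42F0E1EBA9EA3693
MASK64 = (1 << 64) - 1


def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC64 of the given bytes."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def characterization_value(dimension_value: str) -> int:
    """Hypothetical helper: attribution dimension data -> unsigned 64-bit value."""
    return crc64(dimension_value.encode("utf-8"))
```

Since every kind of identifier passes through the same function, an Android ID, an OAID, or an IP + UA string can all be routed by one sharding rule. A table-driven implementation would be faster in practice; the bitwise form is used only for brevity.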

Step B2, determining a first target database corresponding to the attribution dimension data according to the attribution dimension characterization value and the number of first databases.

The attribution dimension characterization value may be taken modulo the number of first databases (i.e., the remainder is taken), and the serial number of the first target database corresponding to the attribution dimension data is determined according to the modulo result.

For example, where the attribution dimension characterization value is a CRC64 value and the number of first databases is 10, step B2 may be specifically performed as: taking the CRC64 value modulo 10, i.e., CRC64 % 10, to obtain a modulo result (one of 0-9), and determining the serial number of the first target database corresponding to the attribution dimension data according to the modulo result.

Similarly, where the attribution dimension characterization value is a CRC64 value and the number of first databases is 4, step B2 may be specifically performed as: taking the CRC64 value modulo 4, i.e., CRC64 % 4, to obtain a modulo result (one of 0-3), and determining the serial number of the first target database corresponding to the attribution dimension data according to the modulo result.

Step B3, determining a first target time unit corresponding to the timestamp information, and determining that the first sub-database corresponding to the first target time unit is the first target sub-database corresponding to the attribution dimension data.

In this embodiment, since the time units are divided in units of weeks and the first database includes first sub-databases respectively corresponding to 3 time units, the timestamp information (in milliseconds) may be divided by the number of milliseconds in a week (i.e., 604800000) and rounded down to obtain the number of weeks elapsed since January 1, 1970; that number of weeks is then taken modulo 3, and the serial number of the first target time unit corresponding to the timestamp information is determined according to the modulo result (one of 0-2).
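The time-unit calculation in step B3 can be sketched as follows, assuming the timestamp is a Unix timestamp in milliseconds. The weeks elapsed since January 1, 1970 are obtained by integer division, and only the final step is a modulo; the function name is an assumption.

```python
MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000  # 604800000 milliseconds


def time_unit_slot(timestamp_ms: int, slot_count: int = 3) -> int:
    """Serial number (0 .. slot_count - 1) of the weekly time unit for a timestamp."""
    weeks_since_epoch = timestamp_ms // MS_PER_WEEK
    return weeks_since_epoch % slot_count
```

For instance, a click at 2021-12-03T00:00:00Z (1638489600000 ms) falls in week 2709 since the epoch, which maps to slot 0.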

Step B4, determining a first target database narrow table in the first target sub-database corresponding to the attribution dimension data according to the attribution dimension characterization value and the number of first database narrow tables included in the first target sub-database.

The attribution dimension characterization value may be taken modulo the number of first database narrow tables to obtain a modulo result, and the serial number of the first target database narrow table corresponding to the attribution dimension data is determined according to the modulo result.

Following the example in step A3, where the first target sub-database includes 64 first database narrow tables and the attribution dimension characterization value is a CRC64 value, step B4 may be specifically performed as: taking the CRC64 value modulo 64, i.e., CRC64 % 64, to obtain a modulo result (one of 0-63), and determining the serial number of the first target database narrow table corresponding to the attribution dimension data according to the modulo result.

In this embodiment, taking as an example that the first database includes first sub-databases respectively corresponding to 3 time units and each first sub-database includes 64 first database narrow tables, the overall table name of the first database narrow table storing the attribution dimension data may be "click_${0-2}_${0-63}". This design makes it easy to quickly locate attribution dimension data in the specific database narrow table of the specific time unit, and greatly improves the speed of looking up attribution dimension data in the attribution process, thereby improving attribution efficiency.
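Steps B2 to B4 can be combined into one routing function, sketched below under the example's figures (10 first databases, 3 weekly time units, 64 narrow tables). The `Route` type and function name are assumptions, and the characterization value is taken as an already computed unsigned integer.

```python
from typing import NamedTuple

MS_PER_WEEK = 604_800_000


class Route(NamedTuple):
    database: int  # serial number of the first target database
    table: str     # narrow table name inside that database


def route_dimension(value: int, timestamp_ms: int, db_count: int = 10,
                    slot_count: int = 3, table_count: int = 64) -> Route:
    """Locate the narrow table for one attribution dimension characterization value."""
    db_index = value % db_count                           # step B2
    slot = (timestamp_ms // MS_PER_WEEK) % slot_count     # step B3
    table_index = value % table_count                     # step B4
    return Route(db_index, f"click_{slot}_{table_index}")
```

At attribution time the same function can be applied to the identifier carried by the activation information, so a lookup touches one narrow table per candidate week rather than scanning every table.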

In this embodiment, the attribution dimension data, the attribution dimension characterization value, and the data identifier corresponding to the click data may be used as the table fields of the first target database narrow table, and an ordinary (non-unique) index may be created on the attribution dimension characterization value.
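The table layout described above might be sketched as follows, using an in-memory SQLite database purely as a stand-in for whatever database the embodiment actually uses. The column names `dim_value`, `dim_hash`, and `click_id` and all function names are hypothetical. Because SQLite integers are signed 64-bit, the unsigned characterization value is mapped into the signed range before storage.

```python
import sqlite3


def to_signed64(value: int) -> int:
    """Map an unsigned 64-bit value onto SQLite's signed 64-bit INTEGER range."""
    return value - (1 << 64) if value >= (1 << 63) else value


def create_dimension_table(conn: sqlite3.Connection, table_name: str) -> None:
    # table_name is interpolated into SQL, so it must come from trusted routing code.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table_name}" ('
        "dim_value TEXT NOT NULL, "
        "dim_hash INTEGER NOT NULL, "
        "click_id INTEGER NOT NULL)"
    )
    # Ordinary (non-unique) index on the characterization value.
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS "idx_{table_name}_hash" '
        f'ON "{table_name}" (dim_hash)'
    )


def insert_dimension(conn: sqlite3.Connection, table_name: str,
                     dim_value: str, dim_hash: int, click_id: int) -> None:
    conn.execute(
        f'INSERT INTO "{table_name}" (dim_value, dim_hash, click_id) VALUES (?, ?, ?)',
        (dim_value, to_signed64(dim_hash), click_id),
    )


def find_click_ids(conn: sqlite3.Connection, table_name: str,
                   dim_value: str, dim_hash: int) -> list:
    # The index narrows the search; comparing dim_value guards against hash collisions.
    rows = conn.execute(
        f'SELECT click_id FROM "{table_name}" WHERE dim_hash = ? AND dim_value = ?',
        (to_signed64(dim_hash), dim_value),
    )
    return [row[0] for row in rows]
```

The returned data identifiers are what links a matched dimension row to the full click record kept in the second database.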

In this embodiment, by determining the attribution dimension representation value corresponding to the attribution dimension data, the first target database corresponding to the attribution dimension data is determined according to the attribution dimension representation value and the number of the first databases, and by determining the first target time unit corresponding to the timestamp information, the first sub-database corresponding to the first target time unit is determined to be the first target sub-database corresponding to the attribution dimension data, and according to the attribution dimension representation value and the number of the first database narrow tables included in the first target sub-database, the first target database narrow table in the first target sub-database corresponding to the attribution dimension data is determined, so that an effect of cold and hot data separation storage is achieved, and data distribution in each database and in each database narrow table is more uniform.

In one embodiment, the second database may include a plurality of second sub-databases corresponding to the time unit, and each of the second sub-databases may include a plurality of second database narrow tables.

For the same reason as for the first database, in order to achieve cold-hot separation of the click data stored in the second database, second sub-databases corresponding to 3 time units respectively may be provided in the second database. Following the example in step A3, each second sub-database may include 64 second database narrow tables.

In this embodiment, the second target database narrow table corresponding to the click information is determined according to the data identifier, the timestamp information, the number of the second databases, and the number of the second database narrow tables included in each second database (i.e., S106), and the following steps C1-C3 may be specifically performed:

Step C1, determining a second target database corresponding to the click information according to the data identifier and the number of the second databases.

Optionally, the data identifier may be taken modulo the number of the second databases, and the serial number of the second target database corresponding to the click information determined according to the modulus result.

Assuming that the data identifier is characterized by a UUID and the number of second databases is 3, step C1 may be specifically performed as: taking the UUID modulo 3 (i.e., UUID % 3) to obtain a modulus result (one of 0-2), and determining the serial number of the second target database corresponding to the click information according to the modulus result.

Step C2, determining a second target time unit corresponding to the timestamp information, and determining that a second sub-database corresponding to the second target time unit is a second target sub-database corresponding to the click information.

Optionally, the timestamp information can be recovered from the data identifier when the identifier is generated by the snowflake algorithm. In this embodiment, since the time units are divided by week and the second database includes second sub-databases corresponding to 3 time units respectively, the timestamp information can be divided by the number of milliseconds in a week (i.e., 604800000 milliseconds) and rounded down to obtain the number of weeks elapsed since January 1, 1970. That number of weeks is then taken modulo 3 to obtain a modulus result (one of 0-2), and the number of the second target time unit corresponding to the timestamp information is determined according to the modulus result.
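The time-unit computation in step C2 can be illustrated with a short sketch. The function names are hypothetical. The snowflake helper assumes the commonly used layout in which a 41-bit millisecond timestamp sits above 22 low-order bits, measured from a configurable epoch; the text does not specify the layout, so both the bit positions and the `custom_epoch_ms` parameter are assumptions.

```python
MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000   # 604,800,000 ms


def time_unit_index(timestamp_ms: int, time_units: int = 3) -> int:
    """Map a millisecond timestamp to one of `time_units` weekly slots."""
    weeks_since_epoch = timestamp_ms // MS_PER_WEEK   # whole weeks since 1970-01-01
    return weeks_since_epoch % time_units


def snowflake_timestamp_ms(snowflake_id: int, custom_epoch_ms: int = 0) -> int:
    """Recover the millisecond timestamp from a snowflake-style identifier.

    Assumes the timestamp occupies the bits above the 22 low-order bits.
    """
    return (snowflake_id >> 22) + custom_epoch_ms


# 2021-12-03 00:00:00 UTC is 1,638,489,600,000 ms after the Unix epoch,
# which is 2709 whole weeks, and 2709 % 3 == 0.
print(time_unit_index(1_638_489_600_000))  # 0
```

With three slots, consecutive weeks cycle through indices 0, 1, 2, so the slot used two weeks ago is the one that will be written again next week.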

Step C3, determining a second target database narrow table in the second target sub-database corresponding to the click information according to the data identifier and the number of the second database narrow tables included in the second target sub-database.

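Optionally, the data identifier may be taken modulo the number of the second database narrow tables, and the serial number of the second target database narrow table corresponding to the click information determined according to the modulus result.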

Following the example in step A3, in the case that the second target sub-database includes 64 second database narrow tables, if the data identifier is characterized by a UUID, step C3 may be specifically performed as: taking the UUID modulo 64 (i.e., UUID % 64) to obtain a modulus result (one of 0-63), and determining the serial number of the second target database narrow table corresponding to the click information according to the modulus result.

In this embodiment, taking as an example that the second database includes second sub-databases corresponding to 3 time units respectively, and each second sub-database includes 64 second database narrow tables, the overall table name of the second database narrow table storing the click information may be "click_detail_${0-2}${0-63}". This naming scheme makes it easy to locate click information in the specific database narrow table of a specific time unit, which speeds up the lookup of click information during attribution and thereby improves attribution efficiency.
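Steps C1 to C3 can be sketched together as follows. This is an illustrative sketch only: the function name `second_table_for` is hypothetical, the UUID is treated as its 128-bit integer value (an assumption about how "UUID % n" is evaluated), and the function returns the three indices rather than a table name so as not to presume the exact naming format.

```python
import uuid

MS_PER_WEEK = 604_800_000  # milliseconds in one week


def second_table_for(data_id: uuid.UUID, click_time_ms: int,
                     db_count: int = 3, time_units: int = 3,
                     tables_per_unit: int = 64) -> tuple[int, int, int]:
    """Return (database index, time-unit index, narrow-table index)."""
    n = data_id.int                                            # UUID as an integer
    db_index = n % db_count                                    # step C1
    unit_index = (click_time_ms // MS_PER_WEEK) % time_units   # step C2
    table_index = n % tables_per_unit                          # step C3
    return db_index, unit_index, table_index


# Integer value 130: 130 % 3 == 1 and 130 % 64 == 2; the click time below
# falls in week 2709, and 2709 % 3 == 0.
print(second_table_for(uuid.UUID(int=130), 1_638_489_600_000))  # (1, 0, 2)
```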

In this embodiment, the click information and the data identifier corresponding to the click data may be used as a table field of the narrow table of the second target database, and a unique index may be created by using the data identifier corresponding to the click data.

In this embodiment, the second target database corresponding to the click information is determined according to the data identifier corresponding to the click data and the number of second databases. The second target time unit corresponding to the timestamp information is then determined, and the second sub-database corresponding to the second target time unit is taken as the second target sub-database corresponding to the click information. Finally, the second target database narrow table in the second target sub-database is determined according to the data identifier and the number of second database narrow tables included in the second target sub-database. In this way, cold and hot data are stored separately, and data is distributed more uniformly across the databases and across the database narrow tables.

In one embodiment, after the attribution dimension data and the data identifier are stored in association in the first target database narrow table, and the click information and the data identifier are stored in association in the second target database narrow table (i.e., S108), a target time unit satisfying a preset condition may be determined among the time units, and the click data stored in the target sub-database corresponding to the target time unit may be deleted so that the space in the target sub-database is released.

Wherein the preset condition may include at least one of: the time length from the current time exceeds the preset time length, and the utilization rate of the click data stored in the sub-database corresponding to the time unit is lower than the preset threshold value.

In the attribution process, click data is looked up in response to an activation operation on the advertisement data, and the attribution period generally does not exceed seven days. Dividing time units by week is therefore an effective way to separate cold and hot data: only the click data of the most recent week needs to be queried during attribution, so click data older than that can be emptied at any time, and storing three weeks of click data in the database is sufficient to achieve cold-hot separation. Accordingly, the preset duration may be 2 weeks. If the time elapsed since a time unit exceeds 2 weeks, the click data stored in the sub-database corresponding to that time unit has expired, that is, it is cold data.

When the usage rate of the click data stored in the sub-database corresponding to a time unit is lower than a preset threshold, that click data contributes little to attribution, that is, it is rarely needed during the attribution process and is therefore cold data. In this embodiment, the target time unit meeting the preset condition is a time unit corresponding to cold data.

Optionally, a script may be written to traverse the database narrow tables once a week and empty the database narrow tables in the target sub-database corresponding to the cold data by means of a truncate command, releasing hard disk space so that the database narrow tables can be reused.

In this embodiment, cold data that is no longer used is cleaned up periodically, so that the space of the database narrow tables is released automatically and the database narrow tables can be reused, which reduces cost. Because the click data is stored by time unit, cold and hot data are stored separately, and the periodic cleanup of cold data does not affect the overall service or the performance of the database.
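The weekly cleanup can be sketched as a script that works out which time-unit slot now holds cold data and generates one truncate statement per narrow table. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the table names follow the "click_${0-2}_${0-63}" pattern given for the first database, the retention period is the 2-week example from the text, and the statements are only generated, not executed against any database.

```python
MS_PER_WEEK = 604_800_000  # milliseconds in one week


def cold_unit_index(now_ms: int, time_units: int = 3,
                    retention_weeks: int = 2) -> int:
    """Slot whose data was written `retention_weeks` weeks before the current week."""
    current_week = now_ms // MS_PER_WEEK
    return (current_week - retention_weeks) % time_units


def truncate_statements(table_prefix: str, unit_index: int,
                        tables_per_unit: int = 64) -> list[str]:
    """One TRUNCATE statement per narrow table of the cold time unit."""
    return [
        f"TRUNCATE TABLE {table_prefix}_{unit_index}_{i}"
        for i in range(tables_per_unit)
    ]


# In week 2709 the cold slot is (2709 - 2) % 3 == 1.
slot = cold_unit_index(1_638_489_600_000)
statements = truncate_statements("click", slot)
print(slot, len(statements), statements[0])
```

With three weekly slots, the cold slot is also the one that will receive next week's writes, so emptying it before the week rolls over lets the same tables be reused.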

In one embodiment, after the attribution dimension data and the data identifier are stored in association in the first target database narrow table, and the click information and the data identifier are stored in association in the second target database narrow table (i.e., S108), the attribution processing of the click data may be implemented through the following steps D1-D3:

Step D1, when an attribution request for the click data is received, determining a target attribution dimension corresponding to the attribution request.

Wherein the attribution request is generated based on a user's activation of the advertisement data. The attribution request may carry activation data for the advertisement data, which may include attribution dimension data. And determining a target attribution dimension corresponding to the attribution request, namely splitting the activation data into the target attribution dimensions according to the attribution dimension data contained in the activation data.

Step D2, searching the first target database narrow table for the target data identifier corresponding to the target attribution dimension according to the target attribution dimension.

In this embodiment, the first target database narrow table is indexed by the attribution dimension. In performing step D2, the attribution dimension characterization value corresponding to the target attribution dimension may be determined first. Since the attribution dimension data, the attribution dimension characterization value, and the data identifier corresponding to the click data are table fields of the first target database narrow table, and the attribution dimension characterization value is an index of the first target database narrow table, the target data identifier corresponding to the target attribution dimension can be found in the first target database narrow table based on the attribution dimension characterization value corresponding to the target attribution dimension.

Step D3, searching the second target database narrow table for the click information corresponding to the target data identifier according to the target data identifier, so as to perform attribution processing by using the click information in the click data.

In this embodiment, the second target database narrow table is indexed by the data identifier. Because the click information and the data identifier corresponding to the click data are table fields of the second target database narrow table, and the data identifier corresponding to the click data is the index of the second target database narrow table, the click information corresponding to the target data identifier can be found in the second target database narrow table based on the target data identifier.

In this embodiment, the target attribution dimension corresponding to the attribution request is determined; the target data identifier corresponding to the target attribution dimension is found in the first target database narrow table according to the target attribution dimension; and the click information corresponding to the target data identifier is found in the second target database narrow table according to the target data identifier, so that attribution processing is performed by using the click information in the click data. The relevant database and the specific database narrow table can thus be located quickly through the database narrow table indexes, which improves the accuracy and speed of data lookup and, in turn, the accuracy and efficiency of attribution.
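The two-step lookup in steps D1 to D3 can be illustrated with an in-memory SQLite sketch. Several simplifications are assumptions made only for brevity: a single table stands in for each set of narrow tables (routing to a specific database and table is omitted), `zlib.crc32` from the standard library stands in for the CRC64 characterization value, and the table, column, and function names are hypothetical.

```python
import sqlite3
import zlib


def characterize(dimension: str) -> int:
    # Stand-in for the CRC64 characterization value (CRC32 is an assumption
    # used here only to keep the sketch within the standard library).
    return zlib.crc32(dimension.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE click_dim (dim_value TEXT, dim_crc INTEGER, data_id TEXT);
    CREATE INDEX idx_dim_crc ON click_dim (dim_crc);            -- ordinary index
    CREATE TABLE click_detail (data_id TEXT, click_info TEXT);
    CREATE UNIQUE INDEX idx_data_id ON click_detail (data_id);  -- unique index
""")


def store_click(data_id: str, dimensions: list[str], click_info: str) -> None:
    """Store each dimension value and the click information under one data_id."""
    for dim in dimensions:
        conn.execute("INSERT INTO click_dim VALUES (?, ?, ?)",
                     (dim, characterize(dim), data_id))
    conn.execute("INSERT INTO click_detail VALUES (?, ?)", (data_id, click_info))


def find_clicks(target_dimension: str) -> list[str]:
    """Step D2: dimension -> data identifiers; step D3: identifiers -> click info."""
    ids = [row[0] for row in conn.execute(
        "SELECT data_id FROM click_dim WHERE dim_crc = ? AND dim_value = ?",
        (characterize(target_dimension), target_dimension))]
    infos = []
    for data_id in ids:
        row = conn.execute(
            "SELECT click_info FROM click_detail WHERE data_id = ?",
            (data_id,)).fetchone()
        if row is not None:
            infos.append(row[0])
    return infos


store_click("id-1", ["android-abc", "imei-123"], "ad=42")
print(find_clicks("imei-123"))
```

Comparing the stored dimension value in addition to the characterization value guards against two different values sharing the same checksum.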

In one embodiment, Sharding-JDBC (an open-source middleware for database and table sharding) may be used to carry out the above data processing method based on attribution scenarios. The method adopts a carefully designed database and table sharding scheme: databases and tables are sharded based on the CRC64 values of the various attribution dimensions and the data identifier of the click data, so that cold and hot data are separated and data is divided uniformly among different database narrow tables. Capacity can thus be expanded and reduced elastically according to business needs, and outdated data can be cleaned up periodically, which supports business requirements while saving cost. The method has the advantages of low cost, fast attribution, uniform data distribution, cold-hot data separation, and support for elastic scaling, and serves as a useful reference in the attribution field.

Illustratively, the attribution dimension characterization value is a CRC64 value, the data identifier corresponding to the click data is characterized by a UUID, the number of the first databases (i.e., the attribution dimension library 23 in fig. 2) is 4, the number of the second databases (i.e., the click details library 24 in fig. 2) is 4, the time unit is created in units of weeks, each attribution dimension library includes an attribution dimension sub-database 25 corresponding to 3 time units respectively, each attribution dimension sub-database includes 64 attribution dimension library narrow tables, each click details library includes a click details sub-database 26 corresponding to 3 time units respectively, and each click details sub-database includes 64 click details library narrow tables.

As shown in FIG. 2, the click data 20 may be split into attribution dimension data 21 and click information 22, wherein the attribution dimension data 21 includes an Android ID and an IMEI, and the click information 22 includes a data identifier UUID. For the attribution dimension data 21, first, CRC64 values corresponding to the Android ID and the IMEI are generated respectively, each CRC64 value is taken modulo the number of attribution dimension libraries 23 (i.e., CRC64 % 4) to obtain a modulus result (one of 0-3), and the attribution dimension libraries corresponding to the Android ID and the IMEI respectively are determined according to the modulus results; secondly, the timestamp information clickTime of the click data is divided by 604800000 milliseconds per week and rounded down to obtain a number of weeks, the number of weeks is taken modulo 3 to obtain a modulus result, and the attribution dimension sub-database 25 (i.e., the time unit) in which the Android ID and the IMEI are respectively stored is determined according to the modulus result; and finally, each CRC64 value is taken modulo the number of attribution dimension library narrow tables (i.e., CRC64 % 64) to obtain a modulus result, and the attribution dimension library narrow table of the corresponding time unit in which the Android ID and the IMEI are respectively stored is determined according to the modulus result. 
For the click information 22, first, the UUID is taken modulo the number of second databases (i.e., UUID % 4) to obtain a modulus result (one of 0-3), and the click detail library corresponding to the click information is determined according to the modulus result; secondly, the timestamp information clickTime of the click data is divided by 604800000 milliseconds per week and rounded down to obtain a number of weeks, the number of weeks is taken modulo 3 to obtain a modulus result, and the click detail sub-database 26 (i.e., the time unit) in which the click information is stored is determined according to the modulus result; and finally, the UUID is taken modulo the number of click detail library narrow tables (i.e., UUID % 64) to obtain a modulus result, and the click detail library narrow table of the corresponding time unit in which the click information is stored is determined according to the modulus result.

Fig. 3 is a schematic flow chart of a data processing method based on attribution scenarios according to another embodiment of the present application, as shown in fig. 3, the method including:

S301, click data corresponding to the advertisement data are obtained, and data identification corresponding to the click data is generated.

S302, according to the service attribute corresponding to the click data, the click data is divided into the attribution dimension data and the click information. Then, S303 and S308 are performed, respectively.

The service attribute includes user identification information, device identification information and the like corresponding to the click data. The click information includes click content, time stamp information, device identification information, and the like.

In one embodiment, before performing S303 and S308, the number of the first databases and the number of the second databases may be determined according to the data amount corresponding to the historical click data and the dimension number of the historical attribution dimension, the number of the narrow tables of the first databases and the number of the narrow tables of the second databases are determined according to the data amount corresponding to the historical click data and the data amount storage threshold of each narrow table of the databases, so that the first databases and the second databases are respectively created according to the numbers, the narrow tables of the first databases are created in the first databases, and the narrow tables of the second databases are created in the second databases.

S303, determining attribution dimension characterization values corresponding to attribution dimension data according to the attribution dimension data.

Optionally, the CRC64 algorithm may be used to compute a check value for each piece of attribution dimension data, so as to obtain the CRC64 value (i.e., the attribution dimension characterization value) corresponding to each piece of attribution dimension data.

S304, determining a first target database corresponding to the attribution dimension data according to the attribution dimension representation values and the number of the first databases.

Wherein the first database may be an attribution dimension library. The first database may include a plurality of first sub-databases corresponding to the time unit, and each of the first sub-databases may include a plurality of first database narrow tables.

S305, determining a first target time unit corresponding to the timestamp information, and determining that a first sub-database corresponding to the first target time unit is a first target sub-database corresponding to the attribution dimension data.

S306, according to the attribution dimension representation value and the number of the first database narrow tables included in the first target sub-database, determining a first target database narrow table in the first target sub-database corresponding to the attribution dimension data.

S307, the attribution dimension data and the data identification are stored in the narrow table of the first target database in an associated mode.

S308, determining a second target database corresponding to the click information according to the data identification and the number of the second databases.

Wherein the second database may be a click details repository. The second database may include a plurality of second sub-databases corresponding to the time unit, and each of the second sub-databases may include a plurality of second database narrow tables.

S309, a second target time unit corresponding to the timestamp information is determined, and a second sub-database corresponding to the second target time unit is determined to be a second target sub-database corresponding to the click information.

S310, according to the data identification and the number of the second database narrow tables included in the second target sub-database, determining a second target database narrow table in the second target sub-database corresponding to the click information.

S311, storing the click information and the data identification in a narrow table of a second target database in an associated manner.

In one embodiment, after performing S307 and S311, a target time unit satisfying a preset condition in each time unit may be determined, so as to delete click data stored in a target sub-database corresponding to the target time unit, so that a space in the target sub-database is released.

The specific processes of S301 to S311 are described in detail in the above embodiments, and are not described herein again.

Fig. 4 is a schematic flow chart of a method for processing attribution requests according to an embodiment of the present application, as shown in fig. 4, the method comprising:

S401, when an attribution request for click data is received, determining a target attribution dimension corresponding to the attribution request.

Wherein the attribution request is generated based on a user's activation of the advertisement data.

S402, searching a target data identifier corresponding to the target attribution dimension from the narrow table of the first target database according to the target attribution dimension.

Wherein the first target database narrow table is indexed by attribution dimension.

S403, according to the target data identifier, searching click information corresponding to the target data identifier from the second target database narrow table so as to perform attribution processing by using the click information in the click data.

Wherein the second target database narrow table is indexed by the data identifier.

The specific processes of S401 to S403 are described in detail in the above embodiments, and are not described herein again.

With the technical solution of this embodiment of the application, the target attribution dimension corresponding to the attribution request is determined; the target data identifier corresponding to the target attribution dimension is found in the first target database narrow table according to the target attribution dimension; and the click information corresponding to the target data identifier is found in the second target database narrow table according to the target data identifier, so that attribution processing is performed by using the click information in the click data. The relevant database and the specific database narrow table can thus be located quickly through the database narrow table indexes, which improves the accuracy and speed of data lookup and, in turn, the accuracy and efficiency of attribution.

In summary, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.

Based on the same idea as the data processing method based on attribution scenarios provided above, the embodiment of the present application further provides a data processing apparatus based on attribution scenarios.

Fig. 5 is a schematic block diagram of an attribution scenario-based data processing apparatus according to an embodiment of the present application, and as shown in fig. 5, the attribution scenario-based data processing apparatus may include:

an obtaining and generating module 510, configured to obtain click data corresponding to the advertisement data, and generate a data identifier corresponding to the click data;

the splitting module 520 is configured to split the click data into the attributed dimension data and the click information according to the service attribute corresponding to the click data; the service attribute comprises user identification information and/or equipment identification information corresponding to the click data; the click information comprises at least one item of click content, timestamp information and equipment identification information;

a first determining module 530, configured to determine, according to the attribution dimension data, the timestamp information, the number of the first databases, and the number of the first database narrow tables included in each first database, a first target database narrow table corresponding to the attribution dimension data; determining a second target database narrow table corresponding to the click information according to the data identification, the timestamp information, the number of the second databases and the number of second database narrow tables included in each second database; each database narrow table is created based on different time units;

and the association storage module 540 is configured to store the attribution dimension data and the data identifier in an association manner in the first target database narrow table, and store the click information and the data identifier in an association manner in the second target database narrow table, so that the advertisement attribution party obtains click data through different database narrow tables and performs attribution processing.

In one embodiment, the first database includes a first sub-database corresponding to a plurality of time units; each first sub-database comprises a plurality of first database narrow tables;

the first determining module 530 includes:

the first determining unit is used for determining attribution dimension representation values corresponding to attribution dimension data according to the attribution dimension data;

the second determining unit is used for determining the first target database corresponding to the attribution dimension data according to the attribution dimension representation values and the number of the first databases;

the third determining unit is used for determining a first target time unit corresponding to the timestamp information and determining that a first sub-database corresponding to the first target time unit is a first target sub-database corresponding to the attribution dimension data;

and the fourth determining unit is used for determining the first target database narrow table in the first target sub-database corresponding to the attribution dimension data according to the attribution dimension representation value and the number of the first database narrow tables included in the first target sub-database.

In one embodiment, the second database includes a plurality of second sub-databases corresponding to time units; each second sub-database comprises a plurality of second database narrow tables;

the first determining module 530 includes:

the fifth determining unit is used for determining a second target database corresponding to the click information according to the data identification and the number of the second databases;

a sixth determining unit, configured to determine a second target time unit corresponding to the timestamp information, and determine that a second sub-database corresponding to the second target time unit is a second target sub-database corresponding to the click information;

and the seventh determining unit is used for determining the second target database narrow table in the second target sub-database corresponding to the click information according to the data identifier and the number of the second database narrow tables included in the second target sub-database.

In one embodiment, the attribution scenario-based data processing apparatus further comprises:

the second determining module is used for determining a target time unit meeting the preset condition in each time unit; the preset conditions include at least one of the following: the time length from the current time exceeds the preset time length, and the utilization rate of the click data stored in the sub-database corresponding to the time unit is lower than a preset threshold value;

and the deleting module is used for deleting the click data stored in the target sub-database corresponding to the target time unit so as to release the space in the target sub-database.

In one embodiment, the attribution scenario-based data processing apparatus further comprises:

the third determining module is used for determining a target attribution dimension corresponding to the attribution request when the attribution request for the click data is received; the attribution request is generated based on a user's activation of the advertisement data;

the first searching module is used for searching a target data identifier corresponding to the target attribution dimension from a narrow table of a first target database according to the target attribution dimension; the narrow table of the first target database takes attribution dimension as an index;

the second searching module is used for searching click information corresponding to the target data identifier from a narrow table of a second target database according to the target data identifier so as to perform attribution processing by using the click information in the click data; the second narrow table of the target database takes the data identification as an index.

In one embodiment, the attribution scenario-based data processing apparatus further comprises:

the acquisition module is used for acquiring historical click data and historical attribution dimensions corresponding to the historical click data;

the fourth determining module is used for determining the number of the first databases and the number of the second databases according to the data size corresponding to the historical click data and the dimension number of the historical attribution dimensions;

the fifth determining module is used for determining the number of the first database narrow tables and the number of the second database narrow tables according to the data volume corresponding to the historical click data and the data volume storage threshold value of each database narrow table;

the creating module is used for respectively creating a first database and a second database according to the quantity, creating a first database narrow table in the first database, and creating a second database narrow table in the second database.

With this apparatus, click data corresponding to the advertisement data is obtained and a data identifier corresponding to the click data is generated; the click data is split into attribution dimension data and click information according to the service attribute corresponding to the click data; the first target database narrow table corresponding to the attribution dimension data is determined according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database; and the second target database narrow table corresponding to the click information is determined according to the data identifier, the timestamp information, the number of second databases, and the number of second database narrow tables included in each second database. Because each database narrow table is created based on a different time unit, each piece of click data can be stored, based on its timestamp information, in the database narrow table of the corresponding time unit. This separates the storage of cold and hot data (i.e., data from different periods) and avoids affecting the overall service and the performance of the database when cold data is cleared. In addition, storing the attribution dimension data in association with the data identifier in the first target database narrow table, and the click information in association with the data identifier in the second target database narrow table, shards the click data across databases and tables. This supports the storage of massive click data, withstands highly concurrent writes of large data volumes, stores massive click data efficiently, and avoids failures in storing click data. 
Furthermore, with this apparatus the advertisement attribution party can quickly and accurately acquire the click data through the different database narrow tables and perform attribution processing, which improves the accuracy and speed of click data lookup and thereby the accuracy and efficiency of advertisement attribution.

Those skilled in the art should understand that the attribution scenario-based data processing apparatus in fig. 5 can be used to implement the aforementioned attribution scenario-based data processing method. Its detailed description is similar to that of the method part above and, to avoid repetition, is omitted here.

Based on the same idea, an embodiment of the present application further provides an attribution scenario-based data processing device, as shown in fig. 6. Attribution scenario-based data processing devices may differ considerably depending on configuration or performance. Such a device may include one or more processors 601 and a memory 602, and the memory 602 may store one or more applications or data. The memory 602 may be transient or persistent storage. An application program stored in the memory 602 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the attribution scenario-based data processing device. Further, the processor 601 may be configured to communicate with the memory 602 and to execute, on the attribution scenario-based data processing device, the series of computer-executable instructions in the memory 602. The attribution scenario-based data processing device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input-output interfaces 605, and one or more keyboards 606.

In particular, in this embodiment, the attribution scenario-based data processing device comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and may comprise one or more modules, each module may comprise a series of computer-executable instructions for the attribution scenario-based data processing device, and the one or more programs are configured to be executed by one or more processors and comprise computer-executable instructions for:

acquiring click data corresponding to the advertisement data, and generating a data identifier corresponding to the click data;

according to the business attribute corresponding to the click data, splitting the click data into attribution dimension data and click information; the business attribute comprises user identification information and/or equipment identification information corresponding to the click data; the click information comprises at least one of click content, timestamp information and equipment identification information;

determining a first target database narrow table corresponding to attribution dimension data according to the attribution dimension data, the timestamp information, the number of the first databases and the number of first database narrow tables included in each first database; determining a second target database narrow table corresponding to the click information according to the data identification, the timestamp information, the number of the second databases and the number of second database narrow tables included in each second database; each database narrow table is created based on different time units;

and storing the attribution dimension data and the data identification in a first target database narrow table in an associated manner, and storing the click information and the data identification in a second target database narrow table in an associated manner, so that the advertisement attribution party can acquire the click data through different database narrow tables and perform attribution processing.
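
The splitting step above can be illustrated with a minimal sketch. It assumes the click data arrives as a dictionary and that a random UUID serves as the data identifier; the field names (`user_id`, `device_id`, `content`, `timestamp`) and the class names are hypothetical and are not taken from the application.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributionRow:
    """Row for the first (attribution dimension) narrow table."""
    dimension_key: str  # a user or device identifier
    data_id: str


@dataclass(frozen=True)
class ClickInfoRow:
    """Row for the second (click information) narrow table."""
    data_id: str
    click_content: str
    timestamp: int  # Unix seconds
    device_id: str


def split_click(click: dict) -> tuple[list[AttributionRow], ClickInfoRow]:
    """Generate a data identifier and split one click record into two parts."""
    # Assumption: any unique identifier would do; a UUID is used here.
    data_id = uuid.uuid4().hex
    # Business attributes: user and/or device identification, when present.
    dims = [click[k] for k in ("user_id", "device_id") if click.get(k)]
    attribution_rows = [AttributionRow(d, data_id) for d in dims]
    info = ClickInfoRow(
        data_id=data_id,
        click_content=click.get("content", ""),
        timestamp=click["timestamp"],
        device_id=click.get("device_id", ""),
    )
    return attribution_rows, info
```

Both parts carry the same data identifier, which is what later allows a lookup in the first narrow table to lead to the matching row in the second narrow table.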

Optionally, the first database includes a first sub-database corresponding to a plurality of time units; each first sub-database comprises a plurality of first database narrow tables; the computer executable instructions, when executed, may further cause the processor to:

determining attribution dimension characterization values corresponding to attribution dimension data according to the attribution dimension data;

determining a first target database corresponding to the attribution dimension data according to the attribution dimension characterization value and the number of first databases;

determining a first target time unit corresponding to the timestamp information, and determining a first sub-database corresponding to the first target time unit as a first target sub-database corresponding to attribution dimension data;

and determining a first target database narrow table in the first target sub database corresponding to the attribution dimension data according to the attribution dimension characterization value and the number of the first database narrow tables included in the first target sub database.
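
The routing steps above can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: CRC32 stands in for the attribution dimension characterization value, a calendar month (UTC) stands in for the time unit, and the function names are hypothetical.

```python
import zlib
from datetime import datetime, timezone


def characterization_value(dimension: str) -> int:
    """Map attribution dimension data to a stable non-negative integer.

    Assumption: CRC32 stands in for whatever characterization is used.
    """
    return zlib.crc32(dimension.encode("utf-8"))


def time_unit(timestamp: int) -> str:
    """Assumption: one sub-database per calendar month (UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y%m")


def route_first_table(dimension: str, timestamp: int,
                      num_databases: int,
                      tables_per_sub_db: int) -> tuple[int, str, int]:
    """Return (database index, sub-database label, narrow table index)."""
    value = characterization_value(dimension)
    db_index = value % num_databases
    sub_db = time_unit(timestamp)
    # Design choice (assumption): divide by the database count first so the
    # table index is not tied to the database index.
    table_index = (value // num_databases) % tables_per_sub_db
    return db_index, sub_db, table_index
```

Because the routing depends only on the dimension value and the timestamp, the same click always maps to the same narrow table, so a later attribution request can recompute the location rather than scan every table.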

Optionally, the second database includes a second sub-database corresponding to a plurality of time units; each second sub-database comprises a plurality of second database narrow tables; the computer executable instructions, when executed, may further cause the processor to:

determining a second target database corresponding to the click information according to the data identification and the number of the second databases;

determining a second target time unit corresponding to the timestamp information, and determining a second sub-database corresponding to the second target time unit as a second target sub-database corresponding to the click information;

and determining a second target database narrow table in the second target sub database corresponding to the click information according to the data identifier and the number of the second database narrow tables included in the second target sub database.
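
For the click information, the same kind of routing is keyed on the data identifier instead of the attribution dimension. The sketch below assumes CRC32 hashing and monthly time units; the naming scheme of the returned table is purely hypothetical.

```python
import zlib
from datetime import datetime, timezone


def route_second_table(data_id: str, timestamp: int,
                       num_databases: int, tables_per_sub_db: int) -> str:
    """Return a hypothetical physical table name for one click-information row."""
    # Assumption: CRC32 of the data identifier drives the placement.
    value = zlib.crc32(data_id.encode("utf-8"))
    db_index = value % num_databases
    # Assumption: one sub-database per calendar month (UTC).
    month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y%m")
    table_index = (value // num_databases) % tables_per_sub_db
    return f"click_db_{db_index}.m{month}.click_info_{table_index}"
```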

Optionally, the computer executable instructions, when executed, may further cause the processor to:

determining a target time unit meeting a preset condition in each time unit; the preset conditions include at least one of the following: the time length from the current time exceeds the preset time length, and the utilization rate of the click data stored in the sub-database corresponding to the time unit is lower than a preset threshold value;

and deleting click data stored in the target sub-database corresponding to the target time unit so as to release the space in the target sub-database.

when an attribution request for click data is received, determining a target attribution dimension corresponding to the attribution request; attribution requests are generated based on user activation of advertisement data;

according to the target attribution dimension, searching for a target data identifier corresponding to the target attribution dimension in the first target database narrow table; the first target database narrow table takes the attribution dimension as an index;

according to the target data identifier, searching for click information corresponding to the target data identifier in the second target database narrow table, so as to perform attribution processing by using the click information in the click data; the second target database narrow table takes the data identifier as an index.
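
The two-step lookup above can be sketched with in-memory dictionaries standing in for the two indexed narrow tables. The names `first_table`, `second_table`, `store`, and `attribute` are hypothetical, and choosing the most recent click (last-click attribution) is an assumption made only for illustration.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the two narrow tables.
first_table: dict[str, list[str]] = {}   # attribution dimension -> data identifiers
second_table: dict[str, dict] = {}       # data identifier -> click information


def store(dimension: str, data_id: str, click_info: dict) -> None:
    """Store both parts of one click, linked by the data identifier."""
    first_table.setdefault(dimension, []).append(data_id)
    second_table[data_id] = click_info


def attribute(target_dimension: str) -> Optional[dict]:
    """Return the click information attributed to the dimension, or None."""
    # Step 1: the first table is indexed by attribution dimension.
    data_ids = first_table.get(target_dimension, [])
    # Step 2: the second table is indexed by data identifier.
    candidates = [second_table[i] for i in data_ids if i in second_table]
    if not candidates:
        return None
    # Assumption: last-click attribution picks the most recent click.
    return max(candidates, key=lambda info: info["timestamp"])
```

Each step is a keyed lookup on the table's index, so neither step needs to scan wide rows that mix dimension data with click content.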

acquiring historical click data and historical attribution dimensions corresponding to the historical click data;

determining the number of the first databases and the number of the second databases according to the data size corresponding to the historical click data and the dimension number of the historical attribution dimensions;

determining the number of first database narrow tables and the number of second database narrow tables according to the data volume corresponding to the historical click data and the data volume storage threshold value of each database narrow table;

and respectively creating a first database and a second database according to the quantity, creating a first database narrow table in the first database, and creating a second database narrow table in the second database.
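
The sizing steps above do not fix a formula, so the following is only one possible sketch. It assumes one attribution row per dimension per click, a ceiling division against the per-table storage threshold, and a hypothetical cap `max_tables_per_db` on the number of narrow tables per database.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def plan_shards(total_clicks: int, rows_per_table: int,
                num_dimensions: int, max_tables_per_db: int = 64) -> dict:
    """Estimate table and database counts from historical click volume."""
    if min(total_clicks, rows_per_table, num_dimensions, max_tables_per_db) <= 0:
        raise ValueError("all inputs must be positive")
    # Assumption: every click yields one row per attribution dimension.
    first_tables = ceil_div(total_clicks * num_dimensions, rows_per_table)
    # One click-information row per click.
    second_tables = ceil_div(total_clicks, rows_per_table)
    return {
        "first_tables": first_tables,
        "second_tables": second_tables,
        "first_databases": ceil_div(first_tables, max_tables_per_db),
        "second_databases": ceil_div(second_tables, max_tables_per_db),
    }
```

For example, under these assumptions one billion historical clicks, a threshold of five million rows per narrow table, and two attribution dimensions give 400 first narrow tables and 200 second narrow tables.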

With this device, click data corresponding to the advertisement data is acquired and a data identifier corresponding to the click data is generated. The click data is split into attribution dimension data and click information according to the business attribute corresponding to the click data. The first target database narrow table corresponding to the attribution dimension data is then determined according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database; the second target database narrow table corresponding to the click information is determined according to the data identifier, the timestamp information, the number of second databases, and the number of second database narrow tables included in each second database. Because each database narrow table is created for a different time unit, each piece of click data can be stored, based on its timestamp information, in the database narrow table of the corresponding time unit. This separates the storage of cold and hot data (that is, data from different periods), so that clearing cold data does not affect the overall service or performance of the database. Storing the attribution dimension data in association with the data identifier in the first target database narrow table, and the click information in association with the data identifier in the second target database narrow table, shards the click data across databases and tables. This supports the storage of massive click data, withstands highly concurrent writes of large data volumes, stores massive click data efficiently, and avoids failures to store click data.
In addition, the advertisement attribution party can quickly and accurately acquire the click data through the different database narrow tables and perform attribution processing, which improves the accuracy and speed of click data lookup and therefore the accuracy and efficiency of advertisement attribution.

An embodiment of the present application further provides a storage medium storing one or more computer programs, where the one or more computer programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the data processing method based on attribution scenarios, and are specifically configured to perform:

Optionally, the first database includes a first sub-database corresponding to a plurality of time units; each first sub-database comprises a plurality of first database narrow tables; the instructions, when executed by an electronic device comprising a plurality of application programs, are further capable of causing the electronic device to perform:

Optionally, the second database includes a second sub-database corresponding to a plurality of time units; each second sub-database comprises a plurality of second database narrow tables; the instructions, when executed by an electronic device comprising a plurality of application programs, are further capable of causing the electronic device to perform:

Optionally, the instructions, when executed by an electronic device comprising a plurality of application programs, are further capable of causing the electronic device to perform:

determining a target time unit meeting a preset condition in each time unit; the preset conditions include at least one of the following: the time length from the current time exceeds the preset time length, and the utilization rate of the click data stored in the sub-database corresponding to the time unit is lower than a preset threshold value;

and deleting click data stored in the target sub-database corresponding to the target time unit so as to release the space in the target sub-database.
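
The selection of time units to clear can be sketched as below. The metadata layout (`end`, `usage`) and the function name are hypothetical, and treating either condition as sufficient is an assumption, since the text only states that the preset condition includes at least one of the two.

```python
from datetime import datetime, timedelta, timezone


def select_expired_units(units: dict[str, dict], now: datetime,
                         max_age: timedelta, min_usage: float) -> list[str]:
    """Return the labels of time units whose sub-databases may be cleared.

    `units` maps a time-unit label to {"end": datetime, "usage": float},
    where "end" is the end of the time unit and "usage" is the utilization
    rate of the click data stored for it (both hypothetical fields).
    """
    expired = []
    for label, meta in units.items():
        too_old = now - meta["end"] > max_age
        rarely_used = meta["usage"] < min_usage
        # Assumption: either condition alone marks the unit as cold.
        if too_old or rarely_used:
            expired.append(label)
    return sorted(expired)
```

Because each time unit has its own sub-database, clearing the selected units touches only cold data and leaves the narrow tables that serve current traffic alone.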

With the storage medium of this embodiment of the application, click data corresponding to the advertisement data is acquired and a data identifier corresponding to the click data is generated. The click data is split into attribution dimension data and click information according to the business attribute corresponding to the click data. The first target database narrow table corresponding to the attribution dimension data is then determined according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database; the second target database narrow table corresponding to the click information is determined according to the data identifier, the timestamp information, the number of second databases, and the number of second database narrow tables included in each second database. Because each database narrow table is created for a different time unit, each piece of click data can be stored, based on its timestamp information, in the database narrow table of the corresponding time unit. This separates the storage of cold and hot data (that is, data from different periods), so that clearing cold data does not affect the overall service or performance of the database.
Storing the attribution dimension data in association with the data identifier in the first target database narrow table, and the click information in association with the data identifier in the second target database narrow table, shards the click data across databases and tables. This supports the storage of massive click data, withstands highly concurrent writes of large data volumes, stores massive click data efficiently, and avoids failures to store click data. In addition, the advertisement attribution party can quickly and accurately acquire the click data through the different database narrow tables and perform attribution processing, which improves the accuracy and speed of click data lookup and therefore the accuracy and efficiency of advertisement attribution.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner: the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and for the relevant points, reference may be made to the corresponding description of the method embodiment.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A data processing method based on attribution scenes is characterized by comprising the following steps:

2. The method of claim 1, wherein the first database comprises a first sub-database corresponding to a plurality of the time units; each first sub-database comprises a plurality of first database narrow tables;

determining a first target database narrow table corresponding to attribution dimension data according to the attribution dimension data, the timestamp information, the number of first databases and the number of first database narrow tables included in each first database, including:

determining attribution dimension characterization values corresponding to the attribution dimension data according to the attribution dimension data;

determining a first target database corresponding to the attribution dimension data according to the attribution dimension characterization values and the number of the first databases;

determining a first target time unit corresponding to the timestamp information, and determining that the first sub-database corresponding to the first target time unit is a first target sub-database corresponding to the attribution dimension data;

determining the first target database narrow table in the first target sub-database corresponding to the attribution dimension data according to the attribution dimension characterization value and the number of the first database narrow tables included in the first target sub-database.

3. The method of claim 1, wherein the second database comprises a plurality of second sub-databases corresponding to the time units; each second sub-database comprises a plurality of second database narrow tables;

determining a second target database narrow table corresponding to the click information according to the data identifier, the timestamp information, the number of second databases, and the number of second database narrow tables included in each second database, includes:

determining a second target time unit corresponding to the timestamp information, and determining that a second sub-database corresponding to the second target time unit is a second target sub-database corresponding to the click information;

and determining the second target database narrow table in the second target sub-database corresponding to the click information according to the data identifier and the number of the second database narrow tables included in the second target sub-database.

4. The method of claim 2 or 3, wherein after storing the attribution dimension data in association with the data identifier in the first target database narrow table, and storing the click information in association with the data identifier in the second target database narrow table, the method further comprises:

determining a target time unit meeting a preset condition in each time unit; the preset condition comprises at least one of the following conditions: the time length from the current time exceeds the preset time length, and the utilization rate of the click data stored in the sub-database corresponding to the time unit is lower than the preset threshold value;

deleting the click data stored in a target sub-database corresponding to the target time unit so as to release the space in the target sub-database.

5. The method of claim 1, wherein after storing the attribution dimension data in association with the data identifier in the first target database narrow table, and storing the click information in association with the data identifier in the second target database narrow table, the method further comprises:

when an attribution request for the click data is received, determining a target attribution dimension corresponding to the attribution request; the attribution request is generated based on an activation operation of the advertisement data by a user;

according to the target attribution dimension, searching a target data identifier corresponding to the target attribution dimension from the first target database narrow table; the first target database narrow table takes the attribution dimension as an index;

according to the target data identifier, searching for the click information corresponding to the target data identifier in the second target database narrow table, so as to perform attribution processing by using the click information in the click data; the second target database narrow table takes the data identifier as an index.

6. The method of claim 1, wherein before determining the first target database narrow table corresponding to the attribution dimension data according to the attribution dimension data, the timestamp information, the number of first databases, and the number of first database narrow tables included in each first database, the method further comprises:

obtaining historical click data and historical attribution dimensions corresponding to the historical click data;

determining the number of the first database narrow tables and the number of the second database narrow tables according to the data volume corresponding to the historical click data and the data volume storage threshold of each database narrow table;

and respectively creating the first database and the second database according to the quantity, creating the first database narrow table in the first database, and creating the second database narrow table in the second database.

7. An attribution scenario-based data processing apparatus, comprising:

8. The apparatus of claim 7, wherein the first database comprises a first sub-database corresponding to a plurality of the time units; each first sub-database comprises a plurality of first database narrow tables;

the first determining module includes:

a first determining unit, configured to determine, according to the attribution dimension data, an attribution dimension characterization value corresponding to the attribution dimension data;

a second determining unit, configured to determine, according to the attribution dimension characterization value and the number of the first databases, a first target database corresponding to the attribution dimension data;

a third determining unit, configured to determine a first target time unit corresponding to the timestamp information, and determine that the first sub-database corresponding to the first target time unit is a first target sub-database corresponding to the attribution dimension data;

a fourth determining unit, configured to determine, according to the attribution dimension characterization value and the number of the first database narrow tables included in the first target sub-database, the first target database narrow table in the first target sub-database corresponding to the attribution dimension data.

9. An attribution scenario-based data processing apparatus, comprising a processor and a memory electrically connected to the processor, the memory storing a computer program, the processor being configured to invoke and execute the computer program from the memory to implement:

10. A storage medium for storing a computer program which, when executed by a processor, implements the following: