CN112418896A - Data mining method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN112418896A
Authority
CN
China
Prior art keywords
users
proportion
sample data
predicted
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910770384.8A
Other languages
Chinese (zh)
Inventor
吴充
陈玉萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201910770384.8A
Publication of CN112418896A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • G06Q30/0255Targeted advertisements based on user history

Abstract

The present disclosure relates to a data mining method, apparatus, storage medium, and electronic device. The method comprises: acquiring record data and sample data of delivered content; determining, from the proportion of each class of users in the sample data and the proportion of operations performed by each class of users on the delivered content in the sample data, the predicted number of users of each class in the record data and the predicted number of operations performed by each class of users on the delivered content; if the predicted number of users exceeds the predicted number of operations, correcting the proportion of operations performed by each class of users in the sample data according to the proportion of each class of users in the sample data and the record data, to obtain a corrected proportion; and determining, from the total number of operations and the corrected proportion, the target predicted number of operations performed by each class of users on the delivered content, wherein the predicted number of users is less than or equal to the target predicted number of operations.

Description

Data mining method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a data mining method, apparatus, storage medium, and electronic device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims and, therefore, the description herein is not admitted to be prior art by inclusion in this section.
When monitoring the effect of an advertising campaign, the advertiser is often interested not only in figures such as the exposure, click volume, and independent exposure of the delivered advertisement, but also in the composition of the users who viewed the advertisement, so that later campaigns can be planned more precisely for the target group.
In the related art, calculating the composition of the users who viewed an advertisement can produce results with large errors, which misleads the advertiser's subsequent planning.
Disclosure of Invention
The present disclosure is directed to a data mining method, device, storage medium and electronic apparatus, so as to solve the above technical problems.
According to a first aspect of the embodiments of the present disclosure, there is provided a data mining method, the method including:
acquiring record data of delivered content and sample data of the delivered content, wherein the record data comprises the total number of operations performed by users on the delivered content and the total number of users who operated on the delivered content, and the sample data comprises characteristic information of each user and the number of operations that user performed on the delivered content;
determining, according to the proportion of each class of users in the sample data and the proportion of operations performed by each class of users on the delivered content in the sample data, the predicted number of users of each class in the record data and the predicted number of operations performed by each class of users on the delivered content;
if the predicted number of users exceeds the predicted number of operations, correcting the proportion of operations performed by each class of users on the delivered content in the sample data, according to the proportion of each class of users in the sample data and the record data, to obtain a corrected proportion;
and determining, according to the total number of operations and the corrected proportion, the target predicted number of operations performed by each class of users on the delivered content within the total number of operations, wherein the predicted number of users is less than or equal to the target predicted number of operations.
Optionally, correcting the proportion of operations performed by each class of users on the delivered content in the sample data, according to the proportion of each class of users in the sample data and the record data, includes:
correcting the proportion of operations performed by each class of users on the delivered content in the sample data using the following formula:
$$P'_i = \left(1 - \frac{M}{N}\right) P_i + U_i \cdot \frac{M}{N}$$
where P'_i is the corrected proportion, P_i is the proportion of operations performed by class-i users on the delivered content in the sample data, U_i is the proportion of class-i users in the sample data, M is the total number of users in the record data, and N is the total number of operations performed by users on the delivered content in the record data.
Optionally, determining the target predicted number of operations performed by each class of users on the delivered content, according to the total number of operations and the corrected proportion, includes:
taking the product of the corrected proportion and the total number of operations as the target predicted number of operations performed by each class of users on the delivered content in the record data.
Optionally, the record data includes the number of views and the number of viewers of a delivered advertisement, the sample data includes gender information of each user and the number of times that user viewed the advertisement, and the method includes:
determining, according to the proportion of users of each gender in the sample data and the proportion of views by users of each gender in the sample data, the predicted number of users of each gender in the record data and the predicted number of views of the advertisement by users of each gender;
if the predicted number of users exceeds the predicted number of views, correcting the proportion of views of the advertisement by users of each gender in the sample data, according to the proportion of users of each gender in the sample data and the record data, to obtain a corrected proportion;
and determining, according to the number of views and the corrected proportion, the target predicted number of views of the delivered content by users of each gender within the number of views, wherein, for users of the same gender, the predicted number of users is less than or equal to the target predicted number of views.
According to a second aspect of the embodiments of the present disclosure, there is provided a data mining apparatus including:
a first acquisition module, configured to acquire record data of delivered content, wherein the record data comprises the total number of operations performed by users on the delivered content and the total number of users who operated on the delivered content;
a second acquisition module, configured to acquire sample data of the delivered content, wherein the sample data comprises characteristic information of each user and the number of operations that user performed on the delivered content;
a first determining module, configured to determine, according to the proportion of each class of users in the sample data and the proportion of operations performed by each class of users on the delivered content in the sample data, the predicted number of users of each class in the record data and the predicted number of operations performed by each class of users on the delivered content;
a calculation module, configured to correct, when the predicted number of users exceeds the predicted number of operations, the proportion of operations performed by each class of users on the delivered content in the sample data, according to the proportion of each class of users in the sample data and the record data, to obtain a corrected proportion;
and a second determining module, configured to determine, according to the total number of operations and the corrected proportion, the target predicted number of operations performed by each class of users on the delivered content within the total number of operations, wherein the predicted number of users is less than or equal to the target predicted number of operations.
Optionally, the calculation module corrects the proportion of operations performed by each class of users on the delivered content in the sample data using the following formula:
$$P'_i = \left(1 - \frac{M}{N}\right) P_i + U_i \cdot \frac{M}{N}$$
where P'_i is the corrected proportion, P_i is the proportion of operations performed by class-i users on the delivered content in the sample data, U_i is the proportion of class-i users in the sample data, M is the total number of users in the record data, and N is the total number of operations performed by users on the delivered content in the record data.
Optionally, the second determining module is configured to:
take the product of the corrected proportion and the total number of operations as the target predicted number of operations performed by each class of users on the delivered content in the record data.
Optionally, the record data includes the number of views and the number of viewers of a delivered advertisement, the sample data includes gender information of each user and the number of times that user viewed the advertisement, and the first determining module is configured to:
determine, according to the proportion of users of each gender in the sample data and the proportion of views by users of each gender in the sample data, the predicted number of users of each gender in the record data and the predicted number of views of the advertisement by users of each gender;
the calculation module is configured to correct, when the predicted number of users exceeds the predicted number of views, the proportion of views of the advertisement by users of each gender in the sample data, according to the proportion of users of each gender in the sample data and the record data, to obtain a corrected proportion;
and the second determining module is configured to determine, according to the number of views and the corrected proportion, the target predicted number of views of the delivered content by users of each gender within the number of views, wherein, for users of the same gender, the predicted number of users is less than or equal to the target predicted number of views.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the data mining method provided by the first aspect of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the data mining method provided by the first aspect of the present disclosure.
The above technical solution has at least the following beneficial effects:
The number of users of each class in the record data, and the number of operations attributable to each class, are predicted from the distribution of user classes in the sample data and the distribution of their operations on the delivered content. If, in the result, the number of users of some class exceeds the number of operations attributed to that class, the proportion of operations of each class in the sample data can be corrected according to the proportion of each class of users in the sample data and the record data, yielding a corrected proportion. The number of operations performed by each class of users on the delivered content in the record data is then determined from the corrected proportion and the total number of operations in the record data, and is greater than or equal to the number of users of that class. This avoids the situation in the related art where the calculated number of users exceeds the number of operations, and gives the content delivery party a more reasonable data reference.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart of a method of data mining, shown in an exemplary embodiment of the present disclosure.
FIG. 2 is a flow chart illustrating another method of data mining according to an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram of a data mining device according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of an electronic device shown in an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Before introducing the data mining method, apparatus, storage medium, and electronic device provided by the present disclosure, the application scenario and related terms are introduced first. In the present disclosure, unless otherwise specified, "exposure" refers to the number of times the delivered advertisement is viewed, and "independent exposure" refers to the number of people who viewed the advertisement.
At present, when monitoring the effect of content delivery, a content delivery party is often interested not only in figures such as the number of views and clicks of the delivered content, but also in the demographic distribution of the users who viewed it, so that subsequent related activities can be planned purposefully.
For example, when monitoring the effect of an advertising campaign, the advertiser pays attention to the exposure, click volume, independent exposure, and similar figures of the delivered advertisement, and also to the proportion of the target audience among the users who viewed or clicked the advertisement and the distribution of that audience's characteristics (such as gender, age, occupation, and education level), so that subsequent marketing activities can be targeted more precisely.
For example, to determine the share of each class of users in the exposure and independent exposure of a delivered advertisement, a user sample can be acquired, and the distribution of each class of users in the exposure and independent exposure can be estimated from the share of each class in that sample.
For example, in one implementation, the total exposure of a certain advertisement is 100, the total independent exposure is 60, and the obtained user sample is shown in Table 1:
TABLE 1 (user sample; the table body is an image in the source and is not reproduced here)
In the user sample, men account for 80% of the exposure and 75% of the independent exposure. The distribution of users of each gender in the total exposure and total independent exposure can therefore be estimated from the shares of exposure and independent exposure held by each gender in the user sample. Within the total exposure, the male exposure is:
total exposure × male share of exposure in the user sample = 100 × 80% = 80 times;
similarly, within the total independent exposure, the male independent exposure is:
total independent exposure × male share of independent exposure in the user sample = 60 × 75% = 45 people.
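The proportional estimate above can be sketched in a few lines of Python. The function name `extrapolate` is a hypothetical helper introduced only for illustration; the figures are the ones given in the Table 1 example.

```python
def extrapolate(total: float, share: float) -> float:
    """Scale a total from the record data by a share observed in the user sample."""
    return total * share


# Table 1 example: total exposure 100, total independent exposure 60;
# men hold 80% of the sample exposure and 75% of the sample independent exposure.
male_exposure = extrapolate(100, 0.80)              # about 80 views
male_independent_exposure = extrapolate(60, 0.75)   # about 45 people

print(male_exposure, male_independent_exposure)
```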
In one possible implementation scenario, the total exposure is 60, the total independent exposure is 50, and the obtained user samples are shown in table 2:
Independent exposure (persons) | Sample 1 (male) | Sample 2 (female)
Exposure (times)               | 9               | 1
TABLE 2
Applying the same method, the female exposure within the total exposure is 60 × 10% = 6, while the female independent exposure within the total independent exposure is 50 × 50% = 25; that is, the female independent exposure exceeds the female exposure.
In other words, when the distribution of each class of users in the total exposure and independent exposure is estimated from the user sample in this way, the result may show a user group whose independent exposure is greater than its exposure, which gives the advertiser an erroneous data reference.
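A minimal sketch of how this inconsistency can be detected, using the Table 2 figures. The record layout (a list of `(class, operations)` pairs) and the function names are assumptions made for the example, not part of the source.

```python
from collections import defaultdict


def sample_shares(sample):
    """Return (user_share, op_share) per class from (class, operations) records."""
    users = defaultdict(int)
    ops = defaultdict(int)
    for user_class, operations in sample:
        users[user_class] += 1
        ops[user_class] += operations
    total_users = sum(users.values())
    total_ops = sum(ops.values())
    user_share = {c: users[c] / total_users for c in users}
    op_share = {c: ops[c] / total_ops for c in users}
    return user_share, op_share


def inconsistent_classes(sample, total_ops, total_users):
    """List classes whose predicted user count exceeds their predicted operation count."""
    user_share, op_share = sample_shares(sample)
    return [
        c for c in user_share
        if user_share[c] * total_users > op_share[c] * total_ops
    ]


# Table 2: one male user with 9 exposures, one female user with 1 exposure.
table2_sample = [("male", 9), ("female", 1)]
flagged = inconsistent_classes(table2_sample, total_ops=60, total_users=50)
print(flagged)  # female: 25 predicted users against about 6 predicted exposures
```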
To solve the above technical problem, the present disclosure provides a data mining method. Referring to the flowchart shown in FIG. 1, the method includes:
S11: acquire record data of delivered content and sample data of the delivered content.
The delivered content may be presented as text, video, audio, and so on, and the operations a user performs on it may include viewing, commenting, clicking, forwarding, and so on. The record data comprises the total number of operations performed by users on the delivered content and the total number of users who operated on the delivered content; the sample data comprises characteristic information of each user and the number of operations each user performed on the delivered content.
For example, the record data may include the total number of clicks on a delivered advertisement and the total number of users who clicked it. In one embodiment, the click volume of the advertisement can be counted by setting up a corresponding statistics field in a database and listening for the relevant event (for example, an onclick event) on the page where the advertisement is delivered; when the event fires, the click information is sent via JavaScript to PHP (Hypertext Preprocessor), which updates the statistics field.
In another embodiment, after the record data is acquired, users may be matched against the record data, and the set of user data for which both characteristic information and click counts are successfully matched is used as the sample data. In the sample data obtained this way, every user has known characteristic information and a known number of operations on the delivered content.
S12: determine, according to the proportion of each class of users in the sample data and the proportion of operations performed by each class of users on the delivered content in the sample data, the predicted number of users of each class in the record data and the predicted number of operations performed by each class of users on the delivered content.
S13: if the predicted number of users exceeds the predicted number of operations, correct the proportion of operations performed by each class of users on the delivered content in the sample data, according to the proportion of each class of users in the sample data and the record data, to obtain a corrected proportion.
S14: determine, according to the total number of operations and the corrected proportion, the target predicted number of operations performed by each class of users on the delivered content within the total number of operations.
Here the predicted number of users is less than or equal to the target predicted number of operations. For example, the product of the proportion of users having given characteristic information in the sample data and the total number of users who operated on the delivered content in the record data may be used as the number of users having that characteristic information in the record data.
With this technical solution, the number of users of each class in the record data and the number of operations attributable to each class are predicted from the distribution of user classes in the sample data and the distribution of their operations on the delivered content. If the number of users of some class in the result exceeds the number of operations attributed to that class, the proportion of operations of each class in the sample data can be corrected according to the proportion of each class of users in the sample data and the record data, yielding a corrected proportion. The number of operations performed by each class of users on the delivered content in the record data is then determined from the corrected proportion and the total number of operations in the record data, and is greater than or equal to the number of users of that class. This avoids the situation in the related art where the calculated number of users exceeds the number of operations, and gives the content delivery party a more reasonable data reference.
In one possible implementation, step S13 includes:
correcting the proportion of operations performed by each class of users on the delivered content in the sample data using the following formula:
$$P'_i = \left(1 - \frac{M}{N}\right) P_i + U_i \cdot \frac{M}{N}$$
where P'_i is the corrected proportion, P_i is the proportion of operations performed by class-i users on the delivered content in the sample data, U_i is the proportion of class-i users in the sample data, M is the total number of users in the record data, and N is the total number of operations performed by users on the delivered content in the record data.
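The correction in step S13 can be sketched as a small function. The formula itself is an image in the source text, so the expression used here, P'_i = (1 − M/N)·P_i + U_i·M/N, is the one implied by the worked numerical example for Table 2; the function and parameter names are assumptions for illustration.

```python
def corrected_proportion(p_i: float, u_i: float, m: int, n: int) -> float:
    """Blend the sample operation share p_i with the sample user share u_i.

    m: total number of users in the record data.
    n: total number of operations in the record data (0 <= m <= n assumed).
    """
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n and n > 0")
    weight = m / n  # weight given to the user share
    return (1 - weight) * p_i + weight * u_i


# Table 2 figures: P_1 = 0.9, U_1 = 0.5, M = 50, N = 60.
p1_corrected = corrected_proportion(0.9, 0.5, m=50, n=60)
print(round(p1_corrected, 4))
```

When M equals N (every user operated exactly once on average), the corrected share collapses to the user share; when M is small relative to N, it stays close to the sampled operation share.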
As an illustration, let the total number of operations performed by users on the delivered content in the record data be N and the total number of users who operated on the delivered content be M, where M ≤ N. Suppose the sample data contains two classes of users, one with first characteristic information and one with second characteristic information, whose user proportions are U_1 and U_2 and whose operation proportions are P_1 and P_2, so that U_1 + U_2 = P_1 + P_2 = 1. Correcting the operation proportions of the two classes gives corrected proportions P'_1 and P'_2, where
$$P'_1 = \left(1 - \frac{M}{N}\right) P_1 + U_1 \cdot \frac{M}{N}$$
$$P'_2 = \left(1 - \frac{M}{N}\right) P_2 + U_2 \cdot \frac{M}{N}$$
The corrected P'_1 and P'_2 can be checked for reasonableness as follows:
$$P'_1 = \left(1 - \frac{M}{N}\right) P_1 + U_1 \cdot \frac{M}{N}$$
Since M ≤ N,
$$\left(1 - \frac{M}{N}\right) P_1 \ge 0$$
and
$$U_1 \cdot \frac{M}{N} \ge 0$$
Thus P'_1 ≥ 0. By the same reasoning, P'_2 ≥ 0.
Moreover,
$$P'_1 + P'_2 = \left(1 - \frac{M}{N}\right)(P_1 + P_2) + (U_1 + U_2) \cdot \frac{M}{N} = 1 - \frac{M}{N} + \frac{M}{N} = 1$$
It follows that 1 ≥ P'_1 ≥ 0 and 1 ≥ P'_2 ≥ 0.
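The bounds argued above can also be checked numerically. The sketch below sweeps a grid of two-class inputs with exact rational arithmetic, assuming the correction has the form P'_i = (1 − M/N)·P_i + U_i·M/N (the form implied by the Table 2 example); the grid sizes are arbitrary choices made for this check.

```python
from fractions import Fraction
from itertools import product


def corrected(p, u, m, n):
    """Corrected proportion (1 - m/n) * p + (m/n) * u, computed exactly."""
    weight = Fraction(m, n)
    return (1 - weight) * p + weight * u


def check_bounds(steps: int = 10, n: int = 12) -> bool:
    """Check 0 <= P'_1, P'_2 <= 1 and P'_1 + P'_2 == 1 over a grid with 0 <= M <= N."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    for p1, u1, m in product(grid, grid, range(n + 1)):
        c1 = corrected(p1, u1, m, n)
        c2 = corrected(1 - p1, 1 - u1, m, n)  # second class takes the remaining shares
        if not (0 <= c1 <= 1 and 0 <= c2 <= 1 and c1 + c2 == 1):
            return False
    return True


all_ok = check_bounds()
print(all_ok)
```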
Optionally, in step S14, determining the target predicted number of operations performed by each class of users on the delivered content, according to the total number of operations and the corrected proportion, includes:
taking the product of the corrected proportion and the total number of operations as the target predicted number of operations performed by each class of users on the delivered content in the record data.
Continuing the example above, the target predicted number of operations for users with the first characteristic information in the record data is:
$$P'_1 \times N = \left[\left(1 - \frac{M}{N}\right) P_1 + U_1 \cdot \frac{M}{N}\right] \times N = (N - M) P_1 + U_1 \times M$$
Since M ≤ N,
$$(N - M) P_1 \ge 0$$
that is, P'_1 × N ≥ U_1 × M. Similarly, P'_2 × N ≥ U_2 × M.
The correction can be verified with Table 2, taking male users as the first class and female users as the second class. After correction, P'_1 = (1 − 50/60) × 0.9 + 0.5 × 50/60 = 0.5667 and P'_2 = (1 − 50/60) × 0.1 + 0.5 × 50/60 = 0.4333. The male independent exposure is then 25 and the male exposure is 0.5667 × 60 = 34; the female independent exposure is 25 and the female exposure is 0.4333 × 60 = 26. In both cases the exposure is greater than or equal to the independent exposure.
In other words, correcting the proportion of operations performed by each class of users on the delivered content in the sample data ensures that the number of operations per class, as determined from the corrected proportion and the record data, is greater than or equal to the number of users in that class. This avoids the situation in the related art where the calculated number of users exceeds the number of operations, and makes the result more reasonable.
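The Table 2 verification can be reproduced with a short script covering steps S13 and S14. The dictionary layout and the function name `target_operations` are assumptions for illustration; the correction uses the expression shown in the worked example.

```python
def target_operations(op_share, user_share, total_users, total_ops):
    """Corrected operation share per class multiplied by the total operation count."""
    weight = total_users / total_ops
    return {
        c: ((1 - weight) * op_share[c] + weight * user_share[c]) * total_ops
        for c in op_share
    }


# Table 2: sample operation shares 0.9 / 0.1, sample user shares 0.5 / 0.5,
# total independent exposure M = 50, total exposure N = 60.
op_share = {"male": 0.9, "female": 0.1}
user_share = {"male": 0.5, "female": 0.5}

targets = target_operations(op_share, user_share, total_users=50, total_ops=60)
predicted_users = {c: user_share[c] * 50 for c in user_share}

for c in targets:
    # Each class should end with at least as many exposures as users.
    print(c, round(targets[c]), predicted_users[c], targets[c] >= predicted_users[c])
```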
FIG. 2 is a flowchart of another data mining method according to an exemplary embodiment of the present disclosure. Referring to FIG. 2, the method includes:
S21: acquire record data of an advertisement and sample data of the advertisement.
The record data comprises the number of views and the number of viewers of the delivered advertisement, and the sample data comprises gender information of each user and the number of times each user viewed the advertisement.
S22: determine, according to the proportion of users of each gender in the sample data and the proportion of views by users of each gender in the sample data, the predicted number of users of each gender in the record data and the predicted number of views of the advertisement by users of each gender.
S23: if the predicted number of users exceeds the predicted number of views, correct the proportion of views of the advertisement by users of each gender in the sample data, according to the proportion of users of each gender in the sample data and the record data, to obtain a corrected proportion.
S24: determine, according to the number of views and the corrected proportion, the target predicted number of views of the delivered content by users of each gender within the number of views.
For users of the same gender, the predicted number of users is less than or equal to the target predicted number of views. For example, the proportion of users of each gender in the sample data may be multiplied by the total number of users who operated on the delivered content in the record data, and the product used as the number of users of that gender in the record data.
That is, the number of users of each gender in the record data and the number of views attributable to each gender are predicted from the distribution of genders in the sample data and the distribution of their views. If the number of users of some gender in the result exceeds the number of views attributed to that gender, the proportion of views of each gender in the sample data can be corrected according to the proportion of users of each gender in the sample data and the record data, yielding a corrected proportion. The number of views by users of each gender in the record data, determined from the corrected proportion and the total number of views in the record data, is then greater than or equal to the number of users of that gender, which corrects the result.
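Steps S21 to S24 can be put together as one function. This is a sketch under stated assumptions: the sample is a list of `(gender, views)` pairs, the function name is hypothetical, the correction takes the form used in the worked example, and, as one reading of S23, the correction is applied to every gender as soon as any gender is inconsistent.

```python
def predict_views_by_gender(views, viewers, sample):
    """Return (predicted_users, target_views) per gender.

    views:   total number of views in the record data (N).
    viewers: total number of viewers in the record data (M).
    sample:  list of (gender, view_count) pairs, one per sampled user.
    """
    if views <= 0 or not sample:
        raise ValueError("expected views > 0 and a non-empty sample")
    genders = sorted({g for g, _ in sample})
    sample_views = sum(v for _, v in sample)
    user_share = {g: sum(1 for s, _ in sample if s == g) / len(sample) for g in genders}
    view_share = {g: sum(v for s, v in sample if s == g) / sample_views for g in genders}

    # S22: extrapolate the sample shares to the record data.
    predicted_users = {g: user_share[g] * viewers for g in genders}
    predicted_views = {g: view_share[g] * views for g in genders}

    # S23: correct the view shares only if some gender has more users than views.
    if any(predicted_users[g] > predicted_views[g] for g in genders):
        weight = viewers / views
        view_share = {g: (1 - weight) * view_share[g] + weight * user_share[g] for g in genders}

    # S24: target predicted views per gender.
    target_views = {g: view_share[g] * views for g in genders}
    return predicted_users, target_views


users, targets = predict_views_by_gender(60, 50, [("male", 9), ("female", 1)])
print(users, {g: round(v) for g, v in targets.items()})
```

With a sample that is already consistent, the function leaves the sampled view shares untouched, so the correction only affects cases such as Table 2.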
Fig. 3 is a block diagram of a data mining apparatus according to an exemplary embodiment of the disclosure, and as shown in fig. 3, the apparatus 300 includes:
a first obtaining module 301, configured to obtain record data of released content, where the record data includes a total number of times that a user operates the released content and a total number of users operating the released content;
a second obtaining module 302, configured to obtain sample data of the released content, where the sample data includes feature information of each user and the number of times that the user operates the released content;
a first determining module 303, configured to respectively determine, according to a ratio of the number of each type of user in the sample data and a ratio of the number of operations on the delivered content by each type of user in the sample data, the number of predicted users corresponding to each type of user in the record data, and the number of predicted operations on the delivered content by each type of user;
a calculating module 304, configured to, when the number of predicted users is greater than the number of predicted operations, correct, according to the ratio of the number of each type of user in the sample data and the record data, the ratio of the number of operations on the released content by each type of user in the sample data, so as to obtain a correction ratio;
a second determining module 305, configured to determine, according to the total number of times and the correction ratio, a target prediction operation number of times of each type of user on the delivered content in the total number of times, where the number of predicted users is less than or equal to the target prediction operation number of times.
With this apparatus, the number of users of each type in the record data, and the number of operations attributable to each type, are predicted from the distribution of user numbers by type in the sample data and the distribution of operation counts on the released content by type in the sample data. If, in the calculation result, the number of users of a type is greater than the number of operations for that type, the ratio of the operation counts of each type of user in the sample data can be corrected according to the ratio of the number of each type of user in the sample data and the record data, yielding a correction ratio. Finally, the number of operations on the released content by each type of user in the record data is determined from the correction ratio and the total number of operations on the released content in the record data, such that this number is greater than or equal to the number of users of that type. This avoids the situation in the related art where the calculated number of users exceeds the number of operations, and provides the content delivery party with a more reasonable data reference.
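The prediction and consistency check described above can be sketched as follows. This is a minimal illustration, assuming that a predicted count is obtained by multiplying a sample proportion by the corresponding total in the record data; the function names `predict_counts` and `inconsistent_classes` and the example figures are hypothetical and do not appear in the disclosure.

```python
from typing import Dict, List, Tuple


def predict_counts(
    user_share: Dict[str, float],   # proportion of users per type in the sample data
    op_share: Dict[str, float],     # proportion of operations per type in the sample data
    total_users: int,               # total number of users in the record data
    total_ops: int,                 # total number of operations in the record data
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Scale sample proportions by record-data totals (assumed prediction rule)."""
    predicted_users = {k: share * total_users for k, share in user_share.items()}
    predicted_ops = {k: share * total_ops for k, share in op_share.items()}
    return predicted_users, predicted_ops


def inconsistent_classes(
    predicted_users: Dict[str, float], predicted_ops: Dict[str, float]
) -> List[str]:
    """Return the user types whose predicted user count exceeds their predicted operations."""
    return [k for k in predicted_users if predicted_users[k] > predicted_ops[k]]


# Hypothetical figures: 1000 users and 1200 operations in the record data.
users, ops = predict_counts(
    {"type_a": 0.5, "type_b": 0.5},
    {"type_a": 0.9, "type_b": 0.1},
    total_users=1000,
    total_ops=1200,
)
needs_correction = inconsistent_classes(users, ops)
```

With these figures, `type_b` is predicted to have about 500 users but only about 120 operations, which is the impossible result that the calculating module 304 is meant to correct.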
Optionally, the calculating module 304 corrects the proportion of the operation times of each type of user on the released content in the sample data by using the following formula:
Figure BDA0002173369310000131
where P'_i is the correction ratio, P_i is the ratio of the operation times of the i-th class of users on the released content in the sample data, U_i is the proportion of the number of i-th class users in the sample data, M is the total number of users in the record data, and N is the total number of times users operated the released content in the record data.
Optionally, the second determining module 305 is configured to:
take the product of the correction ratio and the total number of times as the target predicted number of operations on the delivered content by each type of user in the record data.
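The step performed by the second determining module can be sketched as below. The correction ratios are taken as given inputs here, since the correction formula itself is not reproduced in this text; the function names and the example ratios are hypothetical.

```python
from typing import Dict


def target_operations(corrected_share: Dict[str, float], total_ops: int) -> Dict[str, float]:
    """Target predicted operations per type = correction ratio * total number of operations."""
    return {k: share * total_ops for k, share in corrected_share.items()}


def is_consistent(predicted_users: Dict[str, float], target_ops: Dict[str, float]) -> bool:
    """Check that no user type has more predicted users than target predicted operations."""
    return all(predicted_users[k] <= target_ops[k] for k in predicted_users)


# Hypothetical inputs: 500 predicted users per type, 1200 operations in total.
predicted_users = {"type_a": 500, "type_b": 500}
targets = target_operations({"type_a": 0.55, "type_b": 0.45}, total_ops=1200)
ok = is_consistent(predicted_users, targets)
```

The check in `is_consistent` mirrors the stated requirement that the number of predicted users be less than or equal to the target predicted number of operations for every type of user.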
Optionally, the record data includes the browsing volume of the delivered advertisement and the number of users who browsed it, the sample data includes gender information of each user and the number of times the user browsed the advertisement, and the first determining module 303 is configured to:
determine, according to the proportion of the number of users of each gender in the sample data and the proportion of the browsing times of users of each gender in the sample data, the predicted number of users of the corresponding gender in the record data and the predicted browsing times of the advertisement by users of that gender;
the calculating module 304 is configured to, when the predicted number of users is greater than the predicted browsing times, correct, according to the proportion of the number of users of each gender in the sample data and the record data, the proportion of the browsing times of the advertisement by users of the corresponding gender in the sample data, so as to obtain a correction ratio;
the second determining module 305 is configured to determine, according to the browsing volume and the correction ratio, the target predicted browsing times of the delivered content by users of each gender within the browsing volume, where, for users of the same gender, the predicted number of users is less than or equal to the target predicted browsing times.
In this way, the number of users of each gender in the record data, and the browsing times attributable to users of each gender, are predicted from the distribution of user numbers by gender in the sample data and the distribution of operation times on the delivered content by gender in the sample data. If, in the calculation result, the number of users of a gender is greater than the browsing times for that gender, the proportion of browsing times for users of each gender in the sample data can be corrected according to the proportion of the number of users of each gender in the sample data and the record data, yielding a correction ratio. Finally, the browsing times for users of each gender in the record data are determined from the correction ratio and the total browsing times in the record data, such that they are greater than or equal to the number of users of the corresponding gender, thereby correcting the calculation result.
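The gender example can be walked through end to end as follows. Note that the correction formula of the disclosure is given only as a figure that is not reproduced in this text, so the function `blend_correction` below is a stand-in chosen purely for illustration and is NOT the patent's formula. It assumes the blend P'_i = U_i * (M/N) + P_i * (1 - M/N), which keeps the ratios summing to 1 and guarantees the required inequality whenever N >= M. All figures are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def blend_correction(
    view_share: Dict[str, Fraction],
    user_share: Dict[str, Fraction],
    total_users: int,
    total_views: int,
) -> Dict[str, Fraction]:
    """Illustrative stand-in correction (an assumption, not the patent's formula).

    Blends the user share and the view share with weight M/N, so that
    corrected_share * N >= user_share * M whenever N >= M.
    """
    w = Fraction(total_users, total_views)
    return {g: user_share[g] * w + view_share[g] * (1 - w) for g in view_share}


# Hypothetical sample proportions and record-data totals.
user_share = {"male": Fraction(1, 2), "female": Fraction(1, 2)}
view_share = {"male": Fraction(9, 10), "female": Fraction(1, 10)}
total_users, total_views = 1000, 1200

# Uncorrected prediction: 500 female users but only 120 female views.
predicted_users = {g: s * total_users for g, s in user_share.items()}
predicted_views = {g: s * total_views for g, s in view_share.items()}

# Corrected prediction: target views = correction ratio * browsing volume.
corrected = blend_correction(view_share, user_share, total_users, total_views)
target_views = {g: s * total_views for g, s in corrected.items()}
```

Under this stand-in, the target browsing times come out as 680 for male users and 520 for female users, both at least the 500 predicted users per gender, and they still add up to the browsing volume of 1200. Exact fractions are used so that the comparison is not affected by floating-point rounding.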
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It is further noted that the above-described device embodiments are examples and are not necessarily required for the invention per se. For example, as shown in fig. 3, in the data mining apparatus, the first obtaining module 301 and the second obtaining module 302 may be independent apparatuses or may be the same apparatus in specific implementation, which is not limited in this disclosure.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the data mining method as shown in fig. 1 or fig. 2.
The present disclosure also provides an electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the data mining method shown in fig. 1 or fig. 2.
Fig. 4 is a block diagram illustrating an electronic device 400 according to an example embodiment. For example, the electronic device 400 may be provided as a server. Referring to fig. 4, the electronic device 400 comprises a processor 422, which may be one or more in number, and a memory 432 for storing computer programs executable by the processor 422. The computer program stored in memory 432 may include one or more modules that each correspond to a set of instructions. Further, the processor 422 may be configured to execute the computer program to perform the data mining method described above.
Additionally, the electronic device 400 may include a power component 426 and a communication component 450. The power component 426 may be configured to perform power management of the electronic device 400, and the communication component 450 may be configured to enable communication of the electronic device 400, e.g., wired or wireless communication. The electronic device 400 may also include an input/output (I/O) interface 458. The electronic device 400 may operate based on an operating system stored in the memory 432, such as Windows Server, Mac OS X™, Unix™, Linux™, and the like.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data mining method described above is also provided. For example, the computer readable storage medium may be the memory 432 described above that includes program instructions executable by the processor 422 of the electronic device 400 to perform the data mining method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned data mining method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the various features described in the foregoing embodiments may be combined in any suitable manner. To avoid unnecessary repetition, the possible combinations are not described separately in the present disclosure.
In addition, the various embodiments of the present disclosure may themselves be combined in any manner, and such combinations should likewise be regarded as content disclosed herein, as long as they do not depart from the spirit of the present disclosure.

Claims (10)

1. A method of data mining, the method comprising:
acquiring record data of released content and sample data of the released content, wherein the record data comprises the total times of operating the released content by a user and the total number of the users operating the released content, and the sample data comprises characteristic information of each user and the times of operating the released content by the user;
respectively determining the number of the predicted users corresponding to each type of users in the recorded data and the number of the predicted operations corresponding to each type of users on the released content according to the proportion of the number of each type of users in the sample data and the proportion of the operation times of each type of users on the released content in the sample data;
if the number of the predicted users is larger than the number of the predicted operation times, correcting the proportion of the number of the operation times of each type of users on the released content in the sample data according to the proportion of the number of each type of users in the sample data and the recorded data to obtain a corrected proportion;
and determining the target prediction operation times of each type of users on the delivered content in the total times according to the total times and the correction proportion, wherein the number of the predicted users is less than or equal to the target prediction operation times.
2. The method according to claim 1, wherein the correcting the proportion of the number of the operation times of each type of users on the released content in the sample data according to the proportion of the number of each type of users in the sample data and the recorded data comprises:
and correcting the proportion of the operation times of each type of users on the released content in the sample data by adopting the following formula:
Figure FDA0002173369300000011
where P'_i is the correction proportion, P_i is the proportion of the operation times of the i-th class of users on the released content in the sample data, U_i is the proportion of the number of i-th class users in the sample data, M is the total number of users in the recorded data, and N is the total number of times users operated the released content in the recorded data.
3. The method according to claim 1, wherein the determining the target prediction operation times of each type of users on the delivered content in the total times according to the total times and the correction proportion comprises: taking the product of the correction proportion and the total times as the target prediction operation times of each type of users on the delivered content in the recorded data.
4. The method according to any one of claims 1 to 3, wherein the recorded data includes the browsing volume of the delivered advertisement and the number of users who browsed it, and the sample data includes gender information of each user and the number of times the user browsed the advertisement, the method comprising:
respectively determining the predicted user number of the users with the corresponding gender in the recorded data and the predicted browsing times of the users with the corresponding gender on the advertisement according to the proportion of the number of the users with the different gender in the sample data and the proportion of the browsing times of the users with the different gender in the sample data;
if the number of the predicted users is larger than the predicted browsing times, correcting the proportion of the browsing times of the corresponding gender users to the advertisement in the sample data according to the proportion of the number of the users of each gender in the sample data and the recorded data to obtain a corrected proportion;
and determining the target predicted browsing times of the users of all genders in the browsing amount to the delivered content according to the browsing amount and the correction proportion, wherein the predicted user number is less than or equal to the target predicted browsing times for the users of the same gender.
5. A data mining device, comprising:
the first acquisition module is used for acquiring recorded data of released content, wherein the recorded data comprises the total times of the users operating the released content and the total number of the users operating the released content;
the second acquisition module is used for acquiring sample data of the released content, wherein the sample data comprises the characteristic information of each user and the times of the user operating the released content;
the first determining module is used for respectively determining the number of the predicted users corresponding to each type of users in the recorded data and the number of the predicted operations corresponding to each type of users on the released content according to the proportion of the number of each type of users in the sample data and the proportion of the operation times of each type of users on the released content in the sample data;
the calculation module is used for correcting the proportion of the operation times of each type of users to the released content in the sample data according to the proportion of the quantity of each type of users in the sample data and the recorded data when the quantity of the predicted users is larger than the predicted operation times to obtain a corrected proportion;
and the second determining module is used for determining the target prediction operation times of each type of users on the delivered content in the total times according to the total times and the correction proportion, wherein the number of the predicted users is less than or equal to the target prediction operation times.
6. The apparatus according to claim 5, wherein the calculation module corrects the proportion of the operation times of each type of users on the released content in the sample data by using the following formula:
Figure FDA0002173369300000031
where P'_i is the correction proportion, P_i is the proportion of the operation times of the i-th class of users on the released content in the sample data, U_i is the proportion of the number of i-th class users in the sample data, M is the total number of users in the recorded data, and N is the total number of times users operated the released content in the recorded data.
7. The apparatus of claim 5, wherein the second determining module is configured to:
take the product of the correction proportion and the total times as the target prediction operation times of each type of users on the delivered content in the recorded data.
8. The apparatus according to any one of claims 5 to 7, wherein the recorded data includes the browsing volume of the delivered advertisement and the number of users who browsed it, the sample data includes gender information of each user and the number of times the user browsed the advertisement, and the first determining module is configured to:
respectively determining the predicted user number of the users with the corresponding gender in the recorded data and the predicted browsing times of the users with the corresponding gender on the advertisement according to the proportion of the number of the users with the different gender in the sample data and the proportion of the browsing times of the users with the different gender in the sample data;
the calculation module is used for correcting the proportion of the browsing times of the advertisement by the users with the corresponding gender in the sample data according to the proportion of the quantity of the users with the respective gender in the sample data and the recorded data when the quantity of the predicted users is larger than the predicted browsing times to obtain a corrected proportion;
and the second determining module is used for determining the target predicted browsing times of the users of all genders in the browsing volume on the delivered content according to the browsing volume and the correction proportion, wherein the predicted user number is less than or equal to the target predicted browsing times for the users of the same gender.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 4.
CN201910770384.8A 2019-08-20 2019-08-20 Data mining method and device, storage medium and electronic equipment Pending CN112418896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910770384.8A CN112418896A (en) 2019-08-20 2019-08-20 Data mining method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN112418896A true CN112418896A (en) 2021-02-26

Family

ID=74779511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910770384.8A Pending CN112418896A (en) 2019-08-20 2019-08-20 Data mining method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112418896A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335876A (en) * 2015-11-05 2016-02-17 精硕世纪科技(北京)有限公司 Effect tracking method and apparatus of advertisement put on media
CN105787767A (en) * 2016-03-03 2016-07-20 上海珍岛信息技术有限公司 Method and system for obtaining advertisement click-through rate pre-estimation model
CN107371048A (en) * 2017-07-13 2017-11-21 北京奇艺世纪科技有限公司 A kind of Forecasting Methodology and device of the stock of publicity orders
CN107481029A (en) * 2017-07-13 2017-12-15 北京奇艺世纪科技有限公司 A kind of Forecasting Methodology and device of the stock of publicity orders
CN109191217A (en) * 2018-11-12 2019-01-11 北京奇艺世纪科技有限公司 A kind of video ads impressions prediction technique and device
CN109472632A (en) * 2018-09-25 2019-03-15 平安科技(深圳)有限公司 Evaluate method, apparatus, medium and the electronic equipment of advertisement competition power
CN109783686A (en) * 2019-01-21 2019-05-21 广州虎牙信息科技有限公司 Behavioral data processing method, device, terminal device and storage medium
CN109784978A (en) * 2018-12-19 2019-05-21 平安科技(深圳)有限公司 Advertisement competition power calculation method, device, medium and equipment based on big data


Similar Documents

Publication Publication Date Title
US20200372526A1 (en) Methods and apparatus to determine ratings data from population sample data having unreliable demographic classifications
US20070198937A1 (en) Method for determining a profile of a user of a communication network
US20160217383A1 (en) Method and apparatus for forecasting characteristic information change
US20110196821A1 (en) Method and system for generation, adjustment and utilization of web pages selection rules
US10937053B1 (en) Framework for evaluating targeting models
US11887132B2 (en) Processor systems to estimate audience sizes and impression counts for different frequency intervals
US20170068964A1 (en) Ranking of sponsored content items for compliance with policies enforced by an online system
US11372805B2 (en) Method and device for information processing
CN108021673A (en) A kind of user interest model generation method, position recommend method and computing device
CN108965951B (en) Advertisement playing method and device
US20120136877A1 (en) System and method for selecting compatible users for activities based on experiences, interests or preferences as identified from one or more web services
CN111582771A (en) Risk assessment method, device, equipment and computer readable storage medium
US20180218286A1 (en) Generating models to measure performance of content presented to a plurality of identifiable and non-identifiable individuals
CN110782291A (en) Advertisement delivery user determination method and device, storage medium and electronic device
US11238367B1 (en) Distribution of content based on machine learning based model by an online system
CN110782286A (en) Advertisement pushing method and device, server and computer readable storage medium
CN101971198A (en) Qualitative and quantitative method for rating a brand using keywords
US11107120B1 (en) Estimating the reach performance of an advertising campaign
CN111311312A (en) Advertisement effect evaluation method, display terminal and computer-readable storage medium
US11188846B1 (en) Determining a sequential order of types of events based on user actions associated with a third party system
US10504136B2 (en) Measuring performance of content among groups of similar users of an online system
CN110347973B (en) Method and device for generating information
CN111369281A (en) Online message processing method, device, equipment and readable storage medium
CN112418896A (en) Data mining method and device, storage medium and electronic equipment
CN113139826A (en) Method and device for determining distribution authority of advertisement space and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination