CN108427690B - Information delivery method and device - Google Patents
Information delivery method and device
- Publication number
- CN108427690B CN201710081843.2A
- Authority
- CN
- China
- Prior art keywords
- round
- training set
- kth
- population
- diffusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiment of the invention provides an information delivery method and device. The method comprises the following steps: generating a kth round diffusion training set according to an initial seed population and feedback data of the (k-1)th round delivery population, wherein each individual sample in the kth round diffusion training set has a feature vector, and the feature vector comprises a plurality of attributes of the individual sample and the corresponding attribute values; performing a kth round of iterative training by using the feature vectors of all individual samples in the kth round diffusion training set to obtain a kth round diffusion model; and screening out a kth round delivery population from the whole population by using the kth round diffusion model, wherein the kth round delivery population is used for the kth round of information delivery. In the embodiment of the invention, the kth round diffusion training set is adjusted based on the feedback data of the previous round's delivery population, so that even if the quality of the initial seed population is poor, the samples can be adjusted through the feedback data over multiple iterations, the delivery population matches the information increasingly well, and the accuracy of the delivery population is improved.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an information delivery method and device.
Background
Information delivery is currently needed in many fields, for example, advertisement push in WeChat friend circles or paper recommendation on paper search websites.
In the existing information delivery mode, crowd diffusion is performed based on a seed population to obtain a delivery population, and information is then delivered to the delivery population. A seed population is a population that shares the same needs and interests regarding a product or service in a particular application scenario. The delivery population should in theory have the same needs and interests as the seed population, but it is several times, dozens of times, or even hundreds of times larger.
The accuracy of the delivery population diffused from the seed population determines the accuracy of information delivery, so how to diffuse the delivery population accurately is a current focus of research and development.
Disclosure of Invention
In view of this, embodiments of the present invention provide an information delivery method and device, so as to improve the accuracy of the diffused delivery population and thereby the accuracy of information delivery.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
an information delivery method, comprising:
generating a k-th round diffusion training set according to the initial seed population and the feedback data of the (k-1)th round delivery population, where k is an integer; each individual in the k-th round diffusion training set has a feature vector comprising a plurality of attributes of the individual and the corresponding attribute values;
performing a k-th round of iterative training by using the feature vectors of all individuals in the k-th round diffusion training set to obtain a k-th round diffusion model;
screening out a k-th round delivery population from the whole population by using the k-th round diffusion model, where the k-th round delivery population is used for the k-th round of information delivery.
An information delivery apparatus, comprising:
a diffusion training set generating unit, used for generating a k-th round diffusion training set according to the initial seed population and the feedback data of the (k-1)th round delivery population, where k is an integer; each individual in the diffusion training set has a feature vector comprising a plurality of attributes of the individual and the corresponding attribute values;
a training unit, used for performing a k-th round of iterative training by using the feature vectors of all individuals in the k-th round diffusion training set to obtain a k-th round diffusion model;
a screening unit, used for screening out a k-th round delivery population from the whole population by using the k-th round diffusion model, where the k-th round delivery population is used for the k-th round of information delivery.
In the embodiment of the invention, the k-th round diffusion training set is adjusted based on the feedback data of the delivery population of the previous, (k-1)th, round. Even if the quality of the initial seed population is poor, the samples can therefore be adjusted through the feedback data over multiple iterations, so that the delivery population matches the information increasingly well and the accuracy of the delivery population is improved.
Meanwhile, during iterative training, the feature vectors of all individuals in the k-th round diffusion training set are used for model training, and the k-th round delivery population is screened out according to the trained model. The trained model can thus accurately identify people similar to the positive sample subset, which improves the accuracy of the diffused delivery population.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario;
fig. 2 is an exemplary computer architecture diagram of an information delivery platform or server according to an embodiment of the present invention;
fig. 3-5 are exemplary flowcharts of an information delivery method according to an embodiment of the present invention;
fig. 6 is an exemplary structural diagram of an information delivery apparatus according to an embodiment of the present invention.
Detailed Description
The invention provides an information delivery method and device that can be applied in fields requiring crowd diffusion, such as crowd diffusion and advertisement delivery in WeChat friend circles.
Fig. 1 shows an application scenario of the information delivery apparatus, which may include: an information delivery platform 101 and a database 102.
The functions of the information delivery platform 101 may be implemented by one or more information delivery servers.
In the present invention, the information delivery platform 101 is mainly responsible for diffusing from the initial seed population to obtain a delivery population, and for delivering information to the clients of the delivery population.
The information delivery device may be applied to the information delivery server in the form of software, or may be a hardware component of the information delivery server (for example, a controller/processor of the information delivery server).
When the information delivery device exists in the form of software, it may be an application program, such as a terminal application program, or a component or plug-in of an application program or operating system.
The database 102 may be used to store user unique Identifiers (IDs) of all users under the information delivery platform, basic information, and various attributes and attribute values of each user.
The basic information may include a mobile phone number, an email address, and the like. The attributes may include, for example, location, sex, age, and height. In some application scenarios, the attributes may also include interest tags (information reflecting the user's interests), number of purchases, and the like, which are not described in detail herein. The functions of database 102 may be performed by one or more database nodes.
In practice, the functions of the information delivery platform 101 and the database 102 may be implemented by the same server.
In addition, in some application scenarios, the database 102 may also be used to provide an initial seed population. Of course, the initial seed population may also be provided by an information publisher, such as an advertiser.
Depending on the information it provides, the database 102 may further consist of one or more servers. For example, the database 102 may include a basic information server or the like.
Fig. 2 shows a general computer system structure of the information delivery platform/server/device.
The computer system may include a bus, a processor 1, a memory 2, a communication interface 3, an input device 4, and an output device 5. The processor 1, the memory 2, the communication interface 3, the input device 4, and the output device 5 are connected to each other through a bus. Wherein:
a bus may comprise a path that carries information between various components of a computer system.
The processor 1 may be a general-purpose processor, such as a general-purpose Central Processing Unit (CPU), a Network Processor (NP), or a microprocessor; an Application-Specific Integrated Circuit (ASIC) or one or more integrated circuits for controlling the execution of the program according to the present invention; or a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 2 stores programs for executing the technical scheme of the invention, and can also store an operating system and other key services. In particular, the program may include program code comprising computer operating instructions. More specifically, memory 2 may include a read-only memory (ROM), another type of static storage device that may store static information and instructions, a Random Access Memory (RAM), another type of dynamic storage device that may store information and instructions, a magnetic disk storage, a flash memory, and so forth.
The input device 4 may include means for receiving data and information input by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer or gravity sensor, etc.
The output device 5 may comprise means allowing output of information to a user, such as a display screen, a printer, a loudspeaker, etc.
The communication interface 3 may include any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).
By executing the program stored in the memory 2 and calling other devices, the processor 1 can implement the steps of the information delivery method provided by the embodiment of the present invention.
The embodiments of the present invention will be described in further detail below based on the above common aspects to which the present invention relates.
Fig. 3 shows an exemplary flow of the information delivery method. The method shown in fig. 3 is applied to the field or application scenario described for fig. 1 and is performed by the processor 1 of the information delivery platform (or server) shown in fig. 2 interacting with other devices.
The exemplary flow includes:
In part 301: the processor 1 of the information delivery platform (or the server) generates a k-th round diffusion training set according to the initial seed population and the feedback data of the (k-1)th round delivery population.
k is an integer that starts at 0 and increases by one each round.
The initial seed population refers to a population having the same demand and interest for a product or service in a specific application scenario (e.g., WeChat circle of friends).
In an advertisement delivery scenario, the initial seed population may be provided by an advertiser, who may upload an initial seed crowd package for this purpose. The content of the initial seed crowd package may include at least one of a telephone number, an account, an email address, a user unique identifier (ID), and the like. Telephone numbers, accounts, and email addresses can be converted into user IDs through their association with user IDs.
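The conversion of uploaded contact identifiers into user IDs can be pictured as a simple lookup. The following is a minimal sketch only: the table `id_by_contact`, the function `to_user_ids`, and all sample values are hypothetical and are not part of the original disclosure.

```python
from typing import Dict, Iterable, Set

# Hypothetical association table: phone number, account, or email -> user ID.
id_by_contact: Dict[str, str] = {
    "13800000000": "user_001",
    "alice@example.com": "user_001",
    "bob_account": "user_002",
}


def to_user_ids(seed_package: Iterable[str], known_ids: Set[str]) -> Set[str]:
    """Map each entry of a seed crowd package to a user ID.

    Entries that are already user IDs are kept, contact identifiers are
    converted through the association table, and unknown entries are skipped.
    """
    user_ids = set()
    for entry in seed_package:
        if entry in known_ids:
            user_ids.add(entry)
        elif entry in id_by_contact:
            user_ids.add(id_by_contact[entry])
    return user_ids


seed_ids = to_user_ids(
    ["13800000000", "alice@example.com", "bob_account", "user_003", "unknown"],
    known_ids={"user_001", "user_002", "user_003"},
)
```

Returning a set means that two contact identifiers belonging to the same user yield a single seed individual.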
In other application scenarios, the initial seed population may also be obtained from a certain database or databases, or from a trading platform.
Each individual in the k-th round diffusion training set and in the (k-1)th round delivery population corresponds to a user on the platform and is identified by a user ID. Further, each individual has a feature vector (which can be obtained from the database 102), and the feature vector includes a plurality of attributes of the individual and the corresponding attribute values.
For example, the k-th round diffusion training set may include 100 user IDs, each corresponding to a plurality of attributes and attribute values.
The attributes may include, for example, location, sex, age, and height. In some application scenarios, the attributes may also include interest tags (information reflecting the user's interests), number of purchases, and the like.
An attribute value is the specific value of a given attribute. For example, in "height 1.8 m", height is the attribute and 1.8 m is its attribute value. An attribute value may also be an interval.
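As an illustration, a feature vector can be represented as a mapping from attribute names to attribute values, keyed by user ID. This is only one possible representation; the names `FeatureVector`, `features_by_user`, `attribute_value`, and the sample records are hypothetical.

```python
from typing import Dict, Tuple, Union

# An attribute value may be a single value or an interval (low, high).
AttributeValue = Union[float, str, Tuple[float, float]]
FeatureVector = Dict[str, AttributeValue]

# Hypothetical records: user ID -> feature vector.
features_by_user: Dict[str, FeatureVector] = {
    "user_001": {"location": "Shenzhen", "sex": "F", "age": 25, "height_m": 1.8},
    "user_002": {"location": "Beijing", "sex": "M", "age": (20, 30), "height_m": 1.7},
}


def attribute_value(user_id: str, attribute: str) -> AttributeValue:
    """Return the value of one attribute for one individual."""
    return features_by_user[user_id][attribute]
```

Here `user_002` stores age as the interval (20, 30), matching the remark that an attribute value may be an interval.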
The feedback data may include feedback statistics of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population.
The feedback statistics are calculated from the behavior data of all individuals in the (k-1)th round delivery population.
In one example, the feedback statistics may include a click rate, and accordingly the behavior data of each user may include data indicating whether the user clicked.
The click rate is calculated as clicks/exposures. For example, suppose the (k-1)th round delivery population contains 100 people, and during the (k-1)th round these 100 people open the WeChat client 1000 times in total and click on a certain advertisement slot 10 times. The click rate is then 10/1000 = 1%.
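The click-rate arithmetic above can be written as a small helper. This is a minimal sketch; the function name `click_rate` and the choice to return 0.0 when there are no exposures are assumptions made for illustration.

```python
def click_rate(clicks: int, exposures: int) -> float:
    """Click rate = clicks / exposures (0.0 when there were no exposures)."""
    if exposures == 0:
        return 0.0
    return clicks / exposures


# The worked example from the text: 10 clicks over 1000 exposures.
rate = click_rate(clicks=10, exposures=1000)
print(f"{rate:.0%}")  # prints "1%"
```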
In another example, the feedback statistics may include conversion rates. The conversion rates may include a like rate (number of people who liked / number of people delivered to), a dislike rate (number of people who disliked / number of people delivered to), a comment rate (number of people who commented / number of people delivered to), and the like.
Correspondingly, the behavior data of each user may include APP download information, friend circle like information, friend circle comment information, and even "not interested" or "dislike" information.
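The conversion rates described above can be derived from per-individual behavior data as sketched below. The record layout (boolean flags named `liked`, `disliked`, `commented`) and the function name `conversion_rates` are hypothetical; the text does not specify how behavior data is stored.

```python
from typing import Dict, List


def conversion_rates(behaviors: List[Dict[str, bool]]) -> Dict[str, float]:
    """Compute each rate as (people with the behavior) / (people delivered to)."""
    delivered = len(behaviors)
    rates = {}
    for name in ("liked", "disliked", "commented"):
        count = sum(1 for record in behaviors if record.get(name, False))
        rates[name] = count / delivered if delivered else 0.0
    return rates


# Hypothetical behavior data for a delivery population of four individuals.
behaviors = [
    {"liked": True, "commented": True},
    {"liked": True},
    {"disliked": True},
    {},
]
rates = conversion_rates(behaviors)
```

Each rate counts people rather than events, following the "number of people" wording of the definitions.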
In part 302: the processor 1 of the information delivery platform (or the server) performs the k-th round of iterative training by using the feature vectors of all individuals in the k-th round diffusion training set to obtain a k-th round diffusion model.
It should be noted that this embodiment does not perform iterative training and advertisement delivery indefinitely. Iterative training and subsequent advertisement delivery stop when a stop condition is met.
In one example, the stop condition may be that the number of iterations reaches an upper limit. In another example, the stop condition may be that the number of people delivered to reaches the advertiser's requirement.
The k-th round diffusion training set comprises a first positive sample subset and a first negative sample subset. The terms "first" and "second" in the present invention are used only for distinction and do not indicate an order.
Individuals in the first subset of positive samples are positive samples, and individuals in the first subset of negative samples are negative samples.
Accordingly, the trained k-th round diffusion model may include a first distinguishing feature vector (or first feature-value weight vector) for distinguishing positive samples from negative samples.
The first distinguishing feature vector may include the attributes, and corresponding attribute values, that are strongly associated with the training target of distinguishing positive samples from negative samples.
As mentioned above, the feature vector of each individual includes a plurality of attributes. During iterative training, the weight of each attribute with respect to the training target is calculated; the larger the weight, the stronger the association with the training target. Here the training target may be distinguishing positive samples from negative samples.
Because the training set contains a first negative sample subset, weights may also be negative.
For example, suppose there are four attributes, 1 to 4, whose weights with respect to the training target are 2, 0.67, 0.625, and -0.125, respectively. If 3 attributes are taken, attributes 1 to 3 are the attributes strongly associated with distinguishing positive samples from negative samples.
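Selecting the most strongly associated attributes in the example above amounts to sorting by weight and keeping the largest entries. The sketch below assumes that "strongest" means largest signed weight, as the text states; the function name `strongest_attributes` and the attribute labels are hypothetical.

```python
from typing import Dict, List


def strongest_attributes(weights: Dict[str, float], count: int) -> List[str]:
    """Return the `count` attributes with the largest weights, highest first."""
    return sorted(weights, key=weights.get, reverse=True)[:count]


# Weights from the example in the text.
weights = {
    "attribute_1": 2.0,
    "attribute_2": 0.67,
    "attribute_3": 0.625,
    "attribute_4": -0.125,
}
selected = strongest_attributes(weights, count=3)
```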
It should be noted that the attribute value of the attribute in the first distinguishing feature vector may be a value range or an average value.
For example, suppose the first distinguishing feature vector includes the attribute age and its attribute value, and the first positive sample subset contains 4 individuals aged 20, 25, 15, and 20. The attribute value of age in the first distinguishing feature vector may then be the range [15, 25] or the average (20+25+15+20)/4 = 20.
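The two ways of summarizing an attribute value in the age example can be computed as follows. This is a minimal sketch; the helper names `value_range` and `mean_value` are hypothetical.

```python
from typing import List, Tuple


def value_range(values: List[float]) -> Tuple[float, float]:
    """Summarize an attribute as the closed interval [min, max]."""
    return (min(values), max(values))


def mean_value(values: List[float]) -> float:
    """Summarize an attribute as its arithmetic mean."""
    return sum(values) / len(values)


# Ages of the four individuals in the example.
ages = [20, 25, 15, 20]
age_range = value_range(ages)
age_mean = mean_value(ages)
```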
In part 303: the k-th round delivery population is screened out from the whole population by using the k-th round diffusion model.
In each application scenario, the whole population refers to all users of the platform. For example, on the WeChat platform, the whole population refers to all WeChat users.
In one example, each individual (i.e., user) in the whole population may be scored by using the k-th round diffusion model, the users may be ranked by score from high to low, and the top N users may be selected as the delivery population.
The score characterizes the similarity between the feature vector of an individual and the first distinguishing feature vector: the higher the score, the higher the similarity.
The value of N is determined by the delivery scale selected by the advertiser. For example, if the delivery scale is 100,000, N is 100,000.
Alternatively, users with scores greater than a certain threshold may be considered as the delivery population.
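Both screening rules, top N and score threshold, can be sketched as below. The scores are taken as given inputs because the text does not fix a scoring formula; the names `top_n`, `above_threshold`, and the sample scores are hypothetical.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Rank users by score from high to low and keep the first n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep every user whose score is strictly greater than the threshold."""
    return [user for user, score in scores.items() if score > threshold]


# Hypothetical model scores for the whole population.
scores = {"u1": 0.91, "u2": 0.15, "u3": 0.78, "u4": 0.52}
delivery_by_rank = top_n(scores, n=2)
delivery_by_threshold = above_threshold(scores, threshold=0.5)
```

Top N fixes the size of the delivery population in advance, whereas the threshold rule lets the size vary with the score distribution.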
In part 304: the processor 1 of the information delivery platform (or the server) performs the k-th round of information delivery to the k-th round delivery population through the communication interface 3.
Alternatively, the processor 1 of the information delivery platform (or server) may output the k-th round delivery population through the communication interface 3, and another platform delivers the information to the k-th round delivery population.
After the information delivery, it is determined whether to perform the next round of information delivery; if so, the diffusion training set of the next round is generated and the subsequent operations are performed.
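Parts 301 to 304 and the stop conditions can be tied together in a loop such as the one below. It is a rough sketch under stated assumptions: the four callables (`build_training_set`, `train`, `screen`, `deliver`) are hypothetical placeholders for the steps described above, and counting the audience cumulatively across rounds is one possible reading of the advertiser's requirement.

```python
from typing import Any, Callable, List, Optional, Set


def run_delivery_rounds(
    seed: Set[str],
    build_training_set: Callable[[Set[str], Optional[Any]], Any],
    train: Callable[[Any], Any],
    screen: Callable[[Any], Set[str]],
    deliver: Callable[[Set[str]], Any],
    max_rounds: int,
    required_audience: int,
) -> List[Set[str]]:
    """Iterate training and delivery until a stop condition is met."""
    feedback = None  # no feedback exists before round 0
    reached: Set[str] = set()
    delivered_rounds: List[Set[str]] = []
    for _ in range(max_rounds):  # stop condition 1: iteration upper limit
        training_set = build_training_set(seed, feedback)  # part 301
        model = train(training_set)                        # part 302
        audience = screen(model)                           # part 303
        feedback = deliver(audience)                       # part 304
        delivered_rounds.append(audience)
        reached |= audience
        if len(reached) >= required_audience:  # stop condition 2: audience size
            break
    return delivered_rounds
```

Passing the steps in as callables keeps the loop independent of any particular model or delivery channel.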
It should be noted that the traditional method of crowd diffusion is as follows:
first, taking the seed crowd package as a positive sample subset, and randomly selecting a negative sample subset from the whole population to form a training set;
second, training a linear Logistic Regression (LR) model by using the training set;
third, predicting on the whole population by using the trained LR model, and taking the top N users as the delivery population.
The disadvantages are that:
a seed population of poor quality diffuses into a delivery population of poor quality, and delivering advertisements to such a population results in a poor delivery effect and harms users' interests;
the randomly selected negative sample subset carries no characteristic information, so the trained model cannot accurately distinguish positive samples from negative samples, and the model accuracy is poor.
In the embodiment of the invention, the k-th round diffusion training set is adjusted based on the feedback data of the delivery population of the previous, (k-1)th, round. Even if the quality of the initial seed population is poor, the samples can therefore be adjusted through the feedback data over multiple iterations, so that the delivery population matches the information increasingly well and the accuracy of the diffused delivery population is further improved.
Meanwhile, during iterative training, the feature vectors of all individuals in the k-th round diffusion training set are used for model training, and the k-th round delivery population is screened out according to the trained model. The trained model can thus accurately identify people similar to the positive sample subset, which improves the accuracy of the diffused delivery population.
The technical solution of the present invention will be further described below by taking an advertisement placement scenario as an example.
Fig. 4 shows another exemplary flow of the information delivery method described above. The method shown in fig. 4 can be applied to the application scenario shown in fig. 1, and is performed by the processor 1 in the information delivery platform/server shown in fig. 2 interacting with other components.
Because the process involves multiple iterations, this embodiment is described by taking the 0th round of iterative training and advertisement delivery and the m-th round (m ≠ 0) of iterative training and advertisement delivery as examples, where m may be any value of k other than 0.
The exemplary flow includes:
In part 400: the initial seed population is obtained as the positive sample subset (first positive sample subset) of the 0th round diffusion training set.
For a description of the initial seed population, reference may be made to part 301 of the foregoing embodiment, which is not repeated here.
In part 401: a population of the same size as the initial seed population is randomly selected from the whole population as the negative sample subset (first negative sample subset) of the 0th round diffusion training set.
For example, if the initial seed population contains 50,000 users, 50,000 users are randomly selected from the whole population as the first negative sample subset.
For a description of the whole population, reference may be made to part 303 of the foregoing embodiment, which is not repeated here.
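The random selection in part 401 can be sketched with the standard library. The function name `sample_negatives` and the sample IDs are hypothetical. As in the text, the draw is made from the whole population; excluding seed users from the draw would be a variation that the text does not state.

```python
import random
from typing import List, Set


def sample_negatives(
    whole_population: Set[str], seed_population: Set[str], rng: random.Random
) -> List[str]:
    """Randomly pick as many users from the whole population as there are seeds."""
    # Sort first so the result depends only on the RNG state, not on set ordering.
    candidates = sorted(whole_population)
    return rng.sample(candidates, len(seed_population))


# Hypothetical population of 20 users, 5 of which are seeds.
whole = {f"user_{i:03d}" for i in range(20)}
seeds = {f"user_{i:03d}" for i in range(5)}
negatives = sample_negatives(whole, seeds, random.Random(42))
```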
In part 402: feature vectors are obtained for the individuals in the first positive sample subset and the first negative sample subset.
For a description of the feature vectors, reference may be made to part 301 of the foregoing embodiment, which is not repeated here.
Thus, each sample in the 0th round diffusion training set has a feature vector.
In part 403: the feature vector of each individual in the first positive sample subset and the first negative sample subset is imported into a first preset model for training and learning to obtain a 0th round diffusion model.
More specifically, the first preset model may be a logistic regression (LR) model, and the LR model may more specifically be a Spark ADMMLR model. Because the number of samples is large, the Spark ADMMLR model may be selected for training and learning.
Of course, other models, such as decision trees or support vector machines, may be selected in other embodiments of the invention.
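To make the training step concrete, the sketch below fits a logistic regression model with plain batch gradient descent in pure Python. It is not the Spark ADMMLR model named in the text; it is a deliberately simple stand-in, and the function names, hyperparameters, and two-feature sample data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Score between 0 and 1: estimated probability that x is a positive sample."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_lr(
    samples: List[Sequence[float]],
    labels: List[int],
    learning_rate: float = 0.5,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, dims = len(samples), len(samples[0])
    weights, bias = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical numeric feature vectors: label 1 = positive subset, 0 = negative subset.
samples = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
labels = [1, 1, 0, 0]
weights, bias = train_lr(samples, labels)
```

The learned `weights` play the role of the per-attribute weights discussed for the first distinguishing feature vector: a positive weight pushes the score toward the positive subset and a negative weight pushes it away.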
The 0th round diffusion model includes a first distinguishing feature vector. For a description of the first distinguishing feature vector, reference may be made to part 302 above, which is not repeated here.
In part 404: the 0th round delivery population is screened out from the whole population by using the 0th round diffusion model.
In part 405: the 0th round of information delivery is performed to the 0th round delivery population, and feedback data of the 0th round delivery population is obtained.
More specifically, after the 0th round of information delivery, the feedback data of the 0th round delivery population may be acquired after waiting for a predetermined time, for example 10 minutes, an hour, or a day.
For a description of the feedback data, reference may be made to part 301 of the foregoing embodiment, which is not repeated here.
In part 406: an m-th round delivery training set is generated according to the feedback data of the (m-1)th round delivery population.
In the advertisement delivery scenario, the feedback data of the (m-1)th round delivery population reflects the advertisement delivery effect.
The m-th round delivery training set includes a second positive sample subset and a second negative sample subset. Individuals in the second positive sample subset serve as positive samples, and individuals in the second negative sample subset serve as negative samples. As in the 0th round diffusion training set, each individual in the m-th round delivery training set has a feature vector.
As mentioned above, the feedback data of the (k-1)th round may include feedback statistics (click rate or conversion rate) of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population.
Correspondingly, the behavior data of the individuals in the second positive sample subset has a forward association with the feedback statistics, and the behavior data of the individuals in the second negative sample subset has a reverse association with the feedback statistics.
A forward association means that, for a fixed total number of individuals, the feedback statistic increases as the number of individuals having the behavior data increases.
Taking the click rate as the feedback statistic, for a fixed total, the more individuals exhibit click behavior (the click crowd), the higher the click rate.
Similarly, taking the conversion rate as the feedback statistic, for a fixed total, the more individuals exhibit behaviors such as APP download, like, and comment (the conversion crowd), the higher the conversion rate.
Further, if the click rate is the delivery target, the second positive sample subset of the m-th round delivery training set may be obtained by taking the click crowd as the second positive sample subset.
If the conversion rate is the delivery target, the second positive sample subset of the m-th round delivery training set may be obtained by taking the conversion crowd as the second positive sample subset.
A reverse association means that, for a fixed total number of individuals, the feedback statistic decreases as the number of individuals having the behavior data increases.
Whether the click rate or the conversion rate is the delivery target, the second negative sample subset of the mth round delivery training set may specifically be obtained as follows:
the individuals in the (m-1)th round delivery population other than those in the second positive sample subset of the mth round delivery training set are put into the second negative sample subset of the mth round delivery training set; or, the (m-1)th round delivery population is taken as the second negative sample subset of the mth round delivery training set.
Of course, if the number of samples in the second negative sample subset is required to equal the number of samples in the second positive sample subset, the second negative sample subset of the mth round delivery training set may be obtained by sampling.
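The construction of the mth round delivery training set described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_delivery_training_set`, its parameters, and the toy user identifiers are all hypothetical, and the click rate is used as the delivery target.

```python
import random

def build_delivery_training_set(delivered, behaved, exclude_positives=True,
                                balance=False, seed=0):
    """Split the (m-1)th round delivery population into positive/negative samples.

    delivered: list of user ids that received the information in round m-1.
    behaved:   set of user ids that showed the target behavior (e.g. a click).
    """
    # Second positive sample subset: the click (or conversion) population.
    positives = [u for u in delivered if u in behaved]
    if exclude_positives:
        # Option 1: everyone in the delivery population except the positives.
        negatives = [u for u in delivered if u not in behaved]
    else:
        # Option 2: the whole (m-1)th round delivery population.
        negatives = list(delivered)
    if balance and len(negatives) > len(positives):
        # Optional down-sampling so both subsets have the same size.
        negatives = random.Random(seed).sample(negatives, len(positives))
    return positives, negatives

# Hypothetical toy data: six delivered users, two of whom clicked.
delivered = ["u1", "u2", "u3", "u4", "u5", "u6"]
clicked = {"u2", "u5"}
pos, neg = build_delivery_training_set(delivered, clicked)
click_rate = len(pos) / len(delivered)  # feedback statistical data
```

With the total fixed at six users, adding one more clicking user would raise `click_rate`, which is the positive correlation the text describes.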
In this embodiment, the mth round delivery training set is obtained from the feedback statistical data (click rate or conversion rate) of the previous round's ((m-1)th round) delivery population and the behavior data of its individuals. This can make the second positive sample subset of the mth round delivery training set strongly associated with an increase in the click rate or conversion rate, and the second negative sample subset strongly associated with a decrease in the click rate or conversion rate.
The click rate or conversion rate represents the advertisement delivery effect; that is, (the positive sample subset of) the mth round delivery training set is strongly and positively correlated with the advertisement delivery effect. Therefore, the mth round delivery model trained on the mth round delivery training set can accurately diffuse to a delivery population that resembles the positive sample subset and helps improve the advertisement delivery effect. Thus, in this embodiment, even if the quality of the seed population is poor, an accurate delivery population can be reached gradually through diffusion.
At part 407: the feature vectors of all individuals in the mth round delivery training set are input into a second preset model for training to obtain the mth round delivery model.
The second preset model is similar to the first preset model and is not described again here.
The mth round delivery model may include a second discriminating feature vector for discriminating positive and negative samples; the second discriminating feature vector includes the attributes strongly associated with distinguishing positive samples from negative samples, together with the corresponding attribute values.
The second discriminating feature vector is similar to the first discriminating feature vector; for a related introduction, refer to part 302 above, which is not repeated here.
It should be noted that, because the training sets are different, the specific contents of the second discriminating feature vector and the first discriminating feature vector are also different.
In this embodiment, because both the negative samples and the positive samples carry feature vectors, the trained delivery model can distinguish positive and negative samples more accurately than existing approaches, so that an accurate diffusion training set can subsequently be screened out using the delivery model. Therefore, even if the quality of the seed population is poor, an accurate delivery population can be reached gradually through diffusion.
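To make the training step concrete, the sketch below fits a small logistic regression on labeled feature vectors. It is only an illustration under assumptions: the text does not fix the form of the second preset model here, so logistic regression, the function names `train_logistic` and `score`, the learning rate, and the two toy attributes are all hypothetical choices.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(model, x):
    """Return the model's estimate that feature vector x is a positive sample."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical attributes per individual: [is_gamer, is_mobile_user].
positives = [[1, 1], [1, 0], [1, 1]]   # second positive sample subset
negatives = [[0, 1], [0, 0], [0, 1]]   # second negative sample subset
X = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)
model = train_logistic(X, y)
```

In this toy data only the first attribute separates the two subsets, so its learned weight ends up positive; in that loose sense the weights play the role of the attributes that are strongly associated with distinguishing positive from negative samples.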
At part 408: the mth round diffusion training set is screened out from the mth round delivery training set and the initial seed population using the mth round delivery model.
More specifically, the first positive sample subset of the mth round diffusion training set can be obtained as follows:
Each individual in the initial seed population is scored using the mth round delivery model to obtain the set SeedScore = {(u, score(u)) | u ∈ seed users}, where SeedScore denotes the score set of the seed population (which may be referred to as the first score set), u denotes a user in the initial seed population, and score(u) denotes the score of that user.
Individuals whose scores are smaller than a first threshold θ1 are filtered out of the initial seed population to obtain the filtered seed population, which may also be called the first subset P1: P1 = {u | score(u) > θ1, (u, score(u)) ∈ SeedScore}.
Each individual in the second positive sample subset of the mth round delivery training set is scored using the mth round delivery model to obtain the set PositiveADScore = {(u, score(u)) | u ∈ {second positive sample subset of the mth round delivery training set}}, where PositiveADScore denotes the score set corresponding to the second positive sample subset of the mth round delivery training set (which may be referred to as the second score set).
Individuals whose scores are smaller than a second threshold θ2 are filtered out of the second positive sample subset of the mth round delivery training set to obtain the filtered second positive sample subset, which may also be called the second subset P2: P2 = {u | score(u) > θ2, (u, score(u)) ∈ PositiveADScore}. θ1 and θ2 may be equal or different.
The filtered seed population is merged with the filtered second positive sample subset to obtain the first positive sample subset P of the mth round diffusion training set, that is, P = P1 ∪ P2.
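The threshold filtering and union described above can be sketched as follows. The helper names, the toy scores, and the threshold values are hypothetical; the strict comparison follows the set definitions P1 and P2 (score(u) > θ).

```python
def filter_by_score(users, score, threshold):
    """Keep the users whose score is strictly greater than the threshold."""
    return {u for u in users if score(u) > threshold}

def first_positive_subset(seed_users, delivery_positives, score, theta1, theta2):
    """Return P = P1 ∪ P2 for the mth round diffusion training set."""
    p1 = filter_by_score(seed_users, score, theta1)          # filtered seed population
    p2 = filter_by_score(delivery_positives, score, theta2)  # filtered second positive subset
    return p1 | p2

# Hypothetical scores that the mth round delivery model might assign.
toy_scores = {"s1": 0.9, "s2": 0.3, "s3": 0.7, "a1": 0.8, "a2": 0.4}
P = first_positive_subset(
    seed_users=["s1", "s2", "s3"],
    delivery_positives=["a1", "a2", "s3"],
    score=toy_scores.get,
    theta1=0.5,
    theta2=0.6,
)
```

Because the result is a set union, a user who appears in both the seed population and the second positive sample subset (here `s3`) is counted only once.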
In other embodiments, a certain number of users may instead be randomly drawn from the union of the second positive sample subset of the mth round delivery training set and the initial seed population to obtain the first positive sample subset of the mth round diffusion training set.
In this embodiment, positive samples are selected from the seed population in every round, which preserves the similarity between the initial seed population and the selected kth round delivery population, so that population diffusion can proceed while that similarity is maintained.
The first negative sample subset in the mth round of diffusion training set can be obtained as follows:
(1) Score the individuals in the second negative sample subset of the mth round delivery training set using the mth round delivery model to obtain the set NegativeADScore = {(i, score(i)) | i ∈ {second negative sample subset of the mth round delivery training set}}. NegativeADScore denotes the score set corresponding to the second negative sample subset of the mth round delivery training set (which may be referred to as the third score set), i denotes a user in that second negative sample subset, and score(i) denotes the score of that user.
(2) Sample the second negative sample subset of the mth round delivery training set using a Bernoulli distribution; the sampling formula is as follows:
where p(i) denotes the probability that the ith individual in the second negative sample subset of the mth round delivery training set is taken as a negative sample; num_neg denotes the number of samples in the second negative sample subset of the mth round delivery training set; num_p denotes the total number of samples in the mth round delivery training set; score(i) ∈ NegativeADScore; and the summation term denotes the sum of the scores of all individuals in the second negative sample subset of the mth round delivery training set.
(3) Randomly generate a pure decimal (a decimal between 0 and 1); if p(i) is less than or equal to the randomly generated pure decimal, put the ith individual into the first negative sample subset of the mth round diffusion training set.
(4) Perform steps (1) to (3) for each negative sample in the mth round delivery training set to finally obtain the first negative sample subset of the mth round diffusion training set.
Of course, in other embodiments of the present invention, other probability distributions, such as a Gaussian distribution, may be used for sampling, which is not described here.
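Steps (3) and (4) above can be sketched as follows. Because the formula for p(i) is not reproduced in this text, the sketch takes it as a caller-supplied function `prob` rather than guessing its form; the function name `sample_first_negatives` and the toy probabilities are hypothetical. Python's `random.random()` returns a value in [0, 1), which is used here as the pure decimal.

```python
import random

def sample_first_negatives(negatives, prob, rng=None):
    """Select the first negative sample subset of the diffusion training set.

    negatives: individuals in the second negative sample subset.
    prob:      callable returning p(i) for an individual (formula assumed given).
    """
    rng = rng or random.Random()
    kept = []
    for i in negatives:
        r = rng.random()       # randomly generated pure decimal
        if prob(i) <= r:       # comparison as stated in step (3)
            kept.append(i)
    return kept

# Hypothetical probabilities chosen so the outcome is deterministic:
# p = 0.0 always passes the test, p = 1.0 never does.
toy_p = {"a": 0.0, "b": 1.0, "c": 0.0}
first_negatives = sample_first_negatives(["a", "b", "c"], toy_p.get,
                                         rng=random.Random(7))
```

Under the comparison as written, an individual is kept with probability about 1 - p(i), since a uniform draw is at least p(i) that often.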
At part 409: the feature vector of each individual in the mth round diffusion training set is input into the first preset model for training to obtain the mth round diffusion model.
In this embodiment, because both the negative samples and the positive samples in the mth round diffusion training set carry feature vectors, the trained mth round diffusion model can distinguish positive and negative samples more accurately than existing approaches, so that an accurate delivery population can subsequently be screened out.
At part 410: the mth round delivery population is screened out from the whole population using the mth round diffusion model.
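The claims describe this screening as scoring every individual in the whole population, sorting by score from high to low, and taking the top N. A minimal sketch of that selection follows; the function name `select_top_n` and the toy scores are hypothetical, and `score` stands in for the mth round diffusion model.

```python
import heapq

def select_top_n(population, score, n):
    """Return the n highest-scoring individuals, ordered from high to low."""
    # Equivalent to sorted(population, key=score, reverse=True)[:n].
    return heapq.nlargest(n, population, key=score)

# Hypothetical diffusion-model scores for a four-user population.
toy_scores = {"u1": 0.8, "u2": 0.2, "u3": 0.9, "u4": 0.5}
delivery_population = select_top_n(list(toy_scores), toy_scores.get, n=2)
```

Here `n` corresponds to the delivery scale chosen by the advertiser.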
At part 411: information is delivered to the mth round delivery population, and feedback data of the mth round delivery population is obtained.
Part 411 is similar to part 405 and is not described in detail again.
Referring to fig. 5, an iterative schematic diagram of the embodiment shown in fig. 4 is shown.
In summary, the embodiments of the present invention describe in detail the 0th round of iterative training and the subsequent rounds of iterative training; even if the quality of the seed population is poor, a delivery population that helps improve the advertisement delivery effect can be reached by diffusion based on the feedback data and the feature vectors.
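For the 0th round, where no feedback data exists yet, the method takes the initial seed population as the first positive sample subset and a random draw from the whole population, equal in size to the seed population, as the first negative sample subset. The sketch below illustrates this under assumptions: the function name is hypothetical, and the draw is made from the whole population without excluding seed users, since the text does not state an exclusion.

```python
import random

def round0_diffusion_training_set(seed_users, whole_population, rng=None):
    """Build the 0th round diffusion training set (positives, negatives)."""
    rng = rng or random.Random()
    positives = list(seed_users)
    # Random draw without replacement, same size as the seed population.
    negatives = rng.sample(list(whole_population), len(positives))
    return positives, negatives

# Hypothetical toy population of 23 users, three of whom are seeds.
seeds = ["s1", "s2", "s3"]
population = [f"u{i}" for i in range(20)] + seeds
pos0, neg0 = round0_diffusion_training_set(seeds, population,
                                           rng=random.Random(42))
```

Later rounds replace this random draw with the feedback-driven screening of parts 406 to 408.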
Fig. 6 shows a schematic diagram of a possible structure of the information delivery apparatus according to the above embodiment, including:
a diffusion training set generating unit 601, configured to generate a kth round diffusion training set according to the initial seed population and feedback data of the (k-1)th round delivery population;
k is an integer; each individual in the diffusion training set has a feature vector comprising a plurality of attributes of the individual and the corresponding attribute values;
a training unit 602, configured to perform a kth round of iterative training using the feature vectors of all individuals in the kth round diffusion training set to obtain a kth round diffusion model;
a screening unit 603, configured to screen out a kth round delivery population from the whole population using the kth round diffusion model; the kth round delivery population is used for the kth round of information delivery.
For details, please refer to the above description, which is not repeated herein.
In another embodiment of the present invention, referring to fig. 6, the information delivery device may further include:
an advertisement delivery unit 604, configured to perform the kth round of information delivery to the kth round delivery population.
The diffusion training set generating unit 601 may be used to perform part 301 of the embodiment shown in fig. 3, and may also perform parts 400 to 402 of the embodiment shown in fig. 4.
The training unit 602 may be used to perform part 302 of the embodiment shown in fig. 3, and may also perform parts 403, 406 and 409 of the embodiment shown in fig. 4.
The screening unit 603 may be used to perform part 303 of the embodiment shown in fig. 3, and may also perform parts 404 and 410 of the embodiment shown in fig. 4.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in user equipment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The above embodiments further describe in detail the objects, technical solutions and advantages of the present invention. It should be understood that the above embodiments are only exemplary embodiments of the present invention and are not intended to limit the scope of the present invention; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (14)
1. An information delivery method, comprising:
generating a kth round diffusion training set according to an initial seed population and feedback data of a (k-1)th round delivery population; k is an integer; each individual in the kth round diffusion training set has a feature vector comprising a plurality of attributes of the individual and corresponding attribute values; the feedback data comprises feedback statistical data of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population; the feedback data of the (k-1)th round delivery population is obtained after the (k-1)th round delivery population is selected, information is delivered to the (k-1)th round delivery population, and a preset time has elapsed;
performing a kth round of iterative training by using the feature vectors of all individuals in the kth round of diffusion training set to obtain a kth round of diffusion model;
scoring each individual in the whole population using the kth round diffusion model, sorting the individuals by score from high to low, and selecting the top N individuals as the kth round delivery population; the kth round delivery population is used for the kth round of information delivery; the value of N is determined by the delivery scale selected by the advertiser;
when k ≠ 0, the generating the k-th round diffusion training set comprises:
generating a kth round delivery training set according to the feedback data of the (k-1)th round delivery population; the kth round delivery training set comprises a second positive sample subset and a second negative sample subset; the behavior data of the individuals in the second positive sample subset has a positive correlation with the feedback statistical data of the (k-1)th round delivery population; the behavior data of the individuals in the second negative sample subset has an inverse relationship with the feedback statistical data of the (k-1)th round delivery population;
inputting the feature vectors of all individuals in the kth round delivery training set into a second preset model for training to obtain a kth round delivery model;
and screening out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model.
2. The method of claim 1,
the k-th round diffusion training set comprises a first positive sample subset and a first negative sample subset; individuals in the first positive sample subset are taken as positive samples, and individuals in the first negative sample subset are taken as negative samples;
performing a kth round of iterative training by using the feature vectors of all individuals in the kth round of diffusion training set to obtain a kth round of diffusion model, wherein the kth round of diffusion model comprises:
and importing the feature vector of each individual in the first positive sample subset and the first negative sample subset into a first preset model for training and learning to obtain the k-th round diffusion model.
3. The method of claim 1 or 2, wherein when k is 0, the generating a kth round diffusion training set comprises:
obtaining the initial seed population as a first positive sample subset;
randomly selecting, from the whole population, a population equal in size to the initial seed population as a first negative sample subset;
and acquiring the feature vector of each individual in the k-th round of diffusion training set.
4. The method according to claim 1 or 2, wherein each individual in the kth round delivery training set has a feature vector.
5. The method of claim 1, wherein individuals in the second subset of positive samples are taken as positive samples and individuals in the second subset of negative samples are taken as negative samples;
the generating a kth round delivery training set comprises:
acquiring the feedback data of the (k-1)th round delivery population; the feedback data comprises feedback statistical data of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population; the feedback statistical data is calculated from the behavior data of all individuals in the (k-1)th round delivery population;
screening out the second positive sample subset from the (k-1)th round delivery population; each individual in the second positive sample subset has a corresponding feature vector;
screening out the second negative sample subset from the (k-1)th round delivery population; each individual in the second negative sample subset has a corresponding feature vector.
6. The method according to claim 5, wherein the second negative sample subset comprises the individuals remaining in the (k-1)th round delivery population after the second positive sample subset is removed; or,
the second negative sample subset comprises the (k-1)th round delivery population.
7. The method of claim 6, wherein said screening out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model comprises:
scoring each individual in the initial seed population using the kth round delivery model, and filtering out individuals in the initial seed population whose scores are smaller than a first threshold to obtain a filtered seed population;
scoring each individual in the second positive sample subset using the kth round delivery model, and filtering out individuals in the second positive sample subset whose scores are smaller than a second threshold to obtain a filtered second positive sample subset;
and taking the union of the filtered seed population and the filtered second positive sample subset as the first positive sample subset of the kth round diffusion training set.
8. The method of claim 7, wherein said screening out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model further comprises:
scoring an ith individual in the second negative sample subset using the kth round delivery model;
calculating, based on the score of the ith individual, the probability p(i) that the ith individual is taken as a negative sample;
randomly generating a pure decimal for the ith individual;
if p(i) is less than or equal to the pure decimal, putting the ith individual into a first negative sample subset of the kth round diffusion training set.
9. An information delivery apparatus, the apparatus comprising:
a diffusion training set generating unit, configured to generate a kth round diffusion training set according to an initial seed population and feedback data of a (k-1)th round delivery population; k is an integer; each individual in the diffusion training set has a feature vector comprising a plurality of attributes of the individual and corresponding attribute values; the feedback data comprises feedback statistical data of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population; the feedback data of the (k-1)th round delivery population is obtained after the (k-1)th round delivery population is selected, information is delivered to the (k-1)th round delivery population, and a preset time has elapsed;
a training unit, configured to perform a kth round of iterative training using the feature vectors of all individuals in the kth round diffusion training set to obtain a kth round diffusion model;
a screening unit, configured to score each individual in the whole population using the kth round diffusion model, sort the individuals by score from high to low, and select the top N individuals as the kth round delivery population; the kth round delivery population is used for the kth round of information delivery; the value of N is determined by the delivery scale selected by the advertiser;
the diffusion training set generating unit is further specifically configured to:
when k is not equal to 0, generate a kth round delivery training set according to the feedback data of the (k-1)th round delivery population; the kth round delivery training set comprises a second positive sample subset and a second negative sample subset; the behavior data of the individuals in the second positive sample subset has a positive correlation with the feedback statistical data of the (k-1)th round delivery population; the behavior data of the individuals in the second negative sample subset has an inverse relationship with the feedback statistical data of the (k-1)th round delivery population;
input the feature vectors of all individuals in the kth round delivery training set into a second preset model for training to obtain a kth round delivery model;
and screen out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model.
10. The apparatus of claim 9,
the k-th round diffusion training set comprises a first positive sample subset and a first negative sample subset; individuals in the first positive sample subset are taken as positive samples, and individuals in the first negative sample subset are taken as negative samples;
when k is equal to 0, in generating the k-th round diffusion training set, the diffusion training set generating unit is specifically configured to:
obtaining the initial seed population as a first positive sample subset;
randomly selecting, from the whole population, a population equal in size to the initial seed population as a first negative sample subset;
and acquiring the feature vector of each individual in the k-th round of diffusion training set.
11. The apparatus of claim 10, wherein each individual in the kth round delivery training set has a feature vector.
12. The apparatus of claim 9,
individuals in the second positive sample subset are taken as positive samples, and individuals in the second negative sample subset are taken as negative samples;
in generating the kth round delivery training set, the diffusion training set generating unit is specifically configured to:
acquire the feedback data of the (k-1)th round delivery population, wherein the feedback data comprises feedback statistical data of the (k-1)th round delivery population and behavior data of each individual in the (k-1)th round delivery population; the feedback statistical data is calculated from the behavior data of all individuals in the (k-1)th round delivery population;
screen out the second positive sample subset from the (k-1)th round delivery population; each individual in the second positive sample subset has a corresponding feature vector;
screen out the second negative sample subset from the (k-1)th round delivery population; each individual in the second negative sample subset has a corresponding feature vector.
13. The apparatus of claim 12, wherein in screening out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model, the diffusion training set generating unit is specifically configured to:
score each individual in the initial seed population using the kth round delivery model, and filter out individuals in the initial seed population whose scores are smaller than a first threshold to obtain a filtered seed population;
score each individual in the second positive sample subset using the kth round delivery model, and filter out individuals in the second positive sample subset whose scores are smaller than a second threshold to obtain a filtered second positive sample subset;
and take the union of the filtered seed population and the filtered second positive sample subset as the positive sample subset of the kth round diffusion training set.
14. The apparatus of claim 13, wherein in screening out the kth round diffusion training set from the kth round delivery training set and the initial seed population using the kth round delivery model, the diffusion training set generating unit is further specifically configured to:
score an ith individual in the second negative sample subset using the kth round delivery model;
calculate, based on the score of the ith individual, the probability p(i) that the ith individual is taken as a negative sample;
randomly generate a pure decimal for the ith individual;
if p(i) is less than or equal to the pure decimal, put the ith individual into a first negative sample subset of the kth round diffusion training set.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710081843.2A CN108427690B (en) | 2017-02-15 | 2017-02-15 | Information delivery method and device |
PCT/CN2018/075521 WO2018149337A1 (en) | 2017-02-15 | 2018-02-07 | Information distribution method, device, and server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710081843.2A CN108427690B (en) | 2017-02-15 | 2017-02-15 | Information delivery method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108427690A (en) | 2018-08-21 |
CN108427690B (en) | 2022-09-13 |
Family
ID=63155504
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710081843.2A Active CN108427690B (en) | 2017-02-15 | 2017-02-15 | Information delivery method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108427690B (en) |
WO (1) | WO2018149337A1 (en) |
Also Published As
Publication number | Publication date |
---|---|
WO2018149337A1 (en) | 2018-08-23 |
CN108427690A (en) | 2018-08-21 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |