CN105427129B - Information delivery method and system - Google Patents


Info

Publication number
CN105427129B
Authority
CN
China
Prior art keywords
determining
crowd
population
seed
candidate seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510768448.2A
Other languages
Chinese (zh)
Other versions
CN105427129A (en
Inventor
叶幸春
张海川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510768448.2A
Publication of CN105427129A
Application granted
Publication of CN105427129B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention discloses an information delivery method and an information delivery system. The method comprises: obtaining a seed crowd, and determining a characteristic weight value vector of the seed crowd through a first preset model; expanding the seed crowd based on the characteristic weight value vector to generate a candidate seed crowd; determining an optimal candidate seed crowd according to historical data of the candidate seed crowd about information delivery; determining a characteristic weight value vector of the optimal candidate seed crowd through a second preset model; and performing a secondary expansion based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd. Because the optimal candidate seed crowd is determined from the historical information-delivery data of the candidate seed crowd, and a second round of model training and expansion is then performed to determine the target crowd, low-quality seed users that may be redundant or unreliable are filtered out, the amount of data computation is reduced, the target crowd is determined more accurately, and the advertisement delivery effect is improved.

Description

Information delivery method and system
Technical Field
The invention belongs to the technical field of communication, and particularly relates to an information delivery method and system.
Background
With the development of internet technology, instant messaging and social applications have sprung up like mushrooms after rain. The large amount of user data involved in these applications, such as user preferences, age, and needs, is of great significance for the delivery of information such as advertisements.
In the prior art, a group of people collected in a specific business scenario who share the same needs and interests regarding products and services is called a seed crowd. A seed crowd is usually small, generally fewer than one hundred thousand people. People who share the same characteristics as the seed crowd are called the extended crowd, whose size is usually several times that of the seed crowd. Typically, when an advertisement is delivered, the extended crowd is first found from the seed crowd and then used as the target users of the delivery. When there are multiple seed crowds, the extended crowd of each seed crowd is found first, and the intersection of these extended crowds is taken as the final target users of the delivery.
In researching and practicing the prior art, the inventors found that, because the intersection of the extended crowds of all seed crowds is selected as the target users for advertisement delivery, the amount of data computation becomes huge when there are many seed crowds. In addition, the seed crowd may contain low-quality seed users that are redundant or unreliable, which distorts the determination of the target users and leads to a poor advertisement delivery effect.
Disclosure of Invention
The invention aims to provide an information delivery method and an information delivery system, which aim to reduce the data calculation amount and improve the advertisement delivery effect.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
a method for delivering information comprises the following steps:
acquiring a seed crowd, and determining a characteristic weight value vector of the seed crowd through a first preset model;
expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd;
counting historical data of the candidate seed crowd about information delivery, and determining the optimal candidate seed crowd according to the historical data, wherein the historical data comprises click rate;
determining a characteristic weight value vector of the optimal candidate seed population through a second preset model;
and expanding the optimal candidate seed crowd based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd so as to deliver information to the target crowd.
In order to solve the above technical problems, embodiments of the present invention further provide the following technical solutions:
a system for delivering information, comprising:
the acquisition unit is used for acquiring seed population;
the first determining unit is used for determining a characteristic weight value vector of the seed crowd through a first preset model;
the expansion unit is used for expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd;
the optimal population determining unit is used for counting historical data of the candidate seed population about information delivery, and determining the optimal candidate seed population according to the historical data, wherein the historical data comprises click rate;
the second determining unit is used for determining a characteristic weight value vector of the optimal candidate seed crowd through a second preset model;
and the releasing unit is used for expanding the optimal candidate seed crowd based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd so as to release information to the target crowd.
Compared with the prior art, the embodiments of the present invention determine the characteristic weight value vector of the seed crowd through model training, expand the seed crowd based on the characteristic weight value vector to generate the candidate seed crowd, determine the optimal candidate seed crowd from the historical data of the candidate seed crowd about information delivery, and perform a secondary expansion on the optimal candidate seed crowd to determine the target crowd of the information delivery. Low-quality seed users that may be redundant or unreliable are thereby filtered out, the amount of data computation is greatly reduced, the target crowd is determined more accurately, and the advertisement delivery effect is improved.
Drawings
The technical solution and other advantages of the present invention will become apparent from the following detailed description of specific embodiments of the present invention, which is to be read in connection with the accompanying drawings.
Fig. 1a is a schematic view of an application scenario of an information delivery system according to an embodiment of the present invention;
Fig. 1b is a schematic flow chart of a method for delivering information according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for delivering information according to a second embodiment of the present invention;
Fig. 3a is a schematic structural diagram of a delivery system for information provided by a third embodiment of the present invention;
Fig. 3b is another schematic structural diagram of a delivery system for information provided by a third embodiment of the present invention.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present invention are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the invention and should not be taken as limiting the invention with regard to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present invention are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the invention are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will appreciate that the various steps and operations described hereinafter may also be implemented in hardware.
The term "module" as used herein may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as objects implemented on the computing system. The system and method described herein are preferably implemented in software, but may be implemented in hardware without departing from the scope of the invention.
The embodiment of the invention provides an information delivery method and system.
Referring to fig. 1a, fig. 1a is a schematic view of an application scenario of the information delivery system provided in an embodiment of the present invention. The application scenario includes an information delivery system, which is mainly used for: acquiring a seed crowd, such as a group of people having the same needs and interests regarding products and services; determining a characteristic weight value vector of the seed crowd through a first preset model; expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd; counting historical data of the candidate seed crowd about information delivery, such as click rate and/or transaction information; and determining a target crowd from the candidate seed crowd according to the historical data. Specifically, the system determines an optimal candidate seed crowd according to the historical data, determines a characteristic weight value vector of the optimal candidate seed crowd through a second preset model, and expands the optimal candidate seed crowd based on that vector to determine the target crowd, so as to deliver information, such as advertisements, to the target crowd.
In addition, the application scenario may also include a database, which is mainly used for storing a large number of candidate seed crowds. The data about the seed crowd may come from an information delivery platform and a trading platform; that is, the application scenario may further include these two platforms. The information delivery platform is mainly used to record click-rate-related data, such as the number of plays, the like rate, and the dislike rate of a network video, and to deliver information, such as advertisements, to the target crowd determined by the information delivery system. The trading platform is mainly used for recording data related to transactions, such as the relationship between the number of times information is played and the transaction value, and users' transaction records.
The details will be described below separately.
First embodiment
In this embodiment, a description will be given from the perspective of an information delivery system, which may be specifically integrated in a network device such as a server.
A method for delivering information comprises the following steps: acquiring a seed crowd, and determining a characteristic weight value vector of the seed crowd through a first preset model; expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd; counting historical data of the candidate seed crowd about information delivery, and determining the optimal candidate seed crowd according to the historical data, wherein the historical data comprises click rate; determining a characteristic weight value vector of the optimal candidate seed population through a second preset model; and expanding the optimal candidate seed crowd based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd so as to deliver information to the target crowd.
Referring to fig. 1b, fig. 1b is a schematic flow chart of a method for delivering information according to a first embodiment of the present invention. The method comprises the following steps:
in step S101, a seed population is obtained, and a feature weight value vector of the seed population is determined through a first preset model.
It can be understood that the seed population in the embodiment of the present invention refers to the population collected in a specific business scenario and having the same needs and interests in products and services, and the seed population is usually not large in number, generally less than one hundred thousand.
It is also conceivable that the seed population may be collected in a predetermined database, wherein the database data may be uploaded by the information provider of various types of information (such as advertisements) or may be obtained from the corresponding trading platform.
After the seed crowd is obtained, the seed crowd needs to be analyzed to determine a characteristic weight value vector of the seed crowd, for example, determining the characteristic weight value vector of the seed crowd through a first preset model may specifically include:
(1) determining a seed population as a first positive example sample;
(2) acquiring a non-seed population, and determining the non-seed population as a first negative sample;
It can be understood that, in order to find the users similar to the seed population within the overall user base (all users of the platform), the problem is converted into a classical binary (0, 1) classification problem, that is, one whose output has only two categories, for example: (positive/negative), (sick/not sick), (spam/not spam), (enemy/not enemy), etc.
In a binary classification model, the labeled samples in the training data are called positive samples; the seed population consists of positive samples labeled offline. The unlabeled samples in the training data are called negative samples, and they may include potential positive samples that have not yet been labeled.
For example, assuming that the seed population consists of users interested in a certain brand of tablet computer, the seed user package is defined as the positive sample for training the classification model, referred to here as the first positive example sample. A corresponding negative sample (referred to here as the first negative example sample) is then drawn from the overall user base, that is, from the large candidate population in the aforementioned preset database, for model training.
Further, for example, the first negative example sample may be defined as a group of users randomly sampled from the overall user base after all seed users have been removed, equal in number to the first positive example sample.
(3) Respectively determining a feature vector of a first positive sample and a feature vector of a first negative sample;
(4) importing the feature vector of the first positive sample and the feature vector of the first negative sample into a first preset model for training and learning to generate a first logistic regression classification model;
(5) and determining a corresponding characteristic weight value vector according to the first logistic regression classification model.
In the embodiment of the invention, the first positive example sample and the first negative example sample correspond to the two target values for model learning: interested or not interested in a certain brand of tablet computer.
Therefore, the first preset model can be understood as a classification model whose purpose is to normalize the estimate of whether a user in the overall user base is interested to the range [0, 1]. A threshold can then be chosen on the estimated probability to map the classification result to 0 or 1, thereby predicting whether each user is interested in a certain brand of tablet computer.
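The normalization and thresholding described above can be written compactly for a logistic regression model. This is a standard formulation offered as a sketch, not a formula taken from the patent: the symbols w (feature weight vector), x (a user's feature vector), b (bias term), and τ (threshold) are notation assumed here, and the text elsewhere uses the interval (0.5-1), which corresponds to τ = 0.5.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } P(y = 1 \mid \mathbf{x}) > \tau \\
0 & \text{otherwise}
\end{cases}
```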
That is, after the feature vectors are imported into the first preset model for training and learning, the model outputs a feature weight vector that discriminates between the first positive example sample and the first negative example sample.
It can be understood that, in the embodiment of the present invention, the feature vector refers to the feature set of a sample user cut from the global features of the overall user base, and the global features refer to the union of the user interest categories that users in the overall user base have selected across multiple social platforms. For example, the global features for Tencent's overall user base may specifically include QQ group interest categories, WeChat official account interest categories, advertisement commercial interest categories, mobile phone APP categories, and the like.
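One way to picture cutting a user's feature vector from the global features is a 0/1 encoding with one slot per global interest category. The sketch below is only illustrative: the category names and the function `to_feature_vector` are hypothetical, and the patent does not specify the encoding.

```python
# Hypothetical global feature list; the category names are illustrative only.
GLOBAL_FEATURES = [
    "qq_group:music",
    "qq_group:gaming",
    "official_account:tech",
    "ad_interest:tablets",
    "app_category:video",
]


def to_feature_vector(user_interests, global_features=GLOBAL_FEATURES):
    """Return a 0/1 vector with one slot per global feature."""
    interests = set(user_interests)
    return [1 if name in interests else 0 for name in global_features]


vector = to_feature_vector({"qq_group:music", "ad_interest:tablets"})
print(vector)  # [1, 0, 0, 1, 0]
```

Interests that are not part of the global feature list are simply ignored, so every user maps to a vector of the same length.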
Preferably, the classification model used in this embodiment is a Logistic Regression (LR) model, which is mature, general-purpose, and scales well to high-dimensional features. Other classification models, such as a Decision Tree (DT) or a Support Vector Machine (SVM), can readily be used instead; LR is only an example and should not be construed as a limitation of the present invention.
In step S102, the seed population is expanded based on the feature weight vector, and a corresponding candidate seed population is generated.
For example, the seed population expansion may be specifically as follows:
1. expanding the seed crowd according to the characteristic weight value vector to obtain an expanded crowd;
2. and determining the extended population meeting the preset condition as a candidate seed population.
Wherein, the extended population refers to the population with the same characteristics as the seed population, and the number of the extended population is usually several times of that of the seed population.
After the extended population is obtained, the part of it meeting the preset condition can be determined as a candidate seed population; for example, the candidate users whose predicted value falls in the (0.5-1) answer interval of the binary classification problem are determined as the candidate seed population.
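The expansion step can be sketched as scoring every user with the learned weight vector and keeping those whose predicted probability exceeds 0.5. This is a minimal illustration under assumptions: the function names, the bias term, and the toy weights are hypothetical, and the patent leaves the exact scoring rule open.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic score in [0, 1] for one user's feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expand_crowd(weights, bias, user_features, threshold=0.5):
    """Return the ids of users whose predicted probability exceeds the threshold.

    user_features maps a user id to that user's feature vector.
    """
    return {
        user_id
        for user_id, features in user_features.items()
        if predict_probability(weights, bias, features) > threshold
    }


# Toy weights and users, chosen only for illustration.
users = {"u1": [1, 0], "u2": [0, 1], "u3": [1, 1], "u4": [0, 0]}
print(sorted(expand_crowd([2.0, -1.0], -0.5, users)))  # ['u1', 'u3']
```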
In step S103, historical data of the candidate seed population about information delivery is counted, and an optimal candidate seed population is determined according to the historical data.
In the embodiment of the present invention, the historical data may specifically include click rate and/or transaction information;
For example, the historical data of the candidate seed population about information delivery may be counted as follows:
A. acquiring a database corresponding to a preset information delivery platform and/or a preset transaction platform;
B. and counting historical data of the candidate seed crowd about information delivery in the acquired database.
It is understood that the historical data includes click rate and/or transaction information. The click rate can be collected over a predetermined time period, such as a week or a month, and may mainly include the like rate (number of users who liked / number of users exposed), the dislike rate (number of users who disliked / number of users exposed), the comment rate (number of users who commented / number of users exposed), and the like. The transaction information may include the transaction volume, transaction value, and so on of the product or service.
There are many ways to determine the optimal candidate seed population. For example, the candidate seed population may be ranked according to the historical data, and the candidates that satisfy a preset rule based on that ranking are determined as the optimal candidate seed population.
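The three ratios defined above share the same denominator, the number of users exposed. A small sketch of the arithmetic follows; the function name and the sample counts are made up for illustration.

```python
def engagement_rates(exposed, liked, disliked, commented):
    """Compute like, dislike, and comment rates over the exposed users."""
    if exposed <= 0:
        raise ValueError("exposed must be a positive count")
    return {
        "like_rate": liked / exposed,
        "dislike_rate": disliked / exposed,
        "comment_rate": commented / exposed,
    }


# Illustrative counts for one candidate seed crowd over one week.
rates = engagement_rates(exposed=1000, liked=50, disliked=10, commented=5)
print(rates)  # {'like_rate': 0.05, 'dislike_rate': 0.01, 'comment_rate': 0.005}
```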
In step S104, a feature weight value vector of the optimal candidate seed population is determined through a second preset model.
In step S105, based on the feature weight vector of the optimal candidate seed population, the optimal candidate seed population is expanded to generate a target population, so as to deliver information to the target population.
Steps S104 and S105 may specifically be as follows:
Based on the optimal candidate seed crowd, a second round of model training and evaluation and a second round of crowd expansion are performed to determine the target crowd. For example, determining the feature weight value vector of the optimal candidate seed crowd may specifically include:
a. determining the optimal candidate seed population as a second positive example sample;
b. determining a second negative example sample from the population of candidate seeds;
c. respectively determining a feature vector of a second positive sample and a feature vector of a second negative sample;
d. importing the feature vector of the second positive sample and the feature vector of the second negative sample into a second preset model for training and learning to generate a second logistic regression classification model;
e. and determining a corresponding characteristic weight value vector according to the second logistic regression classification model.
It can be understood that, in the secondary training model and the evaluation, the process of determining the feature weight value vector of the optimal candidate seed population may refer to the process of determining the feature weight value vector of the seed population, and in the secondary population expansion, the process of expanding the optimal candidate seed population may specifically refer to the process of expanding the seed population, which is not described herein again.
It can be seen from the above description that, in the information delivery method provided in this embodiment, the characteristic weight value vector of the seed population is determined through model training, and the seed population is then expanded based on the characteristic weight value vector to generate the candidate seed population. The optimal candidate seed population is determined from the historical data of the candidate seed population about information delivery, and a secondary expansion is performed on the optimal candidate seed population to determine the target population for information delivery. Low-quality seed users that may be redundant or unreliable are thereby filtered out, so that the amount of data computation is greatly reduced, the target population is determined more accurately, and the advertisement delivery effect is improved.
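The overall control flow of steps S101 to S105 can be sketched as a function that takes the three building blocks as parameters. Everything here is an assumption made for illustration: the callables `train`, `expand`, and `select_best` are hypothetical placeholders, and expanding over the full user base in the second round is one reading of the text, not something the patent states explicitly.

```python
def two_stage_target_crowd(seed_users, all_users, train, expand, select_best):
    """Run two rounds of training and expansion.

    train(positives, negative_pool) -> model (e.g. a feature weight vector)
    expand(model, users)            -> set of users the model accepts
    select_best(candidates)         -> subset judged best on historical data
    All three callables are hypothetical placeholders supplied by the caller.
    """
    seed_users, all_users = set(seed_users), set(all_users)

    # Round 1: learn from the raw seed crowd, then expand it.
    first_model = train(seed_users, all_users - seed_users)
    candidates = expand(first_model, all_users)

    # Filter: keep only candidates with the best delivery history.
    best = set(select_best(candidates))

    # Round 2: retrain on the filtered crowd, then expand again.
    second_model = train(best, candidates - best)
    return expand(second_model, all_users)


# Toy stand-ins: a "model" is just the set of interests seen in the positives.
interests = {"u1": "tablet", "u2": "tablet", "u3": "phone",
             "u4": "tablet", "u5": "phone", "u6": "laptop"}
clicks = {"u1": 5, "u2": 0, "u3": 0, "u4": 9, "u5": 1, "u6": 0}

target = two_stage_target_crowd(
    seed_users={"u1", "u5"},
    all_users=interests.keys(),
    train=lambda positives, pool: {interests[u] for u in positives},
    expand=lambda model, users: {u for u in users if interests[u] in model},
    select_best=lambda candidates: {u for u in candidates if clicks[u] >= 5},
)
print(sorted(target))  # ['u1', 'u2', 'u4']
```

In the toy run, the seed user u5 has a weak delivery history, so the second round no longer expands toward users who resemble u5.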
Second embodiment
The method according to the first embodiment will be described in further detail below by way of example.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for delivering information according to a second embodiment of the present invention. Because the past network-wide click rate, sales value, like rate, dislike rate, and so on of an extended crowd are important reference indexes for evaluating the quality of a seed crowd, the embodiment of the invention provides a method for selecting the best-quality seeds from the seed crowds and delivering advertisements to those seeds and their extended users. The method comprises the following steps:
in step S201, the information delivery system acquires a plurality of groups of seed groups.
First, multiple sets of seed populations are collected from different channels, such as offline CRM user data, online tag data from individual product data teams, and so on.
Specifically, for example, the seed population may be collected in a preset database, where the database data may be uploaded by an information owner of various information (such as an advertisement), or may be obtained from a corresponding trading platform, which is not limited herein.
In step S202, the information delivery system determines a first positive example and a first negative example.
In the embodiment of the present invention, it is assumed that the seed population consists of users interested in a certain brand of tablet computer. The multiple groups of seed population are defined as positive samples for training the classification model and, for ease of distinction, are referred to as first positive example samples. Correspondingly, after all seed populations have been removed from the overall user base, multiple groups of users, each equal in number to the corresponding positive example sample, are randomly sampled and defined as first negative example samples.
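Drawing a negative sample of the same size as the positive sample, after removing the seed users from the full user base, can be sketched as follows. The function name and the optional `rng` parameter are hypothetical; the patent only states that the sampling is random and size-matched.

```python
import random


def sample_negatives(all_users, seed_users, rng=None):
    """Randomly pick as many non-seed users as there are seed users."""
    rng = rng or random.Random()
    seeds = set(seed_users)
    # Remove every seed user before sampling; sort for a reproducible order.
    pool = sorted(set(all_users) - seeds)
    # If the pool is smaller than the seed crowd, take the whole pool.
    count = min(len(seeds), len(pool))
    return rng.sample(pool, count)


seeds = {3, 17, 42}
negatives = sample_negatives(range(100), seeds, rng=random.Random(0))
print(len(negatives), set(negatives).isdisjoint(seeds))  # 3 True
```

Passing a seeded `random.Random` makes the draw reproducible, which helps when comparing training runs.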
In step S203, the information delivery system determines feature weight vectors of a plurality of groups of seed groups through model training.
For example, the feature vectors of the multiple groups of first positive example samples and first negative example samples are imported into a first preset model for training and learning, creating multiple logistic regression (LR) classification models. Each model outputs a feature weight vector that discriminates between its first positive and first negative example samples. A discriminative feature is one that can distinguish the first positive example sample from the first negative example sample: if the first positive example sample is characterized by liking to listen to songs and the first negative example sample by disliking it, then listening to songs is a discriminative feature.
It can be understood that, in the embodiment of the present invention, the feature vector refers to the feature set of a sample user cut from the global features of the overall user base, and the global features refer to the union of the user interest categories that users in the overall user base have selected across multiple social platforms. For example, the global features for Tencent's overall user base may specifically include QQ group interest categories, WeChat official account interest categories, advertisement commercial interest categories, mobile phone APP categories, and the like.
For example, the seed population feature vector may be cut from the global features; preferably, all global features are used, with no manual selection. The non-seed population is obtained by removing the seed population from the overall user base and randomly sampling multiple groups of users equal in number to the positive example samples; their feature vectors are likewise cut from the global features. At this point, the feature vectors of both the seed population and the non-seed population are ready.
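To make the idea of a discriminative feature weight concrete, here is a minimal logistic regression trained by batch gradient descent in plain Python. It is a sketch under assumptions, not the patent's training procedure: the learning rate, epoch count, feature names, and toy data are all chosen for illustration. In the toy data the first feature ("likes listening to songs") separates the two samples, while the second is equally common in both.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(samples, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Columns: [likes_songs, uses_video_app]; label 1 = positive, 0 = negative.
samples = [[1, 1], [1, 0], [1, 1], [0, 1], [0, 0], [0, 1]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(samples, labels)
print([round(w, 2) for w in weights], round(bias, 2))
```

The learned weight on the first feature comes out much larger in magnitude than the weight on the second, which is what marks it as the discriminative one.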
In step S204, the information delivery system performs population expansion on the seed population to generate a plurality of sets of candidate seed populations.
According to the feature weight vectors output by the training in step S203, the global features of the users in the overall user base are scored with each vector to obtain multiple groups of extended crowds. There are many ways to expand; for example, the expansion may use every feature value or only the main feature values.
Furthermore, users with a predicted probability greater than 0.5 are regarded as candidate users, so that multiple groups of candidate seed crowds are output; that is, the candidate users falling in the (0.5-1) answer interval of the binary classification problem are determined as the candidate seed crowds.
In step S205, the information delivery system counts the historical data of the candidate seed population about information delivery.
Wherein the historical data comprises click rate and/or transaction information;
For example, for the multiple groups of candidate seed crowds output in step S204, the overall historical click rate and sales value are calculated, or the click rate, sales value, like rate, and dislike rate on a particular delivery platform and for the same type of information.
In step S206, the information delivery system determines an optimal candidate seed population.
With reference to the historical advertisement effect data of each group of candidate seed crowds, all candidate extended crowds are comprehensively ranked by click rate and sales value according to the requirements at hand, and the top-ranked users are selected as the optimal candidate seed crowd.
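A comprehensive ranking by click rate and sales value could, for instance, combine the two normalized metrics with adjustable weights. The sketch below ranks whole candidate crowds and is only one possible reading: the scoring formula, the weights, and the sample figures are assumptions, since the patent does not fix how the metrics are combined.

```python
def rank_candidate_crowds(stats, click_weight=0.5, sales_weight=0.5, top_k=1):
    """Rank crowds by a weighted sum of normalized click rate and sales value.

    stats maps a crowd name to a (click_rate, sales_value) pair.
    """
    # Normalize each metric by its maximum so the two are comparable.
    max_click = max(click for click, _ in stats.values()) or 1.0
    max_sales = max(sales for _, sales in stats.values()) or 1.0

    def score(item):
        click, sales = item[1]
        return click_weight * click / max_click + sales_weight * sales / max_sales

    ranked = sorted(stats.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Illustrative historical figures for three candidate seed crowds.
history = {
    "crowd_a": (0.02, 1000.0),
    "crowd_b": (0.05, 800.0),
    "crowd_c": (0.01, 200.0),
}
print(rank_candidate_crowds(history))  # ['crowd_b']
```

Changing the weights models the "different requirements" mentioned above: with `click_weight=0.0` and `sales_weight=1.0`, the same data ranks crowd_a first.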
In step S207, the information delivery system determines a feature weight vector of the optimal candidate seed population through secondary model training.
The method specifically comprises the following steps:
(1) determining a second positive example sample and a second negative example sample;
(2) respectively determining a feature vector of a second positive sample and a feature vector of a second negative sample;
(3) importing the feature vector of the second positive sample and the feature vector of the second negative sample into a second preset model for training and learning to generate a second logistic regression classification model;
(4) determining a corresponding feature weight value vector according to the second logistic regression classification model.
The optimal candidate seed population determined in step S206 is taken as the second positive example sample. The second negative example sample is defined as a group, equal in number to the second positive example sample, randomly sampled from the candidate seed population after the optimal candidate seed population has been removed.
For example, the feature vector of the optimal candidate seed population may be cut from the global features; preferably, all of the global features are used, without manual selection. The feature vectors of the non-optimal candidate seed population determined above are then cut from the global features in the same way. The feature vectors of both populations are imported into the second preset model, that is, the classification model, for training and learning, which outputs a group of feature weight value vectors that discriminate between the second positive and negative example samples.
Preferably, the classification model used in this embodiment is the logistic regression (LR) model, which is mature, general-purpose, and scales well to high-dimensional features. It is easy to see that other classification models, such as a decision tree (DT) or a support vector machine (SVM), may also be selected to implement the classification; these are only examples and are not meant to limit the invention.
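To make the training step concrete, the following sketch fits a logistic regression by plain batch gradient descent and returns the learned feature weights. It is an illustration under stated assumptions: the patent names logistic regression but not the optimizer, so the learning rate, epoch count, and function names are hypothetical, and a production system would more likely rely on an established library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(positives, negatives, learning_rate=0.5, epochs=500):
    """Fit weights and bias on positive (label 1) and negative (label 0) feature vectors."""
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in samples:
            # Gradient of the log loss for one sample is (prediction - label) * x.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


# Toy data: the first feature separates the two classes, the second does not.
positives = [[1, 0], [1, 1]]
negatives = [[0, 1], [0, 0]]
weights, bias = train_logistic_regression(positives, negatives)
```

The returned `weights` list plays the role of the feature weight value vector: features that distinguish the positive sample from the negative sample receive weights of larger magnitude.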
In step S208, the information delivery system performs secondary crowd expansion to generate a target crowd.
The global features of the large-disk users are scored according to the feature weight value vectors output by the training in the previous step, to obtain the extended population of the optimal candidate seed population. Extended users whose prediction probability is greater than 0.5 are then taken as the final target population for information delivery; that is, the extended users falling in the answer interval (0.5-1) of a binary classification problem are determined as the target population.
It can be understood that in scenarios requiring more precision, a third round of model training and a third population expansion may also be performed; these are not described here again.
As can be seen from the above, in the information delivery method provided in this embodiment, the feature weight value vector of the seed population is first determined through model training; the seed population is then expanded based on that vector to generate candidate seed populations; and the target population is determined with reference to the historical data of the candidate seed populations about information delivery, so that redundant, low-reliability, or otherwise non-high-quality seed users that may be present are filtered out. In other words, the invention uses the historical information-delivery performance of the extended users, including click rate and transaction value, as important reference indicators for evaluating the input-output ratio, which greatly reduces the amount of data computation, allows the target population to be determined more accurately, and improves the advertisement delivery effect. Further, secondary model training and secondary population expansion are performed on the optimal candidate seed population to determine the optimal target population. For example, with this method the finally defined advertisement delivery target users are extended from the optimal seed population, which helps ensure that the selected target users are the best result and that advertisements can be delivered to them accurately.
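The overall flow summarized above can be written as a short skeleton in which each stage is passed in as a function. This is only a structural sketch under simplifying assumptions: the candidate seed population is treated as a single flat list rather than several groups, and the parameter names `train`, `expand`, `pick_best`, and `sample_negatives` are hypothetical placeholders for whatever concrete implementations are used.

```python
def two_stage_targeting(seed_users, all_users, train, expand, pick_best, sample_negatives):
    """Run train -> expand -> filter by history -> retrain -> expand again."""
    # Stage 1: learn what distinguishes seed users from the rest of the pool.
    first_negatives = sample_negatives(all_users, seed_users)
    first_model = train(seed_users, first_negatives)
    candidates = expand(first_model, all_users)

    # Filter: keep the candidates with the best historical delivery performance.
    best = pick_best(candidates)

    # Stage 2: learn what distinguishes the best candidates from the other candidates.
    second_negatives = sample_negatives(candidates, best)
    second_model = train(best, second_negatives)
    return expand(second_model, all_users)
```

Passing the stages in as arguments keeps the skeleton independent of the model choice, so the logistic regression could be swapped for a decision tree or an SVM without changing the flow.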
Third embodiment
In order to better implement the information delivery method provided by the embodiment of the invention, the embodiment of the invention also provides a system based on the information delivery method. The terms have the same meanings as in the information delivery method above, and specific implementation details can be found in the description of the method embodiments.
Referring to fig. 3a, fig. 3a is a schematic structural diagram of an information delivery system according to an embodiment of the present invention, where the information delivery system may include an obtaining unit 301, a first determining unit 302, an expanding unit 303, an optimal crowd determining unit 304, a second determining unit 305, and a delivering unit 306.
The obtaining unit 301 is configured to acquire a seed population; the first determining unit 302 is configured to determine, according to the seed population, a feature weight value vector of the seed population through a first preset model.
It can be understood that the seed population in the embodiment of the present invention refers to the population collected in a specific business scenario and having the same needs and interests in products and services, and the seed population is usually not large in number, generally less than one hundred thousand.
It is also conceivable that the seed population may be collected in a predetermined database, wherein the database data may be uploaded by the information provider of various types of information (such as advertisements) or may be obtained from the corresponding trading platform.
An expanding unit 303, configured to expand the seed population based on the feature weight value vector to generate a corresponding candidate seed population; an optimal population determining unit 304, configured to count historical data of the candidate seed population about information delivery, and determine an optimal candidate seed population according to the historical data, where the historical data includes a click rate.
Then, the second determining unit 305 is configured to determine a feature weight value vector of the optimal candidate seed population through a second preset model, and the delivering unit 306 is configured to expand the optimal candidate seed population based on the feature weight value vector of the optimal candidate seed population to generate a target population, so as to deliver information to the target population.
Referring to fig. 3b, for another schematic structural diagram of the information delivery system provided in the embodiment of the present invention, the first determining unit 302 may specifically include:
(1) a first determining subunit 3021, configured to determine the seed population as a first positive example sample;
(2) a second determining subunit 3022, configured to obtain a non-seed population, and determine the non-seed population as a first negative example sample;
For example, assuming that the seed population consists of users interested in a certain brand of tablet computer, the seed user package is defined as the positive sample for training the classification model, referred to as the first positive example sample. A corresponding negative sample (referred to here as the first negative example sample) is then found among the large-disk users, that is, the large candidate population in the aforementioned preset database, for model training and learning.
Further, for example, the first negative example sample may be defined as a group of users, equal in number to the first positive example sample, randomly sampled from the large-disk users after all seed users have been removed.
(3) A third determining subunit 3023, configured to determine a feature vector of the first positive example and a feature vector of the first negative example, respectively;
(4) a first model generation subunit 3024, configured to introduce the feature vector of the first positive example sample and the feature vector of the first negative example sample into a first preset model for training and learning, so as to generate a first logistic regression classification model;
(5) a fourth determining subunit 3025, configured to determine corresponding feature weight value vectors according to the first logistic regression classification model.
It can be understood that, in order to find a user group similar to the seed population among the large-disk users, the problem is converted into a classical binary classification (0, 1) problem, that is, one whose output has only two categories, for example: positive/negative, sick/not sick, spam/non-spam, enemy/non-enemy.
In the embodiment of the invention, the first positive example sample and the first negative example sample correspond to the two target values for model learning: interested or not interested in a certain brand of tablet computer.
Therefore, the first preset model can be understood as a classification model whose purpose is to normalize the estimate of whether a large-disk user is interested to the range [0, 1]; a threshold can then be chosen on the estimated probability to map the classification result to 0 or 1, thereby predicting whether each large-disk user is interested in a certain brand of tablet computer.
That is, after the first preset model is imported for training and learning, the feature weight vectors having the discrimination for the first positive sample and the first negative sample can be output.
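In the usual formulation of logistic regression, the mapping described above can be written as follows. The symbols are notation introduced here for illustration and do not appear in the original text: x is a user's feature vector, w the feature weight value vector, b a bias term, and τ the threshold (0.5 in the examples of this document).

```latex
p(x) = \sigma\left(w^{\top} x + b\right) = \frac{1}{1 + e^{-(w^{\top} x + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } p(x) > \tau \\
0 & \text{otherwise}
\end{cases}
```

Here p(x) is the estimated probability that the user is interested, and the predicted label is 1 (interested) or 0 (not interested).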
It can be understood that, in the embodiment of the present invention, the feature vector refers to the feature set of a sample user cut from the global features of the large-disk users, and the global features refer to the union of the user interest categories collected for large-disk users across multiple social platforms. For example, the global features of Tencent large-disk users may include QQ group interest categories, WeChat official account interest categories, advertisement commercial interest categories, mobile phone APP categories, and the like.
Preferably, the classification model used in this embodiment is the logistic regression (LR) model, which is mature, general-purpose, and scales well to high-dimensional features. It is easy to see that other classification models, such as a decision tree (DT) or a support vector machine (SVM), may also be selected to implement the classification; these are only examples and are not meant to limit the invention.
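One simple way to picture a feature vector built from interest categories is a multi-hot encoding, with one position per category and a 1 wherever the user has that interest. The sketch below is an assumption about representation only; the category strings and the names `build_vocabulary` and `encode_interests` are invented for illustration.

```python
def build_vocabulary(user_interests):
    """Assign a fixed vector position to every interest category seen across all users."""
    categories = sorted({c for interests in user_interests.values() for c in interests})
    return {category: index for index, category in enumerate(categories)}


def encode_interests(interests, vocabulary):
    """Multi-hot vector: 1 where the user has the category, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    for category in interests:
        if category in vocabulary:
            vector[vocabulary[category]] = 1
    return vector


# Hypothetical interest categories gathered from several platforms.
user_interests = {
    "u1": {"qq_group:photography", "app:games"},
    "u2": {"app:games", "wechat_account:travel"},
}
vocabulary = build_vocabulary(user_interests)
vectors = {u: encode_interests(i, vocabulary) for u, i in user_interests.items()}
```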
Based on this, the extension unit 303 may be specifically configured to:
expanding the seed crowd according to the characteristic weight value vector to obtain an expanded crowd; and determining the extended population meeting the preset condition as a candidate seed population.
Here, the extended population refers to users sharing the characteristics of the seed population; it is usually several times larger than the seed population.
After the extended population is obtained, the extended population meeting the preset condition can be determined as the candidate seed population; for example, the candidate users falling in the answer interval (0.5-1) of a binary classification problem are determined as the candidate seed population.
Further, to count the historical data of the candidate seed population about information delivery, the optimal population determining unit 304 may be specifically configured to:
acquire a database corresponding to a preset information delivery platform and/or a preset transaction platform, and count the historical data of the candidate seed population about information delivery in that database.
It can be understood that the historical data includes click rate and/or transaction information. The click rate may be collected over a predetermined time period, such as a week or a month, and may mainly include a like rate (number of users who like / number of users exposed), a dislike rate (number of users who dislike / number of users exposed), a comment rate (number of users who comment / number of users exposed), and the like. The transaction information may include the transaction amount, transaction value, and similar information for the product or service.
It can be understood that there are many ways to determine the optimal candidate seed population, and preferably, the optimal population determining unit 304 may be further configured to sort the candidate seed population according to the historical data; and determining the candidate seed crowd meeting the preset rule as the optimal candidate seed crowd based on the sorting.
Based on the optimal candidate seed population, secondary model training and evaluation and secondary population expansion are performed to determine the target population. For example, the second determining unit 305, which determines the feature weight value vector of the optimal candidate seed population, may specifically include:
a fifth determining subunit 3051, configured to determine the optimal candidate seed population as a second positive example;
a sixth determining subunit 3052, configured to determine a second negative example sample from the candidate seed population;
a seventh determining subunit 3053, configured to determine a feature vector of the second positive example and a feature vector of the second negative example, respectively;
the second model generation subunit 3054 is configured to introduce the feature vector of the second positive example and the feature vector of the second negative example into a second preset model for training and learning, so as to generate a second logistic regression classification model;
an eighth determining subunit 3055, configured to determine, according to the second logistic regression classification model, a corresponding feature weight value vector.
It can be understood that, in the secondary training model and the evaluation, the process of determining the feature weight value vector of the optimal candidate seed population may refer to the process of determining the feature weight value vector of the seed population, and in the secondary population expansion, the process of expanding the optimal candidate seed population may specifically refer to the process of expanding the seed population, which is not described herein again.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
The information delivery system may be specifically integrated in a network device such as a server.
It can be seen from the above description that, in the information delivery system provided in this embodiment, the characteristic weight value vector of the seed population is determined through model training, then the seed population is expanded based on the characteristic weight value vector to generate candidate seed population, the optimal candidate seed population is determined by combining historical data of the candidate seed population about information delivery, secondary expansion is performed on the optimal candidate seed population to determine the target population for information delivery, and non-high-quality seed users with redundancy or low reliability and the like, which may exist, are filtered, so that not only is the data calculation amount greatly reduced, but also the target population can be determined more accurately, and the advertisement delivery effect is improved.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and a part which is not described in detail in a certain embodiment may refer to the above detailed description of the information delivery method, and is not described herein again.
The information delivery system provided in the embodiment of the present invention is, for example, a computer, a tablet computer, a mobile phone with a touch function, and the like, and the information delivery system and the information delivery method in the above embodiments belong to the same concept, and any method provided in the information delivery method embodiment may be operated on the information delivery system, and a specific implementation process thereof is described in the information delivery method embodiment, and is not described herein again.
It should be noted that, for the information delivery method of the present invention, it can be understood by a person skilled in the art that all or part of the process of implementing the information delivery method of the embodiment of the present invention can be completed by controlling related hardware through a computer program, where the computer program can be stored in a computer readable storage medium, such as a memory of a terminal, and executed by at least one processor in the terminal, and during the execution process, the process of the embodiment of the information delivery method can be included. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
For the information delivery system according to the embodiment of the present invention, each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, or the like.
The information delivery method and system provided by the embodiment of the present invention are described in detail above, and the principle and the implementation manner of the present invention are explained in the present document by applying specific examples, and the description of the above embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (11)

1. A method for delivering information, comprising:
acquiring a seed population, and determining the seed population as a first positive example sample;
acquiring a non-seed population, and determining the non-seed population as a first negative sample;
respectively determining a feature vector of the first positive sample and a feature vector of the first negative sample;
importing the feature vector of the first positive sample and the feature vector of the first negative sample into a first preset model for training and learning to generate a first logistic regression classification model;
determining a corresponding feature weight value vector according to the first logistic regression classification model;
expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd;
counting historical data of the candidate seed crowd about information delivery, and determining the optimal candidate seed crowd according to the historical data, wherein the historical data comprises click rate;
determining a characteristic weight value vector of the optimal candidate seed population through a second preset model;
and expanding the optimal candidate seed crowd based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd so as to deliver information to the target crowd.
2. The method of claim 1, wherein the expanding the seed population based on the feature weight value vector to generate a corresponding candidate seed population comprises:
expanding the seed crowd according to the characteristic weight value vector to obtain an expanded crowd;
and determining the extended population meeting the preset condition as a candidate seed population.
3. The method for delivering information according to claim 1 or 2, wherein said statistics of historical data of said candidate seed population about information delivery comprises:
acquiring a database corresponding to a preset information delivery platform and/or a preset transaction platform;
and counting historical data of the candidate seed crowd about information delivery in the database.
4. The method for delivering information according to claim 3, wherein said determining an optimal candidate seed population based on said historical data comprises:
sorting the candidate seed population according to the historical data;
and determining the candidate seed crowd meeting the preset rule as the optimal candidate seed crowd based on the sorting.
5. The information delivery method according to claim 4, wherein the determining the feature weight value vector of the optimal candidate seed population through the second preset model comprises:
determining the optimal candidate seed population as a second positive example sample;
determining a second negative example sample from the candidate seed population;
respectively determining a feature vector of the second positive sample and a feature vector of the second negative sample;
importing the feature vector of the second positive sample and the feature vector of the second negative sample into a second preset model for training and learning to generate a second logistic regression classification model;
and determining a corresponding characteristic weight value vector according to the second logistic regression classification model.
6. A system for delivering information, comprising:
the acquisition unit is used for acquiring seed population;
the first determining unit is used for determining a characteristic weight value vector of the seed crowd through a first preset model;
the expansion unit is used for expanding the seed crowd based on the characteristic weight value vector to generate a corresponding candidate seed crowd;
the optimal population determining unit is used for counting historical data of the candidate seed population about information delivery, and determining the optimal candidate seed population according to the historical data, wherein the historical data comprises click rate;
the second determining unit is used for determining a characteristic weight value vector of the optimal candidate seed crowd through a second preset model;
the releasing unit is used for expanding the optimal candidate seed crowd based on the characteristic weight value vector of the optimal candidate seed crowd to generate a target crowd so as to release information to the target crowd;
wherein the first determination unit includes:
a first determining subunit, configured to determine the seed population as a first positive example;
the second determining subunit is used for acquiring a non-seed population and determining the non-seed population as a first negative example sample;
a third determining subunit, configured to determine a feature vector of the first positive example and a feature vector of the first negative example, respectively;
the first model generation subunit is configured to introduce the feature vector of the first positive example sample and the feature vector of the first negative example sample into a first preset model for training and learning, and generate a first logistic regression classification model;
and the fourth determining subunit is used for determining corresponding feature weight value vectors according to the first logistic regression classification model.
7. The information delivery system according to claim 6, wherein the extension unit is specifically configured to:
expanding the seed crowd according to the characteristic weight value vector to obtain an expanded crowd; and determining the extended population meeting the preset condition as a candidate seed population.
8. The information delivery system according to claim 6 or 7, wherein the optimal population determining unit is specifically configured to:
and acquiring a database corresponding to a preset information release platform and/or a preset transaction platform, and counting historical data of the candidate seed population about information release in the database.
9. The information delivery system according to claim 8, wherein the optimal population determining unit is specifically configured to rank the candidate seed populations according to the historical data; and determining the candidate seed crowd meeting the preset rule as the optimal candidate seed crowd based on the sorting.
10. The information delivery system according to claim 9, wherein the second determination unit comprises:
a fifth determining subunit, configured to determine the optimal candidate seed population as a second positive example;
a sixth determining subunit, configured to determine a second negative example sample from the candidate seed population;
a seventh determining subunit, configured to determine a feature vector of the second positive example and a feature vector of the second negative example respectively;
the second model generation subunit is configured to introduce the feature vector of the second positive example sample and the feature vector of the second negative example sample into a second preset model for training and learning, so as to generate a second logistic regression classification model;
and the eighth determining subunit is configured to determine a corresponding feature weight value vector according to the second logistic regression classification model.
11. A storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of the method according to any of claims 1-5.
CN201510768448.2A 2015-11-12 2015-11-12 Information delivery method and system Active CN105427129B (en)


Publications (2)

Publication Number Publication Date
CN105427129A CN105427129A (en) 2016-03-23
CN105427129B true CN105427129B (en) 2020-09-04

Family

ID=55505314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510768448.2A Active CN105427129B (en) 2015-11-12 2015-11-12 Information delivery method and system

Country Status (1)

Country Link
CN (1) CN105427129B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456199A (en) * 2010-10-18 2012-05-16 北京学之途网络科技有限公司 Method and device for expanding internet user sample set and acquiring attribute parameter
CN103136267A (en) * 2011-12-01 2013-06-05 腾讯科技(深圳)有限公司 Expert mining method and device based on community
CN104751354A (en) * 2015-04-13 2015-07-01 合一信息技术(北京)有限公司 Advertisement cluster screening method
CN104834641A (en) * 2014-02-11 2015-08-12 腾讯科技(北京)有限公司 Processing method of network medium information and relevant system
CN105023165A (en) * 2014-04-17 2015-11-04 腾讯科技(深圳)有限公司 Method, device and system for controlling release tasks in social networking platform

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7650317B2 (en) * 2006-12-06 2010-01-19 Microsoft Corporation Active learning framework for automatic field extraction from network traffic
CN100461182C (en) * 2007-05-24 2009-02-11 北京交通大学 Interactive video searching method based on multi-view angle
WO2011148571A1 (en) * 2010-05-24 2011-12-01 日本電気株式会社 Information extraction system, method, and program
CN103105924B (en) * 2011-11-15 2015-09-09 中国科学院深圳先进技术研究院 Man-machine interaction method and device
CN103150578A (en) * 2013-04-09 2013-06-12 山东师范大学 Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning
CN104537383A (en) * 2015-01-20 2015-04-22 全国组织机构代码管理中心 Massive organizational structure data classification method and system based on particle swarm

Also Published As

Publication number Publication date
CN105427129A (en) 2016-03-23

Similar Documents

Publication Publication Date Title
CN105427129B (en) Information delivery method and system
CN110097066B (en) User classification method and device and electronic equipment
CN106355449B (en) User selection method and device
CN108280670B (en) Seed crowd diffusion method and device and information delivery system
CN109446430B (en) Product recommendation method and device, computer equipment and readable storage medium
WO2017202006A1 (en) Data processing method and device, and computer storage medium
US9367603B2 (en) Systems and methods for behavioral segmentation of users in a social data network
AU2017101862A4 (en) Collaborative filtering method, apparatus, server and storage medium in combination with time factor
US10262336B2 (en) Non-converting publisher attribution weighting and analytics server and method
US11243992B2 (en) System and method for information recommendation
CN110472154B (en) Resource pushing method and device, electronic equipment and readable storage medium
CN105023165A (en) Method, device and system for controlling release tasks in social networking platform
WO2019169978A1 (en) Resource recommendation method and device
CN111177559B (en) Text travel service recommendation method and device, electronic equipment and storage medium
CN110135912B (en) Information pushing method and device, server and storage medium
CN107291755B (en) Terminal pushing method and device
US20140147048A1 (en) Document quality measurement
CN111027838B (en) Crowd-sourced task pushing method, device, equipment and storage medium thereof
CN107634897A (en) Group recommendation method and apparatus
CN115577316A (en) User personality prediction method based on multi-mode data fusion and application
CN108133058B (en) Video retrieval method
CN103700004A (en) Method and device for pushing microblog advertising service information
CN111126071A (en) Method and device for determining questioning text data and data processing method of customer service group
CN108090193B (en) Abnormal text recognition method and device
WO2019242453A1 (en) Information processing method and device, storage medium, and electronic device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant