CN108647990A - Method, apparatus and electronic device for determining target users - Google Patents

Method, apparatus and electronic device for determining target users

Info

Publication number
CN108647990A
Authority
CN
China
Prior art keywords
sample
feature
user
seed user
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810297028.4A
Other languages
Chinese (zh)
Other versions
CN108647990B (en)
Inventor
吴健君
张鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810297028.4A
Publication of CN108647990A
Application granted
Publication of CN108647990B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0251: Targeted advertisements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0277: Online advertisement

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a method, an apparatus and an electronic device for determining target users. The method includes: for each feature in a first feature set, calculating a first proportion of seed user samples that have the feature and a second proportion of non-seed user samples that have the feature; generating a negative sample set from the multiple non-seed user samples according to the relative magnitude of the first proportion and the second proportion of each feature; taking the multiple seed user samples as a positive sample set and training a preset logistic regression model with the positive sample set and the negative sample set; calculating a sample value for each non-seed user sample from its third feature vector and the trained logistic regression model; and finally, in descending order of sample value, selecting the top-ranked non-seed user samples, as many as the target user quantity, and taking the corresponding non-seed users as target users. In this way, suitable target users can be determined from a small number of seed users provided by an advertiser.

Description

Method, apparatus and electronic device for determining target users
Technical field
The present invention relates to the technical field of advertisement delivery, and in particular to a method, an apparatus and an electronic device for determining target users.
Background
Currently, delivering advertisements on websites is a business model used by all major Internet companies. Each of these companies has its own ad delivery platform, through which an advertiser can submit its advertising requirement. The platform then finds target users according to the advertiser's advertising requirement and delivers advertisements to those target users.
Specifically, when submitting an advertising requirement to the ad delivery platform, an advertiser can provide seed users. The platform uses these seed users to find target users who match the advertising requirement, and then delivers the advertisement corresponding to that requirement to the target users.
However, in the course of implementing the present invention, the inventors found that the prior art has at least the following problem: when the advertiser provides only a small number of seed users, the prior art cannot determine suitable target users.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method, an apparatus and an electronic device for determining target users, so that suitable target users can be determined from a small number of seed users provided by an advertiser. The specific technical solutions are as follows:
In one aspect, an embodiment of the present invention provides a method for determining target users, the method including:
Obtaining a first feature set, multiple seed user samples and multiple non-seed user samples;
For each feature in the first feature set, calculating a first proportion, which is the proportion of seed user samples having the feature among the multiple seed user samples, and a second proportion, which is the proportion of non-seed user samples having the feature among the multiple non-seed user samples;
Generating a second feature set or a third feature set according to the relative magnitude of the first proportion and the second proportion of each feature; and, according to the second feature set or the third feature set, selecting first non-seed user samples from the multiple non-seed user samples to generate a negative sample set;
Taking the multiple seed user samples as a positive sample set; obtaining a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set; and training a preset logistic regression model to obtain a trained logistic regression model;
Obtaining a third feature vector of each non-seed user sample among the multiple non-seed user samples, and calculating a sample value of each non-seed user sample from its third feature vector and the trained logistic regression model;
Obtaining a target user quantity; among the multiple non-seed user samples, in descending order of sample value, selecting the top-ranked non-seed user samples, as many as the target user quantity; and taking the non-seed users corresponding to the selected samples as target users.
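The final selection step can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_target_users`, the user identifiers and the sample values are hypothetical, and the sample values are assumed to have already been produced by the trained model.

```python
def select_target_users(sample_values, target_count):
    """Return the IDs of the target_count non-seed users with the largest sample values.

    sample_values maps a (hypothetical) user ID to the value computed by the trained model.
    """
    ranked = sorted(sample_values.items(), key=lambda item: item[1], reverse=True)
    return [user_id for user_id, _ in ranked[:target_count]]


# Hypothetical sample values for four non-seed users.
values = {"u5": 0.12, "u7": 0.81, "u9": 0.64, "u10": 0.33}
top_two = select_target_users(values, 2)
```

If fewer non-seed users exist than the requested quantity, the slice simply returns all of them in ranked order.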
Optionally, before obtaining the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method for determining target users of the embodiment of the present invention further includes:
Obtaining first features of the multiple seed user samples and second features of the multiple non-seed user samples, and establishing the first feature set from the first features and the second features, wherein no feature is repeated in the first feature set.
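As a minimal sketch, the duplicate-free first feature set can be built as the union of all features seen in either group of samples. The representation of each sample as a Python set of feature strings and the name `build_first_feature_set` are assumptions made for illustration.

```python
def build_first_feature_set(seed_samples, non_seed_samples):
    """Union of all features found in seed and non-seed samples; a set holds no duplicates."""
    features = set()
    for sample in list(seed_samples) + list(non_seed_samples):
        features.update(sample)
    return features


# Hypothetical samples, each a set of feature strings.
seed_samples = [{"38", "male"}, {"40", "female"}]
non_seed_samples = [{"38", "female"}, {"50", "male"}]
first_feature_set = build_first_feature_set(seed_samples, non_seed_samples)
```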
Optionally, after obtaining the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method for determining target users of the embodiment of the present invention further includes:
Encoding each feature in the first feature set to obtain an encoded first feature set;
Correspondingly, calculating, for each feature in the first feature set, the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples includes:
For each feature in the encoded first feature set, calculating the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
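The text does not say which encoding is used. One simple possibility, shown here purely as an assumption, is to assign each distinct feature an integer code; the function name `encode_features` and the `key=value` feature strings are hypothetical.

```python
def encode_features(first_feature_set):
    """Assign each feature a stable integer code (sorted order keeps the mapping deterministic)."""
    return {feature: code for code, feature in enumerate(sorted(first_feature_set))}


codes = encode_features({"gender=male", "age=38", "city=Beijing"})
# A sample's features can then be stored as integer codes instead of strings.
encoded_sample = sorted(codes[f] for f in {"age=38", "gender=male"})
```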
Optionally, generating the second feature set or the third feature set according to the relative magnitude of the first proportion and the second proportion of each feature, and selecting first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set, includes:
For each feature in the first feature set, when the first proportion of the feature is less than its second proportion, adding the feature to the second feature set, thereby obtaining a second feature set containing multiple features;
Obtaining the features of the multiple non-seed user samples; among these features, selecting third features that are present in the second feature set; and, among the multiple non-seed user samples, selecting the non-seed user samples corresponding to the third features to generate the negative sample set.
Optionally, generating the second feature set or the third feature set according to the relative magnitude of the first proportion and the second proportion of each feature, and selecting first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set, includes:
For each feature in the first feature set, when the first proportion of the feature is greater than its second proportion, adding the feature to the third feature set, thereby obtaining a third feature set containing multiple features;
Obtaining the features of the multiple non-seed user samples; among these features, selecting fourth features that are not present in the third feature set; and, among the multiple non-seed user samples, selecting the non-seed user samples corresponding to the fourth features to generate the negative sample set.
In another aspect, an embodiment of the present invention further provides an apparatus for determining target users, the apparatus including:
An acquisition module, configured to obtain a first feature set, multiple seed user samples and multiple non-seed user samples;
A proportion calculation module, configured to calculate, for each feature in the first feature set, a first proportion of seed user samples having the feature among the multiple seed user samples and a second proportion of non-seed user samples having the feature among the multiple non-seed user samples;
A negative sample set generation module, configured to generate a second feature set or a third feature set according to the relative magnitude of the first proportion and the second proportion of each feature, and, according to the second feature set or the third feature set, to select first non-seed user samples from the multiple non-seed user samples to generate a negative sample set;
A training module, configured to take the multiple seed user samples as a positive sample set; to obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set; and to train a preset logistic regression model to obtain a trained logistic regression model;
A sample value calculation module, configured to obtain a third feature vector of each non-seed user sample among the multiple non-seed user samples, and to calculate a sample value of each non-seed user sample from its third feature vector and the trained logistic regression model;
A target user selection module, configured to obtain a target user quantity; to select, among the multiple non-seed user samples and in descending order of sample value, the top-ranked non-seed user samples, as many as the target user quantity; and to take the non-seed users corresponding to the selected samples as target users.
Optionally, the apparatus for determining target users of the embodiment of the present invention further includes:
A first feature set establishing module, configured to obtain first features of the multiple seed user samples and second features of the multiple non-seed user samples, and to establish the first feature set from the first features and the second features, wherein no feature is repeated in the first feature set.
Optionally, the apparatus for determining target users of the embodiment of the present invention further includes:
An encoding module, configured to encode each feature in the first feature set to obtain an encoded first feature set;
Correspondingly, the proportion calculation module is specifically configured to:
For each feature in the encoded first feature set, calculate the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
Optionally, the negative sample set generation module includes:
A second feature set generation submodule, configured to add, for each feature in the first feature set, the feature to the second feature set when the first proportion of the feature is less than its second proportion, thereby obtaining a second feature set containing multiple features;
A first negative sample set generation submodule, configured to obtain the features of the multiple non-seed user samples; to select, among these features, third features that are present in the second feature set; and to select, among the multiple non-seed user samples, the non-seed user samples corresponding to the third features to generate the negative sample set.
Optionally, the negative sample set generation module further includes:
A third feature set generation submodule, configured to add, for each feature in the first feature set, the feature to the third feature set when the first proportion of the feature is greater than its second proportion, thereby obtaining a third feature set containing multiple features;
A second negative sample set generation submodule, configured to obtain the features of the multiple non-seed user samples; to select, among these features, fourth features that are not present in the third feature set; and to select, among the multiple non-seed user samples, the non-seed user samples corresponding to the fourth features to generate the negative sample set.
In another aspect, an embodiment of the present invention further provides an electronic device, the electronic device including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement any one of the above methods for determining target users when executing the program stored in the memory.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute any one of the above methods for determining target users.
In another aspect, an embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any one of the above methods for determining target users.
With the method, apparatus and electronic device for determining target users provided by the embodiments of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, the first proportion of seed user samples having each feature among the multiple seed user samples and the second proportion of non-seed user samples having that feature among the multiple non-seed user samples are calculated for each feature in the first feature set. A second feature set or a third feature set is then generated according to the relative magnitude of the first and second proportions of each feature, and is used to generate the negative sample set. Because the negative sample set is generated from the relative magnitude of the two proportions, the negative sample set and the positive sample set can be used to train the preset logistic regression model. Once the trained logistic regression model is obtained, the sample value of each non-seed user sample can be calculated from its third feature vector and the trained model. The larger the sample value, the more likely the corresponding user is to become a target user. Therefore, the top-ranked non-seed user samples, as many as the target user quantity, can be selected from the multiple non-seed user samples in descending order of sample value, and the non-seed users corresponding to those samples are taken as target users, so that suitable target users are determined from a small number of seed users provided by an advertiser. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flowchart of a first implementation of the method for determining target users according to an embodiment of the present invention;
Fig. 2 is a flowchart of a second implementation of the method for determining target users according to an embodiment of the present invention;
Fig. 3 is a flowchart of a third implementation of the method for determining target users according to an embodiment of the present invention;
Fig. 4 is a flowchart of a fourth implementation of the method for determining target users according to an embodiment of the present invention;
Fig. 5 is a flowchart of a fifth implementation of the method for determining target users according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the apparatus for determining target users according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.
To solve the problem in the prior art, embodiments of the present invention provide a method, an apparatus and an electronic device for determining target users, so that suitable target users can be determined from a small number of seed users provided by an advertiser. The advertisement corresponding to the advertising requirement can then be delivered to those target users, which improves the advertising effect.
The method for determining target users of the embodiment of the present invention is described first. Fig. 1 is a flowchart of a first implementation of the method. As shown in Fig. 1, the method may include:
S110: obtain a first feature set, multiple seed user samples and multiple non-seed user samples.
The first feature set can be a pre-established feature set that stores multiple features. The features can include the user's age, the user's gender, the user's city, the user's viewing preferences and so on. The pre-established feature set can be built from features obtained by analyzing the features of historical users who have watched films.
In some examples, when an advertiser sends an advertising requirement to the ad delivery platform, it can send seed user samples to the platform at the same time. After receiving the seed user samples sent by the advertiser, the ad delivery platform can trigger a target user determination apparatus that applies the method for determining target users of the embodiment of the present invention, and the target user determination apparatus can obtain the multiple seed user samples from the ad delivery platform. Each seed user sample can include the identification information of the seed user, the features of the seed user, the feature vector of the seed user and so on.
In some examples, the ad delivery platform can establish a historical user database in advance. The historical user database can store the identification information of historical users, the feature information of historical users and so on. The target user determination apparatus can obtain the multiple non-seed user samples from this historical user database.
In some examples, after receiving the seed user information provided by the advertiser, the ad delivery platform can determine whether the number of seed users in the seed user information is less than a preset seed user threshold, and trigger the target user determination apparatus when the number of seed users is less than the preset seed user threshold.
In some examples, when an advertiser sends an advertising requirement to the ad delivery platform, it can instead send the identification information of the seed users at the same time. After receiving this identification information, the ad delivery platform looks up the corresponding historical users in the pre-established historical user database, takes the historical users matching the identification information as seed user samples, and sends the remaining historical users in the database to the target user determination apparatus as non-seed user samples. The target user determination apparatus thus obtains multiple seed user samples and multiple non-seed user samples.
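The last variant, splitting a historical user database by the advertiser's seed identifiers, can be sketched as follows. The in-memory dictionary standing in for the database, the user IDs and the name `split_samples` are all hypothetical.

```python
def split_samples(history_db, seed_ids):
    """Split historical users into seed samples (IDs supplied by the advertiser) and the rest."""
    seed_ids = set(seed_ids)
    seed = {uid: feats for uid, feats in history_db.items() if uid in seed_ids}
    non_seed = {uid: feats for uid, feats in history_db.items() if uid not in seed_ids}
    return seed, non_seed


# Hypothetical historical user database: user ID -> set of features.
history_db = {
    "u1": {"38", "male", "Beijing"},
    "u2": {"40", "female", "Guangzhou"},
    "u3": {"50", "male", "Shanghai"},
}
seed, non_seed = split_samples(history_db, ["u2"])
```

Seed identifiers that do not appear in the database are silently ignored in this sketch; a real platform would likely need to report them.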
In one possible implementation, the target user determination apparatus can be arranged inside the ad delivery platform, or it can be arranged independently of the ad delivery platform.
S120: for each feature in the first feature set, calculate the first proportion of seed user samples having the feature among the multiple seed user samples, and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
Specifically, after obtaining the first feature set, the multiple seed user samples and the multiple non-seed user samples, the target user determination apparatus can calculate, for each feature in the first feature set, the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
For example, assume that the first feature set obtained by the target user determination apparatus is: {age: 38, 40, 45, 47, 50; gender: male, female; city: Beijing, Guangzhou, Shanghai, Tianjin}. The seed user samples obtained are user sample 1, user sample 2, user sample 3 and user sample 4, and the non-seed user samples obtained are user sample 5, user sample 6, user sample 7, user sample 8, user sample 9 and user sample 10.
The features of user sample 1 are 38, male, Beijing; of user sample 2: 40, female, Guangzhou; of user sample 3: 45, male, Shanghai; of user sample 4: 47, female, Tianjin; of user sample 5: 50, male, Beijing; of user sample 6: 47, male, Guangzhou; of user sample 7: 40, female, Tianjin; of user sample 8: 50, male, Shanghai; of user sample 9: 38, female, Beijing; and of user sample 10: 45, male, Tianjin.
The target user determination apparatus can then determine that the seed user sample having the feature "38" is user sample 1 and that the non-seed user sample having the feature "38" is user sample 9. The first proportion of the feature "38", that is, the share of user sample 1 among the 4 seed user samples, is therefore 25%, and the second proportion, that is, the share of user sample 9 among the 6 non-seed user samples, is 16.7%. Similarly, the first proportion of seed user samples having the feature "male" among the 4 seed user samples is 50%, and the second proportion of non-seed user samples having the feature "male" among the 6 non-seed user samples is 66.7%.
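The worked example above can be reproduced with a short Python sketch. The data layout (each sample as a set of feature strings keyed by sample number) and the function name `proportion` are assumptions for illustration.

```python
# Example data from the text above; features are stored as plain strings.
SEED = {
    1: {"38", "male", "Beijing"},
    2: {"40", "female", "Guangzhou"},
    3: {"45", "male", "Shanghai"},
    4: {"47", "female", "Tianjin"},
}
NON_SEED = {
    5: {"50", "male", "Beijing"},
    6: {"47", "male", "Guangzhou"},
    7: {"40", "female", "Tianjin"},
    8: {"50", "male", "Shanghai"},
    9: {"38", "female", "Beijing"},
    10: {"45", "male", "Tianjin"},
}


def proportion(feature, samples):
    """Fraction of the given samples that have the feature."""
    holders = sum(1 for feats in samples.values() if feature in feats)
    return holders / len(samples)


first_38 = proportion("38", SEED)            # 1 of 4 seed samples
second_38 = proportion("38", NON_SEED)       # 1 of 6 non-seed samples
first_male = proportion("male", SEED)        # 2 of 4 seed samples
second_male = proportion("male", NON_SEED)   # 4 of 6 non-seed samples
```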
Through this step, the target user determination apparatus can calculate, for each feature in the first feature set, its first proportion among the multiple seed user samples and its second proportion among the multiple non-seed user samples. The negative sample set can then be selected in the subsequent steps.
S130: generate a second feature set or a third feature set according to the relative magnitude of the first proportion and the second proportion of each feature; and, according to the second feature set or the third feature set, select first non-seed user samples from the multiple non-seed user samples to generate a negative sample set.
In some examples, so that a negative sample set is available for training the preset logistic regression model, this step can generate a second feature set or a third feature set according to the relative magnitude of the first proportion and the second proportion of each feature in the first feature set, and then select first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set.
Specifically, when the first proportion of a feature is less than its second proportion, the feature is added to the second feature set; when the first proportion of a feature is greater than its second proportion, the feature is added to the third feature set.
For example, assume that the target user determination apparatus has calculated a first proportion of 25% and a second proportion of 16.7% for the feature "38"; a third feature set containing the feature "38" can then be generated. If the first proportion calculated for the feature "male" is 50% and its second proportion is 66.7%, a second feature set containing the feature "male" can be generated. First non-seed user samples can then be selected from the multiple non-seed user samples according to the third feature set or the second feature set to generate the negative sample set.
In some examples, the target user determination apparatus can generate only the second feature set, or only the third feature set. When only the second feature set is generated, the target user determination apparatus can select first non-seed user samples from the multiple non-seed user samples according to the second feature set to generate the negative sample set.
When only the third feature set is generated, the target user determination apparatus can select first non-seed user samples from the multiple non-seed user samples according to the third feature set to generate the negative sample set.
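A minimal Python sketch of this step follows. The text does not state how many matching features a non-seed sample needs in order to be selected, so the sketch assumes that a single matching feature is enough; that rule, the data layout and all function names are assumptions.

```python
SEED = {1: {"38", "male", "Beijing"}, 2: {"40", "female", "Guangzhou"},
        3: {"45", "male", "Shanghai"}, 4: {"47", "female", "Tianjin"}}
NON_SEED = {5: {"50", "male", "Beijing"}, 6: {"47", "male", "Guangzhou"},
            7: {"40", "female", "Tianjin"}, 8: {"50", "male", "Shanghai"},
            9: {"38", "female", "Beijing"}, 10: {"45", "male", "Tianjin"}}


def proportion(feature, samples):
    """Fraction of the given samples that have the feature."""
    return sum(1 for feats in samples.values() if feature in feats) / len(samples)


def split_features(features, seed, non_seed):
    """Second set: first proportion < second proportion. Third set: first > second."""
    second, third = set(), set()
    for feature in features:
        first_p, second_p = proportion(feature, seed), proportion(feature, non_seed)
        if first_p < second_p:
            second.add(feature)
        elif first_p > second_p:
            third.add(feature)
    return second, third


def negatives_from_second(non_seed, second):
    """Assumed rule: keep samples having at least one feature in the second feature set."""
    return sorted(uid for uid, feats in non_seed.items() if feats & second)


def negatives_from_third(non_seed, third):
    """Assumed rule: keep samples having at least one feature outside the third feature set."""
    return sorted(uid for uid, feats in non_seed.items() if feats - third)


all_features = set().union(*SEED.values(), *NON_SEED.values())
second_set, third_set = split_features(all_features, SEED, NON_SEED)
# With a second feature set holding only "male", as in the example in the text:
male_only_negatives = negatives_from_second(NON_SEED, {"male"})
```

On this small data set the full second feature set is broad enough that the assumed rule selects every non-seed sample, which suggests that a production system would need a stricter selection criterion than the one sketched here.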
Once the negative sample set has been generated in this way, it can be used in the subsequent steps to train the preset logistic regression model, and the trained logistic regression model can then be used to find target users.
S140: take the multiple seed user samples as a positive sample set; obtain the first sample label of the positive sample set, the first feature vector of each seed user sample in the positive sample set, the second sample label of the negative sample set and the second feature vector of each non-seed user sample in the negative sample set; and train the preset logistic regression model to obtain a trained logistic regression model.
In some examples, the target user determination apparatus can set a first sample label for the positive sample set and a second sample label for the negative sample set in advance. For example, the first sample label can be 1, and the second sample label can be 0 or -1.
In some examples, the multiple seed user samples can include the feature vector of each seed user sample, and the multiple non-seed user samples can include the feature vector of each non-seed user sample. The target user determination apparatus can therefore obtain the first feature vector of each seed user sample in the positive sample set and the second feature vector of each non-seed user sample in the negative sample set.
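The text does not describe how the feature vectors are built. The sketch below assumes a simple one-hot style encoding over a fixed feature vocabulary and uses the labels 1 and 0; the vocabulary order and all names are hypothetical.

```python
# Assumed fixed ordering of the features in the first feature set.
VOCABULARY = ["38", "40", "45", "47", "50", "male", "female",
              "Beijing", "Guangzhou", "Shanghai", "Tianjin"]


def to_vector(features):
    """One component per vocabulary entry: 1.0 if the sample has the feature, else 0.0."""
    return [1.0 if feature in features else 0.0 for feature in VOCABULARY]


def build_training_set(positive_samples, negative_samples):
    """Stack feature vectors and attach label 1 to positives and label 0 to negatives."""
    vectors = [to_vector(s) for s in positive_samples] + [to_vector(s) for s in negative_samples]
    labels = [1] * len(positive_samples) + [0] * len(negative_samples)
    return vectors, labels


positive = [{"38", "male", "Beijing"}, {"40", "female", "Guangzhou"}]
negative = [{"50", "male", "Beijing"}]
X, y = build_training_set(positive, negative)
```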
In some instances, above-mentioned logic of propositions regression model can be formula shown in formula (1):
Wherein, g (xi)=w0+w1xi1+…+wjxij…+wnxin, xijIndicate j-th of feature of i-th of user's sample to Amount.I >=1, n >=j >=1, n >=1.P(yk=1 | xi) indicate that the sample label of i-th of user's sample is 1 probability, k=1 or 0. The probability that the sample label of i-th of user's sample is 0 is formula (2):
It is assumed that the positive sample that above-mentioned target user's determining device is got concentrates seed user sample and negative sample to concentrate The sum of non-seed user's sample is m, due to mutual indepedent between m user's sample, the joint point of all user's samples Cloth is the product of each user's sample edge distribution, i.e. formula (3):
The target user determining device may then use a prior-art maximum likelihood estimation method to compute the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ of $g(x_i)$ such that $L(w)$ attains its maximum. Maximum likelihood estimation methods are not described further here.
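For orientation only, one common way to carry out such a maximization is sketched here; the text does not prescribe it. The sketch assumes the labels are 1 and 0, writes $p_i = 1/(1 + e^{-g(x_i)})$ for the predicted probability of label 1, sets $x_{i0} = 1$ so that $w_0$ is handled like the other parameters, and introduces a step size $\eta$ that is not part of the source.

$$\ell(w) = \ln L(w) = \sum_{i=1}^{m} \left[ y_i \, g(x_i) - \ln\left(1 + e^{g(x_i)}\right) \right]$$

$$\frac{\partial \ell}{\partial w_j} = \sum_{i=1}^{m} \left( y_i - p_i \right) x_{ij}, \qquad w_j \leftarrow w_j + \eta \, \frac{\partial \ell}{\partial w_j}$$

Since $\ell(w)$ is concave, repeating the update with a sufficiently small $\eta$ moves the parameters toward the maximum of $L(w)$ whenever a finite maximizer exists.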
In some examples, the target user determining device may substitute the first feature vectors, the first sample label, the second sample label and the second feature vectors into formula (3) and compute the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ that fit all the user samples, thereby obtaining the trained logistic regression model.
For example, suppose user samples 1, 2, 3 and 4 are the user samples in the positive sample set, and user samples 5, 6, 8 and 10 are the user samples in the negative sample set. The target user determining device can then obtain the feature vectors of user samples 1, 2, 3 and 4, and the feature vectors of user samples 5, 6, 8 and 10.
It then trains the preset logistic regression model with the feature vectors of user samples 1, 2, 3 and 4 in the positive sample set and the feature vectors of user samples 5, 6, 8 and 10 in the negative sample set, computing via formula (3) the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ that fit these 8 user samples, and thereby obtains the trained logistic regression model.
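The training step can be illustrated with a minimal sketch. It assumes labels 1 and 0, uses plain batch gradient ascent on the log-likelihood as one possible maximum likelihood method (the text leaves the method open), and works on made-up two-component feature vectors. The names `train`, `predict_proba`, `lr` and `epochs` are illustrative, not taken from the source.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """P(label = 1 | x) with g(x) = w[0] + w[1]*x[0] + ... + w[n]*x[n-1]."""
    g = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return sigmoid(g)

def train(vectors, labels, lr=0.5, epochs=2000):
    """Fit w by batch gradient ascent on the log-likelihood (assumed method)."""
    m, n = len(vectors), len(vectors[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(vectors, labels):
            err = y - predict_proba(w, x)  # derivative of the log-likelihood w.r.t. g(x)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / m for wj, gj in zip(w, grad)]
    return w

# Hypothetical feature vectors, not taken from the source.
positive_set = [[1, 0], [1, 1], [1, 0], [0, 1]]  # seed user samples, label 1
negative_set = [[0, 1], [0, 0], [1, 1], [0, 0]]  # selected non-seed samples, label 0

vectors = positive_set + negative_set
labels = [1] * len(positive_set) + [0] * len(negative_set)
weights = train(vectors, labels)
```

With this toy data, a vector such as `[1, 0]` (seen only among the positives) ends up with a predicted probability above 0.5, while `[0, 0]` (seen only among the negatives) ends up below 0.5.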
In this embodiment of the present invention, the seed user samples serve as the positive sample set, and the first non-seed user samples selected from the multiple non-seed user samples according to the second feature set or the third feature set serve as the negative sample set. Training the preset logistic regression model on these two sets yields a trained logistic regression model that can distinguish positive samples from negative samples, which improves the accuracy with which target users are selected in the subsequent steps.
S150, obtain the third feature vector of each non-seed user sample among the multiple non-seed user samples, and compute the sample value of each non-seed user sample from the third feature vector and the trained logistic regression model.
In some examples, in the trained logistic regression model, $P(y_k = 1 \mid x_i)$ denotes the probability that the sample label of the i-th user sample is 1. The larger this probability, the more suitable the corresponding user is as a target user.
To find target users among the multiple non-seed users, after obtaining the trained logistic regression model, the target user determining device can obtain the third feature vector of each non-seed user sample, use the third feature vector and the trained logistic regression model to compute the probability that the sample label of each non-seed user sample is 1, and take that probability as the sample value of the non-seed user sample. In this way the sample values of all non-seed user samples are obtained.
For example, after obtaining the trained logistic regression model, the target user determining device can use the feature vectors of user samples 5, 6, 7, 8, 9 and 10 to compute, for each of these user samples, the probability that its sample label is 1, and take each probability as the corresponding sample value. The sample values of user samples 5, 6, 7, 8, 9 and 10 are thereby obtained.
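As a rough illustration of this scoring step, the sketch below evaluates the probability of label 1 for each non-seed user sample and stores it as the sample value. The parameter values in `trained_w` and the feature vectors are hypothetical placeholders, and `sample_value` is an illustrative name.

```python
import math

def sample_value(w, x):
    """Sample value = P(label = 1 | x) under the trained model."""
    g = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-g))

# Hypothetical trained parameters (w0, w1, w2) and third feature vectors.
trained_w = [-1.0, 2.0, 0.5]
non_seed_vectors = {
    5: [1, 0],
    6: [0, 0],
    7: [1, 1],
}

# One sample value per non-seed user sample, keyed by sample number.
sample_values = {uid: sample_value(trained_w, x) for uid, x in non_seed_vectors.items()}
```

Every non-seed user sample is scored here, including those that were used in the negative sample set during training, which matches the example in the text where user samples 5 to 10 all receive a sample value.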
S160, obtain the target user quantity; among the multiple non-seed user samples, select, in descending order of sample value, the first non-seed user samples that satisfy the target user quantity; and take the non-seed users corresponding to the first non-seed user samples as the target users.
In some examples, the advertising request that the advertiser sends to the advertisement delivery platform may include the target user quantity, so the target user determining device can obtain the target user quantity from the advertisement delivery platform.
Specifically, after obtaining the target user quantity, the target user determining device can select, among the multiple non-seed user samples and in descending order of sample value, the first non-seed user samples corresponding to the target user quantity, and then take the non-seed users corresponding to those samples as the target users.
For example, suppose the target user determining device has computed the following sample values for the non-seed user samples: 0.65 for user sample 5, 0.3 for user sample 6, 0.55 for user sample 7, 0.4 for user sample 8, 0.75 for user sample 9 and 0.2 for user sample 10. In this step, these sample values of user samples 5 to 10 are obtained.
Suppose further that the target user quantity is 3. The target user determining device can then select, among these 6 non-seed user samples and in descending order of sample value, the sample values 0.75, 0.65 and 0.55, whose corresponding user samples are user sample 9, user sample 5 and user sample 7, respectively.
Finally, the non-seed users corresponding to user sample 9, user sample 5 and user sample 7 are determined to be the target users.
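The selection in this example amounts to sorting by sample value and keeping the top entries. A minimal sketch, using the sample values and the target user quantity from the example; the function name `select_targets` is illustrative.

```python
# Sample values from the example, keyed by user sample number.
sample_values = {5: 0.65, 6: 0.3, 7: 0.55, 8: 0.4, 9: 0.75, 10: 0.2}
target_user_quantity = 3

def select_targets(values, quantity):
    """Return the `quantity` sample numbers with the largest sample values."""
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[:quantity]

targets = select_targets(sample_values, target_user_quantity)
print(targets)  # [9, 5, 7]
```

For a very large number of non-seed user samples, `heapq.nlargest` from the standard library would avoid a full sort, at the cost of a slightly less direct reading.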
In some examples, after determining the target users, the target user determining device can send the identification information of the determined target users to the advertisement delivery platform, which can then deliver the advertisement to the terminal devices corresponding to that identification information.
With the method for determining target users of this embodiment of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, the following is calculated for each feature in the first feature set: the first proportion, i.e. the share of the multiple seed user samples that have the feature, and the second proportion, i.e. the share of the multiple non-seed user samples that have the feature. The second feature set or the third feature set is then generated according to the relative size of the first proportion and the second proportion of each feature, and the negative sample set is generated from it. Because the negative sample set is generated from this comparison, it can be used together with the positive sample set to train the preset logistic regression model. Once the trained logistic regression model is obtained, the sample value of each non-seed user sample can be computed from its third feature vector and the trained logistic regression model. The larger the sample value, the more likely the user is to become a target user. The first non-seed user samples corresponding to the target user quantity can therefore be selected in descending order of sample value, and the corresponding non-seed users taken as the target users, so that suitable target users can be determined from the small number of seed users provided by the advertiser.
In an optional embodiment of the present invention, on the basis of the method for determining target users shown in Fig. 1, a possible implementation is further provided. Fig. 2 is a flowchart of a second embodiment of the method for determining target users. In Fig. 2, before S110, obtaining the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method may further include:
S170, obtain the first features of the multiple seed user samples and the second features of the multiple non-seed user samples, and establish the first feature set from the first features and the second features.
No feature in the first feature set is repeated.
In some examples, the advertisement delivery platform can perform feature analysis on historical users who watched films, or on historical users who clicked historical advertisements, to obtain the features of the historical users and then establish the first feature set.
In some examples, when the seed user samples and the non-seed user samples are all user samples in the historical user database of the advertisement delivery platform, the target user determining device can first obtain the first features of the seed user samples and the second features of the non-seed user samples from that database, and then establish the first feature set from the first features and the second features.
For example, suppose the seed user samples and their features are: user sample 1: 38, male, Beijing; user sample 2: 40, female, Guangzhou; user sample 3: 45, male, Shanghai; user sample 4: 47, female, Tianjin. Suppose the non-seed user samples and their features are: user sample 5: 50, male, Beijing; user sample 6: 47, male, Guangzhou; user sample 7: 40, female, Tianjin; user sample 8: 50, male, Shanghai; user sample 9: 38, female, Beijing; user sample 10: 45, male, Tianjin.
The target user determining device can then obtain the first features: 38, 40, 45, 47, male, female, Beijing, Guangzhou, Shanghai, Tianjin; and the second features: 38, 40, 45, 47, 50, male, female, Beijing, Guangzhou, Tianjin, Shanghai.
The first features and the second features can then be merged and deduplicated to obtain the first feature set.
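The merge-and-deduplicate step can be sketched as follows, using the ten user samples from the example. Representing each feature as a string and the helper name `build_first_feature_set` are assumptions made for illustration.

```python
# User samples from the example; each feature is kept as a plain string.
seed_samples = {
    1: ["38", "male", "Beijing"],
    2: ["40", "female", "Guangzhou"],
    3: ["45", "male", "Shanghai"],
    4: ["47", "female", "Tianjin"],
}
non_seed_samples = {
    5: ["50", "male", "Beijing"],
    6: ["47", "male", "Guangzhou"],
    7: ["40", "female", "Tianjin"],
    8: ["50", "male", "Shanghai"],
    9: ["38", "female", "Beijing"],
    10: ["45", "male", "Tianjin"],
}

def build_first_feature_set(seed, non_seed):
    """Merge first and second features, dropping duplicates but keeping order."""
    merged = [f for feats in seed.values() for f in feats]
    merged += [f for feats in non_seed.values() for f in feats]
    return list(dict.fromkeys(merged))  # dict keys are unique and insertion-ordered

first_feature_set = build_first_feature_set(seed_samples, non_seed_samples)
print(len(first_feature_set))  # 11
```

The result contains eleven distinct features: five ages, two genders and four cities.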
Generating the first feature set in this way reduces the number of features in the first feature set, which reduces the amount of computation needed for the first proportion and the second proportion and in turn reduces the time needed to determine target users with the method of this embodiment of the present invention.
In an optional embodiment of the present invention, on the basis of the method for determining target users shown in Fig. 2, a possible implementation is further provided. Fig. 3 is a flowchart of a third embodiment of the method for determining target users. In Fig. 3, after S110, obtaining the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method further includes:
S180, encode each feature in the first feature set to obtain the encoded first feature set.
In some examples, to reduce the storage space that the features in the first feature set occupy on the hardware device, and to further reduce the time needed to determine target users with the method of this embodiment of the present invention, the target user determining device can, after obtaining the first feature set, encode each feature in it to obtain the encoded first feature set.
For example, suppose the first feature set obtained by the target user determining device is: {age {38, 40, 45, 47, 50}, gender {male, female}, city {Beijing, Guangzhou, Shanghai, Tianjin}}. Each feature in the first feature set can then be encoded with an Arabic numeral, converting the first feature set into: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}.
In some examples, the target user determining device can instead encode the first feature set with lowercase or uppercase English letters, obtaining the encoded feature set: {a, b, c, d, e, f, g, h, i, j, k}.
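Both encodings can be sketched as a lookup table from feature to code. The ordering of the features (ages, then genders, then cities) follows the example; the names `numeric_codes`, `letter_codes` and `encode_sample` are illustrative.

```python
import string

# First feature set in the order given in the example.
first_feature_set = ["38", "40", "45", "47", "50",
                     "male", "female",
                     "Beijing", "Guangzhou", "Shanghai", "Tianjin"]

# Arabic-numeral encoding: 1..11 in list order.
numeric_codes = {f: i for i, f in enumerate(first_feature_set, start=1)}

# Lowercase-letter encoding: a..k in list order (works for up to 26 features).
letter_codes = dict(zip(first_feature_set, string.ascii_lowercase))

def encode_sample(features, codes):
    """Encode the features of one user sample with the same table."""
    return [codes[f] for f in features]

print(encode_sample(["38", "male", "Beijing"], numeric_codes))  # [1, 6, 8]
print(encode_sample(["38", "male", "Beijing"], letter_codes))   # ['a', 'f', 'h']
```

Using one shared table for the feature set and for the user samples keeps the codes comparable when the proportions are computed later.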
Correspondingly, after the first feature set has been encoded, step S120, calculating, for each feature in the first feature set, the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples, may include:
S121, for each feature in the encoded first feature set, calculate the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
In some examples, after encoding the first feature set, the target user determining device can also encode each feature of the multiple seed user samples and each feature of the multiple non-seed user samples using the same encoding scheme as for the first feature set.
By encoding the features of the first feature set, of the multiple seed user samples and of the multiple non-seed user samples, the target user determining device can compute the first proportion and the second proportion from the encoded features. This reduces the storage space that the features occupy on the hardware device and further reduces the time needed to determine target users with the method of this embodiment of the present invention.
In an optional embodiment of the present invention, on the basis of the method for determining target users shown in Fig. 1, a possible implementation is further provided. Fig. 4 is a flowchart of a fourth embodiment of the method for determining target users. In Fig. 4, S130, generating the second feature set or the third feature set according to the relative size of the first proportion and the second proportion of each feature, and selecting the first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set, may include:
S131, for each feature in the first feature set, when the first proportion of the feature is less than its second proportion, add the feature to the second feature set, obtaining a second feature set containing multiple features.
In some examples, the embodiments of the present invention provide two possible implementations by which the target user determining device generates the second feature set or the third feature set according to the relative size of the first proportion and the second proportion of each feature. In one possible implementation, when the first proportion of any feature in the first feature set is less than its second proportion, that feature can be added to the second feature set, so that a second feature set containing multiple features is obtained.
For example, suppose the target user determining device computes a first proportion of 50% and a second proportion of 66.7% for the feature "male"; the feature "male" can then be added to the second feature set. If the computed first proportion of the feature "38" is 40% and its second proportion is 45%, the feature "38" can be added to the second feature set. If the computed first proportion of the feature "Beijing" is 66% and its second proportion is 73%, the feature "Beijing" can be added to the second feature set, and so on. A second feature set containing the features "38", "male" and "Beijing" is thereby obtained.
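The comparison rule can be sketched as below. The sample data is the set of ten user samples listed earlier in the description; on that data the feature "male" gives 50% and 66.7% as in the example, whereas the other percentages in the example are illustrative and are not derived from this data. The helper names `proportion` and `build_second_feature_set` are assumptions.

```python
from fractions import Fraction

seed_samples = {
    1: ["38", "male", "Beijing"], 2: ["40", "female", "Guangzhou"],
    3: ["45", "male", "Shanghai"], 4: ["47", "female", "Tianjin"],
}
non_seed_samples = {
    5: ["50", "male", "Beijing"], 6: ["47", "male", "Guangzhou"],
    7: ["40", "female", "Tianjin"], 8: ["50", "male", "Shanghai"],
    9: ["38", "female", "Beijing"], 10: ["45", "male", "Tianjin"],
}
first_feature_set = ["38", "40", "45", "47", "50", "male", "female",
                     "Beijing", "Guangzhou", "Shanghai", "Tianjin"]

def proportion(feature, samples):
    """Share of the given user samples that have the feature (exact fraction)."""
    hits = sum(1 for feats in samples.values() if feature in feats)
    return Fraction(hits, len(samples))

def build_second_feature_set(features, seed, non_seed):
    """Keep features whose first proportion is less than their second proportion."""
    return [f for f in features if proportion(f, seed) < proportion(f, non_seed)]

second_feature_set = build_second_feature_set(
    first_feature_set, seed_samples, non_seed_samples)
print(second_feature_set)  # ['50', 'male', 'Beijing', 'Tianjin']
```

Exact fractions are used instead of floats so that a feature with equal proportions on both sides is never misclassified by rounding.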
S132, obtain the multiple features of the multiple non-seed user samples; among these features, select the third features, i.e. the features that are present in the second feature set; and among the multiple non-seed user samples, select the non-seed user samples corresponding to the third features to generate the negative sample set.
After the second feature set containing multiple features is obtained, in order to train the preset logistic regression model, the target user determining device can use the second feature set to screen the multiple non-seed user samples, obtain a screening result, and then generate the negative sample set from the screening result.
Specifically, the target user determining device can obtain the features of each non-seed user sample and judge whether a feature is present in the second feature set; if so, it obtains the non-seed user sample corresponding to that feature. In this way the non-seed user samples whose features are present in the second feature set are obtained, and the negative sample set is generated from them.
For example, suppose the second feature set is "38", "male" and "Beijing", and that features of user sample 5, user sample 7, user sample 8 and user sample 10 are present in the second feature set. The target user determining device can then obtain user sample 5, user sample 7, user sample 8 and user sample 10, and generate a negative sample set that includes these four user samples.
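A minimal sketch of this screening is given below. It reads the rule as "keep a non-seed user sample if at least one of its features is in the second feature set", which is one interpretation of the text. The three user samples `u1` to `u3` are hypothetical and are not the samples of the example, and `negatives_by_presence` is an illustrative name.

```python
def negatives_by_presence(non_seed, second_feature_set):
    """Select non-seed samples that have at least one feature in the second feature set."""
    second = set(second_feature_set)
    return [uid for uid, feats in non_seed.items() if second.intersection(feats)]

# Hypothetical non-seed user samples, for illustration only.
non_seed_samples = {
    "u1": ["50", "male", "Beijing"],     # "male" and "Beijing" are in the set
    "u2": ["40", "female", "Tianjin"],   # no feature in the set
    "u3": ["38", "female", "Guangzhou"], # "38" is in the set
}
second_feature_set = ["38", "male", "Beijing"]

negative_set = negatives_by_presence(non_seed_samples, second_feature_set)
print(negative_set)  # ['u1', 'u3']
```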
Comparing the first proportion of a feature with its second proportion shows that, when the first proportion is less than the second proportion, the feature leans toward the negative samples. The non-seed user samples corresponding to the feature can therefore be used as user samples in the negative sample set, and the resulting negative sample set can be used to train the preset logistic regression model and find target users.
In some examples, the second feature set that the target user determining device obtains in step S131 may contain features that belong to seed user samples. If the negative sample set is generated from such a second feature set and the preset logistic regression model is trained on that negative sample set, the accuracy of the trained logistic regression model may be reduced, which in turn reduces the accuracy of finding target users.
To improve the accuracy of finding target users with the trained logistic regression model, on the basis of the method for determining target users shown in Fig. 1, the embodiments of the present invention further provide another possible implementation, so that the generated negative sample set contains only features of non-seed user samples.
Fig. 5 is a flowchart of a fifth embodiment of the method for determining target users of the embodiments of the present invention. In Fig. 5, S130, generating the second feature set or the third feature set according to the relative size of the first proportion and the second proportion of each feature, and selecting the first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set, may include:
S133, for each feature in the first feature set, when the first proportion of the feature is greater than its second proportion, add the feature to the third feature set, obtaining a third feature set containing multiple features.
In this other possible implementation of the embodiments of the present invention, when the first proportion of any feature in the first feature set is greater than its second proportion, the feature leans toward the positive samples, and it can be added to the third feature set, so that a third feature set containing multiple features is obtained.
For example, suppose the target user determining device computes a first proportion of 61% and a second proportion of 60% for the feature "female"; the feature "female" can then be added to the third feature set. If the computed first proportion of the feature "45" is 75% and its second proportion is 47%, the feature "45" can be added to the third feature set. If the computed first proportion of the feature "Guangzhou" is 68% and its second proportion is 59%, the feature "Guangzhou" can be added to the third feature set, and so on. A third feature set containing the features "45", "female" and "Guangzhou" is thereby obtained.
S134, obtain the multiple features of the multiple non-seed user samples; among these features, select the fourth features, i.e. the features that are not present in the third feature set; and among the multiple non-seed user samples, select the non-seed user samples corresponding to the fourth features to generate the negative sample set.
After the third feature set containing multiple features is obtained, in order to avoid the situation in which the second feature set may contain features that belong to seed user samples, the target user determining device can use the third feature set to screen the multiple non-seed user samples, obtain a screening result, and then generate the negative sample set from the screening result.
Specifically, the target user determining device can obtain the features of each non-seed user sample and judge whether a feature is present in the third feature set; if not, it obtains the non-seed user sample corresponding to that feature. In this way the non-seed user samples whose features are not present in the third feature set are obtained, and the negative sample set is generated from them.
For example, suppose the third feature set is "45", "female" and "Guangzhou", and that the features of user sample 5 and user sample 8 are not present in the third feature set. The target user determining device can then obtain user sample 5 and user sample 8, and generate a negative sample set that includes user sample 5 and user sample 8.
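This screening can be sketched as follows, using the six non-seed user samples listed earlier in the description and the third feature set of the example. The sketch reads the rule as "keep a non-seed user sample only if none of its features is in the third feature set", which is the reading that reproduces the result of the example; `negatives_by_absence` is an illustrative name.

```python
def negatives_by_absence(non_seed, third_feature_set):
    """Select non-seed samples none of whose features is in the third feature set."""
    third = set(third_feature_set)
    return [uid for uid, feats in non_seed.items() if third.isdisjoint(feats)]

# Non-seed user samples from the earlier example of ten user samples.
non_seed_samples = {
    5: ["50", "male", "Beijing"],
    6: ["47", "male", "Guangzhou"],
    7: ["40", "female", "Tianjin"],
    8: ["50", "male", "Shanghai"],
    9: ["38", "female", "Beijing"],
    10: ["45", "male", "Tianjin"],
}
third_feature_set = ["45", "female", "Guangzhou"]

negative_set = negatives_by_absence(non_seed_samples, third_feature_set)
print(negative_set)  # [5, 8]
```

User sample 6 is excluded because of "Guangzhou", user samples 7 and 9 because of "female", and user sample 10 because of "45".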
Comparing the first proportion of a feature with its second proportion shows that, when the first proportion is greater than the second proportion, the feature leans toward the positive samples. Such features can therefore be collected into the third feature set and used to screen the multiple non-seed user samples, so that the features of the non-seed user samples remaining after screening are not present in the third feature set. Training the preset logistic regression model with the negative sample set generated from the screening result improves the accuracy of the trained logistic regression model, and in turn improves the accuracy of finding target users with the method of this embodiment of the present invention.
Corresponding to the above method embodiments, the embodiments of the present invention further provide a device for determining target users. Fig. 6 is a schematic structural diagram of the device for determining target users of the embodiments of the present invention. In Fig. 6, the device may include:
an acquisition module 610, configured to obtain the first feature set, the multiple seed user samples and the multiple non-seed user samples;
a proportion calculation module 620, configured to calculate, for each feature in the first feature set, the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples;
a negative sample set generation module 630, configured to generate the second feature set or the third feature set according to the relative size of the first proportion and the second proportion of each feature, and to select the first non-seed user samples from the multiple non-seed user samples according to the second feature set or the third feature set to generate the negative sample set;
a training module 640, configured to obtain the multiple seed user samples and take them as the positive sample set; to obtain the first sample label of the positive sample set, the first feature vector of each seed user sample in the positive sample set, the second sample label of the negative sample set, and the second feature vector of each non-seed user sample in the negative sample set; and to train the preset logistic regression model to obtain the trained logistic regression model;
a sample value calculation module 650, configured to obtain the third feature vector of each non-seed user sample among the multiple non-seed user samples, and to compute the sample value of each non-seed user sample from the third feature vector and the trained logistic regression model;
a target user selection module 660, configured to obtain the target user quantity; to select, among the multiple non-seed user samples and in descending order of sample value, the first non-seed user samples that satisfy the target user quantity; and to take the non-seed users corresponding to the first non-seed user samples as the target users.
With the device for determining target users of this embodiment of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, the following is calculated for each feature in the first feature set: the first proportion, i.e. the share of the multiple seed user samples that have the feature, and the second proportion, i.e. the share of the multiple non-seed user samples that have the feature. The second feature set or the third feature set is then generated according to the relative size of the first proportion and the second proportion of each feature, and the negative sample set is generated from it. Because the negative sample set is generated from this comparison, it can be used together with the positive sample set to train the preset logistic regression model. Once the trained logistic regression model is obtained, the sample value of each non-seed user sample can be computed from its third feature vector and the trained logistic regression model. The larger the sample value, the more likely the user is to become a target user. The first non-seed user samples corresponding to the target user quantity can therefore be selected in descending order of sample value, and the corresponding non-seed users taken as the target users, so that suitable target users can be determined from the small number of seed users provided by the advertiser.
Specifically, the device for determining target users of the embodiments of the present invention may further include:
a first feature set establishing module, configured to obtain the first features of the multiple seed user samples and the second features of the multiple non-seed user samples, and to establish the first feature set from the first features and the second features, where no feature in the first feature set is repeated.
Specifically, the device for determining target users of the embodiments of the present invention may further include:
an encoding module, configured to encode each feature in the first feature set to obtain the encoded first feature set;
Correspondingly, the proportion calculation module is specifically configured to:
for each feature in the encoded first feature set, calculate the first proportion of seed user samples having the feature among the multiple seed user samples and the second proportion of non-seed user samples having the feature among the multiple non-seed user samples.
Specifically, the negative sample set generation module 630 includes:
a second feature set generation submodule, configured to, for each feature in the first feature set, add the feature to the second feature set when the first proportion of the feature is less than its second proportion, obtaining a second feature set containing multiple features;
a first negative sample set generation submodule, configured to obtain the multiple features of the multiple non-seed user samples; to select, among these features, the third features that are present in the second feature set; and to select, among the multiple non-seed user samples, the non-seed user samples corresponding to the third features to generate the negative sample set.
Specifically, the negative sample set generation module 630 may further include:
a third feature set generation submodule, configured to, for each feature in the first feature set, add the feature to the third feature set when the first proportion of the feature is greater than its second proportion, obtaining a third feature set containing multiple features;
Second negative sample collection generates submodule, multiple features for obtaining multiple non-seed user's samples, in multiple spies In sign, selection is not present in the fourth feature in third feature set, and in multiple non-seed user's samples, selection and the 4th The corresponding non-seed user's sample of feature generates negative sample collection.
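The alternative based on the third feature set can be sketched in the same way. The text does not state precisely when a sample "corresponds to" a fourth feature, so this sketch offers two assumed readings through the hypothetical `require_all` flag: a sample is kept if any of its features lies outside the third feature set, or only if all of its features do.

```python
def build_third_feature_set(first_proportions, second_proportions):
    """Features whose first proportion is greater than their second proportion."""
    return {
        feature
        for feature, first in first_proportions.items()
        if first > second_proportions.get(feature, 0.0)
    }


def select_negatives_by_absence(non_seed_samples, third_feature_set, require_all=False):
    """Select non-seed samples by features lying outside the third feature set.

    require_all=False: keep a sample if at least one of its features is a
    fourth feature (not in the third feature set).
    require_all=True: keep a sample only if none of its features is in the
    third feature set (a stricter, assumed alternative reading).
    """
    check = all if require_all else any
    return [
        sample
        for sample in non_seed_samples
        if sample and check(feature not in third_feature_set for feature in sample)
    ]


# Hypothetical proportions and samples.
first_proportions = {"sports": 1.0, "male": 0.5, "drama": 0.0}
second_proportions = {"sports": 0.25, "male": 0.5, "drama": 0.5}
non_seed_samples = [["drama", "male"], ["drama"], ["sports"], ["male"], ["sports", "male"]]

third_feature_set = build_third_feature_set(first_proportions, second_proportions)
loose_negatives = select_negatives_by_absence(non_seed_samples, third_feature_set)
strict_negatives = select_negatives_by_absence(non_seed_samples, third_feature_set, require_all=True)
```

Under the stricter reading, a non-seed user sample that shares a seed-leaning feature such as "sports" is never used as a negative sample, which may reduce the risk of labelling likely target users as negatives.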
An embodiment of the present invention further provides an electronic device. FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device includes a processor 710, a communication interface 720, a memory 730 and a communication bus 740, wherein the processor 710, the communication interface 720 and the memory 730 communicate with one another through the communication bus 740;
The memory 730 is configured to store a computer program;
The processor 710 is configured to implement the following steps when executing the program stored in the memory 730:
Obtain a first feature set, multiple seed user samples and multiple non-seed user samples;
For each feature in the first feature set, calculate a first proportion of the seed user samples having the feature among the multiple seed user samples, and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples;
According to the magnitude relationship between the first proportion and the second proportion of each feature, generate a second feature set or a third feature set; and, according to the second feature set or the third feature set, select first non-seed user samples from the multiple non-seed user samples to generate a negative sample set;
Obtain the multiple seed user samples and use them as a positive sample set; obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set, and a second feature vector of each non-seed user sample in the negative sample set; and train a preset logistic regression model to obtain a trained logistic regression model;
Obtain a third feature vector of each non-seed user sample among the multiple non-seed user samples, and calculate, according to the third feature vector and the trained logistic regression model, a sample value of each non-seed user sample among the multiple non-seed user samples;
Obtain a target user quantity; among the multiple non-seed user samples, in descending order of sample value, select the first non-seed user samples that satisfy the target user quantity, and take the non-seed users corresponding to the first non-seed user samples as target users.
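The training and scoring steps above can be illustrated with a minimal, dependency-free sketch. It assumes that feature vectors are binary indicator vectors over the encoded features, that the first sample label is 1 and the second sample label is 0, and that the preset logistic regression model is fitted by plain batch gradient descent; none of these choices is confirmed by this passage, and all function names, hyperparameters and data are hypothetical.

```python
import math


def to_vector(sample, feature_codes):
    """Binary indicator vector: 1.0 where the sample has the feature."""
    vector = [0.0] * len(feature_codes)
    for feature in sample:
        if feature in feature_codes:
            vector[feature_codes[feature]] = 1.0
    return vector


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(vector, weights, bias):
    """Sample value: model output in (0, 1) for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, vector)) + bias)


def train_logistic_regression(vectors, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(vectors[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(vectors, labels):
            error = predict(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


# Hypothetical toy data: positives are seed users, negatives come from the negative sample set.
feature_codes = {"sports": 0, "male": 1, "drama": 2}
positive_set = [["sports", "male"], ["sports"]]
negative_set = [["drama", "male"], ["drama"]]
vectors = [to_vector(s, feature_codes) for s in positive_set + negative_set]
labels = [1.0] * len(positive_set) + [0.0] * len(negative_set)

weights, bias = train_logistic_regression(vectors, labels)
value_sports = predict(to_vector(["sports"], feature_codes), weights, bias)
value_drama = predict(to_vector(["drama"], feature_codes), weights, bias)
```

In this toy data a non-seed user sample with the feature "sports" receives a higher sample value than one with the feature "drama", which is the ordering the final selection step relies on. In practice an existing logistic regression implementation would normally be used instead of hand-written gradient descent.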
With the electronic device of the embodiment of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, for each feature in the first feature set, the first proportion of the seed user samples having the feature among the multiple seed user samples and the second proportion of the non-seed user samples having the feature among the multiple non-seed user samples are calculated. Then, according to the magnitude relationship between the first proportion and the second proportion of each feature, the second feature set or the third feature set used for generating the negative sample set is generated, and the negative sample set is generated from it. Because the negative sample set is generated according to the magnitude relationship between the first proportion and the second proportion, the negative sample set and the positive sample set can be used to train the preset logistic regression model. After the trained logistic regression model is obtained, the sample value of each non-seed user sample among the multiple non-seed user samples can be calculated from the third feature vector of that sample and the trained logistic regression model. The larger the sample value, the more likely the corresponding user is to become a target user. Therefore, among the multiple non-seed user samples, the first non-seed user samples matching the target user quantity can be selected in descending order of sample value, and the non-seed users corresponding to the first non-seed user samples are taken as target users, so that suitable target users can be determined from the small number of seed users provided by an advertiser.
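The final selection step, in which samples are ranked by sample value in descending order and the top entries are taken up to the target user quantity, can be sketched as follows. The user identifiers, sample values and the function name `select_target_users` are hypothetical.

```python
def select_target_users(sample_values, target_user_count):
    """Return the user ids with the largest sample values, best first."""
    ranked = sorted(sample_values.items(), key=lambda item: item[1], reverse=True)
    return [user_id for user_id, _ in ranked[:target_user_count]]


# Hypothetical sample values produced by a trained model for non-seed users.
sample_values = {"u1": 0.91, "u2": 0.15, "u3": 0.62, "u4": 0.40}
target_users = select_target_users(sample_values, 2)
```

If the target user quantity exceeds the number of non-seed user samples, this sketch simply returns all of them in ranked order.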
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the method for determining a target user described in any one of the above embodiments.
With the computer-readable storage medium of the embodiment of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, for each feature in the first feature set, the first proportion of the seed user samples having the feature among the multiple seed user samples and the second proportion of the non-seed user samples having the feature among the multiple non-seed user samples are calculated. Then, according to the magnitude relationship between the first proportion and the second proportion of each feature, the second feature set or the third feature set used for generating the negative sample set is generated, and the negative sample set is generated from it. Because the negative sample set is generated according to the magnitude relationship between the first proportion and the second proportion, the negative sample set and the positive sample set can be used to train the preset logistic regression model. After the trained logistic regression model is obtained, the sample value of each non-seed user sample among the multiple non-seed user samples can be calculated from the third feature vector of that sample and the trained logistic regression model. The larger the sample value, the more likely the corresponding user is to become a target user. Therefore, among the multiple non-seed user samples, the first non-seed user samples matching the target user quantity can be selected in descending order of sample value, and the non-seed users corresponding to the first non-seed user samples are taken as target users, so that suitable target users can be determined from the small number of seed users provided by an advertiser.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided. When the computer program product is run on a computer, the computer is caused to execute the method for determining a target user described in any one of the above embodiments.
With the computer program product containing instructions of the embodiment of the present invention, after the first feature set, the multiple seed user samples and the multiple non-seed user samples are obtained, for each feature in the first feature set, the first proportion of the seed user samples having the feature among the multiple seed user samples and the second proportion of the non-seed user samples having the feature among the multiple non-seed user samples are calculated. Then, according to the magnitude relationship between the first proportion and the second proportion of each feature, the second feature set or the third feature set used for generating the negative sample set is generated, and the negative sample set is generated from it. Because the negative sample set is generated according to the magnitude relationship between the first proportion and the second proportion, the negative sample set and the positive sample set can be used to train the preset logistic regression model. After the trained logistic regression model is obtained, the sample value of each non-seed user sample among the multiple non-seed user samples can be calculated from the third feature vector of that sample and the trained logistic regression model. The larger the sample value, the more likely the corresponding user is to become a target user. Therefore, among the multiple non-seed user samples, the first non-seed user samples matching the target user quantity can be selected in descending order of sample value, and the non-seed users corresponding to the first non-seed user samples are taken as target users, so that suitable target users can be determined from the small number of seed users provided by an advertiser.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (11)

1. A method for determining a target user, characterized in that the method includes:
Obtaining a first feature set, multiple seed user samples and multiple non-seed user samples;
For each feature in the first feature set, calculating a first proportion of the seed user samples having the feature among the multiple seed user samples, and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples;
According to the magnitude relationship between the first proportion and the second proportion of each feature, generating a second feature set or a third feature set; and, according to the second feature set or the third feature set, selecting first non-seed user samples from the multiple non-seed user samples to generate a negative sample set;
Obtaining the multiple seed user samples and using the multiple seed user samples as a positive sample set; obtaining a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set, and a second feature vector of each non-seed user sample in the negative sample set; and training a preset logistic regression model to obtain a trained logistic regression model;
Obtaining a third feature vector of each non-seed user sample among the multiple non-seed user samples, and calculating, according to the third feature vector and the trained logistic regression model, a sample value of each non-seed user sample among the multiple non-seed user samples;
Obtaining a target user quantity; among the multiple non-seed user samples, in descending order of the sample value, selecting the first non-seed user samples that satisfy the target user quantity, and taking the non-seed users corresponding to the first non-seed user samples as target users.
2. The method according to claim 1, characterized in that, before the obtaining of the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method further includes:
Obtaining first features of the multiple seed user samples and second features of the multiple non-seed user samples, and establishing the first feature set according to the first features and the second features, wherein no feature in the first feature set is repeated.
3. The method according to claim 1, characterized in that, after the obtaining of the first feature set, the multiple seed user samples and the multiple non-seed user samples, the method further includes:
Encoding each feature in the first feature set to obtain an encoded first feature set;
Correspondingly, the calculating, for each feature in the first feature set, of a first proportion of the seed user samples having the feature among the multiple seed user samples and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples includes:
For each feature in the encoded first feature set, calculating a first proportion of the seed user samples having the feature among the multiple seed user samples, and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples.
4. The method according to any one of claims 1 to 3, characterized in that the generating of a second feature set or a third feature set according to the magnitude relationship between the first proportion and the second proportion of each feature, and the selecting, according to the second feature set or the third feature set, of first non-seed user samples from the multiple non-seed user samples to generate a negative sample set, include:
For each feature in the first feature set, when the first proportion of the feature is less than the second proportion, adding the feature to a second feature set to obtain a second feature set containing multiple features;
Obtaining multiple features of the multiple non-seed user samples; selecting, from the multiple features, third features that are present in the second feature set; and selecting, from the multiple non-seed user samples, the non-seed user samples corresponding to the third features, to generate a negative sample set.
5. The method according to any one of claims 1 to 3, characterized in that the generating of a second feature set or a third feature set according to the magnitude relationship between the first proportion and the second proportion of each feature, and the selecting, according to the second feature set or the third feature set, of first non-seed user samples from the multiple non-seed user samples to generate a negative sample set, include:
For each feature in the first feature set, when the first proportion is greater than the second proportion, adding the feature to a third feature set to obtain a third feature set containing multiple features;
Obtaining multiple features of the multiple non-seed user samples; selecting, from the multiple features, fourth features that are not present in the third feature set; and selecting, from the multiple non-seed user samples, the non-seed user samples corresponding to the fourth features, to generate a negative sample set.
6. A device for determining a target user, characterized in that the device includes:
An acquisition module, configured to obtain a first feature set, multiple seed user samples and multiple non-seed user samples;
A proportion calculation module, configured to, for each feature in the first feature set, calculate a first proportion of the seed user samples having the feature among the multiple seed user samples, and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples;
A negative sample set generation module, configured to generate a second feature set or a third feature set according to the magnitude relationship between the first proportion and the second proportion of each feature, and to select, according to the second feature set or the third feature set, first non-seed user samples from the multiple non-seed user samples to generate a negative sample set;
A training module, configured to obtain the multiple seed user samples and use the multiple seed user samples as a positive sample set; obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set, and a second feature vector of each non-seed user sample in the negative sample set; and train a preset logistic regression model to obtain a trained logistic regression model;
A sample value calculation module, configured to obtain a third feature vector of each non-seed user sample among the multiple non-seed user samples, and to calculate, according to the third feature vector and the trained logistic regression model, a sample value of each non-seed user sample among the multiple non-seed user samples;
A target user selection module, configured to obtain a target user quantity; among the multiple non-seed user samples, in descending order of the sample value, select the first non-seed user samples that satisfy the target user quantity, and take the non-seed users corresponding to the first non-seed user samples as target users.
7. The device according to claim 6, characterized in that the device further includes:
A first feature set establishing module, configured to obtain first features of the multiple seed user samples and second features of the multiple non-seed user samples, and to establish the first feature set according to the first features and the second features, wherein no feature in the first feature set is repeated.
8. The device according to claim 6, characterized in that the device further includes:
A coding module, configured to encode each feature in the first feature set to obtain an encoded first feature set;
Correspondingly, the proportion calculation module is specifically configured to:
For each feature in the encoded first feature set, calculate a first proportion of the seed user samples having the feature among the multiple seed user samples, and a second proportion of the non-seed user samples having the feature among the multiple non-seed user samples.
9. The device according to any one of claims 6 to 8, characterized in that the negative sample set generation module includes:
A second feature set generation submodule, configured to, for each feature in the first feature set, add the feature to a second feature set when the first proportion of the feature is less than the second proportion, to obtain a second feature set containing multiple features;
A first negative sample set generation submodule, configured to obtain multiple features of the multiple non-seed user samples; select, from the multiple features, third features that are present in the second feature set; and select, from the multiple non-seed user samples, the non-seed user samples corresponding to the third features, to generate a negative sample set.
10. The device according to any one of claims 6 to 8, characterized in that the negative sample set generation module includes:
A third feature set generation submodule, configured to, for each feature in the first feature set, add the feature to a third feature set when the first proportion is greater than the second proportion, to obtain a third feature set containing multiple features;
A second negative sample set generation submodule, configured to obtain multiple features of the multiple non-seed user samples; select, from the multiple features, fourth features that are not present in the third feature set; and select, from the multiple non-seed user samples, the non-seed user samples corresponding to the fourth features, to generate a negative sample set.
11. An electronic device, characterized by including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of any one of claims 1 to 5 when executing the program stored in the memory.
CN201810297028.4A 2018-04-04 2018-04-04 Method and device for determining target user and electronic equipment Active CN108647990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810297028.4A CN108647990B (en) 2018-04-04 2018-04-04 Method and device for determining target user and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810297028.4A CN108647990B (en) 2018-04-04 2018-04-04 Method and device for determining target user and electronic equipment

Publications (2)

Publication Number Publication Date
CN108647990A true CN108647990A (en) 2018-10-12
CN108647990B CN108647990B (en) 2022-06-03

Family

ID=63745395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810297028.4A Active CN108647990B (en) 2018-04-04 2018-04-04 Method and device for determining target user and electronic equipment

Country Status (1)

Country Link
CN (1) CN108647990B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170017998A1 (en) * 2015-07-17 2017-01-19 Adobe Systems Incorporated Determining context and mindset of users
CN105447730A (en) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 Target user orientation method and device
CN107093084A (en) * 2016-08-01 2017-08-25 北京小度信息科技有限公司 Potential user predicts method for transformation and device
CN106920248A (en) * 2017-01-19 2017-07-04 博康智能信息技术有限公司上海分公司 A kind of method for tracking target and device
CN107369052A (en) * 2017-08-29 2017-11-21 北京小度信息科技有限公司 User's registration behavior prediction method, apparatus and electronic equipment
CN107679920A (en) * 2017-10-20 2018-02-09 北京奇艺世纪科技有限公司 The put-on method and device of a kind of advertisement
CN107844584A (en) * 2017-11-14 2018-03-27 北京小度信息科技有限公司 Usage mining method, apparatus, electronic equipment and computer-readable recording medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HTTP://DOC.MADAOMALL.COM/241.HTML: "聊聊lookalike模型的使用技巧", 《百度在线》 *
QIANG MA 等: "A Sub-linear, Massive-scale Look-alike Audience Extension System", 《IEEE》 *

Also Published As

Publication number Publication date
CN108647990B (en) 2022-06-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant