CN108804577A - Method for estimating interest in information labels - Google Patents
Method for estimating interest in information labels
- Publication number
- CN108804577A, CN201810505164.8A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The present invention discloses a method for estimating interest in information labels, comprising: creating and maintaining a labeled candidate information library; obtaining a user attribute-information label interest vector from the user's demographic attributes; obtaining and preprocessing the historical behavior data of multiple users within a preset time period to obtain a trained deep learning model; obtaining and preprocessing the historical behavior data of the current user to obtain the current user's user behavior-information label interest vector; and computing a user-information label interest vector from the current user's user attribute-information label interest vector and user behavior-information label interest vector, from which the several information labels the user is most interested in are finally determined. The invention solves the cold-start problem in user interest estimation, avoids the low-quality content that often results from selecting information directly from the internet, reduces the computation required for user interest estimation, and suits scenarios in which each sample carries multiple labels.
Description
Technical field
The present invention relates to a method for estimating interest in information labels and belongs to the field of computing technology.
Background
With the rapid development of the internet, the volume of information is enormous and growing explosively, and the quality of information on the network is uneven. If user interest estimation is run directly on all acquired information, poor-quality content may well be pushed to users and degrade the user experience. Running interest estimation on all information also increases the computation the algorithm requires and wastes computing resources.
Although the information items that users browse differ, the labels of those items can usually be grouped into a few broad categories, and a user's interest in an information label lasts far longer than the user's interest in any single item. For example, after reading an item labeled "finance", a user will generally not read that same item again but remains interested in other items labeled "finance". Finding the information labels a user is interested in, by estimating the user's interest in information labels, is therefore of great significance for research on and applications of personalized information push.
A problem common to the user interest estimation methods in practical use today is the cold-start problem: how to estimate a user's interest when the user has not yet browsed any information.
One prior-art approach estimates interest with a recurrent neural network. The information labels of the items a user has browsed are fed into the recurrent neural network in order, and the network is trained to predict the information labels the user is interested in. Because this approach exploits the temporal features in the user's historical behavior, it works well when training samples are plentiful and the parameters of the recurrent neural network are properly tuned. It has the following defects, however:
1. It cannot estimate user interest when the user has not browsed any information.
2. It cannot use the user's demographic attributes, such as gender, age, and region.
Another prior-art approach uses TF-IDF (term frequency-inverse document frequency) to obtain the keywords of each information item, and then statistically analyzes the keywords in the items the current user has browsed to obtain the user's interest in each keyword.
TF-IDF is a statistical method. For a given information item, the frequency with which a word appears in the item reflects the word's importance: the more often the word appears in the item, the more important it is to that item, but its importance decreases as its frequency across all items rises. That is, if a word appears frequently in the current item and rarely in other items, the word is considered to represent the item well and is taken as a keyword of the current item. Statistically analyzing the keywords in the items a user has browsed yields the user's interest in each keyword, which can then be used for keyword-based personalized information push. This approach has the following defects, however:
1. For every word in every item, it must count the word's occurrences in the current item and across all items, so the computation is heavy.
2. The keywords obtained are spread too widely, and each keyword may cover only a very narrow field, which makes it hard to control the range of information the user is estimated to be interested in. For example, suppose TF-IDF yields the keyword "Lin Daiyu" for an item the user browsed. If the user's interest is estimated from that keyword, subsequent pushes are likely to concentrate on items containing "Lin Daiyu" and will not readily extend to "A Dream of Red Mansions" or "Chinese literature", which hurts both interest estimation and push quality.
3. Even if the keywords of an item score highly in interest estimation, the item may still fail to interest the user because of its poor quality.
4. It cannot estimate user interest when the user has not browsed any information.
5. All items the user has browsed are treated equally in interest estimation, so the method cannot reflect how items browsed at different times differ in their bearing on the current estimate. In practice, more recently browsed items usually influence the current estimate more.
6. It cannot use the user's demographic attributes, such as gender, age, and region.
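The TF-IDF scoring described above can be sketched in Python as follows. This is a minimal illustration only: it assumes the items are already tokenized into word lists and uses one common weighting, tf * log(N / df), since the source does not state which variant is meant. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(items):
    """Return one {word: score} dict per item.

    items: list of token lists, one per information item (assumed pre-tokenized).
    Weighting (an assumption): (count / item length) * log(N / document frequency).
    """
    n_items = len(items)
    # Document frequency: the number of items in which each word appears at least once.
    doc_freq = Counter(word for tokens in items for word in set(tokens))
    scores = []
    for tokens in items:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_items / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

Every word of every item is counted both within its item and across the whole collection, which is the computational cost named in defect 1.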
A further prior-art approach estimates user interest with gradient boosting decision trees (GBDT), a machine learning method in which multiple regression trees, built iteratively, decide jointly. A GBDT consists of multiple regression trees; each regression tree is fitted to the residual left by all the trees before it, where the residual is the actual value minus the predicted value. The sum of the outputs of all the regression trees is the final GBDT output. This approach can use both the user's demographic attributes and the information labels of the items the user has browsed. It has the following defects, however:
1. GBDT is inherently suited to regression problems, or to binary classification by setting a threshold. In estimating a user's interest in information labels, the information label library contains many labels, and each item often carries more than one. Each GBDT computation yields the user's estimated interest in only one label, so obtaining the user's interest in every information label requires running GBDT separately for each label. The computation is m times that of GBDT on a single binary classification problem (m being the total number of labels in the information label library), which is heavy.
2. All items the user has browsed are treated equally in interest estimation, so the method cannot reflect how items browsed at different times differ in their bearing on the current estimate. In practice, more recently browsed items usually influence the current estimate more.
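For reference, the additive structure of gradient boosting described in this section can be written as follows for the squared-error case, in which each tree fits the plain residual. The symbols F_t, f_t, r_i and M are notation introduced here, and the zero initial model is a simplifying assumption; none of this notation comes from the source.

```latex
F_0(x) = 0, \qquad
r_i^{(t)} = y_i - F_{t-1}(x_i), \qquad
f_t \approx \arg\min_{f} \sum_i \bigl( r_i^{(t)} - f(x_i) \bigr)^2, \qquad
F_t(x) = F_{t-1}(x) + f_t(x), \qquad
\hat{y}(x) = F_M(x) = \sum_{t=1}^{M} f_t(x)
```

Each such model produces one scalar estimate, so covering all m labels takes m separately trained models, which is the cost described in defect 1.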
Summary of the invention
In view of the above defects, the present invention provides a method for estimating interest in information labels. By establishing a user attribute-information label interest vector, it solves the cold-start problem in user interest estimation. By establishing a labeled candidate information library, it avoids the low-quality content that often results from selecting information directly from the internet and reduces the computation required for user interest estimation. It also suits scenarios in which each sample carries multiple labels.
To achieve the above objects, the present invention adopts the following technical solution:
The present invention provides a method for estimating interest in information labels, the method comprising:
creating and maintaining a labeled candidate information library;
obtaining a user attribute-information label interest vector from the user's demographic attributes;
obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and training a deep learning model on the data to obtain a trained deep learning model;
obtaining and preprocessing the historical behavior data of the current user, and computing the current user's user behavior-information label interest vector with the trained deep learning model;
computing a user-information label interest vector from the current user's user attribute-information label interest vector and user behavior-information label interest vector, and finally determining the several information labels the user is most interested in.
Further, the step of creating and maintaining the labeled candidate information library comprises:
selecting from a preset information label library the one or more labels that best match the content of an information item as that item's labels, and adding the labeled item to the labeled candidate information library; for each item in the candidate information library, representing the item by an m-dimensional information vector according to its information labels, where m is the total number of labels in the preset information label library; when the item carries label T_j, the j-th dimension of the m-dimensional information vector is 1, and otherwise it is 0;
periodically maintaining the labeled candidate information library by adding new items and removing items that are no longer timely.
Further, the user demographic attributes include, but are not limited to, one or more obtainable items of information, such as gender, age, and/or region, by which users are divided into several groups.
Further, obtaining the user attribute-information label interest vector from the user's demographic attributes comprises:
the entry H_ij of the user attribute-information label interest vector, for the i-th group G_i and the j-th information label T_j, is:
H_ij takes a value in [0, 1].
Further, obtaining and preprocessing the historical behavior data of multiple users within the preset time period and training the deep learning model to obtain the trained deep learning model comprises:
obtaining the information vector of each item browsed in the historical behavior data of the multiple users within the preset time period, and feeding the information vectors into a recurrent neural network model in the chronological order in which the items were browsed to train the model, thereby obtaining the trained deep learning model.
Further, obtaining and preprocessing the historical behavior data of the current user and computing the current user's user behavior-information label interest vector with the trained deep learning model comprises:
obtaining the information vector of each item browsed in the current user's historical behavior data, and arranging the vectors in chronological order;
feeding the information vectors of the items in the current user's historical behavior data into the trained deep learning model one by one in chronological order; once all of them have been fed in, the m-dimensional prediction vector output by the trained deep learning model is the current user's user behavior-information label interest vector.
Further, computing the user-information label interest vector from the current user's user attribute-information label interest vector and user behavior-information label interest vector, and finally determining the several information labels the user is most interested in, comprises:
computing the current user's m-dimensional interest vector over the information labels from the already computed user attribute-information label interest vector and user behavior-information label interest vector of the current user, using the following formula:
V(user, information label) = (1 - w) * V(user attribute, information label) + w * V(user behavior, information label)
where V(user, information label) is the current user's m-dimensional interest vector over the information labels; V(user behavior, information label) is the user behavior-information label interest vector; V(user attribute, information label) is the current user's user attribute-information label interest vector; and w is the weight of V(user behavior, information label) in the current user's interest vector over the information labels. The value of w must always lie in the range [0, 1].
Further, w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
where tanh is the hyperbolic tangent function and a is a constant greater than 0.
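The weighted combination and the final label selection can be sketched in Python as follows. The function names, the default a = 0.1, and the choice of k are illustrative assumptions; the source only requires a > 0 and leaves the number of selected labels open.

```python
import math

def behavior_weight(n_browsed, a=0.1):
    """w = tanh(a * n_browsed); a = 0.1 is an arbitrary illustrative constant."""
    if a <= 0 or n_browsed < 0:
        raise ValueError("a must be positive and n_browsed non-negative")
    return math.tanh(a * n_browsed)

def combine(v_attr, v_behavior, n_browsed, a=0.1):
    """V(user) = (1 - w) * V(user attribute) + w * V(user behavior), per dimension."""
    if len(v_attr) != len(v_behavior):
        raise ValueError("both vectors must have the same m dimensions")
    w = behavior_weight(n_browsed, a)
    return [(1 - w) * p + w * q for p, q in zip(v_attr, v_behavior)]

def top_labels(v_user, labels, k=3):
    """Return the k labels with the highest interest values, highest first."""
    ranked = sorted(zip(v_user, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:k]]
```

With no browsing history w is 0 and the result equals the group's attribute vector, which is how the cold-start case is covered; as the browse count grows, w approaches 1 and the behavior vector dominates.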
The method for estimating interest in information labels provided by the present invention innovatively combines the user attribute-information label interest vector with the user behavior-information label interest vector to obtain the user-information label interest vector. When a user has not browsed any information, the user's demographic attributes are used to find the interest of the user's group in each information label, which avoids the cold-start problem. As the number of items the user has browsed grows, the weight of the user behavior-information label interest vector gradually increases. Because that vector is computed by a deep learning model based on a recurrent neural network, which exploits the temporal features in the user's historical behavior data, the method outperforms common non-deep-learning models when training samples are plentiful and the parameters of the recurrent neural network are properly tuned.
The user attribute-information label interest vector compensates for the cold-start problem that arises when a user has not browsed any information: a deep learning model based on a recurrent neural network, used alone, cannot estimate interest in information labels for such a user.
The present invention establishes and maintains a labeled candidate information library, which avoids the low-quality content that often results from selecting information directly from the internet; because the information is screened, the computation required for user interest estimation is also reduced. The screened items, once labeled, can be used in the subsequent estimation of users' interest in information labels.
The present invention makes full use of the user's demographic attributes, the user's historical behavior data, and the temporal information in that data. Establishing the user attribute-information label interest vector solves the cold-start problem in user interest estimation. Establishing the user behavior-information label interest vector exploits the strengths of deep learning models.
Establishing the labeled candidate information library avoids the low-quality content that often results from selecting information directly from the internet, and because the information is screened, the computation required for user interest estimation is reduced.
Items in the candidate information library carry one or more information labels. Many other interest estimation methods support only one label per sample, whereas in this method the user behavior-information label interest vector is computed by a deep learning model based on a recurrent neural network, whose inputs and outputs can directly be vectors carrying multiple labels. The method therefore suits scenarios in which each sample carries multiple labels.
The hyperbolic tangent function tanh used in computing the user-information label interest vector maps numbers in [0, +∞) to numbers in [0, 1). This is exactly what the method needs in order to combine the user attribute-information label interest vector with the user behavior-information label interest vector while keeping the estimate in every dimension of the resulting user-information label interest vector within [0, 1], so that each value reflects a degree of user interest.
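The range claim can be checked in one line. Here p_j and q_j are notation introduced for the j-th entries of the attribute and behavior vectors, and n for the browse count; the bound assumes that the model's outputs q_j also lie in [0, 1], which the source implies but does not state explicitly.

```latex
w = \tanh(a n) \in [0, 1) \ \text{for } a > 0,\ n \ge 0, \qquad
p_j, q_j \in [0, 1] \;\Longrightarrow\;
\min(p_j, q_j) \le (1 - w)\, p_j + w\, q_j \le \max(p_j, q_j)
\;\Longrightarrow\; (1 - w)\, p_j + w\, q_j \in [0, 1]
```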
Brief description of the drawings
Fig. 1 is a flow chart of Embodiment 1 of the method for estimating interest in information labels provided by the present invention.
Detailed description
The technical solution of the present invention is described in detail below. It should be noted that the technical solution of the present invention is not limited to the implementations described in the embodiments; improvements and designs made by those skilled in the art on the basis of the present invention, with reference to the content of its technical solution, shall fall within the scope of protection of the present invention.
Embodiment 1
Embodiment 1 of the present invention provides a method for estimating interest in information labels, comprising steps S110 to S150:
Step S110: create and maintain a labeled candidate information library.
To guarantee the quality of the information pushed to users while keeping the algorithm's computation within a reasonable range, a labeled candidate information library needs to be created. Information on the network is screened and high-quality items are selected; for each item, the one or more labels that best match its content are chosen from a preset information label library as the item's labels. The labeled items are added to the labeled candidate information library for use in subsequent operations.
The labels in the preset information label library are numbered T_1, T_2, ..., T_m, where m is the total number of labels in the library. For reference, in a specific implementation the preset information label library may be [finance, sports, military, entertainment, life, education, health, science and technology, culture, travel, other], or it may be configured according to actual conditions. The information labels should be relatively independent of one another, and the division into labels should not be too fine, so that too few items per label do not impair the accuracy of subsequent results.
Each item in the candidate information library is represented by an m-dimensional information vector according to its information labels, where m is the total number of labels in the preset information label library. When the item carries label T_j, the j-th dimension of the m-dimensional information vector is 1; otherwise it is 0. For example, if an item carries labels T_1, T_2, and T_5, its m-dimensional information vector is [1, 1, 0, 0, 1, 0, 0, ..., 0, 0].
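The encoding just described is a multi-hot vector over the label library. A minimal Python sketch follows; the function name is hypothetical, and the label list is an English rendering of the example library given for reference in this embodiment.

```python
def information_vector(item_labels, label_library):
    """Encode an item's labels as an m-dimensional 0/1 vector, m = len(label_library)."""
    carried = set(item_labels)
    unknown = carried - set(label_library)
    if unknown:
        raise ValueError(f"labels not in the label library: {sorted(unknown)}")
    # Dimension j is 1 exactly when the item carries the j-th label of the library.
    return [1 if label in carried else 0 for label in label_library]

# Example label library (T_1 ... T_11), taken from the reference list in the text.
LABEL_LIBRARY = ["finance", "sports", "military", "entertainment", "life",
                 "education", "health", "technology", "culture", "travel", "other"]
```

For instance, `information_vector(["finance", "sports", "life"], LABEL_LIBRARY)` sets dimensions 1, 2, and 5, matching the T_1, T_2, T_5 example.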
Because information is time-sensitive, the labeled candidate information library needs to be maintained periodically by adding new items and removing items that are no longer timely.
Step S120: obtain the user attribute-information label interest vector from the user's demographic attributes.
User demographic attributes include gender, age, region, and other obtainable information, by which users are divided into several groups. To avoid large errors caused by too fine a division leaving each group with too few samples, user ages can be divided into several brackets, for example: 20 and under, 21-30, 31-40, 41-50, 51-60, and over 60. User regions can be divided by province when samples are plentiful; when samples are few, the data of several provinces can be merged, for example merging Heilongjiang, Jilin, and Liaoning into "Northeast".
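The grouping rule above can be sketched as a small Python helper that maps a user to a group key. The names `age_bracket`, `group_key`, and `REGION_MERGE`, and the `merge_regions` switch, are hypothetical; the brackets and the Northeast merge follow the examples in the text.

```python
# Upper bound (inclusive) of each age bracket and its display name.
AGE_BRACKETS = [(20, "20 and under"), (30, "21-30"), (40, "31-40"),
                (50, "41-50"), (60, "51-60")]

# Provinces merged into a larger region when samples are scarce.
REGION_MERGE = {"Heilongjiang": "Northeast", "Jilin": "Northeast", "Liaoning": "Northeast"}

def age_bracket(age):
    """Return the name of the first bracket whose upper bound covers the age."""
    for upper, name in AGE_BRACKETS:
        if age <= upper:
            return name
    return "over 60"

def group_key(gender, age, province, merge_regions=False):
    """Map a user's demographic attributes to a (gender, age bracket, region) group."""
    region = REGION_MERGE.get(province, province) if merge_regions else province
    return (gender, age_bracket(age), region)
```

All users that map to the same key belong to one group and would share that group's interest vector.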
Users are divided into several groups according to their demographic attributes; for example, [male, 31-40, Beijing] is one group and [female, 21-30, Shanghai] is another. The groups are numbered G_1, G_2, ..., G_n. The interest degree H_ij of the i-th group G_i in the j-th information label T_j is:
H_ij takes a value in [0, 1], and the larger H_ij is, the greater the interest of the i-th group G_i in the j-th information label T_j. For each group an m-dimensional vector is obtained, called the group's user attribute-information label interest vector and denoted V(user attribute, information label). The j-th dimension of V(user attribute, information label) is the group's interest in the j-th information label T_j. For example, the user attribute-information label interest vector of the i-th group is [H_i1, H_i2, ..., H_im], and all users in the group share the same user attribute-information label interest vector.
In this method users are divided into groups by their demographic attributes, which include but are not limited to gender, age, and region. The way groups are divided can be determined case by case; the division given here is only an example for reference.
Step S130: obtain and preprocess the historical behavior data of multiple users within a preset time period, and train a deep learning model on the data to obtain a trained deep learning model.
The historical behavior data of multiple users within a preset time period is obtained and preprocessed. The preset time period T and the number of users sampled can be set according to the application. For example, with T set to three months and the number of users set to N_user, the historical behavior data of N_user randomly selected users is extracted from all data of the most recent three months. To protect user privacy and simplify data processing, the N_user users are numbered 1, 2, 3, ..., N_user. For each user, the information vectors of the items browsed in the user's historical behavior data are obtained and arranged in chronological order, with the vector of an earlier-browsed item placed before that of a later-browsed item.
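The ordering step can be sketched in Python as follows, assuming each browse event is a (timestamp, item id) pair and that a lookup table from item id to information vector is available. Both data layouts, and the name `browse_sequence`, are assumptions made for illustration.

```python
def browse_sequence(events, item_vectors):
    """Return a user's information vectors ordered from earliest to latest browse.

    events: list of (timestamp, item_id) pairs (assumed layout).
    item_vectors: dict mapping item_id to its m-dimensional information vector.
    """
    # sorted() is stable, so events with equal timestamps keep their input order.
    ordered = sorted(events, key=lambda event: event[0])
    return [item_vectors[item_id] for _, item_id in ordered]
```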
The deep learning model used in this method is a recurrent neural network model, which may be an RNN or an improved variant such as LSTM. The parameters of the deep learning model are first initialized randomly. Then, following the user numbers in the preprocessed historical behavior data, for each user the m-dimensional information vectors of the browsed items are fed into the recurrent neural network model in the order in which the user browsed them, to train the model.
For each user, the k-th input to the recurrent neural network model is the information vector of the k-th item the user browsed in the preprocessed behavior data. The model then produces a predicted output vector, which is compared with the information vector of the (k+1)-th item the user browsed in the preprocessed behavior data; the deviation of the recurrent neural network is computed, and the model's parameters are corrected continually according to that deviation. After all information vectors of one user in the preprocessed historical behavior data have been fed in order into the deep learning model based on the recurrent neural network for training, the information vectors of the next user are fed in chronological order into the model trained so far and training continues, until the information vectors of all N_user users in the preprocessed behavior data have been fed in and training is complete. This yields the trained deep learning model. Because recurrent neural network models are in general use in the field of deep learning, the details of constructing one are not repeated here.
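The training signal described above pairs each browsed item with the one that follows it. The sketch below builds those (input, target) pairs and scores one prediction; the use of squared error as the deviation is an assumption, since the source does not name a loss function, and both function names are hypothetical.

```python
def next_item_pairs(sequence):
    """Pair the k-th information vector (input) with the (k+1)-th (target)."""
    return list(zip(sequence[:-1], sequence[1:]))

def squared_error(predicted, target):
    """One possible deviation measure between a prediction and the next item's vector."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))
```

A user with fewer than two browsed items contributes no training pairs under this scheme.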
The trained deep learning model estimates a user's interest in information labels when that user's preprocessed historical behavior data is fed into it in the chronological order of browsing. The model outputs an m-dimensional prediction vector whose j-th dimension represents the model's predicted interest of the user in information label T_j; the higher the value of the j-th dimension, the more likely the user is to be interested in T_j.
Step S140: obtain and preprocess the historical behavior data of the current user, and use the trained deep learning model to calculate the user behavior-information label interest degree vector of the current user.
When the trained deep learning model is used to estimate the current user's information label interest degree, the information vector corresponding to each information item browsed in the user's historical behavior data is obtained and arranged in chronological order, so that the vector of an item browsed earlier precedes the vector of an item browsed later.
The information vectors corresponding to the information items in the current user's historical behavior data are input one by one, in chronological order, into the trained deep learning model. Once all of them have been input, the m-dimensional prediction vector produced by the trained model is the user behavior-information label interest degree vector of the current user, denoted V(user behavior, information label).
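The inference step above can be illustrated with a very small recurrent network written in plain Python. This is a sketch under several assumptions that the text does not specify: the Elman-style structure, the sigmoid output layer, the hidden size, and the class name `TinyRNN` are all hypothetical, and the weights here are random rather than trained.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyRNN:
    """Hypothetical Elman-style network with untrained, randomly drawn weights."""

    def __init__(self, m: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.w_in = rand(hidden, m)        # input vector -> hidden state
        self.w_rec = rand(hidden, hidden)  # previous hidden state -> hidden state
        self.w_out = rand(m, hidden)       # hidden state -> m-dimensional output

    def predict(self, sequence: List[List[float]]) -> List[float]:
        """Feed information vectors in browsing order; return the final m-dim output."""
        h = [0.0] * self.hidden
        for x in sequence:
            from_input = matvec(self.w_in, x)
            from_state = matvec(self.w_rec, h)
            h = [math.tanh(p + q) for p, q in zip(from_input, from_state)]
        return [sigmoid(z) for z in matvec(self.w_out, h)]


model = TinyRNN(m=3, hidden=4)
# Hypothetical history: two browsed items, oldest first.
prediction = model.predict([[1, 0, 0], [0, 1, 1]])
```

Only the output after the last browsed item is kept, which corresponds to taking the prediction vector once all information vectors have been input.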
The deep learning model used in this method to calculate the user behavior-information label interest degree vector includes recurrent neural networks and their improved variants. An improved recurrent neural network model may differ in the number of neurons, the number of network layers, added threshold functions, and so on; provided the structural changes do not directly affect the way this method calculates the user behavior-information label interest degree vector, such a model is regarded as one implementation of this method.
Step S150: calculate the user-information label interest degree vector from the current user's user attribute-information label interest degree vector and user behavior-information label interest degree vector, and finally determine the several information labels the user is most interested in.
From the already calculated user attribute-information label interest degree vector V(user attribute, information label) and user behavior-information label interest degree vector V(user behavior, information label) of the current user, the m-dimensional interest degree vector of the current user for the information labels, denoted V(user, information label), can be calculated. The calculation formula is as follows:
V(user, information label) = (1 - w) * V(user attribute, information label) + w * V(user behavior, information label)
In the formula above, w is the weight of V(user behavior, information label) in calculating the current user's interest degree vector for the information labels. A new user has no historical behavior data, so w must be 0 when the current user has no historical behavior. As the user's historical behavior data accumulates, V(user behavior, information label) reflects the user's interest degree more accurately, so its weight should increase gradually with the amount of historical behavior data, while w always stays within [0, 1]. To meet these requirements, w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
Here tanh is the hyperbolic tangent function and a is a constant greater than 0. For a fixed number of information items browsed by the current user within the preset time period T, a larger a gives a larger w. The value of a can be set according to the practical application; for reference, a may be set to 0.05.
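The weighting scheme can be sketched in a few lines of Python. The function names are hypothetical and the sample vectors are invented for illustration; only the formula for w, the blending formula, and the reference value a = 0.05 come from the text.

```python
import math
from typing import List


def behavior_weight(browsed_count: int, a: float = 0.05) -> float:
    """w = tanh(a * number of items browsed within the preset period T)."""
    return math.tanh(a * browsed_count)


def blend(v_attribute: List[float], v_behavior: List[float], w: float) -> List[float]:
    """V(user) = (1 - w) * V(user attribute) + w * V(user behavior), per dimension."""
    return [(1.0 - w) * p + w * q for p, q in zip(v_attribute, v_behavior)]


# Hypothetical 2-label example.
v_attribute = [0.2, 0.8]
v_behavior = [0.6, 0.4]

new_user = blend(v_attribute, v_behavior, behavior_weight(0))       # w is 0
active_user = blend(v_attribute, v_behavior, behavior_weight(200))  # w close to 1
```

With no browsing history the result equals the user attribute vector, and with a long history it approaches the user behavior vector, which matches the cold-start behavior the formula is designed for.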
The j-th component of the calculated interest degree vector V(user, information label) represents the final interest degree of the user in label T_j as calculated by this method; the higher the j-th component, the more likely the user is to be interested in label T_j. The calculated vector V(user, information label) fully reflects the current user's interest degree in every label in the information label library. When information is pushed to the user by label, the several labels with the highest values in V(user, information label) can be selected according to specific needs.
For example, when the information label library is [finance, sports, military, entertainment, lifestyle, education, health, technology, culture, travel, other] and the calculated interest degree vector V(user, information label) of the current user is [0.42, 0.08, 0.02, 0.20, 0.01, 0.06, 0.33, 0.41, 0.05, 0.19, 0.14], the user-information label interest degree vector shows that the method estimates the current user's interest in the labels, from highest to lowest, as: finance, technology, health, entertainment, travel, other, sports, education, culture, military, lifestyle. If information under the three labels the current user is most interested in is chosen first for personalized push, the pushed information carries the labels finance, technology and health.
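The worked example above can be reproduced with a short ranking routine. The English label names are abbreviated here for readability and the function name `top_labels` is hypothetical; the numbers are the ones given in the text.

```python
from typing import List

labels = [
    "finance", "sports", "military", "entertainment", "lifestyle", "education",
    "health", "technology", "culture", "travel", "other",
]
interest = [0.42, 0.08, 0.02, 0.20, 0.01, 0.06, 0.33, 0.41, 0.05, 0.19, 0.14]


def top_labels(labels: List[str], interest: List[float], k: int) -> List[str]:
    """Return the k labels with the highest interest degree, highest first."""
    ranked = sorted(zip(labels, interest), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]


print(top_labels(labels, interest, 3))  # ['finance', 'technology', 'health']
```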
Further, the step of creating and maintaining the candidate information library containing labels includes:
selecting from a preset information label library the one or more labels that best match the content of an information item as the labels of that item, and adding the labeled item to the candidate information library containing labels; for each information item in the candidate information library, representing the item, according to its information labels, as an m-dimensional information vector, where m is the total number of labels in the preset information label library; when the item carries label T_j, the j-th component of the m-dimensional information vector is 1, and otherwise the j-th component is 0;
periodically maintaining the candidate information library containing labels by adding new information items and removing items that are no longer timely.
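The m-dimensional information vector described above is a multi-hot encoding over the label library. A minimal sketch, with a hypothetical function name and a hypothetical four-label library:

```python
from typing import List


def to_information_vector(item_labels: List[str], tag_library: List[str]) -> List[int]:
    """Component j is 1 when the item carries label T_j from the library, else 0."""
    present = set(item_labels)
    unknown = present - set(tag_library)
    if unknown:
        raise ValueError(f"labels not in the preset library: {sorted(unknown)}")
    return [1 if tag in present else 0 for tag in tag_library]


# Hypothetical library with m = 4 labels.
tag_library = ["finance", "sports", "health", "technology"]
vector = to_information_vector(["technology", "finance"], tag_library)
```

Because an item may carry several labels, more than one component can be 1 at the same time.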
Further, the user demographic data includes but is not limited to: one or more obtainable items of information, among gender, age and/or region, by which users can be divided into several groups.
Further, obtaining the user attribute-information label interest degree vector according to the user demographic data includes:
the user attribute-information label interest degree H_ij of the i-th group G_i for the j-th information label T_j is:
the value of H_ij lies in [0, 1].
Further, obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain the trained deep learning model, includes:
obtaining the information vector corresponding to each information item browsed in the historical behavior data of the multiple users within the preset time period, and inputting the information vectors into a recurrent neural network model in the chronological order in which the items were browsed to train the recurrent neural network model, thereby obtaining the trained deep learning model.
Further, obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to calculate the user behavior-information label interest degree vector of the current user, includes:
obtaining the information vector corresponding to each information item browsed in the current user's historical behavior data, and arranging the vectors in chronological order;
inputting the information vectors corresponding to the information items in the current user's historical behavior data one by one, in chronological order, into the trained deep learning model; once all of them have been input, the m-dimensional prediction vector produced by the trained model is the user behavior-information label interest degree vector of the current user.
Further, calculating the user-information label interest degree vector from the current user's user attribute-information label interest degree vector and user behavior-information label interest degree vector, and finally determining the several information labels the user is most interested in, includes:
calculating, from the already calculated user attribute-information label interest degree vector and user behavior-information label interest degree vector of the current user, the m-dimensional interest degree vector of the current user for the information labels, with the following formula:
V(user, information label) = (1 - w) * V(user attribute, information label) + w * V(user behavior, information label)
where V(user, information label) is the m-dimensional interest degree vector of the current user for the information labels; V(user behavior, information label) is the user behavior-information label interest degree vector; V(user attribute, information label) is the user attribute-information label interest degree vector of the current user; and w is the weight of V(user behavior, information label) in calculating the current user's interest degree vector for the information labels, with w always within [0, 1].
Further, w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
where tanh is the hyperbolic tangent function and a is a constant greater than 0.
The method for estimating information label interest degree provided by the invention innovatively combines the user attribute-information label interest degree vector with the user behavior-information label interest degree vector to obtain the user-information label interest degree vector. When a user has not yet browsed any information, the interest degree of the user's group in each information label, found from the user's demographic data, is used to avoid the cold start problem. As the number of information items browsed by the user grows, the weight of the user behavior-information label interest degree vector increases. Because that vector is calculated by a deep learning model based on a recurrent neural network, which exploits the temporal features in the user's historical behavior data, the method performs better than common non-deep-learning models when training samples are sufficient and the recurrent neural network parameters are suitably tuned.
The user attribute-information label interest degree vector compensates for the cold start problem that arises when the user has not browsed any information: a deep learning model based on a recurrent neural network, used alone to estimate information label interest degree, cannot produce an estimate for a user who has not browsed any information.
The invention establishes and maintains a candidate information library containing labels, which avoids the low information quality that often results from selecting information directly from the internet. Because the information is screened, the amount of computation needed to estimate user interest degree is also reduced. Once labels have been added to the screened information, it can be used in the subsequent estimation of users' information label interest degree.
The invention makes full use of the user's demographic data, the user's historical behavior data, and the temporal order of the information in that historical behavior data. Establishing the user attribute-information label interest degree vector solves the cold start problem of user interest degree estimation. Establishing the user behavior-information label interest degree vector takes advantage of the strengths of deep learning models.
Establishing the candidate information library containing labels avoids the low information quality that often results from selecting information directly from the internet, and, because the information is screened, reduces the amount of computation needed to estimate user interest degree.
Each information item in the candidate information library carries one or more information labels, whereas many other interest degree estimation methods only support samples with a single information label. In this method the user behavior-information label interest degree vector is calculated with a deep learning model based on a recurrent neural network, whose inputs and outputs can directly be vectors carrying multiple labels, so the method suits scenarios in which each sample has several labels.
The hyperbolic tangent function tanh used in calculating the user-information label interest degree vector maps numbers in [0, +∞) to numbers in [0, 1). This meets exactly the method's requirement that, when the user attribute-information label interest degree vector and the user behavior-information label interest degree vector are combined, the estimate in every dimension of the resulting user-information label interest degree vector lies in [0, 1], so that it reflects the user's interest degree.
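The range argument above can be written out as a short derivation. This is only a sketch: the symbols n (number of information items browsed within the period T) and P_j, Q_j (the j-th components of the user attribute and user behavior vectors) are names introduced here for illustration, and it assumes that both component vectors already lie in [0, 1], which the text states for H_ij but does not state explicitly for the model output.

```latex
\begin{align*}
w &= \tanh(a\,n), \quad a > 0,\ n \ge 0 \;\Longrightarrow\; 0 \le w < 1, \\
V_j &= (1 - w)\,P_j + w\,Q_j, \quad P_j,\, Q_j \in [0, 1], \\
\min(P_j, Q_j) &\le V_j \le \max(P_j, Q_j) \;\Longrightarrow\; V_j \in [0, 1].
\end{align*}
```

Since V_j is a convex combination of P_j and Q_j, it can never leave the interval spanned by them, which is why the blended estimate stays a valid interest degree.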
The embodiment of the invention proposes a method for estimating information label interest degree. The method screens information on the network and adds labels to the screened information to obtain a candidate information library containing labels. For a current user whose information label interest degree is to be estimated, the method obtains the user's demographic characteristics, collates and analyzes the data, and uses those characteristics to calculate the current user's user attribute-information label interest degree vector. The method trains a deep learning model based on a recurrent neural network, obtains the current user's historical behavior data, collates and analyzes it, inputs it into the trained deep learning model, and obtains the current user's user behavior-information label interest degree vector. Following the proposed method, the user-information label interest degree vector is then calculated from the current user's user attribute-information label interest degree vector and user behavior-information label interest degree vector. The resulting vector embodies the method's estimate of the current user's interest degree in each information label.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the invention can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) and which includes instructions that cause a computer device (such as a personal computer, server or network device) to execute the method described in each embodiment of the invention.
The above discloses only several specific embodiments of the invention. The invention is not limited to these embodiments, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the invention.
Claims (8)
1. A method for estimating information label interest degree, characterized in that the method includes:
creating and maintaining a candidate information library containing labels;
obtaining a user attribute-information label interest degree vector according to user demographic data;
obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain a trained deep learning model;
obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to calculate the user behavior-information label interest degree vector of the current user;
calculating a user-information label interest degree vector from the current user's user attribute-information label interest degree vector and user behavior-information label interest degree vector, and finally determining the several information labels the user is most interested in.
2. The method of claim 1, characterized in that the step of creating and maintaining the candidate information library containing labels includes:
selecting from a preset information label library the one or more labels that best match the content of an information item as the labels of that item, and adding the labeled item to the candidate information library containing labels; for each information item in the candidate information library, representing the item, according to its information labels, as an m-dimensional information vector, where m is the total number of labels in the preset information label library; when the item carries label T_j, the j-th component of the m-dimensional information vector is 1, and otherwise the j-th component is 0;
periodically maintaining the candidate information library containing labels by adding new information items and removing items that are no longer timely.
3. The method of claim 1, characterized in that the user demographic data includes but is not limited to: one or more obtainable items of information, among gender, age and/or region, by which users can be divided into several groups.
4. The method of claim 1 or 3, characterized in that obtaining the user attribute-information label interest degree vector according to the user demographic data includes:
the user attribute-information label interest degree H_ij of the i-th group G_i for the j-th information label T_j is:
the value of H_ij lies in [0, 1].
5. The method of claim 1, characterized in that obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain the trained deep learning model, includes:
obtaining the information vector corresponding to each information item browsed in the historical behavior data of the multiple users within the preset time period, and inputting the information vectors into a recurrent neural network model in the chronological order in which the items were browsed to train the recurrent neural network model, thereby obtaining the trained deep learning model.
6. The method of claim 1 or 5, characterized in that obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to calculate the user behavior-information label interest degree vector of the current user, includes:
obtaining the information vector corresponding to each information item browsed in the current user's historical behavior data, and arranging the vectors in chronological order;
inputting the information vectors corresponding to the information items in the current user's historical behavior data one by one, in chronological order, into the trained deep learning model; once all of them have been input, the m-dimensional prediction vector produced by the trained model is the user behavior-information label interest degree vector of the current user.
7. The method of any one of claims 1-6, characterized in that calculating the user-information label interest degree vector from the current user's user attribute-information label interest degree vector and user behavior-information label interest degree vector, and finally determining the several information labels the user is most interested in, includes:
calculating, from the already calculated user attribute-information label interest degree vector and user behavior-information label interest degree vector of the current user, the m-dimensional interest degree vector of the current user for the information labels, with the following formula:
V(user, information label) = (1 - w) * V(user attribute, information label) + w * V(user behavior, information label)
where V(user, information label) is the m-dimensional interest degree vector of the current user for the information labels; V(user behavior, information label) is the user behavior-information label interest degree vector; V(user attribute, information label) is the user attribute-information label interest degree vector of the current user; and w is the weight of V(user behavior, information label) in calculating the current user's interest degree vector for the information labels, with w always within [0, 1].
8. The method of claim 7, characterized in that w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
where tanh is the hyperbolic tangent function and a is a constant greater than 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810505164.8A CN108804577B (en) | 2018-05-24 | 2018-05-24 | Method for estimating interest degree of information tag |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804577A true CN108804577A (en) | 2018-11-13 |
CN108804577B CN108804577B (en) | 2022-11-01 |
Family
ID=64091552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810505164.8A Active CN108804577B (en) | 2018-05-24 | 2018-05-24 | Method for estimating interest degree of information tag |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804577B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150363688A1 (en) * | 2014-06-13 | 2015-12-17 | Microsoft Corporation | Modeling interestingness with deep neural networks |
CN106446189A (en) * | 2016-09-29 | 2017-02-22 | 广州艾媒数聚信息咨询股份有限公司 | Message-recommending method and system |
CN107273538A (en) * | 2017-06-29 | 2017-10-20 | 广州优视网络科技有限公司 | Information recommends method, device and server |
CN107341245A (en) * | 2017-07-06 | 2017-11-10 | 广州优视网络科技有限公司 | Data processing method, device and server |
CN107908789A (en) * | 2017-12-12 | 2018-04-13 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914159A (en) * | 2019-05-10 | 2020-11-10 | 招商证券股份有限公司 | Information recommendation method and terminal |
CN111914159B (en) * | 2019-05-10 | 2024-03-12 | 招商证券股份有限公司 | Information recommendation method and terminal |
CN110232620A (en) * | 2019-06-05 | 2019-09-13 | 拉扎斯网络科技(上海)有限公司 | Trade company's label determines method, apparatus, electronic equipment and readable storage medium storing program for executing |
CN110232620B (en) * | 2019-06-05 | 2021-07-30 | 拉扎斯网络科技(上海)有限公司 | Merchant label determination method and device, electronic equipment and readable storage medium |
CN112100221A (en) * | 2019-06-17 | 2020-12-18 | 腾讯科技(北京)有限公司 | Information recommendation method and device, recommendation server and storage medium |
CN112100221B (en) * | 2019-06-17 | 2024-02-13 | 深圳市雅阅科技有限公司 | Information recommendation method and device, recommendation server and storage medium |
CN111177538A (en) * | 2019-12-13 | 2020-05-19 | 杭州顺网科技股份有限公司 | Unsupervised weight calculation-based user interest tag construction method |
CN111177538B (en) * | 2019-12-13 | 2023-05-05 | 杭州顺网科技股份有限公司 | User interest label construction method based on unsupervised weight calculation |
CN112133431A (en) * | 2020-08-27 | 2020-12-25 | 绿瘦健康产业集团有限公司 | Health information message pushing method, device, medium and terminal equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108804577B (en) | 2022-11-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||