CN108804577A - A method for estimating information label interest-degree - Google Patents

A method for estimating information label interest-degree

Info

Publication number
CN108804577A
Authority
CN
China
Prior art keywords
information
user
label
interest
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810505164.8A
Other languages
Chinese (zh)
Other versions
CN108804577B (en)
Inventor
常剑
孙宇
张洪刚
徐彬
高珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
China Unicom Online Information Technology Co Ltd
Original Assignee
Beijing University of Posts and Telecommunications
China Unicom Online Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications and China Unicom Online Information Technology Co Ltd
Priority to CN201810505164.8A
Publication of CN108804577A
Application granted
Publication of CN108804577B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for estimating information label interest-degree, comprising: creating and maintaining a candidate information library containing labels; obtaining a user property-information label interest-degree vector from user demographic information; obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and using it to train a deep learning model; obtaining and preprocessing the historical behavior data of the current user, and using the trained model to obtain the current user's user behavior-information label interest-degree vector; and computing a user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, from which the information labels the user is most interested in are determined. The invention solves the cold-start problem of user interest-degree estimation, avoids the low-quality information that often results from selecting information directly from the internet, reduces the computation required for interest-degree estimation, and is suitable for scenarios in which each sample carries multiple labels.

Description

A method for estimating information label interest-degree
Technical field
The present invention relates to a method for estimating information label interest-degree and belongs to the field of computing technology.
Background
With the rapid development of the internet, the volume of information is enormous and growing explosively, and the quality of information on the network is uneven. If user interest-degree estimation is run directly on all acquired information, poor-quality content is likely to be pushed to users, degrading the user experience; estimating over all information also increases the computational load of the algorithm and wastes computing resources.
Although users browse different information items, the labels of those items can usually be grouped into a few broad categories, and a user's interest in a given information label lasts far longer than the user's interest in any single item. For example, after reading an item labeled "finance", a user will generally not read the same item again, but remains interested in other items with the "finance" label. Estimating the user's interest-degree in information labels, and thereby finding the labels the user is interested in, is therefore of great significance for research on and application of personalized information push.
A common problem in the user interest-degree estimation methods used in practice is the cold-start problem, that is, how to estimate a user's interest-degree when the user has not yet browsed any information.
One prior-art approach is based on a recurrent neural network: the labels of the information items a user has browsed are fed in order into the recurrent neural network, which is trained to predict the information labels the user is interested in. This approach can exploit the temporal features of the user's historical behavior, so it performs well when training samples are plentiful and the network parameters are well tuned. However, it has the following defects:
1. It cannot estimate interest-degree for a user who has not browsed any information.
2. It cannot make use of the user's demographic information, such as gender, age, and region.
Another prior-art approach uses TF-IDF (term frequency-inverse document frequency) to obtain the keywords of each information item, and then statistically analyzes the keywords of the items the current user has browsed to obtain the user's interest-degree in each keyword.
TF-IDF is a statistical method. For a given information item, the frequency with which a word appears in the item reflects the word's importance: the more often the word occurs in the item, the more important it is to that item, but its importance decreases as its frequency across all items rises. That is, if a word occurs frequently in the current item and rarely in other items, it is considered to represent the item well and is taken as a keyword of that item. Statistical analysis of the keywords in the items a user has browsed yields the user's interest-degree in each keyword, which can then be used for keyword-based personalized push. However, this method has the following defects:
1. For every word in every information item, it must count the occurrences of the word in the current item and in all items, which is computationally expensive.
2. The resulting keywords are too widely scattered, and each keyword may cover only a very narrow domain, which makes it hard to control the range of information the user is estimated to be interested in. For example, if TF-IDF yields the keyword "Lin Daiyu" for an item the user browsed and interest-degree is estimated from that keyword, subsequent pushes are likely to concentrate on items containing "Lin Daiyu" and will not readily extend to "A Dream of Red Mansions" or "Chinese literature", which harms both interest-degree estimation and push quality.
3. Even if the keywords of an item score highly in interest-degree estimation, the item may still fail to interest the user because of its poor quality.
4. It cannot estimate interest-degree for a user who has not browsed any information.
5. All items the user has browsed are treated equally during interest-degree estimation, so the method cannot reflect how items browsed at different times differ in their bearing on the current estimate, even though more recently browsed items usually have a greater influence on the current interest-degree.
6. It cannot make use of the user's demographic information, such as gender, age, and region.
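The TF-IDF weighting described above can be sketched as follows. This is a minimal illustration, assuming one common variant of the formula (term frequency as count divided by item length, inverse document frequency as the natural log of the number of items over the number of items containing the word); the source does not specify which variant the prior art uses, and the function name `tf_idf` and the sample items are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score every word of every item; docs is a non-empty list of non-empty word lists.

    Assumed variant: tf = count / item length, idf = ln(N / number of items containing the word).
    """
    n_docs = len(docs)
    # Number of items in which each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical tokenized items.
items = [["stock", "market", "stock"], ["market", "game"], ["game", "team", "market"]]
scores = tf_idf(items)
keyword = max(scores[0], key=scores[0].get)  # highest-scoring word of the first item
```

A word that occurs in every item, such as "market" here, gets a score of zero under this variant, which matches the text's point that importance falls as a word becomes common across all items. The nested counting over every word of every item also makes defect 1 visible.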
A third prior-art approach estimates user interest-degree with gradient boosting decision trees (GBDT), a machine learning method in which multiple regression trees, built iteratively, make a joint decision. A GBDT consists of multiple regression trees; each regression tree is fitted to the residual left by the combined result of all preceding trees, where the residual is the actual value minus the predicted value. The results of all regression trees are summed to give the final result of the GBDT. This approach can use both the user's demographic information and the labels of the information items the user has browsed. However, it has the following defects:
1. GBDT is inherently suited to regression problems, or to binary classification by setting a threshold. In the information label interest-degree estimation problem, the information tag library contains many labels and each item often carries more than one label, whereas each GBDT computation yields the user's estimated interest-degree in only one label. To obtain the user's interest-degree in every information label, a separate GBDT must be applied to each label, so the computation is m times that of a GBDT solving one binary classification problem (m being the total number of labels in the information tag library), which is expensive.
2. All items the user has browsed are treated equally during interest-degree estimation, so the method cannot reflect how items browsed at different times differ in their bearing on the current estimate, even though more recently browsed items usually have a greater influence on the current interest-degree.
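The residual-fitting idea behind gradient boosting described above can be sketched with one-split regression trees (stumps) on a single feature. This is only an illustration under stated assumptions: squared-error loss, a hypothetical learning rate, a one-dimensional input with at least two distinct values, and hypothetical names `fit_stump` and `fit_gbdt`. It is not the configuration of any particular prior-art system.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (needs >= 2 distinct xs)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gbdt(xs, ys, n_trees=20, learning_rate=0.5):
    """Each new tree is fitted to actual value minus current prediction."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    # The final result is the (scaled) sum of all trees on top of the base value.
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

# One model per label: interest in a single hypothetical label as a function of one feature.
model = fit_gbdt([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
```

Because one such model predicts a single value, covering m labels would require m separately fitted models, which is the cost described in defect 1.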
Summary of the invention
In view of the above defects, the present invention provides a method for estimating information label interest-degree. By establishing a user property-information label interest-degree vector, it solves the cold-start problem of user interest-degree estimation; by establishing a candidate information library containing labels, it avoids the low-quality information that often results from selecting information directly from the internet and reduces the computation required for interest-degree estimation; and it is suitable for scenarios in which each sample carries multiple labels.
To achieve the above objectives, the present invention is implemented through the following technical solution:
The present invention provides a method for estimating information label interest-degree, the method comprising:
creating and maintaining a candidate information library containing labels;
obtaining a user property-information label interest-degree vector from user demographic information;
obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain a trained deep learning model;
obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to compute the current user's user behavior-information label interest-degree vector;
computing a user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determining the information labels the user is most interested in.
Further, the step of creating and maintaining the candidate information library containing labels comprises:
selecting from a preset information tag library the one or more labels that best match the content of an information item as the labels of that item, and adding the labeled item to the candidate information library containing labels; for each item in the candidate information library, representing the item by an m-dimensional information vector according to its labels, where m is the total number of labels in the preset information tag library; when the item carries label T_j, the j-th dimension of the m-dimensional information vector is 1, and otherwise it is 0;
periodically maintaining the candidate information library containing labels by adding new information items and removing items that are no longer timely.
Further, the user demographic information includes, but is not limited to, one or more obtainable items of information such as gender, age, and/or region, by which users are divided into several groups.
Further, obtaining the user property-information label interest-degree vector from user demographic information comprises:
The interest-degree H_ij of the i-th group G_i in the j-th information label T_j is given by a formula (not reproduced in this text).
H_ij takes values in [0, 1].
Further, obtaining and preprocessing the historical behavior data of multiple users within the preset time period, and inputting it into the deep learning model for training to obtain the trained deep learning model, comprises:
obtaining the information vectors corresponding to the items browsed in the historical behavior data of multiple users within the preset time period, and feeding the information vectors into a recurrent neural network model in the chronological order in which the items were browsed, so as to train the recurrent neural network model and obtain the trained deep learning model.
Further, obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to compute the current user's user behavior-information label interest-degree vector, comprises:
obtaining the information vectors corresponding to the items browsed in the historical behavior data of the current user, and arranging them in chronological order;
inputting the information vectors corresponding to the items in the current user's historical behavior data one by one, in chronological order, into the trained deep learning model; once all of them have been input, the m-dimensional prediction vector produced by the trained deep learning model is the current user's user behavior-information label interest-degree vector.
Further, computing the user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determining the information labels the user is most interested in, comprises:
From the already computed user property-information label interest-degree vector and user behavior-information label interest-degree vector of the current user, the current user's m-dimensional interest-degree vector over information labels is computed as follows:
V(user, information label) = (1 - w) * V(user property, information label) + w * V(user behavior, information label)
where V(user, information label) is the current user's m-dimensional interest-degree vector over information labels; V(user behavior, information label) is the user behavior-information label interest-degree vector; V(user property, information label) is the current user's user property-information label interest-degree vector; and w is the weight of V(user behavior, information label) in the computation of the current user's interest-degree vector, whose value must always lie within [0, 1].
Further, w is computed as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
where tanh is the hyperbolic tangent function and a is a constant greater than 0.
The method for estimating information label interest-degree provided by the present invention innovatively combines the user property-information label interest-degree vector with the user behavior-information label interest-degree vector to obtain the user-information label interest-degree vector. When a user has not browsed any information, the interest-degree of the user's demographic group in each information label is used, which avoids the cold-start problem. As the user browses more information, the weight of the user behavior-information label interest-degree vector gradually increases. Because that vector is computed by a deep learning model based on a recurrent neural network, which exploits the temporal features of the user's historical behavior data, the method outperforms common non-deep-learning models when training samples are plentiful and the network parameters are well tuned.
The user property-information label interest-degree vector compensates for the cold-start problem that arises when a user has not browsed any information; a deep learning model based on a recurrent neural network alone cannot estimate interest-degree for such a user.
The present invention establishes and maintains a candidate information library containing labels, which avoids the low-quality information that often results from selecting information directly from the internet and, because the information is screened, reduces the computation required for interest-degree estimation. Once labels have been added to the screened information, it can be used in the subsequent estimation of users' information label interest-degree.
The present invention makes full use of the user's demographic information, the user's historical behavior data, and the temporal information in that data. Establishing the user property-information label interest-degree vector solves the cold-start problem of user interest-degree estimation. Establishing the user behavior-information label interest-degree vector exploits the advantages of deep learning models.
Establishing the candidate information library containing labels avoids the low-quality information that often results from selecting information directly from the internet and, because the information is screened, reduces the computation required for interest-degree estimation.
Items in the candidate information library carry one or more information labels. Many other interest-degree estimation methods support only one label per sample, whereas in this method the user behavior-information label interest-degree vector is computed by a deep learning model based on a recurrent neural network, whose inputs and outputs can directly be multi-label vectors, so the method is suitable for scenarios in which each sample carries multiple labels.
The hyperbolic tangent function tanh used in computing the user-information label interest-degree vector maps any number in [0, +∞) to a number in [0, 1). This is exactly what the method needs in order to combine the user property-information label interest-degree vector with the user behavior-information label interest-degree vector while keeping every dimension of the resulting user-information label interest-degree vector within [0, 1], so that it reflects the user's interest-degree.
Description of the drawings
Fig. 1 is a flowchart of Embodiment One of the method for estimating information label interest-degree provided by the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below. It should be noted that the technical solution is not limited to the embodiments described; improvements and designs that those skilled in the art make on the basis of the present invention, by referring to and drawing on its technical solution, shall fall within its scope of protection.
Embodiment One
Embodiment One of the present invention provides a method for estimating information label interest-degree, comprising steps S110 to S150:
Step S110: create and maintain a candidate information library containing labels.
To ensure the quality of the information pushed to users while keeping the algorithm's computation within a reasonable range, a candidate information library containing labels is created. Information on the network is screened, high-quality items are selected, and for each item the one or more labels that best match its content are chosen from the preset information tag library as its labels. The labeled items are added to the candidate information library for use in subsequent operations.
The labels in the preset information tag library are numbered T_1, T_2, ..., T_m, where m is the total number of labels in the library. For reference, in a specific implementation the preset information tag library may be [finance, sports, military, entertainment, life, education, health, technology, culture, travel, other], or may be configured according to the actual situation. The labels should be relatively independent of one another, and the division should not be too fine, so that the accuracy of subsequent results is not degraded by too few items corresponding to each label.
Each item in the candidate information library is represented by an m-dimensional information vector according to its labels, where m is the total number of labels in the preset information tag library. When the item carries label T_j, the j-th dimension of the vector is 1; otherwise it is 0. For example, if an item carries labels T_1, T_2, and T_5, its m-dimensional information vector is [1, 1, 0, 0, 1, 0, 0, ..., 0, 0].
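The m-dimensional information vector described above is a multi-hot encoding of an item's labels. A minimal sketch, assuming the example tag library from the text (with m = 11) and a hypothetical helper name `encode_labels`:

```python
# Example tag library from the text; T_1 is "finance", T_2 is "sports", T_5 is "life".
TAG_LIBRARY = ["finance", "sports", "military", "entertainment", "life",
               "education", "health", "technology", "culture", "travel", "other"]

def encode_labels(labels, tag_library=TAG_LIBRARY):
    """Return the 0/1 information vector with one dimension per label in the library."""
    label_set = set(labels)
    unknown = label_set - set(tag_library)
    if unknown:
        raise ValueError(f"labels not in tag library: {sorted(unknown)}")
    # Dimension j is 1 exactly when the item carries label T_j.
    return [1 if tag in label_set else 0 for tag in tag_library]

vector = encode_labels(["finance", "sports", "life"])  # labels T_1, T_2, T_5
```

Rejecting labels outside the library is a design choice of this sketch, made so that every vector has the same length m and the same meaning per dimension.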
Because information is time-sensitive, the candidate information library containing labels must be maintained periodically: new items are added and items that are no longer timely are removed.
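One possible form of this periodic maintenance is sketched below. The source does not say how timeliness is judged, so the 30-day cutoff, the `published` field, and the function name `maintain_library` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def maintain_library(library, new_items, now, max_age=timedelta(days=30)):
    """Add new items, then drop every item older than max_age (assumed policy)."""
    merged = list(library) + list(new_items)
    return [item for item in merged if now - item["published"] <= max_age]

now = datetime(2018, 5, 24)
library = [
    {"id": "old", "published": datetime(2018, 1, 1), "labels": ["finance"]},
    {"id": "recent", "published": datetime(2018, 5, 10), "labels": ["sports"]},
]
fresh = [{"id": "new", "published": datetime(2018, 5, 23), "labels": ["travel"]}]
library = maintain_library(library, fresh, now)
```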
Step S120: obtain the user property-information label interest-degree vector from user demographic information.
User demographic information includes gender, age, region, and other obtainable information, by which users are divided into several groups. To avoid large errors caused by groups that are divided too finely and therefore contain too few samples, user age can be divided into several bands, for example: 20 and under, 21-30, 31-40, 41-50, 51-60, and over 60. User regions can be divided by province when samples are plentiful; when samples are scarce, the data of several provinces can be merged, for example merging Heilongjiang, Jilin, and Liaoning into "Northeast".
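The banding and merging described above can be sketched as a function that maps a user's demographic information to a group key. The band boundaries and the "Northeast" merge follow the examples in the text; the names `age_band` and `group_key` and the `merge_regions` switch are hypothetical.

```python
# Upper bound (inclusive) and name of each age band from the example in the text.
AGE_BANDS = [(20, "20 and under"), (30, "21-30"), (40, "31-40"),
             (50, "41-50"), (60, "51-60")]
# Provinces merged into one region when samples are scarce (example from the text).
REGION_MERGE = {"Heilongjiang": "Northeast", "Jilin": "Northeast", "Liaoning": "Northeast"}

def age_band(age):
    """Return the name of the band that contains the given age."""
    for upper, name in AGE_BANDS:
        if age <= upper:
            return name
    return "over 60"

def group_key(gender, age, region, merge_regions=False):
    """Build the (gender, age band, region) key that identifies a user's group."""
    if merge_regions:
        region = REGION_MERGE.get(region, region)
    return (gender, age_band(age), region)

key_a = group_key("male", 35, "Beijing")
key_b = group_key("female", 61, "Jilin", merge_regions=True)
```

All users that map to the same key belong to the same group and would share one user property-information label interest-degree vector.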
Users are divided into several groups according to their demographic information; for example, [male, 31-40, Beijing] is one group and [female, 21-30, Shanghai] is another. The groups are numbered G_1, G_2, ..., G_n. The interest-degree H_ij of the i-th group G_i in the j-th information label T_j is given by a formula (not reproduced in this text).
H_ij takes values in [0, 1]; the larger H_ij is, the greater the interest-degree of the i-th group G_i in the j-th information label T_j. For each group an m-dimensional vector is obtained, called the group's user property-information label interest-degree vector and denoted V(user property, information label). The value of its j-th dimension is the group's interest-degree in the j-th information label T_j. For example, the user property-information label interest-degree vector of the i-th group is [H_i1, H_i2, ..., H_im], and all users in the group share the same vector.
Groups are divided in this method by user demographic information, which includes but is not limited to the user's gender, age, and region. The way groups are divided may depend on the specific situation; the divisions given here are for reference only.
Step S130: obtain and preprocess the historical behavior data of multiple users within a preset time period, and input it into a deep learning model for training to obtain a trained deep learning model.
The historical behavior data of multiple users within a preset time period is obtained and preprocessed. The preset time period T and the number of users extracted can be set according to the practical application. For example, with T set to three months and the number of extracted users set to N_user, the historical behavior data of N_user randomly selected users is extracted from all data of the most recent three months. To protect user privacy and simplify data processing, the N_user users are numbered 1, 2, 3, ..., N_user. For each user, the information vectors corresponding to the items browsed in the user's historical behavior data are obtained and arranged in chronological order, with the vector of an item browsed earlier placed before the vector of an item browsed later.
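The preprocessing described above can be sketched as follows, assuming each browsing record is a hypothetical `(timestamp, item_id)` pair and that the information vector of every item is available in a lookup table. The names `anonymize` and `build_sequence` are illustrative only.

```python
def anonymize(user_ids):
    """Map raw user identifiers to the consecutive numbers 1..N_user."""
    return {uid: number for number, uid in enumerate(sorted(user_ids), start=1)}

def build_sequence(browse_log, item_vectors):
    """Order one user's records by time and replace each item by its information vector."""
    ordered = sorted(browse_log, key=lambda record: record[0])
    return [item_vectors[item_id] for _timestamp, item_id in ordered]

# Hypothetical data with m = 3 labels.
item_vectors = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 1, 1]}
log = [(30, "c"), (10, "a"), (20, "b")]
sequence = build_sequence(log, item_vectors)
numbering = anonymize(["u-17", "u-03"])
```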
The deep learning model used in this method is a recurrent neural network model, which may be an RNN or an improved variant such as LSTM. The parameters of the model are first randomly initialized. Then, following the user numbering in the preprocessed historical behavior data, for each user the m-dimensional information vectors of the browsed items are fed into the recurrent neural network model in the order in which the user browsed them, to train the model.
For each user, the k-th input to the recurrent neural network model is the information vector of the k-th item the user browsed in the preprocessed behavior data. The model then produces a predicted output vector, which is compared with the information vector of the (k+1)-th item the user browsed; the deviation of the network is computed and the parameters of the model are corrected accordingly. Once the information vectors of all items browsed by one user have been fed in order into the model for training, the information vectors of the items browsed by the next user are fed in chronological order into the model as trained so far, and training continues until the information vectors of the items browsed by all N_user users in the preprocessed behavior data have been input. The trained deep learning model is thus obtained. Because recurrent neural network models are in general use in the deep learning field, the details of constructing the model are not described further here.
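The next-step training targets described above can be sketched as follows: the k-th vector is the input and the (k+1)-th vector is the target. The source does not name the measure of deviation, so the per-label binary cross-entropy used here is an assumption, as are the names `next_step_pairs` and `binary_cross_entropy`. The parameter update itself (backpropagation through time) is omitted.

```python
import math

def next_step_pairs(sequence):
    """Pair the k-th information vector (input) with the (k+1)-th (target)."""
    return list(zip(sequence[:-1], sequence[1:]))

def binary_cross_entropy(predicted, target, eps=1e-12):
    """Mean per-label deviation between a predicted vector and a 0/1 target (assumed loss)."""
    total = 0.0
    for p, t in zip(predicted, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)

sequence = [[1, 0], [0, 1], [1, 1]]
pairs = next_step_pairs(sequence)
loss = binary_cross_entropy([0.5, 0.5], [1, 0])
```

A sequence of length n yields n - 1 training pairs, so a user who browsed a single item contributes no pairs under this scheme.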
Given the preprocessed historical behavior data of a user, fed into the model in the chronological order in which items were browsed, the trained deep learning model estimates the user's interest-degree in information labels. Its output is an m-dimensional prediction vector whose j-th dimension represents the model's prediction of the user's interest-degree in information label T_j; the higher the value, the more likely the user is to be interested in T_j.
Step S140: obtain and preprocess the historical behavior data of the current user, and use the trained deep learning model to compute the current user's user behavior-information label interest-degree vector.
When the trained deep learning model is used to estimate the current user's information label interest-degree, the information vectors corresponding to the items browsed in the user's historical behavior data are obtained and arranged in chronological order, with the vector of an item browsed earlier placed before the vector of an item browsed later.
The information vectors corresponding to the items in the current user's historical behavior data are input one by one, in chronological order, into the trained deep learning model. Once all of them have been input, the m-dimensional prediction vector produced by the model is the current user's user behavior-information label interest-degree vector, denoted V(user behavior, information label).
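The inference step described above can be sketched with a plain recurrent cell: the vectors are fed in one at a time, and the output after the last one is taken as the result. This sketch assumes a simple Elman-style RNN with a sigmoid output layer and uses randomly initialized weights as a stand-in for a trained model; the source fixes neither the architecture details nor the output activation. The names `make_rnn` and `behavior_vector` are hypothetical.

```python
import math
import random

def make_rnn(m, hidden, seed=0):
    """Create randomly initialized weights (a stand-in for a trained model)."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W_xh": matrix(hidden, m), "W_hh": matrix(hidden, hidden), "W_hy": matrix(m, hidden)}

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def behavior_vector(model, sequence):
    """Feed vectors in chronological order; return the output after the last one."""
    hidden = [0.0] * len(model["W_hh"])
    output = None  # stays None when the user has no browsing history
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(model["W_xh"], x), matvec(model["W_hh"], hidden))]
        hidden = [math.tanh(v) for v in pre]
        # The sigmoid keeps every dimension of the prediction inside (0, 1).
        output = [1.0 / (1.0 + math.exp(-v)) for v in matvec(model["W_hy"], hidden)]
    return output

model = make_rnn(m=3, hidden=4)
prediction = behavior_vector(model, [[1, 0, 0], [0, 1, 1]])
```

For a user with no history the function returns `None`, which mirrors the cold-start case that the user property-information label interest-degree vector is meant to cover.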
The deep learning model used in this method to compute the user behavior-information label interest-degree vector may be a recurrent neural network or an improved variant of it. Improvements may concern the number of neurons, the number of layers, added threshold functions, and so on. As long as the structurally modified recurrent neural network model does not directly change the way this method computes the user behavior-information label interest-degree vector, it can be regarded as one implementation of this method.
Step S150: calculate the user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determine the several information labels the user is most interested in.
From the already calculated user property-information label interest-degree vector V(user property, information label) and user behavior-information label interest-degree vector V(user behavior, information label) of the current user, the current user's m-dimensional interest-degree vector for information labels, denoted V(user, information label), can be calculated. The formula is as follows:
V(user, information label) = (1 - w) * V(user property, information label) + w * V(user behavior, information label)
In the formula above, w is the weight of V(user behavior, information label) in the calculation of the current user's interest-degree vector for information labels. A new user has no historical behavior data, so w must be 0 when the current user has no historical behavior. As the user's historical behavior data grows, V(user behavior, information label) reflects the user's interest degree more accurately, so its weight should increase gradually with the amount of historical behavior data, while w must always stay within [0, 1]. To meet these requirements, w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
Here tanh is the hyperbolic tangent function and a is a constant greater than 0. For a fixed number of information items browsed by the current user within the preset time period T, a larger a gives a larger w. The value of a can be set according to the practical application; for reference, a may be set to 0.05.
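The weighting rule and the combination formula can be sketched in a few lines of Python. The function names `behavior_weight` and `combine_interest` and the sample vectors are hypothetical; the default a = 0.05 is the reference value suggested in the text.

```python
import math

def behavior_weight(num_browsed, a=0.05):
    """w = tanh(a * number of items browsed within the preset period T)."""
    return math.tanh(a * num_browsed)

def combine_interest(v_property, v_behavior, num_browsed, a=0.05):
    """V(user, label) = (1 - w) * V(user property, label) + w * V(user behavior, label)."""
    w = behavior_weight(num_browsed, a)
    return [(1 - w) * p + w * b for p, b in zip(v_property, v_behavior)]

# Illustrative two-label vectors (made-up numbers).
v_property = [0.3, 0.7]
v_behavior = [0.9, 0.1]

new_user = combine_interest(v_property, v_behavior, num_browsed=0)
active_user = combine_interest(v_property, v_behavior, num_browsed=10)
```

With a = 0.05, w is 0 for a user with no browsing history, about 0.46 after 10 browsed items, and above 0.9999 after 100, so `new_user` equals the user property vector while `active_user` already leans toward the behavior vector.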
The value of the j-th dimension of the calculated interest-degree vector V(user, information label) is the final interest degree of the user in label Tj as calculated by this method; the higher the value, the more likely the user is to be interested in Tj. The vector V(user, information label) fully expresses the current user's interest degree in every label in the information label library. When information is pushed to the user by label, the labels with the highest values in V(user, information label) can be selected as needed.
For example, suppose the information label library is [finance, sports, military, entertainment, life, education, health, technology, culture, travel, other] and the calculated interest-degree vector V(user, information label) of the current user is [0.42, 0.08, 0.02, 0.20, 0.01, 0.06, 0.33, 0.41, 0.05, 0.19, 0.14]. The vector shows that the current user's estimated interest degree in the labels, from high to low, is: finance, technology, health, entertainment, travel, other, sports, education, culture, military, life. If information for the three labels the current user is most interested in is chosen first for personalized push, the pushed information carries the labels finance, technology and health.
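The ranking in this example can be reproduced with a short Python sketch. The English label names are translations chosen for this illustration, and `top_labels` is a hypothetical helper rather than part of the described method.

```python
labels = ["finance", "sports", "military", "entertainment", "life", "education",
          "health", "technology", "culture", "travel", "other"]
interest = [0.42, 0.08, 0.02, 0.20, 0.01, 0.06, 0.33, 0.41, 0.05, 0.19, 0.14]

def top_labels(labels, interest, k):
    """Return the k labels with the highest interest degree, highest first."""
    ranked = sorted(zip(labels, interest), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]

print(top_labels(labels, interest, 3))  # ['finance', 'technology', 'health']
```

Passing `k=len(labels)` returns the full ordering listed in the example, from finance down to life.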
Further, the step of creating and maintaining the candidate information library containing labels includes:
One or more labels that best match the content of an information item are selected from the preset information label library as the labels of that item, and the labeled item is added to the candidate information library containing labels. Each information item in the candidate information library is represented, according to its labels, by an m-dimensional information vector, where m is the total number of labels in the preset information label library. If the item carries label Tj, the j-th dimension of its m-dimensional information vector is 1; otherwise the j-th dimension is 0;
The candidate information library containing labels is maintained periodically: new information is added and information that is no longer timely is removed.
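The information vector described in this step is a multi-hot encoding over the label library. A minimal Python sketch follows; the label library contents and the helper name `to_info_vector` are assumptions for illustration only.

```python
LABEL_LIBRARY = ["finance", "sports", "military", "entertainment", "life", "education",
                 "health", "technology", "culture", "travel", "other"]

def to_info_vector(item_labels, label_library=LABEL_LIBRARY):
    """Encode an item's labels as an m-dimensional 0/1 vector (m = library size)."""
    tagged = set(item_labels)
    unknown = tagged - set(label_library)
    if unknown:
        # Labels must come from the preset library.
        raise ValueError(f"labels not in library: {sorted(unknown)}")
    return [1 if label in tagged else 0 for label in label_library]

# An item labeled with both health and finance sets two dimensions to 1.
vector = to_info_vector(["health", "finance"])
```

Because an item may carry several labels, more than one dimension can be 1, which is what lets the same vector format serve multi-label items.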
Further, the user demographic information includes, but is not limited to, one or more kinds of obtainable information, among gender, age and/or region information, by which users are divided into several groups.
Further, obtaining the user property-information label interest-degree vector from the user demographic information includes:
For the i-th group Gi and the j-th information label Tj, the user property-information label interest degree Hij is:
The value of Hij lies within [0, 1].
Further, obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain the trained deep learning model, includes:
The information vector corresponding to each information item browsed in the historical behavior data of multiple users within the preset time period is obtained, and the information vectors are input into a recurrent neural network model in the chronological order in which the items were browsed in order to train the model, yielding the trained deep learning model.
Further, obtaining and preprocessing the historical behavior data of the current user, and using the trained deep learning model to calculate the current user's user behavior-information label interest-degree vector, includes:
The information vector corresponding to each information item browsed in the current user's historical behavior data is obtained, and the vectors are arranged in chronological order;
The information vectors corresponding to the items in the current user's historical behavior data are input into the trained deep learning model one after another in chronological order. Once all of them have been input, the m-dimensional prediction vector output by the trained model at that point is the current user's user behavior-information label interest-degree vector.
Further, calculating the user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determining the several information labels the user is most interested in, includes:
From the already calculated user property-information label interest-degree vector and user behavior-information label interest-degree vector of the current user, the current user's m-dimensional interest-degree vector for information labels can be calculated. The formula is as follows:
V(user, information label) = (1 - w) * V(user property, information label) + w * V(user behavior, information label)
Here V(user, information label) is the current user's m-dimensional interest-degree vector for information labels; V(user behavior, information label) is the user behavior-information label interest-degree vector; V(user property, information label) is the current user's user property-information label interest-degree vector; and w is the weight of V(user behavior, information label) in the calculation of the current user's interest-degree vector for information labels. The value of w must always lie within [0, 1].
Further, w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
Here tanh is the hyperbolic tangent function and a is a constant greater than 0.
The method for estimating information label interest degree provided by the invention innovatively proposes combining the user property-information label interest-degree vector with the user behavior-information label interest-degree vector to obtain the user-information label interest-degree vector. When a user has not yet browsed any information, the user's demographic information is used to find the interest degree of the user's group in each information label, which avoids the cold-start problem. As the user browses more information, the weight of the user behavior-information label interest-degree vector gradually increases. Because that vector is calculated by a deep learning model based on a recurrent neural network, which exploits the temporal features in the user's historical behavior data, the method performs better than common non-deep-learning models when training samples are sufficient and the recurrent neural network parameters are properly tuned.
The user property-information label interest-degree vector compensates for the cold-start problem when a user has not browsed any information: a deep learning model based on a recurrent neural network alone cannot estimate information label interest degree for a user who has not browsed any information.
The invention establishes and maintains a candidate information library containing labels, which avoids the low information quality that often occurs when information is selected directly from the internet. Because the information is screened, the amount of computation needed to estimate user interest degree is also reduced. After labels are added to the screened information, it can be used in the subsequent estimation of users' information label interest degree.
The invention makes full use of the user's demographic information, the user's historical behavior data, and the temporal information of the items in that data. Establishing the user property-information label interest-degree vector solves the cold-start problem of user interest degree estimation. Establishing the user behavior-information label interest-degree vector exploits the advantages of the deep learning model.
Establishing the candidate information library containing labels avoids the low information quality that often occurs when information is selected directly from the internet, and, because the information is screened, reduces the amount of computation needed to estimate user interest degree.
Each information item in the candidate information library carries one or more information labels. Many other interest degree estimation methods only support samples that carry a single information label each. In this method, the user behavior-information label interest-degree vector is calculated by a deep learning model based on a recurrent neural network, whose inputs and outputs can directly be vectors carrying multi-label information, so the method suits scenarios in which each sample carries multiple labels.
The hyperbolic tangent function tanh used in calculating the user-information label interest-degree vector maps a number in [0, +∞) to a number in [0, 1). This is exactly what the method needs in order to combine the user property-information label interest-degree vector with the user behavior-information label interest-degree vector while keeping the estimate in every dimension of the resulting user-information label interest-degree vector within [0, 1], so that it reflects user interest degree.
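The range argument can be written out per dimension. In the sketch below, n, p_j, b_j and v_j are shorthand introduced only for this illustration: n is the number of items browsed within T, and p_j, b_j, v_j are the j-th entries of the user property, user behavior and combined vectors. It assumes that both input vectors already lie in [0, 1]; the text states this for Hij, and for the model output it is an assumption.

```latex
0 \le w = \tanh(a\,n) < 1 \qquad (a > 0,\; n \ge 0)

v_j = (1 - w)\,p_j + w\,b_j, \quad p_j,\, b_j \in [0, 1]
\;\Longrightarrow\; \min(p_j, b_j) \le v_j \le \max(p_j, b_j)
\;\Longrightarrow\; 0 \le v_j \le 1
```

Since w lies in [0, 1), each v_j is a convex combination of p_j and b_j, which is why it cannot leave the interval spanned by the two inputs.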
The embodiment of the invention proposes a method for estimating information label interest degree. The method screens information on the network and adds labels to the screened information to obtain a candidate information library containing labels. For a current user whose information label interest degree is to be estimated, the user's demographic characteristics are obtained, the data is organized and analyzed, and the current user's user property-information label interest-degree vector is calculated from those characteristics. The method trains a deep learning model based on a recurrent neural network; the current user's historical behavior data is obtained, organized and analyzed, and then input into the trained deep learning model, which yields the current user's user behavior-information label interest-degree vector. Following the proposed method, the user-information label interest-degree vector is calculated from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector. The resulting vector expresses the method's estimate of the current user's interest degree in each information label.
From the description of the embodiments above, those skilled in the art can clearly understand that the invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the invention can be embodied as a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) and includes instructions that cause a computer device (such as a personal computer, server or network device) to execute the methods described in the embodiments of the invention.
What is disclosed above is only several specific embodiments of the invention. The invention is not limited to these embodiments, however, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the invention.

Claims (8)

1. A method for estimating information label interest degree, characterized in that the method comprises:
creating and maintaining a candidate information library containing labels;
obtaining a user property-information label interest-degree vector from user demographic information;
obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain a trained deep learning model;
obtaining and preprocessing the historical behavior data of the current user, and calculating the current user's user behavior-information label interest-degree vector using the trained deep learning model;
calculating a user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determining the several information labels the user is most interested in.
2. The method according to claim 1, characterized in that the step of creating and maintaining the candidate information library containing labels comprises:
selecting from a preset information label library one or more labels that best match the content of an information item as the labels of that item, and adding the labeled item to the candidate information library containing labels; representing each information item in the candidate information library, according to its labels, by an m-dimensional information vector, where m is the total number of labels in the preset information label library; if the item carries label Tj, the j-th dimension of its m-dimensional information vector is 1, and otherwise the j-th dimension is 0;
periodically maintaining the candidate information library containing labels by adding new information and removing information that is no longer timely.
3. The method according to claim 1, characterized in that the user demographic information includes, but is not limited to, one or more kinds of obtainable information, among gender, age and/or region information, by which users are divided into several groups.
4. The method according to claim 1 or 3, characterized in that obtaining the user property-information label interest-degree vector from the user demographic information comprises:
for the i-th group Gi and the j-th information label Tj, the user property-information label interest degree Hij is:
the value of Hij lies within [0, 1].
5. The method according to claim 1, characterized in that obtaining and preprocessing the historical behavior data of multiple users within a preset time period, and inputting it into a deep learning model for training to obtain the trained deep learning model, comprises:
obtaining the information vector corresponding to each information item browsed in the historical behavior data of multiple users within the preset time period, and inputting the information vectors into a recurrent neural network model in the chronological order in which the items were browsed in order to train the model, yielding the trained deep learning model.
6. The method according to claim 1 or 5, characterized in that obtaining and preprocessing the historical behavior data of the current user, and calculating the current user's user behavior-information label interest-degree vector using the trained deep learning model, comprises:
obtaining the information vector corresponding to each information item browsed in the current user's historical behavior data, and arranging the vectors in chronological order;
inputting the information vectors corresponding to the items in the current user's historical behavior data into the trained deep learning model one after another in chronological order; once all of them have been input, the m-dimensional prediction vector output by the trained model at that point is the current user's user behavior-information label interest-degree vector.
7. The method according to any one of claims 1 to 6, characterized in that calculating the user-information label interest-degree vector from the current user's user property-information label interest-degree vector and user behavior-information label interest-degree vector, and finally determining the several information labels the user is most interested in, comprises:
calculating the current user's m-dimensional interest-degree vector for information labels from the already calculated user property-information label interest-degree vector and user behavior-information label interest-degree vector of the current user, according to the following formula:
V(user, information label) = (1 - w) * V(user property, information label) + w * V(user behavior, information label)
wherein V(user, information label) is the current user's m-dimensional interest-degree vector for information labels; V(user behavior, information label) is the user behavior-information label interest-degree vector; V(user property, information label) is the current user's user property-information label interest-degree vector; and w is the weight of V(user behavior, information label) in the calculation of the current user's interest-degree vector for information labels, the value of w always lying within [0, 1].
8. The method according to claim 7, characterized in that w is calculated as follows:
w = tanh(a * number of information items browsed by the current user within the preset time period T)
wherein tanh is the hyperbolic tangent function and a is a constant greater than 0.
CN201810505164.8A 2018-05-24 2018-05-24 Method for estimating interest degree of information tag Active CN108804577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810505164.8A CN108804577B (en) 2018-05-24 2018-05-24 Method for estimating interest degree of information tag

Publications (2)

Publication Number Publication Date
CN108804577A true CN108804577A (en) 2018-11-13
CN108804577B CN108804577B (en) 2022-11-01

Family

ID=64091552

Country Status (1)

Country Link
CN (1) CN108804577B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150363688A1 (en) * 2014-06-13 2015-12-17 Microsoft Corporation Modeling interestingness with deep neural networks
CN106446189A (en) * 2016-09-29 2017-02-22 广州艾媒数聚信息咨询股份有限公司 Message-recommending method and system
CN107273538A (en) * 2017-06-29 2017-10-20 广州优视网络科技有限公司 Information recommends method, device and server
CN107341245A (en) * 2017-07-06 2017-11-10 广州优视网络科技有限公司 Data processing method, device and server
CN107908789A (en) * 2017-12-12 2018-04-13 北京百度网讯科技有限公司 Method and apparatus for generating information

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914159A (en) * 2019-05-10 2020-11-10 招商证券股份有限公司 Information recommendation method and terminal
CN111914159B (en) * 2019-05-10 2024-03-12 招商证券股份有限公司 Information recommendation method and terminal
CN110232620A (en) * 2019-06-05 2019-09-13 拉扎斯网络科技(上海)有限公司 Trade company's label determines method, apparatus, electronic equipment and readable storage medium storing program for executing
CN110232620B (en) * 2019-06-05 2021-07-30 拉扎斯网络科技(上海)有限公司 Merchant label determination method and device, electronic equipment and readable storage medium
CN112100221A (en) * 2019-06-17 2020-12-18 腾讯科技(北京)有限公司 Information recommendation method and device, recommendation server and storage medium
CN112100221B (en) * 2019-06-17 2024-02-13 深圳市雅阅科技有限公司 Information recommendation method and device, recommendation server and storage medium
CN111177538A (en) * 2019-12-13 2020-05-19 杭州顺网科技股份有限公司 Unsupervised weight calculation-based user interest tag construction method
CN111177538B (en) * 2019-12-13 2023-05-05 杭州顺网科技股份有限公司 User interest label construction method based on unsupervised weight calculation
CN112133431A (en) * 2020-08-27 2020-12-25 绿瘦健康产业集团有限公司 Health information message pushing method, device, medium and terminal equipment

Also Published As

Publication number Publication date
CN108804577B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN108804577A (en) A kind of predictor method of information label interest-degree
CN108960499B (en) Garment fashion trend prediction system integrating visual and non-visual features
CN110321926B (en) Migration method and system based on depth residual error correction network
CN111163359B (en) Bullet screen generation method and device and computer readable storage medium
CN109241255A (en) A kind of intension recognizing method based on deep learning
CN103812872B (en) A kind of network navy behavioral value method and system based on mixing Di Li Cray process
CN106919951A (en) A kind of Weakly supervised bilinearity deep learning method merged with vision based on click
CN108446964B (en) User recommendation method based on mobile traffic DPI data
CN105913296A (en) Customized recommendation method based on graphs
CN108573041A (en) Probability matrix based on weighting trusting relationship decomposes recommendation method
CN111275496B (en) Self-media advertisement intelligent recommendation method
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN110110663A (en) A kind of age recognition methods and system based on face character
CN108710609A (en) A kind of analysis method of social platform user information based on multi-feature fusion
CN106445915A (en) New word discovery method and device
CN110414005A (en) Intention recognition method, electronic device, and storage medium
Liu et al. Multi-perspective User2Vec: Exploiting re-pin activity for user representation learning in content curation social network
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN103605493A (en) Parallel sorting learning method and system based on graphics processing unit
Jia et al. Dynamic group recommendation algorithm based on member activity level
CN105678430A (en) Improved user recommendation method based on neighbor project slope one algorithm
CN109740743A (en) Hierarchical neural network query recommendation method and device
CN109255019A (en) A kind of online exam pool and its application method based on artificial intelligence
CN111966829B (en) Network topic outbreak time prediction method based on deep survival analysis
Zou et al. TRCF: Temporal Reinforced Collaborative Filtering for Time-Aware QoS Prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant