CN109871487A - News recall method and system
- Publication number: CN109871487A
- Application number: CN201910132210.9A
- Authority: CN (China)
- Prior art keywords: vector, news, user, cosine similarity, user characteristics
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a news recall method and system. A sequence of news IDs clicked by a user is obtained as the user features, and the news ID the user clicks at the next moment and a news ID the user does not click are obtained as labels. The user features and the labels are trained on the basis of a preset prediction model to generate a multi-dimensional vector corresponding to the user features. Cosine similarity is calculated between the multi-dimensional vector and the news ID clicked at the next moment and the news ID not clicked, giving the cosine similarities corresponding to the multi-dimensional vector. N multi-dimensional vectors are then selected in descending order of cosine similarity, and the news IDs corresponding to the N multi-dimensional vectors are determined. In this way the news the user is interested in is obtained and recalled, so that the user receives news in which he or she has a higher degree of interest.
Description
Technical field
The present invention relates to the technical field of deep learning, and more specifically to a news recall method and system.
Background
With the rapid development of information technology and the Internet, online news has become increasingly popular and is now one of the main ways in which people obtain information in daily life. News recall is currently one of the ways in which people obtain information.
News recall is an important stage in the field of news recommendation. Traditional news recall uses keywords from the titles and content of news the user clicked in the past, and retrieves news related to those keywords.
In the prior art, because there are many news records and the total number of news features is large, some of the news provided by traditional news recall does not match the user's interests.
Summary of the invention
In view of this, the present application provides a news recall method and system, so that the recalled news matches the user's interests.
To achieve the above object, the following solution is proposed:
A first aspect of the present invention discloses a news recall method, comprising:
obtaining a sequence of news IDs clicked by a user as user features, and obtaining the news ID the user clicks at the next moment and a news ID not clicked as labels, the next moment being determined from the latest time at which a news ID in the user's click sequence was clicked;
training the user features and the labels on the basis of a preset prediction model, and converting the user features to generate a multi-dimensional vector corresponding to the user features;
performing cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
selecting N multi-dimensional vectors in descending order of their corresponding cosine similarity, and determining the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Preferably, the process of setting up the preset prediction model comprises:
constructing an original neural network model;
obtaining, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each training user;
inputting the news IDs clicked and the news IDs not clicked by each training user into the original neural network model in turn, to obtain an initial training result for each training user;
updating the parameters of the original neural network according to the initial training results to obtain the prediction model.
Preferably, training the user features and the labels on the basis of the preset prediction model, and converting the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
in the embedding layer of the preset prediction model, converting the user features into a multi-dimensional vector on the basis of a long short-term memory (LSTM) model;
in the output layer of the preset prediction model, projecting the multi-dimensional vector on the basis of a multilayer perceptron (MLP) to generate the multi-dimensional vector corresponding to the user features.
Preferably, performing cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
performing cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment, to obtain a first cosine similarity corresponding to the multi-dimensional vector;
performing cosine similarity calculation between the multi-dimensional vector and the news ID not clicked at the next moment, to obtain a second cosine similarity corresponding to the multi-dimensional vector.
Preferably, selecting N multi-dimensional vectors in descending order of their corresponding cosine similarity, and determining the news IDs corresponding to the N multi-dimensional vectors, comprises:
sorting by the cosine similarity of the multi-dimensional vectors, selecting the N multi-dimensional vectors with the largest values in descending order, and determining the news IDs corresponding to the N multi-dimensional vectors;
or
judging the magnitude of the cosine similarity corresponding to each multi-dimensional vector and, when the cosine similarity value is greater than a preset threshold, determining the multi-dimensional vectors that exceed the preset threshold, selecting N of them in descending order, and determining the news IDs corresponding to the N multi-dimensional vectors.
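The two selection alternatives above can be sketched as follows. This is only an illustrative sketch: the function names select_top_n and select_above_threshold, the threshold value and the sample scores are hypothetical and do not come from the patent text.

```python
import heapq

def select_top_n(scores, n):
    """Return the n news ids with the largest cosine similarity, best first."""
    return heapq.nlargest(n, scores, key=scores.get)

def select_above_threshold(scores, threshold, n):
    """Keep only ids whose similarity exceeds the threshold, then take the best n."""
    kept = {news_id: s for news_id, s in scores.items() if s > threshold}
    return select_top_n(kept, n)

# Hypothetical cosine similarities keyed by news id.
scores = {"id20": 0.91, "id21": 0.12, "id22": 0.55, "id23": 0.78}
by_sorting = select_top_n(scores, 2)
by_threshold = select_above_threshold(scores, 0.5, 2)
```

With the threshold variant, fewer than N news IDs may be returned when not enough candidates exceed the threshold; how that case is handled is left open here.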
A second aspect of the present invention discloses a news recall system, comprising:
an acquiring unit, configured to obtain a sequence of news IDs clicked by a user as user features, and to obtain the news ID the user clicks at the next moment and a news ID not clicked as labels, the next moment being determined from the latest time at which a news ID in the user's click sequence was clicked;
a training and conversion unit, configured to train the user features and the labels on the basis of a preset prediction model, and to convert the user features to generate a multi-dimensional vector corresponding to the user features;
a computing unit, configured to perform cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
a determination unit, configured to select N multi-dimensional vectors in descending order of their corresponding cosine similarity, and to determine the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Preferably, the training and conversion unit, which trains the user features and the labels on the basis of the preset prediction model and converts the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
a conversion module, configured to convert the user features into a multi-dimensional vector on the basis of a long short-term memory (LSTM) model in the embedding layer of the preset prediction model;
a projection module, configured to project the multi-dimensional vector on the basis of a multilayer perceptron (MLP) in the output layer of the preset prediction model, to generate the multi-dimensional vector corresponding to the user features.
Preferably, the computing unit, which performs cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
a first computing module, configured to perform cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment, to obtain a first cosine similarity corresponding to the multi-dimensional vector;
a second computing module, configured to perform cosine similarity calculation between the multi-dimensional vector and the news ID not clicked at the next moment, to obtain a second cosine similarity corresponding to the multi-dimensional vector.
Preferably, the determination unit, which selects N multi-dimensional vectors in descending order of their corresponding cosine similarity and determines the news IDs corresponding to the N multi-dimensional vectors, comprises a sorting module or a judgment module;
the sorting module is configured to sort by the cosine similarity of the multi-dimensional vectors, select the N multi-dimensional vectors with the largest values in descending order, and determine the news IDs corresponding to the N multi-dimensional vectors;
the judgment module is configured to judge the magnitude of the cosine similarity corresponding to each multi-dimensional vector and, when the cosine similarity value is greater than a preset threshold, determine the multi-dimensional vectors that exceed the preset threshold, select N of them in descending order, and determine the news IDs corresponding to the N multi-dimensional vectors.
As can be seen from the above technical solutions, the invention discloses a news recall method and system. A sequence of news IDs clicked by a user is obtained as user features, and the news ID the user clicks at the next moment and a news ID not clicked are obtained as labels. The user features and the labels are trained on the basis of a preset prediction model, and the user features are converted to generate a multi-dimensional vector corresponding to the user features. Cosine similarity is calculated between the multi-dimensional vector and the news ID clicked at the next moment and the news ID not clicked, giving the cosine similarities corresponding to the multi-dimensional vector. N multi-dimensional vectors are selected in descending order of cosine similarity, and the news IDs corresponding to the N multi-dimensional vectors are determined. With this method, the news the user is interested in is obtained and recalled, so that the user receives news in which he or she has a higher degree of interest.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention;
Fig. 2 is a flow diagram of the process of setting up the prediction model disclosed in an embodiment of the present invention;
Fig. 3 is a flow diagram of updating the original neural network parameters disclosed in an embodiment of the present invention;
Fig. 4 is a flow diagram of another news recall method disclosed in an embodiment of the present invention;
Fig. 5 is a flow diagram of yet another news recall method disclosed in an embodiment of the present invention;
Fig. 6 is a flow diagram of a further news recall method disclosed in an embodiment of the present invention;
Fig. 7 is a structural diagram of a news recall system disclosed in an embodiment of the present invention;
Fig. 8 is a structural diagram of the training and conversion unit of the news recall system disclosed in an embodiment of the present invention;
Fig. 9 is a structural diagram of the computing unit of the news recall system disclosed in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
In this application, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Unless further limited, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
As described in the background, in the prior art news is recalled using keywords from the titles and content of news the user clicked in the past, and news related to those keywords is retrieved; some of the news recalled in this way does not match the user's interests. The present invention therefore discloses a news recall method that generates a multi-dimensional vector of the user features and determines the news IDs corresponding to the multi-dimensional vectors according to the magnitude of the corresponding cosine similarity. The news the user is interested in is thereby obtained and recalled, so that the user receives news in which he or she has a higher degree of interest.
Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention, which specifically includes the following steps:
Step S101: obtain a sequence of news IDs clicked by the user as user features, and obtain the news ID the user clicks at the next moment and a news ID not clicked as labels.
In executing step S101, because user behavior is strongly affected by timeliness, the selected click sequence consists of the news IDs the user clicked most recently, which serve as the user features; the news ID the user clicks and the news ID the user does not click at the moment following the click sequence serve as the labels.
It should be noted that the user's click news ID sequence is determined from the news titles and content the user clicked. The sequence may contain multiple clicked news IDs; the number is set by the technician according to the actual situation.
It should be noted that each news ID clicked by the user belongs to the set of news IDs clicked by the user. The number of clicked news IDs may be more than one and is chosen by the technician according to the actual situation.
It should be noted that the next moment is determined from the latest time at which a news ID in the user's click sequence was clicked.
The process of obtaining the news ID the user clicks at the next moment and the news ID not clicked is illustrated with an example:
For example, at 15:02 the news id15 clicked by the user is taken as the user feature; at 15:03, the news id16 clicked by the user and the news id17 not clicked are taken as the labels.
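The example above can be expressed as a small sample-building routine. This is a minimal sketch under stated assumptions: the function build_sample, the history_len parameter and the random choice of the unclicked news ID are hypothetical illustrations, since the text leaves the choice of the unclicked news ID to the technician.

```python
import random

def build_sample(click_sequence, unclicked_pool, history_len, rng):
    """Split a time-ordered click sequence into (features, clicked label, unclicked label)."""
    if len(click_sequence) < 2:
        raise ValueError("need at least one history click and one next click")
    # User features: the most recent clicks before the next moment.
    history = click_sequence[:-1][-history_len:]
    # Positive label: the news id clicked at the next moment.
    clicked = click_sequence[-1]
    # Negative label: any news id from the set the user did not click (assumed random here).
    candidates = [news_id for news_id in unclicked_pool if news_id not in click_sequence]
    not_clicked = rng.choice(candidates)
    return history, clicked, not_clicked

rng = random.Random(0)
features, clicked, not_clicked = build_sample(
    ["id14", "id15", "id16"], ["id17", "id18"], history_len=2, rng=rng)
```

Here the features are the two clicks preceding id16, the clicked label is id16, and the unclicked label is drawn from the pool of news the user did not click.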
It should be noted that the news ID not clicked by the user belongs to the set of news IDs the user did not click; it may be any one of the news IDs in that set and is chosen by the technician according to the actual situation.
Step S102: train the user features and the labels on the basis of a preset prediction model, and convert the user features to generate a multi-dimensional vector corresponding to the user features.
In executing step S102, the user features and the labels are trained through the preset prediction model, and the user features are converted into the multi-dimensional vector corresponding to the user features. The prediction model is obtained by training an original neural network model on sample data.
The process of setting up the preset prediction model involved in step S102 of Fig. 1 is shown in the flow diagram of Fig. 2 and specifically includes the following steps:
Step S201: construct an original neural network model.
Step S202: obtain, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each training user.
It should be noted that the training users may be users of different age groups, users of different genders, users with different hobbies, and so on; the specific training users are selected by the technician according to the actual situation.
Step S203: input the news IDs clicked and the news IDs not clicked by each training user into the original neural network model in turn, to obtain an initial training result for each training user.
In implementing step S203, the news IDs clicked and the news IDs not clicked by each training user are input in turn into the original neural network model, and a long short-term memory (LSTM) model is selected as the original neural network model used for training.
It should be noted that the long short-term memory model (long short-term memory, LSTM) is a deep neural network for time series, used to process and predict important events with relatively long intervals and delays in a time series.
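For readers unfamiliar with LSTM, the following sketch shows one step of a standard LSTM cell and how it can be run over a click sequence. It is an illustration only: the weights are random placeholders, the sizes are tiny rather than the 300 dimensions mentioned later, and the names lstm_step, encode_sequence and random_params are hypothetical, not taken from the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [sum(w * xi for w, xi in zip(w_row, x))
            + sum(u * hi for u, hi in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(W, U, b)]

def lstm_step(params, x, h, c):
    """One standard LSTM time step; returns the new hidden state and cell state."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], x, h)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], x, h)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], x, h)]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def encode_sequence(params, embedded_clicks, hidden_size):
    """Run the cell over the embedded click sequence and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in embedded_clicks:
        h, c = lstm_step(params, x, h, c)
    return h

def random_params(input_size, hidden_size, rng):
    """Placeholder weights (W, U, b) for each of the four gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size) for gate in "ifog"}

rng = random.Random(0)
params = random_params(input_size=4, hidden_size=3, rng=rng)
clicks = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 embedded clicks
user_state = encode_sequence(params, clicks, hidden_size=3)
```

The forget gate is what lets the cell carry information across long gaps in the click sequence, which is the property the note above refers to.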
Step S204: update the parameters of the original neural network according to the initial training results to obtain the prediction model.
By executing steps S201 to S204, the parameters of the original neural network are updated according to the initial training results through the back-propagation algorithm, and the prediction model is obtained.
It should be noted that the back-propagation algorithm (back propagation algorithm, BPA) is commonly used to train multilayer perceptrons. It mainly iterates repeatedly between two phases, excitation propagation and weight update, until the network's response to the input data reaches a predetermined target range.
In a specific implementation, the constructed prediction model includes multiple processing layers, at least an embedding layer, a bidirectional LSTM layer and a predict layer.
It should be noted that the embedding layer, following a matrix decomposition approach, converts the user features through the LSTM model into a multi-dimensional vector that carries semantic meaning. The bidirectional LSTM layer trains the model with an LSTM model connected to an MLP, and cosine similarity is calculated between the output of the MLP and the news ID clicked by the user and the news ID not clicked. In the predict layer, the user features, the news ID clicked by the user and the news ID not clicked are input, and the cosine similarities for the news ID clicked and the news ID not clicked are calculated according to the prediction model.
It should be noted that the embedding layer and the bidirectional LSTM layer form the training stage of the prediction model, and the predict layer forms its prediction stage.
It should further be noted that the results obtained with the prediction model can subsequently also be used as parameters for adjusting the prediction accuracy of the prediction model.
The process, involved in step S204 of Fig. 2, of updating the parameters of the original neural network according to the initial training results to obtain the prediction model is shown in the flow diagram of Fig. 3 and specifically includes the following steps:
Step S301: obtain a training user's click news ID sequence.
It should be noted that the training user's click news ID sequence contains multiple news IDs clicked by the user; the specific sequence is set by the technician according to the actual situation.
Step S302: vectorize the training user's click news ID sequence through the embedding layer.
Step S303: convert the vectorized click news ID sequence into a multi-dimensional vector through the LSTM model.
Step S304: project the multi-dimensional vector through the MLP to obtain the multi-dimensional vector corresponding to the user features.
Step S305: perform cosine similarity calculation between the multi-dimensional vector corresponding to the user features and, respectively, the news ID clicked by the user and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, and update the prediction model parameters through the back-propagation algorithm.
By executing steps S301 to S305, the training user's click news ID sequence is obtained and projected to obtain the multi-dimensional vector corresponding to the user features; cosine similarity is calculated between this vector and, respectively, the news ID clicked by the user and the news ID not clicked, giving the cosine similarities corresponding to the multi-dimensional vector; and the prediction model parameters are updated through the back-propagation algorithm.
The process of updating the prediction model parameters is illustrated with an example:
The current user's click news ID sequence is id0-id19, and the news id20 clicked by the user at the next moment and the news id21 not clicked serve as the labels. The click sequence id0-id19 is vectorized by the embedding layer and converted into a 300-dimensional vector by the LSTM model. The 300-dimensional vector is projected by the MLP to obtain the 300-dimensional vector corresponding to the user features. Cosine similarity is then calculated between this 300-dimensional vector and id20 and id21 respectively, giving a first cosine similarity value for the clicked news and a second cosine similarity value for the news not clicked, and the prediction model parameters are updated through the back-propagation algorithm.
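The final part of this example, comparing the first and second cosine similarity and adjusting parameters, can be sketched as below. Several things here are assumptions rather than the patent's method: the text does not state a loss function, so a pairwise margin loss is assumed; a numerical gradient stands in for back-propagation; and the update is applied directly to a 3-dimensional user vector instead of to the network weights. The vectors for id20 and id21 are made-up values.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pairwise_loss(user_vec, clicked_vec, unclicked_vec, margin=0.5):
    """Assumed loss: push the first similarity above the second by a margin."""
    first = cosine(user_vec, clicked_vec)      # similarity to the clicked news
    second = cosine(user_vec, unclicked_vec)   # similarity to the unclicked news
    return max(0.0, margin - first + second)

def numeric_gradient(user_vec, clicked_vec, unclicked_vec, eps=1e-6):
    """Central-difference gradient of the loss with respect to the user vector."""
    grad = []
    for k in range(len(user_vec)):
        plus = list(user_vec)
        minus = list(user_vec)
        plus[k] += eps
        minus[k] -= eps
        grad.append((pairwise_loss(plus, clicked_vec, unclicked_vec)
                     - pairwise_loss(minus, clicked_vec, unclicked_vec)) / (2 * eps))
    return grad

user_vec = [0.2, 0.1, -0.3]   # stand-in for the 300-dimensional user vector
id20_vec = [0.1, 0.4, 0.2]    # hypothetical vector of the clicked news id20
id21_vec = [0.3, 0.0, -0.5]   # hypothetical vector of the unclicked news id21

loss_before = pairwise_loss(user_vec, id20_vec, id21_vec)
grad = numeric_gradient(user_vec, id20_vec, id21_vec)
user_vec = [u - 0.01 * g for u, g in zip(user_vec, grad)]  # one gradient-descent step
loss_after = pairwise_loss(user_vec, id20_vec, id21_vec)
```

After the step the user vector has moved toward the clicked news and away from the unclicked one, which is the direction any training objective built on these two similarities would be expected to take.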
Step S103: perform cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector.
In executing step S103, cosine similarity is calculated between the multi-dimensional vector and, respectively, the news ID clicked by the user at the next moment and the news ID not clicked, to obtain a first cosine similarity and a second cosine similarity corresponding to the multi-dimensional vector.
It should be noted that the news ID corresponding to the multi-dimensional vector is determined according to the magnitudes of the first cosine similarity and the second cosine similarity, so that the news the user is interested in is obtained; by recalling that news, the user receives news in which he or she has a higher degree of interest.
Step S104: select N multi-dimensional vectors in descending order of their corresponding cosine similarity, and determine the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
In executing step S104, the multi-dimensional vectors are sorted in descending order of their corresponding cosine similarity, N multi-dimensional vectors are selected, and the news IDs corresponding to the N multi-dimensional vectors are determined.
It should be noted that the multi-dimensional vector may be a 300-dimensional vector, a 600-dimensional vector, and so on; the dimension is set by the technician according to the actual situation.
It should be noted that the larger the cosine similarity of a multi-dimensional vector, the more interested the user is in the corresponding news.
It should be noted that the specific value of N is chosen by the technician according to the actual situation.
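Steps S103 and S104 together amount to scoring candidate news against the user vector and keeping the best N. A minimal sketch follows, assuming each candidate news ID has a vector of the same dimension as the user vector; that lookup table, the function name recall and the 2-dimensional sample values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall(user_vec, news_vectors, n):
    """Score every candidate and return the n news ids with the highest similarity."""
    scored = [(cosine(user_vec, vec), news_id) for news_id, vec in news_vectors.items()]
    scored.sort(reverse=True)  # descending cosine similarity
    return [news_id for _, news_id in scored[:n]]

user_vec = [1.0, 0.0]
news_vectors = {
    "idA": [0.9, 0.1],
    "idB": [0.0, 1.0],
    "idC": [0.5, 0.5],
    "idD": [-1.0, 0.0],
}
recalled = recall(user_vec, news_vectors, n=2)
```

In this toy case idA points almost in the same direction as the user vector and idC is the next closest, so those two are recalled.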
With the news recall method disclosed above, the embodiment of the present invention obtains a sequence of news IDs clicked by the user as user features and obtains the news ID the user clicks at the next moment and a news ID not clicked as labels; trains the user features and the labels on the basis of a preset prediction model and converts the user features to generate a multi-dimensional vector corresponding to the user features; calculates cosine similarity between the multi-dimensional vector and the news ID clicked at the next moment and the news ID not clicked to obtain the corresponding cosine similarities; and selects N multi-dimensional vectors in descending order of cosine similarity and determines the corresponding news IDs. In this way the news the user is interested in is obtained and recalled, so that the user receives news in which he or she has a higher degree of interest.
Optionally, an evaluation on real online user click data shows that the click-through rate (CTR) of news recalled with the LSTM model is higher than that of news recalled by traditional news recall.
It should be noted that the CTR is the number of clicks divided by the number of impressions.
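The CTR definition above can be written as a one-line helper. The function name click_through_rate and the sample numbers are illustrative only and are not evaluation results from the patent.

```python
def click_through_rate(clicks, impressions):
    """CTR = number of clicks / number of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions

# Illustrative numbers: 30 clicks out of 1000 impressions.
example_ctr = click_through_rate(30, 1000)
```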
In this embodiment, on real online user click data, the click-through rate of news recalled with the LSTM model is higher than that of traditional news recall, so that the retention rate and the per-capita reading time for news recalled through the LSTM model are improved.
Based on the method of Fig. 1, another news recall method disclosed in an embodiment of the present invention is shown in Fig. 4 and specifically includes the following steps:
Step S401: obtain a sequence of news IDs clicked by the user as user features, and obtain the news ID the user clicks at the next moment and a news ID not clicked as labels.
The execution process and implementation principle of step S401 are the same as those of step S101 shown in Fig. 1; reference may be made to that description, and details are not repeated here.
Step S402: in the embedding layer of the preset prediction model, convert the user features into a multi-dimensional vector on the basis of the LSTM model.
In executing step S402, in the embedding layer of the preset prediction model, the user features are converted into a multi-dimensional vector through the LSTM model, following a matrix decomposition approach.
It should be noted that the LSTM model is applied to the user's click news ID sequence to filter out news IDs unrelated to the user's interests; the algorithm keeps the better candidates and discards the worse, preferentially selecting the news IDs the user is interested in.
Step S403: in the output layer of the preset prediction model, project the multi-dimensional vector on the basis of the MLP to generate the multi-dimensional vector corresponding to the user features.
It should be noted that the multilayer perceptron (multilayer perceptron, MLP) includes an embedding (input) layer, a hidden layer and an output layer. The layers of the MLP are fully connected: every neuron in one layer is connected to all neurons in the next layer.
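The fully connected structure described above can be sketched as follows. The weights, layer sizes and the choice of tanh as the hidden activation are placeholders chosen for illustration, and the names dense and mlp_project are hypothetical.

```python
import math

def dense(weights, bias, inputs, activation=None):
    """Fully connected layer: every input feeds every output neuron."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, bias)]
    return [activation(v) for v in outputs] if activation else outputs

def mlp_project(layers, vector):
    """Pass a vector through the layers in order and return the projection."""
    for weights, bias, activation in layers:
        vector = dense(weights, bias, vector, activation)
    return vector

layers = [
    # Hidden layer: 3 inputs -> 2 neurons, tanh activation (assumed).
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], math.tanh),
    # Output layer: 2 inputs -> 3 neurons, linear.
    ([[1.0, -1.0], [0.5, 0.5], [-0.3, 0.7]], [0.0, 0.0, 0.0], None),
]
user_vector = mlp_project(layers, [0.2, -0.4, 0.6])
```

Each row of a weight matrix holds the connections from all neurons of the previous layer to one neuron of the next, which is what full connection between layers means.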
It should be noted that after the user features are input to the embedding layer of the prediction model and converted into a multi-dimensional vector, the multi-dimensional vector is passed from the input layer through the hidden layer to the output layer; in the output layer the multi-dimensional vector is projected to generate the multi-dimensional vector corresponding to the user features.
It should be noted that training the user features and the labels on the basis of the preset prediction model is carried out with the LSTM model connected to the MLP.
Step S404: perform cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector.
Step S405: select N multi-dimensional vectors in descending order of their corresponding cosine similarity, and determine the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
The execution process and implementation principle of steps S404 and S405 are the same as those of steps S103 and S104 shown in Fig. 1; reference may be made to that description, and details are not repeated here.
With the news recall method disclosed above, the embodiment of the present invention obtains a sequence of news IDs clicked by the user as user features and obtains the news ID the user clicks at the next moment and a news ID not clicked as labels; converts the user features into a multi-dimensional vector on the basis of the LSTM model in the embedding layer of the preset prediction model; projects the multi-dimensional vector on the basis of the MLP in the output layer of the preset prediction model to generate the multi-dimensional vector corresponding to the user features; calculates cosine similarity between the multi-dimensional vector and the news ID clicked at the next moment and the news ID not clicked to obtain the corresponding cosine similarities; and selects N multi-dimensional vectors in descending order of cosine similarity and determines the corresponding news IDs. In this way the news the user is interested in is obtained and recalled, so that the user receives news in which he or she has a higher degree of interest.
Fig. 5 is a schematic flowchart of another news recall method disclosed in an embodiment of the present invention, which specifically includes the following steps:
Step S501: obtain the user's clicked-news id sequence as user features, and obtain the news id clicked by the user at the next moment and the news id not clicked as labels.
Step S502: train on the user features and the labels based on the preset prediction model, and convert the user features to generate the multi-dimensional vector corresponding to the user features.
Steps S501 and S502 are executed in the same way, and on the same principle, as steps S101 and S102 shown in Fig. 1; refer to that description, which is not repeated here.
Optionally, the multi-dimensional vector corresponding to the user features in step S502 may also be generated in the manner of steps S402 and S403 disclosed in Fig. 4.
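The feature and label construction of step S501 can be sketched as follows. The `TrainingSample` type and `build_samples` helper are hypothetical names, and taking the unclicked ids from the news shown to the user at the next moment is an assumption; the source does not say where unclicked ids come from.

```python
from typing import List, NamedTuple


class TrainingSample(NamedTuple):
    clicked_sequence: List[str]  # user features: ids clicked so far
    clicked_next: str            # label: id clicked at the next moment
    unclicked_next: str          # label: id not clicked at the next moment


def build_samples(click_history: List[str],
                  shown_next: List[str],
                  clicked_next: str) -> List[TrainingSample]:
    """Pair the click history with the next click and each unclicked id."""
    if clicked_next not in shown_next:
        raise ValueError("clicked_next must be one of the ids shown next")
    return [
        TrainingSample(list(click_history), clicked_next, news_id)
        for news_id in shown_next
        if news_id != clicked_next
    ]


samples = build_samples(
    click_history=["id0", "id1", "id2"],
    shown_next=["id31", "id32", "id33"],
    clicked_next="id31",
)
for sample in samples:
    print(sample)
```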
Step S503: compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multi-dimensional vector.
It should be noted that the first cosine similarity is the cosine similarity for the news id the user clicked. According to the first cosine similarity obtained for the multi-dimensional vector, the model parameters are updated by the back-propagation algorithm.
Step S504: compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multi-dimensional vector.
It should be noted that the second cosine similarity is the cosine similarity for the news id the user did not click. According to the second cosine similarity obtained for the multi-dimensional vector, the model parameters are updated by the back-propagation algorithm.
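One way the first and second cosine similarities could drive a parameter update is sketched below. The source only says that parameters are updated by back-propagation; the margin (hinge) loss, the margin of 0.5, the learning rate, and updating the user vector directly instead of the model weights are all assumptions made to keep the example small. Vectors are assumed to be non-zero.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad(u: Vector, v: Vector) -> Vector:
    """Gradient of cosine(u, v) with respect to u."""
    nu, nv = norm(u), norm(v)
    c = cosine(u, v)
    return [b / (nu * nv) - c * a / (nu * nu) for a, b in zip(u, v)]


def margin_loss(u: Vector, pos: Vector, neg: Vector, margin: float = 0.5) -> float:
    """Assumed loss: push the first similarity above the second by a margin."""
    return max(0.0, margin - cosine(u, pos) + cosine(u, neg))


def gradient_step(u: Vector, pos: Vector, neg: Vector,
                  lr: float = 0.1, margin: float = 0.5) -> Vector:
    """One gradient-descent step on the user vector (stand-in for backprop)."""
    if margin_loss(u, pos, neg, margin) == 0.0:
        return list(u)  # margin already satisfied, no update
    g_pos = cosine_grad(u, pos)
    g_neg = cosine_grad(u, neg)
    # d(loss)/du = -d(first similarity)/du + d(second similarity)/du
    return [a - lr * (gn - gp) for a, gp, gn in zip(u, g_pos, g_neg)]


user = [1.0, 0.0]
clicked = [0.0, 1.0]    # vector of the clicked news id (made up)
unclicked = [1.0, 0.0]  # vector of the unclicked news id (made up)

loss_before = margin_loss(user, clicked, unclicked)
user_after = gradient_step(user, clicked, unclicked)
loss_after = margin_loss(user_after, clicked, unclicked)
print(loss_before, loss_after)
```

After the step the user vector has moved toward the clicked news vector, so the loss decreases.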
Step S505: based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector, select N multi-dimensional vectors in descending order and determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Step S505 is executed in the same way, and on the same principle, as step S104 shown in Fig. 1; refer to that description, which is not repeated here.
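The descending top-N selection of step S505 can be sketched as below, assuming the cosine similarities are already available as a mapping from news id to score. The scores and the helper name `top_n_news` are illustrative.

```python
import heapq
from typing import Dict, List


def top_n_news(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n news ids with the largest cosine similarity, largest first."""
    if n < 2:
        raise ValueError("the source requires N to be an integer >= 2")
    best = heapq.nlargest(n, scores.items(), key=lambda item: item[1])
    return [news_id for news_id, _ in best]


scores = {"id31": 0.92, "id32": 0.15, "id33": 0.78, "id34": 0.40}  # made-up values
recalled = top_n_news(scores, 2)
print(recalled)
```

`heapq.nlargest` avoids sorting the whole candidate set, which helps when N is small relative to the number of news items.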
With the news recall method disclosed above, this embodiment of the present invention obtains the user's clicked-news id sequence as user features, and obtains the news id clicked by the user at the next moment and the news id not clicked as labels. Training on the user features and the labels is performed based on the preset prediction model, and the user features are converted to generate the multi-dimensional vector corresponding to the user features. Cosine similarity is computed between the multi-dimensional vector and the news id clicked by the user at the next moment to obtain the first cosine similarity, and between the multi-dimensional vector and the news id not clicked at the next moment to obtain the second cosine similarity. Based on the magnitude of the cosine similarities, N multi-dimensional vectors are selected in descending order and the corresponding news ids are determined. In this way, the multi-dimensional vector of the user features is generated, the corresponding news ids are determined from the magnitude of the cosine similarity, and the news the user is interested in is obtained; recalling that news lets the user receive news of higher interest.
Fig. 6 is a schematic flowchart of another news recall method disclosed in an embodiment of the present invention, which specifically includes the following steps:
Step S601: obtain the user's clicked-news id sequence as user features, and obtain the news id clicked by the user at the next moment and the news id not clicked as labels.
Step S601 is executed in the same way, and on the same principle, as step S101 shown in Fig. 1; refer to that description, which is not repeated here.
Step S602: train on the user features and the labels based on the preset prediction model, and convert the user features to generate the multi-dimensional vector corresponding to the user features.
Step S602 is executed in the same way, and on the same principle, as step S102 shown in Fig. 1; refer to that description, which is not repeated here.
Optionally, the multi-dimensional vector corresponding to the user features in step S602 may also be generated in the manner of steps S402 and S403 disclosed in Fig. 4.
Step S603: compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, and between the multi-dimensional vector and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector.
Optionally, the cosine similarities corresponding to the multi-dimensional vector in step S603 may also be obtained in the manner of steps S503 and S504 disclosed in Fig. 5.
Step S604: evaluate the magnitude of the cosine similarities corresponding to the multi-dimensional vector; when a cosine similarity value is greater than a preset threshold, determine the N multi-dimensional vectors whose values exceed the preset threshold, select the N multi-dimensional vectors in descending order, and determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
It should be noted that, when the magnitude of the cosine similarities is evaluated, several multi-dimensional vectors may share the same maximal cosine similarity value, or several may exceed the preset threshold. The preset threshold is set by the technician according to the actual situation so as to determine the best number of multi-dimensional vectors to select.
It should be noted that the specific value of N may be chosen according to the actual situation.
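The threshold-based variant of step S604 can be sketched as follows. The threshold values, the scores, and the helper name are assumptions; the source leaves the threshold to the technician. Ties are kept in their original order because Python's sort is stable.

```python
from typing import Dict, List


def select_above_threshold(scores: Dict[str, float],
                           threshold: float,
                           n: int) -> List[str]:
    """Keep news ids whose similarity exceeds the threshold, then take up to
    n of them in descending order of similarity."""
    candidates = [(news_id, s) for news_id, s in scores.items() if s > threshold]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [news_id for news_id, _ in candidates[:n]]


scores = {"id31": 0.92, "id32": 0.15, "id33": 0.78, "id34": 0.40}  # made-up values
print(select_above_threshold(scores, threshold=0.3, n=2))
print(select_above_threshold(scores, threshold=0.8, n=2))
```

With a high threshold fewer than N candidates may remain, so a caller would need to decide whether to lower the threshold or accept a shorter list.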
In the news recall method disclosed in this embodiment of the present invention, the user's clicked-news id sequence is obtained as user features, and the news id clicked by the user at the next moment and the news id not clicked are obtained as labels. Training on the user features and the labels is performed based on the preset prediction model, and the user features are converted to generate the multi-dimensional vector corresponding to the user features. Cosine similarity is computed between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector. The magnitude of the cosine similarities is evaluated; when a cosine similarity value is greater than the preset threshold, the N multi-dimensional vectors exceeding the preset threshold are determined, the N multi-dimensional vectors are selected in descending order, and the corresponding news ids are determined. In this way, the multi-dimensional vector of the user features is generated, the corresponding news ids are determined from the magnitude of the cosine similarity, and the news the user is interested in is obtained; recalling that news lets the user receive news of higher interest.
The specific implementation of the above news recall method is illustrated by the following example:
For example, the current user's clicked-news sequence is id0-id30, and the labels for the news id the user clicks next and the news id not clicked are id31 and id32 respectively. In the embedding layer of the preset prediction model, training is performed on the clicked-news sequence id0-id30 together with labels id31 and id32 based on the LSTM model, and the clicked-news sequence id0-id19 is converted to generate a 500-dimensional vector. In the output layer of the preset prediction model, the multi-dimensional vector is projected based on the MLP to generate the multi-dimensional vector corresponding to the user features. Cosine similarity is computed between the 500-dimensional vector and id31 and id32 to obtain the cosine similarities of the 500-dimensional vector; the 500-dimensional vectors are selected in descending order of cosine similarity, and the news ids corresponding to the selected vectors are determined, yielding the news the user is interested in. The CTR (clicks/impressions) of news recalled by the news recall method provided in this embodiment of the present invention is higher than that of news recalled by conventional content-profile methods, so that retention rate and per-capita reading time are markedly improved.
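The CTR metric mentioned above is the ratio of clicks to impressions. The sketch below only illustrates the arithmetic; the click and impression counts are made-up numbers and are not results from the source.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")
    return clicks / impressions


# Hypothetical counts for two recall strategies over the same traffic.
ctr_content_profile = click_through_rate(clicks=30, impressions=1000)
ctr_vector_recall = click_through_rate(clicks=45, impressions=1000)
print(ctr_content_profile, ctr_vector_recall)
```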
Corresponding to the news recall method disclosed in the above embodiments of the present invention, an embodiment of the present invention also discloses a news recall system. As shown in Fig. 7, the news recall system 700 mainly includes:
Acquiring unit 701, configured to obtain the user's clicked-news id sequence as user features, and to obtain the news id clicked by the user at the next moment and the news id not clicked as labels, where the next moment is determined from the moment of the most recent clicked news id in the user's clicked-news id sequence.
Training and conversion unit 702, configured to train on the user features and the labels based on the preset prediction model, and to convert the user features to generate the multi-dimensional vector corresponding to the user features.
Computing unit 703, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector.
Determination unit 704, configured to select N multi-dimensional vectors in descending order based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector, and to determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Further, as shown in Fig. 8, the training and conversion unit 702 includes:
Construction module 801, configured to construct an original neural network model.
Obtaining module 802, configured to obtain, for a preset number of training users, the clicked-news id sequence and the unclicked-news id sequence of each training user.
Input module 803, configured to input the clicked-news id sequence and the unclicked-news id sequence of each training user into the original neural network model in turn, to obtain the initial training result corresponding to each training user.
Update module 804, configured to update the parameters of the original neural network according to the initial training results to obtain the prediction model.
Conversion module 805, configured to convert the user features into a multi-dimensional vector based on the LSTM model in the embedding layer of the preset prediction model.
Projection module 806, configured to project the multi-dimensional vector based on the MLP in the output layer of the preset prediction model, to generate the multi-dimensional vector corresponding to the user features.
Further, as shown in Fig. 9, the computing unit 703 includes:
First computing module 901, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multi-dimensional vector.
Second computing module 902, configured to compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multi-dimensional vector.
Further, the determination unit 704 includes a sorting module 1001 or a judgment module 1002.
The sorting module 1001 is configured to sort by the magnitude of the cosine similarities of the multi-dimensional vector, select N multi-dimensional vectors in descending order, and determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
The judgment module 1002 is configured to evaluate the magnitude of the cosine similarities corresponding to the multi-dimensional vector; when a cosine similarity value is greater than the preset threshold, it determines the N multi-dimensional vectors exceeding the preset threshold, selects the N multi-dimensional vectors in descending order, and determines the corresponding news ids, where N is a positive integer greater than or equal to 2.
The specific principle and execution process of each unit and module in the news recall system disclosed in the above embodiment of the present invention are the same as those of the news recall method disclosed in the above embodiments; refer to the corresponding parts of that method, which are not repeated here.
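How the four units of the system could be wired together is sketched below. The class and method names are hypothetical, the encoder is a fixed stand-in for the trained LSTM and MLP model, and the candidate news vectors are made up; none of these details come from the source.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class NewsRecallSystem:
    """Sketch of units 701-704 as methods of one class."""

    def __init__(self, encoder: Callable[[List[str]], Vector],
                 candidate_vectors: Dict[str, Vector]):
        self.encoder = encoder                      # stands in for the trained model
        self.candidate_vectors = candidate_vectors  # news id -> vector

    def acquire(self, click_log: List[str]) -> List[str]:
        """Acquiring unit 701: clicked-news id sequence as user features."""
        return list(click_log)

    def convert(self, features: List[str]) -> Vector:
        """Training and conversion unit 702: features -> user vector."""
        return self.encoder(features)

    def compute(self, user_vector: Vector) -> Dict[str, float]:
        """Computing unit 703: cosine similarity against each candidate."""
        return {nid: cosine(user_vector, vec)
                for nid, vec in self.candidate_vectors.items()}

    def determine(self, scores: Dict[str, float], n: int) -> List[str]:
        """Determination unit 704: top-n news ids in descending order."""
        return sorted(scores, key=scores.get, reverse=True)[:n]

    def recall(self, click_log: List[str], n: int = 2) -> List[str]:
        features = self.acquire(click_log)
        user_vector = self.convert(features)
        return self.determine(self.compute(user_vector), n)


system = NewsRecallSystem(
    encoder=lambda features: [1.0, 0.0],  # fixed stand-in encoder
    candidate_vectors={"b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]},
)
recalled_ids = system.recall(["a"], n=2)
print(recalled_ids)
```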
Based on the news recall system disclosed in the above embodiment of the present invention, each of the above units and modules may be implemented by a hardware device consisting of a processor and a memory. Specifically, the units and modules are stored in the memory as program units, and the processor executes the program units stored in the memory to implement news recall.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and news recall is implemented by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Further, an embodiment of the present invention provides a processor for running a program, where the program, when run, executes the news recall method.
The device disclosed in the embodiment of the present invention may be a server, a PC, a PAD, a mobile phone, or the like.
Further, an embodiment of the present invention also provides a storage medium on which a program is stored; when executed by a processor, the program implements the news recall method.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (9)
1. A news recall method, characterized by comprising:
obtaining a user's clicked-news id sequence as user features, and obtaining the news id clicked by the user at the next moment and the news id not clicked as labels, where the next moment is determined from the moment of the most recent clicked news id in the user's clicked-news id sequence;
training on the user features and the labels based on a preset prediction model, and converting the user features to generate a multi-dimensional vector corresponding to the user features;
computing the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector, selecting N multi-dimensional vectors in descending order and determining the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
2. The method according to claim 1, characterized in that the process of setting up the preset prediction model comprises:
constructing an original neural network model;
obtaining, for a preset number of training users, the news ids clicked and the news ids not clicked by each training user;
inputting the news ids clicked and the news ids not clicked by each training user into the original neural network model in turn, to obtain the initial training result corresponding to each training user;
updating the parameters of the original neural network according to the initial training results to obtain the prediction model.
3. The method according to claim 1, characterized in that training on the user features and the labels based on the preset prediction model, and converting the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
in the embedding layer of the preset prediction model, converting the user features into a multi-dimensional vector based on an LSTM (long short-term memory) model;
in the output layer of the preset prediction model, projecting the multi-dimensional vector based on an MLP (multi-layer perceptron) to generate the multi-dimensional vector corresponding to the user features.
4. The method according to claim 1, characterized in that computing the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
computing the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multi-dimensional vector;
computing the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multi-dimensional vector.
5. The method according to claim 1, characterized in that selecting N multi-dimensional vectors in descending order based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector, and determining the news ids corresponding to the N multi-dimensional vectors, comprises:
sorting by the magnitude of the cosine similarities of the multi-dimensional vector, selecting N multi-dimensional vectors in descending order, and determining the news ids corresponding to the N multi-dimensional vectors;
or
evaluating the magnitude of the cosine similarities corresponding to the multi-dimensional vector and, when a cosine similarity value is greater than a preset threshold, determining the N multi-dimensional vectors exceeding the preset threshold, selecting the N multi-dimensional vectors in descending order, and determining the news ids corresponding to the N multi-dimensional vectors.
6. A news recall system, characterized by comprising:
an acquiring unit, configured to obtain a user's clicked-news id sequence as user features, and to obtain the news id clicked by the user at the next moment and the news id not clicked as labels, where the next moment is determined from the moment of the most recent clicked news id in the user's clicked-news id sequence;
a training and conversion unit, configured to train on the user features and the labels based on a preset prediction model, and to convert the user features to generate a multi-dimensional vector corresponding to the user features;
a computing unit, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
a determination unit, configured to select N multi-dimensional vectors in descending order based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector, and to determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
7. The system according to claim 6, characterized in that the training and conversion unit, which trains on the user features and the labels based on the preset prediction model and converts the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
a conversion module, configured to convert the user features into a multi-dimensional vector based on an LSTM (long short-term memory) model in the embedding layer of the preset prediction model;
a projection module, configured to project the multi-dimensional vector based on an MLP (multi-layer perceptron) in the output layer of the preset prediction model, to generate the multi-dimensional vector corresponding to the user features.
8. The system according to claim 6, characterized in that the computing unit, which computes the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment and the news id not clicked to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
a first computing module, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multi-dimensional vector;
a second computing module, configured to compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multi-dimensional vector.
9. The system according to claim 8, characterized in that the determination unit, which selects N multi-dimensional vectors in descending order based on the magnitude of the cosine similarities corresponding to the multi-dimensional vector and determines the news ids corresponding to the N multi-dimensional vectors, comprises a sorting module or a judgment module;
the sorting module is configured to sort by the magnitude of the cosine similarities of the multi-dimensional vector, select N multi-dimensional vectors in descending order, and determine the news ids corresponding to the N multi-dimensional vectors;
the judgment module is configured to evaluate the magnitude of the cosine similarities corresponding to the multi-dimensional vector and, when a cosine similarity value is greater than a preset threshold, determine the N multi-dimensional vectors exceeding the preset threshold, select the N multi-dimensional vectors in descending order, and determine the news ids corresponding to the N multi-dimensional vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910132210.9A CN109871487B (en) | 2019-02-22 | 2019-02-22 | News recall method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109871487A true CN109871487A (en) | 2019-06-11 |
CN109871487B CN109871487B (en) | 2021-03-23 |
Family
ID=66919134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910132210.9A Active CN109871487B (en) | 2019-02-22 | 2019-02-22 | News recall method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109871487B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102346899A (en) * | 2011-10-08 | 2012-02-08 | 亿赞普(北京)科技有限公司 | Method and device for predicting advertisement click rate based on user behaviors |
CN102831234A (en) * | 2012-08-31 | 2012-12-19 | 北京邮电大学 | Personalized news recommendation device and method based on news content and theme feature |
WO2014143024A1 (en) * | 2013-03-15 | 2014-09-18 | Yahoo! Inc. | Almost online large scale collaborative filtering based recommendation system |
CN106599226A (en) * | 2016-12-19 | 2017-04-26 | 深圳大学 | Content recommendation method and content recommendation system |
US20180357321A1 (en) * | 2017-06-08 | 2018-12-13 | Ebay Inc. | Sequentialized behavior based user guidance |
CN109104620A (en) * | 2018-07-26 | 2018-12-28 | 腾讯科技(深圳)有限公司 | A kind of short video recommendation method, device and readable medium |
Non-Patent Citations (1)
Title |
---|
GUORUI ZHOU等: "Deep Interest Network for Click-Through Rate Prediction", 《HTTPS://ARXIV.ORG/ABS/1706.06978》 * |
Also Published As
Publication number | Publication date |
---|---|
CN109871487B (en) | 2021-03-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||