CN109871487A - News recall method and system - Google Patents



Publication number
CN109871487A
Authority
CN
China
Prior art keywords
vector
news
user
cosine similarity
user characteristics
Legal status
Granted
Application number
CN201910132210.9A
Other languages
Chinese (zh)
Other versions
CN109871487B
Inventor
安鸣佳
Current Assignee
Beijing Sohu New Media Information Technology Co Ltd
Original Assignee
Beijing Sohu New Media Information Technology Co Ltd
Application filed by Beijing Sohu New Media Information Technology Co Ltd
Priority to CN201910132210.9A
Publication of CN109871487A
Application granted
Publication of CN109871487B
Legal status: Active


Abstract

The invention discloses a news recall method and system. A user's click news ID sequence is obtained as the user features, and the news ID clicked by the user at the next moment and a news ID not clicked are obtained as labels. The user features and labels are trained with a preset prediction model to generate a multi-dimensional vector corresponding to the user features. Cosine similarities are computed between the multi-dimensional vector and both the news ID clicked at the next moment and the news ID not clicked, giving the cosine similarities corresponding to the multi-dimensional vector. N multi-dimensional vectors are then selected in descending order of their cosine similarity, and the news IDs corresponding to these N vectors are determined. In this way, the method generates a multi-dimensional vector from the user features and, based on the magnitude of the corresponding cosine similarities, determines the news IDs corresponding to the vectors, thereby obtaining news the user is interested in; recalling such news lets the user receive news that better matches their interests.

Description

News recall method and system
Technical field
The present invention relates to the field of deep learning technology, and more specifically to a news recall method and system.
Background art
With the rapid development of information technology and the Internet, Internet news has become increasingly popular and is now one of the main ways people obtain information in daily life. News recall is currently one of the means by which people obtain information.
News recall is an important stage in the field of news recommendation. Traditional news recall uses keywords from the titles and content of news the user clicked in the past and retrieves news related to those keywords.
In the prior art, because there are many news records and the total number of news features is large, some of the news provided by traditional news recall does not match the user's interests.
Summary of the invention
In view of this, the present application provides a news recall method and system so that the recalled news matches the user's interests.
To achieve the above objective, the following solutions are proposed:
A first aspect of the present invention discloses a news recall method, comprising:
Obtaining a user's click news ID sequence as user features, and obtaining the news ID clicked by the user at the next moment and a news ID not clicked as labels, wherein the next moment is determined from the latest moment of the corresponding clicked news ID in the user's click news ID sequence;
Training the user features and the labels with a preset prediction model, and transforming the user features to generate a multi-dimensional vector corresponding to the user features;
Performing cosine similarity calculation between the multi-dimensional vector and both the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
Selecting N multi-dimensional vectors in descending order of their corresponding cosine similarities, and determining the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Preferably, the process of setting up the preset prediction model comprises:
Constructing an original neural network model;
Obtaining, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each corresponding user;
Inputting the clicked and non-clicked news IDs of each training user in sequence into the original neural network model to obtain an initial training result for each training user;
Updating the parameters of the original neural network according to the initial training results to obtain the prediction model.
Preferably, training the user features and the labels with the preset prediction model and transforming the user features to generate the multi-dimensional vector corresponding to the user features comprises:
In the embedding layer of the preset prediction model, converting the user features into a multi-dimensional vector based on an LSTM (long short-term memory) model;
In the output layer of the preset prediction model, projecting the multi-dimensional vector with an MLP (multi-layer perceptron) to generate the multi-dimensional vector corresponding to the user features.
Preferably, performing cosine similarity calculation between the multi-dimensional vector and both the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
Performing cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment to obtain a first cosine similarity corresponding to the multi-dimensional vector;
Performing cosine similarity calculation between the multi-dimensional vector and the news ID not clicked at the next moment to obtain a second cosine similarity corresponding to the multi-dimensional vector.
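For reference, the first and second cosine similarities can be written with the standard cosine formula below. The symbols are assumptions for illustration: u is the user's multi-dimensional vector, and e+ and e- are vectors representing the clicked and not-clicked news IDs. The text does not say how a news ID becomes a vector; an embedding with the same dimension as u is assumed.

```latex
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert},
\qquad
s_1 = \cos(\mathbf{u}, \mathbf{e}_{+}),
\qquad
s_2 = \cos(\mathbf{u}, \mathbf{e}_{-})
```

Both values lie in [-1, 1]; a larger value means the user vector points more nearly in the same direction as the news vector.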
Preferably, selecting N multi-dimensional vectors in descending order of their corresponding cosine similarities and determining the news IDs corresponding to the N multi-dimensional vectors comprises:
Sorting the multi-dimensional vectors by cosine similarity, selecting the top N in descending order, and determining the news IDs corresponding to the N multi-dimensional vectors;
Or
Comparing the cosine similarity of each multi-dimensional vector with a preset threshold, determining the multi-dimensional vectors whose cosine similarity is greater than the preset threshold, selecting N of them in descending order, and determining the news IDs corresponding to the N multi-dimensional vectors.
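The two alternatives above (sorting, or thresholding and then sorting) can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Candidates arrive as a hypothetical dict that maps each news ID to its cosine similarity, and the scores in the usage lines are made up. In the threshold variant, fewer than N IDs are returned if not enough candidates clear the threshold, because the text does not say what should happen in that case.

```python
from typing import Dict, List, Optional


def select_top_n(scores: Dict[str, float], n: int,
                 threshold: Optional[float] = None) -> List[str]:
    """Return up to n news IDs ordered by descending cosine similarity.

    scores: hypothetical mapping from news ID to cosine similarity.
    threshold: if given, keep only candidates strictly above it
    (the "judgment" variant); otherwise rank all candidates
    (the "sorting" variant).
    """
    if n < 2:
        raise ValueError("N must be a positive integer >= 2")
    candidates = list(scores.items())
    if threshold is not None:
        candidates = [(nid, s) for nid, s in candidates if s > threshold]
    # Sort by similarity, largest first, then keep the first n IDs.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [nid for nid, _ in ranked[:n]]


scores = {"id16": 0.82, "id17": 0.15, "id30": 0.64, "id31": 0.40}
print(select_top_n(scores, 2))                 # ['id16', 'id30']
print(select_top_n(scores, 3, threshold=0.5))  # ['id16', 'id30']
```

The sorting variant always returns min(N, number of candidates) IDs. The threshold variant trades recall size for a minimum similarity guarantee.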
A second aspect of the present invention discloses a news recall system, comprising:
An acquisition unit, configured to obtain a user's click news ID sequence as user features and to obtain the news ID clicked by the user at the next moment and a news ID not clicked as labels, wherein the next moment is determined from the latest moment of the corresponding clicked news ID in the user's click news ID sequence;
A training and conversion unit, configured to train the user features and the labels with a preset prediction model and to transform the user features to generate a multi-dimensional vector corresponding to the user features;
A computing unit, configured to perform cosine similarity calculation between the multi-dimensional vector and both the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
A determination unit, configured to select N multi-dimensional vectors in descending order of their corresponding cosine similarities and to determine the news IDs corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
Preferably, the training and conversion unit, which trains the user features and the labels with the preset prediction model and transforms the user features to generate the corresponding multi-dimensional vector, comprises:
A conversion module, configured to convert the user features into a multi-dimensional vector based on an LSTM model in the embedding layer of the preset prediction model;
A projection module, configured to project the multi-dimensional vector with an MLP in the output layer of the preset prediction model to generate the multi-dimensional vector corresponding to the user features.
Preferably, the computing unit, which performs cosine similarity calculation between the multi-dimensional vector and both the news ID clicked by the user at the next moment and the news ID not clicked, comprises:
A first computing module, configured to perform cosine similarity calculation between the multi-dimensional vector and the news ID clicked by the user at the next moment to obtain a first cosine similarity corresponding to the multi-dimensional vector;
A second computing module, configured to perform cosine similarity calculation between the multi-dimensional vector and the news ID not clicked at the next moment to obtain a second cosine similarity corresponding to the multi-dimensional vector.
Preferably, the determination unit, which selects N multi-dimensional vectors in descending order of their corresponding cosine similarities and determines the corresponding news IDs, comprises a sorting module or a judgment module;
The sorting module is configured to sort the multi-dimensional vectors by cosine similarity, select the top N in descending order, and determine the news IDs corresponding to the N multi-dimensional vectors;
The judgment module is configured to compare the cosine similarity of each multi-dimensional vector with a preset threshold, determine the multi-dimensional vectors whose cosine similarity is greater than the preset threshold, select N of them in descending order, and determine the news IDs corresponding to the N multi-dimensional vectors.
As can be seen from the above technical solutions, the invention discloses a news recall method and system. A user's click news ID sequence is obtained as user features, and the news ID clicked by the user at the next moment and a news ID not clicked are obtained as labels. The user features and labels are trained with a preset prediction model, and the user features are transformed into a corresponding multi-dimensional vector. Cosine similarities are computed between the multi-dimensional vector and both the clicked and the non-clicked news IDs, N multi-dimensional vectors are selected in descending order of cosine similarity, and the news IDs corresponding to these N vectors are determined. In this way the method obtains news the user is interested in, and recalling such news lets the user receive news that better matches their interests.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention;
Fig. 2 is a flow diagram of the prediction model setup process disclosed in an embodiment of the present invention;
Fig. 3 is a flow diagram of updating the original neural network parameters disclosed in an embodiment of the present invention;
Fig. 4 is a flow diagram of another news recall method disclosed in an embodiment of the present invention;
Fig. 5 is a flow diagram of yet another news recall method disclosed in an embodiment of the present invention;
Fig. 6 is a flow diagram of a further news recall method disclosed in an embodiment of the present invention;
Fig. 7 is a structural diagram of a news recall system disclosed in an embodiment of the present invention;
Fig. 8 is a structural diagram of the training and conversion unit of the news recall system disclosed in an embodiment of the present invention;
Fig. 9 is a structural diagram of the computing unit of the news recall system disclosed in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In this application, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
As noted in the background art, prior-art news recall retrieves news related to keywords taken from the titles and content of news the user clicked in the past, and some of the news recalled this way does not match the user's interests. The present invention therefore discloses a news recall method that generates a multi-dimensional vector from the user features and, based on the magnitude of the corresponding cosine similarities, determines the news IDs corresponding to the multi-dimensional vectors. This yields news the user is interested in, and recalling such news lets the user receive news that better matches their interests.
Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention. The method specifically includes the following steps:
Step S101: obtain the user's click news ID sequence as user features, and obtain the news ID clicked by the user at the next moment and a news ID not clicked as labels.
When executing step S101, because user behavior is strongly affected by timeliness, the news IDs most recently clicked by the user are chosen as the click sequence used as user features. The news ID clicked by the user at the next moment after the click sequence, together with a news ID not clicked, is used as the labels.
It should be noted that the user's click news ID sequence is determined from the titles of the news content the user clicked. The sequence may contain multiple clicked news IDs, and their number is configured by technical personnel according to actual conditions.
It should be noted that the news IDs clicked by the user are taken from the set of news IDs the user has clicked. There may be multiple such IDs, chosen by technical personnel according to actual conditions.
It should be noted that the next moment is determined from the latest moment of the corresponding clicked news ID in the user's click news ID sequence.
The process of obtaining the news ID clicked by the user at the next moment and the news ID not clicked is illustrated by the following example:
For example, the news id15 clicked by the user at 15:02 is used as the user feature. At 15:03, the news id16 clicked by the user and the news id17 not clicked are used as labels.
It should be noted that the news ID not clicked by the user is taken from the set of news IDs the user did not click. It can be any one of the IDs in that set, chosen by technical personnel according to actual conditions.
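The sample construction in step S101 can be sketched in Python as follows. This is an illustration with assumptions, not the patent's implementation. The click-log format, the hypothetical `build_sample` helper, and uniform random sampling of the not-clicked ID are all choices made here; the text only says that the not-clicked ID may be any member of the not-clicked set.

```python
import random
from typing import List, Sequence, Tuple


def build_sample(click_log: Sequence[Tuple[str, str]],
                 all_news_ids: Sequence[str],
                 seq_len: int,
                 rng: random.Random) -> Tuple[List[str], str, str]:
    """Build one (user features, clicked label, not-clicked label) sample.

    click_log holds (timestamp, news_id) pairs. Timestamps are zero-padded
    "HH:MM" strings here, so sorting them as text gives chronological order.
    """
    ordered = sorted(click_log)
    if len(ordered) < seq_len + 1:
        raise ValueError("need at least seq_len + 1 clicks")
    # The seq_len clicks just before the latest one form the user features.
    features = [nid for _, nid in ordered[-(seq_len + 1):-1]]
    # The latest click is the news ID clicked at the "next moment".
    positive = ordered[-1][1]
    # The negative label is drawn uniformly from news the user never clicked.
    clicked = {nid for _, nid in ordered}
    unclicked = [nid for nid in all_news_ids if nid not in clicked]
    if not unclicked:
        raise ValueError("no unclicked news to sample a negative from")
    negative = rng.choice(unclicked)
    return features, positive, negative


# The example from the text: id15 at 15:02 is the feature;
# id16 (clicked at 15:03) and id17 (not clicked) are the labels.
log = [("15:02", "id15"), ("15:03", "id16")]
print(build_sample(log, ["id15", "id16", "id17"], seq_len=1, rng=random.Random(0)))
# (['id15'], 'id16', 'id17')
```

Passing an explicit `random.Random` keeps negative sampling reproducible across training runs.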
Step S102: train the user features and the labels with the preset prediction model, and transform the user features to generate the multi-dimensional vector corresponding to the user features.
When executing step S102, the user features and the labels are trained through the preset prediction model, and the user features are converted into the multi-dimensional vector corresponding to the user features. The prediction model is obtained by training the original neural network model on sample data.
Fig. 2 shows a flow diagram of the setup process for the preset prediction model involved in step S102 of Fig. 1. The process specifically includes the following steps:
Step S201: construct an original neural network model.
Step S202: obtain, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each user.
It should be noted that the training users may be users of different age groups, users of different genders, users with different interests, and so on. The specific training users are selected by technical personnel according to actual conditions.
Step S203: input the news IDs clicked and not clicked by each training user in sequence into the original neural network model to obtain an initial training result for each training user.
When implementing step S203, the news IDs clicked and not clicked by each training user are input in sequence into the original neural network model, and the LSTM model is selected as the original neural network model used for training.
It should be noted that the long short-term memory (LSTM) model is a deep neural network for time series. It is suited to processing and predicting important events that are separated by relatively long intervals and delays in a time series.
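The patent does not spell out the LSTM cell. For reference, the textbook formulation is given below. Here x_t is the embedded news ID at step t, h_t and c_t are the hidden and cell states, σ is the logistic sigmoid, ⊙ is element-wise multiplication, and [x_t; h_{t-1}] denotes concatenation. Whether the patented model uses exactly this variant is not stated.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i [x_t; h_{t-1}] + b_i\right) \\
f_t &= \sigma\left(W_f [x_t; h_{t-1}] + b_f\right) \\
o_t &= \sigma\left(W_o [x_t; h_{t-1}] + b_o\right) \\
g_t &= \tanh\left(W_g [x_t; h_{t-1}] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive cell-state update in c_t is what allows information from early clicks to survive long intervals.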
Step S204: update the parameters of the original neural network according to the initial training results to obtain the prediction model.
By performing steps S201 to S204, the parameters of the original neural network are updated with the back-propagation algorithm according to the initial training results, yielding the prediction model.
It should be noted that the back-propagation algorithm (BPA) is commonly used to train multi-layer perceptrons. It repeatedly alternates two phases, propagation of activations and weight updates, until the network's response to the input data reaches a predetermined target range.
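The weight-update phase of that cycle can be sketched as plain gradient descent. The loss L, the learning rate η, and the choice of plain gradient descent as the optimizer are assumptions, because the patent names none of them. Back-propagation supplies the gradient of L for every parameter θ, and each parameter is then moved against its gradient:

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial \mathcal{L}}{\partial \theta}
```

This update is repeated over training samples until the loss, or the network's response, reaches the target range.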
In a concrete implementation, the constructed prediction model contains multiple processing layers, including at least an embedding layer, a bidirectional LSTM layer, and a predict layer.
It should be noted that the embedding layer, following a matrix factorization approach, uses the LSTM model to convert the user features into a multi-dimensional vector with symbolic meaning. The bidirectional LSTM layer trains the model as an LSTM followed by an MLP, and computes cosine similarities between the MLP output and the news IDs clicked and not clicked by the user. The predict layer takes the user features, the clicked news IDs, and the non-clicked news IDs as input, and uses the prediction model to compute the cosine similarities of the clicked and the non-clicked news IDs.
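A minimal pure-Python sketch of the embedding and LSTM stages follows. It shows how a click sequence becomes a single vector. Several assumptions apply: a single unidirectional LSTM layer stands in for the bidirectional layer named above, the dimensions and the `TinyLSTMEncoder` class are hypothetical, and the weights are random and untrained. No deep learning library is used, so the data flow stays visible.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]
GATES = "ifog"  # input, forget, output, candidate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTMEncoder:
    """Embedding lookup followed by one unidirectional LSTM layer.

    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, vocab: List[str], emb_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Embedding layer: one dense vector per news ID.
        self.embedding: Dict[str, Vector] = {
            nid: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for nid in vocab
        }
        # One weight matrix and bias per gate, applied to [x_t ; h_{t-1}].
        self.weights = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim + hidden_dim)]
                for _ in range(hidden_dim)]
            for g in GATES
        }
        self.bias = {g: [0.0] * hidden_dim for g in GATES}

    def encode(self, news_ids: List[str]) -> Vector:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for nid in news_ids:  # clicks in chronological order
            z = self.embedding[nid] + h  # list concatenation: [x_t ; h_{t-1}]
            pre = {g: [a + b for a, b in zip(matvec(self.weights[g], z), self.bias[g])]
                   for g in GATES}
            i_t = [sigmoid(v) for v in pre["i"]]
            f_t = [sigmoid(v) for v in pre["f"]]
            o_t = [sigmoid(v) for v in pre["o"]]
            g_t = [math.tanh(v) for v in pre["g"]]
            c = [f * cp + i * gg for f, cp, i, gg in zip(f_t, c, i_t, g_t)]
            h = [o * math.tanh(cc) for o, cc in zip(o_t, c)]
        return h  # final hidden state summarises the click sequence


vocab = [f"id{k}" for k in range(22)]  # hypothetical IDs id0..id21
encoder = TinyLSTMEncoder(vocab, emb_dim=8, hidden_dim=16)
state = encoder.encode([f"id{k}" for k in range(20)])  # clicks id0..id19
print(len(state))  # 16
```

A real system would use a framework LSTM with bidirectional support and learned weights. The loop above only makes the gate arithmetic explicit.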
It should be noted that the embedding layer and the bidirectional LSTM layer are used in the training stage of the prediction model, and the predict layer is used in its prediction stage.
It should further be noted that the results obtained with the prediction model can subsequently also be used as parameters for adjusting the model's prediction accuracy.
Fig. 3 shows a flow diagram of the process, in step S204 of Fig. 2, of updating the original neural network parameters according to the initial training results to obtain the prediction model. The process specifically includes the following steps:
Step S301: obtain the training user's click news ID sequence.
It should be noted that the training user's click news ID sequence contains multiple clicked news IDs. How the sequence is determined is configured by technical personnel according to actual conditions.
Step S302: vectorize the training user's click news ID sequence through the embedding layer.
Step S303: convert the vectorized click news ID sequence into a multi-dimensional vector with the LSTM model.
Step S304: project the multi-dimensional vector with the MLP to obtain the multi-dimensional vector corresponding to the user features.
Step S305: perform cosine similarity calculation between the multi-dimensional vector corresponding to the user features and, respectively, the news ID clicked by the user and the news ID not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, and update the prediction model parameters with the back-propagation algorithm.
By performing steps S301 to S305, the training user's click news ID sequence is obtained and projected into the multi-dimensional vector corresponding to the user features. Cosine similarities between that vector and the clicked and non-clicked news IDs are then computed, and the prediction model parameters are updated with the back-propagation algorithm.
The process of updating the prediction model parameters is illustrated by the following example:
Suppose the current user's click news ID sequence is id0-id19, and the news id20 clicked by the user at the next moment and the news id21 not clicked are used as labels. The click sequence id0-id19 is vectorized by the embedding layer and converted into a 300-dimensional vector by the LSTM model. This 300-dimensional vector is projected by the MLP to obtain the 300-dimensional vector corresponding to the user features. Cosine similarities are then computed between this vector and id20 and id21 respectively, giving a first cosine similarity value for the clicked news and a second cosine similarity value for the unclicked news, and the prediction model parameters are updated with the back-propagation algorithm.
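The text does not name the loss that back-propagation minimises in this example. One common choice for a clicked/not-clicked pair is the negative log-softmax of the clicked item's cosine similarity over the pair, as in DSSM-style retrieval models. That choice is assumed here purely for illustration. The sketch uses toy 300-dimensional vectors for the user vector and for hypothetical embeddings of id20 and id21; `gamma` is an assumed scaling factor.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_loss(user: Vector, clicked: Vector, unclicked: Vector,
                  gamma: float = 10.0) -> Tuple[float, float, float]:
    """Return (first cosine, second cosine, loss).

    The loss is the negative log-softmax of the clicked item over the pair
    {clicked, unclicked}; gamma is a hypothetical scaling factor.
    """
    s1 = cosine(user, clicked)    # first cosine similarity (clicked, id20)
    s2 = cosine(user, unclicked)  # second cosine similarity (not clicked, id21)
    # log(exp(g*s1) + exp(g*s2)) - g*s1, computed stably with the max trick
    m = max(gamma * s1, gamma * s2)
    log_z = m + math.log(math.exp(gamma * s1 - m) + math.exp(gamma * s2 - m))
    return s1, s2, log_z - gamma * s1


# Toy 300-d vectors standing in for the user vector and the id20 / id21 embeddings.
user = [1.0] * 300
id20 = [1.0] * 300                 # points the same way as the user vector
id21 = [1.0] * 150 + [-1.0] * 150  # orthogonal to the user vector
s1, s2, loss = pairwise_loss(user, id20, id21)
```

With the user vector aligned to id20 and orthogonal to id21, the first cosine similarity is 1, the second is 0, and the loss is close to zero. Swapping the labels drives the loss up to roughly `gamma`, which is the signal back-propagation uses to move the parameters.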
Step S103: the news id for clicking news id according to the multi-C vector and the subsequent time user, not clicking on Cosine similarity calculating is carried out, the cosine similarity for corresponding to the multi-C vector is obtained.
During executing step S103, news is clicked with the subsequent time user respectively according to the multi-C vector Id, the news id not clicked on carry out cosine similarity calculating, obtain corresponding first cosine similarity of the multi-C vector With the second cosine similarity.
It should be noted that according to the big of corresponding first cosine similarity of the multi-C vector and the second cosine similarity It is small, the corresponding news id of multi-C vector is determined, so that the interested news of user is obtained, by recalling the interested news of user So that user obtains the higher news of its interest-degree.
Step S104: the size based on the corresponding cosine similarity of the multi-C vector, choose from large to small N number of multidimensional to Amount, determines the corresponding news id of N number of multi-C vector, wherein the value of N is the positive integer more than or equal to 2.
During executing step S104, based on the size of the corresponding cosine similarity of the multi-C vector, You great Zhi It is small be ranked up after choose N number of multi-C vector, determine the corresponding news id of N number of multi-C vector.
It should be noted that multi-C vector can be 300 dimensional vectors, it is also possible to 600 dimensional vectors etc., specifically by technology people Member is configured according to the actual situation.
It should be noted that the cosine similarity of multi-C vector is bigger, illustrate that the user the interested in current news.
It should be noted that the value of specific N is chosen according to the actual situation by technical staff.
The embodiment of the present invention recalls method by news disclosed above, clicks news id sequence conduct by obtaining user User characteristics, and obtain news id that subsequent time user clicks and the news id that does not click on as label, based on setting in advance The prediction model set is trained user characteristics and label, and converts to user characteristics, generates corresponding user characteristics Multi-C vector, the news id progress cosine similarity meter that news id is clicked according to multi-C vector and subsequent time user, is not clicked on It calculates, the cosine similarity for obtaining corresponding multi-C vector is selected from large to small based on the size of the corresponding cosine similarity of multi-C vector N number of multi-C vector is taken, determines the corresponding news id of N number of multi-C vector.By the above method, generate the multidimensional of user characteristics to Amount, based on the size of the corresponding cosine similarity of multi-C vector, determines the corresponding news id of multi-C vector, it is interested to obtain user News, by recalling the interested news of user to realize that user obtains the higher news of its interest-degree.
Optionally, it is evaluated and tested by user click data true on line, is recalled using lstm shot and long term memory models The click-through-rate (click through rate, ctr) of news be higher than the click-through-rate recalled of traditional news.
It should be noted that ctr click-through-rate is hits/impression.
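Written as a formula, with a purely hypothetical illustration of 30 clicks on 1,000 impressions (not data from the patent):

```latex
\mathrm{CTR} = \frac{\text{clicks}}{\text{impressions}},
\qquad \text{e.g.}\quad \frac{30}{1000} = 0.03 = 3\%
```

Comparing two recall strategies by CTR is only meaningful when both are measured over comparable traffic and time windows.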
In this embodiment of the present invention, on real online user click data, the CTR of news recalled with the LSTM model is higher than that of traditional news recall. As a result, the retention rate and per-capita reading time of news recalled by the LSTM model are improved.
Fig. 4 shows another news recall method disclosed in an embodiment of the present invention, based on the method of Fig. 1. It specifically includes the following steps:
Step S401: obtain the user's click news ID sequence as user features, and obtain the news ID clicked by the user at the next moment and a news ID not clicked as labels.
Step S401 is executed in the same way and on the same principle as step S101 shown in Fig. 1. Reference may be made to that step, and it is not repeated here.
Step S402: in the embedding layer of the preset prediction model, convert the user features into a multi-dimensional vector based on the LSTM model.
When executing step S402, in the embedding layer of the preset prediction model, the user features are converted into a multi-dimensional vector by the LSTM model following a matrix factorization approach.
It should be noted that applying the LSTM model to the user's click news ID sequence filters out news IDs unrelated to the user's interests, keeping better matches and discarding poorer ones, so that the news IDs the user is most interested in are selected.
Step S403: in the output layer of the preset prediction model, project the multi-dimensional vector with the MLP to generate the multi-dimensional vector corresponding to the user features.
It should be noted that a multi-layer perceptron (MLP) includes an input layer, hidden layers, and an output layer. The layers of an MLP are fully connected: every neuron in one layer is connected to all neurons in the next layer.
It should be noted that after the user features are input to the embedding layer of the prediction model, they are converted into a multi-dimensional vector. The hidden layer passes this multi-dimensional vector from the input layer to the output layer, where it is projected to generate the multi-dimensional vector corresponding to the user features.
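The fully connected projection described above can be sketched in pure Python as follows. Several details are assumptions for illustration: the layer widths (16 to 64 to 300), the tanh activation on the hidden layer, the linear output layer, and the helper names. The text specifies only input, hidden, and output layers with full connections between them.

```python
import math
import random
from typing import List

Vector = List[float]


class DenseLayer:
    """Fully connected layer: every input unit feeds every output unit."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random, activation: str):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim
        self.activation = activation

    def forward(self, x: Vector) -> Vector:
        if len(x) != len(self.weights[0]):
            raise ValueError("input size does not match layer width")
        out = [sum(w * v for w, v in zip(row, x)) + b
               for row, b in zip(self.weights, self.bias)]
        if self.activation == "tanh":
            return [math.tanh(v) for v in out]
        return out  # "linear": raw projection, no squashing


def build_mlp(sizes: List[int], seed: int = 0) -> List[DenseLayer]:
    """Hidden layers use tanh; the final (output) layer is a linear projection."""
    rng = random.Random(seed)
    last = len(sizes) - 2
    return [DenseLayer(sizes[k], sizes[k + 1], rng, "linear" if k == last else "tanh")
            for k in range(len(sizes) - 1)]


def mlp_project(layers: List[DenseLayer], x: Vector) -> Vector:
    for layer in layers:  # input layer -> hidden layer(s) -> output layer
        x = layer.forward(x)
    return x


mlp = build_mlp([16, 64, 300])  # hypothetical layer widths
user_vector = mlp_project(mlp, [0.05] * 16)
print(len(user_vector))  # 300
```

Keeping the output layer linear lets the projected vector take any direction, which suits a cosine-similarity comparison downstream.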
It should be noted that the user characteristics and the label are trained based on pre-set prediction model, It is trained in such a way that lstm shot and long term memory models connect mlp multi-layer perception (MLP).
Step S404: Compute the cosine similarity between the multi-dimensional vector and each of two news ids: the one clicked by the user at the next moment and the one not clicked. This yields the cosine similarities corresponding to the multi-dimensional vector.
Step S405: Based on the magnitude of the cosine similarities corresponding to the multi-dimensional vectors, select N multi-dimensional vectors in descending order of similarity and determine the news ids corresponding to these N vectors, where N is a positive integer greater than or equal to 2.
Steps S404 and S405 are executed in the same way, and on the same principle, as steps S103 and S104 shown in Fig. 1. Refer to the description above; it is not repeated here.
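Steps S404 and S405 reduce to a cosine-similarity ranking, cos(u, v) = (u · v) / (|u| |v|). The sketch below reads them as follows: rank candidate news vectors by their cosine similarity to the user vector, then keep the N largest. That reading, the function names `cosine` and `recall_top_n`, and the toy two-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def recall_top_n(user_vec, news_vecs, n):
    """Return the n news ids whose vectors are most cosine-similar to user_vec."""
    if n < 2:
        raise ValueError("N must be a positive integer greater than or equal to 2")
    scored = [(cosine(user_vec, vec), nid) for nid, vec in news_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # largest similarity first
    return [nid for _, nid in scored[:n]]


user_vec = [1.0, 0.0]
news_vecs = {"id31": [0.9, 0.1], "id32": [0.0, 1.0], "id33": [0.7, 0.7]}
print(recall_top_n(user_vec, news_vecs, n=2))  # ['id31', 'id33']
```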
The news recall method disclosed above proceeds as follows. It obtains the user's clicked news id sequence as user features, and the news id clicked by the user at the next moment and the news id not clicked as labels. In the embedding layer of the preset prediction model, the LSTM model converts the user features into a multi-dimensional vector. In the output layer, the MLP projects that vector to generate the multi-dimensional vector corresponding to the user features. The method then computes the cosine similarity between this vector and each of the two next-moment news ids (clicked and not clicked). It selects N multi-dimensional vectors in descending order of cosine similarity and determines the news ids corresponding to them. In this way, the news the user is interested in is identified from the magnitude of the cosine similarities. Recalling that news lets the user receive news with a higher degree of interest.
Fig. 5 is a flow diagram of another news recall method disclosed by an embodiment of the present invention. It specifically includes the following steps:
Step S501: Obtain the user's clicked news id sequence as user features, and obtain the news id clicked by the user at the next moment and the news id not clicked as labels.
Step S502: Train the preset prediction model on the user features and the labels, and convert the user features to generate the multi-dimensional vector corresponding to the user features.
Steps S501 and S502 are executed in the same way, and on the same principle, as steps S101 and S102 shown in Fig. 1. Refer to the description above; it is not repeated here.
Optionally, the multi-dimensional vector corresponding to the user features in step S502 may instead be generated as in steps S402 and S403 disclosed in Fig. 4.
Step S503: Compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, obtaining the first cosine similarity corresponding to the multi-dimensional vector.
Note that the first cosine similarity is the cosine similarity for the clicked news id. The model parameters are updated by back-propagation according to the obtained first cosine similarity.
Step S504: Compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, obtaining the second cosine similarity corresponding to the multi-dimensional vector.
Note that the second cosine similarity is the cosine similarity for the news id the user did not click. The model parameters are updated by back-propagation according to the obtained second cosine similarity.
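The patent does not state the loss that turns the first and second cosine similarities into a back-propagation signal. One common choice, assumed here, is a two-way softmax (pairwise logistic) loss that pushes the first cosine similarity above the second. For brevity, the sketch updates the user vector directly using the analytic gradient of cosine similarity. In the full model, the same gradient would be back-propagated further into the MLP and LSTM weights. The function names, the temperature, and the learning rate are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_grad(u, v):
    """Gradient of cos(u, v) with respect to u: v/(|u||v|) - cos(u, v) * u/|u|^2."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = cosine(u, v)
    return [vi / (nu * nv) - cos * ui / (nu * nu) for ui, vi in zip(u, v)]


def pairwise_loss(u, clicked, unclicked, temperature=0.1):
    """-log softmax of the clicked item over {clicked, unclicked}, scored by cosine."""
    first = cosine(u, clicked)      # first cosine similarity (clicked news id)
    second = cosine(u, unclicked)   # second cosine similarity (not-clicked news id)
    m = (first - second) / temperature
    # log(1 + exp(-m)) computed without overflow for either sign of m
    return math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))


def sgd_step(u, clicked, unclicked, lr=0.01, temperature=0.1):
    """One gradient-descent step on the user vector u."""
    first = cosine(u, clicked)
    second = cosine(u, unclicked)
    p = 1.0 / (1.0 + math.exp(-(first - second) / temperature))
    k = (1.0 - p) / temperature  # dL/d(second) == -dL/d(first) == k
    g_first = cosine_grad(u, clicked)
    g_second = cosine_grad(u, unclicked)
    grad = [-k * gf + k * gs for gf, gs in zip(g_first, g_second)]
    return [ui - lr * gi for ui, gi in zip(u, grad)]


u = [1.0, 1.0]          # user vector (stand-in for the LSTM + MLP output)
clicked = [1.0, 0.0]    # vector of the news id clicked at the next moment
unclicked = [0.0, 1.0]  # vector of the news id not clicked
before = pairwise_loss(u, clicked, unclicked)
for _ in range(50):
    u = sgd_step(u, clicked, unclicked)
print(before > pairwise_loss(u, clicked, unclicked))  # True
```

At the start the two similarities are equal, so the loss is log 2. Each step rotates u toward the clicked vector and away from the not-clicked one.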
Step S505: Based on the magnitude of the cosine similarities corresponding to the multi-dimensional vectors, select N multi-dimensional vectors in descending order of similarity and determine the news ids corresponding to these N vectors, where N is a positive integer greater than or equal to 2.
Step S505 is executed in the same way, and on the same principle, as step S104 shown in Fig. 1. Refer to the description above; it is not repeated here.
The news recall method disclosed above proceeds as follows:
1. Obtain the user's clicked news id sequence as user features, and the news id clicked by the user at the next moment and the news id not clicked as labels.
2. Train the preset prediction model on the user features and the labels, and convert the user features to generate the corresponding multi-dimensional vector.
3. Compute the cosine similarity between the multi-dimensional vector and the news id clicked at the next moment, giving the first cosine similarity.
4. Compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, giving the second cosine similarity.
5. Select N multi-dimensional vectors in descending order of cosine similarity and determine the corresponding news ids.
In this way, a multi-dimensional vector is generated for the user features, and the news ids are determined from the magnitude of the cosine similarities, yielding the news the user is interested in. Recalling that news lets the user receive news with a higher degree of interest.
Fig. 6 is a flow diagram of another news recall method disclosed by an embodiment of the present invention. It specifically includes the following steps:
Step S601: Obtain the user's clicked news id sequence as user features, and obtain the news id clicked by the user at the next moment and the news id not clicked as labels.
Step S601 is executed in the same way, and on the same principle, as step S101 shown in Fig. 1. Refer to the description above; it is not repeated here.
Step S602: Train the preset prediction model on the user features and the labels, and convert the user features to generate the multi-dimensional vector corresponding to the user features.
Step S602 is executed in the same way, and on the same principle, as step S102 shown in Fig. 1. Refer to the description above; it is not repeated here.
Optionally, the multi-dimensional vector corresponding to the user features in step S602 may instead be generated as in steps S402 and S403 disclosed in Fig. 4.
Step S603: Compute the cosine similarity between the multi-dimensional vector and each of two news ids: the one clicked by the user at the next moment and the one not clicked. This yields the cosine similarities corresponding to the multi-dimensional vector.
Optionally, the cosine similarities in step S603 may instead be obtained as in steps S503 and S504 disclosed in Fig. 5.
Step S604: Evaluate the cosine similarity corresponding to each multi-dimensional vector. When the cosine similarity value of a multi-dimensional vector is greater than a preset threshold, determine the N multi-dimensional vectors above the threshold and select them in descending order. Then determine the news ids corresponding to these N vectors, where N is a positive integer greater than or equal to 2.
Note that during this evaluation, several multi-dimensional vectors may share equal, maximal cosine similarity values, or several may exceed the preset threshold. The preset threshold is set by a technician according to the actual situation, so as to determine the optimal number of multi-dimensional vectors to select.
Note that N may be set to any suitable value of 2 or more according to the actual situation.
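The threshold variant of step S604 can be sketched as a filter followed by a descending sort. The function name `recall_above_threshold`, the score dictionary, and the threshold value are illustrative assumptions. Ties keep their input order because Python's sort is stable even with `reverse=True`. That is one simple way to handle the equal-maximum case mentioned above; the patent does not specify a tie-breaking rule.

```python
def recall_above_threshold(scores, threshold, n):
    """Return up to n news ids whose cosine similarity exceeds threshold, largest first.

    scores: dict mapping news id -> cosine similarity with the user vector.
    """
    if n < 2:
        raise ValueError("N must be a positive integer greater than or equal to 2")
    above = [(score, nid) for nid, score in scores.items() if score > threshold]
    above.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [nid for _, nid in above[:n]]


scores = {"id31": 0.92, "id33": 0.92, "id34": 0.40, "id35": 0.75}
print(recall_above_threshold(scores, threshold=0.5, n=3))  # ['id31', 'id33', 'id35']
```

If fewer than N candidates exceed the threshold, this sketch simply returns the ones that do.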
The news recall method disclosed in this embodiment proceeds as follows. It obtains the user's clicked news id sequence as user features, and the news id clicked by the user at the next moment and the news id not clicked as labels. It trains the preset prediction model on the user features and the labels, and converts the user features into the corresponding multi-dimensional vector. It then computes the cosine similarity between this vector and each of the two next-moment news ids (clicked and not clicked) and evaluates the results. When a vector's cosine similarity exceeds the preset threshold, the method determines the N multi-dimensional vectors above the threshold, selects them in descending order, and determines the news ids corresponding to them. In this way, a multi-dimensional vector is generated for the user features, and the news ids are determined from the magnitude of the cosine similarities, yielding the news the user is interested in. Recalling that news lets the user receive news with a higher degree of interest.
A concrete example of the news recall method described above follows:
For example, suppose the current user's clicked news sequence is id0-id30. The labels for the news id the user clicks next and the news id the user does not click are id31 and id32 respectively. In the embedding layer of the preset prediction model, the LSTM model is trained on the clicked sequence id0-id30 together with labels id31 and id32, and the clicked sequence is converted into a 500-dimensional vector. In the output layer of the preset prediction model, the MLP projects this vector to generate the multi-dimensional vector corresponding to the user features. The cosine similarity between the 500-dimensional vector and each of id31 and id32 is then computed. The 500-dimensional vectors are ranked in descending order of cosine similarity, and the news ids corresponding to the selected vectors are determined, giving the news the user is interested in. News recalled by the method of this embodiment has a higher CTR (clicks / impressions) than news recalled by traditional content-profile-based recall. User retention and per-capita reading time are also noticeably improved.
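The labeling rule used in this example can be sketched as follows. User features come from earlier clicks, and labels come from the next moment, which is determined by the most recent click. The function name `build_sample` is hypothetical. So is the assumption that the click log is already in chronological order with the most recent click last.

```python
def build_sample(clicks, not_clicked_id):
    """Split a chronological click log into (user features, clicked label, not-clicked label).

    clicks: news ids the user clicked, oldest first; the last one is treated as the
    click at the next moment (the most recent click).
    not_clicked_id: a news id shown at that moment but not clicked.
    """
    if len(clicks) < 2:
        raise ValueError("need at least one historical click plus the next-moment click")
    features = clicks[:-1]      # user features: the clicked news id sequence
    clicked_label = clicks[-1]  # label: news id clicked at the next moment
    return features, clicked_label, not_clicked_id


log = [f"id{k}" for k in range(32)]  # id0 ... id31, oldest first
features, pos, neg = build_sample(log, "id32")
print(features[0], features[-1], pos, neg)  # id0 id30 id31 id32
```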
Corresponding to the news recall method disclosed in the above embodiments, an embodiment of the present invention also discloses a news recall system. As shown in Fig. 7, the news recall system 700 mainly includes:
Acquisition unit 701, configured to obtain the user's clicked news id sequence as user features, and to obtain the news id clicked by the user at the next moment and the news id not clicked as labels. The next moment is determined from the moment of the most recent click in the user's clicked news id sequence.
Training and conversion unit 702, configured to train the preset prediction model on the user features and the labels, and to convert the user features to generate the multi-dimensional vector corresponding to the user features.
Computing unit 703, configured to compute the cosine similarity between the multi-dimensional vector and each of two news ids: the one clicked by the user at the next moment and the one not clicked. This yields the cosine similarities corresponding to the multi-dimensional vector.
Determination unit 704, configured to select N multi-dimensional vectors in descending order of their cosine similarities and determine the news ids corresponding to these N vectors, where N is a positive integer greater than or equal to 2.
Further, as shown in Fig. 8, the training and conversion unit 702 includes:
Construction module 801, configured to construct an original neural network model.
Acquisition module 802, configured to obtain, for a preset number of training users, the sequence of news ids each user clicked and the sequence of news ids each user did not click.
Input module 803, configured to input each training user's clicked and non-clicked news id sequences into the original neural network model in turn, obtaining an initial training result for each training user.
Update module 804, configured to update the parameters of the original neural network according to the initial training results to obtain the prediction model.
Conversion module 805, configured to convert the user features into a multi-dimensional vector using the LSTM model in the embedding layer of the preset prediction model.
Projection module 806, configured to project the multi-dimensional vector using the MLP in the output layer of the preset prediction model, generating the multi-dimensional vector corresponding to the user features.
Further, as shown in Fig. 9, the computing unit 703 includes:
First computing module 901, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, obtaining the first cosine similarity corresponding to the multi-dimensional vector.
Second computing module 902, configured to compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, obtaining the second cosine similarity corresponding to the multi-dimensional vector.
Further, the determination unit 704 includes a sorting module 1001 or a judgment module 1002.
The sorting module 1001 is configured to sort the multi-dimensional vectors by cosine similarity, select the N largest in descending order, and determine the news ids corresponding to these N vectors, where N is a positive integer greater than or equal to 2.
The judgment module 1002 is configured to evaluate the cosine similarity corresponding to each multi-dimensional vector. When the cosine similarity value exceeds a preset threshold, it determines the N multi-dimensional vectors above the threshold, selects them in descending order, and determines the news ids corresponding to them, where N is a positive integer greater than or equal to 2.
The specific principles and execution processes of the units and modules in the news recall system disclosed above are the same as those of the news recall method disclosed in the embodiments of the present invention. Refer to the corresponding parts of the method description; they are not repeated here.
In the news recall system disclosed above, the units and modules can be implemented by a hardware device consisting of a processor and a memory. Specifically, the units and modules are stored in the memory as program units, and the processor executes these program units to perform news recall.
The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided, and news recall is achieved by adjusting kernel parameters.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
Further, an embodiment of the present invention provides a processor for running a program, where the program performs the news recall when run.
The device disclosed in the embodiments of the present invention may be a server, a PC, a tablet (PAD), a mobile phone, and so on.
Further, an embodiment of the present invention also provides a storage medium storing a program which, when executed by a processor, implements the news recall method.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion. A process, method, commodity, or device that includes a series of elements thus includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include but are not limited to magnetic disk storage, CD-ROM, and optical storage.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (9)

1. A news recall method, characterized by comprising:
obtaining a user's clicked news id sequence as user features, and obtaining the news id clicked by the user at the next moment and the news id not clicked as labels, the next moment being determined from the moment of the most recent click in the user's clicked news id sequence;
training a preset prediction model on the user features and the labels, and converting the user features to generate a multi-dimensional vector corresponding to the user features;
computing the cosine similarity between the multi-dimensional vector and each of the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
selecting N multi-dimensional vectors in descending order of their corresponding cosine similarities, and determining the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
2. The method according to claim 1, wherein setting up the preset prediction model comprises:
constructing an original neural network model;
obtaining, for a preset number of training users, the news ids each training user clicked and the news ids each training user did not click;
inputting the clicked and non-clicked news ids of each training user into the original neural network model in turn, to obtain an initial training result for each training user;
updating the parameters of the original neural network according to the initial training results to obtain the prediction model.
3. The method according to claim 1, wherein training the preset prediction model on the user features and the labels, and converting the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
in the embedding layer of the preset prediction model, converting the user features into a multi-dimensional vector using an LSTM (long short-term memory) model;
in the output layer of the preset prediction model, projecting the multi-dimensional vector using an MLP (multi-layer perceptron) to generate the multi-dimensional vector corresponding to the user features.
4. The method according to claim 1, wherein computing the cosine similarity between the multi-dimensional vector and each of the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
computing the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain a first cosine similarity corresponding to the multi-dimensional vector;
computing the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain a second cosine similarity corresponding to the multi-dimensional vector.
5. The method according to claim 1, wherein selecting N multi-dimensional vectors in descending order of their corresponding cosine similarities and determining the news ids corresponding to the N multi-dimensional vectors comprises:
sorting the multi-dimensional vectors by cosine similarity, selecting the N largest in descending order, and determining the news ids corresponding to the N multi-dimensional vectors;
or
evaluating the cosine similarity corresponding to each multi-dimensional vector, and, when the cosine similarity value of a multi-dimensional vector is greater than a preset threshold, determining the N multi-dimensional vectors above the threshold, selecting them in descending order, and determining the news ids corresponding to the N multi-dimensional vectors.
6. A news recall system, characterized by comprising:
an acquisition unit, configured to obtain a user's clicked news id sequence as user features, and to obtain the news id clicked by the user at the next moment and the news id not clicked as labels, the next moment being determined from the moment of the most recent click in the user's clicked news id sequence;
a training and conversion unit, configured to train a preset prediction model on the user features and the labels, and to convert the user features to generate a multi-dimensional vector corresponding to the user features;
a computing unit, configured to compute the cosine similarity between the multi-dimensional vector and each of the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarities corresponding to the multi-dimensional vector;
a determination unit, configured to select N multi-dimensional vectors in descending order of their corresponding cosine similarities and determine the news ids corresponding to the N multi-dimensional vectors, where N is a positive integer greater than or equal to 2.
7. The system according to claim 6, wherein the training and conversion unit, which trains the preset prediction model on the user features and the labels and converts the user features to generate the multi-dimensional vector corresponding to the user features, comprises:
a conversion module, configured to convert the user features into a multi-dimensional vector using an LSTM (long short-term memory) model in the embedding layer of the preset prediction model;
a projection module, configured to project the multi-dimensional vector using an MLP (multi-layer perceptron) in the output layer of the preset prediction model, generating the multi-dimensional vector corresponding to the user features.
8. The system according to claim 6, wherein the computing unit, which computes the cosine similarity between the multi-dimensional vector and each of the news id clicked by the user at the next moment and the news id not clicked to obtain the cosine similarities corresponding to the multi-dimensional vector, comprises:
a first computing module, configured to compute the cosine similarity between the multi-dimensional vector and the news id clicked by the user at the next moment, to obtain a first cosine similarity corresponding to the multi-dimensional vector;
a second computing module, configured to compute the cosine similarity between the multi-dimensional vector and the news id not clicked at the next moment, to obtain a second cosine similarity corresponding to the multi-dimensional vector.
9. The system according to claim 8, wherein the determination unit, which selects N multi-dimensional vectors in descending order of their corresponding cosine similarities and determines the news ids corresponding to the N multi-dimensional vectors, comprises a sorting module or a judgment module;
the sorting module is configured to sort the multi-dimensional vectors by cosine similarity, select the N largest in descending order, and determine the news ids corresponding to the N multi-dimensional vectors;
the judgment module is configured to evaluate the cosine similarity corresponding to each multi-dimensional vector and, when the cosine similarity value of a multi-dimensional vector is greater than a preset threshold, determine the N multi-dimensional vectors above the threshold, select them in descending order, and determine the news ids corresponding to the N multi-dimensional vectors.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910132210.9A CN109871487B (en) 2019-02-22 2019-02-22 News recall method and system


Publications (2)

Publication Number Publication Date
CN109871487A true CN109871487A (en) 2019-06-11
CN109871487B CN109871487B (en) 2021-03-23

Family

ID=66919134


Country Status (1)

Country Link
CN (1) CN109871487B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102346899A (en) * 2011-10-08 2012-02-08 亿赞普(北京)科技有限公司 Method and device for predicting advertisement click rate based on user behaviors
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
WO2014143024A1 (en) * 2013-03-15 2014-09-18 Yahoo! Inc. Almost online large scale collaborative filtering based recommendation system
CN106599226A (en) * 2016-12-19 2017-04-26 深圳大学 Content recommendation method and content recommendation system
US20180357321A1 (en) * 2017-06-08 2018-12-13 Ebay Inc. Sequentialized behavior based user guidance
CN109104620A (en) * 2018-07-26 2018-12-28 腾讯科技(深圳)有限公司 A kind of short video recommendation method, device and readable medium


Non-Patent Citations (1)

Title
Guorui Zhou et al.: "Deep Interest Network for Click-Through Rate Prediction", https://arxiv.org/abs/1706.06978 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant