CN109871487A

CN109871487A - News recall method and system

Info

Publication number: CN109871487A
Application number: CN201910132210.9A
Authority: CN
Inventors: 安鸣佳
Original assignee: Beijing Sohu New Media Information Technology Co Ltd
Current assignee: Beijing Sohu New Media Information Technology Co Ltd
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2019-06-11
Anticipated expiration: 2039-02-22
Also published as: CN109871487B

Abstract

The invention discloses a news recall method and system. The sequence of news IDs clicked by a user is obtained as the user features, and the news ID clicked by the user at the next moment and a news ID not clicked are obtained as labels. A preset prediction model is trained on the user features and labels and generates a multidimensional vector corresponding to the user features. Cosine similarities are calculated between this vector and the news ID clicked at the next moment and the news ID not clicked, giving the cosine similarities of the multidimensional vector. N multidimensional vectors are then selected in descending order of cosine similarity, and the news IDs corresponding to these N vectors are determined. By generating a multidimensional vector of the user features and determining the corresponding news IDs from the magnitude of the cosine similarity, the method obtains news the user is interested in; recalling that news lets the user receive news with a higher degree of interest.

Description

News recall method and system

Technical field

The present invention relates to the field of deep learning technology, and more specifically to a news recall method and system.

Background art

With the rapid development of information technology and the Internet, Internet news has become increasingly popular and is now one of the main ways people obtain information in daily life. News recall is currently one of the ways people obtain information.

News recall is an important step in news recommendation. Traditional news recall uses keywords from the titles and content of news the user has clicked in the past, and retrieves news related to those keywords.

In the prior art, because there are many news records and the total number of news features is large, some of the news provided by traditional recall does not match the user's interests.

Summary of the invention

In view of this, the present application provides a news recall method and system, so that the recalled news matches the user's interests.

To achieve the above object, the following solutions are proposed:

A first aspect of the present invention discloses a news recall method, comprising:

obtaining the sequence of news IDs clicked by a user as user features, and obtaining, as labels, the news ID clicked by the user at the next moment and a news ID not clicked, wherein the next moment is determined from the latest moment at which a news ID in the user's click sequence was clicked;

training on the user features and the labels with a preset prediction model, and transforming the user features to generate a multidimensional vector corresponding to the user features;

calculating cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector;

selecting N multidimensional vectors in descending order of their cosine similarities, and determining the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Preferably, setting up the preset prediction model comprises:

constructing an original neural network model;

obtaining, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each training user;

inputting the clicked and non-clicked news IDs of each training user into the original neural network model in sequence, to obtain an initial training result for each training user;

updating the parameters of the original neural network model according to the initial training results to obtain the prediction model.

Preferably, training on the user features and the labels with the preset prediction model, and transforming the user features to generate the multidimensional vector corresponding to the user features, comprises:

in the embedding layer of the preset prediction model, converting the user features into a multidimensional vector based on an LSTM (long short-term memory) model;

in the output layer of the preset prediction model, projecting the multidimensional vector with an MLP (multilayer perceptron) to generate the multidimensional vector corresponding to the user features.

Preferably, calculating cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector, comprises:

calculating the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment, to obtain a first cosine similarity of the multidimensional vector;

calculating the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment, to obtain a second cosine similarity of the multidimensional vector.

Preferably, selecting N multidimensional vectors in descending order of their cosine similarities and determining the news IDs corresponding to the N multidimensional vectors comprises:

sorting the multidimensional vectors by cosine similarity, selecting the top N in descending order, and determining the news IDs corresponding to the N multidimensional vectors;

or

comparing the cosine similarity of each multidimensional vector with a preset threshold, determining the multidimensional vectors whose cosine similarity is greater than the preset threshold, selecting N of them in descending order, and determining the news IDs corresponding to the N multidimensional vectors.
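The two selection modes above (plain top-N, and top-N among scores above a threshold) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the cosine similarities have already been computed and are held in a dict keyed by news ID, and the function names `top_n` and `top_n_above` and the toy scores are hypothetical.

```python
import heapq
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Mode 1: sort by cosine similarity (descending) and keep the first n news IDs."""
    if n < 2:
        raise ValueError("N must be a positive integer >= 2")
    best = heapq.nlargest(n, scores.items(), key=lambda item: item[1])
    return [news_id for news_id, _ in best]


def top_n_above(scores: Dict[str, float], n: int, threshold: float) -> List[str]:
    """Mode 2: keep news IDs whose similarity exceeds threshold, then take the n largest."""
    kept = {news_id: s for news_id, s in scores.items() if s > threshold}
    return top_n(kept, n)


scores = {"id20": 0.91, "id21": 0.12, "id22": 0.67, "id23": 0.48}
print(top_n(scores, 2))             # ['id20', 'id22']
print(top_n_above(scores, 3, 0.6))  # ['id20', 'id22']
```

In the threshold mode fewer than N items may pass the threshold; the sketch then returns only those that do rather than padding the result.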

A second aspect of the present invention discloses a news recall system, comprising:

an acquisition unit, configured to obtain the sequence of news IDs clicked by a user as user features, and to obtain, as labels, the news ID clicked by the user at the next moment and a news ID not clicked, wherein the next moment is determined from the latest moment at which a news ID in the user's click sequence was clicked;

a training and conversion unit, configured to train on the user features and the labels with a preset prediction model, and to transform the user features to generate a multidimensional vector corresponding to the user features;

a computing unit, configured to calculate cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector;

a determination unit, configured to select N multidimensional vectors in descending order of their cosine similarities and to determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Preferably, the training and conversion unit, which trains on the user features and the labels with the preset prediction model and transforms the user features to generate the multidimensional vector corresponding to the user features, comprises:

a conversion module, configured to convert the user features into a multidimensional vector based on an LSTM model in the embedding layer of the preset prediction model;

a projection module, configured to project the multidimensional vector with an MLP in the output layer of the preset prediction model, to generate the multidimensional vector corresponding to the user features.

Preferably, the computing unit, which calculates the cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, comprises:

a first computing module, configured to calculate the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment, to obtain the first cosine similarity of the multidimensional vector;

a second computing module, configured to calculate the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment, to obtain the second cosine similarity of the multidimensional vector.

Preferably, the determination unit, which selects N multidimensional vectors in descending order of their cosine similarities and determines the corresponding news IDs, comprises a sorting module or a judgment module;

the sorting module is configured to sort the multidimensional vectors by cosine similarity, select the top N in descending order, and determine the news IDs corresponding to the N multidimensional vectors;

the judgment module is configured to compare the cosine similarity of each multidimensional vector with a preset threshold, determine the multidimensional vectors whose cosine similarity is greater than the preset threshold, select N of them in descending order, and determine the news IDs corresponding to the N multidimensional vectors.

As can be seen from the above technical solutions, the invention discloses a news recall method and system. The sequence of news IDs clicked by a user is obtained as user features, and the news ID clicked at the next moment and a news ID not clicked are obtained as labels. A preset prediction model is trained on the user features and labels, and the user features are transformed into a corresponding multidimensional vector. Cosine similarities between this vector and the clicked and non-clicked news IDs of the next moment are calculated, N multidimensional vectors are selected in descending order of cosine similarity, and their corresponding news IDs are determined. In this way, news the user is interested in is obtained and recalled, so that the user receives news with a higher degree of interest.

Description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention;

Fig. 2 is a flow diagram of setting up the prediction model disclosed in an embodiment of the present invention;

Fig. 3 is a flow diagram of updating the parameters of the original neural network disclosed in an embodiment of the present invention;

Fig. 4 is a flow diagram of another news recall method disclosed in an embodiment of the present invention;

Fig. 5 is a flow diagram of yet another news recall method disclosed in an embodiment of the present invention;

Fig. 6 is a flow diagram of a further news recall method disclosed in an embodiment of the present invention;

Fig. 7 is a structural diagram of a news recall system disclosed in an embodiment of the present invention;

Fig. 8 is a structural diagram of the training and conversion unit of the news recall system disclosed in an embodiment of the present invention;

Fig. 9 is a structural diagram of the computing unit of the news recall system disclosed in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

In this application, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.

As described in the background, prior-art news recall uses keywords from the titles and content of news the user clicked in the past and retrieves news related to those keywords, and some of the news recalled this way does not match the user's interests. The present invention therefore discloses a news recall method that generates a multidimensional vector of the user features and determines the corresponding news IDs from the magnitude of its cosine similarity, thereby obtaining and recalling news the user is interested in, so that the user receives news with a higher degree of interest.

Fig. 1 is a flow diagram of a news recall method disclosed in an embodiment of the present invention, which specifically comprises the following steps:

Step S101: obtain the sequence of news IDs clicked by the user as user features, and obtain the news ID clicked by the user at the next moment and a news ID not clicked as labels.

When step S101 is executed, because user behavior is strongly time-sensitive, the click sequence used as the user features consists of the news IDs the user clicked most recently, and the news ID clicked and a news ID not clicked at the moment following that sequence are used as labels.

It should be noted that the click sequence is determined from the news titles the user has clicked. It may contain multiple news IDs; the number is configured by technicians according to the actual situation.

It should be noted that the clicked news IDs are taken from the set of news IDs the user has clicked; there may be multiple, and the number is chosen by technicians according to the actual situation.

It should be noted that the next moment is determined from the latest moment at which a news ID in the user's click sequence was clicked.

The process of obtaining the news ID clicked by the user at the next moment and the news ID not clicked is illustrated by the following example:

For example, at 15:02 the news id15 clicked by the user is used as the user feature; at 15:03 the news id16 clicked by the user and the news id17 not clicked are used as labels.
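The sample construction in step S101 and the 15:02/15:03 example can be sketched as below. Several details are assumptions not stated in the patent: the click log is a list of (timestamp, news ID) pairs with zero-padded "HH:MM" timestamps, the latest click is the next-moment positive label, and the non-clicked label is drawn uniformly at random from the user's non-clicked set. `build_sample` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def build_sample(click_log: Sequence[Tuple[str, str]],
                 not_clicked: Sequence[str],
                 history_len: int,
                 rng: random.Random) -> Tuple[List[str], str, str]:
    """Turn one user's click log into (user features, clicked label, non-clicked label).

    click_log holds (timestamp, news_id) pairs; timestamps are zero-padded
    "HH:MM" strings, so sorting the pairs orders them in time.
    """
    if len(click_log) < 2:
        raise ValueError("need at least one history click plus the next-moment click")
    if history_len < 1:
        raise ValueError("history_len must be at least 1")
    ordered = sorted(click_log)
    # The latest click is the next-moment positive label ...
    positive = ordered[-1][1]
    # ... and up to history_len clicks before it form the user features.
    features = [news_id for _, news_id in ordered[:-1]][-history_len:]
    # The non-clicked label is drawn from the user's non-clicked set.
    negative = rng.choice(list(not_clicked))
    return features, positive, negative


features, positive, negative = build_sample(
    [("15:02", "id15"), ("15:03", "id16")], not_clicked=["id17"],
    history_len=1, rng=random.Random(0))
print(features, positive, negative)  # ['id15'] id16 id17
```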

It should be noted that the news ID not clicked by the user is taken from the set of news IDs the user did not click; it may be any one of them and is chosen by technicians according to the actual situation.

Step S102: train on the user features and the labels with the preset prediction model, and transform the user features to generate a multidimensional vector corresponding to the user features.

When step S102 is executed, the preset prediction model is trained on the user features and the labels, and the user features are converted into the corresponding multidimensional vector. The prediction model is obtained by training the original neural network model on sample data.

For the process of setting up the preset prediction model used in step S102 of Fig. 1, Fig. 2 shows a flow diagram of the setup, which specifically comprises the following steps:

Step S201: construct an original neural network model.

Step S202: obtain, for a preset number of training users, the news IDs clicked and the news IDs not clicked by each training user.

It should be noted that the training users may be users of different age groups, of different genders, or with different hobbies; the specific training users are selected by technicians according to the actual situation.

Step S203: input the news IDs clicked and the news IDs not clicked by each training user into the original neural network model in sequence, to obtain an initial training result for each training user.

When step S203 is executed, the clicked and non-clicked news IDs of each training user are input into the original neural network model in sequence, and an LSTM (long short-term memory) model is selected as the original neural network model for training.

It should be noted that the long short-term memory (LSTM) model is a deep neural network for time series, suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
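For readers unfamiliar with LSTMs, one step of a standard LSTM cell (input, forget and output gates plus a candidate cell state) is sketched below in pure Python. The patent gives no cell equations, sizes or initialization; the `params` layout (one `(W, U, b)` triple per gate) and the function names are assumptions for illustration. `lstm_encode` returns the final hidden state, which is one common way to turn a click sequence into a single vector.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Gate = Tuple[Matrix, Matrix, Vector]  # (W acting on input x, U acting on previous h, bias b)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lstm_step(x: Vector, h: Vector, c: Vector, params: Dict[str, Gate]) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns the new (hidden, cell) state."""
    def gate(name: str, act: Callable[[float], float]) -> Vector:
        W, U, b = params[name]
        return [act(a + r + bias) for a, r, bias in zip(matvec(W, x), matvec(U, h), b)]

    i = gate("i", sigmoid)    # input gate: how much new information to write
    f = gate("f", sigmoid)    # forget gate: how much of the old cell state to keep
    o = gate("o", sigmoid)    # output gate: how much of the cell state to expose
    g = gate("g", math.tanh)  # candidate values for the cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def lstm_encode(sequence: List[Vector], params: Dict[str, Gate], hidden_size: int) -> Vector:
    """Run the cell over a sequence of input vectors and return the final hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


zero_gate = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero_gate, "f": zero_gate, "o": zero_gate, "g": ([[0.5]], [[0.0]], [0.0])}
print(lstm_encode([[1.0], [2.0], [3.0]], params, hidden_size=1))
```

The forget gate is what lets the cell carry information across long gaps in the sequence, which is the property the paragraph above refers to.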

Step S204: update the parameters of the original neural network model according to the initial training results to obtain the prediction model.

By executing steps S201 to S204, the parameters of the original neural network model are updated with the back-propagation algorithm according to the initial training results, yielding the prediction model.

It should be noted that the back-propagation algorithm (BPA) is commonly used to train multilayer perceptrons. It alternates two phases, forward propagation of activations and weight updates, and repeats them until the network's response to the input data reaches a predetermined target range.
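The propagate-then-update loop described above can be illustrated with a one-hidden-layer sigmoid MLP trained by backpropagation on a toy logical-OR dataset. This is a generic textbook sketch, not the patent's training code: the squared-error loss, learning rate, layer sizes and stopping tolerance are all assumptions, and `train_mlp` is a hypothetical name.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_mlp(data: List[Sample], hidden: int = 3, lr: float = 0.5,
              tol: float = 0.01, max_epochs: int = 20000, seed: int = 0) -> List[float]:
    """Train a 1-hidden-layer sigmoid MLP by backpropagation.

    Returns the mean squared error observed during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    history: List[float] = []
    for _ in range(max_epochs):
        total = 0.0
        for x, y in data:
            # Forward propagation of activations.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
            out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            total += (out - y) ** 2
            # Backward propagation of error terms for the loss 0.5 * (out - y) ** 2.
            d_out = (out - y) * out * (1.0 - out)
            d_h = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Weight update (online gradient descent).
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for k in range(n_in):
                    w1[j][k] -= lr * d_h[j] * x[k]
            b2 -= lr * d_out
        history.append(total / len(data))
        if history[-1] < tol:  # response is within the target range: stop iterating
            break
    return history


or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
losses = train_mlp(or_data)
print(len(losses), losses[0], losses[-1])
```

The `tol` check plays the role of the "predetermined target range"; `max_epochs` guards against runs that never reach it.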

In a concrete implementation, the constructed prediction model contains multiple processing layers, including at least an embedding layer, a bidirectional LSTM layer and a predict layer.

It should be noted that the embedding layer, following a matrix-factorization approach, converts the user features through the LSTM model into multidimensional vectors with symbolic meaning. The bidirectional LSTM layer trains the model by connecting the LSTM model to an MLP, and calculates the cosine similarity between the MLP output and both the news ID clicked by the user and the news ID not clicked. The predict layer takes the user features, the clicked news IDs and the non-clicked news IDs as input and, using the prediction model, calculates the cosine similarities for the clicked and non-clicked news IDs.

It should be noted that the embedding layer and the bidirectional LSTM layer are used in the training stage of the prediction model, and the predict layer is used in its prediction stage.

It should further be noted that the results obtained with the prediction model can also be used subsequently as a parameter for adjusting the prediction accuracy of the model.

For the process in step S204 of Fig. 2 of updating the parameters of the original neural network according to the initial training results to obtain the prediction model, Fig. 3 shows a flow diagram of the parameter update, which specifically comprises the following steps:

Step S301: obtain the news ID sequence clicked by a training user.

It should be noted that the training user's click sequence contains multiple clicked news IDs; how the sequence is determined is configured by technicians according to the actual situation.

Step S302: vectorize the training user's click sequence through the embedding layer.
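Step S302's vectorization is commonly realized as an embedding lookup, in which each news ID maps to a row of a trainable table. The sketch below assumes that reading; the random Gaussian initialization, the shared vector for unseen IDs and the function names are hypothetical, and in a trained model the rows would be learned by back-propagation rather than fixed.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def build_embedding_table(news_ids: Sequence[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Assign every known news ID a randomly initialized dim-dimensional row."""
    rng = random.Random(seed)
    return {news_id: [rng.gauss(0.0, 0.1) for _ in range(dim)] for news_id in news_ids}


def vectorize(sequence: Sequence[str], table: Dict[str, Vector], unknown: Vector) -> List[Vector]:
    """Replace each clicked news ID by its embedding row (unknown IDs share one row)."""
    return [table.get(news_id, unknown) for news_id in sequence]


table = build_embedding_table([f"id{i}" for i in range(20)], dim=8)
vectors = vectorize(["id0", "id1", "id99"], table, unknown=[0.0] * 8)
print(len(vectors), len(vectors[0]))  # 3 8
```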

Step S303: convert the vectorized click sequence into a multidimensional vector through the LSTM model.

Step S304: project the multidimensional vector through the MLP to obtain the multidimensional vector corresponding to the user features.

Step S305: calculate the cosine similarity between the multidimensional vector of the user features and the news ID clicked by the user, and between it and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector; then update the prediction model parameters with the back-propagation algorithm.

By executing steps S301 to S305, the training user's click sequence is obtained and projected into the multidimensional vector corresponding to the user features; the cosine similarities between this vector and the clicked and non-clicked news IDs are calculated, and the prediction model parameters are updated with the back-propagation algorithm.

The process of updating the prediction model parameters is illustrated by the following example:

The current user's click sequence is id0-id19, and the news id20 clicked at the next moment and the news id21 not clicked are the labels. After vectorization in the embedding layer, the sequence id0-id19 is converted by the LSTM model into a 300-dimensional vector, which the MLP projects into the 300-dimensional vector corresponding to the user features. The cosine similarity of this vector with id20 and with id21 is then calculated, giving a first cosine similarity value for the clicked news and a second cosine similarity value for the non-clicked news, and the prediction model parameters are updated with the back-propagation algorithm.
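Back-propagating through the cosine similarity in step S305 requires its gradient with respect to the user vector u. For cos(u, v) = u·v / (|u||v|), the partial derivative is v_i / (|u||v|) - cos(u, v) · u_i / |u|^2. The sketch below computes both values and is checked numerically; it is a generic derivation rather than code from the patent, it assumes each news ID is represented by a vector v of the same dimension as u (300 in the example above, 3 here for brevity), and `cosine_and_grad` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_and_grad(u: Vector, v: Vector) -> Tuple[float, Vector]:
    """Return cos(u, v) and its gradient with respect to u."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    # d cos / d u_i = v_i / (|u| |v|) - cos * u_i / |u|^2
    grad = [b / (nu * nv) - cos * a / (nu * nu) for a, b in zip(u, v)]
    return cos, grad


user_vec = [0.3, -1.2, 0.5]
clicked_vec = [1.0, 0.4, -0.7]
cos, grad = cosine_and_grad(user_vec, clicked_vec)
# A small gradient-ascent step on cos moves u towards the clicked item's vector.
step = [a + 0.1 * g for a, g in zip(user_vec, grad)]
print(cosine_and_grad(step, clicked_vec)[0] > cos)  # True
```

Because cosine similarity ignores vector length, the gradient is always orthogonal to u; only the direction of the user vector is adjusted.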

Step S103: calculate cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector.

When step S103 is executed, cosine similarities are calculated between the multidimensional vector and, respectively, the news ID clicked by the user at the next moment and the news ID not clicked, giving the first and second cosine similarities of the multidimensional vector.
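Step S103 can be sketched as below, assuming (the patent does not spell this out) that each news ID is first mapped to a vector of the same dimension as the user vector, for example through an embedding table. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity u·v / (|u||v|); undefined for zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def first_and_second_similarity(user_vec: Sequence[float],
                                clicked_vec: Sequence[float],
                                not_clicked_vec: Sequence[float]) -> Tuple[float, float]:
    """Return (first, second) cosine similarity: clicked news, then non-clicked news."""
    return cosine(user_vec, clicked_vec), cosine(user_vec, not_clicked_vec)


print(first_and_second_similarity([1.0, 0.0], [2.0, 0.0], [0.0, 3.0]))  # (1.0, 0.0)
```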

It should be noted that the news IDs corresponding to the multidimensional vector are determined from the magnitudes of its first and second cosine similarities, so that news the user is interested in is obtained; recalling that news lets the user receive news with a higher degree of interest.

Step S104: select N multidimensional vectors in descending order of their cosine similarities, and determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

When step S104 is executed, the multidimensional vectors are sorted in descending order of cosine similarity, the top N are selected, and the news IDs corresponding to these N vectors are determined.

It should be noted that the multidimensional vector may be, for example, 300-dimensional or 600-dimensional; the dimension is configured by technicians according to the actual situation.

It should be noted that the larger the cosine similarity of a multidimensional vector, the more interested the user is in the corresponding news.

It should be noted that the specific value of N is chosen by technicians according to the actual situation.

With the news recall method disclosed above, the embodiment of the present invention obtains the user's click sequence as user features and the news ID clicked at the next moment and a news ID not clicked as labels; trains the preset prediction model on them and transforms the user features into a corresponding multidimensional vector; calculates the cosine similarities between this vector and the clicked and non-clicked news IDs; selects N multidimensional vectors in descending order of cosine similarity; and determines their corresponding news IDs. News the user is interested in is thus obtained and recalled, so that the user receives news with a higher degree of interest.

Optionally, in an evaluation on real online user click data, the click-through rate (CTR) of news recalled with the LSTM model was higher than that of traditional news recall.

It should be noted that CTR is the number of clicks divided by the number of impressions.
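The CTR definition above can be applied per recall channel, as sketched below, to compare two recall methods. The event format (channel name, clicked flag) and the channel names are hypothetical, and the numbers in the usage lines are toy values, not results from the patent's evaluation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ctr_by_channel(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """CTR = clicks / impressions, computed separately for each recall channel.

    Each event is one impression: (channel that recalled the item, whether it was clicked).
    """
    impressions: Counter = Counter()
    clicks: Counter = Counter()
    for channel, clicked in events:
        impressions[channel] += 1
        if clicked:
            clicks[channel] += 1
    return {channel: clicks[channel] / impressions[channel] for channel in impressions}


toy_events = [("lstm", True), ("lstm", False), ("keyword", True), ("keyword", False),
              ("keyword", False), ("keyword", False)]
print(ctr_by_channel(toy_events))  # {'lstm': 0.5, 'keyword': 0.25}
```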

Based on real online user click data, the embodiment of the present invention finds that the CTR of news recalled with the LSTM model is higher than that of traditional news recall, and that the retention rate and average reading time per user for news recalled by the LSTM model are also improved.

Based on the method described in Fig. 1, another news recall method disclosed in an embodiment of the present invention, shown in Fig. 4, specifically comprises the following steps:

Step S401: obtain the sequence of news IDs clicked by the user as user features, and obtain the news ID clicked by the user at the next moment and a news ID not clicked as labels.

Step S401 is executed in the same way and on the same principle as step S101 shown in Fig. 1; refer to the description above, which is not repeated here.

Step S402: in the embedding layer of the preset prediction model, convert the user features into a multidimensional vector based on the LSTM model.

When step S402 is executed, in the embedding layer of the preset prediction model, the user features are converted into a multidimensional vector by the LSTM model following a matrix-factorization approach.

It should be noted that applying the LSTM model to the user's click sequence filters out news IDs unrelated to the user's interests, algorithmically keeping the better candidates and discarding the worse, so that the news IDs the user is most interested in are selected.

Step S403: in the output layer of the preset prediction model, project the multidimensional vector with the MLP to generate the multidimensional vector corresponding to the user features.

It should be noted that a multilayer perceptron (MLP) comprises an input layer, hidden layers and an output layer. Adjacent layers of an MLP are fully connected: every neuron in one layer is connected to every neuron in the next layer.
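Full connectivity between adjacent layers means each layer is a matrix-vector product plus a bias, followed by an activation. A minimal sketch of such an MLP projection follows; the ReLU hidden activation, identity output activation and layer sizes are assumptions (the patent does not state them), and the names are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector, Callable[[float], float]]  # (weights, bias, activation)


def dense(x: Vector, W: Matrix, b: Vector, act: Callable[[float], float]) -> Vector:
    """Fully connected layer: every output neuron is connected to every input neuron.

    W has one row per output neuron and one column per input neuron.
    """
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def mlp_project(x: Vector, layers: List[Layer]) -> Vector:
    """Apply the layers in order; the last one acts as the output projection."""
    for W, b, act in layers:
        x = dense(x, W, b, act)
    return x


def relu(z: float) -> float:
    return max(0.0, z)


def identity(z: float) -> float:
    return z


hidden_layer = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -5.0], relu)
output_layer = ([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]], [0.0, 0.0], identity)
print(mlp_project([1.0, 2.0], [hidden_layer, output_layer]))  # [3.0, -1.0]
```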

It should be noted that after the user features are input to the embedding layer of the prediction model and converted into a multidimensional vector, the hidden layers pass the vector from the input layer to the output layer, where it is projected to generate the multidimensional vector corresponding to the user features.

It should be noted that the prediction model is trained on the user features and the labels by connecting the LSTM model to the MLP.

Step S404: calculate cosine similarities between the multidimensional vector and the news ID clicked by the user at the next moment and the news ID not clicked, to obtain the cosine similarities corresponding to the multidimensional vector.

Step S405: select N multidimensional vectors in descending order of their cosine similarities, and determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Steps S404 and S405 are executed in the same way and on the same principle as steps S103 and S104 shown in Fig. 1; refer to the description above, which is not repeated here.

With the news recall method disclosed above, the embodiment of the present invention obtains the user's click sequence as user features and the news ID clicked at the next moment and a news ID not clicked as labels; in the embedding layer of the preset prediction model, converts the user features into a multidimensional vector with the LSTM model; in the output layer, projects that vector with the MLP to generate the multidimensional vector corresponding to the user features; calculates the cosine similarities between this vector and the clicked and non-clicked news IDs of the next moment; selects N multidimensional vectors in descending order of cosine similarity; and determines their corresponding news IDs. News the user is interested in is thus obtained and recalled, so that the user receives news with a higher degree of interest.

Fig. 5 is a flow diagram of another news recall method disclosed in an embodiment of the present invention, which specifically includes the following steps:

Step S501: Obtain the sequence of news IDs clicked by the user as the user features, and obtain the news ID clicked by the user at the next moment and the news ID not clicked as labels.

Step S502: Train the preset prediction model on the user features and the labels, convert the user features, and generate the multidimensional vector corresponding to the user features.

Steps S501 and S502 are executed in the same way and on the same principle as steps S101 and S102 shown in Fig. 1; refer to the description above, which is not repeated here.

Optionally, the multidimensional vector corresponding to the user features in step S502 may also be generated by executing steps S402 and S403 disclosed in Fig. 4.

Step S503: Compute the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment to obtain the first cosine similarity corresponding to the multidimensional vector.

It should be noted that the first cosine similarity is the cosine similarity for the news ID the user clicked. The model parameters are updated by the back-propagation algorithm according to the first cosine similarity obtained for the multidimensional vector.

Step S504: Compute the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment to obtain the second cosine similarity corresponding to the multidimensional vector.

It should be noted that the second cosine similarity is the cosine similarity for the news ID the user did not click. The model parameters are updated by the back-propagation algorithm according to the second cosine similarity obtained for the multidimensional vector.
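The embodiment does not state how the first and second cosine similarities are combined into the objective that back-propagation minimizes. One common choice, assumed here, is a softmax over the two similarities scaled by a smoothing factor, as in DSSM-style retrieval models; it rewards a high first similarity and a low second similarity. The function names and the factor `gamma` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def click_pair_loss(user_vec: Sequence[float],
                    clicked_vec: Sequence[float],
                    unclicked_vec: Sequence[float],
                    gamma: float = 10.0) -> Tuple[float, float, float]:
    """Return (first similarity, second similarity, loss) for one training sample."""
    s_pos = cosine(user_vec, clicked_vec)    # first cosine similarity (clicked news)
    s_neg = cosine(user_vec, unclicked_vec)  # second cosine similarity (non-clicked news)
    # -log(exp(g*s_pos) / (exp(g*s_pos) + exp(g*s_neg))) == log(1 + exp(g*(s_neg - s_pos)))
    loss = math.log1p(math.exp(gamma * (s_neg - s_pos)))
    return s_pos, s_neg, loss
```

In a framework with automatic differentiation, the gradient of this loss with respect to the model weights would drive the parameter update described above; because both similarities lie in [-1, 1], the exponent stays bounded and does not overflow for moderate `gamma`.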

Step S505: Based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, select N multidimensional vectors in descending order and determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Step S505 is executed in the same way and on the same principle as step S104 shown in Fig. 1; refer to the description above, which is not repeated here.

With the news recall method disclosed above, the embodiment of the present invention obtains the sequence of news IDs clicked by a user as the user features, and obtains the news ID clicked by the user at the next moment and the news ID not clicked as labels; trains the preset prediction model on the user features and the labels and converts the user features to generate the corresponding multidimensional vector; computes the cosine similarity between the multidimensional vector and the news ID clicked at the next moment to obtain the first cosine similarity, and between the multidimensional vector and the news ID not clicked at the next moment to obtain the second cosine similarity; and, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, selects N multidimensional vectors in descending order and determines the news IDs corresponding to them. In this way, the multidimensional vector of the user features is generated, the news IDs corresponding to the multidimensional vectors are determined from the magnitude of their cosine similarities, and the news the user is interested in is obtained; recalling that news provides the user with news of higher interest.

Fig. 6 is a flow diagram of another news recall method disclosed in an embodiment of the present invention, which specifically includes the following steps:

Step S601: Obtain the sequence of news IDs clicked by the user as the user features, and obtain the news ID clicked by the user at the next moment and the news ID not clicked as labels.

Step S601 is executed in the same way and on the same principle as step S101 shown in Fig. 1; refer to the description above, which is not repeated here.

Step S602: Train the preset prediction model on the user features and the labels, convert the user features, and generate the multidimensional vector corresponding to the user features.

Step S602 is executed in the same way and on the same principle as step S102 shown in Fig. 1; refer to the description above, which is not repeated here.

Optionally, the multidimensional vector corresponding to the user features in step S602 may also be generated by executing steps S402 and S403 disclosed in Fig. 4.

Step S603: Compute the cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment to obtain the cosine similarities corresponding to the multidimensional vector.

Optionally, the cosine similarities corresponding to the multidimensional vector in step S603 may also be obtained by executing steps S503 and S504 disclosed in Fig. 5.

Step S604: Evaluate the magnitude of the cosine similarities corresponding to the multidimensional vectors. When cosine similarity values exceed a preset threshold, determine the multidimensional vectors whose values exceed the threshold, select N of them in descending order, and determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

It should be noted that, when the cosine similarities are evaluated, several multidimensional vectors may share the same maximal cosine similarity value, or several may have values above the preset threshold. The preset threshold is set by technical personnel according to the actual situation so as to determine the optimal number of multidimensional vectors to select.

It should be noted that N may be set to different values according to the actual situation.
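A minimal sketch of step S604, assuming the cosine similarities have already been computed into a dictionary keyed by news ID. Because, as noted above, several vectors may share equal similarity values, the sketch breaks ties by news ID so that the result is deterministic; that tie rule, the function name, and returning fewer than N IDs when too few pass the threshold are all assumptions rather than details from the embodiment.

```python
from typing import Dict, List


def recall_above_threshold(scores: Dict[str, float], threshold: float, n: int) -> List[str]:
    """Keep news whose cosine similarity exceeds the threshold, then take the top N."""
    if n < 2:
        raise ValueError("N must be a positive integer >= 2")
    passed = [(score, news_id) for news_id, score in scores.items() if score > threshold]
    # Descending similarity; equal similarities are ordered by news ID (assumed tie rule).
    passed.sort(key=lambda pair: (-pair[0], pair[1]))
    return [news_id for _, news_id in passed[:n]]
```

With scores `{"a": 0.9, "b": 0.4, "c": 0.9, "d": 0.7}` and a threshold of 0.5, N = 2 yields `['a', 'c']` (a tie resolved by ID), while N = 3 also admits `'d'`.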

With the news recall method disclosed in this embodiment of the present invention, the sequence of news IDs clicked by a user is obtained as the user features, and the news ID clicked by the user at the next moment and the news ID not clicked are obtained as labels; the preset prediction model is trained on the user features and the labels, and the user features are converted to generate the corresponding multidimensional vector; cosine similarity is computed between the multidimensional vector and the news IDs clicked and not clicked at the next moment to obtain the cosine similarities corresponding to the multidimensional vector; the magnitude of these cosine similarities is evaluated, and when cosine similarity values exceed the preset threshold, the multidimensional vectors above the threshold are determined, N of them are selected in descending order, and the news IDs corresponding to the N multidimensional vectors are determined. In this way, the multidimensional vector of the user features is generated, the news IDs corresponding to the multidimensional vectors are determined from the magnitude of their cosine similarities, and the news the user is interested in is obtained; recalling that news provides the user with news of higher interest.

The specific implementation of the news recall method described above is illustrated by the following example.

For example, suppose the current user's click sequence is id0-id30, and the labels for the news subsequently clicked and the news not clicked are id31 and id32, respectively. In the embedding layer of the preset prediction model, the LSTM model is trained on the click sequence id0-id30 with labels id31 and id32, and the click sequence id0-id30 is converted into a 500-dimensional vector. In the output layer of the preset prediction model, this vector is projected by the MLP to generate the multidimensional vector corresponding to the user features. Cosine similarity is then computed between the 500-dimensional vector and id31 and id32 to obtain the cosine similarities of the 500-dimensional vector; selections are made in descending order of cosine similarity, the news IDs corresponding to the selected 500-dimensional vectors are determined, and the news the user is interested in is obtained. The CTR (clicks/impressions) of news recalled by the news recall method provided in this embodiment is higher than that of news recalled by the traditional content-profile approach, and user retention and per-capita reading time are markedly improved.
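The CTR comparison in the example uses clicks divided by impressions. As a small illustration of that arithmetic (the helper names are hypothetical, and no figures from the embodiment are implied), the CTR of a recall channel and its relative lift over a baseline channel can be computed as:

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate = clicks / impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")
    return clicks / impressions


def relative_lift(ctr_new: float, ctr_baseline: float) -> float:
    """Relative CTR improvement of a new recall channel over a baseline channel."""
    if ctr_baseline <= 0:
        raise ValueError("baseline CTR must be positive")
    return (ctr_new - ctr_baseline) / ctr_baseline
```

A lift of 0.2, for instance, would mean the new channel's CTR is 20% higher than the baseline's.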

Based on the news recall method disclosed in the above embodiments, an embodiment of the present invention correspondingly discloses a news recall system. As shown in Fig. 7, the news recall system 700 mainly includes:

Acquiring unit 701, configured to obtain the sequence of news IDs clicked by the user as the user features, and to obtain the news ID clicked by the user at the next moment and the news ID not clicked as labels, where the next moment is determined from the time of the most recently clicked news ID in the user's click sequence.

Training and conversion unit 702, configured to train the preset prediction model on the user features and the labels, convert the user features, and generate the multidimensional vector corresponding to the user features.

Computing unit 703, configured to compute the cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment, obtaining the cosine similarities corresponding to the multidimensional vector.

Determination unit 704, configured to select, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, N multidimensional vectors in descending order and to determine the news IDs corresponding to them, where N is a positive integer greater than or equal to 2.
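How acquiring unit 701 could assemble one training sample is sketched below, assuming an impression log with timestamps and a clicked flag. The `Event` type and `build_sample` are hypothetical. The history is the first `history_len` clicks, and the "next moment" is taken as the events after the most recent click in that history, following the definition given for the acquiring unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One impression of a news item to a user (hypothetical log schema)."""
    ts: int        # event time, e.g. Unix seconds
    news_id: str
    clicked: bool  # True if the impression was clicked


def build_sample(events: Sequence[Event],
                 history_len: int) -> Optional[Tuple[List[str], str, str]]:
    """Return (clicked news ID sequence, next clicked ID, next non-clicked ID) or None."""
    if history_len < 1:
        raise ValueError("history_len must be at least 1")
    ordered = sorted(events, key=lambda e: e.ts)  # stable: ties keep log order
    clicks = [e for e in ordered if e.clicked]
    if len(clicks) <= history_len:
        return None  # no later click available as the positive label
    history = clicks[:history_len]
    newest = history[-1].ts  # time of the most recent click in the feature sequence
    later = [e for e in ordered if e.ts > newest]  # the "next moment" and after
    positive = next((e.news_id for e in later if e.clicked), None)
    negative = next((e.news_id for e in later if not e.clicked), None)
    if positive is None or negative is None:
        return None
    return [e.news_id for e in history], positive, negative
```

Applied to the worked example, a log of clicks on id0-id31 plus an unclicked impression of id32 at the same moment as id31 yields the history id0-id30 with labels id31 (clicked) and id32 (not clicked).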

Further, as shown in Fig. 8, the training and conversion unit 702 includes:

Construction module 801, configured to construct an original neural network model.

Obtaining module 802, configured to obtain, for a preset number of training users, the sequence of news IDs clicked by each training user and the sequence of news IDs not clicked.

Input module 803, configured to input the clicked and non-clicked news ID sequences of each training user, in turn, into the original neural network model to obtain an initial training result for each training user.

Update module 804, configured to update the parameters of the original neural network according to the initial training results to obtain the prediction model.

Conversion module 805, configured to convert the user features into a multidimensional vector with the LSTM (long short-term memory) model in the embedding layer of the preset prediction model.

Projection module 806, configured to project the multidimensional vector with the MLP (multilayer perceptron) in the output layer of the preset prediction model, generating the multidimensional vector corresponding to the user features.

Further, as shown in Fig. 9, the computing unit 703 includes:

First computing module 901, configured to compute the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment, obtaining the first cosine similarity corresponding to the multidimensional vector.

Second computing module 902, configured to compute the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment, obtaining the second cosine similarity corresponding to the multidimensional vector.

Further, the determination unit 704 includes a sorting module 1001 or a judgment module 1002.

The sorting module 1001 is configured to sort the multidimensional vectors by the magnitude of their cosine similarities, select N multidimensional vectors in descending order, and determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

The judgment module 1002 is configured to evaluate the magnitude of the cosine similarities corresponding to the multidimensional vectors; when cosine similarity values exceed the preset threshold, it determines the multidimensional vectors above the threshold, selects N of them in descending order, and determines the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

The specific principles and execution processes of the units and modules in the news recall system disclosed in the above embodiments are the same as those of the news recall method disclosed in the above embodiments; refer to the corresponding parts of the method description, which are not repeated here.

In the news recall system disclosed in the above embodiments, the units and modules may be implemented by a hardware device consisting of a processor and a memory. Specifically, the units and modules are stored in the memory as program units, and the processor executes the program units stored in the memory to implement news recall.

The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and news recall is implemented by adjusting kernel parameters.

The memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

Further, an embodiment of the present invention provides a processor for running a program, where the news recall method is executed when the program runs.

The device disclosed in the embodiments of the present invention may be a server, a PC, a PAD (tablet), a mobile phone, or the like.

Further, an embodiment of the present invention also provides a storage medium storing a program that, when executed by a processor, implements the news recall method.

It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A news recall method, characterized by comprising:

obtaining a sequence of news IDs clicked by a user as user features, and obtaining a news ID clicked by the user at a next moment and a news ID not clicked as labels, where the next moment is determined from the time of the most recently clicked news ID in the user's click sequence;

training a preset prediction model on the user features and the labels, and converting the user features to generate a multidimensional vector corresponding to the user features;

computing cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment to obtain cosine similarities corresponding to the multidimensional vector; and

selecting, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, N multidimensional vectors in descending order, and determining the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

2. The method according to claim 1, characterized in that the setup process of the preset prediction model comprises:

constructing an original neural network model;

inputting the news IDs clicked and the news IDs not clicked by each training user, in turn, into the original neural network model to obtain an initial training result for each training user; and

updating the parameters of the original neural network according to the initial training results to obtain the prediction model.

3. The method according to claim 1, characterized in that training the preset prediction model on the user features and the labels and converting the user features to generate the multidimensional vector corresponding to the user features comprises:

converting, in the embedding layer of the preset prediction model, the user features into a multidimensional vector with a long short-term memory (LSTM) model; and

projecting, in the output layer of the preset prediction model, the multidimensional vector with a multilayer perceptron (MLP) to generate the multidimensional vector corresponding to the user features.

4. The method according to claim 1, characterized in that computing cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment to obtain the cosine similarities corresponding to the multidimensional vector comprises:

computing the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment to obtain a first cosine similarity corresponding to the multidimensional vector; and

computing the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment to obtain a second cosine similarity corresponding to the multidimensional vector.

5. The method according to claim 1, characterized in that selecting, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, N multidimensional vectors in descending order and determining the news IDs corresponding to the N multidimensional vectors comprises:

sorting by the magnitude of the cosine similarities of the multidimensional vectors, selecting N multidimensional vectors in descending order, and determining the news IDs corresponding to the N multidimensional vectors;

or

evaluating the magnitude of the cosine similarities corresponding to the multidimensional vectors and, when cosine similarity values exceed a preset threshold, determining the multidimensional vectors above the preset threshold, selecting N of them in descending order, and determining the news IDs corresponding to the N multidimensional vectors.

6. A news recall system, characterized by comprising:

an acquiring unit, configured to obtain a sequence of news IDs clicked by a user as user features, and to obtain a news ID clicked by the user at a next moment and a news ID not clicked as labels, where the next moment is determined from the time of the most recently clicked news ID in the user's click sequence;

a training and conversion unit, configured to train a preset prediction model on the user features and the labels, and to convert the user features to generate a multidimensional vector corresponding to the user features;

a computing unit, configured to compute cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment to obtain cosine similarities corresponding to the multidimensional vector; and

a determination unit, configured to select, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, N multidimensional vectors in descending order, and to determine the news IDs corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

7. The system according to claim 6, characterized in that the training and conversion unit, which trains the preset prediction model on the user features and the labels and converts the user features to generate the multidimensional vector corresponding to the user features, comprises:

a conversion module, configured to convert, in the embedding layer of the preset prediction model, the user features into a multidimensional vector with a long short-term memory (LSTM) model; and

a projection module, configured to project, in the output layer of the preset prediction model, the multidimensional vector with a multilayer perceptron (MLP) to generate the multidimensional vector corresponding to the user features.

8. The system according to claim 6, characterized in that the computing unit, which computes cosine similarity between the multidimensional vector and the news IDs clicked and not clicked by the user at the next moment to obtain the cosine similarities corresponding to the multidimensional vector, comprises:

a first computing module, configured to compute the cosine similarity between the multidimensional vector and the news ID clicked by the user at the next moment to obtain a first cosine similarity corresponding to the multidimensional vector; and

a second computing module, configured to compute the cosine similarity between the multidimensional vector and the news ID not clicked at the next moment to obtain a second cosine similarity corresponding to the multidimensional vector.

9. The system according to claim 8, characterized in that the determination unit, which selects, based on the magnitude of the cosine similarities corresponding to the multidimensional vectors, N multidimensional vectors in descending order and determines the news IDs corresponding to the N multidimensional vectors, comprises a sorting module or a judgment module;

the sorting module is configured to sort by the magnitude of the cosine similarities of the multidimensional vectors, select N multidimensional vectors in descending order, and determine the news IDs corresponding to the N multidimensional vectors;

the judgment module is configured to evaluate the magnitude of the cosine similarities corresponding to the multidimensional vectors and, when cosine similarity values exceed a preset threshold, to determine the multidimensional vectors above the preset threshold, select N of them in descending order, and determine the news IDs corresponding to the N multidimensional vectors.