CN109871487B

CN109871487B - News recall method and system

Info

Publication number: CN109871487B
Application number: CN201910132210.9A
Authority: CN
Inventors: 安鸣佳
Original assignee: Beijing Sohu New Media Information Technology Co Ltd
Current assignee: Beijing Sohu New Media Information Technology Co Ltd
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2021-03-23
Anticipated expiration: 2039-02-22
Also published as: CN109871487A

Abstract

The invention discloses a news recall method and system. The sequence of news ids clicked by a user is acquired as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are acquired as labels. The user feature and labels are trained with a preset prediction model to generate a multidimensional vector corresponding to the user feature. Cosine similarities are calculated between the multidimensional vector and the news ids clicked and not clicked at the next moment. N multidimensional vectors are then selected in descending order of cosine similarity, and the news ids corresponding to those N vectors are determined. In this way, a multidimensional vector of the user feature is generated, the news ids corresponding to it are determined from its cosine similarities, and the news that interests the user is identified and recalled, so that the user receives news of greater interest.

Description

News recall method and system

Technical Field

The invention relates to the technical field of deep learning, in particular to a news recall method and a news recall system.

Background

With the rapid development of information technology and the internet, online news has become increasingly popular and is now one of the main ways people obtain information in daily life. News recall is currently one of the ways in which such information reaches people.

News recall is an important task in the field of news recommendation. Traditional news recall uses keywords from the titles and content of news the user has clicked in the past, and retrieves news related to those keywords.

In the prior art, because the volume of news and the total number of news features are both large, the news recalled by this traditional approach sometimes does not match the user's interests.

Disclosure of Invention

In view of this, the present application provides a news recall method and system so that the recalled news matches the user's interests.

In order to achieve the above object, the following solutions are proposed:

In a first aspect, the invention discloses a news recall method comprising the following steps:

acquiring the sequence of news ids clicked by a user as the user feature, and acquiring the news id clicked by the user at the next moment and a news id not clicked by the user as labels, the next moment being determined relative to the latest click time in the user's clicked-news-id sequence;

training on the user feature and the labels with a preset prediction model, which converts the user feature into a multidimensional vector corresponding to the user feature;

calculating the cosine similarities between the multidimensional vector and the news ids clicked and not clicked by the user at the next moment, to obtain the cosine similarities corresponding to the multidimensional vector;

and selecting N multidimensional vectors in descending order of cosine similarity and determining the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Preferably, the preset prediction model is set up as follows:

constructing an original neural network model;

acquiring the clicked news ids and un-clicked news ids of each of a preset number of training users;

inputting the clicked news ids and un-clicked news ids of each training user into the original neural network model in turn, to obtain an initial training result for each training user;

and updating the parameters of the original neural network according to the initial training results to obtain the prediction model.

Preferably, training on the user feature and the labels with the preset prediction model and converting the user feature into the corresponding multidimensional vector includes:

in the embedding layer of the preset prediction model, converting the user feature into a multidimensional vector using an LSTM (long short-term memory) model;

and in the output layer of the preset prediction model, projecting that multidimensional vector with an MLP (multilayer perceptron) to generate the multidimensional vector corresponding to the user feature.

Preferably, calculating the cosine similarities between the multidimensional vector and the news ids clicked and not clicked by the user at the next moment includes:

calculating the cosine similarity between the multidimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multidimensional vector;

and calculating the cosine similarity between the multidimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multidimensional vector.

Preferably, selecting N multidimensional vectors in descending order of cosine similarity and determining the news ids corresponding to the N multidimensional vectors includes:

sorting the multidimensional vectors by cosine similarity, selecting the top N in descending order, and determining the news ids corresponding to the N multidimensional vectors;

or

comparing the cosine similarity of each multidimensional vector with a preset threshold, keeping the multidimensional vectors whose cosine similarity exceeds the threshold, selecting N of them in descending order, and determining the news ids corresponding to the N multidimensional vectors.

In a second aspect, the present invention discloses a news recall system, including:

an acquisition unit, configured to acquire the sequence of news ids clicked by a user as the user feature, and to acquire the news id clicked by the user at the next moment and a news id not clicked as labels, the next moment being determined relative to the latest click time in the user's clicked-news-id sequence;

a training conversion unit, configured to train on the user feature and the labels with a preset prediction model and to convert the user feature into a multidimensional vector corresponding to the user feature;

a computing unit, configured to calculate the cosine similarities between the multidimensional vector and the news ids clicked and not clicked by the user at the next moment, to obtain the cosine similarities corresponding to the multidimensional vector;

and a determining unit, configured to select N multidimensional vectors in descending order of cosine similarity and to determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Preferably, the training conversion unit includes:

a conversion module, configured to convert the user feature into a multidimensional vector using an LSTM (long short-term memory) model in the embedding layer of the preset prediction model;

and a projection module, configured to project that multidimensional vector with an MLP (multilayer perceptron) in the output layer of the preset prediction model, to generate the multidimensional vector corresponding to the user feature.

Preferably, the computing unit includes:

a first calculation module, configured to calculate the cosine similarity between the multidimensional vector and the news id clicked by the user at the next moment, to obtain the first cosine similarity corresponding to the multidimensional vector;

and a second calculation module, configured to calculate the cosine similarity between the multidimensional vector and the news id not clicked at the next moment, to obtain the second cosine similarity corresponding to the multidimensional vector.

Preferably, the determining unit includes a sorting module or a judging module, where:

the sorting module is configured to sort the multidimensional vectors by cosine similarity, select the top N in descending order, and determine the news ids corresponding to the N multidimensional vectors;

the judging module is configured to compare the cosine similarity of each multidimensional vector with a preset threshold, keep the multidimensional vectors whose cosine similarity exceeds the threshold, select N of them in descending order, and determine the news ids corresponding to the N multidimensional vectors.

According to the above technical solution, the sequence of news ids clicked by a user is acquired as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are acquired as labels. The user feature and labels are trained with a preset prediction model, which converts the user feature into a corresponding multidimensional vector. Cosine similarities are calculated between the multidimensional vector and the news ids clicked and not clicked at the next moment, N multidimensional vectors are selected in descending order of cosine similarity, and the news ids corresponding to the N multidimensional vectors are determined. In this way, the news that interests the user is identified from the cosine similarities and recalled, so that the user receives news of greater interest.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those skilled in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a news recall method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of the setup process of the prediction model according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of updating the original neural network parameters according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of another news recall method according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of another news recall method according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of another news recall method according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a news recall system according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of the training conversion unit of the news recall system according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of the computing unit of the news recall system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

As described in the Background, the prior art performs news recall using keywords from the titles and content of news the user has clicked in the past and retrieves news related to those keywords, and the news recalled in this way sometimes does not match the user's interests. The invention therefore discloses a news recall method that generates a multidimensional vector of the user feature, determines the news ids corresponding to that vector from its cosine similarities to obtain the news that interests the user, and recalls that news so that the user receives news of greater interest.

FIG. 1 is a schematic flowchart of a news recall method disclosed in an embodiment of the present invention. The method includes the following steps:

Step S101: acquire the sequence of news ids clicked by the user as the user feature, and acquire the news id clicked by the user at the next moment and a news id not clicked as labels.

When executing step S101, because user behavior is strongly time-sensitive, the news ids most recently clicked by the user are selected as the click sequence that forms the user feature, and the news id clicked by the user at the moment following that click sequence, together with a news id not clicked, are used as the labels.

Note that the clicked-news-id sequence is determined from the news content titles clicked by the user. The sequence may contain multiple clicked news ids; their number is set by a technician according to actual conditions.

Note that the clicked news id is taken from the set of news ids clicked by the user; there may be multiple such ids, and their number is selected by a technician according to the actual situation.

Note that the next moment is determined relative to the latest time of the clicked news ids in the user's clicked-news-id sequence.

The following example illustrates how the news ids clicked and not clicked by the user at the next moment are obtained:

For example, news id15, clicked by the user at 15:02, is used as the user feature, and news id16, clicked by the user at 15:03, together with news id17, not clicked, are used as the labels.

Note that the un-clicked news id is taken from the set of news ids not clicked by the user; it may be any id in that set and is selected by a technician according to the actual situation.
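The sample construction in step S101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the click log is a time-ordered list of (timestamp, news id) pairs and that a pool of un-clicked news ids (for example, news shown but not clicked) is available. The function name `build_sample` and the `seq_len` parameter are hypothetical.

```python
import random


def build_sample(click_log, unclicked_pool, seq_len, rng=random):
    """Build one (user feature, clicked label, un-clicked label) triple.

    click_log: list of (timestamp, news_id) pairs sorted by time (assumed format).
    unclicked_pool: news ids the user did not click (hypothetical source).
    seq_len: number of recent clicks kept as the user feature (hypothetical).
    """
    if len(click_log) < seq_len + 1:
        raise ValueError("need at least seq_len + 1 clicks")
    # The most recent seq_len clicks before the label form the user feature.
    history = click_log[-(seq_len + 1):-1]
    feature = [news_id for _, news_id in history]
    # The click at the moment following the latest click in the feature.
    _, clicked = click_log[-1]
    # Any one id from the un-clicked set serves as the un-clicked label.
    unclicked = rng.choice(unclicked_pool)
    return feature, clicked, unclicked
```

Applied to the example above, `build_sample([("15:02", "id15"), ("15:03", "id16")], ["id17"], seq_len=1)` returns `(["id15"], "id16", "id17")`.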

Step S102: train on the user feature and the labels with the preset prediction model, converting the user feature into a corresponding multidimensional vector.

When executing step S102, the user feature and labels are trained with the preset prediction model, and the user feature is converted into the corresponding multidimensional vector. The prediction model is obtained by training an original neural network model on sample data.

The setup process of the preset prediction model used in step S102 of FIG. 1 is shown in FIG. 2, a schematic flowchart of the setup process of the prediction model, and includes the following steps:

Step S201: construct an original neural network model.

Step S202: acquire the clicked news ids and un-clicked news ids of each of a preset number of training users.

Note that the training users may be users of different ages, genders, interests and so on; the specific training users are selected by a technician according to the actual situation.

Step S203: input the clicked news ids and un-clicked news ids of each training user into the original neural network model in turn, to obtain an initial training result for each training user.

When implementing step S203, the clicked and un-clicked news ids of each training user are input into the original neural network model in turn, and the long short-term memory model (LSTM) is chosen as the original neural network model to be trained.

Note that the long short-term memory model (LSTM) is a recurrent deep neural network for time-series data, suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
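For reference, the gate equations of a standard LSTM cell are given below. The text does not state which LSTM variant is used, so this common formulation is an assumption. Here x_t is the vector of the t-th clicked news id, h_t and c_t are the hidden and cell states, σ is the logistic sigmoid and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate f_t controls how much of the previous cell state survives each step, which is what lets the cell carry information across long gaps in a sequence.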

Step S204: update the parameters of the original neural network according to the initial training results to obtain the prediction model.

By executing steps S201 to S204, the parameters of the original neural network are updated with the back propagation algorithm according to the initial training results, yielding the prediction model.

Note that the back propagation algorithm (BPA) is commonly used to train multilayer perceptrons. It iterates a loop consisting mainly of forward propagation of activations and weight updates, repeating until the network's response to the input data reaches a predetermined target range.
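The loop described above (forward propagation, error propagated backwards by the chain rule, weight update, stop once the response is within a target range) can be illustrated on a toy problem. The numpy sketch below trains a tiny two-layer perceptron on XOR; the task, layer sizes, learning rate, mean-squared-error loss and stopping target are arbitrary illustrative choices, not parameters from the text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(target_loss=0.01, lr=1.0, max_epochs=20000, seed=0):
    """Train a 2-4-1 perceptron on XOR with back propagation.

    Returns the list of losses, one entry per iteration.
    """
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    losses = []
    for _ in range(max_epochs):
        # Forward propagation of activations
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        loss = float(np.mean((out - y) ** 2))
        losses.append(loss)
        if loss < target_loss:  # response has reached the target range
            break
        # Backward pass: propagate the error with the chain rule
        d_out = 2.0 * (out - y) / y.size * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # Weight update: plain gradient descent
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return losses
```

Calling `train_xor()` returns the loss history, which decreases as the weights are updated.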

In a specific implementation, the constructed prediction model includes multiple processing layers, at least an embedding layer, a bidirectional LSTM layer and a prediction layer.

Note that the embedding layer converts the user feature, following a matrix-factorization approach and using the LSTM model, into a multidimensional vector that carries feature semantics. The bidirectional LSTM layer trains the model by connecting the LSTM model to an MLP, and calculates the cosine similarity between the MLP output and, respectively, the news id clicked and the news id not clicked by the user. The prediction layer takes the user feature, the clicked news id and the un-clicked news id as input and calculates their cosine similarities with the prediction model.

Note that the embedding layer and the bidirectional LSTM layer belong to the training stage of the prediction model, and the prediction layer belongs to its prediction stage.

Note that the results produced by the prediction model may be used as parameters for tuning its prediction accuracy.

The process in step S204 of FIG. 2, updating the original neural network parameters according to the initial training results to obtain the prediction model, is shown in FIG. 3, a schematic flowchart of updating the original neural network parameters, and includes the following steps:

Step S301: acquire the sequence of news ids clicked by a training user.

Note that the training user's clicked-news-id sequence contains multiple clicked news ids; the specific sequence is set by a technician according to actual conditions.

Step S302: vectorize the training user's clicked-news-id sequence through the embedding layer.

Step S303: convert the vectorized clicked-news-id sequence into a multidimensional vector with the LSTM model.

Step S304: project the multidimensional vector with the MLP to obtain the multidimensional vector corresponding to the user feature.

Step S305: calculate the cosine similarity between the multidimensional vector corresponding to the user feature and, respectively, the clicked news id and the un-clicked news id, to obtain the cosine similarities corresponding to the multidimensional vector, and update the prediction model parameters with the back propagation algorithm.

Through steps S301 to S305, the training user's clicked-news-id sequence is acquired and projected into the multidimensional vector corresponding to the user feature; the cosine similarities between this vector and the clicked and un-clicked news ids are calculated; and the prediction model parameters are updated with the back propagation algorithm.

An example of this parameter-update process follows:

The current clicked-news-id sequence is id0 to id19; news id20, clicked by the user at the next moment, and news id21, not clicked at the next moment, are used as the labels. The sequence id0 to id19 is vectorized through the embedding layer and then converted into a 300-dimensional vector by the LSTM model; this vector is projected by the MLP to obtain the 300-dimensional vector corresponding to the user feature. Cosine similarities between this vector and id20 and id21 give the first (clicked) cosine similarity and the second (un-clicked) cosine similarity, and the prediction model parameters are updated with the back propagation algorithm.
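A minimal numpy sketch of the forward pass in this example, under several assumptions not stated in the text: a single unidirectional LSTM layer whose final hidden state is fed to an MLP with one tanh hidden layer and a linear output layer, and "cosine similarity with id20/id21" meaning similarity with the embedding vectors of those ids. The vocabulary size, initialization and gate layout inside `W` and `U` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

VOCAB, DIM = 1000, 300   # hypothetical vocabulary size; 300 dims as in the example
rng = np.random.default_rng(0)

# Randomly initialised parameters; in practice they are learned by back propagation.
embedding = rng.normal(scale=0.1, size=(VOCAB, DIM))   # row k = vector of news id k
W = rng.normal(scale=0.05, size=(4 * DIM, DIM))        # LSTM input weights (4 gates)
U = rng.normal(scale=0.05, size=(4 * DIM, DIM))        # LSTM recurrent weights
b = np.zeros(4 * DIM)
W_hid, b_hid = rng.normal(scale=0.05, size=(DIM, DIM)), np.zeros(DIM)  # MLP hidden
W_out, b_out = rng.normal(scale=0.05, size=(DIM, DIM)), np.zeros(DIM)  # MLP output


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def user_vector(click_ids):
    """Embedding lookup -> LSTM over the click sequence -> MLP projection."""
    h = np.zeros(DIM)
    c = np.zeros(DIM)
    for x in embedding[click_ids]:          # one LSTM step per clicked news id
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return W_out @ np.tanh(W_hid @ h + b_hid) + b_out  # project the final state


def cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


u = user_vector(list(range(20)))            # click sequence id0 .. id19
cos_pos = cosine(u, embedding[20])          # first similarity: clicked id20
cos_neg = cosine(u, embedding[21])          # second similarity: un-clicked id21
```

During training, a loss built from `cos_pos` and `cos_neg` would be back-propagated through the MLP, the LSTM and the embedding table.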

Step S103: calculate the cosine similarities between the multidimensional vector and the news ids clicked and not clicked by the user at the next moment, to obtain the cosine similarities corresponding to the multidimensional vector.

When executing step S103, cosine similarities are calculated between the multidimensional vector and, respectively, the news id clicked and the news id not clicked by the user at the next moment, giving the first and second cosine similarities corresponding to the multidimensional vector.
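The cosine similarity used in step S103 is cos(u, v) = (u · v) / (|u| |v|), which ranges from -1 to 1. A plain-Python sketch follows, assuming each news id is represented by its embedding vector (the text speaks of similarity "with the news id"); the three short vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical vectors standing in for the user vector and the embeddings
# of the news ids clicked and not clicked at the next moment.
user_vec = [0.2, 0.9, 0.1]
clicked_vec = [0.1, 0.8, 0.0]
unclicked_vec = [0.9, -0.1, 0.3]
first = cosine_similarity(user_vec, clicked_vec)     # first cosine similarity
second = cosine_similarity(user_vec, unclicked_vec)  # second cosine similarity
```

Here the clicked vector points in nearly the same direction as the user vector, so `first` is close to 1 while `second` is much smaller.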

Note that the news ids corresponding to the multidimensional vector are determined from its first and second cosine similarities; this identifies the news that interests the user, which is then recalled so that the user receives news of greater interest.

Step S104: select N multidimensional vectors in descending order of cosine similarity and determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

When executing step S104, the multidimensional vectors are sorted by cosine similarity in descending order, the top N are selected, and the news ids corresponding to those N vectors are determined.

The multidimensional vector may be 300-dimensional, 600-dimensional and so on; its dimensionality is set by a technician according to actual circumstances.

Note that the larger the cosine similarity of a multidimensional vector, the more interested the user is in the corresponding news.

Note that the specific value of N is chosen by a technician according to the actual situation.
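Step S104, together with the threshold alternative described in the Disclosure, can be sketched as follows. The sketch assumes the similarities are held in a dict mapping each candidate news id to its cosine similarity with the user vector; the function names and sample scores are hypothetical.

```python
import heapq


def recall_top_n(scores, n):
    """Sorting mode: the n candidate news ids with the largest similarity."""
    if n < 2:
        raise ValueError("N must be a positive integer >= 2")
    best = heapq.nlargest(n, scores.items(), key=lambda item: item[1])
    return [news_id for news_id, _ in best]


def recall_above_threshold(scores, n, threshold):
    """Threshold mode: keep candidates whose similarity exceeds the threshold,
    then take at most n of them in descending order of similarity."""
    if n < 2:
        raise ValueError("N must be a positive integer >= 2")
    kept = [(news_id, s) for news_id, s in scores.items() if s > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [news_id for news_id, _ in kept[:n]]


# Hypothetical cosine similarities between one user vector and four candidates.
scores = {"id20": 0.91, "id21": 0.12, "id22": 0.75, "id23": 0.40}
```

With these scores, `recall_top_n(scores, 2)` gives `["id20", "id22"]`. `heapq.nlargest` avoids a full sort when the candidate pool is much larger than N. If fewer than N candidates exceed the threshold, the threshold sketch simply returns fewer ids; the text does not say how that case is handled.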

With the news recall method disclosed in this embodiment, the sequence of news ids clicked by a user is acquired as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are acquired as labels. The user feature and labels are trained with a preset prediction model, which converts the user feature into a corresponding multidimensional vector. Cosine similarities are calculated between the multidimensional vector and the news ids clicked and not clicked at the next moment, N multidimensional vectors are selected in descending order of cosine similarity, and the news ids corresponding to them are determined. In this way, the news that interests the user is identified from the cosine similarities and recalled, so that the user receives news of greater interest.

Optionally, the method can be evaluated on real online user click data; the click-through rate (CTR) of news recalled with the LSTM model is higher than that of traditional news recall.

Note that CTR is the number of clicks divided by the number of exposures.
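As a worked example of the formula with hypothetical numbers (not results from the text): 30 clicks out of 1,000 exposures give CTR = 30 / 1000 = 0.03, i.e. 3%. The function name below is illustrative.

```python
def click_through_rate(clicks, exposures):
    """CTR = number of clicks / number of exposures."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return clicks / exposures
```

Comparing two recall methods then amounts to computing this ratio for the news each method recalled over the same evaluation period.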

According to this embodiment, on real online user click data the click-through rate of news recalled with the LSTM model is higher than that of traditional news recall, which in turn improves the retention rate and average reading time for news recalled with the LSTM model.

Based on the method shown in FIG. 1, another news recall method disclosed in an embodiment of the present invention, shown in FIG. 4, includes the following steps:

Step S401: acquire the sequence of news ids clicked by the user as the user feature, and acquire the news id clicked by the user at the next moment and a news id not clicked as labels.

Step S401 is executed in the same way and on the same principle as step S101 shown in FIG. 1; refer to the description above, which is not repeated here.

Step S402: in the embedding layer of the preset prediction model, convert the user feature into a multidimensional vector with the LSTM model.

When executing step S402, the user feature is converted into a multidimensional vector by the LSTM model, following a matrix-factorization approach, in the embedding layer of the preset prediction model.

Note that applying the LSTM model to the user's clicked-news-id sequence filters out news ids unrelated to the user's interests and gives priority to news ids the user is interested in.

Step S403: in the output layer of the preset prediction model, project the multidimensional vector with the MLP to generate the multidimensional vector corresponding to the user feature.

Note that the multilayer perceptron (MLP) includes an input layer, a hidden layer and an output layer. Its layers are fully connected: every neuron in one layer is connected to every neuron in the next layer.

Note that the user feature is input into the embedding layer of the prediction model and converted into a multidimensional vector; the vector then passes from the input layer through the hidden layer to the output layer, where it is projected to generate the multidimensional vector corresponding to the user feature.
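The fully connected projection of step S403 can be sketched in numpy as below: each layer is a dense matrix product, so every output neuron depends on every input neuron. The hidden width of 512, the tanh activation, the linear output layer and the random weights are assumptions for illustration; the 300-dimensional input and output follow the example dimension used elsewhere in the text.

```python
import numpy as np


def mlp_forward(x, layers):
    """Fully connected forward pass: each layer is W @ h + b, so every neuron
    in one layer feeds every neuron in the next."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = W @ h + b
        if idx < len(layers) - 1:   # hidden layers use a nonlinearity
            h = np.tanh(h)
    return h                        # output layer: linear projection


rng = np.random.default_rng(0)
DIM, HIDDEN = 300, 512              # hypothetical layer sizes
layers = [
    (rng.normal(scale=0.05, size=(HIDDEN, DIM)), np.zeros(HIDDEN)),  # hidden
    (rng.normal(scale=0.05, size=(DIM, HIDDEN)), np.zeros(DIM)),     # output
]
lstm_output = rng.normal(size=DIM)  # stands in for the LSTM's output vector
user_vec = mlp_forward(lstm_output, layers)
```

Because the connections are dense, perturbing a single input component changes every component of `user_vec`.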

Note that the user feature and labels are trained with the preset prediction model by connecting the MLP to the LSTM model.

Step S404: calculate the cosine similarities between the multidimensional vector and the news ids clicked and not clicked by the user at the next moment, to obtain the cosine similarities corresponding to the multidimensional vector.

Step S405: select N multidimensional vectors in descending order of cosine similarity and determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Steps S404 and S405 are executed in the same way and on the same principle as steps S103 and S104 shown in FIG. 1; refer to the description above, which is not repeated here.

With the news recall method disclosed in this embodiment, the sequence of news ids clicked by a user is acquired as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are acquired as labels. In the embedding layer of the preset prediction model, the user feature is converted into a multidimensional vector with the LSTM model; in the output layer, this vector is projected with the MLP to generate the multidimensional vector corresponding to the user feature. Cosine similarities are then calculated between that vector and the news ids clicked and not clicked at the next moment, N multidimensional vectors are selected in descending order of cosine similarity, and the corresponding news ids are determined. In this way, the news that interests the user is identified from the cosine similarities and recalled, so that the user receives news of greater interest.

Fig. 5 is another flow diagram of the news recall method disclosed in an embodiment of the present invention. The method specifically includes the following steps:

Step S501: Obtain the sequence of news ids clicked by the user as the user feature, and obtain the news id clicked by the user at the next moment and a news id not clicked as labels.

Step S502: Train on the user feature and the labels based on the preset prediction model, and convert the user feature to generate the multidimensional vector corresponding to it.

Steps S501 to S502 are executed in the same way, and on the same principle, as steps S101 to S102 shown in Fig. 1. Refer to that description; it is not repeated here.

Optionally, the multidimensional vector corresponding to the user feature in step S502 may instead be generated as in steps S402 to S403 disclosed in Fig. 4.

Step S503: Calculate the cosine similarity between the multidimensional vector and the news id clicked by the user at the next moment to obtain the first cosine similarity corresponding to the multidimensional vector.

It should be noted that the first cosine similarity is the cosine similarity with the news id clicked by the user. The parameters of the model are updated by a back-propagation algorithm according to the obtained first cosine similarity.

Step S504: Calculate the cosine similarity between the multidimensional vector and the news id not clicked at the next moment to obtain the second cosine similarity corresponding to the multidimensional vector.

It should be noted that the second cosine similarity is the cosine similarity with the news id not clicked by the user. The parameters of the model are updated by a back-propagation algorithm according to the obtained second cosine similarity.
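Steps S503 and S504 compute two cosine similarities that jointly drive back propagation. The patent does not state the loss function, so the objective below is an assumption: a softmax cross-entropy over the two similarities, scaled by a hypothetical `temperature`, which rewards a high first similarity (clicked news) and a low second similarity (non-clicked news). The function names are illustrative, and the news ids are assumed to be represented by vectors of the same dimension as the user vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def clicked_vs_unclicked_loss(user_vec, clicked_vec, unclicked_vec, temperature=0.1):
    """Hypothetical objective: softmax cross-entropy over the two similarities.

    Minimizing it raises the first cosine similarity (clicked news) relative to
    the second (non-clicked news). Gradients would come from an autodiff framework.
    """
    first = cosine(user_vec, clicked_vec)
    second = cosine(user_vec, unclicked_vec)
    a, b = first / temperature, second / temperature
    m = max(a, b)  # subtract the max for numerical stability
    loss = m + math.log(math.exp(a - m) + math.exp(b - m)) - a
    return first, second, loss


first, second, loss = clicked_vs_unclicked_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])
```

A pairwise hinge loss would be an equally plausible choice; the softmax form is used here only because it is smooth and easy to check by hand.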

Step S505: Select N multidimensional vectors in descending order of their corresponding cosine similarity, and determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

Step S505 is executed in the same way, and on the same principle, as step S104 shown in Fig. 1. Refer to that description; it is not repeated here.
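The descending-order selection in step S505 can be sketched as follows, under the interpretation that each candidate news id has one cosine similarity score against the user vector. The dictionary layout and the function name `recall_top_n` are hypothetical; `heapq.nlargest` returns the same result as a full descending sort truncated to N, without sorting every candidate.

```python
import heapq


def recall_top_n(scores, n):
    """Return the n news ids with the largest cosine similarity, highest first.

    scores: dict mapping candidate news id -> cosine similarity with the user vector.
    """
    if not isinstance(n, int) or n < 2:
        raise ValueError("N must be a positive integer >= 2")
    best = heapq.nlargest(n, scores.items(), key=lambda item: item[1])
    return [news_id for news_id, _ in best]


scores = {"id31": 0.92, "id32": 0.15, "id7": 0.48, "id8": 0.71}
print(recall_top_n(scores, 2))  # ['id31', 'id8']
```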

In this embodiment, the sequence of news ids clicked by the user is obtained as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are obtained as labels. The user feature and labels are trained on the basis of the preset prediction model, and the user feature is converted to generate the corresponding multidimensional vector. The cosine similarity between the multidimensional vector and the news id clicked at the next moment gives the first cosine similarity, and the cosine similarity between the multidimensional vector and the news id not clicked at the next moment gives the second cosine similarity. N multidimensional vectors are then selected in descending order of cosine similarity, and the news ids corresponding to them are determined. In this way, a multidimensional vector of the user features is generated, the news ids corresponding to it are determined from its cosine similarities, and the news of interest to the user is identified and recalled, so that the user receives news of higher interest.

Fig. 6 is another flow diagram of the news recall method disclosed in an embodiment of the present invention. The method specifically includes the following steps:

Step S601: Obtain the sequence of news ids clicked by the user as the user feature, and obtain the news id clicked by the user at the next moment and a news id not clicked as labels.

Step S601 is executed in the same way, and on the same principle, as step S101 shown in Fig. 1. Refer to that description; it is not repeated here.

Step S602: Train on the user feature and the labels based on the preset prediction model, and convert the user feature to generate the multidimensional vector corresponding to it.

Step S602 is executed in the same way, and on the same principle, as step S102 shown in Fig. 1. Refer to that description; it is not repeated here.

Optionally, the multidimensional vector corresponding to the user feature in step S602 may instead be generated as in steps S402 to S403 disclosed in Fig. 4.

Step S603: Calculate cosine similarities from the multidimensional vector, the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarity corresponding to the multidimensional vector.

Optionally, the cosine similarity corresponding to the multidimensional vector in step S603 may instead be obtained as in steps S503 to S504 disclosed in Fig. 5.

Step S604: Evaluate the cosine similarities corresponding to the multidimensional vectors. When the cosine similarity corresponding to a multidimensional vector is greater than a preset threshold, determine the multidimensional vectors whose cosine similarity exceeds the threshold, select N of them in descending order of cosine similarity, and determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

It should be noted that, when the cosine similarities are evaluated, several multidimensional vectors may share the same largest cosine similarity, or several may exceed the preset threshold. The preset threshold is set by a technician according to the actual situation and is chosen as the optimal value for the multidimensional vectors.

It should be noted that N may take several values, selected according to the actual situation.
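Step S604 adds a threshold filter before the descending-order selection. A minimal sketch, assuming each candidate news id has one cosine similarity score; the function name and the threshold value 0.5 are hypothetical (the patent leaves the threshold to the technician). Candidates with equal similarity keep their input order because Python's sort is stable, which is one simple way to handle the ties mentioned above.

```python
def recall_above_threshold(scores, threshold, n):
    """Keep candidates whose cosine similarity exceeds threshold, then take the top n.

    Ties keep the input order because list.sort is stable.
    """
    if not isinstance(n, int) or n < 2:
        raise ValueError("N must be a positive integer >= 2")
    above = [(news_id, s) for news_id, s in scores.items() if s > threshold]
    above.sort(key=lambda item: item[1], reverse=True)
    return [news_id for news_id, _ in above[:n]]


scores = {"a": 0.90, "b": 0.40, "c": 0.75, "d": 0.60, "e": 0.75}
print(recall_above_threshold(scores, threshold=0.5, n=3))  # ['a', 'c', 'e']
```

If fewer than N candidates pass the threshold, this sketch simply returns all of them.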

This embodiment of the invention discloses a news recall method in which the sequence of news ids clicked by the user is obtained as the user feature, and the news id clicked by the user at the next moment and a news id not clicked are obtained as labels. The user feature and labels are trained on the basis of the preset prediction model, and the user feature is converted to generate the corresponding multidimensional vector. Cosine similarities are calculated from the multidimensional vector, the news id clicked by the user at the next moment and the news id not clicked. The cosine similarities are then compared with a preset threshold: the multidimensional vectors whose cosine similarity exceeds the threshold are determined, N of them are selected in descending order, and the news ids corresponding to the N multidimensional vectors are determined. In this way, a multidimensional vector of the user features is generated, the news ids corresponding to it are determined from its cosine similarities, and the news of interest to the user is identified and recalled, so that the user receives news of higher interest.

A specific implementation of the news recall method described above is illustrated by the following example:

Suppose the user's current click sequence is id0-id30, and the labels, i.e., the news id subsequently clicked by the user and the news id not clicked, are id31 and id32 respectively. In the embedding layer of the preset prediction model, the click sequence id0-id30 and the labels id31 and id32 are trained based on the LSTM (long short-term memory) model, and the click sequence id0-id30 is converted into a 500-dimensional vector. In the output layer of the preset prediction model, this vector is projected by the MLP (multilayer perceptron) to generate the multidimensional vector corresponding to the user feature. Cosine similarities between the 500-dimensional vector and id31 and id32 are calculated; the vectors are ranked in descending order of cosine similarity, the corresponding news ids are determined, and the news of interest to the user is obtained. The CTR (number of clicks / number of exposures) of news recalled by the method of this embodiment is higher than that of news recalled by the traditional content-profile approach, and user retention and average reading time are significantly improved.
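The CTR comparison in the example above uses CTR = number of clicks / number of exposures. A minimal sketch of computing it per recall source from an exposure log; the log format, the function name and the source labels `"lstm"` and `"profile"` are hypothetical, and the sample log is made up for illustration rather than taken from the patent's measurements.

```python
from collections import defaultdict


def ctr_by_source(impressions):
    """CTR = clicks / exposures, computed per recall source.

    impressions: iterable of (source, clicked) pairs, one per exposed news item.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for source, was_clicked in impressions:
        shown[source] += 1
        clicked[source] += int(was_clicked)
    return {source: clicked[source] / shown[source] for source in shown}


log = [("lstm", True), ("lstm", False), ("lstm", True),
       ("profile", False), ("profile", True), ("profile", False)]
rates = ctr_by_source(log)
```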

Based on the news recall method disclosed above, an embodiment of the present invention also discloses a corresponding news recall system. As shown in Fig. 7, the news recall system 700 mainly includes:

An obtaining unit 701, configured to obtain the sequence of news ids clicked by the user as the user feature, and to obtain, as labels, the news id clicked by the user and a news id not clicked at the next moment, where the next moment is determined by the latest click time in the user's click sequence.

A training conversion unit 702, configured to train on the user feature and the labels based on the preset prediction model, and to convert the user feature to generate the multidimensional vector corresponding to it.

A calculating unit 703, configured to calculate cosine similarities from the multidimensional vector, the news id clicked by the user at the next moment and the news id not clicked, to obtain the cosine similarity corresponding to the multidimensional vector.

A determining unit 704, configured to select N multidimensional vectors in descending order of their corresponding cosine similarity and to determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.
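The labeling done by the obtaining unit 701 can be sketched as follows, under the interpretation that the latest click in a user's time-ordered log is the "next moment" click and all earlier clicks form the feature sequence. The function name, the `(timestamp, news_id)` log format and the source of non-clicked ids (assumed to be an impression log) are hypothetical. The usage line reproduces the worked example: history id0-id30, positive label id31, negative label id32.

```python
def build_training_sample(click_log, unclicked_at_next):
    """Hypothetical builder for one (feature, positive label, negative label) sample.

    click_log: list of (timestamp, news_id) clicks by one user, in any order.
    unclicked_at_next: news ids exposed at the next moment but not clicked
        (assumed to come from an impression log).
    Returns None when no complete sample can be formed.
    """
    ordered = [news_id for _, news_id in sorted(click_log, key=lambda e: e[0])]
    if len(ordered) < 2:
        return None
    # The latest click defines the "next moment"; everything before it is the feature.
    history, clicked_next = ordered[:-1], ordered[-1]
    negatives = [nid for nid in unclicked_at_next if nid != clicked_next]
    if not negatives:
        return None
    return history, clicked_next, negatives[0]


log = [(t, f"id{t}") for t in reversed(range(32))]  # clicks id0..id31, out of order
history, positive, negative = build_training_sample(log, ["id32"])
print(history[0], history[-1], positive, negative)  # id0 id30 id31 id32
```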

Further, as shown in Fig. 8, the training conversion unit 702 includes:

A building module 801, configured to build an original neural network model.

An obtaining module 802, configured to obtain the clicked news id sequences and non-clicked news id sequences corresponding to each of a preset number of training users.

An input module 803, configured to input the clicked and non-clicked news id sequences of each training user into the original neural network model in turn, to obtain an initial training result for each training user.

An updating module 804, configured to update the parameters of the original neural network model according to the initial training results to obtain the prediction model.

A conversion module 805, configured to convert the user feature into a multidimensional vector using an LSTM (long short-term memory) model in the embedding layer of the preset prediction model.

A projection module 806, configured to project the multidimensional vector using an MLP (multilayer perceptron) in the output layer of the preset prediction model, to generate the multidimensional vector corresponding to the user feature.

Further, as shown in Fig. 9, the calculating unit 703 includes:

A first calculating module 901, configured to calculate the cosine similarity between the multidimensional vector and the news id clicked by the user at the next moment to obtain the first cosine similarity corresponding to the multidimensional vector.

A second calculating module 902, configured to calculate the cosine similarity between the multidimensional vector and the news id not clicked at the next moment to obtain the second cosine similarity corresponding to the multidimensional vector.

Further, the determining unit 704 includes a sorting module 1001 or a judging module 1002.

The sorting module 1001 is configured to sort the multidimensional vectors by cosine similarity, select N of them in descending order, and determine the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

The judging module 1002 is configured to evaluate the cosine similarity corresponding to each multidimensional vector; when the cosine similarity is greater than a preset threshold, it determines the multidimensional vectors whose cosine similarity exceeds the threshold, selects N of them in descending order, and determines the news ids corresponding to the N multidimensional vectors, where N is a positive integer greater than or equal to 2.

The specific principles and implementation of each unit and module in the news recall system are the same as those of the news recall method disclosed in the embodiments of the present invention. Refer to the corresponding parts of the method description; they are not repeated here.

In the news recall system disclosed in the embodiments, the units and modules may be implemented by a hardware device including a processor and a memory. Specifically, the units and modules are stored in the memory as program units, and the processor executes the program units stored in the memory to realize news recall.

The processor includes a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided, and news recall is realized by adjusting the kernel parameters.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.

Further, an embodiment of the present invention provides a processor configured to run a program that performs the news recall method when executed.

The device disclosed in the embodiments may be a server, a PC, a tablet (PAD), a mobile phone, or the like.

Further, an embodiment of the present invention provides a storage medium storing a program that implements the news recall method when executed by a processor.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A news recall method, comprising:

selecting N multidimensional vectors in descending order of the cosine similarity corresponding to the multidimensional vectors, and determining news ids corresponding to the N multidimensional vectors, wherein N is a positive integer greater than or equal to 2;

wherein, the setting process of the preset prediction model comprises the following steps:

constructing an original neural network model;

and updating the parameters of the original neural network model according to the initial training result to obtain a prediction model.

2. The method according to claim 1, wherein the training the user features and the labels based on a preset prediction model and the converting the user features to generate a multidimensional vector corresponding to the user features comprises:

3. The method according to claim 1, wherein the calculating cosine similarity according to the multidimensional vector, the news id clicked by the user at the next moment and the non-clicked news id to obtain cosine similarity corresponding to the multidimensional vector comprises:

4. The method according to claim 1, wherein the selecting N multidimensional vectors from large to small based on the magnitude of the cosine similarity corresponding to the multidimensional vector, and determining the news id corresponding to the N multidimensional vectors comprises:

or

5. A news recall system, comprising:

the determining unit is used for selecting N multidimensional vectors from large to small based on the cosine similarity corresponding to the multidimensional vectors, and determining news ids corresponding to the N multidimensional vectors, wherein the value of N is a positive integer greater than or equal to 2;

wherein the training conversion unit comprises:

the building module is used for building an original neural network model;

the acquisition module is used for acquiring user clicked news sequence ids and non-clicked news sequence ids which correspond to a preset number of training users respectively;

the input module is used for sequentially inputting the clicked news sequence id and the non-clicked news sequence id of the user corresponding to each training user into the original neural network model to obtain an initial training result corresponding to each training user;

and the updating module is used for updating the parameters of the original neural network model according to the initial training result to obtain a prediction model.

6. The system according to claim 5, wherein the training conversion unit for training the user features and the labels based on a preset prediction model and converting the user features to generate the multidimensional vector corresponding to the user features comprises:

7. The system according to claim 5, wherein the calculating unit for performing cosine similarity calculation according to the multidimensional vector, the news id clicked by the user at the next moment and the non-clicked news id to obtain cosine similarity corresponding to the multidimensional vector comprises:

8. The system according to claim 5, wherein the determining unit for determining the news id corresponding to the N multidimensional vectors by selecting N multidimensional vectors from large to small based on the cosine similarity corresponding to the multidimensional vectors comprises: a sorting module or a judging module;