CN110765359A - New media content recommendation method and system - Google Patents

New media content recommendation method and system

Info

Publication number
CN110765359A
CN110765359A
Authority
CN
China
Prior art keywords
attention
new media
model
vector
text
Prior art date
Legal status
Granted
Application number
CN201911044661.3A
Other languages
Chinese (zh)
Other versions
CN110765359B (en)
Inventor
Fan Feng (范锋)
Current Assignee
Beijing Fast Network Technology Co Ltd
Original Assignee
Beijing Fast Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Fast Network Technology Co Ltd filed Critical Beijing Fast Network Technology Co Ltd
Priority to CN201911044661.3A priority Critical patent/CN110765359B/en
Publication of CN110765359A publication Critical patent/CN110765359A/en
Application granted granted Critical
Publication of CN110765359B publication Critical patent/CN110765359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method and a system for recommending new media content, and relates to the field of data processing. The method represents the texts in a new media text database with a semantic neural network, so that the representation reflects not only the literal text but also the knowledge a reader is most likely to associate with it. On one hand this is favorable for introducing prior knowledge, which completes information missing from the text content; on the other hand it alleviates the problem of natural language ambiguity and enhances the learning and reasoning capabilities of the algorithm. On the basis of the semantic neural network, a three-layer attention model is constructed and then improved by introducing external knowledge and a feedback mechanism to obtain a recommendation model. A deep learning algorithm is improved on the basis of the recommendation model, and the improved algorithm is applied to realize intelligent recommendation of new media content, which improves the match between the recommended content and the user's interests and achieves accurate pushing.

Description

New media content recommendation method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a new media content recommendation method.
Background
In the era of information overload, a recommendation system connects users with information: it helps users find valuable information and presents that information to the users who are interested in it, achieving a win-win situation for information consumers and information producers.
Existing recommendation systems generally employ one of two approaches: content-based recommendation and collaborative filtering. The basic idea of content-based recommendation is to recommend content similar to what the user has browsed or collected, combined with the user's preference settings. Collaborative filtering is further divided into user-based collaborative filtering and model-based collaborative filtering. User-based collaborative filtering analyzes user interests, finds users similar to the specified user within the user group, and aggregates the similar users' collection or browsing behaviour on certain content to predict the specified user's preference for that content. Model-based collaborative filtering assumes a rating matrix of m items and n users in which only part of the user-item pairs have ratings and the rest are blank; the existing sparse ratings are used to predict the blank ratings, and the items with the highest predicted ratings are recommended to the user. Both approaches are modeled with machine-learning techniques, the recommendation problem is solved by a model, and a large data set is generally needed before modeling.
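For illustration of the model-based collaborative filtering just described, the following minimal NumPy sketch factorizes a toy sparse rating matrix (zeros mark blank ratings) and recommends to each user the unrated item with the highest predicted score; the factorization method (stochastic gradient descent) and all sizes are assumptions made here, not part of this disclosure.

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02):
    """Factor the sparse rating matrix R (0 = unrated) into user and item factors."""
    m, n = R.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(m, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n, k))   # item factors
    users, items = np.nonzero(R)             # only observed ratings drive the updates
    for _ in range(steps):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P @ Q.T                           # predicted scores, including the blank cells

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
pred = factorize(R)
# recommend, to each user, the unrated item with the highest predicted score
best_items = np.where(R == 0, pred, -np.inf).argmax(axis=1)
```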
However, new media data is updated very quickly, new words and concepts appear continuously, and the language is colloquial. A large-scale training data set built at great cost in manpower and material resources therefore cannot adapt effectively to a new test data set, so the content recommended by existing recommendation systems applied to new media is often not content that interests the user; that is, the recommendation is not accurate.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects of the prior art, the invention provides a new media content recommendation method that solves the technical problem of inaccurate recommendation when existing recommendation systems are applied to new media content.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
the invention provides a new media content recommendation method, which is executed by a computer and comprises the following steps:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
Preferably, the step S2 specifically includes:
the expression mode of the semantic neural network is as follows:
G=<V,E>
wherein:
v represents a set of finite points;
e represents a set of finite edges;
processing the new media text database with the semantic neural network, wherein each node vi ∈ V in the semantic neural network represents a word, and each edge eij ∈ E represents the correlation between the words vi and vj;
in the semantic neural network, each node has four attributes attr(vi) = {ID, Name, status, description}, wherein ID is the number of the word vi in the new media text database, Name is the word vi itself, status represents the state of the node, and description is the semantic interpretation of the word vi; the weight weight(eij) of the edge eij indicates how close the connection between the words vi and vj is.
Preferably, the three-layer attention model includes a word-level attention model, a sentence-level attention model, and a topic-level attention model.
Preferably, the word-level attention model construction method includes:
an attention unit E contains a series of continuous or discontinuous phrases, defined as E = {e_1, e_2, ..., e_t, ..., e_m}, where e_t denotes the position of a phrase; the output of the hidden layer of the LSTM can be expressed as H' = (h_{e_1}, h_{e_2}, ..., h_{e_m}), and the vector representation of the attention unit E can be calculated by the following formula:
v_e = Σ_{j=1}^{m} α_j · h_{e_j}
wherein α = {α_1, α_2, ..., α_m}, α_j is the j-th self-attention weight in α, and h_{e_j} is the j-th output in H'; α is the self-attention vector computed over the entire LSTM hidden state H', and it can be calculated by feeding the hidden output H' into the following two-level function:
α = softmax(w_a2 · tanh(W_a1 · H'))
wherein:
w_a2 is a parameter vector of size d_a, where d_a is the number of cells of the LSTM hidden layer;
W_a1 is a weight matrix of size d_a × 2u;
since the size of H' is m × 2u, the size of the attention vector α is m;
expanding the parameter vector w_a2 into a matrix W_a2 of size r × d_a converts the attention vector α into the attention matrix A:
A = softmax(W_a2 · tanh(W_a1 · H'))
the word-level attention model V_t is constructed from the LSTM hidden state H' and the attention matrix A:
V_t = A · H'.
Preferably, the sentence-level attention model construction method includes:
let s denote a sentence of length L, and let H = (h_1, h_2, ..., h_L) denote the output of the hidden layer of the LSTM; substituting the output of the LSTM hidden layer into the attention matrix A and linearly integrating the hidden vectors gives:
[sentence-level representation formula, shown as an image in the original publication]
wherein:
B = [β_{r,1}, β_{r,2}, ..., β_{r,L}] is the sentence-level attention matrix, and each β_{r,t} encodes the phrase w_t in the sentence s;
s represents a sentence of length L, and e represents a word in the sentence;
based on a multilayer neural network with a tanh activation function, the unit h_t of each hidden layer is transformed and a softmax function generates a probability distribution over the whole sentence s, yielding the sentence-level attention model:
[sentence-level attention model formula, shown as an image in the original publication]
wherein:
H' ⊙ v_e means applying the attention vector v_e to each hidden-layer element h_t;
W_m assigns an attention weight to the input value in each LSTM.
Preferably, the topic level attention model construction method comprises the following steps:
given a communication q of length X, the output of the LSTM hidden layer is denoted H = (h_1, h_2, ..., h_X), and the topic-level attention model is obtained:
[topic-level attention model formula, shown as an image in the original publication]
wherein:
Ψ = [ψ_{r,1}, ψ_{r,2}, ..., ψ_{r,X}] is the attention matrix, and each ψ_{r,x} represents the expression w_x in the communication q; the attention matrix can be obtained by the following formula:
[attention matrix formula, shown as an image in the original publication]
wherein:
[symbol shown as an image in the original publication] represents the position of the word in the attention unit E;
W_m assigns attention weights to the input values in each LSTM;
v_e is the attention vector representation.
Preferably, in S4, the feedback mechanism includes:
assume that there are K alternative concepts, denoted as μ = {μ_{t,1}, μ_{t,2}, ..., μ_{t,K}}; the expression of the feedback mechanism is as follows:
[feedback mechanism formula, shown as an image in the original publication]
wherein:
μ refers to the collection of alternative concepts;
μ_i refers to a text vector obtained by passing the collection through the word-level attention model.
Preferably, in S4, the method of introducing external knowledge includes:
acquiring new words, and adding the new words into the new media text database.
Preferably, in S5, the improved deep learning algorithm includes: the LSTM encoder, whose mathematical expression is:
f_t = σ_g(W_f · [x_t, h_{t-1}, μ_t] + b_f)
i_t = σ_g(W_i · [x_t, h_{t-1}, μ_t] + b_i)
c_t = f_t · c_{t-1} + i_t · tanh(W_c · [x_t, h_{t-1}] + b_c)
o_t = σ_g(W_o · [x_t, h_{t-1}, μ_t] + b_o)
h_t = o_t · tanh(c_t)
[the expression for μ_t, involving W_co and b_co, is shown as an image in the original publication]
wherein: f_t is the forget gate at time t; i_t is the input gate at time t; o_t is the output gate at time t; c_t is the state vector of the network node; x_t is the input vector at time t; h_t is the output vector at time t; μ_t is the output vector of the hidden layer; σ_g is the sigmoid function; W_f, W_i, W_c, W_o and W_co are the weight matrices and b_f, b_i, b_c, b_o and b_co the bias vectors of the respective gates.
The present invention also provides a media content recommendation system, the system comprising a computer, the computer comprising:
at least one memory unit;
at least one processing unit;
wherein the at least one memory unit has stored therein at least one instruction that is loaded and executed by the at least one processing unit to perform the steps of:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
(III) advantageous effects
The invention provides a new media content recommendation method and system. Compared with the prior art, the method has the following beneficial effects:
the method comprises the steps of processing a new media text database based on a semantic neural network to obtain two-dimensional representation of a text, then constructing a three-layer attention model based on the two-dimensional representation of the text, improving the three-layer attention model by constructing a feedback mechanism and improving external knowledge to obtain a recommendation model, improving a deep learning algorithm based on the recommendation model, and realizing intelligent recommendation of new media contents by applying the improved deep learning algorithm. The method expresses texts in a new media text database by using the semantic neural network, not only reflects real texts, but also reflects knowledge which is most probably imagined when people see the texts, and on one hand, the method is favorable for introducing prior knowledge, completes information lacking in text contents by introducing the prior knowledge, on the other hand, can alleviate the problem of natural language ambiguity, and enhances the learning and reasoning capabilities of an algorithm. On the basis of a semantic neural network, a three-layer attention model is constructed according to the characteristics of new media data, then the three-layer attention model is improved by introducing external knowledge and a feedback mechanism to obtain a recommendation model, feedback information in the external knowledge and the feedback mechanism can be used for adjusting information transmission rules and attention distribution rules among deep learning neurons, and therefore the purpose of improving the performance of the recommendation model is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a block diagram of a new media content recommendation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the application provides a new media content recommendation method and system, solves the technical problem that the recommendation is not accurate when the existing recommendation system is applied to new media content, and realizes high-accuracy recommendation of interesting content for users.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
the embodiment of the invention processes a new media text database based on a semantic neural network to obtain two-dimensional representation of a text, then constructs a three-layer attention model based on the two-dimensional representation of the text, improves the three-layer attention model by constructing a feedback mechanism and improving external knowledge to obtain a recommendation model, improves a deep learning algorithm based on the recommendation model, and uses the improved deep learning algorithm to realize intelligent recommendation of new media content. The embodiment of the invention expresses the text in the new media text database by the semantic neural network, not only reflects the real text, but also reflects the knowledge which is most probably imagined when people see the text. On the basis of a semantic neural network, a three-layer attention model is constructed according to the characteristics of new media data, then the three-layer attention model is improved by introducing external knowledge and a feedback mechanism to obtain a recommendation model, feedback information in the external knowledge and the feedback mechanism can be used for adjusting information transmission rules and attention distribution rules among deep learning neurons, and therefore the purpose of improving the performance of the recommendation model is achieved.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
An embodiment of the present invention provides a new media content recommendation method, as shown in fig. 1, where the method is executed by a computer, and includes steps S1 to S5:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
The embodiment of the invention expresses the text in the new media text database with the semantic neural network, so that the representation reflects not only the literal text but also the knowledge a reader is most likely to associate with it. On the basis of the semantic neural network, a three-layer attention model is constructed according to the characteristics of new media data, and the three-layer attention model is then improved by introducing external knowledge and a feedback mechanism to obtain the recommendation model; the external knowledge and the feedback information from the feedback mechanism can be used to adjust the information transmission rules and attention distribution rules among the deep-learning neurons, thereby improving the performance of the recommendation model. The deep learning algorithm is improved on the basis of the recommendation model, and the improved algorithm is applied to realize intelligent recommendation of new media content. This improves the match between the recommended content and the user's interests, achieves accurate pushing, and effectively reduces the probability that the recommendation model recommends content the user is not interested in.
Each step is described in detail below.
In step S1, a new media text database is obtained. The specific implementation process is as follows: text content is obtained from websites such as Baidu Encyclopedia, Interactive Encyclopedia and the online Xinhua Dictionary through web-crawler technology, and the contents are combined to form the new media text database.
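For illustration, the following minimal Python sketch shows one way the crawling step could be realized; the requests and BeautifulSoup libraries, the seed URL and the output file name are assumptions made here and are not specified by the disclosure.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical seed pages; the disclosure names encyclopedia and dictionary sites only in general terms.
SEED_URLS = [
    "https://baike.baidu.com/item/%E6%96%B0%E5%AA%92%E4%BD%93",
]

def crawl_text(urls):
    """Fetch each page and keep its visible paragraph text."""
    corpus = []
    for url in urls:
        resp = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
        corpus.append("\n".join(p for p in paragraphs if p))
    return corpus

if __name__ == "__main__":
    # Combine the crawled contents to form the new media text database (here, one text file).
    texts = crawl_text(SEED_URLS)
    with open("new_media_text_db.txt", "w", encoding="utf-8") as f:
        f.write("\n\n".join(texts))
```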
In step S2, the new media text database is processed based on the semantic neural network to obtain a two-dimensional representation of the text. The specific implementation process is as follows:
The text in the new media text database is represented using the semantic neural network, giving the two-dimensional representation of the text. The expression of the semantic neural network is as follows:
G=<V,E>
wherein:
v represents a set of finite points;
e represents a set of finite edges;
processing the new media text database with the semantic neural network, wherein each node vi ∈ V in the semantic neural network represents a word, and each edge eij ∈ E represents the correlation between the words vi and vj;
in the semantic neural network, each node has four attributes attr(vi) = {ID, Name, status, description}, wherein ID is the number of the word vi in the new media text database, Name is the word vi itself, status represents the state of the node, and description is the semantic interpretation of the word vi; the weight weight(eij) of the edge eij indicates how close the connection between the words vi and vj is.
The semantic neural network is used to represent the text in the new media text database, reflecting not only the literal text but also the knowledge a reader is most likely to associate with it. On one hand this makes it convenient to introduce prior knowledge, and on the other hand it alleviates the problem of natural language ambiguity. This is especially important for the discretization of the dynamic data of new media.
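For illustration, the following minimal Python sketch shows one possible in-memory form of such a word graph G = <V, E> with the four node attributes and weighted edges; the data classes and the toy relation are assumptions made here for clarity, not part of the disclosed method.

```python
from dataclasses import dataclass, field

@dataclass
class WordNode:
    id: int                 # number of the word in the new media text database
    name: str               # the word itself
    status: str = "new"     # state of the node
    description: str = ""   # semantic interpretation of the word

@dataclass
class SemanticNetwork:
    nodes: dict = field(default_factory=dict)   # name -> WordNode
    edges: dict = field(default_factory=dict)   # (name_i, name_j) -> weight

    def add_word(self, name, description=""):
        if name not in self.nodes:
            self.nodes[name] = WordNode(id=len(self.nodes), name=name,
                                        description=description)
        return self.nodes[name]

    def relate(self, wi, wj, weight):
        """weight(eij) expresses how closely the words wi and wj are connected."""
        self.add_word(wi)
        self.add_word(wj)
        self.edges[(wi, wj)] = weight

g = SemanticNetwork()
g.relate("notebook", "laptop", weight=0.9)   # toy example of a close relation between two words
```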
In step S3, a three-layer attention model is constructed based on the two-dimensional representation of the text. The three-layer attention model comprises a word level attention model, a sentence level attention model and a topic level attention model. The specific implementation process is as follows:
s301, constructing a word level attention model, which specifically comprises the following steps:
An attention unit E contains a series of continuous or discontinuous phrases, defined as E = {e_1, e_2, ..., e_t, ..., e_m}, where e_t denotes the position of a phrase. The output of the hidden layer of the LSTM can be expressed as
H' = (h_{e_1}, h_{e_2}, ..., h_{e_m})
The vector representation of the attention unit E can be calculated by the following formula:
v_e = Σ_{j=1}^{m} α_j · h_{e_j}
wherein α = {α_1, α_2, ..., α_m}, α_j is the j-th self-attention weight in α, and h_{e_j} is the j-th output in H'; α is the self-attention vector computed over the entire LSTM hidden state H', and it can be calculated by feeding the hidden output H' into the following two-level function:
α = softmax(w_a2 · tanh(W_a1 · H'))
wherein:
w_a2 is a parameter vector of size d_a, where d_a is the number of cells of the LSTM hidden layer;
W_a1 is a weight matrix of size d_a × 2u;
since the size of H' is m × 2u, the size of the attention vector α is m;
the softmax function guarantees that the calculated weights sum to 1.
Expanding the parameter vector w_a2 into a matrix W_a2 of size r × d_a converts the attention vector α into the attention matrix A:
A = softmax(W_a2 · tanh(W_a1 · H'))
The word-level attention model V_t is constructed from the LSTM hidden state H' and the attention matrix A:
V_t = A · H'
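For illustration, a minimal NumPy sketch of this structured self-attention computation is given below; the dimensions m, u, d_a, r and the random inputs are hypothetical, and the sketch only mirrors the formulas A = softmax(W_a2 tanh(W_a1 H')) and V_t = A H'.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

m, u, d_a, r = 6, 8, 16, 4          # hypothetical sizes
rng = np.random.default_rng(0)

H = rng.normal(size=(m, 2 * u))     # H': m x 2u outputs of a bidirectional LSTM
W_a1 = rng.normal(size=(d_a, 2 * u))
W_a2 = rng.normal(size=(r, d_a))    # w_a2 expanded from a d_a vector into an r x d_a matrix

# A = softmax(W_a2 tanh(W_a1 H'))  -> r x m attention matrix, each row sums to 1
A = softmax(W_a2 @ np.tanh(W_a1 @ H.T), axis=-1)

# Word-level attention model V_t = A H'  -> r x 2u representation of the text
V_t = A @ H
print(V_t.shape)                    # (4, 16)
```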
S302, constructing a sentence-level attention model, which specifically comprises the following steps:
Let s denote a sentence of length L, and let H = (h_1, h_2, ..., h_L) denote the output of the hidden layer of the LSTM. Substituting the output of the LSTM hidden layer into the attention matrix A and linearly integrating the hidden vectors gives:
[sentence-level representation formula, shown as an image in the original publication]
wherein:
B = [β_{r,1}, β_{r,2}, ..., β_{r,L}] is the sentence-level attention matrix, and each β_{r,t} encodes the phrase w_t in the sentence s;
s represents a sentence of length L, and e represents a word in the sentence.
Based on a multilayer neural network with a tanh activation function, the unit h_t of each hidden layer is transformed and a softmax function generates a probability distribution over the whole sentence s, yielding the sentence-level attention model:
[sentence-level attention model formula, shown as an image in the original publication]
wherein:
H' ⊙ v_e means applying the attention vector v_e to each hidden-layer element h_t;
W_m assigns an attention weight to the input value in each LSTM.
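Because the published sentence-level formulas appear only as images, the following NumPy sketch is one assumed reading of this step: each hidden state h_t is scored against the attention vector v_e through a tanh layer weighted by W_m, a softmax over the sentence yields the distribution β, and the hidden states are pooled accordingly. All shapes and values are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

L_len, d = 10, 16                     # hypothetical sentence length and hidden size
rng = np.random.default_rng(1)

H = rng.normal(size=(L_len, d))       # h_1 ... h_L from the LSTM hidden layer
v_e = rng.normal(size=(d,))           # word-level attention vector of the sentence
W_m = rng.normal(size=(d, d))         # weight assigning attention to each LSTM input

# Assumed scoring: tanh(W_m applied to h_t ⊙ v_e), reduced to one score per position t
scores = np.tanh((H * v_e) @ W_m).sum(axis=1)
beta = softmax(scores)                # probability distribution over the whole sentence s
V_s = beta @ H                        # sentence-level representation
```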
S303, constructing a topic level attention model. Unlike articles or comments, communication on new media is obviously topical: one exchange usually concentrates on one topic (for example, a notebook computer of a certain model being discussed). During the communication, however, some expressions are meaningful while others carry relatively little information. Considering this characteristic of new media communication, a topic-level attention model is constructed to emphasize the meaningful and information-rich content in the communication. To obtain topic-level attention, it is first necessary to determine whether the content of a communication focuses on a related topic. To achieve this, a topic model (the nested hierarchical Dirichlet process, Paisley et al. 2015) is introduced. Then, the topic embedding of a topic z is calculated on the basis of all discussion data. Finally, the topic-level attention model can be derived similarly to the word-level and sentence-level attention models. The method specifically comprises the following steps:
Given a communication q of length X, the output of the LSTM hidden layer is denoted H = (h_1, h_2, ..., h_X), and the topic-level attention model is obtained:
[topic-level attention model formula, shown as an image in the original publication]
wherein:
Ψ = [ψ_{r,1}, ψ_{r,2}, ..., ψ_{r,X}] is the attention matrix, and each ψ_{r,x} represents the expression w_x in the communication q; the attention matrix can be obtained by the following formula:
[attention matrix formula, shown as an image in the original publication]
wherein:
[symbol shown as an image in the original publication] represents the position of the word in the attention unit E;
W_m assigns attention weights to the input values in each LSTM;
v_e is the attention vector representation.
In step S4, a feedback mechanism is constructed, and the three-layer attention model is improved through the feedback mechanism and by introducing external knowledge, resulting in the recommendation model. To further improve the accuracy of the recommendation model, the three-layer attention model is optimized by introducing external information. There are two methods of introducing external information. One is to introduce external knowledge such as existing semantic networks (e.g., the widely used Baidu Encyclopedia, Interactive Encyclopedia, etc.), mainly used to supplement knowledge and to discover and understand new words. The other is reinforcement learning, in which user usage information is obtained through a feedback mechanism to improve the effect of the model. The specific process is as follows:
s401, constructing a feedback mechanism, and improving the three-layer attention model by using the feedback mechanism, wherein the feedback mechanism specifically comprises the following steps:
Assume that there are K alternative concepts, denoted as μ = {μ_{t,1}, μ_{t,2}, ..., μ_{t,K}}. The text vector can then be obtained through a simple attention model, which constitutes the feedback mechanism; the specific formula is:
[feedback mechanism formula, shown as an image in the original publication]
wherein:
μ refers to the collection of alternative concepts;
μ_i refers to a text vector obtained by passing the collection through the word-level attention model. As can be seen from the above formula, the feedback mechanism is equivalent to updating the word-level attention model.
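Since the feedback formula is reproduced only as an image, the following minimal sketch assumes a simple dot-product attention over the K candidate concept vectors, which matches the description of re-weighting the word-level text vector; the shapes and random values are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

K, d = 5, 16                              # hypothetical number of concepts and vector size
rng = np.random.default_rng(2)

mu = rng.normal(size=(K, d))              # candidate concept vectors μ_{t,1..K} from the word-level model
v_text = rng.normal(size=(d,))            # current text vector

weights = softmax(mu @ v_text)            # one attention weight per candidate concept
v_updated = weights @ mu                  # feedback: re-weighted text vector, i.e. an updated word-level attention
```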
S402, acquiring new words, adding them into the new media text database and updating the database; by applying the feedback mechanism, external information and external feedback are introduced to optimize the weights of the three-layer attention model, thereby improving the accuracy of the recommendation model.
In step S5, a deep learning algorithm is improved based on the recommendation model, and the improved deep learning algorithm is applied to realize intelligent pushing of new media content. It should be noted that, in the embodiment of the present invention, the recommendation model is the improved three-layer attention model.
Specifically, the recommendation model is used to adjust the computing-resource allocation of the deep learning algorithm, thereby improving the deep learning algorithm, and intelligent pushing of new media content is realized through the improved deep learning algorithm.
The improved deep learning algorithm comprises an LSTM encoder, and the mathematical expression of the improved deep learning algorithm can be summarized as follows:
f_t = σ_g(W_f · [x_t, h_{t-1}, μ_t] + b_f)
i_t = σ_g(W_i · [x_t, h_{t-1}, μ_t] + b_i)
c_t = f_t · c_{t-1} + i_t · tanh(W_c · [x_t, h_{t-1}] + b_c)
o_t = σ_g(W_o · [x_t, h_{t-1}, μ_t] + b_o)
h_t = o_t · tanh(c_t)
[the expression for μ_t, involving W_co and b_co, is shown as an image in the original publication]
wherein: f_t is the forget gate at time t; i_t is the input gate at time t; o_t is the output gate at time t; c_t is the state vector of the network node (neuron); x_t is the input vector at time t; h_t is the output vector at time t; μ_t is the output vector of the hidden layer; σ_g is the sigmoid function; W_f, W_i, W_c, W_o and W_co are the weight matrices and b_f, b_i, b_c, b_o and b_co the bias vectors of the respective gates.
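A minimal NumPy sketch of one step of this modified LSTM cell is given below; because the expression for μ_t is not fully recoverable from the published text, μ_t is simply treated here as an extra input vector concatenated with [x_t, h_{t-1}] at the gates, and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, mu_t, params):
    """One step of the LSTM encoder whose gates also see the attention/feedback vector mu_t."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    z = np.concatenate([x_t, h_prev, mu_t])      # [x_t, h_{t-1}, μ_t]
    f_t = sigmoid(W_f @ z + b_f)                 # forget gate
    i_t = sigmoid(W_i @ z + b_i)                 # input gate
    c_t = f_t * c_prev + i_t * np.tanh(W_c @ np.concatenate([x_t, h_prev]) + b_c)
    o_t = sigmoid(W_o @ z + b_o)                 # output gate
    h_t = o_t * np.tanh(c_t)                     # output vector
    return h_t, c_t

d_x, d_h, d_mu = 8, 16, 16                       # hypothetical sizes
rng = np.random.default_rng(3)
params = (rng.normal(size=(d_h, d_x + d_h + d_mu)), np.zeros(d_h),
          rng.normal(size=(d_h, d_x + d_h + d_mu)), np.zeros(d_h),
          rng.normal(size=(d_h, d_x + d_h)),        np.zeros(d_h),
          rng.normal(size=(d_h, d_x + d_h + d_mu)), np.zeros(d_h))
h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h),
                 rng.normal(size=d_mu), params)
```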
An embodiment of the present invention further provides a media content recommendation system, where the system includes a computer, and the computer includes:
at least one memory unit;
at least one processing unit;
wherein at least one instruction is stored in the at least one memory unit, and the at least one instruction is loaded and executed by the at least one processing unit to implement the following steps:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
It can be understood that, the media content recommendation system provided in the embodiment of the present invention corresponds to the media content recommendation method, and the explanation, examples, and beneficial effects of the relevant content may refer to the corresponding content in the media content recommendation method, which is not described herein again.
In summary, compared with the prior art, the method has the following beneficial effects:
the embodiment of the invention expresses the text in the new media text database by the semantic neural network, not only reflects the real text, but also reflects the knowledge which is most probably imagined when people see the text. On the basis of a semantic neural network, a three-layer attention model is constructed according to the characteristics of new media data, then the three-layer attention model is improved by introducing external knowledge and a feedback mechanism to obtain a recommendation model, feedback information in the external knowledge and the feedback mechanism can be used for adjusting information transmission rules and attention distribution rules among deep learning neurons, and therefore the purpose of improving the performance of the recommendation model is achieved. The probability that the recommendation model recommends the contents which are not interested in the user to the user is effectively reduced.
It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A new media content recommendation method, wherein the method is executed by a computer, comprising the steps of:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
2. The method of claim 1, wherein the S2 specifically includes:
the expression mode of the semantic neural network is as follows:
G=<V,E>
wherein:
v represents a set of finite points;
e represents a set of finite edges;
processing the new media text database with the semantic neural network, wherein each node vi ∈ V in the semantic neural network represents a word, and each edge eij ∈ E represents the correlation between the words vi and vj;
in the semantic neural network, each node has four attributes attr(vi) = {ID, Name, status, description}, wherein ID is the number of the word vi in the new media text database, Name is the word vi itself, status represents the state of the node, and description is the semantic interpretation of the word vi; the weight weight(eij) of the edge eij indicates how close the connection between the words vi and vj is.
3. A method for recommending media content according to claim 2, wherein said three-layer attention model comprises a word-level attention model, a sentence-level attention model and a topic-level attention model.
4. A method for recommending media contents according to claim 3, characterized in that said word-level attention model construction method comprises:
an attention unit E contains a series of continuous or discontinuous phrases, defined as E = {e_1, e_2, ..., e_t, ..., e_m}, where e_t denotes the position of a phrase; the output of the hidden layer of the LSTM can be represented as H' = (h_{e_1}, h_{e_2}, ..., h_{e_m}), and the vector representation of the attention unit E can be calculated by the following formula:
v_e = Σ_{j=1}^{m} α_j · h_{e_j}
wherein α = {α_1, α_2, ..., α_m}, α_j is the j-th self-attention weight in α, and h_{e_j} is the j-th output in H'; α is the self-attention vector computed over the entire LSTM hidden state H', and it can be calculated by introducing the hidden output H' into the following two-level function:
α = softmax(w_a2 · tanh(W_a1 · H'))
wherein:
w_a2 is a parameter vector of size d_a, where d_a is the number of cells of the LSTM hidden layer;
W_a1 is a weight matrix of size d_a × 2u;
since the size of H' is m × 2u, the size of the attention vector α is m;
expanding the parameter vector w_a2 into a matrix W_a2 of size r × d_a converts the attention vector α into the attention matrix A:
A = softmax(W_a2 · tanh(W_a1 · H'))
the word-level attention model V_t is constructed from the LSTM hidden state H' and the attention matrix A:
V_t = A · H'.
5. A method for recommending media contents according to claim 3, characterized in that said sentence-level attention model construction method comprises:
let s denote a sentence of length L, and let H = (h_1, h_2, ..., h_L) denote the output of the hidden layer of the LSTM; substituting the output of the LSTM hidden layer into the attention matrix A and linearly integrating the hidden vectors gives:
[sentence-level representation formula, shown as an image in the original publication]
wherein:
B = [β_{r,1}, β_{r,2}, ..., β_{r,L}] is the sentence-level attention matrix, and each β_{r,t} encodes the phrase w_t in the sentence s;
s represents a sentence of length L, and e represents a word in the sentence;
based on a multilayer neural network with a tanh activation function, the unit h_t of each hidden layer is transformed and a softmax function generates a probability distribution over the whole sentence s, yielding the sentence-level attention model:
[sentence-level attention model formula, shown as an image in the original publication]
wherein:
H' ⊙ v_e means applying the attention vector v_e to each hidden-layer element h_t;
W_m assigns an attention weight to the input value in each LSTM.
6. A method as claimed in claim 3, wherein the topic level attention model is constructed by:
given a communication q of length X, the output of the LSTM hidden layer is denoted H = (h_1, h_2, ..., h_X), and the topic-level attention model is obtained:
[topic-level attention model formula, shown as an image in the original publication]
wherein:
Ψ = [ψ_{r,1}, ψ_{r,2}, ..., ψ_{r,X}] is the attention matrix, and each ψ_{r,x} represents the expression w_x in the communication q; the attention matrix can be obtained by the following formula:
[attention matrix formula, shown as an image in the original publication]
wherein:
[symbol shown as an image in the original publication] represents the position of the word in the attention unit E;
W_m assigns attention weights to the input values in each LSTM;
v_e is the attention vector representation.
7. A method as claimed in claim 3, wherein in S4, the feedback mechanism comprises:
assume that there are K alternative concepts, denoted as μ = {μ_{t,1}, μ_{t,2}, ..., μ_{t,K}}; the expression of the feedback mechanism is as follows:
[feedback mechanism formula, shown as an image in the original publication]
wherein:
μ refers to the collection of alternative concepts;
μ_i refers to a text vector obtained by passing the collection through the word-level attention model.
8. The method for recommending media contents according to claim 1, wherein in said S4, said method for introducing external knowledge comprises:
acquiring new words, and adding the new words into the new media text database.
9. The media content recommendation method of claim 7, wherein in S5, the improved deep learning algorithm comprises: the LSTM encoder, whose mathematical expression is:
f_t = σ_g(W_f · [x_t, h_{t-1}, μ_t] + b_f)
i_t = σ_g(W_i · [x_t, h_{t-1}, μ_t] + b_i)
c_t = f_t · c_{t-1} + i_t · tanh(W_c · [x_t, h_{t-1}] + b_c)
o_t = σ_g(W_o · [x_t, h_{t-1}, μ_t] + b_o)
h_t = o_t · tanh(c_t)
[the expression for μ_t, involving W_co and b_co, is shown as an image in the original publication]
wherein: f_t is the forget gate at time t; i_t is the input gate at time t; o_t is the output gate at time t; c_t is the state vector of the network node; x_t is the input vector at time t; h_t is the output vector at time t; μ_t is the output vector of the hidden layer; σ_g is the sigmoid function; W_f, W_i, W_c, W_o and W_co are the weight matrices and b_f, b_i, b_c, b_o and b_co the bias vectors of the respective gates.
10. A media content recommendation system, the system comprising a computer, the computer comprising:
at least one memory unit;
at least one processing unit;
wherein the at least one memory unit has stored therein at least one instruction that is loaded and executed by the at least one processing unit to perform the steps of:
s1, acquiring a new media text database;
s2, processing the new media text database based on the semantic neural network to obtain two-dimensional representation of the text;
s3, constructing a three-layer attention model based on the two-dimensional representation of the text;
s4, constructing a feedback mechanism, and improving the three-layer attention model based on external knowledge and the feedback mechanism to obtain a recommendation model;
s5, improving a deep learning algorithm based on the recommendation model, and applying the improved deep learning algorithm to realize intelligent pushing of new media content.
CN201911044661.3A 2019-10-30 2019-10-30 New media content recommendation method and system Active CN110765359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044661.3A CN110765359B (en) 2019-10-30 2019-10-30 New media content recommendation method and system

Publications (2)

Publication Number Publication Date
CN110765359A true CN110765359A (en) 2020-02-07
CN110765359B CN110765359B (en) 2022-09-16

Family

ID=69333291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044661.3A Active CN110765359B (en) 2019-10-30 2019-10-30 New media content recommendation method and system

Country Status (1)

Country Link
CN (1) CN110765359B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023246264A1 (en) * 2022-06-21 2023-12-28 Tencent Technology (Shenzhen) Co., Ltd. Attention module-based information recognition method and related apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170192956A1 (en) * 2015-12-31 2017-07-06 Google Inc. Generating parse trees of text segments using neural networks
US20190243919A1 (en) * 2018-02-06 2019-08-08 Microsoft Technology Licensing, Llc Multilevel representation learning for computer content quality
CN108960338A (en) * 2018-07-18 2018-12-07 苏州科技大学 The automatic sentence mask method of image based on attention-feedback mechanism
CN109145112A (en) * 2018-08-06 2019-01-04 北京航空航天大学 A kind of comment on commodity classification method based on global information attention mechanism
CN110020024A (en) * 2019-03-15 2019-07-16 叶宇铭 Classification method, system, the equipment of link resources in a kind of scientific and technical literature
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110334843A (en) * 2019-04-22 2019-10-15 山东大学 A kind of time-varying attention improves be hospitalized medial demand prediction technique and the device of Bi-LSTM
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PEILIANG ZUO et al.: "Prediction-Based Spectrum Access Optimization in Cognitive Radio Networks", Personal, Indoor and Mobile Radio Communications *
WU YUXIN et al.: "Automatic classification model for smart contracts based on hierarchical attention mechanism and bidirectional long short-term memory neural network", Journal of Computer Applications *
HAN HU et al.: "Text sentiment analysis with multi-attention hierarchical neural network", Computer Engineering and Applications *

Also Published As

Publication number Publication date
CN110765359B (en) 2022-09-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant