CN114564594A - Knowledge graph user preference entity recall method based on double-tower model - Google Patents


Info

Publication number
CN114564594A
Authority
CN
China
Prior art keywords
user, embedding, representing, item, vector
Prior art date
Legal status
Pending
Application number
CN202210169936.1A
Other languages
Chinese (zh)
Inventor
陆佳炜
吴俚达
程振波
韦航俊
朱昊天
方静雯
徐俊
肖刚
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210169936.1A
Publication of CN114564594A

Classifications

    • G06F16/367: Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri; ontology
    • G06F16/3344: Information retrieval of unstructured textual data; query execution using natural language analysis
    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models
    • G06F40/279: Handling natural language data; recognition of textual entities
    • G06F40/289: Handling natural language data; phrasal analysis, e.g. finite state techniques or chunking
    • G06N20/00: Machine learning
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a knowledge graph user preference entity recall method based on a double-tower model. An optimization method is added to the traditional double-tower model to better learn the interaction between users and items, and the trained double-tower model can be used to recall entities related to user preferences on a knowledge graph. First, the entities corresponding to items in the user's history are taken as starting points, and all neighbor entities are searched along the edges of the graph. The recalled entities are then screened by the trained, optimized double-tower model. Finally, the operation is repeated with the screened entities as new starting points, ultimately forming a knowledge graph that represents the user's preferences and potential preferences.

Description

Knowledge graph user preference entity recall method based on double-tower model
Technical Field
The invention relates to the technical fields of knowledge graphs and deep learning, and in particular to a knowledge graph user preference entity recall method based on a double-tower model.
Background
The knowledge graph is a concept proposed by Google in 2012 as a knowledge base used to enhance its search engine. Essentially, a knowledge graph describes the entities or concepts that exist in the real world and their relationships, forming a huge semantic network in which nodes represent entities or concepts and edges represent attributes or relations. Each piece of knowledge is represented as a triple (h, r, t), where h is the head entity, t is the tail entity, and r is the relationship between the head and tail entities. By virtue of its strong semantic processing and open organization capabilities, the knowledge graph plays an important role in recommendation systems, intelligent question answering, information retrieval and other fields, and lays a foundation for knowledge organization and intelligent applications in the internet era.
Conventional recommendation systems use explicit or implicit feedback as input for prediction and face two main problems. The first is sparsity: in practice, interaction data between users and items is often very sparse, and predicting a large amount of unknown information from so few observations greatly increases the risk of overfitting. The second is cold start: newly added users or items have no historical information, making accurate modeling and recommendation difficult.
The knowledge graph contains rich semantic associations between entities and provides a potential source of auxiliary information for recommendation systems. It introduces additional semantic relations for items and can uncover user interests at a deeper level. Items are linked through different relations in the knowledge graph, which helps diversify recommendation results. The knowledge graph can also connect a user's history with the recommendation results, improving the user's satisfaction with and acceptance of those results and strengthening the user's trust in the system.
Existing knowledge graph recommendation methods fall mainly into two categories. The first is embedding-based methods, which use knowledge graph embedding algorithms to learn vector representations of the entities and relations in the graph and then introduce those vectors into a recommendation framework. Examples include the DKN framework (Deep Knowledge-aware Network), based on convolutional neural networks, and the CKE framework (Collaborative Knowledge base Embedding). Although embedding-based methods are highly flexible, they are generally suited to in-graph link prediction, whereas recommendation scenarios need to mine users' latent interests. The second is path-based methods, which explore the various connections between entities in the knowledge graph to provide additional guidance for the recommendation system. Examples include Personalized Entity Recommendation and Meta-Graph Based Recommendation. While path-based methods use the knowledge graph in a more natural and intuitive way, they rely heavily on manually designed meta-paths, which are difficult to optimize in practice.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a knowledge graph user preference entity recall method based on a double-tower model. An optimization method is added to the traditional double-tower model to better learn the interaction between users and items; the trained double-tower model can then be used to recall entities on the knowledge graph that are relevant to the user's preferences. First, the entities corresponding to items in the user's history are taken as starting points, and all neighbor entities are searched along the edges; the recalled entities are then screened by the trained, optimized double-tower model. Finally, the operation is repeated with the screened entities as new starting points, ultimately forming a knowledge graph that represents the user's preferences and potential preferences.
The technical scheme adopted by the invention is as follows:
A knowledge graph user preference entity recall method based on a double-tower model comprises the following steps:
1. defining a user feature vector and an item feature vector as the inputs of the double-tower model;
2. training the double-tower model, optimizing it by combining an in-batch softmax loss function with a frequency estimation method based on hash sequences;
3. defining the entity mapping relation between the user historical interaction matrix and the knowledge graph;
4. propagating preference entities: at each propagation step, inputting the recalled entities together with the user features into the optimized double-tower model to obtain prediction probabilities, screening the high-probability entities, and finally obtaining a knowledge graph representing the user's preferences and potential preferences.
The process of step 1 is as follows:
1.1, the user features refer to the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age; the user feature vector is obtained by converting these interaction data into vectors and concatenating them. Converting raw data into a vector is called vector embedding, a method commonly used in machine learning to represent data features; the goal is to extract features from the raw data, i.e., the low-dimensional vectors obtained after mapping through a neural network;
further, the scheme of 1.1 is as follows:
1.1.1, the embedding of the user click record is a weighted average of the id-class embeddings of the clicked items, where an id-class embedding is a vector that maps an item's unique identifier into a common dimension, and its weight is directly proportional to the item browsing time. The embedding of the user click record is calculated as:

$$v_{click} = \sum_{i=1}^{n} w_i^{click} \, v_{click,i}$$

where $v_{click}$ represents the embedding of the user click record, $w_i^{click}$ denotes the i-th weight, $v_{click,i}$ represents the id-class embedding of the i-th item in the click record, and n represents the number of embeddings. The weight is computed from the browsing time, with $w_i^{click} \propto t_i^{click}$, where $t_i^{click}$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
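For illustration only (not the patent's reference implementation), the weighted average above can be computed as follows, assuming the weights are obtained by normalizing the browsing times so that they sum to 1; the array names are hypothetical:

```python
import numpy as np

def click_record_embedding(item_embeddings: np.ndarray,
                           browse_times: np.ndarray) -> np.ndarray:
    """Weighted average of id-class embeddings of clicked items.

    item_embeddings: (n, d) array, one id-class embedding per clicked item.
    browse_times:    (n,) array, browsing time per clicked item
                     (weights assumed proportional to browsing time).
    """
    weights = browse_times / browse_times.sum()  # w_i ∝ t_i, normalized to sum to 1
    return weights @ item_embeddings             # v_click = Σ_i w_i · v_click,i

# Usage: three clicked items with 8-dimensional id-class embeddings.
v_click = click_record_embedding(np.random.randn(3, 8),
                                 np.array([12.0, 3.0, 45.0]))
```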
1.1.2, the embedding of the user search record is obtained by segmenting the keywords of historical searches into entries. After word segmentation, the embedding of each entry is obtained through a Word2vec model, and the entry embeddings of the user search record are then weighted-averaged.
Word segmentation is a technique by which a search engine segments the keyword string submitted by a user into different entries (tokens).
The Word2vec model converts the content words in a text into space vectors; the values of the word vectors are influenced by context and capture the correlations between words.
The embedding of the user search record is calculated as:

$$v_{search} = \sum_{i=1}^{n} w_i^{search} \, v_{search,i}$$

where $v_{search}$ represents the embedding of the user's search record, $w_i^{search}$ denotes the i-th weight, $v_{search,i}$ denotes the embedding of the i-th entry in the search record, and n denotes the number of embeddings;
the weight of a search-record embedding is determined by search validity, i.e., whether the user clicked the item after searching for it;
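A minimal sketch of obtaining entry embeddings, assuming the gensim library is used as the Word2vec implementation; the corpus and hyperparameters are illustrative:

```python
from gensim.models import Word2Vec

# Each search record is segmented into entries (tokens); illustrative corpus.
segmented_searches = [
    ["wireless", "noise", "cancelling", "headphones"],
    ["mechanical", "keyboard", "rgb"],
    ["noise", "cancelling", "earbuds"],
]

# Train a small Word2vec model over the segmented search keywords.
w2v = Word2Vec(sentences=segmented_searches, vector_size=16, window=3,
               min_count=1, epochs=20)

# Embedding of one entry; a search record's embedding is the weighted
# average of its entries' vectors (weights given by search validity).
v_entry = w2v.wv["headphones"]
```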
1.1.3, the user's social data comprises a weighted average of the embeddings corresponding to favorites, likes and subscription data. The embeddings corresponding to favorites and likes are the id-class embeddings of the items the user has favorited or liked; the embeddings corresponding to subscription data are the id-class embeddings of the persons in charge of the items the user subscribes to.
The embedding of the user's social data is calculated as:

$$v_{social} = \sum_{i=1}^{n} w_i^{social} \, v_{social,i}$$

where $v_{social}$ represents the embedding of the user's social data, $w_i^{social}$ denotes the i-th weight, and $v_{social,i}$ represents the embedding of the i-th piece of social data;
the weights of the favorites and likes embeddings are computed as for click records, with $w_i \propto t_i$, where $t_i$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
the weight of a subscription embedding is computed analogously, where $t_i$ denotes the browsing time of the i-th item of the subscribed person in charge, N represents the total number of samples, and k represents the total number of positive examples;
1.1.4, the personal data of the user comprises the user's gender, age and region; the gender feature is a simple binary feature, while age and region are continuous features normalized to real values. The embedding of the user's personal data is the vector obtained by concatenating the processed values of gender, age and region;
further, the scheme of 1.1.4 is as follows:
1.1.4.1, the user's gender is encoded as a binary value (for example, 1 for male and 0 for female);
1.1.4.2, the user's age and region are normalized via the standard score:

$$z = \frac{X - \mu}{\sigma}$$

where X represents the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3, the binary gender value and the normalized age and region values of 1.1.4 are concatenated into a vector:

$$v_{personal} = [gender, z_{age}, z_{region}]$$

where $v_{personal}$ represents the personal-data feature vector, and gender, $z_{age}$ and $z_{region}$ respectively represent the binary gender value and the normalized values of the user's age and region;
1.1.5, the click-record embedding, search-record embedding and social-data embedding of the flow in 1.1 and the embedding of the user's personal data are concatenated to obtain the user feature vector:

$$v_{user} = concatenate(v_{click}, v_{search}, v_{social}, v_{personal}) = [v_{click}[1], v_{click}[2], \ldots, v_{search}[1], v_{search}[2], \ldots, v_{social}[1], v_{social}[2], \ldots, v_{personal}[1], v_{personal}[2], \ldots]$$

where $v_{user}$ represents the user feature vector, $v_{click}[i]$ represents the i-th component of the click-record embedding, $v_{search}[i]$ the i-th component of the search-record embedding, $v_{social}[i]$ the i-th component of the social-data embedding, and $v_{personal}[i]$ the i-th component of the personal-data embedding;
1.2, the item features comprise the item's id and its context information; the item feature vector is formed by concatenating the item's id-class embedding with the embedding of its context information;
further, the scheme of 1.2 is as follows:
1.2.1, the item's id-class embedding is given, namely a vector mapping the item's unique identifier into a common dimension;
1.2.2, the item's context-information embedding is given, namely a vector obtained through Word2vec;
1.2.3, the id-class embedding and the context-information embedding of 1.2 are concatenated to obtain the item feature vector:

$$v_{item} = concatenate(v_{id}, v_{context}) = [v_{id}[1], v_{id}[2], \ldots, v_{context}[1], v_{context}[2], \ldots]$$

where $v_{item}$ represents the item feature vector, $v_{id}$ represents the item's id-class embedding, $v_{context}$ represents the embedding of the item's context information, $v_{id}[i]$ represents the i-th component of the item's id-class embedding, and $v_{context}[i]$ the i-th component of the context-information embedding;
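A schematic sketch of assembling the user and item feature vectors by concatenation, assuming the component embeddings have already been computed (all dimensions and names are illustrative):

```python
import numpy as np

def build_user_vector(v_click, v_search, v_social, v_personal):
    # v_user = concatenate(v_click, v_search, v_social, v_personal)
    return np.concatenate([v_click, v_search, v_social, v_personal])

def build_item_vector(v_id, v_context):
    # v_item = concatenate(v_id, v_context)
    return np.concatenate([v_id, v_context])

v_user = build_user_vector(np.random.randn(8), np.random.randn(8),
                           np.random.randn(8),
                           np.array([1.0, 0.3, -0.7]))  # [gender, z_age, z_region]
v_item = build_item_vector(np.random.randn(8), np.random.randn(16))
```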
In step 2, the double-tower model is derived from DSSM (Deep Structured Semantic Model). DSSM is a deep structured semantic model commonly used to solve semantic similarity problems in natural language processing. Vertically, the double-tower model can be divided into an input layer, a representation layer and a matching layer; horizontally, it comprises two input layers, two representation layers and one matching layer. The outputs of the two input layers are the inputs of the two representation layers respectively, and the outputs of the two representation layers converge at the matching layer, so the whole structure takes the form of "double towers". In the invention, the two inputs of the input layer are the user feature vector and the item feature vector respectively; the two representation layers share the same neural network structure, and after passing through the neural networks the feature vectors yield vectors of the same dimension. Finally, the two vectors are L2-normalized and their inner product is taken. Furthermore, the optimized double-tower model adopts a frequency estimation method based on hash sequences. This approach reduces the sampling bias that may occur when negative samples are drawn within each batch, thereby optimizing the loss function.
The flow of step 2 is as follows:
2.1, two parameterized embedding functions are given:

$$u(x; \theta) \in \mathbb{R}^d, \quad v(y; \theta) \in \mathbb{R}^d$$

where $\mathbb{R}^d$ denotes a d-dimensional real vector; u and v are the user and item representations extracted by the deep neural networks, and x and y are respectively the user feature vector and the item feature vector required as inputs to the double-tower model.
Further, the scheme of 2.1 is as follows:
2.1.1, the double-tower model contains two deep neural network models. The basic unit of a neural network is the perceptron, which has multiple inputs and one output; the output is a trained linear function of the inputs, and the final result is obtained by passing the output value through a nonlinear activation function;
2.1.2, a deep neural network extends the basic neural network model with an input layer, an output layer and several hidden layers; the invention adopts 3 hidden layers, with full connections between layers, full connection meaning that every node is connected to all nodes of the previous layer;
2.1.3, the user feature vector and the item feature vector are input into the corresponding neural networks. Since the model has to be trained, the outputs are expressed as functions of the parameter θ to be trained:

$$u = u(x; \theta), \quad v = v(y; \theta)$$

2.1.4, the outputs u and v are L2-normalized, the L2 normalization formula being:

$$[Y]_{L2} = \frac{Y}{\sqrt{\sum_i \gamma_i^2}}$$

where Y stands for either output vector and $\gamma_i$ represents the i-th component of the vector Y;
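An illustrative numpy sketch of the two towers, the L2 normalization and the matching-layer inner product; the layer widths and the ReLU activation are assumptions, since the invention only specifies three fully connected hidden layers:

```python
import numpy as np

def mlp_tower(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Three fully connected hidden layers with ReLU, then a linear output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)   # perceptron layer + nonlinear activation
    return weights[-1] @ h + biases[-1]  # output: u(x; θ) or v(y; θ)

def l2_normalize(y: np.ndarray) -> np.ndarray:
    return y / np.sqrt(np.sum(y ** 2))   # [Y]_L2 = Y / sqrt(Σ γ_i²)

def score(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v)                  # matching layer: inner product

# Toy parameters: 4 weight matrices = 3 hidden layers + 1 output layer.
rng = np.random.default_rng(0)
dims_u, dims_i = [32, 64, 64, 64, 16], [24, 64, 64, 64, 16]
Wu = [rng.normal(0, 0.1, (dims_u[i + 1], dims_u[i])) for i in range(4)]
bu = [np.zeros(dims_u[i + 1]) for i in range(4)]
Wi = [rng.normal(0, 0.1, (dims_i[i + 1], dims_i[i])) for i in range(4)]
bi = [np.zeros(dims_i[i + 1]) for i in range(4)]

u = l2_normalize(mlp_tower(rng.normal(size=32), Wu, bu))
v = l2_normalize(mlp_tower(rng.normal(size=24), Wi, bi))
print(score(u, v))
```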
2.2, a frequency estimation method based on hash sequences, i.e., using a hash sequence to record the steps at which samples appear, is used to reduce the sampling bias that may occur when negative samples are drawn within each batch, thereby optimizing the loss function; the following steps are iterated;
further, the scheme of 2.2 is as follows:
2.2.1, T samples are drawn at random from the sample set, expressed as follows:

$$\mathcal{T} = \{(x_i, y_i, r_i)\}_{i=1}^{T}$$

where $x_i$ represents the i-th user feature vector in the sample T, $y_i$ represents the i-th item feature vector in the sample T, and $r_i$ represents the feedback degree of the i-th user in the sample T, with $r_i \in [0, 1]$.
2.2.2, the probability $p_i$ of each $y_i$ being sampled is calculated by the frequency estimation method based on hash sequences;
the 2.2.2 flow is as follows:
2.2.2.1, a learning rate α, arrays A and D of size H, and a hash function h are set, where the hash function h maps each $y_i$ to an integer in the range [0, H];
2.2.2.2, for each step t = 1, 2, …, let $\mathcal{B}$ be the set of all items in one batch of training samples; for each item $y_i \in \mathcal{B}$, the following are executed:
2.2.2.2.1,

$$D[h(y_i)] \leftarrow (1 - \alpha) \cdot D[h(y_i)] + \alpha \cdot (t - A[h(y_i)])$$

where ← represents assignment, $A[h(y_i)]$ records the step at which $y_i$ was last sampled, and $D[h(y_i)]$ estimates the number of steps between successive samplings of $y_i$;
2.2.2.2.2,

$$A[h(y_i)] \leftarrow t$$

2.2.2.3, for each $y_i \in \mathcal{B}$, the sampling probability is

$$\hat{p}_i = \frac{1}{D[h(y_i)]}$$
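A compact sketch of this streaming frequency estimator, following the update rules of 2.2.2; the hash size, learning rate and item keys are illustrative:

```python
import numpy as np

class FrequencyEstimator:
    """Estimates each item's batch sampling probability p = 1 / D[h(y)],
    where D tracks the average number of steps between samplings."""

    def __init__(self, size: int, alpha: float):
        self.alpha = alpha
        self.size = size
        self.A = np.zeros(size)  # A[h(y)]: step at which y was last sampled
        self.D = np.zeros(size)  # D[h(y)]: estimated steps between samplings

    def _h(self, item_id) -> int:
        return hash(item_id) % self.size

    def update(self, step: int, batch_items) -> None:
        for y in set(batch_items):
            j = self._h(y)
            self.D[j] = (1 - self.alpha) * self.D[j] + self.alpha * (step - self.A[j])
            self.A[j] = step

    def prob(self, item_id) -> float:
        d = self.D[self._h(item_id)]
        return 1.0 / d if d > 0 else 1.0  # guard for items never seen

est = FrequencyEstimator(size=1 << 16, alpha=0.05)
for t, batch in enumerate([["a", "b"], ["a", "c"], ["a", "b"]], start=1):
    est.update(t, batch)
print(est.prob("a"), est.prob("c"))
```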
2.2.3, the optimized loss function $L_{\mathcal{B}}(\theta)$ is computed, with the derivation given below;
the 2.2.3 flow is as follows:
2.2.3.1, the vector inner product gives the matching score:

$$s(x, y) = \langle u(x; \theta), v(y; \theta) \rangle$$

2.2.3.2, given a user vector x, the probability of picking one item y out of the M items can be calculated with the softmax function:

$$P(y \mid x; \theta) = \frac{e^{s(x, y)}}{\sum_{j \in [M]} e^{s(x, y_j)}}$$

where e is a natural constant;
2.2.3.3, the weighted log-likelihood loss function is:

$$L_T(\theta) = -\frac{1}{T} \sum_{i \in [T]} r_i \cdot \log P(y_i \mid x_i; \theta)$$

where T denotes the T samples randomly drawn in 2.2.1;
2.2.3.4, the invention computes this with a negative sampling algorithm; the negative sampling algorithm adopts a smoothing strategy, which raises the sampling probability of low-frequency samples. Negative samples are drawn within the same batch. Given a mini-batch $\mathcal{B}$, the in-batch softmax function is:

$$P_{\mathcal{B}}(y_i \mid x_i; \theta) = \frac{e^{s(x_i, y_i)}}{\sum_{j \in [\mathcal{B}]} e^{s(x_i, y_j)}}$$

2.2.3.5, within each batch, because of the power-law distribution phenomenon (random in-batch negative sampling makes popular items easy to sample, so the loss function penalizes popular items excessively), the score is corrected by the sampling frequency:

$$s^c(x_i, y_i) = s(x_i, y_i) - \log(p_i)$$

where $s^c(x_i, y_i)$ denotes the corrected value of $s(x_i, y_i)$ and $p_i$ comes from the frequency estimation algorithm based on hash sequences of 2.2.2;
2.2.3.6, the conditional probability function thus corrected is:

$$P^c_{\mathcal{B}}(y_i \mid x_i; \theta) = \frac{e^{s^c(x_i, y_i)}}{e^{s^c(x_i, y_i)} + \sum_{j \in [\mathcal{B}], j \neq i} e^{s^c(x_i, y_j)}}$$

2.2.3.7, the final loss function is:

$$L_{\mathcal{B}}(\theta) = -\frac{1}{|\mathcal{B}|} \sum_{i \in [\mathcal{B}]} r_i \cdot \log P^c_{\mathcal{B}}(y_i \mid x_i; \theta)$$
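A numpy sketch of the bias-corrected in-batch softmax loss derived above, assuming user_vecs and item_vecs are the L2-normalized tower outputs of one batch, paired row by row, and probs holds the sampling probabilities estimated in 2.2.2:

```python
import numpy as np

def corrected_in_batch_softmax_loss(user_vecs: np.ndarray,
                                    item_vecs: np.ndarray,
                                    probs: np.ndarray,
                                    rewards: np.ndarray) -> float:
    """L_B(θ) = -(1/|B|) Σ_i r_i · log P^c_B(y_i | x_i; θ)."""
    scores = user_vecs @ item_vecs.T             # s(x_i, y_j) for all pairs in batch
    corrected = scores - np.log(probs)[None, :]  # s^c(x_i, y_j) = s - log(p_j)
    # Log-softmax over each row; the diagonal holds the positive pair (x_i, y_i).
    log_z = np.log(np.exp(corrected).sum(axis=1))
    log_p_pos = np.diag(corrected) - log_z
    return float(-(rewards * log_p_pos).mean())

B, d = 4, 16
rng = np.random.default_rng(1)
u = rng.normal(size=(B, d)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(B, d)); v /= np.linalg.norm(v, axis=1, keepdims=True)
loss = corrected_in_batch_softmax_loss(u, v, probs=np.full(B, 0.25),
                                       rewards=np.ones(B))
print(loss)
```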
2.2.4, the parameter θ is updated by a gradient descent algorithm so that it approaches the optimal value, gradient descent being a commonly used machine learning algorithm for solving model parameters;
the flow of the step 3 is as follows:
3.1, a user interaction matrix $Y \in \{0, 1\}^{|U| \times |V|}$ is given, where $y_{u,v}$ denotes the interaction between the u-th user and the v-th item, and U and V respectively represent the user set and the item set;
the expression of the user interaction matrix Y is as follows:

$$y_{u,v} = \begin{cases} 1, & \text{if an interaction between user } u \text{ and item } v \text{ is observed} \\ 0, & \text{otherwise} \end{cases}$$
3.2, $O_{i,j}$ is defined to represent the interaction of user i with item j, and the items user i has interacted with are mapped to the entities of the knowledge graph $\mathcal{G}$;
further, the 3.2 scheme is as follows:
3.2.1, O is the interaction matrix composed of the users' row vectors, and $O_{i,j}$ represents the interaction of user i with item j, where j is the index of the item;
3.2.2, a HashMap storing all items is defined; it stores key-value pairs in which the key holds the item index and the value holds the entity corresponding to the item;
3.2.3, the row vector of the user interaction matrix O expressing one user's interactions is stored in an array, defined as E;
3.2.4, a temporary set temp_set is defined for storing entities; the array E is traversed, and if the value of an element is 1, the item HashMap is accessed via the element's index to obtain the corresponding entity, which is stored in the temporary set temp_set;
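A short sketch of this mapping step; the interaction row, the item-to-entity HashMap and the entity names are illustrative:

```python
# Item index -> knowledge graph entity (the HashMap of 3.2.2).
item_to_entity = {0: "entity:film_A", 1: "entity:film_B", 2: "entity:film_C"}

# Row vector of the interaction matrix O for one user (the array E of 3.2.3).
E = [1, 0, 1]

# 3.2.4: collect the entities of items the user interacted with.
temp_set = {item_to_entity[j] for j, o in enumerate(E) if o == 1}
print(temp_set)  # {'entity:film_A', 'entity:film_C'}
```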
the flow of the step 4 is as follows:
4.1, the user's feature vector $v_{user}$ from the first step is given;
4.2, the user's temporary set temp_set is initialized according to the third step;
4.3, a HashMap of user preferences, user_map, is defined; its key stores the loop count and its value stores a set of triples;
4.4, a set used_set is defined for storing the entities already recalled;
4.5, with the loop counter k = 1, 2, 3, … and a given maximum value K, the loop exits when the number of triples in user_map exceeds K; each iteration executes the following steps:
4.5.1, the entities in temp_set are traversed and those already in used_set are removed:

temp_set ← temp_set − used_set

where ← denotes assignment and − denotes the set-difference operation;
4.5.2, the set of triples whose head entities lie in temp_set is found from the knowledge graph $\mathcal{G}$, defined as follows:

$$\mathcal{G}_k = \{(h', r', t') \in \mathcal{G} \mid h' \in temp\_set\}$$

where (h', r', t') represents a triple, h' represents a head entity, r' represents a relationship, and t' represents a tail entity;
4.5.3, because loops exist in the knowledge graph, the entities in temp_set need to be added to used_set to prevent entities that have already been recalled from being recalled again;
4.5.4, the tail entities of the triples in $\mathcal{G}_k$ are taken out and deposited into a set:

$$E_k = \{t' \mid (h', r', t') \in \mathcal{G}_k\}$$

4.5.5, the entities in $E_k$ are traversed and the corresponding item feature vectors $v_{item}$ are taken out;
4.5.6, the parameters $v_{user}$ and $v_{item}$ described in the second step are input into the double-tower model to obtain the probabilities between the user and the corresponding items in $E_k$, and the entities are sorted by probability;
4.5.7, a value τ, 0 < τ ≤ 1, is given to determine the number of screened entities:

$$E'_k = newSet\big(sortByProb(E_k).getSubVec(1, \lceil \tau \cdot |E_k| \rceil)\big)$$

where $sortByProb(E_k)$ represents sorting the elements of the set by probability and returning an array, getSubVec(i, j) represents taking the sub-array from the i-th to the j-th element of the original array, newSet() represents converting an array into a set, and ⌈·⌉ represents rounding up;
4.5.8, the triples of $\mathcal{G}_k$ whose tail entities belong to the screened set $E'_k$ are selected:

$$\Lambda = \{(h', r', t') \in \mathcal{G}_k \mid t' \in E'_k\}$$

4.5.9, the triple set Λ is stored into user_map: user_map.put(k, Λ);
4.5.10, temp_set is overwritten with the set $E'_k$;
4.6, after 4.5 finishes executing, the knowledge graph of the user's preferences is finally obtained.
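Putting step 4 together, a self-contained sketch of the iterative recall-and-screen loop; the toy knowledge graph and the score function stand in for the trained double-tower model:

```python
import math

def recall_preference_graph(kg_triples, start_entities, score_fn, tau=0.5, K=50):
    """Iteratively expand from the user's entities, screening by model score.

    kg_triples: iterable of (h, r, t) triples of the knowledge graph.
    start_entities: temp_set initialized from the user's interaction history.
    score_fn: entity -> probability; in the patent this is the double-tower
              model's prediction for (v_user, v_item).
    """
    temp_set, used_set, user_map, total = set(start_entities), set(), {}, 0
    k = 0
    while total <= K and temp_set:
        k += 1
        temp_set -= used_set                                           # 4.5.1
        G_k = [(h, r, t) for (h, r, t) in kg_triples if h in temp_set] # 4.5.2
        used_set |= temp_set                                           # 4.5.3
        tails = {t for (_, _, t) in G_k}                               # 4.5.4
        ranked = sorted(tails, key=score_fn, reverse=True)             # 4.5.5-4.5.6
        screened = set(ranked[:math.ceil(tau * len(ranked))])          # 4.5.7
        lam = [(h, r, t) for (h, r, t) in G_k if t in screened]        # 4.5.8
        user_map[k] = lam                                              # 4.5.9
        total += len(lam)
        temp_set = screened                                            # 4.5.10
    return user_map

# Toy usage with a hypothetical score function.
kg = [("A", "genre", "B"), ("A", "actor", "C"), ("B", "similar", "D")]
prefs = recall_preference_graph(kg, {"A"}, score_fn=lambda e: hash(e) % 10,
                                tau=1.0, K=10)
print(prefs)
```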
The invention has the following beneficial effects: entities preferred by the user are screened from the knowledge graph through the optimized double-tower model. The optimization of the double-tower model adopts a frequency estimation method based on hash sequences, so that items adapt better to various data distributions. Screening the knowledge graph entities not only yields better data but also keeps the recalled entities genuinely close to the user's preferences. Screening also benefits deep recall on the knowledge graph, because the number of entities recalled at each step grows explosively with the number of entities from the previous step, and without screening this would hurt both computational efficiency and the exploration of the user's potential preferences.
Detailed Description
The present invention is further described below with reference to examples.
Example:
The first step is as follows: defining a user feature vector and an item feature vector as the inputs of the double-tower model;
the process of the first step is as follows:
1.1, the user features refer to the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age; the user feature vector is obtained by converting these interaction data into vectors and concatenating them. Converting raw data into a vector is called vector embedding, a method commonly used in machine learning to represent data features; the goal is to extract features from the raw data, i.e., the low-dimensional vectors obtained after mapping through a neural network;
further, the scheme of 1.1 is as follows:
1.1.1, the embedding of the user click record is a weighted average of the id-class embeddings of all clicked items, where an id-class embedding is a vector that maps an item's unique identifier into a common dimension, and its weight is directly proportional to the item browsing time. The embedding of the user click record is calculated as:

$$v_{click} = \sum_{i=1}^{n} w_i^{click} \, v_{click,i}$$

where $v_{click}$ represents the embedding of the user click record, $w_i^{click}$ denotes the i-th weight, $v_{click,i}$ represents the id-class embedding of the i-th item in the click record, and n represents the number of embeddings. The weight is computed from the browsing time, with $w_i^{click} \propto t_i^{click}$, where $t_i^{click}$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
1.1.2, the embedding of the user search record is obtained by segmenting the keywords of historical searches into entries. After word segmentation, the embedding of each entry is obtained through a Word2vec model, and the entry embeddings of the user search record are then weighted-averaged.
Word segmentation is a technique by which a search engine segments the keyword string submitted by a user into different entries (tokens).
The Word2vec model was proposed by Mikolov et al. in 2013; it converts the content words in a text into space vectors, and the values of the word vectors are influenced by context and capture the correlations between words.
The embedding of the user search record is calculated as:

$$v_{search} = \sum_{i=1}^{n} w_i^{search} \, v_{search,i}$$

where $v_{search}$ represents the embedding of the user's search record, $w_i^{search}$ denotes the i-th weight, $v_{search,i}$ denotes the embedding of the i-th entry in the search record, and n denotes the number of embeddings;
the weight of a search-record embedding is determined by search validity, i.e., whether the user clicked the item after searching for it;
1.1.3, the user's social data comprises a weighted average of the embeddings corresponding to favorites, likes and subscription data. The embeddings corresponding to favorites and likes are the id-class embeddings of the items the user has favorited or liked; the embeddings corresponding to subscription data are the id-class embeddings of the persons in charge of the items the user subscribes to.
The embedding of the user's social data is calculated as:

$$v_{social} = \sum_{i=1}^{n} w_i^{social} \, v_{social,i}$$

where $v_{social}$ represents the embedding of the user's social data, $w_i^{social}$ denotes the i-th weight, and $v_{social,i}$ represents the embedding of the i-th piece of social data;
the weights of the favorites and likes embeddings are computed as for click records, with $w_i \propto t_i$, where $t_i$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
the weight of a subscription embedding is computed analogously, where $t_i$ denotes the browsing time of the i-th item of the subscribed person in charge, N represents the total number of samples, and k represents the total number of positive examples;
1.1.4, the user's personal data includes the user's gender, age and region. The gender feature is a simple binary feature, while age and region are continuous features normalized to real values. The embedding of the user's personal data is the vector obtained by concatenating the processed values of gender, age and region;
further, the scheme of 1.1.4 is as follows:
1.1.4.1, the user's gender is encoded as a binary value (for example, 1 for male and 0 for female);
1.1.4.2, the user's age and region are normalized via the standard score:

$$z = \frac{X - \mu}{\sigma}$$

where X represents the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3, the binary gender value and the normalized age and region values of 1.1.4 are concatenated into a vector:

$$v_{personal} = [gender, z_{age}, z_{region}]$$

where $v_{personal}$ represents the personal-data feature vector, and gender, $z_{age}$ and $z_{region}$ respectively represent the binary gender value and the normalized values of the user's age and region;
1.1.5, the click-record embedding, search-record embedding and social-data embedding of the flow in 1.1 and the embedding of the user's personal data are concatenated to obtain the user feature vector:

$$v_{user} = concatenate(v_{click}, v_{search}, v_{social}, v_{personal}) = [v_{click}[1], v_{click}[2], \ldots, v_{search}[1], v_{search}[2], \ldots, v_{social}[1], v_{social}[2], \ldots, v_{personal}[1], v_{personal}[2], \ldots]$$

where $v_{user}$ represents the user feature vector, $v_{click}[i]$ represents the i-th component of the click-record embedding, $v_{search}[i]$ the i-th component of the search-record embedding, $v_{social}[i]$ the i-th component of the social-data embedding, and $v_{personal}[i]$ the i-th component of the personal-data embedding;
1.2, the item features comprise the item's id and its context information; the item feature vector is formed by concatenating the item's id-class embedding with the embedding of its context information;
further, the scheme of 1.2 is as follows:
1.2.1, the item's id-class embedding is given, namely a vector mapping the item's unique identifier into a common dimension;
1.2.2, the item's context-information embedding is given, namely a vector obtained through Word2vec;
1.2.3, the id-class embedding and the context-information embedding of 1.2 are concatenated to obtain the item feature vector:

$$v_{item} = concatenate(v_{id}, v_{context}) = [v_{id}[1], v_{id}[2], \ldots, v_{context}[1], v_{context}[2], \ldots]$$

where $v_{item}$ represents the item feature vector, $v_{id}$ represents the item's id-class embedding, $v_{context}$ represents the embedding of the item's context information, $v_{id}[i]$ represents the i-th component of the item's id-class embedding, and $v_{context}[i]$ the i-th component of the context-information embedding;
the second step is that: training a double-tower model, and optimizing the double-tower model by combining an in-batch softmax loss function and a frequency estimation method based on a Hash sequence;
In the second step, the double-tower model is derived from DSSM (Deep Structured Semantic Model). DSSM is a deep structured semantic model commonly used to solve semantic similarity problems in natural language processing. Vertically, the double-tower model can be divided into an input layer, a representation layer and a matching layer; horizontally, it comprises two input layers, two representation layers and one matching layer. The outputs of the two input layers are the inputs of the two representation layers respectively, and the outputs of the two representation layers converge at the matching layer, so the whole structure takes the form of "double towers". In the invention, the two inputs of the input layer are the user feature vector and the item feature vector respectively; the two representation layers share the same neural network structure, and after passing through the neural networks the feature vectors yield vectors of the same dimension. Finally, the two vectors are L2-normalized and their inner product is taken. Furthermore, the optimized double-tower model adopts a frequency estimation method based on hash sequences. This approach reduces the sampling bias that may occur when negative samples are drawn within each batch, thereby optimizing the loss function.
The flow of the second step is as follows:
2.1, two parameterized embedding functions are given:

$$u(x; \theta) \in \mathbb{R}^d, \quad v(y; \theta) \in \mathbb{R}^d$$

where $\mathbb{R}^d$ denotes a d-dimensional real vector; u and v are the user and item representations extracted by the deep neural networks, and x and y are respectively the user feature vector and the item feature vector required as inputs to the double-tower model.
Further, the procedure of 2.1 is as follows:
2.1.1, the double-tower model contains two deep neural network models. The basic unit of a neural network is the perceptron, which has multiple inputs and one output; the output is a trained linear function of the inputs, and the final result is obtained by passing the output value through a nonlinear activation function;
2.1.2, a deep neural network extends the basic neural network model with an input layer, an output layer and several hidden layers; the invention adopts 3 hidden layers, with full connections between layers, full connection meaning that every node is connected to all nodes of the previous layer;
2.1.3, the user feature vector and the item feature vector are input into the corresponding neural networks. Since the model has to be trained, the outputs are expressed as functions of the parameter θ to be trained:

$$u = u(x; \theta), \quad v = v(y; \theta)$$

2.1.4, the outputs u and v are L2-normalized, the L2 normalization formula being:

$$[Y]_{L2} = \frac{Y}{\sqrt{\sum_i \gamma_i^2}}$$

where Y stands for either output vector and $\gamma_i$ represents the i-th component of the vector Y;
2.2, a frequency estimation method based on hash sequences, i.e., using a hash sequence to record the steps at which samples appear, is used to reduce the sampling bias that may occur when negative samples are drawn within each batch, thereby optimizing the loss function; the following steps are iterated;
further, the scheme of 2.2 is as follows:
2.2.1, T samples are drawn at random from the sample set, expressed as follows:

$$\mathcal{T} = \{(x_i, y_i, r_i)\}_{i=1}^{T}$$

where $x_i$ represents the i-th user feature vector in the sample T, $y_i$ represents the i-th item feature vector in the sample T, and $r_i$ represents the feedback degree of the i-th user in the sample T, with $r_i \in [0, 1]$.
2.2.2, the probability $p_i$ of each $y_i$ being sampled is calculated by the frequency estimation method based on hash sequences;
the 2.2.2 flow is as follows:
2.2.2.1, a learning rate α, arrays A and D of size H, and a hash function h are set, where the hash function h maps each $y_i$ to an integer in the range [0, H];
2.2.2.2, for each step t = 1, 2, …, let $\mathcal{B}$ be the set of all items in one batch of training samples; for each item $y_i \in \mathcal{B}$, the following are executed:
2.2.2.2.1,

$$D[h(y_i)] \leftarrow (1 - \alpha) \cdot D[h(y_i)] + \alpha \cdot (t - A[h(y_i)])$$

where ← represents assignment, $A[h(y_i)]$ records the step at which $y_i$ was last sampled, and $D[h(y_i)]$ estimates the number of steps between successive samplings of $y_i$;
2.2.2.2.2,

$$A[h(y_i)] \leftarrow t$$

2.2.2.3, for each $y_i \in \mathcal{B}$, the sampling probability is

$$\hat{p}_i = \frac{1}{D[h(y_i)]}$$
2.2.3, the optimized loss function $L_{\mathcal{B}}(\theta)$ is computed, with the derivation given below;
the 2.2.3 process is as follows:
2.2.3.1, the vector inner product gives the matching score:

$$s(x, y) = \langle u(x; \theta), v(y; \theta) \rangle$$

2.2.3.2, given a user vector x, the probability of picking one item y out of the M items can be calculated with the softmax function:

$$P(y \mid x; \theta) = \frac{e^{s(x, y)}}{\sum_{j \in [M]} e^{s(x, y_j)}}$$

where the θ after the semicolon denotes the parameters carried by the model and e is a natural constant;
2.2.3.3, the weighted log-likelihood loss function is:

$$L_T(\theta) = -\frac{1}{T} \sum_{i \in [T]} r_i \cdot \log P(y_i \mid x_i; \theta)$$

where T denotes the T samples randomly drawn in 2.2.1;
2.2.3.4, the invention computes this with a negative sampling algorithm; the negative sampling algorithm adopts a smoothing strategy, which raises the sampling probability of low-frequency samples. Negative samples are drawn within the same batch. Given a mini-batch $\mathcal{B}$, the in-batch softmax function is:

$$P_{\mathcal{B}}(y_i \mid x_i; \theta) = \frac{e^{s(x_i, y_i)}}{\sum_{j \in [\mathcal{B}]} e^{s(x_i, y_j)}}$$

2.2.3.5, within each batch, because of the power-law distribution phenomenon (random in-batch negative sampling makes popular items easy to sample, so the loss function penalizes popular items excessively), the score is corrected by the sampling frequency:

$$s^c(x_i, y_i) = s(x_i, y_i) - \log(p_i)$$

where $s^c(x_i, y_i)$ denotes the corrected value of $s(x_i, y_i)$ and $p_i$ comes from the frequency estimation algorithm based on hash sequences of 2.2.2;
2.2.3.6, the conditional probability function thus corrected is:

$$P^c_{\mathcal{B}}(y_i \mid x_i; \theta) = \frac{e^{s^c(x_i, y_i)}}{e^{s^c(x_i, y_i)} + \sum_{j \in [\mathcal{B}], j \neq i} e^{s^c(x_i, y_j)}}$$

2.2.3.7, the final loss function is:

$$L_{\mathcal{B}}(\theta) = -\frac{1}{|\mathcal{B}|} \sum_{i \in [\mathcal{B}]} r_i \cdot \log P^c_{\mathcal{B}}(y_i \mid x_i; \theta)$$
2.2.4, the parameter θ is updated by a gradient descent algorithm so that it approaches the optimal value, gradient descent being a commonly used machine learning algorithm for solving model parameters;
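An illustrative training-step sketch for 2.2.4, assuming a PyTorch implementation of the two towers with three ReLU hidden layers; the dimensions, learning rate and uniform sampling probabilities p_i are placeholders (in practice p_i would come from the frequency estimator of 2.2.2):

```python
import torch
import torch.nn.functional as F

def tower(in_dim: int, out_dim: int = 16) -> torch.nn.Module:
    # Three fully connected hidden layers (assumed ReLU) + linear output.
    return torch.nn.Sequential(
        torch.nn.Linear(in_dim, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, out_dim))

user_tower, item_tower = tower(32), tower(24)
opt = torch.optim.SGD(list(user_tower.parameters()) +
                      list(item_tower.parameters()), lr=0.05)

x = torch.randn(8, 32)                        # batch of user feature vectors
y = torch.randn(8, 24)                        # paired item feature vectors
r = torch.ones(8)                             # feedback degrees r_i
log_p = torch.log(torch.full((8,), 1 / 8.0))  # log p_i (uniform placeholder)

for step in range(100):
    u = F.normalize(user_tower(x), dim=1)     # L2-normalized u(x; θ)
    v = F.normalize(item_tower(y), dim=1)     # L2-normalized v(y; θ)
    s_c = u @ v.T - log_p[None, :]            # corrected scores s^c(x_i, y_j)
    loss = -(r * (s_c.diag() - torch.logsumexp(s_c, dim=1))).mean()
    opt.zero_grad(); loss.backward(); opt.step()  # 2.2.4: gradient descent on θ
```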
The third step: defining the entity mapping relation between the user historical interaction matrix and the knowledge graph;
the flow of the third step is as follows:
3.1, a user interaction matrix $Y \in \{0, 1\}^{|U| \times |V|}$ is given, where $y_{u,v}$ denotes the interaction between the u-th user and the v-th item, and U and V respectively represent the user set and the item set;
the expression of the user interaction matrix Y is as follows:

$$y_{u,v} = \begin{cases} 1, & \text{if an interaction between user } u \text{ and item } v \text{ is observed} \\ 0, & \text{otherwise} \end{cases}$$
3.2, $O_{i,j}$ is defined to represent the interaction of user i with item j, and the items user i has interacted with are mapped to the entities of the knowledge graph $\mathcal{G}$;
further, the 3.2 scheme is as follows:
3.2.1, O is the interaction matrix composed of the users' row vectors, and $O_{i,j}$ represents the interaction of user i with item j, where j is the index of the item;
3.2.2, a HashMap storing all items is defined; it stores key-value pairs in which the key holds the item index and the value holds the entity corresponding to the item;
3.2.3, the row vector of the user interaction matrix O expressing one user's interactions is stored in an array, defined as E;
3.2.4, a temporary set temp_set is defined for storing entities; the array E is traversed, and if the value of an element is 1, the item HashMap is accessed via the element's index to obtain the corresponding entity, which is stored in the temporary set temp_set;
The fourth step: propagating preference entities: at each propagation step, inputting the recalled entities together with the user features into the optimized double-tower model to obtain prediction probabilities, screening the high-probability entities, and finally obtaining a knowledge graph representing the user's preferences and potential preferences;
the flow of the fourth step is as follows:
4.1, the user's feature vector $v_{user}$ from the first step is given;
4.2, the user's temporary set temp_set is initialized according to the third step;
4.3, a HashMap of user preferences, user_map, is defined; its key stores the loop count and its value stores a set of triples;
4.4, a set used_set is defined for storing the entities already recalled;
4.5, with the loop counter k = 1, 2, 3, … and a given maximum value K, the loop exits when the number of triples in user_map exceeds K; each iteration executes the following steps:
4.5.1, the entities in temp_set are traversed and those already in used_set are removed:

temp_set ← temp_set − used_set

where ← denotes assignment and − denotes the set-difference operation;
4.5.2, the set of triples whose head entities lie in temp_set is found from the knowledge graph $\mathcal{G}$, defined as follows:

$$\mathcal{G}_k = \{(h', r', t') \in \mathcal{G} \mid h' \in temp\_set\}$$

where (h', r', t') represents a triple, h' represents a head entity, r' represents a relationship, and t' represents a tail entity;
4.5.3, because loops exist in the knowledge graph, the entities in temp_set need to be added to used_set to prevent entities that have already been recalled from being recalled again;
4.5.4, the tail entities of the triples in $\mathcal{G}_k$ are taken out and deposited into a set:

$$E_k = \{t' \mid (h', r', t') \in \mathcal{G}_k\}$$

4.5.5, the entities in $E_k$ are traversed and the corresponding item feature vectors $v_{item}$ are taken out;
4.5.6, the parameters $v_{user}$ and $v_{item}$ described in the second step are input into the double-tower model to obtain the probabilities between the user and the corresponding items in $E_k$, and the entities are sorted by probability;
4.5.7, a value τ, 0 < τ ≤ 1, is given to determine the number of screened entities:

$$E'_k = newSet\big(sortByProb(E_k).getSubVec(1, \lceil \tau \cdot |E_k| \rceil)\big)$$

where $sortByProb(E_k)$ represents sorting the elements of the set by probability and returning an array, getSubVec(i, j) represents taking the sub-array from the i-th to the j-th element of the original array, newSet() represents converting an array into a set, and ⌈·⌉ represents rounding up;
4.5.8, the triples of $\mathcal{G}_k$ whose tail entities belong to the screened set $E'_k$ are selected:

$$\Lambda = \{(h', r', t') \in \mathcal{G}_k \mid t' \in E'_k\}$$

4.5.9, the triple set Λ is stored into user_map: user_map.put(k, Λ);
4.5.10, temp_set is overwritten with the set $E'_k$;
4.6, after 4.5 finishes executing, the knowledge graph of the user's preferences is finally obtained.

Claims (5)

1. A knowledge graph user preference entity recall method based on a double-tower model, characterized by comprising the following steps:
1) defining a user feature vector and an item feature vector as the inputs of the double-tower model;
2) training the double-tower model, optimizing it by combining an in-batch softmax loss function with a frequency estimation method based on hash sequences;
3) defining the entity mapping relation between the user historical interaction matrix and the knowledge graph;
4) propagating preference entities: at each propagation step, inputting the recalled entities together with the user features into the optimized double-tower model to obtain prediction probabilities, screening the entities according to the prediction probabilities, and finally obtaining a knowledge graph representing the user's preferences and potential preferences.
2. The method for recalling knowledge-graph user preference entity based on double-tower model according to claim 1, wherein the specific process of step 1) is as follows:
1.1) the user features refer to the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age; the user feature vector is obtained by converting these interaction data into vectors and concatenating them; the mode of converting raw data into a vector is called vector embedding;
1.1.1) the embedding of the user click record is a weighted average of the id-class embeddings of the clicked items, where an id-class embedding is a vector that maps an item's unique identifier into a common dimension, and its weight is directly proportional to the item browsing time; the embedding of the user click record is calculated as:

$$v_{click} = \sum_{i=1}^{n} w_i^{click} \, v_{click,i}$$

where $v_{click}$ represents the embedding of the user click record, $w_i^{click}$ denotes the i-th weight, $v_{click,i}$ represents the id-class embedding of the i-th item in the click record, and n represents the number of embeddings; the weight is computed from the browsing time, with $w_i^{click} \propto t_i^{click}$, where $t_i^{click}$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
1.1.2) the embedding of the user search record is obtained by segmenting the keywords of historical searches into entries; after word segmentation, the embedding of each entry is obtained through a Word2vec model, and the entry embeddings of the user search record are then weighted-averaged;
the embedding of the user search record is calculated as:

$$v_{search} = \sum_{i=1}^{n} w_i^{search} \, v_{search,i}$$

where $v_{search}$ represents the embedding of the user's search record, $w_i^{search}$ denotes the i-th weight, $v_{search,i}$ denotes the embedding of the i-th entry in the search record, and n denotes the number of embeddings;
the weight of a search-record embedding is determined by search validity, i.e., whether the user clicked the item after searching for it;
1.1.3) the user's social data comprises a weighted average of the embeddings corresponding to favorites, likes and subscription data; the embeddings corresponding to favorites and likes are the id-class embeddings of the items the user has favorited or liked; the embeddings corresponding to subscription data are the id-class embeddings of the persons in charge of the items the user subscribes to;
the embedding of the user's social data is calculated as:

$$v_{social} = \sum_{i=1}^{n} w_i^{social} \, v_{social,i}$$

where $v_{social}$ represents the embedding of the user's social data, $w_i^{social}$ denotes the i-th weight, and $v_{social,i}$ represents the embedding of the i-th piece of social data;
the weights of the favorites and likes embeddings are computed as for click records, with $w_i \propto t_i$, where $t_i$ represents the time the user spent browsing item i, N represents the total number of samples, and k represents the total number of positive examples;
the weight of a subscription embedding is computed analogously, where $t_i$ denotes the browsing time of the i-th item of the subscribed person in charge, N represents the total number of samples, and k represents the total number of positive examples;
1.1.4) personal data of a user includes the user's gender, age, and location; the neutral characteristic is a simple binary characteristic, the age and the region belong to a continuous characteristic, and the continuous characteristic is normalized into a real numerical value in a [0,1] interval; embedding of user personal data is a vector obtained by splicing the processed values of gender, age and region;
1.1.4.1) calculates a binary representation of the user's gender, which is formulated as follows:
Figure FDA0003517220720000031
1.1.4.2) calculating the normalized real value of the age and the region of the user, wherein the normalized formula is as follows:
Figure FDA0003517220720000032
wherein X represents the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3) splicing the sex binary value, age and the region normalized real value in the step 1.1.4) to obtain a vector, wherein the vector splicing operation formula is as follows:
vpersonal=[gender,zage,zregion]
wherein v ispersonalRepresenting the user feature vector, gender, zageAnd zregionNormalized values respectively representing the age and region of the user;
1.1.5) perform a concatenate operation on the user click-record embedding, the user search-record embedding, the user social-data embedding, and the user personal-data embedding obtained through the process of step 1.1) to obtain the user feature vector; the formula is as follows:

v_user = concatenate(v_click, v_search, v_social, v_personal) = [v_click[1], v_click[2], …, v_search[1], v_search[2], …, v_social[1], v_social[2], …, v_personal[1], v_personal[2], …]

where v_user denotes the user feature vector, v_click[i] denotes the i-th component of the user click-record embedding, v_search[i] the i-th component of the user search-record embedding, v_social[i] the i-th component of the user social-data embedding, and v_personal[i] the i-th component of the user personal-data embedding;
1.2) the item features comprise the item's id and its context information; the item feature vector is formed by splicing the item's id-class embedding with the embedding of its context information;
1.2.1) the item's id-class embedding is given; the id-class embedding is a vector that maps the item's unique identifier into a common dimension;
1.2.2) the item's context-information embedding is given; the context information is converted into a vector by Word2vec;
1.2.3) perform a concatenate operation on the id-class embedding and the context-information embedding from step 1.2) to obtain the item feature vector; the formula is as follows:

v_item = concatenate(v_id, v_context) = [v_id[1], v_id[2], …, v_context[1], v_context[2], …]

where v_item denotes the item feature vector, v_id denotes the item's id-class embedding, v_context denotes the item's context-information embedding, v_id[i] denotes the i-th component of the item's id-class embedding, and v_context[i] denotes the i-th component of the context-information embedding.
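The concatenate operations of steps 1.1.5) and 1.2.3) correspond directly to numpy.concatenate; the component vectors below are random stand-ins with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
v_click, v_search = rng.normal(size=8), rng.normal(size=8)
v_social, v_personal = rng.normal(size=8), np.array([1.0, -0.7, 0.6])
v_id, v_context = rng.normal(size=16), rng.normal(size=8)

# Splice the component embeddings into the user and item feature vectors.
v_user = np.concatenate([v_click, v_search, v_social, v_personal])
v_item = np.concatenate([v_id, v_context])
print(v_user.shape, v_item.shape)  # (27,) (24,)
```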
3. The knowledge-graph user preference entity recall method based on the double-tower model according to claim 1, wherein the specific process of step 2) is as follows:
2.1) two parameterized embedding functions are given:

u = u(x, θ), v = v(y, θ)

where u, v ∈ ℝ^d, with ℝ^d denoting a d-dimensional real vector; u and v are the user feature vector and the item feature vector extracted by the deep neural networks, and x and y are respectively the input values required by the two towers of the model;
2.1.1) the double-tower model contains two deep neural network models; the basic building unit of a neural network model is the perceptron, which has multiple inputs and one output; the output is a trained linear combination of the inputs, and the final result is obtained by passing the output value through a nonlinear activation function;
2.1.2) the deep neural network extends the basic neural network model and has an input layer, an output layer, and multiple hidden layers;
2.1.3) input the user feature vector and the item feature vector into their corresponding neural networks; since the model needs to be trained, each output result is represented by a function containing the parameter θ to be trained; the output results are respectively:

u(x, θ) and v(y, θ)

where θ denotes the parameter to be trained;
2.1.4) the output results u(x, θ) and v(y, θ) are L2-normalized; the L2 normalization formula is as follows:

[Y]_i = γ_i / √(Σ_j γ_j²)

where γ_i denotes the i-th component of the vector Y;
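A minimal numpy sketch of steps 2.1.1)–2.1.4): each tower is a small multilayer perceptron whose final output is L2-normalized; the layer sizes and the ReLU activation are assumptions, since the claims do not fix them:

```python
import numpy as np

def init_tower(sizes, rng):
    """One (weights, bias) pair per perceptron layer."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def tower_forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b                     # trained linear combination
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)        # nonlinear activation (ReLU)
    return x / np.linalg.norm(x)          # L2 normalization: y_i / ||y||_2

rng = np.random.default_rng(2)
user_tower = init_tower([27, 64, 32], rng)   # input dims match the toy
item_tower = init_tower([24, 64, 32], rng)   # v_user / v_item above
u = tower_forward(user_tower, rng.normal(size=27))
v = tower_forward(item_tower, rng.normal(size=24))
print(float(u @ v))  # inner-product score s(x, y), here in [-1, 1]
```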
2.2) a frequency estimation method based on a hash sequence (the hash sequence records the sample indices) is used to reduce the sampling bias that may occur among the negative samples in each batch, thereby optimizing the loss function; the following steps are repeated in a loop:
2.2.1) randomly take T samples from the sample set, expressed as follows:

{(x_i, y_i, r_i)}_{i=1}^{T}

where x_i denotes the i-th user feature vector among the T samples, y_i denotes the i-th item feature vector, and r_i denotes the feedback degree of the i-th user, with r_i ∈ [0,1];
2.2.2) compute the probability p_i of each y_i in the samples using the hash-sequence-based frequency estimation method;
2.2.2.1) set arrays A and D of size H, a learning rate α, and a hash function h;
where the hash function h maps each y_i to an integer in the range [0, H];
2.2.2.2) for each step t = 1, 2, …, and for each item y in the set of items contained in one batch of training samples, perform:
2.2.2.2.1) D[h(y)] ← (1 − α) · D[h(y)] + α · (t − A[h(y)])
where ← denotes assignment, A[h(y)] denotes the step at which y was last sampled, and D[h(y)] denotes the estimated number of steps between two successive samplings of y;
2.2.2.2.2) A[h(y)] ← t
2.2.2.3) for each y, the sampling probability is p = 1 / D[h(y)];
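The following sketch implements the hash-based frequency estimation of step 2.2.2) as reconstructed above; initializing D to 1 (to avoid division by zero before an item is first seen) is an assumption:

```python
import numpy as np

H, alpha = 1024, 0.05
A = np.zeros(H)          # A[h(y)]: step at which y was last seen
D = np.ones(H)           # D[h(y)]: moving average of the sampling gap

def h(item_id):
    return hash(item_id) % H            # hash into [0, H)

def update(step, batch_items):
    for y in batch_items:
        idx = h(y)
        D[idx] = (1 - alpha) * D[idx] + alpha * (step - A[idx])  # 2.2.2.2.1)
        A[idx] = step                                            # 2.2.2.2.2)

def sampling_prob(y):
    return 1.0 / D[h(y)]                # 2.2.2.3)

# Toy stream: a hot item appears every step, a cold one every 10 steps.
for t in range(1, 101):
    update(t, ["hot_item"] + (["cold_item"] if t % 10 == 0 else []))
print(sampling_prob("hot_item"),   # stays at 1.0
      sampling_prob("cold_item"))  # decays toward 0.1 as D approaches 10
```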
2.2.3) calculate the optimized loss function and give the derivation process;
2.2.3.1) the vector inner product formula is given:

s(x, y) = ⟨u(x, θ), v(y, θ)⟩
2.2.3.2) given a user feature vector x, the probability of obtaining one item y among the M items can be calculated using the softmax function, formulated as follows:

P(y | x; θ) = e^{s(x, y)} / Σ_{j=1}^{M} e^{s(x, y_j)}

where e is the natural constant;
2.2.3.3) the weighted log-likelihood loss function is:

L_T(θ) = −(1/T) · Σ_{i∈[T]} r_i · log(P(y_i | x_i; θ))

where T denotes the T samples randomly taken in step 2.2.1);
2.2.3.4) the calculation uses a negative sampling algorithm; the negative sampling algorithm adopts a smoothing strategy that raises the sampling probability of low-frequency samples, and the negative samples are drawn within the same batch; given a mini-batch of B samples, the batch-softmax function is:

P_B(y_i | x_i; θ) = e^{s(x_i, y_i)} / Σ_{j∈[B]} e^{s(x_i, y_j)}
2.2.3.5) within each batch, the sampling is corrected by frequency; because item popularity follows a power-law distribution, randomly sampling negatives within each batch makes popular items easy to sample, so popular items would be excessively penalized in the loss function; the correction is the following formula:

s_c(x_i, y_i) = s(x_i, y_i) − log(p_i)

where s_c(x_i, y_i) denotes the corrected value of s(x_i, y_i), and p_i is the sampling probability of y_i;
2.2.3.6) the conditional probability function thus corrected is:

P_B^c(y_i | x_i; θ) = e^{s_c(x_i, y_i)} / ( e^{s_c(x_i, y_i)} + Σ_{j∈[B], j≠i} e^{s_c(x_i, y_j)} )
2.2.3.7) the final loss function is:

L_B(θ) = −(1/B) · Σ_{i∈[B]} r_i · log(P_B^c(y_i | x_i; θ))
2.2.4) update the parameter θ using a gradient descent algorithm so that it approaches the optimal value; gradient descent is a commonly used machine-learning algorithm for solving model parameters.
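A numpy sketch of steps 2.2.3.4)–2.2.3.7): scores for all in-batch pairs, the log(p) correction, the batch softmax, and the weighted log-likelihood loss. In practice θ would then be updated by gradient descent on this loss (step 2.2.4)), e.g. via an autodiff framework; the uniform p used here is a placeholder for the estimates of step 2.2.2):

```python
import numpy as np

def corrected_batch_loss(U, V, r, p):
    """U, V: (B, d) L2-normalized tower outputs for one batch;
    r: (B,) feedback degrees; p: (B,) sampling probabilities of the items."""
    S = U @ V.T                        # s(x_i, y_j) for all in-batch pairs
    Sc = S - np.log(p)[None, :]        # 2.2.3.5) subtract log(p_j) per column
    Sc -= Sc.max(axis=1, keepdims=True)                       # stability
    P = np.exp(Sc) / np.exp(Sc).sum(axis=1, keepdims=True)    # 2.2.3.6)
    return -np.mean(r * np.log(np.diag(P)))                   # 2.2.3.7)

rng = np.random.default_rng(3)
B, d = 8, 32
U = rng.normal(size=(B, d)); U /= np.linalg.norm(U, axis=1, keepdims=True)
V = rng.normal(size=(B, d)); V /= np.linalg.norm(V, axis=1, keepdims=True)
print(corrected_batch_loss(U, V, r=np.ones(B), p=np.full(B, 1 / B)))
```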
4. The knowledge-graph user preference entity recall method based on the double-tower model according to claim 1, wherein the specific process of step 3) is as follows:
3.1) a user interaction matrix O ∈ {0,1}^{|U|×|V|} is given, where O_{i,j} denotes the interaction between the i-th user and the j-th item, and U and V denote the user set and the item set respectively;
the user interaction matrix O is expressed as follows:

O_{i,j} = 1 if user i has interacted with item j; O_{i,j} = 0 otherwise;
3.2) O_{i,j} is defined to represent the interaction of user i with item j, and the items user i has interacted with are mapped to entities in the knowledge graph;
3.2.1) O is the interaction matrix, composed of the row vectors of the users; O_{i,j} denotes the interaction of user i with item j, where j is the index of the item;
3.2.2) define a HashMap for storing all items; the HashMap stores key-value pairs, where the key stores the item index and the value stores the entity corresponding to the item;
3.2.3) the interaction of one user is expressed by that user's row vector of the interaction matrix O; the row vector is stored in an array, defined as E;
3.2.4) define a temporary set temp_set for storing entities; traverse the array E, and if the value of an element is 1, access the item HashMap by that element's index, obtain the corresponding entity, and store the entity in the temporary set temp_set.
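Steps 3.2.2)–3.2.4) amount to a dictionary lookup over one row of the interaction matrix; item_map and the toy row E below are illustrative stand-ins:

```python
# item_map plays the role of the item HashMap (index -> entity);
# E is one user's row vector of the interaction matrix O.
item_map = {0: "entity_A", 1: "entity_B", 2: "entity_C", 3: "entity_D"}
E = [1, 0, 1, 0]

temp_set = set()
for idx, value in enumerate(E):
    if value == 1:                     # user interacted with item idx
        temp_set.add(item_map[idx])    # look up the entity by index

print(temp_set)                        # {'entity_A', 'entity_C'}
```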
5. The knowledge-graph user preference entity recall method based on the double-tower model according to claim 1, wherein the specific process of step 4) is as follows:
4.1) for a given user, take the user feature vector v_user from step 1);
4.2) initialize the user's temporary set temp_set according to step 3);
4.3) define a user-preference HashMap, user_map, whose key stores the loop count and whose value stores a set of triples;
4.4) define a set used_set for storing the entities that have already been recalled;
4.5) let the loop count k = 1, 2, 3, …; a maximum value K is given, and when the number of triples in user_map exceeds K, the loop exits; each iteration executes the following steps:
4.5.1) traverse the entities in temp_set and remove the entities already in used_set:

temp_set ← temp_set − used_set

where ← denotes assignment and − denotes the set-difference operation;
4.5.2) find the triple set Λ from the knowledge graph G and store it in user_map, defined as follows:

Λ = {(h', r', t') | h' ∈ temp_set and (h', r', t') ∈ G}

where (h', r', t') denotes a triple, h' denotes the head entity, r' denotes the relation, and t' denotes the tail entity;
4.5.3) because the knowledge graph may contain cycles, the entities in temp_set need to be added to used_set to prevent entities from being recalled repeatedly;
4.5.4) take out the tail entities of the triples in Λ and store them in a set (denoted E_tail here);
4.5.5) traverse the entities in E_tail and take out the corresponding item feature vectors v_item;
4.5.6) input the parameters v_user and v_item described in step 2) into the double-tower model to obtain the probabilities of the user with respect to the items corresponding to the entities in E_tail, and sort the entities by probability;
4.5.7) a value τ, 0 < τ ≤ 1, is given to determine the number of entities retained by screening:

E_tail ← newSet(sort(E_tail).getSubVec(1, ⌈τ · |E_tail|⌉))

where sort(·) sorts the elements of the set by probability and returns an array, getSubVec(i, j) takes the sub-array consisting of the i-th to j-th elements of the original array, newSet() converts an array into a set, and ⌈·⌉ denotes rounding up;
4.5.8) from Λ, screen out the triples whose tail entities belong to the screened set E_tail:

Λ ← {(h', r', t') ∈ Λ | t' ∈ E_tail}
4.5.9) store the triple set Λ into user_map: user_map(k, Λ);
4.5.10) overwrite temp_set with the set E_tail;
4.6) when the loop in 4.5) terminates, the knowledge graph preferred by the user is finally obtained.
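The loop of step 4) can be condensed into the following sketch; the toy knowledge graph kg, the stand-in score table (replacing the two-tower probability of step 4.5.6)), and the set name E_tail/keep are illustrative assumptions:

```python
import math

kg = [("e1", "r1", "e2"), ("e1", "r2", "e3"), ("e2", "r1", "e4"),
      ("e3", "r1", "e1"), ("e4", "r2", "e5")]
score = {e: 1.0 / (i + 1) for i, e in enumerate(["e2", "e3", "e4", "e5", "e1"])}

def recall(seed_entities, K=4, tau=0.5):
    temp_set, used_set, user_map, k = set(seed_entities), set(), {}, 0
    while sum(len(v) for v in user_map.values()) <= K and temp_set:
        k += 1
        temp_set -= used_set                              # 4.5.1)
        triples = {t for t in kg if t[0] in temp_set}     # 4.5.2) set Λ
        used_set |= temp_set                              # 4.5.3) avoid cycles
        tails = {t[2] for t in triples}                   # 4.5.4) E_tail
        ranked = sorted(tails, key=lambda e: score[e], reverse=True)  # 4.5.6)
        keep = set(ranked[:math.ceil(tau * len(ranked))]) if ranked else set()
        triples = {t for t in triples if t[2] in keep}    # 4.5.8)
        user_map[k] = triples                             # 4.5.9)
        temp_set = keep                                   # 4.5.10)
    return user_map

print(recall({"e1"}))  # per-hop triple sets forming the preferred subgraph
```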
CN202210169936.1A 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model Pending CN114564594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210169936.1A CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210169936.1A CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Publications (1)

Publication Number Publication Date
CN114564594A true CN114564594A (en) 2022-05-31

Family

ID=81714548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210169936.1A Pending CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Country Status (1)

Country Link
CN (1) CN114564594A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150504A (en) * 2023-04-17 2023-05-23 特斯联科技集团有限公司 Recommendation method and device for processing long tail distribution, computer storage medium and terminal


Similar Documents

Publication Publication Date Title
CN111523047B (en) Multi-relation collaborative filtering algorithm based on graph neural network
CN108920641B (en) Information fusion personalized recommendation method
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112417306B (en) Method for optimizing performance of recommendation algorithm based on knowledge graph
CN110674407A (en) Hybrid recommendation method based on graph convolution neural network
CN111382283B (en) Resource category label labeling method and device, computer equipment and storage medium
CN109471982B (en) Web service recommendation method based on QoS (quality of service) perception of user and service clustering
CN113918833B (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
CN114693397A (en) Multi-view multi-modal commodity recommendation method based on attention neural network
CN110909785B (en) Multitask Triplet loss function learning method based on semantic hierarchy
CN115982467A (en) Multi-interest recommendation method and device for depolarized user and storage medium
CN110083766B (en) Query recommendation method and device based on meta-path guiding embedding
CN111523040A (en) Social contact recommendation method based on heterogeneous information network
CN117370674B (en) Multitask recommendation algorithm integrating user behaviors and knowledge patterns
CN114564594A (en) Knowledge graph user preference entity recall method based on double-tower model
Zhu et al. Multimodal sparse linear integration for content-based item recommendation
Chen et al. Poverty/investment slow distribution effect analysis based on Hopfield neural network
CN116662564A (en) Service recommendation method based on depth matrix decomposition and knowledge graph
CN113704439B (en) Conversation recommendation method based on multi-source information heteromorphic graph
CN109885758A (en) A kind of recommended method of the novel random walk based on bigraph (bipartite graph)
Duan et al. A Hybrid Recommendation System based on Fuzzy C-Means Clustering and Supervised Learning.
CN115757897A (en) Intelligent culture resource recommendation method based on knowledge graph convolution network
CN110119465B (en) Mobile phone application user preference retrieval method integrating LFM potential factors and SVD

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination