CN112307281A

CN112307281A - Entity recommendation method and device

Info

Publication number: CN112307281A
Application number: CN201910677886.6A
Authority: CN
Inventors: 佟娜; 刘智朋; 许静芳; 陈炜鹏
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2021-02-02

Abstract

The application discloses an entity recommendation method and device. When a user performs a search query, the target query request input by the user is acquired and the target entity words included in it are extracted. A word vector of each target entity word is then obtained from a pre-trained entity word vector set, and a word vector corresponding to the target query request is determined from the word vectors of the target entity words it contains. The entity word vector set is searched for at least one word vector with high similarity to the word vector of the target query request, and the entity words corresponding to the found word vectors are taken as recommended words for the target query request and recommended to the user. Because the recommended words are determined from the entity words contained in the target query request itself, they are ensured to be related to the target entity words, which improves recommendation accuracy.

Description

Entity recommendation method and device

Technical Field

The application relates to the field of Internet technology, and in particular to an entity recommendation method and device.

Background

With the development of Internet technology, users can use a search engine to query content of interest. When a user searches for a keyword, related keywords can be recommended so that the user can go on to view content about them; this process of recommending keywords can be regarded as entity recommendation.

In traditional entity recommendation, the user's query request is expanded into several related query requests, and entity recommendation is then performed using those related query requests. In some cases, however, a related query request differs greatly from the original query request, so the recommended entities correlate poorly with the entities in the original query request and the entity recommendation is inaccurate.

Disclosure of Invention

In view of this, embodiments of the present application provide an entity recommendation method and apparatus to solve the technical problem of inaccurate entity recommendation in the prior art.

To solve the above problem, the embodiments of the present application provide the following technical solutions:

An entity recommendation method, the method comprising:

acquiring a target query request input by a user, and extracting target entity words included in the target query request;

obtaining a word vector of each target entity word from a pre-generated entity word vector set, and determining a word vector corresponding to the target query request from the word vectors of the target entity words included in the target query request, wherein the entity word vector set comprises the word vectors corresponding to entity words and is obtained by training on at least one related entity word set;

searching the entity word vector set for at least one word vector with high similarity to the word vector corresponding to the target query request, and determining the entity words corresponding to the found word vectors as recommended words corresponding to the target query request;

and recommending the recommended word to the user.

In a possible implementation manner, the determining, according to the word vector of each target entity word included in the target query request, a word vector corresponding to the target query request includes:

calculating the average of the word vectors of all target entity words included in the target query request, and determining the average as the word vector corresponding to the target query request.

In a possible implementation manner, the determining the entity word corresponding to the at least one found word vector as the recommended word corresponding to the target query request includes:

ranking the entity words corresponding to the found word vectors by their relevance to the target query request;

and determining the entity words whose ranking results meet a preset condition as recommended words corresponding to the target query request.

In one possible implementation, the method further includes:

acquiring at least one related entity word set;

and training a language model on the at least one related entity word set to obtain the word vectors of the entity words, which form the entity word vector set.
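The document does not name the language model. Assuming a word2vec-style skip-gram model in which each related entity word set plays the role of one context window, the training data could be prepared as in the following sketch. The function name `skipgram_pairs` and the sample sets are hypothetical, and the model training itself is not shown.

```python
from itertools import permutations
from typing import Iterable, List, Set, Tuple


def skipgram_pairs(related_sets: Iterable[Set[str]]) -> List[Tuple[str, str]]:
    """Emit (center, context) training pairs, treating each related entity
    word set as one context window (hypothetical skip-gram preparation)."""
    pairs: List[Tuple[str, str]] = []
    for word_set in related_sets:
        # Sets are unordered; sort so the output order is deterministic.
        words = sorted(word_set)
        # Every ordered pair of distinct words in the set is one training pair.
        pairs.extend(permutations(words, 2))
    return pairs


related_sets = [
    {"wedding", "wedding dress", "standing dress"},
    {"sofa", "coffee table"},
]
pairs = skipgram_pairs(related_sets)
print(len(pairs))  # 3 * 2 + 2 * 1 = 8 ordered pairs
```

Words from different related sets never form a pair, so only entity words judged related end up as each other's context.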

In one possible implementation manner, the obtaining at least one related entity word set includes:

acquiring at least one query request input by the same user within a preset time period;

extracting entity words from each of the query requests;

and constructing a related entity word set according to the extracted entity words.
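A minimal sketch of collecting the entity words of one user's queries within a preset time period. It assumes a hypothetical log format of `(user_id, timestamp_seconds, entity_words)` records in which entity words have already been extracted, and it interprets the preset time length as a window starting at the first query of each session; both are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (user_id, timestamp in seconds, entity words of one query).
LogRecord = Tuple[str, int, List[str]]


def group_sessions(log: List[LogRecord], window_seconds: int) -> List[List[List[str]]]:
    """Group each user's queries into sessions of at most window_seconds,
    measured from the first query of the session. Each session is a list of
    per-query entity word lists."""
    by_user: Dict[str, List[LogRecord]] = defaultdict(list)
    for record in log:
        by_user[record[0]].append(record)

    sessions: List[List[List[str]]] = []
    for records in by_user.values():
        records.sort(key=lambda r: r[1])  # chronological order per user
        current: List[List[str]] = []
        start: Optional[int] = None
        for _, timestamp, entities in records:
            if start is None or timestamp - start > window_seconds:
                if current:
                    sessions.append(current)  # close the previous session
                current, start = [], timestamp
            current.append(entities)
        if current:
            sessions.append(current)
    return sessions


log = [
    ("u1", 0, ["wedding"]),
    ("u1", 60, ["wedding dress", "venue"]),
    ("u1", 4000, ["sofa"]),
    ("u2", 10, ["tent"]),
]
print(group_sessions(log, window_seconds=1800))
```

Keeping the entity words grouped per query matters for the next step, which pairs only words that come from different query requests.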

In a possible implementation manner, the constructing a related entity word set according to the extracted entity words includes:

pairing entity words extracted from different query requests to form entity word pairs;

determining whether the two entity words in each entity word pair are related;

and eliminating the entity word pairs whose two entity words are not related, and, among the remaining entity word pairs, assigning the entity words of pairs that share an entity word to the same related entity word set, thereby constructing at least one related entity word set.
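The pairing and grouping steps can be sketched as follows. Pairs are formed only across different queries of one session, unrelated pairs are dropped through a caller-supplied `is_related` predicate, and pairs that share an entity word are merged. The merging step is interpreted here as taking connected components with a union-find structure, so grouping is transitive; this interpretation and all names are assumptions, not the document's stated implementation.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def cross_query_pairs(session: List[List[str]]) -> Set[Pair]:
    """Combine entity words from different queries of one session pairwise."""
    pairs: Set[Pair] = set()
    for i, j in combinations(range(len(session)), 2):
        for a in session[i]:
            for b in session[j]:
                if a != b:
                    pairs.add((a, b) if a < b else (b, a))  # unordered pair
    return pairs


def build_related_sets(pairs: Iterable[Pair],
                       is_related: Callable[[str, str], bool]) -> List[Set[str]]:
    """Drop unrelated pairs, then merge pairs sharing an entity word
    (connected components via union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if is_related(a, b):  # unrelated pairs are eliminated here
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    return list(groups.values())


session = [["wedding"], ["wedding dress", "venue"], ["sofa"]]
pairs = cross_query_pairs(session)
related = build_related_sets(pairs, lambda a, b: "sofa" not in (a, b))
print(related)
```

An entity word that appears only in eliminated pairs (here "sofa") never enters any related entity word set.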

In one possible implementation manner, the determining whether two entity words in the entity word pair have a correlation includes:

calculating a relevance value between the two entity words in the entity word pair;

when the relevance value is greater than or equal to a first threshold, the two entity words in the entity word pair are related;

and when the relevance value is less than the first threshold, the two entity words in the entity word pair are not related;

or, acquiring the number of times the two entity words in the entity word pair appear in the same query request;

when the number of times is greater than or equal to a second threshold, the two entity words in the entity word pair are related;

and when the number of times is less than the second threshold, the two entity words in the entity word pair are not related.
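The two alternative relatedness tests can be sketched as predicates for the grouping step. The document does not say how the relevance value is computed, so `score` is a hypothetical caller-supplied function; the co-occurrence test counts how many query requests contain both entity words. Threshold values are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Iterable, List

Predicate = Callable[[str, str], bool]


def related_by_score(score: Callable[[str, str], float], first_threshold: float) -> Predicate:
    """Test 1: related iff the relevance value is >= first_threshold."""
    return lambda a, b: score(a, b) >= first_threshold


def cooccurrence_counts(queries: Iterable[List[str]]) -> Counter:
    """Count, for each unordered entity word pair, the number of query
    requests that contain both words."""
    counts: Counter = Counter()
    for entities in queries:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def related_by_cooccurrence(counts: Counter, second_threshold: int) -> Predicate:
    """Test 2: related iff the pair co-occurs in >= second_threshold queries."""
    return lambda a, b: counts[(a, b) if a < b else (b, a)] >= second_threshold


queries = [["wedding", "wedding dress"], ["wedding dress", "wedding"], ["wedding", "sofa"]]
is_related = related_by_cooccurrence(cooccurrence_counts(queries), second_threshold=2)
print(is_related("wedding dress", "wedding"), is_related("wedding", "sofa"))  # True False
```

Either predicate can be passed as the `is_related` argument of a pair-filtering step; looking up a missing pair in a `Counter` yields 0, so unseen pairs are simply treated as unrelated.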

In one possible implementation, the method further includes:

establishing a high-dimensional search tree according to the entity word vector set;

the searching the entity word vector set for at least one word vector with high similarity to the word vector corresponding to the query request includes:

searching the high-dimensional search tree for at least one word vector with high similarity to the word vector corresponding to the target query request.

An entity recommendation apparatus, the apparatus comprising:

an extraction unit, configured to acquire a target query request input by a user and extract the target entity words included in the target query request;

a first determining unit, configured to obtain a word vector of each target entity word from a pre-generated entity word vector set, and determine a word vector corresponding to the target query request from the word vectors of the target entity words included in the target query request, wherein the entity word vector set comprises the word vectors corresponding to entity words and is obtained by training on at least one related entity word set;

a searching unit, configured to search the entity word vector set for at least one word vector with high similarity to the word vector corresponding to the target query request;

a second determining unit, configured to determine the entity words corresponding to the found word vectors as recommended words corresponding to the target query request;

and a recommending unit, configured to recommend the recommended words to the user.

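In a possible implementation manner, the first determining unit is further configured to:

calculate the average of the word vectors of all target entity words included in the target query request, and determine the average as the word vector corresponding to the target query request.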

In a possible implementation manner, the second determining unit includes:

a ranking subunit, configured to rank the entity words corresponding to the found word vectors by their relevance to the target query request;

and a first determining subunit, configured to determine the entity words whose ranking results meet a preset condition as recommended words corresponding to the target query request.

In one possible implementation, the apparatus further includes:

an acquiring unit, configured to acquire at least one related entity word set;

and a training unit, configured to train a language model on the at least one related entity word set to obtain the word vectors of the entity words, which form the entity word vector set.

In a possible implementation manner, the acquiring unit includes:

an acquisition subunit, configured to acquire at least one query request input by the same user within a preset time period;

a first extraction subunit, configured to extract entity words from each query request;

and a construction subunit, configured to construct a related entity word set from the extracted entity words.

In one possible implementation, the construction subunit includes:

a second extraction subunit, configured to pair entity words extracted from different query requests to form entity word pairs;

a second determining subunit, configured to determine whether the two entity words in each entity word pair are related;

and a dividing subunit, configured to eliminate the entity word pairs whose two entity words are not related and, among the remaining entity word pairs, assign the entity words of pairs that share an entity word to the same related entity word set, thereby constructing at least one related entity word set.

In a possible implementation manner, the second determining subunit is specifically configured to:

calculate a relevance value between the two entity words in the entity word pair; when the relevance value is greater than or equal to a first threshold, determine that the two entity words in the entity word pair are related; and when the relevance value is less than the first threshold, determine that they are not related;

or, acquire the number of times the two entity words in the entity word pair appear in the same query request; when the number of times is greater than or equal to a second threshold, determine that the two entity words in the entity word pair are related; and when the number of times is less than the second threshold, determine that they are not related.

In one possible implementation, the apparatus further includes:

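a building unit, configured to build a high-dimensional search tree from the entity word vector set;

the searching unit is specifically configured to:

search the high-dimensional search tree for at least one word vector with high similarity to the word vector corresponding to the target query request.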

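An apparatus for entity recommendation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for:

acquiring a target query request input by a user, and extracting target entity words included in the target query request;

obtaining a word vector of each target entity word from a pre-generated entity word vector set, and determining a word vector corresponding to the target query request from the word vectors of the target entity words included in the target query request, wherein the entity word vector set comprises the word vectors corresponding to entity words and is obtained by training on at least one related entity word set;

searching the entity word vector set for at least one word vector with high similarity to the word vector corresponding to the target query request, and determining the entity words corresponding to the found word vectors as recommended words corresponding to the target query request;

and recommending the recommended words to the user.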

A computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the entity recommendation method described above.

Therefore, the embodiment of the application has the following beneficial effects:

When a user performs a search query, the server can acquire the target query request input by the user and extract the target entity words it contains. A word vector of each target entity word is then obtained from the pre-trained entity word vector set, and a word vector corresponding to the target query request is determined from the word vectors of its target entity words. Next, the entity word vector set is searched for at least one word vector with high similarity to the word vector of the target query request; the entity words corresponding to the found word vectors are taken as recommended words for the target query request and recommended to the user. Because the embodiments determine the recommended words from the entity words contained in the target query request itself, the recommended words are ensured to be related to the target entity words. This avoids the prior-art situation in which the entity words of an expanded query request deviate greatly from those of the original query request, so recommended words related to the target entity words can be provided to the user and recommendation accuracy is improved.

Drawings

Fig. 1 is a schematic diagram of the framework of an exemplary application scenario according to an embodiment of the present application;

Fig. 2 is a flowchart of an entity recommendation method according to an embodiment of the present application;

Fig. 3 is a flowchart of a method for obtaining an entity word vector set according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for constructing a related entity word set according to an embodiment of the present application;

Fig. 5 is a structural diagram of an entity recommendation apparatus according to an embodiment of the present application;

Fig. 6 is a structural diagram of another entity recommendation apparatus according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a server according to an embodiment of the present application.

Detailed Description

To make the above objects, features and advantages of the present application more apparent and understandable, embodiments of the present application are described in detail below with reference to the accompanying drawings.

The inventors found that traditional entity recommendation expands the user's query request into several related query requests and then performs entity recommendation using those related query requests. When the query request is expanded, however, the entity words in the expanded related query request may deviate greatly from those in the original query request, which makes the entity recommendation inaccurate. For example, if the original query request input by the user is "wedding" and the system expands it to "wedding in a vegetable greenhouse", the entity "vegetable greenhouse" deviates greatly from the entity "wedding" in the original query request. Recommending according to "wedding in a vegetable greenhouse" may then return results about vegetable greenhouses, which are clearly unrelated to the original query request and degrade the user experience.

On this basis, an embodiment of the present application provides an entity recommendation method. When a user performs a search query, the target query request input by the user is acquired and the target entity words it contains are extracted. A word vector of each target entity word is obtained from a pre-generated entity word vector set, and a word vector corresponding to the target query request is determined from those word vectors. The entity word vector set is then searched for at least one word vector with high similarity to the word vector of the target query request, the entity words corresponding to the found word vectors are determined as recommended words for the target query request, and the recommended words are recommended to the user. In other words, the recommended words are determined from the target entity words in the target query request and the entity word vector set. The entity word vector set is trained on related entity word sets; because the entity words in each related entity word set are related to one another, training on a large number of such sets yields word vectors with the property that the more related two entity words are, the more similar their word vectors are. Once the word vector of the target entity word is determined, the entity word vector set is searched for word vectors similar to it, and the corresponding entity words are determined as recommended words. This ensures that the recommended words are related to the target entity words in the target query request, avoids the deviation present in the prior art, and improves recommendation accuracy.

To facilitate understanding of the embodiments of the present application, refer to Fig. 1, which is a schematic diagram of the framework of an exemplary application scenario provided by the embodiments of the present application. The entity recommendation method provided by the embodiments of the present application may be applied to the client 10 or to the server 20.

In practical application, the server 20 may acquire the target query request input by the user through the client 10 and extract the target entity words it contains. A word vector corresponding to each target entity word is then obtained from the pre-generated entity word vector set, the entity word vector set is searched for the word vector most similar to it, the entity word corresponding to that word vector is determined as a recommended word, and the recommended word is sent to the client 10 to be recommended to the user.

Alternatively, the client 10 acquires the target query request input by the user and extracts the target entity words it contains. It then obtains the corresponding word vectors from the entity word vector set, searches the entity word vector set for the word vector most similar to each of them, determines the corresponding entity word as a recommended word, and recommends it to the user.

Further, when the user triggers a recommended word, the client 10 may send the recommended word to the server 20 so that the server 20 performs a related search based on it.

Those skilled in the art will appreciate that the framework shown in Fig. 1 is only one example in which embodiments of the present application may be implemented; the scope of applicability of the embodiments of the present application is not limited in any way by this framework.

It should be noted that the client 10 may be hosted on a terminal, which may be any existing, developing or future user equipment capable of interacting through any form of wired and/or wireless connection (e.g., Wi-Fi, LAN, cellular, coaxial cable), including but not limited to smart wearable devices, smartphones, non-smart phones, tablets, laptop computers, desktop computers, minicomputers, midrange computers and mainframe computers. The embodiments of the present application are not limited in this respect. It should also be noted that the server 20 in the embodiments of the present application may be any existing, developing or future device capable of providing a search service to users; the embodiments of the present application are not limited in this respect either.

To facilitate understanding of an entity recommendation method provided by the embodiments of the present application, the method will be described below with reference to the accompanying drawings.

Referring to Fig. 2, which is a flowchart of an entity recommendation method provided in an embodiment of the present application, the method may include:

S201: Acquire a target query request input by a user, and extract the target entity words included in the target query request.

In this embodiment, when a user performs a search query, the target query request input by the user may be acquired and the target entity words it contains extracted. The target query request is the query request input by the user at the current moment; the target entity words may be all or some of the entity words included in the target query request, and there may be one or more of them. For example, if the target query request input by the user is "Is it better to wear a wedding dress or a standing dress at the wedding?", then "wedding" and "standing dress" may be the target entity words.

In a specific implementation, word segmentation may be performed on the target query request, and the target entity words may then be extracted from the segmentation result. In practice, any existing entity word extraction method may be used to extract the target entity words from the target query request; this is not described further in this embodiment.
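As one example of an existing extraction method (not necessarily the one the document intends), forward maximum matching against an entity lexicon segments the query and extracts entity words in a single pass. The lexicon and the function name `extract_entities` are assumptions; the example query uses the Chinese words 婚礼 (wedding) and 婚纱 (wedding dress).

```python
from typing import List, Set


def extract_entities(query: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    entry that matches; positions where no entry starts are skipped."""
    max_len = max(map(len, lexicon), default=0)
    found: List[str] = []
    i = 0
    while i < len(query):
        for length in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + length]
            if candidate in lexicon:
                found.append(candidate)
                i += length
                break
        else:
            i += 1  # no entity word starts at position i
    return found


# Hypothetical lexicon: 婚礼 = wedding, 婚纱 = wedding dress.
print(extract_entities("婚礼上穿婚纱", {"婚礼", "婚纱"}))  # ['婚礼', '婚纱']
```

Preferring the longest match keeps a multi-character entity such as 婚纱 from being split into shorter lexicon entries.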

S202: Obtain a word vector of each target entity word from a pre-generated entity word vector set, and determine a word vector corresponding to the target query request from the word vectors of the target entity words included in the target query request.

In this embodiment, after the target entity words included in the target query request are extracted, the word vector of each target entity word may be determined from the pre-generated entity word vector set, and the word vector corresponding to the target query request may be determined from the word vectors of its target entity words. The entity word vector set contains the word vectors of all entity words and is obtained by training on at least one related entity word set.

The similarity between word vectors in the entity word vector set reflects the relatedness of the corresponding entity words: because the entity words within each related entity word set are related to one another, an entity word vector set trained on a large number of related entity word sets satisfies the condition that the more related two entity words are, the more similar their word vectors are. A word vector is a mathematical representation of a word in a language; as the name implies, it represents a word as a vector. The generation of the entity word vector set is described in detail in later embodiments.

It can be understood that the entity word vector set may store a mapping between entity words and word vectors. In a possible implementation manner, after the target entity words are extracted, the word vector of each target entity word may be looked up in this mapping, and the word vectors of all target entity words included in the target query request then jointly constitute the word vector of the target query request. For example, if the mapping gives word vector a for the target entity word "wedding", word vector b for "wedding dress" and word vector c for "standing dress", then the word vector corresponding to the target query request "Is it better to wear a wedding dress or a standing dress at the wedding?" is {word vector a, word vector b, word vector c}.
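Looking up the target entity words in the mapping to form this set-of-vectors representation can be sketched as below. Skipping entity words that have no vector in the set is an assumption the document does not address; the function name `query_vectors` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def query_vectors(entities: Sequence[str], vector_set: Dict[str, Vector]) -> List[Vector]:
    """Represent a query as the list of its target entity word vectors.
    Entity words missing from the set are skipped (an assumption)."""
    return [vector_set[word] for word in entities if word in vector_set]


vector_set = {"wedding": [1.0, 0.0], "wedding dress": [0.9, 0.1], "standing dress": [0.6, 0.4]}
print(query_vectors(["wedding", "wedding dress", "standing dress"], vector_set))
```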

A possible implementation manner provides another way to determine the word vector corresponding to the target query request: the average of the word vectors of all target entity words included in the target query request is calculated and determined as the word vector corresponding to the target query request. That is, after the word vector of each target entity word is obtained, the average of these word vectors is taken as the word vector of the query. For example, if the word vectors of the target entity words "wedding", "wedding dress" and "standing dress" are a, b and c respectively, the word vector corresponding to the target query request "Is it better to wear a wedding dress or a standing dress at the wedding?" is the average of a, b and c, i.e., (a + b + c)/3.
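The averaging variant reduces the query to a single vector by taking the element-wise mean of the target entity word vectors. A minimal sketch, assuming all vectors share the same dimension; `mean_vector` is a hypothetical name.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of the target entity word vectors."""
    if not vectors:
        raise ValueError("no target entity word vectors to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


print(mean_vector([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [3.0, 4.0]
```

Compared with the set-of-vectors representation, the average needs only one similarity search per query instead of one per target entity word.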

S203: Search the entity word vector set for at least one word vector with high similarity to the word vector corresponding to the target query request, and determine the entity words corresponding to the found word vectors as recommended words corresponding to the target query request.

In this embodiment, after the word vector corresponding to the target query request is determined, the entity word vector set is searched for at least one word vector with high similarity to it, and the corresponding entity words are determined as recommended words for the target query request. "High similarity" here may mean, for example, sorting the similarities in descending order and taking the top one or more word vectors.

Word vector similarity can be represented by the distance between two word vectors: the smaller the distance, the greater the similarity. The distance may be, for example, the Euclidean distance or the cosine distance; this embodiment does not limit it.
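The two distance measures mentioned above can be written as follows; for cosine distance, smaller values mean more similar directions, matching the rule that a smaller distance means a greater similarity. A minimal sketch with plain Python lists.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))     # 1.0
```

Cosine distance ignores vector length, so two entity words whose vectors point the same way count as maximally similar even if their norms differ.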

In a specific implementation, the similarity between each word vector of the target query request and every word vector in the entity word vector set can be computed by traversing the set; at least one word vector with high similarity to each word vector of the target query request is then determined by comparing the similarities, and the corresponding entity words are determined as recommended words for the target query request. For example, suppose the entity word vector set contains the word vectors of 100 entity words and the word vectors corresponding to the target query request are a, b and c. For word vector a, its similarity to each of the 100 word vectors is determined, and the entity words corresponding to the one or more most similar word vectors are determined as recommended words for the target query request; the same is done for word vectors b and c. In this way, multiple recommended words can be determined for the target query request.

It should be noted that when the target query request includes only one target entity word, the word vector corresponding to the target query request is the word vector of that target entity word. Since that word vector is trivially the most similar to itself, it must be excluded when the entity word vector set is searched for similar word vectors; that is, in this case the recommended words must not include the target entity word itself.
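The traversal approach can be sketched as a linear scan per query word vector, keeping the k most similar entity words and merging the results. Passing the target entity words as `exclude` removes them from the candidates, which covers the exclusion described for the single-entity case. Cosine similarity, the value of k and all names are illustrative assumptions.

```python
import heapq
import math
from typing import Dict, Iterable, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_similar(query_vec: Sequence[float], vector_set: Dict[str, List[float]],
                  k: int, exclude: Iterable[str] = ()) -> List[str]:
    """Linear scan: score every entity word vector (O(n)) and keep the k best."""
    excluded = set(exclude)
    scored = ((cosine_similarity(query_vec, vec), word)
              for word, vec in vector_set.items() if word not in excluded)
    return [word for _, word in heapq.nlargest(k, scored)]


def recommend_by_traversal(query_vecs: List[Sequence[float]], vector_set: Dict[str, List[float]],
                           k: int, exclude: Iterable[str] = ()) -> List[str]:
    """Repeat the scan for each word vector of the query and merge the results."""
    excluded = set(exclude)
    result: List[str] = []
    for query_vec in query_vecs:
        for word in top_k_similar(query_vec, vector_set, k, excluded):
            if word not in result:  # keep first occurrence only
                result.append(word)
    return result


vector_set = {"wedding": [1.0, 0.0], "wedding dress": [0.9, 0.1],
              "venue": [0.7, 0.3], "sofa": [0.0, 1.0]}
# Single target entity word: exclude it so it is not recommended to itself.
print(top_k_similar(vector_set["wedding"], vector_set, k=2, exclude={"wedding"}))  # ['wedding dress', 'venue']
```

`heapq.nlargest` avoids sorting all n scores when only k are needed, but every vector is still scored, which is the cost the search tree described later is meant to reduce.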

It can be understood that finding the most similar word vectors by traversal slows down the search. In a specific implementation, a high-dimensional search tree can be used instead to speed up the search; building a high-dimensional search tree from the entity word vector set and using it to find the most similar word vectors is described in a later embodiment.

Once at least one word vector with high similarity to the word vector of the target query request is found, the entity words corresponding to the found word vectors can be regarded as related to the target entity words in the target query request, and are therefore determined as recommended words for the target query request.

In a specific implementation, when many entity words correspond to the found word vectors, the candidates can first be filtered. This embodiment therefore provides an implementation for determining the recommended words: the entity words corresponding to the found word vectors are ranked by their relevance to the target query request, and the entity words whose ranking results meet a preset condition are determined as recommended words for the target query request. That is, the relevance between each candidate entity word and the target query request may first be calculated and used for ranking, after which the entity words whose ranking meets the preset condition are selected. The preset condition may be set according to the ranking order: for example, when ranking from high to low relevance, the condition may be being within the first N positions for a preset number N; when ranking from low to high, it may be being within the last N positions. Alternatively, the preset condition may be that the relevance is greater than a preset threshold, in which case the entity words whose relevance exceeds the threshold are determined as recommended words for the target query request.
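The preset conditions above (a top-N cut-off on the descending ranking, or a relevance threshold) can be sketched as follows. How relevance is computed is left open by the document, so `relevance` is assumed to be a precomputed mapping from candidate entity word to score; the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def select_recommendations(relevance: Dict[str, float],
                           top_n: Optional[int] = None,
                           min_relevance: Optional[float] = None) -> List[str]:
    """Rank candidates by relevance (high to low), then apply the preset
    condition(s): keep scores strictly above min_relevance and/or the top_n."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    if min_relevance is not None:
        ranked = [word for word in ranked if relevance[word] > min_relevance]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


candidates = {"wedding dress": 0.92, "venue": 0.71, "sofa": 0.18}
print(select_recommendations(candidates, top_n=2))           # ['wedding dress', 'venue']
print(select_recommendations(candidates, min_relevance=0.5))  # ['wedding dress', 'venue']
```

Taking the first N of a descending ranking is equivalent to taking the last N of an ascending one, so a single sort direction covers both variants described above.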

S204: Recommend the recommended words to the user.

In this embodiment, after the recommended words corresponding to the target query request are determined, they are recommended to the user, so that further content can be recommended based on the recommended word the user selects, which improves recommendation accuracy.

As can be seen from the above description, when a user performs a search query, the server may obtain the target query request input by the user and extract the target entity words it contains. The word vectors of the target entity words are then obtained from the pre-trained entity word vector set, and the word vector corresponding to the target query request is determined from them. Next, at least one word vector with a high similarity to the word vector corresponding to the target query request is searched for in the entity word vector set, the entity words corresponding to the found word vectors are taken as the recommended words corresponding to the target query request, and the recommended words are recommended to the user. Thus, in the embodiments of the present application, when recommended words are determined from a target query request input by a user, the entity words included in the target query request are used to determine recommended words related to that request. This ensures that the determined recommended words are correlated with the target entity words and avoids the situation in the prior art where the entity words of an expanded query request deviate greatly from those of the original query request. Recommended words correlated with the target entity words can therefore be recommended to the user, improving recommendation accuracy.

In a possible implementation of this embodiment, a method of searching for word vectors by using a high-dimensional search tree is provided. Specifically, a high-dimensional search tree is built from the entity word vector set, and the at least one word vector with the highest similarity to the word vector corresponding to the target query request is searched for in the tree. The high-dimensional search tree may be a binary tree; its purpose is fast search, that is, finding nearest neighbors in a high-dimensional space. If the entity word vector set were searched directly, every vector would have to be traversed to find the nearest neighbor, giving a query time of O(n); with a high-dimensional search tree, the query time can be reduced to O(log n), where n is the number of entity words in the entity word vector set.

In practical applications, the idea behind building a high-dimensional search tree is that points close to each other in the original space should also be close to each other in the tree structure; that is, two spatially similar points are very likely to be placed in the same branch when the tree is built. When the tree is built from the entity word vector set, two word vectors with high similarity are likely to be placed in the same branch, and two word vectors with low similarity are placed in different branches. When the tree is searched for word vectors similar to the word vector corresponding to the target query request, the search direction can therefore be determined first and similar word vectors then searched for only along that direction, which increases search speed.
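The tree type is not fixed here. As one concrete, swapped-in example, the sketch below builds a k-d tree (a binary tree that splits on one coordinate per level) and runs an exact nearest-neighbor search. Vectors are normalized to unit length first, so the nearest neighbor by Euclidean distance is also the one with the highest cosine similarity. All names are hypothetical. k-d tree pruning works best in low dimensions. With the hundreds of dimensions typical of word vectors it becomes much less effective, which is why approximate trees (for example, random-projection trees) are often used instead, so O(log n) is better read as a typical-case goal than a guarantee.

```python
import math

def normalize(v):
    """Scale a vector to unit length so Euclidean and cosine rankings agree."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

class Node:
    def __init__(self, word, point, axis, left, right):
        self.word, self.point, self.axis = word, point, axis
        self.left, self.right = left, right

def build(items, depth=0):
    """Build a k-d tree from (word, vector) pairs, splitting at the median of one axis per level."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    word, point = items[mid]
    return Node(word, point, axis,
                build(items[:mid], depth + 1),       # coordinates <= median on this axis
                build(items[mid + 1:], depth + 1))   # coordinates >= median on this axis

def nearest(node, target, best=None):
    """Return (distance, word) for the exact nearest neighbor of target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.word)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)   # search the side the target falls on first
    # Visit the other side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best
```

A typical call would be `nearest(build([(w, normalize(v)) for w, v in vectors.items()]), normalize(query_vec))`. Returning the k nearest neighbors instead of one would replace the single best match with a bounded heap.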

As described in the foregoing embodiments, the entity word vector set is obtained by training on at least one related entity word set. The generation of the entity word vector set is described below with reference to the drawings.

Referring to fig. 3, which is a flowchart of a method for generating an entity word vector set according to an embodiment of the present application, the method may include:

S301: Obtain at least one related entity word set.

In this embodiment, to generate the entity word vector set, training data, that is, related entity word sets, is obtained first, so that the entity word vector set is generated by training on a large number of related entity word sets.

It can be understood that, to ensure the richness and diversity of the entity word vector set, a large number of related entity word sets may be obtained, so that the trained entity word vector set contains word vectors for as many of the required entity words as possible. This embodiment provides one way of obtaining a related entity word set, which may include the following steps:

1) Acquire at least one query request input by the same user within a preset duration.

2) Extract entity words from each query request.

In this embodiment, for a given user, the one or more query requests input by that user within the preset duration are obtained, and entity words are extracted from each query request. The preset duration may be set according to the actual situation, for example 5 minutes, and is not limited herein.

For example, if user A inputs four query requests within 5 minutes, the entity words a1 and b1 are extracted from query request 1, a2 and b2 from query request 2, a3 and b3 from query request 3, and a4 and b4 from query request 4.

3) Construct a related entity word set from the extracted entity words.

In this embodiment, query requests input by a user within a certain period may be considered to be related. After the entity words are extracted from the query requests of the same user, a related entity word set may be constructed from them. Because the entity words in each related entity word set are correlated to some degree, when the entity word vector set is generated by training on the related entity word sets, the word vectors of those entity words are also correlated within the entity word vector set. A specific way of constructing a related entity word set from the extracted entity words is described in the following embodiments.
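Steps 1) and 2) can be sketched as follows, assuming a query log of (user, timestamp in seconds, query text) records and a simple dictionary-lookup entity extractor. The log format, the lexicon, and all function names are hypothetical, since the recognition of entity words is not specified. In this sketch a window opens at a user's first query and collects every later query within `window_seconds` (300 s, matching the 5-minute example) of that opening query.

```python
from collections import defaultdict

def extract_entities(query, lexicon):
    """Hypothetical extractor: return lexicon entries that occur in the query text."""
    return [word for word in lexicon if word in query]

def sessions_by_user(log, window_seconds=300):
    """Group each user's queries into windows spanning at most window_seconds
    from the window's first query."""
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, query))
    sessions = []
    for user, entries in by_user.items():
        entries.sort()                      # chronological order per user
        current, start = [], None
        for ts, query in entries:
            if start is None or ts - start > window_seconds:
                if current:
                    sessions.append((user, current))
                current, start = [], ts     # open a new window at this query
            current.append(query)
        if current:
            sessions.append((user, current))
    return sessions

def session_entities(log, lexicon, window_seconds=300):
    """For each window, the list of entity words extracted from each query request."""
    return [(user, [extract_entities(q, lexicon) for q in queries])
            for user, queries in sessions_by_user(log, window_seconds)]
```

Each inner list of entity words then serves as the input to the pairwise construction of step 3).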

S302: Train a language model on the at least one related entity word set to obtain the word vector of each entity word, and form the entity word vector set from these word vectors.

In this embodiment, after the related entity word sets are obtained, they are used as training data for the language model to obtain the word vector of each entity word, and the entity word vector set is formed from these word vectors. For example, if related entity word set 1 is [a1, b1, a2, b2] and related entity word set 2 is [a3, b3, a4, b4], the language model is trained on these two sets to obtain the word vectors a_1, b_1, a_2, b_2, a_3, b_3, a_4 and b_4 of the entity words, and these 8 word vectors form the entity word vector set.

The language model may be the word2vec model, an efficient model that represents entity words as real-valued vectors. Drawing on ideas from deep learning, training reduces the processing of entity words to vector operations in a K-dimensional vector space, and similarity in that space can be used to represent similarity between entity words. The core idea of word2vec is to map each entity word to a K-dimensional real vector through training and to judge the semantic similarity of entity words by the closeness of their word vectors, measured for example by cosine similarity or Euclidean distance. The closer two word vectors are (the higher their cosine similarity or the smaller their Euclidean distance), the more similar the corresponding entity words are.
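The two closeness measures named above agree when word vectors are normalized to unit length: for unit vectors u and v, the squared Euclidean distance equals 2 − 2·cos(u, v), so ranking by increasing distance is the same as ranking by decreasing cosine similarity. A small sketch with hypothetical two-dimensional vectors:

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean_distance(u, v):
    """Straight-line distance; smaller means more similar."""
    return math.dist(u, v)

u, v, w = [1.0, 0.0], [0.6, 0.8], [0.0, 1.0]   # all unit length
print(cosine_similarity(u, v), cosine_similarity(u, w))    # v is more similar to u than w is
print(euclidean_distance(u, v), euclidean_distance(u, w))  # and v is also closer to u
```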

It can be understood that, because the entity words in a related entity word set are correlated, a language model trained on the related entity word sets yields correlated word vectors for entity words in the same related entity word set. That is, the more related two entity words are, the more similar their word vectors in the entity word vector set, so the similarity of word vectors reflects the relevance of entity words.

In this way, the language model can be trained on the related entity word sets to obtain the word vector of each entity word, and thus the entity word vector set. When an entity recommendation is needed, the target entity words are extracted from the target query request input by the user, their word vectors are obtained from the entity word vector set, and the word vector corresponding to the target query request is determined from them. At least one word vector with a high similarity to the word vector corresponding to the target query request is then searched for in the entity word vector set, and the corresponding entity words are determined as recommended words and recommended to the user.

Referring to fig. 4, which is a flowchart of a method for constructing a related entity word set according to an embodiment of the present application, as shown in fig. 4, the method may include:

S401: Combine the entity words extracted from different query requests pairwise to form entity word pairs.

In a specific implementation of this embodiment, after the entity words in each of the different query requests are extracted, entity words from different query requests are combined pairwise to obtain entity word pairs. It can be understood that the number of entity word pairs formed depends not only on the number of query requests acquired within the preset duration but also on the number of entity words in each query request.

For example, if four query requests are obtained, where query request 1 includes m entity words, query request 2 includes n entity words, query request 3 includes p entity words, and query request 4 includes q entity words, the number of entity word pairs that can be formed is m×n + m×p + m×q + n×p + n×q + p×q.
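The pairwise combination in S401 pairs every entity word of one query request with every entity word of each other query request, which is why the count is a sum of products over all pairs of query requests. A minimal sketch using only the Python standard library (the function name is hypothetical):

```python
from itertools import combinations, product

def entity_word_pairs(queries):
    """queries: one list of entity words per query request.
    Returns every (x, y) with x and y taken from two different query requests."""
    pairs = []
    for first, second in combinations(queries, 2):   # each pair of query requests once
        pairs.extend(product(first, second))          # every cross combination of their words
    return pairs

queries = [["a1", "b1"], ["a2", "b2"], ["a3", "b3"], ["a4", "b4"]]
pairs = entity_word_pairs(queries)
print(len(pairs))   # 24: 2×2 pairs for each of the 6 pairs of query requests
print(pairs[:4])    # [('a1', 'a2'), ('a1', 'b2'), ('b1', 'a2'), ('b1', 'b2')]
```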

S402: Determine whether the two entity words in each entity word pair are correlated.

S403: Remove the entity word pairs whose two entity words are not correlated; among the remaining entity word pairs, place the entity words of pairs that share a common entity word into the same related entity word set, thereby constructing at least one related entity word set.

It can be understood that, among the query requests input by a user within the preset duration, some may be correlated with one another and some may not. When entity word pairs are formed by pairwise combination, some pairs consist of entity words from two unrelated query requests. To ensure that the entity words in a constructed related entity word set are correlated, after the entity word pairs formed from entity words of different query requests are obtained, it is further necessary to determine whether the two entity words in each pair are correlated. For the pairs whose two entity words are correlated, the entity words of pairs that contain a common entity word are placed in the same related entity word set, thereby constructing the related entity word set. A pair whose two entity words are not correlated is removed and not placed in any related entity word set.

For example, suppose a user inputs four query requests within the preset duration: query request 1 includes entity words a1 and b1, query request 2 includes a2 and b2, query request 3 includes a3 and b3, and query request 4 includes a4 and b4. These can form 24 entity word pairs: [a1, a2], [a1, b2], [b1, a2], [b1, b2]; [a1, a3], [a1, b3], [b1, a3], [b1, b3]; [a1, a4], [a1, b4], [b1, a4], [b1, b4]; [a2, a3], [a2, b3], [b2, a3], [b2, b3]; [a2, a4], [a2, b4], [b2, a4], [b2, b4]; [a3, a4], [a3, b4], [b3, a4], [b3, b4]. However, if query requests 1 and 2 are both about "wedding" while query requests 3 and 4 are about "2019 taxes", the entity words in query request 1 (and likewise query request 2) are not correlated with those in query request 3 or 4. To avoid placing unrelated entity words in the same related entity word set, it is necessary to determine whether the two entity words in each entity word pair are correlated.

Based on the result of judging whether the two entity words in each pair are correlated, the pairs to be removed and the pairs whose entity words can be placed in the same related entity word set are determined. Suppose the two entity words in each of [a1, a2], [a1, b2], [b1, a2], [b1, b2] are correlated, and the two entity words in each of [a3, a4], [a3, b4], [b3, a4], [b3, b4] are correlated, while the two entity words in each of [a1, a3], [a1, b3], [b1, a3], [b1, b3], [a1, a4], [a1, b4], [b1, a4], [b1, b4], [a2, a3], [a2, b3], [b2, a3], [b2, b3], [a2, a4], [a2, b4], [b2, a4], [b2, b4] are not correlated. The pairs whose entity words are not correlated are removed, and the entity words of the remaining pairs that share a common entity word are placed in the same related entity word set. For example, [a1, a2] and [a1, b2] share the entity word a1, [b1, a2] and [b1, b2] share the entity word b1, and a1 and b1 are related by default because they come from the same query request, so related entity word set 1 can be constructed as [a1, b1, a2, b2]. Likewise, [a3, a4] and [a3, b4] share a3, [b3, a4] and [b3, b4] share b3, and a3 and b3 are related by default, so related entity word set 2 is constructed as [a3, b3, a4, b4]. In total, two related entity word sets are constructed.
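The grouping in S403 amounts to finding connected components among the retained entity word pairs. A minimal union-find sketch follows. The `is_related` callback stands in for either of the correlation tests described below, the topic labels in the example are hypothetical, and, following the example above, entity words from the same query request are treated as related by default.

```python
from itertools import combinations, product

def build_related_sets(queries, is_related):
    """queries: one list of entity words per query request (same user, same time window).
    Returns the related entity word sets, each as a sorted list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(b)] = find(a)

    kept = set()
    for first, second in combinations(queries, 2):
        for a, b in product(first, second):
            if is_related(a, b):             # uncorrelated pairs are simply dropped
                union(a, b)
                kept.update((a, b))
    for words in queries:                     # words of one query are related by default
        present = [w for w in words if w in kept]
        for w in present[1:]:
            union(present[0], w)

    groups = {}
    for w in kept:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())

topic = {"a1": "wedding", "b1": "wedding", "a2": "wedding", "b2": "wedding",
         "a3": "tax", "b3": "tax", "a4": "tax", "b4": "tax"}
queries = [["a1", "b1"], ["a2", "b2"], ["a3", "b3"], ["a4", "b4"]]
print(build_related_sets(queries, lambda a, b: topic[a] == topic[b]))
# [['a1', 'a2', 'b1', 'b2'], ['a3', 'a4', 'b3', 'b4']]
```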

It should be noted that, when the language model is actually trained, a large number of related entity word sets corresponding to many users need to be acquired and used for the subsequent training. The related entity word sets of each user can be obtained by the above method, yielding a large number of related entity word sets.

In a possible implementation of this embodiment, two ways of determining whether the two entity words in an entity word pair are correlated are provided, and they are described separately below.

The first way is to calculate a relevance value between the two entity words in the entity word pair. If the relevance value is greater than or equal to a first threshold, the two entity words in the entity word pair are determined to be correlated; if it is less than the first threshold, they are determined not to be correlated. The first threshold may be set according to the actual situation and is not limited in this example.

The second way is to obtain the number of times the two entity words in the entity word pair occur in the same query request. If this number is greater than or equal to a second threshold, the two entity words are correlated; if it is less than the second threshold, they are not. In a specific implementation, entity words in the same query request may be considered related, so the co-occurrence of entity word pairs can be counted over a large number of query requests from different users. For a given entity word pair, the number of times its two entity words occur in the same query request is obtained; if this number is greater than or equal to the second threshold, the two entity words are determined to be correlated, and otherwise they are not. The second threshold may be set according to the actual situation and is not limited in this embodiment.
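The second test can be sketched as a co-occurrence count over a large collection of query requests. The sketch assumes entity words have already been extracted from each query request; the function names, the sample data, and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(query_entity_lists):
    """Count, for each unordered entity word pair, the query requests containing both words."""
    counts = Counter()
    for words in query_entity_lists:
        for a, b in combinations(sorted(set(words)), 2):   # sorted so (a, b) has a canonical order
            counts[(a, b)] += 1
    return counts

def is_correlated(a, b, counts, second_threshold):
    """Correlated if the pair co-occurs in at least second_threshold query requests."""
    key = (a, b) if a <= b else (b, a)
    return counts[key] >= second_threshold   # a missing key counts as 0

queries = [["wedding", "dress"], ["dress", "wedding", "venue"], ["tax", "refund"], ["wedding", "dress"]]
counts = cooccurrence_counts(queries)
print(is_correlated("wedding", "dress", counts, second_threshold=2))  # True: 3 co-occurrences
print(is_correlated("wedding", "tax", counts, second_threshold=2))    # False: never together
```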

With the method provided in this embodiment, the required related entity word sets can be constructed and used to train the language model, yielding the word vectors of the entity words in the related entity word sets and hence the entity word vector set, which is then used for entity recommendation.

Based on the above method embodiment, the present application further provides an entity recommendation apparatus, which will be described below with reference to the apparatus.

Referring to fig. 5, which is a block diagram of an entity recommendation apparatus provided in an embodiment of the present application, as shown in fig. 5, the apparatus may include:

an extracting unit 501, configured to obtain a target query request input by a user, and extract a target entity word included in the target query request;

a first determining unit 502, configured to obtain the word vector of the target entity word from a pre-generated entity word vector set, and to determine the word vector corresponding to the target query request from the word vectors of the target entity words included in the target query request; the entity word vector set includes the word vectors corresponding to entity words and is obtained by training on at least one related entity word set;

a searching unit 503, configured to search, in the entity word vector set, at least one word vector with a higher word vector similarity corresponding to the target query request;

a second determining unit 504, configured to determine an entity word corresponding to the found at least one word vector as a recommended word corresponding to the target query request;

a recommending unit 505, configured to recommend the recommended word to the user.

In a possible implementation manner, the second determining unit includes:

In one possible implementation, the apparatus further includes:

the acquiring unit is used for acquiring at least one related entity word set;

and the training unit is used for training a language model according to the at least one related entity word set to obtain word vectors of all entity words and form an entity word vector set.

In a possible implementation manner, the obtaining unit includes:

In one possible implementation, the building subunit includes:

the second extraction subunit is further configured to combine the entity words extracted from different query requests pairwise to form entity word pairs;

and calculating a relevance value between the two entity words in the entity word pair, wherein when the relevance value is greater than or equal to a first threshold value, the relevance exists between the two entity words in the entity word pair, and when the relevance value is smaller than the first threshold value, the relevance does not exist between the two entity words in the entity word pair.

In one possible implementation, the apparatus further includes:

the search unit is specifically configured to:

It should be noted that, implementation of each unit in this embodiment may refer to the above method embodiment, and this embodiment is not described herein again.

FIG. 6 illustrates a block diagram of an apparatus 600 for implementing entity recommendations. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 6, apparatus 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.

The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.

The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the device 600, the relative positioning of components, such as a display and keypad of the apparatus 600, the sensor component 614 may also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following method:

and recommending the recommended word to the user.

Optionally, the determining, according to the word vector of each target entity word included in the target query request, a word vector corresponding to the target query request includes:

Optionally, the determining the entity word corresponding to the at least one found word vector as the recommended word corresponding to the target query request includes:

Optionally, the method further includes:

acquiring at least one related entity word set;

Optionally, the obtaining at least one related entity word set includes:

extracting entity words from each of the query requests;

Optionally, the constructing a related entity word set according to the extracted entity words includes:

determining whether two entity words in the entity word pair have relevance;

Optionally, the determining whether two entity words in the entity word pair have relevance includes:

calculating a relevance value between two entity words in the entity word pair;

Optionally, the method further includes:

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of entity recommendation, the method comprising:

and recommending the recommended word to the user.

Optionally, the method further includes:

acquiring at least one related entity word set;

Optionally, the obtaining at least one related entity word set includes:

extracting entity words from each of the query requests;

determining whether two entity words in the entity word pair have relevance;

calculating a relevance value between two entity words in the entity word pair;

Optionally, the method further includes:

Fig. 7 is a schematic structural diagram of a server in an embodiment of the present invention. The server 700 may vary significantly depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors) and memory 732, one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. Memory 732 and storage medium 730 may be, among other things, transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 722 may be configured to communicate with the storage medium 730, and execute a series of instruction operations in the storage medium 730 on the server 700.

The server 700 may further include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the system or the device disclosed by the embodiment, the description is simple because the system or the device corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.

It should be understood that in the present application, "at least one" means one or more and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression refers to any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be single or plural.

It is further noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An entity recommendation method, the method comprising:

and recommending the recommended word to the user.

2. The method according to claim 1, wherein the determining, according to the word vector of each target entity word included in the target query request, a word vector corresponding to the target query request includes:

3. The method according to claim 1, wherein the determining the entity word corresponding to the at least one found word vector as the recommended word corresponding to the target query request includes:

4. The method of claim 1, further comprising:

acquiring at least one related entity word set;

5. The method of claim 4, wherein the acquiring at least one related entity word set comprises:

extracting entity words from each of the query requests;

6. The method of claim 5, wherein constructing the set of related entity words from the extracted entity words comprises:

determining whether the two entity words in the entity word pair are related;

7. The method of claim 6, wherein the determining whether the two entity words in the entity word pair are related comprises:

calculating a relevance value between the two entity words in the entity word pair;

8. The method of claim 6, wherein the determining whether the two entity words in the entity word pair are related comprises:

9. The method according to any one of claims 1-8, further comprising:

10. An entity recommendation apparatus, the apparatus comprising:

11. An apparatus for entity recommendation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:

and recommending the recommended word to the user.

12. A computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the entity recommendation method of any of claims 1-9.
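The recommendation flow summarized in the abstract and recited in claim 1 (extract the target entity words, look up their vectors in a pre-trained entity word vector set, derive a query vector, find the most similar entity word vectors, and recommend the corresponding entity words) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the toy `vocab` dictionary stands in for the pre-trained entity word vector set, entity extraction is approximated by whitespace tokenization plus a vocabulary lookup, the query vector is taken as the element-wise mean of the entity word vectors, similarity is cosine similarity, and entity words already present in the query are excluded. All function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def extract_entity_words(query: str, vocabulary: Dict[str, Vector]) -> List[str]:
    """Return the query tokens that are known entity words.

    Assumption: whitespace tokenization plus a vocabulary lookup stands in
    for the entity extraction step, which this sketch does not model.
    """
    return [token for token in query.split() if token in vocabulary]


def query_vector(entities: List[str], vocabulary: Dict[str, Vector]) -> Vector:
    """Combine the entity word vectors into one query vector (assumed: element-wise mean)."""
    dim = len(vocabulary[entities[0]])
    total = [0.0] * dim
    for entity in entities:
        for i, value in enumerate(vocabulary[entity]):
            total[i] += value
    return [value / len(entities) for value in total]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query: str, vocabulary: Dict[str, Vector], top_k: int = 2) -> List[str]:
    """Return up to top_k entity words whose vectors are most similar to the query vector."""
    entities = extract_entity_words(query, vocabulary)
    if not entities:
        return []  # no target entity word found, so nothing to recommend
    q = query_vector(entities, vocabulary)
    # Assumption: entity words already present in the query are not recommended.
    scored = [
        (word, cosine_similarity(q, vector))
        for word, vector in vocabulary.items()
        if word not in entities
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


# Hypothetical toy stand-in for the pre-trained entity word vector set.
vocab = {
    "python": [1.0, 0.0],
    "java": [0.9, 0.1],
    "rust": [0.8, 0.2],
    "banana": [0.0, 1.0],
}
print(recommend("learn python", vocab))
```

With this toy vocabulary, the query "learn python" yields `['java', 'rust']`, the two entity words whose vectors are closest to that of "python". In a real system the linear scan over the vocabulary would typically be replaced by an approximate nearest-neighbor index; the scan is kept here only for clarity.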