WO2020253591A1

WO2020253591A1 - Search method and apparatus applying tag knowledge network

Info

Publication number: WO2020253591A1
Application number: PCT/CN2020/095370
Authority: WO
Inventors: 郝俊禹; 文辉; 陈运文
Original assignee: 达而观信息科技（上海）有限公司
Priority date: 2019-06-19
Filing date: 2020-06-10
Publication date: 2020-12-24
Also published as: CN110059271B; CN110059271A

Abstract

A search method and apparatus applying a tag knowledge network. The method comprises: acquiring a plurality of recommended items, carrying out tag extraction on text information related to each of the recommended items to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags (S1); determining historical behavior data of a user for different items, analyzing user tags that the user prefers according to the historical behavior data, and determining a user tag set composed of all the user tags (S2); constructing a tag knowledge network by means of the item tag set, a knowledge graph and a word2vec model (S3); generating item feature vectors of the recommended items and user feature vectors of the user according to the item tag set, the user tag set and the tag knowledge network (S4); and respectively performing retrieval by means of the item feature vectors or the user feature vectors to obtain related items or related users (S5). The method can not only ensure the relevance of a content algorithm recall result, but also overcome the defects of semantic limitation and poor expansibility of a tag recall result.

Description

Search method and device using tag knowledge network

Technical field

This application relates to the field of intelligent search technology, and specifically to a search method and device using a tag knowledge network.

Background technique

With the development of Internet technology and social networks, a large amount of information, including text, pictures, and videos, is posted on the Internet every day. Traditional search technology can no longer meet users' needs for information discovery, and personalized recommendation systems emerged to address this information overload. Such a system recommends information to users according to their interests and behaviors and helps them quickly find what they need within massive amounts of information, thereby increasing user stickiness, improving user retention, and improving product competitiveness.

Commonly used recommendation algorithms currently include content-based recall algorithms, collaborative-filtering-based recall algorithms (user-based and item-based collaborative filtering), and model-learning-based recall algorithms (from simple logistic regression models, to gradient boosting trees, to deep learning). Among them, the content-based recall algorithm is the most common and remains very important. Its key point is the construction and mining of the tag system. The recommended items (such as news articles, pictures, and videos) are first decomposed into a series of tags; then, according to the user's behavior on the items (such as browsing, clicking, and purchasing), the user is also described as a set of tags. This set of tags characterizes the user, that is, it forms the user portrait. Finally, the items a user is likely to favor are recalled through the user's preferred tags.

This application introduces a tag knowledge network on top of the content-based recall algorithm, designs a search application system based on the tag knowledge network, vectorizes user and item features based on that network, and then uses vector search to recall similar items, similar users, and items that users like.

The content-based recall algorithm has many advantages. For example, it can mine a large amount of effective information from item data (the terms "article" and "item" are used interchangeably below), it allows new items to be recommended quickly after launch, and it has very good interpretability. But it also has the following disadvantages:

1. Recall results have limited semantics and poor scalability

Content-based recall algorithms use tags to recall results, but because the tags are fixed, the recalled results are very limited and difficult to extend. For example, the tag "Monkey King" can only recall information related to the Monkey King, such as "The Monkey King Thrice Defeats the White Bone Demon" and "The Monkey King Wreaks Havoc in the Heavenly Palace"; it cannot recall information related to Zhu Bajie (the two are fellow disciples), unless an item contains both the Monkey King and Zhu Bajie tags. Yet for most fans of Journey to the West, the Monkey King and Zhu Bajie are inseparable.

2. Poor mining accuracy for similar users and similar items

Recommendation systems seldom use tags to discover similar users and items, mainly because tags are too fine-grained and scale poorly. Item tags are generally generated by machine from the text information of the item, because manual labeling is too costly for a large number of items. Unlike human experts with rich prior knowledge, a simple model cannot tell whether "Andy Lau" and "Hua Tsai" are the same semantic tag.

For the problems in the related technology, namely the semantic limitation and poor scalability of recall results and the poor accuracy in mining similar users and similar items, no effective solution has yet been proposed.

Summary of the invention

The main purpose of this application is to provide a search method and device using a tag knowledge network to solve at least one problem in the related technology.

In order to achieve the above objective, according to one aspect of the present application, a search method using a tag knowledge network is provided.

Obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain the corresponding one or more item tags, and determine an item tag set composed of all the item tags;

Determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

Construct a tag knowledge network through the item tag set, knowledge graph and word2vec model; wherein, the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

Generate an item feature vector of each recommended item and a user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

Retrieve related items or related users through the item feature vector to be retrieved or the user feature vector to be retrieved, respectively.

Further, in the aforementioned search method using the tag knowledge network, obtaining a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain the corresponding one or more item tags, and determining the item tag set composed of all the item tags includes:

Determine the text of each of the recommended items; wherein, the text includes: title and description;

Perform word segmentation on the text to obtain multiple phrases;

Determine the features of each phrase, wherein the features include: part of speech, frequency of occurrence, and whether the phrase is a stop word;

Score each phrase according to its features, and retain the phrases that meet the first score threshold as the item tags of the corresponding recommended item;

Determine all the item tags of each of the recommended items, and obtain the item tag set.
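The steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the text has already been segmented and tagged with parts of speech by some external tool, and the stop-word list, the part-of-speech weights, the scoring rule, and the threshold value are all hypothetical, since the source does not specify them.

```python
from collections import Counter

# Hypothetical settings; the source does not give weights or thresholds.
STOP_WORDS = {"the", "a", "of", "and"}
POS_WEIGHT = {"noun": 1.0, "verb": 0.5}  # other parts of speech score 0
FIRST_SCORE_THRESHOLD = 1.0


def extract_item_tags(tagged_phrases):
    """tagged_phrases: list of (phrase, part_of_speech) from title + description."""
    counts = Counter(phrase for phrase, _ in tagged_phrases)
    pos_of = {phrase: pos for phrase, pos in tagged_phrases}
    tags = set()
    for phrase, freq in counts.items():
        if phrase in STOP_WORDS:
            continue  # stop words are never kept as tags
        # Assumed scoring rule: part-of-speech weight times frequency.
        score = POS_WEIGHT.get(pos_of[phrase], 0.0) * freq
        if score >= FIRST_SCORE_THRESHOLD:
            tags.add(phrase)
    return tags


def build_item_tag_set(items):
    """items: item id -> tagged phrases. Returns (tags per item, union of all tags)."""
    per_item = {item_id: extract_item_tags(p) for item_id, p in items.items()}
    all_tags = set().union(*per_item.values())
    return per_item, all_tags


items = {
    "item1": [("Monkey King", "noun"), ("fights", "verb"),
              ("the", "det"), ("Monkey King", "noun")],
    "item2": [("Zhu Bajie", "noun"), ("marries", "verb"), ("a", "det")],
}
per_item_tags, item_tag_set = build_item_tag_set(items)
```

With these made-up inputs, only the nouns reach the threshold, so each item keeps a single tag and the item tag set is the union of the two.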

Further, as in the aforementioned search method using the tag knowledge network, analyzing the user tags preferred by the user according to the historical behavior data includes:

Determine the corresponding item according to the historical behavior data;

Determine the second item tags corresponding to each corresponding item;

Weight and merge all the second item tags, and determine the score of each weighted and merged second item tag;

Take the weighted and merged second item tags whose scores meet the second score threshold as the user tags preferred by the user.

Further, as in the aforementioned search method using the tag knowledge network, the method for determining the score of each weighted and combined second item tag is as follows:

Here, N is the number of items the user has clicked on; InItem(tag) indicates whether a clicked item contains the item tag denoted tag, returning 1 if it does and 0 if it does not; t_cur is the current timestamp; and t_ck is the timestamp at which the user clicked the corresponding item.
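The scoring formula itself is not reproduced in this text, so the sketch below only illustrates how the listed quantities (N, InItem(tag), t_cur, t_ck) could be combined. The decay form 1 / (1 + age in days), the averaging over N, and the threshold value are assumptions made for illustration and should not be read as the formula of the application.

```python
SECONDS_PER_DAY = 86400.0


def tag_score(tag, clicks, t_cur):
    """clicks: list of (item_tags, t_ck) for the N items the user clicked.

    Assumed form (not taken from the source): the average over the N clicks
    of InItem(tag) / (1 + age_in_days), with age = t_cur - t_ck.
    """
    n = len(clicks)
    if n == 0:
        return 0.0
    total = 0.0
    for item_tags, t_ck in clicks:
        in_item = 1 if tag in item_tags else 0  # InItem(tag)
        age_days = max(t_cur - t_ck, 0) / SECONDS_PER_DAY
        total += in_item / (1.0 + age_days)
    return total / n


def preferred_user_tags(clicks, t_cur, second_score_threshold):
    """Keep the merged tags whose score meets the second score threshold."""
    candidates = set().union(*(tags for tags, _ in clicks))
    scores = {tag: tag_score(tag, clicks, t_cur) for tag in candidates}
    return {tag: s for tag, s in scores.items() if s >= second_score_threshold}


t_cur = 10 * 86400
clicks = [
    ({"Monkey King", "Heavenly Palace"}, 10 * 86400),  # clicked just now
    ({"Monkey King"}, 9 * 86400),                      # clicked one day ago
    ({"Zhu Bajie"}, 0),                                # clicked ten days ago
]
user_tags = preferred_user_tags(clicks, t_cur, second_score_threshold=0.2)
```

Under this assumed decay, recent and repeated clicks dominate: "Monkey King" scores highest, while the ten-day-old "Zhu Bajie" click falls below the threshold.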

Further, as in the aforementioned search method using a tag knowledge network, the construction of a tag knowledge network through the item tag set, knowledge graph and word2vec model includes:

Taking the item tag set as a corpus, and using the word2vec model to generate a vector for each tag;

Calculate the similarity w_tag between different tags as the cosine similarity of their vectors, and generate a tag association network G_tag = <V_tag, E_tag>; where V_tag is the vertex set of the tag association network, that is, the set of all tags; E_tag is the edge set of the tag association network, that is, the set of similarities w_tag between different tags;

Convert the relationships between entities in the knowledge graph into association weights w_graph to generate a knowledge network G_k = <V_k, E_k>; where V_k is the vertex set of the knowledge network, that is, the set of all tag entities in the knowledge graph; E_k is the edge set of the knowledge network, that is, the set of association weights w_graph between different tag entities;

Combine the knowledge network G_k = <V_k, E_k> and the tag association network G_tag = <V_tag, E_tag>, taking the nodes of the tag association network as the basis, to generate the tag knowledge network G = <V, E>; where V is the vertex set of the tag knowledge network and is exactly the same as the vertex set V_tag of the tag association network, that is, V = V_tag; E is the edge set of the tag knowledge network and is the union of the edge set of the tag association network and a subset E′ of the edge set of the knowledge network, where E′ consists of the edges of the knowledge network formed between tag entities contained in V_tag; that is, V = V_tag, E = E_tag + E′;

The association weight w_e in E′ is w_e = w_tag + w_graph;

Remove all associations in E′ whose association weight w_e is lower than w_threshold to obtain E_cut, where w_threshold is the association weight threshold.
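The construction above can be sketched as follows. The sketch assumes the tag vectors have already been produced by a word2vec model and that the knowledge graph has already been converted into pairwise weights w_graph; the tag names, vectors, and weights are made-up. It also assumes one reading of the text, namely that the combined weight and the pruning are applied to every edge of E, with a missing term counted as 0.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_tag_knowledge_network(tag_vectors, graph_weights, w_threshold):
    """tag_vectors: tag -> embedding; graph_weights: frozenset({a, b}) -> w_graph.

    Returns E_cut as a dict mapping each surviving edge to its weight w_e.
    """
    v_tag = set(tag_vectors)  # V = V_tag
    # E_tag: cosine similarity w_tag between every pair of tags.
    e_tag = {frozenset(pair): cosine(tag_vectors[pair[0]], tag_vectors[pair[1]])
             for pair in combinations(sorted(v_tag), 2)}
    # E': knowledge-network edges whose two endpoints both lie in V_tag.
    e_prime = {edge: w for edge, w in graph_weights.items() if edge <= v_tag}
    # w_e = w_tag + w_graph (a missing term counts as 0, by assumption).
    w_e = {edge: e_tag.get(edge, 0.0) + e_prime.get(edge, 0.0)
           for edge in set(e_tag) | set(e_prime)}
    # Pruning: drop every edge whose weight is below w_threshold.
    return {edge: w for edge, w in w_e.items() if w >= w_threshold}


tag_vectors = {
    "Monkey King": [1.0, 0.0],
    "Zhu Bajie": [0.0, 1.0],
    "Heavenly Palace": [1.0, 1.0],
}
graph_weights = {
    frozenset({"Monkey King", "Zhu Bajie"}): 0.8,  # e.g. a fellow-disciple relation
    frozenset({"Monkey King", "Tang Monk"}): 0.9,  # Tang Monk is not in V_tag
}
e_cut = build_tag_knowledge_network(tag_vectors, graph_weights, w_threshold=0.75)
```

In this toy input the two embeddings for "Monkey King" and "Zhu Bajie" are orthogonal, so the edge between them survives pruning only because of the knowledge-graph weight, which is the kind of semantic extension the tag knowledge network is meant to provide.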

Further, as in the aforementioned search method using the tag knowledge network, generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network includes:

Determine the tag vector T of each tag in the item tag set according to the item tag set and the tag knowledge network;

Determine the item feature vector I of each item according to the tag vector included in each item, as follows:

Here, N is the number of tags included in the item, and T_i is the tag vector of the i-th tag.

Further, in the aforementioned search method using the tag knowledge network, the dimension of the tag vector T equals the number of edges in E_cut; the component for each edge directly connected to the tag's node takes the value w_e of that edge, and all other components are 0.
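A minimal sketch of the tag vector and the item feature vector follows. The edge weights are made-up, and because the aggregation formula for I is not reproduced in this text, averaging the N tag vectors is an assumption; the helper names are hypothetical.

```python
def edge_index(e_cut):
    """Fix an order for the edges of E_cut so all tag vectors share dimensions."""
    return sorted(e_cut, key=lambda edge: tuple(sorted(edge)))


def tag_vector(tag, e_cut, edges):
    """One component per edge: w_e if the edge touches the tag, otherwise 0."""
    return [e_cut[edge] if tag in edge else 0.0 for edge in edges]


def item_feature_vector(item_tags, e_cut, edges):
    """Assumed aggregation: the average of the N tag vectors T_i of the item."""
    vectors = [tag_vector(t, e_cut, edges) for t in item_tags]
    n = len(vectors)
    if n == 0:
        return [0.0] * len(edges)
    return [sum(column) / n for column in zip(*vectors)]


e_cut = {
    frozenset({"Monkey King", "Zhu Bajie"}): 0.8,
    frozenset({"Monkey King", "Heavenly Palace"}): 1.2,
    frozenset({"Zhu Bajie", "Gao Village"}): 0.9,
}
edges = edge_index(e_cut)
t_monkey = tag_vector("Monkey King", e_cut, edges)
i_vec = item_feature_vector(["Monkey King", "Zhu Bajie"], e_cut, edges)
```

Because two tags that share an edge have a non-zero component in the same dimension, items tagged only "Monkey King" and items tagged only "Zhu Bajie" end up with overlapping vectors even though their tag sets are disjoint.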

Further, as in the aforementioned search method using the tag knowledge network, generating the user feature vector of the user according to the user tag set and the tag knowledge network includes:

The calculation of the user feature vector U according to the user tag set and the tag knowledge network is as follows:

Here, K is the number of tags that the user likes, W_i is the degree of the user's preference for the i-th tag, and T_i is the tag vector of the i-th tag.
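The formula for U is likewise not reproduced in this text. The sketch below assumes a preference-weighted average, U = (1 / K) * sum of W_i * T_i, which is one plausible combination of the quantities defined above; the tag vectors and preference values are made-up.

```python
def user_feature_vector(user_tag_weights, tag_vectors):
    """user_tag_weights: tag -> preference W_i; tag_vectors: tag -> T_i.

    Assumed aggregation (not taken from the source):
    U = (1 / K) * sum_i W_i * T_i over the K preferred tags.
    """
    if not tag_vectors:
        return []
    k = len(user_tag_weights)
    dim = len(next(iter(tag_vectors.values())))
    u = [0.0] * dim
    for tag, w in user_tag_weights.items():
        for j, component in enumerate(tag_vectors[tag]):
            u[j] += w * component
    return [x / k for x in u] if k else u


tag_vectors = {
    "Monkey King": [0.0, 1.2, 0.8],
    "Zhu Bajie": [0.9, 0.0, 0.8],
}
u_vec = user_feature_vector({"Monkey King": 0.5, "Zhu Bajie": 0.25}, tag_vectors)
```

Since retrieval compares vectors by cosine value, the 1 / K factor does not change the ranking; only the relative preferences W_i matter.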

Further, in the aforementioned search method using the tag knowledge network, retrieving related items or related users through the item feature vector to be retrieved or the user feature vector to be retrieved, respectively, includes:

Calculating a first cosine value between the item feature vector to be retrieved and the second item feature vector of each recalled item; or

Calculating a second cosine value between the user feature vector U to be retrieved and the second user feature vector of each recalled user;

Determining, according to the first cosine value or the second cosine value, several related items or related users that meet the similarity threshold requirement.
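The retrieval step can be sketched as a brute-force cosine comparison with a threshold. The candidate vectors and the threshold value are made-up, and a production system would more likely use an approximate nearest-neighbor index, which the source does not discuss.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidates, similarity_threshold):
    """Return (id, cosine) pairs at or above the threshold, most similar first."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in candidates.items()]
    hits = [(cid, s) for cid, s in scored if s >= similarity_threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


item_vectors = {
    "item_a": [0.0, 1.2, 0.8],
    "item_b": [0.9, 0.0, 0.8],
    "item_c": [0.9, 0.0, 0.0],
}
related = retrieve([0.0, 1.2, 0.8], item_vectors, similarity_threshold=0.3)
```

The same function serves for users by passing a user feature vector as the query and a dictionary of user feature vectors as the candidates.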

In order to achieve the above objective, according to another aspect of the present application, a search device using a tag knowledge network is provided.

The search device using the tag knowledge network according to this application includes:

The tag construction module is used to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;

The user modeling module is used to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

The tag knowledge network construction module is used to construct a tag knowledge network through the item tag set, knowledge graph and word2vec model; wherein, the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

The user and item feature construction module is configured to generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

The vector search module is used to retrieve related items or related users through the item feature vector that needs to be retrieved or the user feature vector that needs to be retrieved.

In the embodiments of this application, a search method and device using a tag knowledge network are adopted. The method includes: acquiring a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain the corresponding one or more item tags, and determining the item tag set composed of all the item tags; determining the user's historical behavior data for different items, analyzing the user tags preferred by the user according to the historical behavior data, and determining the user tag set composed of all the user tags; constructing the tag knowledge network through the item tag set, the knowledge graph and the word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges; generating the item feature vectors of the recommended items and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network; and retrieving related items or related users through the item feature vector or the user feature vector, respectively. In this way, a tag knowledge network is introduced on top of the content-based recall algorithm, and the recall strategy is designed around vector search over the relationship vectors of the tag network. In addition, pruning the tag knowledge network addresses the dimension explosion of the association relationships between tags. The method thereby preserves the relevance of content-based recall results while effectively addressing the semantic limitation and poor scalability of tag-based recall results.

Description of the drawings

The drawings constituting a part of the application are used to provide a further understanding of the application, so that other features, purposes and advantages of the application become more obvious. The drawings and descriptions of the schematic embodiments of the application are used to explain the application, and do not constitute an improper limitation of the application. In the drawings:

Fig. 1 is a schematic diagram of a method flow of a search method using a tag knowledge network according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a tag knowledge network constructed according to a method in an embodiment of the present application;

Fig. 3 is a schematic diagram of the connection structure of functional modules of a search device using a tag knowledge network according to an embodiment of the present application; and

Fig. 4 is a system flow diagram for searching through the search device using the tag knowledge network shown in Fig. 3.

Detailed description

In order to enable those skilled in the art to better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

It should be noted that the terms "first" and "second" in the description and claims of the application and in the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged under appropriate circumstances for the purposes of the embodiments described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.

In this application, the orientations or positional relationships indicated by the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse", "longitudinal", etc. are based on the orientations or positional relationships shown in the drawings. These terms are mainly used to better describe the present application and its embodiments, and are not intended to require that the indicated device, element, or component have a specific orientation, or be constructed and operated in a specific orientation.

In addition, some of the above terms may be used to express meanings other than orientation or positional relationship. For example, the term "upper" may in some cases also indicate a certain dependency or connection relationship. For those of ordinary skill in the art, the specific meanings of these terms in this application can be understood according to specific circumstances.

In addition, the terms "installed", "arranged", "provided", "connected", "linked", and "sleeved" should be interpreted broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral structure; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediary, or internal communication between two devices, elements, or components. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood according to specific circumstances.

It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the present application will be described in detail with reference to the drawings and in conjunction with embodiments.

In order to achieve the above objective, according to one aspect of the present application, a search method using a tag knowledge network is provided. As shown in Figure 1, the method includes the following steps S1 to S5:

S1. Obtain multiple recommended items, perform tag extraction on text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;

Specifically, the recommended items may be articles, commodities, and the like. Articles or commodities on the Internet generally describe their functions, attributes, or content through text; therefore, the text information related to each recommended item can be obtained together with the recommended items. Performing tag extraction on this text information yields tags that each represent part of the item's characteristics. For example, when shopping online, a user can enter several keywords to match products with the corresponding characteristics, and a single product often has multiple characteristics;

S2. Determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

Specifically, the user's historical behavior data on different items is acquired so that the user's preferred tags can be determined by analyzing a large amount of such data. For example, when the user's browsing data includes "The Four Pilgrims of Journey to the West Fetch the Scriptures", "The Monkey King Wreaks Havoc in the Heavenly Palace", and "The Monkey King Thrice Defeats the White Bone Demon", the common (user-preferred) tag can be determined to be Monkey King; when the browsing data includes "Zhu Bajie Takes a Wife" and "Zhu Bajie's Past and Present Lives", the common (user-preferred) tag can be determined to be Zhu Bajie; when the same user has browsed all of the above content, the user's user tag set is determined to include Monkey King and Zhu Bajie;

S3. Construct a tag knowledge network through the item tag set, knowledge graph and word2vec model; wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

Specifically, the tags in the item tag set serve as nodes, and the degree of association between tags serves as the connecting edge, so that the relationships between tags can be expressed visually. The degree of association characterizes the strength of the association between different tags in the same item; if the association between two tags is strong, the two tags are connected by an edge weighted with that degree of association. In general, once a user's preferred tag in an item has been determined, the other tags of the item are related to that preferred tag through the degree of association, so the connections between different tags are shown more clearly;

S4. Generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

Specifically, the item tag set is obtained not merely to learn which features an item includes, but ultimately to determine, among the tags the item includes, which tags carry greater weight; the item feature vector therefore needs to be derived from the item tag set and the tag knowledge network. Likewise, the user tag set is obtained not merely to learn which tags appear in the user's browsing history, but ultimately to determine which tags the user likes most, a stronger preference corresponding to a greater weight; the user feature vector therefore needs to be derived from the user tag set and the tag knowledge network. A user's preference for a tag is meaningful for recommendation only when that tag also carries a large weight in the item: if the user is recommended an item in which the preferred tag carries only a small weight, the user and the item fit poorly, which is bound to harm the user experience;

S5. Obtain the item feature vector of the first item to be retrieved or the user feature vector of the first user, and retrieve related items or related users through the item feature vector of the first item or the user feature vector of the first user;

Specifically, this step uses the known item feature vector of the first item or the known user feature vector of the first user to recall (retrieve) related items similar to the first item, related users who match the first item, related users similar to the first user, or items that match the first user. It thus provides a comprehensive matching rule: it can find products that meet each user's preferences and can even find other users with the same preferences.

In some embodiments, in the aforementioned search method using the tag knowledge network, obtaining a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain the corresponding one or more item tags, and determining the item tag set composed of all the item tags includes:

Perform word segmentation on the text to obtain multiple phrases;

Specifically, this embodiment performs tag extraction on the text information of the recommended items, which is an indispensable part of the content-based recall algorithm. First, Chinese word segmentation is performed on the title, description, and other text of the item; each word is then scored comprehensively according to features such as its part of speech, its frequency of occurrence, and whether it is a stop word (the scoring can use various preset thresholds or judgment methods, which are not detailed here); words with higher scores are retained as the tags of the item to be recommended. Table 1 shows an example of an item tag set (the descriptions are too long to list; every word in a tag must have appeared in the title or description):

In some embodiments, as in the aforementioned search method using the tag knowledge network, analyzing the user tags preferred by the user according to the historical behavior data includes:

Determine the corresponding item according to the historical behavior data;

Specifically, the historical behavior data may be user browsing or purchase record data; and the corresponding item may be a corresponding product or article in the browsing or purchase record data;

Determine the second item tags corresponding to each corresponding item;

Specifically, Chinese word segmentation can be performed on the title, description, and other text of the corresponding item; each word is then scored comprehensively according to features such as its part of speech, its frequency of occurrence, and whether it is a stop word, and the words with higher scores are retained as the second item tags of the corresponding item.

Take the weighted and merged second item tags whose scores meet the second score threshold as the user tags preferred by the user; specifically, the second score threshold may be defined according to the specific scenario and tag screening requirements.

In some embodiments, as in the aforementioned search method using the tag knowledge network, the method for determining the score of each weighted and combined second item tag is as follows:

Specifically, the score of the second item tag calculated by using this method can accurately capture the user's preferred tag, so that it can finally match the user's preferred item.

In some embodiments, as in the aforementioned search method using a tag knowledge network, the construction of a tag knowledge network through the item tag set, knowledge graph and word2vec model includes:

Calculate the similarity w_tag between different tags as the cosine similarity of the item tag vectors, and generate a tag association network G_tag = <V_tag, E_tag>; where V_tag is the vertex set of the tag association network, that is, the set of all tags, and E_tag is the edge set of the tag association network, that is, the set of similarities w_tag between different tags;

Convert the relationships between entities in the knowledge graph into association weights w_graph to generate a knowledge network G_k = <V_k, E_k>, where V_k is the vertex set of the knowledge network, that is, the set of all tag entities in the graph, and E_k is the edge set of the knowledge network, that is, the set of association weights w_graph between different tag entities;

Combine the knowledge network G_k = <V_k, E_k> and the tag association network G_tag = <V_tag, E_tag>, taking the nodes of the tag association network as the basis, to generate the tag knowledge network G = <V, E>; where V is the vertex set of the tag knowledge network, which is exactly the same as the vertex set V_tag of the tag association network, that is, V = V_tag; E is the edge set of the tag knowledge network, which is the union of the edge set of the tag association network and a subset E′ of the edge set of the knowledge network, where E′ consists of the edges of the knowledge network formed between tag entities contained in V_tag; that is, V = V_tag, E = E_tag + E′;

The association weight is w_e = w_tag + w_graph;

Remove all associations in E′ whose association weight w_e is lower than w_threshold (that is, prune the edge set) to obtain E_cut, where w_threshold is the association weight threshold;

Specifically, using this method to construct a tag knowledge network can accurately indicate the degree of association between tags; for example, the tag knowledge network shown in FIG. 2 can be constructed according to the item tag set in Table 1.

In some embodiments, as in the aforementioned search method using the tag knowledge network, generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network includes:

Preferably, the vector of each tag is T, and the dimension of T is the number of edges in E_cut; the component for each edge directly connected to the tag's node is the weight w_e of that edge, and all other components are 0. For the network in Fig. 2, the feature vector of the tag Monkey King is T = [w_e1, w_e2, w_e3, w_e4, w_e5, w_e6, w_e7, w_e8], where w_e2 = w_e3 = w_e4 = 0;

By adopting this method, the item feature vector of each item can be calculated simply and quickly, and at the same time, it can accurately characterize the preference degree of each label in the item by the user.

In some embodiments, as in the aforementioned search method using the tag knowledge network, generating the user feature vector of the user according to the user tag set and the tag knowledge network includes:

Using this method, the user feature vector U of each user can be calculated simply and quickly, and it reflects how much the user likes each preferred tag, so that the information contained in the user feature vector U is more comprehensive and accurate.

In some embodiments, as in the aforementioned search method using the tag knowledge network, performing retrieval by means of the item feature vector of the item to be retrieved or the user feature vector of the user to be retrieved, to obtain related items or related users respectively, includes:

Calculate a first cosine value between the item feature vector of the item to be retrieved and the second item feature vector of each recalled item, where a recalled item is an item in a database or on the Internet that is used for similarity matching with the item to be retrieved;

Calculate a second cosine value between the user feature vector U of the user to be retrieved and the second user feature vector of each recalled user, where a recalled user is a user in a database or on the Internet that is used for similarity matching with the user to be retrieved;

According to the first cosine value and the second cosine value, several related items or related users that meet the similarity threshold requirement are determined respectively.

The following kinds of recall (search) can be performed with the method of this embodiment:

a) An item recalls related items, based on the similarity between items;

b) A user recalls related users, based on the similarity between users;

c) A user recalls related items, based on the similarity between the user and the item.
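The cosine-based recall above can be sketched as follows. This is a plain-Python illustration rather than the applicant's implementation: the function names `cosine` and `recall`, the candidate names and the threshold value are hypothetical, and a production system would typically use a vector index instead of a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return candidates whose cosine with `query` meets the threshold, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


query_vector = [1.0, 0.0, 1.0]
candidates = {
    "item_a": [2.0, 0.0, 2.0],  # same direction as the query
    "item_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "item_c": [1.0, 1.0, 1.0],
}
related = recall(query_vector, candidates, threshold=0.8)
```

The same `recall` function covers cases a), b) and c): the query may be an item vector or a user vector, and the candidates may be item vectors or user vectors, since all of them are indexed by the edges of E_cut.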

It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical sequence is shown in the flowcharts, in some cases the steps shown or described can be performed in a different order.

According to an embodiment of the present invention, there is also provided a search device using a tag knowledge network for implementing the above search method using a tag knowledge network. As shown in FIG. 3, the device includes:

The label construction module 1 is used to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;

The user modeling module 2 is used to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

The tag knowledge network construction module 3 is used to construct a tag knowledge network through the item tag set, a knowledge graph and a word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

The user and item feature construction module 4 is configured to generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

The vector search module 5 is used to retrieve related items or related users through the item feature vector or user feature vector, respectively.

Specifically, for the specific process of each module in the device of the embodiment of the present invention to realize its function, please refer to the relevant description in the method embodiment, which will not be repeated here.

FIG. 4 is a system flowchart of searching with the search device using the tag knowledge network shown in FIG. 3.

Obviously, those skilled in the art should understand that the above-mentioned modules or steps of the present invention can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device for execution by the computing device, or they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the present invention is not limited to any specific combination of hardware and software.

The above descriptions are only preferred embodiments of the application, and are not used to limit the application. For those skilled in the art, the application can have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included in the protection scope of this application.

Claims

A search method using a tag knowledge network, characterized in that it comprises:

Obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;

Determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

Construct a tag knowledge network through the item tag set, knowledge graph and word2vec model; wherein, the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

Generate an item feature vector of the recommended item and a user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

Retrieve related items or related users by means of the item feature vector of the item to be retrieved or the user feature vector of the user to be retrieved.
The search method using a tag knowledge network according to claim 1, wherein said obtaining multiple recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags comprises:

Determine the text of each of the recommended items; wherein, the text includes: title and description;

Perform word segmentation on the text to obtain multiple phrases;

Determine the characteristics of each phrase; wherein, the characteristics include: part of speech, frequency of occurrence, and whether it is a useless word;

Score each phrase according to its characteristics, and retain the phrase that meets the first score threshold as the item tag of the corresponding recommended item;

Determine all the item tags of each of the recommended items, and obtain the item tag set.
The search method using a tag knowledge network according to claim 1, wherein said analyzing the user tags preferred by the user according to the historical behavior data comprises:

Determine the corresponding items according to the historical behavior data;

Determine the second item tags corresponding to each of the corresponding items;

Weight and merge all the second item tags, and determine the score of each weighted and merged second item tag;

Take each weighted and merged second item tag whose score meets the second score threshold as a user tag preferred by the user.
The search method using the tag knowledge network according to claim 3, wherein the method for determining the score of each weighted and combined second item tag is as follows:

where N is the number of items the user has clicked on; InItem(tag) indicates whether a clicked item contains the item tag "tag", returning 1 if it does and 0 if it does not; t_cur is the current timestamp; and t_ck is the timestamp at which the user clicked the corresponding item.
The search method using a tag knowledge network according to claim 1, wherein said constructing a tag knowledge network through said item tag set, knowledge graph and word2vec model comprises:

Taking the item tag set as a corpus, and using the word2vec model to generate a vector for each tag;

The similarity w_tag between different tags is calculated as the cosine similarity of the vectors of the item tags, and is used to generate a tag association network G_tag = <V_tag, E_tag>, where V_tag is the vertex set of the tag association network, that is, the set of all tags, and E_tag is the edge set of the tag association network, that is, the set of similarities w_tag between different tags;

The relationships between entities in the knowledge graph are converted into association weights w_graph to generate a knowledge network G_k = <V_k, E_k>, where V_k is the vertex set of the knowledge network, that is, the set of entities in the knowledge graph to which the tags are mapped, and E_k is the edge set of the knowledge network, that is, the set of association weights w_graph between different tag entities;

The knowledge network G_k = <V_k, E_k> and the tag association network G_tag = <V_tag, E_tag> are combined on the basis of the nodes of the tag association network to generate the tag knowledge network G = <V, E>, where V is the vertex set of the tag knowledge network and is identical to the vertex set V_tag of the tag association network, that is, V = V_tag; E is the edge set of the tag knowledge network and is the union of the edge set of the tag association network and a subset E′ of the edge set of the knowledge network, the subset E′ consisting of all edges in the knowledge network formed between tag entities that belong to V_tag, that is, V = V_tag and E = E_tag + E′,
where the association weight of an edge in E′ is w_e = w_tag + w_graph;

Remove all associations in E′ whose association weight w_e is lower than w_threshold to obtain E_cut, where
w_threshold is the association weight threshold.
The search method using a tag knowledge network according to claim 5, wherein generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network comprises:

Determine the tag vector T of each tag in the item tag set according to the item tag set and the tag knowledge network;

Determine the item feature vector I of each item according to the tag vectors of the tags included in the item, as follows:

where N is the number of tags included in the item, and T_i is the tag vector of the i-th tag.
The search method using the tag knowledge network according to claim 6, wherein the dimension of the tag vector T is the number of edges in E_cut, the component for an edge directly connected to the node of the tag takes the value w_e, and all other components are 0.
The search method using a tag knowledge network according to claim 1, wherein generating the user feature vector of the user according to the user tag set and the tag knowledge network comprises:

The calculation of the user feature vector U according to the user tag set and the tag knowledge network is as follows:

where K is the number of tags that the user likes, W_i is the user's degree of preference for the i-th tag, and T_i is the tag vector of the i-th tag.
The search method using a tag knowledge network according to claim 1, wherein the retrieval of related items or related users through the feature vector of the item to be retrieved or the feature vector of the user that needs to be retrieved comprises:

Calculate the first cosine value of the feature vector of the item to be retrieved and the second item feature vector of each recalled item; or

Calculating the second cosine value of the user feature vector U to be retrieved and the second user feature vector of each recalled user;

According to the first cosine value and the second cosine value, several related items or related users that meet the similarity threshold requirement are determined.
A search device using a tag knowledge network, characterized in that it comprises:

The label building module is used to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;

The user modeling module is used to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;

The tag knowledge network construction module is used to construct a tag knowledge network through the item tag set, a knowledge graph and a word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;

The user and item feature construction module is configured to generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;

The vector search module is used to retrieve related items or related users by means of the item feature vector of the item to be retrieved or the user feature vector of the user to be retrieved.