WO2020253591A1 - Search method and apparatus applying tag knowledge network - Google Patents
Search method and apparatus applying tag knowledge network
- Publication number
- WO2020253591A1 PCT/CN2020/095370 CN2020095370W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tag
- item
- user
- knowledge network
- tags
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Definitions
- a search device using a tag knowledge network is provided.
- the search device using the tag knowledge network according to this application includes:
- the tag construction module is used to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
- the user modeling module is used to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
- the tag knowledge network construction module is used to construct a tag knowledge network through the item tag set, the knowledge graph, and the word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;
- the user and item feature construction module is configured to generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
- the vector search module is used to retrieve related items or related users through the item feature vector that needs to be retrieved or the user feature vector that needs to be retrieved.
- a search method and device using a tag knowledge network are adopted.
- the method includes: acquiring a plurality of recommended items, extracting the text information related to each recommended item to obtain the corresponding one or more item tags, and determining the item tag set composed of all the item tags; determining that the user has different items
- the tag knowledge network is constructed through the item tag set, the knowledge graph, and the word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges; the item feature vector of each recommended item and the user feature vector of the user are generated according to the item tag set, the user tag set, and the tag knowledge network; related items or related users are retrieved through the item feature vector or the user feature vector, respectively; thus, a tag knowledge network can be introduced on the basis of the content-based recall algorithm, the relationship vectors of the tag network can be used, and the recall strategy is designed based
- FIG. 1 is a schematic flowchart of a search method using a tag knowledge network according to an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of a tag knowledge network constructed according to a method in an embodiment of the present application.
- FIG. 3 is a schematic diagram of the connection structure of the functional modules of a search device using a tag knowledge network according to an embodiment of the present application.
- FIG. 4 is a system flowchart of searching with the search device using the tag knowledge network shown in FIG. 3.
- installation should be interpreted broadly.
- it can be a fixed connection, a detachable connection, or an integral structure; it can be a mechanical connection or an electrical connection; it can be a direct connection, an indirect connection through an intermediary, or internal communication between two devices, elements, or components.
- the specific meanings of the above terms in this application can be understood according to specific circumstances.
- a search method using a tag knowledge network includes the following steps S1 to S5:
- the recommended items may be articles, commodities, etc.
- articles or commodities on the Internet describe their functions, attributes, or content through text; therefore, when the multiple recommended items are acquired, the text information related to each recommended item can be obtained as well; performing tag extraction on that text yields tags that each represent part of the item's characteristics; for example, when shopping online, entering several key pieces of information returns products with the matching characteristics, and a product often has multiple characteristics;
- S2: Determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
- the user's historical behavior data on different items is acquired so that the user can be analyzed through a large amount of such data to determine the tags the user prefers.
- for example, when the user's browsing data includes "The Four Pilgrims of Journey to the West Fetch the Scriptures", "Monkey King Wreaks Havoc in the Heavenly Palace", and "Monkey King Thrice Defeats the White Bone Demon", the common (user-preferred) tag can be determined to be Monkey King; when the user's browsing data includes "Zhu Bajie Takes a Wife" and "Zhu Bajie's Past and Present Lives", the common (user-preferred) tag can be determined to be Zhu Bajie; when the same user has browsed all of the above content, the user's user tag set is determined to include Monkey King and Zhu Bajie;
- the relationships between the tags in the item tag set can be expressed visually by using the degree of association as the connecting edge, where the degree of association characterizes the strength of the relationship between different tags in the same item; if the relationship between two tags is strong, the degree of association is used as the connecting edge between them.
- the other tags in an item and the user's preferred tags are thereby related to one another through the degree of association, so the connections between different tags can be shown more clearly;
- the item tag set is obtained not merely to learn which features an item includes but ultimately to determine, among the tags the item contains, which tags carry greater weight; therefore, the item feature vector must be obtained from the item tag set and the tag knowledge network. Likewise, the user tag set is obtained not merely to learn which features the user browsed in the historical data but ultimately to determine which tags the user likes most; the more the user likes a tag, the greater its weight.
- this step uses the known item feature vector of a first item and the user feature vector of a first user to recall or retrieve related items similar to the first item, related users matched to the first item, related users similar to the first user, or items matched to the first user; it can thus provide a comprehensive matching rule that finds products meeting each user's preferences and even other users with the same preferences.
- acquiring a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags includes:
- the characteristics include: part of speech, frequency of occurrence, and whether it is a useless word;
- this embodiment performs tag extraction on the text information of the recommended items, which is an indispensable part of the content-based recall algorithm.
- the title, description, and other text of the item are segmented into Chinese words, and each word is then scored comprehensively according to its part of speech, frequency of occurrence, and whether it is a useless word (the scoring can use various preset thresholds or judgment methods, which are not detailed here); the words with higher scores are kept as the tags of the item to be recommended.
- Table 1 shows an example of an item tag set (the descriptions are too long to list; every word in a tag must have appeared in the title or description):
- analyzing the user tags preferred by the user according to the historical behavior data includes:
- the historical behavior data may be the user's browsing or purchase records, and the corresponding item may be the product or article that appears in those records;
- Chinese word segmentation can be performed on the title, description, and other text of the corresponding item; each word is then scored comprehensively according to characteristics such as its part of speech, frequency of occurrence, and whether it is a useless word, and the words with higher scores are retained as the second item tags of the corresponding item.
- the weighted and combined second item tag whose score meets the second score threshold requirement is taken as the user tag preferred by the user; the second score threshold may be defined according to the specific scenario and tag screening requirements.
- the method for determining the score of each weighted and combined second item tag is as follows:
- $N$ represents the number of items the user has clicked
- $\mathrm{InItem}(tag)$ indicates whether a clicked item contains the item tag $tag$: 1 if it does, 0 if it does not
- $t_{cur}$ represents the current timestamp
- $t_{ck}$ represents the timestamp at which the user clicked the corresponding item.
- the score of the second item tag calculated by using this method can accurately capture the user's preferred tag, so that it can finally match the user's preferred item.
- the construction of a tag knowledge network through the item tag set, knowledge graph and word2vec model includes:
- a tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$ is generated;
- $V_{tag}$ represents the vertex set of the tag association network, that is, the set of all tags;
- $E_{tag}$ represents the edge set of the tag association network, that is, the set of similarities $w_{tag}$ between different tags;
- $E$ represents the edge set of the tag knowledge network, which is the union of the edge set of the tag association network and the subset $E'$ of the edge set of the knowledge network.
- the association weight in $E'$ is $w_e = w_{tag} + w_{graph}$;
- tag knowledge network shown in FIG. 2 can be constructed according to the item tag set in Table 1.
- generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network includes:
- the vector of each tag is $T$
- the vector dimension of $T$ is the number of edges in $E_{cut}$, where the component for each edge directly connected to the tag's node takes the value $w_e$ and all other components are 0.
- $N$ represents the number of tags the item contains
- $T_i$ represents the tag vector of the i-th tag.
- the item feature vector of each item can be calculated simply and quickly, and at the same time, it can accurately characterize the preference degree of each label in the item by the user.
- generating the user feature vector of the user according to the user tag set and the tag knowledge network includes:
- $K$ represents the number of tags that the user likes
- $W_i$ represents the degree to which the user likes the i-th tag
- $T_i$ represents the tag vector of the i-th tag.
- the user feature vector $U$ corresponding to each user can be calculated simply and quickly, and it accurately represents how much the user likes each preferred tag, so that the information contained in the user feature vector $U$ is more comprehensive and accurate.
- the retrieval of the item feature vector of the first item or the user feature vector of the first user to obtain related items or related users respectively includes:
- the device includes:
- the tag construction module 1 is used to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
- the user modeling module 2 is used to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
- the tag knowledge network construction module 3 is used to construct a tag knowledge network through the item tag set, the knowledge graph, and the word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;
- the user and item feature construction module 4 is configured to generate the item feature vector of the recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
- the vector search module 5 is used to retrieve related items or related users through the item feature vector or user feature vector, respectively.
- FIG. 4 is a system flowchart of searching with the search device using the tag knowledge network shown in FIG. 3.
- the modules or steps of the present invention can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the present invention is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A search method and apparatus applying a tag knowledge network. The method comprises: acquiring a plurality of recommended items, carrying out tag extraction on text information related to each of the recommended items to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags (S1); determining historical behavior data of a user for different items, analyzing user tags that the user prefers according to the historical behavior data, and determining a user tag set composed of all the user tags (S2); constructing a tag knowledge network by means of the item tag set, a knowledge graph and a word2vec model (S3); generating item feature vectors of the recommended items and user feature vectors of the user according to the item tag set, the user tag set and the tag knowledge network (S4); and respectively performing retrieval by means of the item feature vectors or the user feature vectors to obtain related items or related users (S5). The method can not only ensure the relevance of a content algorithm recall result, but also overcome the defects of semantic limitation and poor expansibility of a tag recall result.
Description
This application relates to the field of intelligent search technology and, specifically, to a search method and apparatus using a tag knowledge network.
With the development of Internet technology and social networks, a large amount of information, including text, pictures, and videos, is published on the Internet every day. Traditional search technology can no longer meet users' needs for information discovery, and personalized recommendation systems emerged to solve this problem of information overload. Such a system recommends the information users need according to their interests and behavior and helps them quickly find what they want in a mass of information, which increases user stickiness, improves user retention, and makes the product more competitive.
Commonly used recommendation algorithms currently include content-based recall algorithms, collaborative-filtering-based recall algorithms (user-based and item-based collaborative filtering), and model-learning-based recall algorithms (from simple logistic regression models to gradient boosting trees and on to deep learning). Among them, the content-based recall algorithm is one of the most common and most important. Its key point is the construction and mining of the tag system. First, the recommended items (such as news, pictures, and videos) are decomposed into a series of tags. Then, according to the user's behavior toward items (such as browsing, clicking, and purchasing), the user is also described as a set of tags; this set of tags characterizes the user, that is, it forms the user profile. Finally, the items the user likes are recalled through the tags the user likes.
On the basis of the content-based recall algorithm, this document introduces a tag knowledge network and designs a search application system based on it. User and item features are vectorized based on the tag knowledge network, and vector search is then used to recall similar items, similar users, and items that users like.
The content-based recall algorithm has many advantages. For example, it can mine a great deal of useful information from item data, it allows new items to be surfaced quickly, and it is highly interpretable. However, it also has the following disadvantages:
1. Recall results are semantically limited and extend poorly
Content-based recall algorithms all recall results through tags, but because the tags are fixed, the recalled results are narrow and hard to extend. For example, the tag "Monkey King" can only recall information about the Monkey King, such as "Monkey King Thrice Defeats the White Bone Demon" or "Monkey King Wreaks Havoc in the Heavenly Palace"; it can hardly recall information about Zhu Bajie (the Monkey King and Zhu Bajie are both main characters in Journey to the West and are fellow disciples) unless an item carries both the Monkey King and Zhu Bajie tags. After all, for the many fans of Journey to the West, neither the Monkey King nor Zhu Bajie can be left out.
2. Poor accuracy in mining similar users and similar items
Recommendation systems seldom use tags to mine similar users and items, mainly because tags are too fine-grained and extend poorly. Item tags are generally machine-generated from the item's text, because manual annotation is too costly for a massive number of items. Unlike a domain expert with rich prior knowledge, a simple model cannot tell whether "Andy Lau" and "Hua Tsai" (a nickname for the same person) are tags with the same meaning.
No effective solution has yet been proposed for these problems in the related art, namely semantically limited and poorly extensible recall results and poor accuracy in mining similar users and similar items.
Summary of the invention
The main purpose of this application is to provide a search method and apparatus using a tag knowledge network, so as to solve at least one problem in the related art.
To achieve the above purpose, according to one aspect of this application, a search method using a tag knowledge network is provided. The method includes:
Acquiring multiple recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags;
Determining the user's historical behavior data for different items, analyzing the user tags preferred by the user according to the historical behavior data, and determining a user tag set composed of all the user tags preferred by the user;
Constructing a tag knowledge network through the item tag set, a knowledge graph, and a word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degree of association between tags as edges;
Generating the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
Retrieving related items or related users through the item feature vector to be retrieved or the user feature vector to be retrieved.
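The final retrieval step can be illustrated with a brute-force nearest-neighbor search. This is only a minimal sketch: the text does not state which similarity measure the vector search uses, so cosine similarity is an assumption here, and the function names `cosine` and `retrieve` as well as the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query_vector, candidates, k=3):
    """Return the names of the k candidates most similar to the query vector.

    candidates maps a name (item or user) to its feature vector.
    The similarity measure is an assumption, not taken from the source.
    """
    ranked = sorted(candidates,
                    key=lambda name: cosine(query_vector, candidates[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: three item feature vectors of dimension 2.
item_vectors = {
    "item_a": [1.0, 0.0],
    "item_b": [0.9, 0.1],
    "item_c": [0.0, 1.0],
}
nearest = retrieve([1.0, 0.0], item_vectors, k=2)
```

Because item and user feature vectors live in the same edge-indexed space, the same function could in principle be queried with either kind of vector against either kind of candidate set.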
Further, in the aforementioned search method using the tag knowledge network, acquiring multiple recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags includes:
Determining the text of each recommended item, where the text includes the title and the description;
Segmenting the text into words to obtain multiple terms;
Determining the features of each term, where the features include part of speech, frequency of occurrence, and whether the term is a useless word;
Scoring each term according to its features and retaining the terms that meet the first score threshold requirement as item tags of the corresponding recommended item;
Determining all the item tags of each recommended item and obtaining the item tag set.
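The tag extraction steps above can be sketched as follows. The source does not specify the scoring function, so the part-of-speech weights, the useless-word list (interpreted here as a stop-word list), and the product of weight and frequency are all assumptions; the input is assumed to be already segmented into (word, part of speech) pairs by some word segmenter.

```python
from collections import Counter

# Hypothetical scoring parameters; the source leaves the scoring rule open.
POS_WEIGHT = {"noun": 1.0, "verb": 0.5}
USELESS_WORDS = {"in", "the", "of"}

def extract_item_tags(tokens, first_threshold=1.0):
    """Score segmented terms and keep those meeting the first score threshold.

    tokens: list of (word, part_of_speech) pairs from the title and description.
    Assumed score: part-of-speech weight times frequency; useless words are dropped.
    """
    frequency = Counter(word for word, _ in tokens)
    scores = {}
    for word, pos in tokens:
        if word in USELESS_WORDS:
            continue
        scores[word] = POS_WEIGHT.get(pos, 0.0) * frequency[word]
    return {word for word, score in scores.items() if score >= first_threshold}

def build_item_tag_set(tags_per_item):
    """Union of the tags of all recommended items."""
    tag_set = set()
    for tags in tags_per_item.values():
        tag_set |= tags
    return tag_set

tokens = [("Monkey King", "noun"), ("wreaks", "verb"), ("havoc", "noun"),
          ("in", "prep"), ("the", "det"), ("Heavenly Palace", "noun"),
          ("Monkey King", "noun")]
tags = extract_item_tags(tokens)
```

Here "wreaks" scores 0.5 and falls below the threshold, while the three nouns are kept as item tags.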
Further, in the aforementioned search method using the tag knowledge network, analyzing the user tags preferred by the user according to the historical behavior data includes:
Determining the corresponding items according to the historical behavior data;
Determining the second item tags corresponding to each corresponding item;
Weighting and combining all the second item tags, and determining the score of each weighted and combined second item tag;
Taking the weighted and combined second item tags whose scores meet the second score threshold requirement as the user tags preferred by the user.
Further, in the aforementioned search method using the tag knowledge network, the score of each weighted and combined second item tag is determined as follows:
where $N$ is the number of items the user has clicked, $\mathrm{InItem}(tag)$ indicates whether a clicked item contains the item tag $tag$ (1 if it does, 0 if it does not), $t_{cur}$ is the current timestamp, and $t_{ck}$ is the timestamp at which the user clicked the corresponding item.
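The scoring formula itself is not reproduced in this text; only its symbols are defined. The sketch below therefore shows one plausible reading, not the patent's formula: each of the $N$ clicked items contributes $\mathrm{InItem}(tag)$ multiplied by a weight that decays with the age $t_{cur} - t_{ck}$ of the click. The half-life decay, the `half_life` parameter, and the function names are assumptions.

```python
def score_user_tags(clicks, t_cur, half_life=7 * 24 * 3600.0):
    """Accumulate a time-decayed score per tag over the user's clicked items.

    clicks: list of (item_tags, t_ck) pairs, one per clicked item.
    Assumed decay: a click loses half its weight every half_life seconds.
    """
    scores = {}
    for item_tags, t_ck in clicks:
        decay = 0.5 ** ((t_cur - t_ck) / half_life)
        for tag in set(item_tags):  # InItem(tag) == 1 for these tags only
            scores[tag] = scores.get(tag, 0.0) + decay
    return scores

def preferred_tags(scores, second_threshold):
    """Keep the tags whose score meets the second score threshold."""
    return {tag for tag, score in scores.items() if score >= second_threshold}

# Hypothetical click history with timestamps in seconds.
clicks = [
    ({"Monkey King", "Heavenly Palace"}, 1000),
    ({"Monkey King"}, 900),
    ({"Zhu Bajie"}, 800),
]
scores = score_user_tags(clicks, t_cur=1000, half_life=100.0)
liked = preferred_tags(scores, second_threshold=1.0)
```

With these toy values, recent and repeated clicks on "Monkey King" give it the highest score, while the single older click on "Zhu Bajie" falls below the threshold.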
Further, in the aforementioned search method using the tag knowledge network, constructing the tag knowledge network through the item tag set, the knowledge graph, and the word2vec model includes:
Treating the item tag set as a corpus and using the word2vec model to generate a vector for each tag;
Calculating the similarity $w_{tag}$ between different tags as the cosine similarity of their tag vectors, and generating a tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$, where $V_{tag}$ is the vertex set of the tag association network, that is, the set of all tags, and $E_{tag}$ is the edge set of the tag association network, that is, the set of similarities $w_{tag}$ between different tags;
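As a rough illustration of this step, the sketch below builds $G_{tag}$ from tag vectors that are assumed to have been produced already (in the method they come from a word2vec model trained on the item tag set; training is omitted here). The function names and the two-dimensional toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_tag_association_network(tag_vectors):
    """Return (V_tag, E_tag): all tags, and w_tag for every unordered tag pair."""
    vertices = sorted(tag_vectors)
    edges = {(a, b): cosine(tag_vectors[a], tag_vectors[b])
             for a, b in combinations(vertices, 2)}
    return vertices, edges

# Hypothetical embeddings standing in for word2vec output.
tag_vectors = {
    "Monkey King": [1.0, 0.0],
    "Zhu Bajie": [1.0, 1.0],
    "Andy Lau": [0.0, 1.0],
}
v_tag, e_tag = build_tag_association_network(tag_vectors)
```

Edges are keyed by the alphabetically sorted tag pair so that each undirected edge appears exactly once.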
Converting the relationships between entities in the knowledge graph into association weights $w_{graph}$ and generating a knowledge network $G_k = \langle V_k, E_k \rangle$, where $V_k$ is the vertex set of the knowledge network, that is, the set of all tag entities in the knowledge graph, and $E_k$ is the edge set of the knowledge network, that is, the set of association weights $w_{graph}$ between different tag entities;
Merging the knowledge network $G_k = \langle V_k, E_k \rangle$ and the tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$, taking the nodes of the tag association network as the basis, to generate the tag knowledge network $G = \langle V, E \rangle$, where $V$ is the vertex set of the tag knowledge network and is identical to the vertex set $V_{tag}$ of the tag association network, that is, $V = V_{tag}$; $E$ is the edge set of the tag knowledge network, which is the union of the edge set of the tag association network and the subset $E'$ of the edge set of the knowledge network, where $E'$ is the set of edges formed by all tag entities in the knowledge network that are contained in $V_{tag}$; that is, $V = V_{tag}$, $E = E_{tag} + E'$, and the association weight in $E'$ is $w_e = w_{tag} + w_{graph}$;
All associations in $E'$ whose association weight $w_e$ is lower than $w_{threshold}$ are removed to obtain $E_{cut}$; where $w_{threshold}$ is the association weight threshold.
Further, in the aforementioned search method using the tag knowledge network, generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network includes:
determining the tag vector $T$ of each tag in the item tag set according to the item tag set and the tag knowledge network;
determining the item feature vector $I$ of each item according to the tag vectors of the tags included in that item, as follows:
where $N$ is the number of tags the item contains and $T_i$ is the tag vector of the i-th tag.
Further, in the aforementioned search method using the tag knowledge network, the dimension of the tag vector $T$ is the number of edges in $E_{cut}$, where the component for each edge directly connected to the tag's node takes the value $w_e$ of that edge, and all other components are 0.
Further, in the aforementioned search method using the tag knowledge network, generating the user feature vector of the user according to the user tag set and the tag knowledge network includes:
calculating the user feature vector $U$ according to the user tag set and the tag knowledge network, as follows:
where $K$ is the number of tags the user likes, $W_i$ is the degree of the user's preference for the i-th tag, and $T_i$ is the tag vector of the i-th tag.
Further, in the aforementioned search method using the tag knowledge network, retrieving related items or related users through the item feature vector to be retrieved or the user feature vector to be retrieved includes:
calculating a first cosine value between the item feature vector to be retrieved and the second item feature vector of each recalled item; or
calculating a second cosine value between the user feature vector $U$ to be retrieved and the second user feature vector of each recalled user;
determining, according to the first cosine value and the second cosine value, several related items or related users that meet a similarity threshold requirement.
In order to achieve the above objective, according to another aspect of the present application, a search device using a tag knowledge network is provided.
The search device using the tag knowledge network according to the present application includes:
a tag construction module, configured to acquire a plurality of recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
a user modeling module, configured to determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
a tag knowledge network construction module, configured to construct a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, where the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges;
a user and item feature construction module, configured to generate the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
a vector search module, configured to retrieve related items or related users through the item feature vector to be retrieved or the user feature vector to be retrieved.
In the embodiments of the present application, a search method and device using a tag knowledge network are adopted. The method includes: acquiring a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags; determining the user's historical behavior data for different items, analyzing the user tags preferred by the user according to the historical behavior data, and determining a user tag set composed of all the user tags; constructing a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, where the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges; generating the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network; and retrieving related items or related users through the item feature vector or the user feature vector, respectively. The tag knowledge network is thus introduced on top of a content-based recall algorithm: users and items are represented as vectors of relationships in the tag network, and the recall strategy is designed around vector search. In addition, the dimension explosion caused by correlations between tags can be addressed by pruning the tag knowledge network. This achieves the technical effect of preserving the relevance of the results recalled by the content algorithm while effectively overcoming the semantic limitations and poor extensibility of tag-based recall results.
The drawings, which constitute a part of the present application, are provided for further understanding of the application, so that other features, purposes, and advantages of the application become more apparent. The drawings of the illustrative embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic flowchart of a search method using a tag knowledge network according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a tag knowledge network constructed by the method in an embodiment of the present application;
Fig. 3 is a schematic diagram of the connection structure of the functional modules of a search device using a tag knowledge network according to an embodiment of the present application; and
Fig. 4 is a system flowchart of searching with the search device using the tag knowledge network shown in Fig. 3.
In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In this application, the orientations or positional relationships indicated by the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse", "longitudinal", and the like are based on the orientations or positional relationships shown in the drawings. These terms are used mainly to better describe the application and its embodiments, and are not intended to require that the indicated device, element, or component have a specific orientation or be constructed and operated in a specific orientation.
Moreover, in addition to indicating orientation or positional relationships, some of the above terms may carry other meanings; for example, the term "upper" may in some cases also indicate a dependency or connection relationship. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific circumstances.
In addition, the terms "installed", "arranged", "provided with", "connected", "coupled", and "sleeved" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral construction; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or an internal communication between two devices, elements, or components. Those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to the specific circumstances.
It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other where no conflict arises. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
In order to achieve the above objective, according to one aspect of the present application, a search method using a tag knowledge network is provided. As shown in Fig. 1, the method includes the following steps S1 to S5:
S1. Acquire a plurality of recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
Specifically, the recommended items may be articles, commodities, and the like. Articles and commodities on the Internet generally describe their functions, attributes, or content in text; therefore, the text information related to each recommended item can be obtained when the recommended items are acquired. Performing tag extraction on the text information yields tags that represent part of the item's characteristics. For example, in online shopping, entering a few keywords matches products with the corresponding characteristics, and one product often has multiple characteristics;
S2. Determine the user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
Specifically, the user's historical behavior data for different items is acquired so that the user can be analyzed over a large amount of historical behavior and the preferred tags determined. For example, when the browsing records acquired for a user include "Journey to the West: the master and his three disciples travel west for the scriptures", "Monkey King wreaks havoc in the Heavenly Palace", and "Monkey King thrice defeats the White Bone Demon", the tag they share (the tag the user prefers) can be judged to be Monkey King; when the browsing records include "Zhu Bajie takes a wife" and "The past and present lives of Zhu Bajie", the shared (preferred) tag can be judged to be Zhu Bajie; and when the same user has browsed all of the above, the user's user tag set is determined to include Monkey King and Zhu Bajie;
S3. Construct a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, where the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges;
Specifically, the tags in the item tag set can be connected by edges weighted with degrees of association, so that the relationships between tags are shown intuitively. The degree of association characterizes the strength of the relationship between different tags in the same item; two strongly related tags are connected by an edge carrying that degree of association. Generally, once the user's preferred tag in an item has been determined, the other tags in that item are associated with the preferred tag through their degrees of association, so the connections between different tags are expressed more clearly;
S4. Generate the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
Specifically, the item tag set is obtained not merely to learn which characteristics an item has, but to determine which of the item's tags carry greater weight; the item feature vector is therefore derived from the item tag set and the tag knowledge network. Likewise, the user tag set is obtained not merely to learn which tags appear in the user's browsing history, but to determine which tags the user likes most: the more a tag is liked, the greater its weight; the user feature vector is therefore derived from the user tag set and the tag knowledge network. A user's preference for a tag is meaningful for recommendation only when that tag carries substantial weight in the item: if a user is recommended an item in which the preferred tag carries very little weight, the item fits the user poorly and the user experience inevitably suffers;
S5. Obtain the item feature vector of a first item to be retrieved or the user feature vector of a first user to be retrieved, and retrieve related items or related users through the item feature vector of the first item or the user feature vector of the first user;
Specifically, given the known item feature vector of the first item and the known user feature vector of the first user, this step recalls or retrieves related items similar to the first item, related users matching the first item, related users similar to the first user, or items matching the first user. Comprehensive matching rules can thus be provided, so that products meeting each user's preferences, and even other users with the same preferences, can be found.
In some embodiments, in the aforementioned search method using the tag knowledge network, acquiring a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags includes:
determining the text of each recommended item, where the text includes a title and a description;
segmenting the text into words to obtain a plurality of words;
determining the features of each word, where the features include part of speech, frequency of occurrence, and whether the word is a useless word;
scoring each word according to its features, and keeping the words that meet a first score threshold requirement as the item tags of the corresponding recommended item;
determining all the item tags of each recommended item, and obtaining the item tag set.
Specifically, this embodiment performs tag extraction on the text information of the recommended items, which is an indispensable part of content-based recall algorithms. First, the title, description, and other text of an item are segmented into Chinese words; each word is then given a comprehensive score according to features such as its part of speech, frequency of occurrence, and whether it is a useless word (scoring can use various preset thresholds or decision methods, which are not detailed here), and the words with higher scores are kept as the tags of the item to be recommended. Table 1 gives an example of an item tag set (descriptions are too long to list; every word in a tag appears in the title or description).
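The scoring step described above can be illustrated with a minimal Python sketch. The source does not specify a scoring formula, so everything concrete here is an assumption made for illustration: the score is taken to be frequency multiplied by a part-of-speech weight, the `POS_WEIGHT` table and `STOP_WORDS` set are hypothetical, and the input is assumed to be already segmented into (word, part of speech) pairs by some word segmenter.

```python
from collections import Counter

# Hypothetical part-of-speech weights and useless-word list (not from the source).
POS_WEIGHT = {"noun": 1.0, "verb": 0.5, "adj": 0.3}
STOP_WORDS = {"the", "of", "in"}


def extract_tags(tokens, score_threshold):
    """Score segmented words and keep those meeting the first score threshold.

    tokens: list of (word, part_of_speech) pairs produced by a word segmenter.
    """
    freq = Counter(word for word, _ in tokens)
    pos_of = {word: pos for word, pos in tokens}
    scores = {}
    for word, count in freq.items():
        if word in STOP_WORDS:
            continue  # useless words never become tags
        # Assumed score: frequency weighted by part of speech.
        scores[word] = count * POS_WEIGHT.get(pos_of[word], 0.0)
    return sorted(w for w, s in scores.items() if s >= score_threshold)


tokens = [("Monkey King", "noun"), ("havoc", "verb"),
          ("Heavenly Palace", "noun"), ("the", "article"),
          ("Monkey King", "noun")]
tags = extract_tags(tokens, score_threshold=1.0)
```

With these illustrative weights, the two nouns reach the threshold and the verb does not; a real system would tune the weights and threshold to its own corpus.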
In some embodiments, in the aforementioned search method using the tag knowledge network, analyzing the user tags preferred by the user according to the historical behavior data includes:
determining the corresponding items according to the historical behavior data;
Specifically, the historical behavior data may be the user's browsing or purchase records, and the corresponding items may be the products or articles in those records;
determining the second item tags corresponding to each corresponding item;
Specifically, the title, description, and other text of the corresponding item can be segmented into Chinese words, each word given a comprehensive score according to features such as its part of speech, frequency of occurrence, and whether it is a useless word, and the words with higher scores kept as the second item tags of that item;
performing a weighted merge of all the second item tags, and determining the score of each second item tag after the weighted merge;
taking the merged second item tags whose scores meet a second score threshold requirement as the user tags preferred by the user. Specifically, the second score threshold may be set according to the specific scenario and the tag screening requirements.
In some embodiments, in the aforementioned search method using the tag knowledge network, the score of each second item tag after the weighted merge is determined as follows:
where $N$ is the number of items the user has clicked, $InItem(tag)$ indicates whether a clicked item contains the item tag (1 if it does, 0 if it does not), $t_{cur}$ is the current timestamp, and $t_{ck}$ is the timestamp at which the user clicked the corresponding item.
Specifically, the scores of the second item tags calculated in this way pick out the tags the user prefers, so that the items the user prefers can ultimately be matched.
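The scoring formula itself is not reproduced in this text; only its symbols ($N$, $InItem(tag)$, $t_{cur}$, $t_{ck}$) are defined. The sketch below therefore shows one plausible reading, not the applicant's formula: each click that contains the tag contributes a weight that decays with the age of the click. The decay function `1 / (1 + age_in_days)` and the names `tag_scores` and `preferred_tags` are assumptions chosen only for illustration.

```python
SECONDS_PER_DAY = 86400.0


def tag_scores(clicks, t_cur):
    """Sum a time-decayed contribution for every clicked item containing a tag.

    clicks: list of (item_tags, t_ck) pairs, one per clicked item (N in total).
    t_cur:  current timestamp in seconds.
    """
    scores = {}
    for item_tags, t_ck in clicks:
        age_days = (t_cur - t_ck) / SECONDS_PER_DAY
        decay = 1.0 / (1.0 + age_days)  # assumed decay form, not from the source
        for tag in set(item_tags):      # InItem(tag) = 1 exactly for these tags
            scores[tag] = scores.get(tag, 0.0) + decay
    return scores


def preferred_tags(scores, second_score_threshold):
    """Keep the tags whose merged score meets the second score threshold."""
    return {tag for tag, s in scores.items() if s >= second_score_threshold}


t_cur = 10 * SECONDS_PER_DAY
clicks = [
    (["Monkey King", "Heavenly Palace"], t_cur),                # clicked just now
    (["Monkey King"], t_cur - 1 * SECONDS_PER_DAY),             # one day ago
    (["Zhu Bajie"], t_cur - 3 * SECONDS_PER_DAY),               # three days ago
]
scores = tag_scores(clicks, t_cur)
user_tags = preferred_tags(scores, second_score_threshold=1.0)
```

Under this assumed decay, recent and repeated clicks dominate: Monkey King accumulates 1.0 + 0.5, while the single three-day-old click on Zhu Bajie contributes only 0.25 and falls below the threshold.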
In some embodiments, in the aforementioned search method using the tag knowledge network, constructing the tag knowledge network from the item tag set, the knowledge graph, and the word2vec model includes:
treating the item tag set as a corpus, and using the word2vec model to generate a vector for each tag;
calculating the similarity $w_{tag}$ between different tags as the cosine similarity of their tag vectors, and generating a tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$; where $V_{tag}$ is the vertex set of the tag association network, that is, the set of all tags, and $E_{tag}$ is the edge set of the tag association network, that is, the set of similarities $w_{tag}$ between different tags;
converting the relationships between entities in the knowledge graph into association weights $w_{graph}$ to generate a knowledge network $G_k = \langle V_k, E_k \rangle$; where $V_k$ is the vertex set of the knowledge network, that is, the set of all tag entities in the graph, and $E_k$ is the edge set of the knowledge network, that is, the set of association weights $w_{graph}$ between different tag entities;
merging the knowledge network $G_k = \langle V_k, E_k \rangle$ and the tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$ on the basis of the nodes of the tag association network to generate the tag knowledge network $G = \langle V, E \rangle$; where $V$ is the vertex set of the tag knowledge network, which is identical to the vertex set of the tag association network, that is, $V = V_{tag}$; $E$ is the edge set of the tag knowledge network, which is the union of the edge set of the tag association network and an edge subset $E'$ of the knowledge network, where $E'$ is the set of edges formed in the knowledge network by the tag entities that belong to $V_{tag}$, that is, $E = E_{tag} + E'$; and the association weight of an edge in $E'$ is $w_e = w_{tag} + w_{graph}$;
removing all associations in $E'$ whose association weight $w_e$ is lower than $w_{threshold}$ (that is, pruning the edge set) to obtain $E_{cut}$; where $w_{threshold}$ is the association weight threshold;
Specifically, a tag knowledge network constructed in this way expresses the degree of association between the tags; for example, the tag knowledge network shown in Fig. 2 can be constructed from the item tag set in Table 1.
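The construction above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the tag embeddings are passed in as plain lists (in practice they would come from a word2vec model), the knowledge-graph weights are given as a dictionary, an edge that appears only in $E_{tag}$ keeps $w_e = w_{tag}$, and pruning is applied to every edge of the merged edge set. The names `build_tag_knowledge_network`, `tag_vectors`, and `graph_weights` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_tag_knowledge_network(tag_vectors, graph_weights, w_threshold):
    """Return E_cut as {frozenset({tag_a, tag_b}): w_e}.

    tag_vectors:   {tag: embedding}; the keys form the vertex set V = V_tag.
    graph_weights: {frozenset({a, b}): w_graph} taken from the knowledge graph.
    """
    tags = sorted(tag_vectors)
    edges = {}
    # E_tag: cosine similarity w_tag between every pair of tags.
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            edges[frozenset((a, b))] = cosine(tag_vectors[a], tag_vectors[b])
    # E': knowledge-graph edges whose endpoints both lie in V_tag.
    # For those edges, w_e = w_tag + w_graph; other graph edges are ignored.
    for pair, w_graph in graph_weights.items():
        if pair in edges:
            edges[pair] += w_graph
    # Pruning: drop every edge whose weight is below the threshold.
    return {pair: w for pair, w in edges.items() if w >= w_threshold}


tag_vectors = {
    "Monkey King": [1.0, 0.0],
    "Zhu Bajie": [0.8, 0.6],
    "Heavenly Palace": [0.0, 1.0],
}
graph_weights = {
    frozenset(("Monkey King", "Heavenly Palace")): 0.9,
    frozenset(("Monkey King", "Tang Sanzang")): 1.0,  # endpoint outside V_tag
}
e_cut = build_tag_knowledge_network(tag_vectors, graph_weights, w_threshold=0.7)
```

In this toy input, the Monkey King and Heavenly Palace embeddings are orthogonal, so their edge survives pruning only because the knowledge graph contributes 0.9; this is the sense in which the knowledge graph extends purely embedding-based tag similarity.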
In some embodiments, in the aforementioned search method using the tag knowledge network, generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network includes:
determining the tag vector $T$ of each tag in the item tag set according to the item tag set and the tag knowledge network;
Preferably, the vector of each tag is $T$, whose dimension is the number of edges in $E_{cut}$; the component for each edge directly connected to the tag's node takes the value $w_e$ of that edge, and all other components are 0. For the network shown in Fig. 2, the tag vector of the tag Monkey King is $T = [w_{e1}, w_{e2}, w_{e3}, w_{e4}, w_{e5}, w_{e6}, w_{e7}, w_{e8}]$, where $w_{e2} = w_{e3} = w_{e4} = 0$;
determining the item feature vector $I$ of each item according to the tag vectors of the tags included in that item, as follows:
where $N$ is the number of tags the item contains and $T_i$ is the tag vector of the i-th tag.
This method computes the item feature vector of each item simply and quickly, and at the same time characterizes how strongly each tag in the item is preferred by users.
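The tag vector and item feature vector can be sketched as follows. The tag vector follows the description directly (one component per edge of $E_{cut}$, equal to $w_e$ when the edge touches the tag and 0 otherwise). The formula for $I$ is not reproduced in this text, so the component-wise sum of the item's $N$ tag vectors used below is an assumption; an average would be an equally plausible reading. The edge weights and the helper names `tag_vector`, `item_vector`, and `edge_order` are hypothetical.

```python
def tag_vector(tag, edge_order, e_cut):
    """One component per edge in E_cut: w_e if the edge touches `tag`, else 0."""
    return [e_cut[edge] if tag in edge else 0.0 for edge in edge_order]


def item_vector(item_tags, edge_order, e_cut):
    """Assumed aggregation: component-wise sum of the item's N tag vectors."""
    vec = [0.0] * len(edge_order)
    for tag in item_tags:
        for j, value in enumerate(tag_vector(tag, edge_order, e_cut)):
            vec[j] += value
    return vec


# Illustrative pruned edge set E_cut with made-up weights.
e_cut = {
    frozenset(("Monkey King", "Zhu Bajie")): 0.8,
    frozenset(("Monkey King", "Heavenly Palace")): 0.9,
    frozenset(("Zhu Bajie", "Tang Sanzang")): 0.7,
}
# A fixed edge order shared by all vectors, so components are comparable.
edge_order = sorted(e_cut, key=sorted)

t_monkey = tag_vector("Monkey King", edge_order, e_cut)
i_item = item_vector(["Monkey King", "Zhu Bajie"], edge_order, e_cut)
```

Fixing `edge_order` once matters: every tag, item, and user vector must index the edges of $E_{cut}$ in the same order, otherwise cosine comparisons between them are meaningless.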
In some embodiments, in the aforementioned search method using the tag knowledge network, generating the user feature vector of the user according to the user tag set and the tag knowledge network includes:
calculating the user feature vector $U$ according to the user tag set and the tag knowledge network, as follows:
where $K$ is the number of tags the user likes, $W_i$ is the degree of the user's preference for the i-th tag, and $T_i$ is the tag vector of the i-th tag.
This method computes the user feature vector $U$ of each user simply and quickly, and at the same time captures how much the user likes each preferred tag, so that the information contained in $U$ is more comprehensive and accurate.
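A minimal sketch of the user feature vector follows. The formula for $U$ is not reproduced in this text; given the symbols $K$, $W_i$, and $T_i$, the sketch assumes a preference-weighted sum of the $K$ tag vectors. That form, the function name `user_vector`, and the sample numbers are assumptions for illustration only.

```python
def user_vector(preferences, tag_vectors):
    """Assumed form: U = sum over the K liked tags of W_i * T_i.

    preferences: {tag: W_i}, the user's degree of preference per liked tag.
    tag_vectors: {tag: T_i}, tag vectors that all share one edge order.
    """
    dim = len(next(iter(tag_vectors.values())))
    vec = [0.0] * dim
    for tag, weight in preferences.items():
        for j, value in enumerate(tag_vectors[tag]):
            vec[j] += weight * value
    return vec


tag_vectors = {
    "Monkey King": [0.9, 0.8, 0.0],
    "Zhu Bajie": [0.0, 0.8, 0.7],
}
u = user_vector({"Monkey King": 1.5, "Zhu Bajie": 0.25}, tag_vectors)
```

Because $U$ is built from the same tag vectors as the item feature vectors, users and items live in one vector space, which is what allows a user vector to be compared directly with item vectors during recall.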
In some embodiments, in the aforementioned search method using the tag knowledge network, retrieving related items or related users through the item feature vector of the first item or the user feature vector of the first user, respectively, includes:
calculating a first cosine value between the item feature vector and the second item feature vector of each recalled item, where the recalled items are items in a database or on the Internet that are matched for similarity against the item to be retrieved;
calculating a second cosine value between the user feature vector $U$ and the second user feature vector of each recalled user, where the recalled users are users in a database or on the Internet who are matched for similarity against the user to be retrieved;
determining, according to the first cosine value and the second cosine value respectively, several related items or related users that meet a similarity threshold requirement.
The method in this embodiment supports the following kinds of recall (retrieval):
a) an item recalls related items, based on the similarity between items;
b) a user recalls related users, based on the similarity between users;
c) a user recalls related items, based on the similarity between the user and items.
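The three recall modes share one operation: rank candidates by cosine similarity to a query vector and keep those above a threshold. The sketch below shows mode c), a user recalling items, as a brute-force scan; the function name `retrieve`, the candidate vectors, and the threshold value are hypothetical, and a production system would likely use an approximate nearest-neighbor index rather than scanning every candidate.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, candidates, similarity_threshold):
    """Return (name, cosine) pairs meeting the threshold, most similar first."""
    hits = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    hits = [(name, score) for name, score in hits if score >= similarity_threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Recalled items and a user, all expressed over the same E_cut edge order.
items = {
    "item_a": [0.9, 1.6, 0.7],
    "item_b": [0.0, 0.0, 0.7],
    "item_c": [0.9, 0.8, 0.0],
}
user = [1.35, 1.4, 0.175]
related = retrieve(user, items, similarity_threshold=0.8)
```

Passing an item vector as `query` gives mode a), and passing a user vector with a dictionary of user vectors as `candidates` gives mode b); no other change is needed because all vectors share the same dimensions.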
需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although the logical sequence is shown in the flowchart, in some cases, The steps shown or described can be performed in a different order than here.
According to an embodiment of the present invention, a search device applying a tag knowledge network is also provided for implementing the above search method applying a tag knowledge network. As shown in FIG. 3, the device includes:
Tag construction module 1, configured to obtain multiple recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
User modeling module 2, configured to determine the user's historical behavior data for different items, analyze the user tags preferred by the user from the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
Tag knowledge network construction module 3, configured to construct a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, where the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges;
User and item feature construction module 4, configured to generate the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
Vector search module 5, configured to retrieve related items or related users by means of the item feature vector or the user feature vector, respectively.
Specifically, for the process by which each module of the device in this embodiment implements its function, refer to the related description in the method embodiment; it is not repeated here.
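As a rough illustration of what tag construction module 1 and user modeling module 2 compute, the sketch below scores words to pick item tags and then scores tags from a user's click history. It is a sketch under stated assumptions, not the method of the patent: the text names the word features (part of speech, frequency, whether the word is useless) and the ingredients of the user tag score (the indicator InItem(tag), the current timestamp t_cur, and the click timestamp t_ck), but the scoring formulas themselves are not given here. The part-of-speech weights, the `1 / (1 + age in days)` decay, and all names below are hypothetical. Word segmentation is also assumed to have been done already.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Word:
    text: str
    pos: str            # part of speech, e.g. "n" for noun, "v" for verb
    frequency: int      # occurrences in the item's title and description
    is_useless: bool    # True for useless (stop) words


# ASSUMPTION: illustrative weights per part of speech; not from the source.
POS_WEIGHT = {"n": 1.0, "v": 0.5}


def word_score(word: Word) -> float:
    """Hypothetical score built from the three features named in the text."""
    if word.is_useless:
        return 0.0
    return POS_WEIGHT.get(word.pos, 0.1) * word.frequency


def extract_item_tags(words: List[Word], first_threshold: float) -> Set[str]:
    """Keep the words whose score meets the first score threshold as item tags."""
    return {w.text for w in words if word_score(w) >= first_threshold}


def user_tag_scores(clicks: List[Tuple[Set[str], float]], t_cur: float) -> Dict[str, float]:
    """Weighted merge of the tags of clicked items.

    clicks holds (tags of the clicked item, click timestamp t_ck in seconds).
    ASSUMPTION: each click contributes 1 / (1 + age in days); the real formula
    is not reproduced in the text.
    """
    scores: Dict[str, float] = {}
    for item_tags, t_ck in clicks:
        age_days = max(t_cur - t_ck, 0.0) / 86400.0
        decay = 1.0 / (1.0 + age_days)
        for tag in item_tags:  # InItem(tag) is 1 exactly for these tags
            scores[tag] = scores.get(tag, 0.0) + decay
    return scores


def select_user_tags(scores: Dict[str, float], second_threshold: float) -> Dict[str, float]:
    """Keep the tags whose merged score meets the second score threshold."""
    return {tag: s for tag, s in scores.items() if s >= second_threshold}


# Hypothetical toy data.
words = [Word("camera", "n", 3, False), Word("the", "x", 10, True), Word("buy", "v", 1, False)]
item_tags = extract_item_tags(words, first_threshold=1.0)

day = 86400.0
clicks = [({"camera", "lens"}, 10 * day), ({"camera"}, 9 * day), ({"tripod"}, 0.0)]
preferred = select_user_tags(user_tag_scores(clicks, t_cur=10 * day), second_threshold=1.0)
```

The retained scores can serve as the per-tag preference weights when the user feature vector is built, since recent and repeated clicks raise a tag's score.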
FIG. 4 is a system flowchart of searching with the search device applying a tag knowledge network shown in FIG. 3.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.
Claims (10)
- A search method applying a tag knowledge network, characterized by comprising:
  - obtaining a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags;
  - determining a user's historical behavior data for different items, analyzing the user tags preferred by the user according to the historical behavior data, and determining a user tag set composed of all the user tags preferred by the user;
  - constructing a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges;
  - generating an item feature vector of each recommended item and a user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
  - retrieving related items or related users by means of the feature vector of an item to be retrieved or the feature vector of a user to be retrieved.
- The search method applying a tag knowledge network according to claim 1, wherein obtaining a plurality of recommended items, performing tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determining an item tag set composed of all the item tags comprises:
  - determining the text of each recommended item, wherein the text includes a title and a description;
  - segmenting the text into a plurality of words;
  - determining the features of each word, wherein the features include part of speech, frequency of occurrence, and whether the word is a useless word (stop word);
  - scoring each word according to its features, and retaining the words that meet a first score threshold requirement as item tags of the corresponding recommended item;
  - determining all the item tags of each recommended item to obtain the item tag set.
- The search method applying a tag knowledge network according to claim 1, wherein analyzing the user tags preferred by the user according to the historical behavior data comprises:
  - determining the corresponding items according to the historical behavior data;
  - determining the second item tags corresponding to each of the corresponding items;
  - performing a weighted merge of all the second item tags, and determining the score of each weighted and merged second item tag;
  - taking the weighted and merged second item tags whose scores meet a second score threshold requirement as the user tags preferred by the user.
- The search method applying a tag knowledge network according to claim 3, wherein the score of each weighted and merged second item tag is determined as follows (the formula is not reproduced in this text), where N is the number of items the user has clicked; InItem(tag) indicates whether a clicked item contains the item tag `tag`, returning 1 if it does and 0 if it does not; $t_{cur}$ is the current timestamp; and $t_{ck}$ is the timestamp at which the user clicked the corresponding item.
- The search method applying a tag knowledge network according to claim 1, wherein constructing the tag knowledge network from the item tag set, the knowledge graph, and the word2vec model comprises:
  - taking the item tag set as a corpus and using the word2vec model to generate a vector for each tag;
  - computing the similarity $w_{tag}$ between different tags as the cosine similarity of their vectors, and generating a tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$, where $V_{tag}$ is the vertex set of the tag association network, i.e., the set of all tags, and $E_{tag}$ is the edge set of the tag association network, i.e., the set of similarities $w_{tag}$ between different tags;
  - converting the relationships between entities in the knowledge graph into association weights $w_{graph}$, and generating a knowledge network $G_k = \langle V_k, E_k \rangle$, where $V_k$ is the vertex set of the knowledge network, i.e., the set of all tag entities in the knowledge graph, and $E_k$ is the edge set of the knowledge network, i.e., the set of association weights $w_{graph}$ between different tag entities;
  - merging the knowledge network $G_k = \langle V_k, E_k \rangle$ and the tag association network $G_{tag} = \langle V_{tag}, E_{tag} \rangle$ on the basis of the nodes of the tag association network to generate the tag knowledge network $G = \langle V, E \rangle$, where $V$ is the vertex set of the tag knowledge network and is identical to the vertex set $V_{tag}$ of the tag association network, i.e., $V = V_{tag}$; $E$ is the edge set of the tag knowledge network, formed by combining the edge set of the tag association network with a subset $E'$ of the edge set of the knowledge network; $E'$ is the set of edges formed by all tag entities in the knowledge network that contain tags of $V_{tag}$; that is, $V = V_{tag}$, $E = E_{tag} + E'$, and the association weight in $E'$ is $w_e = w_{tag} + w_{graph}$.
- The search method applying a tag knowledge network according to claim 5, wherein generating the item feature vector of the recommended item according to the item tag set and the tag knowledge network comprises:
  - determining the tag vector $T$ of each tag in the item tag set according to the item tag set and the tag knowledge network;
  - determining the item feature vector $I$ of each item from the tag vectors of the tags it contains (the formula is not reproduced in this text), where $N$ is the number of tags the item contains and $T_i$ is the tag vector of the $i$-th tag.
- The search method applying a tag knowledge network according to claim 6, wherein the dimension of the tag vector $T$ is the number of edges in $E_{cut}$; the components for edges directly connected to the tag's node take the value $w_e$, and all other components are 0.
- The search method applying a tag knowledge network according to claim 1, wherein generating the user feature vector of the user according to the user tag set and the tag knowledge network comprises:
  - calculating the user feature vector $U$ according to the user tag set and the tag knowledge network (the formula is not reproduced in this text), where $K$ is the number of tags the user likes, $W_i$ is the degree to which the user likes the $i$-th tag, and $T_i$ is the tag vector of the $i$-th tag.
- The search method applying a tag knowledge network according to claim 1, wherein retrieving related items or related users by means of the feature vector of an item to be retrieved or the feature vector of a user to be retrieved comprises:
  - calculating the first cosine value between the feature vector of the item to be retrieved and the second item feature vector of each recalled item; or
  - calculating the second cosine value between the feature vector U of the user to be retrieved and the second user feature vector of each recalled user;
  - determining, according to the first cosine values and the second cosine values, a number of related items or related users that meet the similarity threshold requirement.
- A search device applying a tag knowledge network, characterized by comprising:
  - a tag construction module, configured to obtain a plurality of recommended items, perform tag extraction on the text information related to each recommended item to obtain one or more corresponding item tags, and determine an item tag set composed of all the item tags;
  - a user modeling module, configured to determine a user's historical behavior data for different items, analyze the user tags preferred by the user according to the historical behavior data, and determine a user tag set composed of all the user tags preferred by the user;
  - a tag knowledge network construction module, configured to construct a tag knowledge network from the item tag set, a knowledge graph, and a word2vec model, wherein the tag knowledge network is a network with tags as nodes and the degrees of association between tags as edges;
  - a user and item feature construction module, configured to generate the item feature vector of each recommended item and the user feature vector of the user according to the item tag set, the user tag set, and the tag knowledge network;
  - a vector search module, configured to retrieve related items or related users by means of the feature vector of an item to be retrieved or the feature vector of a user to be retrieved.
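The network merge and the vector constructions recited in the claims can be illustrated with a small sketch. It is offered as one possible reading, not as the claimed implementation, and it rests on several assumptions that are not in the text: the word2vec embeddings are taken as already computed and passed in as plain lists; $E'$ is read as the knowledge-network edges whose two endpoints are both tags in $V_{tag}$; the full merged edge set stands in for $E_{cut}$, which is not defined in this section; the item vector is taken as the mean of its tag vectors and the user vector as the preference-weighted sum of tag vectors, because the formulas for $I$ and $U$ are not reproduced. All function names and toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]  # an undirected edge between two tags


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tag_association_edges(embeddings: Dict[str, List[float]]) -> Dict[Edge, float]:
    """E_tag: one edge per pair of tags, weighted by the cosine similarity w_tag."""
    return {
        frozenset((a, b)): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


def merge_networks(e_tag: Dict[Edge, float], e_k: Dict[Edge, float], v_tag: Set[str]) -> Dict[Edge, float]:
    """E = E_tag + E'. ASSUMED reading: E' keeps knowledge edges with both endpoints
    in V_tag, and on those edges the weight becomes w_e = w_tag + w_graph."""
    merged = dict(e_tag)
    for edge, w_graph in e_k.items():
        if edge.issubset(v_tag):
            merged[edge] = merged.get(edge, 0.0) + w_graph
    return merged


def tag_vector(tag: str, edges: List[Edge], weights: Dict[Edge, float]) -> List[float]:
    """One component per edge: w_e if the edge touches the tag's node, else 0."""
    return [weights[edge] if tag in edge else 0.0 for edge in edges]


def item_vector(tags: Iterable[str], edges: List[Edge], weights: Dict[Edge, float]) -> List[float]:
    """ASSUMPTION: mean of the item's tag vectors."""
    vectors = [tag_vector(t, edges, weights) for t in tags]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def user_vector(preferences: Dict[str, float], edges: List[Edge], weights: Dict[Edge, float]) -> List[float]:
    """ASSUMPTION: sum of W_i * T_i over the tags the user likes."""
    total = [0.0] * len(edges)
    for tag, w_i in preferences.items():
        for j, value in enumerate(tag_vector(tag, edges, weights)):
            total[j] += w_i * value
    return total


# Hypothetical toy data: precomputed tag embeddings and two knowledge-graph edges.
embeddings = {"camera": [1.0, 0.0], "lens": [1.0, 0.0], "tripod": [0.0, 1.0]}
e_k = {frozenset(("camera", "tripod")): 0.5, frozenset(("camera", "battery")): 0.9}

weights = merge_networks(tag_association_edges(embeddings), e_k, set(embeddings))
edges = sorted(weights, key=sorted)  # fixed edge order defines the vector dimensions

camera_t = tag_vector("camera", edges, weights)
item_i = item_vector(["camera", "lens"], edges, weights)
user_u = user_vector({"camera": 2.0, "tripod": 1.0}, edges, weights)
```

The edge to "battery" is dropped because "battery" is not in $V_{tag}$, which keeps the vertex set equal to $V_{tag}$ as the claim requires. Whether the item vector is a mean or a sum does not affect cosine-based retrieval, since cosine is unchanged by positive scaling.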
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910529138.3A CN110059271B (en) | 2019-06-19 | 2019-06-19 | Searching method and device applying tag knowledge network |
CN201910529138.3 | 2019-06-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020253591A1 (en) | 2020-12-24 |
Family
ID=67325752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/095370 WO2020253591A1 (en) | 2019-06-19 | 2020-06-10 | Search method and apparatus applying tag knowledge network |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110059271B (en) |
WO (1) | WO2020253591A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024065952A1 (en) * | 2022-09-30 | 2024-04-04 | 中国四维测绘技术有限公司 | Remote sensing satellite information recommendation method, system and device |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110059271B (en) * | 2019-06-19 | 2020-01-10 | 达而观信息科技(上海)有限公司 | Searching method and device applying tag knowledge network |
CN110941740B (en) * | 2019-11-08 | 2023-07-14 | 深圳市雅阅科技有限公司 | Video recommendation method and computer-readable storage medium |
CN111177410B (en) * | 2019-12-27 | 2021-01-12 | 浙江理工大学 | Knowledge graph storage and similarity retrieval method based on evolution R-tree |
CN111353300B (en) * | 2020-02-14 | 2023-09-01 | 中科天玑数据科技股份有限公司 | Data set construction and related information acquisition method and device |
CN111368141B (en) * | 2020-03-18 | 2023-06-02 | 腾讯科技(深圳)有限公司 | Video tag expansion method, device, computer equipment and storage medium |
CN111598644B (en) * | 2020-04-01 | 2023-05-02 | 华瑞新智科技(北京)有限公司 | Article recommendation method, device and medium |
CN112016003B (en) * | 2020-08-19 | 2022-07-12 | 重庆邮电大学 | Social crowd user tag mining and similar user recommending method based on CNN |
CN111932321B (en) * | 2020-09-23 | 2021-01-05 | 北京每日优鲜电子商务有限公司 | Method and device for pushing article information for user, electronic equipment and medium |
CN112206512B (en) * | 2020-10-28 | 2024-04-19 | 网易(杭州)网络有限公司 | Information processing method, device, electronic equipment and storage medium |
CN112256979B (en) * | 2020-12-24 | 2021-06-04 | 上海二三四五网络科技有限公司 | Control method and device for similar article recommendation |
CN112381627B (en) * | 2021-01-14 | 2021-05-07 | 北京育学园健康管理中心有限公司 | Commodity scoring processing recommendation method and device under child-care knowledge |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080243817A1 (en) * | 2007-03-30 | 2008-10-02 | Chan James D | Cluster-based management of collections of items |
CN103593792A (en) * | 2013-11-13 | 2014-02-19 | 复旦大学 | Individual recommendation method and system based on Chinese knowledge mapping |
CN104035917A (en) * | 2014-06-10 | 2014-09-10 | 复旦大学 | Knowledge graph management method and system based on semantic space mapping |
CN106959966A (en) * | 2016-01-12 | 2017-07-18 | 腾讯科技(深圳)有限公司 | A kind of information recommendation method and system |
CN108334558A (en) * | 2018-01-02 | 2018-07-27 | 南京师范大学 | A kind of collaborative filtering recommending method of combination tag and time factor |
CN110059271A (en) * | 2019-06-19 | 2019-07-26 | 达而观信息科技(上海)有限公司 | With the searching method and device of label knowledge network |
- 2019-06-19: CN application CN201910529138.3A, patent CN110059271B, status: Active
- 2020-06-10: WO application PCT/CN2020/095370, publication WO2020253591A1, status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110059271B (en) | 2020-01-10 |
CN110059271A (en) | 2019-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020253591A1 (en) | Search method and apparatus applying tag knowledge network | |
CN107748754B (en) | Knowledge graph perfecting method and device | |
CN110427563B (en) | Professional field system cold start recommendation method based on knowledge graph | |
KR102075833B1 (en) | Curation method and system for recommending of art contents | |
US7707162B2 (en) | Method and apparatus for classifying multimedia artifacts using ontology selection and semantic classification | |
Zhao et al. | Topical keyphrase extraction from twitter | |
CN106294425B (en) | Method and system for automatic graphic summarization of commodity-related web articles | |
WO2018014759A1 (en) | Method, device and system for presenting clustering data table | |
TWI557664B (en) | Product information publishing method and device | |
More | Attribute extraction from product titles in ecommerce | |
WO2016179938A1 (en) | Method and device for question recommendation | |
US20140201180A1 (en) | Intelligent Supplemental Search Engine Optimization | |
US20140143250A1 (en) | Centralized Tracking of User Interest Information from Distributed Information Sources | |
CN110134792B (en) | Text recognition method and device, electronic equipment and storage medium | |
CN104036038A (en) | News recommendation method and system | |
CN111309936A (en) | A method of constructing movie user portraits | |
CN112182145B (en) | Text similarity determination method, device, equipment and storage medium | |
CN108846097A (en) | The interest tags representation method of user, article recommended method and device, equipment | |
TWI645348B (en) | System and method for automatically summarizing images and comments within commodity-related web articles | |
CN104035955B (en) | searching method and device | |
CN106874397B (en) | An automatic semantic annotation method for IoT devices | |
CN104951478A (en) | Information processing method and information processing device | |
CN107908749A (en) | A kind of personage's searching system and method based on search engine | |
CN104077419B (en) | With reference to semantic method for reordering is retrieved with the long query image of visual information | |
JP2009015796A (en) | Apparatus and method for extracting multiplex topics in text, program, and recording medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20827110; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20827110; Country of ref document: EP; Kind code of ref document: A1 |