WO2019141143A1 - Method and apparatus for mining relationships between items and recommending items, computing device, and storage medium - Google Patents

Method and apparatus for mining relationships between items and recommending items, computing device, and storage medium

Info

Publication number
WO2019141143A1
Authority
WO
WIPO (PCT)
Prior art keywords
item
behavior
domain
user
items
Prior art date
Application number
PCT/CN2019/071570
Other languages
English (en)
French (fr)
Inventor
王智楠
肖文明
王骏
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019141143A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Definitions

  • The present disclosure relates to the field of information recommendation technology, and in particular to information recommendation in the case of a user cold start.
  • Information recommendation technology can alleviate the problem of information overload.
  • Current information recommendation technology mainly uses a user's personal information and historical behavior to match users with information through an algorithm, and presents information that the user may be interested in, thereby reducing the user's selection cost.
  • Information recommendation technology also allows information to be displayed more efficiently.
  • Information recommendation technology is widely used in daily life. For example, whether you read news on Toutiao, shop on Taobao, or look up movie information on Douban, you see information selected by personalized recommendation technology.
  • Information recommendation technology has even become a standard feature of e-commerce and content distribution applications. Many companies on the market offer personalized recommendation algorithms as a service, and some smaller applications can provide personalized recommendation to their users simply by connecting their data to such a service.
  • Information recommendation technology mainly uses user behavior, user tags, context, social network, and other data to predict a user's future behavior; among these, user behavior data is the most effective and the most common. Designing a personalized recommendation system without a large amount of user data is the cold start problem.
  • Item cold start: how to recommend a new item to users who might be interested in it.
  • System cold start: how to design a personalized recommendation system on a newly developed website or application, so that users can experience personalized recommendation as soon as the website or application is released.
  • The present disclosure mainly proposes a solution to the user cold start problem.
  • A method for mining relationships between items in different fields, comprising: acquiring first behavior information of a user for a first domain item and second behavior information of the user for a second domain item; and determining a degree of correlation between the first domain item and the second domain item based on the first behavior information and the second behavior information of a plurality of users.
  • The step of determining the degree of correlation between the first domain item and the second domain item may include: determining, based on the first behavior information of the plurality of users, a first behavioral feature distribution of the first domain item relative to the plurality of users; determining, based on the second behavior information of the plurality of users, a second behavioral feature distribution of the second domain item relative to the plurality of users; and determining the degree of correlation between the first domain item and the second domain item according to the degree of similarity between the first behavioral feature distribution and the second behavioral feature distribution.
  • The first behavioral feature distribution and/or the second behavioral feature distribution may include one or more of the following: whether the user performed an action on the item; the number of times the user acted on the item; the user's preference for the item.
  • The first behavior information and/or the second behavior information includes: whether the user performed an action on the item; and/or behavior data generated based on the action performed by the user on the item.
  • The behavior data may include one or more of the following: a behavior type; a number of behaviors; a behavior duration.
  • The first behavioral feature distribution comprises a first preference of each of the plurality of users for the first domain item, the second behavioral feature distribution comprises a second preference of each of the plurality of users for the second domain item, and the step of determining the degree of correlation comprises: establishing a first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item; and determining the degree of correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
  • The user's preference for the item is equal to the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the item, where each sub-preference is positively correlated with both the number of behaviors and the behavior weight.
  • The user's preference r for the item can be determined using the following formula,
  • $r = \sum_{t \in T} q_t \cdot W_t$
  • where T is the set of behavior types of the user for the item,
  • t is a behavior type,
  • $q_t$ is the number of behaviors under behavior type t, and
  • $W_t$ is the behavior weight corresponding to behavior type t.
  • the relationship mining method may further include: performing normalization processing on the first preference vector and the second preference vector, respectively.
  • A method for mining relationships between items in different fields, comprising: acquiring, for each of a plurality of users, first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; and determining, based on the first behavior data and the second behavior data of the plurality of users, the degree of correlation between each of at least some of the first domain items and each of at least some of the second domain items.
  • An item recommendation method, comprising: acquiring first behavior data of a user in a first field, the first behavior data relating to one or more first domain items; selecting a second domain item from at least one second domain item based on the degree of correlation between each of at least some of the one or more first domain items and each of the at least one second domain item; and recommending the selected second domain item to the user.
  • The degree of correlation between the first domain item and the second domain item may be obtained by the relationship mining method described in the first aspect or the second aspect of the present disclosure.
  • The step of selecting the second domain item from the at least one second domain item may include: calculating a recommendation degree for each second domain item; and selecting a predetermined number of top-ranked second domain items in descending order of recommendation degree.
  • The recommendation degree of a second domain item is positively correlated with the correlation between each of the at least one first domain item and that second domain item.
  • The recommendation degree of a second domain item is equal to the sum of the sub-recommendation degrees of the second domain item with respect to each of the at least one first domain item, where the sub-recommendation degree is positively correlated with both the correlation between the first domain item and the second domain item and the user's preference for the first domain item.
  • The recommendation degree of the second domain item can be calculated using the following formula,
  • $rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$
  • where $rec_{uj}$ represents the recommendation degree of the second domain item j for the user u,
  • I is the set of first domain items involved in the first behavior data of the user in the first field,
  • i is a first domain item,
  • $sim(i, j)$ represents the degree of correlation between the first domain item i and the second domain item j, and
  • $r_{ui}$ represents the preference of the user u for the first domain item i.
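  • The recommendation degree and the top-ranked selection described above can be sketched as follows. This is a minimal illustration, assuming each sub-recommendation degree is the product of the correlation and the preference; the function name `recommend`, the preference values, and the correlation values are hypothetical and are not taken from the disclosure.

```python
def recommend(user_prefs, sim, candidates, top_n):
    """Rank second domain items j by rec_uj = sum over i of sim(i, j) * r_ui.

    user_prefs: {first_domain_item: preference r_ui} for one user u
    sim:        {(first_domain_item, second_domain_item): correlation}
    candidates: second domain items to score
    """
    scores = {
        j: sum(sim.get((i, j), 0.0) * r_ui for i, r_ui in user_prefs.items())
        for j in candidates
    }
    # Sort by recommendation degree, largest first, and keep the top_n items.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical inputs, for illustration only.
user_prefs = {"video 1": 2.0, "video 4": 1.0}
sim = {
    ("video 1", "music 1"): 0.1,
    ("video 4", "music 1"): 0.8,
    ("video 1", "music 3"): 0.3,
}
top = recommend(user_prefs, sim, ["music 1", "music 2", "music 3"], top_n=2)
```

  • Missing pairs in `sim` are treated as zero correlation here, which is a design choice of the sketch rather than something the disclosure specifies.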
  • The user's preference for the first domain item is equal to the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the first domain item, where each sub-preference is positively correlated with both the number of behaviors and the behavior weight.
  • the first behavior data includes one or more of the following information of the behavior performed by the user on the first domain item: a behavior type; a behavior number; a behavior duration.
  • A device for mining relationships between items in different fields, comprising: a behavior information acquiring module, configured to acquire first behavior information of a user for a first domain item and second behavior information of the user for a second domain item; and a correlation determining module, configured to determine the degree of correlation between the first domain item and the second domain item based on the first behavior information and the second behavior information of a plurality of users.
  • The correlation determining module may include: a first behavioral feature distribution determining unit, configured to determine, based on the first behavior information of the plurality of users, a first behavioral feature distribution of the first domain item relative to the plurality of users; a second behavioral feature distribution determining unit, configured to determine, based on the second behavior information of the plurality of users, a second behavioral feature distribution of the second domain item relative to the plurality of users; and a correlation determining unit, configured to determine the degree of correlation between the first domain item and the second domain item according to the degree of similarity between the first behavioral feature distribution and the second behavioral feature distribution.
  • the first behavioral feature distribution and/or the second behavioral feature distribution may comprise one or more of the following: whether the user performed an action on the item; the number of times the user has acted on the item; the user's preference for the item.
  • the first behavior information and/or the second behavior information includes: whether the user performed the behavior on the item; and/or behavior data generated based on the behavior performed by the user on the item.
  • the behavior data may include one or more of the following: a behavior type; a behavior number; a behavior duration.
  • The first behavioral feature distribution comprises a first preference of each of the plurality of users for the first domain item, the second behavioral feature distribution comprises a second preference of each of the plurality of users for the second domain item, and the correlation determining unit comprises: a vector establishing unit, configured to establish a first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item; and a correlation calculation unit, configured to determine the degree of correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
  • the user's preference for the item is equal to the sum of the sub-preferences corresponding to each behavior type of the user for at least part of the behavior type of the item, wherein the sub-preference level is positively correlated with the number of actions and the behavior weight, respectively.
  • The first behavioral feature distribution determining unit and/or the second behavioral feature distribution determining unit may determine the user's preference r for the item using the following formula,
  • $r = \sum_{t \in T} q_t \cdot W_t$
  • where T is the set of behavior types of the user for the item,
  • t is a behavior type,
  • $q_t$ is the number of behaviors under behavior type t, and
  • $W_t$ is the behavior weight corresponding to behavior type t.
  • the relationship mining device may further include: a normalization processing module, configured to perform normalization processing on the first preference vector and the second preference vector, respectively.
  • A device for mining relationships between items in different fields, comprising: a behavior data acquiring module, configured to acquire, for each of a plurality of users, first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; and a correlation determining module, configured to determine, based on the first behavior data and the second behavior data of the plurality of users, the degree of correlation between each of at least some of the first domain items and each of at least some of the second domain items.
  • An item recommendation device, comprising: a first behavior data acquiring module, configured to acquire first behavior data of a user in a first field, the first behavior data relating to one or more first domain items; an item selection module, configured to select a second domain item from at least one second domain item based on the degree of correlation between each of at least some of the one or more first domain items and each of the at least one second domain item; and an item recommendation module, configured to recommend the selected second domain item to the user.
  • the degree of correlation between the first domain item and the second domain item may be obtained by the relationship mining method described in the first aspect or the second aspect of the present disclosure.
  • The item selection module may include: a recommendation degree calculation unit, configured to calculate a recommendation degree for each second domain item; and an item selection unit, configured to select a predetermined number of top-ranked second domain items in descending order of recommendation degree.
  • the degree of recommendation of the second field item is positively correlated with the relevance of each of the at least one first field item to the second field item.
  • The recommendation degree of each second domain item is equal to the sum of the sub-recommendation degrees of the second domain item with respect to each of the at least one first domain item, where the sub-recommendation degree is positively correlated with both the correlation between the first domain item and the second domain item and the user's preference for the first domain item.
  • The recommendation degree calculation unit may calculate the recommendation degree of the second domain item using the following formula,
  • $rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$
  • where $rec_{uj}$ represents the recommendation degree of the second domain item j for the user u,
  • I is the set of first domain items involved in the first behavior data of the user in the first field,
  • i is a first domain item,
  • $sim(i, j)$ represents the degree of correlation between the first domain item i and the second domain item j, and
  • $r_{ui}$ represents the preference of the user u for the first domain item i.
  • A computing device comprising: a processor; and a memory having stored thereon executable code that, when executed by the processor, causes the processor to perform the method described in the first aspect or the second aspect of the present disclosure.
  • A non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform the method of the present disclosure.
  • The relationship mining scheme of the present disclosure can determine the degree of correlation between items across fields.
  • Based on that correlation, the items covered by a user's known behavior data in other fields can be mapped to similar items in the target field, thereby solving the user cold start problem.
  • FIG. 1 is a schematic flow chart showing a method of mining an inter-item relationship in different fields according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flow chart showing an item recommendation method according to an embodiment of the present disclosure
  • FIG. 3 is a flow chart showing an overall implementation of a cross-domain item-to-item relationship mining scheme and a user cold-starting scheme of the present disclosure
  • FIGS. 4A and 4B are schematic diagrams of an application that implements cross-domain recommendation using the present disclosure
  • FIG. 5 is a schematic block diagram showing the structure of a relationship mining device between items in different fields according to an embodiment of the present disclosure
  • FIG. 6 is a schematic block diagram showing the structure of a relationship mining device between items in different fields of another embodiment of the present disclosure
  • FIG. 7 is a schematic block diagram showing the structure of an article recommendation device of the present disclosure.
  • FIG. 8 is a schematic block diagram of a computing device that can be used to perform a relationship mining method and an item recommendation method between items in different fields of the present disclosure.
  • The present disclosure mainly proposes a solution to the user cold start problem that arises in information recommendation.
  • The core idea of the present disclosure is to use users as the associating link: by collecting the behavior data of different users for items in different fields, association relationships between items in different fields (that is, the correlation described below) are established in advance.
  • Target-field items that are highly correlated with the items the user has browsed in other fields can then be recommended to the user.
  • In this way, the user cold start problem can be solved.
  • The items referred to in the present disclosure are mainly items displayed on the Internet, which may be virtual items such as news, pictures, videos, and music, or physical items.
  • The items in a field may be, for example, virtual or physical goods offered for sale on a shopping platform: virtual goods such as game coins and game prop packages, or physical goods such as clothing and digital products sold by merchants.
  • Different applications may be regarded as different fields, and different modules within the same application may also be regarded as different fields.
  • Fields may also be divided according to item attributes, for example into music, video, pictures, novels, and so on according to item format. An already defined field can be divided further: the video field, for example, can be split into romance drama, anti-Japanese drama, costume drama, and other fields according to the tags carried by the videos.
  • Different types of applications can also be regarded as different fields, for example social communication applications (such as WeChat and QQ) and news reading applications (such as Toutiao, content terminals, and Phoenix News).
  • FIG. 1 is a schematic flow chart showing a method of mining an inter-item relationship in different fields according to an embodiment of the present disclosure.
  • In step S110, first behavior information of a user for a first domain item and second behavior information of the user for a second domain item are acquired.
  • The first behavior information and/or the second behavior information may include information indicating whether the user performed an action on the item; where the user did perform an action, it may further include behavior data generated from that action.
  • If the behavior information (the first behavior information and/or the second behavior information) is empty, this may indicate that the user did not perform an action on the item; if it is not empty, this may indicate that the user did.
  • The behavior information may also include explicit identification information indicating whether the user performed an action on the item, for example a flag taking the values "1" and "0", where "1" indicates that the user performed an action on the item and "0" indicates that the user did not.
  • The behavior data may include data generated by the user's actions on the item, and may also include data obtained by computing statistics over those actions.
  • The behavior data may include, but is not limited to, the types of behavior the user performed on the item (such as click, play, and evaluation), the number of behaviors of each type, and the duration of the behavior.
  • In step S120, the degree of correlation between the first domain item and the second domain item is determined based on the first behavior information and the second behavior information of a plurality of users.
  • Using the users as the associating link, the present disclosure may analyze the first behavior information and the second behavior information of each of the plurality of users to determine the first behavioral feature distribution of the first domain item relative to the plurality of users and the second behavioral feature distribution of the second domain item relative to the plurality of users.
  • The first behavioral feature distribution may represent the behavioral features of the plurality of users with respect to the first domain item, and the second behavioral feature distribution may represent the behavioral features of the plurality of users with respect to the second domain item.
  • The behavioral feature may be whether the user performed an action on the item, the number of times the user acted on the item, or a calculated preference of the user for the item. Whether the user acted and the number of actions can be obtained from the behavior information; the calculation of the preference is described below.
  • For a first domain item A and a second domain item B, the degree of correlation between A and B can be determined from the degree of similarity between the first behavioral feature distribution of A and the second behavioral feature distribution of B.
  • The principle is that if the first behavioral feature distribution of A and the second behavioral feature distribution of B across the plurality of users are similar, A and B can be considered strongly correlated; otherwise they can be considered weakly correlated.
  • For example, the first field may be video and the second field may be music. The first domain items are then specific videos, such as video 1 to video 4, and the second domain items are specific pieces of music, such as music 1 to music 3.
  • In the table, a 1 under an item indicates that the user performed an action on that item (video or music) and has behavior data for it (such as click, play, or favorite); a 0 indicates that the user performed no action on the item and has no behavior data for it.
  • Only user 4 performed an action on video 2, while user 1, user 3, and user 5 performed actions on music 2. The first behavioral feature distribution of video 2 can therefore be represented as {0, 0, 0, 1, 0}, and the second behavioral feature distribution of music 2 as {1, 0, 1, 0, 1}. Since video 2 and music 2 have no users in common, they can be considered weakly correlated; that is, the correlation between video 2 and music 2 can be taken to be zero.
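  • The comparison of the two binary distributions above can be sketched in a few lines. The sketch assumes Jaccard similarity over the sets of acting users as the similarity measure, which is only one possible choice; the helper names `acting_users` and `jaccard` are hypothetical.

```python
def acting_users(distribution):
    """Return the 1-based indices of users whose flag in the distribution is 1."""
    return {idx for idx, flag in enumerate(distribution, start=1) if flag == 1}


def jaccard(dist_a, dist_b):
    """Jaccard similarity between the sets of users who acted on two items."""
    users_a, users_b = acting_users(dist_a), acting_users(dist_b)
    union = users_a | users_b
    if not union:
        return 0.0  # neither item has any acting user
    return len(users_a & users_b) / len(union)


# Distributions quoted in the text for video 2 and music 2 (users 1 to 5).
video_2 = [0, 0, 0, 1, 0]
music_2 = [1, 0, 1, 0, 1]

shared_users = acting_users(video_2) & acting_users(music_2)
score = jaccard(video_2, music_2)
```

  • With no shared users the intersection is empty, so the similarity is zero, matching the conclusion stated for video 2 and music 2.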
  • The first behavioral feature distribution may also characterize the distribution of the plurality of users' preferences for the first domain item, and the second behavioral feature distribution may characterize the distribution of the plurality of users' preferences for the second domain item. That is, the first behavioral feature distribution may include a first preference of each of the plurality of users for the first domain item, and the second behavioral feature distribution may include a second preference of each of the plurality of users for the second domain item.
  • For a first domain item A and a second domain item B, the correlation between A and B can be determined by calculating the similarity between the distribution of the plurality of users' preferences for A and the distribution of their preferences for B.
  • The correlation is calculated as follows.
  • The behavior information may further include behavior data generated based on the action performed by the user on the item. The behavior data included in the first behavior information may be referred to as "first behavior data", and the behavior data included in the second behavior information as "second behavior data".
  • If the user performed no action on the item, the behavior information does not include behavior data, or the behavior data it includes is a null value.
  • The user's behavior data for an item may cover multiple behavior types, such as click, play, and evaluation, and the number of times each behavior type was performed may differ.
  • The user's total preference for the item can be taken as the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the item.
  • Each sub-preference can be positively correlated with both the number of behaviors and the behavior weight.
  • The user's preference r for the item can be calculated using the following formula,
  • $r = \sum_{t \in T} q_t \cdot W_t$
  • where T is the set of at least some (preferably all) of the user's behavior types for the item,
  • t is a behavior type,
  • $q_t$ is the number of behaviors under behavior type t, and
  • $W_t$ is the behavior weight corresponding to behavior type t.
  • the behavior type and the number of behaviors can be obtained from the behavior data.
  • the weights corresponding to different behavior types can be determined in advance by way of assignment, or can be determined by other methods. For example, the weight of the behavior type can be determined according to the duration of the behavior type.
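  • A minimal sketch of the preference calculation follows, assuming each sub-preference is the product of the behavior count and the behavior weight (one form consistent with the positive correlation described). The weight values and the name `preference` are hypothetical; the text does not give concrete weights.

```python
def preference(behavior_counts, weights):
    """Sum, over the behavior types t, the count q_t times the weight W_t."""
    return sum(count * weights[behavior] for behavior, count in behavior_counts.items())


# Hypothetical, pre-assigned weights per behavior type.
WEIGHTS = {"click": 1.0, "play": 2.0, "favorite": 3.0}

# One user's behavior counts for one item: 3 clicks and 1 play.
counts = {"click": 3, "play": 1}
r = preference(counts, WEIGHTS)  # 3 * 1.0 + 1 * 2.0
```

  • A user with no recorded behavior for an item gets a preference of zero under this sketch, which lines up with the zero entries in the preference vectors.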
  • Step 2: establishing the vectors
  • a first preference vector for the first domain item and a second preference vector for the second domain item may be established for the plurality of users.
  • the number of elements in the preference vector is consistent with the number of users, and the value of the element is the user's preference for the corresponding item.
  • the degree of similarity between the first preference vector and the second preference vector can be calculated in a variety of calculations.
  • the calculated similarity can be used as the correlation between the corresponding first field item and the second field item.
  • it can be calculated by various vector similarity calculation methods such as cosine similarity, Jaccard similarity, Pearson correlation coefficient, Euclidean distance, Manhattan distance, and Mahalanobis distance.
  • The first preference vector and the second preference vector, built from the preferences calculated in step 1, may each be normalized, and the normalized preference vectors are then used in the correlation calculation.
  • Common normalization methods include min-max scaling, logarithmic transformation, and z-score standardization; the normalization process itself is not described further here.
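  • Two of the normalization methods named above can be sketched as follows. The handling of constant vectors (returning zeros) is an assumption of this sketch, and the function names are hypothetical; the example vector is the preference vector quoted for video 1.

```python
import statistics


def min_max(vector):
    """Scale values linearly into [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0 for _ in vector]
    return [(x - lo) / (hi - lo) for x in vector]


def z_score(vector):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(vector)
    std = statistics.pstdev(vector)
    if std == 0:
        return [0.0 for _ in vector]
    return [(x - mean) / std for x in vector]


video_1 = [2, 5, 0, 0, 0]
scaled = min_max(video_1)
standardized = z_score(video_1)
```

  • Note that z-score standardization turns zero entries (no behavior) into negative values, so the choice of method can affect a later cosine similarity calculation.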
  • the numbers corresponding to the items in Table 2 indicate the user's preference for the items.
  • the preference vector of video 1 is {2, 5, 0, 0, 0}
  • the preference vector of video 2 is {0, 0, 0, 1, 0}
  • the preference vector of video 3 is {0, 4, 5, 0, 0}
  • the preference vector of video 4 is {2, 0, 0, 3, 0}
  • the preference vector of music 1 is {4, 0, 0, 5, 4}
  • the preference vector of music 2 is {5, 0, 2, 0, 2}
  • the preference vector of music 3 is {0, 1, 0, 4, 0}.
  • the cosine similarity calculation method can be used to calculate the correlation between items in different fields.
  • the calculation results are shown in Table 3 below, and the specific calculation process is no longer described.
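  • The cosine similarity calculation over the preference vectors listed above can be sketched as follows. The vectors are the ones quoted in the text; the values the sketch produces are computed here and are not copied from Table 3. The function and variable names are hypothetical.

```python
import math

# Preference vectors quoted in the text (one element per user, users 1 to 5).
VIDEOS = {
    "video 1": [2, 5, 0, 0, 0],
    "video 2": [0, 0, 0, 1, 0],
    "video 3": [0, 4, 5, 0, 0],
    "video 4": [2, 0, 0, 3, 0],
}
MUSIC = {
    "music 1": [4, 0, 0, 5, 4],
    "music 2": [5, 0, 2, 0, 2],
    "music 3": [0, 1, 0, 4, 0],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Correlation between every (video, music) pair.
similarity = {
    (video, music): cosine(v_vec, m_vec)
    for video, v_vec in VIDEOS.items()
    for music, m_vec in MUSIC.items()
}

for (video, music), value in similarity.items():
    print(f"{video} / {music}: {value:.3f}")
```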
  • the first field and the second field have been taken as examples to describe the process of determining the correlation between cross-domain items. It should be noted that, by using the relationship mining scheme of the present disclosure, the correlation between items in the two fields can be tapped for any two of a plurality of different fields.
  • the correlation between items in different fields can be determined by other means. For example, for items in different fields, tags or keywords that can characterize their attributes in multiple dimensions can be extracted, such as a topic model, which can be mapped to a tag based on the element attribute information of the item. You can also use seq2vec to map an item's element attribute information into a vector. In this way, the correlation between items in different fields can be determined by calculating the similarity of labels or vectors of different items.
  • keywords of items in different fields may be extracted, and the degree of similarity between items in different fields may be determined by analyzing the degree of similarity of keywords between items in different fields.
  • for a first domain item in the first field, one or more keywords of the first domain item may be extracted to generate a first keyword vector.
  • keywords of the second field item may be extracted to generate a second keyword vector.
  • the degree of similarity between the first keyword vector and the second keyword vector can be calculated by using the cosine similarity calculation method, so that the correlation between the first domain item and the second domain item can also be determined.
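The keyword-based variant can be sketched as follows. The keywords are made-up sample data, and representing a keyword vector as counts over a shared vocabulary is an assumption; the text does not say how the keyword vectors are encoded.

```python
import math
from collections import Counter

def keyword_vector(keywords, vocabulary):
    """Count how often each vocabulary word occurs among an item's keywords."""
    counts = Counter(keywords)
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical keywords extracted for one video and one piece of music.
video_keywords = ["jazz", "concert", "live"]
music_keywords = ["jazz", "live", "piano"]

# Both vectors are built over the same vocabulary so their positions line up.
vocabulary = sorted(set(video_keywords) | set(music_keywords))
first_vec = keyword_vector(video_keywords, vocabulary)
second_vec = keyword_vector(music_keywords, vocabulary)
similarity = cosine_similarity(first_vec, second_vec)
```

Unlike the behavior-based approach, this variant needs no users in common, so it can also relate items that have no behavior data yet.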
  • the degree of correlation between the cross-domain items can also be determined in various other ways, and details are not described herein again.
  • for each of a plurality of users, first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items may be separately acquired.
  • the plurality of users mentioned herein are preferably users who have behavior data in both the first field and the second field.
  • the first field is different from the second field.
  • the first domain and the second domain may refer to different applications, different modules in the same application, different types of applications, or different fields divided according to attributes, such as music, videos, pictures, and so on.
  • the first behavior data and the second behavior data may be collected by the client log collection system.
  • the client log can be cleaned to filter out invalid logs caused by user exceptions, abnormal user operations, and server exceptions.
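A minimal sketch of such a cleaning pass is shown below. The record fields (`user_id`, `status`, `duration_s`) and the filtering rules are purely hypothetical; the text does not specify a log schema or how invalid logs are recognized.

```python
def clean_logs(records, blocked_users=frozenset()):
    """Keep only records that look valid under the assumed schema."""
    cleaned = []
    for rec in records:
        if rec.get("user_id") in blocked_users:
            continue  # abnormal user, e.g. a known bot account
        if rec.get("status") != "ok":
            continue  # the server reported an error for this event
        if rec.get("duration_s", 0) <= 0:
            continue  # treated here as an accidental operation
        cleaned.append(rec)
    return cleaned

# Hypothetical raw log entries.
raw_logs = [
    {"user_id": "u1", "item": "Video1", "status": "ok", "duration_s": 120},
    {"user_id": "bot", "item": "Video1", "status": "ok", "duration_s": 5},
    {"user_id": "u2", "item": "Music1", "status": "error", "duration_s": 30},
    {"user_id": "u3", "item": "Music2", "status": "ok", "duration_s": 0},
]
valid_logs = clean_logs(raw_logs, blocked_users={"bot"})
```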
  • the first behavioral data, second behavioral data referred to herein refers to the total behavioral data of the user in the corresponding domain, which may relate to one or more items.
  • based on the first behavior data and the second behavior data of the plurality of users, the relationship mining method described above in connection with FIG. 1 may be used to determine the correlation between each of at least a portion of the first domain items and each of at least a portion of the second domain items. The specific determination process of the correlation is not repeated here.
  • FIG. 2 is a schematic flow chart showing an item recommendation method according to an embodiment of the present disclosure.
  • in step S210, first behavior data of the user in the first domain is acquired, the first behavior data relating to one or more first domain items.
  • in step S220, a second field item is selected from at least one second field item based on the degree of correlation between at least one of the one or more first field items and each of the at least one second field item.
  • the user in this embodiment refers to a user who lacks behavior data in the second field, that is, the user can be regarded as a new user in the second field.
  • when second field items are recommended to the user, a user cold start problem arises; in the first field, the user has no cold start problem.
  • unlike the first field described in the relationship mining scheme above, the first field in this embodiment may refer generally to one or more known fields, other than the second field, in which the user has behavior data.
  • that is, when second field items are recommended to the user, the selection may draw on some or all of the items involved in the user's behavior data in one or more other fields where that data is known.
  • second field items having a higher degree of correlation with some or all of those items are selected from the second field as items suitable for recommendation to the user.
  • the correlation between the first domain item and the second domain item may be predetermined, for example, may be obtained by using a mining method of the inter-item relationship of different fields mentioned above.
  • in step S230, the selected second field item is recommended to the user.
  • thus, when content (items) in an unknown domain is recommended to the user, the user's known behavior data in other content domains can be used to find similar items across domains, mapping the user's interests in those other domains onto the unknown domain. This can solve the user cold start problem in the unknown domain and improve the user experience.
  • when the second field item is selected, the recommendation degree of each of the at least one second field item may be calculated, and a predetermined number of top-ranked second field items may be selected in descending order of recommendation degree.
  • the recommendation degree of a second domain item may be positively correlated with the correlation between that second domain item and each of at least some (preferably all) of the first domain items involved in the user's first behavior data.
  • the degree of recommendation of the second field item may be equal to the sum of the sub-recommendations of the second field item to each of the first field items of the user's first behavior data.
  • the sub-recommendation degree is positively correlated with the degree of relevance of the first field item and the second field item and the user's preference for the first field item.
  • the following formula can be used to calculate the recommendation degree of the second field item: $$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$
  • $rec_{uj}$ represents the recommendation degree of the user u for the second domain item j
  • I is the set of first domain items involved in the first behavior data of the user in the first field
  • i is a first domain item
  • $sim(i, j)$ represents the degree of correlation between the item i of the first field and the item j of the second field
  • $r_{ui}$ represents the preference of the user u for the item i of the first field.
  • the preference may be regarded as the weight of the correlation sim(i, j) between the first domain item i and the second domain item j.
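The weighted sum described above can be sketched as a small function. The function name, the dictionary layout, and the sample values are assumptions for illustration; pairs without a known correlation are treated as 0 here, which the text does not specify.

```python
def recommendation_degree(user_prefs, sim, target_item):
    """rec_uj: sum over the user's first-domain items i of sim(i, j) * r_ui.

    user_prefs maps each first-domain item i to the preference r_ui;
    sim maps (i, j) pairs to the correlation sim(i, j).
    """
    return sum(
        sim.get((i, target_item), 0.0) * r_ui
        for i, r_ui in user_prefs.items()
    )

# Hypothetical data: two first-domain items and one second-domain item "x".
user_prefs = {"a": 2.0, "b": 1.0}
sim = {("a", "x"): 0.5, ("b", "x"): 0.25}
rec_x = recommendation_degree(user_prefs, sim, "x")  # 0.5 * 2.0 + 0.25 * 1.0
```

Because each correlation is weighted by the user's preference, a first-domain item the user strongly prefers contributes more to the recommendation degree than one the user barely touched.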
  • the present disclosure can be used for recommending videos, music, news, applications, games, themes, and the like on various electronic devices, such as mobile phones, tablets, computers, televisions, smart speakers, and smart watches, wherever a user cold start problem exists.
  • FIG. 3 is a flow chart showing the overall implementation of the cross-domain item-to-item relationship mining scheme and the user cold-starting scheme of the present disclosure. The implementation steps shown in Figure 3 are as follows.
  • Step 1 Collect the behavior log generated by the user.
  • behavior data such as clicks, plays, and ratings that users generate on items in different fields can be collected on various terminals.
  • Step 2 Log cleaning and calculating preference data
  • the original log is cleaned first, filtering out invalid logs caused by abnormal users, misoperations, server exceptions, and so on. The user's preference for each item can then be obtained by analyzing the behavior data. The calculation of the preference is not repeated here.
  • Step 3 Calculate relationship data between the cross-domain items according to the preference data.
  • the preference vector of the items in different fields under multiple users can be derived.
  • the preference vector of Video1 is {2, 5, 0, 0, 0}
  • the preference vector of Video2 is {0, 0, 0, 1, 0}
  • the preference vector of Video3 is {0, 4, 5, 0, 0}
  • the preference vector of Video4 is {2, 0, 0, 3, 0}
  • the preference vector of Music1 is {4, 0, 0, 5, 4}
  • the preference vector of Music2 is {5, 0, 2, 0, 2}
  • the preference vector of Music3 is {0, 1, 0, 4, 0}.
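Deriving these per-item vectors from per-user preference data can be sketched as follows. The sparse `(user, item)` dictionary is an assumed storage layout; the non-zero values are those of Table 2, and a missing pair is read as a preference of 0.

```python
# Fixed user order, so that position k in every vector refers to the same user.
users = ["User1", "User2", "User3", "User4", "User5"]

# Non-zero preferences from Table 2, keyed by (user, item).
preferences = {
    ("User1", "Video1"): 2, ("User1", "Video4"): 2,
    ("User1", "Music1"): 4, ("User1", "Music2"): 5,
    ("User2", "Video1"): 5, ("User2", "Video3"): 4, ("User2", "Music3"): 1,
    ("User3", "Video3"): 5, ("User3", "Music2"): 2,
    ("User4", "Video2"): 1, ("User4", "Video4"): 3,
    ("User4", "Music1"): 5, ("User4", "Music3"): 4,
    ("User5", "Music1"): 4, ("User5", "Music2"): 2,
}

def item_vector(item):
    """Preference vector of one item across all users (0 where no behavior exists)."""
    return [preferences.get((user, item), 0) for user in users]

video1_vec = item_vector("Video1")
music1_vec = item_vector("Music1")
```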
  • the Cosine similarity calculation method can be used to calculate the similarity between the preference vectors corresponding to the items in different fields, as the correlation between the items in different fields, so that the relationship data between the items in different fields can be obtained.
  • Step 4 calculate the recommendation degree of the item
  • User5 has no behavior data in the video field, so User5 can be regarded as a new user in the video field, and recommending videos to User5 faces a cold start problem.
  • the recommendation degree of different videos can be calculated for User5 according to the behavior data of User5 in the music field and the predetermined relationship data between cross-domain items in the video field and the music field.
  • the degree of recommendation of User5 for different videos can be calculated using the following formula: $$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$
  • $rec_{uj}$ represents the recommendation degree of User5 for video j
  • I is the set of music involved in the behavior data of User5 in the music field, which is {Music1, Music2}.
  • $sim(i, j)$ represents the degree of correlation between video j and music i
  • $r_{ui}$ represents the preference of User5 for music i.
  • the expanded formula for the recommendation degree of Video1 is (similarity_m1v1) × (value_m1) + (similarity_m2v1) × (value_m2) + (similarity_m3v1) × (value_m3), where similarity_m1v1 indicates the correlation between Music1 and Video1
  • value_m1 indicates the preference of User5 for Music1.
  • similarity_m2v1 indicates the correlation between Music2 and Video1
  • value_m2 indicates the preference of User5 for Music2.
  • similarity_m3v1 indicates the correlation between Music3 and Video1
  • value_m3 indicates the preference of User5 for Music3.
  • the recommendation degrees of User5 for the different videos calculated in this way are: 1.4 for Video1, 2.4 for Video2, 0.5 for Video3, and 4.2 for Video4.
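The User5 calculation can be reproduced with a short sketch that applies the expanded formula to every video. It uses the rounded correlations of Table 3 and the User5 row of Table 2; the variable names are illustrative. Totals computed from the rounded table values need not match the figures quoted above digit for digit, but they give the same ranking.

```python
# Correlations from Table 3, keyed by (music, video).
sim = {
    ("Music1", "Video1"): 0.20, ("Music1", "Video2"): 0.66,
    ("Music1", "Video3"): 0.00, ("Music1", "Video4"): 0.84,
    ("Music2", "Video1"): 0.32, ("Music2", "Video2"): 0.00,
    ("Music2", "Video3"): 0.27, ("Music2", "Video4"): 0.48,
    ("Music3", "Video1"): 0.22, ("Music3", "Video2"): 0.97,
    ("Music3", "Video3"): 0.15, ("Music3", "Video4"): 0.81,
}

# Preferences of User5 in the music field (Table 2); Music3 contributes nothing.
user5_prefs = {"Music1": 4, "Music2": 2, "Music3": 0}
videos = ["Video1", "Video2", "Video3", "Video4"]

# Recommendation degree of each video: sum of similarity * preference.
rec = {
    video: sum(sim[(music, video)] * pref for music, pref in user5_prefs.items())
    for video in videos
}

# Videos ordered from highest to lowest recommendation degree.
ranking = sorted(rec, key=rec.get, reverse=True)
```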
  • Step 5 According to the recommendation ranking, select the item to recommend
  • the items may be arranged in descending order according to the recommendation degree of the items, and the top ranked items may be displayed to the user as a recommendation list. For example, you can recommend the top ranked Video4 and Video2 to User5.
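Selecting the top-ranked items can be sketched with the standard library. The recommendation degrees are the values quoted in the text for User5; the function name and the choice of `n = 2` are illustrative.

```python
import heapq

def top_n(recommendation, n):
    """Return the n items with the highest recommendation degree, best first."""
    return heapq.nlargest(n, recommendation, key=recommendation.get)

# Recommendation degrees of User5 as quoted in the text.
rec_user5 = {"Video1": 1.4, "Video2": 2.4, "Video3": 0.5, "Video4": 4.2}
recommend_list = top_n(rec_user5, 2)
```

`heapq.nlargest` avoids sorting the whole candidate set, which matters only when the number of second-field items is large; for short lists a plain `sorted(...)[:n]` would do equally well.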
  • the present disclosure can use data from other fields to compensate for insufficient user behavior data in the target domain, solving the user cold start problem in the recommendation system and improving the user experience.
  • the relationship mining method between items in different fields and the item recommendation method of the present disclosure have been described in detail above with reference to FIGS. 1 through 3.
  • a relationship mining device, an item recommendation device, and a computing device between items in different fields of the present disclosure will be described below with reference to FIGS. 5 through 8.
  • FIG. 5 is a schematic block diagram showing the structure of a relationship mining device between items in different fields according to an embodiment of the present disclosure. The details of the related content are the same as those described above with reference to FIG. 1, and are not repeated here.
  • the relationship mining device 300 may include a behavior information acquisition module 310 and a relevance determination module 320.
  • the behavior information obtaining module 310 may be configured to acquire first behavior information of the user for the first domain item and second behavior information of the user for the second domain item;
  • the relevance determination module may determine the degree of relevance between the first domain item and the second domain item based on the first behavior information and the second behavior information of the plurality of users.
  • the relevance determining module 320 may optionally include a first behavior feature distribution determining unit 321, a second behavior feature distribution determining unit 323, and a relevance determining unit 325 shown by a broken line in the figure.
  • the first behavior feature distribution determining unit 321 may determine a first behavior feature distribution of the first domain item with respect to the plurality of users based on the first behavior information of the plurality of users.
  • the second behavior feature distribution determining unit 323 may determine a second behavior feature distribution of the second domain item with respect to the plurality of users based on the second behavior information of the plurality of users.
  • the relevance determining unit 325 may determine the degree of correlation between the first domain item and the second domain item according to the similarity degree of the first behavior feature distribution of the first domain item and the second behavior feature distribution of the second domain item.
  • the first behavioral feature distribution and/or the second behavioral feature distribution may include one or more of the following: whether the user performed the behavior on the item, the number of times the user has acted on the item, and the user's preference for the item.
  • the first behavioral feature distribution may include a first preference of each of the plurality of users for the first domain item
  • the second behavioral feature distribution may include a second preference of each of the plurality of users for the second domain item.
  • the user's preference for the item may be equal to the sum of the sub-preferences corresponding to each behavior type of the part or all of the behavior types of the item, wherein the sub-preference degree is positively correlated with the behavior number and the behavior weight, respectively.
  • the first behavior feature distribution determining unit 321 and/or the second behavior feature distribution determining unit 323 may calculate the user's preference r for the item using the following formula: $$r = \sum_{t \in T} q_t \cdot W_t$$
  • T is the set of the user's behavior types for the item
  • t is a behavior type
  • $q_t$ is the number of behaviors under the behavior type t
  • $W_t$ is the behavior weight corresponding to the behavior type t.
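A minimal sketch of the preference calculation follows. It assumes each sub-preference is the product of the behavior count and the behavior weight, which is one form consistent with the statement that the sub-preference is positively correlated with both. The behavior types and weight values are hypothetical.

```python
def preference(behavior_counts, behavior_weights):
    """r: sum over behavior types t of q_t * W_t (assumed product form)."""
    return sum(
        count * behavior_weights[behavior_type]
        for behavior_type, count in behavior_counts.items()
    )

# Hypothetical weights W_t; the text leaves their values to be assigned in advance.
weights = {"click": 1.0, "play": 2.0, "review": 3.0}

# Hypothetical behavior counts q_t of one user for one item.
counts = {"click": 3, "play": 1}

r = preference(counts, weights)  # 3 * 1.0 + 1 * 2.0
```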
  • the correlation determination unit 325 may include a vector establishment unit 3251 and a correlation calculation unit 3253.
  • the vector establishing unit 3251 is configured to establish a first preference vector of the plurality of users for the first domain item and a second preference vector of the second domain item.
  • the correlation calculation unit 3253 may determine the degree of correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
  • the relationship mining device 300 can also optionally include a normalization processing module 330 as shown by the dashed box in the figure.
  • the normalization processing module 330 may normalize the first preference vector and the second preference vector respectively, and the correlation calculation unit 3253 may calculate the similarity between the normalized first preference vector and the normalized second preference vector as the degree of correlation between the first field item and the second field item.
  • FIG. 6 is a schematic block diagram showing the structure of a relationship mining device between items in different fields according to another embodiment of the present disclosure. The details of the related content are the same as those described above with reference to FIG. 1, and are not repeated here.
  • the relationship mining device 600 can include a behavior data acquisition module 610 and a relevance determination module 620.
  • the behavior data obtaining module 610 is configured to acquire, for each of the plurality of users, first behavior data of the user for one or more first domain items in the first domain and second behavior data of the user for one or more second domain items in the second domain.
  • the first behavior data and the second behavior data may include one or more of the following: a behavior type, a behavior number, and a behavior duration.
  • the relevance determination module 620 is configured to determine a degree of correlation between each of the at least a portion of the first domain items and each of the at least portions of the second domain items based on the first behavior data and the second behavior data of the plurality of users. For the specific determination manner of determining the correlation between the first field item and the second field item, refer to the related description above, and details are not described herein again.
  • FIG. 7 is a schematic block diagram showing the structure of an article recommendation device of the present disclosure. The details of the related content are the same as those described above with reference to FIG. 2, and details are not described herein again.
  • the item recommendation device 400 may include a first behavior data acquisition module 410, an item selection module 420, and an item recommendation module 430.
  • the first behavior data obtaining module 410 may acquire first behavior data of the user in the first domain, and the first behavior data relates to one or more first domain items.
  • the item selection module 420 may select a second field item from the at least one second field item based on the correlation between at least one of the one or more first field items and each of the at least one second field item.
  • the correlation between the first field item and the second field item may be obtained by using the relationship mining method described above.
  • the item recommendation module 430 can be used to recommend the selected second field item to the user.
  • the item selection module 420 may also optionally include a recommendation degree calculation unit 421 and an item selection unit 423 shown by a broken line in the figure.
  • the recommendation degree calculation unit 421 can be used to calculate the recommendation degree of each second field item.
  • the item selecting unit 423 may select a predetermined number of top-ranked second field items in descending order of recommendation degree.
  • the recommendation degree of each second domain item may be positively correlated with the relevance of each first domain item and the second domain item involved in the first behavior data of the user.
  • the recommendation degree of the second field item is equal to the sum of the sub-recommendation degrees of the second field item with respect to each of the at least one first field item, and each sub-recommendation degree is positively correlated with both the correlation between the first field item and the second field item and the user's preference for the first field item.
  • the recommendation degree calculation unit 421 can calculate the recommendation degree of the second field item using the following formula: $$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$
  • $rec_{uj}$ represents the recommendation degree of the user u for the second domain item j
  • I is the set of first domain items involved in the first behavior data of the user in the first field
  • i is a first domain item
  • $sim(i, j)$ represents the degree of correlation between the item i of the first field and the item j of the second field
  • $r_{ui}$ represents the preference of the user u for the item i of the first field.
  • a computing device that can be used to perform the relationship mining method and the item recommendation method of the present disclosure is also provided.
  • FIG. 8 is a schematic block diagram of a computing device that can be used to perform a relationship mining method and an item recommendation method between items in different fields of the present disclosure.
  • the computing device 500 can include a processor 510 and a memory 530.
  • An executable code is stored on the memory 530.
  • when the processor 510 executes the executable code, the processor 510 is caused to execute the relationship mining method and the item recommendation method described above.
  • the method according to the invention may also be embodied as a computer program or computer program product comprising computer program code instructions for performing the various steps defined above in the above method of the invention.
  • the present invention may be embodied as a non-transitory machine readable storage medium (or computer readable storage medium, or machine readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above method according to the invention.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than the ones in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.

Abstract

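The present disclosure provides a method, an apparatus, a computing device, and a storage medium for mining relationships between items and for making recommendations. First behavior information of a user for a first domain item and second behavior information of the user for a second domain item are acquired; based on the first behavior information and the second behavior information of a plurality of users, the correlation between the first domain item and the second domain item is determined. Thus, when a user has no behavior data in a target domain, the items involved in the user's behavior data in other domains can be mapped, based on the correlation between cross-domain items, to similar items in the target domain, which can solve the user cold-start problem and improve the user experience.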

Description

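Method, apparatus, computing device, and storage medium for mining relationships between items and for recommendation
This application claims priority to Chinese patent application No. 201810046319.6, filed on January 17, 2018 and entitled "Method, apparatus, computing device, and storage medium for mining relationships between items and for recommendation", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of information recommendation, and in particular to information recommendation in the user cold-start case.
Background
With the development of Internet information technology, people have gradually moved from an era of information scarcity into an era of information overload. Today, with nothing more than a small mobile phone, one can watch all kinds of films, television programs, and live streams, and browse news from around the world. Faced with this mass of information, however, quickly finding the part that matches one's own interests has become very difficult.
Information recommendation technology can address the information overload problem well. Current information recommendation technology mainly uses a user's personal information and historical behavior to match users with information by means of algorithms, presenting the user with information the user may be interested in and thereby reducing the user's cost of choosing. For information providers, recommendation technology also allows information to be displayed more efficiently.
Information recommendation technology is now widely used in daily life. For example, whether reading news on Toutiao, shopping on Taobao, or looking up film information on Douban, one sees information selected by personalized recommendation. In China, recommendation technology has even become a standard feature of e-commerce and content distribution applications. A large number of companies specializing in personalized recommendation algorithm services have appeared on the market, and some smaller applications need only connect their data to offer users personalized recommendation services.
Information recommendation technology mainly uses data such as user behavior, user tags, context, and social networks to predict a user's future behavior; of these, user behavior data is the most effective and the most common. Designing a personalized recommendation system without a large amount of user data is the cold-start problem.
Cold start falls mainly into three categories:
1. User cold start. For a new user, the lack of historical behavior data means the user's interests cannot be predicted, so accurate personalized recommendations cannot be provided.
2. Item cold start. When a new item appears, how to recommend it to users who may be interested in it.
3. System cold start. How to design a personalized recommendation system for a newly developed website or application, so that users experience personalized recommendations as soon as the website or application is released.
Of these, user cold start is the most common. Traditionally, such users are given non-personalized recommendations until enough behavior data has been collected, and only then are personalized recommendations provided, which undoubtedly degrades the user experience.
Therefore, a solution to the user cold-start problem is needed.
Summary
The present disclosure proposes a solution aimed mainly at the user cold-start problem.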
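According to a first aspect of the present disclosure, there is provided a method for mining relationships between items in different domains, comprising: acquiring first behavior information of a user for a first domain item and second behavior information of the user for a second domain item; and determining the correlation between the first domain item and the second domain item based on the first behavior information and the second behavior information of a plurality of users.
Preferably, the step of determining the correlation between the first domain item and the second domain item may comprise: determining, based on the first behavior information of the plurality of users, a first behavior feature distribution of the first domain item with respect to the plurality of users; determining, based on the second behavior information of the plurality of users, a second behavior feature distribution of the second domain item with respect to the plurality of users; and determining the correlation between the first domain item and the second domain item according to the degree of similarity between the first behavior feature distribution and the second behavior feature distribution.
Preferably, the first behavior feature distribution and/or the second behavior feature distribution may include one or more of the following: whether a user has performed a behavior on the item; the number of behaviors a user has performed on the item; a user's preference for the item.
Preferably, the first behavior information and/or the second behavior information includes: whether the user has performed a behavior on the item; and/or behavior data generated by the behavior the user performed on the item.
Preferably, the behavior data may include one or more of the following: behavior type; behavior count; behavior duration.
Preferably, the first behavior feature distribution includes a first preference of each of the plurality of users for the first domain item, the second behavior feature distribution includes a second preference of each of the plurality of users for the second domain item, and the step of determining the correlation between the first domain item and the second domain item comprises: establishing a first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item; and determining the correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
Preferably, a user's preference for an item is equal to the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the item, where each sub-preference is positively correlated with both the behavior count and the behavior weight.
Preferably, the user's preference r for an item may be determined using the following formula,
$$r = \sum_{t \in T} q_t \cdot W_t$$
where T is the set of the user's behavior types for the item, t is a behavior type, $q_t$ is the number of behaviors of type t, and $W_t$ is the behavior weight corresponding to behavior type t.
Preferably, the relationship mining method may further comprise: normalizing the first preference vector and the second preference vector respectively.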
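According to a second aspect of the present disclosure, there is also provided a method for mining relationships between items in different domains, comprising: for each of a plurality of users, separately acquiring first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; and determining, based on the first behavior data and the second behavior data of the plurality of users, the correlation between each of at least some of the first domain items and each of at least some of the second domain items.
According to a third aspect of the present disclosure, there is also provided an item recommendation method, comprising: acquiring first behavior data of a user in a first domain, the first behavior data relating to one or more first domain items; selecting a second domain item from at least one second domain item based on the correlation between at least one of the one or more first domain items and each of the at least one second domain item; and recommending the selected second domain item to the user.
Preferably, the correlation between the first domain item and the second domain item may be obtained using the relationship mining method of the first aspect or the second aspect of the present disclosure.
Preferably, the step of selecting a second domain item from the at least one second domain item may comprise: calculating the recommendation degree of each second domain item; and selecting a predetermined number of top-ranked second domain items in descending order of recommendation degree.
Preferably, the recommendation degree of the second domain item is positively correlated with the correlation between the second domain item and each of the at least one first domain item.
Preferably, the recommendation degree of the second domain item is equal to the sum of the sub-recommendation degrees of the second domain item with respect to each of the at least one first domain item, each sub-recommendation degree being positively correlated with both the correlation between the first domain item and the second domain item and the user's preference for the first domain item.
Preferably, the recommendation degree of the second domain item may be calculated using the following formula,
$$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$
where $rec_{uj}$ represents the recommendation degree of user u for second domain item j, I is the set of first domain items involved in the user's first behavior data in the first domain, i is a first domain item, $sim(i, j)$ represents the correlation between first domain item i and second domain item j, and $r_{ui}$ represents the preference of user u for first domain item i.
Preferably, the user's preference for the first domain item is equal to the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the first domain item, where each sub-preference is positively correlated with both the behavior count and the behavior weight.
Preferably, the first behavior data includes one or more of the following items of information about the behaviors the user performed on the first domain items: behavior type; behavior count; behavior duration.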
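According to a fourth aspect of the present disclosure, there is also provided an apparatus for mining relationships between items in different domains, comprising: a behavior information acquisition module configured to acquire first behavior information of a user for a first domain item and second behavior information of the user for a second domain item; and a correlation determination module configured to determine the correlation between the first domain item and the second domain item based on the first behavior information and the second behavior information of a plurality of users.
Preferably, the correlation determination module may comprise: a first behavior feature distribution determining unit configured to determine, based on the first behavior information of the plurality of users, a first behavior feature distribution of the first domain item with respect to the plurality of users; a second behavior feature distribution determining unit configured to determine, based on the second behavior information of the plurality of users, a second behavior feature distribution of the second domain item with respect to the plurality of users; and a correlation determining unit configured to determine the correlation between the first domain item and the second domain item according to the degree of similarity between the first behavior feature distribution and the second behavior feature distribution.
Preferably, the first behavior feature distribution and/or the second behavior feature distribution may include one or more of the following: whether a user has performed a behavior on the item; the number of behaviors a user has performed on the item; a user's preference for the item.
Preferably, the first behavior information and/or the second behavior information includes: whether the user has performed a behavior on the item; and/or behavior data generated by the behavior the user performed on the item.
Preferably, the behavior data may include one or more of the following: behavior type; behavior count; behavior duration.
Preferably, the first behavior feature distribution includes a first preference of each of the plurality of users for the first domain item, the second behavior feature distribution includes a second preference of each of the plurality of users for the second domain item, and the correlation determining unit comprises: a vector establishing unit configured to establish a first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item; and a correlation calculation unit configured to determine the correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
Preferably, a user's preference for an item is equal to the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the item, where each sub-preference is positively correlated with both the behavior count and the behavior weight.
Preferably, the first behavior feature distribution determining unit and/or the second behavior feature distribution determining unit may determine the user's preference r for an item using the following formula,
$$r = \sum_{t \in T} q_t \cdot W_t$$
where T is the set of the user's behavior types for the item, t is a behavior type, $q_t$ is the number of behaviors of type t, and $W_t$ is the behavior weight corresponding to behavior type t.
Preferably, the relationship mining apparatus may further comprise: a normalization processing module configured to normalize the first preference vector and the second preference vector respectively.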
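According to a fifth aspect of the present disclosure, there is also provided an apparatus for mining relationships between items in different domains, comprising: a behavior data acquisition module configured to acquire, for each of a plurality of users, first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; and a correlation determination module configured to determine, based on the first behavior data and the second behavior data of the plurality of users, the correlation between each of at least some of the first domain items and each of at least some of the second domain items.
According to a sixth aspect of the present disclosure, there is also provided an item recommendation apparatus, comprising: a first behavior data acquisition module configured to acquire first behavior data of a user in a first domain, the first behavior data relating to one or more first domain items; an item selection module configured to select a second domain item from at least one second domain item based on the correlation between at least one of the one or more first domain items and each of the at least one second domain item; and an item recommendation module configured to recommend the selected second domain item to the user.
Preferably, the correlation between the first domain item and the second domain item may be obtained using the relationship mining method of the first aspect or the second aspect of the present disclosure.
Preferably, the item selection module may comprise: a recommendation degree calculation unit configured to calculate the recommendation degree of each second domain item; and an item selection unit configured to select a predetermined number of top-ranked second domain items in descending order of recommendation degree.
Preferably, the recommendation degree of the second domain item is positively correlated with the correlation between the second domain item and each of the at least one first domain item.
Preferably, the recommendation degree of each second domain item is equal to the sum of the sub-recommendation degrees of the second domain item with respect to each of the at least one first domain item, each sub-recommendation degree being positively correlated with both the correlation between the first domain item and the second domain item and the user's preference for the first domain item.
Preferably, the recommendation degree calculation unit may calculate the recommendation degree of the second domain item using the following formula,
$$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$
where $rec_{uj}$ represents the recommendation degree of user u for second domain item j, I is the set of first domain items involved in the user's first behavior data in the first domain, i is a first domain item, $sim(i, j)$ represents the correlation between first domain item i and second domain item j, and $r_{ui}$ represents the preference of user u for first domain item i.
According to a seventh aspect of the present disclosure, there is also provided a computing device, comprising: a processor; and a memory storing executable code which, when executed by the processor, causes the processor to perform the method of the first aspect or the second aspect of the present disclosure.
According to an eighth aspect of the present disclosure, there is also provided a non-transitory machine-readable storage medium storing executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect or the second aspect of the present disclosure.
The scheme of the present disclosure for mining relationships between items in different domains can determine the correlation between cross-domain items. When a user has no behavior data in a target domain, the items involved in the user's known behavior data in other domains can be mapped, based on the correlation between cross-domain items, to similar items in the target domain, which can solve the user cold-start problem and improve the user experience.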
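Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of exemplary embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a schematic flowchart of a method for mining relationships between items in different domains according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an item recommendation method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of the overall implementation of the cross-domain inter-item relationship mining scheme and the user cold-start scheme of the present disclosure;
FIGS. 4A and 4B are schematic diagrams of an application that uses the present disclosure to implement cross-domain recommendation;
FIG. 5 is a schematic block diagram of the structure of an apparatus for mining relationships between items in different domains according to an embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of the structure of an apparatus for mining relationships between items in different domains according to another embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of the structure of the item recommendation apparatus of the present disclosure;
FIG. 8 is a schematic block diagram of a computing device that can be used to perform the method for mining relationships between items in different domains and the item recommendation method of the present disclosure.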
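Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
【Overview】
The present disclosure proposes a solution aimed mainly at the user cold-start problem that arises in information recommendation. The core idea is to use users as the connecting link: by collecting the behavior data of a plurality of users on items in different domains, associations between items in different domains (the correlation described below) are established in advance.
For a user who has no behavior data in a target domain, when items in the target domain are to be recommended, target-domain items that are highly correlated with the items the user has browsed in other domains can be selected and recommended to the user, based on the pre-established associations between items in different domains and on the user's behavior data for items in those other domains. This can solve the user cold-start problem.
The items referred to in the present disclosure are mainly items displayed on the Internet. They may be virtual items such as news, pictures, videos, and music, or physical items. For example, where the domain is a shopping platform, the items in the domain may be virtual or physical items offered for sale on the platform, such as game currency and game item packs (virtual items), or clothing and digital products sold by merchants (physical items).
The domains referred to in the present disclosure may be divided in various ways.
As one example of the present disclosure, different applications may be regarded as different domains, and different modules within the same application may also be regarded as different domains. Taking a news application (such as Toutiao) as an example, the different channels in the application may be regarded as different domains, for instance the video, society, entertainment, finance, and fashion channels of Toutiao. Taking a shopping platform (such as JD.com) as an example, product categories such as computers, office supplies, electrical equipment, mobile phones and digital products, and supermarket goods may be regarded as different domains.
As another example of the present disclosure, different domains may be divided according to the attributes of the items; for example, items may be divided by format into domains such as music, video, pictures, and novels. A domain so divided may be divided further. For example, the video domain may be further divided, according to the tags carried by the videos, into domains such as romance dramas, anti-Japanese war dramas, and costume dramas.
As yet another example of the present disclosure, different types of applications may be regarded as different domains. For example, social communication applications (such as WeChat and QQ) and news reading applications (such as Toutiao, Neihan Duanzi, and Phoenix News) may be regarded as different domains.
Various other ways of dividing domains are also possible and are not described further here.
The various aspects of the technical solution of the present disclosure are described separately below.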
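【Establishing Relationships Between Cross-Domain Items】
FIG. 1 is a schematic flowchart of a method for mining relationships between items in different domains according to an embodiment of the present disclosure.
Referring to FIG. 1, in step S110, first behavior information of a user for a first domain item and second behavior information of the user for a second domain item are acquired.
The first behavior information and/or the second behavior information may include information indicating whether the user has performed a behavior on the item and, where the user has performed a behavior on the item, may further include behavior data generated by that behavior.
For example, empty behavior information (first behavior information and/or second behavior information) may indicate that the user has not performed a behavior on the item, and non-empty behavior information may indicate that the user has. As another example, the behavior information may include information indicating whether the user has performed a behavior on the item, such as a flag taking the value "1" or "0", where "1" indicates that the user has performed a behavior on the item and "0" indicates that the user has not.
The behavior data may include data generated when the user performs behaviors on the item, and may also include data obtained by aggregating the behaviors the user has performed. For example, the behavior data may include, but is not limited to, the behavior types the user has performed on the item (such as click, play, and rating), the behavior count for each behavior type, and the behavior duration.
In step S120, the correlation between the first domain item and the second domain item is determined based on the first behavior information and the second behavior information of a plurality of users.
The present disclosure may use users as the connecting link: by analyzing the first behavior information and the second behavior information of each of the plurality of users, a first behavior feature distribution of the first domain item with respect to these users and a second behavior feature distribution of the second domain item with respect to these users are determined.
The first behavior feature distribution characterizes the behavior features of each of the plurality of users toward the first domain item, and the second behavior feature distribution characterizes the behavior features of each of the plurality of users toward the second domain item. As examples, a behavior feature may be whether the user has performed a behavior on the item, the number of behaviors the user has performed on the item, or a computed preference of the user for the item. Whether a behavior was performed and the behavior count can both be obtained from the behavior information; the calculation of the preference is described below.
For a particular first domain item A and second domain item B, the correlation between A and B can be determined from the degree of similarity between the first behavior feature distribution of A and the second behavior feature distribution of B. The principle is that if the users' first behavior feature distribution for A is similar to their second behavior feature distribution for B, A and B can be considered strongly correlated; otherwise they are weakly correlated.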
         Video 1  Video 2  Video 3  Video 4  Music 1  Music 2  Music 3
User 1   1        0        0        1        1        1        0
User 2   1        0        1        0        0        0        1
User 3   0        0        1        0        0        1        0
User 4   0        1        0        1        1        0        1
User 5   0        0        0        0        1        1        0
Table 1
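Taking Table 1 as an example, the first domain may be video and the second domain may be music; the first domain items are specific videos, such as video 1 to video 4, and the second domain items are specific pieces of music, such as music 1 to music 3. In the table, a 1 under an item indicates that the user has performed a behavior on that item (video or music) and has behavior data for it (such as clicks, plays, or favorites), and a 0 indicates that the user has not performed a behavior on that item and has no behavior data for it.
As shown in Table 1, user 4 has performed a behavior on video 2, and users 1, 3, and 5 have each performed a behavior on music 2. The first behavior feature distribution of video 2 can therefore be expressed as {0, 0, 0, 1, 0}, and the second behavior feature distribution of music 2 as {1, 0, 1, 0, 1}. Since video 2 and music 2 have no users in common, they can be considered weakly correlated; that is, the correlation between video 2 and music 2 can be taken to be zero.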
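The Table 1 example can be checked with a small sketch that counts the users two items have in common. The columns below are copied from Table 1; the function name is illustrative, and counting shared users is only one simple way to compare two binary distributions.

```python
# Columns of Table 1: one 0/1 entry per user, in the order user 1 to user 5.
table1 = {
    "video2": [0, 0, 0, 1, 0],
    "video4": [1, 0, 0, 1, 0],
    "music1": [1, 0, 0, 1, 1],
    "music2": [1, 0, 1, 0, 1],
}

def shared_users(a, b):
    """Number of users who performed a behavior on both items."""
    return sum(x & y for x, y in zip(a, b))

# video 2 and music 2 have no user in common, so their correlation is taken as zero.
overlap_v2_m2 = shared_users(table1["video2"], table1["music2"])

# video 4 and music 1 share user 1 and user 4.
overlap_v4_m1 = shared_users(table1["video4"], table1["music1"])
```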
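【Calculating the Correlation】
As one example of the present disclosure, the first behavior feature distribution may characterize how the preferences of the plurality of users for the first domain item are distributed, and the second behavior feature distribution may characterize how the preferences of the plurality of users for the second domain item are distributed. That is, the first behavior feature distribution may include a first preference of each of the plurality of users for the first domain item, and the second behavior feature distribution may include a second preference of each of the plurality of users for the second domain item.
Thus, for a particular first domain item A and second domain item B, the correlation between A and B can be determined by calculating the similarity between the distribution of the users' preferences for A and the distribution of their preferences for B. The correlation is calculated as follows.
Step 1: Preference calculation
As described above, where the user has performed a behavior on an item, the behavior information may further include behavior data generated by that behavior. For ease of distinction, the behavior data included in the first behavior information may be called "first behavior data", and the behavior data included in the second behavior information may be called "second behavior data". Where the user has not performed a behavior on an item, the behavior information contains no behavior data, or in other words the behavior data it contains is empty.
From the first behavior data and the second behavior data of each of the plurality of users, a first preference of each user for the first domain item in the first domain and a second preference of each user for the second domain item in the second domain can be calculated.
As described above, a user's behavior data for an item may cover multiple behavior types such as click, play, and rating, and the number of times each behavior type is performed may differ. The user's overall preference for an item can therefore be taken to be the sum of the sub-preferences corresponding to each of at least some of the user's behavior types for the item, where each sub-preference may be positively correlated with both the behavior count and the behavior weight.
For example, the user's preference r for an item may be calculated using the following formula,
$$r = \sum_{t \in T} q_t \cdot W_t$$
where T is the set of at least some (preferably all) of the user's behavior types for the item, t is a behavior type, $q_t$ is the number of behaviors of type t, and $W_t$ is the behavior weight corresponding to behavior type t. The behavior types and behavior counts can be obtained from the behavior data. The weights of the different behavior types can be determined in advance by assignment, or determined in other ways, for example according to the behavior duration of the behavior type.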
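Step 2: Establishing the vectors
A first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item can be established. The number of elements in a preference vector equals the number of users, and the value of each element is the corresponding user's preference for the item.
Step 3: Calculating the similarity between vectors
The similarity between the first preference vector and the second preference vector can be calculated in a variety of ways. The resulting similarity can be used as the correlation between the corresponding first domain item and second domain item.
For example, it can be calculated using vector similarity measures such as cosine similarity, Jaccard similarity, the Pearson correlation coefficient, Euclidean distance, Manhattan distance, and Mahalanobis distance.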
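Two of the alternative measures listed above, Jaccard similarity and the Pearson correlation coefficient, can be sketched as follows. Treating a non-zero preference as set membership for the Jaccard measure is an assumption made for illustration; the text does not say how preference vectors are turned into sets.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of the sets of users with a non-zero entry in each vector."""
    users_a = {k for k, x in enumerate(a) if x}
    users_b = {k for k, y in enumerate(b) if y}
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def pearson(a, b):
    """Pearson correlation coefficient; 0.0 when either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Preference vectors of video 4 and music 1 from Table 2.
video4 = [2, 0, 0, 3, 0]
music1 = [4, 0, 0, 5, 4]
jaccard_v4_m1 = jaccard(video4, music1)
pearson_v4_m1 = pearson(video4, music1)
```

Distance measures such as Euclidean or Manhattan distance grow as items become less alike, so they would need to be converted into a similarity (for example by inversion) before being used as a correlation.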
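It should be noted that the preferences calculated for items in different domains may be on different scales. To avoid differences caused by differing scales, the first preference vector and the second preference vector obtained from the preferences calculated in step 1 can each be normalized, and the normalized preference vectors used in the correlation calculation. Common normalization methods include min-max normalization, log transformation, and z-score standardization; the normalization process is not described further here.
Calculation example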
         Video 1  Video 2  Video 3  Video 4  Music 1  Music 2  Music 3
User 1   2        0        0        2        4        5        0
User 2   5        0        4        0        0        0        1
User 3   0        0        5        0        0        2        0
User 4   0        1        0        3        5        0        4
User 5   0        0        0        0        4        2        0
Table 2
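Each number in Table 2 is the preference of the user in that row for the item in that column. From the preferences in Table 2, the preference vector of video 1 is {2, 5, 0, 0, 0}, that of video 2 is {0, 0, 0, 1, 0}, that of video 3 is {0, 4, 5, 0, 0}, that of video 4 is {2, 0, 0, 3, 0}, that of music 1 is {4, 0, 0, 5, 4}, that of music 2 is {5, 0, 2, 0, 2}, and that of music 3 is {0, 1, 0, 4, 0}.
The correlation between items in different domains can be calculated using cosine similarity. The results are shown in Table 3 below; the detailed calculation is omitted.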
          Video 1  Video 2  Video 3  Video 4
Music 1   0.20     0.66     0.00     0.84
Music 2   0.32     0.00     0.27     0.48
Music 3   0.22     0.97     0.15     0.81
Table 3
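The figures in Table 3 can be checked with a short script. The sketch below assumes plain cosine similarity on the unnormalized Table 2 vectors; the function and variable names are illustrative. The values it produces agree with Table 3 to within 0.01, the small differences being due to rounding in the second decimal place.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Preference vectors over User 1..User 5, taken from Table 2.
videos = {
    "Video 1": [2, 5, 0, 0, 0],
    "Video 2": [0, 0, 0, 1, 0],
    "Video 3": [0, 4, 5, 0, 0],
    "Video 4": [2, 0, 0, 3, 0],
}
music = {
    "Music 1": [4, 0, 0, 5, 4],
    "Music 2": [5, 0, 2, 0, 2],
    "Music 3": [0, 1, 0, 4, 0],
}

# Relevance of every (music, video) pair.
relevance = {
    (m, v): cosine(mv, vv) for m, mv in music.items() for v, vv in videos.items()
}

for m in music:
    print(m, [round(relevance[m, v], 2) for v in videos])
```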
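The above has described, taking a first domain and a second domain as an example, how the relevance between cross-domain items is determined. Note that with the relationship mining scheme of the present disclosure, the relevance between items can be mined for any two of a plurality of different domains.
In addition, the relevance between items in different domains can be determined in other ways. For example, for items in different domains, tags or keywords that characterize their attributes along multiple dimensions can be extracted. A topic model can be used to map an item to a tag according to its element attribute information, or a seq2vec approach can be used to map it to a vector. The relevance between items in different domains can then be determined by computing the similarity of their tags or vectors.
As an example of the present disclosure, keywords of items in different domains can be extracted, and the relevance between the items determined by analyzing how similar their keywords are. Specifically, one or more keywords can be extracted from a first-domain item in the first domain to generate a first keyword vector, and one or more keywords can be extracted from a second-domain item in the second domain to generate a second keyword vector. The similarity between the first keyword vector and the second keyword vector can then be computed, for example with cosine similarity, which likewise determines the relevance between the first-domain item and the second-domain item.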
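The keyword-based variant can be sketched as a cosine similarity between bag-of-keywords vectors. The disclosure does not say how keywords are extracted or how keyword vectors are encoded, so the count-based encoding, the function name, and the sample keywords below are all assumptions.

```python
import math
from collections import Counter

def keyword_cosine(keywords_a, keywords_b):
    """Cosine similarity between two bag-of-keywords (count) vectors."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical keywords for a novel and a TV series.
novel = ["fantasy", "romance", "xianxia"]
series = ["romance", "xianxia", "costume drama"]

score = keyword_cosine(novel, series)  # two of three keywords shared
print(round(score, 2))
```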
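Based on the concept of the present disclosure, the relevance between cross-domain items can also be determined in various other ways, which are not detailed here.
As an example of the present disclosure, for each of the multiple users, the user's first behavior data for one or more first-domain items and the user's second behavior data for one or more second-domain items can be acquired.
The multiple users mentioned here are preferably users who have behavior data in both the first domain and the second domain. The first domain differs from the second domain. As the earlier description of how domains are divided shows, the first and second domains may be different applications, different modules within the same application, different types of applications, or different domains divided by attribute, such as music, video, and pictures.
The first behavior data and second behavior data can be collected by a client-side log collection system. When behavior data is collected from client logs, the logs can also be cleaned to filter out invalid entries caused by abnormal users, abnormal user operations, server exceptions, and the like. The first behavior data and second behavior data mentioned here refer to the user's total behavior data in the respective domain, which may involve one or more items.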
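Log cleaning of this kind can be sketched as a simple filter. The record layout and the flag names (`abnormal_user`, `abnormal_operation`, `server_error`) are hypothetical; the disclosure does not describe how invalid entries are detected or marked.

```python
def clean_logs(records):
    """Drop log records flagged as invalid (flag names are hypothetical)."""
    invalid_flags = ("abnormal_user", "abnormal_operation", "server_error")
    return [r for r in records if not any(r.get(f) for f in invalid_flags)]

logs = [
    {"user": "u1", "item": "video1", "action": "play"},
    {"user": "u2", "item": "music3", "action": "click", "server_error": True},
    {"user": "bot", "item": "video2", "action": "click", "abnormal_user": True},
]

valid = clean_logs(logs)
print(len(valid))  # only the first record survives
```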
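Based on the first behavior data and second behavior data of the multiple users, the relationship mining method described above with reference to FIG. 1 can be used to determine the relevance between each of at least some of the first-domain items and each of at least some of the second-domain items. The specific determination process is not repeated here.
【Cross-Domain Item Recommendation】
FIG. 2 is a schematic flowchart of an item recommendation method according to an embodiment of the present disclosure.
Referring to FIG. 2, in step S210, first behavior data of a user in a first domain is acquired; the first behavior data involves one or more first-domain items.
In step S220, a second-domain item is selected from at least one second-domain item based on the relevance between at least one of the one or more first-domain items and each of the at least one second-domain item.
The user in this embodiment is a user who lacks behavior data in the second domain; that is, the user can be regarded as a new user in the second domain. Recommending second-domain items to this user faces the user cold-start problem, whereas no such problem exists for the user in the first domain.
Unlike the first domain described in the relationship mining scheme above, the first domain in this embodiment may refer generally to one or more known domains, other than the second domain, in which the user has behavior data.
In other words, when second-domain items are recommended to the user, second-domain items that are highly relevant to some or all of the items involved in the user's behavior data in one or more other domains with known behavior data can be selected from the second domain as items suitable for recommendation to the user.
The relevance between first-domain items and second-domain items may be determined in advance, for example by the method for mining relationships between items in different domains described above.
In step S230, the selected second-domain item is recommended to the user.
Thus, when content (items) in an unknown domain is recommended to a user, the user's known behavior data in other content domains can be used to find similar items across domains and map the user's interests in those domains onto the unknown domain. This addresses the user cold-start problem in the unknown domain and improves the user experience.
As an example of the present disclosure, when second-domain items are selected, a recommendation degree can be calculated for each of the at least one second-domain item, and a predetermined number of top-ranked second-domain items selected in descending order of recommendation degree.
The recommendation degree of a second-domain item may be positively correlated with the relevance between the second-domain item and each of at least some (preferably all) of the first-domain items in the item set involved in the user's first behavior data.
For example, the recommendation degree of a second-domain item may equal the sum of its sub-recommendation degrees with respect to each first-domain item involved in the user's first behavior data, where each sub-recommendation degree is positively correlated with both the relevance between the first-domain item and the second-domain item and the user's preference degree for the first-domain item.
Specifically, the recommendation degree of a second-domain item can be calculated with the following formula: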
$$ \mathrm{rec}_{uj} = \sum_{i \in I} \mathrm{sim}(i,j) \cdot r_{ui} $$
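Here, rec_uj denotes the recommendation degree of second-domain item j for user u, I is the set of first-domain items involved in the user's first behavior data in the first domain, i is a first-domain item, sim(i,j) denotes the relevance between first-domain item i and second-domain item j, and r_ui denotes user u's preference degree for first-domain item i. The preference degree can be regarded as the weight of the relevance sim(i,j) between first-domain item i and second-domain item j. For how the preference degree is calculated, see the description above; it is not repeated here.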
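The recommendation degree described here is a preference-weighted sum of relevances, which can be sketched as below. The dictionary-based data layout, the function name, and the sample numbers are assumptions for illustration only.

```python
def recommendation_degree(user_prefs, relevance, target):
    """rec_uj = sum over i in I of sim(i, j) * r_ui.

    user_prefs -- {first-domain item i: preference degree r_ui}
    relevance  -- {(i, j): sim(i, j)}; pairs that are missing count as 0
    target     -- the second-domain item j
    """
    return sum(relevance.get((i, target), 0.0) * r for i, r in user_prefs.items())

# Illustrative numbers, not taken from the disclosure.
prefs = {"A": 2.0, "B": 1.0}
sim = {("A", "X"): 0.5, ("B", "X"): 0.25, ("A", "Y"): 0.1}

rec_x = recommendation_degree(prefs, sim, "X")  # 0.5 * 2.0 + 0.25 * 1.0
print(rec_x)  # 1.25
```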
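Specific application example
The present disclosure can be applied to the recommendation of videos, music, news, applications, games, themes, and the like on electronic devices such as mobile phones, tablets, computers, televisions, smart speakers, and smart watches where the user cold-start problem exists.
FIG. 3 is a flowchart of the overall implementation of the cross-domain item relationship mining scheme and the user cold-start scheme of the present disclosure. The steps shown in FIG. 3 are as follows.
Step 1: Collect the behavior logs generated by users
A client-side log collection system can collect behavior data, such as clicks, plays, and ratings, that users perform on items in different domains on various terminals.
Step 2: Clean the logs and compute preference data
The raw logs are first cleaned to filter out invalid entries caused by abnormal users, accidental operations, server exceptions, and the like. The behavior data is then analyzed to obtain the users' preference degrees for items. The calculation of the preference degree is not repeated here.
Step 3: Compute cross-domain item relationship data from the preference data
From the preference data obtained in Step 2, the preference vectors of items in different domains over multiple users (User1 to User5 in the figure) can be obtained. As shown in FIG. 3, the preference vector of Video1 is {2,5,0,0,0}, that of Video2 is {0,0,0,1,0}, that of Video3 is {0,4,5,0,0}, that of Video4 is {2,0,0,3,0}, that of Music1 is {4,0,0,5,4}, that of Music2 is {5,0,2,0,2}, and that of Music3 is {0,1,0,4,0}.
Cosine similarity can be used to compute the similarity between the preference vectors of items in different domains. This similarity serves as the relevance between those items and thus yields the relationship data between items in different domains.
Step 4: Compute the recommendation degrees of items
User5 has no behavior data in the video domain (Video) and can therefore be regarded as a new user in that domain; recommending videos to User5 faces the cold-start problem.
The recommendation degrees of the different videos for User5 can be computed from User5's behavior data in the music domain and the predetermined cross-domain item relationship data between the video domain and the music domain.
Specifically, the recommendation degrees of the different videos for User5 can be calculated with the following formula.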
$$ \mathrm{rec}_{uj} = \sum_{i \in I} \mathrm{sim}(i,j) \cdot r_{ui} $$
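Here, rec_uj denotes the recommendation degree of video j for User5, and I is the set of music items involved in User5's behavior data in the music domain, namely {Music1, Music2}. sim(i,j) denotes the relevance between video j and music i, and r_ui denotes User5's preference degree for music i.
As shown in FIG. 3, the expanded formula for the recommendation value (i.e., the recommendation degree) of Video1 is (similarity_m1v1)·(value_m1)+(similarity_m2v1)·(value_m2)+(similarity_m3v1)·(value_m3), where similarity_m1v1 denotes the relevance between Music1 and Video1 and value_m1 denotes User5's preference degree for Music1; similarity_m2v1 denotes the relevance between Music2 and Video1 and value_m2 denotes User5's preference degree for Music2; and similarity_m3v1 denotes the relevance between Music3 and Video1 and value_m3 denotes User5's preference degree for Music3. Because User5 has no behavior data for Music3, value_m3 is 0 and the third term contributes nothing.
With this calculation, the recommendation degrees of the videos for User5 are: 1.4 for Video1, 2.4 for Video2, 0.5 for Video3, and 4.2 for Video4.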
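Step 5: Rank by recommendation degree and select items to recommend
As shown in FIG. 3, the items can be sorted in descending order of recommendation degree, and the top-ranked items presented to the user as a recommendation list. For example, the top-ranked Video4 and Video2 can be recommended to User5.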
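Steps 3 to 5 can be traced end to end with a short sketch. It assumes cosine similarity on the unnormalized preference vectors listed in Step 3 and a list length of two; the names are illustrative. Because the similarities are not rounded before use, the computed degrees differ slightly from the one-decimal figures quoted in Step 4, but the resulting order, with Video4 and Video2 at the top, is the same.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Preference vectors over User1..User5 as listed in Step 3.
videos = {
    "Video1": [2, 5, 0, 0, 0],
    "Video2": [0, 0, 0, 1, 0],
    "Video3": [0, 4, 5, 0, 0],
    "Video4": [2, 0, 0, 3, 0],
}
music = {
    "Music1": [4, 0, 0, 5, 4],
    "Music2": [5, 0, 2, 0, 2],
    "Music3": [0, 1, 0, 4, 0],
}
USER5 = 4  # zero-based position of User5 in every vector

# r_ui: User5's non-zero preferences in the music domain (the set I).
prefs = {m: vec[USER5] for m, vec in music.items() if vec[USER5] > 0}

# rec_uj = sum over i in I of sim(i, j) * r_ui, for every video j.
rec = {
    v: sum(cosine(music[m], vv) * r for m, r in prefs.items())
    for v, vv in videos.items()
}

TOP_N = 2  # hypothetical length of the recommendation list
recommended = sorted(rec, key=rec.get, reverse=True)[:TOP_N]
print(recommended)  # ['Video4', 'Video2']
```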
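Thus, a new user who lacks behavior data in the target domain, and who would otherwise see only non-personalized information, can see personalized recommendations with the present disclosure. As shown in FIG. 4A and FIG. 4B, although the user has never used the video center, because the user has read the novel 《三生三世十里桃花》, the user sees the recommended TV series 《三生三世十里桃花》 after opening the "猜你喜欢" ("Guess You Like") module of the video application.
In summary, the present disclosure can use data from other domains to make up for insufficient user behavior data in the target domain, addressing the user cold-start problem in recommender systems and improving the user experience.
The method for mining relationships between items in different domains and the item recommendation method of the present disclosure have been described in detail above with reference to FIG. 1 to FIG. 3. The apparatus for mining relationships between items in different domains, the item recommendation apparatus, and the computing device of the present disclosure are described below with reference to FIG. 5 to FIG. 8.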
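【Relationship Mining Apparatus】
FIG. 5 is a schematic block diagram of the structure of an apparatus of the present disclosure for mining relationships between items in different domains. Details are the same as described above with reference to FIG. 1 and are not repeated here.
Referring to FIG. 5, the relationship mining apparatus 300 may include a behavior information acquisition module 310 and a relevance determination module 320.
The behavior information acquisition module 310 may be configured to acquire a user's first behavior information for a first-domain item and the user's second behavior information for a second-domain item.
The relevance determination module 320 may determine the relevance between the first-domain item and the second-domain item based on the first behavior information and second behavior information of multiple users.
As shown in FIG. 5, the relevance determination module 320 may optionally include a first behavior feature distribution determination unit 321, a second behavior feature distribution determination unit 323, and a relevance determination unit 325, shown in dashed boxes in the figure.
The first behavior feature distribution determination unit 321 may determine, based on the first behavior information of the multiple users, the first behavior feature distribution of the first-domain item with respect to the multiple users.
The second behavior feature distribution determination unit 323 may determine, based on the second behavior information of the multiple users, the second behavior feature distribution of the second-domain item with respect to the multiple users.
The relevance determination unit 325 may determine the relevance between the first-domain item and the second-domain item according to the degree of similarity between the first behavior feature distribution of the first-domain item and the second behavior feature distribution of the second-domain item.
The first behavior feature distribution and/or the second behavior feature distribution may include one or more of the following: whether a user has performed a behavior on the item, the number of behaviors a user has performed on the item, and a user's preference degree for the item.
As an example, the first behavior feature distribution may include each user's first preference degree for the first-domain item, and the second behavior feature distribution may include each user's second preference degree for the second-domain item.
A user's preference degree for an item may equal the sum of the sub-preference degrees of some or all of the behavior types the user performed on the item, where each sub-preference degree is positively correlated with both the behavior count and the behavior weight. For example, the first behavior feature distribution determination unit 321 and/or the second behavior feature distribution determination unit 323 may calculate the user's preference degree r for an item with the following formula: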
$$ r = \sum_{t \in T} q_t \cdot W_t $$
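Here, T is the set of all behavior types the user performed on the item, t is a behavior type, q_t is the number of behaviors of type t, and W_t is the behavior weight of type t.
The relevance determination unit 325 may include a vector building unit 3251 and a relevance calculation unit 3253.
The vector building unit 3251 is configured to build a first preference vector of the multiple users for the first-domain item and a second preference vector of the multiple users for the second-domain item.
The relevance calculation unit 3253 may determine the relevance between the first-domain item and the second-domain item by computing the similarity between the first preference vector and the second preference vector.
As shown in FIG. 5, the relationship mining apparatus 300 may optionally also include a normalization module 330, shown in a dashed box in the figure. The normalization module 330 may normalize the first preference vector and the second preference vector separately, and the relevance calculation unit 3253 may compute the similarity between the normalized first and second preference vectors as the relevance between the first-domain item and the second-domain item.
FIG. 6 shows a schematic block diagram of the structure of an apparatus of the present disclosure for mining relationships between items in different domains. Details are the same as described above with reference to FIG. 1 and are not repeated here.
Referring to FIG. 6, the relationship mining apparatus 600 may include a behavior data acquisition module 610 and a relevance determination module 620.
The behavior data acquisition module 610 is configured to acquire, for each of multiple users, the user's first behavior data for one or more first-domain items in the first domain and the user's second behavior data for one or more second-domain items in the second domain.
The first behavior data and second behavior data may include one or more of the following: behavior type, behavior count, and behavior duration.
The relevance determination module 620 is configured to determine, based on the first behavior data and second behavior data of the multiple users, the relevance between each of at least some of the first-domain items and each of at least some of the second-domain items. For the specific way of determining the relevance between a first-domain item and a second-domain item, see the description above; it is not repeated here.
【Item Recommendation Apparatus】
FIG. 7 is a schematic block diagram of the structure of the item recommendation apparatus of the present disclosure. Details are the same as described above with reference to FIG. 2 and are not repeated here.
Referring to FIG. 7, the item recommendation apparatus 400 may include a first behavior data acquisition module 410, an item selection module 420, and an item recommendation module 430.
The first behavior data acquisition module 410 may acquire first behavior data of a user in a first domain; the first behavior data involves one or more first-domain items.
The item selection module 420 may select a second-domain item from at least one second-domain item based on the relevance between at least one of the one or more first-domain items and each of the at least one second-domain item. The relevance between the first-domain item and the second-domain item may be obtained with the relationship mining method described above.
The item recommendation module 430 may be configured to recommend the selected second-domain item to the user.
As shown in FIG. 7, the item selection module 420 may optionally include a recommendation degree calculation unit 421 and an item selection unit 423, shown in dashed boxes in the figure.
The recommendation degree calculation unit 421 may be configured to calculate the recommendation degree of each second-domain item. The item selection unit 423 may select a predetermined number of top-ranked second-domain items in descending order of recommendation degree.
The recommendation degree of each second-domain item may be positively correlated with the relevance between that second-domain item and each first-domain item involved in the user's first behavior data.
As an example, the recommendation degree of a second-domain item equals the sum of its sub-recommendation degrees with respect to each of the at least one first-domain item, where each sub-recommendation degree is positively correlated with both the relevance between the first-domain item and the second-domain item and the user's preference degree for the first-domain item.
For example, the recommendation degree calculation unit 421 may calculate the recommendation degree of a second-domain item with the following formula: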
$$ \mathrm{rec}_{uj} = \sum_{i \in I} \mathrm{sim}(i,j) \cdot r_{ui} $$
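Here, rec_uj denotes the recommendation degree of second-domain item j for user u, I is the set of first-domain items involved in the user's first behavior data in the first domain, i is a first-domain item, sim(i,j) denotes the relevance between first-domain item i and second-domain item j, and r_ui denotes user u's preference degree for first-domain item i.
【Computing Device】
The present disclosure also provides a computing device that can be used to perform the relationship mining method and the item recommendation method of the present disclosure.
FIG. 8 is a schematic block diagram of a computing device that can be used to perform the method for mining relationships between items in different domains and the item recommendation method of the present disclosure.
As shown in FIG. 8, the computing device 500 may include a processor 510 and a memory 530. Executable code is stored in the memory 530. When the processor 510 executes the executable code, the processor 510 is caused to perform the relationship mining method and the item recommendation method described above.
The method, apparatus, and computing device for mining relationships between items and recommending items according to the present invention have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the present invention may also be implemented as a computer program or computer program product that includes computer program code instructions for performing the steps defined in the above method of the present invention.
Alternatively, the present invention may be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored. When the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or computing device, server, etc.), the processor is caused to perform the steps of the above method according to the present invention.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions, and operations of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Embodiments of the present invention have been described above. The description is illustrative rather than exhaustive, and it is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.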

Claims (32)

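  1. A method for mining relationships between items in different domains, characterized by comprising:
    acquiring first behavior information of a user for a first-domain item and second behavior information of the user for a second-domain item;
    determining a relevance between the first-domain item and the second-domain item based on the first behavior information and the second behavior information of a plurality of users.
  2. The relationship mining method according to claim 1, characterized in that the step of determining the relevance between the first-domain item and the second-domain item comprises:
    determining, based on the first behavior information of the plurality of users, a first behavior feature distribution of the first-domain item with respect to the plurality of users;
    determining, based on the second behavior information of the plurality of users, a second behavior feature distribution of the second-domain item with respect to the plurality of users;
    determining the relevance between the first-domain item and the second-domain item according to the degree of similarity between the first behavior feature distribution and the second behavior feature distribution.
  3. The relationship mining method according to claim 2, characterized in that the first behavior feature distribution and/or the second behavior feature distribution comprises one or more of the following:
    whether a user has performed a behavior on an item;
    the number of behaviors a user has performed on an item;
    a user's preference degree for an item.
  4. The relationship mining method according to claim 1, characterized in that the first behavior information and/or the second behavior information comprises:
    whether a user has performed a behavior on an item; and/or
    behavior data generated by a behavior that a user has performed on an item.
  5. The relationship mining method according to claim 4, characterized in that the behavior data comprises one or more of the following:
    behavior type;
    behavior count;
    behavior duration.
  6. The relationship mining method according to claim 2, characterized in that the first behavior feature distribution comprises a first preference degree of each of the plurality of users for the first-domain item, the second behavior feature distribution comprises a second preference degree of each of the plurality of users for the second-domain item, and the step of determining the relevance between the first-domain item and the second-domain item comprises:
    building a first preference vector of the plurality of users for the first-domain item and a second preference vector of the plurality of users for the second-domain item;
    determining the relevance between the first-domain item and the second-domain item by computing the similarity between the first preference vector and the second preference vector.
  7. The relationship mining method according to claim 6, characterized in that
    a user's preference degree for an item equals the sum of the sub-preference degrees corresponding to each of at least some of the behavior types of the user for the item, wherein each sub-preference degree is positively correlated with both the behavior count and the behavior weight.
  8. The relationship mining method according to claim 6, characterized by further comprising:
    normalizing the first preference vector and the second preference vector separately.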
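  9. A method for mining relationships between items in different domains, characterized by comprising:
    for each of a plurality of users, acquiring first behavior data of the user for one or more first-domain items and second behavior data of the user for one or more second-domain items;
    determining, based on the first behavior data and the second behavior data of the plurality of users, a relevance between each of at least some of the first-domain items and each of at least some of the second-domain items.
  10. An item recommendation method, characterized by comprising:
    acquiring first behavior data of a user in a first domain, the first behavior data involving one or more first-domain items;
    selecting a second-domain item from at least one second-domain item based on the relevance between at least one of the one or more first-domain items and each of the at least one second-domain item; and
    recommending the selected second-domain item to the user.
  11. The item recommendation method according to claim 10, characterized in that the relevance between the first-domain item and the second-domain item is obtained with the relationship mining method according to any one of claims 1 to 9.
  12. The item recommendation method according to claim 10, characterized in that the step of selecting a second-domain item from the at least one second-domain item comprises:
    calculating a recommendation degree of each second-domain item;
    selecting a predetermined number of top-ranked second-domain items in descending order of recommendation degree.
  13. The item recommendation method according to claim 12, characterized in that
    the recommendation degree of the second-domain item is positively correlated with the relevance between each of the at least one first-domain item and the second-domain item.
  14. The item recommendation method according to claim 13, characterized in that
    the recommendation degree of the second-domain item equals the sum of the sub-recommendation degrees of the second-domain item with respect to each of the at least one first-domain item, wherein each sub-recommendation degree is positively correlated with both the relevance between the first-domain item and the second-domain item and the user's preference degree for the first-domain item.
  15. The item recommendation method according to claim 14, characterized in that
    the user's preference degree for the first-domain item equals the sum of the sub-preference degrees corresponding to each of at least some of the behavior types of the user for the first-domain item, wherein each sub-preference degree is positively correlated with both the behavior count and the behavior weight.
  16. The item recommendation method according to claim 10, characterized in that the first behavior data comprises one or more of the following items of information about a behavior the user has performed on a first-domain item:
    behavior type;
    behavior count;
    behavior duration.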
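  17. An apparatus for mining relationships between items in different domains, characterized by comprising:
    a behavior information acquisition module, configured to acquire first behavior information of a user for a first-domain item and second behavior information of the user for a second-domain item;
    a relevance determination module, configured to determine a relevance between the first-domain item and the second-domain item based on the first behavior information and the second behavior information of a plurality of users.
  18. The relationship mining apparatus according to claim 17, characterized in that the relevance determination module comprises:
    a first behavior feature distribution determination unit, configured to determine, based on the first behavior information of the plurality of users, a first behavior feature distribution of the first-domain item with respect to the plurality of users;
    a second behavior feature distribution determination unit, configured to determine, based on the second behavior information of the plurality of users, a second behavior feature distribution of the second-domain item with respect to the plurality of users;
    a relevance determination unit, configured to determine the relevance between the first-domain item and the second-domain item according to the degree of similarity between the first behavior feature distribution and the second behavior feature distribution.
  19. The relationship mining apparatus according to claim 18, characterized in that the first behavior feature distribution and/or the second behavior feature distribution comprises one or more of the following:
    whether a user has performed a behavior on an item;
    the number of behaviors a user has performed on an item;
    a user's preference degree for an item.
  20. The relationship mining apparatus according to claim 17, characterized in that the first behavior information and/or the second behavior information comprises:
    whether a user has performed a behavior on an item; and/or
    behavior data generated by a behavior that a user has performed on an item.
  21. The relationship mining apparatus according to claim 20, characterized in that the behavior data comprises one or more of the following:
    behavior type;
    behavior count;
    behavior duration.
  22. The relationship mining apparatus according to claim 18, characterized in that the first behavior feature distribution comprises a first preference degree of each of the plurality of users for the first-domain item, the second behavior feature distribution comprises a second preference degree of each of the plurality of users for the second-domain item, and the relevance determination unit comprises:
    a vector building unit, configured to build a first preference vector of the plurality of users for the first-domain item and a second preference vector of the plurality of users for the second-domain item; and
    a relevance calculation unit, configured to determine the relevance between the first-domain item and the second-domain item by computing the similarity between the first preference vector and the second preference vector.
  23. The relationship mining apparatus according to claim 22, characterized in that
    a user's preference degree for an item equals the sum of the sub-preference degrees corresponding to each of at least some of the behavior types of the user for the item, wherein each sub-preference degree is positively correlated with both the behavior count and the behavior weight.
  24. The relationship mining apparatus according to claim 22, characterized by further comprising:
    a normalization module, configured to normalize the first preference vector and the second preference vector separately.
  25. An apparatus for mining relationships between items in different domains, characterized by comprising:
    a behavior data acquisition module, configured to acquire, for each of a plurality of users, first behavior data of the user for one or more first-domain items and second behavior data of the user for one or more second-domain items;
    a relevance determination module, configured to determine, based on the first behavior data and the second behavior data of the plurality of users, a relevance between each of at least some of the first-domain items and each of at least some of the second-domain items.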
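  26. An item recommendation apparatus, characterized by comprising:
    a first behavior data acquisition module, configured to acquire first behavior data of a user in a first domain, the first behavior data involving one or more first-domain items;
    an item selection module, configured to select a second-domain item from at least one second-domain item based on the relevance between at least one of the one or more first-domain items and each of the at least one second-domain item; and
    an item recommendation module, configured to recommend the selected second-domain item to the user.
  27. The item recommendation apparatus according to claim 26, characterized in that the relevance between the first-domain item and the second-domain item is obtained with the relationship mining method according to any one of claims 1 to 9.
  28. The item recommendation apparatus according to claim 26, characterized in that the item selection module comprises:
    a recommendation degree calculation unit, configured to calculate a recommendation degree of each second-domain item;
    an item selection unit, configured to select a predetermined number of top-ranked second-domain items in descending order of recommendation degree.
  29. The item recommendation apparatus according to claim 28, characterized in that
    the recommendation degree of the second-domain item is positively correlated with the relevance between each of the at least one first-domain item and the second-domain item.
  30. The item recommendation apparatus according to claim 29, characterized in that
    the recommendation degree of the second-domain item equals the sum of the sub-recommendation degrees of the second-domain item with respect to each of the at least one first-domain item, wherein each sub-recommendation degree is positively correlated with both the relevance between the first-domain item and the second-domain item and the user's preference degree for the first-domain item.
  31. A computing device, comprising:
    a processor; and
    a memory on which executable code is stored, wherein the executable code, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 16.
  32. A non-transitory machine-readable storage medium on which executable code is stored, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform the method according to any one of claims 1 to 16.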
PCT/CN2019/071570 2018-01-17 2019-01-14 物品间关系挖掘及推荐方法、装置、计算设备、存储介质 WO2019141143A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810046319.6A CN110110206B (zh) 2018-01-17 2018-01-17 物品间关系挖掘及推荐方法、装置、计算设备、存储介质
CN201810046319.6 2018-01-17

Publications (1)

Publication Number Publication Date
WO2019141143A1 true WO2019141143A1 (zh) 2019-07-25

Family

ID=67301960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/071570 WO2019141143A1 (zh) 2018-01-17 2019-01-14 物品间关系挖掘及推荐方法、装置、计算设备、存储介质

Country Status (3)

Country Link
CN (1) CN110110206B (zh)
TW (1) TW201933231A (zh)
WO (1) WO2019141143A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750004A (zh) * 2019-10-31 2021-05-04 深圳云天励飞技术有限公司 一种跨领域商品的冷启动推荐方法、装置和电子设备
CN111241394B (zh) * 2020-01-07 2023-09-22 腾讯科技(深圳)有限公司 数据处理方法、装置、计算机可读存储介质及电子设备
CN112035753B (zh) * 2020-11-02 2021-03-02 北京每日优鲜电子商务有限公司 推荐页面生成方法、装置、电子设备和计算机可读介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156721A (zh) * 2011-03-29 2011-08-17 张栋 基于标签的互联网视频广告精准投放方法
CN103309967A (zh) * 2013-06-05 2013-09-18 清华大学 基于相似性传递的协同过滤方法及系统
CN103744966A (zh) * 2014-01-07 2014-04-23 Tcl集团股份有限公司 一种物品推荐方法、装置
CN104598643A (zh) * 2015-02-13 2015-05-06 成都品果科技有限公司 一种物品相似度贡献系数、相似度获取方法及物品推荐方法及其系统
CN106296270A (zh) * 2016-07-29 2017-01-04 北京小米移动软件有限公司 商品推荐方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573108A (zh) * 2015-01-30 2015-04-29 联想(北京)有限公司 信息处理方法和信息处理装置
CN105809479A (zh) * 2016-03-07 2016-07-27 海信集团有限公司 物品推荐方法及装置
CN106651542B (zh) * 2016-12-31 2021-06-25 珠海市魅族科技有限公司 一种物品推荐的方法及装置

Also Published As

Publication number Publication date
CN110110206A (zh) 2019-08-09
TW201933231A (zh) 2019-08-16
CN110110206B (zh) 2023-10-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19740920

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19740920

Country of ref document: EP

Kind code of ref document: A1