CN110110206B

CN110110206B - Method, device, computing equipment and storage medium for mining and recommending relationships among articles

Info

Publication number: CN110110206B
Application number: CN201810046319.6A
Authority: CN
Inventors: 王智楠; 肖文明; 王骏
Original assignee: Banma Zhixing Network Hongkong Co Ltd
Current assignee: Banma Zhixing Network Hongkong Co Ltd
Priority date: 2018-01-17
Filing date: 2018-01-17
Publication date: 2023-10-31
Anticipated expiration: 2038-01-17
Also published as: CN110110206A; WO2019141143A1; TW201933231A

Abstract

The disclosure provides a method, a device, a computing device and a storage medium for mining and recommending relationships among items. First behavior information of a user for an item in a first domain and second behavior information of the user for an item in a second domain are acquired; a degree of correlation between the first domain item and the second domain item is determined based on the first behavior information and the second behavior information of a plurality of users. Thus, when a user has no behavior data in a target domain, the items involved in the user's behavior data in other domains can be mapped to similar items in the target domain based on the cross-domain correlation among items, so that the user cold start problem can be solved and the user experience improved.

Description

Method, device, computing equipment and storage medium for mining and recommending relationships among articles

Technical Field

The disclosure relates to the technical field of information recommendation, and in particular to information recommendation under a cold start condition of a user.

Background

With the development of internet information technology, people have gradually moved from an age of information scarcity to an age of information overload. Today, a single mobile phone is enough to watch movies, television and live broadcasts, and to browse news from all over the world. However, it is very difficult to find the parts that match one's own interests in this mass of information.

Information recommendation technology can effectively address the problem of information overload. Current information recommendation technology mainly uses a user's personal information and historical behaviors, matches the user with information through an algorithm, and displays information the user may be interested in, thereby reducing the user's selection cost. In addition, information recommendation techniques allow information providers to present their information more efficiently.

Information recommendation technology is now widely used in daily life. For example, whether reading news on Toutiao, shopping on Taobao, or browsing movie information on Douban, users see information recommended by personalized recommendation techniques. In China, information recommendation has even become a standard feature of e-commerce and content distribution applications. A large number of companies specializing in personalized recommendation algorithm services have gradually appeared on the market, so that even smaller applications can provide personalized recommendation services to their users simply by connecting their data.

Information recommendation techniques primarily use data such as user behavior, user tags, context, and social networks to predict a user's future behavior; among these, user behavior data is the most effective and most commonly used. How to design a personalized recommendation system without a large amount of user data is the cold start problem.

Cold starts are mainly divided into three categories:

1. User cold start. For a new user, because historical behavior data is lacking, the user's interests cannot be predicted, and thus an accurate personalized recommendation service cannot be provided.

2. Item cold start. When a new item appears, how to recommend it to users who may be interested in it.

3. System cold start. How to design a personalized recommendation system for a newly developed website or application, so that users can experience personalized recommendation services as soon as the website or application is released.

Among them, user cold start is the most common. Traditionally, such users are given non-personalized recommendations, and personalized recommendation services are provided only after enough user behavior data has been collected, which undoubtedly degrades the user experience.

Therefore, a solution to the problem of cold start by the user is needed.

Disclosure of Invention

The present disclosure is directed primarily to a solution to the problem of cold start by a user.

According to a first aspect of the present disclosure, there is provided a method for mining a relationship between items in different fields, including: acquiring first behavior information of a user aiming at an article in a first field and second behavior information of the user aiming at an article in a second field; and determining the correlation degree between the first field object and the second field object based on the first behavior information and the second behavior information of a plurality of users.

Preferably, the step of determining the degree of correlation between the first field of articles and the second field of articles may comprise: determining a first behavioral characteristic distribution of the first domain item relative to the plurality of users based on the first behavioral information of the plurality of users; determining a second behavioral characteristic distribution of the second domain item relative to the plurality of users based on second behavioral information of the plurality of users; and determining the correlation degree between the first field of articles and the second field of articles according to the similarity degree of the first behavior characteristic distribution and the second behavior characteristic distribution.

Preferably, the first behavioral characteristic distribution and/or the second behavioral characteristic distribution may comprise one or more of the following: whether the user performed a behavior on the item; the number of actions of the user on the article; preference of the user for the item.

Preferably, the first behavior information and/or the second behavior information comprises: whether the user performed a behavior on the item; and/or behavioral data generated based on a user's performed behavior on the item.

Preferably, the behavioral data may include one or more of the following: behavior type; the number of behaviors; duration of the behavior.

Preferably, the first behavioral profile includes a first preference of each of the plurality of users for the first domain item, the second behavioral profile includes a second preference of each of the plurality of users for the second domain item, and the step of determining a correlation between the first domain item and the second domain item includes: establishing a first preference vector of the plurality of users for the first domain item and a second preference vector of the second domain item respectively; and determining the correlation between the first field of articles and the second field of articles by calculating the similarity between the first preference vector and the second preference vector.

Preferably, the preference of the user to the item is equal to the sum of sub-preference degrees corresponding to each of at least part of the behavior types of the item by the user, wherein the sub-preference degrees are positively correlated with the behavior times and the behavior weights, respectively.

Preferably, the user's preference r for an item can be determined using the following formula,

where T is the set of behavior types of the user for the item, t is a behavior type, q_t is the number of behaviors under behavior type t, and W_t is the behavior weight corresponding to behavior type t.
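One form consistent with the definitions above is a weighted sum over behavior types. Taking each sub-preference to be the product of the behavior count and the behavior weight is an assumption made here for illustration; the text only requires that the sub-preference be positively correlated with both.

```latex
r = \sum_{t \in T} q_t \, W_t
```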

Preferably, the relation mining method may further include: normalizing the first preference vector and the second preference vector, respectively.

According to a second aspect of the present disclosure, there is also provided a method for mining a relationship between items in different fields, including: for each user of a plurality of users, respectively acquiring first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; a degree of correlation between each of the at least some first domain items and each of the at least some second domain items is determined based on the first behavioral data and the second behavioral data of the plurality of users.

According to a third aspect of the present disclosure, there is also provided an item recommendation method, including: acquiring first behavior data of a user in a first field, wherein the first behavior data relates to one or more first field objects; selecting a second domain item from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively; and recommending the selected second-area item to the user.

Preferably, the correlation between the first domain item and the second domain item may be obtained by using the relation mining method described in the first aspect or the second aspect of the present disclosure.

Preferably, the step of selecting the second domain item from the at least one second domain item may comprise: calculating the recommendation degree of each second-field object; and selecting a preset number of second field objects which are ranked at the top according to the order of the recommendation degree from the high degree to the low degree.

Preferably, the recommendation level of the second-field item is positively correlated with the correlation level of each of the at least one first-field item and the second-field item, respectively.

Preferably, the recommendation level of the second domain item is equal to a sum of sub-recommendation levels of the second domain item for each first domain item in the at least one, the sub-recommendation levels being positively correlated with the correlation of the first domain item with the second domain item and the preference of the user for the first domain item, respectively.

Preferably, the recommendation level for the second field of items may be calculated using the following formula,

where rec_uj represents the recommendation degree of second domain item j for user u, I is the set of first domain items involved in the first behavior data of the user in the first domain, i is a first domain item in I, sim(i, j) represents the correlation between first domain item i and second domain item j, and r_ui represents the preference of user u for first domain item i.
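The scoring and top-N selection described in this aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each sub-recommendation degree is the product sim(i, j) * r_ui, and the function name `recommend`, the dictionary layout, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def recommend(
    user_prefs: Dict[str, float],
    sim: Dict[Tuple[str, str], float],
    candidates: List[str],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Rank second-domain candidates j by rec_uj = sum over i of sim(i, j) * r_ui.

    user_prefs maps each first-domain item i to the user's preference r_ui.
    sim maps (i, j) pairs to a precomputed correlation; missing pairs count as 0.
    The product form of each term is an assumption made for illustration.
    """
    scores = {
        j: sum(sim.get((i, j), 0.0) * r_ui for i, r_ui in user_prefs.items())
        for j in candidates
    }
    # Sort from highest to lowest recommendation degree and keep the top N.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical data: a user with video history only, and a small correlation table.
user_prefs = {"video 1": 2.0, "video 4": 3.0}
sim = {
    ("video 1", "music 1"): 0.5,
    ("video 4", "music 1"): 1.0,
    ("video 1", "music 2"): 0.25,
    ("video 4", "music 3"): 0.5,
}
result = recommend(user_prefs, sim, ["music 1", "music 2", "music 3"], top_n=2)
print(result)
```

With these made-up numbers, music 1 scores 0.5 * 2 + 1.0 * 3 = 4.0, music 3 scores 1.5, and music 2 scores 0.5, so music 1 and music 3 are selected.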

Preferably, the preference of the user to the first domain item is equal to the sum of sub-preference degrees corresponding to each of at least part of the behavior types of the first domain item, wherein the sub-preference degrees are positively related to the behavior times and the behavior weights respectively.

Preferably, the first behavior data comprises one or more of the following information of a behavior performed by the user on the first field of items: behavior type; the number of behaviors; duration of the behavior.

According to a fourth aspect of the present disclosure, there is also provided a device for mining relationships between items in different domains, comprising: a behavior information acquisition module for acquiring first behavior information of a user for an item in a first domain and second behavior information of the user for an item in a second domain; and a relevance determining module for determining the correlation between the first domain item and the second domain item based on the first behavior information and the second behavior information of a plurality of users.

Preferably, the correlation determination module may include: a first behavior feature distribution determining unit configured to determine a first behavior feature distribution of the first-domain article with respect to the plurality of users based on first behavior information of the plurality of users; a second behavior feature distribution determining unit configured to determine a second behavior feature distribution of the second-domain article with respect to the plurality of users based on second behavior information of the plurality of users; and the correlation determining unit is used for determining the correlation between the first field object and the second field object according to the similarity degree of the first behavior characteristic distribution and the second behavior characteristic distribution.

Preferably, the first behavior feature distribution includes a first preference of each of the plurality of users for the first domain item, the second behavior feature distribution includes a second preference of each of the plurality of users for the second domain item, and the relevance determining unit includes: a vector establishing unit, configured to establish a first preference vector of the plurality of users for the first field of articles and a second preference vector of the second field of articles, respectively; and a correlation calculation unit for determining a correlation between the first-domain article and the second-domain article by calculating a similarity between the first preference vector and the second preference vector.

Preferably, the first behavior feature distribution determining unit and/or the second behavior feature distribution determining unit may determine the preference r of the user for the item using the following formula,

Preferably, the relation-mining device may further include: a normalization processing module for normalizing the first preference vector and the second preference vector, respectively.

According to a fifth aspect of the present disclosure, there is also provided a device for mining relationships between items in different domains, including: a behavior data acquisition module for acquiring, for each of a plurality of users, first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items; and a relevance determining module for determining a correlation between each of at least some of the first domain items and each of at least some of the second domain items based on the first behavior data and the second behavior data of the plurality of users.

According to a sixth aspect of the present disclosure, there is also provided an item recommendation device, including: a first behavior data acquisition module for acquiring first behavior data of a user in a first domain, the first behavior data relating to one or more first domain items; an item selection module for selecting a second domain item from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively; and the article recommending module is used for recommending the selected articles in the second field to the user.

Preferably, the article selection module may include: a recommendation degree calculating unit for calculating recommendation degree of each second field article; and the article selecting unit is used for selecting a preset number of articles in the second field, which are ranked at the front, according to the sequence from the high recommendation degree to the low recommendation degree.

Preferably, the recommendation level of each of the second domain items is equal to a sum of sub-recommendation levels of the second domain items for each of the at least one first domain item, the sub-recommendation levels being positively correlated with the relevance of the first domain item to the second domain item and the preference of the user for the first domain item, respectively.

Preferably, the recommendation degree calculating unit may calculate the recommendation degree of the second-field item using the following formula,

where rec_uj represents the recommendation degree of second domain item j for user u, I is the set of first domain items involved in the first behavior data of the user in the first domain, i is a first domain item in I, sim(i, j) represents the correlation between first domain item i and second domain item j, and r_ui represents the preference of user u for first domain item i.

According to a seventh aspect of the present disclosure, there is also provided a computing device, comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor causes the processor to perform the method recited in the first aspect or the second aspect of the present disclosure.

According to an eighth aspect of the present disclosure there is also provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method recited in the first or second aspect of the present disclosure.

With the scheme for mining relationships between items in different domains, the correlation between cross-domain items can be determined. When a user has no behavior data in the target domain, the items involved in the user's known behavior data in other domains can be mapped to similar items in the target domain based on the cross-domain correlation, so that the user cold start problem can be solved and the user experience improved.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout exemplary embodiments of the disclosure.

FIG. 1 is a schematic flow chart diagram illustrating a method of mining relationships between items in different areas according to an embodiment of the present disclosure.

Fig. 2 is a schematic flow chart diagram illustrating an item recommendation method according to an embodiment of the present disclosure.

FIG. 3 is a flow chart illustrating an overall implementation of the cross-domain inter-item relationship mining scheme and user cold start scheme of the present disclosure.

Fig. 4A, 4B illustrate one application schematic for implementing cross-domain recommendations using the present disclosure.

Fig. 5 is a schematic block diagram showing the structure of a device for mining relationships between items in different domains according to an embodiment of the present disclosure.

Fig. 6 is a schematic block diagram showing the structure of a device for mining relationships between items in different domains according to another embodiment of the present disclosure.

Fig. 7 is a schematic block diagram showing the structure of the article recommendation device of the present disclosure.

FIG. 8 is a schematic block diagram of a computing device that may be used to perform the relationship mining method and the item recommendation method between items of different fields of the present disclosure.

Detailed Description

Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[ SUMMARY ]

The present disclosure mainly proposes a solution to the user cold start problem in information recommendation. Its core idea is to use users as the link, and to pre-establish associations (i.e., the correlation degree described below) between items in different domains by collecting the behavior data of a plurality of users for items in those domains.

For a user without behavior data in the target domain, when recommending target-domain items, items that are highly correlated with the items the user has browsed in other domains can be selected from the target domain, based on the pre-established associations among items in different domains and the user's behavior data for items in the other domains. In this way, the user cold start problem can be solved.

The items mentioned in this disclosure mainly refer to items displayed on the internet, which may be virtual items such as news, pictures, videos, or music, or physical items. For example, taking a shopping platform as the domain, the items in the domain may be virtual items offered for sale on the platform, such as game coins and in-game props or gift packs, or physical items sold by merchants, such as clothing and digital products.

The field to which the disclosure refers may have various divisions.

As an example of the present disclosure, different applications may be considered different domains, and different modules in the same application may also be considered different domains. Taking a news application (such as Toutiao) as an example, different channels in the application, such as the video, social, entertainment, financial and fashion channels, can be considered different domains. Taking a shopping platform (such as JD.com) as an example, product categories in the mall such as complete computers, office consumables, electrical equipment, mobile phones and digital products, supermarket goods, and department-store goods can be regarded as different domains.

As another example of the present disclosure, different domains may also be divided according to the attributes of the items; for example, domains such as music, video, pictures, and novels may be divided according to the format of the item. The divided domains can be divided further. For example, the video domain can be further divided into multiple domains such as romance dramas, anti-Japanese war dramas, and costume dramas according to the tags carried by the videos.

As yet another example of the present disclosure, different types of applications may also be considered different domains. For example, social communication applications (e.g., WeChat, QQ) and news reading applications (e.g., Toutiao, Neihan Duanzi, Phoenix News) may be considered different domains.

In addition, there may be other various field division modes, which are not described herein.

Various aspects related to the technical solutions of the present disclosure are described below.

[ establishing relationship between Cross-Domain items ]

Referring to fig. 1, in step S110, first behavior information of a user for a first domain item and second behavior information of the user for a second domain item are acquired.

The first behavior information and/or the second behavior information may include information indicating whether the user performed a behavior on the article, and in the case where the user performed a behavior on the article, the first behavior information and/or the second behavior information may further include behavior data generated based on the behavior performed on the article by the user.

For example, when the behavior information (first behavior information and/or second behavior information) is empty information, it may indicate that the user does not perform a behavior on the article, and when the behavior information is not empty, it may indicate that the user performs a behavior on the article. For another example, the behavior information may further include information indicating whether the user performed a behavior on the article, such as identification information, where the identification information may be "1" and "0", where "1" indicates that the user performed a behavior on the article, and "0" indicates that the user did not perform a behavior on the article.

The behavior data may include data generated by a user performing a behavior on the article, or may include data obtained by counting the behavior performed by the user. For example, the behavior data may include, but is not limited to, various behavior types such as clicking, playing, evaluating, etc., performed by the user on the item, the number of behaviors per behavior type, and the duration of the behavior, etc.
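As a rough illustration of the behavior information described above, the sketch below keeps a performed/not-performed flag together with per-type counts and durations. The class name `BehaviorInfo`, the helper `from_events`, and the event format (behavior type, duration in seconds) are all hypothetical choices, not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BehaviorInfo:
    """Behavior information of one user for one item (hypothetical layout)."""

    performed: bool = False
    # Behavior type -> number of behaviors of that type.
    counts: Dict[str, int] = field(default_factory=dict)
    # Behavior type -> total duration in seconds.
    durations: Dict[str, float] = field(default_factory=dict)

    @property
    def flag(self) -> int:
        """Identification information: 1 if a behavior was performed, else 0."""
        return 1 if self.performed else 0


def from_events(events: List[Tuple[str, float]]) -> BehaviorInfo:
    """Aggregate raw (behavior_type, duration_seconds) events into BehaviorInfo."""
    info = BehaviorInfo()
    for behavior_type, seconds in events:
        info.performed = True
        info.counts[behavior_type] = info.counts.get(behavior_type, 0) + 1
        info.durations[behavior_type] = info.durations.get(behavior_type, 0.0) + seconds
    return info


# An empty event list corresponds to "empty" behavior information.
empty = from_events([])
active = from_events([("click", 0.0), ("play", 30.0), ("play", 12.5)])
print(empty.flag, active.flag, active.counts, active.durations)
```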

In step S120, a degree of correlation between the first domain item and the second domain item is determined based on the first behavior information and the second behavior information of the plurality of users.

The present disclosure may determine, with a user as an association tie, a first behavioral profile of a first domain item relative to the plurality of users and a second behavioral profile of a second domain item relative to the plurality of users, respectively, by analyzing the first behavioral information and the second behavioral information of each of the plurality of users.

The first behavioral characteristic distribution may characterize behavioral characteristics of the plurality of users, respectively, for the first domain item, and the second behavioral characteristic distribution may characterize behavioral characteristics of the plurality of users, respectively, for the second domain item. As an example, the behavior feature may be whether the user performed the behavior on the article, the number of times the user performed the behavior on the article, the preference of the user to the article obtained through calculation, and the like. The calculation mode of the preference degree will be described below, and will not be described herein again.

For a particular first domain item A and second domain item B, the degree of correlation between A and B may be determined based on the degree of similarity between the first behavioral characteristic distribution of A and the second behavioral characteristic distribution of B. The principle is that if the behavioral characteristic distribution of the plurality of users on A is similar to that on B, A and B can be considered strongly correlated; otherwise, they can be considered weakly correlated.

          Video 1   Video 2   Video 3   Video 4   Music 1   Music 2   Music 3
User 1    1         0         0         1         1         1         0
User 2    1         0         1         0         0         0         1
User 3    0         0         1         0         0         1         0
User 4    0         1         0         1         1         0         1
User 5    0         0         0         0         1         1         0

Table one

Taking Table one as an example, the first domain may be video and the second domain may be music; a first domain item may be a specific video, such as video 1 to video 4, and a second domain item may be a specific piece of music, such as music 1 to music 3. A 1 in the cell for a user and an item indicates that the user performed a behavior on that item (video or music), so behavior data (e.g., click, play, favorite) exists; a 0 indicates that the user did not perform a behavior on that item, so no behavior data exists.

As shown in Table one, only user 4 performed a behavior on video 2, while user 1, user 3, and user 5 performed a behavior on music 2. Thus, the first behavioral characteristic distribution of video 2 may be represented as {0,0,0,1,0}, and the second behavioral characteristic distribution of music 2 may be represented as {1,0,1,0,1}. Since no user performed a behavior on both video 2 and music 2, they may be considered weakly correlated, i.e., the correlation between video 2 and music 2 may be considered zero.
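The Table one example can be checked with a few lines of code. The sketch below reads each item's column as its behavioral characteristic distribution and counts the users who acted on both items; the helper names `distribution` and `common_users` are illustrative only.

```python
from typing import List

ITEMS = ["Video 1", "Video 2", "Video 3", "Video 4", "Music 1", "Music 2", "Music 3"]

# Rows are User 1 .. User 5, columns follow ITEMS (values from Table one).
TABLE_ONE = [
    [1, 0, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0],
]


def distribution(item: str) -> List[int]:
    """Return the column for an item: one 0/1 entry per user."""
    col = ITEMS.index(item)
    return [row[col] for row in TABLE_ONE]


def common_users(item_a: str, item_b: str) -> int:
    """Count users who performed a behavior on both items."""
    return sum(a & b for a, b in zip(distribution(item_a), distribution(item_b)))


print(distribution("Video 2"))             # [0, 0, 0, 1, 0]
print(distribution("Music 2"))             # [1, 0, 1, 0, 1]
print(common_users("Video 2", "Music 2"))  # 0
```

No user appears in both columns, which is why the correlation between video 2 and music 2 is treated as zero.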

[ calculation of correlation ]

As one example of the present disclosure, the first behavioral characteristic distribution may characterize a distribution of preferences of the plurality of users, respectively, for the first domain item. The second behavioral characteristic distribution may characterize a distribution of preferences of the plurality of users, respectively, for the second domain of items. That is, the first behavioral profile may include a first preference of each of the plurality of users for the first domain item and the second behavioral profile may include a second preference of each of the plurality of users for the second domain item.

Thus, for a specific first domain item A and second domain item B, the correlation between A and B may be determined by calculating the similarity between the distribution of the plurality of users' preferences for A and the distribution of their preferences for B. The correlation is calculated as follows.

Step 1, preference degree calculation

As described above, in the case where the user performs a behavior on the article, the behavior information may further include behavior data generated based on the behavior performed on the article by the user, and for convenience of distinction, the behavior data included in the first behavior information may be referred to as "first behavior data", and the behavior data included in the second behavior information may be referred to as "second behavior data". Wherein, in case the user does not perform the action on the article, the action information does not include the action data, or the included action data is null.

A first preference for a first domain item in a first domain and a second preference for a second domain item in a second domain for each user may be calculated based on the first behavior data and the second behavior data for each user of the plurality of users.

As described above, the behavior data of the user for the article may include various behavior types such as clicking, playing, evaluating, etc., and the execution times of different behavior types are not the same. Thus, the user's total preference for an item may be considered to be equal to the sum of the user's corresponding sub-preferences for each of at least some of the behavior types of the item. Wherein, the sub preference degree can be positively correlated with the behavior times and the behavior weights, respectively.

The user's preference r for items may be calculated for example using the following formula,

where T is the set of at least some (preferably all) of the behavior types of the user for the item, t is a behavior type in T, q_t is the number of behaviors under behavior type t, and W_t is the behavior weight corresponding to behavior type t. The behavior types and the numbers of behaviors can be determined from the behavior data. The weights corresponding to different behavior types can be assigned in advance or determined in other ways; for example, they can be determined according to the behavior duration of each behavior type.
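A minimal sketch of the preference calculation follows, assuming each sub-preference is the product of the behavior count q_t and the behavior weight W_t (the text only requires positive correlation with both). The weight values and the names `BEHAVIOR_WEIGHTS` and `preference` are hypothetical.

```python
from typing import Mapping

# Hypothetical pre-assigned weights W_t per behavior type.
BEHAVIOR_WEIGHTS = {"click": 1.0, "play": 2.0, "comment": 3.0}


def preference(
    counts: Mapping[str, int],
    weights: Mapping[str, float] = BEHAVIOR_WEIGHTS,
) -> float:
    """Compute r as the sum over behavior types t of q_t * W_t.

    counts maps behavior type t to the number of behaviors q_t.
    Behavior types without a weight are skipped, which corresponds to
    summing over only part of the behavior types.
    """
    return sum(q_t * weights[t] for t, q_t in counts.items() if t in weights)


# Three clicks and one play: 3 * 1.0 + 1 * 2.0 = 5.0
print(preference({"click": 3, "play": 1}))
```

A user with no behavior data for the item gets a preference of 0, which matches the zero entries used for users who performed no behavior.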

Step 2, vector establishment

A first preference vector for the first domain item and a second preference vector for the second domain item, respectively, for the plurality of users may be established. The number of the elements in the preference degree vector is consistent with the number of the users, and the values of the elements are the preference degree of the users for the corresponding articles.
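As a rough sketch of this step, the per-user preferences can be pivoted into one vector per item, with one element per user. The helper name `build_preference_vectors` and the nested-dictionary input layout are assumptions for illustration; users with no behavior on an item are given a preference of 0.

```python
from typing import Dict, List

def build_preference_vectors(
    prefs: Dict[str, Dict[str, float]], users: List[str]
) -> Dict[str, List[float]]:
    """Turn per-user preferences into one vector per item.

    Element k of each vector is the preference of users[k] for that item;
    a user with no recorded behavior on the item contributes 0.
    """
    items = sorted({item for per_user in prefs.values() for item in per_user})
    return {
        item: [prefs.get(user, {}).get(item, 0) for user in users]
        for item in items
    }

# Hypothetical input: preferences already computed per user and item.
prefs = {
    "User 1": {"Video 1": 2, "Music 1": 4},
    "User 2": {"Video 1": 5},
    "User 3": {"Music 1": 1},
}
users = ["User 1", "User 2", "User 3"]
vectors = build_preference_vectors(prefs, users)
print(vectors["Video 1"])
print(vectors["Music 1"])
```

Keeping a fixed user order is what makes vectors from different domains comparable element by element.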

Step 3, calculating the similarity between vectors

The similarity between the first preference vector and the second preference vector may be calculated in a number of computational ways. The calculated similarity may be used as a correlation between the corresponding first domain item and the second domain item.

For example, the calculation may be performed by a plurality of vector similarity calculation methods such as cosine similarity, jaccard similarity, pearson correlation coefficient, euclidean distance, manhattan distance, mahalanobis distance, and the like.
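To give a feel for how some of these alternatives behave, the sketch below implements three of the listed measures in plain Python. The function names are illustrative, and the Jaccard variant shown here (overlap of the sets of users with a non-zero preference) is one possible reading, not a definition taken from the disclosure. Note that the Euclidean measure is a distance, so a smaller value means more similar vectors.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Jaccard similarity of the sets of users with a non-zero preference."""
    a = {k for k, v in enumerate(x) if v > 0}
    b = {k for k, v in enumerate(y) if v > 0}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two example preference vectors over the same five users.
video = [2, 0, 0, 3, 0]
music = [4, 0, 0, 5, 4]
print(pearson(video, music), jaccard(video, music), euclidean(video, music))
```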

It should be noted that, since the sizes of the preference degrees of the articles calculated in different fields may be different, in order to avoid the difference caused by the different sizes, the first preference degree vector and the second preference degree vector calculated in the step 1 may be normalized respectively, and the preference degree vector after the normalization is used to participate in the calculation of the correlation degree. Common normalization methods include min-max normalization, log function transformation, z-score normalization, and the like, and the normalization process is not described herein.
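Two of the normalization methods named above can be sketched as follows. The function names and the handling of constant vectors (returned as all zeros) are assumptions made for this illustration; the disclosure does not prescribe them.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

def min_max(v: Sequence[float]) -> List[float]:
    """Rescale a vector linearly into [0, 1]; a constant vector maps to zeros."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return [0.0 for _ in v]
    return [(x - lo) / (hi - lo) for x in v]

def z_score(v: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = fmean(v), pstdev(v)
    if sigma == 0:
        return [0.0 for _ in v]
    return [(x - mu) / sigma for x in v]

# Example preference vector over five users.
music_1 = [4, 0, 0, 5, 4]
print(min_max(music_1))
print(z_score(music_1))
```

Min-max keeps zero preferences at zero when the minimum is zero, whereas z-score turns them into negative values, which changes what a similarity measure such as cosine sees.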

Computing examples

User	Video 1	Video 2	Video 3	Video 4	Music 1	Music 2	Music 3
User 1	2	0	0	2	4	5	0
User 2	5	0	4	0	0	0	1
User 3	0	0	5	0	0	2	0
User 4	0	1	0	3	5	0	4
User 5	0	0	0	0	4	2	0

Table II

Each number in Table II indicates the preference of the corresponding user for the corresponding item. From the preference results in Table II, the preference vector of video 1 is {2,5,0,0,0}, that of video 2 is {0,0,0,1,0}, that of video 3 is {0,4,5,0,0}, that of video 4 is {2,0,0,3,0}, that of music 1 is {4,0,0,5,4}, that of music 2 is {5,0,2,0,2}, and that of music 3 is {0,1,0,4,0}.

Cosine similarity can be used to calculate the correlation between items in different domains. The results are shown in Table III below; the detailed calculation is omitted.

Item	Video 1	Video 2	Video 3	Video 4
Music 1	0.20	0.66	0.00	0.84
Music 2	0.32	0.00	0.27	0.48
Music 3	0.22	0.97	0.15	0.81

Table III
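The omitted calculation can be reproduced with a short script. This sketch takes the item columns of the preference table as its input and uses plain cosine similarity without normalization; the names `PREFS` and `cosine` are illustrative. The values it prints agree with the third table to within 0.01, the remaining differences being rounding.

```python
import math
from typing import Sequence

# Preference vectors over User 1..User 5 (columns of the preference table).
PREFS = {
    "Video 1": [2, 5, 0, 0, 0],
    "Video 2": [0, 0, 0, 1, 0],
    "Video 3": [0, 4, 5, 0, 0],
    "Video 4": [2, 0, 0, 3, 0],
    "Music 1": [4, 0, 0, 5, 4],
    "Music 2": [5, 0, 2, 0, 2],
    "Music 3": [0, 1, 0, 4, 0],
}

def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

videos = [name for name in PREFS if name.startswith("Video")]
musics = [name for name in PREFS if name.startswith("Music")]

# One row per music item, one column per video item.
for m in musics:
    row = "\t".join(f"{cosine(PREFS[m], PREFS[v]):.2f}" for v in videos)
    print(f"{m}\t{row}")
```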

The process of determining the degree of correlation between items across domains has been described so far using the first domain and the second domain as examples. By using the relation mining scheme of the present disclosure, the correlation between the articles in any two fields among a plurality of different fields can be mined.

In addition, the correlation between items in different domains can be determined in other ways. For example, labels or keywords that characterize the attributes of items in different domains along multiple dimensions may be extracted; for instance, a topic model may be used to map an item to a label according to its element attribute information. Alternatively, the element attribute information of an item can be mapped to a vector using a seq2vec method. The similarity between the labels or vectors of different items can then be calculated to determine the correlation between items in different domains.

As an example of the present disclosure, keywords of items in different domains may be extracted, and a degree of correlation between items in different domains may be determined by analyzing a degree of similarity of the keywords between items in different domains. Specifically, for a first domain item within a first domain, one or more keywords of the first domain item may be extracted, generating a first keyword vector. For second-domain items within the second domain, one or more keywords of the second-domain items may be extracted, generating a second keyword vector. By calculating the degree of similarity between the first keyword vector and the second keyword vector by means of cosine similarity calculation, the degree of correlation between the first domain object and the second domain object can be determined.
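A minimal sketch of the keyword-based variant is shown below. It assumes keywords have already been extracted and represents each item as a bag-of-keywords count vector over the combined vocabulary; the function name `keyword_cosine` and the sample keywords are invented for illustration.

```python
import math
from collections import Counter
from typing import Iterable

def keyword_cosine(keywords_a: Iterable[str], keywords_b: Iterable[str]) -> float:
    """Cosine similarity between two bag-of-keywords count vectors."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    # Only keywords present in both items contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical keywords for a video item and a music item.
video_keywords = ["romance", "fantasy", "costume drama"]
music_keywords = ["romance", "fantasy", "ballad", "soundtrack"]
print(keyword_cosine(video_keywords, music_keywords))
```

Unlike the behavior-based approach, this variant needs no users who are active in both domains, but it depends on the two domains sharing a keyword vocabulary.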

Based on the concepts of the present disclosure, the relevance between cross-domain items may also be determined in a variety of other ways, which are not described in detail herein.

As one example of the present disclosure, first behavior data of a user for one or more first domain items and second behavior data of a user for one or more second domain items may be acquired separately for each of a plurality of users.

The plurality of users mentioned here are preferably users who have behavior data in both the first domain and the second domain. The first domain is different from the second domain. As described above regarding how domains are divided, the first domain and the second domain may be different applications, different modules within the same application, different types of applications, or different domains divided by attribute, such as music, video and pictures.

The first behavior data and the second behavior data may be collected by a client-side log collection system. When behavior data is collected from client logs, the logs can be cleaned to filter out invalid logs caused by abnormal users, abnormal user operations, server abnormality and the like. The first behavior data and the second behavior data referred to here are the user's total behavior data in the corresponding domain, which may relate to one or more items.

Based on the first behavior data and the second behavior data of the plurality of users, a degree of correlation between each of the at least a portion of the first domain items and each of the at least a portion of the second domain items may be determined using the relationship mining method described above in connection with fig. 1. The specific determination process of the correlation degree is not described herein.

[ Cross-Domain item recommendation ]

Referring to fig. 2, at step S210, first behavior data of a user within a first domain is acquired, the first behavior data relating to one or more first domain items.

In step S220, a second domain item is selected from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively.

The user in this embodiment is a user who lacks behavior data in the second domain, that is, the user can be regarded as a new user in the second domain. Recommending second domain items to this user therefore faces the cold start problem, whereas the user has no cold start problem in the first domain.

Unlike the first domain in the relationship mining scheme described above, the first domain in this embodiment may refer broadly to one or more known domains, different from the second domain, in which the user has behavior data.

That is, when recommending second domain items to the user, the selection may start from some or all of the items to which the user's known behavior data in one or more other domains relates: second domain items having a higher correlation with those items may be selected from the second domain as items suitable for recommendation to the user.

Wherein the correlation between the first field of articles and the second field of articles may be predetermined, for example, may be obtained using the mining method of relationships between articles of different fields as described above.

In step S230, the selected second-area item is recommended to the user.

Therefore, when recommending the content (article) in a certain unknown domain for the user, the known behavior data of the user in other content domains can be utilized, and the interests of the user in other content domains are mapped to the unknown domain by searching similar articles across domains, so that the problem of cold start of the user in the unknown domain can be solved, and the user experience is improved.

As an example of the present disclosure, when selecting second domain items, a recommendation degree may be calculated for each of the at least one second domain item, and a predetermined number of top-ranked second domain items may be selected in descending order of recommendation degree.

The recommendation degree of a second domain item may be positively correlated with the correlation between that second domain item and each of at least some (preferably all) of the first domain items to which the user's first behavior data relates.

For example, the recommendation degree of a second domain item may be equal to the sum of its sub-recommendation degrees with respect to the respective first domain items to which the user's first behavior data relates. Each sub-recommendation degree is positively correlated with both the correlation between the first domain item and the second domain item and the user's preference for the first domain item.

The recommendation degree of a second domain item may be calculated, for example, using the following formula:

$$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$

where rec_uj represents the recommendation degree of second domain item j for user u, I is the set of first domain items to which the user's first behavior data in the first domain relates, i is a first domain item in I, sim(i, j) represents the correlation between first domain item i and second domain item j, and r_ui represents the preference of user u for first domain item i. The preference can be regarded as a weight on the correlation sim(i, j) between first domain item i and second domain item j; its calculation is described above and is not repeated here.
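The recommendation degree described here, a sum of correlations weighted by the user's preferences, can be sketched as a single function. The name `recommendation_degree`, the dictionary layout of the correlation table, and the sample items and values are all assumptions for illustration; a missing correlation is treated as 0.

```python
from typing import Dict, Tuple

def recommendation_degree(
    user_prefs: Dict[str, float],
    sim: Dict[Tuple[str, str], float],
    candidate: str,
) -> float:
    """rec_uj = sum over first domain items i of sim(i, j) * r_ui.

    user_prefs maps each first domain item i to the user's preference r_ui.
    sim maps (first domain item, second domain item) to their correlation.
    """
    return sum(sim.get((i, candidate), 0.0) * r for i, r in user_prefs.items())

# Hypothetical data: a user who has listened to two songs, and one candidate film.
user_prefs = {"song_a": 3, "song_b": 1}
sim = {("song_a", "film_x"): 0.5, ("song_b", "film_x"): 0.2}
print(recommendation_degree(user_prefs, sim, "film_x"))
```

Because the user's own second domain history never appears in the computation, the function works unchanged for a user with no behavior data in the second domain.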

Specific application example

The present disclosure can be applied to recommending content such as videos, music, news, applications, games and themes on various electronic devices that face the user cold start problem, such as mobile phones, tablet computers, televisions, smart speakers and smart watches.

FIG. 3 is a flow chart illustrating an overall implementation of the cross-domain inter-item relationship mining scheme and user cold start scheme of the present disclosure. The implementation steps shown in fig. 3 are as follows.

Step 1, collecting behavior logs generated by users

The behavior data of clicking, playing, evaluating and the like of the user in different fields on various terminals can be collected through the client log acquisition system.

Step 2, log cleaning and preference data calculation

First, the original logs are cleaned, and invalid logs caused by abnormal users, misoperation, server abnormality and the like are filtered out. The behavior data is then analyzed to obtain each user's preference for each item. The calculation of the preference is described above and is not repeated here.

Step 3, calculating relationship data among cross-domain items from the preference data

From the preference data obtained in step 2, preference vectors of items in different domains over a plurality of users (User1 to User5 in the figure) can be obtained. As shown in FIG. 3, the preference vector for Video1 is {2,5,0,0,0}, the preference vector for Video2 is {0,0,0,1,0}, the preference vector for Video3 is {0,4,5,0,0}, the preference vector for Video4 is {2,0,0,3,0}, the preference vector for Music1 is {4,0,0,5,4}, the preference vector for Music2 is {5,0,2,0,2}, and the preference vector for Music3 is {0,1,0,4,0}.

Similarity between preference vectors corresponding to the articles in different fields can be calculated by using a similarity calculation mode to serve as the correlation between the articles in different fields, so that the relationship data between the articles in different fields can be obtained.

Step 4, calculating the recommendation degree of the article

User5 has no behavior data in the Video domain (Video), so User5 can be considered a new User in the Video domain, and is faced with a cold start problem when recommending Video for User 5.

The recommendation degree of different videos can be calculated for the User5 according to behavior data of the User5 in the music field and predetermined relation data among the cross-field objects in the Video field and the music field.

Specifically, the recommendation degree of each Video for User5 can be calculated using the following formula:

$$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$

Here, rec_uj denotes the recommendation degree of video j for User5, and I is the set of music items to which the behavior data of User5 in the music domain relates, namely {Music1, Music2}. sim(i, j) represents the correlation between video j and music i, and r_ui indicates the preference of User5 for music i.

As shown in fig. 3, the expanded formula for the recommended value (i.e., recommendation degree) of Video1 is (similarity_m1v1) · (value_m1) + (similarity_m2v1) · (value_m2) + (similarity_m3v1) · (value_m3), where similarity_m1v1 represents the correlation between Music1 and Video1, and value_m1 represents the preference of User5 for Music1. similarity_m2v1 represents the correlation between Music2 and Video1, and value_m2 represents the preference of User5 for Music2. similarity_m3v1 represents the correlation between Music3 and Video1, and value_m3 represents the preference of User5 for Music3.

The recommendation degrees of the different Videos for User5 finally obtained by this calculation are 1.4 for Video1, 2.4 for Video2, 0.5 for Video3 and 4.2 for Video4.
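As a check on the expansion for Video1, the value can be worked out by hand. This sketch assumes the rounded correlations of Music1 to Music3 with Video1 from Table III (0.20, 0.32, 0.22) and the preferences of User5 for Music1 to Music3 from Table II (4, 2, 0).

```latex
\begin{aligned}
rec_{\text{User5},\,\text{Video1}}
  &= 0.20 \cdot 4 + 0.32 \cdot 2 + 0.22 \cdot 0 \\
  &= 0.80 + 0.64 + 0 \\
  &= 1.44 \approx 1.4
\end{aligned}
```

The Music3 term vanishes because User5 has no recorded preference for Music3, which is why only Music1 and Music2 appear in the set I.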

Step 5, selecting articles to recommend according to the recommendation degree ranking

As shown in fig. 3, the items may be arranged in order of high recommendation degree, and the top-ranked items may be presented to the user as a recommendation list. For example, top ranked Video4, video2 may be recommended to User5.
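The ranking step can be sketched with the standard-library `heapq.nlargest`. The scores below are the recommendation degrees quoted for User5 in step 4; the variable names and the choice of a list length of 2 are illustrative.

```python
import heapq

# Recommendation degrees of each video for User5, as quoted in step 4.
scores = {"Video1": 1.4, "Video2": 2.4, "Video3": 0.5, "Video4": 4.2}

# Predetermined length of the recommendation list (an assumed value).
TOP_N = 2

# Pick the TOP_N items with the highest recommendation degree, best first.
recommended = heapq.nlargest(TOP_N, scores, key=scores.get)
print(recommended)
```

For short candidate lists a full `sorted(..., reverse=True)` works equally well; `nlargest` simply avoids sorting everything when only a few items are shown.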

Thus, a new user lacking behavior data in the target domain, who would otherwise see only non-personalized information, can see personalized recommendation results with the present disclosure. As shown in figs. 4A and 4B, although the user has not used the video center, based on the user's saying "Three Lives Three Worlds, Ten Miles of Peach Blossoms", the recommended television series "Three Lives Three Worlds, Ten Miles of Peach Blossoms" can be seen after the user opens the "Guess You Like" module of the video application.

In summary, the problem of insufficient behavior data of the user in the target field can be complemented by using data in other fields, so that the problem of cold start of the user in the recommendation system is solved, and the experience of the user in the recommendation system is improved.

Heretofore, the relation mining method and the item recommendation method between items in different fields of the present disclosure have been described in detail above with reference to fig. 1 to 3. A relationship mining apparatus, an article recommendation apparatus, and a computing device between articles in different fields of the present disclosure are described below with reference to fig. 5 to 8.

[ Relationship mining apparatus ]

Fig. 5 is a schematic block diagram showing the structure of a relationship mining apparatus for items in different domains of the present disclosure. Details are the same as those described above with reference to fig. 1 and are not repeated here.

Referring to fig. 5, the relationship mining apparatus 300 may include a behavior information acquisition module 310 and a relevance determination module 320.

The behavior information acquisition module 310 may be configured to acquire first behavior information of a user for a first domain item and second behavior information of the user for a second domain item.

The relevance determination module 320 may determine a relevance between the first domain item and the second domain item based on the first behavior information and the second behavior information of the plurality of users.

As shown in fig. 5, the correlation determination module 320 may optionally include a first behavior feature distribution determination unit 321, a second behavior feature distribution determination unit 323, and a correlation determination unit 325, which are shown by dashed boxes in the figure.

The first behavior feature distribution determining unit 321 may determine a first behavior feature distribution of the first-domain article with respect to the plurality of users based on the first behavior information of the plurality of users.

The second behavior feature distribution determining unit 323 may determine a second behavior feature distribution of the second-domain article with respect to the plurality of users based on the second behavior information of the plurality of users.

The degree of correlation determination unit 325 may determine the degree of correlation between the first-domain item and the second-domain item according to the degree of similarity of the first behavior feature distribution of the first-domain item and the second behavior feature distribution of the second-domain item.

The first behavioral characteristic distribution and/or the second behavioral characteristic distribution may include one or more of the following: whether the user performs a behavior on the article, the number of times the user performs the behavior on the article, and the preference of the user on the article.
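The three kinds of feature distribution listed above can be derived from the same behavior records, as the following sketch suggests. The function name `feature_vectors`, the record layout (behavior counts per user for a single item) and the weights are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

def feature_vectors(
    logs: Dict[str, Dict[str, int]],
    users: List[str],
    weights: Dict[str, float],
) -> Tuple[List[int], List[int], List[float]]:
    """Build three feature distributions for one item over a fixed user list.

    logs maps a user to that user's behavior counts on the item, by behavior type.
    Returns (performed-any-behavior flags, total behavior counts, preferences).
    """
    binary = [1 if logs.get(u) else 0 for u in users]
    counts = [sum(logs.get(u, {}).values()) for u in users]
    prefs = [sum(q * weights[t] for t, q in logs.get(u, {}).items()) for u in users]
    return binary, counts, prefs

# Hypothetical records for one item; u2 never interacted with it.
logs = {"u1": {"click": 2, "play": 1}, "u3": {"click": 1}}
users = ["u1", "u2", "u3"]
weights = {"click": 1.0, "play": 2.0}

binary, counts, prefs = feature_vectors(logs, users, weights)
print(binary, counts, prefs)
```

The binary form discards the most information and the preference form the least, so the choice trades robustness to noisy counts against sensitivity to how strongly users engaged.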

As an example, the first behavioral profile may include a first preference of each of the plurality of users for the first domain item and the second behavioral profile may include a second preference of each of the plurality of users for the second domain item.

The user's preference for an item may be equal to the sum of the sub-preferences corresponding to each of some or all of the user's behavior types for the item, where each sub-preference is positively correlated with both the behavior count and the behavior weight. For example, the first behavior feature distribution determining unit 321 and/or the second behavior feature distribution determining unit 323 may calculate the preference r of the user for the item using the following formula:

$$r = \sum_{t \in T} q_t \cdot W_t$$

where T is the set of all behavior types of the user for the item, t is a behavior type in T, q_t is the number of behaviors under behavior type t, and W_t is the behavior weight corresponding to behavior type t.

The correlation determination unit 325 may include a vector establishment unit 3251 and a correlation calculation unit 3253.

The vector creation unit 3251 is for creating a first preference vector for the first domain item and a second preference vector for the second domain item for the plurality of users, respectively.

The correlation calculation unit 3253 may determine a correlation between the first domain item and the second domain item by calculating a similarity between the first preference vector and the second preference vector.

As shown in fig. 5, the relationship mining apparatus 300 may also optionally include a normalization processing module 330, shown in dashed boxes. The normalization processing module 330 may perform normalization processing on the first preference degree vector and the second preference degree vector, and the correlation calculating unit 3253 may calculate a similarity between the normalized first preference degree vector and the normalized second preference degree vector as a correlation between the first-field article and the second-field article.

Fig. 6 shows a schematic block diagram of the structure of a relation-mining apparatus between articles of different fields of the present disclosure. Details of the content are the same as those described above with reference to fig. 1, and are not repeated here.

Referring to fig. 6, a relationship mining apparatus 600 may include a behavior data acquisition module 610 and a relevance determination module 620.

The behavior data acquisition module 610 is configured to acquire, for each of a plurality of users, first behavior data of the user for one or more items in a first domain in the first domain and second behavior data of the user for one or more items in a second domain in the second domain, respectively.

The first behavior data and the second behavior data may include one or more of: behavior type, behavior times, behavior duration.

The relevance determination module 620 is configured to determine a relevance between each of at least a portion of the first domain items and each of at least a portion of the second domain items based on the first behavioral data and the second behavioral data of the plurality of users. The specific determination manner of determining the correlation between the first field of articles and the second field of articles may be referred to the above related description, and will not be repeated here.

[ Item recommendation device ]

Fig. 7 is a schematic block diagram showing the structure of the article recommendation device of the present disclosure. Details of the content are the same as those described above with reference to fig. 2, and are not repeated here.

Referring to fig. 7, the item recommendation device 400 may include a first behavior data acquisition module 410, an item selection module 420, and an item recommendation module 430.

The first behavior data acquisition module 410 may acquire first behavior data of a user within a first domain, the first behavior data relating to one or more first domain items.

The item selection module 420 may select a second domain item from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively. Wherein the correlation between the first domain object and the second domain object may be obtained by using the above-mentioned relation mining method.

The item recommendation module 430 may be used to recommend the selected second area item to the user.

As shown in fig. 7, the item selection module 420 may further optionally include a recommendation degree calculating unit 421 and an item selection unit 423 shown in the dashed line boxes.

The recommendation degree calculating unit 421 may be used to calculate the recommendation degree of each second domain item. The item selection unit 423 may select a predetermined number of top-ranked second domain items in descending order of recommendation degree.

The recommendation degree of each second domain item may be positively correlated with the correlation between that second domain item and each first domain item to which the user's first behavior data relates.

As an example, the degree of recommendation of the second domain item is equal to the sum of the sub-degrees of recommendation of the second domain item for each of the at least one first domain item, the sub-degrees of recommendation being positively correlated with the degree of correlation of the first domain item with the second domain item and the degree of preference of the user for the first domain item, respectively.

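The recommendation degree calculating unit 421 may calculate the recommendation degree of the second domain item using the following formula:

$$rec_{uj} = \sum_{i \in I} sim(i, j) \cdot r_{ui}$$

where rec_uj represents the recommendation degree of second domain item j for user u, I is the set of first domain items to which the user's first behavior data relates, sim(i, j) represents the correlation between first domain item i and second domain item j, and r_ui represents the preference of user u for first domain item i.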

[ computing device ]

There is also provided in accordance with the present disclosure a computing device that may be used to perform the relationship mining method and the item recommendation method of the present disclosure.

As shown in fig. 8, the computing device 500 may include a processor 510 and a memory 530. Memory 530 has executable code stored thereon. The executable code, when executed by the processor 510, causes the processor 510 to perform the relationship mining method and the item recommendation method described above.

The method, the device and the computing equipment for mining and recommending the relationship among the articles according to the invention have been described in detail above with reference to the accompanying drawings.

Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the steps defined in the above-mentioned method of the invention.

Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. An item recommendation method, comprising:

acquiring first behavior data of a user in a first field, wherein the first behavior data relates to one or more first field objects;

selecting a second domain item from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively; and

recommending the selected second-domain items to the user, wherein the user is a new user in the second domain;

the step of selecting a second domain item from the at least one second domain item comprises:

calculating the recommendation degree of each second-field object;

selecting a preset number of second field objects which are ranked at the front according to the sequence of the recommendation degree from high to low;

wherein the degree of recommendation of the second domain item is positively correlated with the degree of correlation of each first domain item in the at least one and the second domain item, respectively; the recommendation degree of the second field of articles is equal to the sum of sub-recommendation degrees of the second field of articles to each first field of articles in the at least one, and the sub-recommendation degrees are positively correlated with the correlation degree of the first field of articles and the second field of articles and the preference degree of the user to the first field of articles respectively.

2. The item recommendation method of claim 1, wherein the determination of the degree of correlation between the first domain item and the second domain item comprises:

acquiring first behavior information of a user aiming at an article in a first field and second behavior information of the user aiming at an article in a second field;

and determining the correlation degree between the first field object and the second field object based on the first behavior information and the second behavior information of a plurality of users.

3. The item recommendation method of claim 2, wherein the step of determining a degree of correlation between the first domain item and the second domain item comprises:

determining a first behavioral characteristic distribution of the first domain item relative to the plurality of users based on the first behavioral information of the plurality of users;

determining a second behavioral characteristic distribution of the second domain item relative to the plurality of users based on second behavioral information of the plurality of users;

and determining the correlation degree between the first field of articles and the second field of articles according to the similarity degree of the first behavior characteristic distribution and the second behavior characteristic distribution.

4. The item recommendation method of claim 3, wherein the first behavioral characteristic distribution and/or the second behavioral characteristic distribution comprises one or more of:

whether the user performed a behavior on the item;

the number of actions of the user on the article;

preference of the user for the item.

5. The item recommendation method according to claim 2, wherein the first behavior information and/or the second behavior information comprises:

whether the user performed a behavior on the item; and/or

behavior data generated based on a behavior performed by a user on an item.

6. The item recommendation method of claim 5, wherein the behavioral data comprises one or more of:

behavior type;

the number of behaviors;

duration of the behavior.

7. The item recommendation method of claim 3, wherein the first behavioral characteristic distribution comprises a first preference degree of each of the plurality of users for the first domain item, the second behavioral characteristic distribution comprises a second preference degree of each of the plurality of users for the second domain item, and the step of determining the degree of correlation between the first domain item and the second domain item comprises:

establishing a first preference vector of the plurality of users for the first domain item and a second preference vector of the plurality of users for the second domain item, respectively;

and determining the degree of correlation between the first domain item and the second domain item by calculating the similarity between the first preference vector and the second preference vector.
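
The vector comparison recited in claim 7 can be illustrated with a minimal sketch. The claim does not name a similarity measure; cosine similarity is assumed here, and the function name and sample values are hypothetical.

```python
import math

def cosine_similarity(first_vec, second_vec):
    """Cosine similarity between two equal-length preference vectors.

    Position i of each vector holds the preference degree of user i.
    Cosine similarity is an assumption; the claim only requires "similarity".
    """
    if len(first_vec) != len(second_vec):
        raise ValueError("vectors must cover the same users in the same order")
    dot = sum(a * b for a, b in zip(first_vec, second_vec))
    norm_a = math.sqrt(sum(a * a for a in first_vec))
    norm_b = math.sqrt(sum(b * b for b in second_vec))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no recorded preference for one item: treat as uncorrelated
    return dot / (norm_a * norm_b)

# Hypothetical data: preference degrees of three users for one item per domain.
first_pref = [3.0, 0.0, 1.0]   # first domain item
second_pref = [6.0, 0.0, 2.0]  # second domain item
correlation = cosine_similarity(first_pref, second_pref)
```

In this example the two vectors point in the same direction, so the items would be treated as strongly correlated even though the absolute preference values differ.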

8. The item recommendation method of claim 7, wherein

the degree of preference of the user for the item is equal to the sum of sub-preference degrees, one for each of at least some of the behavior types performed by the user on the item, wherein each sub-preference degree is positively correlated with both the number of behaviors and the behavior weight.
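
One way to read claim 8 is sketched below. The claim only requires each sub-preference degree to be positively correlated with the number of behaviors and the behavior weight; taking the product of the two is an assumption, and the behavior types and weight values are hypothetical.

```python
def preference_degree(behavior_counts, behavior_weights):
    """Sum of per-behavior-type sub-preference degrees for one user and one item.

    Each sub-preference degree is taken as weight * count (an assumption;
    any function increasing in both would satisfy the claim wording).
    """
    total = 0.0
    for behavior_type, count in behavior_counts.items():
        weight = behavior_weights.get(behavior_type)
        if weight is None:
            continue  # only "at least some" behavior types need to contribute
        total += weight * count
    return total

# Hypothetical weights and counts for a single user-item pair.
weights = {"click": 1.0, "favorite": 3.0, "purchase": 5.0}
counts = {"click": 4, "favorite": 1, "share": 2}  # "share" has no weight here
pref = preference_degree(counts, weights)
```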

9. The item recommendation method of claim 7, further comprising:

normalizing the first preference vector and the second preference vector, respectively.
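
The normalization step of claim 9 might look like the following sketch. The claim does not say which normalization is used; scaling each vector to unit L2 length is assumed here, and the function name is hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a preference vector to unit Euclidean length (assumed scheme)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]  # leave an all-zero vector unchanged
    return [x / norm for x in vec]

# After unit-length normalization, a plain dot product equals cosine similarity.
first_unit = l2_normalize([3.0, 4.0])
second_unit = l2_normalize([0.0, 2.0])
similarity = sum(a * b for a, b in zip(first_unit, second_unit))
```

Normalizing first keeps items with many interactions from dominating the comparison purely because of their scale.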

10. The item recommendation method of claim 2, wherein the determination of the degree of correlation between the first domain item and the second domain item comprises:

for each user of a plurality of users, respectively acquiring first behavior data of the user for one or more first domain items and second behavior data of the user for one or more second domain items;

and determining a degree of correlation between each of at least some of the first domain items and each of at least some of the second domain items based on the first behavior data and the second behavior data of the plurality of users.
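
The pairwise computation of claim 10 can be sketched as follows. For brevity this sketch uses only whether each user acted on an item and measures overlap with the Jaccard index. Both choices are assumptions, and the log layout and item names are hypothetical.

```python
from collections import defaultdict

def users_by_item(behavior_log):
    """Map each item to the set of users who performed a behavior on it."""
    index = defaultdict(set)
    for user, items in behavior_log.items():
        for item in items:
            index[item].add(user)
    return index

def pairwise_correlation(first_log, second_log):
    """Jaccard overlap of user sets for every cross-domain item pair."""
    first_index = users_by_item(first_log)
    second_index = users_by_item(second_log)
    result = {}
    for a, users_a in first_index.items():
        for b, users_b in second_index.items():
            union = users_a | users_b  # never empty: each item has a user
            result[(a, b)] = len(users_a & users_b) / len(union)
    return result

# Hypothetical logs: user -> items acted on, one log per domain.
first_log = {"u1": ["song_a"], "u2": ["song_a", "song_b"], "u3": ["song_b"]}
second_log = {"u1": ["film_x"], "u2": ["film_x"], "u3": ["film_y"]}
corr = pairwise_correlation(first_log, second_log)
```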

11. The item recommendation method of claim 1, wherein

the degree of preference of the user for the first domain item is equal to the sum of sub-preference degrees, one for each of at least some of the behavior types performed by the user on the first domain item, wherein each sub-preference degree is positively correlated with both the number of behaviors and the behavior weight.

12. The item recommendation method of claim 1, wherein the first behavior data comprises one or more of the following items of information about behaviors performed by the user on the first domain item:

behavior type;

the number of behaviors;

duration of the behavior.

13. An item recommendation device, comprising:

a first behavior data acquisition module for acquiring first behavior data of a user in a first domain, the first behavior data relating to one or more first domain items;

an item selection module for selecting a second domain item from the at least one second domain item based on a degree of correlation between at least one of the one or more first domain items and each of the at least one second domain item, respectively; and

an item recommendation module for recommending the selected second domain item to the user, wherein the user is a new user in the second domain;

wherein the item selection module comprises:

a recommendation degree calculation unit for calculating a recommendation degree of each second domain item; and

an item selection unit for selecting a preset number of top-ranked second domain items in descending order of recommendation degree.
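
The two units of the item selection module can be illustrated with a short sketch. How the recommendation degree is computed is not stated in this claim; summing preference degree times degree of correlation over the user's first domain items is an assumption, and all names and values below are hypothetical.

```python
import heapq

def recommend(user_prefs, correlation, top_n):
    """Return the top_n second domain items by recommendation degree.

    user_prefs:  first domain item -> preference degree of the user
    correlation: (first domain item, second domain item) -> correlation
    The recommendation degree is assumed to be sum(preference * correlation).
    """
    scores = {}
    for (first_item, second_item), corr in correlation.items():
        pref = user_prefs.get(first_item, 0.0)
        if pref == 0.0:
            continue  # the user has no behavior on this first domain item
        scores[second_item] = scores.get(second_item, 0.0) + pref * corr
    # nlargest returns the keys in descending order of their score.
    return heapq.nlargest(top_n, scores, key=scores.get)

# Hypothetical user who is new to the second domain.
user_prefs = {"song_a": 2.0, "song_b": 1.0}
correlation = {
    ("song_a", "film_x"): 0.9,
    ("song_a", "film_y"): 0.1,
    ("song_b", "film_y"): 0.5,
    ("song_b", "film_z"): 0.2,
}
picked = recommend(user_prefs, correlation, top_n=2)
```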

14. A computing device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor causes the processor to perform the method of any of claims 1-12.

15. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1 to 12.