CN111611499B

CN111611499B - Collaborative filtering method, collaborative filtering device and collaborative filtering system

Info

Publication number: CN111611499B
Application number: CN202010470716.3A
Authority: CN
Inventors: 李政浩; 董天南
Original assignee: Seashell Housing Beijing Technology Co Ltd
Current assignee: Seashell Housing Beijing Technology Co Ltd
Priority date: 2020-05-28
Filing date: 2020-05-28
Publication date: 2021-08-17
Anticipated expiration: 2040-05-28
Also published as: CN111611499A

Abstract

The invention provides a collaborative filtering method, a collaborative filtering device and a collaborative filtering system, and belongs to the technical field of house information processing. The method comprises the following steps: determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user; determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users; and acquiring a trained vector decomposition model through the second preference data set, acquiring a characteristic vector set corresponding to the house resources in the second house resource set by using the second preference data set and the trained vector decomposition model, calculating the similarity of the characteristic vector set, and forming a similarity set after the calculation is finished. The method is used for determining the recommended house source with the user preference characteristic through similarity calculation.

Description

Collaborative filtering method, collaborative filtering device and collaborative filtering system

Technical Field

The invention relates to the technical field of house information processing, in particular to a collaborative filtering method, a recommendation method, a collaborative filtering device, a recommendation device, a system, equipment and a computer readable storage medium.

Background

Item-based collaborative filtering algorithms generally generate recommendation lists from the similarities between items, and such algorithms have a time complexity of O(n²). In the current era of information overload, however, the number of items to be processed is enormous, for example house information (a house source serving as the Item to be processed). As the data volume n grows, the time cost and hardware resource consumption of calculating the similarity between every pair of items become very high, so an effective calculation strategy is needed to reduce the calculation cost and save computing resources.

Disclosure of Invention

The invention aims to provide a collaborative filtering method, a collaborative filtering device and a collaborative filtering system, which solve the technical problems of low recall rate of high-correlation effective house resources, long time for similarity calculation, high time complexity, excessive occupation of hardware resources and the like caused by the existence of a large amount of redundant similarity calculation with poor correlation in the prior art.

According to an aspect of an embodiment of the present disclosure, there is provided a collaborative filtering method including:

determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;

determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;

determining a to-be-trained vector decomposition model, performing iterative computation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative computation is completed;

and performing factorization by at least utilizing preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculating the similarity of the feature vector set, and forming a similarity set after the calculation is finished.

In an embodiment of the present disclosure, the forming, by the preference data of the house resources in the first house resource set, a first preference data set corresponding to the selected user specifically includes:

and obtaining the scores of the house sources in the first house source set by utilizing the behavior data of the selected user and combining with a preset implicit scoring rule, recording the scores corresponding to the house sources in the first house source set as preference data, and forming a first preference data set corresponding to the selected user through the preference data.

In another embodiment of the present disclosure, the determining a location area range and selecting a part of users according to the location area range includes:

and determining the administrative area range of the selected user according to the physical position information or the network address information of the selected user, and selecting part of users in the administrative area range.

or determining a current administrative area range according to the current position information of the selected user, and, according to the current administrative area range in combination with the position record information in the user profile of the selected user, selecting a part of users who are outside the current administrative area range and within the administrative area range corresponding to the position record information.

In yet another embodiment of the present disclosure, after the determining the second room source set and by the time of calculating the similarity of the feature vector sets, the method includes:

taking a first preference data set corresponding to the selected user as a first scoring matrix, and taking a second preference data set corresponding to the part of users as a second scoring matrix, wherein any one preference data is a score;

determining a to-be-trained vector decomposition model, performing iterative computation by using the first scoring matrix and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative computation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference characteristic information;

and performing factorization by using the second scoring matrix and combining the trained vector decomposition model to at least obtain the hidden attribute feature vectors of the house source factors in the second house source set, forming a feature vector set, and calculating the similarity of the feature vector set.

In another embodiment of the present disclosure, after obtaining the trained vector decomposition model, and when obtaining at least the implicit attribute feature vector related to the house source factor in the second house source set, the method specifically comprises:

determining a co-occurrence house source pair set according to house sources in the first house source set and house sources in the second house source set;

and filtering the second scoring matrix by utilizing the correspondence between the house sources in the co-occurrence house source pair set and the scores, and then performing factorization by using the filtered second scoring matrix and combining the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.

In yet another embodiment of the present disclosure, after the determining the second room source set and before forming the similarity set, the method further includes:

forming a first user set and a second user set, wherein each user in the first user set is recorded with behavior data corresponding to one house source of a house source pair in the co-occurrence house source pair set, and each user in the second user set is recorded with behavior data corresponding to the other house source of that house source pair;

calculating a co-occurrence score according to the number of users in the intersection and the number of users in the union of the first user set and the second user set;

weighting the similarity with the co-occurrence score.
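The co-occurrence weighting above can be sketched as follows. This is a minimal illustration under stated assumptions: the text only says the score is calculated from the sizes of the intersection and the union, so the ratio of the two (a Jaccard-style ratio) and the multiplicative weighting are assumptions, and all function and variable names are hypothetical.

```python
def users_by_house(behaviors):
    """Map each house source id to the set of users with behavior data on it."""
    index = {}
    for user, house in behaviors:
        index.setdefault(house, set()).add(user)
    return index


def co_occurrence_score(first_users, second_users):
    """Intersection size divided by union size (assumed form of the score)."""
    union = first_users | second_users
    if not union:
        return 0.0
    return len(first_users & second_users) / len(union)


def weight_similarities(similarity, behaviors):
    """Multiply each pair's similarity by its co-occurrence score (assumed weighting)."""
    index = users_by_house(behaviors)
    weighted = {}
    for (a, b), sim in similarity.items():
        score = co_occurrence_score(index.get(a, set()), index.get(b, set()))
        weighted[(a, b)] = sim * score
    return weighted


# Hypothetical behavior log: (user id, house source id)
behaviors = [("u1", "h1"), ("u2", "h1"), ("u3", "h1"),
             ("u2", "h2"), ("u3", "h2"), ("u4", "h2")]
similarity = {("h1", "h2"): 0.8}
weighted = weight_similarities(similarity, behaviors)
```

In this example two of the four users in the union acted on both house sources, so the co-occurrence score is 0.5 and the similarity of 0.8 is weighted down to 0.4.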

According to another aspect of the embodiments of the present disclosure, there is provided a recommendation method including:

determining a third room source set, and forming preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, wherein the room sources in the third room source set are recorded with behavior data of the recommended user;

and determining a recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.

In an embodiment of the present disclosure, a user is selected from the location area range as a recommended user.

According to still another aspect of an embodiment of the present disclosure, there is provided a collaborative filtering apparatus including:

a first selection module, configured to determine a first room source set and form preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;

the second selection module is used for determining a position area range, selecting a part of users according to the position area range, then determining a second room source set, and forming preference data of the room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;

the similarity calculation module is used for determining a to-be-trained vector decomposition model, performing iterative calculation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative calculation is completed;

the similarity calculation module is further configured to perform factorization by using at least some preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed.

In an embodiment of the present disclosure, the first selection module is specifically configured to obtain, by using the behavior data of the selected user and in combination with a preset implicit rating rule, a rating of a house source in the first house source set, record a rating corresponding to a house source in the first house source set as preference data, and form, through the preference data, a first preference data set corresponding to the selected user.

In another embodiment of the present disclosure, the second selection module is specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.

In yet another embodiment of the present disclosure, the second selection module is specifically configured to determine a current administrative area range according to the current location information of the selected user, and select, according to the current administrative area range and in combination with location record information in the user profile of the selected user, a part of users that are outside the current administrative area range and are within the administrative area range corresponding to the location record information.

In yet another embodiment of the present disclosure, the first selection module is specifically configured to select a first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to select a second preference data set corresponding to the part of users as a second scoring matrix, where any one preference data is a score;

the similarity calculation module is specifically used for determining a to-be-trained vector decomposition model, performing iterative calculation by using the first scoring matrix and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative calculation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference characteristic information;

the similarity calculation module is further specifically configured to perform factorization on the second scoring matrix in combination with the trained vector decomposition model to obtain at least hidden attribute feature vectors related to the house source factors in the second house source set, form a feature vector set, and calculate similarity of the feature vector set.

In yet another embodiment of the present disclosure, the similarity calculation module is further specifically configured to, after the obtaining of the trained vector decomposition model and when at least the implicit attribute feature vector related to the house source factor in the second house source set is obtained, determine a co-occurrence house source pair set according to the house sources in the first house source set and the house sources in the second house source set,

the similarity calculation module is further specifically configured to filter the second scoring matrix by using the co-occurrence house source to the corresponding relationship between the house source in the set and the score, and perform factorization by using the filtered second scoring matrix in combination with the trained vector decomposition model to obtain the hidden attribute feature vector of the house source factor in the second house source set.

In yet another embodiment of the present disclosure, the collaborative filtering apparatus further includes:

a co-occurrence weighting module, configured to determine a set of co-occurrence room source pairs according to the room sources in the first room source set and the room sources in the second room source set after the second room source set is determined and before the similarity set is formed,

the co-occurrence weighting module is used for forming a first user set and a second user set, wherein all users in the first user set are recorded with behavior data corresponding to one of the room sources in the room source pair set in the co-occurrence room source pair set, and all users in the second user set are recorded with behavior data corresponding to the other room source in the room source pair,

the co-occurrence weighting module is used for calculating a co-occurrence score through the number of users in the intersection and the number of users in the union of the first user set and the second user set,

the co-occurrence weighting module is further configured to weight the similarity using the co-occurrence score.

According to still another aspect of the embodiments of the present disclosure, there is provided a recommendation apparatus including:

a third selection module, configured to determine a third room source set, and form preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, where the room sources in the third room source set are recorded with behavior data of the recommended user;

and the recommending module is used for determining the recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.

In an embodiment of the present disclosure, the recommended user in the third selection module is a user selected from the aforementioned location area range.

According to yet another aspect of embodiments of the present disclosure, there is provided a system comprising:

a recommendation engine configured to execute instructions corresponding to the foregoing method.

According to still another aspect of the embodiments of the present disclosure, there is provided an apparatus for recommending a house source, including:

at least one processor;

a memory coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, the at least one processor implements the aforementioned method by executing the instructions stored by the memory.

According to yet another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the aforementioned method.

Through the above, the invention uses the house sources on which a user has generated behavior as the preference features (explicit and/or implicit preference features) of the user and, in combination with the position area range, selects part of the users who may have preference features similar to those of the user, which significantly reduces the complexity of calculating the similarity of the house sources on which the user and the partial users have generated behavior. The resulting similarity set keeps its correspondence with the house sources; after the similarity set is combined with the third preference data set, this correspondence carries over to the similar preference data set (the combined result), and by sorting the similar preference data set by magnitude, several pieces of similar preference data and the recommended house sources corresponding to them can be determined;

according to the method, the implicit preference characteristics of the user are quantized by utilizing the behavior data and the scoring rules, and compared with the method of selecting the explicit preference characteristics, the collected data can be prevented from being too sparse;

the invention considers the preference of the user to the house, has the regionalized clustering characteristic, and particularly has more similarity to the preference characteristic of the house source for the user group in an administrative area, on one hand, a large number of basically irrelevant preference characteristics are filtered, more practical and accurate similarity can be obtained, on the other hand, the complexity of the similarity calculation is remarkably reduced, and the time and hardware resources required by the similarity calculation can be greatly reduced;

according to the method, the preference of users across administrative areas for houses is considered: using the historical position record information in the user profiles of the users, or the province and city information filled in by the users, users with similar preferences are determined outside the current administrative area and inside the recorded administrative area. For example, when a user who has lived in an administrative area where houses generally have heating facilities comes to an administrative area without public heating, the preference for heating facilities can be reflected in the similarity set through the recorded preference features of that user and of the users in the recorded administrative area, so that a recommended house source candidate set with very high relevance can be found by simply filtering for house sources within the current administrative area;

according to the invention, a large amount of implicit preference data with poor correlation is filtered, so that the data volume used for participating in similar calculation is obviously reduced, and therefore, a relatively complex basis for expressing the implicit preference characteristics about the house resources in the user scoring matrix by using the implicit attribute feature vector is provided, and the recall rate of the required recommended house resources is improved;

the invention constructs the co-occurrence house source pair, can further reduce the data volume used for processing before calculating the similarity, and further reduces the time required by calculation and the requirement of hardware resources;

according to the method, the co-occurrence score of the user is constructed by using the co-occurrence of the user on the house source, the weighting of the similarity of the house source is completed through the co-occurrence score, and the subjective behavior preference feature information of the user, which is ignored by the hidden attribute feature vector, can be introduced into the similarity, so that the similarity set of the selected user on the house source preference is completely and accurately found;

the recommendation engine can take the position area range as the data processing granularity, reduces the complexity of correlation between room sources needing to be calculated, and reduces the time required by calculation and the requirement of hardware resources.
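The cost argument made in the advantages above can be illustrated with a small count of pairwise similarity computations. The item total and the split into ten equally sized regions are hypothetical figures chosen only for illustration; they do not come from the patent.

```python
def pair_count(n):
    """Number of unordered item pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def partitioned_pair_count(sizes):
    """Pairs evaluated when similarity is only computed inside each region."""
    return sum(pair_count(n) for n in sizes)


total_items = 100_000            # hypothetical number of house sources
region_sizes = [10_000] * 10     # hypothetical: ten equally sized position areas

global_pairs = pair_count(total_items)
regional_pairs = partitioned_pair_count(region_sizes)
```

Under these assumed figures, restricting the calculation to each position area reduces the number of pairs from 4,999,950,000 to 499,950,000, roughly a tenfold reduction.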

Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:

FIG. 1 is a schematic diagram of the main method steps of an embodiment of the present invention;

FIG. 2 is a flow chart of an exemplary house source recommendation algorithm according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

Example 1

An embodiment of the present invention provides a collaborative filtering method, as shown in FIG. 1, where the collaborative filtering method includes:

S1) determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;

S2) determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;

S3) determining a to-be-trained vector decomposition model, performing iterative computation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative computation is completed;

S4) performing factorization by using at least preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house sources in the second house source set, calculating the similarity of the feature vector set, and forming a similarity set after the calculation is completed.
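The factorization and similarity steps can be sketched as follows. This is a simplified illustration, not the patented procedure: it runs a basic alternating least squares factorization on a single small scoring matrix (the patent trains the model on the first preference data set and then factorizes the second), it treats zero entries as unobserved, and it uses cosine similarity, which is an assumption because the steps above do not name a similarity measure. All names and the example scores are hypothetical.

```python
import numpy as np


def als(ratings, rank=2, reg=0.1, iters=20, seed=0):
    """Factorize a users x houses matrix into user and house factor matrices.

    Zero entries are treated as unobserved (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    observed = ratings > 0
    user_f = rng.normal(scale=0.1, size=(n_users, rank))
    item_f = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix house factors and solve a regularized least squares per user.
        for u in range(n_users):
            cols = observed[u]
            if cols.any():
                a = item_f[cols].T @ item_f[cols] + ridge
                b = item_f[cols].T @ ratings[u, cols]
                user_f[u] = np.linalg.solve(a, b)
        # Fix user factors and solve a regularized least squares per house source.
        for i in range(n_items):
            rows = observed[:, i]
            if rows.any():
                a = user_f[rows].T @ user_f[rows] + ridge
                b = user_f[rows].T @ ratings[rows, i]
                item_f[i] = np.linalg.solve(a, b)
    return user_f, item_f


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Hypothetical scoring matrix: rows are users, columns are house sources.
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])
user_factors, house_factors = als(ratings)
similarity = cosine_similarity_matrix(house_factors)
```

Here `house_factors` plays the role of the feature vector set for the house sources, and `similarity` plays the role of the similarity set, indexed by house source on both axes.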

Generally, with house information (house source) and various types of business information about a house, a data set of the house can be constructed, and the data set can have a unique identifier which can be regarded as a house source identifier; for the program, service or system that records, processes and applies the data set, the behavior data of users (such as selected users, partial users and recommended users) can also be recorded, each user generally has a unique identifier, which can be regarded as a user identifier; for describing the distinction and simplicity, the house source identifier may be simply referred to as a house source in this embodiment, and the user identifier may be simply referred to as a user in this embodiment, that is, referring to the house source, may refer to the house source identifier, and referring to the user may refer to the user identifier;

for data correspondence, a house source, a user and a piece of preference data can be recorded together as ternary data with a recorded correspondence; the correspondence can be realized through a relationship table or key-value data, stored for example in a relational database or a key-value database respectively; in some specific implementations, the correspondence can be determined without further data processing, that is, given any one of the house source, the user and the preference data, the other two can always be determined through the correspondence;

any room source set (such as a first room source set, a second room source set and a third room source set) has at least a plurality of room sources (room source identifiers), the preference data can be directly scored (obtained through explicit behaviors, such as score values obtained through direct evaluation of the room sources by users), and the preference data can also be indirectly scored (obtained through implicit behaviors, such as score values formed through behavior data of information corresponding to the room sources by users and mapping of specific behavior data).
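The ternary correspondence between user, house source and preference data described above can be sketched with an in-memory key-value view. The record type, field names and example values are hypothetical; a real deployment would presumably use the relational or key-value database mentioned in the text.

```python
from collections import namedtuple

# One ternary record: user identifier, house source identifier, preference score.
Preference = namedtuple("Preference", ["user", "house", "score"])

records = [
    Preference("u1", "h1", 7.0),
    Preference("u1", "h2", 2.0),
    Preference("u2", "h1", 3.0),
]

# Key-value view: (user, house) -> score.
score_by_pair = {(r.user, r.house): r.score for r in records}


def houses_of(user):
    """House sources on which the given user has recorded preference data."""
    return {r.house for r in records if r.user == user}


def users_of(house):
    """Users who have recorded preference data on the given house source."""
    return {r.user for r in records if r.house == house}
```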

In some implementations, the selected user may be one user, and some of the users may be multiple users (which may be large-scale relative to the selected user), for example, for some location area range, the preference data is too sparse, and the selected user may be a default user configured and may have a random preference characteristic or some preset preference characteristic, so that the similarity set for recommending the house resources to the large-scale user can be implemented with very low computational overhead.

In some implementations, the selected user may be a plurality of users, the partial user may be a plurality of users, the selected user may include the partial user or some users of the partial user, and any one of the selected users may be different from any one of the partial users;

the behavior data may include browsing or clicking a page (browse) where the house source is located, paying attention to or marking favorite house sources (favorite), dialing a contact phone (400) associated with the house source, sending online information (im) to a service provider associated with the house source, sharing the house source, adding the house source to a house source comparison queue of the user, and the like;

the preference characteristic information can be embodied by a current parameter generated by training of an initial parameter of the vector decomposition model to be trained, wherein the current parameter is a parameter of the trained vector decomposition model;

co-occurrence means that a plurality of users have behavior data for two house sources, and the two house sources can be called co-occurrence house source pairs; according to the data correspondence, the house sources corresponding to part of the users and the house sources corresponding to the selected users can be totally a co-occurrence house source pair, or partially a co-occurrence house source pair, or totally have no co-occurrence;

the selected user or the partial users may be a plurality of users within a coordinate positioning range of a specific distance, or users whose network addresses fall within a specific administrative area; the behavior data set a of a user can be taken as {browse, favorite, 400, im}; the position area range can be a geometric range centered on the coordinate position information of the selected user, a range such as a building, district or street area, or an administrative area; according to the preference data corresponding to the recommended user in the first preference data set and the similarity set (which still corresponds to house sources, for example the similarities are table values and the house sources are the row descriptions of the table), a similar preference data set is obtained (which also keeps its correspondence with house sources, for example the similar preference data are table values and the house sources are still the row descriptions of the table); then, by sorting the similar preference data in the similar preference data set, the house sources corresponding to the top-ranked similar preference data are determined, forming a recommended house source candidate set corresponding to the recommended user among the selected users. In some specific implementations, the first house source set, the first preference data set, the similarity set, the second house source set, the recommended house source candidate set and the like can all be represented as matrices or vectors; the preference data and the similar preference data may be score values, sorted in descending order so that larger score values come first.
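The candidate selection described above (combine the recommended user's preference data with the similarity set, sort, keep the top entries) can be sketched as follows. The combination rule used here, a similarity-weighted sum of preference scores, and the exclusion of house sources the user has already acted on are assumptions; the text does not fix either. All names and values are hypothetical.

```python
def recommend(preferences, similarity, top_n=2):
    """Rank candidate house sources by similarity-weighted preference score."""
    scores = {}
    for seen, pref in preferences.items():
        for candidate, sim in similarity.get(seen, {}).items():
            if candidate in preferences:
                continue  # assumed: skip house sources the user already acted on
            scores[candidate] = scores.get(candidate, 0.0) + pref * sim
    # Sort in descending order so that larger score values come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [house for house, _ in ranked[:top_n]]


# Hypothetical similarity set, keyed by house source.
similarity = {
    "h1": {"h3": 0.9, "h4": 0.2},
    "h2": {"h3": 0.1, "h4": 0.5, "h5": 0.4},
}
# Hypothetical preference data of the recommended user.
preferences = {"h1": 7.0, "h2": 2.0}
candidates = recommend(preferences, similarity)
```

With these values the combined scores are 6.5 for h3, 2.4 for h4 and 0.8 for h5, so the candidate set contains h3 followed by h4.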

Specifically, in step S1), the forming of the preference data of the house resources in the first house resource set into the first preference data set corresponding to the selected user may specifically be:

obtaining the scores of the house sources in the first house source set by utilizing the behavior data of the selected user and combining with a preset implicit scoring rule, recording the scores corresponding to the house sources in the first house source set as preference data, and forming a first preference data set corresponding to the selected user through the preference data; in addition, the preference data of the house sources in the second house source set can also be formed in the same form, and so on, to form the preference data of the house sources in the qth (Q is a positive integer) house source set.

As shown in fig. 2, a user scoring matrix may be constructed from the user behavior data of the selected user; the user scoring matrix is associated with the house sources in the first house source set and with the selected user. To form the user scores, the house sources on which users have generated behavior are denoted Item, and the item set of user u (who may be one of the selected users) is $i_u = \{item_1, item_2, \ldots, item_x, \ldots, item_k\}$, where k is a positive integer, $item_x \in Item$, x is an integer, and $1 \le x \le k$. The preset implicit scoring rule may be a predefined mapping rule that maps specific behavior data to a specific score value (rating), for example a mapping based on per-behavior score weights or on the number of behaviors. For example, browsing may be configured with a score of 2, following or bookmarking with a score of 3, and communicating with a house service provider with a score of 5 (the same rule may also be expressed as percentages of a preset total score, here 20%, 30% and 50% respectively of a total score of 10). If a user performs both a browsing behavior and a communication behavior on the information of a specific house, both behaviors are recorded against the house source (identifier) of that house, and the user's indirect score for that house source is 2 + 5 = 7. In some implementations, the preset implicit scoring rule may additionally include a time decay rule and a proportional distribution mapping. The time decay rule may be a mapping rule that assigns weights according to time; in an exemplary configuration, the user's behavior within the last month carries a weight of 65% and behavior before the last month carries a weight of 35%. The final score under the preset implicit scoring rule may then be the product of the result of the score mapping and the corresponding weight percentage in the proportional distribution mapping. For example, if only a browsing behavior occurred, not within the last month but once before the last month, the calculated score is 2 × 35%, and the user's indirect score for the house source is 0.7.
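The scoring rule described above can be sketched as follows. This is only an illustration: the behavior names, the function name `implicit_score` and the 30-day approximation of "the last month" are assumptions, while the scores 2, 3 and 5 and the 65% / 35% weights are the exemplary values from the text.

```python
from datetime import datetime, timedelta

# Hypothetical behavior names; the scores 2, 3 and 5 follow the example in the text.
BEHAVIOR_SCORE = {"browse": 2.0, "favorite": 3.0, "contact": 5.0}
RECENT_WEIGHT = 0.65  # weight of behaviors within the last month
OLDER_WEIGHT = 0.35   # weight of behaviors before the last month


def implicit_score(events, now, use_time_decay=True):
    """Sum the mapped scores of one user's events on one house source.

    events: list of (behavior_name, timestamp) tuples.
    """
    cutoff = now - timedelta(days=30)  # assumption: "last month" = 30 days
    total = 0.0
    for behavior, ts in events:
        base = BEHAVIOR_SCORE[behavior]
        if use_time_decay:
            base *= RECENT_WEIGHT if ts >= cutoff else OLDER_WEIGHT
        total += base
    return total
```

With time decay switched off, one browse plus one contact event gives the indirect score of 7 from the text; with time decay on, a single browse event older than a month gives 2 × 35%.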

According to the preset implicit scoring rule, a preset total score c can be determined, together with the weight percentages of the corresponding behavior data, for example $w_a = \{browse_a, favorite_a, 400_a, im_a\}$. The score $r_x$ of user u for a house source $item_x$ arising from a browsing behavior is then the product of c and $browse_a$, so that the scoring matrix of user u, $R_u = [r_1, r_2, r_3, \ldots, r_k]$, can be obtained. If user u generates multiple behaviors on the same $item_x$ within a period of time, the scores are weighted and summed according to the time decay (that is, each score value can be split into a sum of time-weighted behavior scores). The scoring matrix of all users U within the period is $R = \{R_u, u \in U\}$.
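A minimal sketch of how the per-user rows $R_u$ might be assembled from a behavior log is given below. The total score `C_TOTAL`, the weight values in `W_A`, the log layout and the `time_weight` hook are all assumptions made for illustration; only the idea of multiplying the total score by a behavior weight and summing time-weighted scores per house source comes from the text.

```python
# Assumed configuration: total score c and per-behavior weight percentages w_a.
# The keys mirror the symbols in the text; the numeric values are invented.
C_TOTAL = 10.0
W_A = {"browse": 0.20, "favorite": 0.30, "400": 0.25, "im": 0.25}


def build_score_rows(behavior_log, time_weight=lambda age_days: 1.0):
    """Return {user: {item: score}}, i.e. one sparse row R_u per user.

    behavior_log: iterable of (user, item, behavior, age_days) tuples.
    time_weight:  hypothetical decay function applied to each behavior.
    """
    rows = {}
    for user, item, behavior, age_days in behavior_log:
        score = C_TOTAL * W_A[behavior] * time_weight(age_days)
        user_row = rows.setdefault(user, {})
        # Several behaviors on the same item are summed into one score.
        user_row[item] = user_row.get(item, 0.0) + score
    return rows
```

A sparse dictionary per user is used here instead of a dense matrix because most users act on only a few house sources.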

Specifically, the step S2) of determining a location area range according to the user information of the selected user, and selecting a part of users according to the location area range may include:

determining an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and selecting a part of users within the administrative area range, wherein the physical location information or the network address information belongs to the user information of the selected user, and the administrative area range is used as the current location area range of the selected user;

the user information can be obtained by collecting the log of the user terminal and the stored user file; the physical location information may be residential address information, mailing address information, or collected GPS location information of the user terminal.

The step S2) of determining the location area range and selecting a part of users according to the location area range may include:

determining a current administrative area range according to the current location information of the selected user, and, according to the current administrative area range and in combination with location record information in the user image of the selected user, selecting a part of users who are outside the current administrative area range and within the administrative area range corresponding to the location record information; and, in step S3), filtering the recommended house source candidate set using a filtering condition, so that the filtered recommended house source candidate set belongs to the current administrative area range;

the user image may include a user file and behavior logs about house sources (occurrence time and behavior type), and the user file may record location record information such as city information, current real-time location information and historical track location records, as well as gender information and age information. The current location information may be real-time GPS positioning information, and the location record information may be recorded province and city information, so that preference features that cannot be obtained within the current administrative area range can still be used in the similarity calculation against the preference features of the selected user. For example, a recommended user browses a house with floor heating and heating facilities but has little behavior data (for example, a recently registered new user), so the constructed scoring matrix may be too sparse; however, the user image of this user contains a location record of an administrative area with public heating facilities. In this case, the user scoring matrix can be interpolated using undifferentiated preference data (for example, scores for house sources with a low total price and a large floor area, that is, the user is assumed by default to have behavior data on such house sources), and a part of users outside the current administrative area range and within the administrative area range corresponding to the location record information is then selected for the similarity calculation with this user.
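The two ways of selecting the part of users described above can be sketched as follows. The `UserProfile` fields and the function name are hypothetical; how a district is resolved from GPS, address or network address information is outside the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    current_district: str  # assumed to be resolved beforehand from GPS or IP
    recorded_districts: set = field(default_factory=set)  # location records


def select_partial_users(selected, all_users, use_location_records=False):
    """Pick the users whose behavior data will form the second preference set."""
    if use_location_records:
        # Second variant: outside the current district but inside a district
        # found in the selected user's location records.
        targets = selected.recorded_districts - {selected.current_district}
    else:
        # First variant: same administrative district as the selected user.
        targets = {selected.current_district}
    return [u for u in all_users
            if u.user_id != selected.user_id and u.current_district in targets]
```

The second variant corresponds to the sparse new-user case in the text, where preference data from a previously recorded district supplements the current one.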

Specifically, after the second house source set is determined and up to the calculation of the similarity of the feature vector set, the method may include the following.

The factorization of the scoring matrix can be carried out flexibly, for example by gradient descent or by least squares. In some implementations an Alternating Least Squares (ALS) model may be used: as in fig. 2, ALS training is performed on the constructed user scoring matrix. Specifically, after the user scoring matrix R (the first scoring matrix) is obtained, the ALS model is trained iteratively on its objective function until training is complete, yielding a user hidden attribute matrix U and a house source hidden attribute matrix V. In the objective function, $r_{ij}$ is the score of user i for house source j, and $u_i$ and $v_j$ respectively denote the hidden attribute feature vector of the i-th user and the hidden attribute feature vector of the j-th house source. The second scoring matrix is then substituted into the trained ALS model to complete the matrix decomposition, giving a hidden attribute matrix V describing the preference of the selected user for house sources.
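For orientation, a commonly used regularized form of the ALS objective is given below as an assumption about the kind of function being optimized; it is not taken from the patent. Here $r_{ij}$ is the score of user i for house source j, $u_i$ and $v_j$ are the hidden attribute feature vectors of the i-th user and the j-th house source, and the observed index set $\Omega$ and the regularization weight $\lambda$ are illustrative symbols.

```latex
\min_{U,V} \; \sum_{(i,j)\in\Omega} \left( r_{ij} - u_i^{\top} v_j \right)^2
  + \lambda \left( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2 \right)

% With V held fixed, each user vector has a closed-form least-squares update
% (\Omega_i is the set of house sources scored by user i):
u_i = \Bigl( \sum_{j \in \Omega_i} v_j v_j^{\top} + \lambda I \Bigr)^{-1}
      \sum_{j \in \Omega_i} r_{ij} \, v_j
```

The update for $v_j$ with U held fixed is symmetric, and alternating between the two updates is what gives the method its name.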

Specifically, after the trained vector decomposition model is obtained and when at least the hidden attribute feature vectors for the house source factors in the second house source set are being obtained, the method may specifically be as follows.

For housing demand, user behavior exhibits clustering within a specific area. If this clustering is not taken into account, the pairwise similarity calculation between all house source Items on which behavior has been generated contains a great deal of redundancy, so that the recommendation engine consumes a lot of time and occupies excessive hardware resources. Therefore, when calculating the similarity between the house source Items on which behavior has been generated, a constraint condition is added. The constraint condition may be set by a location area range, for example taking the administrative area range as the constraint for the similarity calculation, so that only similarities between house sources that appear together in the behavior sets of one or more users in the same administrative area are considered. The set of house source Items on which user u has generated behavior is $i_u = \{item_1, item_2, \ldots, item_x, \ldots, item_k\}$, where k is a positive integer and $item_x \in Item$; the set of Items on which user l has generated behavior is $i_l = \{item_1, item_2, \ldots, item_y, \ldots, item_m\}$, where m is a positive integer, $item_y \in Item$, y is an integer, and $1 \le y \le m$. As shown in fig. 2, the second house source set is matched against the first house source set to find the house sources that co-occur, and house source co-occurrence pairs are then formed; the ALS training process may use only the co-occurring house sources, optionally filtering out the remaining non-co-occurring house sources, and the house source hidden attributes are calculated from them. In particular, the house sources in $i_u \cap i_l$ (a house source belonging to $i_u \cap i_l$ is a co-occurring house source) are used to construct the set of co-occurrence house source pairs $\{\langle item_1, item_2\rangle, \langle item_3, item_4\rangle, \ldots\}$, and the set I of all co-occurrence house source pairs in the whole administrative area is constructed. In some implementations, for example for user u and user l, the co-occurrence matrix is an N × N matrix with $N = \max\{k, m\}$, whose row and column dimension variables are $(item_1, item_2, \ldots, item_N)$. The entry in row $item_1$ and column $item_2$ may be filled with the similarity between the hidden attribute feature vector of $item_1$ (which may correspond to the first house source set) and the hidden attribute feature vector of $item_2$ (which may correspond to the second house source set); a default value may be set at positions of the co-occurrence matrix that have no co-occurrence feature, and the other co-occurrence house source pairs are handled in the same way.
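One possible reading of the co-occurrence pair construction is sketched below: for every two users in the same administrative district, the house sources in the intersection of their behavior sets are paired up. The function name and the input layout are assumptions, and the text leaves open exactly how pairs are enumerated, so this is an illustration rather than the definitive procedure.

```python
from itertools import combinations


def cooccurrence_pairs(user_items, user_district, district):
    """Collect house source pairs that co-occur for users of one district.

    user_items:    {user_id: set of item ids the user acted on}
    user_district: {user_id: administrative district id}
    district:      the district used as the constraint condition
    """
    # Constraint condition: keep only users of the given district.
    local = [u for u in sorted(user_items) if user_district.get(u) == district]
    pairs = set()
    for u, l in combinations(local, 2):
        shared = user_items[u] & user_items[l]  # co-occurring items, i_u ∩ i_l
        # sorted() gives every unordered pair a single canonical form.
        pairs.update(combinations(sorted(shared), 2))
    return pairs
```

Restricting the enumeration to one district and to shared items is what keeps the number of pairs far below the full N × N comparison.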

Further, after the user scoring matrix has been decomposed using ALS, the hidden attribute features V, which can represent the latent objective characteristics of the Items, are obtained (that is, once training is complete, a given user scoring matrix can be substituted in and its hidden attribute feature vectors $V_x$ or $V_y$ can be calculated quickly, where x or y is a positive integer not greater than N). The co-occurrence Item pair set constructed from the administrative-area granularity constraint and the Item co-occurrence feature is $I = \{\langle item_1, item_2\rangle, \langle item_3, item_4\rangle, \ldots\}$. The similarity set is therefore $S = \{\cos\langle V_1, V_2\rangle, \cos\langle V_3, V_4\rangle, \ldots\}$, where $\cos\langle\cdot,\cdot\rangle$ denotes the cosine similarity, and $V_x$ or $V_y$ may be taken from the hidden attribute feature matrix V.
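The similarity set S can be illustrated with a small pure-Python sketch. The function names and the dictionary layout of the item vectors are assumptions; the cosine similarity itself is the standard definition, computed only for the co-occurrence pairs rather than for all item combinations.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pair_similarities(pairs, item_vectors):
    """Return {(item_a, item_b): cosine similarity} for the given pairs only."""
    return {(a, b): cosine(item_vectors[a], item_vectors[b]) for a, b in pairs}
```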

Specifically, after determining the second room source set and before forming the similarity set, the method may further include:

weighting the similarity of the house source pairs using the co-occurrence score, where the weighting may be the product of the co-occurrence score and the corresponding similarity, with any summation following the applicable calculation rule; for example, if the co-occurrence scores take the form of a co-occurrence score matrix and the similarities take the form of a similarity matrix, the weighting is the product of the co-occurrence score matrix and the similarity matrix;

On the basis of the set of co-occurrence house source pairs (also referred to as house source co-occurrence pairs or house source pairs; filtering is performed using the co-occurrence house source pairs), as shown in fig. 2, a co-occurrence score can be calculated for each house source pair from the two sets of users who have behavior data (the behavior may be only browsing or clicking) on the two house sources of the pair. That is, the co-occurrence score is calculated from the number of users in the intersection and the number of users in the union of the first user set and the second user set. Specifically, users in the intersection are users who have browsed or clicked both house sources of the co-occurrence pair, while users in the union are users who have browsed or clicked at least one of the two house sources. The co-occurrence score may be the number of users in the intersection divided by the number of users in the union; the higher the co-occurrence score, the higher the probability that the two house sources of the pair are clicked together. For simplicity of illustration, a Co-occurrence Score (CS) may be defined:

$$CS_{\langle Item_1, Item_2\rangle} = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$

where $U_1 = \{u_1, u_2, \ldots, u_p\}$ is the set of users (p is a positive integer) who have generated behavior on Item1, and $U_2 = \{u_1, u_2, \ldots, u_q\}$ is the set of users (q is a positive integer) who have generated behavior on Item2. Since Item1 and Item2 form a house source co-occurrence pair, in terms of the hidden attribute feature vectors $V_1$ and $V_2$ of Item1 and Item2, $CS_{\langle Item_1, Item_2\rangle}$ can be written as $CS_{\langle V_1, V_2\rangle}$.

By analogy, for all the house source pairs obtained above:

$$CS = \{CS_{\langle V_1, V_2\rangle}, CS_{\langle V_3, V_4\rangle}, \ldots\}$$

where CS is the co-occurrence score set.
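A co-occurrence score computed from the number of users in the intersection and in the union of the two user sets can be sketched as below, taking the score as the ratio of the two counts (a Jaccard-style ratio). The function names and the `item_users` mapping are assumptions, as is the choice to return 0.0 when neither house source has any users.

```python
def cooccurrence_score(users_a, users_b):
    """Ratio |U1 ∩ U2| / |U1 ∪ U2|; 0.0 when both user sets are empty."""
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / len(union)


def cooccurrence_scores(pairs, item_users):
    """Return {(item_a, item_b): CS} given {item: set of users who acted on it}."""
    return {
        (a, b): cooccurrence_score(item_users.get(a, set()), item_users.get(b, set()))
        for a, b in pairs
    }
```

For example, if three users acted on Item1, three on Item2, and two of them acted on both, the union has four users and the score is 2 / 4 = 0.5.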

Further, as in fig. 2, the similarity between Items is weighted with the co-occurrence scores. Specifically, the cosine similarity of each Item pair calculated from the hidden attribute feature vectors above is weighted by the CS calculated for that Item pair to obtain the final similarity score of the Item pair; in some cases the final similarity score is obtained as the product of CS and S. The particular form of the product may be chosen according to how the global data are defined or organized, for example a matrix multiplication form, in which CS or S may be handled in transposed form. A recommendation candidate set of Items is then generated according to the final similarity score. Optionally, the calculated similarity of the Item pairs is matrix-multiplied with the scoring matrix of the recommended user, the resulting score values are sorted, the score values higher than a preset threshold are selected after sorting, and the recommendation candidate set of the recommended user is generated from the house source Items corresponding to the selected score values.
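The weighting and candidate generation steps can be sketched with dictionaries standing in for the matrices. The function names are hypothetical, and two choices here are assumptions rather than requirements of the text: pairs are treated as symmetric, and house sources the user has already acted on are left out of the candidates.

```python
def weighted_similarity(similarity, cs):
    """Final pair score = cosine similarity x co-occurrence score."""
    return {pair: sim * cs.get(pair, 0.0) for pair, sim in similarity.items()}


def recommend(user_scores, final_sim, threshold):
    """Score unseen items as sum(user score of a seen item x pair similarity).

    user_scores: {item: score} row of the recommended user.
    Returns [(item, score)] sorted by score, keeping scores above threshold.
    """
    candidates = {}
    for (a, b), sim in final_sim.items():
        for seen, other in ((a, b), (b, a)):  # pairs are treated as symmetric
            if seen in user_scores and other not in user_scores:
                candidates[other] = candidates.get(other, 0.0) + user_scores[seen] * sim
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [(item, score) for item, score in ranked if score > threshold]
```

The nested loop is the sparse equivalent of multiplying the user's score row by the weighted similarity matrix and then sorting and thresholding the result.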

The embodiment of the invention also provides a recommendation method based on the collaborative filtering method, which comprises the following steps:

s1), determining a third room source set, and forming preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, wherein the room sources in the third room source set are recorded with behavior data of the recommended user;

s2) determining a recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.

For example, a recommended house source received by the recommended user may be a house source on which the user has past behavior data, or a house source on which the user has no past behavior data but which is recorded for users participating in the similarity calculation. Alternatively, when house sources are about to be sent to the recommended user and it is determined that the score values of all the house sources to be recommended are lower than a score threshold, a default recommended house source within the location area range is obtained and used as the recommended house source; the default recommended house source may be a nearby house source with a reduced price, a newly listed house source, a house source whose evaluation score is higher than an evaluation score threshold, and the like. The default recommended house sources may also be interleaved into the recommended house source candidate set, with one or more default recommended house sources inserted after every one or more recommended house sources in the obtained candidate set, before the recommended house sources are sent to the recommended user. The location of the recommended user is within the aforementioned location area range.
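The fallback and interleaving behavior described above can be sketched as follows. The function name, the `every` parameter and the choice of inserting exactly one default house source at a time are assumptions made for illustration.

```python
def final_recommendations(candidates, defaults, score_threshold, every=2):
    """Merge scored candidates with default recommended house sources.

    candidates: list of (item, score), best first.
    defaults:   list of default items (e.g. nearby price-reduced listings).
    """
    # Fallback: no candidate reaches the threshold, so send defaults only.
    if all(score < score_threshold for _, score in candidates):
        return list(defaults)
    result, pending = [], list(defaults)
    for index, (item, _) in enumerate(candidates, start=1):
        result.append(item)
        # Interleave one default after every `every` recommended items.
        if index % every == 0 and pending:
            result.append(pending.pop(0))
    return result
```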

The embodiment of the invention improves the recommendation accuracy of ItemCF and the operating efficiency of the house source recommendation system; by incorporating the CS weighting optimization strategy, it improves the effectiveness of the similarity calculation and the interpretability of the algorithm, and it reduces the computation time and the hardware requirements of the model.

Example 2

Based on the inventive concept of embodiment 1, an embodiment of the present invention provides a collaborative filtering apparatus, which may include:

the first selection module may be configured to determine a first room source set, and form preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, where the room sources in the first room source set are recorded with behavior data of the selected user;

the second selection module may be configured to determine a location area range, select a part of users according to the location area range, determine a second room source set, and form preference data of the room sources in the second room source set into a second preference data set corresponding to the part of users, where the room sources in the second room source set are recorded with behavior data of the part of users;

the similarity calculation module can be used for determining a to-be-trained vector decomposition model, performing iterative calculation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative calculation is completed;

the similarity calculation module may be further configured to perform factorization with at least preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed.

Optionally, the first selection module may be specifically configured to obtain, by using the behavior data of the selected user and in combination with a preset implicit rating rule, a rating of the house source in the first house source set, record the rating corresponding to the house source in the first house source set as preference data, and form, through the preference data, a first preference data set corresponding to the selected user.

Optionally, the second selection module may be specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.

Optionally, the second selection module may be specifically configured to determine a current administrative area range according to the current location information of the selected user, and select, according to the current administrative area range and in combination with location record information in the user image of the selected user, a part of users that are outside the current administrative area range and are within the administrative area range corresponding to the location record information.

Optionally, the first selection module may be specifically configured to take a first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to take a second preference data set corresponding to the part of users as a second scoring matrix, where any one preference data is a score;

the similarity calculation module may be specifically configured to determine a to-be-trained vector decomposition model, perform iterative calculation using the first scoring matrix in combination with the to-be-trained vector decomposition model, and obtain a trained vector decomposition model after the iterative calculation is completed, where the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares models, and the trained vector decomposition model has preference feature information;

the similarity calculation module may be further specifically configured to perform factorization on the second scoring matrix in combination with the trained vector decomposition model to obtain at least hidden attribute feature vectors related to the house source factors in the second house source set, form a feature vector set, and calculate a similarity of the feature vector set.

Optionally, the similarity calculation module may be further specifically configured to, after obtaining the trained vector decomposition model and when at least the implicit attribute feature vector related to the room source factor in the second room source set is obtained, determine a co-occurrence room source pair set according to the room sources in the first room source set and the room sources in the second room source set,

the similarity calculation module may be further specifically configured to filter the second scoring matrix using the correspondence between the house sources in the co-occurrence house source pair set and their scores, and to perform factorization using the filtered second scoring matrix in combination with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.

Optionally, the collaborative filtering apparatus may further include:

a co-occurrence weighting module, configured to determine a set of co-occurrence house source pairs according to the house sources in the first house source set and the house sources in the second house source set after the determining the second house source set and before forming the similarity set,

the co-occurrence weighting module may be configured to form a first user set and a second user set, where each user in the first user set is recorded with behavior data corresponding to one of the room sources in the room source pair set, and each user in the second user set is recorded with behavior data corresponding to the other of the room sources in the room source pair,

the co-occurrence weighting module may be configured to calculate a co-occurrence score by a number of users in the intersection and a number of users in the union of the first set of users and the second set of users,

the co-occurrence weighting module may be further operable to weight the similarity using the co-occurrence score.

An embodiment of the present invention further provides a recommendation apparatus, where the recommendation apparatus includes:

Optionally, the recommended user in the third selection module is a user selected from the aforementioned location area range.

Example 3

Based on the inventive concept of embodiment 1, an embodiment of the present invention provides a system for recommending a house source, including: one or more programs that may form one or more services in some production environments, each program or each service may perform one or more steps; in some implementations, one or more programs may be compiled or encrypted into an executable engine, which may call the output data of some executable programs, and which may rely on or have some function libraries and model libraries; the engine may be a recommendation engine, and the processing granularity of the recommendation engine may be a granularity determined by an administrative district;

a recommendation engine configured to execute instructions corresponding to the method described in embodiment 1.

According to the method, co-occurrence Item pairs are constructed using the administrative-area granularity constraint and the co-occurrence statistics of Items in user behavior data, which reduces the time complexity of the similarity calculation between Items and improves computational efficiency; the Item co-occurrence in user behavior and the administrative-area granularity constraint effectively reduce the number of Item pairs to be calculated, and thus greatly reduce the required hardware resources;

the method calculates the CS between Items as an estimate of users' preference for browsing the two Items together, and uses the CS as a weight on the similarity between Item pairs. Traditional ALS-based ItemCF does not consider the Item co-occurrence in user behavior, that is, the users' preference for Items; the method effectively fuses the CS with the correlation between the hidden attributes of Items, and the CS weighting optimization strategy improves the effectiveness of the similarity calculation.

Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.

Those skilled in the art will understand that all or part of the steps in the method according to the above embodiments may be implemented by a program, which is stored in a storage medium and includes several instructions to enable a single chip, a chip, or a processor (processor) to execute all or part of the steps in the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In addition, any combination of various different implementation manners of the embodiments of the present invention is also possible, and the embodiments of the present invention should be considered as disclosed in the embodiments of the present invention as long as the combination does not depart from the spirit of the embodiments of the present invention.

Claims

1. A collaborative filtering method, characterized in that the collaborative filtering method comprises:

performing factorization by at least using preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculating similarity of the feature vector set, and forming a similarity set after calculation is completed;

wherein after the determining the second room source set and before forming the similarity set, further comprising:

weighting the similarity with the co-occurrence score;

wherein the co-occurrence score CS is:

$$CS = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$

where $U_1 = \{u_1, u_2, \ldots, u_p\}$ is the set of users who have generated behavior on Item1, $U_2 = \{u_1, u_2, \ldots, u_q\}$ is the set of users who have generated behavior on Item2, Item1 and Item2 form a house source co-occurrence pair, and p and q are both positive integers.

2. The collaborative filtering method according to claim 1, wherein the forming of the preference data of the house resources in the first house resource set into a first preference data set corresponding to the selected user specifically includes:

3. The collaborative filtering method according to claim 1, wherein the determining a location area range and selecting a portion of users based on the location area range comprises:

4. The collaborative filtering method according to claim 1, wherein the determining a location area range and selecting a portion of users based on the location area range comprises:

5. The collaborative filtering method according to claim 1, wherein after the determining the second room source set and until the similarity of the feature vector set is calculated, the method comprises:

6. The collaborative filtering method according to claim 5, wherein after the obtaining of the trained vector decomposition model and when at least the implicit attribute feature vectors for the house-source factors in the second house-source set are obtained, specifically:

7. A recommendation method, wherein the similarity set is obtained from the collaborative filtering method according to any one of claims 1 to 6, the recommendation method comprising:

8. The recommendation method according to claim 7, wherein the user is selected from the range of the location area determined in the collaborative filtering method according to any one of claims 1 to 6 as the recommended user.

9. A collaborative filtering apparatus, comprising:

the similarity calculation module is further configured to perform factorization by using at least some preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed;

the collaborative filtering apparatus further comprises:

the co-occurrence weighting module is further configured to weight the similarity with the co-occurrence score;

wherein the co-occurrence score CS:

10. The collaborative filtering device of claim 9,

the first selection module is specifically configured to obtain scores of the house resources in the first house resource set by using the behavior data of the selected user in combination with a preset implicit scoring rule, record the scores corresponding to the house resources in the first house resource set as preference data, and form a first preference data set corresponding to the selected user through the preference data.

11. The collaborative filtering device of claim 9,

the second selection module is specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.

12. The collaborative filtering device of claim 9,

the second selection module is specifically configured to determine a current administrative area range according to the current location information of the selected user, and select, according to the current administrative area range and in combination with location record information in the user image of the selected user, a part of users that are outside the current administrative area range and are within the administrative area range corresponding to the location record information.

13. The collaborative filtering device of claim 9,

the first selection module is specifically configured to select a first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to select a second preference data set corresponding to the part of users as a second scoring matrix, where any one preference data set is a score;

14. The collaborative filtering device of claim 13,

the similarity calculation module is further specifically configured to determine a set of co-occurrence house source pairs according to the house sources in the first house source set and the house sources in the second house source set after the trained vector decomposition model is obtained and when at least the implicit attribute feature vectors related to the house source factors in the second house source set are obtained,

15. A recommendation apparatus, wherein the similarity set is obtained in the collaborative filtering apparatus according to any one of claims 9 to 14, the recommendation apparatus comprising:

16. The recommendation device according to claim 15, wherein the recommended user in the third selection module is a user selected from the location area range determined by the second selection module in the collaborative filtering device according to any one of claims 9 to 14.

17. A system for recommending a house source, the system comprising:

a recommendation engine configured to execute instructions corresponding to the method of any of claims 1 to 8.

18. An apparatus for processing premises information, comprising:

at least one processor;

a memory coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1 to 8 by executing the instructions stored by the memory.

19. A computer readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 8.