CN111611499B - Collaborative filtering method, collaborative filtering device and collaborative filtering system - Google Patents

Publication number
CN111611499B
Authority
CN
China
Prior art keywords
house
room
user
source
preference data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010470716.3A
Other languages
Chinese (zh)
Other versions
CN111611499A (en)
Inventor
李政浩
董天南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Seashell Housing Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seashell Housing Beijing Technology Co Ltd
Priority to CN202010470716.3A
Publication of CN111611499A
Application granted
Publication of CN111611499B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0259 Targeted advertisements based on store location
    • G06Q30/0261 Targeted advertisements based on user location
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Abstract

The invention provides a collaborative filtering method, a collaborative filtering device and a collaborative filtering system, and belongs to the technical field of house information processing. The method comprises: determining a first house source set, and forming the preference data of the house sources in the first house source set into a first preference data set corresponding to a selected user; determining a location area range, selecting a part of users according to the location area range, determining a second house source set, and forming the preference data of the house sources in the second house source set into a second preference data set corresponding to the part of users; and acquiring a trained vector decomposition model through the second preference data set, obtaining a feature vector set corresponding to the house sources in the second house source set by using the second preference data set and the trained vector decomposition model, calculating the similarity of the feature vector set, and forming a similarity set after the calculation is finished. Through the similarity calculation, the method determines recommended house sources that match the user's preference characteristics.

Description

Collaborative filtering method, collaborative filtering device and collaborative filtering system
Technical Field
The invention relates to the technical field of house information processing, in particular to a collaborative filtering method, a recommendation method, a collaborative filtering device, a recommendation device, a system, equipment and a computer readable storage medium.
Background
In item-based collaborative filtering algorithms, recommendation lists are generally generated from the similarities among items, and such algorithms have a time complexity of O(n²). In the current era of information overload, however, the items of information to be handled are countless, for example house information (a house source, as the item to be processed). As the data volume n grows, the time cost and hardware resource consumption of calculating the similarity between every pair of items become very high, so an effective calculation strategy is needed to reduce the calculation cost and save computing resources.
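To make the quadratic cost concrete, the following minimal sketch counts the unordered item pairs that an all-pairs similarity calculation would have to visit. The function name and the catalogue sizes are illustrative assumptions, not figures from the patent.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered item pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical catalogue sizes, chosen only for illustration.
for n in (1_000, 100_000):
    print(n, pair_count(n))

# Sanity check against explicit enumeration for a tiny catalogue.
assert pair_count(5) == len(list(combinations(range(5), 2)))
```

Growing the catalogue by a factor of 100 grows the number of pairs by roughly a factor of 10,000, which is why restricting the set of items before the similarity step matters.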
Disclosure of Invention
The invention aims to provide a collaborative filtering method, a collaborative filtering device and a collaborative filtering system, which solve the technical problems of low recall rate of high-correlation effective house resources, long time for similarity calculation, high time complexity, excessive occupation of hardware resources and the like caused by the existence of a large amount of redundant similarity calculation with poor correlation in the prior art.
According to an aspect of an embodiment of the present disclosure, there is provided a collaborative filtering method including:
determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;
determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;
determining a to-be-trained vector decomposition model, performing iterative computation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative computation is completed;
and performing factorization by at least utilizing preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculating the similarity of the feature vector set, and forming a similarity set after the calculation is finished.
In an embodiment of the present disclosure, forming the preference data of the house sources in the first house source set into a first preference data set corresponding to the selected user specifically includes:
obtaining scores for the house sources in the first house source set from the behavior data of the selected user in combination with a preset implicit scoring rule, recording the score corresponding to each house source in the first house source set as preference data, and forming the first preference data set corresponding to the selected user from the preference data.
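An implicit scoring rule of this kind can be sketched as a weighted sum over behavior events. The behavior labels, the weights and the cap below are hypothetical; the patent does not specify concrete values.

```python
from collections import defaultdict

# Hypothetical weight per behavior type (assumption, not from the patent).
BEHAVIOR_WEIGHTS = {"browse": 1.0, "favorite": 3.0, "call": 5.0, "im": 4.0, "share": 2.0}


def implicit_scores(events, weights=BEHAVIOR_WEIGHTS, cap=10.0):
    """Map (user, house, behavior) events to capped (user, house) -> score."""
    scores = defaultdict(float)
    for user, house, behavior in events:
        # Unknown behavior types contribute nothing.
        scores[(user, house)] += weights.get(behavior, 0.0)
    return {key: min(value, cap) for key, value in scores.items()}


events = [
    ("u1", "h1", "browse"),
    ("u1", "h1", "favorite"),
    ("u1", "h2", "browse"),
    ("u2", "h1", "call"),
]
preference_data = implicit_scores(events)
print(preference_data)
```

Each resulting score plays the role of one item of preference data; the scores of one user together form that user's preference data set.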
In another embodiment of the present disclosure, determining a location area range and selecting a part of users according to the location area range includes:
determining the administrative area range of the selected user according to the physical location information or the network address information of the selected user, and selecting a part of users within that administrative area range.
In another embodiment of the present disclosure, determining a location area range and selecting a part of users according to the location area range includes:
determining a current administrative area range according to the current location information of the selected user, and, based on the current administrative area range and the location record information in the user profile of the selected user, selecting a part of users who are outside the current administrative area range and within the administrative area range corresponding to the location record information.
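The two selection strategies above can be sketched as simple filters over a user-to-area mapping. The mapping, the area labels and the function names are hypothetical, and excluding the selected user from the result is an assumption made only for this sketch.

```python
def same_region_users(user_region, selected_user):
    """Users whose administrative area matches that of the selected user."""
    region = user_region[selected_user]
    # Assumption: the selected user is not part of the returned users.
    return sorted(u for u, r in user_region.items() if r == region and u != selected_user)


def cross_region_users(user_region, selected_user, recorded_regions):
    """Users in a previously recorded area that lies outside the current one."""
    current = user_region[selected_user]
    targets = set(recorded_regions) - {current}
    return sorted(u for u, r in user_region.items() if r in targets)


# Hypothetical data: current area per user, and areas recorded in u1's profile.
user_region = {"u1": "area_A", "u2": "area_A", "u3": "area_B", "u4": "area_C"}
profile_regions = {"u1": ["area_B", "area_A"]}

print(same_region_users(user_region, "u1"))
print(cross_region_users(user_region, "u1", profile_regions["u1"]))
```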
In yet another embodiment of the present disclosure, after determining the second house source set and up to calculating the similarity of the feature vector set, the method includes:
taking the first preference data set corresponding to the selected user as a first scoring matrix, and taking the second preference data set corresponding to the part of users as a second scoring matrix, wherein each item of preference data is a score;
determining a to-be-trained vector decomposition model, performing iterative computation on the first scoring matrix with the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative computation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model carries preference characteristic information;
and performing factorization on the second scoring matrix with the trained vector decomposition model to obtain at least the hidden attribute feature vectors of the house source factors in the second house source set, forming a feature vector set, and calculating the similarity of the feature vector set.
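As a rough illustration of the factorization and similarity steps, the sketch below runs a small alternating least squares loop on one scoring matrix and compares the resulting house vectors. It is a simplification under stated assumptions: how the trained model is carried over from the first scoring matrix to the second is not detailed in this passage, so the sketch factorizes a single matrix; the rank, regularization, iteration count and example scores are hypothetical; and cosine similarity is assumed as the similarity measure.

```python
import numpy as np


def als(R, rank=2, reg=0.1, iters=20, seed=0):
    """Factor R (users x houses, 0 = no score) as U @ V.T by alternating least squares."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    mask = R > 0
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix V: ridge regression per user over the houses that user scored.
        for u in range(n_users):
            Vu = V[mask[u]]
            if len(Vu):
                U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, mask[u]])
        # Fix U: ridge regression per house over the users who scored it.
        for i in range(n_items):
            Ui = U[mask[:, i]]
            if len(Ui):
                V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[mask[:, i], i])
    return U, V


def cosine_similarity_matrix(V):
    """Pairwise cosine similarity between the rows of V."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    unit = V / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Hypothetical scoring matrix: rows are users, columns are house sources.
R2 = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)
U, V = als(R2)
similarity = cosine_similarity_matrix(V)
print(np.round(similarity, 2))
```

The rows of `V` stand in for the hidden attribute feature vectors of the house sources, and `similarity` for the similarity set.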
In another embodiment of the present disclosure, after obtaining the trained vector decomposition model and when obtaining at least the hidden attribute feature vectors of the house source factors in the second house source set, the method specifically includes:
determining a co-occurrence house source pair set according to the house sources in the first house source set and the house sources in the second house source set;
and filtering the second scoring matrix by using the correspondence between the house sources in the co-occurrence house source pair set and the scores, and then performing factorization on the filtered second scoring matrix with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
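One possible reading of the co-occurrence filtering is sketched below. It assumes that a pair is formed by one house source from the first set and one from the second set that were both scored by at least one common user, and that filtering keeps only scores for house sources appearing in some pair. Both assumptions, and all names and data, are hypothetical; the passage does not fix these details.

```python
from itertools import product


def cooccurrence_pairs(first_houses, second_houses, scores):
    """Pairs (a, b), a from the first set and b from the second, scored by a common user."""
    users_by_house = {}
    for user, house in scores:
        users_by_house.setdefault(house, set()).add(user)
    pairs = set()
    for a, b in product(first_houses, second_houses):
        if a != b and users_by_house.get(a, set()) & users_by_house.get(b, set()):
            pairs.add((a, b))
    return pairs


def filter_scores(second_scores, pairs):
    """Keep only the scores whose house source appears in some co-occurrence pair."""
    kept = {house for pair in pairs for house in pair}
    return {key: s for key, s in second_scores.items() if key[1] in kept}


first_houses = {"h1", "h2"}
second_houses = {"h2", "h3", "h4"}
scores = {
    ("u1", "h1"): 4.0, ("u1", "h3"): 2.0,
    ("u2", "h2"): 5.0, ("u2", "h3"): 1.0,
    ("u3", "h4"): 3.0,
}
pairs = cooccurrence_pairs(first_houses, second_houses, scores)
second_scores = {k: v for k, v in scores.items() if k[1] in second_houses}
filtered = filter_scores(second_scores, pairs)
print(sorted(pairs), sorted(filtered))
```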
In yet another embodiment of the present disclosure, after the determining the second room source set and before forming the similarity set, the method further includes:
determining a co-occurrence house source pair set according to house sources in the first house source set and house sources in the second house source set;
forming a first user set and a second user set, wherein each user in the first user set has recorded behavior data for one house source of a house source pair in the co-occurrence house source pair set, and each user in the second user set has recorded behavior data for the other house source of that house source pair;
calculating a co-occurrence score according to the number of users in the intersection and the number of users in the union of the first user set and the second user set;
weighting the similarity with the co-occurrence score.
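A co-occurrence score built from the intersection and union sizes can be sketched as a Jaccard ratio. Taking the ratio of the two counts and multiplying it into the similarity are assumptions of this sketch; the text only states that both counts are used and that the score weights the similarity.

```python
def cooccurrence_score(users_a, users_b):
    """Share of users common to both house sources among all users of either one."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


def weight_similarity(similarity, users_a, users_b):
    """Scale a house-pair similarity by its co-occurrence score (assumed multiplicative)."""
    return similarity * cooccurrence_score(users_a, users_b)


# Hypothetical user sets for the two house sources of one pair.
first_user_set = {"u1", "u2", "u3"}
second_user_set = {"u2", "u3", "u4"}

score = cooccurrence_score(first_user_set, second_user_set)
weighted = weight_similarity(0.8, first_user_set, second_user_set)
print(score, weighted)
```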
According to another aspect of the embodiments of the present disclosure, there is provided a recommendation method including:
determining a third room source set, and forming preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, wherein the room sources in the third room source set are recorded with behavior data of the recommended user;
and determining a recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.
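Combining the similarity set with the third preference data set can be sketched as item-based scoring followed by sorting. The storage layout (a directional mapping from a house-source pair to its similarity), the weighted-sum combination and the `top_k` cut-off are assumptions for illustration only.

```python
def recommend(similarity_set, preferences, top_k=2):
    """Rank unseen house sources by the preference-weighted sum of their similarities."""
    totals = {}
    for (seen, other), sim in similarity_set.items():
        # Only score house sources the recommended user has no behavior on yet.
        if seen in preferences and other not in preferences:
            totals[other] = totals.get(other, 0.0) + preferences[seen] * sim
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical similarity set and third preference data set.
similarity_set = {
    ("h1", "h3"): 0.9, ("h1", "h4"): 0.2,
    ("h2", "h3"): 0.5, ("h2", "h5"): 0.7,
}
third_preferences = {"h1": 4.0, "h2": 2.0}

candidates = recommend(similarity_set, third_preferences)
print(candidates)
```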
In an embodiment of the present disclosure, a user is selected from the location area range as a recommended user.
According to still another aspect of an embodiment of the present disclosure, there is provided a collaborative filtering apparatus including:
a first selection module, configured to determine a first house source set and form the preference data of the house sources in the first house source set into a first preference data set corresponding to a selected user, wherein the house sources in the first house source set are recorded with behavior data of the selected user;
the second selection module is used for determining a position area range, selecting a part of users according to the position area range, then determining a second room source set, and forming preference data of the room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;
the similarity calculation module is used for determining a to-be-trained vector decomposition model, performing iterative calculation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative calculation is completed;
the similarity calculation module is further configured to perform factorization by using at least some preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed.
In an embodiment of the present disclosure, the first selection module is specifically configured to obtain, by using the behavior data of the selected user and in combination with a preset implicit rating rule, a rating of a house source in the first house source set, record a rating corresponding to a house source in the first house source set as preference data, and form, through the preference data, a first preference data set corresponding to the selected user.
In another embodiment of the present disclosure, the second selection module is specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.
In yet another embodiment of the present disclosure, the second selection module is specifically configured to determine a current administrative area range according to the current location information of the selected user, and, based on the current administrative area range and the location record information in the user profile of the selected user, select a part of users who are outside the current administrative area range and within the administrative area range corresponding to the location record information.
In yet another embodiment of the present disclosure, the first selection module is specifically configured to select a first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to select a second preference data set corresponding to the part of users as a second scoring matrix, where any one preference data is a score;
the similarity calculation module is specifically configured to determine a to-be-trained vector decomposition model, perform iterative computation on the first scoring matrix with the to-be-trained vector decomposition model, and obtain a trained vector decomposition model after the iterative computation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model carries preference characteristic information;
the similarity calculation module is further specifically configured to perform factorization on the second scoring matrix in combination with the trained vector decomposition model to obtain at least hidden attribute feature vectors related to the house source factors in the second house source set, form a feature vector set, and calculate similarity of the feature vector set.
In yet another embodiment of the present disclosure, the similarity calculation module is further specifically configured to, after the trained vector decomposition model is obtained and when at least the hidden attribute feature vectors of the house source factors in the second house source set are being obtained, determine a co-occurrence house source pair set according to the house sources in the first house source set and the house sources in the second house source set,
the similarity calculation module is further specifically configured to filter the second scoring matrix by using the correspondence between the house sources in the co-occurrence house source pair set and the scores, and to perform factorization on the filtered second scoring matrix with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
In yet another embodiment of the present disclosure, the collaborative filtering apparatus further includes:
a co-occurrence weighting module, configured to determine a set of co-occurrence room source pairs according to the room sources in the first room source set and the room sources in the second room source set after the second room source set is determined and before the similarity set is formed,
the co-occurrence weighting module is further configured to form a first user set and a second user set, wherein each user in the first user set has recorded behavior data for one house source of a house source pair in the co-occurrence house source pair set, and each user in the second user set has recorded behavior data for the other house source of that house source pair,
the co-occurrence weighting module is used for calculating a co-occurrence score through the number of users in the intersection and the number of users in the union of the first user set and the second user set,
the co-occurrence weighting module is further configured to weight the similarity using the co-occurrence score.
According to still another aspect of the embodiments of the present disclosure, there is provided a recommendation apparatus including:
a third selection module, configured to determine a third room source set, and form preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, where the room sources in the third room source set are recorded with behavior data of the recommended user;
and the recommending module is used for determining the recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.
In an embodiment of the present disclosure, the recommended user in the third selection module is a user selected from the aforementioned location area range.
According to yet another aspect of embodiments of the present disclosure, there is provided a system comprising:
a recommendation engine configured to execute instructions corresponding to the foregoing method.
According to still another aspect of the embodiments of the present disclosure, there is provided an apparatus for recommending a house source, including:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the aforementioned method by executing the instructions stored in the memory.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the aforementioned method.
Corresponding to the above, the invention takes the house sources on which a user has generated behavior as the user's preference features (explicit and/or implicit preference features) and, in combination with the location area range, selects a part of users who may have preference features similar to that user's, which significantly reduces the complexity of calculating the similarity of the house sources on which the user and the part of users have generated behavior. The resulting similarity set retains its correspondence with the house sources; after the similarity set is combined with the third preference data set, this correspondence carries over to the similar preference data set (the combined result), so that by sorting the similar preference data set by magnitude, several items of similar preference data and the recommended house sources corresponding to them can be determined;
the invention quantifies the user's implicit preference features using the behavior data and the scoring rule, which, compared with relying on explicit preference features, keeps the collected data from being too sparse;
the invention takes into account that users' preferences for houses cluster by region, and in particular that the user group within one administrative area has more similar preference features for house sources; on the one hand, a large number of essentially irrelevant preference features are filtered out, so that more practical and accurate similarities can be obtained; on the other hand, the complexity of the similarity calculation is significantly reduced, which greatly reduces the time and hardware resources it requires;
the invention takes into account the house preferences of users who move across administrative areas: using the historical location record information in the user's profile, or the province and city information filled in by the user, users with similar preferences are identified outside the current administrative area and inside the recorded administrative area. For example, when a user who used to live in an administrative area where houses generally have heating facilities moves to an administrative area without public heating, the preference for heating facilities can be reflected in the similarity set through the preference features of that user and of the users in the recorded administrative area, so that a recommended house source candidate set with a very high degree of similarity can be found by simply filtering the house sources within the current administrative area;
the invention filters out a large amount of poorly correlated implicit preference data, which significantly reduces the volume of data participating in the similarity calculation; this makes it feasible to use the relatively complex hidden attribute feature vectors to express the implicit preference features about house sources in the user scoring matrix, and improves the recall of the required recommended house sources;
the invention constructs co-occurrence house source pairs, which further reduces the volume of data to be processed before the similarity is calculated, and thus further reduces the calculation time and hardware resource requirements;
the invention constructs a co-occurrence score from the co-occurrence of users on house sources and uses it to weight the house source similarity, which introduces into the similarity the subjective behavior preference information of users that the hidden attribute feature vectors ignore, so that the similarity set reflecting the selected user's house source preferences is found completely and accurately;
the recommendation engine can take the location area range as the data processing granularity, which reduces the complexity of the correlations between house sources that need to be calculated, and reduces the calculation time and hardware resource requirements.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of the main method steps of an embodiment of the present invention;
FIG. 2 is a flow chart of an exemplary house source recommendation algorithm according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Example 1
An embodiment of the present invention provides a collaborative filtering method, as shown in fig. 1, where the collaborative filtering method includes:
S1) determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;
S2) determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;
S3) determining a to-be-trained vector decomposition model, performing iterative computation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative computation is completed;
S4) performing factorization by using at least preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house sources in the second house source set, calculating the similarity of the feature vector set, and forming a similarity set after the calculation is completed.
Generally, a data set for a house can be constructed from the house information (house source) and the various types of business information about the house, and the data set can have a unique identifier, which can be regarded as the house source identifier; a program, service or system that records, processes and applies the data set can also record the behavior data of users (such as the selected user, the partial users and the recommended user), and each user generally has a unique identifier, which can be regarded as the user identifier; for brevity, in this embodiment the house source identifier may simply be called the house source and the user identifier may simply be called the user, that is, a reference to a house source may mean the house source identifier, and a reference to a user may mean the user identifier;
for data correspondence, a house source, a user and a piece of preference data can be recorded together to form ternary data, that is, data recorded with a correspondence; the correspondence can be implemented through a relational table or key-value data, for example in a relational database or a key-value database respectively; in some specific implementations, the correspondence is not cleared during data processing, so that any one of the house source, the user and the preference data can always determine the other two through the correspondence;
any house source set (such as the first, second or third house source set) has a plurality of house sources (house source identifiers); the preference data can be a direct score (obtained through explicit behavior, such as a score value given when a user directly evaluates a house source) or an indirect score (obtained through implicit behavior, such as a score value formed by mapping the user's behavior data on the information corresponding to a house source).
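The ternary correspondence described above can be sketched as key-value data. This is a minimal illustration only; the identifiers (`u1`, `h1`) and the variable names are hypothetical, and the sketch assumes that the (user, house source) pair is the key that determines the preference score.

```python
from collections import defaultdict

# Hypothetical triples: (user identifier, house source identifier, preference score).
triples = [
    ("u1", "h1", 7.0),
    ("u1", "h2", 2.0),
    ("u2", "h1", 5.0),
]

# Key-value view: a (user, house source) pair determines the preference score.
score_by_pair = {(user, house): score for user, house, score in triples}

# Secondary indexes so the correspondence can be followed from either side.
houses_by_user = defaultdict(set)
users_by_house = defaultdict(set)
for user, house, _ in triples:
    houses_by_user[user].add(house)
    users_by_house[house].add(user)
```

A relational table with the same three columns would serve equally well; the dictionary form is chosen here only because it is the shortest to write down.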
In some implementations, the selected user may be a single user and the partial users may be multiple users (possibly large-scale relative to the selected user); for example, when the preference data for a certain location area range is too sparse, the selected user may be a configured default user with random or preset preference characteristics, so that a similarity set for recommending house sources to large-scale users can be obtained with very low computational overhead.
In some implementations, the selected user may be a plurality of users, the partial user may be a plurality of users, the selected user may include the partial user or some users of the partial user, and any one of the selected users may be different from any one of the partial users;
the behavior data may include browsing or clicking a page (browse) where the house source is located, paying attention to or marking favorite house sources (favorite), dialing a contact phone (400) associated with the house source, sending online information (im) to a service provider associated with the house source, sharing the house source, adding the house source to a house source comparison queue of the user, and the like;
the preference characteristic information can be embodied by a current parameter generated by training of an initial parameter of the vector decomposition model to be trained, wherein the current parameter is a parameter of the trained vector decomposition model;
co-occurrence means that a plurality of users have behavior data for two house sources, and the two house sources can be called co-occurrence house source pairs; according to the data correspondence, the house sources corresponding to part of the users and the house sources corresponding to the selected users can be totally a co-occurrence house source pair, or partially a co-occurrence house source pair, or totally have no co-occurrence;
the selected user or the partial users may be a plurality of users whose coordinate positions lie within a particular distance range or whose network addresses lie within a particular administrative area; the behavior data set a of a user can be taken as {browse, favorite, 400, im}; the location area range can be a geometric range centered on the coordinate position information of the selected user, a range such as a building, a residential district or a street block, or an administrative area;
according to the preference data corresponding to the recommended user in the first preference data set and the similarity set (which still corresponds to house sources, for example, the similarities are the table values and the house sources are the row descriptions of the table), a similar preference data set is obtained (which also keeps its correspondence to house sources, for example, the similar preference data are the table values and the house sources are the row descriptions of the table); the similar preference data in the similar preference data set are then sorted, and the house sources corresponding to the top-ranked part of the similar preference data are determined, forming a recommended house source candidate set corresponding to the recommended user among the selected users;
in some specific implementations, the first house source set, the first preference data set, the similarity set, the second house source set, the recommended house source candidate set and the like can each be represented as a matrix or a vector, the preference data and the similar preference data can be score values, and the score values are sorted by magnitude, with larger score values ranked first.
Specifically, in step S1), the forming of the preference data of the house resources in the first house resource set into the first preference data set corresponding to the selected user may specifically be:
obtaining the scores of the house sources in the first house source set by using the behavior data of the selected user in combination with a preset implicit scoring rule, recording the scores corresponding to the house sources in the first house source set as preference data, and forming the first preference data set corresponding to the selected user from the preference data; in addition, the preference data of the house sources in the second house source set can be formed in the same way, and so on for the preference data of the house sources in the Q-th house source set (Q is a positive integer).
As shown in fig. 2, a user scoring matrix may be constructed from the user behavior data of the selected user, and the user scoring matrix is associated with the house sources in the first house source set and with the selected user. To form a user score, a house source on which a user has produced behavior is denoted Item, and the item set of user u (who may be one of the selected users) is $i_u = \{item_1, item_2, \ldots, item_x, \ldots, item_k\}$, where k is a positive integer, $item_x \in Item$, x is an integer, and $1 \le x \le k$; the preset implicit scoring rule may be a predefined mapping rule that maps specific behavior data to a specific score value (rating), for example a scoring mapping relationship based on scoring weights or on the number of behaviors; for example, browsing is configured as 2 points, following or marking as a favorite as 3 points, and communicating with the house service provider as 5 points (these may also be expressed as percentages of a preset total score, in which case they are 20%, 30% and 50% respectively and the total score may be 10); if a user has browsing behavior and communication behavior relating to the information of a particular house, the house source (identifier) of that house is recorded as corresponding to the behaviors the user has performed, and the user's indirect score for it is 7; in some implementations, the preset implicit scoring rule may additionally have a time decay rule and a proportion distribution mapping relationship; the time decay rule may be a mapping rule that distributes weights according to time, for example, in an exemplary configuration, the user's behaviors within the last month have a weight of 65% and behaviors before the last month have a weight of 35%; the score finally calculated under the preset implicit scoring rule may be the product of the result of the scoring mapping relationship and the corresponding weight percentage in the proportion distribution mapping relationship; for example, if only browsing behavior occurred, not in the last month but once before the last month, the calculated score is 2 × 35%, and the user's indirect score for the house source is 0.7.
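The implicit scoring rule and the time decay rule in this example can be sketched as follows. The numeric values are the example values from the text; the behavior keys, the function name and the choice to map "communication with the house service provider" to the `im` key are assumptions made for illustration.

```python
# Example score mapping from the text: browse 2, favorite 3, communication 5.
BEHAVIOR_SCORE = {"browse": 2, "favorite": 3, "im": 5}
RECENT_WEIGHT = 0.65  # behaviors within the last month
OLDER_WEIGHT = 0.35   # behaviors before the last month


def implicit_score(behaviors, use_time_decay=False):
    """Sum the mapped scores of one user's behaviors on one house source.

    behaviors: list of (behavior_type, is_recent) tuples.
    """
    total = 0.0
    for behavior, is_recent in behaviors:
        weight = 1.0
        if use_time_decay:
            weight = RECENT_WEIGHT if is_recent else OLDER_WEIGHT
        total += BEHAVIOR_SCORE[behavior] * weight
    return total
```

With these values, one browse plus one communication without decay gives 2 + 5 = 7, and a single browse from before the last month with decay gives 2 × 35% = 0.7, matching the two worked figures in the text.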
According to the preset implicit scoring rule, a preset total score c can be determined together with the weight percentages of the corresponding behavior data, for example $w_a = \{browse_a, favorite_a, 400_a, im_a\}$; the score value $r_x$ of the browsing behavior (browse) of user u on a certain house source $item_x$ is then $c \times browse_a$, from which the scoring matrix $R_u = [r_1, r_2, r_3, \ldots, r_k]$ of user u can be obtained; if user u produces multiple behaviors on the same $item_x$ within a period of time, the scores are weighted and summed according to the time decay (that is, each score value can be split into the sum of the weighted scores of the individual behaviors); the scoring matrix of all users U within a period of time can thus be obtained as $R = \{R_u, u \in U\}$.
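As an illustration of how the rows $R_u$ might be assembled, the sketch below builds a dense scoring matrix from a behavior log. The total score of 10 and the percentages are the example values used earlier; the log format, the function names and the omission of time decay are simplifying assumptions.

```python
C_TOTAL = 10  # preset total score c (example value)
WEIGHTS = {"browse": 0.2, "favorite": 0.3, "im": 0.5}  # example weight percentages w_a


def behavior_score(behavior):
    """Score of a single behavior: total score times its weight percentage."""
    return C_TOTAL * WEIGHTS[behavior]


def build_score_matrix(logs, users, items):
    """Return one row R_u per user, with one column per house source Item.

    logs: iterable of (user, item, behavior) records (hypothetical format).
    """
    col_of = {item: col for col, item in enumerate(items)}
    row_of = {user: row for row, user in enumerate(users)}
    matrix = [[0.0] * len(items) for _ in users]
    for user, item, behavior in logs:
        matrix[row_of[user]][col_of[item]] += behavior_score(behavior)
    return matrix
```

Entries left at 0.0 stand for house sources on which the user produced no behavior, which is why such matrices tend to be sparse.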
Specifically, the step S2) of determining a location area range according to the user information of the selected user, and selecting a part of users according to the location area range may include:
determining an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and selecting the partial users within that administrative area range, wherein the physical location information or the network address information belongs to the user information of the selected user, and the administrative area range is used as the current location area range of the selected user;
the user information can be obtained by collecting the log of the user terminal and the stored user file; the physical location information may be residential address information, mailing address information, or collected GPS location information of the user terminal.
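A minimal sketch of this selection step is given below. It assumes that a mapping from user to administrative district has already been resolved from the physical location or network address; the mapping, the function name and the exclusion of the selected user from the result are illustrative choices, not requirements of the method.

```python
def select_partial_users(selected_user, all_users, district_of):
    """Return the users located in the same administrative district as the selected user.

    district_of: dict mapping user identifier -> administrative district (hypothetical).
    """
    area = district_of[selected_user]
    return [u for u in all_users if u != selected_user and district_of.get(u) == area]
```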
The step S2) of determining the location area range and selecting a part of users according to the location area range may include:
determining a current administrative area range according to the current location information of the selected user, and, according to the current administrative area range in combination with the location record information in the user portrait of the selected user, selecting the partial users who are outside the current administrative area range and within the administrative area range corresponding to the location record information; in step S3), the recommended house source candidate set is filtered using filtering conditions, and after filtering a recommended house source candidate set belonging to the current administrative area range is obtained;
the user portrait may include a user file, behavior logs about house sources (occurrence time and behavior type), and the like, and the user file may record location record information such as city information, current real-time location information and historical track location records, as well as gender information and age information; the current location information can be real-time GPS positioning information, and the location record information can be the recorded province and city information, so that preference characteristics that cannot be obtained from the current administrative district range can still be used for similarity calculation against the preference characteristics of the selected user; for example, a recommended user browses houses with floor heating and heating facilities but has little behavior data (for example, a newly registered user), so that the constructed scoring matrix may be too sparse, while the user portrait of that user contains a location record of an administrative district with public heating facilities; in this case, non-differentiating preference data (for example, scores for house sources with a low total price and a large area, that is, the user is assumed by default to have behavior data on such house sources) can be used to interpolate the user scoring matrix, and the partial users outside the current administrative district range and within the administrative district range corresponding to the location record information are then selected for similarity calculation with that user.
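The cross-district variant can be sketched in two small steps: selecting users by the recorded districts, and later filtering the candidate set back to the current district. The dictionaries and function names are hypothetical, and the interpolation of the scoring matrix is left out.

```python
def select_users_by_location_record(all_users, district_of, current_district, recorded_districts):
    """Users outside the current district but inside a district found in the location records."""
    return [
        u for u in all_users
        if district_of.get(u) != current_district and district_of.get(u) in recorded_districts
    ]


def filter_candidates(candidates, district_of_house, current_district):
    """Keep only candidate house sources that belong to the current administrative district."""
    return [h for h in candidates if district_of_house.get(h) == current_district]
```

The second function corresponds to the filtering condition of step S3): preferences are borrowed from another district, but the house sources finally recommended stay within the current one.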
Specifically, after the second house source set is determined and up to the calculation of the similarity of the feature vector set, the method may include:
taking a first preference data set corresponding to the selected user as a first scoring matrix, and taking a second preference data set corresponding to the part of users as a second scoring matrix, wherein any one preference data is a score;
determining a to-be-trained vector decomposition model, performing iterative computation by using the first scoring matrix in combination with the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative computation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference characteristic information;
and performing factorization by using the second scoring matrix and combining the trained vector decomposition model to at least obtain the hidden attribute feature vectors of the house source factors in the second house source set, forming a feature vector set, and calculating the similarity of the feature vector set.
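The two-stage use of the model can be illustrated with a small NumPy sketch of alternating least squares. This is a generic textbook-style ALS under stated assumptions, not the implementation of this embodiment: the function and parameter names are hypothetical, both scoring matrices are assumed to share the same house source columns, and "combining the trained model" is interpreted here as warm-starting the second factorization from the trained house source factors.

```python
import numpy as np


def als_factorize(R, observed, rank=2, reg=0.1, iters=20, V_init=None, seed=0):
    """Factorize R ~= U @ V.T over the observed entries by alternating least squares."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank)) if V_init is None else V_init.copy()
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix V and solve a small ridge regression for each user vector.
        for i in range(n_users):
            cols = observed[i]
            if cols.any():
                Vi = V[cols]
                U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ R[i, cols])
        # Fix U and solve a small ridge regression for each house source vector.
        for j in range(n_items):
            rows = observed[:, j]
            if rows.any():
                Uj = U[rows]
                V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ R[rows, j])
    return U, V


def objective(R, observed, U, V, reg=0.1):
    """Regularized squared error on the observed entries."""
    err = (R - U @ V.T)[observed]
    return float(err @ err + reg * ((U ** 2).sum() + (V ** 2).sum()))


# First scoring matrix (selected users) trains the model.
R1 = np.array([[7.0, 2.0, 0.0], [5.0, 0.0, 3.0], [0.0, 2.0, 3.0]])
U1, V1 = als_factorize(R1, R1 > 0)

# Second scoring matrix (partial users) is factorized starting from the trained V1.
R2 = np.array([[2.0, 0.0, 5.0], [0.0, 3.0, 7.0]])
U2, V2 = als_factorize(R2, R2 > 0, V_init=V1)
```

The rows of `V2` play the role of the hidden attribute feature vectors of the house sources, on which the similarity is subsequently computed.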
The factorization of the scoring matrix can be performed flexibly, and a gradient descent method or a least squares method can be chosen; in some implementations, an alternating least squares (ALS) model may be used; as in fig. 2, ALS training is performed on the constructed user scoring matrix; specifically, after the user scoring matrix R (the first scoring matrix) is obtained, iterative training of the ALS model is completed with respect to the objective function of the ALS model
[equation image not reproduced: objective function of the ALS model]
to obtain a user hidden attribute matrix U and a house source hidden attribute matrix V, where $r_{ij}$ is the score value of user i for house source j, and $u_i$ and $v_j$ respectively represent the hidden attribute feature vector of the i-th user and the hidden attribute feature vector of the j-th house source; the second scoring matrix is then substituted into the trained ALS model to complete the matrix decomposition, yielding a hidden attribute matrix V reflecting the selected user's preference for house sources.
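For orientation, a commonly used form of the regularized ALS objective is written out below. It is the textbook form and is given here as an assumption, since the exact objective of this embodiment is not reproduced in the text; $\Omega$ (the set of observed user and house source pairs) and $\lambda$ (the regularization coefficient) are symbols introduced only for this illustration.

```latex
\min_{U,V} \; \sum_{(i,j)\in\Omega} \left( r_{ij} - u_i^{\top} v_j \right)^2
  + \lambda \left( \sum_{i} \lVert u_i \rVert^2 + \sum_{j} \lVert v_j \rVert^2 \right)
```

With V held fixed the objective is quadratic in each $u_i$, and with U held fixed it is quadratic in each $v_j$, which is why the two sets of vectors can be solved for alternately in closed form.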
Specifically, after the trained vector decomposition model is obtained, obtaining at least the hidden attribute feature vectors of the house source factors in the second house source set may specifically be:
determining a co-occurrence house source pair set according to the house sources in the first house source set and the house sources in the second house source set;
and filtering the second scoring matrix by using the correspondence between the house sources in the co-occurrence house source pair set and the scores, and then performing factorization by using the filtered second scoring matrix in combination with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
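The two steps above can be sketched as follows. The sketch assumes that two house sources co-occur when at least `min_users` users have behavior data on both of them; the threshold parameter, the dictionary layouts and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_pairs(items_by_user, min_users=1):
    """Return the set of house source pairs that appear together in user behavior sets."""
    counts = Counter()
    for items in items_by_user.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_users}


def filter_scores(scores, pairs):
    """Keep only scores whose house source belongs to at least one co-occurrence pair.

    scores: dict mapping (user, house_source) -> score.
    """
    keep = {item for pair in pairs for item in pair}
    return {(user, item): s for (user, item), s in scores.items() if item in keep}
```

Restricting `items_by_user` to the users of one administrative district before calling `co_occurrence_pairs` gives the district-level constraint; house sources that never co-occur drop out before factorization.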
For housing demand, user behavior shows clustering characteristics within a specific area; if these clustering characteristics are not considered, the pairwise similarity calculation between the house source Items on which behavior has been produced contains a great deal of redundancy, so that the recommendation engine consumes a lot of time and occupies too many hardware resources; therefore, a constraint condition is added when calculating the similarity between house source Items on which behavior has been produced; the constraint condition can be set by the location area range, for example by taking the administrative district range as the constraint for the similarity calculation, so that only the similarity between house sources that appear together in one or more user behavior sets within the same administrative district is considered; the set of house source Items on which user u has produced behavior is $i_u = \{item_1, item_2, \ldots, item_x, \ldots, item_k\}$ (k is a positive integer, $item_x \in Item$), and the set of Items on which user l has produced behavior is $i_l = \{item_1, item_2, \ldots, item_y, \ldots, item_m\}$ (m is a positive integer, $item_y \in Item$, y is an integer, and $1 \le y \le m$); as shown in fig. 2, the second house source set is matched against the first house source set to find house sources that co-occur, and house source co-occurrence pairs are then formed; the ALS training process may use only the co-occurring house sources, optionally filtering out the remaining non-co-occurring house sources, and the house source hidden attributes are obtained by calculation; specifically, the house sources in $i_u \cap i_l$ (a house source belonging to $i_u \cap i_l$ is a co-occurring house source) are used to construct the co-occurrence house source pair set $\{\langle item_1, item_2\rangle, \langle item_3, item_4\rangle, \ldots\}$, and the set I of all co-occurrence house source pairs in the whole administrative district is constructed; in some implementations, for example for user u and user l, the co-occurrence matrix is an N × N matrix with N = max{k, m} whose row and column dimension variables are $(item_1, item_2, \ldots, item_N)$; the entry in row $item_1$, column $item_2$ can be filled with the similarity between the hidden attribute feature vector of $item_1$ (which may correspond to the first house source set) and the hidden attribute feature vector of $item_2$ (which may correspond to the second house source set); a default value can be set at positions of the co-occurrence matrix that have no co-occurrence feature, and other co-occurrence house source pairs can be handled in a similar manner;
further, after ALS is used to decompose the user scoring matrix, hidden attribute features V that can represent the latent objective characteristics of the Items are obtained (that is, once training is complete, substituting a given user scoring matrix allows its hidden attribute feature vectors $V_x$ or $V_y$ to be calculated quickly through the hidden attribute features V, where x or y is a positive integer less than or equal to N); based on the constraint condition of administrative-district granularity and the Item co-occurrence characteristic, the co-occurrence Item pair set $I = \{\langle item_1, item_2\rangle, \langle item_3, item_4\rangle, \ldots\}$ is constructed. The similarity is therefore $S = \{\cos\langle V_1, V_2\rangle, \cos\langle V_3, V_4\rangle, \ldots\}$, where $\cos\langle\cdot,\cdot\rangle$ denotes the cosine similarity, and $V_1$ or $V_2$ may be a value in the hidden attribute feature matrix V.
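The similarity set S can be sketched as cosine similarities computed only for the co-occurring pairs. The dictionary layout of the hidden attribute vectors and the function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_similarities(V, pairs):
    """Compute the similarity only for the given co-occurrence house source pairs.

    V: dict mapping house source -> hidden attribute feature vector.
    """
    return {(a, b): cosine(V[a], V[b]) for a, b in pairs}
```

Because only the pairs in the co-occurrence set are evaluated, the number of cosine computations scales with the number of co-occurring pairs rather than with all N × N combinations.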
Specifically, after determining the second room source set and before forming the similarity set, the method may further include:
determining a co-occurrence house source pair set according to house sources in the first house source set and house sources in the second house source set;
forming a first user set and a second user set, wherein each user in the first user set is recorded with behavior data corresponding to one house source of a house source pair in the co-occurrence house source pair set, and each user in the second user set is recorded with behavior data corresponding to the other house source of that house source pair;
calculating a co-occurrence score according to the number of users in the intersection and the number of users in the union of the first user set and the second user set;
weighting the similarity of the house source pair by using the co-occurrence score, wherein the weighting can be the product of the co-occurrence score and the corresponding similarity, together with any summation required by the calculation rule; for example, when the co-occurrence scores take the form of a co-occurrence score matrix and the similarities take the form of a similarity matrix, the weighting is the product of the co-occurrence score matrix and the similarity matrix;
on the basis of the co-occurrence house source pair set (also called house source co-occurrence pairs or house source pairs; filtering is performed using the co-occurrence house source pairs), as shown in fig. 2, for each obtained house source (co-occurrence) pair, a co-occurrence score can be calculated from the two user sets that have behavior data (the behavior may be only browsing or clicking) on the two house sources of the pair, that is, the co-occurrence score is calculated from the number of users in the intersection and the number of users in the union of the first user set and the second user set; specifically, the users in the intersection are those who have browsed or clicked both house sources of the pair, and the users in the union are those who have browsed or clicked at least one of the two house sources; to calculate the co-occurrence score, the number of users in the intersection can be divided by the number of users in the union, and the higher the co-occurrence score, the higher the probability that the two house sources of the pair are clicked together. For simplicity of illustration, a co-occurrence score CS (Co-occurrence Score) may be defined:
$$CS(Item_1, Item_2) = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$
where $U_1 = \{u_1, u_2, \ldots, u_p\}$ is the set of users (p is a positive integer) who have produced behavior on Item1, and $U_2 = \{u_1, u_2, \ldots, u_q\}$ is the set of users (q is a positive integer) who have produced behavior on Item2; since Item1 and Item2 are a house source co-occurrence pair, $CS(Item_1, Item_2)$ can be written, in terms of the hidden attribute feature vectors of Item1 and Item2, as $CS\langle V_1, V_2\rangle$. By analogy, for all the house source pairs obtained above:
$$CS = \{CS\langle V_1, V_2\rangle, CS\langle V_3, V_4\rangle, \ldots\}$$
where CS is the co-occurrence score set.
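A co-occurrence score computed from the intersection and union of the two user sets can be sketched as below. Taking the ratio of the two counts (a Jaccard-style ratio) is an assumption consistent with the description; the function name and the handling of an empty union are illustrative choices.

```python
def co_occurrence_score(users_a, users_b):
    """Ratio of users who acted on both house sources to users who acted on either one."""
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / len(union)
```

For example, if three users acted on Item1 and three on Item2, with two users in common, the intersection has 2 users and the union has 4, giving a score of 0.5.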
Further, as in FIG. 2, the similarity between Items is weighted with the co-occurrence scores; specifically, the cosine similarity of each Item pair calculated from the hidden attribute feature vectors above is weighted by the CS calculated for that Item pair to obtain the final similarity score of the Item pair; in some cases, the final similarity score is the product of CS and S, and the particular form of the product may be chosen according to how the overall data is defined or organized, for example a matrix multiplication form, in which CS or S can be handled in transposed form; a recommendation candidate set of Items is then generated according to the final similarity score; optionally, the cosine similarity of the Item pairs obtained by calculation and the scoring matrix of the recommended user are matrix-multiplied, the resulting score values are sorted, the score values above a preset threshold can be selected once sorting is complete, and the recommendation candidate set of the recommended user, consisting of house source Items on which behavior has been produced, is generated from the house sources corresponding to the selected score values.
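The weighting and candidate generation can be sketched with pairwise dictionaries instead of matrices. The element-wise product, the function names and the choice to exclude house sources the user has already acted on are assumptions for this illustration; the text allows other forms of the product and other candidate policies.

```python
def weighted_similarity(similarity, cs):
    """Final similarity per pair: cosine similarity multiplied by the co-occurrence score."""
    return {pair: sim * cs.get(pair, 0.0) for pair, sim in similarity.items()}


def recommend(user_scores, weighted, threshold=0.0, top_n=10):
    """Rank unseen house sources by the sum of (final similarity x the user's score).

    user_scores: dict mapping house source -> the recommended user's score.
    """
    predicted = {}
    for (a, b), w in weighted.items():
        for seen, other in ((a, b), (b, a)):
            if seen in user_scores and other not in user_scores:
                predicted[other] = predicted.get(other, 0.0) + w * user_scores[seen]
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked if score > threshold][:top_n]
```

The accumulation inside `recommend` is the dictionary equivalent of multiplying the weighted similarity matrix by the user's score vector, followed by sorting and thresholding.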
The embodiment of the invention also provides a recommendation method based on the collaborative filtering method, which comprises the following steps:
s1), determining a third room source set, and forming preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, wherein the room sources in the third room source set are recorded with behavior data of the recommended user;
s2) determining a recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.
For example, a recommended house source received by the recommended user may be a house source on which the recommended user has past behavior data, or a house source on which the recommended user has no past behavior data but on which users participating in the similarity calculation have recorded behavior; alternatively, when house sources are about to be sent to the recommended user and it is determined that all the score values corresponding to the house sources are lower than a score threshold, default recommended house sources within the location area range are acquired and used as the recommended house sources, where a default recommended house source can be a nearby house source with a price reduction, a newly listed house source, a house source whose evaluation score is higher than an evaluation score threshold, and the like; or the default recommended house sources can be interleaved into the recommended house source candidate set, with one or more default recommended house sources inserted after every one or more recommended house sources in the obtained candidate set, after which the recommended house sources are sent to the recommended user; the location of the recommended user is within the aforementioned location area range.
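The fallback and interleaving behavior described above can be sketched as follows. The function names, the `every` interval and the input layout are hypothetical; the sketch inserts one default house source after every `every` recommended ones.

```python
def interleave_defaults(candidates, defaults, every=2):
    """Insert one default house source after every `every` recommended house sources."""
    result, pending = [], list(defaults)
    for position, house in enumerate(candidates, start=1):
        result.append(house)
        if position % every == 0 and pending:
            result.append(pending.pop(0))
    return result


def choose_recommendations(scored, defaults, score_threshold, every=2):
    """Fall back to defaults when every score is below the threshold, else interleave.

    scored: list of (house_source, score) sorted by descending score.
    """
    if all(score < score_threshold for _, score in scored):
        return list(defaults)
    return interleave_defaults([house for house, _ in scored], defaults, every)
```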
According to the embodiment of the invention, the ItemCF recommendation accuracy is improved, the operation efficiency of the room source recommendation system is improved, the effectiveness of similarity calculation and the interpretability of an algorithm are improved by fusing a CS weighting optimization strategy, and the calculation time and the hardware requirement of a model are reduced.
Example 2
Based on the inventive concept of embodiment 1, an embodiment of the present invention provides a collaborative filtering apparatus, which may include:
the first selection module may be configured to determine a first room source set, and form preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, where the room sources in the first room source set are recorded with behavior data of the selected user;
the second selection module may be configured to determine a location area range, select a part of users according to the location area range, determine a second room source set, and form preference data of the room sources in the second room source set into a second preference data set corresponding to the part of users, where the room sources in the second room source set are recorded with behavior data of the part of users;
the similarity calculation module can be used for determining a to-be-trained vector decomposition model, performing iterative calculation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative calculation is completed;
the similarity calculation module may be further configured to perform factorization with at least preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed.
Optionally, the first selection module may be specifically configured to obtain, by using the behavior data of the selected user and in combination with a preset implicit rating rule, a rating of the house source in the first house source set, record the rating corresponding to the house source in the first house source set as preference data, and form, through the preference data, a first preference data set corresponding to the selected user.
Optionally, the second selection module may be specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.
Optionally, the second selection module may be specifically configured to determine a current administrative area range according to the current location information of the selected user, and, according to the current administrative area range in combination with the location record information in the user portrait of the selected user, select the partial users who are outside the current administrative area range and within the administrative area range corresponding to the location record information.
Optionally, the first selection module may be specifically configured to take a first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to take a second preference data set corresponding to the part of users as a second scoring matrix, where any one preference data is a score;
the similarity calculation module may be specifically configured to determine a to-be-trained vector decomposition model, perform iterative calculation using the first scoring matrix in combination with the to-be-trained vector decomposition model, and obtain a trained vector decomposition model after the iterative calculation is completed, where the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference feature information;
the similarity calculation module may be further specifically configured to perform factorization on the second scoring matrix in combination with the trained vector decomposition model to obtain at least hidden attribute feature vectors related to the house source factors in the second house source set, form a feature vector set, and calculate a similarity of the feature vector set.
Optionally, the similarity calculation module may be further specifically configured to, after obtaining the trained vector decomposition model and when at least the implicit attribute feature vector related to the room source factor in the second room source set is obtained, determine a co-occurrence room source pair set according to the room sources in the first room source set and the room sources in the second room source set,
the similarity calculation module may be further specifically configured to filter the second scoring matrix by using the correspondence between the house sources in the co-occurrence house source pair set and the scores, and perform factorization by using the filtered second scoring matrix in combination with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
Optionally, the collaborative filtering apparatus may further include:
a co-occurrence weighting module, configured to determine a set of co-occurrence house source pairs according to the house sources in the first house source set and the house sources in the second house source set after the determining the second house source set and before forming the similarity set,
the co-occurrence weighting module may be configured to form a first user set and a second user set, where each user in the first user set is recorded with behavior data corresponding to one of the room sources in the room source pair set, and each user in the second user set is recorded with behavior data corresponding to the other of the room sources in the room source pair,
the co-occurrence weighting module may be configured to calculate a co-occurrence score by a number of users in the intersection and a number of users in the union of the first set of users and the second set of users,
the co-occurrence weighting module may be further operable to weight the similarity using the co-occurrence score.
An embodiment of the present invention further provides a recommendation apparatus, where the recommendation apparatus includes:
a third selection module, configured to determine a third room source set, and form preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, where the room sources in the third room source set are recorded with behavior data of the recommended user;
a recommendation module, configured to determine a recommended house source candidate set for the recommended user by using the similarity set in combination with the third preference data set.
Optionally, the recommended user in the third selection module is a user selected from the aforementioned location area range.
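One possible reading of the recommendation module is sketched below. It assumes that the similarity set maps unordered house source pairs to a (weighted) similarity and that a candidate is scored by summing preference times similarity over the house sources the recommended user has already acted on. The scoring rule, the `top_n` parameter, and all names are assumptions for illustration; the embodiment does not fix them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def recommend_candidates(
    similarity_set: Dict[Tuple[str, str], float],
    third_preference_data: Dict[str, float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank unseen house sources by preference-weighted similarity to seen ones."""
    scores: Dict[str, float] = defaultdict(float)
    for (item_a, item_b), sim in similarity_set.items():
        # Treat each pair as symmetric: either side may be the seen house source.
        for seen, other in ((item_a, item_b), (item_b, item_a)):
            if seen in third_preference_data and other not in third_preference_data:
                scores[other] += third_preference_data[seen] * sim
    # Highest score first; ties broken by house source id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


similarity_set = {("h1", "h2"): 0.5, ("h1", "h3"): 0.25, ("h2", "h4"): 1.0}
third_preference_data = {"h1": 4.0}  # the recommended user has only acted on h1
candidates = recommend_candidates(similarity_set, third_preference_data)
```

Here `h4` is not recommended because it is only similar to `h2`, which the user has not acted on.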
Example 3
Based on the inventive concept of embodiment 1, an embodiment of the present invention provides a system for recommending a house source, including: one or more programs that may form one or more services in some production environments, each program or each service may perform one or more steps; in some implementations, one or more programs may be compiled or encrypted into an executable engine, which may call the output data of some executable programs, and which may rely on or have some function libraries and model libraries; the engine may be a recommendation engine, and the processing granularity of the recommendation engine may be a granularity determined by an administrative district;
a recommendation engine configured to execute instructions corresponding to the method described in embodiment 1.
According to the method, co-occurrence Item pairs are constructed from the administrative district granularity constraint and the co-occurrence statistics of Items in the user behavior data. Both the Item co-occurrence in user behavior and the administrative district granularity constraint effectively reduce the number of Item pairs that need to be calculated, which lowers the time complexity of the similarity calculation between Items, improves calculation efficiency, and greatly reduces the required hardware resources;
the method calculates the co-occurrence score (CS) between Items as an estimate of a user's preference for browsing the two Items together, and uses the CS as the weight on the similarity between the Items of each pair. The traditional ALS-based Item CF does not consider Item co-occurrence in user behavior, that is, the user's preference for the Items; the method effectively fuses the CS with the correlation between the hidden attributes of the Items, and this CS weighting optimization strategy improves the effectiveness of the similarity calculation.
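The reduction in Item pairs described above can be illustrated with a small sketch. It assumes that the administrative district constraint is applied by requiring both Items of a pair to lie in the same district; the embodiment may instead apply it when selecting users. The data and the names `co_occurrence_pairs`, `user_items`, and `item_district` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def co_occurrence_pairs(
    user_items: Dict[str, Set[str]],
    item_district: Dict[str, str],
) -> Set[Tuple[str, str]]:
    """Pairs of Items seen by the same user and located in the same district."""
    pairs: Set[Tuple[str, str]] = set()
    for items in user_items.values():
        # Sorting gives each unordered pair a single canonical form.
        for a, b in combinations(sorted(items), 2):
            if item_district[a] == item_district[b]:
                pairs.add((a, b))
    return pairs


user_items = {
    "u1": {"h1", "h2", "h5"},
    "u2": {"h2", "h3"},
    "u3": {"h4", "h5"},
}
item_district = {"h1": "A", "h2": "A", "h3": "A", "h4": "B", "h5": "B"}

pairs = co_occurrence_pairs(user_items, item_district)
all_pairs = len(item_district) * (len(item_district) - 1) // 2
```

In this toy data only 3 of the 10 possible Item pairs remain, so similarity would be computed for 3 pairs rather than for every pair.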
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.
Those skilled in the art will understand that all or part of the steps in the method according to the above embodiments may be implemented by a program, which is stored in a storage medium and includes several instructions to enable a single-chip microcomputer, a chip, or a processor to execute all or part of the steps in the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In addition, the various implementation manners of the embodiments of the present invention may be combined in any way, and such combinations should likewise be regarded as disclosed by the embodiments of the present invention as long as they do not depart from the spirit of the embodiments of the present invention.

Claims (19)

1. A collaborative filtering method, characterized in that the collaborative filtering method comprises:
determining a first room source set, and forming preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;
determining a position area range, selecting a part of users according to the position area range, determining a second room source set, and forming preference data of room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;
determining a to-be-trained vector decomposition model, performing iterative computation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative computation is completed;
performing factorization by at least using preference data in the second preference data set and combining the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculating similarity of the feature vector set, and forming a similarity set after calculation is completed;
wherein after the determining the second room source set and before forming the similarity set, further comprising:
determining a co-occurrence house source pair set according to house sources in the first house source set and house sources in the second house source set;
forming a first user set and a second user set, wherein each user in the first user set is recorded with behavior data corresponding to one house source of a co-occurrence house source pair in the co-occurrence house source pair set, and each user in the second user set is recorded with behavior data corresponding to the other house source of that pair;
calculating a co-occurrence score according to the number of users in the intersection and the number of users in the union of the first user set and the second user set;
weighting the similarity with the co-occurrence score;
wherein the co-occurrence score CS is:
$$CS = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$
where $U_1 = \{u_1, u_2, \dots, u_p\}$ is the set of users who have behavior on Item1, $U_2 = \{u_1, u_2, \dots, u_q\}$ is the set of users who have behavior on Item2, Item1 and Item2 form a co-occurrence house source pair, and p and q are both positive integers.
2. The collaborative filtering method according to claim 1, wherein the forming of the preference data of the house resources in the first house resource set into a first preference data set corresponding to the selected user specifically includes:
and obtaining the scores of the house sources in the first house source set by utilizing the behavior data of the selected user and combining with a preset implicit scoring rule, recording the scores corresponding to the house sources in the first house source set as preference data, and forming a first preference data set corresponding to the selected user through the preference data.
3. The collaborative filtering method according to claim 1, wherein the determining a location area range and selecting a portion of users based on the location area range comprises:
and determining the administrative area range of the selected user according to the physical position information or the network address information of the selected user, and selecting part of users in the administrative area range.
4. The collaborative filtering method according to claim 1, wherein the determining a location area range and selecting a portion of users based on the location area range comprises:
and determining a current administrative area range according to the current position information of the selected user, and, according to the current administrative area range and in combination with the position record information in the user profile of the selected user, selecting a part of users who are outside the current administrative area range and within the administrative area range corresponding to the position record information.
5. The collaborative filtering method according to claim 1, wherein after the determining the second room source set and until the similarity of the feature vector set is calculated, the method comprises:
taking a first preference data set corresponding to the selected user as a first scoring matrix, and taking a second preference data set corresponding to the part of users as a second scoring matrix, wherein any one preference data is a score;
determining a to-be-trained vector decomposition model, performing iterative computation by using the first scoring matrix and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative computation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference characteristic information;
and performing factorization by using the second scoring matrix and combining the trained vector decomposition model to at least obtain the hidden attribute feature vectors of the house source factors in the second house source set, forming a feature vector set, and calculating the similarity of the feature vector set.
6. The collaborative filtering method according to claim 5, wherein after the obtaining of the trained vector decomposition model and when at least the implicit attribute feature vectors for the house-source factors in the second house-source set are obtained, specifically:
determining a co-occurrence house source pair set according to house sources in the first house source set and house sources in the second house source set;
and filtering the second scoring matrix by utilizing the correspondence between the house sources in the co-occurrence house source pair set and the scores, and then performing factorization by using the filtered second scoring matrix and combining the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
7. A recommendation method, wherein the similarity set is obtained from the collaborative filtering method according to any one of claims 1 to 6, the recommendation method comprising:
determining a third room source set, and forming preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, wherein the room sources in the third room source set are recorded with behavior data of the recommended user;
and determining a recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.
8. The recommendation method according to claim 7, wherein the recommended user is a user selected from the location area range determined in the collaborative filtering method according to any one of claims 1 to 6.
9. A collaborative filtering apparatus, comprising:
a first selection module, configured to determine a first room source set and form preference data of room sources in the first room source set into a first preference data set corresponding to a selected user, wherein the room sources in the first room source set are recorded with behavior data of the selected user;
the second selection module is used for determining a position area range, selecting a part of users according to the position area range, then determining a second room source set, and forming preference data of the room sources in the second room source set into a second preference data set corresponding to the part of users, wherein the room sources in the second room source set are recorded with behavior data of the part of users;
the similarity calculation module is used for determining a to-be-trained vector decomposition model, performing iterative calculation by using at least part of preference data in the first preference data set and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model with preference characteristic information after the iterative calculation is completed;
the similarity calculation module is further configured to perform factorization by using at least some preference data in the second preference data set in combination with the trained vector decomposition model to obtain a feature vector set corresponding to the house resources in the second house resource set, calculate similarity of the feature vector set, and form a similarity set after the calculation is completed;
the collaborative filtering apparatus further comprises:
a co-occurrence weighting module, configured to determine a set of co-occurrence room source pairs according to the room sources in the first room source set and the room sources in the second room source set after the second room source set is determined and before the similarity set is formed,
the co-occurrence weighting module is used for forming a first user set and a second user set, wherein all users in the first user set are recorded with behavior data corresponding to one room source of a co-occurrence room source pair in the co-occurrence room source pair set, and all users in the second user set are recorded with behavior data corresponding to the other room source of that pair,
the co-occurrence weighting module is used for calculating a co-occurrence score through the number of users in the intersection and the number of users in the union of the first user set and the second user set,
the co-occurrence weighting module is further configured to weight the similarity with the co-occurrence score;
wherein the co-occurrence score CS is:
$$CS = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$
where $U_1 = \{u_1, u_2, \dots, u_p\}$ is the set of users who have behavior on Item1, $U_2 = \{u_1, u_2, \dots, u_q\}$ is the set of users who have behavior on Item2, Item1 and Item2 form a co-occurrence room source pair, and p and q are both positive integers.
10. The collaborative filtering device of claim 9,
the first selection module is specifically configured to obtain scores of the house resources in the first house resource set by using the behavior data of the selected user in combination with a preset implicit scoring rule, record the scores corresponding to the house resources in the first house resource set as preference data, and form a first preference data set corresponding to the selected user through the preference data.
11. The collaborative filtering device of claim 9,
the second selection module is specifically configured to determine an administrative area range of the selected user according to the physical location information or the network address information of the selected user, and select a part of users located in the administrative area range.
12. The collaborative filtering device of claim 9,
the second selection module is specifically configured to determine a current administrative area range according to the current location information of the selected user, and select, according to the current administrative area range and in combination with location record information in the user profile of the selected user, a part of users that are outside the current administrative area range and are within the administrative area range corresponding to the location record information.
13. The collaborative filtering device of claim 9,
the first selection module is specifically configured to take the first preference data set corresponding to the selected user as a first scoring matrix, and the second selection module is specifically configured to take the second preference data set corresponding to the part of users as a second scoring matrix, wherein any one item of preference data is a score;
the similarity calculation module is specifically used for determining a to-be-trained vector decomposition model, performing iterative calculation by using the first scoring matrix and combining the to-be-trained vector decomposition model, and obtaining a trained vector decomposition model after the iterative calculation is completed, wherein the to-be-trained vector decomposition model and the trained vector decomposition model are both alternating least squares (ALS) models, and the trained vector decomposition model has preference characteristic information;
the similarity calculation module is further specifically configured to perform factorization on the second scoring matrix in combination with the trained vector decomposition model to obtain at least hidden attribute feature vectors related to the house source factors in the second house source set, form a feature vector set, and calculate similarity of the feature vector set.
14. The collaborative filtering device of claim 13,
the similarity calculation module is further specifically configured to determine a set of co-occurrence house source pairs according to the house sources in the first house source set and the house sources in the second house source set after the trained vector decomposition model is obtained and when at least the implicit attribute feature vectors related to the house source factors in the second house source set are obtained,
the similarity calculation module is further specifically configured to filter the second scoring matrix by using the correspondence between the house sources in the co-occurrence house source pair set and the scores, and perform factorization by using the filtered second scoring matrix in combination with the trained vector decomposition model to obtain the hidden attribute feature vectors of the house source factors in the second house source set.
15. A recommendation apparatus, wherein the similarity set is obtained in the collaborative filtering apparatus according to any one of claims 9 to 14, the recommendation apparatus comprising:
a third selection module, configured to determine a third room source set, and form preference data of room sources in the third room source set into a third preference data set corresponding to a recommended user, where the room sources in the third room source set are recorded with behavior data of the recommended user;
and the recommending module is used for determining the recommended house source candidate set of the recommended user by utilizing the similarity set and combining the third preference data set.
16. The recommendation device according to claim 15, wherein the recommended user in the third selection module is a user selected from the location area range determined by the second selection module in the collaborative filtering device according to any one of claims 9 to 14.
17. A system for recommending a house source, the system comprising:
a recommendation engine configured to execute instructions corresponding to the method of any of claims 1 to 8.
18. An apparatus for processing premises information, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1 to 8 by executing the instructions stored by the memory.
19. A computer readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 8.
CN202010470716.3A 2020-05-28 2020-05-28 Collaborative filtering method, collaborative filtering device and collaborative filtering system Active CN111611499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010470716.3A CN111611499B (en) 2020-05-28 2020-05-28 Collaborative filtering method, collaborative filtering device and collaborative filtering system

Publications (2)

Publication Number Publication Date
CN111611499A CN111611499A (en) 2020-09-01
CN111611499B true CN111611499B (en) 2021-08-17

Family

ID=72203731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010470716.3A Active CN111611499B (en) 2020-05-28 2020-05-28 Collaborative filtering method, collaborative filtering device and collaborative filtering system

Country Status (1)

Country Link
CN (1) CN111611499B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328658B (en) * 2020-11-03 2023-08-08 北京百度网讯科技有限公司 User profile data processing method, device, equipment and storage medium
CN113963234B (en) * 2021-10-25 2024-02-23 北京百度网讯科技有限公司 Data annotation processing method, device, electronic equipment and medium
TWI826876B (en) * 2022-01-14 2023-12-21 信義房屋股份有限公司 Inquiry device based on garbage removal routes
CN114780861B (en) * 2022-06-20 2022-10-21 上海二三四五网络科技有限公司 Clustering technology-based user multi-interest recommendation method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2518679A1 (en) * 2011-04-26 2012-10-31 YooChoose GmbH Method and system fo recommending geo-tagged items
CN102789499A (en) * 2012-07-16 2012-11-21 浙江大学 Collaborative filtering method on basis of scene implicit relation among articles
CN103294812A (en) * 2013-06-06 2013-09-11 浙江大学 Commodity recommendation method based on mixed model
CN106850750A (en) * 2016-12-26 2017-06-13 北京五八信息技术有限公司 A kind of method and apparatus of real time propelling movement information
CN107256512A (en) * 2017-06-08 2017-10-17 贵州优联博睿科技有限公司 One kind house-purchase personalized recommendation method and system
CN108256067A (en) * 2018-01-16 2018-07-06 平安好房(上海)电子商务有限公司 Calculate method, apparatus, equipment and the storage medium of source of houses similarity
CN109670113A (en) * 2018-12-20 2019-04-23 重庆锐云科技有限公司 A kind of source of houses recommended method, device and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10932003B2 (en) * 2015-01-27 2021-02-23 The Toronto-Dominion Bank Method and system for making recommendations from binary data using neighbor-score matrix and latent factors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
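Homestay listing recommendation method based on user network embedding (基于用户网络嵌入的民宿房源推荐方法); Liu Tong et al.; Journal of Computer Applications (《计算机应用》); 20191110; full text *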

Also Published As

Publication number Publication date
CN111611499A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN111611499B (en) Collaborative filtering method, collaborative filtering device and collaborative filtering system
US9569499B2 (en) Method and apparatus for recommending content on the internet by evaluating users having similar preference tendencies
Sieg et al. Improving the effectiveness of collaborative recommendation with ontology-based user profiles
WO2018040069A1 (en) Information recommendation system and method
CN106708844A (en) User group partitioning method and device
CN104866474A (en) Personalized data searching method and device
WO2009148621A1 (en) Associative memory operators, methods and computer program products for using a social network for predictive marketing analysis
CN107292648A (en) A kind of user behavior analysis method and device
CN108415913A (en) Crowd's orientation method based on uncertain neighbours
CN104239335B (en) User-specific information acquisition methods and device
Li et al. Social recommendation based on trust and influence in SNS environments
CN110795613B (en) Commodity searching method, device and system and electronic equipment
CN114036376A (en) Time-aware self-adaptive interest point recommendation method based on K-means clustering
CN116166878A (en) Time perception self-adaptive interest point recommendation method based on K-means clustering
CN113656699B (en) User feature vector determining method, related equipment and medium
CN109299368B (en) Method and system for intelligent and personalized recommendation of environmental information resources AI
CN113239266A (en) Personalized recommendation method and system based on local matrix decomposition
Zhang et al. The approaches to contextual transaction trust computation in e‐Commerce environments
CN110543601B (en) Method and system for recommending context-aware interest points based on intelligent set
WO2020135420A1 (en) Method and apparatus for classifying users
CN115408618B (en) Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features
CN104794135A (en) Method and device for carrying out sorting on search results
Yang et al. Personalized recommendation based on collaborative filtering in social network
CN106919653B (en) Log filtering method based on user behavior
CN114022233A (en) Novel commodity recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201102

Address after: 100085 Floor 102-1, Building No. 35, West Second Banner Road, Haidian District, Beijing

Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd.

Address before: 300280 unit 05, room 112, floor 1, building C, comprehensive service area, Nangang Industrial Zone, Binhai New Area, Tianjin

Applicant before: BEIKE TECHNOLOGY Co.,Ltd.

GR01 Patent grant