Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, a technical resource and service recommendation system based on a hybrid recommendation algorithm includes a data preprocessing module, a user semantic description module, a personalized technical resource recommendation module, an enterprise similarity calculation module, a personalized service recommendation module, and a technical resource and service potential combination mining module.
Each module is described in detail below.
The data preprocessing module takes as input the records of technical resources browsed by users, the records of enterprise services purchased by users, and enterprise registration information. The browsing records and purchase records are extracted from server log files using regular expressions; the enterprise registration information is read from a database. The user behavior records are then clustered by their timestamps with the density-based clustering algorithm DBSCAN to obtain user behavior clusters.
The outputs are used as follows: the user browsing records and the enterprise information vectors are passed to the user semantic description module and the enterprise similarity calculation module, respectively, while the clusters of browsing records and the clusters of purchase records are provided to the technical resource and service potential combination mining module.
The data preprocessing module first reads in the browsing records, the purchase records and the enterprise registration information. Each browsing record includes information such as the user ID, the technical resource ID, the recording time and the browsing volume of the technical resource; each purchase record includes information such as the user ID, the service ID and the recording time.
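The extraction step can be sketched as follows. The source does not give the log layout, so the line format, the field names and the `parse_log` helper below are all hypothetical; only the use of regular expressions and the listed record fields come from the description.

```python
import re
from datetime import datetime

# Hypothetical log layout (an assumption, not specified in the source):
#   2023-05-01T10:15:00 user=u42 action=browse resource=r17 views=120
#   2023-05-01T10:20:00 user=u42 action=purchase service=s3
BROWSE_RE = re.compile(
    r"^(?P<time>\S+) user=(?P<user>\S+) action=browse "
    r"resource=(?P<resource>\S+) views=(?P<views>\d+)$"
)
PURCHASE_RE = re.compile(
    r"^(?P<time>\S+) user=(?P<user>\S+) action=purchase "
    r"service=(?P<service>\S+)$"
)


def parse_log(lines):
    """Split raw log lines into browsing records and purchase records."""
    browses, purchases = [], []
    for line in lines:
        line = line.strip()
        m = BROWSE_RE.match(line)
        if m:
            browses.append({
                "user_id": m["user"],
                "resource_id": m["resource"],
                "time": datetime.fromisoformat(m["time"]),
                "views": int(m["views"]),
            })
            continue
        m = PURCHASE_RE.match(line)
        if m:
            purchases.append({
                "user_id": m["user"],
                "service_id": m["service"],
                "time": datetime.fromisoformat(m["time"]),
            })
        # Lines matching neither pattern are ignored.
    return browses, purchases
```

Lines that match neither pattern are skipped silently here; a production system would probably log or count them.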
The clustering rests on the assumption that, within a given period of time, a user's project needs or interests are concentrated in a limited field. The records are therefore clustered by time with a density-based algorithm to obtain clusters.
The enterprise registration information includes the field the enterprise serves, its registered capital, its number of employees, its date of establishment, its turnover, its profit margin, and the like.
Categorical features such as the field the enterprise serves are one-hot encoded, and numerical features such as registered capital, number of employees, date of establishment, turnover and profit margin are standardized, which yields the enterprise information vector.
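A minimal sketch of time-based density clustering follows. It is a plain-Python, one-dimensional DBSCAN written for illustration; the function name `dbscan_1d`, the use of timestamps in seconds, and the `eps` and `min_samples` values in the usage lines are assumptions, since the source names only the algorithm.

```python
def dbscan_1d(times, eps, min_samples):
    """Cluster timestamps with DBSCAN; returns one label per timestamp.

    Label -1 marks noise. A point is a core point when at least
    `min_samples` points (itself included) lie within `eps` of it.
    """
    n = len(times)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if abs(times[j] - times[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


# Two browsing sessions and one isolated record (seconds since some epoch).
session_labels = dbscan_1d([0, 60, 120, 5000, 5050, 5100, 20000],
                           eps=300, min_samples=2)
```

The neighbor search is quadratic, which is acceptable for a sketch; a real deployment would more likely rely on an existing library implementation.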
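The encoding step might look like the sketch below. The dictionary keys, the helper name `build_enterprise_vectors` and the choice of z-score standardization with the population standard deviation are assumptions; the source says only that categorical features are one-hot encoded and numerical features are standardized.

```python
from statistics import mean, pstdev


def build_enterprise_vectors(enterprises, categorical, numeric):
    """Return one vector per enterprise: one-hot blocks, then z-scores."""
    # Fixed, sorted vocabulary per categorical feature.
    vocab = {f: sorted({e[f] for e in enterprises}) for f in categorical}
    # Mean and population standard deviation per numerical feature.
    stats = {}
    for f in numeric:
        values = [e[f] for e in enterprises]
        stats[f] = (mean(values), pstdev(values))

    vectors = []
    for e in enterprises:
        vec = []
        for f in categorical:
            vec.extend(1.0 if e[f] == v else 0.0 for v in vocab[f])
        for f in numeric:
            mu, sigma = stats[f]
            # A constant feature carries no information; map it to 0.
            vec.append((e[f] - mu) / sigma if sigma > 0 else 0.0)
        vectors.append(vec)
    return vectors


# Hypothetical records; the field names are illustrative only.
example = build_enterprise_vectors(
    [
        {"field": "software", "capital": 100, "employees": 10},
        {"field": "biotech", "capital": 300, "employees": 30},
    ],
    categorical=["field"],
    numeric=["capital", "employees"],
)
```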
When the user semantic description module calculates the weight contributed by a technical resource, it applies a penalty based on the browsing volume of that resource; the resulting user semantic description is provided to the personalized technical resource recommendation module. Many users browse popular technical resources merely because they are popular, so such browsing does not reflect the individual user; conversely, if a user browses an unpopular (cold) technical resource, this most likely indicates the user's field of interest.
The user semantic description module models the user in the form of tags. First, when each technical resource is uploaded, the N highest-scoring keywords are extracted with the TextRank algorithm and used as the tags of that technical resource. After the user browses a technical resource, the weight of every user tag that matches a tag of the technical resource is increased by a certain amount; a tag that is not yet among the user's tags is added to them.
The specific steps are as follows.
Because technical resources are stored and presented as text, their keywords are extracted with the TextRank algorithm. Specifically, the N highest-scoring keywords are taken as the tags of the technical resource, and the score of each keyword is normalized to serve as the weight of the corresponding tag.
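The keyword step can be illustrated with a compact TextRank sketch over a word co-occurrence graph. Several details are assumptions not stated in the source: the input is taken to be already tokenized and stop-word filtered, the co-occurrence window is 2, the damping factor is 0.85, and scores are normalized by dividing by the sum of the top-N scores.

```python
from collections import defaultdict


def textrank_keywords(tokens, top_n=5, window=2, damping=0.85, iterations=50):
    """Return {keyword: normalized weight} for the top_n TextRank keywords."""
    # Undirected co-occurrence graph: words within `window` positions are linked.
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    if not graph:
        return {}

    # PageRank-style iteration on the unweighted graph.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            rank = sum(scores[v] / len(graph[v]) for v in graph[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores

    # Keep the top_n keywords and normalize their scores to sum to 1
    # (assumed normalization; the source does not specify the method).
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    total = sum(s for _, s in top)
    return {w: s / total for w, s in top}


tags = textrank_keywords(
    ["sensor", "network", "sensor", "fusion", "sensor", "network", "protocol"],
    top_n=2,
)
```

A fixed iteration count keeps the sketch short; checking for convergence against a small tolerance is the more usual stopping rule.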
Let $L_u$ denote the tag set of user $u$, $L_i$ the tag set of technical resource $i$, $S_i$ the browsing volume of technical resource $i$, and $w_{ia}$ the weight of tag $a$ in technical resource $i$. When user $u$ browses technical resource $i$, the weight of each tag $a \in L_i$ in $L_u$ is increased by $\Delta w_{ua}$, which is computed from $w_{ia}$ and penalized according to $S_i$.
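The update can be sketched as below. The exact penalty is not reproduced in this text, so the logarithmic form $1 / (1 + \ln(1 + S_i))$ used here is purely an assumption chosen to shrink the increment as browsing volume grows; the function name `update_user_tags` is likewise hypothetical.

```python
import math


def update_user_tags(user_tags, resource_tags, views):
    """Add the browsed resource's tag weights to the user's tag weights.

    user_tags:     {tag: weight} for user u (L_u with weights), updated in place
    resource_tags: {tag: w_ia} for technical resource i (L_i with weights)
    views:         browsing volume S_i of the resource
    """
    # ASSUMED penalty form: popular resources contribute less.
    penalty = 1.0 / (1.0 + math.log(1.0 + views))
    for tag, w_ia in resource_tags.items():
        # Tags the user does not have yet are added with weight 0 first.
        user_tags[tag] = user_tags.get(tag, 0.0) + w_ia * penalty
    return user_tags
```

With this form an unviewed resource (`views=0`) contributes its full tag weights, and the contribution falls slowly as the resource becomes more popular.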
If the user does not browse any technical resource related to a given tag for a period of time, the weight of that tag decays, so that the tag weights reflect the user's recent interests. Suppose the user has not browsed any technical resource carrying the tag for $t$ time units, the weight of the tag before decay is $w$, and the weight decay factor is $\alpha$; the weight after decay, $w'$, is then determined by $w$, $\alpha$ and $t$.
Because some users browse a large number of technical resources and therefore accumulate many tags, only the K tags with the largest weights are used to form the semantic description vector of the user.
The personalized technical resource recommendation module takes the user semantic description vector as input, uses cosine similarity to find the M technical resources most similar to that vector, and adds them to the user's technical resource recommendation list.
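Decay and truncation to the top K tags can be sketched together. The decay formula is not reproduced in this text; the geometric form $w' = w \cdot \alpha^{t}$ with $0 < \alpha < 1$ used below is an assumption, as are the function names.

```python
def decay_weight(w, alpha, t):
    """ASSUMED decay form: multiply by alpha once per idle time unit."""
    return w * alpha ** t


def semantic_vector(user_tags, k):
    """Keep the k tags with the largest weights as the user's vector."""
    top = sorted(user_tags.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


# A tag idle for 2 time units with alpha = 0.5 keeps a quarter of its weight.
decayed = decay_weight(1.0, 0.5, 2)  # 0.25
vector = semantic_vector({"ai": 0.1, "iot": 0.9, "cloud": 0.5}, k=2)
```

Ties between equal weights are broken alphabetically so that the result is deterministic.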
The input of the personalized technical resource recommendation module is the semantic description vector $V_u$ of user $u$. Let $w_{ua}$ be the weight of tag $a$ in $V_u$ and $w_{ia}$ the weight of tag $a$ in technical resource $i$, with a weight of zero for any tag that is absent. The preference degree $p(u, i)$ of user $u$ for technical resource $i$ is the cosine similarity of the two weight vectors:

$$p(u, i) = \frac{\sum_{a} w_{ua}\, w_{ia}}{\sqrt{\sum_{a} w_{ua}^{2}}\ \sqrt{\sum_{a} w_{ia}^{2}}}$$

After the preference degrees of the user for all technical resources have been calculated, the M technical resources with the highest preference degrees are recommended to the user.
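A minimal sketch of the preference computation and the top-M selection, with tag weights stored as sparse dictionaries. The function names and the example data are illustrative only.

```python
import math


def preference(user_vec, resource_tags):
    """Cosine similarity between two sparse {tag: weight} vectors."""
    dot = sum(w * resource_tags.get(tag, 0.0) for tag, w in user_vec.items())
    norm_u = math.sqrt(sum(w * w for w in user_vec.values()))
    norm_i = math.sqrt(sum(w * w for w in resource_tags.values()))
    if norm_u == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_u * norm_i)


def recommend_resources(user_vec, resources, m):
    """Return the IDs of the m resources with the highest preference."""
    scored = [(preference(user_vec, tags), rid) for rid, tags in resources.items()]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [rid for _, rid in scored[:m]]


top = recommend_resources(
    {"ai": 1.0},
    {"r1": {"ai": 1.0}, "r2": {"iot": 1.0}, "r3": {"ai": 0.6, "iot": 0.8}},
    m=2,
)
```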
The enterprise similarity calculation module provides two similarity calculation methods: the first calculates enterprise similarity from enterprise registration information, and the second calculates it from the overlap between the user groups of the enterprises. The first method is used during cold start of the recommendation system, because a newly added enterprise has no customer group yet and its similarity to other enterprises cannot be calculated with the second method.
The first method takes as input the enterprise information vectors produced by the data preprocessing module and computes the similarity of two enterprises as the cosine similarity of their vectors. Let $V_{e1}$ be the information vector of enterprise $e1$ and $V_{e2}$ the information vector of enterprise $e2$; the similarity is then

$$\mathrm{sim}(e1, e2) = \frac{V_{e1} \cdot V_{e2}}{\lVert V_{e1} \rVert\, \lVert V_{e2} \rVert}$$
The second method uses the overlap between the user groups of two enterprises as the enterprise similarity, calculated with the Jaccard similarity. Its input is the user set of each enterprise, that is, the users who have used services provided by that enterprise. Let $U_{e1}$ be the user group of enterprise $e1$ and $U_{e2}$ the user group of enterprise $e2$; the similarity is then

$$\mathrm{sim}(e1, e2) = \frac{\lvert U_{e1} \cap U_{e2} \rvert}{\lvert U_{e1} \cup U_{e2} \rvert}$$
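Both similarity measures are short enough to sketch side by side. The `enterprise_similarity` wrapper, which falls back to the registration-based measure whenever either enterprise has no users, is one possible reading of the cold-start rule and is an assumption, as are all names below.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def jaccard_similarity(users1, users2):
    """Size of the intersection over size of the union of two user sets."""
    union = users1 | users2
    return len(users1 & users2) / len(union) if union else 0.0


def enterprise_similarity(e1, e2, vectors, user_groups):
    """ASSUMED switching rule: use Jaccard only when both have users."""
    u1 = user_groups.get(e1, set())
    u2 = user_groups.get(e2, set())
    if u1 and u2:
        return jaccard_similarity(u1, u2)
    # Cold start: at least one enterprise has no customer group yet.
    return cosine_similarity(vectors[e1], vectors[e2])
```

For example, user groups `{"a", "b", "c"}` and `{"b", "c", "d"}` share two of four distinct users, giving a Jaccard similarity of 0.5.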
The personalized service recommendation module builds on the enterprise similarity calculation module. For the enterprise that provided a service the user purchased in the past, the N most similar enterprises are obtained from the enterprise similarity calculation module, and the popular services of those N enterprises are recommended to the user.
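This step might be sketched as follows. The source does not say how popularity is measured or how many services are taken per enterprise; measuring popularity by purchase count and the `per_enterprise` parameter are assumptions, and the similarity function is passed in as a callable.

```python
def recommend_services(enterprise, all_enterprises, similarity,
                       service_sales, n, per_enterprise=1):
    """Recommend popular services of the n enterprises most similar to `enterprise`.

    similarity:    callable (e1, e2) -> float
    service_sales: {enterprise_id: {service_id: purchase_count}}
                   (purchase count as the popularity measure is an assumption)
    """
    others = [e for e in all_enterprises if e != enterprise]
    others.sort(key=lambda e: (-similarity(enterprise, e), e))

    recommended = []
    for e in others[:n]:
        hot = sorted(service_sales.get(e, {}).items(),
                     key=lambda kv: (-kv[1], kv[0]))
        recommended.extend(sid for sid, _ in hot[:per_enterprise])
    return recommended
```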
As shown in FIG. 2, the technical resource and service potential combination mining module first takes the time at which a user purchased an enterprise service, looks up the cluster of that user's browsing records near that time, and stores the service ID together with the technical resource IDs of the cluster in one set.
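Building these sets can be sketched as below. What counts as "near" is not defined in the source; here a cluster is matched when the purchase time lies within `max_gap` of the cluster's time span, which is an assumption. The `service:` and `resource:` prefixes are also an illustrative convention to keep the two kinds of ID distinct.

```python
def build_transactions(purchases, browse_clusters, max_gap):
    """Combine each purchase with the browsing clusters close to it in time.

    purchases:       list of (user_id, service_id, time)
    browse_clusters: {user_id: [cluster, ...]}, where each cluster is a
                     list of (time, resource_id) pairs
    max_gap:         ASSUMED tolerance around the cluster's time span
    """
    transactions = []
    for user_id, service_id, t in purchases:
        for cluster in browse_clusters.get(user_id, []):
            times = [ts for ts, _ in cluster]
            if min(times) - max_gap <= t <= max(times) + max_gap:
                items = {f"resource:{rid}" for _, rid in cluster}
                items.add(f"service:{service_id}")
                transactions.append(frozenset(items))
    return transactions
```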
After a large number of such sets of service IDs and technical resource IDs have been collected, frequent itemsets are computed with the existing association rule algorithm FP-growth. A frequent itemset identifies technical resources that many users browse shortly before or after purchasing a certain service, which indicates that the service and those technical resources are strongly correlated and can therefore be offered as a combination.
Once the frequent itemsets have been obtained, if a user browses a technical resource contained in a frequent itemset, the corresponding enterprise service is recommended to the user; if the user purchases an enterprise service, the corresponding technical resources are recommended to the user.
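To show what the mining step produces, the sketch below counts frequent itemsets by brute-force enumeration. This is not FP-growth: it yields the same frequent itemsets up to `max_size` but scales far worse, and it is used here only because it fits in a few lines. The `min_support` threshold (an absolute count), the `max_size` cap and the ID prefixes are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets seen in >= min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


found = frequent_itemsets(
    [
        {"service:s1", "resource:r1", "resource:r2"},
        {"service:s1", "resource:r1"},
        {"service:s2", "resource:r3"},
    ],
    min_support=2,
)
```

In this example only `service:s1`, `resource:r1` and their pair reach the support threshold, so the pair is the candidate combination.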
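The lookup in both directions can be sketched with one helper. It assumes, for illustration only, that IDs inside the frequent itemsets carry a `service:` or `resource:` prefix; the function name is hypothetical.

```python
def recommend_from_itemsets(itemsets, item):
    """Given a browsed resource, return co-occurring services, and vice versa.

    itemsets: iterable of sets of prefixed IDs ("service:..." / "resource:...")
    item:     the prefixed ID the user just browsed or purchased
    """
    wanted = "service:" if item.startswith("resource:") else "resource:"
    found = set()
    for itemset in itemsets:
        if item in itemset:
            found.update(x for x in itemset if x.startswith(wanted))
    return sorted(found)


itemsets = [
    frozenset({"service:s1", "resource:r1"}),
    frozenset({"service:s2", "resource:r3", "resource:r4"}),
]
services = recommend_from_itemsets(itemsets, "resource:r1")   # ["service:s1"]
resources = recommend_from_itemsets(itemsets, "service:s2")   # ["resource:r3", "resource:r4"]
```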