CN114610859A - Product recommendation method, device and equipment based on content and collaborative filtering

Info

Publication number: CN114610859A
Application number: CN202210435260.6A
Authority: CN
Inventors: 徐滨
Original assignee: Kangjian Information Technology Shenzhen Co Ltd
Current assignee: Kangjian Information Technology Shenzhen Co Ltd
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2022-06-10
Also published as: WO2023206960A1

Abstract

The application discloses a product recommendation method, device and equipment based on content and collaborative filtering, relates to the technical field of the internet, and addresses the problems that, when a user searches for a target product among a large number of products, accuracy is low and the user's personalized preferences cannot be met. The method comprises the following steps: calculating a first similarity between a product query text and each preset product text by using a preset word segmentation technology and a TF-IDF algorithm, and determining each preset product text whose first similarity is larger than a preset similarity threshold as a recommended product text; calculating a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set of the recommended product text, and forming a first recommendation result in descending order of the weighted similarity values; calculating the score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result according to the score; and determining a target product recommendation result according to the first recommendation result and the second recommendation result.

Description

Product recommendation method, device and equipment based on content and collaborative filtering

Technical Field

The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, and a device for recommending a product based on content and collaborative filtering.

Background

As information technology is applied in the medical industry, medical product libraries have come to contain large amounts of complex data, and preferences differ from user to user, so it is difficult to accurately find products that meet a user's requirements within such a large amount of data.

At present, a fuzzy search is performed using keywords, and the fuzzy search results are ranked by historical access count from high to low and recommended to the user. However, on the one hand, products found by keyword-based fuzzy query are not very accurate, and on the other hand, recommendation ranked by access count cannot meet the user's personalized preferences.

Disclosure of Invention

In view of this, the application provides a product recommendation method, device and equipment based on content and collaborative filtering, relates to the technical field of the internet, and can solve the problems that accuracy is low and the user's personalized preferences cannot be met when a user searches for a target product among a large number of products.

According to one aspect of the application, a product recommendation method based on content and collaborative filtering is provided, and the method comprises the following steps:

obtaining a product query text for a target product sent by a query user, calculating a first similarity between the product query text and each preset product text by using a preset word segmentation technology and a TF-IDF algorithm, and determining each preset product text whose first similarity is larger than a preset similarity threshold as a recommended product text;

extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the sequence of the weighted similarity values from large to small;

determining neighbor users whose behavior correlation with the query user is higher than a first preset threshold, together with the historical behavior sets of the neighbor users for products, calculating scores of the query user for the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores;

and determining a target product recommendation result according to the first recommendation result and the second recommendation result.

According to another aspect of the present application, there is provided a product recommendation apparatus based on content and collaborative filtering, the apparatus including:

the screening module is used for obtaining a product query text for a target product sent by a query user, calculating a first similarity between the product query text and each preset product text by using a preset word segmentation technology and a TF-IDF algorithm, and determining each preset product text whose first similarity is larger than a preset similarity threshold as a recommended product text;

the first recommending module is used for acquiring an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommending result according to the sequence of the weighted similarity values from large to small;

the second recommending module is used for determining neighbor users whose behavior correlation with the query user is higher than a first preset threshold, together with the historical behavior sets of the neighbor users for products, calculating scores of the query user for the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommending result according to the scores;

and the determining module is used for determining a target product recommendation result according to the first recommendation result and the second recommendation result.

According to yet another aspect of the present application, there is provided a non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, implements the content and collaborative filtering based product recommendation method described above.

According to yet another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the content and collaborative filtering based product recommendation method when executing the program.

By means of the technical scheme, the application discloses a product recommendation method, a device and equipment based on content and collaborative filtering, the method comprises the steps of firstly obtaining a product query text aiming at a target product sent by a query user, calculating a first similarity between the product query text and a preset product text by utilizing a preset word segmentation technology and a TF-IDF algorithm, and determining the preset product text corresponding to the first similarity larger than a preset similarity threshold value as a recommended product text; further, extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the sequence of the weighted similarity values from large to small; in addition, determining neighbor users with the correlation with the behavior of the query user higher than a first preset threshold value and historical behavior sets of the neighbor users for the products, calculating scores of the query user on the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores; and finally, determining a target product recommendation result according to the first recommendation result and the second recommendation result. 
According to this technical scheme, a first recommendation result for the target product is obtained from the perspective of the product query text, and a second recommendation result for the target product is obtained from the perspective of neighbor users whose behavior is highly correlated with that of the query user. The target product recommendation result is determined from both results, so the recommendation presented to the query user integrates multiple dimensions, the recommendation accuracy is high, and the individual requirements of the query user are met.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application become more readily apparent, a detailed description of the present application follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:

FIG. 1 is a flow chart illustrating a method for recommending a product based on content and collaborative filtering according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating another method for recommending a product based on content and collaborative filtering according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a product recommendation device based on content and collaborative filtering according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of another product recommendation device based on content and collaborative filtering according to an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

In view of the current problems, an embodiment of the present application provides a product recommendation method based on content and collaborative filtering. As shown in FIG. 1, the method includes:

101. Obtaining a product query text for a target product sent by a query user, calculating a first similarity between the product query text and each preset product text by using a preset word segmentation technology and a TF-IDF algorithm, and determining each preset product text whose first similarity is larger than a preset similarity threshold as a recommended product text.

The target product is the product required by the query user, and the product query text is text related to the target product, such as a product specification. The preset product texts are stored in the product database and are matched against the product query text in order to determine the preset product texts whose first similarity is larger than the preset similarity threshold; a preset product text may likewise be a product specification or the like.

For this embodiment, the preset word segmentation technology may be any existing word segmentation technology, such as a CRF word segmenter or the IKAnalyzer word segmenter. Word segmentation is performed on the product query text using the preset word segmentation technology to obtain a product query feature word set containing at least one product query feature word, and a product query feature vector corresponding to the product query feature word set is calculated using the TF-IDF algorithm. Likewise, word segmentation is performed on each preset product text to obtain a preset product feature word set containing at least one preset product feature word, and a preset product feature vector corresponding to the preset product feature word set is calculated using the TF-IDF algorithm.

TF-IDF is a commonly used weighting technique in information retrieval and data mining. The TF-IDF value can be used to evaluate whether a feature word is a keyword of a text: the larger the TF-IDF value, the more important the feature word is to the text, that is, the more likely it is a keyword of the text. A high word frequency TF alone does not mean that a feature word is a keyword of the text, so the TF-IDF value is the product of the word frequency TF of the feature word in the text and the inverse document frequency IDF of the feature word. For example, the most common feature words are given the smallest IDF, whereas rarer feature words such as "influenza" and "virus" are given a larger IDF. Therefore, the product query feature vector calculated with the TF-IDF algorithm contains a TF-IDF value for each product query feature word, and the preset product feature vector calculated with the TF-IDF algorithm contains a TF-IDF value for each preset product feature word.

Further, a preset similarity calculation formula is used to calculate the first similarity between the product query feature vector and the preset product feature vector. The preset similarity calculation formula may include a cosine calculation formula, which is described as:

$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$

where $\vec{a}$ is the product query feature vector, $\vec{b}$ is the preset product feature vector, and $\cos\theta$ is the first similarity.

After the first similarity between the product query feature vector and each preset product feature vector has been calculated, the preset product texts whose first similarity is larger than the preset similarity threshold are screened out and determined as recommended product texts. This preliminary screening of the large number of preset product texts allows only the screened recommended product texts to be matched further against the product query text, which improves the efficiency and accuracy of product recommendation.
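The screening in step 101 can be illustrated with a minimal sketch. It assumes that each feature vector is stored as a mapping from feature word to TF-IDF value; the function names, the example weights, and the threshold of 0.3 are hypothetical and are not taken from the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {word: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (norm_a * norm_b)


def screen_products(query_vec, preset_vecs, threshold):
    """Keep preset product texts whose first similarity exceeds the threshold."""
    recommended = {}
    for text_id, vec in preset_vecs.items():
        first_similarity = cosine_similarity(query_vec, vec)
        if first_similarity > threshold:
            recommended[text_id] = first_similarity
    return recommended


# Illustrative weights only (hypothetical data).
query_vec = {"oral ulcer": 0.5, "recurrent": 0.5}
preset_vecs = {
    "product_1": {"oral ulcer": 0.6, "recurrent": 0.4},
    "product_2": {"influenza": 0.9, "virus": 0.1},
}
recommended = screen_products(query_vec, preset_vecs, threshold=0.3)
```

Here only `product_1` shares feature words with the query, so it is the only text that passes the screening.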

102. And extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the descending order of the weighted similarity value.

The recommended product feature words obtained by word segmentation of a recommended product text are grouped according to whether each word is an identifying word or a descriptive word: the feature words that are identifying words form the identification feature word set, and the feature words that are descriptive words form the description feature word set. For example, if the first feature word of recommended product text 1 is a descriptive word, it is placed in the description feature word set, and if the second feature word of recommended product text 1 is an identifying word, it is placed in the identification feature word set, and so on. This yields the identification feature word set {second feature word, third feature word, fifth feature word …} and the description feature word set {first feature word, fourth feature word, sixth feature word …} of recommended product text 1. Identifying words are words such as "flu" and "virus", and descriptive words are common words such as "are". The benefit of this classification is that, for each recommended product text, the descriptive words can be given a smaller proportion and the identifying words a larger proportion, which improves the accuracy of the first recommendation result.

For this embodiment, the weighted similarity value between a recommended product text and the product query text is calculated according to the identification feature word set and the description feature word set; that is, a smaller proportion is given to the description feature word set and a larger proportion to the identification feature word set, and the weighted similarity value between each recommended product text and the product query text is recalculated so that the recommended product texts can be ordered to obtain the first recommendation result. Example steps for calculating the weighted similarity value may include: calculating a first intersection of the identification feature word set and the product query feature word set and a second intersection of the description feature word set and the product query feature word set; calculating, according to the first intersection, a third weight value of the identification feature word set relative to the product query feature word set, and calculating, according to the second intersection, a fourth weight value of the description feature word set relative to the product query feature word set; and weighting the third weight value and the fourth weight value with preset coefficients to obtain the weighted similarity value between the recommended product text and the product query text.

The first recommendation result comprises each recommended product text and the corresponding weighted similarity value, the recommended product texts are sorted according to the sequence of the weighted similarity values from large to small, and the recommended product text corresponding to the largest weighted similarity value is arranged at the first position of the first recommendation result.
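As a minimal sketch of how the first recommendation result might be assembled, the following assumes that the third and fourth weight values have already been computed for each recommended product text. The coefficients 0.7 and 0.3 are hypothetical; the application only states that the identification feature word set receives the larger proportion.

```python
def rank_first_recommendation(weights, id_coeff=0.7, desc_coeff=0.3):
    """Build the first recommendation result.

    weights maps a recommended product text id to a pair
    (third_weight, fourth_weight). The result is a list of
    (text id, weighted similarity value), sorted from large to small.
    """
    scored = [
        (text_id, id_coeff * third + desc_coeff * fourth)
        for text_id, (third, fourth) in weights.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weight values for two recommended product texts.
weights = {"text_1": (0.2, 0.9), "text_2": (0.8, 0.1)}
first_result = rank_first_recommendation(weights)
```

With these values `text_2` is ranked first, because its match on identifying words outweighs the stronger descriptive match of `text_1`.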

103. Determining neighbor users with the correlation with the behavior of the query user higher than a first preset threshold value and historical behavior sets of the neighbor users for products, calculating scores of the query user on the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores.

The collaborative filtering algorithm discovers the preferences of a query user by mining the query user's historical behavior data, and predicts the products that the query user needs. Its main implementation is: determining neighbor users who have requirements in common with the query user, and making recommendations according to the historical behavior data of those neighbor users. The collaborative filtering algorithm can still make recommendations for the query user when there is no product query text or the product query text is not accurate enough.

For this embodiment, neighbor users having requirements in common with the query user are represented as neighbor users whose behavior correlation with the query user is higher than a first preset threshold. Specifically, a correlation coefficient between the query user and each other user is calculated according to a preset correlation coefficient calculation formula, such as a cosine calculation formula; each calculated correlation coefficient is compared with the first preset threshold, and the other users whose correlation coefficient is higher than the first preset threshold are taken as the neighbor users.
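The neighbor selection can be sketched as follows, assuming that each user's behavior is represented as a mapping from product to a numeric behavior score and that the cosine formula is used as the correlation coefficient. The data layout, names, and threshold are hypothetical.

```python
import math


def correlation(u, v):
    """Cosine correlation between two behavior vectors {product: score}."""
    dot = sum(u[p] * v[p] for p in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_neighbors(query_user, behaviors, first_threshold):
    """Return {user: coefficient} for users above the first preset threshold."""
    target = behaviors[query_user]
    neighbors = {}
    for other, vec in behaviors.items():
        if other == query_user:
            continue
        coefficient = correlation(target, vec)
        if coefficient > first_threshold:
            neighbors[other] = coefficient
    return neighbors


# Hypothetical behavior data.
behaviors = {
    "query": {"A": 5, "B": 3},
    "u1": {"A": 4, "B": 3, "C": 5},
    "u2": {"D": 5},
}
neighbors = find_neighbors("query", behaviors, first_threshold=0.5)
```

In this example `u1` shares products A and B with the query user and is selected, while `u2` has no overlap and is not.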

The historical behavior set of a neighbor user for products includes a behavior set to be predicted and an adjacent behavior set of the behavior set to be predicted, where the behavior set to be predicted consists of behaviors that exist in the behavior set of the neighbor user but not in the behavior set of the query user. The adjacent behavior set is determined as follows: a second similarity between the behavior set to be predicted and each other behavior set is calculated according to a k-nearest neighbor algorithm or a k-means algorithm, and the other behavior sets whose second similarity is larger than a second preset threshold are determined as adjacent behavior sets. Finally, the score of the query user for the historical behavior set is calculated based on the collaborative filtering algorithm, specifically: a first score of the query user for the behavior set to be predicted is calculated with a user-based collaborative filtering algorithm, a second score of the query user for the adjacent behavior set is calculated with an item-based collaborative filtering algorithm, the first score and the second score are combined by weighting to obtain the score of the query user for the historical behavior set, and the second recommendation result is determined according to that score.
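The application does not give the scoring formulas, so the following is only a sketch under stated assumptions: the first score is taken to be a similarity-weighted average of the neighbors' scores (one common form of user-based collaborative filtering), the second score is assumed to be supplied by a separate item-based step that is not shown, and the equal weighting of 0.5 is hypothetical.

```python
def behaviors_to_predict(query_behaviors, neighbor_behaviors):
    """Products present in the neighbor's behavior set but not the query user's."""
    return set(neighbor_behaviors) - set(query_behaviors)


def user_based_score(product, neighbors, behaviors):
    """Similarity-weighted average of the neighbors' scores for one product.

    neighbors maps user -> correlation coefficient,
    behaviors maps user -> {product: score}.
    """
    numerator = denominator = 0.0
    for user, similarity in neighbors.items():
        if product in behaviors[user]:
            numerator += similarity * behaviors[user][product]
            denominator += similarity
    return numerator / denominator if denominator else 0.0


def combined_score(first_score, second_score, user_weight=0.5):
    """Weight the user-based first score against the item-based second score."""
    return user_weight * first_score + (1 - user_weight) * second_score


# Hypothetical data: two neighbors who both interacted with product C.
neighbors = {"u1": 0.8, "u2": 0.2}
behaviors = {"u1": {"C": 5}, "u2": {"C": 1}}
first_score = user_based_score("C", neighbors, behaviors)
score = combined_score(first_score, second_score=3.0)
```

Products would then be ordered by `score` from large to small to form the second recommendation result.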

The second recommendation includes a ranking of the scores and the products corresponding to the scores, with the largest score and the product corresponding to the largest score ranked first in the second recommendation.

104. And determining a target product recommendation result according to the first recommendation result and the second recommendation result.

For this embodiment, as a preferred implementation, the target product recommendation result may be obtained by a weighted calculation over the first recommendation result and the second recommendation result. The first recommendation result, calculated with the preset word segmentation technology and the TF-IDF algorithm, is based on text content, and the second recommendation result, calculated with the collaborative filtering algorithm, is based on user behavior. Combining the recommendation results of these two dimensions by weighting yields a comprehensive target product recommendation result that is more accurate than a recommendation result obtained from a single dimension.
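One possible form of the weighted combination in step 104 is sketched below. It assumes that both recommendation results map a product to a score on a comparable scale and that a product missing from one result contributes zero from that side; the weight of 0.6 for the content-based result is hypothetical.

```python
def merge_recommendations(first, second, content_weight=0.6):
    """Combine content-based and behavior-based scores per product.

    first and second map product -> score. Returns a list of
    (product, combined score) sorted from large to small.
    """
    merged = {
        product: content_weight * first.get(product, 0.0)
        + (1 - content_weight) * second.get(product, 0.0)
        for product in set(first) | set(second)
    }
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from the two recommendation results.
first = {"p1": 0.9, "p2": 0.4}
second = {"p2": 0.8, "p3": 0.5}
target_result = merge_recommendations(first, second)
```

In this example `p2` is ranked first because it is supported by both dimensions, even though `p1` has the highest content-based score.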

The application discloses a product recommendation method, a device and equipment based on content and collaborative filtering, which comprises the steps of firstly obtaining a product query text aiming at a target product sent by a query user, calculating a first similarity between the product query text and a preset product text by utilizing a preset word segmentation technology and a TF-IDF algorithm, and determining the preset product text corresponding to the first similarity larger than a preset similarity threshold as a recommended product text; further, extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the sequence of the weighted similarity values from large to small; in addition, determining neighbor users with the correlation with the behavior of the query user higher than a first preset threshold value and historical behavior sets of the neighbor users for the products, calculating scores of the query user on the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores; and finally, determining a target product recommendation result according to the first recommendation result and the second recommendation result. 
According to this technical scheme, a first recommendation result for the target product is obtained from the perspective of the product query text, and a second recommendation result for the target product is obtained from the perspective of neighbor users whose behavior is highly correlated with that of the query user. The target product recommendation result is determined from both results, so the recommendation presented to the query user integrates multiple dimensions, the recommendation accuracy is high, and the individual requirements of the query user are met.

Further, as a refinement and extension of the specific implementation of the above embodiment, and in order to fully illustrate the implementation process of this embodiment, another product recommendation method based on content and collaborative filtering is provided. As shown in FIG. 2, the method includes:

201. Obtaining a product query text for a target product sent by a query user, performing word segmentation on the product query text according to a preset word segmentation technology to obtain a product query feature word set, and performing word segmentation on the preset product text to obtain a preset product feature word set.

For this embodiment, in a specific application scenario, the product query text may include a product instruction manual, a medical diagnosis report, and the like. Word segmentation is performed on the product query text using the preset word segmentation technology to obtain a product query feature word set {product query feature word 1, product query feature word 2, product query feature word 3 …}. For example, a medical diagnosis report contains a detailed description of the condition, and word segmentation yields the product query feature word set {oral ulcer, recurrent outbreak, facial symmetry, recurrent aphtha …}. Word segmentation is likewise performed on each preset product text to obtain the corresponding preset product feature word set {preset product feature word 1, preset product feature word 2, preset product feature word 3 …}, for example, preset product feature word set 1 {recurrent oral ulcer, herpetic oral ulcer, immune suppression function …}.

202. And calculating a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set by using a TF-IDF algorithm.

As a preferred embodiment, this step may include: calculating a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; and constructing a product query feature vector comprising the product query feature words and their corresponding first weight values, and a preset product feature vector comprising the preset product feature words and their corresponding second weight values.

Specifically, the TF-IDF algorithm comprises word frequency TF calculation and inverse document frequency IDF calculation, and further, the word frequency TF and the inverse document frequency IDF are multiplied to obtain a weight value of the feature word to the text.

The word frequency of a product query feature word represents how often that word appears in the product query feature word set. Because product query texts differ in length, each word frequency is normalized: in the word frequency calculation formula, the count is divided by $\sum_k n_{k,j}$.

The word frequency calculation formula is described as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

where $i$ represents a product query feature word, $j$ represents a product query feature word set, $tf_{i,j}$ represents the word frequency of $i$ in set $j$, $n_{i,j}$ represents the number of times $i$ appears in set $j$, and $\sum_k n_{k,j}$ represents the sum of the numbers of occurrences of all words in set $j$.

The inverse document frequency calculation formula is described as:

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

where $i$ represents a product query feature word, $j$ represents a product query feature word set, $idf_i$ represents the inverse document frequency of $i$, $|D|$ represents the total number of product texts in the database, and $|\{j : t_i \in d_j\}|$ represents the number of product texts in which the product query feature word $i$ appears. The smaller $|\{j : t_i \in d_j\}|$ is, the larger the IDF value is and the better the product query feature word $i$ distinguishes between texts. Conversely, the smaller the IDF value, the less effective the product query feature word is at distinguishing texts.

The TF-IDF value calculation formula is described as:

$$tfidf_{i,j} = tf_{i,j} \times idf_i$$

where $tfidf_{i,j}$ is the first weight value of the product query feature word $i$.
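The TF, IDF, and TF-IDF calculations described above can be sketched as follows. The sketch assumes that each text has already been segmented into a list of feature words, uses the natural logarithm, and returns an IDF of 0 for a word that appears in no product text; these choices and all names are assumptions rather than details from the application.

```python
import math
from collections import Counter


def term_frequency(words):
    """tf for each word: its count divided by the total count in the set."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def inverse_document_frequency(word, documents):
    """idf: log of total texts over the number of texts containing the word."""
    containing = sum(1 for doc in documents if word in doc)
    if containing == 0:
        return 0.0  # assumption: unseen words carry no weight
    return math.log(len(documents) / containing)


def tfidf_vector(words, documents):
    """Feature vector {word: tf * idf} for one segmented text."""
    tf = term_frequency(words)
    return {w: tf[w] * inverse_document_frequency(w, documents) for w in tf}


# Hypothetical segmented product texts and query.
documents = [
    ["oral ulcer", "recurrent"],
    ["influenza", "virus"],
    ["oral ulcer", "virus"],
]
query_words = ["oral ulcer", "recurrent", "recurrent"]
query_vector = tfidf_vector(query_words, documents)
```

In this example "recurrent" receives a larger weight than "oral ulcer": it occurs more often in the query and appears in fewer product texts.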

The specific implementation process of calculating the second weight value of each preset product feature word by using the TF-IDF algorithm may refer to a process of calculating the first weight value of each product query feature word by using the TF-IDF algorithm.

Constructing a product query feature vector: {(product query feature word 1, first weight value of product query feature word 1), (product query feature word 2, first weight value of product query feature word 2), (product query feature word 3, first weight value of product query feature word 3) …}.

Similarly, constructing a preset product feature vector: {(preset product feature word 1, second weight value of preset product feature word 1), (preset product feature word 2, second weight value of preset product feature word 2), (preset product feature word 3, second weight value of preset product feature word 3) …}.

203. Calculating the first similarity between the product query feature vector and the preset product feature vector by using a preset similarity calculation formula, and determining each preset product text whose first similarity is greater than the preset similarity threshold as a recommended product text.

For this embodiment, the preset similarity calculation formula may include a cosine calculation formula, where the cosine calculation formula is described as:

$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$$

where $\vec{a}$ is the product query feature vector, $\vec{b}$ is the preset product feature vector, and $\cos\theta$ is the first similarity.
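A minimal sketch of step 203, assuming the feature vectors are stored as sparse dictionaries mapping feature words to weights. The names `cosine_similarity` and `select_recommended`, the sample vectors, and the threshold 0.5 are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)

def select_recommended(query_vec, product_vecs, threshold):
    """Keep the product texts whose first similarity exceeds the threshold."""
    return [name for name, vec in product_vecs.items()
            if cosine_similarity(query_vec, vec) > threshold]

# Hypothetical vectors; the threshold value is an assumption.
query_vec = {"oral": 0.3, "ulcer": 0.7}
product_vecs = {
    "product A": {"oral": 0.3, "ulcer": 0.7},
    "product B": {"eye": 0.9, "drops": 0.4},
}
recommended = select_recommended(query_vec, product_vecs, threshold=0.5)
```

Words that appear in only one of the two vectors contribute nothing to the dot product, so texts with no shared feature words get a similarity of 0 and are filtered out.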

204. Extracting an identification characteristic word set and a description characteristic word set of the recommended product text, and calculating a first intersection of the identification characteristic word set and the product query characteristic word set and a second intersection of the description characteristic word set and the product query characteristic word set.

For this embodiment, the recommended product feature words obtained by word segmentation of the recommended product text may be grouped according to whether each word is an identifying word or a descriptive word, so that the recommended product feature word set is divided into the identification feature word set and the description feature word set. A first intersection of the identification feature word set and the product query feature word set is then calculated. Taking recommended product text 1 as an example: if a second feature word in the identification feature word set does not appear in the product query feature word set, the first intersection does not include the second feature word; if a third feature word in the identification feature word set appears in the product query feature word set, the first intersection includes the third feature word. The first intersection is therefore { third feature word … }. In the same way, a second intersection of the description feature word set and the product query feature word set is calculated.
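The two intersections can be sketched with plain set operations. The sketch assumes the feature words have already been labelled as identifying or descriptive; how that labelling is done is not specified here, and the function name and sample words are hypothetical.

```python
def split_intersections(identifying, descriptive, query_words):
    """Return (first intersection, second intersection) with the query word set."""
    first = set(identifying) & set(query_words)
    second = set(descriptive) & set(query_words)
    return first, second

# Hypothetical feature word sets for one recommended product text.
first, second = split_intersections(
    identifying={"triamcinolone", "ointment"},
    descriptive={"oral", "ulcer", "relief"},
    query_words={"ointment", "oral", "ulcer"},
)
```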

205. Calculating a third weight value of the identification feature word set relative to the product query feature word set according to the first intersection, and calculating a fourth weight value of the description feature word set relative to the product query feature word set according to the second intersection.

For this embodiment, the third weight value calculation process is: $tfidf_w = \sum_{t \in n} tf_{t,w} \times idf_{t,w}$

where $tfidf_w$ is the third weight value, and n represents the first intersection of the identification feature word set w and the product query feature word set j. $tf_{t,w}$ denotes the TF value of the identifying feature word t in w, and $idf_{t,w}$ denotes the IDF value of the identifying feature word t in w.

The fourth weight value calculation process is: $tfidf_v = \sum_{t \in m} tf_{t,v} \times idf_{t,v}$

where $tfidf_v$ is the fourth weight value, and m represents the second intersection of the description feature word set v and the product query feature word set j. $tf_{t,v}$ denotes the TF value of the descriptive feature word t in v, and $idf_{t,v}$ denotes the IDF value of the descriptive feature word t in v.

206. Weighting the third weight value and the fourth weight value by using a preset coefficient to obtain a weighted similarity value of the recommended product text and the product query text, and forming a first recommendation result in descending order of the weighted similarity values.

For this embodiment, the weighted similarity value of each recommended product text and the product query text is calculated as: $C = \lambda \, tfidf_w + (1 - \lambda) \, tfidf_v$

where λ is the preset coefficient, C is the weighted similarity value, $tfidf_w$ represents the third weight value, and $tfidf_v$ represents the fourth weight value.

The first recommendation result includes the weighted similarity values and the recommended product text corresponding to each weighted similarity value. The first entry of the first recommendation result is the recommended product text with the largest weighted similarity value. For example, the first recommendation result includes: { (triamcinolone acetonide ointment, 0.5), (domiphen bromide buccal tablet, 0.3), (lidocaine gel, 0.15) }.

The purpose of weighting the third weight value and the fourth weight value by the preset coefficient to obtain the weighted similarity value of the recommended product text and the product query text is as follows: the descriptive words of each recommended product text are given the smaller coefficient (1-λ) and the identifying words are given the larger coefficient λ, which reduces the interference of descriptive words with the first recommendation result and thereby improves its accuracy.
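Steps 205 and 206 can be sketched as below. This is an illustrative sketch only: the helper names, the sample weights, and λ = 0.7 are assumptions, chosen so that the identifying words carry the larger coefficient.

```python
def set_weight(intersection, tf, idf):
    """Sum tf * idf over the feature words in an intersection."""
    return sum(tf[t] * idf[t] for t in intersection)

def weighted_similarity(third_weight, fourth_weight, lam):
    """C = lam * tfidf_w + (1 - lam) * tfidf_v."""
    return lam * third_weight + (1 - lam) * fourth_weight

def first_recommendation(candidates, lam=0.7):
    """Rank candidates by weighted similarity, largest first.

    candidates -- {product: (third weight value, fourth weight value)}
    """
    scored = [(name, weighted_similarity(w, v, lam))
              for name, (w, v) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical third and fourth weight values for two recommended product texts.
candidates = {"product A": (0.2, 0.6), "product B": (0.6, 0.2)}
ranking = first_recommendation(candidates, lam=0.7)
```

With λ = 0.7, product B (strong match on identifying words) ranks above product A (strong match only on descriptive words), which is the effect the coefficient is meant to produce.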

On the one hand, the recommendation is made according to the product query text of the query user and is independent of the personal data of the query user, so the cold-start and new-user problems do not arise. On the other hand, every preset product text can be recommended regardless of when or in what order its information was stored in the database, so the new-item problem does not arise. Finally, compared with directly using keywords for fuzzy queries in the database, the method searches on content text that is closer to the requirement of the query user, so the first recommendation result is more accurate.

207. Determining neighbor users whose correlation with the behavior of the query user is higher than a first preset threshold, and a historical behavior set of the neighbor users for the product.

The historical behavior set includes a behavior set to be predicted, in which the behavior set of the neighbor user differs from the behavior set of the query user, and an adjacent behavior set adjacent to the behavior set to be predicted.

For this embodiment, as a preferred implementation, the step of determining the neighbor users whose correlation with the behavior of the query user is higher than the first preset threshold includes: calculating the correlation coefficient between the query user and each other user by using a preset correlation coefficient calculation formula, and determining the other users whose correlation coefficient is greater than the first preset threshold as neighbor users. Specifically, the preset correlation coefficient calculation formula may be represented as:

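$$S(u,b) = \frac{\sum_{p \in I_u \cap I_b} (r_{u,p} - \bar{r}_u)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in I_u \cap I_b} (r_{u,p} - \bar{r}_u)^2} \, \sqrt{\sum_{p \in I_u \cap I_b} (r_{b,p} - \bar{r}_b)^2}}$$

where u is the query user, b is another user, $I_u \cap I_b$ is the set of products called by both u and b, $r_{u,p}$ and $r_{b,p}$ respectively represent the historical number of calls by u and by b to the commonly called product p, and $\bar{r}_u$ and $\bar{r}_b$ respectively represent the average historical number of calls by u and by b to the products in the set $I_u \cap I_b$.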

The larger the value of S(u, b), the stronger the correlation between u and b. The value range of S(u, b) is [-1, +1]. Other users b whose S(u, b) is greater than the first preset threshold are determined as neighbor users h.
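A minimal sketch of the neighbor selection, assuming the correlation coefficient is the Pearson correlation computed over the commonly called products (consistent with the stated range of [-1, +1]). The function names, the sample call counts, and the threshold 0.5 are hypothetical.

```python
import math

def pearson(calls_u, calls_b):
    """Pearson correlation of two users over their commonly called products.

    calls_u, calls_b -- {product: historical number of calls}
    """
    common = set(calls_u) & set(calls_b)
    if len(common) < 2:
        return 0.0  # assumed convention: too little overlap to correlate
    mean_u = sum(calls_u[p] for p in common) / len(common)
    mean_b = sum(calls_b[p] for p in common) / len(common)
    cov = sum((calls_u[p] - mean_u) * (calls_b[p] - mean_b) for p in common)
    var_u = sum((calls_u[p] - mean_u) ** 2 for p in common)
    var_b = sum((calls_b[p] - mean_b) ** 2 for p in common)
    if var_u == 0 or var_b == 0:
        return 0.0  # assumed convention: constant call counts carry no signal
    return cov / math.sqrt(var_u * var_b)

def neighbors(query_calls, others, threshold):
    """Other users whose correlation with the query user exceeds the threshold."""
    return [user for user, calls in others.items()
            if pearson(query_calls, calls) > threshold]

# Hypothetical call histories.
u = {"p1": 5, "p2": 3, "p3": 1}
others = {
    "b1": {"p1": 10, "p2": 6, "p3": 2, "p4": 4},
    "b2": {"p1": 1, "p2": 3, "p3": 5},
}
near = neighbors(u, others, threshold=0.5)
```

Here b1 calls the same products in the same proportions as u (correlation 1) and becomes a neighbor, while b2 has the opposite pattern (correlation -1) and is excluded.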

The specific step of determining the behavior set to be predicted, in which the behavior set of the neighbor user differs from the behavior set of the query user, includes: determining the difference between the behavior set of the neighbor user and the behavior set of the query user as the behavior set to be predicted. Specifically, the behaviors in the behavior set to be predicted exist in the behavior set of the neighbor user but do not exist in the behavior set of the query user.
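The difference described above corresponds to a set difference. A minimal sketch, with a hypothetical function name and hypothetical product identifiers:

```python
def to_be_predicted(neighbor_behaviors, query_behaviors):
    """Behaviors the neighbor user has but the query user does not."""
    return set(neighbor_behaviors) - set(query_behaviors)

# Hypothetical behavior sets, each behavior identified by the product called.
pending = to_be_predicted({"p1", "p2", "p4"}, {"p1", "p2", "p3"})
```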

The specific step of determining the adjacent behavior set adjacent to the behavior set to be predicted includes: calculating a second similarity between the behavior set to be predicted and other behavior sets according to a k-nearest neighbor algorithm or a k-means algorithm, and determining the other behavior sets whose second similarity is greater than a second preset threshold as adjacent behavior sets. For the k-nearest neighbor algorithm and the k-means algorithm, reference may be made to the prior art; details are not repeated here.

208. Calculating a first score of the query user for the behavior set to be predicted based on a user-based collaborative filtering algorithm.

The main idea of the user-based collaborative filtering algorithm is as follows: based on the similarity between the historical behaviors of the query user and those of other users, find the neighbor users whose correlation with the behavior of the query user is higher than the first preset threshold; historical behaviors that the neighbor users have but the query user does not have may then become behaviors of the query user. In this embodiment, such behaviors are represented by the behavior set to be predicted.

For the present embodiment, the first score calculation formula is described as:

where p(u, i) represents the first score of the query user u for a product i in the behavior set to be predicted, $\bar{r}_u$ represents the average number of calls by the query user u to products, s(u, h) represents the correlation coefficient of u and h, $p_{h,q}$ represents the score of the neighbor user h for i, and n is the number of neighbor users.
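A minimal sketch of a user-based prediction, assuming the common mean-centred form in which the score equals the query user's average plus the similarity-weighted deviation of each neighbor's calls from that neighbor's own average. This form is an assumption and may differ from the formula of this embodiment; the function name and sample values are hypothetical.

```python
def user_based_score(mean_u, neighbor_data, product):
    """Predict a score for `product` from the neighbors' call histories.

    mean_u        -- average number of calls by the query user
    neighbor_data -- list of (correlation s(u, h), {product: calls by h})
    """
    numerator = 0.0
    denominator = 0.0
    for sim, calls_h in neighbor_data:
        if product not in calls_h:
            continue  # this neighbor gives no evidence about the product
        mean_h = sum(calls_h.values()) / len(calls_h)
        numerator += sim * (calls_h[product] - mean_h)
        denominator += abs(sim)
    if denominator == 0.0:
        return mean_u  # assumed fallback when no neighbor called the product
    return mean_u + numerator / denominator

# Hypothetical neighbors with their correlation coefficients.
score = user_based_score(
    mean_u=3.0,
    neighbor_data=[(1.0, {"p1": 10, "p4": 4}), (0.5, {"p2": 2, "p4": 6})],
    product="p4",
)
```

Centering on each neighbor's own average keeps heavy users from dominating the prediction simply because all of their call counts are large.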

209. Calculating a second score of the query user for the adjacent behavior set based on an item-based collaborative filtering algorithm, and determining a second recommendation result according to the first score and/or the second score.

Wherein the adjacent behavior set is an adjacent behavior set adjacent to the behavior set to be predicted. For this embodiment, the second score calculation formula is described as:

where p(u, i) represents the second score of the query user u for a product i in the behavior set to be predicted, $\bar{r}_i$ represents the average number of calls to i by the neighbor user h, $i_k$ is an adjacent behavior set, $\bar{r}_{i_k}$ is the average number of calls by the neighbor user h to the adjacent behavior set $i_k$, $s(i, i_k)$ represents the second similarity of i and $i_k$, and $r_{u,i_k}$ represents the number of times the query user u calls the adjacent behavior set $i_k$.
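A minimal sketch of an item-based prediction, assuming the common mean-centred form built from the quantities defined above: the average number of calls to i, plus the similarity-weighted deviation of the query user's calls to each adjacent set from that set's average. This form is an assumption and may differ from the formula of this embodiment; the function name and sample values are hypothetical.

```python
def item_based_score(mean_i, adjacent):
    """Predict a score for product i from its adjacent behavior sets.

    mean_i   -- average number of calls to i
    adjacent -- list of (second similarity s(i, i_k),
                         calls by the query user to i_k,
                         average number of calls to i_k)
    """
    numerator = sum(sim * (calls - mean_k) for sim, calls, mean_k in adjacent)
    denominator = sum(abs(sim) for sim, _, _ in adjacent)
    if denominator == 0:
        return mean_i  # assumed fallback when there are no adjacent sets
    return mean_i + numerator / denominator

# Hypothetical adjacent behavior sets for one product.
score = item_based_score(mean_i=4.0, adjacent=[(0.8, 6, 5.0), (0.2, 1, 3.0)])
```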

As an embodiment, the second recommendation result may be determined from the first score alone, from the second score alone, or from a weighted combination of the first score and the second score. Weighting the two scores combines the two collaborative filtering algorithms and thereby improves the accuracy of the second recommendation result.

The calculation formula for weighting the first score and the second score is described as: $P = \mu \, p_1(u, i) + (1 - \mu) \, p_2(u, i)$

where $p_1(u, i)$ is the first score, $p_2(u, i)$ is the second score, μ is the weighting coefficient, and P is the score.

The products corresponding to the adjacent behavior sets are sorted by score from large to small to obtain the second recommendation result, for example: product 1, score 16; product 7, score 10; product 5, score 3 …

Further, before the products corresponding to the adjacent behavior sets are sorted by score from large to small, the method further includes: deleting products whose score is less than or equal to 0. Because the query user is unlikely to search for products whose score is not greater than 0, only products with a score greater than 0 are retained.
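The weighting, filtering, and sorting of step 209 can be sketched as follows. The function name, the input scores, μ = 0.5, and the choice to treat a missing score as 0 are all assumptions made for illustration.

```python
def second_recommendation(first_scores, second_scores, mu=0.5):
    """Combine the two scores, drop non-positive results, sort descending.

    first_scores, second_scores -- {product: score}
    """
    products = set(first_scores) | set(second_scores)
    combined = {
        # A product missing from one score set is treated as 0 (assumption).
        p: mu * first_scores.get(p, 0.0) + (1 - mu) * second_scores.get(p, 0.0)
        for p in products
    }
    kept = [(p, s) for p, s in combined.items() if s > 0]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical first and second scores.
first_scores = {"product 1": 20, "product 7": 8, "product 9": -4}
second_scores = {"product 1": 12, "product 7": 12, "product 9": 2}
result = second_recommendation(first_scores, second_scores, mu=0.5)
```

In this example product 9 has a combined score of -1 and is removed before sorting.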

210. Determining a target product recommendation result according to the first recommendation result and the second recommendation result.

Before the first recommendation result obtained in step 206 and the second recommendation result obtained in step 209 are weighted, the method further includes: compressing the scores into the (0, 1) range by normalization.

For example, the second recommendation is: product 1, score 16 before normalization; product 7, score 10 before normalization; product 5, pre-normalization score 3 …

The normalized second recommendation result is: product 1, score 0.53, product 7, score 0.33, product 5, score 0.1 …
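One reading consistent with the figures above is normalization by the sum of all scores. The sketch below assumes that method and further assumes that the elided products contribute a total of 1, so that the sum is 30; both points are assumptions, and the function name is hypothetical.

```python
def normalize_by_sum(scores):
    """Divide each score by the sum of all scores (assumed normalization)."""
    total = sum(scores.values())
    if total <= 0:
        return {p: 0.0 for p in scores}
    return {p: s / total for p, s in scores.items()}

scores = {
    "product 1": 16,
    "product 7": 10,
    "product 5": 3,
    "other products (assumed)": 1,  # stands in for the elided entries
}
normalized = normalize_by_sum(scores)
```

Other schemes such as min-max scaling would also map scores into a bounded range, but they would not reproduce the example values.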

For this embodiment, as a preferred implementation, the union of the products corresponding to the first recommendation result and the products corresponding to the second recommendation result is calculated; the weighted similarity value and the score of each product in the union are weighted by using a preset third coefficient to obtain a target recommendation value; and the products are sorted by target recommendation value from large to small to obtain the target product recommendation result.

For example, the first recommendation result contains products and their corresponding weighted similarity values, and the second recommendation result contains products and their corresponding scores. If the weighted similarity value of product 1 in the first recommendation result is 0.5 and the score of product 1 in the second recommendation result is 0.53, the target recommendation value is 0.5 × preset third coefficient + 0.53 × (1 - preset third coefficient). If the weighted similarity value of product 2 in the first recommendation result is 0.1 but product 2 does not appear in the second recommendation result, the target recommendation value is 0.1 × preset third coefficient + 0 × (1 - preset third coefficient).
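Step 210 can be sketched as below, using the values from the example. The function name is hypothetical, the entry for product 7 is added for illustration, and the preset third coefficient is set to 0.6 purely as an assumption, since its value is not given.

```python
def target_recommendation(first_result, second_result, alpha):
    """Fuse the two recommendation results over the union of their products.

    first_result  -- {product: weighted similarity value}
    second_result -- {product: normalized score}
    alpha         -- preset third coefficient
    """
    products = set(first_result) | set(second_result)
    target = {
        # A product absent from one result contributes 0 for that result.
        p: alpha * first_result.get(p, 0.0) + (1 - alpha) * second_result.get(p, 0.0)
        for p in products
    }
    return sorted(target.items(), key=lambda item: item[1], reverse=True)

first_result = {"product 1": 0.5, "product 2": 0.1}
second_result = {"product 1": 0.53, "product 7": 0.33}
ranking = target_recommendation(first_result, second_result, alpha=0.6)
```

With a coefficient of 0.6, product 1 receives 0.5 × 0.6 + 0.53 × 0.4 = 0.512 and product 2 receives 0.1 × 0.6 + 0 × 0.4 = 0.06.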

The application discloses a product recommendation method, a product recommendation device and product recommendation equipment based on content and collaborative filtering, wherein the method comprises the steps of firstly obtaining a product query text aiming at a target product sent by a query user, calculating a first similarity between the product query text and a preset product text by utilizing a preset word segmentation technology and a TF-IDF algorithm, and determining the preset product text corresponding to the first similarity larger than a preset similarity threshold value as a recommended product text; further, extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the sequence of the weighted similarity values from large to small; in addition, determining neighbor users with the correlation with the behavior of the query user higher than a first preset threshold value and historical behavior sets of the neighbor users for the products, calculating scores of the query user on the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores; and finally, determining a target product recommendation result according to the first recommendation result and the second recommendation result. 
According to the technical scheme, a first recommendation result for the target product is obtained from the product query text, a second recommendation result for the target product is obtained from the perspective of a neighbor user with high correlation with the behavior of the query user, the first recommendation result and the second recommendation result are used for jointly determining the recommendation result of the target product, and the target product is recommended for the query user through multi-dimension integration, so that the recommendation accuracy is high, and the personalized requirements of the query user are met.

Further, as a specific implementation of the method shown in fig. 1 and fig. 2, an embodiment of the present application provides a product recommendation device based on content and collaborative filtering. As shown in fig. 3, the device includes: a screening module 31, a first recommending module 32, a second recommending module 33 and a determining module 34;

the screening module 31 is configured to obtain a product query text for a target product sent by a query user, calculate a first similarity between the product query text and a preset product text by using a preset word segmentation technology and a TF-IDF algorithm, and determine the preset product text corresponding to the first similarity greater than a preset similarity threshold as a recommended product text;

the first recommending module 32 is configured to extract an identification feature word set and a description feature word set of a recommended product text, calculate a weighted similarity value between the recommended product text and a product query text according to the identification feature word set and the description feature word set, and form a first recommending result according to a descending order of the weighted similarity value;

the second recommending module 33 is configured to determine neighbor users whose correlation with the behavior of the querying user is higher than a first preset threshold and a historical behavior set of the neighbor users for the product, calculate scores of the querying user on the historical behavior set based on a collaborative filtering algorithm, and determine a second recommending result according to the scores;

the determining module 34 may be configured to determine a recommendation result of the target product according to the first recommendation result and the second recommendation result.

In a specific application scenario, in order to calculate a first similarity between a product query text and a preset product text by using a preset word segmentation technique and a TF-IDF algorithm, as shown in fig. 4, the screening module 31 may specifically include: a word segmentation unit 311, a first calculation unit 312 and a second calculation unit 313;

the word segmentation unit 311 is configured to perform word segmentation on the product query text according to a preset word segmentation technology to obtain a product query feature word set, and perform word segmentation on the preset product text to obtain a preset product feature word set;

the first calculating unit 312 is configured to calculate a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set by using a TF-IDF algorithm;

the second calculating unit 313 may be configured to calculate a first similarity between the product query feature vector and the predetermined product feature vector by using a predetermined similarity calculation formula.

Correspondingly, in order to calculate the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to the preset product feature word set by using the TF-IDF algorithm, the first calculating unit 312 is specifically configured to calculate a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; and to construct a product query feature vector comprising the product query feature words and the corresponding first weight values, and a preset product feature vector comprising the preset product feature words and the corresponding second weight values.

In a specific application scenario, in order to calculate the weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, as shown in fig. 4, the first recommending module 32 may specifically include: an intersection unit 321, a weight unit 322 and a first weighting unit 323;

an intersection unit 321, configured to calculate a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set;

a weight unit 322, configured to calculate a third weight value of the identification feature word set relative to the product query feature word set according to the first intersection, and calculate a fourth weight value of the description feature word set relative to the product query feature word set according to the second intersection;

the first weighting unit 323 may be configured to weight the third weight value and the fourth weight value by using a preset coefficient to obtain a weighted similarity value between the recommended product text and the product query text.

In a specific application scenario, the historical behavior set includes a behavior set to be predicted, in which the behavior set of a neighbor user differs from the behavior set of the query user, and an adjacent behavior set adjacent to the behavior set to be predicted. In order to determine the neighbor users whose correlation with the behavior of the query user is higher than a first preset threshold and the historical behavior set of the neighbor users for the product, as shown in fig. 4, the second recommending module 33 may specifically include: a first screening unit 331, a first determining unit 332 and a second screening unit 333;

the first screening unit 331 is configured to calculate correlation coefficients of the querying user and other users by using a preset correlation coefficient calculation formula, and determine other users whose correlation coefficients are greater than a first preset threshold as neighboring users;

a first determining unit 332, configured to determine a difference between the behavior set of the neighbor user and the behavior set of the querying user as a behavior set to be predicted;

the second screening unit 333 may be configured to calculate a second similarity between the behavior set to be predicted and another behavior set according to a k-nearest neighbor algorithm or a k-means algorithm, and determine another behavior set with the second similarity greater than a second preset threshold as an adjacent behavior set.

In a specific application scenario, in order to calculate the score of the query user on the historical behavior set based on a collaborative filtering algorithm and determine a second recommendation result according to the score, as shown in fig. 4, the second recommending module 33 may further include: a first scoring unit 334, a second scoring unit 335 and a second determining unit 336;

the first scoring unit 334 is configured to calculate a first score of the query user for the behavior set to be predicted based on a user-based collaborative filtering algorithm;

the second scoring unit 335 is configured to calculate a second score of the query user for the adjacent behavior set based on an item-based collaborative filtering algorithm;

the second determining unit 336 may be configured to determine the second recommendation result according to the first score and/or the second score.

In a specific application scenario, in order to determine the target product recommendation result according to the first recommendation result and the second recommendation result, as shown in fig. 4, the determining module 34 may specifically include: a union unit 341, a second weighting unit 342 and a recommending unit 343;

the union unit 341 is configured to calculate the union of the products corresponding to the first recommendation result and the products corresponding to the second recommendation result;

the second weighting unit 342 is configured to weight the weighted similarity value and the score of the product by using a preset third coefficient to obtain a target recommendation value;

and the recommending unit 343 is configured to obtain a target product recommendation result by sorting according to the target recommendation values from large to small.

It should be noted that other corresponding descriptions of the functional units related to the product recommendation device based on content and collaborative filtering provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 2, and are not repeated herein.

Based on the methods shown in fig. 1 to fig. 2, correspondingly, the present embodiment further provides a storage medium, which may be volatile or nonvolatile, and on which computer readable instructions are stored, and when the computer readable instructions are executed by a processor, the method for recommending a product based on content and collaborative filtering as shown in fig. 1 to fig. 2 is implemented.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, or the like), and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, or the like) to execute the method of the embodiments of the present application.

Based on the method shown in fig. 1 to fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, in order to achieve the above object, the present embodiment further provides a computer device, where the computer device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the content and collaborative filtering based product recommendation method as described above with reference to fig. 1-2.

Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, a sensor, audio circuitry, a Wi-Fi module, and so forth. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.

It will be understood by those skilled in the art that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine some components, or arrange the components differently.

The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the operation of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.

Through the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus an essential general hardware platform, and can also be implemented by means of hardware.

By applying the technical scheme of the application, compared with the prior art, the application discloses a product recommendation method, a device and equipment based on content and collaborative filtering, the application firstly obtains a product query text aiming at a target product sent by a query user, calculates a first similarity between the product query text and a preset product text by utilizing a preset word segmentation technology and a TF-IDF algorithm, and determines the preset product text corresponding to the first similarity larger than a preset similarity threshold as a recommended product text; further, extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result according to the sequence of the weighted similarity values from large to small; in addition, determining neighbor users with the correlation with the behavior of the query user higher than a first preset threshold value and historical behavior sets of the neighbor users for the products, calculating scores of the query user on the historical behavior sets based on a collaborative filtering algorithm, and determining a second recommendation result according to the scores; and finally, determining a target product recommendation result according to the first recommendation result and the second recommendation result. 
According to the technical scheme, a first recommendation result for the target product is obtained from the product query text, a second recommendation result for the target product is obtained from the perspective of neighbor users highly correlated with the behavior of the query user, and the target product recommendation result is determined jointly by the first recommendation result and the second recommendation result. Integrating multiple dimensions in this way yields high recommendation accuracy and meets the personalized requirements of the query user.

Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims

1. A product recommendation method based on content and collaborative filtering is characterized by comprising the following steps:

2. The method of claim 1, wherein calculating the first similarity between the product query text and the predetermined product text using the predetermined word segmentation technique and the TF-IDF algorithm comprises:

performing word segmentation processing on the product query text according to a preset word segmentation technology to obtain a product query feature word set, and performing word segmentation processing on the preset product text to obtain a preset product feature word set;

calculating a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set by using a TF-IDF algorithm;

and calculating the first similarity between the product query feature vector and the preset product feature vector by using a preset similarity calculation formula.

3. The method according to claim 2, wherein the calculating the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to the preset product feature word set by using the TF-IDF algorithm comprises:

calculating a first weight value of each product query characteristic word in the product query characteristic word set to the product query text and a second weight value of each preset product characteristic word in the preset product characteristic word set to the preset product text;

and constructing a product query feature vector comprising the product query feature words and corresponding to the first weight value, and constructing a preset product feature vector comprising the preset product feature words and corresponding to the second weight value.

4. The method of claim 2, wherein calculating the weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set comprises:

calculating a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set;

calculating a third weight value of the identification feature word set relative to the product query feature word set according to the first intersection, and calculating a fourth weight value of the description feature word set relative to the product query feature word set according to the second intersection;

and weighting the third weight value and the fourth weight value by using a preset coefficient to obtain a weighted similarity value of the recommended product text and the product query text.
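
The weighting described in claim 4 may be sketched as follows. The claim does not state how a weight value is derived from an intersection, so this sketch assumes that each weight is the share of query feature words covered by the intersection, and that the preset coefficient `alpha` (a hypothetical name and default value) splits the total between the two weights.

```python
def overlap_weight(feature_words, query_words):
    # Assumed formula: share of the query feature words that also
    # appear in the given feature word set (size of intersection / size of query set).
    query = set(query_words)
    if not query:
        return 0.0
    return len(set(feature_words) & query) / len(query)


def weighted_similarity(id_words, desc_words, query_words, alpha=0.7):
    # third weight: identification feature words vs. query feature words
    third = overlap_weight(id_words, query_words)
    # fourth weight: description feature words vs. query feature words
    fourth = overlap_weight(desc_words, query_words)
    # alpha is a hypothetical preset coefficient favouring identification words
    return alpha * third + (1 - alpha) * fourth
```

With identification words {vitamin, c}, description words {chewable, orange} and query words {vitamin, c, chewable, tablets}, the third weight is 2/4, the fourth weight is 1/4, and the weighted similarity value with `alpha=0.7` is 0.7 × 0.5 + 0.3 × 0.25 = 0.425.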

5. The method of claim 1, wherein the historical behavior set comprises a to-be-predicted behavior set, consisting of the behaviors in the behavior set of the neighbor user that differ from the behavior set of the querying user, and an adjacent behavior set adjacent to the to-be-predicted behavior set, and wherein determining the neighbor user whose behavioral correlation with the querying user is higher than a first preset threshold, and the historical behavior set of the neighbor user for the product, comprises:

calculating the correlation coefficient between the query user and other users by using a preset correlation coefficient calculation formula, and determining other users with the correlation coefficient larger than a first preset threshold value as neighbor users;

determining the difference between the behavior set of the neighbor user and the behavior set of the querying user as the to-be-predicted behavior set;

and calculating a second similarity between the to-be-predicted behavior set and other behavior sets according to a k-nearest neighbor algorithm or a k-means algorithm, and determining the other behavior sets whose second similarity is larger than a second preset threshold as adjacent behavior sets.
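
The first two steps of claim 5 may be sketched as follows; the k-nearest neighbor or k-means step is not covered. The sketch assumes that each user's behavior set is represented as a mapping from product identifier to a numeric behavior score, and that the Pearson correlation coefficient serves as the preset correlation coefficient calculation formula. Neither assumption is fixed by the claim, and all function names are hypothetical.

```python
import math


def pearson(a, b):
    # Pearson correlation over the products both users have acted on.
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0
    return cov / (dev_x * dev_y)


def neighbor_users(behaviors, query_user, threshold):
    # Other users whose correlation with the querying user exceeds the threshold.
    target = behaviors[query_user]
    return [
        user
        for user, scores in behaviors.items()
        if user != query_user and pearson(target, scores) > threshold
    ]


def to_be_predicted(behaviors, query_user, neighbors):
    # Products the neighbor users acted on that the querying user has not.
    seen = set(behaviors[query_user])
    items = set()
    for user in neighbors:
        items |= set(behaviors[user]) - seen
    return items
```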

6. The method of claim 5, wherein calculating the score of the querying user for the historical behavior set based on a collaborative filtering algorithm and determining a second recommendation result according to the score comprises:

calculating a first score of the querying user for the to-be-predicted behavior set based on a user-based collaborative filtering algorithm;

calculating a second score of the querying user for the adjacent behavior set based on an item-based collaborative filtering algorithm;

and determining a second recommendation result according to the first score and/or the second score.
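
The first score of claim 6 may be sketched with a user-based prediction. The claim does not give a scoring formula, so the sketch assumes the widely used mean-centered weighted average: the querying user's average score plus the similarity-weighted deviation of each neighbor user's score from that neighbor's own average. The behavior representation (product identifier mapped to a numeric score), the precomputed `similarities` mapping and the function names are assumptions.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_user_based(behaviors, similarities, query_user, item):
    # behaviors: {user: {product: score}}
    # similarities: {neighbor user: correlation with the querying user}
    base = mean(behaviors[query_user].values())
    numerator = 0.0
    denominator = 0.0
    for user, sim in similarities.items():
        if item in behaviors[user]:
            deviation = behaviors[user][item] - mean(behaviors[user].values())
            numerator += sim * deviation
            denominator += abs(sim)
    if denominator == 0.0:
        # No neighbor has acted on the product: fall back to the user's average.
        return base
    return base + numerator / denominator
```

For a querying user with scores 5, 3 and 1 (average 3) and a single neighbor with similarity 1.0 whose scores are 4, 3, 2 and 5 (average 3.5), the predicted score for the neighbor's fourth product is 3 + (5 − 3.5) = 4.5. An item-based second score could follow the same pattern with the roles of users and products exchanged.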

7. The method of claim 1, wherein determining the target product recommendation result according to the first recommendation result and the second recommendation result comprises:

determining union products as the union of the products corresponding to the first recommendation result and the products corresponding to the second recommendation result;

weighting the weighted similarity value and the score of each union product by using a preset third coefficient to obtain a target recommendation value;

and sorting the union products in descending order of the target recommendation value to obtain the target product recommendation result.
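
The merging step of claim 7 may be sketched as follows. The sketch assumes that the weighted similarity values and the collaborative filtering scores have already been scaled to a comparable range, that a product missing from one result contributes 0 for that result, and that the preset third coefficient `coeff` (a hypothetical name and default) is applied as a convex combination. None of these choices is stated in the claim.

```python
def merge_recommendations(content_scores, cf_scores, coeff=0.5):
    # content_scores: {product: weighted similarity value} (first result)
    # cf_scores: {product: collaborative filtering score} (second result)
    products = set(content_scores) | set(cf_scores)  # union products
    target = {
        product: coeff * content_scores.get(product, 0.0)
        + (1 - coeff) * cf_scores.get(product, 0.0)
        for product in products
    }
    # Sort by target recommendation value, largest first.
    return sorted(target.items(), key=lambda pair: pair[1], reverse=True)
```

With first-result values A = 0.9 and B = 0.4, second-result scores B = 0.8 and C = 0.5, and `coeff=0.5`, the target recommendation values are A = 0.45, B = 0.6 and C = 0.25, giving the order B, A, C.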

8. A product recommendation device based on content and collaborative filtering, comprising:

a first recommendation module, configured to extract an identification feature word set and a description feature word set of the recommended product text, calculate a weighted similarity value of the recommended product text and the product query text according to the identification feature word set and the description feature word set, and form a first recommendation result in descending order of the weighted similarity value;

9. A storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the content and collaborative filtering based product recommendation method of any of claims 1 to 7.

10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the content and collaborative filtering based product recommendation method according to any one of claims 1 to 7 when executing the program.