WO2023206960A1 - Product recommendation method, apparatus, and computer device based on content and collaborative filtering - Google Patents

Product recommendation method, apparatus, and computer device based on content and collaborative filtering

Info

Publication number
WO2023206960A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
preset
text
query
feature word
Prior art date
Application number
PCT/CN2022/122200
Other languages
English (en)
French (fr)
Inventor
徐滨
Original Assignee
康键信息技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 康键信息技术(深圳)有限公司
Publication of WO2023206960A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • This application relates to the field of Internet technology, and in particular to a product recommendation method, device and equipment based on content and collaborative filtering.
  • The medical product library contains a large amount of complex data, and different users have different preferences, so it is difficult to accurately find products that meet a user's needs in such a large amount of data.
  • A fuzzy search is performed with keywords, and the fuzzy search results are recommended to users in descending order of historical visit count.
  • However, the products returned by a keyword-based fuzzy search are not very accurate.
  • In addition, recommendations ordered by visit count cannot reflect the user's personalized preferences.
  • In view of this, this application provides a product recommendation method, device and equipment based on content and collaborative filtering, which relates to the field of Internet technology and can solve two problems: low accuracy when users search for target products among a large number of products, and the inability to satisfy users' personalized preferences.
  • a product recommendation method based on content and collaborative filtering which method includes:
  • Extract the identification feature word set and the descriptive feature word set of the recommended product text, calculate the weighted similarity value between the recommended product text and the product query text based on the two sets, and form the first recommendation result in descending order of weighted similarity value;
  • the target product recommendation result is determined according to the first recommendation result and the second recommendation result.
  • a product recommendation device based on content and collaborative filtering which device includes:
  • the screening module is used to obtain the product query text for the target product sent by the query user, use the preset word segmentation technology and the TF-IDF algorithm to calculate the first similarity between the product query text and each preset product text, and determine a preset product text whose first similarity is greater than the preset similarity threshold as a recommended product text;
  • the first recommendation module is used to extract the identification feature word set and the descriptive feature word set of the recommended product text, calculate the weighted similarity value between the recommended product text and the product query text based on the two sets, and form the first recommendation result in descending order of weighted similarity value;
  • the second recommendation module is used to determine the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with the neighbor users' historical behavior set for products, calculate the query user's score for the historical behavior set based on the collaborative filtering algorithm, and determine the second recommendation result based on the score;
  • a determining module configured to determine a target product recommendation result according to the first recommendation result and the second recommendation result.
  • a non-volatile readable storage medium on which a computer program is stored; when the program is executed by a processor, the above-mentioned product recommendation method based on content and collaborative filtering is implemented.
  • a computer device including a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and capable of running on the processor; when the processor executes the program, the above-mentioned product recommendation method based on content and collaborative filtering is implemented.
  • this application discloses a product recommendation method, device and equipment based on content and collaborative filtering.
  • This application first obtains the product query text for the target product sent by the query user, uses the preset word segmentation technology and the TF-IDF algorithm to calculate the first similarity between the product query text and each preset product text, and determines each preset product text whose first similarity is greater than the preset similarity threshold as a recommended product text.
  • It then extracts the identification feature word set and the descriptive feature word set of the recommended product text, calculates the weighted similarity value between the recommended product text and the product query text based on the two sets, and forms the first recommendation result in descending order of weighted similarity value.
  • In addition, it determines the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with their historical behavior set for products, calculates the query user's score for the historical behavior set based on the collaborative filtering algorithm, and determines the second recommendation result based on the score.
  • Finally, the target product recommendation result is determined according to the first recommendation result and the second recommendation result.
  • In this technical solution, the first recommendation result is obtained from the product query text, and the second recommendation result is obtained from neighbor users whose behavior is highly correlated with the query user's. Because the two results jointly determine the target product recommendation result, the query user receives recommendations along multiple dimensions; the recommendations are highly accurate and meet the query user's personalized needs.
  • Figure 1 shows a schematic flow chart of a product recommendation method based on content and collaborative filtering provided by an embodiment of the present application
  • Figure 2 shows a schematic flow chart of another product recommendation method based on content and collaborative filtering provided by an embodiment of the present application
  • Figure 3 shows a schematic structural diagram of a product recommendation device based on content and collaborative filtering provided by an embodiment of the present application
  • Figure 4 shows a schematic structural diagram of another product recommendation device based on content and collaborative filtering provided by an embodiment of the present application.
  • embodiments of this application provide a product recommendation method based on content and collaborative filtering, as shown in Figure 1.
  • the method includes:
  • Obtain the product query text for the target product sent by the query user, use the preset word segmentation technology and the TF-IDF algorithm to calculate the first similarity between the product query text and each preset product text, and determine each preset product text whose first similarity is greater than the preset similarity threshold as a recommended product text.
  • The target product is the product that the query user needs, and the product query text is a text about the target product, such as a product manual.
  • The preset product texts are stored in the product database and are matched against the product query text to find those whose first similarity is greater than the preset similarity threshold; a preset product text can likewise be a product manual or similar.
  • The preset word segmentation technology can be any existing word segmentation technology, such as a CRF word segmenter or the IKAnalyzer word segmenter.
  • the product query text is segmented using preset word segmentation technology to obtain a product query feature word set including at least one product query feature word.
  • the TF-IDF algorithm is used to calculate the product query feature vector corresponding to the product query feature word set.
  • the preset word segmentation technology performs word segmentation processing on the preset product text to obtain a preset product feature word set including at least one preset product feature word, and calculates the preset product feature vector corresponding to the preset product feature word set through the TF-IDF algorithm.
  • TF-IDF is a commonly used information weighting technology and is widely used in the fields of information retrieval and data mining.
  • The TF-IDF value can be used to evaluate whether a feature word is a keyword of a text: the greater the TF-IDF value, the more important the feature word is to the text. A high word frequency alone does not make a feature word a keyword of the text, so the TF-IDF value is the product of the word frequency TF of the feature word in the text and its inverse document frequency IDF.
  • The product query feature vector calculated through the TF-IDF algorithm includes the TF-IDF value of each product query feature word, and the preset product feature vector calculated through the TF-IDF algorithm includes the TF-IDF value of each preset product feature word.
  • a preset similarity calculation formula is used to calculate the first similarity between the product query feature vector and the preset product feature vector.
  • the preset similarity calculation formula may include a cosine calculation formula.
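  • the cosine calculation formula is described as: $\cos\theta = \dfrac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$
  • where $\vec{a}$ is the product query feature vector, $\vec{b}$ is the preset product feature vector, and $\cos\theta$ is the first similarity.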
  • By calculating the first similarity between the product query text and each preset product text, the preset product texts whose first similarity is greater than the preset similarity threshold are filtered out and determined as recommended product texts. This preliminary screening of a large number of preset product texts means that only the screened recommended product texts are matched further against the product query text, which improves the efficiency and accuracy of product recommendation.
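  • The screening step can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the TF-IDF feature vectors are already available as `{word: weight}` dictionaries, and the names `cosine_similarity`, `filter_recommended_texts`, the sample weights and the threshold of 0.5 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: an empty vector matches nothing
    return dot / (norm_a * norm_b)


def filter_recommended_texts(query_vec, preset_vecs, threshold):
    """Keep preset product texts whose first similarity exceeds the threshold."""
    kept = {}
    for name, vec in preset_vecs.items():
        similarity = cosine_similarity(query_vec, vec)
        if similarity > threshold:
            kept[name] = similarity
    return kept


# Hypothetical TF-IDF weights, for illustration only.
query_vec = {"oral": 0.6, "ulcer": 0.8}
preset_vecs = {
    "product_a": {"oral": 0.6, "ulcer": 0.8},
    "product_b": {"ulcer": 1.0},
    "product_c": {"cough": 1.0},
}
recommended = filter_recommended_texts(query_vec, preset_vecs, threshold=0.5)
```

  • With these sample weights, `product_c` shares no feature word with the query and is screened out, while the other two texts pass on to the weighted-similarity step.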
  • Extract the identification feature word set and the descriptive feature word set of the recommended product text, calculate the weighted similarity value between the recommended product text and the product query text based on the two sets, and form the first recommendation result in descending order of weighted similarity value.
  • The recommended product feature words obtained by segmenting the recommended product text are grouped according to whether they are identifying words or descriptive words. The feature words that are identifying words constitute the identification feature word set, and the feature words that are descriptive words constitute the descriptive feature word set.
  • For example, if the first feature word of recommended product text 1 is a descriptive word, it is classified into the descriptive feature word set; if the second feature word is an identifying word, it is classified into the identification feature word set; and so on. Recommended product text 1 thus has the identification feature word set {second feature word, third feature word, fifth feature word...} and the descriptive feature word set {first feature word, fourth feature word, sixth feature word...}. Identifying words are words such as "flu" and "virus"; descriptive words are words such as "of", "in" and "are".
  • The benefit of this classification is that giving a smaller proportion to the descriptive words of each recommended product text and a larger proportion to its identifying words improves the accuracy of the first recommendation result.
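  • The grouping of feature words can be sketched as below. The text does not say how a word is recognised as descriptive, so this sketch assumes a small hand-written lexicon; `DESCRIPTIVE_WORDS` and `split_feature_words` are hypothetical names, and a real system might use part-of-speech tags instead.

```python
# Assumed stand-in for a descriptive-word lexicon; the source does not specify one.
DESCRIPTIVE_WORDS = {"of", "in", "are", "the", "a", "is"}


def split_feature_words(feature_words, descriptive_words=DESCRIPTIVE_WORDS):
    """Split segmented feature words into (identifying, descriptive) lists."""
    identifying, descriptive = [], []
    for word in feature_words:
        if word.lower() in descriptive_words:
            descriptive.append(word)
        else:
            identifying.append(word)
    return identifying, descriptive


identifying, descriptive = split_feature_words(
    ["symptoms", "of", "flu", "virus", "are", "common"]
)
```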
  • The weighted similarity value of the recommended product text and the product query text is calculated based on the identification feature word set and the descriptive feature word set. That is, the descriptive feature word set is given a smaller proportion and the identification feature word set a larger proportion, and the similarity between each recommended product text and the product query text is calculated again, so that the recommended product texts can be sorted to obtain the first recommendation result.
  • The step of calculating the weighted similarity value may include: calculating a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the descriptive feature word set and the product query feature word set; calculating a third weight value from the first intersection and a fourth weight value from the second intersection; and weighting the third weight value and the fourth weight value with a preset coefficient to obtain the weighted similarity value between the recommended product text and the product query text.
  • the first recommendation result includes each recommended product text and the corresponding weighted similarity value.
  • the recommended product texts are sorted according to the weighted similarity value from large to small.
  • The top-ranked entry of the first recommendation result is the recommended product text with the largest weighted similarity value.
  • the collaborative filtering algorithm discovers the preferences of the query user based on the mining of historical behavioral data of the query user, and predicts the products that the query user needs.
  • The main implementation includes: finding neighbor users who have needs in common with the query user, and making recommendations based on the historical behavior data of these neighbor users.
  • the collaborative filtering algorithm can still help query users make recommendations when there is no product query text or the product query text is not accurate enough.
  • Neighbor users who have needs in common with the query user are embodied as neighbor users whose behavior correlation with the query user is higher than the first preset threshold.
  • The correlation coefficient between the query user and each other user is calculated according to a preset correlation coefficient calculation formula, such as the cosine calculation formula.
  • The calculated correlation coefficient is compared with the first preset threshold, and other users whose correlation coefficients are higher than the first preset threshold are regarded as neighbor users.
  • The historical behavior set of neighbor users for products includes the behavior set to be predicted and the adjacent behavior sets of the behavior set to be predicted. The behavior set to be predicted consists of behaviors that exist in the neighbor users' behavior set but not in the query user's behavior set.
  • Determining the adjacent behavior sets of the behavior set to be predicted specifically includes: calculating the second similarity between the behavior set to be predicted and other behavior sets according to the k-nearest neighbor algorithm or the k-means algorithm, and determining the other behavior sets whose second similarity is greater than the second preset threshold as adjacent behavior sets.
  • Calculating the query user's score for the historical behavior set based on the collaborative filtering algorithm specifically includes: calculating the query user's first score for the behavior set to be predicted with the user-based collaborative filtering algorithm, and calculating the query user's second score for the adjacent behavior set with the item-based collaborative filtering algorithm. Finally, the first score and the second score are weighted to obtain the query user's score for the historical behavior set, and the second recommendation result is determined based on the score.
  • the second recommendation result includes the ranking of ratings and products corresponding to the ratings.
  • the first ranked second recommendation result is the highest rating and the product corresponding to the highest rating.
  • the first recommendation result and the second recommendation result can be weighted and calculated to obtain the target product recommendation result.
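  • One way to weight the two results is sketched below. The text only says that the results "can be weighted", so everything else here is an assumption: the scores are first scaled by their maximum so that similarity values and ratings are comparable, a product missing from one result contributes 0 for it, and `content_weight` is a hypothetical parameter. The first result reuses the example values given later in the description; the second result is invented for illustration.

```python
def normalise(scores):
    """Scale scores into [0, 1] by the largest value (an assumed choice)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {product: 0.0 for product in scores}
    return {product: value / top for product, value in scores.items()}


def fuse_recommendations(first, second, content_weight=0.5):
    """Rank products by a weighted sum of the content and collaborative scores."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must lie in [0, 1]")
    first_n, second_n = normalise(first), normalise(second)
    fused = {}
    for product in set(first_n) | set(second_n):
        fused[product] = (
            content_weight * first_n.get(product, 0.0)
            + (1.0 - content_weight) * second_n.get(product, 0.0)
        )
    # Highest fused score first; ties are broken alphabetically.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


first = {"triamcinolone ointment": 0.5, "dumifene lozenges": 0.3, "lidocaine gel": 0.15}
second = {"dumifene lozenges": 4.0, "vitamin b2": 2.0}  # hypothetical ratings
ranking = fuse_recommendations(first, second, content_weight=0.5)
```

  • In this sample, a product that appears in both results can overtake the product with the best text match alone, which is the intended effect of combining the two dimensions.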
  • the first recommendation result calculated based on the preset word segmentation technology and TF-IDF algorithm is based on text content
  • the second recommendation result calculated based on the collaborative filtering algorithm is based on user behavior.
  • This application discloses a product recommendation method, device and equipment based on content and collaborative filtering.
  • First, the product query text for the target product sent by the query user is obtained, the preset word segmentation technology and the TF-IDF algorithm are used to calculate the first similarity between the product query text and each preset product text, and each preset product text whose first similarity is greater than the preset similarity threshold is determined as a recommended product text. Next, the identification feature word set and the descriptive feature word set of the recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from the two sets, and the first recommendation result is formed in descending order of weighted similarity value. In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with their historical behavior set for products, are determined, the query user's score for the historical behavior set is calculated based on the collaborative filtering algorithm, and the second recommendation result is determined based on the score. Finally, the target product recommendation result is determined according to the first recommendation result and the second recommendation result.
  • In this technical solution, the first recommendation result is obtained from the product query text, and the second recommendation result is obtained from neighbor users whose behavior is highly correlated with the query user's. Because the two results jointly determine the target product recommendation result, the query user receives recommendations along multiple dimensions; the recommendations are highly accurate and meet the query user's personalized needs.
  • The product query text may include product instructions, medical diagnosis certificates, and the like.
  • The preset word segmentation technology is used to segment the product query text, yielding the product query feature word set {product query feature word 1, product query feature word 2, product query feature word 3...}.
  • For example, a medical diagnosis certificate includes a detailed description of the condition, from which the product query feature word set {oral ulcer, recurring, facial symmetry, recurrent aphtha...} is obtained.
  • Likewise, each preset product text is segmented to obtain the corresponding preset product feature word set {preset product feature word 1, preset product feature word 2, preset product feature word 3...}, for example preset product feature word set 1 {recurrent oral ulcer, herpetic oral ulcer, suppressed immune function...}.
  • Calculating the feature vectors may include: calculating the first weight value of each product query feature word in the product query feature word set with respect to the product query text, and the second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; then constructing a product query feature vector that includes each product query feature word and its first weight value, and a preset product feature vector that includes each preset product feature word and its second weight value.
  • the TF-IDF algorithm includes the calculation of word frequency tf and the calculation of inverse document frequency idf. Furthermore, the weight value of the feature word to the text is obtained by multiplying the word frequency tf and the inverse document frequency idf.
  • The word frequency of a product query feature word is the number of times it appears in the product query feature word set. Because product query texts differ in length, each word frequency is normalized by dividing by $\sum_k n_{k,j}$.
  • The word frequency calculation formula is described as: $tf_{i,j} = \dfrac{n_{i,j}}{\sum_k n_{k,j}}$
  • where $i$ represents the product query feature word, $j$ represents the product query feature word set, $tf_{i,j}$ represents the word frequency of $i$ in the set $j$, $n_{i,j}$ represents the number of times $i$ appears in the set $j$, and $\sum_k n_{k,j}$ represents the total number of occurrences of all words in the set $j$.
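  • The inverse document frequency calculation formula is described as: $idf_i = \log \dfrac{|D|}{|\{d \in D : i \in d\}|}$
  • where $i$ represents the product query feature word, $j$ represents the product query feature word set, $idf_i$ represents the inverse document frequency of $i$, $|D|$ represents the total number of product texts in the database, and $|\{d \in D : i \in d\}|$ represents the number of product texts that contain $i$.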
  • $tfidf_{i,j} = tf_{i,j} \times idf_i$
  • where $tfidf_{i,j}$ is the first weight value of product query feature word $i$.
  • the specific implementation process of using the TF-IDF algorithm to calculate the second weight value of each preset product feature word may refer to the process of using the TF-IDF algorithm to calculate the first weight value of each product query feature word.
  • Similarly, a preset product feature vector is constructed: {(preset product feature word 1, second weight value of preset product feature word 1), (preset product feature word 2, second weight value of preset product feature word 2), (preset product feature word 3, second weight value of preset product feature word 3)...}.
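  • The weight calculation can be sketched as follows, assuming the texts have already been segmented into word lists. The function name `tf_idf_vector` and the sample corpus are hypothetical, the logarithm base is not stated in the text (natural log is assumed), and giving weight 0 to a word found in no product text is an assumed convention to avoid dividing by zero.

```python
import math
from collections import Counter


def tf_idf_vector(words, corpus):
    """Return {word: tf * idf} for one segmented text.

    words  -- feature words of the text being weighted
    corpus -- segmented product texts in the database (each a list of words)
    """
    counts = Counter(words)
    total = sum(counts.values())
    vector = {}
    for word, n in counts.items():
        tf = n / total  # occurrences normalised by text length
        containing = sum(1 for doc in corpus if word in doc)
        if containing == 0:
            vector[word] = 0.0  # assumed: a word unseen in the database gets weight 0
            continue
        idf = math.log(len(corpus) / containing)
        vector[word] = tf * idf
    return vector


# Hypothetical segmented product texts.
corpus = [
    ["recurrent", "oral", "ulcer"],
    ["herpetic", "oral", "ulcer"],
    ["cough", "syrup"],
    ["sore", "throat", "lozenge"],
]
query_vector = tf_idf_vector(["oral", "ulcer", "recurrent", "ulcer"], corpus)
```

  • Here "ulcer" occurs twice but appears in half of the product texts, while "recurrent" occurs once and appears in only one text, so the two end up with the same weight; a high word frequency alone does not make a word a keyword.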
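  • the preset similarity calculation formula may include a cosine calculation formula, and the cosine calculation formula is described as: $\cos\theta = \dfrac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$
  • where $\vec{a}$ is the product query feature vector, $\vec{b}$ is the preset product feature vector, and $\cos\theta$ is the first similarity.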
  • By calculating the first similarity between the product query text and each preset product text, the preset product texts whose first similarity is greater than the preset similarity threshold are filtered out and determined as recommended product texts. This preliminary screening of a large number of preset product texts means that only the screened recommended product texts are matched further against the product query text, which improves the efficiency and accuracy of product recommendation.
  • Extract the identification feature word set and the descriptive feature word set of the recommended product text, and calculate the first intersection of the identification feature word set and the product query feature word set, and the second intersection of the descriptive feature word set and the product query feature word set.
  • The recommended product feature words obtained after word segmentation processing of the recommended product text can be grouped according to whether they are identifying words or descriptive words, so that the recommended product feature word set is divided into an identification feature word set and a descriptive feature word set.
  • Calculate the first intersection of the identification feature word set and the product query feature word set, taking recommended product text 1 as an example. If the second feature word in the identification feature word set does not appear in the product query feature word set, the first intersection excludes the second feature word; if the third feature word in the identification feature word set also appears in the product query feature word set, the first intersection includes the third feature word. The first intersection is therefore {third feature word...}. The second intersection of the descriptive feature word set and the product query feature word set is calculated in the same way.
  • The third weight value is calculated as: $tfidf_w = \sum_{t \in n} tf_{t,w} \times idf_{t,w}$
  • where $tfidf_w$ represents the third weight value of the identification feature word set $w$ relative to the product query feature word set $j$, $n$ represents the first intersection of the identification feature word set $w$ and the product query feature word set $j$, $tf_{t,w}$ represents the TF value of the identification feature word $t$ in $w$, and $idf_{t,w}$ represents the IDF value of the identification feature word $t$ in $w$.
  • The fourth weight value is calculated as: $tfidf_v = \sum_{t \in m} tf_{t,v} \times idf_{t,v}$
  • where $tfidf_v$ represents the fourth weight value of the descriptive feature word set $v$ relative to the product query feature word set $j$, $m$ represents the second intersection of the descriptive feature word set $v$ and the product query feature word set $j$, $tf_{t,v}$ represents the TF value of the descriptive feature word $t$ in $v$, and $idf_{t,v}$ represents the IDF value of the descriptive feature word $t$ in $v$.
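  • The weighted similarity value is calculated as: $C = \alpha \cdot tfidf_w + (1 - \alpha) \cdot tfidf_v$
  • where $\alpha$ is the preset coefficient, $C$ is the weighted similarity value, $tfidf_w$ represents the third weight value, and $tfidf_v$ represents the fourth weight value.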
  • The first recommendation result includes the weighted similarity values and the recommended product text corresponding to each value; its top-ranked entry is the largest weighted similarity value together with the corresponding recommended product text.
  • For example, the first recommendation result includes: {(triamcinolone ointment, 0.5), (dumifene lozenges, 0.3), (lidocaine gel, 0.15)...}.
  • The purpose of using the preset coefficient to weight the third weight value and the fourth weight value is to assign a smaller coefficient, $(1 - \alpha)$, to the descriptive words of each recommended product text and a larger coefficient, $\alpha$, to its identifying words. This reduces the interference of descriptive words with the first recommendation result and thereby improves its accuracy.
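  • The weighted similarity calculation can be sketched as follows. The sketch assumes that the third and fourth weight values are sums of TF × IDF over the words in the first and second intersections, and that the preset coefficient lies in (0.5, 1] so that identifying words always weigh more. The function names, the default coefficient of 0.8 and the sample TF and IDF values are hypothetical.

```python
def set_weight(feature_set, query_words, tf, idf):
    """Sum tf * idf over the words shared by a feature word set and the query."""
    shared = set(feature_set) & set(query_words)
    return sum(tf[word] * idf[word] for word in shared)


def weighted_similarity(identifying, descriptive, query_words, tf, idf, alpha=0.8):
    """Combine the two set weights with the preset coefficient alpha."""
    if not 0.5 < alpha <= 1.0:
        raise ValueError("alpha must favour identifying words: 0.5 < alpha <= 1")
    third = set_weight(identifying, query_words, tf, idf)   # identification set
    fourth = set_weight(descriptive, query_words, tf, idf)  # descriptive set
    return alpha * third + (1.0 - alpha) * fourth


# Hypothetical TF and IDF values for one recommended product text.
tf = {"flu": 0.2, "virus": 0.1, "of": 0.3, "in": 0.1}
idf = {"flu": 2.0, "virus": 3.0, "of": 0.1, "in": 0.2}
score = weighted_similarity(
    identifying=["flu", "virus"],
    descriptive=["of", "in"],
    query_words=["flu", "of", "fever"],
    tf=tf,
    idf=idf,
    alpha=0.8,
)
```

  • Only "flu" and "of" are shared with the query, and the descriptive word "of" contributes far less to the score than the identifying word "flu".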
  • Recommendations are made based on the user's product query text. First, this is independent of the query user's personal data, so there is no cold-start or new-user problem. Second, every preset product text can be recommended regardless of when or in what order it was stored, so there is no new-item problem. Finally, compared with running a fuzzy keyword query directly against the database, this method searches by text that is closer to the query user's needs, so the first recommendation result is more accurate.
  • the historical behavior set includes the behavior set to be predicted whose behavior set of neighbor users is different from the behavior set of the query user, and the adjacent behavior set adjacent to the behavior set to be predicted.
  • The step of determining neighbor users whose behavior correlation with the query user is higher than the first preset threshold includes: using a preset correlation coefficient calculation formula to calculate the correlation coefficient between the query user and other users, and determining other users whose correlation coefficient is greater than the first preset threshold as neighbor users.
  • The preset correlation coefficient calculation formula can be expressed as: $S(u,b) = \dfrac{\sum_{p \in I_u \cap I_b} (r_{u,p} - \bar{r}_u)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in I_u \cap I_b} (r_{u,p} - \bar{r}_u)^2} \, \sqrt{\sum_{p \in I_u \cap I_b} (r_{b,p} - \bar{r}_b)^2}}$
  • where $S(u,b)$ represents the correlation coefficient between query user $u$ and other user $b$, $I_u \cap I_b$ represents the set of products called by both $u$ and $b$, $r_{u,p}$ and $r_{b,p}$ represent the number of historical calls by $u$ and $b$ respectively to the jointly called product $p$, and $\bar{r}_u$ and $\bar{r}_b$ represent the average number of historical calls by $u$ and $b$ respectively to the products in the set $I_u \cap I_b$.
  • The value range of $S(u,b)$ is generally [-1, +1].
  • Other users $b$ whose $S(u,b)$ is greater than the first preset threshold are determined as neighbor users $h$.
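  • Neighbor selection can be sketched as follows, assuming the correlation coefficient is the Pearson correlation over jointly called products and that each user's history is stored as a `{product: call count}` dictionary. The function names and sample data are hypothetical; returning 0 when fewer than two products are shared, or when one user's counts do not vary, is an assumed convention.

```python
import math


def pearson_correlation(calls_u, calls_b):
    """Pearson correlation of two users' call counts over jointly called products."""
    common = set(calls_u) & set(calls_b)
    if len(common) < 2:
        return 0.0  # assumed: too little overlap to measure correlation
    mean_u = sum(calls_u[p] for p in common) / len(common)
    mean_b = sum(calls_b[p] for p in common) / len(common)
    numerator = sum((calls_u[p] - mean_u) * (calls_b[p] - mean_b) for p in common)
    spread_u = math.sqrt(sum((calls_u[p] - mean_u) ** 2 for p in common))
    spread_b = math.sqrt(sum((calls_b[p] - mean_b) ** 2 for p in common))
    if spread_u == 0.0 or spread_b == 0.0:
        return 0.0  # assumed: constant call counts carry no correlation signal
    return numerator / (spread_u * spread_b)


def find_neighbors(query_user, all_calls, threshold):
    """Return {user: coefficient} for users correlated above the threshold."""
    neighbors = {}
    for user, calls in all_calls.items():
        if user == query_user:
            continue
        coefficient = pearson_correlation(all_calls[query_user], calls)
        if coefficient > threshold:
            neighbors[user] = coefficient
    return neighbors


# Hypothetical call histories.
all_calls = {
    "u": {"a": 1, "b": 2, "c": 3},
    "b1": {"a": 2, "b": 4, "c": 6, "d": 1},
    "b2": {"a": 3, "b": 2, "c": 1},
    "b3": {"d": 5},
}
neighbors = find_neighbors("u", all_calls, threshold=0.5)
```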
  • The step of determining the behavior set to be predicted, in which the neighbor users' behavior set differs from the query user's, includes: determining the difference between the neighbor users' behavior set and the query user's behavior set as the behavior set to be predicted. That is, the behaviors in the set to be predicted exist in the neighbor users' behavior set but not in the query user's behavior set.
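  • The set difference can be sketched in a few lines. The sketch assumes each behavior is identified by the product it concerns; the function name and sample products are hypothetical.

```python
def behaviors_to_predict(neighbor_behaviors, query_behaviors):
    """Behaviors recorded for the neighbor users but not for the query user."""
    return set(neighbor_behaviors) - set(query_behaviors)


# Hypothetical behavior sets, keyed by product name.
neighbor_behaviors = {"triamcinolone ointment", "dumifene lozenges", "lidocaine gel"}
query_behaviors = {"dumifene lozenges"}
to_predict = behaviors_to_predict(neighbor_behaviors, query_behaviors)
```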
  • steps for determining adjacent behavior sets adjacent to the behavior set to be predicted include: calculating the second similarity between the behavior set to be predicted and other behavior sets according to the k-nearest neighbor algorithm or k-means algorithm, and setting the second similarity to be greater than Other behavior sets with the second preset threshold are determined as adjacent behavior sets.
  • the k-nearest neighbor algorithm or k-means algorithm can be found in the prior art, and will not be described again here.
  • A user-based collaborative filtering algorithm calculates the query user's first score for the behavior set to be predicted.
  • The main idea of the user-based collaborative filtering algorithm is to find neighbor users whose behavior correlation with the query user is higher than the first preset threshold. Because the historical behaviors of the query user and the neighbor users are similar, a behavior that the neighbor users have exhibited but the query user has not is a behavior the query user may exhibit. In this embodiment, such behaviors make up the behavior set to be predicted.
  • The first score formula is given in the source as an image that is not reproduced in this text; its symbols are defined as follows.
  • $p(u,i)$ is query user $u$'s first score for product $i$ in the behavior set to be predicted, $\bar{r}_u$ is the average number of product calls made by query user $u$, $s(u,h)$ is the correlation coefficient between $u$ and $h$, $p_{h,i}$ is neighbor user $h$'s score for $i$, and $n$ is the number of neighbor users.
  • An item-based collaborative filtering algorithm calculates the query user's second score for the adjacent behavior sets, and the second recommendation result is determined from the first score and/or the second score.
  • The adjacent behavior sets are the behavior sets adjacent to the behavior set to be predicted.
  • The second score formula is given in the source as an image that is not reproduced in this text; its symbols are defined as follows.
  • $p(u,i)$ is query user $u$'s second score for product $i$ in the behavior set to be predicted, $\bar{r}_i$ is the average number of calls to $i$ by the neighbor users $h$, $i_k$ is an adjacent behavior set, $\bar{r}_{i_k}$ is the average number of calls by the neighbor users $h$ to the adjacent behavior set $i_k$, $s(i,i_k)$ is the second similarity between $i$ and $i_k$, and $r_{u,i_k}$ is the number of times query user $u$ has called the adjacent behavior set $i_k$.
  • As one implementation, the second recommendation result may be based on the first score only or on the second score only.
  • The first score and the second score may also be combined by weighting, which joins the two collaborative filtering algorithms and improves the accuracy of the second recommendation result.
  • The weighted score is $P=\mu\,p_1(u,i)+(1-\mu)\,p_2(u,i)$, where $p_1(u,i)$ is the first score, $p_2(u,i)$ is the second score, $\mu$ is the weighting coefficient, and $P$ is the score.
  • The products corresponding to the adjacent behavior sets are sorted by score in descending order to obtain the second recommendation result.
  • For example, the second recommendation result is: product 1, with a score of 16; product 7, with a score of 10; product 5, with a score of 3; …
  • Before the products corresponding to the adjacent behavior sets are sorted by score in descending order, products with a score less than or equal to 0 are deleted. Because the query user will not search for products with a score below 0, only products with a score greater than 0 are kept.
  • Before the first recommendation result obtained in step 206 and the second recommendation result obtained in step 209 are weighted, normalization is used to compress the scores into the range (0, 1).
  • For example, the second recommendation result before normalization is: product 1, with a score of 16; product 7, with a score of 10; product 5, with a score of 3; …
  • The second recommendation result after normalization is: product 1, with a score of 0.53; product 7, with a score of 0.33; product 5, with a score of 0.1; …
  • The first recommendation result yields each product with its weighted similarity value, and the second recommendation result yields each product with its score.
  • For example, if the weighted similarity value of product 1 in the first recommendation result is 0.5 and its score in the second recommendation result is 0.53, the target recommendation value is 0.5 × (preset third coefficient) + 0.53 × (1 − preset third coefficient).
  • If the weighted similarity value of product 2 in the first recommendation result is 0.1 but product 2 does not appear in the second recommendation result, the target recommendation value is 0.1 × (preset third coefficient) + 0 × (1 − preset third coefficient).
  • This application discloses a product recommendation method, device, and equipment based on content and collaborative filtering.
  • First, the product query text for the target product sent by the query user is obtained, the first similarity between the product query text and the preset product texts is calculated using the preset word segmentation technique and the TF-IDF algorithm, and the preset product texts whose first similarity is greater than the preset similarity threshold are determined as recommended product texts.
  • Next, the identification feature word set and the descriptive feature word set of each recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from these two sets, and the first recommendation result is formed in descending order of weighted similarity value.
  • In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold are determined, together with the neighbor users' historical behavior set for products; the query user's scores for the historical behavior set are calculated based on a collaborative filtering algorithm, and the second recommendation result is determined from the scores.
  • Finally, the target product recommendation result is determined from the first recommendation result and the second recommendation result.
  • With this technical solution, the first recommendation result for the target product is derived from the product query text, the second recommendation result is derived from neighbor users whose behavior is highly correlated with the query user's, and the two results jointly determine the target product recommendation result. Recommending across multiple dimensions in this way yields high accuracy and meets the query user's personalized needs.
  • The device includes: a screening module 31, a first recommendation module 32, a second recommendation module 33, and a determination module 34.
  • The screening module 31 can be used to obtain the product query text for the target product sent by the query user, use the preset word segmentation technique and the TF-IDF algorithm to calculate the first similarity between the product query text and the preset product texts, and determine the preset product texts whose first similarity is greater than the preset similarity threshold as recommended product texts.
  • The first recommendation module 32 can be used to extract the identification feature word set and the descriptive feature word set of each recommended product text, calculate the weighted similarity value between the recommended product text and the product query text from these two sets, and form the first recommendation result in descending order of weighted similarity value.
  • The second recommendation module 33 can be used to determine the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with the neighbor users' historical behavior set for products, calculate the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determine the second recommendation result from the scores.
  • The determination module 34 can be used to determine the target product recommendation result from the first recommendation result and the second recommendation result.
  • The screening module 31 may specifically include: a word segmentation unit 311, a first calculation unit 312, and a second calculation unit 313.
  • The word segmentation unit 311 can be used to segment the product query text with the preset word segmentation technique to obtain the product query feature word set, and to segment each preset product text to obtain a preset product feature word set.
  • The first calculation unit 312 can be used to calculate, with the TF-IDF algorithm, the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to each preset product feature word set.
  • The second calculation unit 313 can be used to calculate the first similarity between the product query feature vector and a preset product feature vector using a preset similarity formula.
  • The first calculation unit 312 may specifically be used to calculate the first weight value of each product query feature word in the product query feature word set with respect to the product query text, and the second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; to construct a product query feature vector containing the product query feature words and their first weight values; and to construct a preset product feature vector containing the preset product feature words and their second weight values.
  • To calculate the weighted similarity value between the recommended product text and the product query text from the identification feature word set and the descriptive feature word set, the first recommendation module 32 may specifically include: an intersection unit 321, a weight unit 322, and a first weighting unit 323.
  • The intersection unit 321 can be used to calculate the first intersection of the identification feature word set and the product query feature word set, and the second intersection of the descriptive feature word set and the product query feature word set.
  • The weight unit 322 can be used to calculate, from the first intersection, the third weight value of the identification feature word set relative to the product query feature word set, and, from the second intersection, the fourth weight value of the descriptive feature word set relative to the product query feature word set.
  • The first weighting unit 323 can be used to weight the third weight value and the fourth weight value with a preset coefficient to obtain the weighted similarity value between the recommended product text and the product query text.
  • The historical behavior set includes the behavior set to be predicted, which is the part of the neighbor users' behavior set that differs from the query user's behavior set, and the adjacent behavior sets adjacent to the behavior set to be predicted.
  • The second recommendation module 33 may specifically include: a first screening unit 331, a first determination unit 332, and a second screening unit 333.
  • The first screening unit 331 can be used to calculate the correlation coefficient between the query user and each other user using a preset correlation coefficient formula, and to determine the other users whose correlation coefficient is greater than the first preset threshold as neighbor users.
  • The first determination unit 332 can be used to take the difference between the neighbor users' behavior set and the query user's behavior set as the behavior set to be predicted.
  • The second screening unit 333 can be used to calculate the second similarity between the behavior set to be predicted and other behavior sets according to the k-nearest-neighbor algorithm or the k-means algorithm, and to determine the other behavior sets whose second similarity is greater than the second preset threshold as adjacent behavior sets.
  • To calculate the query user's scores for the historical behavior set based on a collaborative filtering algorithm and determine the second recommendation result from the scores, the second recommendation module 33 may also include: a first scoring unit 334, a second scoring unit 335, and a second determination unit 336.
  • The first scoring unit 334 can be used to calculate, with a user-based collaborative filtering algorithm, the query user's first score for the behavior set to be predicted.
  • The second scoring unit 335 can be used to calculate, with an item-based collaborative filtering algorithm, the query user's second score for the adjacent behavior sets.
  • The second determination unit 336 can be used to determine the second recommendation result from the first score and/or the second score.
  • To determine the target product recommendation result from the first recommendation result and the second recommendation result, the determination module 34 may specifically include: a union unit 341, a second weighting unit 342, and a recommendation unit 343.
  • The union unit 341 can be used to calculate the union of the products in the first recommendation result and the products in the second recommendation result.
  • The second weighting unit 342 can be used to weight the weighted similarity value and the score of each product in the union with the preset third coefficient to obtain the target recommendation value.
  • The recommendation unit 343 can be used to sort the target recommendation values in descending order to obtain the target product recommendation result.
  • This embodiment also provides a readable storage medium.
  • The readable storage medium may be volatile or non-volatile and stores computer-readable instructions. When the readable instructions are executed by a processor, the product recommendation method based on content and collaborative filtering shown in Figures 1 to 2 is implemented.
  • The technical solution of this application can be embodied in the form of a software product.
  • The software product can be stored in a readable storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the method of each implementation scenario of this application.
  • This embodiment also provides a computer device that includes a non-volatile readable storage medium and a processor: the non-volatile readable storage medium stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the product recommendation method based on content and collaborative filtering described above.
  • The computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and so on.
  • The user interface may include a display and an input unit such as a keyboard.
  • Optionally, the user interface may also include a USB interface, a card reader interface, and so on.
  • Optionally, the network interface may include a standard wired interface, a wireless interface (such as a Wi-Fi interface), and so on.
  • The structure of the computer device described here does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
  • The non-volatile readable storage medium may also include an operating system and a network communication module.
  • The operating system is a program that manages the hardware and software resources of the computer device and supports the operation of information processing programs and other software and/or programs.
  • The network communication module is used for communication between components within the non-volatile readable storage medium, as well as communication with other hardware and software in the information processing physical device.
  • This application discloses a product recommendation method, device, and equipment based on content and collaborative filtering.
  • First, the product query text for the target product sent by the query user is obtained, the first similarity between the product query text and the preset product texts is calculated using the preset word segmentation technique and the TF-IDF algorithm, and the preset product texts whose first similarity is greater than the preset similarity threshold are determined as recommended product texts.
  • Next, the identification feature word set and the descriptive feature word set of each recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from these two sets, and the first recommendation result is formed in descending order of weighted similarity value.
  • In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold are determined, together with the neighbor users' historical behavior set for products; the query user's scores for the historical behavior set are calculated based on a collaborative filtering algorithm, and the second recommendation result is determined from the scores.
  • Finally, the target product recommendation result is determined from the first recommendation result and the second recommendation result.
  • With this technical solution, the first recommendation result for the target product is derived from the product query text, the second recommendation result is derived from neighbor users whose behavior is highly correlated with the query user's, and the two results jointly determine the target product recommendation result. Recommending across multiple dimensions in this way yields high accuracy and meets the query user's personalized needs.
  • The accompanying drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes shown in them are not necessarily required to implement this application.
  • The modules of a device in an implementation scenario may be distributed within that device as described for the scenario, or may be changed accordingly and located in one or more devices different from that scenario.
  • The modules of the above implementation scenarios can be combined into one module or further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

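This application discloses a product recommendation method, device, and equipment based on content and collaborative filtering. It relates to the field of Internet technology and addresses the low accuracy users encounter when searching for a target product among a large number of products, as well as the failure to satisfy users' personalized preferences. The method includes: using a preset word segmentation technique and the TF-IDF algorithm to calculate the first similarity between a product query text and preset product texts, and determining the preset product texts whose first similarity is greater than a preset similarity threshold as recommended product texts; calculating the weighted similarity value between each recommended product text and the product query text from the identification feature word set and the descriptive feature word set of the recommended product text, and forming a first recommendation result in descending order of weighted similarity value; calculating the query user's scores for a historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result from the scores; and determining the target product recommendation result from the first recommendation result and the second recommendation result.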

Description

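Product recommendation method, device, and computer device based on content and collaborative filtering
This application claims priority to Chinese patent application No. 202210435260.6, entitled "Product recommendation method, device, computer device, and computer storage medium based on content and collaborative filtering", filed with the China Patent Office on April 24, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of Internet technology, and in particular to a product recommendation method, device, and equipment based on content and collaborative filtering.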
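Background
As information technology has been applied in the medical industry, medical product libraries have come to contain large amounts of complex data, and different users have different preferences. It is therefore difficult to search such data precisely for products that meet a user's needs.
At present, a fuzzy search is performed by keyword, and the fuzzy search results are sorted in descending order of historical access count and recommended to the user. The inventor realized, however, that products found by keyword-based fuzzy queries are not very accurate, and that recommending in order of access count cannot satisfy users' personalized preferences.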
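Summary
In view of this, this application provides a product recommendation method, device, and equipment based on content and collaborative filtering. It relates to the field of Internet technology and addresses the low accuracy users encounter when searching for a target product among a large number of products, as well as the failure to satisfy users' personalized preferences.
According to one aspect of this application, a product recommendation method based on content and collaborative filtering is provided, the method including:
obtaining the product query text for the target product sent by the query user, calculating the first similarity between the product query text and the preset product texts using a preset word segmentation technique and the TF-IDF algorithm, and determining the preset product texts whose first similarity is greater than a preset similarity threshold as recommended product texts;
extracting the identification feature word set and the descriptive feature word set of each recommended product text, calculating the weighted similarity value between the recommended product text and the product query text from the identification feature word set and the descriptive feature word set, and forming a first recommendation result in descending order of weighted similarity value;
determining the neighbor users whose behavior correlation with the query user is higher than a first preset threshold, together with the neighbor users' historical behavior set for products, calculating the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result from the scores;
determining the target product recommendation result from the first recommendation result and the second recommendation result.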
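According to another aspect of this application, a product recommendation device based on content and collaborative filtering is provided, the device including:
a screening module, used to obtain the product query text for the target product sent by the query user, calculate the first similarity between the product query text and the preset product texts using a preset word segmentation technique and the TF-IDF algorithm, and determine the preset product texts whose first similarity is greater than a preset similarity threshold as recommended product texts;
a first recommendation module, used to obtain the identification feature word set and the descriptive feature word set of each recommended product text, calculate the weighted similarity value between the recommended product text and the product query text from the identification feature word set and the descriptive feature word set, and form a first recommendation result in descending order of weighted similarity value;
a second recommendation module, used to determine the neighbor users whose behavior correlation with the query user is higher than a first preset threshold, together with the neighbor users' historical behavior set for products, calculate the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determine a second recommendation result from the scores;
a determination module, used to determine the target product recommendation result from the first recommendation result and the second recommendation result.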
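According to yet another aspect of this application, a non-volatile readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the product recommendation method based on content and collaborative filtering described above is implemented.
According to a further aspect of this application, a computer device is provided, including a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor; when the processor executes the program, the product recommendation method based on content and collaborative filtering described above is implemented.
By means of the above technical solution, this application discloses a product recommendation method, device, and equipment based on content and collaborative filtering. First, the product query text for the target product sent by the query user is obtained, the first similarity between the product query text and the preset product texts is calculated using the preset word segmentation technique and the TF-IDF algorithm, and the preset product texts whose first similarity is greater than the preset similarity threshold are determined as recommended product texts. Next, the identification feature word set and the descriptive feature word set of each recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from these two sets, and the first recommendation result is formed in descending order of weighted similarity value. In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold are determined, together with the neighbor users' historical behavior set for products; the query user's scores for the historical behavior set are calculated based on a collaborative filtering algorithm, and the second recommendation result is determined from the scores. Finally, the target product recommendation result is determined from the first recommendation result and the second recommendation result. With this technical solution, the first recommendation result for the target product is derived from the product query text, the second recommendation result is derived from neighbor users whose behavior is highly correlated with the query user's, and the two results jointly determine the target product recommendation result. Recommending across multiple dimensions in this way yields high accuracy and meets the query user's personalized needs.
The above is only an overview of the technical solution of this application. So that the technical means of this application can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of this application are more apparent, specific embodiments of this application are set out below.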
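Brief Description of the Drawings
The drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions serve to explain this application and do not unduly limit it. In the drawings:
Figure 1 shows a schematic flowchart of a product recommendation method based on content and collaborative filtering provided by an embodiment of this application;
Figure 2 shows a schematic flowchart of another product recommendation method based on content and collaborative filtering provided by an embodiment of this application;
Figure 3 shows a schematic structural diagram of a product recommendation device based on content and collaborative filtering provided by an embodiment of this application;
Figure 4 shows a schematic structural diagram of another product recommendation device based on content and collaborative filtering provided by an embodiment of this application.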
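Detailed Description
This application is described in detail below with reference to the drawings and in combination with embodiments. Note that, provided there is no conflict, the embodiments of this application and the features in them may be combined with one another.
To address the current problems, an embodiment of this application provides a product recommendation method based on content and collaborative filtering. As shown in Figure 1, the method includes:
101. Obtain the product query text for the target product sent by the query user, calculate the first similarity between the product query text and the preset product texts using a preset word segmentation technique and the TF-IDF algorithm, and determine the preset product texts whose first similarity is greater than the preset similarity threshold as recommended product texts.
Here, the target product is the product the query user needs, and the product query text is, for example, a product manual for the target product. The preset product texts reside in the product database and are matched against the product query text to identify the preset product texts whose first similarity is greater than the preset similarity threshold. A preset product text may likewise be a product manual or the like.
In this embodiment, the preset word segmentation technique may be any existing one, such as the CRF segmenter or the IKAnalyzer segmenter. The product query text is segmented with the preset word segmentation technique to obtain a product query feature word set containing at least one product query feature word, and the TF-IDF algorithm is used to calculate the product query feature vector corresponding to that set. In the same way, each preset product text is segmented to obtain a preset product feature word set containing at least one preset product feature word, and the TF-IDF algorithm is used to calculate the corresponding preset product feature vector.
TF-IDF is a commonly used information weighting technique, widely applied in information retrieval and data mining. The TF-IDF value can be used to assess whether a feature word in a text is a keyword of that text: the larger the TF-IDF value, the more important the feature word is to the text, that is, the more likely it is a keyword. A high term frequency alone does not make a feature word a keyword of the text. The TF-IDF value is therefore the product of the term frequency (TF) of the feature word in the text and the inverse document frequency (IDF) of that feature word. For example, the most common feature words, such as "的", "是", and "在" (roughly "of", "is", and "in"), receive the smallest IDF, while rarer feature words such as "influenza" and "virus" receive a larger IDF. Accordingly, the product query feature vector calculated by the TF-IDF algorithm contains the TF-IDF value of each product query feature word, and the preset product feature vector contains the TF-IDF value of each preset product feature word.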
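Further, a preset similarity formula is used to calculate the first similarity between the product query feature vector and each preset product feature vector. The preset similarity formula may include the cosine formula, which is described as:
$$\cos(\vec{A},\vec{B})=\frac{\vec{A}\cdot\vec{B}}{\lVert\vec{A}\rVert\,\lVert\vec{B}\rVert}$$
In the formula, $\vec{A}$ is the product query feature vector, $\vec{B}$ is the preset product feature vector, and $\cos(\vec{A},\vec{B})$ is the first similarity.
After the first similarity between the product query feature vector and each preset product feature vector has been calculated, the preset product texts whose first similarity is greater than the preset similarity threshold are selected and determined as recommended product texts. This preliminary screening of the large number of preset product texts allows the selected recommended product texts to be matched further against the product query text, which improves the efficiency and accuracy of product recommendation.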
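102. Extract the identification feature word set and the descriptive feature word set of each recommended product text, calculate the weighted similarity value between the recommended product text and the product query text from the identification feature word set and the descriptive feature word set, and form the first recommendation result in descending order of weighted similarity value.
After a recommended product text is segmented, its feature words are grouped according to whether they are identifying words or descriptive words: the feature words that are identifying words form the identification feature word set, and those that are descriptive words form the descriptive feature word set. For example, if the first feature word of recommended product text 1 is a descriptive word, it is placed in the descriptive feature word set; if the second feature word is an identifying word, it is placed in the identification feature word set; and so on. Recommended product text 1 thus has the identification feature word set {second feature word, third feature word, fifth feature word, …} and the descriptive feature word set {first feature word, fourth feature word, sixth feature word, …}. Identifying words are, for example, "influenza" and "virus"; descriptive words are, for example, "的", "在", and "是" (roughly "of", "in", and "is"). The benefit of this grouping is that giving the descriptive words of each recommended product text a smaller share and the identifying words a larger share improves the accuracy of the first recommendation result.
In this embodiment, calculating the weighted similarity value from the identification feature word set and the descriptive feature word set means giving the descriptive feature word set a smaller share and the identification feature word set a larger share, and calculating once more a weighted similarity value between each recommended product text and the product query text, so that the recommended product texts can be ranked to produce the first recommendation result. The steps for calculating the weighted similarity value may include: calculating the first intersection of the identification feature word set and the product query feature word set, and the second intersection of the descriptive feature word set and the product query feature word set; calculating, from the first intersection, the third weight value of the identification feature word set relative to the product query feature word set, and, from the second intersection, the fourth weight value of the descriptive feature word set relative to the product query feature word set; and weighting the third weight value and the fourth weight value with a preset coefficient to obtain the weighted similarity value between the recommended product text and the product query text.
The first recommendation result contains each recommended product text and its weighted similarity value. The recommended product texts are sorted in descending order of weighted similarity value, so the first entry of the first recommendation result is the recommended product text with the largest weighted similarity value.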
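103. Determine the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with the neighbor users' historical behavior set for products, calculate the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determine the second recommendation result from the scores.
A collaborative filtering algorithm mines the historical behavior data of the query user to discover the query user's preferences and predict the products the query user needs. Its main implementation is to find neighbor users who share needs with the query user and to recommend based on those neighbor users' historical behavior data. A collaborative filtering algorithm can still produce recommendations for the query user when there is no product query text or when the product query text is not precise enough.
In this embodiment, the neighbor users who share needs with the query user are the users whose behavior correlation with the query user is higher than the first preset threshold. Specifically, the correlation coefficient between the query user and each other user is calculated with a preset correlation coefficient formula, such as the cosine formula; the calculated coefficient is compared with the first preset threshold, and the other users whose coefficient is higher than the threshold are taken as neighbor users.
The neighbor users' historical behavior set for products includes the behavior set to be predicted and the adjacent behavior sets of the behavior set to be predicted. The behavior set to be predicted consists of behaviors that are present in the neighbor users' behavior set but absent from the query user's behavior set. The adjacent behavior sets are determined as follows: the second similarity between the behavior set to be predicted and other behavior sets is calculated according to the k-nearest-neighbor algorithm or the k-means algorithm, and the other behavior sets whose second similarity is greater than the second preset threshold are determined as adjacent behavior sets. Finally, the query user's scores for the historical behavior set are calculated based on collaborative filtering: a user-based collaborative filtering algorithm calculates the query user's first score for the behavior set to be predicted, an item-based collaborative filtering algorithm calculates the query user's second score for the adjacent behavior sets, and the first and second scores are weighted to obtain the query user's score for the historical behavior set, from which the second recommendation result is determined.
The second recommendation result is a ranking of the scores and their corresponding products; its first entry is the largest score together with the corresponding product.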
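104. Determine the target product recommendation result from the first recommendation result and the second recommendation result.
In this embodiment, as a preferred implementation, the first recommendation result and the second recommendation result can be weighted to obtain the target product recommendation result. The first recommendation result, calculated with the preset word segmentation technique and the TF-IDF algorithm, is derived from text content; the second recommendation result, calculated with the collaborative filtering algorithm, is derived from user behavior. Combining the results from these two dimensions by weighting yields a target product recommendation result that is more accurate than a result obtained from a single dimension.
This application discloses a product recommendation method, device, and equipment based on content and collaborative filtering. First, the product query text for the target product sent by the query user is obtained, the first similarity between the product query text and the preset product texts is calculated using the preset word segmentation technique and the TF-IDF algorithm, and the preset product texts whose first similarity is greater than the preset similarity threshold are determined as recommended product texts. Next, the identification feature word set and the descriptive feature word set of each recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from these two sets, and the first recommendation result is formed in descending order of weighted similarity value. In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold are determined, together with the neighbor users' historical behavior set for products; the query user's scores for the historical behavior set are calculated based on a collaborative filtering algorithm, and the second recommendation result is determined from the scores. Finally, the target product recommendation result is determined from the first recommendation result and the second recommendation result. With this technical solution, the first recommendation result for the target product is derived from the product query text, the second recommendation result is derived from neighbor users whose behavior is highly correlated with the query user's, and the two results jointly determine the target product recommendation result. Recommending across multiple dimensions in this way yields high accuracy and meets the query user's personalized needs.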
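Further, as a refinement and extension of the above embodiment, and to describe the implementation process completely, another product recommendation method based on content and collaborative filtering is provided. As shown in Figure 2, the method includes:
201. Obtain the product query text for the target product sent by the query user, segment the product query text with the preset word segmentation technique to obtain the product query feature word set, and segment each preset product text to obtain a preset product feature word set.
In this embodiment, in a specific application scenario, the product query text may include a product manual, a medical diagnosis report, and the like. Segmenting the product query text with the preset word segmentation technique yields the product query feature word set {product query feature word 1, product query feature word 2, product query feature word 3, …}. For example, a medical diagnosis report contains a detailed text description of the condition; after segmentation, the product query feature word set is {oral ulcer, recurrent episodes, facial symmetry, recurrent aphthous ulcer, …}. Similarly, each preset product text is segmented to obtain the corresponding preset product feature word set {preset product feature word 1, preset product feature word 2, preset product feature word 3, …}; for example, preset product feature word set 1 is {recurrent oral ulcer, herpetic oral ulcer, suppression of immune function, …}.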
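The segmentation step can be illustrated with a deliberately simple stand-in. The sketch below is not the CRF or IKAnalyzer segmenter named in the text; it swaps in forward maximum matching against a small dictionary, a well-known baseline technique. The function name `forward_max_match`, the `max_len` parameter, and the tiny dictionary are assumptions made for illustration; the dictionary entries are the Chinese terms for "oral ulcer", "recurrent episodes", "facial symmetry", and "recurrent aphthous ulcer" from the diagnosis example.

```python
def forward_max_match(text, dictionary, max_len=6):
    """Greedy longest-match segmentation (a stand-in for a real segmenter).

    At each position the longest dictionary entry is taken; a character
    that starts no dictionary entry becomes a single-character token.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                pos += size
                break
    return tokens

# Hypothetical dictionary built from the feature words in the example.
dictionary = {"口腔溃疡", "反复发作", "面部对称", "复发性口疮"}
tokens = forward_max_match("口腔溃疡反复发作", dictionary)
```

A production system would more likely rely on a trained segmenter; the point here is only that the output of this step is a list of feature words that later steps consume.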
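202. Use the TF-IDF algorithm to calculate the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to each preset product feature word set.
In this embodiment, a preferred implementation may include: calculating the first weight value of each product query feature word in the product query feature word set with respect to the product query text, and the second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; constructing a product query feature vector containing the product query feature words and their first weight values; and constructing a preset product feature vector containing the preset product feature words and their second weight values.
Specifically, the TF-IDF algorithm consists of a term frequency (tf) calculation and an inverse document frequency (idf) calculation; the term frequency is then multiplied by the inverse document frequency to obtain the weight value of a feature word with respect to the text.
The term frequency of a product query feature word is the number of times it occurs in the product query feature word set. Because product query texts differ in length, each term frequency is normalized, which is why the term frequency formula divides by $\sum_k n_{k,j}$.
The term frequency formula is described as:
$$tf_{i,j}=\frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $i$ denotes a product query feature word, $j$ denotes the product query feature word set, $tf_{i,j}$ is the term frequency of $i$ in set $j$, $n_{i,j}$ is the number of times $i$ occurs in set $j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words in set $j$.
The inverse document frequency formula is described as:
$$idf_i=\log\frac{|D|}{|\{j:t_i\in d_j\}|}$$
where $i$ denotes a product query feature word, $j$ denotes the product query feature word set, $idf_i$ is the inverse document frequency of $i$, $|D|$ is the total number of product texts in the database, and $|\{j:t_i\in d_j\}|$ is the number of product texts in which the product query feature word $i$ occurs. The smaller $|\{j:t_i\in d_j\}|$ is, the larger the IDF value and the better the product query feature word $i$ distinguishes between texts. Conversely, the smaller the IDF value, the worse the product query feature word distinguishes between texts.
The TF-IDF value formula is described as: $tfidf_{i,j}=tf_{i,j}\times idf_i$
where $tfidf_{i,j}$ is the first weight value of product query feature word $i$.
The second weight value of each preset product feature word is calculated with the TF-IDF algorithm in the same way as the first weight value of each product query feature word.
Construct the product query feature vector: {(product query feature word 1, first weight value of product query feature word 1), (product query feature word 2, first weight value of product query feature word 2), (product query feature word 3, first weight value of product query feature word 3), …}.
Similarly, construct the preset product feature vector: {(preset product feature word 1, second weight value of preset product feature word 1), (preset product feature word 2, second weight value of preset product feature word 2), (preset product feature word 3, second weight value of preset product feature word 3), …}.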
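The term frequency, inverse document frequency, and feature vector construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the natural logarithm (the text does not fix a base), the zero weight for words with no IDF entry, and the toy English corpus are all choices made here rather than details from the source.

```python
import math
from collections import Counter

def term_frequencies(tokens):
    """tf = occurrences of a word divided by the total number of word occurrences."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def inverse_document_frequencies(corpus):
    """idf = log(number of texts / number of texts containing the word)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for tokens in corpus for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Feature vector as {feature word: tf * idf}; unknown words get weight 0 (assumption)."""
    return {word: tf * idf.get(word, 0.0)
            for word, tf in term_frequencies(tokens).items()}

# Hypothetical, already segmented product texts standing in for the product database.
corpus = [
    ["oral", "ulcer", "recurrent", "ulcer"],
    ["herpetic", "oral", "ulcer"],
    ["influenza", "virus"],
]
idf = inverse_document_frequencies(corpus)
query_vector = tfidf_vector(["recurrent", "oral", "ulcer"], idf)
preset_vectors = [tfidf_vector(tokens, idf) for tokens in corpus]
```

Storing each vector as a word-to-weight mapping mirrors the (feature word, weight value) pairs in the text and keeps the representation sparse.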
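203. Use a preset similarity formula to calculate the first similarity between the product query feature vector and each preset product feature vector, and determine the preset product texts whose first similarity is greater than the preset similarity threshold as recommended product texts.
In this embodiment, the preset similarity formula may include the cosine formula, which is described as:
$$\cos(\vec{A},\vec{B})=\frac{\vec{A}\cdot\vec{B}}{\lVert\vec{A}\rVert\,\lVert\vec{B}\rVert}$$
In the formula, $\vec{A}$ is the product query feature vector, $\vec{B}$ is the preset product feature vector, and $\cos(\vec{A},\vec{B})$ is the first similarity.
After the first similarity between the product query feature vector and each preset product feature vector has been calculated, the preset product texts whose first similarity is greater than the preset similarity threshold are selected and determined as recommended product texts. This preliminary screening of the large number of preset product texts allows the selected recommended product texts to be matched further against the product query text, which improves the efficiency and accuracy of product recommendation.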
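The cosine comparison and threshold screening can be sketched as follows. The vectors are stored as word-to-weight mappings; the function names, the convention of returning 0 for an empty vector, the sample weights, and the threshold value 0.3 are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for vectors with no weight
    return dot / (norm_a * norm_b)

def select_recommended_texts(query_vec, preset_vecs, threshold):
    """Keep the preset texts whose first similarity is greater than the threshold."""
    sims = {name: cosine_similarity(query_vec, vec)
            for name, vec in preset_vecs.items()}
    return {name: s for name, s in sims.items() if s > threshold}

# Hypothetical TF-IDF weights.
query = {"oral": 0.2, "ulcer": 0.4}
presets = {
    "text_a": {"oral": 0.1, "ulcer": 0.2},       # same direction as the query
    "text_b": {"influenza": 0.5, "virus": 0.5},  # shares no word with the query
}
recommended = select_recommended_texts(query, presets, threshold=0.3)
```

Only `text_a` survives the screening here, since `text_b` has no word in common with the query and therefore a similarity of 0.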
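204. Extract the identification feature word set and the descriptive feature word set of each recommended product text, and calculate the first intersection of the identification feature word set and the product query feature word set, and the second intersection of the descriptive feature word set and the product query feature word set.
In this embodiment, the feature words obtained by segmenting a recommended product text can be grouped according to whether they are identifying words or descriptive words, which divides the recommended product feature word set into the identification feature word set and the descriptive feature word set. The first intersection of the identification feature word set and the product query feature word set is then calculated. Taking recommended product text 1 as an example: if the second feature word, which is in the identification feature word set, does not occur in the product query feature word set, the first intersection does not include it; if the third feature word, which is in the identification feature word set, does occur in the product query feature word set, the first intersection includes it. The first intersection is therefore {third feature word, …}. The second intersection of the descriptive feature word set and the product query feature word set is calculated in the same way.
205. Calculate, from the first intersection, the third weight value of the identification feature word set relative to the product query feature word set, and, from the second intersection, the fourth weight value of the descriptive feature word set relative to the product query feature word set.
In this embodiment, the third weight value is calculated as: $tfidf_w=\sum_{t\in n} tf_{t,w}\times idf_{t,w}$
where $tfidf_w$ is the third weight value of the identification feature word set $w$ relative to the product query feature word set $j$, and $n$ is the first intersection of the identification feature word set $w$ and the product query feature word set $j$. $tf_{t,w}$ is the TF value of identification feature word $t$ in $w$, and $idf_{t,w}$ is the IDF value of identification feature word $t$ in $w$.
The fourth weight value is calculated as: $tfidf_v=\sum_{t\in m} tf_{t,v}\times idf_{t,v}$
where $tfidf_v$ is the fourth weight value of the descriptive feature word set $v$ relative to the product query feature word set $j$, and $m$ is the second intersection of the descriptive feature word set $v$ and the product query feature word set $j$. $tf_{t,v}$ is the TF value of descriptive feature word $t$ in $v$, and $idf_{t,v}$ is the IDF value of descriptive feature word $t$ in $v$.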
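206. Weight the third weight value and the fourth weight value with a preset coefficient to obtain the weighted similarity value between each recommended product text and the product query text, and form the first recommendation result in descending order of weighted similarity value.
In this embodiment, the weighted similarity value between each recommended product text and the product query text is calculated as: $C=\lambda\,tfidf_w+(1-\lambda)\,tfidf_v$
where $\lambda$ is the preset coefficient, $C$ is the weighted similarity value, $tfidf_w$ is the third weight value, and $tfidf_v$ is the fourth weight value.
The first recommendation result contains the weighted similarity values and the corresponding recommended product texts; its first entry is the largest weighted similarity value together with the corresponding recommended product text. For example, the first recommendation result includes: {(曲安松龙软膏 [an ointment], 0.5), (杜米芬含片 [a lozenge], 0.3), (利多卡因凝胶 [lidocaine gel], 0.15), ...}.
The purpose of weighting the third and fourth weight values with a preset coefficient is this: giving the descriptive words of each recommended product text the smaller coefficient, $(1-\lambda)$, and the identifying words the larger coefficient, $\lambda$, reduces the interference of descriptive words in the first recommendation result and thereby improves its accuracy.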
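Steps 204 to 206 can be sketched together: sum the TF-IDF weights over each intersection, then blend the two sums with the preset coefficient. In this minimal illustration the function names, the sample weights, and the value of `lam` (standing for λ) are hypothetical; each candidate is represented by its identification-word weights and its descriptive-word weights, both assumed to be precomputed tf × idf products.

```python
def intersection_weight(feature_weights, query_words):
    """Sum of tf*idf over the feature words that also occur in the query word set."""
    return sum(w for word, w in feature_weights.items() if word in query_words)

def weighted_similarity(id_weights, desc_weights, query_words, lam):
    """C = lam * (third weight value) + (1 - lam) * (fourth weight value)."""
    third = intersection_weight(id_weights, query_words)
    fourth = intersection_weight(desc_weights, query_words)
    return lam * third + (1.0 - lam) * fourth

query_words = {"oral", "ulcer", "recurrent", "of"}
# name -> (identification word weights, descriptive word weights), all hypothetical
candidates = {
    "product_a": ({"ulcer": 0.30, "herpetic": 0.20}, {"of": 0.02}),
    "product_b": ({"influenza": 0.40}, {"of": 0.02, "in": 0.01}),
}
lam = 0.8  # a larger share for identifying words, as the text recommends

first_result = sorted(
    ((name, weighted_similarity(idw, dw, query_words, lam))
     for name, (idw, dw) in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With these numbers `product_a` ranks first because its identifying word "ulcer" matches the query, while `product_b` matches only on a low-weight descriptive word.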
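Recommendations are based on the query user's product query text. First, this is independent of the query user's personal data, so there is no cold-start or new-user problem. Second, every preset product text can be recommended regardless of when, or in what order, it was stored, so there is no new-item problem. Finally, compared with running a fuzzy database query directly on keywords, this method searches content text that is closer to the query user's needs, so the first recommendation result is more accurate.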
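207. Determine the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with the neighbor users' historical behavior set for products.
The historical behavior set includes the behavior set to be predicted, which is the part of the neighbor users' behavior set that differs from the query user's behavior set, and the adjacent behavior sets adjacent to the behavior set to be predicted.
In this embodiment, as a preferred implementation, the step of determining the neighbor users whose behavior correlation with the query user is higher than the first preset threshold includes: using a preset correlation coefficient formula to calculate the correlation coefficient between the query user and each other user, and determining the other users whose correlation coefficient is greater than the first preset threshold as neighbor users. Specifically, the preset correlation coefficient formula (its image is not reproduced in this text; the symbol definitions that follow are consistent with the Pearson correlation coefficient) can be expressed as:
$$S(u,b)=\frac{\sum_{p\in I_u\cap I_b}(r_{u,p}-\bar{r}_u)(r_{b,p}-\bar{r}_b)}{\sqrt{\sum_{p\in I_u\cap I_b}(r_{u,p}-\bar{r}_u)^2}\,\sqrt{\sum_{p\in I_u\cap I_b}(r_{b,p}-\bar{r}_b)^2}}$$
where $S(u,b)$ is the correlation coefficient between query user $u$ and another user $b$; $I_u\cap I_b$ is the set of products called by both $u$ and $b$; $r_{u,p}$ and $r_{b,p}$ are the historical call counts of $u$ and $b$ for a jointly called product $p$; and $\bar{r}_u$ and $\bar{r}_b$ are the average historical call counts of $u$ and $b$ over the products in $I_u\cap I_b$.
The larger the value of $S(u,b)$, the stronger the correlation between $u$ and $b$. The value range of $S(u,b)$ is generally $[-1,+1]$. Other users $b$ whose $S(u,b)$ is greater than the first preset threshold are determined as neighbor users $h$.
The step of determining the behavior set to be predicted includes: taking the difference between the neighbor users' behavior set and the query user's behavior set as the behavior set to be predicted. Specifically, the behaviors to be predicted are present in the neighbor users' behavior set but absent from the query user's behavior set.
The step of determining the adjacent behavior sets includes: calculating the second similarity between the behavior set to be predicted and other behavior sets according to the k-nearest-neighbor algorithm or the k-means algorithm, and determining the other behavior sets whose second similarity is greater than the second preset threshold as adjacent behavior sets. The k-nearest-neighbor and k-means algorithms are known from the prior art and are not described again here.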
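Neighbor selection and the behavior set to be predicted can be sketched as below. The correlation is computed as a Pearson coefficient over jointly called products, which is an assumption that matches the symbol definitions (averages over the common set, value range [-1, +1]) rather than a formula confirmed by the source. The function names, the zero fallback for degenerate cases, the call-count data, and the threshold 0.5 are all hypothetical. The k-nearest-neighbor or k-means step for adjacent behavior sets is not covered here.

```python
import math

def pearson_correlation(calls_u, calls_b):
    """Pearson coefficient over the products both users have called."""
    common = set(calls_u) & set(calls_b)
    if len(common) < 2:
        return 0.0  # too little overlap to correlate; returning 0 is a choice made here
    mean_u = sum(calls_u[p] for p in common) / len(common)
    mean_b = sum(calls_b[p] for p in common) / len(common)
    cov = sum((calls_u[p] - mean_u) * (calls_b[p] - mean_b) for p in common)
    var_u = sum((calls_u[p] - mean_u) ** 2 for p in common)
    var_b = sum((calls_b[p] - mean_b) ** 2 for p in common)
    if var_u == 0 or var_b == 0:
        return 0.0  # a constant call pattern has no defined correlation
    return cov / math.sqrt(var_u * var_b)

def find_neighbors(query_user, history, threshold):
    """Other users whose coefficient with the query user exceeds the threshold."""
    sims = {other: pearson_correlation(history[query_user], calls)
            for other, calls in history.items() if other != query_user}
    return {other: s for other, s in sims.items() if s > threshold}

def behaviors_to_predict(query_user, neighbors, history):
    """Products called by at least one neighbor but never by the query user."""
    called_by_neighbors = {p for h in neighbors for p in history[h]}
    return called_by_neighbors - set(history[query_user])

# Hypothetical historical call counts: user -> {product: number of calls}.
history = {
    "u":  {"p1": 5, "p2": 1, "p3": 3},
    "b1": {"p1": 4, "p2": 2, "p3": 3, "p7": 6},
    "b2": {"p1": 1, "p2": 5, "p3": 3},
}
neighbors = find_neighbors("u", history, threshold=0.5)
to_predict = behaviors_to_predict("u", neighbors, history)
```

In this data `b1` follows the same call pattern as `u` and becomes the only neighbor, so the single behavior to predict is `b1`'s product `p7`.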
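208. A user-based collaborative filtering algorithm calculates the query user's first score for the behavior set to be predicted.
The main idea of the user-based collaborative filtering algorithm is to find neighbor users whose behavior correlation with the query user is higher than the first preset threshold. Because the historical behaviors of the query user and the neighbor users are similar, a behavior that the neighbor users have exhibited but the query user has not is a behavior the query user may exhibit. In this embodiment, such behaviors make up the behavior set to be predicted.
In this embodiment, the first score formula is given in the source as an image that is not reproduced in this text; its symbols are defined as follows.
$p(u,i)$ is query user $u$'s first score for product $i$ in the behavior set to be predicted, $\bar{r}_u$ is the average number of product calls made by query user $u$, $s(u,h)$ is the correlation coefficient between $u$ and $h$, $p_{h,i}$ is neighbor user $h$'s score for $i$, and $n$ is the number of neighbor users.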
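Because the first score formula is not legible in this text, the sketch below uses the textbook mean-centered form of user-based collaborative filtering, p(u,i) = r̄_u + Σ_h s(u,h)·(r_{h,i} − r̄_h) / Σ_h |s(u,h)|. This is an assumption that fits the symbols defined above, not necessarily the patent's exact expression; in particular, the neighbor mean r̄_h, the function names, and the sample data are introduced here for illustration.

```python
def mean_calls(calls):
    """Average call count over the products a user has called."""
    return sum(calls.values()) / len(calls)

def predict_user_based(query_user, product, history, neighbor_sims):
    """Mean-centered, similarity-weighted average of the neighbors' call counts."""
    base = mean_calls(history[query_user])
    numerator = 0.0
    denominator = 0.0
    for neighbor, sim in neighbor_sims.items():
        if product not in history[neighbor]:
            continue  # this neighbor says nothing about the product
        deviation = history[neighbor][product] - mean_calls(history[neighbor])
        numerator += sim * deviation
        denominator += abs(sim)
    if denominator == 0.0:
        return base  # no usable neighbor: fall back to the user's own average
    return base + numerator / denominator

# Hypothetical call counts and a precomputed correlation coefficient.
history = {
    "u":  {"p1": 5, "p2": 1, "p3": 3},
    "b1": {"p1": 4, "p2": 2, "p3": 3, "p7": 6},
}
neighbor_sims = {"b1": 1.0}
first_score = predict_user_based("u", "p7", history, neighbor_sims)
```

Centering on each user's own mean keeps a heavy caller and a light caller comparable, which is the usual motivation for this form.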
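209. An item-based collaborative filtering algorithm calculates the query user's second score for the adjacent behavior sets, and the second recommendation result is determined from the first score and/or the second score.
The adjacent behavior sets are the behavior sets adjacent to the behavior set to be predicted. In this embodiment, the second score formula is given in the source as an image that is not reproduced in this text; its symbols are defined as follows.
$p(u,i)$ is query user $u$'s second score for product $i$ in the behavior set to be predicted, $\bar{r}_i$ is the average number of calls to $i$ by the neighbor users $h$, $i_k$ is an adjacent behavior set, $\bar{r}_{i_k}$ is the average number of calls by the neighbor users $h$ to the adjacent behavior set $i_k$, $s(i,i_k)$ is the second similarity between $i$ and $i_k$, and $r_{u,i_k}$ is the number of times query user $u$ has called the adjacent behavior set $i_k$.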
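The second score formula is also not legible in this text. The symbols defined above fit the standard item-based form p(u,i) = r̄_i + Σ_k s(i,i_k)·(r_{u,i_k} − r̄_{i_k}) / Σ_k |s(i,i_k)|, so the sketch below uses that form as an explicit assumption. The function name, the tuple layout of `adjacent`, and all numbers are hypothetical.

```python
def predict_item_based(query_calls, target_mean, adjacent):
    """Item-based score for one target product.

    query_calls: {product: calls by the query user}
    target_mean: average call count of the target product among neighbor users
    adjacent:    list of (product, similarity to target, mean call count of product)
    """
    numerator = 0.0
    denominator = 0.0
    for product, similarity, mean_count in adjacent:
        if product not in query_calls:
            continue  # the query user never called this adjacent product
        numerator += similarity * (query_calls[product] - mean_count)
        denominator += abs(similarity)
    if denominator == 0.0:
        return target_mean  # nothing to adjust with: fall back to the item's mean
    return target_mean + numerator / denominator

# Hypothetical data: the query user called p1 and p3; p9 is adjacent but unused.
query_calls = {"p1": 5, "p3": 3}
adjacent = [("p1", 0.8, 3.0), ("p3", 0.2, 3.0), ("p9", 0.5, 2.0)]
second_score = predict_item_based(query_calls, target_mean=4.0, adjacent=adjacent)
```

Here the query user calls `p1` more often than average, and since `p1` is strongly similar to the target product, the predicted score rises above the target's mean.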
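As one implementation, the second recommendation result may be based on the first score only, on the second score only, or on a weighted combination of the first score and the second score. Weighting the two scores joins the two collaborative filtering algorithms and improves the accuracy of the second recommendation result.
The weighted combination of the first score and the second score is calculated as: $P=\mu\,p_1(u,i)+(1-\mu)\,p_2(u,i)$
where $p_1(u,i)$ is the first score, $p_2(u,i)$ is the second score, $\mu$ is the weighting coefficient, and $P$ is the score.
The products corresponding to the adjacent behavior sets are sorted by score in descending order to obtain the second recommendation result. For example, the second recommendation result is: product 1, with a score of 16; product 7, with a score of 10; product 5, with a score of 3; …
Further, before the products corresponding to the adjacent behavior sets are sorted by score in descending order, products with a score less than or equal to 0 are deleted. Because the query user will not search for products with a score below 0, only products with a score greater than 0 are kept.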
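The weighting, deletion of non-positive scores, and descending sort can be sketched as one small function. The name `second_recommendation`, treating a missing score as 0, the sample scores, and μ = 0.5 are assumptions for illustration.

```python
def second_recommendation(first_scores, second_scores, mu):
    """Blend the two scores, drop products scoring <= 0, sort in descending order."""
    products = set(first_scores) | set(second_scores)
    combined = {
        p: mu * first_scores.get(p, 0.0) + (1.0 - mu) * second_scores.get(p, 0.0)
        for p in products
    }
    kept = {p: score for p, score in combined.items() if score > 0}
    return sorted(kept.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical first (user-based) and second (item-based) scores.
first_scores = {"product_1": 20.0, "product_7": 8.0, "product_9": -2.0}
second_scores = {"product_1": 12.0, "product_7": 12.0, "product_9": -4.0}
ranking = second_recommendation(first_scores, second_scores, mu=0.5)
```

With these inputs `product_9` ends up with a negative combined score and is removed before sorting, leaving product 1 (16) ahead of product 7 (10).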
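210. Determine the target product recommendation result from the first recommendation result and the second recommendation result.
Before the first recommendation result obtained in step 206 and the second recommendation result obtained in step 209 are weighted, normalization is used to compress the scores into the range (0, 1).
For example, the second recommendation result before normalization is: product 1, with a score of 16; product 7, with a score of 10; product 5, with a score of 3; …
The second recommendation result after normalization is: product 1, with a score of 0.53; product 7, with a score of 0.33; product 5, with a score of 0.1; …
In this embodiment, as a preferred implementation: calculate the union of the products in the first recommendation result and the products in the second recommendation result; weight the weighted similarity value and the score of each product in the union with the preset third coefficient to obtain the target recommendation value; and sort by target recommendation value in descending order to obtain the target product recommendation result.
For example, the first recommendation result yields each product with its weighted similarity value, and the second recommendation result yields each product with its score. If the weighted similarity value of product 1 in the first recommendation result is 0.5 and its score in the second recommendation result is 0.53, the target recommendation value is 0.5 × (preset third coefficient) + 0.53 × (1 − preset third coefficient). If the weighted similarity value of product 2 in the first recommendation result is 0.1 but product 2 does not appear in the second recommendation result, the target recommendation value is 0.1 × (preset third coefficient) + 0 × (1 − preset third coefficient).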
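The normalization and the final fusion can be sketched as follows. The text does not name the normalization method; dividing each score by the total is one simple choice that keeps positive scores inside (0, 1) when more than one product is present, and it is an assumption here. The function names and the value 0.6 for the preset third coefficient are also hypothetical; the product values 0.5, 0.53, and 0.1 are taken from the example in the text.

```python
def normalize_by_total(scores):
    """Divide every score by the sum of all scores (assumed normalization)."""
    total = sum(scores.values())
    return {product: score / total for product, score in scores.items()}

def fuse_results(first_result, second_result, third_coefficient):
    """Weight both results over the union of products; a missing value counts as 0."""
    products = set(first_result) | set(second_result)
    target = {
        p: third_coefficient * first_result.get(p, 0.0)
        + (1.0 - third_coefficient) * second_result.get(p, 0.0)
        for p in products
    }
    return sorted(target.items(), key=lambda pair: pair[1], reverse=True)

# Normalization on hypothetical raw scores.
normalized = normalize_by_total({"x": 6.0, "y": 3.0, "z": 1.0})

# Fusion on the values used in the text's example.
first_result = {"product_1": 0.5, "product_2": 0.1}
second_result = {"product_1": 0.53}
ranking = fuse_results(first_result, second_result, third_coefficient=0.6)
```

Product 2 appears only in the first recommendation result, so its second-result term is 0, exactly as in the worked example above.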
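This application discloses a product recommendation method, device, and equipment based on content and collaborative filtering. First, the product query text for the target product sent by the query user is obtained, the first similarity between the product query text and the preset product texts is calculated using the preset word segmentation technique and the TF-IDF algorithm, and the preset product texts whose first similarity is greater than the preset similarity threshold are determined as recommended product texts. Next, the identification feature word set and the descriptive feature word set of each recommended product text are extracted, the weighted similarity value between the recommended product text and the product query text is calculated from these two sets, and the first recommendation result is formed in descending order of weighted similarity value. In addition, the neighbor users whose behavior correlation with the query user is higher than the first preset threshold are determined, together with the neighbor users' historical behavior set for products; the query user's scores for the historical behavior set are calculated based on a collaborative filtering algorithm, and the second recommendation result is determined from the scores. Finally, the target product recommendation result is determined from the first recommendation result and the second recommendation result. With this technical solution, the first recommendation result for the target product is derived from the product query text, the second recommendation result is derived from neighbor users whose behavior is highly correlated with the query user's, and the two results jointly determine the target product recommendation result. Recommending across multiple dimensions in this way yields high accuracy and meets the query user's personalized needs.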
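Further, as a concrete implementation of the methods shown in Figures 1 and 2, an embodiment of this application provides a product recommendation device based on content and collaborative filtering. As shown in Figure 3, the device includes: a screening module 31, a first recommendation module 32, a second recommendation module 33, and a determination module 34;
the screening module 31 can be used to obtain the product query text for the target product sent by the query user, calculate the first similarity between the product query text and the preset product texts using the preset word segmentation technique and the TF-IDF algorithm, and determine the preset product texts whose first similarity is greater than the preset similarity threshold as recommended product texts;
the first recommendation module 32 can be used to extract the identification feature word set and the descriptive feature word set of each recommended product text, calculate the weighted similarity value between the recommended product text and the product query text from these two sets, and form the first recommendation result in descending order of weighted similarity value;
the second recommendation module 33 can be used to determine the neighbor users whose behavior correlation with the query user is higher than the first preset threshold, together with the neighbor users' historical behavior set for products, calculate the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determine the second recommendation result from the scores;
the determination module 34 can be used to determine the target product recommendation result from the first recommendation result and the second recommendation result.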
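In a specific application scenario, in order to calculate the first similarity between the product query text and the preset product text using the preset word segmentation technique and the TF-IDF algorithm, as shown in FIG. 4, the screening module 31 may specifically include: a word segmentation unit 311, a first calculation unit 312, and a second calculation unit 313;
the word segmentation unit 311 may be used to segment the product query text according to the preset word segmentation technique to obtain a product query feature word set, and to segment the preset product text to obtain a preset product feature word set;
the first calculation unit 312 may be used to calculate, using the TF-IDF algorithm, the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to the preset product feature word set;
the second calculation unit 313 may be used to calculate the first similarity between the product query feature vector and the preset product feature vector using a preset similarity formula.
Correspondingly, in order to calculate these two feature vectors using the TF-IDF algorithm, the first calculation unit 312 may specifically be used to calculate a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; to construct a product query feature vector comprising the product query feature words and their first weight values; and to construct a preset product feature vector comprising the preset product feature words and their second weight values.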
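The work of units 311 to 313 can be sketched in plain Python as follows. Several points are assumptions: the texts are taken as already segmented into word lists (the preset word segmentation technique is not named here), the IDF uses a smoothed variant, cosine similarity stands in for the preset similarity formula, and the function names, sample words, and threshold 0.3 are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(words, corpus):
    """Return {feature word: TF-IDF weight} for one segmented text."""
    counts = Counter(words)
    vector = {}
    for word, count in counts.items():
        tf = count / len(words)
        df = sum(1 for document in corpus if word in document)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # assumed smoothed IDF
        vector[word] = tf * idf
    return vector


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already segmented texts.
query = ["children", "cold", "medicine"]
preset = {
    "A": ["children", "cold", "granules"],
    "B": ["adult", "vitamin", "tablets"],
}
corpus = list(preset.values()) + [query]
query_vector = tfidf_vector(query, corpus)
similarities = {
    name: cosine_similarity(query_vector, tfidf_vector(words, corpus))
    for name, words in preset.items()
}
# Keep the preset product texts above an assumed similarity threshold.
recommended = [name for name, value in similarities.items() if value > 0.3]
```

Storing each vector as a word-to-weight dictionary mirrors the description of a feature vector that pairs each feature word with its weight value.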
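In a specific application scenario, in order to calculate the weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, as shown in FIG. 4, the first recommendation module 32 may specifically include: an intersection unit 321, a weight unit 322, and a first weighting unit 323;
the intersection unit 321 may be used to compute a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set;
the weight unit 322 may be used to calculate, according to the first intersection, a third weight value of the identification feature word set relative to the product query feature word set, and, according to the second intersection, a fourth weight value of the description feature word set relative to the product query feature word set;
the first weighting unit 323 may be used to weight the third weight value and the fourth weight value using preset coefficients to obtain the weighted similarity value between the recommended product text and the product query text.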
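A minimal sketch of units 321 to 323 follows. This passage does not say how the third and fourth weight values are derived from the intersections, so the sketch assumes each weight is the size of the intersection divided by the size of the query feature word set, and that a single coefficient `alpha` and its complement serve as the preset coefficients. All names and sample words are hypothetical.

```python
def weighted_similarity(id_words, desc_words, query_words, alpha=0.7):
    """Weighted similarity value of one recommended product text to the query."""
    query = set(query_words)
    if not query:
        return 0.0
    first_intersection = set(id_words) & query
    second_intersection = set(desc_words) & query
    # Assumed definition: share of query words covered by each word set.
    third_weight = len(first_intersection) / len(query)
    fourth_weight = len(second_intersection) / len(query)
    return alpha * third_weight + (1 - alpha) * fourth_weight


value = weighted_similarity(
    id_words={"cold", "granules"},
    desc_words={"children", "fever", "cold"},
    query_words={"children", "cold", "fever", "medicine"},
    alpha=0.5,
)
# third weight 0.25, fourth weight 0.75, so value == 0.5
```

A larger `alpha` makes a match on identification words (such as the product name) count more than a match on descriptive words.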
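In a specific application scenario, the historical behavior set includes a to-be-predicted behavior set, made up of the behaviors in the neighbor user's behavior set that differ from the query user's behavior set, and an adjacent behavior set adjacent to the to-be-predicted behavior set. In order to determine the neighbor users whose behavioral correlation with the query user is higher than the first preset threshold, and the neighbor users' historical behavior set for products, as shown in FIG. 4, the second recommendation module 33 may specifically include: a first screening unit 331, a first determining unit 332, and a second screening unit 333;
the first screening unit 331 may be used to calculate the correlation coefficient between the query user and other users using a preset correlation coefficient formula, and to determine the other users whose correlation coefficient is greater than the first preset threshold as neighbor users;
the first determining unit 332 may be used to determine the difference between the neighbor user's behavior set and the query user's behavior set as the to-be-predicted behavior set;
the second screening unit 333 may be used to calculate a second similarity between the to-be-predicted behavior set and other behavior sets according to the k-nearest-neighbor algorithm or the k-means algorithm, and to determine the other behavior sets whose second similarity is greater than a second preset threshold as adjacent behavior sets.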
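Units 331 and 332 can be sketched as follows. The Pearson coefficient is used as one possible choice for the preset correlation coefficient formula, which is an assumption, as are the representation of behavior as product-to-call-count dictionaries, the function names, and the sample data. The second-similarity step of unit 333 is not sketched.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)


def find_neighbors(query_calls, other_users, catalogue, threshold):
    """Return {user: correlation} for users correlated above the threshold."""
    query_vector = [query_calls.get(product, 0) for product in catalogue]
    neighbors = {}
    for user, calls in other_users.items():
        score = pearson(query_vector, [calls.get(p, 0) for p in catalogue])
        if score > threshold:
            neighbors[user] = score
    return neighbors


def to_be_predicted(query_calls, neighbor_calls):
    """Products the neighbor has called but the query user has not."""
    return set(neighbor_calls) - set(query_calls)


catalogue = ["p1", "p2", "p3", "p4"]
query_calls = {"p1": 5, "p2": 3}
other_users = {"h1": {"p1": 4, "p2": 2, "p3": 1}, "h2": {"p3": 5, "p4": 4}}
neighbors = find_neighbors(query_calls, other_users, catalogue, threshold=0.5)
candidates = to_be_predicted(query_calls, other_users["h1"])
```

In this data, h1 has a usage pattern similar to the query user and becomes a neighbor, while h2 is negatively correlated and is excluded. The set difference then yields p3 as the product to score.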
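In a specific application scenario, in order to calculate the query user's scores for the historical behavior set based on the collaborative filtering algorithm and determine the second recommendation result according to the scores, as shown in FIG. 4, the second recommendation module 33 may specifically also include: a first scoring unit 334, a second scoring unit 335, and a second determining unit 336;
the first scoring unit 334 may be used to calculate the first score of the query user for the to-be-predicted behavior set using the user-based collaborative filtering algorithm;
the second scoring unit 335 may be used to calculate the second score of the query user for the adjacent behavior set using the item-based collaborative filtering algorithm;
the second determining unit 336 may be used to determine the second recommendation result according to the first score and/or the second score.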
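In a specific application scenario, in order to determine the target product recommendation result according to the first recommendation result and the second recommendation result, as shown in FIG. 4, the determining module 34 may specifically include: a union unit 341, a second weighting unit 342, and a recommendation unit 343;
the union unit 341 may be used to compute the union of the products in the first recommendation result and the products in the second recommendation result;
the second weighting unit 342 may be used to weight the weighted similarity value and the score of each product in the union using a preset third coefficient to obtain a target recommendation value;
the recommendation unit 343 may be used to sort by target recommendation value in descending order to obtain the target product recommendation result.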
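It should be noted that, for other descriptions of the functional units of the product recommendation apparatus based on content and collaborative filtering provided in this embodiment, reference may be made to the corresponding descriptions of FIG. 1 to FIG. 2, which are not repeated here.
Based on the methods shown in FIG. 1 to FIG. 2, this embodiment correspondingly also provides a readable storage medium, which may be volatile or non-volatile and on which computer-readable instructions are stored. When executed by a processor, the instructions implement the product recommendation method based on content and collaborative filtering shown in FIG. 1 to FIG. 2.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods of the implementation scenarios of the present application.
Based on the methods shown in FIG. 1 to FIG. 2 and the virtual apparatus embodiments shown in FIG. 3 and FIG. 4, and in order to achieve the above purpose, this embodiment also provides a computer device comprising a non-volatile readable storage medium and a processor. The non-volatile readable storage medium is used to store computer-readable instructions. The processor is used to execute the computer-readable instructions to implement the product recommendation method based on content and collaborative filtering shown in FIG. 1 to FIG. 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a WI-FI module, and so on. The user interface may include a display and an input unit such as a keyboard; optionally, it may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a WI-FI interface), and the like.
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The non-volatile readable storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used for communication among the components inside the non-volatile readable storage medium, and for communication with other hardware and software in the information processing entity device.
From the description of the above implementations, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform, or by hardware.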
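By applying the technical solution of the present application, and in comparison with the prior art, the present application discloses a product recommendation method, apparatus, and device based on content and collaborative filtering. The application first obtains the product query text sent by a query user for a target product, calculates a first similarity between the product query text and each preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determines the preset product texts whose first similarity is greater than a preset similarity threshold as recommended product texts. Next, it extracts the identification feature word set and the description feature word set of each recommended product text, calculates the weighted similarity value between the recommended product text and the product query text according to these two sets, and forms a first recommendation result in descending order of weighted similarity value. In addition, it determines the neighbor users whose behavioral correlation with the query user is higher than a first preset threshold, together with the neighbor users' historical behavior set for products, calculates the query user's scores for the historical behavior set based on a collaborative filtering algorithm, and determines a second recommendation result according to the scores. Finally, it determines the target product recommendation result according to the first and second recommendation results. With this technical solution, the first recommendation result for the target product is derived from the product query text, the second is derived from neighbor users whose behavior is highly correlated with that of the query user, and the two are then used together to determine the target product recommendation result. Recommending across multiple dimensions in this way gives high recommendation accuracy and meets the personalized needs of the query user.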
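Those skilled in the art will understand that the drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of the apparatus in an implementation scenario may be distributed in the apparatus as described for that scenario, or may be changed accordingly and located in one or more apparatuses different from that of the scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present application; the application is not limited to them, and any variation that those skilled in the art can conceive shall fall within the scope of protection of the present application.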

Claims (20)

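  1. A product recommendation method based on content and collaborative filtering, comprising:
    obtaining a product query text sent by a query user for a target product, calculating a first similarity between the product query text and a preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determining the preset product text whose first similarity is greater than a preset similarity threshold as a recommended product text;
    extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result in descending order of the weighted similarity value;
    determining a neighbor user whose behavioral correlation with the query user is higher than a first preset threshold, and a historical behavior set of the neighbor user for products, calculating a score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result according to the score;
    determining a target product recommendation result according to the first recommendation result and the second recommendation result.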
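  2. The method according to claim 1, wherein calculating the first similarity between the product query text and the preset product text using the preset word segmentation technique and the TF-IDF algorithm comprises:
    segmenting the product query text according to the preset word segmentation technique to obtain a product query feature word set, and segmenting the preset product text to obtain a preset product feature word set;
    calculating, using the TF-IDF algorithm, a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set;
    calculating the first similarity between the product query feature vector and the preset product feature vector using a preset similarity formula.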
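  3. The method according to claim 2, wherein calculating, using the TF-IDF algorithm, the product query feature vector corresponding to the product query feature word set and the preset product feature vector corresponding to the preset product feature word set comprises:
    calculating a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text;
    constructing a product query feature vector comprising the product query feature words and the corresponding first weight values, and constructing a preset product feature vector comprising the preset product feature words and the corresponding second weight values.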
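  4. The method according to claim 2, wherein calculating the weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set comprises:
    computing a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set;
    calculating, according to the first intersection, a third weight value of the identification feature word set relative to the product query feature word set, and calculating, according to the second intersection, a fourth weight value of the description feature word set relative to the product query feature word set;
    weighting the third weight value and the fourth weight value using preset coefficients to obtain the weighted similarity value between the recommended product text and the product query text.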
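  5. The method according to claim 1, wherein the historical behavior set comprises a to-be-predicted behavior set, made up of the behaviors in the neighbor user's behavior set that differ from the query user's behavior set, and an adjacent behavior set adjacent to the to-be-predicted behavior set, and wherein determining the neighbor user whose behavioral correlation with the query user is higher than the first preset threshold, and the historical behavior set of the neighbor user for products, comprises:
    calculating a correlation coefficient between the query user and other users using a preset correlation coefficient formula, and determining the other users whose correlation coefficient is greater than the first preset threshold as neighbor users;
    determining the difference between the neighbor user's behavior set and the query user's behavior set as the to-be-predicted behavior set;
    calculating a second similarity between the to-be-predicted behavior set and other behavior sets according to the k-nearest-neighbor algorithm or the k-means algorithm, and determining the other behavior sets whose second similarity is greater than a second preset threshold as adjacent behavior sets.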
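  6. The method according to claim 5, wherein calculating the score of the query user for the historical behavior set based on the collaborative filtering algorithm, and determining the second recommendation result according to the score, comprises:
    calculating a first score of the query user for the to-be-predicted behavior set using a user-based collaborative filtering algorithm;
    calculating a second score of the query user for the adjacent behavior set using an item-based collaborative filtering algorithm;
    determining the second recommendation result according to the first score and/or the second score.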
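  7. The method according to claim 1, wherein determining the target product recommendation result according to the first recommendation result and the second recommendation result comprises:
    computing the union of the products in the first recommendation result and the products in the second recommendation result;
    weighting the weighted similarity value and the score of each product in the union using a preset third coefficient to obtain a target recommendation value;
    sorting by target recommendation value in descending order to obtain the target product recommendation result.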
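  8. A product recommendation apparatus based on content and collaborative filtering, characterized by comprising:
    a screening module, used to obtain a product query text sent by a query user for a target product, calculate a first similarity between the product query text and a preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determine the preset product text whose first similarity is greater than a preset similarity threshold as a recommended product text;
    a first recommendation module, used to extract an identification feature word set and a description feature word set of the recommended product text, calculate a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, and form a first recommendation result in descending order of the weighted similarity value;
    a second recommendation module, used to determine a neighbor user whose behavioral correlation with the query user is higher than a first preset threshold, and a historical behavior set of the neighbor user for products, calculate a score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determine a second recommendation result according to the score;
    a determining module, used to determine a target product recommendation result according to the first recommendation result and the second recommendation result.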
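  9. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the program, implements the following steps:
    obtaining a product query text sent by a query user for a target product, calculating a first similarity between the product query text and a preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determining the preset product text whose first similarity is greater than a preset similarity threshold as a recommended product text;
    extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result in descending order of the weighted similarity value;
    determining a neighbor user whose behavioral correlation with the query user is higher than a first preset threshold, and a historical behavior set of the neighbor user for products, calculating a score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result according to the score;
    determining a target product recommendation result according to the first recommendation result and the second recommendation result.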
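  10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
    obtaining a product query text sent by a query user for a target product, calculating a first similarity between the product query text and a preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determining the preset product text whose first similarity is greater than a preset similarity threshold as a recommended product text;
    extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result in descending order of the weighted similarity value;
    determining a neighbor user whose behavioral correlation with the query user is higher than a first preset threshold, and a historical behavior set of the neighbor user for products, calculating a score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result according to the score;
    determining a target product recommendation result according to the first recommendation result and the second recommendation result.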
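  11. The computer-readable storage medium according to claim 10, wherein the computer program, when executed by a processor, further implements the following steps: segmenting the product query text according to the preset word segmentation technique to obtain a product query feature word set, and segmenting the preset product text to obtain a preset product feature word set; calculating, using the TF-IDF algorithm, a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set; calculating the first similarity between the product query feature vector and the preset product feature vector using a preset similarity formula.
  12. The computer-readable storage medium according to claim 11, wherein the computer program, when executed by a processor, further implements the following steps: calculating a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; constructing a product query feature vector comprising the product query feature words and the corresponding first weight values, and constructing a preset product feature vector comprising the preset product feature words and the corresponding second weight values.
  13. The computer-readable storage medium according to claim 11, wherein the computer program, when executed by a processor, further implements the following steps: computing a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set; calculating, according to the first intersection, a third weight value of the identification feature word set relative to the product query feature word set, and calculating, according to the second intersection, a fourth weight value of the description feature word set relative to the product query feature word set; weighting the third weight value and the fourth weight value using preset coefficients to obtain the weighted similarity value between the recommended product text and the product query text.
  14. The computer-readable storage medium according to claim 10, wherein the computer program, when executed by a processor, further implements the following steps: calculating a correlation coefficient between the query user and other users using a preset correlation coefficient formula, and determining the other users whose correlation coefficient is greater than the first preset threshold as neighbor users; determining the difference between the neighbor user's behavior set and the query user's behavior set as the to-be-predicted behavior set; calculating a second similarity between the to-be-predicted behavior set and other behavior sets according to the k-nearest-neighbor algorithm or the k-means algorithm, and determining the other behavior sets whose second similarity is greater than a second preset threshold as adjacent behavior sets.
  15. The computer-readable storage medium according to claim 14, wherein the computer program, when executed by a processor, further implements the following steps: calculating a first score of the query user for the to-be-predicted behavior set using a user-based collaborative filtering algorithm; calculating a second score of the query user for the adjacent behavior set using an item-based collaborative filtering algorithm; determining the second recommendation result according to the first score and/or the second score.
  16. The computer-readable storage medium according to claim 10, wherein the computer program, when executed by a processor, further implements the following steps: computing the union of the products in the first recommendation result and the products in the second recommendation result; weighting the weighted similarity value and the score of each product in the union using a preset third coefficient to obtain a target recommendation value; sorting by target recommendation value in descending order to obtain the target product recommendation result.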
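  17. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    obtaining a product query text sent by a query user for a target product, calculating a first similarity between the product query text and a preset product text using a preset word segmentation technique and the TF-IDF algorithm, and determining the preset product text whose first similarity is greater than a preset similarity threshold as a recommended product text;
    extracting an identification feature word set and a description feature word set of the recommended product text, calculating a weighted similarity value between the recommended product text and the product query text according to the identification feature word set and the description feature word set, and forming a first recommendation result in descending order of the weighted similarity value;
    determining a neighbor user whose behavioral correlation with the query user is higher than a first preset threshold, and a historical behavior set of the neighbor user for products, calculating a score of the query user for the historical behavior set based on a collaborative filtering algorithm, and determining a second recommendation result according to the score;
    determining a target product recommendation result according to the first recommendation result and the second recommendation result.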
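  18. The computer program product according to claim 17, wherein the computer program, when executed by a processor, implements the following steps: segmenting the product query text according to the preset word segmentation technique to obtain a product query feature word set, and segmenting the preset product text to obtain a preset product feature word set; calculating, using the TF-IDF algorithm, a product query feature vector corresponding to the product query feature word set and a preset product feature vector corresponding to the preset product feature word set; calculating the first similarity between the product query feature vector and the preset product feature vector using a preset similarity formula.
  19. The computer program product according to claim 18, wherein the computer program, when executed by a processor, implements the following steps: calculating a first weight value of each product query feature word in the product query feature word set with respect to the product query text, and a second weight value of each preset product feature word in the preset product feature word set with respect to the preset product text; constructing a product query feature vector comprising the product query feature words and the corresponding first weight values, and constructing a preset product feature vector comprising the preset product feature words and the corresponding second weight values.
  20. The computer program product according to claim 18, wherein the computer program, when executed by a processor, implements the following steps: computing a first intersection of the identification feature word set and the product query feature word set, and a second intersection of the description feature word set and the product query feature word set; calculating, according to the first intersection, a third weight value of the identification feature word set relative to the product query feature word set, and calculating, according to the second intersection, a fourth weight value of the description feature word set relative to the product query feature word set; weighting the third weight value and the fourth weight value using preset coefficients to obtain the weighted similarity value between the recommended product text and the product query text.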
PCT/CN2022/122200 2022-04-24 2022-09-28 基于内容与协同过滤的产品推荐方法、装置及计算机设备 WO2023206960A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210435260.6 2022-04-24
CN202210435260.6A CN114610859A (zh) 2022-04-24 2022-04-24 基于内容与协同过滤的产品推荐方法、装置及设备

Publications (1)

Publication Number Publication Date
WO2023206960A1 true WO2023206960A1 (zh) 2023-11-02

Family

ID=81869048

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122200 WO2023206960A1 (zh) 2022-04-24 2022-09-28 基于内容与协同过滤的产品推荐方法、装置及计算机设备

Country Status (2)

Country Link
CN (1) CN114610859A (zh)
WO (1) WO2023206960A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610859A (zh) * 2022-04-24 2022-06-10 康键信息技术(深圳)有限公司 基于内容与协同过滤的产品推荐方法、装置及设备

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011070422A (ja) * 2009-09-25 2011-04-07 Dainippon Printing Co Ltd 商品推薦装置
JP2015079381A (ja) * 2013-10-17 2015-04-23 日本電信電話株式会社 アイテム推薦装置、アイテム推薦方法およびアイテム推薦プログラム
US20160147768A1 (en) * 2014-11-25 2016-05-26 Samsung Electronics Co., Ltd. Device and method for providing media resource
CN111104485A (zh) * 2019-12-24 2020-05-05 上海风秩科技有限公司 一种产品文本的确定方法、装置、计算机设备和介质
CN111506831A (zh) * 2020-04-13 2020-08-07 蔡梓超 一种协同过滤的推荐模块、方法、电子设备及存储介质
CN113643103A (zh) * 2021-08-31 2021-11-12 平安医疗健康管理股份有限公司 基于用户相似度的产品推荐方法、装置、设备及存储介质
CN113850643A (zh) * 2021-09-18 2021-12-28 中国平安财产保险股份有限公司 产品推荐方法、装置、电子设备及可读存储介质
CN114610859A (zh) * 2022-04-24 2022-06-10 康键信息技术(深圳)有限公司 基于内容与协同过滤的产品推荐方法、装置及设备

Also Published As

Publication number Publication date
CN114610859A (zh) 2022-06-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939782

Country of ref document: EP

Kind code of ref document: A1