CN111401936A - Recommendation method based on comment space and user preference - Google Patents


Publication number
CN111401936A
Authority
CN
China
Prior art keywords
comment
user
commodity
comment data
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010118462.9A
Other languages
Chinese (zh)
Other versions
CN111401936B (en)
Inventor
余文涛
葛蕾
余文彬
黄晓辉
唐慧丰
胡瑞娟
李勇
李珠峰
席耀一
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202010118462.9A
Publication of CN111401936A
Application granted
Publication of CN111401936B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of commodity recommendation and discloses a recommendation method based on comment space and user preference, comprising the following steps: step 1, constructing a commodity comment data corpus and a user comment data corpus; step 2, constructing a commodity comment space and a user preference space based on the commodity comment data corpus and the user comment data corpus, respectively; and step 3, obtaining commodity recommendations for the user according to the user's position in the user preference space and the constructed commodity comment space. The invention quantifies commodity attributes using commodity comment data, maps commodities into an attribute space, and recommends commodities in combination with the user's historical shopping preferences. Compared with existing recommendation systems, this avoids the low recommendation efficiency caused by sparse matrices. As the amount of comment information grows, the method mines the relation between users' shopping preferences and commodity attributes and can thereby recommend commodities that match a user's shopping preferences more accurately.

Description

Recommendation method based on comment space and user preference
Technical Field
The invention belongs to the technical field of commodity recommendation, and particularly relates to a recommendation method based on a comment space and user preferences.
Background
The internet brings convenience, but it also imposes a heavy and complex screening task: people have to spend a great deal of time selecting the information they actually need. Online shopping faces the same sea of information. Confronted with a large number and variety of commodities, users are often overwhelmed and need several times longer than before to find commodities that suit them. Information overload also challenges online merchants: information they publish is quickly submerged by other information, so they must keep updating their commodity information in order to sell. To address this problem, online shopping platforms have proposed various recommendation algorithms (Lilin, Liu brocade, Bengxifu, et al. A commodity recommendation model combining a scoring matrix and comment text [J]. 2018, 41(7): 1559-).
As the times progress, the challenges facing the e-commerce industry will grow considerably, and the quality of an e-commerce website's recommendation system will become a point of competition among merchants (li yu qi, chen wei zheng hong fei, et al. Personalized commodity recommendation based on network representation learning [J]. Computer Science, 2019(8): 7). The recommendation algorithms inside current recommendation systems are not sufficiently accurate or real-time, so a feasible recommendation algorithm is still needed to improve and upgrade existing systems.
Disclosure of Invention
The invention provides a recommendation method based on a comment space and user preferences, aiming at the problem that the accuracy and the real-time performance of an internal recommendation algorithm of the conventional recommendation system are not high enough.
In order to achieve the purpose, the invention adopts the following technical scheme:
a recommendation method based on comment space and user preferences includes:
step 1: constructing a commodity comment data corpus and a user comment data corpus;
step 2: respectively constructing a commodity comment space and a user preference space based on a commodity comment data corpus and a user comment data corpus;
step 3: obtaining commodity recommendations for the user according to the user's position in the user preference space and the constructed commodity comment space.
Further, the constructing a corpus of commodity review data includes:
collecting comment data of all users on the same commodity;
performing word segmentation operation on comment data of the same commodity by all users by adopting a Jieba word segmentation module, and removing stop words;
and classifying the comment data of all users on the same commodity by adopting a classification algorithm based on a dictionary according to four attributes of quality, price, appearance and express delivery, and finishing the construction of a commodity comment data corpus.
Further, the constructing a corpus of user comment data includes:
collecting comment data of the same user on all commodities;
performing word segmentation operation on comment data of all commodities of the same user by adopting a Jieba word segmentation module, and removing stop words;
and classifying the comment data of all commodities of the same user by adopting a classification algorithm based on a dictionary according to four attributes of quality, price, appearance and express delivery, and finishing the construction of a user comment data corpus.
Further, constructing a commodity review space based on the commodity review data corpus includes:
constructing a four-dimensional space coordinate according to four attributes of quality, price, appearance and express;
dividing the comment data in the commodity comment data corpus into a positive category and a negative category by using a KNN classification algorithm;
sorting the positive comment data and the negative comment data that describe the same attribute feature of the same commodity into one sequence each, in descending order of positive-emotion probability; determining the first comment marked as positive in the sequence of positive comment data as the positive reference comment on the attribute feature, namely unit 1 on the attribute feature; and determining the last comment marked as negative in the sequence of negative comment data as the negative reference comment on the attribute feature, namely unit -1 on the attribute feature;
calculating, with a cosine similarity algorithm, the cosine value between the comment data to be processed and the reference comment on the attribute feature: if the classification label of the comment data is positive, calculating the cosine value between the comment data and the positive reference comment on the attribute feature, the obtained cosine value being the coordinate value of the comment data on that attribute dimension; if the classification label of the comment data is negative, calculating the cosine value between the comment data and the negative reference comment on the attribute feature, the negative of the obtained cosine value being the coordinate value of the comment data on that attribute dimension;
within the same comment data, collecting the calculated coordinate values on the attribute dimensions into one coordinate point of the four-dimensional space, and setting the value to 0 on every dimension whose attribute feature the comment data does not mention, to obtain the coordinate values of the comment data in the four-dimensional space coordinate;
averaging, on each dimension, the coordinate values of all the comment data of the same commodity to obtain the coordinate value of each attribute of each commodity in the four-dimensional space coordinate;
and displaying coordinate values of each attribute of each commodity in the space coordinate, namely each commodity point in the four-dimensional space coordinate to complete the construction of the commodity comment space.
Further, after the completing the building of the commodity comment space, the method further comprises:
and clustering the generated commodity points by adopting a K-means clustering algorithm, putting commodities with similar attribute characteristics into a cluster, and storing a clustering center.
Further, constructing the user preference space based on the user comment data corpus includes:
constructing a four-dimensional space coordinate according to four attributes of quality, price, appearance and express;
quantifying each user's emotional tendency according to the user's degree of attention to each attribute feature, the degree of attention being the ratio of the number of the user's comment data on that attribute to the total number of the user's comment data, to obtain the emotional tendency space coordinate of each user;
and displaying all spatial coordinate points representing the user in the four-dimensional spatial coordinates to complete the construction of the user preference space.
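The degree of attention described above can be sketched as follows. This is only an illustrative sketch: the function name `attention_vector`, the label strings, and the choice to divide by the number of the user's comments (so that a comment mentioning several attributes counts once for each of them) are assumptions, not details given in the text. How positive or negative sentiment enters the final user coordinate is not covered here.

```python
from collections import Counter

# Attribute order used for every coordinate tuple in this sketch (assumed).
ATTRIBUTES = ("quality", "price", "appearance", "express")


def attention_vector(labelled_comments):
    """Return, per attribute, the fraction of a user's comments mentioning it.

    labelled_comments: list of sets of attribute labels, one set per comment,
    as produced by a dictionary-based classifier (hypothetical input format).
    """
    total = len(labelled_comments)
    if total == 0:
        return tuple(0.0 for _ in ATTRIBUTES)
    counts = Counter(label for labels in labelled_comments for label in labels)
    return tuple(counts[attr] / total for attr in ATTRIBUTES)


# Hypothetical user with four comments.
user_comments = [
    {"quality"},
    {"quality", "price"},
    {"express"},
    {"quality", "appearance"},
]
print(attention_vector(user_comments))  # (0.75, 0.25, 0.25, 0.25)
```

With this reading the four values need not sum to 1, because one comment can mention several attributes.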
Further, the step 3 comprises:
and taking the position of the user in the user preference space as a point of the user shopping preference, calculating the distance between the point of the user shopping preference and the center point of each cluster in the commodity comment space by adopting the Euclidean distance, and recommending the commodity in the cluster closest to the point of the user shopping preference to the user.
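A minimal sketch of this recommendation step, assuming the cluster centers and the commodities of each cluster are already available. The names `nearest_cluster`, `recommend`, and the sample data are hypothetical.

```python
import math


def nearest_cluster(user_point, cluster_centers):
    """Return the index of the cluster center closest to user_point (Euclidean)."""
    distances = [math.dist(user_point, center) for center in cluster_centers]
    return min(range(len(distances)), key=distances.__getitem__)


def recommend(user_point, cluster_centers, cluster_members):
    """Return the commodities stored in the cluster nearest to the user."""
    return cluster_members[nearest_cluster(user_point, cluster_centers)]


# Hypothetical four-dimensional data: (quality, price, appearance, express).
centers = [(0.8, 0.1, 0.2, 0.0), (-0.5, 0.6, 0.0, 0.3)]
members = [["item_a", "item_b"], ["item_c"]]
user = (0.75, 0.25, 0.25, 0.25)
print(recommend(user, centers, members))  # ['item_a', 'item_b']
```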
Compared with the prior art, the invention has the following beneficial effects:
The invention quantifies commodity attributes using commodity comment data, maps commodities into an attribute space, and recommends commodities in combination with the user's historical shopping preferences. Compared with existing recommendation systems, it avoids the low recommendation efficiency caused by sparse matrices, and as the amount of comment information keeps growing, its judgment of commodity attribute features becomes more accurate. Each user's comments are analyzed independently of other users to obtain that user's shopping preferences, so the accuracy of a user's recommendations is not affected by the analysis results of other users, and personalized recommendation is achieved. By mining the relation between users' shopping preferences and commodity attributes, the invention can more accurately recommend commodities that match a user's shopping preferences, and the experimental results show that good results can be achieved using historical comment data alone.
Drawings
FIG. 1 is a basic flowchart of a recommendation method based on comment space and user preferences according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of calculating cosine similarity of a recommendation method based on comment space and user preferences according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a commodity comment space mapping of a recommendation method based on comment space and user preferences according to an embodiment of the present invention;
FIG. 4 is a graph of change in SSE of the method for recommending based on comment space and user preferences according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a user preference space mapping of a recommendation method based on comment space and user preferences according to an embodiment of the present invention;
FIG. 6 is an exemplary diagram of a product recommendation method based on a comment space and a user preference according to an embodiment of the present invention;
FIG. 7 is an exemplary diagram of a product and review of a recommendation method based on review space and user preferences according to an embodiment of the present invention;
FIG. 8 is an exemplary diagram of a user and comments of a recommendation method based on comment space and user preferences according to an embodiment of the present invention;
FIG. 9 is a comparison diagram of the accuracy rates of different algorithms for a recommendation method based on comment space and user preferences according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
as shown in fig. 1, a recommendation method based on a comment space and a user preference includes:
step S101: constructing a commodity comment data corpus and a user comment data corpus;
step S102: respectively constructing a commodity comment space and a user preference space based on a commodity comment data corpus and a user comment data corpus;
step S103: obtaining commodity recommendations for the user according to the user's position in the user preference space and the constructed commodity comment space.
Specifically, the constructing a corpus of commodity review data includes:
collecting comment data of all users on the same commodity; as an implementable manner, the comment data employed in the present embodiment is mainly divided into the following two parts:
① Public data: comment data published on the network, in which each comment can be associated with the commodity it concerns and also with the user who wrote it;
② Semi-public data: data provided by the official organizing committees of certain competitions; the other part of the experimental data was provided by the official organizing committee of a big data and computer intelligence competition.
Word segmentation is performed on all users' comment data for the same commodity with the Jieba word segmentation module, and stop words are removed. Jieba has three segmentation modes: full mode, precise mode, and search engine mode. As one implementable manner, this embodiment segments the comment data in Jieba's full mode. In Chinese text processing, some words are removed after segmentation to save storage space and improve efficiency; these words are called stop words.
The comment data of all users on the same commodity is then classified by a dictionary-based classification algorithm according to the four attributes of quality, price, appearance and express delivery, which completes the commodity comment data corpus. A comment dictionary is the set of words, mainly adjectives, nouns and adverbs, that users employ when commenting on a particular attribute feature of a commodity. The key to dictionary-based classification of comment data is the construction of the dictionaries; the four dictionaries used in this embodiment were built from existing dictionaries and supplemented manually. Before classification, analysis of the comment data showed that quality, price, appearance and express delivery are the four attribute features people generally care about, so this embodiment uses them as the four aspects for measuring commodity features. Part of the contents of the four dictionaries is shown in Table 1.
TABLE 1 Partial contents of the four dictionaries
Quality | Price | Appearance | Express delivery
Is firm and durable | Price reduction | Good looking | Delivery of goods
Adulteration | Is cheap | Beautiful and beautiful | Delivery of goods
High quality | Economical | Color value | Speed of rotation
Poor quality | Offers | Atmosphere (es) | Delivery system
Mountain village | Cost-effective | Honour and honour | Logistics
Defective products | Selling price | Junqiao | Fast-acting toy
Foot goods | Height of | Beauty product | Good service
The processed comment data is traversed and matched against the dictionaries to identify which categories each text describes. One line of data may receive several category labels. For example, the comment 'the book is great, the color and quality are good, and delivery is super fast' describes the quality attribute of the commodity as well as its appearance and express delivery attributes.
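The dictionary matching described above can be sketched as follows. The sketch assumes the comment has already been segmented into tokens (the embodiment uses Jieba for that step; no segmentation library is called here), and the tiny English dictionaries, the stop-word list, and the name `classify_comment` are hypothetical stand-ins for the manually built Chinese dictionaries.

```python
# Hypothetical miniature dictionaries; the real ones are built manually.
ATTRIBUTE_DICTIONARIES = {
    "quality": {"quality", "durable", "defective"},
    "price": {"price", "cheap", "expensive"},
    "appearance": {"color", "beautiful", "good-looking"},
    "express": {"delivery", "logistics", "shipping"},
}
STOP_WORDS = {"the", "is", "and", "a", "of"}


def classify_comment(tokens):
    """Return the set of attribute labels whose dictionary matches any token."""
    kept = [token for token in tokens if token not in STOP_WORDS]
    return {
        attribute
        for attribute, words in ATTRIBUTE_DICTIONARIES.items()
        if words.intersection(kept)
    }


# One comment can receive several labels, as in the example in the text.
tokens = ["the", "color", "and", "quality", "is", "good", "delivery", "fast"]
print(sorted(classify_comment(tokens)))  # ['appearance', 'express', 'quality']
```

Returning a set of labels rather than a single label keeps the multi-label behaviour the text describes.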
Specifically, the constructing a corpus of user comment data includes:
collecting comment data of the same user on all commodities; specifically, the adopted comment data is consistent with the original data for constructing the commodity comment data corpus;
performing word segmentation operation on comment data of all commodities of the same user by adopting a Jieba word segmentation module, and removing stop words;
and classifying the comment data of all commodities of the same user by adopting a classification algorithm based on a dictionary according to four attributes of quality, price, appearance and express delivery, and finishing the construction of a user comment data corpus.
Specifically, constructing a commodity review space based on a commodity review data corpus includes:
constructing a four-dimensional space coordinate according to the four attributes of quality, price, appearance and express delivery. Specifically, when classifying the comment data the invention takes the four commodity attributes users generally care about as the classification targets, and the multidimensional coordinate system is built with these same four attribute features as its coordinate axes. Each axis represents one attribute feature of the commodity: the positive direction means the comment holds a positive attitude toward that attribute feature, the negative direction means a negative attitude, and 0 means the comment is neutral toward the attribute feature or does not mention it.
Dividing the comment data in the commodity comment data corpus into a positive category and a negative category by using a KNN classification algorithm;
all comment data describing the same attribute feature of the same commodity are arranged into one sequence; the first comment data marked as positive in the sequence is determined as the positive reference comment on the attribute feature, namely unit 1 on the attribute feature, and the last comment data marked as negative is determined as the negative reference comment on the attribute feature, namely unit -1 on the attribute feature;
calculating, with a cosine similarity algorithm, the cosine value between the comment data to be processed and the reference comment on the attribute feature: if the classification label of the comment data is positive, calculating the cosine value between the comment data and the positive reference comment on the attribute feature, the obtained cosine value being the coordinate value of the comment data on that attribute dimension; if the classification label of the comment data is negative, calculating the cosine value between the comment data and the negative reference comment on the attribute feature, the negative of the obtained cosine value being the coordinate value of the comment data on that attribute dimension;
within the same comment data, collecting the calculated coordinate values on the attribute dimensions into one coordinate point of the four-dimensional space, and setting the value to 0 on every dimension whose attribute feature the comment data does not mention, to obtain the coordinate values of the comment data in the four-dimensional space coordinate;
averaging, on each dimension, the coordinate values of all the comment data of the same commodity to obtain the coordinate value of each attribute of each commodity in the four-dimensional space coordinate;
and displaying coordinate values of each attribute of each commodity in the space coordinate, namely each commodity point in the four-dimensional space coordinate to complete the construction of the commodity comment space.
Specifically, quantifying the emotional tendency of a commodity comment means evaluating each attribute feature that the comment describes and arriving at a value that expresses the comment's emotional tendency toward that attribute feature. The space coordinates generated by the invention are obtained through sentiment analysis of each comment;
for example, for the statement 'the quality of the commodity is good', which describes the quality attribute feature, the positive-emotion probability is 0.749396374210698. For one commodity, the statements describing the same attribute feature are sorted by this probability in descending order to obtain a sequence S;
however, the SnowNLP library only yields the probability that a comment is positive, which is not sufficient for constructing the comment space, so the invention additionally provides the following solution:
The KNN classification algorithm is theoretically mature and one of the simplest machine learning algorithms. Its core idea is to find the K objects nearest to the object to be classified and assign that object to the class to which most of those K objects belong. Because the result depends only on the K nearest samples, KNN places high demands on the training set: if the error rate of the training samples is high, the accuracy of the final classification is low as well. This embodiment sets K to 9; because K is odd, the two classes can never receive the same number of votes. The distance between each commodity comment and every classified comment in the training set is calculated to obtain the K nearest comments, the number of those K comments in each category is counted, and the comment to be classified is assigned to the category with the larger count;
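The majority vote among the K nearest neighbours can be sketched as below. The text does not say how comments are turned into vectors or which distance is used, so the two-dimensional feature vectors and the Euclidean distance here are assumptions made purely for illustration, as is the name `knn_classify`.

```python
import math
from collections import Counter


def knn_classify(sample, training_set, k=9):
    """Label sample by majority vote among its k nearest training vectors.

    training_set: list of (vector, label) pairs with already known labels.
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled vectors: five positive and five negative comments.
training = [
    ((0.9, 0.8), "positive"), ((0.8, 0.9), "positive"), ((1.0, 1.0), "positive"),
    ((0.7, 0.9), "positive"), ((0.9, 0.7), "positive"),
    ((-0.9, -0.8), "negative"), ((-0.8, -0.9), "negative"), ((-1.0, -1.0), "negative"),
    ((-0.7, -0.9), "negative"), ((-0.9, -0.7), "negative"),
]
print(knn_classify((0.8, 0.8), training))    # positive
print(knn_classify((-0.8, -0.8), training))  # negative
```

With K = 9 and two classes the vote counts always differ, which is the tie-avoidance property the embodiment relies on.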
To find the two comments that correspond to 1 and -1 on a coordinate axis, SnowNLP and the KNN classification algorithm are applied to competition data that carries emotion polarity labels. SnowNLP can only give the probability that a comment is positive, and the comment with the highest probability in the sequence S is taken as the comment corresponding to 1. The comment corresponding to -1 cannot be read off the same sequence: owing to the limitation of the algorithm, the probabilities differ clearly at the head of the sequence but only slightly at its tail. Therefore the competition data with emotion polarity labels is first classified with the KNN classification method, and the set of negative comments is then processed separately with SnowNLP. In the resulting sequence the probability values differ clearly, and the comment with the minimum value is taken as the comment corresponding to -1.
After classification, the sequences S corresponding to the positive and the negative comment sentences of each attribute of each commodity are obtained separately with the SnowNLP library. Among all sentences describing the same attribute feature of the same commodity, the first sentence marked as positive in the sequence S of positive comment sentences is determined as the positive reference sentence on that attribute feature, namely unit 1 on the attribute feature, and the last sentence marked as negative in the sequence S of negative comment sentences is determined as the negative reference sentence on that attribute feature, namely unit -1 on the attribute feature;
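The selection of the two reference sentences can be sketched as follows. The positive-emotion probabilities would come from a sentiment scorer such as SnowNLP in the embodiment; here they are hard-coded hypothetical values (apart from the 0.749396374210698 quoted in the text), and the function name `pick_reference_comments` is an assumption.

```python
def pick_reference_comments(positive_scored, negative_scored):
    """Pick the unit 1 and unit -1 reference comments for one attribute.

    Each argument is a non-empty list of (comment, positive_probability)
    pairs for comments already labelled positive or negative respectively.
    """
    positive_sequence = sorted(positive_scored, key=lambda item: item[1], reverse=True)
    negative_sequence = sorted(negative_scored, key=lambda item: item[1], reverse=True)
    unit_plus_one = positive_sequence[0][0]    # first of the positive sequence
    unit_minus_one = negative_sequence[-1][0]  # last of the negative sequence
    return unit_plus_one, unit_minus_one


# Hypothetical scores for the quality attribute of one commodity.
positive = [
    ("the quality of the commodity is good", 0.749396374210698),
    ("excellent quality, very durable", 0.95),
]
negative = [
    ("quality is a bit disappointing", 0.30),
    ("terrible quality, broke in a day", 0.02),
]
print(pick_reference_comments(positive, negative))
```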
Cosine similarity measures the similarity between two vectors by the cosine of the angle between them in space. The cosine value ranges from -1 to 1. The closer the cosine value between two vectors is to 1, the closer the angle between them is to 0 degrees and the more similar the two vectors are; the closer the cosine value is to -1, the closer the angle is to 180 degrees and the greater the difference between the two vectors. For example, in two-dimensional space let the coordinates of vector a be $(x_1, y_1)$ and those of vector b be $(x_2, y_2)$, and place the two vectors in the same two-dimensional space, as shown in fig. 2;
the cosine value of the angle $\theta$ between vector a and vector b is calculated as shown in (1):
$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \tag{1}$$
This formula also holds when a and b are two vectors in an n-dimensional space. Let $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$; the cosine value of vector a and vector b is then calculated as shown in (2):
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \tag{2}$$
Therefore, by regarding two comments as two n-dimensional vectors, the similarity between them can be calculated with cosine similarity.
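The n-dimensional cosine calculation can be sketched as below. The text does not state how a comment is converted into a vector, so the bag-of-words term counts produced by `bag_of_words_vectors` are an assumption for illustration; returning 0.0 for a zero vector is likewise a choice made here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def bag_of_words_vectors(tokens_a, tokens_b):
    """Turn two token lists into term-count vectors over a shared vocabulary."""
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    return ([counts_a[w] for w in vocabulary], [counts_b[w] for w in vocabulary])


vec_a, vec_b = bag_of_words_vectors(["quality", "good"], ["quality", "very", "good"])
print(cosine_similarity(vec_a, vec_b))
```

Because term counts are never negative, the cosine of two such vectors lies between 0 and 1; the sign of a coordinate therefore has to come from the positive or negative label of the comment, as the following paragraphs describe.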
The invention uses the cosine similarity algorithm to calculate the cosine value between the comment statement to be processed and the reference comment statement on the attribute feature. If the classification label of the comment statement is positive, the cosine value is calculated between the comment statement and the positive reference comment on that attribute feature, and the obtained value is the coordinate value of the comment statement on that attribute dimension; if the classification label of the comment statement is negative, the cosine value is calculated between the comment statement and the negative reference comment on that attribute feature, and the negative of the obtained value is the coordinate value of the comment statement on that attribute dimension;
in the same comment, the calculated coordinate values on each attribute dimension are collected into a coordinate point of a four-dimensional space, if the comment does not refer to a certain attribute feature, the value of the comment on the attribute dimension is 0, and the coordinate value of the comment in the space coordinate is obtained;
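Assembling the four-dimensional coordinate of one comment can be sketched as follows, assuming the label and the cosine value to the matching reference comment have already been computed for each attribute the comment mentions. The input format and the name `comment_coordinates` are hypothetical.

```python
# Attribute order of the four coordinate axes (assumed).
ATTRIBUTES = ("quality", "price", "appearance", "express")


def comment_coordinates(attribute_results):
    """Build the 4-D point of one comment.

    attribute_results maps an attribute name to (label, cosine), where label
    is "positive" or "negative" and cosine is the similarity to the reference
    comment of that polarity. Attributes absent from the mapping get 0.
    """
    coordinates = []
    for attribute in ATTRIBUTES:
        if attribute not in attribute_results:
            coordinates.append(0.0)  # attribute not mentioned in the comment
            continue
        label, cosine = attribute_results[attribute]
        coordinates.append(cosine if label == "positive" else -cosine)
    return tuple(coordinates)


# A comment that criticises quality and price only.
results = {"quality": ("negative", 0.91), "price": ("negative", 0.85)}
print(comment_coordinates(results))  # (-0.91, -0.85, 0.0, 0.0)
```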
The invention obtains the coordinates of a commodity in the comment space coordinate system by averaging. Once each comment has been expressed as coordinates, the specific value of its emotional tendency on each attribute feature is known. For example, the comment 'the quality of this toy is too poor and it is expensive as well, bad review' becomes (-0.91, -0.85, 0, 0) after coordinate representation: its value is -0.91 on the quality dimension, -0.85 on the price dimension, and 0 on the appearance and express delivery dimensions;
each commodity comprises a plurality of comments, and each comment corresponds to one coordinate value in the space coordinate system. The invention adds the corresponding attributes in the coordinate values of all the comments of each commodity together to take the mean value, as shown in formula (3):
[Formula (3): equation image not reproduced in this text]
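Formula (3) is an image that is not reproduced in this text. The averaging it describes can be sketched as below; the function name `commodity_point` and the tuple representation of a comment coordinate are assumptions made for illustration only.

```python
def commodity_point(comment_points):
    """Average the comment coordinates of one commodity, dimension by dimension.

    comment_points: list of equal-length tuples, one per comment.
    """
    if not comment_points:
        raise ValueError("a commodity needs at least one comment")
    m = len(comment_points)
    dims = len(comment_points[0])
    return tuple(sum(p[d] for p in comment_points) / m for d in range(dims))
```

Because unmentioned attributes contribute 0 to the sum, a commodity whose comments rarely mention an attribute ends up close to 0 on that dimension.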
and displaying all the commodities in the four-dimensional space, wherein the first three dimensions, namely quality, price, and appearance, are shown on coordinate axes, and the fourth dimension, the express attribute, is expressed by the shade of the color, as shown in fig. 3.
Specifically, after the construction of the commodity comment space is completed, the method further comprises:
clustering the generated commodity points by using a K-means clustering algorithm, placing commodities with similar attribute features into the same cluster, and storing the cluster centers; grouping commodities with similar attribute features into one cluster facilitates the recommendation operation for users. K-means clustering requires the K value to be determined first, and K cannot be determined merely by inspecting the generated commodity points, so the elbow method is used to choose it, as shown in fig. 4;
the core index of the elbow method is the SSE, namely the sum of squared errors, and the core idea is as follows:
a. as the K value increases, the samples are divided more finely, the degree of aggregation among the data objects in each cluster becomes higher, and the SSE value decreases;
b. while the K value is smaller than the optimal number of clusters, each increase in K greatly increases the degree of aggregation within each cluster, so the SSE value drops sharply. Once the K value reaches the optimal number of clusters, further increases in K improve the degree of aggregation only slightly, so the decline of the SSE value slows and the curve flattens; the K value at this turning point of the SSE curve is the optimal number of clusters;
according to the elbow method, the SSE value changes relatively slowly once the K value reaches 7, so the optimal number of clusters for the data set used in this embodiment is 7;
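The elbow procedure described above can be illustrated with a small pure-Python sketch. It is a simplified stand-in, not the embodiment's code: the K-means routine is plain Lloyd's algorithm, seeding the centers with the first K points is an assumption chosen to keep the example deterministic, and the names `k_means`, `sse`, and `elbow_curve` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centers (an assumption)."""
    centers = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

def sse(points, centers, assignment):
    """Sum of squared errors: squared distance of each point to its cluster center."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))

def elbow_curve(points, max_k):
    """SSE for K = 1 .. max_k; the elbow is where the decrease levels off."""
    curve = {}
    for k in range(1, max_k + 1):
        centers, assignment = k_means(points, k)
        curve[k] = sse(points, centers, assignment)
    return curve
```

For data with three well-separated groups, the SSE falls steeply up to K = 3 and only marginally afterwards, which is the flattening the elbow method looks for. The stored `centers` are what the recommendation step later compares users against.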
after the K-means algorithm produces the clusters, the coordinates of the center point of each cluster are stored. The coordinates of the other points in a cluster can be set aside for the time being, but the cluster to which each point belongs and the commodity name corresponding to each coordinate point must be recorded.
Specifically, constructing the user preference space based on the user comment data corpus includes:
constructing a four-dimensional space coordinate according to four attributes of quality, price, appearance and express; to stay consistent with the commodity attribute features and improve the recommendation accuracy, the preference space coordinate system of the user must be the same as the comment space coordinate system of the commodity. A four-dimensional space coordinate system is established, and its four axes respectively represent the degree of attention the user pays to quality, price, appearance, and express delivery. Using the same coordinate system allows the shopping preference of the user to be combined with the attribute features of the commodities, so that a higher accuracy can be achieved during recommendation;
quantifying emotional tendency of the users according to the attention degree of the users to the attribute characteristics, wherein the attention degree is the proportion of the number of comment data of each attribute of the same user to the total number of the comment data, and the emotional tendency space coordinate of each user is obtained; among all the comments of a single user, some relate to the quality attribute and some relate to the price attribute, and the degree of attention the user pays to each attribute needs to be determined. For example, to obtain the attention degree of a certain user to the quality attribute, all comment sentences of that user that describe quality are counted, and their proportion of the total number of the user's comment sentences is calculated; this proportion represents the attention degree of the user to the quality attribute. The calculation formula is shown as (4):
$$\left(\frac{n_{\text{quality}}}{n},\ \frac{n_{\text{price}}}{n},\ \frac{n_{\text{appearance}}}{n},\ \frac{n_{\text{express}}}{n}\right) \qquad (4)$$
wherein $(n_{\text{quality}}, n_{\text{price}}, n_{\text{appearance}}, n_{\text{express}})$ respectively represent the number of comment sentences describing the quality, price, appearance, and express attributes, and $n$ represents the total number of comment sentences of the user;
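The attention-degree calculation can be sketched as follows. The sketch assumes that every comment sentence of a user has already been assigned exactly one attribute label by the dictionary-based classifier; the function name `user_preference_point` and the choice to return an all-zero point for a user without comments are assumptions.

```python
from collections import Counter

ATTRIBUTES = ("quality", "price", "appearance", "express")

def user_preference_point(statement_attributes):
    """Share of a user's comment sentences that describe each attribute.

    statement_attributes: one attribute label per comment sentence of the user.
    """
    n = len(statement_attributes)
    if n == 0:
        # Assumption: a user with no comments is placed at the origin.
        return (0.0,) * len(ATTRIBUTES)
    counts = Counter(statement_attributes)
    return tuple(counts[attr] / n for attr in ATTRIBUTES)
```

For instance, a user with two quality sentences, one price sentence, and one express sentence out of four is mapped to (0.5, 0.25, 0.0, 0.25).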
displaying all spatial coordinate points representing the users in the four-dimensional space coordinate to complete the construction of the user preference space; quantifying the emotional tendency of a user yields the emotional tendency space coordinate of that user, and this coordinate is the position of the user in the preference space. All the spatial coordinate points representing users are displayed in the user preference space, wherein the first three dimensions respectively represent the emotional tendency of the user toward the quality, price, and appearance attribute features of the commodity, and the fourth dimension represents the emotional tendency of the user toward the express attribute feature, with its value expressed by the shade of the color, as shown in fig. 5.
Specifically, the step S103 includes:
and taking the position of the user in the user preference space as the shopping preference point of the user, calculating the Euclidean distance between that point and the center point of each cluster in the commodity comment space, and recommending the commodities in the nearest cluster to the user. During recommendation, the commodities in the cluster closest to the user are recommended according to the position of the user in the preference space. When the commodities were mapped into the multi-dimensional space coordinate system, they were already clustered, and the center point coordinates of each cluster were calculated and retained; when the users are mapped into the same coordinate system, a coordinate point representing the shopping preference of each user is obtained. The invention uses the Euclidean distance to calculate the distance between the shopping preference point of the user and the center points of these clusters. The Euclidean distance, also called the Euclidean metric, is a widely used distance definition that refers to the straight-line distance between two points in an n-dimensional space; the calculation formula is shown in (5).
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (5)$$
where $x$ and $y$ are the two points and $n$ is the number of dimensions of the space. The smaller the distance, the better the commodities in the cluster match the shopping preference of the user; the greater the distance, the less the commodities in the cluster match the shopping preference of the user;
the commodities in the cluster with the smallest distance are recommended to the user, and the recommendation results are shown in fig. 6, 7, and 8. Fig. 6 shows part of the commodities recommended to the user with ID 15905, and fig. 7 and 8 show the comments on those commodities, including the comments given by the user with ID 15905. Because the commodity data and the user data are processed separately, commodities and users are independent of each other during processing; the fact that commodities the user previously rated favorably appear again in the recommendation result indicates that the result is highly credible.
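The nearest-cluster recommendation step can be sketched as below. The dictionaries `cluster_centers` and `cluster_items`, the function names, and the sample coordinates are hypothetical; the sketch only assumes that cluster centers and the commodity names per cluster were stored after clustering, as the description requires.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(user_point, cluster_centers, cluster_items):
    """Return the id and the commodities of the cluster nearest to the user.

    cluster_centers: {cluster_id: center coordinates}
    cluster_items:   {cluster_id: list of commodity names in that cluster}
    """
    nearest = min(
        cluster_centers,
        key=lambda cid: euclidean(user_point, cluster_centers[cid]),
    )
    return nearest, cluster_items[nearest]

# Example with made-up coordinates: the user cares mostly about price.
centers = {0: (0.8, 0.1, 0.0, 0.1), 1: (0.1, 0.7, 0.1, 0.1)}
items = {0: ["item_a"], 1: ["item_b", "item_c"]}
cluster_id, recommended = recommend((0.2, 0.6, 0.1, 0.1), centers, items)
```

Only the stored cluster centers are compared with the user, so the cost of one recommendation grows with the number of clusters rather than with the number of commodities.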
To verify the effect of the invention, the recommendation result is measured by the accuracy p, where p represents the proportion of commodities actually liked by the user among the commodities recommended to the user by the recommendation system; the larger the value, the better the recommendation algorithm performs. The calculation formula of p is shown in (6):
[Formula (6): equation image not reproduced in this text]
where r(u) represents the list of commodities recommended by the recommendation system, t(u) represents the behavior list of the user, i.e., the commodities liked by the user, and the absolute value bars denote the length of a list.
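Formula (6) is an image that is not reproduced in this text. Based on the definitions of r(u) and t(u), the accuracy can be sketched as the share of recommended commodities that the user actually liked. Summing hits and list lengths over several users is an assumption of this sketch, as are the function name `precision` and the dictionary layout.

```python
def precision(recommended, liked):
    """Proportion of recommended commodities that the users actually liked.

    recommended: {user_id: list of recommended commodity ids}, i.e. r(u)
    liked:       {user_id: list of commodity ids the user liked}, i.e. t(u)
    Hits and list lengths are summed over all users (an assumption).
    """
    hits = sum(
        len(set(recommended[u]) & set(liked.get(u, ())))
        for u in recommended
    )
    total = sum(len(set(recommended[u])) for u in recommended)
    return hits / total if total else 0.0
```

With a single user in both dictionaries, this reduces to the size of the overlap between r(u) and t(u) divided by the length of r(u).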
The recommendation accuracy p of the method of the invention is compared with that of SVD, a recommendation algorithm based on association rules, a recommendation algorithm based on commodity attribute values, and a traditional algorithm (a collaborative filtering algorithm based on user attributes and commodity classification). Specifically, the recommendation list length is chosen to be 100, i.e., |r(u)| = 100. The number of users receiving recommendations is 5, 10, 15, and 20, respectively, and the calculation results are shown in fig. 9. As can be seen from fig. 9, the accuracy of the recommendation method of the invention is the highest, although the accuracy varies as the number of users increases. The reason is that the number of liked commodities differs from user to user, so the numerator grows by a different amount each time users are added, and the final calculated accuracy varies accordingly.
In conclusion, the recommendation system of an online shopping platform recommends the commodities that best suit a user by discovering relations between the user and the commodities. By mining the relation between the shopping preference of the user and the commodity attributes, the invention can more accurately recommend commodities that match the shopping preference of the user. The invention uses commodity comment data to quantify commodity attributes, maps the commodities into the attribute space, and performs commodity recommendation in combination with the historical shopping preference of the user. Compared with existing recommendation systems, the method avoids the low recommendation efficiency caused by sparse matrices, and as the amount of comment information grows, the judgment of the attribute features of the commodities becomes more accurate. The comments of each user are analyzed independently of other users to obtain the shopping preference of that user, so the accuracy of the recommendation for a user is not affected by the analysis results of other users, and personalized recommendation is realized. The experimental results show that good results can be achieved using historical comment data alone.
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A recommendation method based on comment space and user preferences is characterized by comprising the following steps:
step 1: constructing a commodity comment data corpus and a user comment data corpus;
step 2: respectively constructing a commodity comment space and a user preference space based on a commodity comment data corpus and a user comment data corpus;
step 3: obtaining commodity recommendation for the user according to the position of the user in the user preference space and the constructed commodity comment space.
2. The recommendation method based on the comment space and the user preference as claimed in claim 1, wherein the constructing a commodity comment data corpus comprises:
collecting comment data of all users on the same commodity;
performing word segmentation operation on comment data of the same commodity by all users by adopting a Jieba word segmentation module, and removing stop words;
and classifying the comment data of all users on the same commodity by adopting a classification algorithm based on a dictionary according to four attributes of quality, price, appearance and express delivery, and finishing the construction of a commodity comment data corpus.
3. The recommendation method based on comment space and user preference according to claim 1, wherein said constructing a corpus of user comment data comprises:
collecting comment data of the same user on all commodities;
performing word segmentation operation on comment data of all commodities of the same user by adopting a Jieba word segmentation module, and removing stop words;
and classifying the comment data of all commodities of the same user by adopting a classification algorithm based on a dictionary according to four attributes of quality, price, appearance and express delivery, and finishing the construction of a user comment data corpus.
4. The recommendation method based on the comment space and the user preference as claimed in claim 1, wherein the constructing the commodity comment space based on the commodity comment data corpus comprises:
constructing a four-dimensional space coordinate according to four attributes of quality, price, appearance and express;
dividing the comment data in the commodity comment data corpus into a positive category and a negative category by using a KNN classification algorithm;
arranging the positive comment data and the negative comment data that describe the same attribute feature of the same commodity into one sequence each, in descending order of positive emotion probability; determining the first comment data marked as positive in the sequence of positive comment data as the positive-bias reference comment on the attribute feature, namely the unit 1 on the attribute feature; and determining a sentence marked as negative in the sequence of negative comment data as the negative-bias reference comment on the attribute feature, namely the unit -1 on the attribute feature;
calculating, by using a cosine similarity algorithm, a cosine value between the comment data to be processed and the reference comment on the attribute feature: if the classification label of the comment data is positive, calculating the cosine value between the comment data and the positive-bias reference comment on the attribute feature, the obtained cosine value being the coordinate value of the comment data on the attribute dimension; if the classification label of the comment data is negative, calculating the cosine value between the comment data and the negative-bias reference comment on the attribute feature, the negation of the obtained cosine value being the coordinate value of the comment data on the attribute dimension;
in the same comment data, summarizing the coordinate values on the attribute dimensions obtained through calculation into a coordinate point of a four-dimensional space, and for the attribute features which are not mentioned in the comment data, setting the value on the dimension corresponding to the attribute features which are not mentioned as 0, and obtaining the coordinate values of the comment data in the four-dimensional space coordinate;
averaging, dimension by dimension, the coordinate values of all the comment data of the same commodity to obtain the coordinate value of each attribute of each commodity in the four-dimensional space coordinate;
and displaying coordinate values of each attribute of each commodity in the space coordinate, namely each commodity point in the four-dimensional space coordinate to complete the construction of the commodity comment space.
5. The recommendation method based on comment space and user preference according to claim 4, further comprising, after said completing the building of product comment space:
and clustering the generated commodity points by adopting a K-means clustering algorithm, putting commodities with similar attribute characteristics into a cluster, and storing a clustering center.
6. The recommendation method based on the comment space and the user preference according to claim 1, wherein the constructing the user preference space based on the user comment data corpus comprises:
constructing a four-dimensional space coordinate according to four attributes of quality, price, appearance and express;
quantifying emotional tendency of the users according to the attention degree of the users to the attribute characteristics, wherein the attention degree is the proportion of the number of comment data of each attribute of the same user to the total number of the comment data, and the emotional tendency space coordinate of each user is obtained;
and displaying all spatial coordinate points representing the user in the four-dimensional spatial coordinates to complete the construction of the user preference space.
7. The recommendation method based on comment space and user preference according to claim 5, wherein the step 3 comprises:
and taking the position of the user in the user preference space as a point of the user shopping preference, calculating the distance between the point of the user shopping preference and the center point of each cluster in the commodity comment space by adopting the Euclidean distance, and recommending the commodity in the cluster closest to the point of the user shopping preference to the user.
CN202010118462.9A 2020-02-26 2020-02-26 Recommendation method based on comment space and user preference Active CN111401936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010118462.9A CN111401936B (en) 2020-02-26 2020-02-26 Recommendation method based on comment space and user preference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010118462.9A CN111401936B (en) 2020-02-26 2020-02-26 Recommendation method based on comment space and user preference

Publications (2)

Publication Number Publication Date
CN111401936A true CN111401936A (en) 2020-07-10
CN111401936B CN111401936B (en) 2023-05-26

Family

ID=71430459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010118462.9A Active CN111401936B (en) 2020-02-26 2020-02-26 Recommendation method based on comment space and user preference

Country Status (1)

Country Link
CN (1) CN111401936B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110029926A1 (en) * 2009-07-30 2011-02-03 Hao Ming C Generating a visualization of reviews according to distance associations between attributes and opinion words in the reviews
CN107038609A (en) * 2017-04-24 2017-08-11 广州华企联信息科技有限公司 A kind of Method of Commodity Recommendation and system based on deep learning
CN110458627A (en) * 2019-08-19 2019-11-15 华南师范大学 A kind of commodity sequence personalized recommendation method of user oriented preference of dynamic

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FA SHAO et al.: "A Standard Bibliography Recommended Method Based on Topic Model and Fusion of Multi-feature", 《2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP》 *
李苑等: "基于注意力机制的评论情感分析及情感词检测", 《计算机科学》 *
龙承宇: "基于服务体验度感知的用户偏好分析方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052392A (en) * 2020-09-10 2020-12-08 江苏电力信息技术有限公司 Online service recommendation method based on LFM model
CN113239265A (en) * 2021-04-07 2021-08-10 中国人民解放军战略支援部队信息工程大学 Personalized recommendation method and system based on connection matrix
CN114529340A (en) * 2022-02-18 2022-05-24 浪潮卓数大数据产业发展有限公司 Shop recommendation method and device and computer medium
CN116629983A (en) * 2023-07-24 2023-08-22 成都晓多科技有限公司 Cross-domain commodity recommendation method and system based on user preference
CN116629983B (en) * 2023-07-24 2023-09-22 成都晓多科技有限公司 Cross-domain commodity recommendation method and system based on user preference
CN117851688A (en) * 2024-03-06 2024-04-09 成都理工大学 Personalized recommendation method based on deep learning and user comment content
CN117851688B (en) * 2024-03-06 2024-05-03 成都理工大学 Personalized recommendation method based on deep learning and user comment content
CN117853152A (en) * 2024-03-07 2024-04-09 云南疆恒科技有限公司 Business marketing data processing system based on multiple channels
CN117853152B (en) * 2024-03-07 2024-05-17 云南疆恒科技有限公司 Business marketing data processing system based on multiple channels
CN118132856A (en) * 2024-05-07 2024-06-04 南京梓恒数字科技有限公司 Intelligent analysis method and system based on big data
CN118132856B (en) * 2024-05-07 2024-07-02 南京梓恒数字科技有限公司 Intelligent analysis method and system based on big data

Also Published As

Publication number Publication date
CN111401936B (en) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant