CN111401936A

CN111401936A - A Recommendation Method Based on Comment Space and User Preferences

Info

Publication number: CN111401936A
Application number: CN202010118462.9A
Authority: CN
Inventors: 余文涛; 葛蕾; 余文彬; 黄晓辉; 唐慧丰; 胡瑞娟; 李勇; 李珠峰; 席耀一; 王博
Original assignee: PLA Information Engineering University
Current assignee: Information Engineering University Of Chinese People's Liberation Army Cyberspace Force
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2020-07-10
Anticipated expiration: 2040-02-26
Also published as: CN111401936B

Abstract

The invention belongs to the technical field of commodity recommendation and discloses a recommendation method based on comment space and user preference, comprising: step 1, constructing a product review data corpus and a user review data corpus; step 2, constructing a product review space and a user preference space based on the product review data corpus and the user review data corpus, respectively; step 3, obtaining a product recommendation for the user according to the user's position in the user preference space and the constructed product review space. The invention uses product review data to quantify product attributes, maps products into an attribute space, and combines this with the user's historical shopping preferences to make recommendations. Compared with existing recommendation systems, it avoids the low recommendation efficiency caused by sparse matrices. As the amount of review information grows, the invention can, by mining the relationship between the user's shopping preferences and product attributes, more accurately recommend products that match those preferences.

Description

A Recommendation Method Based on Comment Space and User Preferences

Technical Field

The invention belongs to the technical field of commodity recommendation, and in particular relates to a recommendation method based on comment space and user preference.

Background

While the Internet brings convenience, it also imposes complex and burdensome filtering tasks, so that people must spend a great deal of time picking out the information they actually need. Online shopping likewise confronts people with a sea of information: faced with product listings of great variety and number, users often do not know where to begin and frequently need several times longer than before to choose products that they like and that suit them. Information overload also poses a major challenge to online merchants: the information they publish is quickly buried under other information, so they must continually update their product listings in order to sell. To address this problem, online shopping platforms have introduced a variety of recommendation algorithms (Li Lin, Liu Jinxing, Meng Xiangfu, et al. Product recommendation model integrating rating matrix and review text [J]. Chinese Journal of Computers, 2018, 41(7): 1559-1573.), whose purpose is to recommend suitable products to users, reduce shopping time, and improve decision-making efficiency.

As the times develop, the challenges facing the e-commerce industry will increase considerably, and the quality of an e-commerce website's recommendation system will become a means by which merchants compete (Li Yuqi, Chen Weizheng, Yan Hongfei, et al. Personalized product recommendation based on network representation learning [J]. Chinese Journal of Computers, 2019(8): 7.). In current recommendation systems, the accuracy and real-time performance of the internal recommendation algorithms are not high enough, so a practical recommendation algorithm is still needed to improve existing systems.

Summary of the Invention

To address the insufficient accuracy and real-time performance of the recommendation algorithms in existing recommendation systems, the present invention proposes a recommendation method based on comment space and user preference.

To achieve the above object, the present invention adopts the following technical solution:

A recommendation method based on comment space and user preference, comprising:

Step 1: constructing a product review data corpus and a user review data corpus;

Step 2: constructing a product review space and a user preference space based on the product review data corpus and the user review data corpus, respectively;

Step 3: obtaining a product recommendation for the user according to the user's position in the user preference space and the constructed product review space.

Further, constructing the product review data corpus comprises:

collecting the review data of all users on the same product;

segmenting the review data of all users on the same product into words with the Jieba word segmentation module, and removing stop words;

classifying the review data of all users on the same product according to the four attributes of quality, price, appearance and express delivery using a dictionary-based classification algorithm, thereby completing construction of the product review data corpus.

Further, constructing the user review data corpus comprises:

collecting the review data of the same user on all products;

segmenting the review data of the same user on all products into words with the Jieba word segmentation module, and removing stop words;

classifying the review data of the same user on all products according to the four attributes of quality, price, appearance and express delivery using a dictionary-based classification algorithm, thereby completing construction of the user review data corpus.

Further, constructing the product review space based on the product review data corpus comprises:

constructing four-dimensional spatial coordinates from the four attributes of quality, price, appearance and express delivery;

using the KNN classification algorithm to divide the review data in the product review data corpus into two categories, positive and negative;

arranging the positive and the negative review data describing the same attribute feature of the same product into one sequence each, in descending order of positive-sentiment probability; taking the first review marked positive in the sequence of positive reviews as the positive reference review for that attribute feature, i.e. unit 1 on that attribute feature; and taking the last review marked negative in the sequence of negative reviews as the negative reference review for that attribute feature, i.e. unit -1 on that attribute feature;

using the cosine similarity algorithm to calculate the cosine between the review data to be processed and the reference review for the attribute feature: if the classification label of the review data is positive, the cosine between the review data and the positive reference review for the attribute feature is calculated, and the resulting cosine is the coordinate value of the review data on that attribute dimension; if the classification label of the review data is negative, the cosine between the review data and the negative reference review for the attribute feature is calculated, and the negation of the resulting cosine is the coordinate value of the review data on that attribute dimension;

for each item of review data, combining the calculated coordinate values on the individual attribute dimensions into one coordinate point in the four-dimensional space, with the value on the dimension of any attribute feature not mentioned in the review data set to 0, to obtain the coordinates of the review data in the four-dimensional space;

averaging, dimension by dimension, the values of all review data of the same product to obtain the coordinate value of each attribute of each product in the four-dimensional space;

presenting the coordinate values of the attributes of each product, i.e. the product points, in the four-dimensional space, thereby completing construction of the product review space.

Further, after construction of the product review space is completed, the method further comprises:

clustering the generated product points with the K-means clustering algorithm, placing products with similar attribute features in the same cluster, and saving the cluster centers.

Further, constructing the user preference space based on the user review data corpus comprises:

quantifying each user's sentiment tendency according to the user's degree of attention to each attribute feature, where the degree of attention is the proportion of the same user's review data on each attribute to the total number of that user's review data, to obtain the sentiment tendency spatial coordinates of each user;

displaying all the spatial coordinate points representing users in the four-dimensional space, thereby completing construction of the user preference space.
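The degree-of-attention calculation above can be sketched as follows. This is a minimal illustration, not the patented implementation: the attribute keys, the function name `preference_vector`, and the sample data are hypothetical, and the sketch assumes that "total number of review data" means the number of reviews written by the user (the text does not say whether a review mentioning two attributes counts once or twice in the denominator).

```python
from collections import Counter

# Assumed English keys for the four attributes (quality, price, appearance, express delivery).
ATTRIBUTES = ("quality", "price", "appearance", "delivery")

def preference_vector(review_labels):
    """Return one user's point in the preference space.

    review_labels holds one set of attribute labels per review written by the user.
    Each coordinate is the share of the user's reviews that mention that attribute.
    """
    total = len(review_labels)
    if total == 0:
        return tuple(0.0 for _ in ATTRIBUTES)
    # Count each attribute at most once per review.
    counts = Counter(attr for labels in review_labels for attr in set(labels))
    return tuple(counts[attr] / total for attr in ATTRIBUTES)

# Hypothetical user with four classified reviews.
user_reviews = [
    {"quality"},
    {"quality", "delivery"},
    {"price"},
    {"quality", "appearance"},
]
print(preference_vector(user_reviews))  # (0.75, 0.25, 0.25, 0.25)
```

Under this reading, a user who mentions quality in three of four reviews sits far along the quality axis, which is what lets the later distance comparison favour clusters of products that score well on quality.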

Further, step 3 comprises:

taking the user's position in the user preference space as the user's shopping preference point, calculating in the product review space the Euclidean distance between the user's shopping preference point and the center point of each cluster, and recommending to the user the products in the cluster nearest to the user's shopping preference point.
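The nearest-cluster lookup in step 3 can be illustrated with a short sketch. The function name `recommend`, the cluster centers and the product identifiers below are hypothetical; only the use of Euclidean distance to the saved cluster centers comes from the text.

```python
import math

def recommend(user_point, cluster_centers, cluster_members):
    """Return the index of the nearest cluster and the products it contains.

    user_point and each cluster center are 4-tuples in the order
    (quality, price, appearance, express delivery).
    """
    nearest = min(
        range(len(cluster_centers)),
        key=lambda i: math.dist(user_point, cluster_centers[i]),  # Euclidean distance
    )
    return nearest, cluster_members[nearest]

# Hypothetical saved cluster centers and the products assigned to each cluster.
centers = [(0.8, 0.1, 0.2, 0.1), (0.1, 0.7, 0.1, 0.2)]
members = [["item_a", "item_b"], ["item_c"]]

cluster_id, products = recommend((0.75, 0.25, 0.25, 0.25), centers, members)
print(cluster_id, products)  # 0 ['item_a', 'item_b']
```

Comparing against cluster centers rather than every product keeps the lookup cost proportional to the number of clusters, which is one plausible reason for saving the centers after clustering.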

Compared with the prior art, the present invention has the following beneficial effects:

The invention uses product review data to quantify product attributes, maps products into an attribute space, and combines this with the user's historical shopping preferences to make recommendations. Compared with existing recommendation systems, it avoids the low recommendation efficiency caused by sparse matrices, and the judgment of product attribute features becomes more accurate as the amount of review information grows. Each user's reviews are analyzed independently of other users to obtain that user's shopping preferences, so the correctness of recommendations for a user is not affected by the analysis results of other users, which achieves personalized recommendation. By mining the relationship between the user's shopping preferences and product attributes, the invention can recommend products that match those preferences with relatively high accuracy, and experimental results show that good results can be achieved using historical review data alone.

Description of Drawings

FIG. 1 is a basic flowchart of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the cosine similarity calculation of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the product review space mapping of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 4 is an elbow method SSE curve of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the user preference space mapping of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 6 is an example diagram of product recommendation of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 7 is an example diagram of products and reviews of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 8 is an example diagram of users and reviews of a recommendation method based on comment space and user preference according to an embodiment of the present invention;

FIG. 9 is a comparison diagram of the accuracy of different algorithms of a recommendation method based on comment space and user preference according to an embodiment of the present invention.

Detailed Description

The present invention is further explained below with reference to the accompanying drawings and specific embodiments:

As shown in FIG. 1, a recommendation method based on comment space and user preference comprises:

Step S101: constructing a product review data corpus and a user review data corpus;

Step S102: constructing a product review space and a user preference space based on the product review data corpus and the user review data corpus, respectively;

Step S103: obtaining a product recommendation for the user according to the user's position in the user preference space and the constructed product review space.

Specifically, constructing the product review data corpus comprises:

Collecting the review data of all users on the same product. As one possible implementation, the review data used in this embodiment consists mainly of the following two parts:

① Public data: review data that is openly available on the Internet. Amazon product review data is used; this data links products to reviews and also links users to reviews;

② Semi-public data: data provided by the official organizing committee of a competition. The other part of the experimental data is provided by the official organizing committee of the Big Data and Computer Intelligence Competition.

Segmenting the review data of all users on the same product into words with the Jieba word segmentation module, and removing stop words. The Jieba word segmentation module has three modes: full mode, accurate mode and search engine mode. As one possible implementation, this embodiment uses Jieba's full mode to segment the review data. In Chinese text processing, certain words are removed after segmentation in order to save storage space and improve efficiency; these words are called stop words.

Classifying the review data of all users on the same product according to the four attributes of quality, price, appearance and express delivery using a dictionary-based classification algorithm, thereby completing construction of the product review data corpus. When users comment on a particular attribute feature of a product, their reviews tend to contain a small number of words that describe that attribute feature, mainly adjectives, nouns and adverbs; a review dictionary collects these words. The classification method used in this embodiment is a classification algorithm based on such (review) dictionaries. The key to dictionary-based classification of review data is the construction of the dictionaries. The four dictionaries used in this embodiment take existing dictionaries as a baseline and are refined by manual annotation. Before classification, analysis of the review data established that the attribute features people generally care about are quality, price, appearance and express delivery, so this embodiment uses these four attribute features as the four aspects for measuring product characteristics. Part of the contents of the four dictionaries is shown in Table 1.

Table 1. Partial contents of the four dictionaries

Quality (质量) | Price (价格) | Appearance (外观) | Express delivery (快递)
结实 (sturdy) | 降价 (price cut) | 好看 (good-looking) | 送货 (delivery of goods)
搀假 (adulterated) | 便宜 (cheap) | 漂亮 (pretty) | 发货 (shipping)
优质 (high quality) | 实惠 (affordable) | 颜值 (looks) | 速度 (speed)
劣质 (poor quality) | 优惠 (discount) | 大气 (stylish) | 配送 (distribution)
山寨 (knock-off) | 划算 (good value) | 尊贵 (premium) | 物流 (logistics)
次品 (defective) | 售价 (selling price) | 俊俏 (handsome) | 快 (fast)
脚货 (inferior goods) | 高 (high) | 美 (beautiful) | 服务好 (good service)

The processed review data is traversed and matched against the dictionaries to identify the categories that each text describes. A single item of data may receive several category labels. For example, the review "书特别棒颜色质量都很棒送货也超快" ("The book is excellent, the colour and quality are great, and delivery was super fast") describes the quality attribute of the product as well as its appearance attribute and its express delivery attribute.
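The stop-word removal and dictionary matching described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the lexicons contain only the Table 1 entries, the stop-word list and the function names are hypothetical, and the token list is segmented by hand to stand in for the output of Jieba's full mode so that the example runs without third-party packages.

```python
# Lexicons taken from Table 1; the embodiment's full dictionaries are larger.
ATTRIBUTE_LEXICONS = {
    "quality": {"结实", "搀假", "优质", "劣质", "山寨", "次品", "脚货"},
    "price": {"降价", "便宜", "实惠", "优惠", "划算", "售价", "高"},
    "appearance": {"好看", "漂亮", "颜值", "大气", "尊贵", "俊俏", "美"},
    "delivery": {"送货", "发货", "速度", "配送", "物流", "快", "服务好"},
}

# Hypothetical stop-word list; a real list would be far longer.
STOP_WORDS = {"这", "本", "很", "也", "的", "了"}

def remove_stop_words(tokens):
    """Drop stop words and empty tokens from a segmented review."""
    return [tok for tok in tokens if tok.strip() and tok not in STOP_WORDS]

def classify(tokens):
    """Return every attribute whose lexicon matches at least one token."""
    return {
        attr
        for attr, lexicon in ATTRIBUTE_LEXICONS.items()
        if any(tok in lexicon for tok in tokens)
    }

# Hand-segmented stand-in for the segmenter output of "这本书很结实也好看送货快".
tokens = ["这", "本", "书", "很", "结实", "也", "好看", "送货", "快"]
labels = classify(remove_stop_words(tokens))
print(sorted(labels))  # ['appearance', 'delivery', 'quality']
```

Returning a set of labels rather than a single label mirrors the observation that one review can describe several attributes at once.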

Specifically, constructing the user review data corpus comprises:

Collecting the review data of the same user on all products; specifically, the review data used is the same raw data used to construct the product review data corpus;

Specifically, constructing the product review space based on the product review data corpus comprises:

Constructing four-dimensional spatial coordinates from the four attributes of quality, price, appearance and express delivery. Specifically, when classifying the review data, the invention extracts the four product attributes that users generally care about as the classification targets, and the multi-dimensional coordinate system is likewise built with these four attribute features as its coordinate axes. Each coordinate axis in this coordinate system represents one attribute feature of the product. The positive direction of an axis indicates that a review takes a positive attitude toward the attribute feature, the negative direction indicates that the review takes a negative attitude toward it, and 0 indicates that the review is neutral toward the attribute feature or does not mention it.

All review data describing the same attribute feature of the same product is arranged into one sequence; the first review marked positive in the sequence is taken as the positive reference review for that attribute feature, i.e. unit 1 on that attribute feature, and the last sentence marked negative is taken as the negative reference review for that attribute feature, i.e. unit -1 on that attribute feature;

Specifically, the quantified sentiment tendency of a product review is obtained by evaluating the attribute features described in each review of the product, yielding a value that expresses the review's sentiment toward that attribute feature. The spatial coordinates generated by the invention are derived from sentiment analysis of each review. To determine sentiment tendency, this embodiment uses three tools: the Python-based SnowNLP library, the KNN classification algorithm, and cosine similarity;

The SnowNLP library is used to calculate the positive-sentiment probability of each sentence obtained by splitting a review by attribute, giving a value that represents the sentence's positive-sentiment probability on a given attribute feature. For example, for the sentence "这个商品的质量不错" ("the quality of this product is good"), which describes the quality attribute feature, the positive-sentiment probability is 0.749396374210698. For a given product, the sentences describing the same attribute feature are arranged into a sequence S in descending order of positivity;

However, the SnowNLP library yields only the probability of positive sentiment, which is not sufficient for constructing the review space, so the invention also proposes the following solution:

The KNN classification algorithm is a theoretically mature classification algorithm and one of the simplest machine learning algorithms. Its core idea is to examine the K objects nearest to the object to be classified and assign that object to the category to which most of them belong. Because the result depends only on the K nearest samples, KNN places high demands on the training set: if the error rate of the training samples is high, the accuracy of the final classification will be low. This embodiment sets K to 9; because K is odd, the two classes cannot receive the same number of votes. The distance between each product review and the already classified reviews in the training set is calculated to obtain the K nearest reviews. The number of these K reviews in each category is counted, and the review to be classified is assigned to the category that contains the most of them;
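The majority vote among the K = 9 nearest labelled reviews can be sketched as below. The text does not say how reviews are turned into vectors or which distance is used, so this sketch assumes each review is already represented as a numeric feature vector and uses Euclidean distance; the function name and the toy training data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, training, k=9):
    """Label query by majority vote among its k nearest training samples.

    training is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy two-dimensional feature vectors standing in for labelled competition reviews.
training = [
    ((0.9, 0.1), "positive"), ((0.8, 0.2), "positive"), ((1.0, 0.0), "positive"),
    ((0.7, 0.1), "positive"), ((0.9, 0.3), "positive"), ((0.8, 0.0), "positive"),
    ((0.1, 0.9), "negative"), ((0.2, 0.8), "negative"), ((0.0, 1.0), "negative"),
    ((0.1, 0.7), "negative"), ((0.3, 0.9), "negative"), ((0.0, 0.8), "negative"),
]

print(knn_classify((0.85, 0.15), training))  # positive
print(knn_classify((0.10, 0.85), training))  # negative
```

With two classes and an odd K the vote cannot tie, so `most_common(1)` always reflects a strict majority.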

To find the two reviews corresponding to 1 and -1 on a coordinate axis, SnowNLP and the KNN classification algorithm are applied with the help of the competition data, which carries sentiment polarity labels. SnowNLP can only obtain the probability that a review is positive, so the review with the highest probability in sequence S is taken as the review corresponding to 1. For the review corresponding to -1, a limitation of SnowNLP's algorithm is that the probabilities in the sequence differ clearly at the head but only slightly at the tail. To find the review corresponding to -1, the invention therefore classifies the reviews with the KNN classification method, using the competition data labelled with sentiment polarity, and then processes the set of all negative reviews separately with SnowNLP; in the resulting sequence the probability values differ clearly, and the review with the smallest value is taken as the review corresponding to -1.

Specifically, with the help of the competition data labelled with sentiment polarity, the invention uses the KNN classification algorithm to process each sentence of the preprocessed public data set and divides these review sentences into two categories, positive and negative, according to their sentiment polarity. After classification, the SnowNLP library is used to obtain, for each attribute of each product, the sequence S of positive review sentences and the sequence S of negative review sentences. Among all sentences describing the same attribute feature of the same product, the first sentence marked positive in the sequence S of positive review sentences is taken as the positive reference sentence for that attribute feature, i.e. unit 1 on that attribute feature, and the last sentence marked negative in the sequence S of negative review sentences is taken as the negative reference sentence for that attribute feature, i.e. unit -1 on that attribute feature;
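The selection of the two reference sentences can be sketched as follows. The sketch assumes that each sentence already carries a KNN label and a positive-sentiment probability (in the embodiment the latter comes from SnowNLP); the function name, sentence texts and probability values are hypothetical placeholders.

```python
def pick_references(sentences):
    """Choose the +1 and -1 reference sentences for one attribute of one product.

    sentences is a list of (text, label, positive_probability) tuples, where
    label is "positive" or "negative".
    """
    # Each class is sorted separately in descending order of positive probability.
    positives = sorted(
        (s for s in sentences if s[1] == "positive"), key=lambda s: s[2], reverse=True
    )
    negatives = sorted(
        (s for s in sentences if s[1] == "negative"), key=lambda s: s[2], reverse=True
    )
    positive_ref = positives[0] if positives else None   # head of the positive sequence
    negative_ref = negatives[-1] if negatives else None  # tail of the negative sequence
    return positive_ref, negative_ref

# Hypothetical sentences about the quality attribute of one product.
sentences = [
    ("sentence B", "positive", 0.75),
    ("sentence D", "negative", 0.08),
    ("sentence A", "positive", 0.97),
    ("sentence C", "negative", 0.31),
]
positive_ref, negative_ref = pick_references(sentences)
print(positive_ref[0], negative_ref[0])  # sentence A sentence D
```

Sorting the negative sentences on their own is what spreads out the low-probability tail that was hard to distinguish in the combined sequence.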

Cosine similarity is an algorithm that estimates the similarity between two vectors by calculating the cosine of the angle between them. The cosine has a minimum value of -1 and a maximum value of 1. The closer the cosine between two vectors is to 1, the closer the angle between them is to 0 degrees and the more similar the two vectors are; the closer the cosine is to -1, the closer the angle is to 180 degrees and the more the two vectors differ. For example, in two-dimensional space, let vector a have coordinates $(x_1, y_1)$ and vector b have coordinates $(x_2, y_2)$, and place the two vectors in the same two-dimensional space, as shown in FIG. 2;

The cosine of the angle $\theta$ between vector a and vector b is then given by formula (1):

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \tag{1}$$

If a and b are two vectors in n-dimensional space, the formula still holds. Let $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ be two n-dimensional vectors; their cosine is calculated by formula (2). The invention therefore treats two reviews as two n-dimensional vectors and uses cosine similarity to calculate the similarity between them:

$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \tag{2}$$
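A minimal sketch of the n-dimensional cosine calculation follows. The text does not specify how a review becomes an n-dimensional vector, so the bag-of-words term-count representation here is an assumption made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty review
    return dot / (norm_a * norm_b)

def term_count_vectors(tokens_a, tokens_b):
    """Assumed vectorisation: term counts over the joint vocabulary of two reviews."""
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    return (
        [counts_a[word] for word in vocabulary],
        [counts_b[word] for word in vocabulary],
    )

vec_a, vec_b = term_count_vectors(["quality", "good"], ["quality", "bad"])
print(round(cosine_similarity(vec_a, vec_b), 3))  # 0.5
```

With non-negative term counts the cosine falls between 0 and 1, which is consistent with the method assigning the sign separately from the positive or negative label.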

The invention uses the cosine similarity algorithm to calculate the cosine between the review sentence to be processed and the reference review sentence for the attribute feature. If the classification label of the review sentence is positive, the cosine between the review sentence and the positive reference review for the attribute feature is calculated, and the resulting value is the coordinate value of the review sentence on that attribute dimension; if the classification label of the review sentence is negative, the cosine between the review sentence and the negative reference review for the attribute feature is calculated, and the negation of the resulting value is the coordinate value of the review sentence on that attribute dimension;

Within one review, the calculated coordinate values on the individual attribute dimensions are combined into one coordinate point in the four-dimensional space; if the review does not mention an attribute feature, its value on that attribute dimension is 0. This gives the coordinates of the review in the coordinate space;

The invention obtains the coordinates of a product in the review space coordinate system by averaging. After each review has been converted to coordinates, the specific value of its sentiment tendency on each attribute feature is known. For example, the review "这个玩具的质量太差了，并且还卖得那么贵，差评" ("The quality of this toy is terrible and it is sold at such a high price; bad review") has the coordinates (-0.91, -0.85, 0, 0). This means that the review has a value of -0.91 on the quality dimension, a value of -0.85 on the price dimension, and a value of 0 on the appearance and express delivery dimensions;
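The sign rule and the zero-fill for unmentioned attributes can be sketched as below, using the toy review from the example above. The function name and the input layout are hypothetical; the cosine values 0.91 and 0.85 are taken as given rather than computed.

```python
# Assumed English keys for the four attributes (quality, price, appearance, express delivery).
ATTRIBUTES = ("quality", "price", "appearance", "delivery")

def review_coordinates(scored_attributes):
    """Map one review to a point in the four-dimensional review space.

    scored_attributes maps an attribute to (label, cosine), where cosine is the
    similarity between the review and the reference review matching its label.
    """
    point = []
    for attr in ATTRIBUTES:
        if attr not in scored_attributes:
            point.append(0.0)  # attribute not mentioned in the review
            continue
        label, cosine = scored_attributes[attr]
        point.append(cosine if label == "positive" else -cosine)
    return tuple(point)

toy_review = {"quality": ("negative", 0.91), "price": ("negative", 0.85)}
print(review_coordinates(toy_review))  # (-0.91, -0.85, 0.0, 0.0)
```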

Each product has multiple reviews, and each review corresponds to one coordinate point in the coordinate system. The invention sums the corresponding attribute values of the coordinates of all reviews of a product and takes the mean, as shown in formula (3), where $n$ is the number of reviews of the product, $c_{ij}$ is the coordinate value of the $i$-th review on attribute dimension $j$, and $P_j$ is the coordinate value of the product on attribute dimension $j$:

$$P_j = \frac{1}{n}\sum_{i=1}^{n} c_{ij}, \quad j = 1, 2, 3, 4 \tag{3}$$
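The per-dimension averaging can be illustrated with a few lines of code. The function name and the review points are hypothetical; only the first point is taken from the toy example in the text.

```python
def product_coordinates(review_points):
    """Average the review points of one product, dimension by dimension."""
    if not review_points:
        raise ValueError("a product needs at least one review")
    n = len(review_points)
    dimensions = len(review_points[0])
    return tuple(sum(point[d] for point in review_points) / n for d in range(dimensions))

# Points in the order (quality, price, appearance, express delivery).
review_points = [
    (-0.91, -0.85, 0.0, 0.0),
    (0.50, 0.00, 0.4, 0.0),
    (0.80, 0.25, 0.0, 0.6),
]
print([round(value, 3) for value in product_coordinates(review_points)])
```

Note that reviews which do not mention an attribute contribute 0 to that dimension, so an attribute that is rarely discussed is pulled toward the neutral value.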

All commodities are displayed in the four-dimensional space. The first three dimensions (quality, price and appearance) are shown on the coordinate axes, and the fourth dimension, express delivery, is shown by color depth, which indicates the magnitude of the fourth-dimension value, as shown in Figure 3.

Specifically, after the commodity comment space has been constructed, the method further includes:

The K-means clustering algorithm is used to cluster the generated commodity points, placing commodities with similar attribute features in one cluster, and the cluster centers are saved. Grouping commodities with similar attribute features into one cluster makes it convenient to recommend to users. When clustering with K-means, the value of K must be determined first, and it cannot be determined merely by inspecting the commodity points formed above. As one possible implementation, this embodiment uses the elbow method to determine K, as shown in Figure 4;

The core indicator of the elbow method is the SSE, the sum of squared errors. Its core idea is:

a. As K increases, the samples are partitioned more finely, the data objects within each cluster become more tightly grouped, and the SSE decreases accordingly;

b. When K is smaller than the optimal number of clusters, increasing K greatly increases the cohesion of the data within each cluster, so the SSE drops sharply. Once K reaches the optimal number of clusters, further increases in K bring smaller gains in cohesion, so the decrease in SSE slows and then levels off. The K value closest to the turning point of the SSE curve (the "elbow") is the optimal number of clusters;

According to the elbow method, the change in SSE becomes slow at K = 7, so for the data set used in this embodiment the optimal number of clusters is 7;
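The SSE curve that the elbow method inspects can be produced with a small K-means routine. The sketch below is a plain-Python illustration under stated assumptions: it uses a deterministic farthest-point initialization rather than whatever initialization the embodiment used, the names `init_centers`, `kmeans` and `sse_curve` are hypothetical, and the elbow itself is still read off the resulting curve by eye, as with Figure 4.

```python
import math


def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Return (centers, labels, sse) for k clusters."""
    centers = init_centers(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep empty clusters in place.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    sse = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return centers, labels, sse


def sse_curve(points, k_values):
    """SSE for each candidate K; the elbow is read off this curve."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

For example, `sse_curve(product_points, range(1, 11))` gives the values to plot against K, where `product_points` is a list of four-dimensional tuples. The returned `centers` and `labels` are what the method keeps after clustering.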

After obtaining the clusters with the K-means algorithm, the present invention saves the coordinates of each cluster's center point. The coordinates of the other points in the clusters can be set aside for the time being, but each point must be labeled with the cluster it belongs to and with the name of the commodity it corresponds to.

Specifically, constructing the user preference space based on the user comment data corpus includes:

Four-dimensional space coordinates are constructed from the four attributes of quality, price, appearance and express delivery. To stay aligned with the commodity attribute features and improve recommendation accuracy, the user preference space must use the same coordinate system as the commodity comment space. A four-dimensional coordinate system is established whose four axes represent the user's degree of attention to quality, price, appearance and express delivery respectively. Sharing one coordinate system allows the user's shopping preferences to be combined with the commodities' attribute features, so that recommendations can reach a relatively high accuracy;

The user's sentiment tendency is quantified according to the user's degree of attention to each attribute feature, where the degree of attention is the proportion of the same user's comment data on each attribute to that user's total comment data. This yields each user's coordinates in the sentiment tendency space. Among all the comments of a single user, some concern the quality attribute and some concern the price attribute. For example, to obtain a user's degree of attention to quality, count all of that user's comment sentences that describe quality and compute their proportion of the user's total number of comment sentences; this proportion represents the user's degree of attention to quality. The calculation is shown in formula (4):

$$\left(\frac{n_{\text{quality}}}{n},\ \frac{n_{\text{price}}}{n},\ \frac{n_{\text{appearance}}}{n},\ \frac{n_{\text{express}}}{n}\right) \tag{4}$$

where $n_{\text{quality}}$, $n_{\text{price}}$, $n_{\text{appearance}}$ and $n_{\text{express}}$ denote the numbers of comment sentences describing the quality, price, appearance and express delivery attributes respectively, and $n$ denotes the user's total number of comment sentences;
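The attention ratios defined above amount to a per-attribute count divided by the total sentence count. A minimal sketch follows, assuming each comment sentence has already been assigned exactly one attribute label by the dictionary-based classifier; the name `user_point` and the English attribute keys are hypothetical.

```python
from collections import Counter

ATTRIBUTES = ("quality", "price", "appearance", "express")


def user_point(sentence_attributes):
    """Map one user's per-sentence attribute labels to a 4-D preference point.

    Each coordinate is the share of the user's comment sentences that
    describe the corresponding attribute.
    """
    n = len(sentence_attributes)
    if n == 0:
        return (0.0,) * len(ATTRIBUTES)  # no comments, no measurable preference
    counts = Counter(sentence_attributes)
    return tuple(counts[a] / n for a in ATTRIBUTES)


# Hypothetical user: two sentences on quality, one on price, one on express delivery.
print(user_point(["quality", "quality", "price", "express"]))
# (0.5, 0.25, 0.0, 0.25)
```

Under the one-label-per-sentence assumption the four coordinates sum to 1, so every user lies on the same simplex inside the shared coordinate system.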

All coordinate points representing users are displayed in the four-dimensional space to complete the construction of the user preference space. By quantifying users' sentiment tendencies, the present invention obtains each individual user's sentiment tendency coordinates, which give the user's position in the preference space. All user points are displayed in the user preference space: the first three dimensions represent the user's sentiment tendency toward the quality, price and appearance attributes of commodities, and the fourth dimension represents the user's sentiment tendency toward the express delivery attribute, its magnitude shown by color depth, as shown in Figure 5.

Specifically, step S103 includes:

The user's position in the user preference space is taken as the user's shopping-preference point. In the commodity comment space, the Euclidean distance between this point and the center point of each cluster is computed, and the commodities in the cluster nearest the user's shopping-preference point are recommended to the user. In other words, when making a recommendation, the present invention recommends the commodities in the cluster closest to the user's position in the preference space. When the commodities were mapped into the multi-dimensional coordinate system they were already clustered, and the center point of each cluster was computed and retained; when the users were mapped into the multi-dimensional coordinate system, a coordinate point representing each user's shopping preference was obtained. The present invention uses the Euclidean distance to compute the distance between the user's shopping-preference point and these cluster centers. The Euclidean distance is a widely used definition of distance: the distance between two points in n-dimensional space. For two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, it is calculated as shown in formula (5):

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{5}$$

where n denotes the dimension of the space. The smaller the distance, the better the commodities in that cluster match the user's shopping preferences; the larger the distance, the worse they match;
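The nearest-cluster rule can be sketched in a few lines. The sketch assumes the cluster centers and the commodity names per cluster were saved after clustering; the function name `recommend`, the sample centers and the commodity names are made up for illustration.

```python
import math


def recommend(user_point, centers, cluster_products):
    """Return (cluster index, commodities) for the cluster center nearest the user."""
    nearest = min(
        range(len(centers)),
        key=lambda i: math.dist(user_point, centers[i]),  # Euclidean distance
    )
    return nearest, cluster_products[nearest]


# Hypothetical saved clustering result: two centers and the commodities in each cluster.
centers = [(0.8, 0.1, 0.1, 0.0), (0.1, 0.8, 0.0, 0.1)]
cluster_products = {0: ["toy A"], 1: ["toy B", "toy C"]}

idx, items = recommend((0.5, 0.25, 0.0, 0.25), centers, cluster_products)
```

Only the K cluster centers are compared with the user, so the cost of one recommendation grows with the number of clusters rather than with the number of commodities.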

The commodities in the cluster with the smallest distance are recommended to the user. The recommendation results are shown in Figures 6, 7 and 8. Figure 6 shows some of the commodities that the method of the present invention recommends for the user with ID 15905; Figures 7 and 8 show the comments on those commodities and the comments given by the user with ID 15905. The present invention processes commodity data and user data in parallel, with no link between commodities and users during processing. Even so, commodities that the user had previously reviewed favorably are recommended to the user again, which indicates that the recommendation results are credible.

To verify the effect of the present invention, the recommendation results are measured by the precision p, which is the proportion of the commodities recommended by the system that the user actually likes. The larger the value, the better the recommendation algorithm performs. p is calculated as shown in formula (6):

$$p = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |R(u)|} \tag{6}$$

where R(u) denotes the list of commodities recommended to user u by the recommendation system, T(u) denotes the user's behavior list, that is, the commodities the user likes, and |·| denotes the length of a list.
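Read as the total number of recommended commodities that also appear in users' liked lists, divided by the total number of recommended commodities, the precision can be sketched as below. Summing over all evaluated users is an assumption drawn from the discussion of how the numerator grows with the number of users; the function name and sample data are hypothetical.

```python
def precision(recommended, liked):
    """Precision over a set of users.

    recommended: {user: list of recommended commodities}, playing the role of R(u)
    liked:       {user: list of commodities the user likes}, playing the role of T(u)
    """
    hits = sum(len(set(recommended[u]) & set(liked.get(u, ()))) for u in recommended)
    total = sum(len(recommended[u]) for u in recommended)
    return hits / total if total else 0.0


# Made-up example with two users and four recommendations each.
recommended = {"u1": ["a", "b", "c", "d"], "u2": ["a", "e", "f", "g"]}
liked = {"u1": ["a", "b"], "u2": ["g", "z"]}
print(precision(recommended, liked))  # 3 hits out of 8 recommendations: 0.375
```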

The precision p of the method of the present invention is compared with that of SVD, an association-rule-based recommendation algorithm, a commodity-attribute-value-based recommendation algorithm, and a traditional algorithm (collaborative filtering based on user attributes and commodity classification). Specifically, the recommendation list length is set to 100, that is, |R(u)| = 100, and the number of users receiving recommendations is 5, 10, 15 and 20 respectively. The results are shown in Figure 9. Figure 9 shows that the method of the present invention has the highest precision, although the precision varies as the number of users receiving recommendations grows. The reason is that different users like different numbers of commodities, so the numerator grows by a different amount with each added user, and the computed precision fluctuates accordingly.

In summary, the recommendation system of an online shopping platform discovers connections between users and commodities and recommends the best-matching commodities to each user. By mining the relationship between users' shopping preferences and commodity attributes, the present invention can fairly accurately recommend commodities that match a user's shopping preferences. The present invention uses commodity comment data to quantify commodity attributes, maps commodities into the attribute space, and combines this with the user's historical shopping preferences to make recommendations. Compared with existing recommendation systems, it avoids the low recommendation efficiency caused by sparse matrices, and as the amount of comment information grows, the judgment of commodity attribute features becomes more accurate. Each user's comments are analyzed independently of other users to obtain that user's shopping preferences, so the correctness of a user's recommendations is not affected by the analysis results of other users, which achieves personalized recommendation. The experimental results show that good results can be achieved using historical comment data alone.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A recommendation method based on comment space and user preference, characterized in that it comprises:

Step 1: Build a corpus of product review data and a corpus of user review data;

Step 2: Build a product review space and a user preference space based on the product review data corpus and the user review data corpus respectively;

Step 3: According to the user's position in the user preference space and the constructed product review space, the product recommendation for the user is obtained.

2. The recommendation method based on comment space and user preference according to claim 1, characterized in that building the product review data corpus comprises:

Collect the comment data of all users on the same product;

The Jieba word segmentation module is used to perform word segmentation on the comment data of all users on the same product, and remove stop words;

Using a dictionary-based classification algorithm, according to the four attributes of quality, price, appearance and express delivery, the review data of all users for the same product is classified to complete the construction of the product review data corpus.

3. The recommendation method based on comment space and user preference according to claim 1, characterized in that building the user review data corpus comprises:

Collect the comment data of all products by the same user;

The Jieba word segmentation module is used to perform word segmentation on the comment data of the same user on all products, and remove stop words;

Using a dictionary-based classification algorithm, according to the four attributes of quality, price, appearance and express delivery, the same user's review data for all products is classified to complete the construction of the user review data corpus.

4. The recommendation method based on comment space and user preference according to claim 1, characterized in that constructing the product review space based on the product review data corpus comprises:

Construct four-dimensional space coordinates according to the four attributes of quality, price, appearance and express delivery;

Use the KNN classification algorithm to classify the review data in the product review data corpus into two categories: positive and negative;

The positive and negative review data describing the same attribute of the same product are each arranged into a sequence in descending order of positive-sentiment probability. The first review marked positive in the sequence of positive review data is designated the positive reference review on that attribute feature, which is the unit 1 on that attribute feature; the last review marked negative in the sequence of negative review data is designated the negative reference review on that attribute feature, which is the unit -1 on that attribute feature;

Use the cosine similarity algorithm to calculate the cosine value between the review data to be processed and the reference review on the corresponding attribute feature: if the classification label of the review data is positive, calculate the cosine value between the review data and the positive reference review on that attribute feature, and the obtained cosine value is the coordinate value of the review data on that attribute dimension; if the classification label of the review data is negative, calculate the cosine value between the review data and the negative reference review on that attribute feature, and the negation of the obtained cosine value is the coordinate value of the review data on that attribute dimension;

Within the same review data, the calculated coordinate values on each attribute dimension are combined into one coordinate point in the four-dimensional space. For an attribute feature not mentioned in the review data, the value on the corresponding dimension is 0. This yields the coordinate value of the review data in the four-dimensional space coordinates;

Average the values of all the review data of the same product on the same dimension to obtain the coordinate value of each attribute of each product in the four-dimensional space coordinates;

The coordinate value of each attribute of each commodity in the spatial coordinates, that is, each commodity point, is presented in the four-dimensional spatial coordinates to complete the construction of the commodity comment space.

5. A recommendation method based on comment space and user preference according to claim 4, characterized in that, after the completion of the product comment space construction, further comprising:

The K-means clustering algorithm is used to cluster the generated product points, and the products with similar attribute characteristics are placed in a cluster, and the cluster center is saved.

6. The recommendation method based on comment space and user preference according to claim 1, characterized in that constructing the user preference space based on the user review data corpus comprises:

Quantify the user's emotional tendency according to the user's degree of attention to each attribute feature, where the degree of attention is the ratio of the number of comment data for each attribute of the same user to the total number of comment data, and obtain the spatial coordinates of each user's emotional tendency;

Display all spatial coordinate points representing users in four-dimensional spatial coordinates to complete the construction of user preference space.

7. The recommendation method based on comment space and user preference according to claim 5, characterized in that step 3 comprises:

The user's position in the user preference space is taken as the user's shopping-preference point, the Euclidean distance between the user's shopping-preference point and the center point of each cluster in the product review space is calculated, and the products in the cluster closest to the user's shopping-preference point are recommended to the user.