CN110276065B - Method and device for processing item comments - Google Patents
- Publication number: CN110276065B
- Application number: CN201810213834.9A
- Authority: CN (China)
- Prior art keywords: comment, keyword, article, keywords, text
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
Abstract
The invention discloses a method and a device for processing article comments, and relates to the technical field of computers. One embodiment of the method comprises the following steps: obtaining comment sample data corresponding to an article, the comment sample data comprising comment text corresponding to the article; calculating keywords of the article according to the comment text; calculating the word frequency of each keyword of the article; and displaying each keyword according to the word frequency. The embodiment can process article comments and extract and display the valuable information they contain.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for processing article comments.
Background
With the diversification of information, industries generate large amounts of text data in addition to the massive structured data they must process. In fields such as e-commerce, the largest volume of text data is comment information on articles such as commodities. This comment information directly reflects user feedback on the commodities, and it influences both enterprise strategy and the first impressions of other users.
In the process of implementing the present invention, the inventors found at least the following problem in the prior art: comments on an article are displayed one by one on the page corresponding to the article, so the characteristics of the article are not highlighted and valuable information is difficult to obtain from the comments.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a method and a device for processing article comments, which can process the article comments and extract and display valuable information in the article comments.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of processing an item comment, including:
obtaining comment sample data corresponding to an article, the comment sample data comprising comment text corresponding to the article;
calculating keywords of the article according to the comment text;
calculating the word frequency of each keyword of the article; and
displaying each keyword according to the word frequency.
Optionally, the step of calculating word frequency of each keyword of the article includes:
calculating the ratio of the number of occurrences of each keyword to the total number of occurrences of all keywords in the comment text corresponding to the article, to obtain the actual word frequency of each keyword;
calculating the ratio of the number of occurrences of each keyword in the comment text corresponding to the article to the number of occurrences of that keyword in the comment texts corresponding to all articles of the article category to which the article belongs, to obtain a word frequency coefficient; and
multiplying the actual word frequency by the word frequency coefficient to obtain an improved word frequency of each keyword.
Optionally, the step of calculating the number of occurrences of each keyword in the comment text corresponding to the article includes:
acquiring the text weight value of each comment text corresponding to the article;
calculating the actual count of each keyword in each comment text;
multiplying the actual count of each keyword in each comment text by the text weight value of that comment text to obtain the improved count of each keyword; and
calculating, for each keyword, the sum of its improved counts over all comment texts, to obtain the number of occurrences of the keyword in the comment text corresponding to the article.
Optionally, before the step of obtaining the text weight value of each comment text corresponding to the article, the method further includes:
calculating the text weight value of each comment text according to the user evaluation value of the user corresponding to that comment text and/or the number of keywords actually contained in that comment text.
Optionally, the step of displaying each keyword according to the word frequency includes:
sorting the keywords in descending order of word frequency; and
generating a word cloud from the sorted keywords, and displaying the word cloud on the interface corresponding to the article.
Optionally, before the step of calculating the keywords of the article according to the comment text, the method further includes:
when a user inputs comment text for an article, judging whether the comment text meets an input condition, and taking comment text that meets the input condition as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
Optionally, the step of calculating the keywords of the article according to the comment text includes:
performing word segmentation processing on the comment text to obtain the keywords of the article.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an apparatus for processing an item comment, including:
a sample acquisition module, configured to acquire comment sample data corresponding to an article, the comment sample data comprising comment text corresponding to the article;
a word segmentation module, configured to calculate keywords of the article according to the comment text;
a word frequency calculation module, configured to calculate the word frequency of each keyword of the article; and
a display module, configured to display each keyword according to the word frequency.
Optionally, the word frequency calculation module is further configured to:
calculating the ratio of the number of occurrences of each keyword to the total number of occurrences of all keywords in the comment text corresponding to the article, to obtain the actual word frequency of each keyword;
calculating the ratio of the number of occurrences of each keyword in the comment text corresponding to the article to the number of occurrences of that keyword in the comment texts corresponding to all articles of the article category to which the article belongs, to obtain a word frequency coefficient; and
multiplying the actual word frequency by the word frequency coefficient to obtain an improved word frequency of each keyword.
Optionally, the word frequency calculation module is further configured to:
acquiring the text weight value of each comment text corresponding to the article;
calculating the actual count of each keyword in each comment text;
multiplying the actual count of each keyword in each comment text by the text weight value of that comment text to obtain the improved count of each keyword; and
calculating, for each keyword, the sum of its improved counts over all comment texts, to obtain the number of occurrences of the keyword in the comment text corresponding to the article.
Optionally, the word frequency calculation module is further configured to:
calculating the text weight value of each comment text according to the user evaluation value of the user corresponding to that comment text and/or the number of keywords actually contained in that comment text.
Optionally, the display module is further configured to:
sorting the keywords in descending order of word frequency; and
generating a word cloud from the sorted keywords, and displaying the word cloud on the interface corresponding to the article.
Optionally, the apparatus further includes:
an input module, configured to judge, when a user inputs comment text for an article, whether the comment text meets an input condition, and to take comment text that meets the input condition as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
Optionally, the word segmentation module is further configured to:
performing word segmentation processing on the comment text to obtain the keywords of the article.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic device for processing an item comment, including:
one or more processors; and
a storage means for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to at least implement:
obtaining comment sample data corresponding to an article, the comment sample data comprising comment text corresponding to the article;
calculating keywords of the article according to the comment text;
calculating the word frequency of each keyword of the article; and
displaying each keyword according to the word frequency.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, at least realizes:
obtaining comment sample data corresponding to an article, the comment sample data comprising comment text corresponding to the article;
calculating keywords of the article according to the comment text;
calculating the word frequency of each keyword of the article; and
displaying each keyword according to the word frequency.
One embodiment of the above invention has the following advantages or benefits: by obtaining keywords from the comments on an article, calculating the word frequency of the keywords, and displaying the keywords according to the word frequency, it overcomes the technical problem that comment systems in the prior art offer only simple functions, so that users can obtain more accurate information from the comments and enjoy a better experience.
Further effects of the optional implementations described above are explained below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the main steps of a method of processing item reviews, according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the main architecture of a comment processing system suitable for applying the method of processing a comment on an item provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the primary modules of an apparatus for processing item reviews in accordance with an embodiment of the invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 5 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main steps of a method of processing item reviews, according to an embodiment of the invention.
As shown in FIG. 1, the embodiment of the invention provides a method for processing item comments, which is particularly suitable for internet operation scenarios. The "article" in this embodiment is not limited to a physical article: any object that users can comment on, such as a commodity, a service, or a media file, falls within the meaning of "article" in this embodiment.
The method provided by the embodiment comprises the following steps:
S10, obtaining comment sample data corresponding to the article; the comment sample data comprises comment text corresponding to the article. An article usually has multiple comment texts, and to improve the accuracy of keyword selection, as many comment samples as possible should be obtained in this step. In addition to comment text, the comment sample data may include other types of information, which are described in the following embodiments.
S11, calculating keywords of the article according to the comment text. A common way to select keywords is to segment the comment text into words. A general word segmentation library can be introduced, and the library can be extended with additional corpora as required. Many word segmentation methods exist; common ones include methods based on dictionary and lexicon matching, methods based on word frequency statistics, and methods based on knowledge understanding. After word segmentation, invalid and repeated content in the comment text is deleted, as are invalid words such as modal particles; the words that remain are the keywords referred to in this embodiment.
S12, calculating the word frequency of each keyword of the article. "Word frequency" refers to how often a word occurs; in this embodiment it specifically refers to how often a keyword occurs among all keywords of its corresponding article, and it can also be expressed as a proportion. The basic way to calculate the word frequency of a keyword is to divide the number of occurrences of the keyword by the total number of occurrences of all keywords.
S13, displaying the keywords according to the word frequency. To better reflect the characteristics of the article and to extract the main views that users express in their comments on it, this step uses the word frequency of each keyword to represent its importance, and displays the keywords after sorting them in descending order of word frequency. As a specific display form, a word cloud can be generated from the word frequencies of the keywords, or the several keywords with the highest word frequency can be selected and displayed prominently.
As can be seen from the above, the method provided in this embodiment obtains keywords from the comments on an article, calculates the word frequency of the keywords, and displays the keywords according to the word frequency. This overcomes the technical problem that comment systems in the prior art offer only simple functions, so that users can obtain more accurate information from the comments and enjoy a better experience.
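Steps S10 to S13 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the embodiment: whitespace splitting stands in for a real word segmentation library, and the stop-word list, the function names, and the sample comments are all hypothetical.

```python
from collections import Counter

# Hypothetical list of invalid words; a real system would rely on a
# word segmentation library and a curated stop-word dictionary.
STOP_WORDS = {"the", "a", "is", "very", "and", "oh"}

def extract_keywords(comment_text):
    """S11 (simplified): split on whitespace, then drop invalid words."""
    words = comment_text.lower().split()
    return [w for w in words if w not in STOP_WORDS]

def word_frequencies(comment_texts):
    """S12: occurrences of each keyword / total occurrences of all keywords."""
    counts = Counter()
    for text in comment_texts:
        counts.update(extract_keywords(text))
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.items()} if total else {}

def keywords_for_display(comment_texts, top_n=10):
    """S13: keywords sorted in descending order of word frequency."""
    freqs = word_frequencies(comment_texts)
    return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# S10: comment sample data for one article (made-up examples).
comments = ["cleans fast and smells fresh", "fast delivery", "the scent is fresh"]
ranked = keywords_for_display(comments)
```

The sorted list could then be handed to a word cloud renderer or truncated to the few most frequent keywords, as described in S13.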
In some optional embodiments, step S12 of calculating the word frequency of each keyword of the article includes:
Calculating the ratio of the number of occurrences of each keyword to the total number of occurrences of all keywords in the comment text corresponding to the article, to obtain the actual word frequency of each keyword. The "actual word frequency" is word frequency in its conventional sense, that is, the actual proportion of the total number of occurrences of all keywords that is accounted for by each keyword.
Calculating the ratio of the number of occurrences of each keyword in the comment text corresponding to the article to the number of occurrences of that keyword in the comment texts corresponding to all articles of the article category to which the article belongs, to obtain the word frequency coefficient. The larger the word frequency coefficient of a keyword of an article, the more the keyword is concentrated in the comment text of that article; such a keyword is more likely to represent a unique characteristic of the article and describes the specific article more accurately. This embodiment therefore applies the word frequency coefficient so that such keywords stand out among keywords with the same actual word frequency.
Multiplying the actual word frequency by the word frequency coefficient to obtain the improved word frequency of each keyword. The improved word frequency is the word frequency referred to in step S12.
Building on the previous embodiment, this embodiment further improves the word frequency calculation so that the calculated word frequency of each keyword represents the characteristics of the article more accurately, which improves the accuracy of the word frequency calculation and strengthens the prompting effect achieved when the keywords are displayed.
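The improved word frequency can be sketched as follows. This is a minimal illustration that assumes the keyword counts for the article and for its whole category have already been computed; the function name and the sample counts are hypothetical, and the category counts are assumed to include the article's own comments, so they are never zero for a keyword of the article.

```python
def improved_word_frequency(item_counts, category_counts):
    """Actual word frequency multiplied by the word frequency coefficient.

    item_counts: occurrences of each keyword in the article's comment text.
    category_counts: occurrences of each keyword in the comment texts of
        all articles in the article's category (assumed to include the
        article itself).
    """
    total = sum(item_counts.values())
    improved = {}
    for keyword, count in item_counts.items():
        actual_frequency = count / total               # share within the article
        coefficient = count / category_counts[keyword]  # concentration in the article
        improved[keyword] = actual_frequency * coefficient
    return improved

# Made-up counts: "fast" is common across the category, "fresh" is not.
item_counts = {"fast": 6, "fresh": 2}
category_counts = {"fast": 60, "fresh": 4}
result = improved_word_frequency(item_counts, category_counts)
```

In this made-up example "fresh" ends up ranked above "fast" even though its actual word frequency is lower, because half of all occurrences of "fresh" in the category come from this one article.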
In some optional embodiments, the step of calculating the number of occurrences of each keyword in the comment text corresponding to the article includes:
Acquiring the text weight value of each comment text corresponding to the article. The text weight value is a coefficient that represents the importance of a comment text. For example, if the user who posted the comment text has a better user evaluation (for example, a higher user credit level or a larger number of posted comments), or if the quality of the comment text is higher (for example, the number of keywords it contains exceeds a preset value), the comment text is considered to express the characteristics of the article better and its keywords are considered more closely associated with the article, so the comment text is given a higher text weight value. In the specific calculation, one or more of user evaluation, text quality, or other attributes that indicate the importance of a comment text can be selected and combined by a weighted operation to obtain the text weight value.
Calculating the actual count of each keyword in each comment text. The actual count is the actual number of times the keyword appears after the comment text has been segmented.
Multiplying the actual count of each keyword in each comment text by the text weight value of that comment text to obtain the improved count of each keyword. For example, if a comment text yields the keywords "decontamination" and "fast" after word segmentation, the actual counts of "decontamination" and "fast" are both 1; if the text weight value of the comment text is 2, the improved counts of "decontamination" and "fast" are both 1*2, that is, 2.
Calculating, for each keyword, the sum of its improved counts over all comment texts, to obtain the number of occurrences of the keyword in the comment text corresponding to the article.
By calculating the number of occurrences of keywords in this way, keywords contained in more important comment texts are counted more heavily in the word frequency statistics, so that more important comment texts play a greater role in the final word frequency result and the relevance of the word frequency to the article is improved.
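The weighted counting described above can be sketched in a few lines of Python. The sketch assumes each comment text has already been segmented and assigned a text weight value; the function name and the second sample comment are hypothetical, while the first sample reproduces the "decontamination" and "fast" example with a weight of 2.

```python
from collections import Counter

def weighted_occurrences(segmented_comments):
    """Sum, per keyword, of (actual count in a comment * that comment's weight).

    segmented_comments: list of (keywords, text_weight) pairs, one per
        comment text, where keywords is the list produced by segmentation.
    """
    totals = Counter()
    for keywords, text_weight in segmented_comments:
        actual_counts = Counter(keywords)  # actual count per keyword
        for keyword, actual in actual_counts.items():
            totals[keyword] += actual * text_weight  # improved count
    return dict(totals)

comments = [
    (["decontamination", "fast"], 2),  # example from the text, weight 2
    (["fast", "cheap"], 1),            # hypothetical second comment
]
occurrences = weighted_occurrences(comments)
```

The resulting totals would then replace the raw counts when the actual word frequency and the word frequency coefficient are calculated.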
In some optional embodiments, before step S11 of calculating the keywords of the article according to the comment text, the method further includes:
When a user inputs comment text for an article, judging whether the comment text meets an input condition, and taking comment text that meets the input condition as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
This embodiment guides the input of comment text to a certain extent. When the comment text input by the user meets the input condition, the user is allowed to submit it; when it does not, the user is not allowed to submit it and may be prompted about what needs to change (for example, "The comment must contain at least 15 words" or "Please enter a more accurate comment"). When implementing this embodiment, the comment text must be segmented before the number of keywords in it can be judged. Because this is only a preliminary screening, a simpler word segmentation algorithm can be used to improve efficiency and avoid keeping the user waiting; such an algorithm mainly needs to distinguish parts of speech, recognize common Chinese words, and recognize repeated words.
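A check of the input condition might look like the sketch below. It is a hypothetical illustration: the function name and the keyword threshold are assumptions, the word threshold of 15 is taken from the example prompt above, whitespace-separated words stand in for the word count of the original text, and both conditions are applied even though the embodiment only requires at least one of them.

```python
def check_input_condition(comment_text, keywords, min_keywords=2, min_words=15):
    """Return (accepted, prompts) for a comment the user is trying to submit.

    keywords: keywords obtained from a lightweight segmentation of
        comment_text (the segmentation itself is outside this sketch).
    """
    prompts = []
    if len(keywords) < min_keywords:
        prompts.append("Please enter a more accurate comment")
    if len(comment_text.split()) < min_words:
        prompts.append(f"The comment must contain at least {min_words} words")
    return (not prompts, prompts)

short_ok, short_prompts = check_input_condition("good", ["good"])
long_text = ("this detergent removes stains fast and leaves clothes "
             "smelling fresh even after a cold wash cycle")
long_ok, long_prompts = check_input_condition(long_text, ["stains", "fast", "fresh"])
```

Only comments for which the first returned value is true would be stored as comment sample data; the prompts can be shown to the user otherwise.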
This embodiment provides a way to acquire comment text from current user input. Existing historical comment text can also be obtained as comment sample data. To keep the calculation simple and accurate, historical comment text can be given a preliminary cleaning when it is acquired, so that meaningful keywords are retained as far as possible and fed into the subsequent word frequency calculation.
In order to further explain the method provided by the embodiment of the invention, the application of the method in the field of electronic commerce is described below through a system constructed based on the method. The system is mainly used for collecting and processing comment information of the commodity, generating word clouds according to the comment information and displaying the word clouds on a commodity interface.
FIG. 2 is a schematic diagram of the main architecture of a comment processing system suitable for applying the method for processing a comment on an item provided by an embodiment of the present invention.
As shown in fig. 2, the comment processing system 200 provided in this embodiment mainly includes:
The external information collection unit 210 is mainly used to collect commodity comment data from external e-commerce websites.
The data caching unit 220 is configured to cache the commodity comment data acquired by the external information collection unit 210.
The first data quality standard unit 230 is configured to check and correct the cached commodity comment data (mainly by filtering dirty data and rejecting data that does not meet requirements), and to store the corrected commodity comment data in the data storage unit 240.
The data storage unit 240 is used to store commodity comment data. It can be implemented with multiple storage systems such as MySQL and HBase.
The online comment interaction unit 250 is configured to collect online commodity comment data inside the e-commerce website, and to interface with each application of the data analysis application layer 270.
The input guiding unit 260 is configured to check the data against the input condition in real time while the online comment interaction unit 250 collects commodity comment data; comments that do not meet the input condition are not allowed through. After commodity comment data has been collected successfully, it is saved to the data storage unit 240.
The data analysis application layer 270 is a platform for implementing machine learning algorithms and big data computation. It can be built on distributed clusters such as Apache Spark and Hadoop, and it uses languages such as Python, R, Scala, and C to provide a development environment for programming, computation, and scheduling configuration. It is used to build models and to run them on the commodity comment data in the data storage unit 240 in order to implement various applications. The data analysis application layer comprises:
The model construction unit 272 is configured to extract commodity comment data from the data storage unit 240 and process it according to a preset model for use by subsequent applications.
The second data quality standard unit 274 is configured to provide support for the input conditions or quality standards applied in the first data quality standard unit 230 and the input guiding unit 260.
The keyword word cloud generating unit 276 is configured to generate a keyword word cloud according to the commodity comment data, and display the keyword word cloud on a commodity-related page.
Other extended applications 278: in addition to the applications described above, other applications related to commodity comments may be added on the basis of the present system architecture.
In the present system, the data collected by the external information collection unit 210 and by the online comment interaction unit 250 have the same format, to facilitate subsequent calculation. The external information collection unit 210 may interface with an external website through a suitable API (Application Programming Interface) to acquire comments from the external website as commodity comment data matched to the host site. An exemplary storage form for commodity comment data in the data storage unit 240 is shown in Table 1:
Table 1. Commodity comment data storage table
In Table 1, "commodity category" indicates the category to which a commodity belongs, generally classified according to industry specifications and business practice. "Corresponding in-site SKU code" is the SKU code the commodity uses on the host site; SKU stands for Stock Keeping Unit, a concept from logistics that in e-commerce broadly denotes the identification code uniquely corresponding to a commodity. "Comment text" is the specific text content of the commodity comment. "Information source" indicates the website from which the commodity comment data originates. "Comment generation time" indicates when the comment was entered and can generally be determined from a timestamp. "Comment-to-purchase time interval" is the interval between the comment generation time and the time at which the user who posted the comment most recently purchased the commodity.
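The fields described for Table 1 could be modeled as a single record type, sketched below. The attribute names, the Python types, and the sample values are assumptions made for illustration; the source names the fields but does not specify their storage types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CommodityComment:
    """One row of commodity comment data (field types are assumed)."""
    commodity_category: str                   # category the commodity belongs to
    site_sku_code: str                        # corresponding in-site SKU code
    comment_text: str                         # text content of the comment
    information_source: str                   # website the comment came from
    comment_time: datetime                    # when the comment was entered
    comment_to_purchase_interval: timedelta   # time since the last purchase

# Made-up example record.
purchase_time = datetime(2018, 3, 1, 9, 30)
comment_time = datetime(2018, 3, 4, 20, 15)
record = CommodityComment(
    commodity_category="laundry detergent",
    site_sku_code="SKU-0001",
    comment_text="removes stains fast",
    information_source="example-external-site",
    comment_time=comment_time,
    comment_to_purchase_interval=comment_time - purchase_time,
)
```

Because external and online comments share one format, a single record type like this could serve both the external information collection unit 210 and the online comment interaction unit 250.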
As mentioned above, the second data quality standard unit 274 supports the input conditions or quality standards applied in the first data quality standard unit 230 and the input guidance unit 260. Since the two work on similar principles, the working principle is described below using only the input guidance unit 260 as an example:
After the user inputs comment text, word segmentation is first performed on it. A general-purpose word segmentation lexicon can be introduced and extended with related corpus according to business requirements. Many word segmentation methods exist, such as methods based on dictionary and lexicon matching, methods based on word frequency statistics, and methods based on knowledge understanding. Alternatively, segmentation can be performed with a Python word segmentation package that uses a Trie structure (also called a prefix tree or dictionary tree: an ordered tree data structure for storing an associative array whose keys are usually strings) to scan the word graph efficiently and generate a directed acyclic graph of all possible word formations of the Chinese characters in the text. For example, if the user inputs "good-good", it becomes the single word "good" after word segmentation and repeated-word processing. The number of keywords in the processed text is then counted to determine the quality score of the comment, i.e. the text weight value. In addition, a minimum number of keywords can be imposed: if the text entered by the user contains fewer keywords than this minimum, the comment cannot be submitted, which improves comment quality. Besides the quality score, information such as the "comment-from-purchase time interval" and whether pictures are attached can be included in a weighted score.
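The repeated-word processing and keyword counting described above can be sketched as follows. This is an illustration under stated assumptions: whitespace splitting stands in for a real Chinese word segmenter, the `STOP_WORDS` set is hypothetical, and collapsing consecutive repeats is only one possible reading of "repeated-word processing".

```python
from typing import List

# Hypothetical stop-word list; a real system would load a full lexicon.
STOP_WORDS = {"the", "a", "is", "very"}


def collapse_repeats(tokens: List[str]) -> List[str]:
    """Merge runs of the same token, e.g. ["good", "good"] -> ["good"]."""
    collapsed: List[str] = []
    for token in tokens:
        if not collapsed or collapsed[-1] != token:
            collapsed.append(token)
    return collapsed


def extract_keywords(text: str) -> List[str]:
    """Segment, collapse repeats, then drop stop words."""
    tokens = text.lower().split()  # stand-in for a real word segmenter
    return [t for t in collapse_repeats(tokens) if t not in STOP_WORDS]


def quality_score(text: str) -> int:
    """Use the keyword count as a simple text weight value."""
    return len(extract_keywords(text))


print(extract_keywords("good good"))
print(quality_score("the battery is very very good"))
```

A weighted score that also accounts for the purchase interval or attached pictures would combine this count with those signals; the weights themselves are not specified in the text.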
With this scheme of scoring commodity comment data to generate text weight values, once a large amount of commodity comment data has been scored, users can in turn be scored according to the comments they posted, generating user evaluation items such as a "high-quality reviewer" label.
The keyword word cloud generating unit 276 is configured to generate a keyword word cloud. A word cloud is a display form in which words are shown within a preset pattern at different sizes or with special effects according to their importance, so that a viewer can intuitively grasp what the words represent.
In existing keyword word cloud generation methods, the text is generally first split into sentences, irrelevant words such as stop words are removed, word frequency statistics are computed on the remaining keywords, and the word cloud is finally generated by sorting according to the statistics. The problem with the prior art is that for most commodities, and particularly for commodities of the same category, the word frequencies of keywords differ little, so the characteristics of an individual commodity are hard to convey. To solve this problem, the present embodiment improves both the word frequency calculation and the keyword count calculation.
First, the prior-art formula for the word frequency of a keyword in a given commodity is as follows:
$$TF_{ij} = \frac{N_{ij}}{\sum_{i'=1}^{N} N_{i'j}} \qquad (1)$$
In formula 1, $TF_{ij}$ represents the word frequency of keyword $i$ in the comments of commodity $j$, $N_{ij}$ represents the number of occurrences of keyword $i$ in the comments of commodity $j$, and $N$ represents the total number of distinct keywords in the comments of commodity $j$.
Improvement 1 of the present embodiment: when the number of keywords in each comment text of the commodity is calculated, a text weight value is applied on top of the actual number of keywords. The text weight value can be obtained by a weighted calculation over the comment text quality (an evaluation standard can be a count supplied by other users for the comment), the quality of the user who wrote the comment text, the number of keywords in the comment text, and the like. In general, the higher the text weight value, the more accurately the comment text describes the commodity characteristics, so the keywords in that comment text can be given additional weight when keywords are counted. According to this embodiment, the formula for calculating the number of occurrences of a keyword in a commodity is as follows:
$$N_{ij} = \sum_{l=1}^{L} score_{jl} \times N_{ijl} \qquad (2)$$
In formula 2, $N_{ij}$ represents the improved number of occurrences of keyword $i$ in the comments of commodity $j$, $L$ represents the total number of comments of commodity $j$, $score_{jl}$ represents the text weight value of the $l$-th comment of commodity $j$, and $N_{ijl}$ represents the number of occurrences of keyword $i$ in the $l$-th comment of commodity $j$.
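The weighted count just described (each comment's actual keyword count multiplied by that comment's text weight value, then summed over all comments of the commodity) can be sketched as below. The function name `weighted_counts`, the input layout, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def weighted_counts(
    comments: Iterable[Tuple[List[str], float]],
) -> Dict[str, float]:
    """Sum text_weight * occurrences of each keyword over all comments.

    Each element of `comments` is (keywords_in_comment, text_weight).
    """
    totals: Dict[str, float] = {}
    for keywords, weight in comments:
        for keyword, occurrences in Counter(keywords).items():
            totals[keyword] = totals.get(keyword, 0.0) + weight * occurrences
    return totals


# Two made-up comments on one commodity, with text weights 2.0 and 0.5.
comments = [
    (["battery", "battery", "screen"], 2.0),
    (["screen"], 0.5),
]
counts = weighted_counts(comments)
print(counts)
```

With these sample values, "battery" contributes 2.0 × 2 and "screen" contributes 2.0 × 1 + 0.5 × 1, so a keyword from a highly weighted comment counts for more than the same keyword from a low-weight comment.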
Improvement 2 of the present embodiment: the number of occurrences of the keyword across commodities of the same category is introduced to adjust the word frequency, so that, for an equal number of occurrences, a keyword that appears mainly in the specific commodity receives a higher word frequency than a keyword that is also common in other commodities of the same category, which reflects the importance of the keyword. According to the present embodiment, the formula for calculating the keyword word frequency in the commodity is as follows:
$$TF\text{-}New_{ij} = \frac{N_{ij}}{\sum_{i'=1}^{M} N_{i'j}} \times \frac{N_{ij}}{N_{ik}} \qquad (3)$$
In formula 3, $TF\text{-}New_{ij}$ represents the improved word frequency of keyword $i$ in the comments of commodity $j$, $N_{ij}$ represents the improved number of occurrences of keyword $i$ in the comments of commodity $j$, $N_{ik}$ represents the improved number of occurrences of keyword $i$ in the comments of all commodities in category $k$ to which commodity $j$ belongs, and $M$ represents the total number of distinct keywords in the comments of commodity $j$.
The improved word frequency calculated by the method of this embodiment reflects the importance of keywords in a specific commodity more accurately, so that the generated word cloud improves the user's recognition of commodity characteristics and can also provide positive guidance when the user inputs comment text.
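The improved word frequency can be sketched as the product of two ratios: the keyword's share of all keyword occurrences for the commodity (the actual word frequency), and the keyword's occurrences for the commodity divided by its occurrences across the whole category (the word frequency coefficient). The function name `improved_tf` and the sample counts are illustrative assumptions; the category counts are assumed to include the commodity itself.

```python
from typing import Dict


def improved_tf(
    item_counts: Dict[str, float],
    category_counts: Dict[str, float],
) -> Dict[str, float]:
    """Actual word frequency multiplied by the category coefficient."""
    total = sum(item_counts.values())
    if total == 0:
        return {}
    result: Dict[str, float] = {}
    for keyword, count in item_counts.items():
        actual_tf = count / total                       # share within the item
        coefficient = count / category_counts[keyword]  # share within category
        result[keyword] = actual_tf * coefficient
    return result


# Both keywords occur equally often for this item, but "screen" is common
# across the whole category while "battery" is specific to this item.
item_counts = {"battery": 4.0, "screen": 4.0}
category_counts = {"battery": 5.0, "screen": 40.0}
tf_new = improved_tf(item_counts, category_counts)
print(tf_new)
```

In this made-up example the two keywords have the same actual word frequency, yet the item-specific keyword ends up with the higher improved frequency, which is the effect the adjustment is meant to produce.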
FIG. 3 is a schematic diagram of the main modules of an apparatus for processing item comments in accordance with an embodiment of the invention.
As shown in fig. 3, there is further provided an apparatus 300 for processing an item comment according to an embodiment of the present invention, including:
The sample acquisition module 301 is configured to acquire comment sample data corresponding to an article; the comment sample data comprises comment text corresponding to the article;
A word segmentation module 302, configured to calculate keywords of the article according to the comment text;
A word frequency calculation module 303, configured to calculate word frequencies of keywords of the article;
and the display module 304 is configured to display each keyword according to the word frequency.
In some alternative embodiments, the word frequency calculation module 303 is further configured to:
calculating the ratio of the occurrence times of each keyword to the total occurrence times of all keywords in the comment text corresponding to the article to obtain the actual word frequency of each keyword;
calculating the ratio of the occurrence times of each keyword in comment texts corresponding to the articles to the occurrence times of each keyword in comment texts corresponding to all articles of the article category to which the articles belong so as to obtain word frequency coefficients;
Multiplying the actual word frequency by the word frequency coefficient to obtain an improved word frequency of each keyword.
In some alternative embodiments, the word frequency calculation module 303 is further configured to:
acquiring text weight values of each comment text corresponding to the article;
calculating the actual times of each keyword in each comment text;
multiplying the actual times of each keyword in each comment text by the text weight value of the comment text to obtain the improvement times of each keyword;
And respectively calculating the sum of the improvement times of each keyword in all comment texts so as to obtain the occurrence times of each keyword in the comment text corresponding to the article.
In some alternative embodiments, the word frequency calculation module 303 is further configured to:
And calculating the text weight value of each comment text according to the comment value of the user corresponding to each comment text and/or the number of keywords actually contained in each comment text.
In some alternative embodiments, the presentation module 304 is further configured to:
ordering the keywords according to the order of word frequency from high to low;
generating word clouds according to the ordered keywords, and displaying the word clouds on interfaces corresponding to the articles.
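The ordering step, and one way the ordered keywords might be turned into word cloud display sizes, can be sketched as follows. The linear mapping from word frequency to font size and the `min_size` / `max_size` parameters are assumptions; the text does not specify how sizes are derived.

```python
from typing import Dict, List, Tuple


def rank_keywords(freqs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order keywords by word frequency from high to low."""
    return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)


def font_sizes(
    ranked: List[Tuple[str, float]],
    min_size: int = 12,
    max_size: int = 48,
) -> List[Tuple[str, int]]:
    """Scale each keyword's size linearly relative to the top frequency."""
    if not ranked or ranked[0][1] <= 0:
        return []
    top = ranked[0][1]
    span = max_size - min_size
    return [(kw, round(min_size + span * f / top)) for kw, f in ranked]


ranked = rank_keywords({"screen": 0.1, "battery": 0.4, "price": 0.2})
sizes = font_sizes(ranked)
print(sizes)
```

The resulting list of (keyword, size) pairs could then be handed to whatever rendering component draws the word cloud on the article's page.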
In some alternative embodiments, the apparatus 300 further comprises:
an input module 305, configured to determine, when a user inputs a comment text for an item, whether the comment text meets an input condition; taking comment texts meeting the input conditions as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
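A minimal sketch of such an input check is given below. It assumes the comment has already been segmented into words and keywords, and that either threshold may be left unset as long as at least one is configured; the function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def meets_input_condition(
    words: Sequence[str],
    keywords: Sequence[str],
    min_keywords: Optional[int] = None,
    min_words: Optional[int] = None,
) -> bool:
    """Return True if the comment satisfies every configured threshold."""
    if min_keywords is None and min_words is None:
        raise ValueError("at least one threshold must be configured")
    if min_keywords is not None and len(keywords) < min_keywords:
        return False
    if min_words is not None and len(words) < min_words:
        return False
    return True


words = ["the", "battery", "is", "good"]
keywords = ["battery", "good"]
accepted = meets_input_condition(words, keywords, min_keywords=2, min_words=4)
rejected = meets_input_condition(words, keywords, min_keywords=3)
print(accepted, rejected)
```

Only comments for which the check returns True would be kept as comment sample data for the article.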
In some alternative embodiments, the word segmentation module 302 is further configured to:
And performing word segmentation processing on the comment text to obtain keywords of the article.
As can be seen from the above, the device provided in this embodiment obtains keywords from the comments on an article, calculates the word frequency of the keywords, and displays the keywords according to the word frequency. This addresses the technical problem that prior-art comment systems offer only simple functions, lets users obtain more accurate information from comments, and achieves the technical effect of improving the accuracy of keyword display.
FIG. 4 illustrates an exemplary system architecture 400 to which a method of processing item reviews or an apparatus for processing item reviews of embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 is used as a medium to provide communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 405 via the network 404 using the terminal devices 401, 402, 403 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 401, 402, 403.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server providing support for shopping-type websites browsed by the user using the terminal devices 401, 402, 403. The background management server can analyze and otherwise process the received commodity comment data and feed the processing result back to the terminal device, for example in the form of a word cloud.
It should be noted that, the method for processing the comment of the article provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the device for processing the comment of the article is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
According to an embodiment of the present invention, the present invention also provides an electronic device and a readable storage medium.
Fig. 5 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, the processes described by the schematic diagrams of the main steps above may be implemented as computer software programs according to embodiments of the present invention. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the schematic diagram of the main steps. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 501.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a sample acquisition module, a word segmentation module, a word frequency calculation module and a display module. In some cases the names of these modules do not limit the modules themselves; for example, the sample acquisition module may also be described as "a module for acquiring comment sample data corresponding to an item".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform operations including:
obtaining comment sample data corresponding to an article; the comment sample data comprises comment text corresponding to the article;
calculating keywords of the article according to the comment text;
calculating word frequency of each keyword of the article;
And displaying each keyword according to the word frequency.
The technical scheme provided by the embodiment of the invention obtains keywords from the comments on an article, calculates the word frequency of the keywords, and displays the keywords according to the word frequency. This addresses the technical problem that prior-art comment systems offer only simple functions, lets users obtain more accurate information from comments, and achieves the technical effect of improving the accuracy of keyword display.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (14)
1. A method of processing reviews of an item, comprising:
obtaining comment sample data corresponding to an article; the comment sample data comprises comment text corresponding to the article;
calculating keywords of the article according to the comment text;
Calculating word frequency of each keyword of the article; the method specifically comprises the following steps: calculating the ratio of the occurrence times of each keyword to the total occurrence times of all keywords in the comment text corresponding to the article to obtain the actual word frequency of each keyword; calculating the ratio of the occurrence times of each keyword in comment texts corresponding to the articles to the occurrence times of each keyword in comment texts corresponding to all articles of the article category to which the articles belong so as to obtain word frequency coefficients; multiplying the actual word frequency by the word frequency coefficient to obtain an improved word frequency of each keyword; the improved word frequency is used as the word frequency of the key word;
And displaying each keyword according to the word frequency.
2. The method of claim 1, wherein the step of calculating the number of occurrences of each keyword in the comment text corresponding to the item includes:
acquiring text weight values of each comment text corresponding to the article;
calculating the actual times of each keyword in each comment text;
multiplying the actual times of each keyword in each comment text by the text weight value of the comment text to obtain the improvement times of each keyword;
And respectively calculating the sum of the improvement times of each keyword in all comment texts so as to obtain the occurrence times of each keyword in the comment text corresponding to the article.
3. The method of claim 2, further comprising, prior to the step of obtaining a text weight value for each comment text corresponding to the item:
And calculating the text weight value of each comment text according to the comment value of the user corresponding to each comment text and/or the number of keywords actually contained in each comment text.
4. The method of claim 1, wherein the step of presenting each of the keywords according to the term frequency comprises:
ordering the keywords according to the order of word frequency from high to low;
generating word clouds according to the ordered keywords, and displaying the word clouds on interfaces corresponding to the articles.
5. The method of claim 1, further comprising, prior to the step of calculating keywords for the item from the comment sample data:
When a user inputs comment text for an article, judging whether the comment text accords with an input condition or not; taking comment texts meeting the input conditions as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
6. The method of claim 1, wherein the step of calculating keywords for the item from the comment text comprises:
And performing word segmentation processing on the comment text to obtain keywords of the article.
7. An apparatus for processing reviews of items, comprising:
the sample acquisition module is used for acquiring comment sample data corresponding to the article; the comment sample data comprises comment text corresponding to the article;
The word segmentation module is used for calculating keywords of the article according to the comment text;
The word frequency calculation module is used for calculating the word frequency of each keyword of the article, and is specifically used for: calculating the ratio of the occurrence times of each keyword to the total occurrence times of all keywords in the comment text corresponding to the article to obtain the actual word frequency of each keyword; calculating the ratio of the occurrence times of each keyword in comment texts corresponding to the articles to the occurrence times of each keyword in comment texts corresponding to all articles of the article category to which the articles belong so as to obtain word frequency coefficients; multiplying the actual word frequency by the word frequency coefficient to obtain an improved word frequency of each keyword; the improved word frequency is used as the word frequency of the keyword;
And the display module is used for displaying the keywords according to the word frequency.
8. The apparatus of claim 7, wherein the word frequency calculation module is further configured to:
acquiring text weight values of each comment text corresponding to the article;
calculating the actual times of each keyword in each comment text;
multiplying the actual times of each keyword in each comment text by the text weight value of the comment text to obtain the improvement times of each keyword;
And respectively calculating the sum of the improvement times of each keyword in all comment texts so as to obtain the occurrence times of each keyword in the comment text corresponding to the article.
9. The apparatus of claim 8, wherein the word frequency calculation module is further configured to:
And calculating the text weight value of each comment text according to the comment value of the user corresponding to each comment text and/or the number of keywords actually contained in each comment text.
10. The apparatus of claim 7, wherein the display module is further to:
ordering the keywords according to the order of word frequency from high to low;
generating word clouds according to the ordered keywords, and displaying the word clouds on interfaces corresponding to the articles.
11. The apparatus of claim 7, wherein the apparatus further comprises:
The input module is used for judging whether the comment text accords with the input conditions or not when the user inputs the comment text aiming at the article; taking comment texts meeting the input conditions as comment sample data of the article; wherein the input condition includes at least one of: the number of keywords in the comment text is not less than a preset number threshold, and the number of words in the comment text is not less than a preset word number threshold.
12. The apparatus of claim 7, wherein the word segmentation module is further configured to:
And performing word segmentation processing on the comment text to obtain keywords of the article.
13. An electronic device for processing reviews of items, comprising:
One or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810213834.9A CN110276065B (en) | 2018-03-15 | 2018-03-15 | Method and device for processing item comments |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810213834.9A CN110276065B (en) | 2018-03-15 | 2018-03-15 | Method and device for processing item comments |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276065A (en) | 2019-09-24 |
CN110276065B (en) | 2024-07-19 |
Family
ID=67957702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810213834.9A Active CN110276065B (en) | 2018-03-15 | 2018-03-15 | Method and device for processing item comments |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276065B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110276065A (en) | 2019-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||