CN106250365A - Method for extracting product attribute feature words from consumer reviews based on text analysis - Google Patents

Method for extracting product attribute feature words from consumer reviews based on text analysis

Info

Publication number
CN106250365A
CN106250365A CN201610580612.1A CN201610580612A CN106250365A CN 106250365 A CN106250365 A CN 106250365A CN 201610580612 A CN201610580612 A CN 201610580612A CN 106250365 A CN106250365 A CN 106250365A
Authority
CN
China
Prior art keywords
feature words
word
feature
words
comment data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610580612.1A
Other languages
Chinese (zh)
Inventor
陈峥
张婷
梁恒
张永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu De Maian Science And Technology Ltd
Original Assignee
Chengdu De Maian Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-21
Filing date
2016-07-21
Publication date
2016-12-21
Application filed by Chengdu De Maian Science And Technology Ltd
Priority to CN201610580612.1A
Publication of CN106250365A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for extracting product attribute feature words from consumer reviews based on text analysis. The method comprises: determining a target product and obtaining its review data; preprocessing the review data; obtaining part-of-speech (POS) sequence samples from the preprocessed review data; matching the POS sequence samples against all review data, extracting feature words from the review data according to the position of the feature word in the formal representation model of the POS sequence samples, and recording the frequency of each feature word, all extracted feature words forming a feature-word pre-candidate set; preprocessing the pre-candidate set; and computing the similarity of every pair of feature words in the pre-candidate set and merging any two feature words whose similarity exceeds a threshold. The invention uses semantic similarity based on information content to merge similar feature words, which removes redundant feature words and reduces the amount of feature-word data to be analyzed.

Description

Method for extracting product attribute feature words from consumer reviews based on text analysis
Technical field
The invention relates to the technical field of information processing, and in particular to a method for extracting product attribute feature words from consumer reviews based on text analysis.
Background
The development of the Internet and information technology has given ordinary consumers the opportunity to share their shopping experiences online. The large volume of review data produced in this way gives platforms a good opportunity to analyze the market, learn users' attitudes, and make recommendations to users. It also lets consumers learn the attitudes of other users toward the products they bought, which helps them make purchasing decisions. Extracting attribute feature words from product review data is an important step in this data mining.
The quality of the attribute feature words extracted from product review data has a large impact on both platforms and users. Good feature words let a platform understand which product characteristics users care about, so that it can improve or maintain those characteristics and increase sales. They also let users learn the true state of the product characteristics they care about.
Many methods already exist for extracting feature words from product review data. They fall into two broad classes: rule-based feature extraction and probability-based feature extraction. Examples include POS template matching extended with grammatical rules, and hidden Markov models and conditional random fields based on word sequence labeling. All of these perform only a preliminary extraction of the feature words in review data. Research has found that, because consumers differ in education, cultural background, and writing style, descriptions of the same attribute of the same product differ in wording while remaining close in overall meaning. If feature words are extracted with rule-based matching alone, the extracted feature words will be redundant.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method for extracting product attribute feature words from consumer reviews based on text analysis. The method uses semantic similarity based on information content to merge similar feature words, which removes redundant feature words and reduces the amount of feature-word data to be analyzed.
The object of the invention is achieved by the following technical solution: a method for extracting product attribute feature words from consumer reviews based on text analysis, comprising: determining a target product and obtaining the review data of the target product; preprocessing the review data; obtaining POS sequence samples from the preprocessed review data; matching the POS sequence samples against all review data, extracting feature words from the review data according to the position of the feature word in the formal representation model of the POS sequence samples, and recording the frequency of each feature word, all feature words forming a feature-word pre-candidate set; and computing the similarity of every pair of feature words in the pre-candidate set and merging any two feature words whose similarity exceeds a threshold.
The review data of the target product is obtained by using a crawler algorithm to crawl the review data of the target product from a preset website.
The review data is preprocessed as follows: each review is split into multiple sentences at punctuation marks; each sentence is segmented into individual words; and each word is tagged with its part of speech.
Preprocessing of the review data further includes removing stop words.
The POS sequence samples are obtained as follows:
A product review sentence that contains a product attribute feature word is defined as a feature sentence, and preprocessed feature sentences are chosen as POS sequence samples;
The formal representation model of a POS sequence sample is:
(BF3, BF2, BF1, feature_i, AF1, AF2, AF3, Pos: i)
where feature_i is a feature word, BF_i is the i-th word before the feature word, AF_i is the i-th word after the feature word, and Pos is the position of the feature word in the feature sentence.
The method further includes a step of preprocessing the feature-word pre-candidate set:
Each feature word in the pre-candidate set is checked against a preset rule; if it satisfies the rule, the feature word is retained, otherwise it is deleted.
The preset rule is: the word is at most four characters long, and the frequency of the word lies within a preset range.
The similarity of the feature words in the pre-candidate set is computed as follows: the information content of each feature word in the pre-candidate set is computed based on HowNet, and the similarity of every pair of feature words in the pre-candidate set is then calculated.
Feature words are merged as follows: two feature words whose similarity exceeds the threshold are merged into one feature word, namely the one of the two with the higher frequency.
The beneficial effect of the invention is that it uses semantic similarity based on information content to merge similar feature words, which removes redundant feature words and reduces the amount of feature-word data to be analyzed.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the invention.
Detailed description
The technical solution of the invention is described in further detail below with reference to the accompanying drawing, but the scope of protection of the invention is not limited to the following.
As shown in Fig. 1, the method for extracting product attribute feature words from consumer reviews based on text analysis comprises the following steps:
Step 1: determine a target product and obtain the review data of the target product.
The review data of the target product is obtained by using a crawler algorithm to crawl the review data of the target product from a preset website.
Step 2: preprocess the review data.
The review data is preprocessed as follows: each review is split into multiple sentences at punctuation marks; word segmentation: each sentence is segmented into individual words; POS tagging: each word is tagged with its part of speech. Word segmentation means cutting a sentence into individual words, that is, recombining a continuous character sequence into a word sequence according to a given specification. POS tagging means assigning the correct part of speech to each word in the segmentation result, that is, determining whether each word is a noun, verb, adjective, or other part of speech.
Preprocessing of the review data further includes removing stop words. Stop words are words that carry no real meaning in a sentence, such as pronouns, numerals, and mathematical symbols. The invention may use the open-source tool HanLP or the word segmentation system NLPIR to preprocess the review data. For example, the review "The mobile phone feels good, the sound quality is good, and charging is fast" becomes, after preprocessing with HanLP: "mobile phone/n feel/n good/a sound quality/n good/a charging/v speed/n fast/a". Here n denotes a noun, a an adjective, v a verb, and d an adverb. Besides the tag set defined by HanLP, some custom words may additionally be added.
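The preprocessing stage described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the stop-word list is hypothetical, the function names are made up for this example, and the tagged string stands in for the output of a real segmenter such as HanLP or NLPIR (multi-word English glosses are joined with underscores so that tokens can be split on whitespace).

```python
import re

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"the", "is", "a"}


def split_sentences(review: str) -> list[str]:
    """Split one review into clauses at Chinese and Western punctuation."""
    parts = re.split(r"[，。！？；,.!?;]+", review)
    return [p.strip() for p in parts if p.strip()]


def parse_tagged(tagged: str) -> list[tuple[str, str]]:
    """Parse 'word/tag word/tag ...' into (word, tag) pairs."""
    pairs = []
    for token in tagged.split():
        if "/" not in token:
            continue  # skip anything that is not a word/tag token
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def remove_stop_words(pairs, stop_words=STOP_WORDS):
    """Drop (word, tag) pairs whose word is in the stop-word list."""
    return [(w, t) for (w, t) in pairs if w not in stop_words]


clauses = split_sentences("Feels good, sound is good. Charging is fast!")
tokens = parse_tagged("phone/n feel/n good/a sound_quality/n good/a")
```

Splitting at punctuation first keeps each clause short, so the later POS pattern matching works on one attribute description at a time.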
Step 3: obtain POS sequence samples from the preprocessed review data.
The POS sequence samples are obtained as follows: a product review sentence that contains a product attribute feature word is defined as a feature sentence, and preprocessed feature sentences are chosen as POS sequence samples. The formal representation model of a POS sequence sample is:
(BF3, BF2, BF1, feature_i, AF1, AF2, AF3, Pos: i)
where feature_i is a feature word, BF_i is the i-th word before the feature word, AF_i is the i-th word after the feature word, and Pos is the position of the feature word in the feature sentence.
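One way to hold the formal representation model in code is a small record that stores the POS tags around the feature word. The sketch below is an assumption-laden illustration: the class name `PosSample`, the helper `build_sample`, and the choice to store tags rather than words are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosSample:
    before: tuple   # POS tags of up to three words before the feature (BF3..BF1)
    feature: tuple  # POS tags of the feature word(s)
    after: tuple    # POS tags of up to three words after the feature (AF1..AF3)
    pos: tuple      # 1-based positions of the feature word(s) in the sentence


def build_sample(tagged, start, length=1, window=3):
    """Build a PosSample from (word, tag) pairs.

    `start` is the 0-based index of the first feature word and `length`
    is the number of consecutive words that make up the feature.
    """
    tags = [t for _, t in tagged]
    lo = max(0, start - window)
    end = start + length
    return PosSample(
        before=tuple(tags[lo:start]),
        feature=tuple(tags[start:end]),
        after=tuple(tags[end:end + window]),
        pos=tuple(range(start + 1, end + 1)),
    )


sentence = [("fingerprint", "n"), ("unlock", "v"), ("super", "d"), ("fast", "a")]
sample = build_sample(sentence, start=0, length=2)
```

Here `sample` records a two-word feature tagged n, v followed by tags d, a, with positions 1 and 2.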
Step 4: match the POS sequence samples against all review data, extract feature words from the review data according to the position of the feature word in the formal representation model of the POS sequence samples, and record the frequency of each feature word. All extracted feature words form the feature-word pre-candidate set.
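The matching and counting in this step might look like the following sketch. It assumes each pattern is a pair of tag tuples (feature tags, following tags) and that a feature word is counted once per sentence; both choices, and the function names, are assumptions made for illustration rather than details stated in the patent.

```python
from collections import Counter


def match_pattern(tagged, feature_tags, after_tags):
    """Yield feature words where the sentence tags match
    feature_tags immediately followed by after_tags."""
    tags = [t for _, t in tagged]
    pattern = list(feature_tags) + list(after_tags)
    n, k = len(pattern), len(feature_tags)
    for i in range(len(tags) - n + 1):
        if tags[i:i + n] == pattern:
            yield " ".join(w for w, _ in tagged[i:i + k])


def extract_candidates(sentences, patterns):
    """Return a Counter mapping each extracted feature word to its frequency."""
    counts = Counter()
    for tagged in sentences:
        found = set()  # count each feature word once per sentence
        for feature_tags, after_tags in patterns:
            found.update(match_pattern(tagged, feature_tags, after_tags))
        counts.update(found)
    return counts


sentences = [
    [("pixel", "n"), ("high", "a")],
    [("screen", "n"), ("big", "a")],
    [("pixel", "n"), ("very", "d"), ("good", "a")],
]
patterns = [(("n",), ("a",)), (("n",), ("d", "a"))]
candidates = extract_candidates(sentences, patterns)
```

With these toy sentences, `candidates` counts "pixel" twice and "screen" once.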
Step 5: preprocess the feature-word pre-candidate set. Each feature word in the pre-candidate set is checked against a preset rule; if it satisfies the rule, the feature word is retained, otherwise it is deleted. That is, feature words that satisfy the preset rule remain in the pre-candidate set, and feature words that do not satisfy it are removed from the pre-candidate set. The preset rule is: the word is at most four characters long, and the frequency of the word lies within a preset range.
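The filtering rule can be sketched as a single dictionary comprehension. The frequency bounds below are placeholders, since the patent only says the range is preset, and the Chinese words are assumed back-translations used purely as sample data; length is measured in characters.

```python
def filter_candidates(counts, max_len=4, min_freq=1, max_freq=1000):
    """Keep words of at most max_len characters whose frequency
    lies in the closed range [min_freq, max_freq]."""
    return {
        word: freq
        for word, freq in counts.items()
        if len(word) <= max_len and min_freq <= freq <= max_freq
    }


# Hypothetical candidates: screen, charging speed, phone screen size, pixel.
raw = {"屏幕": 4, "充电速度": 1, "手机屏幕尺寸": 1, "像素": 3}
kept = filter_candidates(raw, max_len=4, min_freq=1, max_freq=3)
```

In this toy run the six-character word is dropped for length and the word with frequency 4 is dropped for falling outside the assumed range.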
Step 6: compute the similarity of every pair of feature words in the pre-candidate set, and merge any two feature words whose similarity exceeds a threshold.
The similarity of the feature words in the pre-candidate set is computed as follows: the information content of each feature word in the pre-candidate set is computed based on HowNet, and the similarity of every pair of feature words in the pre-candidate set is then calculated.
Feature words are merged as follows: two feature words whose similarity exceeds the threshold are merged into one feature word, namely the one of the two with the higher frequency.
Embodiment 1
The following reviews are selected for analysis from the review text of a certain mobile phone on a certain e-commerce website:
A. "The mobile phone feels good, the sound quality is good, charging is fast, the same as the one my best friend bought."
B. "The mobile phone pixels are very good, fingerprint unlock is super fast, and the quality is also good."
C. "The mobile phone screen is big enough, the pixels are high, the performance is good, the customer service attitude is super good, I really like it, and next time I buy a phone I will come to this shop again."
D. "After using it for a while: the screen size is suitable, the feel is good, the earphone sound quality is very good, the volume is loud enough, very good, and the battery is also durable."
E. "Logistics is very fast, the phone screen is suitable, the clarity is very satisfying, the pixels are high, and the customer service is very good."
Each review is split into sentences at punctuation marks and preprocessed with HanLP, for example: "mobile phone/n pixel/n very good/a fingerprint/n unlock/v super/d fast/a quality/n also/d good/a", where n denotes a noun, a an adjective, v a verb, and d an adverb.
A brief example of using HanLP for preprocessing:
import java.util.List;
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.tokenizer.NLPTokenizer;
List<Term> termList = NLPTokenizer.segment(sentence);
Of the five examples A to E chosen above, each sentence in A, B, and C is selected as a feature sentence.
All texts in the examples are preprocessed:
"mobile phone/n, feel/n, good/a, sound quality/n, good/a, charging/vi, speed/n, fast/a, and/cc, best friend/nz, buy/v, 的/ude1, same/uyy".
"mobile phone/n, pixel/n, very good/a, fingerprint/n, unlock/v, super/d, fast/a, quality/n, also/d, good/a".
"mobile phone/n, screen/n, enough/v, big/a, pixel/n, high/a, performance/n, good/a, customer service/n, attitude/n, super/d, good/a, super/b, like/vi, next time/t, buy/v, mobile phone/n, also/d, come/vf, this/rzv, shop/q".
"use/v, 了/ule, a period/mq, time/n, 了/ule, screen/n, size/n, suitable/a, feel/n, good/a, earphone/n, sound quality/n, very/d, good/a, volume/n, enough/v, big/a, very/d, good/a, battery/n, also/d, durable/a".
"logistics/n, very/d, fast/a, mobile phone/n, screen/n, suitable/a, clarity/n, very/d, satisfied/v, pixel/n, high/a, customer service/n, very/d, good/a".
The POS sequence formal models of examples A, B, C, D, and E can be expressed as follows, respectively (sentences that contain no feature word are marked with POS tags only):
A: {feature1/n feature2/n AF1/a, Pos: 1,2}, {feature/n AF1/a, Pos: 1}, {feature1/vi feature2/n AF1/a, Pos: 1,2}, {/cc, /nz, /v, /ude1, /uyy}.
B: {feature1/n feature2/n AF1/a, Pos: 1,2}, {feature1/n feature2/v AF1/d AF2/a, Pos: 1,2}, {feature/n AF1/d AF2/a, Pos: 1}.
C: {feature1/n feature2/n AF1/v AF2/a, Pos: 1,2}, {feature/n AF1/a, Pos: 1}, {feature/n AF1/a, Pos: 1}, {BF1/n feature/n AF1/d AF2/a, Pos: 1,2}, {/b, /vi}, {/t, /v, /n, /d, /v, /rzv, /q}.
D: {feature/n AF1/a, Pos: 1}, {feature/n AF1/a, Pos: 1}, {feature1/n feature2/n AF1/d AF2/a, Pos: 1,2}, {feature/n AF1/v AF2/a, Pos: 1}, {/d, /a}, {feature/n AF1/d AF2/a, Pos: 1}.
E: {feature/n AF1/d AF2/a, Pos: 1}, {feature1/n feature2/n AF1/a, Pos: 1,2}, {feature/n AF1/d AF2/v, Pos: 2}, {feature/n AF1/a, Pos: 1}, {feature/n AF1/d AF2/a, Pos: 1}.
After matching against the sample POS sequences, the feature words in the pre-candidate set and their frequencies are: {mobile phone screen: 2, sound quality: 1, charging speed: 1, mobile phone pixel: 1, fingerprint unlock: 1, quality: 1, pixel: 2, performance: 1, customer service attitude: 1, screen: 1, earphone sound quality: 1, volume: 1, battery: 1, logistics: 1, phone screen: 1, clarity: 1, customer service: 1}.
The following rule is then applied: if one word is contained in another word, the shorter word is used as the feature word; that is, if word1.contains(word2), word2 is retained as the feature word, and the frequencies of the absorbed words are added to it. After this preliminary processing of the pre-candidate set, the result is: {screen: 4, sound quality: 2, charging speed: 1, pixel: 3, fingerprint unlock: 1, quality: 1, performance: 1, customer service: 2, volume: 1, battery: 1, logistics: 1, clarity: 1}.
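The containment rule can be sketched as below. This is an illustrative reading of the rule, assuming that the frequency of a longer word is added to the shorter word it contains (as the example totals suggest); the function name and the toy data are hypothetical.

```python
def collapse_contained(counts):
    """Fold each word into a shorter kept word that it contains,
    summing frequencies; words with no contained base are kept."""
    result = {}
    # Visit shorter words first so that bases exist before longer words.
    for word in sorted(counts, key=len):
        base = next((kept for kept in result if kept in word), None)
        if base is None:
            result[word] = counts[word]
        else:
            result[base] += counts[word]
    return result


raw = {"phone screen": 2, "screen": 1, "pixel": 2, "phone pixel": 1, "battery": 1}
collapsed = collapse_contained(raw)
```

Plain substring tests suit Chinese, where words are short character strings; on English text the same test could match inside unrelated words, so a token-level comparison would be safer there.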
The basic record format of the HowNet dictionary is:
Word: W_C=
Example: E_C=
Part of speech: G_C=
Concept definition (sense): DEF=
An example HowNet record is as follows:
Basic concepts in HowNet: a sememe is the basic unit used to describe a sense; a sense is one of the different meanings of a word.
Suppose sense n_1 has n sememes N_1 = {P_11, P_12, ..., P_1n} and sense n_2 has m sememes N_2 = {P_21, P_22, ..., P_2m}. des(P) is the number of descendant sememes of sememe P, and max(P) is the number of sememes in the sememe system to which the sememe tree of P belongs, that is, the sememe sample space. We select the 2216 sememes contained in the entity, event, attribute, attribute-value, and secondary-feature classes of HowNet as the sample space. The information content of sememe P is computed as:
$$IC(P) = -\log\frac{des(P) + 1}{max(P)}$$
The similarity of two sememes depends on what they have in common and on what distinguishes them. For the common part: on a sememe tree, let the nearest common ancestor node of sememes P1 and P2 be Pa; then Pa is the minimal commonality of P1 and P2, and the sememe similarity is computed as:
$$Sim_{ori}(P_1, P_2) = \frac{IC(P_a)}{IC(P_1) + IC(P_2)}$$
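The two formulas above can be tried out on a toy sememe tree. The sketch below is only an illustration: the tree and its node names are invented and are not HowNet data, max(P) is taken to be the total number of nodes in the toy tree, and the natural logarithm is assumed because the text does not state a base.

```python
import math

# Toy sememe tree as child -> parent; hypothetical names, not HowNet data.
PARENT = {
    "attribute": None,
    "sound": "attribute",
    "appearance": "attribute",
    "volume": "sound",
    "timbre": "sound",
}


def ancestors(p):
    """Return the ancestors of p, nearest first."""
    chain = []
    while PARENT[p] is not None:
        p = PARENT[p]
        chain.append(p)
    return chain


def descendants(p):
    """Count the sememes that have p as an ancestor: des(P)."""
    return sum(1 for q in PARENT if p in ancestors(q))


def ic(p):
    """Information content: -log((des(P) + 1) / max(P))."""
    return -math.log((descendants(p) + 1) / len(PARENT))


def nearest_common_ancestor(p1, p2):
    chain1 = [p1] + ancestors(p1)
    for node in [p2] + ancestors(p2):
        if node in chain1:
            return node
    return None


def sim_ori(p1, p2):
    """Sememe similarity: IC(Pa) / (IC(P1) + IC(P2))."""
    denom = ic(p1) + ic(p2)
    if denom == 0:
        return 0.0
    return ic(nearest_common_ancestor(p1, p2)) / denom
```

The root has information content 0 because it covers the whole sample space, so two sememes that share only the root score 0. With the formula as written, two identical sememes score 0.5.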
By computing the information content of each sememe in senses n1 and n2, the similarity between the two senses can be obtained. Sim_L(N1, N2) is the set similarity, which equals the arithmetic mean of the similarities of its element pairs; C1 and C2 are the numbers of records in senses n1 and n2, respectively. The similarity between senses is computed as:
$$Sim_{ite}(n_1, n_2) = Sim_L(N_1, N_2) \cdot \frac{\min(C_1, C_2)}{C_1 C_2}$$
For two words w1 and w2, suppose w1 has k senses, w1 = (n11, n12, ..., n1k), and w2 has r senses, w2 = (n21, n22, ..., n2r). The similarity of words w1 and w2 can then be obtained from the sense similarities computed above by the following formula:
$$Sim_{wor}(w_1, w_2) = \frac{\sum_{i=1}^{k} \sum_{j=1}^{r} Sim_{ite}(n_{1i}, n_{2j})}{k\,r}$$
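The word-level formula is an average of sense similarities over all sense pairs. A minimal sketch follows, assuming the sense similarity is supplied as a function; the lookup table used here is hypothetical and stands in for values that would come from HowNet.

```python
def sim_wor(senses1, senses2, sim_ite):
    """Average sim_ite over every pair of senses (one from each word)."""
    if not senses1 or not senses2:
        return 0.0
    total = sum(sim_ite(a, b) for a in senses1 for b in senses2)
    return total / (len(senses1) * len(senses2))


# Hypothetical sense similarities for a word with two senses
# compared against a word with one sense.
TABLE = {("s1", "t1"): 0.4, ("s2", "t1"): 0.2}
score = sim_wor(["s1", "s2"], ["t1"], lambda a, b: TABLE[(a, b)])
```

Here `score` is (0.4 + 0.2) / (2 × 1), which is approximately 0.3.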
The above formulas are used to compute the pairwise similarity of the words in the pre-candidate set; the results are as follows:
Based on a comparison of the similarity values, a similarity threshold β is set; here we assume β = 0.310. Because the similarity between the feature words "sound quality" and "volume" exceeds the threshold β, the two are merged into the word with the higher frequency, "sound quality", and its frequency becomes the sum of the two frequencies. The resulting feature-word set is {screen: 4, sound quality: 3, charging speed: 1, pixel: 3, fingerprint unlock: 1, quality: 1, performance: 1, customer service: 2, battery: 1, logistics: 1, clarity: 1}.
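The threshold-based merge can be sketched as follows. The similarity values in the example are hypothetical, since the similarity table is not reproduced in the text; only the threshold 0.310 and the frequencies come from the embodiment. Processing pairs from most to least similar is a design choice of this sketch, not something the patent specifies.

```python
def merge_similar(counts, similarity, threshold):
    """Merge word pairs whose similarity exceeds the threshold.

    The higher-frequency word of each pair is kept and receives
    the sum of both frequencies.
    """
    merged = dict(counts)
    # Handle the most similar pairs first.
    for (w1, w2), score in sorted(similarity.items(), key=lambda kv: -kv[1]):
        if score <= threshold or w1 not in merged or w2 not in merged:
            continue
        keep, drop = (w1, w2) if merged[w1] >= merged[w2] else (w2, w1)
        merged[keep] += merged.pop(drop)
    return merged


counts = {"sound quality": 2, "volume": 1, "battery": 1}
similarity = {  # hypothetical scores for illustration only
    ("sound quality", "volume"): 0.35,
    ("sound quality", "battery"): 0.05,
}
result = merge_similar(counts, similarity, threshold=0.310)
```

With these inputs, "volume" is folded into "sound quality", whose frequency becomes 3, matching the merge described in the embodiment.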
The above is only a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments. The invention may be used in various other combinations, modifications, and environments, and may be modified within the scope of the concept described herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (9)

1. A method for extracting product attribute feature words from consumer reviews based on text analysis, characterized by comprising:
determining a target product and obtaining the review data of the target product;
preprocessing the review data;
obtaining part-of-speech (POS) sequence samples from the preprocessed review data;
matching the POS sequence samples against all review data, extracting feature words from the review data according to the position of the feature word in the formal representation model of the POS sequence samples, and recording the frequency of each feature word, all feature words forming a feature-word pre-candidate set; and
computing the similarity of every pair of feature words in the pre-candidate set, and merging any two feature words whose similarity exceeds a threshold.
2. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized in that the review data of the target product is obtained by using a crawler algorithm to crawl the review data of the target product from a preset website.
3. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized in that the review data is preprocessed by:
splitting each review into multiple sentences at punctuation marks;
segmenting each sentence into individual words; and
tagging each word with its part of speech.
4. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 3, characterized in that preprocessing of the review data further includes removing stop words.
5. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized in that the POS sequence samples are obtained as follows:
a product review sentence that contains a product attribute feature word is defined as a feature sentence, and preprocessed feature sentences are chosen as POS sequence samples;
the formal representation model of a POS sequence sample is:
(BF3, BF2, BF1, feature_i, AF1, AF2, AF3, Pos: i)
where feature_i is a feature word, BF_i is the i-th word before the feature word, AF_i is the i-th word after the feature word, and Pos is the position of the feature word in the feature sentence.
6. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized by further comprising a step of preprocessing the feature-word pre-candidate set:
checking each feature word in the pre-candidate set against a preset rule; if it satisfies the rule, retaining the feature word, otherwise deleting it.
7. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 6, characterized in that the preset rule is: the word is at most four characters long, and the frequency of the word lies within a preset range.
8. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized in that the similarity of the feature words in the pre-candidate set is computed as follows: the information content of each feature word in the pre-candidate set is computed based on HowNet, and the similarity of every pair of feature words in the pre-candidate set is then calculated.
9. The method for extracting product attribute feature words from consumer reviews based on text analysis according to claim 1, characterized in that feature words are merged as follows: two feature words whose similarity exceeds the threshold are merged into one feature word, namely the one of the two with the higher frequency.
CN201610580612.1A 2016-07-21 2016-07-21 The extracting method of item property Feature Words in consumer reviews based on text analyzing Pending CN106250365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610580612.1A CN106250365A (en) 2016-07-21 2016-07-21 The extracting method of item property Feature Words in consumer reviews based on text analyzing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610580612.1A CN106250365A (en) 2016-07-21 2016-07-21 The extracting method of item property Feature Words in consumer reviews based on text analyzing

Publications (1)

Publication Number Publication Date
CN106250365A true CN106250365A (en) 2016-12-21

Family

ID=57603270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610580612.1A Pending CN106250365A (en) 2016-07-21 2016-07-21 The extracting method of item property Feature Words in consumer reviews based on text analyzing

Country Status (1)

Country Link
CN (1) CN106250365A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948141A (en) * 2017-12-21 2019-06-28 北京京东尚科信息技术有限公司 A kind of method and apparatus for extracting Feature Words
CN109977198A (en) * 2019-04-01 2019-07-05 北京百度网讯科技有限公司 Establish method and apparatus, the hardware device, computer-readable medium of mapping relations
CN110096618A (en) * 2019-05-10 2019-08-06 北京友普信息技术有限公司 A kind of film recommended method based on fractional dimension sentiment analysis
CN111275521A (en) * 2020-01-16 2020-06-12 华南理工大学 Commodity recommendation method based on user comment and satisfaction level embedding
CN113378578A (en) * 2021-05-08 2021-09-10 重庆航天信息有限公司 Food and medicine public opinion analysis method
CN116402049A (en) * 2023-06-06 2023-07-07 摩尔线程智能科技(北京)有限责任公司 Method and device for generating decorated text set and image enhancer and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN103778214A (en) * 2014-01-16 2014-05-07 北京理工大学 Commodity property clustering method based on user comments
CN105243129A (en) * 2015-09-30 2016-01-13 清华大学深圳研究生院 Commodity property characteristic word clustering method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
李俊等: "面向电子商务网站的产品属性提取算法", 《小型微型计算机系统》 *
林岚岚: "基于语法模式的评论特征词提取", 《广东水利电力职业技术学院学报》 *
栗春亮等: "中文产品评论中属性词抽取方法研究", 《计算机工程》 *
胡龙茂: "中文在线评论中产品特征抽取研究", 《电脑知识与技术》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948141A (en) * 2017-12-21 2019-06-28 北京京东尚科信息技术有限公司 A kind of method and apparatus for extracting Feature Words
CN109977198A (en) * 2019-04-01 2019-07-05 北京百度网讯科技有限公司 Establish method and apparatus, the hardware device, computer-readable medium of mapping relations
CN110096618A (en) * 2019-05-10 2019-08-06 北京友普信息技术有限公司 A kind of film recommended method based on fractional dimension sentiment analysis
CN110096618B (en) * 2019-05-10 2021-06-15 北京友普信息技术有限公司 Movie recommendation method based on dimension-based emotion analysis
CN111275521A (en) * 2020-01-16 2020-06-12 华南理工大学 Commodity recommendation method based on user comment and satisfaction level embedding
CN111275521B (en) * 2020-01-16 2022-06-14 华南理工大学 Commodity recommendation method based on user comment and satisfaction level embedding
CN113378578A (en) * 2021-05-08 2021-09-10 重庆航天信息有限公司 Food and medicine public opinion analysis method
CN116402049A (en) * 2023-06-06 2023-07-07 摩尔线程智能科技(北京)有限责任公司 Method and device for generating decorated text set and image enhancer and electronic equipment
CN116402049B (en) * 2023-06-06 2023-08-22 摩尔线程智能科技(北京)有限责任公司 Method and device for generating decorated text set and image enhancer and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161221
