CN104484336B - Chinese comment analysis method and system - Google Patents
- Publication number
- CN104484336B (application CN201410663427.XA)
- Authority
- CN (China)
- Prior art keywords
- comment
- analysis
- user
- user comment
- control centre
- Legal status (as listed by Google Patents; an assumption, not a legal conclusion)
- Active
Classifications
- G06F40/30: Semantic analysis (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F40/00: Handling natural language data)
- G06F16/951: Indexing; web crawling techniques (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F16/00: Information retrieval; database structures therefor; file system structures therefor > G06F16/90: Details of database functions independent of the retrieved data types > G06F16/95: Retrieval from the web)
Abstract
The present invention is applied to collecting a Chinese "pseudo-comment" corpus. It discloses a Chinese comment analysis method that analyses users' Chinese comments to determine whether they can be used as corpus material. A user submits a comment to a website; the website front end sends an analysis request to a control centre; the control centre passes the comment to an analysis component; the analysis component performs topic analysis on it; a Chinese word segmentation server performs word segmentation and part-of-speech tagging; the analysis component then performs syntactic analysis and sentiment analysis in turn; and a data center saves the analysis conclusion into a user comment table. With the method provided by the invention, the control centre can directly exclude unqualified material through topic analysis, and the analysis component performs syntactic analysis followed by sentiment orientation analysis on each user comment, which effectively yields a sentiment orientation conclusion for the Chinese comment and improves the accuracy of the analysis system. The administrator then only needs to browse comments whose orientation is positive to determine whether they meet the requirements.
Description
【Technical field】
The present invention relates to a Chinese comment analysis method, and in particular to an analysis method and system used when collecting promotional Chinese "pseudo-comments".
【Background technology】
China's 12th Five-Year informatization plan explicitly sets the internet information development goals of "improving network public-opinion monitoring capability" and "the ability to monitor, manage and control harmful network information", and plans to "establish technical support systems for internet information such as inspection and evaluation, and monitoring and early warning". Network public opinion and internet information monitoring have thus become important tasks at the level of national information strategy. One of the key technologies involved is sentiment analysis, which is also one of the key technologies of the present invention.
Sentiment analysis, also known as opinion mining, classifies the sentiment orientation of a text by mining the subjective information it contains, such as viewpoints, opinions, moods and tastes. Sentiment has a broad meaning: it can be a person's judgement of a product or of society, or an aesthetic attitude. The sentiment orientation of a text is the orientation the text reflects together with its intensity; different classification criteria apply depending on the purpose.
Besides internet public-opinion monitoring, sentiment analysis is widely used in many industries that bear on people's livelihood, such as life information services and medical services. Users look up comments on related products online and compare them before making a final purchase decision; healthcare systems assess patients' attitudes in order to provide better prescriptions. This project focuses on the application of text sentiment analysis in e-commerce.
Comment spam is ubiquitous on the internet, for example in online communities, in blogs and on items in e-commerce websites, and each kind has its own characteristics. E-commerce websites have a particular kind: some comments call a good product or service bad, while others call a bad product or service good. Both kinds are collectively referred to as "pseudo-comments", a type of comment spam. In practice both are very harmful: the former damages the interests of merchants, and the latter damages the interests of consumers. However, pseudo-comments are mixed in with genuine comments and are difficult to distinguish manually.
Identifying pseudo-comments requires text sentiment analysis; technically it is a form of automatic text classification. The usual way to obtain a training dataset (also called a corpus) for text classification is manual annotation. However, "pseudo-comments" cannot be recognised by people, so they cannot be annotated by experts.
We surveyed existing opinion-mining corpora. The Blog Track of TREC (Text Retrieval Conference), the NTCIR MOAT evaluation and the COAE series of Chinese opinion analysis evaluations provide Chinese opinion-mining corpora of a certain scale, and many research institutions and individuals also provide opinion-mining corpora of some scale. So far, however, we have not found a corpus dedicated to Chinese pseudo-comment detection.
To address the difficulty of obtaining pseudo-comments, Ott et al. assigned 400 HITs (Human Intelligence Tasks) on the Amazon Mechanical Turk platform and collected 400 deceptive spam comments (tendentious "pseudo-comments"); their experimental conclusions show that crowdsourcing is effective. Unfortunately, no such platform exists in China, and domestic users are unlikely to look for work on Amazon Mechanical Turk.
There is currently neither a Chinese corpus for analysing fake product comments nor a related website for collecting Chinese comment corpora. To obtain a "pseudo-comment" corpus, we need to develop our own platform similar to Amazon Mechanical Turk. Existing research and practice offer this project many ideas and technical groundwork to draw on, but further integration and improvement are needed.
【The content of the invention】
The object of the present invention is to overcome the above deficiencies of the prior art by providing a Chinese comment analysis method and system, aiming to solve the technical problems that, in the prior art, pseudo-comments cannot be distinguished automatically and sentiment orientation analysis of website comments is inaccurate.
To achieve the above object, the present invention proposes a Chinese comment analysis method that analyses the Chinese comments submitted by users. Its specific steps are as follows:
A) A user submits a comment to the website. After the website front end has tidied up the user's comment, it transfers the tidied comment to the data center and sends an analysis request to the control centre;
B) After receiving the user comment, the data center records it in a user comment table and adds, for each user comment, an analysis flag indicating whether it has been analysed;
C) After receiving the request, the control centre actively connects to the data center, and the data center transfers all user comments whose analysis flag reads "not analysed" to the control centre;
D) After receiving the user comments, the control centre passes them to the analysis component;
E) After receiving a user comment, the analysis component performs topic analysis on it. If the topic of the comment is related to the product it comments on, the comment is passed to the Chinese word segmentation server and the method goes to step F); if the topic is unrelated to the product, an analysis conclusion of "topic unrelated" is generated directly and the method goes to step H);
F) After receiving the user comment, the Chinese word segmentation server performs word segmentation and part-of-speech tagging on it and returns the POS-tagged comment to the analysis component;
G) After receiving the POS-tagged comment, the analysis component performs syntactic analysis and sentiment analysis in turn, obtains an analysis conclusion on the comment's sentiment orientation, and saves the conclusion to local storage;
H) The analysis component feeds the analysis conclusion back to the control centre, which, on receiving it, passes it to the data center for storage;
I) After receiving the analysis conclusion, the data center saves it into the user comment table and changes the analysis flag of the corresponding user comment to "analysed";
J) When the administrator needs to audit analysis conclusions, the administrator operates on the analysis conclusions in the data center through the supervision platform;
K) The website actively reads analysis conclusions from the data center, and when a user asks to see the conclusion on a comment, the website shows that conclusion to the user.
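The bookkeeping in steps A) to K) (a comment table with an "analysed" flag, a batch pass over unanalysed comments, and a read-back for the website) can be sketched with Python's built-in `sqlite3`. This is a minimal sketch under stated assumptions: the table name `user_comment`, its columns and the `analyze` callback are hypothetical, the sockets between components are omitted, and the callback merely stands in for steps E) to H).

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical callback type: (product_type, comment_text) -> analysis conclusion.
Analyzer = Callable[[str, str], str]


def create_data_center() -> sqlite3.Connection:
    """Step B): a user comment table with an 'analysed' flag per comment (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user_comment (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               product_type TEXT NOT NULL,
               text TEXT NOT NULL,
               analyzed INTEGER NOT NULL DEFAULT 0,
               conclusion TEXT)"""
    )
    return conn


def submit_comment(conn: sqlite3.Connection, user_id: str,
                   product_type: str, text: str) -> int:
    """Steps A)-B): store the tidied comment (here: whitespace stripped) as not analysed."""
    cur = conn.execute(
        "INSERT INTO user_comment (user_id, product_type, text) VALUES (?, ?, ?)",
        (user_id, product_type, text.strip()),
    )
    conn.commit()
    return cur.lastrowid


def handle_analysis_request(conn: sqlite3.Connection, analyze: Analyzer) -> int:
    """Steps C)-I): analyse every comment still flagged as not analysed."""
    rows = conn.execute(
        "SELECT id, product_type, text FROM user_comment WHERE analyzed = 0"
    ).fetchall()
    for comment_id, product_type, text in rows:
        conclusion = analyze(product_type, text)  # stands in for steps E)-H)
        conn.execute(
            "UPDATE user_comment SET conclusion = ?, analyzed = 1 WHERE id = ?",
            (conclusion, comment_id),
        )
    conn.commit()
    return len(rows)


def read_conclusion(conn: sqlite3.Connection, comment_id: int) -> Optional[str]:
    """Step K): the website reads the stored conclusion to show the user."""
    row = conn.execute(
        "SELECT conclusion FROM user_comment WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0] if row else None
```

Calling `handle_analysis_request` a second time processes nothing, because step I) has already flipped the flag; this is what lets the control centre pull only new comments on each request.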
Preferably, step G) comprises the following specific steps:
G1) After receiving the user comment from the Chinese word segmentation server, the analysis component performs syntactic analysis on it using regular-expression-based matching, forming the phrases in the comment and combining them into different short clauses, and obtains a syntactic analysis conclusion;
G2) According to the sentiment analysis resources, judge the sentiment polarity of the adjectives, verbs, nouns and emoticons in the combined clauses, obtaining a preliminary orientation conclusion for the user comment;
G3) According to the sentiment analysis resources, perform sentiment tagging on the adverbs in the POS-tagged clauses and, based on the preliminary orientation conclusion, obtain a sentiment orientation conclusion;
G4) Use syntactic relations to find the sentiment object evaluated by each sentiment phrase, forming several <sentiment object, sentiment phrase> pairs. Different weights are assigned to different sentiment objects, and a weighting method yields the sentiment conclusion for the whole user comment. When the sentiment conclusion has a positive orientation, the analysis component generates the analysis conclusion "Basically meets the requirements; pending review"; when it has a negative orientation, the analysis component generates the analysis conclusion "Unqualified comment; a positive comment is required";
G5) Store the syntactic analysis conclusion and the sentiment classification result separately in local storage.
Preferably, step K) comprises the following specific steps:
K1) When the administrator needs to review analysis conclusions, the supervision platform sends an audit request to the data center;
K2) After receiving the audit request, the data center transfers the analysis conclusions of the user comments whose analysis flag reads "analysed" to the supervision platform;
K3) After receiving the analysis conclusions, the supervision platform presents them to the administrator, who reviews or modifies them;
K4) After the administrator has finished, the supervision platform generates the corresponding audit conclusion and returns it to the data center;
K5) After receiving the audit conclusion, the data center adds it to the analysis conclusion in the user comment table to form a new analysis conclusion.
Preferably, the Chinese word segmentation server is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences.
Preferably, in step A) the tidied user comment also contains the ID of the commenting user and the type of product being commented on, and in step E) the topic analysis uses a feature database for that product type to judge whether the user comment contains the corresponding product type name or product brand name.
Preferably, the website front end, control centre, analysis component and Chinese word segmentation server use Socket-based data transfer: the website front end, as a Socket client, sends request messages to the control centre, which listens as a Socket server; the analysis component, as a Socket client, sends messages to the Chinese word segmentation server, which listens as a Socket server.
Preferably, the data center uses database technology to manage the different data transferred by the website front end, the control centre and the supervision platform.
To better achieve the technical purpose of the present invention, the invention also provides a Chinese comment analysis system that uses the above Chinese comment analysis method. It comprises a website front end that interacts with users; a data center that stores user comments; a control centre connected to the website front end and the data center; a Chinese word segmentation server that segments and tags user comments; an analysis component that analyses the segmented user comments; a supervision platform that interacts with the administrator; and a local storage that stores analysis conclusions.
The website front end sends analysis requests to the control centre; the control centre passes user comments to the analysis component; the analysis component passes user comments to the Chinese word segmentation server; the Chinese word segmentation server returns the POS-tagged user comments to the analysis component; the analysis component feeds analysis conclusions back to the control centre; and the data center receives, respectively, the user comment data sent by the website front end, the analysis conclusions sent by the control centre and the audit conclusions from the supervision platform.
Beneficial effects of the present invention: compared with the prior art, the Chinese comment analysis method provided by the invention has a reasonable structure and uses the control centre and the data center to coordinate and connect the work of the components. When a user submits a comment from the foreground, the control centre can directly exclude irrelevant comments through topic analysis, so that only topic-relevant user comments proceed to the next stage. The analysis component performs syntactic analysis and then sentiment orientation analysis on each user comment, which effectively yields a sentiment orientation conclusion for the Chinese comment and improves the accuracy of the analysis system. The administrator then only needs to browse comments whose orientation is positive to determine whether they meet the requirements. This reduces the administrator's workload in handling pseudo-comments, improves the efficiency of pseudo-comment collection, and lets users learn whether the comments they submitted meet the requirements.
The features and advantages of the present invention are described in detail below through embodiments with reference to the accompanying drawings.
【Brief description of the drawings】
Fig. 1 is a flow chart of an embodiment of the present invention;
Fig. 2 is a partial flow chart of the analysis component of the embodiment of the present invention.
【Embodiment】
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood, however, that the specific embodiments described here serve only to explain the invention and are not intended to limit its scope. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
Referring to Fig. 1 and Fig. 2, an embodiment of the present invention provides a Chinese comment analysis method that analyses users' Chinese comments as corpus material. Its specific steps are as follows:
A) A user submits a comment to the website. After the website front end 1 has tidied up the user's comment, it sends an analysis request to the control centre 2 and transfers the tidied user comment to the data center 3.
The website front end 1 and the control centre 2 use Socket-based data transfer. A Socket, often called a "socket", is the endpoint through which an application sends requests to, or answers requests from, the network. Depending on how the connection is initiated and which target the local socket connects to, the connection process between sockets can be divided into three steps: server listening, client request, and connection confirmation.
Server listening: the server-side socket does not target a specific client socket; it waits for connections and monitors the network state in real time.
Client request: the client socket issues a connection request whose target is the server-side socket. The client socket must therefore first describe the server socket it wants to connect to, giving the server socket's address and port number, and then send the connection request to it.
Connection confirmation: when the server-side socket detects, in other words receives, a connection request from a client socket, it responds by creating a new thread and sending the description of the server-side socket to the client. Once the client confirms this description, the connection is established. The server-side socket remains in the listening state and continues to accept connection requests from other client sockets.
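The three steps above map onto a threaded TCP server. The sketch below uses Python's standard `socketserver` and `socket` modules; the line-based `ACK` reply, the function names and the use of port 0 (let the operating system pick a free port) are illustrative assumptions, not the protocol used in the embodiment.

```python
import socket
import socketserver
import threading


class AnalysisRequestHandler(socketserver.StreamRequestHandler):
    """Connection confirmation: runs in a new thread for each accepted client."""

    def handle(self) -> None:
        line = self.rfile.readline().decode("utf-8").strip()
        # Hypothetical reply format: acknowledge the request so the client knows it arrived.
        self.wfile.write(f"ACK {line}\n".encode("utf-8"))


def start_listener(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    """Server listening: bind, then wait for any client (port 0 = OS-chosen free port)."""
    server = socketserver.ThreadingTCPServer((host, port), AnalysisRequestHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(host: str, port: int, message: str) -> str:
    """Client request: the client must know the server's address and port before connecting."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((message + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

For example, `server = start_listener()` followed by `send_request(*server.server_address, "ANALYZE")` returns `"ACK ANALYZE"`; call `server.shutdown()` and `server.server_close()` when done. `ThreadingTCPServer` starts one thread per accepted connection, matching the "new thread" in the connection-confirmation step, while the listening socket keeps accepting other clients.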
In this structure, the website front end 1 acts as a Socket client that sends request messages to the control centre 2, and the control centre 2 listens as a Socket server. That is, after a user reads the relevant product information online and writes and submits a comment to the website front end 1, the website sends an analysis request over a network socket to the control centre 2, which then triggers the system to start working.
B) After receiving the user comment, the data center 3 records it in the user comment table and adds, for each user comment, an analysis flag indicating whether it has been analysed.
C) After receiving the request, the control centre 2 actively connects to the data center 3, and the data center 3 transfers all user comments whose analysis flag reads "not analysed" to the control centre 2.
D) After receiving the user comments, the control centre 2 passes them to the analysis component 4.
E) After receiving a user comment, the analysis component 4 performs topic analysis on it. If the topic of the comment is related to the product it comments on, the comment is passed to the Chinese word segmentation server 5 and the method goes to step F); if the topic is unrelated to the product, an analysis conclusion of "topic unrelated" is generated directly and the method goes to step H).
The analysis component 4 and the Chinese word segmentation server 5 also use Socket-based data transfer: the analysis component 4 sends messages to the Chinese word segmentation server 5 as a Socket client, and the Chinese word segmentation server 5 listens as a Socket server. That is, when the analysis component 4 further processes a user comment that has passed topic analysis, it must communicate with the Chinese word segmentation server 5. The server listens on a configured port number; when it receives a connection request from the analysis component 4 on that port, it establishes the connection, obtains the data and returns the processing result to the analysis component 4.
F) After receiving the user comment, the Chinese word segmentation server 5 performs word segmentation and part-of-speech tagging on it and returns the POS-tagged user comment to the analysis component 4.
The Chinese word segmentation server 5 is obtained by wrapping the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences, and works by listening on a port. It provides its service over Sockets: the analysis component 4 packages the tagging parameters it needs, the comment text and the relevant user dictionary, and sends them to the Chinese word segmentation server 5, which returns the POS-tagged text to the analysis component 4 after processing.
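Before the regular-expression matching of step G1), the tagged text returned by the segmentation server has to be turned into tokens. The sketch below assumes the common NLPIR/ICTCLAS output style of space-separated `word/tag` tokens; the exact format depends on how the DLL is called and configured, so the parser, the `x` placeholder tag and the sample tags are assumptions.

```python
from typing import List, Tuple


def parse_tagged_text(tagged: str) -> List[Tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, tag) pairs.

    Assumption: tokens are whitespace-separated and the tag follows the LAST '/',
    so a token such as '//w' (a slash tagged as punctuation) keeps its word intact.
    """
    pairs: List[Tuple[str, str]] = []
    for token in tagged.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            # No usable tag: keep the token with a hypothetical 'x' (unknown) tag.
            pairs.append((token, "x"))
        else:
            pairs.append((word, tag))
    return pairs


def to_tag_string(pairs: List[Tuple[str, str]]) -> str:
    """The tag sequence alone, as a space-separated string for regex matching."""
    return " ".join(tag for _, tag in pairs)
```

For instance, `parse_tagged_text("这/r 款/q 手机/n 很/d 好用/a")` yields five `(word, tag)` pairs whose tag string is `"r q n d a"`.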
G) After receiving the POS-tagged user comment, the analysis component 4 performs syntactic analysis and sentiment analysis in turn, obtains an analysis conclusion on the comment's sentiment orientation, and saves the conclusion in the local storage 6.
Because the NLPIR/ICTCLAS2014 segmentation system is fairly accurate at noun recognition and part-of-speech tagging but provides little knowledge about verbs, the analysis component 4 further processes the comments tagged by NLPIR/ICTCLAS2014, supplementing verb-related knowledge to improve the accuracy of verb phrase structure analysis.
H) The analysis component 4 feeds the analysis conclusion back to the control centre 2, which, on receiving it, passes it on to the data center 3 for storage.
I) After receiving the analysis conclusion, the data center 3 saves it into the user comment table and changes the analysis flag of the corresponding user comment to "analysed".
J) When the administrator needs to audit analysis conclusions, the administrator operates on the analysis conclusions in the data center 3 through the supervision platform 7.
K) The website actively reads analysis conclusions from the data center 3, and when a user asks to see the conclusion on a comment, the website shows that conclusion to the user.
In the embodiment of the present invention, the control centre 2 is the core. It listens for analysis requests from the foreground website, calls the various modules to work, and processes the requests sent by the website. The control centre 2 is designed to be easy to extend: simply adding a handler function and a call instruction extends the program without modifying other parts, so the whole analysis system can easily be extended dynamically.
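The extension mechanism described here, adding a handler function and a call instruction without touching other parts, resembles a command registry. A minimal sketch, assuming a hypothetical `"<COMMAND> <payload>"` message format and hypothetical command names:

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class ControlCentre:
    """Dispatches incoming request messages to handlers registered by command name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, command: str) -> Callable[[Handler], Handler]:
        """Decorator: a new capability is just one more registered function."""
        def decorator(func: Handler) -> Handler:
            self._handlers[command] = func
            return func
        return decorator

    def dispatch(self, message: str) -> str:
        # Hypothetical message format: "<COMMAND> <payload>".
        command, _, payload = message.partition(" ")
        handler = self._handlers.get(command)
        if handler is None:
            return f"ERROR unknown command {command}"
        return handler(payload)


centre = ControlCentre()


@centre.register("ANALYZE")
def analyze(payload: str) -> str:
    # Placeholder: a real handler would fetch unanalysed comments and call the analysis component.
    return f"analysis queued for comment {payload}"


@centre.register("PING")
def ping(payload: str) -> str:
    return "PONG"
```

Adding a new request type means writing one more decorated function; `dispatch` and the existing handlers stay unchanged.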
Specifically, step G) comprises the following specific steps:
G1) After receiving the user comment from the Chinese word segmentation server 5, the analysis component 4 performs syntactic analysis on it using regular-expression-based matching, forming the phrases in the comment and combining them into different short clauses, and obtains a syntactic analysis conclusion.
A regular expression uses a single string to describe and match a set of strings that conform to a given syntactic rule. Table 1 lists some of the regular expressions used in the embodiment of the present invention.
Table 1 Examples of regular expressions
According to the compositional features of the various phrases, the embodiment of the present invention classifies them in order to form different short clauses. Structures that need to be processed only once during recognition of the whole sentence are shown in Table 2; structures that may need to be processed repeatedly during recognition are shown in Table 3.
Table 2 Processing rules and regular expressions for phrase structures that occur once
Table 3 Processing rules and regular expressions for structures that occur multiple times
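Tables 1 to 3 are not reproduced here, so the rules below are hypothetical examples rather than the embodiment's rule set. The sketch shows the mechanism step G1) describes: regular expressions over `word/tag` tokens merge the matched words into one phrase token; once-only rules (in the spirit of Table 2) run in a single pass, and repeatable rules (in the spirit of Table 3) run until nothing changes. The tags `d` (adverb), `a` (adjective), `c` (conjunction) and the phrase tag `ap` are illustrative assumptions.

```python
import re
from typing import List, Sequence, Tuple

# A rule maps a sequence of tags to the tag of the merged phrase.
Rule = Tuple[Sequence[str], str]

# Hypothetical rules, for illustration only.
SINGLE_PASS: List[Rule] = [(("d", "a"), "ap")]            # adverb + adjective
REPEATED: List[Rule] = [(("d", "ap"), "ap"),              # stacked adverbs
                        (("ap", "c", "ap"), "ap")]        # phrase + conjunction + phrase


def rule_regex(tags: Sequence[str]) -> "re.Pattern[str]":
    """Match consecutive 'word/tag' tokens carrying exactly these tags."""
    body = " ".join(rf"([^\s/]+)/{re.escape(t)}" for t in tags)
    # (?<!\S) and (?!\S) keep matches aligned on whole tokens, so '/a' never matches '/ad'.
    return re.compile(rf"(?<!\S){body}(?!\S)")


def apply_rule(text: str, rule: Rule) -> Tuple[str, int]:
    tags, phrase_tag = rule
    # Join the matched words into one token and give it the phrase tag.
    return rule_regex(tags).subn(lambda m: "".join(m.groups()) + "/" + phrase_tag, text)


def chunk(text: str, single_pass: List[Rule] = SINGLE_PASS,
          repeated: List[Rule] = REPEATED, max_rounds: int = 10) -> str:
    for rule in single_pass:          # structures handled only once
        text, _ = apply_rule(text, rule)
    for _ in range(max_rounds):       # structures that may need several passes
        changes = 0
        for rule in repeated:
            text, n = apply_rule(text, rule)
            changes += n
        if changes == 0:
            break
    return text
```

With these rules, `chunk("手机/n 不/d 很/d 好/a")` first merges `很/d 好/a` in the single pass and then needs a repeated pass to absorb `不/d`, giving `手机/n 不很好/ap`.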
G2) According to the sentiment analysis resources, judge the sentiment polarity of the adjectives, verbs, nouns and emoticons in the combined clauses, obtaining word-level sentiment values.
G3) According to the sentiment analysis resources, perform sentiment tagging on the adverbs in the POS-tagged phrases and, based on the word-level sentiment values, obtain a revised sentiment orientation value.
In the embodiment of the present invention, the sentiment analysis resources are the Chinese sentiment lexicon provided by HowNet, the Tsinghua University sentiment weighting lexicon provided by Datatang (www.datatang.com) (partially selected), a network emoticon table, a degree adverb table, a negation word table, an adversative conjunction table, a coordinating conjunction table, a summary conjunction table and other word lists.
In step G2), the embodiment uses the Chinese sentiment lexicon, the sentiment weighting lexicon and the network emoticon table to tag the sentiment of adjectives, verbs, nouns and emoticons; then, in step G3), it tags some adverbs according to the degree adverb table and the negation word table. Degree adverbs affect only the intensity of a sentiment, whereas negation words affect its polarity: a word with positive orientation becomes negative when modified by a negation word, and a word with negative orientation becomes positive. Two phrases connected by an adversative conjunction have opposite polarities; two phrases connected by a coordinating conjunction have the same polarity; and the orientation of a phrase introduced by a summary conjunction helps infer the orientation of the whole comment.
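The rules just listed (degree adverbs scale intensity, negation flips polarity, conjunctions constrain the polarity of neighbouring clauses) can be sketched as a small lexicon-based scorer. This is an illustration, not the embodiment's algorithm: the tiny lexicons stand in for HowNet, the Tsinghua/Datatang list and the other tables, the input is assumed to be an already segmented word list, and giving an unscored clause a unit score inferred from its neighbour is an assumed way of applying the conjunction rules.

```python
import math
from typing import Dict, List, Set

# Tiny hypothetical lexicons standing in for the resources named above.
POLARITY: Dict[str, float] = {"好": 1.0, "清晰": 1.0, "喜欢": 1.0, "差": -1.0, "卡": -1.0}
DEGREE: Dict[str, float] = {"很": 1.5, "非常": 2.0, "有点": 0.5}
NEGATION: Set[str] = {"不", "没", "没有"}
ADVERSATIVE: Set[str] = {"但是", "但", "可是"}
COORDINATING: Set[str] = {"而且", "并且"}
SUMMARY: Set[str] = {"总之", "总的来说"}


def clause_score(words: List[str]) -> float:
    """G2 + G3: word polarity, scaled by degree adverbs and flipped by negation words."""
    score, multiplier = 0.0, 1.0
    for w in words:
        if w in NEGATION:
            multiplier = -multiplier          # negation changes polarity
        elif w in DEGREE:
            multiplier *= DEGREE[w]           # degree adverbs change intensity only
        elif w in POLARITY:
            score += POLARITY[w] * multiplier
            multiplier = 1.0                  # modifiers apply to the next sentiment word only
    return score


def comment_score(words: List[str]) -> float:
    """Split at conjunctions, then apply each conjunction's polarity rule."""
    clauses: List[List[str]] = [[]]
    links: List[str] = []
    for w in words:
        if w in ADVERSATIVE or w in COORDINATING or w in SUMMARY:
            links.append(w)
            clauses.append([])
        else:
            clauses[-1].append(w)
    scores = [clause_score(c) for c in clauses]
    for i, link in enumerate(links):
        left, right = scores[i], scores[i + 1]
        if link in SUMMARY and right != 0:
            return right                      # a summarising clause decides the comment
        if right == 0 and left != 0:
            # Infer an unscored clause: adversative -> opposite sign, otherwise same sign.
            sign = -1.0 if link in ADVERSATIVE else 1.0
            scores[i + 1] = sign * math.copysign(1.0, left)
    return sum(scores)
```

For example, `["屏幕", "很", "清晰", "但是", "电池", "不", "好"]` scores 1.5 + (-1.0) = 0.5, while a comment ending in a summary clause such as `["有点", "卡", "总之", "很", "喜欢"]` takes the summary clause's score, 1.5.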
G4) Use syntactic relations to find the sentiment object evaluated by each sentiment phrase, forming several <sentiment object, sentiment phrase> pairs. Different weights are assigned to different sentiment objects, and a weighting method yields the sentiment conclusion for the whole user comment. When the sentiment conclusion has a positive orientation, the analysis component 4 generates the analysis conclusion "Basically meets the requirements; pending review"; when it has a negative orientation, the analysis component 4 generates the analysis conclusion "Unqualified comment; a positive comment is required". Table 4 shows specific test cases of the embodiment of the present invention.
Table 4 Test cases for extracting <user comment, comment phrase> pairs
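Step G4) pairs each sentiment phrase with the object it evaluates and then weights the pairs. The sketch below is a simplification under stated assumptions: "nearest preceding noun" replaces the embodiment's syntactic relations, the weights and phrase scores are caller-supplied and hypothetical, and treating a zero (neutral) total like a negative one is an assumption, since the text only specifies the positive and negative cases.

```python
from typing import Callable, Dict, List, Optional, Tuple

POSITIVE_CONCLUSION = "Basically meets the requirements; pending review"
NEGATIVE_CONCLUSION = "Unqualified comment; a positive comment is required"


def extract_pairs(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each adjective (phrase) with the nearest preceding noun.

    A simple stand-in for the syntactic relation used in step G4).
    """
    pairs: List[Tuple[str, str]] = []
    last_noun: Optional[str] = None
    for word, tag in tokens:
        if tag.startswith("n"):               # n, nr, ns, nz ... all count as nouns here
            last_noun = word
        elif tag in ("a", "ap") and last_noun is not None:
            pairs.append((last_noun, word))
    return pairs


def weighted_score(pairs: List[Tuple[str, str]],
                   phrase_score: Callable[[str], float],
                   weights: Dict[str, float],
                   default_weight: float = 1.0) -> float:
    """Sum of phrase scores, each multiplied by the weight of its sentiment object."""
    return sum(weights.get(obj, default_weight) * phrase_score(phrase)
               for obj, phrase in pairs)


def conclusion(score: float) -> str:
    # Assumption: a zero (neutral) score is treated like a negative one.
    return POSITIVE_CONCLUSION if score > 0 else NEGATIVE_CONCLUSION
```

Because weights attach to objects rather than phrases, the same comment can flip conclusion when, say, battery life is weighted above the screen; that is the effect the weighting method in G4) is meant to capture.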
G5) Store the syntactic analysis conclusion and the sentiment analysis conclusion separately in the local storage 6. Because these two conclusions clearly reflect the analysis logic of the analysis component, storing them separately in the local storage 6 makes it easy for the administrator to inspect them.
Further, step K) comprises the following specific steps:
K1) When the administrator needs to review analysis conclusions, the supervision platform 7 sends an audit request to the data center 3.
K2) After receiving the audit request, the data center 3 transfers the analysis conclusions of the user comments whose analysis flag reads "analysed" to the supervision platform 7.
K3) After receiving the analysis conclusions, the supervision platform 7 presents them to the administrator, who reviews or modifies them. In the embodiment of the present invention, the administrator manually audits and confirms user comments whose sentiment orientation is positive, and gives a final, definite analysis result for off-topic comments that the control centre failed to filter out and for negative comments that the analysis component analysed inappropriately. User comments that have passed the administrator's audit can then serve as the training corpus for the whole promotional pseudo-comment analysis system.
K4) After the administrator has finished, the supervision platform 7 generates the corresponding audit conclusion and saves it to the data center 3.
K5) After receiving the audit conclusion, the data center 3 adds it to the analysis conclusion in the user comment table to form a new analysis conclusion. Because the audit conclusion is the final result of the analysis, in the embodiment of the present invention the data center 3 writes the audit conclusion directly into the analysis conclusion instead of allocating new storage space for it, which improves the storage efficiency of the data.
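Steps K2) and K5) can be sketched with Python's `sqlite3`: only comments flagged as analysed are sent for review, and the audit result overwrites the existing conclusion column rather than going into new storage. The table and column names are hypothetical, and reading "writes directly into the analysis conclusion" as an in-place overwrite is an interpretation.

```python
import sqlite3
from typing import List, Tuple


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the audit result reuses the 'conclusion' column.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_comment (
               id INTEGER PRIMARY KEY,
               text TEXT NOT NULL,
               analyzed INTEGER NOT NULL DEFAULT 0,
               conclusion TEXT)"""
    )


def comments_for_review(conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    """K2): only comments already flagged as analysed go to the supervision platform."""
    return conn.execute(
        "SELECT id, text, conclusion FROM user_comment WHERE analyzed = 1 ORDER BY id"
    ).fetchall()


def save_audit(conn: sqlite3.Connection, comment_id: int, audit: str) -> bool:
    """K5): overwrite the analysis conclusion in place; no extra column or table."""
    cur = conn.execute(
        "UPDATE user_comment SET conclusion = ? WHERE id = ? AND analyzed = 1",
        (audit, comment_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

`save_audit` returns `False` for a comment that has not been analysed yet, so an administrator cannot finalise a comment that skipped steps E) to I).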
Further, in step A) the tidied user comment also contains the ID of the commenting user and the type of product being commented on, and in step E) the topic analysis uses a feature database for that product type to judge whether the user comment contains the corresponding product type name or product brand name.
The feature data of each product type correspond one-to-one with the product types listed on the website front end; during topic analysis, the feature data of the product type given with the user comment are retrieved and used as the judgement criterion.
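The topic check of step E) can be sketched as a lookup into a per-product-type feature table followed by a keyword search. The feature entries, the `topic_gate` helper and the choice to treat an unknown product type as off topic are hypothetical; plain substring search is used because topic analysis happens before word segmentation, so the comment is still unsegmented Chinese text.

```python
from typing import Dict, List, Optional

# Hypothetical feature database: product type -> type names and brand names.
FeatureDB = Dict[str, Dict[str, List[str]]]
FEATURES: FeatureDB = {
    "手机": {"types": ["手机", "智能机"], "brands": ["华为", "小米"]},
    "笔记本电脑": {"types": ["笔记本", "电脑"], "brands": ["联想", "戴尔"]},
}

OFF_TOPIC = "topic unrelated"


def is_on_topic(product_type: str, comment: str, features: FeatureDB = FEATURES) -> bool:
    """Step E): does the raw comment mention a type name or brand name of its product type?"""
    entry = features.get(product_type)
    if entry is None:
        return False                       # unknown product type: treat as off topic
    keywords = entry["types"] + entry["brands"]
    # Substring search works on unsegmented Chinese text.
    return any(keyword in comment for keyword in keywords)


def topic_gate(product_type: str, comment: str) -> Optional[str]:
    """Return the 'topic unrelated' conclusion, or None to continue with step F)."""
    return None if is_on_topic(product_type, comment) else OFF_TOPIC
```

Switching to topical news, as the next paragraph describes, would only mean filling the same table with news-category features.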
Because the embodiment of the present invention targets users' tendentious comments on products on the network, it uses feature data for product types. The embodiment can of course also be applied to other kinds of user comments, including comments on topical news, in which case feature data for the corresponding news categories are used.
Specifically, the data center 3 uses database technology to manage the different data transferred by the website front end 1, the control centre 2 and the supervision platform 7.
Through the supervision platform 7, the administrator adds information about the products for which pseudo-comments are to be collected (such as product category and a brief introduction) to the data center 3, and the website front end 1 reads this product information from the data center 3 and displays it. A user reads the product information through the website front end 1 and submits a related comment, which the website front end 1 writes to the data center 3. The control centre 2 extracts data such as the IDs of unprocessed comments and their subject categories from the data center 3 and sends them to the analysis component 4. The analysis component 4 calls a series of modules or services to perform text analysis and feeds the analysis conclusion back to the control centre 2, which writes it to the data center 3. Through the supervision platform 7, the administrator performs a final review of the comments whose sentiment conclusion is positive, determines whether each comment is valid, and writes the validity conclusion to the data center 3. Users obtain the final analysis conclusion from the data center 3 through the website front end 1.
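The control centre's part of this flow (pull unprocessed comments, hand them to the analysis component, write conclusions back and set the analysis flag) can be sketched with in-memory objects. The classes `Comment` and `DataCentre`, the function `control_centre_run` and the `analyse` callback are hypothetical stand-ins for the data center 3, control centre 2 and analysis component 4; a real deployment would use a database and the Socket links described in the claims.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Comment:
    comment_id: int
    user_id: str
    product_type: str
    text: str
    analyzed: bool = False   # analysis flag added when the comment is recorded
    conclusion: str = ""     # analysis conclusion written back later


@dataclass
class DataCentre:
    comments: list[Comment] = field(default_factory=list)

    def unanalysed(self) -> list[Comment]:
        """Return every comment whose analysis flag is not yet set."""
        return [c for c in self.comments if not c.analyzed]

    def save_conclusion(self, comment_id: int, conclusion: str) -> None:
        """Store the conclusion and mark the comment as analysed."""
        for c in self.comments:
            if c.comment_id == comment_id:
                c.conclusion = conclusion
                c.analyzed = True
                return
        raise KeyError(comment_id)


def control_centre_run(dc: DataCentre, analyse: Callable[[Comment], str]) -> int:
    """Pull unanalysed comments, pass each to the (hypothetical) analysis
    callback, write the conclusion back. Returns the number processed."""
    pending = dc.unanalysed()
    for c in pending:
        dc.save_conclusion(c.comment_id, analyse(c))
    return len(pending)
```

Because the flag is set on write-back, a second run over the same data center processes nothing, which mirrors the role of the analysis flag in the method.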
To better achieve the technical purpose of the present invention, the invention also provides a Chinese comment analysis system that uses the above Chinese comment analysis method. The system comprises a website front end 1 that interacts with users, a data center 3 that stores user comments, a control centre 2 connected to the website front end 1 and the data center 3, a Chinese word segmentation server 5 that segments and tags user comments, an analysis component 4 that analyzes the segmented user comments, a supervision platform 7 that interacts with the administrator, and a local storage 6 that stores analysis results.
Specifically, the website front end 1 transmits analysis requests to the control centre 2; the control centre 2 transfers user comments to the analysis component 4; the analysis component 4 transfers user comments to the Chinese word segmentation server 5; the Chinese word segmentation server 5 feeds the part-of-speech-tagged user comments back to the analysis component 4; and the analysis component 4 feeds analysis conclusions back to the control centre 2. The data center 3 receives the user comment data transmitted by the website front end 1, the analysis conclusions transmitted by the control centre 2, and the review conclusions from the supervision platform 7.
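Claim 1 states that these components exchange data over Sockets, with the Chinese word segmentation server 5 listening as the Socket server and the analysis component 4 connecting as a client. A minimal sketch of that client/server split over TCP on localhost follows. The one-request-per-connection protocol (send UTF-8 text, half-close, read the reply) and the `toy_tagger` stand-in are assumptions; the actual NLPIR/ICTCLAS call is not reproduced here.

```python
import socket
import threading


def toy_tagger(text: str) -> str:
    """Hypothetical stand-in for NLPIR/ICTCLAS: split on spaces, tag every token 'x'."""
    return " ".join(f"{tok}/x" for tok in text.split())


def _recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes its write side."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def start_segmentation_server() -> tuple[socket.socket, int]:
    """Segmentation server (5): listens as the Socket server and answers one request."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0 lets the OS choose a free port
    srv.listen()
    port = srv.getsockname()[1]

    def serve_once() -> None:
        conn, _ = srv.accept()
        with conn:               # closing the connection ends the reply
            text = _recv_all(conn).decode("utf-8")
            conn.sendall(toy_tagger(text).encode("utf-8"))

    threading.Thread(target=serve_once, daemon=True).start()
    return srv, port


def request_tagging(port: int, comment: str) -> str:
    """Analysis component (4): acts as the Socket client."""
    with socket.create_connection(("127.0.0.1", port)) as cli:
        cli.sendall(comment.encode("utf-8"))
        cli.shutdown(socket.SHUT_WR)   # signal the end of the request
        return _recv_all(cli).decode("utf-8")
```

Usage: `srv, port = start_segmentation_server()`, then `request_tagging(port, "屏幕 很 清晰")` returns `"屏幕/x 很/x 清晰/x"`; call `srv.close()` afterwards. The same pattern would apply between the website front end 1 (client) and the control centre 2 (server).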
The foregoing describes merely the preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (2)
- 1. A Chinese comment analysis method, characterized in that Chinese comments submitted by users are analyzed, with the following specific steps:
  A) A user submits a comment to the website. After the website front end (1) organizes the user's comment, it transfers the organized user comment to the data center (3) and sends an analysis request to the control centre (2); the organized user comment also includes the ID of the reviewing user and the type of the product being reviewed;
  B) After the data center (3) receives the user comment, it records it in the user comment table and adds to each user comment an analysis flag indicating whether it has been analyzed;
  C) After the control centre (2) receives the request, it actively connects to the data center (3), and the data center (3) transfers all user comments whose analysis flag is "not analyzed" to the control centre (2);
  D) After the control centre (2) receives the user comments, it passes them to the analysis component (4);
  E) After the analysis component (4) receives a user comment, it performs topic analysis on it. The topic analysis uses the feature database of the product type to determine whether the user comment contains the corresponding product type name or product brand name. If the topic of the user comment is related to the product being commented on, the user comment is transferred to the Chinese word segmentation server (5) and the method proceeds to step F); if the topic is unrelated to the product, an analysis conclusion of "topic unrelated" is generated directly and the method proceeds to step H);
  F) After the Chinese word segmentation server (5) receives the user comment, it segments the comment, performs part-of-speech tagging, and returns the part-of-speech-tagged user comment to the analysis component (4). The Chinese word segmentation server (5) is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences. The website front end (1), control centre (2), analysis component (4) and Chinese word segmentation server (5) use Socket-based data transfer: the website front end (1), as a Socket client, sends request messages to the control centre (2), which listens as a Socket server; the analysis component (4), as a Socket client, sends messages to the Chinese word segmentation server (5), which listens as a Socket server;
  G) After the analysis component (4) receives the part-of-speech-tagged user comment, it performs syntactic analysis and sentiment analysis in turn, obtains an analysis conclusion on the sentiment orientation of the user comment, and delivers the analysis conclusion to the local storage (6) for storage, with the following specific steps:
  G1) After the analysis component (4) receives the user comment transmitted by the Chinese word segmentation server (5), it performs syntactic analysis on it using regular-expression-based matching, forming the phrases in the user comment and combining them into different short sentences, to obtain a syntactic analysis conclusion;
  G2) Using the sentiment analysis resources, the sentiment polarity of the adjectives, verbs, nouns and emoticons in the combined short sentences is judged, giving word-level sentiment values;
  G3) Using the sentiment analysis resources, the adverbs in the part-of-speech-tagged short sentences are sentiment-tagged and, based on the word-level sentiment values, corrected sentiment orientation values are obtained;
  G4) The sentiment object evaluated by each sentiment phrase is found through syntactic relations, forming several <sentiment object, sentiment phrase> pairs; different weights are assigned to different sentiment objects, and a weighting method gives the sentiment conclusion of the whole user comment. When the sentiment conclusion is positive, the analysis component (4) generates the analysis conclusion "basically meets requirements, pending review"; when the sentiment conclusion is negative, the analysis component (4) generates the analysis conclusion "unqualified comment, a positive comment is required";
  G5) The syntactic analysis conclusion and the sentiment classification conclusion are stored separately in the local storage (6);
  H) The analysis component (4) feeds the analysis conclusion back to the control centre (2), which, after receiving it, passes it to the data center (3) for storage;
  I) After the data center (3) receives the analysis conclusion, it saves the analysis conclusion in the user comment table and changes the analysis flag of the corresponding user comment to "marked";
  J) When the administrator needs to review analysis conclusions, the administrator operates on the analysis conclusions in the data center (3) through the supervision platform (7); the data center (3) uses database technology to manage the different data transmitted by the website front end (1), the control centre (2) and the supervision platform (7);
  K) The website reads the analysis conclusion from the data center (3) and, when a user asks to see the conclusion of a comment, displays this analysis conclusion to the user, with the following specific steps:
  K1) When the administrator needs to check analysis conclusions, the supervision platform (7) transfers a review request to the data center (3);
  K2) After the data center (3) receives the review request, it transfers the analysis conclusions of the user comments whose analysis flag is "marked" to the supervision platform (7);
  K3) After the supervision platform (7) receives the analysis conclusions, it presents them to the administrator, who checks or modifies them;
  K4) After the administrator completes the operation, the supervision platform (7) generates a corresponding review conclusion and returns it to the data center (3);
  K5) After the data center (3) receives the review conclusion, it adds the review conclusion to the analysis conclusion in the user comment table, forming a new analysis conclusion.
- 2. A Chinese comment analysis system, characterized in that it uses the Chinese comment analysis method of claim 1 and comprises a website front end (1) that interacts with users, a data center (3) that stores user comments, a control centre (2) connected to the website front end (1) and the data center (3), a Chinese word segmentation server (5) that segments and tags user comments, an analysis component (4) that analyzes the segmented user comments, a supervision platform (7) that interacts with the administrator, and a local storage (6) that stores analysis results; the website front end (1) transmits analysis requests to the control centre (2); the control centre (2) transfers user comments to the analysis component (4); the analysis component (4) transfers user comments to the Chinese word segmentation server (5); the Chinese word segmentation server (5) feeds the part-of-speech-tagged user comments back to the analysis component (4); the analysis component (4) feeds analysis conclusions back to the control centre (2); and the data center (3) receives the user comment data transmitted by the website front end (1), the analysis conclusions transmitted by the control centre (2), and the review conclusions transmitted by the supervision platform (7).
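Steps G2) to G4) of claim 1 (word-level polarity, adverb correction, then a weighted combination over <sentiment object, sentiment phrase> pairs) can be sketched as follows. The lexicons `POLARITY`, `ADVERB` and `OBJECT_WEIGHT` are hypothetical, since the patent does not publish its sentiment analysis resources, and the pairs are assumed to have already been extracted from the syntactic relations found in G1). Treating a total of exactly zero as not positive is also an assumption; the claim only defines the positive and negative cases.

```python
# Hypothetical sentiment resources; the patent's actual lexicons are not published.
POLARITY = {"好": 1.0, "清晰": 1.0, "喜欢": 1.0, "差": -1.0, "卡": -1.0, "失望": -1.0, ":)": 1.0}
ADVERB = {"很": 1.5, "非常": 2.0, "有点": 0.5, "不": -1.0}   # intensifiers and negation
OBJECT_WEIGHT = {"屏幕": 1.0, "电池": 1.5, "包装": 0.5}     # unknown objects default to 1.0

POSITIVE = "basically meets requirements, pending review"
NEGATIVE = "unqualified comment, a positive comment is required"


def phrase_score(tokens: list[str]) -> float:
    """G2 + G3: sum word-level polarities, each scaled by the adverbs before it."""
    score, modifier = 0.0, 1.0
    for tok in tokens:
        if tok in ADVERB:
            modifier *= ADVERB[tok]        # accumulate adverb corrections
        elif tok in POLARITY:
            score += modifier * POLARITY[tok]
            modifier = 1.0                 # adverbs only affect the next sentiment word
    return score


def comment_conclusion(pairs: list[tuple[str, list[str]]]) -> str:
    """G4: weighted sum over <sentiment object, sentiment phrase> pairs.
    A total of exactly 0 is treated as not positive (an assumption)."""
    total = sum(OBJECT_WEIGHT.get(obj, 1.0) * phrase_score(phrase) for obj, phrase in pairs)
    return POSITIVE if total > 0 else NEGATIVE
```

For example, with these made-up weights, `[("屏幕", ["很", "清晰"]), ("电池", ["有点", "差"])]` scores 1.5 - 0.75 = 0.75 and is classed as positive, whereas `[("电池", ["不", "好"])]` scores -1.5 and is classed as negative.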
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410663427.XA CN104484336B (en) | 2014-11-19 | 2014-11-19 | A kind of Chinese comment and analysis method and its system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410663427.XA CN104484336B (en) | 2014-11-19 | 2014-11-19 | A kind of Chinese comment and analysis method and its system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104484336A CN104484336A (en) | 2015-04-01 |
CN104484336B true CN104484336B (en) | 2017-12-19 |
Family
ID=52758877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410663427.XA Active CN104484336B (en) | 2014-11-19 | 2014-11-19 | A kind of Chinese comment and analysis method and its system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104484336B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649268A (en) * | 2016-11-30 | 2017-05-10 | 北京京东尚科信息技术有限公司 | Investigation sample judging method and system and grey list generation method and system |
CN108280560A (en) * | 2017-01-06 | 2018-07-13 | 广州市动景计算机科技有限公司 | A kind of anti-brush method and device of subject evaluation |
CN109933775B (en) * | 2017-12-15 | 2022-02-18 | 腾讯科技(深圳)有限公司 | UGC content processing method and device |
CN108170685B (en) * | 2018-01-29 | 2021-10-29 | 浙江省公众信息产业有限公司 | Text emotion analysis method and device and computer readable storage medium |
CN108256098B (en) * | 2018-01-30 | 2022-02-15 | 中国银联股份有限公司 | Method and device for determining emotional tendency of user comment |
CN108536601B (en) * | 2018-04-13 | 2022-05-31 | 腾讯科技(深圳)有限公司 | Evaluation method, device, server and storage medium |
CN109522412B (en) * | 2018-11-14 | 2021-02-26 | 鼎富智能科技有限公司 | Text emotion analysis method, device and medium |
CN110442798B (en) * | 2019-07-03 | 2021-10-08 | 华中科技大学 | Spam comment user group detection method based on network representation learning |
CN110674256B (en) * | 2019-09-25 | 2023-05-12 | 携程计算机技术(上海)有限公司 | Method and system for detecting correlation degree of comment and reply of OTA hotel |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102096680A (en) * | 2009-12-15 | 2011-06-15 | 北京大学 | Method and device for analyzing information validity |
CN102663046A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院自动化研究所 | Sentiment analysis method oriented to micro-blog short text |
CN103064971A (en) * | 2013-01-05 | 2013-04-24 | 南京邮电大学 | Scoring and Chinese sentiment analysis based review spam detection method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9201863B2 (en) * | 2009-12-24 | 2015-12-01 | Woodwire, Inc. | Sentiment analysis from social media content |
Also Published As
Publication number | Publication date |
---|---|
CN104484336A (en) | 2015-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104484336B (en) | A kind of Chinese comment and analysis method and its system | |
Guellil et al. | Social big data mining: A survey focused on opinion mining and sentiments analysis | |
CN103207913B (en) | The acquisition methods of commercial fine granularity semantic relation and system | |
Khasawneh et al. | Sentiment analysis of Arabic social media content: a comparative study | |
WO2013059487A1 (en) | System and methods for automatically detecting deceptive content | |
CN107273474A (en) | Autoabstract abstracting method and system based on latent semantic analysis | |
CN103064971A (en) | Scoring and Chinese sentiment analysis based review spam detection method | |
CN108845986A (en) | A kind of sentiment analysis method, equipment and system, computer readable storage medium | |
CN104050243B (en) | It is a kind of to search for the network search method combined with social activity and its system | |
CN111090735B (en) | Performance evaluation method of intelligent question-answering method based on knowledge graph | |
CN111080055A (en) | Hotel scoring method, hotel recommendation method, electronic device and storage medium | |
Jianqiang et al. | Combining semantic and prior polarity for boosting twitter sentiment analysis | |
Ziser et al. | Humor detection in product question answering systems | |
Saranya et al. | A Machine Learning-Based Technique with IntelligentWordNet Lemmatize for Twitter Sentiment Analysis. | |
Wu et al. | Sentiment analysis of online product reviews based on SenBERT-CNN | |
Dou et al. | Improving large-scale paraphrase acquisition and generation | |
Nguyen et al. | A corpus for aspect-based sentiment analysis in Vietnamese | |
CN112115712B (en) | Topic-based group emotion analysis method | |
Zhang et al. | Product features extraction and categorization in Chinese reviews | |
CN104933097B (en) | A kind of data processing method and device for retrieval | |
Hristova | Topic modeling of chat data: A case study in the banking domain | |
Yan et al. | Sentiment Analysis of Short Texts Based on Parallel DenseNet. | |
Sudhakaran et al. | Research directions, challenges and issues in opinion mining | |
Huang et al. | Examining bias in opinion summarisation through the perspective of opinion diversity | |
Bugueño et al. | Applying self-attention for stance classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||