CN104484336B - A Chinese comment analysis method and system

Publication number
CN104484336B
Authority
CN
China
Prior art keywords
comment
analysis
user
user comment
control centre
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410663427.XA
Other languages
Chinese (zh)
Other versions
CN104484336A (en
Inventor
郝秀兰
蒋云良
许方曲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huzhou University
Original Assignee
Huzhou University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The present invention is used to collect a corpus of Chinese "pseudo-comments" and discloses a Chinese comment analysis method that analyzes a user's Chinese comment to determine whether it can serve as corpus material. The user submits a comment to the website; the website front end sends an analysis request to the control centre; the control centre passes the comment to the analysis component; the analysis component performs topic analysis on it; the Chinese word segmentation server performs word segmentation and part-of-speech tagging; the analysis component then performs syntactic analysis and sentiment analysis in turn; and the data center saves the analysis conclusion in the user comment table. With the method provided by the invention, unqualified material can be excluded directly by topic analysis, and the analysis component performs syntactic analysis and sentiment orientation analysis on the user comment in turn, which effectively yields the sentiment orientation of a Chinese comment and improves the accuracy of the analysis system. The administrator then only needs to browse the comments whose orientation is positive to determine whether they meet the requirements.

Description

A Chinese comment analysis method and system
【Technical field】
The present invention relates to a Chinese comment analysis method, and in particular to an analysis method and system used when collecting Chinese promotional "pseudo-comments".
【Background】
China's "Twelfth Five-Year" informatization plan explicitly sets the Internet information development goals of "improving the capability to monitor online public opinion" and "the capability to monitor and control harmful online information", and plans to "establish technical support systems for Internet information such as detection and evaluation and monitoring and early warning". Monitoring of online public opinion and Internet information has thus become an important task at the level of national information strategy. One of the key technologies involved is sentiment analysis (Sentiment Analysis), which is also one of the key technologies of the present invention.
Sentiment analysis, also known as opinion mining (Opinion Mining), refers to mining subjective information such as viewpoints, opinions, moods, and preferences from text and classifying the sentiment orientation of the text. Sentiment has a broad meaning: it may be a person's judgment of a product or of society, or an aesthetic attitude. The sentiment orientation of a text refers to the tendency the text reflects and the strength of that tendency, and different classification standards apply for different purposes.
Besides Internet public-opinion monitoring, sentiment analysis is widely used in many industries that bear on people's livelihood, such as life information services and medical services. Users look up reviews of a product online and compare them before making a final purchase decision; healthcare systems assess the attitude of patients in order to provide better prescriptions. This project is concerned with the application of text sentiment analysis in e-commerce.
Spam comments are ubiquitous on the Internet, for example in communities, in blogs, and on product pages of e-commerce websites, and each kind has its own characteristics. E-commerce websites contain some special comments: some describe a good product or service as bad, and some describe a bad product or service as good. These two kinds are collectively called "pseudo-comments", and pseudo-comments are one kind of spam comment. In practice both kinds are very harmful: the former damages the interests of merchants, and the latter damages the interests of consumers. Pseudo-comments are mixed in with genuine comments, however, and are difficult to tell apart manually.
Identifying pseudo-comments relies on text sentiment analysis, which is in essence a kind of automatic text classification. The training data set (also called the corpus) for text classification is usually obtained by manual annotation. But pseudo-comments cannot be recognized by people, which means they cannot be labeled by expert annotation.
We surveyed existing opinion mining corpora. The Blog Track set up by TREC (Text Retrieval Conference), the MOAT task of the NTCIR evaluation, and the COAE series of Chinese sentiment classification evaluations each provide opinion mining corpora of a certain scale. In addition, many research institutions and individuals also provide opinion mining corpora of a certain scale. So far, however, we have not found a corpus dedicated to detecting Chinese pseudo-comments.
To address the difficulty of obtaining pseudo-comments, Ott et al. assigned 400 HITs (Human-Intelligence Tasks) on the Amazon Mechanical Turk platform and collected 400 deceptive spam comments (opinionated "pseudo-comments"). Their experiments showed that crowdsourcing is effective. Unfortunately, there is no such platform in China, and domestic users are unlikely to look for work on the Amazon Mechanical Turk platform.
At present there is neither a Chinese corpus for analyzing spam product comments nor a website for collecting a Chinese comment corpus. To obtain a "pseudo-comment" corpus, we need to develop our own platform similar to Amazon Mechanical Turk. Existing research and practice provide many ideas and technical groundwork that this project can draw on, but they still need further integration and improvement.
【Summary of the invention】
The object of the invention is to overcome the above deficiencies of the prior art by providing a Chinese comment analysis method and system, aiming to solve the technical problems that the prior art cannot automatically distinguish pseudo-comments and that sentiment orientation analysis of website comments is inaccurate.
To achieve the above object, the invention proposes a Chinese comment analysis method that analyzes the Chinese comments submitted by users. The specific steps are as follows:
A) The user submits a comment to the website. After the website front end organizes the comment, it transfers the organized user comment to the data center and sends an analysis request to the control centre;
B) After receiving the user comment, the data center records it in the user comment table and adds to each user comment an analysis flag indicating whether it has been analyzed;
C) After receiving the request, the control centre actively connects to the data center, and the data center transfers to the control centre all user comments whose analysis flag is "not analyzed";
D) After receiving the user comments, the control centre passes them to the analysis component;
E) After receiving a user comment, the analysis component performs topic analysis on it. If the topic of the user comment is related to the product it comments on, the user comment is transferred to the Chinese word segmentation server and the method goes to step F); if the topic is unrelated to the product, the analysis conclusion "topic unrelated" is generated directly and the method goes to step H);
F) After receiving the user comment, the Chinese word segmentation server performs word segmentation and part-of-speech tagging on it and returns the tagged user comment to the analysis component;
G) After receiving the user comment with part-of-speech tags, the analysis component performs syntactic analysis and sentiment analysis in turn, obtains the analysis conclusion on the sentiment orientation of the user comment, and stores that conclusion in the local storage;
H) The analysis component feeds the analysis conclusion back to the control centre, which, after receiving it, forwards it to the data center for storage;
I) After receiving the analysis conclusion, the data center saves it in the user comment table and changes the analysis flag of the corresponding user comment to "analyzed";
J) When the administrator needs to review analysis conclusions, the administrator operates on the analysis conclusions in the data center through the supervision platform;
K) The website actively reads analysis conclusions from the data center, and when a user asks for the conclusion on a comment, the website displays that analysis conclusion to the user.
Preferably, step G) comprises the following specific steps:
G1) After receiving the user comment from the Chinese word segmentation server, the analysis component performs syntactic analysis on it using a matching method based on regular expressions: the words in the user comment are assembled into phrases, the phrases are combined into clauses, and the syntactic analysis conclusion is obtained;
G2) According to the sentiment analysis resources, the sentiment polarity of the adjectives, verbs, nouns, and emotional symbols in the combined clauses is judged, giving a preliminary orientation conclusion for the user comment;
G3) According to the sentiment analysis resources, sentiment labeling is applied to the adverbs in the clauses with part-of-speech tags, and the sentiment orientation conclusion is derived from the preliminary orientation conclusion;
G4) The sentiment object evaluated by each sentiment phrase is found through syntactic relations, forming a number of <sentiment object, sentiment phrase> pairs. Different sentiment objects are given different weights, and a weighting method yields the sentiment conclusion for the whole user comment. When the sentiment conclusion is positive, the analysis component generates the analysis conclusion "basically meets the requirements, awaiting review"; when the sentiment conclusion is negative, the analysis component generates the analysis conclusion "unqualified comment, a positive comment is required";
G5) The syntactic analysis conclusion and the sentiment analysis conclusion are stored separately in the local storage.
Preferably, step K) comprises the following specific steps:
K1) When the administrator needs to check analysis conclusions, the supervision platform sends a review request to the data center;
K2) After receiving the review request, the data center sends the supervision platform the analysis conclusions of the user comments whose analysis flag is "analyzed";
K3) After receiving the analysis conclusions, the supervision platform displays them to the administrator, who checks or modifies them;
K4) After the administrator finishes, the supervision platform generates the corresponding review conclusion and returns it to the data center;
K5) After receiving the review conclusion, the data center adds it to the analysis conclusion in the user comment table to form a new analysis conclusion.
Preferably, the Chinese word segmentation server is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology of the Chinese Academy of Sciences.
Preferably, in step A) the organized user comment also includes the ID of the commenting user and the type of the product being evaluated; in step E) the topic analysis uses the feature database for that product type to judge whether the user comment contains the corresponding product type name or product brand name.
Preferably, the website front end, control centre, analysis component, and Chinese word segmentation server exchange data over sockets: the website front end acts as the socket client and sends request messages to the control centre, which listens as the socket server; the analysis component acts as the socket client and sends messages to the Chinese word segmentation server, which listens as the socket server.
Preferably, the data center uses database technology to manage the different data passed to it by the website front end, the control centre, and the supervision platform.
To better achieve its technical purpose, the invention also provides a Chinese comment analysis system that uses the above Chinese comment analysis method. The system comprises a website front end that interacts with users, a data center that stores user comments, a control centre connected to the website front end and the data center, a Chinese word segmentation server that segments and tags user comments, an analysis component that analyzes the segmented user comments, a supervision platform that interacts with the administrator, and a local storage that stores analysis conclusions;
The website front end sends analysis requests to the control centre; the control centre passes user comments to the analysis component; the analysis component passes user comments to the Chinese word segmentation server; the Chinese word segmentation server returns the user comments with part-of-speech tags to the analysis component; the analysis component feeds analysis conclusions back to the control centre; and the data center receives the user comment data sent by the website front end, the analysis conclusions sent by the control centre, and the review conclusions of the supervision platform.
Beneficial effects of the invention: compared with the prior art, the Chinese comment analysis method provided by the invention has a sound structure and uses the control centre and the data center to coordinate and connect the work of the components. When a user submits a comment from the front end, the control centre can exclude unrelated comments directly through topic analysis, so that only topic-related user comments proceed to the next stage of analysis. The analysis component then performs syntactic analysis and sentiment orientation analysis on the user comment in turn, which effectively yields the sentiment orientation of a Chinese comment and improves the accuracy of the analysis system. The administrator then only needs to browse the comments whose orientation is positive to determine whether they meet the requirements, which reduces the administrator's workload in handling pseudo-comments and makes pseudo-comment collection more efficient. Users can also learn whether the comments they submitted meet the requirements.
The features and advantages of the invention are described in detail below through embodiments with reference to the accompanying drawings.
【Brief description of the drawings】
Fig. 1 is a flow chart of an embodiment of the invention;
Fig. 2 is a partial flow chart of the analysis component of the embodiment of the invention.
【Embodiment】
To make the objects, technical solutions, and advantages of the invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit its scope. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concept of the invention.
Referring to Fig. 1 and Fig. 2, an embodiment of the invention provides a Chinese comment analysis method that analyzes users' Chinese comments as corpus material. The specific steps are as follows:
A) The user submits a comment to the website. After the website front end 1 organizes the comment, it sends an analysis request to the control centre 2 and transfers the organized user comment to the data center 3.
Here the website front end 1 and the control centre 2 exchange data over sockets. Applications typically issue or answer network requests through a socket. Depending on how the connection is started and which target the local socket connects to, establishing a connection between sockets can be divided into three steps: server listening, client request, and connection confirmation.
Server listening: the server-side socket does not target a specific client socket. It stays in a waiting state and monitors the network in real time.
Client request: the client socket issues a connection request whose target is the server-side socket. To do so, the client socket must first describe the server socket it wants to connect to, giving its address and port number, and then send the connection request to the server-side socket.
Connection confirmation: when the server-side socket hears, or in other words receives, a connection request from a client socket, it responds to the request, creates a new thread, and sends a description of the server-side socket to the client. Once the client confirms this description, the connection is established. The server-side socket remains in the listening state and continues to accept connection requests from other client sockets.
In this structure, the website front end 1 acts as the socket client and sends request messages to the control centre 2, which listens as the socket server. That is, after a user reads the relevant material online, writes a comment, and submits it to the website front end 1, the website sends an analysis request to the control centre 2 over a network socket, and the control centre 2 triggers the system to start working.
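The listen, request, and confirm sequence described above can be sketched with Python's standard socket module. This is only an illustration under stated assumptions: the patent does not specify a message format, so the one-line `ANALYZE` request, the `ACK` reply, and the function names are hypothetical, and the sketch handles a single connection rather than spawning a thread per client.

```python
import socket
import threading

def serve_once(server_sock, handler):
    """Server side: accept one connection, read one request line, send one reply line."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.makefile("r", encoding="utf-8").readline().strip()
        conn.sendall((handler(request) + "\n").encode("utf-8"))

def send_request(host, port, message):
    """Client side: connect to (host, port), send one line, return the one-line reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((message + "\n").encode("utf-8"))
        return sock.makefile("r", encoding="utf-8").readline().strip()

def demo():
    # Stand-in for the control centre: bind to a free local port and listen.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=serve_once, args=(server, lambda req: "ACK " + req)
    )
    worker.start()
    # Stand-in for the website front end: send an analysis request as the client.
    reply = send_request("127.0.0.1", port, "ANALYZE")
    worker.join()
    server.close()
    return reply
```

Calling `demo()` runs both roles in one process; in a deployment along the lines of the patent they would be separate programs, and the same client/server pairing would apply between the analysis component and the word segmentation server.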
B) After receiving the user comment, the data center 3 records it in the user comment table and adds to each user comment an analysis flag indicating whether it has been analyzed.
C) After receiving the request, the control centre 2 actively connects to the data center 3, and the data center 3 transfers to the control centre 2 all user comments whose analysis flag is "not analyzed".
D) After receiving the user comments, the control centre 2 passes them to the analysis component 4.
E) After receiving a user comment, the analysis component 4 performs topic analysis on it. If the topic of the user comment is related to the product it comments on, the user comment is transferred to the Chinese word segmentation server 5 and the method goes to step F); if the topic is unrelated to the product, the analysis conclusion "topic unrelated" is generated directly and the method goes to step H).
Here the analysis component 4 and the Chinese word segmentation server 5 also exchange data over sockets: the analysis component 4 acts as the socket client and sends messages to the Chinese word segmentation server 5, which listens as the socket server. That is, when the analysis component 4 further processes a user comment that has passed topic analysis, it needs to communicate with the Chinese word segmentation server 5. The Chinese word segmentation server 5 listens on a configured port as the socket server; when it receives a connection request from the analysis component 4 on that port, it establishes the connection, obtains the data, and returns the processing result to the analysis component 4.
F) After receiving the user comment, the Chinese word segmentation server 5 performs word segmentation and part-of-speech tagging on it and returns the tagged user comment to the analysis component 4.
The Chinese word segmentation server 5 is obtained by wrapping the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology of the Chinese Academy of Sciences and works by listening on a port. It provides its service over sockets: the analysis component 4 assembles the required tagging parameters, the comment text, and the relevant user dictionary and sends them to the Chinese word segmentation server 5, which returns the text with part-of-speech tags to the analysis component 4 once processing is complete.
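The text does not show what the tagged reply looks like. As a minimal sketch, assume the server returns space-separated `word/tag` tokens, a layout commonly seen in ICTCLAS-style output; that layout, the function name `parse_tagged`, and the fallback tag `x` are assumptions made for illustration, not details taken from the patent.

```python
from typing import List, Tuple

def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last '/', so a '/' inside the word itself is preserved.
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            # No usable tag on this token: keep it whole with a fallback tag.
            pairs.append((token, "x"))
        else:
            pairs.append((word, tag))
    return pairs
```

The analysis component could then work on the `(word, tag)` pairs directly, or keep the raw tagged string for regular-expression matching in step G1).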
G) After receiving the user comment with part-of-speech tags, the analysis component 4 performs syntactic analysis and sentiment analysis in turn, obtains the analysis conclusion on the sentiment orientation of the user comment, and stores that conclusion in the local storage 6.
The NLPIR/ICTCLAS2014 segmentation system identifies and tags nouns fairly accurately but provides little knowledge about verbs. For user comments tagged by NLPIR/ICTCLAS2014, the analysis component 4 therefore performs further processing and supplements verb-related knowledge to improve the accuracy of verb phrase structure analysis.
H) The analysis component 4 feeds the analysis conclusion back to the control centre 2, which, after receiving it, forwards it to the data center 3 for storage.
I) After receiving the analysis conclusion, the data center 3 saves it in the user comment table and changes the analysis flag of the corresponding user comment to "analyzed".
J) When the administrator needs to review analysis conclusions, the administrator operates on the analysis conclusions in the data center 3 through the supervision platform 7.
K) The website actively reads analysis conclusions from the data center 3, and when a user asks for the conclusion on a comment, the website displays that analysis conclusion to the user.
In this embodiment the control centre 2 is the core. It listens for analysis requests from the front-end website, calls the modules to do their work, and handles the requests the website sends. The control centre 2 is designed to be easy to extend: the program can be extended simply by adding handler functions and call commands, without changing other parts, so the whole analysis system can be extended dynamically with little effort.
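One way to read "adding handler functions and call commands" is a command registry, sketched below. The registry, the decorator, the `COMMAND payload` message layout, and the command names are all hypothetical; the patent does not describe how the control centre dispatches requests.

```python
HANDLERS = {}

def command(name):
    """Register the decorated function as the handler for a command name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register

@command("ANALYZE")
def handle_analyze(payload):
    # Placeholder: a real handler would fetch unanalyzed comments and run the analysis.
    return "queued:" + payload

def dispatch(message):
    """Route a 'COMMAND payload' message to its registered handler."""
    name, _, payload = message.partition(" ")
    handler = HANDLERS.get(name)
    if handler is None:
        return "error:unknown command " + name
    return handler(payload)

# Extending the control centre means adding one more function; dispatch() is untouched.
@command("PING")
def handle_ping(payload):
    return "pong"
```

With this shape, a new request type touches only its own handler, which matches the stated goal of extending the program without changing other parts.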
Specifically, step G) comprises the following specific steps:
G1) After receiving the user comment from the Chinese word segmentation server 5, the analysis component 4 performs syntactic analysis on it using a matching method based on regular expressions: the words in the user comment are assembled into phrases, the phrases are combined into clauses, and the syntactic analysis conclusion is obtained.
A regular expression uses a single string to describe and match a set of strings that conform to a syntactic rule. Table 1 lists some of the regular expressions used in this embodiment.
Table 1: Regular expression examples
Based on the compositional features of the various phrases, the embodiment classifies them in order to form different clauses. Structures that need to be applied only once while recognizing a whole sentence are shown in Table 2; structures that may need to be processed repeatedly during recognition are shown in Table 3.
Table 2: Processing rules and regular expressions for phrase structures applied once
Table 3: Processing rules and regular expressions for phrase structures applied multiple times
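The contents of Tables 1 to 3 are not reproduced in this text, so the patent's actual patterns are unknown. The sketch below only illustrates the general technique of matching regular expressions over a part-of-speech tagged string. It assumes `word/tag` tokens separated by spaces and the tags `n` (noun), `d` (adverb), and `a` (adjective); both patterns and the `chunk` helper are hypothetical.

```python
import re

# Hypothetical patterns over "word/tag" tokens; not the patent's own rules.
# Adjective phrase: any number of adverbs followed by one adjective.
ADJ_PHRASE = re.compile(r"(?<!\S)(?:\S+/d\s+)*\S+/a(?!\S)")
# Noun phrase: an optional adjective followed by one noun.
NOUN_PHRASE = re.compile(r"(?<!\S)(?:\S+/a\s+)?\S+/n(?!\S)")

def chunk(tagged, pattern, label):
    """Return (label, [words]) for each non-overlapping match of pattern in tagged."""
    chunks = []
    for match in pattern.finditer(tagged):
        # Drop the tag from each matched token and keep only the word.
        words = [tok.rpartition("/")[0] for tok in match.group(0).split()]
        chunks.append((label, words))
    return chunks
```

For the tagged string `屏幕/n 不/d 太/d 清晰/a`, `ADJ_PHRASE` groups the two adverbs with the adjective into one phrase, which later steps can score as a unit.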
G2) According to the sentiment analysis resources, the sentiment polarity of the adjectives, verbs, nouns, and emotional symbols in the combined clauses is judged, giving word-level sentiment values.
G3) According to the sentiment analysis resources, sentiment labeling is applied to the adverbs in the phrases with part-of-speech tags, and the word-level sentiment values are adjusted accordingly to give revised sentiment orientation values.
In this embodiment, the sentiment analysis resources are the Chinese sentiment analysis word lists provided by HowNet, the weighted sentiment analysis word list of Tsinghua University provided by Datatang (www.datatang.com), of which a part was selected, and an online emoticon table, together with a degree adverb table, a negation word list, an adversative conjunction table, a coordinating conjunction table, and a summarizing conjunction table.
In step G2), the embodiment uses the Chinese sentiment analysis word lists, the weighted sentiment analysis word list, and the online emoticon table to label the sentiment of adjectives, verbs, nouns, and emotional symbols. Then, in step G3), certain adverbs are labeled according to the degree adverb table and the negation word list. A degree adverb affects only the strength of the sentiment, whereas a negation word changes its polarity: a word with positive orientation becomes negative when modified by a negation word, and a word with negative orientation becomes positive. Two phrases joined by an adversative conjunction have opposite polarities; two phrases joined by a coordinating conjunction have the same polarity; and the orientation of a phrase introduced by a summarizing conjunction helps infer the orientation of the whole comment.
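The two adverb rules, degree adverbs scaling strength and negation words flipping polarity, can be sketched as follows. The tiny lexicons and their numeric values are toy assumptions standing in for the HowNet and Tsinghua word lists, and the sketch leaves out the conjunction rules entirely.

```python
# Toy lexicons (hypothetical entries and values), standing in for the real word lists.
POLARITY = {"清晰": 1.0, "好": 1.0, "差": -1.0, "卡": -1.0}
DEGREE = {"很": 1.5, "非常": 2.0, "有点": 0.5}
NEGATION = {"不", "没", "没有"}

def phrase_score(words):
    """Score one phrase: degree adverbs scale the strength, negation words flip the sign."""
    scale, sign, base = 1.0, 1.0, 0.0
    for word in words:
        if word in NEGATION:
            sign = -sign
        elif word in DEGREE:
            scale *= DEGREE[word]
        elif word in POLARITY:
            base += POLARITY[word]
    return sign * scale * base
```

Under these toy values `["很", "清晰"]` scores 1.5 and `["不", "清晰"]` scores -1.0, while a negated negative word such as `["不", "差"]` comes out positive, as the rule in the text describes.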
G4) The sentiment object evaluated by each sentiment phrase is found through syntactic relations, forming a number of <sentiment object, sentiment phrase> pairs. Different sentiment objects are given different weights, and a weighting method yields the sentiment conclusion for the whole user comment. When the sentiment conclusion is positive, the analysis component 4 generates the analysis conclusion "basically meets the requirements, awaiting review"; when the sentiment conclusion is negative, the analysis component 4 generates the analysis conclusion "unqualified comment, a positive comment is required". Table 4 shows specific test cases of this embodiment.
Table 4: Test cases for extracting <user comment, comment clause> pairs
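The weighting in step G4) is not spelled out, so the sketch below uses a plain weighted average as one possible reading. The function name, the weight values, the default weight, and the treatment of a zero score as not positive are all assumptions; the two conclusion strings are English renderings of the labels in step G4).

```python
def overall_conclusion(pairs, weights, default_weight=1.0):
    """Combine (sentiment_object, phrase_score) pairs into (score, conclusion).

    Uses a weighted average; objects missing from `weights` get default_weight.
    """
    total = 0.0
    weight_sum = 0.0
    for target, score in pairs:
        weight = weights.get(target, default_weight)
        total += weight * score
        weight_sum += weight
    score = total / weight_sum if weight_sum else 0.0
    # Assumption: a score of exactly zero is treated the same as a negative one.
    if score > 0:
        return score, "basically meets the requirements, awaiting review"
    return score, "unqualified comment, a positive comment is required"
```

For example, with a weight of 3.0 on the screen and 1.0 on the battery, a comment that praises the screen (score 1.0) and criticizes the battery (score -1.0) averages to 0.5 and is classed as positive.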
G5) The syntactic analysis conclusion and the sentiment analysis conclusion are stored separately in the local storage 6. Because the two conclusions clearly reflect the analysis logic of the analysis component, storing them separately in the local storage 6 makes it easy for the administrator to inspect them.
Further, step K) comprises the following specific steps:
K1) When the administrator needs to check analysis conclusions, the supervision platform 7 sends a review request to the data center 3.
K2) After receiving the review request, the data center 3 sends the supervision platform 7 the analysis conclusions of the user comments whose analysis flag is "analyzed".
K3) After receiving the analysis conclusions, the supervision platform 7 displays them to the administrator, who checks or modifies them. In this embodiment, the administrator manually reviews and confirms the user comments whose sentiment orientation is positive, and gives a final, definite result for unrelated comments that the control centre failed to screen out and for negative comments that the analysis component analyzed incorrectly. User comments that have passed the administrator's review can then serve as the training corpus for the whole promotional pseudo-comment analysis system.
K4) After the administrator finishes, the supervision platform 7 generates the corresponding review conclusion and saves it to the data center 3.
K5) After receiving the review conclusion, the data center 3 adds it to the analysis conclusion in the user comment table to form a new analysis conclusion. Because the review conclusion is the final result for the analysis conclusion, in this embodiment the data center 3 writes the review conclusion directly into the analysis conclusion rather than allocating new storage for it, which improves the storage efficiency of the data.
Further, in step A), the organized user comment also includes the ID of the reviewing user and the type of the reviewed product; in step E), the topic analysis uses the feature database of product types to judge whether the user comment contains the corresponding product type name or product brand name.
The feature data of the product types correspond one-to-one with the product types listed on the website front end. During topic analysis, the feature data corresponding to the product type of the user comment are retrieved and used as the judging criterion.
Because the embodiments of the present invention target opinionated comments that users make about products on the network, feature data of product types are used accordingly. Of course, embodiments of the present invention can also be applied to other kinds of user comments, including comments on topical news, in which case feature data of the corresponding news categories are used instead.
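The topic check described above can be sketched as a lookup followed by a containment test. This is a minimal illustration only: the dictionary `FEATURES`, its entries, and the function name `is_on_topic` are hypothetical, and plain substring matching is an assumption about how "contains the name" is decided.

```python
# Hypothetical feature data: product type -> product type names and brand names.
FEATURES = {
    "手机": {"手机", "华为", "小米"},
    "笔记本电脑": {"笔记本", "电脑", "联想"},
}


def is_on_topic(comment, product_type, features=FEATURES):
    """Return True if the comment mentions a type or brand name of its product type."""
    names = features.get(product_type, set())
    # Substring matching works on raw text, so it can run before word segmentation.
    return any(name in comment for name in names)
```

A comment that fails this check would receive the "topic unrelated" conclusion without being sent for word segmentation.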
Specifically, data center 3 uses database technology to manage the different data transmitted by website front end 1, control center 2 and supervision platform 7.
Through supervision platform 7, the administrator adds to data center 3 the relevant information (such as product category and description) for the products for which fake comments are to be collected; website front end 1 reads the product information from data center 3 and displays it. Users read the product information through website front end 1 and submit comments, which website front end 1 writes to data center 3. Control center 2 extracts data such as the IDs of unprocessed comments and their topic categories from data center 3 and passes them to analysis component 4. Analysis component 4 calls a series of modules or services to carry out text analysis and feeds the analysis conclusion back to control center 2, which writes it to data center 3. Through supervision platform 7, the administrator performs a final audit of the comments whose sentiment classification conclusion is positive, determines whether each comment is valid, and writes that conclusion to data center 3. Users obtain the final analysis conclusion from data center 3 through website front end 1.
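The sentiment part of the text analysis (steps G2 to G4 of the method) combines word-level sentiment values, adverb modifiers and per-object weights. The following is a minimal sketch of such a weighted combination. All lexicon entries, weights and function names are hypothetical, and treating a total of zero as negative is an assumption, since the patent only distinguishes positive and negative orientation.

```python
# Hypothetical sentiment resources; a real system would load these from files.
POLARITY = {"好": 1.0, "漂亮": 1.0, "差": -1.0, "慢": -1.0}   # word-level values (G2)
ADVERB_FACTOR = {"很": 1.5, "非常": 2.0, "不": -1.0}          # adverb modifiers (G3)
OBJECT_WEIGHT = {"屏幕": 1.0, "物流": 0.5}                    # per-object weights (G4)


def phrase_value(words):
    """Sentiment value of one segmented phrase, e.g. ["很", "好"]."""
    value, factor = 0.0, 1.0
    for word in words:
        if word in ADVERB_FACTOR:
            factor *= ADVERB_FACTOR[word]      # adverbs scale the next sentiment word
        elif word in POLARITY:
            value += factor * POLARITY[word]
            factor = 1.0
    return value


def comment_sentiment(pairs):
    """Weighted sum over <sentiment object, sentiment phrase> pairs."""
    return sum(
        OBJECT_WEIGHT.get(obj, 1.0) * phrase_value(phrase) for obj, phrase in pairs
    )


def analysis_conclusion(pairs):
    # Assumption: only a strictly positive total counts as positive orientation.
    if comment_sentiment(pairs) > 0:
        return "basically meets requirements, pending audit"
    return "unqualified comment, a positive comment is required"
```

For example, `[("屏幕", ["很", "好"]), ("物流", ["慢"])]` gives 1.0 × 1.5 + 0.5 × (−1.0) = 1.0, so the comment would be passed on for the administrator's audit.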
To better achieve the technical purpose of the present invention, the invention also provides a Chinese comment analysis system that uses the above Chinese comment analysis method. The system comprises website front end 1, which interacts with users; data center 3, which stores user comments; control center 2, which is connected to website front end 1 and data center 3; Chinese word segmentation server 5, which segments and tags user comments; analysis component 4, which analyzes the segmented user comments; supervision platform 7, which interacts with the administrator; and local storage 6, which stores analysis results.
Specifically, website front end 1 sends an analysis request to control center 2; control center 2 passes the user comment to analysis component 4; analysis component 4 passes the user comment to Chinese word segmentation server 5; Chinese word segmentation server 5 returns the user comment with part-of-speech tags to analysis component 4; and analysis component 4 feeds the analysis conclusion back to control center 2. Data center 3 receives the user comment data transmitted by website front end 1, the analysis conclusions transmitted by control center 2, and the audit conclusions from supervision platform 7.
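Claim 1 states that these components exchange data over sockets, with the control center and the Chinese word segmentation server listening as servers and the website front end and the analysis component connecting as clients. The following is a minimal sketch of one such request and reply over TCP. The function names, the UTF-8 text framing and the end-of-stream convention are assumptions for illustration; the patent does not specify a message format.

```python
import socket
import threading


def _read_all(sock):
    # Read until the peer closes its sending side.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def serve_once(server_sock, handler):
    """Server side (e.g. the segmentation server): answer a single request."""
    conn, _ = server_sock.accept()
    with conn:
        request = _read_all(conn)
        conn.sendall(handler(request).encode("utf-8"))


def send_request(address, message):
    """Client side (e.g. the analysis component): send text, return the reply."""
    with socket.create_connection(address) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)   # signal end of request
        return _read_all(sock)


def start_listener(handler):
    """Bind to a free local port and serve one request in a background thread."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    thread = threading.Thread(target=serve_once, args=(server, handler), daemon=True)
    thread.start()
    return server, thread
```

In this sketch the `handler` stands in for whatever the listening component does with the request, such as segmenting and tagging a comment.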
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (2)

  1. A Chinese comment analysis method, characterized in that Chinese comments submitted by users are analyzed, with the following specific steps:
    A) A user submits a comment to the website; after website front end (1) organizes the user's comment, it passes the organized user comment to data center (3) and sends an analysis request to control center (2); the organized user comment also includes the ID of the reviewing user and the type of the reviewed product;
    B) After data center (3) receives the user comment, it records the comment in the user comment table and adds to each user comment an analysis flag indicating whether the comment has been analyzed;
    C) After control center (2) receives the request, it actively connects to data center (3), and data center (3) transmits to control center (2) all user comments whose analysis flag is "not analyzed";
    D) After control center (2) receives the user comments, it passes them to analysis component (4);
    E) After analysis component (4) receives a user comment, it performs topic analysis on it; the topic analysis uses the feature database of product types to judge whether the user comment contains the corresponding product type name or product brand name; if the topic of the user comment is related to the product being commented on, the user comment is transmitted to Chinese word segmentation server (5) and the method proceeds to step F); if the topic of the user comment is unrelated to the product being commented on, an analysis conclusion of "topic unrelated" is generated directly and the method proceeds to step H);
    F) After Chinese word segmentation server (5) receives the user comment, it segments the comment and tags parts of speech, and returns the user comment with part-of-speech tags to analysis component (4); said Chinese word segmentation server (5) is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences; said website front end (1), control center (2), analysis component (4) and Chinese word segmentation server (5) use a Socket-based data transfer mode: website front end (1), as a Socket client, sends request messages to control center (2), which listens as the Socket server; analysis component (4), as a Socket client, sends messages to Chinese word segmentation server (5), which listens as the Socket server;
    G) After analysis component (4) receives the user comment with part-of-speech tags, it performs syntactic analysis and sentiment analysis in turn, derives the analysis conclusion on the sentiment orientation of the user comment, and delivers the analysis conclusion to local storage (6) for storage; the specific steps are as follows:
    G1) After analysis component (4) receives the user comment returned by Chinese word segmentation server (5), it performs syntactic analysis on it using a matching method based on regular expressions, forming the phrases in the user comment and combining them into different clauses, to obtain the syntactic analysis conclusion;
    G2) According to the sentiment analysis resources, sentiment polarity is judged for the adjectives, verbs, nouns and emoticons in the combined clauses, yielding word-level sentiment values;
    G3) According to the sentiment analysis resources, the adverbs in the part-of-speech-tagged clauses are given sentiment annotations, and the word-level sentiment values are used to derive revised sentiment orientation values;
    G4) The sentiment object evaluated by each sentiment phrase is found from the syntactic relations, forming several <sentiment object, sentiment phrase> pairs; different weights are assigned to different sentiment objects, and a weighting method is used to derive the sentiment conclusion for the whole user comment; when the sentiment conclusion is a positive orientation, analysis component (4) generates the analysis conclusion "basically meets requirements, pending audit"; when the sentiment conclusion is a negative orientation, analysis component (4) generates the analysis conclusion "unqualified comment, a positive comment is required";
    G5) The syntactic analysis conclusion and the sentiment classification conclusion are stored separately in local storage (6);
    H) Analysis component (4) feeds the resulting analysis conclusion back to control center (2); after receiving the analysis conclusion, control center (2) passes it to data center (3) for storage;
    I) After data center (3) receives the analysis conclusion, it saves the analysis conclusion in the user comment table and changes the analysis flag of the corresponding user comment to "marked";
    J) When the administrator needs to audit analysis conclusions, the administrator operates on the analysis conclusions in data center (3) through supervision platform (7); said data center (3) uses database technology to manage the different data transmitted by website front end (1), control center (2) and supervision platform (7);
    K) The website reads the analysis conclusion from data center (3), and when a user requests the conclusion for a comment, the website displays the analysis conclusion to the user; the specific steps are as follows:
    K1) When the administrator needs to review analysis conclusions, supervision platform (7) transmits an audit request to data center (3);
    K2) After data center (3) receives the audit request, it transmits to supervision platform (7) the analysis conclusions of the user comments whose analysis flag is "marked";
    K3) After supervision platform (7) receives the analysis conclusions, it displays them to the administrator, who reviews or modifies them;
    K4) After the administrator completes the operation, supervision platform (7) generates the corresponding audit conclusion and returns it to data center (3);
    K5) After data center (3) receives the audit conclusion, it adds the audit conclusion to the analysis conclusion in the user comment table, forming a new analysis conclusion.
  2. A Chinese comment analysis system, characterized in that it uses the Chinese comment analysis method as claimed in claim 1 and comprises a website front end (1) that interacts with users, a data center (3) that stores user comments, a control center (2) connected to website front end (1) and data center (3), a Chinese word segmentation server (5) that segments and tags user comments, an analysis component (4) that analyzes the segmented user comments, a supervision platform (7) that interacts with the administrator, and a local storage (6) that stores analysis results;
    Said website front end (1) sends an analysis request to control center (2); said control center (2) passes the user comment to analysis component (4); said analysis component (4) passes the user comment to Chinese word segmentation server (5); said Chinese word segmentation server (5) returns the user comment with part-of-speech tags to analysis component (4); said analysis component (4) feeds the analysis conclusion back to control center (2); said data center (3) receives the user comment data transmitted by website front end (1), the analysis conclusions transmitted by control center (2), and the audit conclusions from supervision platform (7).
CN201410663427.XA 2014-11-19 2014-11-19 A kind of Chinese comment and analysis method and its system Active CN104484336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410663427.XA CN104484336B (en) 2014-11-19 2014-11-19 A kind of Chinese comment and analysis method and its system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410663427.XA CN104484336B (en) 2014-11-19 2014-11-19 A kind of Chinese comment and analysis method and its system

Publications (2)

Publication Number Publication Date
CN104484336A (en) 2015-04-01
CN104484336B (en) 2017-12-19

Family

ID=52758877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410663427.XA Active CN104484336B (en) 2014-11-19 2014-11-19 A kind of Chinese comment and analysis method and its system

Country Status (1)

Country Link
CN (1) CN104484336B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649268A (en) * 2016-11-30 2017-05-10 北京京东尚科信息技术有限公司 Investigation sample judging method and system and grey list generation method and system
CN108280560A (en) * 2017-01-06 2018-07-13 广州市动景计算机科技有限公司 A kind of anti-brush method and device of subject evaluation
CN109933775B (en) * 2017-12-15 2022-02-18 腾讯科技(深圳)有限公司 UGC content processing method and device
CN108170685B (en) * 2018-01-29 2021-10-29 浙江省公众信息产业有限公司 Text emotion analysis method and device and computer readable storage medium
CN108256098B (en) * 2018-01-30 2022-02-15 中国银联股份有限公司 Method and device for determining emotional tendency of user comment
CN108536601B (en) * 2018-04-13 2022-05-31 腾讯科技(深圳)有限公司 Evaluation method, device, server and storage medium
CN109522412B (en) * 2018-11-14 2021-02-26 鼎富智能科技有限公司 Text emotion analysis method, device and medium
CN110442798B (en) * 2019-07-03 2021-10-08 华中科技大学 Spam comment user group detection method based on network representation learning
CN110674256B (en) * 2019-09-25 2023-05-12 携程计算机技术(上海)有限公司 Method and system for detecting correlation degree of comment and reply of OTA hotel

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103064971A (en) * 2013-01-05 2013-04-24 南京邮电大学 Scoring and Chinese sentiment analysis based review spam detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9201863B2 (en) * 2009-12-24 2015-12-01 Woodwire, Inc. Sentiment analysis from social media content

Also Published As

Publication number Publication date
CN104484336A (en) 2015-04-01

Similar Documents

Publication Publication Date Title
CN104484336B (en) A kind of Chinese comment and analysis method and its system
Guellil et al. Social big data mining: A survey focused on opinion mining and sentiments analysis
CN103207913B (en) The acquisition methods of commercial fine granularity semantic relation and system
Khasawneh et al. Sentiment analysis of Arabic social media content: a comparative study
WO2013059487A1 (en) System and methods for automatically detecting deceptive content
CN107273474A (en) Autoabstract abstracting method and system based on latent semantic analysis
CN103064971A (en) Scoring and Chinese sentiment analysis based review spam detection method
CN108845986A (en) A kind of sentiment analysis method, equipment and system, computer readable storage medium
CN104050243B (en) It is a kind of to search for the network search method combined with social activity and its system
CN111090735B (en) Performance evaluation method of intelligent question-answering method based on knowledge graph
CN111080055A (en) Hotel scoring method, hotel recommendation method, electronic device and storage medium
Jianqiang et al. Combining semantic and prior polarity for boosting twitter sentiment analysis
Ziser et al. Humor detection in product question answering systems
Saranya et al. A Machine Learning-Based Technique with IntelligentWordNet Lemmatize for Twitter Sentiment Analysis.
Wu et al. Sentiment analysis of online product reviews based on SenBERT-CNN
Dou et al. Improving large-scale paraphrase acquisition and generation
Nguyen et al. A corpus for aspect-based sentiment analysis in Vietnamese
CN112115712B (en) Topic-based group emotion analysis method
Zhang et al. Product features extraction and categorization in Chinese reviews
CN104933097B (en) A kind of data processing method and device for retrieval
Hristova Topic modeling of chat data: A case study in the banking domain
Yan et al. Sentiment Analysis of Short Texts Based on Parallel DenseNet.
Sudhakaran et al. Research directions, challenges and issues in opinion mining
Huang et al. Examining bias in opinion summarisation through the perspective of opinion diversity
Bugueño et al. Applying self-attention for stance classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant