CN104484336A - Chinese commentary analysis method and system - Google Patents

Publication number
CN104484336A
Authority
CN
China
Prior art keywords
analysis
comment
conclusion
user
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410663427.XA
Other languages
Chinese (zh)
Other versions
CN104484336B (en)
Inventor
郝秀兰
蒋云良
许方曲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huzhou University
Original Assignee
Huzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huzhou University filed Critical Huzhou University
Priority to CN201410663427.XA priority Critical patent/CN104484336B/en
Publication of CN104484336A publication Critical patent/CN104484336A/en
Application granted granted Critical
Publication of CN104484336B publication Critical patent/CN104484336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Chinese comment analysis method for collecting a corpus of Chinese "pseudo-comments": users' Chinese comments are analyzed to determine whether they can be used as corpus material. In the method, a user submits a comment to a website, the website front end sends an analysis request to a control center, the control center passes the comment to an analysis component, the analysis component performs topic analysis on the comment, a Chinese word segmentation server performs word segmentation and part-of-speech tagging, the analysis component performs syntactic analysis and then sentiment analysis, and a data center saves the analysis conclusion to a user comment table. With this method, the control center can directly exclude unqualified corpus material through topic analysis, and the analysis component performs syntactic analysis and sentiment analysis on the user's comment in turn, so that the sentiment orientation of the Chinese comment is determined effectively and the accuracy of the analysis system is improved. The administrator then needs to review only the comments with a positive tendency to determine whether they meet the requirements.

Description

A Chinese comment analysis method and system
[Technical Field]
The present invention relates to a Chinese comment analysis method, and in particular to an analysis method and system used when collecting promotional Chinese "pseudo-comments".
[Background Art]
China's 12th Five-Year informatization plan explicitly sets the Internet information goals of "improving network public opinion monitoring capability" and "the capability to monitor and control harmful network information", and plans to establish "technical support systems such as detection and evaluation, monitoring and early warning" for Internet information. Monitoring of network public opinion and Internet information has thus become an important task at the level of national information strategy. One of the key technologies involved is sentiment analysis, which is also one of the key technologies of the present invention.
Sentiment analysis, also known as opinion mining, mines subjective information in text, such as viewpoints, opinions, moods, and tastes, in order to classify the sentiment orientation of the text. Sentiment is a broad notion: it may be a person's judgment of a product or of society, or an aesthetic attitude. The sentiment orientation of a text refers to the tendency the text expresses and the strength of that tendency; the classification criteria vary with the purpose.
Besides public opinion monitoring on the Internet, sentiment analysis is widely used in many industries that bear on people's livelihood, such as life information services and medical services. Users look up comments on related products online and make their final purchase decision by comparing them; healthcare systems assess patients' attitudes in order to provide better prescriptions. This project focuses on the application of text sentiment analysis in e-commerce.
Spam comments are ubiquitous on the Internet, for example in communities, in blogs, and on the product pages of e-commerce websites, and each kind has its own characteristics. E-commerce websites contain some special comments: some describe a good product or service as bad, and others describe a bad product or service as good. These two kinds are collectively called "pseudo-comments", and pseudo-comments are one kind of spam comment. In practice both kinds are very harmful: the former damage the interests of merchants, and the latter damage the interests of consumers. However, pseudo-comments are mixed in with genuine comments and are difficult to tell apart manually.
Identifying pseudo-comments relies on text sentiment analysis, which is essentially a form of automatic text classification. The usual way to obtain a training data set (also called a corpus) for text classification is manual annotation. But pseudo-comments cannot be recognized by humans, which means they cannot be labeled by expert annotation.
We surveyed existing opinion mining corpora. The Blog Track of TREC (Text Retrieval Conference), the MOAT evaluation of NTCIR, and the COAE series of Chinese opinion analysis evaluations provide opinion mining corpora of a certain scale. In addition, many research institutions and individuals also provide opinion mining corpora of a certain scale. To date, however, we have not found a corpus dedicated to Chinese pseudo-comment detection.
To address the difficulty of obtaining pseudo-comments, Ott et al. posted 400 HITs (Human Intelligence Tasks) on the Amazon Mechanical Turk platform and collected 400 deceptive spam comments (promotional "pseudo-comments"); their experiments showed that crowdsourcing is effective. Unfortunately, there is no such platform in China, and domestic users cannot take on work through the Amazon Mechanical Turk platform.
At present there is neither a Chinese corpus for spam product comment analysis nor a website for collecting a relevant Chinese comment corpus. To obtain a "pseudo-comment" corpus, we need to develop our own platform similar to Amazon Mechanical Turk. Existing research and practice provide many ideas and technical groundwork for this project, but further integration and improvement are still needed.
[Summary of the Invention]
The object of the present invention is to overcome the above deficiencies of the prior art by providing a Chinese comment analysis method and system. It aims to solve the technical problems that the prior art cannot automatically distinguish pseudo-comments and that sentiment orientation analysis of website comments is inaccurate.
To achieve the above object, the present invention proposes a Chinese comment analysis method that analyzes the Chinese comments submitted by users. Its specific steps are as follows:
A) A user submits a comment to the website; after the website front end organizes the user's comment, it passes the organized user comment to the data center and sends an analysis request to the control center;
B) After receiving the user comment, the data center records it in the user comment table and adds to each user comment an analysis flag indicating whether it has been analyzed;
C) After receiving the request, the control center actively connects to the data center, and the data center passes all user comments whose analysis flag is "not analyzed" to the control center;
D) After receiving the user comments, the control center passes them to the analysis component;
E) After receiving a user comment, the analysis component performs topic analysis on it; if the topic of the user comment is relevant to the product being commented on, the user comment is passed to the Chinese word segmentation server and the method goes to step F); if the topic is irrelevant to the product being commented on, the analysis conclusion "topic irrelevant" is generated directly and the method goes to step H);
F) After receiving the user comment, the Chinese word segmentation server performs word segmentation and part-of-speech tagging on it and returns the part-of-speech-tagged user comment to the analysis component;
G) After receiving the part-of-speech-tagged user comment, the analysis component performs syntactic analysis and then sentiment analysis, obtains the analysis conclusion for the sentiment orientation of this user comment, and saves this analysis conclusion in the local storage;
H) The analysis component feeds the analysis conclusion back to the control center, and the control center, after receiving the analysis conclusion, passes it to the data center for storage;
I) After receiving the analysis conclusion, the data center saves it in the user comment table and changes the analysis flag of the corresponding user comment to "analyzed";
J) When the administrator needs to review analysis conclusions, the administrator operates on the analysis conclusions in the data center through the supervision platform;
K) The website actively reads the analysis conclusion from the data center, and when a user asks to see the conclusion for a comment, the website displays this analysis conclusion to the user.
Preferably, step G) comprises the following specific steps:
G1) After receiving the user comment returned by the Chinese word segmentation server, the analysis component performs syntactic analysis on it using a matching method based on regular expressions, combines the phrases in the user comment into different short sentences according to how they are formed, and obtains a syntactic analysis conclusion;
G2) Based on sentiment analysis resources, the sentiment polarity of the adjectives, verbs, nouns, and emoticons in the combined short sentences is determined, and a preliminary tendency conclusion for the user comment is obtained;
G3) Based on the sentiment analysis resources, the adverbs in the part-of-speech-tagged short sentences are sentiment-tagged, and a sentiment orientation conclusion is obtained from the preliminary tendency conclusion;
G4) The sentiment object evaluated by each sentiment phrase is found through syntactic relations, forming several <sentiment object, sentiment phrase> pairs. Different sentiment objects are given different weights, and a weighting method is used to obtain the sentiment conclusion for the whole user comment. When the sentiment conclusion is a positive tendency, the analysis component generates the analysis conclusion "basically meets the requirements, awaiting review"; when the sentiment conclusion is a negative tendency, the analysis component generates the analysis conclusion "unqualified comment, a positive comment is required";
G5) The syntactic analysis conclusion and the sentiment analysis conclusion are stored separately in the local storage.
Preferably, step K) comprises the following specific steps:
K1) When the administrator needs to view analysis conclusions, the supervision platform forwards the review request to the data center;
K2) After receiving the review request, the data center passes the analysis conclusions of the user comments whose analysis flag is "analyzed" to the supervision platform;
K3) After receiving the analysis conclusions, the supervision platform displays them to the administrator, who views or modifies them;
K4) After the administrator completes the operation, the supervision platform generates a corresponding review conclusion and returns it to the data center;
K5) After receiving the review conclusion, the data center adds it to the analysis conclusion in the user comment table to form a new analysis conclusion.
Preferably, the Chinese word segmentation server is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences.
Preferably, in step A), the organized user comment also includes the ID of the commenting user and the type of product being commented on; in step E), the topic analysis determines, based on the feature database for the product type, whether the user comment contains the corresponding product type name or product brand name.
Preferably, the website front end, the control center, the analysis component, and the Chinese word segmentation server all transfer data over Socket connections: the website front end sends request messages to the control center as a Socket client, and the control center listens as the Socket server; the analysis component sends messages to the Chinese word segmentation server as a Socket client, and the Chinese word segmentation server listens as the Socket server.
Preferably, the data center uses database technology to manage the different data transmitted by the website front end, the control center, and the supervision platform.
To better achieve the technical object of the present invention, the invention also proposes a Chinese comment analysis system that uses the above Chinese comment analysis method. The system comprises a website front end that interacts with users, a data center that stores user comments, a control center connected to the website front end and the data center, a Chinese word segmentation server that segments and tags user comments, an analysis component that analyzes the segmented user comments, a supervision platform that interacts with the administrator, and local storage that stores analysis conclusions;
The website front end sends analysis requests to the control center; the control center passes user comments to the analysis component; the analysis component passes user comments to the Chinese word segmentation server; the Chinese word segmentation server returns the part-of-speech-tagged user comments to the analysis component; the analysis component feeds the analysis conclusion back to the control center; and the data center receives the user comment data sent by the website front end, the analysis conclusions sent by the control center, and the review conclusions of the supervision platform.
Beneficial effects of the present invention: compared with the prior art, the Chinese comment analysis method provided by the invention has a sound structure, using a control center and a data center to coordinate and connect the work of the components. When a user submits a comment from the front end, the control center can directly exclude irrelevant comments through topic analysis, so that only topic-relevant user comments proceed to the next analysis step. The analysis component then performs syntactic analysis and sentiment orientation analysis on the user comment in turn, effectively obtaining the sentiment orientation conclusion of the Chinese comment and improving the accuracy of the analysis system. The administrator then only needs to browse comments with a positive tendency to determine whether they meet the requirements, which reduces the administrator's workload in handling pseudo-comments and improves the efficiency of pseudo-comment collection; users can also learn whether the comments they submitted meet the requirements.
The features and advantages of the present invention are described in detail through embodiments with reference to the accompanying drawings.
[Brief Description of the Drawings]
Fig. 1 is a schematic flow chart of the embodiment of the present invention;
Fig. 2 is a schematic diagram of part of the flow of the analysis component of the embodiment of the present invention.
[Detailed Description]
To make the object, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and an embodiment. It should be understood that the specific embodiment described here only explains the invention and does not limit its scope. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
Referring to Fig. 1 and Fig. 2, the embodiment of the present invention provides a Chinese comment analysis method that analyzes users' Chinese comments as a corpus. Its specific steps are as follows:
A) A user submits a comment to the website. After the website front end 1 organizes the user's comment, it sends an analysis request to the control center 2 and passes the organized user comment to the data center 3.
The website front end 1 and the control center 2 transfer data over Socket connections. A Socket is a network endpoint through which an application sends requests to the network or answers network requests. Depending on how the connection is started and which target the local socket connects to, establishing a connection between sockets can be divided into three steps: server listening, client request, and connection confirmation.
Server listening: the server-side socket does not target any specific client socket; it stays in a state of waiting for connections and monitors the network state in real time.
Client request: the client socket issues a connection request whose target is the server-side socket. To do so, the client socket must first describe the server socket it wants to connect to by giving the address and port number of the server-side socket, and then it sends the connection request to the server-side socket.
Connection confirmation: when the server-side socket receives a connection request from a client socket, it responds to the request, creates a new thread, and sends a description of the server-side socket to the client. Once the client confirms this description, the connection is established. The server-side socket remains in the listening state and continues to accept connection requests from other client sockets.
In this structure, the website front end 1 sends request messages to the control center 2 as the Socket client, and the control center 2 listens as the Socket server. That is, the user views the relevant product introduction over the network, writes a comment, and submits it to the website front end 1; the website then sends an analysis request to the control center 2 through a network socket, and the control center 2 triggers the system to start working.
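The client/server roles described above can be sketched with Python's standard `socket` module. This is only an illustration under stated assumptions: the one-line text protocol, the function names, and the `ACK` reply are hypothetical and are not specified by the patent.

```python
import socket
import threading

def serve_client(conn, handler):
    """Handle one accepted connection: read one line, reply with one line."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        request = reader.readline().strip()
        conn.sendall((handler(request) + "\n").encode("utf-8"))

def start_control_center(handler, host="127.0.0.1", port=0):
    """Listen like the control center; return the listening socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))  # port 0 lets the OS choose a free port
    server.listen()

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:  # the listening socket was closed
                break
            # One new thread per connection; the server keeps listening.
            threading.Thread(
                target=serve_client, args=(conn, handler), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server

def send_analysis_request(address, message):
    """Act as the website front end: connect, send a request, read the reply."""
    with socket.create_connection(address) as client, \
            client.makefile("r", encoding="utf-8") as reader:
        client.sendall((message + "\n").encode("utf-8"))
        return reader.readline().strip()
```

A front end would call `send_analysis_request(server.getsockname(), "ANALYZE")` after a comment has been written to the data center; the same pattern would apply between the analysis component and the word segmentation server.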
B) After receiving the user comment, the data center 3 records it in the user comment table and adds to each user comment an analysis flag indicating whether it has been analyzed.
C) After receiving the request, the control center 2 actively connects to the data center 3, and the data center 3 passes all user comments whose analysis flag is "not analyzed" to the control center 2.
D) After receiving the user comments, the control center 2 passes them to the analysis component 4.
E) After receiving a user comment, the analysis component 4 performs topic analysis on it. If the topic of the user comment is relevant to the product being commented on, the user comment is passed to the Chinese word segmentation server 5 and the method goes to step F); if the topic is irrelevant to the product being commented on, the analysis conclusion "topic irrelevant" is generated directly and the method goes to step H).
The analysis component 4 and the Chinese word segmentation server 5 also transfer data over Socket connections: the analysis component 4 sends messages to the Chinese word segmentation server 5 as the Socket client, and the Chinese word segmentation server 5 listens as the Socket server. That is, when the analysis component 4 further processes a user comment that has passed topic analysis, it needs to communicate with the Chinese word segmentation server 5. The Chinese word segmentation server 5, as the Socket server, listens on a configured port; when it receives a connection request from the analysis component 4 on this port, it establishes the connection, obtains the data, and returns the processing result to the analysis component 4.
F) After receiving the user comment, the Chinese word segmentation server 5 performs word segmentation and part-of-speech tagging on it and returns the part-of-speech-tagged user comment to the analysis component 4.
The Chinese word segmentation server 5 is obtained by wrapping the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences, and works by listening on a port. It provides its service over Socket: the analysis component 4 assembles the required tagging parameters, the comment text, and the user dictionary relevant to the application, and sends them to the Chinese word segmentation server 5, which returns the part-of-speech-tagged text to the analysis component 4 once processing is complete.
G) After receiving the part-of-speech-tagged user comment, the analysis component 4 performs syntactic analysis and then sentiment analysis, obtains the analysis conclusion for the sentiment orientation of this user comment, and saves this analysis conclusion in the local storage 6.
The NLPIR/ICTCLAS2014 segmentation system recognizes and tags nouns fairly accurately but provides little knowledge about verbs. The analysis component 4 therefore further processes the user comments tagged by NLPIR/ICTCLAS2014, supplementing verb-related knowledge to improve the accuracy of verb phrase structure analysis.
H) The analysis component 4 feeds the analysis conclusion back to the control center 2, and the control center 2, after receiving the analysis conclusion, passes it to the data center 3 for storage.
I) After receiving the analysis conclusion, the data center 3 saves it in the user comment table and changes the analysis flag of the corresponding user comment to "analyzed".
J) When the administrator needs to review analysis conclusions, the administrator operates on the analysis conclusions in the data center 3 through the supervision platform 7.
K) The website actively reads the analysis conclusion from the data center 3, and when a user asks to see the conclusion for a comment, the website displays this analysis conclusion to the user.
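The data-center side of steps B), C), and I) can be sketched with an SQLite table. The table name, column names, and helper functions below are assumptions made for illustration; the patent only states that a user comment table with an analysis flag is used.

```python
import sqlite3

def open_data_center(path=":memory:"):
    """Create the (hypothetical) user comment table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS user_comment ("
        " id INTEGER PRIMARY KEY,"
        " user_id TEXT, product_type TEXT, text TEXT,"
        " analyzed INTEGER NOT NULL DEFAULT 0,"  # the analysis flag of step B
        " conclusion TEXT)"
    )
    return db

def record_comment(db, user_id, product_type, text):
    """Step B: store a new comment, flagged as not analyzed."""
    cur = db.execute(
        "INSERT INTO user_comment (user_id, product_type, text) VALUES (?, ?, ?)",
        (user_id, product_type, text),
    )
    db.commit()
    return cur.lastrowid

def fetch_unanalyzed(db):
    """Step C: return every comment whose flag is still 'not analyzed'."""
    return db.execute(
        "SELECT id, product_type, text FROM user_comment"
        " WHERE analyzed = 0 ORDER BY id"
    ).fetchall()

def save_conclusion(db, comment_id, conclusion):
    """Step I: store the conclusion and flip the flag to 'analyzed'."""
    db.execute(
        "UPDATE user_comment SET conclusion = ?, analyzed = 1 WHERE id = ?",
        (conclusion, comment_id),
    )
    db.commit()
```

Keeping the flag in the same row as the comment means the control center can pick up all pending work with one query, whichever front-end request triggered it.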
In the embodiment of the present invention, the control center 2 is the core. The control center 2 listens for analysis requests from the front-end website, invokes the individual modules, and handles the requests sent by the website. The control center 2 is designed to be easy to extend: by simply adding a function and a command name, without changing other parts, the program can be extended, so the whole analysis system can be extended dynamically with ease.
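One common way to get the "add a function and a command name" kind of extensibility is a command registry. The sketch below is an assumption about how such a dispatcher could look, not the patent's implementation; the command names and message format are hypothetical.

```python
COMMANDS = {}  # command name -> handler function

def command(name):
    """Decorator that registers a handler under a command name."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register

@command("ANALYZE")
def analyze(payload):
    # Placeholder: a real handler would fetch pending comments and
    # pass them on to the analysis component.
    return f"queued analysis for: {payload}"

def dispatch(message):
    """Split 'NAME payload' and call the registered handler, if any."""
    name, _, payload = message.partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(payload)
```

Adding a new capability then only requires defining another function decorated with `@command("...")`; `dispatch` and the existing handlers stay unchanged.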
Specifically, step G) comprises the following specific steps:
G1) After receiving the user comment returned by the Chinese word segmentation server 5, the analysis component 4 performs syntactic analysis on it using a matching method based on regular expressions, combines the phrases in the user comment into different short sentences according to how they are formed, and obtains a syntactic analysis conclusion.
A regular expression uses a single string to describe and match a set of strings that follow a given syntactic rule. Table 1 gives examples of some of the regular expressions used in the embodiment of the present invention.
Table 1: Examples of regular expressions
According to the formation features of the various phrases, the embodiment of the present invention classifies them to form different short sentences. When recognizing a whole sentence, some structures need to be applied only once, as shown in Table 2; others may need to be applied repeatedly during recognition, as shown in Table 3.
Table 2: Processing rules and regular expressions for phrase structures applied once
Table 3: Processing rules and regular expressions for structures applied repeatedly
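As an illustration of regex-based phrase matching in step G1), the sketch below groups optional adverbs with the adjective that follows them. It assumes the tagged text is a space-separated sequence of `word/tag` tokens with `d` for adverbs and `a` for adjectives; both the format and the pattern are assumptions for illustration and are not the rules of Tables 1 to 3, whose contents are not reproduced here.

```python
import re

# Zero or more adverb tokens (word/d) followed by one adjective token (word/a).
# The lookahead keeps tags such as /ad or /an from being read as /a.
ADJ_PHRASE = re.compile(r"(?:\S+/d\s+)*\S+/a(?=\s|$)")

def find_adjective_phrases(tagged):
    """Return every adverb* + adjective phrase found in a tagged sentence."""
    return [match.group(0) for match in ADJ_PHRASE.finditer(tagged)]

def strip_tags(phrase):
    """Turn 'word/tag word/tag' into a list of plain words."""
    return [token.rsplit("/", 1)[0] for token in phrase.split()]
```

For the tagged input `屏幕/n 非常/d 清晰/a ,/wd 电池/n 不/d 耐用/a`, the function yields the two phrases `非常/d 清晰/a` and `不/d 耐用/a`, which later steps can score for sentiment.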
G2) Based on sentiment analysis resources, the sentiment polarity of the adjectives, verbs, nouns, and emoticons in the combined short sentences is determined, and a word-level sentiment value is obtained.
G3) Based on the sentiment analysis resources, the adverbs in the part-of-speech-tagged phrases are sentiment-tagged, and a revised sentiment orientation value is obtained from the word-level sentiment value.
In the embodiment of the present invention, the sentiment analysis resources are word lists: the Chinese sentiment analysis word list provided by HowNet, the Tsinghua University weighted sentiment word list provided by Datatang (www.datatang.com) (a portion was selected), a table of network emoticons, a degree adverb list, a negation word list, an adversative conjunction list, a coordinating conjunction list, and a summarizing conjunction list.
In step G2), the embodiment uses the Chinese sentiment analysis word list, the weighted sentiment word list, and the network emoticon table to tag the sentiment of adjectives, verbs, nouns, and emoticons. In step G3), some adverbs are then sentiment-tagged according to the degree adverb list and the negation word list. A degree adverb affects only the strength of the sentiment, whereas a negation word changes its polarity: a word with a positive tendency becomes negative when modified by a negation word, and a word with a negative tendency becomes positive. Two phrases joined by an adversative conjunction have opposite polarities; two phrases joined by a coordinating conjunction have the same polarity; and the tendency of a phrase introduced by a summarizing conjunction helps infer the tendency of the whole comment.
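The interplay of polarity words, degree adverbs, and negation words in steps G2) and G3) can be sketched as follows. The lexicon entries and numeric values are invented for illustration and are not taken from the HowNet or Datatang resources; conjunction handling is left out of this sketch.

```python
# Hypothetical miniature lexicons standing in for the real word lists.
POLARITY = {"清晰": 1.0, "耐用": 1.0, "卡顿": -1.0}  # word-level sentiment values
DEGREE = {"非常": 2.0, "有点": 0.5}                   # degree adverbs scale strength
NEGATION = {"不", "没有"}                             # negation words flip polarity

def phrase_score(words):
    """Score one phrase: sum polarities, scale by degree, flip per negation."""
    score, scale, flips = 0.0, 1.0, 0
    for word in words:
        if word in NEGATION:
            flips += 1
        elif word in DEGREE:
            scale *= DEGREE[word]
        elif word in POLARITY:
            score += POLARITY[word]
    sign = -1.0 if flips % 2 else 1.0
    return score * scale * sign
```

With these sample values, `phrase_score(["非常", "清晰"])` gives 2.0 (strengthened positive), `phrase_score(["不", "耐用"])` gives -1.0 (negated positive), and `phrase_score(["不", "卡顿"])` gives 1.0 (negated negative).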
G4) The sentiment object evaluated by each sentiment phrase is found through syntactic relations, forming several <sentiment object, sentiment phrase> pairs. Different sentiment objects are given different weights, and a weighting method is used to obtain the sentiment conclusion for the whole user comment. When the sentiment conclusion is a positive tendency, the analysis component 4 generates the analysis conclusion "basically meets the requirements, awaiting review"; when the sentiment conclusion is a negative tendency, the analysis component 4 generates the analysis conclusion "unqualified comment, a positive comment is required". Table 4 shows concrete test cases of the embodiment of the present invention.
Table 4: Test cases for extracting <user comment, comment short sentence> pairs
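The weighting step of G4) can be sketched as a weighted sum over (sentiment object, phrase score) pairs. The weights, the default weight, and the choice to treat a total of zero as negative are assumptions made here; the patent does not give the weighting formula.

```python
def overall_sentiment(pairs, weights, default_weight=1.0):
    """Combine (object, phrase_score) pairs into one tendency for the comment.

    pairs   -- list of (sentiment_object, phrase_score) tuples
    weights -- hypothetical per-object weights; unknown objects get default_weight
    """
    total = sum(weights.get(obj, default_weight) * score for obj, score in pairs)
    # Assumption: a total of exactly zero is treated as not positive.
    tendency = "positive" if total > 0 else "negative"
    return total, tendency
```

For example, with pairs `[("屏幕", 2.0), ("电池", -1.0)]` and equal weights the total is 1.0 and the tendency is positive, while weighting the battery more heavily (`{"屏幕": 0.5, "电池": 2.0}`) gives -1.0 and a negative tendency.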
G5) The syntactic analysis conclusion and the sentiment analysis conclusion are stored separately in the local storage 6. Because these two conclusions clearly reflect the analysis logic of the analysis component, they are stored separately in the local storage 6 so that the administrator can inspect them easily.
Further, step K) comprises the following specific steps:
K1) When the administrator needs to view analysis conclusions, the supervision platform 7 forwards the review request to the data center 3.
K2) After receiving the review request, the data center 3 passes the analysis conclusions of the user comments whose analysis flag is "analyzed" to the supervision platform 7.
K3) After receiving the analysis conclusions, the supervision platform 7 displays them to the administrator, who views or modifies them. In the embodiment of the present invention, the administrator manually reviews and confirms user comments whose sentiment orientation is positive, and gives a final, definite analysis result for irrelevant comments that the control center failed to screen out and for negative comments that the analysis component analyzed improperly. User comments that have passed the administrator's review can then serve as the training corpus for the whole promotional pseudo-comment analysis system.
K4) After the administrator completes the operation, the supervision platform 7 generates a corresponding review conclusion and saves it to the data center 3.
K5) After receiving the review conclusion, the data center 3 adds it to the analysis conclusion in the user comment table to form a new analysis conclusion. Because the review conclusion is the final result for the analysis conclusion, in the embodiment of the present invention the data center 3 writes the review conclusion directly into the analysis conclusion, without allocating new storage space for it, which improves data storage efficiency.
Further, in step A), the arranged user comment also includes the ID of the commenting user and the product type being evaluated; in step E), the subject analysis judges, according to the feature database of the product type, whether the user comment contains the corresponding product type name or product brand name.
Here, the feature data of the product types correspond one-to-one to the product types listed on the website front end; the subject analysis invokes the feature data corresponding to the product type of the user comment and uses it as the judgment criterion.
Because the embodiments of the present invention target the opinionated comments that users make about products on the network, the feature data of product types is adopted accordingly. Of course, embodiments of the present invention are also applicable to other types of user comments, including comments on news topics, in which case the feature data of the news-topic categories needs to be adopted instead.
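A minimal sketch of this subject check is given below. It assumes the feature database can be represented as a mapping from product type to a set of type names and brand names, and that a simple substring test is enough; the entries, the brand name, and the function names are hypothetical illustrations rather than the patent's actual data or implementation.

```python
# Hypothetical feature database: product type -> type names and brand names.
FEATURE_DB = {
    "手机": {"手机", "星河牌"},
    "笔记本电脑": {"笔记本", "电脑"},
}


def is_on_subject(comment, product_type, feature_db=FEATURE_DB):
    # The comment counts as relevant if it mentions any type name or
    # brand name registered for the product type it was posted under.
    names = feature_db.get(product_type, set())
    return any(name in comment for name in names)


def subject_conclusion(comment, product_type):
    # Returns None when the comment should go on to word segmentation,
    # otherwise the "irrelevant" conclusion that skips further analysis.
    if is_on_subject(comment, product_type):
        return None
    return "subject irrelevant"
```

For example, `subject_conclusion("这款手机屏幕很好", "手机")` returns `None`, while a comment about the weather posted under the same product type yields the irrelevant conclusion. Switching to news comments would only require swapping the mapping for one keyed by news-topic category.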
Specifically, data center 3 uses database technology to manage the different data transmitted by website front end 1, control center 2 and supervising platform 7.
Through supervising platform 7, the administrator adds to data center 3 the relevant information (such as product category, brief introduction, etc.) of the products for which pseudo-comments are to be collected, and website front end 1 reads the product information from data center 3 and displays it. The user reads the product information through website front end 1 and submits a related comment, and website front end 1 writes the submitted comment to data center 3. Control center 2 extracts data such as the IDs of unprocessed comments and their subject categories from data center 3 and sends them to analysis component 4. Analysis component 4 calls a series of modules or services to perform text analysis and feeds the analysis conclusion back to control center 2, which writes the fed-back conclusion to data center 3. Through supervising platform 7, the administrator performs a final review of the comments whose sentiment analysis conclusion is positive, determines whether each comment is valid, and writes the validity conclusion to data center 3. The user obtains the final analysis conclusion from data center 3 through website front end 1.
To better achieve the technical purpose of the present invention, the present invention also proposes a Chinese comment analysis system that adopts the above Chinese comment analysis method. The system comprises website front end 1, which interacts with users; data center 3, which stores user comments; control center 2, which is connected to website front end 1 and data center 3; Chinese word segmentation server 5, which performs word segmentation and tagging on user comments; analysis component 4, which analyzes the segmented user comments; supervising platform 7, which interacts with the administrator; and local storage 6, which stores analysis results.
Specifically, website front end 1 sends analysis requests to control center 2; control center 2 passes user comments to analysis component 4; analysis component 4 passes user comments to Chinese word segmentation server 5; Chinese word segmentation server 5 returns the user comments with part-of-speech tags to analysis component 4; analysis component 4 feeds analysis conclusions back to control center 2; and data center 3 receives, respectively, the user comment data sent by website front end 1, the analysis conclusions sent by control center 2, and the audit conclusions of supervising platform 7.
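The exchange between analysis component 4 and Chinese word segmentation server 5 can be sketched as a single request and reply over a TCP socket, in line with the Socket-based transfer stated in claim 6. Everything below is an assumption made for illustration: the framing (the client closes its write side to end the request), the loopback address, and in particular `fake_segment`, which merely tags every character with `/x` and is not the NLPIR/ICTCLAS segmenter.

```python
import socket
import threading


def fake_segment(text):
    # Hypothetical stand-in for the real segmenter: tag each character.
    return " ".join(ch + "/x" for ch in text)


def serve_once(server_sock):
    # Segmentation server side: accept one client, read until the client
    # closes its write side, then send back the tagged text.
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        conn.sendall(fake_segment(data.decode("utf-8")).encode("utf-8"))


def request_segmentation(host, port, comment):
    # Analysis component side: act as the Socket client.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(comment.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal the end of the request
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
tagged = request_segmentation("127.0.0.1", port, "屏幕很好")
worker.join()
server.close()
```

The same client and server roles would apply between website front end 1 and control center 2, with an analysis request in place of the comment text.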
The foregoing is merely the preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A Chinese comment analysis method, characterized in that Chinese comments submitted by users are analyzed, with the following specific steps:
A) a user submits a comment to the website; after website front end (1) arranges the user's comment, it passes the arranged user comment to data center (3) and sends an analysis request to control center (2);
B) after data center (3) receives the user comment, it records the comment in a user comment table and adds to each user comment an analysis mark indicating whether the comment has been analyzed;
C) after control center (2) receives the request, it actively connects to data center (3), and data center (3) passes all user comments whose analysis mark is "not analyzed" to control center (2);
D) after control center (2) receives the user comments, it passes them to analysis component (4);
E) after analysis component (4) receives a user comment, it performs subject analysis on it; if the subject of the user comment is relevant to the product commented on, the user comment is passed to Chinese word segmentation server (5) and the method proceeds to step F); if the subject of the user comment is irrelevant to the product commented on, an analysis conclusion of "subject irrelevant" is generated directly and the method proceeds to step H);
F) after Chinese word segmentation server (5) receives the user comment, it performs word segmentation and part-of-speech tagging on the user comment and returns the user comment with part-of-speech tags to analysis component (4);
G) after analysis component (4) receives the user comment with part-of-speech tags, it performs syntactic analysis and sentiment analysis in turn, derives an analysis conclusion on the sentiment orientation of the user comment, and stores this analysis conclusion in local storage (6);
H) analysis component (4) feeds the derived analysis conclusion back to control center (2), and control center (2), after receiving the analysis conclusion, passes it to data center (3) for storage;
I) after data center (3) receives the analysis conclusion, it saves the analysis conclusion in the user comment table and changes the analysis mark of the corresponding user comment to "analyzed";
J) when the administrator needs to audit analysis conclusions, the administrator operates on the analysis conclusions in data center (3) through supervising platform (7);
K) the website reads the analysis conclusion from data center (3), and when a user asks to view the conclusion for a comment, the website presents this analysis conclusion to the user.
2. The Chinese comment analysis method as claimed in claim 1, characterized in that step G) comprises the following specific steps:
G1) after analysis component (4) receives the user comment returned by Chinese word segmentation server (5), it performs syntactic analysis on the comment using a matching method based on regular expressions, combines the phrases in the user comment into different short sentences according to their composition, and derives a syntactic analysis conclusion;
G2) according to sentiment analysis resources, sentiment polarity is judged for the adjectives, verbs, nouns and emoticons in the combined short sentences, and word-level sentiment values are derived;
G3) according to sentiment analysis resources, sentiment tagging is performed on the adverbs in the part-of-speech-tagged short sentences, and corrected sentiment orientation values are derived from the word-level sentiment values;
G4) the sentiment objects evaluated by the sentiment phrases are found through syntactic relations, forming several <sentiment object, sentiment phrase> pairs; different weights are assigned to different sentiment objects, and a weighting method is used to derive the sentiment conclusion for the whole user comment; when the sentiment conclusion is a positive orientation, analysis component (4) generates the analysis conclusion "basically meets requirements, pending audit"; when the sentiment conclusion is a negative orientation, analysis component (4) generates the analysis conclusion "unqualified comment, a positive comment is required";
G5) the syntactic analysis conclusion and the sentiment analysis conclusion are each stored in local storage (6).
3. The Chinese comment analysis method as claimed in claim 1, characterized in that step K) comprises the following specific steps:
K1) when the administrator needs to check analysis conclusions, supervising platform (7) forwards an audit request to data center (3);
K2) after data center (3) receives the audit request, it passes to supervising platform (7) the analysis conclusions of the user comments whose analysis mark shows that they have already been analyzed;
K3) after supervising platform (7) receives the analysis conclusions, it presents them to the administrator, who checks or revises them;
K4) after the administrator completes the operation, supervising platform (7) generates the corresponding audit conclusion and returns it to data center (3);
K5) after data center (3) receives the audit conclusion, it adds the audit conclusion to the analysis conclusion in the user comment table to form a new analysis conclusion.
4. The Chinese comment analysis method as claimed in claim 1, characterized in that said Chinese word segmentation server (5) is built around the NLPIR/ICTCLAS2014 DLL module of the Institute of Computing Technology, Chinese Academy of Sciences.
5. The Chinese comment analysis method as claimed in claim 1, characterized in that in step A) the arranged user comment also includes the ID of the commenting user and the product type being evaluated, and in step E) the subject analysis judges, according to the feature database of the product type, whether the user comment contains the corresponding product type name or product brand name.
6. The Chinese comment analysis method as claimed in claim 1, characterized in that said website front end (1), control center (2), analysis component (4) and Chinese word segmentation server (5) all use Socket-based data transfer: website front end (1), acting as a Socket client, sends request messages to control center (2), and control center (2) listens as the Socket server; analysis component (4), acting as a Socket client, sends messages to Chinese word segmentation server (5), and Chinese word segmentation server (5) listens as the Socket server.
7. The Chinese comment analysis method as claimed in claim 1, characterized in that said data center (3) uses database technology to manage the different data transmitted by website front end (1), control center (2) and supervising platform (7).
8. A Chinese comment analysis system, characterized in that it adopts the Chinese comment analysis method according to any one of claims 1 to 7 and comprises a website front end (1) which interacts with users, a data center (3) which stores user comments, a control center (2) which is connected to website front end (1) and data center (3), a Chinese word segmentation server (5) which performs word segmentation and tagging on user comments, an analysis component (4) which analyzes the segmented user comments, a supervising platform (7) which interacts with the administrator, and a local storage (6) which stores analysis results;
said website front end (1) sends analysis requests to control center (2); said control center (2) passes user comments to analysis component (4); said analysis component (4) passes user comments to Chinese word segmentation server (5); said Chinese word segmentation server (5) returns the user comments with part-of-speech tags to analysis component (4); said analysis component (4) feeds analysis conclusions back to control center (2); and said data center (3) receives, respectively, the user comment data sent by website front end (1), the analysis conclusions sent by control center (2), and the audit conclusions of supervising platform (7).
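The sentiment steps G2) to G4) of claim 2 can be illustrated with the following sketch. It assumes that adverbs act as multiplicative modifiers on word-level polarity and that the comment-level score is a weighted sum over sentiment objects; the lexicon entries, modifier factors, object weights, the treatment of a zero score as negative, and the English renderings of the two conclusions are all hypothetical choices, not values given in the patent.

```python
# Hypothetical sentiment resources.
WORD_POLARITY = {"好": 1.0, "清晰": 1.0, "差": -1.0, "慢": -1.0}
ADVERB_FACTOR = {"很": 1.5, "非常": 2.0, "不": -1.0}
OBJECT_WEIGHT = {"屏幕": 0.75, "物流": 0.25}

POSITIVE = "basically meets requirements, pending audit"
NEGATIVE = "unqualified comment, a positive comment is required"


def phrase_score(adverbs, word):
    # G2: word-level polarity; G3: each adverb corrects that value.
    score = WORD_POLARITY.get(word, 0.0)
    for adverb in adverbs:
        score *= ADVERB_FACTOR.get(adverb, 1.0)
    return score


def comment_score(pairs):
    # G4: pairs is a list of (sentiment object, adverbs, sentiment word);
    # each phrase score is weighted by the importance of its object.
    return sum(
        OBJECT_WEIGHT.get(obj, 0.0) * phrase_score(adverbs, word)
        for obj, adverbs, word in pairs
    )


def conclusion(score):
    # Assumption: a score of exactly zero is treated as not positive.
    return POSITIVE if score > 0 else NEGATIVE


pairs = [("屏幕", ["很"], "清晰"), ("物流", ["不"], "好")]
score = comment_score(pairs)  # 0.75 * 1.5 + 0.25 * (-1.0)
label = conclusion(score)
```

Here the praised screen outweighs the criticized logistics, so the comment as a whole is classed as positive and would go on to manual audit.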
CN201410663427.XA 2014-11-19 2014-11-19 Chinese comment analysis method and system Active CN104484336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410663427.XA CN104484336B (en) 2014-11-19 2014-11-19 Chinese comment analysis method and system


Publications (2)

Publication Number Publication Date
CN104484336A true CN104484336A (en) 2015-04-01
CN104484336B CN104484336B (en) 2017-12-19

Family

ID=52758877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410663427.XA Active CN104484336B (en) 2014-11-19 2014-11-19 Chinese comment analysis method and system

Country Status (1)

Country Link
CN (1) CN104484336B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity
US20120101808A1 (en) * 2009-12-24 2012-04-26 Minh Duong-Van Sentiment analysis from social media content
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103064971A (en) * 2013-01-05 2013-04-24 南京邮电大学 Scoring and Chinese sentiment analysis based review spam detection method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649268A (en) * 2016-11-30 2017-05-10 北京京东尚科信息技术有限公司 Investigation sample judging method and system and grey list generation method and system
CN108280560A (en) * 2017-01-06 2018-07-13 广州市动景计算机科技有限公司 A kind of anti-brush method and device of subject evaluation
CN109933775A (en) * 2017-12-15 2019-06-25 腾讯科技(深圳)有限公司 UGC content processing method and device
CN108170685A (en) * 2018-01-29 2018-06-15 浙江省公众信息产业有限公司 Text emotion analysis method, device and computer readable storage medium
CN108256098A (en) * 2018-01-30 2018-07-06 中国银联股份有限公司 A kind of method and device of determining user comment Sentiment orientation
CN108256098B (en) * 2018-01-30 2022-02-15 中国银联股份有限公司 Method and device for determining emotional tendency of user comment
CN108536601A (en) * 2018-04-13 2018-09-14 腾讯科技(深圳)有限公司 A kind of evaluating method, device, server and storage medium
CN108536601B (en) * 2018-04-13 2022-05-31 腾讯科技(深圳)有限公司 Evaluation method, device, server and storage medium
CN109522412A (en) * 2018-11-14 2019-03-26 北京神州泰岳软件股份有限公司 Text emotion analysis method, device and medium
CN109522412B (en) * 2018-11-14 2021-02-26 鼎富智能科技有限公司 Text emotion analysis method, device and medium
CN110442798A (en) * 2019-07-03 2019-11-12 华中科技大学 Comment spam groups of users detection method based on network representation study
CN110674256A (en) * 2019-09-25 2020-01-10 携程计算机技术(上海)有限公司 Detection method and system for relevancy of comment and reply of OTA hotel
CN110674256B (en) * 2019-09-25 2023-05-12 携程计算机技术(上海)有限公司 Method and system for detecting correlation degree of comment and reply of OTA hotel

Also Published As

Publication number Publication date
CN104484336B (en) 2017-12-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant