CN107045497A - Fast news text content sentiment analysis system and method - Google Patents
Fast news text content sentiment analysis system and method
- Publication number
- CN107045497A (application CN201710309000.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- word segmentation
- word
- weight
- news text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
Abstract
The invention discloses a fast news text content sentiment analysis system and method. The system comprises the following modules. News crawling module: crawls news documents from news portals, forums, and microblogs, including preliminary deduplication of the text. News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop-word removal, and additional marking of negation phrases. News text sentiment computation module: performs the TextRank calculation and the word-level sentiment calculation, normalizes the computed values, and combines them to obtain the sentiment index of the document. Data storage module: stores the computed results. The invention can compute sentiment indices quickly in scenarios with a large volume of public opinion data.
Description
Technical field
The present invention relates to the news field, and in particular to a fast news text content sentiment analysis system and method.
Background technology
With the rapid development of the Internet, online public opinion has a growing influence on society. Whether the need is government monitoring of online public opinion or an enterprise's brand communication and brand public relations, an urgent problem in public opinion analysis is how to analyze the sentiment orientation of public opinion quickly when its volume is large, so as to provide timely decision support and public opinion guidance and to respond to a rapidly changing public opinion environment. Conventional sentiment analysis requires complex analysis and cannot achieve low-latency processing when faced with a large volume of public opinion.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a news user sentiment analysis system and a method for quickly computing a sentiment index in scenarios with a large volume of public opinion.
The object of the present invention is achieved through the following technical solution:
A fast news text content sentiment analysis system, comprising the following modules:
News crawling module: crawls news documents from news portals, forums, and microblogs, including preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop-word removal, and additional marking of negation phrases;
News text sentiment computation module: performs the TextRank calculation and the word-level sentiment calculation, normalizes the computed values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the computed results.
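The preliminary deduplication performed by the news crawling module could be sketched as follows. The patent does not say how duplicates are detected, so exact matching on a hash of whitespace-normalized text is an assumption, and the names `fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so trivial spacing
    differences do not defeat exact-duplicate detection (an assumption,
    not the patent's stated method)."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct document, in order."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        key = fingerprint(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Hashing keeps memory use proportional to the number of distinct documents, which suits a high-volume crawl; near-duplicate detection would need a different technique.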
A fast news text content sentiment analysis method, comprising the following steps:
S01: crawl news from Internet news portals, forums, and microblogs, and deduplicate the text;
S02: extract text information, mainly the source, author, title, and body text;
S03: segment the title and body text into words and remove stop words;
S04: compute the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity S of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products, and normalize the result to obtain the sentiment index of the document.
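Steps S02 and S03 can be illustrated with a minimal sketch. The record type `NewsItem`, the tiny `STOP_WORDS` set, and the whitespace-based `segment` function are all hypothetical stand-ins; Chinese news text would need a dedicated word segmenter, which the patent does not name.

```python
from dataclasses import dataclass

# Hypothetical stop-word list; a real deployment would load a full list.
STOP_WORDS = {"the", "a", "of", "and", "to", "is"}


@dataclass
class NewsItem:
    """Fields extracted in step S02."""
    source: str
    author: str
    title: str
    body: str


def segment(text: str) -> list[str]:
    """Stand-in segmenter: lowercases and splits on whitespace."""
    return text.lower().split()


def preprocess(item: NewsItem) -> tuple[list[str], list[str]]:
    """Step S03: return (title_words, body_words) with stop words removed."""
    title_words = [w for w in segment(item.title) if w not in STOP_WORDS]
    body_words = [w for w in segment(item.body) if w not in STOP_WORDS]
    return title_words, body_words
```

Title and body words are kept separate because the title words receive an extra weight in step S04.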
Further, computing the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, where the weighting formula is wt = n × wd, wt denotes the weight of a title word, wd denotes the weight of a body-text word with value range [0, 100], and n denotes the weighting factor with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Computing the weight of each word using the TextRank algorithm;
Normalizing the result, where the normalization formula is wt = wt/(max(wt) + 1), wt denotes the word weight computed by TextRank, and max(wt) denotes the largest weight in the document.
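The title weighting and the normalization can be sketched as two small functions. This is one possible reading: the text does not state whether the title factor n is applied before or after TextRank, so applying it to already computed weights is an assumption, and the function names are hypothetical.

```python
def boost_title_words(weights: dict[str, float], title_words: set[str],
                      n: float = 2.0) -> dict[str, float]:
    """Apply wt = n * wd to words that occur in the title.
    n must lie in the stated range [2, 10]."""
    if not 2 <= n <= 10:
        raise ValueError("n must be in [2, 10]")
    return {w: (n * v if w in title_words else v) for w, v in weights.items()}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Apply wt = wt / (max(wt) + 1), so every weight falls below 1
    when the raw weights are non-negative."""
    if not weights:
        return {}
    denominator = max(weights.values()) + 1
    return {w: v / denominator for w, v in weights.items()}
```

The "+ 1" in the denominator keeps the division defined even when every raw weight is zero.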
Further, the sentiment index of the document is computed from the words in step S06, and the specific formula is
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
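The formula for Sd translates directly into code. The sketch below assumes that n counts the words that carry a weight and that words missing from the sentiment dictionary contribute 0; neither detail is stated in the source, and `document_sentiment` is a hypothetical name.

```python
def document_sentiment(weights: dict[str, float],
                       sentiment: dict[str, float],
                       c: float = 1.0) -> float:
    """Compute Sd = sum(wt * St) * C / n.

    weights   -- normalized word weights wt
    sentiment -- signed word sentiment values St in [-100, 100]
    c         -- the constant C, expected in [1, 5]
    """
    if not 1 <= c <= 5:
        raise ValueError("C must be in [1, 5]")
    n = len(weights)  # assumption: n is the number of weighted words
    if n == 0:
        return 0.0
    total = sum(w * sentiment.get(word, 0.0) for word, w in weights.items())
    return total * c / n
```

Because the computation is a single pass over the words, its cost grows linearly with document length, which is consistent with the low-latency goal.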
The beneficial effects of the present invention are as follows: the corresponding sentiment index analysis result can be obtained with only simple text processing and calculation, which solves the problem of low-latency processing under a large volume of public opinion.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention;
Fig. 2 is a flow chart of the method of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.
As shown in Fig. 1:
A fast news text content sentiment analysis system, comprising the following modules:
News crawling module: crawls news documents from news portals, forums, and microblogs, including preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop-word removal, and additional marking of negation phrases;
News text sentiment computation module: performs the TextRank calculation and the word-level sentiment calculation, normalizes the computed values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the computed results.
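The additional marking of negation phrases in the preprocessing module might look like the sketch below. The source does not describe how the marks are produced or consumed, so flagging the single token that follows a negation word is an assumption, and `NEGATION_WORDS` is a hypothetical list.

```python
# Hypothetical negation cues; a real list would be language specific.
NEGATION_WORDS = {"not", "no", "never"}


def mark_negations(tokens: list[str]) -> list[tuple[str, bool]]:
    """Pair each token with a flag that is True when the token directly
    follows a negation word. Negation words themselves are dropped."""
    marked: list[tuple[str, bool]] = []
    negate_next = False
    for token in tokens:
        if token in NEGATION_WORDS:
            negate_next = True
            continue
        marked.append((token, negate_next))
        negate_next = False
    return marked
```

A later sentiment step could, for example, flip the sign of a flagged word's sentiment value, though the patent does not confirm that use.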
As shown in Fig. 2:
A fast news text content sentiment analysis method, comprising the following steps:
S01: crawl news from Internet news portals, forums, and microblogs, and deduplicate the text;
S02: extract text information, mainly the source, author, title, and body text;
S03: segment the title and body text into words and remove stop words;
S04: compute the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products, and normalize the result to obtain the sentiment index of the document.
In specific operation, the text is first crawled and deduplicated, and text information is extracted, including the source, date, title, body text, and author. The title and body text are then segmented into words and processed along two paths: first, TextRank is used to compute the weight of each word, and the weights are normalized; second, the sentiment orientation and sentiment intensity S of each word are obtained by looking the word up in the dictionary (the concrete value range of the sentiment intensity S is left unspecified).
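The dictionary lookup in the second path can be sketched as below. The entries in `SENTIMENT_DICT` are invented for illustration, and folding orientation and intensity into one signed value is an assumption made so that the result matches the [-100, 100] range given for St.

```python
# Hypothetical dictionary: word -> (orientation, intensity S).
SENTIMENT_DICT = {
    "excellent": ("positive", 90.0),
    "good": ("positive", 60.0),
    "poor": ("negative", 50.0),
    "terrible": ("negative", 95.0),
}


def signed_sentiment(word: str) -> float:
    """Return a signed sentiment value for the word: +S for positive
    entries, -S for negative entries, and 0.0 for unknown words."""
    entry = SENTIMENT_DICT.get(word)
    if entry is None:
        return 0.0
    orientation, intensity = entry
    return intensity if orientation == "positive" else -intensity
```

A dictionary lookup is a constant-time operation per word, so this path adds very little latency compared with model-based sentiment scoring.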
Computing the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, where the weighting formula is wt = n × wd, wt denotes the weight of a title word, wd denotes the weight of a body-text word with value range [0, 100], and n denotes the weighting factor with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Computing the weight of each word using the TextRank algorithm;
Normalizing the result, where the normalization formula is wt = wt/(max(wt) + 1), wt denotes the word weight computed by TextRank, and max(wt) denotes the largest weight in the document.
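For readers unfamiliar with TextRank, a minimal pure-Python sketch of its keyword-scoring form follows. The window size, damping factor, and iteration count are common defaults rather than values from the patent, and the part-of-speech filter is assumed to have been applied to `words` beforehand.

```python
from collections import defaultdict


def textrank(words: list[str], window: int = 2, damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Score words by PageRank over an undirected co-occurrence graph."""
    # Link each word to the words that appear within `window` positions after it.
    neighbors: dict[str, set[str]] = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    vocab = set(words)
    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        updated = {}
        for w in vocab:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            updated[w] = (1 - damping) + damping * incoming
        scores = updated
    return scores
```

Words that co-occur with many different words accumulate higher scores, which is why the method can rank keywords without any training data.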
The sentiment index of the document is computed from the words in step S06, and the specific formula is
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
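As a worked illustration of the formula, consider a hypothetical document with three words whose normalized weights are 0.6, 0.3, and 0.1 and whose sentiment indices are 50, -20, and 0, with C = 3. These numbers are invented for the example and do not come from the patent.

```latex
\[
S_d = \frac{C}{n}\sum_{t} w_t S_t
    = \frac{3}{3}\bigl(0.6 \times 50 + 0.3 \times (-20) + 0.1 \times 0\bigr)
    = 24
\]
```

The positive result indicates that the heavily weighted positive word dominates the document's overall sentiment.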
The above is only a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.
Claims (4)
1. A fast news text content sentiment analysis system, characterized by comprising the following modules:
News crawling module: crawls news documents from news portals, forums, and microblogs, including preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop-word removal, and additional marking of negation phrases;
News text sentiment computation module: performs the TextRank calculation and the word-level sentiment calculation, normalizes the computed values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the computed results.
2. A fast news text content sentiment analysis method, characterized by comprising the following steps:
S01: crawl news from Internet news portals, forums, and microblogs, and deduplicate the text;
S02: extract text information, mainly the source, author, title, and body text;
S03: segment the title and body text into words and remove stop words;
S04: compute the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity S of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products, and normalize the result to obtain the sentiment index of the document.
3. The fast news text content sentiment analysis method according to claim 2, characterized in that computing the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, where the weighting formula is wt = n × wd, wt denotes the weight of a title word, wd denotes the weight of a body-text word with value range [0, 100], and n denotes the weighting factor with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Computing the weight of each word using the TextRank algorithm;
Normalizing the result, where the normalization formula is wt = wt/(max(wt) + 1), wt denotes the word weight computed by TextRank, and max(wt) denotes the largest weight in the document.
4. The fast news text content sentiment analysis method according to claim 2, characterized in that the sentiment index of the document is computed from the words in step S06, and the specific formula is
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710309000.3A CN107045497A (en) | 2017-05-04 | 2017-05-04 | Fast news text content sentiment analysis system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710309000.3A CN107045497A (en) | 2017-05-04 | 2017-05-04 | Fast news text content sentiment analysis system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107045497A | 2017-08-15 |
Family
ID=59547113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710309000.3A Pending CN107045497A (en) | Fast news text content sentiment analysis system and method | 2017-05-04 | 2017-05-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107045497A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228569A (en) * | 2018-01-30 | 2018-06-29 | 武汉理工大学 | A kind of Chinese microblog emotional analysis method based on Cooperative Study under the conditions of loose |
CN109190105A (en) * | 2018-06-28 | 2019-01-11 | 中译语通科技股份有限公司 | A kind of enterprise's public sentiment macroscopic view sentiment analysis method |
WO2019227710A1 (en) * | 2018-05-31 | 2019-12-05 | 平安科技(深圳)有限公司 | Network public opinion analysis method and apparatus, and computer-readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101639824A (en) * | 2009-08-27 | 2010-02-03 | 北京理工大学 | Text filtering method based on emotional orientation analysis against malicious information |
CN103116637A (en) * | 2013-02-08 | 2013-05-22 | 无锡南理工科技发展有限公司 | Text sentiment classification method facing Chinese Web comments |
CN103678278A (en) * | 2013-12-16 | 2014-03-26 | 中国科学院计算机网络信息中心 | Chinese text emotion recognition method |
CN104933093A (en) * | 2015-05-19 | 2015-09-23 | 武汉泰迪智慧科技有限公司 | Regional public opinion monitoring and decision-making auxiliary system and method based on big data |
US20170060996A1 (en) * | 2015-08-26 | 2017-03-02 | Subrata Das | Automatic Document Sentiment Analysis |
CN106610955A (en) * | 2016-12-13 | 2017-05-03 | 成都数联铭品科技有限公司 | Dictionary-based multi-dimensional emotion analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170815 |