CN107045497A - A fast sentiment analysis system and method for news text content - Google Patents

A fast sentiment analysis system and method for news text content

Info

Publication number
CN107045497A
Authority
CN
China
Prior art keywords
text
word segmentation
word
weight
news text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710309000.3A
Other languages
Chinese (zh)
Inventor
余军
卢品吟
刘盾
张汨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Hua Seiun Technology Co Ltd
Original Assignee
Chengdu Hua Seiun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Hua Seiun Technology Co Ltd filed Critical Chengdu Hua Seiun Technology Co Ltd
Priority to CN201710309000.3A priority Critical patent/CN107045497A/en
Publication of CN107045497A publication Critical patent/CN107045497A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Abstract

The invention discloses a fast sentiment analysis system and method for news text content. The system comprises the following modules. News crawling module: crawls news documents from news portals, forums and microblogs, and performs preliminary deduplication of the text. News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop word removal and additional marking of negation phrases. News text sentiment calculation module: performs the TextRank calculation and the word sentiment calculation, normalizes the calculated values, and combines them to obtain the sentiment index of the document. Data storage module: stores the calculated results. The invention can calculate sentiment indices quickly when faced with a large volume of public opinion data.

Description

A fast sentiment analysis system and method for news text content
Technical field
The present invention relates to the field of news information, and in particular to a fast sentiment analysis system and method for news text content.
Background
With the rapid development of the internet, online public opinion has a growing influence on society. Whether a government needs to monitor online public opinion or an enterprise needs to manage brand communication and public relations, an urgent problem in public opinion analysis is how to rapidly determine the sentiment orientation of a large volume of public opinion, so that decision support and opinion guidance can be provided in time and a rapidly changing public opinion environment can be responded to. Conventional sentiment analysis requires complex analysis and cannot achieve low-latency processing when faced with a large volume of public opinion.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a news sentiment analysis system and a method that quickly calculates a sentiment index when faced with a large volume of public opinion.
The object of the present invention is achieved through the following technical solution:
A fast sentiment analysis system for news text content, comprising the following modules:
News crawling module: crawls news documents from news portals, forums and microblogs, and performs preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop word removal and additional marking of negation phrases;
News text sentiment calculation module: performs the TextRank calculation and the word sentiment calculation, normalizes the calculated values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the calculated results.
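As an illustration only, the preliminary deduplication performed by the crawling module could be sketched in Python as follows. The patent does not state how duplicates are detected; the whitespace normalisation, the SHA-1 fingerprint and the function name `dedupe_documents` are assumptions made for this sketch.

```python
import hashlib
import re


def dedupe_documents(texts):
    """Keep the first occurrence of each text, compared on a normalised fingerprint.

    Assumption: two crawled texts count as duplicates when they are identical
    after collapsing runs of whitespace. The patent does not define the rule.
    """
    seen = set()
    unique = []
    for text in texts:
        # Collapse whitespace so trivially different copies hash identically.
        normalised = re.sub(r"\s+", " ", text).strip()
        fingerprint = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(text)
    return unique
```

An exact fingerprint of this kind only catches verbatim reposts; near-duplicate detection would need a different technique, which the patent does not describe.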
A fast sentiment analysis method for news text content, comprising the following steps:
S01: crawl news from internet news portals, forums and microblogs, and deduplicate the text;
S02: extract the text information, mainly the source, author, title and body text;
S03: segment the title and body text into words and remove stop words;
S04: calculate the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity S of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products and normalize the result to obtain the sentiment index of the document.
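Steps S03 and S05 can be pictured with a minimal Python sketch. It assumes the text has already been segmented into a list of words by some segmenter; the stop word list, the dictionary entries and their scores are hypothetical placeholders, and scoring unknown words as 0 is an assumption rather than something the patent states.

```python
# Hypothetical stop word list and sentiment dictionary, for illustration only.
STOP_WORDS = {"的", "了", "是"}
SENTIMENT_DICT = {"上涨": 60, "亏损": -70}  # signed scores within [-100, 100]


def remove_stop_words(tokens):
    """S03 (second half): drop stop words from an already segmented word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lookup_sentiment(tokens):
    """S05: map each word to a signed score; the sign is the orientation and
    the magnitude is the intensity. Unknown words score 0 (an assumption)."""
    return {t: SENTIMENT_DICT.get(t, 0) for t in tokens}


words = remove_stop_words(["股价", "的", "上涨", "了"])
scores = lookup_sentiment(words)
```

Because both functions are simple set and dictionary lookups, their cost grows linearly with the number of words, which fits the low-latency goal stated in the background.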
Further, calculating the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, with the weighting formula wt = n × wd, where wt denotes the weight of a title word, wd denotes the weight of a body-text word, with value range [0, 100], and n denotes the weighting factor, with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Calculating the weight of each word using the TextRank algorithm;
Normalizing the result, with the normalization formula wt = wt / (max(wt) + 1), where wt denotes the word weight calculated by TextRank and max(wt) denotes the largest weight in the document.
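The title weighting and the normalization can be sketched as two small Python functions. This is one reading of the text, not a definitive implementation: the sketch assumes the title boost is applied to weights that have already been computed, and the function names and the default n = 2 are hypothetical.

```python
def boost_title_weights(weights, title_words, n=2.0):
    """Apply wt = n * wd to every word that also appears in the title.

    Assumption: `weights` already holds a weight wd for every word, and n
    must lie in the range [2, 10] given in the text.
    """
    if not 2 <= n <= 10:
        raise ValueError("n must lie in [2, 10]")
    return {w: (n * v if w in title_words else v) for w, v in weights.items()}


def normalise_weights(weights):
    """Apply wt = wt / (max(wt) + 1), so every weight is strictly below 1."""
    if not weights:
        return {}
    denominator = max(weights.values()) + 1
    return {w: v / denominator for w, v in weights.items()}
```

The +1 in the denominator keeps the division defined even when every weight is 0, and it means the largest word never reaches a normalized weight of exactly 1.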
Further, the sentiment index of the document is calculated from the words in step S06 as follows:
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word, with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
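The document-level formula translates almost directly into code. In the sketch below, the function name is hypothetical, n is taken to be the number of distinct weighted words, and words missing from the sentiment mapping are scored 0; both of these are assumptions, since the text only says that n is the number of words in the document.

```python
def document_sentiment(weights, sentiments, c=1.0):
    """Compute Sd = sum(wt * St) * C / n for one document.

    weights:    word -> normalized weight wt
    sentiments: word -> word sentiment index St in [-100, 100]
    c:          the constant C, expected to lie in [1, 5]
    """
    if not weights:
        return 0.0
    total = sum(wt * sentiments.get(word, 0.0) for word, wt in weights.items())
    return total * c / len(weights)
```

Dividing by n makes the index an average contribution per word, so long and short documents stay comparable, while C only rescales the result.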
The beneficial effect of the present invention is that the corresponding sentiment index result is obtained through simple text processing and calculation alone, which solves the problem of low-latency processing for a large volume of public opinion.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention;
Fig. 2 is a flow chart of the method of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.
As shown in Fig. 1:
A fast sentiment analysis system for news text content, comprising the following modules:
News crawling module: crawls news documents from news portals, forums and microblogs, and performs preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop word removal and additional marking of negation phrases;
News text sentiment calculation module: performs the TextRank calculation and the word sentiment calculation, normalizes the calculated values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the calculated results.
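The text says that negation phrases are additionally marked but does not say how. One simple possibility, offered purely as an assumption, is to flag the word that directly follows a negation word. The negation word list and the function name `mark_negation` below are hypothetical.

```python
# Hypothetical negation words; the patent does not list any.
NEGATION_WORDS = {"不", "没", "没有", "未", "非"}


def mark_negation(tokens):
    """Return (word, negated) pairs, dropping the negation words themselves.

    A word is flagged as negated when the token immediately before it is a
    negation word. This adjacency rule is an assumption for illustration.
    """
    marked = []
    negate_next = False
    for token in tokens:
        if token in NEGATION_WORDS:
            negate_next = True
            continue
        marked.append((token, negate_next))
        negate_next = False
    return marked
```

A later stage could, for example, flip the sign of the word sentiment for flagged words; that use is likewise an assumption and is not described in the source.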
As shown in Fig. 2:
A fast sentiment analysis method for news text content, comprising the following steps:
S01: crawl news from internet news portals, forums and microblogs, and deduplicate the text;
S02: extract the text information, mainly the source, author, title and body text;
S03: segment the title and body text into words and remove stop words;
S04: calculate the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products and normalize the result to obtain the sentiment index of the document.
In operation, the text is first crawled and deduplicated, and the text information is extracted, including the source, date, title, body text and author. The title and body text are then segmented into words and processed along two paths: the first uses TextRank to calculate the weight of each word and normalizes it; the second looks up each word in the dictionary to obtain its sentiment orientation and sentiment intensity S (the specific value range of S is not given at this point).
Calculating the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, with the weighting formula wt = n × wd, where wt denotes the weight of a title word, wd denotes the weight of a body-text word, with value range [0, 100], and n denotes the weighting factor, with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Calculating the weight of each word using the TextRank algorithm;
Normalizing the result, with the normalization formula wt = wt / (max(wt) + 1), where wt denotes the word weight calculated by TextRank and max(wt) denotes the largest weight in the document.
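The TextRank step itself can be illustrated with a compact, dependency-free Python sketch. It assumes the words have already been segmented and filtered to nouns and verbs; the co-occurrence window of 2, the damping factor of 0.85 and the fixed 50 iterations are illustrative choices that the patent does not specify.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by running PageRank over an undirected co-occurrence graph.

    Two distinct words are linked when they appear within `window` positions
    of each other (window=2 links adjacent words only).
    """
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    words = set(tokens)
    scores = {w: 1.0 for w in words}
    for _ in range(iterations):
        new_scores = {}
        for w in words:
            # Each neighbour shares its score equally among its own links.
            rank = sum(scores[v] / len(neighbours[v]) for v in neighbours[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores
    return scores
```

Words that co-occur with many other words collect more score, which is what lets the resulting weights act as importance weights in the later sentiment sum.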
The sentiment index of the document is calculated from the words in step S06 as follows:
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word, with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
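A small worked example, using made-up numbers, shows how the normalization and the document formula combine. Suppose a document has two words with raw TextRank weights 3 and 2 and word sentiment indices 60 and -70, and take C = 2; all of these values are hypothetical.

```latex
\begin{align*}
w_1 &= \frac{3}{\max(3,2) + 1} = 0.75, \qquad
w_2 = \frac{2}{\max(3,2) + 1} = 0.5 \\
S_d &= \bigl(0.75 \times 60 + 0.5 \times (-70)\bigr) \times \frac{C}{n}
     = (45 - 35) \times \frac{2}{2} = 10
\end{align*}
```

The mildly positive result reflects that the positive word carries the larger weight even though the negative word has the stronger raw sentiment.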
The above is only a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.

Claims (4)

1. A fast sentiment analysis system for news text content, characterized by comprising the following modules:
News crawling module: crawls news documents from news portals, forums and microblogs, and performs preliminary deduplication of the text;
News text preprocessing module: performs preliminary text feature processing, including word segmentation, stop word removal and additional marking of negation phrases;
News text sentiment calculation module: performs the TextRank calculation and the word sentiment calculation, normalizes the calculated values, and combines them to obtain the sentiment index of the document;
Data storage module: stores the calculated results.
2. A fast sentiment analysis method for news text content, characterized by comprising the following steps:
S01: crawl news from internet news portals, forums and microblogs, and deduplicate the text;
S02: extract the text information, mainly the source, author, title and body text;
S03: segment the title and body text into words and remove stop words;
S04: calculate the weight of each word using TextRank;
S05: at the same time, obtain the sentiment orientation and sentiment intensity S of each word from a sentiment dictionary;
S06: finally, multiply the weight of each word by its sentiment intensity, sum the products and normalize the result to obtain the sentiment index of the document.
3. The fast sentiment analysis method for news text content according to claim 2, characterized in that calculating the weight of each word using TextRank in step S04 specifically includes:
Additionally weighting the words of the title, with the weighting formula wt = n × wd, where wt denotes the weight of a title word, wd denotes the weight of a body-text word, with value range [0, 100], and n denotes the weighting factor, with value range [2, 10];
Filtering the words by part of speech, retaining only nouns and verbs;
Calculating the weight of each word using the TextRank algorithm;
Normalizing the result, with the normalization formula wt = wt / (max(wt) + 1), where wt denotes the word weight calculated by TextRank and max(wt) denotes the largest weight in the document.
4. The fast sentiment analysis method for news text content according to claim 2, characterized in that the sentiment index of the document is calculated from the words in step S06 as follows:
Sd = ∑(wt × St) × C/n
where Sd denotes the sentiment index of the document, wt denotes the weight of each word, St denotes the sentiment index of each word, with value range [-100, 100], C is a constant with value range [1, 5], and n denotes the number of words in the document.
CN201710309000.3A 2017-05-04 2017-05-04 A fast sentiment analysis system and method for news text content Pending CN107045497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710309000.3A CN107045497A (en) 2017-05-04 2017-05-04 A fast sentiment analysis system and method for news text content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710309000.3A CN107045497A (en) 2017-05-04 2017-05-04 A fast sentiment analysis system and method for news text content

Publications (1)

Publication Number Publication Date
CN107045497A true CN107045497A (en) 2017-08-15

Family

ID=59547113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710309000.3A Pending CN107045497A (en) 2017-05-04 2017-05-04 A fast sentiment analysis system and method for news text content

Country Status (1)

Country Link
CN (1) CN107045497A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228569A (en) * 2018-01-30 2018-06-29 武汉理工大学 A kind of Chinese microblog emotional analysis method based on Cooperative Study under the conditions of loose
CN109190105A (en) * 2018-06-28 2019-01-11 中译语通科技股份有限公司 A kind of enterprise's public sentiment macroscopic view sentiment analysis method
WO2019227710A1 (en) * 2018-05-31 2019-12-05 平安科技(深圳)有限公司 Network public opinion analysis method and apparatus, and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639824A (en) * 2009-08-27 2010-02-03 北京理工大学 Text filtering method based on emotional orientation analysis against malicious information
CN103116637A (en) * 2013-02-08 2013-05-22 无锡南理工科技发展有限公司 Text sentiment classification method facing Chinese Web comments
CN103678278A (en) * 2013-12-16 2014-03-26 中国科学院计算机网络信息中心 Chinese text emotion recognition method
CN104933093A (en) * 2015-05-19 2015-09-23 武汉泰迪智慧科技有限公司 Regional public opinion monitoring and decision-making auxiliary system and method based on big data
US20170060996A1 (en) * 2015-08-26 2017-03-02 Subrata Das Automatic Document Sentiment Analysis
CN106610955A (en) * 2016-12-13 2017-05-03 成都数联铭品科技有限公司 Dictionary-based multi-dimensional emotion analysis method

Similar Documents

Publication Publication Date Title
US11494648B2 (en) Method and system for detecting fake news based on multi-task learning model
CN109446404B (en) Method and device for analyzing emotion polarity of network public sentiment
US20170147682A1 (en) Automated text-evaluation of user generated text
CN105389307A (en) Statement intention category identification method and apparatus
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
CN113837531A (en) Product quality problem finding and risk assessment method based on network comments
CN110223675B (en) Method and system for screening training text data for voice recognition
CN109325124B (en) Emotion classification method, device, server and storage medium
CN110321562B (en) Short text matching method and device based on BERT
CN105512104A (en) Dictionary dimension reducing method and device and information classifying method and device
CN112839012B (en) Bot domain name identification method, device, equipment and storage medium
CN105956740B (en) Semantic risk calculation method based on text logical features
US20180307677A1 (en) Sentiment Analysis of Product Reviews From Social Media
CN112860902A (en) Public opinion emotional heat degree calculation method and device
CN111460162B (en) Text classification method and device, terminal equipment and computer readable storage medium
CN110134788B (en) Microblog release optimization method and system based on text mining
CN106569989A (en) De-weighting method and apparatus for short text
CN107045497A (en) A fast sentiment analysis system and method for news text content
CN115329769A (en) Semantic enhancement network-based platform enterprise network public opinion emotion analysis method
CN107145568A (en) A kind of quick media event clustering system and method
CN110705250A (en) Method and system for identifying target content in chat records
CN112287240A (en) Case microblog evaluation object extraction method and device based on double-embedded multilayer convolutional neural network
CN110837590B (en) Information pushing method and device, computer equipment and storage medium
CN111177421A (en) Method and device for generating email historical event axis facing digital human
CN116561298A (en) Title generation method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170815