CN107315797A - Internet news acquisition and text emotion prediction system - Google Patents
Internet news acquisition and text emotion prediction system Download PDF Info
- Publication number
- CN107315797A CN107315797A CN201710463295.XA CN201710463295A CN107315797A CN 107315797 A CN107315797 A CN 107315797A CN 201710463295 A CN201710463295 A CN 201710463295A CN 107315797 A CN107315797 A CN 107315797A
- Authority
- CN
- China
- Prior art keywords
- news
- feature
- text
- keyword
- votes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
An Internet news acquisition and text emotion prediction system. News texts crawled from the web serve as the training set; a text classification algorithm is used to build a training model; news texts awaiting publication are classified against that model and automatically emotion-labeled, predicting the influence on public sentiment that a news text to be published is likely to cause. The system thus models how social news affects public emotion, and by predicting the public sentiment a news item is likely to provoke, it provides a facility for network security.
Description
Technical field
The present invention relates to the technical field of intelligent applications, and in particular to an Internet news acquisition and text emotion prediction system.
Background art
Today, with the rapid development of the Internet, the web has become an important source of information, making it easier for people to follow social developments. In real life, readers spontaneously form a corresponding emotion after reading a news item: for example, most people grow angry at one kind of news and are moved by another, such as reports that "so-and-so did not hesitate to do what is right". In fact, the news body itself may contain no emotion words at all (such as "gloomy", "sad", "happy"), yet reading such news still induces a certain sentiment tendency in readers. This tendency follows a regular distribution: most people's emotional response to a given news item is essentially the same.
As the volume of data grows, governments and website maintainers cannot predict in advance the emotions that Internet news may arouse in the public, or its social impact. A text emotion prediction system is therefore urgently needed: one that predicts the emotions the public may feel after reading a news item, so that prevention and intervention can be carried out in advance, achieving predictive, sentiment-oriented analysis of public opinion.
Most existing text emotion prediction systems analyze and mine subjective texts (such as comments, opinions, and reviews): they locate emotion keywords in the subjective text to determine the user sentiment it reflects, usually distinguishing only two classes, positive and negative. They cannot analyze the news body itself, and so cannot uncover the factors hidden in a news item that influence readers' emotions.
Summary of the invention
The technical problem solved by the invention is to provide an Internet news acquisition and text emotion prediction system that overcomes the shortcomings of the background art described above.
The technical problem is solved by the following technical scheme:
An Internet news acquisition and text emotion prediction system: news texts crawled from the web serve as the training set; a text classification algorithm is used to build a training model; news texts awaiting publication are classified against the training model and automatically emotion-labeled, predicting the influence on public sentiment that a news text to be published is likely to cause. The specific steps are:
One) Using news texts crawled from the web as the training set
Web pages are crawled in bulk by a crawler; the news body and vote counts are parsed during crawling; the body is pre-processed and matched against the configured keywords to build a corpus; and the body is automatically emotion-labeled according to the vote counts, so as to obtain a corpus that meets the requirements and store it locally.
1. Acquiring social news
Social news websites that carry emotion vote counts are the main crawl targets. The crawler first analyzes the structure of the website, then extracts the news-related content, i.e. the news link URLs to be crawled, from the page source code. Once the URL of a news item is obtained, a request is sent with HttpClient, the response is received, and HtmlParser parses the response to obtain the content of that news item, such as the title, body, and vote counts. Filtering is applied during crawling: if neither the body nor the title of a parsed page contains one of the keywords (supplied by the user) or an approximately synonymous word, the item is deemed unrelated to the keywords and discarded.
2. Corpus construction and data storage
MySQL is chosen as the corpus store; tables are created, and the crawled texts related to the configured keywords are stored in the corpus:
i) Create a news table with the fields news_url (news link), news_title (headline), news_content (news body), and news_vote (news vote counts). With the news link as the primary key, the crawled content is stored in the news table.
ii) Create a keyword table with the fields keyword_id (keyword sequence number) and keyword. The user-configured keywords are read and stored in the keyword table with keyword as the primary key.
iii) Create an index table with the fields id (sequence number), keyword, news_title, news_content, and news_vote. News items in the news table that contain a keyword are selected and stored in the index table, indexed by keyword.
3. Automatically labeling the emotion category
The response for each news URL is parsed to obtain the vote counts of every news item, and an automatic labeling scheme is defined over those counts:
a) A total-vote threshold N is user-defined: if the total number of votes for a news item is less than N, the item is skipped.
b) A difference threshold M is user-defined: if the difference between the highest and the second-highest vote count of a news item is less than M, the item does not take part in building the corpus.
c) If a news item's votes exceed thresholds N and M, the item is labeled with the class that received the most votes, realizing automatic labeling. Finally, the labeled news texts are stored, by category, in the corresponding table of the training set.
Two) Text pre-processing
The news texts in the training set are pre-processed, including word segmentation and stop-word removal. Segmentation is completed through the interfaces of the Chinese Academy of Sciences ICTCLAS2015 and the lucene segmentation systems. A user-defined stop-word list is allowed, and a default stop-word list may also be used, to filter out words that contribute little to class discrimination and carry little semantic information.
Three) Feature selection and feature weighting
Feature selection and weighting are applied to the pre-processed training-set news texts. Feature selection removes from the feature set those features that do not represent useful information well, improving classification accuracy and reducing computational complexity; weighting uses statistics of the news texts to assign a weight to each feature term.
1) Building the text vector space model
First, the news texts of the training set are converted into a computer-readable format: unstructured text is turned into structured text, and each news document is converted into a vector whose components are feature weights. A feature dictionary is built by feature selection; if the dictionary holds N terms, each news text is represented as an N-dimensional vector, and a weighting method computes the value of each dimension, yielding the text vector space model.
2) Feature selection
Features are extracted at three granularities: unigrams, bigrams, and topics; after extraction, the features are stored in a HashMap. When extracting text features, the chi-square statistic measures the degree of correlation between a word and a document class: the higher the chi-square value of a word for a class, the better the word may represent documents of that class, i.e. the more class-discriminative information it carries. For multi-class problems, the chi-square value of a word is first computed for each class, and the maximum of these values is taken as the word's chi-square value over the whole corpus.
3) Setting feature weights
A feature weight measures the importance, or discriminating power, of a feature term in the text representation. Weights are computed with TF-IDF, where TF is the term frequency, measuring how well the word describes the document's content, and IDF is the inverse document frequency, measuring how well the word distinguishes between documents.
Four) Building the training model
With the SVM training method, a kernel function applies a nonlinear transformation to the weighted feature vectors, mapping the nonlinear input feature vectors into a high-dimensional feature space; an optimal linear separating surface is then found in that space to separate the text classes, establishing the training model.
I) Training-set vector model
The feature dimensionality is user-defined; features are extracted by the feature selection method and weights are set per granularity. However, too high a dimensionality easily makes training slow, overfits the data, and introduces excessive noise, while too low a dimensionality cannot carry enough textual information; both harm classification performance. Training models are therefore set up at different feature dimensionalities, and cross-validation or the classification accuracy on a test set determines the optimal input dimensionality, establishing the training-set vector model.
II) Input normalization
Because the raw ranges of the training-set vector model data may be too large or too small, the raw data are first rescaled into a proper range, i.e. input normalization, which makes training and prediction faster.
III) Cross-validated parameter optimization
A grid search is used, with user-defined initial values and step sizes for the loss parameter and the kernel gamma parameter; 5-fold cross-validation evaluates the quality of the training model under different loss and gamma values, avoiding interference from random factors and yielding the optimal loss and kernel parameters for the SVM model. In 5-fold cross-validation, the initial sample is divided into 5 subsamples: one subsample is held out as the data for validating the model, and the other 4 are used for training; the procedure is repeated 5 times so that each subsample is used for validation exactly once, and the 5 results are averaged to give a single estimate.
Five) Prediction output
Web pages crawled in bulk are input-normalized and loaded into the trained vector model; the SVM model predicts the texts to be classified and outputs the predicted class labels.
Beneficial effects: the invention uses news texts crawled from the web as the training set, applies a text classification algorithm to build a training model, classifies news texts awaiting publication against that model with automatic emotion labeling, and predicts the influence on public sentiment that a news text to be published is likely to cause. It builds a text emotion prediction system for the influence of social news on public emotion; predicting the public sentiment a news item is likely to provoke provides a facility for network security.
Brief description of the drawings
Fig. 1 is a flow chart of the preferred embodiment of the present invention.
Embodiment
In order to make the technical means, creative features, objectives, and effects of the present invention easy to understand, the present invention is further explained below with reference to the specific illustration.
Referring to Fig. 1, an Internet news acquisition and text emotion prediction system proceeds by the following specific steps:
One) Establishing the training set from news texts crawled from the web
Web pages are crawled in bulk by a crawler; the news body and vote counts are parsed during crawling; the body is pre-processed and matched against the configured keywords to build a corpus; and the body is automatically emotion-labeled according to the vote counts, so as to obtain a corpus that meets the requirements and store it locally.
1. Acquiring social news
Social news websites that carry emotion vote counts are the main crawl targets. The crawler first analyzes the structure of the website, extracts the news-related content (such as the news link URLs to be crawled) from the page source code, and filters out useless links such as advertisements. Once the URL of a news item is obtained, a request is sent with HttpClient, the response is received, and HtmlParser parses the response to obtain the content of that news item, such as the title, body, and vote counts. Filtering is applied during crawling: if neither the body nor the title of a parsed page contains one of the keywords (supplied by the user) or an approximately synonymous word, the item is deemed unrelated to the keywords and discarded.
2. Corpus construction and data storage
MySQL is chosen as the corpus store; tables are created, and the crawled texts related to the configured keywords are stored in the corpus:
i) Create a news table with the fields news_url (news link), news_title (headline), news_content (news body), and news_vote (news vote counts). With the news link as the primary key, which prevents duplicate insertions of the same news item, the crawled content is stored in the news table.
ii) Create a keyword table with the fields keyword_id (keyword sequence number) and keyword. The user-configured keywords are read and stored in the keyword table with keyword as the primary key.
iii) Create an index table with the fields id (sequence number), keyword, news_title, news_content, and news_vote. News items in the news table that contain a keyword are selected and stored in the index table, indexed by keyword.
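The three tables above can be sketched in SQL. The patent specifies MySQL; this sketch uses Python's standard-library `sqlite3` only so the schema is runnable as-is, and the inserted row values are hypothetical. It also shows how the primary key on the news link rejects duplicate insertions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# News table: news_url as primary key also prevents inserting the same item twice.
cur.execute("""CREATE TABLE news (
    news_url     TEXT PRIMARY KEY,
    news_title   TEXT,
    news_content TEXT,
    news_vote    TEXT)""")
# Keyword table: keyword itself is the primary key.
cur.execute("""CREATE TABLE keyword (
    keyword_id INTEGER,
    keyword    TEXT PRIMARY KEY)""")
# Index table: news rows containing a keyword, indexed by that keyword.
cur.execute("""CREATE TABLE news_index (
    id           INTEGER PRIMARY KEY,
    keyword      TEXT,
    news_title   TEXT,
    news_content TEXT,
    news_vote    TEXT)""")

row = ("http://example.com/news/1", "title", "body text", "anger:120;moved:10")
cur.execute("INSERT OR IGNORE INTO news VALUES (?,?,?,?)", row)
cur.execute("INSERT OR IGNORE INTO news VALUES (?,?,?,?)", row)  # duplicate, ignored
count = cur.execute("SELECT COUNT(*) FROM news").fetchone()[0]
```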
3. Automatically labeling the emotion category
The response for each news URL is parsed to obtain the vote counts of every news item, and an automatic labeling scheme is defined over those counts:
a) A total-vote threshold N is user-defined: if the total number of votes for a news item is less than N, the item is skipped.
b) A difference threshold M is user-defined: if the difference between the highest and the second-highest vote count of a news item is less than M, the item does not take part in building the corpus.
c) If a news item's votes exceed thresholds N and M, the item is labeled with the class that received the most votes, realizing automatic labeling. Finally, the labeled news texts are stored, by category, in the corresponding table of the training set.
Two) Text pre-processing
The news texts in the training set are pre-processed, including word segmentation and stop-word removal. Segmentation is completed through the interfaces of the Chinese Academy of Sciences ICTCLAS2015 and the lucene segmentation systems. A user-defined stop-word list is allowed, and a default stop-word list may also be used, to filter out words (such as function words) that contribute little to class discrimination and carry little semantic information.
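The stop-word filtering stage can be sketched as follows. ICTCLAS2015 and lucene are external segmenters, so this sketch stands in a trivial whitespace "segmenter"; the default stop-word list here is a hypothetical placeholder.

```python
# Hypothetical default stop-word list (the patent allows both a default
# list and a user-defined one).
DEFAULT_STOPWORDS = {"the", "a", "of", "is"}

def preprocess(text, extra_stopwords=frozenset()):
    """Segment text and drop stop words. The whitespace split below is a
    toy stand-in for the ICTCLAS/lucene segmentation interfaces."""
    stop = DEFAULT_STOPWORDS | set(extra_stopwords)  # merge user-defined list
    tokens = text.lower().split()
    return [t for t in tokens if t not in stop]

tokens = preprocess("The collapse of a bridge is reported", {"reported"})
```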
Three) Feature selection and feature weighting
Feature selection and weighting are applied to the pre-processed training-set news texts. Feature selection removes from the feature set those features that do not represent useful information well, improving classification accuracy and reducing computational complexity; weighting uses statistics of the news texts to assign a weight to each feature term.
1) Building the text vector space model
First, the news texts of the training set are converted into a computer-readable format: unstructured text is turned into structured text for computer processing. This embodiment uses the vector space model, i.e. each news document is converted into a vector whose components are feature weights.
Specific steps: let a news document d = {(t1, w1); ...; (tm, wm)}, where tn is the n-th feature term and wn is the n-th weight value. Taking this text as an example, a feature dictionary is built by feature selection; if the dictionary holds N terms, the news text is represented as an N-dimensional vector, and a weighting method computes the value of each dimension, yielding the text vector space model.
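The document-to-vector conversion above can be sketched as follows. The feature dictionary and example tokens are hypothetical, and simple term frequency stands in for the weighting method (which the next steps refine with TF-IDF).

```python
def to_vector(doc_tokens, feature_dict, weight):
    """Represent one news document as an N-dimensional vector, where N is
    the size of the feature dictionary and each component is the weight of
    the corresponding feature term (0 when the term is absent)."""
    return [weight(term, doc_tokens) if term in doc_tokens else 0.0
            for term in feature_dict]

def tf(term, tokens):
    # Simple term-frequency weight, used here for illustration.
    return tokens.count(term) / len(tokens)

feature_dict = ["earthquake", "rescue", "bridge"]   # hypothetical, N = 3
vec = to_vector(["earthquake", "rescue", "earthquake"], feature_dict, tf)
```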
2) Feature selection
Features are extracted at three granularities: unigrams, bigrams, and topics. Taking bigram-granularity extraction as an example: with a Skip-Bigram feature dictionary, the training-set news content is processed by a sliding window whose largest interval is 2 words, forming word-pair fragments of length 2 that are then stored in a HashMap. This produces bigram feature words with clear emotional tendency. For instance, in the sentence "I | love | China", the Skip-Bigram feature dictionary produces the bigram phrases "I/love", "love/China", and "I/China", among which "love/China" is a semantically rich feature word. On a large-scale corpus, many such co-occurrence relations can be obtained after segmentation and stored in the HashMap.
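The skip-bigram window can be sketched as follows; the sketch reproduces the patent's own "I | love | China" example, and a Python dict plays the role of the HashMap.

```python
def skip_bigrams(tokens, max_gap=2):
    """Skip-bigram features: all ordered word pairs whose positions are at
    most max_gap apart, so 'I | love | China' yields I/love, I/China, and
    love/China."""
    feats = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_gap + 1, len(tokens))):
            feats.append(tokens[i] + "/" + tokens[j])
    return feats

features = skip_bigrams(["I", "love", "China"])

counts = {}                # the patent stores features in a HashMap;
for f in features:         # a Python dict plays the same role here
    counts[f] = counts.get(f, 0) + 1
```

Over a large corpus, `counts` accumulates co-occurrence frequencies of each bigram feature.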
After feature selection is constructed, text features are extracted; this embodiment takes the chi-square statistic as an example. By comparing theoretical values with observed values, it determines whether the independence hypothesis holds, and mainly measures the degree of correlation between a word and a document class. Suppose word t and document class c obey a chi-square distribution with one degree of freedom; then the higher the chi-square value of the word for a class, the better the word may represent documents of that class, i.e. the more class-discriminative information it carries. The chi-square formula is:
χ²(t, c) = sum × (A×D − B×C)² / [(A+B) × (C+D) × (A+C) × (B+D)]
where A is the number of documents in class c that contain word t, B the number of documents outside class c that contain t, C the number of documents in class c that do not contain t, D the number of documents outside class c that do not contain t, and sum the total number of documents.
For multi-class problems, the chi-square value of word t can first be computed for each class, and the maximum of these values is then taken as t's chi-square value over the whole corpus.
3) Setting feature weights
A feature weight measures the importance, or discriminating power, of a feature term in the text representation. This embodiment computes weights with TF-IDF, where TF is the term frequency, measuring how well the word describes the document's content, and IDF is the inverse document frequency, measuring how well the word distinguishes between documents.
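A minimal TF-IDF sketch, assuming a toy corpus of pre-segmented documents (the corpus contents and the +1 smoothing in the IDF denominator are illustrative choices, not specified by the patent):

```python
import math

def tfidf(term, doc, corpus):
    """TF: how well the term describes this document's content.
    IDF: how well the term distinguishes this document from others."""
    tf = doc.count(term) / len(doc)            # term frequency in the document
    df = sum(1 for d in corpus if term in d)   # documents containing the term
    idf = math.log(len(corpus) / (1 + df))     # smoothed inverse document frequency
    return tf * idf

corpus = [["quake", "hits"], ["quake", "rescue"], ["market", "rises"]]
w_common = tfidf("quake", corpus[0], corpus)   # appears in 2 of 3 docs
w_rare = tfidf("market", corpus[2], corpus)    # appears in 1 of 3 docs
```

As expected, the rarer term receives the larger weight, since it distinguishes its document better.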
Four) Building the training model
This embodiment takes the SVM training method as an example. Its basic mode is to apply a nonlinear transformation through a kernel function, mapping the nonlinear input feature vectors into a high-dimensional feature space; an optimal linear separating surface is then found in that space to separate the text classes, establishing the training model.
I) Training-set vector model
The feature dimensionality is user-defined; features are extracted by the feature selection method, weights are set at bigram granularity, and the text vector space model is built. However, too high a dimensionality easily makes training slow, overfits the data, and introduces excessive noise, while too low a dimensionality cannot carry enough textual information; both harm classification performance. Training models are therefore set up at different feature dimensionalities, and cross-validation or the classification accuracy on a test set determines the optimal input dimensionality, establishing the training-set vector model.
II) Input normalization
Because the raw ranges of the training-set data may be too large or too small, the raw data are first rescaled into a proper range, i.e. input normalization, which makes training and prediction faster.
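One common rescaling, sketched here under the assumption of per-dimension min-max scaling to [0, 1] (the patent does not specify the exact scheme):

```python
def minmax_normalize(vectors):
    """Rescale every feature dimension to [0, 1] so that dimensions with
    very large or very small raw ranges do not dominate SVM training."""
    dims = len(vectors[0])
    los = [min(v[d] for v in vectors) for d in range(dims)]
    his = [max(v[d] for v in vectors) for d in range(dims)]
    return [[(v[d] - los[d]) / (his[d] - los[d]) if his[d] > los[d] else 0.0
             for d in range(dims)]
            for v in vectors]

# Hypothetical raw feature vectors with very different per-dimension ranges.
scaled = minmax_normalize([[1.0, 200.0], [3.0, 400.0], [2.0, 300.0]])
```

The same minima and maxima computed on the training set must be reused when normalizing texts at prediction time.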
III) Cross-validated parameter optimization
Some important parameters must be set in the SVM, such as the loss parameter (C) and the kernel gamma parameter (G), to guarantee good overall generalization. This embodiment uses a grid search, with user-defined initial values and step sizes for C and G; 5-fold cross-validation evaluates the quality of the training model under different C and G values, avoiding interference from random factors and yielding the optimal C and G for the SVM model. In 5-fold cross-validation, the initial sample is divided into 5 subsamples: one subsample is held out as the data for validating the model, and the other 4 are used for training; the procedure is repeated 5 times so that each subsample is used for validation exactly once, and the 5 results are averaged to give a single estimate.
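The grid search with 5-fold cross-validation can be sketched as follows. The `evaluate` callback is a stub standing in for actual SVM training and validation-set scoring, and the candidate C and G values are hypothetical.

```python
def k_fold_splits(n_samples, k=5):
    """Split sample indices into k folds: each fold in turn is the
    validation set, the other k-1 folds are used for training."""
    idx = list(range(n_samples))
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def grid_search(c_values, g_values, evaluate, n_samples=100):
    """Pick the (C, G) pair with the best mean 5-fold accuracy.
    evaluate(C, G, train, val) is assumed to train an SVM on `train` and
    return its accuracy on `val` (stubbed out in the example below)."""
    best = None
    for C in c_values:
        for G in g_values:
            scores = [evaluate(C, G, tr, va)
                      for tr, va in k_fold_splits(n_samples)]
            mean = sum(scores) / len(scores)       # average of 5 estimates
            if best is None or mean > best[0]:
                best = (mean, C, G)
    return best[1], best[2]

# Stub evaluator that happens to prefer C=1, G=0.1, standing in for
# real SVM training and scoring.
best_c, best_g = grid_search([0.1, 1, 10], [0.01, 0.1],
                             lambda C, G, tr, va: 1.0 - abs(C - 1) - abs(G - 0.1))
```

In practice `evaluate` would train an SVM with the given C and G on the training folds and report validation accuracy; everything else stays as above.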
Five) Prediction output
Web pages crawled in bulk are input-normalized and loaded into the trained vector model; the SVM model predicts the texts to be classified and outputs the predicted class labels.
The basic principles, principal features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principle of the invention. Various changes and improvements to the present invention are possible without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.
Claims (4)
1. An Internet news acquisition and text emotion prediction system, characterized in that news texts crawled from the web serve as the training set; a text classification algorithm is used to build a training model; news texts awaiting publication are classified against the training model and automatically emotion-labeled; the specific steps are:
One) Using news texts crawled from the web as the training set
Web pages are crawled in bulk by a crawler; the news body and vote counts are parsed during crawling; the body is pre-processed and matched against the configured keywords to build a corpus; the body is automatically emotion-labeled according to the vote counts, so as to obtain a corpus that meets the requirements and store it to local disk;
Two) Text pre-processing
The news texts in the training set are pre-processed, including word segmentation and stop-word removal; segmentation is completed through the interfaces of the Chinese Academy of Sciences ICTCLAS2015 and the lucene segmentation systems;
Three) Feature selection and feature weighting
Feature selection and weighting are applied to the pre-processed training-set news texts; feature selection removes from the feature set those features that do not represent useful information well, improving classification accuracy and reducing computational complexity; weighting uses statistics of the news texts to assign a weight to each feature term;
1) Building the text vector space model
First, the news texts of the training set are converted into a computer-readable format: unstructured text is turned into structured text, and each news document is converted into a vector whose components are feature weights; a feature dictionary is built by feature selection; if the dictionary holds N terms, each news text is represented as an N-dimensional vector, and a weighting method computes the value of each dimension, yielding the text vector space model;
2) Feature selection
Features are extracted at three granularities: unigrams, bigrams, and topics; after extraction, the features are stored in a HashMap; when extracting text features, the chi-square statistic measures the degree of correlation between a word and a document class; the higher the chi-square value of a word for a class, the better the word may represent documents of that class, i.e. the more class-discriminative information it carries; for multi-class problems, the chi-square value of a word is first computed for each class, and the maximum of these values is taken as the word's chi-square value over the whole corpus;
3) Setting feature weights
Feature weights are computed with TF-IDF, where TF is the term frequency, measuring how well the word describes the document's content, and IDF is the inverse document frequency, measuring how well the word distinguishes between documents;
Four) Building the training model
With the SVM training method, a kernel function applies a nonlinear transformation to the weighted feature vectors, mapping the nonlinear input feature vectors into a high-dimensional feature space; an optimal linear separating surface is then found in that space to separate the text classes, establishing the training model;
I) Training-set vector model
The feature dimensionality is user-defined; features are extracted by the feature selection method and weights are set per granularity, building the training-set vector model;
II) Input normalization
Because the raw ranges of the training-set vector model data may be too large or too small, the raw data are first rescaled into a proper range, i.e. input normalization;
III) Cross-validated parameter optimization
A grid search is used, with user-defined initial values and step sizes for the loss parameter and the kernel gamma parameter; 5-fold cross-validation evaluates the quality of the training model under different loss and gamma values, yielding the optimal loss and kernel parameters for the SVM model;
Five) Prediction output
Web pages crawled in bulk are input-normalized and loaded into the trained vector model; the SVM model predicts the texts to be classified and outputs the predicted class labels.
2. The Internet news acquisition and text emotion prediction system according to claim 1, characterized in that the training set is constructed as follows:
1. Acquiring social news
Social news websites that carry emotion vote counts are the main crawl targets; the crawler first analyzes the structure of the website and extracts the news-related content, i.e. the news link URLs to be crawled, from the page source code; once the URL of a news item is obtained, a request is sent with HttpClient, the response is received, and HtmlParser parses the response to obtain the content of that news item, such as the title, body, and vote counts; filtering is applied during crawling: if neither the body nor the title of a parsed page contains a keyword or an approximately synonymous word, the item is deemed unrelated to the keywords and discarded;
2. Corpus construction and data storage
MySQL is chosen as the corpus store; tables are created, and the crawled texts related to the configured keywords are stored in the corpus;
i) Create a news table with the fields news_url (news link), news_title (headline), news_content (news body), and news_vote (news vote counts); with the news link as the primary key, the crawled content is stored in the news table;
ii) Create a keyword table with the fields keyword_id (keyword sequence number) and keyword; the user-configured keywords are read and stored in the keyword table with keyword as the primary key;
iii) Create an index table with the fields id (sequence number), keyword, news_title, news_content, and news_vote; news items in the news table that contain a keyword are selected and stored in the index table, indexed by keyword;
3. Automatically labeling the emotion category
The response for each news URL is parsed to obtain the vote counts of every news item; an automatic emotion-category labeling scheme is defined over those counts, and the labeled news texts are finally stored, by category, in the corresponding table of the training set.
3. The Internet news acquisition and text sentiment prediction system according to claim 2, characterized in that the automatic sentiment labeling is implemented as follows:
a) A total-vote threshold N is defined; if the total number of votes for a news item is below N, the item is skipped.
b) A margin threshold M is defined; if the difference between the highest and the second-highest vote counts of a news item is below M, the item does not take part in corpus construction.
c) If the vote counts of a news item exceed both thresholds N and M, the item is labeled with the class that received the most votes, thereby realizing automatic labeling.
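The a)–c) thresholding rules translate directly into code. The concrete values of N and M, and the vote categories, are illustrative; the claim leaves both thresholds user-defined.

```python
def auto_label(votes, N, M):
    """Return the majority sentiment class, or None if the item is skipped.

    votes: dict mapping sentiment class -> vote count.
    N: minimum total votes (rule a); M: minimum margin between the
    top two classes (rule b).
    """
    if sum(votes.values()) < N:            # a) too few votes overall: skip
        return None
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    second = ranked[1][1] if len(ranked) > 1 else 0
    if ranked[0][1] - second < M:          # b) top two classes too close: skip
        return None
    return ranked[0][0]                    # c) label with the majority class

print(auto_label({"happy": 50, "angry": 10, "sad": 5}, N=30, M=20))  # happy
print(auto_label({"happy": 12, "angry": 10}, N=30, M=20))            # None (total < N)
print(auto_label({"happy": 20, "angry": 15}, N=30, M=20))            # None (margin < M)
```

Items that return None are simply excluded from the corpus, so only news with a clear, well-supported majority vote contributes training labels.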
4. The Internet news acquisition and text sentiment prediction system according to claim 1, characterized in that the 5-fold cross-validation proceeds as follows: the initial sample set is divided into 5 subsamples; one subsample is held out as validation data for the model, and the other 4 subsamples are used for training; this is repeated 5 times so that each subsample serves as the validation set exactly once, and the 5 results are averaged to give a single estimate.
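The 5-fold procedure of claim 4 can be sketched as below; the `evaluate` callback is a placeholder for training the SVM on the training folds and scoring it on the held-out fold.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(samples, evaluate, k=5):
    """Hold out each fold once, train on the other k-1, average the k scores."""
    folds = k_fold_indices(len(samples), k)
    scores = []
    for i, held_out in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train, held_out))
    return sum(scores) / k

# Placeholder evaluator: a real one would train the SVM on `train`
# and report its accuracy on `held_out`.
score = cross_validate(list(range(10)), lambda train, held: len(train), k=5)
print(score)  # 8.0: every training split contains 8 of the 10 samples
```

In a real run the samples would be shuffled (or stratified by class) before splitting, so that each fold reflects the overall class distribution.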
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710463295.XA CN107315797A (en) | 2017-06-19 | 2017-06-19 | A kind of Internet news is obtained and text emotion forecasting system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107315797A true CN107315797A (en) | 2017-11-03 |
Family
ID=60181878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710463295.XA Pending CN107315797A (en) | 2017-06-19 | 2017-06-19 | A kind of Internet news is obtained and text emotion forecasting system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315797A (en) |
2017-06-19: application CN201710463295.XA filed, published as CN107315797A (en); status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572613A (en) * | 2013-10-21 | 2015-04-29 | 富士通株式会社 | Data processing device, data processing method and program |
CN104331506A (en) * | 2014-11-20 | 2015-02-04 | 北京理工大学 | Multiclass emotion analyzing method and system facing bilingual microblog text |
CN105183717A (en) * | 2015-09-23 | 2015-12-23 | 东南大学 | OSN user emotion analysis method based on random forest and user relationship |
CN105589941A (en) * | 2015-12-15 | 2016-05-18 | 北京百分点信息科技有限公司 | Emotional information detection method and apparatus for web text |
CN105824922A (en) * | 2016-03-16 | 2016-08-03 | 重庆邮电大学 | Emotion classifying method fusing intrinsic feature and shallow feature |
Non-Patent Citations (2)
Title |
---|
刘泽光 (Liu Zeguang): "Research on Key Technologies of Online Public Opinion Analysis", China Master's Theses Full-text Database, Information Science and Technology Series * |
叶升阳 (Ye Shengyang): "Research on Sentiment Orientation Analysis Based on Web Comments", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885833A (en) * | 2017-11-09 | 2018-04-06 | 山东师范大学 | Method and system for rapidly detecting earth surface coverage change based on Web news text |
CN107885833B (en) * | 2017-11-09 | 2020-05-05 | 山东师范大学 | Method and system for rapidly detecting earth surface coverage change based on Web news text |
CN108153853A (en) * | 2017-12-22 | 2018-06-12 | 齐鲁工业大学 | Chinese Concept Vectors generation method and device based on Wikipedia link structures |
CN108153853B (en) * | 2017-12-22 | 2022-02-01 | 齐鲁工业大学 | Chinese concept vector generation method and device based on Wikipedia link structure |
CN108389082A (en) * | 2018-03-15 | 2018-08-10 | 火烈鸟网络(广州)股份有限公司 | A kind of game intelligence ranking method and system |
CN108389082B (en) * | 2018-03-15 | 2021-07-06 | 火烈鸟网络(广州)股份有限公司 | Intelligent game rating method and system |
CN108363699A (en) * | 2018-03-21 | 2018-08-03 | 浙江大学城市学院 | A kind of netizen's school work mood analysis method based on Baidu's mhkc |
CN108509629A (en) * | 2018-04-09 | 2018-09-07 | 南京大学 | Text emotion analysis method based on emotion dictionary and support vector machine |
CN108509629B (en) * | 2018-04-09 | 2022-05-13 | 南京大学 | Text emotion analysis method based on emotion dictionary and support vector machine |
CN108595704A (en) * | 2018-05-10 | 2018-09-28 | 成都信息工程大学 | A kind of the emotion of news and classifying importance method based on soft disaggregated model |
CN110728139A (en) * | 2018-06-27 | 2020-01-24 | 鼎复数据科技(北京)有限公司 | Key information extraction model and construction method thereof |
CN108829898B (en) * | 2018-06-29 | 2020-11-20 | 无码科技(杭州)有限公司 | HTML content page release time extraction method and system |
CN108829898A (en) * | 2018-06-29 | 2018-11-16 | 无码科技(杭州)有限公司 | HTML content page issuing time extracting method and system |
CN109409537A (en) * | 2018-09-29 | 2019-03-01 | 深圳市元征科技股份有限公司 | A kind of Maintenance Cases classification method and device |
CN109522927A (en) * | 2018-10-09 | 2019-03-26 | 北京奔影网络科技有限公司 | Sentiment analysis method and device for user message |
CN109376244A (en) * | 2018-10-25 | 2019-02-22 | 山东省通信管理局 | A kind of swindle website identification method based on tagsort |
TWI681308B (en) * | 2018-11-01 | 2020-01-01 | 財團法人資訊工業策進會 | Apparatus and method for predicting response of an article |
CN109710825A (en) * | 2018-11-02 | 2019-05-03 | 成都三零凯天通信实业有限公司 | Webpage harmful information identification method based on machine learning |
CN109471942B (en) * | 2018-11-07 | 2021-09-07 | 合肥工业大学 | Chinese comment emotion classification method and device based on evidence reasoning rule |
CN109471942A (en) * | 2018-11-07 | 2019-03-15 | 合肥工业大学 | Chinese comment sensibility classification method and device based on evidential reasoning rule |
CN109657057A (en) * | 2018-11-22 | 2019-04-19 | 天津大学 | A kind of short text sensibility classification method of combination SVM and document vector |
CN109783800B (en) * | 2018-12-13 | 2024-04-12 | 北京百度网讯科技有限公司 | Emotion keyword acquisition method, device, equipment and storage medium |
CN109783800A (en) * | 2018-12-13 | 2019-05-21 | 北京百度网讯科技有限公司 | Acquisition methods, device, equipment and the storage medium of emotion keyword |
WO2020258481A1 (en) * | 2019-06-28 | 2020-12-30 | 平安科技(深圳)有限公司 | Method and apparatus for intelligently recommending personalized text, and computer-readable storage medium |
CN110298403B (en) * | 2019-07-02 | 2023-12-12 | 北京金融大数据有限公司 | Emotion analysis method and system for enterprise main body in financial news |
CN110298403A (en) * | 2019-07-02 | 2019-10-01 | 郭刚 | The sentiment analysis method and system of enterprise dominant in a kind of financial and economic news |
CN112819023A (en) * | 2020-06-11 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Sample set acquisition method and device, computer equipment and storage medium |
CN112819023B (en) * | 2020-06-11 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Sample set acquisition method, device, computer equipment and storage medium |
CN112100372A (en) * | 2020-08-20 | 2020-12-18 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Head news prediction classification method |
CN112131384A (en) * | 2020-08-27 | 2020-12-25 | 科航(苏州)信息科技有限公司 | News classification method and computer-readable storage medium |
CN112201225A (en) * | 2020-09-30 | 2021-01-08 | 北京大米科技有限公司 | Corpus obtaining method and device, readable storage medium and electronic equipment |
CN112201225B (en) * | 2020-09-30 | 2024-02-02 | 北京大米科技有限公司 | Corpus acquisition method and device, readable storage medium and electronic equipment |
CN113127595B (en) * | 2021-04-26 | 2022-08-16 | 数库(上海)科技有限公司 | Method, device, equipment and storage medium for extracting viewpoint details of research and report abstract |
CN113127595A (en) * | 2021-04-26 | 2021-07-16 | 数库(上海)科技有限公司 | Method, device, equipment and storage medium for extracting viewpoint details of research and report abstract |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107315797A (en) | A kind of Internet news is obtained and text emotion forecasting system | |
Gupta et al. | Study of Twitter sentiment analysis using machine learning algorithms on Python | |
Ahuja et al. | The impact of features extraction on the sentiment analysis | |
CN104951548B (en) | A kind of computational methods and system of negative public sentiment index | |
WO2021051518A1 (en) | Text data classification method and apparatus based on neural network model, and storage medium | |
CN107193801A (en) | A kind of short text characteristic optimization and sentiment analysis method based on depth belief network | |
CN110134792B (en) | Text recognition method and device, electronic equipment and storage medium | |
CN109255012B (en) | Method and device for machine reading understanding and candidate data set size reduction | |
CN111767716A (en) | Method and device for determining enterprise multilevel industry information and computer equipment | |
CN110795525A (en) | Text structuring method and device, electronic equipment and computer readable storage medium | |
CN112256861B (en) | Rumor detection method based on search engine return result and electronic device | |
CN109670014A (en) | A kind of Authors of Science Articles name disambiguation method of rule-based matching and machine learning | |
WO2012096388A1 (en) | Unexpectedness determination system, unexpectedness determination method, and program | |
KR20200007713A (en) | Method and Apparatus for determining a topic based on sentiment analysis | |
CN115796181A (en) | Text relation extraction method for chemical field | |
CN108228612B (en) | Method and device for extracting network event keywords and emotional tendency | |
CN115017303A (en) | Method, computing device and medium for enterprise risk assessment based on news text | |
CN110134777A (en) | Problem De-weight method, device, electronic equipment and computer readable storage medium | |
Chun et al. | Detecting Political Bias Trolls in Twitter Data. | |
CN115329085A (en) | Social robot classification method and system | |
Lim et al. | Examining machine learning techniques in business news headline sentiment analysis | |
Asha et al. | Fake news detection using n-gram analysis and machine learning algorithms | |
Al Mostakim et al. | Bangla content categorization using text based supervised learning methods | |
Wambsganss et al. | Improving Explainability and Accuracy through Feature Engineering: A Taxonomy of Features in NLP-based Machine Learning. | |
Anjum et al. | Exploring humor in natural language processing: a comprehensive review of JOKER tasks at CLEF symposium 2023 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171103 |