CN104537097A - Microblog public opinion monitoring system - Google Patents

Info

Publication number
CN104537097A
Authority
CN
China
Prior art keywords
microblogging
public sentiment
module
feature phrase
feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510009995.2A
Other languages
Chinese (zh)
Other versions
CN104537097B (en)
Inventor
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Keyi Culture Communication Co.,Ltd.
Original Assignee
BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Application filed by BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510009995.2A
Publication of CN104537097A
Application granted
Publication of CN104537097B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a microblog public opinion monitoring system comprising a public opinion heat acquisition module, an intelligent crawler module, an extraction and preprocessing module, a feature phrase filtering module, a public opinion analysis module, a sentiment orientation analysis module and a user interaction module. Using a distributed cloud computing mode, the system obtains microblog public opinion hot spots with several public opinion monitoring algorithms and then comprehensively judges, grades and assesses the hot spots obtained, so that microblog public opinion hot topics are monitored efficiently and accurately.

Description

Microblog public opinion monitoring system
Technical field
The present invention relates to the field of internet information processing technology and, in particular, to a microblog public opinion monitoring system.
Background art
With the rapid worldwide development of the internet, online media have come to be recognized as the "fourth medium" after newspapers, radio and television, and the network has become one of the main carriers reflecting public opinion.
Online public opinion consists of the influential and opinionated sentiments, attitudes, opinions, remarks and viewpoints that the public holds on hot-spot and focal issues in real life and spreads over the internet. It is formed and reinforced mainly through posts and follow-up comments on forums (BBS), news comments, blogs and the like. Because the internet is virtual, concealed, diverse, pervasive and random, more and more internet users are willing to express their views and spread their ideas through this channel.
With the rapid development of internet technology, a new generation of media represented by microblogs has broken the control and monopoly of information. People freely express their own attitudes and opinions online and no longer accept information as unconditionally as in the past; instead, the interests of different social strata are voiced one after another, and different ideas and viewpoints collide head-on. For the relevant government departments, understanding microblog public opinion promptly and accurately, strengthening its timely monitoring and guiding it effectively have become a major difficulty in managing microblog public opinion. It is therefore necessary to build a microblog public opinion monitoring system that covers microblog data sources. For the new microblog media environment, such a system can support further study of hot-spot analysis methods for microblog public opinion and of the impact of new media, enriching and improving research on microblog public opinion.
Many organizations have proposed different solutions for monitoring microblog public opinion. The technical problem that those skilled in the art still need to solve, however, is how to improve the efficiency and accuracy of judging microblog public opinion information, because so far there has been no comparatively efficient and accurate public opinion monitoring system for microblog media data.
In the prior art, the data sources of online public opinion are generally websites or forums, and monitoring systems dedicated to microblog public opinion data are comparatively few; even those dedicated systems often suffer from low accuracy or efficiency for various reasons. The present invention proposes a monitoring system dedicated to the public opinion data of microblog network data sources.
Compared with the prior art, the present invention has the following advantages:
First, the microblog public opinion monitoring system of the present invention targets microblog network resources. The collected microblog data passes through public opinion heat acquisition, intelligent crawling, extraction and preprocessing, feature phrase filtering, public opinion analysis and sentiment orientation analysis, which effectively improves the efficiency of filtering public opinion data from microblog data sources;
Second, the distributed cloud computing mode allows large-scale collected data to be mined and analyzed. Microblog public opinion hot spots are obtained by multiple public opinion monitoring algorithm modules and then comprehensively judged and graded, which enables the discovery and tracking of hot topics, social network analysis of microblogs and visual presentation of the analysis results. This gives Party and government organs, large enterprises and other organizations automated, systematic and scientific information support for finding sensitive microblog information in time, grasping public opinion hot spots, following public opinion trends and responding to public opinion crises. It effectively improves the judgment accuracy of the monitoring system and provides a more truthful and accurate basis for the subsequent handling of microblog public opinion information.
Summary of the invention
To address the shortcomings described in the background art above, the present invention proposes a public opinion monitoring system for microblog media that has higher accuracy. The object of the invention is achieved by the following technical measures.
The present invention proposes a microblog public opinion monitoring system comprising: a public opinion heat acquisition module 1, an intelligent crawler module 2, an extraction and preprocessing module 3, a feature phrase filtering module 4, a public opinion analysis module 5, a sentiment orientation analysis module 6 and a user interaction module 7, wherein:
The public opinion heat acquisition module 1 selects, according to the public opinion heat weight of each microblog, the microblog pages that need public opinion analysis;
The intelligent crawler module 2 crawls the microblog data of specified microblog pages within a specified time, analyzes the crawled data against predefined events and filters out microblog data unrelated to the public opinion to be monitored;
The extraction and preprocessing module 3 extracts and preprocesses the information in the microblog data obtained by the intelligent crawler module 2;
The feature phrase filtering module 4 filters and screens the feature phrases in the microblog data processed by the extraction and preprocessing module 3;
The public opinion analysis module 5 finds microblog public opinion hot spots based on the microblog data processed by the feature phrase filtering module 4;
The sentiment orientation analysis module 6 performs sentiment orientation analysis on the hot spots found;
The user interaction module 7 displays the microblog public opinion analysis results as charts or reports and provides the user interaction functions.
Preferably, the public opinion heat acquisition module 1 calculates the public opinion heat weight ρ of a microblog; if ρ is greater than a preset threshold Tρ, the microblog is used as a data source and basis for public opinion analysis. Specifically:
Let the view count of the microblog be K1, the comment count K2, the reply count K3, the number of "support" clicks K4, the number of "oppose" clicks K5, the repost count K6 and the favorite count K7, and let β1 to β4 be preset, adjustable coefficients. Then
$$\rho = \left(\lg\left(K_1^{3/4}\right) + 0.03\right)\beta_1 + \left(\lg\left(K_2^{2/3} + K_3^{2/3}\right) + 0.02\right)\beta_2 + \left(\lg\left(K_4^{1/2} + K_5^{1/2}\right) + 0.01\right)\beta_3 + \left(\lg\left(K_6^{1/3} + K_7^{1/3}\right) + 0.005\right)\beta_4$$
where β1 to β4 may be set to: β1=0.4; β2=0.2; β3=0.1; β4=0.1.
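The heat weight calculation can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the first term is read as lg of K1 raised to the power 3/4, and a logarithm whose argument is zero is treated as contributing 0, which is an assumption the text does not state.

```python
import math

# Default coefficients quoted in the text; they are described as preset and adjustable.
DEFAULT_BETAS = (0.4, 0.2, 0.1, 0.1)


def _lg(value: float) -> float:
    """Base-10 logarithm; a zero argument yields 0.0 (assumption, not from the source)."""
    return math.log10(value) if value > 0 else 0.0


def heat_weight(k1, k2, k3, k4, k5, k6, k7, betas=DEFAULT_BETAS):
    """Public opinion heat weight rho for one microblog post.

    k1 views, k2 comments, k3 replies, k4 support clicks,
    k5 oppose clicks, k6 reposts, k7 favorites.
    """
    b1, b2, b3, b4 = betas
    return (
        (_lg(k1 ** (3 / 4)) + 0.03) * b1
        + (_lg(k2 ** (2 / 3) + k3 ** (2 / 3)) + 0.02) * b2
        + (_lg(k4 ** (1 / 2) + k5 ** (1 / 2)) + 0.01) * b3
        + (_lg(k6 ** (1 / 3) + k7 ** (1 / 3)) + 0.005) * b4
    )


def is_hot(rho: float, threshold: float) -> bool:
    """A post is kept for analysis when rho exceeds the preset threshold."""
    return rho > threshold
```

The threshold passed to is_hot corresponds to Tρ; no value for it is given in the text, so it has to be chosen by the operator.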
Preferably, the intelligent crawler module 2 performs the following steps:
Step 2-1: analyze the microblog page against the events predefined by the system, filter out the links unrelated to the predefined events to be monitored, retain the links related to them, and store the retained links in the URL queue of pages waiting to be fetched;
Step 2-2: according to a predefined search strategy, select from the URL queue the URL of the next page to fetch and repeat step 2-1; stop crawling once the stop condition preset by the system is met.
Preferably, the extraction and preprocessing module 3 performs the following steps:
First, extract the microblog body text that is useful for public opinion analysis, restructure it and gather together the microblog data that is representative of a topic;
Second, perform word segmentation, stop-word filtering, named entity recognition, syntactic parsing, part-of-speech tagging, sentiment recognition and feature word extraction on the microblog data, and then extract feature phrases.
Preferably, the feature phrase filtering module 4 performs the following steps:
Step 4-1: deduplicate the feature phrases: record the feature phrases that occur repeatedly in the microblog text together with their occurrence counts, and filter out the repeated feature phrases whose frequency of occurrence is below the repetition threshold and those whose length is below the repetition threshold;
Step 4-2: group the feature phrases: calculate the similarity value between each feature phrase and the other feature phrases, and place feature phrases whose similarity value is above the similarity threshold in the same group; if the similarity value between a feature phrase and every other feature phrase is 0, filter that feature phrase out. Specifically, either of the following two steps may be used to calculate the similarity value Sims(X, Y) of two feature phrases X and Y before grouping:
Step 4-2-1:
First, let sum(XY) be the number of sentences in which feature phrases X and Y both occur, sum(X) the number of sentences in which X occurs but Y does not, and sum(Y) the number of sentences in which Y occurs but X does not. The similarity value Sims(X, Y) is then calculated as:
$$\mathrm{Sims}(X,Y) = \frac{\log_2(\mathrm{sum}(XY))}{\log_2(\mathrm{sum}(X))} + \frac{\log_2(\mathrm{sum}(XY))}{\log_2(\mathrm{sum}(Y))}$$
Second, if Sims(X, Y) ≤ threshold TD1, feature phrase Y is placed in the group containing feature phrase X;
Step 4-2-2:
First, let m and n be the numbers of characters in feature phrases X and Y, let k be the smaller of m and n, and let Xi and Yi denote the sub-phrases formed by the first i characters of X and Y, where i = 1, 2, ..., k. Define |Xi-Yi| as the number of characters in the longest common string of sub-phrases Xi and Yi. The similarity value Sims(X, Y) is then calculated as:
$$\mathrm{Sims}(X,Y) = \left(|X_1-Y_1|^3 + |X_2-Y_2|^3 + \cdots + |X_k-Y_k|^3\right)^{1/3}$$
Second, if Sims(X, Y) ≤ threshold TD2, feature phrase Y is placed in the group containing feature phrase X;
Step 4-3: apply entropy filtering to the feature phrases: calculate the entropy of each feature phrase and filter out those whose entropy is below a preset lower threshold or above a preset upper threshold.
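The two similarity measures of step 4-2 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, the "longest common string" is taken to be the longest contiguous common substring, and the co-occurrence measure returns 0.0 whenever a logarithm would be undefined or a denominator would be zero.

```python
import math


def sims_cooccurrence(sum_xy: int, sum_x: int, sum_y: int) -> float:
    """Step 4-2-1: log-ratio similarity computed from sentence counts.

    Returns 0.0 when sum_xy < 1 or when sum_x or sum_y is below 2
    (log2 undefined or equal to zero); this fallback is an assumption.
    """
    if sum_xy < 1 or sum_x < 2 or sum_y < 2:
        return 0.0
    log_xy = math.log2(sum_xy)
    return log_xy / math.log2(sum_x) + log_xy / math.log2(sum_y)


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous string shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def sims_prefix(x: str, y: str) -> float:
    """Step 4-2-2: cube root of the summed cubes of |Xi-Yi| for i = 1..k."""
    k = min(len(x), len(y))
    total = sum(
        longest_common_substring_len(x[:i], y[:i]) ** 3 for i in range(1, k + 1)
    )
    return total ** (1 / 3)
```

For example, for X = "abc" and Y = "abd" the prefix terms are 1, 2 and 2, so the sum of cubes is 17 and the similarity is the cube root of 17.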
Preferably, the public opinion analysis module 5 analyzes and finds microblog public opinion hot spots through the following steps:
First, use multiple microblog hot-spot discovery submodules to obtain hot spots by parallel MapReduce distributed computing. The submodules comprise:
1) a Single-Pass hot-spot discovery submodule 5.1, which uses the single-pass algorithm;
2) a KNN hot-spot discovery submodule 5.2, which uses the k-nearest-neighbor (KNN) classification algorithm;
3) an SVM hot-spot discovery submodule 5.3, which uses the support vector machine (SVM) algorithm;
4) a K-means hot-spot discovery submodule 5.4, which uses the K-means clustering algorithm; and
5) an SOM hot-spot discovery submodule 5.5, which uses the self-organizing map (SOM) neural network clustering algorithm;
Second, gather all the hot spots obtained by the individual submodules and grade them as follows:
If a hot spot was obtained by three or more of the above submodules, mark it as a high-level microblog public opinion hot spot;
If a hot spot was obtained by two of the above submodules, mark it as an intermediate microblog public opinion hot spot;
If a hot spot was obtained by only one of the above submodules, mark it as an elementary microblog public opinion hot spot;
Finally, send the high-level, intermediate and elementary hot spots to the sentiment orientation analysis module 6 in that order.
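The grading rule can be sketched in Python as follows. The sketch assumes that hot spots reported by different submodules can be matched by an identical topic key and that a count of exactly three falls in the top grade; the function names and grade labels are hypothetical.

```python
from collections import defaultdict


def grade_hot_spots(results):
    """Grade each hot-spot topic by how many submodules reported it.

    `results` maps a submodule name to the set of topics it found.
    """
    sources = defaultdict(set)
    for submodule, topics in results.items():
        for topic in topics:
            sources[topic].add(submodule)

    grades = {}
    for topic, found_by in sources.items():
        if len(found_by) >= 3:
            grades[topic] = "high"
        elif len(found_by) == 2:
            grades[topic] = "intermediate"
        else:
            grades[topic] = "elementary"
    return grades


def dispatch_order(grades):
    """Topics ordered high, then intermediate, then elementary."""
    rank = {"high": 0, "intermediate": 1, "elementary": 2}
    return sorted(grades, key=lambda topic: (rank[grades[topic]], topic))
```

dispatch_order gives the sequence in which the graded hot spots would be handed to the sentiment orientation analysis step; ties within a grade are broken alphabetically only to make the order deterministic.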
Preferably, the sentiment orientation analysis module 6 performs text sentiment orientation analysis on microblogs through the following steps:
Step 6-1: manually select a number of common Chinese and English adjectives, nouns and verbs that carry sentiment orientation as the initial seed set; in the initial seed set, the number of adjectives may be 50 and the number of nouns and verbs may be 100;
Step 6-2: resolve all pronouns in the microblog text that have a reference relation back to the nouns they refer to, so that objects are not missed or misjudged during analysis;
Step 6-3: taking each microblog sentence as a unit, use part-of-speech (POS) tagging and semantic role labeling (SRL) to analyze the constituents of each sentence and extract the subjective words in it;
Step 6-4: input the subjective words of each sentence in turn and automatically label their sentiment orientation according to the seed set; for a subjective word that cannot be labeled automatically, judge its sentiment orientation manually and then add the word to the seed set.
Preferably, the user interaction module 7 provides the user interaction functions, and the charts or reports it can produce include: a microblog public opinion heat ranking report, a microblog public opinion early-warning information distribution report, a microblog public opinion geographic distribution report, a microblog public opinion sentiment analysis report, a microblog public opinion statistics report and a microblog public opinion trend analysis chart.
Brief description of the drawings
The technical solution of the present invention is further described below with reference to the accompanying drawing. The drawing serves only to illustrate a preferred embodiment and is not to be construed as limiting the invention.
Fig. 1 shows the functional structure of a microblog public opinion monitoring system according to an embodiment of the present invention.
Detailed description of the embodiments
From the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The description is only an overview of the technical solution of the present invention, given so that the technical means of the invention can be better understood and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the invention become more apparent.
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawing. Although the drawing shows exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
The present invention claims a microblog public opinion monitoring system comprising: a public opinion heat acquisition module, an intelligent crawler module, an extraction and preprocessing module, a feature phrase filtering module, a public opinion analysis module, a sentiment orientation analysis module and a user interaction module. The public opinion analysis module uses the distributed cloud computing mode and multiple public opinion monitoring algorithm submodules to obtain microblog public opinion hot spots, and comprehensively judges, grades and assesses the hot spots obtained, so that hot topics are monitored more efficiently and accurately.
Fig. 1 is a functional structure diagram of a microblog public opinion monitoring system according to an embodiment of the present invention.
As shown in Fig. 1, the microblog public opinion monitoring system comprises seven modules: a public opinion heat acquisition module 1, an intelligent crawler module 2, an extraction and preprocessing module 3, a feature phrase filtering module 4, a public opinion analysis module 5, a sentiment orientation analysis module 6 and a user interaction module 7, wherein:
The public opinion heat acquisition module 1 selects, according to the public opinion heat weight of each microblog, the microblog pages that need public opinion analysis;
The intelligent crawler module 2 crawls the microblog data of specified microblog pages within a specified time, analyzes the crawled data against predefined events and filters out microblog data unrelated to the public opinion to be monitored;
The extraction and preprocessing module 3 extracts and preprocesses the information in the microblog data obtained by the intelligent crawler module 2;
The feature phrase filtering module 4 filters and screens the feature phrases in the microblog data processed by the extraction and preprocessing module 3;
The public opinion analysis module 5 finds microblog public opinion hot spots based on the microblog data processed by the feature phrase filtering module 4;
The sentiment orientation analysis module 6 performs sentiment orientation analysis on the hot spots found;
The user interaction module 7 displays the microblog public opinion analysis results as charts or reports and provides the user interaction functions.
Specifically, the public opinion heat acquisition module 1 calculates the public opinion heat weight ρ of a microblog; if ρ is greater than a preset threshold Tρ, the microblog is used as a data source and basis for public opinion analysis. In detail:
Let the view count of the microblog be K1, the comment count K2, the reply count K3, the number of "support" clicks K4, the number of "oppose" clicks K5, the repost count K6 and the favorite count K7, and let β1 to β4 be preset, adjustable coefficients. Then
$$\rho = \left(\lg\left(K_1^{3/4}\right) + 0.03\right)\beta_1 + \left(\lg\left(K_2^{2/3} + K_3^{2/3}\right) + 0.02\right)\beta_2 + \left(\lg\left(K_4^{1/2} + K_5^{1/2}\right) + 0.01\right)\beta_3 + \left(\lg\left(K_6^{1/3} + K_7^{1/3}\right) + 0.005\right)\beta_4$$
Preferably, the coefficients β1 to β4 may be set to: β1=0.4; β2=0.2; β3=0.1; β4=0.1.
Specifically, the intelligent crawler module 2 performs the following steps:
Step 2-1: analyze the microblog page against the events predefined by the system, filter out the links unrelated to the predefined events to be monitored, retain the links related to them, and store the retained links in the URL queue of pages waiting to be fetched;
Step 2-2: according to a predefined search strategy, select from the URL queue the URL of the next page to fetch and repeat step 2-1; stop crawling once the stop condition preset by the system is met.
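Steps 2-1 and 2-2 can be sketched as a small focused crawler in Python. Everything concrete here is an assumption made for illustration: fetch_links and is_relevant are hypothetical callbacks standing in for page fetching and for the predefined-event analysis, the search strategy is taken to be breadth-first, and the stop condition is a page limit.

```python
from collections import deque


def focused_crawl(seed_urls, fetch_links, is_relevant, max_pages=100):
    """Breadth-first focused crawl over pages related to a monitored event.

    fetch_links(url) -> iterable of (link_url, link_text) pairs (hypothetical).
    is_relevant(link_text) -> bool, the event-relevance check (hypothetical).
    max_pages is an assumed stop condition.
    """
    queue = deque(seed_urls)      # URL queue of pages waiting to be fetched
    seen = set(seed_urls)
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()     # search strategy: first in, first out
        crawled.append(url)
        for link_url, link_text in fetch_links(url):
            # Step 2-1: drop unrelated links, queue the related ones once.
            if link_url not in seen and is_relevant(link_text):
                seen.add(link_url)
                queue.append(link_url)
    return crawled
```

A different search strategy, such as best-first ordering by a relevance score, would only change how the next URL is taken from the queue.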
Specifically, the extraction and preprocessing module 3 performs the following steps:
First, extract the microblog body text that is useful for public opinion analysis, restructure it and gather together the microblog data that is representative of a topic;
Second, perform word segmentation, stop-word filtering, named entity recognition, syntactic parsing, part-of-speech tagging, sentiment recognition and feature word extraction on the microblog data, and then extract feature phrases.
Specifically, the feature phrase filtering module 4 performs the following steps:
Step 4-1: deduplicate the feature phrases: record the feature phrases that occur repeatedly in the microblog text together with their occurrence counts, and filter out the repeated feature phrases whose frequency of occurrence is below the repetition threshold and those whose length is below the repetition threshold;
Step 4-2: group the feature phrases: calculate the similarity value between each feature phrase and the other feature phrases, and place feature phrases whose similarity value is above the similarity threshold in the same group; if the similarity value between a feature phrase and every other feature phrase is 0, filter that feature phrase out. Specifically, either of the following two steps may be used to calculate the similarity value Sims(X, Y) of two feature phrases X and Y before grouping:
Step 4-2-1:
First, let sum(XY) be the number of sentences in which feature phrases X and Y both occur, sum(X) the number of sentences in which X occurs but Y does not, and sum(Y) the number of sentences in which Y occurs but X does not. The similarity value Sims(X, Y) is then calculated as:
$$\mathrm{Sims}(X,Y) = \frac{\log_2(\mathrm{sum}(XY))}{\log_2(\mathrm{sum}(X))} + \frac{\log_2(\mathrm{sum}(XY))}{\log_2(\mathrm{sum}(Y))}$$
Second, if Sims(X, Y) ≤ threshold TD1, feature phrase Y is placed in the group containing feature phrase X;
Step 4-2-2:
First, let m and n be the numbers of characters in feature phrases X and Y, let k be the smaller of m and n, and let Xi and Yi denote the sub-phrases formed by the first i characters of X and Y, where i = 1, 2, ..., k. Define |Xi-Yi| as the number of characters in the longest common string of sub-phrases Xi and Yi. The similarity value Sims(X, Y) is then calculated as:
$$\mathrm{Sims}(X,Y) = \left(|X_1-Y_1|^3 + |X_2-Y_2|^3 + \cdots + |X_k-Y_k|^3\right)^{1/3}$$
Second, if Sims(X, Y) ≤ threshold TD2, feature phrase Y is placed in the group containing feature phrase X;
Step 4-3: apply entropy filtering to the feature phrases: calculate the entropy of each feature phrase and filter out those whose entropy is below a preset lower threshold or above a preset upper threshold.
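Steps 4-1 and 4-3 can be sketched in Python as follows. The text does not say how the entropy of a feature phrase is computed, so the sketch assumes Shannon entropy over the counts of the tokens that appear next to the phrase; it also assumes separate thresholds for frequency and length. All names and default values are hypothetical.

```python
import math
from collections import Counter


def filter_repeated(phrases, min_count=2, min_length=2):
    """Step 4-1: count each phrase once, then drop rare or very short ones.

    min_count and min_length are assumed, separately tunable thresholds.
    """
    counts = Counter(phrases)
    return {
        phrase: count
        for phrase, count in counts.items()
        if count >= min_count and len(phrase) >= min_length
    }


def shannon_entropy(context_counts):
    """Shannon entropy in bits of a phrase's neighbouring-token counts."""
    total = sum(context_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in context_counts.values()
        if c > 0
    )


def entropy_filter(contexts, lower, upper):
    """Step 4-3: keep phrases whose entropy lies within [lower, upper].

    `contexts` maps each phrase to a Counter of its neighbouring tokens.
    """
    return [
        phrase
        for phrase, ctx in contexts.items()
        if lower <= shannon_entropy(ctx) <= upper
    ]
```

Under this reading, a phrase that always appears in the same context has entropy 0 and is removed by the lower threshold, while a phrase that appears in nearly arbitrary contexts is removed by the upper threshold.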
Specifically, the public opinion analysis module 5 analyzes and finds microblog public opinion hot spots. Its working principle is as follows:
The present invention uses the distributed cloud computing mode to mine and analyze the large-scale collected microblog data. Microblog public opinion hot spots are obtained by multiple public opinion monitoring algorithm modules and then comprehensively judged and graded, which enables the discovery and tracking of hot topics, social network analysis of microblogs and visual presentation of the analysis results. This gives Party and government organs, large enterprises and other organizations automated, systematic and scientific information support for finding sensitive microblog information in time, grasping public opinion hot spots, following public opinion trends and responding to public opinion crises. It effectively improves the judgment accuracy of the monitoring system and provides a more truthful and accurate basis for the subsequent handling of microblog public opinion information. Specifically:
The collected microblog data and the analysis results are stored in a distributed storage layer implemented on HDFS;
At the distributed computing layer, the MapReduce parallel computing method is used to parallelize the computation;
Through optimized HDFS file storage and transmission and optimized MapReduce parallel computation, the system optimizes the monitoring of massive microblog public opinion data, achieves stable and efficient storage of big data, optimizes query processing of massive public opinion data, and offers good scalability, reliability and security. Being based on a cloud platform, the system responds quickly and supports analysis and mining services for massive microblog data.
The public opinion analysis module 5 analyzes and finds microblog public opinion hot spots through the following steps:
First, use multiple microblog hot-spot discovery submodules to obtain hot spots by parallel distributed computing. The submodules comprise:
1) a Single-Pass hot-spot discovery submodule 5.1, which uses a MapReduce-based single-pass algorithm;
2) a KNN hot-spot discovery submodule 5.2, which uses a MapReduce-based k-nearest-neighbor (KNN) classification algorithm;
3) an SVM hot-spot discovery submodule 5.3, which uses a MapReduce-based support vector machine (SVM) algorithm;
4) a K-means hot-spot discovery submodule 5.4, which uses a MapReduce-based K-means clustering algorithm; and
5) an SOM hot-spot discovery submodule 5.5, which uses a MapReduce-based self-organizing map (SOM) neural network clustering algorithm;
Second, gather all the hot spots obtained by the individual submodules and grade them as follows:
If a hot spot was obtained by three or more of the above submodules, mark it as a high-level microblog public opinion hot spot;
If a hot spot was obtained by two of the above submodules, mark it as an intermediate microblog public opinion hot spot;
If a hot spot was obtained by only one of the above submodules, mark it as an elementary microblog public opinion hot spot;
Finally, send the high-level, intermediate and elementary hot spots to the sentiment orientation analysis module 6 in that order.
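As an illustration of the simplest of these submodules, a single-pass clustering step can be sketched in Python. This is an in-memory sketch only and does not reproduce the MapReduce parallelization; the use of term-count vectors, cosine similarity and a threshold of 0.5 are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def single_pass(docs, threshold=0.5):
    """Assign each tokenized post to its most similar cluster, else open a new one.

    Returns a list of clusters, each a list of indices into `docs`.
    """
    centroids, clusters = [], []
    for idx, tokens in enumerate(docs):
        vec = Counter(tokens)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(idx)
            centroids[best].update(vec)   # centroid kept as summed term counts
        else:
            centroids.append(vec)
            clusters.append([idx])
    return clusters
```

Each resulting cluster is a candidate topic; ranking clusters by size is one possible way to pick hot spots, although the text does not prescribe a ranking.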
The algorithms used by hot-spot discovery submodules 5.1 to 5.5 are all general-purpose algorithms known in the art, so the improvement of the present invention does not lie in these algorithms themselves. Existing microblog public opinion monitoring systems typically use only one of these hot-spot discovery algorithms; no system has been found that uses several of them at the same time and grades the combined results of the algorithms. Moreover, although the system of the present invention uses multiple hot-spot discovery algorithms, it adopts a distributed architecture based on cloud computing and therefore does not incur unbearable overhead, while the combination of several methods greatly improves the accuracy of the monitoring system and achieves a good technical effect.
Specifically, the sentiment orientation analysis module 6 performs sentiment orientation analysis on microblog text and comprises the following steps:
Step 6-1: manually select a number of common Chinese and English adjectives, nouns and verbs that carry a sentiment orientation as the initial seed set; preferably, in the initial seed set the number of adjectives may be 50 and the number of nouns and verbs may be 100;
Step 6-2: replace every pronoun in the microblog text that has a referential relation with the noun it originally refers to, so that objects are not missed or misjudged during analysis;
Step 6-3: taking each sentence of the microblog as a unit, use part-of-speech (POS) tagging and semantic role labeling (SRL) to analyze the constituents of each sentence and extract the subjective words in each sentence;
Step 6-4: input the subjective words of each sentence in turn and automatically label their sentiment orientation according to the seed set; for a subjective word that cannot be labeled automatically, its sentiment orientation is judged manually and the word is then added to the seed set.
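As a non-limiting illustration, step 6-4 can be sketched in Python as a lookup against the seed set with a manual fallback. The seed set is modeled here as a dictionary from word to orientation, and `ask_human` is a hypothetical callback standing in for the manual judgment; neither name comes from the specification.

```python
def label_subjective_words(words, seed_set, ask_human):
    """Label each subjective word using the seed set, growing it as needed.

    `seed_set` maps a word to its sentiment orientation. Words that are not
    in the seed set are passed to `ask_human` (hypothetical callback) and
    the answer is stored, so the same word is never asked about twice.
    """
    labels = {}
    for word in words:
        if word not in seed_set:
            # Cannot be labeled automatically: judge manually, then
            # supplement the seed set with the new word.
            seed_set[word] = ask_human(word)
        labels[word] = seed_set[word]
    return labels
```

Because the seed set is updated in place, later microblogs benefit from every manual judgment made on earlier ones.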
Specifically, the charts or reports that the user interaction module 7 can produce for the user include: a microblog public opinion popularity ranking report, a microblog public opinion early-warning information distribution report, a microblog public opinion geographic information distribution report, a microblog public opinion sentiment analysis report, a microblog public opinion statistics report and a microblog public opinion trend analysis chart.
The embodiments of the system and its constituent modules described in this specification are merely illustrative; some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment of the present invention. A person of ordinary skill in the art can understand and implement them without creative effort.
In summary, the above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (8)

1. A microblog public opinion monitoring system, comprising: a public opinion heat acquisition module (1), an intelligent crawler module (2), an extraction and preprocessing module (3), a feature phrase filtering module (4), a public opinion analysis module (5), a sentiment orientation analysis module (6) and a user interaction module (7), wherein
The public opinion heat acquisition module (1) is configured to select, according to the public opinion heat weight of a microblog, the microblog pages that require public opinion analysis;
The intelligent crawler module (2) is configured to crawl the microblog data of specified microblog pages within a specified time, analyze the crawled microblog data against predefined events, and filter out microblog data irrelevant to the public opinion to be monitored;
The extraction and preprocessing module (3) is configured to extract and preprocess the information in the microblog data obtained by the intelligent crawler module (2);
The feature phrase filtering module (4) is configured to filter the feature phrases in the microblog data processed by the extraction and preprocessing module (3);
The public opinion analysis module (5) is configured to discover microblog public opinion hotspots based on the microblog data processed by the feature phrase filtering module (4);
The sentiment orientation analysis module (6) is configured to perform sentiment orientation analysis on the discovered microblog public opinion hotspots;
The user interaction module (7) is configured to display and output the microblog public opinion analysis results in the form of charts or reports, thereby providing user interaction.
2. The microblog public opinion monitoring system according to claim 1, characterized in that:
The public opinion heat acquisition module (1) calculates the public opinion heat weight ρ of the microblog; if ρ is greater than a preset threshold T_ρ, the microblog is used as a data source and analysis basis for public opinion analysis. Specifically:
Let the number of views of the microblog be K1, the number of comments K2, the number of replies K3, the number of support clicks K4, the number of oppose clicks K5, the number of forwards K6 and the number of favorites K7, and let β1 to β4 be preset, adjustable coefficients; then
$$\rho = \left(\lg(K_1)^{3/4} + 0.03\right)\beta_1 + \left(\lg\left(K_2^{2/3} + K_3^{2/3}\right) + 0.02\right)\beta_2 + \left(\lg\left(K_4^{1/2} + K_5^{1/2}\right) + 0.01\right)\beta_3 + \left(\lg\left(K_6^{1/3} + K_7^{1/3}\right) + 0.005\right)\beta_4;$$
Wherein β1 to β4 may be set to: β1 = 0.4; β2 = 0.2; β3 = 0.1; β4 = 0.1.
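As a non-limiting illustration, the heat weight of claim 2 can be evaluated in Python as sketched below. Several points are assumptions rather than statements of the claim: lg is taken as the base-10 logarithm, the first term is read as (lg K1) raised to the power 3/4, all counts are assumed to be at least 1 so that every logarithm is defined, and the function names `heat_weight` and `needs_analysis` are hypothetical.

```python
import math

def heat_weight(k1, k2, k3, k4, k5, k6, k7, betas=(0.4, 0.2, 0.1, 0.1)):
    """Public opinion heat weight rho of one microblog.

    k1 views, k2 comments, k3 replies, k4 support clicks, k5 oppose clicks,
    k6 forwards, k7 favorites. Assumes every count is >= 1.
    """
    b1, b2, b3, b4 = betas
    lg = math.log10  # assumption: lg denotes the base-10 logarithm
    views = lg(k1) ** 0.75 + 0.03  # assumption: exponent applies to lg(K1)
    discussion = lg(k2 ** (2 / 3) + k3 ** (2 / 3)) + 0.02
    votes = lg(k4 ** 0.5 + k5 ** 0.5) + 0.01
    spread = lg(k6 ** (1 / 3) + k7 ** (1 / 3)) + 0.005
    return views * b1 + discussion * b2 + votes * b3 + spread * b4

def needs_analysis(rho, threshold):
    """True when rho exceeds the preset threshold T_rho."""
    return rho > threshold
```

The default `betas` are the example coefficients given in the claim; the threshold passed to `needs_analysis` is left to the caller because the claim does not give a value for it.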
3. The microblog public opinion monitoring system according to claim 2, characterized in that:
The intelligent crawler module (2) performs the following steps:
Step 2-1: analyze the microblog page against the events predefined by the system, filter out the links irrelevant to the predefined events to be monitored, retain the links relevant to the predefined events, and store the retained links in the URL queue of pages waiting to be fetched;
Step 2-2: according to a predefined search strategy, select from the URL queue the URL of the next page to fetch and repeat step 2-1; the crawl stops once a stop condition preset by the system is met.
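As a non-limiting illustration, steps 2-1 and 2-2 can be sketched in Python as a focused crawl loop. The sketch makes several assumptions that the claim leaves open: the search strategy is breadth-first, the stop condition is a maximum number of fetched pages, link relevance is decided from the link's anchor text, and `fetch` and `is_relevant` are hypothetical callables supplied by the caller.

```python
from collections import deque

def focused_crawl(seed_urls, fetch, is_relevant, max_pages=100):
    """Crawl pages, following only links judged relevant.

    `fetch(url)` returns `(page_text, links)`, where `links` is a list of
    `(href, anchor_text)` pairs. `is_relevant(anchor_text)` decides whether
    a link relates to the predefined events. Both are hypothetical.
    """
    queue = deque(seed_urls)      # URL queue of pages waiting to be fetched
    seen = set(seed_urls)
    pages = []
    while queue and len(pages) < max_pages:   # assumed stop condition
        url = queue.popleft()                 # assumed breadth-first strategy
        text, links = fetch(url)
        pages.append((url, text))
        for href, anchor in links:
            # Step 2-1: drop irrelevant links, queue the relevant ones once.
            if is_relevant(anchor) and href not in seen:
                seen.add(href)
                queue.append(href)
    return pages
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP client or microblog platform.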
4. The microblog public opinion monitoring system according to claim 3, characterized in that:
The extraction and preprocessing module (3) performs the following steps:
First, extract the microblog body text that is useful for microblog public opinion analysis, reconstruct the body text, and group together the microblog data that is representative of a topic;
Second, perform word segmentation, stop-word filtering, named entity recognition, syntactic parsing, part-of-speech tagging, sentiment recognition and feature word extraction on the microblog data, and then perform feature phrase extraction.
5. The microblog public opinion monitoring system according to claim 4, characterized in that:
The feature phrase filtering module (4) performs the following steps:
Step 4-1: deduplicate the feature phrases, comprising: recording the repeated feature phrases that occur in the microblog text and the number of times each occurs, and filtering out repeated feature phrases whose frequency of occurrence is below the repetition threshold and repeated feature phrases whose length is below the repetition threshold;
Step 4-2: group the feature phrases, comprising: calculating the similarity value between each feature phrase and the other feature phrases, and placing feature phrases whose similarity value is higher than a similarity threshold in the same group; if the similarity value between a feature phrase and every other feature phrase is 0, that feature phrase is filtered out. Specifically, either of the following two steps may be selected to calculate the similarity value Sims(X, Y) of two feature phrases X and Y, after which the feature phrases are grouped:
Step 4-2-1:
First, let sum(XY) be the number of sentences in which feature phrases X and Y both occur, sum(X) the number of sentences in which feature phrase X occurs but feature phrase Y does not, and sum(Y) the number of sentences in which feature phrase Y occurs but feature phrase X does not. The similarity value Sims(X, Y) of feature phrases X and Y is then calculated as follows:
$$\mathrm{Sims}(X,Y) = \frac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(X)} + \frac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(Y)};$$
Second, if Sims(X, Y) ≤ threshold TD1, feature phrase Y is placed in the group containing feature phrase X;
Step 4-2-2:
First, let m and n be the numbers of characters in feature phrases X and Y respectively, let k be the smaller of m and n, and let Xi and Yi denote the sub-phrases formed by the first i characters of feature phrases X and Y, where i = 1, 2, ..., k. Define |Xi-Yi| as the number of characters in the longest common character string of sub-phrases Xi and Yi. The similarity value Sims(X, Y) of feature phrases X and Y is then calculated as follows:
$$\mathrm{Sims}(X,Y) = \left(|X_1 - Y_1|^3 + |X_2 - Y_2|^3 + \dots + |X_k - Y_k|^3\right)^{1/3}$$
Second, if Sims(X, Y) ≤ threshold TD2, feature phrase Y is placed in the group containing feature phrase X;
Step 4-3: apply entropy filtering to the feature phrases, comprising: calculating the entropy of each feature phrase, and filtering out feature phrases whose entropy is below a preset lower threshold and feature phrases whose entropy is above a preset upper threshold.
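As a non-limiting illustration, the two similarity formulas of step 4-2 can be sketched in Python. The sketch rests on stated assumptions: sentences and phrases are plain strings and occurrence is tested by substring containment; the co-occurrence formula returns 0 whenever a logarithm would be undefined or a denominator would be zero; and the "longest common character string" is read as the longest contiguous common substring. All function names are hypothetical.

```python
import math

def cooccurrence_similarity(sentences, x, y):
    """Sims(X, Y) of step 4-2-1 from sentence co-occurrence counts."""
    both = sum(1 for s in sentences if x in s and y in s)
    only_x = sum(1 for s in sentences if x in s and y not in s)
    only_y = sum(1 for s in sentences if y in s and x not in s)
    # Assumption: return 0 when log2 is undefined or a denominator is 0.
    if both < 1 or only_x < 2 or only_y < 2:
        return 0.0
    return (math.log2(both) / math.log2(only_x)
            + math.log2(both) / math.log2(only_y))

def longest_common_substring_len(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def prefix_similarity(x, y):
    """Sims(X, Y) of step 4-2-2 over growing prefixes of X and Y."""
    k = min(len(x), len(y))
    total = sum(longest_common_substring_len(x[:i], y[:i]) ** 3
                for i in range(1, k + 1))
    return total ** (1 / 3)
```

The comparison against TD1 or TD2 is left to the caller, since the claim does not give values for either threshold.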
6. The microblog public opinion monitoring system according to claim 5, characterized in that:
The public opinion analysis module (5) is configured to analyze and discover microblog public opinion hotspots, comprising the following steps:
First, use several microblog hotspot discovery sub-modules to obtain microblog public opinion hotspots through parallel MapReduce distributed computing, the microblog hotspot discovery sub-modules comprising:
1) a Single-Pass microblog hotspot discovery sub-module (5.1), which uses the single-pass algorithm;
2) a KNN microblog hotspot discovery sub-module (5.2), which uses the KNN nearest-neighbor classification algorithm;
3) an SVM microblog hotspot discovery sub-module (5.3), which uses the support vector machine (SVM) algorithm;
4) a K-means microblog hotspot discovery sub-module (5.4), which uses the K-means clustering algorithm; and
5) an SOM microblog hotspot discovery sub-module (5.5), which uses the self-organizing map (SOM) neural network clustering algorithm;
Second, aggregate all the microblog public opinion hotspots obtained by the individual microblog hotspot discovery sub-modules and grade them as follows:
If a microblog public opinion hotspot is reported by three or more of the above hotspot discovery sub-modules, it is labeled a senior microblog public opinion hotspot;
If a microblog public opinion hotspot is reported by two of the above hotspot discovery sub-modules, it is labeled an intermediate microblog public opinion hotspot;
If a microblog public opinion hotspot is reported by only one of the above hotspot discovery sub-modules, it is labeled a primary microblog public opinion hotspot;
Finally, the senior, intermediate and primary microblog public opinion hotspots are sent, in that order, to the sentiment orientation analysis module (6).
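Claim 6 names the single-pass algorithm without describing it. As a non-limiting illustration of the generic technique, and not of the claimed sub-module (5.1) itself, a minimal Python sketch is given below. It assumes that each microblog has already been segmented into a list of tokens, uses bag-of-words vectors with cosine similarity, and uses a hypothetical similarity threshold of 0.5; none of these choices comes from the claims.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(docs, threshold=0.5):
    """Cluster tokenized documents in one pass over the input.

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster. Returns a list
    of clusters, each a list of document indices.
    """
    clusters = []
    for idx, tokens in enumerate(docs):
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Summed counts point in the same direction as the mean vector,
            # so cosine similarity against them is unaffected.
            best["centroid"].update(vec)
            best["members"].append(idx)
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return [c["members"] for c in clusters]
```

Clusters that grow large within a time window could then be treated as candidate hotspots; how the sub-module actually selects hotspots is not specified in the claims.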
7. The microblog public opinion monitoring system according to claim 6, characterized in that:
The sentiment orientation analysis module (6) is configured to perform sentiment orientation analysis on microblog text, comprising the following steps:
Step 6-1: manually select a number of common Chinese and English adjectives, nouns and verbs that carry a sentiment orientation as the initial seed set; wherein, in the initial seed set, the number of adjectives may be 50 and the number of nouns and verbs may be 100;
Step 6-2: replace every pronoun in the microblog text that has a referential relation with the noun it originally refers to, so that objects are not missed or misjudged during analysis;
Step 6-3: taking each sentence of the microblog as a unit, use part-of-speech (POS) tagging and semantic role labeling (SRL) to analyze the constituents of each sentence and extract the subjective words in each sentence;
Step 6-4: input the subjective words of each sentence in turn and automatically label their sentiment orientation according to the seed set; for a subjective word that cannot be labeled automatically, its sentiment orientation is judged manually and the word is then added to the seed set.
8. The microblog public opinion monitoring system according to claim 7, characterized in that:
The user interaction module (7) is configured to provide user interaction, and the charts or reports it can produce include: a microblog public opinion popularity ranking report, a microblog public opinion early-warning information distribution report, a microblog public opinion geographic information distribution report, a microblog public opinion sentiment analysis report, a microblog public opinion statistics report and a microblog public opinion trend analysis chart.
CN201510009995.2A 2015-01-09 2015-01-09 Microblogging public sentiment monitoring system Active CN104537097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510009995.2A CN104537097B (en) 2015-01-09 2015-01-09 Microblogging public sentiment monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510009995.2A CN104537097B (en) 2015-01-09 2015-01-09 Microblogging public sentiment monitoring system

Publications (2)

Publication Number Publication Date
CN104537097A true CN104537097A (en) 2015-04-22
CN104537097B CN104537097B (en) 2017-08-11

Family

ID=52852625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510009995.2A Active CN104537097B (en) 2015-01-09 2015-01-09 Microblogging public sentiment monitoring system

Country Status (1)

Country Link
CN (1) CN104537097B (en)

Also Published As

Publication number Publication date
CN104537097B (en) 2017-08-11

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211230

Address after: 201100 room e3004, third floor, building 1, No. 1755, Hongmei South Road, Minhang District, Shanghai

Patentee after: Shanghai Keyi Culture Communication Co.,Ltd.

Address before: 610000 No. 1, No. 3 Shen Xian Nan Road, Chengdu high tech Zone, Sichuan, China.

Patentee before: CHENGDU BLTSAFE INFORMATION TECHNOLOGY Co.,Ltd.