CN109902237A - System for analyzing and predicting netizen interest in forum - Google Patents
- Publication number
- CN109902237A
- Authority
- CN
- China
- Prior art keywords
- netizen
- interest
- analysis
- association
- analyzing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A system for analyzing and predicting netizen interest in forums, comprising: a data storage layer for storing structured data and unstructured data; an intelligent content analysis layer for performing subject classification, hot topic extraction and tracking, and tendency analysis on the data in the data storage layer; an association analysis layer that, based on the subject classification and the hot topics, performs netizen-content association and then netizen-netizen association; and an interest analysis layer that analyzes and predicts netizen interest based on the netizen-content association, the netizen-netizen association, and the tendency analysis. The system effectively meets the need for in-depth mining in forum netizen interest analysis and is suitable for implementing Internet public opinion analysis systems.
Description
Technical field
The present invention relates to techniques for analyzing online society, and in particular to a system for analyzing and predicting netizen interest in forums. It belongs to the field of data mining technology.
Background technique
With the development of networked information, a large number of virtual communities have emerged, forming an online society, of which the network forum is one of the main forms. Traditional society has long had an effective system for managing people and groups, but online society is a new phenomenon: it is characterized not only by free speech online but also by netizen anonymity, which makes supervision harder. Network public opinion has become an aspect that cannot be ignored, and compared with other network applications, forums better reflect how people gather online and therefore better reflect the state of network public opinion. Analyzing netizens, the main driving force of public opinion on websites, is therefore of great significance. By analyzing the interests of forum netizens, the main trend of network public opinion over a given period can be grasped accurately.
Although forum-based netizen interest analysis has good prospects for development and application, and some related systems have appeared, current systems in this field still have a series of problems, mainly the following:
1. They perform only a simple association between netizens and the articles they publish, and lack systematic analysis over a time span of the issues, hot topics, and content categories that netizens take part in, so the analysis of an individual netizen lacks depth.
2. Netizens' activity on the network is often group-based, and current systems and methods often ignore this. Network public opinion is essentially formed under the drive of network groups, and an individual netizen can hardly become a force alone; therefore, network groups need to be analyzed in depth.
3. Current systems and methods analyze instantaneous, local data. However, a netizen's interests are not independent: they are often tied to the wider network environment and to how the network develops. Current systems and methods lack a netizen model knowledge base for analyzing and predicting netizen interest as a whole.
It can be seen that netizen interest analysis is very important in network forums and places deep demands on data mining, while existing systems are deficient in netizen-content association, netizen-netizen association, and the netizen model knowledge base, and cannot meet the deeper requirements of netizen interest analysis.
Summary of the invention
The main purpose of the present invention is to address the defects of existing forum-based systems for netizen interest analysis in online society by proposing a data-mining-based system for analyzing and predicting netizen interest in forums, built on netizen-content association, netizen-netizen association, and a netizen model knowledge base. Through the hot topics, issues, content classification, and tendency analysis associated with netizens, relationship analysis between netizens, and the long-term accumulation of the netizen model knowledge base, it mines the origin and development of netizen interest in depth and makes predictions, achieving deep analysis of forum netizen interest.
The data-mining-based system for analyzing and predicting netizen interest in forums of the present invention, built on netizen-content association, netizen-netizen association, and a netizen model knowledge base, consists of a data storage layer, an intelligent content analysis layer, an association analysis layer, and an interest analysis layer.
The data storage layer is responsible for storing structured data and unstructured data; data storage and indexing in the local system are both done in this layer. Structured data, such as netizen ID and time, is stored in a general-purpose commercial database, here Oracle. Unstructured data is mainly text content; if it were stored in a general-purpose commercial database, indexing performance would drop sharply as the data volume grows, so we place it in an independently developed, dedicated unstructured data repository. Because the structured and unstructured data of each article are stored in different databases and are of different types, they need to be associated uniformly; we use the unique ID of the structured data in the general-purpose commercial database as the basis for this association.
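The ID-based association between the two stores can be sketched as follows. This is a minimal illustration, not the patented implementation: an in-memory SQLite database stands in for the commercial database, a plain dictionary stands in for the dedicated unstructured repository, and the table, column, and function names are all assumptions made for the example.

```python
import sqlite3

# Stand-in for the commercial relational database (the source names Oracle).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, netizen_id TEXT, posted_at TEXT)"
)

# Stand-in for the dedicated unstructured repository: article ID -> text.
text_store = {}


def save_article(netizen_id, posted_at, text):
    """Store structured fields in the database and the text under the same ID."""
    cur = db.execute(
        "INSERT INTO article (netizen_id, posted_at) VALUES (?, ?)",
        (netizen_id, posted_at),
    )
    article_id = cur.lastrowid  # unique ID assigned by the relational store
    text_store[article_id] = text
    return article_id


def load_article(article_id):
    """Rejoin both stores on the shared article ID; None if the ID is unknown."""
    row = db.execute(
        "SELECT netizen_id, posted_at FROM article WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    return {
        "id": article_id,
        "netizen_id": row[0],
        "posted_at": row[1],
        "text": text_store.get(article_id),
    }
```

The point of the design is that only the relational store issues IDs, so the text repository never needs its own key scheme.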
The intelligent content analysis layer works on the unstructured data. Using data mining methods, mainly text classification, text clustering, and text summarization, it performs intelligent analysis of text content and implements functions such as subject classification, hot topic extraction and tracking, and tendency analysis.
Text classification combines manual and automated work to assign articles to predefined subject categories. There are many classification methods; we use the support vector machine (SVM), which is based on word statistics. Its workflow is as follows. Step 1: manually select a set of articles as the training set. Step 2: perform Chinese word segmentation on the training set, filter stop words, extract feature words, and convert every article in the training set into a feature-word vector. Step 3: call the classification trainer to train on the training-set vectors and obtain a classifier. Step 4: input the text to be classified, extract features according to the training-set feature words to form a feature vector, and classify it with the classifier.
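The four steps can be sketched in a few lines. The source does not specify the SVM kernel, trainer, or segmenter, so everything here is an assumption for illustration: whitespace splitting stands in for Chinese word segmentation, the stop-word list is hypothetical, the SVM is a linear one trained by subgradient descent on the hinge loss, and only two categories (labels +1 and -1) are handled.

```python
STOP_WORDS = {"the", "a", "is"}  # hypothetical stop-word list


def tokenize(text):
    # Assumption: whitespace splitting stands in for Chinese word segmentation.
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocab(docs):
    """Step 2: collect feature words from the training set."""
    vocab = {}
    for doc in docs:
        for w in tokenize(doc):
            vocab.setdefault(w, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Turn a text into a term-count vector over the training-set feature words."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Step 3: linear SVM via subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj - lr * reg * wj for wj in w]  # L2 regularization shrink
            if margin < 1:  # sample violates the margin: step toward it
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(text, vocab, w, b):
    """Step 4: classify new text with the trained weights."""
    x = vectorize(text, vocab)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

A production system would more likely use an established SVM library and one-vs-rest training for multiple subject categories; the sketch only shows how the workflow steps fit together.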
Hot topic extraction and tracking combine text clustering and classification: hot topics are extracted by text clustering and tracked by text classification. The workflow is as follows. Step 1: perform Chinese word segmentation and feature extraction on the text data of a specified time period to form vectors. Step 2: cluster the vectors automatically; there are many clustering algorithms, and we use a hierarchical clustering algorithm. Step 3: take the resulting clusters as new hot topics; if a topic needs to be tracked, use the articles in the new hot topic as the training set for text classification and train a classifier. Step 4: use the resulting classifier to classify newly input articles into a hot topic, thereby tracking the hot topic.
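The clustering step can be illustrated with a small agglomerative (bottom-up hierarchical) routine. The source only says "hierarchical clustering", so the choices below are assumptions: term-count vectors, cosine similarity, average linkage, and a similarity threshold (`threshold`, a hypothetical parameter) that stops merging.

```python
import math
from collections import Counter


def to_vector(text):
    # Assumption: whitespace splitting stands in for Chinese word segmentation.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold=0.3):
    """Merge the two most similar clusters until none exceeds the threshold."""
    vectors = [to_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per doc

    def linkage(c1, c2):
        # Average linkage: mean pairwise similarity between the two clusters.
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each returned cluster is a list of document indices and could be treated as a candidate hot topic; its documents could then serve as training data for a tracking classifier, as in Steps 3 and 4.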
Tendency analysis combines manual and automated work. First, we build a semantic lexicon of common words, in which each word is assigned a tendency weight. Second, given input text, the lexicon is used to weight the words in the text semantically to obtain the tendency of the text. Finally, manual intervention is used to adjust the analysis results.
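A lexicon-weighted score with a manual override could look like the sketch below. The lexicon entries, the averaging rule, the `threshold` value, and the `overrides` mechanism are all illustrative assumptions; the source does not say how weights are combined or how manual corrections are stored.

```python
# Hypothetical semantic lexicon: word -> tendency weight in [-1, 1].
LEXICON = {"support": 1.0, "great": 0.8, "oppose": -1.0, "terrible": -0.8}


def tendency_score(text, lexicon=LEXICON):
    """Average the weights of the lexicon words found in the text."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def tendency_label(text, overrides=None, threshold=0.2):
    """Label a text, letting a manual override take precedence over the score."""
    if overrides and text in overrides:
        return overrides[text]  # manual intervention
    score = tendency_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, `tendency_label("we support this great plan")` averages the weights of "support" and "great", while a reviewer could pass `overrides` to correct a text the lexicon misjudges, such as sarcasm.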
The association analysis layer, based on the subject classification and the hot topics, performs netizen-content association and then netizen-netizen association. Netizen-content association does not mean associating a netizen with the articles he or she has published; rather, it uses the output of the intelligent content analysis layer to associate the netizen with the current subject categories, hot topics, and speech tendency. This shows which subject categories and hot topics the netizen is interested in during the period, and what attitude the netizen holds. The method mainly uses probability statistics to analyze the netizen's attention in every direction and thereby determine points of interest.
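One simple reading of the probability statistics mentioned above is a per-netizen relative frequency over categories. The sketch below assumes each post has already been labeled with a category by the content analysis layer; the input shape and function name are hypothetical.

```python
from collections import Counter, defaultdict


def interest_distribution(posts):
    """posts: iterable of (netizen_id, category) pairs.

    Returns {netizen_id: {category: share of that netizen's posts}}.
    """
    counts = defaultdict(Counter)
    for netizen, category in posts:
        counts[netizen][category] += 1
    return {
        netizen: {cat: n / sum(cnt.values()) for cat, n in cnt.items()}
        for netizen, cnt in counts.items()
    }
```

The same counting could be repeated per hot topic or per tendency label to cover the other association dimensions the text lists.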
Netizen-netizen association combines the structured data, the result data of the intelligent content analysis layer, and the result data of the netizen-content association, and uses data association methods to derive the structure of an online society, comprising network communities, network groups, and network cliques. Based on the forum's structured data, including website, board, netizen, and time, we analyze which netizens are frequently active in a given category of a given board of a given website within a certain period, and define them as a network community. Within a network community, the netizens who frequently take part in the same class of sensitive topics are defined as a network group. Within a network group, the netizens who frequently take part in the same issue, that is, the same root post and its replies, are defined as a network clique.
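The three tiers nest by progressively finer grouping keys, which the sketch below makes explicit. The record fields (`site`, `board`, `category`, `topic`, `thread`) and the `min_posts` cutoff used to approximate "frequently" are assumptions for illustration; the source gives no numeric criterion.

```python
from collections import defaultdict

# Grouping keys for the three tiers, from coarsest to finest.
COMMUNITY = ("site", "board", "category")
GROUP = COMMUNITY + ("topic",)
CLIQUE = GROUP + ("thread",)  # thread = a root post and its replies


def group_netizens(records, key_fields, min_posts=2):
    """Return {key: set of netizens with at least min_posts posts under that key}."""
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        counts[key][rec["netizen"]] += 1
    result = {}
    for key, users in counts.items():
        members = {n for n, c in users.items() if c >= min_posts}
        if members:
            result[key] = members
    return result
```

Calling `group_netizens(records, COMMUNITY)`, then with `GROUP`, then with `CLIQUE` yields the three tiers over the same post records.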
The interest analysis layer analyzes and predicts netizen interest based on the netizen-content association, the netizen-netizen association, and the tendency analysis. It comprises: a netizen model knowledge base module, which generalizes and summarizes past interest analyses of individual netizens and netizen groups into empirical models and supplies them as machine learning knowledge for subsequent analysis; a netizen interest analysis module, which analyzes the interests of individual netizens and the points of interest of netizen groups according to the netizen model knowledge base module; and a netizen interest development prediction module, which predicts the future development of the interests of individual netizens and netizen groups according to the netizen model knowledge base module.
The netizen model knowledge base module generalizes and summarizes past interest analyses of netizens and groups into empirical models, which serve as machine learning knowledge for subsequent analysis. The knowledge base records the statistical probability distributions of the interests of netizens and groups, and how they develop and change over a period of time.
The netizen interest analysis module analyzes not only the interests of individual netizens but also the points of interest of network groups. It mainly takes the analysis results of the netizen-content association module and the netizen-netizen association module, combines them with the netizen model knowledge base, takes into account the past interest experience of netizens and groups, and determines the netizen's current interest distribution.
The netizen interest development prediction module takes the current hot spots discussed by netizens and groups, obtains past development models from the netizen model knowledge base, and, after comparing them, makes an appropriate prediction of how the interests of netizens and groups will develop. We use a Markov model: from the probability distribution of points of interest at each point in time and the probability distribution of the current points of interest, it predicts, to a certain extent, the development of future points of interest.
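A first-order Markov chain is one way to realize this prediction. The sketch below assumes the knowledge base can supply, for a netizen or group, a chronological list of the dominant point of interest in each past period; that input format, the function names, and the "stay in place" rule for unseen states are all assumptions, since the source does not describe how the model is estimated.

```python
from collections import Counter, defaultdict


def fit_transitions(history):
    """Estimate P(next interest | current interest) from a chronological list."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    return {
        state: {t: c / sum(row.values()) for t, c in row.items()}
        for state, row in counts.items()
    }


def predict_next(current, transitions):
    """Propagate the current interest distribution one period forward."""
    nxt = defaultdict(float)
    for state, p in current.items():
        # Assumption: an interest never seen as a source state is kept as-is.
        row = transitions.get(state, {state: 1.0})
        for target, q in row.items():
            nxt[target] += p * q
    return dict(nxt)
```

Applying `predict_next` repeatedly gives a multi-period forecast, with the usual caveat that a first-order chain ignores everything except the most recent distribution.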
The present invention has substantive features and represents marked progress: (1) it performs interest analysis on netizens by mining the netizen-content association in depth; (2) by analyzing and mining network groups, it identifies the role a netizen plays on the network and the effect he or she has, and thereby uncovers the netizen's motivation; (3) using a netizen model knowledge base, it accumulates models built from a large amount of netizen-related information and reapplies them to current data analysis, which helps to analyze netizen interest as a whole and make appropriate predictions.
The data-mining-based system for analyzing and predicting netizen interest in forums proposed by the present invention, built on netizen-content association, netizen-netizen association, and a netizen model knowledge base, makes full use of network content information, netizen information, and historical data, effectively meets the need for in-depth mining in forum-based netizen interest analysis, and is suitable for implementing network public opinion analysis systems.
Brief description of the drawings
Figure 1 is the system architecture diagram of an embodiment of the system for analyzing and predicting netizen interest in forums.
Specific embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawing.
The drawing shows the system architecture of an embodiment of the system for analyzing and predicting netizen interest in forums. As shown, the whole architecture is divided into four layers. The first layer is the data storage layer, responsible for storing and indexing structured and unstructured data. The second layer is the intelligent content analysis layer, which uses data mining methods to perform text classification, hot topic extraction and tracking, and tendency analysis on article content. The third layer is the association analysis layer, comprising a netizen-content association module and a netizen-netizen association module, where the results of the netizen-content association module are the basis for the analysis in the netizen-netizen association module. The fourth and topmost layer is the interest analysis layer, comprising the netizen model knowledge base module, the netizen interest analysis module, and the netizen interest development prediction module; the netizen interest analysis module calls the netizen model knowledge base module, and these two modules are in turn the basis of the netizen interest development prediction module.
In the intelligent content analysis layer, text data is first input to the module. The content analysis module calls the Chinese word segmentation function to segment the Chinese text and then performs feature selection, which involves two main tasks: first removing stop words, then computing TF-IDF values and selecting features. Feature selection differs between text classification and text clustering: text classification performs feature selection directly on the training documents, whereas text clustering treats every test document as a separate category when performing feature selection, so two feature selection results are obtained. After feature selection, processing splits into two parts, text classification and text clustering. In the text classification part, the classification training function is called first, and training yields a classifier; text classification is then performed; finally, tendency analysis is performed on the classification results to obtain the speech tendency of each category. In the text clustering part, the text clustering function is called first and clusters are formed automatically; the clusters are then extracted to form new hot topics, which are tracked; finally, tendency analysis is performed on the hot topics to obtain the speech tendency of each hot topic.
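The stop-word removal and TF-IDF step can be sketched as follows. The source does not give its TF-IDF variant or selection rule, so the choices here are assumptions: term frequency normalized by document length, IDF as the natural log of documents over document frequency, and feature selection by keeping the `k` words with the highest weight in any document. Whitespace splitting again stands in for Chinese word segmentation, and the stop-word list is hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "a"}  # hypothetical stop-word list


def tfidf(docs):
    """Return one {word: tf-idf weight} dict per document."""
    tokenized = [
        [w for w in d.lower().split() if w not in STOP_WORDS] for d in docs
    ]
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append(
            {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights


def top_features(docs, k):
    """Select the k words with the highest tf-idf weight in any document."""
    best = Counter()
    for doc_weights in tfidf(docs):
        for w, v in doc_weights.items():
            best[w] = max(best[w], v)
    return [w for w, _ in best.most_common(k)]
```

Words that occur in every document get weight zero under this IDF, which is what makes them poor features for separating categories or clusters.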
The association analysis layer contains both the netizen-content association module and the netizen-netizen association module. The first, the netizen-content association module, has three parts: (1) association analysis between text classification results and the netizens of each website board; (2) association analysis between hot topic analysis results and the netizens of each website board; and (3) association analysis between issues within the same topic and the netizens of each website board. The second, the netizen-netizen association module, also has three parts, corresponding to the three above: (1) netizens in the same category on the same board of the same website are grouped into a network community; (2) netizens on the same topic on the same board of the same website are grouped into a network group; (3) netizens on the same issue within the same topic on the same board of the same website are grouped into a network clique.
In the interest analysis layer, the network communities, network groups, network cliques, and individual netizens obtained above are combined with the tendency analysis results, and statistical analysis yields the points of interest of netizens and network groups. On this basis, combined with the netizen model knowledge base, predictions are made about the interest development of netizens and network groups respectively, covering network community interest analysis and development prediction, network group interest analysis and development prediction, network clique interest analysis and development prediction, and individual netizen interest analysis and development prediction.
As the above implementation shows, the data-mining-based system for analyzing and predicting netizen interest in forums of the present invention, built on netizen-content association, netizen-netizen association, and a netizen model knowledge base, effectively achieves in-depth mining for forum netizen interest analysis and provides reliable information for analyzing netizens and groups in network public opinion analysis.
Claims (7)
1. A system for analyzing and predicting netizen interest in forums, characterized by comprising:
a data storage layer for storing structured data and unstructured data;
an intelligent content analysis layer for performing subject classification, hot topic extraction and tracking, and tendency analysis on the data in the data storage layer;
an association analysis layer that, based on the subject classification and the hot topics, performs netizen-content association and then netizen-netizen association; and
an interest analysis layer that analyzes and predicts netizen interest based on the netizen-content association, the netizen-netizen association, and the tendency analysis.
2. The system for analyzing and predicting netizen interest in forums according to claim 1, characterized in that the interest analysis layer comprises:
a netizen model knowledge base module for generalizing and summarizing past interest analyses of individual netizens and netizen groups into empirical models, which serve as machine learning knowledge for subsequent analysis;
a netizen interest analysis module for analyzing the interests of individual netizens and the points of interest of netizen groups according to the netizen model knowledge base module; and
a netizen interest development prediction module for predicting the future interest development of individual netizens and netizen groups according to the netizen model knowledge base module.
3. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the netizen-content association includes association analysis between text classification results and netizens, association analysis between hot topic analysis results and website board netizens, and association analysis between issues within the same topic and netizens.
4. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the netizen-netizen association includes associating netizens in the same category on the same board of the same website, associating netizens on the same topic on the same board of the same website, and associating netizens on the same issue within the same topic on the same board of the same website.
5. The system for analyzing and predicting netizen interest in forums according to claim 3, characterized in that the netizen-netizen association includes associating netizens in the same category on the same board of the same website, associating netizens on the same topic on the same board of the same website, and associating netizens on the same issue within the same topic on the same board of the same website.
6. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the data storage layer builds indexes for the structured data and the unstructured data.
7. The system for analyzing and predicting netizen interest in forums according to claim 2, characterized in that the netizen interest analysis module uses a Markov model which, from the probability distribution of points of interest at each point in time and the probability distribution of the current points of interest, predicts the development of future points of interest.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910133585.7A CN109902237A (en) | 2019-02-22 | 2019-02-22 | System for analyzing and predicting netizen interest in forum |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902237A (en) | 2019-06-18 |
Family
ID=66945406
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910133585.7A Withdrawn CN109902237A (en) | 2019-02-22 | 2019-02-22 | System for analyzing and predicting netizen interest in forum |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902237A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101556582A (en) * | 2008-04-09 | 2009-10-14 | 上海复旦光华信息科技股份有限公司 | System for analyzing and predicting netizen interest in forum |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190618 |