CN109902237A - System for analyzing and predicting netizen interest in forum - Google Patents

Info

Publication number
CN109902237A
Authority
CN
China
Prior art keywords
netizen
interest
analysis
association
analyzing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910133585.7A
Other languages
Chinese (zh)
Inventor
赵乔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Hua Bi Mdt Infotech Ltd
Original Assignee
Suzhou Hua Bi Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Suzhou Hua Bi Mdt Infotech Ltd
Priority to CN201910133585.7A
Publication of CN109902237A
Legal status: Withdrawn

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system for analyzing and predicting netizen interest in forums, characterized by comprising: a data storage layer for storing structured data and unstructured data; an intelligent content analysis layer for performing subject classification, hot topic extraction and tracking, and proneness analysis on the data in the data storage layer; an association analysis layer that, according to the subject classification and the hot topics, performs netizen-content association and then netizen-netizen association; and an interest analysis layer that performs netizen interest analysis and prediction according to the netizen-content association, the netizen-netizen association, and the proneness analysis. The system effectively meets the demand for in-depth mining in forum netizen interest analysis and is suitable for implementing Internet public opinion analysis systems.

Description

System for analyzing and predicting netizen interest in forum
Technical field
The present invention relates to a technique for analyzing the network society, and in particular to a system for analyzing and predicting netizen interest in forums; it belongs to the field of data mining technology.
Background art
With the development of networked information, a large number of virtual communities have emerged and formed a network society, of which the network forum is one of the principal forms. Traditional society has long had an effective system for managing individuals and groups, but the network society is a new phenomenon: it is characterized not only by free speech online but also by the anonymity of netizens, which makes supervision more difficult. Network public opinion has now become an aspect that cannot be ignored. Compared with other network applications, the network forum better reflects how people gather online and therefore better reflects the state of network public opinion. Analyzing netizens, the main driving force of public opinion on websites, is therefore of great significance. By analyzing netizen interest in forums, the main trend of the network public opinion situation over a given period can be grasped accurately.
Although forum-based netizen interest analysis has good prospects for development and application, and some related systems have appeared, the systems in this field still have a series of problems, mainly the following:
1. They perform only a simple association analysis between netizens and the articles they publish, and lack a systematic analysis over a time span of the issues, hot topics, and content categories in which netizens participate, so the analysis of an individual netizen lacks depth.
2. Netizens' activity on the network often has a group character, which current systems and methods tend to ignore. Network public opinion is essentially formed under the drive of network groups, and an individual netizen can hardly become a force alone. Therefore, an in-depth analysis of network groups is necessary.
3. Current systems and methods analyze only immediate, local data. However, netizen interest is not independent; it is often tied to the wider network environment and the course of network development. Current systems and methods lack a netizen models repository for analyzing and predicting netizen interest as a whole.
It can be seen that netizen interest analysis is very important in network forums and places deep demands on data mining. Existing systems are deficient in netizen-content association, netizen-netizen association, and the netizen models repository, and cannot meet the in-depth requirements of netizen interest analysis.
Summary of the invention
The purpose of the present invention is mainly to address the defects of existing forum-based systems for netizen interest analysis by proposing a data-mining-based system for analyzing and predicting netizen interest in forums, built on netizen-content association, netizen-netizen association, and a netizen models repository. Through the association of netizens with hot topics, issues, and content categories, proneness analysis, the analysis of relationships between netizens, and the long-term accumulation of the netizen models repository, it mines the origin and development of netizen interest in depth and makes predictions, realizing in-depth analysis of forum netizen interest.
The data-mining-based system for analyzing and predicting netizen interest in forums of the present invention, built on netizen-content association, netizen-netizen association, and the netizen models repository, consists of a data storage layer, an intelligent content analysis layer, an association analysis layer, and an interest analysis layer.
The data storage layer is responsible for storing structured data and unstructured data; the storage and indexing of data in the local system are completed in this layer. Structured data, such as netizen ID and time, is stored in a general-purpose commercial database, here Oracle. Unstructured data is mainly text content; if it were stored in a general-purpose commercial database, indexing performance would drop sharply as the data volume grows, so we place it in a dedicated, independently developed unstructured data repository. Because the structured and unstructured data of each article are stored in different databases and are of different types, they must be linked consistently; we use the unique ID of the structured data in the commercial database as the basis for the link.
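The linking of the two stores through a shared unique ID can be illustrated with a minimal Python sketch. The class names, field names, and the use of in-memory dictionaries are assumptions made for illustration only; the source names Oracle and a dedicated unstructured repository but gives no schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PostRecord:
    """Structured fields of one forum post (hypothetical schema)."""
    post_id: int
    netizen_id: str
    posted_at: str  # ISO 8601 timestamp kept as text for simplicity


@dataclass
class ForumStore:
    """Toy stand-in for the two stores; a plain dict plays each role."""
    structured: Dict[int, PostRecord] = field(default_factory=dict)
    unstructured: Dict[int, str] = field(default_factory=dict)

    def save(self, record: PostRecord, text: str) -> None:
        # The same unique ID keys both stores, so they can be joined later.
        self.structured[record.post_id] = record
        self.unstructured[record.post_id] = text

    def load(self, post_id: int) -> Optional[Tuple[PostRecord, str]]:
        record = self.structured.get(post_id)
        text = self.unstructured.get(post_id)
        if record is None or text is None:
            return None
        return record, text


store = ForumStore()
store.save(PostRecord(1, "user_a", "2019-02-22T10:00:00"), "First post body")
```

Calling `store.load(1)` returns the structured record together with its text, while an unknown ID returns `None`.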
The intelligent content analysis layer works on unstructured data. Using data mining methods, mainly text classification, text clustering, and text summarization, it performs intelligent analysis of text content and implements functions such as subject classification, hot topic extraction and tracking, and proneness analysis.
Text classification combines manual and automated work to classify articles into predefined topics. There are many classification methods; we use the support vector machine (SVM), which is built on word statistics. Its workflow is as follows. First, a set of articles is selected manually as the training set. Second, Chinese word segmentation is applied to the training set, stop words are filtered out, feature words are extracted, and every article in the training set is converted into a feature-word vector. Third, the classification trainer is called to train on the training-set vectors, yielding a classifier. Fourth, the text to be classified is input, features are extracted using the feature words of the training set to form a feature vector, and the classifier classifies it.
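The four steps can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for Chinese word segmentation, the stop-word list and sample documents are invented, and the classifier is a small linear SVM trained by sub-gradient descent on the hinge loss rather than any particular SVM library.

```python
from typing import Dict, List, Sequence, Tuple

STOP_WORDS = {"the", "a", "is", "of"}  # illustrative stop-word list


def tokenize(text: str) -> List[str]:
    # Whitespace split stands in for a real Chinese word segmenter.
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocab(docs: Sequence[str]) -> Dict[str, int]:
    # Every non-stop word of the training set becomes a feature word.
    vocab: Dict[str, int] = {}
    for doc in docs:
        for word in tokenize(doc):
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int]) -> List[float]:
    # Term-count vector over the training-set feature words.
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def train_linear_svm(xs: List[List[float]], ys: List[int], epochs: int = 50,
                     lr: float = 0.1, lam: float = 0.01) -> Tuple[List[float], float]:
    """Sub-gradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward this sample's side.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(text: str, vocab: Dict[str, int], w: List[float], b: float) -> int:
    x = vectorize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


train_docs = ["football match goal", "team wins football cup",
              "stock market falls", "bank raises interest rate"]
train_labels = [1, 1, -1, -1]  # +1 = sports, -1 = finance
vocab = build_vocab(train_docs)
w, b = train_linear_svm([vectorize(d, vocab) for d in train_docs], train_labels)
```

With this toy data, `predict("football goal", vocab, w, b)` falls on the sports side and `predict("stock market bank", vocab, w, b)` on the finance side. A multi-topic system would need one such classifier per topic or a multi-class scheme, which the source does not specify.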
Hot topic extraction and tracking combine text clustering and text classification: in practice, hot topics are extracted by text clustering and tracked by text classification. The workflow is as follows. First, Chinese word segmentation and feature extraction are applied to the text data of a specified time period to form vectors. Second, the vectors are clustered automatically; there are many clustering algorithms, and we use a hierarchical one. Third, each resulting cluster is taken as a new hot topic; if a topic needs to be tracked, the articles in the new hot topic are used as the training set for text classification, and training yields a classifier. Fourth, the classifier is applied to newly input articles, assigning them to a hot topic and thereby tracking it.
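The clustering step can be sketched as a small agglomerative (bottom-up hierarchical) procedure in Python. The source says only that a hierarchy-based algorithm is used, so the single-linkage rule, the cosine similarity measure, the threshold value, and the sample posts below are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def to_vector(text: str) -> Counter:
    # Whitespace tokens stand in for segmented Chinese words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def agglomerative(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single-linkage clustering: merge the closest pair until similarity < threshold."""
    vectors = [to_vector(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster similarity is that of the closest members.
                sim = max(cosine(vectors[p], vectors[q])
                          for p in clusters[i] for q in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


posts = ["housing price rise", "housing price policy",
         "football final tonight", "football final score"]
topics = agglomerative(posts)
```

Here the two housing posts and the two football posts end up in separate clusters, each of which would be treated as a candidate hot topic. Tracking would then train a classifier on each cluster's articles, as the workflow describes.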
Proneness analysis combines manual and automated work. First, we build a semantic lexicon of common words, in which each word is assigned a proneness weight. Second, text content is input and its words are weighted using the semantic lexicon to obtain the proneness of the text. Finally, manual intervention is used to adjust the analysis results.
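A minimal Python sketch of lexicon-weighted proneness scoring with a manual override is given below. The lexicon entries, the averaging rule, the neutral margin, and the override table are hypothetical; the source does not state how word weights are combined.

```python
from typing import Dict

# Hypothetical lexicon: positive weights lean favourable, negative lean unfavourable.
LEXICON: Dict[str, float] = {"support": 1.0, "great": 0.8,
                             "oppose": -1.0, "terrible": -0.9}


def tendency_score(text: str, lexicon: Dict[str, float] = LEXICON) -> float:
    """Average lexicon weight over the words of the text that appear in the lexicon."""
    weights = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(weights) / len(weights) if weights else 0.0


def tendency_label(score: float, margin: float = 0.2) -> str:
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


def final_label(post_id: int, text: str, overrides: Dict[int, str]) -> str:
    # A human reviewer's decision, when present, takes precedence.
    return overrides.get(post_id, tendency_label(tendency_score(text)))
```

For example, `final_label(7, "I support this", {7: "negative"})` returns the reviewer's label, whereas without an override the lexicon score decides.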
The association analysis layer performs netizen-content association and then netizen-netizen association according to the subject classification and the hot topics. Netizen-content association does not mean the association between a netizen and the articles he or she publishes; instead, it uses the output of the intelligent content analysis layer to associate the netizen with the current subject classifications, hot topics, and speech proneness. This shows which subject classifications and hot topics the netizen is interested in during the period and what attitude the netizen holds. The method mainly uses probability statistics to analyze the netizen's attention across all these aspects and thereby determine the netizen's interest points.
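The probability-statistics step can be illustrated by counting how often each netizen takes part in each category and normalizing the counts. This Python sketch assumes that participation events arrive as (netizen, category) pairs; the function name and sample data are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def interest_distribution(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each netizen to the relative frequency of the categories they took part in."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for netizen, category in events:
        counts[netizen][category] += 1
    return {n: {c: k / sum(cs.values()) for c, k in cs.items()}
            for n, cs in counts.items()}


events = [("user_a", "sports"), ("user_a", "sports"), ("user_a", "finance"),
          ("user_b", "finance")]
dist = interest_distribution(events)
# The most probable category serves as a simple estimate of the interest point.
top = max(dist["user_a"], key=dist["user_a"].get)
```

The same counting could be applied to hot topics or proneness labels instead of categories by changing what the second element of each event represents.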
Netizen-netizen association combines the structured data, the result data of the intelligent content analysis layer, and the results of the netizen-content association analysis. Using data association methods, it derives a network social structure consisting of web communities, network colonies, and network cliques. From the forum's structured data, including site, board, netizen, and time, we identify the netizens who, within a given period, are frequently active in a given category of a given board of a given site, and define them as a web community. Within a web community, the netizens who frequently take part in the same kind of sensitive topic at the same time are defined as a network colony. Within a network colony, the netizens who frequently take part in the same issue, that is, who follow up and reply under the same root post, are defined as a network clique.
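The three nested groupings can be sketched in Python by bucketing netizens on progressively longer keys. The `Post` fields, the `min_size` cut-off, and the sample data are assumptions; in particular, the source requires that members participate "frequently", which this simplified sketch reduces to appearing at least once.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Post(NamedTuple):
    site: str
    board: str
    category: str
    topic: str
    thread: str   # identifier of the root post
    netizen: str


def group_by(posts: List[Post], fields: Tuple[str, ...],
             min_size: int = 2) -> Dict[tuple, Set[str]]:
    """Collect the netizens sharing the same values for `fields`; drop small sets."""
    buckets: Dict[tuple, Set[str]] = defaultdict(set)
    for post in posts:
        buckets[tuple(getattr(post, f) for f in fields)].add(post.netizen)
    return {key: members for key, members in buckets.items()
            if len(members) >= min_size}


posts = [
    Post("site1", "news", "economy", "housing", "t1", "user_a"),
    Post("site1", "news", "economy", "housing", "t1", "user_b"),
    Post("site1", "news", "economy", "housing", "t2", "user_c"),
    Post("site1", "news", "economy", "tax", "t3", "user_d"),
]
communities = group_by(posts, ("site", "board", "category"))
colonies = group_by(posts, ("site", "board", "category", "topic"))
cliques = group_by(posts, ("site", "board", "category", "topic", "thread"))
```

In this example all four users form one web community, three of them form a colony around the housing topic, and only the two who posted under the same root post form a clique.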
The interest analysis layer performs netizen interest analysis and prediction according to the netizen-content association, the netizen-netizen association, and the proneness analysis. It comprises: a netizen models repository module, which generalizes and summarizes past interest analyses of individual netizens and netizen groups into empirical models and supplies them as machine learning knowledge for subsequent analysis; a netizen interest analysis module, which analyzes the interests of individual netizens and the interest points of netizen groups according to the netizen models repository module; and a netizen interest development prediction module, which predicts the future interest development of individual netizens and netizen groups according to the netizen models repository module.
The netizen models repository module generalizes and summarizes past interest analyses of netizens and groups into empirical models, which serve as machine learning knowledge for subsequent analysis. The repository records the probability distributions of the interests of netizens and groups, together with how they develop and change over a period of time.
The netizen interest analysis module analyzes not only the interests of individual netizens but also the interest points of network groups. Its main method is to take the results of the netizen-content association module and the netizen-netizen association module, combine them with the netizen models repository, weigh the past interest experience of netizens and groups, and determine the current interest distribution of netizens.
The netizen interest development prediction module takes the current discussion hot spots of netizens and groups, retrieves past development models from the netizen models repository, and, after comparison, makes an appropriate prediction of the future interest development of netizens and groups. We use a Markov model: from the probability distribution of interest points at each point in time and the probability distribution of the current interest points, it predicts, to a certain extent, the development of future interest points.
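One way to read the Markov step is as a first-order chain over interest points: transition probabilities are estimated from the observed sequence, and the current distribution is pushed one step forward. The Python sketch below rests on that assumption; the source does not give the order of the model or the estimation method, and the history is invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_transitions(history: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from consecutive interest points."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        counts[current][following] += 1
    return {s: {t: k / sum(c.values()) for t, k in c.items()}
            for s, c in counts.items()}


def predict_next(distribution: Dict[str, float],
                 transitions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Push a current distribution over interest points one step forward."""
    nxt: Dict[str, float] = defaultdict(float)
    for state, p in distribution.items():
        for target, q in transitions.get(state, {}).items():
            nxt[target] += p * q
    return dict(nxt)


# Interest point observed for one netizen in six consecutive periods (made-up data).
history = ["sports", "sports", "finance", "sports", "finance", "finance"]
transitions = fit_transitions(history)
forecast = predict_next({"sports": 1.0}, transitions)
```

In this history "sports" is followed by "finance" in two of its three transitions, so a netizen currently focused on sports is forecast to move to finance with probability 2/3. A state with no observed outgoing transition contributes nothing to the forecast, which a fuller implementation would need to handle, for example by smoothing.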
The present invention has substantive features and represents notable progress: (1) by mining the association between netizens and content in depth, it performs interest analysis on netizens; (2) by analyzing and mining network groups, it obtains the roles netizens play and the effect they have on the network, thereby uncovering netizens' motivations; (3) by means of the netizen models repository, it accumulates a large number of models of netizen-related information and reapplies them in current data analysis, which helps to identify netizen interest as a whole and to make appropriate predictions.
The data-mining-based system for analyzing and predicting netizen interest in forums proposed by the present invention, built on netizen-content association, netizen-netizen association, and the netizen models repository, makes full use of network content information, netizen information, and historical data, effectively meets the demand for in-depth mining in forum-based netizen interest analysis, and is suitable for implementing network public opinion analysis systems.
Brief description of the drawings
Figure 1 is the system architecture diagram of an embodiment of the system for analyzing and predicting netizen interest in forums.
Specific embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawing.
The accompanying drawing shows the system architecture of an embodiment of the system for analyzing and predicting netizen interest in forums. As shown, the architecture is divided into four layers. The first layer is the data storage layer, which manages the loading and indexing of structured and unstructured data. The second layer is the intelligent content analysis layer, which applies data mining methods to article content for text classification, hot topic extraction and tracking, and proneness analysis. The third layer is the association analysis layer, comprising the netizen-content association module and the netizen-netizen association module; the results of the former are the basis for the latter. The fourth and topmost layer is the interest analysis layer, comprising the netizen models repository module, the netizen interest analysis module, and the netizen interest development prediction module; the netizen interest analysis module calls the netizen models repository module, and these two modules are in turn the basis of the netizen interest development prediction module.
In the intelligent content analysis layer, text data is first input into the module. The content analysis module calls the Chinese word segmentation function to segment the Chinese text and then proceeds to feature selection, which consists of two tasks: removing stop words and then computing TF-IDF values to select features. Feature selection differs between text classification and text clustering: text classification selects features directly from the training documents, whereas text clustering treats all test documents as different categories when selecting features, so two feature selection results are obtained. After feature selection, processing splits into two parts, text classification and text clustering. In the text classification part, the classification training function is called first, and training yields a classifier; text classification is then performed; finally, proneness analysis is applied to the classification results to obtain the speech proneness of each category. In the text clustering part, the text clustering function is called first, and categories are clustered automatically; the automatically clustered categories are then extracted to form new hot topics, which are tracked; finally, proneness analysis is applied to the hot topics to obtain the speech proneness of each hot topic.
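The stop-word removal and TF-IDF computation can be sketched in Python as follows. The source does not give its exact TF-IDF formula, so this sketch assumes the common form tf × log(N / df), with tf taken as the relative frequency within a document; the stop-word list and the pre-segmented sample documents are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"the", "a", "is"}  # illustrative stop-word list


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {word: tf * idf} map per document, with idf = log(N / df)."""
    cleaned = [[w for w in doc if w not in STOP_WORDS] for doc in docs]
    # Document frequency: in how many documents each word occurs at least once.
    df = Counter(w for doc in cleaned for w in set(doc))
    n = len(cleaned)
    scores = []
    for doc in cleaned:
        tf = Counter(doc)
        scores.append({w: (c / len(doc)) * math.log(n / df[w])
                       for w, c in tf.items()})
    return scores


# Documents are assumed to be already segmented into words.
docs = [["the", "housing", "price", "is", "rising"],
        ["the", "football", "final"],
        ["housing", "policy"]]
scores = tfidf(docs)
```

In the first document "price" scores higher than "housing" because "housing" also occurs in another document, which is the behaviour feature selection relies on: words concentrated in few documents are more discriminative.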
The association analysis layer contains both the netizen-content association module and the netizen-netizen association module. The netizen-content association module comes first and has three parts: first, association analysis of text classification results with site, board, and netizen; second, association analysis of hot topic analysis results with site, board, and netizen; third, association analysis of same-topic issues with site, board, and netizen. The netizen-netizen association module follows and also has three parts, corresponding to the three above: first, the network group sharing the same site, board, and category is classed as a web community; second, the network group sharing the same site, board, and topic is classed as a network colony; third, the network group sharing the same site, board, and issue within a topic is classed as a network clique.
In the interest analysis layer, the web communities, network colonies, network cliques, individual netizens, and proneness analysis results obtained above are combined, and statistical analysis yields the interest points of netizens and network groups. On this basis, together with the netizen models repository, the interest development of netizens and network groups is predicted, covering web community interest analysis and development prediction, network colony interest analysis and development prediction, network clique interest analysis and development prediction, and netizen interest analysis and development prediction.
As the implementation process above shows, the data-mining-based system for analyzing and predicting netizen interest in forums of the present invention, built on netizen-content association, netizen-netizen association, and the netizen models repository, effectively realizes in-depth mining for forum netizen interest analysis and provides reliable information for the analysis of netizens and groups in Internet public opinion analysis.

Claims (7)

1. A system for analyzing and predicting netizen interest in forums, characterized by comprising:
a data storage layer for storing structured data and unstructured data;
an intelligent content analysis layer for performing subject classification, hot topic extraction and tracking, and proneness analysis on the data in the data storage layer;
an association analysis layer that, according to the subject classification and the hot topics, performs netizen-content association and then netizen-netizen association;
an interest analysis layer that performs netizen interest analysis and prediction according to the netizen-content association, the netizen-netizen association, and the proneness analysis.
2. The system for analyzing and predicting netizen interest in forums according to claim 1, characterized in that the interest analysis layer comprises:
a netizen models repository module for generalizing and summarizing past interest analyses of individual netizens and netizen groups into empirical models and supplying them as machine learning knowledge for subsequent analysis;
a netizen interest analysis module for analyzing the interests of individual netizens and the interest points of netizen groups according to the netizen models repository module;
a netizen interest development prediction module for predicting the future interest development of individual netizens and netizen groups according to the netizen models repository module.
3. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the netizen-content association includes association analysis of text classification results with netizens, association analysis of hot topic analysis results with site, board, and netizens, and association analysis of same-topic issues with netizens.
4. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the netizen-netizen association includes associating netizens who share the same site, board, and category; associating netizens who share the same site, board, and topic; and associating netizens who share the same site, board, and issue within a topic.
5. The system for analyzing and predicting netizen interest in forums according to claim 3, characterized in that the netizen-netizen association includes associating netizens who share the same site, board, and category; associating netizens who share the same site, board, and topic; and associating netizens who share the same site, board, and issue within a topic.
6. The system for analyzing and predicting netizen interest in forums according to claim 1 or 2, characterized in that the data storage layer builds indexes for the structured data and the unstructured data.
7. The system for analyzing and predicting netizen interest in forums according to claim 2, characterized in that the netizen interest analysis module uses a Markov model which, from the probability distribution of interest points at each point in time and the probability distribution of the current interest points, predicts the development of future interest points.
CN201910133585.7A 2019-02-22 2019-02-22 System for analyzing and predicting netizen interest in forum Withdrawn CN109902237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910133585.7A CN109902237A (en) 2019-02-22 2019-02-22 System for analyzing and predicting netizen interest in forum

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910133585.7A CN109902237A (en) 2019-02-22 2019-02-22 System for analyzing and predicting netizen interest in forum

Publications (1)

Publication Number Publication Date
CN109902237A true CN109902237A (en) 2019-06-18

Family

ID=66945406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910133585.7A Withdrawn CN109902237A (en) 2019-02-22 2019-02-22 System for analyzing and predicting netizen interest in forum

Country Status (1)

Country Link
CN (1) CN109902237A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556582A (en) * 2008-04-09 2009-10-14 上海复旦光华信息科技股份有限公司 System for analyzing and predicting netizen interest in forum

Similar Documents

Publication Publication Date Title
CN110245981B (en) Crowd type identification method based on mobile phone signaling data
CN103176985B (en) The most efficient a kind of internet information crawling method
CN108629633A (en) A kind of method and system for establishing user's portrait based on big data
CN108664269B (en) A kind of feature attachment code peculiar smell detection method based on deep learning
CN108897857A (en) The Chinese Text Topic sentence generating method of domain-oriented
CN103812872B (en) A kind of network navy behavioral value method and system based on mixing Di Li Cray process
CN103823890B (en) A kind of microblog hot topic detection method for special group and device
CN105069080B (en) A kind of document retrieval method and system
CN101556582A (en) System for analyzing and predicting netizen interest in forum
CN109492026A (en) A kind of Telecoms Fraud classification and Detection method based on improved active learning techniques
CN108733791B (en) Network event detection method
CN106022708A (en) Method for predicting employee resignation
CN109359137B (en) User growth portrait construction method based on feature screening and semi-supervised learning
CN106682236A (en) Machine learning based patent data processing method and processing system adopting same
CN106649270A (en) Public opinion monitoring and analyzing method
CN110032643A (en) A kind of building maintenance work order analysis method, device, storage medium and client
Gu et al. [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management
CN110059190A (en) A kind of user's real-time point of view detection method based on social media content and structure
CN109558484A (en) Electric power customer service work order emotion quantitative analysis method based on similarity word order matrix
Wang et al. Measuring technology complementarity between enterprises with an hLDA topic model
Tong et al. Multimedia network public opinion supervision prediction algorithm based on big data
Wang et al. Enhancing rumor detection in social media using dynamic propagation structures
CN117455529A (en) User electricity utilization characteristic image construction method and system based on big data technology
Chi et al. Expert identification based on dynamic LDA topic model
CN109902237A (en) System for analyzing and predicting netizen interest in forum

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190618