CN111666268A - Microblog big data public opinion analysis method - Google Patents
- Publication number
- CN111666268A (application number CN202010430701.4A)
- Authority
- CN
- China
- Prior art keywords
- microblog
- data
- public opinion
- analysis
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a microblog big data public opinion analysis method, relating to the technical field of public opinion analysis. To obtain relevant trending microblog public opinion results, the method comprises the following steps. Step one, data collection: using a follow-and-group approach, receive the microblog content published by the followed accounts in each preset group of a plurality of microblog accounts. Step two, data analysis: process the collected microblog content into computer-processable structured data and filter out duplicate content to obtain preliminary relevant trending microblog public opinion results. Step three, result output: classify the preliminary results by a set time period and output the classified results. A large amount of data is obtained by crawling microblog data, and the data are then processed to obtain relevant trending microblog public opinion results.
Description
Technical Field
The invention relates to the technical field of public opinion analysis, in particular to a microblog big data public opinion analysis method.
Background
With the advent of the Web 2.0 era, the number of microblog users has grown very large, status updates are frequent, information spreads rapidly, and users are relatively concentrated on a few microblog platforms, so analysis and research based on microblog data is a research direction of considerable interest.
Microblogs have a broad user base. Public opinion information is generated and spreads rapidly on microblog platforms, the number of microblog users is growing quickly, and analysis based on microblog data has attracted wide public attention.
To use microblogs effectively for analyzing public opinion, acquiring microblog data is essential. For example, a large number of users are active on Sina Weibo, and nearly 100 million microblog posts are generated every day.
Using big data methods to monitor and analyze, effectively and promptly, the large volume of public opinion information generated on microblogs is of practical importance for maintaining social stability and promoting national development.
In daily life, emergencies occur frequently, and users are increasingly accustomed to publishing their opinions and emotions on social networks (e.g., blogs, forums, Twitter, Facebook). Domestic users use microblogs especially frequently and widely. However, users' emotions about an event do not stay constant: they evolve over time or as the event develops, gradually becoming stronger or weaker, and may even change from one emotion into another. Detecting the evolution of users' emotions about an emergency online and in real time is therefore of great significance.
Disclosure of Invention
In view of the above, the purpose of the present invention is to provide a microblog big data public opinion analysis method that effectively analyzes the public opinion information generated on microblogs.
To this end, the invention provides a microblog big data public opinion analysis method comprising the following steps:
Step one: data collection;
receiving, by following and grouping, the microblog content published by the followed accounts in each preset group of a plurality of microblog accounts;
Step two: data analysis;
processing the collected microblog content into computer-processable structured data, and filtering out duplicate content to obtain preliminary relevant trending microblog public opinion results;
Step three: result output;
classifying the preliminary relevant trending microblog public opinion results by a set time period, and outputting the classified results.
Optionally, the specific steps of step one (data collection) are as follows:
logging in to a microblog platform with each of a plurality of registered microblog accounts by simulating user login;
each microblog account receiving, by following and grouping, the microblog content published by the followed accounts in each of its preset groups.
Optionally, the specific steps of step two (data analysis) are as follows:
S21: filtering duplicate content out of the computer-processable structured data obtained in step two;
S22: performing word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-based method, and combining the data of each cluster into a document set;
S23: on the content of the document sets from step S22, performing hot keyword analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery and sentiment analysis, and extracting the frequently occurring keywords and the URL data of the result web pages that were clicked, to obtain preliminary relevant trending microblog public opinion results.
Optionally, the filtering method is as follows:
S201: filtering out directed dialogue interaction messages to eliminate noise data as far as possible;
S202: performing word segmentation on the data set and removing stop words and illegal characters, to obtain a preliminary data set with low interference.
Optionally, the time period is divided into time units of different lengths, such as hours, days, weeks, months, quarters and years, and one or more time units are selected for processing the preliminary microblog public opinion results, to obtain relevant trending microblog public opinion results for the different time units.
In summary, the invention provides a microblog big data public opinion analysis method in which a large amount of data is obtained by crawling microblog data and is then processed to obtain relevant trending microblog public opinion results and result sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to specific embodiments below.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical. "First" and "second" are used only for convenience of description and should not be construed as limiting the embodiments of the present invention; likewise, the directions and positions mentioned serve to explain and aid understanding of the invention and do not limit the embodiments that follow.
A microblog big data public opinion analysis method comprises the following specific steps:
1. Data acquisition:
S11: Log in to the microblog platform with the registered microblog accounts by simulating user login.
The purpose is to have the computer log in to the microblog platform automatically so that it can collect microblog content automatically.
S12: Using the follow-and-group approach, receive the microblog content published by the followed accounts in each preset group of each microblog account.
Specifically, after a microblog account logs in to the microblog platform, the group to which a followed user belongs can be specified when that user is added to the follow list. The microblog platform limits the length of the followed-account content it returns per request. If an account follows target accounts in some field without grouping them, or groups them but does not retrieve content batch by batch for each group, the returned content can exceed this limit and part of it cannot be received. This embodiment therefore requires that the target accounts followed by each microblog account first be grouped, and that the microblog content published by the target accounts in each group then be received separately, so that each request covers the target accounts of only one group. This reduces the data loss that occurs when content from too many target accounts exceeds the platform's length limit, and improves the completeness of the collected microblog data.
In this embodiment, the number of microblog accounts is not limited; one or more may be used. The specific number may be determined by one or more of the following factors: the precision required of the public opinion analysis, its target field, the microblog platform's limit on the number of users, its limit on access frequency, and its limit on the number of accounts that can be followed.
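The grouped retrieval in step S12 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: `chunk_accounts`, `collect_by_group`, the group size, and the `fake_fetch` stand-in for a platform request are hypothetical names and values that do not appear in the source, and no real microblog API is called.

```python
from typing import Callable, List


def chunk_accounts(accounts: List[str], group_size: int) -> List[List[str]]:
    """Split the followed accounts into groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [accounts[i:i + group_size] for i in range(0, len(accounts), group_size)]


def collect_by_group(
    groups: List[List[str]],
    fetch_group: Callable[[List[str]], List[dict]],
) -> List[dict]:
    """Fetch posts one group at a time so that each request covers only one group."""
    posts: List[dict] = []
    for group in groups:
        posts.extend(fetch_group(group))
    return posts


def fake_fetch(group: List[str]) -> List[dict]:
    """Hypothetical stand-in for a platform request: one post per account."""
    return [{"author": name, "text": f"post from {name}"} for name in group]


groups = chunk_accounts(["a", "b", "c", "d", "e"], group_size=2)
posts = collect_by_group(groups, fake_fetch)
```

In practice the group size would be chosen so that one group's content stays under the platform's per-request length limit; the value 2 here is only for illustration.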
2. Data analysis:
S21: Process the natural-language text in the data acquired in step S12 into computer-processable structured data and filter out duplicate content. The data are processed as follows:
S201: Filter out directed dialogue interaction messages to eliminate noise data as far as possible.
S202: Perform word segmentation on the data set and remove stop words and illegal characters, to obtain a preliminary data set with low interference.
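The filtering in S21, S201 and S202 might look like the sketch below. Several choices are assumptions made only for illustration and are not specified in the source: a post is treated as a directed dialogue message if it starts with `@name` or `回复@name`, "illegal characters" are taken to be anything other than word characters and whitespace, tokens are split on whitespace (Chinese text would need a real word segmenter), and `STOP_WORDS` is a toy list.

```python
import re
from typing import Iterable, List

# Toy stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "of", "for"}
# Assumption: a post that starts with @name (optionally prefixed by 回复) is a directed reply.
REPLY_PATTERN = re.compile(r"^\s*(回复)?@\S+")
# Assumption: "illegal characters" means anything except word characters and whitespace.
ILLEGAL_CHARS = re.compile(r"[^\w\s]")


def preprocess(posts: Iterable[str]) -> List[List[str]]:
    """Drop directed replies, strip illegal characters and stop words, remove duplicates."""
    seen = set()
    cleaned: List[List[str]] = []
    for text in posts:
        if REPLY_PATTERN.match(text):
            continue  # S201: filter dialogue interaction messages
        normalized = ILLEGAL_CHARS.sub(" ", text).lower()
        # S202: whitespace tokenization stands in for word segmentation
        tokens = [tok for tok in normalized.split() if tok not in STOP_WORDS]
        key = " ".join(tokens)
        if not tokens or key in seen:
            continue  # S21: filter duplicate content
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


sample = [
    "The earthquake is strong!!!",
    "the earthquake is strong",
    "@alice thanks for the update",
    "Rescue teams arrive #news",
]
result = preprocess(sample)
```

Here the second post is dropped as a duplicate of the first after normalization, and the third is dropped as a directed reply.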
S22: Perform word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-based method, and combine the data of each cluster into a document set.
S23: On the content of the document sets from step S22, perform hot keyword analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery and sentiment analysis, and extract the frequently occurring keywords and the URL data of the result web pages that were clicked, to obtain preliminary relevant trending microblog public opinion results.
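Steps S22 and S23 can be illustrated with a small, self-contained sketch. The source does not say which word vectors or which "optimized" mean-based clustering is used, so the code below substitutes plain k-means over term-count vectors with a deterministic initialization, and reduces S23 to counting the most frequent keyword per cluster. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bag_of_words(docs: Sequence[List[str]]) -> List[List[float]]:
    """Term-count vectors; a stand-in for the unspecified word vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok in doc:
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vectors


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hot_keywords(docs: Sequence[List[str]], labels: List[int], top_n: int = 1) -> Dict[int, List[str]]:
    """Most frequent tokens in each cluster's document set."""
    counters: Dict[int, Counter] = {}
    for doc, lab in zip(docs, labels):
        counters.setdefault(lab, Counter()).update(doc)
    return {lab: [w for w, _ in cnt.most_common(top_n)] for lab, cnt in counters.items()}


docs = [
    ["earthquake", "rescue"],
    ["match", "goal"],
    ["earthquake", "damage"],
    ["goal", "team"],
]
labels = kmeans(bag_of_words(docs), k=2)
keywords = hot_keywords(docs, labels)
```

Trend analysis, sentiment analysis and the other analyses listed in S23 are out of scope for this sketch; trained word embeddings would replace `bag_of_words` in a realistic setting.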
3. Result output:
S31: Classify the preliminary relevant trending microblog public opinion results obtained in step S23 by a set time period. The time period may be divided into time units of different lengths, such as hours, days, weeks, months, quarters and years, and more than one time unit may be applied at the same time, to obtain relevant trending microblog public opinion results for the different time units.
S32: Based on the relevant trending microblog public opinion results for the different time units, send a public opinion early warning or a public opinion brief to the user or requester.
S33: Collect the relevant trending microblog public opinion results by time unit to obtain result sets classified by time unit.
S34: Output the relevant trending microblog public opinion result sets classified by time unit.
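The time-unit classification in S31 and S33 can be sketched as below. The key formats (for example `2020-Q2` for a quarter, ISO week numbers for weeks) and the names `period_key` and `group_by_period` are assumptions chosen for illustration; the source only names the time units.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def period_key(ts: datetime, unit: str) -> str:
    """Map a timestamp to a bucket label for the given time unit."""
    if unit == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if unit == "day":
        return ts.strftime("%Y-%m-%d")
    if unit == "week":
        year, week, _ = ts.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    if unit == "month":
        return ts.strftime("%Y-%m")
    if unit == "quarter":
        return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"
    if unit == "year":
        return str(ts.year)
    raise ValueError(f"unsupported unit: {unit}")


def group_by_period(
    results: Iterable[Tuple[datetime, str]],
    units: List[str],
) -> Dict[str, Dict[str, List[str]]]:
    """Build one result set per time unit, keyed by bucket label."""
    grouped = {unit: defaultdict(list) for unit in units}
    for ts, topic in results:
        for unit in units:
            grouped[unit][period_key(ts, unit)].append(topic)
    return {unit: dict(buckets) for unit, buckets in grouped.items()}


results = [
    (datetime(2020, 5, 20, 9, 30), "earthquake"),
    (datetime(2020, 5, 20, 18, 0), "rescue"),
    (datetime(2020, 6, 1, 8, 0), "reconstruction"),
]
sets = group_by_period(results, ["day", "quarter"])
```

Passing several units at once mirrors the option in S31 of applying more than one time unit simultaneously; each unit yields its own result set for output in S34.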
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiment or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (5)
1. A microblog big data public opinion analysis method, characterized by comprising the following steps:
Step one: data collection;
receiving, by following and grouping, the microblog content published by the followed accounts in each preset group of a plurality of microblog accounts;
Step two: data analysis;
processing the collected microblog content into computer-processable structured data, and filtering out duplicate content to obtain preliminary relevant trending microblog public opinion results;
Step three: result output;
classifying the preliminary relevant trending microblog public opinion results by a set time period, and outputting the classified results.
2. The microblog big data public opinion analysis method according to claim 1, characterized in that the specific steps of step one (data collection) are as follows:
logging in to a microblog platform with each of a plurality of registered microblog accounts by simulating user login;
each microblog account receiving, by following and grouping, the microblog content published by the followed accounts in each of its preset groups.
3. The microblog big data public opinion analysis method according to claim 1, characterized in that the specific steps of step two (data analysis) are as follows:
S21: filtering duplicate content out of the computer-processable structured data obtained in step two;
S22: performing word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-based method, and combining the data of each cluster into a document set;
S23: on the content of the document sets from step S22, performing hot keyword analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery and sentiment analysis, and extracting the frequently occurring keywords and the URL data of the result web pages that were clicked, to obtain preliminary relevant trending microblog public opinion results.
4. The microblog big data public opinion analysis method according to claim 3, characterized in that the filtering method is as follows:
S201: filtering out directed dialogue interaction messages to eliminate noise data as far as possible;
S202: performing word segmentation on the data set and removing stop words and illegal characters, to obtain a preliminary data set with low interference.
5. The microblog big data public opinion analysis method according to claim 1, characterized in that the time period is divided into time units of different lengths, such as hours, days, weeks, months, quarters and years, and one or more time units are selected for processing the preliminary microblog public opinion results, to obtain relevant trending microblog public opinion results for the different time units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010430701.4A CN111666268A (en) | 2020-05-20 | 2020-05-20 | Microblog big data public opinion analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111666268A (en) | 2020-09-15 |
Family
ID=72384018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010430701.4A (Pending), CN111666268A (en) | Microblog big data public opinion analysis method | 2020-05-20 | 2020-05-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666268A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157993A (en) * | 2021-02-08 | 2021-07-23 | 电子科技大学 | Network water army behavior early warning model based on time sequence graph polarization analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103390051A (en) * | 2013-07-25 | 2013-11-13 | 南京邮电大学 | Topic detection and tracking method based on microblog data |
WO2015016784A1 (en) * | 2013-08-01 | 2015-02-05 | National University Of Singapore | A method and apparatus for tracking microblog messages for relevancy to an entity identifiable by an associated text and an image |
CN104954234A (en) * | 2015-05-19 | 2015-09-30 | 中国地质大学(北京) | Microblog data acquisition method, microblog data acquisition device and public opinion analysis method |
Non-Patent Citations (1)
Title |
---|
李亚芳等: "基于新浪微博大数据的新疆伽师6.4级地震舆情分析及可视化研究", 《内陆地震》, vol. 34, no. 01, 15 March 2020 (2020-03-15), pages 103 - 110 * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200915 |