CN111666268A - Microblog big data public opinion analysis method - Google Patents

Publication number
CN111666268A
Authority
CN
China
Prior art keywords
microblog
data
public opinion
analysis
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010430701.4A
Other languages
Chinese (zh)
Inventor
张俊杰
侍文超
耿雁萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Huolan Data Co ltd
Original Assignee
Anhui Huolan Data Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-05-20
Filing date
2020-05-20
Publication date
2020-09-15
Application filed by Anhui Huolan Data Co ltd
Priority to CN202010430701.4A
Publication of CN111666268A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a microblog big data public opinion analysis method, relating to the technical field of public opinion analysis. To obtain related hot microblog public opinion results, the method comprises the following steps. Step one, data collection: receiving, by following and grouping, the microblog content posted by the followed accounts in each preset group of a plurality of microblog accounts. Step two, data analysis: processing the collected microblog content into computer-processable structured data and filtering out duplicate content to obtain preliminary related hot microblog public opinion results. Step three, result output: classifying the preliminary related hot microblog public opinion results by a set time period and outputting the classified results. A large amount of data is obtained by capturing microblog data, and the data are then processed to obtain related hot microblog public opinion results.

Description

Microblog big data public opinion analysis method
Technical Field
The invention relates to the technical field of public opinion analysis, in particular to a microblog big data public opinion analysis method.
Background
With the advent of the Web 2.0 era, the number of microblog users has grown very large, status information is updated frequently, information spreads rapidly, and users are relatively concentrated on a small number of microblog platforms, so analysis and research based on microblog data is a research direction of considerable interest.
Microblogs have a broad user base, public opinion information is generated and spread rapidly on microblog platforms, the number of microblog users is growing rapidly, and analysis based on microblog data has attracted wide public attention.
To use microblogs effectively for analyzing public opinion, the acquisition of microblog data is very important. For example, a large number of users are active on Sina Weibo, and nearly 100 million microblog posts are generated every day.
Using big data methods to monitor and analyze, in a timely and effective manner, the large volume of public opinion information generated on microblogs is of practical significance for maintaining social stability and promoting national development.
In daily life, emergencies occur frequently, and users are increasingly accustomed to publishing their opinions and emotions on social networks (e.g., blogs, forums, Twitter, Facebook). Domestic users use microblogs more frequently and more widely. However, users' emotions about an event do not remain unchanged. They evolve continuously over time or as the event develops, gradually becoming stronger or weaker, or even shifting from one emotion to another. Detecting, online and in real time, how users' emotions about an emergency evolve is therefore of great significance.
Disclosure of Invention
In view of the above, the object of the present invention is to provide a microblog big data public opinion analysis method that effectively analyzes the public opinion information generated on microblogs.
To this end, the invention provides a microblog big data public opinion analysis method comprising the following steps:
Step one: data collection;
receiving, by following and grouping, the microblog content posted by the followed accounts in each preset group of a plurality of microblog accounts;
Step two: data analysis;
processing the collected microblog content into computer-processable structured data, and filtering out duplicate content to obtain preliminary related hot microblog public opinion results;
Step three: result output;
classifying the preliminary related hot microblog public opinion results by a set time period, and outputting the classified results.
Optionally, the specific steps of step one, data collection, are as follows:
logging in to a microblog platform with each of a plurality of registered microblog accounts by simulating user login;
each microblog account receiving, by following and grouping, the microblog content posted by the followed accounts in each of its preset groups.
Optionally, the specific steps of step two, data analysis, are as follows:
S21: filtering duplicate content out of the computer-processable structured data obtained in step two;
S22: performing word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-value method, and merging the data of each cluster into a document set;
S23: for the document sets from step S22, performing key hot word analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery, and sentiment analysis on their content, and extracting the high-frequency keywords and the URL data of the clicked result web pages, to obtain preliminary related hot microblog public opinion results.
Optionally, the filtering method is as follows:
S201: filtering out targeted dialogue interaction messages so as to eliminate noise data as far as possible;
S202: performing word segmentation and removing stop words and illegal characters from the data set, to obtain a preliminary data set with a low level of interference.
Optionally, the time period is divided by time units of different lengths, such as hours, days, weeks, months, quarters, and years, and one or more time units are selected for processing the preliminary microblog public opinion results to obtain related hot microblog public opinion results for the different time units.
As can be seen from the above, the provided microblog big data public opinion analysis method obtains a large amount of data by capturing microblog data and then processes the data to obtain related hot microblog public opinion results and related hot microblog public opinion result sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to specific embodiments below.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used only to distinguish two entities or parameters that share the same name but are not identical. They are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention. Likewise, any directions and positions mentioned serve to explain the present invention and do not limit the following embodiments.
A microblog big data public opinion analysis method comprises the following specific steps:
1. Data acquisition:
S11: Log in to the microblog platform with the registered microblog accounts by simulating user login.
The aim is to have the computer log in to the microblog platform automatically so that it can acquire microblog content automatically.
S12: Using the follow-and-group approach, receive the microblog content posted by the followed accounts in each preset group of each microblog account.
Specifically, after a microblog account has logged in to the microblog platform, the group to which a microblog user belongs can be specified when that user is added as a followed account. The microblog platform limits the length of the followed-account messages that can be received at one time. If an account covering a certain field follows target accounts without grouping them, or groups them but does not receive the content of each group in a separate batch, the received content may exceed the length limit and some microblog content may not be received. This embodiment therefore first groups the target accounts followed by each microblog account and then receives the content posted by the target accounts of each group separately, so that each reception covers only one group. This reduces the data loss that occurs when content from too many target accounts exceeds the platform's length limit, and improves the completeness of the acquired microblog data.
In this embodiment, the number of microblog accounts used is not limited; one or more accounts may be used. The specific number may be determined by one or more of the following factors: the required precision of the public opinion analysis, the target field of the analysis, the microblog platform's limit on the number of users, its limit on access frequency, and its limit on the number of followed accounts.
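The grouped reception described for S12 can be illustrated with a minimal Python sketch. It assumes a fixed group size and a caller-supplied fetch function; `chunk_into_groups`, `collect_by_group`, and `fake_fetch` are hypothetical names introduced here, and no real microblog platform API is shown.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


def chunk_into_groups(targets: List[str], group_size: int) -> Dict[str, List[str]]:
    """Split the followed target accounts into named groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: Dict[str, List[str]] = {}
    for start in range(0, len(targets), group_size):
        name = f"group-{start // group_size + 1}"
        groups[name] = targets[start:start + group_size]
    return groups


def collect_by_group(
    groups: Dict[str, List[str]],
    fetch_group: Callable[[str, List[str]], List[Post]],
) -> List[Post]:
    """Receive content one group at a time so each request covers a single group."""
    collected: List[Post] = []
    for name, members in groups.items():
        collected.extend(fetch_group(name, members))
    return collected


def fake_fetch(name: str, members: List[str]) -> List[Post]:
    """Hypothetical stand-in for a platform request made after simulated login."""
    return [{"group": name, "author": m, "text": f"post by {m}"} for m in members]


groups = chunk_into_groups(["a", "b", "c", "d", "e"], group_size=2)
posts = collect_by_group(groups, fake_fetch)
print(len(groups), len(posts))
```

In practice the group size would be chosen so that one group's content stays under the platform's length limit; the value 2 here is only for illustration.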
2. Data analysis:
S21: Process the natural-language text in the data acquired in step S12 into computer-processable structured data and filter out duplicate content. The data are processed as follows:
S201: Filter out targeted dialogue interaction messages so as to eliminate noise data as far as possible.
S202: Perform word segmentation and remove stop words and illegal characters from the data set, to obtain a preliminary data set with a low level of interference.
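As an illustration of S21, S201, and S202, the following Python sketch filters reply-style messages, strips illegal characters and stop words, and drops duplicates. Several points are assumptions rather than part of the source: replies are taken to start with an `@user` mention, whitespace splitting stands in for a real Chinese word segmenter, and the stop-word list is a tiny hypothetical one.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}

# Assumption: a targeted dialogue message starts with "@user" or "回复@user".
REPLY_PATTERN = re.compile(r"^\s*(回复)?@\S+")

# Assumption: anything other than word characters, whitespace, and "#" is illegal.
ILLEGAL_CHARS = re.compile(r"[^\w\s#]")


def is_dialogue(text: str) -> bool:
    """Return True for messages that look like replies aimed at one user (S201)."""
    return bool(REPLY_PATTERN.match(text))


def tokenize(text: str) -> List[str]:
    """Strip illegal characters, split into words, and drop stop words (S202)."""
    cleaned = ILLEGAL_CHARS.sub(" ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def preprocess(posts: List[str]) -> List[List[str]]:
    """Apply S201 and S202, then filter out duplicate content (S21)."""
    seen = set()
    result: List[List[str]] = []
    for text in posts:
        if is_dialogue(text):
            continue
        tokens = tokenize(text)
        key = " ".join(tokens)
        if not tokens or key in seen:
            continue
        seen.add(key)
        result.append(tokens)
    return result


raw = [
    "Flood warning issued for the river district!!",
    "@alice I agree with you",
    "flood warning issued for the river district",
    "Traffic is slow near the bridge :(",
]
clean = preprocess(raw)
print(clean)
```

Duplicates are detected after normalization, so two posts that differ only in case or punctuation count as the same content.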
S22: Perform word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-value method, and merge the data of each cluster into a document set.
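The clustering in S22 might look roughly like the sketch below. The source does not specify the "optimized mean value" algorithm, so plain k-means is used here as a stand-in; the 2-D word vectors are hypothetical toy values (a real system would load trained embeddings), and the centroids are initialized from the first k documents to keep the example deterministic.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-D word vectors; real vectors would come from a trained model.
WORD_VECTORS: Dict[str, Vector] = {
    "flood": [1.0, 0.0], "river": [0.9, 0.1], "rain": [0.8, 0.2],
    "traffic": [0.0, 1.0], "bridge": [0.1, 0.9], "road": [0.2, 0.8],
}


def doc_vector(tokens: List[str]) -> Vector:
    """Represent a document as the average of its known word vectors."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(col) / len(known) for col in zip(*known)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns the cluster index assigned to each point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [["flood", "river"], ["traffic", "bridge"], ["rain", "flood"], ["road", "traffic"]]
labels = kmeans([doc_vector(d) for d in docs], k=2)

# Merge the documents of each cluster into one document set.
document_sets: Dict[int, List[List[str]]] = {}
for doc, label in zip(docs, labels):
    document_sets.setdefault(label, []).append(doc)
print(document_sets)
```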
S23: For the document sets from step S22, perform key hot word analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery, and sentiment analysis on their content, and extract the high-frequency keywords and the URL data of the clicked result web pages, to obtain preliminary related hot microblog public opinion results.
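Of the analyses listed in S23, the extraction of high-frequency keywords and URLs is simple enough to sketch in Python. The example counts tokens within one document set and pulls URLs out of raw post text. Treating URLs embedded in the post text as the "URL data" is an assumption, and `top_keywords` and `extract_urls` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import List, Tuple

URL_PATTERN = re.compile(r"https?://\S+")


def top_keywords(document_set: List[List[str]], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens across all documents in the set."""
    counts = Counter(tok for doc in document_set for tok in doc)
    return counts.most_common(n)


def extract_urls(texts: List[str]) -> List[str]:
    """Collect every http(s) URL that appears in the raw post texts."""
    urls: List[str] = []
    for text in texts:
        urls.extend(URL_PATTERN.findall(text))
    return urls


document_set = [
    ["flood", "river", "warning"],
    ["flood", "rain", "river"],
    ["flood", "rescue"],
]
keywords = top_keywords(document_set, n=2)
urls = extract_urls(["Flood report: https://example.com/flood-report (updated)"])
print(keywords, urls)
```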
3. Result output:
S31: Classify the preliminary related hot microblog public opinion results obtained in step S23 by a set time period. The time period may be divided by time units of different lengths, such as hours, days, weeks, months, quarters, and years, or by more than one of these time units at the same time, to obtain related hot microblog public opinion results for the different time units.
S32: Based on the related hot microblog public opinion results for the different time units, send a public opinion early warning or public opinion briefing to the user or requesting party.
S33: Group and collect the related hot microblog public opinion results by time unit to obtain related hot microblog public opinion result sets classified by time unit.
S34: Output the related hot microblog public opinion result sets classified by time unit.
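The time-unit classification of S31 and S33 can be sketched as follows. The example assumes each result carries a timestamp and buckets results under one or more time units at once; `period_key` and `classify_by_period` are hypothetical names, and the key formats (for example `2020-Q2`) are illustrative choices, not taken from the source.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def period_key(ts: datetime, unit: str) -> str:
    """Map a timestamp to the label of the period it falls in for one time unit."""
    if unit == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if unit == "day":
        return ts.strftime("%Y-%m-%d")
    if unit == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if unit == "month":
        return ts.strftime("%Y-%m")
    if unit == "quarter":
        return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"
    if unit == "year":
        return str(ts.year)
    raise ValueError(f"unsupported time unit: {unit}")


def classify_by_period(
    results: List[Tuple[datetime, str]], units: List[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Group results under every requested time unit (S31, S33)."""
    buckets = {unit: defaultdict(list) for unit in units}
    for ts, item in results:
        for unit in units:
            buckets[unit][period_key(ts, unit)].append(item)
    return {unit: dict(groups) for unit, groups in buckets.items()}


results = [
    (datetime(2020, 5, 20, 9, 30), "flood warning"),
    (datetime(2020, 5, 20, 18, 0), "rescue update"),
    (datetime(2020, 9, 15, 8, 0), "traffic delays"),
]
result_sets = classify_by_period(results, ["day", "quarter"])
print(result_sets)
```

Passing several units in one call corresponds to dividing by more than one time unit at the same time; each unit yields its own result set for output in S34.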
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiment or of different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not set out in detail for the sake of brevity.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.
The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (5)

1. A microblog big data public opinion analysis method, characterized by comprising the following steps:
Step one: data collection;
receiving, by following and grouping, the microblog content posted by the followed accounts in each preset group of a plurality of microblog accounts;
Step two: data analysis;
processing the collected microblog content into computer-processable structured data, and filtering out duplicate content to obtain preliminary related hot microblog public opinion results;
Step three: result output;
classifying the preliminary related hot microblog public opinion results by a set time period, and outputting the classified results.
2. The microblog big data public opinion analysis method according to claim 1, characterized in that the specific steps of step one, data collection, are as follows:
logging in to a microblog platform with each of a plurality of registered microblog accounts by simulating user login;
each microblog account receiving, by following and grouping, the microblog content posted by the followed accounts in each of its preset groups.
3. The microblog big data public opinion analysis method according to claim 1, characterized in that the specific steps of step two, data analysis, are as follows:
S21: filtering duplicate content out of the computer-processable structured data obtained in step two;
S22: performing word-vector-based cluster analysis on the preprocessed data, clustering with an optimized mean-value method, and merging the data of each cluster into a document set;
S23: for the document sets from step S22, performing key hot word analysis, trend analysis, negative information analysis, topic detection, connection analysis, hot spot discovery, and sentiment analysis on their content, and extracting the high-frequency keywords and the URL data of the clicked result web pages, to obtain preliminary related hot microblog public opinion results.
4. The microblog big data public opinion analysis method according to claim 3, wherein the filtering method is as follows:
S201: filtering out targeted dialogue interaction messages so as to eliminate noise data as far as possible;
S202: performing word segmentation and removing stop words and illegal characters from the data set, to obtain a preliminary data set with a low level of interference.
5. The microblog big data public opinion analysis method according to claim 1, wherein the time period is divided by time units of different lengths, such as hours, days, weeks, months, quarters, and years, and one or more time units are selected for processing the preliminary microblog public opinion results to obtain related hot microblog public opinion results for the different time units.
CN202010430701.4A 2020-05-20 2020-05-20 Microblog big data public opinion analysis method Pending CN111666268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430701.4A CN111666268A (en) 2020-05-20 2020-05-20 Microblog big data public opinion analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430701.4A CN111666268A (en) 2020-05-20 2020-05-20 Microblog big data public opinion analysis method

Publications (1)

Publication Number Publication Date
CN111666268A true CN111666268A (en) 2020-09-15

Family

ID=72384018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430701.4A Pending CN111666268A (en) 2020-05-20 2020-05-20 Microblog big data public opinion analysis method

Country Status (1)

Country Link
CN (1) CN111666268A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157993A (en) * 2021-02-08 2021-07-23 电子科技大学 Network water army behavior early warning model based on time sequence graph polarization analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390051A (en) * 2013-07-25 2013-11-13 南京邮电大学 Topic detection and tracking method based on microblog data
WO2015016784A1 (en) * 2013-08-01 2015-02-05 National University Of Singapore A method and apparatus for tracking microblog messages for relevancy to an entity identifiable by an associated text and an image
CN104954234A (en) * 2015-05-19 2015-09-30 中国地质大学(北京) Microblog data acquisition method, microblog data acquisition device and public opinion analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李亚芳等: "基于新浪微博大数据的新疆伽师6.4级地震舆情分析及可视化研究", 《内陆地震》, vol. 34, no. 01, 15 March 2020 (2020-03-15), pages 103 - 110 *

Similar Documents

Publication Publication Date Title
US9424319B2 (en) Social media based content selection system
US20160203187A1 (en) System and method for generating social summaries
Rubin et al. Deception detection for news: three types of fakes
CN110263238B (en) Big data-based public opinion listening system
Glance et al. Blogpulse: Automated trend discovery for weblogs
Graham et al. # IStandWithDan versus# DictatorDan: the polarised dynamics of Twitter discussions about Victoria’s COVID-19 restrictions
US20130297619A1 (en) Social media profiling
US9563770B2 (en) Spammer group extraction apparatus and method
CN105718587A (en) Network content resource evaluation method and evaluation system
CN101510879A (en) Method and apparatus for filtering rubbish contents
JP2013543610A (en) System and method for reputation management of consumer sent media
CN102945246B (en) The disposal route of network information data and device
US10387467B2 (en) Time-based sentiment normalization based on authors personality insight data points
Liu et al. SDHM: A hybrid model for spammer detection in Weibo
CN109033286B (en) Data statistical method and device
CN111310021A (en) Network public opinion monitoring method
CN111191096B (en) Method for identifying public opinion events and tracking popularity of whole-network patriotic
CN110825868A (en) Topic popularity based text pushing method, terminal device and storage medium
CN111984787A (en) Public opinion hotspot obtaining method and system based on internet data
Abinaya et al. Spam detection on social media platforms
CN112035603A (en) Propagation influence evaluation method for comprehensive calculation event
CN111666268A (en) Microblog big data public opinion analysis method
CN106886916A (en) Reputation management system and method
CN108959484B (en) Multi-strategy media data stream filtering method and device for event detection
Sakib et al. Automated detection of sockpuppet accounts in wikipedia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200915)