CN111666268A

CN111666268A - Microblog big data public opinion analysis method

Info

Publication number: CN111666268A
Application number: CN202010430701.4A
Authority: CN
Inventors: 张俊杰; 侍文超; 耿雁萍
Original assignee: Anhui Huolan Data Co ltd
Current assignee: Anhui Huolan Data Co ltd
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2020-09-15

Abstract

The invention discloses a microblog big data public opinion analysis method, which relates to the technical field of public opinion analysis. To obtain related hot microblog public opinion results, the method comprises the following steps. Step one, data collection: receiving, by following and grouping, the microblog content posted by the followed accounts in each preset group of a plurality of microblog accounts. Step two, data analysis: processing the acquired microblog content into computer-processable structured data, and filtering out repeated content to obtain preliminary microblog public opinion results of related popularity. Step three, result output: classifying the preliminary microblog public opinion results of related popularity according to a set time period, and outputting the classified results. A large amount of data is obtained by capturing microblog data, and the data are then processed to obtain related hot microblog public opinion results.

Description

Microblog big data public opinion analysis method

Technical Field

The invention relates to the technical field of public opinion analysis, in particular to a microblog big data public opinion analysis method.

Background

With the advent of the Web 2.0 era, the number of microblog users has grown very large, status updates are frequent, information spreads rapidly, and users are concentrated on a relatively small number of microblog platforms, so analysis and research based on microblog data is a research direction of considerable interest.

Microblogs have a broad user base, public opinion information is generated and spreads rapidly on microblog platforms, the number of microblog users is growing rapidly, and analysis based on microblog data has attracted wide public attention.

To use microblogs effectively for analyzing public opinion, acquiring microblog data is very important. For example, a large number of users are active on Sina Weibo, and nearly 100 million microblog posts are generated every day.

Using big data methods to monitor and analyze, effectively and in a timely manner, the large amount of public opinion information generated on microblogs has important practical significance for maintaining social stability and promoting national development.

In daily life, emergencies occur frequently, and users are increasingly accustomed to publishing their opinions and emotions on social networks (e.g., blogs, forums, Twitter, Facebook). Domestic users use microblogs especially frequently. However, users' emotions about an event do not remain constant: they evolve continuously over time or as the event develops, gradually becoming stronger or weaker, and may even change from one emotion into another. Detecting, online and in real time, how users' emotions about an emergency evolve is therefore of great significance.

Disclosure of Invention

In view of the above, the present invention aims to provide a microblog big data public opinion analysis method, so as to effectively analyze the public opinion information generated on microblogs.

Based on the above purpose, the invention provides a microblog big data public opinion analysis method, which comprises the following steps:

step one: collecting data;

receiving, by following and grouping, the microblog content posted by the followed accounts in each preset group of a plurality of microblog accounts;

step two: analyzing data;

processing the acquired microblog content into computer processable structured data, and filtering out repeated content to further obtain a preliminary microblog public opinion result of related popularity;

step three: outputting the result;

and classifying the obtained preliminary microblog public opinion results of the related popularity according to a set time period, and outputting the classified results.

Optionally, step one (data acquisition) comprises the following specific steps:

logging in to a microblog platform with each of a plurality of registered microblog accounts by simulating user login;

and, for each microblog account, receiving by following and grouping the microblog content posted by the followed accounts in each preset group of that account.

Optionally, step two (data analysis) comprises the following specific steps:

S21: filtering repeated content out of the computer-processable structured data obtained in step two;

S22: performing word-vector-based clustering analysis on the preprocessed data, clustering with an optimized mean-value method, and combining the data of each cluster into a document set;

S23: performing, on the content of the document sets from step S22, key hot word extraction, trend analysis, negative information detection, topic detection, connection analysis, hot spot discovery and emotion analysis, and extracting the high-frequency keywords and the URL address information of the clicked result webpages, to obtain preliminary microblog public opinion results of related popularity.
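The keyword and URL extraction at the end of step S23 can be illustrated with a minimal sketch. It assumes a document set is a list of already tokenized posts, counts token frequency as a stand-in for "keywords with high occurrence frequency", and simply pulls URLs out of the raw post text because the source does not say how click data would be obtained. The names `top_keywords`, `extract_urls` and `URL_RE` are hypothetical and do not come from the patent.

```python
import re
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Assumption: any http(s) link in the raw text counts as a URL of interest.
URL_RE = re.compile(r"https?://\S+")


def top_keywords(document_set: Iterable[Sequence[str]], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens across all documents in one document set."""
    counts = Counter(token for doc in document_set for token in doc)
    return counts.most_common(n)


def extract_urls(raw_posts: Iterable[str]) -> List[str]:
    """Collect distinct URLs from raw post text, in order of first appearance."""
    seen = set()
    urls = []
    for post in raw_posts:
        for match in URL_RE.findall(post):
            url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

The other analyses listed in S23 (trend analysis, topic detection, emotion analysis and so on) are not covered by this sketch.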

Optionally, the filtering method comprises:

S201: filtering out targeted dialogue interaction information to eliminate noise data as far as possible;

S202: performing word segmentation on the data set and removing stop words and illegal characters, to preliminarily obtain data set information with a low degree of interference.
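The filtering steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the source: dialogue interactions are recognized as posts that start with an `@user:` reply prefix, whitespace splitting stands in for a real word segmenter, "illegal characters" means anything that is not a word character or whitespace, and the stop-word list is a tiny hypothetical sample. Repeated content is dropped by comparing the cleaned token sequences.

```python
import re
from typing import Iterable, List

# Hypothetical stop-word sample; a real list would be much larger.
STOP_WORDS = {"the", "a", "is", "of", "to", "and"}

MENTION_RE = re.compile(r"@\w+")
# Assumption: a post starting with "@user:" or "reply @user:" is dialogue interaction.
REPLY_PREFIX_RE = re.compile(r"^\s*(reply\s+)?@\w+\s*:", re.IGNORECASE)
# Assumption: anything that is not a word character or whitespace is "illegal".
ILLEGAL_RE = re.compile(r"[^\w\s]")


def is_dialogue(text: str) -> bool:
    """Detect targeted dialogue interaction (the S201 noise filter)."""
    return bool(REPLY_PREFIX_RE.match(text))


def clean_tokens(text: str) -> List[str]:
    """Strip mentions and illegal characters, tokenize, and drop stop words (S202)."""
    text = MENTION_RE.sub(" ", text)
    text = ILLEGAL_RE.sub(" ", text)
    tokens = text.lower().split()  # stand-in for a real word segmenter
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(posts: Iterable[str]) -> List[List[str]]:
    """Return cleaned token lists with dialogue noise and repeated content removed."""
    seen = set()
    result = []
    for post in posts:
        if is_dialogue(post):
            continue
        tokens = clean_tokens(post)
        key = tuple(tokens)
        if not tokens or key in seen:
            continue  # skip empty or repeated content
        seen.add(key)
        result.append(tokens)
    return result
```

For Chinese text, the whitespace split would have to be replaced by a proper segmentation library; the rest of the pipeline would stay the same.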

Optionally, the time periods are divided according to time units of different lengths, such as hours, days, weeks, months, quarters and years, and one or more time units are selected for processing the preliminary microblog public opinion results, to obtain related hot microblog public opinion results for the different time units.

As can be seen from the above, the microblog big data public opinion analysis method provided by the invention obtains a large amount of data by capturing microblog data and then processes the data to obtain related hot microblog public opinion results and sets of such results.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to specific embodiments below.

It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities or parameters that share the same name but are not identical. "First" and "second" are used only for convenience of description and should not be construed as limiting the embodiments of the present invention. Likewise, any directions and positions mentioned are used to explain and aid understanding of the present invention and do not limit the following embodiments.

A microblog big data public opinion analysis method comprises the following specific steps:

1. Data acquisition:

S11: Log in to the microblog platform with each of the registered microblog accounts by simulating user login.

The purpose is to have the computer log in to the microblog platform automatically so that it can acquire microblog content automatically.

S12: Receive, by following and grouping, the microblog content posted by the followed accounts in each preset group of each microblog account.

Specifically: after a microblog account logs in to the microblog platform, the group to which a followed microblog user belongs can be specified when that user is added. The microblog platform limits the length of the followed-account microblog messages that can be received at one time. If an account for a given field follows its target accounts without grouping them, or groups them but does not receive the microblog content of each group in a separate batch, the received content may exceed this limit and part of it cannot be received. This embodiment therefore first groups the target accounts followed by each microblog account and then receives the microblog content posted by the target accounts group by group, so that each reception contains content from the target accounts of only one group. This reduces the data loss that occurs when content from too many target accounts exceeds the platform's length limit, and improves the integrity of the acquired microblog data.
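The effect of grouping on the length limit can be illustrated with a small simulation. This is a sketch under stated assumptions, not the platform's behavior: the limit is modeled as a cap on the number of posts returned per request, and `make_fake_fetcher`, `split_into_groups` and `collect_by_group` are hypothetical helpers that do not correspond to any real microblog API.

```python
from typing import Callable, Dict, List


def split_into_groups(accounts: List[str], group_size: int) -> List[List[str]]:
    """Partition followed target accounts into groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [accounts[i:i + group_size] for i in range(0, len(accounts), group_size)]


def collect_by_group(groups: List[List[str]],
                     fetch_group: Callable[[List[str]], List[str]]) -> List[str]:
    """Receive content one group at a time and concatenate the batches."""
    collected: List[str] = []
    for group in groups:
        collected.extend(fetch_group(group))
    return collected


def make_fake_fetcher(posts_by_account: Dict[str, List[str]],
                      limit: int) -> Callable[[List[str]], List[str]]:
    """Simulate a platform that returns at most `limit` posts per request (assumption)."""
    def fetch(group: List[str]) -> List[str]:
        posts = [p for account in group for p in posts_by_account.get(account, [])]
        return posts[:limit]  # anything beyond the limit is lost
    return fetch
```

With three accounts of two posts each and a limit of four, a single ungrouped request returns four posts and loses two, while groups of two accounts return all six. Choosing the group size so that one group's content stays under the limit is the design decision the embodiment describes.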

In this embodiment, the number of microblog accounts is not limited; one or more accounts may be used. The specific number may be determined according to one or more of the following factors: the precision of the public opinion analysis, the target field of the public opinion analysis, the microblog platform's limit on the number of users, its limit on access frequency, and its limit on the number of followed accounts.

2. Data analysis:

S21: Process the natural-language text in the data acquired in step S12 into computer-processable structured data and filter out repeated content. The data are processed as follows:

S201: Filter out targeted dialogue interaction information to eliminate noise data as far as possible.

S22: Perform word-vector-based clustering analysis on the preprocessed data, clustering with an optimized mean-value method, and combine the data of each cluster into a document set.
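Step S22 can be sketched as follows, with the caveat that the source does not define its "optimized" clustering. The sketch substitutes plain k-means with deterministic farthest-point seeding, represents each document as the average of its word vectors, and uses a hypothetical embedding table passed in by the caller. All function names are illustrative.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def doc_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of the tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points: List[Vector], k: int) -> List[Vector]:
    """Farthest-point seeding: start at the first point, then add the farthest ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_documents(docs: List[List[str]], embeddings: Dict[str, Vector],
                      dim: int, k: int) -> Dict[int, List[List[str]]]:
    """Group documents into document sets, one per cluster."""
    points = [doc_vector(d, embeddings, dim) for d in docs]
    document_sets: Dict[int, List[List[str]]] = {}
    for doc, label in zip(docs, kmeans(points, k)):
        document_sets.setdefault(label, []).append(doc)
    return document_sets
```

In practice the embeddings would come from a trained word-vector model and the number of clusters `k` would have to be chosen or tuned; neither is specified in the source.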

3. Result output:

S31: Classify the preliminary microblog public opinion results of related popularity obtained in step S23 according to a set time period. The time period can be divided according to time units of different lengths, such as hours, days, weeks, months, quarters and years, or more than one of these time units can be used at the same time, so as to obtain related hot microblog public opinion results for the different time units.

S32: Send a public opinion early warning or a public opinion brief to the user or the requesting party, based on the related hot microblog public opinion results of the different time units.

S33: Collect the related hot microblog public opinion results by time unit, to obtain sets of related hot microblog public opinion results classified by time unit.

S34: Output the sets of related hot microblog public opinion results classified by time unit.
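The classification by time unit in steps S31 and S33 can be sketched as a bucketing function. This is a minimal illustration that assumes each result carries a timestamp, uses ISO week numbering for weeks, and numbers quarters by calendar year; the key formats and the names `period_key` and `classify_by_units` are hypothetical choices, not part of the source.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def period_key(ts: datetime, unit: str) -> str:
    """Map a timestamp to the label of the period it falls in for a given time unit."""
    if unit == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if unit == "day":
        return ts.strftime("%Y-%m-%d")
    if unit == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if unit == "month":
        return ts.strftime("%Y-%m")
    if unit == "quarter":
        return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"
    if unit == "year":
        return str(ts.year)
    raise ValueError(f"unsupported time unit: {unit}")


def classify_by_units(results: Iterable[Tuple[datetime, str]],
                      units: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build one result set per time unit, each keyed by period label."""
    results = list(results)
    result_sets: Dict[str, Dict[str, List[str]]] = {}
    for unit in units:
        buckets: Dict[str, List[str]] = defaultdict(list)
        for ts, item in results:
            buckets[period_key(ts, unit)].append(item)
        result_sets[unit] = dict(buckets)
    return result_sets
```

Passing several units at once, for example `["day", "quarter"]`, yields one result set per unit, which corresponds to using more than one time unit at the same time.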

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.

While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.

The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A microblog big data public opinion analysis method is characterized by comprising the following steps:

step one: collecting data;

step two: analyzing data;

step three: outputting the result;

2. The microblog big data public opinion analysis method according to claim 1, characterized in that step one (data acquisition) comprises the following specific steps:

3. The microblog big data public opinion analysis method according to claim 1, characterized in that step two (data analysis) comprises the following specific steps:

4. The microblog big data public opinion analysis method according to claim 3, wherein the filtering method is as follows:

5. The microblog big data public opinion analysis method according to claim 1, wherein the time periods are divided according to time units of different lengths, such as hours, days, weeks, months, quarters and years, and one or more time units are selected for processing the preliminary microblog public opinion results, to obtain related hot microblog public opinion results for the different time units.