CN113051455A - Water affair public opinion identification method based on network text data - Google Patents
Water affair public opinion identification method based on network text data
- Publication number
- CN113051455A (application CN202110346900.1A)
- Authority
- CN
- China
- Prior art keywords
- topic
- network text
- text data
- water
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a water affair public opinion identification method based on network text data, which comprises the following steps: 1. acquiring network text data related to water affairs; 2. preprocessing the network text data related to water affairs; 3. analyzing the network text data related to water affairs and finding the points of concern of water affair public opinion. The method determines the webpage search strategy according to the type of the website, so that text data related to water affairs can be quickly acquired from massive network data, and it finds the points of concern of water affair public opinion by means of topic analysis. Water affair public opinion is thereby identified with improved efficiency and accuracy, and the result has good interpretability.
Description
Technical Field
The invention relates to the technical field of data mining, in particular to a water affair public opinion identification method based on network text data.
Background
With the rapid development of the internet and continual changes in people's lifestyles, network data related to every industry is growing explosively. Most of this data is produced by people and reflects the direction of public opinion, so enterprises attach great importance to it. The water affairs industry likewise identifies water affair public opinion by acquiring water-related network texts from massive network data and finding the points of concern of public opinion in those texts.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a water affair public opinion identification method based on network text data, so as to solve the problem that water affair public opinion is difficult to identify: text data related to water affairs can be quickly acquired from massive network data, the points of concern in water affair network texts can be accurately analyzed, and the efficiency and accuracy of water affair public opinion identification are improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a water affair public opinion identification method based on network text data, which is characterized by comprising the following steps:
step 1, acquiring network text data related to water affairs;
step 1.1, adopting different webpage searching strategies to collect the network text data of the target webpage according to the type of the website:
if the website type is the official website of a water supply group, adopting a breadth-first strategy;
if the website type is a government portal website, adopting a depth-first strategy;
if the website type is a network community or forum website, acquiring the type of the user publishing topics, comments or messages related to water affairs according to the network text data of the target webpage, and determining the webpage search strategy accordingly;
if the type of the user is an official user, adopting a depth-first strategy;
if the type of the user is a personal user, adopting a combination of the depth-first strategy and the breadth-first strategy;
step 1.2, acquiring the levels of all participating users of the first topic to which the network text data belongs according to the network text data of the target webpage;
if the level of a participating user of the first topic meets the preset level requirement, collecting all network text data published by that participating user under the first topic;
step 1.3, acquiring the topic participation counts of all participating users of the second topic to which the network text data belongs according to the network text data of the target webpage;
if the topic participation count of a participating user of the second topic exceeds the preset participation count threshold, acquiring the network text data published by that participating user within the life cycle of the second topic;
step 1.4, according to the network text data published by those participating users within the life cycle of the second topic, acquiring all participating users, and their levels, of the third topic to which the published network text data belongs;
if the level of a participating user of the third topic meets the preset level requirement, collecting all network text data published by that participating user under the third topic;
step 2, preprocessing the network text data related to the water affairs;
step 2.1, performing word segmentation on the network text data related to the water affairs so as to convert the text into word vectors;
step 2.2, constructing a stop word list for the network text data related to the water affairs, and performing stop word removal on the word vectors to obtain word vectors without stop words;
step 3, analyzing the network text data related to the water affairs and finding the points of concern of water affair public opinion;
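step 3.1, constructing a corpus by utilizing the preprocessed network text data; assuming that M water affair network texts exist in the corpus, all word vectors and corresponding topics in the corpus are expressed as $(\vec{w}_1, \vec{z}_1), (\vec{w}_2, \vec{z}_2), \ldots, (\vec{w}_M, \vec{z}_M)$, wherein $\vec{w}_m$ represents the word vector in the mth water affair network text, and $\vec{z}_m$ represents the topic numbers corresponding to the word vector $\vec{w}_m$;
step 3.2, calculating the topic generation probability of the water affair network texts in the corpus:
step 3.2.1, obtaining the topic generation probability $p(\vec{z}_m \mid \vec{\alpha})$ of the mth water affair network text by using formula (1):
$$p(\vec{z}_m \mid \vec{\alpha}) = \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{1}$$
in formula (1), $\vec{n}_m$ represents the word count vector of the mth water affair network text tallied by topic, and $\vec{n}_m = (n_m^{(1)}, \ldots, n_m^{(K)})$, where $n_m^{(k)}$ represents the number of words generated by the kth topic in the mth water affair network text, $\vec{\alpha}$ is a hyperparameter, and $\Delta(\cdot)$ represents a normalization function;
step 3.2.2, obtaining the topic generation probability $p(\vec{z} \mid \vec{\alpha})$ of the water affair network texts in the corpus by using formula (2):
$$p(\vec{z} \mid \vec{\alpha}) = \prod_{m=1}^{M} p(\vec{z}_m \mid \vec{\alpha}) = \prod_{m=1}^{M} \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{2}$$
step 3.3, calculating the word generation probability of the water affair network texts in the corpus:
step 3.3.1, obtaining the word generation probability $p(\vec{w}_{(k)} \mid \vec{\beta})$ of the kth topic by using formula (3):
$$p(\vec{w}_{(k)} \mid \vec{\beta}) = \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \tag{3}$$
in formula (3), $\vec{w}_{(k)}$ represents the word vector generated by the kth topic, $\vec{n}_k$ represents the count vector of words generated by the kth topic, and $\vec{n}_k = (n_k^{(1)}, \ldots, n_k^{(V)})$, where $n_k^{(t)}$ indicates the number of occurrences of the t-th word generated by the kth topic, and $\vec{\beta}$ is a hyperparameter;
step 3.3.2, obtaining the word generation probability $p(\vec{w} \mid \vec{z}, \vec{\beta})$ of the water affair network texts in the corpus by using formula (4):
$$p(\vec{w} \mid \vec{z}, \vec{\beta}) = \prod_{k=1}^{K} p(\vec{w}_{(k)} \mid \vec{\beta}) = \prod_{k=1}^{K} \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \tag{4}$$
step 3.4, calculating the joint probability $p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta})$ of generating the water affair network texts in the corpus by using formula (5):
$$p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta}) = p(\vec{w} \mid \vec{z}, \vec{\beta})\, p(\vec{z} \mid \vec{\alpha}) = \prod_{k=1}^{K} \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \prod_{m=1}^{M} \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{5}$$
step 3.5, updating the topic of each word in the corpus by using formula (6):
$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K} \left(n_{m,\neg i}^{(k)} + \alpha_k\right)} \cdot \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V} \left(n_{k,\neg i}^{(t)} + \beta_t\right)} \tag{6}$$
in formula (6), $z_i$ indicates the topic corresponding to the ith word, k indicates the topic number, $\vec{z}_{\neg i}$ indicates the topics of the remaining words after the ith word is excluded, $\vec{w}$ represents the word vector, $n_{m,\neg i}^{(k)}$ indicates the number of words corresponding to the kth topic in the mth water affair network text after the ith word is excluded, $\alpha_k$ is the kth dimension of the hyperparameter $\vec{\alpha}$, $\beta_t$ is the t-th dimension of the hyperparameter $\vec{\beta}$, $n_{k,\neg i}^{(t)}$ indicates the number of occurrences of the t-th word generated by the kth topic after the ith word is excluded, and V indicates the number of distinct words (the vocabulary size) of the whole water affair network text corpus;
step 3.6, calculating the word distribution $\varphi_{k,t}$ of the t-th word under the kth topic by using formula (7):
$$\varphi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{t=1}^{V} \left(n_k^{(t)} + \beta_t\right)} \tag{7}$$
step 3.7, calculating the kth topic distribution $\theta_{m,k}$ of the mth water affair network text by using formula (8):
$$\theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k=1}^{K} \left(n_m^{(k)} + \alpha_k\right)} \tag{8}$$
step 3.8, according to the word distribution $\vec{\varphi}_k$ under the kth topic, selecting the first N words from the current kth topic as keywords of the kth topic, and describing and analyzing the kth topic, in line with the actual meaning of the water affair public opinion, according to the semantics of the keywords, so that the points of concern of social public opinion and mainstream media regarding water affairs are found and the water affair public opinion is identified.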
Compared with the prior art, the invention has the beneficial effects that:
1. The invention finds the points of concern of water affair public opinion by combining water affair network text data with topic analysis, thereby identifying water affair public opinion and improving the efficiency and accuracy of the identification.
2. The invention provides a webpage searching strategy determined according to the type of the website, and the webpage searching strategy is adopted to collect the water service network text data of the target webpage, so that the text data related to the water service can be quickly acquired from massive network data, and the acquisition efficiency of the water service network text data is improved.
3. The text data analysis method using LDA topic modeling is suitable for processing large-scale text data sets, realizes the discovery of the water affair public opinion concern and completes the water affair public opinion identification through the analysis of a large amount of water affair network text data, and the result has good interpretability.
Drawings
FIG. 1 is a flow chart of a water service web text data acquisition method of the present invention;
FIG. 2 is a schematic diagram of a structure of a web page network node according to an embodiment of the present invention;
FIG. 3 is a flow chart of the water affair public opinion identification of the present invention;
FIG. 4 is a topology diagram of the topic model of the present invention;
FIG. 5 is a schematic probabilistic graphical (directed graph) representation of the topic model.
Detailed Description
In this embodiment, as shown in fig. 3, a water affair public opinion identification method based on network text data is carried out according to the following steps.
In practical applications, the water affair network text data may be messages on water supply group and government portal websites, topics in network communities, news reports on media platforms, and other related data. For convenience, the following description takes message data from a government portal website as the example of water affair network text data.
Step 1, as shown in fig. 1, acquiring network text data related to water affairs;
S10, comparing the web address of the target webpage with the reference web addresses to determine the type of the website;
S20, determining a webpage search strategy according to the type of the website;
S30, acquiring the webpage data of the target webpage according to the webpage search strategy.
Step 1.1, adopting different webpage searching strategies to collect the network text data of the target webpage according to the type of the website:
In the method, the webpage search strategy may be a breadth-first strategy, a depth-first strategy, or a combination of the two. In the following, a strategy described as using both the depth-first strategy and the breadth-first strategy refers to a combination of the two.
Fig. 2 shows the webpage node structure of a website according to an embodiment. The first-layer webpage node is A (the root node), the second-layer nodes are B, C and D, and the third-layer nodes are E, F, G, H and I. If the webpage search strategy is determined to be the breadth-first strategy, the traversal crawling path is A -> B -> C -> D -> E -> F -> G -> H -> I; if it is determined to be the depth-first strategy, the traversal crawling path is A -> B -> E -> F -> C -> G -> D -> H -> I.
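The two traversal orders can be reproduced with a short sketch. The parent-child links below are an assumption inferred from the depth-first path in the text (E and F under B, G under C, H and I under D); the names `SITE_TREE`, `breadth_first` and `depth_first` are hypothetical.

```python
from collections import deque

# Assumed layout of the Fig. 2 node structure, inferred from the stated paths.
SITE_TREE = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["G"],
    "D": ["H", "I"],
    "E": [], "F": [], "G": [], "H": [], "I": [],
}


def breadth_first(tree, root):
    """Visit pages layer by layer, using a FIFO queue."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in tree.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def depth_first(tree, root):
    """Follow each branch to its end before backtracking, using a LIFO stack."""
    order, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order


print(" -> ".join(breadth_first(SITE_TREE, "A")))
print(" -> ".join(depth_first(SITE_TREE, "A")))
```

In a real crawler the dictionary lookup would be replaced by fetching a page and extracting its links; the `seen` set is what keeps the crawler from revisiting pages that link to each other.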
Before the webpage search strategy is determined according to the type of the website in step S20, each website type may be bound or mapped to a webpage search strategy in advance; the type of the website is then determined by comparing the web address of the target webpage with preset reference web addresses, and the webpage search strategy follows from it. Step S20 may include: adding a website type tag to each reference web address in a pre-established database, comparing the web address of the target webpage with the reference web addresses, determining the reference web address with the highest similarity to that of the target webpage, reading its website type tag, and determining the website type of the target webpage from the tag. The website types may include official websites of water supply groups, network community websites and government information portal websites. Collecting webpage data with a different webpage search strategy for each website type improves the efficiency and accuracy of acquisition.
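A minimal sketch of this lookup is given below. The text does not say how similarity between web addresses is measured, so the character-level ratio from Python's `difflib.SequenceMatcher` is used purely as an assumption; the reference addresses, type tags and function names are hypothetical placeholders.

```python
from difflib import SequenceMatcher

# Hypothetical reference web addresses, each tagged with a website type.
REFERENCE_URLS = {
    "https://www.water-group.example/": "water_supply_group",
    "https://www.gov-portal.example/": "government_portal",
    "https://bbs.community.example/": "community_forum",
}

# Pre-established binding of website type to search strategy.
STRATEGY_BY_TYPE = {
    "water_supply_group": "breadth_first",
    "government_portal": "depth_first",
    "community_forum": "decide_by_user_type",
}


def classify_url(url):
    """Return the type tag of the reference address most similar to `url`."""
    best = max(REFERENCE_URLS,
               key=lambda ref: SequenceMatcher(None, url, ref).ratio())
    return REFERENCE_URLS[best]


def choose_strategy(url, user_type=None):
    """Map a target address (and, for forums, the posting user type) to a strategy."""
    strategy = STRATEGY_BY_TYPE[classify_url(url)]
    if strategy == "decide_by_user_type":
        # Official users: depth-first; personal users: both strategies combined.
        return "depth_first" if user_type == "official" else "depth_first+breadth_first"
    return strategy


print(choose_strategy("https://www.gov-portal.example/messages/list?page=2"))
print(choose_strategy("https://bbs.community.example/thread/42", user_type="personal"))
```

Keeping the binding in a plain table means a new website type can be supported by adding a reference address and one table entry, without touching the crawling code.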
If the website type is the official website of a water supply group, adopting a breadth-first strategy;
if the website type is a government portal website, adopting a depth-first strategy;
in one embodiment, the web address of the target webpage is compared with the reference web addresses, the reference web address most similar to it is determined, and its website type tag is read, which shows that the target webpage belongs to a government portal website. Such a website is the platform on which the public most directly leaves feedback messages, including messages related to keywords such as water supply, water consumption, water fee and water quality, so the webpage search strategy is determined to be the depth-first strategy according to the pre-established binding. The water affair network text data is then the water-related feedback data in the message board of the government portal website.
If the website type is a network community or forum website, acquiring the type of the user publishing topics, comments or messages related to water affairs according to the network text data of the target webpage (generally the community or forum home page), thereby determining the webpage search strategy;
if the type of the user is an official user of a water supply group or news media, adopting a depth-first strategy;
if the type of the user is a personal user, adopting a combination of the depth-first strategy and the breadth-first strategy;
step 1.2, acquiring the levels of all participating users (namely first participating users) of the first topic (a topic or keyword defined as needed) to which the network text data belongs according to the network text data of the target webpage;
if the level of a first participating user meets the preset level requirement, collecting all water-related network text data published by that first participating user under the first topic;
step 1.3, acquiring the topic participation counts of all participating users (namely second participating users) of the second topic (a topic or keyword defined as needed) to which the network text data belongs according to the network text data of the target webpage;
if the topic participation count of a second participating user exceeds the preset participation count threshold, acquiring the network text data published by that second participating user within the life cycle (a time period defined as needed) of the second topic;
step 1.4, according to the network text data published by the second participating users within the life cycle of the second topic, acquiring the third topic (a topic or keyword defined as needed) to which the published network text data belongs, together with all participating users of the third topic (namely third participating users) and their levels;
if the level of a third participating user meets the preset level requirement, collecting all network text data published by that third participating user under the third topic;
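Steps 1.2 to 1.4 can be read as three filters applied to scraped posts. The sketch below is one possible reading, not the patented implementation: the record fields, the rule that a level "meets the requirement" when it is at least `min_level`, and the use of integer day numbers for the topic life cycle are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: one dict per post scraped from a forum page.
POSTS = [
    {"user": "u1", "level": 5, "topic": "water quality", "day": 1, "text": "a"},
    {"user": "u2", "level": 1, "topic": "water quality", "day": 2, "text": "b"},
    {"user": "u2", "level": 1, "topic": "water fee", "day": 3, "text": "c"},
    {"user": "u2", "level": 1, "topic": "water fee", "day": 4, "text": "d"},
    {"user": "u2", "level": 1, "topic": "water supply", "day": 5, "text": "e"},
    {"user": "u3", "level": 4, "topic": "water supply", "day": 6, "text": "f"},
    {"user": "u2", "level": 1, "topic": "water fee", "day": 20, "text": "g"},
]


def posts_by_level(posts, topic, min_level):
    """Steps 1.2 and 1.4: posts under `topic` from users whose level qualifies."""
    return [p for p in posts if p["topic"] == topic and p["level"] >= min_level]


def posts_by_active_users(posts, topic, min_count, start_day, end_day):
    """Step 1.3: everything posted within the life cycle of `topic` by users
    who took part in that topic more than `min_count` times."""
    counts = Counter(p["user"] for p in posts if p["topic"] == topic)
    active = {user for user, n in counts.items() if n > min_count}
    return [p for p in posts
            if p["user"] in active and start_day <= p["day"] <= end_day]


first = posts_by_level(POSTS, "water quality", min_level=3)
second = posts_by_active_users(POSTS, "water fee", min_count=2, start_day=1, end_day=10)
# Step 1.4: the third topics are the other topics the active users posted under.
third_topics = sorted({p["topic"] for p in second} - {"water fee"})
third = [p for t in third_topics for p in posts_by_level(POSTS, t, min_level=3)]

print([p["text"] for p in first], [p["text"] for p in second], third_topics)
```

The effect of step 1.4 is to let highly active users of one topic lead the crawler to related topics, where the level filter is applied again.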
In step S30 in fig. 1, when acquiring the water affair network text data of the target webpage according to the webpage search strategy, a data acquisition technique needs to be determined. The text data may be acquired by using the Beautiful Soup library of python together with regular expression matching, or a distributed parallel automatic acquisition technique may be adopted. Firstly, an initial url queue is constructed and the html content of each webpage is obtained through requests.get(url); the content is then parsed with Beautiful Soup, bSoup = BeautifulSoup(responseHtml.text, 'html.parser'); all required url addresses in the page are obtained through the find_all method, bSoup.find_all('a', href=re.compile(regex)), which returns the urls of the specified form; these urls are then added to the queue one by one.
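The queue-and-extract loop can be sketched as follows. So that the example runs without third-party packages or network access, the standard-library `html.parser.HTMLParser` is swapped in for Beautiful Soup and a fixed HTML string stands in for the response of requests.get(url); the page content, the address `portal.example` and the link pattern are hypothetical.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that match a regular expression."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern.search(href):
            self.links.append(href)


def extract_links(html, base_url, pattern):
    """Return absolute urls of all matching links found in `html`."""
    collector = LinkCollector(pattern)
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.links]


# Stand-in for the html content of one fetched page.
SAMPLE_HTML = """
<ul>
  <li><a href="/message/101.html">Water pressure is low</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/message/102.html">Question about the water fee</a></li>
</ul>
"""

queue = deque(["https://portal.example/message/index.html"])
seen = set(queue)
page_url = queue.popleft()
for link in extract_links(SAMPLE_HTML, page_url, r"^/message/\d+\.html$"):
    if link not in seen:  # add each new url to the queue exactly once
        seen.add(link)
        queue.append(link)

print(list(queue))
```

With Beautiful Soup installed, the `LinkCollector` class corresponds to the single find_all call quoted in the text.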
After the search strategy and the acquisition technique are determined, the water affair network text data can be crawled. Crawling mainly comprises three parts: splicing the target address, acquiring the HTML or json code, and parsing that code to obtain the text data. On the basis of a successful login, some Chinese keywords are spliced with a known address after being MD5-encoded, while others are spliced according to the page number or text number together with the keyword encoding and the address, which yields the target URL address; a browser is then simulated to access the webpage and acquire the HTML or json code; finally, the HTML or json code is parsed and the required text data related to water affairs is extracted from it.
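The address-splicing step might look like the sketch below. The text does not give the actual query format of any site, so the base address and the parameter names `kw` and `page` are hypothetical; only the idea of MD5-encoding a keyword and splicing it with a page number comes from the description.

```python
import hashlib
from urllib.parse import urlencode


def md5_hex(keyword):
    """MD5-encode a keyword (UTF-8 bytes) as a 32-character hex string."""
    return hashlib.md5(keyword.encode("utf-8")).hexdigest()


def build_target_url(base, keyword, page):
    """Splice the encoded keyword and a page number onto a known address."""
    query = urlencode({"kw": md5_hex(keyword), "page": page})
    return f"{base}?{query}"


# One target address per result page for the keyword "water fee" (水费).
urls = [build_target_url("https://portal.example/search", "水费", page)
        for page in range(1, 4)]
for url in urls:
    print(url)
```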
Finally, the acquired water affair network text data is stored: the list of text data obtained in the previous step is written in a loop to a MySQL database or a txt file, so that it can later be analyzed and mined.
In the data acquisition process, python can be used as the development language, PyCharm as the development environment, and MySQL or local files as data storage. Combining web crawler strategies with python web crawler techniques, code is written to acquire each kind of water affair network text data; the data can then be acquired by running the program in real time, which improves acquisition efficiency.
In one embodiment, the water affair network text data is stored in a local file, and the text data is then preprocessed and mined, including word segmentation, stop word removal and topic analysis.
Step 2, preprocessing the network text data related to the water affairs;
step 2.1, performing word segmentation processing on the network text data related to the water affairs so as to convert the text into a word set;
in one embodiment, the water affair text corpus formed from the acquired water-related message text data of the government portal website is segmented into words by using the jieba word segmentation toolkit in python.
step 2.2, constructing a stop word list for the network text data related to the water affairs, and performing stop word removal on the word set to obtain a word set without stop words;
in one embodiment, stop word removal is performed on the segmented water affair text corpus. Punctuation marks, special characters, modal particles and stock phrases, such as 'hello' and 'leadership', are added to the stop word list. The words in the water affair text are matched against the stop word list by string matching, and the matching words are removed from the text, which reduces noise and the influence of irrelevant words on topic description.
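Stop word removal by exact string matching can be sketched in a few lines. The token list below stands in for the output of the word segmentation step, and the stop word list is a small hypothetical sample built from the examples in the text.

```python
# Hypothetical stop word list: punctuation plus the stock phrases named above.
STOP_WORDS = {"hello", "leadership", ",", ".", "!", "?"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop empty tokens and tokens that exactly match a stop word."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Stand-in for one segmented message.
segmented = ["hello", "leadership", ",", "water", "pressure", "low", "."]
print(remove_stop_words(segmented))
```

Storing the stop words in a set keeps each membership check constant-time, which matters once the corpus holds many messages.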
Step 3, analyzing the network text data related to the water affairs and finding out the public opinion of the water affairs;
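step 3.1, constructing a corpus by utilizing the preprocessed network text data; assuming that M water affair network texts exist in the corpus, all word vectors and corresponding topics in the corpus are expressed as $(\vec{w}_1, \vec{z}_1), (\vec{w}_2, \vec{z}_2), \ldots, (\vec{w}_M, \vec{z}_M)$, wherein $\vec{w}_m$ represents the word vector in the mth water affair network text, and $\vec{z}_m$ represents the topic numbers corresponding to the word vector $\vec{w}_m$;
step 3.2, calculating the topic generation probability of the water affair network texts in the corpus:
step 3.2.1, obtaining the topic generation probability $p(\vec{z}_m \mid \vec{\alpha})$ of the mth water affair network text by using formula (1):
$$p(\vec{z}_m \mid \vec{\alpha}) = \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{1}$$
in formula (1), $\vec{n}_m$ represents the word count vector of the mth water affair network text tallied by topic, and $\vec{n}_m = (n_m^{(1)}, \ldots, n_m^{(K)})$, where $n_m^{(k)}$ represents the number of words generated by the kth topic in the mth water affair network text, $\vec{\alpha}$ is a hyperparameter, and $\Delta(\cdot)$ represents a normalization function: for a K-dimensional vector $\vec{x}$, $\Delta(\vec{x}) = \frac{\prod_{k=1}^{K} \Gamma(x_k)}{\Gamma\left(\sum_{k=1}^{K} x_k\right)}$, where $\Gamma(x)$ is the gamma function;
step 3.2.2, obtaining the topic generation probability $p(\vec{z} \mid \vec{\alpha})$ of the water affair network texts in the corpus by using formula (2):
$$p(\vec{z} \mid \vec{\alpha}) = \prod_{m=1}^{M} p(\vec{z}_m \mid \vec{\alpha}) = \prod_{m=1}^{M} \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{2}$$
step 3.3, calculating the word generation probability of the water affair network texts in the corpus:
step 3.3.1, obtaining the word generation probability $p(\vec{w}_{(k)} \mid \vec{\beta})$ of the kth topic by using formula (3):
$$p(\vec{w}_{(k)} \mid \vec{\beta}) = \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \tag{3}$$
in formula (3), $\vec{w}_{(k)}$ represents the word vector generated by the kth topic, $\vec{n}_k$ represents the count vector of words generated by the kth topic, and $\vec{n}_k = (n_k^{(1)}, \ldots, n_k^{(V)})$, where $n_k^{(t)}$ indicates the number of occurrences of the t-th word generated by the kth topic, and $\vec{\beta}$ is a hyperparameter;
step 3.3.2, obtaining the word generation probability $p(\vec{w} \mid \vec{z}, \vec{\beta})$ of the water affair network texts in the corpus by using formula (4):
$$p(\vec{w} \mid \vec{z}, \vec{\beta}) = \prod_{k=1}^{K} p(\vec{w}_{(k)} \mid \vec{\beta}) = \prod_{k=1}^{K} \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \tag{4}$$
step 3.4, calculating the joint probability $p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta})$ of generating the water affair network texts in the corpus by using formula (5):
$$p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta}) = p(\vec{w} \mid \vec{z}, \vec{\beta})\, p(\vec{z} \mid \vec{\alpha}) = \prod_{k=1}^{K} \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})} \prod_{m=1}^{M} \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{5}$$
step 3.5, updating the topic of each word in the corpus by using formula (6):
$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K} \left(n_{m,\neg i}^{(k)} + \alpha_k\right)} \cdot \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V} \left(n_{k,\neg i}^{(t)} + \beta_t\right)} \tag{6}$$
in formula (6), $z_i$ indicates the topic corresponding to the ith word, k indicates the topic number, $\vec{z}_{\neg i}$ indicates the topics of the remaining words after the ith word is excluded, $\vec{w}$ represents the word vector, $n_{m,\neg i}^{(k)}$ indicates the number of words corresponding to the kth topic in the mth water affair network text after the ith word is excluded, $\alpha_k$ is the kth dimension of the hyperparameter $\vec{\alpha}$, $\beta_t$ is the t-th dimension of the hyperparameter $\vec{\beta}$, $n_{k,\neg i}^{(t)}$ indicates the number of occurrences of the t-th word generated by the kth topic after the ith word is excluded, and V indicates the number of distinct words (the vocabulary size) of the whole water affair network text corpus;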
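As a numerical check on steps 3.2 to 3.4, the normalization function and the joint probability can be evaluated in log space. The sketch assumes that Δ is the usual Dirichlet normalizing constant, that is, the product of Γ(x_k) divided by Γ of the sum of the x_k; the function names and the tiny count tables are hypothetical.

```python
from math import exp, lgamma


def log_delta(x):
    """log Δ(x) = Σ_k log Γ(x_k) − log Γ(Σ_k x_k)."""
    return sum(lgamma(v) for v in x) - lgamma(sum(x))


def log_ratio(counts, prior):
    """log[Δ(n + prior) / Δ(prior)], the form shared by formulas (1) and (3)."""
    return log_delta([n + p for n, p in zip(counts, prior)]) - log_delta(prior)


def log_joint(doc_topic_counts, topic_word_counts, alpha, beta):
    """Log of the joint probability: topic-side terms plus word-side terms."""
    topic_side = sum(log_ratio(n_m, alpha) for n_m in doc_topic_counts)
    word_side = sum(log_ratio(n_k, beta) for n_k in topic_word_counts)
    return topic_side + word_side


# One text holding a single word, assigned to the first of K = 2 topics,
# with a vocabulary of V = 2 words and all hyperparameters equal to 1.
alpha, beta = [1.0, 1.0], [1.0, 1.0]
doc_topic_counts = [[1, 0]]          # n_m for the only text
topic_word_counts = [[1, 0], [0, 0]]  # n_k for each topic

print(exp(log_ratio(doc_topic_counts[0], alpha)))
print(exp(log_joint(doc_topic_counts, topic_word_counts, alpha, beta)))
```

Working with `lgamma` rather than the gamma function itself avoids overflow, since Γ grows factorially with the word counts of a real corpus.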
in one embodiment, topic analysis may be performed on the preprocessed water affair text corpus by using the LDA topic modeling method. Fig. 4 shows the topology of the topic model of an embodiment, where C1 is the document layer, C2 is the topic layer and C3 is the word layer; fig. 5 is a schematic probabilistic graphical (directed graph) representation of the topic model of an embodiment. The topic modeling method includes inference of the topics, in which the topic of each word is updated by using formula (6).
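The per-word topic update can be illustrated with a small collapsed Gibbs sampler in the standard form for LDA: each topic k is weighted by (text-topic count + α) × (topic-word count + β) / (topic total + V·β), with the current word's own assignment removed first. This is a sketch under stated assumptions, not the patented implementation: scalar symmetric `alpha` and `beta` replace the vector hyperparameters, words are integer ids, and the function name is hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha, beta, sweeps, seed=0):
    """docs: list of texts, each a list of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    n_mk = [[0] * num_topics for _ in docs]               # topic counts per text
    n_kt = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total words per topic
    z = []
    for m, doc in enumerate(docs):                        # random initial topics
        z_m = []
        for t in doc:
            k = rng.randrange(num_topics)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)
    for _ in range(sweeps):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                k_old = z[m][i]
                # Exclude the ith word from all counts.
                n_mk[m][k_old] -= 1
                n_kt[k_old][t] -= 1
                n_k[k_old] -= 1
                # The text-side denominator is the same for every k, so it is
                # dropped; rng.choices normalizes the weights itself.
                weights = [
                    (n_mk[m][k] + alpha) * (n_kt[k][t] + beta)
                    / (n_k[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                z[m][i] = k_new
                n_mk[m][k_new] += 1
                n_kt[k_new][t] += 1
                n_k[k_new] += 1
    return z, n_mk, n_kt


docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
z, n_mk, n_kt = gibbs_lda(docs, num_topics=2, vocab_size=4,
                          alpha=0.1, beta=0.01, sweeps=20)
print(n_mk)
```

The count tables returned at the end are exactly the quantities needed to estimate the word distribution of each topic and the topic distribution of each text.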
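step 3.6, calculating the word distribution $\varphi_{k,t}$ of the t-th word under the kth topic by using formula (7):
$$\varphi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{t=1}^{V} \left(n_k^{(t)} + \beta_t\right)} \tag{7}$$
step 3.7, calculating the kth topic distribution $\theta_{m,k}$ of the mth water affair network text by using formula (8):
$$\theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k=1}^{K} \left(n_m^{(k)} + \alpha_k\right)} \tag{8}$$
step 3.8, according to the word distribution $\vec{\varphi}_k$ under the kth topic, selecting the first N words from the current kth topic as keywords of the kth topic, and describing and analyzing the kth topic, in line with the actual meaning of the water affair public opinion, according to the semantics of the keywords, so that the points of concern of social public opinion and mainstream media regarding water affairs are found and the water affair public opinion is identified.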
The table shows the topic vocabulary obtained by topic analysis of one embodiment:
for the kth topic, the top N words are selected from the current topic as its keywords and main descriptive words; the actual meaning of the current topic in public opinion is then interpreted according to the semantics of the descriptive words and the word distribution $\varphi_{k,t}$, and the points of concern of public opinion are mined by combining multiple topics.
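Given the count tables left by topic inference, the word distribution under each topic, the topic distribution of each text and the top N keywords follow from smoothed relative frequencies. The sketch below assumes scalar symmetric hyperparameters; the vocabulary and the counts are made-up illustrative values, not results from the patent.

```python
def topic_word_distribution(n_kt, beta):
    """phi[k][t]: smoothed share of word t among the words of topic k."""
    return [[(c + beta) / (sum(row) + len(row) * beta) for c in row]
            for row in n_kt]


def doc_topic_distribution(n_mk, alpha):
    """theta[m][k]: smoothed share of topic k among the words of text m."""
    return [[(c + alpha) / (sum(row) + len(row) * alpha) for c in row]
            for row in n_mk]


def top_keywords(phi_k, vocab, n):
    """The n words with the highest probability under one topic."""
    ranked = sorted(range(len(vocab)), key=lambda t: phi_k[t], reverse=True)
    return [vocab[t] for t in ranked[:n]]


# Hypothetical vocabulary and count tables.
vocab = ["water quality", "water fee", "pipe burst", "billing"]
n_kt = [[8, 1, 6, 0],   # word counts under topic 0
        [0, 7, 1, 9]]   # word counts under topic 1
n_mk = [[10, 2], [1, 9]]

phi = topic_word_distribution(n_kt, beta=0.01)
theta = doc_topic_distribution(n_mk, alpha=0.1)
for k, phi_k in enumerate(phi):
    print(k, top_keywords(phi_k, vocab, n=2))
```

Here topic 0 would be read as a concern about water quality and pipe failures and topic 1 as a concern about charges, which is the kind of interpretation step 3.8 describes.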
Claims (1)
1. A water affair public opinion identification method based on network text data is characterized by comprising the following steps:
step 1, acquiring network text data related to water affairs;
step 1.1, adopting different webpage searching strategies to collect the network text data of the target webpage according to the type of the website:
if the website type is the official website of a water supply group, adopting a breadth-first strategy;
if the website type is a government portal website, adopting a depth-first strategy;
if the website type is a network community or forum website, acquiring the type of the user publishing topics, comments or messages related to water affairs according to the network text data of the target webpage, and determining the webpage search strategy accordingly;
if the type of the user is an official user, adopting a depth-first strategy;
if the type of the user is a personal user, adopting a combination of the depth-first strategy and the breadth-first strategy;
step 1.2, acquiring the levels of all participating users of the first topic to which the network text data belongs according to the network text data of the target webpage;
if the level of a participating user of the first topic meets the preset level requirement, collecting all network text data published by that participating user under the first topic;
step 1.3, acquiring the topic participation counts of all participating users of the second topic to which the network text data belongs according to the network text data of the target webpage;
if the topic participation count of a participating user of the second topic exceeds the preset participation count threshold, acquiring the network text data published by that participating user within the life cycle of the second topic;
step 1.4, according to the network text data published by those participating users within the life cycle of the second topic, acquiring all participating users, and their levels, of the third topic to which the published network text data belongs;
if the level of a participating user of the third topic meets the preset level requirement, collecting all network text data published by that participating user under the third topic;
step 2, preprocessing the network text data related to the water affairs;
step 2.1, performing word segmentation on the network text data related to the water affairs so as to convert the text into word vectors;
step 2.2, constructing a stop word list for the network text data related to the water affairs, and performing stop word removal on the word vectors to obtain word vectors without stop words;
step 3, analyzing the network text data related to the water affairs and finding the points of concern of water affair public opinion;
step 3.1, constructing a corpus by utilizing the preprocessed web text data, and assuming that M pieces of water affair web texts exist in the corpus, expressing all word vectors and corresponding topics in the corpus as Wherein the content of the first and second substances,representing the word vector in the mth water affairs web text,means direction of wordsMeasurement ofA corresponding topic number;
step 3.2, calculating the topic generation probability of the water affair network text in the corpus:
step 3.2.1, obtaining the theme generation probability of the mth water affair network text by using the formula (1)
In the formula (1), the reaction mixture is,a word number vector representing the m-th water affair network text according to the subject statistics, an Representing the number of words generated by the kth topic in the mth water affair network text,for the hyperparameter, Δ (·) represents a normalization function;
step 3.2.2, obtaining the theme generation probability of the water affair network text in the corpus by using the formula (2)
Step 3.3, calculating the word generation probability of the water affair network text in the corpus:
In the formula (3), the reaction mixture is,representing the word vector produced by the k-th topic,represents a number vector of words generated by the kth topic, and indicates the number of the t-th word generated by the k-th topic,is a hyper-parameter;
step 3.3.2, obtaining the generation probability of the water affair network text words in the corpus by using the formula (4)
Step 3.4, calculating joint probability generated by the water affair network text in the corpus by using the formula (5)
And 3.5, updating the theme of each word in the corpus by using the formula (6):
in the formula (6), ziIndicates the subject corresponding to the ith word, k indicates the subject number,indicating that the subject excluding the ith word, the remaining words,a vector of words is represented that is,indicates the number of words, alpha, corresponding to the kth topic in the mth water affair network text after the ith word is eliminatedkIs a hyperparameterThe k dimension of (b), betatIs a hyperparameterThe (d) th dimension of (a),indicating the number of t words generated by the kth subject excluding the ith word, and V indicating the length of the whole water affair network text corpus;
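Step 3.7, calculating the k-th topic distribution $\theta_{m,k}$ of the m-th water affair network text by using formula (8):
$$\theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k'=1}^{K} \left(n_m^{(k')} + \alpha_{k'}\right)} \tag{8}$$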
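The topic distribution of step 3.7 is a smoothed, normalized count. A short Python sketch follows, assuming the usual estimate in which each topic's word count in the text is increased by its hyperparameter before normalizing. The function name and the toy numbers are hypothetical.

```python
def topic_distribution(n_m, alpha):
    """Smoothed share of each topic in one text: (n + alpha) / sum(n + alpha)."""
    total = sum(n_m) + sum(alpha)
    return [(n + a) / total for n, a in zip(n_m, alpha)]

# A text with 3 words on topic 0 and 1 word on topic 1, alpha = (0.5, 0.5):
# the shares are 3.5/5 and 1.5/5.
theta_m = topic_distribution([3, 1], [0.5, 0.5])
```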
Step 3.8, according to the word distribution $\vec{\varphi}_k$ under the k-th topic, selecting the first N words of the k-th topic as the keywords of the k-th topic, and describing and analyzing the k-th topic, in terms of its actual meaning for water affair public opinion, according to the semantics of the keywords, thereby finding the water affair issues on which public opinion and the mainstream media focus, and thus identifying the water affair public opinion.
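Keyword selection in step 3.8 can be sketched as follows. The sketch assumes that the word distribution under a topic is the smoothed, normalized topic-word count, and that "the first N words" means the N words with the highest probability. The function name, vocabulary and counts are hypothetical.

```python
def top_keywords(n_k, beta, id_to_word, top_n):
    """Return the top_n (word, probability) pairs of one topic."""
    total = sum(n_k) + sum(beta)
    phi_k = [(n + b) / total for n, b in zip(n_k, beta)]
    # Rank word ids by descending probability under this topic.
    ranked = sorted(range(len(phi_k)), key=lambda t: phi_k[t], reverse=True)
    return [(id_to_word[t], phi_k[t]) for t in ranked[:top_n]]

# Hypothetical counts of three vocabulary words under one topic.
id_to_word = ["water", "pipe", "price"]
keywords = top_keywords([5, 0, 2], [0.1, 0.1, 0.1], id_to_word, top_n=2)
```

The resulting keyword list is what an analyst would read to name the topic, for example as a pricing-related concern in the toy data.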
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110346900.1A CN113051455B (en) | 2021-03-31 | 2021-03-31 | Water affair public opinion identification method based on network text data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113051455A | 2021-06-29 |
CN113051455B | 2022-04-26 |
Family
ID=76516631
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110346900.1A Active CN113051455B (en) | 2021-03-31 | 2021-03-31 | Water affair public opinion identification method based on network text data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113051455B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996450A (en) * | 2022-05-27 | 2022-09-02 | 华中科技大学 | Water public opinion big data analysis method based on double-layer fastText model |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809252A (en) * | 2015-05-20 | 2015-07-29 | 成都布林特信息技术有限公司 | Internet data extraction system |
CN107229735A (en) * | 2017-06-13 | 2017-10-03 | 成都布林特信息技术有限公司 | Public feelings information analysis and early warning method based on natural language processing |
CN107256263A (en) * | 2017-06-13 | 2017-10-17 | 成都布林特信息技术有限公司 | Internet hot spots information automatic monitoring method |
CN107291778A (en) * | 2016-04-11 | 2017-10-24 | 中兴通讯股份有限公司 | The collection method and device of data |
CN109145215A (en) * | 2018-08-29 | 2019-01-04 | 中国平安保险(集团)股份有限公司 | Internet public opinion analysis method, apparatus and storage medium |
CN109471965A (en) * | 2018-10-26 | 2019-03-15 | 四川才子软件信息网络有限公司 | A kind of network public-opinion data sampling and processing method and monitoring platform based on big data |
EP3499508A1 (en) * | 2017-12-14 | 2019-06-19 | Koninklijke Philips N.V. | Computer-implemented method and apparatus for generating information |
CN110163688A (en) * | 2019-05-30 | 2019-08-23 | 复旦大学 | Commodity network public sentiment detection system |
WO2019227710A1 (en) * | 2018-05-31 | 2019-12-05 | 平安科技(深圳)有限公司 | Network public opinion analysis method and apparatus, and computer-readable storage medium |
US20190377763A1 (en) * | 2009-09-28 | 2019-12-12 | Ebay Inc. | System and method for topic extraction and opinion mining |
CN112395539A (en) * | 2020-11-26 | 2021-02-23 | 格美安(北京)信息技术有限公司 | Public opinion risk monitoring method and system based on natural language processing |
Non-Patent Citations (2)
Title |
---|
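QINJUAN YANG et al.: "Segment-level joint topic-sentiment model for online review analysis", IEEE INTELLIGENT SYSTEMS * |
吴卿凤 (Wu Qingfeng) et al.: "Annual data analysis and reflections on Jiangsu water conservancy online public opinion" (江苏水利网络舆情年度数据分析及思考), Jiangsu Water Resources (《江苏水利》) * |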
Also Published As
Publication number | Publication date |
---|---|
CN113051455B (en) | 2022-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||