CN111159519B - Public safety public opinion analysis method based on website click stream - Google Patents

Public safety public opinion analysis method based on website click stream

Info

Publication number
CN111159519B
Authority
CN
China
Prior art keywords
website
data
user
click stream
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911373986.6A
Other languages
Chinese (zh)
Other versions
CN111159519A (en)
Inventor
王誓伟
徐晓斌
李阳阳
金昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Beijing University of Posts and Telecommunications
Electronic Science Research Institute of CTEC
Original Assignee
Beijing University of Technology
Beijing University of Posts and Telecommunications
Electronic Science Research Institute of CTEC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology, Beijing University of Posts and Telecommunications, and Electronic Science Research Institute of CTEC
Priority to CN201911373986.6A
Publication of CN111159519A
Application granted
Publication of CN111159519B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a public safety public opinion analysis method based on website click stream data, which is used for solving the problems of incomplete data source, inaccurate public opinion analysis and no real-time property in the current public opinion analysis. The specific content comprises the following steps: 1) by acquiring website content data and user access log data of the website as data sources, the problem that the data sources are incomplete in current public opinion analysis is solved; 2) because the click stream data contains behavior information accessed by the user, the analysis dimension of public opinion analysis can be increased, so that the public opinion analysis is more objective, accurate and comprehensive; 3) by matching the IP address information in the click stream data, the actual geographic position information of the user can be obtained, and the real-time and real public opinion analysis result based on the geographic position can be obtained based on the actual geographic position information of the user.

Description

Public safety public opinion analysis method based on website click stream
Technical Field
The invention provides a method for processing and analyzing offline website data and real-time streaming data on the Hadoop big data processing platform. The data source used by the method is website server log data in the Common Log Format. Compared with traditional statistical analysis algorithms, the method enables more comprehensive analysis and more accurate prediction of public safety public opinion.
Background Art
With big data technology now widespread, the global data volume is expected to reach 35 ZB (equivalent to 35 trillion GB), and analysts have pointed out that newly generated data is growing by at least 50% per year: the big data era has arrived. In this era, public opinion supervision is an important means of safeguarding urban public safety and preventing problems before they occur. Data analysis and prediction technologies make real-time analysis and prediction of public opinion possible, so that supervision departments can take timely measures based on those predictions, keep public opinion under control to a certain extent, and thereby further safeguard public safety.
Traditional public opinion analysis methods focus on webpage content and user comments. Their main data source is website content data, so user access data is neither collected nor used, and most of the analysis relies on statistical methods, which imposes certain limitations. A complete, comprehensive, and detailed public opinion analysis model is therefore needed. The model described here processes and analyzes click stream data together with webpage content data on a big data processing platform, enabling relatively comprehensive analysis, prediction, and display.
Disclosure of Invention
The method aims to solve the problem that data sources in current public opinion analysis are incomplete. It takes website content data and website log data as source data, performs feature-word frequency statistics on the content data through a preprocessing model, and processes the website log data with data mining methods to obtain website click stream data, so that more comprehensive, multidimensional data can be acquired. To address the poor real-time performance and inaccurate results of current public opinion analysis, the method processes click stream data in real time and bases its website public opinion analysis on both website content data and website log data. The overall architecture of the method is shown in FIG. 1.
The invention specifically comprises the following steps:
A public safety public opinion analysis method based on website click stream, implemented on the offline big data processing platform Hadoop and the real-time stream data processing platform Storm.
First, the important objects involved in the method are explained:
[1] Positive and negative vocabulary dictionary: contains, for each word, its part-of-speech type, emotion intensity, and polarity information.
[2] Website content data: the content of news or information items on the website, plus the plain text of user comments. On a microblog, for example, this is mainly the content describing an event and the users' comments on it. More generally, the content data of social, news media, and similar websites covers two aspects: 1. news content data; 2. user comment data.
[3] Click stream data: when a user visits a website, each mouse click records the user's access information, and the successive clicks of each user are linked together to form that user's click stream data. Website click stream data is typically obtained from the website server's log files.
[4] Common Log Format (CLF): a standard website server log format that records basic information about each user visit and is adopted by most websites. The invention uses logs in the Common Log Format as the data source for click stream data. The data is persisted in the big data storage platform Hive and is also supplied as a real-time stream to the stream data processing platform Storm. The basic architecture of the click stream data processing model is shown in FIG. 2.
Website click stream data is an important source for analyzing website users: it traces the path a user takes through the website and determines information such as when, from where, and for how long the user visits. The click stream data obtained from the Common Log Format log [4] includes the following information: (1) User identification: the user's IP address recorded in the website access log is used as the user identifier of the click stream data. (2) Resource address requested by the user: only the HTML addresses requested by the user are recorded; the addresses of the CSS, JS, image, and video files associated with an HTML file are not. (3) Time of the request: the specific time at which the user accessed the resource. (4) Address that directed the user to the current page: the address visited immediately before the current one. (5) Success rate of resource access: the ratio of the number of resources the user accessed successfully to the total number of resources the user accessed.
Based on the click stream data [3] defined above, the invention defines a click stream data structure in the form of a linked list. The head pointer of the linked list is the user identifier, followed in order by four nodes: the resource addresses requested by the user, the times of those requests, the addresses that directed the user to each page, and the success rate of resource access. Each node is itself a linked list whose head stores the data from the user's first access; the data from each subsequent access is appended to the corresponding list. The click stream data structure is shown in FIG. 3.
The reason for adopting such a data structure is as follows:
1. uniquely determining an accessing user;
2. recording tracks of users for accessing resources in detail;
3. recording the starting time of a user for accessing a specific resource and the residence time of each resource;
4. determining a page last visited by a user;
5. determining the success rate of the user for accessing the resource;
6. the operation of quickly inserting data can be realized;
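The structure described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the class name `UserClickStream` and its field names are hypothetical, Python lists stand in for the linked lists (appending at the tail is cheap in both), and the success rate is derived from a per-click success flag, which is an assumption about how the ratio is stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserClickStream:
    """One user's click stream: a user identifier heading four parallel sequences."""
    user_id: str                                            # IP address used as the user identifier
    resources: List[str] = field(default_factory=list)      # requested HTML addresses
    timestamps: List[float] = field(default_factory=list)   # request times, in seconds
    referrers: List[str] = field(default_factory=list)      # address that led to each page
    succeeded: List[bool] = field(default_factory=list)     # whether each request succeeded

    def append(self, resource: str, timestamp: float, referrer: str, ok: bool) -> None:
        """Add one click to the tail of every sequence, preserving access order."""
        self.resources.append(resource)
        self.timestamps.append(timestamp)
        self.referrers.append(referrer)
        self.succeeded.append(ok)

    def success_rate(self) -> float:
        """Successful accesses divided by total accesses (0.0 for an empty stream)."""
        return sum(self.succeeded) / len(self.succeeded) if self.succeeded else 0.0


stream = UserClickStream("203.0.113.7")
stream.append("/news/1.html", 0.0, "-", True)
stream.append("/news/2.html", 30.0, "/news/1.html", False)
```

Keeping the four sequences index-aligned means the i-th entry of each list describes the same click, which is what the later dwell-time and scoring steps rely on.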
The traditional public opinion analysis method mainly comprises the following steps: 1) acquire text data from the webpages of interest on the Internet; 2) segment the text data of each webpage into words; 3) count the frequency of each segmented word in the text data; 4) take the feature words of the webpage from the frequency-sorted word list; 5) weight the feature words of the current webpage according to their weights in the pre-built positive and negative vocabulary dictionary [1] to obtain the emotional color of the webpage.
Traditional public opinion analysis is based on webpage content, that is, text data such as news content and user comments. It focuses on the text content of a website and does not take the behavior of website users into account, so it has certain limitations. Because website click stream data records a user's behavior track in detail, the invention proposes a public opinion analysis method that combines click stream data with website content data, drawing on the advantages of both so that website analysis is more objective and comprehensive.
Because the emotional color of the whole website needs to be calculated, click stream data serves as the main thread of the analysis: the emotion score of each resource address a user visits in the click stream is calculated in turn using the traditional webpage public opinion analysis method. The emotion scores of all click stream data are calculated in this way and used as input to obtain the public opinion rating of the current website. The website content data processing model architecture is shown in FIG. 4.
The following describes how click stream data and website content data [2] are combined in the invention:
1. Traditional public opinion analysis may analyze the regions users report for a given hot event, but that regional data usually comes from the region a user entered at registration and is therefore not real-time. Once click stream data is included, the IP addresses in the click stream can serve as the source of regional distribution data for browsing users. This yields a more comprehensive and objective result, so that the trend of public opinion in a given region can be determined.
2. Click stream data accurately determines the time intervals at which a user visits webpages, and webpage content analysis determines the emotional tendency of each page. Analyzing the data according to how long the user browses and how much attention the user pays to the current page improves the accuracy of the results.
3. Click stream data determines the order and number of pages a user visits, which helps the website system administrator gauge how much attention each page receives.
Advantageous effects
The traditional public opinion analysis method is limited to analyzing the text data of the website content, so that the analysis result is inaccurate, and the traditional public opinion analysis method based on the geographic position is limited to using the geographic position information when the user registers, so that the real-time performance is not realized. The method provides a public opinion analysis method based on the combination of click stream data and website content data, and can increase the dimensionality of public opinion analysis and enable the analysis result to be more accurate. In addition, the user geographic position information analysis based on the click stream data has real-time performance, and the current situation of public opinion analysis can be reflected more truly.
Drawings
FIG. 1 is an overall construction diagram of the present invention
FIG. 2 is a basic architecture diagram of a clickstream data processing model
FIG. 3 illustrates a structure definition format of clickstream data
FIG. 4 shows a model architecture for processing website content data
FIG. 5 shows a process for analyzing website content data and clickstream data
Detailed Description
1. Data acquisition: the website content data is acquired by a web crawler or a website administrator; the website log data is obtained by exporting the access log of the user in the website log server or directly accessing the website log server.
2. Preprocessing website content data: taking text data as input, first segment the text into words, define stop words, and count the frequency of every segmented word; then sort the words by frequency and take the top 20 as feature words. Hadoop's distributed file system supports mass data storage and MapReduce supports mass data computation. Because a single webpage content file is small, a Hadoop small-file storage model is used. For word frequency statistics, Map and Reduce functions are defined in MapReduce so that the word frequencies of each webpage are calculated in parallel.
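The word-frequency step can be illustrated on a single machine. The sketch below mimics the map and reduce phases with plain Python rather than Hadoop; it assumes the text has already been segmented into tokens, and `STOP_WORDS`, `count_chunk`, and `feature_words` are hypothetical names chosen for the example.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List

# Hypothetical stop-word list; a real deployment would load a full one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def count_chunk(tokens: Iterable[str]) -> Counter:
    """Map phase: count the non-stop-word tokens in one chunk of a page."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def feature_words(chunks: Iterable[Iterable[str]], top_n: int = 20) -> List[str]:
    """Reduce phase: merge the partial counts, then keep the top_n most frequent words."""
    total = reduce(lambda a, b: a + b, (count_chunk(c) for c in chunks), Counter())
    return [word for word, _ in total.most_common(top_n)]


chunks = [
    ["flood", "the", "flood", "rescue"],
    ["flood", "rescue", "road", "of"],
]
top_two = feature_words(chunks, top_n=2)
```

In this example `top_two` holds the two most frequent non-stop words of the page; the default `top_n=20` matches the top-20 cutoff in the text.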
3. Website content data emotion analysis: match all feature words against the positive and negative vocabulary dictionary, and calculate the emotion score of the current webpage from the emotion classification (+1 for positive, -1 for negative), intensity, and polarity of the feature words. The score is calculated as $CS = TOM \times SOM \times POM$, where CS is the content score, TOM the emotion classification, SOM the emotion intensity, and POM the emotion polarity. A positive score indicates a positive emotional color and a negative score a negative one; in both cases, the larger the absolute value, the stronger that emotional color. The content scores of all feature words are obtained by this calculation.
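As a worked illustration of the content score, the sketch below computes TOM × SOM × POM per feature word. The text does not say how per-word scores are combined into a page score, so summing them is an assumption made here; `LexiconEntry`, `word_score`, `page_score`, and the lexicon values are likewise hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class LexiconEntry(NamedTuple):
    tom: int     # emotion classification: +1 (positive) or -1 (negative)
    som: float   # emotion intensity
    pom: float   # emotion polarity


def word_score(entry: LexiconEntry) -> float:
    """Content score of one feature word: CS = TOM * SOM * POM."""
    return entry.tom * entry.som * entry.pom


def page_score(words: Iterable[str], lexicon: Dict[str, LexiconEntry]) -> float:
    """Assumed aggregation: sum the scores of the feature words found in the lexicon."""
    return sum(word_score(lexicon[w]) for w in words if w in lexicon)


# Made-up lexicon values, for illustration only.
lexicon = {
    "rescue": LexiconEntry(tom=1, som=2.0, pom=1.0),
    "flood": LexiconEntry(tom=-1, som=3.0, pom=0.5),
}
score = page_score(["rescue", "flood", "road"], lexicon)
```

Words missing from the dictionary, such as "road" here, contribute nothing to the score.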
4. Preprocessing click stream data: depending on how close to real time the analysis must be, click stream data can be obtained online or from log data stored in a local file system. 1) Offline: take one day of website log data as input. Because log data is recorded in real time, the Common Log Format file is converted into click stream data according to the defined click stream data format. Specifically, first use a script to remove the log records of all CSS, JS, image, and video files, then group the log records that share an IP address into the same click stream; extract the fields of interest from each log record and insert them into the linked list in order of user access time. 2) Online: the log server can also be connected to the click stream preprocessing system in real time. Each time the log server generates a log record it sends the record directly to a Kafka message queue; the click stream preprocessing system reads the records from the queue and processes them as in step 1), which enables near real-time processing and analysis of the click stream.
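The offline conversion in step 1) might look like the following sketch. It is an illustration under stated assumptions rather than the patent's script: the regular expression targets Common Log Format lines and also accepts an optional trailing quoted referrer field (as written by the Combined Log Format), the list of static-file extensions is a guess, and `parse_line` and `build_click_streams` are hypothetical names.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# host ident user [time] "method path protocol" status bytes, plus an optional "referrer"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)")?'
)
# Assumed set of static-resource extensions to drop.
STATIC_RE = re.compile(r'\.(css|js|png|jpe?g|gif|ico|svg|mp4|webm)(\?.*)?$', re.IGNORECASE)


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line; return None for malformed lines and static resources."""
    m = LOG_RE.match(line)
    if m is None or STATIC_RE.search(m.group("path")):
        return None
    return {
        "ip": m.group("ip"),
        "path": m.group("path"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "status": int(m.group("status")),
        "referrer": m.group("referrer") or "-",
    }


def build_click_streams(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group the kept records by IP address and order each group by access time."""
    streams: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            streams[record["ip"]].append(record)
    for clicks in streams.values():
        clicks.sort(key=lambda r: r["time"])
    return dict(streams)
```

For the online path in step 2), the same `parse_line` function could be applied to each message read from the queue; the Kafka consumer itself is left out because its configuration is not described in the text.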
5. Click stream data processing and analysis: 1) Time a user spends on the website: the difference between the head node and the tail node of the timestamp linked list in the click stream data. 2) Time a user spends on a given webpage: the timestamp difference between adjacent nodes. 3) Success rate of the user's webpage access: the value of the node corresponding to the resource address in the access success rate linked list. 4) Regional distribution of visiting users: the IP address from which each user accessed the website is matched against IP-to-geographic-location mapping information.
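The two dwell-time quantities follow directly from the ordered timestamps. A minimal sketch, assuming timestamps are already sorted and expressed in seconds; the function names are hypothetical.

```python
from typing import List


def site_dwell_time(timestamps: List[float]) -> float:
    """Time spent on the website: tail timestamp minus head timestamp."""
    return timestamps[-1] - timestamps[0] if len(timestamps) >= 2 else 0.0


def page_dwell_times(timestamps: List[float]) -> List[float]:
    """Time spent on each page: difference between adjacent timestamps.

    The last page has no following click, so it has no measurable dwell time here.
    """
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


timestamps = [0.0, 30.0, 90.0]
total = site_dwell_time(timestamps)       # 90.0
per_page = page_dwell_times(timestamps)   # [30.0, 60.0]
```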
6. Calculating the emotional color value of each node in the click stream data: the emotional color of a webpage is calculated from its text data, and the emotional color score of each user's click stream node combines that value with the access time difference between adjacent nodes and the access success rate: $NS = CS \times TSD \times SR$, where NS (Node Score) is the click stream node score, CS the content score of the webpage, TSD (Timestamp Difference) the timestamp difference between adjacent nodes, and SR (Success Rate) the success rate of node access.
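A small sketch of the node score, taking the formula as the product CS × TSD × SR. It assumes index-aligned lists of page content scores, timestamps, and success rates for one user, and it skips the final click because that click has no following timestamp; both the alignment and the skipping are assumptions of this example, as are the function names.

```python
from typing import List


def node_score(cs: float, tsd: float, sr: float) -> float:
    """Click stream node score: NS = CS * TSD * SR."""
    return cs * tsd * sr


def stream_node_scores(
    page_scores: List[float], timestamps: List[float], success_rates: List[float]
) -> List[float]:
    """Score every click that has a successor, using the gap to the next click as TSD."""
    scores = []
    for i in range(len(timestamps) - 1):
        tsd = timestamps[i + 1] - timestamps[i]
        scores.append(node_score(page_scores[i], tsd, success_rates[i]))
    return scores


scores = stream_node_scores(
    page_scores=[2.0, -1.0, 0.5],
    timestamps=[0.0, 30.0, 90.0],
    success_rates=[1.0, 0.5, 1.0],
)
```

With these inputs the first page contributes 2.0 × 30 × 1.0 = 60.0 and the second contributes -1.0 × 60 × 0.5 = -30.0, so a long stay on a negative page pulls the user's score down in proportion to the time spent there.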
7. Calculating the emotion value of click stream data and the public opinion distribution: the invention provides three evaluation dimensions. 1) Overall public opinion: the total score of all users' click stream nodes; the higher the score, the better the website's public opinion. 2) Positive and negative dimensions: the webpage emotion score is a combined measure in which positive and negative scores cancel out, so to reflect the state of the website more truthfully and objectively, users' positive and negative scores for the website are also tallied separately. 3) Geographic distribution: the IP addresses of all users in the click stream data are matched against an IP-to-geographic-location mapping table to obtain the users' geographic distribution, and the click stream scores of all users in each location (province or city level can be selected) are summed to obtain the location-based public opinion distribution.
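The three evaluation dimensions can be computed in one pass over the node scores. The sketch below assumes node scores keyed by user IP address and a plain dictionary standing in for the IP-to-geographic-location mapping table; `evaluate`, the "unknown" fallback region, and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def evaluate(
    node_scores_by_ip: Dict[str, List[float]], ip_to_region: Dict[str, str]
) -> dict:
    """Return the overall total, separate positive/negative totals, and per-region totals."""
    total = positive = negative = 0.0
    by_region: Dict[str, float] = defaultdict(float)
    for ip, scores in node_scores_by_ip.items():
        region = ip_to_region.get(ip, "unknown")   # unmapped IPs fall into one bucket
        for score in scores:
            total += score
            if score > 0:
                positive += score
            elif score < 0:
                negative += score
            by_region[region] += score
    return {
        "total": total,
        "positive": positive,
        "negative": negative,
        "by_region": dict(by_region),
    }


result = evaluate(
    {"203.0.113.7": [60.0, -30.0], "198.51.100.9": [-10.0]},
    {"203.0.113.7": "Beijing"},
)
```

Reporting the positive and negative totals alongside the net total keeps a site with strongly mixed reactions distinguishable from one with uniformly mild reactions, which is the motivation given for the second dimension.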
8. Presentation of evaluation analysis results: and visually displaying the website public opinion comprehensive analysis result data.

Claims (8)

1. A public safety public opinion analysis method based on website click stream is realized based on an offline big data processing platform Hadoop and a real-time stream data processing platform Storm, and is characterized by comprising the following steps:
(1) acquiring website content data and website log data;
(2) preprocessing website content data to obtain feature words;
(3) analyzing the website content data emotion to obtain the emotion score CS of the current webpage: calculating the emotion score of the current webpage according to the emotion classification of the feature words, wherein the score is calculated as $CS = TOM \times SOM \times POM$; a positive score indicates that the emotional color of the webpage is positive and a negative score indicates that it is negative, and in either case the larger the absolute value of the score, the stronger that emotional color; wherein TOM represents the emotion classification, SOM represents the emotion intensity, POM represents the emotion polarity, and TOM, SOM and POM are obtained by matching the feature words against the positive and negative vocabulary dictionary;
(4) preprocessing click stream data: according to different analysis real-time requirements, click stream data can be obtained online by reading a website real-time log, and the click stream data can also be obtained by the log stored in a local file system;
(5) click stream data processing and analysis: 5.1) obtaining the stay time of the user in the website, wherein the stay time is obtained by the difference value of a timestamp head node and a tail node in a user node; 5.2) obtaining the staying time of a user in a certain webpage, wherein the staying time is obtained according to the time stamp difference between adjacent time stamp nodes in user nodes; 5.3) acquiring the success rate of the user for accessing the webpage, wherein the success rate is obtained by the value of the access success rate in the user node; 5.4) the user accesses the IP address of the website, and the regional distribution of the visiting user is determined by matching the IP address with the matching information of the geographic position;
(6) calculating the emotional color value of each user node in the click stream data: because the emotional color of the webpage can be calculated through the text data of the webpage, the score of the emotional color of the click stream data of each node is calculated by combining the access time difference and the access success rate of the adjacent nodes in the click stream data, the click stream node score being $NS = CS \times TSD \times SR$, wherein TSD represents the difference of time stamps of adjacent nodes, and SR represents the success rate of node access;
(7) calculating the sentiment value of click stream data and distributing public sentiment: three evaluation dimensions were used: 1) the overall public opinion condition of the website is evaluated by the total score of all the users clicking the stream nodes, and the higher the score is, the better the website public opinion condition is; specifically, calculating the emotion score of each resource address accessed by a user in click stream data in sequence; calculating the sentiment scores of all click stream data by the method, and taking the sentiment scores of all click stream data as input to obtain the public sentiment grade of the current website; 2) the public opinion condition of the website is objectively evaluated by adopting two dimensions of positive and negative, and in the webpage emotion score, the calculation is carried out by adopting a comprehensive evaluation mode, namely, the positive score and the negative score are mutually offset, so that the positive score and the negative score of the accessed user on the website can be respectively counted in order to more truly reflect the condition of the website; 3) and evaluating the website public opinion distribution based on the geographical position, matching the IP addresses with the geographical position mapping table by the IP address information of all users in the click stream data to obtain the geographical position distribution of the users, and calculating the sum of the scores of the click streams of all the users under different geographical positions to obtain the geographical position-based public opinion distribution.
2. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: the website content data comprises news or information content in a website and user comment plain text data; further, the content data of the social and news media website comprises the following contents: 1. news content data; 2. the user reviews the content data.
3. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: the website content data is obtained through a web crawler or from a website administrator; the website log data is obtained by exporting the access log of the user in the website log server or directly accessing the website log server.
4. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: step (2) specifically comprises the following steps: taking text data as input, firstly performing word segmentation processing on the text data, defining stop words, and counting the word frequencies of all the segmented words; and sorting all the segmented words by word frequency, and taking the top 20 segmented words as feature words.
5. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: the positive and negative vocabulary dictionary in step (3) comprises word part-of-speech types, emotion intensity and polarity information.
6. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: the click stream data: when a user accesses a website, the access information of the user is recorded at every mouse click, the access clicks of each user are connected to form user click stream data, and the website click stream data is acquired from a log file of a website server; the user node means that, in the click stream data, all clicks of each user form that user's click stream data, and the user ID uniquely identifies the click stream data of the user, thereby forming the user node.
7. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: the click stream data obtained from the Common Log Format log includes the following information: (1) user identification: in an access log of a website, the IP address of a user is recorded and used as the user identifier of click stream data; (2) resource address requested by the user: only the HTML (hypertext markup language) address requested by the user is recorded in click stream data, and the CSS (cascading style sheets), JS, picture and video addresses related to an HTML file in the website are not recorded; (3) time of request for the relevant resource: recording the specific time of the user accessing the website; (4) address directing the user to the current page: the last address visited before the user entered the current address; (5) success rate of resource access: the ratio of the number of the resources successfully accessed by the user to the total number of the resources accessed by the user is taken as the success rate of the resource access.
8. The public safety public opinion analysis method based on website click stream as claimed in claim 1, wherein: step (4) further comprises: 4.1) offline click stream data: taking one day of offline log data of a website as input, and converting the log data in the Common Log Format into click stream data according to the defined format of the click stream data, wherein the specific conversion process is as follows: firstly, removing all log records of CSS, JS, picture and video files by using a script, and then classifying the log records with the same IP address into the same click stream data; extracting the concerned information in the log record according to the defined data format, and sequentially inserting the concerned information into a linked list according to the click stream data format and the user access time sequence; 4.2) online click stream data: the log server can also be connected with a click stream preprocessing system in real time, the log server directly sends each piece of log data to a Kafka message queue for caching when it is generated, the click stream preprocessing system reads the data from the message queue and processes the data through the steps of 4.1), and therefore real-time processing and analysis of the click stream can be achieved.
CN201911373986.6A 2019-12-26 2019-12-26 Public safety public opinion analysis method based on website click stream Active CN111159519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911373986.6A CN111159519B (en) 2019-12-26 2019-12-26 Public safety public opinion analysis method based on website click stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911373986.6A CN111159519B (en) 2019-12-26 2019-12-26 Public safety public opinion analysis method based on website click stream

Publications (2)

Publication Number Publication Date
CN111159519A CN111159519A (en) 2020-05-15
CN111159519B true CN111159519B (en) 2021-07-23

Family

ID=70558493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911373986.6A Active CN111159519B (en) 2019-12-26 2019-12-26 Public safety public opinion analysis method based on website click stream

Country Status (1)

Country Link
CN (1) CN111159519B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112584407B (en) * 2020-12-04 2022-07-22 Chongqing Jiuyu Bohong Technology Co., Ltd. (重庆玖舆博泓科技有限公司) LTE user complaint qualitative method and device based on space-time combination
CN113689246B (en) * 2021-08-31 2023-09-12 Ping An Life Insurance Company of China, Ltd. (中国平安人寿保险股份有限公司) Website monitoring method and device based on artificial intelligence, electronic equipment and medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102591990A (en) * 2012-01-16 2012-07-18 Guangzhou Dongjing Computer Technology Co., Ltd. (广州市动景计算机科技有限公司) Method and device for acquiring user click information of website
CN104462156A (en) * 2013-09-25 2015-03-25 Alibaba Group Holding Ltd. (阿里巴巴集团控股有限公司) Feature extraction and individuation recommendation method and system based on user behaviors
CN107092620A (en) * 2016-02-18 2017-08-25 Adobe Inc. (奥多比公司) Click stream visual analysis based on maximum serial model
CN108664932A (en) * 2017-05-12 2018-10-16 Central China Normal University (华中师范大学) A kind of Latent abilities state identification method based on Multi-source Information Fusion

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7318056B2 (en) * 2002-09-30 2008-01-08 Microsoft Corporation System and method for performing click stream analysis

Non-Patent Citations (2)

Title
A Web page prediction model based on click-stream tree representation of user behavior; Sule Gündüz et al.; Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2003; 20030830; pp. 1-7 *
On network public opinion monitoring in the context of big data; Yang Hailong (杨海龙); 《情报探索》; 20151030; pp. 132-135 *

Also Published As

Publication number Publication date
CN111159519A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
Keneshloo et al. Predicting the popularity of news articles
Mayr et al. Think before you collect: Setting up a data collection approach for social media studies
Zagheni et al. You are where you e-mail: using e-mail data to estimate international migration rates
Yalçın et al. What is search engine optimization: SEO?
JP5078674B2 (en) Analysis system, information processing apparatus, activity analysis method, and program
JP4896071B2 (en) Advertisement evaluation method, advertisement evaluation system, and recording medium using keyword comparison
CN109905288B (en) Application service classification method and device
CN105677844A (en) Mobile advertisement big data directional pushing and user cross-screen recognition method
Bendler et al. Taming uncertainty in big data: Evidence from social media in urban areas
CN101313330A (en) Selecting high quality reviews for display
JP2012519918A (en) Method, apparatus and system for visualizing the behavior of a user browsing a web page
US9245035B2 (en) Information processing system, information processing method, program, and non-transitory information storage medium
KR20090000284A (en) Infomedics prevention system
Stede et al. The climate change debate and natural language processing
Saravanou et al. Twitter floods when it rains: a case study of the UK floods in early 2014
CN111159519B (en) Public safety public opinion analysis method based on website click stream
Przybyła et al. When classification accuracy is not enough: Explaining news credibility assessment
Kirsh et al. Splitting the web analytics atom: from page metrics and KPIs to sub-page metrics and KPIs
Yang et al. Are Altmetric. com scores effective for research impact evaluation in the social sciences and humanities?
CN111611464A (en) Big data-based public opinion monitoring platform
Guo et al. A survey of internet public opinion mining
KR102124935B1 (en) Disaster Monitoring System, Method Using Crowd Sourcing, and Computer Program therefor
Cortez et al. Measuring user influence in financial microblogs: experiments using stocktwits data
CN110866170A (en) Importance evaluation method, search method and system for Tor darknet service based on site quality
CN112685618A (en) User feature identification method and device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant