CN106649527B - Advertisement click abnormity detection system and detection method based on Spark Streaming - Google Patents
- Publication number
- CN106649527B (application CN201610915505.XA; also published as CN106649527A)
- Authority
- CN
- China
- Prior art keywords
- data
- abnormal
- log
- advertisement
- knn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0242—Determining effectiveness of advertisements
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a Spark Streaming based advertisement click anomaly detection system and detection method, in the field of computer technology applications. Abnormal data and normal data are stored in a database, while suspect data is sent to the Kafka messaging system. A naive Bayes classifier is then trained on the abnormal data and used to classify the suspect data, and the results are stored in the database. Finally, advertisers are charged fairly on the basis of the volume of normal data; the popularity of each advertisement can also be analyzed, giving advertisers guidance on industry trends and information such as the nationwide distribution of users.
Description
Technical Field
The invention relates to the field of computer technology application, in particular to a system and a method for detecting advertisement click abnormity based on Spark Streaming.
Background
With the explosive growth of data, the era of big data has arrived. Safe, fast, real-time and efficient data processing not only helps enterprises avoid risks in advance, but also supplies timely data that gives a real and effective basis for enterprise development and for product development and production.
However, the openness of the network brings not only convenience to the public but also false information, malicious access, malicious attacks and the like. Every open website faces these problems, and each studies how to prevent them, how to extract real and valid data, and how to reduce the malicious load on its servers. Malicious advertisement clicking is a typical example. Identifying abnormal data promptly prevents malicious clicks and yields valid advertisement click data, which provides a basis for fair charging by open websites, effectively reduces server load, and gives merchants sound commercial planning and business guidance. Current processing techniques are generally based on offline batch processing; they cannot solve online problems in real time and cannot support decisions that must be made quickly. Real-time systems such as Storm can process data in real time, but are weaker than Spark Streaming in data security and mass data processing. Spark is a distributed computing framework similar to MapReduce. Its core is the resilient distributed dataset (RDD), which provides a richer model than MapReduce and can iterate over a dataset in memory many times quickly, supporting complex data mining and graph computing algorithms. Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data.
The advantages of Spark Streaming are:
It can run on 100+ nodes and achieve second-scale latency.
It uses the memory-based Spark as its execution engine, making it efficient and fault tolerant.
It integrates with Spark's batch processing and interactive queries.
It provides a simple, batch-like interface for implementing complex algorithms.
Therefore, combining the existing Spark big data computing framework, strong computer hardware support and suitable machine learning algorithms makes it possible to solve the above problems quickly, efficiently and accurately.
The invention aims to provide a Spark Streaming based advertisement click anomaly detection system that analyzes and filters abnormal clicks on advertisements delivered to users, tracks valid advertisement clicks in a timely manner, charges for advertisement delivery fairly and effectively, and analyzes the behavior and characteristics of abnormal data. This in turn helps analyze user behavior and interests, supports advertisers' commercial planning, gives a practical basis for product decisions, and helps predict future market behavior.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. It provides a Spark Streaming based advertisement click anomaly detection system and detection method that can quickly, efficiently and accurately support advertisers' commercial planning and product decisions and help predict future market conditions. The technical scheme of the invention is as follows:
a Spark Streaming based advertisement click abnormity detection system comprises a data acquisition unit, a data cleaning unit, a distributed data message system, a first abnormal data detection unit, a suspected data extraction unit, a normal data and abnormal data classifier and a classified data database unit; wherein
The data acquisition unit is used for acquiring the log information of the advertisement clicked by the user;
the data cleaning unit is used for cleaning and standardizing the logs acquired by the data acquisition unit, and finally sending the standardized data to the distributed data message system to wait for consumption;
the distributed data message system is mainly used for storing the standardized data and also the suspect data sent by the suspect data extraction unit, producing the topic data that Spark Streaming needs to consume, with each kind of data placed in its own Topic;
the first abnormal data detection unit adopts a KNN algorithm to perform quasi-real-time processing in Spark Streaming on data from the distributed data message system, obtaining suspect data, abnormal data and normal data;
the suspect data extraction unit is mainly used for sending the suspect data generated by the first abnormal data detection unit back to the distributed data message system;
the normal data and abnormal data classifier adopts a naive Bayes classification method to classify the suspect data stored in the distributed message system to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database and a Redis in-memory database. The MySQL database stores the normal data and abnormal data generated by the normal data and abnormal data classifier and maps the abnormal data to the Redis in-memory database, so that the naive Bayes classifier can be trained quickly. Redis is an in-memory database used only as a mapping of the MySQL database to speed up queries and modifications; the data is written to MySQL at a set period for permanent storage. In short, Redis acts as middleware whose purpose is to increase speed.
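The division of labor between the in-memory store and the persistent store can be illustrated with a minimal write-behind sketch. This is not the patent's implementation: a plain dict stands in for Redis, Python's built-in sqlite3 stands in for MySQL, and the class name, table name and method names are all hypothetical. The sketch follows the ordering given in the method steps (results held in memory first, then written to the SQL store at a set period).

```python
import sqlite3
from typing import Dict, List, Tuple


class WriteBehindStore:
    """In-memory front store flushed periodically to a durable SQL table.

    Assumption: the dict stands in for Redis and sqlite3 stands in for MySQL,
    only to keep this sketch self-contained.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[str, str] = {}  # click_id -> label, served from memory
        self.dirty: List[str] = []       # ids written since the last flush
        conn.execute(
            "CREATE TABLE IF NOT EXISTS clicks (click_id TEXT PRIMARY KEY, label TEXT)"
        )

    def put(self, click_id: str, label: str) -> None:
        """Record a classified click in memory only."""
        self.cache[click_id] = label
        self.dirty.append(click_id)

    def abnormal_samples(self) -> List[str]:
        """Serve training reads from memory without touching the SQL store."""
        return [cid for cid, label in self.cache.items() if label == "abnormal"]

    def flush(self) -> int:
        """Persist pending records; meant to be called once per set period."""
        rows: List[Tuple[str, str]] = [(cid, self.cache[cid]) for cid in set(self.dirty)]
        self.conn.executemany("INSERT OR REPLACE INTO clicks VALUES (?, ?)", rows)
        self.conn.commit()
        self.dirty.clear()
        return len(rows)
```

Reads needed for classifier training are answered from memory, while the SQL table only sees batched writes, which is one way to obtain the read-write separation the text describes.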
Further, the abnormal data stored in the Redis in-memory database is also used to train the naive Bayes classifier.
Further, the data acquisition unit collects the log information of user advertisement clicks with the log collector Flume (a distributed log collection system), and the distributed data message system is Kafka.
Further, the KNN function of the KNN algorithm adopted by the first abnormal data detection unit (4) is as follows:
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; the similarity between the log to be classified and the example log is the cosine similarity:
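The formulas themselves did not survive in this text. A standard similarity-weighted KNN decision rule that matches the symbols defined here (x, d_i, c_j) and the 0/1 class indicator described in claim 1 would read as follows; this is a reconstruction under that assumption, not the patent's verbatim formula, and the names y, sim and kNN(x) are chosen for illustration.

```latex
y(x, c_j) = \sum_{d_i \in \mathrm{kNN}(x)} \mathrm{sim}(x, d_i)\, y(d_i, c_j),
\qquad
y(d_i, c_j) =
\begin{cases}
1, & d_i \in c_j \\
0, & \text{otherwise}
\end{cases}

\mathrm{sim}(x, d_i) = \frac{x \cdot d_i}{\lVert x \rVert \, \lVert d_i \rVert}
```

Here kNN(x) denotes the k training logs most similar to x, and x is assigned to the category c_j with the largest score y(x, c_j).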
Further, in the KNN algorithm, the KNN classifier judges click validity on five features (vectors): first, an abnormally large number of clicks from the same IP within a period of time; second, a dwell time of the clicking IP on the advertisement page that is almost negligible; third, an advertisement access time that differs from the normal hours of human activity; fourth, repeated, closely synchronized accesses from different addresses in the same IP segment; and fifth, IP behavior and advertisement interests that differ from the IP's past behavior and interests, which is treated as suspect. The KNN classifier is obtained by training KNN on sample data exhibiting these features.
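A minimal sketch of such a classifier over five-dimensional feature vectors is given below. The text does not say how "suspect" is decided, so the rule used here (flag a click as suspect when the similarity-weighted votes for the two classes are close) is an assumption, as are the feature encoding, the `margin` threshold, the function names and the invented training values.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)

# Hypothetical feature order: clicks per minute, dwell seconds, off-hours flag,
# subnet synchronicity score, interest drift score. Values are invented.
TRAINING: List[Sample] = [
    ([50, 0.5, 1, 0.9, 0.8], "abnormal"),
    ([40, 1.0, 1, 0.8, 0.9], "abnormal"),
    ([1, 30, 0, 0.1, 0.1], "normal"),
    ([2, 45, 0, 0.0, 0.2], "normal"),
]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(x: Sequence[float], training: List[Sample],
                 k: int = 4, margin: float = 0.2) -> str:
    """Return 'normal', 'abnormal' or 'suspect' for one click vector."""
    # Keep the k training samples most similar to x.
    neighbours = sorted(training, key=lambda s: cosine_similarity(x, s[0]),
                        reverse=True)[:k]
    # Similarity-weighted vote per class.
    scores = defaultdict(float)
    for vector, label in neighbours:
        scores[label] += cosine_similarity(x, vector)
    normal, abnormal = scores["normal"], scores["abnormal"]
    total = normal + abnormal
    # Assumed rule: votes that are too close mean KNN cannot decide.
    if total <= 0 or abs(normal - abnormal) / total < margin:
        return "suspect"
    return "abnormal" if abnormal > normal else "normal"
```

For instance, `knn_classify([45, 0.8, 1, 0.9, 0.7], TRAINING)` points almost exactly along the abnormal examples, whereas a vector such as `[20, 20, 0.5, 0.5, 0.5]` sits between the two groups and would be routed onward as suspect under the assumed margin rule.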
Further, the naive Bayes function is:
where d is the number of attributes and x_i is the value of x on the i-th attribute.
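The naive Bayes formula referred to here is missing from this text. The standard form that matches the symbols d and x_i is shown below as a reconstruction; the names h_nb, c and the label set Y (here, normal and abnormal) are assumptions for illustration.

```latex
h_{nb}(x) = \arg\max_{c \in \mathcal{Y}} \; P(c) \prod_{i=1}^{d} P(x_i \mid c)
```

That is, the classifier picks the class c whose prior P(c), multiplied by the per-attribute conditional probabilities, is largest, treating the d attributes as conditionally independent given the class.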
The classifier is trained using the abnormal data mapped to Redis as samples. In each period, for example one week, the naive Bayes classifier is retrained on a randomly drawn 20% of the abnormal data.
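The periodic retraining can be sketched as follows. This is an illustration under stated assumptions rather than the patent's code: attributes are taken to be discrete values, Laplace smoothing is added (the text does not mention it), and some normal samples are supplied alongside the sampled abnormal ones so that the classifier has two classes to separate. All names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[str], str]  # (discrete attribute values, label)


def sample_fraction(rows: List[Row], fraction: float = 0.2, seed: int = 0) -> List[Row]:
    """Draw a random subset (at least one row) for one retraining period."""
    if not rows:
        return []
    size = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, size)


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (smoothing is an assumption)."""

    def fit(self, rows: List[Row]) -> "NaiveBayes":
        self.total = len(rows)
        self.class_counts = Counter(label for _, label in rows)
        self.value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
        self.values = defaultdict(set)            # attribute index -> values seen
        for attrs, label in rows:
            for i, value in enumerate(attrs):
                self.value_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, attrs: Sequence[str]) -> str:
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log P(c) + sum_i log P(x_i | c), computed in log space for stability
            score = math.log(count / self.total)
            for i, value in enumerate(attrs):
                seen = self.value_counts[(label, i)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In a weekly job, one might call `sample_fraction(abnormal_rows)` on the abnormal records read from the in-memory store, combine the result with normal examples, and call `fit` again to replace the previous model.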
A Spark Streaming based advertisement click abnormity detection method comprises the following steps:
1) collecting advertisement click logs of website users using Flume (a distributed log collection system);
2) standardizing the data collected by Flume in step 1), after which Flume sends the standardized data to the Kafka message system, where it is defined as Topic1; Topic1 represents the data waiting to be consumed, that is, the topic effectively serves as the address of the data;
3) classifying the Topic1 data to be consumed from step 2) with the KNN algorithm through the Spark Streaming quasi-real-time computing framework;
4) from the suspect data, abnormal data and normal data generated in step 3), sending the suspect data back to Kafka, where it is defined as Topic2, storing the remaining data in the Redis in-memory database, and writing it into the MySQL database to realize read-write separation for MySQL;
5) training a naive Bayes classifier on 20% of the abnormal data from step 4), randomly extracted from the MySQL data mapped to Redis, and then classifying Topic2 in Kafka with the naive Bayes algorithm through the Spark Streaming quasi-real-time computing framework.
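The routing in steps 2) to 5) can be illustrated with a small in-process simulation. It deliberately uses no Kafka or Spark APIs: deques stand in for the two topics, a dict of lists stands in for the Redis/MySQL store, and the two classifiers are passed in as plain functions. The function name `run_pipeline` and the record fields used in the usage note are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, float]


def run_pipeline(
    raw_logs: List[Record],
    first_stage: Callable[[Record], str],   # stand-in for KNN: normal/abnormal/suspect
    second_stage: Callable[[Record], str],  # stand-in for naive Bayes: normal/abnormal
) -> Dict[str, List[Record]]:
    """Route records through two classification passes, as in steps 2) to 5)."""
    topics: Dict[str, Deque[Record]] = defaultdict(deque)            # stand-in for Kafka
    store: Dict[str, List[Record]] = {"normal": [], "abnormal": []}  # stand-in for Redis/MySQL

    # Step 2: standardized logs are published to Topic1.
    topics["Topic1"].extend(raw_logs)

    # Steps 3-4: first pass; suspect records are sent back as Topic2,
    # the remaining records go straight to the store.
    while topics["Topic1"]:
        record = topics["Topic1"].popleft()
        label = first_stage(record)
        if label == "suspect":
            topics["Topic2"].append(record)
        else:
            store[label].append(record)

    # Step 5: the second pass resolves every suspect record.
    while topics["Topic2"]:
        record = topics["Topic2"].popleft()
        store[second_stage(record)].append(record)
    return store
```

As a usage example, the first stage could label records by a hypothetical `clicks` field (very high is abnormal, very low is normal, anything in between is suspect) and the second stage by a hypothetical `dwell` field; every record then ends up in exactly one of the two stored classes.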
Further, the KNN algorithm in step 3) is as follows: the training samples are taken as reference points, the distance between the test sample and each training sample is calculated using the Euclidean distance, and the closest training sample is used as the basis for classification.
Further, the formula of the Euclidean distance of the KNN algorithm in step 3) is as follows:
where x and y represent different individuals, each having n-dimensional features.
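The formula itself is missing from this text. The standard Euclidean distance between two n-dimensional individuals, written with the dist(x, y) notation that claim 7 uses, is:

```latex
\mathrm{dist}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```

Here x_i and y_i are assumed to denote the i-th features of x and y; a smaller value means the test sample lies closer to the training sample.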
The invention has the following advantages and beneficial effects:
In this method, Flume collects the advertisement click data delivered at the user side, and the data is cleaned and standardized. Flume sends the standardized data to the distributed message system Kafka, where it forms Topic1 awaiting subscription and consumption. The Spark Streaming quasi-real-time stream computing framework, combined with the KNN classification algorithm, classifies the data into suspect data, abnormal data and normal data. The suspect data is then sent back to Kafka as Topic2, and Spark Streaming combined with the naive Bayes classification algorithm classifies Topic2 into abnormal data and normal data. The classified results are stored first in Redis and then in the MySQL database, which separates database reads from writes and increases read-write speed.
Drawings
FIG. 1 is a schematic structural view of a preferred embodiment of the present invention;
FIG. 2 is a KNN classification flow chart under Spark Streaming;
FIG. 3 is a naive Bayes classification flow chart under Spark Streaming.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme of the invention is as follows:
as shown in fig. 1, an advertisement click anomaly detection system based on Spark Streaming is characterized by comprising a data acquisition unit 1, a data cleaning unit 2, a distributed data message system 3, a first anomaly data detection unit 4, a suspect data extraction unit 5, a normal data and anomaly data classifier 6 and a classified data database unit; wherein
The data acquisition unit 1 is used for acquiring the log information of the advertisement clicked by the user;
the data cleaning unit 2 is used for cleaning and standardizing the logs acquired by the data acquisition unit 1, and finally sending the standardized data to the distributed data message system 3 to wait for consumption;
the distributed data message system 3 is mainly used for storing the standardized data and also the suspect data sent by the suspect data extraction unit 5, producing the topic data that Spark Streaming needs to consume, with each kind of data placed in its own Topic;
the first abnormal data detection unit 4 adopts a KNN algorithm to perform quasi real-time processing on data from the distributed message system 3 in Spark Streaming to obtain suspected data, abnormal data and normal data;
the suspect data extracting unit 5 is mainly used for sending the suspect data generated by the first abnormal data detecting unit 4 back to the distributed data message system 3;
the normal data and abnormal data classifier 6 classifies the suspect data stored in the distributed message system 3 by adopting a naive Bayesian classification method to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database 7 and a Redis in-memory database 8. The MySQL database 7 stores the normal data and abnormal data generated by the normal data and abnormal data classifier 6 and maps the abnormal data to the Redis in-memory database 8, so that the naive Bayes classifier can be trained quickly. Redis is an in-memory database used only as a mapping of the MySQL database to speed up queries and modifications; the data is written to MySQL at a set period for permanent storage. In short, Redis acts as middleware whose purpose is to increase speed.
Fig. 2 is a KNN classification flowchart under Spark Streaming.
Fig. 3 is a naive bayes classification flow chart under Spark Streaming.
The KNN classifier classifies the standardized Topic1 data stored in Kafka into suspect data (data that KNN cannot classify), normal data and abnormal data. The normal and abnormal data are stored in the database, and the suspect data is sent back to Kafka as Topic2 to await classification by the naive Bayes classifier. The naive Bayes classifier is trained on the abnormal data classified by KNN. Combined with the computing power of Spark Streaming, the computation is faster and the results more accurate, and the classified data is finally stored.
With this method, abnormal data is filtered in real time after a web user clicks an advertisement, and the characteristics and behavior of the abnormal data are analyzed and extracted. Normal data is collected to compute the total advertisement delivery cost, to analyze user behavior and interests, to help advertising enterprises make business plans, to predict future market conditions, and so on. The first classification by KNN yields three classes: suspect data, abnormal data and normal data. Naive Bayes is then trained on the abnormal data and divides the suspect data accurately, so that the data is sound; the resulting abnormal and normal data, and relevant and irrelevant data, give strong support to accurate data mining and predictive analysis.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (7)
1. A Spark Streaming based advertisement click abnormity detection system is characterized by comprising a data acquisition unit (1), a data cleaning unit (2), a distributed data message system (3), a first abnormal data detection unit (4), a suspect data extraction unit (5), a normal data and abnormal data classifier (6) and a classified data database unit; wherein
The data acquisition unit (1) is used for acquiring log information of the advertisement clicked by the user;
the data cleaning unit (2) is used for cleaning and standardizing the logs acquired by the data acquisition unit (1), and finally sending the standardized data to the distributed data message system (3) to wait for consumption;
the distributed data message system (3) is used for storing the standardized data and also storing the suspect data sent by the suspect data extraction unit (5) to generate subject data which needs to be consumed by Spark Streaming, and different data generate respective Topic;
the first abnormal data detection unit (4) adopts a KNN algorithm to perform quasi-real-time processing on data from the distributed data message system (3) in Spark Streaming to obtain suspected data, abnormal data and normal data; the KNN function of the KNN algorithm adopted by the first abnormal data detection unit (4) is as follows:
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; the similarity between the log to be classified and the example log is the cosine similarity, as follows:
wherein x is the vector representation of a log to be classified and d is an example log vector in the training set; the class indicator takes the value 1 when d belongs to c_j and 0 otherwise; the distance metric uses the Euclidean distance;
in the KNN algorithm, the KNN classifier judges click validity on five kinds of sample data: the first, an abnormally large number of clicks from the same IP within a period of time; the second, an abnormal dwell time of the clicking IP on the advertisement page that is almost negligible; the third, an abnormal advertisement access time, in which the clicking IP's access time differs from the normal hours of human activity; the fourth, abnormal access synchronicity, in which different addresses in the same IP segment repeatedly access at similar times; and the fifth, abnormal IP behavior and advertisement interests, which are suspect when they differ from the IP's past behavior and interests; these sample data are used as representative data for KNN to obtain the KNN classifier;
the suspect data extraction unit (5) is used for sending the suspect data generated by the first abnormal data detection unit (4) back to the distributed data message system (3);
the normal data and abnormal data classifier (6) classifies the suspected data stored in the distributed data message system (3) by adopting a naive Bayesian classification method to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database (7) and a Redis memory database (8), wherein the MySQL database (7) is used for storing normal data and abnormal data generated by a normal data and abnormal data classifier (6), the abnormal data is mapped to the Redis memory database (8), a naive Bayesian classifier is trained, the Redis is used as the memory database, the MySQL database is only used for mapping, the query and modification speed is improved, and the data is written into MySQL in a certain period and is permanently stored.
2. The Spark Streaming based ad click anomaly detection system as claimed in claim 1, wherein said Redis in-memory database further comprises using stored anomaly data to train a naive Bayesian classifier.
3. The Spark Streaming based advertisement click anomaly detection system according to claim 1, wherein the data collection unit (1) collects the log information of user advertisement clicks with the Flume distributed log collection system, and the distributed data message system (3) is Kafka.
4. The Spark Streaming based advertisement click anomaly detection system according to claim 3, wherein said naive Bayes function is:
where d is the number of attributes and x_i is the value of x on the i-th attribute; the classifier is trained by taking the abnormal data mapped to Redis as samples, and in each period the naive Bayes classifier is retrained and updated using 20% of the abnormal data, extracted at random.
5. A Spark Streaming based advertisement click abnormity detection method is characterized by comprising the following steps:
1) collecting advertisement click logs of website users by using a distributed log collection system Flume;
2) standardizing the data collected by Flume in step 1), after which Flume sends the standardized data to the Kafka message system, where it is defined as Topic1; Topic1 represents the data waiting to be consumed, that is, the topic effectively serves as the address of the data;
3) classifying the Topic1 data to be consumed from step 2) with the KNN algorithm through the Spark Streaming quasi-real-time computing framework;
4) sending the suspected data back to Kafka according to the suspected data, the abnormal data and the normal data generated in the step 3) to be defined as Topic2, storing the abnormal data and the normal data in a Redis memory database, and writing the abnormal data and the normal data into a MySQL database to realize read-write separation of MySQL;
the KNN function using the KNN algorithm is:
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; the similarity between the log to be classified and the example log is the cosine similarity, as follows:
wherein d is an example log vector in the training set; the class indicator takes the value 1 when d belongs to c_j and 0 otherwise; the distance metric uses the Euclidean distance;
in the KNN algorithm, the KNN classifier judges click validity on five kinds of sample data: the first, an abnormally large number of clicks from the same IP within a period of time; the second, an abnormal dwell time of the clicking IP on the advertisement page that is almost negligible; the third, an abnormal advertisement access time, in which the clicking IP's access time differs from the normal hours of human activity; the fourth, abnormal access synchronicity, in which different addresses in the same IP segment repeatedly access at similar times; and the fifth, abnormal IP behavior and advertisement interests, which are suspect when they differ from the IP's past behavior and interests; these sample data are used as representative data for KNN to obtain the KNN classifier;
5) 20% of abnormal data in the MySQL database is randomly extracted to train a naive Bayes classifier, and then Topic2 in Kafka is classified under a naive Bayes algorithm through a Spark Streaming quasi-real-time computing framework.
6. The method for detecting abnormal advertisement clicks according to claim 5, wherein the KNN algorithm in step 3) is: and taking the training sample as a reference point, calculating the distance between the test sample and the training sample, and obtaining the closest value in the distance by adopting the Euclidean distance as a classification basis.
7. The method for detecting abnormal advertisement clicks based on Spark Streaming according to claim 6, wherein the formula of the Euclidean distance of the KNN algorithm in step 3) is as follows:
where dist(x, y) denotes the Euclidean distance, and x and y denote different individuals, each with n-dimensional features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610915505.XA CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610915505.XA CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649527A CN106649527A (en) | 2017-05-10 |
CN106649527B CN106649527B (en) | 2021-02-09 |
Family
ID=58856008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610915505.XA Active CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649527B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229564B (en) * | 2018-01-05 | 2022-08-02 | 创新先进技术有限公司 | Data processing method, device and equipment |
CN108829715B (en) * | 2018-05-04 | 2022-03-25 | 慧安金科(北京)科技有限公司 | Method, apparatus, and computer-readable storage medium for detecting abnormal data |
CN110717771A (en) * | 2018-07-11 | 2020-01-21 | 武汉斗鱼网络科技有限公司 | Multi-dimensional advertisement real-time charging method, storage medium, electronic device and system |
CN109388548B (en) * | 2018-09-29 | 2020-12-22 | 京东数字科技控股有限公司 | Method and apparatus for generating information |
CN109361699A (en) * | 2018-12-06 | 2019-02-19 | 四川长虹电器股份有限公司 | Anomalous traffic detection method based on Spark Streaming |
CN110334105B (en) * | 2019-07-12 | 2022-09-09 | 河海大学常州校区 | Stream data abnormity detection method based on Storm |
CN111708846A (en) * | 2020-05-14 | 2020-09-25 | 北京嗨学网教育科技股份有限公司 | Multi-terminal data management method and device |
CN112667723A (en) * | 2020-12-30 | 2021-04-16 | 平安证券股份有限公司 | Data acquisition method and terminal equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130005597A (en) * | 2011-07-06 | 2013-01-16 | 이성진 | System for preventing of cpc advertisement fraud click |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9633364B2 (en) * | 2010-12-30 | 2017-04-25 | Nokia Technologies Oy | Method and apparatus for detecting fraudulent advertising traffic initiated through an application |
US20130325591A1 (en) * | 2012-06-01 | 2013-12-05 | Airpush, Inc. | Methods and systems for click-fraud detection in online advertising |
CN104765874B (en) * | 2015-04-24 | 2019-03-26 | 百度在线网络技术(北京)有限公司 | For detecting the method and device for clicking cheating |
Also Published As
Publication number | Publication date |
---|---|
CN106649527A (en) | 2017-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||