CN106649527B - Advertisement click abnormity detection system and detection method based on Spark Streaming - Google Patents

Info

Publication number
CN106649527B
Authority
CN
China
Prior art keywords
data
abnormal
log
advertisement
knn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610915505.XA
Other languages
Chinese (zh)
Other versions
CN106649527A (en)
Inventor
刘群
谭敢锋
戴大祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201610915505.XA
Publication of CN106649527A
Application granted
Publication of CN106649527B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a Spark Streaming based advertisement click abnormity detection system and detection method in the field of computer technology application. Abnormal data and normal data are stored in a database, and suspect data are sent to the Kafka data message system. A naive Bayes classifier is then trained on the abnormal data, the suspect data are classified by that classifier, and the results are stored in the database. Finally, advertisers can be charged reasonably according to the volume of normal data; the popularity of each advertisement can also be analyzed, giving advertisers an indication of industry trends as well as information such as the nationwide distribution of users.

Description

Advertisement click abnormity detection system and detection method based on Spark Streaming
Technical Field
The invention relates to the field of computer technology application, in particular to a system and a method for detecting advertisement click abnormity based on Spark Streaming.
Background
With the explosive growth of data, the era of big data has arrived. Safe, rapid, real-time and efficient data processing not only lets enterprises avoid risks in advance, but also supplies timely data that gives a real and effective basis for enterprise development and for product development and production.
However, the openness of the network brings not only convenience to the public but also false information, malicious access, malicious attacks and the like. Every open website faces these problems, and research focuses on how to prevent them, how to extract real and valid data, and how to reduce the malicious load on the server. Malicious advertisement clicking is a typical example: identifying abnormal data in time prevents malicious clicks and yields valid advertisement click data, which provides a basis for reasonable charging by the open website, effectively reduces the server load, and supports sound commercial planning and business guidance for merchants. Current processing techniques are generally based on offline batch processing, which cannot solve online problems in real time and cannot provide a basis for schemes that require quick decisions. Real-time systems such as Storm can process data in real time but are weaker than Spark Streaming in data security and massive data processing. Spark is a distributed computing framework similar to MapReduce whose core is the resilient distributed dataset (RDD); it provides a richer model than MapReduce and can quickly iterate over a dataset in memory many times, supporting complex data mining and graph computation algorithms. Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data.
The advantages of Spark Streaming are:
It can run on 100+ nodes and achieve millisecond-level latency.
It uses the memory-based Spark as its execution engine, which is efficient and fault tolerant.
It can integrate with Spark's batch processing and interactive queries.
It provides a simple, batch-like interface for implementing complex algorithms.
Therefore, combining the existing Spark big data computing framework, strong computer hardware support and a suitable machine learning algorithm makes it possible to solve the above problems quickly, efficiently and accurately.
The invention aims to provide a Spark Streaming based advertisement click abnormity detection system that analyzes and filters abnormal clicks on advertisements delivered to the user side, captures valid advertisement click activity in time, and supports reasonable and effective charging for advertisement placement. Analyzing the behavior and characteristics of abnormal data also helps in analyzing user behavior and interests, provides advertisers with a practical basis for commercial planning and product decisions, and helps predict future market behavior.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. It provides a Spark Streaming based advertisement click abnormity detection system and detection method that can quickly, efficiently and accurately give advertisers a basis for commercial planning and product decisions and can predict future market trends. The technical scheme of the invention is as follows:
a Spark Streaming based advertisement click abnormity detection system comprises a data acquisition unit, a data cleaning unit, a distributed data message system, a first abnormal data detection unit, a suspected data extraction unit, a normal data and abnormal data classifier and a classified data database unit; wherein
The data acquisition unit is used for acquiring the log information of the advertisement clicked by the user;
the data cleaning unit is used for cleaning and standardizing the logs acquired by the data acquisition unit, and finally sending the standardized data to the distributed data message system to wait for consumption;
the distributed data message system is mainly used for storing the standardized data and the suspect data sent by the suspect data extraction unit, and for generating the topic data to be consumed by Spark Streaming, with different data generating their own Topic;
the first abnormal data detection unit adopts a KNN algorithm to perform quasi-real-time processing in Spark Streaming on data from the distributed data message system to obtain suspect data, abnormal data and normal data;
the suspect data extraction unit is mainly used for sending the suspect data generated by the first abnormal data detection unit back to the distributed data message system;
the normal data and abnormal data classifier adopts a naive Bayes classification method to classify the suspect data stored in the distributed data message system to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database and a Redis in-memory database. The MySQL database stores the normal data and abnormal data generated by the normal data and abnormal data classifier, and the abnormal data is mapped to the Redis in-memory database so that the naive Bayes classifier can be trained quickly. Redis is an in-memory database used only to map the MySQL database, which speeds up queries and modifications; the data is written to MySQL at a set period for permanent storage. In short, Redis is an intermediate layer used to increase speed.
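The relationship between the in-memory layer and the persistent database described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: a Python dict stands in for Redis, the standard-library `sqlite3` module stands in for MySQL, and the class name `WriteBehindStore`, the table `clicks` and the parameter `flush_every` (a period counted in writes rather than in time) are hypothetical names that do not appear in the source.

```python
import sqlite3


class WriteBehindStore:
    """In-memory front (stand-in for Redis) flushed periodically to SQL (stand-in for MySQL)."""

    def __init__(self, conn, flush_every=3):
        self.conn = conn
        self.flush_every = flush_every  # hypothetical period, counted in writes
        self.cache = {}                 # click_id -> label, served from memory
        self.pending = []               # records not yet persisted
        conn.execute(
            "CREATE TABLE IF NOT EXISTS clicks (click_id TEXT PRIMARY KEY, label TEXT)"
        )

    def put(self, click_id, label):
        # Writes land in memory first and are persisted once the period is reached.
        self.cache[click_id] = label
        self.pending.append((click_id, label))
        if len(self.pending) >= self.flush_every:
            self.flush()

    def get(self, click_id):
        # Reads are answered from memory, never from the SQL database.
        return self.cache.get(click_id)

    def flush(self):
        self.conn.executemany("INSERT OR REPLACE INTO clicks VALUES (?, ?)", self.pending)
        self.conn.commit()
        self.pending.clear()

    def abnormal_ids(self):
        # Abnormal records kept in memory, e.g. as training input for a classifier.
        return sorted(k for k, v in self.cache.items() if v == "abnormal")
```

Reads and training lookups hit only the in-memory layer, while the SQL database receives batched writes, which is one way to read the "read-write separation" mentioned in the text.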
Further, the abnormal data stored in the Redis in-memory database is also used to train the naive Bayes classifier.
Further, the data acquisition unit acquires the log information of user advertisement clicks with the log collector Flume (a distributed log collection system), and the distributed data message system is Kafka.
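Further, the KNN function of the KNN algorithm adopted by the first abnormal data detection unit (4) is as follows:
$$y(x, c_j) = \sum_{d_i \in \mathrm{KNN}(x)} \mathrm{sim}(x, d_i)\, y(d_i, c_j)$$
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, c_j is a category, and y(d_i, c_j) is 1 when d_i belongs to c_j and 0 otherwise. The similarity between the log to be classified and the example log is the cosine similarity:
$$\mathrm{sim}(x, d_i) = \frac{x \cdot d_i}{\lVert x \rVert \, \lVert d_i \rVert}$$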
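A similarity-weighted KNN vote over cosine similarity can be sketched in a few lines. This is a minimal sketch, assuming log vectors are plain lists of numbers and the training set is a list of `(vector, category)` pairs; the function names `cosine` and `knn_scores` and the default `k=3` are illustrative choices, not taken from the source.

```python
import math
from collections import defaultdict


def cosine(x, d):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, d))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_d = math.sqrt(sum(b * b for b in d))
    # A zero vector has no direction, so its similarity is defined here as 0.
    return dot / (norm_x * norm_d) if norm_x and norm_d else 0.0


def knn_scores(x, training, k=3):
    """Sum the similarity of the k most similar training logs per category.

    training: list of (vector, category) pairs.
    Returns a dict mapping each category among the k neighbours to its score.
    """
    neighbours = sorted(training, key=lambda t: cosine(x, t[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for d_i, c_j in neighbours:
        scores[c_j] += cosine(x, d_i)
    return dict(scores)
```

The category with the highest score would be the predicted class; categories that have no member among the k neighbours simply do not appear in the result.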
Further, in the KNN algorithm, the KNN classifier judges click validity on five features, each of which marks a click as abnormal: first, the same IP clicks a large number of times within a period of time; second, the clicking IP stays on the advertisement page for an almost negligible time; third, the clicking IP accesses the advertisement at times that differ from normal human activity hours; fourth, different addresses in the same IP segment repeatedly access in close synchrony; fifth, the IP's behavior and the advertisements it attends to differ from that IP's past behavior and interests, which makes it suspect. The KNN classifier is obtained by training KNN on this sample data.
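To make the feature description concrete, the following sketch derives three of the five features (click count per IP, mean dwell time, and share of clicks at unusual hours) from standardized click records. The record keys `ip`, `ts` and `dwell`, the function name `ip_features`, and the choice of 01:00 to 05:59 UTC as "unusual hours" are all assumptions made for illustration; the source does not specify a log schema or thresholds.

```python
from collections import defaultdict


def ip_features(clicks, night_hours=range(1, 6)):
    """Build a per-IP feature vector [click_count, mean_dwell, night_share].

    clicks: iterable of dicts with hypothetical keys
        'ip'    - client IP address
        'ts'    - click time as epoch seconds (UTC)
        'dwell' - seconds spent on the advertisement page
    """
    by_ip = defaultdict(list)
    for c in clicks:
        by_ip[c["ip"]].append(c)

    features = {}
    for ip, rows in by_ip.items():
        n = len(rows)
        mean_dwell = sum(r["dwell"] for r in rows) / n
        # Hour of day in UTC, derived directly from the epoch timestamp.
        night = sum(1 for r in rows if (r["ts"] // 3600) % 24 in night_hours)
        features[ip] = [n, mean_dwell, night / n]
    return features
```

In a streaming setting the same aggregation would be applied per batch or per window; the remaining two features (IP-segment synchrony and deviation from past interests) need history beyond a single batch and are left out of this sketch.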
Further, the naive Bayes function is:
$$h(x) = \arg\max_{c} P(c) \prod_{i=1}^{d} P(x_i \mid c)$$
where d is the number of attributes and x_i is the value of x on the i-th attribute.
The classifier is trained with the abnormal data mapped to Redis as samples; once per period (for example, one week), the naive Bayes classifier is retrained on a randomly extracted 20% of the abnormal data.
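The naive Bayes decision rule and the periodic 20% resampling can be illustrated with a small categorical classifier. This is a sketch under assumptions: attribute values are discrete, Laplace smoothing is added (the source does not mention smoothing), and labelled examples of every class to be predicted are assumed to be available for training. The names `train_nb`, `predict_nb` and `sample_fraction` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (attribute_tuple, label) with discrete attribute values."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, i) -> Counter of values of attribute i
    values_seen = defaultdict(set)       # i -> distinct values of attribute i
    for x, label in samples:
        for i, v in enumerate(x):
            value_counts[(label, i)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen


def predict_nb(model, x):
    """Return argmax over classes of log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)
        for i, v in enumerate(x):
            # Laplace smoothing (an added assumption) avoids log(0) for unseen values.
            lp += math.log((value_counts[(c, i)][v] + 1) / (n_c + len(values_seen[i])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def sample_fraction(records, fraction=0.2, seed=0):
    """Randomly draw a fraction of the records, e.g. 20% for periodic retraining."""
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)
```

Working in log space replaces the product over attributes with a sum, which avoids numeric underflow when d is large.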
A Spark Streaming based advertisement click abnormity detection method comprises the following steps:
1) collecting advertisement click logs of website users by using a Flume (distributed log collection system);
2) standardizing the data collected by Flume in step 1), after which Flume sends the standardized data to the Kafka message system, where the original data is defined as Topic1; Topic1 represents the data waiting to be consumed, that is, it serves as the address of that data;
3) classifying the Topic1 data to be consumed from step 2) with the KNN algorithm in the Spark Streaming quasi-real-time computing framework;
4) of the suspect data, abnormal data and normal data generated in step 3), sending the suspect data back to Kafka, where it is defined as Topic2, and storing the remaining data in the Redis in-memory database and writing it to the MySQL database, thereby realizing read-write separation for MySQL;
5) training a naive Bayes classifier on 20% of the abnormal data from step 4), randomly extracted from the MySQL database as mapped in Redis, and then classifying Topic2 in Kafka with the naive Bayes algorithm in the Spark Streaming quasi-real-time computing framework.
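The routing in steps 2) to 5) can be sketched without any cluster by letting in-process queues stand in for the Kafka topics. This is only an illustrative sketch: `run_pipeline` is a hypothetical name, the two classifiers are passed in as plain callables, and a dict stands in for the Redis/MySQL storage; none of this reflects real Kafka or Spark Streaming APIs.

```python
from collections import deque


def run_pipeline(records, first_stage, second_stage):
    """Two-stage routing: first_stage may answer 'suspect', second_stage may not.

    first_stage(record)  -> 'normal' | 'abnormal' | 'suspect'  (KNN in the text)
    second_stage(record) -> 'normal' | 'abnormal'              (naive Bayes in the text)
    """
    topic1 = deque(records)                 # stand-in for Kafka Topic1
    topic2 = deque()                        # stand-in for Kafka Topic2
    store = {"normal": [], "abnormal": []}  # stand-in for Redis/MySQL

    # Stage 1: consume Topic1, store definite results, send suspects back as Topic2.
    while topic1:
        record = topic1.popleft()
        label = first_stage(record)
        if label == "suspect":
            topic2.append(record)
        else:
            store[label].append(record)

    # Stage 2: consume Topic2 and resolve every suspect record.
    while topic2:
        record = topic2.popleft()
        store[second_stage(record)].append(record)
    return store
```

In this sketch stage 2 starts only after stage 1 has drained its queue; in a streaming deployment both consumers would run continuously on their own topics.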
Further, the KNN algorithm in step 3) is: taking the training samples as reference points, calculating the distance between the test sample and each training sample using the Euclidean distance, and taking the closest training sample as the basis for classification.
Further, the formula of the Euclidean distance used by the KNN algorithm in step 3) is as follows:
$$\mathrm{dist}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where x and y represent two different individuals, each having n-dimensional features.
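The Euclidean distance and the nearest-sample rule described above can be sketched as follows; the function names `euclidean` and `nearest` and the `(vector, label)` layout of the training set are assumptions made for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two individuals with the same number of features."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(x, training):
    """Return the label of the training sample closest to x.

    training: list of (vector, label) pairs used as reference points.
    """
    return min(training, key=lambda t: euclidean(x, t[0]))[1]
```

For example, `euclidean([0, 0], [3, 4])` evaluates to `5.0`. Because the distance is sensitive to feature scale, standardizing the features beforehand (as the data cleaning step does) keeps any single feature from dominating.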
The invention has the following advantages and beneficial effects:
In the method, Flume collects the advertisement click data delivered at the user side, the data is cleaned and standardized, and Flume sends the standardized data to the distributed message system Kafka, where it forms Topic1 awaiting subscription and consumption. The big data quasi-real-time Spark Streaming computing framework, combined with the KNN classification algorithm, classifies this data into suspect data, abnormal data and normal data. The suspect data is then sent back to Kafka to form Topic2, which Spark Streaming, combined with the naive Bayes classification algorithm, classifies into abnormal data and normal data. The classified results are finally stored in Redis and then in the MySQL database, which realizes read-write separation of the database and increases read-write speed.
Drawings
FIG. 1 is a schematic structural view of a preferred embodiment of the present invention;
FIG. 2 is a KNN classification flow chart under Spark Streaming;
FIG. 3 is a naive Bayes classification flow chart under Spark Streaming.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme of the invention is as follows:
as shown in fig. 1, an advertisement click anomaly detection system based on Spark Streaming is characterized by comprising a data acquisition unit 1, a data cleaning unit 2, a distributed data message system 3, a first anomaly data detection unit 4, a suspect data extraction unit 5, a normal data and anomaly data classifier 6 and a classified data database unit; wherein
The data acquisition unit 1 is used for acquiring the log information of the advertisement clicked by the user;
the data cleaning unit 2 is used for cleaning and standardizing the logs acquired by the data acquisition unit 1, and finally sending the standardized data to the distributed data message system 3 to wait for consumption;
the distributed data message system 3 is mainly used for storing the standardized data and the suspect data sent by the suspect data extraction unit 5, and for generating the topic data to be consumed by Spark Streaming, with different data generating their own Topic;
the first abnormal data detection unit 4 adopts a KNN algorithm to perform quasi real-time processing on data from the distributed message system 3 in Spark Streaming to obtain suspected data, abnormal data and normal data;
the suspect data extracting unit 5 is mainly used for sending the suspect data generated by the first abnormal data detecting unit 4 back to the distributed data message system 3;
the normal data and abnormal data classifier 6 classifies the suspect data stored in the distributed message system 3 by adopting a naive Bayesian classification method to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database 7 and a Redis in-memory database 8. The MySQL database 7 stores the normal data and abnormal data generated by the normal data and abnormal data classifier 6, and the abnormal data is mapped to the Redis in-memory database 8 so that the naive Bayes classifier can be trained quickly. Redis is an in-memory database used only to map the MySQL database, which speeds up queries and modifications; the data is written to MySQL at a set period for permanent storage. In short, Redis is an intermediate layer used to increase speed.
Fig. 2 is a KNN classification flowchart under Spark Streaming.
Fig. 3 is a naive bayes classification flow chart under Spark Streaming.
The KNN classifier classifies the standardized Topic1 data stored in Kafka into suspect data (data that KNN cannot classify), normal data and abnormal data. The normal data and abnormal data are stored in the database, while the suspect data is sent back to Kafka as Topic2 to await classification by the naive Bayes classifier. The naive Bayes classifier is trained on the abnormal data classified by KNN; combined with the computing power of Spark Streaming, this makes the calculation faster and the results more accurate. The classified data is finally stored.
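The text does not state how the KNN stage decides that a record "cannot be classified". One possible reading, sketched here purely as an assumption, is a margin rule: when the two best category scores are too close, the record is labelled suspect. The function name `decide` and the default `margin=0.2` are hypothetical.

```python
def decide(scores, margin=0.2):
    """Turn per-category KNN scores into 'normal', 'abnormal' or 'suspect'.

    scores: dict mapping category -> non-negative score.
    A record is 'suspect' when no category wins by at least `margin`,
    measured as (best - second) / (best + second). This rule is an assumption.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "suspect"      # no neighbours at all: nothing to decide on
    if len(ranked) == 1:
        return ranked[0][0]   # every neighbour agrees on one category
    (top, s1), (_, s2) = ranked[0], ranked[1]
    total = s1 + s2
    if total == 0 or (s1 - s2) / total < margin:
        return "suspect"
    return top
```

A larger margin sends more records to the second-stage classifier, trading first-stage throughput for fewer borderline decisions.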
With this method, after a web page user clicks an advertisement, abnormal data is filtered in real time and its characteristics and behavior are analyzed and extracted, while normal data is collected to total the advertisement placement cost, analyze user behavior and interests, draw up business plans for advertising enterprises, predict future market trends, and so on. The first classification by KNN yields three classes: suspect data, abnormal data and normal data. Naive Bayes is then trained on the abnormal data and divides the suspect data accurately, so that the data is sound; the resulting abnormal and normal data, relevant and irrelevant, provide a solid basis for accurate data mining and predictive analysis.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (7)

1. A Spark Streaming based advertisement click abnormity detection system is characterized by comprising a data acquisition unit (1), a data cleaning unit (2), a distributed data message system (3), a first abnormal data detection unit (4), a suspect data extraction unit (5), a normal data and abnormal data classifier (6) and a classified data database unit; wherein
The data acquisition unit (1) is used for acquiring log information of the advertisement clicked by the user;
the data cleaning unit (2) is used for cleaning and standardizing the logs acquired by the data acquisition unit (1), and finally sending the standardized data to the distributed data message system (3) to wait for consumption;
the distributed data message system (3) is used for storing the standardized data and also storing the suspect data sent by the suspect data extraction unit (5) to generate subject data which needs to be consumed by Spark Streaming, and different data generate respective Topic;
the first abnormal data detection unit (4) adopts a KNN algorithm to perform quasi-real-time processing on data from the distributed data message system (3) in Spark Streaming to obtain suspected data, abnormal data and normal data; the KNN function of the KNN algorithm adopted by the first abnormal data detection unit (4) is as follows:
$$y(x, c_j) = \sum_{d_i \in \mathrm{KNN}(x)} \mathrm{sim}(x, d_i)\, y(d_i, c_j)$$
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; the similarity between the log to be classified and the example log is the cosine similarity:
$$\mathrm{sim}(x, d_i) = \frac{x \cdot d_i}{\lVert x \rVert \, \lVert d_i \rVert}$$
wherein y(d_i, c_j) takes 1 when d_i belongs to c_j and 0 otherwise; the distance metric uses the Euclidean distance;
in the KNN algorithm, the click validity judged by the KNN classifier involves five kinds of sample data: the first is that the same IP clicks a large number of times within a period of time, which is abnormal; the second is that the clicking IP stays on the advertisement page for an almost negligible time, which is abnormal; the third is that the clicking IP accesses the advertisement at times that differ from normal human activity hours, which is abnormal; the fourth is that different addresses in the same IP segment repeatedly access in close synchrony, which is abnormal; the fifth is that the IP's behavior and the advertisements it attends to differ from that IP's past behavior and interests, which is suspect; this sample data is used as KNN representative data to obtain the KNN classifier;
the suspect data extraction unit (5) is used for sending the suspect data generated by the first abnormal data detection unit (4) back to the distributed data message system (3);
the normal data and abnormal data classifier (6) classifies the suspected data stored in the distributed data message system (3) by adopting a naive Bayesian classification method to obtain abnormal data and normal data;
the classification data database unit comprises a MySQL database (7) and a Redis memory database (8), wherein the MySQL database (7) is used for storing normal data and abnormal data generated by a normal data and abnormal data classifier (6), the abnormal data is mapped to the Redis memory database (8), a naive Bayesian classifier is trained, the Redis is used as the memory database, the MySQL database is only used for mapping, the query and modification speed is improved, and the data is written into MySQL in a certain period and is permanently stored.
2. The Spark Streaming based ad click anomaly detection system as claimed in claim 1, wherein said Redis in-memory database further comprises using stored anomaly data to train a naive Bayesian classifier.
3. The Spark Streaming based advertisement click anomaly detection system according to claim 1, wherein the data collection unit (1) collects the log information of user advertisement clicks with the Flume distributed log collection system, and the distributed data message system (3) is Kafka.
4. The Spark Streaming based advertisement click anomaly detection system according to claim 3, wherein said naive Bayes function is:
$$h(x) = \arg\max_{c} P(c) \prod_{i=1}^{d} P(x_i \mid c)$$
where d is the number of attributes and x_i is the value of x on the i-th attribute; the classifier is trained with the abnormal data mapped to Redis as samples, and in each period the naive Bayes classifier is retrained and updated with a randomly extracted 20% of the abnormal data.
5. A Spark Streaming based advertisement click abnormity detection method is characterized by comprising the following steps:
1) collecting advertisement click logs of website users by using a distributed log collection system Flume;
2) standardizing the data collected by Flume in step 1), after which Flume sends the standardized data to the Kafka message system, where the original data is defined as Topic1; Topic1 represents the data waiting to be consumed, that is, it serves as the address of that data;
3) classifying the data to be consumed Topic1 in the step 2) under the KNN algorithm through a Spark Streaming quasi-real-time computing frame;
4) sending the suspected data back to Kafka according to the suspected data, the abnormal data and the normal data generated in the step 3) to be defined as Topic2, storing the abnormal data and the normal data in a Redis memory database, and writing the abnormal data and the normal data into a MySQL database to realize read-write separation of MySQL;
the KNN function using the KNN algorithm is:
$$y(x, c_j) = \sum_{d_i \in \mathrm{KNN}(x)} \mathrm{sim}(x, d_i)\, y(d_i, c_j)$$
where x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; the similarity between the log to be classified and the example log is the cosine similarity:
$$\mathrm{sim}(x, d_i) = \frac{x \cdot d_i}{\lVert x \rVert \, \lVert d_i \rVert}$$
wherein y(d_i, c_j) takes 1 when d_i belongs to c_j and 0 otherwise; the distance metric uses the Euclidean distance;
in the KNN algorithm, the click validity judged by the KNN classifier involves five kinds of sample data: the first is that the same IP clicks a large number of times within a period of time, which is abnormal; the second is that the clicking IP stays on the advertisement page for an almost negligible time, which is abnormal; the third is that the clicking IP accesses the advertisement at times that differ from normal human activity hours, which is abnormal; the fourth is that different addresses in the same IP segment repeatedly access in close synchrony, which is abnormal; the fifth is that the IP's behavior and the advertisements it attends to differ from that IP's past behavior and interests, which is suspect; this sample data is used as KNN representative data to obtain the KNN classifier;
5) 20% of abnormal data in the MySQL database is randomly extracted to train a naive Bayes classifier, and then Topic2 in Kafka is classified under a naive Bayes algorithm through a Spark Streaming quasi-real-time computing framework.
6. The method for detecting abnormal advertisement clicks according to claim 5, wherein the KNN algorithm in step 3) is: and taking the training sample as a reference point, calculating the distance between the test sample and the training sample, and obtaining the closest value in the distance by adopting the Euclidean distance as a classification basis.
7. The method for detecting abnormal advertisement clicks based on Spark Streaming according to claim 6, wherein the formula of the Euclidean distance of the KNN algorithm in the step 2) is as follows:
$$\mathrm{dist}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where dist(x, y) denotes the Euclidean distance, and x and y denote two different individuals, each with n-dimensional features.
CN201610915505.XA 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming Active CN106649527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610915505.XA CN106649527B (en) 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610915505.XA CN106649527B (en) 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming

Publications (2)

Publication Number Publication Date
CN106649527A CN106649527A (en) 2017-05-10
CN106649527B true CN106649527B (en) 2021-02-09

Family

ID=58856008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610915505.XA Active CN106649527B (en) 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming

Country Status (1)

Country Link
CN (1) CN106649527B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229564B (en) * 2018-01-05 2022-08-02 创新先进技术有限公司 Data processing method, device and equipment
CN108829715B (en) * 2018-05-04 2022-03-25 慧安金科(北京)科技有限公司 Method, apparatus, and computer-readable storage medium for detecting abnormal data
CN110717771A (en) * 2018-07-11 2020-01-21 武汉斗鱼网络科技有限公司 Multi-dimensional advertisement real-time charging method, storage medium, electronic device and system
CN109388548B (en) * 2018-09-29 2020-12-22 京东数字科技控股有限公司 Method and apparatus for generating information
CN109361699A (en) * 2018-12-06 2019-02-19 四川长虹电器股份有限公司 Anomalous traffic detection method based on Spark Streaming
CN110334105B (en) * 2019-07-12 2022-09-09 河海大学常州校区 Stream data abnormity detection method based on Storm
CN111708846A (en) * 2020-05-14 2020-09-25 北京嗨学网教育科技股份有限公司 Multi-terminal data management method and device
CN112667723A (en) * 2020-12-30 2021-04-16 平安证券股份有限公司 Data acquisition method and terminal equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130005597A (en) * 2011-07-06 2013-01-16 이성진 System for preventing of cpc advertisement fraud click

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633364B2 (en) * 2010-12-30 2017-04-25 Nokia Technologies Oy Method and apparatus for detecting fraudulent advertising traffic initiated through an application
US20130325591A1 (en) * 2012-06-01 2013-12-05 Airpush, Inc. Methods and systems for click-fraud detection in online advertising
CN104765874B (en) * 2015-04-24 2019-03-26 百度在线网络技术(北京)有限公司 For detecting the method and device for clicking cheating

Also Published As

Publication number Publication date
CN106649527A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant