CN106649527A - Detection system and detection method of advertisement clicking anomaly based on Spark Streaming - Google Patents

Info

Publication number
CN106649527A
Authority
CN
China
Prior art keywords
data
abnormal
spark streaming
suspicion
click
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610915505.XA
Other languages
Chinese (zh)
Other versions
CN106649527B (en)
Inventor
刘群
谭敢锋
戴大祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201610915505.XA
Publication of CN106649527A
Application granted
Publication of CN106649527B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an advertisement click anomaly detection system and detection method based on Spark Streaming, and relates to the field of computer technology applications. Logs are collected when users click advertisements on a web page. The data collected in real time is cleaned, its field format is standardized, and Flume transfers the standardized data to the Kafka message system. The data is classified with the k-nearest-neighbor (KNN) algorithm on Spark Streaming into three classes: abnormal data, suspicious data and normal data. The abnormal and normal data are stored in a database, while the suspicious data is sent back to the Kafka message system. A Naive Bayes classifier is trained on the abnormal data and used to classify the suspicious data, and the results are saved in the database. Advertisers are charged fairly according to the amount of normal data. In addition, analysis yields the popularity of each advertisement, gives advertisers direction for business development, and provides information such as the nationwide distribution of users.

Description

Ad click abnormality detection system and detection method based on Spark Streaming
Technical field
The present invention relates to the field of computer technology applications, and specifically to an ad click anomaly detection system and detection method based on Spark Streaming.
Background
With the explosive growth of data, the era of big data has arrived. Safe, fast, real-time and efficient data processing not only allows an enterprise to avoid risks in advance, but also supplies timely data and a reliable, factual basis for enterprise development, production and research.
However, because networks are open, convenience comes together with false information, malicious access, malicious attacks and the like. Every open website faces this problem, and how to prevent it, how to extract authentic and valid data, and how to relieve servers of malicious load are research priorities for every open website. Malicious clicking on placed advertisements is a typical instance. Identifying abnormal data in time and blocking malicious clicks yields valid ad click data, gives the open website a basis for reasonable charging, effectively reduces server load, and provides advertisers with sound guidance for commercial planning and business operations. Current processing techniques are mostly based on offline batch processing. They cannot solve online problems in real time and cannot quickly supply a basis for decisions that must be made at high speed. Real-time systems such as Storm can process data in real time, but their data security and large-volume data processing performance are weaker than those of Spark Streaming. Spark is a distributed computing framework similar to MapReduce. Its core is the resilient distributed dataset, which offers a richer model than MapReduce and can iterate over a data set in memory quickly and repeatedly, supporting complex data mining and graph computation algorithms. Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data.
The advantages of Spark Streaming are:
It can run on 100+ nodes and achieve millisecond-level latency.
It uses the in-memory Spark engine as its execution engine and is therefore efficient and fault-tolerant.
It integrates with Spark's batch processing and interactive queries.
It provides a simple, batch-like interface for implementing complex algorithms.
Therefore, given the problems above, combining the existing Spark big data computing framework with powerful computer hardware and suitable machine learning algorithms can solve these problems quickly, efficiently and accurately.
One object of the present invention is to provide an ad click anomaly detection system based on Spark Streaming. It analyzes and filters abnormal clicks on advertisements delivered to the user side, so that valid ad click activity is known in time and advertisement placement is charged reasonably and effectively. It also analyzes the behavior and characteristics of abnormal data, which helps in analyzing user behavior and interests, gives advertisers a factual basis for commercial planning and for judging product suitability, and supports prediction of future market prospects.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes an ad click anomaly detection system and detection method based on Spark Streaming that can quickly, efficiently and accurately give advertisers a factual basis for commercial planning and for judging product suitability, and support prediction of future market prospects. The technical solution is as follows:
An ad click anomaly detection system based on Spark Streaming comprises a data acquisition unit, a data cleaning unit, a distributed data message system, a first abnormal-data detection unit, a suspicious-data extraction unit, a normal-data and abnormal-data classifier, and a classified-data database unit, wherein:
The data acquisition unit collects the log information generated when users click on advertisements.
The data cleaning unit cleans and standardizes the logs collected by the data acquisition unit and finally sends the standardized data to the distributed data message system, where it waits to be consumed.
The distributed data message system mainly stores the standardized data and also stores the suspicious data sent by the suspicious-data extraction unit. It generates the topic data that Spark Streaming consumes, with each kind of data assigned its own Topic.
The first abnormal-data detection unit applies the KNN algorithm to data from the distributed message system (3), processing it in near real time in Spark Streaming to obtain suspicious data, abnormal data and normal data.
The suspicious-data extraction unit sends the suspicious data produced by the first abnormal-data detection unit back to the distributed data message system.
The normal-data and abnormal-data classifier applies the Naive Bayes classification method to the suspicious data stored in the distributed message system, yielding abnormal data and normal data.
The classified-data database unit comprises a MySQL database and a Redis in-memory database. The MySQL database stores the normal data and abnormal data produced by the normal-data and abnormal-data classifier, and the abnormal data is mapped into the Redis in-memory database so that the Naive Bayes classifier can be trained quickly. Redis, being an in-memory database, only mirrors the MySQL database in order to speed up queries and modifications. At a set interval the data is written to MySQL for permanent storage. In short, Redis is a middleware layer used to improve speed.
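The division of labor between the in-memory store and the persistent database can be illustrated with a small sketch. It is only an illustration under stated assumptions: a plain Python dict stands in for Redis, the standard-library sqlite3 module stands in for MySQL, and the class name, table name and column names are hypothetical rather than taken from the patent.

```python
import sqlite3


class WriteBehindStore:
    """Fast in-memory buffer (stand-in for Redis) flushed periodically
    to a SQL database (stand-in for MySQL). All names are hypothetical."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS clicks "
            "(click_id TEXT PRIMARY KEY, label TEXT)"
        )
        self.buffer = {}  # click_id -> label, held in memory

    def put(self, click_id, label):
        # Writes only touch memory, so they stay cheap.
        self.buffer[click_id] = label

    def get(self, click_id):
        # Reads are served from memory first, then from the database.
        if click_id in self.buffer:
            return self.buffer[click_id]
        row = self.conn.execute(
            "SELECT label FROM clicks WHERE click_id = ?", (click_id,)
        ).fetchone()
        return row[0] if row else None

    def flush(self):
        # Called once per configured period to persist buffered records.
        rows = list(self.buffer.items())
        self.conn.executemany(
            "INSERT OR REPLACE INTO clicks (click_id, label) VALUES (?, ?)",
            rows,
        )
        self.conn.commit()
        self.buffer.clear()
        return len(rows)
```

In a deployment along the lines described, flush would be triggered by a timer at the configured interval, while classification results are written with put as they arrive.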
Further, the Redis in-memory database also supplies the abnormal data it stores for training the Naive Bayes classifier.
Further, the device with which the data acquisition unit collects the log information of user ad clicks is the log collector Flume (a distributed log collection system), and the distributed data message system is Kafka.
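The cleaning and standardization step that precedes delivery to Kafka might look like the following sketch. The patent does not specify a log layout, so the pipe-separated format, the field names (ip, ts, ad_id, dwell_ms) and the function names are assumptions made for illustration. Producing to Kafka is left out, and to_message only prepares the bytes that a producer would send.

```python
import json
from datetime import datetime, timezone


def standardize(raw_line):
    """Parse one raw click-log line of the assumed form
    'ip|YYYY-MM-DD HH:MM:SS|ad_id|dwell_ms'.
    Returns a normalized dict, or None if the line is malformed."""
    parts = [p.strip() for p in raw_line.split("|")]
    if len(parts) != 4:
        return None
    ip, ts, ad_id, dwell = parts
    octets = ip.split(".")
    if len(octets) != 4 or not all(
        o.isdecimal() and 0 <= int(o) <= 255 for o in octets
    ):
        return None
    try:
        # Timestamps are assumed to be UTC.
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=timezone.utc
        )
        dwell_ms = int(dwell)
    except ValueError:
        return None
    if dwell_ms < 0 or not ad_id:
        return None
    return {
        "ip": ip,
        "ts": int(when.timestamp()),
        "ad_id": ad_id.lower(),
        "dwell_ms": dwell_ms,
    }


def to_message(record):
    """Serialize a standardized record as the bytes a producer would send."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

Dropping malformed lines at this stage keeps the downstream classifiers from having to handle missing or inconsistent fields.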
Further, the first abnormal-data detection unit (4) uses a KNN function in which x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category. Similarity is measured by cosine similarity; the similarity between the log to be classified and an example log is:
$$\cos\langle x, d\rangle = \frac{x \cdot d}{|x| \cdot |d|}$$
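As a rough illustration of how cosine similarity can drive a KNN decision, the sketch below scores each category by summing the similarities of the k most similar training examples that belong to it. This similarity-weighted vote is a common formulation and is assumed here; the exact scoring function is not recoverable from the text, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(x, d):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, d))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


def knn_scores(x, training, k):
    """Sum the similarity of the k most similar examples per category.
    `training` is a list of (vector, label) pairs."""
    neighbours = sorted(
        training, key=lambda item: cosine(x, item[0]), reverse=True
    )[:k]
    scores = Counter()
    for vec, label in neighbours:
        scores[label] += cosine(x, vec)
    return scores


def knn_classify(x, training, k=3):
    """Return the category with the highest similarity-weighted score."""
    if not training:
        raise ValueError("training set is empty")
    scores = knn_scores(x, training, k)
    return max(scores, key=scores.get)
```

Because cosine similarity ignores vector length, this variant compares the direction of feature vectors, which can be convenient when features such as click counts differ widely in scale.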
Further, in the KNN algorithm, the KNN classifier judges the validity of a click using five feature vectors. The first is "the same IP produces many clicks within a period of time: abnormal". The second is "the clicking IP's dwell time on the advertisement page is almost negligible: abnormal". The third is "the clicking IP accesses the advertisement at an unusual moment, outside normal human activity hours: abnormal". The fourth is "different addresses in the same IP segment access at the same time, repeatedly and similarly: abnormal". The fifth is "the IP's behavior and the advertisements it follows are abnormal, departing from that IP's past behavior and interests: suspicious". KNN is trained on these sample data to obtain the KNN classifier.
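One way to turn the five indicators into numbers that a KNN classifier can consume is sketched below. Everything specific here is an assumption made for illustration: the record fields (ip, ts, ad_id, dwell_ms), the treatment of timestamps as UTC epoch seconds, the choice of 00:00 to 06:00 as off hours, the use of the first three octets as the IP segment, and the function name.

```python
def click_features(clicks, ip, window_start, window_end, usual_ads):
    """Compute five numeric indicators for one IP inside a time window.
    `clicks` is a list of dicts with keys ip, ts, ad_id, dwell_ms;
    `usual_ads` is the set of ad ids this IP has clicked in the past."""
    in_window = [c for c in clicks if window_start <= c["ts"] < window_end]
    mine = [c for c in in_window if c["ip"] == ip]
    if not mine:
        return [0, 0.0, 0.0, 0, 0.0]
    n = len(mine)

    # 1) number of clicks from this IP in the window
    click_count = n
    # 2) mean dwell time on the ad page, in seconds
    mean_dwell = sum(c["dwell_ms"] for c in mine) / n / 1000
    # 3) share of clicks made in the assumed off hours (00:00 to 06:00 UTC)
    off_hours = sum(1 for c in mine if (c["ts"] % 86400) // 3600 < 6) / n
    # 4) other addresses of the same segment active in the same window
    prefix = ip.rsplit(".", 1)[0]
    siblings = {
        c["ip"]
        for c in in_window
        if c["ip"] != ip and c["ip"].rsplit(".", 1)[0] == prefix
    }
    # 5) share of clicks on ads outside the IP's usual interests
    unusual = sum(1 for c in mine if c["ad_id"] not in usual_ads) / n

    return [click_count, mean_dwell, off_hours, len(siblings), unusual]
```

In practice the five values would probably need scaling to a common range before distances or similarities are computed, since a raw click count can dominate the fractions.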
Further, the Naive Bayes function is:
$$h_{nb}(x) = \arg\max_{c \in \mathcal{Y}} P(c) \prod_{i=1}^{d} P(x_i \mid c)$$
where d is the number of attributes and x_i is the value of x on the i-th attribute.
The classifier is trained using the abnormal data mapped into Redis as samples. In each cycle, for example one week, a randomly drawn 20% of the abnormal data is used to retrain and update the Naive Bayes classifier.
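The decision rule above and the periodic retraining on a random 20% sample can be sketched as follows. This is a minimal illustration for categorical attributes, not the patented implementation. Laplace smoothing is added so that unseen attribute values do not produce zero probabilities, and the training set is assumed to contain examples of every class; both points are assumptions beyond the text, as are the class and function names.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes: argmax over c of P(c) * prod_i P(x_i | c)."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)  # (class, attr) -> counts
        self.values = defaultdict(set)            # attr -> values seen
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(c, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log-probabilities avoid underflow when d is large.
            score = math.log(n_c / self.total)
            for i, v in enumerate(x):
                num = self.value_counts[(c, i)][v] + 1  # Laplace smoothing
                den = n_c + len(self.values[i])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


def sample_fraction(records, fraction=0.2, seed=None):
    """Draw a random fraction of the records for one retraining cycle."""
    if not records:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)
```

A weekly job could call sample_fraction on the abnormal records held in the in-memory store, refit the classifier, and swap the new model in for the next streaming batches.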
An ad click anomaly detection method based on Spark Streaming comprises the following steps:
1) Collect the ad click logs of website users with Flume (a distributed log collection system).
2) Standardize the data collected by Flume in step 1), then have Flume send the standardized data to the Kafka message system, where this raw data is defined as Topic1. Topic1 denotes data waiting to be consumed, that is, it effectively defines the address of this kind of data.
3) Classify the waiting data of Topic1 from step 2) with the KNN algorithm in the Spark Streaming near-real-time computing framework.
4) From the suspicious data, abnormal data and normal data generated in step 3), send the suspicious data back to Kafka, where it is defined as Topic2. Store the remaining data in the Redis in-memory database and then write it to the MySQL database, achieving read/write separation for MySQL.
5) Following step 4), randomly draw 20% of the abnormal data held in Redis, which mirrors the MySQL database, to train the Naive Bayes classifier, then classify Topic2 in Kafka with the Naive Bayes algorithm in the Spark Streaming near-real-time computing framework.
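The routing logic of steps 3) to 5) can be sketched without any streaming infrastructure. In the illustration below, in-memory queues stand in for the Kafka topics and a list stands in for the Redis/MySQL storage, and the two classifiers are passed in as plain functions. All names are hypothetical, and a real deployment would run each stage as a Spark Streaming job rather than a loop.

```python
from collections import deque


def run_pipeline(records, first_pass, second_pass):
    """Route records through the two-stage classification.
    first_pass returns 'normal', 'abnormal' or 'suspicious';
    second_pass returns 'normal' or 'abnormal'."""
    topic1 = deque(records)  # stand-in for Kafka Topic1
    topic2 = deque()         # stand-in for Kafka Topic2
    store = []               # stand-in for Redis followed by MySQL

    # Stage 1: the KNN-style first pass decides or defers.
    while topic1:
        rec = topic1.popleft()
        label = first_pass(rec)
        if label == "suspicious":
            topic2.append(rec)  # sent back for the second classifier
        else:
            store.append((rec, label))

    # Stage 2: the Naive-Bayes-style second pass settles deferred records.
    while topic2:
        rec = topic2.popleft()
        store.append((rec, second_pass(rec)))

    return store
```

Keeping the deferred records in a separate topic lets the second stage be scaled and retrained independently of the first.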
Further, the KNN algorithm in step 3) is as follows: the training samples serve as reference points, the distance between the test sample and each training sample is computed using the Euclidean distance, and the nearest values are taken as the basis for classification.
Further, the formula for the Euclidean distance of the KNN algorithm described in step 2) is:
$$dist(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where x and y denote two different individuals, each with n-dimensional features.
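A direct transcription of the Euclidean distance, together with the simplest nearest-neighbor lookup built on it, is given below as a sketch. The function names are hypothetical, and the single-neighbor rule is only the k = 1 special case of KNN.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two n-dimensional feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_label(sample, training):
    """Return the label of the training example closest to `sample`.
    `training` is a list of (vector, label) pairs."""
    if not training:
        raise ValueError("training set is empty")
    _, label = min(training, key=lambda item: euclidean(sample, item[0]))
    return label
```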
Advantages and beneficial effects of the present invention:
The present invention uses Flume to collect click data for advertisements delivered to the user side and cleans and standardizes the data. Flume sends the standardized data to the distributed message system Kafka, where it forms Topic1 and waits to be subscribed to and consumed. The big data near-real-time stream computing framework Spark Streaming, combined with the KNN classification algorithm, classifies the data as suspicious, abnormal or normal. The suspicious data is sent back to Kafka to form Topic2, and Spark Streaming, this time combined with the Naive Bayes classification algorithm, classifies Topic2 into abnormal data and normal data. The final classification results of these steps are stored in Redis and then in the MySQL database, which achieves read/write separation for the database and increases read and write speed.
Brief description of the drawings
Fig. 1 is a structural diagram of a preferred embodiment of the present invention;
Fig. 2 is a flow chart of KNN classification under Spark Streaming;
Fig. 3 is a flow chart of Naive Bayes classification under Spark Streaming.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution is as follows:
As shown in Fig. 1, an ad click anomaly detection system based on Spark Streaming comprises a data acquisition unit 1, a data cleaning unit 2, a distributed data message system 3, a first abnormal-data detection unit 4, a suspicious-data extraction unit 5, a normal-data and abnormal-data classifier 6, and a classified-data database unit, wherein:
The data acquisition unit 1 collects the log information generated when users click on advertisements.
The data cleaning unit 2 cleans and standardizes the logs collected by the data acquisition unit 1 and finally sends the standardized data to the distributed data message system 3, where it waits to be consumed.
The distributed data message system 3 mainly stores the standardized data and also stores the suspicious data sent by the suspicious-data extraction unit. It generates the topic data that Spark Streaming consumes, with each kind of data assigned its own Topic.
The first abnormal-data detection unit 4 applies the KNN algorithm to data from the distributed message system 3, processing it in near real time in Spark Streaming to obtain suspicious data, abnormal data and normal data.
The suspicious-data extraction unit 5 sends the suspicious data produced by the first abnormal-data detection unit 4 back to the distributed data message system 3.
The normal-data and abnormal-data classifier 6 applies the Naive Bayes classification method to the suspicious data stored in the distributed message system 3, yielding abnormal data and normal data.
The classified-data database unit comprises a MySQL database 7 and a Redis in-memory database 8. The MySQL database 7 stores the normal data and abnormal data produced by the normal-data and abnormal-data classifier 6, and the abnormal data is mapped into the Redis in-memory database so that the Naive Bayes classifier can be trained quickly. Redis, being an in-memory database, only mirrors the MySQL database in order to speed up queries and modifications. At a set interval the data is written to MySQL for permanent storage. In short, Redis is a middleware layer used to improve speed.
Fig. 2 is a flow chart of KNN classification under Spark Streaming.
Fig. 3 is a flow chart of Naive Bayes classification under Spark Streaming.
The KNN classifier classifies the standardized Topic1 data stored in Kafka, generating suspicious data (data that KNN cannot classify), normal data and abnormal data. The normal and abnormal data are stored in the database, while the suspicious data is sent back to Kafka to form Topic2, where it awaits classification by the Naive Bayes classifier. The Naive Bayes classifier is trained on the abnormal data classified by KNN. Combined with the strong computing capacity of Spark Streaming, computation becomes faster and results more accurate, and the classified data is finally stored.
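The text does not say how KNN decides that a record cannot be classified. One plausible reading, assumed here purely for illustration, is that a record is marked suspicious when the vote among its k nearest neighbors is not decisive. The threshold min_share, the default values and the function name are hypothetical.

```python
import math
from collections import Counter


def three_way_knn(x, training, k=5, min_share=0.8):
    """Return 'normal' or 'abnormal' when the k nearest neighbours agree
    strongly enough, otherwise 'suspicious'.
    `training` is a list of (vector, label) pairs."""
    if not training:
        raise ValueError("training set is empty")
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Defer to the second-stage classifier when the vote is too close.
    if count / len(nearest) >= min_share:
        return label
    return "suspicious"
```

Raising min_share sends more records to the second stage, trading extra Naive Bayes work for fewer confident mistakes in the first pass.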
After a web user clicks on an advertisement, the present invention filters abnormal data in real time and analyzes and extracts the characteristics and behavior of the abnormal data. It collects normal data, totals it to calculate advertisement placement fees, analyzes user behavior and interests, and helps advertisers formulate business plans and predict future market prospects. A first KNN classification yields three classes: suspicious data, abnormal data and normal data. Naive Bayes is then trained on the abnormal data so that the suspicious data can be divided accurately, which ensures that the data is sound. Abnormal and normal data, and relevant and irrelevant data, provide strong support for accurate data mining and predictive analysis.
The above embodiments should be understood as merely illustrating the present invention, not limiting its scope of protection. After reading the content of the present invention, a skilled person can make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.

Claims (9)

1. An ad click anomaly detection system based on Spark Streaming, characterized by comprising a data acquisition unit (1), a data cleaning unit (2), a distributed data message system (3), a first abnormal-data detection unit (4), a suspicious-data extraction unit (5), a normal-data and abnormal-data classifier (6), and a classified-data database unit, wherein:
the data acquisition unit (1) collects the log information generated when users click on advertisements;
the data cleaning unit (2) cleans and standardizes the logs collected by the data acquisition unit (1) and finally sends the standardized data to the distributed data message system (3), where it waits to be consumed;
the distributed data message system (3) mainly stores the standardized data and also stores the suspicious data sent by the suspicious-data extraction unit, and generates the topic data that Spark Streaming consumes, with each kind of data assigned its own Topic;
the first abnormal-data detection unit (4) applies the KNN algorithm to data from the distributed message system (3), processing it in near real time in Spark Streaming to obtain suspicious data, abnormal data and normal data;
the suspicious-data extraction unit (5) sends the suspicious data produced by the first abnormal-data detection unit (4) back to the distributed data message system (3);
the normal-data and abnormal-data classifier (6) applies the Naive Bayes classification method to the suspicious data stored in the distributed message system (3), yielding abnormal data and normal data;
the classified-data database unit comprises a MySQL database (7) and a Redis in-memory database (8), wherein the MySQL database (7) stores the normal data and abnormal data produced by the normal-data and abnormal-data classifier (6), and the abnormal data is mapped into the Redis in-memory database so that the Naive Bayes classifier can be trained quickly; Redis, being an in-memory database, only mirrors the MySQL database in order to speed up queries and modifications, and at a set interval the data is written to MySQL for persistent storage.
2. The ad click anomaly detection system based on Spark Streaming according to claim 1, characterized in that the Redis in-memory database also supplies the abnormal data it stores for training the Naive Bayes classifier.
3. The ad click anomaly detection system based on Spark Streaming according to claim 1, characterized in that the device with which the data acquisition unit (1) collects the log information of user ad clicks is the log collector Flume, a distributed log collection system, and the distributed data message system is Kafka.
4. The ad click anomaly detection system based on Spark Streaming according to claim 1, characterized in that the first abnormal-data detection unit (4) uses a KNN function in which x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a category; similarity is measured by cosine similarity, and the similarity between the log to be classified and an example log is:
$$\cos\langle x, d\rangle = \frac{x \cdot d}{|x| \cdot |d|}$$
where the membership value of d is taken as 1 when d belongs to c_j and as 0 otherwise; the distance metric is the Euclidean distance.
5. The ad click anomaly detection system based on Spark Streaming according to claim 3, characterized in that, in the KNN algorithm, the KNN classifier judges the validity of a click using five feature vectors: the first is "the same IP produces many clicks within a period of time: abnormal"; the second is "the clicking IP's dwell time on the advertisement page is almost negligible: abnormal"; the third is "the clicking IP accesses the advertisement at an unusual moment, outside normal human activity hours: abnormal"; the fourth is "different addresses in the same IP segment access at the same time, repeatedly and similarly: abnormal"; the fifth is "the IP's behavior and the advertisements it follows are abnormal, departing from that IP's past behavior and interests: suspicious"; KNN is trained on these sample data to obtain the KNN classifier.
6. The ad click anomaly detection system based on Spark Streaming according to claim 3, characterized in that the Naive Bayes function is:
$$h_{nb}(x) = \arg\max_{c \in \mathcal{Y}} P(c) \prod_{i=1}^{d} P(x_i \mid c)$$
where d is the number of attributes and x_i is the value of x on the i-th attribute;
the classifier is trained using the abnormal data mapped into Redis as samples, and in each cycle, for example one week, a randomly drawn 20% of the abnormal data is used to retrain and update the Naive Bayes classifier.
7. An ad click anomaly detection method based on Spark Streaming, characterized by comprising the following steps:
1) collecting the ad click logs of website users with the distributed log collection system Flume;
2) standardizing the data collected by Flume in step 1), then having Flume send the standardized data to the Kafka message system, where this raw data is defined as Topic1, Topic1 denoting data waiting to be consumed, that is, effectively defining the address of this kind of data;
3) classifying the waiting data of Topic1 from step 2) with the KNN algorithm in the Spark Streaming near-real-time computing framework;
4) from the suspicious data, abnormal data and normal data generated in step 3), sending the suspicious data back to Kafka, where it is defined as Topic2, storing the remaining data in the Redis in-memory database and then writing it to the MySQL database, thereby achieving read/write separation for MySQL;
5) following step 4), randomly drawing 20% of the abnormal data held in Redis, which mirrors the MySQL database, to train the Naive Bayes classifier, then classifying Topic2 in Kafka with the Naive Bayes algorithm in the Spark Streaming near-real-time computing framework.
8. The ad click anomaly detection method based on Spark Streaming according to claim 7, characterized in that the KNN algorithm in step 3) is as follows: the training samples serve as reference points, the distance between the test sample and each training sample is computed using the Euclidean distance, and the nearest values are taken as the basis for classification.
9. The ad click anomaly detection method based on Spark Streaming according to claim 8, characterized in that the formula for the Euclidean distance of the KNN algorithm described in step 2) is:
$$dist(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where x and y denote two different individuals, each with n-dimensional features.
CN201610915505.XA 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming Active CN106649527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610915505.XA CN106649527B (en) 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming

Publications (2)

Publication Number Publication Date
CN106649527A (en) 2017-05-10
CN106649527B (en) 2021-02-09

Family

ID=58856008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610915505.XA Active CN106649527B (en) 2016-10-20 2016-10-20 Advertisement click abnormity detection system and detection method based on Spark Streaming

Country Status (1)

Country Link
CN (1) CN106649527B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229564A (en) * 2018-01-05 2018-06-29 阿里巴巴集团控股有限公司 A kind of processing method of data, device and equipment
CN108829715A (en) * 2018-05-04 2018-11-16 慧安金科(北京)科技有限公司 For detecting the method, equipment and computer readable storage medium of abnormal data
CN109361699A (en) * 2018-12-06 2019-02-19 四川长虹电器股份有限公司 Anomalous traffic detection method based on Spark Streaming
CN109388548A (en) * 2018-09-29 2019-02-26 北京京东金融科技控股有限公司 Method and apparatus for generating information
CN110334105A (en) * 2019-07-12 2019-10-15 河海大学常州校区 A kind of flow data Outlier Detection Algorithm based on Storm
CN110717771A (en) * 2018-07-11 2020-01-21 武汉斗鱼网络科技有限公司 Multi-dimensional advertisement real-time charging method, storage medium, electronic device and system
CN111708846A (en) * 2020-05-14 2020-09-25 北京嗨学网教育科技股份有限公司 Multi-terminal data management method and device
CN112667723A (en) * 2020-12-30 2021-04-16 平安证券股份有限公司 Data acquisition method and terminal equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173315A1 (en) * 2010-12-30 2012-07-05 Nokia Corporation Method and apparatus for detecting fraudulent advertising traffic initiated through an application
KR20130005597A (en) * 2011-07-06 2013-01-16 이성진 System for preventing of cpc advertisement fraud click
US20130325591A1 (en) * 2012-06-01 2013-12-05 Airpush, Inc. Methods and systems for click-fraud detection in online advertising
CN104765874A (en) * 2015-04-24 2015-07-08 百度在线网络技术(北京)有限公司 Method and device for detecting click-cheating

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林穗 等: "基于 Spark 的线性模型在广告投放系统中的应用研究", 《广东工业大学学报》 *
董亚楠 等: "点击欺诈群体检测与发现", 《计算机应用研究》 *

Also Published As

Publication number Publication date
CN106649527B (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant