CN108415931A - Method and system for establishing a model for identifying cheating traffic

Publication number
CN108415931A
Authority
CN
China
Prior art keywords
flow
cheating
feature
traffic classification
sorted lists
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810059065.1A
Other languages
Chinese (zh)
Other versions
CN108415931B (en)
Inventor
郭昊
欧阳辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Friends Of Interactive Information Technology Co Ltd
Original Assignee
Beijing Friends Of Interactive Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Friends Of Interactive Information Technology Co Ltd
Priority to CN201810059065.1A
Publication of CN108415931A
Application granted
Publication of CN108415931B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Abstract

The invention discloses a method and system for establishing a model for identifying cheating traffic. The method includes: obtaining a plurality of traffic records; extracting cheating features of the traffic; establishing a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types; extracting the network addresses ranked within the top first preset proportion; marking first cheating traffic; extracting the top-level domains ranked within the top second preset proportion; marking second cheating traffic; extracting the advertisement types ranked within the top third preset proportion; marking third cheating traffic; determining whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic; if so, determining that traffic to be cheating traffic; if not, determining it to be normal traffic; and training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model. The invention can satisfy the DSP environment and improves the robustness of cheating traffic identification.

Description

Method and system for establishing a model for identifying cheating traffic
Technical field
The present invention relates to the field of Internet advertising technology, and in particular to a method and system for establishing a model for identifying cheating traffic.
Background
Anti-cheating has always been a key issue in the Internet advertising industry. For each piece of traffic, a demand-side platform (DSP) needs to determine in real time whether it is cheating traffic, in order to decide whether to bid. A DSP connects to one or more ad exchanges and must therefore discriminate complex traffic with high robustness. Currently, a common anti-cheating method in advertising is to establish a classification model, train it with positive and negative samples to obtain a trained model, and use the trained model to identify cheating traffic. However, because a DSP cannot directly obtain cheating traffic samples, classification models established by existing methods cannot satisfy the DSP environment, which results in low robustness.
Summary of the invention
In view of this, it is necessary to provide a method and system for establishing a model for identifying cheating traffic, so as to satisfy the DSP environment and improve the robustness of cheating traffic identification.
To achieve the above object, the present invention provides the following solutions:
A method for establishing a model for identifying cheating traffic, including:
Obtaining a plurality of traffic records;
Extracting cheating features of the traffic, the cheating features including the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type;
According to the cheating features of the traffic, establishing a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types;
Extracting the network addresses ranked within the top first preset proportion of the ranked list of ad request counts for different network addresses;
Marking the traffic corresponding to the network addresses ranked within the top first preset proportion as first cheating traffic;
Extracting the top-level domains ranked within the top second preset proportion of the ranked list of ad request counts for different top-level domains;
Marking the traffic corresponding to the top-level domains ranked within the top second preset proportion as second cheating traffic;
Extracting the advertisement types ranked within the top third preset proportion of the ranked list of request counts for different advertisement types;
Marking the traffic corresponding to the advertisement types ranked within the top third preset proportion as third cheating traffic;
Determining whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic;
If so, determining the same traffic to be cheating traffic;
If not, determining the first cheating traffic, the second cheating traffic, and the third cheating traffic to be normal traffic;
Training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
Optionally, establishing, according to the cheating features of the traffic, the ranked list of ad request counts for different network addresses, the ranked list of ad request counts for different top-level domains, and the ranked list of request counts for different advertisement types specifically includes:
Counting the number of requests corresponding to each cheating feature within a preset time period;
Sorting the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for different network addresses, the ranked list of ad request counts for different top-level domains, and the ranked list of request counts for different advertisement types.
Optionally, training the traffic classification model with the cheating traffic and the normal traffic to obtain the trained traffic classification model, which is used to identify traffic to be tested, specifically includes:
Establishing a traffic classification model using a decision tree algorithm;
Extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
Inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and determining whether the traffic classification model classifies them correctly;
If not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determining whether the traffic classification model classifies them correctly;
If so, determining the traffic classification model to be the trained traffic classification model.
Optionally, the method of identifying traffic to be tested using the trained traffic classification model is:
Extracting the cheating features of the traffic to be tested;
Inputting the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
Determining, according to the output result, whether the traffic to be tested is cheating traffic.
The present invention also provides a system for establishing a model for identifying cheating traffic, including:
An acquisition module, configured to obtain a plurality of traffic records;
A cheating feature extraction module, configured to extract cheating features of the traffic, the cheating features including the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type;
A ranked list establishment module, configured to establish, according to the cheating features of the traffic, a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types;
A first extraction module, configured to extract the network addresses ranked within the top first preset proportion of the ranked list of ad request counts for different network addresses;
A first marking module, configured to mark the traffic corresponding to the network addresses ranked within the top first preset proportion as first cheating traffic;
A second extraction module, configured to extract the top-level domains ranked within the top second preset proportion of the ranked list of ad request counts for different top-level domains;
A second marking module, configured to mark the traffic corresponding to the top-level domains ranked within the top second preset proportion as second cheating traffic;
A third extraction module, configured to extract the advertisement types ranked within the top third preset proportion of the ranked list of request counts for different advertisement types;
A third marking module, configured to mark the traffic corresponding to the advertisement types ranked within the top third preset proportion as third cheating traffic;
A judgment module, configured to determine whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic; if so, to determine the same traffic to be cheating traffic; if not, to determine the first cheating traffic, the second cheating traffic, and the third cheating traffic to be normal traffic;
A training module, configured to train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
Optionally, the ranked list establishment module specifically includes:
A statistics unit, configured to count the number of requests corresponding to each cheating feature within a preset time period;
A sorting unit, configured to sort the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for different network addresses, the ranked list of ad request counts for different top-level domains, and the ranked list of request counts for different advertisement types.
Optionally, the training module specifically includes:
A classification model establishment unit, configured to establish a traffic classification model using a decision tree algorithm;
A cheating feature extraction unit, configured to extract the cheating features of the cheating traffic and the cheating features of the normal traffic;
A first judgment unit, configured to input the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determine whether the traffic classification model classifies them correctly;
An adjustment unit, configured to, if not, adjust the parameters of the traffic classification model and return to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determining whether the traffic classification model classifies them correctly;
A classification model determination unit, configured to, if so, determine the traffic classification model to be the trained traffic classification model.
Optionally, the system further includes an identification module, configured to identify traffic to be tested using the trained traffic classification model. The identification module specifically includes:
An extraction unit, configured to extract the cheating features of the traffic to be tested;
A result acquisition unit, configured to input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
A second judgment unit, configured to determine, according to the output result, whether the traffic to be tested is cheating traffic.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention proposes a method and system for establishing a model for identifying cheating traffic. The method includes: obtaining a plurality of traffic records; extracting cheating features of the traffic; according to the cheating features of the traffic, establishing a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types; extracting the network addresses ranked within the top first preset proportion and marking the corresponding traffic as first cheating traffic; extracting the top-level domains ranked within the top second preset proportion and marking the corresponding traffic as second cheating traffic; extracting the advertisement types ranked within the top third preset proportion and marking the corresponding traffic as third cheating traffic; determining whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic; if so, determining the same traffic to be cheating traffic; if not, determining the first cheating traffic, the second cheating traffic, and the third cheating traffic to be normal traffic; and training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model. The method or system of the present invention can satisfy the DSP environment and improves the robustness of cheating traffic identification.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for establishing a model for identifying cheating traffic according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a system for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention uses hypothesis testing from statistics to provide a method for establishing a model for identifying cheating traffic, and further uses the established model to identify and label traffic to be tested online in real time.
Hypothesis testing is one of the classical methods of statistical inference. Its main idea is as follows: two causes, A and B, may lead to the same result and need to be distinguished. First fix an error rate (typically 5%); then, in the sample distribution under cause A, select the 5% probability interval that is most likely to be due to cause B. If a sample falls into that interval, it is considered to be caused by B; otherwise it is considered to be caused by A.
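As a rough illustration of the rejection-region idea described above, the following Python sketch picks an upper-tail cutoff from an empirical sample and tests whether a value falls in that tail. The function names, the integer `top_percent` parameter, the rounding rule, and the sample data are assumptions made for illustration; the source does not specify how the interval is computed.

```python
def upper_tail_cutoff(values, top_percent=5):
    """Return the smallest value that still lies in the top `top_percent` of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values, reverse=True)
    # Ceiling division with integers; keep at least one element (assumed rounding rule).
    k = max(1, -(-len(ordered) * top_percent // 100))
    return ordered[k - 1]


def in_rejection_region(x, cutoff):
    """A sample at or above the cutoff is attributed to cause B, otherwise to cause A."""
    return x >= cutoff


# Hypothetical sample: 100 observed request counts, 1..100.
counts = list(range(1, 101))
cutoff = upper_tail_cutoff(counts, top_percent=5)
```

With this hypothetical sample the cutoff is the fifth-largest value, so only the five largest counts fall into the region attributed to cause B.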
Using the idea of hypothesis testing, this embodiment provides a method for establishing a model for identifying cheating traffic. Fig. 1 is a flowchart of a method for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
Referring to Fig. 1, the method of this embodiment for establishing a model for identifying cheating traffic includes:
Step S1: Obtain a plurality of traffic records.
Step S2: Extract cheating features of the traffic, the cheating features including the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type.
Step S3: According to the cheating features of the traffic, establish a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types.
This step specifically includes:
Counting the number of requests corresponding to each cheating feature within a preset time period;
Sorting the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for different network addresses, the ranked list of ad request counts for different top-level domains, and the ranked list of request counts for different advertisement types.
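Steps S1 to S3 can be sketched as follows. This is a minimal illustration, assuming each traffic record is a dictionary with hypothetical keys `ts` (timestamp), `ip`, `domain`, and `ad_type`; the source does not define a record format or the length of the preset time period.

```python
from collections import Counter


def rank_by_request_count(records, field, start, end):
    """Count requests per value of `field` within [start, end) and rank them high to low."""
    counts = Counter(r[field] for r in records if start <= r["ts"] < end)
    # most_common() returns (value, count) pairs ordered from highest to lowest count.
    return counts.most_common()


# Hypothetical traffic records; the last one falls outside the time window.
records = [
    {"ts": 10, "ip": "10.0.0.1", "domain": "a.example", "ad_type": "banner"},
    {"ts": 20, "ip": "10.0.0.1", "domain": "b.example", "ad_type": "video"},
    {"ts": 30, "ip": "10.0.0.2", "domain": "a.example", "ad_type": "banner"},
    {"ts": 99, "ip": "10.0.0.2", "domain": "a.example", "ad_type": "banner"},
]

ip_ranking = rank_by_request_count(records, "ip", start=0, end=60)
domain_ranking = rank_by_request_count(records, "domain", start=0, end=60)
ad_type_ranking = rank_by_request_count(records, "ad_type", start=0, end=60)
```

Each ranking is a list of `(value, count)` pairs, which is the shape the later extraction steps would need.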
Step S4: Extract the network addresses ranked within the top first preset proportion of the ranked list of ad request counts for different network addresses.
In this embodiment, the top 5% of network addresses in the ranked list of ad request counts for different network addresses are extracted.
Step S5: Mark the traffic corresponding to the network addresses ranked within the top first preset proportion as first cheating traffic.
In this embodiment, the traffic corresponding to the top 5% of network addresses in the ranked list of ad request counts for different network addresses is marked as first cheating traffic.
Step S6: Extract the top-level domains ranked within the top second preset proportion of the ranked list of ad request counts for different top-level domains.
In this embodiment, the top 3% of top-level domains in the ranked list of ad request counts for different top-level domains are extracted.
Step S7: Mark the traffic corresponding to the top-level domains ranked within the top second preset proportion as second cheating traffic.
In this embodiment, the traffic corresponding to the top 3% of top-level domains in the ranked list of ad request counts for different top-level domains is marked as second cheating traffic.
Step S8: Extract the advertisement types ranked within the top third preset proportion of the ranked list of request counts for different advertisement types.
In this embodiment, the top 8% of advertisement types in the ranked list of request counts for different advertisement types are extracted.
Step S9: Mark the traffic corresponding to the advertisement types ranked within the top third preset proportion as third cheating traffic.
In this embodiment, the traffic corresponding to the top 8% of advertisement types in the ranked list of request counts for different advertisement types is marked as third cheating traffic.
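Steps S4 to S9 apply the same two operations to each ranked list: take the top preset proportion, then mark the matching traffic. The sketch below shows this for network addresses with the embodiment's 5% proportion. The helper names, the record format, and the rounding rule (round up, keep at least one entry) are assumptions; the source does not say how a fractional count is rounded.

```python
def top_keys(ranked, percent):
    """Return the keys in the top `percent` of a high-to-low ranked list of (key, count)."""
    if not ranked:
        return set()
    # Integer ceiling division; at least one key is kept (assumed rounding rule).
    k = max(1, -(-len(ranked) * percent // 100))
    return {key for key, _count in ranked[:k]}


def mark_records(records, field, flagged):
    """Return the indices of records whose `field` value is among the flagged keys."""
    return {i for i, r in enumerate(records) if r[field] in flagged}


# Hypothetical ranked list of 40 network addresses, highest request count first.
ip_ranking = [(f"10.0.0.{i}", 1000 - i) for i in range(40)]
top_ips = top_keys(ip_ranking, 5)  # 5% of 40 addresses -> 2 addresses

records = [{"ip": "10.0.0.0"}, {"ip": "10.0.0.1"}, {"ip": "10.0.0.39"}]
first_cheating = mark_records(records, "ip", top_ips)
```

The second and third cheating traffic would be obtained the same way, calling `top_keys` with 3 on the top-level domain ranking and with 8 on the advertisement type ranking.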
Step S10: Determine whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic.
If so, execute step S11.
Step S11: Determine the same traffic to be cheating traffic.
If not, execute step S12.
Step S12: Determine the first cheating traffic, the second cheating traffic, and the third cheating traffic to be normal traffic.
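One way to read steps S10 to S12 is as a set intersection: a traffic record is labelled cheating only if it appears in all three marked sets, and everything else is labelled normal. The sketch below follows that reading. Treating records that were never marked as normal traffic is an assumption, as are the function name and the use of record indices.

```python
def label_traffic(n_records, first, second, third):
    """Label each record 1 (cheating) if it is in all three marked sets, else 0 (normal)."""
    cheating = first & second & third
    return [1 if i in cheating else 0 for i in range(n_records)]


# Hypothetical index sets produced by the three marking steps.
first_cheating = {0, 1, 2}
second_cheating = {1, 2, 3}
third_cheating = {1, 2, 4}

labels = label_traffic(5, first_cheating, second_cheating, third_cheating)
```

Requiring all three marks makes the cheating label conservative, which keeps the automatically generated positive samples comparatively clean at the cost of labelling some suspicious traffic as normal.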
Step S13: Train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model.
This step specifically includes:
Establishing a traffic classification model using a decision tree algorithm;
Extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
Inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and determining whether the traffic classification model classifies them correctly;
If not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determining whether the traffic classification model classifies them correctly;
If so, determining the traffic classification model to be the trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
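The source names a decision tree algorithm but gives no further detail. As a deliberately simplified stand-in, the sketch below fits a depth-one tree (a decision stump) in pure Python by searching for the single feature threshold with the lowest weighted Gini impurity, then checks its accuracy on the training data. The feature layout (request counts for the record's network address, top-level domain, and advertisement type), the function names, and the data are all hypothetical; the exhaustive threshold search only loosely corresponds to the adjust-and-recheck loop described above.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(X, y):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                # Each side predicts its majority label (ties go to 1).
                left_label = int(2 * sum(left) >= len(left))
                right_label = int(2 * sum(right) >= len(right))
                best = (score, j, t, left_label, right_label)
    if best is None:
        raise ValueError("no valid split found")
    _, j, t, left_label, right_label = best
    return {"feature": j, "threshold": t, "left": left_label, "right": right_label}


def predict(stump, row):
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


def accuracy(stump, X, y):
    return sum(predict(stump, r) == label for r, label in zip(X, y)) / len(y)


# Hypothetical features per record: [address count, domain count, ad type count].
X = [[900, 800, 700], [850, 900, 650], [20, 30, 700],
     [15, 900, 40], [10, 25, 30], [30, 20, 650]]
y = [1, 1, 0, 0, 0, 0]  # 1 = cheating, 0 = normal

stump = fit_stump(X, y)
train_accuracy = accuracy(stump, X, y)
```

A production system would more likely use a full decision tree implementation from an established library; the stump is used here only to keep the example dependency-free.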
In this embodiment, traffic to be tested is identified using the trained traffic classification model described above. The specific method is:
Deploy or update the trained traffic classification model online; extract the cheating features of the traffic to be tested; input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result; and determine, according to the output result, whether the traffic to be tested is cheating traffic.
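The online identification flow described above can be sketched as a lookup followed by a model call. In this sketch the trained model is represented by any callable that maps a feature list to 0 or 1, and the request-count tables are plain dictionaries; both choices, along with the field names and the placeholder threshold model, are assumptions for illustration.

```python
def extract_features(request, ip_counts, domain_counts, ad_type_counts):
    """Look up the request counts for this request's address, domain, and ad type."""
    return [
        ip_counts.get(request["ip"], 0),
        domain_counts.get(request["domain"], 0),
        ad_type_counts.get(request["ad_type"], 0),
    ]


def identify(request, model, ip_counts, domain_counts, ad_type_counts):
    """Return a copy of the request labelled with the model's cheating decision."""
    features = extract_features(request, ip_counts, domain_counts, ad_type_counts)
    return {**request, "is_cheating": bool(model(features))}


# Hypothetical count tables and a placeholder model standing in for the trained one.
ip_counts = {"10.0.0.1": 900, "10.0.0.2": 12}
domain_counts = {"a.example": 800, "b.example": 30}
ad_type_counts = {"banner": 700, "video": 40}


def placeholder_model(features):
    return 1 if features[0] > 30 else 0


suspicious = identify({"ip": "10.0.0.1", "domain": "a.example", "ad_type": "banner"},
                      placeholder_model, ip_counts, domain_counts, ad_type_counts)
ordinary = identify({"ip": "10.0.0.2", "domain": "b.example", "ad_type": "video"},
                    placeholder_model, ip_counts, domain_counts, ad_type_counts)
```

The returned label is what a downstream bidding algorithm would consume when deciding whether to bid.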
Cheating traffic is identified using the above method: each piece of traffic to be tested is checked for cheating and labelled, for use by subsequent algorithms.
The method of this embodiment for establishing a model for identifying cheating traffic does not require positive and negative samples to be known in advance. It is an unsupervised method that suits the DSP environment well and thereby improves the robustness of cheating traffic identification.
The present invention also provides a system for establishing a model for identifying cheating traffic. Fig. 2 is a structural diagram of a system for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
The system 20 of this embodiment for establishing a model for identifying cheating traffic includes:
An acquisition module 201, configured to obtain a plurality of traffic records.
A cheating feature extraction module 202, configured to extract cheating features of the traffic, the cheating features including the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type.
A ranked list establishment module 203, configured to establish, according to the cheating features of the traffic, a ranked list of ad request counts for different network addresses, a ranked list of ad request counts for different top-level domains, and a ranked list of request counts for different advertisement types.
The ranked list establishment module 203 specifically includes:
A statistics unit, configured to count the number of requests corresponding to each cheating feature within a preset time period;
A sorting unit, configured to sort the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for different network addresses, the ranked list of ad request counts for different top-level domains, and the ranked list of request counts for different advertisement types.
A first extraction module 204, configured to extract the network addresses ranked within the top first preset proportion of the ranked list of ad request counts for different network addresses.
A first marking module 205, configured to mark the traffic corresponding to the network addresses ranked within the top first preset proportion as first cheating traffic.
A second extraction module 206, configured to extract the top-level domains ranked within the top second preset proportion of the ranked list of ad request counts for different top-level domains.
A second marking module 207, configured to mark the traffic corresponding to the top-level domains ranked within the top second preset proportion as second cheating traffic.
A third extraction module 208, configured to extract the advertisement types ranked within the top third preset proportion of the ranked list of request counts for different advertisement types.
A third marking module 209, configured to mark the traffic corresponding to the advertisement types ranked within the top third preset proportion as third cheating traffic.
A judgment module 210, configured to determine whether the first cheating traffic, the second cheating traffic, and the third cheating traffic are the same traffic; if so, to determine the same traffic to be cheating traffic; if not, to determine the first cheating traffic, the second cheating traffic, and the third cheating traffic to be normal traffic.
A training module 211, configured to train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
The training module 211 specifically includes:
A classification model establishment unit, configured to establish a traffic classification model using a decision tree algorithm;
A cheating feature extraction unit, configured to extract the cheating features of the cheating traffic and the cheating features of the normal traffic;
A first judgment unit, configured to input the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determine whether the traffic classification model classifies them correctly;
An adjustment unit, configured to, if not, adjust the parameters of the traffic classification model and return to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and determining whether the traffic classification model classifies them correctly;
A classification model determination unit, configured to, if so, determine the traffic classification model to be the trained traffic classification model.
An identification module 212, configured to identify traffic to be tested using the classification model.
The identification module 212 specifically includes:
An extraction unit, configured to extract the cheating features of the traffic to be tested;
A result acquisition unit, configured to input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
A second judgment unit, configured to determine, according to the output result, whether the traffic to be tested is cheating traffic.
The system of this embodiment for establishing a model for identifying cheating traffic does not require positive and negative samples to be known in advance. It is an unsupervised approach that suits the DSP environment well and thereby improves the robustness of cheating traffic identification.
Specific examples are used herein to describe the principle and implementation of the present invention. The description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (8)

1. A model establishing method for identifying cheating traffic, characterized by comprising:
obtaining a plurality of traffic records;
extracting cheating features of the traffic, the cheating features comprising the number of ad requests corresponding to different network addresses, the number of ad requests corresponding to different top-level domains, and the number of requests corresponding to different advertisement types;
establishing, according to the cheating features of the traffic, a ranked list of ad request counts for the different network addresses, a ranked list of ad request counts for the different top-level domains, and a ranked list of request counts for the different advertisement types;
extracting the network addresses ranked within the top first preset ratio of the ranked list of ad request counts for the different network addresses;
marking the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic;
extracting the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts for the different top-level domains;
marking the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic;
extracting the advertisement types ranked within the top third preset ratio of the ranked list of request counts for the different advertisement types;
marking the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic;
judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic;
if so, determining the same traffic as cheating traffic;
if not, determining the first cheating traffic, the second cheating traffic and the third cheating traffic as normal traffic;
training a traffic classification model using the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
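The labeling procedure of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record fields `ip`, `domain` and `ad_type`, the function names, rounding each preset ratio up to a whole number of list entries, and treating traffic flagged by none of the three lists as normal are all assumptions introduced here.

```python
import math
from collections import Counter


def top_values(flows, key, ratio):
    """Return the values of `key` ranked within the top `ratio` of the
    descending request-count list (entry count rounded up: an assumption)."""
    ranking = Counter(f[key] for f in flows).most_common()
    k = math.ceil(len(ranking) * ratio)
    return {value for value, _ in ranking[:k]}


def label_flows(flows, ip_ratio, domain_ratio, ad_type_ratio):
    """Label each record 1 (cheating) or 0 (normal).

    A record is cheating only if it is flagged by all three ranked lists.
    Records flagged by none of the lists are also labeled normal here,
    which is an assumption of this sketch.
    """
    top_ips = top_values(flows, "ip", ip_ratio)
    top_domains = top_values(flows, "domain", domain_ratio)
    top_ad_types = top_values(flows, "ad_type", ad_type_ratio)
    labels = []
    for f in flows:
        flagged = (
            f["ip"] in top_ips,
            f["domain"] in top_domains,
            f["ad_type"] in top_ad_types,
        )
        labels.append(1 if all(flagged) else 0)
    return labels
```

The resulting labels play the role of the cheating and normal samples that the final step of the claim uses for training.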
2. The model establishing method for identifying cheating traffic according to claim 1, characterized in that establishing, according to the cheating features of the traffic, the ranked list of ad request counts for the different network addresses, the ranked list of ad request counts for the different top-level domains, and the ranked list of request counts for the different advertisement types specifically comprises:
counting the number of requests corresponding to each cheating feature within a preset time period;
sorting the number of requests corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for the different network addresses, the ranked list of ad request counts for the different top-level domains, and the ranked list of request counts for the different advertisement types.
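The counting and sorting steps of claim 2 can be illustrated with a short Python sketch. The record layout (a dict with a `time` field holding a `datetime`), the half-open time window and the function name are assumptions made for this example only.

```python
from collections import Counter
from datetime import datetime


def ranked_request_counts(records, key, start, end):
    """Count requests per value of `key` for records whose `time` falls in
    [start, end), then return (value, count) pairs sorted from high to low."""
    counts = Counter(r[key] for r in records if start <= r["time"] < end)
    # Counter.most_common() with no argument returns all pairs in
    # descending order of count.
    return counts.most_common()
```

Calling the function once per feature (network address, top-level domain, advertisement type) yields the three ranked lists.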
3. The model establishing method for identifying cheating traffic according to claim 1, characterized in that training the traffic classification model using the cheating traffic and the normal traffic to obtain the trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested, specifically comprises:
establishing a traffic classification model using a decision tree algorithm;
extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and judging whether the traffic classification model can classify them correctly;
if not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model can classify them correctly;
if so, determining the traffic classification model as the trained traffic classification model.
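The train, check and adjust loop of claim 3 might look like the following Python sketch. The claim names only "a decision tree algorithm" and unspecified "parameters", so everything concrete here is an assumption: a small CART-style tree split on Gini impurity, maximum depth as the adjusted parameter, and training accuracy against a target as the test for "classifies correctly". Labels are assumed to be the integers 1 (cheating) and 0 (normal).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth):
    """Return a leaf label, or a (feature, threshold, left, right) tuple."""
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # majority label
    best = None
    for j in range(len(X[0])):
        # Skipping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # every feature is constant, so no split is possible
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def train_until_correct(X, y, target_accuracy=1.0, max_depth_limit=8):
    """Increase the depth until training accuracy reaches the target.

    Returns (tree, depth); if the target is never reached, the deepest
    tree tried is returned.
    """
    for depth in range(1, max_depth_limit + 1):
        tree = build_tree(X, y, depth)
        hits = sum(predict(tree, row) == label for row, label in zip(X, y))
        if hits / len(y) >= target_accuracy:
            return tree, depth
    return tree, max_depth_limit
```

Each row of `X` would hold the request-count features of one traffic record. In practice a library classifier and a held-out validation set would be a more robust choice than checking accuracy on the training data alone.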
4. The model establishing method for identifying cheating traffic according to claim 1, characterized in that the method of identifying the traffic to be tested using the trained traffic classification model is:
extracting the cheating features of the traffic to be tested;
inputting the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
judging, according to the output result, whether the traffic to be tested is cheating traffic.
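The identification steps of claim 4 can be sketched as below. The count dictionaries, the field names, and the convention that the model is any callable returning 1 for cheating traffic are assumptions for illustration; the patent does not specify the form of the output result.

```python
def extract_features(flow, ip_counts, domain_counts, ad_type_counts):
    """Map one traffic record to its three request-count features.
    Values never seen before get a count of 0 (an assumption)."""
    return [
        ip_counts.get(flow["ip"], 0),
        domain_counts.get(flow["domain"], 0),
        ad_type_counts.get(flow["ad_type"], 0),
    ]


def is_cheating(flow, model, ip_counts, domain_counts, ad_type_counts):
    """Feed the features to the trained model and interpret output 1 as cheating."""
    features = extract_features(flow, ip_counts, domain_counts, ad_type_counts)
    return model(features) == 1
```

Any trained classifier can be wrapped as the `model` callable, so the identification step stays independent of how the model was trained.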
5. A model establishing system for identifying cheating traffic, characterized by comprising:
an acquisition module, for obtaining a plurality of traffic records;
a cheating feature extraction module, for extracting cheating features of the traffic, the cheating features comprising the number of ad requests corresponding to different network addresses, the number of ad requests corresponding to different top-level domains, and the number of requests corresponding to different advertisement types;
a ranked list establishing module, for establishing, according to the cheating features of the traffic, a ranked list of ad request counts for the different network addresses, a ranked list of ad request counts for the different top-level domains, and a ranked list of request counts for the different advertisement types;
a first extraction module, for extracting the network addresses ranked within the top first preset ratio of the ranked list of ad request counts for the different network addresses;
a first marking module, for marking the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic;
a second extraction module, for extracting the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts for the different top-level domains;
a second marking module, for marking the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic;
a third extraction module, for extracting the advertisement types ranked within the top third preset ratio of the ranked list of request counts for the different advertisement types;
a third marking module, for marking the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic;
a judging module, for judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, determining the same traffic as cheating traffic; if not, determining the first cheating traffic, the second cheating traffic and the third cheating traffic as normal traffic;
a training module, for training a traffic classification model using the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
6. The model establishing system for identifying cheating traffic according to claim 5, characterized in that the ranked list establishing module specifically includes:
a counting unit, for counting the number of requests corresponding to each cheating feature within a preset time period;
a sorting unit, for sorting the number of requests corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts for the different network addresses, the ranked list of ad request counts for the different top-level domains, and the ranked list of request counts for the different advertisement types.
7. The model establishing system for identifying cheating traffic according to claim 5, characterized in that the training module specifically includes:
a classification model establishing unit, for establishing a traffic classification model using a decision tree algorithm;
a cheating feature extraction unit, for extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
a first judging unit, for inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and judging whether the traffic classification model can classify them correctly;
an adjustment unit, for, if not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model can classify them correctly;
a classification model determination unit, for, if so, determining the traffic classification model as the trained traffic classification model.
8. The model establishing system for identifying cheating traffic according to claim 5, characterized by further comprising an identification module, the identification module being used to identify the traffic to be tested using the trained traffic classification model, the identification module specifically including:
an extraction unit, for extracting the cheating features of the traffic to be tested;
a result acquisition unit, for inputting the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
a second judging unit, for judging, according to the output result, whether the traffic to be tested is cheating traffic.
CN201810059065.1A 2018-01-22 2018-01-22 Model establishing method and system for identifying cheating flow Active CN108415931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810059065.1A CN108415931B (en) 2018-01-22 2018-01-22 Model establishing method and system for identifying cheating flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810059065.1A CN108415931B (en) 2018-01-22 2018-01-22 Model establishing method and system for identifying cheating flow

Publications (2)

Publication Number Publication Date
CN108415931A (en) 2018-08-17
CN108415931B (en) 2020-05-19

Family

ID=63126019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810059065.1A Active CN108415931B (en) 2018-01-22 2018-01-22 Model establishing method and system for identifying cheating flow

Country Status (1)

Country Link
CN (1) CN108415931B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559149A (en) * 2018-10-17 2019-04-02 杭州家娱互动网络科技有限公司 A kind of flow identifying processing method and device
CN111404835A (en) * 2020-03-30 2020-07-10 北京海益同展信息科技有限公司 Flow control method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022834A (en) * 2016-05-24 2016-10-12 腾讯科技(深圳)有限公司 Advertisement against cheating method and device
CN106204108A (en) * 2016-06-29 2016-12-07 腾讯科技(深圳)有限公司 The anti-cheat method of advertisement and the anti-cheating device of advertisement
CN106355431A (en) * 2016-08-18 2017-01-25 晶赞广告(上海)有限公司 Detection method, device and terminal for cheating traffic

Also Published As

Publication number Publication date
CN108415931B (en) 2020-05-19

Similar Documents

Publication Publication Date Title
WO2020155939A1 (en) Image recognition method and device, storage medium and processor
CN109145159B (en) Method and device for processing data
CN104268134B (en) Subjective and objective classifier building method and system
CN105302911B (en) A kind of data screening engine method for building up and data screening engine
CN108985347A (en) Training method, the method and device of shop classification of disaggregated model
CN107346496A (en) Targeted customer's orientation method and device
CN108229267A (en) Object properties detection, neural metwork training, method for detecting area and device
CN107704806A (en) A kind of method that quality of human face image prediction is carried out based on depth convolutional neural networks
CN105491444B (en) A kind of data identifying processing method and device
CN104820835A (en) Automatic examination paper marking method for examination papers
CN109214280A (en) Shop recognition methods, device, electronic equipment and storage medium based on streetscape
CN109120632A (en) Network flow method for detecting abnormality based on online feature selection
CN107886344A (en) Convolutional neural network-based cheating advertisement page identification method and device
CN105869008A (en) Targeted delivery method and device of advertisement
CN105224921A (en) A kind of facial image preferentially system and disposal route
CN109816625A (en) A kind of video quality score implementation method
CN108415931A (en) A kind of method for establishing model and system of flow of practising fraud for identification
CN104867144A (en) IC element solder joint defect detection method based on Gaussian mixture model
CN110210301A (en) Method, apparatus, equipment and storage medium based on micro- expression evaluation interviewee
CN106529189A (en) User classifying method, application server and application client-side
CN109977779A (en) Knowledge method for distinguishing is carried out to the advertisement being inserted into video intention
CN104992482A (en) Athletic competition data processing system and method thereof
CN104933121A (en) Method, device and system for testing foreign language learning and language competence
CN107895140A (en) Porny identification method based on face complexion
CN110152306A (en) Script user identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Unit 01, 9th Floor, Building 20, Dongsanhuan Middle Road, Chaoyang District, Beijing 100022

Applicant after: Beijing Shenyan Intelligent Technology Co., Ltd.

Address before: 100000, 9, 01, unit 20, East Third Ring Road, Chaoyang District, Beijing.

Applicant before: Beijing friends of Interactive Information Technology Co., Ltd.

GR01 Patent grant