CN108415931A - Method and system for establishing a model for identifying cheating traffic
- Publication number
- CN108415931A (application number CN201810059065.1A)
- Authority
- CN
- China
- Prior art keywords
- flow
- cheating
- feature
- traffic classification
- sorted lists
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0277—Online advertisement
Abstract
The invention discloses a method and system for establishing a model for identifying cheating traffic. The method includes: obtaining multiple traffic records; extracting the cheating features of the traffic; establishing a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type; extracting the network addresses ranked within the top first preset ratio and marking the corresponding traffic as first cheating traffic; extracting the top-level domains ranked within the top second preset ratio and marking the corresponding traffic as second cheating traffic; extracting the advertisement types ranked within the top third preset ratio and marking the corresponding traffic as third cheating traffic; judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, determining that traffic to be cheating traffic; if not, determining it to be normal traffic; and training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model. The invention suits the DSP environment and improves the robustness of cheating traffic identification.
Description
Technical field
The present invention relates to the field of Internet advertising technology, and in particular to a method and system for establishing a model for identifying cheating traffic.
Background art
Anti-cheating has always been a critical issue for the Internet advertising industry. For every traffic record, a demand-side platform (DSP) needs to determine in real time whether it is cheating traffic in order to decide whether to bid. Because a DSP connects to one or more ad exchanges, it must be able to discriminate complex traffic, which demands high robustness. At present, the common anti-cheating approach in advertising is to establish a classification model, train it with positive and negative samples to obtain a trained model, and use the trained model to identify cheating traffic. However, because a DSP cannot directly obtain samples of cheating traffic, the classification models established by existing methods do not suit the DSP environment, and their robustness is therefore low.
Summary of the invention
In view of this, it is necessary to provide a method and system for establishing a model for identifying cheating traffic, so as to suit the DSP environment and improve the robustness of cheating traffic identification.
To achieve the above object, the present invention provides the following scheme:
A method for establishing a model for identifying cheating traffic, including:
Obtaining multiple traffic records;
Extracting the cheating features of the traffic, the cheating features including the ad request count corresponding to each network address, the ad request count corresponding to each top-level domain, and the request count corresponding to each advertisement type;
According to the cheating features of the traffic, establishing a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type;
Extracting the network addresses ranked within the top first preset ratio of the ranked list of ad request counts by network address;
Marking the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic;
Extracting the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts by top-level domain;
Marking the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic;
Extracting the advertisement types ranked within the top third preset ratio of the ranked list of request counts by advertisement type;
Marking the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic;
Judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic;
If so, determining the same traffic to be cheating traffic;
If not, determining the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic;
Training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
Optionally, establishing, according to the cheating features of the traffic, the ranked list of ad request counts by network address, the ranked list of ad request counts by top-level domain, and the ranked list of request counts by advertisement type specifically includes:
Counting the number of requests corresponding to each cheating feature within a preset time period;
Sorting the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts by network address, the ranked list of ad request counts by top-level domain, and the ranked list of request counts by advertisement type.
Optionally, training the traffic classification model with the cheating traffic and the normal traffic to obtain the trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested, specifically includes:
Establishing a traffic classification model using a decision tree algorithm;
Extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
Inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and judging whether the traffic classification model classifies them correctly;
If not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model classifies them correctly;
If so, determining the traffic classification model to be the trained traffic classification model.
Optionally, the method of identifying traffic to be tested using the trained traffic classification model is:
Extracting the cheating features of the traffic to be tested;
Inputting the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
Judging, according to the output result, whether the traffic to be tested is cheating traffic.
The present invention also provides a system for establishing a model for identifying cheating traffic, including:
An acquisition module, configured to obtain multiple traffic records;
A cheating feature extraction module, configured to extract the cheating features of the traffic, the cheating features including the ad request count corresponding to each network address, the ad request count corresponding to each top-level domain, and the request count corresponding to each advertisement type;
A ranked list establishing module, configured to establish, according to the cheating features of the traffic, a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type;
A first extraction module, configured to extract the network addresses ranked within the top first preset ratio of the ranked list of ad request counts by network address;
A first marking module, configured to mark the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic;
A second extraction module, configured to extract the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts by top-level domain;
A second marking module, configured to mark the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic;
A third extraction module, configured to extract the advertisement types ranked within the top third preset ratio of the ranked list of request counts by advertisement type;
A third marking module, configured to mark the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic;
A judging module, configured to judge whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, to determine the same traffic to be cheating traffic; if not, to determine the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic;
A training module, configured to train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
Optionally, the ranked list establishing module specifically includes:
A statistics unit, configured to count the number of requests corresponding to each cheating feature within a preset time period;
A sorting unit, configured to sort the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts by network address, the ranked list of ad request counts by top-level domain, and the ranked list of request counts by advertisement type.
Optionally, the training module specifically includes:
A classification model establishing unit, configured to establish a traffic classification model using a decision tree algorithm;
A cheating feature extraction unit, configured to extract the cheating features of the cheating traffic and the cheating features of the normal traffic;
A first judging unit, configured to input the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judge whether the traffic classification model classifies them correctly;
An adjusting unit, configured to, if not, adjust the parameters of the traffic classification model and return to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model classifies them correctly;
A classification model determining unit, configured to, if so, determine the traffic classification model to be the trained traffic classification model.
Optionally, the system further includes an identification module configured to identify traffic to be tested using the trained traffic classification model, the identification module specifically including:
An extraction unit, configured to extract the cheating features of the traffic to be tested;
A result acquisition unit, configured to input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
A second judging unit, configured to judge, according to the output result, whether the traffic to be tested is cheating traffic.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention proposes a method and system for establishing a model for identifying cheating traffic. The method includes: obtaining multiple traffic records; extracting the cheating features of the traffic; according to the cheating features of the traffic, establishing a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type; extracting the network addresses ranked within the top first preset ratio and marking the corresponding traffic as first cheating traffic; extracting the top-level domains ranked within the top second preset ratio and marking the corresponding traffic as second cheating traffic; extracting the advertisement types ranked within the top third preset ratio and marking the corresponding traffic as third cheating traffic; judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, determining the same traffic to be cheating traffic; if not, determining the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic; and training a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model. The method or system of the present invention suits the DSP environment and improves the robustness of cheating traffic identification.
Description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for establishing a model for identifying cheating traffic according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a system for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Using hypothesis testing from statistics, the present invention provides a method for establishing a model for identifying cheating traffic, and further uses the established model to identify and label traffic to be tested online in real time.
Hypothesis testing is one of the classical methods of statistical inference. Its main idea is as follows: when the same result may arise from either of two causes, A and B, and the two must be distinguished, first fix the error rate (typically 5%); then, within the sample distribution under cause A, select the 5% probability interval that is most likely to arise from cause B. If a sample falls into this interval, it is considered to be caused by B; otherwise, it is considered to be caused by A.
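As a rough illustration of this idea, the sketch below fixes an error rate of 5% and finds the cut-off in an observed sample such that only the top 5% of values lie at or above it; values in that region are attributed to cause B. The function name `upper_tail_cutoff`, the choice of an upper-tail region, and the sample data are assumptions made for illustration and are not taken from the patent.

```python
import math

def upper_tail_cutoff(samples, alpha=0.05):
    """Return the smallest value among the top `alpha` fraction of samples.

    Values >= the cutoff fall in the rejection region (attributed to cause B).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ranked = sorted(samples, reverse=True)
    # Keep at least one value so the rejection region is never empty.
    k = max(1, math.floor(len(ranked) * alpha))
    return ranked[k - 1]

# Hypothetical request counts per source: mostly ordinary values (cause A)
# with one extreme value mixed in.
counts = list(range(1, 20)) + [500]
cutoff = upper_tail_cutoff(counts, alpha=0.05)
is_b = [c >= cutoff for c in counts]
```

With 20 samples and a 5% error rate, exactly one value (the largest) falls in the rejection region.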
Using the idea of hypothesis testing, this embodiment provides a method for establishing a model for identifying cheating traffic. Fig. 1 is a flow chart of a method for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
Referring to Fig. 1, the method of the embodiment for establishing a model for identifying cheating traffic includes:
Step S1: Obtain multiple traffic records.
Step S2: Extract the cheating features of the traffic, the cheating features including the ad request count corresponding to each network address, the ad request count corresponding to each top-level domain, and the request count corresponding to each advertisement type.
Step S3: According to the cheating features of the traffic, establish a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type.
This specifically includes:
Counting the number of requests corresponding to each cheating feature within a preset time period;
Sorting the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts by network address, the ranked list of ad request counts by top-level domain, and the ranked list of request counts by advertisement type.
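The counting and sorting in step S3 can be sketched as follows. This is a minimal illustration, assuming the records have already been restricted to the preset time period and that each record is a tuple of network address, top-level domain and advertisement type; the record layout, the sample values and the helper name `ranked_list` are hypothetical.

```python
from collections import Counter

# Hypothetical traffic records: (network address, top-level domain, ad type).
traffic = [
    ("10.0.0.1", "example.com", "banner"),
    ("10.0.0.1", "example.com", "video"),
    ("10.0.0.2", "example.org", "banner"),
    ("10.0.0.1", "example.net", "banner"),
]

def ranked_list(records, field_index):
    """Count requests per value of one feature and rank from high to low."""
    counts = Counter(record[field_index] for record in records)
    # most_common() returns (value, count) pairs sorted by descending count.
    return counts.most_common()

ip_ranking = ranked_list(traffic, 0)
tld_ranking = ranked_list(traffic, 1)
ad_type_ranking = ranked_list(traffic, 2)
```

Each ranking is a list of (value, request count) pairs, so the head of the list holds the values with the most requests.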
Step S4: Extract the network addresses ranked within the top first preset ratio of the ranked list of ad request counts by network address.
In this embodiment, the top 5% of network addresses in the ranked list of ad request counts by network address are extracted.
Step S5: Mark the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic.
In this embodiment, the traffic corresponding to the top 5% of network addresses in the ranked list of ad request counts by network address is marked as first cheating traffic.
Step S6: Extract the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts by top-level domain.
In this embodiment, the top 3% of top-level domains in the ranked list of ad request counts by top-level domain are extracted.
Step S7: Mark the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic.
In this embodiment, the traffic corresponding to the top 3% of top-level domains in the ranked list of ad request counts by top-level domain is marked as second cheating traffic.
Step S8: Extract the advertisement types ranked within the top third preset ratio of the ranked list of request counts by advertisement type.
In this embodiment, the top 8% of advertisement types in the ranked list of request counts by advertisement type are extracted.
Step S9: Mark the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic.
In this embodiment, the traffic corresponding to the top 8% of advertisement types in the ranked list of request counts by advertisement type is marked as third cheating traffic.
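Steps S4 to S9 apply the same extract-then-mark pattern to three features with different ratios. The sketch below shows one way to express that pattern. The ratios 5%, 3% and 8% come from this embodiment; the dictionary record layout, the helper names, and the decision to round the cut-off up so that at least one value is always selected are assumptions, since the text does not say how fractional positions are handled.

```python
import math
from collections import Counter

def top_fraction(values, fraction):
    """Return the set of values ranked in the top `fraction` by request count."""
    ranking = Counter(values).most_common()
    # Assumption: round up so that at least one value is always selected.
    k = math.ceil(len(ranking) * fraction)
    return {value for value, _ in ranking[:k]}

def mark(records, field, fraction):
    """Return indices of records whose `field` value is in the top fraction."""
    selected = top_fraction([r[field] for r in records], fraction)
    return {i for i, r in enumerate(records) if r[field] in selected}

# Hypothetical records for illustration only.
records = [
    {"ip": "10.0.0.1", "tld": "example.com", "ad_type": "banner"},
    {"ip": "10.0.0.1", "tld": "example.com", "ad_type": "banner"},
    {"ip": "10.0.0.2", "tld": "example.com", "ad_type": "video"},
    {"ip": "10.0.0.3", "tld": "example.org", "ad_type": "banner"},
]
first_marked = mark(records, "ip", 0.05)        # first cheating traffic
second_marked = mark(records, "tld", 0.03)      # second cheating traffic
third_marked = mark(records, "ad_type", 0.08)   # third cheating traffic
```

Representing each mark as a set of record indices keeps the three marks independent, so they can be compared in a later step.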
Step S10: Judge whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic.
If so, execute step S11.
Step S11: Determine the same traffic to be cheating traffic.
If not, execute step S12.
Step S12: Determine the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic.
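One reading of steps S10 to S12 is a set intersection: a record is labelled as cheating only when it carries all three marks, and everything else is labelled as normal. The sketch below follows that reading. It is an interpretation rather than a statement of the patented method, and treating records that carry no mark at all as normal is a further assumption; the function name and the index sets are hypothetical.

```python
def label_traffic(n_records, first_marked, second_marked, third_marked):
    """Label each record 1 (cheating) if it carries all three marks, else 0."""
    in_all_three = first_marked & second_marked & third_marked
    return [1 if i in in_all_three else 0 for i in range(n_records)]

# Hypothetical index sets produced by the three marking steps.
labels = label_traffic(4, {0, 1}, {0, 1, 2}, {0, 1, 3})
```

The resulting labels serve as the positive and negative samples for the training step, which is how the method avoids needing samples that are labelled in advance.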
Step S13: Train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model.
This specifically includes:
Establishing a traffic classification model using a decision tree algorithm;
Extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
Inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and judging whether the traffic classification model classifies them correctly;
If not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model classifies them correctly;
If so, determining the traffic classification model to be the trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
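The train, check and adjust loop of step S13 can be sketched with a small decision tree written in plain Python. The text names a decision tree algorithm but does not specify the split criterion or which parameters are adjusted, so the Gini-based binary splits, the use of tree depth as the only adjusted parameter, and all function names and sample data below are assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, max_depth):
    """Grow a small binary decision tree; leaves hold the majority label."""
    if max_depth == 0 or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = sum(len(s) * gini([labels[i] for i in s]) for s in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no threshold separates the rows
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1))

def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_until_correct(rows, labels, max_depth_limit=5):
    """Increase tree depth until every training row is classified correctly."""
    for depth in range(1, max_depth_limit + 1):
        tree = build_tree(rows, labels, depth)
        if all(predict(tree, r) == y for r, y in zip(rows, labels)):
            return tree, depth
    return tree, max_depth_limit

# Hypothetical features per record: (address count, domain count, ad-type count).
X = [(900, 800, 700), (950, 20, 600), (30, 900, 750), (10, 15, 12)]
y = [1, 0, 0, 0]
model, depth = train_until_correct(X, y)
```

On this toy data a depth of 1 misclassifies one record, so the loop adjusts the depth to 2 before accepting the model. Requiring perfect accuracy on the training data mirrors the "classifies correctly" check in the text, although in practice a held-out check would normally be used to limit overfitting.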
In this embodiment, traffic to be tested is identified using the above trained traffic classification model. The specific method is:
Deploy the trained traffic classification model online, or update the online model with it; extract the cheating features of the traffic to be tested; input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result; and judge, according to the output result, whether the traffic to be tested is cheating traffic.
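The online identification path can be sketched as a feature lookup followed by a model call. This is a minimal sketch, assuming the request counts are available as dictionaries keyed by feature value and the deployed model is any callable returning 1 for cheating and 0 for normal; `extract_features`, `identify`, `toy_model` and the sample data are hypothetical names and values.

```python
def extract_features(request, ip_counts, tld_counts, ad_type_counts):
    """Look up the request counts for the request's three feature values."""
    return (
        ip_counts.get(request["ip"], 0),
        tld_counts.get(request["tld"], 0),
        ad_type_counts.get(request["ad_type"], 0),
    )

def identify(request, model, ip_counts, tld_counts, ad_type_counts):
    """Return a copy of the request tagged with the model's verdict."""
    features = extract_features(request, ip_counts, tld_counts, ad_type_counts)
    is_cheating = bool(model(features))
    return {**request, "cheating": is_cheating}

# Hypothetical stand-in for the trained classifier: any callable that maps
# a feature tuple to 1 (cheating) or 0 (normal).
def toy_model(features):
    return 1 if all(count >= 100 for count in features) else 0

ip_counts = {"10.0.0.1": 900}
tld_counts = {"example.com": 800}
ad_type_counts = {"banner": 700}

tagged = identify(
    {"ip": "10.0.0.1", "tld": "example.com", "ad_type": "banner"},
    toy_model, ip_counts, tld_counts, ad_type_counts,
)
unseen = identify(
    {"ip": "10.0.0.9", "tld": "example.org", "ad_type": "video"},
    toy_model, ip_counts, tld_counts, ad_type_counts,
)
```

Feature values that were never counted default to zero here, which is a design choice of the sketch rather than something the text prescribes.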
Cheating traffic is identified using the above method: each traffic record to be tested is checked for cheating and labelled, for use by subsequent algorithms.
The method of this embodiment for establishing a model for identifying cheating traffic does not require positive and negative samples to be known in advance. It is an unsupervised method that suits the DSP environment well and thereby improves the robustness of cheating traffic identification.
The present invention also provides a system for establishing a model for identifying cheating traffic. Fig. 2 is a structural diagram of a system for establishing a model for identifying cheating traffic according to an embodiment of the present invention.
The system 20 of the embodiment for establishing a model for identifying cheating traffic includes:
An acquisition module 201, configured to obtain multiple traffic records.
A cheating feature extraction module 202, configured to extract the cheating features of the traffic, the cheating features including the ad request count corresponding to each network address, the ad request count corresponding to each top-level domain, and the request count corresponding to each advertisement type.
A ranked list establishing module 203, configured to establish, according to the cheating features of the traffic, a ranked list of ad request counts by network address, a ranked list of ad request counts by top-level domain, and a ranked list of request counts by advertisement type.
The ranked list establishing module 203 specifically includes:
A statistics unit, configured to count the number of requests corresponding to each cheating feature within a preset time period;
A sorting unit, configured to sort the request counts corresponding to each cheating feature from high to low to obtain the ranked list of ad request counts by network address, the ranked list of ad request counts by top-level domain, and the ranked list of request counts by advertisement type.
A first extraction module 204, configured to extract the network addresses ranked within the top first preset ratio of the ranked list of ad request counts by network address.
A first marking module 205, configured to mark the traffic corresponding to the network addresses ranked within the top first preset ratio as first cheating traffic.
A second extraction module 206, configured to extract the top-level domains ranked within the top second preset ratio of the ranked list of ad request counts by top-level domain.
A second marking module 207, configured to mark the traffic corresponding to the top-level domains ranked within the top second preset ratio as second cheating traffic.
A third extraction module 208, configured to extract the advertisement types ranked within the top third preset ratio of the ranked list of request counts by advertisement type.
A third marking module 209, configured to mark the traffic corresponding to the advertisement types ranked within the top third preset ratio as third cheating traffic.
A judging module 210, configured to judge whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, to determine the same traffic to be cheating traffic; if not, to determine the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic.
A training module 211, configured to train a traffic classification model with the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
The training module 211 specifically includes:
A classification model establishing unit, configured to establish a traffic classification model using a decision tree algorithm;
A cheating feature extraction unit, configured to extract the cheating features of the cheating traffic and the cheating features of the normal traffic;
A first judging unit, configured to input the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judge whether the traffic classification model classifies them correctly;
An adjusting unit, configured to, if not, adjust the parameters of the traffic classification model and return to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model classifies them correctly;
A classification model determining unit, configured to, if so, determine the traffic classification model to be the trained traffic classification model.
An identification module 212, configured to identify traffic to be tested using the classification model.
The identification module 212 specifically includes:
An extraction unit, configured to extract the cheating features of the traffic to be tested;
A result acquisition unit, configured to input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
A second judging unit, configured to judge, according to the output result, whether the traffic to be tested is cheating traffic.
The system of this embodiment for establishing a model for identifying cheating traffic does not require positive and negative samples to be known in advance. It takes an unsupervised approach that suits the DSP environment well and thereby improves the robustness of cheating traffic identification.
Specific examples are used herein to explain the principle and implementation of the present invention. The description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (8)
1. A model establishing method for identifying cheating traffic, characterized by comprising:
obtaining a plurality of traffic records;
extracting cheating features of the traffic, the cheating features comprising the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type;
establishing, according to the cheating features of the traffic, a sorted list of ad request counts by network address, a sorted list of ad request counts by top-level domain, and a sorted list of request counts by advertisement type;
extracting the network addresses ranked within a first preset proportion at the top of the sorted list of ad request counts by network address;
marking the traffic corresponding to the network addresses ranked within the first preset proportion as first cheating traffic;
extracting the top-level domains ranked within a second preset proportion at the top of the sorted list of ad request counts by top-level domain;
marking the traffic corresponding to the top-level domains ranked within the second preset proportion as second cheating traffic;
extracting the advertisement types ranked within a third preset proportion at the top of the sorted list of request counts by advertisement type;
marking the traffic corresponding to the advertisement types ranked within the third preset proportion as third cheating traffic;
judging whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic;
if so, determining the same traffic to be cheating traffic;
if not, determining the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic;
training a traffic classification model using the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
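The labelling steps of claim 1 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the record fields `ip`, `tld` and `ad_type`, the function names, and the default proportions are all assumptions introduced here. The claim does not say how to treat traffic that appears in none of the three top-ranked groups, so this sketch simply leaves such records unlabelled.

```python
from collections import Counter
from math import ceil

def top_fraction(values, fraction):
    """Return the keys ranked in the top `fraction` by request count."""
    counts = Counter(values)
    ranked = [key for key, _ in counts.most_common()]  # highest count first
    keep = ceil(len(ranked) * fraction)
    return set(ranked[:keep])

def label_traffic(records, ip_frac=0.1, tld_frac=0.1, ad_frac=0.1):
    """Split records into (cheating, normal) using the three ranked lists.

    The three fractions stand in for the first, second and third preset
    proportions; their values here are hypothetical.
    """
    top_ips = top_fraction((r["ip"] for r in records), ip_frac)
    top_tlds = top_fraction((r["tld"] for r in records), tld_frac)
    top_ads = top_fraction((r["ad_type"] for r in records), ad_frac)

    cheating, normal = [], []
    for r in records:
        flags = (r["ip"] in top_ips,
                 r["tld"] in top_tlds,
                 r["ad_type"] in top_ads)
        if all(flags):
            # Marked as first, second AND third cheating traffic.
            cheating.append(r)
        elif any(flags):
            # Marked by only some of the lists: treated as normal traffic.
            normal.append(r)
        # Records flagged by no list are left unlabelled (assumption).
    return cheating, normal
```

The two returned lists would then serve as the labelled training set for the traffic classification model.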
2. The model establishing method for identifying cheating traffic according to claim 1, characterized in that establishing, according to the cheating features of the traffic, the sorted list of ad request counts by network address, the sorted list of ad request counts by top-level domain, and the sorted list of request counts by advertisement type specifically comprises:
counting the number of requests corresponding to each cheating feature within a preset time period;
sorting the numbers of requests corresponding to each cheating feature from high to low to obtain the sorted list of ad request counts by network address, the sorted list of ad request counts by top-level domain, and the sorted list of request counts by advertisement type.
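The counting and sorting of claim 2 might look like the sketch below. The field name `time`, the half-open window `[start, start + window)`, and the function name `ranked_counts` are assumptions made for illustration; the claim only requires counting within a preset time period and sorting from high to low.

```python
from collections import Counter
from datetime import datetime, timedelta

def ranked_counts(records, key, start, window):
    """Count requests per value of `key` inside the time window,
    then return (value, count) pairs sorted from high to low."""
    end = start + window
    counts = Counter(
        r[key] for r in records if start <= r["time"] < end
    )
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Calling the function once each with `key="ip"`, `key="tld"` and `key="ad_type"` (hypothetical field names) would give the three sorted lists named in the claim.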
3. The model establishing method for identifying cheating traffic according to claim 1, characterized in that training the traffic classification model using the cheating traffic and the normal traffic to obtain the trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested, specifically comprises:
establishing a traffic classification model using a decision tree algorithm;
extracting the cheating features of the cheating traffic and the cheating features of the normal traffic;
inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model, and judging whether the traffic classification model can classify them correctly;
if not, adjusting the parameters of the traffic classification model and returning to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model can classify them correctly;
if so, determining the traffic classification model to be the trained traffic classification model.
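The train, check and adjust loop of claim 3 can be illustrated with a small decision tree written in plain Python. The claim names a decision tree algorithm but does not say which parameters are adjusted or which split criterion is used; this sketch assumes Gini impurity for splits and treats the maximum tree depth as the adjusted parameter. All function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, max_depth):
    """rows: feature tuples; labels: 1 = cheating, 0 = normal.
    Returns an int leaf or a (feature, threshold, left, right) node."""
    majority = int(sum(labels) * 2 >= len(labels))
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # split does not separate anything
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], max_depth - 1),
            build_tree([r for r, _ in hi], [y for _, y in hi], max_depth - 1))

def predict(tree, row):
    """Walk the tree until an int leaf is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_until_correct(rows, labels, depth_limit=10):
    """Increase max depth until the labelled traffic is classified
    correctly, or until depth_limit (assumed >= 1) is reached."""
    for depth in range(1, depth_limit + 1):
        tree = build_tree(rows, labels, depth)
        if all(predict(tree, r) == y for r, y in zip(rows, labels)):
            return tree, depth
    return tree, depth_limit
```

Checking correctness on the training set alone, as the claim describes, says nothing about how the model behaves on unseen traffic; a held-out set would be a natural addition in practice.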
4. The model establishing method for identifying cheating traffic according to claim 1, characterized in that the method of identifying the traffic to be tested using the trained traffic classification model is:
extracting the cheating features of the traffic to be tested;
inputting the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
judging, according to the output result, whether the traffic to be tested is cheating traffic.
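The identification step of claim 4 can be sketched as below. The model is represented as any callable that maps a feature tuple to 1 (cheating) or 0 (normal); `threshold_model` is only a stand-in for a trained classifier, and the count dictionaries, field names and threshold value are assumptions made for this example.

```python
def extract_features(record, ip_counts, tld_counts, ad_counts):
    """Look up the request counts that describe one traffic record."""
    return (ip_counts.get(record["ip"], 0),
            tld_counts.get(record["tld"], 0),
            ad_counts.get(record["ad_type"], 0))

def threshold_model(features, threshold=100):
    """Stand-in for a trained classifier (hypothetical rule):
    output 1 only when every count exceeds the threshold."""
    return 1 if min(features) > threshold else 0

def is_cheating(record, model, ip_counts, tld_counts, ad_counts):
    """Extract features, feed them to the model, judge from the output."""
    features = extract_features(record, ip_counts, tld_counts, ad_counts)
    output = model(features)
    return output == 1
```

Any trained classifier exposing the same callable interface could replace `threshold_model` without changing `is_cheating`.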
5. A model establishing system for identifying cheating traffic, characterized by comprising:
an acquisition module, configured to obtain a plurality of traffic records;
a cheating feature extraction module, configured to extract cheating features of the traffic, the cheating features comprising the number of ad requests corresponding to each network address, the number of ad requests corresponding to each top-level domain, and the number of requests corresponding to each advertisement type;
a sorted list establishing module, configured to establish, according to the cheating features of the traffic, a sorted list of ad request counts by network address, a sorted list of ad request counts by top-level domain, and a sorted list of request counts by advertisement type;
a first extraction module, configured to extract the network addresses ranked within a first preset proportion at the top of the sorted list of ad request counts by network address;
a first marking module, configured to mark the traffic corresponding to the network addresses ranked within the first preset proportion as first cheating traffic;
a second extraction module, configured to extract the top-level domains ranked within a second preset proportion at the top of the sorted list of ad request counts by top-level domain;
a second marking module, configured to mark the traffic corresponding to the top-level domains ranked within the second preset proportion as second cheating traffic;
a third extraction module, configured to extract the advertisement types ranked within a third preset proportion at the top of the sorted list of request counts by advertisement type;
a third marking module, configured to mark the traffic corresponding to the advertisement types ranked within the third preset proportion as third cheating traffic;
a judgment module, configured to judge whether the first cheating traffic, the second cheating traffic and the third cheating traffic are the same traffic; if so, to determine the same traffic to be cheating traffic; if not, to determine the first cheating traffic, the second cheating traffic and the third cheating traffic to be normal traffic;
a training module, configured to train a traffic classification model using the cheating traffic and the normal traffic to obtain a trained traffic classification model, the trained traffic classification model being used to identify traffic to be tested.
6. The model establishing system for identifying cheating traffic according to claim 5, characterized in that the sorted list establishing module specifically comprises:
a statistics unit, configured to count the number of requests corresponding to each cheating feature within a preset time period;
a sorting unit, configured to sort the numbers of requests corresponding to each cheating feature from high to low to obtain the sorted list of ad request counts by network address, the sorted list of ad request counts by top-level domain, and the sorted list of request counts by advertisement type.
7. The model establishing system for identifying cheating traffic according to claim 5, characterized in that the training module specifically comprises:
a classification model establishing unit, configured to establish a traffic classification model using a decision tree algorithm;
a cheating feature extraction unit, configured to extract the cheating features of the cheating traffic and the cheating features of the normal traffic;
a first judging unit, configured to input the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and to judge whether the traffic classification model can classify them correctly;
an adjustment unit, configured to, if not, adjust the parameters of the traffic classification model and return to the step of inputting the cheating features of the cheating traffic and the cheating features of the normal traffic into the traffic classification model and judging whether the traffic classification model can classify them correctly;
a classification model determination unit, configured to, if so, determine the traffic classification model to be the trained traffic classification model.
8. The model establishing system for identifying cheating traffic according to claim 5, characterized by further comprising an identification module, the identification module being configured to identify traffic to be tested using the trained traffic classification model, the identification module specifically comprising:
an extraction unit, configured to extract the cheating features of the traffic to be tested;
a result acquisition unit, configured to input the cheating features of the traffic to be tested into the trained traffic classification model to obtain an output result;
a second judging unit, configured to judge, according to the output result, whether the traffic to be tested is cheating traffic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810059065.1A CN108415931B (en) | 2018-01-22 | 2018-01-22 | Model establishing method and system for identifying cheating flow |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108415931A (en) | 2018-08-17 |
CN108415931B (en) | 2020-05-19 |
Family
ID=63126019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810059065.1A Active CN108415931B (en) | 2018-01-22 | 2018-01-22 | Model establishing method and system for identifying cheating flow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108415931B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109559149A (en) * | 2018-10-17 | 2019-04-02 | 杭州家娱互动网络科技有限公司 | A kind of flow identifying processing method and device |
CN111404835A (en) * | 2020-03-30 | 2020-07-10 | 北京海益同展信息科技有限公司 | Flow control method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022834A (en) * | 2016-05-24 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Advertisement against cheating method and device |
CN106204108A (en) * | 2016-06-29 | 2016-12-07 | 腾讯科技(深圳)有限公司 | The anti-cheat method of advertisement and the anti-cheating device of advertisement |
CN106355431A (en) * | 2016-08-18 | 2017-01-25 | 晶赞广告(上海)有限公司 | Detection method, device and terminal for cheating traffic |
Also Published As
Publication number | Publication date |
---|---|
CN108415931B (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020155939A1 (en) | Image recognition method and device, storage medium and processor | |
CN109145159B (en) | Method and device for processing data | |
CN104268134B (en) | Subjective and objective classifier building method and system | |
CN105302911B (en) | A kind of data screening engine method for building up and data screening engine | |
CN108985347A (en) | Training method, the method and device of shop classification of disaggregated model | |
CN107346496A (en) | Targeted customer's orientation method and device | |
CN108229267A (en) | Object properties detection, neural metwork training, method for detecting area and device | |
CN107704806A (en) | A kind of method that quality of human face image prediction is carried out based on depth convolutional neural networks | |
CN105491444B (en) | A kind of data identifying processing method and device | |
CN104820835A (en) | Automatic examination paper marking method for examination papers | |
CN109214280A (en) | Shop recognition methods, device, electronic equipment and storage medium based on streetscape | |
CN109120632A (en) | Network flow method for detecting abnormality based on online feature selection | |
CN107886344A (en) | Convolutional neural network-based cheating advertisement page identification method and device | |
CN105869008A (en) | Targeted delivery method and device of advertisement | |
CN105224921A (en) | A kind of facial image preferentially system and disposal route | |
CN109816625A (en) | A kind of video quality score implementation method | |
CN108415931A (en) | A kind of method for establishing model and system of flow of practising fraud for identification | |
CN104867144A (en) | IC element solder joint defect detection method based on Gaussian mixture model | |
CN110210301A (en) | Method, apparatus, equipment and storage medium based on micro- expression evaluation interviewee | |
CN106529189A (en) | User classifying method, application server and application client-side | |
CN109977779A (en) | Knowledge method for distinguishing is carried out to the advertisement being inserted into video intention | |
CN104992482A (en) | Athletic competition data processing system and method thereof | |
CN104933121A (en) | Method, device and system for testing foreign language learning and language competence | |
CN107895140A (en) | Porny identification method based on face complexion | |
CN110152306A (en) | Script user identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: Unit 01, 9th Floor, Building 20, Dongsanhuan Middle Road, Chaoyang District, Beijing 100022. Applicant after: Beijing Shenyan Intelligent Technology Co., Ltd. Address before: 100000, 9, 01, unit 20, East Third Ring Road, Chaoyang District, Beijing. Applicant before: Beijing friends of Interactive Information Technology Co., Ltd. |
GR01 | Patent grant | |