CN106919579A - A kind of information processing method and device, equipment - Google Patents

Information processing method, apparatus, and device

Info

Publication number
CN106919579A
Authority
CN
China
Prior art keywords
parameter
user
trigger event
time period
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510990666.0A
Other languages
Chinese (zh)
Other versions
CN106919579B (en)
Inventor
彭作杰
李益群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201510990666.0A priority Critical patent/CN106919579B/en
Publication of CN106919579A publication Critical patent/CN106919579A/en
Application granted granted Critical
Publication of CN106919579B publication Critical patent/CN106919579B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • G06Q30/0245Surveys
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0248Avoiding fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an information processing method comprising: determining a first trigger event to be classified; obtaining, from the first trigger event, identification information of a user and attribute information of a first trigger operation; obtaining, according to the identification information of the user, a first feature parameter describing the user's behavior in a first time period; determining, according to the attribute information of the first trigger operation, a second feature parameter describing the user's behavior in a second time period, wherein the first time period is longer than the second time period; inputting the first feature parameter and the second feature parameter into a preset classification model that uses the first feature parameter and the second feature parameter as classification parameters; obtaining the classification result of the first trigger event output by the classification model; and outputting the classification result. The invention also discloses an information processing apparatus and device.

Description

Information processing method, apparatus, and device
Technical field
The present invention relates to information technology, and in particular to an information processing method, apparatus, and device.
Background
The growth of China's mobile Internet industry has driven the growth of the mobile advertising industry. Advertising is regarded as the most active engine of the Internet economy, and mobile Internet advertising (mobile advertising for short) is the most important part of Internet advertising. It is a form of advertising based on wireless communication technology with mobile devices as the carrier. Mobile advertising has grown vigorously in recent years; in 2014 the mobile advertising market soared to 12.5 billion dollars.
Figure 1A is a schematic diagram of the Internet advertising industry chain in the related art. As shown in Figure 1A, the industry chain 10 includes an advertiser 11, an advertising platform 12, a traffic owner 13 (for example, a media outlet), and an audience (users) 14. The advertising platform 12 is the intermediary between the advertiser 11 and the traffic owner 13. When the advertiser 11 has an advertisement that needs exposure, the advertiser pays to place that advertisement on the advertising platform 12, and usually selects delivery conditions at the same time, such as the audience, the display mode, the billing method, and the traffic owners on which the advertisement should run. The advertising platform 12 determines the traffic owner 13 to deliver to according to the conditions set by the advertiser 11 and then places the advertisement with that traffic owner. When a user 15 uses the traffic owner's product (for example, watches a video), the user 15 receives the advertisement through a terminal 14 such as a mobile phone, tablet computer, or PC, which completes the exposure of the advertiser's advertisement. This industry chain shows that the advertiser pays real money to raise the profile of its products, while the traffic owner earns advertising fees by spending its own traffic. In general, the advertising platform charges the advertiser according to the actual exposure of the advertisement and then distributes revenue to the traffic owner as agreed.
Advertising fees in current Internet advertising are all collected on the basis of data. For example, fees may be charged on a cost-per-click (CPC) basis. If the advertising platform or the traffic owner generates a large number of clicks by fraudulent means, cheating traffic (false data) is formed and the advertiser is charged for it. Fraudulent means generally include sending click events directly from large numbers of test machines or emulators, and hiring or incentivizing users to click in bulk. The advertiser is therefore the biggest victim in the Internet advertising chain, and many advertisers go to great lengths to protect themselves against cheating. As the Internet advertising market has matured, third-party monitoring platforms have emerged. Relying on technical expertise and third-party neutrality, such a platform can provide anti-cheating protection across the data of all platforms and can protect the interests of advertisers well.
However, cheating in Internet advertising damages not only the economic interests of advertisers but also the reputation of the advertising platform. A third-party monitoring platform returns to the advertiser data that is close to the real data (data from which the false data produced by cheating has been removed). If the data that the advertising platform reports to the advertiser is inflated, the advertiser loses confidence in the platform and the platform's reputation declines. In summary, how to prevent cheating in Internet advertising has become a problem in urgent need of a solution.
Summary of the invention
In view of this, to solve at least one problem in the prior art, embodiments of the present invention provide an information processing method, apparatus, and device that can prevent cheating operations on the Internet and thereby yield correct, valid click results.
The technical solution of the embodiments of the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides an information processing method, the method comprising:
determining a first trigger event to be classified, the first trigger event being used to describe a first trigger operation;
obtaining, from the first trigger event, identification information of a user and attribute information of the first trigger operation;
obtaining, according to the identification information of the user, a first feature parameter describing the user's behavior in a first time period;
determining, according to the attribute information of the first trigger operation, a second feature parameter describing the user's behavior in a second time period, wherein the first time period is longer than the second time period;
inputting the first feature parameter and the second feature parameter into a preset classification model, the classification model using the first feature parameter and the second feature parameter as classification parameters;
obtaining the classification result of the first trigger event output by the classification model;
outputting the classification result.
In a second aspect, an embodiment of the present invention provides an information processing apparatus, the apparatus comprising a first determining unit, a first obtaining unit, a second obtaining unit, a second determining unit, an input unit, a third obtaining unit, and an output unit, wherein:
the first determining unit is configured to determine a first trigger event to be classified, the first trigger event being used to describe a first trigger operation;
the first obtaining unit is configured to obtain, from the first trigger event, identification information of a user and attribute information of the first trigger operation;
the second obtaining unit is configured to obtain, according to the identification information of the user, a first feature parameter describing the user's behavior in a first time period;
the second determining unit is configured to determine, according to the attribute information of the first trigger operation, a second feature parameter describing the user's behavior in a second time period, wherein the first time period is longer than the second time period;
the input unit is configured to input the first feature parameter and the second feature parameter into a preset classification model, the classification model using the first feature parameter and the second feature parameter as classification parameters;
the third obtaining unit is configured to obtain the classification result of the first trigger event output by the classification model;
the output unit is configured to output the classification result.
In a third aspect, an embodiment of the present invention provides an information processing device, the device comprising a display device and a processing unit, wherein:
the display device is configured to display the classification result output by the processing unit;
the processing unit is configured to: determine a first trigger event to be classified, the first trigger event being used to describe a first trigger operation; obtain, from the first trigger event, identification information of a user and attribute information of the first trigger operation; obtain, according to the identification information of the user, a first feature parameter describing the user's behavior in a first time period; determine, according to the attribute information of the first trigger operation, a second feature parameter describing the user's behavior in a second time period, wherein the first time period is longer than the second time period; input the first feature parameter and the second feature parameter into a preset classification model, the classification model using the first feature parameter and the second feature parameter as classification parameters; obtain the classification result of the first trigger event output by the classification model; and output the classification result.
With the information processing method, apparatus, and device provided by the embodiments of the present invention, a first trigger event to be classified is determined; identification information of a user and attribute information of a first trigger operation are obtained from the first trigger event; a first feature parameter describing the user's behavior in a first time period is obtained according to the identification information of the user; a second feature parameter describing the user's behavior in a second time period is determined according to the attribute information of the first trigger operation, wherein the first time period is longer than the second time period; the first feature parameter and the second feature parameter are input into a preset classification model that uses them as classification parameters; the classification result of the first trigger event output by the classification model is obtained; and the classification result is output. In this way, cheating operations on the Internet can be prevented and correct, valid click results obtained.
Brief description of the drawings
Figure 1A is a schematic diagram of the Internet advertising industry chain in the related art;
Figure 1B is a schematic diagram of the hardware entities that exchange information in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the information processing method of Embodiment one of the present invention;
Fig. 3A is a schematic diagram of the training process of the classification model of Embodiment three of the present invention;
Fig. 3B is a schematic diagram of the sample-extraction framework of Embodiment three of the present invention;
Fig. 3C is a schematic diagram of a regression tree in the GBDT classification model of Embodiment three of the present invention;
Fig. 3D is a schematic flowchart of a method for filtering cheating traffic in real time using rule-based penalties in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of the information processing method of Embodiment four of the present invention;
Fig. 5A is a schematic diagram of the hardware structure of each entity in an embodiment of the present invention;
Fig. 5B is a schematic diagram of the structure of the information processing apparatus of Embodiment five of the present invention;
Fig. 6 is a schematic diagram of the structure of the information processing device of Embodiment six of the present invention.
Detailed description
To better introduce and understand the embodiments of the present invention, some terms that the embodiments may involve are explained first:
Ad request: the user side requests an advertisement to be pulled for display;
Ad exposure: an advertisement actually shown once on the user side so that the user can see it counts as one exposure;
Ad click: the user visits the advertiser's web page by clicking the advertisement, which counts as one click;
Ad conversion: the conversion behavior the user performs after clicking the advertisement, such as downloading a software application (APP, Application) or purchasing goods;
Click sequence: the chronological series of clicks made by the same user on the same day;
Click cheating: in an advertising system, users may inflate ad click counts for some malicious purpose. Clicks that contain cheating not only seriously damage the reputation of the advertising platform but also maliciously consume the advertiser's budget, which lowers the advertiser's return on investment (ROI) and ultimately causes the advertiser to lose trust in the platform;
Click anti-cheating: checking ad clicks by multiple means to judge whether a click is a normal click;
Click anti-cheating system: a system that performs anti-cheating checks on clicks;
Cheating self-feedback (FOPF, Fraud On Predict Fraud): the ratio of actual cheating to predicted cheating, used to correct the model's prediction accuracy. The concept derives from COPC (Click On Predict Click) in click-through rate (CTR) estimation;
Anti-cheating strategy: to combat cheating, the anti-cheating system formulates a series of rules and models; each rule or model is called a strategy;
CTR: click-through rate;
CVR: conversion rate;
CTR estimation: estimating, based on machine learning, the probability that a user clicks an advertisement;
Machine learning: relying on probability theory, statistics, neural propagation, and other theories to enable a computer to simulate human learning behavior, so as to acquire new knowledge or skills and reorganize existing knowledge structures to keep improving its own performance;
Model training: feeding manually selected samples into a machine learning system and repeatedly adjusting the model parameters until the final model achieves the best precision and recall in recognizing the samples;
Supervised model: requires manually specified positive and negative samples for training;
Unsupervised model: requires no specified positive and negative samples for training and has no concept of positive and negative samples;
Online scoring: scoring the maliciousness of ad clicks generated in real time by means of machine learning; the higher the score, the higher the probability of cheating;
CPM (Cost Per Mille): billing per thousand impressions;
CPC (Cost Per Click): billing per click.
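The ratio-style terms in this glossary (CTR, CVR, FOPF) can be illustrated with a minimal sketch. The function names and the zero-denominator convention are assumptions made for illustration; the text defines only the ratios themselves.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero.

    Returning 0.0 for an empty denominator is an illustrative convention,
    not something the source text specifies.
    """
    return numerator / denominator if denominator else 0.0


def ctr(clicks: int, exposures: int) -> float:
    """Click-through rate: clicks per exposure."""
    return ratio(clicks, exposures)


def cvr(conversions: int, clicks: int) -> float:
    """Conversion rate: conversions per click."""
    return ratio(conversions, clicks)


def fopf(actual_fraud: int, predicted_fraud: int) -> float:
    """Fraud On Predict Fraud: actual cheating divided by predicted cheating."""
    return ratio(actual_fraud, predicted_fraud)
```

Under this reading, an FOPF value close to 1 would suggest that the model's predicted cheating volume matches what was actually observed, while values far from 1 would indicate that the model needs correction.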
The hardware entities that exchange information in the embodiments of the present invention are described below. Figure 1B is a schematic diagram of these entities. Figure 1B includes servers 11 ... 1n (which may belong to an advertising platform) and terminal devices 21-24. The terminal devices 21-24 exchange information with the servers over a wired or wireless network and include types such as mobile phones, desktop computers, PCs, and all-in-one machines. In one example, the servers 11 ... 1n can also interact over the network with first terminals 31 ... 3n (for example, the terminals of advertisers, or of parties that supply advertising material and promotional content). After a first terminal submits the advertisement it wants delivered, the advertisement is stored in the server cluster, and an administrator can be assigned to carry out a series of processing steps, such as review, on the advertisements delivered by the first terminals. Relative to the first terminals 31 ... 3n, the terminal devices 21 to 24 can be called second terminals (for example, the terminals of ordinary users, that is, the parties to whom advertisements are displayed or exposed). Their users may be watching video through a video application, playing games through a game application, browsing web pages, and so on. Advertisements can be added to all applications installed on a terminal device, or to specified applications (such as game applications, video applications, and browser applications), to show users more recommended information. Figure 1B is only one example of a system architecture for implementing the embodiments of the present invention; the embodiments are not limited to this architecture. The embodiments of the present invention are proposed on the basis of this system architecture.
The technical solution of the present invention is further elaborated below with reference to the accompanying drawings and specific embodiments.
Embodiment one
To solve the foregoing technical problem, this embodiment of the present invention provides an information processing method applied to an information processing device. The information processing device may be any computing device with information processing capability, such as a personal computer, tablet computer, notebook computer, or integrated server. The information processing device in this embodiment may specifically be a computing device of the advertising platform, the traffic owner, or a third-party monitoring platform. In a specific implementation, the functions of the method can be realized by a processor in the information processing device calling program code, and the program code can be stored in a computer storage medium. The information processing device therefore includes at least a processor and a storage medium.
Fig. 2 is a schematic flowchart of the information processing method of Embodiment one of the present invention. As shown in Fig. 2, the method includes:
Step S101: determine a first trigger event to be classified, the first trigger event being used to describe a first trigger operation;
Here, the first trigger operation may include an operation that triggers ad exposure. In general, such operations include click operations and other forms of operation. Click operations include clicks made with a mouse and touch operations made on a touch screen with an operating body. Other forms of operation include, for example, voice-control operations, face-recognition operations, and gaze-recognition operations: voice recognition determines whether a sound made by the user is meant to expose the advertisement, so that the voice-control operation triggers the exposure; face recognition determines whether the user intends to expose the advertisement, so that the face-recognition operation triggers the exposure; and gaze recognition determines whether the user intends to expose the advertisement, so that the gaze-recognition operation triggers the exposure.
The first trigger event is illustrated below using the operation of clicking an advertisement as the trigger operation. In a specific implementation, the first trigger event contains a good deal of information, such as user information, advertisement information, click attribute information, and terminal information. User information includes the user's identification information (ID), nickname, name, contact details, address, employer, account, gender, age, preferences, and so on. Advertisement information includes the advertisement's identification information (ID), category, size, and so on. Terminal information includes the terminal's model and manufacturer, the network operator, the device's identification information (ID), and so on. Click attribute information includes the click position, the click time, the click frequency or the interval between clicks, the user's behavior habits (dragging, double-clicking, single-clicking), the exposure duration of the advertisement, and so on.
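A trigger event of the kind just described might be modeled as follows. This is a minimal sketch: the class names, field names, and the choice of which fields to keep are illustrative assumptions, since the text lists the kinds of information an event carries but not a concrete schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClickAttributes:
    """Attribute information of a click (hypothetical subset of fields)."""
    position: Tuple[int, int]   # click coordinates on screen
    click_time: float           # click moment, seconds since the epoch
    exposure_duration: float    # how long the ad was shown, in seconds


@dataclass
class TriggerEvent:
    """A click trigger event (hypothetical subset of fields)."""
    user_id: str                # identification information of the user
    ad_id: str                  # identification information of the ad
    device_id: str              # identification information of the terminal
    attributes: ClickAttributes


def extract_user_and_attributes(event: TriggerEvent) -> Tuple[str, ClickAttributes]:
    """Pull out the two pieces the method needs from an event:
    the user's ID and the attribute information of the trigger operation."""
    return event.user_id, event.attributes
```

Keeping the click attributes in their own structure mirrors the way the method treats them separately from the user's identification information.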
Step S102: obtain, from the first trigger event, the identification information of the user and the attribute information of the first trigger operation;
Step S103: obtain, according to the identification information of the user, a first feature parameter describing the user's behavior in a first time period;
Here, the first feature parameter may be stored locally or obtained from a network or another database; for example, it may be stored on an internal server.
Step S104: determine, according to the attribute information of the first trigger operation, a second feature parameter describing the user's behavior in a second time period, wherein the first time period is longer than the second time period;
Here, in a specific implementation, the first feature parameter may be a feature parameter for the past week, the past month, or the past 10 days, and the second feature parameter may be a feature parameter for the current day, the past 24 hours, or the past 36 hours. In this embodiment, the attribute information of the first trigger operation includes at least the trigger moment of the first trigger event (which can be understood as the click moment), and the second time period is the period from a preset first moment to the trigger moment of the first trigger event.
In general, taking advertising as an example, feature parameters span several dimensions, for example: 1) short-term user features (number of clicks on the current day, number of advertisers clicked, click frequency or interval between clicks, and so on); 2) long-term user features (number of exposures, clicks, and conversions in a past period, number of advertisers, ad slots, and APPs clicked, distribution of click times, distribution of click coordinates, and so on); 3) user information (gender, age, preferences, and so on); 4) click context features (whether the click IP matches the exposure IP, whether the user is logged in to an account (such as QQ), ad playback duration, click moment, and so on); 5) traffic owner features (whether the ad displays an exit button during playback, average CTR/CVR/FOPF over a historical period, and so on); 6) advertiser features (average CTR/CVR/FOPF over a historical period, and so on). This embodiment focuses on feature parameters in two of these dimensions, the first feature parameter and the second feature parameter: the first can be understood as a long-term feature and the second as a short-term feature.
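Combining the long-term and short-term dimensions into a single model input could look like the sketch below. The key names and the default of 0.0 for missing features are assumptions chosen for illustration; the text names the kinds of features but not their encoding.

```python
from typing import Dict, List

# Hypothetical feature names for each dimension; a fixed order keeps
# the vector layout stable between training and scoring.
LONG_TERM_KEYS = ["clicks_past_period", "exposures_past_period", "conversions_past_period"]
SHORT_TERM_KEYS = ["clicks_today", "seconds_since_last_click"]


def build_feature_vector(long_term: Dict[str, float],
                         short_term: Dict[str, float]) -> List[float]:
    """Concatenate long-term (first) and short-term (second) feature
    parameters into one vector, filling missing features with 0.0."""
    first = [float(long_term.get(key, 0.0)) for key in LONG_TERM_KEYS]
    second = [float(short_term.get(key, 0.0)) for key in SHORT_TERM_KEYS]
    return first + second
```

For example, `build_feature_vector({"clicks_past_period": 40}, {"clicks_today": 19, "seconds_since_last_click": 2.5})` yields a five-element vector whose first three entries are long-term and last two are short-term.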
Step S105: input the first feature parameter and the second feature parameter into a preset classification model, the classification model using the first feature parameter and the second feature parameter as classification parameters;
Here, the classification model may be a model of any of various classification algorithms, including logistic regression (LR), support vector machines (SVM), and gradient boosting decision trees (Gradient Boosting Decision Tree, GBDT). When the classification model uses GBDT, it is a classification tree whose split nodes are the first feature parameter and the second feature parameter.
Step S106: obtain the classification result of the first trigger event output by the classification model;
Step S107: output the classification result.
Here, when the information processing device includes a processing unit (such as a processor) and a display device (such as a display screen), outputting the classification result may include the processing unit displaying the classification result on the display device. When the information processing device does not include a display device, outputting the classification result may include the processing unit sending the classification result to another information processing device through a communication device (an external communication interface).
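Steps S102 to S107 can be sketched end to end as follows. The sketch uses logistic regression, one of the algorithms the text lists, because it fits in a few lines. The dictionary-based event layout, the lookup store, the weights, the bias, and the 0.5 threshold are all hypothetical values chosen for illustration, not parameters from the source.

```python
import math
from typing import Callable, Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_event(event: Dict,
                   long_term_store: Dict[str, List[float]],
                   short_term_fn: Callable[[Dict], List[float]],
                   weights: List[float],
                   bias: float,
                   default_first: List[float],
                   threshold: float = 0.5) -> str:
    """Classify one trigger event as 'cheating' or 'normal'."""
    user_id = event["user_id"]                            # S102: user ID
    attributes = event["attributes"]                      # S102: attribute info
    first = long_term_store.get(user_id, default_first)   # S103: long-term features
    second = short_term_fn(attributes)                    # S104: short-term features
    features = first + second                             # S105: model input
    z = bias + sum(w * x for w, x in zip(weights, features))
    score = sigmoid(z)                                    # S106: model output
    return "cheating" if score >= threshold else "normal" # S107: result to output
```

A caller would display the returned label or forward it to another device, depending on whether a display is available.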
In this embodiment, the first trigger event may be online data or offline data. When it is online data, determining the first trigger event to be classified includes receiving an online trigger stream and separating the first trigger event from the stream. In a specific implementation, when a user clicks an advertisement on a terminal, the terminal sends a penalty request to the information processing device of this embodiment (or to another information processing device). The penalty request carries at least the trigger event and asks the information processing device to judge whether the trigger event is cheating or a violation. When many terminals send penalty requests, what the information processing device handles is a continuous trigger stream; when it needs to process a trigger event, it separates the first trigger event from that stream. When the first trigger event is offline data, determining the first trigger event to be classified includes obtaining a log of trigger operations (such as a click log) and extracting the first trigger event from that log.
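The two sources of trigger events, an online stream of penalty requests and an offline click log, can be sketched as two generators. The request key `trigger_event` and the one-JSON-object-per-line log format are assumptions for illustration; the text does not specify a wire or log format.

```python
import json
from typing import Dict, Iterable, Iterator


def events_from_stream(requests: Iterable[Dict]) -> Iterator[Dict]:
    """Online case: separate trigger events from a stream of penalty
    requests. Requests without an event are skipped."""
    for request in requests:
        event = request.get("trigger_event")  # hypothetical key name
        if event is not None:
            yield event


def events_from_log(lines: Iterable[str]) -> Iterator[Dict]:
    """Offline case: extract trigger events from a click log, assuming
    each non-empty line holds one JSON-encoded event."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because both functions yield events of the same shape, the downstream classification logic would not need to know which source an event came from.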
In the embodiment of the present invention, the attribute information of first trigger action at least includes the described first triggering thing The triggering moment of part, the second time period was included from default first moment to first trigger event Time period between triggering moment.Accordingly, it is described to be determined according to the attribute information of first trigger action Second feature parameter for describing user behavior feature of the user in second time period, including:
Step S131, the identification information according to the user obtains the third feature parameter of the second trigger event;
Here, second trigger event include apart from first trigger action click on the moment between when Between the most short trigger event of difference, second trigger event is used to describe the second trigger action, and the described 3rd is special Levy parameter for the user from first moment to the triggering moment of second trigger event between when Between user behavior feature in section;The third feature parameter is similar with the fisrt feature parameter, can Obtained from local or network.
Step S132, attribute information and the third feature parameter determination institute according to first trigger action State second feature parameter.
Here, suppose that the second feature parameter be the same day number of clicks, then the first trigger action it Before, the number of clicks on the same day is 18, then plus the number of clicks of the first trigger action, you can to determine to work as Preceding number of clicks is 19 times;For another example, the second feature parameter is to click on time interval, then then will be from Third feature parameter is the click moment of the second trigger action, the click moment and second of the first trigger action The difference at the click moment of trigger action is the time interval that can determine that click.
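Steps S131 and S132 can be illustrated with a minimal sketch. The record type `ShortTermFeature`, its field names and the function `update_short_term` are hypothetical names introduced only for this illustration; the source does not specify how the third feature parameter is stored.

```python
from dataclasses import dataclass


@dataclass
class ShortTermFeature:
    """Hypothetical record of a user's behavior up to a given trigger event."""
    clicks_today: int       # clicks from the first moment up to that event
    last_click_time: float  # trigger moment of that event, in seconds


def update_short_term(prev: ShortTermFeature, trigger_time: float):
    """Derive the second feature parameter from the third feature parameter
    (prev) and the trigger moment of the current (first) trigger event."""
    clicks_today = prev.clicks_today + 1             # count the current click
    interval = trigger_time - prev.last_click_time   # time since the nearest earlier click
    return ShortTermFeature(clicks_today, trigger_time), interval


# The example from the text: 18 clicks before the current click.
prev = ShortTermFeature(clicks_today=18, last_click_time=1000.0)
current, interval = update_short_term(prev, trigger_time=1012.5)
print(current.clicks_today, interval)  # 19 12.5
```

The returned record plays the role of the third feature parameter for the user's next trigger event.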
In the embodiment of the present invention, as a preferred technical solution, the first feature parameter includes the number of clicks in the first time period, and the second feature parameter includes the number of clicks in the second time period and the click frequency or the click time interval.
Embodiment two
Based on the foregoing embodiment, the embodiment of the present invention provides a classification model formed by introducing machine learning techniques. The classification of each click considers all feature dimensions and then makes a comprehensive judgment. In the initial stage of forming the classification model, feature parameters of as many dimensions as possible still need to be selected manually for training the machine learning model, and which features to discard is decided by how well each feature parameter discriminates the training result. The problem of manually chosen parameters therefore largely disappears, because machine learning learns suitable parameters by itself. Because features have a more intuitive meaning than bare parameters, explanations that draw on the distribution of the feature data are also easier to understand. Because real-time penalization is based on a machine learning model, the penalty logic is very complex, and a cheater cannot probe and crack the anti-cheating strategy by simply adjusting the cheating method. In addition, the model itself can learn and evolve: even if cheaters change their methods, simply retraining the model (sometimes with fine adjustment of the features) allows the new cheating methods to be recognized and penalized, so that cheaters always find it difficult to bypass the penalty strategy.
The machine-learning-based real-time penalty technique penalizes each click independently. The time order in which clicks occur is only one of the features and does not play a decisive role in the penalty. Cheating traffic that appears at the beginning of a time series can therefore be correctly penalized, and normal traffic that appears at the end of a time series can be correctly let through, which avoids one-size-fits-all treatment. The application of machine learning techniques to anti-cheating can be freely shared and disseminated: because the principle of machine-learning-based penalization is complex, can evolve by itself, and does not target any specific cheating method, an anti-cheating method based on a machine learning model could even be disclosed to cheaters. Based on the foregoing embodiment, the embodiment of the present invention provides a method for forming a classification model, and the method includes:
Step S201: obtain positive samples and negative samples according to a preset allocation ratio;
Here, in actual operation, cheating operations and normal operations occur in a certain ratio. This ratio is the allocation ratio, and the training data used to form the classification model also needs to be set according to it.
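One way to read step S201 is as down-sampling two candidate pools until they match the preset ratio. The sketch below assumes this reading; the function name `sample_by_ratio`, the 1:4 ratio and the integer sample identifiers are illustrative assumptions, not values from the source.

```python
import random


def sample_by_ratio(positives, negatives, pos_to_neg=(1, 4), seed=0):
    """Down-sample both pools so that len(pos) : len(neg) equals pos_to_neg."""
    p, n = pos_to_neg
    # Largest number of whole "ratio units" that both pools can supply.
    units = min(len(positives) // p, len(negatives) // n)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(positives, units * p), rng.sample(negatives, units * n)


# Hypothetical pools: 30 cheating samples and 200 normal samples.
pos, neg = sample_by_ratio(list(range(30)), list(range(100, 300)), pos_to_neg=(1, 4))
print(len(pos), len(neg))  # 30 120
```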
Step S202: extract the feature parameters of different dimensions of the positive samples and the feature parameters of different dimensions of the negative samples;
Here, taking the click operation on an advertisement as an example, the feature parameters cover several different dimensions, for example: 1) short-term user features (number of clicks on the current day, number of advertisers clicked, click frequency or click time interval, etc.); 2) long-term user features (number of exposures/clicks/conversions in the past week, number of advertisers/ad slots/apps clicked, click time distribution, click coordinate distribution, etc.); 3) user information (gender, age, preferences, etc.); 4) click context features (whether the click IP matches the exposure IP, whether the user is logged in to an account, advertisement playing duration, click moment, etc.); 5) traffic owner features (whether the played advertisement shows an exit button, average CTR/CVR/FOPF over a historical period, etc.); 6) advertiser features (average CTR/CVR/FOPF over a historical period, etc.). The embodiment of the present invention actually focuses on feature parameters of two dimensions, namely the first feature parameter and the second feature parameter, where the first feature parameter can be understood as a long-term feature and the second feature parameter as a short-term feature. The feature parameters of these different dimensions are extracted from the positive samples and the negative samples.
Step S203: input the feature parameters of the different dimensions of the positive samples or the negative samples into a preset first training model to obtain a first training result, where the first training model uses the feature parameters of the different dimensions, each with a preset weight, as classification parameters;
Step S204: if the first training result does not meet a preset condition, adjust the weights of the feature parameters of the different dimensions one by one until the training result meets the condition, and output the first training model whose first training result meets the condition as the classification model;
In the embodiment of the present invention, adjusting the weight of the feature parameter of each of the different dimensions includes:
using
$$g(D, A) = -\sum_i P(D)_i \log P(D)_i + \sum_i P(D/A)_i \log P(D/A)_i$$
to screen out the first feature parameter and the second feature parameter from the feature parameters of the different dimensions;
where $D$ denotes the positive samples and negative samples serving as the training data set, $A$ denotes the feature parameter of a certain dimension, $g(D, A)$ denotes the weight obtained by feature parameter $A$ under training data set $D$, $P(D)_i$ denotes the probability that training data set $D$ is classified as $i$, $P(D/A)_i$ denotes the conditional probability that training data set $D$ is classified as $i$ given feature parameter $A$, and $\log$ denotes the base-2 logarithm.
Step S205: if the first training result meets the preset condition, output the first training model as the classification model.
In the embodiment of the present invention, whichever training model is used, the input of the training model at the start of training includes the feature parameters of the different dimensions described above. If repeated tests show that a feature parameter has no beneficial effect on the training result or causes misclassification, the weight of that feature parameter is reduced; if the feature parameter has a beneficial effect on the training result, its weight is increased. If the weight of a feature parameter is reduced to 0, that feature parameter no longer plays any role in the training model. In the final experiments of the embodiment of the present invention, the feature parameters that ultimately have a positive effect on the training result are the long-term feature (i.e. the first feature parameter) and the short-term feature (i.e. the second feature parameter). Assuming below that the feature parameters of the different dimensions include only the first feature parameter and the second feature parameter (i.e. all other feature parameters have been eliminated), the process of forming the classification model generally includes: inputting the first feature parameter and the second feature parameter of the positive samples into the first training model and obtaining a first training result from the first training model, where the first training model uses the first feature parameter and the second feature parameter as classification parameters and each classification parameter has a corresponding weight;
If the first training result meets a preset first condition, the first feature parameter and the second feature parameter of the negative samples are input into the first training model, and a second training result is obtained from the first training model; if the second training result meets a preset second condition, the first training model is used as the classification model.
If the first training result does not meet the first condition, the weights of the classification parameters of the first training model are adjusted to obtain a second training model; the first feature parameter and the second feature parameter of the positive samples are input into the second training model, and a new first training result is obtained from the second training model; if the new first training result does not meet the first condition, the weights of the classification parameters of the second training model are adjusted in the same way until the new first training result meets the first condition. The first feature parameter and the second feature parameter of the negative samples are then input into the second training model, and a second training result is obtained from the second training model; if the second training result meets the preset second condition, the second training model is used as the classification model.
If the second training result does not meet the preset second condition, the weights of the classification parameters of the first training model are adjusted to obtain a third training model; the first feature parameter and the second feature parameter of the negative samples are input into the third training model, and a new second training result is obtained from the third training model; if the new second training result does not meet the second condition, the weights of the classification parameters of the third training model are adjusted in the same way until the new second training result meets the second condition, and the third training model is finally used as the classification model.
As can be seen from the above process, adjusting the weights of the feature parameters plays an important role in the classification model. In the embodiment of the present invention the weights are in fact adjusted automatically: weights are assigned automatically to all manually preselected features, highlighting the feature parameters that play a leading role and weakening those that do not play a key role. For a feature parameter that has no effect on the final penalty decision, the weight can drop directly to 0, i.e. the feature can be ignored. Specifically, for each feature parameter, the information gain of the feature parameter with respect to the current data to be classified is calculated, where the information gain is defined as the difference between the entropy of training data set $D$ and its conditional entropy given feature $A$, i.e. formula (1):
$$g(D, A) = H(D) - H(D/A) \qquad (1)$$
In formula (1), the weight is the information gain, and entropy measures the uncertainty in classifying a data set. Entropy is calculated as $$H(X) = -\sum_i p_i \log p_i$$ where $X$ denotes the data set, $p_i$ denotes the probability that the data set is classified as $i$, and $\log$ denotes the base-2 logarithm. The weight of feature $A$ is therefore finally calculated by formula (2): $$g(D, A) = -\sum_i P(D)_i \log P(D)_i + \sum_i P(D/A)_i \log P(D/A)_i \qquad (2)$$
In formula (2), $D$ denotes the data set to be classified, $A$ denotes a certain feature parameter, $g(D, A)$ denotes the weight obtained by feature parameter $A$ under training data set $D$, $P(D)_i$ denotes the probability that data set $D$ is classified as $i$, and $P(D/A)_i$ denotes the conditional probability that data set $D$ is classified as $i$ given feature $A$. The conditional probability is calculated by formula (3): $$P(A \mid B) = \frac{P(AB)}{P(B)} \qquad (3)$$
In formula (3), $P(B)$ denotes the probability that event $B$ occurs, $P(AB)$ denotes the probability that events $A$ and $B$ occur together, and $P(A \mid B)$ denotes the probability that event $A$ occurs given that event $B$ has occurred.
Information gain is the degree to which using feature parameter A reduces the uncertainty in classifying data set D. For example, suppose there is a group of schoolchildren, and initially the probability of correctly classifying a student's gender is 50%. With no feature information, the entropy is at its maximum of 1. If the age of each student is given, the probability of correctly classifying gender is still 50% and the entropy is still at its maximum, because age provides no useful information for judging gender; the information gain of the feature age is therefore 0, and its weight can be taken as 0. If the gender of each student is given and the students are then classified by gender, the probability of correct classification is 100% and the entropy is at its minimum of 0; the information gain of the feature gender is at its maximum of 1, and its weight can be taken as 1. The absolute value of the weights can be scaled up or down by multiplying by a constant factor C.
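The schoolchildren example can be checked numerically. The sketch below assumes the standard textbook definition of conditional entropy (the entropy of each feature-value group, weighted by the group's share of the data); the source does not spell out how H(D/A) is evaluated, and the toy data set is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """H(X) = -sum_i p_i * log2(p_i) over the class frequencies in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D/A), with H(D/A) averaged over the values of A."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy class of eight students: gender is balanced within every age group.
gender = ["M", "F", "M", "F", "M", "F", "M", "F"]
age = [10, 10, 11, 11, 10, 10, 11, 11]

print(information_gain(age, gender))     # 0.0 (age says nothing about gender)
print(information_gain(gender, gender))  # 1.0 (gender determines gender)
```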
As can be seen from the above flow: 1) the embodiment of the present invention uses a filtering and penalty system based on a classification model, which, when the click stream arrives, can weigh all factors (such as the first feature parameter and the second feature parameter) to evaluate whether the click is cheating and then decide whether to penalize it, rather than mechanically counting click frequency, and is therefore more intelligent and flexible; 2) the embodiment of the present invention introduces feature parameters of many different dimensions to train the training model and determines the feature parameters finally used (such as the first feature parameter and the second feature parameter) according to the training result, which improves the accuracy of cheating recognition; 3) a notable characteristic of the classification model used in the embodiment of the present invention is that the model can evolve by itself, automatically adjusting feature weights as cheating changes, which avoids the frequent manual parameter adjustment required by rule-based approaches.
Embodiment three
As can be seen from embodiment one and embodiment two, the embodiments of the present invention provide a machine-learning-based classification method for distinguishing whether a click on an advertisement is a cheating operation; the classification of each click operation considers all feature dimensions and then makes a comprehensive judgment. By comparison, the technical solution provided by the embodiment of the present invention is more intelligent and accurate, and its results can be explained more convincingly.
The training process of the training model is described below. Fig. 3A is a schematic diagram of the training process of the training model in embodiment three of the present invention. As shown in Fig. 3A, the flow mainly includes:
Step S301: obtain various logs;
Specifically, in implementation, the click log, the advertisement exposure log, the effect log (Trace Log) and user data including user information can be obtained. These logs are mainly used to extract feature parameters of different dimensions. As many feature parameters as possible should be selected initially, and features that contribute little to classification are gradually removed during subsequent training. The influence of a feature parameter on the classification result can be judged from its feature weight: if the weight of a feature parameter is 0 or very low, the feature parameter can be considered to have very little effect and can be excluded. More than thirty feature dimensions are initially selected here, including the user dimension, the advertiser dimension, the traffic owner dimension, long-term features, short-term features, and so on. Because the data volume is huge, the value of each feature dimension is computed by distributed jobs based on Hadoop.
Step S302: extract positive and negative samples from the obtained logs;
Here, the embodiment of the present invention selects a supervised classification/regression model as the classification model, and training a supervised classification/regression model requires positive and negative samples. Selecting the positive and negative samples purely by hand has the following problems: the number of samples is limited; manual judgment is biased, so the labeled sample data is noisy; and labeling is costly. For these reasons, the embodiment of the present invention extracts positive samples and negative samples automatically by program. Positive samples are extracted by combining a rule-based penalty approach with a statistics-based penalty approach. The framework for sample extraction is shown in Fig. 3B. As shown in Fig. 3B, the rule-based penalty approach is used to screen the logs roughly, and its sample selection rules can include user-level rules 321, advertiser-level rules 322 and traffic-owner-level rules 323. The roughly screened samples are then screened by the statistics-based penalty approach, for example by selecting lists of advertisers, ad slots and traffic owners whose CTR, CVR and cheating rate exceed certain thresholds (the thresholds are derived statistically, hence the name statistics-based penalty approach). The samples are then cleaned by cross filtering, finally yielding positive samples 324 and negative samples 325, which must satisfy a certain allocation ratio.
It should be noted that, in obtaining the positive and negative samples, the embodiment of the present invention also uses a classification method to classify the samples. There are currently many choices of classification/regression algorithm, such as LR, SVM and GBDT. Considering data volume and computational efficiency, the embodiment of the present invention narrows the choice to LR and GBDT. Offline verification on test data shows that LR and GBDT differ little in classification accuracy, but LR requires the feature parameters of different dimensions to be normalized and converted into a uniform format, whereas GBDT can accept feature parameters in any form without additional normalization. Moreover, GBDT is an ensemble boosting algorithm that improves the final classification effect by combining multiple weak classifiers, an approach that is widely recognized in the industry. GBDT is therefore selected as the classification algorithm in a preferred embodiment.
The gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) is briefly described below. The base classifier used by GBDT is the regression tree (Regression Tree). A regression tree works as follows: after the data to be classified is input into the regression tree, a classification value is output, and this classification value is compared with a set classification threshold to judge whether the data to be classified is cheating. The GBDT classifier (classification model) combines multiple regression trees additively to obtain the final classification value, and is expressed by formula (4): $$f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m) \qquad (4)$$
In formula (4), $T(x; \Theta_m)$ denotes the classification result of the $m$-th regression tree, $\Theta_m$ denotes the $m$-th regression tree, $M$ denotes the total number of regression trees, and $x$ denotes the first trigger event to be classified;
When the training model is trained, each regression tree is fitted to the difference between the true value and the accumulated result of all previously trained trees (this difference is the residual, see formula (5)):
$$r_{mj} = y_j - f_{m-1}(x_j), \quad j = 1, 2, \ldots, N \qquad (5)$$
In formula (5), $y_j$ is the true value, $f_{m-1}(x_j)$ is the accumulated predicted value of the trees already trained, $j$ indexes the samples, $N$ is the total number of samples, and $m$ indexes the trees. That is, each tree focuses only on the part that all preceding regression trees predicted wrongly, so that the classification error is continually reduced.
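The residual-fitting idea of formulas (4) and (5) can be sketched in a few lines. This is a simplified illustration, not the training procedure of the embodiment: each "tree" is a one-split regression stump on a single numeric feature, the loss is squared error, the data set is invented, and at least two distinct feature values are assumed.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def fit_gbdt(xs, ys, num_trees):
    """Build f_M(x) as a sum of stumps, each fitted to the current residuals."""
    trees = []

    def predict(x):
        return sum(tree(x) for tree in trees)  # formula (4)

    for _ in range(num_trees):
        # Formula (5): r_mj = y_j - f_{m-1}(x_j)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


def squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: daily click counts and a cheating label in [0, 1].
xs = [1, 2, 3, 4, 5, 6]
ys = [0.1, 0.2, 0.1, 0.8, 0.9, 0.7]
one_tree = fit_gbdt(xs, ys, num_trees=1)
three_trees = fit_gbdt(xs, ys, num_trees=3)
print(squared_error(one_tree, xs, ys) > squared_error(three_trees, xs, ys))  # True
```

Each additional stump only has to correct what the earlier stumps got wrong, which is why the training error falls as trees are added.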
Step S303: generate long-term features from the obtained logs;
Step S304: extract short-term features from the positive and negative samples;
Step S305: input the long-term features and the short-term features into the cross filter to form an input feature vector;
Here, the cross filter performs cross filtering (Cross-Filtering): a click event is mapped to the user dimension, the advertiser dimension and the traffic owner dimension respectively, and the results are then crossed to judge whether the current click event can be classed as a positive sample (cheating).
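The cross-filtering step can be sketched as a lookup in three suspicious lists. The source does not state how the three dimensions are combined, so the rule used here is purely an assumption made for illustration: a click is a positive sample only when all three dimensions flag it, a negative sample when none does, and is dropped otherwise. The function name, field names and identifiers are likewise hypothetical.

```python
def cross_filter(click, suspicious_users, suspicious_advertisers, suspicious_owners):
    """Map a click to three dimensions and cross the three verdicts."""
    hits = [
        click["user_id"] in suspicious_users,              # user dimension
        click["advertiser_id"] in suspicious_advertisers,  # advertiser dimension
        click["owner_id"] in suspicious_owners,            # traffic owner dimension
    ]
    if all(hits):
        return "positive"  # cheating sample
    if not any(hits):
        return "negative"  # normal sample
    return None            # ambiguous click, dropped during cleaning (assumption)


users, advertisers, owners = {"u9"}, {"a3"}, {"o7"}
cheat = {"user_id": "u9", "advertiser_id": "a3", "owner_id": "o7"}
normal = {"user_id": "u1", "advertiser_id": "a1", "owner_id": "o1"}
mixed = {"user_id": "u9", "advertiser_id": "a1", "owner_id": "o1"}
print(cross_filter(cheat, users, advertisers, owners))   # positive
print(cross_filter(normal, users, advertisers, owners))  # negative
print(cross_filter(mixed, users, advertisers, owners))   # None
```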
Step S306: input the input feature vector into the training model and train with the training model (GBDT) to obtain a training result;
Step S307: if the training result does not meet expectations, adjust the parameters of the training model;
Step S308: if the training result meets expectations, output the training model as the final classification model.
From the above steps, the training process of the training model can be summarized as: 1) obtain positive and negative samples; 2) obtain feature parameters and compose them into a feature vector; 3) convert the click samples into feature vector form; 4) train the model; 5) output the classification model.
The GBDT classification model is described in detail below. The classification model is the model $f_M(x)$ formed using gradient boosting decision trees: $$f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m)$$
where $T(x; \Theta_m)$ denotes the classification result of the $m$-th regression tree, $\Theta_m$ denotes the $m$-th regression tree, $M$ denotes the total number of regression trees, and $x$ denotes the first trigger event to be classified.
Here, for $m = 1$, $\Theta_1$ includes: a first root node on the number of clicks in the first time period, with a first child node on the number of clicks in the second time period and a first leaf node under the first root node, the first child node having a second leaf node and a third leaf node;
$\Theta_2$ includes: a second root node on the click frequency in the second time period, the second root node having a fourth leaf node and a fifth leaf node.
Θ1 above is shown in the left part of Fig. 3C. In the left part, the input of Θ1 is a feature vector composed of the weekly number of clicks (week clicks for short), the daily number of clicks (day clicks for short) and the click frequency (which can be replaced by the click interval, i.e. the time between this click and the previous click). When there is click data to be classified, the feature parameters week clicks, day clicks and click frequency are extracted from the click data and composed into a feature vector, which is input into Θ1. In Θ1, week clicks is the root node, with a classification threshold of 50: click data with more than 50 week clicks goes to the first child node of the root, and click data with fewer than 50 week clicks goes to the first leaf node, whose cheating probability is 0.2. The cheating probability of a leaf node is the probability that click data falling in that leaf node is classified as a cheating click. The first child node uses day clicks as its classification parameter, with a classification threshold of 20: more than 20 day clicks leads to the second leaf node of Θ1, and fewer than 20 to the third leaf node. The cheating probability of the second leaf node is 0.8 and that of the third leaf node is 0.7. After a piece of click data has been classified by this binary tree, Θ1 finally yields a cheating probability for it.
Θ2 above is shown in the right part of Fig. 3C. Θ2 has only one decision node, whose classification parameter is click frequency with a classification threshold of 0.5, and the cheating probabilities of its two leaf nodes are 0.15 and -0.03 respectively. As Θ1 and Θ2 show, the classification model in Fig. 3C includes two regression trees in total, and the input clicks comprise four samples A, B, C and D. Each sample obtains a cheating probability from each regression tree, and the cheating probabilities given by the two trees are added to give the probability, finally output by the classification model, that the sample is a cheating click. Taking sample A as an example, its cheating probability after classification by Θ1 is 0.8 and after classification by Θ2 is 0.15, so the cheating probability of sample A finally output by the classification model is 0.95 (0.95 = 0.8 + 0.15).
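The two trees of Fig. 3C, as described in the text, can be written out directly. Two points are assumptions, because the text does not settle them: which branch of Θ2 carries 0.15 (the sketch assigns it to a click frequency above 0.5), and how values exactly equal to a threshold are handled. The feature values chosen for sample A are also invented so that it lands in the leaves named in the text.

```python
def tree_1(week_clicks, day_clicks):
    """Theta_1: root splits on week clicks (50), child splits on day clicks (20)."""
    if week_clicks <= 50:   # boundary handling is an assumption
        return 0.2          # first leaf node
    if day_clicks > 20:
        return 0.8          # second leaf node
    return 0.7              # third leaf node


def tree_2(click_frequency):
    """Theta_2: a single split on click frequency (0.5).

    Assumption: the higher-frequency branch carries 0.15."""
    return 0.15 if click_frequency > 0.5 else -0.03


def cheat_score(week_clicks, day_clicks, click_frequency):
    """f_M(x) with M = 2: the sum of the two tree outputs."""
    return tree_1(week_clicks, day_clicks) + tree_2(click_frequency)


# Hypothetical feature values for sample A.
score_a = cheat_score(week_clicks=60, day_clicks=25, click_frequency=0.9)
print(round(score_a, 2))  # 0.95
```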
A method of filtering cheating traffic in real time using a rule-based penalty approach is introduced below. To filter cheating traffic in real time, advertising platforms or traffic platforms usually identify and penalize cheating traffic in a rule-based way, for example by click frequency control or blacklists. Referring to Fig. 3D, the rule-based method of filtering cheating traffic in real time generally includes the following steps:
Step S51: strategy personnel investigate the data offline and formulate different rules according to the findings and manual experience;
Here, the different rules are rules that set penalty thresholds and penalty effective times, and can include, for example, user-level rules, advertiser-level rules and traffic-owner-level rules. Fig. 3D lists only some user-level rules, such as IP blacklist 341, user blacklist 342, whether the IP clicks too frequently 343, and whether the user clicks too frequently 344.
Step S52: when the input online click stream meets the condition in a specified dimension (such as IP blacklist or user), the click is considered a cheating click.
The condition can be one-dimensional (a list only) or two-dimensional (a threshold and a time). For example, if the specified dimension is the IP blacklist, a click is considered a cheating click when the IP address of the user in the click stream is in the IP blacklist. If the specified dimension is whether the user clicks too frequently, then once the user's current number of clicks exceeds the threshold, all subsequent clicks by that user are considered cheating clicks, while the clicks before that are considered normal clicks.
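The behavior of step S52 can be sketched as follows. The blacklist entry and the threshold of 3 are hypothetical values, and the sketch ignores the time component of the two-dimensional condition for brevity.

```python
from collections import defaultdict

IP_BLACKLIST = {"203.0.113.7"}  # hypothetical blacklist entry
CLICK_THRESHOLD = 3             # hypothetical per-user click threshold


def rule_based_filter(click_stream):
    """Return one verdict per click: True means penalized as cheating."""
    counts = defaultdict(int)
    verdicts = []
    for user_id, ip in click_stream:
        counts[user_id] += 1
        # One-dimensional rule (list) or frequency rule (threshold).
        cheating = ip in IP_BLACKLIST or counts[user_id] > CLICK_THRESHOLD
        verdicts.append(cheating)
    return verdicts


stream = [("u1", "198.51.100.1")] * 5 + [("u2", "203.0.113.7")]
print(rule_based_filter(stream))  # [False, False, False, True, True, True]
```

The first three clicks of user `u1` pass and every later one is penalized, whatever its other features are.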
This rule-based penalty approach has the following problems: 1) all parameters involved are set manually according to investigation findings, so they depend too heavily on experience; 2) because the parameters are set by experience, they are hard to interpret; 3) cheaters can probe by trying different cheating frequencies and thereby infer the parameters of some anti-cheating strategies, so as to get around the rule-based penalty; 4) the rules are too rigid. The rule generally used assumes that cheating traffic always arises in the middle of the time series, so it indiscriminately lets through the traffic at the beginning of the time series and completely cuts off the traffic at the end. In practice, however, cheating clicks may appear at the beginning, in the middle or at the end of the time series, so the cheating clicks identified by the rule-based approach may be fewer than the actual cheating clicks. In addition, some of the clicks penalized as cheating may be abnormal clicks produced by normal users, and an occasional abnormal click by a normal user should not be counted as a cheating click. The data produced by the rule-based penalty approach therefore contains a certain error; 5) the anti-cheating rules must be kept confidential to prevent details from leaking. Once the anti-cheating rules leak, cheaters can bypass all of them, causing economic loss to advertisers and reputational loss to the advertising platform.
Embodiment four
Embodiment three described how the classification model is formed. This embodiment describes the flow for classifying clicks with the resulting classification model. As shown in Figure 4, the flow includes:
Step S401: generate long-term features;
Here, various logs are obtained from the Hadoop distributed servers, and the long-term features are extracted from these logs.
Step S402: store the extracted long-term features in the memory server;
Step S403: obtain the online click stream;
Here, after the online click stream is obtained, the click event to be classified is extracted from it;
Step S404: input the click event into the query service system, then load the classification model;
Step S405: determine the identification information of the user from the click event, and obtain the long-term features and short-term features from the memory server according to the identification information of the user;
Here, the short-term features obtained from the memory server can be understood as the short-term features as of the previous click. For example, suppose the user had clicked 25 times before this click. To obtain the user's current click count, the number of clicks the user made earlier today must be known, and that number is stored in the memory server as a short-term feature.
Step S406: determine the short-term features of the current click from the short-term features obtained from the memory server and the current click event;
Step S407: form a feature vector from the long-term features obtained from the memory server and the short-term features determined in step S406, input it into the classification model, and obtain the classification result output by the classification model;
Here, the short-term features of the current click must also be written back to the memory server, so that they can be used to compute the short-term features of the next click at the next classification. The classification result indicates whether the click is a cheating click or a normal click. Taking embodiments three and four together: once training finishes and the classification model is obtained, the classification model is loaded onto the online server (the query service), and part of the long-term feature data is loaded online as well. When a click event reaches the query service, the click data is converted into a feature vector in real time, the feature vector is input into the classification model for classification (a scoring process), and the classification result (which can be understood as a score) determines whether to penalize.
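The online flow of steps S403 to S407 can be sketched as follows. This is only a minimal sketch under stated assumptions, not the implementation of the embodiment: the memory server is modeled with plain dictionaries, the feature names (`clicks_30d`, `clicks_today`) are hypothetical, and a simple threshold rule (`toy_model`) stands in for the loaded classification model.

```python
from dataclasses import dataclass


@dataclass
class ClickEvent:
    user_id: str      # identification information of the user
    timestamp: float  # trigger moment of the click, in seconds


class MemoryServer:
    """Hypothetical stand-in for the memory server of steps S402 and S405."""

    def __init__(self):
        self.long_term = {}   # user_id -> long-term features (from offline logs)
        self.short_term = {}  # user_id -> short-term features as of the last click

    def get(self, user_id):
        long_f = self.long_term.get(user_id, {"clicks_30d": 0})
        short_f = self.short_term.get(user_id, {"clicks_today": 0})
        return long_f, short_f

    def put_short_term(self, user_id, features):
        self.short_term[user_id] = features


def toy_model(vector):
    """Placeholder classifier (an assumption): flag unusually heavy clickers."""
    clicks_30d, clicks_today = vector
    return "cheating" if clicks_today > 20 and clicks_30d > 300 else "normal"


def classify_click(event, store, model=toy_model):
    # S405: look up the stored features by user identification
    long_f, prev_short = store.get(event.user_id)
    # S406: derive the short-term features of the current click
    cur_short = {"clicks_today": prev_short["clicks_today"] + 1}
    # S407: build the feature vector and classify it
    vector = (long_f["clicks_30d"], cur_short["clicks_today"])
    result = model(vector)
    # Write the updated short-term features back for the next click
    store.put_short_term(event.user_id, cur_short)
    return result
```

Writing the short-term features back after every click keeps each classification a constant-time lookup plus an increment, rather than a rescan of the day's click log.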
Embodiment five
Based on the foregoing embodiments, this embodiment of the present invention provides an information processing apparatus. The units included in the apparatus, such as the first determining unit, the first acquisition unit, the second acquisition unit, the second determining unit, the input unit, the third acquisition unit and the output unit, and the modules included in each unit, can be implemented by a processor in an information processing device, or by specific logic circuits. The processor used for data processing may be a microprocessor, a central processing unit (CPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA). The storage medium contains operating instructions, which may be computer-executable code; the steps of the information processing method flow of the embodiments of the present invention are carried out through these operating instructions.
The terminal, information processing device, server and the like in the embodiments of the present invention are examples of a hardware entity S11, shown in Figure 5A. The hardware entity S11 includes a processor 61, a storage medium 62 and at least one external communication interface 63; the processor 61, the storage medium 62 and the external communication interface 63 are connected by a bus 64.
Figure 5B is a schematic structural diagram of the information processing apparatus of embodiment five. As shown in Figure 5B, the apparatus 500 includes a first determining unit 501, a first acquisition unit 502, a second acquisition unit 503, a second determining unit 504, an input unit 505, a third acquisition unit 506 and an output unit 507, wherein:
The first determining unit 501 is configured to determine a first trigger event to be classified, the first trigger event describing a first trigger action;
The first acquisition unit 502 is configured to obtain, from the first trigger event, the identification information of a user and the attribute information of the first trigger action;
The second acquisition unit 503 is configured to obtain, according to the identification information of the user, a first feature parameter describing the user behavior features of the user within a first time period;
The second determining unit 504 is configured to determine, according to the attribute information of the first trigger action, a second feature parameter describing the user behavior features of the user within a second time period, wherein the first time period is longer than the second time period;
The input unit 505 is configured to input the first feature parameter and the second feature parameter into a preset classification model, the classification model taking the first feature parameter and the second feature parameter as classification parameters;
The third acquisition unit 506 is configured to obtain the classification result of the first trigger event output by the classification model;
The output unit 507 is configured to output the classification result.
In the embodiment of the present invention, the attribute information of the first trigger action includes at least the trigger moment of the first trigger event, and the second time period includes the period from a preset first moment to the trigger moment of the first trigger event;
The second determining unit includes a first acquisition module and a determining module, wherein:
The first acquisition module is configured to obtain a third feature parameter of a second trigger event according to the identification information of the user, the second trigger event including the trigger event with the shortest time difference from the click moment of the first trigger action, and the third feature parameter describing the user behavior features of the user within the period from the first moment to the trigger moment of the second trigger event;
The determining module is configured to determine the second feature parameter according to the attribute information of the first trigger action and the third feature parameter.
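How the second feature parameter can be derived from the third feature parameter and the trigger moment can be sketched as below. This is a sketch under assumptions rather than the embodiment's actual computation: the feature is modeled as a dictionary with a hypothetical `clicks` counter, time is measured in seconds, and click frequency is taken to be clicks divided by the time elapsed since the preset first moment.

```python
def update_second_feature(third_feature, trigger_time, first_moment):
    """Derive the second feature parameter for the current trigger event.

    third_feature: features up to the previous (closest) trigger event,
        here a dict with a hypothetical "clicks" counter.
    trigger_time: trigger moment of the first (current) trigger event.
    first_moment: preset start of the second time period.
    """
    elapsed = trigger_time - first_moment
    if elapsed <= 0:
        raise ValueError("trigger_time must be later than first_moment")
    clicks = third_feature["clicks"] + 1  # count the current click as well
    return {
        "clicks": clicks,
        "click_frequency": clicks / elapsed,  # clicks per second since first_moment
    }
```

For example, with 25 earlier clicks and a trigger moment one hour after the first moment, the function returns 26 clicks and a frequency of 26/3600 clicks per second.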
In the embodiment of the present invention, the first determining unit includes a receiving module and a first extraction module, wherein the receiving module is configured to receive an online trigger stream, and the first extraction module is configured to extract the first trigger event from the trigger stream; or,
The first determining unit includes a second acquisition module and a second extraction module, wherein the second acquisition module is configured to obtain a log of trigger actions, and the second extraction module is configured to extract the first trigger event from the log.
In the embodiment of the present invention, the apparatus further includes a fourth acquisition unit, an extraction unit, a second input unit, an adjustment unit and a third determining unit, wherein:
The fourth acquisition unit is configured to obtain positive samples and negative samples according to a preset ratio of positive to negative samples;
The extraction unit is configured to extract the feature parameters of different dimensions of the positive samples and the feature parameters of different dimensions of the negative samples;
The second input unit is configured to input the positive samples or the negative samples, together with their feature parameters of different dimensions, into a configured first training model to obtain a first training result, the first training model taking the feature parameters of different dimensions, each with a preset weight, as classification parameters;
The adjustment unit is configured to, if the first training result does not satisfy a preset condition, adjust the weight of the feature parameter of each dimension one by one until the training result satisfies the condition, and to output the first training model whose first training result satisfies the condition as the classification model;
The third determining unit is configured to, if the first training result satisfies the preset condition, output the first training model as the classification model.
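The training procedure carried out by these units can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration: the first training model is reduced to a weighted sum of features compared against a threshold, the preset condition is a target accuracy, and the one-by-one weight adjustment is a simple coordinate search with a fixed step. The text does not specify any of these details.

```python
import random


def sample_by_ratio(positives, negatives, ratio, total, seed=0):
    """Draw `total` samples so that positives:negatives follows `ratio`."""
    rng = random.Random(seed)
    n_pos = total * ratio[0] // (ratio[0] + ratio[1])
    n_neg = total - n_pos
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)


def accuracy(weights, samples, threshold=1.0):
    """Share of (features, label) pairs that the weighted-sum rule gets right."""
    correct = 0
    for features, label in samples:
        score = sum(w * f for w, f in zip(weights, features))
        correct += int((score >= threshold) == label)
    return correct / len(samples)


def train(samples, weights, target=0.9, step=0.1, max_rounds=100):
    """Adjust the weights one dimension at a time until accuracy >= target."""
    weights = list(weights)
    for _ in range(max_rounds):
        best = accuracy(weights, samples)
        if best >= target:
            break  # preset condition met: export the model as it is
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] += delta
                score = accuracy(trial, samples)
                if score > best:  # keep only changes that improve the result
                    weights, best = trial, score
    return weights
```

A coordinate search of this kind can stall when no single-weight change improves accuracy, which is why the sketch bounds the number of rounds with `max_rounds`.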
In the embodiment of the present invention, adjusting the weight of the feature parameter of each dimension includes:
Using
$$ g(D, A) = -\sum_{i=1}^{n} P(D)_i \log P(D)_i + \sum_{i=1}^{n} P(D/A)_i \log P(D/A)_i $$
to select the first feature parameter and the second feature parameter from the feature parameters of the different dimensions;
Wherein D denotes the positive samples and negative samples that form the training data set, A denotes the feature parameter of a given dimension, g(D, A) denotes the weight obtained by feature parameter A on training data set D, $P(D)_i$ denotes the probability that training data set D is classified as i, n denotes the total number of possible classes of a sample, and $P(D/A)_i$ denotes the conditional probability that training data set D is classified as i given feature parameter A.
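The quantity g(D, A) has the form of an information gain: the entropy of the class labels minus the entropy that remains once feature A is known. The sketch below computes it for a discrete feature. Two points are assumptions not stated in the text: logarithms are taken to base 2, and the conditional term is averaged over the values of A, weighted by how often each value occurs.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D | A) for one discrete feature A."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy within each feature value, weighted by its share
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates cheating clicks from normal clicks perfectly scores the full label entropy, while a feature unrelated to the label scores zero, so ranking features by this value favors the more discriminative dimensions.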
In the embodiment of the present invention, the first feature parameter includes the number of clicks within the first time period, and the second feature parameter includes the number of clicks and the click frequency within the second time period.
In the embodiment of the present invention, the classification model is a model $f_M(x)$ formed using gradient boosting decision trees:
$$ f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m) $$
where $T(x; \Theta_m)$ denotes the classification result of the m-th regression tree, $\Theta_m$ denotes the m-th regression tree, M denotes the total number of regression trees, and x denotes the first trigger event to be classified;
Wherein M = 1, and $\Theta_1$ includes: a first root node based on the number of clicks within the first time period; under the first root node, a first child node based on the number of clicks within the second time period, and a first leaf node; the first child node includes a second leaf node and a third leaf node;
$\Theta_2$ includes: a second root node based on the click frequency within the second time period, the second root node including a fourth leaf node and a fifth leaf node.
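The additive structure of the model can be illustrated by writing the two described trees out by hand. This is a hedged sketch only: the split thresholds, the leaf values, the side of each split on which a leaf sits, the feature names and the final decision threshold are all hypothetical, since the text gives only the shape of the trees.

```python
def tree_1(x):
    """First tree: root on first-period clicks, child on second-period clicks."""
    if x["clicks_first_period"] <= 100:   # first root node
        return -0.5                        # first leaf node
    if x["clicks_second_period"] <= 10:   # first child node
        return 0.1                         # second leaf node
    return 0.8                             # third leaf node


def tree_2(x):
    """Second tree: root on click frequency within the second time period."""
    if x["click_frequency"] <= 0.5:       # second root node
        return -0.2                        # fourth leaf node
    return 0.6                             # fifth leaf node


def f_M(x, trees=(tree_1, tree_2)):
    """Additive model: the sum of the outputs of all regression trees."""
    return sum(tree(x) for tree in trees)


def classify(x, threshold=0.0):
    """Map the summed score to a label (the threshold is an assumption)."""
    return "cheating" if f_M(x) > threshold else "normal"
```

Each tree contributes a partial score, so a click is flagged only when the long-term and short-term evidence together push the sum above the threshold.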
It should be noted that the description of the apparatus embodiment above is similar to the description of the method embodiments and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the apparatus embodiment of the present invention, refer to the description of the method embodiments of the present invention.
Embodiment six
Based on the foregoing embodiments, this embodiment of the present invention provides an information processing device. Figure 6 is a schematic structural diagram of the information processing device of embodiment six. As shown in Figure 6, the information processing device 600 includes a display device 601 and a processing unit 602, wherein:
The display device 601 is configured to display the classification result output by the processing unit;
The processing unit 602 is configured to: determine a first trigger event to be classified, the first trigger event describing a first trigger action; obtain, from the first trigger event, the identification information of a user and the attribute information of the first trigger action; obtain, according to the identification information of the user, a first feature parameter describing the user behavior features of the user within a first time period; determine, according to the attribute information of the first trigger action, a second feature parameter describing the user behavior features of the user within a second time period, wherein the first time period is longer than the second time period; input the first feature parameter and the second feature parameter into a preset classification model, the classification model taking the first feature parameter and the second feature parameter as classification parameters; obtain the classification result of the first trigger event output by the classification model; and output the classification result.
It should be noted that the description of the device embodiment above is similar to the description of the method and has the same beneficial effects as the method embodiments, so it is not repeated. For technical details not disclosed in the device embodiment of the present invention, those skilled in the art should refer to the description of the method embodiments of the present invention.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way. The sequence numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, herein, the terms "include", "comprise" and any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or apparatus that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection between the components shown or discussed may be made through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk or an optical disc.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the scope of the claims.

Claims (13)

1. An information processing method, characterized in that the method comprises:
determining a first trigger event to be classified, the first trigger event describing a first trigger action;
obtaining, from the first trigger event, the identification information of a user and the attribute information of the first trigger action;
obtaining, according to the identification information of the user, a first feature parameter describing the user behavior features of the user within a first time period;
determining, according to the attribute information of the first trigger action, a second feature parameter describing the user behavior features of the user within a second time period, wherein the first time period is longer than the second time period;
inputting the first feature parameter and the second feature parameter into a preset classification model, the classification model taking the first feature parameter and the second feature parameter as classification parameters;
obtaining the classification result of the first trigger event output by the classification model;
outputting the classification result.
2. The method according to claim 1, characterized in that the attribute information of the first trigger action includes at least the trigger moment of the first trigger event, and the second time period includes the period from a preset first moment to the trigger moment of the first trigger event.
3. The method according to claim 2, characterized in that determining, according to the attribute information of the first trigger action, the second feature parameter describing the user behavior features of the user within the second time period comprises:
obtaining a third feature parameter of a second trigger event according to the identification information of the user, the second trigger event including the trigger event with the shortest time difference from the click moment of the first trigger action, and the third feature parameter describing the user behavior features of the user within the period from the first moment to the trigger moment of the second trigger event;
determining the second feature parameter according to the attribute information of the first trigger action and the third feature parameter.
4. The method according to claim 1, characterized in that determining the first trigger event to be classified comprises:
receiving an online trigger stream, and separating the first trigger event from the trigger stream; or,
obtaining a log of trigger actions, and extracting the first trigger event from the log.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
obtaining positive samples and negative samples according to a preset ratio;
extracting the feature parameters of different dimensions of the positive samples and the feature parameters of different dimensions of the negative samples;
inputting the feature parameters of different dimensions of the positive samples or the negative samples into a configured first training model to obtain a first training result, the first training model taking the feature parameters of different dimensions, each with a preset weight, as classification parameters;
if the first training result does not satisfy a preset condition, adjusting the weight of the feature parameter of each dimension one by one until the training result satisfies the condition, and outputting the first training model whose first training result satisfies the condition as the classification model;
if the first training result satisfies the preset condition, outputting the first training model as the classification model.
6. The method according to claim 5, characterized in that adjusting the weight of the feature parameter of each dimension comprises:
using
$$ g(D, A) = -\sum_{i=1}^{n} P(D)_i \log P(D)_i + \sum_{i=1}^{n} P(D/A)_i \log P(D/A)_i $$
to select the first feature parameter and the second feature parameter from the feature parameters of the different dimensions;
wherein D denotes the positive samples and negative samples that form the training data set, A denotes the feature parameter of one of the different dimensions, g(D, A) denotes the weight obtained by feature parameter A on training data set D, $P(D)_i$ denotes the probability that training data set D is classified as i, and $P(D/A)_i$ denotes the conditional probability that training data set D is classified as i given feature parameter A.
7. The method according to any one of claims 1 to 4, characterized in that the first feature parameter includes the number of clicks within the first time period, and the second feature parameter includes the number of clicks within the second time period and the click frequency or the time interval between clicks.
8. The method according to claim 6, characterized in that the classification model is a model $f_M(x)$ formed using gradient boosting decision trees,
$$ f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m) $$
where $T(x; \Theta_m)$ denotes the classification result of the m-th regression tree, $\Theta_m$ denotes the m-th regression tree, M denotes the total number of regression trees, and x denotes the first trigger event to be classified;
wherein M = 1, and $\Theta_1$ includes: a first root node based on the number of clicks within the first time period; under the first root node, a first child node based on the number of clicks within the second time period, and a first leaf node; the first child node includes a second leaf node and a third leaf node;
$\Theta_2$ includes: a second root node based on the click frequency within the second time period, the second root node including a fourth leaf node and a fifth leaf node.
9. An information processing apparatus, characterized in that the apparatus comprises a first determining unit, a first acquisition unit, a second acquisition unit, a second determining unit, an input unit, a third acquisition unit and an output unit, wherein:
the first determining unit is configured to determine a first trigger event to be classified, the first trigger event describing a first trigger action;
the first acquisition unit is configured to obtain, from the first trigger event, the identification information of a user and the attribute information of the first trigger action;
the second acquisition unit is configured to obtain, according to the identification information of the user, a first feature parameter describing the user behavior features of the user within a first time period;
the second determining unit is configured to determine, according to the attribute information of the first trigger action, a second feature parameter describing the user behavior features of the user within a second time period, wherein the first time period is longer than the second time period;
the input unit is configured to input the first feature parameter and the second feature parameter into a preset classification model, the classification model taking the first feature parameter and the second feature parameter as classification parameters;
the third acquisition unit is configured to obtain the classification result of the first trigger event output by the classification model;
the output unit is configured to output the classification result.
10. The apparatus according to claim 9, characterized in that the attribute information of the first trigger action includes at least the trigger moment of the first trigger event, and the second time period includes the period from a preset first moment to the trigger moment of the first trigger event;
the second determining unit includes a first acquisition module and a determining module, wherein:
the first acquisition module is configured to obtain a third feature parameter of a second trigger event according to the identification information of the user, the second trigger event including the trigger event with the shortest time difference from the click moment of the first trigger action, and the third feature parameter describing the user behavior features of the user within the period from the first moment to the trigger moment of the second trigger event;
the determining module is configured to determine the second feature parameter according to the attribute information of the first trigger action and the third feature parameter.
11. The apparatus according to claim 10, characterized in that the first determining unit includes a receiving module and a first extraction module, wherein the receiving module is configured to receive an online trigger stream, and the first extraction module is configured to extract the first trigger event from the trigger stream; or,
the first determining unit includes a second acquisition module and a second extraction module, wherein the second acquisition module is configured to obtain a log of trigger actions, and the second extraction module is configured to extract the first trigger event from the log.
12. The apparatus according to claim 10 or 11, characterized in that the apparatus further comprises a fourth acquisition unit, an extraction unit, a second input unit, an adjustment unit and a third determining unit, wherein:
the fourth acquisition unit is configured to obtain positive samples and negative samples according to a preset ratio;
the extraction unit is configured to extract the feature parameters of different dimensions of the positive samples and the feature parameters of different dimensions of the negative samples;
the second input unit is configured to input the feature parameters of different dimensions of the positive samples or the negative samples into a configured first training model to obtain a first training result, the first training model taking the feature parameters of different dimensions, each with a preset weight, as classification parameters;
the adjustment unit is configured to, if the first training result does not satisfy a preset condition, adjust the weight of the feature parameter of each dimension one by one until the training result satisfies the condition, and to output the first training model whose first training result satisfies the condition as the classification model;
the third determining unit is configured to, if the first training result satisfies the preset condition, output the first training model as the classification model.
13. An information processing device, characterized in that the device comprises a display device and a processing unit, wherein:
the display device is configured to display the classification result output by the processing unit;
the processing unit is configured to: determine a first trigger event to be classified, the first trigger event describing a first trigger action; obtain, from the first trigger event, the identification information of a user and the attribute information of the first trigger action; obtain, according to the identification information of the user, a first feature parameter describing the user behavior features of the user within a first time period; determine, according to the attribute information of the first trigger action, a second feature parameter describing the user behavior features of the user within a second time period, wherein the first time period is longer than the second time period; input the first feature parameter and the second feature parameter into a preset classification model, the classification model taking the first feature parameter and the second feature parameter as classification parameters; obtain the classification result of the first trigger event output by the classification model; and output the classification result.
CN201510990666.0A 2015-12-24 2015-12-24 Information processing method, device and equipment Active CN106919579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510990666.0A CN106919579B (en) 2015-12-24 2015-12-24 Information processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN106919579A true CN106919579A (en) 2017-07-04
CN106919579B CN106919579B (en) 2020-11-06

Family

ID=59456890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510990666.0A Active CN106919579B (en) 2015-12-24 2015-12-24 Information processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN106919579B (en)

CN104933157A (en) * 2015-06-26 2015-09-23 百度在线网络技术(北京)有限公司 Method and device used for obtaining user attribute information, and server
CN104965874A (en) * 2015-06-11 2015-10-07 腾讯科技(北京)有限公司 Information processing method and apparatus
CN105025170A (en) * 2015-08-05 2015-11-04 张京源 Detection and alarm method of mobile phone in non-normal use
CN105072089A (en) * 2015-07-10 2015-11-18 中国科学院信息工程研究所 WEB malicious scanning behavior abnormity detection method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BRETT STONE-GROSS et al.: "Understanding Fraudulent Activities in Online Ad Exchanges", 《PROCEEDINGS OF THE 2011 ACM SIGCOMM CONFERENCE ON INTERNET MEASUREMENT CONFERENCE》 *
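Jiang Xiaoxu (姜晓旭): "Detection and Research of Online Advertisement Click Fraud Based on User Behavior", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Li Aichun (李爱春): "Research and Application of Web Mining in Detecting Online Advertising Fraud", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Gong Shangfu (龚尚福) et al.: "Advertisement Fraud Click Detection Based on User Behavior Analysis", 《Computer Applications and Software》 *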

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069617A (en) * 2017-11-09 2019-07-30 北京嘀嘀无限科技发展有限公司 Management method, device, server and the computer readable storage medium of forum's note
CN107908522A (en) * 2017-11-15 2018-04-13 北京小米移动软件有限公司 Information displaying method, device and computer-readable recording medium
CN107908522B (en) * 2017-11-15 2021-05-04 北京小米移动软件有限公司 Information display method and device and computer readable storage medium
CN107993078A (en) * 2017-12-22 2018-05-04 北京三快在线科技有限公司 For evaluation information bandwagon effect method and apparatus and computing device
CN107993078B (en) * 2017-12-22 2020-08-14 北京三快在线科技有限公司 Method and device for evaluating information display effect and computing equipment
CN109961080A (en) * 2017-12-26 2019-07-02 腾讯科技(深圳)有限公司 Terminal identification method and device
CN109961080B (en) * 2017-12-26 2022-09-23 腾讯科技(深圳)有限公司 Terminal identification method and device
CN110213209A (en) * 2018-05-11 2019-09-06 腾讯科技(深圳)有限公司 A kind of cheat detection method, device and storage medium that pushed information is clicked
CN108694616A (en) * 2018-05-24 2018-10-23 百度在线网络技术(北京)有限公司 The recognition methods of advertisement cheating and device
CN108763574A (en) * 2018-06-06 2018-11-06 电子科技大学 A kind of microblogging rumour detection algorithm based on gradient boosted tree detects characteristic set with rumour
CN109255391A (en) * 2018-09-30 2019-01-22 武汉斗鱼网络科技有限公司 A kind of method, apparatus and storage medium identifying malicious user
CN109255391B (en) * 2018-09-30 2021-07-23 武汉斗鱼网络科技有限公司 Method, device and storage medium for identifying malicious user
CN109509028A (en) * 2018-11-15 2019-03-22 北京奇虎科技有限公司 A kind of advertisement placement method and device, storage medium, computer equipment
CN111435507A (en) * 2019-01-11 2020-07-21 腾讯科技(北京)有限公司 Advertisement anti-cheating method and device, electronic equipment and readable storage medium
CN109947344A (en) * 2019-02-20 2019-06-28 腾讯科技(深圳)有限公司 A kind of training method and device of application strategy model
CN111832789A (en) * 2019-09-24 2020-10-27 北京嘀嘀无限科技发展有限公司 Position determination method, model training method, device, equipment and storage medium
CN110909984A (en) * 2019-10-28 2020-03-24 苏宁金融科技(南京)有限公司 Business data processing model training method, business data processing method and device
CN110935176B (en) * 2019-11-27 2023-09-12 上海畅造网络科技有限公司 Game instruction prejudging system based on big data and working method thereof
CN110935176A (en) * 2019-11-27 2020-03-31 苏州思酷数字科技有限公司 Game instruction pre-judging system based on big data and working method thereof
CN111061954A (en) * 2019-12-19 2020-04-24 腾讯音乐娱乐科技(深圳)有限公司 Search result sorting method and device and storage medium
CN111061954B (en) * 2019-12-19 2022-03-15 腾讯音乐娱乐科技(深圳)有限公司 Search result sorting method and device and storage medium
CN111786938A (en) * 2020-03-06 2020-10-16 北京沃东天骏信息技术有限公司 Method, system and electronic equipment for preventing malicious resource acquisition
CN111651538A (en) * 2020-05-11 2020-09-11 腾讯科技(深圳)有限公司 Position mapping method, device and equipment and readable storage medium
CN112506981A (en) * 2021-02-05 2021-03-16 深圳市阿卡索资讯股份有限公司 Online training service pushing method and device
CN113052632A (en) * 2021-03-25 2021-06-29 北京沃东天骏信息技术有限公司 Method, device, equipment and storage medium for identifying advertisement traffic data
CN113052632B (en) * 2021-03-25 2024-05-17 北京沃东天骏信息技术有限公司 Advertisement traffic data identification method, device, equipment and storage medium
CN112802456A (en) * 2021-04-14 2021-05-14 北京世纪好未来教育科技有限公司 Voice evaluation scoring method and device, electronic equipment and storage medium
CN114240471A (en) * 2021-11-01 2022-03-25 上海珍玩网络科技有限公司 Game software promotion and viewing data analysis method

Also Published As

Publication number Publication date
CN106919579B (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN106919579A (en) A kind of information processing method and device, equipment
CN107346496B (en) Target user orientation method and device
CN109544197A (en) A kind of customer churn prediction technique and device
CN106997549A (en) The method for pushing and system of a kind of advertising message
CN110163647A (en) A kind of data processing method and device
CN109934704A (en) Information recommendation method, device, equipment and storage medium
CN105512916B (en) Method for delivering advertisement accurately and system
CN108256568A (en) A kind of plant species identification method and device
CN106960063A (en) A kind of internet information crawl and commending system for field of inviting outside investment
CN103617235B (en) Method and system for network navy account number identification based on particle swarm optimization
Guyon et al. Analysis of the kdd cup 2009: Fast scoring on a large orange customer database
US20050197889A1 (en) Method and apparatus for comparison over time of prediction model characteristics
CN105931068A (en) Cardholder consumption figure generation method and device
CN106294783A (en) A kind of video recommendation method and device
CN102708131A (en) Automatic classification of consumers into micro-segments
CN107862551B (en) Method and device for predicting network application promotion effect and terminal equipment
CN106408325A (en) User consumption behavior prediction analysis method based on user payment information and system
CN108322317A (en) A kind of account identification correlating method and server
CN110083759A (en) Public opinion information crawler method, apparatus, computer equipment and storage medium
Wagh et al. Customer churn prediction in telecom sector using machine learning techniques
CN103593355A (en) User original content recommending method and device
CN114841526A (en) Detection method of high-risk user, computing device and readable storage medium
US20240080280A1 (en) Understanding social media user behavior
Sangaralingam et al. Takeoff and sustained success of apps in hypercompetitive mobile platform ecosystems: an empirical analysis
CN110215703A (en) The selection method of game application, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant