CN109165691A - Training method, apparatus, and electronic device for a model for identifying cheating users - Google Patents

Training method, apparatus, and electronic device for a model for identifying cheating users

Info

Publication number
CN109165691A
Authority
CN
China
Prior art keywords
user
user information
access
training
training sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811030204.4A
Other languages
Chinese (zh)
Other versions
CN109165691B (en)
Inventor
韩冰
陈家耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201811030204.4A
Publication of CN109165691A
Application granted
Publication of CN109165691B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

Embodiments of the present invention provide a training method, apparatus, and electronic device for a model for identifying cheating users. The method comprises: obtaining and storing user information of first-class access users; determining, among the stored user information of the first-class access users, the user information that does not meet a preset rule, and using it as training samples, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of second-class access users; training a preset model to be trained based on the training samples; and, when the accuracy of the output of the model to be trained reaches a preset accuracy, stopping training to obtain an identification model for identifying cheating users. Compared with the prior art, the scheme provided by the embodiments of the present invention can improve both the accuracy with which the trained identification model identifies newly emerging types of cheating users and the recall of the identification model.

Description

Training method, apparatus, and electronic device for a model for identifying cheating users
Technical field
The present invention relates to the field of computer technology, and in particular to a training method, apparatus, and electronic device for a model for identifying cheating users.
Background
With the continuous development of Internet technology, more and more users choose to publish all kinds of information over the network, for example videos they have shot, novels they have written, and product advertisements. These users generally hope that the information they publish will attract a high level of attention, for example a high video play count, a high number of reads for a novel, or a high advertisement click rate.
In some cases, however, this attention may not be genuine. The users accessing the information may include users that do not actually exist and are instead simulated by cheating applications, that is, cheating users. Taking advertising as an example, the users viewing an advertisement may include cheating users that click on or play the advertisement, so that the click count or play count of the advertisement is not genuine.
In order to deal with cheating users appropriately, information websites need to identify which of the users accessing the information are cheating users, that is, to perform anti-cheating. In the prior art, anti-cheating usually works as follows: previously obtained cheating-user information is labeled, and the labeled cheating-user information is used as sample user information; an identification model for identifying cheating users is trained on the sample user information with a machine learning algorithm; the trained identification model is used to examine the user information of access users; and the cheating users among the access users are determined from the detection results.
However, in the course of identifying cheating users with the above method, the inventors found that it has at least the following problem. When sample user information is obtained by manual labeling, only the user information of cheating users of already-discovered types can be labeled. Because the ways of simulating cheating users change quickly, manual labeling cannot label the user information of newly emerging types of cheating users. As a result, the trained identification model identifies newly emerging types of cheating users with low accuracy, and its recall is low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a training method, apparatus, and electronic device for a model for identifying cheating users, so as to improve both the accuracy with which the trained identification model identifies newly emerging types of cheating users and the recall of the identification model. The specific technical solutions are as follows.
In a first aspect, an embodiment of the present invention provides a training method for a model for identifying cheating users, the method comprising:
obtaining and storing user information of first-class access users;
determining, among the stored user information of the first-class access users, the user information that does not meet a preset rule, and using it as training samples, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is the user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
training a preset model to be trained based on the training samples, wherein the model to be trained is a model for identifying whether the first-class access users and the second-class access users are cheating users;
when the accuracy of the output of the model to be trained reaches a preset accuracy, stopping training to obtain an identification model for identifying cheating users.
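The first two steps above can be illustrated with a minimal Python sketch. The patent does not name a specific unsupervised learning algorithm, so the sketch assumes a simple statistical stand-in: for each numeric feature, the rule is an interval of the mean plus or minus k standard deviations computed from unlabeled second-class user information. The function names, the feature name `click_rate`, and the parameter `k` are all hypothetical.

```python
from statistics import mean, pstdev


def derive_rules(second_class_info, k=3.0):
    """Derive one (low, high) interval per feature from unlabeled records.

    Assumption: an interval of mean +/- k standard deviations stands in
    for the unspecified unsupervised learning algorithm.
    """
    rules = {}
    for name in second_class_info[0].keys():
        values = [user[name] for user in second_class_info]
        mu, sigma = mean(values), pstdev(values)
        rules[name] = (mu - k * sigma, mu + k * sigma)
    return rules


def select_training_samples(first_class_info, rules):
    """Return the first-class records that fall outside at least one interval."""
    return [
        user
        for user in first_class_info
        if any(not (low <= user[name] <= high)
               for name, (low, high) in rules.items())
    ]
```

Records returned by `select_training_samples` play the role of the training samples in the second step; any clustering or density-based method could replace the interval rule without changing the overall flow.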
As an implementation of the embodiments of the present invention, the step of obtaining and storing the user information of the first-class access users comprises:
obtaining and storing the user information of the first-class access users during the current period;
the step of determining, among the stored user information of the first-class access users, the user information that does not meet the preset rule, and using it as training samples, comprises:
at the end of the current period, determining, among the user information of the first-class access users stored during the current period, the user information that does not meet the preset rule, and using it as training samples;
the step of training the preset model to be trained based on the training samples comprises:
adding the training samples to a target sample set, wherein the target sample set is the set of samples used to train a target model at the end of the previous period, and the target model is the model used in the current period to identify cheating users;
inputting the target sample set, after the addition, into the target model for training.
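The end-of-period update described above might be sketched as follows. This is an illustration only: the function name `end_of_period_update`, the representation of samples as flat dictionaries, and the pluggable `train_fn` callback are assumptions, and the de-duplication step is an added convenience that the patent does not mention.

```python
def end_of_period_update(target_samples, new_samples, train_fn):
    """Add this period's training samples to the target set, then retrain.

    target_samples: list of dict records used at the end of the previous period.
    new_samples:    list of dict records that violated the preset rule.
    train_fn:       hypothetical callable that trains a model on a sample list.
    """
    # De-duplicate on record content (an assumption, not from the patent).
    seen = {tuple(sorted(sample.items())) for sample in target_samples}
    for sample in new_samples:
        key = tuple(sorted(sample.items()))
        if key not in seen:
            target_samples.append(sample)
            seen.add(key)
    model = train_fn(target_samples)
    return target_samples, model
```

The returned model would then serve as the identification model for the next period, and the enlarged sample set becomes the target sample set for the following update.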
As an implementation of the embodiments of the present invention, the method further comprises: after the next period is entered, storing the user information of the first-class access users in the current period, identifying the first-class access users with the identification model, and returning to the step of, at the end of the current period, determining, among the user information of the first-class access users stored during the current period, the user information that does not meet the preset rule, and using it as training samples.
As an implementation of the embodiments of the present invention, before the step of adding the training samples to the target sample set, the method further comprises: determining that the online frequency corresponding to each training sample meets a preset frequency.
As an implementation of the embodiments of the present invention, the step of determining, among the user information of the first-class access users stored during the current period, the user information that does not meet the preset rule, and using it as training samples, comprises:
obtaining the user information and operation logs of the first-class access users stored during the current period;
for each first-class access user, judging whether the user's operation log meets the preset rule;
if it does not meet the preset rule, determining that the user information of that first-class access user is a training sample.
As an implementation of the embodiments of the present invention, the operation log includes operation data of one type; the step of judging, for each first-class access user, whether the user's operation log meets the preset rule comprises:
for each first-class access user, judging whether the operation data meets the first-type preset rule corresponding to its type;
the step of determining, if the preset rule is not met, that the user information of that first-class access user is a training sample comprises:
if the operation data does not meet the first-type preset rule corresponding to its type, determining that the user information of that first-class access user is a training sample.
As an implementation of the embodiments of the present invention, the operation log includes operation data of multiple types; the step of judging, for each first-class access user, whether the user's operation log meets the preset rule comprises:
for the operation data of each type included in the operation log of each first-class access user, judging whether the operation data meets the second-type preset rule corresponding to its type;
if it does not meet the second-type preset rule, determining that the operation data is target operation data;
for each first-class access user, judging whether the quantity of target operation data corresponding to that user is not less than a preset value;
the step of determining, if the preset rule is not met, that the user information of that first-class access user is a training sample comprises:
if the quantity of target operation data corresponding to that first-class access user is not less than the preset value, determining that the user information of that first-class access user is a training sample.
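The multi-type check above can be sketched in a few lines of Python. The sketch assumes that each per-type preset rule is available as a predicate returning True when the operation data conforms; the function names, the rule thresholds in the usage note, and the parameter `preset_count` are hypothetical.

```python
def violated_types(operation_log, rules):
    """List the operation-data types whose value fails its per-type rule.

    operation_log: dict mapping type name -> observed value.
    rules:         dict mapping type name -> predicate (True means conforming).
    """
    return [
        data_type
        for data_type, value in operation_log.items()
        if data_type in rules and not rules[data_type](value)
    ]


def is_training_sample(operation_log, rules, preset_count=1):
    """A user becomes a training sample when the number of violating
    operation-data items is not less than the preset value."""
    return len(violated_types(operation_log, rules)) >= preset_count
```

With `preset_count=1` and a single-entry log, this reduces to the single-type case described earlier. For example, with rules such as `{"click_rate": lambda v: v <= 0.3}`, a log containing a click rate of 0.8 would count as one violation.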
As an implementation of the embodiments of the present invention, the types of the operation data include: the access user's click rate on advertisements, the access user's exposure rate for advertisements, the distribution of the access user's access times, and the ratio of the access user's click rates on advertisements shown in different time periods of the same video.
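As an illustration of how two of these operation-data types might be computed from raw logs, consider the sketch below. The patent names the types but does not define formulas, so the definitions here (clicks divided by impressions, and the share of accesses falling in each hour) are assumptions, as are the function names.

```python
from collections import Counter


def click_rate(clicks, impressions):
    """Assumed definition: advertisement clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0


def access_time_distribution(access_hours):
    """Assumed definition: fraction of a user's accesses in each hour of day.

    access_hours: list of integer hours (0-23), one per recorded access.
    """
    total = len(access_hours)
    if total == 0:
        return {}
    counts = Counter(access_hours)
    return {hour: n / total for hour, n in counts.items()}
```

A simulated user that accesses content at a uniform rate around the clock, or one with a click rate far above that of ordinary users, would stand out on features like these.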
As an implementation of the embodiments of the present invention, the step of identifying the first-class access users with the identification model comprises:
obtaining the user information of the first-class access users;
inputting the user information into the identification model for detection, to obtain the identification results for the first-class access users.
As an implementation of the embodiments of the present invention, the step of obtaining the user information of the first-class access users comprises:
at the end of the next period, obtaining, offline, the user information of the first-class access users stored during the next period; or,
upon receiving an access request sent by a first-class access user, obtaining the user information of that first-class access user.
As an implementation of the embodiments of the present invention, the method further comprises: when the identification result for a first-class access user is that the user is a cheating user, blocking the access requests of that first-class access user.
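The two ways of applying the identification model (offline at the end of a period, or online when an access request arrives) and the blocking of flagged users could look roughly like this. The sketch assumes the identification model is exposed as a callable `identify` that returns True for a cheating user; the function names, the `user_id` field, and the string return values are hypothetical.

```python
def identify_offline(stored_user_info, identify):
    """Offline mode: score every stored record once the period has ended."""
    return {user["user_id"]: identify(user) for user in stored_user_info}


def handle_access_request(user_info, identify):
    """Online mode: score the requesting user and block flagged requests."""
    if identify(user_info):
        return "blocked"
    return "allowed"
```

In practice `identify` would wrap the trained identification model; a toy stand-in such as `lambda user: user["click_rate"] > 0.5` is enough to exercise both paths.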
In a second aspect, an embodiment of the present invention provides a training apparatus for a model for identifying cheating users, the apparatus comprising:
a user information acquisition module, configured to obtain and store user information of first-class access users;
a training sample determination module, configured to determine, among the stored user information of the first-class access users, the user information that does not meet a preset rule, and to use it as training samples, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is the user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
a model training module, configured to train a preset model to be trained based on the training samples, wherein the model to be trained is a model for identifying whether the first-class access users and the second-class access users are cheating users;
an identification model obtaining module, configured to stop training when the accuracy of the output of the model to be trained reaches a preset accuracy, to obtain an identification model for identifying cheating users.
As an implementation of the embodiments of the present invention, the user information acquisition module includes a user information acquisition submodule, configured to obtain and store the user information of the first-class access users during the current period;
the training sample determination module includes a training sample determination submodule, configured to determine, at the end of the current period, among the user information of the first-class access users stored during the current period, the user information that does not meet the preset rule, and to use it as training samples;
the model training module includes a sample set addition submodule and a model training submodule; the sample set addition submodule is configured to add the training samples to a target sample set, wherein the target sample set is the set of samples used to train a target model at the end of the previous period, and the target model is the model used in the current period to identify cheating users; the model training submodule is configured to input the target sample set, after the addition, into the target model for training.
As an implementation of the embodiments of the present invention, the apparatus further comprises an information storage and model application module, configured to, after the next period is entered, store the user information of the first-class access users in the current period, identify the first-class access users with the identification model, and trigger the training sample determination module.
As an implementation of the embodiments of the present invention, the apparatus further comprises an online frequency determination module, configured to determine, before the training samples are added to the target sample set, that the online frequency corresponding to each training sample meets a preset frequency.
As an implementation of the embodiments of the present invention, the user information acquisition submodule includes: a user information acquisition unit, configured to obtain, at the end of the current period, the user information and operation logs of the first-class access users stored during the current period;
a preset rule judging unit, configured to judge, for each first-class access user, whether the user's operation log meets the preset rule, and, if it does not, to trigger a training sample determination unit;
the training sample determination unit, configured to determine that the user information of that first-class access user is a training sample.
As an implementation of the embodiments of the present invention, the operation log includes operation data of one type; the preset rule judging unit is specifically configured to: for each first-class access user, judge whether the operation data meets the first-type preset rule corresponding to its type, and, if it does not, trigger the training sample determination unit.
As an implementation of the embodiments of the present invention, the operation log includes operation data of multiple types, and the preset rule judging unit includes:
a preset rule judging subunit, configured to judge, for the operation data of each type included in the operation log of each first-class access user, whether the operation data meets the second-type preset rule corresponding to its type, and, if it does not, to trigger a data determination subunit;
the data determination subunit, configured to determine that the operation data is target operation data;
a preset value judging subunit, configured to judge, for each first-class access user, whether the quantity of target operation data corresponding to that user is not less than a preset value, and, if so, to trigger the training sample determination unit.
As an implementation of the embodiments of the present invention, the types of the operation data include: the access user's click rate on advertisements, the access user's exposure rate for advertisements, the distribution of the access user's access times, and the ratio of the access user's click rates on advertisements shown in different time periods of the same video.
As an implementation of the embodiments of the present invention, the information storage and model application module includes:
an access information acquisition submodule, configured to obtain the user information of the first-class access users;
an access user identification submodule, configured to input the user information into the identification model for detection, to obtain the identification results for the first-class access users.
As an implementation of the embodiments of the present invention, the access information acquisition submodule is specifically configured to:
at the end of the next period, obtain, offline, the user information of the first-class access users stored during the next period; or,
upon receiving an access request sent by a first-class access user, obtain the user information of that first-class access user.
As an implementation of the embodiments of the present invention, the apparatus further comprises an access request blocking module, configured to block the access requests of a first-class access user when the identification result for that user is that the user is a cheating user.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another over the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement, when executing the program stored in the memory, the steps of any of the training methods for a model for identifying cheating users provided in the first aspect.
In another aspect of the present invention, a computer-readable storage medium is further provided. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform any of the training methods for a model for identifying cheating users described above.
In another aspect of the present invention, an embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the training methods for a model for identifying cheating users described above.
It can be seen that, in the scheme provided by the embodiments of the present invention, a target model is used during the current period to identify whether access users are cheating users, the target model having been trained on the target sample set at the end of the previous period. At the end of the current period, the user information that can serve as training samples is determined, according to the preset rule and the preset frequency, from the user information of the access users stored during the current period; the preset rule is determined by an unsupervised learning algorithm based on the user information of access users obtained before the current period. These training samples are then added to the target sample set to obtain a new target sample set. The new target sample set is input into the target model for training, and training stops when the accuracy of the target model's output reaches the preset accuracy, yielding a new target model. The new target model is the identification model used in the next period to identify whether access users are cheating users. After the next period is entered, it becomes the current period: the user information of the access users in the current period is stored, and the access users are identified with the obtained identification model. At the end of the current period, the method returns to the step of determining, among the user information of the access users stored during the current period, the user information that does not meet the preset rule, and using it as training samples, and then performs the subsequent steps again. The entire scheme thus runs cyclically, period by period.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, the user information that does not meet the preset rule can be determined, according to the preset rule, among the stored user information of the first-class access users, wherein the preset rule is determined by unsupervised learning based on the user information of second-class access users that was obtained and stored before the user information of the first-class access users was obtained. The user information determined in this way is the user information of newly emerging types of cheating users that the currently used identification model cannot identify. This user information can therefore be used as training samples to train the preset model to be trained, and training stops when the accuracy of the output of the model to be trained reaches the preset accuracy, yielding the identification model for identifying cheating users. Because the newly obtained identification model is trained on the user information of newly emerging types of cheating users that the currently used identification model cannot identify, the newly obtained identification model can identify newly emerging types of cheating users.
In the scheme provided by the embodiments of the present invention, the preset rule can be determined by an unsupervised algorithm based on the user information of second-class access users, so that the user information of newly emerging types of cheating users can be determined among the stored user information of the first-class access users. A new identification model is then trained on the user information determined in this way, so that the new identification model can identify newly emerging types of cheating users. Labeling the user information of newly emerging types of cheating users by means of the preset rule avoids the situation in which manual labeling cannot label such users, and thereby improves both the accuracy with which the newly trained identification model identifies newly emerging types of cheating users and the recall of the identification model.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a schematic flowchart of a training method for a model for identifying cheating users provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart, in a specific implementation provided by an embodiment of the present invention, of the manner of determining that the user information of access users stored during the current period does not meet the preset rule;
Fig. 3 is a schematic structural diagram of a training apparatus for a model for identifying cheating users provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the drawings of the embodiments of the present invention.
In the course of identifying cheating users, the inventors found that the prior-art method has at least the following problem. When sample user information is obtained by manual labeling, only the user information of cheating users of already-discovered types can be labeled. Because the ways of simulating cheating users change quickly, manual labeling cannot label the user information of newly emerging types of cheating users. As a result, the trained identification model identifies newly emerging types of cheating users with low accuracy, and its recall is low.
In order to solve this problem in the prior art, an embodiment of the present invention provides a training method for a model for identifying cheating users, the method comprising:
obtaining and storing user information of first-class access users;
determining, among the stored user information of the first-class access users, the user information that does not meet a preset rule, and using it as training samples, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is the user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
training a preset model to be trained based on the training samples, wherein the model to be trained is a model for identifying whether the first-class access users and the second-class access users are cheating users;
when the accuracy of the output of the model to be trained reaches a preset accuracy, stopping training to obtain an identification model for identifying cheating users.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, the preset rule can be determined by an unsupervised algorithm based on the user information of second-class access users, so that the user information of newly emerging types of cheating users can be determined among the stored user information of the first-class access users. A new identification model is then trained on the user information determined in this way, so that the new identification model can identify newly emerging types of cheating users. Labeling the user information of newly emerging types of cheating users by means of the preset rule avoids the situation in which manual labeling cannot label such users, and thereby improves both the accuracy with which the newly trained identification model identifies newly emerging types of cheating users and the recall of the identification model.
The training method for a model for identifying cheating users provided by the embodiments of the present invention can be applied to any electronic device, for example a processor, a computer, or a server. No specific limitation is imposed here, and the device is hereinafter referred to as the electronic device.
A training method for a model for identifying cheating users provided by an embodiment of the present invention is introduced first.
Fig. 1 is a schematic flowchart of a training method for a model for identifying cheating users provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: obtaining and storing user information of first-class access users;
S102: determining, among the stored user information of the first-class access users, the user information that does not meet a preset rule, and using it as training samples;
wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is the user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
S103: training a preset model to be trained based on the training samples;
wherein the model to be trained is a model for identifying whether the first-class access users and the second-class access users are cheating users;
S104: when the accuracy of the output of the model to be trained reaches a preset accuracy, stopping training to obtain an identification model for identifying cheating users.
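Steps S103 and S104 amount to a training loop with an accuracy-based stopping condition. The sketch below illustrates this with a plain perceptron, which is an assumption chosen for brevity; the patent does not specify the model type. The function names, the learning rate, the epoch cap, and the presence of label-0 (non-cheating) samples in the training data are also assumptions.

```python
def predict(weights, bias, features):
    """Return 1 (cheating) when the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, samples):
    """Fraction of (features, label) pairs that the model classifies correctly."""
    hits = sum(predict(weights, bias, x) == y for x, y in samples)
    return hits / len(samples)


def train_until_accurate(samples, preset_accuracy=0.95, max_epochs=1000, lr=0.1):
    """Train a perceptron and stop once accuracy reaches the preset value.

    samples: list of (feature_list, label) pairs with label 1 for cheating.
    max_epochs is a safety cap added for this sketch, not part of the patent.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, samples) >= preset_accuracy:
            break  # S104: stop training at the preset accuracy
        for features, label in samples:
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias
```

Here the accuracy is measured on the training samples themselves for simplicity; a held-out validation set would be the more usual choice when deciding when to stop.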
As can be seen from the above, in the scheme provided by the embodiments of the present invention, the preset rule can be determined by an unsupervised algorithm based on the user information of second-class access users, so that the user information of newly emerging types of cheating users can be determined among the stored user information of the first-class access users. A new identification model is then trained on the user information determined in this way, so that the new identification model can identify newly emerging types of cheating users. Labeling the user information of newly emerging types of cheating users by means of the preset rule avoids the situation in which manual labeling cannot label such users, and thereby improves both the accuracy with which the newly trained identification model identifies newly emerging types of cheating users and the recall of the identification model.
It should be noted that, before step S101, the electronic device receives access requests from second-class access users and uses the currently used recognition model for identifying cheating users to identify whether each second-class access user is a cheating user. The currently used recognition model is the preset model to be trained, and the model to be trained is also used to identify whether first-class access users are cheating users.
It should be understood that the model to be trained is obtained by training on manually labeled user information of cheating users of already discovered types. Therefore, the electronic device can identify all cheating users of already discovered types among the second-class access users, so that the user information of the second-class access users stored by the electronic device may contain no user information of cheating users.
However, because the ways of simulating cheating users are updated quickly, the first-class access users may include cheating users of newly appearing types by the time the electronic device receives their access requests. If the model to be trained continues to be used to identify the first-class access users, it may fail to identify all of the cheating users.
That is to say, the user information of first-class access users obtained and stored by the electronic device in step S101 may include user information of cheating users of newly appearing types, and these cheating users cannot be identified by the currently used recognition model.
So that cheating users of the newly appearing types among the first-class access users can be accurately identified when access users are identified later, the electronic device can use the user information of the cheating users that could not be identified as training samples and train the model to be trained, thereby obtaining the recognition model used for identifying cheating users thereafter. During training, this recognition model can learn the features of the user information of the cheating users of newly appearing types among the first-class access users, so that when access requests of new access users are received later, cheating users of all currently existing types can be accurately identified.
To obtain the training samples, in step S102 the electronic device can determine the user information that does not meet the preset rule among the stored user information of the first-class access users. This user information can be regarded as the user information of cheating users of newly appearing types, and can therefore be used as training samples.
Wherein the preset rule is a rule determined by an unsupervised learning algorithm based on the stored user information of the second-class access users. It should be understood that the user information of the second-class access users is obtained and stored before the user information of the first-class access users is obtained.
It should be noted that, because the user information of the second-class access users may contain no user information of cheating users, the preset rule can be understood as being determined from the features of the user information of real access users. When the user information of a first-class access user does not meet the preset rule, that first-class access user may therefore be a cheating user of a newly appearing type, and the user information of that first-class access user can be used as a training sample.
Specifically, the user information of an access user may include multiple types of information. After obtaining the stored user information of the second-class access users, the electronic device can take one type of information included in the user information of all second-class access users and, according to the similarity of that type of information across the second-class access users, divide the user information into several information groups by means of an unsupervised learning algorithm.
For example, the electronic device can take the click rate on a certain advertisement, which is included in the user information of all second-class access users, and place user information whose click rates on that advertisement differ by less than a preset difference into the same group, thereby dividing the user information of the second-class access users into several groups.
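The click-rate grouping described above can be sketched as follows. This is a minimal illustration, not the patented algorithm: it assumes one reading of "differ by less than a preset difference", namely that sorted click rates are chained into the same group while the gap to the previous value stays below the threshold. The function name `group_by_click_rate`, the parameter `preset_difference`, and the sample values are hypothetical.

```python
def group_by_click_rate(click_rates, preset_difference):
    """Chain sorted click rates into groups.

    A new group starts whenever the gap to the previous (sorted) value
    is not less than preset_difference. This chaining rule is an
    assumption made for illustration only.
    """
    groups = []
    for rate in sorted(click_rates):
        if groups and rate - groups[-1][-1] < preset_difference:
            groups[-1].append(rate)   # close enough: same group
        else:
            groups.append([rate])     # too far: open a new group
    return groups


# Hypothetical click rates of second-class access users on one advertisement.
second_class_click_rates = [0.10, 0.12, 0.50, 0.11, 0.52]
groups = group_by_click_rate(second_class_click_rates, preset_difference=0.05)
print(groups)
```

With these sample values the users fall into a low-click-rate group and a high-click-rate group; the resulting groups play the role of the "information groups" in the text.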
When receiving the access request of a first-class access user, the electronic device can use the model to be trained to identify whether that user is a cheating user, and can block the access requests of identified cheating users, so that the stored user information of the first-class access users may contain no user information of cheating users.
That is to say, in theory, the electronic device should not find any outlier in the stored user information of the first-class access users when applying the preset rule in step S102. An outlier means the following: if the user information of one or more first-class access users cannot be assigned to any of the information groups obtained by dividing the user information of the second-class access users, the user information of those access users is an outlier.
Once the electronic device does find an outlier, this indicates that a new way of simulating cheating users has probably appeared while access requests of first-class access users were being received, producing user information of cheating users with new features, and that these cheating users cannot be identified by the model to be trained. In other words, the user information corresponding to the outlier is likely to be the user information of an access user that the model to be trained cannot identify as a cheating user, so the electronic device can use the user information corresponding to the outlier as a training sample.
Therefore, in step S102, the electronic device can, according to the preset rule, obtain from the stored user information of the first-class access users the user information that does not meet the preset rule and use it as training samples. That is, the electronic device can determine the outliers in the stored user information of the first-class access users according to the preset rule.
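The outlier test described above can be illustrated with a short sketch. It assumes, purely for illustration, that each information group is a list of click rates from second-class access users and that a first-class value "can be assigned" to a group when it lies within a preset difference of some member of that group. The names `is_outlier`, `select_training_samples`, and `preset_difference` are hypothetical.

```python
def is_outlier(value, groups, preset_difference):
    """Return True if value cannot be assigned to any information group.

    Assumption: a value is assignable when it is within
    preset_difference of at least one member of some group.
    """
    return all(abs(value - member) >= preset_difference
               for group in groups
               for member in group)


def select_training_samples(first_class_values, groups, preset_difference):
    """Keep only the first-class values that are outliers."""
    return [v for v in first_class_values
            if is_outlier(v, groups, preset_difference)]


# Hypothetical groups derived from second-class access users.
groups = [[0.10, 0.11, 0.12], [0.50, 0.52]]
samples = select_training_samples([0.13, 0.30, 0.51, 0.95], groups, 0.05)
print(samples)
```

Here 0.13 and 0.51 sit close to an existing group and are treated as normal, while 0.30 and 0.95 fit no group and become training samples.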
In the embodiment of the present invention, the unsupervised learning algorithm in step S102 can be any learning algorithm that can take one type of information included in the stored user information of all second-class access users and, according to the similarity of that type of information across the second-class access users, divide the user information of the second-class access users into several groups. For example, it can be the k-means clustering algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a representative density-based clustering algorithm), the iForest (Isolation Forest) algorithm, and so on, which is not specifically limited here.
In the following, the preset rule is described by taking the k-means clustering algorithm as an example:
Under normal circumstances, the user information of the second-class access users includes access times. In this case, based on the proportional distribution of the access times included in the user information of all second-class access users, and according to the similarity of the access-time proportion distributions of the individual second-class access users, the stored user information of the second-class access users can be clustered by the k-means clustering algorithm to obtain multiple clusters.
It should be understood that, in each cluster, the distance between the user information of each second-class access user and the cluster center satisfies a preset similarity threshold. That is, the difference between the access-time proportion distribution of each second-class access user in a cluster and the access-time proportion distribution corresponding to the center of that cluster lies within a preset range. The preset range can be set according to the required recognition accuracy for cheating users in the practical application: when the required recognition accuracy is high, the preset range can be smaller; otherwise, it can be larger.
It should be noted that the multiple clusters obtained by the k-means clustering algorithm reflect how the access-time proportions of normal users are distributed. When the electronic device finds an outlier in the stored user information of the first-class access users according to the preset rule, this indicates that the corresponding first-class access user is very likely not a normal user but a cheating user, and the information of the outlier can therefore be used as a training sample.
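The k-means based rule can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: each user is represented by a hypothetical three-part access-time proportion vector (morning, afternoon, night), the centroids are initialised deterministically from the first `k` points, and the "preset range" is modelled as a Euclidean distance `threshold`. None of these choices come from the source; a production system would more likely use a library implementation.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Very small k-means: returns k centroids.

    Assumption: centroids start at the first k points, which keeps the
    sketch deterministic but is not how k-means is usually initialised.
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def find_outliers(centroids, candidates, threshold):
    """Candidates farther than threshold from every centroid are outliers."""
    return [c for c in candidates
            if min(distance(c, m) for m in centroids) > threshold]


# Hypothetical access-time proportions of second-class (normal) users.
second_class = [(0.5, 0.4, 0.1), (0.1, 0.3, 0.6), (0.6, 0.3, 0.1), (0.1, 0.2, 0.7)]
centroids = kmeans(second_class, k=2)

# Hypothetical first-class users; the second one is active only at night.
first_class = [(0.55, 0.35, 0.10), (0.0, 0.0, 1.0)]
training_samples = find_outliers(centroids, first_class, threshold=0.2)
print(training_samples)
```

A smaller `threshold` corresponds to the "smaller preset range" mentioned above: more first-class users are flagged, at the cost of flagging more normal users by mistake.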
Wherein, when the electronic device is a server, the user information that does not meet the preset rule among the stored user information of the first-class access users may be determined from the user information of the first-class access users that the server stored upon receiving the corresponding access requests.
When the electronic device is a non-server electronic device such as a processor or a computer, the electronic device can establish a communication connection with a server and send a user information acquisition request to the server, requesting the user information that does not meet the preset rule among the user information of the first-class access users stored by the server. When the server responds to the user information acquisition request, the electronic device receives from the server the user information that does not meet the preset rule among the stored user information of the first-class access users.
Alternatively, when the electronic device is a non-server electronic device such as a processor or a computer and has established a communication connection with a server, the server can itself send to the electronic device the user information that does not meet the preset rule among the stored user information of the first-class access users.
This application does not specifically limit the manner in which the electronic device determines the user information that does not meet the preset rule among the stored user information of the first-class access users.
After obtaining the training samples, the electronic device can execute steps S103 and S104: the preset model to be trained is trained based on these training samples, and when the accuracy of the output of the model being trained reaches the preset accuracy, training stops and the recognition model for identifying cheating users is obtained.
During training, the model to be trained can learn the features of the training samples, that is, the features of the user information of each cheating user of a newly appearing type among the first-class access users. By learning from a large number of training samples, the model to be trained can match the features of input user information against the learned features of the user information of all types of cheating users, and thereby identify whether the access user corresponding to the input user information is a cheating user. The recognition model for identifying cheating users is obtained in this way.
Wherein the features of the user information of all types of cheating users may include: the features of the user information of the types of cheating users that the model to be trained had already learned before being trained on the training samples, and the features of the user information of each cheating user of a newly appearing type among the first-class access users.
After the model to be trained has been trained on the training samples to obtain the recognition model for identifying cheating users, the recognition model can be used to predict the training samples and obtain its output. It can then be judged whether the recognition model's prediction for each training sample is correct, and the accuracy can be calculated, giving the output accuracy of the recognition model.
For example, if the number of training samples is 200 and the predictions for 194 of them are correct, the accuracy is calculated as 194 / 200 = 97%; that is, the output accuracy of the obtained recognition model for identifying cheating users is 97%.
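The accuracy calculation in this example can be written out as a short sketch. The function name `accuracy` and the 0/1 label encoding (1 for a cheating user, 0 otherwise) are assumptions made for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# 200 training samples, all labelled as cheating users (1);
# the model predicts 194 of them correctly.
labels = [1] * 200
predictions = [1] * 194 + [0] * 6
print(f"{accuracy(predictions, labels):.0%}")
```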
When the output accuracy of the obtained recognition model for identifying cheating users reaches the preset accuracy, training can stop, yielding the finally trained recognition model. This recognition model can then be used to identify subsequent access users and determine whether they are cheating users.
Wherein the preset accuracy can be determined according to the recognition accuracy required for cheating users in the practical application: the higher the required recognition accuracy, the higher the preset accuracy can be.
It should be noted that the output accuracy of the recognition model trained on the training samples is generally not lower than that of the model to be trained. The recognition accuracy of the trained recognition model for cheating users is therefore not lower than that of the model to be trained, which ensures that the recognition accuracy for cheating users does not decrease.
Therefore, in one case, if after multiple iterations the output accuracy of the recognition model trained on the training samples is always lower than the output accuracy of the model to be trained, it can be considered that the training samples determined by the electronic device in step S102 are not sufficiently representative, or that the obtained user information that does not meet the preset rule cannot be used as training samples. In this case, the model to be trained can continue to be used for identifying cheating users, so that the recognition accuracy for cheating users does not decrease.
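The stopping condition and the fallback to the existing model can be combined in one sketch. Everything here is illustrative: `train_round` and `evaluate` are hypothetical callables standing in for one training iteration and for the accuracy measurement, and `max_rounds` is an assumed bound on "multiple iterations". The source does not say what happens when a candidate beats the existing model but never reaches the preset accuracy; this sketch conservatively keeps the existing model in that case.

```python
def train_with_fallback(current_model, train_round, evaluate,
                        preset_accuracy, max_rounds=10):
    """Train until preset_accuracy is reached, else keep current_model.

    train_round(model) -> model   : one hypothetical training iteration
    evaluate(model)    -> float   : hypothetical accuracy in [0, 1]
    """
    baseline = evaluate(current_model)
    candidate = current_model
    for _ in range(max_rounds):
        candidate = train_round(candidate)
        score = evaluate(candidate)
        # Stop as soon as the candidate is good enough and no worse
        # than the model that is currently deployed.
        if score >= preset_accuracy and score >= baseline:
            return candidate
    # The candidate never qualified: keep using the existing model.
    return current_model


# Toy stand-ins: a "model" is just an integer accuracy percentage.
improving = train_with_fallback(80, lambda m: m + 5, lambda m: m / 100, 0.95)
degrading = train_with_fallback(80, lambda m: m - 5, lambda m: m / 100, 0.95)
print(improving, degrading)
```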
In another case, when the number of training samples determined by the electronic device in step S102 is zero, that is, when no new way of simulating cheating users has appeared among the first-class access users, the model to be trained can continue to be used for identifying cheating users.
It should be noted that the electronic device can periodically update the recognition model that is applied in the current period for identifying cheating users. To distinguish it from the recognition model described above, the model used for identifying cheating users in the current period is called the target model, and the set of samples used to train the target model at the end of the previous period is called the target sample set.
It can be understood that, at the end of the previous period, the electronic device trains on the target sample set to obtain the target model, and in the current period uses the target model to identify whether access users are cheating users. The electronic device can thus identify cheating users of the same types as those included in the target sample set. However, because the ways of simulating cheating users are updated quickly, cheating users of types different from those included in the target sample set may appear in the current period, and these cheating users cannot be identified by the target model used in the current period. That is to say, the user information of access users stored by the electronic device in the current period includes user information of cheating users that cannot be identified by the target model used in the current period.
So that these cheating users can be accurately identified in the next period, the electronic device can add the user information of the cheating users that could not be identified to the target sample set as training samples, and train the target model again with the target sample set after the addition, thereby obtaining the recognition model used for identifying cheating users in the next period. During training, this recognition model can learn the features of the user information of the cheating users that could not be identified, so that cheating users of all currently existing types can be accurately identified in the next period.
Accordingly, in a first specific implementation provided by the embodiment of the present invention:
Step S101, obtaining and storing the user information of first-class access users, may include: obtaining and storing the user information of first-class access users in the current period;
Step S102, determining the user information that does not meet the preset rule among the stored user information of the first-class access users as training samples, may include: at the end of the current period, determining the user information that does not meet the preset rule among the user information of the first-class access users stored in the current period, as training samples;
Step S103, training the preset model to be trained based on the training samples, may include: adding the training samples to the target sample set, wherein the target sample set is the set of samples used to train the target model at the end of the previous period, and the target model is the model used in the current period for identifying cheating users; and inputting the target sample set after the addition into the target model for training.
Specifically, in this implementation, to obtain the training samples, the electronic device can analyze the user information of access users stored in the current period and determine the user information that does not meet the preset rule. This user information belongs to cheating users that cannot be identified by the target model in the current period, and can therefore be regarded as the user information of cheating users of types that newly appeared in the current period. It can accordingly be used as training samples.
Wherein the preset rule can be a rule determined by an unsupervised learning algorithm based on the user information of access users obtained before the current period.
Specifically, the electronic device can determine in several ways the user information that does not meet the preset rule among the user information of the first-class access users stored in the current period, as training samples. For clarity of presentation, examples of such ways are introduced later.
After determining the training samples, the electronic device can add them to the target sample set. As above, the target sample set is the set of samples used to train the target model at the end of the previous period, and the resulting target model is used in the current period to identify whether access users are cheating users.
After adding the training samples to the target sample set, the electronic device can input the target sample set after the addition into the target model for training. When the accuracy of the output of the target model reaches the preset accuracy, training stops and the recognition model for identifying cheating users is obtained.
Wherein, during training, the target model can learn the features of the user information in the target sample set after the addition, that is, the features of the user information of all types of cheating users that have appeared in the current period. By learning from the large number of training samples in the target sample set after the addition, the target model can match the features of input user information against the learned features of the user information of all types of cheating users, and thereby identify whether the access user corresponding to the input user information is a cheating user. The recognition model for identifying cheating users is obtained in this way.
Wherein, after the target sample set after the addition has been input into the target model for training, the trained target model can be used to predict the training samples in the target sample set after the addition and obtain its output. It can then be judged whether the target model's prediction for each training sample in the target sample set after the addition is correct, and the accuracy can be calculated, giving the output accuracy of the recognition model. When the output accuracy of the target model reaches the preset accuracy, training can stop and the recognition model for identifying cheating users is obtained.
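The end-of-period update of the target sample set can be sketched as follows. This is a minimal illustration: user information is modelled as a hypothetical `(user_id, click_rate)` tuple, `meets_preset_rule` is a hypothetical predicate standing in for the rule learned by the unsupervised algorithm, and skipping samples already present in the set is an assumption the source does not state.

```python
def end_of_period_update(target_samples, stored_user_info, meets_preset_rule):
    """Extend the target sample set with this period's rule violations.

    Returns (extended_target_samples, new_training_samples).
    """
    # S102: user information that does not meet the preset rule.
    new_samples = [info for info in stored_user_info
                   if not meets_preset_rule(info)]

    # S103 (first half): add the new samples to the target sample set.
    extended = list(target_samples)
    for sample in new_samples:
        if sample not in extended:   # assumption: avoid duplicate samples
            extended.append(sample)
    return extended, new_samples


# Hypothetical data: (user_id, click_rate on one advertisement).
target_samples = [("bot_a", 0.90)]
stored_this_period = [("u1", 0.10), ("bot_b", 0.95), ("bot_a", 0.90)]

extended, new_samples = end_of_period_update(
    target_samples,
    stored_this_period,
    meets_preset_rule=lambda info: info[1] < 0.5,  # toy rule
)
print(extended)
```

The extended set would then be fed to the target model for retraining, and the same function can be called again at the end of each subsequent period.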
It should be understood that the number and types of samples in the target sample set can be expanded periodically, so that the set covers more and more types of cheating users. The recognition model for identifying cheating users can then always quickly identify cheating users of newly appearing types, which guarantees the recognition accuracy and recall rate of the recognition model.
Accordingly, the recognition model for identifying cheating users obtained in the first specific implementation can serve as the recognition model for identifying cheating users in the next period. That is, in the next period, the recognition model obtained in the above implementation can be used to identify whether access users are cheating users.
On the basis of the first specific implementation, a second specific implementation provided by the embodiment of the present invention can further include the following step:
Step A1: after entering the next period, storing the user information of first-class access users in that period, identifying the first-class access users with the recognition model, and returning to the step of determining, at the end of the current period, the user information that does not meet the preset rule among the user information of the first-class access users stored in the current period, as training samples.
That is to say, after entering the next period, the electronic device can use the obtained recognition model to identify whether access users are cheating users, and store the user information of the access users that are not determined to be cheating users. At the end of that period, the electronic device can return to the steps described above, so as to obtain, as training samples, the user information of the cheating users that newly appeared in that period and could not be identified by the recognition model. The target sample set can thus be expanded further, so that the recognition model trained on the expanded target sample set can, in the following period, identify the cheating users of newly appearing types. This improves the recognition accuracy of the recognition model for all types of cheating users as well as its recall rate.
It should be understood that, in many cases, when the electronic device determines the user information that does not meet the preset rule among the user information of the first-class access users stored in the current period, the user information of a real user may be identified as an outlier.
For example, suppose that, based on the stored user information of the second-class access users, grouping by access time with an unsupervised learning algorithm determines that real users access between 6 a.m. and midnight each day. On some day in the current period, a real user needs to send an access request at 2 a.m. for work reasons. Because this user is a real user, the preset model to be trained does not identify the user as a cheating user, so the user's information is stored with the user information of first-class access users obtained and stored in the current period. When the user information that does not meet the preset rule is then determined from the user information of the first-class access users stored in the current period, the user information of this user becomes an outlier, is regarded by the electronic device as the user information of a cheating user of a newly appearing type, and is used as a training sample.
Clearly, if at the end of the current period the user information that does not meet the preset rule is used directly as training samples to train the preset model to be trained, user information produced by a real user under incidental circumstances may be mistaken for a training sample, which lowers the recognition accuracy of the finally obtained recognition model for identifying cheating users.
Therefore, mistaking the user information of real users under incidental circumstances for training samples should be avoided, so as to improve the accuracy of the training samples and ensure that the finally obtained recognition model for identifying cheating users has a high recognition accuracy.
On the basis of the first specific implementation, in a third specific implementation provided by the embodiment of the present invention, before the step of adding the training samples to the target sample set, the electronic device can determine that the online frequency corresponding to each training sample meets a preset frequency.
Specifically, after obtaining the training samples, the electronic device can judge whether the online frequency corresponding to each training sample meets the preset frequency and delete the training samples whose online frequency does not, so that the online frequency of every remaining training sample meets the preset frequency. The electronic device can then add the training samples whose online frequency meets the preset frequency to the target sample set.
Wherein the online frequency can be understood as follows: a preset period is divided into multiple time segments of equal length; for each training sample, the number of time segments in which the training sample appears is counted, and that number is the online frequency of the training sample.
For example, suppose that the current period lasts ten days. With a preset segment length of 24 hours, the current period can be divided into ten time segments, numbered 1 to 10. The preset frequency is 6, and the online frequency of a training sample is determined to meet the preset frequency when it is not less than the preset frequency.
If statistics show that training sample a appears in the 1st, 2nd, 3rd, 4th, 5th, 7th, 8th and 9th time segments, the number of time segments in which training sample a appears is 8, that is, the online frequency of training sample a is 8. Since 8 > 6, the electronic device determines that the online frequency of training sample a meets the preset frequency, and training sample a can be added to the target sample set.
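The online-frequency filter in this example can be sketched as follows. It assumes, for illustration, that each appearance of a training sample is recorded as a number of hours since the start of the current period and that time segments (the equal-length sub-periods of the current period) are 24 hours long and numbered from 1. The names `online_frequency`, `filter_by_online_frequency`, and `segment_hours` are hypothetical.

```python
def online_frequency(appearance_hours, segment_hours=24):
    """Number of distinct time segments in which a sample appears.

    appearance_hours: hours elapsed since the start of the current
    period for each appearance (an assumed representation).
    """
    segments = {int(hour // segment_hours) + 1 for hour in appearance_hours}
    return len(segments)


def filter_by_online_frequency(appearances, preset_frequency, segment_hours=24):
    """Keep samples whose online frequency is not less than preset_frequency."""
    return [sample for sample, hours in appearances.items()
            if online_frequency(hours, segment_hours) >= preset_frequency]


appearances = {
    # Sample a appears in segments 1, 2, 3, 4, 5, 7, 8 and 9 (twice in 9).
    "a": [2, 30, 50, 80, 100, 150, 170, 200, 201],
    # Sample b appears only in segments 1 and 5, e.g. an incidental real user.
    "b": [5, 6, 100],
}
print(online_frequency(appearances["a"]))            # 8
print(filter_by_online_frequency(appearances, 6))    # ['a']
```

Counting distinct segments rather than raw appearances means that a burst of requests within a single day contributes only once to the online frequency.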
It should be noted that the above example is only one implementation of the embodiment of the present invention. The embodiment of the present invention limits neither the specific content of the online frequency of a training sample and of the preset frequency, nor the specific manner of determining that the online frequency of a training sample meets the preset frequency.
When the electronic device determines that the online frequency corresponding to a training sample does not meet the preset frequency, it can continue with the next training sample and judge whether the online frequency of that sample meets the preset frequency.
It should be noted that the electronic device can judge in turn whether the online frequency of each training sample meets the preset frequency and add the training samples that do to the target sample set, or it can judge all training samples simultaneously and add those whose online frequency meets the preset frequency to the target sample set. Both are reasonable.
It is not met in the user information of stored first kind access user in the following, being determined in current period to electronic equipment The user information of preset rules, the mode as training sample carry out citing introduction.
Specifically, as shown in Fig. 2, which may include steps of:
S201: the user information and operation log of stored first kind access user in current period are obtained;
The user information of a first-type access user may include multiple types of user information that identify the characteristics of the first-type access user itself, for example the user IP, the user ID, and browser-related information such as browser type and cookies. The operation log may include multiple types of operation data. These operation data may be data that identify the online status of the first-type access user, for example data identifying the user's online duration or the user's online hours. They may also be statistics on the operations performed while the first-type access user is online, for example the click rate on various resources while the user is online.
Usually, when receiving an access request, the server may store the user information of the first-type access user corresponding to that request and, for the user information of each first-type access user, track the various operations performed under that user information while it is online, so as to store each type of operation data of the first-type access user in an operation log. That is, after receiving an access request, the server may store the user information and the operation log of the first-type access user corresponding to the access request. The operation log may include one type of operation data or multiple types of operation data; this application does not specifically limit this.
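The stored records described above can be pictured as two small structures. The following is only a minimal sketch under assumptions: the class names `UserInfo` and `OperationLog`, the field names, and the `record` helper are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class UserInfo:
    """Identifying features of an access user (hypothetical field names)."""
    user_ip: str
    user_id: str
    browser_type: str
    cookie: str


@dataclass
class OperationLog:
    """Operation data keyed by type; may hold one type or several."""
    data: Dict[str, float] = field(default_factory=dict)


# Assumed server-side store: one operation log per stored user information.
store: Dict[UserInfo, OperationLog] = {}


def record(user: UserInfo, data_type: str, value: float) -> None:
    """Store or update one type of operation data for the given user."""
    store.setdefault(user, OperationLog()).data[data_type] = value


u = UserInfo("10.0.0.1", "u-001", "Chrome", "abc123")
record(u, "ad_click_rate", 0.02)
record(u, "online_duration_minutes", 42.0)
```

Keying the store by the frozen `UserInfo` value is one possible design choice; it lets operation data accumulate per user information, as the paragraph above describes.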
It should be noted that, when the electronic device is the server, the user information and operation logs of the first-type access users stored in the current period may be those stored by the server upon receiving access requests during the current period.
When the electronic device is not the server, for example a processor or a computer, it may establish a communication connection with the server. At the end of the current period, the electronic device may send an information acquisition request to the server, requesting the user information and operation logs of the first-type access users that the server stored in the current period, and, when the server responds to the information acquisition request, receive the user information and operation logs sent by the server.
Alternatively, when the electronic device is not the server, for example a processor or a computer, and has established a communication connection with the server, the server may, at the end of the current period, send the user information and operation logs of the first-type access users stored in the current period to the electronic device.
This application does not specifically limit the manner in which the electronic device obtains the user information and operation logs of the first-type access users stored in the current period.
S202: for each first-type access user, judge whether its corresponding operation log conforms to the preset rules; if the operation log corresponding to the first-type access user does not conform to the preset rules, execute step S203;
After obtaining the user information and operation logs of the first-type access users stored in the current period, the electronic device may judge, for each first-type access user, whether the corresponding operation log conforms to the preset rules. If the operation log corresponding to the first-type access user does not conform to the preset rules, the electronic device may proceed to step S203.
Under normal circumstances, when a first-type access user is a real user, the value of each type of operation data stored by the server for that user generally falls within a fairly definite range, or is usually a fairly definite value. Therefore, when an unsupervised learning algorithm groups the user information of first-type access users by the similarity of each type of operation data, the user information can be divided into several groups with no outliers.
For example, when the first-type access users are real users, the value of the advertisement click volume for users sharing the same user IP generally lies in the range [1.98%, 2.02%]. As another example, when the first-type access users are real users, the ratio of access traffic among pre-roll, mid-roll and pause advertisements in a given TV series is usually 1:7:2.
Therefore, an operation log may be regarded as not conforming to the preset rules when the value of each type of operation data in it falls outside the range of the corresponding type of operation data for real users, or differs substantially from the value of the corresponding type of operation data for real users. In this way, when the operation log of a first-type access user differs greatly from that of a real user, the user information of that first-type access user becomes an outlier because it does not conform to the preset rules. The first-type access user is therefore more likely to be a cheating user, and step S203 can be executed.
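To make the idea of a range and an outlier concrete, the following sketch derives a range from historical values of one type of operation data and flags values outside it. The application does not specify the unsupervised algorithm; the median plus or minus k times the median absolute deviation used here is a stand-in chosen purely for illustration, and the function names and the value k = 3 are assumptions.

```python
from statistics import median
from typing import List, Tuple


def robust_range(values: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) as median -/+ k * MAD of the historical values."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    return m - k * mad, m + k * mad


def is_outlier(value: float, bounds: Tuple[float, float]) -> bool:
    """A value outside the derived range is treated as an outlier."""
    low, high = bounds
    return value < low or value > high


# Assumed historical values (in percent) for one type of operation data.
history = [1.98, 1.99, 2.00, 2.01, 2.02]
bounds = robust_range(history)

flag_high = is_outlier(3.5, bounds)    # far from the historical values
flag_normal = is_outlier(1.99, bounds) # typical value
```

A clustering method could be substituted for `robust_range` without changing how `is_outlier` is used.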
If the result of step S202 is that the operation log corresponding to the first-type access user conforms to the preset rules, the electronic device may move on to the next first-type access user and judge whether its corresponding operation log conforms to the preset rules.
It should be noted that the electronic device may judge the operation log of each first-type access user one at a time, or judge the operation logs of all first-type access users simultaneously. Both are reasonable.
S203: determine the user information of the first-type access user to be a training sample;
When the operation log corresponding to a first-type access user does not conform to the preset rules, the values of the various types of operation data in that log differ substantially from those in the operation logs of real users, which indicates that the first-type access user is more likely to be a cheating user. Since the target model did not identify this first-type access user as a cheating user, the user is very likely a new type of cheating user that appeared in the current period, and its user information can very likely be added to the target sample set as a training sample. Therefore, the electronic device may determine the user information of this access user to be a training sample.
It should be noted that the operation log stored by the server for the first-type access user corresponding to a received access request usually includes all operation data of that user. In the embodiments of the present invention, however, the operation logs obtained by the electronic device in step S201 need not include all operation data of the first-type access users; one or more types of operation data may be obtained according to the needs of the practical application.
Under normal circumstances, different types of operation data contribute differently to judging whether the user information corresponding to the operation data is a training sample. Different types of operation data therefore carry different weight values in that judgment: the greater the contribution of a type of operation data, the higher its weight value.
For example, suppose the weight value of the first-type access user's advertisement click rate in judging whether the corresponding user information is a training sample is 80%. Such a high weight value indicates that the advertisement click rate contributes heavily to the judgment and can be decisive. Therefore, the operation log obtained by the electronic device in step S201 may include only the access user's advertisement click rate.
As another example, suppose the weight values of the first-type access user's advertisement click rate, access time distribution ratio, and click rate ratio for advertisements in different time periods of the same video are 40%, 30% and 30% respectively. This indicates that the three types of operation data contribute roughly equally to the judgment and none is decisive. Therefore, the operation log obtained by the electronic device in step S201 may include all three types of operation data.
When the number of types of operation data included in the operation logs obtained in step S201 differs, the preset rules used by the electronic device in step S202 may differ. The following describes how the electronic device executes step S202 when the operation log includes one type of operation data and when it includes multiple types of operation data.
In one implementation, the operation log may include one type of operation data.
In that case, the step in S202 of judging, for each first-type access user, whether its corresponding operation log conforms to the preset rules may include:
For each first-type access user, judge whether the operation data conforms to the first-type preset rule corresponding to its type; if the operation data does not conform to the first-type preset rule corresponding to its type, execute step S203;
After obtaining the user information and operation logs of the first-type access users stored in the current period, for each first-type access user the electronic device may determine the type of operation data included in its corresponding operation log and then judge whether that operation data conforms to the first-type preset rule corresponding to its type. If it does not, the electronic device proceeds to step S203; that is, the electronic device may determine the user information of that first-type access user to be a training sample.
For example, the type of operation data is the first-type access user's advertisement click rate, and the corresponding first-type preset rule is that the value of the advertisement click rate lies in the range [1.98%, 2.02%]. The advertisement click rate in the operation log of first-type access user A obtained by the electronic device is 3.5%; since 3.5% lies outside [1.98%, 2.02%], the electronic device may determine the user information of first-type access user A to be a training sample. The advertisement click rate in the operation log of first-type access user B obtained by the electronic device is 1.99%; since 1.99% lies within [1.98%, 2.02%], the electronic device may determine that the user information of first-type access user B is not a training sample.
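The single-type check in this example can be sketched as a lookup of the rule for the data type followed by a range test. The table name `FIRST_TYPE_RULES`, the key `ad_click_rate`, and the function name are hypothetical; only the range and the two click rates come from the example above.

```python
from typing import Dict, Tuple

# Hypothetical table: operation-data type -> (low, high) range, in percent.
FIRST_TYPE_RULES: Dict[str, Tuple[float, float]] = {
    "ad_click_rate": (1.98, 2.02),
}


def is_training_sample(data_type: str, value: float) -> bool:
    """Return True when the single operation datum violates its type's rule."""
    low, high = FIRST_TYPE_RULES[data_type]
    return not (low <= value <= high)


user_a = is_training_sample("ad_click_rate", 3.5)   # 3.5% is outside the range
user_b = is_training_sample("ad_click_rate", 1.99)  # 1.99% is inside the range
```

Here `user_a` evaluates to `True` (training sample) and `user_b` to `False` (not a training sample), matching users A and B in the text.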
It should be noted that, after obtaining the user information and operation logs of the first-type access users stored in the current period, the electronic device may judge, one first-type access user at a time, whether the operation data included in the operation log conforms to the first-type preset rule corresponding to its type; it may also make this judgment for all first-type access users simultaneously. Both are reasonable.
In another implementation, the operation log may include multiple types of operation data;
In that case, the step in S202 of judging, for each first-type access user, whether its corresponding operation log conforms to the preset rules may include:
Step B1: for each type of operation data included in the operation log of each first-type access user, judge whether the operation data conforms to the second-type preset rule corresponding to its type; if not, execute step B2;
Step B2: determine the operation data to be target operation data;
After obtaining the user information and operation logs of the first-type access users stored in the current period, for each first-type access user the electronic device may determine the types of operation data included in its corresponding operation log and then judge whether each operation datum conforms to the second-type preset rule corresponding to its type. If an operation datum does not conform to the second-type preset rule corresponding to its type, the electronic device proceeds to step B2; that is, operation data that do not conform to the second-type preset rule corresponding to their type are determined to be target operation data.
Different types of operation data contribute differently to judging whether the user information corresponding to the operation data is a training sample. Therefore, although operation data being target operation data indicates that it differs greatly from the operation data of real users, that is, that the corresponding first-type access user is more likely to be a cheating user, this alone does not necessarily show that the user is a cheating user. To determine training samples more accurately from the obtained user information of first-type access users, the electronic device may proceed to step B3 after determining the target operation data.
It should be noted that the electronic device may judge, one at a time, whether each operation datum in the obtained operation logs conforms to the second-type preset rule corresponding to its type, or make this judgment for all operation data in the obtained operation logs simultaneously. Both are reasonable.
Step B3: for each first-type access user, judge whether the number of target operation data corresponding to that user is not less than a preset value; if the number of target operation data corresponding to the first-type access user is not less than the preset value, execute step S203;
For each first-type access user, the electronic device may determine the number of target operation data corresponding to that user and then judge whether this number is not less than the preset value. If it is, multiple types of operation data of the first-type access user differ greatly from the operation data of real users, which indicates that the user is more likely to be a cheating user. The electronic device may then proceed to step S203; that is, when the number is not less than the preset value, the electronic device may determine the user information of that first-type access user to be a training sample.
The preset value may be set according to the contribution of the different types of operation data to judging whether the corresponding user information is a training sample, and according to the required accuracy of cheating-user identification in the practical application. For example, the higher the required identification accuracy, the larger the preset value may be.
For example, for first-type access user C, the operation log obtained by the electronic device includes four types of operation data: the user's advertisement click rate, the user's advertisement exposure rate, the user's access time distribution ratio, and the user's click rate ratio for advertisements in different time periods of the same video. Through steps B1 and B2, the electronic device determines that the advertisement click rate, the advertisement exposure rate and the access time distribution ratio in the operation log of user C are target operation data, so the number of target operation data is 3. With a preset value of 3, the electronic device may determine that the number of target operation data corresponding to access user C is not less than the preset value, and may determine the user information of first-type access user C to be a training sample.
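Steps B1 to B3 can be sketched as collecting the types of operation data that violate their rule and comparing the count with the preset value. Everything named here is an assumption for illustration: the key names, the `in_range` helper, the numeric ranges other than the click-rate range, and the values assigned to user C. Only the four data types, the three violations and the preset value of 3 follow the example above.

```python
from typing import Callable, Dict, List

Rule = Callable[[float], bool]  # returns True when a value conforms to its rule


def in_range(low: float, high: float) -> Rule:
    """Build a rule that accepts values within [low, high]."""
    return lambda value: low <= value <= high


# Hypothetical second-type preset rules, one per operation-data type.
SECOND_TYPE_RULES: Dict[str, Rule] = {
    "ad_click_rate": in_range(1.98, 2.02),
    "ad_exposure_rate": in_range(20.0, 60.0),
    "access_time_distribution": in_range(0.3, 0.7),
    "period_click_ratio": in_range(0.4, 0.6),
}


def target_operation_data(log: Dict[str, float], rules: Dict[str, Rule]) -> List[str]:
    """Steps B1 and B2: collect the types whose value violates its rule."""
    return [t for t, v in log.items() if not rules[t](v)]


def is_training_sample(log: Dict[str, float], rules: Dict[str, Rule], preset_value: int) -> bool:
    """Step B3: the number of target operation data is not less than the preset value."""
    return len(target_operation_data(log, rules)) >= preset_value


# Assumed values for user C: three of the four types violate their rules.
user_c = {
    "ad_click_rate": 3.5,
    "ad_exposure_rate": 95.0,
    "access_time_distribution": 0.9,
    "period_click_ratio": 0.5,
}
targets = target_operation_data(user_c, SECOND_TYPE_RULES)
```

With a preset value of 3, `is_training_sample(user_c, SECOND_TYPE_RULES, 3)` holds, as in the text; raising the preset value to 4 would exclude user C.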
It should be noted that, when the number of target operation data corresponding to a first-type access user is not less than the preset value, multiple operation data of that user differ greatly from the operation data of real users. This largely rules out the case in which a real user accidentally produces abnormal operation data, which improves the accuracy of the determined training samples; the electronic device can then further screen the determined training samples.
When executing step B3, the electronic device may judge, one first-type access user at a time, whether the number of target operation data corresponding to that user is not less than the preset value; it may also make this judgment for all first-type access users simultaneously. Both are reasonable.
Optionally, the step in B3 of judging, for each first-type access user, whether the number of target operation data corresponding to that user is not less than the preset value may include:
For each first-type access user, judge whether the sum of the weights of the target operation data corresponding to that user is not less than a preset weight value; if the sum of the weights of the target operation data corresponding to the first-type access user is not less than the preset weight value, execute step S203.
When judging whether the user information of the first-type access user corresponding to the target operation data is a training sample, the sum of the weight values of the target operation data can be determined from the weight value each type of operation data carries in that judgment. When the sum of the weight values is not less than the preset weight value, the identified target operation data can largely establish that the corresponding first-type access user is a cheating user, so the user information of that user can be determined to be a training sample.
The preset weight value may be set according to the contribution of the different types of operation data to judging whether the corresponding user information is a training sample, and according to the required accuracy of cheating-user identification in the practical application. For example, the higher the required identification accuracy, the larger the preset weight value may be.
For example, suppose the operation log of a first-type access user includes three types of operation data: the user's advertisement click rate, the user's access time distribution ratio, and the user's click rate ratio for advertisements in different time periods of the same video. Suppose further that the weight values of these three types in judging whether the corresponding user information is a training sample are 80%, 5% and 15% respectively, and that the preset weight value is 70%.
The electronic device examines the operation data in the operation log of first-type access user D and determines that only the advertisement click rate is target operation data. The sum of the weights of the target operation data is then 80%; since 80% > 70%, the electronic device may determine the user information of first-type access user D to be a training sample.
The electronic device examines the operation data in the operation log of first-type access user E and determines that the access time distribution ratio and the click rate ratio for advertisements in different time periods of the same video are target operation data. The sum of the weights of the target operation data is then 20%; since 20% < 70%, the electronic device cannot determine the user information of first-type access user E to be a training sample.
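The weighted variant of step B3 for users D and E can be sketched as a weighted sum compared with the preset weight value. The key names and function names are hypothetical; the weights (80, 5, 15) and the preset weight value (70) are the percentages from the example above, kept as integers to avoid floating-point rounding.

```python
from typing import Dict, Iterable

# Weights in percent, taken from the example; key names are hypothetical.
WEIGHTS: Dict[str, int] = {
    "ad_click_rate": 80,
    "access_time_distribution": 5,
    "period_click_ratio": 15,
}
PRESET_WEIGHT = 70


def weight_sum(target_types: Iterable[str], weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of the operation-data types flagged as target data."""
    return sum(weights[t] for t in target_types)


def is_training_sample(target_types: Iterable[str], preset: int = PRESET_WEIGHT) -> bool:
    """Training sample when the weight sum is not less than the preset weight value."""
    return weight_sum(target_types) >= preset


user_d_targets = ["ad_click_rate"]
user_e_targets = ["access_time_distribution", "period_click_ratio"]

user_d = is_training_sample(user_d_targets)  # weight sum 80
user_e = is_training_sample(user_e_targets)  # weight sum 20
```

Compared with the count-based check, this lets a single decisive type of operation data suffice while several minor ones together may not.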
As an implementation of the embodiment of the present invention, the types of operation data included in the operation logs of the first-type access users stored in the current period, as obtained by the electronic device in step S201, may be one or more of the following:
The first-type access user's advertisement click rate, the first-type access user's advertisement exposure rate, the first-type access user's access time distribution ratio, and the first-type access user's click rate ratio for advertisements in different time periods of the same video.
Of course, the operation logs obtained by the electronic device in step S201 may also include other types of operation data; this application does not specifically limit this.
As an implementation of the embodiment of the present invention, the step of identifying first-type access users by means of the identification model in step B1 may include:
Step C1: obtain the user information of the first-type access user;
After entering the next period, the electronic device may use the identification model, obtained at the end of the current period by training on the target sample set after the addition, to identify first-type access users and determine whether they are cheating users.
When the electronic device is the server, it can obtain the user information of the first-type access user corresponding to a received access request. When the electronic device is another electronic device that is not the server, it establishes a communication connection with the server, and when the server receives an access request, the server may send the user information of the first-type access user corresponding to that request to the electronic device. This application does not specifically limit the manner in which the electronic device obtains the user information of first-type access users.
Step C2: input the user information into the identification model for detection, and obtain the identification result for the first-type access user.
After obtaining the user information of a first-type access user, the electronic device may input it into the identification model. Since the identification model has learned the features of the user information of the various types of cheating users that appeared in the current period, it can match the features of the input user information against those learned features, thereby identifying the first-type access user corresponding to the input user information and determining whether that user is a cheating user.
It should be noted that the identification model established above for identifying cheating users in the next period can be used offline or online. When the identification model is used in these different states, the manner in which the electronic device executes step C1 to obtain the user information of first-type access users may differ. The following describes, by way of example, how the electronic device executes step C1 in each state.
In one implementation, the step in C1 of obtaining the user information of first-type access users may include:
At the end of the next period, in the offline state, obtain the user information of the first-type access users stored during the next period.
When receiving an access request, the server may store the user information of the first-type access user corresponding to the request. At the end of the next period, in the offline state, the user information of the first-type access users stored during the next period can be obtained from the server's access log, and the obtained user information is input into the identification model, which identifies the first-type access users and determines whether they are cheating users.
When the electronic device is the server, it can obtain the locally stored user information of first-type access users for the current period. When the electronic device is not the server, it may establish a communication connection with the server and then, at the end of the next period, either receive the user information of the first-type access users stored in the current period as sent by the server, or send an information acquisition request to the server and obtain the user information of the first-type access users stored in the current period that the server sends in response. The information acquisition request may include a preset period.
In another implementation, the step in C1 of obtaining the user information of first-type access users may include:
When an access request sent by a first-type access user is received, obtain the user information of that first-type access user.
In this implementation, after entering the next period, the electronic device may obtain in real time the user information of the first-type access user corresponding to each received access request. That is, in the online state, the electronic device may obtain the user information of first-type access users in the next period in real time and input the obtained user information into the identification model, which identifies the first-type access users and determines whether they are cheating users.
When the electronic device is the server, it can obtain, upon receiving an access request, the user information of the first-type access user corresponding to that request. When the electronic device is not the server, it may establish a communication connection with the server, and the server, upon receiving an access request, may send the user information of the corresponding first-type access user to the electronic device. Both are reasonable.
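The two ways of feeding user information to the identification model (steps C1 and C2) can be contrasted in a short sketch. The function names, the shape of a request as a dictionary with a `user_info` key, and `toy_model` are all hypothetical; `toy_model` is only a stand-in for a trained identification model, not the model the application trains.

```python
from typing import Callable, Dict, Iterable, List

UserInfo = Dict[str, str]
IdentificationModel = Callable[[UserInfo], bool]  # True means "cheating user"


def identify_offline(stored: Iterable[UserInfo], model: IdentificationModel) -> List[bool]:
    """Offline: at the end of the period, run the model over all stored user info."""
    return [model(info) for info in stored]


def identify_online(request: Dict[str, UserInfo], model: IdentificationModel) -> bool:
    """Online: run the model on the user info carried by one incoming request."""
    return model(request["user_info"])


def toy_model(info: UserInfo) -> bool:
    """Stand-in for a trained model: flags one assumed IP prefix."""
    return info["user_ip"].startswith("203.0.113.")


access_log = [{"user_ip": "203.0.113.7"}, {"user_ip": "198.51.100.2"}]
batch_result = identify_offline(access_log, toy_model)
live_result = identify_online({"user_info": {"user_ip": "203.0.113.9"}}, toy_model)
```

Both paths call the same model; they differ only in when the user information is obtained, which is the distinction the two implementations above draw.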
As an implementation of the embodiment of the present invention, the above training method for a model for identifying cheating users may further include:
When the identification result for a first-type access user is that the user is a cheating user, shield the access requests of that first-type access user.
Optionally, after the first-type access user corresponding to an access request is determined to be a cheating user, the electronic device may mark the target user information carried in the access request, the mark indicating that the target user information is cheating-user information. When an access request subsequently received by the electronic device carries marked target user information, the electronic device can recognize from the mark that the corresponding first-type access user is a cheating user and shield the access request.
Optionally, after the first-type access user corresponding to an access request is determined to be a cheating user, the electronic device may also record the target user information carried in the access request to obtain a cheating-user information statistics table. When an access request subsequently received by the electronic device carries target user information, the electronic device may match that target user information against the entries in the cheating-user information statistics table to determine whether it is recorded there. The electronic device can thereby determine whether the first-type access user corresponding to the access request is a cheating user and, if so, shield the access request.
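The statistics-table variant of shielding can be sketched as a set lookup. The class name `RequestFilter`, its methods, and the use of a plain string as the target user information are assumptions made for illustration.

```python
from typing import Set


class RequestFilter:
    """Keeps a table of cheating-user information and shields matching requests."""

    def __init__(self) -> None:
        # Assumed representation of the cheating-user information statistics table.
        self.cheating_table: Set[str] = set()

    def record_cheater(self, user_info: str) -> None:
        """Record user information once its user is identified as cheating."""
        self.cheating_table.add(user_info)

    def should_shield(self, user_info: str) -> bool:
        """Shield a request when its user information is found in the table."""
        return user_info in self.cheating_table


request_filter = RequestFilter()
request_filter.record_cheater("ip=203.0.113.7;id=u-42")

shield_known = request_filter.should_shield("ip=203.0.113.7;id=u-42")
shield_unknown = request_filter.should_shield("ip=198.51.100.2;id=u-07")
```

The marking variant described just before it would work the same way from the caller's point of view; only the place where the flag is kept differs.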
As an implementation of the embodiment of the present invention, after the first-type access user corresponding to an access request is determined to be a cheating user, the electronic device may also send the target user information of that first-type access user to other electronic devices with which it has a communication connection, so that those devices can also shield an access request when they receive one carrying the target user information.
It can be seen that, in this embodiment, the electronic device can shield access requests carrying cheating-user information. This reduces the influence of cheating users on the click volume or playback volume of various resources and improves the authenticity of the click volume or playback volume obtained from statistics, which in turn reduces the adverse effect of cheating users when decisions are made based on the click volume or playback volume of resources.
Corresponding to the training method for a model for identifying cheating users provided by the above embodiments of the present invention, an embodiment of the present invention further provides a training apparatus for a model for identifying cheating users. As shown in Fig. 3, the apparatus includes:
User information acquisition module 310, configured to obtain and store the user information of first-type access users;
Training sample determining module 320, configured to determine, from the stored user information of first-type access users, the user information that does not conform to the preset rules, as training samples.
The preset rules are rules determined by an unsupervised learning algorithm based on the stored user information of second-type access users, where the user information of second-type access users is user information of access users obtained and stored before the user information of the first-type access users was obtained;
Model training module 330, configured to train a preset to-be-trained model based on the training samples, where the to-be-trained model is a model for identifying whether the first-type access users and the second-type access users are cheating users;
Identification model obtaining module 340, configured to stop training when the accuracy of the output of the to-be-trained model reaches a preset accuracy, and obtain the identification model for identifying cheating users.
As can be seen from the above, in the solution provided by the embodiment of the present invention, the preset rules can be determined by an unsupervised algorithm based on the user information of second-type access users, so that the user information of newly appearing types of cheating users can be identified among the stored user information of first-type access users. A new identification model is then trained on the user information so identified, so that the new model can identify newly appearing types of cheating users. Labeling the user information of newly appearing types of cheating users by means of the preset rules avoids the situation in which manual labeling cannot label such users, and thereby improves both the accuracy with which the newly trained identification model identifies newly appearing types of cheating users and the recall of the identification model.
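The stopping condition used by modules 330 and 340 (train until the accuracy of the output reaches a preset accuracy, then stop) can be sketched as follows. The application does not name the model type; a one-feature perceptron is swapped in here purely as a stand-in, and the sample values, learning rate and epoch cap are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = cheating, 0 = real)


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Linear threshold prediction."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def accuracy(w: List[float], b: float, samples: List[Sample]) -> float:
    """Fraction of samples whose prediction matches the label."""
    return sum(predict(w, b, x) == y for x, y in samples) / len(samples)


def train(samples: List[Sample], preset_accuracy: float,
          max_epochs: int = 10000, lr: float = 0.1) -> Tuple[List[float], float]:
    """Perceptron updates, stopping once accuracy reaches the preset accuracy."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        if accuracy(w, b, samples) >= preset_accuracy:
            break  # stop training: the preset accuracy has been reached
        for x, y in samples:
            err = y - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


# Assumed training samples: one feature (ad click rate in percent) per user.
samples: List[Sample] = [([3.5], 1), ([4.0], 1), ([2.0], 0), ([1.99], 0)]
w, b = train(samples, preset_accuracy=1.0)
final_accuracy = accuracy(w, b, samples)
```

Because these assumed samples are linearly separable, the perceptron is guaranteed to reach the preset accuracy within the epoch cap; with real data, the accuracy would normally be measured on held-out samples rather than on the training set.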
As an implementation of this embodiment of the present invention,
the above user information obtaining module 310 may include:
A user information obtaining submodule (not shown in Fig. 3), configured to obtain and store user information of first-class access users within the current period;
The above training sample determining module 320 may include:
A training sample determining submodule (not shown in Fig. 3), configured to determine, at the end of the current period, the user information that does not conform to the preset rules among the user information of first-class access users stored in the current period, and to use it as training samples;
The above model training module 330 may include a sample set adding submodule (not shown in Fig. 3) and a model training submodule (not shown in Fig. 3);
The sample set adding submodule (not shown in Fig. 3) is configured to add the training samples to a target sample set;
where the target sample set is the set of samples used to train the target model at the end of the previous period, and the target model is the model used to identify cheating users in the current period.
The model training submodule (not shown in Fig. 3) is configured to input the target sample set, after the addition, into the target model for training;
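The per-period flow of the sample set adding submodule and the model training submodule can be sketched as below. This is only an illustration: `PeriodicTrainer` and `train_fn` are hypothetical names, and `train_fn(model, samples)` is a placeholder for whatever training routine updates the target model.

```python
class PeriodicTrainer:
    """Accumulates training samples across periods and retrains the target model."""

    def __init__(self, train_fn, initial_samples=(), initial_model=None):
        # train_fn(model, samples) -> model is a hypothetical training callback.
        self.train_fn = train_fn
        self.target_sample_set = list(initial_samples)
        self.target_model = initial_model

    def end_of_period(self, new_training_samples):
        # Sample set adding step: extend the target sample set.
        self.target_sample_set.extend(new_training_samples)
        # Model training step: feed the enlarged set into the target model.
        self.target_model = self.train_fn(self.target_model, self.target_sample_set)
        return self.target_model

# Toy callback: the "model" is just a running count of samples seen.
trainer = PeriodicTrainer(lambda model, samples: (model or 0) + len(samples), ["a"])
first = trainer.end_of_period(["b", "c"])   # trains on 3 samples
second = trainer.end_of_period(["d"])       # trains on 4 samples
```

Keeping the sample set on the trainer means each period's newly labeled samples are added to, rather than substituted for, the samples used in earlier periods.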
As an implementation of this embodiment of the present invention, the above training device for a model for identifying cheating users may further include:
An information storage and model application module (not shown in Fig. 3), configured to, after the next period begins, store user information of first-class access users within the current period, identify the first-class access users by means of the identification model, and trigger the training sample determining module.
As an implementation of this embodiment of the present invention, the above training device for a model for identifying cheating users may further include:
An online frequency determining module (not shown in Fig. 3), configured to determine, before the training samples are added to the target sample set, that the online frequency corresponding to each training sample meets a preset frequency.
As an implementation of this embodiment of the present invention, the above user information obtaining submodule (not shown in Fig. 3) may include:
A user information obtaining unit (not shown in Fig. 3), configured to obtain, at the end of the current period, the user information and operation logs of the first-class access users stored in the current period;
A preset rule judging unit (not shown in Fig. 3), configured to judge, for each first-class access user, whether the corresponding operation log conforms to the preset rules, and if not, to trigger a training sample determining unit (not shown in Fig. 3);
The training sample determining unit (not shown in Fig. 3), configured to determine the user information of that first-class access user as a candidate training sample.
In one implementation, the above operation log may include operation data of one type.
In that case, the above preset rule judging unit (not shown in Fig. 3) may be specifically configured to: for each first-class access user, judge whether the operation data conforms to the first-type preset rule corresponding to its type, and if not, trigger the training sample determining unit (not shown in Fig. 3).
In another implementation, the above operation log may include operation data of multiple types.
In that case, the above preset rule judging unit (not shown in Fig. 3) may include:
A preset rule judging subunit (not shown in Fig. 3), configured to judge, for each type of operation data included in the operation log of each first-class access user, whether that operation data conforms to the second-type preset rule corresponding to its type, and if not, to trigger a data determining subunit (not shown in Fig. 3);
The data determining subunit (not shown in Fig. 3), configured to determine that operation data to be target operation data;
A preset number judging subunit (not shown in Fig. 3), configured to judge, for each first-class access user, whether the number of target operation data items corresponding to that user is not less than a preset number, and if so, to trigger the above candidate training sample determining unit (not shown in Fig. 3).
Optionally, the above operation data may include: the access user's click-through rate on advertisements, the access user's exposure rate for advertisements, the distribution ratio of the access user's access times, and the ratio of the access user's click-through rates on advertisements of the same video in different time periods.
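For the multi-type case, the judging logic (check each type of operation data against the rule for its type, count the violations, compare the count with a preset number) can be sketched as follows. The field names and thresholds in `TYPE_RULES` are invented purely for illustration; the text does not give concrete rules or values.

```python
# Hypothetical per-type rules: each returns True when the value conforms.
TYPE_RULES = {
    "ad_click_rate": lambda v: v <= 0.2,
    "ad_exposure_rate": lambda v: v <= 0.9,
    "night_access_ratio": lambda v: v <= 0.5,
}

def count_target_operation_data(operation_log, type_rules):
    """Count operation-data items that violate the rule for their own type."""
    return sum(
        1
        for data_type, value in operation_log.items()
        if data_type in type_rules and not type_rules[data_type](value)
    )

def is_candidate_training_sample(operation_log, type_rules, preset_number):
    """A user is a candidate when the violation count is not less than preset_number."""
    return count_target_operation_data(operation_log, type_rules) >= preset_number

log = {"ad_click_rate": 0.8, "ad_exposure_rate": 0.95, "night_access_ratio": 0.1}
flagged = is_candidate_training_sample(log, TYPE_RULES, preset_number=2)
```

Requiring several violated types rather than one is a way to trade sensitivity for fewer false labels; the preset number controls that trade-off.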
As an implementation provided by this embodiment of the present invention, the above information storage and model application module (not shown in Fig. 3) may include:
An access information obtaining submodule (not shown in Fig. 3), configured to obtain the user information of a first-class access user;
An access user identifying submodule (not shown in Fig. 3), configured to input the user information into the identification model for detection, so as to obtain an identification result for the first-class access user.
As an implementation provided by this embodiment of the present invention, the above access information obtaining submodule (not shown in Fig. 3) may be specifically configured to:
at the end of the next period, in an offline state, obtain the user information of the first-class access users stored in the next period; or,
upon receiving an access request sent by a first-class access user, obtain the user information of that first-class access user.
As an implementation provided by this embodiment of the present invention, the above training device for a model for identifying cheating users may further include:
An access request shielding module (not shown in Fig. 3), configured to shield the access request of a first-class access user when the identification result for that user is that the user is a cheating user.
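The two ways of applying the identification model described above (offline at the end of a period, or online when a request arrives) and the shielding step can be sketched roughly as below. The names `identify`, `serve`, `user_id`, and `ad_click_rate` are hypothetical placeholders; `identify` stands for the trained identification model returning True for a cheating user.

```python
def identify_offline(stored_users, identify):
    """Offline mode: at the end of a period, scan the stored user information."""
    return [u["user_id"] for u in stored_users if identify(u)]

def handle_access_request(user_info, identify, serve):
    """Online mode: identify on arrival and shield requests from cheating users."""
    if identify(user_info):
        return None  # request shielded, nothing is served
    return serve(user_info)

# Toy stand-ins for the identification model and the request handler.
toy_identify = lambda u: u["ad_click_rate"] > 0.5
toy_serve = lambda u: "served:" + str(u["user_id"])

users = [{"user_id": 1, "ad_click_rate": 0.9}, {"user_id": 2, "ad_click_rate": 0.1}]
cheaters = identify_offline(users, toy_identify)
responses = [handle_access_request(u, toy_identify, toy_serve) for u in users]
```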
An embodiment of the present invention further provides an electronic device which, as shown in Fig. 4, includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404,
The memory 403 is configured to store a computer program;
The processor 401 is configured to implement, when executing the program stored in the memory 403, the method steps of the training method for a model for identifying cheating users provided by the embodiments of the present invention.
Specifically, the above training method for a model for identifying cheating users includes:
Obtaining and storing user information of first-class access users;
Determining, from the stored user information of first-class access users, the user information that does not conform to preset rules, and using it as training samples, where the preset rules are rules determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
Training a preset to-be-trained model based on the training samples, where the to-be-trained model is a model for identifying whether the first-class access users and the second-class access users are cheating users;
Stopping training when the accuracy of the output of the to-be-trained model reaches a preset accuracy, thereby obtaining an identification model for identifying cheating users.
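The final two steps (train, then stop once the output accuracy reaches a preset accuracy) can be illustrated with a small self-contained sketch. Everything here is an assumption for illustration: the text does not specify the model family, so a tiny logistic-regression classifier trained by stochastic gradient descent is used, and accuracy is measured on the training samples themselves, whereas a real setup would more likely use held-out data.

```python
import math

def predict(weights, bias, x):
    """Return 1 (cheating) when the linear score is non-negative, else 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0 else 0

def accuracy(weights, bias, samples):
    """Fraction of (features, label) pairs the model classifies correctly."""
    hits = sum(predict(weights, bias, x) == y for x, y in samples)
    return hits / len(samples)

def train_until_accurate(samples, preset_accuracy=0.95, lr=0.5, max_epochs=1000):
    """Run SGD epochs and stop as soon as accuracy reaches preset_accuracy."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in samples:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of the log loss
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
        if accuracy(weights, bias, samples) >= preset_accuracy:
            return weights, bias, epoch  # stop training here
    return weights, bias, max_epochs

# Hypothetical one-feature samples: (features, label), label 1 = cheating user.
data = [([0.9], 1), ([0.8], 1), ([0.1], 0), ([0.05], 0)]
w, b, epochs_used = train_until_accurate(data)
```

The `max_epochs` cap is a safeguard added in this sketch so that training terminates even if the preset accuracy is never reached.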
It should be noted that other implementations of the training method for a model for identifying cheating users, as realized by the processor 401 executing the program stored in the memory 403, are the same as the embodiments of the training method for a model for identifying cheating users provided in the foregoing method embodiment section, and are not repeated here.
As can be seen from the above, in the scheme provided by this embodiment of the present invention, preset rules can be determined by an unsupervised algorithm based on the user information of second-class access users, so that the user information of cheating users of a newly emerging type can be identified within the stored user information of first-class access users. A new identification model is then trained on the user information so identified, which enables the new identification model to identify cheating users of the newly emerging type. Because the preset rules label the user information of cheating users of the newly emerging type, the scheme avoids the situation in which manual labeling cannot label such users, and thereby improves both the recognition accuracy of the newly trained identification model for cheating users of the newly emerging type and the recall of that identification model.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include Random Access Memory (RAM), and may also include Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to execute the training method for a model for identifying cheating users described in any of the above embodiments.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to execute the training method for a model for identifying cheating users described in any of the above embodiments.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)), among others.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment, the electronic device embodiment, the computer-readable storage medium embodiment, and the embodiment of the computer program product containing instructions are described relatively briefly because they are substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (15)

1. A training method for a model for identifying cheating users, characterized in that the method comprises:
Obtaining and storing user information of first-class access users;
Determining, from the stored user information of first-class access users, the user information that does not conform to preset rules, and using it as training samples, wherein the preset rules are rules determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
Training a preset to-be-trained model based on the training samples, wherein the to-be-trained model is a model for identifying whether the first-class access users and the second-class access users are cheating users;
Stopping training when the accuracy of the output of the to-be-trained model reaches a preset accuracy, thereby obtaining an identification model for identifying cheating users.
2. The method according to claim 1, characterized in that
the step of obtaining and storing user information of first-class access users comprises:
Obtaining and storing user information of first-class access users within the current period;
The step of determining, from the stored user information of first-class access users, the user information that does not conform to preset rules, and using it as training samples, comprises:
At the end of the current period, determining the user information that does not conform to the preset rules among the user information of first-class access users stored in the current period, and using it as training samples;
The step of training a preset to-be-trained model based on the training samples comprises:
Adding the training samples to a target sample set, wherein the target sample set is the set of samples used to train a target model at the end of the previous period, and the target model is the model used to identify cheating users in the current period;
Inputting the target sample set, after the addition, into the target model for training.
3. The method according to claim 2, characterized in that the method further comprises:
After the next period begins, storing user information of first-class access users within the current period, identifying the first-class access users by means of the identification model, and returning to the step of determining, at the end of the current period, the user information that does not conform to the preset rules among the user information of first-class access users stored in the current period, and using it as training samples.
4. The method according to claim 2, characterized in that, before the step of adding the training samples to the target sample set, the method further comprises:
Determining that the online frequency corresponding to each training sample meets a preset frequency.
5. The method according to claim 2, characterized in that the step of determining the user information that does not conform to the preset rules among the user information of first-class access users stored in the current period, and using it as training samples, comprises:
Obtaining the user information and operation logs of the first-class access users stored in the current period;
For each first-class access user, judging whether the corresponding operation log conforms to the preset rules;
If the preset rules are not met, determining the user information of that first-class access user as a training sample.
6. The method according to claim 5, characterized in that the operation log includes operation data of one type;
The step of judging, for each first-class access user, whether the corresponding operation log conforms to the preset rules comprises:
For each first-class access user, judging whether the operation data conforms to the first-type preset rule corresponding to its type;
The step of determining, if the preset rules are not met, the user information of that first-class access user as a training sample comprises:
If the operation data does not conform to the first-type preset rule corresponding to its type, determining the user information of that first-class access user as a training sample.
7. The method according to claim 5, characterized in that the operation log includes operation data of multiple types;
The step of judging, for each first-class access user, whether the corresponding operation log conforms to the preset rules comprises:
For each type of operation data included in the operation log of each first-class access user, judging whether that operation data conforms to the second-type preset rule corresponding to its type;
If the second-type preset rule is not met, determining that operation data to be target operation data;
For each first-class access user, judging whether the number of target operation data items corresponding to that first-class access user is not less than a preset number;
The step of determining, if the preset rules are not met, the user information of that first-class access user as a training sample comprises:
If the number of target operation data items corresponding to that first-class access user is not less than the preset number, determining the user information of that first-class access user as a training sample.
8. A training device for a model for identifying cheating users, characterized in that the device comprises:
A user information obtaining module, configured to obtain and store user information of first-class access users;
A training sample determining module, configured to determine, from the stored user information of first-class access users, the user information that does not conform to preset rules, and to use it as training samples, wherein the preset rules are rules determined by an unsupervised learning algorithm based on stored user information of second-class access users, and the user information of the second-class access users is user information of access users that was obtained and stored before the user information of the first-class access users was obtained;
A model training module, configured to train a preset to-be-trained model based on the training samples, wherein the to-be-trained model is a model for identifying whether the first-class access users and the second-class access users are cheating users;
An identification model obtaining module, configured to stop training when the accuracy of the output of the to-be-trained model reaches a preset accuracy, thereby obtaining an identification model for identifying cheating users.
9. The device according to claim 8, characterized in that
the user information obtaining module includes a user information obtaining submodule, the user information obtaining submodule being configured to obtain and store user information of first-class access users within the current period;
The training sample determining module includes a training sample determining submodule, the training sample determining submodule being configured to determine, at the end of the current period, the user information that does not conform to the preset rules among the user information of first-class access users stored in the current period, and to use it as training samples;
The model training module includes a sample set adding submodule and a model training submodule; the sample set adding submodule is configured to add the training samples to a target sample set, wherein the target sample set is the set of samples used to train a target model at the end of the previous period, and the target model is the model used to identify cheating users in the current period; the model training submodule is configured to input the target sample set, after the addition, into the target model for training.
10. The device according to claim 9, characterized in that the device further comprises:
An information storage and model application module, configured to, after the next period begins, store user information of first-class access users within the current period, identify the first-class access users by means of the identification model, and trigger the training sample determining module.
11. The device according to claim 9, characterized in that the device further comprises:
An online frequency determining module, configured to determine, before the training samples are added to the target sample set, that the online frequency corresponding to each training sample meets a preset frequency.
12. The device according to claim 9, characterized in that the user information obtaining submodule includes:
A user information obtaining unit, configured to obtain, at the end of the current period, the user information and operation logs of the first-class access users stored in the current period;
A preset rule judging unit, configured to judge, for each first-class access user, whether the corresponding operation log conforms to the preset rules, and if not, to trigger a training sample determining unit;
The training sample determining unit, configured to determine the user information of that first-class access user as a training sample.
13. The device according to claim 12, characterized in that the operation log includes operation data of one type;
The preset rule judging unit is specifically configured to: for each first-class access user, judge whether the operation data conforms to the first-type preset rule corresponding to its type, and if not, trigger the training sample determining unit.
14. The device according to claim 12, characterized in that the operation log includes operation data of multiple types, and the preset rule judging unit includes:
A preset rule judging subunit, configured to judge, for each type of operation data included in the operation log of each first-class access user, whether that operation data conforms to the second-type preset rule corresponding to its type, and if not, to trigger a data determining subunit;
The data determining subunit, configured to determine that operation data to be target operation data;
A preset number judging subunit, configured to judge, for each first-class access user, whether the number of target operation data items corresponding to that first-class access user is not less than a preset number, and if so, to trigger the training sample determining unit.
15. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement, when executing the program stored in the memory, the method steps of any one of claims 1 to 7.
CN201811030204.4A 2018-09-05 2018-09-05 Training method and device for model for identifying cheating users and electronic equipment Active CN109165691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811030204.4A CN109165691B (en) 2018-09-05 2018-09-05 Training method and device for model for identifying cheating users and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811030204.4A CN109165691B (en) 2018-09-05 2018-09-05 Training method and device for model for identifying cheating users and electronic equipment

Publications (2)

Publication Number Publication Date
CN109165691A true CN109165691A (en) 2019-01-08
CN109165691B CN109165691B (en) 2022-04-22

Family

ID=64894014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811030204.4A Active CN109165691B (en) 2018-09-05 2018-09-05 Training method and device for model for identifying cheating users and electronic equipment

Country Status (1)

Country Link
CN (1) CN109165691B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871807A (en) * 2019-02-21 2019-06-11 百度在线网络技术(北京)有限公司 Face image processing process and device
CN110995681A (en) * 2019-11-25 2020-04-10 北京奇艺世纪科技有限公司 User identification method and device, electronic equipment and storage medium
WO2020143765A1 (en) * 2019-01-11 2020-07-16 腾讯科技(深圳)有限公司 Advertisement anti-spamming method and apparatus, electronic device, and storage medium
CN112258221A (en) * 2020-10-12 2021-01-22 上海酷量信息技术有限公司 System and method for identifying cheating terminal
CN112733045A (en) * 2021-04-06 2021-04-30 北京轻松筹信息技术有限公司 User behavior analysis method and device and electronic equipment
CN113657535A (en) * 2021-08-24 2021-11-16 北京奇艺世纪科技有限公司 Model training method and device, electronic equipment and storage medium
CN113743963A (en) * 2021-09-28 2021-12-03 北京奇艺世纪科技有限公司 Abnormal recognition model training method, abnormal object recognition device and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150088662A1 (en) * 2012-10-10 2015-03-26 Nugg.Ad Ag Predictive Behavioural Targeting
CN106022826A (en) * 2016-05-18 2016-10-12 武汉斗鱼网络科技有限公司 Cheating user recognition method and system in webcast platform
CN106022834A (en) * 2016-05-24 2016-10-12 腾讯科技(深圳)有限公司 Advertisement against cheating method and device
CN106326498A (en) * 2016-10-13 2017-01-11 合网络技术(北京)有限公司 Cheat video identification method and device
CN106326497A (en) * 2016-10-10 2017-01-11 合网络技术(北京)有限公司 Cheating video user identification method and device
CN107274212A (en) * 2017-05-26 2017-10-20 北京小度信息科技有限公司 Cheating recognition methods and device
CN108109011A (en) * 2017-12-28 2018-06-01 北京皮尔布莱尼软件有限公司 A kind of anti-cheat method of advertisement and computing device
CN108470253A (en) * 2018-04-02 2018-08-31 腾讯科技(深圳)有限公司 A kind of user identification method, device and storage device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143765A1 (en) * 2019-01-11 2020-07-16 腾讯科技(深圳)有限公司 Advertisement anti-spamming method and apparatus, electronic device, and storage medium
CN111435507A (en) * 2019-01-11 2020-07-21 腾讯科技(北京)有限公司 Advertisement anti-cheating method and device, electronic equipment and readable storage medium
CN109871807A (en) * 2019-02-21 2019-06-11 百度在线网络技术(北京)有限公司 Face image processing process and device
CN110995681A (en) * 2019-11-25 2020-04-10 北京奇艺世纪科技有限公司 User identification method and device, electronic equipment and storage medium
CN110995681B (en) * 2019-11-25 2022-04-22 北京奇艺世纪科技有限公司 User identification method and device, electronic equipment and storage medium
CN112258221A (en) * 2020-10-12 2021-01-22 上海酷量信息技术有限公司 System and method for identifying cheating terminal
CN112733045A (en) * 2021-04-06 2021-04-30 北京轻松筹信息技术有限公司 User behavior analysis method and device and electronic equipment
CN113657535A (en) * 2021-08-24 2021-11-16 北京奇艺世纪科技有限公司 Model training method and device, electronic equipment and storage medium
CN113743963A (en) * 2021-09-28 2021-12-03 北京奇艺世纪科技有限公司 Abnormal recognition model training method, abnormal object recognition device and electronic equipment

Also Published As

Publication number Publication date
CN109165691B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN109165691A (en) Training method, device and the electronic equipment of the model of cheating user for identification
CN108921221A (en) Generation method, device, equipment and the storage medium of user characteristics
CN107040397B (en) Service parameter acquisition method and device
Hilbert et al. Computational communication science: A methodological catalyzer for a maturing discipline
CN110210227A (en) Risk checking method, device, equipment and storage medium
CN109784381A (en) Markup information processing method, device and electronic equipment
CN108804704A (en) A kind of user's depth portrait method and device
CN110889463A (en) Sample labeling method and device, server and machine-readable storage medium
CN106709318A (en) Recognition method, device and calculation equipment for user equipment uniqueness
WO2023115761A1 (en) Event detection method and apparatus based on temporal knowledge graph
US20220214957A1 (en) Machine learning models applied to interaction data for facilitating modifications to online environments
US11188517B2 (en) Annotation assessment and ground truth construction
CN108304935A (en) Machine learning model training method, device and computer equipment
CN108153909A (en) Word method, apparatus and electronic equipment, storage medium are opened up in keyword dispensing
CN103617146B (en) A kind of machine learning method and device based on hardware resource consumption
CN112883257A (en) Behavior sequence data processing method and device, electronic equipment and storage medium
CN110162609A (en) For recommending the method and device asked questions to user
CN107944026A (en) A kind of method, apparatus, server and the storage medium of atlas personalized recommendation
US11275994B2 (en) Unstructured key definitions for optimal performance
CN111159241A (en) Click conversion estimation method and device
CN113297486B (en) Click rate prediction method and related device
CN113704511B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN108768743A (en) A kind of user identification method, device and server
CN111881007B (en) Operation behavior judgment method, device, equipment and computer readable storage medium
CN114693011A (en) Policy matching method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant