CN108108743A - Abnormal user identification method and device for identifying abnormal users - Google Patents

Info

Publication number
CN108108743A
Authority
CN
China
Prior art keywords
characteristic
abnormal user
user
abnormal
key
Prior art date
Legal status
Granted
Application number
CN201611051585.5A
Other languages
Chinese (zh)
Other versions
CN108108743B (en)
Inventor
陈善
田天
康伟
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201611051585.5A
Publication of CN108108743A
Application granted
Publication of CN108108743B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/23 Clustering techniques
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
                    • G06F18/24155 Bayesian classification
                • G06F18/243 Classification techniques relating to the number of classes
                  • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses an abnormal user identification method and a device for identifying abnormal users. One specific embodiment of the method includes: acquiring feature data of multiple users and, based on the feature data, determining abnormal users among the multiple users by unsupervised learning; based on the feature data of the determined abnormal users, selecting, by supervised learning, key feature parameters for building a classification model from multiple feature parameters, and generating key feature data containing the key feature parameters; and building the classification model using the key feature data. Abnormal users are thus identified by unsupervised learning, and the key features for building the classification model are selected by supervised learning from the abnormal users' feature data, so that the classification model identifies abnormal users using only the key features that matter most for the task. This avoids interference from less important features, improves identification accuracy, and reduces the cost of the identification process.

Description

Abnormal user identification method and device for identifying abnormal users
Technical field
This application relates to the field of computers, in particular to the field of big data, and more particularly to an abnormal user identification method and a device for identifying abnormal users.
Background
In big data analysis, it is often necessary to identify abnormal users and remove their data in order to improve the accuracy of the analysis. Currently, this is usually done by configuring recognition rules and checking whether each user's features match them to decide whether the user is an abnormal user.
However, identifying abnormal users and removing their data in this way has two drawbacks. First, because user data is at massive scale, matching each user's feature information against the recognition rules one by one makes identification expensive. Second, because the importance of each user feature for identifying abnormal users cannot be determined, a large number of low-importance features take part in the computation. These features interfere with identification, lowering accuracy and further increasing the cost of the identification process.
Summary
This application provides an abnormal user identification method and a device for identifying abnormal users, to solve the technical problems mentioned in the Background section above.
In a first aspect, this application provides an abnormal user identification method. The method includes: acquiring feature data of multiple users and, based on the feature data, determining abnormal users among the multiple users by unsupervised learning, where the feature data includes multiple feature parameters indicating features of the users; based on the feature data of the determined abnormal users, selecting, by supervised learning, key feature parameters for building a classification model from the multiple feature parameters, and generating key feature data containing the key feature parameters; and building the classification model using the key feature data, so that the classification model can be used to identify whether a user is an abnormal user.
In a second aspect, this application provides a device for identifying abnormal users. The device includes: a recognition unit, configured to acquire feature data of multiple users and, based on the feature data, determine abnormal users among the multiple users by unsupervised learning, where the feature data includes multiple feature parameters indicating features of the users; a selection unit, configured to select, by supervised learning and based on the feature data of the determined abnormal users, key feature parameters for building a classification model from the multiple feature parameters, and to generate key feature data containing the key feature parameters; and a construction unit, configured to build the classification model using the key feature data, so that the classification model can be used to identify whether a user is an abnormal user.
With the abnormal user identification method and the device for identifying abnormal users provided by this application, feature data of multiple users is acquired and, based on it, abnormal users among the multiple users are determined by unsupervised learning, the feature data including multiple feature parameters indicating features of the users; based on the feature data of the determined abnormal users, key feature parameters for building a classification model are selected from the multiple feature parameters by supervised learning, and key feature data containing the key feature parameters is generated; and the classification model is built using the key feature data. Abnormal users are thus identified by unsupervised learning, and the key features used to build the classification model are selected by supervised learning from the abnormal users' feature data. The classification model therefore identifies abnormal users using only the key features that are most important for the task, which avoids interference from less important features, improves identification accuracy, and reduces the cost of the identification process.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 shows an exemplary system architecture to which the abnormal user identification method or the device for identifying abnormal users of this application may be applied;
Fig. 2 shows a flow chart of one embodiment of the abnormal user identification method according to this application;
Fig. 3 shows a flow chart of another embodiment of the abnormal user identification method according to this application;
Fig. 4 shows a schematic structural diagram of one embodiment of the device for identifying abnormal users according to this application;
Fig. 5 shows a schematic structural diagram of a computer system suitable for implementing the device for identifying abnormal users of the embodiments of this application.
Detailed description
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other. This application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the abnormal user identification method or the device for identifying abnormal users of this application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminals 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides transmission links between the terminals 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless transmission links or fiber optic cables.
Users may use the terminals 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication applications may be installed on the terminals 101, 102, 103, such as search applications, group-buying applications, and instant messaging applications.
The terminals 101, 102, 103 may be various electronic devices that have a display screen and support network communication, including but not limited to smartphones, tablet computers, e-book readers, MP3 (MPEG Audio Layer III) players, MP4 players, laptop computers, and desktop computers.
The terminals 101, 102, 103 may collect feature parameters indicating user features, such as the user's account, user name, telephone number, and URL, and send feature data containing multiple feature parameters to the server 105. The server 105 may use the feature data to build a classification model for identifying whether a user is an abnormal user.
Referring to Fig. 2, a flow 200 of one embodiment of the abnormal user identification method according to this application is shown. The abnormal user identification method provided by the embodiments of this application may be performed by the server 105 in Fig. 1; correspondingly, the device for identifying abnormal users may be arranged in the server 105. The method includes the following steps:
Step 201: determine the abnormal users among the multiple users by unsupervised learning.
In this embodiment, multiple feature parameters may be used to describe a user's features along multiple dimensions. A user's feature data includes multiple feature parameters indicating the user's features, where each feature may correspond to at least one feature parameter. After the feature data of each of the multiple users is obtained, an unsupervised learning approach may be used to determine the abnormal users among the multiple users from their feature data.
In this embodiment, attributes of the user may be selected in advance as feature parameters, based on the information each attribute carries. For example, if the selected feature parameters include account, user name, telephone number, and URL, the user's features can be described along multiple dimensions by these parameters. Correspondingly, a user's feature data may include the user's account, user name, telephone number, URL, and similar feature parameters.
In some optional implementations of this embodiment, determining the abnormal users among the multiple users by unsupervised learning includes: clustering the feature data of the multiple users with a clustering algorithm to obtain multiple clusters; and, when a cluster contains feature data that matches preset abnormal feature data, determining the users corresponding to all the feature data in that cluster to be abnormal users.
In this embodiment, the unsupervised learning approach may be a clustering algorithm, such as a density-based clustering algorithm. The clustering algorithm groups the feature data of highly associated users together, yielding multiple clusters, each of which may contain the feature data of multiple highly associated users.
In this embodiment, each cluster may be checked for feature data that matches the abnormal feature data. The abnormal feature data may be composed in advance from feature parameters with abnormal values. When a cluster contains feature data that matches the preset abnormal feature data, then, because the feature data of the users within one cluster is highly associated, the feature data of all users in that cluster may be treated as feature data of abnormal users. Correspondingly, the users to whom this feature data belongs are determined to be abnormal users.
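The clustering step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each user's feature data has already been encoded as a numeric vector, uses a plain DBSCAN as the density-based clustering algorithm, and treats a point as matching a preset abnormal pattern when it lies within a hypothetical `match_radius`. The function names `dbscan` and `flag_abnormal_users` are illustrative.

```python
import math
from typing import List, Optional, Sequence, Set

Vector = Sequence[float]


def dbscan(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        # Brute-force O(n^2) range query; adequate for a sketch.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return [label if label is not None else -1 for label in labels]


def flag_abnormal_users(
    user_ids: List[str],
    features: List[Vector],
    abnormal_patterns: List[Vector],
    eps: float,
    min_pts: int,
    match_radius: float = 1e-9,  # hypothetical matching tolerance
) -> Set[str]:
    """Flag every user in any cluster containing a point that matches an abnormal pattern."""
    labels = dbscan(features, eps, min_pts)
    abnormal_clusters = {
        label
        for label, vec in zip(labels, features)
        # Noise points (-1) belong to no cluster, so this sketch never flags them.
        if label != -1 and any(math.dist(vec, pat) <= match_radius for pat in abnormal_patterns)
    }
    return {uid for uid, label in zip(user_ids, labels) if label in abnormal_clusters}
```

For example, with users `a` to `g` at points `(0, 0)`, `(0, 0.1)`, `(0.1, 0)`, `(5, 5)`, `(5, 5.1)`, `(5.1, 5)`, `(10, 10)`, `eps=0.5`, `min_pts=2`, and a single abnormal pattern `(5, 5)`, the function returns the set `{"d", "e", "f"}`: the whole second cluster is flagged because one of its members matches, while the isolated point `g` is noise.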
Step 202: select key feature parameters based on the abnormal users' feature data by supervised learning, and generate key feature data.
In this embodiment, after the abnormal users among the multiple users have been determined by unsupervised learning in step 201, a supervised learning approach may be applied to the feature data of the determined abnormal users to select, from the multiple feature parameters, the key feature parameters used to build the classification model for identifying whether a user is an abnormal user, that is, the feature parameters that matter most for identifying abnormal users.
Taking as an example feature data that includes the user's account, user name, telephone number, URL, and similar feature parameters, the feature data of the abnormal users determined in step 201 may be analyzed to select, from these parameters, those most important for identifying abnormal users.
Step 203: build the classification model using the key feature data.
In this embodiment, after the key feature data containing the key feature parameters has been generated in step 202, the classification model may be built from it, for example by training the classification model with the key feature data as training samples; the trained classification model is then used to identify whether a user is an abnormal user.
In this embodiment, the classification model built from the key feature data identifies abnormal users using only the key feature parameters determined in step 202, which are the most important for identifying abnormal users. This avoids interference from less important features, improves identification accuracy, and reduces the cost of the identification process.
Referring to Fig. 3, a flow 300 of another embodiment of the abnormal user identification method according to this application is shown. The abnormal user identification method provided by the embodiments of this application may be performed by the server 105 in Fig. 1. The method includes the following steps:
Step 301: determine the abnormal users among the multiple users by unsupervised learning, based on the feature data of the multiple users.
In this embodiment, multiple feature parameters may be used to describe a user's features along multiple dimensions. A user's feature data includes multiple feature parameters indicating the user's features, where each feature corresponds to one feature parameter. For example, a user's feature data includes account, user name, telephone number, URL, and similar feature parameters. After the feature data of each of the multiple users is obtained, an unsupervised learning approach may be used to determine the abnormal users among the multiple users from their feature data.
Step 302: select key feature parameters based on the abnormal users' feature data using a decision tree or a naive Bayes algorithm, and generate key feature data.
In this embodiment, to build a classification model that identifies abnormal users, a supervised learning approach may first be used to analyze the feature data of the abnormal users determined in step 301 and select, from the feature parameters, the key feature parameters for building the classification model, that is, the parameters that matter most for identifying abnormal users.
In this embodiment, the supervised learning approach may be a decision tree. Before the decision tree is used to select the key feature parameters for building the classification model, it is first built from the feature data of the abnormal users determined in step 301. By training the decision tree with the feature data of multiple abnormal users as training samples, the decision tree learns how important each feature parameter in the abnormal users' feature data is for identifying abnormal users. The decision tree built from the feature data of the abnormal users determined in step 301 contains multiple nodes, each corresponding to one feature parameter; the closer a node is to the root node of the decision tree, the more important its feature parameter is for identifying abnormal users. The feature parameters corresponding to nodes whose depth in the decision tree exceeds a depth threshold, that is, the more important feature parameters, may be chosen as the key feature parameters for building the classification model.
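A minimal sketch of the decision tree variant follows. Several assumptions are not in the text: each feature parameter is encoded as a binary flag (1 if its value looks abnormal), the tree is a small CART-style tree split on Gini impurity, and it is trained on users labelled abnormal (1) or normal (0) after step 301 so that there are two classes to separate. Because the text says nodes closer to the root are more important, the sketch keeps each parameter whose shallowest node lies at depth `depth_threshold` or less from the root; the patent's "depth exceeds a depth threshold" wording may rely on a depth convention it does not spell out. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, int]  # feature parameter -> 1 if its value looks abnormal, else 0


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows: List[Row], labels: List[int], params: List[str],
               depth: int = 0, max_depth: int = 5) -> dict:
    """Grow a binary CART-style tree; each internal node tests one feature parameter."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1 or not params:
        return {"leaf": majority}
    best, best_score = None, gini(labels)
    for p in params:
        left = [y for r, y in zip(rows, labels) if r[p] == 0]
        right = [y for r, y in zip(rows, labels) if r[p] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:  # keep the split with the lowest weighted impurity
            best, best_score = p, score
    if best is None:
        return {"leaf": majority}
    node = {"param": best, "depth": depth, "children": {}}
    remaining = [p for p in params if p != best]
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node["children"][v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                         remaining, depth + 1, max_depth)
    return node


def shallowest_depths(node: dict, out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each feature parameter to the depth of its node nearest the root."""
    out = {} if out is None else out
    if "param" in node:
        p, d = node["param"], node["depth"]
        out[p] = min(out.get(p, d), d)
        for child in node["children"].values():
            shallowest_depths(child, out)
    return out


def select_key_params(tree: dict, depth_threshold: int) -> List[str]:
    """Keep parameters whose shallowest node is within depth_threshold of the root."""
    return sorted(p for p, d in shallowest_depths(tree).items() if d <= depth_threshold)


def predict(node: dict, row: Row) -> int:
    while "leaf" not in node:
        node = node["children"][row[node["param"]]]
    return node["leaf"]
```

The recursion drops a parameter once it has been used on a path, so the same parameter can still appear on other branches; `shallowest_depths` keeps only its position nearest the root, which is what the importance ordering in the text relies on.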
Taking as an example feature data that includes account, user name, telephone number, URL, and similar feature parameters, the decision tree built from the abnormal users' feature data contains nodes corresponding to each of these parameters. Because these parameters differ in how important they are for identifying abnormal users, their corresponding nodes also sit at different depths in the decision tree.
In this embodiment, after the key feature parameters for building the classification model have been selected with the decision tree, the feature data of abnormal users satisfying the following condition may be selected from the feature data of the abnormal users determined in step 301: the decision tree classifies the abnormal user's feature data as abnormal. That is, the decision tree re-classifies the feature data of the abnormal users identified in step 301 and produces a classification result. When the decision tree classifies an abnormal user's feature data as abnormal, the key feature parameters in that user's feature data may be combined to obtain key feature data, which is then used to build the classification model.
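The filtering and combination step described above can be sketched independently of the classifier: any callable that returns 1 for "abnormal" can stand in for the trained decision tree, and the same step applies to the naive Bayes variant described next. The record layout, the stand-in rule, and the function name `build_key_feature_data` are hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Sequence

Record = Mapping[str, int]


def build_key_feature_data(
    abnormal_records: Sequence[Record],
    key_params: Sequence[str],
    classify: Callable[[Record], int],
) -> List[Dict[str, int]]:
    """Keep abnormal users the classifier re-confirms as abnormal (1), projected onto key params."""
    key_feature_data = []
    for record in abnormal_records:
        if classify(record) == 1:
            # Combine only the key feature parameters into one training sample.
            key_feature_data.append({p: record[p] for p in key_params})
    return key_feature_data


# Example with a stand-in classifier (hypothetical rule: phone AND url abnormal).
records = [
    {"account": 0, "user_name": 1, "phone": 1, "url": 1},
    {"account": 1, "user_name": 0, "phone": 1, "url": 0},
]
rule = lambda r: int(r["phone"] == 1 and r["url"] == 1)
print(build_key_feature_data(records, ["phone", "url"], rule))  # [{'phone': 1, 'url': 1}]
```

The second record was flagged in step 301 but is not re-confirmed by the classifier, so it contributes no training sample; this is how the re-classification step trims noisy labels from the unsupervised stage.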
In this embodiment, the supervised learning approach may also be a naive Bayes algorithm. The naive Bayes algorithm may be used to calculate, from the feature data of the abnormal users determined in step 301, the abnormal probability corresponding to each feature parameter. A feature parameter's abnormal probability is the probability that a user is an abnormal user when the value of that feature parameter is abnormal, and it represents how important the feature parameter is for identifying abnormal users: the larger the abnormal probability, the more important the feature parameter is for identifying abnormal users. After the abnormal probability of each feature parameter has been calculated with the naive Bayes algorithm, the feature parameters whose abnormal probability exceeds a probability threshold may be taken as the key feature parameters for building the classification model.
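The abnormal-probability calculation can be sketched with Bayes' rule. Assumptions not in the text: each feature parameter is reduced to a binary "value is abnormal" flag, class labels (1 abnormal, 0 normal) come from step 301, and Laplace smoothing `alpha` avoids zero counts. For each parameter p the sketch estimates P(p abnormal | abnormal user) and P(p abnormal | normal user) and combines them with the prior into P(abnormal user | p abnormal) = P(p | A) P(A) / (P(p | A) P(A) + P(p | N) P(N)). A full naive Bayes classifier would additionally multiply these per-parameter likelihoods under the independence assumption. Names and the threshold value are illustrative.

```python
from typing import Dict, List, Sequence

Row = Dict[str, int]  # feature parameter -> 1 if its value is abnormal, else 0


def abnormal_probabilities(rows: Sequence[Row], labels: Sequence[int],
                           params: Sequence[str], alpha: float = 1.0) -> Dict[str, float]:
    """P(user is abnormal | parameter value is abnormal) for each parameter, via Bayes' rule."""
    n = len(labels)
    n_abn = sum(labels)
    n_norm = n - n_abn
    prior_abn = n_abn / n
    probs = {}
    for p in params:
        k_abn = sum(1 for r, y in zip(rows, labels) if y == 1 and r[p] == 1)
        k_norm = sum(1 for r, y in zip(rows, labels) if y == 0 and r[p] == 1)
        # Laplace-smoothed class-conditional likelihoods of a binary flag.
        lik_abn = (k_abn + alpha) / (n_abn + 2 * alpha)
        lik_norm = (k_norm + alpha) / (n_norm + 2 * alpha)
        evidence = lik_abn * prior_abn + lik_norm * (1 - prior_abn)
        probs[p] = lik_abn * prior_abn / evidence
    return probs


def select_key_params(probs: Dict[str, float], threshold: float) -> List[str]:
    """Parameters whose abnormal probability exceeds the probability threshold."""
    return sorted(p for p, v in probs.items() if v > threshold)
```

Smoothing keeps a parameter that never appears abnormal among normal users from receiving a probability of exactly 1, which would otherwise make the threshold comparison overly sensitive on small samples.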
In this embodiment, after the key feature parameters for building the classification model have been selected with the naive Bayes algorithm, the feature data of abnormal users satisfying the following condition may be selected from the feature data of the abnormal users determined in step 301: the naive Bayes algorithm classifies the abnormal user's feature data as abnormal. That is, the naive Bayes algorithm re-classifies the feature data of the abnormal users identified in step 301 and produces a classification result. When the naive Bayes algorithm classifies an abnormal user's feature data as abnormal, the key feature parameters in that user's feature data may be combined to obtain key feature data, which is then used to build the classification model.
Step 303: build the classification model using the key feature data.
In this embodiment, the classification model may be a decision tree model. A decision tree model may be created and trained using the key feature data containing the key feature parameters generated in step 302 as training samples. The trained decision tree model can then be used to identify whether a user is an abnormal user.
In this embodiment, the trained decision tree model identifies abnormal users using only the key feature parameters determined in step 302, which are the most important for identifying abnormal users. This avoids interference from less important features, improves identification accuracy, and reduces the cost of the identification process.
Referring to Fig. 4, a schematic structural diagram of one embodiment of the device for identifying abnormal users according to this application is shown. This device embodiment corresponds to the method embodiment shown in Fig. 2.
As shown in Fig. 4, the device 400 for identifying abnormal users of this embodiment includes a recognition unit 401, a selection unit 402, and a construction unit 403. The recognition unit 401 is configured to acquire feature data of multiple users and, based on the feature data, determine the abnormal users among the multiple users by unsupervised learning, where the feature data includes multiple feature parameters indicating features of the users. The selection unit 402 is configured to select, by supervised learning and based on the feature data of the determined abnormal users, key feature parameters for building a classification model from the multiple feature parameters, and to generate key feature data containing the key feature parameters. The construction unit 403 is configured to build the classification model using the key feature data, so that the classification model can be used to identify whether a user is an abnormal user.
In some optional implementations of this embodiment, the recognition unit 401 includes an abnormal user identification subunit (not shown), configured to cluster the feature data of the multiple users with a clustering algorithm to obtain multiple clusters and, when a cluster contains feature data that matches preset abnormal feature data, determine the users corresponding to all the feature data in that cluster to be abnormal users.
In some optional implementations of this embodiment, the selection unit 402 includes a decision tree selection subunit (not shown), configured to: build a decision tree using the feature data of the determined abnormal users as training samples, where each node in the decision tree corresponds to one feature parameter; take the feature parameters corresponding to nodes whose depth in the decision tree exceeds a depth threshold as the key feature parameters for building the classification model; select the feature data of abnormal users satisfying the following condition: the decision tree classifies the abnormal user's feature data as abnormal; and combine the key feature parameters in the selected abnormal users' feature data to obtain the key feature data.
In some optional implementations of this embodiment, the selection unit 402 includes a Bayes selection subunit (not shown), configured to: use a naive Bayes algorithm to calculate, from the feature data of the determined abnormal users, the abnormal probability corresponding to each feature parameter, the abnormal probability being the probability that a user is an abnormal user when the value of the feature parameter is abnormal; take the feature parameters whose abnormal probability exceeds a probability threshold as the key feature parameters for building the classification model; select the feature data of abnormal users satisfying the following condition: the naive Bayes algorithm classifies the abnormal user's feature data as abnormal; and combine the key feature parameters in the selected abnormal users' feature data to obtain the key feature data.
In some optional implementations of this embodiment, the construction unit 403 includes a model construction subunit (not shown), configured to create a decision tree model and train it using the key feature data as training samples, so that the trained decision tree model can be used to identify whether a user is an abnormal user.
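The division of device 400 into three units can be sketched as a small composition of callables. This is an illustrative structure only: the class name `AbnormalUserDevice`, the record layout, the stand-in unit implementations, and the choice to assemble the key feature data inside `run` (rather than in the selection unit itself) are assumptions. Concrete clustering, decision-tree, or naive Bayes routines would be plugged in as the unit callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Record = Dict[str, int]
Model = Callable[[Record], int]  # returns 1 for abnormal, 0 otherwise


@dataclass
class AbnormalUserDevice:
    recognize: Callable[[Dict[str, Record]], Set[str]]          # unit 401: unsupervised
    select: Callable[[Dict[str, Record], Set[str]], List[str]]  # unit 402: supervised
    construct: Callable[[List[Record]], Model]                  # unit 403: model building

    def run(self, users: Dict[str, Record]) -> Model:
        abnormal_ids = self.recognize(users)
        key_params = self.select(users, abnormal_ids)
        # Key feature data: only the key parameters of each abnormal user.
        key_data = [{p: users[u][p] for p in key_params} for u in sorted(abnormal_ids)]
        model = self.construct(key_data)
        # The returned model sees only the key feature parameters of any new record.
        return lambda record: model({p: record[p] for p in key_params})


users = {
    "a": {"account": 0, "phone": 1, "url": 1},
    "b": {"account": 1, "phone": 0, "url": 0},
}
device = AbnormalUserDevice(
    recognize=lambda us: {u for u, r in us.items() if r["phone"] == 1},  # stand-in clustering
    select=lambda us, abn: ["phone", "url"],                             # stand-in selection
    construct=lambda data: (lambda rec: int(rec in data)),               # stand-in model
)
model = device.run(users)
print(model({"account": 7, "phone": 1, "url": 1}), model(users["b"]))  # 1 0
```

Because the returned model projects every incoming record onto the key parameters before classifying, the non-key `account` value has no effect on the result, which mirrors the claim that low-importance features are kept out of the identification process.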
Fig. 5 shows to be used for the computer system for being used to identify the device of abnormal user for realizing the embodiment of the present application Structure diagram.
As shown in figure 5, computer system 500 includes central processing unit (CPU) 501, it can be read-only according to being stored in Program in memory (ROM) 502 or be loaded into program in random access storage device (RAM) 503 from storage part 508 and Perform various appropriate actions and processing.In RAM503, also it is stored with system 500 and operates required various programs and data. CPU501, ROM502 and RAM503 are connected with each other by bus 504.Input/output (I/O) interface 505 is also connected to bus 504。
I/O interfaces 505 are connected to lower component:Importation 506 including keyboard, mouse etc.;It is penetrated including such as cathode The output par, c 507 of spool (CRT), liquid crystal display (LCD) etc. and loud speaker etc.;Storage part 508 including hard disk etc.; And the communications portion 509 of the network interface card including LAN card, modem etc..Communications portion 509 via such as because The network of spy's net performs communication process.Driver 510 is also according to needing to be connected to I/O interfaces 505.Detachable media 511, such as Disk, CD, magneto-optic disk, semiconductor memory etc. are mounted on driver 510, as needed in order to read from it Computer program be mounted into as needed storage part 508.
Particularly, in accordance with an embodiment of the present disclosure, it may be implemented as computer above with reference to the process of flow chart description Software program.For example, embodiment of the disclosure includes a kind of computer program product, it is machine readable including being tangibly embodied in Computer program on medium, the computer program are included for the program code of the method shown in execution flow chart.At this In the embodiment of sample, which can be downloaded and installed from network by communications portion 509 and/or from removable Medium 511 is unloaded to be mounted.
Flow chart and block diagram in attached drawing, it is illustrated that according to the system of the various embodiments of the application, method and computer journey Architectural framework in the cards, function and the operation of sequence product.In this regard, each box in flow chart or block diagram can generation The part of one module of table, program segment or code, a part for the module, program segment or code include one or more The executable instruction of logic function as defined in being used to implement.It should also be noted that some as replace realization in, institute in box The function of mark can also be occurred with being different from the order marked in attached drawing.For example, two boxes succeedingly represented are actual On can perform substantially in parallel, they can also be performed in the opposite order sometimes, this is depending on involved function.Also It is noted that the combination of each box in block diagram and/or flow chart and the box in block diagram and/or flow chart, Ke Yiyong The dedicated hardware based systems of functions or operations as defined in execution is realized or can referred to specialized hardware and computer The combination of order is realized.
In another aspect, the present application further provides a non-volatile computer storage medium. It may be the non-volatile computer storage medium included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: acquire feature data of a plurality of users and, based on the feature data, determine abnormal users among the plurality of users by means of unsupervised learning, the feature data comprising a plurality of feature parameters each indicating a feature of a user; based on the feature data of the determined abnormal users, select, by means of supervised learning, key feature parameters for building a classification model from the plurality of feature parameters, and generate key feature data comprising the key feature parameters; and build the classification model using the key feature data, so as to identify, using the classification model, whether a user is an abnormal user.
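The unsupervised step above (spelled out in claim 2 as clustering followed by matching against preset abnormal feature data) can be sketched as follows. The patent does not name a clustering algorithm or a matching rule, so this sketch assumes k-means with a deterministic farthest-first seeding, Euclidean distance, and treats a record as "matching" preset abnormal data when it lies within a tolerance `tol` of a preset vector. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def farthest_first(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic seeding (assumed): start at the first point, then keep
    adding the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means with Euclidean distance; returns a cluster label per point."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def find_abnormal_users(points: Sequence[Point], k: int,
                        preset_abnormal: Sequence[Point], tol: float = 1e-9) -> Set[int]:
    """Claim-2 style rule: if any record in a cluster matches preset abnormal
    feature data (here: lies within `tol` of it), flag every user in that cluster."""
    labels = kmeans(points, k)
    flagged = {labels[i] for i, p in enumerate(points)
               if any(math.dist(p, q) <= tol for q in preset_abnormal)}
    return {i for i, lab in enumerate(labels) if lab in flagged}
```

The whole-cluster rule is what lets a handful of known abnormal records label many previously unlabeled users, which is what the supervised selection step then needs as training data.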
The above description is merely a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (10)

1. An abnormal user identification method, comprising:
acquiring feature data of a plurality of users and, based on the feature data, determining abnormal users among the plurality of users by means of unsupervised learning, the feature data comprising a plurality of feature parameters each indicating a feature of a user;
based on the feature data of the determined abnormal users, selecting, by means of supervised learning, key feature parameters for building a classification model from the plurality of feature parameters, and generating key feature data comprising the key feature parameters; and
building the classification model using the key feature data, so as to identify, using the classification model, whether a user is an abnormal user.
2. The method according to claim 1, wherein determining the abnormal users among the plurality of users by means of unsupervised learning comprises:
clustering the feature data of the plurality of users using a clustering algorithm to obtain a plurality of clusters; and
when a cluster contains feature data matching preset abnormal feature data, determining the users corresponding to all feature data in that cluster as abnormal users.
3. The method according to claim 2, wherein selecting, based on the feature data of the determined abnormal users and by means of supervised learning, the key feature parameters for building the classification model from the plurality of feature parameters, and generating the key feature data comprising the key feature parameters, comprises:
building a decision tree using the feature data of the determined abnormal users as training samples, wherein each node in the decision tree corresponds to one feature parameter;
taking the feature parameters corresponding to nodes whose depth in the decision tree exceeds a depth threshold as the key feature parameters for building the classification model;
selecting the feature data of abnormal users satisfying the following condition: the result of classifying the feature data of the abnormal user with the decision tree is "abnormal user"; and
combining the key feature parameters in the feature data of the selected abnormal users to obtain the key feature data.
4. The method according to claim 2, wherein selecting, based on the feature data of the determined abnormal users and by means of supervised learning, the key feature parameters for building the classification model from the plurality of feature parameters, and generating the key feature data comprising the key feature parameters, comprises:
calculating, using a naive Bayes algorithm and according to the feature data of the determined abnormal users, an abnormality probability for each feature parameter, the abnormality probability indicating the probability that a user is an abnormal user when the value of that feature parameter is abnormal;
taking the feature parameters whose abnormality probability exceeds a probability threshold as the key feature parameters for building the classification model;
selecting the feature data of abnormal users satisfying the following condition: the result of classifying the feature data of the abnormal user with the naive Bayes algorithm is "abnormal user"; and
combining the key feature parameters in the feature data of the selected abnormal users to obtain the key feature data.
5. The method according to claim 3 or 4, wherein the classification model is a decision tree model; and
building the classification model using the key feature data, so as to identify, using the classification model, whether a user is an abnormal user, comprises:
creating a decision tree model; and
training the decision tree model using the key feature data as training samples, so as to identify, using the trained decision tree model, whether a user is an abnormal user.
6. A device for identifying abnormal users, comprising:
an identification unit configured to acquire feature data of a plurality of users and, based on the feature data, determine abnormal users among the plurality of users by means of unsupervised learning, the feature data comprising a plurality of feature parameters each indicating a feature of a user;
a selection unit configured to select, based on the feature data of the determined abnormal users and by means of supervised learning, key feature parameters for building a classification model from the plurality of feature parameters, and to generate key feature data comprising the key feature parameters; and
a construction unit configured to build the classification model using the key feature data, so as to identify, using the classification model, whether a user is an abnormal user.
7. The device according to claim 6, wherein the identification unit comprises:
an abnormal user identification subunit configured to cluster the feature data of the plurality of users using a clustering algorithm to obtain a plurality of clusters, and, when a cluster contains feature data matching preset abnormal feature data, to determine the users corresponding to all feature data in that cluster as abnormal users.
8. The device according to claim 7, wherein the selection unit comprises:
a decision tree selection subunit configured to: build a decision tree using the feature data of the determined abnormal users as training samples, wherein each node in the decision tree corresponds to one feature parameter; take the feature parameters corresponding to nodes whose depth in the decision tree exceeds a depth threshold as the key feature parameters for building the classification model; select the feature data of abnormal users satisfying the following condition: the result of classifying the feature data of the abnormal user with the decision tree is "abnormal user"; and combine the key feature parameters in the feature data of the selected abnormal users to obtain the key feature data.
9. The device according to claim 8, wherein the selection unit comprises:
a Bayes selection subunit configured to: calculate, using a naive Bayes algorithm and according to the feature data of the determined abnormal users, an abnormality probability for each feature parameter, the abnormality probability indicating the probability that a user is an abnormal user when the value of that feature parameter is abnormal; take the feature parameters whose abnormality probability exceeds a probability threshold as the key feature parameters for building the classification model; select the feature data of abnormal users satisfying the following condition: the result of classifying the feature data of the abnormal user with the naive Bayes algorithm is "abnormal user"; and combine the key feature parameters in the feature data of the selected abnormal users to obtain the key feature data.
10. The device according to claim 9, wherein the construction unit comprises:
a model construction subunit configured to create a decision tree model and to train the decision tree model using the key feature data as training samples, so as to identify, using the trained decision tree model, whether a user is an abnormal user.
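The naive Bayes selection in claims 4 and 9 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: each feature parameter is assumed to be pre-binarized (1 = the value is abnormal, 0 = normal), the users not flagged by clustering serve as the contrast class needed by Bayes' rule, a Bernoulli naive Bayes model with Laplace smoothing is used, and the threshold value and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed encoding: one row per user, entry j is 1 if feature parameter j has
# an abnormal value and 0 otherwise. Labels: 1 = abnormal user, 0 = normal user.
Rows = Sequence[Sequence[int]]


def bernoulli_fit(rows: Rows, labels: Sequence[int], alpha: float = 1.0
                  ) -> Tuple[Dict[int, float], Dict[int, List[float]]]:
    """Estimate P(y) and P(x_j = 1 | y) with Laplace smoothing `alpha`."""
    n_feat = len(rows[0])
    prior: Dict[int, float] = {}
    cond: Dict[int, List[float]] = {}
    for y in (0, 1):
        members = [r for r, lab in zip(rows, labels) if lab == y]
        prior[y] = (len(members) + alpha) / (len(rows) + 2 * alpha)
        cond[y] = [(sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
                   for j in range(n_feat)]
    return prior, cond


def abnormal_probability(prior, cond, j: int) -> float:
    """P(abnormal user | feature j abnormal), by Bayes' rule."""
    num = cond[1][j] * prior[1]
    return num / (num + cond[0][j] * prior[0])


def classify(prior, cond, row: Sequence[int]) -> int:
    """Naive Bayes decision over all features: 1 = abnormal user, 0 = normal."""
    scores = {}
    for y in (0, 1):
        s = math.log(prior[y])
        for j, x in enumerate(row):
            s += math.log(cond[y][j] if x else 1.0 - cond[y][j])
        scores[y] = s
    return 1 if scores[1] > scores[0] else 0


def key_feature_data(rows: Rows, labels: Sequence[int], threshold: float
                     ) -> Tuple[List[int], List[List[int]]]:
    prior, cond = bernoulli_fit(rows, labels)
    # Key feature parameters: abnormality probability above the threshold.
    key = [j for j in range(len(rows[0]))
           if abnormal_probability(prior, cond, j) > threshold]
    # Keep only abnormal users that the naive Bayes model also classifies as abnormal.
    kept = [r for r, lab in zip(rows, labels) if lab == 1 and classify(prior, cond, r) == 1]
    # Combine the key feature parameters of the kept users into key feature data.
    return key, [[r[j] for j in key] for r in kept]
```

The second filter discards users that clustering flagged but whose features do not look abnormal to the Bayes model, so the key feature data passed on to the decision-tree model of claims 5 and 10 is less noisy.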

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611051585.5A CN108108743B (en) 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user

Publications (2)

Publication Number Publication Date
CN108108743A true CN108108743A (en) 2018-06-01
CN108108743B CN108108743B (en) 2022-06-24

Family

ID=62204087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611051585.5A Active CN108108743B (en) 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user

Country Status (1)

Country Link
CN (1) CN108108743B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133721A1 (en) * 2001-03-15 2002-09-19 Akli Adjaoute Systems and methods for dynamic detection and prevention of electronic fraud and network intrusion
CN103458042A (en) * 2013-09-10 2013-12-18 上海交通大学 Microblog advertisement user detection method
CN103793484A (en) * 2014-01-17 2014-05-14 五八同城信息技术有限公司 Fraudulent conduct identification system based on machine learning in classified information website
CN105873113A (en) * 2015-01-21 2016-08-17 中国移动通信集团福建有限公司 Method and system for positioning wireless quality problem
CN104657744A (en) * 2015-01-29 2015-05-27 中国科学院信息工程研究所 Multi-classifier training method and classifying method based on non-deterministic active learning
CN105005594A (en) * 2015-06-29 2015-10-28 嘉兴慧康智能科技有限公司 Abnormal Weibo user identification method
CN105376248A (en) * 2015-11-30 2016-03-02 睿峰网云(北京)科技股份有限公司 Method and device for identifying abnormal flow

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Xiuheng et al.: "Probability and Statistics Models and Optimization" (《概率统计模型与优化》), 30 June 2015 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269012A (en) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 Method, device, storage medium and terminal for constructing a risk scoring model
CN109166624A (en) * 2018-09-21 2019-01-08 广州杰赛科技股份有限公司 Behavior analysis method, device, server, system and storage medium
WO2020078059A1 (en) * 2018-10-17 2020-04-23 阿里巴巴集团控股有限公司 Interpretation feature determination method and device for anomaly detection
TWI723476B (en) * 2018-10-17 2021-04-01 開曼群島商創新先進技術有限公司 Interpretation feature determination method, device and equipment for abnormal detection
CN110008980A (en) * 2019-01-02 2019-07-12 阿里巴巴集团控股有限公司 Identification model generation method, recognition methods, device, equipment and storage medium
WO2020143322A1 (en) * 2019-01-08 2020-07-16 平安科技(深圳)有限公司 User request detection method and apparatus, computer device, and storage medium
CN109902486A (en) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Electronic device, intelligent decision-making method for abnormal-user handling strategies, and storage medium
CN109918279A (en) * 2019-01-24 2019-06-21 平安科技(深圳)有限公司 Electronic device, method and storage medium for identifying abnormal user operations based on log data
CN109918279B (en) * 2019-01-24 2022-09-27 平安科技(深圳)有限公司 Electronic device, method for identifying abnormal operation of user based on log data and storage medium
CN110570244A (en) * 2019-09-04 2019-12-13 深圳创新奇智科技有限公司 Hot-selling commodity construction method and system based on abnormal user identification
CN113822309A (en) * 2020-09-25 2021-12-21 京东科技控股股份有限公司 User classification method, device and non-volatile computer-readable storage medium
CN113822309B (en) * 2020-09-25 2024-04-16 京东科技控股股份有限公司 User classification method, apparatus and non-volatile computer readable storage medium
CN112308566A (en) * 2020-09-27 2021-02-02 中智关爱通(上海)科技股份有限公司 Enterprise fraud identification method
CN113129054A (en) * 2021-03-30 2021-07-16 广州博冠信息科技有限公司 User identification method and device
CN113129054B (en) * 2021-03-30 2024-05-31 广州博冠信息科技有限公司 User identification method and device
CN113743963A (en) * 2021-09-28 2021-12-03 北京奇艺世纪科技有限公司 Abnormal recognition model training method, abnormal object recognition device and electronic equipment

Also Published As

Publication number Publication date
CN108108743B (en) 2022-06-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant