CN110674839B - Abnormal user identification method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN110674839B
Authority
CN
China
Prior art keywords
user
tag
preset
learning model
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910760605.3A
Other languages
Chinese (zh)
Other versions
CN110674839A (en
Inventor
高呈琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910760605.3A priority Critical patent/CN110674839B/en
Publication of CN110674839A publication Critical patent/CN110674839A/en
Application granted granted Critical
Publication of CN110674839B publication Critical patent/CN110674839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure relates to an abnormal user identification method, an abnormal user identification device, a storage medium and electronic equipment, and belongs to the technical field of machine learning applications. The method comprises the following steps: acquiring network behavior information of a user in a preset time period; acquiring a first label of the user from that network behavior information; inputting the first label of the user into a first learning model, which outputs a first probability that the user is an abnormal user; adding a tag modification element to the first tag of the user according to the first probability to obtain a modified first tag, and cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model; and inputting the modified first label into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user. By extracting key labels from user logs and combining a machine learning model with automatic modification of the key labels, the method and the device effectively improve the efficiency and accuracy of abnormal user identification.

Description

Abnormal user identification method and device, storage medium and electronic equipment
Technical Field
The disclosure relates to the technical field of machine learning application, in particular to an abnormal user identification method, an abnormal user identification device, a storage medium and electronic equipment.
Background
Abnormal user identification determines whether a user performing a given network behavior has engaged in behavior that does not comply with the applicable rules.
At present, abnormal users in network behaviors are generally identified by monitoring the users' operation logs and similar records. In the prior art, such monitoring judges whether a user is abnormal mainly from the frequency of key abnormal behaviors. Because the statistical workload over the log data is large, both the accuracy and the efficiency of abnormal user identification are low.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The purpose of the present disclosure is to provide an abnormal user identification scheme that, at least to some extent, improves the efficiency and accuracy of abnormal user identification by extracting key labels from the user log and applying a machine learning model together with automatic modification of those key labels.
According to one aspect of the present disclosure, there is provided an abnormal user identification method including:
acquiring network behavior information of a user in a preset time period before a current time point from a user log;
acquiring a first label of a user from network behavior information of the user in a preset time period before a current time point;
inputting a first label of a user into a first learning model, and outputting a first probability that the user is an abnormal user by the first learning model;
when the first probability exceeds a preset threshold, adding a tag modification element to a first tag of the user according to the first probability to obtain a modified first tag, and cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model;
and inputting the modified first label into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user.
In an exemplary embodiment of the disclosure, after the first label of the user is input into the first learning model and the first learning model outputs the first probability that the user is an abnormal user, the method further comprises:
acquiring all network behavior information of the user before the current time point from the user log;
acquiring a second label of the user from all network behavior information of the user before the current time point;
inputting a second label of the user into a second learning model, and outputting a second probability that the user is an abnormal user by the second learning model;
based on the first probability and the second probability, it is determined whether the user is an abnormal user.
In one exemplary embodiment of the present disclosure, the first tag includes a first behavior tag, a first object tag, a first time tag and a first place tag, and
the obtaining the first label of the user from the network behavior information of the user in a preset time period before the current time point comprises the following steps:
if the number of times the user initiates a specific behavior in the preset time period before the current time point exceeds a first number threshold, taking the specific behavior as the first behavior label of the user;
if the number of behaviors of the user directed at a specific object in the preset time period before the current time point exceeds a second number threshold, taking the specific object as the first object label of the user;
if the number of times that the specific behavior of the user in the preset time period before the current time point occurs within a preset first time interval exceeds a third number threshold, taking the preset first time interval as the first time label of the user;
and if the number of times that the geographic position reported by the user terminal falls in a first preset area when the specific behavior occurs in the preset time period before the current time point exceeds a fourth number threshold, taking the first preset area as the first place label of the user.
In one exemplary embodiment of the present disclosure, the second tag includes a second behavior tag, a second object tag, a second time tag and a second place tag, and
the obtaining the second label of the user from all network behavior information of the user before the current time point comprises the following steps:
if the number of times the user initiates the specific behavior in all time periods before the current time point exceeds a fifth number threshold, taking the specific behavior as the second behavior label of the user;
if the number of behaviors of the user directed at the specific object in all time periods before the current time point exceeds a sixth number threshold, taking the specific object as the second object label of the user;
if the number of times that the specific behavior of the user in all time periods before the current time point occurs within a preset second time interval exceeds a seventh number threshold, taking the preset second time interval as the second time label of the user;
and if the number of times that the geographic position reported by the user terminal falls in a second preset area when the specific behavior occurs in all time periods before the current time point exceeds an eighth number threshold, taking the second preset area as the second place label of the user.
In an exemplary embodiment of the present disclosure, adding a tag modification element to the first tag of the user according to the magnitude of the first probability to obtain a modified first tag comprises:
acquiring a modification element table associated with the network behavior, wherein modification elements corresponding to different preset probability ranges are stored in the modification element table;
searching a preset probability range corresponding to the first probability according to the first probability to obtain the modification element corresponding to the first probability;
and adding the modification element to the first label to obtain the modified first label.
In an exemplary embodiment of the present disclosure, cascading the preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model comprises:
acquiring a preset sub-model corresponding to the tag modification element from a preset model library according to the tag modification element;
and cascading the preset sub-model with the first learning model to obtain the cascaded first learning model.
In an exemplary embodiment of the present disclosure, cascading the preset sub-model with the first learning model to obtain the cascaded first learning model comprises:
embedding the preset sub-model at a cascade position preset in the first learning model to obtain the cascaded first learning model.
According to an aspect of the present disclosure, there is provided an abnormal user identification apparatus, comprising:
the first acquisition module is used for acquiring network behavior information of a user in a preset time period before a current time point from a user log;
the second acquisition module is used for acquiring a first label of the user from network behavior information in a preset time period before the current time point;
the prediction module is used for inputting the first label of the user into a first learning model, the first learning model outputting a first probability that the user is an abnormal user;
the modification module is used for adding a tag modification element to the first tag of the user according to the first probability when the first probability exceeds a preset threshold value to obtain a modified first tag, and cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model;
the identification module is used for inputting the modified first label into the cascaded first learning model to obtain an identification result of whether the user is an abnormal user.
According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon an abnormal user identification program, characterized in that the abnormal user identification program, when executed by a processor, implements the method of any one of the above.
According to an aspect of the present disclosure, there is provided an electronic apparatus, including:
a processor; and
a memory for storing an abnormal user identification program of the processor; wherein the processor is configured to perform the method of any of the above via execution of the abnormal user identification program.
The present disclosure provides an abnormal user identification method, an abnormal user identification device, a storage medium and electronic equipment. First, network behavior information of a user in a preset time period before a current time point is obtained from a user log; restricting the analysis to this preset time period improves analysis efficiency while preserving the identification accuracy of the subsequent steps. Then, a first label of the user is acquired from that network behavior information; extracting user labels simplifies the network behavior information, which preserves the analysis accuracy of the learning model in the subsequent steps and improves model calculation efficiency. Further, the first label of the user is input into a first learning model, which outputs a first probability that the user is an abnormal user; because the first learning model is trained in advance, the first probability can be obtained efficiently and accurately.
Then, when the first probability exceeds a preset threshold, a tag modification element is added to the first tag of the user according to the first probability to obtain a modified first tag, and a preset sub-model is cascaded to the first learning model according to the tag modification element to obtain a cascaded first learning model. A first probability above the preset threshold indicates that the user is preliminarily suspected of being abnormal; modifying the first label with the label modification element enriches its content, and cascading the preset sub-model to the first learning model at the same time lets the model and the label evolve in step, so that the subsequent steps can further judge whether the user is abnormal. Finally, the modified first label is input into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user; processing the enriched first label with the cascaded model, which has stronger processing capability, effectively improves the accuracy of the abnormal user identification result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 schematically shows a flow chart of an abnormal user identification method.
Fig. 2 schematically shows an example diagram of an application scenario of an abnormal user identification method.
Fig. 3 schematically shows a flow chart of a method of modifying a first tag.
Fig. 4 schematically shows a block diagram of an abnormal user identification apparatus.
Fig. 5 schematically shows an example block diagram of an electronic device for implementing the above-described abnormal user identification method.
Fig. 6 schematically shows a computer readable storage medium for implementing the above-described abnormal user identification method.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this exemplary embodiment, an abnormal user identification method is provided first. The method may run on a server, a server cluster, a cloud server or the like; those skilled in the art may also run it on other platforms as required, which is not specifically limited in this exemplary embodiment. Referring to fig. 1, the abnormal user identification method may include the following steps:
step S110, acquiring network behavior information of a user in a preset time period before a current time point from a user log;
Step S120, a first label of a user is obtained from network behavior information in a preset time period before a current time point;
step S130, inputting a first label of a user into a first learning model, and outputting a first probability that the user is an abnormal user by the first learning model;
step S140, when the first probability exceeds a preset threshold, adding a tag modification element to a first tag of the user according to the first probability to obtain a modified first tag, and cascading a preset submodel to the first learning model according to the tag modification element to obtain a cascaded first learning model;
and step S150, inputting the modified first label into the cascaded first learning model to obtain the identification result of whether the user is an abnormal user.
In the abnormal user identification method, network behavior information of a user in a preset time period before a current time point is first obtained from a user log; restricting the analysis to this preset time period improves analysis efficiency while preserving the identification accuracy of the subsequent steps. Then, a first label of the user is acquired from that network behavior information; extracting user labels simplifies the network behavior information, which preserves the analysis accuracy of the learning model in the subsequent steps and improves model calculation efficiency. Further, the first label of the user is input into a first learning model, which outputs a first probability that the user is an abnormal user; because the first learning model is trained in advance, the first probability can be obtained efficiently and accurately. Then, when the first probability exceeds a preset threshold, a tag modification element is added to the first tag of the user according to the first probability to obtain a modified first tag, and a preset sub-model is cascaded to the first learning model according to the tag modification element to obtain a cascaded first learning model. A first probability above the preset threshold indicates that the user is preliminarily suspected of being abnormal; modifying the first label with the label modification element enriches its content, and cascading the preset sub-model to the first learning model at the same time lets the model and the label evolve in step, so that the subsequent steps can further judge whether the user is abnormal.
Finally, the modified first label is input into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user; processing the enriched first label with the cascaded model, which has stronger processing capability, effectively improves the accuracy of the abnormal user identification result.
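As a rough illustration of how steps S130 to S150 fit together, the following Python sketch wires the pieces up as plain functions. It is a minimal sketch under stated assumptions, not the patented implementation: the names `cascade`, `identify`, `pick_modifier`, `pick_submodel` and `final_cutoff` are hypothetical, the cascade is modeled simply as a sub-model that refines the first model's score, and returning "not abnormal" when the first probability does not exceed the threshold is an assumption the text does not state.

```python
from typing import Callable, Dict

Tags = Dict[str, object]
Model = Callable[[Tags], float]
SubModel = Callable[[Tags, float], float]


def cascade(base: Model, sub: SubModel) -> Model:
    """Return a model that runs `base`, then lets `sub` refine its score.

    Assumption: the patent only says the sub-model is embedded at a preset
    cascade position; chaining the two is one possible reading.
    """
    def cascaded(tags: Tags) -> float:
        return sub(tags, base(tags))
    return cascaded


def identify(tags: Tags,
             first_model: Model,
             threshold: float,
             pick_modifier: Callable[[float], str],
             pick_submodel: Callable[[str], SubModel],
             final_cutoff: float = 0.5) -> bool:
    # S130: the first learning model outputs the first probability.
    p1 = first_model(tags)
    # S140 only applies when the first probability exceeds the threshold.
    # Assumption: otherwise the user is treated as not abnormal.
    if p1 <= threshold:
        return False
    modifier = pick_modifier(p1)
    modified = dict(tags, modifier=modifier)  # modified first tag
    cascaded_model = cascade(first_model, pick_submodel(modifier))
    # S150: the cascaded model decides on the modified first tag.
    return cascaded_model(modified) >= final_cutoff


# Hypothetical stand-ins for the trained model, table lookup and model library.
def demo_first_model(tags: Tags) -> float:
    return 0.8 if tags.get("behavior") == "visit_target_page" else 0.1


def demo_pick_modifier(p: float) -> str:
    return "behavior_frequency"


def demo_pick_submodel(modifier: str) -> SubModel:
    return lambda tags, score: min(1.0, score + 0.1)
```

With these stand-ins, `identify({"behavior": "visit_target_page"}, demo_first_model, 0.6, demo_pick_modifier, demo_pick_submodel)` takes the cascaded path, while a user whose first probability stays below the threshold is never passed to the sub-model.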
Next, each step in the abnormal user identification method according to the present exemplary embodiment will be explained and described in detail with reference to the accompanying drawings.
In step S110, network behavior information of the user in a predetermined period of time before the current point of time is acquired from the user log.
In the embodiment of the present example, referring to fig. 2, a server 201 acquires network behavior information of a user in a predetermined period of time before a current point of time from a user log on a server 202, so that the server 201 can judge the user behavior in a subsequent step based on that information. It will be appreciated that, where conditions allow, the server 202 may instead acquire the network behavior information from the user log stored on itself and judge the user behavior in the subsequent steps. The server 201 may be any device with processing capability, such as a computer or a microprocessor, and the server 202 may be any device with instruction sending capability and data storage capability, such as a mobile phone or a computer; neither is specifically limited herein.
The user log records the user's network behavior information at all times, such as the number of accesses to a certain website.
To acquire the network behavior information, the time point at which the user behavior analysis instruction is received is taken as the current time point, and the network behavior information in a predetermined period before it, for example one month or 3 weeks, is retrieved from the user log. Restricting the analysis to this predetermined period improves analysis efficiency while preserving the identification accuracy of the subsequent steps.
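The windowed retrieval in step S110 can be sketched as a simple filter over log records. This is an illustrative sketch only: the record layout (a dict with `user_id` and `time` keys) and the function name `recent_behavior` are assumptions, since the text does not specify how the user log is stored.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def recent_behavior(log: List[Dict], user_id: str,
                    now: datetime, window: timedelta) -> List[Dict]:
    """Return the user's records with now - window <= time < now.

    `now` stands for the time point at which the analysis instruction
    is received; `window` is the predetermined period (e.g. 3 weeks).
    """
    start = now - window
    return [r for r in log
            if r["user_id"] == user_id and start <= r["time"] < now]


# Hypothetical log records for illustration.
demo_log = [
    {"user_id": "u1", "time": datetime(2019, 8, 10), "behavior": "visit"},
    {"user_id": "u1", "time": datetime(2019, 7, 1), "behavior": "visit"},
    {"user_id": "u2", "time": datetime(2019, 8, 10), "behavior": "visit"},
]
demo_recent = recent_behavior(demo_log, "u1",
                              datetime(2019, 8, 16), timedelta(weeks=3))
```

Only the first record survives the filter here: the second is older than three weeks and the third belongs to another user.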
In step S120, a first tag of the user is acquired from the network behavior information in a predetermined period of time before the current point of time.
In this exemplary embodiment, the first tag is key information extracted from the network behavior information of the user. For example, the number of times each search keyword is used, or the number of occurrences of a target behavior, in the predetermined period of time before the current time point is counted, and the resulting keyword is used as the first tag.
Extracting the first label of the user thus simplifies the user network behavior information, which preserves the analysis accuracy of the learning model in the subsequent steps and improves model calculation efficiency.
In one embodiment of the present example, the first tag includes a first behavior tag, a first object tag, a first time tag and a first place tag, and
the obtaining the first label of the user from the network behavior information of the user in a preset time period before the current time point comprises the following steps:
if the number of times the user initiates a specific behavior in the preset time period before the current time point exceeds a first number threshold, taking the specific behavior as the first behavior label of the user;
if the number of behaviors of the user directed at a specific object in the preset time period before the current time point exceeds a second number threshold, taking the specific object as the first object label of the user;
if the number of times that the specific behavior of the user in the preset time period before the current time point occurs within a preset first time interval exceeds a third number threshold, taking the preset first time interval as the first time label of the user;
and if the number of times that the geographic position reported by the user terminal falls in a first preset area when the specific behavior occurs in the preset time period before the current time point exceeds a fourth number threshold, taking the first preset area as the first place label of the user.
The specific behavior is a network behavior associated with abnormal user behavior, such as accessing a target webpage. If the number of times the user initiates the specific behavior in the preset time period before the current time point exceeds the preset first number threshold, which indicates that the behavior is too frequent, the specific behavior is used as the first behavior label of the user. In this way, a specific behavior that may indicate an abnormality of the user is accurately determined as a label.
The specific object is a network object associated with abnormal user behavior, such as a target webpage. If the number of behaviors of the user directed at the specific object in the preset time period before the current time point exceeds the preset second number threshold, which indicates that the user acts on the specific object too often, the specific object is used as the first object label of the user. In this way, a specific object that may indicate an abnormality of the user is accurately determined as a label.
The preset first time interval is used to capture the regularity of the user's specific behavior. If the number of times that the specific behavior of the user in the preset time period before the current time point occurs within the preset first time interval exceeds the third number threshold, the preset first time interval is used as the first time label of the user, so that the behavior pattern of the user is accurately extracted.
The first preset area is used to identify that the specific behavior of the user is suspected of being abnormal. If the number of times that the geographic position reported by the user terminal falls in the first preset area when the specific behavior occurs in the preset time period before the current time point exceeds the fourth number threshold, the first preset area is used as the first place label of the user, so that the place of the user's behavior is accurately extracted.
In this way, a first behavior label, a first object label, a first time label and a first place label that accurately represent the behavior characteristics of the user are obtained.
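The four threshold rules above can be sketched as one counting routine applied per dimension. This is a simplified sketch under assumptions: each record is assumed to already carry a `time_interval` and an `area` field (the bucketing of timestamps and reported positions into preset intervals and areas is not shown), the field names and `extract_first_tag` are hypothetical, and only the single most frequent value per dimension is kept, whereas the text does not say how several values exceeding a threshold would be handled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical record fields, one per tag dimension.
FIELDS = ("behavior", "object", "time_interval", "area")


def value_over_threshold(values: Iterable[str], threshold: int) -> Optional[str]:
    """Return the most frequent value if its count exceeds the threshold."""
    counts = Counter(values)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count > threshold else None


def extract_first_tag(records: List[Dict[str, str]],
                      thresholds: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Build the first tag: one entry per dimension, None if no value qualifies."""
    return {
        field: value_over_threshold((r[field] for r in records), thresholds[field])
        for field in FIELDS
    }


# Hypothetical records: three suspicious visits and one ordinary login.
demo_records = [
    {"behavior": "visit_target_page", "object": "page_A",
     "time_interval": "00:00-06:00", "area": "region_X"}
] * 3 + [
    {"behavior": "login", "object": "page_B",
     "time_interval": "12:00-18:00", "area": "region_Y"}
]
demo_tag = extract_first_tag(demo_records, {f: 2 for f in FIELDS})
```

Using one threshold per dimension mirrors the first to fourth number thresholds in the text; raising any of them to 3 in this example would leave that dimension of the tag empty.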
In step S130, a first label of the user is input to the first learning model, and a first probability that the user is an abnormal user is output from the first learning model.
In the embodiment of the present example, the first learning model is trained in advance, so that the first probability that the user is an abnormal user can be calculated efficiently and accurately from the first label. The first probability indicates how likely the user is to be an abnormal user: the larger the first probability, the more likely the user is abnormal.
In one embodiment of the present example, the training method of the first learning model is:
collecting a first label sample set of users, wherein for each sample in the sample set, a first probability that the corresponding user belongs to an abnormal user is calibrated in advance;
respectively inputting the input data of each sample in the sample set into a learning model to obtain the probability that the user corresponding to each sample belongs to an abnormal user;
if, after the input data of a sample is input into the learning model, the obtained probability that the user corresponding to the sample belongs to an abnormal user is inconsistent with the first probability calibrated in advance for the sample, adjusting the coefficients of the learning model until the two are consistent;
when, after the input data of all the samples are input into the learning model, the obtained probability for each sample is consistent with the first probability calibrated in advance for that sample, training is finished.
A first label sample of a user is a first label obtained from a predetermined time period in historical user logs. The first label sample set is collected as the input of the first learning model, and the first probability that the corresponding user belongs to an abnormal user, calibrated in advance by an expert for each sample, serves as the expected output of the model. Training is finished only when the probability obtained for every sample is consistent with the first probability calibrated in advance for that sample, which helps ensure training accuracy.
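The training procedure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not name a model type, so the sketch assumes a logistic regression over numeric feature vectors derived from first labels, and it interprets "consistent" as agreement within a hypothetical `tolerance`. The names `predict`, `train`, `tolerance`, `learning_rate` and `max_epochs` are all assumptions.

```python
import math


def predict(weights, bias, features):
    """Return the model's probability that the user is abnormal."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, tolerance=0.05, learning_rate=0.5, max_epochs=20000):
    """Fit coefficients on (feature_vector, calibrated_probability) samples.

    Assumption: a prediction is "consistent" with its calibrated first
    probability when the two differ by at most `tolerance`.
    """
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(max_epochs):
        all_consistent = True
        for features, target in samples:
            error = predict(weights, bias, features) - target
            if abs(error) > tolerance:
                all_consistent = False
                # Adjust the coefficients (one gradient step on this sample).
                for i, x in enumerate(features):
                    weights[i] -= learning_rate * error * x
                bias -= learning_rate * error
        if all_consistent:
            break  # every sample matched during a full pass: training is finished
    return weights, bias
```

A call such as `train([([1.0, 0.0], 0.9), ([0.0, 1.0], 0.2)])` returns the fitted coefficients; the loop stops early once a full pass over the samples requires no adjustment.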
In step S140, when the first probability exceeds a predetermined threshold, a tag modification element is added to the first tag of the user according to the first probability to obtain a modified first tag, and a preset sub-model is cascaded to the first learning model according to the tag modification element to obtain a cascaded first learning model.
In this example embodiment, a first probability that exceeds the preset threshold indicates that the user is preliminarily suspected of being an abnormal user. In that case, a tag modification element is added to the first tag of the user according to the first probability to obtain a modified first tag. The modification element enriches the first tag and characterizes the user behavior more precisely, so the modified first tag reflects the user behavior more comprehensively and can be used to further judge whether the user is an abnormal user. A tag modification element is, for example, finer-grained detail data, such as behavior frequency, for each tag in the first tag (the first behavior tag, the first object tag, the first time tag, the first place tag, and so on).
A preset sub-model is then cascaded to the first learning model according to the tag modification element to obtain the cascaded first learning model. Because the input now contains more data to analyze, the processing performed by the first learning model must be extended. Adding preset sub-models suited to the modification element, such as analysis functions, yields a cascaded first learning model suited to the modified first tag, which helps ensure the accuracy of the analysis.
In one embodiment of the present example, referring to fig. 3, adding a tag modification element to the first tag of the user according to the first probability to obtain a modified first tag includes:
step 310, obtaining a modification element table associated with the network behavior, wherein modification elements corresponding to different preset probability ranges are stored in the modification element table;
step 320, searching for the preset probability range corresponding to the first probability according to the first probability to obtain the modification element corresponding to the first probability;
step 330, adding the modification element to the first tag to obtain the modified first tag.
The modification element table associated with the network behavior is a preset table that records the modification elements corresponding to different preset probability ranges, where the probability ranges correspond to the first probability output by the first learning model. Different probability ranges indicate different degrees of preliminary suspicion, so the first tag of the user needs to be enriched with different content, that is, with different modification elements, when further analysis is required. Looking up the preset probability range into which the first probability falls yields the modification element corresponding to the first probability; appending the modification element data to the first tag then yields a modified first tag that reflects the user behavior more comprehensively. The data corresponding to each modification element is retrieved from the log.
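Steps 310 to 330 can be illustrated with a small lookup. This is a sketch under stated assumptions: the table contents, the probability ranges, the element names (`behavior_frequency` and so on), the dictionary layout of the label, and the function names are all hypothetical and do not come from the source.

```python
# Hypothetical modification element table (step 310): each row maps a preset
# probability range (low, high] to the modification elements to add.
MODIFICATION_TABLE = [
    ((0.6, 0.8), ["behavior_frequency"]),
    ((0.8, 0.9), ["behavior_frequency", "object_frequency"]),
    ((0.9, 1.0), ["behavior_frequency", "object_frequency",
                  "time_frequency", "place_frequency"]),
]


def find_modification_elements(first_probability, table=MODIFICATION_TABLE):
    """Step 320: return the elements whose range contains the probability."""
    for (low, high), elements in table:
        if low < first_probability <= high:
            return list(elements)
    return []  # no range matched, e.g. the probability is below every range


def modify_first_label(first_label, first_probability, log_details,
                       table=MODIFICATION_TABLE):
    """Step 330: append modification element data to a copy of the label.

    `log_details` stands in for the per-element data retrieved from the log
    and must contain an entry for every element named in the table.
    """
    modified = dict(first_label)
    modified["modifiers"] = {
        name: log_details[name]
        for name in find_modification_elements(first_probability, table)
    }
    return modified
```

Keeping the table as data rather than as branching code means that ranges and elements can be changed without touching the lookup logic.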
In one embodiment of the present example, cascading a preset sub-model to the first learning model according to the tag modification element to obtain the cascaded first learning model includes:
acquiring a preset sub-model corresponding to the tag modification element from a preset model library according to the tag modification element;
and cascading the preset sub model with the first learning model to obtain a first learning model after cascading.
The preset model library stores preset sub-models, such as processing functions and function modules, each of which corresponds to a tag modification element and is used to analyze and process the data of that modification element. According to the tag modification element, the corresponding preset sub-model can be found in the preset model library; the preset sub-model is then cascaded at the preset cascade position of the first learning model, which yields a cascaded first learning model suited to the modified first tag.
In an embodiment of the present example, cascading the preset sub-model with the first learning model to obtain the cascaded first learning model includes:
embedding the preset sub-model into a cascade position preset in the first learning model to obtain the cascaded first learning model.
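One way to picture a preset model library and a preset cascade position is sketched below. It is a toy illustration only: the source does not describe the internal structure of the first learning model, so the sketch assumes the model is an ordered list of processing stages, and every name here (`FirstLearningModel`, `MODEL_LIBRARY`, `cascade_for`, the stage functions, the integer scoring rule) is hypothetical.

```python
class FirstLearningModel:
    """Toy model: an ordered list of stages plus a preset cascade position."""

    def __init__(self, stages, cascade_position):
        self.stages = list(stages)
        self.cascade_position = cascade_position

    def cascade(self, sub_models):
        """Return a new model with sub-models embedded at the preset position."""
        pos = self.cascade_position
        stages = self.stages[:pos] + list(sub_models) + self.stages[pos:]
        return FirstLearningModel(stages, pos + len(sub_models))

    def run(self, label):
        state = dict(label)
        for stage in self.stages:
            state = stage(state)
        return state


def score_base_labels(state):
    """Base stage: one point per label present in the first label."""
    keys = ("behavior", "object", "time", "place")
    state["score"] = sum(1 for key in keys if state.get(key))
    return state


def behavior_frequency_sub_model(state):
    """Preset sub-model: adds a point when the behavior frequency is high."""
    if state.get("modifiers", {}).get("behavior_frequency", 0) > 10:
        state["score"] += 1
    return state


def decide(state):
    """Final stage: turn the score into a recognition result."""
    state["abnormal"] = state["score"] >= 3
    return state


# Hypothetical preset model library: modification element -> sub-model.
MODEL_LIBRARY = {"behavior_frequency": behavior_frequency_sub_model}


def cascade_for(model, modification_elements, library=MODEL_LIBRARY):
    """Look up the sub-model for each element and embed it in the model."""
    sub_models = [library[name] for name in modification_elements]
    return model.cascade(sub_models)
```

With `FirstLearningModel([score_base_labels, decide], cascade_position=1)`, calling `cascade_for(model, ["behavior_frequency"])` places the sub-model between scoring and decision while leaving the original model unchanged.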
In step S150, the modified first label is input into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user.
In this example embodiment, the cascaded machine learning model, which has stronger processing capability, operates on the content-enriched first label, which can effectively improve the accuracy of the abnormal user identification result.
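The control flow of steps S130 to S150 can be summarized in a short sketch. The function and parameter names are hypothetical, the models are passed in as plain callables, and the behavior when the first probability does not exceed the threshold is an assumption, since the source does not specify that branch.

```python
def identify_abnormal_user(first_label, first_model, threshold,
                           modify_label, cascade_model):
    """Run steps S130 to S150 with caller-supplied components.

    first_model(label) -> probability
    modify_label(label, probability) -> (modified_label, modification_elements)
    cascade_model(model, modification_elements) -> cascaded model (callable)
    """
    first_probability = first_model(first_label)            # step S130
    if first_probability <= threshold:
        # Assumption: the source does not describe this branch, so the
        # sketch reports no recognition result instead of guessing one.
        return None
    modified_label, elements = modify_label(first_label, first_probability)
    cascaded_model = cascade_model(first_model, elements)   # step S140
    return cascaded_model(modified_label)                   # step S150
```

Passing the components in as arguments keeps the sketch independent of any particular model type or label representation.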
In one embodiment of the present example, after the inputting the first label of the user into the first learning model and outputting the first probability that the user is an abnormal user by the first learning model, the method further includes:
acquiring all network behavior information of a user before a current time point from a user log;
acquiring a second label of the user from all network behavior information of the user before the current time point;
inputting a second label of the user into a second learning model, and outputting a second probability that the user is an abnormal user by the second learning model;
based on the first probability and the second probability, it is determined whether the user is an abnormal user.
By acquiring from the user log the network behavior information of the user at all time points before the current time point, the second label of the user covering all time periods can be extracted. A pre-trained second learning model suited to all time periods then yields the second probability that the user is an abnormal user, based on analysis over all time periods. Combining the first probability and the second probability to determine whether the user is an abnormal user draws on both the short-term and the long-term behavior data of the user, which helps ensure the accuracy of the judgment.
In one embodiment of the present example, the second label includes a second behavior label, a second object label, a second time label and a second place label, and
obtaining the second label of the user from all network behavior information of the user before the current time point includes:
if the number of times that the user initiates a specific behavior in all time periods before the current time point exceeds a fifth time threshold, taking the specific behavior as a second behavior label of the user;
if the number of behaviors of the user directed at a specific object in all time periods before the current time point exceeds a sixth number threshold, taking the specific object as a second object label of the user;
if the number of times that the specific behavior of the user, in all time periods before the current time point, occurs within a preset second time interval exceeds a seventh time threshold, taking the preset second time interval as a second time label of the user;
and if the number of times that the geographic position reported by the user terminal falls within a second preset area when the user acts, in all time periods before the current time point, exceeds an eighth time threshold, taking the second preset area as a second place label of the user.
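The four counting rules above can be sketched as one function. This is an illustration under assumptions: the event fields (`behavior`, `object`, `hour`, `lat`, `lon`), the representation of the time interval as an hour range, the representation of the preset area as a latitude and longitude bounding box, and the function name are all hypothetical. Each threshold is treated as a count of occurrences.

```python
def extract_labels(events, specific_behavior, specific_object,
                   time_interval, area, thresholds):
    """Apply the four counting rules to a list of behavior events.

    events: dicts with 'behavior', 'object', 'hour', 'lat', 'lon' keys.
    time_interval: (start_hour, end_hour), end exclusive.
    area: (min_lat, max_lat, min_lon, max_lon) bounding box.
    thresholds: counts keyed by 'behavior', 'object', 'time', 'place'.
    """
    labels = {}
    behavior_events = [e for e in events if e["behavior"] == specific_behavior]

    # Behavior label: the specific behavior occurs more often than allowed.
    if len(behavior_events) > thresholds["behavior"]:
        labels["behavior"] = specific_behavior

    # Object label: behaviors directed at the specific object.
    object_count = sum(1 for e in events if e["object"] == specific_object)
    if object_count > thresholds["object"]:
        labels["object"] = specific_object

    # Time label: occurrences of the specific behavior inside the interval.
    start_hour, end_hour = time_interval
    in_interval = sum(1 for e in behavior_events
                      if start_hour <= e["hour"] < end_hour)
    if in_interval > thresholds["time"]:
        labels["time"] = time_interval

    # Place label: reported positions that fall inside the preset area.
    min_lat, max_lat, min_lon, max_lon = area
    in_area = sum(1 for e in events
                  if min_lat <= e["lat"] <= max_lat
                  and min_lon <= e["lon"] <= max_lon)
    if in_area > thresholds["place"]:
        labels["place"] = area
    return labels
```

Passing all events before the current time point gives the second label; restricting `events` to a preset time window beforehand would apply the same counting to a shorter period.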
In one embodiment of the present example, the determining whether the user is an abnormal user based on the first probability and the second probability includes:
if the weighted sum of the first probability and the second probability is greater than a predetermined weighted sum threshold, determining that the user is an abnormal user.
The predetermined weighted sum threshold is the threshold above which the weighted sum of the first probability and the second probability indicates that the user is at risk of being abnormal. The weighted sum is obtained by weighting the first probability with the weight corresponding to the preset time period and the second probability with the weight corresponding to all time periods of the user. Combining the first probability with the second probability in this way effectively reflects whether the user is an abnormal user.
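The weighted-sum decision can be written in a few lines. The weights and the threshold below are placeholder values chosen for illustration; the source gives no concrete numbers, and the function and parameter names are hypothetical.

```python
def is_abnormal(first_probability, second_probability,
                short_term_weight=0.6, long_term_weight=0.4,
                weighted_sum_threshold=0.7):
    """Combine the short-term and long-term probabilities.

    short_term_weight applies to the first probability (preset time period);
    long_term_weight applies to the second probability (all time periods).
    All default values are illustrative assumptions.
    """
    weighted_sum = (short_term_weight * first_probability
                    + long_term_weight * second_probability)
    return weighted_sum > weighted_sum_threshold
```

For example, with the placeholder defaults, probabilities of 0.9 and 0.8 give a weighted sum of about 0.86, which is above the threshold, while 0.5 and 0.3 give about 0.42, which is below it.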
The disclosure also provides an abnormal user identification device. Referring to fig. 4, the abnormal user identification device may include a first acquisition module 410, a second acquisition module 420, a prediction module 430, a modification module 440, and an identification module 450, wherein:
the first acquisition module 410 may be configured to acquire, from a user log, network behavior information of a user in a predetermined time period before a current time point;
the second acquisition module 420 may be configured to acquire a first tag of the user from the network behavior information of the user in the predetermined time period before the current time point;
the prediction module 430 may be configured to input the first tag of the user into a first learning model, the first learning model outputting a first probability that the user is an abnormal user;
the modification module 440 may be configured to, when the first probability exceeds a predetermined threshold, add a tag modification element to the first tag of the user according to the first probability to obtain a modified first tag, and cascade a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model;
the identification module 450 may be configured to input the modified first tag into the cascaded first learning model to obtain an identification result of whether the user is an abnormal user.
The specific details of each module in the abnormal user identification device have been described in detail in the corresponding abnormal user identification method and are therefore not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to such an embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, and a bus 530 connecting the various system components, including the storage unit 520 and the processing unit 510.
The storage unit stores program code executable by the processing unit 510, such that the processing unit 510 performs the steps according to various exemplary embodiments of the present invention described in the "exemplary method" section of this specification. For example, the processing unit 510 may perform the following steps as shown in fig. 1: step S110: acquiring network behavior information of a user in a preset time period before a current time point from a user log; step S120: acquiring a first label of the user from the network behavior information of the user in the preset time period before the current time point; step S130: inputting the first label of the user into a first learning model, and outputting a first probability that the user is an abnormal user by the first learning model; step S140: when the first probability exceeds a preset threshold, adding a tag modification element to the first label of the user according to the first probability to obtain a modified first label, and cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model; step S150: inputting the modified first label into the cascaded first learning model to obtain a recognition result of whether the user is an abnormal user.
The storage unit 520 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 5201 and/or cache memory unit 5202, and may further include Read Only Memory (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a client to interact with the electronic device 500, and/or any device (e.g., router, modem, etc.) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 550. Also, electronic device 500 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 560. As shown, network adapter 560 communicates with other modules of electronic device 500 over bus 530. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 500, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above-described method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the client computing device, partly on the client device, as a stand-alone software package, partly on the client computing device and partly on a remote computing device or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the client computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present application, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (8)

1. An abnormal user identification method, comprising:
acquiring network behavior information of a user in a preset time period before a current time point from a user log;
acquiring a first label of a user from network behavior information of the user in a preset time period before a current time point; the first tag is key information extracted according to network behavior information of a user;
Inputting a first label of a user into a first learning model, and outputting a first probability that the user is an abnormal user by the first learning model;
when the first probability exceeds a preset threshold, adding a tag modification element to a first tag of the user according to the first probability to obtain a modified first tag, cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model;
inputting the modified first label into the first cascaded learning model to obtain a recognition result of whether the user is an abnormal user;
the step of adding a tag modification element to the first tag of the user according to the first probability to obtain a modified first tag comprises the following steps: acquiring a modification element table associated with the network behavior, wherein modification elements corresponding to different preset probability ranges are stored in the modification element table; searching a preset probability range corresponding to the first probability according to the first probability to obtain the modification element corresponding to the first probability; adding the modification element to the first label to obtain a modified first label;
the step of cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model comprises the following steps: acquiring a preset sub-model corresponding to the tag modification element from a preset model library according to the tag modification element; and cascading the preset sub-model with the first learning model to obtain the cascaded first learning model.
2. The method of claim 1, wherein after the inputting the first label of the user into the first learning model, outputting the first probability by the first learning model that the user is an abnormal user, the method further comprises:
acquiring all network behavior information of a user before a current time point from a user log;
acquiring a second label of the user from all network behavior information of the user before the current time point;
inputting a second label of the user into a second learning model, and outputting a second probability that the user is an abnormal user by the second learning model;
based on the first probability and the second probability, it is determined whether the user is an abnormal user.
3. The method of claim 1, wherein the first tag comprises a first behavior tag, a first object tag, a first time tag, a first location tag,
The obtaining the first label of the user from the network behavior information of the user in a preset time period before the current time point comprises the following steps:
if the user initiates a specific behavior exceeding a first time number threshold in a preset time period before the current time point, taking the specific behavior as a first behavior label of the user;
if the number of behaviors of the user for the specific object in a preset time period before the current time point exceeds a second number threshold, the specific object is used as a first object label of the user;
if the number of times that the time of the specific action of the user in the preset time period before the current time point falls in the preset first time interval exceeds a third time threshold, taking the preset first time interval as a first time label of the user;
and if the number of times that the geographic position reported by the user terminal falls in a first preset area when the specific action occurs in a preset time period before the current time point exceeds a fourth time threshold value, taking the first preset area as a first place label of the user.
4. The method of claim 2, wherein the second tag comprises a second behavior tag, a second object tag, a second time tag, a second location tag,
The obtaining the second label of the user from all network behavior information of the user before the current time point comprises the following steps:
if the number of times that the user initiates the specific behavior in all time periods before the current time point exceeds a fifth time threshold, the specific behavior is used as a second behavior label of the user;
if the number of the behaviors of the user for the specific object in all time periods before the current time point exceeds a sixth number threshold, taking the specific object as a second object label of the user;
if the times of the specific behavior of the user in all time periods before the current time point in the preset second time interval exceeds a seventh time threshold value, taking the preset second time interval as a second time label of the user;
and if the times of the geographic position reported by the user terminal falling in the second preset area when the user acts in all time periods before the current time point exceeds an eighth time threshold, taking the second preset area as a second place label of the user.
5. The method of claim 1, wherein the cascading the preset sub-model with the first learning model to obtain the cascaded first learning model comprises:
And embedding the preset sub-model into a cascade position preset in the first learning model to obtain the first learning model after cascade.
6. An abnormal user identification apparatus, comprising:
the first acquisition module is used for acquiring network behavior information of a user in a preset time period before a current time point from a user log;
the second acquisition module is used for acquiring a first label of the user from network behavior information in a preset time period before the current time point; the first tag is key information extracted according to network behavior information of a user;
the prediction module is used for inputting a first label of the user into a first learning model, and a first probability that the user is an abnormal user is output by the first learning model;
the modification module is used for adding a tag modification element to a first tag of the user according to the first probability when the first probability exceeds a preset threshold value to obtain a modified first tag, and cascading a preset submodel to the first learning model according to the tag modification element to obtain a cascaded first learning model;
the identification module is used for inputting the modified first label into the first cascaded learning model to obtain an identification result of whether the user is an abnormal user or not;
The step of adding a tag modification element to the first tag of the user according to the first probability to obtain a modified first tag comprises the following steps: acquiring a modification element table associated with the network behavior, wherein modification elements corresponding to different preset probability ranges are stored in the modification element table; searching a preset probability range corresponding to the first probability according to the first probability to obtain the modification element corresponding to the first probability; adding the modification element to the first label to obtain a modified first label;
the step of cascading a preset sub-model to the first learning model according to the tag modification element to obtain a cascaded first learning model comprises the following steps: acquiring a preset sub-model corresponding to the tag modification element from a preset model library according to the tag modification element; and cascading the preset sub-model with the first learning model to obtain the cascaded first learning model.
7. A computer readable storage medium having stored thereon an abnormal user identification program, wherein the abnormal user identification program, when executed by a processor, implements the method of any of claims 1-5.
8. An electronic device, comprising:
a processor; and
a memory for storing an abnormal user identification program executable by the processor; wherein the processor is configured to perform the method of any of claims 1-5 by executing the abnormal user identification program.
CN201910760605.3A 2019-08-16 2019-08-16 Abnormal user identification method and device, storage medium and electronic equipment Active CN110674839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910760605.3A CN110674839B (en) 2019-08-16 2019-08-16 Abnormal user identification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910760605.3A CN110674839B (en) 2019-08-16 2019-08-16 Abnormal user identification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110674839A CN110674839A (en) 2020-01-10
CN110674839B CN110674839B (en) 2023-11-24

Family

ID=69075515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910760605.3A Active CN110674839B (en) 2019-08-16 2019-08-16 Abnormal user identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110674839B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569949A (en) * 2021-07-28 2021-10-29 广州博冠信息科技有限公司 Abnormal user identification method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106775929A (en) * 2016-11-25 2017-05-31 中国科学院信息工程研究所 A kind of virtual platform safety monitoring method and system
CN107818344A (en) * 2017-10-31 2018-03-20 上海壹账通金融科技有限公司 The method and system that user behavior is classified and predicted
CN109241418A (en) * 2018-08-22 2019-01-18 中国平安人寿保险股份有限公司 Abnormal user recognition methods and device, equipment, medium based on random forest
CN109687991A (en) * 2018-09-07 2019-04-26 平安科技(深圳)有限公司 User behavior recognition method, apparatus, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8751414B2 (en) * 2011-05-04 2014-06-10 International Business Machines Corporation Identifying abnormalities in resource usage
US10785244B2 (en) * 2017-12-15 2020-09-22 Panasonic Intellectual Property Corporation Of America Anomaly detection method, learning method, anomaly detection device, and learning device
WO2019147980A1 (en) * 2018-01-26 2019-08-01 Ge Inspection Technologies, Lp Anomaly detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
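An unsupervised method for detecting anomalous database user behavior (一种无监督的数据库用户行为异常检测方法); 李海斌 (Li Haibin) et al.; Journal of Chinese Computer Systems (小型微型计算机系统); Vol. 39 (No. 11); pp. 2464-2472 *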

Also Published As

Publication number Publication date
CN110674839A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN110020422B (en) Feature word determining method and device and server
CN109743311B (en) WebShell detection method, device and storage medium
CN111241453B (en) Page access duration acquisition method and device, medium and electronic equipment
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
CN110674009B (en) Application server performance monitoring method and device, storage medium and electronic equipment
CN111552633A (en) Interface abnormal call testing method and device, computer equipment and storage medium
CN107145446B (en) Application program APP test method, device and medium
CN111416728B (en) Method, system, equipment and medium for predicting session end and online customer service
CN113837596B (en) Fault determination method and device, electronic equipment and storage medium
CN110727437A (en) Code optimization item acquisition method and device, storage medium and electronic equipment
CN110162518B (en) Data grouping method, device, electronic equipment and storage medium
CN109408556B (en) Abnormal user identification method and device based on big data, electronic equipment and medium
CN110704614B (en) Information processing method and device for predicting user group type in application
CN109284450B (en) Method and device for determining order forming paths, storage medium and electronic equipment
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN110674839B (en) Abnormal user identification method and device, storage medium and electronic equipment
CN104580109A (en) Method and device for generating click verification code
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
CN113656391A (en) Data detection method and device, storage medium and electronic equipment
CN110083807B (en) Contract modification influence automatic prediction method, device, medium and electronic equipment
CN110032624B (en) Sample screening method and device
CN113190746A (en) Recommendation model evaluation method and device and electronic equipment
CN113111200A (en) Method and device for auditing picture file, electronic equipment and storage medium
CN115314404B (en) Service optimization method, device, computer equipment and storage medium
CN109218411B (en) Data processing method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant