CN112348321A - Risk user identification method and device and electronic equipment - Google Patents


Info

Publication number
CN112348321A
CN112348321A (application CN202011134322.7A)
Authority
CN
China
Prior art keywords
behavior
user
risk
sets
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011134322.7A
Other languages
Chinese (zh)
Inventor
范晓波
胡弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyue Information Technology Co Ltd filed Critical Shanghai Qiyue Information Technology Co Ltd
Priority to CN202011134322.7A priority Critical patent/CN112348321A/en
Publication of CN112348321A publication Critical patent/CN112348321A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a method and a device for identifying a risk user, an electronic device and a computer-readable medium. The method comprises the following steps: acquiring behavior data of a current user, wherein the behavior data comprise at least one historical behavior and the behavior time corresponding to each historical behavior; performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories, and determining at least one behavior tag of the current user; inputting the at least one behavior and its corresponding behavior tag into a risk identification model to calculate a risk score; and when the risk score is greater than a threshold value, determining the current user to be a risk user. With the method and device for identifying risk users, the electronic device and the computer-readable medium, an accurate and effective risk identification model can be generated even when the training data are incomplete or sample labels are missing from the training data, so that the risk of the current user is identified and the safety of enterprise resources is ensured.

Description

Risk user identification method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method and an apparatus for identifying a risky user, an electronic device, and a computer-readable medium.
Background
With the continued rapid development of financial technology in recent years, and challenges such as the down-market expansion of customer groups and fast-changing economic conditions, every internet financial service institution faces increasingly complex and rapidly evolving resource fraud risks. The main approaches these institutions use for resource fraud risk detection fall into two categories: rule-based approaches and approaches based on machine learning algorithms.
Rules mainly rely on the business experience of managers or on feature classification: a set of rules is defined to screen users, which introduces a degree of subjectivity and makes occasional oversights hard to avoid. A scheme based on machine learning is more objective by comparison, and its accuracy has a clearer and more reliable quantitative standard. The most common algorithms at present include logistic regression, naive Bayes, support vector machines and random forests.
However, most machine learning methods share a problem: the sample data used to train the machine learning model must be assigned enough resource fraud labels before an accurate risk identification model can be trained, and when a new kind of resource fraud behavior appears, the trained machine learning model cannot recognize the new features and therefore cannot monitor the new risk.
Therefore, a new method, an apparatus, an electronic device and a computer readable medium for identifying a risky user are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for identifying a risk user, an electronic device and a computer-readable medium, which can generate an accurate and effective risk identification model even when the training data are incomplete or sample labels are missing from the training data, so as to identify the risk of the current user and ensure the safety of enterprise resources.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for identifying a risky user is provided, the method comprising: acquiring behavior data of a current user, wherein the behavior data comprise at least one historical behavior and the behavior time corresponding to each historical behavior; performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories, and determining at least one behavior tag of the current user; inputting the at least one behavior and its corresponding behavior tag into a risk identification model to calculate a risk score; and when the risk score is greater than a threshold value, determining the current user to be a risk user.
Optionally, the method further comprises: when the current user is a risk user, determining a user policy for the current user; limiting the use permission of the current user based on the user policy; and/or sending secondary verification information to the current user based on the user policy.
Optionally, the method further comprises: generating the plurality of behavior sets from behavior data of historical users using a clustering algorithm; and assigning a behavior label to each of the behavior sets.
Optionally, the method further comprises: inputting the behavior sets and their corresponding behavior labels into a hidden Markov model for training; and generating the risk identification model when the model parameters converge.
Optionally, generating the plurality of behavior sets from behavior data of historical users using a clustering algorithm comprises: acquiring a plurality of historical behaviors and their corresponding behavior times from the behavior data of historical users; and clustering the plurality of historical behavior data with a clustering algorithm to generate the plurality of behavior sets.
Optionally, generating the plurality of behavior sets from behavior data of historical users using a clustering algorithm comprises: generating the plurality of behavior sets from the behavior data of historical users using a data stream clustering algorithm; and/or generating the plurality of behavior sets from the behavior data of historical users using a grid clustering algorithm; and/or generating the plurality of behavior sets from the behavior data of historical users using an automatic subspace clustering algorithm.
Optionally, assigning a behavior label to each of the behavior sets comprises: acquiring the behavior data in the plurality of behavior sets; extracting preset behaviors from the behavior data; and assigning the behavior label to a behavior set based on the number of preset behaviors in that behavior set.
Optionally, assigning a behavior label to each of the behavior sets comprises: determining a core behavior in each of the plurality of behavior sets; determining risk data for each core behavior; and determining the behavior tag of the behavior set corresponding to a core behavior based on its risk data.
Optionally, inputting the behavior sets and their corresponding behavior labels into a hidden Markov model for training comprises: inputting the behavior sets and their corresponding behavior labels into the hidden Markov model; and training the hidden Markov model with an expectation-maximization algorithm to generate the model parameters.
Optionally, performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories and determining at least one behavior tag of the current user comprises: performing a similarity comparison between the historical behaviors of the current user and the plurality of behavior sets of preset categories to obtain a plurality of comparison scores; acquiring the behavior set whose comparison score is greater than a comparison threshold; and taking the label of that behavior set as the behavior label of the historical behavior of the current user.
According to an aspect of the present disclosure, an apparatus for identifying a risky user is provided, the apparatus comprising: a behavior module for acquiring behavior data of a current user, wherein the behavior data comprise at least one historical behavior and the behavior time corresponding to each historical behavior; an assignment module for performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories to determine at least one behavior tag of the current user; a calculation module for inputting the at least one behavior and its corresponding behavior tag into the risk identification model to calculate a risk score; and a comparison module for determining the current user to be a risk user when the risk score is greater than a threshold value.
Optionally, the apparatus further comprises: a policy module for determining a user policy for the current user when the current user is a risk user, limiting the use permission of the current user based on the user policy, and/or sending secondary verification information to the current user based on the user policy.
Optionally, the apparatus further comprises: an aggregation module for generating the plurality of behavior sets from behavior data of historical users using a clustering algorithm; and a label module for assigning a behavior label to each of the behavior sets.
Optionally, the apparatus further comprises: a model training module for inputting the behavior sets and their corresponding behavior labels into a hidden Markov model for training, and for generating the risk identification model when the model parameters converge.
Optionally, the aggregation module comprises: a history unit for acquiring a plurality of historical behaviors and their corresponding behavior times from the behavior data of historical users; and a clustering unit for clustering the historical behavior data with a clustering algorithm to generate the plurality of behavior sets.
Optionally, the clustering unit is further configured to generate the plurality of behavior sets from the behavior data of historical users using a data stream clustering algorithm; and/or to generate the plurality of behavior sets from the behavior data of historical users using a grid clustering algorithm; and/or to generate the plurality of behavior sets from the behavior data of historical users using an automatic subspace clustering algorithm.
Optionally, the label module comprises: a data unit for acquiring the behavior data in the plurality of behavior sets, extracting preset behaviors from the behavior data, and assigning the behavior label to a behavior set based on the number of preset behaviors in that behavior set.
Optionally, the label module comprises: a core unit for determining a core behavior in each of the plurality of behavior sets, determining risk data for each core behavior, and determining the behavior tag of the behavior set corresponding to a core behavior based on its risk data.
Optionally, the model training module is further configured to input the behavior sets and their corresponding behavior labels into the hidden Markov model, and to train the hidden Markov model with an expectation-maximization algorithm to generate the model parameters.
Optionally, the assignment module comprises: a similarity unit for performing a similarity comparison between the historical behaviors of the current user and the plurality of behavior sets of preset categories to obtain a plurality of comparison scores; a threshold unit for acquiring the behavior set whose comparison score is greater than a comparison threshold; and a label unit for taking the label of that behavior set as the behavior label of the historical behavior of the current user.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to an aspect of the disclosure, a computer-readable medium is provided, on which a computer program is stored, and the program, when executed by a processor, carries out the method described above.
According to the method and apparatus for identifying risk users, the electronic device and the computer-readable medium of the present disclosure, behavior data of the current user are acquired, the behavior data comprising at least one historical behavior and the behavior time corresponding to each historical behavior; a similarity comparison is performed between the at least one historical behavior and a plurality of behavior sets of preset categories, and at least one behavior tag of the current user is determined; the at least one behavior and its corresponding behavior tag are input into a risk identification model to calculate a risk score; and when the risk score is greater than the threshold value, the current user is determined to be a risk user. In this way, an accurate and effective risk identification model can be generated even when the training data are incomplete or sample labels are missing from the training data, the risk of the current user is then identified, and the resource safety of the enterprise is guaranteed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a system block diagram illustrating an identification method and apparatus for an at risk user according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a method of identifying an at risk user according to an example embodiment.
Fig. 3 is a flow chart illustrating a method of identifying an at risk user according to another exemplary embodiment.
Fig. 4 is a flowchart illustrating a method of identifying an at risk user according to another exemplary embodiment.
Fig. 5 is a block diagram illustrating an apparatus for identification of an at risk user according to an example embodiment.
Fig. 6 is a block diagram illustrating an apparatus for identification of an at risk user according to another exemplary embodiment.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 8 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
In the present invention, resources refer to any available substances, information or time, and information resources include computing resources and various types of data resources. The data resources include various kinds of private data in various domains. The innovation of the invention lies in how to use information interaction technology between the server and the client to make the resource allocation process more automatic and efficient and to reduce labor costs. Thus, the present invention can essentially be applied to the allocation of all kinds of resources, including physical goods, water, electricity and meaningful data. For convenience, however, resource allocation is described below taking financial data resources as an example; those skilled in the art will understand that the present invention can also be applied to the allocation of other resources.
Fig. 1 is a system block diagram illustrating an identification method and apparatus for an at risk user according to an exemplary embodiment.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a financial services application, a shopping application, a web browser application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background management server that supports financial services websites browsed by the user using the terminal apparatuses 101, 102, and 103. The background management server may analyze the received user data, and feed back a processing result (e.g., a risk user identifier) to an administrator of the financial services website.
The server 105 may, for example, obtain behavior data of a current user, the behavior data including at least one historical behavior and the behavior time corresponding to it; the server 105 may, for example, perform a similarity comparison between the at least one historical behavior and a plurality of behavior sets of a preset category and determine at least one behavior tag of the current user; the server 105 may, for example, input the at least one behavior and its corresponding behavior tag into a risk identification model to calculate a risk score; and the server 105 may, for example, determine the current user to be a risky user when the risk score is greater than a threshold.
The server 105 may also, for example, determine a user policy for the current user when the current user is a risky user; limit the use permission of the current user based on the user policy; and/or send secondary verification information to the current user based on the user policy.
The server 105 may also adjust the resource allocation quota of the user, for example, when the user is a risky user.
The server 105 may be a single physical server or may be composed of a plurality of servers. It should be noted that the method for identifying risky users provided by the embodiments of the present disclosure may be executed by the server 105, and accordingly the apparatus for identifying risky users may be disposed in the server 105, while the web pages through which users browse the financial service platform are generally located on the terminal devices 101, 102 and 103.
Fig. 2 is a flow chart illustrating a method of identifying an at risk user according to an example embodiment. The identification method 20 of the risky user includes at least steps S202 to S208.
As shown in fig. 2, in S202, behavior data of the current user are obtained, where the behavior data include at least one historical behavior and the behavior time corresponding to each historical behavior. More specifically, the behaviors of the user may be behaviors related to resource transactions, such as resource transfer, resource borrowing, resource lending and resource repayment.
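Purely for illustration, such a behavior record could be held in a small data structure like the sketch below; the field names and the example behaviors are assumptions made for this sketch, not a format prescribed by the disclosure.

from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class BehaviorRecord:
    behavior: str            # e.g. "resource transfer", "resource borrowing", "resource repayment"
    behavior_time: datetime  # the behavior time corresponding to this historical behavior

# Behavior data of the current user: at least one historical behavior plus its behavior time.
current_user_behaviors: List[BehaviorRecord] = [
    BehaviorRecord("resource borrowing", datetime(2020, 9, 1, 10, 30)),
    BehaviorRecord("resource repayment", datetime(2020, 10, 1, 9, 5)),
]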
In S204, a similarity comparison is performed between the at least one historical behavior and a plurality of behavior sets of a preset category, and at least one behavior tag of the current user is determined.
In one embodiment, the method further comprises generating the plurality of behavior sets from behavior data of historical users using a clustering algorithm, and assigning a behavior label to each of the behavior sets. The different transaction features involved in actual activity are processed first: various clustering algorithms can be used to cluster the feature vectors of the transactions and generate the plurality of behavior sets. Each distinct behavior set represents a group of features most similar to one another, and transactions within the same set may be considered to have similar transaction behaviors.
In one embodiment, the method further comprises: performing a similarity comparison between the historical behaviors of the current user and the plurality of behavior sets of preset categories to obtain a plurality of comparison scores; acquiring the behavior set whose comparison score is greater than a comparison threshold; and taking the label of that behavior set as the behavior label of the historical behavior of the current user.
More specifically, suppose the behavior data of the current user A include two historical behaviors a and b, and the behavior sets are behavior set 1, behavior set 2 and behavior set 3. Behavior a is compared with behavior set 1, behavior set 2 and behavior set 3 respectively to obtain three scores a1, a2 and a3. The comparison threshold may be set to 0.5; whenever a1, a2 or a3 is greater than 0.5, the current user A can be considered to have the corresponding characteristic. Behavior b is then compared with behavior set 1, behavior set 2 and behavior set 3 in turn to obtain three scores b1, b2 and b3, which are judged against the comparison threshold in the same way.
More specifically, when a1, a2 and a3 are all smaller than the comparison threshold, the maximum of a1, a2 and a3 can be selected, and the label corresponding to that maximum is taken as the label of behavior a of the current user A.
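As a minimal sketch of this label-assignment step (assuming behaviors and behavior-set centers are represented as numeric feature vectors and using cosine similarity as the comparison score — both are illustrative choices, since the disclosure does not fix a particular similarity measure), the logic might look like:

import numpy as np

def assign_behavior_label(behavior_vec, set_centers, set_labels, threshold=0.5):
    # behavior_vec: feature vector of one historical behavior (e.g. one transaction).
    # set_centers : dict mapping set name -> center vector of that behavior set.
    # set_labels  : dict mapping set name -> behavior tag of that set.
    # threshold   : comparison threshold; 0.5 in the worked example above.
    scores = {}
    for name, center in set_centers.items():
        # Cosine similarity as an illustrative comparison score.
        num = float(np.dot(behavior_vec, center))
        den = float(np.linalg.norm(behavior_vec) * np.linalg.norm(center)) + 1e-12
        scores[name] = num / den
    # Behavior sets whose score exceeds the threshold are considered matches;
    # if none exceeds it, fall back to the maximum score as described above.
    matching = {n: s for n, s in scores.items() if s > threshold}
    candidates = matching if matching else scores
    best = max(candidates, key=candidates.get)
    return set_labels[best], scores

centers = {"set1": np.array([1.0, 0.0]), "set2": np.array([0.7, 0.7]), "set3": np.array([0.0, 1.0])}
labels = {"set1": "high-quality behavior", "set2": "normal behavior", "set3": "risk behavior"}
tag, scores = assign_behavior_label(np.array([0.9, 0.1]), centers, labels)
# "scores" plays the role of (a1, a2, a3) for behavior a of user A in the example above;
# calling the function again for behavior b yields (b1, b2, b3).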
In S206, the at least one behavior and its corresponding behavior tag are input into the risk identification model to calculate a risk score.
In one embodiment, the method further comprises: inputting the behavior sets and their corresponding behavior labels into a hidden Markov model for training, and generating the risk identification model when the model parameters converge. A Hidden Markov Model (HMM) is a statistical model describing a process in which a Markov chain with unknown parameters generates an unobservable sequence of random states, and each state in turn generates an observation, forming an observable random sequence; HMMs perform efficiently in fields such as speech recognition, behavior recognition, NLP and fault diagnosis.
Inputting the behavior sets and their corresponding behavior labels into a hidden Markov model for training comprises: inputting the behavior sets and their corresponding behavior labels into the hidden Markov model, and training the hidden Markov model with an expectation-maximization (EM) algorithm to generate the model parameters. Here the EM algorithm may be the forward-backward (Baum-Welch) algorithm. The standard computation framework of the EM algorithm consists of alternating E and M steps, and the convergence of the algorithm guarantees that the iterations approach at least a local maximum. Because its iteration rule is easy to implement and hidden variables can be handled flexibly, the EM algorithm can estimate the parameters of a Hidden Markov Model (HMM).
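A minimal training sketch follows, under two assumptions not stated in the disclosure: the open-source hmmlearn package is used as the HMM implementation (its fit call runs the Baum-Welch/EM iterations), and the behavior tags are encoded as small integers. In hmmlearn releases before 0.2.8 the corresponding class is MultinomialHMM rather than CategoricalHMM.

import numpy as np
from hmmlearn import hmm  # third-party package; an assumed implementation choice

# Behavior-tag sequences of historical users, encoded as integers
# (e.g. 0 = high-quality behavior, 1 = normal behavior, 2 = risk behavior).
sequences = [
    [0, 1, 1, 0, 1],
    [2, 2, 1, 2, 2, 2],
    [1, 0, 1, 1],
]
# hmmlearn expects a single column of symbols plus the length of each sequence.
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]

# Two hidden states as an illustrative choice, e.g. a "normal" and a "fraudulent" account regime.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, tol=1e-4, random_state=0)
model.fit(X, lengths)  # Baum-Welch (EM) iterations until the parameters converge

# For a new user's tag sequence, the log-likelihood under the trained model
# can be turned into a risk score and compared with the threshold of S208.
new_sequence = np.array([[2], [2], [1], [2]])
log_likelihood = model.score(new_sequence)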
In S208, when the risk score is greater than a threshold, the current user is determined to be a risk user. Further, when the current user is a risk user, a user policy is determined for the current user; based on the user policy, the use permission of the current user is limited; and/or secondary verification information is sent to the current user based on the user policy.
The operation authority of the risky user over the resource account can be limited and resource withdrawal actions forbidden, guaranteeing the safety of the resources to the greatest extent. After the operation authority of the risky user has been limited, a prompt for secondary verification information can be sent to the user, reminding the user to submit more related material proving resource safety so that secondary verification can be performed. If the result of the secondary verification is still a risk user, all services for the user can be permanently disabled; if the result of the secondary verification is a non-risk user, the restriction on the user can be lifted.
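The decision flow of S208 and the follow-up policy handling can be sketched as below; the threshold value and the action names are illustrative placeholders, since the disclosure only requires "greater than a threshold → risk user" plus the policies described above.

def decide_user_policy(risk_score, secondary_verification_passed=None, threshold=0.8):
    # threshold and the returned action names are placeholders for this sketch.
    if risk_score <= threshold:
        return ["allow"]
    if secondary_verification_passed is None:
        # First pass: flag the user, limit permissions, request extra proof material.
        return ["mark_as_risk_user", "limit_use_permission", "send_secondary_verification"]
    if secondary_verification_passed:
        return ["lift_restriction"]            # secondary verification cleared the user
    return ["permanently_disable_services"]    # still judged risky after verification

print(decide_user_policy(0.93))
# -> ['mark_as_risk_user', 'limit_use_permission', 'send_secondary_verification']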
According to this method for identifying risk users, behavior data of the current user are acquired, the behavior data comprising at least one historical behavior and the behavior time corresponding to each historical behavior; a similarity comparison is performed between the at least one historical behavior and a plurality of behavior sets of preset categories, and at least one behavior tag of the current user is determined; the at least one behavior and its corresponding behavior tag are input into a risk identification model to calculate a risk score; and when the risk score is greater than the threshold value, the current user is determined to be a risk user. In this way, an accurate and effective risk identification model can be generated even when the training data are incomplete or sample labels are missing from the training data, the risk of the current user is then identified, and the resource safety of the enterprise is guaranteed.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 3 is a flow chart illustrating a method of identifying an at risk user according to another exemplary embodiment. The process 30 shown in fig. 3 is a supplementary description of the process shown in fig. 2.
As shown in fig. 3, in S302, a plurality of historical behaviors and their corresponding behavior times are acquired from the behavior data of historical users.
In S304, the plurality of historical behavior data are clustered with a clustering algorithm to generate the plurality of behavior sets. This comprises: generating the plurality of behavior sets from the behavior data of historical users using a data stream clustering algorithm; and/or generating the plurality of behavior sets from the behavior data of historical users using a grid clustering algorithm; and/or generating the plurality of behavior sets from the behavior data of historical users using an automatic subspace clustering algorithm.
In an actual business scenario, clustering algorithms such as K-means are mostly based on a distance measurement criterion and require the number of categories to be specified before clustering, so the final clusters tend to be spherical, which performs poorly on high-dimensional distribution problems such as resource fraud transactions. Thus, in embodiments of the present disclosure, the clustering algorithm may include stream, grid and subspace algorithms such as DDE-Stream, D-Stream, Clu-Stream and CLIQUE, and the behavior data of historical users are clustered with one or more of these algorithms.
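None of the stream, grid or subspace algorithms named above ships with scikit-learn, so the stand-in sketch below uses the density-based DBSCAN algorithm, which likewise does not need the number of clusters fixed in advance; the substitution, the feature columns and the parameter values are all assumptions made only for illustration.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Toy historical-behavior feature matrix: one row per transaction, with
# illustrative columns such as amount, hour of day and counterparty count.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[100.0, 14.0, 3.0], scale=[20.0, 2.0, 1.0], size=(200, 3))
suspect = rng.normal(loc=[5000.0, 3.0, 30.0], scale=[500.0, 1.0, 5.0], size=(20, 3))
behaviors = np.vstack([normal, suspect])

# Density-based clustering does not require the number of clusters up front.
X = StandardScaler().fit_transform(behaviors)
cluster_ids = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Every non-noise cluster id corresponds to one "behavior set".
behavior_sets = {cid: np.where(cluster_ids == cid)[0]
                 for cid in set(cluster_ids) if cid != -1}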
In S306, a behavior tag is assigned to each of the behavior sets. Once the plurality of behavior sets have been obtained from the clustering calculation, the label of each behavior set can be determined from the characteristics of the users or of the behaviors in that set.
It is worth mentioning that K categories and their corresponding category centers are obtained once clustering is completed, the category numbers being denoted {c1, c2, …, ck}. Because the initial clustering model is built on a large amount of historical behavior data, the calculated result can accurately represent the behavior characteristics of actual transactions. Therefore, the number K of clusters is kept unchanged during subsequent model updates.
Since an HMM can only analyze a single feature sequence, while many features of an actual transaction contribute to resource fraud analysis, the different features involved in actual transactions are processed first. Various clustering algorithms are used to cluster the feature vectors of the transactions; each resulting category represents the most similar group within the data, and transactions in the same set can be considered to have similar transaction behaviors. In practical application, the behavior label corresponding to each transaction in the current user's historical behaviors can then be used as an input variable of the HMM model.
Fig. 4 is a flowchart illustrating a method of identifying an at risk user according to another exemplary embodiment. The flow 40 shown in fig. 4 is a detailed description of S306 "assigning behavior tags to the behavior sets, respectively" in the flow shown in fig. 3.
As shown in fig. 4, in S402, behavior data in the plurality of behavior sets is acquired.
In S404, preset behaviors are extracted from the behavior data. For example, arrears (overdue payments) or over-limit resource occupation behaviors, which may be collectively referred to as bad behaviors, can be extracted from the behavior data.
In S406, the behavior tag is assigned to a behavior set based on the number of preset behaviors in that behavior set. It can be considered that the greater the number of bad behaviors in a behavior set, the higher the overall risk of the users represented by that set, and behavior labels can then be assigned to the sets in that order. The behavior tags may be, for example, risk behavior, normal behavior and high-quality behavior.
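A small sketch of this counting-based labeling step follows; the tag names mirror the examples in the text, while the tie-breaking and the use of a bad-behavior ratio (rather than a raw count) are assumptions of the sketch.

def label_sets_by_bad_behavior(behavior_sets, is_bad):
    # behavior_sets: dict mapping set id -> list of behavior records in that set.
    # is_bad       : predicate returning True for a preset "bad" behavior,
    #                e.g. arrears or over-limit resource occupation.
    bad_ratio = {
        set_id: sum(1 for b in behaviors if is_bad(b)) / max(len(behaviors), 1)
        for set_id, behaviors in behavior_sets.items()
    }
    # Rank the sets from most to least bad and hand out tags in that order.
    ranked = sorted(bad_ratio, key=bad_ratio.get, reverse=True)
    tags = ["risk behavior", "normal behavior", "high-quality behavior"]
    return {set_id: tags[min(i, len(tags) - 1)] for i, set_id in enumerate(ranked)}

sets = {0: ["repayment", "borrowing"], 1: ["arrears", "arrears", "borrowing"], 2: ["repayment"]}
print(label_sets_by_bad_behavior(sets, is_bad=lambda b: b == "arrears"))
# -> {1: 'risk behavior', 0: 'normal behavior', 2: 'high-quality behavior'}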
In S408, a core behavior is determined in each of the plurality of behavior sets. When the cluster calculation described above is performed, the core point of each behavior set is extracted, and the behavior corresponding to that core point can be used as the core behavior of the set.
In S410, risk data for each core behavior are determined. Different risk data can be set for different behaviors by a preset method; for example, a resource repayment behavior corresponds to a low-risk, high-quality behavior, over-term occupation of resources is a high-risk behavior, and resource borrowing and resource acquisition behaviors may be normal behaviors of medium risk, and so on.
In S412, the behavior tag of the behavior set corresponding to each core behavior is determined based on the risk data: the behavior label corresponding to the core behavior is taken as the behavior label of that behavior set.
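The core-behavior variant of the labeling step reduces to a lookup into the preset risk data; the mapping values below are illustrative only.

RISK_DATA = {  # preset risk data: behavior type -> behavior tag (illustrative values)
    "resource repayment": "high-quality behavior",      # low risk
    "resource over-term occupation": "risk behavior",   # high risk
    "resource borrowing": "normal behavior",            # medium risk
}

def label_sets_by_core_behavior(core_behaviors, risk_data=RISK_DATA):
    # core_behaviors: dict mapping set id -> the core (cluster-center) behavior of that set.
    # Each behavior set inherits the tag of its core behavior.
    return {set_id: risk_data[core] for set_id, core in core_behaviors.items()}

print(label_sets_by_core_behavior({1: "resource repayment", 2: "resource over-term occupation"}))
# -> {1: 'high-quality behavior', 2: 'risk behavior'}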
In the prior art, most machine learning models assume by default that the input features are independent of one another, so model training is often impossible when a certain feature lacks sample labels. The inventors of the present disclosure, however, observe that the various behavior variables are usually correlated in practical applications, so sample labels can be assigned to all behaviors by means of the correlation among behaviors, allowing the model analysis to proceed.
According to this method for identifying risky users, the transaction feature vectors are first clustered with a clustering algorithm, and the account behaviors are then modeled and analyzed in combination with the HMM algorithm. Even if a user exhibits a previously unknown fraudulent behavior in practical application, a label can still be assigned to that behavior through the clustering algorithm, and the model calculation is then carried out using the hidden parameters of the HMM, so that various fraudulent behaviors can be judged in real time even with limited sample labels. The high false-positive ("false kill") rate that traditional classification models can hardly avoid can also be reduced.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, these programs perform the functions defined by the above methods provided by the present disclosure. The programs may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 5 is a block diagram illustrating an apparatus for identification of an at risk user according to an example embodiment. As shown in fig. 5, the identification device 50 of the risky user includes: an action module 502, an assignment module 504, a calculation module 506, and a comparison module 508.
The behavior module 502 is configured to obtain behavior data of a current user, where the behavior data includes at least one historical behavior and a behavior time corresponding to the historical behavior;
The assignment module 504 is configured to perform a similarity comparison between the at least one historical behavior and a plurality of behavior sets of a preset category, and to determine at least one behavior tag of the current user. The assignment module 504 includes: a similarity unit for performing a similarity comparison between the historical behaviors of the current user and the plurality of behavior sets of preset categories to obtain a plurality of comparison scores; a threshold unit for acquiring the behavior set whose comparison score is greater than a comparison threshold; and a label unit for taking the label of that behavior set as the behavior label of the historical behavior of the current user.
The calculation module 506 is used for inputting the at least one behavior and the behavior tag corresponding to the behavior into the risk identification model to calculate a risk score;
the comparison module 508 is configured to determine the current user as a risky user when the risk score is greater than a threshold.
Fig. 6 is a block diagram illustrating an apparatus for identification of an at risk user according to another exemplary embodiment. As shown in fig. 6, the identification means 60 of the risky user includes: a policy module 602, a set module 604, a label module 606, and a model training module 608.
The policy module 602 is configured to determine a user policy for the current user when the current user is a risk user, to limit the use permission of the current user based on the user policy, and/or to send secondary verification information to the current user based on the user policy.
The aggregation module 604 is configured to generate the plurality of behavior sets from behavior data of historical users using a clustering algorithm. The aggregation module 604 includes: a history unit for acquiring a plurality of historical behaviors and their corresponding behavior times from the behavior data of historical users; and a clustering unit for clustering the historical behavior data with a clustering algorithm to generate the plurality of behavior sets.
The label module 606 is configured to assign a behavior label to each of the behavior sets. The label module 606 includes: a data unit for acquiring the behavior data in the plurality of behavior sets, extracting preset behaviors from the behavior data, and assigning the behavior label to a behavior set based on the number of preset behaviors in that set; and a core unit for determining a core behavior in each of the plurality of behavior sets, determining risk data for each core behavior, and determining the behavior tag of the behavior set corresponding to a core behavior based on its risk data.
The model training module 608 is configured to input the behavior sets and their corresponding behavior labels into a hidden Markov model for training, and to generate the risk identification model when the model parameters converge. The model training module 608 is further configured to input the behavior sets and their corresponding behavior labels into the hidden Markov model, and to train the hidden Markov model with an expectation-maximization algorithm to generate the model parameters.
According to this apparatus for identifying risk users, behavior data of the current user are acquired, the behavior data comprising at least one historical behavior and the behavior time corresponding to each historical behavior; a similarity comparison is performed between the at least one historical behavior and a plurality of behavior sets of preset categories, and at least one behavior tag of the current user is determined; the at least one behavior and its corresponding behavior tag are input into a risk identification model to calculate a risk score; and when the risk score is greater than the threshold value, the current user is determined to be a risk user. In this way, an accurate and effective risk identification model can be generated even when the training data are incomplete or sample labels are missing from the training data, the risk of the current user is then identified, and the resource safety of the enterprise is guaranteed.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 700 according to this embodiment of the disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 that connects the various system components (including the memory unit 720 and the processing unit 710), a display unit 740, and the like.
Wherein the storage unit stores program code that can be executed by the processing unit 710 to cause the processing unit 710 to perform the steps according to various exemplary embodiments of the present disclosure in the present specification. For example, the processing unit 710 may perform the steps as shown in fig. 2, 3, 4.
The memory unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The memory unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 730 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 700' (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet) via the network adapter 760. The network adapter 760 may communicate with other modules of the electronic device 700 via the bus 730. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 8, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions: acquiring behavior data of a current user, wherein the behavior data comprise at least one historical behavior and the behavior time corresponding to each historical behavior; performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories, and determining at least one behavior tag of the current user; inputting the at least one behavior and its corresponding behavior tag into a risk identification model to calculate a risk score; and when the risk score is greater than a threshold value, determining the current user to be a risk user.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus according to the description of the embodiments, or may be modified accordingly in one or more apparatuses unique from the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for identifying an at-risk user, comprising:
acquiring behavior data of a current user, wherein the behavior data comprises at least one historical behavior and behavior time corresponding to the historical behavior;
performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories, and determining at least one behavior tag of the current user;
inputting the at least one behavior and the behavior tag corresponding to the at least one behavior into a risk identification model to calculate a risk score;
and when the risk score is larger than a threshold value, determining the current user as a risk user.
2. The identification method of claim 1, further comprising:
when the current user is a risk user, determining a user strategy for the current user;
based on the user policy, limiting the use permission of the current user; and/or
and sending secondary verification information to the current user based on the user policy.
3. The identification method of any of claims 1-2, further comprising:
generating the plurality of behavior sets through behavior data of historical users and a clustering algorithm;
and respectively allocating behavior labels to the behavior sets.
4. The identification method of any of claims 1-3, further comprising:
respectively inputting the behavior sets and the behavior labels corresponding to the behavior sets into a hidden Markov model for training;
generating the risk identification model upon convergence of model parameters.
5. The identification method of any of claims 1-4, wherein generating the plurality of behavior sets by behavior data of historical users and a clustering algorithm comprises:
acquiring a plurality of historical behaviors in behavior data of a historical user and corresponding behavior time;
clustering a plurality of historical behavior data by a clustering algorithm to generate the plurality of behavior sets.
6. The identification method of any of claims 1-5, wherein generating the plurality of behavior sets by behavior data of historical users and a clustering algorithm comprises:
generating the plurality of behavior sets from the behavior data of historical users using a data stream clustering algorithm; and/or
generating the plurality of behavior sets from the behavior data of historical users using a grid clustering algorithm; and/or
generating the plurality of behavior sets from the behavior data of historical users using an automatic subspace clustering algorithm.
7. The identification method of any of claims 1-6, wherein assigning behavior labels to the plurality of behavior sets, respectively, comprises:
acquiring behavior data in the plurality of behavior sets;
extracting preset behaviors from the behavior data;
and allocating the behavior labels to the behavior set based on the number of preset behaviors in the behavior set.
8. An apparatus for identifying an at-risk user, comprising:
the behavior module is used for acquiring behavior data of a current user, wherein the behavior data comprises at least one historical behavior and behavior time corresponding to the historical behavior;
the assignment module is used for performing a similarity comparison between the at least one historical behavior and a plurality of behavior sets of preset categories to determine at least one behavior tag of the current user;
the calculation module is used for inputting the at least one behavior and the behavior tag corresponding to the behavior into a risk identification model to calculate a risk score;
and the comparison module is used for determining the current user as a risk user when the risk score is larger than a threshold value.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202011134322.7A 2020-10-21 2020-10-21 Risk user identification method and device and electronic equipment Pending CN112348321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011134322.7A CN112348321A (en) 2020-10-21 2020-10-21 Risk user identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112348321A true CN112348321A (en) 2021-02-09

Family

ID=74359598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011134322.7A Pending CN112348321A (en) 2020-10-21 2020-10-21 Risk user identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112348321A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402921A (en) * 2016-05-18 2017-11-28 阿里巴巴集团控股有限公司 Identify event-order serie data processing method, the apparatus and system of user behavior
CN108830603A (en) * 2018-07-03 2018-11-16 成都四方伟业软件股份有限公司 transaction identification method and device
CN111767736A (en) * 2019-03-27 2020-10-13 阿里巴巴集团控股有限公司 Event processing and data processing method, device, system and storage medium
CN110633991A (en) * 2019-09-20 2019-12-31 阿里巴巴集团控股有限公司 Risk identification method and device and electronic equipment
CN111798244A (en) * 2020-06-30 2020-10-20 中国工商银行股份有限公司 Transaction fraud monitoring method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950009A (en) * 2021-02-10 2021-06-11 北京淇瑀信息科技有限公司 Resource quota allocation method and device and electronic equipment
CN113177795A (en) * 2021-06-10 2021-07-27 支付宝(杭州)信息技术有限公司 Identity recognition method, device, equipment and medium
CN113554281A (en) * 2021-07-02 2021-10-26 北京淇瑀信息科技有限公司 Grid-based user business risk analysis method and device and electronic equipment
CN113923006A (en) * 2021-09-30 2022-01-11 北京淇瑀信息科技有限公司 Equipment data authentication method and device and electronic equipment
CN113923006B (en) * 2021-09-30 2024-02-02 北京淇瑀信息科技有限公司 Equipment data authentication method and device and electronic equipment
CN116070916A (en) * 2023-03-06 2023-05-05 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN112348321A (en) Risk user identification method and device and electronic equipment
CN108596616B (en) User data authenticity analysis method and device, storage medium and electronic equipment
CN112348660A (en) Method and device for generating risk warning information and electronic equipment
US20230023630A1 (en) Creating predictor variables for prediction models from unstructured data using natural language processing
CN111583018A (en) Credit granting strategy management method and device based on user financial performance analysis and electronic equipment
CN112017062B (en) Resource quota distribution method and device based on guest group subdivision and electronic equipment
CN112348662B (en) Risk assessment method and device based on user occupation prediction and electronic equipment
CN111582645B (en) APP risk assessment method and device based on factoring machine and electronic equipment
CN111179051A (en) Financial target customer determination method and device and electronic equipment
CN111190967B (en) User multidimensional data processing method and device and electronic equipment
CN111178687A (en) Financial risk classification method and device and electronic equipment
CN112348659A (en) User risk identification strategy allocation method and device and electronic equipment
CN112016792A (en) User resource quota determining method and device and electronic equipment
CN111967543A (en) User resource quota determining method and device and electronic equipment
CN111191677B (en) User characteristic data generation method and device and electronic equipment
CN110348999B (en) Financial risk sensitive user identification method and device and electronic equipment
CN116542781A (en) Task allocation method, device, computer equipment and storage medium
CN113610625A (en) Overdue risk warning method and device and electronic equipment
CN113610366A (en) Risk warning generation method and device and electronic equipment
CN113762973A (en) Data processing method and device, computer readable medium and electronic equipment
CN112348658A (en) Resource allocation method and device and electronic equipment
CN114742645B (en) User security level identification method and device based on multi-stage time sequence multitask
CN113570207B (en) User policy allocation method and device and electronic equipment
CN113568739A (en) User resource limit distribution method and device and electronic equipment
CN112348661B (en) Service policy distribution method and device based on user behavior track and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
  Country or region after: China
  Address after: Room 1109, No. 4, Lane 800, Tongpu Road, Putuo District, Shanghai, 200062
  Applicant after: Shanghai Qiyue Information Technology Co.,Ltd.
  Address before: Room a2-8914, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai, 201500
  Applicant before: Shanghai Qiyue Information Technology Co.,Ltd.
  Country or region before: China