CN113610366B - Risk warning generation method and device and electronic equipment - Google Patents

Info

Publication number
CN113610366B
Authority
CN
China
Prior art keywords
risk
information
user
historical
user information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110836040.XA
Other languages
Chinese (zh)
Other versions
CN113610366A (en
Inventor
李心宇
聂婷婷
沈赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present disclosure relates to a risk warning generation method, apparatus, electronic device, and computer-readable medium. The method comprises: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and a feature policy; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; and generating risk warning information when the at least one risk score satisfies a preset policy. The method and apparatus can address the overfitting caused by oversampling or undersampling during machine learning model training, yield an accurate computation model, and thereby quickly identify users with financial risk and improve the security of user resource allocation.

Description

Risk warning generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a risk warning generation method, apparatus, electronic device, and computer readable medium.
Background
Individual or enterprise users often borrow resources from resource service institutions, and such borrowing may pose a risk to the institution. In practical risk control, it is often necessary and valuable to anticipate risks and prepare corresponding measures in advance. At present, resource risk is usually assessed by analyzing a user's basic information and behavior information. Different risk types call for different risk control measures. Taking malicious default as an example, the behavior and characteristics of malicious-default users can be observed from known cases; if those characteristics are modeled as variables and strategies, they can contribute positively to risk prevention and control.
For example, when identifying fraudulent users, the characteristics of fraudulent users may be learned by a predictive model and then used to discover new fraudulent users. However, when training such models, practitioners find that the labels for fraudulent users are themselves not very accurate. Labeling fraudulent users relies largely on manual review and after-the-fact investigation, so many fraudulent users go unrecognized; that is, the users labeled as having no fraudulent behavior may include genuinely non-fraudulent clients as well as fraudulent clients that manual review and investigation failed to find. Inaccurate sample labels cause a problem in training. For classification problems, the labels are typically one-hot encoded and cross entropy is used as the loss function for fitting, which drives the gap between the class a sample belongs to and the classes it does not belong to as large as possible. The gradient is bounded, and training in this way makes the model overconfident in its predicted class, which easily leads to overfitting, unstable model behavior, and poor robustness.
The above information disclosed in the background section is only for enhancement of understanding of the background of the disclosure and therefore it may include information that does not form the prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the disclosure provides a risk warning generation method, apparatus, electronic device, and computer readable medium, which can address the overfitting caused by oversampling or undersampling during machine learning model training, obtain an accurate computation model, and thereby quickly identify users with financial risk and improve the security of user resource allocation.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the present disclosure, there is provided a risk warning generation method including: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and a feature policy; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; and generating risk warning information when the at least one risk score satisfies a preset policy.
Optionally, the method further comprises: acquiring multidimensional feature information of a plurality of historical users; assigning sample tags to the plurality of historical users respectively based on the multidimensional feature information; determining tag parameters for the sample tags based on a regularization strategy; and training a machine learning model based on the plurality of historical users and their corresponding sample tags and tag parameters to generate the risk model.
Optionally, the risk model includes a plurality of sub-risk models, and inputting the multidimensional feature information into the risk model to generate at least one risk score includes: determining at least one sub-risk model from the plurality of sub-risk models of the risk model according to the user information; and inputting the multidimensional feature information into the at least one sub-risk model to generate the at least one risk score.
Optionally, when the at least one risk score meets a preset policy, generating risk warning information includes: randomly combining the at least one risk score to generate at least one joint score; and when the at least one joint score meets a preset strategy, generating the risk warning information.
Optionally, acquiring multidimensional feature information of a plurality of historical users includes: acquiring a plurality of pieces of historical user information meeting preset conditions; performing data cleaning and data fusion on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information; determining a plurality of pieces of historical multidimensional feature information from the plurality of pieces of historical feature information; and generating a feature policy based on a relationship between the historical multidimensional feature information and the historical user information.
Optionally, assigning sample tags to the plurality of historical users respectively based on the user information includes: comparing the user information of each historical user with a plurality of discrimination strategies; and assigning a sample tag to the historical user based on the discrimination strategy satisfied by the user information, wherein the sample tags are expressed as discrete positive integers.
Optionally, determining tag parameters for the sample tags based on a regularization strategy includes: generating a deviation coefficient based on the regularization strategy; and generating the tag parameters of the sample tags based on the deviation coefficient.
Optionally, training a machine learning model based on the plurality of historical users and their corresponding sample tags and tag parameters to generate the risk model includes: inputting the plurality of historical users, together with their sample tags and tag parameters, into the machine learning model for training; generating a cross entropy loss function based on the tag parameters during training; and when the cross entropy loss function reaches an optimal solution, determining the risk model based on the model parameters of the current machine learning model.
Optionally, obtaining the optimal solution of the cross entropy loss function includes: solving the cross entropy loss function by gradient descent; and taking the stable solution of the cross entropy loss function as the optimal solution.
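The training step described above can be illustrated with a minimal sketch. It assumes a plain softmax (multinomial logistic) classifier as the machine learning model, which the disclosure does not specify, and it treats "stable solution" as the point where the mean loss changes by less than a tolerance between iterations. The names train, predict, lr, tol and max_iter are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def cross_entropy(target, probs):
    """Cross entropy between a (possibly smoothed) target and predictions."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def logits_for(W, b, x):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(b))]

def train(X, Y, num_classes, lr=0.5, tol=1e-6, max_iter=5000):
    """Batch gradient descent on the mean cross entropy loss.

    X: feature vectors; Y: smoothed label vectors (the tag parameters).
    Stops once the loss is stable, i.e. changes by less than tol.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(num_classes)]
    b = [0.0] * num_classes
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iter):
        grad_W = [[0.0] * d for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        loss = 0.0
        for x, y in zip(X, Y):
            p = softmax(logits_for(W, b, x))
            loss += cross_entropy(y, p)
            for k in range(num_classes):
                diff = p[k] - y[k]  # gradient of the loss w.r.t. logit k
                grad_b[k] += diff
                for j in range(d):
                    grad_W[k][j] += diff * x[j]
        loss /= n
        if abs(prev_loss - loss) < tol:
            break  # loss is stable: take the current parameters as the solution
        prev_loss = loss
        for k in range(num_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(d):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b, loss

def predict(W, b, x):
    """Return the zero-based index of the most probable class."""
    p = softmax(logits_for(W, b, x))
    return p.index(max(p))
```

With smoothed targets, the loss cannot fall below the entropy of the targets, so the optimizer has no incentive to push the logits toward infinity, which is the overconfidence the background section describes.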
According to an aspect of the present disclosure, there is provided a risk warning generation apparatus including: an information module for acquiring user information of a user, wherein the user information comprises basic information and behavior information; a feature module for generating multidimensional feature information based on the user information and a feature policy; a scoring module for inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; and a warning module for generating risk warning information when the at least one risk score satisfies a preset policy.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods as described above.
According to an aspect of the present disclosure, a computer-readable medium is presented, on which a computer program is stored, which program, when being executed by a processor, implements a method as described above.
According to the risk warning generation method, apparatus, electronic device, and computer readable medium of the present disclosure, user information of a user is acquired, wherein the user information comprises basic information and behavior information; multidimensional feature information is generated based on the user information and a feature policy; the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; and risk warning information is generated when the at least one risk score satisfies a preset policy. In this way, the overfitting caused by oversampling or undersampling during machine learning model training can be addressed, an accurate computation model is obtained, users with financial risk can be quickly identified, and the security of user resource allocation is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely examples of the present disclosure and other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a system block diagram illustrating a method and apparatus for risk alert generation in accordance with an exemplary embodiment.
Fig. 2 is a flow chart illustrating a risk alert generation method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment.
Fig. 4 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment.
Fig. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram of an electronic device, according to an example embodiment.
Fig. 7 is a block diagram of a computer-readable medium shown according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another element. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the concepts of the present disclosure. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the present disclosure, and therefore, should not be taken to limit the scope of the present disclosure.
In this disclosure, a resource refers to any substance, information, or time that can be utilized; information resources include computing resources and various types of data resources, and the data resources include dedicated data in various fields. The innovation of the present disclosure lies in using information interaction between a server and a client to make the generation of risk warning information more automated and more efficient and to reduce labor costs. By its nature, the present disclosure is therefore applicable to the allocation of various types of resources, including physical goods, water, electricity, and meaningful data. For convenience, resource allocation is described in this disclosure using financial data resources as an example, but those skilled in the art will appreciate that this disclosure may also be used for the allocation of other resources.
FIG. 1 is a system block diagram illustrating a method and apparatus for risk alert generation in accordance with an exemplary embodiment.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as financial service class applications, shopping class applications, web browser applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server providing support for financial service-like websites browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze the received user data and feed back the processing result (e.g., risk warning information) to an administrator of the financial service website.
The server 105 may, for example, acquire user information of the user, the user information including basic information and behavior information; the server 105 may, for example, generate multidimensional feature information based on the user information and a feature policy; the server 105 may, for example, input the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample tags by means of a regularization strategy according to their corresponding user information; the server 105 may, for example, generate risk warning information when the at least one risk score satisfies a preset policy.
The server 105 may also, for example, acquire multidimensional feature information of a plurality of historical users; assign sample tags to the plurality of historical users respectively based on the multidimensional feature information; determine tag parameters for the sample tags based on a regularization strategy; and train a machine learning model based on the plurality of historical users and their corresponding sample tags and tag parameters to generate the risk model.
The server 105 may also, for example, acquire a plurality of pieces of history user information satisfying a preset condition; performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information; determining a plurality of historical multidimensional feature information from the plurality of historical feature information; a feature policy is generated based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
The server 105 may also, for example, deploy the trained risk model and the preset policy on the terminal devices 101, 102, 103, so that the terminal devices 101, 102, 103 generate multidimensional feature information based on the user information and the feature policy, and input the multidimensional feature information into the risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; when the at least one risk score satisfies the preset policy, the terminal devices 101, 102, 103 generate risk warning information and send it to the server 105.
The server 105 may be a single physical server or may consist of multiple servers. For example, part of the server 105 may serve as the risk warning system of the present disclosure, for generating risk warning information; part of the server 105 may serve as the predictive strategy generation system of the present disclosure, for generating preset strategies; and part of the server 105 may serve as the model training system of the present disclosure, for training a machine learning model based on the plurality of historical users and their corresponding sample tags and tag parameters to generate the risk model.
It should be noted that the risk warning generation method provided by the embodiments of the present disclosure may be performed by the server 105 and/or the terminal devices 101, 102, 103, and accordingly the risk warning generation apparatus may be provided in the server 105 and/or the terminal devices 101, 102, 103. The web page through which the user browses the financial service platform is generally located on the terminal devices 101, 102, 103.
Fig. 2 is a flow chart illustrating a risk alert generation method according to an exemplary embodiment. The risk alert generation method 20 includes at least steps S202 to S208.
As shown in fig. 2, in S202, user information of a user is acquired, the user information including basic information and behavior information. In the embodiments of the disclosure, the user may be an individual user or an enterprise user, and the allocation of the resource quota may be an adjustment of a financial resource quota, or an allocation of electric power resources or water resources. The basic information may be, for example, service account information, page operation data of the user, service access duration of the user, service access frequency of the user, terminal device identification information of the user, and information on the region where the user is located; it may be determined according to the actual application scenario and is not limited herein. The behavior information may be, for example, page operation data of the user, service access duration of the user, service access frequency of the user, and the like; its specific content may likewise be determined according to the actual application scenario and is not limited herein. More specifically, the user information of the current user can be acquired, with the user's authorization, through tracking points embedded in the web page.
More specifically, the user's behavior information on a web page can be obtained with the Fiddler tool. Fiddler works as a web proxy server: after the client sends request data, the Fiddler proxy intercepts the packet and then forwards the data to the server on the client's behalf; likewise, when the server returns response data, the proxy intercepts it and passes it back to the client. Browsing data such as dwell time, pages visited, and click operations can be obtained through Fiddler.
In S204, multidimensional feature information is generated based on the user information and feature policies. Feature policies may be generated, for example, based on relationships between the plurality of historical multi-dimensional feature information and the historical user information.
Data cleaning and data fusion can be performed on the user information to convert it into multidimensional data. More specifically, the user information can undergo missing-rate analysis and handling for each variable, as well as outlier handling; it can also undergo discretization of continuous variables followed by WOE conversion, WOE conversion of discrete variables, text variable processing, word2vec processing of text variables, and the like.
Here WOE stands for "Weight of Evidence". WOE is an encoded form of the original feature. To WOE-encode a feature, the variable must first be binned (grouped). Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. A word2vec model maps each word to a vector, and the vectors can be used to represent relationships between words.
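As an illustration of the binning and WOE encoding just described, the following is a minimal sketch. It assumes the common convention WOE = ln(share of non-risky users in the bin / share of risky users in the bin); the disclosure does not state which sign convention it uses. The function name woe_encode, the bin_edges argument, and the additive smoothing term eps are assumptions introduced for the example.

```python
import math
from collections import defaultdict

def woe_encode(values, labels, bin_edges, eps=0.5):
    """Return {bin_index: WOE} for one continuous feature.

    values    : raw feature values
    labels    : 1 = risky ("bad") user, 0 = non-risky ("good") user
    bin_edges : ascending cut points; a value falls in bin k, where k is
                the number of edges less than or equal to the value
    eps       : additive smoothing so an empty cell never yields log(0)
    """
    def bin_of(v):
        return sum(1 for edge in bin_edges if v >= edge)

    good = defaultdict(float)
    bad = defaultdict(float)
    for v, y in zip(values, labels):
        counts = bad if y == 1 else good
        counts[bin_of(v)] += 1

    n_bins = len(bin_edges) + 1
    good_total = sum(good.values()) + eps * n_bins
    bad_total = sum(bad.values()) + eps * n_bins

    woe = {}
    for k in range(n_bins):
        good_share = (good[k] + eps) / good_total
        bad_share = (bad[k] + eps) / bad_total
        woe[k] = math.log(good_share / bad_share)
    return woe
```

Under this convention a positive WOE marks a bin dominated by non-risky users and a negative WOE marks a bin dominated by risky users; each raw value is then replaced by the WOE of its bin before being passed to the model.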
In S206, the multidimensional feature information is input into a risk model to generate at least one risk score, where the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information.
In one embodiment, the risk model includes a plurality of sub-risk models, and inputting the multidimensional feature information into the risk model to generate at least one risk score includes: determining at least one sub-risk model from the plurality of sub-risk models of the risk model according to the user information; and inputting the multidimensional feature information into the at least one sub-risk model to generate the at least one risk score.
More specifically, each sub-risk model may represent the user's risk in one respect. For example, sub-risk model A may represent the risk that the user returns the resource overdue; sub-risk model B may represent the risk that the user does not plan for resources; and sub-risk model C may represent the risk of intentional fraud by the user.
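The selection of sub-risk models according to user information might look like the sketch below. Everything concrete in it is hypothetical: the routing rule in select_sub_models, the field names (user_type, new_device, late_payments, and so on), and the three stand-in scoring functions, which merely take the place of trained sub-risk models A, B and C.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Hypothetical stand-ins for trained sub-risk models; each returns a score in [0, 1].
def overdue_model(f: Features) -> float:
    return min(1.0, f.get("late_payments", 0.0) / 10.0)

def planning_model(f: Features) -> float:
    return min(1.0, f.get("debt_to_income", 0.0))

def fraud_model(f: Features) -> float:
    return min(1.0, f.get("device_changes", 0.0) / 5.0)

SUB_MODELS: Dict[str, Callable[[Features], float]] = {
    "A": overdue_model,   # risk of overdue return
    "B": planning_model,  # risk related to resource planning
    "C": fraud_model,     # risk of intentional fraud
}

def select_sub_models(user_info: dict) -> List[str]:
    """Hypothetical routing rule: pick sub-models from the user information."""
    names = ["A"]  # assume every user is scored for overdue risk
    if user_info.get("user_type") == "enterprise":
        names.append("B")
    if user_info.get("new_device"):
        names.append("C")
    return names

def score_user(user_info: dict, features: Features) -> Dict[str, float]:
    """Run only the selected sub-models and return one score per sub-model."""
    return {name: SUB_MODELS[name](features) for name in select_sub_models(user_info)}
```

For instance, score_user({"user_type": "individual", "new_device": True}, {"late_payments": 5.0}) runs only sub-models A and C under these assumed rules.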
In S208, risk warning information is generated when the at least one risk score satisfies a preset policy. For example, the at least one risk score may be randomly combined to generate at least one joint score, and the risk warning information is generated when the at least one joint score satisfies the preset policy.
For example, the risk scores may be combined, and the combined value compared against the preset policy to determine whether to generate risk warning information. More specifically, it may be determined to generate warning information when risk score A is greater than 0.5 and risk score B is greater than 0.3; it may also be determined to generate warning information when, for example, risk score C is greater than 0.8.
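The example policy above can be written down directly. The sketch below uses the thresholds given in the text (A greater than 0.5 together with B greater than 0.3, or C greater than 0.8); the function name should_warn and the choice to treat a missing score as not satisfying its rule are assumptions.

```python
from typing import Dict

def should_warn(scores: Dict[str, float]) -> bool:
    """Apply the example preset policy to the available risk scores.

    A score that was not computed (its sub-model was not selected) is
    treated as not satisfying the rule that refers to it.
    """
    a = scores.get("A")
    b = scores.get("B")
    c = scores.get("C")
    rule_ab = a is not None and b is not None and a > 0.5 and b > 0.3
    rule_c = c is not None and c > 0.8
    return rule_ab or rule_c
```

Keeping each rule as a separate boolean makes it easy to report which rule triggered the warning, should the warning information need to carry that detail.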
According to the risk warning generation method of the present disclosure, user information of a user is acquired, wherein the user information comprises basic information and behavior information; multidimensional feature information is generated based on the user information and a feature policy; the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels by means of a regularization strategy according to their corresponding user information; and risk warning information is generated when the at least one risk score satisfies a preset policy. In this way, the overfitting caused by oversampling or undersampling during machine learning model training can be addressed, an accurate computation model is obtained, users with financial risk can be quickly identified, and the security of user resource allocation is improved.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 3 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment. The flow 30 shown in fig. 3 is a complementary description of the flow shown in fig. 2.
As shown in fig. 3, in S302, multidimensional feature information of a plurality of historical users is acquired. The multidimensional feature information may be generated from the user information of the plurality of historical users based on a preset policy.
In S304, sample labels are assigned to the plurality of historical users based on the multidimensional feature information. This step includes: comparing the user information of each historical user with a plurality of discrimination strategies; and assigning a sample label to the historical user based on the discrimination strategy that the user information satisfies, wherein the sample labels are expressed as discrete positive integers.
The number of sample labels can be determined according to the number of sub-models in the risk model to be trained. For example, the risk sub-models may be A, B and C, and the corresponding labels may be the numbers 1, 2, 3 and 4, where label 1 represents risk A, label 2 represents risk B, label 3 represents risk C, and label 4 represents no risk.
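A minimal sketch of assigning the integer labels described above is given below. The field names (`days_overdue`, `unplanned_spend_ratio`, `fraud_flags`) and their thresholds are hypothetical placeholders for the discrimination strategies, and the first-match-wins ordering is an assumption, since the text does not say how a user satisfying several strategies is handled.

```python
from typing import Callable, Dict, List, Tuple

UserInfo = Dict[str, float]

# (label, predicate) pairs; every field name and threshold is hypothetical.
DISCRIMINATION_STRATEGIES: List[Tuple[int, Callable[[UserInfo], bool]]] = [
    (1, lambda u: u.get("days_overdue", 0) > 30),             # risk A
    (2, lambda u: u.get("unplanned_spend_ratio", 0.0) > 0.8),  # risk B
    (3, lambda u: u.get("fraud_flags", 0) > 0),                # risk C
]
NO_RISK_LABEL = 4  # label 4 represents "no risk"


def assign_label(user: UserInfo) -> int:
    """Return the label of the first strategy the user satisfies, else 4."""
    for label, predicate in DISCRIMINATION_STRATEGIES:
        if predicate(user):
            return label
    return NO_RISK_LABEL
```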
In S306, label parameters are determined for the sample labels based on a regularization strategy. This step includes: generating a deviation coefficient based on the regularization strategy; and generating the label parameters of the sample labels based on the deviation coefficient. The label values determined above may be smoothed so that each label takes the form of a set of probability values, in which the probability at the true label is the largest and the probabilities at the other positions are very small. This increases the distance between different classes during training, reduces the distance within each class, reduces overfitting, and improves the robustness of prediction.
For example, the deviation coefficient may be

$$q_i = \begin{cases} 1-\varepsilon, & i = y \\ \dfrac{\varepsilon}{K-1}, & i \neq y \end{cases}$$

where $\varepsilon$ is a hyperparameter, $K$ is the total number of classes (that is, the number of sub-models in the present application), $i$ denotes one of the classes, and $y$ denotes the true label. The formula means that the probability assigned to class $i$ is $1-\varepsilon$ when $i$ is the true label and $\varepsilon/(K-1)$ when it is not.
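The smoothing step can be illustrated with a short sketch. It assumes that the true class receives probability 1 - epsilon and that the remaining mass epsilon is spread evenly over the other K - 1 classes; the function name `smooth_label` and the default epsilon of 0.1 are illustrative choices, not values taken from the disclosure.

```python
from typing import List


def smooth_label(true_label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Turn an integer label in 1..K into a smoothed probability vector.

    The true class gets 1 - epsilon; each other class gets epsilon / (K - 1)
    (assumed even split), so the vector sums to 1.
    """
    if num_classes < 2:
        raise ValueError("at least two classes are required")
    if not 1 <= true_label <= num_classes:
        raise ValueError("true_label must lie in 1..num_classes")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    off_value = epsilon / (num_classes - 1)
    return [
        1.0 - epsilon if k == true_label else off_value
        for k in range(1, num_classes + 1)
    ]
```

For the four labels of the example (risks A, B, C and no risk), `smooth_label(2, 4)` places 0.9 on the second position and one third of 0.1 on each of the other three.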
In S308, a machine learning model is trained based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model. For example, the plurality of historical users, together with their sample labels and label parameters, may be input into the machine learning model for training; a cross entropy loss function is generated based on the label parameters during training; and when the cross entropy loss function reaches an optimal solution, the risk model is determined based on the model parameters of the current machine learning model.
In one embodiment, the cross entropy loss function may be solved, for example, by gradient descent, and the stable solution of the cross entropy loss function is taken as the optimal solution.
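The following sketch shows one way the training described above could look, under stated assumptions: the model is a plain softmax (multinomial logistic) classifier, the targets are smoothed label vectors, and the "stable solution" is taken to mean that the mean loss changes by less than a tolerance between iterations. The function names, learning rate and tolerance are all illustrative.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross entropy between a (smoothed) target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def train_soft_label_classifier(
    features: List[List[float]],
    targets: List[List[float]],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 5000,
) -> Tuple[List[List[float]], float]:
    """Batch gradient descent; stops once the mean loss is stable."""
    n, n_feat, n_cls = len(features), len(features[0]), len(targets[0])
    weights = [[0.0] * n_feat for _ in range(n_cls)]
    prev_loss, loss = float("inf"), float("inf")
    for _ in range(max_iter):
        grad = [[0.0] * n_feat for _ in range(n_cls)]
        loss = 0.0
        for x, q in zip(features, targets):
            p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
            loss += cross_entropy(q, p)
            for k in range(n_cls):
                for j in range(n_feat):
                    # d(loss)/d(logit_k) = p_k - q_k for softmax + cross entropy
                    grad[k][j] += (p[k] - q[k]) * x[j]
        loss /= n
        if abs(prev_loss - loss) < tol:  # loss is stable: treat as optimal
            break
        prev_loss = loss
        for k in range(n_cls):
            for j in range(n_feat):
                weights[k][j] -= lr * grad[k][j] / n
    return weights, loss
```

Because the targets are smoothed, the loss cannot fall below the entropy of the targets, which is one way to see why smoothing discourages the model from becoming over-confident.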
Specifically, a sub-model is constructed for the sample set of each label. The user information of each historical user in the sample set is input into the sub-model to obtain a predicted label, and each predicted label is compared with the corresponding true label. The number of predicted labels that agree with the true labels is counted, and its proportion among all predicted labels is calculated. If the proportion is greater than or equal to a preset proportion, the sub-model has converged and the trained sub-model is obtained. If the proportion is less than the preset proportion, the parameters of the sub-model are adjusted and the labels are predicted again with the adjusted sub-model, until the proportion is greater than or equal to the preset proportion. The parameters may be adjusted using stochastic gradient descent, gradient descent, or the normal equation.
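The convergence check described above can be sketched generically. Here `predict` and `adjust` are hypothetical callables standing in for the sub-model and its parameter update, and `max_rounds` is an assumed cap so that the caller can detect when the number of adjustments has been exhausted.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # model parameters
X = TypeVar("X")  # one sample


def agreement_ratio(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Share of predicted labels that agree with the true labels."""
    if not actual:
        raise ValueError("no samples to compare")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def train_until_ratio(
    params: P,
    predict: Callable[[P, X], int],
    adjust: Callable[[P, List[X], List[int]], P],
    samples: List[X],
    labels: List[int],
    preset_ratio: float = 0.9,
    max_rounds: int = 100,
) -> Tuple[P, float, int]:
    """Adjust parameters until the agreement ratio reaches preset_ratio.

    Returns (parameters, final ratio, number of adjustments performed).
    """
    rounds = 0
    while True:
        predicted = [predict(params, x) for x in samples]
        ratio = agreement_ratio(predicted, labels)
        if ratio >= preset_ratio or rounds >= max_rounds:
            return params, ratio, rounds
        params = adjust(params, samples, labels)
        rounds += 1
```

If the returned ratio is still below `preset_ratio`, the adjustment budget was used up, which is the point at which a different model type could be tried.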
In the present application, the machine learning model may be a classification model, specifically one or a combination of classification algorithms such as logistic regression, naive Bayes, decision trees, support vector machines, random forests, and gradient boosting trees. If the number of parameter adjustments exceeds a preset number, the type of machine learning model used to construct the model may be replaced to improve model training efficiency.
According to the risk warning generation method, the label is subjected to smoothing processing, so that the model classification hyperplane is not close to the original data, the weight of the class probability of the real label when the loss value is calculated is reduced, and the weight of the prediction probability of other classes in the final loss function is increased. Therefore, the difference between the probability of the real category and the probability average value of other categories is reduced, excessive confidence of the model is reduced, and the risk user is effectively identified.
Fig. 4 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment. The flow 40 shown in fig. 4 is a detailed description of "generate preset policy".
As shown in fig. 4, in S402, a plurality of pieces of historical user information satisfying a preset condition are acquired. In this embodiment, the borrowing of financial resources is taken as an example; it can be understood that the method of the present application can also be applied to other allocation scenarios. Based on real business data of a financial service platform, historical users whose repayment performance over three periods shows an overdue of 30+ days (MOB3+) are defined as the target samples for modeling; index analyses such as vintage and migration (roll-rate) analysis show that the proportion of overdue samples is less than 5%.
In S404, data cleaning and data fusion are performed on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information. After the information is fused into a wide table with on the order of ten thousand variable dimensions, the data must be further cleaned to ensure the stability and accuracy of the subsequent model. The data cleaning steps include, but are not limited to, variable missing-rate analysis and processing, outlier processing, discretization and weight-of-evidence (WOE) conversion of continuous variables, WOE conversion of discrete variables, and text variable processing.
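As an illustration of the WOE conversion mentioned above, the sketch below computes a weight-of-evidence value for each category of a discrete variable. It assumes the convention WOE = ln(share of good samples / share of bad samples), where a target of 1 marks an overdue (bad) sample, and adds a small smoothing constant to avoid division by zero; both the convention and the constant are assumptions rather than details given in the text.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def woe_table(
    values: Sequence[Hashable],
    targets: Sequence[int],
    smoothing: float = 0.5,
) -> Dict[Hashable, float]:
    """Map each category to ln(good share / bad share).

    targets: 1 for a bad (overdue) sample, 0 for a good one.
    smoothing: pseudo-count added to every cell (assumed value).
    """
    bad = Counter(v for v, t in zip(values, targets) if t == 1)
    good = Counter(v for v, t in zip(values, targets) if t == 0)
    categories = set(values)
    n_cat = len(categories)
    total_bad = sum(bad.values()) + smoothing * n_cat
    total_good = sum(good.values()) + smoothing * n_cat
    table: Dict[Hashable, float] = {}
    for category in categories:
        bad_share = (bad[category] + smoothing) / total_bad
        good_share = (good[category] + smoothing) / total_good
        table[category] = math.log(good_share / bad_share)
    return table
```

A continuous variable would first be discretized into bins, after which the bin identifiers can be passed as `values` in the same way.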
In S406, a plurality of historical multi-dimensional feature information is determined from the plurality of historical feature information. Variable parameters, distinguishing degree parameters, information values and model characteristic parameters of the plurality of historical characteristic information can be calculated; and extracting a plurality of historical multidimensional feature information from the plurality of historical feature information based on the variable parameter, the distinguishing degree parameter, the information value and the model feature parameter.
Features with high coverage and a clear ability to discriminate the target variable may be selected as the multidimensional features by jointly considering variable coverage, single-value coverage, correlation with and significance for the target variable, the degree of discrimination (KS) and information value (IV) with respect to the target variable, and the feature importance given by tree models (such as XGBoost and RF).
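Two of the screening indicators named above, IV and KS, can be computed as in the following sketch. The definitions used are the common credit-scoring ones (IV summed over bins of a variable, KS as the largest gap between the cumulative distributions of bad and good samples); the function names and the smoothing constant are illustrative assumptions.

```python
import math
from typing import Sequence


def information_value(
    good_counts: Sequence[int],
    bad_counts: Sequence[int],
    smoothing: float = 0.5,
) -> float:
    """IV = sum over bins of (good share - bad share) * ln(good share / bad share)."""
    n_bins = len(good_counts)
    total_good = sum(good_counts) + smoothing * n_bins
    total_bad = sum(bad_counts) + smoothing * n_bins
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        good_share = (g + smoothing) / total_good
        bad_share = (b + smoothing) / total_bad
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


def ks_statistic(values: Sequence[float], targets: Sequence[int]) -> float:
    """Largest gap between the cumulative bad and good distributions.

    targets: 1 for a bad (overdue) sample, 0 for a good one.
    """
    total_bad = sum(targets)
    total_good = len(targets) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(values, targets))
    cum_bad = cum_good = 0
    ks = 0.0
    for i, (value, target) in enumerate(pairs):
        if target == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare the distributions only after a full run of tied values.
        if i + 1 == len(pairs) or pairs[i + 1][0] != value:
            ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks
```

Features could then be ranked by these values and kept when they exceed thresholds chosen for the business scenario; the text does not specify such thresholds.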
In S408, a feature policy is generated based on the relationships between the plurality of historical multi-dimensional feature information and the historical user information.
Those skilled in the art will appreciate that all or part of the steps implementing the above described embodiments are implemented as a computer program executed by a CPU. The above-described functions defined by the above-described methods provided by the present disclosure are performed when the computer program is executed by a CPU. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk or an optical disk, etc.
Furthermore, it should be noted that the above-described figures are merely illustrative of the processes involved in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment. As shown in fig. 5, the risk warning generation device 50 includes: information module 502, feature module 504, scoring module 506, and warning module 508.
The information module 502 is configured to obtain user information of a user, where the user information includes basic information and behavior information;
The feature module 504 is configured to generate multidimensional feature information based on the user information and feature policies;
The scoring module 506 is configured to input the multidimensional feature information into a risk model to generate at least one risk score, where the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users according to their corresponding user information by means of a regularization strategy;
The warning module 508 is configured to generate risk warning information when the at least one risk score satisfies a preset policy.
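The cooperation of the four modules can be sketched as a simple pipeline. The class name `RiskWarningDevice`, its field names and the string format of the warning are hypothetical; each module is represented by a plain callable so that any concrete implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RiskWarningDevice:
    """Hypothetical composition of modules 502, 504, 506 and 508."""

    get_user_info: Callable[[str], Dict[str, float]]            # information module 502
    build_features: Callable[[Dict[str, float]], List[float]]   # feature module 504
    score: Callable[[List[float]], Dict[str, float]]            # scoring module 506
    policy: Callable[[Dict[str, float]], bool]                  # preset policy used by warning module 508

    def run(self, user_id: str) -> Optional[str]:
        """Return a warning message, or None when the policy is not met."""
        info = self.get_user_info(user_id)
        features = self.build_features(info)
        scores = self.score(features)
        if self.policy(scores):
            return f"risk warning for user {user_id}: {scores}"
        return None
```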
According to the risk warning generation device of the present disclosure, user information of a user is acquired, the user information comprising basic information and behavior information; multidimensional feature information is generated based on the user information and a feature policy; the multidimensional feature information is input into a risk model to generate at least one risk score, the risk model being generated from user information of historical users and a machine learning model, with sample labels assigned to the historical users according to their corresponding user information by means of a regularization strategy; and risk warning information is generated when the at least one risk score satisfies a preset policy. In this way, the overfitting caused by oversampling or undersampling during machine learning model training can be addressed, an accurate calculation model is obtained, users with financial risk can be identified quickly, and the security of user resource allocation is improved.
Fig. 6 is a block diagram of an electronic device, according to an example embodiment.
An electronic device 600 according to such an embodiment of the present disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 6, the electronic device 600 is in the form of a general purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), a display unit 640, etc.
The storage unit 620 stores program code that is executable by the processing unit 610, such that the processing unit 610 performs the steps described in this specification according to various exemplary embodiments of the present disclosure. For example, the processing unit 610 may perform the steps shown in fig. 2, 3, and 4.
The storage unit 620 may include readable media in the form of volatile memory units, such as a Random Access Memory (RAM) 6201 and/or a cache memory unit 6202, and may further include a Read Only Memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may be a local bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 600' (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through a network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, as shown in fig. 7, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the embodiments of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The computer-readable medium carries one or more programs, which when executed by one of the devices, cause the computer-readable medium to perform the functions of: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and feature policies; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated through user information of a historical user and a machine learning model, and the historical user distributes sample labels in a regularization strategy mode according to the corresponding user information; and generating risk warning information when the at least one risk score meets a preset strategy. The computer readable medium may also implement the following functions: acquiring multidimensional feature information of a plurality of historical users; sample labels are respectively distributed to the plurality of historical users based on the multidimensional feature information; determining tag parameters for the sample tags based on a regularization strategy; training a machine learning model based on the plurality of historical users and their corresponding sample tags, tag parameters to generate the risk model. The computer readable medium may also implement the following functions: acquiring a plurality of pieces of historical user information meeting preset conditions; performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information; determining a plurality of historical multidimensional feature information from the plurality of historical feature information; a feature policy is generated based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
Those skilled in the art will appreciate that the modules may be distributed throughout several devices as described in the embodiments, and that corresponding variations may be implemented in one or more devices that are unique to the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and include several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that this disclosure is not limited to the particular arrangements, instrumentalities and methods of implementation described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (7)

1. A risk warning generation method, comprising:
acquiring multidimensional feature information of a plurality of historical users;
comparing the user information of the historical user with a plurality of discrimination strategies;
determining the number of sample tags according to the number of sub-models in the risk model to be trained;
distributing sample labels to the historical users based on a discrimination strategy satisfied by the user information, wherein the sample labels are expressed by discrete positive integers;
Generating a deviation coefficient based on a regularization strategy;
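the deviation coefficient being
$$q_i = \begin{cases} 1-\varepsilon, & i = y \\ \dfrac{\varepsilon}{K-1}, & i \neq y \end{cases}$$
wherein $\varepsilon$ is a hyperparameter, $K$ is the total number of classes, namely the number of sub-models, $i$ represents one of the plurality of classes, and $y$ is the true label;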
generating label parameters of the sample labels based on the deviation coefficient, and smoothing the numerical values of the labels expressed by discrete positive integers to enable the labels to be in the form of probability values, wherein the probability values at the real labels are maximum;
Training a machine learning model based on the plurality of historical users and corresponding sample tags and tag parameters thereof to generate a risk model, wherein the risk model comprises a plurality of sub-risk models;
Acquiring user information of a user, wherein the user information comprises basic information and behavior information;
generating multidimensional feature information based on the user information and feature policies;
determining at least one sub-risk model from a plurality of sub-risk models of the risk model according to the user information;
inputting the multi-dimensional characteristic information into at least one sub-risk model to generate the at least one risk score;
randomly combining the at least one risk score to generate at least one joint score;
and when the at least one joint score meets a preset strategy, generating the risk warning information.
2. The method of claim 1, wherein obtaining multi-dimensional characteristic information for a plurality of historical users comprises:
Acquiring a plurality of pieces of historical user information meeting preset conditions;
performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information;
Determining a plurality of historical multidimensional feature information from the plurality of historical feature information;
a feature policy is generated based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
3. The method of claim 1, wherein training a machine learning model based on the plurality of historical users and their corresponding sample tags, tag parameters to generate the risk model comprises:
inputting a plurality of historical users with sample labels and label parameters into a machine learning model for training;
generating a cross entropy loss function based on the tag parameters during training;
And when the cross entropy loss function obtains an optimal solution, determining the risk model based on model parameters of a current machine learning model.
4. A method as claimed in claim 3, comprising, when the cross entropy loss function obtains an optimal solution:
solving the cross entropy loss function based on a gradient descent mode;
And taking the stable solution of the cross entropy loss function as the optimal solution.
5. A risk warning generation device, comprising:
the information module is used for acquiring user information of a user, wherein the user information comprises basic information and behavior information;
the feature module is used for generating multidimensional feature information based on the user information and the feature strategy;
The scoring module is used for acquiring multidimensional feature information of a plurality of historical users; comparing the user information of the historical user with a plurality of discrimination strategies; determining the number of sample tags according to the number of sub-models in the risk model to be trained; distributing sample labels to the historical users based on a discrimination strategy satisfied by the user information, wherein the sample labels are expressed by discrete positive integers; generating a deviation coefficient based on a regularization strategy; generating label parameters of the sample labels based on the deviation coefficient, and smoothing the numerical values of the labels expressed by discrete positive integers to enable the labels to be in the form of probability values, wherein the probability values at the real labels are maximum;
the deviation coefficient being
$$q_i = \begin{cases} 1-\varepsilon, & i = y \\ \dfrac{\varepsilon}{K-1}, & i \neq y \end{cases}$$
wherein $\varepsilon$ is a hyperparameter, $K$ is the total number of classes, namely the number of sub-models, $i$ represents one of the plurality of classes, and $y$ is the true label; training a machine learning model based on the plurality of historical users and corresponding sample tags and tag parameters thereof to generate a risk model, wherein the risk model comprises a plurality of sub-risk models; and determining at least one sub-risk model from a plurality of sub-risk models of the risk model based on the user information; inputting the multi-dimensional characteristic information into at least one sub-risk model to generate the at least one risk score;
The warning module is used for generating risk warning information when the at least one risk score meets a preset strategy, and randomly combining the at least one risk score to generate at least one joint score; and when the at least one joint score meets a preset strategy, generating the risk warning information.
6. An electronic device, comprising:
One or more processors;
a storage means for storing one or more programs;
When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-4.
7. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-4.
CN202110836040.XA 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment Active CN113610366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110836040.XA CN113610366B (en) 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113610366A CN113610366A (en) 2021-11-05
CN113610366B true CN113610366B (en) 2024-08-16

Family

ID=78338188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110836040.XA Active CN113610366B (en) 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113610366B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460059B (en) * 2022-07-28 2024-03-08 浪潮通信信息系统有限公司 Risk early warning method and device
CN117521042B (en) * 2024-01-05 2024-05-14 创旗技术有限公司 High-risk authorized user identification method based on ensemble learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034658A (en) * 2018-08-22 2018-12-18 重庆邮电大学 A kind of promise breaking consumer's risk prediction technique based on big data finance

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180033009A1 (en) * 2016-07-27 2018-02-01 Intuit Inc. Method and system for facilitating the identification and prevention of potentially fraudulent activity in a financial system
CN111080440A (en) * 2019-12-18 2020-04-28 上海良鑫网络科技有限公司 Big data wind control management system
CN112037009A (en) * 2020-08-06 2020-12-04 百维金科(上海)信息科技有限公司 Risk assessment method for consumption credit scene based on random forest algorithm

Also Published As

Publication number Publication date
CN113610366A (en) 2021-11-05

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: China

Address after: Room 1109, No. 4, Lane 800, Tongpu Road, Putuo District, Shanghai, 200062

Applicant after: Shanghai Qiyue Information Technology Co.,Ltd.

Address before: Room a2-8914, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai, 201500

Applicant before: Shanghai Qiyue Information Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant