CN113610366B

CN113610366B - Risk warning generation method and device and electronic equipment

Info

Publication number: CN113610366B
Application number: CN202110836040.XA
Authority: CN
Inventors: 李心宇; 聂婷婷; 沈赟
Original assignee: Shanghai Qiyue Information Technology Co Ltd
Current assignee: Shanghai Qiyue Information Technology Co Ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2024-08-16
Anticipated expiration: 2041-07-23
Also published as: CN113610366A

Abstract

The present disclosure relates to a risk warning generation method, apparatus, electronic device, and computer-readable medium. The method comprises the following steps: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and feature policies; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and generating risk warning information when the at least one risk score meets a preset strategy. The risk warning generation method and apparatus can solve the problem of overfitting caused by oversampling or undersampling during machine learning model training, obtain an accurate calculation model, quickly identify users with financial risk, and improve the safety of user resource allocation.

Description

Risk warning generation method and device and electronic equipment

Technical Field

The present disclosure relates to the field of computer information processing, and in particular, to a risk warning generation method, apparatus, electronic device, and computer readable medium.

Background

Individual or enterprise users often borrow resources from resource service organizations, and such borrowing is likely to expose the organization to risk. In practical risk control, it is both necessary and valuable to anticipate risks and prepare corresponding countermeasures in advance. At present, resource risk is usually judged by analyzing the user's basic information and behavior information. Different types of risk call for different risk control measures. Take malicious default as an example: the behavior and characteristics of users who default maliciously can be observed in known cases, and if those characteristics are modeled as variables and strategies, they can contribute positively to risk prevention and control.

For example, when identifying fraudulent users, the features of known fraudulent users can be learned by a predictive model and then used to discover new fraudulent users. However, when training models on these users, practitioners have found that the labels for fraudulent users are not very accurate. Labeling fraudulent users relies largely on manual review and after-the-fact investigation, so many fraudulent users go unrecognized. In other words, users labeled as having no fraudulent activity may include both genuinely non-fraudulent customers and fraudulent customers who were simply not caught by manual review or investigation. When the sample labels are inaccurate and, as is usual for classification problems, the labels are one-hot encoded and cross entropy is used as the loss function, training pushes the gap between the class a sample is labeled with and every other class to be as large as possible. Because the gradient is bounded, a model trained this way becomes overconfident in its predicted class, which readily leads to overfitting, makes the model unstable, and weakens its robustness.

The above information disclosed in the background section is only for enhancement of understanding of the background of the disclosure and therefore it may include information that does not form the prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

In view of this, the disclosure provides a risk warning generation method, apparatus, electronic device, and computer readable medium, which can solve the problem of over-fitting caused by over-sampling or under-sampling during machine model training, obtain an accurate calculation model, further quickly determine users with financial risk, and improve the safety of user resource allocation.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to an aspect of the present disclosure, there is provided a risk warning generation method including: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and feature policies; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and generating risk warning information when the at least one risk score meets a preset strategy.

Optionally, the method further comprises: acquiring multidimensional feature information of a plurality of historical users; assigning sample labels to the plurality of historical users respectively based on the multidimensional feature information; determining label parameters for the sample labels based on a regularization strategy; and training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model.

Optionally, the risk model includes a plurality of sub-risk models, and inputting the multidimensional feature information into the risk model to generate at least one risk score includes: determining at least one sub-risk model from the plurality of sub-risk models of the risk model according to the user information; and inputting the multidimensional feature information into the at least one sub-risk model to generate the at least one risk score.

Optionally, when the at least one risk score meets a preset policy, generating risk warning information includes: randomly combining the at least one risk score to generate at least one joint score; and when the at least one joint score meets a preset strategy, generating the risk warning information.

Optionally, acquiring multidimensional feature information of a plurality of historical users includes: acquiring a plurality of pieces of historical user information meeting preset conditions; performing data cleaning and data fusion on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information; determining a plurality of pieces of historical multidimensional feature information from the plurality of pieces of historical feature information; and generating a feature policy based on the relationship between the historical multidimensional feature information and the historical user information.

Optionally, assigning sample labels to the plurality of historical users respectively based on the user information includes: comparing the user information of each historical user with a plurality of discrimination strategies; and assigning a sample label to the historical user based on the discrimination strategy that the user information satisfies, wherein the sample labels are expressed as discrete positive integers.

Optionally, determining label parameters for the sample labels based on a regularization strategy includes: determining a deviation coefficient based on the regularization strategy; and generating the label parameters of the sample labels based on the deviation coefficient.

Optionally, training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model includes: inputting the plurality of historical users, together with their sample labels and label parameters, into a machine learning model for training; generating a cross entropy loss function based on the label parameters during training; and when the cross entropy loss function reaches an optimal solution, determining the risk model based on the model parameters of the current machine learning model.

Optionally, obtaining the optimal solution of the cross entropy loss function includes: solving the cross entropy loss function by gradient descent; and taking the stable solution of the cross entropy loss function as the optimal solution.

According to an aspect of the present disclosure, there is provided a risk warning generation apparatus including: an information module for acquiring user information of a user, wherein the user information comprises basic information and behavior information; a feature module for generating multidimensional feature information based on the user information and a feature strategy; a scoring module for inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and a warning module for generating risk warning information when the at least one risk score meets a preset strategy.

According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods as described above.

According to an aspect of the present disclosure, a computer-readable medium is presented, on which a computer program is stored, which program, when being executed by a processor, implements a method as described above.

According to the risk warning generation method, apparatus, electronic device, and computer-readable medium of the present disclosure, user information of a user is acquired, wherein the user information comprises basic information and behavior information; multidimensional feature information is generated based on the user information and feature policies; the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and risk warning information is generated when the at least one risk score meets a preset strategy. In this way, the problem of overfitting caused by oversampling or undersampling during machine learning model training can be solved, an accurate calculation model is obtained, users with financial risk can be quickly identified, and the safety of user resource allocation is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely examples of the present disclosure and other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

Fig. 1 is a system block diagram of a risk warning generation method and apparatus according to an exemplary embodiment.

Fig. 2 is a flow chart illustrating a risk alert generation method according to an exemplary embodiment.

Fig. 3 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment.

Fig. 4 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment.

Fig. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment.

Fig. 6 is a block diagram of an electronic device, according to an example embodiment.

Fig. 7 is a block diagram of a computer-readable medium shown according to an example embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another element. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the concepts of the present disclosure. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more.

Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the present disclosure, and therefore, should not be taken to limit the scope of the present disclosure.

In this disclosure, a resource refers to any substance, information, or time that can be utilized; information resources include computing resources and various types of data resources, and data resources include dedicated data from various fields. The innovation of the present disclosure lies in using information interaction between a server and a client to make the generation of risk warning information more automated and more efficient while reducing labor costs. The present disclosure is therefore, in essence, applicable to the allocation of various types of resources, including physical goods, water, electricity, and meaningful data. For convenience, this disclosure describes resource allocation using financial data resources as an example, but those skilled in the art will appreciate that it may also be used for the allocation of other resources.

As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as financial service class applications, shopping class applications, web browser applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server providing support for financial service-like websites browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze the received user data and feed back the processing result (e.g., risk warning information) to an administrator of the financial service website.

The server 105 may, for example, acquire user information of a user, the user information including basic information and behavior information; the server 105 may, for example, generate multidimensional feature information based on the user information and feature policies; the server 105 may, for example, input the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and the server 105 may, for example, generate risk warning information when the at least one risk score satisfies a preset policy.

The server 105 may also, for example, acquire multidimensional feature information of a plurality of historical users; assign sample labels to the plurality of historical users respectively based on the multidimensional feature information; determine label parameters for the sample labels based on a regularization strategy; and train a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model.

The server 105 may also, for example, acquire a plurality of pieces of historical user information satisfying a preset condition; perform data cleaning and data fusion on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information; determine a plurality of pieces of historical multidimensional feature information from the plurality of pieces of historical feature information; and generate a feature policy based on the relationship between the historical multidimensional feature information and the historical user information.

The server 105 may also, for example, deploy the trained risk model and the preset policy on the terminal devices 101, 102, 103, so that the terminal devices 101, 102, 103 generate multidimensional feature information based on the user information and the feature policy, and input the multidimensional feature information into the risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy. When the at least one risk score meets the preset policy, the terminal devices 101, 102, 103 generate risk warning information and send it to the server 105.

The server 105 may be a single physical server or may, for example, be composed of multiple servers. Part of the server 105 may be used, for example, as the risk warning system of the present disclosure, to generate risk warning information; part of the server 105 may be used, for example, as the preset strategy generation system of the present disclosure, to generate preset strategies; and part of the server 105 may be used, for example, as the model training system of the present disclosure, to train a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model.

It should be noted that the risk warning generation method provided by the embodiments of the present disclosure may be performed by the server 105 and/or the terminal devices 101, 102, 103, and accordingly, the risk warning generation apparatus may be provided in the server 105 and/or the terminal devices 101, 102, 103. The web page through which the user browses the financial service platform is generally located on the terminal devices 101, 102, 103.

Fig. 2 is a flow chart illustrating a risk alert generation method according to an exemplary embodiment. The risk alert generation method 20 includes at least steps S202 to S208.

As shown in fig. 2, in S202, user information of a user is acquired, the user information including basic information and behavior information. In the embodiments of the disclosure, the user may be an individual user or an enterprise user, and the allocation of a resource quota may be an adjustment of a financial resource quota, or an allocation of electric power or water resources. The user information may include basic information, for example service account information, page operation data of the user, service access duration of the user, service access frequency of the user, terminal device identification information of the user, and information on the region where the user is located; the specific content may be determined according to the actual application scenario and is not limited here. The user information may further include behavior information, for example page operation data of the user, service access duration of the user, and service access frequency of the user; again, the specific content may be determined according to the actual application scenario and is not limited here. More specifically, with the user's authorization, the user information of the current user can be acquired through tracking code embedded in the web page.

More specifically, the user's behavior information on a web page can be obtained with the Fiddler tool. Fiddler works as a web proxy server: after the client sends request data, the Fiddler proxy intercepts the packet and then forwards the data to the server on the client's behalf; likewise, when the server returns response data, the proxy intercepts it and passes it back to the client. In this way Fiddler can capture browsing data such as dwell time, pages visited, and click operations.

In S204, multidimensional feature information is generated based on the user information and feature policies. Feature policies may be generated, for example, based on relationships between the plurality of historical multi-dimensional feature information and the historical user information.

Data cleaning and data fusion can be performed on the user information to convert it into multidimensional data. More particularly, the processing can include missing-rate analysis and handling for each variable and outlier handling; it can also include discretization and WOE conversion of continuous variables, WOE conversion of discrete variables, text variable processing, word2vec processing of text variables, and the like.

Here WOE stands for "Weight of Evidence". WOE is an encoded form of the original feature. To WOE-encode a feature, the variable must first be grouped into bins. Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. A word2vec model maps each word to a vector, and these vectors can be used to represent word-to-word relationships.
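As an illustration of WOE encoding, the following is a minimal sketch rather than the procedure used in the disclosure. It assumes the variable has already been binned, uses the common convention WOE = ln(share of good samples in the bin / share of bad samples in the bin), and adds a hypothetical `smoothing` constant to avoid taking the logarithm of zero. The function name `woe_encode` is introduced here for illustration only.

```python
import math
from collections import defaultdict


def woe_encode(bins, labels, smoothing=0.5):
    """Return {bin: WOE value} for an already-binned feature.

    bins      -- bin identifier of each sample
    labels    -- 1 for a "bad" (risky) sample, 0 for a "good" one
    smoothing -- hypothetical additive constant that avoids log(0)
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    all_bins = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(all_bins)
    total_bad = sum(bad.values()) + smoothing * len(all_bins)

    woe = {}
    for b in all_bins:
        share_good = (good[b] + smoothing) / total_good
        share_bad = (bad[b] + smoothing) / total_bad
        # Positive WOE: the bin holds relatively more good samples than bad ones.
        woe[b] = math.log(share_good / share_bad)
    return woe


example = woe_encode(
    bins=["low", "low", "low", "high", "high", "high"],
    labels=[0, 0, 1, 1, 1, 0],
)
```

Some scorecard texts define WOE with the opposite sign (bad over good); either convention works as long as it is applied consistently.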

In S206, the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy.

In one embodiment, the risk model includes a plurality of sub-risk models, and inputting the multidimensional feature information into the risk model to generate at least one risk score includes: determining at least one sub-risk model from the plurality of sub-risk models of the risk model according to the user information; and inputting the multidimensional feature information into the at least one sub-risk model to generate the at least one risk score.

More specifically, each sub-risk model may represent the user's risk in a particular aspect. For example, sub-risk model A may represent the risk that the user returns resources late; sub-risk model B may represent the risk that the user has no plan for the resources; and sub-risk model C may represent the risk of intentional fraud by the user.

In S208, risk warning information is generated when the at least one risk score satisfies a preset policy. At least one joint score may be generated, for example, by randomly combining the at least one risk score; and when the at least one joint score meets a preset strategy, generating the risk warning information.

For example, the risk scores may be combined, and the combined value compared against the preset policy to determine whether to generate risk warning information. More specifically, it may be determined to generate warning information when risk score A is greater than 0.5 and risk score B is greater than 0.3; it may also be determined to generate warning information when, for example, risk score C is greater than 0.8.
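The thresholds in this example can be written as a small rule table. The sketch below is illustrative only: the names `PRESET_POLICY` and `meets_policy` and the representation of a policy as a list of threshold dictionaries are assumptions, not part of the disclosure.

```python
# Each rule is a joint condition: every listed score must exceed its threshold.
# The policy is met when at least one rule is satisfied.
PRESET_POLICY = [
    {"A": 0.5, "B": 0.3},  # joint score of sub-models A and B
    {"C": 0.8},            # sub-model C on its own
]


def meets_policy(scores, policy=PRESET_POLICY):
    """Return True when the risk scores satisfy at least one rule.

    scores -- dict mapping a sub-model name to its risk score;
              a sub-model that was not evaluated is treated as score 0.0
    """
    for rule in policy:
        if all(scores.get(name, 0.0) > threshold
               for name, threshold in rule.items()):
            return True
    return False


if meets_policy({"A": 0.62, "B": 0.35}):
    print("generate risk warning information")
```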

According to the risk warning generation method of the present disclosure, user information of a user is acquired, wherein the user information comprises basic information and behavior information; multidimensional feature information is generated based on the user information and feature policies; the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and the historical users are assigned sample labels, according to their corresponding user information, by means of a regularization strategy; and risk warning information is generated when the at least one risk score meets a preset strategy. In this way, the problem of overfitting caused by oversampling or undersampling during machine learning model training can be solved, an accurate calculation model is obtained, users with financial risk can be quickly identified, and the safety of user resource allocation is improved.

It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.

Fig. 3 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment. The flow 30 shown in fig. 3 is a complementary description of the flow shown in fig. 2.

As shown in fig. 3, in S302, multidimensional feature information of a plurality of historical users is acquired. The multidimensional feature information is generated from the user information of the plurality of historical users based on a preset strategy.

In S304, sample labels are assigned to the plurality of historical users respectively based on the multidimensional feature information. This includes: comparing the user information of each historical user with a plurality of discrimination strategies; and assigning a sample label to the historical user based on the discrimination strategy that the user information satisfies, wherein the sample labels are expressed as discrete positive integers.

The number of sample labels can be determined by the number of sub-models in the risk model to be trained. For example, the risk sub-models may be A, B, and C, and the corresponding risk labels may be the numbers 1, 2, 3, and 4, where label 1 represents risk A, label 2 represents risk B, label 3 represents risk C, and label 4 represents no risk.
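A label assignment of this kind might be sketched as follows. The disclosure does not specify the discrimination strategies, so the field names (`days_overdue`, `utilisation`, `confirmed_fraud`), the thresholds, and the first-match-wins ordering are all hypothetical.

```python
NO_RISK_LABEL = 4

# (label, predicate) pairs; labels are discrete positive integers.
# All field names and thresholds below are hypothetical placeholders.
STRATEGIES = [
    (1, lambda u: u.get("days_overdue", 0) > 30),     # risk A
    (2, lambda u: u.get("utilisation", 0.0) > 0.95),  # risk B
    (3, lambda u: u.get("confirmed_fraud", False)),   # risk C
]


def assign_label(user, strategies=STRATEGIES, default=NO_RISK_LABEL):
    """Return the label of the first strategy the user satisfies.

    Falls back to the no-risk label when no strategy matches.
    """
    for label, predicate in strategies:
        if predicate(user):
            return label
    return default


labels = [assign_label(u) for u in (
    {"days_overdue": 45},
    {"confirmed_fraud": True},
    {"days_overdue": 2, "utilisation": 0.4},
)]
```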

In S306, label parameters are determined for the sample labels based on a regularization strategy. This includes: determining a deviation coefficient based on the regularization strategy; and generating the label parameters of the sample labels based on the deviation coefficient. The label values determined above may be smoothed so that each label takes the form of a probability distribution in which the probability at the true label is the largest and the probabilities at the other positions are very small. This increases the distance between classes during training, reduces the distance within each class, reduces overfitting, and improves prediction robustness.

The deviation coefficient may, for example, be

$$q_i = \begin{cases} 1 - \varepsilon, & i = y \\ \dfrac{\varepsilon}{K - 1}, & i \neq y \end{cases}$$

where ε is a hyperparameter, K is the total number of classes (in this application, the number of sub-models), i denotes one of the classes, and y denotes the true class of the sample. The formula means that the probability assigned to class i is (1 - ε) when i is the true class of the sample and ε/(K - 1) otherwise, so that the K probabilities sum to 1.
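The smoothing step can be illustrated with a few lines of code. This is a sketch under one assumption: the true class receives probability 1 - ε and the remaining mass ε is spread evenly over the other K - 1 classes, which is the only split consistent with the probabilities summing to 1. The function name `smooth_label` and the default ε = 0.1 are illustrative choices.

```python
def smooth_label(label, num_classes, epsilon=0.1):
    """Turn an integer label (1..num_classes) into a smoothed distribution.

    The true class gets 1 - epsilon; every other class gets
    epsilon / (num_classes - 1), so the values sum to 1.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 1 <= label <= num_classes:
        raise ValueError("label out of range")
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if k == label else off_value
            for k in range(1, num_classes + 1)]


# Label 2 (risk B) among four labels:
# the true class gets 0.9 and each other class gets about 0.033.
q = smooth_label(2, num_classes=4, epsilon=0.1)
```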

In S308, a machine learning model is trained based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model. For example, the plurality of historical users, together with their sample labels and label parameters, may be input into the machine learning model for training; a cross entropy loss function is generated based on the label parameters during training; and when the cross entropy loss function reaches an optimal solution, the risk model is determined based on the model parameters of the current machine learning model.

In one embodiment, the cross entropy loss function may be solved, for example, based on a gradient descent approach; and taking the stable solution of the cross entropy loss function as the optimal solution.
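To make the training step concrete, the sketch below fits a linear softmax classifier to smoothed label distributions by batch gradient descent and stops once the loss no longer changes appreciably, which is one way to read "stable solution". The model type, learning rate, tolerance, and all function names are assumptions for illustration; the disclosure does not prescribe them.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross entropy between a (smoothed) target distribution and predictions."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def train(features, targets, lr=0.5, tol=1e-6, max_steps=5000):
    """Batch gradient descent on a linear softmax classifier.

    features -- list of feature vectors (include a constant 1.0 for the bias)
    targets  -- list of smoothed label distributions, one per sample
    Stops when the mean loss changes by less than tol between steps.
    Returns the weights and the last mean loss that was evaluated.
    """
    n, n_feat, n_cls = len(features), len(features[0]), len(targets[0])
    weights = [[0.0] * n_feat for _ in range(n_cls)]
    previous = float("inf")
    loss = previous
    for _ in range(max_steps):
        grad = [[0.0] * n_feat for _ in range(n_cls)]
        loss = 0.0
        for x, q in zip(features, targets):
            p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
            loss += cross_entropy(q, p)
            for k in range(n_cls):
                for j in range(n_feat):
                    # Gradient of cross entropy w.r.t. logit k is p_k - q_k.
                    grad[k][j] += (p[k] - q[k]) * x[j]
        loss /= n
        if abs(previous - loss) < tol:
            break
        previous = loss
        for k in range(n_cls):
            for j in range(n_feat):
                weights[k][j] -= lr * grad[k][j] / n
    return weights, loss


# Two toy samples with two classes and smoothed targets (epsilon = 0.1).
features = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
targets = [[0.9, 0.1], [0.1, 0.9]]
weights, final_loss = train(features, targets)
```

Because the targets are smoothed, the loss cannot fall below the entropy of the target distribution, so the weights settle at finite values instead of growing without bound as they would with one-hot targets on separable data.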

Specifically, a sub-model is constructed for the sample set of each label. The user information of each historical user in the sample set is input into the sub-model to obtain a predicted label, and each predicted label is compared with the corresponding true label. The number of predicted labels that match their true labels is counted, and its proportion of all predicted labels is calculated. If the proportion is greater than or equal to a preset proportion, the sub-model has converged and the trained sub-model is obtained. If the proportion is less than the preset proportion, the parameters of the sub-model are adjusted and the labels are predicted again with the adjusted sub-model, until the proportion is greater than or equal to the preset proportion. The parameters may be adjusted using stochastic gradient descent, gradient descent, or the normal equation.

In the present application, the machine learning model may be a classification model, specifically one or a combination of classification algorithms such as logistic regression, naive Bayes, decision trees, support vector machines, random forests, and gradient boosting trees. If the number of parameter adjustments exceeds a preset number, the type of machine learning model used to construct the model may be replaced, so as to improve model training efficiency.
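One possible way to organize the replacement of the model type is sketched below. It is an assumption-laden illustration: the two "model types" are deliberately trivial stand-ins (a majority-label predictor and a one-feature threshold) rather than the classifiers listed above, and the trainer interface returning a `(model, converged)` pair is hypothetical.

```python
def majority_trainer(X, y, max_adjustments):
    """Toy model type 1: always predicts the majority label."""
    label = 1 if 2 * sum(y) >= len(y) else 0
    converged = all(t == label for t in y)
    return (lambda x: label), converged

def stump_trainer(X, y, max_adjustments):
    """Toy model type 2: a single threshold on feature 0, tried one value at a time."""
    for n, threshold in enumerate(sorted(x[0] for x in X)):
        if n >= max_adjustments:
            break  # the preset number of adjustments is exhausted
        if all((1 if x[0] >= threshold else 0) == t for x, t in zip(X, y)):
            return (lambda x, thr=threshold: 1 if x[0] >= thr else 0), True
    return None, False

def train_with_replacement(candidates, X, y, max_adjustments):
    """Try each model type in order; replace it when it fails to converge in time."""
    for name, trainer in candidates:
        model, converged = trainer(X, y, max_adjustments)
        if converged:
            return name, model
    raise RuntimeError("no candidate model type converged")

# Toy data (made up).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
chosen, model = train_with_replacement(
    [("majority", majority_trainer), ("stump", stump_trainer)],
    X, y, max_adjustments=10)
```

Here the first model type cannot separate the two labels, so the second type is used instead.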

According to the risk warning generation method, the labels are smoothed so that the classification hyperplane of the model does not fit the original data too closely: the weight of the real label's class probability in the loss calculation is reduced, and the weight of the predicted probabilities of the other classes in the final loss function is increased. The gap between the probability of the real class and the average probability of the other classes is thereby narrowed, the overconfidence of the model is reduced, and risky users are identified effectively.
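The effect of smoothing on the loss can be seen in a small numeric sketch. It assumes that the real class receives probability 1 - eps and that the remaining eps is shared equally among the other K - 1 classes; the function names and the example predictions are hypothetical.

```python
import math

def smooth_label(true_class, num_classes, eps):
    """Smoothed target for one sample; classes are numbered 1..num_classes,
    matching labels expressed as discrete positive integers."""
    off_value = eps / (num_classes - 1)  # assumed equal share for non-real classes
    return [1.0 - eps if k == true_class else off_value
            for k in range(1, num_classes + 1)]

def cross_entropy(target, predicted):
    """Cross entropy of one prediction against one target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

hard = [0.0, 1.0, 0.0, 0.0]                       # unsmoothed label for class 2
smooth = smooth_label(true_class=2, num_classes=4, eps=0.1)

overconfident = [0.001, 0.997, 0.001, 0.001]      # made-up model outputs
moderate = [0.03, 0.91, 0.03, 0.03]

# With the hard label the overconfident output has the lower loss;
# with the smoothed label the moderate output has the lower loss.
hard_prefers_overconfident = (
    cross_entropy(hard, overconfident) < cross_entropy(hard, moderate))
smooth_prefers_moderate = (
    cross_entropy(smooth, moderate) < cross_entropy(smooth, overconfident))
```

The smoothed target still has its largest value at the real label, but the loss no longer rewards pushing that probability toward 1.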

Fig. 4 is a flowchart illustrating a risk alert generation method according to another exemplary embodiment. The flow 40 shown in fig. 4 is a detailed description of generating the feature policy.

As shown in fig. 4, in S402, a plurality of pieces of historical user information satisfying a preset condition are acquired. This embodiment takes financial resource borrowing as an example; it can be understood that the method of the present application can also be applied to other resource allocation scenarios. Based on real business data from a financial service platform, historical users whose repayment performance within the first three periods was more than 30 days overdue (MOB3+) are defined as the target samples for modeling. Analysis of indicators such as vintage and migration (roll) rate shows that overdue samples account for less than 5% of the total.

In S404, data cleaning and data fusion are performed on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information. After the information is fused into a ten-thousand-dimensional wide table of variables, the data must be cleaned further to ensure the stability and accuracy of the subsequent model. The data cleaning steps include, but are not limited to, variable missing-rate analysis and handling, outlier handling, discretization of continuous variables followed by weight-of-evidence (WOE) conversion, WOE conversion of discrete variables, and text variable processing.
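As an illustration of two of the cleaning steps named above, discretization of a continuous variable followed by WOE conversion might look like the following sketch. Several choices here are assumptions rather than part of the disclosure: WOE is taken as the logarithm of the overdue share divided by the normal share of a bin (the opposite sign convention is also in common use), missing values are given a bin of their own, and an empty cell is replaced by a count of 0.5 to avoid taking the logarithm of zero. The variable, cut points, and data are made up.

```python
import math
from collections import Counter

def discretize(value, cut_points):
    """Map a continuous value to a bin index given ascending cut points.
    Missing values (None) are placed in their own bin, -1."""
    if value is None:
        return -1
    return sum(1 for c in cut_points if value >= c)

def woe_table(bins, labels, floor=0.5):
    """WOE per bin; labels are 1 for overdue samples and 0 for normal samples.
    Assumes both kinds of sample are present in the data."""
    bad = Counter(b for b, t in zip(bins, labels) if t == 1)
    good = Counter(b for b, t in zip(bins, labels) if t == 0)
    total_bad, total_good = sum(bad.values()), sum(good.values())
    table = {}
    for b in set(bins):
        # An empty cell is treated as `floor` samples to avoid log(0).
        bad_share = max(bad[b], floor) / total_bad
        good_share = max(good[b], floor) / total_good
        table[b] = math.log(bad_share / good_share)
    return table

# Made-up example: ages with one missing value, and overdue flags.
ages = [22, 25, 31, 38, 45, 52, None, 29]
overdue = [1, 1, 0, 0, 0, 0, 1, 0]
bins = [discretize(a, cut_points=[30, 40]) for a in ages]
table = woe_table(bins, overdue)
encoded = [table[b] for b in bins]   # WOE-encoded variable
```

Under this sign convention a positive WOE marks a bin in which overdue samples are over-represented.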

In S406, a plurality of pieces of historical multidimensional feature information are determined from the plurality of pieces of historical feature information. Variable parameters, discrimination parameters, information values, and model feature parameters of the plurality of pieces of historical feature information may be calculated, and the plurality of pieces of historical multidimensional feature information are then extracted from the historical feature information based on the variable parameters, the discrimination parameters, the information values, and the model feature parameters.

Features with high coverage and a clear discriminating effect on the target variable may be selected as the multidimensional features by jointly considering variable coverage, single-value coverage, correlation with and significance for the target variable, discrimination (KS) and information value (IV) with respect to the target variable, and feature importance from tree models (e.g., XGBoost, RF).
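A simplified version of such screening, using only coverage, IV, and KS, can be sketched as follows. The thresholds (`min_coverage`, `min_iv`, `min_ks`), the feature names, and the per-bin counts are hypothetical, and the other criteria mentioned above (single-value coverage, correlation, significance, tree-model importance) are omitted for brevity.

```python
import math

def information_value(bad_counts, good_counts, floor=0.5):
    """IV from per-bin counts of overdue (bad) and normal (good) samples."""
    total_bad, total_good = sum(bad_counts), sum(good_counts)
    iv = 0.0
    for bad, good in zip(bad_counts, good_counts):
        bad_share = max(bad, floor) / total_bad     # floor avoids log(0)
        good_share = max(good, floor) / total_good
        iv += (bad_share - good_share) * math.log(bad_share / good_share)
    return iv

def ks_statistic(bad_counts, good_counts):
    """KS: largest gap between the cumulative bad and good shares.
    Bins must be ordered by the feature value."""
    total_bad, total_good = sum(bad_counts), sum(good_counts)
    cum_bad = cum_good = 0
    ks = 0.0
    for bad, good in zip(bad_counts, good_counts):
        cum_bad += bad
        cum_good += good
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks

def select_features(stats, min_coverage=0.8, min_iv=0.02, min_ks=0.1):
    """Keep features that pass all three (assumed) thresholds."""
    selected = []
    for name, s in stats.items():
        iv = information_value(s["bad"], s["good"])
        ks = ks_statistic(s["bad"], s["good"])
        if s["coverage"] >= min_coverage and iv >= min_iv and ks >= min_ks:
            selected.append(name)
    return selected

# Made-up statistics for three candidate features.
stats = {
    "late_payments": {"coverage": 0.95, "bad": [5, 15, 30], "good": [600, 300, 50]},
    "device_type": {"coverage": 0.99, "bad": [25, 25], "good": [475, 475]},
    "optional_survey": {"coverage": 0.30, "bad": [5, 45], "good": [700, 250]},
}
selected = select_features(stats)
```

In this toy input, one feature is dropped because it does not separate overdue from normal samples and another because its coverage is too low.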

In S408, a feature policy is generated based on the relationships between the plurality of historical multi-dimensional feature information and the historical user information.

Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented as a computer program executed by a CPU. When the computer program is executed by the CPU, the functions defined by the methods provided in the present disclosure are performed. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.

Furthermore, it should be noted that the above-described figures are merely illustrative of the processes involved in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.

Fig. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment. As shown in fig. 5, the risk warning generation device 50 includes: information module 502, feature module 504, scoring module 506, and warning module 508.

The information module 502 is configured to obtain user information of a user, where the user information includes basic information and behavior information;

The feature module 504 is configured to generate multidimensional feature information based on the user information and feature policies;

The scoring module 506 is configured to input the multidimensional feature information into a risk model to generate at least one risk score, where the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels according to their corresponding user information by means of a regularization strategy;

The warning module 508 is configured to generate risk warning information when the at least one risk score satisfies a preset policy.

According to the risk warning generation device, user information of a user is acquired, the user information comprising basic information and behavior information; multidimensional feature information is generated based on the user information and a feature policy; the multidimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels according to their corresponding user information by means of a regularization strategy; and risk warning information is generated when the at least one risk score satisfies a preset policy. In this way, the overfitting caused by oversampling or undersampling during machine learning model training can be resolved and an accurate computation model obtained, so that users with financial risk can be identified quickly and the security of user resource allocation is improved.
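The cooperation of the four modules can be pictured with the following sketch. It is not the disclosed implementation: the feature policy, the risk model, and the preset policy are passed in as plain callables, and every name, field, and threshold below is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class RiskWarningDevice:
    feature_policy: Callable[[Dict], List[float]]        # used by feature module 504
    risk_models: List[Callable[[List[float]], float]]    # used by scoring module 506
    preset_policy: Callable[[List[float]], bool]         # used by warning module 508

    def acquire(self, basic: Dict, behavior: Dict) -> Dict:
        """Information module 502: merge basic and behavior information."""
        return {**basic, **behavior}

    def run(self, basic: Dict, behavior: Dict) -> Optional[str]:
        user = self.acquire(basic, behavior)
        features = self.feature_policy(user)                     # feature module
        scores = [model(features) for model in self.risk_models]  # scoring module
        if self.preset_policy(scores):                           # warning module
            return f"risk warning: scores={scores}"
        return None

# Hypothetical stand-ins for the trained components.
device = RiskWarningDevice(
    feature_policy=lambda u: [u["age"] / 100.0, float(u["overdue_count"])],
    risk_models=[lambda f: min(1.0, 0.2 * f[1])],
    preset_policy=lambda scores: any(s >= 0.5 for s in scores),
)
warning = device.run({"age": 30}, {"overdue_count": 3})
no_warning = device.run({"age": 30}, {"overdue_count": 0})
```

With these placeholder components, the first call produces a warning string and the second returns `None`.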

An electronic device 600 according to such an embodiment of the present disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.

As shown in fig. 6, the electronic device 600 is in the form of a general purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), a display unit 640, etc.

The storage unit 620 stores program code executable by the processing unit 610, so that the processing unit 610 performs the steps according to the various exemplary embodiments of the present disclosure described in this specification. For example, the processing unit 610 may perform the steps shown in fig. 2, fig. 3, and fig. 4.

The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 6201 and/or a cache storage unit 6202, and may further include a read-only memory (ROM) 6203.

The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 600 may also communicate with one or more external devices 600' (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, as shown in fig. 7, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the embodiments of the present disclosure.

The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multidimensional feature information based on the user information and a feature policy; inputting the multidimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and the historical users are assigned sample labels according to their corresponding user information by means of a regularization strategy; and generating risk warning information when the at least one risk score satisfies a preset policy. The computer-readable medium may also implement the following functions: acquiring multidimensional feature information of a plurality of historical users; assigning sample labels to the plurality of historical users based on the multidimensional feature information; determining label parameters for the sample labels based on a regularization strategy; and training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model. The computer-readable medium may also implement the following functions: acquiring a plurality of pieces of historical user information satisfying a preset condition; performing data cleaning and data fusion on the plurality of pieces of historical user information to generate a plurality of pieces of historical feature information; determining a plurality of pieces of historical multidimensional feature information from the plurality of pieces of historical feature information; and generating a feature policy based on the relationship between the plurality of pieces of historical multidimensional feature information and the historical user information.

Those skilled in the art will appreciate that the modules may be distributed among devices as described in the embodiments, or may be modified accordingly and located in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, etc.) to perform the method according to the embodiments of the present disclosure.

Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that this disclosure is not limited to the particular arrangements, instrumentalities and methods of implementation described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A risk warning generation method, comprising:

acquiring multidimensional feature information of a plurality of historical users;

comparing the user information of the historical user with a plurality of discrimination strategies;

determining the number of sample tags according to the number of sub-models in the risk model to be trained;

distributing sample labels to the historical users based on a discrimination strategy satisfied by the user information, wherein the sample labels are expressed by discrete positive integers;

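generating a deviation coefficient based on a regularization strategy;

the deviation coefficient being $1-\varepsilon$ for the class $i$ that is the real label and $\varepsilon/(K-1)$ for every other class;

wherein $\varepsilon$ is a hyperparameter, $K$ is the total number of classes, namely the number of sub-models, and $i$ represents one of the plurality of classes;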

generating label parameters of the sample labels based on the deviation coefficient, and smoothing the numerical values of the labels expressed by discrete positive integers to enable the labels to be in the form of probability values, wherein the probability values at the real labels are maximum;

training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate a risk model, wherein the risk model comprises a plurality of sub-risk models;

acquiring user information of a user, wherein the user information comprises basic information and behavior information;

generating multidimensional feature information based on the user information and feature policies;

determining at least one sub-risk model from a plurality of sub-risk models of the risk model according to the user information;

inputting the multi-dimensional characteristic information into at least one sub-risk model to generate the at least one risk score;

randomly combining the at least one risk score to generate at least one joint score;

and when the at least one joint score meets a preset strategy, generating the risk warning information.

2. The method of claim 1, wherein obtaining multi-dimensional characteristic information for a plurality of historical users comprises:

acquiring a plurality of pieces of historical user information meeting preset conditions;

performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information;

determining a plurality of historical multidimensional feature information from the plurality of historical feature information;

a feature policy is generated based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.

3. The method of claim 1, wherein training a machine learning model based on the plurality of historical users and their corresponding sample tags, tag parameters to generate the risk model comprises:

inputting a plurality of historical users with sample labels and label parameters into a machine learning model for training;

generating a cross entropy loss function based on the tag parameters during training;

and when the cross entropy loss function obtains an optimal solution, determining the risk model based on model parameters of a current machine learning model.

4. The method of claim 3, wherein the cross entropy loss function obtaining an optimal solution comprises:

solving the cross entropy loss function by gradient descent;

and taking the stable solution of the cross entropy loss function as the optimal solution.

5. A risk warning generation device, comprising:

the information module is used for acquiring user information of a user, wherein the user information comprises basic information and behavior information;

the feature module is used for generating multidimensional feature information based on the user information and the feature strategy;

the scoring module is used for acquiring multidimensional feature information of a plurality of historical users; comparing the user information of the historical user with a plurality of discrimination strategies; determining the number of sample labels according to the number of sub-models in the risk model to be trained; distributing sample labels to the historical users based on a discrimination strategy satisfied by the user information, wherein the sample labels are expressed by discrete positive integers; generating a deviation coefficient based on a regularization strategy; generating label parameters of the sample labels based on the deviation coefficient, and smoothing the numerical values of the labels expressed by discrete positive integers to enable the labels to be in the form of probability values, wherein the probability values at the real labels are maximum;

the deviation coefficient being $1-\varepsilon$ for the class $i$ that is the real label and $\varepsilon/(K-1)$ for every other class, wherein $\varepsilon$ is a hyperparameter, $K$ is the total number of classes, namely the number of sub-models, and $i$ represents one of the plurality of classes; training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate a risk model, wherein the risk model comprises a plurality of sub-risk models; determining at least one sub-risk model from the plurality of sub-risk models of the risk model based on the user information; and inputting the multidimensional feature information into the at least one sub-risk model to generate the at least one risk score;

the warning module is used for generating risk warning information when the at least one risk score meets a preset strategy, and for randomly combining the at least one risk score to generate at least one joint score; and when the at least one joint score meets a preset strategy, generating the risk warning information.

6. An electronic device, comprising:

one or more processors;

a storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.

7. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-4.