CN111783039A

CN111783039A - Risk determination method, risk determination device, computer system and storage medium

Info

Publication number: CN111783039A
Application number: CN202010616218.5A
Authority: CN
Inventors: 许韩晨玺; 贾壮
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-10-16
Anticipated expiration: 2040-06-30
Also published as: CN111783039B

Abstract

The application discloses a risk determination method, a risk determination device, a computer system and a storage medium, and relates to the technical field of cloud computing. The specific implementation scheme is as follows: acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information; matching each piece of first application program information in the application program list with each piece of second application program information in N application program sets respectively to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1; determining a risk feature vector of the user according to the matching result, wherein the risk feature vector of the user is N-dimensional; and processing the risk feature vector of the user using a risk prediction model to determine risk information for the user. Through the technical scheme disclosed by the application, the value of user data for financial risk control can be fully mined, and the value of the data asset can be realized.

Description

Risk determination method, risk determination device, computer system and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and more particularly, to a risk determination method, a risk determination apparatus, a computer system, and a computer-readable storage medium.

Background

Risk control is critical to financial services such as credit and insurance. Its purpose is to estimate the likelihood of adverse behavior by a user, such as overdue payment or fraud, by analyzing information about the user applying for the service, so as to decide whether to approve or reject the user's application and how to price it once approved (such as loan interest rate, insurance claim rate and the like). Effective analysis of user-related information is therefore important in risk-control scenarios.

In carrying out the present disclosure, the inventors found that the amount of user-related information keeps increasing over time. How to analyze this information and derive valuable user features from it is of significant importance for risk-control decisions and related tasks.

Disclosure of Invention

In view of the above, the present disclosure provides a risk determination method, a risk determination apparatus, a computer system, and a computer-readable storage medium.

One aspect of the present disclosure provides a risk determination method, including: acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information; matching each piece of first application program information in the application program list with each piece of second application program information in N application program sets respectively to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1; determining the risk characteristic vector of the user according to the matching result, wherein the dimension of the risk characteristic vector of the user is N-dimension; and processing the risk feature vector of the user by using a risk prediction model so as to determine the risk information of the user.

Another aspect of the present disclosure provides a risk determination device comprising a first acquisition module, a matching module, a first determination module and a processing module.

The first acquisition module is used for acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information.

The matching module is used for respectively matching each piece of the first application program information in the application program list with each piece of second application program information in N application program sets to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1.

The first determination module is used for determining the risk feature vector of the user according to the matching result, wherein the risk feature vector of the user is N-dimensional.

The processing module is used for processing the risk feature vector of the user by using a risk prediction model so as to determine the risk information of the user.

Another aspect of the present disclosure provides a computer system comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

Another aspect of the disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method as described above.

Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.

According to the embodiments of the disclosure, the value of user data for financial risk control can be fully mined, so that the value of the data asset can be better realized, and related project practice and product construction in financial risk control can be supported and assisted.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an exemplary system architecture to which the risk determination methods and apparatus may be applied, according to an embodiment of the disclosure;

FIG. 2 schematically illustrates a flow chart of a method of risk determination according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a schematic diagram of matching each first application information in an application list with each second application information in N application sets, respectively, according to an embodiment of the present disclosure;

FIG. 4 schematically shows a flow diagram for generating N sets of applications according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow diagram for cluster analysis of second application information in application lists of a plurality of sample users, according to an embodiment of the disclosure;

FIG. 6 schematically shows a schematic diagram of clustering the second application information in each risk level into K cluster classes according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates a flow chart for generating risk feature vectors for sample users according to an embodiment of the present disclosure;

FIG. 8 schematically shows a block diagram of a risk determination device according to an embodiment of the present disclosure; and

FIG. 9 schematically illustrates a block diagram of a computer system suitable for implementing a risk determination method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

With the popularization and wide use of terminal devices (e.g., smart phones, notebook computers, etc.), the types and the number of applications in user terminal devices are increasing. For each user, the application installed and used in the terminal equipment can effectively describe some personal attributes and characteristics of the user.

For example, knowledge-based applications (scallop words, learners, etc.) can generally indicate a user's level of education; e-commerce and group-purchase applications (such as Taobao, Shuduo and the like) can reflect the user's consumption level and consumption habits; loan applications (immediate finance, clap credits, etc.) can be used to infer the user's economic status and loan frequency. In addition, with the development of the internet finance industry, there are many kinds of loan applications, and users with poor credit (high overdue risk) tend to have a relatively large number of them.

Therefore, the application program list has information with strong correlation with the user characteristics, and can be used as user characteristics for modeling and evaluating the risk information of the user.

The embodiment of the disclosure provides a risk determination method and a risk determination device, wherein the method comprises the following steps: acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information; matching each piece of first application program information in the application program list with each piece of second application program information in N application program sets respectively to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1; determining a risk feature vector of the user according to the matching result, wherein the dimension of the risk feature vector of the user is N-dimension; and processing the risk feature vector of the user using a risk prediction model to determine risk information for the user.

Fig. 1 schematically illustrates an exemplary system architecture to which the risk determination method and apparatus may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various types of connections, such as wired and/or wireless communication links, and so forth.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a shopping application, a loan application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, to name a few examples.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and/or otherwise process the received data, such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the risk determination method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the risk determination device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The risk determination method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the risk determination apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the risk determination method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the risk determination apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2 schematically shows a flow chart of a risk determination method according to an embodiment of the present disclosure.

As shown in fig. 2, the method includes operations S210 to S240.

In operation S210, an application list of a user is acquired, wherein the application list includes at least one piece of first application information.

According to the embodiment of the disclosure, the first application information in the application list of the user has an association relationship with the user, and can be used for describing some personal attributes and traits of the user. The first application information may be, for example, identification information of an application used by the user. The application list of the user may include information of applications installed on the terminal device used by the user, or may include information of applications that are run on the terminal device used by the user but are not installed.

In operation S220, each piece of first application information in the application list is respectively matched with each piece of second application information in N application sets, to obtain a matching result, where each application set includes at least one piece of second application information, and N is a positive integer greater than 1.

According to the embodiment of the disclosure, the N application sets may be preset, and N may be set according to the actual situation. For example, N may be determined according to the dimensionality of the risk feature vector, or according to the number of risk levels. For example, the risk levels may include 4 levels, namely low risk, lower risk, higher risk and high risk, in which case N may be equal to 4 or a multiple of 4, although the disclosure is not limited thereto. Each application set may correspond to a respective risk level. According to an embodiment of the present disclosure, each application set may include one or more second applications.

Fig. 3 schematically shows a schematic diagram of matching each first application information in the application list with each second application information in the N application sets according to an embodiment of the present disclosure.

As shown in fig. 3, the application list 301 of the user includes applications 1 to 5. Taking N equal to 3 as an example, the application set 302 includes applications a to c, the application set 303 includes applications d to g, and the application set 304 includes applications u to y.

Matching each first application information in the user's application list with each second application information in the N application sets, respectively, may be as follows.

First, each first application in the application list 301 is matched with the second applications in the first application set 302, and a first number of first applications in the application list 301 that match a second application in the application set 302 is calculated. For example, the first number is 2.

Then, the first application in the application list 301 that is not matched may be matched to the second application in the second set of applications 303, and a second number of the first applications in the application list 301 that are matched to the second application in the set of applications 303 may be calculated. For example, the second number is 1.

Subsequently, the first application in the application list 301 that is not matched may be matched to the second application in the third set of applications 304, and a third number of the first applications in the application list 301 that are matched to the second application in the set of applications 304 may be calculated. For example, the third number is 2.

According to the above embodiment of the present disclosure, the first number of matches with the second application in the application set 302 is 2, the second number of matches with the second application in the application set 303 is 1, and the third number of matches with the second application in the application set 304 is 2 as a matching result.

According to the embodiment of the disclosure, for N application sets, the ith number of first applications matched with second applications in the ith application set in the application list may be calculated, where i is greater than or equal to 1 and less than or equal to N.

According to the above embodiment of the present disclosure, the ith number of first applications in the application list that match second applications in the ith application set may be calculated in the manner described above with reference to fig. 3, and details are not repeated here.

In operation S230, a risk feature vector of the user is determined according to the matching result, wherein a dimension of the risk feature vector of the user is N-dimension.

According to an embodiment of the present disclosure, each ith quantity may represent a numerical value in one dimension in the risk feature vector of the user. The value of the ith number may be equal to 0 or greater than 0, and the specific value is related to the number of the first applications matched with the second applications in the ith application set.

According to the above embodiments of the present disclosure, taking N equal to 3 as an example, the risk feature vector of the user may be (2, 1, 2), for example.
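The matching of operation S220 and the vector construction of operation S230 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `risk_feature_vector` and the identifiers `app_a` to `app_y` are hypothetical, and exact string equality is assumed as the matching criterion. Following the description of fig. 3, applications already matched to an earlier set are not compared against later sets.

```python
from typing import List, Set


def risk_feature_vector(app_list: List[str], app_sets: List[Set[str]]) -> List[int]:
    """Return one match count per application set (an N-dimensional vector)."""
    # De-duplicate the user's list while keeping its order.
    remaining = list(dict.fromkeys(app_list))
    vector = []
    for app_set in app_sets:
        matched = [app for app in remaining if app in app_set]
        vector.append(len(matched))
        # Only applications that are still unmatched go on to the next set.
        remaining = [app for app in remaining if app not in app_set]
    return vector


# Hypothetical identifiers standing in for the applications of fig. 3.
user_apps = ["app_a", "app_c", "app_e", "app_u", "app_x"]
app_sets = [
    {"app_a", "app_b", "app_c"},
    {"app_d", "app_e", "app_f", "app_g"},
    {"app_u", "app_v", "app_w", "app_x", "app_y"},
]
print(risk_feature_vector(user_apps, app_sets))  # [2, 1, 2]
```

With these toy inputs the result reproduces the example vector (2, 1, 2) for N equal to 3.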

In operation S240, the risk feature vector of the user is processed using a risk prediction model to determine risk information of the user.

According to an embodiment of the present disclosure, the risk prediction model may be a classification model, for example, a support vector machine, a neural network model, or the like. The risk information of a user may be, for example, the probability that the user belongs to different risk classes.

According to the embodiment of the disclosure, each piece of first application information in the application program list of the user is matched with each piece of second application information in the N application program sets to obtain a matching result, the risk feature vector of the user is determined according to the matching result, and the risk feature vector of the user is processed by using the risk prediction model, so that the risk information of the user can be determined. Because the application programs in the application program list of the user can effectively depict some personal attributes and characteristics of the user, the risk feature vector determined from the matching result can be used as a user feature for risk prediction. In this way, the value of user data for financial risk control is fully mined, the value of the data asset is better realized, and related project practice and product construction in financial risk control are supported and assisted. Therefore, the technical problem that risk assessment performs poorly when it relies on user risk features determined by the related art is at least partially solved, and the technical effect of improving the accuracy of risk assessment is achieved.

According to an embodiment of the present disclosure, before determining risk information of a new user, N sets of applications may be obtained using a sample user set, wherein characteristics of second applications in different sets of applications may be different.

According to the embodiments of the present disclosure, for example, the application lists of users having risk labels (e.g., overdue, fraud, etc.) may be mined to generate N application sets, with the existing risk labels used as supervision. Furthermore, the application program list of each sample user can be characterized by using the N application program sets, so as to form a risk feature vector suitable for risk-control scenarios. The risk feature vector of each sample user is then used for modeling, so that the risk information of a new user can be predicted.

The method shown in fig. 2 is further described with reference to fig. 4-9 in conjunction with specific embodiments.

FIG. 4 schematically shows a flow diagram for generating N sets of applications, according to an embodiment of the disclosure.

As shown in fig. 4, the method includes operations S410 to S430.

In operation S410, an application list of each of a plurality of sample users having tags is acquired.

According to an embodiment of the present disclosure, the label of the sample user indicates risk information of the sample user. The labels of the sample users can be, for example, risky users and non-risky users. The number of sample users can be determined according to factors such as model precision and training time consumption.

In operation S420, cluster analysis is performed on the second application information in the application lists of the plurality of sample users to obtain N cluster classes.

According to the embodiment of the disclosure, the applications are clustered using their product descriptions and divided into different sets by function, so that identical or similar applications are concentrated in one cluster.

According to an embodiment of the present disclosure, clustering analysis of the second application information in the application lists of the plurality of sample users may be performed, for example, in the following manner.

First, a description of each piece of second application information is obtained, and a vector of each piece of second application information is generated according to its description. Cluster analysis is then performed on the vectors of the second application information to obtain N clusters.

In operation S430, each class cluster of the N class clusters is used as an application set, so as to obtain N application sets.

According to the embodiment of the disclosure, the vectors of the second application information are subjected to cluster analysis, so that the applications in the same application set have the same or similar characteristic attributes.

According to an embodiment of the present disclosure, generating a vector of each second application information according to the description of each second application information includes: the description of the second application program information can be subjected to word segmentation processing aiming at each piece of second application program information to obtain a plurality of words; determining a word vector of each word in the plurality of words to obtain a plurality of word vectors; and generating a vector of second application information from the plurality of word vectors.

Take Taobao as an example of the second application. The description of Taobao is as follows: software that meets the needs of daily consumption and online shopping. According to the embodiment of the disclosure, word segmentation may be performed on this description to obtain words such as "satisfy", "life consumption", "online shopping", "demand", and "software". Each word may then be vectorized to obtain its word vector, and finally a vector for Taobao is generated from the plurality of word vectors.

According to an embodiment of the present disclosure, generating a vector of second application information from the plurality of word vectors comprises: the mean of the plurality of word vectors is calculated and then used as the vector of second application information.

According to the embodiment of the present disclosure, for example, the values of all word vectors in the corresponding dimensions may be weighted and summed, and then averaged in each dimension, and the calculated vector may be used as the vector of the second application information.
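As a rough sketch of the averaging step, the snippet below takes the per-dimension mean of the word vectors of a segmented description. The two-dimensional `toy_vectors` table is invented purely for illustration; in practice the word vectors would come from some embedding model, which the source does not specify. The function name `description_vector` is likewise hypothetical.

```python
from typing import Dict, List


def description_vector(words: List[str],
                       word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of a segmented description, dimension by dimension."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError("no word in the description has a known vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Invented two-dimensional word vectors, for illustration only.
toy_vectors = {
    "online shopping": [1.0, 0.0],
    "life consumption": [0.5, 0.5],
    "software": [0.0, 1.0],
}
vec = description_vector(["online shopping", "life consumption", "software"], toy_vectors)
print(vec)  # [0.5, 0.5]
```

Words without a known vector are skipped here; that is a design choice of this sketch, not something the source prescribes.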

According to the embodiment of the disclosure, with the risk labels of the sample users used as supervision, the second application information in the application lists of the plurality of sample users may be divided into M risk levels, so that the application information in different risk levels carries different meanings in the risk-control scenario, such as low risk, high risk, extremely high risk and the like, and the second application information within each risk level is then clustered.

Fig. 5 schematically illustrates a flow chart of cluster analysis of second application information in application lists of a plurality of sample users according to an embodiment of the present disclosure.

As shown in fig. 5, performing cluster analysis on the second application information in the application lists of the plurality of sample users to obtain N cluster classes includes operations S510 to S530.

In operation S510, a risk probability of each second application information is calculated.

According to an embodiment of the present disclosure, the risk probability of each second application information may be calculated, for example, as follows.

For each piece of second application information, a first list number is determined, namely the number of application lists, among the application lists of the plurality of sample users, in which the second application information is recorded; a second list number is determined, namely the number of those application lists in which the second application information is recorded and whose labels indicate risk; and the ratio of the second list number to the first list number is taken as the risk probability of the second application information.

According to an embodiment of the present disclosure, for example, the application lists of 1000 sample users are used. The 1000 sample users include 200 risky users and 800 non-risky users. For each second application APP_i in the application lists of the 1000 sample users, the risk probability is calculated.

On the one hand, the number of application lists in which APP_i is recorded is calculated, in other words, the number of sample users who use APP_i, which may be, for example, 400. On the other hand, the number of application lists in which APP_i is recorded and whose sample user is labeled as risky is calculated, which may be, for example, 150. The ratio of 150 to 400 is taken as the risk probability of APP_i.
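The ratio described for operation S510 can be sketched as below. The function name `app_risk_probabilities` and the data layout (a list of `(application list, is_risky)` pairs) are assumptions made for this illustration; the toy data is constructed only to mirror the counts in the example (1000 sample users, 200 of them risky, 400 lists containing `app_i`, 150 of those labeled risky).

```python
from collections import Counter
from typing import Dict, List, Tuple


def app_risk_probabilities(samples: List[Tuple[List[str], bool]]) -> Dict[str, float]:
    """For each app: (lists containing it with a risky label) / (lists containing it)."""
    total = Counter()  # first list number per app
    risky = Counter()  # second list number per app
    for apps, is_risky in samples:
        for app in set(apps):  # count each app at most once per list
            total[app] += 1
            if is_risky:
                risky[app] += 1
    return {app: risky[app] / total[app] for app in total}


# Toy data mirroring the counts in the text (hypothetical app names).
samples = (
    [(["app_i"], True)] * 150
    + [(["other_app"], True)] * 50
    + [(["app_i", "other_app"], False)] * 250
    + [(["other_app"], False)] * 550
)
probs = app_risk_probabilities(samples)
print(probs["app_i"])  # 0.375
```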

In operation S520, the second application information in the application lists of the plurality of sample users is divided into M risk levels according to the risk probability of each second application information, where M is a positive integer greater than 1.

According to embodiments of the present disclosure, for example, the risk level may include low risk, high risk, very high risk, and the like, although the present disclosure is not limited thereto. The risk levels may be divided according to the risk probabilities. For example, a risk probability between 0% and 20% corresponds to low risk, between 20% and 40% to lower risk, between 60% and 80% to higher risk, and between 80% and 100% to high risk.
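One possible way to divide applications into M risk levels by risk probability is shown below. Equal-width probability ranges are an assumption of this sketch (the source only gives example ranges of 20% width), as are the names `risk_level` and `group_by_level` and the toy probabilities.

```python
from typing import Dict, List


def risk_level(probability: float, m: int) -> int:
    """Map a risk probability in [0, 1] to one of m equal-width levels (0 = lowest)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # A probability of exactly 1.0 is placed in the top level.
    return min(int(probability * m), m - 1)


def group_by_level(probs: Dict[str, float], m: int) -> Dict[int, List[str]]:
    """Group app names by their risk level."""
    levels: Dict[int, List[str]] = {level: [] for level in range(m)}
    for app, p in sorted(probs.items()):
        levels[risk_level(p, m)].append(app)
    return levels


toy_probs = {"app_a": 0.05, "app_b": 0.375, "app_c": 0.9}
levels = group_by_level(toy_probs, m=5)
print(levels)  # {0: ['app_a'], 1: ['app_b'], 2: [], 3: [], 4: ['app_c']}
```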

In operation S530, for each risk level of the M risk levels, the second application information in each risk level is clustered into K clusters, where K is a positive integer greater than 0, and the product of K and M is equal to N.

Fig. 6 schematically shows a schematic diagram of clustering the second application information in each risk level into K class clusters according to an embodiment of the present disclosure.

As shown in fig. 6, the second application information (APP is taken as an example) in the application lists of the multiple sample users is divided into a low-risk APP set, a lower-risk APP set, a higher-risk APP set, a high-risk APP set, and the like.

According to the embodiment of the disclosure, for each risk level, the second application information in that risk level is clustered into K class clusters, so that K × M class clusters are obtained, and each class cluster can be used as one application set. For example, clustering analysis is performed on the APPs in the low-risk APP set to obtain K class clusters, and clustering analysis is likewise performed on the APPs in the high-risk APP set to obtain K class clusters.

According to the embodiment of the disclosure, the risk probability of the second application is used as a supervised grading signal, so that the second applications in different sets carry different meanings in the risk control scenario, and the financial risk control attributes of the second applications are embodied in the risk feature vector. As a result, after each piece of first application information in the application list is matched with each piece of second application information in the N application sets, the generated risk feature vector has practical reference value for risk assessment.

According to the embodiment of the disclosure, before the risk feature vector of the new user is processed by using the risk prediction model, the risk prediction model can be trained in advance.

According to the embodiment of the disclosure, an application program list of each sample user in a plurality of sample users with labels can be obtained, and then a risk feature vector of each sample user is generated; and training the initial model by using the risk feature vectors of the plurality of sample users and the labels of the plurality of sample users to obtain a risk prediction model.
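The training step can be illustrated with a small sketch. The disclosure does not name the type of the initial model; the logistic regression trained by stochastic gradient descent below, the function names and the toy data are all assumptions chosen only to show how risk feature vectors and labels are consumed together.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(vectors, labels, lr=0.1, epochs=500):
    """Fit weights and bias on (risk feature vector, label) pairs."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_risk(model, x):
    """Return the predicted probability that the user is risky."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy N = 2 vectors: [count in a low-risk set, count in a high-risk set].
sample_vectors = [[3, 0], [2, 0], [0, 2], [1, 3]]
sample_labels = [0, 0, 1, 1]
model = train_logistic(sample_vectors, sample_labels)
```

After training, `predict_risk(model, [0, 3])` should be above 0.5 and `predict_risk(model, [3, 0])` below it, since the toy data is linearly separable.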

In the process of implementing the present disclosure, the inventor found that directly vectorizing the applications with high installation frequency and building the risk control model on the resulting feature vectors has the following disadvantages. First, the application feature vectors generated in this way have a high dimension, so the storage and training costs are large. Second, the number of applications installed by a user is orders of magnitude smaller than the total number of common applications in the application market, so the features are sparse and overfitting occurs easily. Third, the attributes of applications related to financial risk cannot be embodied in the feature vectors: the vector distances between the applications of all dimensions are equal, so applications related to financial risk are not distinguished from common applications unrelated to financial risk, and risk control attributes are difficult to embody.

According to the embodiment of the disclosure, the risk feature vector of each sample user can be generated by using the N application sets, so that the dimension of the risk feature vector is reduced and the risk feature vector reflects risk control attributes.

For example, each first application information in the application list of each sample user is matched with each second application information in the N application sets, so as to obtain a matching result, and the risk feature vector of the user is determined according to the matching result. For a specific processing procedure, reference may be made to the description in fig. 2, which is not described herein again.

According to the embodiment of the disclosure, the risk feature vector of the sample user is generated by using the N application sets, and the applications in each of the N application sets have the same risk level and similar functions, so that the risk feature vector carries risk control attributes, its dimension is reduced, and processing and modeling are facilitated.

Fig. 7 schematically illustrates a flow chart for generating risk feature vectors for sample users according to an embodiment of the present disclosure.

As shown in fig. 7, the method of generating a risk feature vector of a sample user includes operations S710 to S780.

In operation S710, APP list information of sample users with a label y is obtained as raw data for subsequent processing, where the label y = 0 indicates a risk-free user and the label y = 1 indicates a risky user.

In operation S720, the risk probability P(y = 1 | APP_i) corresponding to each APP_i is counted, that is, the proportion of risky users (positive samples, i.e., risk samples) among all sample users who have installed APP_i. The APPs are then sorted by risk probability.

In operation S730, the APPs are divided into M risk levels by risk probability according to the above sorting result. For example: a high risk level (corresponding to a certain risk probability interval pl₁ < P(y = 1 | APP_i) < ph₁), a higher risk level (corresponding to a certain risk probability interval pl₂ < P(y = 1 | APP_i) < ph₂), and so on.

In operation S740, for each APP_i, the corresponding application description text text_i is obtained from the application market. Word segmentation is performed on text_i, and the word segmentation results of all application description texts are used for pre-training with an unsupervised method; the model or method adopted may be, for example, fastText. A pre-trained model of word vectors for the APP description corpus is thereby obtained.

In operation S750, for the APP set of each risk level (refer to fig. 6), the pre-trained model is used to segment and process the description text of each APP, and the sentence vector sen_vec_i of APP_i is obtained.
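One way to obtain such a vector, consistent with the mean-of-word-vectors embodiment described elsewhere in this disclosure, is sketched below. The lookup table `word_vectors` stands in for a pre-trained model such as fastText, and the whitespace segmentation and the name `sentence_vector` are assumptions for illustration only.

```python
def sentence_vector(text, word_vectors, dim):
    """Average the vectors of the words in text that have a known vector."""
    words = text.lower().split()  # assumption: whitespace word segmentation
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Hypothetical 2-dimensional word vectors standing in for a pre-trained model.
word_vectors = {"loan": [1.0, 0.0], "cash": [0.8, 0.2], "puzzle": [0.0, 1.0]}
sen_vec = sentence_vector("Cash loan app", word_vectors, dim=2)
print(sen_vec)
```

Here "app" has no entry in the table, so the result is the mean of the vectors for "cash" and "loan".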

In operation S760, the sentence vectors corresponding to all APPs in the APP set of each risk level m are used to cluster the APPs in the risk level m, so as to obtain K class clusters C_m1, ..., C_mK. The APP sets corresponding to the class clusters are Set_m1, Set_m2, ..., Set_mK.

In operation S770, the APP sets of each class cluster at each risk level are obtained. M × K APP sets can be obtained: Set_11, Set_12, ..., Set_1K; Set_21, Set_22, ..., Set_2K; ...; Set_M1, ..., Set_MK.
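Operations S760 and S770 can be sketched as follows. The disclosure does not specify the clustering algorithm; the tiny deterministic k-means below, the helper names `kmeans` and `build_app_sets`, and the sample data are assumptions for illustration. In this sketch a risk level with fewer than K APPs yields some empty sets, which a real implementation would need to handle.

```python
def kmeans(points, k, iterations=20):
    """Tiny k-means; returns a cluster index for each point.

    Centers start at the first k points, which keeps the sketch deterministic.
    """
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def build_app_sets(levels, sen_vecs, k):
    """Return a list of M * K APP sets, ordered risk level by risk level."""
    num_levels = max(levels.values()) + 1
    app_sets = []
    for m in range(num_levels):
        apps = sorted(app for app, lvl in levels.items() if lvl == m)
        clusters = [set() for _ in range(k)]
        if apps:
            assignment = kmeans([sen_vecs[a] for a in apps], min(k, len(apps)))
            for app, c in zip(apps, assignment):
                clusters[c].add(app)
        app_sets.extend(clusters)
    return app_sets


# Hypothetical risk levels (M = 2) and sentence vectors for five APPs.
levels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
sen_vecs = {"a": [0.0, 0.0], "b": [0.0, 0.1], "c": [5.0, 5.0],
            "d": [1.0, 1.0], "e": [9.0, 9.0]}
app_sets = build_app_sets(levels, sen_vecs, k=2)
print(len(app_sets))  # 4
```

With K = 2 the set at list position m * K + k corresponds to Set_mk, so the flat list index can serve directly as the dimension index of the risk feature vector.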

In operation S780, a risk feature vector of each sample user is generated by matching and counting the applications of each sample user against each APP in the M × K APP sets. Specifically, for example, a zero vector V_MK of dimension M × K is first generated. For the installed APP list of each user, the APPs in the list are checked one by one to see whether they belong to any one of the M × K APP sets. For example, if APP_i in the APP list belongs to the j-th APP set, then V_MK[j] = V_MK[j] + 1, where V_MK[j] denotes the value in the j-th dimension. After the traversal is complete, the resulting vector V_MK is the risk feature vector of the sample user.
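The matching count of operation S780 can be sketched as follows. The function name `risk_feature_vector` and the sample sets are hypothetical, and the sketch assumes the APP sets are disjoint, as they are when they come from clustering.

```python
def risk_feature_vector(installed_apps, app_sets):
    """Count, per APP set, how many of the user's installed APPs belong to it."""
    # Map every APP to the index j of the set that contains it.
    set_index = {app: j for j, app_set in enumerate(app_sets) for app in app_set}
    vector = [0] * len(app_sets)  # zero vector V_MK of dimension M * K
    for app in installed_apps:
        j = set_index.get(app)
        if j is not None:  # APPs outside every set are ignored
            vector[j] += 1
    return vector


example_sets = [{"a", "b"}, {"c"}, {"d"}, {"e"}]  # hypothetical M * K = 4 sets
v_mk = risk_feature_vector(["a", "b", "e", "zzz"], example_sets)
print(v_mk)  # [2, 0, 0, 1]
```

Building the `set_index` dictionary once avoids scanning all M × K sets for every installed APP.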

According to the embodiment of the disclosure, the risk feature vector of the sample user is generated by using the N application sets, so that the risk feature vector carries risk control attributes and its dimension is reduced, which helps reduce storage and training overhead and facilitates processing and modeling.

Fig. 8 schematically shows a block diagram of a risk determination device according to an embodiment of the present disclosure.

As shown in fig. 8, the risk determination device 800 includes: a first obtaining module 810, a matching module 820, a first determining module 830, and a processing module 840.

The first obtaining module 810 is configured to obtain an application list of a user, where the application list includes at least one piece of first application information.

The matching module 820 is configured to match each piece of the first application information in the application list with each piece of second application information in N application sets, to obtain a matching result, where each application set includes at least one piece of second application information, and N is a positive integer greater than 1.

The first determining module 830 is configured to determine a risk feature vector of the user according to the matching result, where a dimension of the risk feature vector of the user is N-dimension.

The processing module 840 is configured to process the risk feature vectors of the user using a risk prediction model to determine risk information of the user.

According to the embodiment of the disclosure, each piece of first application information in the application list of the user is matched with each piece of second application information in the N application sets to obtain a matching result, the risk feature vector of the user is determined according to the matching result, and the risk feature vector of the user is processed by using the risk prediction model, so that the risk information of the user can be determined. Because the applications in the application list of a user can effectively depict some personal attributes and characteristics of the user, the application information in the application list is matched with the application information in the application sets, and the risk feature vector of the user is determined according to the matching result. The risk feature vector can be used as a user feature for risk prediction, so that the value of user data for financial risk control is fully mined, the value of data assets is better realized, and related project practice and product construction in financial risk control are supported and assisted. Therefore, the technical problem that risk assessment based on user risk features determined by the related art performs poorly is at least partially solved, and the technical effect of improving the accuracy of risk assessment is achieved.
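The cooperation of the four modules can be sketched in software as follows. This is only an illustration: the class name, the method names, the dictionary format of the user record and the stand-in `toy_model` are all assumptions, and any trained risk prediction model could take the place of `toy_model`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class RiskDeterminationDevice:
    """Hypothetical software counterpart of device 800 (modules 810 to 840)."""

    app_sets: List[Set[str]]                      # the N application sets
    risk_model: Callable[[Sequence[int]], float]  # any trained predictor

    def obtain(self, user_record: Dict[str, List[str]]) -> List[str]:
        # First obtaining module 810: read the user's application list.
        return list(user_record["apps"])

    def match(self, app_list: List[str]) -> List[int]:
        # Matching module 820: index of the matching set for every matched app.
        return [j for app in app_list
                for j, app_set in enumerate(self.app_sets) if app in app_set]

    def determine_vector(self, matches: List[int]) -> List[int]:
        # First determining module 830: N-dimensional vector of match counts.
        vector = [0] * len(self.app_sets)
        for j in matches:
            vector[j] += 1
        return vector

    def process(self, vector: List[int]) -> float:
        # Processing module 840: apply the risk prediction model.
        return self.risk_model(vector)

    def run(self, user_record: Dict[str, List[str]]) -> float:
        matches = self.match(self.obtain(user_record))
        return self.process(self.determine_vector(matches))


def toy_model(vector: Sequence[int]) -> float:
    # Stand-in model for illustration only: share of matches that fall in set 1.
    return vector[1] / max(1, sum(vector))


device = RiskDeterminationDevice(
    app_sets=[{"chat", "maps"}, {"quickloan", "casino"}],
    risk_model=toy_model,
)
user = {"apps": ["chat", "quickloan", "casino", "unknown"]}
vector = device.determine_vector(device.match(device.obtain(user)))
score = device.run(user)
```

Keeping the model as an injected callable mirrors the statement that the modules may be combined, split or replaced independently.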

According to an embodiment of the present disclosure, the risk determination device 800 further comprises: a second obtaining module, a clustering module and a second determining module.

The second obtaining module is configured to obtain an application list of each sample user among a plurality of labeled sample users.

The clustering module is configured to perform cluster analysis on the second application information in the application lists of the plurality of sample users to obtain N class clusters.

The second determining module is configured to use each of the N class clusters as one application set to obtain the N application sets.

According to an embodiment of the present disclosure, the clustering module includes: a calculating unit, a dividing unit and a first clustering unit.

The calculating unit is configured to calculate a risk probability of each piece of the second application information.

The dividing unit is configured to divide the second application information in the application lists of the plurality of sample users into M risk levels according to the risk probability of each piece of the second application information, where M is a positive integer greater than 1.

The first clustering unit is configured to cluster, for each risk level of the M risk levels, the second application information in that risk level into K class clusters, where K is a positive integer greater than 0, and the product of K and M is equal to N.

According to an embodiment of the present disclosure, the label indicates risk information of the sample user; the calculation unit is configured to determine, for each of the second application information, a first list number of application lists in which the second application information is recorded among application lists of a plurality of the sample users, and determine a second list number of application lists in which the second application information is recorded and a label indicates a risk among the application lists of a plurality of the sample users; and taking a ratio of the number of the second lists to the number of the first lists as a risk probability of the second application information.

According to an embodiment of the present disclosure, the risk determination device 800 further comprises: a generating module and a training module.

The generating module is configured to generate the risk feature vector of each sample user by using the N application sets.

The training module is configured to train an initial model by using the risk feature vectors of the plurality of sample users and the labels of the plurality of sample users to obtain the risk prediction model.

According to an embodiment of the present disclosure, the clustering module includes: an obtaining unit, a generating unit and a second clustering unit.

The obtaining unit is configured to obtain a description of each piece of the second application information.

The generating unit is configured to generate a vector of each piece of the second application information according to the description of that second application information.

The second clustering unit is configured to perform cluster analysis on the vectors of the second application information to obtain N class clusters.

According to an embodiment of the present disclosure, the generating unit is configured to perform word segmentation processing on the description of the second application information for each piece of the second application information to obtain a plurality of words; determining a word vector of each word in the plurality of words to obtain a plurality of word vectors; and generating a vector of the second application information from a plurality of the word vectors.

According to an embodiment of the present disclosure, generating a vector of the second application information according to a plurality of the word vectors includes: calculating a mean of a plurality of said word vectors; and using the average value as a vector of the second application information.

According to an embodiment of the present disclosure, the matching module is configured to calculate, for each i-th application set of the N application sets, an i-th number of first applications in the application list that match a second application in the i-th application set, where i is greater than or equal to 1 and less than or equal to N; each i-th number represents the value in one dimension of the risk feature vector of the user.

Any of the modules and units according to embodiments of the present disclosure, or at least part of the functionality of any of them, may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging the circuits, or in any suitable combination of any of three implementations, software, hardware and firmware. Alternatively, one or more of the modules and units according to embodiments of the disclosure may be implemented at least partly as computer program modules, which, when executed, may perform corresponding functions.

For example, any number of the first obtaining module 810, the matching module 820, the first determining module 830 and the processing module 840 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the first obtaining module 810, the matching module 820, the first determining module 830 and the processing module 840 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in a suitable combination of any of them. Alternatively, at least one of the first obtaining module 810, the matching module 820, the first determining module 830 and the processing module 840 may be at least partly implemented as a computer program module, which when executed may perform a corresponding function.

It should be noted that, the risk determination device part in the embodiment of the present disclosure corresponds to the risk determination method part in the embodiment of the present disclosure, and the description of the risk determination device part specifically refers to the risk determination method part, which is not described herein again.

FIG. 9 schematically shows a block diagram of a computer system suitable for implementing the above described method according to an embodiment of the present disclosure. The computer system illustrated in FIG. 9 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.

As shown in fig. 9, a computer system 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the system 900 are stored. The processor 901, ROM 902, and RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the system 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The system 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.

According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium described above carries one or more programs which, when executed, implement a method according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It will be appreciated by a person skilled in the art that various embodiments of the disclosure and/or features recited in the claims may be combined and/or coupled in a number of ways, even if such combinations or couplings are not explicitly recited in the disclosure. In particular, various combinations and/or couplings of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or couplings are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A method of risk determination, comprising:

acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information;

matching each piece of first application program information in the application program list with each piece of second application program information in N application program sets respectively to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1;

determining a risk feature vector of the user according to the matching result, wherein the dimension of the risk feature vector of the user is N-dimension; and

processing the risk feature vector of the user with a risk prediction model to determine risk information for the user.

2. The method of claim 1, further comprising:

obtaining an application program list of each sample user in a plurality of sample users with labels;

performing cluster analysis on second application program information in the application program lists of the plurality of sample users to obtain N clusters; and

taking each class cluster in the N class clusters as one application program set to obtain the N application program sets.

3. The method of claim 2, wherein performing cluster analysis on the second application information in the application lists of the plurality of sample users to obtain N clusters comprises:

calculating a risk probability of each of the second application information;

dividing the second application information in the application program lists of the plurality of sample users into M risk levels according to the risk probability of each second application information, wherein M is a positive integer greater than 1; and

for each risk level of the M risk levels, clustering the second application information in each risk level into K class clusters, where K is a positive integer greater than 0, and the product of K and M is equal to N.

4. The method of claim 3, wherein the label indicates risk information of a sample user;

the calculating the risk probability of each second application information comprises:

for each of the second application information, determining a first list number of application lists in which the second application information is recorded among the application lists of the plurality of sample users, and determining a second list number of application lists in which the second application information is recorded and whose label indicates a risk; and

taking the ratio of the second list number to the first list number as the risk probability of the second application program information.

5. The method of claim 1, further comprising:

generating a risk feature vector for each sample user using the N application program sets; and

training an initial model by using the risk feature vectors of the plurality of sample users and the labels of the plurality of sample users to obtain the risk prediction model.

6. The method of claim 2, wherein performing cluster analysis on the second application information in the application lists of the plurality of sample users to obtain N clusters comprises:

obtaining a description of each of the second application information;

generating a vector of each piece of second application information according to the description of each piece of second application information; and

performing clustering analysis on the vector of each second application program information to obtain N clusters.

7. The method of claim 6, wherein generating a vector for each of the second application information from the description for each of the second application information comprises:

for each piece of second application program information, performing word segmentation processing on the description of the second application program information to obtain a plurality of words;

determining a word vector of each word in the plurality of words to obtain a plurality of word vectors; and

generating a vector of the second application information from a plurality of the word vectors.

8. The method of claim 7, wherein generating the vector of second application information from the plurality of word vectors comprises:

calculating a mean of a plurality of the word vectors; and

taking the mean as the vector of the second application program information.

9. The method of claim 1, wherein matching each of the first application information in the application list with each of the second application information in the N application sets respectively, and obtaining a matching result comprises:

respectively calculating the ith number of the first application programs matched with the second application programs in the ith application program set in the application program list aiming at the N application program sets, wherein i is more than or equal to 1 and less than or equal to N;

wherein each of the ith quantities represents a numerical value in one dimension in the risk feature vector of the user.

10. A risk determination device, comprising:

a first acquisition module for acquiring an application program list of a user, wherein the application program list comprises at least one piece of first application program information;

the matching module is used for respectively matching each piece of first application program information in the application program list with each piece of second application program information in N application program sets to obtain a matching result, wherein each application program set comprises at least one piece of second application program information, and N is a positive integer greater than 1;

the first determining module is used for determining the risk characteristic vector of the user according to the matching result, wherein the dimension of the risk characteristic vector of the user is N-dimension; and

a processing module for processing the risk feature vector of the user using a risk prediction model to determine risk information of the user.

11. The apparatus of claim 10, further comprising:

a second acquisition module for acquiring an application list of each of a plurality of labeled sample users;

a clustering module for performing cluster analysis on the second application information in the application lists of the plurality of sample users to obtain N clusters; and

a second determining module for taking each of the N clusters as one application set, to obtain the N application sets.

12. The apparatus of claim 11, wherein the clustering module comprises:

a calculation unit configured to calculate a risk probability of each of the second application information;

a dividing unit configured to divide the second application information in the application lists of the plurality of sample users into M risk levels according to the risk probability of each piece of second application information, wherein M is a positive integer greater than 1; and

a first clustering unit, configured to cluster, for each risk level of the M risk levels, the second application information in each risk level into K clusters, where K is a positive integer greater than 0, and a product of K and M is equal to N.
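
The two-stage grouping in claim 12 (M risk levels, then K clusters per level, giving N = K × M sets) might be sketched as follows. This is only an illustration under several assumptions that the claim does not fix: the risk probabilities are taken as given, levels are equal-width bins over [0, 1], each application already has a numeric vector, and clustering is a small deterministic k-means seeded with the first K items of each level. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def risk_level(prob: float, m: int) -> int:
    """Map a probability in [0, 1] to one of m equal-width levels (0..m-1)."""
    return min(int(prob * m), m - 1)


def kmeans(names: List[str], vectors: Dict[str, Vector], k: int,
           iters: int = 10) -> List[List[str]]:
    """Tiny deterministic k-means: the first k items seed the centroids."""
    centroids = [list(vectors[n]) for n in names[:k]]
    clusters: List[List[str]] = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for n in names:
            dists = [sum((a - b) ** 2 for a, b in zip(vectors[n], c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(n)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                dim = len(centroids[i])
                centroids[i] = [sum(vectors[n][d] for n in members) / len(members)
                                for d in range(dim)]
    return clusters


def build_app_sets(probs: Dict[str, float], vectors: Dict[str, Vector],
                   m: int, k: int) -> List[Set[str]]:
    """Split apps into m risk levels, then into k clusters per level."""
    levels: List[List[str]] = [[] for _ in range(m)]
    for name in sorted(probs):
        levels[risk_level(probs[name], m)].append(name)
    app_sets: List[Set[str]] = []
    for names in levels:
        app_sets.extend(set(c) for c in kmeans(names, vectors, k))
    return app_sets


# Hypothetical risk probabilities and 1-dimensional vectors for four apps.
probs = {"a": 0.1, "b": 0.2, "c": 0.8, "d": 0.9}
vectors = {"a": [0.0], "b": [1.0], "c": [0.0], "d": [1.0]}
app_sets = build_app_sets(probs, vectors, m=2, k=2)
```

In this sketch a level with fewer than K applications yields some empty sets, so the total number of sets is still K × M.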

13. The apparatus of claim 12, wherein the label indicates risk information of a sample user;

the calculation unit is configured to:

14. The apparatus of claim 11, further comprising:

a generating module for generating a risk feature vector for each sample user using the N application sets; and

15. The apparatus of claim 11, wherein the clustering module comprises:

an acquisition unit configured to acquire a description of each of the second application information;

a generating unit configured to generate a vector of each of the second application information according to the description of each of the second application information; and

a second clustering unit configured to perform cluster analysis on the vectors of the second application information to obtain N clusters.
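
As a rough sketch of how the generating unit of claim 15 could turn a description into a vector, the example below assumes whitespace tokenization, a small hypothetical embedding table, and averaging of the word vectors found. The table contents, the function name `description_to_vector`, and the sample descriptions are invented for illustration; a real system would use a trained embedding table and a proper word segmenter.

```python
from typing import Dict, List

# Hypothetical 2-dimensional word embeddings (assumption, not from the source).
EMBEDDINGS: Dict[str, List[float]] = {
    "quick": [1.0, 0.0],
    "loan": [0.0, 1.0],
    "puzzle": [2.0, 2.0],
    "game": [4.0, 0.0],
}


def description_to_vector(description: str) -> List[float]:
    """Average the embeddings of the known words in a description."""
    words = [w for w in description.lower().split() if w in EMBEDDINGS]
    if not words:
        raise ValueError("no known words in description")
    dim = len(EMBEDDINGS[words[0]])
    return [sum(EMBEDDINGS[w][d] for w in words) / len(words)
            for d in range(dim)]


# Hypothetical descriptions of two applications.
app_vectors = {
    "loan_a": description_to_vector("Quick loan"),
    "game_a": description_to_vector("Puzzle game"),
}
```

The vectors in `app_vectors` are what a clustering unit would then group into N clusters using any standard clustering algorithm.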

16. The apparatus of claim 15, wherein the generating unit is to:

17. The apparatus of claim 16, wherein generating the vector of second application information from the plurality of word vectors comprises:

calculating a mean of the plurality of word vectors; and

taking the mean as the vector of the second application information.

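18. The apparatus of claim 10, wherein the matching module is to:

calculate, for each of the N application sets, an i-th number of pieces of first application information in the application list that match second application information in the i-th application set, wherein 1 ≤ i ≤ N;

wherein each i-th number represents the value of one dimension of the risk feature vector of the user.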

19. A computer system, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-9.

20. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.