CN113762967A - Risk information determination method, model training method, device, and program product

Info

Publication number: CN113762967A
Application number: CN202110350708.XA
Authority: CN
Inventors: 石亚庆; 林元晟; 柳婷; 王晓勤; 罗尚勇
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2021-12-07

Abstract

The present disclosure provides a risk information determination method, a model training method, a device, and a program product. The method comprises: acquiring business data to be evaluated for a business activity, wherein the business data to be evaluated comprises a plurality of data features and each data feature corresponds to a feature domain; inputting the data features corresponding to the same feature domain into the coding network corresponding to that feature domain to obtain a feature vector corresponding to the feature domain; inputting the feature vectors corresponding to the feature domains into an attention layer to obtain a total vector comprising the relationship among the feature vectors; inputting the total vector into a determining submodule, and determining an expectation corresponding to the business data to be evaluated by using the determining submodule; and determining risk information corresponding to the business data to be evaluated according to the expectation. In this scheme, the data features belonging to a feature domain are processed by the coding network corresponding to that feature domain to obtain the feature vector of the domain, so that the feature vectors of the business activity are extracted accurately and an accurate risk assessment result can be obtained.

Description

Risk information determination method, model training method, device, and program product

Technical Field

The present disclosure relates to artificial intelligence technologies, and in particular, to a risk information determination method, a model training method, a device, and a program product.

Background

At present, online shopping platforms are increasingly mature, and they launch a variety of preferential activities to boost the trading volume of commodities. As preferential activities have multiplied, a black and grey industry that exploits online shopping platforms has grown with them. Some users stockpile a large number of platform accounts and use those accounts to participate in preferential activities for arbitrage.

To counter the black and grey industry on online shopping platforms, the prior art provides risk assessment technology, which assesses risk at every link of the full chain through which a user participates in a preferential activity and is an effective means of defending against the black and grey industry. Risk assessment techniques are divided into rule-based risk assessment methods and algorithm-model-based risk assessment methods.

The rule-based risk assessment method depends heavily on expert experience, its rule release cycle is long, and its response to the black and grey industry is therefore delayed. In the algorithm-model-based risk assessment method, the data generated by participation in preferential activities has high feature dimensionality and strong sparsity, and traditional modeling or neural network structures cannot effectively learn the characteristics of such data.

Disclosure of Invention

The present disclosure provides a risk information determination method, a model training method, a device, and a program product, so as to solve the problem in the prior art that the risk of a business activity cannot be accurately evaluated.

A first aspect of the present disclosure is to provide a method for determining risk information of business activities, including:

acquiring service data to be evaluated corresponding to the business activity, wherein the service data to be evaluated comprises a plurality of data features, and each data feature corresponds to a feature domain;

inputting the data features corresponding to the same feature domain into a coding network corresponding to the feature domain to obtain feature vectors corresponding to the feature domain;

inputting the feature vectors corresponding to the feature domains into an attention layer to obtain a total vector comprising the relationship among the feature vectors;

inputting the total vector into a determining submodule, and determining an expectation corresponding to the service data to be evaluated by using the determining submodule;

and determining risk information corresponding to the business data to be evaluated according to the expectation, wherein the risk information is used for indicating the risk degree of the business data to be evaluated.

A second aspect of the present disclosure is to provide a training method for a model for assessing business activity risk, the model comprising: a coding network corresponding to each feature domain, an attention layer, and a determining submodule;

the method comprises the following steps:

acquiring a plurality of sample data corresponding to the business activity, wherein each sample data comprises a plurality of data features, and each data feature corresponds to a feature domain;

inputting the data features corresponding to the same feature domain in each sample data into the coding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain;

inputting the feature vectors corresponding to the feature domains into the attention layer to obtain, for each sample data, a total vector comprising the relationship among the feature vectors;

and inputting the total vector of each sample data into the determining submodule, determining the expectation of each sample data by using the determining submodule, and training the model according to the expectation of each sample data to obtain a model for assessing business activity risk.

A third aspect of the present disclosure is to provide an apparatus for determining risk information of business activity, including:

the acquisition module is used for acquiring the service data to be evaluated corresponding to the business activity, wherein the service data to be evaluated comprises a plurality of data features, and each data feature corresponds to a feature domain;

the coding module is used for inputting the data features corresponding to the same feature domain into a coding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain;

the embedding module is used for inputting the feature vectors corresponding to the feature domains into an attention layer to obtain a total vector comprising the relationship among the feature vectors;

the expectation determining module is used for inputting the total vector into a determining submodule and determining an expectation corresponding to the service data to be evaluated by using the determining submodule;

and the risk determining module is used for determining risk information corresponding to the business data to be evaluated according to the expectation, and the risk information is used for indicating the risk degree of the business data to be evaluated.

A fourth aspect of the present disclosure is to provide a training apparatus for a model for assessing business activity risk, the model comprising: a coding network corresponding to each feature domain, an attention layer, and a determining submodule;

the apparatus comprises:

an obtaining module, configured to obtain a plurality of sample data corresponding to the business activity, wherein each sample data comprises a plurality of data features, and each data feature corresponds to a feature domain;

the coding module is used for inputting the data features corresponding to the same feature domain in each sample data into the coding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain;

the embedding module is used for inputting the feature vectors corresponding to the feature domains into the attention layer to obtain, for each sample data, a total vector comprising the relationship among the feature vectors;

an expectation determination module for inputting the total vector of each sample data into the determining submodule, and determining an expectation of each sample data by using the determining submodule;

and the training module is used for training the model according to the expectation of each sample data to obtain a model for assessing business activity risk.

A fifth aspect of the present disclosure is to provide an electronic apparatus, comprising:

a memory;

a processor; and

a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method for determining risk information of a business activity according to the first aspect or the method for training a model for assessing risk of a business activity according to the second aspect.

A sixth aspect of the present disclosure is to provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method for determining risk information of a business activity according to the first aspect or the method for training a model for evaluating risk of a business activity according to the second aspect.

A seventh aspect of the present disclosure is to provide a computer program product comprising a computer program which, when executed by a processor, implements the method for determining risk information of a business activity according to the first aspect or the training method for evaluating a model of business activity risk according to the second aspect.

The risk information determination method, the model training method, the device, and the program product provided by the present disclosure have the following technical effects:

The method comprises: acquiring business data to be evaluated corresponding to the business activity, wherein the business data to be evaluated comprises a plurality of data features and each data feature corresponds to a feature domain; inputting the data features corresponding to the same feature domain into the coding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain; inputting the feature vectors corresponding to the feature domains into an attention layer to obtain a total vector comprising the relationship among the feature vectors; inputting the total vector into a determining submodule, and determining an expectation corresponding to the business data to be evaluated by using the determining submodule; and determining risk information corresponding to the business data to be evaluated according to the expectation, wherein the risk information is used for indicating the risk degree of the business data to be evaluated. In the scheme provided by the disclosure, the data features belonging to a feature domain are processed by the coding network corresponding to that feature domain, so that intra-domain information is fully combined and reinforced and the feature vector of the feature domain is obtained. The feature vectors of the business activity can therefore be extracted accurately even when the data features of the business activity are strongly sparse, and an accurate risk assessment result for the business activity can be obtained.

Drawings

Fig. 1 is a flowchart illustrating a method for determining risk information of a business activity according to an exemplary embodiment of the present application;

Fig. 2 is a schematic diagram of a risk assessment model shown in an exemplary embodiment of the present application;

Fig. 3 is a schematic structural diagram of each coding network shown in an exemplary embodiment of the present application;

Fig. 4 is a schematic illustration of a process of an attention layer shown in an exemplary embodiment of the present application;

Fig. 5 is a flowchart illustrating a method for determining risk information of a business activity according to another exemplary embodiment of the present application;

Fig. 6 is a flowchart illustrating a training method for a model for assessing risk of business activity according to an exemplary embodiment of the present application;

Fig. 7 is a schematic diagram of a risk assessment model shown in an exemplary embodiment of the present application;

Fig. 8 is a flowchart illustrating a training method for a model for assessing risk of business activity according to another exemplary embodiment of the present application;

Fig. 9 is a block diagram illustrating a risk information determination apparatus for business activities according to an exemplary embodiment of the present application;

Fig. 10 is a block diagram illustrating a risk information determination apparatus for business activities according to another exemplary embodiment of the present application;

Fig. 11 is a block diagram illustrating a training apparatus for a model for assessing risk of business activities according to an exemplary embodiment of the present application;

Fig. 12 is a block diagram of a training apparatus for a model for assessing risk of business activities according to another exemplary embodiment of the present application;

Fig. 13 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present invention.

Detailed Description

Currently, to counter the black and grey industry on online shopping platforms, risk assessment technology is needed to assess business activities effectively and determine whether they are easily exploited by the black and grey industry. Risk assessment technology performs real-time risk assessment of marketing resources at every link of the full chain of a business activity, so that risk control measures such as taking resources offline or intercepting accounts can be applied promptly to high-risk arbitrage behavior. Risk assessment techniques are divided into rule-based risk assessment methods and algorithm-model-based risk assessment methods.

Rule-based risk scoring methods can be further divided by how the rules are generated and by how they are used. By generation mode, there are rules based on expert experience and rules generated with techniques such as feature importance mining. By usage mode, there are single rules and combined rules: a single rule completes the judgment on its own, whereas a combined rule must be judged together with other conditions or rules.

However, the rule-based risk scoring method depends heavily on expert experience, the cycle for formulating and releasing rules is long, and real-time countermeasures against the black and grey industry therefore lag behind. Furthermore, as traffic changes, the rules must be iterated continuously; otherwise the countermeasures lose effectiveness. Faced with the massive feature dimensions of a complex business scenario, even experts find it difficult to formulate online rules that cover all risk problems. Rule-based risk assessment also transfers poorly as an overall scheme and must be customized for each scenario.

The risk assessment method based on the algorithm model adopts artificial intelligence technologies such as machine learning and deep learning to perform feature processing and model training on marketing scene data, and risk prediction is performed according to model results.

Anomaly detection technology from the field of machine learning plays a vital role in e-commerce risk control. Unlike traditional supervised modeling, unsupervised anomaly detection does not require training on specific labels; it can autonomously mine the patterns and relationships in the data and can uncover abnormal data more quickly and more broadly.

Generally, anomaly detection methods fall into the following four categories: (1) statistical test methods, which generally assume that normal data obey a normal distribution and that abnormal data deviate substantially from the normal data; (2) time-series detection methods, which discover abnormal data by detecting points in a sequence that are inconsistent with its pattern, such as sudden rises or falls, trend changes, and level shifts; (3) supervised learning methods, which convert anomaly detection into a binary classification problem (normal data versus abnormal data) and apply a mature binary classification machine learning or deep learning algorithm; (4) unsupervised learning methods, which generally detect anomalies by clustering: a data point that is relatively far from its cluster center is treated as abnormal.

However, risk mining methods based on clustering or space partitioning still cannot handle massive high-dimensional features. An alternative is to reduce dimensionality first and then mine outliers: the information contained in the high-dimensional features is preserved as far as possible during dimensionality reduction, and abnormal points are then found in the low-dimensional vectors through clustering or space partitioning. In that approach the two training processes are decoupled and have two optimization objectives pointing in different directions, and some key information may be lost during dimensionality reduction, so the final result easily ends up at a suboptimal solution. The features of business activities on an online shopping platform are high-dimensional and strongly sparse and are clearly divided by domain, and traditional modeling or neural network fitting cannot learn the relationships between sparse feature domains well. In addition, the anomaly score finally output by such a model is usually represented by the distance from the sample to each cluster center or by the degree of dispersion of a subspace; the score's wide floating range makes it a poor quantitative guide for the business.

In order to solve the technical problem, in the scheme provided by the application, the data features of the business activities are divided according to the feature domains, the feature vectors of the data features of the feature domains are extracted through the coding networks corresponding to the feature domains, and then the risk assessment is performed on the business activities according to the feature vectors corresponding to the feature domains.

Fig. 1 is a flowchart illustrating a method for determining risk information of a business activity according to an exemplary embodiment of the present application.

As shown in fig. 1, the method for determining risk information of business activities provided by the present application includes:

Step 101, acquiring service data to be evaluated corresponding to the business activity; the service data to be evaluated comprises a plurality of data features, and each data feature corresponds to one feature domain.

The method provided by the application can be executed by an electronic device with computing capability, and the electronic device can be a background server of an e-commerce platform.

Specifically, the risk assessment model can be obtained by pre-training, and the trained risk assessment model is deployed in the electronic device. The electronic device can evaluate and process the business data to be evaluated of the business activity based on the deployed risk evaluation model.

Fig. 2 is a schematic diagram of a risk assessment model according to an exemplary embodiment of the present application.

As shown in Fig. 2, the risk assessment model provided by the present application may include a plurality of coding networks 21, an attention layer 22, and a determination submodule 23. The coding networks 21, the attention layer 22, and the determination submodule 23 are connected in sequence. The business data to be evaluated of the business activity can be input into the coding networks 21, and the expectation of the business data to be evaluated is obtained through sequential processing by the coding networks 21, the attention layer 22, and the determination submodule 23. According to the expectation, the electronic device can determine risk information indicating the risk degree of the business data to be evaluated.

Further, the electronic device may obtain the service data to be evaluated corresponding to the business activity, where the service data to be evaluated includes a plurality of data features, and each data feature corresponds to one feature domain.

In actual application, a plurality of feature domains may be set in advance. For example, a first feature domain related to registered account information and a second feature domain related to illegal resale of preferential offers may be set. Among the data features, those related to the registered account are assigned to the first feature domain and thus correspond to it, and those related to illegal resale of preferential offers are assigned to the second feature domain. In this way, data features that represent the same kind of information are grouped into the same feature domain, so that the association among them can be attended to.

For example, the service data to be evaluated includes N data features, and the N features may be divided into K feature fields according to information represented by each data feature.
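The division of N data features into K feature domains can be sketched as a simple lookup. The feature names, the domain names, and the third domain below are hypothetical and are not taken from the disclosure; they only illustrate the grouping step under the assumption that each feature has exactly one predefined domain.

```python
from collections import defaultdict

# Hypothetical mapping from feature name to feature domain (illustrative only).
FEATURE_DOMAIN = {
    "account_age_days": "registered_account",
    "account_phone_verified": "registered_account",
    "resale_listing_count": "offer_resale",
    "resale_price_ratio": "offer_resale",
    "orders_last_hour": "order_behavior",
}

def split_by_domain(record, feature_domain=FEATURE_DOMAIN):
    """Group the N data features of one record into K per-domain dictionaries."""
    grouped = defaultdict(dict)
    for name, value in record.items():
        grouped[feature_domain[name]][name] = value
    return dict(grouped)

# One record with N = 5 data features, grouped into K = 3 feature domains.
record = {
    "account_age_days": 2,
    "account_phone_verified": 0,
    "resale_listing_count": 14,
    "resale_price_ratio": 0.6,
    "orders_last_hour": 9,
}
domains = split_by_domain(record)
```

Each entry of `domains` would then be passed to the coding network of its own feature domain.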

Step 102, inputting the data features corresponding to the same feature domain into the coding network corresponding to the feature domain to obtain the feature vector corresponding to the feature domain.

In the scheme provided by the application, a coding network is provided for each feature domain, and the data features of each feature domain are processed by the coding network corresponding to that domain to obtain the feature vector of the domain.

Fig. 3 is a schematic structural diagram of each coding network according to an exemplary embodiment of the present application.

As shown in Fig. 3, for example, a first coding network 31 corresponding to a first feature domain, a second coding network 32 corresponding to a second feature domain, and a K-th coding network 3K corresponding to a K-th feature domain are provided.

The electronic device can input those data features of the service data to be evaluated that correspond to the same feature domain into the coding network corresponding to that feature domain, and can then process them with that coding network.

For example, a plurality of data features 311 assigned to the first feature domain are input to the first coding network 31, and a plurality of data features 321 assigned to the second feature domain are input to the second coding network 32.

The electronic device processes the data features input into the coding network based on the coding network to obtain a feature vector, for example, the electronic device processes the input data features 311 based on the first coding network 31 to obtain a feature vector 312. The electronic device processes the input data features 321 based on the second coding network 32 to obtain feature vectors 322.

Specifically, the sparse features of each feature domain are encoded by the coding network into a dense feature vector $E_K$ of a specified dimension:

$$E_K = \sigma(X_K; \theta_f)$$

where $E_K$ represents the feature vectors of the service data to be evaluated on the K different feature domains.

Data features belonging to the same feature domain represent the same type of information. For example, the data features of the first feature domain represent the registered account, and the data features of the second feature domain represent illegal resale of preferential offers. Therefore, by processing the data features of a feature domain with the coding network corresponding to that domain, intra-domain information can be fully combined and reinforced to obtain the feature vector of the domain.

In this way, the plurality of coding networks extract one feature vector per feature domain from the service data to be evaluated, each representing the information category of its feature domain. For example, a feature vector characterizing the registered account and a feature vector characterizing the resale of preferential offers can be obtained from the service data to be evaluated.

Specifically, the weight parameters in each coding network may be different, and may be specifically adjusted and determined in the model training process. The structures of the coding networks may be the same or different, and may be specifically set according to requirements.
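A minimal sketch of per-domain coding networks follows. The disclosure does not specify the network structure, so a single dense layer with a tanh activation is assumed here purely for illustration; the input widths, the embedding dimension, and the random initialisation are likewise hypothetical. The point shown is that each feature domain has its own parameters and that every domain is mapped to a dense vector of the same specified dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, out_dim):
    """Create randomly initialised parameters for one domain's coding network."""
    return {"W": rng.normal(scale=0.1, size=(in_dim, out_dim)), "b": np.zeros(out_dim)}

def encode(x, params):
    """Map a sparse domain input x to a dense vector (assumed form: tanh(x @ W + b))."""
    return np.tanh(x @ params["W"] + params["b"])

# Hypothetical input widths for K = 3 feature domains; all share embedding dimension 4.
in_dims = [6, 10, 3]
embed_dim = 4
encoders = [make_encoder(d, embed_dim) for d in in_dims]

# One sparse input per domain (mostly zeros).
inputs = [np.zeros(d) for d in in_dims]
inputs[0][1] = 1.0
inputs[1][7] = 1.0
inputs[2][0] = 1.0

# Stack the K per-domain feature vectors into a (K, embed_dim) array.
feature_vectors = np.stack([encode(x, p) for x, p in zip(inputs, encoders)])
```

In a trained model the parameters in `encoders` would come from the training process rather than from random initialisation.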

Step 103, inputting the feature vectors corresponding to the feature domains into the attention layer to obtain a total vector including the relationship among the feature vectors.

Further, in the solution provided by the present application, the risk assessment model further includes an attention layer, and the feature vectors corresponding to the feature domains may be input into the attention layer. The attention layer may process the feature vectors of the respective feature domains to obtain a total vector including the relationship between the respective feature vectors.

Fig. 4 is a schematic diagram illustrating a processing procedure of an attention layer according to an exemplary embodiment of the present application.

As shown in Fig. 4, the feature vectors 41 corresponding to the feature domains may be input to the attention layer 42, and the electronic device may use the attention layer 42 to process them and determine the relationship between the feature vectors, so as to obtain a total vector 43 comprising the relationship among the feature vectors.

In practical application, the electronic device may determine the relationship between every two feature vectors according to the weights in the attention layer, and may determine a new representation of each feature vector based on its relationship with each of the other feature vectors, so that information from the other feature vectors is embedded in the new representation. For example, the new representation of the first feature vector may be determined based on the relationship between the first feature vector and each of the other feature vectors.

The electronic device may then construct, from the new representations of the feature vectors, a total vector comprising the relationship among the feature vectors. Because the total vector includes these relationships, it integrates the feature vectors of the multiple information categories in the data to be evaluated, so that the information contained in the data to be evaluated is extracted accurately.

Specifically, the weight values in the attention layer can be obtained by training the model.
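The attention step described above can be illustrated with scaled dot-product self-attention over the K domain vectors. The disclosure does not name a specific attention formulation, so this choice, the projection matrices `Wq`, `Wk`, `Wv`, and the concatenation used to build the total vector are all assumptions made for the sketch.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention_total_vector(E, Wq, Wk, Wv):
    """Return (total_vector, weights) for K domain vectors E of shape (K, d)."""
    Q, K_, V = E @ Wq, E @ Wk, E @ Wv
    # (K, K) matrix: relationship between every pair of feature vectors.
    weights = softmax(Q @ K_.T / np.sqrt(K_.shape[-1]))
    # New representation of each vector, embedding information from the others.
    new_repr = weights @ V
    # Assumed construction of the total vector: concatenate the new representations.
    return new_repr.reshape(-1), weights

rng = np.random.default_rng(1)
K, d = 3, 4
E = rng.normal(size=(K, d))  # stand-in for the per-domain feature vectors
Wq, Wk, Wv = (rng.normal(scale=0.5, size=(d, d)) for _ in range(3))
total, weights = attention_total_vector(E, Wq, Wk, Wv)
```

Row i of `weights` gives how strongly the new representation of feature vector i draws on each of the K feature vectors.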

Step 104, inputting the total vector into a determining submodule, and determining the expectation corresponding to the service data to be evaluated by using the determining submodule.

Further, the risk assessment model further includes a determination submodule, and the electronic device may input the total vector determined in step 103 into the determination submodule, and process the total vector according to a weight value in the determination submodule, so as to obtain an expectation of the business data to be assessed.

In an embodiment, the electronic device may compress the total vector to reduce the dimensionality of the total vector, and then process the vector with the reduced dimensionality to obtain the expectation of the total vector, that is, the expectation corresponding to the service data to be evaluated.

In practical application, Gaussian distribution parameters may be set in the determination submodule and may be obtained by training the model. For example, if K feature domains are set, K Gaussian distribution parameters may be set.

The total vector, or the compressed total vector, may be processed according to the parameters of each Gaussian distribution to determine the expectation corresponding to the service data to be evaluated, where the expectation represents the distance between the service data to be evaluated and each Gaussian distribution. Whether the service data to be evaluated is abnormal can then be determined according to that distance.
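One possible reading of this step is a Gaussian mixture energy: the negative log-likelihood of the (compressed) total vector under the learned Gaussian distributions, which grows as the vector moves away from every component. The disclosure does not give the formula, so the function below, the diagonal covariances, and the mixture weights `phi` are assumptions for illustration only.

```python
import numpy as np

def sample_energy(z, phi, mu, var):
    """Negative log-likelihood of z under a diagonal Gaussian mixture.

    z: (d,) vector; phi: (K,) mixture weights; mu, var: (K, d) means and variances.
    """
    d = z.shape[0]
    # log N(z; mu_k, diag(var_k)) for each component k
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(var).sum(axis=1))
    log_prob = log_norm - 0.5 * (((z - mu) ** 2) / var).sum(axis=1)
    weighted = np.log(phi) + log_prob
    # log-sum-exp over components for numerical stability
    m = weighted.max()
    return -(m + np.log(np.exp(weighted - m).sum()))

# Hypothetical parameters for K = 2 Gaussian distributions in 2 dimensions.
phi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
var = np.ones((2, 2))

near = sample_energy(np.array([0.1, -0.1]), phi, mu, var)   # close to a component
far = sample_energy(np.array([10.0, -10.0]), phi, mu, var)  # far from both components
```

Under this reading, a larger value indicates that the sample lies farther from all Gaussian distributions and is more likely to be abnormal.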

Step 105, determining risk information corresponding to the business data to be evaluated according to the expectation, wherein the risk information is used for indicating the risk degree of the business data to be evaluated.

Specifically, the expectation determined by the risk assessment model can be converted into risk information, which indicates the risk degree of the business data to be evaluated. For example, a mapping function may be preset to map the expectation of the service data to be evaluated to a score in the range of 0 to 1, for example by means of a shifted logarithmic transformation or sigmoid mapping.

With this embodiment, even if the expectations of different data to be evaluated float over a wide range, the values are mapped onto the same score scale, which better represents the risk degree of the data to be evaluated.

In one embodiment, a risk threshold may be set, and if the determined risk information exceeds the risk threshold, an alarm may be issued.
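The mapping and threshold check can be sketched as follows. The exact mapping functions are not given in the disclosure; the `center`, `scale`, `minimum`, `maximum`, and `risk_threshold` parameters are hypothetical tuning values introduced for illustration.

```python
import math

def sigmoid_score(expectation, center=0.0, scale=1.0):
    """Map an unbounded expectation to a score between 0 and 1 (overflow-safe sigmoid)."""
    t = (expectation - center) / scale
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def shifted_log_score(expectation, minimum, maximum):
    """Shift by the minimum, apply log1p, and normalise by the log of the shifted maximum."""
    clipped = min(max(expectation, minimum), maximum)
    return math.log1p(clipped - minimum) / math.log1p(maximum - minimum)

def should_alert(score, risk_threshold=0.8):
    """Return True when the risk score exceeds the configured threshold."""
    return score > risk_threshold

high = sigmoid_score(10.0, center=5.0)  # expectation well above the assumed center
low = sigmoid_score(0.0, center=5.0)    # expectation well below the assumed center
```

Here `should_alert(high)` evaluates to `True` and `should_alert(low)` to `False`, so only the first sample would trigger an alarm under the assumed threshold of 0.8.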

The method provided by this embodiment is used for determining risk information of business activities and is performed by a device on which the method is deployed; the device is generally implemented in hardware and/or software.

The method for determining the risk information of the business activity comprises: acquiring business data to be evaluated corresponding to the business activity, wherein the business data to be evaluated comprises a plurality of data features and each data feature corresponds to a feature domain; inputting the data features corresponding to the same feature domain into the coding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain; inputting the feature vectors corresponding to the feature domains into an attention layer to obtain a total vector comprising the relationship among the feature vectors; inputting the total vector into a determining submodule, and determining an expectation corresponding to the business data to be evaluated by using the determining submodule; and determining risk information corresponding to the business data to be evaluated according to the expectation, wherein the risk information is used for indicating the risk degree of the business data to be evaluated. In this method, the data features belonging to a feature domain are processed by the coding network corresponding to that feature domain, so that intra-domain information is fully combined and reinforced and the feature vector of the feature domain is obtained. The feature vectors of the business activity can therefore be extracted accurately even when the data features of the business activity are strongly sparse, and an accurate risk assessment result for the business activity can be obtained.

Fig. 5 is a flowchart illustrating a method for determining risk information of a business activity according to another exemplary embodiment of the present application.

As shown in fig. 5, the method for determining risk information of business activity provided by the present application includes:

Step 501, acquiring an offline feature corresponding to a business activity, and acquiring real-time data corresponding to the business activity.

Step 502, determining data to be evaluated of business activities according to the offline characteristics and the real-time data; the service data to be evaluated comprises a plurality of data characteristics, and each data characteristic corresponds to one characteristic domain.

The features in the data to be evaluated may include both offline features and real-time features. For example, an offline feature corresponding to a business activity may be preset, and when there is real-time data corresponding to the business activity, the offline feature of the business activity may be obtained and processed together with the real-time data to obtain to-be-evaluated business data corresponding to the business activity.

Specifically, the offline characteristics can accurately reflect the characteristics of the business activities, the online real-time data can be generated in real time, and the real-time characteristics and the offline characteristics of the business activities are processed, so that not only can the characteristic information accurate to the business activities be extracted, but also the real-time performance of the business data to be evaluated can be ensured, and the accuracy of evaluating the business activities is improved.

Further, a feature processing engine may also be provided. For example, after generating the real-time data related to the first business activity, the offline feature of the first business activity may be obtained, and the feature processing engine is used to process the offline feature and the real-time data to obtain the business data X to be evaluated, which includes a plurality of data features.

Step 503, inputting the data features corresponding to the same feature domain into the coding network corresponding to the feature domain to obtain the feature vector corresponding to the feature domain.

Step 503 is similar to step 102 in execution principle and manner, and is not described again.

Step 504, determining feature similarity between every two feature vectors according to the feature vectors corresponding to the feature domains.

Step 505, determining a total vector comprising the relationship among the feature vectors according to the feature similarity.

The electronic device may input each feature vector corresponding to each feature domain obtained in step 503 into an attention layer of the model, and determine a total vector including a relationship between each feature vector through the attention layer. Specifically, steps 504 and 505 may be performed.

The electronic device may determine feature similarity between every two feature vectors according to the feature vectors corresponding to the feature domains. For example, a function for calculating the similarity may be provided, and the similarity between two feature vectors in each feature vector may be determined by the function.

For example, a similarity function Φ may be set, and for two feature vectors E1 and E2, the feature similarity Φ(E1, E2) between them may be determined by the similarity function Φ. Feature similarity can characterize the relationship between feature vectors belonging to different feature domains.

For any pair of feature vectors Em and Ek, the feature similarity Φ(Em, Ek) is:

[formula omitted in source]

where W_query and W_key are transformation matrices whose values may be updated during training of the model.

Specifically, a total vector including a relationship between each of the feature vectors may be determined according to a feature similarity between every two feature vectors. Specifically, a new expression of each feature vector can be determined according to the feature similarity, and then a total vector P including information of each feature vector itself and information between feature vectors can be determined according to the new expression of each feature vector.

Further, the relation coefficient between each feature vector and the other feature vectors can be determined according to the feature similarity between them. For example, for the feature vector Em, the similarity Φ(Em, Ek) between Em and any other feature vector Ek can be determined, and the relation coefficients between Em and the other feature vectors can then be determined according to these feature similarities Φ(Em, Ek).

In practical application, a way of determining the relation coefficient may be preset, and the relation coefficient between each feature vector and the other feature vectors may then be determined accordingly, where the relation coefficient represents the relationship between two feature vectors. The relation coefficient between the feature vector Em and the feature vector Ek may specifically be determined based on the following formula:

[formula omitted in source]

where the feature vector Em and the feature vector Ek are any two of the feature vectors determined by the coding networks, and K is the number of feature domains.

The feature domain embedded vector of each feature vector can be determined according to the relation coefficient between each feature vector and other feature vectors and each feature vector; wherein the total vector comprises a feature domain embedding vector for each feature vector.

Specifically, any two feature vectors have a relationship coefficient, and therefore, a feature domain embedding vector of a feature vector can be determined according to the relationship coefficient between the feature vector and another feature vector, where the feature domain embedding vector includes information of the feature vector and information of another feature vector.

Further, for any feature vector Em, the feature domain embedding vector may be:

[formula omitted in source]

where W_value is a transformation matrix whose values may be updated during training of the model.

In actual application, the total vector may be generated from the feature domain embedding vectors of the feature vectors. The total vector is obtained by performing global pooling over the embedding vectors of the different feature domains.

The total vector P may be:

[formula omitted in source]
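The formulas for the feature similarity, the relation coefficients, the feature domain embedding vectors and the total vector are not reproduced in this text. The following is only a minimal sketch of steps 504 and 505 under three assumptions that the source does not confirm: the similarity Φ(Em, Ek) is taken as the inner product of the W_query-transformed Em and the W_key-transformed Ek, the relation coefficients are a softmax over the K similarities, and the global pooling is an element-wise mean. The function names are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_total_vector(E, W_query, W_key, W_value):
    """E: list of K feature vectors, one per feature domain.

    Returns (embeddings, P): the feature domain embedding vectors and
    the pooled total vector.
    """
    queries = [matvec(W_query, e) for e in E]
    keys = [matvec(W_key, e) for e in E]
    values = [matvec(W_value, e) for e in E]
    dim = len(values[0])
    embeddings = []
    for q in queries:
        # Assumed similarity: inner product of the transformed vectors.
        sims = [dot(q, k) for k in keys]
        # Assumed relation coefficients: softmax over the K similarities.
        coeffs = softmax(sims)
        # Embedding vector: coefficient-weighted sum of the value vectors.
        emb = [sum(c * v[i] for c, v in zip(coeffs, values)) for i in range(dim)]
        embeddings.append(emb)
    # Assumed global pooling: element-wise mean over the K embeddings.
    K = len(embeddings)
    P = [sum(emb[i] for emb in embeddings) / K for i in range(dim)]
    return embeddings, P
```

With this choice each embedding vector mixes information from its own feature vector and from every other feature domain, which matches the description of the total vector as containing both per-vector information and the relationships among the vectors.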

Step 506, inputting the total vector into a compression network to obtain a compressed vector.

According to the scheme provided by the application, the determining submodule comprises a compression network, and the electronic equipment can process the total vector by using the compression network to obtain a compression vector corresponding to the total vector.

Specifically, the compression network has a weight parameter, and the weight parameter is obtained by training the model. The compression network is responsible for reducing the dimension of the total vector P.

Further, when the total vector is processed based on the compression network, the total vector may be encoded according to the encoding function and the first neural network parameter, so as to obtain an encoded vector. The first neural network parameters are obtained by training a model.

The coding vector Zc of the total vector P is:

Z_c＝h(P；θ_e)

where h() is the encoding function and θ_e is the first neural network parameter.

In practical application, a decoding function can be set in the compression network, and the electronic device can decode the encoding vector according to the decoding function and the second neural network parameter to obtain a decoding vector. The second neural network parameters are obtained by training the model.

The decoded vector X' of the encoded vector Zc is:

X′＝g(Z_c；θ_d)

where g() is the decoding function and θ_d is the second neural network parameter.

Specifically, the reconstruction error Zr may be determined according to the feature included in the data X to be evaluated and the decoding vector X'. The reconstruction error Zr is used to characterize the difference between the decoded vector X' and the characteristics of the data X to be evaluated.

Further, Z_r＝f(X, X′), where the function f() is used to determine the difference between the decoded vector and the features of the data to be evaluated. For example, the decoded vector may be subtracted from the features of the data to be evaluated to obtain the reconstruction error Z_r.

In practical application, the compressed vector Z may be determined from the encoding vector Z_c and the reconstruction error Z_r. The compressed vector Z is:

Z＝[Z_c，Z_r]

That is, Z_c and Z_r may be concatenated to obtain the compressed vector Z.
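The compression step can be illustrated with a small sketch. It assumes, purely for illustration, that the encoding function h is a single tanh layer and the decoding function g is a single linear layer whose output has the same dimension as X; the source does not specify either structure. The reconstruction error follows the subtraction example given in the text, and all function names are hypothetical.

```python
import math

def linear(W, b, v):
    """Affine map W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def encode(P, theta_e):
    """Z_c = h(P; theta_e): a single tanh layer (an assumption)."""
    W, b = theta_e
    return [math.tanh(y) for y in linear(W, b, P)]

def decode(Z_c, theta_d):
    """X' = g(Z_c; theta_d): a single linear layer (an assumption)."""
    W, b = theta_d
    return linear(W, b, Z_c)

def reconstruction_error(X, X_prime):
    """Z_r = f(X, X'): element-wise difference, as in the text's example."""
    return [x - xp for x, xp in zip(X, X_prime)]

def compress(P, X, theta_e, theta_d):
    """Return the compressed vector Z = [Z_c, Z_r]."""
    Z_c = encode(P, theta_e)
    X_prime = decode(Z_c, theta_d)
    Z_r = reconstruction_error(X, X_prime)
    return Z_c + Z_r  # list concatenation
```

Keeping Z_r alongside Z_c means that information lost while encoding is still visible to the later Gaussian step as a large reconstruction error.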

Specifically, the compressed vector is obtained by processing the total vector in this way, so that the problem that the information included in the data to be evaluated cannot be accurately reflected by the obtained total vector P due to the loss of the feature information when the features of each feature domain in the data to be evaluated are extracted through the coding network can be avoided.

Step 507, determining an expectation corresponding to the data to be evaluated according to the compressed vector and preset parameters corresponding to the Gaussian distributions; the number of Gaussian distributions is the number of feature domains; the expectation is used to characterize the distance between the data to be evaluated and each Gaussian distribution.

Furthermore, preset parameters corresponding to the gaussian distributions can be set in the determination submodule, and the preset parameters can be determined in a model training mode.

In practical application, K preset parameters of Gaussian distribution are set, namely the Gaussian distribution with the same number as the feature domains is set, and the preset parameters of each Gaussian distribution can represent the characteristics of one feature domain through training the model.

An expectation corresponding to the data to be evaluated is determined according to the compressed vector and the preset parameters corresponding to the Gaussian distributions; the expectation is used to characterize the distance between the data to be evaluated and each Gaussian distribution. Whether the data to be evaluated is abnormal can be determined from the expectation.

Specifically, the preset parameters of each Gaussian distribution may include the occurrence probability, the mean, and the covariance matrix of the Gaussian distribution.

The compressed vector may be processed according to the preset parameters of each Gaussian distribution to determine the expectation of the data to be evaluated.

Further, the expectation E(z) of the i-th piece of data to be evaluated is:

[formula omitted in source]

where z_i represents the compressed vector of the i-th piece of data to be evaluated.

Specifically, the expectation characterizes the distance between the data to be evaluated and each Gaussian distribution, and can therefore be used to determine whether the data to be evaluated is abnormal.
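The formula for the expectation is not reproduced in this text. As a hedged sketch only, the code below assumes that the expectation is the negative log-likelihood of the compressed vector under the mixture of Gaussian distributions, a common choice for this kind of model that the source does not confirm. It also assumes diagonal covariance matrices to keep the example short. The function name and argument layout are hypothetical.

```python
import math

def expectation(z, phis, means, variances):
    """Negative log-likelihood of z under a mixture of diagonal Gaussians.

    phis: occurrence probability of each Gaussian distribution
    means: mean vector of each Gaussian distribution
    variances: diagonal of the covariance matrix of each distribution
    """
    log_terms = []
    for phi, mu, var in zip(phis, means, variances):
        # Squared Mahalanobis distance for a diagonal covariance.
        maha = sum((zi - mi) ** 2 / vi for zi, mi, vi in zip(z, mu, var))
        # log sqrt(|2*pi*Sigma|) for a diagonal covariance.
        log_norm = 0.5 * sum(math.log(2 * math.pi * vi) for vi in var)
        log_terms.append(math.log(phi) - 0.5 * maha - log_norm)
    # Log-sum-exp keeps the computation stable for distant points.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

Under this assumption, a compressed vector far from every Gaussian distribution receives a large expectation, which is consistent with the expectation acting as a distance to the distributions.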

Step 508, mapping the expectation of the data to be evaluated as a risk score according to a preset mapping function.

In practical application, the scheme provided by the application further includes a preset mapping function for mapping the expectation of the data to be evaluated to a risk score. Because the expectations of different data to be evaluated vary over a wide range, mapping them to risk scores places every expectation on the same scale and yields a risk score that can be compared and referenced.

The risk score mscore(z_i) of the i-th piece of data to be evaluated z_i may be:

mscore(z_i)＝sigmoid(log(E(z)+abs(γ(E(z)))+ε))

Specifically, γ(E(z)) is a threshold partition function over the whole sample space, for which min, average, k-σ, etc. may be selected, and ε is a hyperparameter that may be set to 0.01.
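As an illustration of the mapping, the sketch below reads the formula as sigmoid(log(E(z) + |γ(E(z))| + ε)), with ε outside the absolute value so that the argument of the logarithm stays positive when γ is the minimum. That reading, the function name and the default arguments are assumptions.

```python
import math

def risk_scores(expectations, gamma=min, eps=0.01):
    """Map raw expectations to risk scores in the open interval (0, 1).

    gamma: threshold function over all sample expectations (min here)
    eps: small hyperparameter keeping the log argument positive
    """
    shift = abs(gamma(expectations))
    scores = []
    for e in expectations:
        x = math.log(e + shift + eps)           # shifted log transformation
        scores.append(1.0 / (1.0 + math.exp(-x)))  # sigmoid mapping
    return scores
```

A caller could then compare each score with a chosen threshold, for example `[s > 0.8 for s in risk_scores(values)]`, where 0.8 is an arbitrary illustrative value.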

Step 509, if the risk score is greater than the threshold, an early warning is issued.

Further, a threshold value for evaluating the risk score can be set, and if the risk score is larger than the threshold value, an early warning can be sent out. For example, early warning information may be sent to users associated with business activities.

FIG. 6 is a flowchart illustrating a training method for a model for assessing risk of business activity according to an exemplary embodiment of the present application.

As shown in fig. 6, the training method of the model for evaluating business activity risk provided by the present application includes:

Step 601, acquiring a plurality of sample data corresponding to business activities; each sample data includes a plurality of data features, each data feature corresponding to a feature domain.

The method provided by the present application can be executed by an electronic device with computing capability, such as a computer.

Specifically, a risk assessment model may be built in advance, where the risk assessment model includes a coding network corresponding to each feature domain, an attention layer, and a determining submodule. The built model is then trained. The trained risk assessment model can be deployed in a server of the e-commerce platform, and the server can evaluate the business data to be evaluated of a business activity based on the deployed risk assessment model.

Fig. 7 is a schematic diagram of a risk assessment model according to an exemplary embodiment of the present application.

As shown in fig. 7, the risk assessment model provided by the present application may include a plurality of coding networks 71, an attention layer 72, and a determination submodule 73. The coding network 71, the attention layer 72 and the determination submodule 73 are connected in sequence, and the model can be trained by using sample data of business activities.

Further, the electronic device may obtain a plurality of sample data corresponding to the business activity, each sample data may include a plurality of data features, and each data feature corresponds to a feature domain. For example, M pieces of sample data may be acquired, and each sample data may include N features.

In actual application, a plurality of feature domains may be set in advance. For example, a first feature domain related to registered account information and a second feature domain related to information on the illegal resale of discount offers may be set. Among the data features of the sample data, the features related to the registered account can be assigned to the first feature domain, so that they correspond to the first feature domain, and the features related to the illegal resale of discount offers can be assigned to the second feature domain. In this way, data features representing the same kind of information are assigned to the same feature domain, and the model can learn the associations among them.
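The assignment of data features to feature domains can be pictured as a simple grouping step. In the sketch below the feature names and domain names are hypothetical and serve only as an illustration; the source does not list concrete features.

```python
def split_by_domain(sample, domain_of):
    """Group the data features of one sample by feature domain.

    sample: mapping from feature name to feature value
    domain_of: mapping from feature name to feature domain name
    """
    domains = {}
    for name, value in sample.items():
        # Features of the same domain are collected in insertion order.
        domains.setdefault(domain_of[name], []).append(value)
    return domains

# Hypothetical feature-to-domain assignment.
domain_of = {
    "account_age_days": "registered_account",
    "account_verified": "registered_account",
    "coupon_resale_count": "resale_offer",
}
sample = {"account_age_days": 12, "coupon_resale_count": 3, "account_verified": 1}
grouped = split_by_domain(sample, domain_of)
```

Each group can then be passed to the coding network of its own feature domain.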

Step 602, inputting the data features corresponding to the same feature field in each sample data into the coding network corresponding to the feature field, and obtaining the feature vector corresponding to the feature field.

The built model is provided with a plurality of coding networks corresponding to the characteristic domains, and the data characteristics of the characteristic domains are processed through the coding networks corresponding to the characteristic domains, so that the characteristic vectors corresponding to the characteristic domains are obtained.

The electronic device may input, of the multiple data features of the sample data, a data feature corresponding to the same feature field into the coding network corresponding to the feature field, and may further process the data feature of the feature field by using the coding network. For example, a plurality of data features divided into a first feature field are input into a first coding network, and a plurality of data features divided into a second feature field are input into a second coding network.

The electronic device processes the data features input into the coding network based on the coding network to obtain the feature vector, for example, the electronic device processes the input data features based on the first coding network to obtain the feature vector. The electronic equipment processes the input data features based on the second coding network to obtain feature vectors.

Therefore, by processing the data features belonging to the feature domain through the coding network corresponding to the feature domain, the combination and enhancement of the intra-domain information can be sufficiently performed, and the feature vector of the feature domain can be obtained.

E_K＝σ(X_K；θ_f)

where X_K is the feature vector of the k-th sample data, and θ_f denotes the parameters of the coding network.

Specifically, the weight parameters in each coding network may be continuously updated during the training iteration. The structures of the coding networks may be the same or different, and may be specifically set according to requirements.
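A minimal sketch of per-domain encoding follows. It assumes that each coding network is a single fully connected layer with a sigmoid activation, which is one possible reading of σ; the source leaves the network structure open. The function names and the parameter layout are hypothetical.

```python
import math

def encode_domain(x, theta_f):
    """E = sigma(X; theta_f): one fully connected sigmoid layer (an assumption)."""
    W, b = theta_f
    out = []
    for row, bi in zip(W, b):
        pre = sum(w * xi for w, xi in zip(row, x)) + bi
        out.append(1.0 / (1.0 + math.exp(-pre)))
    return out

def encode_all(domains, params):
    """Apply to each feature domain the coding network with its own parameters.

    domains: mapping from domain name to the data features of that domain
    params: mapping from domain name to that domain's (W, b) parameters
    """
    return {name: encode_domain(x, params[name]) for name, x in domains.items()}
```

Because every domain has its own entry in `params`, the coding networks may have different input sizes and, more generally, different structures.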

Step 603, inputting the feature vectors corresponding to the feature domain into the attention layer to obtain a total vector corresponding to each sample data and including the relationship between the feature vectors.

Further, in the scheme provided by the application, the constructed model further comprises an attention layer, and the feature vectors corresponding to the feature domains can be input into the attention layer. The attention layer may process the feature vectors of the respective feature domains to obtain a total vector including the relationship between the respective feature vectors.

In practical application, the electronic device may determine a relationship between every two feature vectors according to the weight in the attention layer, and the weight in the attention layer may be updated in a training process. The attention layer may determine a new expression of the feature vector based on the relationship between the feature vector and other respective feature vectors, so that information of other feature vectors may be embedded in the new expression. For example, the new representation of the first feature vector may be determined based on the relationship between the first feature vector and each of the other feature vectors.

Wherein the electronic device may construct a total vector comprising the relationship between the respective feature vectors according to the new representation of the respective feature vectors. The total vector comprises the relationship among the characteristic vectors, so that the characteristic vectors can be integrated with the characteristic vectors of a plurality of information categories in each piece of sample data, and the information contained in the sample data is further accurately extracted.

Step 604, inputting the total vector of each sample data into a determining submodule, determining the expectation of each sample data by using the determining submodule, and training a model according to the expectation of each sample data to obtain a model for evaluating the risk of the business activity.

Furthermore, the built model also comprises a determining submodule, the electronic device can input the determined total vector into the determining submodule, and the total vector is processed according to the weight value in the determining submodule to obtain the expectation of each sample data.

In an embodiment, the electronic device may compress the total vector of each sample data to reduce the dimension of the total vector, and then process the vector with the reduced dimension to obtain the expectation of each total vector, that is, the expectation corresponding to each sample data.

In practical application, the probability vector of each sample belonging to each Gaussian distribution can be determined according to the total vector of each sample after data compression. For example, if k feature domains are provided, k gaussian distributions may be provided. For each batch of sample data, the gaussian distribution to which each sample data in the batch of sample data belongs can be determined, and then the probability vector of each gaussian distribution to which the sample belongs can be obtained.

The parameters of each gaussian distribution can be estimated according to the probability vector, the number of sample data and the total compressed vector of each sample data. During the training iteration, the parameters of each gaussian distribution are also updated.
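The estimation of the Gaussian parameters from the probability vectors can be sketched with the standard probability-weighted estimates for a Gaussian mixture. The source does not give its estimation formulas, so this choice is an assumption, as are the restriction to diagonal covariances and the function name.

```python
def estimate_gaussians(gammas, Z):
    """Estimate mixture parameters from soft assignments.

    gammas[i][k]: probability that sample i belongs to Gaussian k
    Z[i]: compressed vector of sample i
    Returns (phis, means, variances) with diagonal covariances.
    Assumes every Gaussian receives a non-zero total probability.
    """
    N, K, D = len(Z), len(gammas[0]), len(Z[0])
    phis, means, variances = [], [], []
    for k in range(K):
        weight = sum(g[k] for g in gammas)
        # Occurrence probability: average membership over the batch.
        phis.append(weight / N)
        # Mean: membership-weighted average of the compressed vectors.
        mu = [sum(g[k] * z[d] for g, z in zip(gammas, Z)) / weight
              for d in range(D)]
        # Variance: membership-weighted squared deviation from the mean.
        var = [sum(g[k] * (z[d] - mu[d]) ** 2 for g, z in zip(gammas, Z)) / weight
               for d in range(D)]
        means.append(mu)
        variances.append(var)
    return phis, means, variances
```

Recomputing these estimates for every batch is one way in which the Gaussian parameters could be updated during the training iterations.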

The expectation of each sample data can be determined according to the estimated parameters of each Gaussian distribution and the compressed total vector of each sample data. The expectation can represent the probability that a sample belongs to the K Gaussian distributions. By training the model on a large number of samples, the probabilities that sample data belong to the K Gaussian distributions are built into the model, that is, a plurality of Gaussian distribution spaces are constructed. When the model is used to process business data to be evaluated, the probability that the business data belongs to each Gaussian distribution can be determined, the distance between the business data and each Gaussian distribution space can then be determined, and abnormal data can thereby be identified.

An objective function can also be set. The objective function can be constructed based on the expectation of each sample, and gradient back-propagation is carried out through the objective function to update the weights of the coding network corresponding to each feature domain, the attention layer, and the determining submodule in the model.

Fig. 8 is a flowchart illustrating a training method for a model for assessing risk of business activity according to another exemplary embodiment of the present application.

As shown in fig. 8, the training method of the model for evaluating business activity risk provided by the present application includes:

Step 801, acquiring an offline feature corresponding to a business activity, and acquiring each real-time data corresponding to the business activity.

Step 802, determining each sample data of the business activity according to the off-line characteristics and each real-time data; each sample data includes a plurality of data features, each data feature corresponding to a feature field.

The features in the sample data may include both offline features and real-time features. For example, an offline feature corresponding to a business activity may be preset, and when there is real-time data corresponding to the business activity, the offline feature of the business activity may be obtained and processed together with the real-time data to obtain sample data corresponding to the business activity.

Specifically, the offline features can accurately reflect the characteristics of the business activities, the online real-time data can be generated in real time, and the real-time features and the offline features of the business activities are processed, so that not only can the feature information accurate to the business activities be extracted, but also the real-time performance of sample data can be ensured, and the accuracy of the trained model is improved.

Further, a feature processing engine may also be provided. For example, after generating real-time data related to a first business activity, the offline feature of the first business activity may be obtained, and the feature processing engine is used to process the offline feature and the real-time data to obtain sample data X including a plurality of data features.

Step 803, inputting the data features corresponding to the same feature domain in each sample data into the coding network corresponding to the feature domain to obtain the feature vector corresponding to the feature domain.

Step 803 is similar to step 602 in execution principle and manner, and is not described again.

Step 804, determining the feature similarity between every two feature vectors according to the feature vectors corresponding to the feature domains.

Step 805, determining a total vector including the relationship between the feature vectors according to the feature similarity.

The electronic device may input each feature vector corresponding to each feature domain obtained in step 803 into an attention layer of the model, and determine a total vector including a relationship between each feature vector through the attention layer.

Specifically, steps 804 and 805 may be performed.

The electronic device may determine feature similarity between every two feature vectors in each sample data according to the feature vectors corresponding to the feature domains in each sample data. For example, a function for calculating the similarity may be provided, and the similarity between two feature vectors in each feature vector may be determined by the function.

For any pair of feature vectors Em and Ek in one sample data, the feature similarity Φ(Em, Ek) is:

[formula omitted in source]

where W_query and W_key are transformation matrices whose values may be updated during the model training process.

Specifically, a total vector including a relationship between the feature vectors of each sample data may be determined according to the feature similarity of each sample data. Specifically, the new expression of each feature vector can be determined according to the feature similarity of one sample data, and then the total vector P is determined according to the new expression of each feature vector, wherein the total vector includes the information of each feature vector and the information among the feature vectors.

Further, a relation coefficient between each feature vector and the other feature vectors can be determined according to the feature similarity between them, and the relation coefficient represents the relationship between two feature vectors. For example, for a feature vector Em in one sample data, the similarity Φ(Em, Ek) between Em and any other feature vector Ek in the sample data can be determined, and the relation coefficients between Em and the other feature vectors in the sample data can then be determined according to these feature similarities Φ(Em, Ek).

In practical application, a way of determining the relation coefficient may be preset, and the relation coefficient between each feature vector and the other feature vectors may then be determined accordingly. The relation coefficient between the feature vector Em and the feature vector Ek may specifically be determined based on the following formula:

[formula omitted in source]

where the feature vector Em and the feature vector Ek are any two of the feature vectors of the sample data determined by the coding networks, and K is the number of feature domains.

The feature domain embedded vector of each feature vector in the sample data can be determined according to the relation coefficient between each feature vector and other feature vectors in the sample data and each feature vector; wherein the total vector comprises a feature domain embedding vector for each feature vector.

Specifically, any two feature vectors in the sample data have a relationship coefficient, and therefore, a feature domain embedding vector of a feature vector can be determined according to a relationship coefficient between one feature vector in the sample data and other feature vectors in the sample data, where the feature domain embedding vector includes information of the feature vector and information of other feature vectors in the sample data.

Further, for any feature vector Em, the feature domain embedding vector may be:

[formula omitted in source]

In actual application, the total vector of each sample data may be generated from the feature domain embedding vectors of the feature vectors in that sample data. The total vector is obtained by performing global pooling over the embedding vectors of the different feature domains.

The total vector P may be:

[formula omitted in source]

Step 806, inputting the total vector of each sample data into the compression network to obtain the compressed vector of each sample data.

According to the scheme provided by the application, the determining submodule comprises a compression network, and the electronic equipment can process the total vector of each sample data by using the compression network to obtain the compressed vector of each sample data.

Specifically, the compression network has a weight parameter, and the weight parameter can be updated in the model training process. The compression network is responsible for reducing the dimension of the total vector P of each sample data.

Further, when the total vector of the sample data is processed based on the compression network, the total vector of each sample data may be encoded according to the encoding function and the first neural network parameter, so as to obtain the encoding vector of each sample data. The first neural network parameters may be updated during model training.

The coding vector Zc of the total vector P of one sample data is:

Z_c＝h(P；θ_e)

In practical application, a decoding function can be set in the compression network, and the electronic device can perform decoding processing on the encoding vector of one sample data according to the decoding function and the second neural network parameters to obtain the decoding vector of the sample data. The second neural network parameters are updated during the model training process.

The decoded vector X' of the encoded vector Zc of one sample data is:

X′＝g(Z_c；θ_d)

Specifically, the reconstruction error $Z_r$ of each sample data may be determined according to the data features included in each sample data X and the decoded vector $X'$ of each sample data. The reconstruction error $Z_r$ characterizes the difference between the decoded vector $X'$ of each sample data and the features of each sample data X.

Further, $Z_r = f(X, X')$, where the function f() is used to determine the difference between the decoded vector of the sample data and the data features of the sample data. For example, the features of the sample data may be subtracted from the decoded vector to obtain the reconstruction error $Z_r$.

In practical application, the compressed vector Z of each sample data can be determined from the encoding vector $Z_c$ and the reconstruction error $Z_r$ of that sample data. The compressed vector Z is:

$$Z = [Z_c, Z_r]$$

where $Z_c$ and $Z_r$ are spliced (concatenated) to obtain the compressed vector Z.
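A small sketch of how the reconstruction error and the compressed vector might be computed. It assumes f() is the element-wise subtraction mentioned in the example above (other difference measures are possible), and the function names and input values are illustrative only.

```python
def reconstruction_error(X, X_prime):
    """f(X, X'): element-wise difference between the decoded vector and the features."""
    if len(X) != len(X_prime):
        raise ValueError("X and X' must have the same length")
    return [xp - x for x, xp in zip(X, X_prime)]

def compressed_vector(Z_c, Z_r):
    """Splice the encoding vector and the reconstruction error: Z = [Z_c, Z_r]."""
    return list(Z_c) + list(Z_r)

X = [1.0, 2.0, 3.0]        # data features of one sample (illustrative)
X_prime = [1.5, 2.0, 2.0]  # decoded vector of the same sample (illustrative)
Z_c = [0.25, -0.75]        # encoding vector (illustrative)

Z_r = reconstruction_error(X, X_prime)  # [0.5, 0.0, -1.0]
Z = compressed_vector(Z_c, Z_r)         # [0.25, -0.75, 0.5, 0.0, -1.0]
```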

Specifically, obtaining the compressed vector by processing the total vector of each sample data in this way avoids the problem that the total vector P cannot accurately reflect the information in the sample data because feature information is lost when the features of each feature domain are extracted by the coding network.

In an optional implementation, the parameters in the coding network corresponding to each feature domain may further be corrected according to the reconstruction error $Z_r$ of each sample data.

Further, the data features X corresponding to each feature domain in the sample data may be processed by the coding network to obtain the feature vector E corresponding to each feature domain, and the decoded vector $X'$ can be obtained by processing the feature vectors E. If the coding networks extract the features in the sample data accurately, the resulting decoded vector $X'$ can express each data feature X in the sample data, and the reconstruction error $Z_r$ determined from $X'$ and X is close to 0. The preset condition on the reconstruction error $Z_r$ can therefore be set to 0, and the parameter $\theta_f$ in each coding network is updated based on this constraint.

Step 807: input the compressed vector of each sample data into an estimation network to obtain a probability vector of each sample data belonging to each Gaussian distribution, wherein the number of Gaussian distributions is the number of feature domains.

In practical application, the determining submodule is also provided with an estimation network, which is used for estimating the probability vector of each Gaussian distribution according to the compressed vector of each sample data.

The estimation network is provided with a third neural network parameter and a preset function, and the electronic device can process the compressed vector of each sample data according to the parameter set in the estimation network.

Specifically, the output feature p of the sample data is:

$$p = \mathrm{MLN}(Z; \theta_m)$$

where Z is the compressed vector of the sample data and $\theta_m$ is the third neural network parameter. MLN() is a neural network: the electronic device takes Z as input and processes it using MLN() and the third neural network parameter to obtain p. MLN() may be a sub-network in the estimation network.
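As a hedged illustration of MLN(), the sketch below assumes a two-layer network (one tanh hidden layer and one linear output layer with K units). The layer sizes, parameter values, and the names `dense` and `mln` are assumptions made for this example, not details given in the text.

```python
import math

def dense(vec, weights, bias):
    """Fully connected layer: W * vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def mln(Z, theta_m):
    """MLN(Z; theta_m): tanh hidden layer followed by a linear output layer."""
    (W1, b1), (W2, b2) = theta_m
    hidden = [math.tanh(h) for h in dense(Z, W1, b1)]
    return dense(hidden, W2, b2)

# Hypothetical third neural network parameter: 3 inputs -> 2 hidden -> K = 3 outputs.
theta_m = (
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
)

Z = [0.0, 0.5, -1.0]  # compressed vector of one sample (illustrative)
p = mln(Z, theta_m)   # one output value per Gaussian distribution
```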

For example, m sample data are used in one training process, and then m compressed vectors can be obtained for the m sample data, so as to obtain m output features p.

In practical application, a preset function can be set in the estimation network, and the preset function is used for determining probability vectors of each sample data belonging to each gaussian distribution.

The probability vector of each sample data belonging to each Gaussian distribution, denoted here as $\hat{\gamma}$, is:

$$\hat{\gamma} = \mathrm{softmax}(p)$$

Specifically, softmax() is the preset function and p is the output feature of the sample data. For example, each time an output feature p is obtained, it is processed by softmax() to obtain the probability vector $\hat{\gamma}$ over the Gaussian distributions.

If there are m sample data and K Gaussian distributions, the probability vector $\hat{\gamma}$ may be m × K dimensional. Each value in $\hat{\gamma}$ corresponds to a combination of one sample data and one Gaussian distribution, and indicates whether the sample data belongs to that Gaussian distribution; for example, a value of 1 indicates that the sample data belongs to the Gaussian distribution space.

For example, the value of the probability vector in the first row and second column, written here as $\hat{\gamma}_{12}$, corresponds to the first sample data and the second Gaussian distribution space, and may characterize whether the first sample data belongs to the second Gaussian distribution space.
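The softmax step can be sketched as below for a batch of m = 2 samples and K = 3 Gaussian distributions. The output features are made-up numbers, and `gamma` is simply an illustrative name for the resulting m × K probability vector.

```python
import math

def softmax(p):
    """Normalize an output feature p into probabilities that sum to 1."""
    top = max(p)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in p]
    total = sum(exps)
    return [e / total for e in exps]

# Output features p for m = 2 samples and K = 3 Gaussian distributions.
outputs = [[2.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]

# gamma[i][k]: membership of sample i in Gaussian distribution k.
gamma = [softmax(p) for p in outputs]
```

Note that softmax() returns values between 0 and 1; a value close to 1 plays the role of "belongs to" in the description above, and a hard 0/1 assignment could be derived by taking the largest entry of each row.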

Further, K Gaussian distributions may be set, K being the number of feature domains. With a large amount of sample data, the probability vectors of the K Gaussian distributions can be obtained.

Step 808: determine the expectation of each sample data according to the probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data, wherein the expectation is used to characterize the distance between the sample data and each Gaussian distribution.

In practical application, the space of each Gaussian distribution can be constructed according to the probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data, and then the expectation of each sample data is determined according to the space of each Gaussian distribution. The expectation is used to characterize the distance between the sample data and each Gaussian distribution.

The parameters of the Gaussian distributions can be determined according to the probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data, and then each Gaussian distribution space can be constructed.

Specifically, the parameters of each gaussian distribution may include an occurrence probability, a mean value, and a covariance matrix. The probability, mean and covariance matrix of each gaussian distribution can be determined according to the probability vector of each gaussian distribution to which each sample data belongs and the compressed vector of each sample data.

Furthermore, the occurrence probability of each Gaussian distribution is determined according to the probability vector of each sample data belonging to each Gaussian distribution and the number of the sample data.

The occurrence probability of the kth Gaussian distribution, denoted here as $\hat{\phi}_k$, is:

$$\hat{\phi}_k = \frac{1}{M}\sum_{i=1}^{M}\hat{\gamma}_{ik}$$

where $\hat{\gamma}_{ik} = 1$ if the ith sample data belongs to the kth Gaussian distribution and $\hat{\gamma}_{ik} = 0$ otherwise, and there are a total of M sample data. According to the probability vector of each sample data belonging to each Gaussian distribution, the samples corresponding to the kth Gaussian distribution can be determined, and then the occurrence probability of the kth Gaussian distribution is determined.

$\hat{\phi}_k$ characterizes the probability of occurrence of the kth Gaussian distribution among the samples, so that the occurrence probability of each Gaussian distribution characterizes the space of that Gaussian distribution.
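For illustration, the occurrence probability can be computed by averaging the membership values of each Gaussian distribution over the M samples. The membership matrix below is made up, and the names `gamma` and `phi` are chosen only for this sketch.

```python
def occurrence_probabilities(gamma):
    """Average the membership of each Gaussian distribution over all M samples."""
    M = len(gamma)
    K = len(gamma[0])
    return [sum(row[k] for row in gamma) / M for k in range(K)]

# M = 4 samples, K = 2 Gaussian distributions; 1.0 means "belongs to".
gamma = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0]]

phi = occurrence_probabilities(gamma)  # [0.75, 0.25]
```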

Specifically, the mean value of each gaussian distribution may be determined according to the probability vector of each sample data belonging to each gaussian distribution and the compressed vector of each sample data.

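Further, the mean of the kth Gaussian distribution, denoted here as $\hat{\mu}_k$, is:

$$\hat{\mu}_k = \frac{\sum_{i=1}^{M}\hat{\gamma}_{ik}\, z_i}{\sum_{i=1}^{M}\hat{\gamma}_{ik}}$$

In practical application, $z_i$ is the compressed vector of the ith sample data, $\hat{\gamma}_{ik}$ is the value of the probability vector for the ith sample data and the kth Gaussian distribution, and M is the total number of sample data. $\hat{\mu}_k$ represents the mean of the compressed vectors of the sample data belonging to the kth Gaussian distribution, and the feature information of each Gaussian distribution can be represented through its mean.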
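A minimal sketch of the mean computation as a membership-weighted average of the compressed vectors. The data are illustrative, and the sketch assumes every Gaussian distribution has a nonzero total membership (otherwise the division would fail).

```python
def gaussian_means(gamma, Z):
    """Membership-weighted mean of the compressed vectors for each Gaussian distribution."""
    K = len(gamma[0])
    D = len(Z[0])
    means = []
    for k in range(K):
        weight = sum(row[k] for row in gamma)  # assumed to be nonzero
        means.append([
            sum(g[k] * z[d] for g, z in zip(gamma, Z)) / weight
            for d in range(D)
        ])
    return means

gamma = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # 3 samples, 2 distributions
Z = [[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]]      # compressed vectors
mu = gaussian_means(gamma, Z)                    # [[1.0, 3.0], [10.0, 10.0]]
```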

The covariance matrix of each gaussian distribution can be determined according to the mean value of each gaussian distribution, the probability vector of each sample data belonging to each gaussian distribution, and the compressed vector of each sample data.

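In particular, the covariance matrix of the kth Gaussian distribution, denoted here as $\hat{\Sigma}_k$, is:

$$\hat{\Sigma}_k = \frac{\sum_{i=1}^{M}\hat{\gamma}_{ik}\,(z_i - \hat{\mu}_k)(z_i - \hat{\mu}_k)^{T}}{\sum_{i=1}^{M}\hat{\gamma}_{ik}}$$

where $z_i$ is the compressed vector of the ith sample data, $\hat{\mu}_k$ is the mean of the kth Gaussian distribution, $\hat{\gamma}_{ik}$ is the value of the probability vector for the ith sample data and the kth Gaussian distribution, and M is the total number of sample data.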

Furthermore, the occurrence probability, mean, and covariance matrix of each Gaussian distribution can be determined in this way to obtain the Gaussian distribution parameters, which represent the characteristics of a Gaussian distribution. Each time the model is trained with samples, the Gaussian distribution parameters are updated according to the compressed vectors of the sample data. For example, the parameters may be obtained by training on a first batch of sample data and then, when a second batch is processed, updated using the compressed vectors of both the first batch and the second batch.
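The covariance computation can be sketched in the same style, as a membership-weighted average of outer products of the deviations from the mean. The inputs are illustrative and the function name is hypothetical.

```python
def gaussian_covariances(gamma, Z, mu):
    """Membership-weighted covariance matrix for each Gaussian distribution."""
    K = len(mu)
    D = len(Z[0])
    covariances = []
    for k in range(K):
        weight = sum(row[k] for row in gamma)  # assumed to be nonzero
        cov = [[0.0] * D for _ in range(D)]
        for g, z in zip(gamma, Z):
            diff = [z[d] - mu[k][d] for d in range(D)]
            for a in range(D):
                for b in range(D):
                    cov[a][b] += g[k] * diff[a] * diff[b]
        covariances.append([[c / weight for c in row] for row in cov])
    return covariances

gamma = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Z = [[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]]
mu = [[1.0, 3.0], [10.0, 10.0]]
sigma = gaussian_covariances(gamma, Z, mu)
# sigma[0] is [[1.0, 1.0], [1.0, 1.0]]; sigma[1] is all zeros (single sample).
```

A distribution supported by very few samples yields a singular matrix, as the second one does here; an implementation would likely need some form of regularization before inverting it, which is not covered by this sketch.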

In practical application, in the process of training the model by using sample data, after the parameters of the gaussian distribution are determined each time, the expectation of each sample data can be determined according to the compressed vector of each sample data of the current batch and the parameters of each gaussian distribution.

The expectation is used to characterize the distance between the sample data and the Gaussian distributions; for example, the first expectation of a first sample characterizes the distance between the first sample and the Gaussian distributions.

Specifically, the compressed vector of each sample data may be processed according to the determined parameters of each Gaussian distribution, so as to determine the expectation of each sample data.

Further, the expectation $E(z_i)$ of the ith sample data is:

where $z_i$ represents the compressed vector of the ith sample data.
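One plausible form of such an expectation, assumed here rather than taken from the text, is the negative log-likelihood of the compressed vector under the mixture of Gaussian distributions: a sample far from every distribution gets a large value. To stay dependency-free, the sketch further assumes diagonal covariance matrices (one variance per dimension); all names and numbers are illustrative.

```python
import math

def expectation(z, phi, mu, var):
    """Negative log-likelihood of z under a mixture of diagonal Gaussians (assumed form)."""
    likelihood = 0.0
    for phi_k, mu_k, var_k in zip(phi, mu, var):
        log_density = 0.0
        for z_d, m_d, v_d in zip(z, mu_k, var_k):
            log_density += -0.5 * (math.log(2.0 * math.pi * v_d)
                                   + (z_d - m_d) ** 2 / v_d)
        likelihood += phi_k * math.exp(log_density)
    return -math.log(likelihood)

phi = [0.5, 0.5]                 # occurrence probabilities
mu = [[0.0, 0.0], [5.0, 5.0]]    # means
var = [[1.0, 1.0], [1.0, 1.0]]   # per-dimension variances

near = expectation([0.1, -0.1], phi, mu, var)   # close to the first distribution
far = expectation([20.0, 20.0], phi, mu, var)   # far from both distributions
```

With these inputs `far` is much larger than `near`, matching the idea that the expectation grows with the distance from the Gaussian distributions. For points extremely far from every distribution the likelihood can underflow to zero, so a log-sum-exp formulation would be safer in practice.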

Step 809: according to the expectation of each sample data, correct the network parameters in the attention layer and the determining submodule to obtain a model for evaluating business activity risk. The determining submodule comprises the compression network and the estimation network.

Specifically, a function for training the model may be preset, and the function may be used for training the coding network, the attention layer, and the network parameters in the determining submodule. The model may specifically be trained by the following formula:

where $\theta_e$ is the first neural network parameter in the compression network, $\theta_d$ is the second neural network parameter in the compression network, and $\theta_m$ is the third neural network parameter in the estimation network.

The model obtained by the training of the method can be applied to the method shown in the figure.

Fig. 9 is a block diagram illustrating a risk information determination apparatus for business activities according to an exemplary embodiment of the present application.

As shown in fig. 9, the apparatus 900 for determining risk information of business activity provided by this embodiment includes:

an obtaining module 910, configured to obtain service data to be evaluated corresponding to the service activity; the service data to be evaluated comprises a plurality of data characteristics, and each data characteristic corresponds to a characteristic domain;

the encoding module 920 is configured to input the data features corresponding to the same feature domain into an encoding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain;

an embedding module 930, configured to input the feature vectors corresponding to the feature domains into an attention layer, so as to obtain a total vector including a relationship between the feature vectors;

an expectation determining module 940, configured to input the total vector into a determining submodule, and determine, by using the determining submodule, an expectation corresponding to the service data to be evaluated;

a risk determining module 950, configured to determine, according to the expectation, risk information corresponding to the to-be-evaluated business data, where the risk information is used to indicate a risk degree of the to-be-evaluated business data.

The specific principle and implementation of the device for determining risk information of business activities provided by this embodiment are similar to those of the embodiment shown in fig. 1, and are not described herein again.

Fig. 10 is a block diagram of a risk information determination apparatus for business activities according to another exemplary embodiment of the present application.

As shown in fig. 10, on the basis of the foregoing embodiment, in the device 1000 for determining risk information of business activity provided by this embodiment, the embedding module 930 includes:

a similarity determining unit 931 configured to determine feature similarities between every two feature vectors according to the feature vectors corresponding to the respective feature domains;

an embedding unit 932, configured to determine, according to the respective feature similarities, a total vector including a relationship between the respective feature vectors.

Optionally, the embedding unit 932 is specifically configured to:

determining a relation coefficient between each feature vector and other feature vectors according to the feature similarity between each feature vector and other feature vectors, wherein the relation coefficient is used for representing the relation between the two feature vectors;

determining a characteristic domain embedding vector of each characteristic vector according to a relation coefficient between each characteristic vector and other characteristic vectors and each characteristic vector; the feature domain embedding vector is used for representing a feature vector embedded with other feature vector information;

and determining a total vector comprising the relation between the characteristic vectors according to the embedded vectors of the characteristic domains.
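The three steps performed by the embedding unit can be sketched as follows. Several choices here are assumptions made only for illustration: the feature similarity is a dot product, the relation coefficients are obtained with softmax(), each feature domain embedding vector is the feature vector plus the coefficient-weighted sum of the other feature vectors, and the total vector is an average pooling of the embedding vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def total_vector(features):
    """Build a total vector from K >= 2 feature vectors of equal dimension."""
    dim = len(features[0])
    embedded = []
    for i, e_i in enumerate(features):
        others = [e_j for j, e_j in enumerate(features) if j != i]
        # Feature similarity between e_i and every other feature vector.
        similarities = [dot(e_i, e_j) for e_j in others]
        # Relation coefficients: normalized similarities.
        coefficients = softmax(similarities)
        # Feature domain embedding vector: e_i plus information from the others.
        context = [sum(c * e_j[d] for c, e_j in zip(coefficients, others))
                   for d in range(dim)]
        embedded.append([a + b for a, b in zip(e_i, context)])
    # Pool the embedding vectors into one total vector (average pooling assumed).
    return [sum(vec[d] for vec in embedded) / len(embedded) for d in range(dim)]

P = total_vector([[1.0, 0.0], [1.0, 0.0]])  # [2.0, 0.0]
```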

Optionally, the expectation determining module 940 includes:

a compressing unit 941, configured to input the total vector into a compression network to obtain a compressed vector;

an expectation determining unit 942, configured to determine, according to the compressed vector and preset parameters corresponding to each of the gaussian distributions, an expectation corresponding to the data to be evaluated; the number of the Gaussian distributions is the number of the characteristic domains; the expectation is used to characterize the distance between the data to be evaluated and each of the gaussian distributions.

Optionally, the compressing unit 941 is specifically configured to:

coding the total vector according to a coding function and a first neural network parameter to obtain a coding vector;

decoding the coding vector according to a decoding function and a second neural network parameter to obtain a decoding vector;

determining a reconstruction error according to the characteristics of the data to be evaluated and the decoding vector; the reconstruction error is used for representing the difference between the decoding vector and the characteristics of the data to be evaluated;

and determining the compressed vector according to the coding vector and the reconstruction error.

The risk determination module 950 is specifically configured to:

and mapping the expectation of the data to be evaluated into the risk score according to a preset mapping function.

Optionally, the obtaining module 910 includes:

an obtaining unit 911, configured to obtain an offline feature corresponding to the business activity, and obtain real-time data corresponding to the business activity;

a determining unit 912, configured to determine, according to the offline feature and the real-time data, to-be-evaluated data of the business activity.

Optionally, the apparatus further comprises an early warning module 960, configured to:

and if the risk score is larger than a threshold value, early warning is carried out.

The specific principle and implementation of the apparatus provided in this embodiment are similar to those of the embodiment shown in fig. 5, and are not described here again.
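The risk mapping and early warning steps can be illustrated with a short sketch. The preset mapping function is not specified in the text, so a logistic curve onto a 0 to 100 scale is assumed here; `midpoint`, `scale`, and `threshold` are hypothetical parameters.

```python
import math

def risk_score(expectation, midpoint=10.0, scale=2.0):
    """Map an expectation to a 0-100 risk score with a logistic curve (assumed form)."""
    return 100.0 / (1.0 + math.exp(-(expectation - midpoint) / scale))

def should_warn(score, threshold=80.0):
    """Early warning is triggered when the risk score exceeds the threshold."""
    return score > threshold

score_low = risk_score(2.5)    # expectation well below the midpoint
score_high = risk_score(25.0)  # expectation well above the midpoint
warn_low = should_warn(score_low)
warn_high = should_warn(score_high)
```

Any monotonically increasing mapping would serve the same purpose; the logistic form is chosen only because it keeps the score bounded.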

Fig. 11 is a block diagram illustrating a training apparatus for a model for assessing risk of business activities according to an exemplary embodiment of the present application.

The model comprises: a coding network corresponding to each feature domain, an attention layer, and a determining submodule;

as shown in fig. 11, the present application provides a training apparatus 1100 for evaluating a model of business activity risk, including:

an obtaining module 1110, configured to obtain a plurality of sample data corresponding to the service activity; each said sample data comprises a plurality of data features, each said data feature corresponding to a feature field;

the encoding module 1120 is configured to input the data features corresponding to the same feature domain in each sample data into an encoding network corresponding to the feature domain to obtain a feature vector corresponding to the feature domain;

an embedding module 1130, configured to input a feature vector corresponding to the feature domain into the attention layer, to obtain a total vector corresponding to each sample data and including a relationship between the feature vectors;

an expectation determining module 1140, configured to input the total vector of each sample data into the determining submodule, and determine, by using the determining submodule, an expectation of each sample data;

a training module 1150, configured to train the model according to the expectation of each sample data, to obtain a model for evaluating a risk of a business activity.

The specific principle and implementation of the apparatus provided in this embodiment are similar to those of the embodiment shown in fig. 6, and are not described herein again.

Fig. 12 is a block diagram illustrating a training apparatus for a model for assessing risk of business activities according to another exemplary embodiment of the present application.

As shown in fig. 12, based on the above embodiment, the embedding module 1130 in the training apparatus 1200 for evaluating a model of business activity risk provided by the present application includes:

a similarity determining unit 1131, configured to determine, according to the feature vector corresponding to each feature domain in each sample data, a feature similarity between every two feature vectors in each sample data;

an embedding unit 1132, configured to determine, according to the feature similarities in each sample data, a total vector of each sample data that includes the relationship between the feature vectors.

Optionally, the embedding unit 1132 is specifically configured to:

determining a relation coefficient between each feature vector and other feature vectors in each sample data according to each feature similarity of each sample data; the relation coefficient is used for representing the relation between the two feature vectors;

determining a characteristic domain embedding vector of each characteristic vector in each sample data according to a relation coefficient between each characteristic vector and other characteristic vectors in each sample data and each characteristic vector;

and generating a total vector of each sample data according to the feature domain embedded vector of each feature vector in each sample data.

Optionally, the expectation determining module 1140 includes:

a compressing unit 1141, configured to input the total vector of each sample data into a compression network, to obtain a compressed vector of each sample data;

an estimating unit 1142, configured to input the compressed vector of each sample data into an estimation network, to obtain a probability vector that each sample data belongs to each gaussian distribution; wherein the number of Gaussian distributions is the number of feature domains;

an expectation determining unit 1143, configured to determine an expectation of each sample data according to a probability vector that each sample data belongs to each gaussian distribution and the compressed vector of each sample data; wherein the expectation is used to characterize a distance between the sample data and each of the Gaussian distributions.

Optionally, the compressing unit 1141 is specifically configured to:

coding the total vector of each sample data according to a coding function and a first neural network parameter to obtain a coding vector of each sample data;

decoding the coding vector of each sample data according to a decoding function and a second neural network parameter to obtain a decoding vector of each sample data;

determining a reconstruction error of each sample data according to the data characteristics of each sample data and the decoding vector of each sample data; the reconstruction error is used for representing the difference between the data characteristic of each sample data and the decoding vector of each sample data;

determining the compressed vector of each sample data according to the encoding vector of each sample data and the reconstruction error of each sample data.

Optionally, the training module 1150 includes a first training unit 1151, configured to:

and correcting parameters in the coding network corresponding to each characteristic domain according to the reconstruction error of each sample data.

Optionally, the estimating unit 1142 is specifically configured to:

converting the compressed vector of each sample data into an output characteristic according to a third neural network parameter;

and determining the probability vector of each sample data belonging to each Gaussian distribution according to a preset function and the output characteristics of each sample data.

Optionally, the expectation determining unit 1143 is specifically configured to:

determining parameters of each Gaussian distribution according to a probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data;

and determining the expectation of each sample data according to the compressed vector of each sample data and the parameters of each Gaussian distribution.

Optionally, the parameters of the gaussian distribution include occurrence probability, mean, and covariance matrix;

the expectation determining unit 1143 is specifically configured to:

determining the occurrence probability of each Gaussian distribution according to the probability vector of each sample data belonging to each Gaussian distribution and the number of the sample data;

determining the mean value of each Gaussian distribution according to the probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data;

and determining a covariance matrix of each Gaussian distribution according to the mean value of each Gaussian distribution, the probability vector of each sample data belonging to each Gaussian distribution and the compressed vector of each sample data.

Optionally, the training module 1150 includes a second training unit 1152, configured to:

and according to the expectation of each sample data, modifying the network parameters in the attention layer and the determining submodule to obtain a model for evaluating the business activity risk.

Optionally, the obtaining module 1110 includes:

an obtaining unit 1111, configured to obtain an offline feature corresponding to the service activity, and obtain each piece of real-time data corresponding to the service activity;

a determining unit 1112, configured to determine each sample data of the business activity according to the offline feature and each real-time data.

The specific principle and implementation of the apparatus provided in this embodiment are similar to those of the embodiment shown in fig. 8, and are not described here again.

As shown in fig. 13, the electronic device provided in this embodiment includes:

a memory 131;

a processor 132; and

a computer program;

wherein the computer program is stored in the memory 131 and configured to be executed by the processor 132 to implement any one of the above-described methods for determining risk information of business activities or training methods for a model for evaluating business activity risk.

The present embodiments also provide a computer-readable storage medium, having stored thereon a computer program,

the computer program is executed by a processor to implement any of the business activity risk information determination methods or the training methods for models that evaluate business activity risk as described above.

The present embodiment also provides a computer program comprising a program code for performing any one of the above methods for determining risk information of business activities or a method for training a model for evaluating risk of business activities, when the computer program is run by a computer.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for determining risk information of business activities is characterized by comprising the following steps:

2. The method of claim 1, wherein the inputting the feature vectors corresponding to the feature domain into an attention layer to obtain a total vector including a relationship between the feature vectors comprises:

determining the feature similarity between every two feature vectors according to the feature vectors corresponding to the feature domains;

and determining a total vector comprising the relation between the characteristic vectors according to the characteristic similarity.

3. The method according to claim 2, wherein the determining a total vector including a relationship between each two feature vectors according to the feature similarity between the feature vectors comprises:

4. The method of claim 1, wherein the inputting the total vector into a determining submodule, and determining, by using the determining submodule, an expectation corresponding to the data to be evaluated comprises:

inputting the total vector into a compression network to obtain a compressed vector;

determining an expectation corresponding to the data to be evaluated according to the compressed vector and preset parameters corresponding to the Gaussian distributions; the number of the Gaussian distributions is the number of the characteristic domains; the expectation is used to characterize the distance between the data to be evaluated and each of the gaussian distributions.

5. The method of claim 4, wherein inputting the total vector into a compression network to obtain a compressed vector comprises:

6. The method of claim 1, wherein determining a risk score corresponding to the data to be assessed according to the expectation comprises:

7. The method according to any one of claims 1-6, wherein obtaining data to be evaluated corresponding to the business activity comprises:

acquiring offline characteristics corresponding to the business activities and acquiring real-time data corresponding to the business activities;

and determining the data to be evaluated of the business activity according to the offline characteristics and the real-time data.

8. The method of any one of claims 1-6, further comprising:

9. A training method for a model for assessing risk of business activity, the model comprising: the coding network, the attention layer and the determining submodule corresponding to each characteristic domain;

the method comprises the following steps:

10. The method according to claim 9, wherein said inputting the feature vectors corresponding to the feature domain into the attention layer, and obtaining a total vector corresponding to each sample data including the relationship between the respective feature vectors, comprises:

determining the feature similarity between every two feature vectors in each sample data according to the feature vectors corresponding to the feature domains in each sample data;

and determining the total vector of each sample data including the relationship among the characteristic vectors according to the characteristic similarity in each sample data.

11. The method according to claim 10, wherein determining a total vector of each sample data including a relationship between each two feature vectors according to a feature similarity between the feature vectors in each sample data comprises:

12. The method of claim 9, wherein said inputting said total vector of each sample data into a determining submodule, and determining, by using the determining submodule, an expectation of each said sample data comprises:

inputting the total vector of each sample data into a compression network to obtain a compression vector of each sample data;

inputting the compressed vector of each sample data into an estimation network to obtain a probability vector of each sample data belonging to each Gaussian distribution; wherein the number of Gaussian distributions is the number of feature domains;

determining the expectation of each sample data according to the probability vector of each Gaussian distribution of each sample data and the compressed vector of each sample data; wherein the expectation is used to characterize a distance between the sample data and each of the Gaussian distributions.

13. The method of claim 12, wherein said inputting said total vector of each sample data into a compression network resulting in a compressed vector of each sample data comprises:

14. The method of claim 13, further comprising:

15. The method of claim 12, wherein said inputting said compressed vector for each sample data into an estimation network to obtain a probability vector for each sample data belonging to a respective gaussian distribution comprises:

16. The method of claim 12, wherein said determining the expectation for each sample data based on the probability vector for each sample data belonging to the respective gaussian distribution, the compressed vector for each sample data, comprises:

17. The method of claim 16, wherein the parameters of the gaussian distribution include probability of occurrence, mean, covariance matrix;

determining parameters of the Gaussian distributions according to the probability vector of each sample data belonging to the Gaussian distributions and the compressed vector of each sample data, wherein the determining comprises:

18. The method of claim 9, wherein training the model according to the expectation of each sample data, resulting in a model for assessing risk of business activity, comprises:

modifying network parameters in the attention layer and the determining submodule according to the expectation of each sample data.

19. The method according to any of claims 9-18, wherein obtaining a plurality of sample data corresponding to the business activity comprises:

acquiring an offline feature corresponding to the business activity, and acquiring each real-time data corresponding to the business activity;

and determining each sample data of the business activity according to the offline characteristics and each real-time data.

20. An apparatus for determining risk information of a business activity, comprising:

21. A training apparatus for a model for assessing risk of business activity, the model comprising: the coding network, the attention layer and the determining submodule corresponding to each characteristic domain;

the device comprises:

22. An electronic device, comprising:

a memory;

a processor; and

a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-8 or 9-19.

23. A computer-readable storage medium, having stored thereon a computer program,

the computer program is executed by a processor to implement the method of any of claims 1-8 or 9-19.

24. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8 or 9-19.