CN116595486A - Risk identification method, risk identification model training method and corresponding device - Google Patents

Publication number
CN116595486A
Authority
CN
China
Prior art keywords
module
risk
feature extraction
training
integration
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310624324.1A
Other languages
Chinese (zh)
Inventor
郑开元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310624324.1A
Publication of CN116595486A
Legal status: Pending

Classifications

    • G06F18/253 Fusion techniques of extracted features
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/091 Active learning
    • G06N3/096 Transfer learning
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G06Q50/01 Social networking
    • G06Q50/34 Betting or bookmaking, e.g. Internet betting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a risk identification method, a method for training a risk identification model, and corresponding devices. The risk identification method comprises: inputting behavior data to be identified into a risk identification model, wherein the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations; the routing module is used for determining the primary key combination corresponding to the behavior data to be identified, providing the behavior data to be identified to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination; each feature extraction module is used for extracting, from the behavior data to be identified, a feature representation corresponding to its primary key; the integration module is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain risk information. Because the embodiments adopt a single general-purpose integrated risk identification model, construction and maintenance costs are reduced, and the impact on storage performance is also reduced.

Description

Risk identification method, risk identification model training method and corresponding device
Technical Field
One or more embodiments of the present disclosure relate to the field of network security technologies, and in particular, to a risk identification method, a method for training a risk identification model, and a corresponding device.
Background
Today, as internet technology becomes increasingly developed, users may be exposed to various risks in their various behaviors on the internet, so risk identification is a core capability of security services. Because the behavior features that users generate while using the internet, or that users can provide to a risk identification system, are highly varied, a traditional risk identification system often builds a separate risk identification model for each kind of behavior feature. Models built in this way are costly to construct and maintain, and they place considerable pressure on storage performance.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure disclose a risk identification method, a method of training a risk identification model, and corresponding apparatuses, so as to reduce the cost of risk identification and the pressure on storage performance.
According to a first aspect, embodiments of the present disclosure provide a risk identification method, the method including:
acquiring behavior data to be identified;
inputting the behavior data to be identified into a risk identification model, and acquiring risk information output by the risk identification model for the behavior data to be identified; the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the behavior data to be identified;
the routing module is used for determining the primary key combination corresponding to the behavior data to be identified, providing the behavior data to be identified to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting, from the behavior data to be identified, the feature representation corresponding to that primary key;
and the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the behavior data to be identified.
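The routing, feature extraction, and integration steps of the first aspect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `route`, `RiskModel`, and `BLOCKLIST` are hypothetical, the primary key combination is assumed to be the set of known primary keys that carry a value in the record, and the extractors and integrators are toy functions rather than trained networks.

```python
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, object]
Features = Dict[str, List[float]]


def route(record: Record, known_keys: List[str]) -> FrozenSet[str]:
    # Assumption: the primary key combination is the set of known primary
    # keys that actually carry a value in this record.
    return frozenset(k for k in known_keys if record.get(k) is not None)


class RiskModel:
    """Hypothetical container: one extractor per primary key,
    one integrator per primary key combination."""

    def __init__(self,
                 extractors: Dict[str, Callable[[Record], List[float]]],
                 integrators: Dict[FrozenSet[str], Callable[[Features], float]]):
        self.extractors = extractors
        self.integrators = integrators

    def predict(self, record: Record) -> float:
        combo = route(record, list(self.extractors))
        if combo not in self.integrators:
            raise KeyError(f"no integration module for primary keys {sorted(combo)}")
        # Only the extractors named by the combination see the record.
        features = {key: self.extractors[key](record) for key in combo}
        return self.integrators[combo](features)


# Toy stand-ins for trained modules (illustrative only).
BLOCKLIST = {"mallory"}
model = RiskModel(
    extractors={
        "payer": lambda r: [1.0 if r["payer"] in BLOCKLIST else 0.0],
        "amount": lambda r: [min(float(r["amount"]) / 10000.0, 1.0)],
    },
    integrators={
        frozenset({"payer", "amount"}): lambda f: max(f["payer"][0], f["amount"][0]),
        frozenset({"amount"}): lambda f: f["amount"][0],
    },
)

print(model.predict({"payer": "mallory", "amount": 100.0}))  # both keys present
print(model.predict({"amount": 5000.0}))                     # amount only
```

Keying the integrators by a `frozenset` makes the lookup independent of the order in which primary keys appear in a record, which is one simple way to give every combination exactly one integration module.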
According to one implementation in an embodiment of the present disclosure, the primary keys include at least one of: the active party of the behavior, the passive party of the behavior, a transaction amount, and various kinds of environmental information.
According to an implementation manner in the embodiments of the present disclosure, the feature extraction module further performs preliminary risk prediction using the extracted feature representation to obtain preliminary risk information;
and when performing risk prediction, the integration module corresponding to the primary key combination further uses the preliminary risk information obtained by each feature extraction module corresponding to a primary key contained in the primary key combination.
According to one implementation manner in the embodiments of the present disclosure, the feature extraction module includes: a preprocessing sub-module, an encoding sub-module, and a first classification sub-module;
the preprocessing sub-module is used for preprocessing, in the behavior data to be identified, the features of the primary key corresponding to the feature extraction module;
the encoding sub-module is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key;
the first classification sub-module is used for performing preliminary risk prediction using the feature representation obtained by the encoding sub-module to obtain preliminary risk information.
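As a rough illustration of the three sub-modules, the sketch below chains preprocessing, encoding, and a first classifier. Every concrete choice here is an assumption made for the example: the class name `FeatureExtractor`, field selection as the preprocessing step, a single `tanh` layer as the encoder, and a logistic unit as the first classification sub-module. The text does not fix any of these.

```python
import math
from typing import Dict, List, Tuple


class FeatureExtractor:
    """Hypothetical feature extraction module for one primary key."""

    def __init__(self, fields: List[str], enc_weights: List[List[float]],
                 clf_weights: List[float], clf_bias: float = 0.0):
        self.fields = fields            # record fields belonging to this primary key
        self.enc_weights = enc_weights  # one weight row per output dimension
        self.clf_weights = clf_weights
        self.clf_bias = clf_bias

    def preprocess(self, record: Dict[str, float]) -> List[float]:
        # Preprocessing sub-module (assumed): pick this key's fields, default to 0.
        return [float(record.get(name, 0.0)) for name in self.fields]

    def encode(self, x: List[float]) -> List[float]:
        # Encoding sub-module (assumed): a single tanh layer.
        return [math.tanh(sum(w * v for w, v in zip(row, x)))
                for row in self.enc_weights]

    def classify(self, h: List[float]) -> float:
        # First classification sub-module (assumed): logistic unit on the encoding.
        z = sum(w * v for w, v in zip(self.clf_weights, h)) + self.clf_bias
        return 1.0 / (1.0 + math.exp(-z))

    def __call__(self, record: Dict[str, float]) -> Tuple[List[float], float]:
        h = self.encode(self.preprocess(record))
        return h, self.classify(h)


amount_extractor = FeatureExtractor(
    fields=["amount_scaled"],
    enc_weights=[[1.0], [-1.0]],
    clf_weights=[0.0, 0.0],
)
representation, preliminary_risk = amount_extractor({"amount_scaled": 0.5})
print(representation, preliminary_risk)
```

Returning both the representation and the preliminary risk from one call mirrors the text: the integration module can consume either or both.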
According to an implementation manner in the embodiments of the present disclosure, the integration module corresponding to the primary key combination includes: a first integration sub-module, a second integration sub-module, a third integration sub-module, and a second classification sub-module;
the first integration sub-module is used for performing first integration processing on the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination;
the second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to a primary key contained in the primary key combination;
the third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing;
and the second classification sub-module is used for performing risk prediction using the result of the third integration processing to obtain the risk information output for the behavior data to be identified.
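A minimal sketch of the four sub-modules follows. The integration operators are assumptions chosen for brevity, since the text does not specify them: concatenation for the first integration, a mean for the second, concatenation of the two results for the third, and a logistic unit as the second classification sub-module. The function name `integrate` is hypothetical.

```python
import math
from typing import Dict, List


def integrate(features: Dict[str, List[float]],
              preliminary: Dict[str, float],
              weights: List[float],
              bias: float = 0.0) -> float:
    keys = sorted(features)  # fixed order so the weights line up
    # First integration (assumed): concatenate the per-key feature representations.
    fused_features = [v for k in keys for v in features[k]]
    # Second integration (assumed): average the per-key preliminary risks.
    fused_risk = sum(preliminary[k] for k in keys) / len(keys)
    # Third integration (assumed): join both results into one vector.
    combined = fused_features + [fused_risk]
    if len(weights) != len(combined):
        raise ValueError("weights must match the combined vector length")
    # Second classification sub-module (assumed): logistic unit.
    z = sum(w * v for w, v in zip(weights, combined)) + bias
    return 1.0 / (1.0 + math.exp(-z))


risk = integrate(
    features={"payer": [1.0], "amount": [0.2, -0.2]},
    preliminary={"payer": 0.9, "amount": 0.3},
    weights=[1.0, 0.0, 0.0, 1.0],
)
print(risk)
```

In a trained model the concatenation and mean would more plausibly be learned layers (for example attention or gating); the fixed operators here only show how the data flows.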
In a second aspect, there is provided a method of training a risk identification model, the method comprising:
acquiring first training data comprising a plurality of first training samples, wherein each first training sample comprises a first behavior data sample and a risk label annotated for the first behavior data sample;
training a risk identification model using the first training data; the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the second behavior data sample;
the routing module is used for determining the primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting, from the first behavior data sample, the feature representation corresponding to that primary key;
the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the first behavior data sample;
the training targets include: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label annotated for the first behavior data sample in the first training sample.
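One common way to make "minimizing the difference" concrete is a binary cross-entropy loss; this is an assumption for illustration, as the text does not name a loss function. With hypothetical symbols $x_i$ for a behavior data sample, $y_i \in \{0, 1\}$ for its risk label, $c(x_i)$ for the primary key combination chosen by the routing module, $f_k$ for the feature extraction module of primary key $k$, and $g_{c}$ for the integration module of combination $c$:

```latex
\mathcal{L}_{\text{risk}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i \log \hat{p}_i + (1 - y_i)\log\bigl(1 - \hat{p}_i\bigr) \Bigr],
\qquad
\hat{p}_i = g_{c(x_i)}\Bigl(\bigl\{\, f_k(x_i) : k \in c(x_i) \,\bigr\}\Bigr)
```

Under this formulation each sample updates only the feature extraction modules and the one integration module selected for its primary key combination.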
According to an implementation manner in the embodiments of the present disclosure, the feature extraction module further performs preliminary risk prediction using the extracted feature representation to obtain preliminary risk information;
and when performing risk prediction, the integration module corresponding to the primary key combination further uses the preliminary risk information obtained by each feature extraction module corresponding to a primary key contained in the primary key combination.
According to one implementation manner in the embodiments of the present disclosure, the feature extraction module includes: a preprocessing sub-module, an encoding sub-module, and a first classification sub-module;
the preprocessing sub-module is used for preprocessing, in the first behavior data sample, the features of the primary key corresponding to the feature extraction module;
the encoding sub-module is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key;
the first classification sub-module is used for performing preliminary risk prediction using the feature representation obtained by the encoding sub-module to obtain preliminary risk information.
According to an implementation manner in the embodiments of the present disclosure, the integration module corresponding to the primary key combination includes: a first integration sub-module, a second integration sub-module, a third integration sub-module, and a second classification sub-module;
the first integration sub-module is used for performing first integration processing on the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination;
the second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to a primary key contained in the primary key combination;
the third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing;
and the second classification sub-module is used for performing risk prediction using the result of the third integration processing to obtain a risk probability value output for the first behavior data sample.
According to an implementation manner in the embodiments of the present disclosure, the method further includes pre-training each feature extraction module;
training the risk identification model using the first training data is then further training on the basis of the feature extraction module parameters obtained by the pre-training;
wherein the pre-training comprises: acquiring second training data comprising a plurality of second training samples, wherein each second training sample comprises a second behavior data sample and a risk label annotated for the second behavior data sample;
pre-training the feature extraction module using the second training data, wherein the second behavior data sample is used as the input of both the feature extraction module and a feature extraction module copy, each of which obtains preliminary risk information for the second behavior data sample, and the feature extraction module and the feature extraction module copy have the same structure but different initialization parameters; the pre-training targets include: minimizing the difference between the preliminary risk information obtained by the feature extraction module and the risk label annotated for the second behavior data sample, minimizing the difference between the preliminary risk information obtained by the feature extraction module copy and the risk label annotated for the second behavior data sample, and minimizing the divergence between the output distributions of the feature extraction module and the feature extraction module copy; and removing the feature extraction module copy after the pre-training is finished.
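The three pre-training targets can be combined into a single per-sample loss. The sketch below is one plausible reading, not the patent's stated formula: it assumes binary cross-entropy for the two supervised terms, a symmetric KL divergence between the two Bernoulli outputs for the divergence term, and a hypothetical weighting factor `alpha`.

```python
import math


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted risk p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_learning_loss(p_module: float, p_copy: float, label: int,
                         alpha: float = 1.0) -> float:
    # Targets 1 and 2: each network fits the risk label.
    supervised = bce(p_module, label) + bce(p_copy, label)
    # Target 3 (assumed symmetric KL): the two output distributions agree.
    divergence = bernoulli_kl(p_module, p_copy) + bernoulli_kl(p_copy, p_module)
    return supervised + alpha * divergence


print(mutual_learning_loss(0.9, 0.6, label=1))
print(mutual_learning_loss(0.8, 0.8, label=1))  # divergence term vanishes
```

Because the copy starts from different initialization parameters, the divergence term acts as a regularizer that pulls the two networks toward a shared solution; only the original module is kept afterwards.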
According to an implementation manner in the embodiments of the present disclosure, an auxiliary classification layer is placed before the second classification sub-module in each integration module, where the auxiliary classification layer includes a plurality of auxiliary classifiers;
each auxiliary classifier performs risk prediction using the result of the third integration processing to obtain risk information, and the auxiliary classifiers and the second classification sub-module have the same structure but different initialization parameters;
the training targets further include at least one of:
taking the auxiliary classifier with the smallest corresponding loss function as a teacher network and the remaining auxiliary classifiers as student networks, and minimizing the difference between the risk information output by the student networks and the risk information output by the teacher network,
minimizing the difference between the risk information that the second classification sub-module predicts after integrating the risk probability values output by the auxiliary classifiers and the risk information output by the teacher network,
maximizing the difference between the risk probability values output by the auxiliary classifiers;
and after training, removing the auxiliary classification layer.
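The three auxiliary targets can be written as one scalar objective. The sketch below is illustrative only: squared differences stand in for "difference", the variance of the auxiliary outputs stands in for the quantity to be maximized (the negative correlation term), and `beta`, `pick_teacher`, and `auxiliary_objective` are hypothetical names.

```python
from typing import List


def pick_teacher(aux_losses: List[float]) -> int:
    """Index of the auxiliary classifier with the smallest loss."""
    return min(range(len(aux_losses)), key=aux_losses.__getitem__)


def auxiliary_objective(aux_probs: List[float], aux_losses: List[float],
                        final_prob: float, beta: float = 0.1) -> float:
    t = pick_teacher(aux_losses)
    teacher = aux_probs[t]  # treated as a fixed target
    # Target 1: student classifiers move toward the teacher's output.
    student_term = sum((p - teacher) ** 2
                       for i, p in enumerate(aux_probs) if i != t)
    # Target 2: the second classification sub-module's output moves toward the teacher.
    final_term = (final_prob - teacher) ** 2
    # Target 3 (assumed form): reward spread among auxiliary outputs.
    mean = sum(aux_probs) / len(aux_probs)
    diversity = sum((p - mean) ** 2 for p in aux_probs) / len(aux_probs)
    return student_term + final_term - beta * diversity


probs = [0.9, 0.7, 0.5]
losses = [0.1, 0.4, 0.7]
print(pick_teacher(losses))
print(auxiliary_objective(probs, losses, final_prob=0.8))
```

Targets 1 and 3 pull in opposite directions, which is why the text lists them as options ("at least one of") and why a small `beta` is a natural choice when both are used together.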
According to an implementation manner in the embodiments of the present disclosure, the integration module with the smallest corresponding loss function is used as a teacher network; the routing module inputs the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination into the teacher network, and the teacher network performs risk prediction using the input feature representations to obtain a risk probability value output for the first behavior data sample; the integration module corresponding to the primary key combination is used as a student network, and the risk information it outputs comprises a risk probability value;
the training targets further include: minimizing the difference between the risk probability value output by the teacher network and the risk probability value output by the student network.
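Knowledge distillation between integration modules can be sketched as follows. The choices here are assumptions for illustration: modules are identified by a string naming their primary key combination, a running loss per module is available, the distillation term is a squared difference, and `lam` is a hypothetical weighting factor.

```python
import math
from typing import Dict


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted risk p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pick_teacher(running_loss: Dict[str, float]) -> str:
    """The integration module with the smallest loss acts as teacher."""
    return min(running_loss, key=running_loss.get)


def student_loss(student_prob: float, teacher_prob: float, label: int,
                 lam: float = 0.5) -> float:
    # Supervised term plus a distillation term toward the teacher's probability.
    return bce(student_prob, label) + lam * (student_prob - teacher_prob) ** 2


running_loss = {"payer+amount": 0.21, "amount": 0.48, "payer+device": 0.35}
teacher = pick_teacher(running_loss)
print(teacher)
print(student_loss(student_prob=0.4, teacher_prob=0.9, label=1))
```

The teacher receives the same feature representations that the routing module would pass to the student, so in practice its input layer must accept the student's primary key combination; how that is arranged is not specified in this part of the text.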
In a third aspect, there is provided a risk identification apparatus, the apparatus comprising:
a data acquisition unit configured to acquire behavior data to be identified;
a risk identification unit configured to input the behavior data to be identified into a risk identification model, and acquire risk information output by the risk identification model for the behavior data to be identified; the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the behavior data to be identified;
the routing module is used for determining the primary key combination corresponding to the behavior data to be identified, providing the behavior data to be identified to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting, from the behavior data to be identified, the feature representation corresponding to that primary key;
and the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the behavior data to be identified.
In a fourth aspect, there is provided an apparatus for training a risk identification model, the apparatus comprising:
a sample acquisition unit configured to acquire first training data including a plurality of first training samples, each first training sample including a first behavior data sample and a risk label annotated for the first behavior data sample;
a model training unit configured to train a risk identification model using the first training data; the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the second behavior data sample;
the routing module is used for determining the primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting, from the first behavior data sample, the feature representation corresponding to that primary key;
the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the first behavior data sample;
the training targets include: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label annotated for the first behavior data sample in the first training sample.
According to a fifth aspect, the present description provides a computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to perform the method as described above.
According to a sixth aspect, embodiments of the present specification provide a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements a method as described above.
According to the technical scheme, the embodiment of the specification can have the following advantages:
1) The embodiments of this specification provide a general-purpose integrated risk identification model in which feature extraction modules corresponding to different primary keys and integration modules corresponding to different primary key combinations are configured. The routing module determines the required feature extraction modules and integration module according to the primary key combination corresponding to the behavior data to be identified, thereby realizing feature extraction and risk prediction for that data. Because a general-purpose integrated risk identification model is adopted, there is no need to build separate risk identification models in advance for different behavior features, which reduces construction and maintenance costs and also reduces the impact on storage performance.
2) The feature extraction module can further use the extracted feature representation to perform preliminary risk prediction and obtain preliminary risk information; when the integration module corresponding to the primary key combination performs risk prediction, it can further use the preliminary risk information obtained by each feature extraction module corresponding to a primary key contained in the primary key combination, which improves the prediction accuracy of the risk identification model.
3) Before the risk identification model is trained, the embodiments can improve the quality of the feature representations through mutual learning, which improves the training efficiency and effect of the risk identification model.
4) In training the integration modules, the embodiments adopt knowledge distillation between integration modules: a better-trained integration module assists the learning of worse-trained integration modules, which improves the recognition effect of the worse-trained modules.
5) Inside the integration module, the embodiments adopt a training mode combining mutual learning and negative correlation learning, and help the second classification sub-module learn better by constructing different auxiliary classifiers.
Of course, not all of the above-described advantages need be achieved at the same time in practicing any one of the embodiments of the present description.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates an exemplary system architecture diagram to which embodiments of the present description may be applied;
FIG. 2 is a flowchart of a risk identification method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a risk identification model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a feature extraction module according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an integration module according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a method for training a risk identification model provided in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of training a feature extraction module according to an embodiment of the present disclosure;
FIGS. 8a to 8c are schematic diagrams illustrating mutual learning inside an integration module according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of knowledge distillation between integration modules provided in an embodiment of the present disclosure;
FIG. 10 shows a structural diagram of a risk identification apparatus according to an embodiment of the present specification;
FIG. 11 shows a block diagram of an apparatus for training a risk identification model according to one embodiment of the present description.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist. For example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination" or "in response to detection". Similarly, depending on the context, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event)".
In order to reduce the cost pressure and the pressure on storage performance caused by building a separate risk identification model for each kind of behavior feature, the embodiments of this specification provide a new idea: risk identification for various behavior features is realized through one general integrated model.
To facilitate understanding of the embodiments of the present specification, a system architecture on which the embodiments of the present specification are based will first be described. FIG. 1 illustrates an exemplary system architecture to which embodiments of the present description may be applied. The system mainly comprises a model training device for establishing a risk identification model in an off-line mode and a risk identification device for carrying out risk identification on behavior data to be identified on line.
After the training data is obtained, the model training device can perform model training by adopting the method provided by the embodiment of the specification to obtain the risk identification model.
The risk recognition device performs risk recognition on the behavior data to be recognized by using the trained risk recognition model to obtain risk information of the behavior data to be recognized.
The model training device and the risk recognition device can each be set up as an independent server, can be set up in the same server or server group, or can be set up in independent cloud servers or the same cloud server. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services. The model training device and the risk recognition device can also be set up on a computer terminal with strong computing capability.
In addition to performing risk recognition online, the risk recognition device may perform risk recognition offline, for example, by performing risk recognition on the behavior data to be recognized in batches.
It should be understood that the number of model training devices, risk recognition devices, and risk recognition models in fig. 1 are merely illustrative. There may be any number of model training means, risk recognition means, and risk recognition models, as required by the implementation.
Fig. 2 is a flowchart of a risk identification method according to an embodiment of the present disclosure. It will be appreciated that the method may be performed by a risk identification device in the system shown in fig. 1. Referring to fig. 2, the method mainly comprises the following steps:
Step 201: Acquire behavior data to be identified.
Step 203: Input the behavior data to be identified into a risk identification model, and acquire the risk information output by the risk identification model for the behavior data to be identified. The risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, where the primary keys are behavior description items of the behavior data to be identified. The routing module is used for determining the primary key combination corresponding to the behavior data to be identified, providing the behavior data to be identified to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination. Each feature extraction module corresponds to one of the primary keys and is used for extracting the feature representation corresponding to that primary key from the behavior data to be identified. The integration module corresponding to the primary key combination is used for performing risk prediction by using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the behavior data to be identified.
As can be seen from the technical content provided by the above embodiments, the present embodiments provide a general integrated risk recognition model, in which feature extraction modules corresponding to different primary keys and integration modules corresponding to different primary key combinations are configured in the risk recognition model, and a routing module determines a required feature extraction module and an integration module according to the primary key combination corresponding to behavior data to be recognized, so as to implement feature extraction and risk prediction for the behavior data to be recognized. According to the embodiment of the specification, the universal integrated risk identification model is adopted, different risk identification models are not required to be respectively built in advance aiming at different behavior characteristics, the construction and maintenance cost is reduced, and the influence on the storage performance is also reduced.
The respective steps shown in fig. 2 are explained below. The behavior data to be identified obtained in the step 201 may be behavior data actually occurring in the network, and may be obtained from a server side or may be obtained in real time.
In addition, the behavior data to be identified acquired in the step 201 may be behavior data provided by the user. For example, in the risk consultation scenario, the user wants to perform risk consultation on the behavior data that is about to occur or has occurred, and part or all of the behavior data may be provided as behavior data to be identified to the risk identification device provided in the embodiment of the present specification.
Other application scenarios are also possible, not explicitly recited herein.
Various behavior data are generated while a user uses the network, and these behavior data reflect the behavioral intention of the user. Some of these behaviors are risky and most are trusted. Behavior data may vary from scenario to scenario. Taking the network transaction scenario as an example, the user may be an account, a bank card, a red packet id, or the like. The network behavior may be finance-related transaction behavior such as payment behavior, deposit and withdrawal behavior, subscription binding behavior, red packet sending and receiving behavior, collection behavior, and the like. Taking the network friend-making scenario as an example, the user may be a social platform account, and the network behavior may be, for example, a login behavior, a friend request sending behavior, a friend request receiving behavior, a chat behavior, a link sending behavior, and so on.
The behavior data to be identified obtained in this step can be parsed to obtain various behavior description items, such as at least one of a behavior active party, a behavior passive party, a transaction amount, various environmental information, and the like, and these behavior description items are called primary keys in the embodiment of this specification. Wherein the active and passive parties may be, for example, accounts, bank cards, user ids, certificate numbers, social networking accounts, instant messaging tool accounts, financial accounts, host addresses, client identifications, and the like. The environmental information may include, for example, geographic location information, platform information, time information, and the like.
The degree of integrity corresponding to the primary key information that can be obtained can be classified into different levels from high to low, for example:
Level 1: a behavior active party or a behavior passive party;
Level 2: a behavior active party and a behavior passive party;
Level 3: a behavior active party, a behavior passive party, and a transaction amount;
Level 4: a behavior active party, a behavior passive party, a transaction amount, and environmental information.
The following describes the step 203 in detail, namely, "inputting the behavior data to be identified into the risk identification model, and obtaining the risk information output by the risk identification model with respect to the behavior data to be identified".
The risk identification model provided in the embodiment of the present disclosure is shown in fig. 3, and mainly includes a routing module, a feature extraction module corresponding to a plurality of different primary keys, and an integration module corresponding to a plurality of different primary key combinations.
Each primary key corresponds to its own features. Computing a feature essentially starts from a query on the primary key, so feature computation is affected by the primary keys of the behavior data to be identified. As one implementation, a separate feature extraction module may be created for each primary key. As another, preferable implementation, the primary keys required by all features may be sorted out and the features divided into different feature domains. One primary key may correspond to a plurality of different feature domains, and a separate feature extraction module is created for each feature domain, for example as shown in Table 1:
TABLE 1
After the behavior data to be recognized is input into the risk recognition model, the routing module can determine the primary key combination corresponding to the behavior data to be recognized and provide the behavior data to be recognized to each feature extraction module corresponding to a primary key contained in the primary key combination. If the division is performed at a finer granularity, so that one primary key may correspond to a plurality of feature domains, the routing module actually provides the behavior data to be identified to each feature extraction module corresponding to a feature domain contained in the primary key combination. Taking Table 1 as an example, as shown in fig. 3, each feature domain corresponds to its own feature extraction module.
Each feature extraction module corresponds to one of the primary keys and is used for extracting the feature representation corresponding to that primary key from the behavior data to be identified. More precisely, each feature extraction module may correspond to one of the feature domains, so as to extract the feature representation corresponding to that feature domain from the behavior data to be identified.
In the embodiments of this specification, the size of the feature extraction modules can be configured by configuring feature domains of different granularities, so that a series of feature extraction modules of different complexity is formed, which makes it convenient to meet hardware requirements under different resource conditions.
Furthermore, the routing module is further configured to determine an integration module corresponding to the primary key combination. The primary key combination referred to in the embodiment of the present specification may include one primary key or may include a plurality of primary keys. If the primary key is divided into finer grained feature fields, the primary key combination may be a finer grained feature field combination, accordingly. As shown in table 2 below:
TABLE 2
Integration module    Feature domain combination
E1                    F1
E2                    F2, F3, F5
E3                    F2, F3, F4, F5
E4                    F1, F2, F3, F5
E5                    F1, F2, F3, F4, F5, F6
E6                    F1, F2, F3, F5, F6
E7                    F1, F2, F3, F4, F5, F6, F7
The integration module corresponding to the primary key combination is used for performing risk prediction by using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination (or the feature domains contained in the feature domain combination, if the finer granularity is used), so as to obtain the risk information output for the behavior data to be identified.
Taking fig. 3 as an example, if the feature domain combination is F1, after feature extraction is performed by the feature extraction module corresponding to F1, risk prediction is performed by the integration module E1 using the feature representation extracted by the feature extraction module corresponding to F1. If the feature domain combination is F2, F3, F5, the feature extraction module corresponding to F2, F3, F5 performs feature extraction, and then the integrated module E2 performs risk prediction by using the feature representation extracted by the feature extraction module corresponding to F2, F3, F5. If the feature domain combination is F2, F3, F4, F5, the feature extraction module corresponding to F2, F3, F4, F5 performs feature extraction, and then the integrated module E3 performs risk prediction by using the feature representation extracted by the feature extraction module corresponding to F2, F3, F4, F5. And so on according to the correspondence shown in table 2.
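The lookup described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the mapping is copied from Table 2, while the function name `route`, the dictionary layout, and the error handling are hypothetical and not part of the original description. How the routing module derives the feature domain combination from raw behavior data is not shown, since Table 1 is not reproduced in this text.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Feature domain combinations copied from Table 2.
# Storing them as a dict keyed by frozenset is an assumption for illustration.
INTEGRATION_MODULES: Dict[FrozenSet[str], str] = {
    frozenset({"F1"}): "E1",
    frozenset({"F2", "F3", "F5"}): "E2",
    frozenset({"F2", "F3", "F4", "F5"}): "E3",
    frozenset({"F1", "F2", "F3", "F5"}): "E4",
    frozenset({"F1", "F2", "F3", "F4", "F5", "F6"}): "E5",
    frozenset({"F1", "F2", "F3", "F5", "F6"}): "E6",
    frozenset({"F1", "F2", "F3", "F4", "F5", "F6", "F7"}): "E7",
}


def route(feature_domains: Iterable[str]) -> Tuple[List[str], str]:
    """Return the feature extraction modules to call and the integration module.

    feature_domains: the feature domain combination already determined
    for one piece of behavior data (hypothetical input format).
    """
    combo = frozenset(feature_domains)
    if combo not in INTEGRATION_MODULES:
        # Behavior for an unconfigured combination is not specified in the
        # source; raising an error is an assumption.
        raise KeyError(f"no integration module configured for {sorted(combo)}")
    return sorted(combo), INTEGRATION_MODULES[combo]
```

For instance, `route(["F2", "F3", "F5"])` selects the feature extraction modules for F2, F3, and F5 together with integration module E2, matching the second case in the paragraph above.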
The feature extraction module provided in the embodiments of the present disclosure may be a conventional feature extraction module based on a Transformer structure, comprising, for example, a preprocessing sub-module and an encoding sub-module.
The preprocessing sub-module is used for preprocessing the features, in the behavior data to be identified, of the primary key corresponding to the feature extraction module. The preprocessing may be, for example, feature discretization, min-max normalization, dynamic embedding, DNN (deep neural network) processing, and the like, as shown in fig. 4.
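Two of the preprocessing operations named above, min-max normalization and discretization, can be sketched as below. This is a minimal illustration, not the preprocessing sub-module itself: the function names, the bucket boundaries, and the handling of a constant column are assumptions, and dynamic embedding and DNN processing are not covered.

```python
from bisect import bisect_right
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly into [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: mapping everything to 0.0 is an assumption.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value: float, boundaries: Sequence[float]) -> int:
    """Return the bucket index of value; boundaries must be sorted ascending.

    A value equal to a boundary falls into the bucket to its right.
    """
    return bisect_right(boundaries, value)
```

For example, a transaction amount could be bucketed with hypothetical boundaries `[100.0, 1000.0]`, giving bucket 0 for small amounts, 1 for medium amounts, and 2 for large amounts.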
The encoding sub-module is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key. The encoding sub-module may be composed of a DNN and a Transformer.
As another more preferred embodiment, a first classification sub-module may be further included in the feature extraction module, as shown in fig. 4. The first classification submodule is used for carrying out preliminary risk prediction by utilizing the characteristic representation obtained by the coding submodule to obtain preliminary risk information. The preliminary risk information may be information such as whether there is a risk, risk level information, or the like.
In this embodiment, the feature extraction module actually further performs preliminary risk prediction by using the extracted feature representation to obtain preliminary risk information, so that when the integrated module corresponding to the main key combination performs risk prediction, the preliminary risk information obtained by each feature extraction module corresponding to the main key included in the main key combination can be further utilized, thereby improving accuracy of risk prediction.
The integration module may include a first integration sub-module, a second integration sub-module, a third integration sub-module, and a second classification sub-module.
The first integration sub-module is used for performing first integration processing on the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination. As shown in fig. 5, the first integration sub-module may perform concatenation and DNN processing.
The second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to a primary key contained in the primary key combination. As shown in fig. 5, the second integration sub-module may perform concatenation and DNN processing.
The third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing. As shown in fig. 5, the third integration sub-module may perform concatenation.
The second classification sub-module is used for carrying out risk prediction by using the result of the third integration processing to obtain risk information output aiming at the behavior data to be identified. The risk information may be information on whether there is a risk or not, or may be risk level information. The risk probability value may be obtained first, from which it is determined whether there is a risk or a risk level is determined.
Information on whether there is a risk corresponds to a two-class problem, namely risk and no risk, in which case the second classification sub-module adopts a binary classifier. Risk level information corresponds to a multi-class problem, such as high risk, medium risk, low risk, no risk, etc., in which case the second classification sub-module adopts a multi-class classifier.
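The data flow through the integration module can be sketched as follows. This is a simplified illustration under stated assumptions: the DNN layers of the first and second integration sub-modules are omitted, the second classification sub-module is replaced by a single linear layer with a sigmoid, and the thresholds, weights, and function names are hypothetical. Mapping a risk probability value to a risk level through cutoffs is one possible reading of "a risk level is determined" from the probability value.

```python
import math
from typing import List, Sequence


def concat(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate several vectors into one flat list."""
    return [x for vec in vectors for x in vec]


def integrate(feature_reprs: Sequence[Sequence[float]],
              preliminary_risks: Sequence[float]) -> List[float]:
    first = concat(feature_reprs)      # first integration (DNN omitted)
    second = list(preliminary_risks)   # second integration (DNN omitted)
    return first + second              # third integration: concatenation


def risk_probability(integrated: Sequence[float],
                     weights: Sequence[float], bias: float = 0.0) -> float:
    """Stand-in for the second classification sub-module (linear + sigmoid)."""
    score = sum(w * x for w, x in zip(weights, integrated)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def to_binary(prob: float, threshold: float = 0.5) -> str:
    """Two-class output: risk or no risk (threshold is an assumption)."""
    return "risk" if prob >= threshold else "no risk"


def to_level(prob: float, cutoffs: Sequence[float] = (0.25, 0.5, 0.75)) -> str:
    """Multi-class output: risk level chosen by hypothetical cutoffs."""
    levels = ["no risk", "low risk", "medium risk", "high risk"]
    return levels[sum(prob >= c for c in cutoffs)]
```

In a trained model the weights would come from the training process described later; here they are placeholders supplied by the caller.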
By means of the method provided by this embodiment, the risk identification model can automatically route the input primary keys to the corresponding feature extraction modules and integration module so as to realize risk identification. When users perform risk consultation or risk identification, the primary keys contained in the behavior data they can provide may differ, yet risk identification can still be performed through one general risk identification model, so that the construction and maintenance cost and the pressure on storage performance are significantly reduced.
The training process of the risk identification model utilized in the above-described method embodiment is described in detail below. FIG. 6 is a flowchart of a method for training a risk identification model according to an embodiment of the present disclosure, which may be performed by the model training apparatus in the system shown in FIG. 1. As shown in fig. 6, the method may include the steps of:
Step 601: Acquire first training data comprising a plurality of first training samples, each first training sample comprising a first behavior data sample and a risk label labeled for the first behavior data sample.
While users use the network, the server side records a large amount of their behavior data, usually in a data warehouse, and these behavior data reflect the behavioral intention of the users. Some of these behaviors are risky and most are trusted. In the embodiments of this specification, some behavior data with explicit risk information can be labeled with a risk information label and used as first behavior data samples. The risk information label may be a label indicating whether there is a risk, a risk level label, or the like.
When training the risk identification model, behavior data of various risk types can be acquired to construct first training data. For example, data of one or any combination of types of theft risk, fraud risk, money laundering risk, gambling risk, etc. and data without any risk are acquired to construct the first training data. If the risk information is a risk level, behavior data with various levels of risk may be acquired to construct the first training data.
Step 603: Train a risk identification model using the first training data. The risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, where the primary keys are behavior description items of the first behavior data sample. The routing module is used for determining the primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination. Each feature extraction module corresponds to one of the primary keys and is used for extracting the feature representation corresponding to that primary key from the first behavior data sample. The integration module corresponding to the primary key combination is used for performing risk prediction by using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the first behavior data sample. The training targets include: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label labeled for the first behavior data sample in the first training sample.
As a preferred embodiment, the feature extraction module may further perform preliminary risk prediction by using the extracted feature representation to obtain preliminary risk information. Correspondingly, when the integration module corresponding to the primary key combination performs risk prediction, it further uses the preliminary risk information obtained by each feature extraction module corresponding to a primary key contained in the primary key combination.
In this embodiment, the feature extraction module may have a structure as shown in fig. 4, including: the device comprises a preprocessing sub-module, a coding sub-module and a first classifying sub-module.
The preprocessing sub-module is used for preprocessing the features, in the first behavior data sample, of the primary key corresponding to the feature extraction module. The preprocessing may be, for example, feature discretization, min-max normalization, dynamic embedding, DNN processing, and the like.
The encoding sub-module is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key. The encoding sub-module may be composed of a DNN and a Transformer.
The first classification submodule is used for carrying out preliminary risk prediction by utilizing the characteristic representation obtained by the coding submodule to obtain preliminary risk information. The preliminary risk information may be information such as whether there is a risk, risk level information, or the like.
The structure of the integrated module corresponding to the primary key combination may be as shown in fig. 5, including: the first integration sub-module, the second integration sub-module, the third integration sub-module and the second classification sub-module.
The first integration sub-module is used for performing first integration processing on the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination. As shown in fig. 5, the first integration sub-module may perform concatenation and DNN processing.
The second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to a primary key contained in the primary key combination. As shown in fig. 5, the second integration sub-module may perform concatenation and DNN processing.
The third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing. As shown in fig. 5, the third integration sub-module may perform concatenation.
The second classification sub-module is used for performing risk prediction by using the result of the third integration processing to obtain a risk probability value output for the first behavior data sample.
For the correspondence between primary keys and feature extraction modules, the correspondence between primary key combinations and integration modules, and the principle and structure of the risk identification model, reference may be made to the relevant description in the embodiment of the risk identification method, which is not repeated here.
In the embodiment of the present disclosure, in order to improve the training effect and efficiency of the model, a two-stage training manner may be adopted, that is, in the first stage, the feature extraction modules are first pre-trained by using second training data, and in the second stage, the risk recognition model is further trained by using the first training data on the basis of the feature extraction modules obtained by pre-training.
The first stage is a pre-training process of each feature extraction module, and mainly comprises the steps S11-S12:
Step S11: Acquire second training data comprising a plurality of second training samples, each second training sample comprising a second behavior data sample and a risk label labeled for the second behavior data sample.
In the embodiments of the present disclosure, some behavior data with explicit risk information can be labeled with a risk information label and used as second behavior data samples. The risk information label may be a label indicating whether there is a risk, a risk level label, or the like.
Step S12: the feature extraction module is pre-trained using the second training data.
In the embodiment of the specification, each feature extraction module is independently trained, so that the pluggable property of the feature extraction module can be conveniently realized, and no matter whether the feature extraction module is increased or decreased or finely adjusted, other feature extraction modules are not affected.
In the embodiment of the present specification, a preferred training manner is provided, and a manner of mutual learning distillation is integrated in the training process of the feature extraction module. Describing one of the feature extraction modules as an example, the training modes of the other feature extraction modules are the same. As shown in fig. 7, a feature extraction module copy may be constructed for the feature extraction module in advance, both employing the same structure but employing different initialization parameters.
The second behavior data sample is taken as the input of both the feature extraction module and the feature extraction module copy, and the preliminary risk information obtained by the feature extraction module and by the feature extraction module copy for the second behavior data sample is acquired.
The pre-trained targets include: minimizing the difference between the preliminary risk information obtained by the feature extraction module and the risk tag labeled for the second behavioral data sample, minimizing the difference between the preliminary risk information obtained by the feature extraction module copy and the risk tag labeled for the second behavioral data sample, and minimizing the output distribution divergence between the feature extraction module and the feature extraction module copy.
The loss function may be constructed in accordance with the training targets described above. In each round of iteration, the value of the loss function is used to update the model parameters by gradient descent until a preset training ending condition is met. The training ending condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset number threshold, etc.
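The iteration and the two example ending conditions can be sketched with a generic gradient descent loop. This is a toy illustration on a single scalar parameter: the function `train`, its default learning rate, and its default thresholds are assumptions, and a real implementation would update all model parameters through a deep learning framework.

```python
from typing import Callable, Tuple


def train(loss_and_grad: Callable[[float], Tuple[float, float]],
          param: float,
          lr: float = 0.1,
          loss_threshold: float = 1e-6,
          max_iters: int = 1000) -> Tuple[float, float, int]:
    """Gradient descent that stops on either example ending condition.

    loss_and_grad: returns (loss value, gradient) at the given parameter.
    Returns (final parameter, last evaluated loss, iterations run).
    """
    loss = float("inf")
    iters = 0
    for iters in range(1, max_iters + 1):
        loss, grad = loss_and_grad(param)
        if loss <= loss_threshold:   # ending condition 1: loss small enough
            break
        param -= lr * grad           # gradient descent update
    # ending condition 2 is the loop bound: max_iters iterations reached
    return param, loss, iters
```

With the toy loss `(p - 3) ** 2`, whose gradient is `2 * (p - 3)`, the loop drives the parameter toward 3 and stops once the loss falls below the threshold.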
The loss function L may be constructed as follows in the embodiment of the present specification:
$$L = L_1 + L_2$$
$$L_1 = L_{C1} + D_{KL}(p_2 \| p_1)$$
$$L_2 = L_{C2} + D_{KL}(p_1 \| p_2)$$
where $L_{C1}$ reflects the difference between the preliminary risk information obtained by the feature extraction module and the risk label labeled for the second behavior data sample, and $D_{KL}(p_2 \| p_1)$ reflects the output distribution divergence between the feature extraction module and the feature extraction module copy. For example:
where $X_i$ is the i-th input second behavior data sample, and N is the number of second behavior data samples contained in a batch of training samples. M is the number of risk label categories, i.e., M kinds of risk labels are used. $p_1^m(X_i)$ is the probability, output by the feature extraction module, that $X_i$ belongs to category m, and is obtained from the feature representation $z_1$ extracted by the feature extraction module. $y1_i$ is the preliminary risk information predicted by the feature extraction module for $X_i$.
$L_{C2}$ reflects the difference between the preliminary risk information obtained by the feature extraction module copy and the risk label labeled for the second behavior data sample, and $D_{KL}(p_1 \| p_2)$ reflects the output distribution divergence between the feature extraction module and the feature extraction module copy. For example:
where $p_2^m(X_i)$ is the probability, output by the feature extraction module copy, that $X_i$ belongs to category m, and is obtained from the feature representation $z_2$ extracted by the feature extraction module copy. $y2_i$ is the preliminary risk information predicted by the feature extraction module copy for $X_i$.
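The example formulas themselves are not reproduced in this text. One common form that is consistent with the symbols defined above is a cross-entropy term plus a Kullback-Leibler divergence, as widely used in mutual learning. The following is offered only as a plausible sketch under that assumption, not as the formulas of the original; in particular, $y_i$ (the labeled risk category of $X_i$) and the indicator $I(y_i, m)$ (equal to 1 if $y_i = m$ and 0 otherwise) are notation introduced here for illustration.

```latex
L_{C1} = -\sum_{i=1}^{N} \sum_{m=1}^{M} I(y_i, m)\, \log p_1^m(X_i)
\qquad
D_{KL}(p_2 \| p_1) = \sum_{i=1}^{N} \sum_{m=1}^{M} p_2^m(X_i)\, \log \frac{p_2^m(X_i)}{p_1^m(X_i)}

L_{C2} = -\sum_{i=1}^{N} \sum_{m=1}^{M} I(y_i, m)\, \log p_2^m(X_i)
\qquad
D_{KL}(p_1 \| p_2) = \sum_{i=1}^{N} \sum_{m=1}^{M} p_1^m(X_i)\, \log \frac{p_1^m(X_i)}{p_2^m(X_i)}
```

Under this form, each network is pulled toward the labels by its cross-entropy term and toward the other network's output distribution by its divergence term.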
After the pre-training is finished, the feature extraction module copy is removed and only the feature extraction module is retained. Every feature extraction module is trained in the same manner; the only difference is the primary key (or finer-grained feature domain) that each feature extraction module is directed at.
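The combined loss L = L1 + L2 for a feature extraction module and its copy can be sketched numerically as below. This assumes the cross-entropy and KL divergence forms that are common in mutual learning, summed (not averaged) over the batch; the function names and the input format (per-sample probability lists and integer label indices) are hypothetical and not taken from the original.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Sum over samples of -log(probability assigned to the labeled category)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))


def kl_divergence(p: Sequence[Sequence[float]],
                  q: Sequence[Sequence[float]]) -> float:
    """D_KL(p || q) summed over samples and categories (q must be positive)."""
    return sum(pm * math.log(pm / qm)
               for pi, qi in zip(p, q)
               for pm, qm in zip(pi, qi)
               if pm > 0.0)  # terms with p = 0 contribute 0


def mutual_learning_loss(p1: Sequence[Sequence[float]],
                         p2: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> float:
    """L = L1 + L2, with p1 from the module and p2 from its copy."""
    l1 = cross_entropy(p1, labels) + kl_divergence(p2, p1)  # L1 = L_C1 + D_KL(p2||p1)
    l2 = cross_entropy(p2, labels) + kl_divergence(p1, p2)  # L2 = L_C2 + D_KL(p1||p2)
    return l1 + l2
```

When the module and its copy output identical distributions, both divergence terms vanish and the loss reduces to the two cross-entropy terms, which is why differently initialized copies are needed for the divergence terms to carry any signal.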
It should be noted that, in addition to the above-mentioned preferred pre-training method, other pre-training methods may be used, for example, the second behavior data sample is used as an input of the feature extraction module, and the preliminary risk information obtained by the feature extraction module for the second behavior data sample is obtained. The pre-trained targets include: and minimizing the difference between the preliminary risk information obtained by the feature extraction module and the risk label marked for the second behavior data sample. Other ways are possible and are not listed here.
When the risk identification model is trained in the second stage, "mutual learning and negative correlation learning" may be used inside each integration module, and knowledge distillation may be used between the integration modules. Either one of these two approaches may be used alone, or both may be used together. Each is described in detail below.
When mutual learning and negative correlation learning are used inside an integration module, an auxiliary classification layer may be placed before the second classification sub-module in the integration module. The auxiliary classification layer comprises a plurality of auxiliary classifiers; figs. 8a to 8c take 3 auxiliary classifiers as an example. Each auxiliary classifier performs risk prediction using the result of the third integration processing, obtaining a risk probability value (i.e., regression) and, from it, risk information (i.e., classification). The second classification sub-module and the auxiliary classifiers have the same structure but different initialization parameters. In addition to minimizing the difference between the risk information output by the integration module for the first behavior data sample and the risk label marked for the first behavior data sample in the first training sample, the training objective may further include at least one of the following:
taking the auxiliary classifier with the smallest corresponding loss function as the teacher network and the remaining auxiliary classifiers as student networks, and minimizing the difference between the risk information output by the student networks and the risk information output by the teacher network;
minimizing the difference between the risk information that the second classification sub-module predicts after integrating the risk probability values output by the auxiliary classifiers and the risk information output by the teacher network;
maximizing the difference between the risk probability values output by the auxiliary classifiers.
The loss function may be constructed according to the training objectives described above. In each iteration, the value of the loss function is used to update the model parameters by gradient descent until a preset training end condition is met. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, or the number of iterations reaching a preset threshold. When the model parameters are updated, either only the parameters of the integration modules are updated, or the parameters of both the feature extraction modules and the integration modules are updated.
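The iteration scheme just described (gradient descent with a loss threshold and an iteration cap as end conditions) can be illustrated with a toy example. This is a hedged sketch only: `train`, `toy_loss`, the learning rate and the threshold values are hypothetical, and a simple sum-of-squares function stands in for the model loss.

```python
def train(params, loss_and_grad, lr=0.1, loss_threshold=1e-4, max_iters=1000):
    """Gradient descent until the loss is small enough or the iteration budget is spent."""
    loss = float("inf")
    for step in range(1, max_iters + 1):
        loss, grads = loss_and_grad(params)
        if loss <= loss_threshold:          # end condition 1: loss threshold reached
            return params, loss, step
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, loss, max_iters          # end condition 2: iteration cap reached

def toy_loss(params):
    """Stand-in for the model loss: sum of squares, with gradient 2 * p."""
    return sum(p * p for p in params), [2 * p for p in params]

params, final_loss, steps = train([1.0, -2.0], toy_loss)
```

Updating only the integration modules would correspond to applying the update step to a subset of `params` while leaving the pre-trained feature extraction parameters unchanged.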
Taking the example of including all the training objectives described above, the Loss function Loss can be expressed as:
Loss=Loss1+Loss2+Loss3+Loss4
Loss1 reflects the difference between the risk information output by the second classification sub-module for the first behavior data sample and the risk label marked for the first behavior data sample in the first training sample. A cross-entropy loss function may be used, which is not described in detail here.
Loss2 reflects the difference between the risk information output by the student networks and the risk information output by the teacher network. In the embodiments of this specification, the auxiliary classifier with the smallest corresponding loss function (i.e., the smallest difference between the output risk information and the risk label) is used as the teacher network, and the remaining auxiliary classifiers are used as student networks. As shown in fig. 8a, knowledge is distilled from the teacher network to the student networks. For example, Loss2 may employ:
where CE() is the cross-entropy loss function, Best_label is the risk information (i.e., classification result) output by the teacher network, y_sub_pred_i is the risk information output by the i-th auxiliary classifier, and n is the number of student networks.
Loss3 reflects the difference between the risk information that the second classification sub-module predicts after integrating the risk probability values output by the auxiliary classifiers and the risk information output by the teacher network. As shown in fig. 8b, Loss3 may employ:
Loss3=CE(y_cls_pred, Best_label)
where CE() is the cross-entropy loss function, Best_label is the risk information (i.e., classification result) output by the teacher network, and y_cls_pred is the risk information that the second classification sub-module predicts after integrating the risk probability values output by the auxiliary classifiers. Specifically, the second classification sub-module may integrate the risk probability values obtained by the auxiliary classifiers and predict the risk information using the integrated risk probability value together with the feature representation output by the third integration sub-module.
Loss4 reflects the difference between the risk probability values output by the auxiliary classifiers. It is a penalty term used for negative correlation learning. Its purpose is to prevent the auxiliary classifiers from overfitting, i.e., from becoming so similar to one another that mutual learning fails. As shown in fig. 8c, Loss4 may employ one of two forms, one or the other,
where logit_i is the risk probability value output by the i-th auxiliary classifier, logit_k is the risk probability value output by the k-th auxiliary classifier, h is the total number of auxiliary classifiers, and logit_mean is the mean of the risk probability values output by the auxiliary classifiers. MSE() is the mean squared error loss function.
Through the mutual learning and negative correlation learning described above, a larger and more diverse set of auxiliary classifiers can be constructed to assist the second classification sub-module, which improves the performance of the integration module. After training is finished, the auxiliary classification layer is removed.
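The four loss terms can be illustrated on a single sample as follows. This is a sketch under stated assumptions rather than the formulas of the specification: averaging Loss2 over the student networks, choosing the teacher per sample, and writing Loss4 as the negative mean squared deviation of each auxiliary output from the mean output are all assumed forms, and every function name is hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ce(logits, label):
    """Cross entropy of one sample given class logits and an integer label."""
    return -math.log(softmax(logits)[label] + 1e-12)

def mutual_learning_losses(aux_logits, cls_logits, true_label):
    """aux_logits: one logit vector per auxiliary classifier;
    cls_logits: output of the second classification sub-module."""
    # Teacher = auxiliary classifier with the smallest loss against the risk label.
    aux_losses = [ce(l, true_label) for l in aux_logits]
    teacher = min(range(len(aux_losses)), key=aux_losses.__getitem__)
    best_label = max(range(len(aux_logits[teacher])),
                     key=aux_logits[teacher].__getitem__)
    students = [l for i, l in enumerate(aux_logits) if i != teacher]

    loss1 = ce(cls_logits, true_label)
    # Assumed form: average CE between each student and the teacher's label.
    loss2 = sum(ce(l, best_label) for l in students) / len(students)
    loss3 = ce(cls_logits, best_label)
    # Assumed form: negative mean squared deviation from the mean output, so that
    # minimizing the total loss pushes the auxiliary outputs apart.
    h, dim = len(aux_logits), len(aux_logits[0])
    mean = [sum(l[d] for l in aux_logits) / h for d in range(dim)]
    loss4 = -sum((l[d] - mean[d]) ** 2
                 for l in aux_logits for d in range(dim)) / (h * dim)
    return loss1, loss2, loss3, loss4

aux = [[2.0, 0.5], [1.0, 1.2], [0.3, 0.1]]  # 3 auxiliary classifiers, 2 classes
losses = mutual_learning_losses(aux, [1.5, 0.2], true_label=0)
total = sum(losses)  # Loss = Loss1 + Loss2 + Loss3 + Loss4
```

In this toy input the first auxiliary classifier has the smallest loss and becomes the teacher; because its predicted label equals the true label, Loss1 and Loss3 coincide.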
Knowledge distillation is used between the integration modules, in a manner similar to the training between the feature extraction modules. The integration module with the smallest corresponding loss function (i.e., the smallest difference between the output risk information and the corresponding risk label) may be used as the teacher network. The routing module additionally inputs the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination into the teacher network. The teacher network performs risk prediction using the input feature representations and obtains a risk probability value output for the first behavior data sample. The integration module corresponding to the primary key combination serves as the student network. The training objective then further includes: minimizing the difference between the risk probability value output by the teacher network and the risk probability value output by the student network.
As shown in fig. 9, the feature representations corresponding to the first behavior data sample are input to both the teacher network and the student network. The teacher network is the integration module with the smallest corresponding loss function (i.e., the smallest difference between the output risk information and the corresponding risk label) among all integration modules; it is usually the integration module with the richest information and therefore performs better. Knowledge of the teacher network can thus be distilled into the student network, which effectively improves the overall prediction performance of the integration modules.
With this knowledge distillation, as shown in fig. 9, on the one hand the output of the student network itself should match the expected output as closely as possible, which corresponds to the student loss. The student loss can be obtained from the Loss described above; if mutual learning inside the integration module is not used, it can be obtained from Loss1. On the other hand, the difference between the risk probability values (soft targets) of the teacher network and the student network corresponds to the distillation loss. The overall loss for training the risk identification model may be determined by both the student loss and the distillation loss; for example, the overall loss may be obtained as a weighted sum of the student loss and the distillation loss. λ in fig. 9 is the weighting coefficient of the student loss.
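A weighted combination of student loss and distillation loss can be sketched as below. The specification only states that λ weights the student loss; measuring the soft-target difference with a KL divergence, softening with a `temperature`, and weighting the distillation loss by (1 - λ) are assumptions made for illustration, and all names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Assumed form: KL divergence between softened teacher and student outputs."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(a * math.log((a + 1e-12) / (b + 1e-12)) for a, b in zip(t, s))

def overall_loss(student_loss, distill_loss, lam=0.5):
    """lam weights the student loss; the (1 - lam) weight is an assumption."""
    return lam * student_loss + (1.0 - lam) * distill_loss

teacher = [2.0, -1.0]   # teacher integration module output for one sample
student = [0.5, 0.1]    # student integration module output for the same sample
d_loss = distillation_loss(teacher, student)
total = overall_loss(student_loss=0.7, distill_loss=d_loss, lam=0.6)
```

Setting `lam` to 1.0 recovers training on the student loss alone.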
The above is a detailed description of the method provided by the embodiments of the present specification, and the following describes in detail the apparatus provided by the embodiments of the present specification.
Fig. 10 shows a block diagram of a risk identification apparatus according to an embodiment of the present specification, and as shown in fig. 10, the apparatus 1000 may include: a data acquisition unit 1001 and a risk identification unit 1002. Wherein the main functions of each constituent unit are as follows:
The data acquisition unit 1001 is configured to acquire behavior data to be identified.
The risk identification unit 1002 is configured to input the behavior data to be identified into a risk identification model, and acquire risk information output by the risk identification model for the behavior data to be identified; the risk identification model comprises a routing module, feature extraction modules corresponding to a plurality of different primary keys, and integration modules corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the behavior data to be identified.
The routing module is used for determining a main key combination corresponding to the behavior data to be identified, providing the behavior data to be identified for each feature extraction module corresponding to the main key contained in the main key combination, and determining an integration module corresponding to the main key combination.
Each feature extraction module is respectively corresponding to one of the main keys and is used for extracting feature representation corresponding to the main key from behavior data to be identified.
The integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, and obtaining the risk information output for the behavior data to be identified.
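The cooperation of the routing module, the per-key feature extraction modules and the per-combination integration modules can be sketched as follows. This is an illustrative sketch only: the class `RiskModel`, the rule that the primary key combination is the set of keys present in the input, and the toy extractors and integrators are all hypothetical.

```python
class RiskModel:
    def __init__(self, extractors, integrators):
        self.extractors = extractors    # one feature extraction module per primary key
        self.integrators = integrators  # one integration module per primary key combination

    def route(self, behavior):
        """Routing module (assumed rule): the combination is the set of known keys present."""
        return frozenset(k for k in behavior if k in self.extractors)

    def predict(self, behavior):
        combo = self.route(behavior)
        # Only the extractors for keys in the combination receive the data.
        features = {k: self.extractors[k](behavior[k]) for k in sorted(combo)}
        return self.integrators[combo](features)

def mean_rule(features):
    """Toy integration module for two keys: threshold on the mean feature value."""
    avg = sum(vec[0] for vec in features.values()) / len(features)
    return "high" if avg > 0.5 else "low"

def amount_rule(features):
    """Toy integration module for the amount-only combination."""
    return "high" if features["amount"][0] > 0.9 else "low"

extractors = {
    "active_party": lambda v: [v],
    "amount": lambda v: [min(v / 1000.0, 1.0)],
}
integrators = {
    frozenset({"active_party", "amount"}): mean_rule,
    frozenset({"amount"}): amount_rule,
}
model = RiskModel(extractors, integrators)
result = model.predict({"active_party": 0.8, "amount": 700.0})
```

Data carrying only a transaction amount would be routed to the amount-only integration module, without touching the other extractor.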
The primary key may include at least one of an active party, a passive party, a transaction amount, and a variety of environmental information.
Furthermore, the feature extraction module may further perform preliminary risk prediction using the extracted feature representation to obtain preliminary risk information.
When the integration module corresponding to the primary key combination performs risk prediction, it may further use the preliminary risk information obtained by the feature extraction modules corresponding to the primary keys contained in the primary key combination.
As one possible implementation, the feature extraction module includes: a preprocessing sub-module, an encoding sub-module, and a first classification sub-module.
The preprocessing sub-module is used for performing feature preprocessing on the behavior data to be identified with respect to the primary key corresponding to the feature extraction module.
The encoding sub-module is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key.
The first classification sub-module is used for performing preliminary risk prediction using the feature representation obtained by the encoding sub-module to obtain preliminary risk information.
As one of the realizable modes, the integrated module corresponding to the primary key combination includes: the first integration sub-module, the second integration sub-module, the third integration sub-module and the second classification sub-module.
The first integration submodule is used for carrying out first integration processing on the feature representations extracted by the feature extraction modules corresponding to the main keys contained in the main key combination.
The second integration submodule is used for carrying out second integration processing on the preliminary risk information output by each feature extraction module corresponding to the main key contained in the main key combination.
The third integration sub-module is used for performing a third integration process on the results of the first integration process and the second integration process.
The second classification sub-module is used for carrying out risk prediction by using the result of the third integration processing to obtain risk information output aiming at the behavior data to be identified.
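The data flow through the four sub-modules of an integration module can be illustrated with a small sketch. The specification does not fix the integration operations in this passage, so concatenation for the first and third integration, averaging for the second integration, and a logistic classifier for the second classification sub-module are assumptions; `integrate` and its parameters are hypothetical names.

```python
import math

def integrate(features, prelim_probs, weights, bias):
    """features: per-key feature vectors; prelim_probs: per-key preliminary risk probabilities."""
    # First integration (assumed): concatenate the per-key feature representations.
    first = [x for vec in features for x in vec]
    # Second integration (assumed): average the preliminary risk probabilities.
    second = [sum(prelim_probs) / len(prelim_probs)]
    # Third integration (assumed): concatenate both results.
    third = first + second
    # Second classification sub-module (assumed): logistic classifier on the fused vector.
    score = sum(w * x for w, x in zip(weights, third)) + bias
    prob = 1.0 / (1.0 + math.exp(-score))
    return prob, ("risky" if prob >= 0.5 else "safe")

prob, label = integrate(
    features=[[0.2, 0.4], [0.9]],   # feature representations for two primary keys
    prelim_probs=[0.3, 0.7],        # preliminary risk information from the two extractors
    weights=[1.0, 1.0, 1.0, 1.0],
    bias=-1.0,
)
```

The fused vector has one entry per feature dimension plus one entry for the integrated preliminary risk, so `weights` must have the same length.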
FIG. 11 shows a block diagram of an apparatus for training a risk identification model according to one embodiment of the present disclosure, as shown in FIG. 11, the apparatus 1100 may include: the sample acquisition unit 1101 and the model training unit 1102 may further comprise a pre-training unit 1103. Wherein the main functions of each constituent unit are as follows:
The sample acquisition unit 1101 is configured to acquire first training data including a plurality of first training samples, the first training samples including first behavior data samples and risk labels marked for the first behavior data samples.
A model training unit 1102 configured to train a risk identification model using the first training data; the risk identification model comprises a routing module, a feature extraction module corresponding to a plurality of different primary keys and an integration module corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of a second behavior data sample.
The routing module is used for determining a primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting a feature representation corresponding to the primary key from the first behavior data sample;
the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, and obtaining risk information output for the first behavior data sample.
The training objective includes: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label marked for the first behavior data sample in the first training sample.
As one of the realizable modes, the feature extraction module may further utilize the extracted feature representation to perform preliminary risk prediction, so as to obtain preliminary risk information.
When the integration module corresponding to the primary key combination performs risk prediction, it may further use the preliminary risk information obtained by the feature extraction modules corresponding to the primary keys contained in the primary key combination.
As one possible implementation, the feature extraction module includes: a preprocessing sub-module, an encoding sub-module, and a first classification sub-module.
The preprocessing sub-module is used for performing feature preprocessing on the first behavior data sample with respect to the primary key corresponding to the feature extraction module.
The encoding submodule is used for encoding the feature data obtained after preprocessing to obtain the feature representation corresponding to the primary key.
The first classification submodule is used for carrying out preliminary risk prediction by utilizing the characteristic representation obtained by the coding submodule to obtain preliminary risk information.
As one of the realizable modes, the integrated module corresponding to the primary key combination includes: the first integration sub-module, the second integration sub-module, the third integration sub-module and the second classification sub-module.
The first integration submodule is used for carrying out first integration processing on the feature representations extracted by the feature extraction modules corresponding to the main keys contained in the main key combination.
The second integration submodule is used for carrying out second integration processing on the preliminary risk information output by each feature extraction module corresponding to the main key contained in the main key combination.
The third integration sub-module is used for performing a third integration process on the results of the first integration process and the second integration process.
The second classification sub-module is used for performing risk prediction using the result of the third integration processing to obtain a risk probability value output for the first behavior data sample.
As one of the realizations, the sample acquisition unit 1101 is further configured to: second training data comprising a plurality of second training samples is obtained, the second training samples comprising second behavior data samples and risk tags labeled for the second behavior data samples.
The pre-training unit 1103 is configured to pre-train the feature extraction module with the second training data, wherein the second behavior data sample is used as the input of both the feature extraction module and a feature extraction module copy, and the preliminary risk information obtained by the feature extraction module and by the feature extraction module copy for the second behavior data sample is acquired; the feature extraction module and the feature extraction module copy have the same structure but different initialization parameters; the pre-training objective includes: minimizing the difference between the preliminary risk information obtained by the feature extraction module and the risk label marked for the second behavior data sample, minimizing the difference between the preliminary risk information obtained by the feature extraction module copy and the risk label marked for the second behavior data sample, and minimizing the output distribution divergence between the feature extraction module and the feature extraction module copy; and the feature extraction module copy is removed after the pre-training is finished.
The training of the risk recognition model by the model training unit 1102 using the first training data is a further training based on the parameters of the feature extraction module obtained by the pre-training.
As one of the realizable modes, the model training unit 1102 may set an auxiliary classification layer before the second classification sub-module in each integrated module, where the auxiliary classification layer includes a plurality of auxiliary classifiers;
each auxiliary classifier carries out risk prediction by utilizing the result of the third integration processing to obtain risk information, and the auxiliary classifier and the second classification submodule adopt the same structure but different initialization parameters;
the training goals also include at least one of:
taking the auxiliary classifier with the smallest corresponding loss function as the teacher network and the remaining auxiliary classifiers as student networks, and minimizing the difference between the risk information output by the student networks and the risk information output by the teacher network,
minimizing the difference between the risk information obtained by the prediction after integrating the risk probability values output by the auxiliary classifiers by the second classification sub-module and the risk information output by the teacher network,
maximizing the difference between the risk probability values output by the auxiliary classifiers;
And after training, removing the auxiliary classification layer.
As one possible implementation, the model training unit 1102 may use the integration module with the smallest corresponding loss function as the teacher network; the routing module inputs the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination into the teacher network, and the teacher network performs risk prediction using the input feature representations to obtain a risk probability value output for the first behavior data sample; the integration module corresponding to the primary key combination is used as the student network, and the risk information it outputs includes a risk probability value. In this case, the training objective further includes: minimizing the difference between the risk probability value output by the teacher network and the risk probability value output by the student network.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
Embodiments of the present specification also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.
And an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.
The present description embodiment also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.
The memory may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
From the foregoing description of embodiments, it will be apparent to those skilled in the art that the present embodiments may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in essence or what contributes to the prior art in the form of a computer program product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present specification.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the invention.

Claims (15)

1. A risk identification method, the method comprising:
acquiring behavior data to be identified;
inputting the behavior data to be identified into a risk identification model, and acquiring risk information output by the risk identification model aiming at the behavior data to be identified; the risk identification model comprises a routing module, a feature extraction module corresponding to a plurality of different primary keys and an integration module corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the behavior data to be identified;
the routing module is used for determining a main key combination corresponding to the behavior data to be identified, providing the behavior data to be identified for each feature extraction module corresponding to the main key contained in the main key combination, and determining an integration module corresponding to the main key combination;
each feature extraction module is respectively corresponding to one of the main keys and is used for extracting feature representation corresponding to the main key from the behavior data to be identified;
and the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain the risk information output for the behavior data to be identified.
2. The method of claim 1, wherein the primary key comprises at least one of a behavioural active party, a behavioural passive party, a transaction amount, and a variety of environmental information.
3. The method of claim 1, wherein the feature extraction module further performs preliminary risk prediction using the extracted feature representation to obtain preliminary risk information;
and when the integrated module corresponding to the main key combination carries out risk prediction, the integrated module further utilizes the preliminary risk information obtained by each feature extraction module corresponding to the main key contained in the main key combination.
4. A method according to claim 3, wherein the feature extraction module comprises: a preprocessing sub-module, an encoding sub-module and a first classification sub-module;
the preprocessing sub-module is used for performing feature preprocessing on the behavior data to be identified with respect to the primary key corresponding to the feature extraction module;
the encoding submodule is used for encoding the feature data obtained after preprocessing to obtain feature representation corresponding to the main key;
the first classification submodule is used for carrying out preliminary risk prediction by utilizing the characteristic representation obtained by the coding submodule to obtain preliminary risk information.
5. The method of claim 3, wherein the integrated module corresponding to the primary key combination comprises: the first integration sub-module, the second integration sub-module, the third integration sub-module and the second classification sub-module;
the first integration submodule is used for carrying out first integration processing on the feature representations extracted by the feature extraction modules corresponding to the main keys contained in the main key combination;
the second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to the primary key contained in the primary key combination;
the third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing;
and the second classification sub-module is used for carrying out risk prediction by utilizing the result of the third integration processing to obtain risk information output aiming at the behavior data to be identified.
6. A method of training a risk identification model, the method comprising:
acquiring first training data comprising a plurality of first training samples, wherein the first training samples comprise first behavior data samples and risk labels marked for the first behavior data samples;
Training a risk identification model using the first training data; the risk identification model comprises a routing module, a feature extraction module corresponding to a plurality of different primary keys and an integration module corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the second behavior data sample;
the routing module is used for determining a primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting a feature representation corresponding to the primary key from the first behavior data sample;
the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain risk information output for the first behavior data sample;
the training objective includes: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label marked for the first behavior data sample in the first training sample.
7. The method of claim 6, wherein the feature extraction module further performs preliminary risk prediction using the extracted feature representation to obtain preliminary risk information;
and when the integrated module corresponding to the main key combination carries out risk prediction, the integrated module further utilizes the preliminary risk information obtained by each feature extraction module corresponding to the main key contained in the main key combination.
8. The method of claim 7, wherein the feature extraction module comprises: a preprocessing sub-module, an encoding sub-module and a first classification sub-module;
the preprocessing sub-module is used for performing feature preprocessing on the first behavior data sample with respect to the primary key corresponding to the feature extraction module;
the encoding submodule is used for encoding the feature data obtained after preprocessing to obtain feature representation corresponding to the main key;
the first classification submodule is used for carrying out preliminary risk prediction by utilizing the characteristic representation obtained by the coding submodule to obtain preliminary risk information.
9. The method of claim 7, wherein the integration module corresponding to the primary key combination comprises: a first integration sub-module, a second integration sub-module, a third integration sub-module and a second classification sub-module;
the first integration sub-module is used for performing first integration processing on the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination;
the second integration sub-module is used for performing second integration processing on the preliminary risk information output by each feature extraction module corresponding to a primary key contained in the primary key combination;
the third integration sub-module is used for performing third integration processing on the results of the first integration processing and the second integration processing;
and the second classification sub-module is used for performing risk prediction using the result of the third integration processing, to obtain a risk probability value output for the first behavior data sample.
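The data flow recited in claims 8 and 9 can be illustrated with a minimal sketch. The claims do not fix any concrete operation, so everything specific here is an assumption made for illustration: max-absolute-value scaling as the preprocessing, a single linear layer with tanh as the encoder, logistic classifiers, concatenation as the first and third integration, and averaging as the second integration. The names `extract`, `integrate`, `enc_w` and `cls_w` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature representation, preliminary risk information)
ModuleOutput = Tuple[List[float], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def extract(features: Sequence[float],
            enc_w: Sequence[Sequence[float]],
            cls_w: Sequence[float]) -> ModuleOutput:
    """Feature extraction module for one primary key (claim 8)."""
    # Preprocessing sub-module (assumed): scale by the largest absolute value.
    scale = max((abs(v) for v in features), default=0.0) or 1.0
    x = [v / scale for v in features]
    # Encoding sub-module (assumed): one linear layer followed by tanh.
    rep = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in enc_w]
    # First classification sub-module (assumed): logistic score on the encoding.
    prelim = sigmoid(sum(w * r for w, r in zip(cls_w, rep)))
    return rep, prelim


def integrate(outputs: Sequence[ModuleOutput],
              cls_w: Sequence[float],
              bias: float = 0.0) -> float:
    """Integration module for one primary key combination (claim 9)."""
    # First integration (assumed): concatenate the feature representations.
    reps = [r for rep, _ in outputs for r in rep]
    # Second integration (assumed): average the preliminary risk values.
    prelim = sum(p for _, p in outputs) / len(outputs)
    # Third integration (assumed): concatenate both intermediate results.
    fused = reps + [prelim]
    # Second classification sub-module: risk probability value.
    return sigmoid(sum(w * f for w, f in zip(cls_w, fused)) + bias)
```

In this sketch the classifier weight vector passed to `integrate` must have one entry per concatenated representation element plus one for the averaged preliminary risk.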
10. The method of claim 7 or 8, further comprising pre-training each feature extraction module;
wherein training the risk identification model using the first training data is further training based on the parameters of the feature extraction modules obtained by the pre-training;
wherein the pre-training comprises: acquiring second training data comprising a plurality of second training samples, each second training sample comprising a second behavior data sample and a risk label annotated for the second behavior data sample;
pre-training a feature extraction module using the second training data, wherein the second behavior data sample is used as input to both the feature extraction module and a copy of the feature extraction module, each of which outputs preliminary risk information for the second behavior data sample, and the feature extraction module and the copy have the same structure but different initialization parameters;
the objectives of the pre-training include: minimizing the difference between the preliminary risk information obtained by the feature extraction module and the risk label annotated for the second behavior data sample, minimizing the difference between the preliminary risk information obtained by the copy and the risk label annotated for the second behavior data sample, and minimizing the divergence between the output distributions of the feature extraction module and the copy;
and removing the copy of the feature extraction module after the pre-training is finished.
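The three pre-training objectives of claim 10 can be combined into one scalar loss, as in the sketch below. The claim only speaks of a "difference" and an "output distribution divergence"; binary cross-entropy for the two supervised terms, a symmetric KL divergence between Bernoulli outputs for the third term, and the weighting factor `alpha` are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def _clamp(p: float, eps: float = 1e-7) -> float:
    # Keep probabilities away from 0 and 1 so the logarithms stay finite.
    return min(max(p, eps), 1.0 - eps)


def bce(p: float, y: float) -> float:
    """Binary cross-entropy between predicted risk p and label y."""
    p = _clamp(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q))."""
    p, q = _clamp(p), _clamp(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def pretrain_loss(p_module: float, p_copy: float, label: float,
                  alpha: float = 1.0) -> float:
    """Loss for one second behavior data sample.

    p_module / p_copy: preliminary risk from the module and from its copy.
    alpha: assumed weight of the divergence term (not given in the claims).
    """
    supervised = bce(p_module, label) + bce(p_copy, label)
    # Symmetric KL is one possible divergence; the claim does not name one.
    divergence = 0.5 * (bernoulli_kl(p_module, p_copy)
                        + bernoulli_kl(p_copy, p_module))
    return supervised + alpha * divergence
```

When both networks agree exactly, the divergence term vanishes and only the two supervised terms remain; after pre-training only the parameters of the original module would be kept.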
11. The method of claim 9, wherein an auxiliary classification layer is provided before the second classification sub-module in each integration module, the auxiliary classification layer comprising a plurality of auxiliary classifiers;
each auxiliary classifier performs risk prediction using the result of the third integration processing to obtain risk information, and the auxiliary classifiers and the second classification sub-module have the same structure but different initialization parameters;
the training objectives further include at least one of:
taking the auxiliary classifier with the smallest corresponding loss function as a teacher network and the remaining auxiliary classifiers as student networks, and minimizing the difference between the risk information output by the student networks and the risk information output by the teacher network,
minimizing the difference between the risk information predicted by the second classification sub-module after integrating the risk probability values output by the auxiliary classifiers and the risk information output by the teacher network,
maximizing the difference between the risk probability values output by the auxiliary classifiers;
and after training, removing the auxiliary classification layer.
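The auxiliary objectives of claim 11 can be sketched for a single sample as follows. The claim does not say how the "difference" terms are measured, so the squared difference for the two distillation terms and the mean pairwise absolute difference for the diversity term are assumptions, as is the use of binary cross-entropy to pick the teacher. The function name `auxiliary_objectives` and its return layout are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy, used here to rank the auxiliary classifiers."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def auxiliary_objectives(aux_probs: Sequence[float], label: float,
                         final_prob: float) -> Tuple[int, float, float, float]:
    """Return (teacher index, student term, final term, diversity term).

    aux_probs: risk probabilities output by the auxiliary classifiers.
    final_prob: risk probability predicted by the second classification
    sub-module after integrating the auxiliary outputs.
    """
    losses = [bce(p, label) for p in aux_probs]
    # The auxiliary classifier with the smallest loss acts as the teacher.
    teacher = min(range(len(aux_probs)), key=losses.__getitem__)
    t = aux_probs[teacher]
    # Term to minimize: students should match the teacher (assumed squared gap).
    student_term = sum((p - t) ** 2
                       for i, p in enumerate(aux_probs) if i != teacher)
    # Term to minimize: the final prediction should match the teacher.
    final_term = (final_prob - t) ** 2
    # Term to maximize: the auxiliary outputs should differ from each other.
    pairs = [(a, b) for i, a in enumerate(aux_probs) for b in aux_probs[i + 1:]]
    diversity = sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return teacher, student_term, final_term, diversity
```

A combined loss could add the first two terms and subtract the diversity term; the relative weights are left open by the claim.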
12. The method according to any one of claims 6 to 9, wherein the integration module with the smallest corresponding loss function is taken as a teacher network; the routing module inputs the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination into the teacher network, and the teacher network performs risk prediction using the input feature representations to obtain a risk probability value output for the first behavior data sample; the integration module corresponding to the primary key combination is taken as a student network, and the risk information it outputs comprises a risk probability value;
the training objectives further include: minimizing the difference between the risk probability value output by the teacher network and the risk probability value output by the student network.
13. A risk identification device, the device comprising:
a data acquisition unit configured to acquire behavior data to be identified;
a risk identification unit configured to input the behavior data to be identified into a risk identification model and acquire risk information output by the risk identification model for the behavior data to be identified; the risk identification model comprises a routing module, feature extraction modules respectively corresponding to a plurality of different primary keys, and integration modules respectively corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the behavior data to be identified;
the routing module is used for determining the primary key combination corresponding to the behavior data to be identified, providing the behavior data to be identified to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting a feature representation corresponding to that primary key from the behavior data to be identified;
and the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain risk information output for the behavior data to be identified.
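The routing behavior recited in claim 13 can be pictured as a dispatch table from primary key combinations to integration modules. The sketch below is illustrative only: the claims do not say how the routing module determines the combination, so treating it as the set of known primary keys with a non-null value in the record is an assumption, and `RiskModel`, `route`, `predict` and the example key names `user_id` and `device_id` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Mapping

Record = Mapping[str, object]
Extractor = Callable[[Record], List[float]]
Integrator = Callable[[List[List[float]]], float]


class RiskModel:
    """Routing module plus per-key extractors and per-combination integrators."""

    def __init__(self, extractors: Dict[str, Extractor],
                 integrators: Dict[FrozenSet[str], Integrator]) -> None:
        self.extractors = extractors
        self.integrators = integrators

    def route(self, record: Record) -> FrozenSet[str]:
        # Assumed rule: a primary key takes part if the record has a value for it.
        return frozenset(k for k in self.extractors
                         if record.get(k) is not None)

    def predict(self, record: Record) -> float:
        combo = self.route(record)
        if combo not in self.integrators:
            raise KeyError(f"no integration module for {sorted(combo)}")
        # Only the extractors of the routed primary keys see the record.
        reps = [self.extractors[k](record) for k in sorted(combo)]
        return self.integrators[combo](reps)
```

A record that carries only `user_id` would thus be handled by the integration module registered for `frozenset({"user_id"})`, while one carrying both keys would be handled by the module registered for the two-key combination.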
14. An apparatus for training a risk identification model, the apparatus comprising:
a sample acquisition unit configured to acquire first training data comprising a plurality of first training samples, each first training sample comprising a first behavior data sample and a risk label annotated for the first behavior data sample;
a model training unit configured to train a risk identification model using the first training data; the risk identification model comprises a routing module, feature extraction modules respectively corresponding to a plurality of different primary keys, and integration modules respectively corresponding to a plurality of different primary key combinations, wherein the primary keys are behavior description items of the first behavior data sample;
the routing module is used for determining the primary key combination corresponding to the first behavior data sample, providing the first behavior data sample to each feature extraction module corresponding to a primary key contained in the primary key combination, and determining the integration module corresponding to the primary key combination;
each feature extraction module corresponds to one of the primary keys and is used for extracting a feature representation corresponding to that primary key from the first behavior data sample;
the integration module corresponding to the primary key combination is used for performing risk prediction using the feature representations extracted by the feature extraction modules corresponding to the primary keys contained in the primary key combination, to obtain risk information output for the first behavior data sample;
the training objectives include: minimizing the difference between the risk information output by the risk identification model for the first behavior data sample and the risk label annotated for the first behavior data sample in the first training sample.
15. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1 to 12.
CN202310624324.1A 2023-05-30 2023-05-30 Risk identification method, risk identification model training method and corresponding device Pending CN116595486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310624324.1A CN116595486A (en) 2023-05-30 2023-05-30 Risk identification method, risk identification model training method and corresponding device

Publications (1)

Publication Number Publication Date
CN116595486A true CN116595486A (en) 2023-08-15

Family

ID=87589701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310624324.1A Pending CN116595486A (en) 2023-05-30 2023-05-30 Risk identification method, risk identification model training method and corresponding device

Country Status (1)

Country Link
CN (1) CN116595486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349346A (en) * 2023-12-05 2024-01-05 深圳市威诺达工业技术有限公司 Method for identifying main key and external key in relational database table
CN117349346B (en) * 2023-12-05 2024-03-26 深圳市威诺达工业技术有限公司 Method for identifying main key and external key in relational database table

Similar Documents

Publication Publication Date Title
Save et al. A novel idea for credit card fraud detection using decision tree
CN112700252B (en) Information security detection method and device, electronic equipment and storage medium
CN111371767B (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
CN109831459B (en) Method, device, storage medium and terminal equipment for secure access
CN112927061B (en) User operation detection method and program product
CN113011884A (en) Account feature extraction method, device and equipment and readable storage medium
CN116595486A (en) Risk identification method, risk identification model training method and corresponding device
CN115204886A (en) Account identification method and device, electronic equipment and storage medium
CN112487284A (en) Bank customer portrait generation method, equipment, storage medium and device
CN112819024A (en) Model processing method, user data processing method and device and computer equipment
CN115293235A (en) Method for establishing risk identification model and corresponding device
CN115935265B (en) Method for training risk identification model, risk identification method and corresponding device
CN113935738A (en) Transaction data processing method, device, storage medium and equipment
CN116823428A (en) Anti-fraud detection method, device, equipment and storage medium
CN116522131A (en) Object representation method, device, electronic equipment and computer readable storage medium
CN116245645A (en) Financial crime partner detection method based on graph neural network
CN115630147A (en) Response method, response device, electronic equipment and storage medium
Vaishnaw et al. Development of anti-phishing model for classification of phishing e-mail
CN111126503B (en) Training sample generation method and device
CN113468540A (en) Security portrait processing method based on network security big data and network security system
Sun et al. Image steganalysis based on convolutional neural network and feature selection
Sivanantham et al. Web Hazard Identification and Detection Using Deep Learning-A Comparative Study
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN112966122A (en) Corpus intention identification method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination