CN117372021A - Training method of risk identification model, risk identification method and device - Google Patents

Training method of risk identification model, risk identification method and device

Info

Publication number
CN117372021A
Authority
CN
China
Prior art keywords
wind control
risk identification
data
sample
control data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311228697.3A
Other languages
Chinese (zh)
Inventor
杨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311228697.3A
Publication of CN117372021A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00 Data types
    • G06F2123/02 Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a training method of a risk identification model, comprising: obtaining wind control sample data; splitting the wind control sample data into sub-sample sequences according to a preset time granularity; inputting the sub-sample sequence into an encoder to obtain a first sample feature vector; arranging the first sample feature vectors according to a time sequence to obtain second sample feature vectors; inputting the second sample feature vector into a classifier; determining a loss function based on the prediction result of the classifier and the real label of the wind control sample data; updating parameters of the encoder and the classifier based on the loss function; and determining the risk identification model based on the encoder and the classifier. A corresponding risk identification method and risk identification device are also disclosed.

Description

Training method of risk identification model, risk identification method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a risk identification model training method, a risk identification method, and a risk identification device.
Background
Wind control (that is, risk control) identification refers to analyzing wind control data to judge whether a risk exists or what type of risk is present. Wind control data consists of a series of wind control events, such as login, check-in, payment, transfer, red envelope and various other events; that is, wind control data is long time-series data. Currently, neural networks are commonly used to identify wind control data. Because wind control data depends strongly on temporal order and is large in volume, a trained risk identification model usually occupies a large amount of space and is slow to train and compute; moreover, when the model is used online, it is difficult to keep all historical data in storage that can be read in real time.
Disclosure of Invention
One or more embodiments of the present disclosure provide a training method of a risk identification model, a risk identification method, and corresponding devices, which can at least partially overcome the above technical problems.
According to a first aspect, there is provided a training method of a risk identification model, comprising:
acquiring wind control sample data;
splitting the wind control sample data into sub-sample sequences according to a preset time granularity;
inputting the sub-sample sequence into an encoder to obtain a first sample feature vector;
arranging the first sample feature vectors according to a time sequence to obtain second sample feature vectors;
inputting the second sample feature vector into a classifier;
determining a loss function based on the prediction result of the classifier and the real label of the wind control sample data;
updating parameters of the encoder and the classifier based on the loss function;
determining the risk identification model based on the encoder and the classifier.
According to a second aspect, there is provided a risk identification method comprising:
acquiring wind control data to be identified;
inputting the wind control data into a risk identification model to obtain a prediction result of the risk identification model for the wind control data; the risk identification model is obtained by training in advance by adopting the training method of the risk identification model;
and determining a risk identification result of the wind control data based on the prediction result of the risk identification model.
As an alternative embodiment of the method of the second aspect, the method further comprises:
for a preset wind control scene, collecting historical wind control data of the wind control scene;
splitting the historical wind control data into subsequences according to a preset time granularity;
inputting the subsequence into a pre-trained encoder to obtain a historical sub-feature vector of the wind control scene;
and storing the historical sub-feature vectors of the wind control scene.
Correspondingly, acquiring wind control data to be identified specifically comprises:
for a target wind control scene, acquiring current wind control data of the target wind control scene based on a preset time window, wherein the length of the current wind control data of the target wind control scene is equal to the length of the subsequence.
Specifically, inputting the wind control data into the risk identification model to obtain the prediction result of the risk identification model for the wind control data comprises:
inputting the wind control data into the encoder to obtain a first feature vector;
acquiring a historical sub-feature vector of a target wind control scene based on the target wind control scene to which the wind control data belong;
arranging the first feature vector and the historical sub-feature vector of the target wind control scene according to a time sequence to obtain a second feature vector;
and inputting the second feature vector into a pre-trained classifier to obtain the prediction result.
According to a third aspect, there is provided a training device of a risk identification model, comprising:
the first data acquisition module is configured to acquire wind control sample data;
the first data processing module is configured to split the wind control sample data into sub-sample sequences according to a preset time granularity;
the training module is configured to input the sub-sample sequence into an encoder to obtain a first sample feature vector; arrange the first sample feature vectors according to a time sequence to obtain second sample feature vectors; input the second sample feature vector into a classifier; determine a loss function based on the prediction result of the classifier and the real label of the wind control sample data; and update parameters of the encoder and the classifier based on the loss function.
According to a fourth aspect, there is provided a risk identification device comprising:
the second data acquisition module is configured to acquire wind control data to be identified;
the risk identification module is configured to input the wind control data into a risk identification model to obtain a prediction result of the risk identification model for the wind control data, and to determine a risk identification result of the wind control data based on the prediction result of the risk identification model; the risk identification model is obtained by training in advance by adopting the training method of the risk identification model.
According to a fifth aspect, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the training method of the risk identification model described above.
According to a sixth aspect, there is provided an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the training method of the risk identification model described above.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the risk identification methods described above.
According to an eighth aspect, there is provided an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of any of the risk identification methods described above.
The training method of the risk identification model has the beneficial effect that a lightweight risk identification model can be obtained and used to analyze long-sequence wind control data to obtain a risk identification result. In this method, wind control sample data forming a long time series is divided into sub-sample sequences, each sub-sample sequence is feature-encoded separately, and all the resulting feature vectors are arranged in time order and then input into the classifier. The classifier is thus freed from limits on sequence length and time span, the computation of the classifier and the training of the risk identification model are accelerated, the computational complexity is reduced, and the scale required by the model is further reduced, yielding a lightweight risk identification model. The trained risk identification model can be applied to the risk identification method in the embodiments of this specification: by storing historical wind control data in advance as feature vectors, the method avoids storing all the historical wind control data, which reduces the required storage space, lowers the computation cost, and speeds up risk identification.
The training device of the risk identification model and the risk identification device described in the embodiments of the present disclosure have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present description, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 schematically shows a flow chart of a training method of a risk identification model according to an embodiment of the present disclosure in one implementation.
Fig. 2 schematically illustrates a training method of a risk identification model according to an embodiment of the present disclosure in one scenario.
Fig. 3 schematically shows a flow chart of a risk identification method according to an embodiment of the present disclosure in one implementation.
Fig. 4 schematically illustrates a risk identification method according to an embodiment of the present disclosure in one scenario.
Fig. 5 schematically shows a block diagram of a training device of a risk identification model according to an embodiment of the present disclosure in one implementation.
Fig. 6 schematically shows a block diagram of a risk identification device according to an embodiment of the present disclosure in one implementation.
Fig. 7 schematically illustrates a risk identification system according to an embodiment of the present disclosure.
Fig. 8 schematically illustrates another risk identification system according to an embodiment of the present disclosure.
Fig. 9 exemplarily shows a block diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
It is first noted that the terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Wind control identification refers to analyzing wind control data to judge whether a risk exists or what type of risk is present. Wind control data consists of a series of wind control events, such as login, check-in, payment, transfer, red envelope and various other events; that is, wind control data is long time-series data. When a traditional machine learning model is used for risk identification, various statistics are usually computed over the sequence data, such as the number of transactions in the last 7 days, to obtain a number of features, which are then used as the input of the risk identification model. However, the statistics differ greatly between risk scenarios, so designing the calculation logic of these features requires extensive experience with the specific scenario; and even with sufficient experience, some considerations are inevitably missed, making it difficult to fully mine the information in the data.
Currently, neural networks are commonly used to identify wind control data, such as recurrent neural networks (RNN), long short-term memory networks (LSTM), convolutional neural networks (CNN), and self-attention networks. A neural network can directly consume data of multiple modalities, such as sequence data, and can automatically extract key information through training. This approach does not rely on manual experience: when the risk scenario changes, the neural network only needs to be retrained; and because the original sequence data is input directly, no information is lost to manual processing and the information is fully utilized. However, because computing resources are limited, the sequence length and time span are usually restricted, for example to at most the last N days of sequence data and at most M entries in the sequence. Yet previous studies of risk features show that features covering long time spans, such as some features over the past 90 days, are often highly important.
Retaining long-sequence historical information is therefore necessary for risk identification. However, because wind control data depends strongly on temporal order and is large in volume, a trained risk identification model occupies a large amount of space and is slow to train and compute; it is also difficult to keep all historical data in storage that can be read in real time when the model is used online.
In view of this, it is desirable to obtain a training method of a risk identification model capable of performing risk identification using long-sequence data with low resource consumption.
The training method of the risk identification model, the risk identification method and the devices described in the embodiments of the present specification are described in further detail below with reference to the accompanying drawings and specific embodiments; this detailed description does not limit the embodiments of the present specification.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
In one embodiment of the present specification, a method for training a risk identification model is presented. Fig. 1 schematically shows a flow chart of a training method of a risk identification model according to an embodiment of the present disclosure in an implementation manner.
As shown in fig. 1, the method includes:
S100: acquiring wind control sample data.
S102: splitting the wind control sample data into sub-sample sequences according to a preset time granularity.
S104: inputting the sub-sample sequence into an encoder to obtain a first sample feature vector.
S106: arranging the first sample feature vectors according to a time sequence to obtain second sample feature vectors.
S108: inputting the second sample feature vector into a classifier.
S110: determining a loss function based on the prediction result of the classifier and the real label of the wind control sample data.
S112: updating parameters of the encoder and the classifier based on the loss function.
S114: determining the risk identification model based on the encoder and the classifier.
The wind control sample data consists of a series of wind control events, such as login, check-in, payment, transfer, red envelope and various other events, and is long time-series data. Specifically, wind control sample data is first obtained according to a preset sequence length and a preset time span. The longer the sequence length or time span, the larger the amount of data input into the risk identification model, the more useful information the wind control sample data contains, and the better the extracted features represent the risk of the sample data. The wind control sample data is then divided according to a preset time granularity; for example, taking the day as the unit of time granularity and counting back from the current day, the wind control sample data of each day can be divided into one sub-sample sequence. This reduces the computational complexity of model training and keeps the scale of the trained risk identification model under control, thereby reducing resource consumption.
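The day-granularity split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_day`, the `(timestamp, event_name)` tuple layout, and the sample events are all assumptions made for the example.

```python
from datetime import datetime


def split_by_day(events, now, n_days):
    """Split (timestamp, event_name) pairs into n_days daily sub-sample sequences.

    Index 0 holds the current day, index k holds the day k days before it.
    Events outside the n_days span, or later than `now`, are dropped.
    """
    buckets = [[] for _ in range(n_days)]
    for ts, name in sorted(events):  # keep events in chronological order per day
        days_back = (now.date() - ts.date()).days
        if 0 <= days_back < n_days and ts <= now:
            buckets[days_back].append((ts, name))
    return buckets


# Hypothetical wind control events for one account.
now = datetime(2023, 9, 20, 15, 0)
events = [
    (datetime(2023, 9, 20, 9, 30), "login"),
    (datetime(2023, 9, 20, 9, 31), "payment"),
    (datetime(2023, 9, 19, 22, 5), "transfer"),
    (datetime(2023, 9, 18, 8, 0), "check-in"),
    (datetime(2023, 9, 10, 8, 0), "login"),  # outside a 3-day span
]
sub_sequences = split_by_day(events, now, n_days=3)
```

Each element of `sub_sequences` is one sub-sample sequence that would be passed to the encoder separately.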
An encoder extracts the features of each sub-sample sequence, yielding first sample feature vectors divided according to the time granularity. Optionally, the encoder may be built on a neural network structure, including but not limited to RNN, LSTM, CNN, and self-attention networks.
Because wind control data is time-dependent, the first sample feature vectors need to be arranged in time order to obtain the second sample feature vector before risk identification is performed. The time order of the first sample feature vectors corresponds to the time order of the sub-sample sequences split according to the preset time granularity. Specifically, the sub-sample sequences obtained after splitting follow a certain time order, and the corresponding first sample feature vectors are arranged in that order. For example, taking the day as the unit of time granularity, the wind control sample data of each of the previous 90 days is divided into one sub-sample sequence, and feature encoding yields a first sample feature vector for each of those 90 days; following the time order of the sub-sample sequences, the first sample feature vectors can then be arranged from the 90th day before to the 1st day before.
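As a small illustration of the arrangement step, the sketch below orders per-day feature vectors from the oldest day to the most recent one. The dictionary keyed by "days before the current day", the function name, and the toy vectors are assumptions for the example; representing the second sample feature vector as an ordered list of per-day vectors is also an assumption.

```python
def arrange_in_time_order(vectors_by_days_back):
    """Return feature vectors ordered from the oldest day to the most recent."""
    return [vectors_by_days_back[d] for d in sorted(vectors_by_days_back, reverse=True)]


# Keys are "days before the current day"; values are toy 2-d feature vectors.
first_vectors = {1: [0.1, 0.9], 3: [0.5, 0.5], 2: [0.3, 0.7]}
second_vector = arrange_in_time_order(first_vectors)
```

Here the vector for 3 days before comes first and the vector for 1 day before comes last, mirroring the "90th day before to 1st day before" ordering in the example above.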
The arranged features, namely the second sample feature vector, are input into a classifier for risk identification to obtain a risk prediction result, for example whether a risk exists; in other embodiments, the specific risk categories contained in the sample data, such as theft risk and fraud risk, can also be predicted. Optionally, the classifier may also be built on a neural network structure, including but not limited to RNN, LSTM, CNN, and self-attention networks. The real label of the wind control sample data serves as the supervision signal for model training; it marks the real risk condition of the wind control sample data (such as whether a risk exists, or the risk type) and is used to measure the accuracy of the classifier's prediction result. Specifically, the loss function may be determined from the difference between the prediction result of the classifier and the real label of the wind control sample data, for example by calculating the cross entropy between the two; the parameters of the encoder and the classifier are then updated with the goal of minimizing that difference. After several rounds of training, training stops when the end condition is reached (for example, the loss function value falls below a preset threshold), and the risk identification model is thereby determined.
The training method of the risk identification model described in the embodiments of the present specification is described in detail below in a specific scenario with reference to the accompanying drawings; this detailed description does not limit the embodiments of the present specification.
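For the two-class case (risk or no risk), the cross entropy mentioned above can be written as below. The symbols are introduced here for illustration only: N is the number of training samples, y_i is the real label of sample i (1 for risk, 0 otherwise), p_i is the classifier's predicted risk probability, θ collects the parameters of the encoder and the classifier, and η is a learning rate. The source does not name an optimizer, so the plain gradient-descent update is an assumption.

```latex
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Big],
\qquad
\theta \leftarrow \theta - \eta\, \nabla_{\theta}\mathcal{L}(\theta)
```

Training would stop once the loss falls below the preset threshold mentioned in the text.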
Fig. 2 schematically illustrates a training method of a risk identification model according to an embodiment of the present disclosure in one scenario.
As shown in fig. 2, assume that n days of wind control sample data are required in total and that sub-sample sequences are divided in units of days; the wind control sample data then consists of the wind control data of the current day and the wind control data of each of the previous n-1 days. The wind control data of the current day is the sequence of events of the current day up to the event marked by the real label, and is denoted s1. The wind control data of each of the previous n-1 days is the event sequence formed by all wind control events of that day; the wind control data of the previous day is denoted s2 and, by analogy, the wind control data of the day n-1 days before is denoted sn. The wind control sample data is thus split into n sub-sample sequences s1, s2, …, sn.
In fig. 2, m1 denotes the encoder; the same encoder structure, such as a neural network, may be used to process the different sub-sample sequences. The input of m1 is the wind control data of one day, namely a sub-sample sequence, and its output is the corresponding first sample feature vector, which represents the risk condition of that day. The first sample feature vectors are denoted v1, v2, …, vn.
m2 denotes a neural-network-based classifier. Its input is the first sample feature vectors v1, v2, …, vn arranged in time order, and its output is a risk prediction score or a representation, from which a risk prediction result, such as whether a risk exists or the risk type, is obtained. Next, the cross entropy between the prediction result of the classifier and the real label of the wind control sample data is calculated to determine the loss function, and the parameters of the encoder m1 and the classifier m2 are updated with the goal of minimizing the loss function.
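The joint update of m1 and m2 can be illustrated with a deliberately tiny, pure-Python sketch. Everything concrete here is an assumption made for the example: each day's sub-sample sequence is pre-summarised as a vector of event counts, m1 is a single shared tanh unit, m2 is a logistic classifier over the n per-day outputs, and the optimizer is per-sample gradient descent. The source allows RNN, LSTM, CNN or self-attention structures for both parts, which this sketch does not attempt to reproduce.

```python
import math
import random


def encode(W, x):
    """m1: shared encoder mapping one day's event counts to a scalar feature."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))


def predict(W, u, b, days):
    """m2: logistic classifier over the per-day encoder outputs v1..vn."""
    v = [encode(W, x) for x in days]
    z = sum(ui * vi for ui, vi in zip(u, v)) + b
    return 1.0 / (1.0 + math.exp(-z)), v


def train_step(W, u, b, days, y, lr=0.1):
    """One gradient step on the cross entropy; updates W and u in place."""
    p, v = predict(W, u, b, days)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    dz = p - y                       # gradient of the loss w.r.t. the logit
    dv = [dz * ui for ui in u]       # back-propagated to each v_t (uses old u)
    for t in range(len(u)):
        u[t] -= lr * dz * v[t]       # classifier weights
    for k in range(len(W)):          # shared encoder weights, summed over days
        W[k] -= lr * sum(dv[t] * (1 - v[t] ** 2) * days[t][k] for t in range(len(days)))
    return b - lr * dz, loss


random.seed(0)
W = [random.uniform(-0.1, 0.1) for _ in range(2)]  # 2 event types per day
u = [random.uniform(-0.1, 0.1) for _ in range(3)]  # n = 3 days
b = 0.0
# Toy samples: per-day [login_count, transfer_count], label 1 = risky.
samples = [
    ([[1, 0], [1, 0], [1, 0]], 0),
    ([[1, 0], [0, 0], [1, 0]], 0),
    ([[0, 3], [0, 2], [0, 3]], 1),
    ([[1, 3], [0, 3], [0, 2]], 1),
]
losses = []
for epoch in range(200):
    total = 0.0
    for days, y in samples:
        b, loss = train_step(W, u, b, days, y)
        total += loss
    losses.append(total / len(samples))
```

The point of the sketch is that a single loss drives updates to both the shared encoder weights `W` and the classifier weights `u` and `b`, which is what allows the encoder to be reused on its own later.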
In some embodiments of the present disclosure, a risk identification method is further provided which, as shown in fig. 3, includes:
S200: acquiring wind control data to be identified.
Because wind control data with a long time span is of higher importance, the wind control data to be identified may cover a longer time span or sequence length to ensure the accuracy of risk identification and a sufficient amount of data; for example, the wind control events of the last three months may be selected as the wind control data to be identified.
In some embodiments, further comprising:
for a preset wind control scene, collecting historical wind control data of the wind control scene;
splitting the historical wind control data into subsequences according to a preset time granularity;
inputting the subsequence into a pre-trained encoder to obtain a historical sub-feature vector of the wind control scene;
and storing the historical sub-feature vectors of the wind control scene.
Wind control scenes differ, such as transfer to card, account, and online physical transaction, and each wind control scene may contain a series of wind control events, such as identity verification and payment, which constitute the wind control data. To avoid extracting features from historical wind control data older than the current day every time risk identification is performed, in the embodiments of this specification the historical wind control data is collected in advance, the subsequences divided according to the preset time granularity are feature-encoded, and the resulting historical sub-feature vectors are stored. By presetting different wind control scenes, the historical sub-feature vectors are stored separately for each scene. Therefore, not all past wind control events need to be stored online; only the historical sub-feature vectors of the wind control data for each past period and the wind control data of the current day are stored, which saves storage cost.
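The pre-computation described above might look like the following sketch. The names `encode` and `build_history_store`, the scene names, and the use of ISO date strings as day keys are all hypothetical; `encode` is a stand-in that merely counts events and does not represent the trained encoder.

```python
def encode(sub_sequence):
    """Stand-in for the pre-trained encoder (hypothetical): counts events."""
    return [float(len(sub_sequence))]


def build_history_store(history_by_scene):
    """Map each scene to {day: historical sub-feature vector}."""
    store = {}
    for scene, events in history_by_scene.items():
        by_day = {}
        for day, name in events:                 # split by day granularity
            by_day.setdefault(day, []).append(name)
        store[scene] = {day: encode(seq) for day, seq in by_day.items()}
    return store


# Hypothetical historical wind control data, grouped by scene.
history = {
    "transfer_to_card": [
        ("2023-09-18", "identity_verification"),
        ("2023-09-18", "payment"),
        ("2023-09-19", "payment"),
    ],
    "online_transaction": [("2023-09-19", "login")],
}
store = build_history_store(history)
```

Once `store` is built, the raw historical events are no longer needed at identification time, which is the storage saving the text describes.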
The encoder is trained in advance by adopting the training method of the risk identification model in the embodiment of the specification.
Correspondingly, acquiring wind control data to be identified specifically comprises:
for a target wind control scene, acquiring current wind control data of the target wind control scene based on a preset time window, wherein the length of the current wind control data of the target wind control scene is equal to the length of the subsequence.
The preset time window represents the time period over which wind control data is collected; based on the preset time window, all wind control data of the target wind control scene for the current day up to the current moment can be acquired. It should be noted that the length of the current wind control data of the target wind control scene is set to be consistent with the length of the subsequences obtained by splitting the historical wind control data, so that the feature vector obtained by feature-encoding the current wind control data is compatible with the historical sub-feature vectors and the two can be conveniently combined for risk identification.
S202: inputting the wind control data into a risk identification model to obtain a prediction result of the risk identification model for the wind control data; the risk recognition model is trained in advance by the training method of the risk recognition model.
In some embodiments, the wind control data is input into a risk identification model to obtain a prediction result of the risk identification model for the wind control data, which specifically includes:
Inputting wind control data into an encoder to obtain a first characteristic vector;
acquiring a historical sub-feature vector of a target wind control scene based on the target wind control scene to which the wind control data belongs;
arranging the first feature vector and the historical sub-feature vector of the target wind control scene according to a time sequence to obtain a second feature vector;
and inputting the second feature vector into a pre-trained classifier to obtain a prediction result.
Specifically, the current wind control data of the target wind control scene is first feature-encoded to obtain a first feature vector. The pre-stored historical sub-feature vectors of the target wind control scene, that is, the first feature vectors of the historical wind control data, are then retrieved, and all the first feature vectors are arranged in time order to obtain the second feature vector. Finally, the risk is predicted by the pre-trained classifier.
When the wind control data to be identified is processed, only the wind control data of the current day needs to be feature-encoded; the pre-stored historical sub-feature vectors are simply retrieved, which reduces the computation cost.
S204: and determining a risk identification result of the wind control data based on the prediction result of the risk identification model.
Fig. 4 schematically illustrates a risk identification method according to an embodiment of the present disclosure in one scenario.
As shown in fig. 4, assume that n days of wind control data are needed in total for identification and that the subsequences are divided by day, so that the wind control data to be identified consists of the wind control data of the current day and the historical wind control data of each of the previous n-1 days. At the end of each of the previous n-1 days, the encoder m1 feature-encodes that day's wind control data and stores the result as that day's historical sub-feature vector, so that n-1 historical sub-feature vectors v2, …, vn are stored in total, where v2 denotes the historical sub-feature vector of the previous day and, similarly, vn denotes that of the day n-1 days before the current day.
The encoder m1 encodes the wind control data of the current day to obtain the first feature vector v1. All feature vectors v1, v2, …, vn are then arranged in time order and input into the trained classifier m2, which outputs a risk prediction score or representation, from which the risk identification result, such as whether a risk exists or the risk type, is obtained.
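The flow of fig. 4 can be sketched as below. This is a minimal illustration under several assumptions: `toy_encoder` and `toy_classifier` are hypothetical stand-ins for m1 and m2, the stored vectors are keyed by calendar day, the vectors are arranged oldest first, and a fixed threshold of 0.5 on the score is used to derive the identification result. None of these specifics are stated in the specification.

```python
import math
from datetime import date, timedelta


def toy_encoder(day_events):
    """Hypothetical stand-in for encoder m1: [event count, total amount]."""
    return [float(len(day_events)), float(sum(day_events))]


def toy_classifier(vectors):
    """Hypothetical stand-in for classifier m2: a logistic score over
    event counts, weighted more heavily for more recent days."""
    n = len(vectors)
    z = sum((i + 1) / n * v[0] for i, v in enumerate(vectors)) - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def identify_risk(today_events, history, today, n, threshold=0.5):
    """Encode only today's data (v1), fetch the stored v2..vn, arrange
    all vectors in time order, and score them with the classifier."""
    v1 = toy_encoder(today_events)
    # Assumed ordering: oldest day first, current day last.
    past_days = [today - timedelta(days=k) for k in range(n - 1, 0, -1)]
    second_feature_vector = [history[d] for d in past_days] + [v1]
    score = toy_classifier(second_feature_vector)
    return second_feature_vector, score, score >= threshold


# Pre-stored historical sub-feature vectors for one scene (v3 and v2 when n = 3).
history = {date(2023, 9, 19): [2.0, 70.0], date(2023, 9, 20): [1.0, 5.0]}
second, score, is_risky = identify_risk(
    [3.0, 3.0, 4.0], history, date(2023, 9, 21), n=3
)
```

Note that only the current day's events pass through the encoder at identification time; the other n-1 vectors are dictionary lookups.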
Some embodiments of the present disclosure provide a training apparatus for risk identification model, as shown in fig. 5, including:
a first data acquisition module 30 configured to acquire wind control sample data;
A first data processing module 32 configured to split the wind control sample data into a sequence of sub-samples according to a preset time granularity;
a training module 34 configured to input the sequence of sub-samples into the encoder to obtain a first sample feature vector; arranging the first sample feature vectors according to the time sequence to obtain second sample feature vectors; inputting the second sample feature vector into a classifier; determining a loss function based on a prediction result of the classifier and a real label of wind control sample data; based on the loss function, parameters of the encoder and classifier are updated.
The wind control sample data consists of a series of wind control events, such as login, check-in, payment, transfer, and red packet events, and is a long time series. The first data acquisition module acquires the wind control sample data according to a preset sequence length and a preset time span. The longer the sequence length or time span, the more data is input into the risk identification model and the more useful information the wind control sample data contains, so the extracted features better represent the risk of the sample data.
The first data processing module splits the wind control sample data according to the preset time granularity. For example, taking a day as the unit of time granularity and counting back from the current day, the wind control sample data of each preceding day may be divided into one sub-sample sequence. This reduces the computational complexity of model training and keeps the scale of the trained risk identification model under control, thereby reducing resource consumption.
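A minimal sketch of the splitting step is given below. It assumes events are `(timestamp, event_type)` tuples and that windows of a fixed granularity are counted back from a reference moment `now`; the function name `split_by_granularity` and its parameters are hypothetical, and the specification itself only gives splitting by day as an example.

```python
from datetime import datetime, timedelta


def split_by_granularity(events, now, granularity, num_subsequences):
    """Split (timestamp, event_type) events into consecutive windows of
    length `granularity`, counted back from `now`.
    Index 0 holds the oldest window, the last index the most recent one."""
    subsequences = [[] for _ in range(num_subsequences)]
    for timestamp, event_type in sorted(events):
        if timestamp > now:
            continue  # ignore events after the reference moment
        age = (now - timestamp) // granularity  # whole windows before `now`
        if age < num_subsequences:
            subsequences[num_subsequences - 1 - age].append((timestamp, event_type))
    return subsequences


now = datetime(2023, 9, 21, 0, 0)
events = [
    (datetime(2023, 9, 20, 23, 0), "payment"),
    (datetime(2023, 9, 18, 12, 0), "login"),
    (datetime(2023, 9, 20, 8, 0), "transfer"),
    (datetime(2023, 9, 10, 9, 0), "login"),  # older than the 3-day span, dropped
]
subsequences = split_by_granularity(events, now, timedelta(days=1), 3)
```

Each sub-sample sequence is short compared with the full series, which is what keeps the per-sequence encoding cost bounded.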
The training module extracts the characteristics of each sub-sample sequence by using the encoder to obtain first sample characteristic vectors divided according to time granularity. Alternatively, the encoder may be constructed based on the structure of a neural network, including but not limited to RNN, LSTM, CNN and self-attention networks, etc.
Because wind control data is time-dependent, the training module arranges the first sample feature vectors in time order to obtain the second sample feature vector before risk identification. The time order of the first sample feature vectors corresponds to the time order of the sub-sample sequences split according to the preset time granularity. Specifically, the sub-sample sequences obtained by splitting follow a certain time order, and the training module orders the corresponding first sample feature vectors accordingly. For example, if the wind control sample data of each of the previous 90 days is divided into one sub-sample sequence with a day as the unit of time granularity, feature encoding yields one first sample feature vector for each of those 90 days, and the training module can arrange these vectors in time order, from the 90th day before the current day to the 1st day before it, following the order of the sub-sample sequences.
The training module then inputs the ordered features, that is, the second sample feature vector, into the classifier for risk identification to obtain a risk prediction result, for example whether a risk exists; in other embodiments, the specific risk categories contained in the sample data, such as theft risk or fraud risk, may also be predicted. Optionally, the classifier may also be built on a neural network structure, including but not limited to RNN, LSTM, CNN, and self-attention networks. Specifically, the training module may determine a loss function based on the difference between the prediction result of the classifier and the real label of the wind control sample data, for example by computing the cross entropy between the two, and then update the parameters of the encoder and the classifier with the goal of minimizing that difference. After several rounds of training, training stops when the end condition is reached (for example, the loss function value falls below a preset threshold), and the risk identification model is thereby determined.
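The joint update of encoder and classifier can be illustrated with a deliberately tiny model. In this sketch, which is an assumption-laden stand-in rather than the networks named in the specification, the encoder is a single tanh unit applied to each day's mean value, the classifier is a logistic unit with one weight per day position, the loss is binary cross entropy, and the gradients are written out by hand. All names, the toy data, and the hyperparameters are hypothetical.

```python
import math


def init_params(num_days):
    # Encoder parameters (w_e, b_e) and classifier parameters (w_c, b_c).
    return {"w_e": 1.0, "b_e": 0.0, "w_c": [0.0] * num_days, "b_c": 0.0}


def forward(days, params):
    """Encode each daily sub-sample sequence, keep time order, classify."""
    means = [sum(d) / len(d) if d else 0.0 for d in days]
    hs = [math.tanh(params["w_e"] * m + params["b_e"]) for m in means]
    z = sum(w * h for w, h in zip(params["w_c"], hs)) + params["b_c"]
    return 1.0 / (1.0 + math.exp(-z)), hs, means


def cross_entropy(p, y, eps=1e-12):
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def mean_loss(samples, params):
    return sum(cross_entropy(forward(d, params)[0], y) for d, y in samples) / len(samples)


def train(samples, epochs=500, lr=0.1):
    params = init_params(len(samples[0][0]))
    for _ in range(epochs):
        for days, y in samples:
            p, hs, means = forward(days, params)
            err = p - y  # gradient of the loss w.r.t. the classifier logit
            dh = [err * w for w in params["w_c"]]  # back-propagate into encoder outputs
            g_we = sum(d * (1 - h * h) * m for d, h, m in zip(dh, hs, means))
            g_be = sum(d * (1 - h * h) for d, h in zip(dh, hs))
            # Update classifier and encoder parameters from the same loss.
            params["w_c"] = [w - lr * err * h for w, h in zip(params["w_c"], hs)]
            params["b_c"] -= lr * err
            params["w_e"] -= lr * g_we
            params["b_e"] -= lr * g_be
    return params


# Toy samples: three daily sub-sample sequences (oldest first) and a label.
samples = [
    ([[0.10, 0.20], [0.10], [0.05]], 0),
    ([[0.05], [0.10, 0.10], [0.15]], 0),
    ([[0.10], [0.60, 0.80], [0.90]], 1),
    ([[0.20], [0.70], [0.80, 1.00]], 1),
]
initial_loss = mean_loss(samples, init_params(3))
trained = train(samples)
final_loss = mean_loss(samples, trained)
```

A fixed number of epochs is used here for simplicity; the end condition mentioned above (loss below a preset threshold) could replace it.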
Some embodiments of the present disclosure further provide a risk identification apparatus, as shown in fig. 6, including:
a second data acquisition module 40 configured to acquire wind control data to be identified;
a risk identification module 42 configured to input the wind control data into the risk identification model to obtain a prediction result of the risk identification model for the wind control data, and to determine a risk identification result of the wind control data based on the prediction result of the risk identification model; the risk identification model is trained in advance by the training method of the risk identification model described above.
In some embodiments, further comprising:
the data preprocessing module is configured to collect historical wind control data of a wind control scene aiming at a preset wind control scene; splitting historical wind control data into subsequences according to a preset time granularity; inputting the subsequence into a pre-trained encoder to obtain a historical sub-feature vector of the wind control scene;
and the storage module is configured to store the history sub-feature vector.
Wind control scenes differ, for example transfer to card, transfer to account, and online physical transactions, and each wind control scene may contain a series of wind control events, such as identity verification and payment, which together constitute its wind control data. To avoid re-extracting features from historical wind control data (data from before the current day) at every risk identification, the data preprocessing module collects the historical wind control data in advance and feature-encodes the subsequences obtained by splitting it at the preset time granularity, and the storage module then stores the resulting historical sub-feature vectors. Different wind control scenes are preset, and the historical sub-feature vectors are stored separately for each scene. As a result, there is no need to keep all past wind control events online; only the historical sub-feature vectors corresponding to each past period and the wind control data of the current day are stored, which saves storage cost.
The encoder in the data preprocessing module is obtained by training in advance by adopting the training method of the risk identification model in the embodiment of the specification.
In some embodiments, the second data acquisition module is specifically configured to acquire, for a target wind-controlled scene, current wind-controlled data of the target wind-controlled scene based on a preset time window; the length of the current wind control data of the target wind control scene is equal to the length of the subsequence.
The preset time window may represent the time period over which wind control data is collected; based on the preset time window, the second data acquisition module can acquire all wind control data of the target wind control scene for the current day, up to the current moment. Note that the second data acquisition module sets the length of the current wind control data of the target wind control scene equal to the length of the subsequences obtained by splitting the historical wind control data, so that the feature vector obtained by feature-encoding the current wind control data is compatible with the historical sub-feature vectors and the two can readily be combined for risk identification.
In some embodiments, the risk identification module is specifically configured to input wind control data into the encoder to obtain a first feature vector; acquiring a historical sub-feature vector of a target wind control scene based on the target wind control scene to which the wind control data belongs; arranging the first feature vector and the historical sub-feature vector of the target wind control scene according to a time sequence to obtain a second feature vector; and inputting the second feature vector into a pre-trained classifier to obtain a prediction result.
Therefore, when the risk identification module processes the wind control data to be identified, only the wind control data of the current day is required to be subjected to feature coding, and then the historical sub-feature vector stored in advance is called, so that the calculation cost can be reduced.
One or more embodiments of the present invention provide a training method of a risk identification model and a risk identification method. Referring to fig. 7, fig. 7 schematically illustrates a risk identification system that may be used to implement the training method of the risk identification model and the risk identification method. It should be noted that, the training method and the risk identification method of the risk identification model according to one or more embodiments of the present application may be implemented by depending on the risk identification system shown in fig. 7, but are not limited to the risk identification system.
As shown in fig. 7, the risk identification system includes a user terminal 50 and a wind control end 52, which may be deployed in two separate terminal devices. The wind control end 52 is connected to the user terminal 50 through a communication link, which may be a wired network or a wireless network. For example, the wind control end 52 may establish a communication connection with the user terminal 50 using Wi-Fi, Bluetooth, infrared, or similar communication methods. Alternatively, the wind control end 52 may establish a communication connection with the user terminal 50 through a mobile network, whose network standard may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMAX, and the like.
The user terminal 50 may be a terminal device such as a mobile phone, a tablet computer, a notebook computer, a personal computer, etc., configured to perform operations such as login, payment, personal identification, etc.
The trained risk identification model is deployed in the wind control end 52, which can obtain, through the communication link, the wind control data generated by a user at the user terminal, input the wind control data into the risk identification model to identify risks, output a risk identification result, and return the result to the user terminal to inform the user. The wind control end may be any apparatus, device, platform, or device cluster with computing and processing capabilities. This embodiment does not limit the implementation form of the wind control end: it may be a single server or a server cluster formed by a plurality of servers, and it may also be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system. The risk identification model may be trained in the wind control end or the user terminal, or in one or more other servers.
Of course, in other embodiments, the user terminal 50 and the wind control end 52 may be deployed as modules in the same terminal device. As shown in fig. 8, the terminal devices may include a user device 62, a user device 64, and a user device 66, and each user device may independently complete events under the operation of its user and identify risks for the events generated. Specifically, each user device may train the risk identification model using the data in the data storage system 60, or the risk identification model may be trained by other servers or terminal devices; the trained model is stored in the data storage system 60 in the form of program code, and the user device invokes the program code in the data storage system 60 to implement the risk identification method provided in the embodiments of this specification.
One embodiment of the present specification provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of training a risk identification model described above.
One embodiment in the present specification also provides an electronic device, including:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the training method of the risk identification model described above.
An embodiment in the present specification provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the risk identification methods described above.
One embodiment in the present specification also provides an electronic device, including:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of any of the risk identification methods described above.
Fig. 9 exemplarily shows a block diagram of an electronic device provided in an embodiment of the present disclosure, which shows a schematic structural diagram of a computer system 700 of a terminal device or a server suitable for implementing an embodiment of the present invention. The terminal device or server shown in fig. 9 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
In a typical configuration, the computer system 700 includes one or more processors (CPUs) 702, an input interface 704, an output interface 706, a network interface 708, and a memory 710.
Memory 710 may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of this specification. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
It should be noted that the above-mentioned embodiments are merely examples of the present invention, and it is obvious that the present invention is not limited to the above-mentioned embodiments, and many similar variations are possible. All modifications attainable or obvious from the present disclosure set forth herein should be deemed to be within the scope of the present disclosure.

Claims (14)

1. A method of training a risk identification model, comprising:
acquiring wind control sample data;
splitting the wind control sample data into sub-sample sequences according to a preset time granularity;
inputting the sub-sample sequence into an encoder to obtain a first sample feature vector;
arranging the first sample feature vectors according to a time sequence to obtain second sample feature vectors;
inputting the second sample feature vector into a classifier;
determining a loss function based on the prediction result of the classifier and the real label of the wind control sample data;
updating parameters of the encoder and the classifier based on the loss function;
the risk identification model is determined based on the encoder and the classifier.
2. A risk identification method, comprising:
acquiring wind control data to be identified;
inputting the wind control data into a risk identification model to obtain a prediction result of the risk identification model for the wind control data; the risk recognition model is obtained by training in advance by adopting the training method of the risk recognition model according to claim 1;
And determining a risk identification result of the wind control data based on the prediction result of the risk identification model.
3. The method of claim 2, further comprising:
aiming at a preset wind control scene, collecting historical wind control data of the wind control scene;
splitting the historical wind control data into subsequences according to a preset time granularity;
inputting the subsequence into a pre-trained encoder to obtain a historical sub-feature vector of the wind control scene;
and storing the historical sub-feature vectors of the wind control scene.
4. The method of claim 3, wherein the obtaining wind control data to be identified specifically comprises:
aiming at a target wind control scene, acquiring current wind control data of the target wind control scene based on a preset time window; and the length of the current wind control data of the target wind control scene is equal to the length of the subsequence.
5. The method of claim 3, wherein the wind control data is input into a risk identification model to obtain a prediction result of the risk identification model for the wind control data; the method specifically comprises the following steps:
inputting the wind control data into the encoder to obtain a first characteristic vector;
acquiring a historical sub-feature vector of a target wind control scene based on the target wind control scene to which the wind control data belong;
Arranging the first characteristic vector and the historical sub-characteristic vector of the target wind control scene according to a time sequence to obtain a second characteristic vector;
and inputting the second feature vector into a pre-trained classifier to obtain the prediction result.
6. A training device of a risk identification model, comprising:
the first data acquisition module is configured to acquire wind control sample data;
the first data processing module is configured to split the wind control sample data into sub-sample sequences according to a preset time granularity;
the training module is configured to input the sub-sample sequence into an encoder to obtain a first sample characteristic vector; arranging the first sample feature vectors according to a time sequence to obtain second sample feature vectors; inputting the second sample feature vector into a classifier; determining a loss function based on the prediction result of the classifier and the real label of the wind control sample data; based on the loss function, parameters of the encoder and the classifier are updated.
7. A risk identification device comprising:
the second data acquisition module is configured to acquire wind control data to be identified;
the risk identification module is configured to input the wind control data into a risk identification model to obtain a prediction result of the risk identification model for the wind control data; determining a risk identification result of the wind control data based on a prediction result of the risk identification model; the risk recognition model is pre-trained by the training method of the risk recognition model according to claim 1.
8. The apparatus of claim 7, further comprising:
the data preprocessing module is configured to collect historical wind control data of a wind control scene aiming at a preset wind control scene; splitting the historical wind control data into subsequences according to a preset time granularity; inputting the subsequence into a pre-trained encoder to obtain a historical sub-feature vector of the wind control scene;
and the storage module is configured to store the history sub-feature vector.
9. The device of claim 8, wherein the second data acquisition module is specifically configured to acquire, for a target wind-controlled scene, current wind-controlled data of the target wind-controlled scene based on a preset time window; and the length of the current wind control data of the target wind control scene is equal to the length of the subsequence.
10. The apparatus of claim 8, the risk identification module being specifically configured to input the wind control data to the encoder to obtain a first feature vector; acquiring a historical sub-feature vector of a target wind control scene based on the target wind control scene to which the wind control data belong; arranging the first characteristic vector and the historical sub-characteristic vector of the target wind control scene according to a time sequence to obtain a second characteristic vector; and inputting the second feature vector into a pre-trained classifier to obtain the prediction result.
11. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of claim 1.
12. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of claim 1.
13. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 2 to 5.
14. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions which, when read for execution by the one or more processors, perform the steps of the method of any of claims 2 to 5.
CN202311228697.3A 2023-09-21 2023-09-21 Training method of risk identification model, risk identification method and device Pending CN117372021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311228697.3A CN117372021A (en) 2023-09-21 2023-09-21 Training method of risk identification model, risk identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311228697.3A CN117372021A (en) 2023-09-21 2023-09-21 Training method of risk identification model, risk identification method and device

Publications (1)

Publication Number Publication Date
CN117372021A true CN117372021A (en) 2024-01-09

Family

ID=89399251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311228697.3A Pending CN117372021A (en) 2023-09-21 2023-09-21 Training method of risk identification model, risk identification method and device

Country Status (1)

Country Link
CN (1) CN117372021A (en)

Similar Documents

Publication Publication Date Title
CN110443969B (en) Fire detection method and device, electronic equipment and storage medium
CN110264270B (en) Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium
CN115345355B (en) Energy consumption prediction model construction method, short-term energy consumption prediction method and related devices
CN109523117A (en) Risk Forecast Method, device, computer equipment and storage medium
CN107392311B (en) Method and device for segmenting sequence
CN112382099A (en) Traffic road condition prediction method and device, electronic equipment and storage medium
CN114580263A (en) Knowledge graph-based information system fault prediction method and related equipment
CN114550053A (en) Traffic accident responsibility determination method, device, computer equipment and storage medium
CN110968689A (en) Training method of criminal name and law bar prediction model and criminal name and law bar prediction method
CN116451594B (en) Training method and device of icing prediction model, prediction method and device and electronic equipment
CN111104954A (en) Object classification method and device
CN114638633A (en) Abnormal flow detection method and device, electronic equipment and storage medium
CN112488142A (en) Radar fault prediction method and device and storage medium
CN111310918B (en) Data processing method, device, computer equipment and storage medium
CN113326177A (en) Index anomaly detection method, device, equipment and storage medium
CN115222061A (en) Federal learning method based on continuous learning and related equipment
CN110781818A (en) Video classification method, model training method, device and equipment
CN117688955A (en) Method, apparatus, electronic device, and computer-readable medium for humidity temperature adjustment
CN112330442A (en) Modeling method and device based on ultra-long behavior sequence, terminal and storage medium
CN117372021A (en) Training method of risk identification model, risk identification method and device
KR102608408B1 (en) Method for predicting depression occurrence using artificial intelligence model and computer readable record medium thereof
CN113011893B (en) Data processing method, device, computer equipment and storage medium
CN115018608A (en) Risk prediction method and device and computer equipment
CN113837977A (en) Object tracking method, multi-target tracking model training method and related equipment
CN114120180A (en) Method, device, equipment and medium for generating time sequence nomination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination