CN115905864A - Abnormal data detection model training method and device and computer equipment - Google Patents

Info

Publication number
CN115905864A
Authority
CN
China
Prior art keywords
resource transfer
data
abnormal
historical resource
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211410472.5A
Other languages
Chinese (zh)
Inventor
郑子彬
黄进波
赵山河
蔡倬
邬稳
林华春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merchants Union Consumer Finance Co Ltd
Sun Yat Sen University
Original Assignee
Merchants Union Consumer Finance Co Ltd
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Merchants Union Consumer Finance Co Ltd, Sun Yat Sen University filed Critical Merchants Union Consumer Finance Co Ltd
Priority to CN202211410472.5A priority Critical patent/CN115905864A/en
Publication of CN115905864A publication Critical patent/CN115905864A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application relates to an abnormal data detection model training method, an abnormal data detection model training device and computer equipment. The method comprises the following steps: acquiring historical resource transfer data and corresponding training data labels, wherein each historical resource transfer feature included in the historical resource transfer data is obtained by converting an initial historical resource transfer feature from an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type; calculating the feature correlation among the historical resource transfer features and screening the features based on that correlation to obtain target resource transfer features; inputting each target resource transfer feature into an initial abnormal data detection model for training to obtain an updated abnormal data detection model, and returning to the step of acquiring the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain a target abnormal data detection model. The method can improve the training efficiency of the abnormal data detection model.

Description

Abnormal data detection model training method and device and computer equipment
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for training an abnormal data detection model.
Background
With the development of internet technology, people can use online platforms to apply for services that involve resource transfer, such as loan services. When a user applies to a platform for a resource transfer, the resource is transferred to the user only after the user's credit has been evaluated. Whether the user poses a risk of abnormal resource transfer to the platform is therefore a necessary part of the evaluation, and abnormality detection needs to be performed on the user's resource transfer data.
The existing approach to detecting abnormalities in resource transfer data is to apply a deep learning algorithm to the user's resource transfer data. However, a deep learning algorithm requires a large amount of training data, and acquiring that data is inefficient, which results in low training efficiency for the abnormal data detection model.
Disclosure of Invention
In view of the above, it is necessary to provide an abnormal data detection model training method, an abnormal data detection model training apparatus, a computer device, a computer readable storage medium, and a computer program product, which can improve the abnormal data detection model training efficiency.
In a first aspect, the present application provides a method for training an abnormal data detection model. The method comprises the following steps:
acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer characteristics, each historical resource transfer characteristic is obtained by converting an initial historical resource transfer characteristic of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain target resource transfer features;
inputting each target resource transfer feature into an initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
performing loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and taking the updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining a target abnormal data detection model when a training completion condition is reached, wherein the target abnormal data detection model is used for detecting the abnormal possibility of the resource transfer data.
In a second aspect, the application further provides an abnormal data detection model training device. The device comprises:
the acquisition module is used for acquiring historical resource transfer data and corresponding training data labels, the historical resource transfer data comprises various historical resource transfer characteristics, each historical resource transfer characteristic is obtained by converting the initial historical resource transfer characteristics of the initial storage type into the target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
the screening module is used for calculating the characteristic correlation among the historical resource transfer characteristics, and screening the characteristics of the historical resource transfer characteristics based on the characteristic correlation to obtain the target resource transfer characteristics;
the initial detection module is used for inputting each target resource transfer feature into the initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
the loss calculation module is used for performing loss calculation based on the training abnormal possibility and the training data labels to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and the training iteration module is used for taking the updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining a target abnormal data detection model when a training completion condition is reached, wherein the target abnormal data detection model is used for detecting the abnormal possibility of the resource transfer data.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer characteristics, each historical resource transfer characteristic is obtained by converting an initial historical resource transfer characteristic of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features;
inputting each target resource transfer feature into an initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
performing loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and taking the updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining a target abnormal data detection model when a training completion condition is reached, wherein the target abnormal data detection model is used for detecting the abnormal possibility of the resource transfer data.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the following steps:
acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer characteristics, each historical resource transfer characteristic is obtained by converting an initial historical resource transfer characteristic of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features;
inputting each target resource transfer feature into an initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
performing loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and taking the updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining a target abnormal data detection model when a training completion condition is reached, wherein the target abnormal data detection model is used for detecting the abnormal possibility of the resource transfer data.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps:
obtaining historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprise various historical resource transfer characteristics, each historical resource transfer characteristic is obtained by converting an initial historical resource transfer characteristic of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features;
inputting each target resource transfer feature into an initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
performing loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and taking the updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining a target abnormal data detection model when a training completion condition is reached, wherein the target abnormal data detection model is used for detecting the abnormal possibility of the resource transfer data.
According to the abnormal data detection model training method, device, computer equipment, storage medium and computer program product described above, historical resource transfer data and training data labels are acquired, where each historical resource transfer feature in the historical resource transfer data is obtained by converting an initial historical resource transfer feature from an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type. This reduces the storage space occupied by the historical resource transfer data, which both saves storage resources and speeds up reading of the historical resource transfer data during iterative model training, thereby improving model training efficiency. Furthermore, the feature correlation among the historical resource transfer features is calculated and the features are screened according to that correlation to obtain the target resource transfer features. This reduces feature redundancy, so that model training is faster when the target resource transfer features are subsequently input into the initial abnormal data detection model.
Drawings
FIG. 1 is a diagram of an application environment of an abnormal data detection model training method in one embodiment;
FIG. 2 is a schematic flow chart of an abnormal data detection model training method in one embodiment;
FIG. 3 is a schematic flow chart of obtaining an abnormal data detection integration model in one embodiment;
FIG. 4 is a schematic diagram of training an abnormal data detection integration model in one embodiment;
FIG. 5 is a schematic diagram of abnormal data detection performed by the abnormal data detection integration model in one embodiment;
FIG. 6 is a block diagram of the structure of an abnormal data detection model training device in one embodiment;
FIG. 7 is a diagram of the internal structure of a computer device in one embodiment;
FIG. 8 is a diagram of the internal structure of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The abnormal data detection model training method provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or on another network server. The server 104 may obtain, through the terminal 102, historical resource transfer data and corresponding training data labels, where the historical resource transfer data includes various historical resource transfer features, each historical resource transfer feature is obtained by converting an initial historical resource transfer feature from an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type. The server 104 calculates the feature correlation among the historical resource transfer features, and performs feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features. The server 104 inputs each target resource transfer feature into an initial abnormal data detection model for abnormal data detection, and obtains the training abnormal possibility corresponding to the historical resource transfer data. The server 104 performs loss calculation based on the training abnormal possibility and the training data labels to obtain training loss information, and updates the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model. The server 104 takes the updated abnormal data detection model as the initial abnormal data detection model and returns to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, obtaining a target abnormal data detection model, which is used for detecting the abnormal possibility of resource transfer data. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, an abnormal data detection model training method is provided. The method is described here as applied to the server in FIG. 1, and includes the following steps:
Step 202, acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer features, each historical resource transfer feature is obtained by converting an initial historical resource transfer feature from an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type.
The historical resource transfer data is resource transfer data stored in advance. It represents the data generated when an account transferred resources during a historical time period, and serves as training data for the model. The account may be a financial account, an instant messaging account, or the like. Resource transfer refers to the transfer of resources between terminals in an internet service, and includes both the transfer of resources from a management end to a user end and the return of resources from the user end to the management end after the user end has received them. The historical resource transfer features are the feature information in the historical resource transfer data that characterizes a resource transfer. The initial historical resource transfer features are resource transfer features that have not yet been processed. The initial storage type is the storage form used when an initial historical resource transfer feature is first stored in the storage space. The target storage type is a storage form that occupies less storage space than the initial storage type for the same feature.
Specifically, the server acquires initial historical resource transfer data sent by the terminal, the initial historical resource transfer data represents unprocessed resource transfer data, and the server stores the initial historical resource transfer data in a data storage space. The server detects that the storage type of each initial historical resource transfer feature in the initial historical resource transfer data in the storage space is the initial storage type, and converts the initial historical resource transfer feature of the initial storage type into the historical resource transfer feature of the target storage type to obtain the historical resource transfer data.
The server acquires the historical resource transfer data in the data storage space, takes the historical resource transfer data as training data, and acquires training data labels corresponding to the historical resource transfer data.
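As an illustration of the storage type conversion in step 202, the following is a minimal sketch rather than the method of the application. It assumes that the storage types are fixed-width integer types and uses the typecodes of Python's standard `array` module; the function name `downcast_int_feature` and the candidate list are hypothetical.

```python
from array import array

# Hypothetical candidate target storage types, ordered from narrow to wide.
# These are typecodes of the standard `array` module (signed integers).
INT_TYPECODES = ["b", "h", "i", "q"]


def downcast_int_feature(values):
    """Store an integer feature with the narrowest typecode that holds every value."""
    values = list(values)  # materialize once so each attempt sees all values
    for code in INT_TYPECODES:
        try:
            return array(code, values)
        except OverflowError:
            continue  # some value does not fit; try the next wider type
    raise ValueError("values do not fit in a 64-bit signed integer")


# Initial storage type: 8 bytes per value. Target storage type: 1 byte per value.
initial = array("q", [0, 1, 1, 0, 3])
target = downcast_int_feature(initial)
print(initial.itemsize, "->", target.itemsize)
```

The values are unchanged by the conversion; only the space used per value shrinks. The same idea applies to floating-point features, although narrowing a floating-point type trades away precision.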
Step 204, calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features.
The feature correlation is the degree of correlation between the historical resource transfer features. The target resource transfer features are the historical resource transfer features that remain after feature screening; no retained feature has a correlated counterpart among the other retained features.
Specifically, the server calculates the feature correlation among the historical resource transfer features, screens out the repeated features according to that correlation, and takes the remaining non-repeated features as the target resource transfer features.
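The screening in step 204 can be sketched as follows. The text does not name a correlation measure or a cut-off, so this sketch assumes the Pearson correlation coefficient and a hypothetical threshold of 0.9; the feature names are also invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, threshold=0.9):
    """Keep a feature only if it is not highly correlated with any feature already kept."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


# Hypothetical features: "amount_cents" repeats the information in "amount".
features = {
    "amount": [1, 2, 3, 4, 5],
    "amount_cents": [100, 200, 300, 400, 500],
    "overdue_days": [3, 0, 7, 1, 2],
}
print(list(screen_features(features)))
```

This greedy pass keeps the first feature of each correlated group, so the result depends on feature order; other tie-breaking rules are equally compatible with the description.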
Step 206, inputting each target resource transfer feature into the initial abnormal data detection model for abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data.
The initial abnormal data detection model is an untrained abnormal data detection model, constructed from initial model parameters and used for detecting abnormal data. Abnormal data is data that characterizes a resource transfer abnormality. A resource transfer abnormality is the abnormal situation in which the user end does not return the resource to the management end in time. The training abnormal possibility is the possibility that the data is abnormal, as output by the initial abnormal data detection model from the target resource transfer features.
Specifically, the server may divide the historical resource transfer data into training resource transfer data and test resource transfer data, where the training resource transfer data includes each training target resource transfer characteristic, and the test resource transfer data includes each test target resource transfer characteristic. And acquiring a training data label corresponding to the training resource transfer data and a test data label corresponding to the test resource transfer data.
The server acquires an initial abnormal data detection model, inputs training resource transfer data into the initial abnormal data detection model for abnormal data detection, and acquires training abnormal possibility corresponding to the training resource transfer data.
Step 208, performing loss calculation based on the training abnormal possibility and the training data labels to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model.
Wherein the training loss information refers to the difference information between the training anomaly probability and the training data label. The updated abnormal data detection model refers to an initial abnormal data detection model after model parameters are updated.
Specifically, the server may perform loss calculation on the training data labels and the training abnormal possibility by using a loss function, obtain difference information between the training data labels and the training abnormal possibility, and use the difference information as training loss information. And the server updates the model parameters in the initial abnormal data detection model according to the loss information to obtain an updated abnormal data detection model.
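The loss calculation and parameter update of step 208 can be sketched as below. The text does not fix a model form or a loss function, so this sketch assumes a logistic model, binary cross-entropy as the loss, and a single gradient-descent step as the update; the names `predict`, `cross_entropy`, `update` and the learning rate are hypothetical.

```python
import math


def predict(weights, bias, row):
    """Abnormal possibility in (0, 1) from a logistic model (an assumed model form)."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted possibilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def update(weights, bias, rows, labels, lr=0.5):
    """One gradient-descent step on the cross-entropy loss; returns updated parameters."""
    n = len(rows)
    probs = [predict(weights, bias, r) for r in rows]
    grad_w = [sum((p - y) * r[j] for p, y, r in zip(probs, labels, rows)) / n
              for j in range(len(weights))]
    grad_b = sum(p - y for p, y in zip(probs, labels)) / n
    return [w - lr * g for w, g in zip(weights, grad_w)], bias - lr * grad_b


# Toy data: one feature per record, label 1 marks an abnormal record.
rows, labels = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
weights, bias = [0.0], 0.0
before = cross_entropy([predict(weights, bias, r) for r in rows], labels)
weights, bias = update(weights, bias, rows, labels)
after = cross_entropy([predict(weights, bias, r) for r in rows], labels)
print(before, after)
```

Here the training loss information is a single number, and the updated model is simply the model with its parameters moved one step against the gradient of that loss.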
Step 210, taking the updated abnormal data detection model as the initial abnormal data detection model, and returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain a target abnormal data detection model, wherein the target abnormal data detection model is used for detecting the abnormal possibility of resource transfer data.
The target abnormal data detection model is the abnormal data detection model obtained once training is complete.
Specifically, the server inputs the test resource transfer data into the updated abnormal data detection model for abnormal data detection and obtains the test abnormal possibility corresponding to the test resource transfer data. The server then calculates test loss information between the test abnormal possibility and the test data labels. When the test loss information does not meet the training completion condition, the server takes the updated abnormal data detection model as the initial abnormal data detection model, returns to the step of obtaining the historical resource transfer data and the corresponding training data labels, and continues to train the updated model iteratively with that data and those labels. When the training completion condition is reached, the target abnormal data detection model is obtained, which is used for detecting the abnormal possibility of resource transfer data.
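The iteration in step 210 amounts to a loop that alternates an update with a check on held-out test data. The sketch below assumes that the training completion condition is "test loss at or below a threshold", with a cap on the number of rounds as a safeguard; both are assumptions, as are the names `train_until_complete`, `update_step` and `test_loss`. A one-parameter toy model stands in for the detection model.

```python
def train_until_complete(model, update_step, test_loss, loss_threshold=0.1, max_rounds=1000):
    """Repeat update steps until the test loss meets the threshold or rounds run out.

    Returns (model, rounds_used, completed).
    """
    for round_index in range(1, max_rounds + 1):
        model = update_step(model)              # one training update on the training data
        if test_loss(model) <= loss_threshold:  # assumed training completion condition
            return model, round_index, True
    return model, max_rounds, False


# Toy stand-in: the "model" is one number w, and the test loss is (w - 3)^2.
def toy_update(w, lr=0.25):
    return w - lr * 2 * (w - 3)  # gradient-descent step on (w - 3)^2


def toy_test_loss(w):
    return (w - 3) ** 2


model, rounds, completed = train_until_complete(0.0, toy_update, toy_test_loss)
print(model, rounds, completed)
```

In a real setting `update_step` would run the loss calculation and parameter update on the training resource transfer data, and `test_loss` would evaluate the updated model on the test resource transfer data and its labels.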
In the above abnormal data detection model training method, device, computer equipment, storage medium and computer program product, historical resource transfer data and training data labels are acquired, where each historical resource transfer feature in the historical resource transfer data is obtained by converting an initial historical resource transfer feature from an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type. This reduces the storage space occupied by the historical resource transfer data, which both saves storage resources and speeds up reading of the historical resource transfer data during iterative model training, thereby improving model training efficiency. Furthermore, the feature correlation among the historical resource transfer features is calculated and the features are screened according to that correlation to obtain the target resource transfer features. This reduces feature redundancy, so that model training is faster when the target resource transfer features are subsequently input into the initial abnormal data detection model.
In one embodiment, as shown in FIG. 3, a schematic flow chart of obtaining an abnormal data detection integration model is provided. The initial abnormal data detection model comprises at least two initial abnormal data detection models, and the abnormal data detection model training method further comprises the following steps:
Step 302, inputting each target resource transfer feature into each initial abnormal data detection model respectively for abnormal data detection, to obtain the training abnormal possibility corresponding to each initial abnormal data detection model, wherein the at least two initial abnormal data detection models are established using different model structures;
Step 304, performing loss calculation based on the training abnormal possibility corresponding to each initial abnormal data detection model and the training data labels, to obtain model loss information corresponding to each initial abnormal data detection model;
Step 306, updating each initial abnormal data detection model based on its corresponding model loss information, to obtain each updated abnormal data detection model;
Step 308, taking each updated abnormal data detection model as an initial abnormal data detection model, and returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain each target abnormal data detection model;
Step 310, performing model integration based on each target abnormal data detection model to obtain an abnormal data detection integration model.
The abnormal data detection integrated model is a fusion model fusing each trained abnormal data detection model.
Specifically, the server obtains the historical resource transfer data after feature screening and the data labels corresponding to it. The server divides the historical resource transfer data into a preset number of equal parts, each containing the same amount of data. Each part is used in turn as test resource transfer data, with the remaining parts used as training resource transfer data; one part of test resource transfer data together with the remaining training resource transfer data forms a training group, so that one training group is obtained for each part. For example, the server divides the historical resource transfer data into 5 equal parts, namely equal amount data 1, equal amount data 2, equal amount data 3, equal amount data 4 and equal amount data 5. The server takes equal amount data 1 as the test resource transfer data and equal amount data 2 to equal amount data 5 as the training resource transfer data, obtaining the training group whose test resource transfer data is equal amount data 1. Following the same logic, the server obtains the other four training groups in turn, each with a different part as its test resource transfer data.
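The construction of training groups described above corresponds to what is commonly called k-fold splitting. A minimal sketch follows, assuming the records divide evenly into the preset number of parts; the function name `make_training_groups` and the dictionary keys are hypothetical.

```python
def make_training_groups(records, k=5):
    """Split records into k equal parts; each part serves once as the test data."""
    if len(records) % k != 0:
        raise ValueError("this sketch assumes the data divides evenly into k parts")
    size = len(records) // k
    parts = [records[i * size:(i + 1) * size] for i in range(k)]
    groups = []
    for i, test_part in enumerate(parts):
        # All other parts together form the training data for this group.
        train_part = [r for j, part in enumerate(parts) if j != i for r in part]
        groups.append({"test": test_part, "train": train_part})
    return groups


# Ten hypothetical records split into 5 equal parts of 2 records each.
groups = make_training_groups(list(range(10)), k=5)
for group in groups:
    print(group["test"], group["train"])
```

Each record appears in the test data of exactly one training group and in the training data of the other four, which matches the example with equal amount data 1 to 5.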
The server acquires the initial abnormal data detection models. Different initial abnormal data detection models have different model structures and represent different model types, so that together they constitute the various types of initial abnormal data detection models.
The server inputs each training group into each type of initial abnormal data detection model for model training. Each type of initial abnormal data detection model may include several initial abnormal data detection models of that model type, and the number of initial abnormal data detection models of each type is the same as the number of training groups. That is, each initial abnormal data detection model is trained with one training group, and the models within one type are trained with non-repeating training groups. The initial abnormal data detection model performs abnormal data detection on the training resource transfer data in its training group to obtain the training abnormal possibility corresponding to the training resource transfer data. The server performs loss calculation according to the training data labels corresponding to the training resource transfer data and the training abnormal possibility to obtain the model loss information corresponding to the initial abnormal data detection model. The initial abnormal data detection model is then updated according to the model loss information to obtain an updated abnormal data detection model.
The server verifies the updated abnormal data detection model using the test resource transfer data in the training group. When the abnormal possibility corresponding to the test resource transfer data does not reach a preset test completion condition, the updated abnormal data detection model is trained iteratively with the training resource transfer data in the training group until the loss information of the updated abnormal data detection model is less than a preset threshold and the abnormal possibility corresponding to the test resource transfer data reaches the preset test completion condition, so that a trained target abnormal data detection model is obtained.
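The train, verify and iterate cycle for a single training group may be illustrated by the following sketch. It is only an assumption-laden stand-in: a small logistic regression trained by gradient descent replaces the RF, MLP, DT and SVM models of the embodiments, the test completion condition is assumed to be a minimum accuracy on the test data, and all names, thresholds and sample values are hypothetical.

```python
import math


def predict(weights, bias, features):
    """Return the abnormal possibility of one record (logistic function)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weights, bias, rows, labels):
    """Average cross-entropy between the labels and the predicted possibilities."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def accuracy_on(weights, bias, rows, labels):
    """Share of records whose thresholded possibility matches the label."""
    hits = sum((predict(weights, bias, x) > 0.5) == (y == 1) for x, y in zip(rows, labels))
    return hits / len(rows)


def train_group(group, loss_threshold=0.2, min_accuracy=0.9, lr=0.5, max_rounds=5000):
    rows, labels = group["train_x"], group["train_y"]
    weights, bias = [0.0] * len(rows[0]), 0.0
    loss = mean_log_loss(weights, bias, rows, labels)
    for _ in range(max_rounds):
        # Loss-driven update of the model on the training resource transfer data.
        errors = [predict(weights, bias, x) - y for x, y in zip(rows, labels)]
        for k in range(len(weights)):
            weights[k] -= lr * sum(e * x[k] for e, x in zip(errors, rows)) / len(rows)
        bias -= lr * sum(errors) / len(rows)
        loss = mean_log_loss(weights, bias, rows, labels)
        # Verification on the test resource transfer data of the same group.
        verified = accuracy_on(weights, bias, group["test_x"], group["test_y"]) >= min_accuracy
        if loss < loss_threshold and verified:
            break
    return weights, bias, loss


group = {
    "train_x": [[0.0], [0.2], [0.4], [1.6], [1.8], [2.0]],
    "train_y": [0, 0, 0, 1, 1, 1],
    "test_x": [[0.1], [1.9]],
    "test_y": [0, 1],
}
weights, bias, loss = train_group(group)
```

Training stops only when both conditions hold, mirroring the requirement that the loss information falls below the preset threshold and the test completion condition is reached.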
Following these steps, the server obtains, within each type of target abnormal data detection model, the target abnormal data detection model corresponding to each training group. Training with different training groups yields different model parameters, and target abnormal data detection models that share the same model structure but are built from different model parameters are treated as one type of target abnormal data detection model.
Alternatively, the server may obtain the historical resource transfer data after feature screening as one training group and use it to train one initial abnormal data detection model in each type of initial abnormal data detection model, obtaining the target abnormal data detection model corresponding to that training group in each type. The server then obtains historical resource transfer data after feature screening again as the next training group and trains the initial abnormal data detection models in each type, and in this way obtains the various types of target abnormal data detection models. The server performs model integration on the various types of target abnormal data detection models to obtain the abnormal data detection integrated model. The storage type of each target resource transfer characteristic in the feature-screened historical resource transfer data acquired by the server is the target storage type.
In one embodiment, as shown in FIG. 4, a training diagram of an abnormal data detection integrated model is provided. The types of initial abnormal data detection model are a Random Forest model (RF), a multilayer perceptron model (MLP), a Decision Tree model (DT) and a Support Vector Machine (SVM). The server divides the historical resource transfer data after feature screening into 5 training groups, each of which includes training resource transfer data and test resource transfer data. The server uses the 5 training groups to train each type of initial abnormal data detection model and obtains the trained target abnormal data detection models of each type. Each type contains one target abnormal data detection model per training group, that is, 5 MLP, 5 SVM, 5 DT and 5 RF models, so that each of the four types MLP, SVM, DT and RF corresponds to 5 target abnormal data detection models with different model parameters.
In this embodiment, because the storage space represented by the target storage type is smaller than the storage space represented by the initial storage type, the historical resource transfer data acquired for training each initial abnormal data detection model occupies less storage space. This increases the speed at which the historical resource transfer data is acquired and therefore the training efficiency of each initial abnormal data detection model.
In an embodiment, after step 310 of performing model integration based on each target abnormal data detection model to obtain an abnormal data detection integrated model, the method further includes:
acquiring resource transfer data to be detected;
inputting the resource transfer characteristics to be detected into an abnormal data detection integrated model, and respectively detecting abnormal data through each target abnormal data detection model in the abnormal data detection integrated model to obtain target abnormal possibility corresponding to each target abnormal data detection model;
and performing merging calculation based on the target abnormal possibility corresponding to each target abnormal data detection model to obtain the abnormal possibility corresponding to the resource transfer data to be detected.
The resource transfer data to be detected refers to resource transfer data that needs to be checked for abnormal data. The target abnormal possibility refers to the abnormal possibility that a target abnormal data detection model calculates from the resource transfer data to be detected.
Specifically, the server responds to an abnormal data detection instruction sent by the terminal and, according to the target account identifier in the abnormal data detection instruction, obtains from the data storage space the resource transfer data to be detected corresponding to the target account identifier. The resource transfer data to be detected may be historical resource transfer data corresponding to the target account identifier, and the storage type of the resource transfer features in the resource transfer data to be detected is the target storage type.
The server then inputs the resource transfer data to be detected into the abnormal data detection integrated model, and abnormal data detection is performed by each type of target abnormal data detection model in the integrated model. Specifically, each target abnormal data detection model belonging to a given type performs abnormal data detection on the resource transfer data to be detected and outputs an abnormal possibility. The abnormal possibilities output by the target abnormal data detection models of that type are averaged to obtain the target abnormal possibility corresponding to that type. Following the same logic, the target abnormal possibility corresponding to each type of target abnormal data detection model is obtained.
The abnormal data detection integrated model then averages the target abnormal possibilities corresponding to the various types of target abnormal data detection models to obtain the abnormal possibility corresponding to the resource transfer data to be detected. When the server detects that this abnormal possibility exceeds a preset abnormal possibility threshold, the server determines that the resource transfer data to be detected is abnormal data.
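The two-stage averaging can be illustrated with the following minimal sketch. The model outputs and the threshold value 0.5 are made-up assumptions chosen only for illustration, and the function name `ensemble_abnormal_possibility` is hypothetical.

```python
from statistics import mean


def ensemble_abnormal_possibility(outputs_by_type):
    """Average within each model type, then average the per-type means."""
    per_type = {name: mean(values) for name, values in outputs_by_type.items()}
    overall = mean(list(per_type.values()))
    return per_type, overall


# Hypothetical abnormal possibilities output by 5 models of each type.
outputs_by_type = {
    "RF": [0.9, 0.8, 0.7, 0.8, 0.8],
    "MLP": [0.6, 0.6, 0.6, 0.6, 0.6],
    "DT": [1.0, 1.0, 0.0, 1.0, 1.0],
    "SVM": [0.4, 0.4, 0.4, 0.4, 0.4],
}
per_type, overall = ensemble_abnormal_possibility(outputs_by_type)

ABNORMAL_THRESHOLD = 0.5  # assumed preset abnormal possibility threshold
is_abnormal = overall > ABNORMAL_THRESHOLD
```

Averaging per type first gives every model type the same weight in the final result, regardless of how many models each type contains.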
The server may preset abnormal possibility ranges to determine different abnormal levels. For example, when the abnormal possibility is in the range (0.5, 0.6), the abnormal level is determined to be a low abnormal level; when the abnormal possibility is in the range (0.6, 0.8), the abnormal level is determined to be a medium abnormal level; when the abnormal possibility is in the range (0.8, 1), the abnormal level is determined to be a high abnormal level.
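A possible mapping from abnormal possibility to abnormal level is sketched below. The text does not state which level a boundary value such as 0.6 belongs to, so the sketch assumes that each range includes its upper bound; the function name and the returned labels are hypothetical.

```python
def abnormal_level(possibility):
    """Map an abnormal possibility to a level; upper bounds are inclusive by assumption."""
    if 0.5 < possibility <= 0.6:
        return "low"
    if 0.6 < possibility <= 0.8:
        return "medium"
    if 0.8 < possibility <= 1.0:
        return "high"
    return None  # at or below 0.5: no abnormal level is assigned


levels = [abnormal_level(p) for p in (0.3, 0.55, 0.7, 0.9)]
```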
In one embodiment, as shown in FIG. 5, an abnormal data detection diagram of an abnormal data detection integrated model is provided. The server obtains resource transfer data to be detected of the target storage type and, after feature screening of the resource transfer data to be detected, inputs it into the abnormal data detection integrated model. The abnormal data detection integrated model includes four types of target abnormal data detection models, namely a Random Forest model (RF), a multilayer perceptron model (MLP), a Decision Tree model (DT) and a Support Vector Machine (SVM), and each type corresponds to 5 target abnormal data detection models with different model parameters, that is, 5 MLP, 5 SVM, 5 DT and 5 RF models. Each of these models performs abnormal data detection on the resource transfer data to be detected, and the abnormal possibilities output by the models of each type are averaged to obtain the prediction probability mean values corresponding to MLP, SVM, DT and RF respectively, namely the target abnormal possibilities. All the prediction probability mean values are then averaged through model integration to obtain the abnormal possibility corresponding to the resource transfer data to be detected, namely the prediction result.
In the embodiment, various initial abnormal data detection models are trained, and various trained target abnormal data detection models are subjected to model integration to obtain the abnormal data detection integrated model, so that the results of the models can be fused, the robustness is higher, and the detection accuracy of the abnormal data is improved.
In one embodiment, before step 202 of obtaining historical resource transfer data and corresponding training data labels (where the historical resource transfer data includes each historical resource transfer feature, each historical resource transfer feature is obtained by converting an initial historical resource transfer feature of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type), the method further includes:
acquiring initial historical resource transfer data, wherein the initial historical resource transfer data comprises each initial historical resource transfer characteristic;
and detecting the storage type corresponding to each initial historical resource transfer characteristic, and when the storage type corresponding to each initial historical resource transfer characteristic is the initial storage type, performing storage type conversion on the initial historical resource transfer characteristic of the initial storage type to obtain the historical resource transfer characteristic of the target storage type.
Wherein, the initial historical resource transfer data refers to the original historical resource transfer data. The initial historical resource transfer characteristics refer to unprocessed resource transfer characteristics. The storage type refers to a storage form of the resource transfer data in the storage space.
Specifically, the server obtains initial historical resource transfer data, which includes the initial historical resource transfer characteristics. An initial historical resource transfer characteristic may include resource transfer information corresponding to an account and the data value corresponding to that resource transfer information, and the account may be a financial account. The resource transfer information may also include account information representing basic user information. The initial historical resource transfer data is shown in Table 1, where the first column represents the resource transfer information and the second column represents the data values. For example, the number of data anomalies and its corresponding data value 2 together characterize one resource transfer characteristic.
TABLE 1
Resource transfer information | Data value
Whether resource transfer data is abnormal | 0 (0 is normal, 1 is abnormal)
Available resource balance | 10000.0
Number of data anomalies | 2
Amount of unreturned resources | 10000.0
Income per month | 2500
Age | 20
Family members | 4
Sex | Male
After the server acquires the initial historical resource transfer data, it may perform data preprocessing on the initial historical resource transfer data, including missing value processing, abnormal value processing and the like. The server can use the pandas library (an extension library) to locate the missing values in the data and fill them in. The server may then use the pandas library to draw a box plot of the data corresponding to each piece of resource transfer information and find extreme abnormal values, which are values far outside the normal distribution of the data, using the calculation Q3 + 3 × IQR (where Q3 represents the 75% boundary value of the data, i.e., 75% of the data falls below this boundary, and IQR is the result of subtracting the 25% boundary value from the 75% boundary value of the data). The server determines a data value exceeding Q3 + 3 × IQR to be an extreme abnormal value and eliminates the piece of data in which the extreme abnormal value is located. For example, if 3 of 1000 pieces of data are detected to contain extreme abnormal values, those three pieces of data are eliminated.
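The Q3 + 3 × IQR rule can be sketched as follows. The text mentions the pandas library; this sketch instead uses the Python standard library function `statistics.quantiles` as a stand-in so that it is self-contained. The column name `income`, the sample values and the choice of the "inclusive" quantile method are assumptions.

```python
from statistics import quantiles


def drop_extreme_outliers(rows, column):
    """Remove rows whose value in `column` exceeds Q3 + 3 * IQR."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # 25%, 50%, 75% boundaries
    upper = q3 + 3 * (q3 - q1)
    return [row for row in rows if row[column] <= upper], upper


incomes = [2000, 2200, 2500, 2600, 2800, 3000, 3100, 100000]
rows = [{"income": value} for value in incomes]
cleaned, upper = drop_extreme_outliers(rows, "income")
```

With these sample values the 25% and 75% boundaries are 2425 and 3025, so the upper limit is 4825 and only the record with income 100000 is eliminated.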
The server obtains the initial historical resource transfer data after data preprocessing and detects the storage type corresponding to each initial historical resource transfer characteristic. When the server detects that the storage type of an initial historical resource transfer characteristic is the initial storage type, it performs storage type conversion on that initial historical resource transfer characteristic to obtain a historical resource transfer characteristic of the target storage type.
The server can detect whether the storage space of the initial storage type of each initial historical resource transfer feature exceeds a target storage space, where the target storage space is the minimum storage space capable of storing the initial historical resource transfer feature. The server performs storage type conversion on each initial historical resource transfer feature whose initial storage type exceeds the target storage space to obtain the historical resource transfer feature corresponding to the target storage type.
In one particular embodiment, the storage type may be the data length type corresponding to a data value. The initial resource transfer characteristic may be a data value, and the initial storage type may be the initial data length of the data value. For example, the initial data length of the data value 100000 is int 32 bits, where int represents an integer data type. The initial data lengths corresponding to all the data values are the same.
The server can detect each data value, compare it with preset storage type conversion numerical ranges, and perform storage type conversion on a data value that falls within a preset storage type conversion numerical range to obtain a data value of the target storage type. For example, when a data value is detected to be within the preset storage type conversion numerical range of 0 to 127 (2^8), the data value is converted from the initial data length of int 32 bits to the target data length of int 8 bits, for example by generating an int 8-bit data value to replace the same data value of int 32 bits; when a data value is detected to be within the preset storage type conversion numerical range of 0 to 65535 (2^16), the data value is converted from the initial data length of int 32 bits to the target data length of int 16 bits.
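The storage type conversion may be illustrated with the standard-library `array` module, whose typecodes `b`, `h` and `i` denote signed integers of 1, 2 and (on common platforms) 4 bytes. This is only a sketch under stated assumptions: it uses the signed ranges -128 to 127 and -32768 to 32767, so a value such as 65535 would need an unsigned 16-bit type and is kept at 32 bits here; the function name `downcast` and the sample values are hypothetical.

```python
from array import array

# (typecode, minimum, maximum) for signed integer storage types, narrowest first.
SIGNED_TYPES = [
    ("b", -2**7, 2**7 - 1),    # 8-bit
    ("h", -2**15, 2**15 - 1),  # 16-bit
    ("i", -2**31, 2**31 - 1),  # 32-bit on common platforms
]


def downcast(values):
    """Store the values in the narrowest signed integer type that can hold them all."""
    lowest, highest = min(values), max(values)
    for typecode, type_min, type_max in SIGNED_TYPES:
        if type_min <= lowest and highest <= type_max:
            return array(typecode, values)
    raise OverflowError("values do not fit into a 32-bit signed integer")


ages = downcast([20, 35, 61])           # fits in 8 bits
incomes = downcast([2500, 10000])       # needs 16 bits
balances = downcast([100000, 250000])   # stays at 32 bits
print(ages.itemsize, incomes.itemsize, balances.itemsize)
```

The values themselves are unchanged by the conversion; only the number of bytes used per value is reduced, which is what shortens loading time during training.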
In this embodiment, the storage type conversion is performed on the initial historical resource transfer characteristic of the initial storage type to obtain the historical resource transfer characteristic of the target storage type, so that the storage space of the historical resource transfer data can be reduced, and the resources can be saved. Furthermore, the reading speed of the training data in the training process of the abnormal data detection model can be increased, so that the training efficiency of the abnormal data detection model is improved.
In one embodiment, step 204, calculating a feature correlation between the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain the target resource transfer features, includes:
calculating the feature correlation among the historical resource transfer features, and acquiring each relevant historical resource transfer feature set of which the feature correlation exceeds a preset feature correlation threshold from each historical resource transfer feature;
and respectively carrying out random screening on each related resource transfer characteristic set to obtain target resource transfer characteristics corresponding to each related resource transfer characteristic set.
The relevant historical resource transfer feature set refers to a set of all historical resource transfer features of which the feature correlation exceeds a preset feature correlation threshold.
Specifically, the server may calculate the feature correlation between the historical resource transfer features using the Pearson correlation coefficient, where a higher feature correlation indicates a higher degree of correlation. The server acquires, from the historical resource transfer features, each related resource transfer feature set whose feature correlation exceeds a preset feature correlation threshold; a related resource transfer feature set may consist of resource transfer features that are linearly correlated with one another. The preset feature correlation threshold may be 0.6. The server can then randomly select one target resource transfer feature from each related resource transfer feature set to obtain the target resource transfer features.
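A minimal sketch of correlation-based feature screening follows. Several details are assumptions rather than statements of the application: the absolute value of the Pearson coefficient is compared with the threshold, related sets are formed greedily in column order, a feature that correlates with no other feature forms a set of its own and is therefore kept, and no column is constant. The feature names and values are made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def screen_features(columns, threshold=0.6, rng=None):
    """Group correlated features and randomly keep one feature per group."""
    rng = rng or random.Random(0)
    names = list(columns)
    remaining = set(names)
    selected = []
    for name in names:
        if name not in remaining:
            continue
        group = [name] + [
            other for other in names
            if other != name and other in remaining
            and abs(pearson(columns[name], columns[other])) > threshold
        ]
        remaining.difference_update(group)
        selected.append(rng.choice(group))  # random screening within the related set
    return selected


columns = {
    "available_balance": [10000, 8000, 6000, 4000, 2000],
    "unreturned_amount": [0, 2100, 3900, 6000, 8000],
    "age": [20, 45, 31, 52, 27],
}
selected = screen_features(columns)
```

Here the balance and the unreturned amount are almost perfectly linearly related, so only one of the two is retained, while age is kept because it is only weakly correlated with both.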
In this embodiment, the feature correlation between the historical resource transfer features is calculated, and each related resource transfer feature set whose feature correlation exceeds the preset feature correlation threshold is randomly screened. This prevents linearly related historical resource transfer features from being retained together and avoids feature redundancy, thereby improving the training accuracy and training efficiency of the abnormal data detection model.
In one embodiment, before obtaining the historical resource transfer data and the corresponding training data labels in step 202, the method further includes:
acquiring a training sample set, wherein the training sample set comprises various historical resource transfer data and corresponding training data labels;
determining each normal historical resource transfer data and each abnormal historical resource transfer data from each historical resource transfer data based on the training data labels;
dividing each normal historical resource transfer data based on the abnormal data volume of each abnormal historical resource transfer data to obtain each normal historical resource transfer data set, wherein the normal data volume corresponding to each normal historical resource transfer data set is the same as the abnormal data volume;
selecting target normal historical resource transfer data from the normal historical resource transfer data set, and selecting target abnormal historical resource transfer data from each abnormal historical resource transfer data;
and taking the target normal historical resource transfer data and the target abnormal historical resource transfer data as historical resource transfer data.
The training sample set refers to historical resource transfer data used for performing model training on the abnormal data detection model. Normal historical resource transfer data refers to resource transfer data that is marked as normal. Abnormal historical resource transfer data refers to resource transfer data that is marked as abnormal. The target normal historical resource transfer data refers to part of the normal resource transfer data selected from the normal historical resource transfer data set. The target abnormal historical resource transfer data refers to part of the abnormal resource transfer data selected from the abnormal historical resource transfer data set.
Specifically, the server obtains a training sample set, which includes historical resource transfer data and corresponding training data labels. The server then determines the normal historical resource transfer data and the abnormal historical resource transfer data from the historical resource transfer data according to the training data labels. The server divides the normal historical resource transfer data according to the abnormal data volume of the abnormal historical resource transfer data to obtain normal historical resource transfer data sets, where the normal data volume of each normal historical resource transfer data set is the same as the abnormal data volume. The server may calculate the ratio of the normal data volume of the normal historical resource transfer data to the abnormal data volume of the abnormal historical resource transfer data and divide the normal historical resource transfer data according to this ratio; for example, with N = normal data volume / abnormal data volume, the normal historical resource transfer data is divided into N parts to obtain N normal historical resource transfer data sets.
Then, the server can randomly select target normal historical resource transfer data from each normal historical resource transfer data set according to a preset data volume, select target abnormal historical resource transfer data from each abnormal historical resource transfer data set, and take the target normal historical resource transfer data and the target abnormal historical resource transfer data as historical resource transfer data. The server may also use each abnormal historical resource transfer data as an abnormal historical resource transfer data set, combine each abnormal historical resource transfer data set with each normal historical resource transfer data set, and use the combined data as historical resource transfer data.
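The variant in which the abnormal historical resource transfer data is combined with each normal historical resource transfer data set can be sketched as follows. The sketch assumes label 0 for normal and 1 for abnormal (as in Table 1), integer division for N, and a shuffle before splitting; the function name and the sample records are hypothetical.

```python
import random


def build_balanced_sets(records, labels, rng=None):
    """Split normal records into N parts of abnormal size and pair each with the abnormal records."""
    rng = rng or random.Random(0)
    normal = [r for r, y in zip(records, labels) if y == 0]
    abnormal = [r for r, y in zip(records, labels) if y == 1]
    size = len(abnormal)
    n_parts = len(normal) // size  # N = normal data volume / abnormal data volume
    rng.shuffle(normal)  # assumption: normal records are shuffled before splitting
    normal_sets = [normal[i * size:(i + 1) * size] for i in range(n_parts)]
    return [normal_set + abnormal for normal_set in normal_sets]


records = [("normal", i) for i in range(12)] + [("abnormal", i) for i in range(3)]
labels = [0] * 12 + [1] * 3
balanced_sets = build_balanced_sets(records, labels)
```

With 12 normal and 3 abnormal records, N is 4, and each of the four resulting sets contains 3 normal and 3 abnormal records.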
In this embodiment, each piece of normal historical resource transfer data is divided according to the abnormal data volume of each piece of abnormal historical resource transfer data, and the target normal historical resource transfer data and the target abnormal historical resource transfer data with the same data volume are used as the historical resource transfer data, so that the problem of unbalanced data categories can be avoided, and the detection accuracy of the abnormal data detection model is improved.
In a specific embodiment, the server obtains the initial historical resource transfer data, performs data preprocessing such as missing value filling and abnormal value processing on the initial historical resource transfer data, and converts an initial storage type of the initial historical resource transfer data after the data preprocessing into a target storage type to obtain the historical resource transfer data. And then the server takes the historical resource transfer data as training data and acquires training data labels corresponding to the training data. And the server performs model training on the abnormal data detection integrated model according to the training data and the training data labels to obtain a trained abnormal data detection integrated model, and the abnormal data detection integrated model is used for performing abnormal data detection.
In one embodiment, the abnormal data detection integrated model is also used for detecting whether an account will be overdue in repayment. The server acquires to-be-detected borrowing data corresponding to the to-be-detected account, where the to-be-detected borrowing data is historical borrowing data stored in a data storage space in advance and the storage type of the historical borrowing data is the target storage type. The server inputs the to-be-detected borrowing data into the abnormal data detection integrated model for abnormal data detection and obtains the abnormal possibility output by the abnormal data detection integrated model, which represents the overdue repayment probability of the to-be-detected account as predicted by the abnormal data detection integrated model from the to-be-detected borrowing data. When the abnormal possibility exceeds a preset abnormal possibility threshold, the overdue repayment possibility of the to-be-detected account is high, and the to-be-detected account is determined to be an overdue repayment account.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides an abnormal data detection model training device for implementing the above abnormal data detection model training method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the method, so that specific limitations in one or more embodiments of the abnormal data detection model training device provided below can be referred to the limitations of the abnormal data detection model training method in the above, and details are not repeated herein.
In one embodiment, as shown in FIG. 6, there is provided an abnormal data detection model training apparatus 600, including: an obtaining module 602, a screening module 604, an initial detection module 606, a loss calculation module 608, and a training iteration module 610, wherein:
an obtaining module 602, configured to obtain historical resource transfer data and a corresponding training data label, where the historical resource transfer data includes various historical resource transfer features, each historical resource transfer feature is obtained by converting an initial historical resource transfer feature of an initial storage type into a target storage type, and a storage space represented by the initial storage type exceeds a storage space represented by the target storage type;
the screening module 604 is configured to calculate a feature correlation between the historical resource transfer features, and perform feature screening on the historical resource transfer features based on the feature correlation to obtain target resource transfer features;
an initial detection module 606, configured to input each target resource transfer characteristic into an initial abnormal data detection model to perform abnormal data detection, so as to obtain a training abnormal possibility corresponding to historical resource transfer data;
a loss calculation module 608, configured to perform loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and update the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and the training iteration module 610 is configured to use the updated abnormal data detection model as an initial abnormal data detection model, and return to the step of obtaining the historical resource transfer data and the corresponding training data label for iteration execution, until a training completion condition is reached, to obtain a target abnormal data detection model, where the target abnormal data detection model is used to detect the abnormal possibility of the resource transfer data.
In one embodiment, the abnormal data detection model training apparatus 600 further includes:
the integrated model unit is used for inputting the transfer characteristics of each target resource into each initial abnormal data detection model respectively to carry out abnormal data detection so as to obtain the training abnormal possibility corresponding to each initial abnormal data detection model, and at least two initial abnormal data detection models are established by using different model structures; respectively performing loss calculation based on the training abnormal possibility and the training data label corresponding to each initial abnormal data detection model to obtain model loss information corresponding to each initial abnormal data detection model; updating the corresponding initial abnormal data detection models respectively based on the model loss information corresponding to each initial abnormal data detection model to obtain each updated abnormal data detection model; respectively taking each updated abnormal data detection model as an initial abnormal data detection model, returning to the step of obtaining the historical resource transfer data and the corresponding training data labels for iterative execution, and obtaining each target abnormal data detection model until a training completion condition is reached; and carrying out model integration based on each target abnormal data detection model to obtain an abnormal data detection integrated model.
In one embodiment, the abnormal data detection model training apparatus 600 further includes:
the detection unit is used for acquiring resource transfer data to be detected;
inputting the resource transfer characteristics to be detected into an abnormal data detection integrated model, and respectively detecting abnormal data through each target abnormal data detection model in the abnormal data detection integrated model to obtain target abnormal possibility corresponding to each target abnormal data detection model; and performing merging calculation based on the target abnormal possibility corresponding to each target abnormal data detection model to obtain the abnormal possibility corresponding to the resource transfer data to be detected.
In one embodiment, the abnormal data detection model training apparatus 600 further comprises:
the storage type conversion unit is used for acquiring initial historical resource transfer data, which comprises initial historical resource transfer features; and detecting the storage type of each initial historical resource transfer feature, and, when the storage type of an initial historical resource transfer feature is the initial storage type, converting that feature to the target storage type to obtain the historical resource transfer feature of the target storage type.
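A minimal sketch of the storage type conversion, using only the Python standard library. The concrete types are assumptions for illustration: 8-byte floats and 8-byte integers stand in for the "initial storage type", and the `NARROWER` mapping and helper names are hypothetical. The overflow guard for integers is an added safety check, not something the passage above states.

```python
from array import array

# Hypothetical mapping from a wide "initial" typecode to a narrower "target" one.
NARROWER = {
    "d": "f",  # 8-byte float -> 4-byte float (loses precision beyond ~7 digits)
    "q": "i",  # 8-byte signed int -> platform int (usually 4 bytes)
}


def fits(values, typecode: str) -> bool:
    """Return True if every integer in `values` fits the signed `typecode`."""
    bits = 8 * array(typecode).itemsize
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return all(lo <= v <= hi for v in values)


def convert_storage_type(column: array) -> array:
    """Detect the column's storage type and narrow it when it is an initial type."""
    target = NARROWER.get(column.typecode)
    if target is None:
        return column  # not an initial storage type: leave unchanged
    if column.typecode == "q" and not fits(column, target):
        return column  # narrowing would overflow: keep the wide type
    return array(target, column.tolist())


def memory_bytes(column: array) -> int:
    """Bytes used by the column's element buffer."""
    return column.itemsize * len(column)
```

Comparing `memory_bytes` before and after conversion shows the reduced footprint that motivates the conversion.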
In one embodiment, the screening module 604 includes:
the feature screening unit is used for calculating the feature correlation among the historical resource transfer features, and acquiring, from the historical resource transfer features, each related historical resource transfer feature set whose feature correlation exceeds a preset feature correlation threshold; and performing random screening on each related historical resource transfer feature set to obtain the target resource transfer feature corresponding to each related historical resource transfer feature set.
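The screening step can be sketched as below. Several choices are assumptions made for illustration: Pearson correlation as the correlation measure, the absolute value being compared against the threshold, correlated features being grouped transitively, and features that correlate with nothing being kept as singleton sets. The function names and the 0.9 default threshold are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def screen_features(
    columns: Dict[str, Sequence[float]],
    threshold: float = 0.9,
    seed: Optional[int] = None,
) -> List[str]:
    """Group features whose |correlation| exceeds the threshold, keep one per group."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)

    rng = random.Random(seed)
    # Random screening: one representative feature per correlated set.
    return [rng.choice(members) for members in groups.values()]
```

Keeping a single representative per correlated set reduces the number of model inputs without discarding an entire group of related signals.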
In one embodiment, the abnormal data detection model training apparatus 600 further includes:
the data dividing unit is used for acquiring a training sample set, the training sample set comprising a plurality of pieces of historical resource transfer data and corresponding training data labels; determining normal historical resource transfer data and abnormal historical resource transfer data from the historical resource transfer data based on the training data labels; dividing the normal historical resource transfer data based on the abnormal data volume of the abnormal historical resource transfer data to obtain normal historical resource transfer data sets, wherein the normal data volume of each normal historical resource transfer data set is the same as the abnormal data volume; selecting target normal historical resource transfer data from the normal historical resource transfer data sets, and selecting target abnormal historical resource transfer data from the abnormal historical resource transfer data; and taking the target normal historical resource transfer data and the target abnormal historical resource transfer data as the historical resource transfer data.
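The data division can be sketched as follows. It assumes labels of 0 for normal and 1 for abnormal, shuffles the normal samples before dividing them, pairs every normal set with all abnormal samples, and drops any trailing normal samples that cannot fill a complete set. These are illustrative assumptions, and `balanced_batches` is a hypothetical name.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def balanced_batches(
    samples: Sequence, labels: Sequence[int], seed: Optional[int] = None
) -> Iterator[List[Tuple[object, int]]]:
    """Yield training batches with equal numbers of normal and abnormal samples."""
    normal = [s for s, y in zip(samples, labels) if y == 0]
    abnormal = [s for s, y in zip(samples, labels) if y == 1]
    k = len(abnormal)  # the abnormal data volume sets the size of each normal set
    if k == 0:
        raise ValueError("at least one abnormal sample is required")

    rng = random.Random(seed)
    rng.shuffle(normal)

    # Divide the normal samples into sets of exactly k items each.
    for start in range(0, len(normal) - k + 1, k):
        normal_set = normal[start:start + k]
        batch = [(s, 0) for s in normal_set] + [(s, 1) for s in abnormal]
        rng.shuffle(batch)
        yield batch
```

Each yielded batch can serve as the historical resource transfer data for one training iteration, so the model never sees the heavy class imbalance of the full sample set in a single step.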
Each module in the abnormal data detection model training apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores historical resource transfer data. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal through a network. The computer program, when executed by the processor, implements an abnormal data detection model training method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an abnormal data detection model training method. The display unit of the computer device is used for forming a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen; a key, a trackball, or a touchpad arranged on the housing of the computer device; or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the configurations shown in fig. 7-8 are only block diagrams of the parts of the structure relevant to the present disclosure and do not limit the computer devices to which the present disclosure may be applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include a Read-Only Memory (ROM), a magnetic tape, a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile memory, a Resistive Random Access Memory (ReRAM), a Magnetic Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), a Phase Change Memory (PCM), a graphene memory, and the like. The volatile memory may include a Random Access Memory (RAM), an external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination should be considered within the scope of the present disclosure as long as the combined features do not contradict one another.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. An abnormal data detection model training method, characterized in that the method comprises:
acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer characteristics, the various historical resource transfer characteristics are obtained by converting initial historical resource transfer characteristics of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
calculating the feature correlation among the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain target resource transfer features;
inputting each target resource transfer feature into an initial abnormal data detection model to perform abnormal data detection, to obtain the training abnormal possibility corresponding to the historical resource transfer data;
performing loss calculation based on the training abnormal possibility and the training data label to obtain training loss information, and updating the initial abnormal data detection model based on the training loss information to obtain an updated abnormal data detection model;
and taking the updated abnormal data detection model as the initial abnormal data detection model, and returning to the step of acquiring the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain a target abnormal data detection model, wherein the target abnormal data detection model is used for detecting the abnormal possibility of resource transfer data.
2. The method of claim 1, wherein there are at least two initial abnormal data detection models; the method further comprises:
respectively inputting each target resource transfer feature into each initial abnormal data detection model to perform abnormal data detection, to obtain the training abnormal possibility corresponding to each initial abnormal data detection model, wherein the at least two initial abnormal data detection models are established using different model structures;
respectively performing loss calculation based on the training abnormal possibility corresponding to each initial abnormal data detection model and the training data label to obtain model loss information corresponding to each initial abnormal data detection model;
updating the corresponding initial abnormal data detection models respectively based on the model loss information corresponding to each initial abnormal data detection model to obtain each updated abnormal data detection model;
respectively taking each updated abnormal data detection model as an initial abnormal data detection model, and returning to the step of acquiring the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain each target abnormal data detection model;
and performing model integration based on the target abnormal data detection models to obtain an abnormal data detection integrated model.
3. The method according to claim 2, wherein after the performing model integration based on the target abnormal data detection models to obtain an abnormal data detection integrated model, the method further comprises:
acquiring resource transfer data to be detected;
inputting the resource transfer characteristics to be detected into the abnormal data detection integrated model, and respectively performing abnormal data detection through each target abnormal data detection model in the abnormal data detection integrated model to obtain target abnormal possibility corresponding to each target abnormal data detection model;
and performing merging calculation based on the target abnormal possibility corresponding to each target abnormal data detection model to obtain the abnormal possibility corresponding to the resource transfer data to be detected.
4. The method according to claim 1, wherein before the acquiring historical resource transfer data and corresponding training data labels, the historical resource transfer data comprising various historical resource transfer characteristics obtained by converting initial historical resource transfer characteristics of an initial storage type into a target storage type, the storage space represented by the initial storage type exceeding the storage space represented by the target storage type, the method further comprises:
acquiring initial historical resource transfer data, wherein the initial historical resource transfer data comprises each initial historical resource transfer characteristic;
and detecting the storage type corresponding to each initial historical resource transfer characteristic, and when the storage type corresponding to each initial historical resource transfer characteristic is the initial storage type, performing storage type conversion on the initial historical resource transfer characteristic of the initial storage type to obtain the historical resource transfer characteristic of the target storage type.
5. The method according to claim 1, wherein the calculating a feature correlation between the historical resource transfer features, and performing feature screening on the historical resource transfer features based on the feature correlation to obtain target resource transfer features comprises:
calculating the feature correlation among the historical resource transfer features, and acquiring each related historical resource transfer feature set of which the feature correlation exceeds a preset feature correlation threshold from each historical resource transfer feature;
and respectively performing random screening on each related historical resource transfer feature set to obtain the target resource transfer feature corresponding to each related historical resource transfer feature set.
6. The method of claim 1, further comprising, prior to said obtaining historical resource transfer data and corresponding training data labels:
acquiring a training sample set, wherein the training sample set comprises various historical resource transfer data and corresponding training data labels;
determining each normal historical resource transfer data and each abnormal historical resource transfer data from the historical resource transfer data based on the training data labels;
dividing each normal historical resource transfer data based on the abnormal data volume of each abnormal historical resource transfer data to obtain each normal historical resource transfer data set, wherein the normal data volume corresponding to each normal historical resource transfer data set is the same as the abnormal data volume;
selecting target normal historical resource transfer data from the normal historical resource transfer data set, and selecting target abnormal historical resource transfer data from each abnormal historical resource transfer data;
and taking the target normal historical resource transfer data and the target abnormal historical resource transfer data as the historical resource transfer data.
7. An abnormal data detection model training apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring historical resource transfer data and corresponding training data labels, wherein the historical resource transfer data comprises various historical resource transfer characteristics, the various historical resource transfer characteristics are obtained by converting initial historical resource transfer characteristics of an initial storage type into a target storage type, and the storage space represented by the initial storage type exceeds the storage space represented by the target storage type;
the screening module is used for calculating the characteristic correlation among the historical resource transfer characteristics, and carrying out characteristic screening on the historical resource transfer characteristics based on the characteristic correlation to obtain the target resource transfer characteristics;
the initial detection module is used for inputting each target resource transfer feature into an initial abnormal data detection model to perform abnormal data detection, so as to obtain the training abnormal possibility corresponding to the historical resource transfer data;
the loss calculation module is used for performing loss calculation on the basis of the training abnormal possibility and the training data labels to obtain training loss information, and updating the initial abnormal data detection model on the basis of the training loss information to obtain an updated abnormal data detection model;
and the training iteration module is used for taking the updated abnormal data detection model as the initial abnormal data detection model, and returning to the step of acquiring the historical resource transfer data and the corresponding training data labels for iterative execution until a training completion condition is reached, to obtain a target abnormal data detection model, wherein the target abnormal data detection model is used for detecting the abnormal possibility of resource transfer data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211410472.5A 2022-11-11 2022-11-11 Abnormal data detection model training method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN115905864A (en) 2023-04-04

Family

ID=86481779

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202211410472.5A Pending CN115905864A (en) 2022-11-11 2022-11-11 Abnormal data detection model training method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN115905864A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116449770A (en) * 2023-06-15 2023-07-18 中科航迈数控软件(深圳)有限公司 Machining method, device and equipment of numerical control machine tool and computer storage medium
CN116449770B (en) * 2023-06-15 2023-09-15 中科航迈数控软件(深圳)有限公司 Machining method, device and equipment of numerical control machine tool and computer storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant after: Zhaolian Consumer Finance Co.,Ltd.
Applicant after: SUN YAT-SEN University

Country or region before: China
Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant before: MERCHANTS UNION CONSUMER FINANCE Co.,Ltd.
Applicant before: SUN YAT-SEN University