CN116051262A - Training method of risk prediction model, risk prediction method and device - Google Patents

Training method of risk prediction model, risk prediction method and device

Info

Publication number
CN116051262A
Authority
CN
China
Prior art keywords
data
risk prediction
prediction model
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211616132.8A
Other languages
Chinese (zh)
Inventor
王皓
周志忠
刘文虎
彭杰
余菡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zoomlion Heavy Industry Science and Technology Co Ltd
Zhongke Yungu Technology Co Ltd
Original Assignee
Zoomlion Heavy Industry Science and Technology Co Ltd
Zhongke Yungu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zoomlion Heavy Industry Science and Technology Co Ltd and Zhongke Yungu Technology Co Ltd
Priority to CN202211616132.8A
Publication of CN116051262A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of data analysis, and discloses a training method, a risk prediction method and a device for a risk prediction model, wherein the training method for the risk prediction model comprises the following steps: collecting the returned data of the equipment in real time; filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain association index data of the default state data; classifying the associated index data into operation index data and order index data; constructing a device data set according to the operation index data, the order index data and the default state data; based on the device dataset, a risk prediction model of the device is trained. According to the risk prediction method and the risk prediction device, the risk prediction model is trained through the equipment data set comprising the operation index data, and the accuracy of risk prediction is improved. Meanwhile, the trained risk prediction model is a real-time prediction model, and can be used for predicting the risk of the equipment in real time.

Description

Training method of risk prediction model, risk prediction method and device
Technical Field
The invention relates to the field of data analysis, in particular to a training method of a risk prediction model, a risk prediction method and a risk prediction device.
Background
With the continuous development of engineering machinery technology, the price of engineering machinery equipment keeps rising. To put equipment into production quickly, users typically purchase engineering machinery through installment loans. When a borrowing user can no longer repay the loan, the user is determined to be in default on the debt, is monitored, and is subjected to handling measures. To avoid bad debts caused by debt default, the repayment status and repayment capability of the borrowing user are monitored according to financial index data of the order, such as the user's overdue amount, and post-loan risk analysis is performed on the user who purchased the equipment.
However, in the prior art, a large amount of financial index data is usually collected manually for risk analysis of the equipment, so the data is often not updated in time, which reduces the accuracy of the risk analysis result. In addition, financial index data such as the user's overdue amount and overdue number are generally updated on the repayment date, so they have poor timeliness, which further lowers the accuracy of the risk analysis result. Because the analysis results lag, it is also difficult to warn in advance about users at risk of debt default.
Disclosure of Invention
The invention aims to provide a training method of a risk prediction model, a risk prediction method and a device, so as to solve the problem of low accuracy of risk analysis results.
To achieve the above object, in a first aspect, the present application provides a training method of a risk prediction model, including:
collecting returned data of the equipment in real time, wherein the returned data comprises default state data and first quantity of dimension index data;
filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain association index data of the default state data;
classifying the associated index data into operation index data and order index data;
constructing a device data set according to the operation index data, the order index data and the default state data;
and training a preset model based on the equipment data set to obtain a risk prediction model of the equipment.
With reference to the first aspect, in a first possible implementation manner, after the training a preset model based on the device dataset to obtain a risk prediction model of the device, the method further includes:
classifying the equipment data sets according to preset time intervals to obtain a second number of time window data sets;
performing a loop step for each of the time window data sets in turn based on a time sequence, wherein the loop step comprises:
inputting the time window data set into the risk prediction model to obtain risk prediction data of the next time window;
comparing the risk prediction data of the next time window with the default state data to obtain risk prediction data with errors;
updating a next time window dataset based on the risk prediction data with errors;
and updating the risk prediction model according to the next time window data set.
With reference to the first aspect, in a second possible implementation manner, the training a preset model based on the device data set to obtain a risk prediction model of the device includes:
and training a preset model by taking the operation index data and the order index data as model independent variables and the default state data as model dependent variables to obtain a risk prediction model of the equipment.
With reference to the first aspect, in a third possible implementation manner, before the filtering the dimension index data according to the association level of the default status data and each dimension index data to determine the operation index data and the order index data of the device, the method further includes:
classifying the return data into positive class data and negative class data, wherein the number of the positive class data is smaller than that of the negative class data;
and performing linear fitting on the positive class data, and increasing the number of the positive class data until the difference value of the number of the positive class data and the number of the negative class data is smaller than a preset threshold value.
With reference to the first aspect, in a fourth possible implementation manner, training a preset model based on the device data set to obtain a risk prediction model of the device includes:
classifying the device data set into a training data set, a test data set and a verification data set based on a preset proportion;
training at least one preset model based on the training data set;
according to the test data set, testing each preset model to obtain a model to be verified;
and verifying the model to be verified based on the verification data set, and determining a risk prediction model of the equipment.
In a second aspect, the present application provides a risk prediction method, including:
acquiring the to-be-detected return data of the target equipment in real time;
and inputting the feedback data to be detected into a risk prediction model to obtain risk prediction data of the target equipment, wherein the risk prediction model is obtained according to the training method of the risk prediction model in the first aspect.
In a third aspect, the present application provides a training device for a risk prediction model, including:
the data acquisition module is used for acquiring the returned data of the equipment in real time, wherein the returned data comprises the default state data and the first number of dimension index data;
the data filtering module is used for filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain association index data of the default state data;
the data classification module is used for classifying the associated index data into operation index data and order index data;
the data set construction module is used for constructing a device data set according to the operation index data, the order index data and the default state data;
and the model training module is used for training a preset model based on the equipment data set to obtain a risk prediction model of the equipment.
In a fourth aspect, the present application provides a risk prediction apparatus, including:
the data acquisition module is used for acquiring the feedback data to be detected of the target equipment in real time;
and the risk prediction module is used for inputting the feedback data to be detected into a risk prediction model to obtain risk prediction data of the target equipment, wherein the risk prediction model is obtained according to the training method of the risk prediction model in the first aspect.
In a fifth aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the training method of the risk prediction model according to the first aspect, or implements the risk prediction method according to the second aspect.
In a sixth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of training the risk prediction model according to the first aspect, or implements the method of risk prediction according to the second aspect.
The application provides a training method of a risk prediction model, which comprises the following steps: collecting the returned data of the equipment in real time; filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain association index data of the default state data; classifying the associated index data into operation index data and order index data; constructing a device data set according to the operation index data, the order index data and the default state data; and training a preset model based on the equipment data set to obtain a risk prediction model of the equipment. According to the risk prediction method and the risk prediction device, the risk prediction model is trained through the equipment data set comprising the operation index data, and the accuracy of risk prediction is improved. Meanwhile, the trained risk prediction model is a real-time prediction model, and can be used for predicting the risk of the equipment in real time.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 shows a flowchart of a method for training a risk prediction model provided by an embodiment of the present application;
FIG. 2 shows a flow chart of a risk prediction method provided by an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a training device of a risk prediction model according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a risk prediction apparatus provided in an embodiment of the present application.
Detailed Description
The following describes the detailed implementation of the embodiments of the present invention with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
The terms "comprises," "comprising," "including," or any other variation thereof, as used in various embodiments of the present invention, are intended to indicate the presence of a specific feature, number, step, operation, element, component, or combination of the foregoing, and are not intended to exclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the invention belong. Terms such as those defined in commonly used dictionaries will be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the invention.
Example 1
Referring to fig. 1, fig. 1 shows a flowchart of a training method of a risk prediction model provided in an embodiment of the present application. The training method of the risk prediction model in fig. 1 includes:
S110, collecting the returned data of the equipment in real time.
The return data of the device is the data that the device returns to the data center and the business system, wherein the business system may be a CRM (Customer Relationship Management) system or a CSS (Cluster Synchronization Service) system. The equipment type is set according to actual requirements and may be engineering machinery, which is not limited herein. The return data includes default state data and a first number of dimension index data. The default state data indicates whether or not a default has occurred on the device. The dimension index data is quantized data representing different dimensions of the device, such as working hours and specifications.
Return data of order-related dimensions, such as the loan amount and the delinquent amount, is collected in real time from the data center, and return data of operation-related dimensions, such as accumulated working hours, is collected in real time from the business system. Specifically, when the return data is structured data, offline return data is collected with data synchronization tools such as NiFi and DataX, and real-time return data is collected with data synchronization tools such as Kafka. When the return data is unstructured data, it is first converted into structured data, then collected with a data synchronization tool and stored in a data warehouse.
S120, filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain association index data of the default state data.
The return data of the equipment contains many types of dimension index data, including dimension index data related to the default state of the equipment, such as the default amount of the loan, and dimension index data unrelated to the default state of the equipment, such as the production batch of the equipment. Association analysis is performed between the default state data and each dimension index data, and dimension index data unrelated to the default state data is filtered out to obtain the association index data associated with the default state data.
Specifically, the default state data is taken as the dependent variable and the dimension index data is taken as the independent variable, which gives the following formula:
$$Y = b_0 + b_1 X$$
wherein $b_1$ is the coefficient of the association relation between the dimension index data and the default state data; $b_0$ is a constant term obtained as a best-fit parameter; $Y$ is the default state data; and $X$ is the dimension index data, which may be data such as the working hours, repayment conditions and loan conditions of the equipment, and is not described in detail herein.
Return data is generally collected from a plurality of devices, and $b_1$ is calculated as follows:
$$b_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}$$
wherein $b_1$ is the coefficient of the association relation between the dimension index data and the default state data; $x_i$ is the dimension index data of the i-th device; $\bar{x}$ is the average value of the dimension index data of all the devices; $y_i$ is the default state data of the i-th device; and $\bar{y}$ is the average value of the default state data of all the devices.
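The filtering in S120 can be illustrated with a minimal Python sketch. It assumes that the association coefficient is the ordinary least-squares slope of the default state data on one dimension index, and that a dimension is kept when the absolute value of that slope reaches a cutoff. The function names, the column names and the cutoff `min_abs_b1` are hypothetical and are not specified in the application.

```python
from statistics import mean

def association_coefficient(x, y):
    """Least-squares slope b1 of y (default state) on x (one dimension index)."""
    x_bar, y_bar = mean(x), mean(y)
    denom = sum((xi - x_bar) ** 2 for xi in x)
    if denom == 0:
        return 0.0  # a constant column carries no association information
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / denom

def filter_dimensions(columns, default_state, min_abs_b1=0.05):
    """Keep dimensions whose |b1| reaches the assumed cutoff min_abs_b1."""
    return {
        name: values
        for name, values in columns.items()
        if abs(association_coefficient(values, default_state)) >= min_abs_b1
    }

# Toy return data for four devices (illustrative values only).
default_state = [0, 0, 1, 1]
columns = {
    "overdue_amount": [0.0, 1.0, 4.0, 5.0],
    "production_batch": [1.0, 2.0, 2.0, 1.0],
}
kept = filter_dimensions(columns, default_state)
print(list(kept))
```

Because the slope depends on the scale of each dimension, a practical cutoff would probably be applied after the dimensions are normalized; that step is omitted here.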
As an example, before filtering the dimension index data according to the association relation between the default state data and each dimension index data, the method further includes:
classifying the return data into positive class data and negative class data, wherein the number of the positive class data is smaller than that of the negative class data;
and performing linear fitting on the positive class data, and increasing the number of the positive class data until the difference value of the number of the positive class data and the number of the negative class data is smaller than a preset threshold value.
The amount of return data from devices on which a default has occurred is typically far smaller than the amount from devices on which no default has occurred. The return data is therefore balanced to improve the accuracy of the model. Specifically, the return data of devices on which a default has occurred is classified as positive class data, the return data of devices on which no default has occurred is classified as negative class data, and the number of positive class data is increased to balance the data. For ease of understanding, the embodiment of the present application uses the SMOTE (Synthetic Minority Oversampling Technique) algorithm to oversample the minority class: linear fitting is performed on the positive class data to synthesize new positive class data until the difference between the number of positive class data and the number of negative class data is smaller than a preset threshold, wherein the preset threshold is set according to actual requirements and is not limited herein.
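The balancing step can be sketched as follows. This is a simplified, SMOTE-style illustration using only the Python standard library, not the implementation used in the application: each synthetic positive sample is placed at a random position on the line segment between a positive sample and one of its nearest positive neighbours. The function name and the parameters `k`, `threshold` and `seed` are assumptions.

```python
import math
import random

def smote_like_oversample(positive, n_negative, threshold=1, k=2, seed=0):
    """Add synthetic positive samples until the class-count gap is below threshold."""
    if len(positive) < 2:
        raise ValueError("at least two positive samples are needed to interpolate")
    rng = random.Random(seed)
    synthetic = []
    while n_negative - (len(positive) + len(synthetic)) >= threshold:
        base = rng.choice(positive)
        # k nearest original positive neighbours of the chosen base sample
        neighbours = sorted(
            (p for p in positive if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return positive + synthetic

# Three defaulted devices against six non-defaulted ones (toy feature vectors).
positive = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
balanced = smote_like_oversample(positive, n_negative=6)
print(len(balanced))
```

With `threshold=1` the loop stops once the two classes are the same size; a larger threshold leaves a residual imbalance.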
It should be understood that after the return data of the device is collected in real time, processing operations such as cleaning, filling and conversion may be performed on the collected return data. Specifically, when the data is abnormal because of operator error or equipment abnormality, the data is cleaned. When the return data is incomplete because of a network abnormality, a positioning abnormality or the like, the missing return data is filled. When the return data includes character-type data, the character-type data is converted into numerical dummy variables.
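Two of these preprocessing operations can be sketched in a few lines of Python. The mean-filling strategy and the function and column names are assumptions made for illustration; the application does not state how missing values are filled.

```python
def fill_missing(values):
    """Replace None with the mean of the observed values (an assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def to_dummy_variables(values):
    """Convert a character-type column into 0/1 dummy variables, one per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

daily_hours = fill_missing([8.0, None, 6.0])
province = to_dummy_variables(["Hunan", "Hubei", "Hunan"])
print(daily_hours)
print(province)
```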
S130, classifying the associated index data into operation index data and order index data.
For ease of understanding, in the embodiment of the present application the order index data includes a total of 5 index data, namely a financial exposure index, a financial overdue index, an overdue number index, a sales order province index and a sales order city index. Specifically, the financial exposure index is the equipment loan amount; the financial overdue index is the overdue amount of the equipment loan; and the overdue number index is the number of defaults on the equipment loan.
The order index data acts directly on post-loan risk identification: it can be used to identify direct risks and to predict the risk that a default will occur on the equipment. The operation index data, in turn, is used to determine the current operating condition of the equipment. The income generated by operating the equipment is determined from its operating condition, the user's ability to repay the loan is determined from that income, and this assists in identifying the post-loan risk of the equipment.
The operation index data comprises working condition data and information data of the equipment. For ease of understanding, in the embodiment of the present application the working condition data includes a total of 7 index data, namely an accumulated working hour index, an accumulated square quantity index, an accumulated hole forming depth index, a daily working hour index, a daily square quantity index, a daily hole forming depth index and an offline time index. Specifically, the accumulated working hour index is the accumulated working hours of the equipment from the network access time to the current time. The accumulated square quantity index is the square quantity of concrete pumped by the concrete equipment from the network access time to the current time. The accumulated hole forming depth index is the accumulated hole forming depth of the piling equipment from the network access time to the current time. The daily working hour index is the number of hours the equipment is in the working state each day. The daily square quantity index is the square quantity of concrete pumped by the concrete equipment each day. The daily hole forming depth index is the hole forming depth of the piling equipment each day. The offline time index is the time difference between the current time and the last positioning time of the equipment. The information data includes a total of 2 index data, namely a model index and a specification index. Specifically, the model index is the model of the equipment, and the specification index is the specification of the equipment. After the associated index data is classified, a total of 14 index data for risk identification are obtained.
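The classification in S130 amounts to routing each associated index into one of two groups. A minimal sketch, with hypothetical English field names standing in for the 14 indexes listed above:

```python
# Hypothetical field names for the 5 order indexes.
ORDER_INDEXES = {
    "financial_exposure", "financial_overdue", "overdue_number",
    "order_province", "order_city",
}
# Hypothetical field names for the 9 operation indexes.
OPERATION_INDEXES = {
    # working condition data (7 indexes)
    "cumulative_working_hours", "cumulative_square_quantity",
    "cumulative_hole_depth", "daily_working_hours",
    "daily_square_quantity", "daily_hole_depth", "offline_time",
    # information data (2 indexes)
    "model", "specification",
}

def classify_indexes(record):
    """Split one device record into (operation, order) index dictionaries."""
    operation = {k: v for k, v in record.items() if k in OPERATION_INDEXES}
    order = {k: v for k, v in record.items() if k in ORDER_INDEXES}
    return operation, order

record = {
    "financial_overdue": 12000.0,
    "daily_working_hours": 6.5,
    "offline_time": 3.0,
    "order_city": "Changsha",
}
operation, order = classify_indexes(record)
print(operation, order)
```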
S140, constructing a device data set according to the operation index data, the order index data and the default state data.
The order index data is index data that contributes directly to post-loan risk identification, and the operation index data is index data that assists in post-loan risk identification of the device. A device data set is constructed according to the operation index data, the order index data and the default state data, so that a model for risk prediction can be trained on the device data set and the default state of the device can then be predicted from the operation index data and the order index data.
S150, training a preset model based on the equipment data set to obtain a risk prediction model of the equipment.
A machine learning algorithm is an algorithm that a computer can automatically improve through experience. The constructed equipment data set is input into a machine learning algorithm, and a preset model is trained to obtain a risk prediction model of the equipment. When risk prediction needs to be performed on the current equipment, the return data of the current equipment is input into the risk prediction model, and risk prediction is performed on the current equipment according to the output of the risk prediction model.
Because the operation index data is the index data which plays an auxiliary role in risk identification after loan of the equipment, compared with a model trained only by the order index data, the risk prediction model is trained by the equipment data set comprising the operation index data, and the accuracy of risk prediction is improved. Meanwhile, the order index data is updated based on the repayment date, time lag exists, and the operation index data is updated in real time. Compared with a model trained only through order index data, the risk prediction model trained by the method is a real-time prediction model, and risk prediction can be performed on equipment in real time.
As an example, after training a preset model based on the device data set to obtain the risk prediction model of the device, the method further includes:
classifying the equipment data sets according to preset time intervals to obtain a second number of time window data sets;
performing a loop step for each of the time window data sets in turn based on a time sequence, wherein the loop step comprises:
inputting the time window data set into the risk prediction model to obtain risk prediction data of the next time window;
comparing the risk prediction data of the next time window with the default state data to obtain risk prediction data with errors;
updating a next time window dataset based on the risk prediction data with errors;
and updating the risk prediction model according to the next time window data set.
The accuracy of the risk prediction result output by the risk prediction model can be further improved by empirically optimizing parameters of the risk prediction model. In particular, the data of the device dataset is time ordered based on the time series. Dividing the ordered device data sets according to preset time intervals to obtain a second number of time window data sets, wherein the time intervals and the second number are set according to actual requirements, and the method is not limited herein. And sequentially inputting each time window data set into the risk prediction model according to the time sequence, and carrying out parameter optimization on the risk prediction model.
Take inputting the data set of time window T into the risk prediction model as an example. The model outputs the risk prediction data of the next time window, that is, of time window T+1. This risk prediction data is compared with the default state data to detect whether the model predicted a device that did not default in time window T+1 as defaulting, or a device that defaulted in time window T+1 as not defaulting, which yields the risk prediction data with errors. The risk prediction data with errors is added to the data set of the next time window, thereby updating it, and the risk prediction model is updated according to the updated data set. The loop step is then executed again with the data set of time window T+1 as input, until all time window data sets have been input into the risk prediction model. This optimizes the parameters of the risk prediction model and improves the accuracy of the risk prediction data it outputs.
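One possible reading of this loop is sketched below. Everything in it is an assumption made for illustration: `ThresholdModel` is a hypothetical stand-in for the trained risk prediction model, the record fields `t`, `device`, `overdue` and `default` are invented, and adding the mispredicted records to the next window simply gives them more weight when the model is updated.

```python
from statistics import mean

class ThresholdModel:
    """Hypothetical stand-in: predicts a default when `overdue` exceeds a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, record):
        return 1 if record["overdue"] > self.cutoff else 0

    def update(self, records):
        pos = [r["overdue"] for r in records if r["default"] == 1]
        neg = [r["overdue"] for r in records if r["default"] == 0]
        if pos and neg:  # move the cutoff midway between the class means
            self.cutoff = (mean(pos) + mean(neg)) / 2

def split_into_windows(records, interval):
    """Group time-ordered records into consecutive windows of `interval` time units."""
    windows = {}
    for r in sorted(records, key=lambda r: r["t"]):
        windows.setdefault(r["t"] // interval, []).append(r)
    return [windows[k] for k in sorted(windows)]

def rolling_update(model, windows):
    """Predict window T+1 from window T, add the errors to T+1, update the model."""
    windows = [list(w) for w in windows]  # work on copies of the windows
    for i in range(len(windows) - 1):
        actual = {r["device"]: r for r in windows[i + 1]}
        errors = [
            actual[r["device"]]
            for r in windows[i]
            if r["device"] in actual
            and model.predict(r) != actual[r["device"]]["default"]
        ]
        windows[i + 1] = windows[i + 1] + errors  # updated next-window data set
        model.update(windows[i + 1])
    return model

records = [
    {"t": 0, "device": "A", "overdue": 1.0, "default": 0},
    {"t": 0, "device": "B", "overdue": 5.0, "default": 0},
    {"t": 1, "device": "A", "overdue": 1.0, "default": 0},
    {"t": 1, "device": "B", "overdue": 6.0, "default": 1},
    {"t": 2, "device": "A", "overdue": 2.0, "default": 0},
    {"t": 2, "device": "B", "overdue": 7.0, "default": 1},
]
model = rolling_update(ThresholdModel(cutoff=10.0), split_into_windows(records, 1))
print(model.cutoff)
```

In this toy run the initial cutoff misses the default of device B in the second window; the mispredicted record is added to that window, and the update pulls the cutoff down so the later window is predicted correctly.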
As an example, the training a preset model based on the device dataset to obtain a risk prediction model of the device includes:
and training a preset model by taking the operation index data and the order index data as model independent variables and the default state data as model dependent variables to obtain a risk prediction model of the equipment.
The default state data is taken as the model dependent variable: specifically, a device on which a default has occurred is labeled 1 and a device on which no default has occurred is labeled 0. The operation index data and the order index data are taken as the model independent variables, and the preset model is trained to obtain the risk prediction model of the equipment. The trained risk prediction model can output, from the input data, risk prediction data that predicts the default state of the equipment. It should be understood that the risk prediction data may further include data such as the predicted time at which the device will be in the default state, which is not described in detail herein.
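The independent/dependent variable setup can be made concrete with a small example. The sketch below uses logistic regression trained by plain gradient descent purely as a stand-in for the preset model; the application does not name this algorithm here, and the two features (a normalized daily working hour index and a normalized overdue number index) and all values are invented.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1 / (1 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (default) or 0 (no default) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Independent variables: [daily working hours, overdue number], both scaled to 0..1.
X = [[0.9, 0.0], [0.8, 0.1], [0.2, 0.8], [0.1, 1.0]]
# Dependent variable: 1 = default occurred, 0 = no default.
y = [0, 0, 1, 1]

w, b = train_logistic(X, y)
print(predict(w, b, [0.15, 0.9]), predict(w, b, [0.85, 0.05]))
```

In this toy data a device that works few hours and has many overdue periods is labeled as defaulting, which mirrors the idea that operation indexes assist the order indexes in the prediction.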
As an example, the training a preset model based on the device dataset to obtain a risk prediction model of the device includes:
classifying the device data set into a training data set, a test data set and a verification data set based on a preset proportion;
training at least one preset model based on the training data set;
according to the test data set, testing each preset model to obtain a model to be verified;
and verifying the model to be verified based on the verification data set, and determining a risk prediction model of the equipment.
When the preset model is built based on a machine learning algorithm, the choice of algorithm affects the accuracy of the preset model, so candidate preset models need to be screened in advance to find one with higher accuracy. Specifically, the device data set is classified into a training data set, a test data set and a verification data set based on a preset proportion, wherein the preset proportion is set according to actual requirements and is not limited herein. For ease of understanding, the preset proportion in the embodiments of the present application is 6:2:2.
The type of machine learning algorithm is selected according to actual requirements and may be an SVM (Support Vector Machine) algorithm, a random forest algorithm, or the like, which is not limited herein. At least one machine learning algorithm is selected as required, the training data set is input into each selected algorithm, and at least one preset model is trained. Each preset model is tested on the same test data set. The accuracy of each preset model in predicting the default state is evaluated, preset models with lower accuracy are filtered out, and preset models with higher accuracy are determined as models to be verified. The models to be verified are then verified on the verification data set to obtain the preset model with the highest accuracy in predicting the default state, and this model is determined as the risk prediction model of the equipment.
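The split and the two-stage screening can be sketched as follows. The candidates here are trivial hand-written rules standing in for already-trained preset models, and the names `split_dataset`, `select_model` and the cutoff `min_test_accuracy` are assumptions made for illustration.

```python
import random

def split_dataset(records, ratios=(6, 2, 2), seed=0):
    """Shuffle and split records into training, test and verification sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_test = len(shuffled) * ratios[1] // total
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )

def accuracy(model, records):
    """Share of records whose default state the model predicts correctly."""
    return sum(model(r["x"]) == r["y"] for r in records) / len(records)

def select_model(candidates, test_set, verification_set, min_test_accuracy=0.5):
    """Drop weak candidates on the test set, then pick the best on verification."""
    to_verify = {
        name: m for name, m in candidates.items()
        if accuracy(m, test_set) >= min_test_accuracy
    }
    return max(to_verify, key=lambda name: accuracy(to_verify[name], verification_set))

# Toy data: a device defaults when its single feature x is at least 5.
records = [{"x": x, "y": 1 if x >= 5 else 0} for x in range(10)]
train, test, verification = split_dataset(records)

# Hypothetical stand-ins for trained preset models.
candidates = {
    "cutoff_5": lambda x: 1 if x >= 5 else 0,
    "always_0": lambda x: 0,
}
best = select_model(candidates, test, verification)
print(len(train), len(test), len(verification), best)
```

Keeping the verification set separate from the test set means the final choice is checked on data that played no part in filtering the candidates.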
The present application provides a training method of a risk prediction model, which comprises the following steps: collecting return data of the device in real time; filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain associated index data of the default state data; classifying the associated index data into operation index data and order index data; constructing a device data set according to the operation index data, the order index data and the default state data; and training a preset model based on the device data set to obtain a risk prediction model of the device. Because the risk prediction model is trained on a device data set that includes the operation index data, the accuracy of risk prediction is improved. Moreover, the trained risk prediction model is a real-time prediction model and can predict the risk of the device in real time.
Example 2
Referring to fig. 2, fig. 2 shows a flowchart of a risk prediction method provided in an embodiment of the present application.
The risk prediction method in fig. 2 includes:
S210, acquiring the to-be-detected return data of the target device in real time.
The target device is any device for which a default risk prediction is required, and is not limited herein. Specifically, a data synchronization tool is used to collect the to-be-detected return data of the target device in real time.
S220, inputting the to-be-detected return data into a risk prediction model to obtain risk prediction data of the target device.
The risk prediction model is obtained according to the training method of the risk prediction model described in embodiment 1. The to-be-detected return data returned by the target device generally includes data on the running condition of the device and data on the order state. The risk prediction model in this embodiment can therefore output high-accuracy risk prediction data according to the running condition of the device. Moreover, the trained risk prediction model is a real-time prediction model and can predict the risk of the device in real time based on the to-be-detected return data of the target device acquired in real time.
Based on the risk prediction data output by the risk prediction model, it is predicted whether the target device will default in a subsequent period. If the target device is predicted to default in the subsequent period, a risk early warning is issued for the target device so as to avoid bad debts on the device. It is to be understood that the to-be-detected return data of a plurality of target devices can be acquired in real time, and the risk prediction model can output the risk prediction data of the plurality of target devices in real time.
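Steps S210 and S220 together with the early warning can be sketched as follows. This is only an illustration under stated assumptions: the risk prediction data is assumed to be a default probability between 0 and 1, the warning threshold of 0.5 is arbitrary, and toy_risk_model with its weights and its two features (idle days, overdue payments) is hypothetical and merely stands in for a model trained as in embodiment 1.

```python
import math


def toy_risk_model(features):
    """Hypothetical trained model: maps (idle_days, overdue_payments) to a default probability."""
    idle_days, overdue_payments = features
    score = 0.1 * idle_days + 0.8 * overdue_payments - 3.0  # illustrative weights only
    return 1.0 / (1.0 + math.exp(-score))


def early_warnings(model, incoming, threshold=0.5):
    """Yield (device_id, risk) for every incoming record whose predicted risk reaches the threshold."""
    for device_id, features in incoming:
        risk = model(features)
        if risk >= threshold:
            yield device_id, risk


# Usage: records from several target devices arrive as (device_id, features) pairs.
incoming = [("device-A", (2, 0)), ("device-B", (30, 2))]
for device_id, risk in early_warnings(toy_risk_model, incoming):
    print(f"early warning for {device_id}: predicted default risk {risk:.2f}")
```

Because early_warnings is a generator, it can consume a stream of return data record by record, which is one simple way to process several target devices as their data arrives.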
Example 3
Referring to fig. 3, fig. 3 is a schematic structural diagram of a training device of a risk prediction model according to an embodiment of the present application. The training apparatus 300 of the risk prediction model in fig. 3 includes:
the data collection module 310 is configured to collect, in real time, return data of the device, where the return data includes default state data and a first number of dimension index data;
the data filtering module 320 is configured to filter the dimension index data according to the association relation between the default state data and each dimension index data, so as to obtain associated index data of the default state data;
a data classification module 330, configured to classify the associated index data into operation index data and order index data;
a data set construction module 340, configured to construct a device data set according to the operation index data, the order index data, and the default state data;
the model training module 350 is configured to train a preset model based on the device data set, and obtain a risk prediction model of the device.
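The data flow through the filtering, classification and data set construction modules can be sketched as below. The sketch rests on assumptions that are not taken from the application: the association relation is measured with the Pearson correlation coefficient, an index is kept when the absolute correlation reaches a hypothetical cut-off min_abs_corr, and the caller supplies the set of index names that count as operation indexes. The function and field names are likewise illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_device_dataset(records, operation_names, min_abs_corr=0.3):
    """Filter indexes by association with the default label, classify them, build the data set.

    records: list of dicts holding numeric index values plus a 'default' key (0 or 1).
    Returns (operation index names, order index names, list of (features, label)).
    """
    labels = [r["default"] for r in records]
    names = [k for k in records[0] if k != "default"]
    kept = [n for n in names
            if abs(pearson([r[n] for r in records], labels)) >= min_abs_corr]
    operation = [n for n in kept if n in operation_names]
    order = [n for n in kept if n not in operation_names]
    dataset = [([r[n] for n in operation + order], r["default"]) for r in records]
    return operation, order, dataset
```

The returned feature vectors list the operation indexes first and the order indexes second, so they can serve directly as the independent variables of a preset model, with the default label as the dependent variable.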
As an example, the training device 300 of the risk prediction model further includes:
the time window data set classification module is used for dividing the device data set according to a preset time interval to obtain a second number of time window data sets;
a loop execution module, configured to execute a loop step for each of the time window data sets in turn in chronological order, where the loop step includes:
inputting the time window data set into the risk prediction model to obtain risk prediction data of the next time window;
comparing the risk prediction data of the next time window with the default state data to obtain erroneous risk prediction data;
updating the next time window data set based on the erroneous risk prediction data;
and updating the risk prediction model according to the next time window data set.
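One possible reading of the loop step is sketched below. The text does not specify how the erroneous predictions update the next time window data set, so the sketch makes an assumption: mispredicted records are appended a second time to the window, which gives them more weight when the model is refitted. ThresholdModel, split_into_windows and rolling_update are hypothetical names, and the one-feature threshold model only stands in for the trained risk prediction model.

```python
class ThresholdModel:
    """Hypothetical stand-in for the risk prediction model: one feature, one cut-off."""

    def __init__(self, threshold=float("inf")):
        self.threshold = threshold

    def predict(self, features):
        return int(features[0] > self.threshold)

    def update(self, rows):
        """Refit the cut-off as the midpoint between the class means of the given rows."""
        pos = [f[0] for f, y in rows if y == 1]
        neg = [f[0] for f, y in rows if y == 0]
        if pos and neg:
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def split_into_windows(rows, interval):
    """Group (timestamp, features, label) rows into consecutive windows of `interval` time units."""
    windows = {}
    for timestamp, features, label in sorted(rows, key=lambda r: r[0]):
        windows.setdefault(timestamp // interval, []).append((features, label))
    return [windows[key] for key in sorted(windows)]


def rolling_update(model, windows):
    """Walk the windows in time order: predict, collect errors, update the window, update the model."""
    error_counts = []
    for window in windows:
        # Compare predictions with the recorded default state to find erroneous predictions.
        wrong = [(f, y) for f, y in window if model.predict(f) != y]
        error_counts.append(len(wrong))
        # Assumption: mispredicted records are duplicated so they weigh more in the refit.
        model.update(window + wrong)
    return error_counts
```

The returned error counts make it possible to observe whether the prediction error falls as later windows are processed.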
As an example, the model training module 350 is further configured to train a preset model by using the operation index data and the order index data as model independent variables and the default state data as model dependent variables, so as to obtain a risk prediction model of the device.
As an example, the training device 300 of the risk prediction model further includes:
the positive and negative classification module is used for classifying the return data into positive class data and negative class data, wherein the number of the positive class data is smaller than that of the negative class data;
and the data adding module is used for performing linear fitting on the positive class data and increasing the number of the positive class data until the difference between the number of the positive class data and the number of the negative class data is smaller than a preset threshold value.
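The balancing of positive and negative class data can be sketched as follows. The text only says that the positive class data is linearly fitted; the sketch interprets this, as an assumption, as linear interpolation between randomly chosen pairs of existing positive samples, in the spirit of common oversampling techniques. The function name oversample_positive and its parameters are hypothetical.

```python
import random


def oversample_positive(positive, negative, threshold=1, seed=0):
    """Add interpolated positive samples until len(negative) - len(result) < threshold."""
    if len(positive) < 2:
        raise ValueError("need at least two positive samples to interpolate")
    rng = random.Random(seed)
    result = list(positive)
    while len(negative) - len(result) >= threshold:
        a, b = rng.sample(positive, 2)  # two distinct original positive samples
        t = rng.random()                # position on the line segment from a to b
        result.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return result
```

Every synthetic sample lies on a straight line between two real positive samples, so no feature value falls outside the range already observed for the positive class.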
As one example, model training module 350 includes:
the data set classification sub-module is used for dividing the device data set into a training data set, a test data set and a verification data set based on a preset proportion;
an algorithm training sub-module for training at least one preset model based on the training data set;
the algorithm testing sub-module is used for testing each preset model according to the test data set to obtain a model to be verified;
and the algorithm verification sub-module is used for verifying the model to be verified based on the verification data set and determining a risk prediction model of the device.
The training device 300 of the risk prediction model is configured to perform the corresponding steps of the training method of the risk prediction model; the specific implementation of each function is not repeated here. Furthermore, the optional examples in embodiment 1 also apply to the training device 300 of the risk prediction model in embodiment 3.
Example 4
Referring to fig. 4, fig. 4 is a schematic structural diagram of a risk prediction apparatus according to an embodiment of the present application. The risk prediction apparatus 400 in fig. 4 includes:
a data acquisition module 410, configured to acquire, in real time, to-be-detected return data of the target device;
the risk prediction module 420 is configured to input the to-be-detected return data into a risk prediction model, so as to obtain risk prediction data of the target device, where the risk prediction model is obtained according to the training method of the risk prediction model described in embodiment 1.
The risk prediction apparatus 400 is configured to perform the corresponding steps of the risk prediction method; the specific implementation of each function is not repeated here. Furthermore, the optional examples in embodiment 2 also apply to the risk prediction apparatus 400 of embodiment 4.
The embodiment of the application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and the computer program implements the training method of the risk prediction model described in embodiment 1 or implements the risk prediction method described in embodiment 2 when the processor executes the computer program.
The data collection module 310, the data filtering module 320, the data classification module 330, the data set construction module 340 and the model training module 350 in embodiment 3, or the data acquisition module 410 and the risk prediction module 420 in embodiment 4, are stored as program units in the memory, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problem of low accuracy of risk analysis results is addressed by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the training method of the risk prediction model described in embodiment 1, or implements the risk prediction method described in embodiment 2.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method of training a risk prediction model, comprising:
collecting return data of a device in real time, wherein the return data comprises default state data and a first number of dimension index data;
filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain associated index data of the default state data;
classifying the associated index data into operation index data and order index data;
constructing a device data set according to the operation index data, the order index data and the default state data;
and training a preset model based on the device data set to obtain a risk prediction model of the device.
2. The method for training a risk prediction model according to claim 1, wherein after training a preset model based on the device data set, the method further comprises:
dividing the device data set according to a preset time interval to obtain a second number of time window data sets;
performing a loop step for each of the time window data sets in turn in chronological order, wherein the loop step comprises:
inputting the time window data set into the risk prediction model to obtain risk prediction data of the next time window;
comparing the risk prediction data of the next time window with the default state data to obtain erroneous risk prediction data;
updating the next time window data set based on the erroneous risk prediction data;
and updating the risk prediction model according to the next time window data set.
3. The method for training a risk prediction model according to claim 1, wherein training a preset model based on the device data set to obtain the risk prediction model of the device comprises:
training a preset model by taking the operation index data and the order index data as model independent variables and the default state data as the model dependent variable to obtain a risk prediction model of the device.
4. The method for training a risk prediction model according to claim 1, wherein before filtering the dimension index data according to the association relationship between the default state data and each dimension index data to obtain the association index data of the default state data, the method further comprises:
classifying the return data into positive class data and negative class data, wherein the number of the positive class data is smaller than that of the negative class data;
and performing linear fitting on the positive class data, and increasing the number of the positive class data until the difference value of the number of the positive class data and the number of the negative class data is smaller than a preset threshold value.
5. The method for training a risk prediction model according to claim 1, wherein training a preset model based on the device data set to obtain the risk prediction model of the device comprises:
dividing the device data set into a training data set, a test data set and a verification data set based on a preset proportion;
training at least one preset model based on the training data set;
testing each preset model according to the test data set to obtain a model to be verified;
and verifying the model to be verified based on the verification data set, and determining a risk prediction model of the device.
6. A risk prediction method, comprising:
acquiring to-be-detected return data of a target device in real time;
and inputting the to-be-detected return data into a risk prediction model to obtain risk prediction data of the target device, wherein the risk prediction model is obtained according to the training method of the risk prediction model as claimed in any one of claims 1 to 5.
7. A training device for a risk prediction model, comprising:
the data acquisition module is used for collecting return data of a device in real time, wherein the return data comprises default state data and a first number of dimension index data;
the data filtering module is used for filtering the dimension index data according to the association relation between the default state data and each dimension index data to obtain associated index data of the default state data;
the data classification module is used for classifying the associated index data into operation index data and order index data;
the data set construction module is used for constructing a device data set according to the operation index data, the order index data and the default state data;
and the model training module is used for training a preset model based on the device data set to obtain a risk prediction model of the device.
8. A risk prediction apparatus, comprising:
the data acquisition module is used for acquiring to-be-detected return data of a target device in real time;
the risk prediction module is configured to input the to-be-detected return data into a risk prediction model to obtain risk prediction data of the target device, where the risk prediction model is obtained according to the training method of the risk prediction model according to any one of claims 1 to 5.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the method of training the risk prediction model of any one of claims 1 to 5 or implements the method of risk prediction of claim 6.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which when executed by a processor, implements the method of training the risk prediction model of any one of claims 1 to 5, or implements the risk prediction method of claim 6.
CN202211616132.8A 2022-12-15 2022-12-15 Training method of risk prediction model, risk prediction method and device Pending CN116051262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211616132.8A CN116051262A (en) 2022-12-15 2022-12-15 Training method of risk prediction model, risk prediction method and device

Publications (1)

Publication Number Publication Date
CN116051262A true CN116051262A (en) 2023-05-02

Family

ID=86112371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211616132.8A Pending CN116051262A (en) 2022-12-15 2022-12-15 Training method of risk prediction model, risk prediction method and device

Country Status (1)

Country Link
CN (1) CN116051262A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination