CN116894721A - Index prediction method and device and computer equipment

Info

Publication number: CN116894721A
Application number: CN202310783775.XA
Authority: CN
Inventors: 张鹏; 陈绍尊; 常洞霞; 许志强; 袁华
Original assignee: Industrial Bank Co Ltd; CIB Fintech Services Shanghai Co Ltd
Current assignee: Industrial Bank Co Ltd; CIB Fintech Services Shanghai Co Ltd
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-10-17

Abstract

The application relates to an index prediction method. The method comprises the following steps: obtaining user original resource interaction data, and carrying out standardization processing on the user original resource interaction data to obtain standardized standard resource data, wherein the standard resource data comprises a plurality of types of sub-resource data; calculating each sub-resource data in the sub-resource data of a plurality of types to obtain the contribution rate of the sub-resource data to the standard resource data, and screening the sub-resource data with the contribution rate higher than the contribution threshold value in the sub-resource data as predicted resource data; and inputting the prediction resource data into a pre-constructed prediction model to obtain a credit index prediction result of the user. The method can improve the accuracy and efficiency of index prediction.

Description

Index prediction method and device and computer equipment

Technical Field

The present application relates to the field of data processing technology, and in particular, to an index prediction method, an apparatus, a computer device, a storage medium, and a computer program product.

Background

With economic development, many individuals and small and micro enterprises have become important bank clients, and banks have successively introduced various services for such clients. Before lending, a bank needs to predict which customers may default in the future.

In the related art, a professional judges, from a client's historical default information and from experience, whether a default will occur on the service, so the judgment result is inaccurate.

Disclosure of Invention

Based on the above, it is necessary to provide, in view of the above technical problems, an index prediction method that screens data through standardization processing and principal component analysis, retains the principal information of the data, and makes predictions from the user's original resource interaction data, so as to improve prediction speed.

In a first aspect, the present application provides an index prediction method. The method comprises the following steps:

obtaining user original resource interaction data, and carrying out standardization processing on the user original resource interaction data to obtain standardized standard resource data, wherein the standard resource data comprises a plurality of types of sub-resource data;

calculating each sub-resource data in the sub-resource data of a plurality of types to obtain the contribution rate of the sub-resource data to the standard resource data, and screening the sub-resource data with the contribution rate higher than the contribution threshold value in the sub-resource data as predicted resource data;

and inputting the prediction resource data into a pre-constructed prediction model to obtain a credit index prediction result of the user.

In one embodiment, the prediction model is an ensemble algorithm based on decision trees and is obtained by training with a pre-constructed training sample, and the training process of the prediction model includes:

inputting the prediction resource data in the training sample into a preset model, performing a grid search on the training sample to obtain target training parameters of the prediction model, and training the decision-tree ensemble algorithm, adjusted to the target training parameters, on the prediction resource data in the training sample.

In one embodiment, the calculating each of the sub-resource data in the plurality of types of sub-resource data to obtain the contribution rate of the sub-resource data to the standard resource data includes:

establishing a covariance matrix of the sub-resource data to obtain eigenvalues of the sub-resource data;

and obtaining the variance contribution rate according to the eigenvalues of the sub-resource data and the eigenvalues of the standard resource data.

In one embodiment, obtaining the standardized standard resource data includes:

obtaining the maximum value and the minimum value of the sub-original resource data in the user original resource interaction data, and calculating the range (the maximum value minus the minimum value);

and obtaining the standardized standard resource data according to the sub-original resource data, the minimum value, and the range.

In one embodiment, the method further comprises:

when there is an error between the prediction result and the actual result, performing optimization training on the model by using the user original resource interaction data corresponding to the error.

In a second aspect, the present application also provides a default prediction apparatus, the apparatus comprising:

the processing module is used for acquiring user original resource interaction data, carrying out standardized processing on the user original resource interaction data to obtain standardized processed standard resource data, wherein the standard resource data comprises a plurality of types of sub-resource data;

the analysis module is used for calculating each sub-resource data in the sub-resource data of a plurality of types to obtain the contribution rate of the sub-resource data to the standard resource data, and screening the sub-resource data with the contribution rate higher than the contribution threshold value in the sub-resource data as predicted resource data;

and the prediction module is used for inputting the prediction resource data into a pre-constructed prediction model to obtain a credit index prediction result of the user.

inputting the prediction resource data in the training sample into a preset model, performing a grid search on the training sample to obtain target training parameters of the prediction model, and training the decision-tree ensemble algorithm, adjusted to the target training parameters, on the prediction resource data in the training sample.

In one embodiment, the apparatus further comprises:

when the default prediction result and the actual result have errors;

In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor, and the processor implements the steps of the default prediction method when executing the computer program.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the default prediction method.

In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the default prediction method.

The default prediction method provides at least the following beneficial effects:

According to the embodiments provided by the present disclosure, the user's original resource interaction data can be standardized, which reduces the influence of data with larger numerical values; the resulting standard resource data is screened so that the sub-resource data whose contribution rate is higher than the contribution threshold is obtained as the prediction resource data; and the credit index prediction result of the user is obtained by using a pre-constructed prediction model. The accuracy and efficiency of index prediction can thereby be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments or the conventional techniques of the present disclosure, the drawings required for the descriptions of the embodiments or the conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to the drawings without inventive effort to those of ordinary skill in the art.

FIG. 1 is a diagram of an application environment for a method of index prediction in one embodiment;

FIG. 2 is a flow chart of a method of index prediction in one embodiment;

FIG. 3 is a flow chart of a method of index prediction in one embodiment;

FIG. 4 is a block diagram of an index prediction device in one embodiment;

FIG. 5 is a block diagram of an index prediction device in one embodiment;

FIG. 6 is an internal block diagram of a computer device in one embodiment;

FIG. 7 is an internal structural diagram of a server in one embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, it is not excluded that additional identical or equivalent elements may be present in a process, method, article, or apparatus that comprises a described element. For example, words such as "first" and "second" are used to indicate names and do not denote any particular order.

The embodiment of the disclosure provides a default prediction method, which can be applied to an application environment as shown in FIG. 1. The terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.

In some embodiments of the present disclosure, as shown in FIG. 2, a default prediction method is provided. The method is described using, as an example, its application to the server in FIG. 1 to process user original resource interaction data. It will be appreciated that the method may be applied to a server, and may also be applied to a system comprising a terminal and a server, and implemented by interaction of the terminal and the server. In a specific embodiment, the method may include the steps of:

S202: obtaining user original resource interaction data, and carrying out standardization processing on the user original resource interaction data to obtain standardized standard resource data, wherein the standard resource data comprises a plurality of types of sub-resource data.

The user original resource interaction data may include personal service data, enterprise service data, credit card service data, and other financial service data. It comprises a plurality of types of sub-resource interaction data, and different types of sub-resource interaction data differ in characteristics such as nature, dimension (unit), and order of magnitude. Therefore, before the data is analyzed, the user original resource interaction data may be standardized: it is scaled by a mathematical transformation so that it falls into a small specific interval, for example 0 to 1 or -1 to 1. This eliminates the differences in nature, dimension, and order of magnitude among variables and converts the data into dimensionless relative values, namely standardized values, so that the values of all indexes are on the same scale and indexes of different units or orders of magnitude can be comprehensively analyzed and compared.

The standardized standard resource data includes a plurality of types of sub-resource data, and the sub-resource data may include personal information data, credit information data, financial information data, and the like.

S204: calculating each sub-resource data in the sub-resource data of a plurality of types to obtain the contribution rate of the sub-resource data to the standard resource data, and screening the sub-resource data with the contribution rate higher than the contribution threshold value in the sub-resource data as prediction resource data.

Each type of sub-resource data may be regarded as one dimension (feature) of the standard resource data. If there are currently n features, the n-dimensional features may be mapped to k dimensions, where k < n. The k-dimensional features are the main features in the standard resource data; a main feature may be sub-resource data whose contribution rate is higher than the contribution threshold, and it may be used as prediction resource data for index prediction. In some embodiments of the present disclosure, the contribution threshold may be set to 85%. Sub-resource data with a contribution rate higher than the contribution threshold may be found by means of a coordinate system, in which the choice of new coordinate axes depends closely on the data. The first new coordinate axis is the direction of maximum variance in the original data, the second new coordinate axis is the direction of maximum variance in the plane orthogonal to the first axis, the third axis is the direction of maximum variance in the plane orthogonal to the first and second axes, and so on, so that n coordinate axes are obtained. Most of the variance is contained in the first k coordinate axes, so the k-dimensional data may be used as prediction resource data for prediction, which achieves dimension reduction of the data.
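As a minimal sketch of how the number k of retained axes might be chosen, the following Python function takes the variance along each new coordinate axis and returns the smallest k whose leading axes together reach the threshold. Treating the 85% contribution threshold as a cumulative threshold is an assumption of this sketch, as are the function name select_leading_axes and the sample variances; the disclosure does not fix these details.

```python
def select_leading_axes(variances, threshold=0.85):
    """Return the smallest k such that the k largest variances reach the threshold.

    Assumption: the contribution threshold is applied to the cumulative
    contribution rate of the leading axes (a common convention in
    principal component analysis), not to each axis individually.
    """
    ordered = sorted(variances, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, variance in enumerate(ordered, start=1):
        cumulative += variance / total  # contribution rate of this axis
        if cumulative >= threshold:
            return k
    return len(ordered)


# Hypothetical variances along four new coordinate axes.
k = select_leading_axes([5.0, 3.0, 1.0, 1.0])
print("axes retained:", k)
```

With these sample values the first two axes account for 80% of the variance and the first three for 90%, so three axes are retained.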

S206: inputting the prediction resource data into a pre-constructed prediction model to obtain a credit index prediction result of the user.

The prediction resource data is the data that contains the most user information. It is input into a pre-constructed prediction model to obtain a credit index prediction result of the user. The credit index prediction result may be, for example, pass, fail, or pending review.

In the index prediction method, the original resource interaction data of the user can be subjected to standardized processing, the influence of data with larger numerical values is reduced, the obtained standard resource data is screened, the sub-resource data with the contribution rate higher than the contribution threshold value in the sub-resource data is obtained as prediction resource data, and the credit index prediction result of the user is obtained by utilizing a pre-constructed prediction model. The accuracy and efficiency of index prediction can be improved.

In some embodiments of the present disclosure, the prediction model is an ensemble algorithm based on decision trees and is obtained by training with a pre-constructed training sample, and the training process of the prediction model includes:

The prediction model may be a regression model of an ensemble algorithm based on decision trees. An initial model is trained with the resource data in the training sample, and the optimal model parameters are obtained by grid search; the optimal model parameters may include the number of decision trees and their maximum depth. The prediction resource data in the training sample may be divided into a plurality of parts, with one part selected as the test set and the remaining parts used as the verification set, and the effect of the model is evaluated in a plurality of calculation modes. The effect of the model may also be assessed by the receiver operating characteristic (ROC) curve and the AUC value (area under the ROC curve).
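The search and evaluation steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the training procedure of the disclosure: the helper names auc_score, k_fold_indices and grid_search, the parameter names n_trees and max_depth, and the toy scoring function are all hypothetical. In practice the evaluate callable would train the decision-tree ensemble on each split and return, for example, the mean AUC.

```python
from itertools import product


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, n_folds):
    """Yield (remaining_idx, held_out_idx); each part is held out exactly once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        remaining = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield remaining, held_out


def grid_search(param_grid, evaluate):
    """Score every parameter combination and return the best (params, score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)  # hypothetical: train the ensemble and score it
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for "train the ensemble and return its score" (an assumption).
def toy_evaluate(params):
    return -abs(params["n_trees"] - 100) - abs(params["max_depth"] - 3)


grid = {"n_trees": [50, 100], "max_depth": [3, 5]}
print(grid_search(grid, toy_evaluate))
print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Passing the scoring step in as a callable keeps the search independent of any particular decision-tree library.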

In some embodiments of the present disclosure, the calculating each of the sub-resource data in the plurality of types of sub-resource data to obtain a contribution rate of the sub-resource data to the standard resource data includes:

All the sub-resource data may be plotted in a coordinate system, and a new coordinate system is selected according to the dispersion of the sub-resource data along the coordinate axes, so that the dispersion of the sub-resource data is reduced. Once suitable coordinate axes are selected, the dimensions to retain are determined according to the variance of the data in each dimension. The coordinate system may be rotated through a matrix transformation whose mathematical derivation involves eigenvalues and eigenvectors: the eigenvector corresponding to each eigenvalue may be used as a coordinate axis, and the eigenvalue may be taken as the variance of the data along that axis after rotation and represents the amount of information contained in the direction of the corresponding eigenvector. The contribution rate of the sub-resource data is the eigenvalue of the sub-resource data divided by the eigenvalue of the standard resource data.
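The calculation can be illustrated with a small, dependency-free Python sketch: build the covariance matrix of the feature columns, obtain its eigenvalues, and divide each eigenvalue by a total. Two points are assumptions of this sketch rather than statements of the disclosure: the eigenvalues are computed with cyclic Jacobi rotations (any symmetric eigenvalue solver would do), and the "eigenvalue of the standard resource data" is interpreted as the sum of all eigenvalues. The function names and sample rows are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix; each row is one user, each column one feature."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    d = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:  # matrix is numerically diagonal
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(d)), reverse=True)


def contribution_rates(eigenvalues):
    """Each eigenvalue divided by the sum of all eigenvalues (assumed total)."""
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("all eigenvalues are zero")
    return [value / total for value in eigenvalues]


# Hypothetical standardized rows with two perfectly correlated features.
rows = [[1, 2], [2, 4], [3, 6]]
eigenvalues = jacobi_eigenvalues(covariance_matrix(rows))
print(contribution_rates(eigenvalues))
```

Because the two sample features are perfectly correlated, essentially all of the variance falls on the first axis.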

In some embodiments of the present disclosure, obtaining the standardized standard resource data includes:

The maximum and the minimum of the sub-original resource data are obtained, and their difference is calculated as the range. For each item of sub-original resource data, the difference between the item and the minimum is calculated and divided by the range, giving a value between 0 and 1. This eliminates the differences in nature, dimension, order of magnitude, and other characteristic attributes among variables and converts the data into dimensionless relative values, namely standardized values, so that the values of all indexes are on the same scale.
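A minimal Python sketch of this range-based (min-max) standardization for one type of sub-original resource data is given below. The function name min_max_normalize and the sample values are hypothetical, and mapping a constant column to 0.0 is an assumption of the sketch, since the disclosure does not address a range of zero.

```python
def min_max_normalize(values):
    """Scale values into [0, 1] as (value - minimum) / (maximum - minimum)."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest  # the range: maximum minus minimum
    if value_range == 0:
        # Assumption: a constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(value - lowest) / value_range for value in values]


# Hypothetical monthly transaction amounts for three users.
print(min_max_normalize([10, 20, 30]))
```

The transformation is applied separately to each type of sub-original resource data, so that columns with different units end up on the same 0 to 1 scale.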

In some embodiments of the present disclosure, fig. 3 is a flow chart illustrating a method for index prediction in one embodiment, the method further comprising:

S302: when there is an error between the prediction result and the actual result, performing optimization training on the model by using the user original resource interaction data corresponding to the error.

After the model outputs the credit index prediction result, the result may be reviewed manually, and if it is wrong, the user's original data is input into the model for training.
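One way to gather the data for this optimization training is sketched below in Python: compare each prediction with the manually reviewed result and keep the original records where they differ. The function name collect_misclassified, the record layout, and the label strings are hypothetical, and how the collected records are fed back into training is left open, as in the disclosure.

```python
def collect_misclassified(raw_records, predicted, actual):
    """Return the raw records whose predicted label differs from the reviewed label."""
    if not (len(raw_records) == len(predicted) == len(actual)):
        raise ValueError("inputs must have the same length")
    return [record
            for record, p, a in zip(raw_records, predicted, actual)
            if p != a]


# Hypothetical records, model outputs, and manually reviewed results.
records = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
predicted = ["pass", "fail", "pass"]
reviewed = ["pass", "pass", "pass"]
retraining_set = collect_misclassified(records, predicted, reviewed)
print(retraining_set)
```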

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the disclosure further provides an index prediction device for implementing the index prediction method. The implementation scheme of the solution provided by the device is similar to the implementation scheme described in the above method, so the specific limitation in the embodiments of the index prediction device provided below may refer to the limitation of the index prediction method hereinabove, and will not be repeated herein.

The apparatus may comprise a system (including a distributed system), software (applications), modules, components, servers, clients, etc. that employ the methods described in the embodiments of the present specification, in combination with the hardware necessary for implementation. Based on the same inventive concept, embodiments of the present disclosure provide devices in one or more embodiments as described in the following examples. Because the way the device solves the problem is similar to that of the method, the implementation of the device in the embodiments of the present disclosure may refer to the implementation of the foregoing method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

In one embodiment, as shown in fig. 4, an index prediction apparatus 400 is provided, which may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. The apparatus 400 may include:

the processing module 402 is configured to obtain user original resource interaction data, and perform standardization processing on the user original resource interaction data to obtain standardized standard resource data, where the standard resource data includes multiple types of sub-resource data;

an analysis module 404, configured to calculate each of the sub-resource data in the plurality of types of sub-resource data, so as to obtain a contribution rate of the sub-resource data to the standard resource data, and screen sub-resource data in the sub-resource data, where the contribution rate is higher than a contribution threshold, as predicted resource data;

and the prediction module 406 is configured to input the predicted resource data into a pre-constructed prediction model, so as to obtain a credit index prediction result of the user.

In one embodiment, fig. 5 is a block diagram of an index prediction device in one embodiment, the device further comprising:

the training module 502 is configured to, when there is an error between the prediction result and the actual result, perform optimization training on the model by using the user original resource interaction data corresponding to the error.

The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

The above-described modules in the index prediction device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor may call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store user raw resource interaction data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an index prediction method.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements an index prediction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the structures shown in fig. 6 and 7 are merely block diagrams of partial structures related to the disclosed aspects and do not constitute a limitation of the computer device on which the disclosed aspects are applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, implements the method of any of the embodiments of the present disclosure.

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method described in any of the embodiments of the present disclosure.

Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided by the present disclosure may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors involved in the embodiments provided by the present disclosure may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic, quantum computing-based data processing logic, etc., without limitation thereto.

The technical features of the above embodiments may be arbitrarily combined. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.

The foregoing examples express only a few embodiments of the present disclosure and are described specifically and in detail, but they are not to be construed as limiting the scope of the present disclosure. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the disclosure, and these are within the scope of the disclosure. Accordingly, the scope of the present disclosure should be determined from the following claims.

Claims

1. An index prediction method, characterized in that the method comprises:

2. The method according to claim 1, wherein the prediction model is an ensemble algorithm based on decision trees and is trained using pre-constructed training samples, and the training process of the prediction model comprises:

3. The method of claim 1, wherein the calculating each of the sub-resource data in the plurality of types of sub-resource data to obtain the contribution rate of the sub-resource data to the standard resource data comprises:

4. The method of claim 1, wherein obtaining the standardized standard resource data comprises:

5. The method according to claim 1, wherein the method further comprises:

when the predicted result and the actual result have errors;

6. A default prediction apparatus, the apparatus comprising:

7. The apparatus of claim 6, wherein the prediction model is an ensemble algorithm based on decision trees and is trained using pre-constructed training samples, the training process of the prediction model comprising:

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.

9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.