CN116523622A - Object risk prediction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116523622A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310440622.5A
Other languages
Chinese (zh)
Inventor
欧阳燕绚
易艳
肖京
王建明
张路
张静兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310440622.5A
Publication of CN116523622A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Technology Law (AREA)
  • Probability & Statistics with Applications (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application provide an object risk prediction method and device, electronic equipment and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring sample data, the sample data being annotated data with a sample label; encoding the first basic information data to obtain information features, and encoding the sample behavior data to obtain behavior features; inputting the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features; performing feature stitching on the information features, the behavior features and the relational features to obtain sample stitching features; inputting the sample stitching features into a preset initial evaluation model for risk prediction to obtain an initial risk label; adjusting the parameters of the initial evaluation model to obtain a target evaluation model; and inputting the acquired target data into the target evaluation model for risk prediction to obtain a target risk label. The method and device can improve the accuracy of risk prediction.

Description

Object risk prediction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an object risk prediction method and apparatus, an electronic device, and a storage medium.
Background
In the related art, a user's individual data is modeled with a machine learning model such as a logistic regression scorecard model or a tree model to determine a risk probability evaluation result for the user. However, individual data alone is of limited use in capturing changes in the user's circumstances, which reduces the accuracy of the risk probability evaluation result.
Disclosure of Invention
The embodiment of the application mainly aims to provide an object risk prediction method and device, electronic equipment and storage medium, and aims to improve accuracy of risk prediction.
To achieve the above object, a first aspect of an embodiment of the present application proposes an object risk prediction method, including: acquiring sample data; the sample data is annotation data with a sample label, wherein the sample label is used for representing a risk level category of a sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relation network data of the sample object;
encoding the first basic information data to obtain information features, and encoding the sample behavior data to obtain behavior features;
inputting the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features;
performing feature stitching on the information features, the behavior features and the relation features to obtain sample stitching features;
inputting the sample splicing characteristics into a preset initial evaluation model to perform risk prediction to obtain an initial risk tag;
parameter adjustment is carried out on the initial evaluation model according to the initial risk label and the sample label, and a target evaluation model is obtained;
inputting the obtained target data into the target evaluation model for risk prediction to obtain a target risk tag; the target data comprises target basic information data of a target object, target behavior data of the target object and target relation network data of the target object, and the target risk tag is used for representing a risk class of the target object.
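The steps of the first aspect can be sketched end to end as follows. This is only an illustrative sketch under stated assumptions: the source does not specify the evaluation model here, so a small logistic regression trained by gradient descent is swapped in as a stand-in; the relational features are hard-coded placeholders rather than the output of a graph neural network; and every name and number (`stitch`, `train`, `predict_proba`, the feature values) is hypothetical.

```python
import math

def stitch(info, behavior, relation):
    """Concatenate the three feature vectors into one stitched feature."""
    return list(info) + list(behavior) + list(relation)

def predict_proba(weights, bias, x):
    """Probability that a stitched feature belongs to the high-risk class."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=500, lr=0.5):
    """Adjust parameters from the gap between predicted and sample labels."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical, already-encoded features for four sample objects:
# (information features, behavior features, relational features).
raw = [
    ([0.2], [0.1, 0.0], [0.9]),  # labelled high risk
    ([0.3], [0.2, 0.1], [0.8]),  # labelled high risk
    ([0.8], [0.9, 0.7], [0.1]),  # labelled low risk
    ([0.7], [0.8, 0.9], [0.2]),  # labelled low risk
]
sample_labels = [1, 1, 0, 0]

stitched = [stitch(*parts) for parts in raw]
weights, bias = train(stitched, sample_labels)

# Risk prediction for a hypothetical target object.
target = stitch([0.25], [0.15, 0.05], [0.85])
target_label = int(predict_proba(weights, bias, target) >= 0.5)
```

The point of the sketch is the data flow: three separately derived feature groups are concatenated before the model sees them, so the model can weigh relational evidence alongside individual evidence.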
In some embodiments, the sample tag includes a first positive tag and a first negative tag, the first base information data includes first sub-data having the first positive tag and second sub-data having the first negative tag;
the encoding processing of the first basic information data to obtain information features includes:
discretizing the first basic information data to obtain a plurality of initial sample information intervals;
acquiring the data quantity of the first sub-data contained in one of the initial sample information intervals to obtain a first sub-quantity;
acquiring the data quantity of the second sub-data contained in one of the initial sample information intervals to obtain a second sub-quantity;
acquiring the data quantity of the first sub-data contained in all the initial sample information intervals to obtain a first total quantity;
acquiring the data quantity of the second sub-data contained in all the initial sample information intervals to obtain a second total quantity;
calculating to obtain initial information quantity of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the first total quantity and the second total quantity;
and calculating to obtain total information quantity of all the initial sample information intervals according to the initial information quantity, and if the total information quantity is in a first preset range, carrying out coding processing on the initial sample information intervals to obtain the information characteristics.
In some embodiments, the sample tag includes a first positive tag and a first negative tag, the first base information data includes first sub-data having the first positive tag and second sub-data having the first negative tag;
the encoding processing of the first basic information data to obtain information features includes:
discretizing the first basic information data to obtain an initial sample information interval;
acquiring the data quantity of first sub-data contained in the initial sample information interval to obtain a first sub-quantity;
acquiring the data quantity of second sub-data contained in the initial sample information interval to obtain a second sub-quantity;
acquiring verification data according to the acquisition time of the sample data and a preset sampling interval; the verification data are labeling data with a verification tag, the verification tag is used for representing a risk level category of the sample object, and the verification data comprise second basic information data of the sample object; the verification tag comprises a second positive tag and a second negative tag, and the second basic information data comprises third sub-data with the second positive tag and fourth sub-data with the second negative tag;
discretizing the second basic information data to obtain a verification information interval;
acquiring the data quantity of the third sub-data contained in the verification information interval to obtain a third sub-quantity;
acquiring the data quantity of the fourth sub-data contained in the verification information interval to obtain a fourth sub-quantity;
calculating a stable value of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the third sub-quantity and the fourth sub-quantity;
and if the stable value is in a second preset range, carrying out coding processing on the initial sample information interval to obtain the information characteristic.
In some embodiments, the sample relational network data comprises object association data and initial device common data, the relational features comprising object relational features and device relational features;
the inputting of the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features includes:
inputting the object association data into the graph neural network model for feature extraction to obtain the object relationship features;
data screening is carried out on the initial equipment shared data to obtain target equipment shared data;
and inputting the target device common data into the graph neural network model for feature extraction to obtain the device relationship features.
In some embodiments, the object association data includes node data having a node tag for characterizing a risk level class of a node object, and relationship data for representing an association of the node object with the sample object, the node tag including a third positive tag and a third negative tag, the node data including first child node data having the third positive tag and second child node data having the third negative tag;
before the characteristic splicing is carried out according to the information characteristic, the behavior characteristic and the relation characteristic to obtain a sample splicing characteristic, the method further comprises the following steps:
acquiring the data quantity of the first sub-node data of which the relationship data represent direct association relationship, and obtaining the first node quantity;
acquiring the data quantity of second sub-node data of which the relationship data represents a direct association relationship, and obtaining the number of second nodes;
calculating to obtain a first duty ratio number according to the first node number and the second node number;
acquiring the data quantity of the first sub-node data of which the relationship data represents the indirect association relationship, and obtaining the third node quantity;
acquiring the data quantity of the second sub-node data of which the relationship data represents the indirect association relationship, and obtaining the fourth node quantity;
calculating to obtain a second duty ratio number according to the third node number and the fourth node number;
constructing a duty ratio feature according to the first duty ratio quantity and the second duty ratio quantity;
and updating the relational features according to the duty ratio feature.
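The duty ratio (proportion) feature described above can be sketched as follows. The source does not state how each duty ratio is computed from the two node counts or how the relational features are updated, so this sketch assumes the ratio is the share of positively labelled nodes among all labelled nodes of that association type, and that updating means appending the two ratios to the relational feature vector. The function names and the sample values are hypothetical.

```python
def proportion(positive_count, negative_count):
    """Assumed definition: share of positive-labelled nodes among all nodes."""
    total = positive_count + negative_count
    return positive_count / total if total else 0.0

def duty_ratio_feature(neighbors):
    """neighbors: (relation, label) pairs, relation in {"direct", "indirect"},
    label 1 for a positive node tag and 0 for a negative node tag."""
    counts = {
        ("direct", 1): 0, ("direct", 0): 0,
        ("indirect", 1): 0, ("indirect", 0): 0,
    }
    for relation, label in neighbors:
        counts[(relation, label)] += 1
    first_ratio = proportion(counts[("direct", 1)], counts[("direct", 0)])
    second_ratio = proportion(counts[("indirect", 1)], counts[("indirect", 0)])
    return [first_ratio, second_ratio]

# Hypothetical node data for one sample object.
neighbors = [
    ("direct", 1), ("direct", 0), ("direct", 0), ("direct", 0),
    ("indirect", 1), ("indirect", 1), ("indirect", 0), ("indirect", 0),
]
feature = duty_ratio_feature(neighbors)

# Placeholder standing in for features extracted by the graph neural network.
relation_feature = [0.12, -0.40, 0.88]
updated_relation_feature = relation_feature + feature
```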
In some embodiments, the sample tag comprises a first positive tag and a first negative tag, and the sample stitching features comprise a first stitching feature having the first positive tag and a second stitching feature having the first negative tag;
before the sample splicing features are input into a preset initial evaluation model to perform risk prediction, and an initial risk label is obtained, the method further comprises:
performing up-sampling on the first stitching feature to obtain an extended feature;
and updating the sample stitching features according to the extended feature.
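The up-sampling step can be illustrated with a minimal sketch. The source does not name the up-sampling technique at this point, so simple random oversampling with replacement is swapped in here as an assumption; the function name, the seed and the feature values are hypothetical.

```python
import random

def oversample(positive_features, target_count, seed=0):
    """Draw copies of positive-labelled features, with replacement, until the
    positive class would reach target_count. Returns only the new copies."""
    rng = random.Random(seed)
    extra_needed = max(0, target_count - len(positive_features))
    return [list(rng.choice(positive_features)) for _ in range(extra_needed)]

# Hypothetical stitched features: two positive-labelled, five negative-labelled.
positive = [[0.9, 0.1], [0.8, 0.3]]
negative = [[0.1, 0.7], [0.2, 0.6], [0.0, 0.9], [0.3, 0.8], [0.1, 0.5]]

extended = oversample(positive, len(negative))

# Update the sample stitching features with the extended features.
samples = positive + extended + negative
labels = [1] * (len(positive) + len(extended)) + [0] * len(negative)
```

Balancing the classes this way keeps the evaluation model from being dominated by the more numerous negative-labelled samples.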
In some embodiments, the performing parameter adjustment on the initial evaluation model according to the initial risk tag and the sample tag to obtain a target evaluation model includes:
calculating according to the initial risk tag and a preset model evaluation function to obtain an evaluation value;
and carrying out parameter adjustment on the initial evaluation model according to the evaluation value, the initial risk label and the sample label to obtain the target evaluation model.
To achieve the above object, a second aspect of the embodiments of the present application proposes an object risk prediction apparatus, the apparatus including:
the sample data acquisition module is used for acquiring sample data; the sample data is annotation data with a sample label, wherein the sample label is used for representing a risk level category of a sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relation network data of the sample object;
the coding module is used for coding the first basic information data to obtain information characteristics and coding the sample behavior data to obtain behavior characteristics;
the feature extraction module is used for inputting the sample relational network data into a preset graph neural network model to perform feature extraction to obtain relational features;
the characteristic splicing module is used for carrying out characteristic splicing according to the information characteristic, the behavior characteristic and the relation characteristic to obtain a sample splicing characteristic;
the first risk assessment module is used for inputting the sample stitching features into a preset initial evaluation model to perform risk prediction to obtain an initial risk label;
the parameter adjustment module is used for carrying out parameter adjustment on the initial evaluation model according to the initial risk label and the sample label to obtain a target evaluation model;
the second risk assessment module is used for inputting the acquired target data into the target assessment model to conduct risk prediction so as to obtain a target risk label; the target data comprises target basic information data of a target object, target behavior data of the target object and target relation network data of the target object, and the target risk tag is used for representing a risk class of the target object.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, which includes a memory and a processor, the memory storing a computer program, the processor implementing the method according to the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect.
According to the object risk prediction method and device, the electronic equipment and the storage medium provided by the embodiments of the application, the sample stitching features are input into the preset initial evaluation model for risk prediction to obtain an initial risk label representing the risk level category of the sample object, and the initial evaluation model is trained with the initial risk label and the sample label to obtain the target evaluation model. The sample stitching features include not only the information features and behavior features that represent the individual data of the sample object, but also the relational features that represent the relationship data between the sample object and other objects. The trained target evaluation model can therefore learn how both the relationship data and the individual data of the sample object correlate with the risk level category, rather than modeling only the individual data of the sample object as in the related art, so the accuracy of risk prediction is improved when the target evaluation model is used to predict the risk level category of the target object.
Drawings
FIG. 1 is a flowchart of an object risk prediction method provided in an embodiment of the present application;
FIG. 2 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 3 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 4 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of object association data according to an embodiment of the present application;
FIG. 6 is a schematic diagram of initial device common data according to an embodiment of the present application;
FIG. 7 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 8 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 9 is another flow chart of an object risk prediction method provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an object risk prediction apparatus provided in an embodiment of the present application;
FIG. 11 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
First, several terms used in this application are explained:
artificial intelligence (artificial intelligence, AI): is a new technical science for researching and developing theories, methods, technologies and application systems for simulating, extending and expanding the intelligence of people; artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new intelligent machine that can react in a manner similar to human intelligence, research in this field including robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information process of consciousness and thinking of people. Artificial intelligence is also a theory, method, technique, and application system that utilizes a digital computer or digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Risk control model (also rendered "wind control model"): a model used to control business risk. In the decision flow of a business system, the risk control model provides effective data support and a decision basis for business decision makers. It can proactively identify, classify and give early warning of potential business risks, automatically evaluate and quantify the risk details and risk level of business objects across multiple dimensions, and track and analyze risk trends, so that the risk cost borne by the users of the risk control model is reduced as far as possible.
Binning: a data preprocessing technique for reducing the effect of minor observation errors. Specifically, binning groups a number of continuous values into a smaller number of intervals ("bins"), and is therefore also called discretization. For example, binning an age data set yields intervals such as [0,12], [13,20], [21,50], [50,100]. Methods in the related art include equal-frequency binning, equal-width (equidistant) binning, chi-square binning and minimum-entropy binning. Equal-frequency and equal-width binning are unsupervised; chi-square and minimum-entropy binning are supervised. In equal-frequency binning each bin contains approximately the same number of instances; for example, when a data set is split into 10 bins, each bin contains 10% of the instances. Equal-width binning divides the range from the minimum value to the maximum value of the data set into N equal parts; for example, if A is the minimum value in the data set and B is the maximum value, the width of each bin is W = (B - A)/N, and the boundary values of the resulting intervals are A + W, A + 2W, and so on.
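The two unsupervised binning methods can be sketched in a few lines. This is a minimal illustration using only the standard library; the function names, the left-closed bin convention and the age values are assumptions made for the example, not part of the source.

```python
def equal_width_edges(values, n_bins):
    """Interior boundaries A + W, A + 2W, ... with W = (B - A) / N."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    return [lowest + k * width for k in range(1, n_bins)]

def equal_frequency_edges(values, n_bins):
    """Interior boundaries chosen so each bin holds about the same count."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the bin a value falls into (bins are left-closed)."""
    return sum(value >= edge for edge in edges)

# Hypothetical age data.
ages = [3, 8, 15, 19, 22, 35, 41, 50, 63, 80]

width_edges = equal_width_edges(ages, 4)
frequency_edges = equal_frequency_edges(ages, 5)
bin_indices = [assign_bin(age, frequency_edges) for age in ages]
```

With these ten ages and five equal-frequency bins, each bin receives two values, whereas equal-width bins on skewed data can end up with very uneven counts.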
WOE (Weight of Evidence): WOE is an encoded form of an original feature. To WOE-encode a feature, the data set corresponding to the feature is first discretized. The WOE value is calculated as shown in formula (1):
WOE_i = \ln\left(\frac{py_i}{pn_i}\right) = \ln\left(\frac{y_i / y_T}{n_i / n_T}\right) \quad (1)
where py_i is the proportion of the positive samples in the ith interval to the positive samples in all intervals, pn_i is the proportion of the negative samples in the ith interval to the negative samples in all intervals, y_i is the number of positive samples in the ith interval, n_i is the number of negative samples in the ith interval, y_T is the number of positive samples in all intervals, and n_T is the number of negative samples in all intervals. Formula (1) therefore expresses the difference between "the proportion of positive samples in the current interval to positive samples in all intervals" and "the proportion of negative samples in the current interval to negative samples in all intervals". The larger the WOE value, the more the current interval is weighted toward positive samples, that is, the greater the probability that a sample in the current interval is a positive sample. The WOE value thus describes the direction and magnitude of the influence that the current interval of a feature has on judging whether an individual is a positive sample: when the WOE value is positive, the current value of the feature has a positive influence on that judgment; when the WOE value is negative, it has a negative influence. The magnitude of the WOE value reflects the size of the influence.
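A short sketch of WOE encoding follows, assuming the standard definition WOE_i = ln(py_i / pn_i) with py_i = y_i / y_T and pn_i = n_i / n_T. The function names and the counts are hypothetical, and the sketch assumes every interval contains at least one positive and one negative sample (otherwise the logarithm is undefined and some smoothing would be needed).

```python
import math

def woe_per_interval(positive_counts, negative_counts):
    """WOE_i = ln(py_i / pn_i) for each interval i."""
    y_total = sum(positive_counts)
    n_total = sum(negative_counts)
    values = []
    for y_i, n_i in zip(positive_counts, negative_counts):
        py_i = y_i / y_total  # share of all positive samples in interval i
        pn_i = n_i / n_total  # share of all negative samples in interval i
        values.append(math.log(py_i / pn_i))
    return values

def woe_encode(bin_indices, woe_values):
    """Replace each sample's interval index with that interval's WOE value."""
    return [woe_values[index] for index in bin_indices]

# Hypothetical counts of positive and negative samples in three intervals.
positive_counts = [10, 30, 60]
negative_counts = [60, 30, 10]

woe_values = woe_per_interval(positive_counts, negative_counts)
encoded = woe_encode([0, 2, 1, 2], woe_values)
```

Here the first interval gets a negative WOE (it holds relatively more negative samples), the middle one is zero, and the last is positive.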
IV (Information Value, also called information amount): used to screen features when constructing a model by measuring each feature's predictive power. For example, suppose that in a classification problem the classes are Y1 and Y2. To decide whether a sample X to be predicted belongs to Y1 or Y2, certain information is needed as a basis for the decision. Suppose the total amount of this information is I and that it is contained in the features C1, C2, C3, .... The more information a feature Ci contains, the more it contributes to deciding whether X belongs to Y1 or Y2, the greater its information value (that is, its IV), and the stronger the case for including Ci in the model's variable list. The IV value of the ith interval is calculated as shown in formula (2):
iv_i = (py_i - pn_i) \cdot WOE_i \quad (2)
The IV value of the whole feature is calculated as shown in formula (3):
IV = \sum_{i} iv_i \quad (3)
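The IV calculation and its use for feature screening can be sketched as below, assuming the per-interval value is (py_i - pn_i) multiplied by ln(py_i / pn_i) and the feature-level IV is the sum over intervals. The acceptance range `(0.02, 0.5)` is a hypothetical placeholder for the preset range, which the source does not specify here, and the counts are invented for the example.

```python
import math

def information_value(positive_counts, negative_counts):
    """Return (per-interval iv values, total IV of the feature)."""
    y_total = sum(positive_counts)
    n_total = sum(negative_counts)
    per_interval = []
    for y_i, n_i in zip(positive_counts, negative_counts):
        py_i = y_i / y_total
        pn_i = n_i / n_total
        woe_i = math.log(py_i / pn_i)
        per_interval.append((py_i - pn_i) * woe_i)
    return per_interval, sum(per_interval)

def keep_feature(total_iv, preset_range=(0.02, 0.5)):
    """Keep the feature only if its total IV lies in the (assumed) range."""
    lower, upper = preset_range
    return lower <= total_iv <= upper

# Hypothetical counts of positive and negative samples in three intervals.
positive_counts = [20, 30, 50]
negative_counts = [40, 30, 30]

iv_values, total_iv = information_value(positive_counts, negative_counts)
selected = keep_feature(total_iv)
```

Each per-interval term is non-negative because (py_i - pn_i) and ln(py_i / pn_i) always share the same sign, so the total can only grow as intervals separate the classes more strongly.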
PSI (Population Stability Index): the PSI value reflects how stable the distribution of the verification sample across score segments is relative to the distribution of the modeling sample. In modeling, PSI values are commonly used to screen feature variables and to evaluate model stability. Evaluating stability requires a reference for comparison, so two distributions must be set: the actual distribution and the expected distribution. In modeling, training samples are typically used as the expected distribution and verification samples as the actual distribution; the verification samples may be Out of Sample (OOS) or Out of Time (OOT) samples. The PSI value is calculated as shown in formula (4):
PSI = \sum_{i} (D_i - E_i) \cdot \ln\left(\frac{D_i}{E_i}\right) \quad (4)
where D_i is the actual distribution proportion of the ith interval after binning and E_i is the expected distribution proportion of the ith interval after binning. The smaller the PSI value, the smaller the difference between the actual sample and the verification sample, and the more stable the model. For example, when the PSI value is in the range 0 to 0.1, the model stability is good, that is, there is little or no change between the actual sample and the verification sample; when the PSI value is in the range 0.1 to 0.25, the model is slightly unstable, that is, the actual sample and the verification sample differ and subsequent changes should be monitored; when the PSI value is greater than 0.25, the model is unstable, that is, there is a large difference between the actual sample and the verification sample, and the individual features need to be analyzed. Calculating the PSI value helps avoid the problem that, in practical applications, the model becomes unstable because the actual sample distribution shifts under factors such as changes in the customer group (for example, rapid changes in the user base of the internet finance market) and changes in data source collection (for example, the data source collection interface being subject to risk control).
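The PSI calculation and the three stability bands can be sketched as follows, assuming the common definition PSI = sum over intervals of (D_i - E_i) multiplied by ln(D_i / E_i). The function names and the counts are hypothetical, and the sketch assumes no interval is empty in either distribution.

```python
import math

def psi(expected_counts, actual_counts):
    """Population Stability Index over matching intervals."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected_n, actual_n in zip(expected_counts, actual_counts):
        e_i = expected_n / expected_total  # expected proportion E_i
        d_i = actual_n / actual_total      # actual proportion D_i
        total += (d_i - e_i) * math.log(d_i / e_i)
    return total

def stability_band(psi_value):
    """Map a PSI value onto the three ranges described in the text."""
    if psi_value <= 0.1:
        return "stable"
    if psi_value <= 0.25:
        return "slightly unstable"
    return "unstable"

# Hypothetical per-interval sample counts.
expected = [250, 250, 250, 250]
unchanged = [250, 250, 250, 250]
shifted = [400, 300, 200, 100]

psi_unchanged = psi(expected, unchanged)
psi_shifted = psi(expected, shifted)
```

Identical distributions give a PSI of zero; the shifted example lands in the middle band, which under the thresholds above would call for continued monitoring.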
OOT (Out of Time) samples: samples that span time; that is, the actual samples and verification samples used in calculating the PSI value are replaced with samples collected at different times. For example, when the samples collected in May 2022 are taken as the verification samples, the samples collected in June 2022 may be taken as the actual samples. The time span between the verification samples and the actual samples is then one month. The time span may be chosen according to actual needs and is not specifically limited in the embodiments of the present application.
In the related art, a user's individual data is modeled with a machine learning model such as a logistic regression scorecard model or a tree model to determine a risk probability evaluation result for the user. However, individual data alone is of limited use in capturing changes in the user's circumstances, which reduces the accuracy of the risk probability evaluation result.
Based on the above, the embodiment of the application provides an object risk prediction method and device, electronic equipment and storage medium, and aims to improve the accuracy of risk prediction.
The method and device for predicting object risk, electronic equipment and storage medium provided by the embodiments of the present application are specifically described through the following embodiments, and the method for predicting object risk in the embodiments of the present application is described first.
The embodiments of the present application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
The embodiments of the present application provide an object risk prediction method, which relates to the technical field of artificial intelligence. The object risk prediction method provided by the embodiments of the present application can be applied to a terminal, to a server side, or to software running in the terminal or the server side. In some embodiments, the terminal may be a smart phone, tablet, notebook computer, desktop computer, or the like; the server side can be configured as an independent physical server, as a server cluster or distributed system formed by a plurality of physical servers, or as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application or the like that implements the object risk prediction method, but is not limited to the above forms.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, in each specific embodiment of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of these data comply with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the user is explicitly acquired, necessary user related data for enabling the embodiment of the application to normally operate is acquired.
Fig. 1 is an optional flowchart of a method for predicting risk of an object according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S107.
Step S101, obtaining sample data; the sample data is annotation data with a sample label, the sample label is used for representing a risk level class of the sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relation network data of the sample object;
Step S102, encoding the first basic information data to obtain information characteristics, and encoding the sample behavior data to obtain behavior characteristics;
Step S103, inputting the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features;
Step S104, performing feature stitching on the information features, the behavior features and the relational features to obtain sample stitching features;
Step S105, inputting the sample stitching features into a preset initial evaluation model for risk prediction to obtain an initial risk label;
Step S106, adjusting parameters of the initial evaluation model according to the initial risk label and the sample label to obtain a target evaluation model;
Step S107, inputting acquired target data into the target evaluation model for risk prediction to obtain a target risk label; the target data comprises target basic information data of a target object, target behavior data of the target object and target relational network data of the target object, and the target risk label is used for representing a risk level category of the target object.
In steps S101 to S106 illustrated in the embodiments of the present application, the sample stitching features are input into a preset initial evaluation model for risk prediction, so as to obtain an initial risk label representing the risk level category of the sample object. The initial evaluation model is then trained with the initial risk label and the sample label to obtain a target evaluation model. The sample stitching features include not only the information features and behavior features that represent the individual data of the sample object, but also the relational features that represent the relationship data between the sample object and other objects. The trained target evaluation model can therefore fully learn how both the relationship data and the individual data of the sample object correlate with the risk level category. This avoids modeling only on the individual data of the sample object, as in the related art, and improves the accuracy of risk prediction when the target evaluation model is used to predict the risk level category of the target object.
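The stitching and training flow of steps S104 to S106 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the application does not specify the form of the initial evaluation model here, so a plain logistic regression trained by stochastic gradient descent is swapped in for illustration, and the function names, toy feature values, and hyperparameters are all hypothetical.

```python
import math

def stitch(info_feature, behavior_feature, relation_feature):
    """Step S104: concatenate the three feature vectors into one vector."""
    return list(info_feature) + list(behavior_feature) + list(relation_feature)

def predict_proba(weights, bias, x):
    """Step S105 (assumed model): logistic regression risk probability."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=500, lr=0.5):
    """Step S106: compare predictions with sample labels and adjust parameters."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y  # log-loss gradient w.r.t. z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Toy data: label 1 marks a positive (risky) sample, 0 a negative one.
samples = [
    stitch([1, 0], [0.9], [0.8]),
    stitch([0, 1], [0.1], [0.1]),
    stitch([1, 0], [0.8], [0.7]),
    stitch([0, 1], [0.2], [0.0]),
]
labels = [1, 0, 1, 0]
weights, bias = train(samples, labels)
risk = predict_proba(weights, bias, stitch([1, 0], [0.85], [0.9]))
```

Any model that accepts a fixed-length feature vector could take the place of the logistic regression; the point of the sketch is only that the relational features enter the model alongside the individual features.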
It should be noted that, first, the target evaluation model constructed in the embodiments of the present application is a risk control model, and the scorecard model of the risk control model is suitable for application scenarios such as bank credit, financial risk control, enterprise risk evaluation, and vendor risk evaluation. The object risk prediction method and device, electronic equipment, and storage medium provided in the embodiments of the present application are described using the example of applying the risk control model to bank credit. It should be understood, however, that when the risk control model is applied to other application scenarios and the object risk prediction method and device, electronic equipment, and storage medium are adapted accordingly, the adapted schemes also fall within the protection scope of the embodiments of the present application. Second, in the bank credit example, a positive example sample is a sample object whose sample label is the positive label, that is, the risk level category of the sample object indicates that risk exists; a negative example sample is a sample object whose sample label is the negative label, that is, the risk level category of the sample object indicates that no risk exists.
In step S101 of some embodiments, sample data of a plurality of sample objects is obtained from a sample database. It can be understood that the sample objects are objects that have completed the relevant bank credit operations and whose risk level categories are known. The acquired sample data is therefore data with a sample label, and the sample label represents the risk level category of the corresponding sample object. The risk level categories can be set as needed, for example as class one and class two, where class one indicates that the sample object is a risk object or is highly likely to be a risk object, and class two indicates that the sample object is a non-risk object or is unlikely to be a risk object. The sample data includes first basic information data, sample behavior data, and sample relational network data. The first basic information data includes individual basic data and loan basic data of the sample object, such as gender, age, number of historical loans, and historical loan time. The sample behavior data includes the browsing time of the sample object in a target application, the number of times a target webpage is browsed, the number of times the target webpage is clicked, and the like. The target application is an application that can serve as a reference for evaluating the risk level category of the sample object, such as a financial application; likewise, the target webpage may be a financial webpage or the like. The sample relational network data includes object association data describing the interpersonal relationships between the sample object and other objects, and device common data characterizing whether the sample object shares a device with other objects.
It can be understood that the sample data may be obtained through an API (Application Programming Interface), through event tracking (data instrumentation points), or the like, which is not specifically limited in the embodiments of the present application.
In step S102 of some embodiments, the first basic information data is encoded by one-hot encoding or a similar method to obtain the information features. The sample behavior data is encoded in the same way to obtain the behavior features.
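A minimal sketch of one-hot encoding for step S102 is given below. The category lists, the helper name `one_hot`, and the example values are hypothetical and serve only to illustrate the encoding.

```python
def one_hot(value, categories):
    """Return a vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]

# Hypothetical category lists for two attributes of the basic information data.
GENDERS = ["female", "male"]
AGE_INTERVALS = ["20-29", "30-39", "40-49", "50-59"]

# Encode each attribute separately and concatenate the results.
info_feature = one_hot("male", GENDERS) + one_hot("30-39", AGE_INTERVALS)
```

Continuous attributes such as age are first mapped to an interval and then encoded by the position of that interval, which is why discretization precedes encoding in the embodiments that follow.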
Referring to fig. 2, in some embodiments, the sample tag includes a first positive tag and a first negative tag, and the first base information data includes first sub-data having the first positive tag and second sub-data having the first negative tag. The "encoding processing is performed on the first basic information data to obtain the information feature" in step S102 includes, but is not limited to, steps S201 to S207.
Step S201, discretizing the first basic information data to obtain a plurality of initial sample information intervals;
Step S202, acquiring the data quantity of first sub-data contained in one of the initial sample information intervals to obtain a first sub-quantity;
Step S203, acquiring the data quantity of second sub-data contained in one of the initial sample information intervals to obtain a second sub-quantity;
Step S204, acquiring the data quantity of first sub-data contained in all initial sample information intervals to obtain a first total quantity;
Step S205, acquiring the data quantity of second sub-data contained in all the initial sample information intervals to obtain a second total quantity;
Step S206, calculating the initial information quantity of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the first total quantity and the second total quantity;
Step S207, calculating a total information quantity of all the initial sample information intervals according to the initial information quantity, and if the total information quantity is within a first preset range, encoding the initial sample information intervals to obtain the information features.
In step S201 of some embodiments, mapping relationships are constructed between the sample labels and the sample data of the sample objects, so that the first basic information data, the sample behavior data, and the sample relational network data are all data with sample labels. Thus, when the sample label includes a first positive label and a first negative label, the first basic information data correspondingly includes first sub-data with the first positive label and second sub-data with the first negative label. The first sub-data is the basic information data of positive example samples, and the second sub-data is the basic information data of negative example samples. It can be understood that the first sub-data and the second sub-data each include continuous data and discrete data; for example, the continuous data includes age, income, and the like, and the discrete data includes gender, education level, and the like. The continuous data in the first sub-data and the second sub-data is discretized to obtain a plurality of initial sample information intervals. The discretization method may be equidistant binning, equal-frequency binning, chi-square binning, minimum-entropy binning, or the like, which is not specifically limited in the embodiments of the present application. Taking equidistant binning as an example, the first sub-data and the second sub-data representing age are discretized to obtain a plurality of initial sample information intervals such as [20, 29], [30, 39], [40, 49], and [50, 59].
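The equidistant binning of step S201 can be sketched as follows. This is an illustrative sketch for integer-valued attributes with inclusive interval ends, matching the age example; the function names `equal_width_intervals` and `locate` are hypothetical.

```python
def equal_width_intervals(low, high, width):
    """Build inclusive integer intervals of equal width covering [low, high]."""
    return [
        (start, min(start + width - 1, high))
        for start in range(low, high + 1, width)
    ]

def locate(value, intervals):
    """Return the index of the interval containing `value`, or None."""
    for index, (start, end) in enumerate(intervals):
        if start <= value <= end:
            return index
    return None

# Age example from the text: 20 to 59 in steps of 10.
age_intervals = equal_width_intervals(20, 59, 10)
interval_of_38 = locate(38, age_intervals)
```

Values outside every interval return `None` here; how such values are handled (for example with open-ended first and last intervals) is left open by the text.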
In step S202 and step S203 of some embodiments, the data quantity of the first sub-data and the data quantity of the second sub-data contained in any one of the initial sample information intervals are acquired. For example, the acquired sample data includes the sample data of three sample objects L1, L2, and L3, where sample object L1 is a positive example sample aged 21, sample object L2 is a negative example sample aged 27, and sample object L3 is a negative example sample aged 38. Then, for the initial sample information interval [20, 29], the first sub-number is 1 (L1) and the second sub-number is 1 (L2), because L3, at 38 years old, falls in the interval [30, 39].
In step S204 and step S205 of some embodiments, the data quantity of the first sub-data and the data quantity of the second sub-data contained in all the initial sample information intervals representing the same attribute are acquired. For example, the first total number is obtained by counting the first sub-data contained in all the initial sample information intervals representing the age attribute, and the second total number is obtained by counting the second sub-data contained in all the initial sample information intervals representing the age attribute.
In step S206 of some embodiments, the initial information amount of an initial sample information interval is calculated according to formulas (1) and (2) above; for example, the initial information amount $iv_i$ of the initial sample information interval [20, 29] is calculated. Here, $y_i$ represents the first sub-number of the i-th initial sample information interval, $n_i$ represents the second sub-number of the i-th initial sample information interval, $y_T$ represents the first total number, and $n_T$ represents the second total number.
In step S207 of some embodiments, the total information amount IV of the initial sample information intervals representing the age attribute is calculated according to formula (3) above. When the total information amount IV is within the first preset range, the initial sample information intervals representing the age attribute carry a large amount of information that can serve as a basis for risk prediction, that is, they contribute substantially to risk prediction. The plurality of initial sample information intervals are therefore encoded by one-hot encoding or a similar method to obtain the information feature corresponding to each initial sample information interval.
In the embodiments of the present application, the information value of the initial sample information intervals of different attributes is judged by the total information amount, so that the initial sample information intervals that help distinguish positive example samples from negative example samples are selected. This prevents the initial evaluation model from learning from information features that make no contribution, reduces the amount of data the initial evaluation model has to process, and improves the risk prediction accuracy of the target evaluation model obtained by training the initial evaluation model.
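The calculation in steps S202 to S207 can be sketched as follows. Formulas (1) to (3) are not repeated here, so the sketch assumes the commonly used weight-of-evidence form, in which the information amount of interval i is (y_i / y_T - n_i / n_T) * ln((y_i / y_T) / (n_i / n_T)) and the total information amount is the sum over intervals. The function name, the smoothing value for empty intervals, and the example counts are hypothetical.

```python
import math

def information_value(pos_counts, neg_counts, empty=0.5):
    """Return (per-interval information amounts, total information amount).

    pos_counts[i] is the first sub-number y_i and neg_counts[i] is the
    second sub-number n_i of interval i. Assumed weight-of-evidence form.
    """
    y_total = sum(pos_counts)   # first total number y_T
    n_total = sum(neg_counts)   # second total number n_T
    if y_total == 0 or n_total == 0:
        raise ValueError("both positive and negative samples are required")
    parts = []
    for y_i, n_i in zip(pos_counts, neg_counts):
        # Replace zero counts with a small value so the logarithm is defined.
        p = (y_i if y_i > 0 else empty) / y_total
        q = (n_i if n_i > 0 else empty) / n_total
        parts.append((p - q) * math.log(p / q))
    return parts, sum(parts)

# Hypothetical counts for four age intervals.
parts, total_iv = information_value([10, 20, 30, 40], [40, 30, 20, 10])
```

Each per-interval amount is non-negative under this definition, so an attribute whose positive and negative samples are distributed identically across intervals has a total information amount of zero.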
It can be understood that the method for encoding the sample behavior data may refer to the method for encoding the first basic information data in the above embodiment, and this will not be repeated in this embodiment of the present application.
Referring to fig. 3, in other embodiments, the "encoding the first basic information data to obtain the information feature" in step S102 may further include, but is not limited to, steps S301 to S309.
Step S301, discretizing the first basic information data to obtain an initial sample information interval;
Step S302, acquiring the data quantity of first sub-data contained in an initial sample information interval to obtain a first sub-quantity;
Step S303, acquiring the data quantity of second sub-data contained in the initial sample information interval to obtain a second sub-quantity;
Step S304, acquiring verification data according to the acquisition time of the sample data and a preset sampling interval; the verification data is labeling data with verification tags, the verification tags are used for representing risk level categories of the sample objects, and the verification data comprises second basic information data of the sample objects; the verification tag comprises a second positive tag and a second negative tag, and the second basic information data comprises third sub-data with the second positive tag and fourth sub-data with the second negative tag;
Step S305, discretizing the second basic information data to obtain a verification information interval;
Step S306, acquiring the data quantity of the third sub-data contained in the verification information interval to obtain a third sub-quantity;
Step S307, acquiring the data quantity of the fourth sub-data contained in the verification information interval to obtain a fourth sub-quantity;
Step S308, calculating a stable value of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the third sub-quantity and the fourth sub-quantity;
Step S309, if the stable value is within a second preset range, encoding the initial sample information interval to obtain the information features.
In step S301 of some embodiments, mapping relationships are constructed between the sample labels and the sample data of the sample objects, so that the first basic information data, the sample behavior data, and the sample relational network data are all data with sample labels. Thus, when the sample label includes a first positive label and a first negative label, the first basic information data correspondingly includes first sub-data with the first positive label and second sub-data with the first negative label. The first sub-data is the basic information data of positive example samples, and the second sub-data is the basic information data of negative example samples. It can be understood that the first sub-data and the second sub-data each include continuous data and discrete data; for example, the continuous data includes age, income, and the like, and the discrete data includes gender, education level, and the like. The continuous data in the first sub-data and the second sub-data is discretized to obtain a plurality of initial sample information intervals. The discretization method may be equidistant binning, equal-frequency binning, chi-square binning, minimum-entropy binning, or the like, which is not specifically limited in the embodiments of the present application. Taking equidistant binning as an example, the first sub-data and the second sub-data representing the number of historical loans are discretized to obtain a plurality of initial sample information intervals such as [0, 4], [5, 9], [10, 14], and [15, 19].
In step S302 and step S303 of some embodiments, the data amount of the first sub-data included in one of the initial sample information sections and the data amount of the second sub-data included therein are acquired. For example, the obtained sample data includes sample data of three sample objects, namely a sample object L1, a sample object L2 and a sample object L3, wherein the sample object L1 is a positive sample, and the historical loan number is 1; sample object L2 is a negative example, whose historical loan number is 3; sample object L3 is a negative example, with a historical loan count of 6. Then for an initial sample information interval [ 0,4 ], the first sub-number of the initial sample information interval is 1 and the second sub-number is 1.
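The counting in steps S302 and S303 can be sketched with the three sample objects from the example above. The function name `count_by_interval` and the record layout are hypothetical; label 1 marks a positive example sample and label 0 a negative example sample.

```python
def count_by_interval(records, intervals):
    """Count positive and negative samples per inclusive interval.

    Returns a list of [first sub-number, second sub-number] pairs,
    one pair per interval.
    """
    counts = [[0, 0] for _ in intervals]
    for value, label in records:
        for index, (start, end) in enumerate(intervals):
            if start <= value <= end:
                counts[index][0 if label == 1 else 1] += 1
                break
    return counts

loan_intervals = [(0, 4), (5, 9), (10, 14), (15, 19)]
# (historical loan number, label) for sample objects L1, L2, L3.
records = [(1, 1), (3, 0), (6, 0)]
counts = count_by_interval(records, loan_intervals)
first_sub_number, second_sub_number = counts[0]
```

For the interval [0, 4] this reproduces the first sub-number 1 and second sub-number 1 stated in the example, while L3 is counted in [5, 9].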
In step S304 of some embodiments, cross-time samples of the sample data are acquired according to the sampling time and a preset sampling interval, that is, the verification data is acquired. For example, if the sample data is data acquired for a plurality of sample objects in May 2022, the verification data is data acquired for the same sample objects in April 2022. In this example, the sampling time is May 2022 and the preset sampling interval is one month; the sampling time and the preset sampling interval can be adjusted according to actual needs, which is not specifically limited in the embodiments of the present application. The verification data is data with a verification tag, and the verification data includes the same types of data as the sample data, that is, the verification data includes second basic information data of the sample object and behavior data of the sample object. Accordingly, and in correspondence with the sample data, the second basic information data includes third sub-data with a second positive label and fourth sub-data with a second negative label.
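Selecting verification data by sampling time and sampling interval can be sketched as follows. The record layout, the field name `collected`, and the helper names are hypothetical; the one-month offset mirrors the May 2022 and April 2022 example.

```python
from datetime import date

def shift_month(month_start, months):
    """Return the first day of the month that is `months` away."""
    index = month_start.year * 12 + (month_start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def select_month(records, month_start):
    """Keep the records collected in the given calendar month."""
    return [
        record for record in records
        if (record["collected"].year, record["collected"].month)
        == (month_start.year, month_start.month)
    ]

sample_month = date(2022, 5, 1)
verification_month = shift_month(sample_month, -1)  # one month earlier

records = [
    {"object": "L1", "collected": date(2022, 5, 12)},
    {"object": "L1", "collected": date(2022, 4, 9)},
    {"object": "L2", "collected": date(2022, 4, 20)},
]
sample_data = select_month(records, sample_month)
verification_data = select_month(records, verification_month)
```

The month arithmetic works on a year-times-twelve index, so the offset also crosses year boundaries correctly.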
In step S305 of some embodiments, the same discretization process is performed on the second basic information data, so as to obtain a plurality of verification information intervals. For example, the same equidistant binning process is performed on the second basic information data representing the historical loan number, so as to obtain a plurality of verification information intervals such as [ 0,4 ], [ 5,9 ], [ 10,14 ], [ 15,19 ], and the like. It will be appreciated that the number of validation information intervals should be the same as the number of initial sample information intervals.
In steps S306 to S307 of some embodiments, the data quantity of the third sub-data contained in one of the verification information intervals is acquired to obtain the third sub-number, and the data quantity of the fourth sub-data contained in that verification information interval is acquired to obtain the fourth sub-number.
In step S308 of some embodiments, the stable value PSI of the initial sample information interval is calculated according to formula (4) above. Here, $A_i$ is calculated from the first sub-number and the second sub-number and represents the distribution proportion of the positive example samples contained in the initial sample information interval in the month of the acquisition time; $E_i$ is calculated from the third sub-number and the fourth sub-number and represents the distribution proportion of the positive example samples contained in the verification information interval in the month in which the verification data was acquired.
In step S309 of some embodiments, when the calculated stable value is within the second preset range, the first basic information data contained in the initial sample information intervals representing the current attribute is stable; for example, the first basic information data contained in the initial sample information intervals representing the historical loan number attribute is stable. Modeling on the historical loan number data in the first basic information data therefore improves the stability of the model. The initial sample information intervals are encoded by one-hot encoding or a similar method to obtain the corresponding information features.
It can be understood that the method for encoding the sample behavior data may refer to the method for encoding the first basic information data in the above embodiment, and this will not be repeated in this embodiment of the present application.
In some specific embodiments, to strengthen the screening of the first basic information data, the data may be screened by the total information amount IV and the stable value PSI at the same time. For example, when the total information amount of the initial sample information intervals representing the historical loan number attribute satisfies 0.3 ≤ IV ≤ 1 and the stable value PSI of the initial sample information intervals is less than 0.1, the initial sample information intervals are encoded.
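The combined screening can be sketched as a simple predicate. The default bounds (IV between 0.3 and 1, PSI below 0.1) are one reading of the example above and should be treated as assumptions; the function name `keep_attribute` and the attribute values are hypothetical.

```python
def keep_attribute(iv, psi, iv_range=(0.3, 1.0), psi_max=0.1):
    """Keep an attribute only if it is both informative and stable."""
    informative = iv_range[0] <= iv <= iv_range[1]
    stable = psi < psi_max
    return informative and stable

# Hypothetical (IV, PSI) pairs per attribute.
attributes = {
    "historical_loan_number": (0.45, 0.05),
    "age": (0.45, 0.20),     # informative but not stable enough
    "gender": (0.10, 0.02),  # stable but not informative enough
}
selected = [name for name, (iv, psi) in attributes.items()
            if keep_attribute(iv, psi)]
```

Only attributes that pass both checks are encoded, so the two thresholds can be tuned independently.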
In step S103 of some embodiments, the sample relational network data is graph structure data, so when the sample relational network data is used as input data of a preset graph neural network model, the data with association relations in the sample relational network data can be clustered through the graph neural network model, thereby obtaining corresponding relational features.
Referring to FIG. 4, in some embodiments, the sample relational network data comprises object association data and initial device common data, and the relational features comprise object relational features and device relational features. Step S103 includes, but is not limited to, steps S401 to S403.
Step S401, inputting the object association data into the graph neural network model for feature extraction to obtain object relationship features;
Step S402, performing data screening on the initial device common data to obtain target device common data;
Step S403, inputting the target device common data into the graph neural network model for feature extraction to obtain device relationship features.
In step S401 of some embodiments, the sample relational network data includes object association data characterizing the relationships between the sample object and other objects, and initial device common data characterizing whether the sample object shares a device with other objects. It can be understood that the object association data and the initial device common data are graph structure data. For example, referring to fig. 5, the object association data includes data that can be expressed as object relationship triples, such as (sample object, parent-child, other object 1), (sample object, parent-child, other object 2), (other object 2, relatives, other object 3), and (other object 3, friends, other object 4). Referring to fig. 6, the initial device common data includes data that can be expressed as device common relation triples, such as (sample object, use, device 1), (other object 5, use, device 1), and (sample object, use, device 2). Feature extraction is performed on the object association data through the preset graph neural network to obtain the corresponding object relationship features. It will be appreciated that the graph structures shown in fig. 5 and fig. 6 are merely exemplary, and other graph structures may be used according to actual needs, which is not specifically limited in the embodiments of the present application.
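The triples listed above can be held in a simple adjacency structure before being passed to a graph model. The following is a minimal sketch; treating every relationship as undirected and the function name `build_graph` are assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(triples):
    """Turn (head, relation, tail) triples into an undirected adjacency map."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
        graph[tail].add((relation, head))
    return graph

# Object relationship triples from the fig. 5 example.
object_triples = [
    ("sample object", "parent-child", "other object 1"),
    ("sample object", "parent-child", "other object 2"),
    ("other object 2", "relatives", "other object 3"),
    ("other object 3", "friends", "other object 4"),
]
graph = build_graph(object_triples)
neighbors_of_sample = sorted(tail for _, tail in graph["sample object"])
```

The device common relation triples of fig. 6 can be stored the same way, with devices appearing as a second kind of node.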
In step S402 of some embodiments, data filtering is performed on the initial device common data to filter out invalid data and/or abnormal data, so as to obtain target device common data. For example, device common relation triples with device ID numbers that are empty are filtered out. It will be appreciated that when a device has a "use" relationship with a plurality of different objects, or an object has a "use" relationship with a plurality of different devices at the same time, risk behavior is considered to exist, and thus the target device common data can be used as an evaluation basis for risk prediction.
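The screening in step S402 can be sketched as follows. Dropping triples whose object or device ID is empty follows the example above; the additional helper that lists devices used by several objects, its threshold `min_users`, and the extra triple with an empty device ID are hypothetical.

```python
from collections import defaultdict

def filter_device_triples(triples):
    """Drop device common relation triples with an empty object or device ID."""
    return [(obj, rel, dev) for obj, rel, dev in triples if obj and dev]

def shared_devices(triples, min_users=2):
    """Return the devices that have a "use" relation with several objects."""
    users = defaultdict(set)
    for obj, rel, dev in triples:
        if rel == "use":
            users[dev].add(obj)
    return {dev: objs for dev, objs in users.items() if len(objs) >= min_users}

initial_device_data = [
    ("sample object", "use", "device 1"),
    ("other object 5", "use", "device 1"),
    ("sample object", "use", "device 2"),
    ("other object 6", "use", ""),  # hypothetical invalid record
]
target_device_data = filter_device_triples(initial_device_data)
shared = shared_devices(target_device_data)
```

In this toy data only device 1 is used by more than one object, which is the kind of pattern the text treats as a possible risk signal.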
In step S403 of some embodiments, feature extraction is performed on the target device common data through a preset graph neural network, so as to obtain corresponding device relationship features.
In the embodiments of the present application, the object association data and the initial device common data are represented as graph structures, so that the relationships between the sample object and other objects can be mined through the graph neural network model, and the risk level categories of the other objects serve as a reference for predicting the risk level category of the sample object. This improves the completeness of the modeling data and avoids the problem in the related art that, when risk is predicted only from the individual data of an object, the individual data cannot fully reflect changes in the object's situation and the accuracy of risk prediction suffers.
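The application does not fix the architecture of the graph neural network model. As a rough illustration of how a graph model lets neighbor information flow into a node's feature, the sketch below uses plain mean aggregation over each node and its neighbors, with no learned weights; this is a simplification chosen for illustration, not the model of the embodiments, and all names and values are hypothetical.

```python
def mean_aggregate(features, edges, rounds=1):
    """Replace each node feature by the mean over the node and its neighbors.

    Every node that appears in `edges` must have an entry in `features`.
    """
    groups = {node: {node} for node in features}
    for a, b in edges:
        groups[a].add(b)
        groups[b].add(a)
    current = dict(features)
    for _ in range(rounds):
        updated = {}
        for node, group in groups.items():
            dim = len(current[node])
            updated[node] = [
                sum(current[member][k] for member in group) / len(group)
                for k in range(dim)
            ]
        current = updated
    return current

# One-dimensional toy feature: 1.0 for a known risky neighbor, else 0.0.
features = {"sample": [0.0], "other 1": [1.0], "other 2": [0.0]}
edges = [("sample", "other 1"), ("sample", "other 2")]
relation_features = mean_aggregate(features, edges)
```

After one round the feature of the sample node is no longer zero, which shows in miniature how the risk level categories of associated objects can inform the prediction for the sample object.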
Referring to fig. 7, in some embodiments, the object association data includes node data having a node tag for characterizing a risk level class of the node object, and relationship data for characterizing an association of the node object with the sample object, the node tag including a third positive tag and a third negative tag, the node data including first child node data having a third positive tag and second child node data having a third negative tag. Prior to step S104, the object risk prediction method provided in the embodiment of the present application further includes, but is not limited to, steps S701 to S708.
Step S701, acquiring the data quantity of the first child node data whose relationship data represents a direct association relationship to obtain a first node quantity;
Step S702, acquiring the data quantity of the second child node data whose relationship data represents a direct association relationship to obtain a second node quantity;
Step S703, calculating a first duty ratio number according to the first node quantity and the second node quantity;
Step S704, acquiring the data quantity of the first child node data whose relationship data represents an indirect association relationship to obtain a third node quantity;
Step S705, acquiring the data quantity of the second child node data whose relationship data represents an indirect association relationship to obtain a fourth node quantity;
Step S706, calculating a second duty ratio number according to the third node quantity and the fourth node quantity;
Step S707, constructing a duty ratio feature according to the first duty ratio number and the second duty ratio number;
Step S708, updating the relational features according to the duty ratio feature.
It should be noted that, referring to fig. 5, in the embodiment of the present application, the other objects are node objects. And taking each node object and the corresponding risk level category of the node object as node data. According to whether the node label is a third positive label or a third negative label, namely according to whether the node object is a positive example sample or a negative example sample, the node data can be divided into first sub-node data with the third positive label and second sub-node data with the third negative label. For example, referring to fig. 5, when the other object 1 is a positive example sample, the other object 2 is a negative example sample, the other object 3 is a negative example sample, and the other object 4 is a positive example sample, the object association data shown in fig. 5 includes first child node data 501 and data 504, and second child node data 502 and data 503. The relationship data is used for representing the association relationship between the node object and the sample object, and the association relationship comprises a direct association relationship and an indirect association relationship. When a certain node object and a sample object need to be associated through other node objects, determining that the association relationship between the node object and the sample object is an indirect association relationship. For example, if the other object 3 needs to be associated with the sample object through the other object 2, the association relationship between the other object 3 and the sample object is an indirect association relationship; the other objects 4 need to be associated with the sample object through the other objects 3 and 2, and the association relationship between the other objects 4 and the sample object is an indirect association relationship. 
When a certain node object and a sample object do not need to be associated through other node objects, determining that the association relationship between the node object and the sample object is a direct association relationship, for example, the association relationship between other objects 1 and the sample object is a direct association relationship, and the association relationship between other objects 2 and the sample object is a direct association relationship.
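The distinction between direct and indirect association relationships can be illustrated with a breadth-first traversal. The following is only a minimal sketch: the edge list, the node names and the function `relation_degrees` are hypothetical and merely mirror the fig. 5 example described above; the embodiments do not prescribe how the degree of a relationship is determined.

```python
from collections import deque


def relation_degrees(edges, sample, max_degree=3):
    """Return {node: hop count} for nodes within max_degree hops of the sample object."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    degrees = {sample: 0}
    queue = deque([sample])
    while queue:
        node = queue.popleft()
        if degrees[node] == max_degree:
            continue  # do not expand beyond the three-degree relationship
        for neighbor in adjacency.get(node, ()):
            if neighbor not in degrees:
                degrees[neighbor] = degrees[node] + 1
                queue.append(neighbor)

    del degrees[sample]  # the sample object itself is not a node object
    return degrees


# Hypothetical edges: other objects 1 and 2 connect to the sample object directly,
# other object 3 connects through 2, and other object 4 through 3 and 2.
edges = [
    ("sample", "other1"),
    ("sample", "other2"),
    ("other2", "other3"),
    ("other3", "other4"),
]
degrees = relation_degrees(edges, "sample")
kinds = {n: "direct" if d == 1 else "indirect" for n, d in degrees.items()}
```

In this sketch a hop count of 1 corresponds to a direct association relationship, and a hop count of 2 or 3 corresponds to an indirect association relationship.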
In step S701 of some embodiments, the data quantity of the first child node data having a direct association relationship with the sample object in the object association data is obtained, to obtain the first node number.
In step S702 of some embodiments, the data quantity of the second child node data having a direct association relationship with the sample object in the object association data is obtained, to obtain the second node number.
In step S703 of some embodiments, the first duty ratio number is calculated according to the first node number and the second node number. The first duty ratio number characterizes the proportion of the first child node data among the node data having a direct association relationship with the sample object in the object association data; that is, the first duty ratio number is calculated according to the following formula (5).
first duty ratio number = first node number / (first node number + second node number)    (5)
In step S704 of some embodiments, the data quantity of the first child node data having an indirect association relationship with the sample object in the object association data is obtained, to obtain the third node number. It will be appreciated that the first child node data may comprise data having a two-degree relationship with the sample object, or comprise data having a two-degree relationship with the sample object and data having a three-degree relationship with the sample object. The two-degree relationship indicates that the corresponding first child node data is associated with the sample object through one intermediate object; the three-degree relationship indicates that the corresponding first child node data is associated with the sample object through two intermediate objects. For example, the first child node data 504 shown in fig. 5 is in a three-degree relationship with the sample object. Correspondingly, the direct association relationship may be referred to as a one-degree relationship; for example, the first child node data 501 shown in fig. 5 is in a one-degree relationship with the sample object.
In step S705 of some embodiments, the data amount of the second child node data having an indirect association relationship with the sample object in the object association data is obtained, to obtain the fourth node number. It will be appreciated that the second child node data may comprise data having a two degree relationship with the sample object, or comprise data having a two degree relationship with the sample object and data having a three degree relationship with the sample object.
In step S706 of some embodiments, the second duty ratio number is calculated according to the third node number and the fourth node number. The second duty ratio number characterizes the proportion of the first child node data among the node data having an indirect association relationship with the sample object in the object association data; that is, the second duty ratio number is calculated according to the following formula (6).
second duty ratio number = third node number / (third node number + fourth node number)    (6)
It is understood that when the first child node data includes data having a two-degree relationship with the sample object and data having a three-degree relationship with the sample object, and the second child node data likewise includes data having a two-degree relationship and data having a three-degree relationship with the sample object, the second duty ratio number should be calculated separately from the data quantity corresponding to the two-degree relationship and the data quantity corresponding to the three-degree relationship. That is, the second duty ratio number comprises a proportion characterizing the positive example samples in the two-degree relationship and a proportion characterizing the positive example samples in the three-degree relationship.
In step S707 of some embodiments, the first duty ratio number and the second duty ratio number are spliced to construct the duty ratio feature S = {s1, s2}, where s1 represents the first duty ratio number and s2 represents the second duty ratio number.
In step S708 of some embodiments, the relationship features are updated according to the duty ratio feature, such that the relationship features include the duty ratio feature and the features output by the graph neural network model according to the sample relationship network data.
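Steps S701 to S707 can be sketched as follows. This is only an illustrative sketch: the function names, the dictionary layout and the convention of returning 0.0 when no node exists in a group are assumptions, not part of the embodiments. The labels and hop counts reproduce the fig. 5 example, in which other objects 1 and 4 are positive example samples and other objects 2 and 3 are negative example samples.

```python
def duty_ratio(positive_count, negative_count):
    """Formulas (5) and (6): positive count divided by the total count."""
    total = positive_count + negative_count
    # Assumption: an empty group yields 0.0 instead of a division error.
    return positive_count / total if total else 0.0


def duty_ratio_feature(labels, degrees):
    """labels: {node: 1 for positive, 0 for negative}; degrees: {node: hop count}."""
    first_node_number = sum(1 for n, d in degrees.items() if d == 1 and labels[n] == 1)
    second_node_number = sum(1 for n, d in degrees.items() if d == 1 and labels[n] == 0)
    third_node_number = sum(1 for n, d in degrees.items() if d >= 2 and labels[n] == 1)
    fourth_node_number = sum(1 for n, d in degrees.items() if d >= 2 and labels[n] == 0)

    s1 = duty_ratio(first_node_number, second_node_number)   # direct association
    s2 = duty_ratio(third_node_number, fourth_node_number)   # indirect association
    return [s1, s2]


# Hypothetical data mirroring fig. 5.
degrees = {"other1": 1, "other2": 1, "other3": 2, "other4": 3}
labels = {"other1": 1, "other2": 0, "other3": 0, "other4": 1}
feature = duty_ratio_feature(labels, degrees)
```

With these inputs one of the two directly associated node objects and one of the two indirectly associated node objects are positive, so both components of the duty ratio feature equal 0.5. Where the two-degree and three-degree relationships are to be counted separately, the same `duty_ratio` helper could be applied once per degree.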
In the embodiments of the present application, modeling uses the one-degree, two-degree and three-degree relationships of the sample object, so that the initial evaluation model can learn the correlation between the risk level categories of the objects having one-degree, two-degree and three-degree relationships with the sample object and the risk level category of the sample object. This avoids the inaccurate risk prediction that results, in the related art, from modeling only on the individual data of the object.
In step S104 of some embodiments, the information features, the behavior features, and the relationship features are sequentially subjected to feature stitching, so as to obtain multidimensional sample stitching features. Therefore, the sample stitching feature can characterize the association relationship between the sample object and other objects and the association relationship between the sample object and equipment used by other objects besides the individual information of the sample object.
Referring to fig. 8, in some embodiments, prior to step S105, the method provided by embodiments of the present application further includes, but is not limited to including, step S801 and step S802.
Step S801, performing up-sampling processing on the first stitching feature to obtain an extended feature;
Step S802, updating the sample stitching features according to the extended feature.
In step S801 of some embodiments, in practical applications the probability that an object is at risk is smaller than the probability that it is not, i.e. the number of positive example samples is smaller than the number of negative example samples. Therefore, in order to balance the sample distribution, the sample data corresponding to the positive example samples is up-sampled. Specifically, since the sample data includes numeric data and category data, an up-sampling algorithm is used to up-sample the first stitching feature, so as to obtain the extended feature. The first stitching feature is the target stitching feature obtained after the sample data of a positive example sample has undergone the processing steps described in the above embodiments. That is, the sample label includes a first positive label and a first negative label, and the target stitching feature includes a first stitching feature having the first positive label and a second stitching feature having the first negative label. It is understood that a positive example sample is a sample object with the first positive label.
In step S802 of some embodiments, the training set of the initial evaluation model includes a plurality of sample stitching features, and the sample stitching features are updated according to the extended feature to obtain a new training set, i.e., the new training set includes the plurality of sample stitching features and the extended feature.
In the embodiments of the present application, the up-sampling processing balances the numbers of positive example samples and negative example samples in the training set, which to a certain extent avoids the low risk prediction accuracy caused by an imbalanced sample distribution.
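Steps S801 and S802 can be sketched as follows. The embodiments do not name the up-sampling algorithm, so this sketch substitutes plain random duplication of positive rows, which works for both numeric and category values because no value is interpolated; it is not necessarily the algorithm intended by the embodiments. The function name, the seed and the toy data are hypothetical.

```python
import random


def upsample_positives(features, labels, seed=0):
    """Return duplicated positive rows so that positives match negatives in number."""
    rng = random.Random(seed)
    positives = [row for row, y in zip(features, labels) if y == 1]
    negative_count = sum(1 for y in labels if y == 0)
    deficit = negative_count - len(positives)
    if not positives or deficit <= 0:
        return []  # nothing to add: already balanced, or no positive row to copy
    # Random duplication with replacement; each copy is a new list object.
    return [list(rng.choice(positives)) for _ in range(deficit)]


# Hypothetical stitching features: one positive row and four negative rows.
features = [[0.1, 1.0], [0.4, 0.0], [0.3, 0.0], [0.9, 1.0], [0.2, 0.0]]
labels = [1, 0, 0, 0, 0]

extended = upsample_positives(features, labels)      # step S801
train_features = features + extended                 # step S802
train_labels = labels + [1] * len(extended)
```

After the update, the new training set contains the original sample stitching features together with the extended features, and the positive and negative rows are equal in number.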
In step S105 of some embodiments, preliminary risk prediction is performed on the sample stitching features through a preset initial evaluation model, so as to obtain an initial risk tag representing the risk level category of the sample object.
In step S106 of some embodiments, the loss value of the initial risk tag and the sample tag is calculated according to a preset loss function, and the parameter adjustment is performed on the initial evaluation model according to the calculated loss value, so as to obtain a target evaluation model with more accurate risk prediction.
Referring to fig. 9, in some embodiments, step S106 includes, but is not limited to including, step S901 and step S902.
Step S901, calculating to obtain an evaluation value according to an initial risk tag and a preset model evaluation function;
And step S902, carrying out parameter adjustment on the initial evaluation model according to the evaluation value, the initial risk label and the sample label to obtain a target evaluation model.
In step S901 of some embodiments, the risk differentiating capability of the initial evaluation model is evaluated according to the initial risk tag and a preset model evaluation function, so as to obtain a corresponding evaluation value. Wherein, the model evaluation function may be a KS (Kolmogorov-Smirnov) function or other model evaluation functions, which is not specifically limited in this embodiment of the present application.
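When the KS function is used as the model evaluation function, the evaluation value can be computed as in the following sketch. It assumes that the initial evaluation model outputs a risk score between 0 and 1 for each sample object and that the sample label is 1 for a positive example sample and 0 for a negative example sample; the function name and the toy inputs are hypothetical.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative positive rate and the cumulative
    negative rate as the score threshold sweeps from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative samples are required")

    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with identical scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best


separable = ks_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
mixed = ks_statistic([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A value close to 1 indicates that the scores separate positive and negative example samples well, while a value close to 0 indicates little risk distinguishing capability.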
In step S902 of some embodiments, a loss value is calculated from the initial risk tag and the sample label according to a preset loss function, and the parameters of the initial evaluation model are adjusted according to the loss value and the evaluation value, so as to improve the risk distinguishing capability of the initial evaluation model and obtain the target evaluation model.
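One possible reading of step S902 is sketched below. The embodiments do not specify the model type, the loss function or how the evaluation value enters the adjustment, so everything here is an assumption made for illustration: a logistic model stands in for the initial evaluation model, binary cross-entropy is the loss, gradient descent adjusts the parameters, and the parameters are kept whenever the KS value does not get worse.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, rows):
    return [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in rows]


def bce_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy between predicted scores and sample labels."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def ks_value(probs, labels):
    """KS value: largest gap between positive and negative rates over all thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in set(probs):
        tpr = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1) / n_pos
        fpr = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


def train(rows, labels, epochs=200, lr=0.5):
    weights, bias = [0.0] * len(rows[0]), 0.0
    best_weights, best_bias = list(weights), bias
    best_ks = ks_value(predict(weights, bias, rows), labels)
    for _ in range(epochs):
        probs = predict(weights, bias, rows)
        errors = [p - y for p, y in zip(probs, labels)]  # gradient of the loss w.r.t. the logit
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * row[j] for e, row in zip(errors, rows)) / len(rows)
        bias -= lr * sum(errors) / len(rows)
        ks = ks_value(predict(weights, bias, rows), labels)
        if ks >= best_ks:  # assumption: keep parameters whose KS value is not worse
            best_ks, best_weights, best_bias = ks, list(weights), bias
    return best_weights, best_bias, best_ks


# Hypothetical sample stitching features and sample labels.
rows = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]
weights, bias, best_ks = train(rows, labels)
final_loss = bce_loss(predict(weights, bias, rows), labels)
```

Other combinations are equally conceivable, for example using the evaluation value as a stopping criterion or as a weight in the loss; the sketch only shows that the loss value and the evaluation value can both take part in the parameter adjustment.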
In step S107 of some embodiments, target data of a target object to be evaluated is acquired, and risk prediction is performed on the target data through a target evaluation model, so as to obtain a target risk tag for representing a risk class of the target object. The data type of the target data is the same as the data type of the sample data, namely the target data also comprises target behavior data, target relation network data and target basic information data. The data content specifically included in the target behavior data, the target relationship network data, and the target basic information data may refer to sample data, which is not described in detail in this embodiment of the present application.
In the object risk prediction method of the embodiments of the present application, the sample stitching features are input into the preset initial evaluation model for risk prediction to obtain an initial risk tag representing the risk level category of the sample object, and the initial evaluation model is trained with the initial risk tag and the sample label to obtain the target evaluation model. The sample stitching features include not only the information features and behavior features that represent the individual data of the sample object, but also the relationship features that represent the relationship data between the sample object and other objects. The trained target evaluation model can therefore learn how both the relationship data and the individual data of the sample object correlate with the risk level category. This avoids the related-art practice of modeling only on the individual data of the sample object, and improves accuracy when the target evaluation model predicts the risk level category of a target object. In addition, constructing the duty ratio feature separately improves the interpretability of the target evaluation model, so that the business side can adjust the corresponding approval process accordingly.
Referring to fig. 10, an embodiment of the present application further provides an object risk prediction apparatus, which may implement the object risk prediction method, where the apparatus includes:
a sample data obtaining module 1001, configured to obtain sample data; the sample data is annotation data with a sample label, the sample label is used for representing a risk level class of the sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relation network data of the sample object;
the encoding module 1002 is configured to encode the first basic information data to obtain information features, and encode the sample behavior data to obtain behavior features;
the feature extraction module 1003 is configured to input the sample relational network data to a preset graph neural network model to perform feature extraction, so as to obtain relational features;
the feature stitching module 1004 is configured to perform feature stitching according to the information feature, the behavior feature, and the relationship feature, to obtain a sample stitching feature;
the first risk assessment module 1005 is configured to input the sample splicing feature to a preset initial assessment model for risk assessment, so as to obtain an initial risk tag;
the parameter adjustment module 1006 is configured to perform parameter adjustment on the initial evaluation model according to the initial risk tag and the sample tag, so as to obtain a target evaluation model;
The second risk assessment module 1007 is configured to input the obtained target data to a target assessment model for risk prediction, so as to obtain a target risk tag; the target data comprise target basic information data of a target object, target behavior data of the target object and target relation network data of the target object, and the target risk tag is used for representing a risk level class of the target object.
The specific implementation of the object risk prediction apparatus is substantially the same as the specific embodiment of the object risk prediction method described above, and will not be described herein.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the object risk prediction method when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 11, fig. 11 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 1101 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 1102 may be implemented in the form of read-only memory (Read Only Memory, ROM), static storage, dynamic storage, or random access memory (Random Access Memory, RAM). The memory 1102 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 1102 and invoked by the processor 1101 to execute the object risk prediction method of the embodiments of the present application;
an input/output interface 1103 for implementing information input and output;
the communication interface 1104 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
bus 1105 transmits information between the various components of the device (e.g., processor 1101, memory 1102, input/output interface 1103, and communication interface 1104);
wherein the processor 1101, memory 1102, input/output interface 1103 and communication interface 1104 enable communication connection therebetween within the device via bus 1105.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the object risk prediction method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit the technical solutions provided by the embodiments of the present application. Those skilled in the art will appreciate that, as technology evolves and new application scenarios appear, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, and both A and B, wherein A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b and c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and this description does not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A method of predicting risk of an object, the method comprising:
Acquiring sample data; the sample data is annotation data with a sample label, wherein the sample label is used for representing a risk level category of a sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relation network data of the sample object;
coding the first basic information data to obtain information characteristics, and coding the sample behavior data to obtain behavior characteristics;
inputting the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features;
performing feature stitching on the information features, the behavior features and the relation features to obtain sample stitching features;
inputting the sample splicing characteristics into a preset initial evaluation model to perform risk prediction to obtain an initial risk tag;
parameter adjustment is carried out on the initial evaluation model according to the initial risk label and the sample label, and a target evaluation model is obtained;
inputting the obtained target data into the target evaluation model for risk prediction to obtain a target risk tag; the target data comprises target basic information data of a target object, target behavior data of the target object and target relation network data of the target object, and the target risk tag is used for representing a risk class of the target object.
2. The method of claim 1, wherein the sample tag comprises a first positive tag and a first negative tag, the first base information data comprising a first sub-data having the first positive tag and a second sub-data having the first negative tag;
the encoding processing of the first basic information data to obtain information features includes:
discretizing the first basic information data to obtain a plurality of initial sample information intervals;
acquiring the data quantity of the first sub-data contained in one of the initial sample information intervals to obtain a first sub-quantity;
acquiring the data quantity of the second sub-data contained in one of the initial sample information intervals to obtain a second sub-quantity;
acquiring the data quantity of the first sub-data contained in all the initial sample information intervals to obtain a first total quantity;
acquiring the data quantity of the second sub-data contained in all the initial sample information intervals to obtain a second total quantity;
calculating to obtain initial information quantity of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the first total quantity and the second total quantity;
And calculating to obtain total information quantity of all the initial sample information intervals according to the initial information quantity, and if the total information quantity is in a first preset range, carrying out coding processing on the initial sample information intervals to obtain the information characteristics.
3. The method of claim 1, wherein the sample tag comprises a first positive tag and a first negative tag, the first base information data comprising a first sub-data having the first positive tag and a second sub-data having the first negative tag;
the encoding processing of the first basic information data to obtain information features includes:
discretizing the first basic information data to obtain an initial sample information interval;
acquiring the data quantity of first sub-data contained in the initial sample information interval to obtain a first sub-quantity;
acquiring the data quantity of second sub-data contained in the initial sample information interval to obtain a second sub-quantity;
acquiring verification data according to the acquisition time of the sample data and a preset sampling interval; the verification data are labeling data with a verification tag, the verification tag is used for representing a risk level category of the sample object, and the verification data comprise second basic information data of the sample object; the verification tag comprises a second positive tag and a second negative tag, and the second basic information data comprises third sub-data with the second positive tag and fourth sub-data with the second negative tag;
Discretizing the second basic information data to obtain a verification information interval;
acquiring the data quantity of the third sub-data contained in the verification information interval to obtain a third sub-quantity;
acquiring the data quantity of the fourth sub-data contained in the verification information interval to obtain a fourth sub-quantity;
calculating a stable value of the initial sample information interval according to the first sub-quantity, the second sub-quantity, the third sub-quantity and the fourth sub-quantity;
and if the stable value is in a second preset range, carrying out coding processing on the initial sample information interval to obtain the information characteristic.
4. The method of claim 1, wherein the sample relational network data comprises object association data and initial device common data, and wherein the relational features comprise object relational features and device relational features;
the inputting the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features comprises:
inputting the object association data into the graph neural network model for feature extraction to obtain the object relationship features;
data screening is carried out on the initial equipment shared data to obtain target equipment shared data;
And inputting the shared data of the target equipment to the graph neural network model for feature extraction to obtain the equipment relationship features.
5. The method of claim 4, wherein the object association data comprises node data having a node tag and relationship data, the node tag being used to characterize a risk level category of a node object, the relationship data being used to represent an association of the node object with the sample object, the node tag comprising a third positive tag and a third negative tag, the node data comprising first child node data having the third positive tag and second child node data having the third negative tag;
before the characteristic splicing is carried out according to the information characteristic, the behavior characteristic and the relation characteristic to obtain a sample splicing characteristic, the method further comprises the following steps:
acquiring the data quantity of the first sub-node data of which the relationship data represent direct association relationship, and obtaining the first node quantity;
acquiring the data quantity of second sub-node data of which the relationship data represents a direct association relationship, and obtaining the number of second nodes;
calculating to obtain a first duty ratio number according to the first node number and the second node number;
Acquiring the data quantity of the first sub-node data of which the relationship data represents the indirect association relationship, and obtaining the third node quantity;
acquiring the data quantity of the second sub-node data of which the relationship data represents the indirect association relationship, and obtaining the fourth node quantity;
calculating to obtain a second duty ratio number according to the third node number and the fourth node number;
constructing a duty ratio feature according to the first duty ratio quantity and the second duty ratio quantity;
and updating the relation characteristic according to the occupancy bit.
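The counting steps of claim 5 can be sketched as below. The claim does not give the formula for the proportions or say how the relational features are updated, so two assumptions are made: each proportion is taken as positive / (positive + negative), and the update is a simple append. The names `proportion`, `proportion_feature` and `update_relation_feature` are hypothetical.

```python
from collections import Counter


def proportion(positive_count, negative_count):
    """Share of positive-labelled nodes; 0.0 when there are no nodes at all.

    The exact formula is an assumption; the claim only says the proportion
    is calculated from the two node quantities.
    """
    total = positive_count + negative_count
    return positive_count / total if total else 0.0


def proportion_feature(neighbours):
    """Build [first proportion, second proportion] for one sample object.

    `neighbours` is an iterable of (label, relation) pairs, where label is
    "pos" or "neg" and relation is "direct" or "indirect".
    """
    counts = Counter(neighbours)
    first = proportion(counts[("pos", "direct")], counts[("neg", "direct")])
    second = proportion(counts[("pos", "indirect")], counts[("neg", "indirect")])
    return [first, second]


def update_relation_feature(relation_feature, neighbours):
    """Append the proportion feature to the relational feature (assumed update)."""
    return list(relation_feature) + proportion_feature(neighbours)
```

For a sample object with one positive and three negative direct neighbours, and one positive and one negative indirect neighbour, this yields a proportion feature of [0.25, 0.5].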
6. The method of any one of claims 1 to 5, wherein the sample label comprises a first positive label and a first negative label, and the target splicing feature comprises a first splicing feature having the first positive label and a second splicing feature having the first negative label;
before the sample splicing features are input into a preset initial evaluation model for risk prediction to obtain an initial risk label, the method further comprises:
performing upsampling on the first splicing feature to obtain an extended feature;
and updating the sample splicing features according to the extended feature.
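As a rough sketch of the upsampling step in claim 6, the positive-labelled splicing features can be duplicated at random until the classes are balanced. The claim does not name a specific upsampling technique, so random oversampling with replacement is an assumption here, as are the `target_ratio` and `seed` parameters and the use of 1 and 0 for the first positive and first negative labels.

```python
import random


def oversample_positive(features, labels, target_ratio=1.0, seed=0):
    """Randomly duplicate positive samples until there are about
    target_ratio * (number of negatives) positives.

    Returns the updated feature list and label list; the originals are
    not modified.
    """
    rng = random.Random(seed)
    positives = [f for f, y in zip(features, labels) if y == 1]
    negative_count = sum(1 for y in labels if y == 0)
    wanted = int(negative_count * target_ratio)
    extra_count = max(0, wanted - len(positives))
    # Extended features: copies of randomly chosen positive samples.
    extended = ([list(rng.choice(positives)) for _ in range(extra_count)]
                if positives else [])
    return features + extended, labels + [1] * len(extended)
```

Random duplication is the simplest option; an interpolation-based method could be substituted without changing the surrounding steps.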
7. The method of claim 6, wherein the performing parameter adjustment on the initial evaluation model according to the initial risk label and the sample label to obtain a target evaluation model comprises:
calculating an evaluation value according to the initial risk label and a preset model evaluation function;
and performing parameter adjustment on the initial evaluation model according to the evaluation value, the initial risk label and the sample label to obtain the target evaluation model.
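One possible reading of claim 7 is sketched below. The claim names neither the model evaluation function nor the parameters being adjusted, so this example assumes an F1 score as the evaluation function and a single decision threshold as the adjustable parameter. Both choices, and the candidate threshold values, are hypothetical.

```python
def f1_score(predicted, actual):
    """F1 score for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, sample_labels, candidates=(0.3, 0.5, 0.7)):
    """Pick the threshold whose initial risk labels give the best evaluation value."""
    best_threshold, best_value = None, -1.0
    for threshold in candidates:
        # Initial risk labels produced by the model at this threshold.
        initial_labels = [1 if s >= threshold else 0 for s in scores]
        value = f1_score(initial_labels, sample_labels)
        if value > best_value:
            best_threshold, best_value = threshold, value
    return best_threshold, best_value
```

An evaluation value such as F1 is less sensitive to class imbalance than plain accuracy, which fits a setting where high-risk samples are rare enough to need the upsampling of claim 6.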
8. An object risk prediction apparatus, the apparatus comprising:
a sample data acquisition module, configured to acquire sample data, wherein the sample data is annotation data with a sample label, the sample label is used for representing a risk level category of a sample object, and the sample data comprises first basic information data of the sample object, sample behavior data of the sample object and sample relational network data of the sample object;
a coding module, configured to encode the first basic information data to obtain information features and to encode the sample behavior data to obtain behavior features;
a feature extraction module, configured to input the sample relational network data into a preset graph neural network model for feature extraction to obtain relational features;
a feature splicing module, configured to perform feature splicing according to the information features, the behavior features and the relational features to obtain sample splicing features;
a first risk assessment module, configured to input the sample splicing features into a preset initial evaluation model for risk prediction to obtain an initial risk label;
a parameter adjustment module, configured to perform parameter adjustment on the initial evaluation model according to the initial risk label and the sample label to obtain a target evaluation model;
a second risk assessment module, configured to input acquired target data into the target evaluation model for risk prediction to obtain a target risk label, wherein the target data comprises target basic information data of a target object, target behavior data of the target object and target relational network data of the target object, and the target risk label is used for representing a risk level category of the target object.
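The coding and feature splicing modules of claim 8 can be illustrated with a minimal sketch. The claim does not say which encodings are used, so the one-hot encoding of a categorical information field, the event-frequency encoding of behavior data, and plain concatenation as the splicing operation are all assumptions; the function names are hypothetical.

```python
from collections import Counter


def one_hot(value, vocabulary):
    """Encode one categorical field of the basic information data."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_behaviour(events, event_types):
    """Encode behavior data as the relative frequency of each event type."""
    counts = Counter(events)
    total = len(events) or 1  # avoid division by zero for empty histories
    return [counts[t] / total for t in event_types]


def splice(info_feature, behaviour_feature, relation_feature):
    """Feature splicing, taken here to mean simple concatenation."""
    return [*info_feature, *behaviour_feature, *relation_feature]
```

The spliced vector would then be passed to the evaluation model, first during training and later, with the target object's data, during prediction.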
9. An electronic device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202310440622.5A 2023-04-12 2023-04-12 Object risk prediction method and device, electronic equipment and storage medium Pending CN116523622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310440622.5A CN116523622A (en) 2023-04-12 2023-04-12 Object risk prediction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310440622.5A CN116523622A (en) 2023-04-12 2023-04-12 Object risk prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116523622A (en) 2023-08-01

Family

ID=87395343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310440622.5A Pending CN116523622A (en) 2023-04-12 2023-04-12 Object risk prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116523622A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117575308A (en) * 2023-10-17 2024-02-20 中科宏一教育科技集团有限公司 Risk assessment method, device and equipment for distributed power distribution network and storage medium

Similar Documents

Publication Publication Date Title
CN112148987B (en) Message pushing method based on target object activity and related equipment
CN111698247B (en) Abnormal account detection method, device, equipment and storage medium
WO2018103718A1 (en) Application recommendation method and apparatus, and server
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN110516910A (en) Declaration form core based on big data protects model training method and core protects methods of risk assessment
CN110515986B (en) Processing method and device of social network diagram and storage medium
CN115081641A (en) Model training method, estimation result prediction method, device and storage medium
CN110852785B (en) User grading method, device and computer readable storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN115545331A (en) Control strategy prediction method and device, equipment and storage medium
CN113569162A (en) Data processing method, device, equipment and storage medium
CN116523622A (en) Object risk prediction method and device, electronic equipment and storage medium
CN117155771B (en) Equipment cluster fault tracing method and device based on industrial Internet of things
CN113807728A (en) Performance assessment method, device, equipment and storage medium based on neural network
CN113468421A (en) Product recommendation method, device, equipment and medium based on vector matching technology
CN112990583A (en) Method and equipment for determining mold entering characteristics of data prediction model
CN117435999A (en) Risk assessment method, apparatus, device and medium
CN116703526A (en) Article recommendation method, device, equipment and storage medium
CN111950623A (en) Data stability monitoring method and device, computer equipment and medium
CN116741396A (en) Article classification method and device, electronic equipment and storage medium
CN117009621A (en) Information searching method, device, electronic equipment, storage medium and program product
CN115392361A (en) Intelligent sorting method and device, computer equipment and storage medium
CN115099344A (en) Model training method and device, user portrait generation method and device, and equipment
CN114925275A (en) Product recommendation method and device, computer equipment and storage medium
US9317125B2 (en) Searching of line pattern representations using gestures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination