WO2022142001A1 - Target object evaluation method based on multi-scorecard fusion and related device - Google Patents

Target object evaluation method based on multi-scorecard fusion and related device

Info

Publication number
WO2022142001A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature variable
target
feature
model
variable set
Application number
PCT/CN2021/090154
Inventor
张巧丽
林荣吉
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022142001A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a target object evaluation method, device, computer equipment and storage medium based on multi-scorecard fusion.
  • Scorecard models are widely used in risk assessment and control in the financial industry, such as pre-loan application scorecards, loan behavior scorecards, and post-loan collection scorecard models.
  • A scorecard model takes feature variables as model input.
  • The inventors realized that, on the one hand, feature variables need to be screened when they are obtained in order to reduce their information redundancy. On the other hand, feature variables with high correlation are partly eliminated during model fitting, which loses some of the diversity of the features and thereby reduces the accuracy of the scorecard model and of the evaluation.
  • In addition, the distribution of the feature variables input to the scorecard model may shift. Once correlated feature variables have been eliminated, the information carried by a shifted feature variable is lost and cannot be compensated for by other feature variables, which increases the risk of the scorecard model and reduces its stability.
  • The purpose of the embodiments of the present application is to propose a target object evaluation method, device, computer equipment and storage medium based on multi-scorecard fusion, so as to solve two problems in the prior art: the reduced accuracy of the scorecard model caused by the loss of feature diversity when feature variables are eliminated, and the reduced stability of the scorecard model caused by the long time span of the target variable.
  • The embodiments of the present application provide a target object evaluation method based on multi-scorecard fusion, which adopts the following technical solution:
  • a target object evaluation method based on multi-scorecard fusion, comprising the following steps:
  • The embodiments of the present application also provide a target object evaluation device based on multi-scorecard fusion, which adopts the following technical solution:
  • a target object evaluation device based on multi-scorecard fusion, comprising:
  • a feature acquisition module configured to acquire historical data of the target object, perform data cleaning according to the distribution state of the feature variables included in the historical data, and perform feature variable screening on the cleaned historical data to obtain an original feature variable set;
  • a grouping module configured to perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
  • a model building module configured to construct a plurality of target LR models based on the target feature variable sets and to generate the model accuracy value of each target LR model;
  • a model fusion module configured to generate a plurality of scorecard models based on the target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models, to perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, and to output the target score value of the target object based on the target scorecard model.
  • The embodiments of the present application also provide a computer device, which adopts the following technical solution:
  • a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
  • The embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solution:
  • a computer-readable storage medium where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor is caused to perform the following steps:
  • The target object evaluation method, device, computer equipment and storage medium based on multi-scorecard fusion mainly have the following beneficial effects:
  • Multiple mutually exclusive target feature variable sets are obtained by grouping, multiple LR models are then constructed from the target feature variable sets, and multiple scorecard models are obtained from them. The target scorecard model is obtained by fusing these scorecard models, and the target score value of the target object is obtained from it. This reduces the model risk caused by shifts in the feature variables and improves the stability of the model, while avoiding the loss of information diversity caused by excessive de-duplication of the feature variables, which preserves the accuracy of the model and improves the accuracy of the target object evaluation.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a target object evaluation method based on multi-scorecard fusion according to the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of a target object evaluation device based on multi-scorecard fusion according to the present application;
  • FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • The system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105.
  • The network 104 is the medium that provides a communication link between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • The user can use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients and social platform software.
  • The terminal devices 101, 102 and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
  • The server 105 may be a server that provides various services, such as a background server that supports the pages displayed on the terminal devices 101, 102 and 103.
  • the target object evaluation method based on multi-scorecard fusion provided in the embodiment of the present application is generally executed by the server, and accordingly, the target object evaluation device based on multi-scorecard fusion is generally set in the server.
  • The numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 shows a flow chart of an embodiment of the target object evaluation method based on multi-scorecard fusion according to the present application.
  • the described target object evaluation method based on multi-scorecard fusion includes the following steps:
  • S204: generate a plurality of scorecard models based on the target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model; and output the target score value of the target object based on the target scorecard model.
  • The target object in this embodiment is mainly the transaction subject in a transaction activity scenario. The transaction object in such a scenario is not limited to physical products; it may also include financial products, experience and knowledge, and labor. For example, the recruitment of insurance agents in the insurance industry can be regarded as a labor transaction in which insurance agents sell labor as transaction subjects. The target object can therefore be a person or an enterprise.
  • The historical data can include data of different dimensions, such as attribute information and behavior information. For an insurance agent, the attribute information includes the basic information of the insurance agent, and the behavior information includes pre-job performance during agent recruitment, activity on the insurance agent platform, historical policy purchase information, and so on. Based on this information, feature variables of multiple dimensions related to the target object can be extracted.
  • For the acquired historical data, exploratory data analysis (EDA) should be carried out on each feature variable it contains, analyzing the data distribution characteristics (i.e., the distribution state) of the feature variable, including but not limited to data saturation, the presence of outliers, the maximum, minimum and mean values, and the distribution type. Data cleaning is then performed according to these characteristics to handle dirty data, missing values, outliers and the like in the acquired historical data. For example, when dealing with missing values, feature variables whose missing rate exceeds a preset threshold can be deleted (the threshold is set according to the situation and can be 50%, 70%, 90%, etc.).
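The missing-value rule described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `drop_sparse_features`, the row representation (a list of dicts with `None` marking a missing value) and the sample data are all assumptions made for the example.

```python
def drop_sparse_features(rows, max_missing_rate=0.5):
    """Return the names of the features whose missing rate does not exceed the threshold.

    rows: list of dicts mapping feature name -> value; None (or an absent key)
    marks a missing value. This representation is an assumption of the sketch.
    """
    if not rows:
        return []
    names = sorted({name for row in rows for name in row})
    kept = []
    for name in names:
        missing = sum(1 for row in rows if row.get(name) is None)
        # A feature is deleted only when its missing rate exceeds the threshold.
        if missing / len(rows) <= max_missing_rate:
            kept.append(name)
    return kept


# Hypothetical sample: "tenure" is missing in 3 of 4 rows and is dropped at a 50% threshold.
rows = [
    {"age": 31, "tenure": None, "policies": 2},
    {"age": 45, "tenure": None, "policies": None},
    {"age": None, "tenure": None, "policies": 1},
    {"age": 28, "tenure": 3, "policies": 4},
]
kept = drop_sparse_features(rows, max_missing_rate=0.5)
```

Raising `max_missing_rate` to 0.7 or 0.9, as the text allows, makes the filter more permissive.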
  • When feature variable screening is performed, feature variables with unstable distribution, poor predictive ability or unstable predictive ability are removed based on the PSI (Population Stability Index) value and the IV (Information Value) value of each feature variable, yielding the screened original feature variable set. For convenience of later description, the original feature variable set is denoted C_0.
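The text names PSI and IV as the screening criteria but does not give their formulas. The sketch below uses the commonly used definitions of both quantities, which is an assumption about what the application intends; the function names, the `eps` floor that guards against empty bins, and the example numbers are likewise illustrative.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected, actual: per-bin proportions (each list sums to 1) for a
    reference sample and a comparison sample of the same feature.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def information_value(good_counts, bad_counts, eps=1e-6):
    """Information Value of a binned feature from per-bin good/bad sample counts."""
    good_total, bad_total = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        good_share = max(g / good_total, eps)
        bad_share = max(b / bad_total, eps)
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


# Identical distributions give PSI = 0; a shifted distribution gives PSI > 0.
stable = psi([0.25, 0.25, 0.5], [0.25, 0.25, 0.5])
shifted = psi([0.25, 0.25, 0.5], [0.5, 0.25, 0.25])

# A feature whose bins separate good and bad samples has a clearly positive IV.
iv = information_value(good_counts=[80, 20], bad_counts=[20, 80])
```

A screening step would then keep a feature only if its PSI is below, and its IV above, thresholds chosen by the practitioner; the text does not state which thresholds are used.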
  • In this embodiment, the grouping operation on the original feature variable set may consist of inputting the training sample set corresponding to the original feature variable set into a preset LightGBM model for training and grouping the feature variables according to the information gain that the model outputs for each of them. This may include:
  • the first step is to determine the number of groups according to the number of feature variables in the original feature variable set;
  • the second step is to input the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, output and sort the information gain value of each feature variable during model training, and, based on the sorting result, screen out several feature variables from the original feature variable set to obtain a target feature variable set;
  • the third step is to generate a new original feature variable set from the feature variables remaining after screening, input the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, output and sort the information gain value of each feature variable during model training, and, based on the sorting result, screen out several feature variables from the new original feature variable set to obtain another target feature variable set;
  • the previous step is repeated until the number of target feature variable sets equals the number of groups, which completes the grouping operation.
  • Let the number of groups determined from the number of feature variables in the original feature variable set be N, where N is a positive integer. Because every target feature variable set after the first is obtained from the feature variables remaining after the previous screening, no feature variable appears in more than one of the N target feature variable sets, so the above steps yield N mutually exclusive target feature variable sets. In this embodiment, N can be 2 or 3.
  • Screening out several feature variables from the original feature variable set based on the sorting result includes: selecting the top-ranked feature variables by information gain value, such that the ratio of the sum of the information gain values of the selected feature variables to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold.
  • The preset gain threshold can be 90% or above.
  • The same method is used to select several feature variables from the new original feature variable set based on the sorting result, with the same preset gain threshold and the same screening process as for the original feature variable set.
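The iterative grouping described above can be sketched as below. To keep the example self-contained, model training is abstracted behind a caller-supplied `gain_fn` that returns an information gain per feature; in the application this role is played by training the preset LightGBM model. The names `select_by_cumulative_gain`, `group_features`, `gain_fn` and the toy gain table are hypothetical.

```python
def select_by_cumulative_gain(gains, gain_threshold=0.9):
    """Pick top-ranked features until their share of the total gain exceeds the threshold."""
    ranked = sorted(gains, key=gains.get, reverse=True)
    total = sum(gains.values())
    selected, running = [], 0.0
    for name in ranked:
        selected.append(name)
        running += gains[name]
        if total > 0 and running / total > gain_threshold:
            break
    return selected


def group_features(features, gain_fn, n_groups, gain_threshold=0.9):
    """Split features into up to n_groups mutually exclusive sets.

    gain_fn(feature_names) -> {name: information gain}; it stands in for
    retraining the model on the remaining features in each round.
    """
    remaining = list(features)
    groups = []
    for _ in range(n_groups):
        if not remaining:
            break
        gains = gain_fn(remaining)
        chosen = select_by_cumulative_gain(gains, gain_threshold)
        groups.append(chosen)
        # Later groups are built only from what is left, so groups never overlap.
        remaining = [f for f in remaining if f not in chosen]
    return groups


# Toy stand-in for model training: a fixed gain table (real gains would change per round).
toy_gain = {"x1": 50, "x2": 30, "x3": 15, "x4": 3, "x5": 2}
groups = group_features(
    list(toy_gain), lambda names: {n: toy_gain[n] for n in names}, n_groups=2
)
```

With the toy table, the first round selects x1, x2 and x3 (95% of the total gain) and the second round selects x4 and x5 from the remainder.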
  • N mutually independent LR models are constructed from the N target feature variable sets, and after the LR models are generated, the model accuracy value of each LR model is obtained on the prediction sample set.
  • Constructing a target LR model according to a target feature variable set includes:
  • binning the multiple sample values of each feature variable in the target feature variable set, calculating the WOE (weight of evidence) value of each bin, encoding each bin of each feature variable in the target feature variable set with its WOE value, training the original LR model on the encoding result, and judging whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
  • Binning may treat each sample value as its own bin, or may divide the multiple sample values into several bins by equal-frequency division.
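A minimal sketch of equal-frequency binning followed by WOE encoding is given below. The text does not state the WOE sign convention; this sketch assumes WOE = ln(share of label-1 samples / share of label-0 samples) per bin, under which an LR model fitted to label 1 tends to have positive weights. The function names and the sample data are illustrative.

```python
import math


def equal_frequency_edges(values, n_bins):
    """Return n_bins - 1 cut points so that each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def assign_bin(value, edges):
    """Index of the bin containing value; a value equal to an edge falls in the upper bin."""
    return sum(1 for edge in edges if value >= edge)


def woe_per_bin(values, labels, edges, eps=1e-6):
    """WOE of each bin, assuming WOE = ln(share of label 1 / share of label 0)."""
    n_bins = len(edges) + 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for value, label in zip(values, labels):
        idx = assign_bin(value, edges)
        if label == 1:
            pos[idx] += 1
        else:
            neg[idx] += 1
    pos_total, neg_total = sum(pos), sum(neg)
    woe = []
    for p, n in zip(pos, neg):
        pos_share = max(p / pos_total, eps)  # floor avoids log(0) for one-sided bins
        neg_share = max(n / neg_total, eps)
        woe.append(math.log(pos_share / neg_share))
    return woe


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
edges = equal_frequency_edges(values, n_bins=2)
woe = woe_per_bin(values, labels, edges)
# Replace every raw value by the WOE of its bin; this is the input to LR training.
encoded = [woe[assign_bin(v, edges)] for v in values]
```

Treating each sample value as its own bin corresponds to using every distinct value as a cut point instead of calling `equal_frequency_edges`.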
  • The method further includes: when any feature variable in the target feature variable set has a negative weight coefficient in the trained LR model, performing secondary screening of feature variables on the target feature variable set.
  • The secondary screening of feature variables on the target feature variable set includes:
  • Let C_1, C_2, ..., C_N denote the N target feature variable sets, and take C_1 as an example. Suppose C_1 contains 10 feature variables, x_1, x_2, ..., x_10. Binning and encoding are first performed on x_1, x_2, ..., x_10, and the LR model is trained on the encoding result. When any of x_1, x_2, ..., x_10 has a negative weight coefficient, x_1, x_2, ..., x_10 are arranged in descending order of information gain and a basic feature variable set is obtained, here x_2, x_4, x_8, x_10. The remaining 6 feature variables, in descending order of information gain, are x_3, x_1, x_5, x_7, x_9, x_6.
  • x_3 is first added to the basic feature variable set for LR model training, and the signs of the weight coefficients of x_2, x_4, x_8, x_10 and x_3 are checked. If any of them is negative, x_3 is eliminated; otherwise it is retained.
  • When x_3 is eliminated, x_1 is added to the basic feature variable set for LR model training, and the signs of the weight coefficients of x_2, x_4, x_8, x_10 and x_1 are checked. When x_3 is retained, x_1 is added to the basic feature variable set for LR model training, and the signs of the weight coefficients of x_2, x_4, x_8, x_10, x_3 and x_1 are checked, and so on for the remaining feature variables.
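The secondary screening walk-through above amounts to a forward selection in which a candidate is kept only if every weight stays positive after retraining. The sketch below abstracts LR training behind a caller-supplied `fit_weights` callable; that callable, the function name `secondary_screening` and the toy weights are assumptions made for illustration.

```python
def secondary_screening(base_features, candidates, fit_weights):
    """Add candidates one by one, keeping each only if all LR weights remain positive.

    base_features: the basic feature variable set.
    candidates: remaining features, in descending order of information gain.
    fit_weights(names) -> {name: weight}; stands in for training an LR model
    on the WOE-encoded features listed in names.
    """
    kept = list(base_features)
    for candidate in candidates:
        trial = kept + [candidate]
        weights = fit_weights(trial)
        if all(weights[name] > 0 for name in trial):
            kept = trial  # candidate retained
        # otherwise the candidate is eliminated and kept is left unchanged
    return kept


def toy_fit_weights(names):
    # Stand-in for LR training: x3 and x5 always receive a negative weight.
    return {name: (-0.2 if name in {"x3", "x5"} else 0.4) for name in names}


screened = secondary_screening(
    base_features=["x2", "x4", "x8", "x10"],
    candidates=["x3", "x1", "x5", "x7", "x9", "x6"],
    fit_weights=toy_fit_weights,
)
```

In this toy run x3 and x5 are eliminated and the other four candidates are retained in order.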
  • The LR model M_1 is generated based on the target feature variable set C_1, and the prediction sample set is then input into M_1 to obtain its model accuracy AUC value, which is recorded as AUC_1. The foregoing operations are repeated for C_2, ..., C_N.
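For reference, the AUC of a model on a prediction sample set can be computed directly from its definition: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher predicted score, with ties counted as one half. The sketch below is a plain pairwise implementation for small samples; the function name and the data are illustrative, and a production system would more likely rely on a library routine.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Hypothetical predictions of one LR model on a six-sample prediction set.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc_1 = auc(labels, scores)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so `auc_1` is 8/9.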
  • For step S204, in this embodiment the scorecard model is obtained from the LR model on the basis of the following formula:
  • A and B are coefficients.
  • Each feature variable can be expressed in the form (θ_i ω_ij) δ_ij, and the scorecard model obtained according to the LR model is as follows:
  • N scorecard models are generated, namely S_1, S_2, ..., S_N, and p_0, S_0 and SD are consistent across the scorecards.
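The formulas referred to above are not reproduced in this text, so the sketch below falls back on the conventional scorecard scaling, score = A - B * ln(odds), which may differ from the application's own formula. It further assumes that p_0 is a reference probability, S_0 the score assigned at that probability, and SD the number of points by which the score drops when the odds double; these readings of the symbols, like all function names and numbers here, are assumptions.

```python
import math


def scaling_coefficients(p0, s0, sd):
    """Solve for A and B in score = A - B * ln(odds) from the assumed anchors."""
    odds0 = p0 / (1.0 - p0)
    b = sd / math.log(2.0)          # doubling the odds lowers the score by sd
    a = s0 + b * math.log(odds0)    # the score equals s0 at the reference odds
    return a, b


def scorecard_points(intercept, weights, woe_by_feature, a, b):
    """Convert an LR model on WOE-encoded features into a base score and per-bin points."""
    base = a - b * intercept
    points = {
        feature: [-b * weights[feature] * w for w in woes]
        for feature, woes in woe_by_feature.items()
    }
    return base, points


def score(base, points, bin_index):
    """Score of one sample given the bin index it falls into for each feature."""
    return base + sum(points[f][bin_index[f]] for f in bin_index)


a, b = scaling_coefficients(p0=0.05, s0=600.0, sd=20.0)
base, points = scorecard_points(
    intercept=-2.0,
    weights={"x1": 0.8},
    woe_by_feature={"x1": [math.log(3), -math.log(3)]},
    a=a,
    b=b,
)
sample_score = score(base, points, {"x1": 0})
```

Using the same anchors for every scorecard keeps the N scorecards on a common scale, which is what makes their scores comparable in a later fusion step.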
  • the target score value of the target object can be output through the target scorecard model.
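The fusion step weights the scorecards by the accuracy of their LR models, but the exact weighting formula is not given in this text. The sketch below shows one plausible choice, assumed purely for illustration: each scorecard's score is weighted by its AUC value divided by the sum of all AUC values. The function name and the numbers are hypothetical.

```python
def fuse_scores(scores, auc_values):
    """Weighted average of per-scorecard scores, with weights proportional to AUC.

    scores: the score values S_1..S_N of one target object.
    auc_values: the accuracy values AUC_1..AUC_N of the corresponding LR models.
    """
    if len(scores) != len(auc_values):
        raise ValueError("one AUC value is required per scorecard")
    total = sum(auc_values)
    if total <= 0:
        raise ValueError("AUC values must sum to a positive number")
    weights = [value / total for value in auc_values]
    return sum(w * s for w, s in zip(weights, scores))


# Three hypothetical scorecards scoring the same target object.
fused = fuse_scores(scores=[620.0, 580.0, 600.0], auc_values=[0.80, 0.70, 0.50])
```

With these inputs the weights are 0.40, 0.35 and 0.25, so the more accurate scorecards pull the fused score towards their own values.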
  • In the target object evaluation method based on multi-scorecard fusion provided by the present application, after a feature variable set is obtained from historical data, multiple mutually exclusive target feature variable sets are obtained by grouping, multiple LR models are constructed from the target feature variable sets, and multiple scorecard models are obtained from them. Finally, the target scorecard model is obtained by fusing the multiple scorecard models, and the target score value of the target object is obtained from it. This reduces the model risk caused by shifts in the feature variables and improves the stability of the model, while avoiding the loss of information diversity caused by excessive de-duplication of feature variables, which preserves the accuracy of the model and improves the accuracy of target object evaluation.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, computer readable instructions, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.
  • The present application provides an embodiment of a target object evaluation device based on multi-scorecard fusion, which corresponds to the method embodiment shown in FIG. 2.
  • the apparatus can be specifically applied to various electronic devices.
  • The target object evaluation device based on multi-scorecard fusion described in this embodiment includes: a feature acquisition module 301, a grouping module 302, a model building module 303 and a model fusion module 304.
  • The feature acquisition module 301 is used to acquire historical data of the target object, perform data cleaning according to the distribution state of the feature variables contained in the historical data, and screen the cleaned historical data for feature variables to obtain an original feature variable set;
  • the grouping module 302 is used to perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
  • the model building module 303 is used to construct a plurality of target LR models based on the target feature variable sets and to generate the model accuracy value of each target LR model;
  • the model fusion module 304 is used to generate a plurality of scorecard models based on the target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models, to perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, and to output the target score value of the target object based on the target scorecard model.
  • the process of forming the original feature variable set from the feature variables in the historical data by the feature acquisition module 301 may refer to the above method embodiments, which will not be expanded here.
  • When the grouping module 302 performs the grouping operation on the original feature variable set, it is used to input the training sample set corresponding to the original feature variable set into the preset LightGBM model for training and to perform the grouping operation according to the information gain output for each feature variable. Specifically, it is used to: determine the number of groups according to the number of feature variables in the original feature variable set; input the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, output and sort the information gain value of each feature variable during model training, and, based on the sorting result, screen out several feature variables from the original feature variable set to obtain a target feature variable set; generate a new original feature variable set from the feature variables remaining after screening, input the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, output and sort the information gain value of each feature variable during model training, and, based on the sorting result, screen out several feature variables from the new original feature variable set to obtain another target feature variable set; and repeat the previous step until the number of target feature variable sets equals the number of groups.
  • When the grouping module 302 selects several feature variables from the original feature variable set based on the sorting result, it is specifically used to: select the top-ranked feature variables by information gain value, such that the ratio of the sum of the information gain values of the selected feature variables to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold.
  • When the model building module 303 constructs a target LR model according to a target feature variable set, it is specifically configured to: bin the multiple sample values of each feature variable in the target feature variable set, calculate the WOE value of each bin, encode each bin of each feature variable in the target feature variable set with its WOE value, train the original LR model on the encoding result, and judge whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
  • Binning may treat each sample value as its own bin, or may divide the multiple sample values into several bins by equal-frequency division.
  • The model building module 303 is further configured to: when any feature variable in the target feature variable set has a negative weight coefficient in the trained LR model, perform secondary screening of feature variables on the target feature variable set.
  • For the specific process of secondary screening, reference may be made to the above method embodiments, which will not be repeated here.
  • The model fusion module 304 is specifically configured to perform weighted fusion of the scorecard models according to the accuracy values of their corresponding LR models to obtain a target scorecard model.
  • For the specific process, reference may be made to the above method embodiments, which will not be repeated here.
  • The target object evaluation device based on multi-scorecard fusion provided by this application, after obtaining a feature variable set from historical data, obtains multiple mutually exclusive target feature variable sets by grouping, constructs multiple LR models from the target feature variable sets, and obtains multiple scorecard models from them. Finally, it obtains the target scorecard model by fusing the multiple scorecard models and obtains the target score value of the target object from it. This reduces the model risk caused by shifts in the feature variables and improves the stability of the model, while avoiding the loss of information diversity caused by excessive de-duplication of feature variables, which preserves the accuracy of the model and improves the accuracy of target object evaluation.
  • FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other through a system bus.
  • the memory 41 stores computer-readable instructions
  • the processor 42 implements the above when executing the computer-readable instructions.
  • the steps of the target object evaluation method based on multi-scorecard fusion described in the method embodiment have beneficial effects corresponding to the above-mentioned target object evaluation method based on multi-scorecard fusion, and are not described here.
  • the computer device 4 having the memory 41, the processor 42, and the network interface 43 is shown in the figure, but it should be understood that it is not required to implement all the components shown, and more or more components may be implemented instead. Fewer components.
  • those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 41 includes at least one type of readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • specifically, the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 41 may be an internal storage unit of the computer device 4 , such as a hard disk or a memory of the computer device 4 .
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store the operating system and various application software installed on the computer device 4 , such as computer-readable instructions corresponding to the above-mentioned target object evaluation method based on multi-scorecard fusion.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. This processor 42 is typically used to control the overall operation of the computer device 4 . In this embodiment, the processor 42 is configured to execute computer-readable instructions stored in the memory 41 or process data, for example, execute computer-readable instructions corresponding to the target object evaluation method based on multi-scorecard fusion.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the present application also provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the target object evaluation method based on multi-scorecard fusion described above, with the beneficial effects corresponding to that method, which are not repeated here.
  • from the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.

Abstract

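The present application belongs to the field of artificial intelligence and relates to a target object evaluation method based on multi-scorecard fusion and related devices. The method includes: performing data cleaning according to the distribution state of the feature variables contained in the historical data of a target object, and then performing feature variable screening to obtain an original feature variable set; performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets; constructing a plurality of target LR models based on the respective target feature variable sets and generating model accuracy values; and generating a plurality of scorecard models based on the respective target LR models, then performing scorecard fusion according to each scorecard model and its corresponding model accuracy value to obtain a target scorecard model, from which a target score value can be output. In addition, the present application also relates to blockchain technology: data determined to be private information can be stored in a blockchain. The present application can reduce the model risk caused by feature variable drift, improving model stability while preserving model accuracy.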

Description

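Target object evaluation method based on multi-scorecard fusion and related devices
This application claims priority to Chinese patent application No. 202011617815.6, entitled "Target object evaluation method based on multi-scorecard fusion and related devices", filed with the China Patent Office on December 31, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a target object evaluation method and apparatus based on multi-scorecard fusion, a computer device, and a storage medium.
Background
Scorecard models are widely used for risk assessment and control in the financial industry, for example pre-loan application scorecards, in-loan behavior scorecards, and post-loan collection scorecards. Training a scorecard model requires feature variables as model input. The inventors realized that, on the one hand, feature variables must be screened when they are acquired in order to reduce information redundancy, and on the other hand, some highly correlated feature variables are removed during model fitting. Both effects sacrifice some feature diversity, which lowers the accuracy of the scorecard model and of the resulting evaluation. Moreover, in application scenarios where the target variable spans a long time period, the distribution of the feature variables fed to the scorecard drifts; once correlated feature variables have been removed, the information of a drifted feature variable is lost and cannot be compensated by other feature variables, which increases the risk of the scorecard model and reduces its stability.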
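Summary
The purpose of the embodiments of the present application is to propose a target object evaluation method and apparatus based on multi-scorecard fusion, a computer device, and a storage medium, so as to solve the prior-art problems that removing feature variables sacrifices feature diversity and thereby lowers scorecard model accuracy, and that a long time span of the target variable lowers scorecard model stability.
To solve the above technical problems, an embodiment of the present application provides a target object evaluation method based on multi-scorecard fusion, which adopts the following technical solution:
A target object evaluation method based on multi-scorecard fusion, comprising the following steps:
acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model; and outputting a target score value of the target object based on the target scorecard model.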
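To solve the above technical problems, an embodiment of the present application further provides a target object evaluation apparatus based on multi-scorecard fusion, which adopts the following technical solution:
A target object evaluation apparatus based on multi-scorecard fusion, comprising:
a feature acquisition module, configured to acquire historical data of a target object, perform data cleaning according to the distribution state of the feature variables contained in the historical data, and perform feature variable screening on the cleaned historical data to obtain an original feature variable set;
a grouping module, configured to perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
a model construction module, configured to construct a plurality of target LR models based on the respective target feature variable sets and to generate a model accuracy value for each target LR model;
a model fusion module, configured to generate a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models, and to perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.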
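To solve the above technical problems, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps:
acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.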
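To solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.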
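Compared with the prior art, the target object evaluation method and apparatus based on multi-scorecard fusion, the computer device, and the storage medium provided by the embodiments of the present application mainly have the following beneficial effects:
After a feature variable set is obtained from historical data, multiple mutually exclusive target feature variable sets are obtained by grouping, and multiple LR models are constructed from the respective target feature variable sets to obtain multiple scorecard models. The target scorecard model is finally obtained by fusing the multiple scorecard models, which yields the target score value of the target object. This reduces the model risk caused by feature variable drift and improves model stability, while avoiding the loss of information diversity caused by excessive de-duplication of feature variables and preserving model accuracy, thereby improving the accuracy of target object evaluation.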
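Brief Description of the Drawings
To explain the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below correspond to some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
FIG. 2 is a flowchart of one embodiment of the target object evaluation method based on multi-scorecard fusion according to the present application;
FIG. 3 is a schematic structural diagram of one embodiment of the target object evaluation apparatus based on multi-scorecard fusion according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of the computer device according to the present application.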
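Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, claims, or drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To enable those skilled in the art to better understand the embodiments of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.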
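As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the target object evaluation method based on multi-scorecard fusion provided by the embodiments of the present application is generally executed by the server; accordingly, the target object evaluation apparatus based on multi-scorecard fusion is generally deployed in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.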
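With continued reference to FIG. 2, a flowchart of one embodiment of the target object evaluation method based on multi-scorecard fusion according to the present application is shown. The method includes the following steps:
S201: acquire historical data of a target object, perform data cleaning according to the distribution state of the feature variables contained in the historical data, and perform feature variable screening on the cleaned historical data to obtain an original feature variable set;
S202: perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
S203: construct a plurality of target LR models based on the respective target feature variable sets, and generate a model accuracy value for each target LR model;
S204: generate a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model; and output a target score value of the target object based on the target scorecard model.
The above steps are described in detail below.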
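For step S201, the target object in this embodiment is mainly a transacting party in a transaction scenario. What is traded in such a scenario is not limited to physical products; it may also be financial products, experience and knowledge, labor, and so on. For example, recruiting insurance agents in the insurance industry can be regarded as a labor transaction in which the insurance agent, as the transacting party, sells labor. The target object may therefore be a person or an enterprise. Correspondingly, the historical data may include data of different dimensions such as attribute information and behavior information. In the insurance agent recruitment scenario, for example, the attribute information includes the agent's basic information, and the behavior information includes the agent's performance in the pre-job recruitment training class, the agent's activity on the insurance agent platform, historical policy purchase information, and the like. Feature variables of multiple dimensions related to the target object can be extracted from this information.
After the historical data is obtained, exploratory data analysis (EDA) is performed on each feature variable it contains. Specifically, the data distribution characteristics (that is, the distribution state) of each feature variable are analyzed, including but not limited to data saturation, presence of outliers, maximum, minimum, mean, and distribution type. Data cleaning is then performed according to these characteristics to handle dirty data, missing values, outliers, and the like in the historical data. For example, when handling missing values, feature variables whose missing rate exceeds a preset threshold can be deleted (the threshold is set as appropriate, for example 50%, 70%, or 90%).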
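The missing-value rule described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the dictionary-of-columns layout, and the sample data are assumptions, and `None` is used here to mark a missing value.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_rate(values: Column) -> float:
    """Fraction of entries that are missing (None); an empty column counts as fully missing."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_features(data: Dict[str, Column], max_missing: float = 0.7) -> Dict[str, Column]:
    """Keep only feature variables whose missing rate does not exceed max_missing."""
    return {name: col for name, col in data.items() if missing_rate(col) <= max_missing}


# Hypothetical historical data: one column per feature variable.
history = {
    "age": [31, 45, None, 28],                  # 25% missing, kept
    "policies_bought": [None, None, None, 2],   # 75% missing, dropped at a 70% threshold
}
cleaned = drop_sparse_features(history, max_missing=0.7)
```

The threshold of 0.7 is one of the example values the text mentions; other cleaning steps (outliers, dirty data) are not shown.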
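After data cleaning, feature variable screening is performed based on each feature variable's PSI (Population Stability Index) value and IV (Information Value) value: feature variables with unstable distribution, poor predictive power, or unstable predictive power are removed, yielding the screened original feature variable set. For ease of later description, the original feature variable set is denoted C_0.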
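For orientation, PSI and IV are commonly computed from binned counts as shown below. This sketch uses the usual textbook definitions; the cutoffs `max_psi=0.25` and `min_iv=0.02` are conventional rules of thumb chosen for illustration and are not stated in the source, and all function names are hypothetical.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned count distributions."""
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)  # floor at eps to avoid log(0)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def information_value(good: Sequence[float], bad: Sequence[float], eps: float = 1e-6) -> float:
    """Information Value of a feature from per-bin good and bad counts."""
    g_total, b_total = sum(good), sum(bad)
    iv = 0.0
    for g, b in zip(good, bad):
        g_pct = max(g / g_total, eps)
        b_pct = max(b / b_total, eps)
        iv += (g_pct - b_pct) * math.log(g_pct / b_pct)
    return iv


def keep_feature(psi_value: float, iv_value: float,
                 max_psi: float = 0.25, min_iv: float = 0.02) -> bool:
    """Assumed screening rule: stable distribution and non-trivial predictive power."""
    return psi_value <= max_psi and iv_value >= min_iv
```

Screening for unstable predictive power (for example, IV compared across time windows) would need an additional check that is not sketched here.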
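For step S202, in this embodiment, the grouping operation on the original feature variable set may consist of inputting the training sample set corresponding to the original feature variable set into a preset LightGBM model for training, and grouping according to the information gain that the output attributes to each feature variable. Specifically, it may include:
Step 1: determine the number of groups according to the number of feature variables in the original feature variable set;
Step 2: input the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, output and rank the information gain value of each feature variable from the training process, and select several feature variables from the original feature variable set based on the ranking to obtain one target feature variable set;
Step 3: generate a new original feature variable set from the feature variables remaining after selection, input the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, output and rank the information gain value of each feature variable from the training process, and select several feature variables from the new original feature variable set based on the ranking to obtain another target feature variable set;
Step 4: repeat the previous step until the number of target feature variable sets equals the number of groups, completing the grouping operation.
Specifically, let the number of groups determined from the number of feature variables in the original feature variable set be N, where N is a positive integer. Because every target feature variable set other than the first is obtained from the feature variables remaining after earlier selections, no feature variable appears in more than one of the N target feature variable sets, so the above steps yield N mutually exclusive target feature variable sets. In this embodiment, N may be 2 or 3.
Further, selecting several feature variables from the original feature variable set based on the ranking includes: selecting the top-ranked feature variables by information gain value such that the ratio of the sum of their information gain values to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold. The smaller the preset gain threshold, the more strongly each single model de-duplicates information, but model accuracy also drops because of excessive de-duplication, so the preset gain threshold may be chosen as 90% or above. Selection from each new original feature variable set is performed in the same way, with the same preset gain threshold as used for the original feature variable set.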
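The iterative grouping above can be sketched without depending on LightGBM itself. In the sketch below, `gain_fn` is a hypothetical stand-in for "train the preset LightGBM model on the given features and return each feature's gain"; in practice one might obtain these values from a trained booster's gain-type feature importance, but that wiring is an assumption and is not shown. The toy gain values are invented for illustration.

```python
from typing import Callable, Dict, List

GainFn = Callable[[List[str]], Dict[str, float]]


def select_by_gain(gains: Dict[str, float], threshold: float) -> List[str]:
    """Take top-gain features until their share of the total gain exceeds the threshold."""
    ranked = sorted(gains, key=lambda name: gains[name], reverse=True)
    total = sum(gains.values())
    chosen: List[str] = []
    running = 0.0
    for name in ranked:
        chosen.append(name)
        running += gains[name]
        if total > 0 and running / total > threshold:
            break
    return chosen


def group_features(features: List[str], gain_fn: GainFn,
                   n_groups: int = 2, threshold: float = 0.9) -> List[List[str]]:
    """Build up to n_groups mutually exclusive feature sets, retraining on the leftovers each round."""
    remaining = list(features)
    groups: List[List[str]] = []
    for _ in range(n_groups):
        if not remaining:
            break
        gains = gain_fn(remaining)  # retrain on the remaining features only
        chosen = select_by_gain(gains, threshold)
        groups.append(chosen)
        remaining = [f for f in remaining if f not in chosen]
    return groups


# Toy stand-in for model training: fixed, invented gain values.
TOY_GAINS = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 4.0, "e": 1.0}


def toy_gain(features: List[str]) -> Dict[str, float]:
    return {f: TOY_GAINS[f] for f in features}


groups = group_features(list(TOY_GAINS), toy_gain, n_groups=2, threshold=0.9)
```

Because each round only sees features not yet selected, the resulting groups cannot share a feature, which matches the mutual-exclusivity argument in the text. Features left over after the last round are simply unused in this sketch.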
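For step S203, N mutually independent LR models are constructed from the N target feature variable sets, and after the LR models are generated, the model accuracy value of each LR model is obtained on a prediction sample set. Constructing one target LR model from one target feature variable set includes:
binning the sample values of each feature variable in the target feature variable set, computing the WOE (weight of evidence) value of each bin, encoding each bin of each feature variable in the target feature variable set with its WOE value, training an original LR model on the encoded result, and determining whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
Specifically, when binning the sample values of a feature variable, if the sample values are discrete, each sample value forms its own bin; if the sample values are continuous, they can be divided into several bins by equal-frequency binning.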
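A minimal sketch of equal-frequency binning and WOE encoding for one continuous feature follows. The source does not give the WOE formula; the convention used here, the log of the bin's share of positive samples over its share of negative samples, is one common choice and is an assumption, as are the function names. The sketch assumes both classes are present and that there are at least as many samples as bins.

```python
import math
from typing import List, Sequence


def equal_frequency_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Upper edges of the first n_bins - 1 bins so that bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins - 1] for i in range(1, n_bins)]


def assign_bin(value: float, edges: Sequence[float]) -> int:
    """Index of the first bin whose upper edge is >= value; the last bin is open-ended."""
    for idx, edge in enumerate(edges):
        if value <= edge:
            return idx
    return len(edges)


def woe_per_bin(values: Sequence[float], labels: Sequence[int],
                edges: Sequence[float], eps: float = 0.5) -> List[float]:
    """WOE of each bin: ln(share of positives in bin / share of negatives in bin)."""
    n_bins = len(edges) + 1
    pos = [0.0] * n_bins
    neg = [0.0] * n_bins
    for v, y in zip(values, labels):
        b = assign_bin(v, edges)
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    pos_total, neg_total = sum(pos), sum(neg)
    # Empty cells are floored at eps so the logarithm stays finite.
    return [math.log((max(p, eps) / pos_total) / (max(n, eps) / neg_total))
            for p, n in zip(pos, neg)]


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]   # 1 marks a positive sample (for example, a default)
edges = equal_frequency_edges(values, n_bins=2)
woe = woe_per_bin(values, labels, edges)
encoded = [woe[assign_bin(v, edges)] for v in values]  # WOE-encoded feature column
```

For a discrete feature, each distinct value would be its own bin, so the edge computation would be replaced by a lookup keyed on the value.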
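Further, when the weight coefficients of the feature variables in an LR model trained on a target feature variable set include negative values, this indicates that some feature variables are strongly linearly correlated, and the target feature variable set needs further screening. The method therefore also includes: when the weight coefficients of the feature variables of the target feature variable set in the trained LR model include negative values, performing secondary screening of the feature variables of the target feature variable set. The secondary screening includes:
sorting the feature variables in the target feature variable set in descending order of information gain value; selecting a preset number of feature variables with the highest information gain values from the target feature variable set as basic feature variables to obtain a basic feature variable set; and adding the remaining feature variables of the target feature variable set to the basic feature variable set one at a time, in descending order of information gain value, training an LR model each time and screening according to the training result, to obtain the screened target feature variable set. Specifically, each time a feature variable is added to the basic feature variable set, an LR model is trained on the enlarged basic feature variable set, and it is determined whether the weight coefficients of all feature variables in the enlarged set are positive in the trained LR model. If so, the newly added feature variable is retained; otherwise it is removed. The next feature variable is then added, an LR model is trained, the signs of the weight coefficients are checked, and the decision to retain or remove the new feature variable is made accordingly, until the last feature variable has been screened. Obtaining the screened target feature variable set also completes construction of the LR model. The preset number may be 4 to 5.
Let C_1, C_2, ..., C_N denote the N target feature variable sets. Taking C_1 as an example, suppose C_1 contains 10 feature variables x_1, x_2, ..., x_10. Binning and encoding are first performed on x_1, x_2, ..., x_10, and an LR model is trained on the encoded result. If any of x_1, x_2, ..., x_10 has a negative weight coefficient, the feature variables are sorted in descending order of information gain. Suppose the four feature variables x_2, x_4, x_8, x_10 are selected as basic feature variables to form the basic feature variable set, and the remaining six, in descending order of information gain, are x_3, x_1, x_5, x_7, x_9, x_6. Then x_3 is first added to the basic feature variable set, an LR model is trained, and the signs of the weight coefficients of x_2, x_4, x_8, x_10, x_3 are checked: if any is negative, x_3 is removed; otherwise it is retained. If x_3 is removed, x_1 is added to the basic feature variable set, an LR model is trained, and the signs of the weight coefficients of x_2, x_4, x_8, x_10, x_1 are checked; if x_3 is retained, x_1 is added, an LR model is trained, and the signs of the weight coefficients of x_2, x_4, x_8, x_10, x_3, x_1 are checked; and so on. An LR model M_1 is thus generated from the target feature variable set C_1, and the prediction sample set is input to M_1 to obtain its model accuracy as an AUC value, denoted AUC_1. The same operations are repeated for C_2, ..., C_N until N LR models have been generated, with corresponding model accuracy AUC values AUC_2, ..., AUC_N.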
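The secondary screening loop in the worked example can be sketched as a forward selection with a sign check. Here `fit_coefs` is a hypothetical callback standing in for "train an LR model on these WOE-encoded features and return their weight coefficients"; the toy version below returns invented coefficients purely to exercise the loop and does not train anything.

```python
from typing import Callable, Dict, List

FitFn = Callable[[List[str]], Dict[str, float]]


def secondary_screen(ranked: List[str], base_count: int, fit_coefs: FitFn) -> List[str]:
    """Forward selection over features ranked by information gain (descending).

    The first base_count features form the basic feature variable set. Each further
    feature is kept only if every coefficient stays positive after refitting.
    """
    kept = list(ranked[:base_count])
    for candidate in ranked[base_count:]:
        trial = kept + [candidate]
        coefs = fit_coefs(trial)
        if all(coefs[name] > 0 for name in trial):
            kept = trial  # retain the newly added feature
        # otherwise the candidate is dropped and the next one is tried
    return kept


def toy_fit(features: List[str]) -> Dict[str, float]:
    """Invented coefficients: x3 is always negative; x1 turns negative when x5 is present."""
    coefs = {f: 1.0 for f in features}
    if "x3" in features:
        coefs["x3"] = -0.2
    if "x1" in features and "x5" in features:
        coefs["x1"] = -0.1
    return coefs


# Same ordering as the worked example: four basic features, then six candidates.
ranked = ["x2", "x4", "x8", "x10", "x3", "x1", "x5", "x7", "x9", "x6"]
selected = secondary_screen(ranked, base_count=4, fit_coefs=toy_fit)
```

Note that a candidate is rejected when any coefficient turns negative, not only its own, which is how the text phrases the check. The sketch does not handle the case where the basic set alone already yields a negative coefficient, since the source does not describe it.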
For step S204, in this embodiment, the scorecard model is derived from the LR model on the basis of the following formula:
Figure PCTCN2021090154-appb-000001
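where A and B are coefficients.
Let p be the probability output by the LR model that a sample is positive (for example, the probability of default). It is stipulated that the expected score at a specific probability p_0 is S_0 and that the score after the probability doubles is S_D; A and B are then obtained by solving the resulting system of two linear equations in two unknowns.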
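Solving for A and B can be sketched as below. Since the formula image is not reproduced in this text, the sketch assumes the common log-odds form Score = A - B * ln(p / (1 - p)); that form, the function names, and the example numbers (p_0 = 0.05, S_0 = 600, S_D = 580) are all assumptions. The doubling is applied to the probability, as the text states, rather than to the odds.

```python
import math
from typing import Tuple


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def solve_scale(p0: float, s0: float, s_d: float) -> Tuple[float, float]:
    """Solve  s0 = A - B*log_odds(p0)  and  s_d = A - B*log_odds(2*p0)  for (A, B)."""
    if not 0.0 < p0 < 0.5:
        raise ValueError("p0 must lie in (0, 0.5) so that 2*p0 is a valid probability")
    b = (s0 - s_d) / (log_odds(2.0 * p0) - log_odds(p0))
    a = s0 + b * log_odds(p0)
    return a, b


def score(p: float, a: float, b: float) -> float:
    """Score under the assumed form A - B * ln(p / (1 - p))."""
    return a - b * log_odds(p)


# Hypothetical calibration: 600 points at p = 5%, 580 points at p = 10%.
A, B = solve_scale(p0=0.05, s0=600.0, s_d=580.0)
```

With these inputs B comes out positive, so a higher predicted probability of the positive class lowers the score.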
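When the target feature variable set of the LR model contains the feature variables x_1, x_2, ..., x_n, and x_1, x_2, ..., x_n have all been WOE-encoded, each feature variable can be expressed in the form (θ_i ω_ij) δ_ij, and the scorecard model derived from the LR model is given by the following formula: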
Figure PCTCN2021090154-appb-000002
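where A − B·θ_0 is the base score, θ_i is the coefficient of the i-th feature variable in the LR model, ω_ij is the WOE value of the j-th bin of the i-th feature variable, and δ_ij is a 0/1 indicator variable. The above operations are repeated until N scorecard models S_1, S_2, ..., S_N have been generated; p_0, S_0, and S_D are the same for all scorecards.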
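The two formula images referenced above (appb-000001 and appb-000002) are not reproduced in this text. Based on the surrounding definitions, in particular the base score A − B·θ_0, a plausible reconstruction is the standard log-odds scorecard form below. This is an assumption, not the content of the original figures; the sign convention, the natural logarithm, and the reading of δ_ij as "1 when the sample's value of feature i falls in bin j, otherwise 0" are inferred.

```latex
\begin{align}
\text{Score} &= A - B \ln\frac{p}{1-p}, \\
\ln\frac{p}{1-p} &= \theta_0 + \sum_{i=1}^{n} \theta_i \sum_{j} \omega_{ij}\,\delta_{ij}, \\
\text{Score} &= (A - B\theta_0) - B \sum_{i=1}^{n} \sum_{j} \theta_i\,\omega_{ij}\,\delta_{ij}.
\end{align}
```

Substituting the second line into the first gives the third, which separates the base score from the per-bin point contributions −B·θ_i·ω_ij.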
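Finally, the scorecard models S_1, S_2, ..., S_N are fused according to the accuracy values AUC_1, AUC_2, ..., AUC_N of their corresponding LR models, giving the target scorecard model S = S_1*AUC_1 + S_2*AUC_2 + ... + S_N*AUC_N, through which the target score value of the target object can be output.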
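The fusion formula translates directly into code. The sketch below follows the stated sum S = Σ S_i·AUC_i; the optional `normalize` flag, which divides by the sum of the AUC values so that the fused score stays on the scale of the individual scorecards, is an added assumption and is not part of the source. The scores and AUC values are invented.

```python
from typing import Sequence


def fuse_scores(scores: Sequence[float], aucs: Sequence[float],
                normalize: bool = False) -> float:
    """AUC-weighted fusion of the per-scorecard scores of one target object."""
    if not scores or len(scores) != len(aucs):
        raise ValueError("scores and aucs must be non-empty and of equal length")
    fused = sum(s * a for s, a in zip(scores, aucs))
    if normalize:
        fused /= sum(aucs)  # assumption: keeps the result on the original score scale
    return fused


# Hypothetical outputs of N = 2 scorecards for one target object.
target_score = fuse_scores([600.0, 580.0], [0.75, 0.70])
rescaled = fuse_scores([600.0, 580.0], [0.75, 0.75], normalize=True)
```

Without normalization the fused value grows with N, so comparisons are only meaningful between objects scored by the same set of scorecards.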
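In the target object evaluation method based on multi-scorecard fusion provided by this application, after a feature variable set is obtained from historical data, multiple mutually exclusive target feature variable sets are obtained by grouping, and multiple LR models are constructed from the respective target feature variable sets to obtain multiple scorecard models. The target scorecard model is finally obtained by fusing the multiple scorecard models, which yields the target score value of the target object. This reduces the model risk caused by feature variable drift and improves model stability, while avoiding the loss of information diversity caused by excessive de-duplication of feature variables and preserving model accuracy, thereby improving the accuracy of target object evaluation.
It should be emphasized that, to further ensure the privacy and security of the information, private information in the historical data, or information that must be protected from tampering, can be stored in a node of a blockchain. The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked using cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, computer-readable instructions, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and which are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.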
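With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a target object evaluation apparatus based on multi-scorecard fusion. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 3, the target object evaluation apparatus based on multi-scorecard fusion in this embodiment includes a feature acquisition module 301, a grouping module 302, a model construction module 303, and a model fusion module 304. The feature acquisition module 301 is configured to acquire historical data of a target object, perform data cleaning according to the distribution state of the feature variables contained in the historical data, and perform feature variable screening on the cleaned historical data to obtain an original feature variable set. The grouping module 302 is configured to perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets. The model construction module 303 is configured to construct a plurality of target LR models based on the respective target feature variable sets and to generate a model accuracy value for each target LR model. The model fusion module 304 is configured to generate a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models, and to perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.
In this embodiment, for the process by which the feature acquisition module 301 forms the original feature variable set from the feature variables in the historical data, refer to the above method embodiment; it is not repeated here.
In this embodiment, when performing the grouping operation on the original feature variable set, the grouping module 302 is configured to input the training sample set corresponding to the original feature variable set into a preset LightGBM model for training and to group according to the information gain that the output attributes to each feature variable. Specifically, it is configured to: determine the number of groups according to the number of feature variables in the original feature variable set; input the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, output and rank the information gain value of each feature variable from the training process, and select several feature variables from the original feature variable set based on the ranking to obtain one target feature variable set; generate a new original feature variable set from the feature variables remaining after selection, input the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, output and rank the information gain value of each feature variable from the training process, and select several feature variables from the new original feature variable set based on the ranking to obtain another target feature variable set; and repeat the preceding process until the number of target feature variable sets equals the number of groups, completing the grouping operation.
In this embodiment, when selecting several feature variables from the original feature variable set based on the ranking, the grouping module 302 is specifically configured to select the top-ranked feature variables by information gain value such that the ratio of the sum of their information gain values to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold. For details, refer to the above method embodiment; they are not repeated here.
In this embodiment, when constructing one target LR model from one target feature variable set, the model construction module 303 is specifically configured to: bin the sample values of each feature variable in the target feature variable set, compute the WOE value of each bin, encode each bin of each feature variable in the target feature variable set with its WOE value, train an original LR model on the encoded result, and determine whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model. When binning the sample values of a feature variable, if the sample values are discrete, each sample value forms its own bin; if the sample values are continuous, they can be divided into several bins by equal-frequency binning.
In this embodiment, the model construction module 303 is further configured to perform secondary screening of the feature variables of the target feature variable set when the weight coefficients of the feature variables of the target feature variable set in the trained LR model include negative values. For the secondary screening process, refer to the above method embodiment; it is not repeated here.
In this embodiment, the model fusion module 304 is specifically configured to fuse the scorecard models, weighting each by the accuracy value of its corresponding LR model, to obtain the target scorecard model. For the specific process, refer to the above method embodiment; it is not repeated here.
In the target object evaluation apparatus based on multi-scorecard fusion provided by this application, after a feature variable set is obtained from historical data, multiple mutually exclusive target feature variable sets are obtained by grouping, and multiple LR models are constructed from the respective target feature variable sets to obtain multiple scorecard models. The target scorecard model is finally obtained by fusing the multiple scorecard models, which yields the target score value of the target object. This reduces the model risk caused by feature variable drift and improves model stability, while avoiding the loss of information diversity caused by excessive de-duplication of feature variables and preserving model accuracy, thereby improving the accuracy of target object evaluation.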
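To solve the above technical problems, an embodiment of the present application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment. The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other through a system bus. The memory 41 stores computer-readable instructions; when executing them, the processor 42 implements the steps of the target object evaluation method based on multi-scorecard fusion described in the above method embodiment, with the beneficial effects corresponding to that method, which are not repeated here.
It should be pointed out that only the computer device 4 having the memory 41, the processor 42, and the network interface 43 is shown in the figure, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
In this embodiment, the memory 41 includes at least one type of readable storage medium, which may be non-volatile or volatile. Specifically, the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions corresponding to the above target object evaluation method based on multi-scorecard fusion. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions corresponding to the target object evaluation method based on multi-scorecard fusion.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The present application also provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the target object evaluation method based on multi-scorecard fusion described above, with the beneficial effects corresponding to that method, which are not repeated here.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.
Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit its patent scope. The application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of the application will be understood more thoroughly and comprehensively. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.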

Claims (20)

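  1. A target object evaluation method based on multi-scorecard fusion, comprising the following steps:
    acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
    performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
    constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
    generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.
  2. The target object evaluation method based on multi-scorecard fusion according to claim 1, wherein performing the grouping operation on the original feature variable set comprises: inputting the training sample set corresponding to the original feature variable set into a preset LightGBM model for training, and performing the grouping operation according to the information gain that the output attributes to each feature variable.
  3. The target object evaluation method based on multi-scorecard fusion according to claim 2, wherein inputting the training sample set corresponding to the original feature variable set into the preset LightGBM model for training, and performing the grouping operation according to the information gain that the output attributes to each feature variable, comprises:
    determining the number of groups according to the number of feature variables in the original feature variable set;
    inputting the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the original feature variable set based on the ranking to obtain one target feature variable set;
    generating a new original feature variable set from the feature variables remaining after selection, inputting the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the new original feature variable set based on the ranking to obtain another target feature variable set;
    repeating the previous step until the number of target feature variable sets equals the number of groups, thereby completing the grouping operation.
  4. The target object evaluation method based on multi-scorecard fusion according to claim 3, wherein selecting several feature variables from the original feature variable set based on the ranking comprises: selecting the top-ranked feature variables by information gain value such that the ratio of the sum of their information gain values to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold.
  5. The target object evaluation method based on multi-scorecard fusion according to any one of claims 2 to 4, wherein constructing one target LR model from one target feature variable set comprises: binning the sample values of each feature variable in the target feature variable set, computing the WOE value of each bin, encoding each bin of each feature variable in the target feature variable set with its WOE value, training an original LR model on the encoded result, and determining whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
  6. The target object evaluation method based on multi-scorecard fusion according to claim 5, wherein, when the weight coefficients of the feature variables of the target feature variable set in the trained LR model include negative values, the method further comprises performing secondary screening of the feature variables of the target feature variable set.
  7. The target object evaluation method based on multi-scorecard fusion according to claim 6, wherein performing secondary screening of the feature variables of the target feature variable set comprises:
    sorting the feature variables in the target feature variable set in descending order of information gain value;
    selecting a preset number of feature variables with the highest information gain values from the target feature variable set as basic feature variables to obtain a basic feature variable set;
    adding the remaining feature variables of the target feature variable set to the basic feature variable set one at a time, in descending order of information gain value, training an LR model each time and screening according to the training result, to obtain the screened target feature variable set; specifically, each time a feature variable is added to the basic feature variable set, an LR model is trained on the enlarged basic feature variable set, and it is determined whether the weight coefficients of all feature variables in the enlarged set are positive in the trained LR model; if so, the newly added feature variable is retained, and otherwise it is removed; the next feature variable is then added, an LR model is trained, the signs of the weight coefficients are checked, and whether to retain the newly added feature variable is decided from the result, until the last added feature variable has been screened.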
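  8. A target object evaluation apparatus based on multi-scorecard fusion, comprising:
    a feature acquisition module, configured to acquire historical data of a target object, perform data cleaning according to the distribution state of the feature variables contained in the historical data, and perform feature variable screening on the cleaned historical data to obtain an original feature variable set;
    a grouping module, configured to perform a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
    a model construction module, configured to construct a plurality of target LR models based on the respective target feature variable sets and to generate a model accuracy value for each target LR model;
    a model fusion module, configured to generate a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models, and to perform scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.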
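  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
    performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
    constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
    generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.
  10. The computer device according to claim 9, wherein, when executing the computer-readable instructions to implement the step of performing the grouping operation on the original feature variable set, the processor specifically implements the following step:
    inputting the training sample set corresponding to the original feature variable set into a preset LightGBM model for training, and performing the grouping operation according to the information gain that the output attributes to each feature variable.
  11. The computer device according to claim 10, wherein, when executing the computer-readable instructions to implement the step of inputting the training sample set corresponding to the original feature variable set into the preset LightGBM model for training and performing the grouping operation according to the information gain that the output attributes to each feature variable, the processor specifically implements the following steps:
    determining the number of groups according to the number of feature variables in the original feature variable set;
    inputting the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the original feature variable set based on the ranking to obtain one target feature variable set;
    generating a new original feature variable set from the feature variables remaining after selection, inputting the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the new original feature variable set based on the ranking to obtain another target feature variable set;
    repeating the previous step until the number of target feature variable sets equals the number of groups, thereby completing the grouping operation.
  12. The computer device according to claim 11, wherein, when executing the computer-readable instructions to implement the step of selecting several feature variables from the original feature variable set based on the ranking, the processor specifically implements the following step:
    selecting the top-ranked feature variables by information gain value such that the ratio of the sum of their information gain values to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold.
  13. The computer device according to any one of claims 10 to 12, wherein, when executing the computer-readable instructions to implement the step of constructing one target LR model from one target feature variable set, the processor specifically implements the following step:
    binning the sample values of each feature variable in the target feature variable set, computing the WOE value of each bin, encoding each bin of each feature variable in the target feature variable set with its WOE value, training an original LR model on the encoded result, and determining whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
  14. The computer device according to claim 13, wherein, when the weight coefficients of the feature variables of the target feature variable set in the trained LR model include negative values, the processor, when executing the computer-readable instructions, further implements a step of performing secondary screening of the feature variables of the target feature variable set, which specifically comprises:
    sorting the feature variables in the target feature variable set in descending order of information gain value;
    selecting a preset number of feature variables with the highest information gain values from the target feature variable set as basic feature variables to obtain a basic feature variable set;
    adding the remaining feature variables of the target feature variable set to the basic feature variable set one at a time, in descending order of information gain value, training an LR model each time and screening according to the training result, to obtain the screened target feature variable set; specifically, each time a feature variable is added to the basic feature variable set, an LR model is trained on the enlarged basic feature variable set, and it is determined whether the weight coefficients of all feature variables in the enlarged set are positive in the trained LR model; if so, the newly added feature variable is retained, and otherwise it is removed; the next feature variable is then added, an LR model is trained, the signs of the weight coefficients are checked, and whether to retain the newly added feature variable is decided from the result, until the last added feature variable has been screened.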
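  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
    acquiring historical data of a target object, performing data cleaning according to the distribution state of the feature variables contained in the historical data, and performing feature variable screening on the cleaned historical data to obtain an original feature variable set;
    performing a grouping operation on the original feature variable set to obtain a plurality of mutually exclusive target feature variable sets;
    constructing a plurality of target LR models based on the respective target feature variable sets, and generating a model accuracy value for each target LR model;
    generating a plurality of scorecard models based on the respective target LR models, wherein a plurality of score values of the target object can be obtained through the plurality of scorecard models; performing scorecard fusion according to each scorecard model and the model accuracy value of its corresponding target LR model to obtain a target scorecard model, based on which a target score value of the target object can be output.
  16. The computer-readable storage medium according to claim 15, wherein, when the computer-readable instructions are executed by the processor to cause the processor to perform the step of performing the grouping operation on the original feature variable set, the processor specifically performs the following step:
    inputting the training sample set corresponding to the original feature variable set into a preset LightGBM model for training, and performing the grouping operation according to the information gain that the output attributes to each feature variable.
  17. The computer-readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to cause the processor to perform the step of inputting the training sample set corresponding to the original feature variable set into the preset LightGBM model for training and performing the grouping operation according to the information gain that the output attributes to each feature variable, the processor specifically performs the following steps:
    determining the number of groups according to the number of feature variables in the original feature variable set;
    inputting the training sample set corresponding to the feature variables in the original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the original feature variable set based on the ranking to obtain one target feature variable set;
    generating a new original feature variable set from the feature variables remaining after selection, inputting the training sample set corresponding to the feature variables in the new original feature variable set into the preset LightGBM model for training, outputting and ranking the information gain value of each feature variable from the training process, and selecting several feature variables from the new original feature variable set based on the ranking to obtain another target feature variable set;
    repeating the previous step until the number of target feature variable sets equals the number of groups, thereby completing the grouping operation.
  18. The computer-readable storage medium according to claim 17, wherein, when the computer-readable instructions are executed by the processor to cause the processor to perform the step of selecting several feature variables from the original feature variable set based on the ranking, the processor specifically performs the following step:
    selecting the top-ranked feature variables by information gain value such that the ratio of the sum of their information gain values to the sum of the information gain values of all feature variables in the original feature variable set exceeds a preset gain threshold.
  19. The computer-readable storage medium according to any one of claims 16 to 18, wherein, when the computer-readable instructions are executed by the processor to cause the processor to perform the step of constructing one target LR model from one target feature variable set, the processor specifically performs the following step:
    binning the sample values of each feature variable in the target feature variable set, computing the WOE value of each bin, encoding each bin of each feature variable in the target feature variable set with its WOE value, training an original LR model on the encoded result, and determining whether the weight coefficients of all feature variables in the target feature variable set are positive in the trained LR model; if so, the trained LR model is the constructed target LR model.
  20. The computer-readable storage medium according to claim 19, wherein, when the weight coefficients of the feature variables of the target feature variable set in the trained LR model include negative values, the computer-readable instructions, when executed by the processor, further cause the processor to perform a step of performing secondary screening of the feature variables of the target feature variable set, which specifically comprises:
    sorting the feature variables in the target feature variable set in descending order of information gain value;
    selecting a preset number of feature variables with the highest information gain values from the target feature variable set as basic feature variables to obtain a basic feature variable set;
    adding the remaining feature variables of the target feature variable set to the basic feature variable set one at a time, in descending order of information gain value, training an LR model each time and screening according to the training result, to obtain the screened target feature variable set; specifically, each time a feature variable is added to the basic feature variable set, an LR model is trained on the enlarged basic feature variable set, and it is determined whether the weight coefficients of all feature variables in the enlarged set are positive in the trained LR model; if so, the newly added feature variable is retained, and otherwise it is removed; the next feature variable is then added, an LR model is trained, the signs of the weight coefficients are checked, and whether to retain the newly added feature variable is decided from the result, until the last added feature variable has been screened.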
PCT/CN2021/090154 2020-12-31 2021-04-27 基于多评分卡融合的目标对象评价方法及其相关设备 WO2022142001A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011617815.6 2020-12-31
CN202011617815.6A CN112766649B (zh) 2020-12-31 2020-12-31 基于多评分卡融合的目标对象评价方法及其相关设备

Publications (1)

Publication Number Publication Date
WO2022142001A1 true WO2022142001A1 (zh) 2022-07-07

Family

ID=75697835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090154 WO2022142001A1 (zh) 2020-12-31 2021-04-27 基于多评分卡融合的目标对象评价方法及其相关设备

Country Status (2)

Country Link
CN (1) CN112766649B (zh)
WO (1) WO2022142001A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114997419A (zh) * 2022-07-18 2022-09-02 北京芯盾时代科技有限公司 评分卡模型的更新方法、装置、电子设备及存储介质
CN115860275A (zh) * 2023-02-23 2023-03-28 深圳市南湖勘测技术有限公司 一种用于土地整备利益统筹测绘采集方法及系统
CN116364223A (zh) * 2023-05-26 2023-06-30 平安科技(深圳)有限公司 特征处理方法、装置、计算机设备及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115130623B (zh) * 2022-09-01 2022-11-25 浪潮通信信息系统有限公司 数据融合方法、装置、电子设备及存储介质
CN115564069A (zh) * 2022-09-28 2023-01-03 北京百度网讯科技有限公司 服务器维保策略的确定方法、模型的生成方法及其装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408423A (zh) * 2016-11-25 2017-02-15 泰康保险集团股份有限公司 用于风险评估的方法、系统及构建风险评估系统的方法
CN108665120A (zh) * 2017-03-27 2018-10-16 阿里巴巴集团控股有限公司 打分模型的建立、用户信用的评估方法及装置
CN111191889A (zh) * 2019-12-17 2020-05-22 东方微银科技(北京)有限公司 一种基于逻辑回归与投票式模型集成的评分卡开发方法
US20200225222A1 (en) * 2017-09-28 2020-07-16 Immunarray Ltd. Sle disease management
CN111915428A (zh) * 2020-08-10 2020-11-10 杭州排列科技有限公司 一种基于决策树特征融合的评分卡模型优化方法
CN111931848A (zh) * 2020-08-10 2020-11-13 中国平安人寿保险股份有限公司 数据的特征提取方法、装置、计算机设备及存储介质
CN112037005A (zh) * 2020-07-21 2020-12-04 苏宁金融科技(南京)有限公司 一种评分卡的融合方法、装置、计算机设备及存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101610516B (zh) * 2009-08-04 2011-12-21 华为技术有限公司 自组织网络中的入侵检测方法与设备
CN105786860B (zh) * 2014-12-23 2020-07-07 华为技术有限公司 一种数据建模中的数据处理方法及装置
CN107193804B (zh) * 2017-06-02 2019-03-29 河海大学 一种面向词和组合词的垃圾短信文本特征选择方法
CN108416495B (zh) * 2018-01-30 2021-02-26 杭州排列科技有限公司 基于机器学习的评分卡模型建立方法及装置
CN109272402A (zh) * 2018-10-08 2019-01-25 深圳市牛鼎丰科技有限公司 评分卡的建模方法、装置、计算机设备及存储介质
US11544630B2 (en) * 2018-10-15 2023-01-03 Oracle International Corporation Automatic feature subset selection using feature ranking and scalable automatic search
CN109598095B (zh) * 2019-01-07 2023-08-08 平安科技(深圳)有限公司 评分卡模型的建立方法、装置、计算机设备和存储介质
CN112035549B (zh) * 2020-08-31 2023-12-08 中国平安人寿保险股份有限公司 数据挖掘方法、装置、计算机设备及存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408423A (zh) * 2016-11-25 2017-02-15 泰康保险集团股份有限公司 用于风险评估的方法、系统及构建风险评估系统的方法
CN108665120A (zh) * 2017-03-27 2018-10-16 阿里巴巴集团控股有限公司 打分模型的建立、用户信用的评估方法及装置
US20200225222A1 (en) * 2017-09-28 2020-07-16 Immunarray Ltd. Sle disease management
CN111191889A (zh) * 2019-12-17 2020-05-22 东方微银科技(北京)有限公司 一种基于逻辑回归与投票式模型集成的评分卡开发方法
CN112037005A (zh) * 2020-07-21 2020-12-04 苏宁金融科技(南京)有限公司 一种评分卡的融合方法、装置、计算机设备及存储介质
CN111915428A (zh) * 2020-08-10 2020-11-10 杭州排列科技有限公司 一种基于决策树特征融合的评分卡模型优化方法
CN111931848A (zh) * 2020-08-10 2020-11-13 中国平安人寿保险股份有限公司 数据的特征提取方法、装置、计算机设备及存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114997419A (zh) * 2022-07-18 2022-09-02 北京芯盾时代科技有限公司 评分卡模型的更新方法、装置、电子设备及存储介质
CN115860275A (zh) * 2023-02-23 2023-03-28 深圳市南湖勘测技术有限公司 一种用于土地整备利益统筹测绘采集方法及系统
CN115860275B (zh) * 2023-02-23 2023-05-05 深圳市南湖勘测技术有限公司 一种用于土地整备利益统筹测绘采集方法及系统
CN116364223A (zh) * 2023-05-26 2023-06-30 平安科技(深圳)有限公司 特征处理方法、装置、计算机设备及存储介质
CN116364223B (zh) * 2023-05-26 2023-08-29 平安科技(深圳)有限公司 特征处理方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN112766649A (zh) 2021-05-07
CN112766649B (zh) 2022-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912764

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912764

Country of ref document: EP

Kind code of ref document: A1