CN115238148A - Characteristic combination screening method for multi-party enterprise joint credit rating and application - Google Patents

Info

Publication number
CN115238148A
Authority
CN
China
Prior art keywords
credit rating
feature
data
enterprise
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211146961.4A
Other languages
Chinese (zh)
Inventor
陈定
徐行
吴俊杰
刘冠男
陈宏�
张丽君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hengtai Technology Co ltd
Original Assignee
Hangzhou Hengtai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hengtai Technology Co ltd
Priority to CN202211146961.4A
Publication of CN115238148A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations

Abstract

The scheme provides a feature combination screening method for multi-party enterprise joint credit rating and an application thereof. An enterprise credit rating model is constructed through vertical federated learning in which at least two participants take part in modeling, and the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model is obtained. The feature data with the largest marginal contribution is selected as the initial feature data; starting from it, valid feature data is screened iteratively with a greedy algorithm, and the initial feature data and the valid feature data are integrated into a feature combination. While preserving the privacy of the multi-party enterprise credit data, the method reduces model overfitting and controls data cost without degrading the prediction performance of the enterprise credit rating model.

Description

Characteristic combination screening method for multi-party enterprise joint credit rating and application
Technical Field
The application relates to the field of enterprise rating, in particular to a feature combination screening method and application for multi-party enterprise joint credit rating.
Background
Enterprise credit rating refers to the activity in which a credit evaluation organization assesses the credit level of an enterprise from collected enterprise credit information according to certain indicators. The main contents of the credit analysis of an enterprise include: industry, enterprise quality, management, financial status, liabilities and debt-paying capacity. Because the relationship between an enterprise's financial indicators and its credit risk is often non-linear, artificial neural networks are relatively well suited to enterprise credit evaluation. At present, many organizations are beginning to build or optimize enterprise credit rating models, but most rely only on the enterprise data they own; because that data is limited, their credit rating models often fail to achieve the expected practical effect.
Obtaining a better enterprise rating result requires combining the enterprise credit data of multiple parties. However, data barriers between organizations and regulatory requirements for data privacy protection generally prevent organizations from transmitting or exchanging data, which creates a data island problem and limits the popularization and development of enterprise credit rating models. Federated learning is a solution for this scenario: it can protect data privacy while maintaining model performance. The main idea of the federated learning paradigm is to build machine learning models from training data sets distributed across multiple data sources without requiring the data owners to exchange data directly. It allows multiple participants to cooperatively train a joint model while each keeps its own data locally, which addresses the data privacy and security problem to a certain extent and enables joint modeling.
However, when federated learning is used to build an enterprise credit rating model from multi-party enterprise credit data, a vertical federated learning mode is adopted. The features held by different participants may be duplicated or highly correlated, and training on all of the feature data can cause model overfitting. In addition, more feature data participating in learning means higher data cost, so it is desirable to use as little feature data as possible, both to reduce the computational load of the model itself and to reduce the cost of computing enterprise credit ratings.
Disclosure of Invention
The present application provides a feature combination screening method for multi-party enterprise joint credit rating and an application thereof. While preserving the privacy of the multi-party enterprise credit data, a suitable combination of feature data is screened from the feature data of each participant and used to build the enterprise credit rating model, which reduces model overfitting and controls data cost without degrading the prediction performance of the model.
To achieve the above object, the present technical solution provides a feature combination screening method for multi-party enterprise joint credit rating, including:
constructing an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling, and obtaining the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model;
selecting the feature data with the largest marginal contribution as initial feature data, iteratively screening valid feature data with a greedy algorithm starting from the initial feature data, and integrating the initial feature data and the valid feature data into a feature combination.
In a second aspect, the scheme provides an enterprise credit rating model, which is obtained by training on the feature combination screened by the above feature combination screening method for multi-party enterprise joint credit rating.
In a third aspect, the scheme provides a feature combination screening device for multi-party enterprise joint credit rating, including:
a marginal contribution obtaining unit, configured to construct an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling, and to obtain the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model;
and a screening unit, configured to select the feature data with the largest marginal contribution as initial feature data, iteratively screen valid feature data with a greedy algorithm starting from the initial feature data, and integrate the initial feature data and the valid feature data into a feature combination.
In a fourth aspect, the scheme provides an electronic device including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the feature combination screening method for multi-party enterprise joint credit rating.
In a fifth aspect, the scheme provides a readable storage medium storing a computer program, the computer program including program code for controlling a process to execute a procedure that includes the feature combination screening method for multi-party enterprise joint credit rating.
Compared with the prior art, the technical scheme has the following characteristics and beneficial effects:
For the vertical federated learning mode, in which the data of each participant is not directly visible to the others, suitable feature data is screened from the enterprise credit rating data of multiple participants by combining Shapley-value contributions with a greedy algorithm. This selects the feature data that the vertically federated enterprise credit rating model needs most in this scenario, avoids the overfitting caused when overlapping or highly correlated features of the participants all take part in model training, controls the cost of joint modeling, and improves the performance of the final credit rating model as far as possible.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of the feature combination screening method for multi-party enterprise joint credit rating according to the present scheme;
FIG. 2 is a logic diagram of the feature combination screening method for multi-party enterprise joint credit rating according to the present scheme;
FIG. 3 is a schematic illustration of vertical federated learning;
FIG. 4 is a schematic diagram of the feature combination screening mechanism for multi-party enterprise joint credit rating according to the present scheme;
FIG. 5 is a schematic diagram of an electronic device implementing the feature combination screening method for multi-party enterprise joint credit rating.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Example one
Before introducing the present solution, the technical terms involved in the present solution are explained first:
Vertical federated learning: federated learning among participants whose data sets share the same sample space but have different feature spaces; it can also be understood as federated learning partitioned by features. It mainly consists of two parts: first, aligning entities that have the same ID but are distributed among different parties; then, performing encrypted model training on the aligned entities.
Enterprise credit rating: the activity in which a credit evaluation organization rates the credit of an enterprise from collected enterprise credit information according to certain indicators.
Shapley value: a method for allocating benefits according to contributions, so that the gain allocated to each participant corresponds to its contribution. It is characterized by four axioms: symmetry, efficiency, the null player (dummy) property, and additivity.
The scheme provides a feature combination screening method for multi-party enterprise joint credit rating, which includes the following steps:
constructing an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling, and obtaining the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model;
selecting the feature data with the largest marginal contribution as initial feature data, iteratively screening valid feature data with a greedy algorithm starting from the initial feature data, and integrating the initial feature data and the valid feature data into a feature combination.
It is worth mentioning that the scheme creatively combines a greedy algorithm with Shapley-value marginal contributions to screen feature combinations across the data sources of multiple participants, selecting valid feature data from a large amount of feature data for building the enterprise credit rating model.
In the step of "constructing an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling, and obtaining the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model", a participant refers to a party that provides enterprise credit rating data, which may be a financial institution or another enterprise management institution; each participant provides its own enterprise credit rating data to train the enterprise credit rating model. The feature data refers to the enterprise credit rating data.
In an embodiment of the scheme, a vertical federated learning model is selected for the multi-party enterprise credit rating model; in some embodiments, the model may be a vertical logistic regression (LR) model, a multilayer perceptron (MLP) model, a federated support vector machine (SVM), or the like.
A schematic diagram of vertical federated learning is shown in FIG. 3: the feature data of the participants is first sample-aligned and then encrypted for training.
Specifically, "constructing an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling" includes the following steps:
acquiring the feature data of each participant, where the feature data is labeled with enterprise credit rating results, and the feature data of different participants has the same or overlapping sample space but different feature spaces;
performing sample alignment on the feature data of the participants, and having each participant compute on its aligned feature data, encrypt the results, and input them into the enterprise credit rating model for learning;
and globally aggregating the encrypted results uploaded by the participants and then updating the enterprise credit rating model.
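The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the enterprise IDs, feature values, labels and function names are hypothetical, logistic regression is used because it is one of the model types the description names, sample alignment is a plain set intersection instead of PSI, and nothing is encrypted. It shows the data flow (local partial scores, global aggregation, local updates), not the patented implementation.

```python
import math

# Hypothetical toy data: both parties describe partly the same enterprises
# (same sample IDs) but hold different feature columns.
party_a = {"E001": [0.9, 0.1], "E002": [0.2, 0.8], "E003": [0.7, 0.3], "E004": [0.1, 0.9]}
party_b = {"E002": [0.3], "E003": [0.8], "E004": [0.2], "E005": [0.5]}
labels = {"E002": 0, "E003": 1, "E004": 0}  # credit rating labels of the shared samples

# Step 1: sample alignment (plain intersection here; the description uses PSI).
shared_ids = sorted(set(party_a) & set(party_b))


def partial_scores(weights, table):
    """Local computation: a party scores the aligned samples with its own features."""
    return [sum(w * x for w, x in zip(weights, table[i])) for i in shared_ids]


def local_update(weights, table, residuals, lr):
    """A party updates only the weights of the features it owns."""
    n = len(shared_ids)
    return [
        w - lr * sum(r * table[i][j] for r, i in zip(residuals, shared_ids)) / n
        for j, w in enumerate(weights)
    ]


def predict(w_a, w_b):
    """Aggregation: partial scores of both parties are summed, then squashed."""
    z_a, z_b = partial_scores(w_a, party_a), partial_scores(w_b, party_b)
    return {i: 1.0 / (1.0 + math.exp(-(a + b))) for i, a, b in zip(shared_ids, z_a, z_b)}


def train(epochs=200, lr=0.5):
    w_a, w_b = [0.0, 0.0], [0.0]
    for _ in range(epochs):
        # Step 2: combine the partial results (these would be encrypted in practice).
        probs = predict(w_a, w_b)
        residuals = [probs[i] - labels[i] for i in shared_ids]
        # Step 3: the aggregated residuals go back to the parties for a local update.
        w_a = local_update(w_a, party_a, residuals, lr)
        w_b = local_update(w_b, party_b, residuals, lr)
    return w_a, w_b


w_a, w_b = train()
print(predict(w_a, w_b))
```

Neither party ever reads the other party's feature columns; only partial scores and residuals cross the boundary, which is the part a real deployment would protect with encryption.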
In an embodiment of the scheme, private set intersection (PSI) is used to align the samples of the participants' feature data, so as to find the overlapping part of the participants' data and use the overlapping sample space for subsequent model training. In some embodiments, a scheme based on blind RSA and a hash algorithm may be used for this sample alignment. Because the feature data of different participants comes from different sources, the scheme adopts an encryption-based user ID alignment technique so that different participants can align their common users without exposing their respective raw data.
In a specific embodiment of the scheme, the RSA public-key encryption algorithm is used. The Client holds the public key e and a random number, and the Server holds the private key d and the public key e. The final encrypted key used for matching is produced with the private key d; relying on the hardness of factoring the RSA modulus, the Client obtains this private-key result by blinding its value, having the Server process it, and then unblinding the response. The Server can apply the private key d, but because the Client's value is blinded by the random number, the Server cannot recover the key that the Client is asking it to process. The scheme therefore achieves sample alignment while protecting privacy.
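A minimal sketch of the blind-RSA alignment described above, assuming SHA-256 as the hash and a toy RSA key built from two known Mersenne primes. The key size, the enterprise IDs and all function names are hypothetical and chosen only so the example runs with the standard library (Python 3.8 or later); a real deployment would use a vetted cryptographic library and a properly generated key.

```python
import hashlib
import math
import random

# Toy RSA key from two Mersenne primes: far too small and structured for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent, known to Client and Server
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known to the Server only


def hash_to_int(identifier):
    """Hash an enterprise ID into the RSA group."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def final_hash(value):
    """Second hash applied to the signed value before comparison."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def client_blind(ids, rng):
    """Client: hide each hashed ID behind a random factor r^e."""
    blinded, factors = [], []
    for identifier in ids:
        r = rng.randrange(2, N)
        while math.gcd(r, N) != 1:
            r = rng.randrange(2, N)
        blinded.append(hash_to_int(identifier) * pow(r, E, N) % N)
        factors.append(r)
    return blinded, factors


def server_sign(blinded):
    """Server: apply the private key without learning the underlying IDs."""
    return [pow(b, D, N) for b in blinded]


def server_tokens(ids):
    """Server: publish hashed signatures of its own IDs."""
    return {final_hash(pow(hash_to_int(i), D, N)) for i in ids}


def client_intersect(ids, signed, factors, tokens):
    """Client: remove the blinding factor and compare with the Server's tokens."""
    common = set()
    for identifier, s, r in zip(ids, signed, factors):
        unblinded = s * pow(r, -1, N) % N   # equals H(id)^d mod N
        if final_hash(unblinded) in tokens:
            common.add(identifier)
    return common


client_ids = ["E001", "E002", "E003"]
server_ids = ["E002", "E003", "E004"]
blinded, factors = client_blind(client_ids, random.Random(7))
common = client_intersect(client_ids, server_sign(blinded), factors, server_tokens(server_ids))
print(sorted(common))
```

Unblinding works because (H(id) · r^e)^d = H(id)^d · r (mod N), so multiplying by the inverse of r leaves exactly the value the Server would compute for the same ID.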
In another embodiment of the scheme, each participant performs calculation and encryption locally on its own feature data: all participants compute locally and encrypt intermediate results such as data or gradients.
Specifically, the trusted third party distributes the selected enterprise credit rating model to each participant; each participant computes intermediate results such as gradients from its local feature data, encrypts them with homomorphic encryption, and uploads them to the server.
In an embodiment of the scheme, the trusted third party acts as the server and performs global aggregation of the participants' local calculation results to update the enterprise credit rating model; the updated model is then distributed to each participant for the next round of calculation.
Specifically, the third-party server aggregates and updates the enterprise credit rating model. From the local calculation results uploaded by all participants, the server performs global gradient aggregation, decryption and related processing, and sends the results back to all participants to complete the round of model updating.
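To make the encrypted aggregation concrete, here is a toy additively homomorphic example. The description only says "homomorphic encryption"; the choice of the Paillier scheme, the tiny key, the fixed-point scale and every name below are assumptions made for illustration. The sketch also assumes that the trusted third party owns the key pair, so that participants encrypt with the public modulus and only the server decrypts the aggregate.

```python
import math
import random

# Toy Paillier key from two Mersenne primes: illustration only, not secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                      # public key (generator g = N + 1 is implied)
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key part
MU = pow(LAMBDA, -1, N)                                 # private key part
SCALE = 10**6                  # fixed-point scale for encoding float gradients


def encrypt(value, rng):
    """Participant side: encrypt one gradient component with the public key."""
    m = round(value * SCALE) % N            # negatives wrap around modulo N
    r = rng.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(2, N)
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N_SQ


def decrypt(ciphertext):
    """Server side: decrypt with the private key and undo the fixed-point encoding."""
    m = (pow(ciphertext, LAMBDA, N_SQ) - 1) // N * MU % N
    if m > N // 2:                          # map large residues back to negatives
        m -= N
    return m / SCALE


rng = random.Random(2022)
gradients = [0.125, -0.5, 0.25]             # one hypothetical value per participant
ciphertexts = [encrypt(g, rng) for g in gradients]

aggregate = ciphertexts[0]
for c in ciphertexts[1:]:
    aggregate = add_encrypted(aggregate, c)

total = decrypt(aggregate)
print(total)
```

The server only ever decrypts the sum, so individual contributions stay hidden as long as the server does not decrypt the uploads one by one; that trust assumption is the one the description places on the third party.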
In the step of "obtaining the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model", the feature data of each participant is input into the enterprise credit rating model separately to obtain an optimization index, and the Shapley-value marginal contribution of each feature data is calculated based on that index.
The logic of the scheme is that the contribution of a piece of feature data is judged by the marginal performance improvement it brings to the enterprise credit rating model when it participates in the model; in general, the larger the contribution of a piece of feature data, the greater its importance in the enterprise credit rating model.
Specifically, the method simulates the model performance improvement brought by each feature data of each participant taking part in the enterprise credit rating model, and calculates the Shapley-value marginal contribution of each feature data based on the accuracy of the enterprise credit rating.
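The idea of scoring a feature by the accuracy gain it brings can be illustrated with the classical Shapley value on a toy problem. The feature names and the accuracy table `toy_accuracy` are invented for this sketch; in the scheme, v(Q) would come from training and evaluating the federated model on the feature subset Q, which is far more expensive than the lookup used here.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of every feature for a coalition value function."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        phi = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                q = frozenset(subset)
                phi += weight * (value(q | {f}) - value(q))
        result[f] = phi
    return result


def toy_accuracy(q):
    """Hypothetical rating accuracy when the feature subset q is used."""
    acc = 0.50
    if "revenue" in q:
        acc += 0.20
    if "debt_ratio" in q or "debt_proxy" in q:   # two redundant features
        acc += 0.10
    return acc


FEATURES = ["revenue", "debt_ratio", "debt_proxy"]
phi = shapley_values(FEATURES, toy_accuracy)
for name, score in phi.items():
    print(name, round(score, 4))
```

The two redundant features split their shared gain equally, which shows why highly correlated features from different participants each look less valuable than a feature carrying independent information.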
The scheme optimizes the calculation formula of the marginal contribution as follows:
[Formula image DEST_PATH_IMAGE002 is not reproduced in this text.]
where S denotes the global feature set consisting of all feature data of the N participants, v(Q) denotes the value of the federated model when a feature subset Q ⊆ S is used for modeling, and φ_i(v) denotes the Shapley-value marginal contribution of a single feature data i of a participant to the enterprise credit rating model.
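For reference, the classical Shapley value written with the symbols defined above is given below. This is the textbook form; whether the formula in the patent image matches it term by term is an assumption, since the image is not available here.

```latex
\phi_i(v) \;=\; \sum_{Q \subseteq S \setminus \{i\}}
  \frac{|Q|!\,\bigl(|S| - |Q| - 1\bigr)!}{|S|!}\,
  \bigl( v(Q \cup \{i\}) - v(Q) \bigr)
```

Each term is the gain in model value when feature data i joins the subset Q, weighted by the fraction of orderings of S in which exactly the features of Q precede i.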
After the feature data with the largest marginal contribution is obtained, the scheme screens feature combinations with a greedy algorithm. In contrast to an exact optimization algorithm, a greedy algorithm is guided by a heuristic and searches for a solution that is typically only locally optimal but can come as close as possible to the optimum.
Specifically, "iteratively screening valid feature data with a greedy algorithm starting from the initial feature data" includes the following steps:
S1: integrate all feature data of all participants into a global feature set S, and select the initial feature data as the first feature of the model, P_1 = p_1;
S2: fix the currently selected feature combination P_k, combine each remaining feature in the global feature set S with P_k in turn to obtain candidate feature combinations containing k+1 feature data, calculate the Shapley-value marginal contribution of every such candidate, and select the candidate with the largest contribution, P_{k+1} = (p_1, p_2, ..., p_k, p_{k+1});
S3: calculate the incremental Shapley-value contribution Δφ_P of the newly added feature data p_{k+1} in the candidate feature combination relative to the feature combination P_k:
[Formula image DEST_PATH_IMAGE004 is not reproduced in this text.]
S4: determine whether the increment Δφ_P is smaller than the set threshold γ; if not, repeat S2 to S4; if so, the newly added feature is considered to contribute little to the federated model, and the screened feature combination P_m = (p_1, p_2, ..., p_m) is obtained.
In other words, the scheme judges whether a piece of feature data is valuable by calculating the increment that it adds to the marginal contribution of the whole feature combination. If it is valuable, it is merged into the feature combination, so that the final feature combination contributes data to the federated model as effectively as possible and the enterprise credit rating model trained on it meets the requirements of federated learning.
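Steps S1 to S4 can be sketched as a short greedy loop. Several points are assumptions rather than statements of the patent: a combination is scored by treating it as one merged player in a Shapley computation, the increment is taken as the difference between the scores of P_{k+1} and P_k, the candidate that falls below the threshold γ is discarded rather than kept, and the feature names and the value function `toy_value` are invented.

```python
from itertools import combinations
from math import factorial


def group_shapley(group, all_features, value):
    """Shapley value of `group` when it is treated as one merged player."""
    group = frozenset(group)
    rest = [f for f in all_features if f not in group]
    n = len(rest) + 1                      # merged player plus remaining features
    total = 0.0
    for size in range(len(rest) + 1):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(rest, size):
            q = frozenset(subset)
            total += weight * (value(q | group) - value(q))
    return total


def greedy_screen(all_features, value, gamma):
    """Greedy feature-combination screening following steps S1 to S4."""
    def score(combo):
        return group_shapley(combo, all_features, value)

    # S1: start from the single feature with the largest Shapley value.
    selected = [max(all_features, key=lambda f: score([f]))]
    current = score(selected)
    while len(selected) < len(all_features):
        remaining = [f for f in all_features if f not in selected]
        # S2: extend the fixed combination by the best remaining feature.
        best = max(remaining, key=lambda f: score(selected + [f]))
        best_score = score(selected + [best])
        # S3: increment contributed by the newly added feature.
        delta = best_score - current
        # S4: stop when the increment falls below the threshold gamma.
        if delta < gamma:
            break
        selected.append(best)
        current = best_score
    return selected


def toy_value(q):
    """Hypothetical model accuracy for the feature subset q."""
    acc = 0.50
    if "revenue" in q:
        acc += 0.20
    acc += max(0.10 if "debt_ratio" in q else 0.0,
               0.08 if "debt_proxy" in q else 0.0)   # largely redundant pair
    if "tax_record" in q:
        acc += 0.05
    return acc


FEATURES = ["revenue", "debt_ratio", "debt_proxy", "tax_record"]
chosen = greedy_screen(FEATURES, toy_value, gamma=0.045)
print(chosen)
```

In this toy run the largely redundant `debt_proxy` feature is the one left out. The result is sensitive to the threshold, so γ would have to be tuned against the cost of acquiring each additional feature.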
In addition, in "calculating the Shapley-value marginal contribution of all candidate feature combinations containing k+1 feature data", the marginal contribution of a candidate combination of several feature data is calculated as a group Shapley value. Under the assumption that the Shapley group interaction index is zero, the group Shapley value of a candidate feature combination P is calculated as follows:
[Formula image DEST_PATH_IMAGE006 is not reproduced in this text.]
where S denotes the global feature set consisting of all feature data of the N participants, v(Q) denotes the value of the federated model when a feature subset Q ⊆ S is used for modeling, and φ_P(v) denotes the Shapley-value marginal contribution of the feature combination P to the federated model.
With the marginal contribution formula optimized in this scheme, when the Shapley value of a feature subset is calculated as its marginal contribution, the multiple feature data are treated as a single combined feature. There is no need to calculate a Shapley value separately for every feature data in the subset, so the contribution of a feature combination is obtained more easily during feature screening and calculation efficiency improves.
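One common way to write a group Shapley value in which the combination P acts as a single merged player is shown below, together with the increment used for the stopping test. Both expressions are reconstructions from the surrounding definitions and from the general literature on group Shapley values; they are assumptions, not transcriptions of the missing patent images.

```latex
\phi_P(v) \;=\; \sum_{Q \subseteq S \setminus P}
  \frac{|Q|!\,\bigl(|S| - |Q| - |P|\bigr)!}{\bigl(|S| - |P| + 1\bigr)!}\,
  \bigl( v(Q \cup P) - v(Q) \bigr),
\qquad
\Delta\phi_P \;=\; \phi_{P_{k+1}}(v) - \phi_{P_k}(v)
```

For |P| = 1 the first expression reduces to the classical Shapley value of a single feature, and for P = S it equals v(S) − v(∅), the total value of the full feature set.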
In summary, the scheme provides a feature combination screening method for multi-party enterprise joint credit rating and an application thereof. For the multi-party joint modeling scenario of vertical federated learning, each participant performs local calculation and encryption on its own feature data, and the server then performs global model aggregation to obtain an enterprise credit rating model with better performance. The contribution of each single feature data is evaluated by its Shapley-value marginal contribution to the global enterprise credit rating model, and a heuristic greedy algorithm is then used for the subsequent feature combination screening. The marginal contribution of a combination of several features is calculated as a group Shapley value; under the assumption that the Shapley group interaction index is zero, a simple way of calculating the group Shapley value is obtained in which the features are merged into one combined feature. Feature screening yields the feature combination that most effectively secures the performance of the credit rating model, improving the effect of the multi-party enterprise credit rating model as far as possible.
In a second aspect, the scheme provides an enterprise credit rating model, which is obtained by training on the feature combination screened by the above feature combination screening method for multi-party enterprise joint credit rating.
The enterprise credit rating model trained under this scheme can use a limited amount of feature data from multiple participants to evaluate enterprise credit ratings economically and effectively.
Example two
Based on the same technical concept, the scheme provides a feature combination screening device for multi-party enterprise joint credit rating, including:
a marginal contribution obtaining unit, configured to construct an enterprise credit rating model through vertical federated learning in which at least two participants take part in modeling, and to obtain the marginal contribution, measured as a Shapley value, of each participant's feature data to the trained enterprise credit rating model;
and a screening unit, configured to select the feature data with the largest marginal contribution as initial feature data, iteratively screen valid feature data with a greedy algorithm starting from the initial feature data, and integrate the initial feature data and the valid feature data into a feature combination.
Example three
This embodiment further provides an electronic device. Referring to FIG. 4, it includes a memory 404 and a processor 402, where the memory 404 stores a computer program and the processor 402 is configured to run the computer program to perform the steps of any of the above embodiments of the feature combination screening method for multi-party enterprise joint credit rating.
Specifically, the processor 402 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 404 may include mass storage for data or instructions. By way of example and not limitation, the memory 404 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 404 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 404 is non-volatile memory. In particular embodiments, the memory 404 includes read-only memory (ROM) and random-access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 404 may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by processor 402.
Processor 402 reads and executes computer program instructions stored in memory 404 to implement any of the above-described embodiments for implementing a method for feature combination screening for credit rating for multi-party enterprises.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device 406 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
The input/output device 408 is used to input or output information. In this embodiment, the input information may be the feature data of each participant, and the output information may be the feature combination or the credit rating of an enterprise.
Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by means of a computer program:
constructing an enterprise credit rating model in which at least two participants take part in modeling through vertical federated learning, and acquiring the Shapley value marginal contribution of each participant's feature data to the trained enterprise credit rating model;
selecting the feature data with the largest marginal contribution as initial feature data, iteratively screening effective feature data with a greedy algorithm starting from the initial feature data, and combining the initial feature data and the effective feature data into a feature combination.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
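By way of illustration only, the first of the two steps above (obtaining a Shapley value marginal contribution for each feature and choosing the initial feature) might be sketched in Python as follows. This is a minimal sketch under stated assumptions: `toy_value` is a hypothetical stand-in for the optimization index that the trained enterprise credit rating model would return for a given feature subset, and the feature names and numbers in `GAIN` are invented for the example. The exact enumeration over subsets is practical only for a small number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for the set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r that does not contain f.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical per-feature gains; NOT taken from the patent.
GAIN = {"revenue": 0.20, "tax_arrears": 0.10, "lawsuits": 0.05}


def toy_value(subset):
    """Assumed stand-in for the model's optimization index on `subset`."""
    total = sum(GAIN[f] for f in subset)
    # Assumed redundancy: two overlapping features add less together.
    if {"revenue", "tax_arrears"} <= subset:
        total -= 0.04
    return total


phi = shapley_values(list(GAIN), toy_value)
# The feature with the largest marginal contribution is the initial feature.
initial = max(phi, key=phi.get)
```

With this toy function the sketch selects `revenue` as the initial feature. In a real deployment the value function would be evaluated on the federated model rather than on a lookup table.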
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A feature combination screening method for multi-party enterprise joint credit rating, characterized by comprising the following steps:
constructing an enterprise credit rating model in which at least two participants take part in modeling through vertical federated learning, and acquiring the Shapley value marginal contribution of each participant's feature data to the trained enterprise credit rating model;
selecting the feature data with the largest marginal contribution as initial feature data, iteratively screening effective feature data with a greedy algorithm starting from the initial feature data, and combining the initial feature data and the effective feature data into a feature combination.
2. The method of claim 1, wherein the step of constructing an enterprise credit rating model in which at least two participants take part in modeling through vertical federated learning comprises the following steps:
acquiring the feature data of each participant, wherein the feature data are labeled with enterprise credit rating results, and the feature data of the participants share the same or overlapping sample spaces but have different feature spaces;
performing sample alignment on the feature data of each participant, and inputting the aligned feature data, after calculation and encryption, into the enterprise credit rating model for learning;
and updating the enterprise credit rating model after global aggregation of the feature data uploaded by each participant.
3. The feature combination screening method for multi-party enterprise joint credit rating according to claim 2, wherein a trusted third-party server distributes the enterprise credit rating model to each participant; each participant computes on its local feature data to obtain a local calculation result, encrypts the result by homomorphic encryption, and uploads it to the trusted third-party server; and the third-party server aggregates the local calculation results uploaded by the participants, updates the enterprise credit rating model accordingly, and distributes the updated model to each participant again.
4. The feature combination screening method for multi-party enterprise joint credit rating according to claim 1, wherein in the step of "acquiring the Shapley value marginal contribution of each participant's feature data to the trained enterprise credit rating model", the feature data of each participant are respectively input into the enterprise credit rating model to obtain an optimization index, and the Shapley value marginal contribution of each feature data is calculated based on the optimization index.
5. The feature combination screening method for multi-party enterprise joint credit rating according to claim 1, wherein "iteratively screening effective feature data with a greedy algorithm starting from the initial feature data" comprises the following steps:
S1, integrating all feature data of all participants into a global feature set $S$, and selecting the initial feature data as the first feature of the model, $P_1 = p_1$;
S2, fixing the previously selected feature combination $P_k$, combining each remaining feature in the global feature set $S$ in turn with $P_k$ to obtain candidate feature combinations each containing $k+1$ feature data, calculating the Shapley value marginal contribution of every candidate feature combination containing $k+1$ feature data, and selecting the candidate feature combination with the largest contribution, $P_{k+1} = (p_1, p_2, \ldots, p_k, p_{k+1})$;
S3, calculating the increment in Shapley value contribution brought by the newly added feature data $p_{k+1}$ in the candidate feature combination relative to the feature combination $P_k$;
S4, judging whether the increment $\Delta\phi_P$ is smaller than the set threshold $\gamma$; if not, repeating S2 to S4; if so, obtaining the screened feature combination.
6. The method of claim 5, wherein, when calculating the marginal contribution of every candidate feature combination containing $k+1$ feature data, the marginal contribution of a candidate feature combination composed of a plurality of feature data is calculated as a group Shapley value.
7. An enterprise credit rating model, characterized in that the enterprise credit rating model is obtained by training on the feature combination screened by the feature combination screening method for multi-party enterprise joint credit rating of any one of claims 1 to 6.
8. A feature combination screening device for multi-party enterprise joint credit rating, comprising:
a marginal contribution acquisition unit, configured to construct an enterprise credit rating model in which at least two participants take part in modeling through vertical federated learning, and to acquire the Shapley value marginal contribution of each participant's feature data to the trained enterprise credit rating model;
and a screening unit, configured to select the feature data with the largest marginal contribution as initial feature data, iteratively screen effective feature data with a greedy algorithm starting from the initial feature data, and combine the initial feature data and the effective feature data into a feature combination.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the feature combination screening method for multi-party enterprise joint credit rating of any one of claims 1 to 6.
10. A readable storage medium having stored thereon a computer program comprising program code for controlling a processor to execute a process comprising the feature combination screening method for multi-party enterprise joint credit rating of any one of claims 1 to 6.
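For illustration only, the greedy screening of steps S1 to S4 in claim 5, using a group Shapley value as mentioned in claim 6, might be sketched in Python as follows. Several points are assumptions rather than statements of the claimed method: the group Shapley value is computed here by treating the candidate combination as a single merged player against the remaining individual features, the candidate whose increment falls below the threshold is not added to the result, and `toy_value`, `GAIN`, and `FEATURES` are hypothetical stand-ins for the optimization index of the trained model and for the global feature set.

```python
from itertools import combinations
from math import factorial


def group_shapley(group, all_features, value):
    """Shapley value of `group` treated as one merged player (assumed form)."""
    group = frozenset(group)
    others = [f for f in all_features if f not in group]
    n = len(others) + 1  # remaining features plus the merged group
    total = 0.0
    for r in range(len(others) + 1):
        weight = factorial(r) * factorial(n - r - 1) / factorial(n)
        for subset in combinations(others, r):
            s = frozenset(subset)
            total += weight * (value(s | group) - value(s))
    return total


def greedy_screen(all_features, value, gamma):
    """Greedy forward screening following steps S1 to S4."""
    def score(group):
        return group_shapley(group, all_features, value)

    # S1: start from the feature with the largest Shapley value (P_1 = p_1).
    selected = [max(all_features, key=lambda f: score([f]))]
    current = score(selected)
    while len(selected) < len(all_features):
        # S2: extend P_k by each remaining feature and keep the best candidate.
        remaining = [f for f in all_features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        best_phi = score(selected + [best])
        # S3: increment contributed by the newly added feature p_{k+1}.
        delta = best_phi - current
        # S4: stop once the increment drops below the threshold gamma.
        if delta < gamma:
            break
        selected.append(best)
        current = best_phi
    return selected


# Hypothetical per-feature gains; NOT taken from the patent.
GAIN = {"revenue": 0.20, "tax_arrears": 0.10, "lawsuits": 0.05, "region_code": 0.0}
FEATURES = list(GAIN)


def toy_value(subset):
    """Assumed stand-in for the model's optimization index on `subset`."""
    total = sum(GAIN[f] for f in subset)
    if {"revenue", "tax_arrears"} <= subset:
        total -= 0.04  # assumed redundancy between two features
    return total


screened = greedy_screen(FEATURES, toy_value, gamma=0.01)
```

With these toy numbers the uninformative `region_code` feature is left out, because adding it does not raise the group Shapley value by at least `gamma`. Each call to `group_shapley` enumerates every subset of the remaining features, so a practical system would likely need sampling or another approximation.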
CN202211146961.4A 2022-09-21 2022-09-21 Characteristic combination screening method for multi-party enterprise joint credit rating and application Pending CN115238148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211146961.4A CN115238148A (en) 2022-09-21 2022-09-21 Characteristic combination screening method for multi-party enterprise joint credit rating and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211146961.4A CN115238148A (en) 2022-09-21 2022-09-21 Characteristic combination screening method for multi-party enterprise joint credit rating and application

Publications (1)

Publication Number Publication Date
CN115238148A true CN115238148A (en) 2022-10-25

Family

ID=83681198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211146961.4A Pending CN115238148A (en) 2022-09-21 2022-09-21 Characteristic combination screening method for multi-party enterprise joint credit rating and application

Country Status (1)

Country Link
CN (1) CN115238148A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809526A (en) * 2015-05-07 2015-07-29 上海交通大学 Redundant data utility maximization method
CN107391433A (en) * 2017-06-30 2017-11-24 天津大学 A kind of feature selection approach based on composite character KDE conditional entropies
CN111081321A (en) * 2019-12-18 2020-04-28 江南大学 CNS drug key feature identification method
CN112508199A (en) * 2020-11-30 2021-03-16 同盾控股有限公司 Feature selection method, device and related equipment for cross-feature federated learning
CN113947213A (en) * 2021-10-19 2022-01-18 中国电信股份有限公司 Method, device, storage medium and equipment for measuring contribution of federal learning participants
CN114021735A (en) * 2021-10-22 2022-02-08 中国银联股份有限公司 Method and device for processing data in federated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221025