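WO2022057108A1 - Federated learning-based personal qualification evaluation method, device, system, and storage medium - Google Patents
Federated learning-based personal qualification evaluation method, device, system, and storage medium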
- Publication number
- WO2022057108A1 (application PCT/CN2020/135276)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- evaluation
- model parameters
- evaluation sub
- sub
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Definitions
- the present invention relates to the technical field of big data, and in particular, to a method, device, system and storage medium for evaluating personal qualifications based on federated learning.
- the traditional personal qualification assessment method requires a large number of manual (auditor) participation, and at the same time has a high risk of privacy leakage, human manipulation risk and fraud risk.
- User data is mainly provided by the applicant; the approval agency then manually verifies the accuracy of the information, evaluates the applicant's credit according to an internally established set of evaluation methods, and then determines whether to grant credit and the credit amount. Typical examples are the expert evaluation method and the scoring evaluation method.
- various personal qualification evaluation models based on machine learning algorithms have been proposed.
- Federated learning is an emerging foundational artificial intelligence technology. Its design goal is to carry out efficient machine learning among multiple participants or multiple computing nodes while ensuring information security during big data exchange and ensuring legal compliance.
- the existing federated learning-based evaluation system generally includes participants and a central server (coordinator), wherein each participant uses its local data to train the target model, obtains the gradient of the target model, and sends it to the coordinator.
- the coordinator integrates the gradients of the participants, obtains the updated gradient of the target model, and returns it to each participant.
- each participant retrains the target model based on the updated gradient and its local data and sends the resulting gradient to the coordinator again, iterating until the final target model is obtained.
- a first aspect of the present invention provides a method for evaluating individual qualifications based on federated learning, the technical solution of which is as follows:
- a method for evaluating individual qualifications based on federated learning which runs on a central server, including:
- the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model and the model parameters of the third evaluation sub-model are integrated to obtain the integrated model parameters, and the integrated model parameters are distributed to the intelligent terminal and the local participants for model updating.
- a second aspect of the present invention provides a federated learning-based personal qualification assessment device, which runs on a central server, and includes:
- the first acquisition module is used to acquire the model parameters of the first evaluation sub-model sent by the intelligent terminal, wherein the first evaluation sub-model is obtained by the intelligent terminal through training based on the preprocessed user behavior data on the intelligent terminal;
- a first training module used for acquiring preprocessed external user data sent by at least one external participant, and training based on the external user data to obtain a second evaluation sub-model and its model parameters;
- a gradient update module configured to obtain the gradients of at least two third evaluation sub-models sent by at least two local participants, perform a weighted average of the obtained gradients to generate an average gradient, update the model parameters of the third evaluation sub-model based on the average gradient, and send the updated model parameters to each of the local participants so that each of the local participants retrains the third evaluation model, wherein the third evaluation model is obtained by the local participant through training based on the preprocessed local user data;
- the integration module is used to integrate the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model and the model parameters of the third evaluation sub-model to obtain the integrated model parameters, and the integrated model parameters are distributed to the intelligent terminal and the local participants for model updating.
- a third aspect of the present invention provides a federated learning-based personal qualification evaluation method, which runs on an intelligent terminal, and includes:
- the first evaluation sub-model is obtained by training based on the preprocessed user behavior data on the intelligent terminal, and the model parameters of the first evaluation sub-model are sent to the central server;
- the integrated model parameters generated by the central server include:
- the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model are integrated to obtain the integrated model parameters.
- a fourth aspect of the present invention provides a federated learning-based personal qualification assessment device, which runs on an intelligent terminal, and includes:
- the second training module is used for obtaining the first evaluation sub-model based on the preprocessed user behavior data on the intelligent terminal, and sending the model parameters of the first evaluation sub-model to the central server;
- An update module for receiving the integrated model parameters generated by the central server, and updating the first evaluation sub-model based on the integrated model parameters, wherein:
- the integrated model parameters generated by the central server include:
- the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model are integrated to obtain the integrated model parameters.
- a fifth aspect of the present invention provides a federated learning-based personal qualification evaluation system, which includes an intelligent terminal, at least one external participant, at least two local participants, and a central server, wherein:
- the intelligent terminal obtains a first evaluation sub-model based on the preprocessed user behavior data on the intelligent terminal, and sends the model parameters of the first evaluation sub-model to the central server;
- the external participant sends the preprocessed external user data to the central server, and the central server obtains the second evaluation sub-model and its model parameters through training based on the external user data;
- the local participant sends the gradient of the third evaluation sub-model to the central server, and the central server performs a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, based on the average gradient updating the model parameters of the third evaluation sub-model and sending the updated model parameters to each of the local participants so that each of the local participants retrains the third evaluation model;
- the central server integrates the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain the integrated model parameters, and the integrated model parameters are distributed to the smart terminal and the local participants for model updating.
- a sixth aspect of the present invention provides a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the following operations:
- the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model and the model parameters of the third evaluation sub-model are integrated to obtain the integrated model parameters, and the integrated model parameters are distributed to the intelligent terminal and the local participants for model updating.
- the strategy for qualification evaluation based on federated learning of the present invention can achieve the following technical effects on the premise of achieving the evaluation effect:
- After the intelligent terminal completes the model training by itself, it provides the model parameters to the central server without frequent gradient exchange with the central server, so that the intelligent terminal can join the evaluation system as a participant and users can quickly obtain evaluation results through the intelligent terminal.
- Fig. 1 is the implementation environment involved in the personal qualification evaluation method provided by the embodiment of the present invention.
- FIG. 2 is an architecture diagram of a federated learning-based personal qualification assessment system in an embodiment of the present invention
- FIG. 3 is a flowchart of a method for evaluating individual qualifications based on federated learning in an embodiment of the present invention
- FIG. 4 is a flowchart of a method for evaluating individual qualifications based on federated learning in an embodiment of the present invention
- FIG. 5 is a structural block diagram of an apparatus for evaluating personal qualifications based on federated learning in an embodiment of the present invention
- FIG. 6 is a flowchart of a method for evaluating individual qualifications based on federated learning in an embodiment of the present invention
- FIG. 7 is a flowchart of a method for evaluating individual qualifications based on federated learning in an embodiment of the present invention.
- FIG. 8 is a structural block diagram of an apparatus for evaluating personal qualifications based on federated learning in an embodiment of the present invention.
- FIG. 9 is a flow chart of the execution of the personal qualification evaluation method of the present invention in a specific application example.
- the existing federated learning-based evaluation system includes participants and a central server (also called a coordinator), wherein each participant uses its own local data to train the target model, obtains the gradient of the target model, and sends it to the coordinator; the coordinator integrates the gradients of the participants, obtains the updated gradient of the target model, and returns it to each participant; each participant retrains the target model based on the updated gradient and its local data and sends the resulting gradient to the coordinator again, iterating until the final target model is obtained.
- the intelligent terminal completes the training of the first evaluation sub-model by itself based on the behavior data of the user on the intelligent terminal, and provides the parameters of the first evaluation sub-model to the central server.
- the external participants only provide their preprocessed data to the central server, and the central server uses these data to train the model, thereby obtaining the model parameters of the second evaluation sub-model for the external user data.
- Each local participant adopts the existing federated learning strategy, realizes the training of the third evaluation sub-model through frequent gradient exchange with the central server, and obtains the model parameters of the third evaluation sub-model for local user data.
- the smart terminal, external participants, and local participants each obtain an evaluation model trained on their respective user data; the only difference is that the model training task of the external participants is carried out by the central server.
- the central server finally integrates the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain the integrated model parameters and sends the integrated model parameters to the smart terminal and the local participants.
- the first evaluation sub-model located on the smart terminal, the second evaluation sub-model located on the central server and the third evaluation sub-model located at each local participant are updated to a unified global qualification evaluation model.
- the smart terminal, the central server and the local participants can each carry out the qualification evaluation for the user, and the evaluation results should be largely consistent.
- the present invention provides a federated learning-based personal qualification assessment method, device, system, and storage medium.
- FIG. 1 is an implementation environment involved in the personal qualification evaluation method provided by an embodiment of the present invention. As shown in FIG. 1 , the implementation environment includes four layers, which are:
- the model training data required by each participant is located in the storage layer, and the data is stored in various business databases in various formats.
- multiple data converters are deployed in the data access layer to convert data in various formats into a unified data format.
- the data access layer converts the participants' disorganized internal data storage forms into unified, structured data and provides a consistent Hive interface to the outside for access to the big data platform.
- the data processing layer can implement:
- Missing-value handling: for example, after the missing rate of the data is computed, discard data whose missing rate exceeds a predetermined threshold (such as 60%), fill discrete data with the mode, and fill continuous data by nearest-neighbor interpolation or mean interpolation.
- Outlier detection: for example, use the isolation forest method to detect outliers in the data and discard outliers at a rate of 10%.
- Data binning: for example, select appropriate data intervals to complete the data binning operation.
- Feature encoding: for example, use the WOE (weight of evidence) encoding method to encode the data.
- Data dimensionality reduction: for example, use principal component analysis to reduce the dimensionality of the data and eliminate redundant features.
- Data balancing: for example, use the SMOTE oversampling method to balance the negative sample data to compensate for the model overfitting problem caused by too few samples.
- Sample alignment: for example, data obtained from external participants must be processed with sample alignment technology.
- after preprocessing by the data processing layer, the data meets the model training requirements.
- after being processed by the data processing layer, the heterogeneous data from different business databases has been converted into feature data that can be directly input into the model, and the IDs of the trainable sample data have also been aligned.
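The missing-value handling step of the data processing layer can be illustrated with a minimal sketch. It assumes, beyond what the source states, that the 60% threshold is applied per feature column, that the table is a plain dictionary of columns, and that continuous features use mean filling (one of the two interpolation options named above). The function name `handle_missing` and the sample column names are hypothetical.

```python
from statistics import mean, mode
from typing import Dict, List, Set


def handle_missing(table: Dict[str, List], discrete: Set[str],
                   max_missing_rate: float = 0.6) -> Dict[str, List]:
    """Drop columns that are mostly missing and fill the rest.

    Assumption: None marks a missing value and the threshold is per column.
    """
    cleaned: Dict[str, List] = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not values or not present:
            continue  # nothing usable in this column
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing_rate:
            continue  # discard columns above the threshold (e.g. 60%)
        # Mode filling for discrete features, mean filling for continuous ones.
        fill = mode(present) if name in discrete else mean(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical sample data for illustration only.
table = {
    "education_level": ["high", None, "high", "college"],
    "income": [1000.0, None, 3000.0, 2000.0],
    "mostly_empty": [None, None, None, 5.0],
}
cleaned = handle_missing(table, discrete={"education_level"})
```

In this example the column `mostly_empty` has a missing rate of 75% and is dropped, while the gaps in the other two columns are filled.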
- the personal qualification evaluation system of the present invention is arranged in the federation layer; each participant in the evaluation system communicates with the central server to train the model, and finally a globally unified model is formed. Subsequent embodiments describe the specific model training process of the personal qualification evaluation system in detail.
- It can implement business logic including user information registration, background data review, evaluation standard formulation, qualification score generation, evaluation model fine-tuning, user tag access, metadata information annotation, and visual information display.
- It can provide user information pages, global data visualization pages, background data management pages, etc.
- FIG. 2 shows the personal qualification evaluation system based on federated learning in this embodiment.
- the personal qualification evaluation system includes at least an intelligent terminal 100 , an external participant 200 , a local participant 300 and a central server 400 .
- the smart terminal 100 may be a user's smart phone, a palmtop computer, or the like.
- the smart terminal 100 is equipped with various consumer and credit APPs, and historical behavior data of the user, such as the user's consumption data, credit data, and the user's personal information, can be obtained from these APPs.
- the smart terminal 100 is also equipped with relevant program modules capable of implementing the model training task of the present invention.
- the smart terminal 100 performs data interaction with the central server 400 through a wireless network, thereby realizing the federated learning task of the present invention.
- the local participant 300 and the central server 400 generally belong to the same interest group, which is the initiator or beneficiary of the personal qualification assessment, and the data interaction between the local participant 300 and the central server 400 is relatively convenient, and There is generally no data island problem.
- the external participant 200 and the local participant 300 belong to different interest groups.
- the data access interface provided by the external participant 200 to the central server 400 is subject to various restrictions, and the model training data provided to the central server 400 must also undergo relevant encryption processing and the like.
- For example, Tencent needs to evaluate the credit status of customers.
- To do so, it needs to use not only the user data of the various business departments within Tencent (such as WeChat and QQ) but also the user data stored in Pinduoduo's database. In this case, each business department within Tencent (such as WeChat and QQ) is a local participant 300, while Pinduoduo is the external participant 200.
- both the local participant 300 and the central server 400 are equipped with relevant program modules for implementing model training tasks, while the external participant 200 only provides a data interface.
- the model training process of the intelligent terminal 100 , the external participant 200 , the local participant 300 , and the central server 400 is roughly as follows.
- As mentioned above, there is a large amount of user behavior data on the smart terminal 100, the authenticity and timeliness of this behavior data are very high, the smart terminal is equipped with relevant program modules for model training tasks, and the smart terminal 100 has relatively strong computing power.
- the only defect is that the communication capability of the intelligent terminal 100 is poor, and it is difficult to achieve continuous interaction with the central server 400 .
- the smart terminal 100 obtains the user's daily payment order information, website access records, loan information and other behavior data under the premise of the user's authorization.
- the intelligent terminal 100 trains the first evaluation sub-model based on the data samples, and sends the model parameters of the trained first evaluation sub-model to the central server 400 to trigger the central server 400 to obtain the model parameters of the global qualification evaluation model.
- the central server acts as an agent for external participants to train the second evaluation sub-model
- the external participant 200 does not have model training capability, and only provides preprocessed training sample data.
- the preprocessed external user data is encrypted and provided to the central server 400 .
- the central server 400 trains the second evaluation sub-model based on the external user data, and obtains model parameters of the second evaluation sub-model.
- the local participant 300 and the central server 400 jointly train the third evaluation sub-model
- the local participant 300 and the central server 400 belong to the same interest group, and data interaction between them is convenient and efficient.
- Therefore, in order to improve the training effect, the local participant 300 and the central server 400 carry out the training of the third evaluation sub-model based on the traditional federated learning strategy, specifically:
- each local participant 300 sends the gradient of the model to the central server 400, and the central server 400 performs a weighted average on the obtained gradients to generate an average gradient. Based on the average gradient, the central server 400 updates the model parameters of the model and sends the updated model parameters to each local participant 300, and each local participant 300 retrains the respective third evaluation model. This iteration is performed until the training is completed, and the trained third evaluation model is obtained.
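One round of the gradient exchange described above can be sketched as follows. This is an illustrative simplification, not the patent's implementation: it assumes each gradient is a flat list of floats, that participants are weighted by their sample counts, and that the server applies a plain gradient-descent step with a hypothetical `learning_rate`. The source does not specify the weighting scheme or how XGBoost gradients are represented.

```python
from typing import List, Sequence


def weighted_average_gradients(gradients: Sequence[Sequence[float]],
                               weights: Sequence[float]) -> List[float]:
    """Weighted average of per-participant gradient vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(gradients[0])
    avg = [0.0] * dim
    for grad, w in zip(gradients, weights):
        if len(grad) != dim:
            raise ValueError("all gradients must have the same length")
        for i, g in enumerate(grad):
            avg[i] += (w / total) * g
    return avg


def apply_update(params: Sequence[float], avg_grad: Sequence[float],
                 learning_rate: float = 0.1) -> List[float]:
    """Server-side update (assumed here to be plain gradient descent)."""
    return [p - learning_rate * g for p, g in zip(params, avg_grad)]


# Two local participants, weighted by hypothetical sample counts.
grads = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [100, 300]
avg = weighted_average_gradients(grads, sample_counts)
new_params = apply_update([0.5, 0.5], avg, learning_rate=0.1)
```

The updated parameters in `new_params` are what the server would send back to each local participant before the next round of retraining.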
- the central server 400 obtains the global qualification evaluation model
- the first evaluation sub-model, the second evaluation sub-model and the third evaluation sub-model have all been trained, and the model parameters of the three sub-models have been provided to the central server 400 .
- the central server 400 determines the parameter weight of each sub-model according to the data distribution and data value of the different participants, obtains the integrated model parameters through a weighted average calculation, and distributes the integrated model parameters to each participant as the model parameters of the global qualification evaluation model, so that each participant can update its evaluation model.
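The weighted-average integration of the three sub-models can be sketched as below. The sketch assumes that every sub-model exposes its parameters as a flat vector of the same length and that the per-sub-model weights have already been chosen; the source does not describe how the weights are derived from data distribution and data value, nor how tree-model parameters are laid out, so the function `integrate_parameters` and the numbers used are purely illustrative.

```python
from typing import Dict, List


def integrate_parameters(sub_models: Dict[str, List[float]],
                         weights: Dict[str, float]) -> List[float]:
    """Weighted average of sub-model parameter vectors (assumed same shape)."""
    total = sum(weights[name] for name in sub_models)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(next(iter(sub_models.values())))
    integrated = [0.0] * dim
    for name, params in sub_models.items():
        if len(params) != dim:
            raise ValueError("all sub-models must have the same parameter length")
        share = weights[name] / total
        for i, p in enumerate(params):
            integrated[i] += share * p
    return integrated


# Hypothetical parameters and weights for the three sub-models.
sub_models = {
    "first": [0.2, 0.4],   # trained on the smart terminal
    "second": [0.6, 0.8],  # trained by the server on external user data
    "third": [1.0, 1.2],   # trained jointly with local participants
}
weights = {"first": 1.0, "second": 1.0, "third": 2.0}
global_params = integrate_parameters(sub_models, weights)
```

The resulting `global_params` play the role of the integrated model parameters that are distributed to the participants.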
- the evaluation model in this embodiment selects the XGBoost model.
- the XGBoost model has the function of automatic integration, which can prevent the model from overfitting, thereby improving the generalization ability of the model.
- the XGBoost model uses the first-order partial derivative and the second-order partial derivative of the cost function, and the gradient descent is faster and more accurate, and it is also conducive to the calculation of the loss function and the update and decoupling of the parameters.
- other suitable machine learning models may also be selected.
- the present invention will be further introduced below from the side of the central server and the side of the intelligent terminal.
- the method for evaluating individual qualifications based on federated learning in this embodiment includes the following steps:
- S102 Acquire preprocessed external user data sent by at least one external participant, and train based on the external user data to obtain a second evaluation sub-model and its model parameters.
- steps S101 to S103 may be performed in parallel.
- the smart terminal, local participants and the central server all have a global qualification evaluation model. At this point, the user's qualification evaluation can be implemented.
- the central server does not directly accept the personal qualification score uploaded by the smart terminal, and it needs to verify the personal qualification score before storing it.
- Tamper-proofing is achieved by storing individual qualification scores in a pre-arranged blockchain. Moreover, visitors with relevant permissions who join the blockchain can query the personal qualification score of a specific user from the blockchain.
- the blockchain in this embodiment is a consortium chain.
- This embodiment also provides a federated learning-based personal qualification evaluation device, which runs on the central server 400 .
- the device includes a first acquisition module 301, a first training module 302, a gradient update module 303 and an integration module 304, which respectively implement the method steps S101-S104 in this embodiment; details are not repeated here.
- the personal qualification evaluation apparatus in this embodiment further includes relevant functional modules for implementing the method steps S105-S107 in this embodiment.
- Embodiment: method and device running on the smart terminal
- the execution process of the present invention is described from the side of the smart terminal 100 .
- the method for evaluating individual qualifications based on federated learning in this embodiment includes the following steps:
- the integrated model parameters generated by the central server include:
- the smart terminal, local participants and the central server all have a global qualification evaluation model. At this point, the user's qualification evaluation can be implemented.
- After the smart terminal completes the evaluation and produces a personal qualification score, the score generally needs to be uploaded to the central server. Therefore, optionally, in this embodiment, as shown in FIG. 7 , the following steps are further included:
- S406 encrypting the qualification score and sending it to the central server to trigger the central server to perform: obtaining the user's second personal qualification score based on the second evaluation sub-model; comparing and verifying the first personal qualification score against the second personal qualification score; and, if the first personal qualification score and the second personal qualification score conform to a predetermined rule, storing the first personal qualification score or the second personal qualification score in the pre-arranged blockchain.
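The server-side verification and tamper-resistant storage can be sketched as follows. The source does not define the "predetermined rule", so the sketch assumes a simple absolute-difference tolerance; the consortium blockchain is stood in for by an in-memory hash-linked list built with `hashlib`. All function names, the tolerance value, and the user IDs are hypothetical, and encryption of the uploaded score is omitted.

```python
import hashlib
import json
from typing import List, Optional


def scores_consistent(first: float, second: float, tolerance: float = 5.0) -> bool:
    """Assumed rule: the two scores may differ by at most `tolerance`."""
    return abs(first - second) <= tolerance


def append_block(chain: List[dict], user_id: str, score: float) -> dict:
    """Append a record whose hash covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = {"user_id": user_id, "score": score, "prev_hash": prev_hash}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    block = dict(payload, hash=digest)
    chain.append(block)
    return block


def verify_and_store(chain: List[dict], user_id: str, first: float,
                     second: float, tolerance: float = 5.0) -> Optional[dict]:
    """Store the terminal's score only if it agrees with the server's score."""
    if not scores_consistent(first, second, tolerance):
        return None
    return append_block(chain, user_id, first)


chain: List[dict] = []
accepted = verify_and_store(chain, "user-001", 82.0, 80.5)
rejected = verify_and_store(chain, "user-002", 90.0, 60.0)
```

Because each block's hash includes the previous hash, altering a stored score would invalidate every later block, which is the tamper-resistance property the blockchain storage is meant to provide.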
- This embodiment also provides a federated learning-based personal qualification assessment device, which runs on an intelligent terminal.
- the personal qualification evaluation device includes a second training module 601 and an update module 602, which respectively implement the method steps S401-S402 in this embodiment; details are not repeated here.
- the personal qualification evaluation apparatus in this embodiment further includes relevant functional modules for implementing the method steps S405-S406 in this embodiment.
- the existing evaluation models may not be able to accurately evaluate the personal qualifications of users. Therefore, it is necessary to check the eligibility of the models before performing the qualification evaluation, so as to decide whether to evaluate directly with an existing evaluation model or to retrain the evaluation model before evaluating.
- step S403 the following steps (not shown) may also be included:
- the intelligent terminal synchronizes the model's fault tolerance rate, AUC value and F1 score from the central server and calculates the evaluation data of the first evaluation sub-model. If the first evaluation sub-model meets the standard, the qualification evaluation is executed; otherwise, the process goes to step S404.
- the central server calculates the AUC value and F1 score of the second evaluation sub-model and compares them with the preset standard model parameters. If the second evaluation sub-model meets the standard, it is sent to the intelligent terminal, which performs the qualification assessment. Otherwise, a new round of model training is performed to update the evaluation model.
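The eligibility check based on AUC and F1 score can be sketched with the standard definitions of the two metrics. The threshold values `min_auc` and `min_f1`, the decision threshold of 0.5, and the sample labels and scores are assumptions made for illustration; the source only says that the metrics are compared with preset standards.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def meets_standard(labels: Sequence[int], scores: Sequence[float],
                   min_auc: float = 0.75, min_f1: float = 0.7,
                   threshold: float = 0.5) -> bool:
    """Assumed rule: both metrics must reach their preset minimums."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return auc(labels, scores) >= min_auc and f1_score(labels, predictions) >= min_f1


# Hypothetical validation data.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]
model_auc = auc(labels, scores)
model_f1 = f1_score(labels, predictions)
```

With these sample values the AUC is 8/9 and the F1 score is 2/3, so the model passes an F1 minimum of 0.6 but would be sent back for retraining under the default minimum of 0.7.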
- This embodiment provides a computer-readable storage medium that stores one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to do the following:
- the model parameters of the first evaluation sub-model, the second evaluation sub-model, and the third evaluation sub-model are integrated to obtain the integrated model parameters, and the integrated model parameters are distributed to the intelligent terminal and the local participants for model updating.
- this specification also provides a specific application example, which uses the evaluation method of the invention to evaluate the employment qualifications of poor households and is implemented by local participants.
- the characteristics of poor households include public information such as ID number, name, age, and gender, and learning information such as health status, consumption level, education level, and income.
- the public information is shared by all participants, and the learning information is cross-stored in different participants.
- the central server coordinates model training across the participants and obtains the feature data of the learning information.
- the following takes the internal data distribution system as an example to introduce the user qualification score generation process.
- part of the process of using the evaluation method of the present invention to evaluate the employment qualifications of poor households is as follows:
- Kafka is used to collect and integrate user data items with the same ID from local distributed databases such as MySQL, SQL Server, and Oracle. The data is then exposed to Hadoop through a unified interface service for consumption.
- the integrated data items include basic data such as poor household ID, age, gender, and income; transaction information such as the order number, quantity, and product name of historical orders; medical information such as the document number, hospital type, amount, and disease name of medical insurance documents; and training-data label fields (e.g., eligible for employment support / not eligible for employment support).
- the data is summarized by summing, counting, and averaging, and the ratio of missing values is computed.
- the isolation forest method is used for outlier detection, and outliers are discarded at a rate of 10%.
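A compact, standard-library sketch of isolation-forest outlier removal is given below for illustration. It is not the implementation used in the application example; the tree count, sample size, and function names are assumptions, and only the 10% discard rate comes from the text. Points are assumed to be equal-length tuples of numbers, with at least two points supplied.

```python
import math
import random


def _c(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree by splitting on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path(tree, x, depth=0):
    """Path length of x in a tree, adjusted for unsplit points left in the leaf."""
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    return _path(left if x[dim] < split else right, x, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; higher means easier to isolate, i.e. more anomalous."""
    rng = random.Random(seed)
    m = min(sample_size, len(points))  # requires at least 2 points
    max_depth = math.ceil(math.log2(max(m, 2)))
    trees = [_build(rng.sample(points, m), 0, max_depth, rng) for _ in range(n_trees)]
    return [2.0 ** (-(sum(_path(t, x) for t in trees) / n_trees) / _c(m)) for x in points]


def drop_outliers(points, rate=0.10, **kwargs):
    """Discard the `rate` fraction of points with the highest anomaly scores."""
    scores = isolation_scores(points, **kwargs)
    n_keep = len(points) - int(len(points) * rate)
    keep = sorted(range(len(points)), key=lambda i: scores[i])[:n_keep]
    return [points[i] for i in sorted(keep)]
```

With `rate=0.10`, the 10% of records that are isolated in the fewest random splits are removed, matching the discard rate stated in the text.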
- the data distribution of each feature item is counted, appropriate data intervals are selected, and data binning is completed. The WOE encoding of each feature is then calculated, and the entire data set is aggregated based on the ID value.
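The formula for the WOE encoding is not reproduced in this text. As an illustration only, the sketch below uses one common weight-of-evidence definition, the logarithm of the ratio between a bin's share of positive samples and its share of negative samples. The bin edges, the smoothing constant `eps`, and the function name are assumptions, and sign conventions for WOE vary between sources.

```python
import math
from bisect import bisect_right


def woe_by_bin(values, labels, edges, eps=0.5):
    """Weight of evidence per bin: ln(share of positives / share of negatives).

    `edges` are ascending cut points; a value v falls in bin bisect_right(edges, v).
    `eps` is additive smoothing so that empty bins do not produce log(0).
    """
    n_bins = len(edges) + 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for v, y in zip(values, labels):
        b = bisect_right(edges, v)
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    total_pos, total_neg = sum(pos), sum(neg)
    woe = []
    for p, n in zip(pos, neg):
        share_pos = (p + eps) / (total_pos + eps * n_bins)
        share_neg = (n + eps) / (total_neg + eps * n_bins)
        woe.append(math.log(share_pos / share_neg))
    return woe
```

Each raw feature value would then be replaced by the WOE of the bin it falls in before model training.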
- principal component analysis (PCA) is used for data dimensionality reduction, aiming to eliminate redundant features and thereby address the multicollinearity problem; the smaller data size also helps data visualization.
- the SMOTE oversampling method is used to balance the negative samples, compensating for the model overfitting caused by having too few negative samples, that is, too few unqualified poor households.
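The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the exact procedure of the application example; the parameter values and the function name `smote` are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class points (tuples)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic
```

In the scenario above, the minority class would be the unqualified households, and `n_new` would be chosen so that both classes end up with comparable sample counts.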
- the XGBoost model is selected as the vocational qualification evaluation model for poor households.
- XGBoost is an advanced implementation of gradient boosting algorithms (GBM).
- XGBoost has the function of automatic integration, which can prevent the model from overfitting and improve its generalization ability.
- the XGBoost model uses both the first-order and the second-order partial derivatives of the cost function, which makes gradient descent faster and more accurate and also helps decouple the loss-function calculation from the parameter update.
- the model in each internal data distribution system performs a forward propagation and calculates the model gradient, which is encrypted and uploaded to the central server.
- the central server receives the gradients from each internal data distribution system, decrypts them, and then summarizes and integrates them. It calculates the average gradient and, according to the set model learning rate, the updated model parameters, which are then synchronized to each internal data distribution system. This is repeated several times until model training is completed.
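One aggregation round on the central server can be sketched as below, assuming gradients and parameters are plain lists of floats and that decryption has already happened. The per-participant weights follow the weighted average named in the claims, but how the weights are chosen is not stated, so treating them as caller-supplied values (for example, local sample counts) is an assumption. The function names are hypothetical.

```python
def weighted_average_gradient(gradients, weights):
    """Weighted average of per-participant gradient vectors."""
    total = sum(weights)
    dim = len(gradients[0])
    return [
        sum(w * g[j] for g, w in zip(gradients, weights)) / total
        for j in range(dim)
    ]


def update_parameters(params, avg_grad, learning_rate):
    """One gradient-descent step using the averaged gradient."""
    return [p - learning_rate * g for p, g in zip(params, avg_grad)]


def aggregation_round(params, gradients, weights, learning_rate):
    """Server side of one round: average the gradients, then update the parameters."""
    avg = weighted_average_gradient(gradients, weights)
    return update_parameters(params, avg, learning_rate)
```

The returned parameters would be sent back to every participant, each of which trains again and uploads a new gradient for the next round.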
- feature item scoring: the weighted sum of qualification scores is calculated according to the XGBoost model parameters.
- based on the qualification score, it can be determined whether the household needs employment support.
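The scoring step can be illustrated with a small sketch. The text does not specify how weights are derived from the model parameters or which side of a cutoff indicates a need for support, so the weight dictionary, the `threshold`, and the "score at or above threshold means support is needed" direction are all assumptions.

```python
def qualification_score(feature_scores, weights):
    """Weighted sum of per-feature scores; both arguments are dicts keyed by feature name."""
    return sum(weights[name] * score for name, score in feature_scores.items())


def needs_employment_support(score, threshold):
    """Assumed decision rule: support is needed when the score reaches the threshold."""
    return score >= threshold
```

For example, with feature scores `{"income": 0.2, "health": 0.5}` and weights `{"income": 2.0, "health": 1.0}`, the qualification score is 2.0 × 0.2 + 1.0 × 0.5 = 0.9.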
Claims (10)
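- A federated learning-based personal qualification evaluation method, running on a central server, comprising: obtaining model parameters of a first evaluation sub-model sent by an intelligent terminal, wherein the first evaluation sub-model is trained by the intelligent terminal on preprocessed user behavior data on the intelligent terminal; obtaining preprocessed external user data sent by at least one external participant, and training on the external user data to obtain a second evaluation sub-model and its model parameters; obtaining gradients of at least two third evaluation sub-models sent by at least two local participants, computing a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, updating the model parameters of the third evaluation sub-model based on the average gradient, and sending the updated model parameters to each local participant so that each local participant retrains the third evaluation model, wherein the third evaluation model is trained by the local participant on preprocessed local user data; and integrating the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain integrated model parameters, and distributing the integrated model parameters to the intelligent terminal and the local participants for model updating.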
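- The personal qualification evaluation method according to claim 1, wherein the preprocessing operation comprises: converting raw numeric values, strings, and ratio values into features suitable for model input, and performing missing-value filling, outlier detection, data binning, feature encoding, data dimensionality reduction, data balancing, or sample alignment on the data.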
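- The personal qualification evaluation method according to claim 2, wherein: the missing-value filling comprises discarding data whose missing rate exceeds a predetermined threshold, filling discrete data with the mode, and filling continuous data with nearest-neighbor interpolation or mean interpolation; the outlier detection uses the isolation forest method; the feature encoding uses the WOE encoding method; the data dimensionality reduction uses principal component analysis; and the data balancing uses the SMOTE oversampling method.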
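- The personal qualification evaluation method according to claim 1, further comprising: obtaining an encrypted first personal qualification score of a user sent by the intelligent terminal, the first personal qualification score being obtained by the intelligent terminal based on the first evaluation sub-model; obtaining a second personal qualification score of the user based on the second evaluation sub-model; and comparing and verifying the first personal qualification score against the second personal qualification score, and, if the first personal qualification score and the second personal qualification score conform to a predetermined rule, storing the first personal qualification score or the second personal qualification score in a pre-deployed blockchain.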
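- A federated learning-based personal qualification evaluation device, running on a central server, comprising: a first obtaining module, configured to obtain model parameters of a first evaluation sub-model sent by an intelligent terminal, wherein the first evaluation sub-model is trained by the intelligent terminal on preprocessed user behavior data on the intelligent terminal; a first training module, configured to obtain preprocessed external user data sent by at least one external participant, and to train on the external user data to obtain a second evaluation sub-model and its model parameters; a gradient update module, configured to obtain gradients of at least two third evaluation sub-models sent by at least two local participants, compute a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, update the model parameters of the third evaluation sub-model based on the average gradient, and send the updated model parameters to each local participant so that each local participant retrains the third evaluation model, wherein the third evaluation model is trained by the local participant on preprocessed local user data; and an integration module, configured to integrate the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain integrated model parameters, and to distribute the integrated model parameters to the intelligent terminal and the local participants for model updating.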
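- A federated learning-based personal qualification evaluation method, running on an intelligent terminal, comprising: training on preprocessed user behavior data on the intelligent terminal to obtain a first evaluation sub-model, and sending the model parameters of the first evaluation sub-model to a central server; and receiving integrated model parameters generated by the central server, and updating the first evaluation sub-model based on the integrated model parameters, wherein the central server generating the integrated model parameters comprises: obtaining preprocessed external user data sent by at least one external participant, and training on the external user data to obtain a second evaluation sub-model and its model parameters; obtaining gradients of at least two third evaluation sub-models sent by at least two local participants, computing a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, updating the model parameters of the third evaluation sub-model based on the average gradient, and sending the updated model parameters to each local participant so that each local participant retrains the third evaluation model, wherein the third evaluation model is trained by the local participant on preprocessed local user data; and integrating the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain the integrated model parameters.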
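- A federated learning-based personal qualification evaluation device, running on an intelligent terminal, comprising: a second training module, configured to train on preprocessed user behavior data on the intelligent terminal to obtain a first evaluation sub-model, and to send the model parameters of the first evaluation sub-model to a central server; and an update module, configured to receive integrated model parameters generated by the central server, and to update the first evaluation sub-model based on the integrated model parameters, wherein the central server generating the integrated model parameters comprises: obtaining preprocessed external user data sent by at least one external participant, and training on the external user data to obtain a second evaluation sub-model and its model parameters; obtaining gradients of at least two third evaluation sub-models sent by at least two local participants, computing a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, updating the model parameters of the third evaluation sub-model based on the average gradient, and sending the updated model parameters to each local participant so that each local participant retrains the third evaluation model, wherein the third evaluation model is trained by the local participant on preprocessed local user data; and integrating the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain the integrated model parameters.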
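- The personal qualification evaluation method according to claim 6, further comprising: obtaining a qualification score of a user based on the first evaluation sub-model and displaying the qualification score; and encrypting the qualification score and sending it to the central server, to trigger the central server to perform: obtaining a second personal qualification score of the user based on the second evaluation sub-model; and comparing and verifying the first personal qualification score against the second personal qualification score, and, if the first personal qualification score and the second personal qualification score conform to a predetermined rule, storing the first personal qualification score or the second personal qualification score in a pre-deployed blockchain.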
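- A federated learning-based personal qualification evaluation system, wherein the personal qualification evaluation system comprises an intelligent terminal, at least one external participant, at least two local participants, and a central server, wherein: the intelligent terminal trains on preprocessed user behavior data on the intelligent terminal to obtain a first evaluation sub-model, and sends the model parameters of the first evaluation sub-model to the central server; the external participant sends preprocessed external user data to the central server, and the central server trains on the external user data to obtain a second evaluation sub-model and its model parameters; the local participants send gradients of third evaluation sub-models to the central server, and the central server computes a weighted average of the obtained gradients of at least two third evaluation sub-models to generate an average gradient, updates the model parameters of the third evaluation sub-model based on the average gradient, and sends the updated model parameters to each local participant so that each local participant retrains the third evaluation model; and the central server integrates the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain integrated model parameters, and distributes the integrated model parameters to the intelligent terminal and the local participants for model updating.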
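- A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the following operations: obtaining model parameters of a first evaluation sub-model sent by an intelligent terminal, wherein the first evaluation sub-model is trained by the intelligent terminal on preprocessed user behavior data on the intelligent terminal; obtaining preprocessed external user data sent by at least one external participant, and training on the external user data to obtain a second evaluation sub-model and its model parameters; obtaining gradients of at least two third evaluation sub-models sent by at least two local participants, computing a weighted average of the obtained gradients of the at least two third evaluation sub-models to generate an average gradient, updating the model parameters of the third evaluation sub-model based on the average gradient, and sending the updated model parameters to each local participant so that each local participant retrains the third evaluation model, wherein the third evaluation model is trained by the local participant on preprocessed local user data; and integrating the model parameters of the first evaluation sub-model, the model parameters of the second evaluation sub-model, and the model parameters of the third evaluation sub-model to obtain integrated model parameters, and distributing the integrated model parameters to the intelligent terminal and the local participants for model updating.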
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010979864.8 | 2020-09-17 | ||
CN202010979864.8A CN112116103B (zh) | 2020-09-17 | 2020-09-17 | Federated learning-based personal qualification evaluation method, device, system, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022057108A1 (zh) | 2022-03-24 |
Family
ID=73799839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/135276 WO2022057108A1 (zh) | 2020-09-17 | 2020-12-10 | Federated learning-based personal qualification evaluation method, device, system, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112116103B (zh) |
WO (1) | WO2022057108A1 (zh) |
-
2020
- 2020-09-17 CN CN202010979864.8A patent/CN112116103B/zh active Active
- 2020-12-10 WO PCT/CN2020/135276 patent/WO2022057108A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112116103B (zh) | 2024-07-09 |
CN112116103A (zh) | 2020-12-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20953964; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20953964; Country of ref document: EP; Kind code of ref document: A1 |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20953964; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.09.2023) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20953964; Country of ref document: EP; Kind code of ref document: A1 |