CN111008709A - Federated learning and data risk assessment method, device and system - Google Patents


Info

Publication number
CN111008709A
CN111008709A (application CN202010162831.4A)
Authority
CN
China
Prior art keywords
gradient
data
target model
coordinator
user data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010162831.4A
Other languages
Chinese (zh)
Inventor
汲小溪
赵闻飙
王维强
傅欣艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010162831.4A
Publication of CN111008709A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/018 Certifying business or products
    • G06Q 30/0185 Product, service or business identity fraud

Abstract

The embodiments of the present specification disclose a federated learning and data risk assessment method, device and system. In the method, each data owner acts as a participant in federated learning and trains a target model based on local user data, obtains the gradient of the target model, and feeds it back to a coordinator in the federated learning, where the target model is used for assessing the risk of user data; the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model and sends the updated gradient to the data owners; and each data owner trains the target model again based on the updated gradient and its local user data, and feeds the gradient obtained by the retraining back to the coordinator.

Description

Federated learning and data risk assessment method, device and system
Technical Field
The application relates to the technical field of computers, in particular to a method, a device and a system for federated learning and data risk assessment.
Background
With the development of internet and computer technology, more and more services are handled on network platforms; for example, many merchants choose e-commerce platforms to sell goods. In many service scenarios, a client needs to submit relevant data to a network platform to prove its identity, qualifications, and the like, and the related services can be handled only after the data passes an audit. For example, when a merchant applies to enter an e-commerce platform, it needs to submit qualification certificates such as a storefront photo, a website, and a business license so that the e-commerce platform can perform an access audit on the merchant; only after the access audit is passed can the merchant sell goods through the platform.
However, network platforms have long relied on manual work to audit such data, which is inefficient.
Disclosure of Invention
The embodiment of the specification provides a method, a device and a system for federated learning and data risk assessment to improve data auditing efficiency.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
In a first aspect, a federated learning method of a data risk assessment model is provided, which includes:
each data owner, acting as a participant in federated learning, trains a target model based on local user data, obtains the gradient of the target model, and feeds it back to a coordinator in the federated learning, wherein the target model is used for evaluating risks in user data;
the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model and sends the updated gradient to the data owners;
and the data owners train the target model again based on the updated gradient and local user data, and feed the gradient obtained by the retraining back to the coordinator.
In a second aspect, a data risk assessment method is provided, including:
receiving data to be evaluated uploaded by a user;
extracting feature data from the data to be evaluated;
and determining the risk of the data to be evaluated based on the feature data and a target model, wherein the target model is obtained by training based on the method of the first aspect.
In a third aspect, a federated learning method of a data risk assessment model is provided, which includes:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In a fourth aspect, a federated learning method of a data risk assessment model is provided, which includes:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used for evaluating risks in user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owners, so that the data owners train the target model again based on the updated gradient and local user data and feed back the gradient obtained by the retraining.
In a fifth aspect, a federated learning system is provided, which includes: a coordinator and a plurality of data owners acting as participants,
the data owners are used for training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to the coordinator in federated learning, wherein the target model is used for evaluating risks in user data;
the coordinator is used for integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model and sending the updated gradient to the data owners;
and the data owners are further used for training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In a sixth aspect, a data risk assessment apparatus is provided, the apparatus comprising:
a data receiving module, used for receiving data to be evaluated uploaded by a user;
a feature extraction module, used for extracting feature data from the data to be evaluated;
and a risk assessment module, configured to determine the risk of the data to be evaluated based on the feature data and a target model, wherein the target model is obtained by training based on the method of the first aspect.
In a seventh aspect, a federated learning device of a data risk assessment model is provided, including:
a first training module, used for training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
an updated-gradient receiving module, configured to receive the updated gradient issued by the coordinator;
and a second training module, used for training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In an eighth aspect, a federated learning device of a data risk assessment model is provided, which includes:
a gradient receiving module, used for receiving a gradient fed back by a data owner, wherein the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used for evaluating risks in user data;
a gradient integration module, used for integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and an updated-gradient issuing module, used for sending the updated gradient to the data owners, so that the data owners train the target model again based on the updated gradient and local user data and feed back the gradient obtained by the retraining.
In a ninth aspect, there is provided an electronic device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In a tenth aspect, a computer-readable storage medium is provided that stores one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform operations comprising:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In an eleventh aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
In a twelfth aspect, a computer-readable storage medium is provided that stores one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
As can be seen from the technical solutions provided in the embodiments of the present specification, these solutions have at least one of the following technical effects. Because federated learning can fuse the user data of different data owners, a target model that automatically evaluates whether user data carries risk can be trained, realizing intelligent identification of user data risk; since no manual participation is needed when user data is audited, the auditing efficiency of user data can be greatly improved and labor costs are saved. In addition, federated learning allows joint modeling by all data owners without the local user data of any participant leaving its local domain. On the one hand, this guarantees the security of each participant's local user data; on the other hand, the sample data used in modeling becomes larger and more comprehensive, so that the trained model can identify the risk of user data more accurately, ultimately improving the accuracy of user data risk identification.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic diagram of a framework of a federated learning system provided in an embodiment of the present specification.
Fig. 2 is a schematic flow chart of a federated learning method of a data risk assessment model provided in an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the principle of horizontal federated learning provided by the embodiments of the present specification.
Fig. 4 is a schematic diagram of multi-modal model training provided by an embodiment of the present description.
Fig. 5 is a schematic diagram of a user data security protection scheme provided in an embodiment of the present specification.
Fig. 6 is a schematic flow chart of a data risk assessment method according to an embodiment of the present disclosure.
Fig. 7 is a schematic flowchart of a federated learning method of a data risk assessment model according to an embodiment of the present disclosure.
Fig. 8 is a schematic flowchart of a federated learning method of a data risk assessment model provided in an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Fig. 10 is a schematic structural diagram of a data risk assessment apparatus according to an embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a federated learning device of a data risk assessment model according to an embodiment of the present disclosure.
Fig. 12 is a schematic structural diagram of a federated learning device of a data risk assessment model according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to improve the auditing efficiency of user data, embodiments of the present specification provide a federated learning method, apparatus, and system for a data risk assessment model, and a data risk assessment method and apparatus. The methods and apparatuses provided by the embodiments of the present disclosure may be executed by an electronic device, such as a terminal device or a server device; in other words, the methods may be performed by software or hardware installed in the terminal device or the server device. The server includes but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like. The terminal devices include but are not limited to any smart terminal device such as a smart phone, a personal computer (PC), a notebook computer, a tablet computer, an electronic reader, a web TV, or a wearable device.
The "plurality" mentioned in the embodiments of the present specification means "two or more".
Federated learning (Federated Learning) is an emerging artificial intelligence enabling technology. Its goal is to carry out efficient machine learning among multiple parties or multiple computing nodes on the premise of ensuring the security of private data (such as terminal data and personal privacy data) and complying with laws and regulations.
To realize automatic and intelligent auditing of user data, a large amount of sample and label data must first be acquired, but it is difficult for a single organization or institution to possess a large amount of high-quality sample and label data, so multiple parties need to join together and share sample and label data. However, as the public and governments attach increasing importance to the privacy protection of user data, the samples and label data of each party cannot be directly shared, so each party's samples and label data become isolated data islands that cannot be applied. In the embodiments of the present specification, these data islands are bridged by federated learning technology, and a target model is learned jointly by multiple parties on the premise that each party's sample and label data neither leave the local domain nor leak, thereby realizing intelligent risk control of user data.
The federated learning system provided in this specification will be described with reference to fig. 1.
As shown in fig. 1, in one possible application scenario, the federated learning system may include a coordinator 11 and a plurality of local ends (e.g., local end 1, local end 2, ..., local end n) 12. The coordinator 11 may be assumed by a cloud server, a server of a participant in federated learning, or a third-party server, and the local ends 12 may represent data owners (e.g., data owner 1, data owner 2, ..., data owner n), where a data owner is an organization, or a department within an organization, that has locally accumulated user data.
It should be understood that, in the present specification, the functions performed by the data owner are actually performed by an electronic device, such as a terminal or a server of the data owner.
In the federated learning system shown in fig. 1, the coordinator 11 first deploys an initial target model for the same learning objective to each local end 12; after receiving the initial target model, each local end 12 trains it based on local user data, obtains the gradient of the target model, and feeds it back to the coordinator 11; the coordinator 11 integrates the gradients received from the plurality of local ends 12 to obtain an updated gradient of the target model and sends the updated gradient to each local end 12; each local end 12 updates the target model after receiving the updated gradient, trains the updated target model again using local user data, and feeds the gradient obtained by the retraining back to the coordinator 11; the coordinator 11 again integrates the gradients fed back by the local ends to obtain a new updated gradient of the target model. The process iterates in this way until an iteration termination condition is met, at which point the updated gradient obtained by the coordinator 11 in the last round of integration is issued, as the final model parameters, to the local ends that need it. A minimal sketch of this loop follows.
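To make the iteration loop above concrete, here is a minimal Python sketch of the coordinator/local-end cycle, using plain NumPy gradient descent and a simple weighted average as the integration step; the model, data, learning rate, and weighting are illustrative assumptions rather than the patent's specific implementation.

```python
import numpy as np

def local_train(model_params, X, y):
    """One local training pass: logistic-regression gradient on local user data."""
    preds = 1.0 / (1.0 + np.exp(-X @ model_params))
    return X.T @ (preds - y) / len(y)           # gradient of the target model

def coordinator_integrate(grads, weights):
    """Integrate gradients from at least two data owners into an updated gradient."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(w * g for w, g in zip(weights, grads))

# Illustrative setup: three data owners with the same features but different users.
rng = np.random.default_rng(0)
owners = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100)) for _ in range(3)]
params = np.zeros(5)

for t in range(50):                              # iterate until a termination condition
    grads = [local_train(params, X, y) for X, y in owners]
    update = coordinator_integrate(grads, weights=[len(y) for _, y in owners])
    params -= 0.1 * update                       # each local end applies the updated gradient
```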
In one practical example, the coordinator in fig. 1 may be a server of a third-party payment platform, and the plurality of local ends in fig. 1 may include, but are not limited to, devices of the third-party payment platform, an e-commerce platform, a financial institution, and the like.
On the basis of fig. 1, the method provided in the present specification is explained below.
Fig. 2 is a schematic flow chart of an implementation of a federated learning method of a data risk assessment model according to an embodiment of the present disclosure, which may be applied to the federated learning system shown in fig. 1. As shown in fig. 2, the method may include:
Step 202, each data owner, as a participant in federated learning, trains the target model based on local user data, obtains the gradient of the target model, and feeds it back to a coordinator in federated learning, wherein the target model is used for evaluating the risk of user data.
The data owner may be any organization or institution that accumulates user data.
The user data may include, but is not limited to, at least one of the user's identification credentials, qualification credentials, and credit credentials. Taking the data owner being an e-commerce platform and the user being a merchant on that platform as an example, the user data may be qualification certificates such as the storefront photo, website, and business license submitted when the merchant initiates a request to enter the e-commerce platform.
Typically, the gradient of a model is a vector. In this embodiment, the gradient may be the vector formed by the partial derivatives of the target model's loss with respect to the model parameters.
Federated learning is classified into horizontal federated learning, vertical federated learning, and federated transfer learning. In one example of the present description, the federated learning described above is horizontal federated learning. Fig. 3 shows a schematic diagram of horizontal federated learning. As shown in fig. 3, in horizontal federated learning, the sample data of different participants (e.g., participant A and participant B) come from different users, but the features in the sample data of different participants overlap; combining different participants therefore increases the amount of sample data, but does not increase the number of features in a single sample. In this embodiment, if the federated learning is horizontal federated learning, at least a part of the user data of different data owners comes from different users, and the features contained in the user data of different data owners are the same. A small illustration of this data layout follows.
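As an aid to the description above, the following hypothetical toy illustration (not part of the patent) shows horizontal partitioning: two data owners hold records for different users but share the same feature columns.

```python
import pandas as pd

# Data owner A and data owner B: same feature schema, disjoint users.
owner_a = pd.DataFrame({
    "user_id": [101, 102],
    "has_business_license": [1, 1],
    "storefront_photo_score": [0.92, 0.40],
})
owner_b = pd.DataFrame({
    "user_id": [201, 202],
    "has_business_license": [1, 0],
    "storefront_photo_score": [0.75, 0.15],
})

# Horizontally combining the parties grows the sample count, not the feature count.
combined = pd.concat([owner_a, owner_b], ignore_index=True)
print(combined.shape)   # (4, 3): more rows, same columns
```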
The local user data of a data owner may include all or a subset of the user data accumulated locally, where the data of one user may be regarded as one sample. Optionally, in step 202, if the target model is trained with supervision, each sample also corresponds to one label.
In one example, the target model may be a classification model; its input may be features extracted by the data owner from local user data, and its output may be a risk score or a risk level of the user data.
In another example, the target model may be a multi-modal model, whose input may include multi-modal features extracted from the user data and whose output may be a risk score or risk level of the user data. Specifically, if the target model is a multi-modal model, the data owner training the target model based on local user data may include: the data owner extracting multi-modal features from local user data and performing multi-modal model training based on the multi-modal features.
Alternatively, the multi-modal model may be a Tensor Fusion Network (TFN), and fig. 4 shows a schematic diagram of the principle of TFN training, which is described in detail below.
As shown in fig. 4, a data owner may extract features of different modalities from local user data, where the features of one modality are usually represented by one feature vector. For example, features of multiple modalities such as image, text, and structured features may be extracted from the local user data: the image may be a photo of the user's qualification certificate; the text may be textual information extracted from that photo by Optical Character Recognition (OCR); and the structured features may be the user name, the city where the user is located, the specific geographic location, and other fields the user fills in on the data owner's network platform. The data owner then takes the extracted multi-modal features (such as the modal sources in fig. 4) as the input of the TFN to perform multi-modal learning. Finally, the model outputs a risk score of the user data, or the risk level corresponding to that score. In general, a very high risk score (such as 80-100 points) indicates that the user's data is high-risk, a relatively high score (such as 60-80 points) indicates medium-risk data, and a low score (such as below 50 points) indicates low-risk data.
TFN is a fusion operation based on the Kronecker (outer) product: after it is added to the training process, the dimensions of the modal feature vectors are multiplied against one another to obtain a fused feature representation. As shown in fig. 4, for the vision features z^v, the audio features z^a, and the language features z^l, every dimension of the original unimodal features of the three modalities is multiplied together to obtain the fused feature representation (tensor fusion):
z^m = [z^v; 1] ⊗ [z^a; 1] ⊗ [z^l; 1]
where ⊗ denotes the outer (Kronecker) product and the constant 1 appended to each vector preserves the original unimodal and bimodal terms in the fused tensor.
the TFN has the advantages that on the basis of keeping all the information of the original mode, the full high-order characteristics are constructed simultaneously, so that the loss of the fused information is nearly minimum.
Multi-modal learning can fuse multiple kinds of information to obtain more comprehensive features, improving the robustness of the model and keeping it effective even when some modalities are missing. In particular, because the TFN model retains the original modal information while constructing higher-order fused features, it achieves good recognition performance in many certificate risk identification scenarios such as storefront photos, websites, and business licenses.
It should be understood that, in practical applications, features of modalities other than the image, text, and structured features described above may be added; the embodiments of the present specification are not limited in this respect.
After the gradient of the target model is obtained by training on local user data, in order to prevent an attacker from stealing the gradient uploaded to the coordinator and inverting it to recover user data, which would leak user privacy, the data owner may further encrypt the gradient of the target model before feeding it back to the coordinator. Specifically, the data owner encrypts the gradient at least with the coordinator's public key; correspondingly, after receiving the gradient fed back by the data owner, the coordinator decrypts it with its own private key and then integrates it. A hedged sketch of this encryption step follows.
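The following Python sketch shows one way such public-key protection could look, using the third-party cryptography package and a hybrid scheme (a symmetric key for the gradient bytes, RSA-OAEP to wrap that key); the patent does not prescribe particular ciphers, so the algorithm choices here are assumptions.

```python
import numpy as np
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Coordinator key pair (in practice only the public key is shared with data owners).
coordinator_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
coordinator_public = coordinator_private.public_key()
OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def encrypt_gradient(gradient, public_key):
    """Data-owner side: hybrid-encrypt a gradient vector with the coordinator's public key."""
    session_key = Fernet.generate_key()              # symmetric key for the payload
    ciphertext = Fernet(session_key).encrypt(gradient.astype(np.float64).tobytes())
    wrapped_key = public_key.encrypt(session_key, OAEP)   # RSA-OAEP wraps the session key
    return wrapped_key, ciphertext

def decrypt_gradient(wrapped_key, ciphertext, private_key):
    """Coordinator side (e.g., inside a TEE): recover the plaintext gradient."""
    session_key = private_key.decrypt(wrapped_key, OAEP)
    return np.frombuffer(Fernet(session_key).decrypt(ciphertext), dtype=np.float64)

grad = np.array([0.12, -0.05, 0.33])
key_blob, enc = encrypt_gradient(grad, coordinator_public)
assert np.allclose(decrypt_gradient(key_blob, enc, coordinator_private), grad)
```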
Step 204, the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model and sends the updated gradient to the data owners.
If the gradients uploaded by the at least two data owners are not encrypted, the coordinator can directly integrate the gradients fed back by the at least two data owners to obtain the updated gradient of the target model. If the gradients uploaded by the at least two data owners are encrypted, the coordinator first decrypts the gradients fed back by the at least two data owners and then integrates the decrypted gradients to obtain the updated gradient of the target model.
Specifically, the coordinator performs a secure summation over the gradients of the at least two data owners to obtain the updated gradient of the target model.
As an example, assume that the gradient uploaded by data owner i is w_i. The updated gradient integrated by the coordinator can then be expressed as:
w^{t+1} = Σ_{i=1}^{k} α_i · w_i^{t}
where i = 1, 2, ..., k; k denotes the number of the at least two data owners, with k ≤ n and n the number of data owners participating in the federated learning; α_i denotes the weight that the gradient w_i uploaded by data owner i occupies in the integration; t denotes the iteration number; and the superscript t+1 indicates the next round of iteration rather than a power of w_i.
Optionally, to further ensure that user privacy is not leaked, the coordinator may decrypt the gradients fed back by at least two data owners in a Trusted Execution Environment (TEE), and integrate the decrypted gradients to obtain the updated gradient of the target model.
Optionally, also to further ensure that user privacy is not leaked, the coordinator may encrypt the updated gradient of the target model in the TEE and send the encrypted updated gradient to the data owners in the federated learning. Specifically, the coordinator may encrypt the updated gradient using its own private key in the TEE, and then issue the encrypted updated gradient to the data owners.
In summary, to protect the security of the data owners' local user data, as shown in fig. 5, each data owner (or each local end) may, after training the gradient Δw of the target model on local user data (where the symbol "Δ" indicates a gradient and w is the gradient value), first encrypt Δw with the coordinator's public key and then feed the encrypted Δw back to the coordinator. Correspondingly, the coordinator can receive and decrypt (using its own private key) the Δw fed back by each data owner inside the TEE, integrate them to obtain the updated gradient, encrypt the updated gradient in the TEE, and send it to each data owner. Having each data owner encrypt its gradient with the coordinator's public key before feeding it back not only prevents organizations or institutions outside the federated system from intercepting the gradient and inverting it to recover user data, but also ensures that the data owners cannot decrypt, and therefore cannot learn, each other's trained gradients, which helps protect the security of every data owner's user data.
Step 206, the data owner trains the target model again based on the updated gradient and local user data, and feeds the gradient obtained by the retraining back to the coordinator.
After receiving the updated gradient, the data owner can update the parameters of the locally deployed target model according to the updated gradient, train the target model again based on local user data, and feed the gradient obtained by the retraining back to the coordinator; this loop continues until a preset iteration termination condition is met. The preset iteration termination condition may be that a preset number of iterations is reached, or that the gradient of the target model no longer changes, or changes only very slightly, as the number of iterations increases. A small sketch of such a termination check follows.
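The fragment below shows one plausible way (an assumption, not the patent's criterion) to encode the two termination conditions just mentioned: a maximum iteration count, or a negligible change in the gradient between rounds.

```python
import numpy as np

def should_stop(prev_grad, curr_grad, iteration, max_iters=100, tol=1e-6):
    """Stop when the iteration budget is exhausted or the gradient barely changes."""
    if iteration >= max_iters:
        return True
    if prev_grad is not None and np.linalg.norm(curr_grad - prev_grad) < tol:
        return True
    return False

# Example: identical gradients in consecutive rounds trigger termination.
print(should_stop(np.array([0.1, 0.2]), np.array([0.1, 0.2]), iteration=7))  # True
```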
Optionally, when the preset iteration termination condition is met, the coordinator may send the updated gradient obtained in the last round of integration to a target data owner, where the target data owner includes one or more data owners in the federated learning, so that the target data owner updates the target model based on that last updated gradient and uses the target model to evaluate the risk of user data. When the target data owner includes a plurality of data owners in the federated learning, multiple parties can apply the target model simultaneously to automatically identify the risk of user data.
Optionally, the target model obtained by training in the embodiments of the present specification may also be provided externally. For example, the coordinator may send the updated gradient obtained in the last iteration, together with the target model, to a specified device, so that the specified device can perform automatic auditing of user data based on the target model; the specified device may belong to an organization or institution other than the participants in the federated system. It can be seen that the embodiments of the present description can also provide user data auditing capabilities to organizations or institutions outside the parties to the federated system described above.
Further, for security, the coordinator may encrypt the updated gradient and the target model obtained in the last iteration before sending them to the specified device. In practical applications, the coordinator may encrypt them with the public key of the specified device's owner, and correspondingly, the specified device's owner may decrypt them with its own private key.
According to the federated learning method for the data risk assessment model provided by the embodiments of the present specification, because the user data of different data owners can be fused through federated learning, a target model that automatically evaluates whether user data carries risk can be trained, realizing intelligent identification of user data risk; since no manual participation is needed when user data is audited, the auditing efficiency of user data can be greatly improved and labor costs are saved. In addition, federated learning allows joint modeling by all data owners without the local user data of any participant leaving its local domain, which on the one hand guarantees the security of each participant's local user data, and on the other hand makes the sample data used in modeling larger and more comprehensive, so that the trained model can identify the risk of user data more accurately, improving the accuracy of user data risk identification.
Optionally, as shown in fig. 6, on the basis of the federated learning method of the data risk assessment model shown in fig. 2, the present specification further provides a data risk assessment method, which may be applied to the equipment of any one of the data owners shown in fig. 1 or, alternatively, to the above-mentioned specified device. The method may include:
step 602, receiving the data to be evaluated uploaded by the user.
For example, the e-commerce platform receives one or more of the qualification certificates, such as the storefront photo, website, and business license, submitted when a merchant initiates a request to enter the platform.
Step 604, extracting feature data from the data to be evaluated.
If the target model trained in the embodiment shown in fig. 2 is a multi-modal model, then in this step the data owner's device, or the above-mentioned specified device, can extract multi-modal features from the data to be evaluated as the feature data.
Step 606, determining the risk of the data to be evaluated based on the feature data and the target model.
The target model is obtained by the federated learning method of the data risk assessment model described above.
If the target model is a multi-modal model, the multi-modal features extracted in the previous step are input into the trained multi-modal model to obtain the risk score it outputs, and the risk of the data to be evaluated is judged based on that risk score; a brief inference sketch follows.
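To make the assessment step concrete, here is a hypothetical sketch of the scoring-and-thresholding logic; the threshold values roughly mirror the illustrative ranges mentioned earlier (80-100 high, 60-80 medium, below 50 low), and model.predict_score is a placeholder for the trained multi-modal model, not an API defined by the patent.

```python
def assess_risk(features, model):
    """Score the data to be evaluated and map the score to a risk level."""
    score = model.predict_score(features)   # hypothetical call to the trained target model
    if score >= 80:
        level = "high risk"
    elif score >= 60:
        level = "medium risk"
    else:
        level = "low risk"
    return score, level

# Example with a stand-in model object.
class StubModel:
    def predict_score(self, features):
        return 72.5

print(assess_risk({"ocr_text": "...", "city": "Hangzhou"}, StubModel()))  # (72.5, 'medium risk')
```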
According to the data risk assessment method provided by the embodiments of the present specification, the risk of the data to be evaluated can be assessed automatically based on the target model obtained through federated learning, so the efficiency of risk identification, or auditing, of the data to be evaluated can be improved. In addition, because the target model is built jointly from the user data of a plurality of data owners, the sample data used in modeling is larger and more comprehensive, and the resulting target model can identify the risk of user data more accurately; the method can therefore also improve the accuracy of user data risk identification.
Next, the federated learning methods applied to the data owner and the coordinator in the above-described federated learning will be described with reference to fig. 7 and fig. 8, respectively.
Fig. 7 is a flow chart illustrating a federated learning method of a data risk assessment model that may be applied to the data owner. As shown in fig. 7, the method may include:
Step 702, training the target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data.
Step 704, receiving the updated gradient issued by the coordinator.
Step 706, training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
The federated learning method of a data risk assessment model provided in fig. 7 can achieve the same technical effects as the method shown in fig. 2; please refer to the description above, which is not repeated here.
Fig. 8 is a flow chart of a federated learning method of a data risk assessment model that can be applied to the coordinator described above. As shown in fig. 8, the method may include:
Step 802, receiving a gradient fed back by a data owner, wherein the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used for evaluating risks in user data.
Step 804, integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model.
Step 806, sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data and feeds back the gradient obtained by the retraining.
The federated learning method of a data risk assessment model provided in fig. 8 can achieve the same technical effects as the method shown in fig. 2; please refer to the description above, which is not repeated here.
On the basis of the above methods, the specification provides a federated learning system, which includes: a coordinator and a plurality of data owners acting as participants.
The data owners are used for training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to the coordinator in federated learning, wherein the target model is used for evaluating risks in user data.
The coordinator is used for integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model and sending the updated gradient to the data owners;
and the data owners are further used for training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
The system can achieve the same technical effects as the method shown in fig. 2, and the details are not repeated herein.
The above is a description of embodiments of the method and system provided in this specification, and the following is a description of an electronic device provided in this specification.
Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification. Referring to fig. 9, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include volatile memory, such as Random-Access Memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 9, but this does not mean there is only one bus or one type of bus.
The memory is used for storing programs. Specifically, a program may include program code comprising computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, forming the federated learning device of the data risk assessment model at the logical level. The processor is used for executing the program stored in the memory, and is specifically configured to perform the following operations:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
Or, the processor executes the program stored in the memory, and is specifically configured to perform the following operations:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used for evaluating risks in user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data and feeds back the gradient obtained by the retraining.
The federated learning method of a data risk assessment model disclosed in the embodiments of fig. 7 or fig. 8 of the present specification can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in one or more embodiments of the present specification may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of the method disclosed in connection with one or more embodiments of the present specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
Of course, besides the software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.
This specification embodiment also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 7, and in particular to perform the following operations:
training a target model based on local user data, obtaining the gradient of the target model, and feeding it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by the data owners to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and local user data, and feeding the gradient obtained by the retraining back to the coordinator.
This specification embodiment also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 8, and in particular to perform the following operations:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used for evaluating risks in user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data and feeds back the gradient obtained by the retraining.
The federated learning devices of the data risk assessment model provided in the present specification are described below.
As shown in fig. 10, an embodiment of the present specification provides a data risk assessment apparatus, which may be applied to the equipment of a data owner or to the specified device described above. In a software implementation, the data risk assessment apparatus 1000 may include: a data receiving module 1001, a feature extraction module 1002, and a risk assessment module 1003.
The data receiving module 1001 is configured to receive data to be evaluated uploaded by a user.
The feature extraction module 1002 is configured to extract feature data from the data to be evaluated.
The risk assessment module 1003 is configured to determine the risk of the data to be evaluated based on the feature data and a target model, where the target model is trained based on the method according to any one of claims 1 to 9.
It should be noted that the data risk assessment apparatus 1000 can implement the method of the embodiment of the method shown in fig. 6, and specific reference may be made to the data risk assessment method of the embodiment shown in fig. 6, which is not described again.
As shown in fig. 11, an embodiment of the present disclosure provides a federated learning device of a data risk assessment model, which can be applied to the equipment of a data owner. In a software implementation, the federated learning device 1100 of the data risk assessment model may include: a first training module 1101, an updated-gradient receiving module 1102, and a second training module 1103.
The first training module 1101 is configured to train a target model based on local user data, obtain the gradient of the target model, and feed it back to a coordinator in federated learning, so that the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model, where the target model is used to evaluate risks in user data.
The updated-gradient receiving module 1102 is configured to receive the updated gradient sent by the coordinator.
The second training module 1103 is configured to train the target model again based on the updated gradient and local user data, and feed the gradient obtained by the retraining back to the coordinator.
It should be noted that the federated learning device 1100 of the data risk assessment model can implement the method of the embodiment shown in fig. 7; for details, refer to the federated learning method of the data risk assessment model of the embodiment shown in fig. 7, which are not repeated here.
As shown in fig. 12, another embodiment of the present specification further provides a federated learning device of a data risk assessment model, which can be applied to the coordinator's equipment described above. In a software implementation, the device 1200 may include: a gradient receiving module 1201, a gradient integration module 1202, and an updated-gradient issuing module 1203.
The gradient receiving module 1201 is configured to receive a gradient fed back by a data owner, where the gradient is obtained by the data owner training a target model based on local user data, the data owner is a participant in federated learning, and the target model is used to evaluate risks in user data.
The gradient integration module 1202 is configured to integrate the gradients fed back by at least two data owners to obtain an updated gradient of the target model.
The updated-gradient issuing module 1203 is configured to send the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data, and feeds back the gradient obtained by the retraining.
It should be noted that the federated learning device 1200 of the data risk assessment model can implement the method of the embodiment shown in fig. 8; for details, refer to the federated learning method of the data risk assessment model of the embodiment shown in fig. 8, which are not repeated here.
While certain embodiments of the present disclosure have been described above, other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

Claims (20)

1. A federated learning method for a data risk assessment model, comprising the following steps:
each data owner, as a participant in the federated learning, trains a target model based on local user data, obtains the gradient of the target model and feeds the gradient back to a coordinator in the federated learning, wherein the target model is used for evaluating risks of user data;
the coordinator integrates the gradients fed back by at least two data owners to obtain the update gradient of the target model and sends the update gradient to the data owners;
and the data owner trains the target model again based on the updated gradient and the local user data, and feeds back the gradient obtained by the retraining to the coordinator.
2. The method of claim 1, wherein the target model is a multi-modal model, and the data owner training the target model based on local user data comprises:
and the data owner extracts multi-modal features from local user data, and performs multi-modal model training based on the multi-modal features.
3. The method of claim 2, wherein
the multi-modal model is a tensor fusion network (TFN).
4. The method of claim 1, wherein the data owner feeding back the gradient of the target model to the coordinator comprises:
and the data owner encrypts the gradient of the target model and feeds the gradient back to the coordinator.
5. The method of claim 4, wherein the coordinator integrating the gradients fed back by at least two of the data owners to obtain the updated gradient of the target model comprises:
and the coordinator decrypts the gradients fed back by at least two data owners in a trusted execution environment (TEE), and integrates the decrypted gradients to obtain the updated gradient of the target model.
6. The method of claim 5, wherein the coordinator sending the updated gradient of the target model to the data owner comprises:
and the coordinator encrypts the updated gradient of the target model in a trusted execution environment (TEE) and sends the encrypted updated gradient to each data owner in the federated learning.
7. The method according to any one of claims 1 to 6,
wherein the federated learning is horizontal federated learning, the user data of different data owners in the federated learning come from different users, and the user data of the different data owners contain the same features.
8. The method of claim 7, wherein
at least one data owner in the federated learning is an e-commerce platform, the user is a merchant registered on the e-commerce platform, and the user data comprises at least one of an identity certification document and a qualification certification document.
9. The method of any of claims 1-6, further comprising:
when a preset iteration termination condition is met, the coordinator sends the updated gradient obtained by the last integration to a target data owner, wherein the target data owner is any one of the data owners in the federated learning;
and updating the target model by the target data owner based on the updated gradient obtained by the last integration so as to evaluate the risk of the user data.
10. A data risk assessment method, comprising:
receiving data to be evaluated uploaded by a user;
extracting characteristic data from the data to be evaluated;
determining a risk of the data to be evaluated based on the feature data and a target model, wherein the target model is obtained by training based on the method of any one of claims 1-9.
11. A federated learning method for a data risk assessment model, the method comprising:
training a target model based on local user data to obtain a gradient of the target model and feeding the gradient back to a coordinator in federated learning so that the coordinator integrates the gradients fed back by all data parties to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks existing in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and the local user data, and feeding back the gradient obtained by the retraining to the coordinator.
12. A federated learning method for a data risk assessment model, the method comprising:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by training a target model based on local user data by the data owner, the data owner is a participant in federal learning, and the target model is used for evaluating risks of user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data, and feeds back the gradient obtained by the retraining.
13. A federated learning system, comprising: a coordinator and a plurality of data owners serving as participants, wherein
the data owner is used for training a target model based on local user data to obtain the gradient of the target model and feeding the gradient back to a coordinator in federal learning, wherein the target model is used for evaluating the risk of user data;
the coordinator is used for integrating the gradients fed back by at least two data owners to obtain the update gradient of the target model and sending the update gradient to the data owners;
and the data owner is further used for training the target model again based on the updated gradient and the local user data, and feeding back the gradient obtained by the retraining to the coordinator.
14. A data risk assessment apparatus, the apparatus comprising:
the data receiving module is used for receiving the data to be evaluated uploaded by the user;
the characteristic extraction module is used for extracting characteristic data from the data to be evaluated;
a risk assessment module, configured to determine a risk of the data to be assessed based on the feature data and a target model, wherein the target model is trained based on the method according to any one of claims 1 to 9.
15. A federated learning device for a data risk assessment model, the device comprising:
the system comprises a first training module, a second training module and a third training module, wherein the first training module is used for training a target model based on local user data to obtain the gradient of the target model and feeding the gradient back to a coordinator in federated learning so that the coordinator integrates the gradients fed back by at least two data owners to obtain an updated gradient of the target model, and the target model is used for evaluating the risk of user data;
an update gradient receiving module, configured to receive the update gradient issued by the coordinator;
and the second training module is used for training the target model again based on the updated gradient and the local user data, and feeding back the gradient obtained by the retraining to the coordinator.
16. A federated learning device for a data risk assessment model, the device comprising:
the gradient receiving module is used for receiving a gradient fed back by a data owner, wherein the gradient is obtained by training a target model by the data owner based on local user data, the data owner is a participant in federal learning, and the target model is used for evaluating risks of user data;
the gradient integration module is used for integrating the gradients fed back by at least two data owners to obtain the update gradient of the target model;
and the update gradient issuing module is used for sending the update gradient to the data owner so that the data owner trains the target model again based on the update gradient and the local user data and feeds back the gradient obtained by the retraining.
17. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
training a target model based on local user data to obtain a gradient of the target model and feeding the gradient back to a coordinator in federated learning so that the coordinator integrates the gradients fed back by all data parties to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks existing in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and the local user data, and feeding back the gradient obtained by the retraining to the coordinator.
18. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
training a target model based on local user data to obtain a gradient of the target model and feeding the gradient back to a coordinator in federated learning so that the coordinator integrates the gradients fed back by all data parties to obtain an updated gradient of the target model, wherein the target model is used for evaluating risks existing in user data;
receiving the updated gradient issued by the coordinator;
and training the target model again based on the updated gradient and the local user data, and feeding back the gradient obtained by the retraining to the coordinator.
19. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by training a target model based on local user data by the data owner, the data owner is a participant in federal learning, and the target model is used for evaluating risks of user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data, and feeds back the gradient obtained by the retraining.
20. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
receiving a gradient fed back by a data owner, wherein the gradient is obtained by training a target model based on local user data by the data owner, the data owner is a participant in federal learning, and the target model is used for evaluating risks of user data;
integrating the gradients fed back by at least two data owners to obtain an updated gradient of the target model;
and sending the updated gradient to the data owner, so that the data owner trains the target model again based on the updated gradient and local user data, and feeds back the gradient obtained by the retraining.
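As a non-authoritative illustration of the encrypted gradient exchange recited in claims 4-6, the following Python sketch shows the order of operations only. Fernet symmetric encryption stands in for whatever encryption scheme the parties actually agree on, an ordinary function stands in for the decryption and integration that the claims place inside the coordinator's trusted execution environment (TEE), and the shared key, serialization format, and averaging rule are all assumptions made for exposition.

```python
# Sketch only: Fernet stands in for the agreed encryption scheme, and
# integrate_in_tee() stands in for decryption + integration inside the TEE.
import json
from typing import List

import numpy as np
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()   # assumed to be provisioned to the data owners and the TEE
CIPHER = Fernet(KEY)


def encrypt_gradient(gradient: np.ndarray) -> bytes:
    """Data-owner side: serialize and encrypt the gradient before feeding it back."""
    return CIPHER.encrypt(json.dumps(gradient.tolist()).encode())


def integrate_in_tee(encrypted_gradients: List[bytes]) -> bytes:
    """Coordinator side: decrypt the gradients fed back by at least two data
    owners, average them into the updated gradient, and re-encrypt it for issuing."""
    if len(encrypted_gradients) < 2:
        raise ValueError("gradients from at least two data owners are required")
    grads = [np.array(json.loads(CIPHER.decrypt(token))) for token in encrypted_gradients]
    update = np.mean(grads, axis=0)
    return CIPHER.encrypt(json.dumps(update.tolist()).encode())


# One round: two data owners feed back encrypted gradients, the coordinator
# issues an encrypted updated gradient, and each owner decrypts it to retrain.
g_owner_a = np.array([0.20, -0.10, 0.40])
g_owner_b = np.array([0.00, 0.30, 0.20])
encrypted_update = integrate_in_tee([encrypt_gradient(g_owner_a), encrypt_gradient(g_owner_b)])
updated_gradient = np.array(json.loads(CIPHER.decrypt(encrypted_update)))
```

A concrete embodiment could instead protect the gradients with homomorphic encryption or secret sharing so that no plaintext gradient ever leaves the TEE; the sketch above is only meant to make the claimed sequence of steps concrete.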
CN202010162831.4A 2020-03-10 2020-03-10 Federal learning and data risk assessment method, device and system Pending CN111008709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010162831.4A CN111008709A (en) 2020-03-10 2020-03-10 Federal learning and data risk assessment method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010162831.4A CN111008709A (en) 2020-03-10 2020-03-10 Federal learning and data risk assessment method, device and system

Publications (1)

Publication Number Publication Date
CN111008709A true CN111008709A (en) 2020-04-14

Family

ID=70121048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010162831.4A Pending CN111008709A (en) 2020-03-10 2020-03-10 Federal learning and data risk assessment method, device and system

Country Status (1)

Country Link
CN (1) CN111008709A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340150A (en) * 2020-05-22 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for training first classification model
CN111523686A (en) * 2020-04-23 2020-08-11 支付宝(杭州)信息技术有限公司 Method and system for model joint training
CN111553443A (en) * 2020-05-14 2020-08-18 北京华宇元典信息服务有限公司 Training method and device for referee document processing model and electronic equipment
CN111724000A (en) * 2020-06-29 2020-09-29 南方电网科学研究院有限责任公司 Method, device and system for predicting user electric charge recycling risk
CN111723942A (en) * 2020-06-29 2020-09-29 南方电网科学研究院有限责任公司 Enterprise power load prediction method, power grid service subsystem and prediction system
CN111783142A (en) * 2020-07-06 2020-10-16 北京字节跳动网络技术有限公司 Data protection method, device, server and medium
CN111784472A (en) * 2020-07-03 2020-10-16 深圳前海微众银行股份有限公司 Wind control method, device and system based on consumption data and readable storage medium
CN111783038A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Risk assessment method, device, equipment, system and medium based on intelligent learning
CN111861099A (en) * 2020-06-02 2020-10-30 光之树(北京)科技有限公司 Model evaluation method and device of federal learning model
CN111970304A (en) * 2020-08-28 2020-11-20 光大科技有限公司 Message processing method and device
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
CN112116103A (en) * 2020-09-17 2020-12-22 北京大学 Method, device and system for evaluating personal qualification based on federal learning and storage medium
CN112232528A (en) * 2020-12-15 2021-01-15 之江实验室 Method and device for training federated learning model and federated learning system
CN112256786A (en) * 2020-12-21 2021-01-22 北京爱数智慧科技有限公司 Multi-modal data processing method and device
CN112446736A (en) * 2020-12-02 2021-03-05 平安科技(深圳)有限公司 Click through rate CTR prediction method and device
CN113128701A (en) * 2021-04-07 2021-07-16 中国科学院计算技术研究所 Sample sparsity-oriented federal learning method and system
CN113177645A (en) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN113312667A (en) * 2021-06-07 2021-08-27 支付宝(杭州)信息技术有限公司 Risk prevention and control method, device and equipment
CN113391897A (en) * 2021-06-15 2021-09-14 电子科技大学 Heterogeneous scene-oriented federal learning training acceleration method
WO2022012129A1 (en) * 2020-07-17 2022-01-20 华为技术有限公司 Model processing method for cloud service system, and cloud service system
CN114548255A (en) * 2022-02-17 2022-05-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment
WO2022110248A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Federated learning method, device and system
WO2022111639A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Federated learning method and apparatus, device, system, and computer-readable storage medium
WO2022158678A1 (en) * 2021-01-22 2022-07-28 삼성전자 주식회사 Electronic device performing federated learning by means of hardware security architecture, and federated learning method using same

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523686A (en) * 2020-04-23 2020-08-11 支付宝(杭州)信息技术有限公司 Method and system for model joint training
CN111553443A (en) * 2020-05-14 2020-08-18 北京华宇元典信息服务有限公司 Training method and device for referee document processing model and electronic equipment
CN111340150B (en) * 2020-05-22 2020-09-04 支付宝(杭州)信息技术有限公司 Method and device for training first classification model
CN111340150A (en) * 2020-05-22 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for training first classification model
CN111861099A (en) * 2020-06-02 2020-10-30 光之树(北京)科技有限公司 Model evaluation method and device of federal learning model
CN111723942B (en) * 2020-06-29 2024-02-02 南方电网科学研究院有限责任公司 Enterprise electricity load prediction method, power grid business subsystem and prediction system
CN111723942A (en) * 2020-06-29 2020-09-29 南方电网科学研究院有限责任公司 Enterprise power load prediction method, power grid service subsystem and prediction system
CN111724000A (en) * 2020-06-29 2020-09-29 南方电网科学研究院有限责任公司 Method, device and system for predicting user electric charge recycling risk
CN111724000B (en) * 2020-06-29 2024-02-09 南方电网科学研究院有限责任公司 User electricity charge recycling risk prediction method, device and system
CN111783038B (en) * 2020-06-30 2024-04-12 北京百度网讯科技有限公司 Risk assessment method, device, equipment, system and medium based on intelligent learning
CN111783038A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Risk assessment method, device, equipment, system and medium based on intelligent learning
CN111784472A (en) * 2020-07-03 2020-10-16 深圳前海微众银行股份有限公司 Wind control method, device and system based on consumption data and readable storage medium
CN111783142B (en) * 2020-07-06 2021-10-08 北京字节跳动网络技术有限公司 Data protection method, device, server and medium
CN111783142A (en) * 2020-07-06 2020-10-16 北京字节跳动网络技术有限公司 Data protection method, device, server and medium
US11755691B2 (en) 2020-07-06 2023-09-12 Beijing Bytedance Network Technology Co., Ltd. Data protection method and apparatus, and server and medium
WO2022012129A1 (en) * 2020-07-17 2022-01-20 华为技术有限公司 Model processing method for cloud service system, and cloud service system
CN111970304A (en) * 2020-08-28 2020-11-20 光大科技有限公司 Message processing method and device
CN112116103A (en) * 2020-09-17 2020-12-22 北京大学 Method, device and system for evaluating personal qualification based on federal learning and storage medium
WO2022057108A1 (en) * 2020-09-17 2022-03-24 南京博雅区块链研究院有限公司 Federated-learning-based personal qualification evaluation method, apparatus and system, and storage medium
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
WO2022110248A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Federated learning method, device and system
WO2022111639A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Federated learning method and apparatus, device, system, and computer-readable storage medium
CN112446736A (en) * 2020-12-02 2021-03-05 平安科技(深圳)有限公司 Click through rate CTR prediction method and device
WO2022116431A1 (en) * 2020-12-02 2022-06-09 平安科技(深圳)有限公司 Click through rate (ctr) prediction method and apparatus
CN112232528A (en) * 2020-12-15 2021-01-15 之江实验室 Method and device for training federated learning model and federated learning system
CN112232528B (en) * 2020-12-15 2021-03-09 之江实验室 Method and device for training federated learning model and federated learning system
CN112256786A (en) * 2020-12-21 2021-01-22 北京爱数智慧科技有限公司 Multi-modal data processing method and device
CN112256786B (en) * 2020-12-21 2021-04-16 北京爱数智慧科技有限公司 Multi-modal data processing method and device
WO2022158678A1 (en) * 2021-01-22 2022-07-28 삼성전자 주식회사 Electronic device performing federated learning by means of hardware security architecture, and federated learning method using same
CN113128701A (en) * 2021-04-07 2021-07-16 中国科学院计算技术研究所 Sample sparsity-oriented federal learning method and system
CN113312667B (en) * 2021-06-07 2022-09-02 支付宝(杭州)信息技术有限公司 Risk prevention and control method, device and equipment
CN113312667A (en) * 2021-06-07 2021-08-27 支付宝(杭州)信息技术有限公司 Risk prevention and control method, device and equipment
CN113391897A (en) * 2021-06-15 2021-09-14 电子科技大学 Heterogeneous scene-oriented federal learning training acceleration method
CN113177645A (en) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN113177645B (en) * 2021-06-29 2021-09-28 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN114548255A (en) * 2022-02-17 2022-05-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment

Similar Documents

Publication Publication Date Title
CN111008709A (en) Federal learning and data risk assessment method, device and system
TWI764037B (en) Interaction method and system across blockchain, computer equipment and storage medium
CN110955907B (en) Model training method based on federal learning
CN110457912B (en) Data processing method and device and electronic equipment
JP6697584B2 (en) Method and apparatus for identifying data risk
CN104715187B (en) Method and apparatus for the node in certification electronic communication system
CN107888671A (en) A kind of information-pushing method and device
Jain et al. Accuracy enhancement in machine learning during blockchain based transaction classification
CN111401558A (en) Data processing model training method, data processing device and electronic equipment
CN112132198A (en) Data processing method, device and system and server
CN112465627B (en) Financial loan auditing method and system based on block chain and machine learning
CN110874491B (en) Privacy data processing method and device based on machine learning and electronic equipment
TWI750651B (en) Method, device and electronic equipment for protecting privacy information based on adversarial samples
CN111144576A (en) Model training method and device and electronic equipment
CN110874650B (en) Alliance learning method, device and system fusing public domain data and private data
CN113240524A (en) Method and device for detecting abnormality of account in federal learning system and electronic equipment
CN110837653A (en) Label prediction method, device and computer readable storage medium
CN112580085A (en) Model training method and device
CN110032846A (en) The anti-misuse method and device of identity data, electronic equipment
CN113240505A (en) Graph data processing method, device, equipment, storage medium and program product
CN112039972A (en) Service processing method, device and equipment
CN114500093A (en) Safe interaction method and system for message information
CN110766340A (en) Business auditing method, device and equipment
CN111275071B (en) Prediction model training method, prediction device and electronic equipment
CN112507323A (en) Model training method and device based on unidirectional network and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027351

Country of ref document: HK

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200414