CN113469377B

CN113469377B - Federated learning auditing method and device

Info

Publication number: CN113469377B
Application number: CN202110762621.3A
Authority: CN
Inventors: 霍昱光; 刘春伟; 权纯
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2021-07-06
Filing date: 2021-07-06
Publication date: 2023-01-13
Anticipated expiration: 2041-07-06
Also published as: CN113469377A

Abstract

The specification relates to the technical field of machine learning and discloses a federated learning auditing method and device. The method includes: receiving a federated learning audit request that includes the model identifier of a target federated learning model to be audited; in response to the request, acquiring the interaction information exchanged during training of the target federated learning model corresponding to the model identifier, where each piece of interaction information carries an interaction information identifier that indicates the stage in which it was generated; performing model training based on the interaction information to obtain an audit model; and generating a model audit result according to the audit model and the target federated learning model. Because the interaction information identifier reveals the specific stage in which each piece of interaction information was generated, model training can be performed based on the interaction information and its generation stage. The process is simple and convenient and can improve the efficiency and accuracy of model auditing.

Description

Federated learning auditing method and device

Technical Field

The specification relates to the technical field of machine learning, and in particular to a federated learning auditing method and device.

Background

In some business scenarios, different data parties own different data sources. A required target federated learning model is sometimes constructed by joint training (for example, federated learning) that uses the data sources of several data parties at the same time, on the premise that no party's data is leaked to the others.

Federated learning is a cross-domain data-sharing technology that combines modeling with cryptographically provable security. However, because it adopts a distributed peer-to-peer architecture, it is difficult for participants to trust one another. Whether a data provider supplies real data, whether the model training process actually follows the agreed algorithm, and whether the training result is credible are the main factors affecting the application of federated learning, and both data providers and their cooperating parties have concerns about them.

Therefore, a method for auditing and verifying models obtained by federated learning is needed to improve their reliability and accuracy.

Disclosure of Invention

The embodiment of the specification provides a method and a device for auditing federal study, and aims to provide a method for auditing and checking a model obtained by federal study so as to improve the reliability and accuracy of the model.

An embodiment of the present specification provides a federated learning auditing method, including: receiving a federated learning audit request, where the request includes a model identifier of a target federated learning model to be audited; in response to the request, acquiring interaction information from the training process of the target federated learning model corresponding to the model identifier, where the interaction information carries an interaction information identifier that includes information on the stage in which the interaction information was generated; performing model training based on the interaction information to obtain an audit model; and generating a model audit result according to the audit model and the target federated learning model.

In one embodiment, the interaction information identifier includes a plurality of identifiers; a first identifier of the plurality of identifiers includes an identifier of the module that generated the interaction information, and a second identifier includes an identifier of the stage in which the interaction information was generated.

In one embodiment, the stage in which the interaction information is generated includes a plurality of iterations; accordingly, a third identifier of the plurality of identifiers includes the sequence number of the iteration in which the interaction information is generated.

In one embodiment, the stage in which the interaction information is generated includes a plurality of sub-stages; accordingly, a fourth identifier of the plurality of identifiers includes an identifier of the sub-stage in which the interaction information is generated.

In one embodiment, the sub-stage in which the interaction information is generated includes a plurality of loop sub-stages; accordingly, a fifth identifier of the plurality of identifiers includes an identifier of the loop sub-stage in which the interaction information is generated.

In one embodiment, the loop sub-stage in which the interaction information is generated includes a plurality of layers; accordingly, a sixth identifier of the plurality of identifiers includes an identifier of the layer of the loop sub-stage in which the interaction information is generated.

In one embodiment, the layer in which the interaction information is generated includes a plurality of steps; accordingly, a seventh identifier of the plurality of identifiers includes an identifier of the step in which the interaction information is generated.

In one embodiment, generating a model audit result according to the audit model and the target federated learning model includes: obtaining a prediction sample set that includes a plurality of prediction samples; predicting each of the prediction samples with the audit model to obtain an audit prediction result for each sample; predicting each of the prediction samples with the target federated learning model to obtain a target prediction result for each sample; and generating the model audit result based on the target prediction result and the audit prediction result of each prediction sample.

In one embodiment, generating the model audit result based on the target prediction result and the audit prediction result of each prediction sample includes: determining the percentage of the prediction samples whose target prediction result is the same as the corresponding audit prediction result; judging whether the percentage is greater than a preset percentage; and, if it is, determining that the target federated learning model passes the audit.

In one embodiment, the prediction sample set further includes label information for each of the prediction samples; accordingly, generating the model audit result based on the target prediction result and the audit prediction result of each prediction sample includes: determining a prediction performance index of the target federated learning model from the target prediction result and the label information of each prediction sample; determining a prediction performance index of the audit model from the audit prediction result and the label information of each prediction sample; and generating the model audit result based on the two prediction performance indices.

In one embodiment, generating the model audit result based on the prediction performance index of the target federated learning model and that of the audit model includes: judging whether the difference between the two indices is within a preset range; and, if it is, determining that the target federated learning model passes the audit.

An embodiment of the present specification further provides a federated learning auditing device, including: a receiving module configured to receive a federated learning audit request, where the request includes a model identifier of a target federated learning model to be audited; an acquisition module configured to, in response to the request, acquire interaction information from the training process of the target federated learning model corresponding to the model identifier, where the interaction information carries an interaction information identifier that includes information on the stage in which the interaction information was generated; a training module configured to perform model training based on the interaction information to obtain an audit model; and a generating module configured to generate a model audit result according to the audit model and the target federated learning model.

Embodiments of the present specification further provide a computer device, including a processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the steps of the federated learning auditing method described in any of the above embodiments.

Embodiments of the present specification further provide a computer-readable storage medium storing computer instructions that, when executed, implement the steps of the federated learning auditing method described in any of the above embodiments.

The embodiments of the specification provide a federated learning auditing method that receives a federated learning audit request including the model identifier of a target federated learning model to be audited; in response, acquires the interaction information from the training process of the target model corresponding to the model identifier, where each piece of interaction information carries an interaction information identifier that includes information on the stage in which it was generated; performs model training based on the interaction information to obtain an audit model; and then generates a model audit result according to the audit model and the target federated learning model. In this scheme, the model is retrained from the interaction information, and the retrained model, referred to as the audit model, is compared with the target federated learning model to be audited. The comparison reveals whether one or more federated learning parties failed to train according to the agreed method, which improves the reliability and accuracy of federated learning.

Drawings

The accompanying drawings, which are included to provide a further understanding of the specification, are incorporated in and constitute a part of this specification, and are not intended to limit the specification. In the drawings:

FIG. 1 is a flowchart of a federated learning auditing method in one embodiment of the present specification;

FIG. 2 illustrates an interaction diagram for federated learning in one embodiment of the present description;

FIG. 3 is a schematic diagram of an interaction information identifier in one embodiment of the present specification;

FIG. 4 is a schematic diagram of a federated learning auditing device in one embodiment of the present specification;

FIG. 5 shows a schematic diagram of a computer device in one embodiment of the present description.

Detailed Description

The principles and spirit of the present description will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely to enable those skilled in the art to better understand and to implement the present description, and are not intended to limit the scope of the present description in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As will be appreciated by one skilled in the art, embodiments of the present description may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

An embodiment of the specification provides a federated learning auditing method. In one example scenario, a data provider and a data consumer perform federated learning to obtain a target federated learning model. The data provider and the data consumer each upload data for federated modeling; encrypted sample alignment is then performed on the two parties' data, and the aligned sample information is output for federated learning. The data consumer's device may receive a federated learning audit request that a user enters via a client. The request may carry either the target federated learning model to be audited or its identifier. If it carries the identifier, the data consumer's device can obtain the target federated learning model corresponding to that identifier. In response to the request, the data consumer's device can acquire the interaction information corresponding to the identifier and perform model training based on it to obtain the audit model. The data consumer's device can then generate a model audit result according to the audit model and the target federated learning model.

The data consumer's device may be a single server, a server cluster including multiple servers, or a cloud server; the present application does not limit this.

FIG. 1 is a flowchart of a federated learning auditing method in one embodiment of the present specification. Although the present specification provides method steps or apparatus structures as shown in the following embodiments or drawings, the methods or apparatus may include more or fewer steps or modules based on conventional or non-inventive effort. For steps or structures between which no logically necessary causal relationship exists, the execution order of the steps or the module structure of the apparatus is not limited to that described in the embodiments and shown in the drawings. When applied in an actual device or end product, the method or module structure may be executed sequentially or in parallel (for example, on parallel processors, in a multi-threaded environment, or even in a distributed processing environment).

Specifically, as shown in FIG. 1, the federated learning auditing method provided in an embodiment of the present specification may include the following steps:

Step S101: receive a federated learning audit request, where the request includes a model identifier of a target federated learning model to be audited.

The method in the embodiments of the specification may be applied to a device of one of the federated learning participants or to a trusted third-party device. For ease of description, such devices are collectively referred to as the server in the embodiments of the present specification.

The server may receive a federated learning audit request, for example one sent by a user via a client. The request may include a model identifier of the target federated learning model to be audited, based on which the server can obtain the corresponding target federated learning model.

Step S102: in response to the federated learning audit request, acquire interaction information from the training process of the target federated learning model corresponding to the model identifier, where the interaction information carries an interaction information identifier that includes information on the stage in which the interaction information was generated.

Step S103: perform model training based on the interaction information to obtain an audit model.

After receiving the federated learning audit request, the server can, in response, acquire the interaction information from the training process of the target federated learning model corresponding to the model identifier.

In one embodiment, the federated learning participants store the interaction information produced during training in local databases, tagging it with the corresponding model identifier to facilitate later retrieval. The server can send an acquisition request carrying the model identifier to a federated learning participant, which responds by returning the interaction information corresponding to that model identifier.
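The storage and acquisition exchange described above might be sketched as follows. This is a minimal, hypothetical Python illustration: the class names `InteractionRecord` and `ParticipantStore`, the in-memory dictionary standing in for a participant's local database, and the tuple used as the interaction information identifier are all assumptions rather than structures defined in the specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class InteractionRecord:
    """One piece of interaction information (hypothetical structure)."""
    model_id: str                # model identifier of the federated model being trained
    identifier: Tuple[int, ...]  # interaction information identifier (stage, iteration, ...)
    payload: Any                 # the exchanged message, e.g. encrypted gradients


class ParticipantStore:
    """Hypothetical stand-in for a participant's local database."""

    def __init__(self) -> None:
        self._by_model: Dict[str, List[InteractionRecord]] = defaultdict(list)

    def save(self, record: InteractionRecord) -> None:
        # Index each record by its model identifier to ease later retrieval.
        self._by_model[record.model_id].append(record)

    def handle_acquisition_request(self, model_id: str) -> List[InteractionRecord]:
        # Return a copy of all records produced while training `model_id`;
        # an unknown identifier yields an empty list.
        return list(self._by_model.get(model_id, []))


store = ParticipantStore()
store.save(InteractionRecord("model-A", (1, 1), b"msg-1"))
store.save(InteractionRecord("model-B", (1, 1), b"msg-2"))
store.save(InteractionRecord("model-A", (2, 1), b"msg-3"))
records = store.handle_acquisition_request("model-A")
print([r.identifier for r in records])  # [(1, 1), (2, 1)]
```

Keying the store by model identifier keeps the participant's response to an acquisition request a single lookup; a production system would use a persistent database instead.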

After obtaining the interaction information, the server can perform model training based on it to obtain the audit model.

Step S104: generate a model audit result according to the audit model and the target federated learning model.

Specifically, the server may generate a model audit result according to the audit model and the target federated learning model to determine whether the target federated learning model passes the audit. The model audit result may be either pass or fail.
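Steps S101 to S104 can be chained into a single audit routine. The following is a minimal sketch, assuming the caller supplies the four hypothetical callables (`load_target_model`, `fetch_interactions`, `retrain`, `compare`); the specification does not prescribe how each step is implemented, so they are left as injected placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AuditRequest:
    model_id: str  # model identifier of the target federated learning model


def run_audit(
    request: AuditRequest,
    load_target_model: Callable[[str], Any],
    fetch_interactions: Callable[[str], List[Any]],
    retrain: Callable[[List[Any]], Any],
    compare: Callable[[Any, Any], bool],
) -> str:
    """Chain steps S101-S104; every callable is a caller-supplied placeholder."""
    # S101: the request names the target model; load it by its identifier.
    target_model = load_target_model(request.model_id)
    # S102: acquire the interaction information recorded during training.
    interactions = fetch_interactions(request.model_id)
    # S103: retrain from the interaction information to obtain the audit model.
    audit_model = retrain(interactions)
    # S104: compare the audit model with the target model.
    return "pass" if compare(audit_model, target_model) else "fail"
```

For example, `run_audit(AuditRequest("m1"), lambda m: {"w": 1.0}, lambda m: [0.4, 0.6], lambda msgs: {"w": sum(msgs)}, lambda a, t: abs(a["w"] - t["w"]) < 1e-9)` returns `"pass"`, because the toy retraining reproduces the target parameter.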

In one embodiment, the server may generate the model audit result from the model parameters of the audit model and those of the target federated learning model. For example, it may determine whether the difference between each of a plurality of model parameters in the audit model and the corresponding parameter in the target federated learning model is within a preset range.
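The parameter-based check above can be sketched as a per-parameter tolerance test. This assumes, hypothetically, that both models expose their parameters as a name-to-float mapping and that the "preset range" is a single absolute tolerance (`tolerance=1e-3` is an illustrative default, not a value from the specification).

```python
from typing import Mapping


def parameters_within_range(
    audit_params: Mapping[str, float],
    target_params: Mapping[str, float],
    tolerance: float = 1e-3,
) -> bool:
    """Pass only if every corresponding parameter pair differs by at most `tolerance`."""
    # The two models must expose the same set of parameter names.
    if audit_params.keys() != target_params.keys():
        return False
    # Every parameter pair must lie within the preset range of each other.
    return all(abs(audit_params[k] - target_params[k]) <= tolerance for k in audit_params)
```

Because retraining may involve floating-point noise or randomness, an exact equality test would be too strict; a tolerance is the usual compromise.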

In one embodiment, the server may generate the model audit result from the predictions of the audit model and of the target federated learning model on each of a plurality of prediction samples.

In one embodiment, the server may generate the model audit result from the prediction performance index of the audit model and that of the target federated learning model.

The above-described methods of generating model audit results are merely exemplary, and other methods of generating model audit results will occur to those of skill in the art.

In the method of this embodiment, the model is retrained from the interaction information, and the retrained model, referred to as the audit model, is compared with the target federated learning model to be audited to judge whether one or more federated learning parties failed to train according to the agreed method. This improves the reliability and accuracy of federated learning.

In some embodiments of the present description, the interaction information identifier may include a plurality of identifiers; a first identifier may include an identifier of the module that generated the interaction information, and a second identifier may include an identifier of the stage in which it was generated.

Specifically, the model training process may include multiple stages. How the stages are divided depends on the model training method and can be defined in advance for the specific case. The interaction information identifier may include a plurality of identifiers, the first of which may identify the module that generated the interaction information.

In general, the training process may include multiple phases. Thus, the second identifier of the plurality of identifiers may comprise an identification of a stage at which the interaction information was generated. For example, when the model training includes 3 stages, the stage identifier may take values of 1, 2, and 3.

In the above embodiment, because the interaction information identifier includes the identifier of the stage in which the interaction information was generated, the server can apply each piece of interaction information to the correct stage when performing model training, so that training proceeds smoothly and yields the audit model.

In some embodiments of the present description, the stage in which the interaction information is generated may include a plurality of iterations; accordingly, a third identifier of the plurality of identifiers includes the sequence number of the iteration in which the interaction information is generated.

Some training stages, such as a sub-tree training stage, require multiple iterations. Thus, the third identifier may include the sequence number of the iteration in which the interaction information was generated.

For example, if a stage includes 4 iterations, the iteration sequence number may take the values 1, 2, 3, and 4.

In the above embodiment, because the interaction information includes the sequence number of the iteration in which it was generated, the server can apply it to the stage corresponding to that specific iteration when performing model training, so that training proceeds smoothly and yields the audit model.

In some embodiments of the present description, the stage in which the interaction information is generated may include a plurality of sub-stages; accordingly, a fourth identifier of the plurality of identifiers may include an identifier of the sub-stage in which the interaction information is generated.

Specifically, a stage of model training may include multiple sub-stages, and the fourth identifier may identify the sub-stage in which the interaction information was generated.

For example, a training round may include two sub-stages: sub-tree training and sub-tree prediction. In this case, the sub-stage identifier may take the values 1 and 2.

In the above embodiment, because the interaction information includes the identifier of the sub-stage in which it was generated, the server can apply it to the stage corresponding to that specific sub-stage when performing model training, so that training proceeds smoothly and yields the audit model.

In some embodiments of the present description, the sub-stage in which the interaction information is generated may include a plurality of loop sub-stages; accordingly, a fifth identifier of the plurality of identifiers may include an identifier of the loop sub-stage in which the interaction information is generated.

Specifically, a stage of model training may include a plurality of sub-stages, each of which may need to be repeated several times; that is, each sub-stage may include a plurality of loop sub-stages. The fifth identifier may identify the loop sub-stage in which the interaction information was generated.

For example, if a sub-stage includes 4 loop sub-stages, the loop sub-stage identifier may take the values 1, 2, 3, and 4.

In the above embodiment, because the interaction information includes the identifier of the loop sub-stage in which it was generated, the server can apply it to the stage corresponding to that specific loop sub-stage when performing model training, so that training proceeds smoothly and yields the audit model.

In some embodiments of the present description, the loop sub-stage in which the interaction information is generated may include a plurality of layers; accordingly, a sixth identifier of the plurality of identifiers may include an identifier of the layer of the loop sub-stage in which the interaction information is generated.

Specifically, a loop sub-stage of model training may be divided into multiple layers. The sixth identifier may identify the layer in which the interaction information was generated.

For example, if a loop sub-stage includes 3 layers, the layer identifier may take the values 1, 2, and 3.

In the above embodiment, because the interaction information includes the layer of the loop sub-stage in which it was generated, the server can apply it to the stage corresponding to that specific layer of that specific loop sub-stage when performing model training, so that training proceeds smoothly and yields the audit model.

In some embodiments of the present description, the layer in which the interaction information is generated may include multiple steps; accordingly, a seventh identifier of the plurality of identifiers may include an identifier of the step in which the interaction information is generated.

Specifically, a layer of a loop sub-stage may include multiple steps, and the seventh identifier identifies the step in which the interaction information was generated.

For example, if a layer includes 2 steps, the step identifier may take the values 1 and 2.

In the above embodiment, because the interaction information includes the step of the layer in which it was generated, the server can apply it to the stage corresponding to that specific step of that specific layer of that specific loop sub-stage when performing model training, so that training proceeds smoothly and yields the audit model.

In the above embodiments, the interaction information identifier may include 1 to 7 identifiers. Those skilled in the art will appreciate that these embodiments are merely exemplary and that the application is not limited to them.
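One way the seven identifiers might be used during retraining is to sort the collected interaction information back into the order in which it was produced, so each message is replayed at the right point. The sketch below is a hypothetical illustration: the field names, the convention that 0 marks a level a stage does not use, and the module names "guest" and "host" are assumptions, and the ordering rule (coarsest level first, stable for ties) is one reasonable choice rather than the specification's method.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class InteractionId:
    module: str         # 1st identifier: module that generated the message
    stage: int          # 2nd: training stage
    iteration: int = 0  # 3rd: iteration within the stage (0 = level not used)
    sub_stage: int = 0  # 4th: sub-stage
    loop: int = 0       # 5th: loop sub-stage
    layer: int = 0      # 6th: layer of the loop sub-stage
    step: int = 0       # 7th: step within the layer

    def position(self) -> Tuple[int, int, int, int, int, int]:
        # Position in the training process, from the coarsest to the finest level.
        return (self.stage, self.iteration, self.sub_stage, self.loop, self.layer, self.step)


def replay_order(ids: Iterable[InteractionId]) -> List[InteractionId]:
    # sorted() is stable, so messages at the same position keep their arrival order.
    return sorted(ids, key=lambda i: i.position())


ids = [
    InteractionId("guest", stage=2, iteration=1, sub_stage=2, loop=1, layer=1, step=1),
    InteractionId("host", stage=1),
    InteractionId("guest", stage=2, iteration=1, sub_stage=1, loop=3, layer=2, step=2),
]
for i in replay_order(ids):
    print(i.module, i.position())
```

The module identifier is deliberately left out of the sort key: it says who produced a message, not when, so it should not affect replay order.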

In some embodiments of the present specification, because different models divide their stages and levels differently, the number of identifiers required also differs. To prevent overflow, the interaction information identifier may be stored in a fixed number of identifier slots, with that number set to a large value, for example 50 or 100.
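A fixed-capacity layout of this kind might look as follows. This minimal sketch assumes identifier values start at 1 (as in the examples above), so 0 can safely mark an unused slot; the helper name `pack_fixed` and the use of Python's `array` type are illustrative choices, and 50 is one of the capacities the text mentions.

```python
from array import array
from typing import Sequence

MAX_ID_FIELDS = 50  # fixed slot count; the text mentions values such as 50 or 100


def pack_fixed(levels: Sequence[int], capacity: int = MAX_ID_FIELDS) -> array:
    """Store an identifier in a fixed number of integer slots (hypothetical helper)."""
    if len(levels) > capacity:
        # Refuse identifiers deeper than the reserved capacity instead of overflowing.
        raise OverflowError(f"identifier has {len(levels)} levels, capacity is {capacity}")
    # Pad unused slots with 0 so every identifier occupies exactly `capacity` slots.
    return array("i", list(levels) + [0] * (capacity - len(levels)))
```

The trade-off is memory: every identifier pays for the full capacity even when a model uses only a few levels.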

In some embodiments of the present description, to accommodate the different stage and level divisions of different models, the interaction information identifier may instead be stored in a dynamic list. Storing the identifier as a dynamic list meets the differing requirements of different models at different stages, saves memory, and is flexible and simple to operate.
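With a dynamic list, the identifier grows and shrinks with the nesting depth of the current training position, so shallow and deep models each store only the levels they use. The tracker below is a hypothetical sketch: the class and method names are assumptions, and it presumes that training code calls `enter` and `leave` as it moves between stages, iterations, sub-stages, and so on.

```python
from typing import List, Tuple


class IdentifierTracker:
    """Hypothetical helper: tracks the current position as a dynamic list of levels."""

    def __init__(self, module_id: int) -> None:
        self._levels: List[int] = [module_id]  # 1st identifier: module

    def enter(self, index: int) -> None:
        # Descend one level (stage -> iteration -> sub-stage -> ...).
        self._levels.append(index)

    def leave(self) -> None:
        # Return to the enclosing level; the module identifier is never removed.
        if len(self._levels) > 1:
            self._levels.pop()

    def current(self) -> Tuple[int, ...]:
        # Immutable snapshot to attach to each outgoing interaction message.
        return tuple(self._levels)
```

For instance, after `enter(2)` and `enter(3)` on a tracker for module 1, `current()` is `(1, 2, 3)`; calling `leave()` and then `enter(4)` moves to the sibling position `(1, 2, 4)`.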

In some embodiments of the present description, generating a model audit result according to the audit model and the target federated learning model may include: obtaining a prediction sample set that includes a plurality of prediction samples; predicting each prediction sample with the audit model to obtain its audit prediction result; predicting each prediction sample with the target federated learning model to obtain its target prediction result; and generating the model audit result based on the target prediction result and the audit prediction result of each prediction sample.

Specifically, to determine whether the model passes the audit, the target federated learning model and the audit model can each be used to predict the samples in a prediction sample set, and the judgment is made from the prediction results. The server may obtain a prediction sample set containing a plurality of prediction samples, predict each sample with the target federated learning model to obtain its target prediction result, and predict each sample with the audit model to obtain its audit prediction result. The server may then generate the model audit result based on the target and audit prediction results of each sample. For example, if the two sets of prediction results are close, the model is determined to pass the audit; otherwise, it is determined to fail.
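The prediction-based comparison above can be sketched for models that output a numeric score per sample. This is a hedged illustration: models are assumed to be plain callables, and "close" is read here as a small mean absolute difference between the two models' scores, with a hypothetical `tolerance` of 0.01; the specification does not fix either choice.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]  # maps one sample's features to a score


def predict_both(
    samples: Sequence[Sequence[float]], audit_model: Model, target_model: Model
) -> Tuple[List[float], List[float]]:
    # Run both models over the same prediction sample set.
    audit_preds = [audit_model(s) for s in samples]
    target_preds = [target_model(s) for s in samples]
    return audit_preds, target_preds


def predictions_close(
    audit_preds: Sequence[float], target_preds: Sequence[float], tolerance: float = 0.01
) -> bool:
    if len(audit_preds) != len(target_preds) or not audit_preds:
        raise ValueError("need two non-empty prediction lists of equal length")
    # "Close" is read here as a small mean absolute difference between scores.
    return fmean(abs(a - t) for a, t in zip(audit_preds, target_preds)) <= tolerance
```

For class-label outputs, the percentage-agreement criterion of the following embodiment is the more natural test.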

Further, in some embodiments of the present specification, generating the model audit result based on the target prediction result and the audit prediction result of each prediction sample may include: determining the percentage of the prediction samples whose target prediction result is the same as the corresponding audit prediction result; judging whether the percentage is greater than a preset percentage; and, if it is, determining that the target federated learning model passes the audit.

Specifically, the server may determine the percentage of the prediction samples whose target prediction result is the same as the corresponding audit prediction result, and then judge whether this percentage is greater than a preset percentage. The preset percentage can be set according to user requirements, for example to 85%, 90%, or 95%. If the percentage is greater than the preset percentage, the target federated learning model is determined to pass the audit. In this way, whether the model passes the audit can be determined simply and conveniently.
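The percentage-agreement rule above translates directly into code. A minimal sketch, assuming predictions are discrete labels compared by equality; the function name `agreement_audit` and the default of 90% are illustrative (90% is one of the example values in the text).

```python
from typing import Sequence, Tuple


def agreement_audit(
    target_preds: Sequence[object],
    audit_preds: Sequence[object],
    preset_percentage: float = 90.0,
) -> Tuple[bool, float]:
    """Return (passed, percentage of samples on which both models agree)."""
    if len(target_preds) != len(audit_preds) or not target_preds:
        raise ValueError("need two non-empty prediction lists of equal length")
    same = sum(1 for t, a in zip(target_preds, audit_preds) if t == a)
    percentage = 100.0 * same / len(target_preds)
    # The text requires the percentage to be strictly greater than the preset value.
    return percentage > preset_percentage, percentage
```

With ten samples on which the models disagree once, the agreement is exactly 90%, so the model fails at a 90% threshold (strict inequality) but passes at 85%.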

In some embodiments of the present specification, the prediction sample set may further include label information for each of the plurality of prediction samples. Correspondingly, generating a model audit result based on the target prediction result and the audit prediction result of each prediction sample may include: determining a prediction performance index of the target federal learning model from the target prediction result and the label information of each prediction sample; determining a prediction performance index of the audit model from the audit prediction result and the label information of each prediction sample; and generating a model audit result based on the prediction performance index of the target federal learning model and the prediction performance index of the audit model.

The prediction sample set may further include a label for each of the plurality of prediction samples. The label indicates the type information of the prediction sample. The server can determine the prediction performance index of the target federal learning model from the target prediction result and the label information of each prediction sample. The prediction performance index may include accuracy, precision, recall, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and the like. The server can likewise determine the prediction performance index of the audit model from the audit prediction result and the label information of each prediction sample. An audit result may then be generated based on the prediction performance indices of the target federal learning model and the audit model. For example, it can be determined whether the prediction performance index of the target federal learning model is close to that of the audit model; if so, the target federal learning model is determined to pass the audit, and if not, it is determined to fail the audit.
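For binary labels, the indices listed above can be computed with a few lines of plain Python, as in the sketch below. It assumes 0/1 labels, 0/1 prediction results for accuracy, precision and recall, and real-valued scores (for example predicted probabilities) for AUC; the function names are hypothetical, and the same functions would simply be applied once to the target model's outputs and once to the audit model's outputs.

```python
from typing import Dict, Sequence


def binary_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for 0/1 labels and 0/1 prediction results."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equally long")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Ties between a positive and a negative score count as half a "win".
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a small audit set; for large sets a rank-based formula or an established library routine would be the more practical choice.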

Further, in some embodiments of the present specification, generating the model audit result based on the prediction performance index of the target federal learning model and the prediction performance index of the audit model may include: determining whether the difference between the prediction performance index of the target federal learning model and that of the audit model is within a preset range; and, if the difference is within the preset range, determining that the target federal learning model passes the audit.

Specifically, the server may determine whether the difference between a prediction performance index of the target federal learning model and the corresponding prediction performance index of the audit model is within a preset range. For example, accuracy may be selected as the prediction performance index, in which case the server determines whether the difference between the accuracy of the target federal learning model and the accuracy of the audit model is within the preset range. The preset range may be, for example, plus or minus five percent.

If the difference is within the preset range, the target federal learning model is determined to pass the audit; otherwise, it is determined to fail the audit.
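The preset-range decision can be sketched as below. The sketch assumes that "plus or minus five percent" means an absolute difference of at most 0.05 between indices expressed on a 0 to 1 scale, and that the boundary itself counts as inside the range; both are interpretations, not statements of the specification. The helper names are hypothetical.

```python
import math
from typing import Iterable, Mapping


def within_preset_range(target_value: float, audit_value: float, tolerance: float = 0.05) -> bool:
    """True if |target - audit| <= tolerance, allowing for floating-point rounding at the boundary."""
    diff = abs(target_value - audit_value)
    return diff <= tolerance or math.isclose(diff, tolerance, rel_tol=1e-9, abs_tol=1e-12)


def audit_by_metrics(target_metrics: Mapping[str, float], audit_metrics: Mapping[str, float],
                     names: Iterable[str] = ("accuracy",), tolerance: float = 0.05) -> bool:
    """Pass the audit only if every selected index differs by no more than the tolerance."""
    return all(within_preset_range(target_metrics[n], audit_metrics[n], tolerance) for n in names)
```

The `math.isclose` fallback matters in practice: `0.91 - 0.86` evaluates to slightly more than `0.05` in binary floating point, so a plain `<=` would wrongly reject a difference that is exactly on the boundary.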

The above method is described below with reference to a specific example. It should be noted, however, that the example serves only to better describe the present specification and should not be construed as unduly limiting it.

In this embodiment, a federal learning audit method is provided. The server may receive a federal learning audit request, which may carry the model identifier of the target federal learning model to be audited. In response to the received request, the server may obtain the interaction information exchanged during training of the target federal learning model corresponding to the model identifier. The interaction information may carry an interaction information identifier, which may include information about the stage in which the interaction information was generated. The server may perform model training based on the interaction information to obtain an audit model, and then generate a model audit result according to the audit model and the target federal learning model.

In this embodiment, the identifier may be designed according to two principles: uniqueness and ordering. The identifier of the component or module of the federal learning task is used as the first position of the interaction information identifier. The interaction flow inside the component is then decomposed by hierarchy (for example, by training round): information within the same hierarchy level (such as data generated in the same round of iteration) uses the same number at the same position of the sequence identifier, and numbers at the same position indicate order of precedence from small to large. Each position is further subdivided by the subsequent positions until the point at which the information interaction occurs can be uniquely determined.
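The two principles can be illustrated with a small sketch. It assumes a hypothetical identifier layout of (module, round, layer, step) with integer positions, generated by nested loops in the order a training run would emit messages; the function name and layout are illustrative only. Uniqueness means no two messages share an identifier, and ordering means that sorting the identifiers recovers the generation order.

```python
import random
from typing import List, Tuple

Identifier = Tuple[str, int, int, int]  # (module, round, layer, step): hypothetical layout


def emit_identifiers(module: str, rounds: int, layers: int, steps: int) -> List[Identifier]:
    """Number messages hierarchically in the order a training run would produce them."""
    ids: List[Identifier] = []
    for r in range(1, rounds + 1):            # same round -> same number in position 2
        for layer in range(1, layers + 1):    # each round is subdivided by layer
            for step in range(1, steps + 1):  # each layer is subdivided by message step
                ids.append((module, r, layer, step))
    return ids


if __name__ == "__main__":
    ids = emit_identifiers("Secureboost_0", rounds=3, layers=2, steps=2)
    shuffled = ids[:]
    random.Random(0).shuffle(shuffled)
    print(len(set(ids)) == len(ids))  # uniqueness: True
    print(sorted(shuffled) == ids)    # ordering: True
```

Keeping the numeric positions as integers is what makes lexicographic tuple comparison match the generation order; if the positions were stored as strings, "10" would sort before "2" and the ordering principle would break.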

The following describes how the unique identifier of each piece of interaction information is constructed, taking as an example a vertical federal learning task modeled with a vertical model in the open-source FATE (Federated AI Technology Enabler) framework.

Taking two-party federal learning as an example, the data provider and the data user upload the data for the joint modeling task through a first data module and a second data module, respectively. The data uploaded by both parties is passed to a data intersection module for processing.

In the data intersection module, the data of the two parties undergoes encrypted sample alignment, and the resulting common sample information is output for subsequent vertical federal learning. The common sample information may be input to the model training module (for example, the module identified as Secureboost_0).
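To make the idea of "common sample information" concrete, the sketch below intersects two parties' sample IDs by comparing keyed hashes (HMAC-SHA256) instead of raw IDs. This is a deliberately simplified stand-in: it is not the alignment protocol FATE actually uses, and it is not a secure private set intersection, since any party holding the shared key can test guessed IDs against the digests it receives. Both parties are simulated in one process, and all names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List


def blind(ids: Iterable[str], key: bytes) -> Dict[str, str]:
    """Map HMAC-SHA256(key, id) -> id; only the digests would leave the owning party."""
    return {hmac.new(key, i.encode("utf-8"), hashlib.sha256).hexdigest(): i for i in ids}


def align(provider_ids: Iterable[str], user_ids: Iterable[str], key: bytes) -> List[str]:
    """Return the sample IDs both parties hold, found by intersecting digests only."""
    provider = blind(provider_ids, key)
    user = blind(user_ids, key)
    common = provider.keys() & user.keys()  # intersect on digests, not raw IDs
    return sorted(user[d] for d in common)  # common sample list, in a fixed order
```

For example, `align(["u1", "u2", "u3"], ["u2", "u3", "u4"], b"shared-key")` yields the common samples `u2` and `u3`, which would then be handed to the training module.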

The model training module may perform an actual model training task based on the output common sample information.

For the interaction information transmitted in the model training module, the first position of the unique sequence identifier is "Secureboost_0".

Referring to fig. 2, a schematic diagram of the interaction between a data provider and a data consumer during model training is shown. The data provider on the left and the data user on the right represent the two participants in model training; the solid lines with arrows in the middle represent the information sent between the participants; and each outer dashed box represents a loop at a certain level. For example, the dashed box corresponding to the number of model rounds represents the generation of each iterative sub-model: the operations inside this box are executed again for each sub-model trained during the training of the whole model.

Referring to fig. 3, a schematic diagram of an interactive information identifier in an embodiment of the present specification is shown. The meaning of each identifier in fig. 3 is as follows:

First position, Secureboost_0: the interaction corresponding to this information occurs in the Secureboost_0 module.

Second position, 2: the interaction corresponding to this information occurs in the second stage (sub-model iteration).

Third position, 4: the interaction corresponding to this information occurs in the 4th iteration of the second stage.

Fourth position, 1: the interaction corresponding to this information occurs in the sub-model generation stage of that iteration.

Fifth position, 2: the interaction corresponding to this information occurs in the second cyclic phase (layer by layer) of sub-model generation.

Sixth position, 2: the interaction corresponding to this information occurs in the second layer of node splitting.

Seventh position, 2: this information provides the data consumer with the split node location.
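The seven positions above can be represented as a typed record, as in the sketch below. The textual form of the identifier is not given in the text, so the dot separator and the string "Secureboost_0.2.4.1.2.2.2" used for illustration are assumptions, as are the class and field names; the field meanings follow the fig. 3 example.

```python
from typing import NamedTuple


class InteractionId(NamedTuple):
    """Seven-position interaction information identifier, following the fig. 3 example."""
    module: str     # 1st: module that produced the message, e.g. "Secureboost_0"
    stage: int      # 2nd: stage of the module (2 = sub-model iteration in fig. 3)
    iteration: int  # 3rd: iteration number within that stage
    sub_stage: int  # 4th: sub-stage within the iteration (1 = sub-model generation)
    loop: int       # 5th: cyclic sub-stage (2 = layer-by-layer splitting in fig. 3)
    layer: int      # 6th: layer of node splitting
    step: int       # 7th: step within the layer

    @classmethod
    def parse(cls, text: str, sep: str = ".") -> "InteractionId":
        # Assumes the module identifier itself never contains the separator.
        module, *rest = text.split(sep)
        if len(rest) != 6:
            raise ValueError(f"expected 7 positions, got {len(rest) + 1}: {text!r}")
        return cls(module, *(int(part) for part in rest))

    def format(self, sep: str = ".") -> str:
        return sep.join(str(part) for part in self)
```

Because a `NamedTuple` compares like a plain tuple, parsed identifiers from the same module sort in generation order without any extra comparison logic.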

In the above embodiment, the interaction information identifier is stored in the interaction information as additional information, so that the specific step in which each piece of interaction information occurred can be confirmed. The unique identifiers of information in the other modules (encrypted sample alignment and other federal learning models) are likewise organized according to their interaction logic and constructed in the same way. After the federal learning task is finished, the generation step of each party's interaction information can be confirmed through its unique identifier, which can then be applied to federal learning auditing.
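After the task finishes, an auditor could use the identifiers roughly as sketched below: flag any identifier that appears more than once (the uniqueness principle says it should not) and replay one module's messages in generation order, grouped by iteration. The record layout `(identifier, payload)` with a seven-position identifier tuple is an assumption modeled on the fig. 3 example, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: (identifier, payload), where the identifier is the tuple
# (module, stage, iteration, sub_stage, loop, layer, step) described for fig. 3.
Record = Tuple[Tuple[Any, ...], Any]


def find_duplicate_ids(records: List[Record]) -> List[tuple]:
    """Identifiers are meant to be unique, so any repeat is flagged for the auditor."""
    counts = Counter(ident for ident, _ in records)
    return sorted(ident for ident, n in counts.items() if n > 1)


def group_by_iteration(records: List[Record], module: str, stage: int) -> Dict[int, List[Record]]:
    """Replay one module's messages in generation order, bucketed by iteration number."""
    groups: Dict[int, List[Record]] = defaultdict(list)
    for ident, payload in sorted(records, key=lambda r: r[0]):
        if ident[0] == module and ident[1] == stage:
            groups[ident[2]].append((ident, payload))
    return dict(groups)
```

Sorting on the identifier alone (the `key` argument) avoids comparing payloads, which may be opaque ciphertext objects that do not support ordering.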

Based on the same inventive concept, the embodiments of the present specification further provide a federal learning audit device, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the federal learning audit method, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated. Fig. 4 is a structural block diagram of a federal learning audit device in an embodiment of the present specification. As shown in fig. 4, the device includes a receiving module 401, an obtaining module 402, a training module 403, and a generating module 404, whose structure is described below.

The receiving module 401 is configured to receive a federal learning audit request, where the federal learning audit request includes a model identifier of a target federal learning model to be audited.

The obtaining module 402 is configured to, in response to the federal learning audit request, obtain the interaction information exchanged during training of the target federal learning model corresponding to the model identifier, where the interaction information carries an interaction information identifier, and the interaction information identifier includes information about the stage in which the interaction information is generated.

The training module 403 is configured to perform model training based on the interaction information to obtain an audit model.

The generating module 404 is configured to generate a model audit result according to the audit model and the target federal learning model.
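The four modules of fig. 4 can be wired together as in the following sketch. It is illustrative only: the class name, the in-memory `interaction_store` dictionary, and the injected `train_fn` and `compare_fn` callables are hypothetical stand-ins for storage, retraining, and the comparison rules described earlier; the specification does not prescribe these interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Tuple[tuple, Any]  # (interaction information identifier, payload)


@dataclass
class AuditRequest:
    model_id: str  # model identifier of the target model to be audited


class FederalLearningAuditDevice:
    """Hypothetical wiring of modules 401-404; storage and training are injected callables."""

    def __init__(self,
                 interaction_store: Dict[str, List[Record]],
                 target_models: Dict[str, Any],
                 train_fn: Callable[[List[Record]], Any],
                 compare_fn: Callable[[Any, Any], bool]) -> None:
        self.interaction_store = interaction_store
        self.target_models = target_models
        self.train_fn = train_fn
        self.compare_fn = compare_fn

    def receive(self, request: AuditRequest) -> str:            # receiving module 401
        return request.model_id

    def obtain(self, model_id: str) -> List[Record]:            # obtaining module 402
        # Sorting by identifier restores the order in which the messages were generated.
        return sorted(self.interaction_store[model_id], key=lambda r: r[0])

    def train(self, interactions: List[Record]) -> Any:         # training module 403
        return self.train_fn(interactions)

    def generate(self, model_id: str, audit_model: Any) -> bool:  # generating module 404
        return self.compare_fn(self.target_models[model_id], audit_model)

    def handle(self, request: AuditRequest) -> bool:
        """Run the full audit flow for one request and return whether the model passes."""
        model_id = self.receive(request)
        audit_model = self.train(self.obtain(model_id))
        return self.generate(model_id, audit_model)
```

Injecting `compare_fn` keeps the device independent of which audit rule is used, so either the agreement-rate rule or the performance-index rule could be plugged in.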

In some embodiments of the present specification, the interaction information identifier comprises a plurality of identifiers; a first identifier of the plurality of identifiers comprises the identifier of the module that generates the interaction information, and a second identifier of the plurality of identifiers comprises the identifier of the stage in which the interaction information is generated.

In some embodiments of the present specification, the stage in which the interaction information is generated comprises a plurality of iterations; accordingly, a third identifier of the plurality of identifiers comprises the sequence number of the iteration in which the interaction information is generated.

In some embodiments of the present specification, the stage in which the interaction information is generated comprises a plurality of sub-stages; accordingly, a fourth identifier of the plurality of identifiers comprises the identifier of the sub-stage in which the interaction information is generated.

In some embodiments of the present specification, the sub-stage in which the interaction information is generated includes a plurality of cyclic sub-stages; accordingly, a fifth identifier of the plurality of identifiers comprises the identifier of the cyclic sub-stage in which the interaction information is generated.

In some embodiments of the present specification, the cyclic sub-stage in which the interaction information is generated includes a plurality of layers; accordingly, a sixth identifier of the plurality of identifiers comprises the identifier of the layer of the cyclic sub-stage in which the interaction information is generated.

In some embodiments of the present specification, the layer in which the interaction information is generated includes a plurality of steps; accordingly, a seventh identifier of the plurality of identifiers comprises the identifier of the step in which the interaction information is generated.

In some embodiments of the present specification, the generating module may be specifically configured to: obtain a prediction sample set, where the prediction sample set includes a plurality of prediction samples; predict each of the plurality of prediction samples with the audit model to obtain an audit prediction result for each prediction sample; predict each of the plurality of prediction samples with the target federal learning model to obtain a target prediction result for each prediction sample; and generate a model audit result based on the target prediction result and the audit prediction result of each prediction sample.

In some embodiments of the present specification, generating a model audit result based on the target prediction result and the audit prediction result of each prediction sample includes: determining, among the plurality of prediction samples, the percentage of prediction samples whose target prediction result is the same as the corresponding audit prediction result; determining whether the percentage is greater than a preset percentage; and, if the percentage is greater than the preset percentage, determining that the target federal learning model passes the audit.

In some embodiments of the present specification, the prediction sample set further includes label information for each of the plurality of prediction samples. Correspondingly, generating a model audit result based on the target prediction result and the audit prediction result of each prediction sample includes: determining a prediction performance index of the target federal learning model from the target prediction result and the label information of each prediction sample; determining a prediction performance index of the audit model from the audit prediction result and the label information of each prediction sample; and generating a model audit result based on the prediction performance index of the target federal learning model and the prediction performance index of the audit model.

In some embodiments of the present specification, generating the model audit result based on the prediction performance index of the target federal learning model and the prediction performance index of the audit model includes: determining whether the difference between the prediction performance index of the target federal learning model and that of the audit model is within a preset range; and, if the difference is within the preset range, determining that the target federal learning model passes the audit.

As can be seen from the above description, the embodiments of the present specification achieve the following technical effects: a model can be retrained based on the interaction information, and the retrained model, recorded as the audit model, can be compared with the target federal learning model to be audited. This makes it possible to determine whether one or more parties in federal learning failed to train according to the agreed method, which can improve the reliability and accuracy of federal learning.

The embodiments of the present specification further provide a computer device; refer to fig. 5, a schematic structural diagram of a computer device based on the federal learning audit method provided in the embodiments of the present specification. The computer device may include an input device 51, a processor 52, and a memory 53. The memory 53 is configured to store processor-executable instructions. When executing the instructions, the processor 52 implements the steps of the federal learning audit method described in any of the above embodiments.

In this embodiment, the input device may be one of the main means of information exchange between a user and a computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting tablet, a voice input device, and the like; it is used to input raw data and the programs that process the data into the computer. The input device can also acquire and receive data transmitted by other modules, units, and devices. The processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The memory may be a memory device used in modern information technology to store information. The memory may include multiple levels: in a digital system, anything that can store binary data can serve as memory; in an integrated circuit, a circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device in physical form is also called a memory, such as a memory module or a TF card.

In this embodiment, the functions and effects of the specific implementation of the computer device can be explained in comparison with other embodiments, and are not described herein again.

In an embodiment of the present specification, a computer storage medium based on the federal learning audit method is further provided. The computer storage medium stores computer program instructions that, when executed, implement the steps of the federal learning audit method in any of the above embodiments.

In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.

In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparing with other embodiments, and are not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present specification described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed over a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present description are not limited to any specific combination of hardware and software.

It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the description should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

The above description is only a preferred embodiment of the present disclosure, and is not intended to limit the present disclosure, and it will be apparent to those skilled in the art that various modifications and variations can be made in the embodiment of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present specification shall be included in the protection scope of the present specification.

Claims

1. A federal learning audit method, characterized by comprising the following steps:

receiving a federal learning audit request, wherein the federal learning audit request comprises a model identifier of a target federal learning model to be audited;

in response to the federal learning audit request, acquiring interaction information in the training process of the target federal learning model corresponding to the model identifier, wherein the interaction information carries an interaction information identifier, and the interaction information identifier comprises information of the stage in which the interaction information is generated; the interaction information further comprises common sample information generated after encrypted sample alignment is performed on sample data uploaded by a data provider and a data user;

performing model training based on the interaction information to obtain an audit model;

and generating a model auditing result according to the auditing model and the target federal learning model.

2. The method of claim 1, wherein the interaction information identifier comprises a plurality of identifiers, wherein a first identifier of the plurality of identifiers comprises a module identifier for generating interaction information, and wherein a second identifier of the plurality of identifiers comprises a stage identifier for generating interaction information.

3. The method of claim 2, wherein the stage at which the interaction information is generated comprises a plurality of iterations;

correspondingly, a third identifier of the plurality of identifiers comprises the sequence number of the iteration in which the interaction information is generated.

4. The method of claim 3, wherein the stage in which the interaction information is generated comprises a plurality of sub-stages;

accordingly, a fourth identifier of the plurality of identifiers comprises an identification of a sub-phase of the phase of generating the interaction information.

5. The method of claim 4, wherein the sub-stage in which the interaction information is generated comprises a plurality of cyclic sub-stages;

accordingly, a fifth identifier of the plurality of identifiers comprises an identification of the cyclic sub-stage in which the interaction information is generated.

6. The method of claim 5, wherein the cyclic sub-stage in which the interaction information is generated comprises a plurality of layers;

accordingly, a sixth identifier of the plurality of identifiers comprises an identification of a layer of a cyclic sub-phase of generating the interaction information.

7. The method of claim 6, wherein the layer in which the interaction information is generated comprises a plurality of steps;

accordingly, a seventh identifier of the plurality of identifiers comprises an identification of the step of generating the interaction information.

8. The method of claim 1, wherein generating model audit results based on the audit model and the target federal learning model comprises:

obtaining a prediction sample set, wherein the prediction sample set comprises a plurality of prediction samples;

predicting each prediction sample in the plurality of prediction samples by using the audit model to obtain an audit prediction result corresponding to each prediction sample;

predicting each prediction sample in the plurality of prediction samples by using the target federal learning model to obtain a target prediction result corresponding to each prediction sample;

and generating a model audit result based on the target prediction result corresponding to each prediction sample and the audit prediction result corresponding to each prediction sample.

9. The method of claim 8, wherein generating a model audit result based on the target prediction result corresponding to each prediction sample and the audit prediction result corresponding to each prediction sample comprises:

determining, among the plurality of prediction samples, the percentage of prediction samples whose target prediction result is the same as the corresponding audit prediction result;

judging whether the percentage is greater than a preset percentage;

and determining that the target federal learning model passes the audit when the percentage is greater than the preset percentage.

10. The method of claim 8, wherein the set of prediction samples further includes label information corresponding to each of the plurality of prediction samples;

correspondingly, generating a model audit result based on the target prediction result corresponding to each prediction sample and the audit prediction result corresponding to each prediction sample, including:

determining a prediction performance index of the target federal learning model according to a target prediction result corresponding to each prediction sample and label information corresponding to each prediction sample;

determining the prediction performance index of the audit model according to the audit prediction result corresponding to each prediction sample and the label information corresponding to each prediction sample;

and generating a model audit result based on the predicted performance index of the target federal learning model and the predicted performance index of the audit model.

11. The method of claim 10, wherein generating model audit results based on the predicted performance metrics of the target federal learning model and the predicted performance metrics of the audit model comprises:

judging whether the difference value between the predicted performance index of the target federal learning model and the predicted performance index of the audit model is within a preset range or not;

and determining that the target federal learning model passes the audit when the difference value is judged to be within the preset range.

12. A federal learning audit device, characterized by comprising:

the receiving module is used for receiving a federal learning audit request, wherein the federal learning audit request comprises a model identifier of a target federal learning model to be audited;

an obtaining module, configured to, in response to the federal learning audit request, obtain interaction information in the training process of the target federal learning model corresponding to the model identifier, where the interaction information carries an interaction information identifier, and the interaction information identifier includes information of the stage in which the interaction information is generated; the interaction information further comprises common sample information generated after encrypted sample alignment is performed on sample data uploaded by a data provider and a data user;

the training module is used for carrying out model training based on the interaction information to obtain an audit model;

and the generating module is used for generating a model auditing result according to the auditing model and the target federal learning model.

13. The apparatus of claim 12, wherein the interaction information identifier comprises a plurality of identifiers, wherein a first identifier of the plurality of identifiers comprises a module identifier for generating interaction information, and wherein a second identifier of the plurality of identifiers comprises a stage identifier for generating interaction information.

14. A computer device comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 11.

15. A computer-readable storage medium having stored thereon computer instructions, wherein the instructions, when executed, implement the steps of the method of any one of claims 1 to 11.