WO2021203260A1 - A node matching method, apparatus, device and system - Google Patents

A node matching method, apparatus, device and system

Info

Publication number
WO2021203260A1
Authority
WO
WIPO (PCT)
Prior art keywords
data flow
flow graph
plaintext
operator
node information
Prior art date
Application number
PCT/CN2020/083639
Other languages
English (en)
French (fr)
Inventor
黄高峰
陈元丰
晏意林
史俊杰
谢翔
李升林
孙立林
Original Assignee
云图技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 云图技术有限公司
Priority to PCT/CN2020/083639
Publication of WO2021203260A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • This application relates to the field of data processing technology, and in particular to a node matching method, device, equipment and system.
  • the embodiments of this specification provide a node matching method, device, equipment and system, which can realize automatic testing of the correctness of the data flow graph and the execution result of the data flow graph, thereby improving the verification efficiency.
  • the node matching method, device, equipment and system provided in this specification are implemented in the following ways:
  • a node matching method includes:
  • obtaining data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model,
  • and the node information includes the plaintext operator node information that needs to be replaced with a cryptographic operator in the first data flow graph and the ciphertext operator node information included in the second data flow graph;
  • judging whether the first data flow graph is a subgraph of the second data flow graph;
  • when it is determined that the first data flow graph is a subgraph of the second data flow graph, matching the plaintext operator node information with the ciphertext operator node information, and outputting a matching result.
  • a node matching device includes:
  • An information acquisition module for acquiring data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model,
  • the node information includes plaintext operator node information that needs to be replaced with a cipher operator in the first data flow graph and ciphertext operator node information included in the second data flow graph;
  • a judging module configured to judge whether the first data flow graph is a subgraph of the second data flow graph
  • the matching module is configured to match the plaintext operator node information with the ciphertext operator node information when determining that the first data flow graph is a subgraph of the second data flow graph, and output a matching result.
  • a node matching device includes a processor and a memory for storing executable instructions of the processor.
  • when the instructions are executed by the processor, the following steps are implemented:
  • obtaining data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model,
  • and the node information includes the plaintext operator node information that needs to be replaced with a cryptographic operator in the first data flow graph and the ciphertext operator node information included in the second data flow graph;
  • judging whether the first data flow graph is a subgraph of the second data flow graph;
  • when it is determined that the first data flow graph is a subgraph of the second data flow graph, matching the plaintext operator node information with the ciphertext operator node information, and outputting a matching result.
  • a node matching system includes at least one processor and a memory storing computer-executable instructions.
  • when the processor executes the instructions, the steps of any method embodiment in this specification are implemented.
  • In the embodiments of this specification, in the process of replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators, the optimizer test component encapsulates the static optimizer. As a result, in the process of obtaining the data flow graph information and node information, not only can the existing plaintext machine learning model be reused to realize the privacy machine learning model and reduce development costs, but a guarantee is also provided for realizing automated testing of the data flow graph and of the correctness of the graph execution result. After the data flow graph information and node information are obtained, comparing the data flow graphs before and after the plaintext operator replacement ensures that the original graph part has not been modified and that correct plaintext machine learning model execution can still be provided.
  • the automatic testing of the correctness of the data flow graph and the execution result of the graph can be realized, thereby improving the verification efficiency.
  • the implementation scheme provided in this specification can not only reuse the existing plaintext machine learning model to realize the privacy machine learning model, reduce the development cost, and improve coding efficiency, but can also realize automated testing of the correctness of the data flow graph and of the graph execution result, thereby improving verification efficiency.
  • Fig. 1 is a schematic flowchart of an embodiment of a node matching method provided in this specification
  • FIG. 2 is a schematic flowchart of a specific embodiment of the node matching method provided in this specification
  • FIG. 3 is a schematic flowchart of another embodiment of the node matching method provided in this specification.
  • FIG. 4 is a schematic diagram of the module structure of an embodiment of a node matching device provided in this specification.
  • Fig. 5 is a hardware structure block diagram of an embodiment of a node matching server provided in this specification.
  • the local plaintext operator in the preset plaintext machine learning model can be replaced with the corresponding cryptographic operator to obtain the corresponding private machine learning model.
  • the privacy machine learning model is realized by reusing the existing plaintext machine learning model, which can effectively reduce the huge development cost caused by recoding the application program interface and private data type unique to the privacy machine learning framework, and improve the coding efficiency.
  • when the plaintext machine learning model is transformed into the corresponding privacy machine learning model, not only is the data flow graph corresponding to the privacy machine learning model generated, but the execution result of the private data type is also output.
  • verifying their correctness usually requires manually written tests, which is inefficient.
  • This specification provides a node matching method, device, equipment and system, which can not only reuse the existing plaintext machine learning model to realize the privacy machine learning model, reduce development costs, and improve coding efficiency, but can also realize automated testing of the correctness of the data flow graph and of the graph execution result, thereby improving verification efficiency.
  • FIG. 1 is a schematic flowchart of an embodiment of the node matching method provided in this specification.
  • although this specification provides method operation steps or device structures as shown in the following embodiments or drawings, the method or device may include more or fewer operation steps or modular units after partial combination, based on conventional work and without creative labor.
  • for steps or structures that have no necessary causal relationship logically, the execution order of these steps or the module structure of the device is not limited to the execution order or module structure shown in the embodiments of this specification or the drawings.
  • A specific embodiment is shown in FIG. 1.
  • the method may include the following steps.
  • S0 Obtain data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model, and the node information includes the plaintext operator node information that needs to be replaced with a cryptographic operator in the first data flow graph and the ciphertext operator node information included in the second data flow graph.
  • the data flow graph information may include a data flow graph.
  • the data flow graph can be used to characterize the data flow information in the machine learning model.
  • In some implementation scenarios, the data flow graph is a TensorFlow graph.
  • The nodes in the TensorFlow graph represent mathematical operations, and the edges in the graph represent the multi-dimensional data arrays, that is, tensors, that flow between the nodes.
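  • As an illustration only (not part of the claimed method), a minimal TensorFlow 1.x-style graph shows this node-and-tensor structure; the op names are arbitrary:

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API, assumed for illustration
tf.disable_eager_execution()

g = tf.Graph()
with g.as_default():
    a = tf.constant([[1.0, 2.0]], name="a")        # node: Const
    b = tf.constant([[3.0], [4.0]], name="b")      # node: Const
    c = tf.matmul(a, b, name="matmul")             # node: MatMul; its incoming edges carry tensors a and b

# Each operation is a node of the data flow graph; op.inputs are the tensors flowing into it.
for op in g.get_operations():
    print(op.name, op.type, [t.name for t in op.inputs])
```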
  • the data flow graph information may include a first data flow graph and a second data flow graph.
  • the first data flow graph can be understood as the data flow graph corresponding to the preset plaintext machine learning model
  • the second data flow graph can be understood as the data flow graph corresponding to the privacy machine learning model.
  • a plaintext machine learning model can be written in a machine learning framework.
  • the machine learning framework can be understood as any machine learning system or method that includes machine learning algorithms, and can include data representation and processing methods, methods for representing and building predictive models, and methods for evaluating and using modeling results.
  • the machine learning framework can include one of the following: TensorFlow, Pytorch, MxNet, CNTK-Azure and other frameworks.
  • the preset plaintext machine learning model may be implemented based on the plaintext machine learning framework.
  • the plaintext machine learning model may include local plaintext operators (referred to as plaintext operators) provided by the machine learning framework.
  • the local plaintext operator in the plaintext machine learning model can be replaced with the corresponding cryptographic operator to obtain the corresponding private machine learning model.
  • this specification does not limit the specific plaintext machine learning framework used to generate the preset plaintext machine learning model; it can be selected according to the actual scenario.
  • the machine learning framework may include multiple plaintext operators.
  • the node information may include plaintext operator node information and ciphertext operator node information.
  • the node information may include plaintext operator node information that needs to be replaced with a cipher operator in the first data flow graph and ciphertext operator node information included in the second data flow graph.
  • In some implementation scenarios, since the plaintext operators in the preset plaintext machine learning model are to be replaced with their corresponding cryptographic operators in order to protect the privacy of the sample data, the operators through which the private sample data flows can be determined as the plaintext operators to be replaced.
  • Since the private sample data is used to train the model to obtain model parameters (also referred to as training variables),
  • the operators through which the training variables flow can also be determined as plaintext operators to be replaced.
  • the plaintext operator that needs to be replaced with the cryptographic operator can be determined.
  • the cryptographic operator can be any cryptographic operator that can provide privacy protection for the input data of all parties in a scenario where two or more data holders jointly (or collaboratively) perform machine learning training and prediction.
  • the cryptographic operator may be a secure multi-party computation (MPC) operator, a homomorphic encryption (HE) operator, a zero-knowledge proof (ZKP) operator, and so on.
  • the cryptographic operators can be implemented and saved in advance by developers using a static language (such as C or C++) and obtained when needed, thereby improving efficiency.
  • the cipher operator may also include a cipher gradient operator.
  • these cryptographic operators should correspond one-to-one with the plaintext operators in the preset plaintext machine learning model to facilitate subsequent corresponding replacements.
  • after the cryptographic operators are implemented, developers can register them in the plaintext machine learning framework so that they can be used with the plaintext machine learning model.
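  • If the plaintext framework is TensorFlow, one way such pre-built operators could be made available from Python is tf.load_op_library; the library path and op name below are assumptions and not part of this specification:

```python
import tensorflow as tf

# Hypothetical example: load a shared library that registers pre-built
# cryptographic (e.g. MPC) operators implemented in C++.
mpc_ops = tf.load_op_library("./libmpc_ops.so")  # path is an assumption

# Once loaded, a registered op could be called like a native TensorFlow op, e.g.:
# z = mpc_ops.mpc_mat_mul(x_share, y_share)      # op name is hypothetical
```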
  • In some implementation scenarios, the plaintext operator node information may include at least the node position identifier of the plaintext operator that needs to be replaced and the cryptographic operator identifier corresponding to the plaintext operator, and the ciphertext operator node information may include at least the node position identifier of the cryptographic operator and the cryptographic operator identifier.
  • The node position identifier can be used to uniquely identify the location of the node, for example, the IP address (Internet Protocol Address) corresponding to the node.
  • the cipher operator identifier can be used to identify the cipher operator, for example, it can be the name corresponding to the cipher operator.
  • the cryptographic operator identifier corresponding to the plaintext operator refers to a preset identifier of the cryptographic operator corresponding to the plaintext operator.
  • In some embodiments of this specification, obtaining the data flow graph information and node information may include: obtaining an optimization test component in the preset plaintext machine learning model, where the optimization test component includes a static optimizer and is used to save information during the node matching process and to use the saved information to verify the data flow graph; based on the optimization test component, saving the first data flow graph corresponding to the preset plaintext machine learning model and the plaintext operator node information that needs to be replaced with cryptographic operators in the first data flow graph; executing the static optimizer to replace the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to them, so as to generate the privacy machine learning model; based on the optimization test component, saving the second data flow graph corresponding to the privacy machine learning model and the ciphertext operator node information included in the second data flow graph; and obtaining the data flow graph information and node information.
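  • As an illustration only, a minimal Python sketch of how such an optimization test component might be organized is given below; the class and method names (StaticPassTester, run_static_pass, and so on) are hypothetical and only mirror the save / replace / save flow described above, not the actual implementation of this specification:

```python
class StaticPassTester:
    """Illustrative optimization test component: saves graph and node information
    before and after the static optimizer replaces plaintext ops with crypto ops."""

    def __init__(self, static_pass):
        self.static_pass = static_pass        # the static optimizer (a callable here)
        self.first_graph = None               # data flow graph of the plaintext model
        self.plaintext_node_stack = []        # plaintext op node info to be replaced
        self.second_graph = None              # data flow graph of the privacy model
        self.ciphertext_node_stack = []       # ciphertext (crypto) op node info

    def save_before(self, graph, nodes_to_replace):
        self.first_graph = graph
        self.plaintext_node_stack = list(nodes_to_replace)   # pushed in order

    def run_static_pass(self, plaintext_model):
        # Replace plaintext operators with the corresponding cryptographic operators.
        return self.static_pass(plaintext_model)

    def save_after(self, graph, replaced_nodes):
        self.second_graph = graph
        self.ciphertext_node_stack = list(replaced_nodes)

    def collect(self):
        # The data flow graph information and node information used for matching.
        return (self.first_graph, self.second_graph,
                self.plaintext_node_stack, self.ciphertext_node_stack)
```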
  • the static optimizer can be used to replace the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators.
  • In some implementation scenarios, the optimizer test component in the preset plaintext machine learning model can be obtained from user input, obtained from a server where it is pre-stored, or obtained in other ways, which is not limited in this specification.
  • the optimizer test component can be used to save information during the node matching process.
  • In some implementation scenarios, the optimization test component can be used to save the data flow graph corresponding to the preset plaintext machine learning model and the plaintext operator node information, in that data flow graph, that needs to be replaced with cryptographic operators. For example, in order to protect the privacy of the private sample data stored by each holder, the operators that the private sample data flows through in the data flow graph corresponding to the preset plaintext machine learning model can be determined as the plaintext operators to be replaced, and the optimization test component can then be used to save the node position identifiers of the plaintext operators to be replaced and the cryptographic operator identifiers corresponding to those plaintext operators.
  • In some implementation scenarios, after the plaintext operator that needs to be replaced with a cryptographic operator is determined, it can be marked in the data flow graph, and the marked operator node information can then be saved to the stack in sequence.
  • the plaintext operator node information may also include the position of the operator in the stack.
  • The stack is a data structure: a special linear table in which insertion and deletion occur only at one end, storing data according to the first-in, last-out principle. The first data pushed sits at the bottom of the stack and the last data pushed sits on the top; when data needs to be read, it is popped from the top of the stack, so the last data pushed is read out first.
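  • For illustration, a Python list used as such a stack behaves as follows (the node-information values are made up):

```python
stack = []
stack.append({"ip": "10.0.0.3", "mpc_op": "MpcMatMul"})   # pushed first, ends up at the bottom
stack.append({"ip": "10.0.0.4", "mpc_op": "MpcSigmoid"})  # pushed last, sits on top

top = stack.pop()        # first in, last out: the last pushed element is read out first
print(top["mpc_op"])     # -> MpcSigmoid
```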
  • the static optimizer may be executed. Since the static optimizer can replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator, the private machine learning model corresponding to the preset plaintext machine learning model can be generated by executing the static optimizer.
  • In some implementation scenarios, the general principle for replacing the plaintext operators in the plaintext machine learning model with cryptographic operators is: all plaintext operators that affect data privacy protection need to be replaced with the corresponding cryptographic operators to ensure the privacy security of the input data; plaintext operators that do not affect data privacy protection should be left unreplaced as far as possible, so as to increase the reuse rate of the plaintext machine learning model and thereby help reduce the implementation cost of the privacy machine learning model.
  • the optimization test component can be used to save the data flow graph corresponding to the privacy machine learning model and the ciphertext operator node information in the data flow graph.
  • the optimization test component can be used to save the node position identification of the cryptographic operator in the data flow graph after the replacement, and the cryptographic operator identification.
  • the replaced cipher operator can be marked in the data flow diagram. After that, the marked operator node information can be saved to the stack in sequence.
  • the ciphertext operator node information may also include the position of the operator in the stack.
  • In this way, after the corresponding data flow graph and node information have been saved both before and after the plaintext operator replacement, the data flow graph information and node information can be obtained.
  • Since the optimization test component includes the static optimizer, in the process of obtaining the data flow graph information and node information, not only can the existing plaintext machine learning model be reused to implement the privacy machine learning model and the development cost be reduced, but a guarantee can also be provided for realizing automated testing of the data flow graph and of the correctness of the graph execution result.
  • the data flow graph can be used to characterize the data flow information in the machine learning model.
  • the data flow graph can include nodes.
  • In some implementation scenarios, the data flow graph is a TensorFlow graph.
  • The nodes in the TensorFlow graph represent mathematical operations, and the edges in the graph represent the multi-dimensional data arrays, that is, tensors, that flow between the nodes.
  • the plaintext operator in the plaintext machine learning model has been replaced with the corresponding cryptographic operator to obtain the corresponding privacy machine learning model.
  • a data flow graph with a mixture of plaintext operators and ciphertext operators can be generated.
  • the data flow graph information corresponding to the models before and after the replacement can be checked first, to ensure that the original graph has not been modified and that correct plaintext machine learning model execution can still be provided.
  • the judging whether the first data flow graph is a subgraph of the second data flow graph may include: obtaining the first data flow graph and the second data flow graph The unique identifier corresponding to the node; the unique identifier corresponding to the node in the first data flow graph is formed into a first set; the unique identifier corresponding to the node in the second data flow graph is formed into a second set; based on the node identification increment rule, Determine whether the first set is a subset of the second set; when the first set is a subset of the second set, determine that the first data flow graph is the second data flow graph Subgraph.
  • a node identifier can be added to each node in the data flow graph accordingly.
  • When the plaintext operators in the plaintext machine learning model are replaced with the corresponding cryptographic operators to obtain the corresponding privacy machine learning model,
  • the node identifiers of the nodes in the data flow graph need to satisfy the increment rule.
  • The increment rule can be understood as follows: in the data flow graph corresponding to the obtained privacy machine learning model (hereinafter referred to as the "new graph"), the nodes corresponding to the cryptographic operators are identified incrementally on the basis of the node identifiers in the data flow graph corresponding to the preset plaintext machine learning model (hereinafter referred to as the "original graph").
  • For example, in some implementation scenarios, the original graph includes 5 nodes with node identifiers 1, 2, 3, 4, and 5. Since the plaintext operators corresponding to 2 of these nodes need to be replaced with the corresponding cryptographic operators, after the replacement to obtain the new graph, the node identifiers corresponding to the cryptographic operators in the new graph should be 6 and 7.
  • Since the new graph obtained is saved together with the original graph,
  • the node identifiers in each graph can be formed into a set, and it can then be determined whether the set of node identifiers in the original graph is a subset of the set of node identifiers in the new graph.
  • If it is, the original graph is a subgraph of the new graph, which ensures that the original graph part has not been modified and that correct plaintext machine learning model execution can still be provided.
  • If the set of node identifiers in the original graph is not a subset of the set of node identifiers in the new graph, the original graph is not a subgraph of the new graph, and there is an abnormality in the plaintext operator replacement process in the plaintext machine learning model.
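  • A minimal sketch of this set-based check, assuming each node carries an integer identifier that follows the increment rule (the graph representation is simplified for illustration):

```python
def is_subgraph_by_node_ids(original_ids, new_ids):
    """True when the original graph's node identifiers form a subset of the new
    graph's node identifiers, i.e. the original graph part was not modified."""
    first_set = set(original_ids)    # node identifiers of the first (original) graph
    second_set = set(new_ids)        # node identifiers of the second (new) graph
    return first_set.issubset(second_set)

# Example from the description: 5 original nodes, 2 crypto-op nodes added as 6 and 7.
original_ids = [1, 2, 3, 4, 5]
new_ids = [1, 2, 3, 4, 5, 6, 7]      # increment rule: new crypto-op nodes get ids 6, 7
assert is_subgraph_by_node_ids(original_ids, new_ids)

# If the replacement went wrong and an original node disappeared, the check fails.
assert not is_subgraph_by_node_ids(original_ids, [1, 2, 3, 4, 6, 7])
```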
  • the developer can be notified through a preset method, where the preset method may include sending information, sending out reminders, etc., which are not limited in this specification.
  • the plaintext operator node information may include at least the node position identifier of the plaintext operator that needs to be replaced and the cryptographic operator identifier corresponding to the plaintext operator, and the ciphertext operator node information may include at least the node position identifier of the cryptographic operator and the cryptographic operator identifier.
  • the matching result may include the successful matching of the plaintext operator node information and the ciphertext operator node information, and may also include the unsuccessful matching of the plaintext operator node information and the ciphertext operator node information.
  • the plaintext operator node information can be matched with the ciphertext operator node information.
  • In the embodiments of this specification, when it is determined that the first data flow graph is a subgraph of the second data flow graph, the plaintext operator node information can be matched with the ciphertext operator node information, so as to verify the correctness of the data flow graph.
  • Specifically, for example, in some implementation scenarios, the plaintext operator node information includes the IP address corresponding to the plaintext operator node that needs to be replaced in the original graph and the preset name of the cryptographic operator corresponding to that plaintext operator, and the ciphertext operator node information includes the IP address corresponding to the cryptographic operator node in the new graph and the name of the cryptographic operator.
  • Since the IP address corresponding to a node is unchanged before and after the operator replacement, for the same IP address it can be determined whether the preset name of the cryptographic operator in the plaintext operator node information is consistent with the name of the cryptographic operator in the ciphertext operator node information. If, for the same IP address, the two names are consistent, the matching is successful, and the result that the plaintext operator node information matches the ciphertext operator node information is output.
  • In some implementation scenarios, each piece of node information can also include other information, and the other information included in each piece of node information can be matched in sequence during the matching process. When all the information of each node is the same, the matching is successful; otherwise, the matching is unsuccessful.
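  • A minimal sketch of the IP-address-keyed matching described above; the field names (ip, preset_mpc_op, mpc_op) are assumptions used only for illustration:

```python
def match_node_info(plaintext_nodes, ciphertext_nodes):
    """plaintext_nodes / ciphertext_nodes: lists of dicts with an 'ip' field and the
    preset / actual crypto operator name. True when every replaced node matches."""
    expected = {n["ip"]: n["preset_mpc_op"] for n in plaintext_nodes}
    actual = {n["ip"]: n["mpc_op"] for n in ciphertext_nodes}

    for ip, preset_name in expected.items():
        # The IP address of a node is unchanged by the replacement, so it joins both sides.
        if actual.get(ip) != preset_name:
            return False      # at least one node's crypto operator name is inconsistent
    return True

plaintext_nodes = [{"ip": "10.0.0.3", "preset_mpc_op": "MpcMatMul"},
                   {"ip": "10.0.0.4", "preset_mpc_op": "MpcSigmoid"}]
ciphertext_nodes = [{"ip": "10.0.0.3", "mpc_op": "MpcMatMul"},
                    {"ip": "10.0.0.4", "mpc_op": "MpcSigmoid"}]
print(match_node_info(plaintext_nodes, ciphertext_nodes))  # -> True (matching successful)
```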
  • the plaintext operator node information may include the IP address corresponding to the plaintext operator node that needs to be replaced in the original graph and the preset name of the cryptographic operator corresponding to the plaintext operator, as well as the preset The first characteristic information of the cryptographic operator corresponding to the plaintext operator.
  • the ciphertext operator node information may also include some second characteristic information corresponding to the cryptographic operator.
  • the feature information may include the generation time, location, and generation method of the operator.
  • In some implementation scenarios, the process of matching the plaintext operator node information with the ciphertext operator node information may also include: calculating the similarity between the first feature information and the second feature information, and determining whether the plaintext operator node information matches the ciphertext operator node information according to the relationship between the similarity and a preset threshold.
  • In addition, the preset name of the cryptographic operator corresponding to the plaintext operator in the plaintext operator node information can also be compared with the name of the cryptographic operator in the ciphertext operator node information. If the names are consistent, the matching is successful, and the result that the plaintext operator node information matches the ciphertext operator node information is output; if the corresponding names in at least one piece of node information are inconsistent, the matching is unsuccessful, and the result that the plaintext operator node information does not match the ciphertext operator node information is output. In this way, through multiple matches, the accuracy of the verification can be improved.
  • the way of calculating the similarity can be through some methods known to those skilled in the art, such as Euclidean distance, Manhattan distance, etc., which are not limited in this specification.
  • the preset threshold can be set according to the actual scene.
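  • As one possible illustration of the similarity check, assuming the feature information is encoded as numeric vectors and compared by Euclidean distance against the preset threshold:

```python
import math

def features_match(first_feature, second_feature, threshold=0.1):
    """first_feature / second_feature: numeric feature vectors for the preset and the
    actual crypto operator; they match when their Euclidean distance does not
    exceed the preset threshold."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(first_feature, second_feature)))
    return distance <= threshold

print(features_match([1.0, 0.5], [1.0, 0.52]))  # -> True
print(features_match([1.0, 0.5], [3.0, 2.0]))   # -> False
```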
  • the node information may also include the position of the operator in the stack.
  • In some implementation scenarios, the node information can be stored in the stack in order, so the position of each operator in the stack can be recorded, and the positions in the stack can then also be matched to improve verification accuracy. For example, in some implementation scenarios, after the node information in the original graph is obtained, it can be saved to the stack in turn and the position of each piece of node information in the stack recorded.
  • the matching method may also include other methods, which are not limited in this specification.
  • the node information can also be stored in a database in the form of a table.
  • In some implementation scenarios, the node information can also include information such as the location of the node information in the database and the name of the table corresponding to the node information in the database; this information can be matched accordingly to obtain the matching result.
  • the method of the foregoing embodiment can not only reuse the existing plaintext machine learning model to implement a private machine learning model, reduce development costs, and improve coding efficiency, but also can realize automated testing of the correctness of the data flow graph, thereby improving verification efficiency.
  • the method of the above embodiment can be encapsulated into a corresponding interface (such as a validate_graph interface) when it is implemented, so that external callers can directly call this interface to perform automated testing of the data flow graph.
  • the TensorFlow framework is taken as an example for description.
  • In this specific embodiment, a static optimizer (Static Pass) is encapsulated in the static optimization test component (Static Pass Tester), which can be used to save information during the implementation process and to use the saved information to verify the data flow graph.
  • The following information saving is all based on the static optimization test component. As shown in FIG. 2, this specific embodiment may include the following steps.
  • the original graph can be understood as the first data flow graph corresponding to the preset plaintext machine learning model.
  • the original graph can provide a reference for the updated graph.
  • the original graph can be obtained by copying the data flow graph corresponding to the preset plaintext machine learning model.
  • the TensorFlow framework may be used to generate the preset plaintext machine learning model, and then the data flow graph corresponding to the preset plaintext machine learning model can be saved.
  • the secure multi-party computation operator can be understood as a cryptographic operator.
  • the operator node stack can be understood as plaintext operator node information.
  • the operator node stack that needs to be updated to a secure multi-party calculation operator can be understood as the plaintext operator node information that needs to be replaced with MPC op.
  • op is the abbreviation of Operation.
  • For example, suppose the original graph includes 5 nodes. Based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, it can be determined that 2 TensorFlow native ops need to be replaced with MPC ops.
  • These two TensorFlow native ops can then be marked in the original graph, the IP addresses corresponding to the two marked TensorFlow native ops and the preset MPC op names can be stored in the stack in order, and the position of each TensorFlow native op in the stack can be recorded correspondingly.
  • the two Tensorflow native ops are Tensorflow native op3 and Tensorflow native op4
  • the node information corresponding to Tensorflow native op3 can be stored in position 1 of the stack
  • the node information corresponding to Tensorflow native op4 can be stored in position 2 of the stack.
  • the corresponding relationship between Tensorflow native op and MPC op can be preset.
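  • A small sketch of saving the marked node information to the stack in this example; the op names, IP addresses, and the TensorFlow-op-to-MPC-op correspondence are illustrative assumptions:

```python
# Preset correspondence between TensorFlow native ops and MPC ops (illustrative names).
NATIVE_TO_MPC = {"MatMul": "MpcMatMul", "Sigmoid": "MpcSigmoid"}

# The two marked TensorFlow native ops from the example (op3 and op4).
marked_ops = [{"name": "native_op3", "type": "MatMul",  "ip": "10.0.0.3"},
              {"name": "native_op4", "type": "Sigmoid", "ip": "10.0.0.4"}]

node_stack = []
for position, op in enumerate(marked_ops, start=1):
    node_stack.append({
        "ip": op["ip"],                               # node position identifier
        "preset_mpc_op": NATIVE_TO_MPC[op["type"]],   # preset MPC op name for this plaintext op
        "stack_position": position,                   # op3 -> position 1, op4 -> position 2
    })
```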
  • performing operator update and replacement can be understood as replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator.
  • the new graph refers to the second data flow graph corresponding to the private machine learning model obtained after the operator is updated and replaced.
  • a static optimizer can be used to update and replace the op, thereby constructing a new graph.
  • the new graph and the operator node stack of the safe multi-party calculation operator in the new graph can be saved.
  • the preservation method is similar to that in steps (1) and (2), and will not be repeated here.
  • a node identifier can be added to each node in the data flow graph.
  • the node identifier needs to meet the incremental rule.
  • the incremental rule can be understood as the incremental identification of the corresponding node of the MPC op in the obtained new graph on the basis of the node identification in the original graph.
  • For example, in some implementation scenarios, the original graph includes 5 nodes, and after the replacement the new graph additionally contains the nodes corresponding to the MPC ops.
  • Since both the new graph and the original graph are saved, the node identifiers in each graph can be formed into a set, and it can then be determined whether the set of node identifiers in the original graph is a subset of the set of node identifiers in the new graph. If it is, the original graph is a subgraph of the new graph, that is, the original graph is consistent with the front part of the new graph, which ensures that the original graph has not been modified and that correct plaintext machine learning model execution can still be provided. If not, the original graph is not a subgraph of the new graph, that is, the original graph is inconsistent with the front part of the new graph, which means that the data flow graph after the operator replacement is incorrect, and the result of verification failure is output.
  • In some implementation scenarios, the elements in the stack can be compared in turn. If the TensorFlow native op and MPC op of any element do not match, the verification is judged to have failed and the result of verification failure is output; otherwise, if all elements match, the verification is judged to be successful and the result of successful verification is output.
  • the comparison can be performed based on the same IP addresses of the nodes before and after the operator replacement. For the specific comparison process, please refer to the description of the above method, which will not be repeated.
  • the execution result of the data flow graph can also be verified.
  • FIG. 3 A specific embodiment is shown in FIG. 3, and the method may include the following steps.
  • S12 Input the plaintext data into a session tester to obtain a plaintext execution result and a ciphertext execution result, where the session tester includes a first data flow graph and a second data flow graph;
  • S16 Calculate the difference between the plaintext execution result and the decryption result, determine whether the difference is within a preset error range, and output the determination result.
  • plaintext data can be understood as any data that has not been encrypted.
  • the plaintext data can be input by the user through the interface, or it can be pre-stored in the server, which is not limited in this specification.
  • the session tester may include a first data flow graph and a second data flow graph, which can be used to execute the data flow graph and return corresponding parameter information.
  • In some embodiments of this specification, inputting the plaintext data into the session tester to obtain the plaintext execution result and the ciphertext execution result may include: inputting the plaintext data into the first data flow graph included in the session tester to obtain the plaintext execution result; and encrypting the plaintext data, distributing the encrypted data to each multi-party secure computing process, and executing the second data flow graph included in the session tester based on the data in each multi-party secure computing process to obtain the ciphertext execution result. For example, in some implementation scenarios, the plaintext data can first be encrypted, the encrypted data can then be distributed to each data holder, and finally the second data flow graph included in the session tester can be executed based on the data stored by each data holder to obtain the ciphertext execution result.
  • In some implementation scenarios, the encryption of the plaintext data can be achieved by secret sharing, so that any data holder storing a sub-secret after the secret sharing cannot obtain the sub-secrets stored by the other data holders, and the encrypted result can only be restored or decrypted when the sub-secrets of all data holders are combined.
  • Secret sharing can include additive secret sharing, Shamir secret sharing, and so on.
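  • For illustration, additive secret sharing over a finite ring can be sketched as follows (the modulus and share count are assumptions); no single share reveals the plaintext, and only the sum of all shares restores it:

```python
import secrets

MODULUS = 2 ** 64  # ring size; an assumption for illustration

def share(value, num_holders=3):
    """Split a plaintext integer into additive shares, one per data holder."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_holders - 1)]
    shares.append((value - sum(shares)) % MODULUS)   # last share makes the sum come out right
    return shares

def reconstruct(shares):
    """Only the combination of all holders' shares restores the plaintext."""
    return sum(shares) % MODULUS

shares = share(42)
print(shares)                 # each individual share looks random
print(reconstruct(shares))    # -> 42
```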
  • the native tf.Session (session executor) of the Tensorflow framework can be extended to obtain the SessionTester (session tester) that includes the original graph and the new graph.
  • the SessionTester can include an execution interface (run interface) and a verification interface (validate_run interface).
  • the run interface is used to obtain the execution result based on the data flow graph and the provided plaintext data
  • the validate_run interface is used to verify the execution result and return the verification result.
  • the plaintext data input by the input interface can be received, and then the run interface can be executed based on the plaintext data and the original graph to obtain the plaintext parameters corresponding to the original graph.
  • the SessionTester includes the original graph and the new graph
  • the execution result can be obtained based on the data flow graph and the plaintext data successively, or the execution result can be obtained based on the data flow graph and the plaintext data at the same time.
  • this specification does not limit this.
  • In some implementation scenarios, the ciphertext parameters can be decrypted based on the validate_run interface to obtain the decryption parameters, and the decryption parameters can then be compared with the plaintext parameters, thereby verifying the correctness of the execution result of the data flow graph.
  • Since the execution result of a data flow graph is usually numeric, an error range can be preset when comparing the decryption parameters with the plaintext parameters; if the difference between the value of the decryption parameter and the value of the plaintext parameter is within the preset error range, the execution result of the data flow graph is correct.
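  • A minimal sketch of this final comparison, as it might be performed behind an interface like validate_run; the function name and error bound are assumptions:

```python
def within_error_range(plaintext_result, decrypted_result, max_error=1e-4):
    """Compare the plaintext execution result with the decrypted ciphertext execution
    result element-wise; the graph execution is considered correct when every
    difference falls inside the preset error range."""
    return all(abs(p - d) <= max_error
               for p, d in zip(plaintext_result, decrypted_result))

plain = [0.7310586, 0.2689414]        # e.g. plaintext parameters from the original graph
decrypted = [0.7310303, 0.2689697]    # decrypted parameters from the new (MPC) graph
print(within_error_range(plain, decrypted))  # -> True: verification succeeds
```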
  • the execution result of the data flow graph can also be of other types. If it is of other types, it can be converted into a digital type through a preset conversion method.
  • the preset conversion method is not limited in this specification.
  • a static optimizer and a session tester can be implemented by Python language.
  • Of course, other languages can also be used, for example a static language such as C or C++ for implementing the cryptographic operators, which is not limited in this specification.
  • This specification provides a node matching method.
  • Since the optimization test component encapsulating the static optimizer is used in the process of obtaining the data flow graph information and node information,
  • not only can the existing plaintext machine learning model be reused to realize the privacy machine learning model and the development cost be reduced, but a guarantee can also be provided for realizing automated testing of the data flow graph and of the correctness of the graph execution result.
  • After the data flow graph information and node information are obtained, by judging the corresponding data flow graphs before and after the plaintext operator replacement, it can be ensured that the original graph part has not been modified and that correct plaintext machine learning model execution can still be provided.
  • Furthermore, by matching the plaintext operator node information with the ciphertext operator node information, automated testing of the correctness of the data flow graph and of the graph execution result can be realized, thereby improving verification efficiency.
  • one or more embodiments of this specification also provide a node matching device.
  • the described devices may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc., which use the methods described in the embodiments of this specification, combined with necessary implementation hardware devices.
  • the devices in one or more embodiments provided in the embodiments of this specification are as described in the following embodiments. Since the implementation scheme of the device to solve the problem is similar to the method, the implementation of the specific device in the embodiment of this specification can refer to the implementation of the foregoing method, and the repetition will not be repeated.
  • A "unit" or "module" can be a combination of software and/or hardware that implements a predetermined function.
  • the devices described in the following embodiments are preferably implemented by software, implementation by hardware or a combination of software and hardware is also possible and conceived.
  • FIG. 4 is a schematic diagram of the module structure of an embodiment of a node matching device provided in this specification.
  • a node matching device provided in this specification may include: an information acquisition module 120, a judgment module 122, and a matching module 124.
  • the information acquisition module 120 may be used to acquire data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow corresponding to a privacy machine learning model Figure, the node information includes plaintext operator node information that needs to be replaced with a cipher operator in the first data flow graph and ciphertext operator node information included in the second data flow graph;
  • the judging module 122 can be used to judge whether the first data flow graph is a subgraph of the second data flow graph
  • the matching module 124 may be used to match the plaintext operator node information with the ciphertext operator node information when determining that the first data flow graph is a subgraph of the second data flow graph, and output matching result.
  • it may further include:
  • the plaintext data acquisition module can be used to acquire plaintext data when the matching result is a successful match
  • the execution result obtaining module may be used to input the plaintext data into the session tester to obtain the plaintext execution result and the ciphertext execution result, wherein the session tester includes a first data flow graph and a second data flow graph;
  • the decryption module can be used to decrypt the ciphertext execution result to obtain the decryption result
  • the result judgment module can be used to calculate the difference between the plaintext execution result and the decryption result, and judge whether the difference is within a preset error range, and output the judgment result.
  • the execution result obtaining module may include:
  • the plaintext execution result obtaining unit may be used to input the plaintext data into the first data flow graph included in the conversation tester to obtain the plaintext execution result;
  • the ciphertext execution result obtaining unit may be used to encrypt the plaintext data, distribute the encrypted data to each multi-party secure computing process, and execute, based on the data in each multi-party secure computing process, the second data flow graph included in the session tester to obtain the ciphertext execution result.
  • the information acquisition module 120 may include:
  • the first obtaining unit 1200 can obtain the optimization test component in the preset plaintext machine learning model, where the optimization test component includes a static optimizer and is used to save information during the node matching process and to use the saved information to verify the data flow graph;
  • the first saving unit 1202 may be configured to save, based on the optimization test component, the first data flow graph corresponding to the preset plaintext machine learning model and the plaintext operator that needs to be replaced with a cryptographic operator in the first data flow graph Node information;
  • the model generation unit 1204 may be used to execute a static optimizer, and replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator to generate a privacy machine learning model;
  • the second saving unit 1206 may be configured to save the second data flow graph corresponding to the privacy machine learning model and the ciphertext operator node information included in the second data flow graph based on the optimization test component;
  • the information obtaining unit 1208 may be used to obtain data flow graph information and node information.
  • the judgment module 122 may include:
  • the second acquiring unit 1220 may be configured to acquire unique identifiers corresponding to nodes in the first data flow graph and the second data flow graph;
  • the first forming unit 1222 may be used to form a first set of unique identifiers corresponding to nodes in the first data flow graph;
  • the second forming unit 1224 may be used to form a second set of unique identifiers corresponding to nodes in the second data flow graph;
  • the judging unit 1226 may be configured to judge whether the first set is a subset of the second set based on the node identification increment rule;
  • the determining unit 1228 may be configured to determine that the first data flow graph is a subgraph of the second data flow graph when the first set is a subset of the second set.
  • In some embodiments, the plaintext operator node information includes at least the node position identifier of the plaintext operator that needs to be replaced and the cryptographic operator identifier corresponding to the plaintext operator.
  • the ciphertext operator node information includes at least the node position identification of the cryptographic operator and the cryptographic operator identification.
  • This specification provides a node matching device.
  • Since the optimization test component encapsulating the static optimizer is used in the process of obtaining the data flow graph information and node information,
  • not only can the existing plaintext machine learning model be reused to realize the privacy machine learning model and the development cost be reduced, but a guarantee can also be provided for realizing automated testing of the data flow graph and of the correctness of the graph execution result.
  • After the data flow graph information and node information are obtained, by judging the corresponding data flow graphs before and after the plaintext operator replacement, it can be ensured that the original graph part has not been modified and that correct plaintext machine learning model execution can still be provided.
  • In this way, automated testing of the correctness of the data flow graph and of the graph execution result can be realized, thereby improving verification efficiency.
  • the above-mentioned device may also include other implementation manners according to the description of the method embodiment, and for the specific implementation manner, refer to the description of the related method embodiment, which is not repeated here.
  • This specification also provides an embodiment of a node matching device, which includes a processor and a memory for storing executable instructions of the processor.
  • when the instructions are executed by the processor, the following steps are implemented:
  • obtaining data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model,
  • and the node information includes the plaintext operator node information that needs to be replaced with a cryptographic operator in the first data flow graph and the ciphertext operator node information included in the second data flow graph;
  • judging whether the first data flow graph is a subgraph of the second data flow graph;
  • when it is determined that the first data flow graph is a subgraph of the second data flow graph, matching the plaintext operator node information with the ciphertext operator node information, and outputting a matching result.
  • It should be noted that the above-mentioned equipment may also include other implementation manners according to the description of the method or device embodiments; for specific implementation manners, refer to the description of the related method embodiments, which will not be repeated here.
  • This specification also provides an embodiment of a node matching system, which includes at least one processor and a memory storing computer-executable instructions.
  • When the processor executes the instructions, the steps of the method described in any one or more of the foregoing embodiments are implemented.
  • The steps include, for example: obtaining data flow graph information and node information, where the data flow graph information includes a first data flow graph corresponding to a preset plaintext machine learning model and a second data flow graph corresponding to a privacy machine learning model,
  • and the node information includes the plaintext operator node information that needs to be replaced with a cryptographic operator in the first data flow graph and the ciphertext operator node information included in the second data flow graph; judging whether the first data flow graph is a subgraph of the second data flow graph; and, when it is determined that the first data flow graph is a subgraph of the second data flow graph, matching the plaintext operator node information with the ciphertext operator node information and outputting the matching result.
  • the system can be a single server
  • FIG. 5 is a hardware structural block diagram of an embodiment of a node matching server provided in this specification.
  • the server may be the node matching device or the node matching system in the foregoing embodiment.
  • the server 10 may include one or more (only one is shown in the figure) processor 100 (the processor 100 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA),
  • the memory 200 for storing data
  • the transmission module 300 for communication functions.
  • the server 10 may also include more or fewer components than shown in FIG. 5; for example, it may also include other processing hardware, such as a database, a multi-level cache, or a GPU, or it may have a configuration different from that shown in FIG. 5.
  • the memory 200 can be used to store software programs and modules of application software, such as program instructions/modules corresponding to the node matching method in the embodiment of this specification.
  • the processor 100 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 200.
  • the memory 200 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 200 may further include a memory remotely provided with respect to the processor 100, and these remote memories may be connected to a computer terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission module 300 is used to receive or send data via a network.
  • the above-mentioned specific examples of the network may include a wireless network provided by a communication provider of a computer terminal.
  • the transmission module 300 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission module 300 may be a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • RF radio frequency
  • the methods or devices described in the above embodiments provided in this specification can implement business logic through computer programs and are recorded on a storage medium, and the storage medium can be read and executed by a computer to achieve the effects of the solutions described in the embodiments of this specification.
  • the storage medium may include a physical device for storing information, and the information is usually digitized and then stored in an electric, magnetic, or optical medium.
  • the storage medium may include: devices that use electrical energy to store information, such as various types of memory, such as RAM, ROM, etc.; devices that use magnetic energy to store information, such as hard disks, floppy disks, magnetic tapes, magnetic core memory, bubble memory, U disk; a device that uses optical methods to store information, such as a CD or DVD.
  • devices that use electrical energy to store information such as various types of memory, such as RAM, ROM, etc.
  • devices that use magnetic energy to store information such as hard disks, floppy disks, magnetic tapes, magnetic core memory, bubble memory, U disk
  • a device that uses optical methods to store information such as a CD or DVD.
  • the storage medium may also include other forms of memory, such as quantum memory, graphene memory, and so on.
  • the above node matching method or device embodiments provided in this specification can be implemented in a computer by a processor executing corresponding program instructions, for example, implemented on the PC side using the C++ language under the Windows operating system, implemented under a Linux system, or implemented in smart terminals using the Android or iOS system programming languages, as well as implemented using processing logic based on a quantum computer, and so on.
  • the device, computer storage medium, and system described above in the specification may also include other implementation manners according to the description of the related method embodiments.
  • For specific implementation manners, please refer to the description of the corresponding method embodiments, which will not be repeated here.
  • An improvement to a technology can be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow).
  • the improvement of many methods and processes of today can be regarded as a direct improvement of the hardware circuit structure.
  • Designers almost always get the corresponding hardware circuit structure by programming the improved method flow into the hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be realized by the hardware entity module.
  • For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), can be programmed using a Hardware Description Language (HDL); there are many kinds of HDL, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog.
  • the controller can be implemented in any suitable manner.
  • The controller can take the form of, for example, a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; the memory controller can also be implemented as part of the memory control logic.
  • In addition to implementing the controller purely as computer-readable program code, it is entirely possible to program the method steps so that the controller realizes the same function in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like.
  • Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component; or even, the devices for realizing various functions can be regarded as both software modules for implementing the method and structures within the hardware component.
Some of the systems, devices, modules, or units explained in the above embodiments may specifically be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may be, for example, a personal computer, a tablet computer, a smart phone, or the like.
For convenience of description, the above devices are described with their functions divided into various modules. Of course, when implementing one or more embodiments of this specification, the functions of some modules can be implemented in the same one or more pieces of software and/or hardware, or the modules implementing the same function can be realized by a combination of multiple sub-modules or sub-units, and so on. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram. These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can realize information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage, or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
Those skilled in the art should understand that one or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

This specification provides a node matching method, apparatus, device, and system. The method includes: obtaining data flow graph information and node information; determining whether the first data flow graph is a subgraph of the second data flow graph; when the first data flow graph is determined to be a subgraph of the second data flow graph, matching the plaintext operator node information with the ciphertext operator node information; when the matching succeeds, obtaining plaintext data; inputting the plaintext data into a session tester to obtain a plaintext execution result and a ciphertext execution result; decrypting the ciphertext execution result to obtain a decryption result; calculating the difference between the plaintext execution result and the decryption result, determining whether the difference is within a preset error range, and outputting the determination result. The embodiments of this specification enable automated testing of the correctness of the data flow graph and of the graph execution result, thereby improving verification efficiency.

Description

一种节点匹配方法、装置、设备及系统 技术领域
本申请涉及数据处理技术领域,特别涉及一种节点匹配方法、装置、设备及系统。
背景技术
身处大数据驱动的AI(Artificial Intelligence,人工智能)时代,人们越来越认识到数据的价值。为此,对个人信息、数据的隐私保护提出了更高的要求。
为了解决数据保护和AI之间的矛盾,目前借助密码学和tensorflow机器学习平台,诞生了各种基于加密机器学习的框架(例如,TF-Encrypted、PySyft等)。这些加密机器学习框架可以使不精通密码学、分布式系统的开发者能够对加密数据进行训练和预测。
发明内容
本说明书实施例提供了一种节点匹配方法、装置、设备及系统,可以实现对数据流图和数据流图执行结果正确性的自动化测试,从而提高验证效率。
本说明书提供的节点匹配方法、装置、设备及系统是包括以下方式实现的:
一种节点匹配方法,包括:
获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
判断所述第一数据流图是否为所述第二数据流图的子图;
确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
一种节点匹配装置,包括:
信息获取模块,用于获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
判断模块,用于判断所述第一数据流图是否为所述第二数据流图的子图;
匹配模块,用于确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
一种节点匹配设备,包括处理器及用于存储处理器可执行指令的存储器,所述指令被所述处理器执行时实现包括以下步骤:
获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
判断所述第一数据流图是否为所述第二数据流图的子图;
确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
一种节点匹配系统,包括至少一个处理器以及存储计算机可执行指令的存储器,所述处理器执行所述指令时实现本说明书实施例中任意一个方法实施例方法的步骤。
本说明书提供的一种节点匹配方法、装置、设备及系统。一些实施例中在将预设明文机器学习模型中的明文算子替换为对应的密码算子过程中,由于优化器测试组件封装了静态优化器,使得在获取数据流图信息和节点信息过程中,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,而且可以为实现自动测试数据流图和图执行结果正确性提供保障。在获取数据流图信息和节点信息后,通过对明文算子替换前后对应的数据流图进行判断,可以确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。通过对明文算子替换前后的节点信息进行匹配以及对图执行结果的比较,可以实现对数据流图和图执行结果正确性的自动化测试,从而提高验证效率。采用本说明书提供的实施方案,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,提高编码效率,而且可以实现对数据流图和图执行结果正确性的自动化测试,从而提高验证效率。
附图说明
此处所说明的附图用来提供对本说明书的进一步理解,构成本说明书的一部分,并不构成对本说明书的限定。在附图中:
图1是本说明书提供的节点匹配方法的一个实施例的流程示意图;
图2是本说明书提供的节点匹配方法的一个具体实施例的流程示意图;
图3是本说明书提供的节点匹配方法的另一个实施例的流程示意图;
图4是本说明书提供的一种节点匹配装置的一个实施例的模块结构示意图;
图5是本说明书提供的一种节点匹配服务器的一个实施例的硬件结构框图。
具体实施方式
为了使本技术领域的人员更好地理解本说明书中的技术方案,下面将结合本说明书实施例中的附图,对本说明书实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本说明书中的一部分实施例,而不是全部的实施例。基于本说明书中的一个或多个实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都应当属于本说明书实施例保护的范围。
一些实施场景中,为了在利用样本数据训练机器模型时保护样本数据的隐私,可以将预设明文机器学习模型中的本地明文算子替换为对应的密码算子,得到对应的隐私机器学习模型,这样通过复用已有的明文机器学习模型实现了隐私机器学习模型,可以有效减少因使用隐私机器学习框架特有的应用程序接口和隐私数据类型重新编码所带来的巨大开发成本,提高编码效率。然而,将明文机器学习模型转化为对应的隐私机器学习模型后,不仅会产生与隐私机器学习模型对应的数据流图,而且也会输出隐私数据类型的执行结果。目前,为了实现数据流图正确性的校验,通常需要先手动导出图并借助Tensorboard进行可视化校验。为了实现对图执行结果的校验,通常需要手动编写测试进行验证,效率较低。
本说明书提供一种节点匹配方法、装置、设备及系统,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,提高编码效率,而且可以实现对数据流图和图执行结果正确性的自动化测试,从而提高验证效率。
下面以一个具体的应用场景为例对本说明书实施方案进行说明。具体的,图1是本说明书提供的节点匹配方法的一个实施例的流程示意图。虽然本说明书提供了如下述实施例或附图所示的方法操作步骤或装置结构,但基于常规或者无需创造性的劳动在所述方法或装置中可以包括更多或者部分合并后更少的操作步骤或模块单元。在逻辑性上不存在必要因果关系的步骤或结构中,这些步骤的执行顺序或装置的模块结构不限于本说明书实施例或附图所示的执行顺序或模块结构。所述的方法或模块结构的在实际中的装置、服务器或终端产品应用时,可以按照实施例或者附图所示的方法或模块结构进行顺序执行或者并行执行(例如并行处理器或者多线程处理的环境、甚至包括分布式处理、服务器集群的实施环境)。
需要说明的是,下述实施例描述并不对基于本说明书的其他可扩展到的应用场景中的技术方案构成限制。具体的一种实施例如图1所示,本说明书提供的一种节点匹配方法的一种实施例中,所述方法可以包括以下步骤。
S0:获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息。
本说明书一个实施例中,数据流图信息可以包括数据流图。数据流图可以用于表征机器学习模型中的数据流动信息。例如,在TensorFlow机器学习框架中,数据流图为张量流图。张量流图中的节点在图中表示数学操作,图中的线则表示在节点间相互联系的多维数据数组,即张量。本实施例中,数据流图信息可以包括第一数据流图和第二数据流图。其中,第一数据流图可以理解为是预设明文机器学习模型对应的数据流图,第二数据流图可以理解为是隐私机器学习模型对应的数据流图。
本说明书一个实施例中,可以在机器学习框架中编写明文机器学习模型。机器学习框架可以理解为是包括机器学习算法在内的所有机器学习的系统或方法,可以包括数据表示与处理的方法、表示和建议预测模型的方法、评价和使用建模结果的方法。机器学习框架可以包括以下之一:TensorFlow、Pytorch、MxNet和CNTK-Azure等框架。本实施例中,预设明文机器学习模型可以是基于明文机器学习框架实现的。
一些实施场景中,明文机器学习模型中可以包括机器学习框架提供的本地明文算子(简称明文算子)。为了在利用样本数据训练机器模型时保护样本数据的隐私,可以将明文机器学习模型中的本地明文算子替换为对应的密码算子,得到对应的隐私机器学习模型。
需要说明的是,本说明书对于预设明文机器学习模型具体采用何种明文机器学习框架生成不作限定,具体可根据实际场景进行选择。一些实施场景中,机器学习框架中可以包括多个明文算子。
本说明书一个实施例中,节点信息可以包括明文算子节点信息和密文算子节点信息。本实施例中,节点信息可以包括第一数据流图中需要替换为密码算子的明文算子节点信息和第二数据流图中包括的密文算子节点信息。
一些实施场景中,考虑到是为了保护各持有方中存储的隐私样本数据的隐私才将预设明文机器学习模型中的明文算子替换为明文算子对应的算子,因此可以将隐私样本数据流经的算子确定为要替换的明文算子。另一些实施场景中,考虑到隐私样本数据是为了训练 模型,以得到模型参数(也称为训练变量),因此,可以将训练变量流经的算子确定为要替换的明文算子。上述实施例中,基于预设明文机器学习模型对应的数据流图中的数据流,可以确定需要替换为密码算子的明文算子。
一些实施场景中,密码算子可以为任何可在两个或多个数据持有方联合(或协同)进行机器学习训练及预测场景中,为各方输入数据提供隐私保护的密码算子。例如一些实施场景中,密码算子可以为安全多方计算(Secure Multi-Party Computation,MPC)算子、同态加密(Homomorphic Encryption,HE)算子、或零知识证明(Zero-knowledge Proof,ZKP)算子等。同样,本说明书对具体采用何种密码算子不作限定,具体可根据实际场景进行选择。需要说明的是,密码算子可以由开发人员预先通过静态语言(例如C、C++等)编程实现并保存,在需要时获取,从而提高效率。一些实施例中,密码算子中通常还可以包含有密码梯度算子。当然,这些密码算子(包括密码梯度算子),应与预设明文机器学习模型中的明文算子一一对应,以便于后续对应替换。在实现密码算子后,开发人员可以将其注册到明文机器学习框架中,以便于明文机器学习模型使用。
本说明书一个实施例中,明文算子节点信息中至少可以包括需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识,密文算子节点信息中至少可以包括密码算子的节点位置标识、密码算子标识。其中,节点位置标识可以用于唯一标识该节点所处位置,例如,节点对应的IP地址(Internet Protocol Address,互联网协议地址)。密码算子标识可以用于标识该密码算子,例如,可以是密码算子对应的名称等。其中,与明文算子对应的密码算子标识是指预先设置的与明文算子对应的密码算子的标识。
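作为示意，上述两类节点信息可以用如下的数据结构来组织（以下为基于 Python 的示意草图，类名与字段名均为本文为便于说明而假设，并非本说明书方案的固定实现）：

```python
from dataclasses import dataclass

@dataclass
class PlainOpNodeInfo:
    """第一数据流图中需要替换为密码算子的明文算子节点信息（示意）"""
    node_addr: str       # 节点位置标识，例如该节点对应的IP地址
    target_mpc_op: str   # 预先设置的、与该明文算子对应的密码算子标识（名称）
    stack_pos: int = 0   # 该算子节点信息在堆栈中的位置（可选）

@dataclass
class CipherOpNodeInfo:
    """第二数据流图中包括的密文算子节点信息（示意）"""
    node_addr: str       # 节点位置标识，例如该节点对应的IP地址
    mpc_op: str          # 密码算子标识（名称）
    stack_pos: int = 0   # 该算子节点信息在堆栈中的位置（可选）
```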
本说明书一个实施例中,所述获取数据流图信息和节点信息,可以包括:获取预设明文机器学习模型中的优化测试组件,所述优化测试组件包括静态优化器,所述优化测试组件用于在节点匹配过程中保存信息,并利用保存信息对数据流图进行校验;基于所述优化测试组件,保存所述预设明文机器学习模型对应的第一数据流图和所述第一数据流图中需要替换为密码算子的明文算子节点信息;执行静态优化器,将所述预设明文机器学习模型中明文算子替换为所述明文算子对应的密码算子,生成隐私机器学习模型;基于所述优化测试组件,保存所述隐私机器学习模型对应的第二数据流图和所述第二数据流图中包括的密文算子节点信息;获取数据流图信息和节点信息。其中,静态优化器可以用于将明文机器模型中的明文算子替换为对应的密码算子。优化器测试组件是对静态优化器的封装。
一些实施场景中,获取预设明文机器学习模型中的优化器测试组件可以是在用户输入时获取,也可以是从预先存储的服务器中获取,还可以是其他方式获取,本说明书对此不 作限定。
一些实施场景中,在获取优化器测试组件之后,由于优化器测试组件可以用于在节点匹配过程中保存信息,这样,可以利用优化测试组件保存与预设明文机器学习模型对应的数据流图以及该数据流图中确定需要替换为密码算子的明文算子节点信息。例如,为了保护各持有方中存储的隐私样本数据的隐私,可以将隐私样本数据在预设明文机器学习模型对应的数据流图中流经的算子确定为要替换的明文算子,然后利用优化测试组件保存需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识。一些实施场景中,在确定需要替换为密码算子的明文算子后,可以将其在数据流图中进行标记。之后,可以按照顺序将被标记的算子节点信息保存到堆栈中。其中,明文算子节点信息中还可以包括算子在堆栈中的位置。堆栈是一种数据结构,是一种只能在一端进行插入和删除操作的特殊线性表,其按照先进后出的原则存储数据,先进入的数据被压入栈底,最后的数据在栈顶,需要读数据的时候从栈顶开始弹出数据(最后一个数据被第一个读出来)。
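按照上述思路，把被标记算子的节点信息按顺序压入堆栈并记录位置的过程可以示意如下（代码仅为帮助理解的草图，沿用前文假设的 PlainOpNodeInfo 结构，graph_nodes、is_private_flow、op_to_mpc 等名称均为示例假设）：

```python
def collect_plain_op_stack(graph_nodes, is_private_flow, op_to_mpc):
    """按图中顺序收集需要替换的明文算子节点信息并压入堆栈（示意）。

    graph_nodes     : 按顺序排列的节点列表，假设每个节点带有 addr、op_name 属性
    is_private_flow : 判断隐私样本数据或训练变量是否流经该节点的函数（假设）
    op_to_mpc       : 预先设置的明文算子名称到对应密码算子名称的映射（假设）
    """
    stack = []
    for node in graph_nodes:
        if is_private_flow(node):                  # 该明文算子被标记为需要替换
            stack.append(PlainOpNodeInfo(
                node_addr=node.addr,
                target_mpc_op=op_to_mpc[node.op_name],
                stack_pos=len(stack),              # 记录算子在堆栈中的位置
            ))
    return stack
```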
一些实施场景中,在保存预设明文机器学习模型对应的信息后,可以执行静态优化器。由于静态优化器可以将预设明文机器学习模型中的明文算子替换为对应的密码算子,所以可以通过执行静态优化器生成与预设明文机器学习模型对应的隐私机器学习模型。本说明书一个实施例中,将明文机器学习模型中的明文算子替换为密码算子的一般原则是:对于影响数据隐私保护的明文算子,均需要替换为对应的密码算子,以确保输入数据的隐私安全;对于不影响数据隐私保护的明文算子,尽量不进行替换,以提高对明文机器学习模型的复用率,从而有利于降低隐私机器学习模型的实现成本。
一些实施场景中,在生成隐私机器学习模型后,可以利用优化测试组件保存与隐私机器学习模型对应的数据流图以及该数据流图中的密文算子节点信息。例如,可以利用优化测试组件保存替换后数据流图中密码算子的节点位置标识、以及密码算子标识。一些实施场景中,可以将替换后的密码算子在数据流图中进行标记。之后,可以按照顺序将被标记的算子节点信息保存到堆栈中。其中,密文算子节点信息中还可以包括算子在堆栈中的位置。
一些实施场景中,在将明文算子替换前对应的数据流图信息、节点信息以及将明文算子替换后对应的数据流图信息、节点信息保存后,可以获取数据流图信息和节点信息。
上述实施例中的方法,由于优化测试组件包括静态优化器,在获取数据流图信息和节点信息过程中,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,而且可以为实现自动测试数据流图和图执行结果正确性提供保障。
S2:判断所述第一数据流图是否为所述第二数据流图的子图。
其中,数据流图可以用于表征机器学习模型中的数据流动信息。数据流图中可以包括节点。例如,在TensorFlow机器学习框架中,数据流图为张量流图。张量流图中的节点在图中表示数学操作,图中的线则表示在节点间相互联系的多维数据数组,即张量。
本说明书实施例中,在获取数据流图信息和节点信息后,表明已经将明文机器学习模型中的明文算子替换为对应的密码算子,得到对应的隐私机器学习模型。一些实施场景中,通过对明文机器学习模型中需要替换的明文算子进行替换,可以产生明文算子以及密文算子混合的数据流图。为了验证模型中明文算子替换是否正确,可以先对与模型对应的数据流图信息进行判断,以便确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。
本说明书一个实施例中,所述判断所述第一数据流图是否为所述第二数据流图的子图,可以包括:获取所述第一数据流图和所述第二数据流图中节点对应的唯一标识;将所述第一数据流图中节点对应的唯一标识组成第一集合;将所述第二数据流图中节点对应的唯一标识组成第二集合;基于节点标识递增规则,判断所述第一集合是否是所述第二集合的子集;当所述第一集合是所述第二集合的子集时,确定所述第一数据流图是所述第二数据流图的子图。
一些实施场景中,在获取数据流图信息后,可以相应的为数据流图中的每个节点添加节点标识。其中,为了验证模型中明文算子替换是否正确,在将明文机器学习模型中的明文算子替换为对应的密码算子,得到对应隐私机器学习模型的过程中,数据流图中节点对应的节点标识需要满足递增规则。递增规则可以理解为是在明文机器学习模型对应的数据流图(以下可以简称为“原始图”)中节点标识的基础上,对得到的隐私机器学习模型对应的数据流图(以下可以简称“新图”)中密码算子对应节点进行递增标识。例如,原始图中包括5个节点,相应的,为这5个节点添加节点标识1、2、3、4、5,由于原始图中2个节点对应的明文算子需要替换为对应的密码算子,则在替换获得新图后,新图中密码算子对应节点标识应为6、7。
一些实施场景中,为了验证模型中明文算子替换是否正确,得到的新图中需要保存有原始图。例如,原始图中包括5个节点,相应的,为这5个节点添加节点标识1、2、3、4、5,由于原始图中2个节点对应的明文算子需要替换为对应的密码算子,则在替换获得新图后,新图中应包括7个节点,对应的节点标识分别为1、2、3、4、5、6、7。
一些实施场景中,在为原始图、新图中的每个节点添加节点标识后,可以将每个图中 的节点标识组成一个集合,然后通过判断原始图中节点标识对应的集合是否是新图中节点标识对应集合的子集,如果是,则说明原始图是新图的子图,从而可以确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。如果原始图中节点标识对应的集合不是新图中节点标识对应集合的子集,则说明原始图不是新图的子图,明文机器学习模型中明文算子替换过程中存在异常。在出现异常时,可以通过预设方式通知开发人员,其中,预设方式可以包括发送信息、发出提醒等方式进行,本说明书对此不作限定。
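上述基于节点标识集合的子图判断可以示意为如下代码（函数名与示例数据均为假设，仅演示集合子集判断的思路）：

```python
def is_subgraph(first_graph_node_ids, second_graph_node_ids):
    """判断第一数据流图的节点标识集合是否为第二数据流图节点标识集合的子集（示意）。"""
    first_set = set(first_graph_node_ids)
    second_set = set(second_graph_node_ids)
    return first_set.issubset(second_set)

# 示例：原始图节点标识为1~5，按递增规则替换后新图新增标识6、7
assert is_subgraph([1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7]) is True
assert is_subgraph([1, 2, 3, 4, 5], [1, 2, 3, 4, 6, 7]) is False
```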
本说明书实施例中,在获取数据流图信息和节点信息后,通过对明文算子替换前后对应的数据流图进行判断,可以确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。
S4:确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
其中,明文算子节点信息中至少可以包括需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识,密文算子节点信息中至少可以包括密码算子的节点位置标识、密码算子标识。匹配结果可以包括明文算子节点信息与密文算子节点信息匹配成功,还可以包括明文算子节点信息与密文算子节点信息匹配不成功。
本说明书实施例中,通过对数据流图信息进行判断,确定原始图是新图的子图后,可以说明原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。之后,为了实现对数据流图正确性的校验,可以将明文算子节点信息与密文算子节点信息进行匹配。
本说明书一个实施例中,在确定第一数据流图是第二数据流图的子图时,可以将明文算子节点信息与密文算子节点信息进行匹配,从而实现对数据流图正确性的校验。具体的,例如一些实施场景中,明文算子节点信息包括原始图中需要替换的明文算子节点对应的IP地址和预先设置的与明文算子对应的密码算子的名称,密文算子节点信息包括新图中密码算子节点对应的IP地址和该密码算子的名称,由于算子替换前后节点对应的IP地址是不变的,所以可以基于同一IP地址判断明文算子节点信息中预先设置的与明文算子对应的密码算子的名称是否与密文算子节点信息中密码算子的名称一致,如果同一IP地址对应的明文算子节点信息中预先设置的与明文算子对应的密码算子的名称与密文算子节点信息中密码算子的名称均一致,则说明匹配成功,输出明文算子节点信息与密文算子节点信息匹配的结果。如果至少存在一个节点信息中对应的名称不一致,则说明匹配不成功,输出明文算子节点信息与密文算子节点信息不匹配的结果。需要说明的是,由于每个节点信息中还可以包括其他信息,所以在匹配过程中可以相应的对每个节点信息中包括的其他信息依 次进行匹配,在每个节点信息全部一致时,说明匹配成功,否则匹配不成功。
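基于同一节点位置标识（如IP地址）逐一核对密码算子名称的匹配过程，可以示意为如下草图（沿用前文假设的节点信息结构，函数名为示例）：

```python
def match_node_info(plain_infos, cipher_infos):
    """基于同一IP地址核对替换前后密码算子名称是否一致（示意）。"""
    cipher_by_addr = {info.node_addr: info for info in cipher_infos}
    for plain in plain_infos:
        cipher = cipher_by_addr.get(plain.node_addr)
        if cipher is None:
            return False                    # 同一位置上找不到对应的密文算子节点信息
        if cipher.mpc_op != plain.target_mpc_op:
            return False                    # 至少存在一个节点信息不一致，匹配不成功
    return True                             # 全部一致，匹配成功
```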
另一些实施场景中,明文算子节点信息除了可以包括原始图中需要替换的明文算子节点对应的IP地址和预先设置的与明文算子对应的密码算子的名称外,还可以包括预先设置的与明文算子对应的密码算子的第一特征信息。密文算子节点信息除了可以包括新图中密码算子节点对应的IP地址和该密码算子的名称外,还可以包括一些与该密码算子对应的第二特征信息。其中,特征信息可以包括算子的生成时间、地点、生成方式等。相应的,在将明文算子节点信息与密文算子节点信息进行匹配过程中,还可以包括:计算第一特征信息与第二特征信息的相似度,根据相识度与预先设定阈值之间关系,确定明文算子节点信息与密文算子节点信息是否匹配。例如,可以先判断IP地址是否一致;确定IP地址一致时,计算第一特征信息与第二特征信息的相似度;判断第一特征信息与第二特征信息的相似度是否大于预先设定阈值,确定第一特征信息与第二特征信息的相似度大于或等于预先设定阈值时,说明匹配成功,输出明文算子节点信息与密文算子节点信息匹配的结果。如果第一特征信息与第二特征信息的相似度小于预先设定阈值,则说明匹配不成功,输出明文算子节点信息与密文算子节点信息不匹配的结果。
一些实施场景中,在第一特征信息与第二特征信息的相似度大于或等于预先设定阈值时,还可以对明文算子节点信息中预先设置的与明文算子对应的密码算子的名称与密文算子节点信息中密码算子的名称进行判断,如果名称均一致,则说明匹配成功,输出明文算子节点信息与密文算子节点信息匹配的结果。如果至少存在一个节点信息中对应的名称不一致,则说明匹配不成功,输出明文算子节点信息与密文算子节点信息不匹配的结果。这样,通过多次匹配,可以提高校验的准确度。
需要说明的是,计算相似度的方式可以通过本领域技术人员知晓的一些方式,如欧几里得距离、曼哈顿距离等,本说明书对此不做限定。预先设定阈值可以根据实际场景进行设定。
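以欧几里得距离为例，特征相似度与预先设定阈值的比较可以粗略示意如下（特征的数值化方式与阈值 0.9 均为示例假设，并非本说明书限定的取值）：

```python
import math

def euclidean_similarity(first_feature, second_feature):
    """把两组数值化特征的欧几里得距离换算为(0, 1]内的相似度（示意）。"""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(first_feature, second_feature)))
    return 1.0 / (1.0 + dist)

def feature_match(first_feature, second_feature, threshold=0.9):
    """相似度大于或等于预先设定阈值时认为第一、第二特征信息匹配（示意）。"""
    return euclidean_similarity(first_feature, second_feature) >= threshold
```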
本说明书一个实施例中,节点信息中还可以包括算子在堆栈中的位置。由于在获取节点信息后,可以按照顺序将节点信息保存到堆栈中,所以可以记录算子在堆栈中的位置,以便后续可以通过对堆栈中的位置进行相应匹配,从而提高校验准确度。例如一些实施场景中,在获取原始图中节点信息后,可以依次将其保存到堆栈中,并记录每个节点信息在堆栈中的位置,在获取新图中的节点信息后,同样,依次保存到另一个堆栈中,并记录每个节点在堆栈中的位置,最后对明文算子节点信息和密文算子节点信息进行匹配时,可以先对节点信息中包括的算子在堆栈中的位置进行匹配,在匹配成功时,再对节点对应的IP 地址和密码算子的名称进行匹配。
需要说明的是，上述只是进行示例性说明，匹配方式还可以包括其他方式，本说明书对此不作限定。例如，节点信息中还可以以表的形式存储在数据库等中，这样，节点信息中还可以包括节点信息在数据库中的位置、节点信息在数据库中对应表的名称等信息，此时，可以根据这些信息进行相应匹配，获得匹配结果。
上述实施例的方法,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,提高编码效率,而且可以实现对数据流图正确性的自动化测试,从而提高验证效率。
上述实施例的方法,在进行实现时,可以对外封装成相应的接口(如validate_graph接口),这样,外部就可以直接调用该接口实现数据流图的自动化测试。
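一个 validate_graph 接口内部逻辑的极简示意如下（入参的组织方式为假设，仅把前文的子图判断与节点信息匹配串联起来）：

```python
def validate_graph(first_node_ids, second_node_ids, plain_op_stack, cipher_op_stack):
    """数据流图自动化校验接口的示意实现：先判断子图，再逐一匹配算子节点信息。"""
    if not set(first_node_ids).issubset(set(second_node_ids)):
        return False                                   # 原始图不是新图的子图，校验失败
    if len(plain_op_stack) != len(cipher_op_stack):
        return False
    for plain, cipher in zip(plain_op_stack, cipher_op_stack):
        if plain.node_addr != cipher.node_addr:
            return False
        if plain.target_mpc_op != cipher.mpc_op:
            return False                               # 堆栈中存在不匹配的算子，校验失败
    return True                                        # 校验成功
```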
下面结合一个具体实施例对上述方法进行说明,然而,值得注意的是,该具体实施例仅是为了更好地说明本申请,并不构成对本申请的不当限定。
本具体实施例中以TensorFlow框架为例进行说明,其中,静态优化测试组件(Static Pass Tester)中封装有静态优化器(Static Pass),可以用于在实施过程中保存信息,并利用保存信息对数据流图进行校验。下述信息保存均基于静态优化测试组件完成。如图2所示,在本具体实施例中,可以包括以下步骤。
(1)保存原始graph;
其中,原始graph可以理解为是预设明文机器学习模型对应的第一数据流图。本实施例中,通过保存原始graph,可以为更新图提供原始参考。需要说明的是,该原始graph可以通过对预设明文机器学习模型对应的数据流图进行复制获得。
本实施例中,可以采用TensorFlow框架生成预设明文机器学习模型,然后保存预设明文机器学习模型对应的数据流图。
(2)保存原始graph中需要更新为安全多方计算算子的算子节点栈;
其中,安全多方计算算子(MPC op)可以理解为是密码算子。算子节点栈可以理解为明文算子节点信息。需要更新为安全多方计算算子的算子节点栈可以理解为是需要替换为MPC op的明文算子节点信息。其中,op为Operation的缩写。
本实施例中,可以基于预设明文机器学习模型对应的数据流图中的数据流,确定需要替换为MPC op的本地明文算子(Tensorflow native op),然后将需要替换的Tensorflow native op在原始graph中进行标记,最后将进行标记的节点对应的信息保存到堆栈中,并保存节点在堆栈中的顺序信息。例如,原始graph中包括5个节点,基于预设明文机器学习模型 对应的数据流图中的数据流,可以确定需要替换为MPC op的Tensorflow native op有2个,则可以在原始graph中将这2个Tensorflow native op进行标记,然后把标记的2个Tensorflow native op对应的IP地址、以及预先设定的MPC op名称按顺序保存到堆栈中,并对应记录Tensorflow native op在堆栈中的位置信息。例如,2个Tensorflow native op分别为Tensorflow native op3和Tensorflow native op4,则可以将Tensorflow native op3对应的节点信息保存到堆栈的位置1中,将Tensorflow native op4对应的节点信息保存到堆栈的位置2中,然后将Tensorflow native op在堆栈的具体位置(如,位置1、位置2)记录到对应的节点信息中。其中,可以预先设置Tensorflow native op与MPC op的对应关系。
(3)执行静态优化器进行算子更新替换,并构建新graph;
其中,进行算子更新替换可以理解为将预设明文机器学习模型中的明文算子替换为明文算子对应的密码算子。新graph是指算子更新替换后获得的隐私机器学习模型对应的第二数据流图。
本实施例中,可以通过静态优化器,实现op的更新替换,从而构建新graph。
相应的,本实施例中,在构建新graph后,可以保存新graph以及新graph中的安全多方计算算子的算子节点栈。其中,保存方式与步骤(1)、(2)中类似,对此不作赘述。
需要说明的是,上述执行过程中,如果静态优化测试组件没有进行保存,则可以抛出异常。
(4)比较原始graph是否为新graph的子图;
本实施例中,可以为数据流图中的每个节点添加节点标识。其中,节点标识需要满足递增规则。递增规则可以理解为是在原始graph中节点标识的基础上,对得到的新graph中MPC op对应节点进行递增标识。例如,原始graph中包括5个节点,相应的,为这5个节点添加节点标识1、2、3、4、5,由于原始graph中2个节点对应的Tensorflow native op需要替换为对应的MPC op,则在替换获得新graph后,新graph中MPC op对应节点标识应为6、7。
本实施例中,在为数据流图中的每个节点添加节点标识后,可以将每个图中的节点标识组成一个集合,然后判断原始graph中节点标识对应的集合是否是新graph中节点标识对应集合的子集,如果是,则说明原始graph是新graph的子图,即原始graph与新graph前面部分一致,从而可以确保原始graph的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。如果不是,则说明原始graph不是新graph的子图,即原始graph与新graph前面部分不一致,从而说明进行算子替换后的数据流图不正确,输出校验失败的结 果。
(5)确定原始graph为新graph的子图时,比较原始graph中需要更新为安全多方计算算子的算子节点栈与新graph中的安全多方计算算子的算子节点栈是否匹配,并输出结果。
本实施例中,由于原始graph中需要更新为安全多方计算算子的算子节点栈与新graph中的安全多方计算算子的算子节点栈分别保存到堆栈中,所以在确定原始graph为新graph的子图后,可以分别对堆栈中的元素依次进行比较,如果其中有一个元素的Tensorflow native op和MPC op不匹配,则判断为失败,输出校验失败的结果。否则,全部匹配,判断为成功,输出校验成功的结果。其中,匹配过程中可以基于算子替换前后节点对应的IP地址不变进行比较,具体比较过程可参见上述方法的描述,对此不作赘述。
本说明书一个实施例中,在实现对数据流图正确性的自动化测试后,还可以对数据流图的执行结果进行验证。具体的一种实施例如图3所示,所述方法可以包括以下步骤。
S10:当所述匹配结果为匹配成功时,获取明文数据;
S12:将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,其中,所述会话测试器包括第一数据流图和第二数据流图;
S14:对所述密文执行结果进行解密,获得解密结果;
S16:计算所述明文执行结果与所述解密结果的差值,并判断所述差值是否在预设误差范围内,输出判断结果。
其中,明文数据可以理解为是任何没有经过加密的数据。明文数据可以是用户通过接口输入,也可以是预先存储在服务器中,本说明书对此不作限定。会话测试器中可以包括第一数据流图和第二数据流图,其可以用于执行数据流图,并返回对应的参数信息。
本说明书一个实施例中,所述将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,可以包括:将所述明文数据输入所述会话测试器包括的第一数据流图,获得明文执行结果;对所述明文数据进行加密,将加密后的数据分发到各个多方安全计算进程,基于所述各个多方安全计算进程中的数据,执行所述会话测试器包括的第二数据流图,获得密文执行结果。例如一些实施场景中,可以先对明文数据进行加密,然后将加密后的数据分发到各个数据持有方,最后根据各个数据持有方存储的数据,执行会话测试器包括的第二数据流图,获得密文执行结果。其中,对明文数据进行加密可以通过秘密分享的方式实现,这样,进行秘密分享后任何一个存储子秘密的数据持有方均无法获取其他数据持有方存储的子秘密,而且只有所有数据持有方的子秘密合起来才可以对加密结果进行还原 或解密。秘密分享可以包括加法秘密分享、谢尔曼秘密分享等。
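以加法秘密分享为例，对明文数据进行加密并把子秘密分发给各个多方安全计算进程的过程可以示意如下（参与方数量与模数均为示例假设，实际可替换为其他秘密分享方案）：

```python
import secrets

PRIME = 2 ** 61 - 1   # 示例模数，实际取值依具体方案而定

def additive_share(value, n_parties=3):
    """把一个整数明文拆成 n 份加法子秘密（示意），任意不足 n 份都无法还原明文。"""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """只有所有数据持有方的子秘密合起来才能还原（解密）出原始明文（示意）。"""
    return sum(shares) % PRIME

assert reconstruct(additive_share(123456)) == 123456
```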
例如一些实施场景中,为了对密文执行结果和明文执行结果进行比较,可以对Tensorflow框架的原生tf.Session(会话执行器)进行扩展,获得包括原始图和新图的SessionTester(会话测试器)。其中,SessionTester可以包括执行接口(run接口)和验证接口(validate_run接口),run接口用于基于数据流图和提供的明文数据获得执行结果,validate_run接口用于对执行结果进行验证,并返回验证结果。
一些实施场景中,在获得SessionTester后,可以接收输入接口输入的明文数据,然后可以基于明文数据和原始图执行run接口,获得与原始图对应明文参数。另一些实施场景中,在获得SessionTester后,可以接收输入接口输入的明文数据,对明文数据进行加密,把加密数据分发到各个多方安全计算进程,然后使用新图执行run接口,获得与新图对应的密文参数。需要说明的是,由于SessionTester中包括原始图和新图,所以在获得明文数据后,可以先后基于数据流图和明文数据获得执行结果,也可以同时基于数据流图和明文数据获得执行结果,本说明书对此不作限定。
一些实施场景中,在获得明文参数和密文参数后,可以基于validate_run接口先对密文参数进行解密操作,获得解密参数,然后将解密参数与明文参数进行比较,从而实现对数据流图执行结果正确性的验证。其中,通常对数据流图的执行结果为数字型,所以,在将解密参数与明文参数进行比较时,可以预先设置误差范围,如果解密参数的值与明文参数的值在预设误差范围内,则说明数据流图执行结果正确。如果解密参数的值与明文参数的值不在预设误差范围内,则说明数据流图执行结果不正确。需要说明的是,对数据流图的执行结果还可以是其他类型,如是其他类型,可以通过预设的转化方式将其转化为数字型,其中,本说明书对预设的转化方式不限。
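validate_run 接口中先解密、再求差、最后判断差值是否在预设误差范围内的比较逻辑，可以示意为如下草图（decrypt 的具体实现与误差阈值均为假设）：

```python
def validate_run(plain_result, cipher_result, decrypt, max_error=1e-4):
    """比较明文执行结果与密文执行结果（示意）。

    decrypt   : 对密文执行结果进行解密的函数（假设）
    max_error : 预设误差范围（示例取值）
    """
    decrypted = decrypt(cipher_result)        # 对密文执行结果进行解密，获得解密结果
    diff = abs(plain_result - decrypted)      # 计算明文执行结果与解密结果的差值
    return diff <= max_error                  # 差值在预设误差范围内则判定执行结果正确
```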
本说明书一些实施例中,可以通过Python语言实现静态优化器以及会话测试器等。当然,也可以采用其他语言来实现密码算子,例如由C语言、C++语言等,本说明书对此不作限定。
本说明书提供的一种节点匹配方法,在将预设明文机器学习模型中的明文算子替换为对应的密码算子过程中,由于优化器测试组件封装了静态优化器,使得在获取数据流图信息和节点信息过程中,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,而且可以为实现自动测试数据流图和图执行结果正确性提供保障。在获取数据流图信息和节点信息后,通过对明文算子替换前后对应的数据流图进行判断,可以确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。通过对明文算子 替换前后的节点信息进行匹配以及对图执行结果的比较,可以实现对数据流图和图执行结果正确性的自动化测试,从而提高验证效率。
本说明书中上述方法的各个实施例均采用递进的方式描述，各个实施例之间相同相似的部分互相参见即可，每个实施例重点说明的都是与其他实施例的不同之处。相关之处参见方法实施例的部分说明即可。
基于上述所述的一种节点匹配方法,本说明书一个或多个实施例还提供一种节点匹配装置。所述的装置可以包括使用了本说明书实施例所述方法的系统(包括分布式系统)、软件(应用)、模块、组件、服务器、客户端等并结合必要的实施硬件的装置。基于同一创新构思,本说明书实施例提供的一个或多个实施例中的装置如下面的实施例所述。由于装置解决问题的实现方案与方法相似,因此本说明书实施例具体的装置的实施可以参见前述方法的实施,重复之处不再赘述。以下所使用的,术语“单元”或者“模块”可以实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。
具体地,图4是本说明书提供的一种节点匹配装置的一个实施例的模块结构示意图,如图4所示,本说明书提供的一种节点匹配装置可以包括:信息获取模块120,判断模块122,匹配模块124。
信息获取模块120,可以用于获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
判断模块122,可以用于判断所述第一数据流图是否为所述第二数据流图的子图;
匹配模块124,可以用于确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
基于前述方法所述实施例的描述,本说明书所述装置的另一个实施例中,还可以包括:
明文数据获取模块,可以用于当所述匹配结果为匹配成功时,获取明文数据;
执行结果获得模块,可以用于将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,其中,所述会话测试器包括第一数据流图和第二数据流图;
解密模块,可以用于对所述密文执行结果进行解密,获得解密结果;
结果判断模块,可以用于计算所述明文执行结果与所述解密结果的差值,并判断所述差值是否在预设误差范围内,输出判断结果。
基于前述方法所述实施例的描述,本说明书所述装置的另一个实施例中,所述执行结果获得模块,可以包括:
明文执行结果获得单元,可以用于将所述明文数据输入所述会话测试器包括的第一数据流图,获得明文执行结果;
密文执行结果获得单元,可以用于对所述明文数据进行加密,将加密后的数据分发到各个多方安全计算进程,基于所述各个多方安全计算进程中的数据,执行所述会话测试器包括的第二数据流图,获得密文执行结果。
基于前述方法所述实施例的描述,本说明书所述装置的另一个实施例中,所述信息获取模块120,可以包括:
第一获取单元1200,可以获取预设明文机器学习模型中的优化测试组件,所述优化测试组件包括静态优化器,所述优化测试组件用于在节点匹配过程中保存信息,并利用保存信息对数据流图进行校验;
第一保存单元1202,可以用于基于所述优化测试组件保存所述预设明文机器学习模型对应的第一数据流图和所述第一数据流图中需要替换为密码算子的明文算子节点信息;
模型生成单元1204,可以用于执行静态优化器,将所述预设明文机器学习模型中明文算子替换为所述明文算子对应的密码算子,生成隐私机器学习模型;
第二保存单元1206,可以用于基于所述优化测试组件保存所述隐私机器学习模型对应的第二数据流图和所述第二数据流图中包括的密文算子节点信息;
信息获取单元1208,可以用于获取数据流图信息和节点信息。
基于前述方法所述实施例的描述,本说明书所述装置的另一个实施例中,所述判断模块122,可以包括:
第二获取单元1220,可以用于获取所述第一数据流图和所述第二数据流图中节点对应的唯一标识;
第一组成单元1222,可以用于将所述第一数据流图中节点对应的唯一标识组成第一集合;
第二组成单元1224,可以用于将所述第二数据流图中节点对应的唯一标识组成第二集合;
判断单元1226,可以用于基于节点标识递增规则,判断所述第一集合是否是所述第二集合的子集;
确定单元1228,可以用于当所述第一集合是所述第二集合的子集时,确定所述第一数 据流图是所述第二数据流图的子图。
基于前述方法所述实施例的描述,本说明书所述装置的另一个实施例中,所述明文算子节点信息至少包括需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识;所述密文算子节点信息至少包括密码算子的节点位置标识、密码算子标识。
本说明书提供的一种节点匹配装置,在将预设明文机器学习模型中的明文算子替换为对应的密码算子过程中,由于优化器测试组件封装了静态优化器,使得在获取数据流图信息和节点信息过程中,不仅可以复用已有的明文机器学习模型实现隐私机器学习模型,减少开发成本,而且可以为实现自动测试数据流图和图执行结果正确性提供保障。在获取数据流图信息和节点信息后,通过对明文算子替换前后对应的数据流图进行判断,可以确保原始图的部分没有被修改,仍然能够提供正确地明文机器学习模型执行。通过对明文算子替换前后的节点信息进行匹配以及对图执行结果的比较,可以实现对数据流图和图执行结果正确性的自动化测试,从而提高验证效率。
需要说明的,上述所述的装置根据方法实施例的描述还可以包括其他的实施方式,具体的实现方式可以参照相关方法实施例的描述,在此不作一一赘述。
本说明书还提供一种节点匹配设备的实施例,包括处理器及用于存储处理器可执行指令的存储器,所述指令被所述处理器执行时实现包括以下步骤:
获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
判断所述第一数据流图是否为所述第二数据流图的子图;
确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
需要说明的，上述所述的设备根据方法或装置实施例的描述还可以包括其他的实施方式。具体的实现方式可以参照相关方法实施例的描述，在此不作一一赘述。
本说明书还提供一种节点匹配系统的实施例,包括至少一个处理器以及存储计算机可执行指令的存储器,所述处理器执行所述指令时实现上述任意一个或者多个实施例中所述方法的步骤,例如包括:获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述 节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;判断所述第一数据流图是否为所述第二数据流图的子图;确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。所述的系统可以为单独的服务器,也可以包括使用了本说明书的一个或多个所述方法或一个或多个实施例装置的服务器集群、系统(包括分布式系统)、软件(应用)、实际操作装置、逻辑门电路装置、量子计算机等并结合必要的实施硬件的终端装置。
本说明书所提供的方法实施例可以在移动终端、计算机终端、服务器或者类似的运算装置中执行。以运行在服务器上为例,图5是本说明书提供的一种节点匹配服务器的一个实施例的硬件结构框图,该服务器可以是上述实施例中的节点匹配装置或节点匹配系统。如图5所示,服务器10可以包括一个或多个(图中仅示出一个)处理器100(处理器100可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)、用于存储数据的存储器200、以及用于通信功能的传输模块300。本领域普通技术人员可以理解,图5所示的结构仅为示意,其并不对上述电子装置的结构造成限定。例如,服务器10还可包括比图5中所示更多或者更少的组件,例如还可以包括其他的处理硬件,如数据库或多级缓存、GPU,或者具有与图5所示不同的配置。
存储器200可用于存储应用软件的软件程序以及模块,如本说明书实施例中的节点匹配方法对应的程序指令/模块,处理器100通过运行存储在存储器200内的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器200可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器200可进一步包括相对于处理器100远程设置的存储器,这些远程存储器可以通过网络连接至计算机终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
传输模块300用于经由一个网络接收或者发送数据。上述的网络具体实例可包括计算机终端的通信供应商提供的无线网络。在一个实例中,传输模块300包括一个网络适配器(Network Interface Controller,NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输模块300可以为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通讯。
上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并 且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
本说明书提供的上述实施例所述的方法或装置可以通过计算机程序实现业务逻辑并记录在存储介质上,所述的存储介质可以计算机读取并执行,实现本说明书实施例所描述方案的效果。
所述存储介质可以包括用于存储信息的物理装置,通常是将信息数字化后再以利用电、磁或者光学等方式的媒体加以存储。所述存储介质有可以包括:利用电能方式存储信息的装置如,各式存储器,如RAM、ROM等;利用磁能方式存储信息的装置如,硬盘、软盘、磁带、磁芯存储器、磁泡存储器、U盘;利用光学方式存储信息的装置如,CD或DVD。当然,还有其他方式的可读存储介质,例如量子存储器、石墨烯存储器等等。
本说明书提供的上述节点匹配方法或装置实施例可以在计算机中由处理器执行相应的程序指令来实现,如使用windows操作系统的c++语言在PC端实现、linux系统实现,或其他例如使用android、iOS系统程序设计语言在智能终端实现,以及基于量子计算机的处理逻辑实现等。
需要说明的是说明书上述所述的装置、计算机存储介质、系统根据相关方法实施例的描述还可以包括其他的实施方式,具体的实现方式可以参照对应方法实施例的描述,在此不作一一赘述。
本申请中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于硬件+程序类实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本说明书实施例并不局限于必须是符合行业通信标准、标准计算机数据处理和数据存储规则或本说明书一个或多个实施例所描述的情况。某些行业标准或者使用自定义方式或实施例描述的实施基础上略加修改后的实施方案也可以实现上述实施例相同、等同或相近、或变形后可预料的实施效果。应用这些修改或变形后的数据获取、存储、判断、处理方式等获取的实施例,仍然可以属于本说明书实施例的可选实施方案范围之内。
在20世纪90年代,对于一个技术的改进可以很明显地区分是硬件上的改进(例如,对二极管、晶体管、开关等电路结构的改进)还是软件上的改进(对于方法流程的改进)。然而,随着技术的发展,当今的很多方法流程的改进已经可以视为硬件电路结构的直接改 进。设计人员几乎都通过将改进的方法流程编程到硬件电路中来得到相应的硬件电路结构。因此,不能说一个方法流程的改进就不能用硬件实体模块来实现。例如,可编程逻辑器件(Programmable Logic Device,PLD)(例如现场可编程门阵列(Field Programmable Gate Array,FPGA))就是这样一种集成电路,其逻辑功能由用户对器件编程来确定。由设计人员自行编程来把一个数字系统“集成”在一片PLD上,而不需要请芯片制造厂商来设计和制作专用的集成电路芯片。而且,如今,取代手工地制作集成电路芯片,这种编程也多半改用“逻辑编译器(logic compiler)”软件来实现,它与程序开发撰写时所用的软件编译器相类似,而要编译之前的原始代码也得用特定的编程语言来撰写,此称之为硬件描述语言(Hardware Description Language,HDL),而HDL也并非仅有一种,而是有许多种,如ABEL(Advanced Boolean Expression Language)、AHDL(Altera Hardware Description Language)、Confluence、CUPL(Cornell University Programming Language)、HDCal、JHDL(Java Hardware Description Language)、Lava、Lola、MyHDL、PALASM、RHDL(Ruby Hardware Description Language)等,目前最普遍使用的是VHDL(Very-High-Speed Integrated Circuit Hardware Description Language)与Verilog。本领域技术人员也应该清楚,只需要将方法流程用上述几种硬件描述语言稍作逻辑编程并编程到集成电路中,就可以很容易得到实现该逻辑方法流程的硬件电路。
控制器可以按任何适当的方式实现,例如,控制器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程逻辑控制器和嵌入微控制器的形式,控制器的例子包括但不限于以下微控制器:ARC 625D、Atmel AT91SAM、Microchip PIC18F26K20以及Silicone Labs C8051F320,存储器控制器还可以被实现为存储器的控制逻辑的一部分。本领域技术人员也知道,除了以纯计算机可读程序代码方式实现控制器以外,完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件,而对其内包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至,可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。
上述实施例阐明的系统、装置、模块或单元中的部分具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的,计算机例如可以为个人计算机、平板电脑、智能手机等。
虽然本说明书一个或多个实施例提供了如实施例或流程图所述的方法操作步骤,但基于常规或者无创造性的手段可以包括更多或者更少的操作步骤。实施例中列举的步骤顺序仅仅为众多步骤执行顺序中的一种方式,不代表唯一的执行顺序。在实际中的装置或终端产品执行时,可以按照实施例或者附图所示的方法顺序执行或者并行执行(例如并行处理器或者多线程处理的环境,甚至为分布式数据处理环境)。术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、产品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、产品或者设备所固有的要素。在没有更多限制的情况下,并不排除在包括所述要素的过程、方法、产品或者设备中还存在另外的相同或等同要素。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。
为了描述的方便,描述以上装置时以功能分为各种模块分别描述。当然,在实施本说明书一个或多个时可以把部分模块的功能在同一个或多个软件和/或硬件中实现,也可以将实现同一功能的模块由多个子模块或子单元的组合实现等。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
本发明是参照根据本发明实施例的方法、装置(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他 可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储、石墨烯存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
本领域技术人员应明白,本说明书一个或多个实施例可提供为方法、系统或计算机程序产品。因此,本说明书一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本说明书一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本说明书的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。
以上所述仅为本说明书一个或多个实施例的实施例而已，并不用于限制本说明书一个或多个实施例。对于本领域技术人员来说，本说明书一个或多个实施例可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在权利要求范围之内。

Claims (14)

  1. 一种节点匹配方法,其特征在于,包括:
    获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
    判断所述第一数据流图是否为所述第二数据流图的子图;
    确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    当所述匹配结果为匹配成功时,获取明文数据;
    将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,其中,所述会话测试器包括第一数据流图和第二数据流图;
    对所述密文执行结果进行解密,获得解密结果;
    计算所述明文执行结果与所述解密结果的差值,并判断所述差值是否在预设误差范围内,输出判断结果。
  3. 根据权利要求2所述的方法,其特征在于,所述将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,包括:
    将所述明文数据输入所述会话测试器包括的第一数据流图,获得明文执行结果;
    对所述明文数据进行加密,将加密后的数据分发到各个多方安全计算进程,基于所述各个多方安全计算进程中的数据,执行所述会话测试器包括的第二数据流图,获得密文执行结果。
  4. 根据权利要求1所述的方法,其特征在于,所述获取数据流图信息和节点信息,包括:
    获取预设明文机器学习模型中的优化测试组件,所述优化测试组件包括静态优化器,所述优化测试组件用于在节点匹配过程中保存信息,并利用保存信息对数据流图进行校验;
    基于所述优化测试组件,保存所述预设明文机器学习模型对应的第一数据流图和所述第一数据流图中需要替换为密码算子的明文算子节点信息;
    执行静态优化器,将所述预设明文机器学习模型中明文算子替换为所述明文算子对 应的密码算子,生成隐私机器学习模型;
    基于所述优化测试组件,保存所述隐私机器学习模型对应的第二数据流图和所述第二数据流图中包括的密文算子节点信息;
    获取数据流图信息和节点信息。
  5. 根据权利要求1所述的方法,其特征在于,所述判断所述第一数据流图是否为所述第二数据流图的子图,包括:
    获取所述第一数据流图和所述第二数据流图中节点对应的唯一标识;
    将所述第一数据流图中节点对应的唯一标识组成第一集合;
    将所述第二数据流图中节点对应的唯一标识组成第二集合;
    基于节点标识递增规则,判断所述第一集合是否是所述第二集合的子集;
    当所述第一集合是所述第二集合的子集时,确定所述第一数据流图是所述第二数据流图的子图。
  6. 根据权利要求1或4所述的方法,其特征在于,所述明文算子节点信息至少包括需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识;所述密文算子节点信息至少包括密码算子的节点位置标识、密码算子标识。
  7. 一种节点匹配装置,其特征在于,包括:
    信息获取模块,用于获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
    判断模块,用于判断所述第一数据流图是否为所述第二数据流图的子图;
    匹配模块,用于确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
  8. 如权利要求7所述的装置,其特征在于,还包括:
    明文数据获取模块,用于当所述匹配结果为匹配成功时,获取明文数据;
    执行结果获得模块,用于将所述明文数据输入会话测试器,获得明文执行结果和密文执行结果,其中,所述会话测试器包括第一数据流图和第二数据流图;
    解密模块,用于对所述密文执行结果进行解密,获得解密结果;
    结果判断模块,用于计算所述明文执行结果与所述解密结果的差值,并判断所述差值是否在预设误差范围内,输出判断结果。
  9. 如权利要求8所述的装置,其特征在于,所述执行结果获得模块,包括:
    明文执行结果获得单元,用于将所述明文数据输入所述会话测试器包括的第一数据流图,获得明文执行结果;
    密文执行结果获得单元,用于对所述明文数据进行加密,将加密后的数据分发到各个多方安全计算进程,基于所述各个多方安全计算进程中的数据,执行所述会话测试器包括的第二数据流图,获得密文执行结果。
  10. 如权利要求7所述的装置,其特征在于,所述信息获取模块,包括:
    第一获取单元,用于获取预设明文机器学习模型中的优化测试组件,所述优化测试组件包括静态优化器,所述优化测试组件用于在节点匹配过程中保存信息,并利用保存信息对数据流图进行校验;
    第一保存单元,用于基于所述优化测试组件保存所述预设明文机器学习模型对应的第一数据流图和所述第一数据流图中需要替换为密码算子的明文算子节点信息;
    模型生成单元,用于执行静态优化器,将所述预设明文机器学习模型中明文算子替换为所述明文算子对应的密码算子,生成隐私机器学习模型;
    第二保存单元,用于基于所述优化测试组件保存所述隐私机器学习模型对应的第二数据流图和所述第二数据流图中包括的密文算子节点信息;
    信息获取单元,用于获取数据流图信息和节点信息。
  11. 如权利要求7所述的装置,其特征在于,所述判断模块,包括:
    第二获取单元,用于获取所述第一数据流图和所述第二数据流图中节点对应的唯一标识;
    第一组成单元,用于将所述第一数据流图中节点对应的唯一标识组成第一集合;
    第二组成单元,用于将所述第二数据流图中节点对应的唯一标识组成第二集合;
    判断单元,用于基于节点标识递增规则,判断所述第一集合是否是所述第二集合的子集;
    确定单元,用于当所述第一集合是所述第二集合的子集时,确定所述第一数据流图是所述第二数据流图的子图。
  12. 根据权利要求7或10所述的装置,其特征在于,所述明文算子节点信息至少包括需要替换的明文算子的节点位置标识、与明文算子对应的密码算子标识;所述密文算子节点信息至少包括密码算子的节点位置标识、密码算子标识。
  13. 一种节点匹配设备,其特征在于,包括处理器及用于存储处理器可执行指令的 存储器,所述指令被所述处理器执行时实现包括以下步骤:
    获取数据流图信息和节点信息,其中,所述数据流图信息包括预设明文机器学习模型对应的第一数据流图和隐私机器学习模型对应的第二数据流图,所述节点信息包括所述第一数据流图中需要替换为密码算子的明文算子节点信息和所述第二数据流图中包括的密文算子节点信息;
    判断所述第一数据流图是否为所述第二数据流图的子图;
    确定所述第一数据流图是所述第二数据流图的子图时,将所述明文算子节点信息与所述密文算子节点信息进行匹配,输出匹配结果。
  14. 一种节点匹配系统,其特征在于,包括至少一个处理器以及存储计算机可执行指令的存储器,所述处理器执行所述指令时实现权利要求1-6中任意一项所述方法的步骤。
PCT/CN2020/083639 2020-04-08 2020-04-08 一种节点匹配方法、装置、设备及系统 WO2021203260A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083639 WO2021203260A1 (zh) 2020-04-08 2020-04-08 一种节点匹配方法、装置、设备及系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083639 WO2021203260A1 (zh) 2020-04-08 2020-04-08 一种节点匹配方法、装置、设备及系统

Publications (1)

Publication Number Publication Date
WO2021203260A1 true WO2021203260A1 (zh) 2021-10-14

Family

ID=78023829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083639 WO2021203260A1 (zh) 2020-04-08 2020-04-08 一种节点匹配方法、装置、设备及系统

Country Status (1)

Country Link
WO (1) WO2021203260A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185900A (zh) * 2021-12-20 2022-03-15 平安付科技服务有限公司 业务数据处理方法、装置、计算机设备及存储介质
CN115185525A (zh) * 2022-05-17 2022-10-14 贝壳找房(北京)科技有限公司 数据倾斜代码块定位方法、装置、设备、介质及程序产品
CN115774663A (zh) * 2022-09-15 2023-03-10 江苏瑞蓝自动化设备集团有限公司 一种LabVIEW的测试系统的优化方法、装置、设备及存储介质
CN117077161A (zh) * 2023-07-31 2023-11-17 上海交通大学 基于动态规划求解的隐私保护深度模型构建方法与系统

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016048776A1 (en) * 2014-09-26 2016-03-31 Thomson Licensing Key-private cryptosystems based on the quadratic residuosity
CN108717514A (zh) * 2018-05-21 2018-10-30 中国人民大学 一种机器学习中的数据隐私保护方法和系统
CN110033266A (zh) * 2019-02-19 2019-07-19 阿里巴巴集团控股有限公司 区块链中实现隐私保护的方法、节点和存储介质
CN110059497A (zh) * 2019-02-19 2019-07-26 阿里巴巴集团控股有限公司 区块链中实现隐私保护的方法、节点和存储介质
CN110750801A (zh) * 2019-10-11 2020-02-04 矩阵元技术(深圳)有限公司 数据处理方法、装置、计算机设备和存储介质
CN111415013A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 隐私机器学习模型生成、训练方法、装置及电子设备
CN111414646A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 实现隐私保护的数据处理方法和装置
CN111428880A (zh) * 2020-03-20 2020-07-17 矩阵元技术(深圳)有限公司 隐私机器学习实现方法、装置、设备及存储介质
CN111488277A (zh) * 2020-04-08 2020-08-04 矩阵元技术(深圳)有限公司 一种节点匹配方法、装置、设备及系统

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016048776A1 (en) * 2014-09-26 2016-03-31 Thomson Licensing Key-private cryptosystems based on the quadratic residuosity
CN108717514A (zh) * 2018-05-21 2018-10-30 中国人民大学 一种机器学习中的数据隐私保护方法和系统
CN110033266A (zh) * 2019-02-19 2019-07-19 阿里巴巴集团控股有限公司 区块链中实现隐私保护的方法、节点和存储介质
CN110059497A (zh) * 2019-02-19 2019-07-26 阿里巴巴集团控股有限公司 区块链中实现隐私保护的方法、节点和存储介质
CN110750801A (zh) * 2019-10-11 2020-02-04 矩阵元技术(深圳)有限公司 数据处理方法、装置、计算机设备和存储介质
CN111415013A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 隐私机器学习模型生成、训练方法、装置及电子设备
CN111414646A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 实现隐私保护的数据处理方法和装置
CN111428880A (zh) * 2020-03-20 2020-07-17 矩阵元技术(深圳)有限公司 隐私机器学习实现方法、装置、设备及存储介质
CN111488277A (zh) * 2020-04-08 2020-08-04 矩阵元技术(深圳)有限公司 一种节点匹配方法、装置、设备及系统

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PAN YANG, XIAO-LIN GUI, JING YAO, JIAN-CAI LIN, FENG TIAN, XUE-JUN ZHANG: "Research on algorithms of data encryption scheme that supports homomorphic arithmetical operations", JOURNAL ON COMMUNICATIONS, RENMIN YOUDIAN CHUBANSHE, BEIJING, CN, vol. 36, no. 1, 1 January 2015 (2015-01-01), CN , pages 171 - 182, XP055856621, ISSN: 1000-436X, DOI: 10.11959/j.issn.1000-436x.2015019 *
ZHOU TANPING, YANG HAIBING; YANG XIAOYUAN; HAN YILIANG: "A Fully Homomorphic Proxy Re-encryption Scheme Based on LWE", SICHUAN DAXUE XUEBAO (GONGCHENG KEXUE BAN), SICHUAN DAXUE, CHENGDU, CN, vol. 48, no. 1, 1 January 2016 (2016-01-01), CN , pages 99 - 105, XP055856624, ISSN: 1009-3087, DOI: 10.15961/j.jsuese.2016.01.015 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185900A (zh) * 2021-12-20 2022-03-15 平安付科技服务有限公司 业务数据处理方法、装置、计算机设备及存储介质
CN114185900B (zh) * 2021-12-20 2024-04-09 平安付科技服务有限公司 业务数据处理方法、装置、计算机设备及存储介质
CN115185525A (zh) * 2022-05-17 2022-10-14 贝壳找房(北京)科技有限公司 数据倾斜代码块定位方法、装置、设备、介质及程序产品
CN115774663A (zh) * 2022-09-15 2023-03-10 江苏瑞蓝自动化设备集团有限公司 一种LabVIEW的测试系统的优化方法、装置、设备及存储介质
CN117077161A (zh) * 2023-07-31 2023-11-17 上海交通大学 基于动态规划求解的隐私保护深度模型构建方法与系统
CN117077161B (zh) * 2023-07-31 2024-05-03 上海交通大学 基于动态规划求解的隐私保护深度模型构建方法与系统

Similar Documents

Publication Publication Date Title
WO2021203260A1 (zh) 一种节点匹配方法、装置、设备及系统
TWI682304B (zh) 基於圖結構模型的異常帳號防控方法、裝置以及設備
JP6804668B2 (ja) ブロックデータ検証方法および装置
CN111488277B (zh) 一种节点匹配方法、装置、设备及系统
CN113159327B (zh) 基于联邦学习系统的模型训练方法、装置、电子设备
TWI745861B (zh) 資料處理方法、裝置和電子設備
CN106133537B (zh) 一种fpga功能模块仿真验证方法及其系统
CN109101415A (zh) 基于数据库比对的接口测试方法、系统、设备和存储介质
WO2021114585A1 (zh) 模型训练方法、装置和电子设备
WO2017020590A1 (zh) 一种芯片验证方法和装置、设备、存储介质
CN107483485A (zh) 授权码的生成方法、授权方法、相关装置及终端设备
WO2021017424A1 (zh) 数据预处理方法、密文数据获取方法、装置和电子设备
JP2018505506A (ja) 機械ベースの命令編集
US10747657B2 (en) Methods, systems, apparatuses and devices for facilitating execution of test cases
TW201923647A (zh) 可溯源的多方數據處理方法、裝置及設備
CN108345453A (zh) 代码生成方法、代码生成器及可读存储介质
CN114329644B (zh) 对逻辑系统设计进行加密仿真的方法、设备及存储介质
CN112860587B (zh) Ui自动测试方法和装置
CN109858914A (zh) 区块链数据验证方法、装置、计算机设备及可读存储介质
CN116257303B (zh) 一种数据安全处理的方法、装置、存储介质及电子设备
US20160055287A1 (en) Method for decomposing a hardware model and for accelerating formal verification of the hardware model
Pan et al. A new reliability evaluation method for networks with imperfect vertices using BDD
WO2023020448A1 (zh) 数据处理方法、装置和存储介质
CN113469377B (zh) 联邦学习审计方法和装置
CN107291524A (zh) 一种远程命令的处理方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20929710

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20929710

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20929710

Country of ref document: EP

Kind code of ref document: A1