CN113824546B - Method and device for generating information - Google Patents

Method and device for generating information

Info

Publication number
CN113824546B
CN113824546B (application CN202010567116.9A)
Authority
CN
China
Prior art keywords
feature
division
point
gradient information
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010567116.9A
Other languages
Chinese (zh)
Other versions
CN113824546A (en)
Inventor
何恺
杨青友
洪爵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010567116.9A priority Critical patent/CN113824546B/en
Publication of CN113824546A publication Critical patent/CN113824546A/en
Application granted granted Critical
Publication of CN113824546B publication Critical patent/CN113824546B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9057 Arrangements for supporting packet reassembly or resequencing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Abstract

The application discloses a method and a device for generating information, and relates to the field of artificial intelligence. The specific implementation includes: obtaining gradient information of a sample according to the sample label and the prediction information of the current model for the sample; determining, based on the gradient information, a first feature and a corresponding optimal segmentation point from the features held by the local end; sending, to a feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm; receiving a second feature and a corresponding optimal segmentation point sent by the feature providing end, where the second feature and the corresponding optimal segmentation point are determined by the feature providing end from its held features based on the ciphertext of the gradient information and multiparty secure computation; and determining a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature based on multiparty secure computation with the feature providing end. This embodiment improves information security.

Description

Method and device for generating information
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to artificial intelligence technology.
Background
The data required for machine learning often spans multiple fields. Since the data held by a single data owner may be incomplete, multiple data owners often need to cooperate in jointly training a model in order to obtain a model with better predictive performance. Federated learning is a distributed machine learning technique that aims to enable joint modeling, break down data silos, and improve model performance while guaranteeing data privacy and security. In federated learning, feature information and labels are distributed among different data owners. For tree models, the optimal division points need to be computed jointly, so leakage of information may be unavoidable.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for generating information.
According to a first aspect of the present disclosure, an embodiment of the present disclosure provides a method for generating information, applied to a tag providing end, including: obtaining gradient information of a sample according to the sample label and the prediction information of the current model for the sample; determining, based on the gradient information, a first feature and a corresponding optimal division point from the features held by the local end; sending, to a feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm; receiving a second feature and a corresponding optimal division point sent by the feature providing end, where the second feature and the corresponding optimal division point are determined by the feature providing end from its held features based on the ciphertext of the gradient information and multiparty secure computation; and determining a final division point from the optimal division point corresponding to the first feature and the optimal division point corresponding to the second feature based on multiparty secure computation with the feature providing end.
According to a second aspect of the present disclosure, an embodiment of the present disclosure provides an apparatus for generating information, disposed at a tag providing end, including: a first determining unit configured to obtain gradient information of a sample according to the sample label and the prediction information of the current model for the sample; a second determining unit configured to determine, based on the gradient information, a first feature and a corresponding optimal division point from the features held by the local end; a sending unit configured to send, to a feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm; a receiving unit configured to receive a second feature and a corresponding optimal division point sent by the feature providing end, where the second feature and the corresponding optimal division point are determined by the feature providing end from its held features based on the ciphertext of the gradient information and multiparty secure computation; and a generating unit configured to determine a final division point from the optimal division point corresponding to the first feature and the optimal division point corresponding to the second feature based on multiparty secure computation with the feature providing end.
According to a third aspect of the present disclosure, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a fourth aspect of the present disclosure, an embodiment of the present disclosure provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the first aspects.
With this technique, in the process of determining the final division point, the tag providing end and the feature providing end interact based on multiparty secure computation, so that each end exposes a minimum of its held data to the other, avoiding information leakage.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of one embodiment of a method for generating information according to the present application;
FIG. 2 is a schematic illustration of one application scenario of a method for generating information according to the present application;
FIG. 3 is a flow chart of one embodiment of a method for determining a second feature and corresponding optimal segmentation point according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of an apparatus for generating information according to the present application;
fig. 5 is a block diagram of an electronic device for implementing a method for generating information of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, a flow chart 100 of one embodiment of a method for generating information according to the present disclosure is shown. The method for generating information comprises the following steps:
s101, obtaining gradient information of the sample according to the sample label and the prediction information of the current model aiming at the sample.
In this embodiment, the method for generating information may be applied to a tag providing end. Here, the tag provider may hold the tag and part of the features of the sample. The label providing end can obtain gradient information of each sample according to the sample label and the prediction information of the current model aiming at the sample. Here, gradient information of the sample may be obtained based on a model-based loss function. As an example, the tag provider may first calculate the derivative of the loss function. Then, for each sample, gradient information can be calculated according to the sample label of the sample, the prediction information of the current model for the sample, and the derivative of the loss function. It is to be appreciated that the predictive information for the current model for the sample can be generated jointly by multiple participants for the joint training model.
In practice, multiple rounds of training are often performed on the model. The current model may refer to the model obtained from the previous round of training. The jointly trained model may be a tree model. A tree model is a supervised machine learning model; for example, the tree model may be a binary tree. As an example, algorithms for building the tree model may include GBDT (Gradient Boosting Decision Tree) and the like. The tree model may include a plurality of nodes, each of which may correspond to a location identifier that identifies the position of the node in the tree model, e.g., the number of the node. The plurality of nodes may include leaf nodes and non-leaf nodes. Nodes in the tree model that cannot be split further are called leaf nodes. Leaf nodes are associated with leaf values, each of which may represent a prediction result. Nodes in the tree model that can be split further are called non-leaf nodes. The non-leaf nodes may include a root node and other nodes that are neither leaf nodes nor the root node. Each non-leaf node may correspond to a division point that is used to select the prediction path.
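As an illustrative, non-limiting sketch of the tree-model structure described above, the following Python snippet shows one possible in-memory representation of a node with a location identifier, a leaf value for leaf nodes, and a division point for non-leaf nodes; all names here are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    """Hypothetical node of the jointly trained tree model."""
    node_id: int                           # location identifier of the node in the tree
    leaf_value: Optional[float] = None     # prediction result; set only for leaf nodes
    split_feature: Optional[str] = None    # feature used at a non-leaf node
    split_point: Optional[float] = None    # division point used to select the prediction path
    left: Optional["TreeNode"] = None      # child for samples with feature value <= split_point
    right: Optional["TreeNode"] = None     # child for samples with feature value > split_point

    def predict(self, sample: dict) -> float:
        """Follow division points down to a leaf and return its leaf value."""
        if self.leaf_value is not None:
            return self.leaf_value
        child = self.left if sample[self.split_feature] <= self.split_point else self.right
        return child.predict(sample)
```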
In an actual application scenario, the participants of the joint training may include a tag providing end and a feature providing end. The tag providing end may hold the labels and part of the features of the samples. The feature providing end may hold another part of the features of the samples. As an example scenario, the tag providing end may be a credit agency holding labels for the user's credit risk (e.g., high, medium, low) and some features (e.g., user age, gender). The feature providing end may be a big data company that holds other features of the user (e.g., education, annual income). During joint training, the credit agency cannot provide the labels and features it holds to the big data company, in order to protect data privacy. Likewise, the big data company cannot provide the features it holds to the credit agency. It will be appreciated that one or more feature providing ends may take part in the actual joint training.
In some alternative implementations of the present embodiment, the gradient information of a sample may include a first-order gradient and a second-order gradient. As an example, the tag providing end may first calculate the first and second derivatives of the loss function. Then, for each sample, a first-order gradient and a second-order gradient can be calculated from the sample label, the prediction information of the current model for the sample, and the first and second derivatives of the loss function. In this implementation, gradient information comprising a first-order gradient and a second-order gradient of each sample is obtained, providing the basis for the subsequent calculation of division gains.
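The patent does not name a concrete loss function; as a hedged example, the following sketch assumes a logistic loss for binary classification, for which the first-order gradient of each sample is p - y and the second-order gradient is p(1 - p), with p the predicted probability. Function and variable names are illustrative.

```python
import numpy as np

def gradients_logistic_loss(labels: np.ndarray, raw_scores: np.ndarray):
    """Per-sample first- and second-order gradients, assuming a logistic loss.

    labels: 0/1 sample labels held by the tag providing end.
    raw_scores: current model predictions (log-odds) for the samples.
    """
    p = 1.0 / (1.0 + np.exp(-raw_scores))   # predicted probabilities
    g = p - labels                          # first-order gradients
    h = p * (1.0 - p)                       # second-order gradients
    return g, h
```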
S102, determining a first feature and a corresponding optimal segmentation point from the features held by the local terminal based on the gradient information.
In this embodiment, according to the gradient information obtained in S101, the tag providing end may determine the first feature and the corresponding optimal division point from the features held by the local end in various ways. As an example, since the tag providing end holds all the feature data of its own features, it can determine all possible division points for each of those features. The tag providing end can then, in plaintext form, calculate the division gain obtained when the samples are divided into subsets at each division point of each feature, select one feature as the first feature according to the division gains, and determine the optimal division point corresponding to the first feature. For example, the feature corresponding to the maximum of the calculated division gains may be selected as the first feature, and the division point corresponding to that maximum gain may be used as the optimal division point. Here, the division gain may be calculated based on the gradient information. As an example, the division gain may be calculated by the following formula:
gain = (1/2) * [ G_L^2 / (H_L + λ) + G_R^2 / (H_R + λ) - G^2 / (H + λ) ]

where G_L denotes the sum of the first-order gradients of the samples in the left node after division, H_L the sum of the second-order gradients of the samples in the left node after division, G_R the sum of the first-order gradients of the samples in the right node after division, H_R the sum of the second-order gradients of the samples in the right node after division, G the sum of the first-order gradients of the samples before division, H the sum of the second-order gradients of the samples before division, and λ the regularization term coefficient.
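For illustration only, the gain formula above can be evaluated as follows; this is a plain restatement of the formula, with lam standing in for the regularization coefficient λ and all names chosen for the example.

```python
def division_gain(G_L: float, H_L: float, G_R: float, H_R: float, lam: float = 1.0) -> float:
    """Division gain from the gradient sums of the left and right nodes.

    The parent sums are G = G_L + G_R and H = H_L + H_R, as in the formula above.
    """
    G, H = G_L + G_R, H_L + H_R
    return 0.5 * (G_L ** 2 / (H_L + lam)
                  + G_R ** 2 / (H_R + lam)
                  - G ** 2 / (H + lam))
```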
In some alternative implementations of the present embodiment, S102 may specifically be performed as follows:
first, for each division point corresponding to each feature held by the local terminal, the division gain corresponding to each division point is calculated according to the gradient information.
In this implementation, the tag provider may first determine all the partitioning points corresponding to all the partitioning ways of each feature held. For each corresponding division point of each feature, the tag providing end may calculate a division gain corresponding to the division point according to the gradient information. For example, for a certain partition point of a certain feature, the tag providing end may calculate the sum of first-order gradients and the sum of second-order gradients of all samples in the left node divided based on the partition point, and may also calculate the sum of first-order gradients and the sum of second-order gradients of all samples in the right node. And calculating the segmentation gain corresponding to the segmentation point based on the sum of the first-order gradients and the sum of the second-order gradients of the left node, the sum of the first-order gradients and the sum of the second-order gradients of the right node.
Then, based on the comparison result of the division gains corresponding to the division points, the first feature and the corresponding optimal division point are determined from the features held by the local terminal.
In this implementation, the tag providing end may determine, based on the comparison result of the division gains corresponding to the respective division points, one feature from the features held by the local end as the first feature, together with its corresponding optimal division point. For example, a greedy method may be used: the feature corresponding to the maximum of the calculated division gains is selected as the first feature, and the division point corresponding to that maximum gain is used as the optimal division point. In this way, the tag providing end selects the first feature and the corresponding optimal division point according to the division gain of each division point of each feature it holds, and can therefore select, among its own features, the feature with the largest division gain and its corresponding optimal division point.
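A minimal plaintext sketch of this greedy selection at the tag providing end is given below; it assumes dense numeric feature columns and treats every distinct feature value as a candidate division point, which is one possible choice and not prescribed by the patent.

```python
import numpy as np

def best_local_division(features: dict, g: np.ndarray, h: np.ndarray, lam: float = 1.0):
    """Greedy plaintext search over the features held locally by the tag providing end.

    features: mapping from feature name to a per-sample value array.
    g, h: per-sample first- and second-order gradients.
    Returns (feature_name, division_point, gain) with the largest division gain.
    """
    best = (None, None, -np.inf)
    G, H = g.sum(), h.sum()
    for name, values in features.items():
        for threshold in np.unique(values)[:-1]:   # candidate division points
            left = values <= threshold
            G_L, H_L = g[left].sum(), h[left].sum()
            G_R, H_R = G - G_L, H - H_L
            gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - G**2 / (H + lam))
            if gain > best[2]:
                best = (name, float(threshold), gain)
    return best
```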
S103, sending, to the feature providing end, the ciphertext of the gradient information obtained using a homomorphic encryption algorithm.
In this embodiment, the tag providing end may first encrypt the gradient information obtained in S101 using a homomorphic encryption algorithm to obtain the ciphertext of the gradient information. The tag providing end can then send the resulting ciphertext of the gradient information to the feature providing end. Homomorphic encryption (Homomorphic Encryption) is an encryption technique in which computations can be performed directly on ciphertext: when the output of such a computation is decrypted, the result is the same as that obtained by performing the same computation on the unencrypted original data. Homomorphic encryption algorithms may include additively homomorphic and multiplicatively homomorphic encryption algorithms.
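As a hedged illustration of additively homomorphic encryption of the gradients, the sketch below uses the third-party python-paillier package (phe); the patent does not mandate any particular algorithm or library, and the values are toy data.

```python
from phe import paillier  # third-party python-paillier package (assumed available)

# Key pair generated and kept by the tag providing end.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

g = [0.3, -0.7, 0.1]                          # toy per-sample first-order gradients
enc_g = [public_key.encrypt(v) for v in g]    # ciphertexts sent to the feature providing end

# The feature providing end can aggregate ciphertexts for a candidate child node
# without learning any plaintext gradient (additive homomorphism).
enc_left_sum = enc_g[0] + enc_g[1]

# Only the tag providing end, which holds the private key, can decrypt the aggregate.
assert abs(private_key.decrypt(enc_left_sum) - (g[0] + g[1])) < 1e-9
```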
S104, receiving the second feature and the corresponding optimal segmentation point sent by the feature providing end.
In this embodiment, the tag providing end may receive the second feature and the corresponding optimal division point sent by the feature providing end. Here, the second feature and the corresponding optimal division point may be determined by the feature providing end from its held features based on the ciphertext of the gradient information and multiparty secure computation. As an example, after receiving the ciphertext of the gradient information sent by the tag providing end, the feature providing end may determine the second feature and the corresponding optimal division point from its held features according to the ciphertext of the gradient information and multiparty secure computation with the tag providing end. For example, the feature providing end uses multiparty secure computation when calculating and comparing the division gains. The feature providing end therefore never provides plaintext division gains to the tag providing end, which prevents the tag providing end from inferring the feature data held by the feature providing end from plaintext division gains and protects the data security of the feature providing end.
S105, determining a final division point from the optimal division point corresponding to the first feature and the optimal division point corresponding to the second feature based on multiparty secure computation with the feature providing end.
In this embodiment, the tag providing end may determine the final division point from the optimal division point corresponding to the first feature and the optimal division point corresponding to the second feature based on multiparty secure computation with the feature providing end. For example, taking the division gain obtained by dividing at the optimal division point corresponding to the first feature as the first division gain, and the division gain obtained by dividing at the optimal division point corresponding to the second feature as the second division gain, the tag providing end and the feature providing end may calculate the first division gain and the second division gain by means of multiparty secure computation and compare their magnitudes. The tag providing end may then determine the final division point based on the comparison result of the first division gain and the second division gain. For example, the division point corresponding to the larger of the first division gain and the second division gain may be selected as the final division point.
With continued reference to fig. 2, fig. 2 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 2, the label providing end a is a credit agency, which holds labels and part of features of the user's credit risk. The label providing end A firstly generates gradient information of a sample according to sample labels and prediction information of a current model aiming at the sample. And secondly, the label providing end A determines a first characteristic and a corresponding optimal segmentation point split_a from the characteristics held by the label providing end based on gradient information. And then, the tag providing end A sends the ciphertext of the gradient information obtained by adopting the homomorphic encryption algorithm to the feature providing end B. Then, the tag providing end A receives a second feature and a corresponding optimal segmentation point split_b sent by the feature providing end B, wherein the second feature and the corresponding optimal segmentation point split_b are determined by the feature providing end B from the held features based on ciphertext of gradient information and multiparty security calculation. Finally, the label providing end A determines a final division point from the optimal division point split_a corresponding to the first feature and the optimal division point split_b corresponding to the second feature based on multiparty security calculation between the label providing end A and the feature providing end B.
In the method provided by this embodiment of the present disclosure, in the process of determining the final division point, the tag providing end and the feature providing end interact based on multiparty secure computation, so that each end exposes a minimum of its held data to the other end, thereby avoiding information leakage.
With further reference to fig. 3, a flow 300 of one embodiment of a method for determining a second feature and corresponding optimal segmentation point is shown. The process 300 for determining the second feature and the corresponding optimal segmentation point includes the steps of:
s301, for each of the division points corresponding to each of the held features, determining ciphertext of gradient information of the samples in the left node and the right node obtained based on each of the division points.
In this embodiment, the method for determining the second feature and the corresponding optimal segmentation point may be applied to the feature providing end. Here, the feature providing terminal may hold a part of the features of the sample. For each of the division points corresponding to each of the held features, the feature providing end can determine ciphertext of gradient information of samples in the left node and the right node obtained after the nodes are divided based on the division points.
S302, based on multiparty security calculation with the label providing end, steps S3021 to S3023 are executed.
In the present embodiment, the feature providing end may perform the following steps S3021 to S3023 based on the multiparty security calculation with the tag providing end.
S3021, converting the sum of ciphertext of gradient information of samples in the left node and the right node obtained based on each partition point into fragments.
In this embodiment, the feature providing end may first calculate, for each division point, the sums of the ciphertexts of the gradient information of the left node, for example the sum of the ciphertexts of the first-order gradients and the sum of the ciphertexts of the second-order gradients of all samples in the left node divided at that division point, and likewise calculate the sum of the ciphertexts of the first-order gradients and the sum of the ciphertexts of the second-order gradients of all samples in the right node. As an example, these ciphertext sums can be computed by homomorphic ciphertext addition. The feature providing end can then, based on multiparty secure computation, convert the sums of the ciphertexts of the first-order and second-order gradients of all samples in the left node and the sums of the ciphertexts of the first-order and second-order gradients of all samples in the right node into fragments. For example, the ciphertext sums may be converted into fragments based on an arithmetic circuit (Arithmetic Circuit). Taking two computing parties as an example, the arithmetic circuit can realize addition over fragmented data held by the two parties. For example, a value x is randomly split into x_0 = x - r and x_1 = r, where r is a random number and "=" denotes assignment. One of the two parties holds x_0 and the other holds x_1. Since neither party holds all the fragments of the data, no information other than the result is revealed.
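A toy sketch of the additive fragmentation just described (x split into x_0 = x - r and x_1 = r) is shown below; the modulus and helper names are illustrative assumptions, and a real protocol would also fix how values are encoded into the field.

```python
import secrets

PRIME = 2 ** 61 - 1   # illustrative modulus for the additive fragments

def share(x: int):
    """Split x into two fragments with x_0 + x_1 = x (mod PRIME); x_0 = x - r, x_1 = r."""
    r = secrets.randbelow(PRIME)
    return (x - r) % PRIME, r

def reconstruct(x0: int, x1: int) -> int:
    return (x0 + x1) % PRIME

x0, x1 = share(42)
assert reconstruct(x0, x1) == 42   # neither fragment alone reveals anything about 42
```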
In some alternative implementations of the present embodiment, S3021 may further specifically be performed as follows: an additive homomorphic encryption algorithm is used to convert into fragments the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each division point.
In this implementation, the feature providing end may use an additive homomorphic encryption algorithm to convert into fragments the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each division point. For example, an additive homomorphic encryption algorithm may be used to convert the sum of the first-order gradient ciphertexts and the sum of the second-order gradient ciphertexts of all samples in the left node, and the sum of the first-order gradient ciphertexts and the sum of the second-order gradient ciphertexts of all samples in the right node, into fragments. Through this implementation, the data can be fragmented using an additive homomorphic encryption algorithm.
S3022, calculating the division gain corresponding to each division point based on the fragments.
In this embodiment, the feature providing end and the tag providing end may jointly calculate the division point gains corresponding to the respective division points based on the fragments obtained in S3021 through multiparty security calculation. For example, the segmentation gain may be calculated in the form of homomorphic ciphertext.
S3023, determining a comparison result of the division gains corresponding to the respective division points.
In this embodiment, the feature providing end may determine the comparison result of the division gains corresponding to the respective division points through multiparty secure computation with the tag providing end. Comparing the data by means of multiparty secure computation ensures the security of the data. For example, based on multiparty secure computation, the magnitudes of two values can be compared as follows. Taking x and y as the values to be compared, the data are first fragmented by the sharing method above, so that x = x_0 + x_1 and y = y_0 + y_1. The two parties first compute z_0 = x_0 - y_0 and z_1 = x_1 - y_1 respectively, then jointly convert z = z_0 + z_1 into a garbled circuit z', and recover the sign bit (the most significant bit) of z' to plaintext, thereby obtaining the result of comparing the magnitudes of x and y. As an example, z_0 and z_1 can be converted into garbled-circuit inputs z'_0 and z'_1 through oblivious transfer (Oblivious Transfer), and the two parties jointly compute z' = z'_0 + z'_1 via the garbled circuit.
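The following sketch only illustrates the fragment arithmetic behind this comparison: each party forms its local difference fragment, and the sign of z = x - y decides the result. In the protocol above the sign bit is recovered inside a garbled circuit via oblivious transfer; here z is reconstructed in the clear purely for illustration, and integer-encoded gains are assumed.

```python
import secrets

PRIME = 2 ** 61 - 1   # same illustrative modulus as in the sharing sketch above

def share(x: int):
    r = secrets.randbelow(PRIME)
    return (x - r) % PRIME, r

def compare_from_fragments(x0: int, x1: int, y0: int, y1: int) -> bool:
    """True if x >= y. Each party computes its difference fragment locally; the
    final sign test stands in for the garbled-circuit evaluation of the protocol."""
    z0 = (x0 - y0) % PRIME        # computed locally by the first party
    z1 = (x1 - y1) % PRIME        # computed locally by the second party
    z = (z0 + z1) % PRIME         # equals x - y modulo PRIME
    return z <= PRIME // 2        # large residues encode negative differences

gain_a = share(17)                # e.g. first division gain (integer-encoded)
gain_b = share(23)                # e.g. second division gain
assert compare_from_fragments(gain_a[0], gain_a[1], gain_b[0], gain_b[1]) is False
```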
S303, based on the comparison result, determining a second feature and a corresponding optimal segmentation point from the held features.
In this implementation manner, the feature providing end may determine, according to the comparison result, one feature from the held features as the second feature and determine the corresponding optimal division point. For example, a greedy method may be used to select a feature corresponding to the maximum segmentation gain in the computed segmentation gains as the second feature, and a segmentation point corresponding to the maximum segmentation gain is used as the optimal segmentation point.
In the method provided by this embodiment of the present disclosure, in the process of determining the second feature and the corresponding optimal division point, the tag providing end and the feature providing end interact based on multiparty secure computation, so that each end exposes a minimum of its held data to the other end, thereby avoiding information leakage.
With further reference to fig. 4, as an implementation of the manner shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for generating information, where an embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus may specifically be partially disposed in a tag providing end.
As shown in fig. 4, the apparatus 400 for generating information of the present embodiment includes: a first determining unit 401, a second determining unit 402, a transmitting unit 403, a receiving unit 404, and a generating unit 405. The first determining unit 401 is configured to obtain gradient information of a sample according to the sample label and the prediction information of the current model for the sample; the second determining unit 402 is configured to determine, based on the gradient information, a first feature and a corresponding optimal division point from the features held by the local end; the transmitting unit 403 is configured to send, to the feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm; the receiving unit 404 is configured to receive a second feature and a corresponding optimal division point sent by the feature providing end, where the second feature and the corresponding optimal division point are determined by the feature providing end from its held features based on the ciphertext of the gradient information and multiparty secure computation; and the generating unit 405 is configured to determine a final division point from the optimal division point corresponding to the first feature and the optimal division point corresponding to the second feature based on multiparty secure computation with the feature providing end.
In this embodiment, the specific processes and the technical effects of the first determining unit 401, the second determining unit 402, the transmitting unit 403, the receiving unit 404 and the generating unit 405 of the apparatus 400 for generating information may refer to the descriptions related to S101, S102, S103, S104 and S105 in the corresponding embodiment of fig. 1, and are not repeated herein.
In some optional implementations of this embodiment, the second determining unit 402 is further configured to: for each division point corresponding to each feature held by the local end, calculating the division gain corresponding to each division point according to the gradient information; and determining a first characteristic and a corresponding optimal division point from the characteristics held by the local terminal based on the comparison result of the division gains corresponding to the division points.
In some optional implementations of this embodiment, the apparatus 400 further includes a third determining unit (not shown in the figure) configured to determine the second feature and the corresponding optimal division point, where the third determining unit includes: a determination subunit (not shown in the figure) configured to determine, for each division point corresponding to each held feature, ciphertext of the gradient information of the samples in the left node and the right node obtained based on that division point; and an execution unit (not shown in the figure) configured to execute preset steps based on multiparty secure computation with the tag providing end, the execution unit comprising: a conversion unit (not shown in the figure) configured to convert the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on each division point into fragments; a calculation unit (not shown in the figure) configured to calculate the division gain corresponding to each division point based on the fragments; a result determination unit (not shown in the figure) configured to determine a comparison result of the division gains corresponding to the respective division points; and a division point determination unit (not shown in the figure) configured to determine the second feature and the corresponding optimal division point from the held features based on the comparison result.
In some optional implementations of this embodiment, the conversion unit is further configured to: use an additive homomorphic encryption algorithm to convert into fragments the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each division point.
In some optional implementations of this embodiment, the gradient information of the sample includes a first-order gradient and a second-order gradient.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, is a block diagram of an electronic device for generating information according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the methods provided herein for generating information. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods for generating information provided herein.
The memory 502 is a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules (e.g., the first determining unit 401, the second determining unit 402, the transmitting unit 403, the receiving unit 404, and the generating unit 405 shown in fig. 4) corresponding to the method for generating information in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., implements the methods for generating information in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device for generating information, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected to the electronic device for generating information via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for generating information may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device used to generate the information, such as a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, in the process of determining the final division point, the tag providing end and the feature providing end interact based on multiparty secure computation, so that each end exposes a minimum of its held data to the other, avoiding information leakage.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method for generating information, applied to a label providing end, comprising:
obtaining gradient information of the sample according to the sample label and the prediction information of the current model aiming at the sample;
based on the gradient information, determining a first feature and a corresponding optimal segmentation point from the features held by the local terminal;
sending, to a feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm;
receiving a second feature and a corresponding optimal segmentation point sent by the feature providing end, wherein the second feature and the corresponding optimal segmentation point are determined by the feature providing end in the following manner: for each segmentation point corresponding to each held feature, determining ciphertext of the gradient information of the samples in the left node and the right node obtained based on that segmentation point; and, based on multiparty secure computation with the tag providing end, performing the following steps: converting the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on each segmentation point into fragments; calculating the segmentation gain corresponding to each segmentation point based on the fragments; determining a comparison result of the segmentation gains corresponding to the respective segmentation points; and determining the second feature and the corresponding optimal segmentation point from the held features based on the comparison result;
based on multiparty secure computation with the feature providing end, taking the segmentation gain obtained by segmenting at the optimal segmentation point corresponding to the first feature as a first segmentation gain and the segmentation gain obtained by segmenting at the optimal segmentation point corresponding to the second feature as a second segmentation gain; and determining a final segmentation point based on a comparison result of the first segmentation gain and the second segmentation gain.
2. The method of claim 1, wherein the determining, based on the gradient information, the first feature and the corresponding optimal segmentation point from the features held by the local end includes:
for each division point corresponding to each feature held by the local terminal, calculating the division gain corresponding to each division point according to the gradient information;
and determining a first characteristic and a corresponding optimal division point from the characteristics held by the local terminal based on the comparison result of the division gains corresponding to the division points.
3. The method of claim 1, wherein the converting the sum of ciphertext of gradient information of samples in the left node and the right node obtained based on each of the division points into the fragments comprises:
and (3) adopting an addition homomorphic encryption algorithm to segment the sum of the ciphertext of the gradient information of the samples in the left node and the right node obtained by each segmentation point.
4. The method of claim 1, wherein the gradient information of the sample comprises a first order gradient and a second order gradient.
5. An apparatus for generating information, partially disposed at a tag providing end, comprising:
the first determining unit is configured to obtain gradient information of the sample according to the sample label and the prediction information of the current model for the sample;
a second determining unit configured to determine, based on the gradient information, a first feature and a corresponding optimal division point from features held by the local terminal;
a sending unit configured to send, to a feature providing end, ciphertext of the gradient information obtained using a homomorphic encryption algorithm;
a receiving unit configured to receive a second feature and a corresponding optimal division point sent by the feature providing end, where the second feature and the corresponding optimal division point are determined by a third determining unit disposed at the feature providing end, and the third determining unit includes: a determination subunit configured to determine, for each division point corresponding to each held feature, ciphertext of the gradient information of the samples in the left node and the right node obtained based on that division point; and an execution unit configured to execute preset steps based on multiparty secure computation with the tag providing end, the execution unit comprising: a conversion unit configured to convert the sums of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on each division point into fragments; a calculation unit configured to calculate the division gain corresponding to each division point based on the fragments; a result determination unit configured to determine a comparison result of the division gains corresponding to the respective division points; and a division point determination unit configured to determine the second feature and the corresponding optimal division point from the held features based on the comparison result;
a generating unit configured to, based on multiparty secure computation with the feature providing end, take the division gain obtained by dividing at the optimal division point corresponding to the first feature as a first division gain and the division gain obtained by dividing at the optimal division point corresponding to the second feature as a second division gain, and determine a final division point based on a comparison result of the first division gain and the second division gain.
6. The apparatus of claim 5, wherein the second determination unit is further configured to:
for each division point corresponding to each feature held by the local terminal, calculating the division gain corresponding to each division point according to the gradient information;
and determining a first characteristic and a corresponding optimal division point from the characteristics held by the local terminal based on the comparison result of the division gains corresponding to the division points.
7. The apparatus of claim 5, wherein the conversion unit is further configured to:
and (3) adopting an addition homomorphic encryption algorithm to segment the sum of the ciphertext of the gradient information of the samples in the left node and the right node obtained by each segmentation point.
8. The apparatus of claim 5, wherein the gradient information of the sample comprises a first order gradient and a second order gradient.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.
CN202010567116.9A 2020-06-19 2020-06-19 Method and device for generating information Active CN113824546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010567116.9A CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010567116.9A CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Publications (2)

Publication Number Publication Date
CN113824546A CN113824546A (en) 2021-12-21
CN113824546B true CN113824546B (en) 2024-04-02

Family

ID=78911609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010567116.9A Active CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Country Status (1)

Country Link
CN (1) CN113824546B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101573413B1 (en) * 2014-11-28 2015-12-01 건국대학교 산학협력단 Apparatus and method for detecting intrusion using principal component analysis
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model
CN108712260A (en) * 2018-05-09 2018-10-26 曲阜师范大学 The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
WO2020034751A1 (en) * 2018-08-14 2020-02-20 阿里巴巴集团控股有限公司 Multi-party security computing method and apparatus, and electronic device
CN109684855A (en) * 2018-12-17 2019-04-26 电子科技大学 A kind of combined depth learning training method based on secret protection technology
CN110728687A (en) * 2019-10-15 2020-01-24 卓尔智联(武汉)研究院有限公司 File image segmentation method and device, computer equipment and storage medium
CN110990857A (en) * 2019-12-11 2020-04-10 支付宝(杭州)信息技术有限公司 Multi-party combined feature evaluation method and device for protecting privacy and safety
CN110995737A (en) * 2019-12-13 2020-04-10 支付宝(杭州)信息技术有限公司 Gradient fusion method and device for federal learning and electronic equipment
CN111144576A (en) * 2019-12-13 2020-05-12 支付宝(杭州)信息技术有限公司 Model training method and device and electronic equipment
CN111160573A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Implementation of a telecom fraud identification model based on federated learning; 陈国润; 母美荣; 张蕊; 孙丹; 钱栋军; Telecommunications Science (S1); full text *
Application of federated learning models in the processing of classified data; 贾延延; 张昭; 冯键; 王春凯; Journal of China Academy of Electronics and Information Technology (01); full text *

Also Published As

Publication number Publication date
CN113824546A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
KR102476902B1 (en) Method, device, equipment and storage medium for obtaining intersection of privacy sets
WO2021068444A1 (en) Data processing method and device, computer apparatus, and storage medium
CN111934889B (en) Key generation method, signature and signature verification method, device, equipment and medium
AU2021204543B2 (en) Digital signature method, signature information verification method, related apparatus and electronic device
CN113098691B (en) Digital signature method, signature information verification method, related device and electronic equipment
CN112765616A (en) Multi-party security calculation method and device, electronic equipment and storage medium
CN109359476B (en) Hidden input two-party mode matching method and device
CN113762328B (en) Model training method, device, equipment and storage medium based on federal learning
CN111612388A (en) Method and device for merging target orders
US20190146957A1 (en) Identifying an entity associated with an online communication
CN113407976B (en) Digital signature method, signature information verification method, related device and electronic equipment
CN111698086B (en) Method and device for data transmission
CN113824546B (en) Method and device for generating information
CN113079010B (en) Security enhancement method and device based on reserved format algorithm
CN114186256B (en) Training method, device, equipment and storage medium of neural network model
CN113722739B (en) Gradient lifting tree model generation method and device, electronic equipment and storage medium
US10346617B1 (en) Protocol for securely searching streaming data with constant bandwidth
CN117743384A (en) Data query method, device, equipment and storage medium
CN112016632B (en) Model joint training method, device, equipment and storage medium
US11954449B2 (en) Method for generating conversation reply information using a set of historical conversations, electronic device, and storage medium
CN114186669A (en) Neural network model training method, device, equipment and storage medium
CN116248371A (en) Method, device, equipment and storage medium for identifying abnormal message
CN114462535A (en) Data classification method and device
CN116389090A (en) Data encryption and decryption methods and devices, electronic equipment and storage medium
CN113407975A (en) Digital signature method, signature information verification method, related device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant