CN111611601A - Multi-data-party user analysis model joint training method and device and storage medium - Google Patents


Info

Publication number
CN111611601A
Authority
CN
China
Prior art keywords
data
analysis model
user analysis
training
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010370875.6A
Other languages
Chinese (zh)
Inventor
戴佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010370875.6A priority Critical patent/CN111611601A/en
Publication of CN111611601A publication Critical patent/CN111611601A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption

Abstract

The invention relates to artificial intelligence and discloses a multi-data-party user analysis model joint training method, which comprises the following steps: constructing a public key and a private key, and distributing the public key to at least two data side terminal devices; receiving encrypted sample data obtained by the at least two data side terminal devices encrypting their data with the public key; decrypting the encrypted sample data with the private key, and removing repeated data from the decrypted sample data to obtain training sample data; constructing an initial user analysis model, and training the initial user analysis model on the training sample data to obtain a user analysis model; and distributing the user analysis model to the at least two data side terminal devices. The invention also relates to blockchain technology, wherein the public key and the private key are stored in a blockchain.

Description

Multi-data-party user analysis model joint training method and device and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for multi-data-party user analysis model joint training, electronic equipment and a computer readable storage medium.
Background
With the rise of machine learning and big data, most companies perform modeling training on their own existing user data to obtain a user analysis model for user behavior analysis. For example, many fraudulent claims occur in current insurance claim settlement and rescue services. To address the low speed and high cost of manual auditing, the current approach is to use big data for fraud detection: the features of known fraudulent user behaviors are extracted by studying the company's own claim data, a fraud model is trained with machine learning methods, and the fraud model is applied in the user claim detection step. However, this approach has the following problem: the data used for model training is only the insurance company's own data, so it cannot cover users across the whole network, and more general fraud behaviors cannot be captured. As a result, when a fraudulent user is detected by one company, the user can simply switch to another company and continue the fraud.
In summary, in the existing modeling method, due to privacy protection, companies are generally reluctant to exchange user data, so that training data that can be obtained by each company is only data of the company, and a trained user analysis model may not be very accurate.
Disclosure of Invention
The invention provides a multi-data-party user analysis model joint training method, apparatus, electronic device and computer readable storage medium, and mainly aims to enable all parties to jointly train a user analysis model without exchanging data. To achieve this object, the method comprises:
Using an asymmetric encryption algorithm to construct a public key and a private key, and distributing the public key to at least two data side terminal devices;
receiving encrypted sample data obtained by the at least two data side terminal devices performing a data encryption operation with the public key;
decrypting the encrypted sample data by using the private key to obtain sample data;
performing repeated data elimination operation on the sample data to obtain training sample data;
constructing an initial user analysis model, and training the initial user analysis model according to the training sample data to obtain a user analysis model; and
distributing the user analysis model to the at least two data side terminal devices.
Optionally, the method further comprises:
and performing distributed storage on the sample data by using the following Hash function:
slice_id=(w1×(hash_str(point_name)/b1)+w2×(day_time(time)/b2))
wherein slice_id is the fragment number allocated to the data, hash_str(point_name) is a quantization function of the data name of the data added to the storage node, day_time(time) is a quantization function of the time period in which the data was added to the storage node, b1 is the dispersion degree of the data names, b2 is the dispersion degree of the time periods, and w1 and w2 are weight coefficients.
Optionally, the performing a repeated data elimination operation on the sample data includes:
calculating the similarity between the data of different users in the sample data;
eliminating the data of repeated users from the sample data according to the similarity between the data.
Optionally, the calculation formula of the similarity is:
sim(X, Y) = (Σi Xi×Yi) / (√(Σi Xi²) × √(Σi Yi²))
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y.
Optionally, training the initial user analysis model according to the training sample data to obtain a user analysis model includes:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
In order to solve the above problem, the present invention further provides a multi-data-party user analysis model joint training apparatus, including:
the key construction and distribution module is used for constructing a public key and a private key by using an asymmetric encryption algorithm and distributing the public key to at least two data side terminal devices;
the encryption receiving module is used for receiving the encrypted sample data obtained by the at least two data side terminal devices performing data encryption with the public key;
the decryption summarizing module is used for carrying out decryption operation on the encrypted sample data by using the private key to obtain the sample data;
the data removing module is used for removing the repeated data from the sample data to obtain training sample data;
the model training and distributing module is used for constructing an initial user analysis model and training the initial user analysis model according to the training sample data to obtain a user analysis model; and distributing the user analysis model to the at least two data side terminal devices.
Optionally, the public key and the private key are stored in a blockchain, and the operation of removing repeated data from the sample data includes:
calculating the similarity between the data of different users in the sample data by using the following calculation formula of the similarity:
sim(X, Y) = (Σi Xi×Yi) / (√(Σi Xi²) × √(Σi Yi²))
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y;
eliminating the data of repeated users from the sample data according to the similarity between the data.
Training the initial user analysis model according to the training sample data to obtain the user analysis model includes:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the multi-data-party user analysis model joint training method described above.
To solve the above problem, the present invention further provides a computer-readable storage medium having at least one instruction stored therein, where the at least one instruction is executed by a processor in an electronic device to implement the multi-data-party user analysis model joint training method described above.
In the embodiment of the invention, an asymmetric encryption algorithm is used to construct a public key and a private key, and the public key is distributed to at least two data side terminal devices, which use it to encrypt their data, so that no data party's raw data is exchanged or leaked. Further, the repeated data elimination operation on the sample data reduces the computation of model training and improves the accuracy of the model. In addition, the initial user analysis model is trained on training sample data obtained from the at least two data side terminal devices, so that the user analysis model is trained on the combined data and its accuracy is higher.
Drawings
FIG. 1 is a schematic flow chart of a method for joint training of multiple data user analysis models according to an embodiment of the present invention;
FIG. 2 is a block diagram of a multi-data-party user analysis model joint training apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an internal structure of an electronic device for a multi-data-party user analysis model joint training method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a multi-data-party user analysis model joint training method. Referring to fig. 1, a flow chart of a multi-data-party user analysis model joint training method according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the multi-data-party user analysis model joint training method includes:
s1, constructing a public key and a private key by using an asymmetric encryption algorithm, and distributing the public key to at least two data parties.
In detail, in the embodiment of the invention, a third party organization constructs the public key and the private key through a pre-constructed model training system, reserves the private key, and distributes the public key to at least two data side terminal devices. It is emphasized that, to further ensure the privacy and security of the public and private keys, the public and private keys may also be stored in nodes of a blockchain.
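The key construction of step S1 can be sketched with textbook RSA. This toy example with tiny primes (p=61, q=53) is for exposition only; function names and parameters are illustrative, not taken from the patent, and a real deployment would use a vetted cryptographic library with 2048-bit or longer keys.

```python
import math

def build_key_pair(p=61, q=53):
    """Construct a textbook-RSA key pair from two primes (toy sizes).
    The third party retains the private key and distributes the public
    key to each data side terminal device."""
    n = p * q                      # modulus shared by both keys
    phi = (p - 1) * (q - 1)        # Euler's totient of n
    e = 65537                      # common public exponent
    assert math.gcd(e, phi) == 1   # e must be invertible modulo phi
    d = pow(e, -1, phi)            # private exponent (Python 3.8+ inverse)
    return (e, n), (d, n)          # (public key, private key)

public_key, private_key = build_key_pair()
```

The public key is what would be pushed to the data parties; the private key never leaves the third party.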
And S2, receiving the encrypted sample data obtained by the data encryption operation of the at least two data side terminal devices by using the public key.
In the embodiment of the invention, when the at least two data side terminal devices receive the public key of the third party organization, the public key is used for encrypting respective user data, and the encrypted user data is sent to the third party organization, thereby realizing the confidentiality of data.
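The party-side encryption of step S2 can be sketched as follows, under the assumption that each record is encoded as a small integer and RSA-encrypted directly; a production system would use hybrid encryption for bulk data. The toy key values below correspond to the primes p=61, q=53 and are illustrative only.

```python
# Toy textbook-RSA key pair (p=61, q=53); illustrative only.
PUBLIC_KEY = (17, 3233)     # (e, n) held by every data party
PRIVATE_KEY = (2753, 3233)  # (d, n) retained by the third party

def encrypt_record(m, public_key):
    """Encrypt one integer-encoded record with the distributed public key."""
    e, n = public_key
    assert 0 <= m < n, "record must be encoded below the modulus"
    return pow(m, e, n)

def decrypt_record(c, private_key):
    """Third-party decryption with the retained private key."""
    d, n = private_key
    return pow(c, d, n)

# Each data party encrypts its sample rows before sending them on.
party_a_samples = [101, 202, 303]
ciphertexts = [encrypt_record(m, PUBLIC_KEY) for m in party_a_samples]
```

Only the ciphertexts are transmitted, so the raw user data of each party is never exposed in transit.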
And S3, decrypting the encrypted data by using the private key to obtain sample data.
In detail, the decrypting the encrypted data by using the private key to obtain sample data includes:
decrypting the encrypted sample data of the at least two data side terminal devices by using the private key to obtain decrypted data;
summarizing the decrypted data to obtain summarized data;
and performing distributed storage on the summarized data to obtain the sample data.
Because the summarized data is provided by at least two data parties, its volume is large. To reduce the storage and computation pressure on the computer equipment and to record the summarized data permanently, the embodiment of the invention stores it in a distributed manner using a blockchain mechanism.
In detail, the present invention performs distributed storage on the summarized data by using the following Hash function:
slice_id=(w1×(hash_str(point_name)/b1)+w2×(day_time(time)/b2))
wherein slice_id is the fragment number allocated to the data in the database, hash_str(point_name) is a quantization function of the data name of the data added to a storage node of the database, day_time(time) is a quantization function of the time period in which the data was added to the storage node, b1 is the dispersion degree of the data names in the database, b2 is the dispersion degree of the time periods in the database, and w1 and w2 are weight coefficients that can be set manually. In the extreme case where w2 is set to 0, data is distributed entirely according to its data name, and the data of every time period for each data name is stored in the same data slice or one of its backup slices; similarly, if w1 is set to 0, data is distributed according to the time period in which it was added to the storage node, and the data of all data names in each time period is stored in the same data slice or one of its backup slices.
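The sharding rule above can be sketched as follows. The patent does not specify the quantization functions, so hash_str and day_time here are stand-ins, and the weights, dispersion constants, and slice count are illustrative assumptions.

```python
def hash_str(point_name):
    """Stand-in quantizer for the data name (not specified in the patent)."""
    return sum(ord(ch) for ch in point_name) % 997

def day_time(time_str):
    """Stand-in quantizer: hour of day from an 'HH:MM' timestamp."""
    return int(time_str.split(":")[0])

def slice_id(point_name, time_str, w1=1.0, w2=1.0, b1=16.0, b2=4.0,
             n_slices=64):
    """slice_id = w1*(hash_str(point_name)/b1) + w2*(day_time(time)/b2),
    folded onto a fixed number of storage slices."""
    score = w1 * (hash_str(point_name) / b1) + w2 * (day_time(time_str) / b2)
    return int(score) % n_slices
```

Setting w2=0 reproduces the first extreme case (placement depends only on the data name), and w1=0 the second (placement depends only on the time period).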
And S4, performing repeated data elimination operation on the sample data to obtain training sample data.
Because the sample data comes from different data parties, repeated data may exist; for example, both data party A and data party B may hold data of the same user Li XX. To reduce the computation of model training and improve model accuracy, the embodiment of the invention performs a repeated data elimination operation on the sample data.
In detail, the embodiment of the present invention calculates the similarity between the data of different users in the sample data, and eliminates the data of the repeat user in the sample data according to the similarity between the data. Wherein, the calculation formula of the similarity is as follows:
sim(X, Y) = (Σi Xi×Yi) / (√(Σi Xi²) × √(Σi Yi²))
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y.
The value of sim(X, Y) ranges from -1 to 1, where -1 means that the two feature vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, and 0 usually means that they are independent of each other. In the embodiment of the invention, when the sim(X, Y) value is 1, user X and user Y are treated as duplicate data, i.e. a common user of the at least two data parties; therefore, the data of either user X or user Y is deleted.
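The de-duplication step above can be sketched as follows. The cosine form of the similarity and the duplicate test at sim = 1 follow the description; the small tolerance eps is an implementation assumption to absorb floating-point error.

```python
import math

def cosine_sim(x, y):
    """sim(X, Y) = Σ Xi×Yi / (√Σ Xi² × √Σ Yi²) over two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def deduplicate(users, eps=1e-9):
    """Keep the first occurrence of each user; a later user whose
    similarity to an already-kept user reaches 1 is a duplicate."""
    kept = []
    for name, feats in users:
        if all(cosine_sim(feats, kept_feats) < 1.0 - eps
               for _, kept_feats in kept):
            kept.append((name, feats))
    return kept
```

For instance, a user whose feature vector is a scalar multiple of an earlier user's has similarity 1 and is dropped.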
S5, constructing an initial user analysis model, and training the initial user analysis model according to the training sample data to obtain a user analysis model.
In detail, the initial user analysis model can be constructed using, for example, a linear model, a tree-structured model, or a convolutional neural network model.
In detail, in the embodiment of the invention, the training of the model can be completed by the following steps:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
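The patent does not spell out how the decrypted per-party parameters are combined into the total model parameters; a common choice, sketched here as an assumption, is (optionally weighted) federated averaging:

```python
def aggregate_parameters(party_params, weights=None):
    """Combine decrypted parameter vectors from each data party into
    total model parameters by (optionally weighted) averaging."""
    n = len(party_params)
    if weights is None:
        weights = [1.0 / n] * n            # equal weight per party
    assert abs(sum(weights) - 1.0) < 1e-9  # weights form a convex combination
    dim = len(party_params[0])
    return [sum(w * p[i] for w, p in zip(weights, party_params))
            for i in range(dim)]

# Two parties return locally trained parameters; the third party averages
# them and pushes the result back as the updated user analysis model.
total = aggregate_parameters([[0.2, 1.0, -0.5], [0.4, 0.0, 0.5]])
```

Weights proportional to each party's sample count could be supplied instead of the default equal weighting.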
And S6, distributing the user analysis model to the at least two data side terminal devices.
After the user analysis model is distributed to the at least two data parties, the at least two data parties can utilize the user analysis model to perform analysis and detection of user behaviors.
In the embodiment of the invention, an asymmetric encryption algorithm is used to construct a public key and a private key, and the public key is distributed to at least two data side terminal devices so that they can encrypt their data with it, preventing any party's raw data from being exchanged or leaked. Further, the repeated data elimination operation on the sample data reduces the computation of model training and improves the accuracy of the model. Furthermore, the initial user analysis model is trained on training sample data obtained from the at least two data side terminal devices, so that the user analysis model is trained on the combined data and its accuracy is higher.
FIG. 2 is a functional block diagram of the multi-data-party user analysis model joint training apparatus according to the present invention.
The multi-data-party user analysis model joint training apparatus 100 of the present invention can be installed in an electronic device. According to the realized functions, the multi-data-party user analysis model joint training device can comprise a key construction and distribution module 101, an encryption receiving module 102, a decryption summarizing module 103, a data eliminating module 104 and a model training and distribution module 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the key construction and distribution module 101 is configured to construct a public key and a private key using an asymmetric cryptographic algorithm, and distribute the public key to at least two data side terminal devices.
In detail, in the embodiment of the invention, a third party organization constructs the public key and the private key through a pre-constructed model training system, reserves the private key, and distributes the public key to at least two data side terminal devices. It is emphasized that, to further ensure the privacy and security of the public and private keys, the public and private keys may also be stored in nodes of a blockchain.
The encryption receiving module 102 is configured to receive encryption sample data obtained by the at least two data side terminal devices performing data encryption operation by using the public key.
In the embodiment of the present invention, when the at least two data side terminal devices receive the public key of the third party organization, the public key is used to encrypt respective user data, and the encrypted user data is sent to the third party organization.
The decryption summary module 103 is configured to perform a decryption operation on the encrypted sample data by using the private key to obtain the sample data.
In detail, the decrypting the encrypted data by using the private key to obtain sample data includes:
decrypting the encrypted sample data of the at least two data side terminal devices by using the private key to obtain decrypted data;
summarizing the decrypted data to obtain summarized data;
and performing distributed storage on the summarized data to obtain the sample data.
Because the summarized data is provided by at least two data parties, its volume is large. To reduce the storage and computation pressure on the computer equipment and to record the summarized data permanently, the embodiment of the invention stores it in a distributed manner using a blockchain mechanism.
In detail, the present invention performs distributed storage on the summarized data by using the following Hash function:
slice_id=(w1×(hash_str(point_name)/b1)+w2×(day_time(time)/b2))
wherein slice_id is the fragment number allocated to the data in the database, hash_str(point_name) is a quantization function of the data name of the data added to a storage node of the database, day_time(time) is a quantization function of the time period in which the data was added to the storage node, b1 is the dispersion degree of the data names in the database, b2 is the dispersion degree of the time periods in the database, and w1 and w2 are weight coefficients that can be set manually. In the extreme case where w2 is set to 0, data is distributed entirely according to its data name, and the data of every time period for each data name is stored in the same data slice or one of its backup slices; similarly, if w1 is set to 0, data is distributed according to the time period in which it was added to the storage node, and the data of all data names in each time period is stored in the same data slice or one of its backup slices.
The data eliminating module 104 is configured to perform repeated data eliminating operation on the sample data to obtain training sample data.
The sample data is from different data parties, so that repeated data may exist, and in order to reduce the calculation amount of model training and improve the model accuracy, the repeated data removal operation needs to be performed on the sample data.
In detail, the embodiment of the present invention calculates the similarity between the data of different users in the sample data, and eliminates the data of the repeat user in the sample data according to the similarity between the data. Wherein, the calculation formula of the similarity is as follows:
sim(X, Y) = (Σi Xi×Yi) / (√(Σi Xi²) × √(Σi Yi²))
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y.
The value of sim(X, Y) ranges from -1 to 1, where -1 means that the two feature vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, and 0 usually means that they are independent of each other. In the embodiment of the invention, when the sim(X, Y) value is 1, user X and user Y are treated as duplicate data, i.e. a common user of the at least two data parties; therefore, the data of either user X or user Y is deleted.
The model training and distributing module 105 is configured to construct an initial user analysis model, train the initial user analysis model according to the training sample data, and obtain a user analysis model; and distributing the user analysis model to the at least two data side terminal devices.
In detail, the initial user analysis model can be constructed using, for example, a linear model, a tree-structured model, or a convolutional neural network model.
In detail, in the embodiment of the invention, the training of the model can be completed by the following steps:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
After the user analysis model is distributed to the at least two data parties, the at least two data parties can utilize the user analysis model to perform analysis and detection of user behaviors.
Fig. 3 is a schematic structural diagram of an electronic device for implementing joint training of multiple data user analysis models according to the present invention.
The electronic device 1 may include a processor 10, a memory 11 and a bus, and may further include a computer program, such as a multi-data-party user analysis model joint training program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the multi-data-party user analysis model joint training program 12, but also for temporarily storing data that has been output or will be output.
The processor 10 may in some embodiments be composed of a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit (Control Unit) of the electronic device: it connects the various components of the electronic device using various interfaces and lines, and executes the functions and processes the data of the electronic device 1 by running or executing the programs or modules stored in the memory 11 (e.g., the multi-data-party user analysis model joint training program) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 shows only an electronic device with certain components; a person skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may comprise fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, which implements functions such as charge management, discharge management, and power consumption management. The power supply may further include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and the like. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described here again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may include a display (Display) and an input unit (such as a keyboard), and optionally a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the electronic device 1 and to present a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The multi-data-party user analysis model joint training program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions which, when executed by the processor 10, can implement:
using an asymmetric encryption algorithm to construct a public key and a private key, and distributing the public key to at least two data side terminal devices; it should be emphasized that, in order to further ensure the privacy and security of the public key and the private key, the public key and the private key may also be stored in a node of a block chain;
receiving encrypted sample data obtained by the at least two data side terminal devices by using the public key to perform data encryption operation;
decrypting the encrypted sample data by using the private key to obtain sample data;
performing repeated data elimination operation on the sample data to obtain training sample data;
constructing an initial user analysis model, and training the initial user analysis model according to the training sample data to obtain a user analysis model; and
and distributing the user analysis model to the at least two data side terminal devices.
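The key-construction, encryption, and decryption steps above can be illustrated with textbook RSA. The tiny primes and raw modular exponentiation below are purely illustrative, not secure, and not the patent's specified algorithm; a real deployment would use a vetted library with proper padding.

```python
# Textbook RSA with toy primes -- illustration only, not secure.
p, q = 61, 53
n = p * q                   # public modulus, shared by both keys
phi = (p - 1) * (q - 1)     # Euler's totient of n
e = 17                      # public exponent, coprime with phi
d = pow(e, -1, phi)         # private exponent (Python 3.8+ modular inverse)

def encrypt(m: int, public_key=(e, n)) -> int:
    # Data-party side: encrypt a sample value with the distributed public key.
    exp, mod = public_key
    return pow(m, exp, mod)

def decrypt(c: int, private_key=(d, n)) -> int:
    # Coordinator side: recover the sample value with the private key.
    exp, mod = private_key
    return pow(c, exp, mod)

sample = 65
cipher = encrypt(sample)
assert decrypt(cipher) == sample  # round trip recovers the plaintext
```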
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
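The "series of data blocks associated by cryptographic methods" can be sketched as a minimal hash chain, where each block's digest covers its own content plus the previous block's digest. This is a toy illustration of the linking principle only, not any particular blockchain platform's format.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash covers the block's content and its pointer to the predecessor.
    body = {k: block[k] for k in ("index", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(index: int, data: str, prev_hash: str) -> dict:
    block = {"index": index, "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain: list) -> bool:
    for i, block in enumerate(chain):
        # Each block's stored hash must match its recomputed contents...
        if block["hash"] != block_hash(block):
            return False
        # ...and each block must point at its predecessor's hash.
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block(0, "genesis", "0" * 64)
block1 = make_block(1, "public key fingerprint", genesis["hash"])
chain = [genesis, block1]
```

Editing any stored field (say, the data recorded in `genesis`) without recomputing every later hash makes `chain_is_valid` fail, which is the anti-counterfeiting property the paragraph above describes.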
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A multi-data-party user analysis model joint training method is characterized by comprising the following steps:
using an asymmetric encryption algorithm to construct a public key and a private key, and distributing the public key to at least two data side terminal devices;
receiving encrypted sample data obtained by the at least two data side terminal devices by using the public key to perform data encryption operation;
decrypting the encrypted sample data by using the private key to obtain sample data;
performing repeated data elimination operation on the sample data to obtain training sample data;
constructing an initial user analysis model, and training the initial user analysis model according to the training sample data to obtain a user analysis model; and
and distributing the user analysis model to the at least two data side terminal devices.
2. The multi-data-party user analysis model joint training method of claim 1, the method further comprising:
and performing distributed storage on the sample data by using the following Hash function:
slice_id = (w1×(hash_str(point_name)/b1) + w2×(day_time(time)/b2))
wherein: slice_id is the fragment number allocated to the data, hash_str(point_name) is a quantization function of the data name of the data added to the storage node, day_time(time) is a quantization function of the time period of the data added to the storage node, b1 is the dispersion degree of the data name, b2 is the dispersion degree of the time period, and w1 and w2 are weight coefficients.
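The distributed-storage sharding rule of claim 2 can be sketched as below. The concrete quantization functions (a stable digest for the data name, whole days for the time period), the parameter values, and the final fold onto a fixed number of storage slices are all assumptions, since the claim leaves them open.

```python
import hashlib
from datetime import datetime, timezone

def hash_str(point_name: str) -> int:
    # Assumed quantization of the data name: a stable integer digest
    # (a cryptographic hash, unlike Python's builtin hash(), is
    # deterministic across processes).
    return int(hashlib.sha256(point_name.encode()).hexdigest(), 16) % 10**6

def day_time(ts: datetime) -> int:
    # Assumed quantization of the time period: whole days since the epoch.
    return int(ts.timestamp()) // 86400

def slice_id(point_name: str, ts: datetime,
             w1: float = 1.0, w2: float = 1.0,
             b1: float = 1000.0, b2: float = 30.0,
             n_slices: int = 16) -> int:
    # slice_id = w1*(hash_str(point_name)/b1) + w2*(day_time(time)/b2),
    # folded onto n_slices storage nodes (the modulo fold is an assumption).
    score = w1 * (hash_str(point_name) / b1) + w2 * (day_time(ts) / b2)
    return int(score) % n_slices

sid = slice_id("user_click_stream", datetime(2020, 4, 30, tzinfo=timezone.utc))
```

Records with the same name land on the same slice within a time window, while `b1` and `b2` control how finely the name and time components spread data across nodes.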
3. The multi-data party user analysis model joint training method of claim 1, wherein the performing repeated data culling operations on the sample data comprises:
calculating the similarity between the data of different users in the sample data;
and according to the similarity between the data, eliminating the data of the repeated user in the sample data.
4. The multi-data party user analysis model joint training method of claim 3, wherein the similarity is calculated by the formula:
(similarity formula published as drawing FDA0002474688890000011; not reproduced in this text)
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y.
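Because the similarity formula of claim 4 survives only as a drawing, the sketch below assumes cosine similarity over the users' per-feature vectors, a common choice consistent with the Xi/Yi description; the 0.95 duplicate threshold is likewise an assumption.

```python
import math
from typing import Dict, List

def sim(x: List[float], y: List[float]) -> float:
    """Cosine similarity over the i-th feature data of users X and Y
    (assumed form; the published formula is an image)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return dot / (nx * ny) if nx and ny else 0.0

def drop_duplicate_users(users: Dict[str, List[float]],
                         threshold: float = 0.95) -> Dict[str, List[float]]:
    # Keep a user only if no already-kept user is near-identical,
    # eliminating repeated users from the pooled sample data.
    kept: Dict[str, List[float]] = {}
    for uid, feats in users.items():
        if all(sim(feats, other) < threshold for other in kept.values()):
            kept[uid] = feats
    return kept

sample = {"u1": [1.0, 2.0, 3.0],
          "u2": [2.0, 4.0, 6.0],   # scaled copy of u1 -> treated as duplicate
          "u3": [3.0, -1.0, 0.0]}
training_sample = drop_duplicate_users(sample)
```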
5. The method for jointly training multiple data party user analysis models according to any one of claims 1 to 4, wherein the public key and the private key are stored in a blockchain, and the training of the initial user analysis model according to the training sample data to obtain a user analysis model comprises:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
6. A multi-data-party user analysis model joint training apparatus, the apparatus comprising:
the key construction and distribution module is used for constructing a public key and a private key by using an asymmetric encryption algorithm and distributing the public key to at least two data side terminal devices;
the encryption receiving module is used for receiving the encrypted sample data obtained by the data encryption operation of the at least two data side terminal devices by using the public key;
the decryption summarizing module is used for carrying out decryption operation on the encrypted sample data by using the private key to obtain the sample data;
the data removing module is used for removing the repeated data from the sample data to obtain training sample data;
and the model training and distributing module is used for constructing an initial user analysis model, training the initial user analysis model according to the training sample data to obtain a user analysis model, and distributing the user analysis model to the at least two data side terminal devices.
7. The multi-data-party user analysis model joint training apparatus according to claim 6, wherein the performing repeated data culling operations on the sample data comprises:
calculating the similarity between the data of different users in the sample data by using the following calculation formula of the similarity:
(similarity formula published as drawing FDA0002474688890000021; not reproduced in this text)
wherein Xi represents the i-th feature data of user X, Yi represents the i-th feature data of user Y, and sim(X, Y) represents the similarity of users X and Y;
and according to the similarity between the data, eliminating the data of the repeated user in the sample data.
8. The apparatus for multi-data party user analysis model joint training according to claim 6 or 7, wherein the public key and the private key are stored in a block chain, and the training of the initial user analysis model according to the training sample data to obtain a user analysis model comprises:
distributing the initial user analysis model to the at least two data side terminal devices;
receiving model parameters which are transmitted by the at least two data side terminal devices and encrypted by using the public key, wherein the model parameters are obtained by training the initial user analysis model by using respective feature data and feature label data of the at least two data side terminal devices;
decrypting the encrypted model parameters by using the private key to obtain decrypted model parameters, and calculating according to the decrypted model parameters to obtain total model parameters;
and updating the model parameters of the initial user analysis model by using the total model parameters to obtain the user analysis model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for joint training of multiple data party user analysis models as claimed in any one of claims 1 to 5.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the multi-data-party user analysis model joint training method according to any one of claims 1 to 5.
CN202010370875.6A 2020-04-30 2020-04-30 Multi-data-party user analysis model joint training method and device and storage medium Pending CN111611601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370875.6A CN111611601A (en) 2020-04-30 2020-04-30 Multi-data-party user analysis model joint training method and device and storage medium


Publications (1)

Publication Number Publication Date
CN111611601A true CN111611601A (en) 2020-09-01

Family

ID=72199560



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326521A (en) * 2021-06-11 2021-08-31 杭州煋辰数智科技有限公司 Data source joint modeling method based on safe multi-party calculation



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination