CN110378749B - Client similarity evaluation method and device, terminal equipment and storage medium

Info

Publication number: CN110378749B
Application number: CN201910681352.0A
Authority: CN
Inventors: 魏锡光; 李�权; 曹祥; 刘洋; 陈天健; 杨强
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2023-09-26
Anticipated expiration: 2039-07-25
Also published as: CN110378749A

Abstract

The invention discloses a method, a device, a terminal device and a storage medium for evaluating user data similarity. The method comprises the following steps: a server obtains pre-stored sample data, or obtains sample data from each client, as first sample data; the first sample data is combined with second sample data generated by the server to form a test sample set, and the test sample set is tested; and the similarity of the user data of each client is evaluated based on the test results obtained by testing the test sample set. The invention thereby evaluates the similarity of the clients' user data without accessing the real user data of the federated learning clients, which improves the federated learning system's knowledge of its users, keeps the user data secure, and helps the federated learning system provide targeted, high-quality service to users.

Description

Client similarity evaluation method and device, terminal equipment and storage medium

Technical Field

The present invention relates to the technical field of Fintech (financial technology), and in particular, to a method, an apparatus, a terminal device, and a storage medium for evaluating client similarity.

Background

With the rapid development of financial technology, particularly internet financial technology, more and more technologies are being applied in the financial field. Among them, federated learning is receiving increasing attention because it safeguards user privacy and data security.

Federated learning refers to a method of building machine learning models jointly across different participants (also called parties, data owners, or clients). In federated learning, participants do not need to expose their own data to other participants or to the coordinator (also called the server, parameter server, or aggregation server), so federated learning protects user privacy and data security well and can solve the problem of data islands.

However, in existing federated learning, and especially in horizontal federated learning, the server cannot access the raw data of client users, because the federated learning mechanism is designed to keep user data secure. (Horizontal federated learning applies when the datasets of different institutions share many feature dimensions but few samples: the portion of data whose features are the same across the parties but whose users are not identical is extracted for training.) As a result, the server's knowledge of client users is greatly limited, and it is difficult for the federated learning server to provide targeted, high-quality service to client users.

Disclosure of Invention

The main object of the invention is to provide a method, a device, a terminal device and a storage medium for evaluating client similarity, so that the similarity of clients can be evaluated without accessing the user data of the federated learning clients, thereby improving the federated learning system's knowledge of its users and helping it provide targeted, high-quality service to users.

To achieve the above object, the present invention provides a method for evaluating client similarity. The method is applied to a federated learning system that includes a server and a plurality of clients, and comprises the following steps:

the server obtains pre-stored sample data, or obtains sample data from each client, as first sample data;

the first sample data is combined with second sample data generated by the server to form a test sample set, and the test sample set is tested;

and the similarity of the user data of each client is evaluated based on the test results obtained by testing the test sample set.

Optionally, the step in which the server obtains pre-stored sample data, or obtains sample data from each client, as the first sample data includes:

the server detects a pre-stored sample data set;

sample data is acquired from the sample data set by random sampling as the first sample data; or

the server acquires sample data randomly input by each client as the first sample data.

Optionally, after the step in which the server obtains pre-stored sample data, or obtains sample data from each client, as the first sample data, the method further includes:

the server generates second sample data based on the first sample data.

Optionally, the step in which the server generates second sample data based on the first sample data includes:

the server randomly adds noise to the acquired first sample data and/or randomly perturbs the first sample data to generate the second sample data.

Optionally, the step of combining the first sample data with the second sample data generated by the server to form a test sample set includes:

extracting first target sample data and second target sample data from the first sample data and the second sample data, respectively, according to a preset ratio;

and combining the first target sample data and the second target sample data to obtain the test sample set of the first sample data and the second sample data.

Optionally, the step of testing the test sample set includes:

the server invokes the machine learning model of each client;

a training test is performed on the first target sample data and the second target sample data in the test sample set based on each machine learning model.

Optionally, the step of evaluating the similarity of the user data of each client based on the test results obtained by testing the test sample set includes:

the server records the test result of the training test performed by each machine learning model;

any two test results are extracted in turn, and the similarity of the user data is calculated based on a similarity evaluation function; or

unsupervised clustering is performed on the obtained test results to evaluate the similarity of the user data.

In addition, the invention also provides a device for evaluating client similarity. The device is applied to a federated learning system that comprises a server and a plurality of clients, and the device comprises:

an acquisition module, used for the server to obtain pre-stored sample data, or to obtain sample data from each client, as first sample data;

a testing module, used for combining the first sample data with second sample data generated by the server to form a test sample set, and for testing the test sample set;

and an evaluation module, used for evaluating the similarity of the user data of each client based on the test results obtained by testing the test sample set.

In addition, the invention also provides a terminal device, which comprises a memory, a processor, and a client similarity evaluation program that is stored in the memory and can run on the processor, wherein the program, when executed by the processor, implements the steps of the client similarity evaluation method described above.

In addition, the invention also provides a storage medium applied to a computer, wherein the storage medium stores a client similarity evaluation program which, when executed by a processor, implements the steps of the client similarity evaluation method described above.

In the invention, the server obtains pre-stored sample data, or obtains sample data from each client, as first sample data; the first sample data is combined with second sample data generated by the server to form a test sample set, and the test sample set is tested; and the similarity of the user data of each client is evaluated based on the test results obtained by testing the test sample set. Specifically, in the federated learning system, the server collects pre-stored user data that is unrelated to the clients connected to it, or collects user data that is temporarily and randomly input, as the first sample data for evaluating the similarity of the clients. The collected first sample data is combined with the second sample data generated by the server to form a test sample set awaiting testing. After the server tests the test sample set by invoking the models and obtains the test results, it evaluates the similarity of the user data of the clients connected to it from those results, using any existing data similarity evaluation function. The similarity of the clients' user data is thus evaluated without accessing the user data of the federated learning clients, which improves the federated learning system's knowledge of its users, keeps the user data secure, and helps the federated learning system provide targeted, high-quality service to users.

Drawings

FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of a method for evaluating client similarity according to the present invention;

FIG. 3 is a flowchart illustrating a second embodiment of a method for evaluating client similarity according to the present invention;

FIG. 4 is a schematic diagram of an application scenario in an embodiment of a method for evaluating client similarity according to the present invention;

FIG. 5 is a block diagram of a client similarity evaluation system according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As shown in fig. 1, fig. 1 is a schematic structural diagram of a hardware running environment according to an embodiment of the present invention.

It should be noted that fig. 1 may be a schematic structural diagram of a hardware operating environment of a terminal device. The terminal equipment of the embodiment of the invention can be PC, portable computer and other terminal equipment.

As shown in fig. 1, the terminal device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.

It will be appreciated by those skilled in the art that the terminal device structure shown in fig. 1 is not limiting of the terminal device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

As shown in fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a client similarity evaluation program. The operating system is a program that manages and controls the hardware and software resources of the terminal device and supports the running of the client similarity evaluation program and other software or programs.

The terminal device shown in fig. 1 and other terminals together form a federated learning system, which includes at least a server and a plurality of clients. In the terminal device shown in fig. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used for connecting to a background server and carrying out data communication with it; and the processor 1001 may be configured to call the client similarity evaluation program stored in the memory 1005 and perform the following operations:

Further, the processor 1001 may be further configured to invoke the client similarity evaluation program stored in the memory 1005, and perform the following steps:

the server detects a pre-stored sample data set;

Further, the processor 1001 may be further configured to invoke an evaluation program of client similarity stored in the memory 1005, and after executing the server to obtain pre-stored sample data or obtain sample data of each client as first sample data, execute the following steps:

the server generates second sample data based on the first sample data.

the server invokes the machine learning model of each client;

the server records the test result of the training test performed by each machine learning model, extracts any two test results in turn, and calculates the similarity of the user data based on a similarity evaluation function.

Based on the above structure, various embodiments of the method for evaluating client similarity of the present invention are presented.

Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a method for evaluating client similarity according to the present invention.

The embodiments of the present invention provide embodiments of a method for evaluating client similarity, it being noted that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order than that illustrated herein.

The method for evaluating client similarity in this embodiment is applied to the terminal device described above, which may be a PC, a portable computer, or a similar device and is not specifically limited here. Further, the method is applied to a federated learning system. Fig. 4 shows an application scenario of the method, in which the federated learning system includes at least one server and a plurality of clients.

The method for evaluating the similarity of the client side in the embodiment comprises the following steps:

Step S100, the server obtains pre-stored sample data, or obtains sample data from each client, as first sample data.

The server in the federated learning system acquires user data that is unrelated to the users of the clients connected to it in the current federated learning system, and uses it as the first sample data for testing and evaluating the similarity of the user data of the current clients.

In this embodiment, the method for evaluating client similarity is applied to a federated learning system and is particularly suitable for horizontal federated learning. Horizontal federated learning applies when the data features of the clients overlap heavily but their users overlap little: the data that shares the same features but belongs to different users is taken from each client for federated machine learning. For example, consider federated learning between banks in two different regions. Because each bank's users come from its own region, the two user groups intersect very little; but because the banks' businesses are very similar, most of the features recorded in their user data are the same. Horizontal federated learning can then be used to build a federated learning model that predicts the behavior of both banks' customers and thereby serves both banks.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, when the server receives an instruction to evaluate the similarity of the user data of the 6 client users in the current federated learning system (client 1 to client 6), it starts to acquire 6 pieces of first sample data that are unrelated to the data features of those 6 client users, for use in evaluating the similarity of their user data. For example, if the 6 client users belong to the banking domain, the server may acquire 6 pieces of user data from another domain, such as e-commerce, as the first sample data.

Further, step S100 includes:

Step S101, the server detects a pre-stored sample data set.

The server detects a sample data set that is stored on it in advance and is used for evaluating the similarity of the user data of the clients in the federated learning system.

In this embodiment, the sample data set pre-stored by the server may include user data that is unrelated to the user data of the clients connected to the server in the current federated learning system. For example, if the clients connected to the server belong to the banking domain, the pre-stored sample data set may include user data from another domain, such as e-commerce.

Step S102, sample data is obtained from the sample data set by random sampling as the first sample data of each client.

From the detected sample data set, the server extracts by random sampling as many pieces of sample data as there are clients connected to it, and uses them as the first sample data for evaluating the similarity of the user data of the clients.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, after the server detects the pre-stored sample data set, it uses an existing random sampling method to extract 6 pieces of sample data, as many as the current 6 clients. The samples are drawn either from user data in the sample data set that belongs to a domain other than the clients' banking domain, such as e-commerce, or from a certain amount of user data temporarily and randomly input by the research, development and maintenance personnel of the current federated learning system for the purpose of evaluating client similarity. The 6 extracted pieces of sample data are used as the first sample data for evaluating the similarity between the user data of the 6 clients connected to the server.
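As a non-limiting illustration, the random sampling of step S102 might be sketched in Python as follows. The function name draw_first_samples, the representation of each piece of sample data as a list of numbers, the contents of the pool, and the seed parameter are all assumptions introduced for this sketch; the embodiment only requires that as many samples as there are connected clients be drawn at random from the pre-stored set.

```python
import random


def draw_first_samples(sample_pool, num_clients, seed=None):
    """Randomly draw one pre-stored sample per connected client."""
    if num_clients > len(sample_pool):
        raise ValueError("sample pool is smaller than the number of clients")
    rng = random.Random(seed)  # the seed only makes the demo reproducible
    # Sample without replacement so that no record is drawn twice.
    return rng.sample(sample_pool, num_clients)


# Hypothetical pre-stored pool: 100 numeric records standing in for user
# data from a domain unrelated to the clients (for example, e-commerce).
sample_pool = [[float(i), float(i % 7), float(i % 3)] for i in range(100)]
first_samples = draw_first_samples(sample_pool, num_clients=6, seed=0)
print(len(first_samples))
```

Sampling without replacement is a design choice of the sketch; the embodiment does not state whether the same record may be drawn more than once.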

Step S103, the server acquires sample data randomly input by each client as the first sample data.

The server acquires a certain amount of user sample data that is temporarily and randomly input by the research, development and maintenance personnel of the current federated learning system for evaluating the similarity of the clients' user data, and uses it as the first sample data.

Step S200, combining the first sample data with the second sample data generated by the server to form a test sample set, and testing the test sample set.

The server generates as many pieces of second sample data as there are clients connected to it, mixes the acquired first sample data with the generated second sample data to form a test sample set, and tests the first sample data and the second sample data in the test sample set.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, the server receives an instruction to evaluate the similarity of the user data of the 6 client users in the current federated learning system (client 1 to client 6) and obtains, from its pre-stored sample data set, 6 pieces of first sample data that are unrelated to the data features of those 6 client users. The server then generates 6 pieces of second sample data, matching the number of connected clients, and mixes the 6 pieces of first sample data with the 6 pieces of second sample data to form the test sample set that must be tested before the similarity of the clients' user data can be evaluated. As soon as the server detects that the test sample set has been assembled, it tests the first sample data and the second sample data in the test sample set.

Further, in step S200, the step of combining the first sample data with the second sample data generated by the server to form a test sample set includes:

Step S201, first target sample data and second target sample data are extracted from the first sample data and the second sample data, respectively, according to a preset ratio.

Step S202, the extracted first target sample data and second target sample data are combined to obtain the test sample set of the first sample data and the second sample data.

After the server, in response to a received instruction to evaluate the similarity of the user data of the clients in the current federated learning system, has obtained from its pre-stored sample data set as many pieces of sample data as there are clients connected to it, it extracts the first target sample data and the second target sample data, according to the preset ratio, from the obtained first sample data and from the second sample data it has generated. For example, at a ratio of 1:1, all 6 pieces of obtained first sample data are extracted as the first target sample data, and all 6 pieces of generated second sample data are extracted as the second target sample data. The server combines the extracted first target sample data with the second target sample data and marks the resulting mixed data set as the test sample set, on which it will invoke the models to test the first sample data and the second sample data.

In this embodiment, the preset ratio is a ratio, set in advance by the server according to the requirements of the client similarity evaluation, between the number of pieces of first sample data and the number of pieces of second sample data to be extracted. To obtain a more accurate evaluation of client similarity, other numerical ratios may also be used as the preset ratio; that is, the preset ratio is not limited to any specific value.
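Steps S201 and S202 might be sketched in Python as follows, purely by way of example. The function name build_test_set, the expression of the preset ratio as a pair of positive integers, and the rule of taking as many whole ratio units as both lists can supply are assumptions of this sketch, since the embodiment leaves the extraction rule open.

```python
import random


def build_test_set(first_samples, second_samples, ratio=(1, 1), seed=None):
    """Extract target samples at a preset ratio and mix them into one set."""
    first_part, second_part = ratio
    if first_part <= 0 or second_part <= 0:
        raise ValueError("both parts of the ratio must be positive")
    rng = random.Random(seed)
    # Largest number of whole ratio units that both lists can supply.
    units = min(len(first_samples) // first_part,
                len(second_samples) // second_part)
    first_target = rng.sample(first_samples, units * first_part)
    second_target = rng.sample(second_samples, units * second_part)
    test_set = first_target + second_target
    rng.shuffle(test_set)  # mix the two kinds of samples together
    return test_set


# Hypothetical data: 6 first samples and 6 second samples, as in fig. 4.
first_samples = [[float(i)] for i in range(6)]
second_samples = [[float(i) + 0.5] for i in range(6)]
test_set = build_test_set(first_samples, second_samples, ratio=(1, 1), seed=0)
print(len(test_set))  # 12
```

At a ratio of 1:1 with 6 samples of each kind, all 12 samples enter the test sample set, which matches the example above. At a ratio of 2:1 the same inputs would yield 6 first target samples and 3 second target samples.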

Further, in step S200, the step of testing the test sample set includes:

Step S203, the server invokes the machine learning model of each client.

Step S204, a training test is performed on the first target sample data and the second target sample data in the test sample set based on each machine learning model.

The server collects and retrieves the local machine learning models of the clients connected to it in the current federated learning system, and performs a training test on the first target sample data and the second target sample data in the test sample set based on those local machine learning models.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, the server collects the 6 local machine learning models of the 6 clients connected to it and invokes them one at a time. For each model, it randomly selects 1 piece of first target sample data and 1 piece of second target sample data from the test sample set and performs a local training test, until all 6 local machine learning models have completed the local training test on the first target sample data and the second target sample data in the test sample set.
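By way of illustration only, steps S203 and S204 might be sketched in Python as follows. The embodiment does not specify the form of a client's local machine learning model, so this sketch assumes each model can be called as a function that maps one sample to one numeric output; make_linear_model is a hypothetical stand-in for such a model, and run_client_models is likewise a name introduced here. The sketch also assumes that every model is applied to every test sample in the same order so that the outputs are directly comparable, which is a simplification of the random selection described above.

```python
def make_linear_model(weights, bias=0.0):
    """Return a hypothetical stand-in for one client's local model."""
    def predict(sample):
        return sum(w * x for w, x in zip(weights, sample)) + bias
    return predict


def run_client_models(models, test_set):
    """Apply every client's model to every test sample, in the same order."""
    return {client_id: [model(sample) for sample in test_set]
            for client_id, model in models.items()}


# Two hypothetical clients whose models weight the features differently.
models = {
    "client_1": make_linear_model([1.0, 0.0]),
    "client_2": make_linear_model([0.0, 1.0]),
}
test_set = [[1.0, 2.0], [3.0, 4.0]]
test_results = run_client_models(models, test_set)
print(test_results)
# {'client_1': [1.0, 3.0], 'client_2': [2.0, 4.0]}
```

Only model outputs on the server's own test samples are recorded; no client user data is read at any point, which is the property the method relies on.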

Step S300, the similarity of the user data of each client is evaluated based on the test results obtained by testing the test sample set.

The server generates as many pieces of second sample data as there are clients connected to it, combines the first sample data and the second sample data to form a test sample set, tests the first sample data and the second sample data in the test sample set, and evaluates the similarity of the user data of each client based on the test results obtained.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, the server receives an instruction to evaluate the similarity of the user data of the client users in the current federated learning system and obtains, from its pre-stored sample data set, 6 pieces of first sample data that are unrelated to the data features of the current 6 client users. The server then generates 6 pieces of second sample data, matching the number of connected clients, extracts 6 pieces of first target sample data and 6 pieces of second target sample data at a ratio of 1:1, and combines them to form the test sample set. As soon as the server detects that the test sample set has been assembled, it tests the first target sample data and the second target sample data in the test sample set, records the test result produced by each local machine learning model, and evaluates the similarity of the user data of the clients based on those test results, using any existing data similarity evaluation function or clustering.

Further, step S300 includes:

Step S301, the server records the test result of the training test performed by each machine learning model.

The server in the current federated learning system records the process in which the local machine learning models of the clients perform the training test on the first sample data and the second sample data in the test sample set, and thereby records the test result that each local machine learning model produces.

Specifically, for example, in the federated learning system shown in the scenario of fig. 4, while the server invokes the 6 local machine learning models of the 6 clients one at a time and, for each model, randomly selects 1 piece of first target sample data and 1 piece of second target sample data from the test sample set for a local training test, it records the result of each model's training test on the first target sample data and the second target sample data. After all 6 local machine learning models have completed the local training test, the server thus holds 6 test results, one per model.

Step S302, any two test results are extracted in turn, and the similarity of the user data is calculated based on a similarity evaluation function.

From the recorded test results of the local machine learning models' training tests on the first sample data and the second sample data, the server extracts any two test results in turn and applies any existing data similarity evaluation function to them, obtaining a calculated result that evaluates the similarity of the user data of the two clients corresponding to those test results.

Specifically, for example, the server has recorded 6 test results, one from each of the 6 local machine learning models that performed the training test on the first target sample data and the second target sample data in the test sample set. From these 6 test results, the server extracts two test results at a time in combination (the first and second test results, the first and third, the first and fourth, and so on, until each test result has been paired with every other test result), which yields 15 pairs. It applies any existing data similarity evaluation function to each pair and evaluates the similarity between the current 6 clients, pair by pair, based on the 15 calculated results: the two clients corresponding to the pair of test results with the largest calculated result have the highest client similarity.
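The pairwise evaluation of step S302 might be sketched in Python as follows. The embodiment allows any existing data similarity evaluation function; cosine similarity is chosen here only as one possible example and is an assumption of this sketch, as are the function names and the representation of each test result as a list of numeric model outputs.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """One possible similarity evaluation function (an assumption here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero result as dissimilar to everything
    return dot / (norm_u * norm_v)


def pairwise_similarity(test_results):
    """Score every unordered pair of clients exactly once."""
    return {(a, b): cosine_similarity(test_results[a], test_results[b])
            for a, b in combinations(sorted(test_results), 2)}


def most_similar_pair(test_results):
    """Return the pair of clients with the largest similarity score."""
    scores = pairwise_similarity(test_results)
    return max(scores, key=scores.get)


# Hypothetical recorded test results for three clients.
test_results = {
    "client_1": [1.0, 0.0],
    "client_2": [1.0, 0.1],
    "client_3": [0.0, 1.0],
}
print(most_similar_pair(test_results))  # ('client_1', 'client_2')
```

For n clients this computes n(n-1)/2 scores, so 6 clients give the 15 calculated results mentioned above.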

Step S303, unsupervised clustering is performed on the obtained test results to evaluate the similarity of the user data.

Unsupervised clustering is performed on all recorded test results of the local machine learning models' training tests on the first sample data and the second sample data, so as to obtain the similarity of the user data of the clients corresponding to any two test results.

Specifically, for example, when so many clients are connected to the server of the federated learning system that calculating a data similarity evaluation function over every combination of test results would consume a large amount of resources or time, the server instead performs unsupervised clustering directly on the recorded test results obtained when the local machine learning models performed the training test on the first target sample data and the second target sample data in the test sample set. In this way, the similarity between the user data of the large number of clients connected to the server is evaluated.
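As a non-limiting illustration of step S303, the sketch below groups clients with a simple single-pass "leader" clustering over their test results. The embodiment does not name a clustering algorithm, so the choice of leader clustering, the Euclidean distance, the threshold value, and the function names are all assumptions of this sketch; any other unsupervised clustering method could be substituted.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long result vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def leader_cluster(test_results, threshold):
    """Group clients whose test results lie within `threshold` of a leader."""
    clusters = []  # each entry is (leader_vector, [client ids])
    for client_id in sorted(test_results):
        vector = test_results[client_id]
        for leader, members in clusters:
            if euclidean(leader, vector) <= threshold:
                members.append(client_id)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append((vector, [client_id]))
    return [members for _, members in clusters]


# Hypothetical recorded test results for four clients.
test_results = {
    "client_a": [0.0, 0.0],
    "client_b": [0.1, 0.0],
    "client_c": [5.0, 5.0],
    "client_d": [5.1, 5.0],
}
print(leader_cluster(test_results, threshold=1.0))
# [['client_a', 'client_b'], ['client_c', 'client_d']]
```

Each test result is compared only with the current cluster leaders rather than with every other result, which is why clustering of this kind can be cheaper than scoring every pair when the number of clients is large. Clients placed in the same cluster are treated as having similar user data.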

The method comprises the steps that user data which are irrelevant to users of all clients connected with a server in a current federal learning system are obtained through the server in the federal learning system and serve as first sample data for testing and evaluating the similarity of the user data of all the clients at present, the server generates second sample data with the same number as that of the clients connected with the current server, the obtained first sample data are mixed with the generated second sample data to form a test sample set, the first sample data and the second sample data in the test sample set are tested, the server generates second sample data with the same number as that of the clients connected with the current server, the obtained first sample data and the generated second sample data are mixed to form the test sample set, the first sample data and the second sample data in the test sample set are tested, and the similarity of the user data of all the clients is evaluated based on test results obtained through testing.

In this way, the similarity of the user data of the clients is evaluated without accessing the real user data of the federal learning clients. This improves the federal learning system's understanding of its users, ensures the security of user data, and helps the federal learning system provide targeted, high-quality service to users.
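By way of non-limiting illustration, the overall flow of this embodiment might be sketched as follows. All names are hypothetical: the clients' local models are stood in for by simple threshold functions, the second sample data is produced by adding small Gaussian noise, and the fraction of matching outputs is assumed as the similarity measure, none of which is prescribed by the source.

```python
import random


def build_test_set(first_samples, second_samples, rng):
    """Mix obtained (first) and server-generated (second) samples into one test set."""
    test_set = list(first_samples) + list(second_samples)
    rng.shuffle(test_set)
    return test_set


def run_client_models(client_models, test_set):
    """Call every client's local model on the same test set and record the outputs."""
    return {cid: [model(x) for x in test_set] for cid, model in client_models.items()}


def agreement(results_a, results_b):
    """Fraction of test samples on which two clients' models give the same output."""
    matches = sum(a == b for a, b in zip(results_a, results_b))
    return matches / len(results_a)


rng = random.Random(0)
first = [rng.uniform(0, 1) for _ in range(6)]       # samples unrelated to real users
second = [x + rng.gauss(0, 0.05) for x in first]    # one generated sample per first sample
test_set = build_test_set(first, second, rng)

# Hypothetical stand-ins for the clients' local machine learning models.
client_models = {
    "client_1": lambda x: int(x > 0.5),
    "client_2": lambda x: int(x > 0.5),
    "client_3": lambda x: int(x > 2.0),
}

results = run_client_models(client_models, test_set)
sim_12 = agreement(results["client_1"], results["client_2"])
sim_13 = agreement(results["client_1"], results["client_3"])
```

Only model outputs on the shared test sample set are compared; the data on which each local model was trained never appears in the sketch.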

Further, a second embodiment of the method for evaluating client similarity of the present invention is presented.

Referring to fig. 3, fig. 3 is a flowchart of a second embodiment of the client similarity evaluation method according to the present invention. In this embodiment, after step S100, in which pre-stored sample data or sample data of each client is obtained as the first sample data, the client similarity evaluation method further includes:

Step S400: the server generates second sample data based on the first sample data.

After obtaining user data unrelated to the users of the clients connected to the server in the current federal learning system, and taking it as the first sample data for testing and evaluating the similarity of the clients' user data, the server generates, from the obtained first sample data, second sample data equal in number to the first sample data.

Specifically, for example, in the federal learning system of the scenario shown in fig. 4, on receiving an instruction to evaluate the similarity of the user data of the client users in the current federal learning system, the server obtains 6 pieces of user data unrelated to the data features of the current 6 client users as the first sample data. The server then generates 6 pieces of second sample data, one for each of the 6 pieces of user data.

Further, step S400 includes:

Step S401: the server randomly adds noise to the obtained first sample data, and/or randomly perturbs the first sample data, to generate the second sample data.

After obtaining the first sample data from the pre-stored sample data set, the server randomly adds data noise to each piece of first sample data, and/or randomly perturbs each piece of first sample data in turn, thereby generating second sample data equal in number to the first sample data.
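By way of non-limiting illustration, step S401 might be sketched as follows. The source does not specify the noise distribution or the form of the perturbation; zero-mean Gaussian noise and random rescaling of a subset of features are assumptions, as are the function names and the parameters `sigma`, `fraction`, and `scale`.

```python
import random


def add_noise(sample, rng, sigma=0.1):
    """Return a copy of `sample` with zero-mean Gaussian noise added to every feature."""
    return [x + rng.gauss(0.0, sigma) for x in sample]


def perturb(sample, rng, fraction=0.5, scale=0.2):
    """Return a copy of `sample` in which a random subset of features is rescaled."""
    out = list(sample)
    count = max(1, int(len(out) * fraction))
    for i in rng.sample(range(len(out)), count):
        out[i] *= 1.0 + rng.uniform(-scale, scale)
    return out


def generate_second_samples(first_samples, rng, use_noise=True, use_perturbation=True):
    """Produce exactly one piece of second sample data per piece of first sample data."""
    second = []
    for sample in first_samples:
        generated = list(sample)
        if use_noise:
            generated = add_noise(generated, rng)
        if use_perturbation:
            generated = perturb(generated, rng)
        second.append(generated)
    return second


rng = random.Random(42)
first_samples = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(6)]
second_samples = generate_second_samples(first_samples, rng)
```

The two flags correspond to the "and/or" in step S401: either operation may be applied alone, or both may be applied to the same sample.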

Specifically, for example, in the federal learning system of the scenario shown in fig. 4, the server extracts a certain number of user data from the pre-stored sample data set, which contains other user data, for example from the e-commerce field to which the current client users belong, and takes the extracted sample data as the first sample data for evaluating the similarity between the user data of the clients connected to the current server. The server then randomly adds data noise to the extracted sample data in turn, to perform redundancy processing on the sample data; or randomly perturbs the extracted sample data in turn, to perform scrambling processing on the sample data; or does both, randomly perturbing the sample data while randomly adding data noise to it.

In this embodiment, the way in which the server generates the second sample data is not limited to adding data noise to, or perturbing, the extracted sample data. The server may also generate the second sample data by cutting or shuffling the sample data, or by a combination of such operations.
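By way of non-limiting illustration, the cutting and shuffling operations, and their combination, might be sketched as follows. The source does not define these operations precisely; keeping a contiguous slice of the features and reordering the features at random are assumed interpretations, and all function and parameter names are hypothetical.

```python
import random


def cut(sample, rng, keep=0.75):
    """Keep a contiguous slice covering roughly `keep` of the features."""
    width = max(1, int(len(sample) * keep))
    start = rng.randrange(len(sample) - width + 1)
    return sample[start:start + width]


def shuffle_features(sample, rng):
    """Return a copy of `sample` with its features in random order."""
    out = list(sample)
    rng.shuffle(out)
    return out


def cut_then_shuffle(sample, rng):
    """Combined operation: cut first, then reorder what remains."""
    return shuffle_features(cut(sample, rng), rng)


rng = random.Random(7)
first_sample = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
second_sample = cut_then_shuffle(first_sample, rng)
```

Each helper returns a new list, so the first sample data is left unchanged and can still be placed in the test sample set alongside the generated sample.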

In the present invention, after the server of the federal learning system obtains user data unrelated to the users of the clients connected to the server and takes it as the first sample data for testing and evaluating the similarity of the clients' user data, the server applies data cutting, noise addition, shuffling and/or perturbation to each piece of first sample data to generate second sample data equal in number to the first sample data. The first sample data collected by the server and the generated second sample data are then combined into a test sample set, which the server uses when calling each client's local machine learning model for testing, so that the similarity of the user data of the clients can be evaluated. The similarity of the clients in the federal learning system can thus be evaluated from the first sample data and the randomly generated sample data alone. This improves the federal learning system's understanding of its client users and allows it to provide more targeted, accurate service, without accessing users' real original data, which ensures the security of user data.

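In addition, referring to fig. 5, an embodiment of the present invention further provides a client similarity evaluation device. The device is applied to a federal learning system that includes a server and a plurality of clients, and the device comprises:

an acquisition module, configured for the server to obtain pre-stored sample data, or sample data randomly input by each client, as first sample data;

a test module, configured to combine the first sample data and the second sample data generated by the server into a test sample set, and to test the test sample set;

an evaluation module, configured to evaluate the similarity of the clients based on the test result obtained by testing the test sample set.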

Preferably, the acquisition module comprises:

a first acquisition unit, configured for the server to detect a pre-stored sample data set;

the first acquisition unit being further configured to obtain sample data from the sample data set by random sampling, as the first sample data of each client; or,

a second acquisition unit, configured for the server to obtain sample data randomly input by each client as the first sample data.

Preferably, the device for evaluating the similarity of the clients further includes:

a generation module, configured for the server to generate second sample data based on the first sample data.

Preferably, the generating module includes:

a generation unit, configured for the server to randomly add noise to the obtained first sample data, and/or randomly perturb the first sample data, to generate the second sample data.

Preferably, the test module comprises:

a data extraction unit, configured to extract first target sample data and second target sample data from the first sample data and the second sample data according to a preset ratio;

a data combination unit, configured to combine the extracted first target sample data and second target sample data to obtain a test sample set of the first sample data and the second sample data.
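By way of non-limiting illustration, the data extraction unit and the data combination unit might be sketched as follows. The source does not state how the preset ratio is expressed; here `ratio` is assumed to be the share of the test sample set drawn from the first sample data, `total` is an assumed test set size, and shuffling the combined set is likewise an assumption.

```python
import random


def extract_by_ratio(first_samples, second_samples, ratio, total, rng):
    """Draw `total` target samples, a `ratio` share of them from the first sample data."""
    n_first = round(total * ratio)
    n_second = total - n_first
    first_target = rng.sample(first_samples, n_first)
    second_target = rng.sample(second_samples, n_second)
    return first_target, second_target


def combine(first_target, second_target, rng):
    """Merge both groups of target samples into one shuffled test sample set."""
    test_set = first_target + second_target
    rng.shuffle(test_set)
    return test_set


rng = random.Random(1)
# Tagged placeholders make the origin of each sample visible in the result.
first_samples = [("first", i) for i in range(6)]
second_samples = [("second", i) for i in range(6)]

first_target, second_target = extract_by_ratio(
    first_samples, second_samples, ratio=0.5, total=8, rng=rng
)
test_set = combine(first_target, second_target, rng)
```

With `ratio=0.5` and `total=8`, the test sample set holds four samples from each source; other ratios change only the two counts.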

Preferably, the test module further comprises:

a calling unit, configured for the server to call the machine learning model of each client;

a test unit, configured to perform a training test on the first target sample data and the second target sample data in the test sample set based on each machine learning model.

Preferably, the evaluation module comprises:

a test result acquisition unit, configured for the server to record each test result of the training test performed by each machine learning model;

a first evaluation unit, configured to extract any two test results in turn and calculate the similarity of the user data based on a similarity evaluation function; or,

a second evaluation unit, configured to perform unsupervised clustering on the obtained test results to evaluate the similarity of the user data.
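By way of non-limiting illustration, the pairwise evaluation performed by the first evaluation unit might be sketched as follows. The source does not specify the similarity evaluation function; cosine similarity between two clients' test result vectors is assumed here, and the function names, client identifiers, and result values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length result vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


def pairwise_similarity(test_results):
    """Score every pair of clients from their recorded test results."""
    return {
        (a, b): cosine_similarity(test_results[a], test_results[b])
        for a, b in combinations(sorted(test_results), 2)
    }


# Hypothetical recorded test results, one vector per client.
test_results = {
    "client_1": [1.0, 0.0, 1.0, 0.0],
    "client_2": [1.0, 0.0, 1.0, 0.0],
    "client_3": [0.0, 1.0, 0.0, 1.0],
}
scores = pairwise_similarity(test_results)
```

The number of pairs grows quadratically with the number of clients, which is why unsupervised clustering by the second evaluation unit is offered as the alternative.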

When running, each module of the client similarity evaluation device provided in this embodiment implements the steps of the client similarity evaluation method described above, which are not repeated here.

In addition, an embodiment of the present invention further provides a storage medium applied to a computer, that is, a computer-readable storage medium. The storage medium stores a client similarity evaluation program which, when executed by a processor, implements the steps of the client similarity evaluation method described above.

For the method implemented when the client similarity evaluation program is executed by the processor, reference may be made to the embodiments of the client similarity evaluation method of the present invention, which are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. A client similarity evaluation method, characterized in that the method is applied to a federal learning system comprising a server and a plurality of clients, and the method comprises the following steps:

the server obtains pre-stored sample data, or sample data randomly input by each client, as first sample data; the sample data is user data unrelated to each client connected to the server, and is not real user data of any client;

combining the first sample data with second sample data, generated by the server and equal in number to the clients connected to the server, into a test sample set, and calling the local machine learning training model of each client to perform a local training test on the test sample set;

and evaluating the similarity of the clients by using a similarity evaluation function, based on the test result obtained by testing the test sample set.

2. The method for evaluating the similarity of clients according to claim 1, wherein the step of the server obtaining pre-stored sample data or obtaining sample data randomly input by each client as the first sample data comprises:

the server detects a pre-stored sample data set;

3. The method for evaluating the similarity of clients according to claim 1, wherein after the step of the server obtaining pre-stored sample data or obtaining sample data randomly inputted by each client as first sample data, the method further comprises:

the server generates second sample data based on the first sample data.

4. The method for evaluating the similarity of clients according to claim 3, wherein the step of generating second sample data by the server based on the first sample data includes:

5. The method for evaluating the similarity of clients according to claim 1, wherein the step of combining the first sample data with the second sample data, generated by the server and equal in number to the clients connected to the server, into a test sample set includes:

6. The method for evaluating client similarity according to claim 5, wherein said step of performing a local training test on said test sample set comprises:

the server side invokes a machine learning model of each client side;

7. The method for evaluating the similarity of clients according to claim 6, wherein the step of evaluating the similarity of the clients based on the test result obtained by the test performed by the test sample set comprises:

extracting any two test results in turn, and calculating the similarity of the clients based on a similarity evaluation function; or,

and performing unsupervised clustering on the obtained test results to evaluate the similarity of the clients.

8. A client similarity evaluation device, characterized in that the device is applied to a federal learning system comprising a server and a plurality of clients, and the device comprises:

an acquisition module, configured for the server to obtain pre-stored sample data, or sample data randomly input by each client, as first sample data; the sample data is user data unrelated to each client connected to the server, and is not real user data of any client;

a test module, configured to combine the first sample data with second sample data, generated by the server and equal in number to the clients connected to the server, into a test sample set, and to call the local machine learning training model of each client to perform a local training test on the test sample set;

and an evaluation module, configured to evaluate the similarity of the clients by using a similarity evaluation function, based on the test result obtained by testing the test sample set.

9. A terminal device, characterized in that the terminal device comprises: a memory, a processor, and a client similarity evaluation program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the client similarity evaluation method according to any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium is applied to a computer and stores a client similarity evaluation program which, when executed by a processor, implements the steps of the client similarity evaluation method according to any one of claims 1 to 7.